Rex Xiong created SPARK-6384:
--------------------------------

             Summary: saveAsParquet doesn't clean up attempt_* folders
                 Key: SPARK-6384
                 URL: https://issues.apache.org/jira/browse/SPARK-6384
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Rex Xiong
After calling SchemaRDD.saveAsParquet, the job completes and generates the *.parquet, _SUCCESS, _common_metadata, and _metadata files successfully. But sometimes, some attempt_* folders (e.g. attempt_201503170229_0006_r_000006_736, attempt_201503170229_0006_r_000404_416) are left behind under the same output folder. Each contains one Parquet file and appears to be a working temp folder. This happens even when the _SUCCESS file was created. In this situation, Spark SQL throws an exception when loading the Parquet folder:

Error: java.io.FileNotFoundException: Path is not a file: ............../attempt_201503170229_0006_r_000006_736
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1728)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1671)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1651)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1625)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:503)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
(state=,code=0)

I'm not sure whether this is a Spark bug or a Parquet bug.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
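As a possible workaround until the root cause is fixed, one could prune the leftover attempt_* directories from the output folder before loading it. The sketch below is a minimal local-filesystem illustration of that cleanup logic (the `prune_attempt_dirs` helper is hypothetical, not part of Spark; on HDFS the equivalent would be done with `hadoop fs -rm -r` or the Hadoop FileSystem API):

```python
import os
import shutil
import tempfile

def prune_attempt_dirs(output_dir):
    """Remove leftover attempt_* task directories from a Parquet output folder.

    Returns the names of the directories that were removed. Regular files
    (*.parquet, _SUCCESS, _metadata, _common_metadata) are left untouched.
    """
    removed = []
    for name in os.listdir(output_dir):
        path = os.path.join(output_dir, name)
        if os.path.isdir(path) and name.startswith("attempt_"):
            shutil.rmtree(path)
            removed.append(name)
    return removed

if __name__ == "__main__":
    # Demo on a temp directory mimicking the reported output layout.
    out = tempfile.mkdtemp()
    for f in ("part-r-00001.parquet", "_SUCCESS", "_metadata", "_common_metadata"):
        open(os.path.join(out, f), "w").close()
    os.makedirs(os.path.join(out, "attempt_201503170229_0006_r_000006_736"))

    print(sorted(prune_attempt_dirs(out)))
    print(sorted(os.listdir(out)))
```

This only hides the symptom (the FileNotFoundException at load time); the underlying bug of the temp folders surviving a successful job remains.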