[ 
https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093150#comment-14093150
 ] 

Yin Huai commented on SPARK-2700:
---------------------------------

Can we resolve it?

> Hidden files (such as .impala_insert_staging) should be filtered out by 
> sqlContext.parquetFile
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2700
>                 URL: https://issues.apache.org/jira/browse/SPARK-2700
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.1
>            Reporter: Teng Qiu
>             Fix For: 1.1.0
>
>
> When a table is created in Impala, a hidden folder, .impala_insert_staging, 
> is created in the table's folder.
> If we load such a table with the Spark SQL API sqlContext.parquetFile, this 
> hidden folder causes trouble: Spark tries to get metadata from this folder, 
> and you will see the exception:
> {code:borderStyle=solid}
> Caused by: java.io.IOException: Could not read footer for file 
> FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging;
>  isDirectory=true; modification_time=1406333729252; access_time=0; 
> owner=hdfs; group=hdfs; permission=rwxr-xr-x; isSymlink=false}
> ...
> ...
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is 
> not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
> {code}
> The Impala side does not consider this their problem: 
> https://issues.cloudera.org/browse/IMPALA-837 (IMPALA-837 Delete 
> .impala_insert_staging directory after INSERT),
> so maybe we should filter out these hidden folders and files when reading 
> Parquet tables.
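
The filtering proposed in the description could look roughly like the sketch below. It is a minimal illustration in plain Python, not the actual Spark change: the helper names is_hidden and visible_entries are hypothetical, the part-file name is made up for the example, and treating only dot-prefixed names as hidden is an assumption taken from the .impala_insert_staging case above.

```python
def is_hidden(path: str) -> bool:
    """Return True if the last component of the path starts with a dot."""
    # Take the base name, ignoring any trailing slash on directory paths.
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return name.startswith(".")


def visible_entries(paths):
    """Keep only the entries that are not hidden (assumed rule: dot prefix)."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical listing of a table folder; only the staging folder name
# comes from the issue, the part file is invented for illustration.
table_dir = "hdfs://xxx:8020/user/hive/warehouse/parquet_strings"
listing = [
    table_dir + "/.impala_insert_staging",
    table_dir + "/part-00000.parquet",
]

# The staging folder is dropped before any footer would be read.
to_read = visible_entries(listing)
```

With a filter like this applied to the directory listing first, the staging folder would never be handed to the footer reader, so the "Path is not a file" error would not be triggered by it.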


