[ https://issues.apache.org/jira/browse/SPARK-13207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133529#comment-15133529 ]

Yin Huai commented on SPARK-13207:
----------------------------------

It would be better to let partition discovery ignore files/dirs starting with 
"_" or ".". But we also need to change the Parquet data source so that it does 
not rely on leaf files to find those metadata files (see 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala#L421-L422).
I am thinking we can get a simple fix into master and the 1.6 branch first, 
and then a better fix into master.
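
For the first part, a filter along these lines would do (a minimal sketch; 
{{shouldSkip}} is a hypothetical name, not taken from the Spark codebase):

{code}
import org.apache.hadoop.fs.Path

// Hypothetical predicate for partition discovery: skip hidden files/dirs and
// job metadata such as _SUCCESS, _metadata, _common_metadata, .part-*.crc.
def shouldSkip(path: Path): Boolean = {
  val name = path.getName
  name.startsWith("_") || name.startsWith(".")
}
{code}

Note that Parquet still needs to see the {{_metadata}} and 
{{_common_metadata}} summary files, which is why the filter has to be applied 
by partition discovery itself rather than when Parquet collects its leaf 
files.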

> _SUCCESS should not break partition discovery
> ---------------------------------------------
>
>                 Key: SPARK-13207
>                 URL: https://issues.apache.org/jira/browse/SPARK-13207
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Partition discovery will fail in the following case:
> {code}
> test("_SUCCESS should not break partitioning discovery") {
>   withTempPath { dir =>
>     val tablePath = new File(dir, "table")
>     val df = (1 to 3).map(i => (i, i, i, i)).toDF("a", "b", "c", "d")
>     df.write
>       .format("parquet")
>       .partitionBy("b", "c", "d")
>       .save(tablePath.getCanonicalPath)
>     // Simulate stray job-commit markers inside the partition directories
>     // (Files.touch is Guava's com.google.common.io.Files.touch).
>     Files.touch(new File(s"${tablePath.getCanonicalPath}/b=1", "_SUCCESS"))
>     Files.touch(new File(s"${tablePath.getCanonicalPath}/b=1/c=1", "_SUCCESS"))
>     Files.touch(new File(s"${tablePath.getCanonicalPath}/b=1/c=1/d=1", "_SUCCESS"))
>     checkAnswer(sqlContext.read.format("parquet").load(tablePath.getCanonicalPath), df)
>   }
> }
> {code}
> Because the {{_SUCCESS}} files sit inside the inner partition directories, 
> partition discovery will fail.
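
To see why, here is a hypothetical, much-simplified version of the 
per-component {{key=value}} parse that partition discovery applies to each 
leaf directory (not the actual Spark code):

{code}
// Hypothetical simplification: every component of a partition directory path,
// relative to the table root, must parse as "key=value".
def parsePartition(relativePath: String): Option[Seq[(String, String)]] = {
  val specs = relativePath.split("/").filter(_.nonEmpty).map { component =>
    component.split("=", 2) match {
      case Array(key, value) => Some(key -> value)
      case _                 => None // e.g. "_SUCCESS" has no '='
    }
  }
  if (specs.forall(_.isDefined)) Some(specs.flatten.toSeq) else None
}
{code}

{{parsePartition("b=1/c=1/d=1")}} succeeds, but the stray {{_SUCCESS}} files 
also turn {{b=1}} and {{b=1/c=1}} into leaf directories, so the discovered 
partition paths no longer agree on the number of partition columns and 
discovery fails.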


