[ https://issues.apache.org/jira/browse/SPARK-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159534#comment-15159534 ]

Yin Huai commented on SPARK-13046:
----------------------------------

hmm... This is weird... I could not reproduce the problem. 

My setup is:
{code}
yhuai$ ls -lsR /tmp/some_path/
total 16
0 -rwxrwxrwx  1 yhuai  wheel    0 Feb 23 11:46 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  216 Feb 23 11:46 _common_metadata
8 -rwxrwxrwx  1 yhuai  wheel  723 Feb 23 11:46 _metadata
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-13
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-14
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-15

/tmp/some_path//date_received=2016-01-13:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-13/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00002-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet

/tmp/some_path//date_received=2016-01-14:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-14/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00005-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet

/tmp/some_path//date_received=2016-01-15:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-15/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00007-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet
{code}
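
For reference, here is a minimal sketch of how a Hive-style layout like the one above is expected to map to partition columns. This is plain Python, not Spark's actual code in PartitioningUtils.scala; the helper name `parse_partition_path` is hypothetical and exists only for illustration. It assumes every `key=value` directory between the base path and the file is a partition column.

```python
from collections import OrderedDict


def parse_partition_path(file_path, base_path):
    """Return the key=value directory segments between base_path and the file.

    Hypothetical helper for illustration only; it is not Spark's implementation.
    """
    base = base_path.rstrip("/")
    if not file_path.startswith(base + "/"):
        raise ValueError("%s is not under %s" % (file_path, base_path))
    relative = file_path[len(base):]
    # Drop empty segments so doubled slashes (as in the ls output) are tolerated.
    segments = [s for s in relative.split("/") if s]
    columns = OrderedDict()
    for segment in segments[:-1]:  # the last segment is the file name itself
        if "=" in segment:
            key, value = segment.split("=", 1)
            columns[key] = value
    return columns


if __name__ == "__main__":
    base = "/tmp/some_path/"
    leaf = (
        "/tmp/some_path//date_received=2016-01-13/fingerprint=2f6a09d370b4021d/"
        "part-r-00002-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet"
    )
    for key, value in parse_partition_path(leaf, base).items():
        print("%s = %s" % (key, value))
```

In a Spark shell, the corresponding check would be to load the directory with `sqlContext.read.parquet("/tmp/some_path/")` and confirm that both `date_received` and `fingerprint` show up in `printSchema()`.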



> Partitioning looks broken in 1.6
> --------------------------------
>
>                 Key: SPARK-13046
>                 URL: https://issues.apache.org/jira/browse/SPARK-13046
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Julien Baley
>
> Hello,
> I have a list of files in s3:
> {code}
> s3://bucket/some_path/date_received=2016-01-13/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> s3://bucket/some_path/date_received=2016-01-14/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> s3://bucket/some_path/date_received=2016-01-15/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> {code}
> Until 1.5.2, it all worked well: passing s3://bucket/some_path/ (the common
> prefix of the three lines) would correctly identify two key/value pairs, one
> for `date_received` and one for `fingerprint`.
> From 1.6.0, I get the following exception:
> {code}
> assertion failed: Conflicting directory structures detected. Suspicious paths
> s3://bucket/some_path/date_received=2016-01-13
> s3://bucket/some_path/date_received=2016-01-14
> s3://bucket/some_path/date_received=2016-01-15
> {code}
> That is to say, the partitioning code now fails to identify 
> date_received=2016-01-13 as a key/value pair.
> I can see that there has been some activity on
> spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
> recently, so that seems related (especially the commits
> https://github.com/apache/spark/commit/7b5d9051cf91c099458d092a6705545899134b3b
> and
> https://github.com/apache/spark/commit/de289bf279e14e47859b5fbcd70e97b9d0759f14 ).
> If I read the tests added in those commits correctly:
> - they don't seem to actually test the return value, only that it doesn't crash
> - they only test cases where the S3 path contains one key/value pair
>   (otherwise they would have caught the bug)
> This is problematic for us as we're trying to migrate all of our Spark
> services to 1.6.0 and this bug is a real blocker. I know it's possible to
> force a 'union', but I'd rather not do that if the bug can be fixed.
> If you have any questions, please ask.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org