[ https://issues.apache.org/jira/browse/SPARK-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159534#comment-15159534 ]
Yin Huai commented on SPARK-13046:
----------------------------------

hmm... This is weird... I could not reproduce the problem. My setup is:
{code}
yhuai$ ls -lsR /tmp/some_path/
total 16
0 -rwxrwxrwx  1 yhuai  wheel    0 Feb 23 11:46 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  216 Feb 23 11:46 _common_metadata
8 -rwxrwxrwx  1 yhuai  wheel  723 Feb 23 11:46 _metadata
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-13
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-14
0 drwxr-xr-x  3 yhuai  wheel  102 Feb 23 11:46 date_received=2016-01-15

/tmp/some_path//date_received=2016-01-13:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-13/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00002-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet

/tmp/some_path//date_received=2016-01-14:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-14/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00005-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet

/tmp/some_path//date_received=2016-01-15:
total 0
0 drwxr-xr-x  5 yhuai  wheel  170 Feb 23 11:47 fingerprint=2f6a09d370b4021d

/tmp/some_path//date_received=2016-01-15/fingerprint=2f6a09d370b4021d:
total 8
0 -rwxr-xr-x  1 yhuai  wheel    0 Feb 23 11:47 _SUCCESS
8 -rwxrwxrwx  1 yhuai  wheel  330 Feb 23 11:46 part-r-00007-f826790f-90b9-49f9-8f77-b82c64fe7a6f.gz.parquet
{code}

> Partitioning looks broken in 1.6
> --------------------------------
>
>                  Key: SPARK-13046
>                  URL: https://issues.apache.org/jira/browse/SPARK-13046
>              Project: Spark
>           Issue Type: Bug
>           Components: Spark Core
>     Affects Versions: 1.6.0
>             Reporter: Julien Baley
>
> Hello,
> I have a list of files in S3:
> {code}
> s3://bucket/some_path/date_received=2016-01-13/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> s3://bucket/some_path/date_received=2016-01-14/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> s3://bucket/some_path/date_received=2016-01-15/fingerprint=2f6a09d370b4021d/{_SUCCESS,metadata,some parquet files}
> {code}
> Up to 1.5.2, this all worked well: passing s3://bucket/some_path/ (the same for all three lines) would correctly identify two key/value pairs, one `date_received` and one `fingerprint`.
> Since 1.6.0, I get the following exception:
> {code}
> assertion failed: Conflicting directory structures detected. Suspicious paths
> s3://bucket/some_path/date_received=2016-01-13
> s3://bucket/some_path/date_received=2016-01-14
> s3://bucket/some_path/date_received=2016-01-15
> {code}
> That is to say, the partitioning code now fails to identify date_received=2016-01-13 as a key/value pair.
> I can see that there has been some recent activity on spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala, so the change seems related, especially the commits https://github.com/apache/spark/commit/7b5d9051cf91c099458d092a6705545899134b3b and https://github.com/apache/spark/commit/de289bf279e14e47859b5fbcd70e97b9d0759f14
> If I read the tests added in those commits correctly:
> - they don't seem to actually test the return value, only that the code doesn't crash;
> - they only test cases where the S3 path contains a single key/value pair (testing two pairs would otherwise have caught this bug).
> This is problematic for us, as we're trying to migrate all of our Spark services to 1.6.0 and this bug is a real blocker. I know it's possible to force a 'union', but I'd rather not do that if the bug can be fixed.
> Any questions, please shoot.
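For illustration, the behavior being discussed can be sketched as follows. This is a hypothetical simplification, not Spark's actual implementation: partition discovery treats each `key=value` directory segment as a partition column, and infers a table's base path by stripping those segments from each leaf directory; the 1.6 "Conflicting directory structures" assertion fires when the inferred base paths disagree. The helper names `parse_partitions`, `base_path`, and `check_consistent` are invented for this sketch.

```python
import re

# A directory segment of the form key=value (simplified stand-in for what
# Spark's PartitioningUtils treats as a partition column).
KV_SEGMENT = re.compile(r"([^/=]+)=([^/=]+)")

def parse_partitions(path):
    """Collect key=value pairs from the directory segments of a path."""
    pairs = {}
    for segment in path.rstrip("/").split("/"):
        m = KV_SEGMENT.fullmatch(segment)
        if m:
            pairs[m.group(1)] = m.group(2)
    return pairs

def base_path(path):
    """Strip trailing key=value segments to recover the table's base path."""
    segments = path.rstrip("/").split("/")
    while segments and KV_SEGMENT.fullmatch(segments[-1]):
        segments.pop()
    return "/".join(segments)

def check_consistent(paths):
    """Mimic the 1.6 assertion: all leaf dirs must share one base path."""
    bases = {base_path(p) for p in paths}
    assert len(bases) == 1, (
        "Conflicting directory structures detected. Suspicious paths\n"
        + "\n".join(sorted(paths)))
```

With the reporter's layout, each leaf such as `s3://bucket/some_path/date_received=2016-01-13/fingerprint=2f6a09d370b4021d` yields both the `date_received` and `fingerprint` columns, and all three leaves reduce to the same `s3://bucket/some_path` base, so no conflict should be reported; the bug is that 1.6 stops recognizing the outer `date_received=...` segment as a partition column.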
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)