[ https://issues.apache.org/jira/browse/SPARK-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-8037:
-----------------------------------

    Assignee: Cheng Lian  (was: Apache Spark)

> Ignores files whose name starts with "." while enumerating files in HadoopFsRelation
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-8037
>                 URL: https://issues.apache.org/jira/browse/SPARK-8037
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Minor
>
> Temporary files like {{.DS_Store}} generated by the Mac OS X Finder can break partition discovery. A directory with the following layout
> {noformat}
> > find parquet_partitioned
> parquet_partitioned
> parquet_partitioned/._common_metadata.crc
> parquet_partitioned/._metadata.crc
> parquet_partitioned/._SUCCESS.crc
> parquet_partitioned/_common_metadata
> parquet_partitioned/_metadata
> parquet_partitioned/_SUCCESS
> parquet_partitioned/year=2014/.DS_Store
> parquet_partitioned/year=2014/month=9
> parquet_partitioned/year=2014/month=9/.DS_Store
> parquet_partitioned/year=2014/month=9/day=1/.DS_Store
> parquet_partitioned/year=2014/month=9/day=1/.part-r-00008.gz.parquet.crc
> parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet
> parquet_partitioned/year=2015
> parquet_partitioned/year=2015/month=10
> parquet_partitioned/year=2015/month=10/day=25
> parquet_partitioned/year=2015/month=10/day=25/.part-r-00002.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=25/.part-r-00004.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet
> parquet_partitioned/year=2015/month=10/day=25/part-r-00004.gz.parquet
> parquet_partitioned/year=2015/month=10/day=26
> parquet_partitioned/year=2015/month=10/day=26/.part-r-00005.gz.parquet.crc
> parquet_partitioned/year=2015/month=10/day=26/part-r-00005.gz.parquet
> parquet_partitioned/year=2015/month=9
> parquet_partitioned/year=2015/month=9/day=1
> parquet_partitioned/year=2015/month=9/day=1/.part-r-00007.gz.parquet.crc
> parquet_partitioned/year=2015/month=9/day=1/part-r-00007.gz.parquet
> {noformat}
> causes an exception like this:
> {noformat}
> scala> val df = sqlContext.read.parquet("parquet_partitioned")
> java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
> ArrayBuffer(year, month)
> ArrayBuffer(year)
> ArrayBuffer(year, month, day)
>         at scala.Predef$.assert(Predef.scala:179)
>         at org.apache.spark.sql.sources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:189)
>         at org.apache.spark.sql.sources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:87)
>         at org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$discoverPartitions(interfaces.scala:492)
>         at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:449)
>         at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:448)
> {noformat}
> This happens because {{.DS_Store}} files are treated as data files. As a result, directories such as {{year=2014}} and {{year=2014/month=9}} are mistaken for leaf partition directories, yielding the partition column lists (year) and (year, month), which conflict with (year, month, day).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
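The idea in the issue title, skipping any file whose name starts with "." before partition columns are derived, can be illustrated with a small self-contained Scala sketch. This is not Spark's actual HadoopFsRelation code: the object and helper names (HiddenFileFilterSketch, isHidden, partitionColumns, columnSets) are hypothetical, paths are plain strings rather than Hadoop FileStatus objects, and the root-level _metadata, _SUCCESS and related .crc files from the listing are left out of the example for brevity.

```scala
// Hypothetical sketch of the fix described in SPARK-8037; not Spark's actual
// HadoopFsRelation code. Paths are plain strings instead of Hadoop FileStatus objects.
object HiddenFileFilterSketch {

  // Leaf files taken from the issue's listing (root-level _metadata, _SUCCESS
  // and their .crc files are omitted here for brevity).
  val exampleFiles: Seq[String] = Seq(
    "parquet_partitioned/year=2014/.DS_Store",
    "parquet_partitioned/year=2014/month=9/.DS_Store",
    "parquet_partitioned/year=2014/month=9/day=1/.DS_Store",
    "parquet_partitioned/year=2014/month=9/day=1/.part-r-00008.gz.parquet.crc",
    "parquet_partitioned/year=2014/month=9/day=1/part-r-00008.gz.parquet",
    "parquet_partitioned/year=2015/month=10/day=25/part-r-00002.gz.parquet"
  )

  // A file counts as hidden if its base name starts with ".".
  def isHidden(path: String): Boolean =
    path.split("/").last.startsWith(".")

  // Parent directory of a file path: "a/b/c.parquet" -> "a/b".
  def parentDir(path: String): String =
    path.substring(0, path.lastIndexOf('/'))

  // Column names encoded in "key=value" path segments, e.g.
  // "t/year=2015/month=9/day=1" -> List("year", "month", "day").
  def partitionColumns(dir: String): List[String] =
    dir.split("/").toList.filter(_.contains("=")).map(_.takeWhile(_ != '='))

  // Distinct column-name lists over the parent directories of all leaf files.
  // In the issue, more than one distinct list is what the
  // "Conflicting partition column names detected" assertion reports.
  def columnSets(files: Seq[String], skipHidden: Boolean): Set[List[String]] =
    files
      .filterNot(f => skipHidden && isHidden(f))
      .map(f => partitionColumns(parentDir(f)))
      .toSet

  def main(args: Array[String]): Unit = {
    println(s"without filter: ${columnSets(exampleFiles, skipHidden = false)}")
    println(s"with filter:    ${columnSets(exampleFiles, skipHidden = true)}")
  }
}
```

Without the filter, the sketch yields three column lists, (year), (year, month) and (year, month, day), the same three reported in the assertion error above; with the filter, only (year, month, day) remains.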