[ https://issues.apache.org/jira/browse/SPARK-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-15743:
-------------------------------------
    Labels: releasenotes  (was: )

> Prevent saving with all-column partitioning
> -------------------------------------------
>
>                 Key: SPARK-15743
>                 URL: https://issues.apache.org/jira/browse/SPARK-15743
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Dongjoon Hyun
>              Labels: releasenotes
>
> When saving datasets to storage, `partitionBy` provides an easy way to
> construct the directory structure (a correct, proper-subset usage is
> sketched after the examples below). However, if a user chooses all columns
> as partition columns, exceptions occur:
> - ORC: `AnalysisException` on **future read**, because schema inference fails.
> - Parquet: `InvalidSchemaException` on **write execution**, due to a Parquet limitation.
> The following examples reproduce both failures.
> **ORC with all-column partitioning**
> {code}
> scala> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")
>
> scala> spark.read.format("orc").load("/tmp/data").collect()
> org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at /tmp/data. It must be specified manually;
> {code}
> **Parquet with all-column partitioning**
> {code}
> scala> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
> [Stage 0:> (0 + 8) / 8]16/06/02 16:51:17 ERROR Utils: Aborting task
> org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
> ... (lots of error messages)
> {code}
> Although some formats such as JSON support all-column partitioning without
> any problem, creating many directories that hold no data columns does not
> seem like a good idea.
> This issue prevents that case by consistently raising `AnalysisException`
> before saving (a sketch of such a guard follows below).
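> The proposed guard can be sketched as a standalone check. This is a minimal
> illustration, not Spark's internal code: `assertNotAllColumnsPartitioned` is
> an invented name, and it throws `IllegalArgumentException` to stay
> independent of Spark internals, whereas the fix described above raises
> `AnalysisException`.
> {code}
> import org.apache.spark.sql.types.StructType
>
> // Refuse the write when the partition columns cover the entire schema,
> // failing fast at save time instead of producing unreadable output.
> def assertNotAllColumnsPartitioned(
>     schema: StructType,
>     partitionColumns: Seq[String]): Unit = {
>   if (partitionColumns.toSet == schema.fieldNames.toSet) {
>     throw new IllegalArgumentException(
>       "Cannot use all columns for partition columns")
>   }
> }
> {code}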
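> For contrast, a minimal sketch of the intended usage: partition on a proper
> subset of the columns, so that every leaf file keeps at least one data
> column. The derived `part` column and the `/tmp/data` path are illustrative.
> {code}
> // spark-shell session: `spark` and its implicits are pre-defined there.
> scala> val df = spark.range(10).withColumn("part", $"id" % 2)
>
> scala> df.write.format("parquet").mode("overwrite").partitionBy("part").save("/tmp/data")
>
> // Resulting layout: /tmp/data/part=0/ and /tmp/data/part=1/, each holding
> // Parquet files that still contain the `id` column.
> scala> spark.read.format("parquet").load("/tmp/data").show()
> {code}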