Hi,

I am having a serious problem saving a DataFrame as a Parquet file.
I read the data from the Parquet file like this:

val df = sparkSqlCtx.parquetFile(inputFile.toString)

and print the schema (note that both fields are required):

root
 |-- time: long (nullable = false)
 |-- time_ymdhms: long (nullable = false)
...omitted...

Now I try to save DataFrame as parquet file like this:

df.saveAsParquetFile(outputFile.toString)
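
For completeness, here is a rough sketch of the whole round-trip (the SQLContext setup is the standard Spark 1.3 form; sc is an existing SparkContext and the paths are placeholders):

```scala
import org.apache.spark.sql.SQLContext

// sc is an existing SparkContext
val sparkSqlCtx = new SQLContext(sc)

// Read the original file; printSchema shows nullable = false for both fields
val df = sparkSqlCtx.parquetFile(inputFile.toString)
df.printSchema()

// Write it back out; this step completes without any error
df.saveAsParquetFile(outputFile.toString)

// Reading outputFile alone works, but its schema now shows nullable = true,
// so reading outputFile together with the original inputFile fails
val df2 = sparkSqlCtx.parquetFile(outputFile.toString)
df2.printSchema()
```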

The code runs normally, but loading the file saved in the previous step
(outputFile) together with the original inputFile fails with this error:

Caused by: parquet.schema.IncompatibleSchemaModificationException:
repetition constraint is more restrictive: can not merge type required int64
time into optional int64 time

The problem is that saveAsParquetFile does not preserve the nullable flags.
When I load the outputFile Parquet file and print its schema, I get this:

root
 |-- time: long (nullable = true)
 |-- time_ymdhms: long (nullable = true)
...omitted...

I use Spark 1.3.0 with Parquet 1.6.0.
Is it possible to preserve these flags somehow, or is this a bug?

Any help will be appreciated.
Thanks in advance!

Petr



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-saveAsParquetFile-does-not-preserve-AVRO-schema-tp24444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
