Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Michael Armbrust
For compatibility reasons, we always write data out as nullable in Parquet. Given that the nullability bit is only an optimization that we don't actually make much use of, I'm curious why you are worried about it changing to true?
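To make the effect concrete, here is a minimal sketch in plain Python. It is not Spark code: `Field`, `as_written_to_parquet`, and `nullability_changes` are hypothetical names that only model the behaviour described in this thread (every field is marked nullable on write), using the schema from the original question.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a schema field: name, type name, nullable flag."""
    name: str
    data_type: str
    nullable: bool


def as_written_to_parquet(schema: List[Field]) -> List[Field]:
    # Models the behaviour described in the thread: all fields become nullable.
    return [f._replace(nullable=True) for f in schema]


def nullability_changes(before: List[Field], after: List[Field]) -> List[str]:
    # Names of fields whose nullable flag differs between the two schemas.
    after_by_name = {f.name: f for f in after}
    return [
        f.name
        for f in before
        if f.name in after_by_name and after_by_name[f.name].nullable != f.nullable
    ]


# Schema from the original question in this thread.
original = [
    Field("type", "StringType", True),
    Field("timestamp", "LongType", False),
]
round_tripped = as_written_to_parquet(original)
changed = nullability_changes(original, round_tripped)
print(changed)  # ['timestamp']
```

A check like this, run against the real schemas before and after the round trip, would show which columns lost their non-nullable flag.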

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Xiao Li
Let me share my 2 cents. First, this is not documented in the official documentation. Maybe we should do it? http://spark.apache.org/docs/latest/sql-programming-guide.html Second, nullability is a significant concept for database people. It is part of the schema. Extra code is needed for evaluating

Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Jerry Lam
Hi Spark users and developers,

I have a dataframe with the following schema (Spark 1.5.1):

StructType(StructField(type,StringType,true), StructField(timestamp,LongType,false))

After I save the dataframe in parquet and read it back, I get the following schema:

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Michael Armbrust
> First, this is not documented in the official document. Maybe we should do
> it? http://spark.apache.org/docs/latest/sql-programming-guide.html

Pull requests welcome.

> Second, nullability is a significant concept in the database people. It is
> part of schema. Extra codes are needed for

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Xiao Li
Sure. Will try to do a pull request this week. Schema evolution is always painful for database people. IMO, NULL was a bad design decision in the original System R. It introduces a lot of problems during system migration and data integration. Let me find a possible scenario: an RDBMS is used as an ODS.

Re: Spark SQL: Preserving Dataframe Schema

2015-10-20 Thread Richard Hillegas
ich...@databricks.com>
> Cc: Jerry Lam <chiling...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
> Date: 10/20/2015 01:18 PM
> Subject: Re: Spark SQL: Preserving Dataframe Schema
>
> Sure. Will try to do a pull request this week.