Fantastic - glad to see that it's in the pipeline!
On Wed, Jan 7, 2015 at 11:27 AM, Michael Armbrust mich...@databricks.com
wrote:
I want to support this but we don't yet. Here is the JIRA:
https://issues.apache.org/jira/browse/SPARK-3851
On Tue, Jan 6, 2015 at 5:23 PM, Adam Gilmore dragoncu...@gmail.com wrote:
Anyone got any further thoughts on this? I saw that the _metadata file seems to
store the schema of every single part (i.e. file) in the Parquet directory,
so in theory it should be possible.
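(For readers finding this thread later: schema merging across part files was eventually implemented under SPARK-3851 and shipped in Spark 1.3. A minimal sketch, assuming a `SQLContext` named `sqlContext` and a hypothetical Parquet directory path:)

```scala
// Assumes Spark 1.3+, where SPARK-3851 added Parquet schema merging.
// The path "/data/events.parquet" is hypothetical.
val merged = sqlContext.read
  .option("mergeSchema", "true") // reconcile differing part-file schemas
  .parquet("/data/events.parquet")

merged.printSchema() // prints the union of all part-file schemas
```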
Effectively, our use case is that we have a stack of JSON that we receive,
and we want to encode it to Parquet.
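(That JSON-to-Parquet pipeline can be sketched as follows against the Spark 1.2-era SchemaRDD API being discussed; `sc` is an existing SparkContext and the paths are made up:)

```scala
// Spark 1.2-era sketch: infer a schema from JSON and persist it as Parquet.
// `sc` is an existing SparkContext; both paths are hypothetical.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val events = sqlContext.jsonFile("/incoming/events.json") // SchemaRDD with inferred schema
events.saveAsParquetFile("/warehouse/events.parquet")
```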
I saw that in the source, which is why I was wondering.
I was mainly reading:
http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/
A query that tries to parse the organizationId and userId from the two
logTypes should be able to do so correctly, even though they are positioned
differently.
Hi all,
I understand that Parquet supports schema versioning automatically in the
format; however, I'm not sure whether Spark supports this.
I'm saving a SchemaRDD to a Parquet file, registering it as a table, then
doing an insertInto with a SchemaRDD that has an extra column.
The second
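(The save/register/insertInto flow described above looks roughly like this. This is a sketch against the Spark 1.2 API; the table name, paths, and columns are all hypothetical, and the comments reflect the limitation the thread is asking about:)

```scala
// Sketch of the flow in the message: write, register, then insert a wider schema.
// On Spark 1.2, before SPARK-3851, the Parquet table keeps its original schema,
// so the extra column from the second SchemaRDD is the point of the question.
val v1 = sqlContext.jsonFile("/incoming/day1.json")   // columns: a, b
v1.saveAsParquetFile("/warehouse/t.parquet")

val table = sqlContext.parquetFile("/warehouse/t.parquet")
table.registerTempTable("t")

val v2 = sqlContext.jsonFile("/incoming/day2.json")   // columns: a, b, c (extra column)
v2.insertInto("t")  // the new column c is not reflected in the table's schema
```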