Hi Matthes,

Can you post an example of your schema? When you refer to nesting, are you 
referring to optional columns, nested schemas, or tables where there are 
repeated values? Parquet uses run-length encoding to compress columns with 
repeated values, which is the case that your example seems to refer to. The 
point Matt is making in his post is that if you have Parquet files which 
contain records with a nested schema, e.g.:

record MyNestedSchema {
  int nestedSchemaField;
}

record MySchema {
  int nonNestedField;
  MyNestedSchema nestedRecord;
}

then not all systems support queries against these schemas. If you want to load 
the data directly into Spark, it isn’t an issue. I’m not familiar with how 
SparkSQL is handling this, but I believe the bit you quoted is saying that 
support for nested queries (e.g., select ... from ... where 
nestedRecord.nestedSchemaField == 0) will be added in Spark 1.0.1 (which is 
currently available, BTW).
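
To make the nested-versus-flat distinction concrete, here is a rough sketch in 
plain Python. It does not use Spark or Parquet at all: the records are just 
dicts shaped like MySchema above, and the flatten helper and the 
underscore-joined column names are my own assumptions for illustration, not 
anything a particular system does. It applies the same predicate once against 
the nested field and once against a flattened copy, which is the shape a 
flat-only system would need.

```python
# Plain-Python stand-ins for MySchema records; no Spark or Parquet involved.
records = [
    {"nonNestedField": 1, "nestedRecord": {"nestedSchemaField": 0}},
    {"nonNestedField": 2, "nestedRecord": {"nestedSchemaField": 7}},
    {"nonNestedField": 3, "nestedRecord": {"nestedSchemaField": 0}},
]


def flatten(record, prefix=""):
    """Hypothetical helper: turn nested dicts into one flat dict.

    Nested keys are joined with "_" (an arbitrary naming choice).
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "_"))
        else:
            flat[name] = value
    return flat


# Nested predicate: ... where nestedRecord.nestedSchemaField == 0
nested_hits = [r for r in records if r["nestedRecord"]["nestedSchemaField"] == 0]

# The same predicate against flattened rows, as a flat-only system would see it.
flat_rows = [flatten(r) for r in records]
flat_hits = [row for row in flat_rows if row["nestedRecord_nestedSchemaField"] == 0]
```

Both filters select the same two records; the only difference is whether the 
query has to reach through nestedRecord or can address a top-level column.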

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 26, 2014, at 7:38 AM, matthes <mdiekst...@sensenetworks.com> wrote:

> Thank you Jey,
> 
> That is a nice introduction, but it may be too old (Aug 21st, 2013).
> 
> "Note: If you keep the schema flat (without nesting), the Parquet files you
> create can be read by systems like Shark and Impala. These systems allow you
> to query Parquet files as tables using SQL-like syntax. The Parquet files
> created by this sample application could easily be queried using Shark for
> example."
> 
> But in this post
> (http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Nested-CaseClass-Parquet-failure-td8377.html)
> I found this: Nested parquet is not supported in 1.0, but is part of the
> upcoming 1.0.1 release.
> 
> So the question now is: can I take advantage of nested Parquet files to
> search quickly with SQL, or do I have to write a special map/reduce job
> to transform and find my data?
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-use-Parquet-with-Dremel-encoding-tp15186p15234.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

