Hi Matthes,

Can you post an example of your schema? When you refer to nesting, are you referring to optional columns, nested schemas, or tables with repeated values? Parquet uses run-length encoding to compress columns with repeated values, which is the case your example seems to refer to.

The point Matt is making in his post is that if you have Parquet files that contain records with a nested schema, e.g.:
    record MyNestedSchema {
      int nestedSchemaField;
    }

    record MySchema {
      int nonNestedField;
      MyNestedSchema nestedRecord;
    }

not all systems support queries against these schemas. If you want to load the data directly into Spark, it isn't an issue. I'm not familiar with how Spark SQL handles this, but I believe the bit you quoted is saying that support for nested queries (e.g., select ... from ... where nestedRecord.nestedSchemaField == 0) will be added in Spark 1.0.1 (which is currently available, BTW).

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 26, 2014, at 7:38 AM, matthes <mdiekst...@sensenetworks.com> wrote:

> Thank you Jey,
>
> That is a nice introduction, but it may be too old (AUG 21ST, 2013):
>
> "Note: If you keep the schema flat (without nesting), the Parquet files you
> create can be read by systems like Shark and Impala. These systems allow you
> to query Parquet files as tables using SQL-like syntax. The Parquet files
> created by this sample application could easily be queried using Shark for
> example."
>
> But in this post
> (http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Nested-CaseClass-Parquet-failure-td8377.html)
> I found this: "Nested parquet is not supported in 1.0, but is part of the
> upcoming 1.0.1 release."
>
> So the question now is: can I benefit from nested Parquet files to query my
> data quickly with SQL, or do I have to write a special map/reduce job to
> transform and find my data?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-use-Parquet-with-Dremel-encoding-tp15186p15234.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
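P.S. To make the nested-query point above concrete, here is a rough sketch of what writing and querying such a nested record could look like with the Spark 1.0-era SQL API. Treat this as illustrative only -- the case class names mirror the example schema above, and the output path is made up:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    // Case classes mirroring the nested example schema
    case class MyNestedSchema(nestedSchemaField: Int)
    case class MySchema(nonNestedField: Int, nestedRecord: MyNestedSchema)

    object NestedParquetExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "nested-parquet-example")
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD

        // Write a few nested records out as Parquet
        val records = sc.parallelize(Seq(
          MySchema(1, MyNestedSchema(0)),
          MySchema(2, MyNestedSchema(42))))
        records.saveAsParquetFile("/tmp/nested.parquet")

        // Read the Parquet file back and query the nested field with SQL
        val parquetTable = sqlContext.parquetFile("/tmp/nested.parquet")
        parquetTable.registerAsTable("myschema")
        val matches = sqlContext.sql(
          "SELECT nonNestedField FROM myschema WHERE nestedRecord.nestedSchemaField = 0")
        matches.collect().foreach(println)
      }
    }

On Spark 1.0.0 the nested predicate in the WHERE clause is what fails; on 1.0.1 it should work.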
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org