Hi again! I'm currently trying to use Parquet, and I want to keep the data in memory in an efficient way so that queries against it run as fast as possible. I've read that Parquet can encode nested columns: it uses the Dremel encoding with definition and repetition levels. Is it currently possible to use this from Spark, or is it not implemented yet? If it is, I'm not sure how to do it. I saw some examples that nest arrays or case classes inside other case classes, but I don't think that is the right way. The other thing I came across in this context was SchemaRDDs.
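To make the layout I'm after concrete, here is a plain-Python sketch (not Spark, and all names are just illustrative) of the regrouping I want: the flat rows get grouped by Col3, with the remaining columns nested underneath each Col3 value.

```python
from collections import defaultdict

# Flat input rows as (col1, col2, col3, col4), taken from the example below.
rows = [
    (14, 1234, 1422, 3),
    (14, 3212, 1542, 2),
    (14, 8910, 1422, 8),
    (15, 1234, 1542, 9),
    (15, 8897, 1422, 13),
]

def nest_by_col3(rows):
    """Group flat rows by col3, nesting (col1, col4, col2) under each group."""
    groups = defaultdict(list)
    for c1, c2, c3, c4 in rows:
        groups[c3].append((c1, c4, c2))
    # One nested record per distinct col3 value, in sorted order.
    return [{"col3": c3, "rows": groups[c3]} for c3 in sorted(groups)]

nested = nest_by_col3(rows)
```

If this is possible in Spark, I imagine it would mean mapping the flat RDD into nested case classes and saving the resulting SchemaRDD as Parquet, but as I said above, I'm not sure that's the right approach.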
Input:

Col1 | Col2 | Col3 | Col4
int  | long | long | int
-----+------+------+-----
 14  | 1234 | 1422 |   3
 14  | 3212 | 1542 |   2
 14  | 8910 | 1422 |   8
 15  | 1234 | 1542 |   9
 15  | 8897 | 1422 |  13

Desired Parquet layout (grouped by Col3):

Col3 | Col1 | Col4 | Col2
long | int  | int  | long
-----+------+------+-----
1422 |  14  |   3  | 1234
  "  |   "  |   8  | 8910
  "  |  15  |  13  | 8897
1542 |  14  |   2  | 3212
  "  |  15  |   9  | 1234

It would be awesome if somebody could give me a good hint on how to do that, or point me to a better way.

Best,
Matthes

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-use-Parquet-with-Dremel-encoding-tp15186.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.