Re: Writing wide parquet file in Spark SQL
This article by Ryan Blue should help you understand the problem: http://ingest.tips/2015/01/31/parquet-row-group-size/

The TL;DR is that you can decrease parquet.block.size to reduce memory consumption. That said, 100K columns is a really heavy burden for Parquet, though I'd guess your data is pretty sparse.

Cheng
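For anyone who wants to try that tweak, here is a minimal sketch against the Spark 1.2-era SchemaRDD API the original post describes. The 16 MB value, the two-column stand-in schema, and the output path are all illustrative, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object WideParquetWrite {
  // Two-column stand-in; the real schema would have ~100K columns.
  case class Rec(a: Int, b: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wide-parquet"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Parquet buffers a whole row group per open file before flushing, and a
    // very wide schema multiplies the per-column write buffers. Shrinking the
    // row group size (default 128 MB) shrinks that in-memory footprint; tune
    // the value against your executor heap.
    sc.hadoopConfiguration.setInt("parquet.block.size", 16 * 1024 * 1024)

    val schemaRDD = sc.parallelize(Seq(Rec(1, 2), Rec(3, 4)))
    schemaRDD.saveAsParquetFile("/tmp/wide-parquet-output")
  }
}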
Re: Writing wide parquet file in Spark SQL
I am also keen to learn the answer to this, but as an alternative you can use Hive to create a table stored as Parquet and then use it from Spark.
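A rough sketch of that workaround through HiveContext, under the same Spark 1.2-era assumptions as above. The table and column names are hypothetical, and a two-column stand-in replaces the real 100K-column DDL:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveParquetWorkaround {
  // Two-column stand-in; the real table would declare ~100K columns.
  case class Rec(a: Int, b: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-parquet"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD

    // Expose the in-memory data to Hive SQL under a temp table name.
    sc.parallelize(Seq(Rec(1, 2), Rec(3, 4))).registerTempTable("wide_source")

    // Create a Parquet-backed Hive table and let Hive's writer produce the
    // output instead of calling saveAsParquetFile directly.
    hiveContext.sql(
      "CREATE TABLE IF NOT EXISTS wide_table (a INT, b INT) STORED AS PARQUET")
    hiveContext.sql("INSERT OVERWRITE TABLE wide_table SELECT * FROM wide_source")
  }
}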
Writing wide parquet file in Spark SQL
Hi All,

I am currently trying to write a very wide file to Parquet using Spark SQL. I have records with 100K columns that I am trying to write out, but of course I am running into memory issues (out of memory: heap space). I was wondering if there are any tweaks or workarounds for this. I am basically calling saveAsParquetFile on the SchemaRDD.
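For context, here is roughly what that write path looks like when the wide schema is built programmatically rather than from a case class (Scala case classes cap out far below 100K fields anyway). The row count, column names, and output path are made up; only the 100000 matches the post:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._

object WideWriteRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wide-write"))
    val sqlContext = new SQLContext(sc)

    // Build the ~100K-column schema programmatically.
    val numCols = 100000
    val schema = StructType(
      (0 until numCols).map(i => StructField(s"c$i", IntegerType, nullable = true)))
    val rowRDD = sc.parallelize(1 to 10).map(_ => Row(Seq.fill(numCols)(0): _*))
    val wide = sqlContext.applySchema(rowRDD, schema)

    // This is the call that hits the heap limit: Parquet keeps per-column
    // write buffers for every open row group, and 100K columns multiplies
    // that cost.
    wide.saveAsParquetFile("/tmp/wide-parquet")
  }
}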