Re: very slow parquet file write

2016-09-16 Thread tosaigan...@gmail.com
Hi,

Try this conf:

    val sc = new SparkContext(conf)
    sc.hadoopConfiguration.setBoolean("parquet.enable.summary-metadata", false)

Regards,
Sai Ganesh

On Thu, Sep 15, 2016 at 11:34 PM, gaurav24 [via Apache Spark User List] <ml-node+s1001560n27738...@n3.nabble.com> wrote:
> Hi Rok,
>
> facing
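
For readers following along in PySpark (Rok's DataFrame is a pyspark one), the equivalent setting looks roughly like the sketch below. It goes through SparkContext._jsc, the internal Java gateway object, since the Hadoop configuration is not exposed as public Python API:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("parquet-write")
    sc = SparkContext(conf=conf)

    # Disable the summary metadata files that Parquet aggregates on the driver
    # at job commit; this aggregation is a common bottleneck for large writes.
    sc._jsc.hadoopConfiguration().setBoolean("parquet.enable.summary-metadata", False)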

Re: very slow parquet file write

2015-11-14 Thread Sabarish Sasidharan
How are you writing it out? Can you post some code?

Regards
Sab

On 14-Nov-2015 5:21 am, "Rok Roskar" wrote:
> I'm not sure what you mean? I didn't do anything specifically to partition
> the columns
> On Nov 14, 2015 00:38, "Davies Liu" wrote:

Re: very slow parquet file write

2015-11-13 Thread Davies Liu
Have you used any partitioned columns when writing as JSON or Parquet?

On Fri, Nov 6, 2015 at 6:53 AM, Rok Roskar wrote:
> yes I was expecting that too because of all the metadata generation and
> compression. But I have not seen performance this bad for other parquet
> files

Re: very slow parquet file write

2015-11-13 Thread Rok Roskar
I'm not sure what you mean? I didn't do anything specifically to partition
the columns

On Nov 14, 2015 00:38, "Davies Liu" wrote:
> Do you have partitioned columns?
>
> On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> > I'm writing a ~100 Gb

Re: very slow parquet file write

2015-11-13 Thread Davies Liu
Do you have partitioned columns?

On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a
> parquet file on HDFS. I've got a few hundred nodes in the cluster, so for
> the size of file this is way
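
For context, the plain write Rok describes, and the partitioned variant Davies is asking about, would look roughly like this (a minimal sketch; the column name and HDFS paths are made up, and df stands in for the ~100 Gb DataFrame):

    # Plain write: one Parquet file per DataFrame partition, no partitioned columns.
    df.write.parquet("hdfs:///user/rok/output.parquet")

    # Partitioned write: one directory per distinct value of the column, which
    # multiplies the file and metadata count and can slow the write dramatically.
    df.write.partitionBy("date").parquet("hdfs:///user/rok/output_by_date.parquet")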

Re: very slow parquet file write

2015-11-06 Thread Cheng Lian
I'd expect writing Parquet files to be slower than writing JSON files, since Parquet involves more complicated encoders, but maybe not that slow. Would you mind trying to profile one Spark executor using tools like YJP to see what the hotspot is?

Cheng

On 11/6/15 7:34 AM, rok wrote:
> Apologies if
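
One way to attach a profiler such as YJP to an executor is through the executor JVM options, set before the context starts. A sketch under the assumption that the YourKit agent library is installed at the same path on every worker node (the path here is hypothetical):

    from pyspark import SparkConf, SparkContext

    # Load the profiler agent into each executor JVM so hotspots can be sampled.
    conf = (SparkConf()
            .setAppName("parquet-write-profiled")
            .set("spark.executor.extraJavaOptions",
                 "-agentpath:/opt/yjp/bin/linux-x86-64/libyjpagent.so"))
    sc = SparkContext(conf=conf)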

Re: very slow parquet file write

2015-11-06 Thread Jörn Franke
Do you use some compression? Maybe some is activated by default in your Hadoop environment?

> On 06 Nov 2015, at 00:34, rok wrote:
>
> Apologies if this appears a second time!
>
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a
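
To rule compression in or out, the Parquet codec can be set explicitly before the write. A sketch for the Spark 1.x-era API, where the codec is controlled by the spark.sql.parquet.compression.codec setting (gzip, a comparatively CPU-heavy codec, was the default at the time, if memory serves):

    # Switch to a cheaper codec (or "uncompressed") to see whether
    # compression dominates the write time.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
    df.write.parquet("hdfs:///user/rok/output_snappy.parquet")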

Re: very slow parquet file write

2015-11-06 Thread Rok Roskar
Yes, I was expecting that too, because of all the metadata generation and compression. But I have not seen performance this bad for other parquet files I’ve written, and was wondering if there could be something obvious (and wrong) to do with how I’ve specified the schema etc. It’s a very simple
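
Since the thread never shows the schema, here is only a generic illustration of what "specifying the schema" usually means in PySpark; the field names and types are hypothetical, and rdd stands in for whatever source the DataFrame was built from:

    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # A flat schema of simple types; nothing about a schema like this should
    # make the Parquet write dramatically slower than the JSON write.
    schema = StructType([
        StructField("id", StringType(), True),
        StructField("value", DoubleType(), True),
    ])
    df = sqlContext.createDataFrame(rdd, schema)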

Re: very slow parquet file write

2015-11-06 Thread Cheng Lian
On 11/6/15 10:53 PM, Rok Roskar wrote:
> yes I was expecting that too because of all the metadata generation and
> compression. But I have not seen performance this bad for other parquet
> files I’ve written and was wondering if there could be something obvious
> (and wrong) to do with how I’ve