Re: very slow parquet file write

2016-09-16 Thread tosaigan...@gmail.com
Added to the discussion below: http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-tp25295p27738.html

Re: very slow parquet file write

2015-11-14 Thread Sabarish Sasidharan
How are you writing it out? Can you post some code?

Regards
Sab

On 14-Nov-2015 5:21 am, "Rok Roskar" wrote:
> I'm not sure what you mean? I didn't do anything specifically to partition the columns
>
> On Nov 14, 2015 00:38, "Davies Liu" wrote:

Re: very slow parquet file write

2015-11-13 Thread Davies Liu
…ion.

> I'll see what I can do about running a profiler -- can you point me to a resource/example?
>
> Thanks,
>
> Rok
>
> ps: my post on the mailing list is still listed as not accepted by the mailing list:
> http://apache-spark-user-list.1001560.n3.nabbl…

Re: very slow parquet file write

2015-11-13 Thread Rok Roskar
I'm not sure what you mean? I didn't do anything specifically to partition the columns

On Nov 14, 2015 00:38, "Davies Liu" wrote:
> Do you have partitioned columns?
>
> On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> > I'm writing a ~100 Gb…

Re: very slow parquet file write

2015-11-13 Thread Davies Liu
Do you have partitioned columns?

On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a parquet file on HDFS. I've got a few hundred nodes in the cluster, so for the size of file this is way…
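Davies' question matters because a write partitioned by a column (`df.write.partitionBy(col).parquet(path)` in PySpark) can produce up to one output file per task per distinct partition value, and thousands of tiny files make a parquet write crawl. A toy sketch of that blow-up, with purely illustrative numbers and a hypothetical helper name:

```python
def worst_case_file_count(num_tasks, distinct_partition_values):
    """Upper bound on files written by a partitioned parquet write:
    in the worst case, every task holds rows for every partition value."""
    return num_tasks * distinct_partition_values

# A few hundred tasks times a high-cardinality partition column
# explodes into hundreds of thousands of small files.
print(worst_case_file_count(400, 1000))  # 400000
```

This is why "did you partition the columns?" is one of the first things to rule out when a write is unexpectedly slow.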

Re: very slow parquet file write

2015-11-06 Thread Cheng Lian
http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-tp25295.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: very slow parquet file write

2015-11-06 Thread Jörn Franke
> …cessors are fully loaded while the disk I/O is essentially zero for long periods of time. I don't see any obvious garbage collection issues and there are no problems with memory.
>
> Any ideas on how to debug/fix this?
>
> Thanks!

Re: very slow parquet file write

2015-11-06 Thread Rok Roskar
…by the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-td25295.html -- none of your responses are there either. I am definitely subscribed to the list though (I get daily digests). Any clue how to fix it?

On Nov 6, 2015, at 9:26 AM, Cheng Lian…

Re: very slow parquet file write

2015-11-06 Thread Cheng Lian
…: https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit

> Thanks,
> Rok
>
> ps: my post on the mailing list is still listed as not accepted by the mailing list:
> http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-td25295.html -- none…
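The linked wiki page boils down to attaching the YourKit profiler agent to the driver and executor JVMs so you can see where the CPU time goes. A hedged sketch of what that typically looks like in `spark-defaults.conf` — the agent path and the `sampling` startup option are assumptions here, not from the thread; check the wiki page and your YourKit install for the exact values:

```
# spark-defaults.conf -- agent path is hypothetical, adjust to your install
spark.executor.extraJavaOptions  -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=sampling
spark.driver.extraJavaOptions    -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=sampling
```

With sampling enabled on the executors, a "CPUs pegged, disk idle" write like the one described in this thread usually shows up clearly as time spent in serialization, compression, or parquet encoding.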

very slow parquet file write

2015-11-05 Thread Rok Roskar
I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a parquet file on HDFS. I've got a few hundred nodes in the cluster, so for the size of file this is way over-provisioned (I've tried it with fewer partitions and fewer nodes, no obvious effect). I was expecting the dump to…
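For a write like this, each DataFrame partition becomes one parquet output file, so partition count directly controls file sizes: too few gives huge slow files, too many gives a flood of tiny ones. A minimal sketch of the sizing arithmetic — the helper name and the 128 MB target (a common HDFS block size) are illustrative assumptions, not from the thread:

```python
def target_partition_count(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Rough partition count so each parquet output file lands near the target size."""
    # Ceiling division, so the last partition is never oversized.
    return max(1, -(-total_bytes // target_file_bytes))

# ~100 GB written as ~128 MB files -> 800 partitions.
print(target_partition_count(100 * 1024**3))  # 800
```

In PySpark a count like this would feed `df.repartition(n).write.parquet(path)`; whether it helps in a case like this one depends on where the time is actually going, which is what the profiling suggestions later in the thread are about.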

very slow parquet file write

2015-11-05 Thread rok
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-tp25295.html