How are you writing it out? Can you post some code?
Regards
Sab
On 14-Nov-2015 5:21 am, "Rok Roskar" wrote:
> I'm not sure what you mean? I didn't do anything specifically to partition
> the columns.
>
> On Nov 14, 2015 00:38, "Davies Liu" wrote:
>> Do you have partitioned columns?
>
> I'll see what I can do about running a profiler -- can you point me to a
> resource/example?
>
> Thanks,
>
> Rok
>
> ps: my post on the mailing list is still listed as not accepted by the
> mailing list:
> http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-td25295.html
Do you have partitioned columns?

On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a
> parquet file on HDFS. I've got a few hundred nodes in the cluster, so for
> the size of file this is way over-provisioned (I've tried it with fewer
> partitions and fewer nodes, no obvious effect). I was expecting the dump to
> [...] The processors are fully loaded while the disk I/O is essentially
> zero for long periods of time. I don't see any obvious garbage collection
> issues and there are no problems with memory.
>
> Any ideas on how to debug/fix this?
>
> Thanks!

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-tp25295.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
On Nov 6, 2015, at 9:26 AM, Cheng Lian wrote:
https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit

Thanks,
Rok

ps: my post on the mailing list is still listed as not accepted by the
mailing list:
http://apache-spark-user-list.1001560.n3.nabble.com/very-slow-parquet-file-write-td25295.html
-- none of your responses are there either. I am definitely subscribed to
the list though (I get daily digests). Any clue how to fix it?
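(The YourKit recipe on that wiki page boils down to pointing the executor JVMs at the profiler agent. Roughly, as a sketch only -- the agent path and job script name below are placeholders, and the agent library must already be installed on the worker nodes:)

```shell
# Attach the YourKit sampling agent to every executor JVM.
# /opt/yourkit/... is a placeholder for wherever the agent is installed.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so=sampling" \
  my_parquet_job.py
```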
I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into
a parquet file on HDFS. I've got a few hundred nodes in the cluster, so for
the size of file this is way over-provisioned (I've tried it with fewer
partitions and fewer nodes, no obvious effect). I was expecting the dump to
[...] The processors are fully loaded while the disk I/O is essentially
zero for long periods of time. I don't see any obvious garbage collection
issues and there are no problems with memory.

Any ideas on how to debug/fix this?

Thanks!