Best Regards,
Raymond Liu
-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Wednesday, April 30, 2014 1:22 PM
To: user@spark.apache.org
Subject: Re: How fast would you expect shuffle serialize to be?
Hm - I'm still not sure if you mean 100MB/s for each task, or 100MB/s in total across all the tasks.
Hi
I am running a WordCount program which counts words from HDFS, and I
noticed that the serializer part of the code takes a lot of CPU time. On a
16-core/32-thread node, the total throughput is around 50MB/s with JavaSerializer,
and if I switch to KryoSerializer, it roughly doubles to around 100MB/s.
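For reference, the JavaSerializer/KryoSerializer switch discussed above is controlled by the spark.serializer setting; a minimal spark-defaults.conf sketch (property name per the Spark configuration documentation):

```
spark.serializer  org.apache.spark.serializer.KryoSerializer
```

The same value can also be set on a SparkConf before creating the context.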
Is this the serialization throughput per task or the serialization
throughput for all the tasks?
On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond raymond@intel.com wrote:
For all the tasks, say 32 tasks in total.
Best Regards,
Raymond Liu
By the way, to be clear: I run repartition first to make all the data go through
the shuffle, instead of running reduceByKey etc. directly (which would reduce the data
that needs to be shuffled and serialized), so all of the 50MB/s of data from HDFS goes
to the serializer. (In fact, I also tried generating the data in memory.)
In the latter case, the throughput is the total aggregated from all cores.
Best Regards,
Raymond Liu
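For anyone who wants a rough point of comparison outside Spark, here is a minimal stand-alone sketch that times plain java.io object serialization on a WordCount-like list of strings (the workload, sizes, and class name are assumptions for illustration; absolute MB/s will vary with JVM and hardware):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Single-thread micro-benchmark of java.io serialization throughput,
// one way to sanity-check the MB/s figures discussed in this thread.
public class SerThroughput {
    public static void main(String[] args) throws IOException {
        // Repetitive short strings, loosely resembling a WordCount input.
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            words.add("word" + (i % 1000));
        }
        long start = System.nanoTime();
        long bytes = 0;
        for (int round = 0; round < 10; round++) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(words);
            }
            bytes += bos.size(); // serialized size of this round
        }
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("java.io serialization throughput: %.1f MB/s%n",
                bytes / 1e6 / secs);
    }
}
```

Multiplying a per-core number like this by the number of concurrently running tasks gives a ballpark for the aggregate figure being discussed.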