RE: How fast would you expect shuffle serialize to be?

2014-04-30 Thread Liu, Raymond
I just tried using the serializer to write objects directly in local mode, with the following code:

import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}
import org.apache.spark.SparkEnv

val datasize = args(1).toInt
// (String, Int) pairs, mirroring the WordCount shuffle records
val dataset = (0 until datasize).map(i => ("asmallstring", i))

// Buffered file output: one core writing to one disk
val out: OutputStream =
  new BufferedOutputStream(new FileOutputStream(args(2)), 1024 * 100)

val ser = SparkEnv.get.serializer.newInstance()
val serOut = ser.serializeStream(out)

dataset.foreach(value => serOut.writeObject(value))
serOut.flush()
serOut.close()

Thus: one core writing to one disk. With JavaSerializer the throughput is 10-12MB/s, 
and Kryo roughly doubles that. So it seems to me that, when running the full code path 
as in my previous case, 32 cores with 50MB/s total throughput is reasonable?
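
For reference, below is a self-contained sketch of the same single-core test with 
timing added, so the MB/s figures can be reproduced. The harness details (app name, 
local[1] master, output path, data size, and the MB/s arithmetic) are illustrative 
assumptions rather than part of the original test; spark.serializer is the standard 
setting for switching between JavaSerializer and KryoSerializer.

import java.io.{BufferedOutputStream, File, FileOutputStream}
import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

// Run on a single core so one thread does all the writing.
val conf = new SparkConf()
  .setAppName("serialize-bench")
  .setMaster("local[1]")
  // Flip between the default JavaSerializer and Kryo to compare.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

val dataset = (0 until 1000000).map(i => ("asmallstring", i))
val out = new BufferedOutputStream(
  new FileOutputStream("/tmp/ser-bench"), 1024 * 100)
val serOut = SparkEnv.get.serializer.newInstance().serializeStream(out)

val start = System.nanoTime()
dataset.foreach(value => serOut.writeObject(value))
serOut.flush()
serOut.close()
val secs = (System.nanoTime() - start) / 1e9

// Report throughput over the serialized on-disk size.
val mb = new File("/tmp/ser-bench").length() / (1024.0 * 1024.0)
println(f"$mb%.1f MB in $secs%.2f s = ${mb / secs}%.1f MB/s")
sc.stop()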


Best Regards,
Raymond Liu


-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com] 


The latter case: total throughput aggregated across all cores.

Best Regards,
Raymond Liu


-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Wednesday, April 30, 2014 1:22 PM
To: user@spark.apache.org
Subject: Re: How fast would you expect shuffle serialize to be?

Hm - I'm still not sure if you mean
100MB/s for each task = 3200MB/s across all cores
-or-
3.1MB/s for each task = 100MB/s across all cores

If it's the second one, that's really slow and something is wrong. If it's the 
first one, this is in the range of what I'd expect, but I'm no expert.

On Tue, Apr 29, 2014 at 10:14 PM, Liu, Raymond raymond@intel.com wrote:
 For all the tasks, say 32 tasks in total

 Best Regards,
 Raymond Liu


 -Original Message-
 From: Patrick Wendell [mailto:pwend...@gmail.com]

 Is this the serialization throughput per task or the serialization throughput 
 for all the tasks?

 On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond raymond@intel.com wrote:
 Hi

 I am running a WordCount program which counts words from HDFS, 
 and I noticed that the serializer part of the code takes a lot of CPU 
 time. On a 16-core/32-thread node, the total throughput is around 
 50MB/s with JavaSerializer, and if I switch to KryoSerializer it 
 doubles to around 100-150MB/s. (I have 12 disks per node and the files 
 are scattered across the disks, so HDFS bandwidth is not a problem.)

 I also notice that, in this case, the object written is 
 (String, Int); if I try a case with (Int, Int), the throughput is a 
 further 2-3x faster.

 So, in my WordCount case, the bottleneck is CPU (because with 
 shuffle compression on, the 150MB/s of bandwidth on the input side 
 usually leads to only around 50MB/s of shuffle data).

 This serialization bandwidth looks too low to me, so I am wondering: 
 what bandwidth do you observe in your case? Does this throughput sound 
 reasonable to you? If not, what might need to be examined in my setup?



 Best Regards,
 Raymond Liu




Re: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Patrick Wendell
Is this the serialization throughput per task or the serialization
throughput for all the tasks?



RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
For all the tasks, say 32 tasks in total

Best Regards,
Raymond Liu




RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
By the way, to be clear: I run repartition first so that all the data goes through 
the shuffle, instead of running reduceByKey etc. directly (which would cut down the 
amount of data to be shuffled and serialized). Thus all 50MB/s of data from HDFS goes 
to the serializer. (In fact, I also tried generating the data directly in memory 
instead of reading it from HDFS, with similar throughput results.) A minimal sketch 
of the difference is below.
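
A hedged sketch of the two job shapes, for contrast; the input path, app name, and 
partition count are placeholders rather than the original job, and the pair-RDD 
import matches the Spark 1.x API.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x

val sc = new SparkContext(new SparkConf().setAppName("wc-shuffle-test"))
val pairs = sc.textFile("hdfs:///wc-input")
  .flatMap(_.split(" "))
  .map(word => (word, 1))

// Direct reduceByKey: map-side combining shrinks the data before it is
// shuffled, so much less goes through the serializer.
val direct = pairs.reduceByKey(_ + _)

// Repartition first: every (word, 1) pair is serialized and shuffled at
// full input volume, which is what the serializer throughput measures.
val allThroughShuffle = pairs.repartition(32).reduceByKey(_ + _)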

Best Regards,
Raymond Liu




RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
The latter case: total throughput aggregated across all cores.

Best Regards,
Raymond Liu

