Re: use netty shuffle for network cause high gc time

2015-01-14 Thread lihu
I am using Spark 1.1.

On Wed, Jan 14, 2015 at 2:24 PM, Aaron Davidson ilike...@gmail.com wrote:

 What version are you running? I think spark.shuffle.use.netty was a
 valid option only in Spark 1.1, where the Netty stuff was strictly
 experimental. Spark 1.2 contains an officially supported and much more
 thoroughly tested version under the property
 spark.shuffle.blockTransferService, which is set to netty by default.

 On Tue, Jan 13, 2015 at 9:26 PM, lihu lihu...@gmail.com wrote:

 Hi,
  I just tested the groupByKey method on 100GB of data. The cluster has 20
 machines, each with 125GB of RAM.

 First I set conf.set("spark.shuffle.use.netty", "false") and ran the
 experiment, then I set conf.set("spark.shuffle.use.netty", "true") and
 re-ran it. In the latter case, the GC time was much higher.

 I expected the latter to perform better, but it did not. So when should
 we use netty for network shuffle fetching?






-- 
*Best Wishes!*

*Li Hu(李浒) | Graduate Student*

*Institute for Interdisciplinary Information Sciences (IIIS, http://iiis.tsinghua.edu.cn/)*
*Tsinghua University, China*

*Email: lihu...@gmail.com*
*Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/*


use netty shuffle for network cause high gc time

2015-01-13 Thread lihu
Hi,
 I just tested the groupByKey method on 100GB of data. The cluster has 20
machines, each with 125GB of RAM.

First I set conf.set("spark.shuffle.use.netty", "false") and ran the
experiment, then I set conf.set("spark.shuffle.use.netty", "true") and
re-ran it. In the latter case, the GC time was much higher.

I expected the latter to perform better, but it did not. So when should
we use netty for network shuffle fetching?


Re: use netty shuffle for network cause high gc time

2015-01-13 Thread Andrew Ash
To confirm, lihu, are you using Spark version 1.2.0?


Re: use netty shuffle for network cause high gc time

2015-01-13 Thread Aaron Davidson
What version are you running? I think spark.shuffle.use.netty was a valid
option only in Spark 1.1, where the Netty stuff was strictly experimental.
Spark 1.2 contains an officially supported and much more thoroughly tested
version under the property spark.shuffle.blockTransferService, which is
set to netty by default.
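
For reference, here is a rough sketch of how the two settings mentioned above might be written in Scala. It assumes spark-core is on the classpath; the application name is made up for illustration, and the "nio" value is the alternative transfer service as I recall it from the Spark 1.2 configuration docs, so please check it against the docs for your version.

```scala
import org.apache.spark.SparkConf

// Spark 1.1: the experimental Netty shuffle path is switched on or off
// with spark.shuffle.use.netty. SparkConf.set takes string values.
val conf11 = new SparkConf()
  .setAppName("groupByKey-shuffle-test") // hypothetical application name
  .set("spark.shuffle.use.netty", "true")

// Spark 1.2: the transfer service is chosen with
// spark.shuffle.blockTransferService. "netty" is the default, so this
// property only needs to be set to move away from it.
// Assumption: "nio" selects the older, non-Netty implementation.
val conf12 = new SparkConf()
  .setAppName("groupByKey-shuffle-test") // hypothetical application name
  .set("spark.shuffle.blockTransferService", "nio")
```

Only one of the two applies to a given cluster: on Spark 1.2 the old spark.shuffle.use.netty key should no longer have any effect, if it was indeed valid only in 1.1.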
