Question about executor memory setting

2016-09-27 Thread Dogtail L
Hi all, may I ask a question about executor memory settings? I was running PageRank with a 2.8GB input on one workstation for testing, and I gave PageRank a single executor. In case 1, I set --executor-cores to 4 and --executor-memory to 1GB; the stage (stage 2) completion time was 14 min, …
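For reference, both settings are ordinary spark-submit flags. A minimal sketch of the case-1 configuration, assuming a hypothetical example jar, input path, and YARN as the cluster manager (none of which appear in the thread):

# Case 1 from the thread: one executor with 4 cores and 1GB of heap.
# Jar, class, input path, and iteration count are placeholders.
spark-submit \
  --class org.apache.spark.examples.SparkPageRank \
  --num-executors 1 \
  --executor-cores 4 \
  --executor-memory 1g \
  examples.jar hdfs:///input/pagerank 10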

How to decide the number of tasks in Spark?

2016-04-18 Thread Dogtail L
Hi, when launching a job in Spark, I have great trouble deciding on the number of tasks. Some say it is better to create one task per HDFS block, i.e., to make sure each task processes 128MB of input data; others suggest that the number of tasks should be twice the total number of cores available to …
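Both heuristics from the thread reduce to a single knob. A minimal sketch of the "twice the total cores" rule, assuming a hypothetical 8-core setup (so 16 tasks); the trailing arguments are placeholders:

# "2x total cores" heuristic: 8 cores assumed purely for illustration.
spark-submit \
  --conf spark.default.parallelism=16 \
  <jar and application arguments as usual>

Note that the one-task-per-HDFS-block behaviour is what sc.textFile gives by default (one partition per input split), so spark.default.parallelism mainly governs the shuffle side.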

Question about Spark shuffle read size

2015-11-04 Thread Dogtail L
Hi all, when I run WordCount using Spark, I find that when I set "spark.default.parallelism" to different values, the Shuffle Write and Shuffle Read sizes change as well (I read these figures from the history server's web UI). Is it because the shuffle write size also includes some metadata …
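For anyone reproducing the measurement: the Shuffle Read/Write columns come from the history server, which only has data when event logging is enabled. A sketch with example values (the log directory and parallelism are placeholders, not from the thread):

# Event logging must be on for the history server to show shuffle metrics.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs:///spark-logs \
  --conf spark.default.parallelism=32 \
  <jar and application arguments as usual>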

Re: How to compile Spark with customized Hadoop?

2015-10-14 Thread Dogtail L
…build (see http://spark.apache.org/docs/latest/building-spark.html).

Matei

On Oct 9, 2015, at 3:10 PM, Dogtail L <spark.ru...@gmail.com> wrote:
> Hi all, I have modified the Hadoop source code, and I want to compile Spark against my modified Hadoop. Do you know how to do that? Many thanks!

How to compile Spark with customized Hadoop?

2015-10-09 Thread Dogtail L
Hi all, I have modified the Hadoop source code, and I want to compile Spark against my modified Hadoop. Do you know how to do that? Many thanks!
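One workable recipe, as a sketch rather than the thread's own answer (version strings and profile names are examples; see the building-spark.html page linked in the reply above for the options current at the time): give the modified Hadoop a distinct version in its poms, install it to the local Maven repository, then point Spark's build at that version.

# Step 1: install the modified Hadoop into the local Maven repo.
# Assumes its poms carry a distinct version such as 2.7.1-custom.
cd /path/to/hadoop
mvn install -DskipTests

# Step 2: build Spark against that version. The profile names below
# are those of the Spark 1.5-era build and may differ in later releases.
cd /path/to/spark
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1-custom -DskipTests clean package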