Re: Problems with GC and time to execute with different number of executors.

2015-02-06 Thread Sandy Ryza
That's definitely surprising to me, that you would be hitting a lot of GC for this scenario. Are you setting --executor-cores and --executor-memory? What are you setting them to?

-Sandy

On Thu, Feb 5, 2015 at 10:17 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
> Any idea why if I use more
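For reference, the two flags Sandy asks about are passed to spark-submit as shown below. The class name and jar come from Guillermo's later message in this thread; the `--executor-cores` and `--executor-memory` values are illustrative placeholders, not a recommendation:

```shell
# Hypothetical invocation illustrating the flags in question
spark-submit \
  --master yarn-client \
  --num-executors 40 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class com.mycompany.app.App \
  Example-1.0-SNAPSHOT.jar
```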

Re: Problems with GC and time to execute with different number of executors.

2015-02-06 Thread Guillermo Ortiz
This is an execution with 80 executors:

Metric     Min       25th pctile  Median    75th pctile  Max
Duration   31 s      44 s         50 s      1.1 min      2.6 min
GC Time    70 ms     0.1 s        0.3 s     4 s          53 s
Input      128.0 MB  128.0 MB     128.0 MB  128.0 MB     128.0 MB

I executed as well with 40 executors:

Metric     Min       25th pctile  Median    75th pctile  Max
Duration
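As a sanity check on these numbers: assuming the 80 GB input file from the first message and the 128 MB block size shown in the Input row, the task counts work out as below. This is back-of-the-envelope arithmetic, not anything reported in the thread, and "waves" assumes one core per executor:

```java
public class TaskCount {
    public static void main(String[] args) {
        long fileMb = 80L * 1024;  // 80 GB input file, from the first message
        long blockMb = 128;        // HDFS block size, matching the Input column
        long tasks = fileMb / blockMb;
        System.out.println("tasks: " + tasks);                        // 640
        System.out.println("waves with 80 executors: " + tasks / 80); // 8
        System.out.println("waves with 40 executors: " + tasks / 40); // 16
    }
}
```

So halving the executor count doubles the number of task waves, but each 128 MB task should do the same amount of work either way, which is why the per-task GC difference is surprising.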

Re: Problems with GC and time to execute with different number of executors.

2015-02-06 Thread Guillermo Ortiz
Yes, it's surprising to me as well. I tried to execute it with different configurations:

sudo -u hdfs spark-submit --master yarn-client --class com.mycompany.app.App \
  --num-executors 40 --executor-memory 4g \
  Example-1.0-SNAPSHOT.jar hdfs://ip:8020/tmp/sparkTest/ file22.bin parameters

This is
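One way to dig into the GC behavior is to enable detailed GC logging on the executors via `spark.executor.extraJavaOptions`. The sketch below reuses the command from this message; the JVM flags shown are the standard JDK 7/8-era GC-logging options, and whether they suit this job is an assumption:

```shell
# Hypothetical variant of the command above with executor GC logging enabled
sudo -u hdfs spark-submit --master yarn-client \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --num-executors 40 --executor-memory 4g \
  --class com.mycompany.app.App \
  Example-1.0-SNAPSHOT.jar hdfs://ip:8020/tmp/sparkTest/ file22.bin parameters
```

The resulting GC logs appear in each executor's stdout, reachable from the YARN container logs.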

Re: Problems with GC and time to execute with different number of executors.

2015-02-06 Thread Sandy Ryza
Yes, having many more cores than disks, all writing at the same time, can definitely cause performance issues, though that wouldn't explain the high GC. What percent of task time does the web UI report that tasks are spending in GC?

On Fri, Feb 6, 2015 at 12:56 AM, Guillermo Ortiz
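The percentage Sandy asks about can be roughly estimated from the worst-case numbers in the 80-executor table posted earlier in the thread (2.6 min duration, 53 s GC time). Treat this as an illustration only, since the web UI reports the real per-task figure:

```java
public class GcFraction {
    public static void main(String[] args) {
        double worstDurationSec = 2.6 * 60;  // worst Duration from the 80-executor run
        double worstGcSec = 53.0;            // worst GC Time from the same run
        double pct = 100.0 * worstGcSec / worstDurationSec;
        // Roughly a third of the slowest task's time goes to GC
        System.out.printf("worst task spent ~%.0f%% of its time in GC%n", pct);
    }
}
```

Anything above roughly 10% of task time in GC is usually considered worth investigating.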

Re: Problems with GC and time to execute with different number of executors.

2015-02-05 Thread Guillermo Ortiz
I'm not caching the data. By "each iteration" I mean each 128 MB block that an executor has to process. The code is pretty simple:

final Conversor c = new Conversor(null, null, null, longFields, typeFields);
SparkConf conf = new SparkConf().setAppName("Simple Application");
JavaSparkContext sc = new

Re: Problems with GC and time to execute with different number of executors.

2015-02-05 Thread Guillermo Ortiz
Any idea why, if I use more containers, I get a lot of tasks stopped because of GC?

2015-02-05 8:59 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
> I'm not caching the data. By "each iteration" I mean each 128 MB block that
> an executor has to process. The code is pretty simple.
> final Conversor c = new

Problems with GC and time to execute with different number of executors.

2015-02-04 Thread Guillermo Ortiz
I execute a job in Spark where I'm processing an 80 GB file in HDFS. I have 5 slaves: (32 cores / 256 GB / 7 physical disks) x 5. I have been trying many different configurations with YARN:

yarn.nodemanager.resource.memory-mb   196 GB
yarn.nodemanager.resource.cpu-vcores  24

I have tried to execute the
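From the cluster specs above, the executors-per-disk ratio (the concern Sandy raises elsewhere in the thread) can be worked out for the 80-executor run. The even spread of executors across the 5 nodes is an assumption:

```java
public class ClusterMath {
    public static void main(String[] args) {
        int nodes = 5;
        int vcoresPerNode = 24;  // yarn.nodemanager.resource.cpu-vcores
        int disksPerNode = 7;
        int executors = 80;      // largest run discussed in the thread

        System.out.println("total vcores: " + nodes * vcoresPerNode);  // 120
        int perNode = executors / nodes;  // assuming YARN spreads them evenly
        System.out.println("executors per node: " + perNode);          // 16
        // More than 2 concurrent writers per physical disk
        System.out.printf("executors per disk: %.2f%n", (double) perNode / disksPerNode);
    }
}
```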