That's definitely surprising to me that you would be hitting a lot of GC
for this scenario. Are you setting --executor-cores and
--executor-memory? What are you setting them to?
-Sandy
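For reference, both settings go on the spark-submit command line. A sketch, reusing the class and jar names that appear later in this thread; the core and memory values here are placeholders, not figures from the thread:

```shell
# Hypothetical sizing: 4 GB heap and 2 cores per executor.
# The flag names are standard spark-submit options; the values are examples.
spark-submit \
  --master yarn-client \
  --num-executors 40 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class com.mycompany.app.App \
  Example-1.0-SNAPSHOT.jar
```

If --executor-cores is omitted it defaults to 1 on YARN, which is worth stating explicitly when debugging sizing problems.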
On Thu, Feb 5, 2015 at 10:17 AM, Guillermo Ortiz konstt2...@gmail.com
wrote:
Any idea why, if I use more containers, I get a lot of stops because of GC?
This is an execution with 80 executors
Metric     Min      25th pct  Median   75th pct  Max
Duration   31 s     44 s      50 s     1.1 min   2.6 min
GC Time    70 ms    0.1 s     0.3 s    4 s       53 s
Input      128.0 MB 128.0 MB  128.0 MB 128.0 MB  128.0 MB
I executed as well with 40 executors
Metric     Min      25th pct  Median   75th pct  Max
Duration
Yes, it's surprising to me as well.
I tried to execute it with different configurations,
sudo -u hdfs spark-submit --master yarn-client --class
com.mycompany.app.App --num-executors 40 --executor-memory 4g
Example-1.0-SNAPSHOT.jar hdfs://ip:8020/tmp/sparkTest/ file22.bin
parameters
This is
Yes, having many more cores than disks and all writing at the same time can
definitely cause performance issues. Though that wouldn't explain the high
GC. What percent of task time does the web UI report that tasks are
spending in GC?
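From the 80-executor table above one can back out the GC share Sandy is asking about. A quick sketch (the class and helper are mine; the figures are the ones reported in the table):

```java
// Hypothetical helper (not from the thread) to estimate the GC share
// the Spark web UI reports, using the figures from the 80-executor run.
public class GcFraction {
    // GC time as a percentage of task duration
    static double gcShare(double gcSec, double taskSec) {
        return 100.0 * gcSec / taskSec;
    }

    public static void main(String[] args) {
        // Median task: 50 s duration with 0.3 s of GC
        System.out.printf("median GC share: %.1f%%%n", gcShare(0.3, 50.0));
        // Slowest task: 2.6 min (156 s) duration with 53 s of GC
        System.out.printf("worst-case GC share: %.1f%%%n", gcShare(53.0, 156.0));
    }
}
```

So the median task spends well under 1% of its time in GC, but the slowest tasks spend roughly a third of their time collecting, which matches the long tail in the Duration row.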
On Fri, Feb 6, 2015 at 12:56 AM, Guillermo Ortiz
I'm not caching the data. By each iteration I mean each 128 MB block that an executor has to process.
The code is pretty simple.
final Conversor c = new Conversor(null, null, null, longFields, typeFields);
SparkConf conf = new SparkConf().setAppName("Simple Application");
JavaSparkContext sc = new JavaSparkContext(conf);
Any idea why, if I use more containers, I get a lot of stops because of GC?
2015-02-05 8:59 GMT+01:00 Guillermo Ortiz konstt2...@gmail.com:
I execute a job in Spark where I'm processing an 80 GB file in HDFS.
I have 5 slaves:
(32 cores / 256 GB RAM / 7 physical disks) x 5
I have been trying many different configurations with YARN.
yarn.nodemanager.resource.memory-mb: 196 GB
yarn.nodemanager.resource.cpu-vcores: 24
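With those limits and a 4 GB executor heap, vcores rather than memory is the binding constraint on these nodes. A back-of-the-envelope sketch; the class is mine, the executor-cores value and the 384 MB YARN memory overhead are assumptions, not figures from the thread:

```java
// Rough, hypothetical sizing arithmetic: how many executors fit per
// NodeManager given the YARN limits quoted above.
public class ExecutorSizing {
    // Executors that fit on one node, bounded by memory and vcores.
    static int executorsPerNode(double nodeMemGb, int nodeVcores,
                                double execHeapGb, double overheadGb,
                                int execCores) {
        int byMemory = (int) (nodeMemGb / (execHeapGb + overheadGb));
        int byCores = nodeVcores / execCores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // 196 GB and 24 vcores per node, 4 GB executor heap, 1 core per
        // executor (the YARN default), and an assumed 384 MB per-executor
        // memory overhead on top of the heap.
        int perNode = executorsPerNode(196.0, 24, 4.0, 0.384, 1);
        System.out.println("executors per node: " + perNode);
        System.out.println("across 5 nodes: " + perNode * 5);
    }
}
```

Under these assumptions the cores run out long before the memory does, so packing more executors per node mostly adds heap pressure per disk rather than useful parallelism.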
I have tried to execute the