Hi Marko,

Thanks for your enthusiastic and useful report! We had similar 
experiences over here. SparkGraphComputer seems to prefer small chunks of 
data of about 128 MB, even if you have 8 or 16 GB in your executors.

In addition, when running Spark on YARN, you need a high 
spark.yarn.executor.memoryOverhead 
value of about 20%, while the Spark-on-YARN reference mentions 6-10%: 
https://spark.apache.org/docs/1.5.2/running-on-yarn.html
Otherwise, the executor starves when YARN is set to police queues.
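
For illustration, a sketch of how we set this (assuming Spark 1.5 on YARN with 8 GB executors; the jar name is just a placeholder, and the overhead value is given in MB):

```shell
# Sketch only: spark.yarn.executor.memoryOverhead is specified in MB in Spark 1.x.
# With 8 GB executors, ~20% overhead is about 1600 MB, versus the 1.5.2
# default of max(384, 0.10 * executorMemory), i.e. roughly 6-10%.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1600 \
  my-gremlin-job.jar   # placeholder for your actual job
```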
I am sorry I cannot provide any quantitative data, but I thought I'd mention 
it anyway, to give people a hint about which knobs to tune.

Cheers,     Marc
