Hi Marko,

Thanks for your enthusiastic and useful report! We had similar experiences over here. SparkGraphComputer seems to like small chunks of data of 128 MB or so, even if you have 8 or 16 GB in your executors.
In addition, when running Spark on YARN, you need a high spark.yarn.executor.memoryOverhead value of about 20%, while 6-10% is mentioned in the Spark-on-YARN reference https://spark.apache.org/docs/1.5.2/running-on-yarn.html . Otherwise, the executor starves when YARN is set to police its queues. I am sorry I cannot provide any quantitative data, but I thought I'd mention it anyway, to give people a hint which knobs to tune.

Cheers, Marc
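For concreteness, a sketch of what those knobs might look like on a spark-submit command line (the memory values are illustrative examples, not benchmarked recommendations):

```shell
# Illustrative spark-submit flags for Spark on YARN.
# memoryOverhead is specified in MB; ~1600 MB is roughly 20% of an
# 8 GB executor, versus the ~10% default suggested by the YARN docs.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1600 \
  your-job.jar
```

The input chunk size (the ~128 MB mentioned above) is typically governed by the HDFS block size of the input files rather than a Spark flag, so splitting large files into smaller blocks is one way to get there.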
