Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Alonso Isidoro Roman
"But learned that it is better not to reduce it to 0." could you explain a bit more this sentence? thanks Alonso Isidoro Roman. Mis citas preferidas (de hoy) : "Si depurar es el proceso de quitar los errores de software, entonces programar debe ser el proceso de introducirlos..." - Edsger

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
Okay, the reason for the task delay within the executor when some RDDs are in memory and some are in Hadoop, i.e., when there are multiple locality levels (NODE_LOCAL and ANY): in this case the scheduler waits for spark.locality.wait (3 seconds by default). During this period, the scheduler waits to launch a data-local task before giving
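To make the wait concrete, here is a rough sketch of the behaviour described above. It is a toy model, not Spark's scheduler code: the function name pick_locality and the parameter local_slot_free_after_s are made up for illustration, and it collapses Spark's locality levels down to just the two mentioned in this thread.

```python
# Toy model of the locality wait; NOT Spark's actual scheduler code.
def pick_locality(local_slot_free_after_s, locality_wait_s=3.0):
    """Return the locality level a pending task ends up running at.

    local_slot_free_after_s: hypothetical time (seconds) until a slot opens
        on the node that holds the task's data.
    locality_wait_s: stands in for spark.locality.wait (3 seconds by default).
    """
    if local_slot_free_after_s <= locality_wait_s:
        return "NODE_LOCAL"  # a local slot opened before the wait expired
    return "ANY"             # wait expired, fall back to a less local level


print(pick_locality(1.0))       # local slot opens within the default wait
print(pick_locality(5.0))       # the wait expires first
print(pick_locality(1.0, 0.0))  # no waiting at all
```

The point of the sketch is only that the wait is an upper bound on how long a task sits idle hoping for a local slot.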

Re: Spark job does not perform well when some RDD in memory and some on Disk

2016-02-04 Thread Prabhu Joseph
If spark.locality.wait is 0, then there are two performance issues: 1. The task scheduler won't wait to schedule tasks as data-local; it will launch them immediately on some node even if that node is less local. The probability of tasks running at a less local level will be higher and affect the overall Job
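A back-of-the-envelope sketch of the first issue, under loudly stated assumptions: the run times (2 s local, 6 s non-local) and the slot delays are invented numbers, not measurements, and task_finish_time is a hypothetical helper, not a Spark API.

```python
# Toy cost model; all durations below are made-up numbers, not measurements.
def task_finish_time(local_free_after_s, wait_s, local_run_s=2.0, remote_run_s=6.0):
    """Seconds until a task finishes under a given locality wait."""
    if local_free_after_s <= wait_s:
        # A local slot opened in time: wait for it, then run with local data.
        return local_free_after_s + local_run_s
    # The wait expired: run on a less local node and pay for the remote read.
    return wait_s + remote_run_s


delays = [0.0, 0.5, 1.0, 2.5]  # hypothetical seconds until a local slot frees up
for wait in (0.0, 3.0):
    non_local = sum(1 for d in delays if d > wait)
    total = sum(task_finish_time(d, wait) for d in delays)
    print(f"wait={wait}s non_local_tasks={non_local} summed_task_time={total}s")
```

With a wait of 0, every task whose local slot is not free immediately runs non-local in this model, which is the higher probability of less local tasks mentioned above.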