Hi,

I'm running short Spark jobs on RDDs cached in memory, using a
long-running job context, and I want to be able to complete each job (on the
cached RDD) in under 1 second.
I'm getting the following job times with about 15 GB of data distributed
across 6 nodes. Each executor has about 20 GB of memory available, and my
context has about 26 cores in total.
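
For reference, the setup looks roughly like this (a simplified sketch; the
input path, parsing, and query logic here are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// One long-running context, reused for many short jobs
val conf = new SparkConf().setAppName("low-latency-queries")
val sc   = new SparkContext(conf)

// ~15 GB of records, cached in memory across the 6 nodes
// (the HDFS path is a placeholder)
val cached = sc.textFile("hdfs://namenode/data/records")
               .persist(StorageLevel.MEMORY_ONLY)
cached.count()   // materialize the cache once, up front

// Every incoming request then runs a short job against the cached RDD
// and should ideally finish in under 1 second
def countMatching(key: String): Long =
  cached.filter(_.contains(key)).count()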

If the number of partitions < number of cores -

Some jobs run in 3s while others take about 6s; the time difference can be
explained by GC time.

If the number of partitions = number of cores -

All jobs run in 4s. The initial tasks of each stage on every executor take
about 1s.

If the number of partitions > number of cores -

Jobs take more time. The initial tasks of each stage on every executor take
about 1s, and the remaining tasks run in 45-50 ms each. However, since the
initial tasks again take about 1s each, the total time in this case is about
6s, which is worse than the previous case.
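
Concretely, the partitions = cores case above corresponds to something like
this (numbers are illustrative; "cached" is the RDD from the sketch above):

// Repartition so each of the ~26 cores gets exactly one task per stage,
// then re-cache so the shuffle cost is paid only once
val numCores = 26
val balanced = cached.repartition(numCores)
                     .persist(StorageLevel.MEMORY_ONLY)
balanced.count()   // materialize the repartitioned cache

// spark.default.parallelism controls the default partition count for
// shuffles and for sc.parallelize, so it can be set on the conf instead:
// conf.set("spark.default.parallelism", numCores.toString)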

Clearly the limiting factor here is the initial set of tasks. In every
case, these tasks take about 1s to run, regardless of the number of
partitions. Hence the best results are obtained with partitions = cores,
because then every core gets exactly one task, which takes about 1s to run.
In that case I get 0 GC time, so the only explanation left is "scheduling
delay", which is about 0.2-0.3 seconds. I looked at my task size and result
size, and they have no bearing on this delay. I'm also not getting the
task-size warnings in the logs.
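
To break the time down per task, I've been looking at something along the
lines of the listener below, which approximates scheduler delay the way the
web UI does (a sketch only; addSparkListener and the task metrics are
DeveloperApi, so the exact fields may differ between versions):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Per-task breakdown: whatever part of the wall-clock duration is not
// executor run time, deserialization, or result serialization is treated
// here as scheduler delay (roughly what the web UI reports)
class DelayListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info    = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (info != null && metrics != null) {
      val schedulerDelay = info.duration -
        metrics.executorRunTime -
        metrics.executorDeserializeTime -
        metrics.resultSerializationTime
      println(s"task ${info.taskId}: total=${info.duration}ms " +
        s"run=${metrics.executorRunTime}ms " +
        s"deser=${metrics.executorDeserializeTime}ms " +
        s"gc=${metrics.jvmGCTime}ms " +
        s"schedDelay=${schedulerDelay}ms")
    }
  }
}

sc.addSparkListener(new DelayListener)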

From what I can understand, the first time a task runs on a core it takes
about 1s to run. Is this normal?

Is it possible to get sub-second latencies?

Can something be done about the scheduler delay?

What other things can I look at to reduce this time?

Regards,

Anshul
