You just want to be able to replicate hot cached blocks right?

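For context, Spark already exposes two knobs relevant to this thread: cached blocks can be replicated at persist time (e.g. `StorageLevel.MEMORY_ONLY_2`), and the interval the scheduler waits before giving up on locality is configurable. A minimal sketch of the config side; the values are illustrative only, not recommendations:

```
# spark-defaults.conf (illustrative values)
# How long to wait for a more-local task slot before falling back a locality level
spark.locality.wait          3s
# Per-level overrides also exist:
spark.locality.wait.process  3s
spark.locality.wait.node     3s
```

Raising these keeps tasks waiting longer for the node that holds the cache; lowering them makes the scheduler fall back (and remotely fetch the cached block) sooner.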
On Tuesday, March 8, 2016, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:

> Hi All,
>
>     When a Spark job is running, one of the Spark executors on Node A
> has some partitions cached. Later, for some other stage, the scheduler
> tries to assign a task to Node A to process a cached partition
> (PROCESS_LOCAL). But meanwhile Node A is occupied with other tasks and
> is busy. The scheduler waits for the spark.locality.wait interval, times
> out, and picks some other node B which is NODE_LOCAL. The executor on
> Node B then fetches the cached partition from Node A, which adds network
> I/O to that node plus some extra CPU for the I/O. Eventually, every node
> has a task waiting to fetch some cached partition from Node A, so the
> Spark job / cluster is effectively blocked on a single node.
>
> Spark JIRA is created https://issues.apache.org/jira/browse/SPARK-13718
>
> Beginning with Spark 1.2, Spark introduced the External Shuffle Service,
> which lets executors fetch shuffle files from an external service instead
> of from each other, offloading that work from the Spark executors.
>
> We want to check whether a similar external service exists for serving
> cached partitions to other executors.
>
>
> Thanks, Prabhu Joseph
>
>
>

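The scheduling behavior described in the quoted message is Spark's delay scheduling: prefer the node holding the cached partition (PROCESS_LOCAL), and only fall back to a less-local node after `spark.locality.wait` elapses. A toy model of that policy, purely illustrative (the names, signature, and timings are ours, not Spark's actual scheduler code):

```python
def schedule_task(preferred_node, free_nodes, locality_wait, now, last_launch):
    """Toy delay-scheduling decision for one task whose cached data
    lives on preferred_node. Returns (chosen_node, locality_level)."""
    if preferred_node in free_nodes:
        # Best case: run where the cached partition already is.
        return preferred_node, "PROCESS_LOCAL"
    # Preferred node is busy: fall back only once the wait has elapsed.
    if now - last_launch >= locality_wait and free_nodes:
        # The chosen node will remotely fetch the cached block from
        # preferred_node, adding network and CPU load there.
        return sorted(free_nodes)[0], "NODE_LOCAL"
    return None, None  # keep waiting for the cache-holding node

# Node A holds the cache but is busy; B and C are idle.
print(schedule_task("A", {"B", "C"}, locality_wait=3.0, now=10.0, last_launch=9.0))
# → (None, None): still within the locality wait
print(schedule_task("A", {"B", "C"}, locality_wait=3.0, now=13.5, last_launch=9.0))
# → ('B', 'NODE_LOCAL'): timed out, B will fetch A's cached block
```

With many tasks preferring the same node, every fallback decision adds another remote fetch against Node A, which is exactly the pile-up the thread describes.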