I don't want to simply replicate all cached blocks; I am trying to find a way to solve the issue I mentioned in the mail above. Keeping replicas of all cached blocks would add more cost for customers.
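For context, the only way to replicate cached blocks in Spark today is to cache with a replicated storage level, which duplicates every partition and roughly doubles the memory footprint. A minimal sketch (the input path is a placeholder; this needs a running Spark cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf().setAppName("cache-replication-sketch")
val sc   = new SparkContext(conf)

// Hypothetical input path for illustration only.
val rdd = sc.textFile("hdfs:///path/to/input")

// MEMORY_ONLY_2 keeps two in-memory copies of each cached partition on
// different executors. A task can then run PROCESS_LOCAL on either replica,
// but every block is duplicated, which is the extra cost mentioned above.
rdd.persist(StorageLevel.MEMORY_ONLY_2)
```

This is why replicating only the hot blocks (or serving them from an external service) is attractive: the all-or-nothing storage level pays the replication cost for cold partitions too.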
On Wed, Mar 9, 2016 at 9:50 AM, Reynold Xin <r...@databricks.com> wrote:
> You just want to be able to replicate hot cached blocks, right?
>
> On Tuesday, March 8, 2016, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>
>> Hi All,
>>
>> When a Spark job is running and one of the Spark executors on Node A
>> has some partitions cached, the scheduler may later, for some other stage,
>> try to assign a task to Node A to process a cached partition
>> (PROCESS_LOCAL). But meanwhile Node A is occupied with other tasks and is
>> busy. The scheduler waits for the spark.locality.wait interval, times out,
>> and tries to find some other node B which is NODE_LOCAL. The executor on
>> Node B then has to fetch the cached partition from Node A, which adds
>> network I/O to that node and also some extra CPU for the I/O. Eventually,
>> every node has a task waiting to fetch some cached partition from Node A,
>> and so the Spark job / cluster is basically blocked on a single node.
>>
>> A Spark JIRA has been created: https://issues.apache.org/jira/browse/SPARK-13718
>>
>> Beginning with Spark 1.2, Spark introduced the External Shuffle Service,
>> which lets executors fetch shuffle files from an external service instead
>> of from each other, offloading that work from the Spark executors.
>>
>> We want to check whether a similar external service has been implemented
>> for transferring cached partitions to other executors.
>>
>> Thanks,
>> Prabhu Joseph
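As a stopgap until something like SPARK-13718 exists, the cascade described above can be dampened by tuning the locality wait, so tasks fall back from PROCESS_LOCAL sooner instead of queuing behind the busy node. A sketch of the relevant settings in spark-defaults.conf (the 1s values are illustrative, not a recommendation; this trades locality for reduced head-of-line blocking):

```
# Shorter waits make the scheduler give up on the preferred (busy) node
# faster and run the task NODE_LOCAL or ANY elsewhere.
spark.locality.wait          1s
spark.locality.wait.process  1s
spark.locality.wait.node     1s
```

This does not remove the remote fetch cost from Node A; it only stops every task in the stage from stalling on the locality timeout first.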