[ https://issues.apache.org/jira/browse/SPARK-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SuYan updated SPARK-4167:
-------------------------
    Description: 
Recently, while running a Spark on YARN job, we observed an executor scheduling imbalance.
The sequence of events is as follows:
1. Because of a user mistake, the job's input contained 0-byte (empty) splits.
1.1: tasks 0-99 are no-preference tasks (0-byte splits); tasks 100-800 are node-local tasks (see the inspection sketch after step 2)
1.2: the user runs the task set for 500 loops
1.3: there are 60 executors

2. In the first loop, executor A has only 2 node-local tasks. It finishes its node-local tasks first and then runs no-preference tasks, which in our case have smaller input splits than the node-local tasks. So executor A finishes all of the no-preference tasks while the other executors are still running node-local tasks.

In the second loop, every task has PROCESS_LOCAL locality and finishes within 3 seconds. Executor A is still running PROCESS_LOCAL tasks after the other executors have finished theirs, and because every task on executor A completes in under 3 seconds (less than the locality wait), the allowed locality level never degrades below PROCESS_LOCAL, so the remaining tasks are never offered to the other executors.
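To show why the level never drops, here is a much-simplified sketch of the delay-scheduling rule (an illustration of the idea, not the actual TaskSetManager code): the allowed locality level is only relaxed after no task has been launched at the current level for the configured wait, so tasks that keep completing in under 3 seconds keep the level pinned at PROCESS_LOCAL.

{code:scala}
// Simplified illustration of delay scheduling, not the real Spark implementation.
// Levels are ordered PROCESS_LOCAL(0) -> NODE_LOCAL(1) -> ANY(2).
def allowedLevel(curTimeMs: Long,
                 lastLaunchTimeMs: Long,
                 currentLevel: Int,
                 localityWaitMs: Long = 3000L): Int = {
  // Relax one level only if no task has been launched at the current level
  // for longer than the locality wait.
  if (curTimeMs - lastLaunchTimeMs >= localityWaitMs && currentLevel < 2) currentLevel + 1
  else currentLevel
}

// Executor A launches a new PROCESS_LOCAL task every ~3 s or less, so the last
// launch time keeps being reset and allowedLevel never moves past 0:
// the remaining tasks stay reserved for executor A.
{code}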

As a result, the other executors all sit waiting for executor A, and the same situation repeats in the remaining loops.

To work around this, we had the user delete the empty input splits. But an implicit imbalance remains: in some loops one executor gets more PROCESS_LOCAL tasks than the others, all of which finish in under 3 seconds, and in the remaining loops the other executors wait for that executor to finish all of its PROCESS_LOCAL tasks.
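As a rough sketch of the mitigations discussed above (not part of the original report; the path and wait value are placeholders), a job can both drop the 0-byte files before building the RDD and shorten spark.locality.wait so that the scheduler falls back to other executors faster than the ~3-second task duration:

{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

// Shorten the locality wait (milliseconds in Spark 1.1) so that idle executors
// are offered tasks before a ~3 s PROCESS_LOCAL task finishes elsewhere.
val conf = new SparkConf()
  .setAppName("example")              // placeholder app name
  .set("spark.locality.wait", "500")  // default is 3000 (3 s)
val sc = new SparkContext(conf)

// Drop 0-byte input files so no empty splits (and no no-preference tasks) are created.
val fs = FileSystem.get(sc.hadoopConfiguration)
val nonEmptyPaths = fs.listStatus(new Path("hdfs:///path/to/input"))  // placeholder path
  .filter(_.getLen > 0)
  .map(_.getPath.toString)
  .mkString(",")                      // textFile accepts a comma-separated list of paths
val rdd = sc.textFile(nonEmptyPaths)
{code}

Lowering the wait trades some data locality for better utilization; deleting the empty splits removes the original trigger but, as noted above, does not fully remove the imbalance.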


> Task scheduling on executors becomes imbalanced when tasks run for less than
> the locality wait time
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-4167
>                 URL: https://issues.apache.org/jira/browse/SPARK-4167
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: SuYan
>


