[ https://issues.apache.org/jira/browse/SPARK-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
SuYan updated SPARK-4167:
-------------------------

Description:

Recently, while running a Spark on YARN job, we hit an executor scheduling imbalance.

The sequence of events:

1. Because of a user mistake, the job's input contained 0-byte (empty) splits.
   1.1: tasks 0-99 were no-preference tasks (0-byte splits); tasks 100-800 were node-local tasks.
   1.2: the user ran the same task set for 500 loops.
   1.3: the job had 60 executors.

2. In the first loop, executor A had only 2 node-local tasks. It finished those first and then picked up no-preference tasks, whose input splits in our case are smaller than the node-local ones, so executor A finished all of the no-preference tasks while the other executors were still running node-local tasks.

In the second loop every task had a PROCESS_LOCAL preference and finished in under 3 seconds. The other executors soon finished their PROCESS_LOCAL tasks, but executor A, which had processed far more partitions in the first loop, still had many left to run. Because each task run by executor A completed within the 3-second locality wait, the allowed locality level never fell back from PROCESS_LOCAL, so the remaining tasks would only ever be launched on executor A while the other executors simply waited for it. The same pattern repeated in all of the remaining loops.
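The behaviour above is delay scheduling at work: a pending task with a PROCESS_LOCAL preference is only handed to a non-local executor after the locality wait (3 seconds by default) expires without a local launch, and every local launch resets that timer. Below is a minimal sketch of that interaction, not Spark's actual TaskSetManager code; the constants and the single "hot" executor are assumptions chosen to mirror the scenario described here.

    // Simplified model of the locality-wait fallback, not Spark code.
    object LocalityWaitSketch {
      val localityWaitMs = 3000L   // mirrors the default spark.locality.wait of 3000 ms
      val taskDurationMs = 2000L   // every task finishes in under 3 seconds
      val numTasks       = 100     // tasks whose only PROCESS_LOCAL location is executor A

      def main(args: Array[String]): Unit = {
        var nowMs        = 0L
        var lastLaunchMs = 0L      // last time a task was launched at PROCESS_LOCAL
        var runByA       = 0
        var runByOthers  = 0

        for (_ <- 1 to numTasks) {
          if (nowMs - lastLaunchMs >= localityWaitMs) {
            // The wait expired with no local launch: locality falls back and an
            // idle executor finally gets a task. This branch is never reached here.
            runByOthers += 1
            nowMs += taskDurationMs
          } else {
            // Executor A is offered resources before the wait expires, launches the
            // task PROCESS_LOCAL and thereby resets the timer.
            lastLaunchMs = nowMs
            nowMs += taskDurationMs   // A is busy for 2 s, then the next offer arrives
            runByA += 1
          }
        }
        println(s"executor A ran $runByA tasks, other executors ran $runByOthers")
      }
    }

Because 2 seconds is always less than the 3-second wait, the fallback branch never fires and the sketch reports that executor A ran all 100 tasks, which is exactly the wait-on-one-executor pattern described above.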
To work around this we had the user remove the empty input splits, but an implicit imbalance remains: whenever one executor receives more PROCESS_LOCAL tasks than the others in a loop, and all of those tasks finish in under 3 seconds, the other executors end up waiting in the remaining loops for that executor to finish all of the PROCESS_LOCAL tasks.

> Schedule task on Executor will be Imbalance while task run less than
> local-wait time
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-4167
>                 URL: https://issues.apache.org/jira/browse/SPARK-4167
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: SuYan
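For reference, the 3-second wait in the description comes from Spark's delay-scheduling configuration. The per-level properties below exist in Spark 1.1 and default to the base value (in milliseconds); lowering spark.locality.wait.process trades locality for parallelism and is one possible mitigation for workloads like this one, though it does not change the scheduler behaviour this issue describes.

    # spark-defaults.conf (defaults shown, in milliseconds)
    spark.locality.wait          3000   # base wait before falling back to a less-local level
    spark.locality.wait.process  3000   # wait at PROCESS_LOCAL before falling back to NODE_LOCAL
    spark.locality.wait.node     3000   # wait at NODE_LOCAL before falling back to RACK_LOCAL
    spark.locality.wait.rack     3000   # wait at RACK_LOCAL before falling back to ANY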