[jira] [Comment Edited] (SPARK-24474) Cores are left idle when there are a lot of tasks to run

2018-07-05 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533499#comment-16533499
 ] 

Hari Sekhon edited comment on SPARK-24474 at 7/5/18 10:43 AM:
--

My main concern with this workaround is that it pulls half the blocks over the 
network, which would degrade performance across jobs and queues on our 
clusters if everyone did it, since there is no network quota isolation.

I've raised a request for an HDFS Anti-Affinity Block Placement improvement to 
address dataset placement skew across a subset of datanodes. A better spread 
of a dataset across datanodes would allow data-local task scheduling to work 
as intended, which seems like a much better long-term fix. Please vote up the 
issue here if this is affecting you:

https://issues.apache.org/jira/browse/HDFS-13720
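
For context, the workaround referred to here appears to be lowering Spark's 
locality wait (the thread below mentions going from the 3s default to 0s). A 
minimal sketch of applying it, assuming spark.locality.wait really is the 
setting in question; the app name is a placeholder, and the network-traffic 
tradeoff is the one raised in this comment:

{code:scala}
// Sketch only, assuming the workaround is spark.locality.wait (default 3s).
// Setting it to 0 stops the scheduler from holding cores back while waiting
// for data-local slots, at the cost of reading more blocks over the network.
import org.apache.spark.sql.SparkSession

object LocalityWaitWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("locality-wait-workaround")   // placeholder name
      .config("spark.locality.wait", "0s")   // default is 3s
      .getOrCreate()

    // ... the rest of the job is unchanged ...

    spark.stop()
  }
}
{code}

The same setting could equivalently be passed as --conf spark.locality.wait=0s 
on spark-submit.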

 


was (Author: harisekhon):
My main concern with this workaround is that it pulls half the blocks over the 
network, which would degrade our clusters if everyone did it.

I've raised a request for an HDFS Anti-Affinity Block Placement improvement to 
address dataset placement skew across a subset of datanodes. A better spread 
of a dataset across datanodes would allow data-local task scheduling to work 
as intended, which seems like a much better long-term fix. Please vote up the 
issue here if this is affecting you:

https://issues.apache.org/jira/browse/HDFS-13720

 

> Cores are left idle when there are a lot of tasks to run
> ---------------------------------------------------------
>
> Key: SPARK-24474
> URL: https://issues.apache.org/jira/browse/SPARK-24474
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.2.0
>Reporter: Al M
>Priority: Major
>
> I've observed an issue happening consistently when:
>  * A job contains a join of two datasets
>  * One dataset is much larger than the other
>  * Both datasets require some processing before they are joined
> What I have observed is:
>  * 2 stages are initially active to run processing on the two datasets
>  ** These stages are run in parallel
>  ** One stage has significantly more tasks than the other (e.g. one has 30k 
> tasks and the other has 2k tasks)
>  ** Spark allocates a similar (though not exactly equal) number of cores to 
> each stage
>  * First stage completes (for the smaller dataset)
>  ** Now there is only one stage running
>  ** It still has many tasks left (usually > 20k tasks)
>  ** Around half the cores are idle (e.g. Total Cores = 200, active tasks = 
> 103)
>  ** This continues until the second stage completes
>  * Second stage completes, and third begins (the stage that actually joins 
> the data)
>  ** This stage works fine, no cores are idle (e.g. Total Cores = 200, active 
> tasks = 200)
> Other interesting things about this:
>  * It seems that when we have multiple stages active, and one of them 
> finishes, it does not actually release any cores to existing stages
>  * Once all active stages are done, we release all cores to new stages
>  * I can't reproduce this locally on my machine, only on a cluster with YARN 
> enabled
>  * It happens when dynamic allocation is enabled, and when it is disabled
>  * The stage that hangs (referred to as "Second stage" above) has a lower 
> 'Stage Id' than the first one that completes
>  * This happens with spark.shuffle.service.enabled set to true and false
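
As a rough illustration of the job shape described above (two datasets of very 
different sizes, each with some processing, then a join), a sketch follows; 
the paths, column names and transformations are invented for the example and 
are not from the report:

{code:scala}
// Hypothetical sketch of the reported job shape; paths, columns and the
// transformations are placeholders, not taken from the report.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object JoinShapeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("join-shape-sketch").getOrCreate()

    // Larger dataset (the ~30k-task stage in the report), with its own processing.
    val large = spark.read.parquet("/data/large")
      .filter(col("value").isNotNull)

    // Much smaller dataset (the ~2k-task stage), also processed before the join.
    val small = spark.read.parquet("/data/small")
      .withColumn("key", lower(col("key")))

    // The third stage in the report: the join itself, which kept all cores busy.
    large.join(small, Seq("key")).write.parquet("/data/joined")

    spark.stop()
  }
}
{code}

The idle-core symptom in the report appears while only one of the two pre-join 
stages is still running, not during the join stage itself.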



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24474) Cores are left idle when there are a lot of tasks to run

2018-07-04 Thread Al M (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532910#comment-16532910
 ] 

Al M edited comment on SPARK-24474 at 7/4/18 4:17 PM:
--

My initial tests suggest that this stops the issue from happening.  Thanks!  I 
will perform more tests to make 100% sure that it does not still occur.

 

I am surprised that this config makes a difference. My tasks are usually quite 
big, normally taking about a minute each. I would not have expected changing 
the wait from 3s per task to 0s to make such a huge difference.

Do you know if there is any unexpected logic around this config setting?
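
For readers following along: the 3s here is presumably spark.locality.wait, 
Spark's delay-scheduling timeout. A much-simplified toy model of that logic, 
not Spark's actual TaskSetManager code, is sketched below; it shows why a 
non-zero wait can leave cores idle even when individual tasks run for a 
minute, since each freed core is declined until either a data-local task is 
available or the wait expires:

{code:scala}
// Toy model of delay scheduling, not Spark's real implementation: a free core
// offered to a stage is only used for a non-local task once the locality wait
// has expired since the last task launch.
object DelaySchedulingToy {
  final case class Task(id: Int, preferredHosts: Set[String])

  def shouldLaunch(task: Task,
                   offeredHost: String,
                   millisSinceLastLaunch: Long,
                   localityWaitMillis: Long): Boolean = {
    val isLocal = task.preferredHosts.contains(offeredHost)
    // Local tasks launch immediately; non-local ones only after the wait.
    isLocal || millisSinceLastLaunch >= localityWaitMillis
  }

  def main(args: Array[String]): Unit = {
    val task = Task(1, preferredHosts = Set("node-a", "node-b"))
    // With the 3s default, an offer from a non-preferred host is declined early on...
    println(shouldLaunch(task, "node-c", millisSinceLastLaunch = 1000L, localityWaitMillis = 3000L)) // false
    // ...while with spark.locality.wait=0 the same offer is taken immediately.
    println(shouldLaunch(task, "node-c", millisSinceLastLaunch = 1000L, localityWaitMillis = 0L))    // true
  }
}
{code}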


was (Author: alrocks46):
My initial tests suggest that this stops the issue from happening.  Thanks!  I 
will perform more tests to make 100% sure that it does not still occur.

 

I am surprised that this config makes a difference. My tasks are usually quite 
big, normally taking about a minute each. I would not have expected changing 
the wait from 3s per task to 0s to make such a huge difference.

Do you know if there is any unusual behaviour around this config setting?
