[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934544#comment-14934544 ]

Balagopal Nair commented on SPARK-10644:

Let me try to explain this one last time.

7 machines, each with 4 cores and 8GB RAM (physical hardware)
Number of worker processes per machine: 3
Number of executors per worker process: 3
Total number of workers = 21
Total number of executors = 63
Per-worker memory limit = 512m
Per-executor memory limit = 512m

Scenario 1:
Submit one job requesting 21 cores => number of remaining cores = 42
Submit another job requesting 20 cores => this WAITS

Scenario 2:
Submit one job requesting 20 cores => number of remaining cores = 43
Submit one more job requesting 20 cores => this RUNS => number of remaining cores = 23
Submit one more job requesting 20 cores => this WAITS

Comparing scenarios 1 and 2, the speculation/theory based on lack of memory does not hold. What I'm trying to say is that if not even one worker is free, executors don't get allocated. This is the behavior I see while using Spark.

If you would still like to close it, please go ahead. I don't have any more details to provide.

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
> Reporter: Balagopal Nair
> Priority: Minor
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs, each with max cores set to 10.
> 2. The first 3 jobs run with 10 each (30 executors consumed so far).
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless
> the total number of EXECUTORS in use < the number of WORKERS.
> If there are executors available, resources should be allocated to the
> pending job.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
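The rule hypothesized in the issue can be checked against the reported outcomes with a small simulation. This is a hypothetical model for illustration only: the function `simulate` and the simplification that an admitted job gets its requested cores (capped at what is free) are assumptions. It encodes only the rule stated in the issue ("executors in use < workers"), not Spark's actual standalone Master scheduling logic.

```python
"""Hypothetical model of the allocation rule hypothesized in SPARK-10644.

Assumption: a new application is granted cores only while the number of
cores/executors already in use is below the number of workers. This encodes
the reporter's hypothesis, NOT Spark's standalone Master implementation.
"""

WORKERS = 21  # 7 machines x 3 worker processes
CORES = 63    # 21 workers x 3 cores/executors each


def simulate(requests, workers=WORKERS, cores=CORES):
    """Submit jobs in order; return (outcomes, free cores after each submission)."""
    in_use = 0
    outcomes, free_after = [], []
    for requested in requests:
        if in_use < workers:
            # Admitted: assume the job gets its request, capped at the free cores.
            in_use += min(requested, cores - in_use)
            outcomes.append("RUNS")
        else:
            outcomes.append("WAITS")
        free_after.append(cores - in_use)
    return outcomes, free_after


if __name__ == "__main__":
    print(simulate([21, 20]))          # Scenario 1: (['RUNS', 'WAITS'], [42, 42])
    print(simulate([20, 20, 20]))      # Scenario 2: (['RUNS', 'RUNS', 'WAITS'], [43, 23, 23])
    print(simulate([10, 10, 10, 10]))  # Issue steps: (['RUNS', 'RUNS', 'RUNS', 'WAITS'], [53, 43, 33, 33])
```

The model reproduces every RUNS/WAITS outcome reported above, including the 33 idle executors in the original steps to reproduce. Matching the observations does not by itself show that this is how Spark's scheduler actually decides.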
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933475#comment-14933475 ]

Balagopal Nair commented on SPARK-10644:

I set both executor and worker memory to 512m. Let me rephrase what I said before: under full load, each host had about 1g of RAM free. As I said before, if this were a memory issue, Spark would still try to launch jobs and the worker processes would die.
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933323#comment-14933323 ]

Balagopal Nair commented on SPARK-10644:

I'm guessing you're trying to figure out whether the executors were not allocated because there was not enough RAM. If that were the case, under full load the executors would still try to run but would die with an out-of-memory error. With 512m per executor, I did not have any memory issues and the cluster ran fine under full load. I don't think the issue is related to memory.
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909758#comment-14909758 ]

Balagopal Nair commented on SPARK-10644:

That's true. My memory config is 512m per executor. Each machine has 6.7G of available RAM.
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907100#comment-14907100 ]

Balagopal Nair commented on SPARK-10644:

4-core machines, 3 workers with 3 executors each, and there is enough memory. As I said before, I did switch to using one worker process with 9 executors and the issue is not there anymore. So this is a minor bug.
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904831#comment-14904831 ]

Balagopal Nair commented on SPARK-10644:

I'm overallocating hardware here. Each machine has 4 cores and I'm launching 3 workers with 3 executors each. I have 7 such machines, which makes:
Number of workers = 7 x 3 = 21
Number of cores/executors = 21 x 3 = 63

I found out this week that if I change the configuration to launch just 1 worker process with 9 executors, this problem does NOT show up anymore. So this issue seems specific to the case where you launch more than one Worker process per host. (I've reduced the priority of this bug to Minor because of this.)
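The capacity arithmetic in this comment, together with the single-worker layout the reporter switched to, can be written out as a short sketch. The helper `layout` is hypothetical, and it counts one core per executor, as the comment does.

```python
"""Capacity arithmetic for the two worker layouts described in SPARK-10644."""

MACHINES = 7
PHYSICAL_CORES_PER_MACHINE = 4


def layout(machines, workers_per_machine, executors_per_worker):
    """Return (total workers, total executors), counting one core per executor."""
    workers = machines * workers_per_machine
    return workers, workers * executors_per_worker


if __name__ == "__main__":
    physical = MACHINES * PHYSICAL_CORES_PER_MACHINE  # 28 physical cores
    for label, per_machine, per_worker in [
        ("3 workers x 3 executors per host", 3, 3),
        ("1 worker x 9 executors per host", 1, 9),
    ]:
        workers, executors = layout(MACHINES, per_machine, per_worker)
        # Ratio of advertised executor cores to physical cores (overallocation).
        print(f"{label}: {workers} workers, {executors} executors, "
              f"{executors / physical:.2f}x physical cores")
```

Both layouts advertise the same 63 executor slots on 28 physical cores (2.25x overallocation); only the number of Worker processes differs (21 versus 7), which is the variable the reporter found to matter.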
[jira] [Updated] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balagopal Nair updated SPARK-10644:
---
Priority: Minor (was: Major)
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791443#comment-14791443 ]

Balagopal Nair commented on SPARK-10644:

Standalone cluster manager. I've verified this behaviour again now.
[jira] [Comment Edited] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ]

Balagopal Nair edited comment on SPARK-10644 at 9/17/15 1:51 AM:

No. These are independent jobs running under different SparkContexts. Sorry about not being clear enough before. I'm trying to share the same cluster between various applications. This issue is related to scheduling across applications, not within the same application.

was (Author: nbalagopal):
No. These are independent jobs running under different SparkContexts
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ]

Balagopal Nair commented on SPARK-10644:

No. These are independent jobs running under different SparkContexts
[jira] [Created] (SPARK-10644) Applications wait even if free executors are available
Balagopal Nair created SPARK-10644:
--
Summary: Applications wait even if free executors are available
Key: SPARK-10644
URL: https://issues.apache.org/jira/browse/SPARK-10644
Project: Spark
Issue Type: Bug
Components: Scheduler
Affects Versions: 1.5.0
Environment: RHEL 6.5 64 bit
Reporter: Balagopal Nair

Number of workers: 21
Number of executors: 63
Steps to reproduce:
1. Run 4 jobs, each with max cores set to 10.
2. The first 3 jobs run with 10 each (30 executors consumed so far).
3. The 4th job waits even though there are 33 idle executors.
The reason is that a job will not get executors unless the total number of EXECUTORS in use < the number of WORKERS.
If there are executors available, resources should be allocated to the pending job.