[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-16668:
-------------------------------------------
    Fix Version/s:     (was: 4.0.x)
                       (was: 4.0-rc)
                   4.1
                   4.0-rc2

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc2, 4.0, 4.1
>
>
> A difficult-to-hit race condition exists in
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool
> size from 0 -> 4, which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max concurrency goal expected: but was:
> junit.framework.AssertionFailedError: Test tasks did not hit max concurrency goal expected: but was:
>   at org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>   at org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2-3 times for every 100 invocations of the unit
> test.
> The failure occurs when tasks are still enqueued at the moment the maximum
> pool size is set to zero and all of the SEPWorker threads enter the STOP
> state before the pool size is bumped to 4. In that case no SEPWorker
> threads are spun up to service the task queue, which causes the above
> error.
> Why don't we spin up SEPWorker threads when enqueueing tasks? Because of the
> guard logic in addTask:
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the
> queue), so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the
> number of work permits:
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
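
The stall described in the issue can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: it is single-threaded, its tasks never actually run, and the names PermitPoolSketch, wakeOnGrow, setMaxWorkers, isStalled and stallsAfterResize are hypothetical and do not come from Cassandra. It does not reproduce the real SEPExecutor code. It only mimics the two behaviours the description relies on: a worker is started on enqueue only when the queue was previously empty, and (optionally, as in the proposed fix) a worker start is attempted whenever the pool size grows.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, single-threaded model of the stall; not Cassandra's SEPExecutor code.
final class PermitPoolSketch {
    private final Queue<Runnable> queue = new ArrayDeque<>(); // tasks are queued but never run here
    private final boolean wakeOnGrow; // true models the proposed fix
    private int maxWorkers;
    private int runningWorkers;

    PermitPoolSketch(int maxWorkers, boolean wakeOnGrow) {
        this.maxWorkers = maxWorkers;
        this.wakeOnGrow = wakeOnGrow;
    }

    // Guard: only try to start a worker when the queue was empty before this task.
    void addTask(Runnable task) {
        boolean wasEmpty = queue.isEmpty();
        queue.add(task);
        if (wasEmpty) maybeStartWorker();
    }

    void setMaxWorkers(int newMax) {
        boolean grew = newMax > maxWorkers;
        maxWorkers = newMax;
        if (newMax == 0) runningWorkers = 0;        // models every worker reaching STOP
        if (grew && wakeOnGrow) maybeStartWorker(); // the fix: wake a worker on growth
    }

    private void maybeStartWorker() {
        if (runningWorkers < maxWorkers && !queue.isEmpty()) runningWorkers++;
    }

    // Stalled: work is queued, but no worker is running to pick it up.
    boolean isStalled() {
        return runningWorkers == 0 && !queue.isEmpty();
    }

    // Replays the sequence from the issue: enqueue, shrink to 0, enqueue, grow to 4.
    static boolean stallsAfterResize(boolean wakeOnGrow) {
        PermitPoolSketch pool = new PermitPoolSketch(4, wakeOnGrow);
        pool.addTask(() -> {});
        pool.addTask(() -> {});
        pool.setMaxWorkers(0);  // all workers stop while tasks remain queued
        pool.addTask(() -> {}); // queue is non-empty, so the guard skips the worker start
        pool.setMaxWorkers(4);
        return pool.isStalled();
    }

    public static void main(String[] args) {
        System.out.println("stalled without wake-on-grow: " + stallsAfterResize(false));
        System.out.println("stalled with wake-on-grow: " + stallsAfterResize(true));
    }
}
```

In this model the pool stays stalled after growing back to 4 unless the resize itself attempts to start a worker, which is the idea behind the linked fix.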
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-16668:
--------------------------------------------
    Since Version:   (was: 4.0-rc1)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc, 4.0.x
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-16668:
--------------------------------------------
          Fix Version/s: 4.0.x
                         4.0
          Since Version: 4.0-rc1
    Source Control Link: https://github.com/apache/cassandra/commit/8cd02afce972ecaf0e0cf0fe09c610d67d9af9c5
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc, 4.0.x
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-16668:
--------------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-16668:
--------------------------------------------
    Reviewers: Andres de la Peña, Ekaterina Dimitrova, Jon Meredith  (was: Andres de la Peña, Jon Meredith)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andres de la Peña updated CASSANDRA-16668:
------------------------------------------
    Reviewers: Andres de la Peña, Jon Meredith, Andres de la Peña  (was: Andres de la Peña, Jon Meredith)
       Status: Review In Progress  (was: Patch Available)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andres de la Peña updated CASSANDRA-16668:
------------------------------------------
    Reviewers: Andres de la Peña, Jon Meredith  (was: Jon Meredith)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ekaterina Dimitrova updated CASSANDRA-16668:
--------------------------------------------
    Test and Documentation Plan: [https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01]
                         Status: Patch Available  (was: In Progress)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc
[jira] [Updated] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero
[ https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Meredith updated CASSANDRA-16668:
-------------------------------------
     Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear Impact(13164)
       Complexity: Normal
      Component/s: Local/Other
    Discovered By: Unit Test
    Fix Version/s: 4.0-rc
        Reviewers: Jon Meredith
         Severity: Low
         Assignee: Matt Fleming
           Status: Open  (was: Triage Needed)

> Intermittent failure of
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race
> condition when shrinking maximum pool size to zero
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16668
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Matt Fleming
>            Assignee: Matt Fleming
>            Priority: Normal
>             Fix For: 4.0-rc