[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-21 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349431#comment-17349431
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16668:
-

Committed to 4.0 and propagated to trunk. Thank you all!

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-21 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349349#comment-17349349
 ] 

Matt Fleming commented on CASSANDRA-16668:
--

[~adelapena] your changes look good! Thanks for doing that. +1

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349335#comment-17349335
 ] 

Andres de la Peña commented on CASSANDRA-16668:
---

[~e.dimitrova] I have the two patches, the suggestions and few additional minor 
warning fixes in [this 
branch|https://github.com/adelapena/cassandra/tree/16668-trunk-review], which 
is the one used for the CircleCI runs above.

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-21 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349305#comment-17349305
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16668:
-

+1 from me too, if it is fine with the others I can address on commit the nits 
from Andres later today. Thank you!

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348618#comment-17348618
 ] 

Andres de la Peña commented on CASSANDRA-16668:
---

+1 on both changes.
{quote}
On the {{shutdownTest} failures I agree it is unlikely related to the change. 
Perhaps 100 milliseconds is not long enough for the thread to notice it is 
being shutdown and exit for the join call. Threads should request a nano sleep 
for at most 10ms, but are not guaranteed to be awoken on a busy CI server and 
may exceed the test deadline. What do you think about an increase from 100 
milliseconds up to 1 second to see if it improves things?
{quote}
Indeed [a wait of 1 
second|https://github.com/adelapena/cassandra/commit/87351868a2b09b605fbd7931e4aba7ef26928ecb]
 seems to make {{shutdownTest}} not to fail anymore, at least not in these 20K 
runs:
* 
[j8-j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/475/workflows/71011b7f-3883-4a73-89cf-552099ff5896/jobs/4200]
* 
[j8-j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/475/workflows/71011b7f-3883-4a73-89cf-552099ff5896/jobs/4201]
* 
[j11-j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/475/workflows/a20f69cd-ec0e-499b-ad7f-0d2e4a92a2fb/jobs/4208]

I think we could include that here, as part of the fix.

I'm leaving a few very minor suggestions in the commit for {{SEPExecutorTest}}.

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-18 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347196#comment-17347196
 ] 

Jon Meredith commented on CASSANDRA-16668:
--

+1 from me on 071516d29e41da9924af24e8002822d3c6af0e01 and 
b4f43608c9a8db23a622608804d95629616a66da . Thanks so much for fixing it and 
updating the tests. 

I think it fixes the issue you found, and I don't think it is possible to 
create an unbounded number of threads in the shared pool by frequently calling 
{{setMaximumPoolSize}} as it only calls schedule if there are no spinning 
threads, and tries to satisfy itself for work from the parked/spinning pools 
first.

I'm slightly concerned about the fixed duration sleep() in the new test given 
how much variability there is in CI infrastructure timing. Ideally the test 
would verify that all of the SEPWorkers were parked (and none spinning), but 
there's not any straightforward way I can see to do that without refactoring. 
Worst case is the test doesn't always cover the all-SEPWorkers parked case 
every time, but seems unlikely to fail and create a new flaky test.

On the {{shutdownTest} failures I agree it is unlikely related to the change. 
Perhaps 100 milliseconds is not long enough for the thread to notice it is 
being shutdown and exit for the join call. Threads should request a nano sleep 
for at most 10ms, but are not guaranteed to be awoken on a busy CI server and 
may exceed the test deadline. What do you think about an increase from 100 
milliseconds up to 1 second to see if it improves things?


> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-18 Thread Matt Fleming (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346892#comment-17346892
 ] 

Matt Fleming commented on CASSANDRA-16668:
--

I think that failure is probably unrelated because we saw similar failures for 
shutdownTest before this patch here 
[https://app.circleci.com/pipelines/github/adelapena/cassandra/441/workflows/bcf154ff-0b56-48ed-9f82-6b3e395f53ed/jobs/3880/tests#failed-test-0]

Btw, I've also written a new unit test to catch this bug in the future: 
[https://github.com/mfleming/cassandra/commit/b4f43608c9a8db23a622608804d95629616a66da]
 

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346130#comment-17346130
 ] 

Andres de la Peña commented on CASSANDRA-16668:
---

Here are 10K runs of {{SEPExecutorTest}} with the patch, using the [CircleCI 
multiplexer|https://github.com/apache/cassandra/blob/trunk/doc/source/development/testing.rst#circleci]:
 * 
[j8-j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/457/workflows/cb5b1d27-75d4-4b3a-814c-04454fb4f4ef/jobs/4017]
 * 
[j8-j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/457/workflows/cb5b1d27-75d4-4b3a-814c-04454fb4f4ef/jobs/4015]
 * 
[j11-j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/457/workflows/1b82f571-9dd8-4a98-892b-1c0e3704f2d9/jobs/4013]

It seems that {{changingMaxWorkersMeetsConcurrencyGoalsTest}} happily survives 
the three 10K runs, but there are some uncommon failures on {{shutdownTest}}:
 * [j8-j8 runner 62 iteration 
49|https://4017-85817267-gh.circle-artifacts.com/62/stdout/fails/049/testsome-org.apache.cassandra.concurrent.SEPExecutorTest.txt]
 * [j8-j11 runner 73 iteration 
9|https://4015-85817267-gh.circle-artifacts.com/73/stdout/fails/009/testsome-org.apache.cassandra.concurrent.SEPExecutorTest.txt]
 * [j11-j11 runner 68 iteration 
51|https://4013-85817267-gh.circle-artifacts.com/68/stdout/fails/051/testsome-org.apache.cassandra.concurrent.SEPExecutorTest.txt]

Not sure whether that is related or an independent failure.

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16668) Intermittent failure of SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race condition when shrinking maximum pool size to zero

2021-05-13 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344146#comment-17344146
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16668:
-

Jenkins [CI run | https://jenkins-cm4.apache.org/job/Cassandra-devbranch/776/]

> Intermittent failure of 
> SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest caused by race 
> condition when shrinking maximum pool size to zero
> -
>
> Key: CASSANDRA-16668
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16668
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Matt Fleming
>Assignee: Matt Fleming
>Priority: Normal
> Fix For: 4.0-rc
>
>
> A difficult-to-hit race condition exists in 
> changingMaxWorkersMeetsConcurrencyGoalsTest when changing the maximum pool 
> size from 0 -> 4 which results in the test failing like so:
> {{junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but 
> was:junit.framework.AssertionFailedError: Test tasks did not hit max 
> concurrency goal expected: but was: at 
> org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:198)
>  at 
> org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:132)}}
> I can hit this issue maybe 2/3 times for every 100 invocations of the unit 
> test.
> The issue that causes the failure is that if tasks are still enqueued when 
> the maximum pool size is set to zero and if all of the SEPWorker threads 
> enter the STOP state before the pool size is bumped to 4, then no SEPWorker 
> threads will be spun up to service the task queue. This causes the above 
> error.
> Why don't we spin up SEPWorker threads when enqueing tasks? Because of the 
> guard logic in addTask: 
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L113,L121]
> In this scenario taskPermits will not be zero (because we have tasks on the 
> queue) so we never call {{maybeStartSpinningWorker()}}.
> A trick to make this issue much easier to hit is to insert a 
> {{Thread.sleep(500)}} immediately after setting the pool size to zero. This 
> has the effect of guaranteeing that all SEPWorker threads will be STOP'd 
> before enqueueing more work.
> Here's a fix that attempts to spin up an SEPWorker whenever we grow the 
> number of work permits: 
> https://github.com/mfleming/cassandra/commit/071516d29e41da9924af24e8002822d3c6af0e01



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org