[
https://issues.apache.org/jira/browse/SOLR-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18082166#comment-18082166
]
Chris M. Hostetter commented on SOLR-18252:
-------------------------------------------
Both types of failures seem to stem from race conditions involving: {{private
List<ConsumerBatch> consumerBatches; // ... = new ArrayList(...)}}
* it appears a kafka consumer is appending to this list (presumably as it
polls?)
* the "main" test thread attempts to iterate over this list, and then makes
assertions about the number of documents found
# The first type of failure happens when the "main" thread "finishes" looping
before the kafka consuming thread is done writing all of the records from kafka
# The second type of failure is more common, and happens when the kafka
consuming thread adds to the list in between iterator advancements of the main
thread.
In general, using an {{ArrayList}} like this (as an information exchange
between threads) isn't particularly safe – but even if it were, the "race
condition" behind the first type of failure is still a very real risk: the
"main" thread has no way of knowing if/when the kafka consuming thread is
"done" writing records.
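
To illustrate the second type of failure, here is a minimal single-threaded sketch (not the test's actual code). The class and method names are hypothetical, and the append that the kafka consuming thread would do is simulated inline so the outcome is deterministic:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not part of SolrAndKafkaIntegrationTest.
class ComodificationDemo {

  /** Returns true if advancing the iterator after an append throws. */
  static boolean failsAfterAppend() {
    List<String> batches = new ArrayList<>(List.of("batch-1", "batch-2"));
    Iterator<String> it = batches.iterator();
    it.next(); // the "main" thread has started iterating

    // Stands in for the consumer thread appending between iterator advancements.
    batches.add("batch-3");

    try {
      it.next(); // ArrayList's iterator detects the structural change here
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("CME thrown: " + failsAfterAppend());
  }
}
```

With two real threads the same check fires only when the timing lines up, which fits an intermittent failure rather than a consistent one.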
----
Suggested redesign of this test:
* replace:
** {{private List<ConsumerBatch> consumerBatches}}
** with: {{private BlockingQueue<ConsumerBatch> consumerBatches; // ... = new
LinkedBlockingQueue(...)}}
* replace the simple iterator loop in the main test thread with a loop that
calls {{consumerBatches.poll(/* some reasonable timeout */)}}.
** The (successful) loop end condition should be once the expected number of
total documents is found in all batches
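** otherwise keep polling, and fail the test if we never receive all the
records we expect ({{poll}} returns {{null}} on timeout rather than throwing, so
the loop would need to raise something like a TimeoutException itself)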
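
A rough sketch of the polling loop suggested above, under stated assumptions: the {{ConsumerBatch}} class here is a hypothetical stand-in that only holds a list of document ids, {{awaitDocs}} is an invented helper name, and the timeout applies to each individual poll rather than to the whole wait. The real test would adapt this to its own batch type and assertions.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; names are illustrative, not taken from the Solr code base.
class BatchDrainSketch {

  /** Stand-in for the test's ConsumerBatch: just the ids of the docs in one batch. */
  static final class ConsumerBatch {
    final List<String> docIds;

    ConsumerBatch(List<String> docIds) {
      this.docIds = docIds;
    }
  }

  /**
   * Polls until at least expectedDocs documents have been seen across all batches.
   * Throws TimeoutException if a single poll waits the full timeout with no batch.
   */
  static int awaitDocs(
      BlockingQueue<ConsumerBatch> queue, int expectedDocs, long timeout, TimeUnit unit)
      throws InterruptedException, TimeoutException {
    int seen = 0;
    while (seen < expectedDocs) {
      ConsumerBatch batch = queue.poll(timeout, unit); // null means the poll timed out
      if (batch == null) {
        throw new TimeoutException("saw only " + seen + " of " + expectedDocs + " docs");
      }
      seen += batch.docIds.size();
    }
    return seen;
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<ConsumerBatch> queue = new LinkedBlockingQueue<>();

    // Simulated consumer thread handing batches to the "main" thread.
    Thread consumer = new Thread(() -> {
      for (int i = 0; i < 4; i++) {
        queue.add(new ConsumerBatch(List.of("a" + i, "b" + i)));
      }
    });
    consumer.start();

    System.out.println(awaitDocs(queue, 8, 5, TimeUnit.SECONDS));
    consumer.join();
  }
}
```

The helper returns the total it actually saw, so the caller can still assert an exact count and catch the case where a final batch overshoots the expected number.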
> high failure rate from SolrAndKafkaIntegrationTest.testPartitioning
> --------------------------------------------------------------------
>
> Key: SOLR-18252
> URL: https://issues.apache.org/jira/browse/SOLR-18252
> Project: Solr
> Issue Type: Test
> Components: module - crossDC
> Reporter: Chris M. Hostetter
> Assignee: Andrzej Bialecki
> Priority: Major
>
> Since it was added a few weeks ago,
> {{SolrAndKafkaIntegrationTest.testPartitioning}} has had a Jenkins failure
> rate of ~20%.
> Based on an ad-hoc review of some of the Jenkins logs, the failures seem to
> fall into 2 categories...
> {noformat}
> > java.lang.AssertionError: incorrect count in collection collection1
> expected:<200> but was:<199>
> > at
> __randomizedtesting.SeedInfo.seed([F418E49D11BF02D5:A809F6571555C802]:0)
> > at org.junit.Assert.fail(Assert.java:89)
> > at org.junit.Assert.failNotEquals(Assert.java:835)
> > at org.junit.Assert.assertEquals(Assert.java:647)
> > at
> org.apache.solr.crossdc.manager.SolrAndKafkaIntegrationTest.lambda$testPartitioning$2(SolrAndKafkaIntegrationTest.java:380)
> {noformat}
> ...and...
> {noformat}
> 2> 54261 INFO
> (TEST-SolrAndKafkaIntegrationTest.testPartitioning-seed#[710FF61F1D2B5BF0])
> [] o.a.s.SolrTestCaseJ4 ###Ending testPartitioning
> > java.util.ConcurrentModificationException
> > at
> __randomizedtesting.SeedInfo.seed([710FF61F1D2B5BF0:2D1EE4D519C19127]:0)
> > at
> java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1104)
> > at java.base/java.util.ArrayList$Itr.next(ArrayList.java:1058)
> > at
> org.apache.solr.crossdc.manager.SolrAndKafkaIntegrationTest.testPartitioning(SolrAndKafkaIntegrationTest.java:363)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)