[ https://issues.apache.org/jira/browse/BEAM-3848?focusedWorklogId=85422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85422 ]
ASF GitHub Bot logged work on BEAM-3848: ---------------------------------------- Author: ASF GitHub Bot Created on: 28/Mar/18 22:37 Start Date: 28/Mar/18 22:37 Worklog Time Spent: 10m Work Description: timrobertson100 commented on a change in pull request #4905: [BEAM-3848] Enables ability to retry Solr writes on error (SolrIO) URL: https://github.com/apache/beam/pull/4905#discussion_r177909607 ########## File path: sdks/java/io/solr/src/test/java/org/apache/beam/sdk/io/solr/SolrIOTest.java ########## @@ -263,4 +276,109 @@ public void testSplit() throws Exception { // therefore, can not exist an empty shard. assertEquals("Wrong number of empty splits", expectedNumSplits, nonEmptySplits); } + + /** + * Ensure that the retrying is ignored under success conditions. + */ + @Test + public void testWriteDefaultRetrySuccess() throws Exception { + SolrIO.Write write = mock(SolrIO.Write.class); + when(write.getRetryConfiguration()) + .thenReturn(SolrIO.RetryConfiguration.create(10, Duration.standardSeconds(10))); + SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write); + AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class); + + // simulate success + when(solrClient.process(any(String.class), any(SolrRequest.class))) + .thenReturn(mock(SolrResponse.class)); + + List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1); + writeFn.flushBatch(solrClient, batch); + verify(solrClient, times(1)).process(any(String.class), any(SolrRequest.class)); + } + + /** + * Ensure that the default retrying behavior surfaces errors immediately under failure conditions. + */ + @Test + public void testWriteRetryFail() throws Exception { + SolrIO.Write write = mock(SolrIO.Write.class); + when(write.getRetryConfiguration()).thenReturn(SolrIO.DEFAULT_RETRY_CONFIGURATION); + SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write); + AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class); + + // simulate failure + when(solrClient.process(any(String.class), any(SolrRequest.class))) + .thenThrow(new SolrServerException("Fail")); + + List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1); + try { + writeFn.flushBatch(solrClient, batch); + fail("Error should have been surfaced when flushing batch"); + } catch (IOException e) { + verify(solrClient, times(1)).process(any(String.class), any(SolrRequest.class)); + } + } + + /** + * Ensure that a time bounded retrying is observed. + */ + @Test + public void testWriteRetryTimeBound() throws Exception { + SolrIO.Write write = mock(SolrIO.Write.class); + when(write.getRetryConfiguration()) + .thenReturn( + SolrIO.RetryConfiguration.create(Integer.MAX_VALUE, Duration.standardSeconds(3))); + SolrIO.Write.WriteFn writeFn = new SolrIO.Write.WriteFn(write); + AuthorizedSolrClient solrClient = mock(AuthorizedSolrClient.class); + + // simulate failure + when(solrClient.process(any(String.class), any(SolrRequest.class))) + .thenThrow( + new HttpSolrClient.RemoteSolrException( + "localhost", 1, "ignore", new IOException("Network"))); + + List<SolrInputDocument> batch = SolrIOTestUtils.createDocuments(1); + Stopwatch stopwatch = Stopwatch.createStarted(); + + try { + writeFn.flushBatch(solrClient, batch); + fail("Error should have been surfaced when flushing batch"); + } catch (IOException e) { + // at least two attempts must be made + verify(solrClient, Mockito.atLeast(2)).process(any(String.class), any(SolrRequest.class)); + long seconds = stopwatch.elapsed(TimeUnit.SECONDS); + assertTrue( + "Retrying should have executed for at least 3 seconds but was " + seconds, + seconds >= 3); + } + } + + /** + * Ensure that retries are initiated up to a limited number. + */ + @Test + public void testWriteRetryAttemptBound() throws Exception { + SolrIO.Write write = mock(SolrIO.Write.class); Review comment: How about the following (copied from the `JdbcIO` design and test approach) 1. I introduce a configurable `RetryStrategy` that can be added to the `Write` and will come into effect when a `RetryConfiguration` is also provided 2. A `DefaultRetryStrategy` will retry on `HttpSolrClient.RemoteSolrException`, `SolrServerException`, `IOException` and also on `SolrException` (where the code is `5xx` only and I see is missing in the original PR). 3. I introduce `SLF4J` logging at `warn` when retries are invoked 4. I run the mini-IT style tests using the embedded Solr server with 1. a custom `RetryStrategy` to return `true`(i.e. retry) on any `SolrException` 2. write to a non existing collection which will cause a `SolrException` (code = 400 and a message "collection not found: ...") thus invoking the retry thanks to the strategy 3. Use the same as apprach as [`JdbcIOTest` of checking for log statements](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOTest.java#L359) to verify retry behavior is invoked I'd suggest that we keep the 10 lines of retry within the ´SolrIO´ and then not move to the `AuthorizedSolrClient` as we considered above, to be similar to the ´JdbcIO´ and also so that it is the `SolrIO` logs we're using in the test. I'm happy to implement this (especially since it brings further consistency with `JdbcIO`) but would like to have a committer quickly approve the procedure first. Fundamentally we're saying that relying on log statements trumps mocking the underlying `SolrClient` (not something I particularly agree with, but keen to follow the project approach). CC @iemejia @echauchot @jkff ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 85422) Time Spent: 4h 40m (was: 4.5h) > SolrIO: Improve retrying mechanism in client writes > --------------------------------------------------- > > Key: BEAM-3848 > URL: https://issues.apache.org/jira/browse/BEAM-3848 > Project: Beam > Issue Type: Improvement > Components: io-java-solr > Affects Versions: 2.2.0, 2.3.0 > Reporter: Tim Robertson > Assignee: Tim Robertson > Priority: Minor > Time Spent: 4h 40m > Remaining Estimate: 0h > > A busy SOLR server is prone to return RemoteSOLRException on writing which > currently fails a complete task (e.g. a partition of a spark RDD being > written to SOLR). > A good addition would be the ability to provide a retrying mechanism for the > batch in flight, rather than failing fast, which will most likely trigger a > much larger retry of more writes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)