EgbertW opened a new pull request, #37166: URL: https://github.com/apache/beam/pull/37166
I encountered two issues with ElasticsearchIO: 1. We have Elasticsearch running in Kubernetes behind a proxy that can reply with "504 Gateway Timeout" or similar server errors. The current implementation _seems_ to handle this correctly, but it assumes that _all_ exceptions are wrapped. While the wrapping does happen here: https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50 this only applies to exceptions like SocketException, ConnectExceptions and such. In this particular case, the request simply succeeds and the response has a status 504 (or similar). Therefore it ends up in ConvertResponse: https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50 ConvertResponse is the only method that ever creates a ResponseException, here: https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50 This is thrown a few lines below, without being wrapped. This means that the current implementation here: https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2819 ``` if (t.getCause() instanceof ResponseException) { ResponseException ex = (ResponseException) t.getCause(); ``` will actually never execute as intended, as the ResponseException is always the main exception and never wrapped. The consequence of this is that `isRetryableClientException` never returns `true` for these type of errors even though it should - these proxy errors are typically temporarily and a retry should definitely be done. The fix is easy: ``` if (t instanceof ResponseException) { ResponseException ex = (ResponseException) t; ``` While addressing this issue I encountered that now it is marked retryable and ElasticsearchIO does this, using `handleRetry` here: https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2896 However, this is no longer executed in a `try/catch` construct and hence, if it fails for all attempts, the exception is always thrown, even if `throwWriteErrors` was set to false. This MR also addresses this issue. The current tests did test a `ResponseException` scenario but this was only a scenario that triggers a HTTP 4XX Bad Request, no 500 Server Errors. Also, since this test did not specify a `RetryConfiguration`, from the output it did not become clear which path the exception follows: either just simply being caught and forwarded to the output tag `org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Write#FAILED_WRITES`, or attempted to retry because it satisfies `isRetryableClientException`. This MR adds a test that uses a mocked webserver that always returns a specific HTTP error code so that the flow can properly be tested. Thanks for considering this MR, and please let me know what I can do to improve it if necessary. Question: do I need to create an issue before merging the fix? ------------------------ Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier). To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md) GitHub Actions Tests Status (on master branch) ------------------------------------------------------------------------------------------------ [](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule) [](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule) [](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule) [](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule) See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
