EgbertW opened a new pull request, #37166:
URL: https://github.com/apache/beam/pull/37166

   I encountered two issues with ElasticsearchIO:
   
   1. We have Elasticsearch running in Kubernetes behind a proxy that can reply 
with "504 Gateway Timeout" or similar server errors. The current implementation 
_seems_ to handle this correctly, but it assumes that _all_ exceptions are 
wrapped. 
   
   While the wrapping does happen here: 
https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50
 this only applies to exceptions like SocketException, ConnectExceptions and 
such.
   
   In this particular case, the request simply succeeds and the response has a 
status 504 (or similar). Therefore it ends up in ConvertResponse: 
https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50
   
   ConvertResponse is the only method that ever creates a ResponseException, 
here: 
https://github.com/elastic/elasticsearch/blob/main/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L304C31-L304C50
   
   This is thrown a few lines below, without being wrapped.
   
   This means that the current implementation here: 
https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2819
   
   ```
           if (t.getCause() instanceof ResponseException) {
             ResponseException ex = (ResponseException) t.getCause();
   ```
   
   will actually never execute as intended, as the ResponseException is always 
the main exception and never wrapped. 
   
   The consequence of this is that `isRetryableClientException` never returns 
`true` for these type of errors even though it should - these proxy errors are 
typically temporarily and a retry should definitely be done. The fix is easy:
   
   ```
           if (t instanceof ResponseException) {
             ResponseException ex = (ResponseException) t;
   ```
   
   While addressing this issue I encountered that now it is marked retryable 
and ElasticsearchIO does this, using `handleRetry` here: 
https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2896
   
   However, this is no longer executed in a `try/catch` construct and hence, if 
it fails for all attempts, the exception is always thrown, even if 
`throwWriteErrors` was set to false. This MR also addresses this issue.
   
   The current tests did test a `ResponseException` scenario but this was only 
a scenario that triggers a HTTP 4XX Bad Request, no 500 Server Errors. Also, 
since this test did not specify a `RetryConfiguration`, from the output it did 
not become clear which path the exception follows: either just simply being 
caught and forwarded to the output tag 
`org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.Write#FAILED_WRITES`, or 
attempted to retry because it satisfies `isRetryableClientException`. This MR 
adds a test that uses a mocked webserver that always returns a specific HTTP 
error code so that the flow can properly be tested.
   
   Thanks for considering this MR, and please let me know what I can do to 
improve it if necessary.
   
   Question: do I need to create an issue before merging the fix? 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   [![Build python source distribution and 
wheels](https://github.com/apache/beam/actions/workflows/build_wheels.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/actions/workflows/python_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/actions/workflows/java_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/actions/workflows/go_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to