[ https://issues.apache.org/jira/browse/FLINK-35546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-35546:
-----------------------------------
    Labels: pull-request-available  (was: )

> Elasticsearch 8 connector fails fast for non-retryable bulk request items
> -------------------------------------------------------------------------
>
>                 Key: FLINK-35546
>                 URL: https://issues.apache.org/jira/browse/FLINK-35546
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / ElasticSearch
>            Reporter: Mingliang Liu
>            Priority: Major
>              Labels: pull-request-available
>
> Discussion thread:
> [https://lists.apache.org/thread/yrf0mmbch0lhk3rgkz94fr0x5qz2417l]
> {quote}
> Currently the Elasticsearch 8 connector retries all items if the request
> fails as a whole, and retries failed items if the request has partial
> failures
> [[1|https://github.com/apache/flink-connector-elasticsearch/blob/5d1f8d03e3cff197ed7fe30b79951e44808b48fe/flink-connector-elasticsearch8/src/main/java/org/apache/flink/connector/elasticsearch/sink/Elasticsearch8AsyncWriter.java#L152-L170]\].
> I think these infinite retries might be problematic in cases where
> retrying can never succeed. For example, if the response is 400
> (bad request) or 404 (not found), retries do not help. If there are too many
> non-retriable failed items, new requests will be processed less efficiently.
> In extreme cases, the pipeline may stall if the in-flight requests are
> occupied by those failed items.
> FLIP-451 proposes a timeout for retrying, which helps with unacknowledged
> requests, but it does not address the case where the request gets processed
> and the failed items keep failing no matter how many times we retry. Correct
> me if I'm wrong.
> One opinionated option is to fail fast for non-retriable errors like 400 /
> 404 and to drop items for 409. Or we can allow users to configure "drop/fail"
> behavior for non-retriable errors. I prefer the latter.
> I checked how Logstash ingests data to Elasticsearch, and it takes a similar
> approach for non-retriable errors
> [[2|https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/main/lib/logstash/plugin_mixins/elasticsearch/common.rb#L283-L304]\].
> In my day job, we have a dead-letter queue in AsyncSinkWriter for failed
> entries that exhaust retries. I guess that is too specific to our setup and
> seems like overkill here for the Elasticsearch connector.
> {quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
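
For illustration only, here is a minimal Java sketch of the configurable "drop/fail" handling described in the quoted proposal. It is not code from the connector or from any pull request: `BulkItemFailurePolicy`, `Action`, `NonRetriableMode`, and `classify` are hypothetical names. It reads the proposal as follows: 409 items are always dropped, 400 / 404 items follow the user-configured mode, and, as an assumption not stated in the issue, any other failure status is treated as retriable.

```java
// Hypothetical sketch: none of these names exist in flink-connector-elasticsearch.
final class BulkItemFailurePolicy {

    /** What the writer would do with a single item of a bulk response. */
    enum Action { SUCCESS, RETRY, DROP, FAIL }

    /** User-configurable handling for non-retriable errors ("drop/fail"). */
    enum NonRetriableMode { DROP, FAIL }

    private final NonRetriableMode mode;

    BulkItemFailurePolicy(NonRetriableMode mode) {
        this.mode = mode;
    }

    /** Maps the HTTP status of one bulk response item to an action. */
    Action classify(int status) {
        if (status >= 200 && status < 300) {
            return Action.SUCCESS;
        }
        if (status == 409) {
            // Conflict: the issue suggests dropping these items.
            return Action.DROP;
        }
        if (status == 400 || status == 404) {
            // The non-retriable examples named in the issue; behavior is configurable.
            return mode == NonRetriableMode.FAIL ? Action.FAIL : Action.DROP;
        }
        // Assumption: every other failure might succeed on a later attempt.
        return Action.RETRY;
    }
}
```

Under these assumptions, a writer could call `classify` on each failed item of a bulk response, re-queue only the `RETRY` items, silently skip `DROP` items, and raise an exception on the first `FAIL` item instead of retrying it forever.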