egalpin commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1288883681
##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements
RetryPredicate {
this(429);
}
- /** Returns true if the response has the error code for any mutation. */
- private static boolean errorCodePresent(HttpEntity responseEntity, int
errorCode) {
Review Comment:
Thanks @mrkm4ntr for your patience, I have kept you waiting too many times.
This change is definitely interesting. I can see that in some cases, such
as successfully _creating_ a document via _bulk, the http status code is `200`
while the `status` field under `items` in the response body is `201` (updating
the doc results in `status` field being `200`:
```json
{
"took": 383,
"errors": false,
"items": [
{
"update": {
"_index": "foo",
"_type": "_doc",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
}
]
}
```
It's also the case, more importantly, that when one of the operations of the
bulk payload fails, but another succeeds, there can be an http status code of
200 but one of the items can have `status` field with non-200 code:
```json
{
"took": 19,
"errors": true,
"items": [
{
"update": {
"_index": "foo",
"_type": "_doc",
"_id": "2",
"_version": 7,
"result": "updated",
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"_seq_no": 7,
"_primary_term": 1,
"status": 200
}
},
{
"update": {
"_index": "foo",
"_type": "_doc",
"_id": "2",
"status": 400,
"error": {
"type": "illegal_argument_exception",
"reason": "failed to execute script",
"caused_by": {
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"if(ctx._source.group !=== null) {
ctx._source.gro ...",
" ^---- HERE"
],
"script": "if(ctx._source.group !=== null) {
ctx._source.group = params.id % 2 } else { ctx._source.group = 0 }",
"lang": "painless",
"position": {
"offset": 24,
"start": 0,
"end": 49
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "invalid sequence of tokens near
['='].",
"caused_by": {
"type": "no_viable_alt_exception",
"reason": "no_viable_alt_exception: null"
}
}
}
}
}
}
]
}
```
So the status code alone cannot be "trusted" to inform of all failures,
since there could be false negatives. However, it seems that the http status
code can be used as a first check: if the http status code is >=400, we _know_
a failure has occurred and we don't need to iterate through the `items`
necessarily.
Checking the http status code first might be a viable path forward for
properly handling the 429 case.
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]