egalpin commented on code in PR #26424:
URL: https://github.com/apache/beam/pull/26424#discussion_r1288883681


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -1224,26 +1241,9 @@ static class DefaultRetryPredicate implements 
RetryPredicate {
         this(429);
       }
 
-      /** Returns true if the response has the error code for any mutation. */
-      private static boolean errorCodePresent(HttpEntity responseEntity, int 
errorCode) {

Review Comment:
   Thanks @mrkm4ntr for your patience, I have kept you waiting too many times.
   
   This change is definitely interesting.  I can see that in some cases, such 
as successfully _creating_ a document via _bulk, the http status code is `200` 
while the `status` field under `items` in the response body is `201` (updating 
the doc results in `status` field being `200`:
   
   ```json
   {
       "took": 383,
       "errors": false,
       "items": [
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "_version": 1,
                   "result": "created",
                   "_shards": {
                       "total": 2,
                       "successful": 1,
                       "failed": 0
                   },
                   "_seq_no": 0,
                   "_primary_term": 1,
                   "status": 201
               }
           }
       ]
   }
   ```
   
   It's also the case, more importantly, that when one of the operations of the 
bulk payload fails, but another succeeds, there can be an http status code of 
200 but one of the items can have `status` field with non-200 code:
   
   ```json
   {
       "took": 19,
       "errors": true,
       "items": [
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "_version": 7,
                   "result": "updated",
                   "_shards": {
                       "total": 2,
                       "successful": 2,
                       "failed": 0
                   },
                   "_seq_no": 7,
                   "_primary_term": 1,
                   "status": 200
               }
           },
           {
               "update": {
                   "_index": "foo",
                   "_type": "_doc",
                   "_id": "2",
                   "status": 400,
                   "error": {
                       "type": "illegal_argument_exception",
                       "reason": "failed to execute script",
                       "caused_by": {
                           "type": "script_exception",
                           "reason": "compile error",
                           "script_stack": [
                               "if(ctx._source.group !=== null) { 
ctx._source.gro ...",
                               "                        ^---- HERE"
                           ],
                           "script": "if(ctx._source.group !=== null) { 
ctx._source.group = params.id % 2 } else { ctx._source.group = 0 }",
                           "lang": "painless",
                           "position": {
                               "offset": 24,
                               "start": 0,
                               "end": 49
                           },
                           "caused_by": {
                               "type": "illegal_argument_exception",
                               "reason": "invalid sequence of tokens near 
['='].",
                               "caused_by": {
                                   "type": "no_viable_alt_exception",
                                   "reason": "no_viable_alt_exception: null"
                               }
                           }
                       }
                   }
               }
           }
       ]
   }
   ```
   
   So the status code alone cannot be "trusted" to inform of all failures, 
since there could be false negatives.  However, it seems that the http status 
code can be used as a first check: if the http status code is >=400, we _know_ 
a failure has occurred and we don't need to iterate through the `items` 
necessarily.  
   
   Checking the http status code first might be a viable path forward for 
properly handling the 429 case.
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to