ss77892 opened a new pull request, #9711:
URL: https://github.com/apache/ozone/pull/9711

   ## What changes were proposed in this pull request?
   
   In [HDDS-11558](https://issues.apache.org/jira/browse/HDDS-11558), we 
made retries idempotent: when a client retries a request, we always want 
to return the existing reply. However, that change doesn't cover the case where 
a failed request landed in the retry cache. In this case, we can't build the 
response properly, and an NPE is thrown.
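
To illustrate the failure mode, here is a minimal, self-contained sketch of a retry cache that distinguishes a cached reply from a cached failure. It is not the code in this patch and does not use the Ozone or Ratis APIs: `RetryCacheSketch`, `CachedEntry`, and `handle` are hypothetical names, and re-executing a request whose first attempt failed is only one assumed way to handle that case.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical, simplified model of a retry cache (not Ozone/Ratis code). */
public class RetryCacheSketch {

  /** Outcome of an earlier attempt: either a reply or the failure that ended it. */
  static final class CachedEntry {
    final String reply;             // null when the attempt failed
    final RuntimeException failure; // null when the attempt succeeded

    CachedEntry(String reply, RuntimeException failure) {
      this.reply = reply;
      this.failure = failure;
    }
  }

  private final Map<Long, CachedEntry> cache = new ConcurrentHashMap<>();

  /**
   * Returns the cached reply for a retry of a successful request.
   * A cached failure has no reply to return, so it is treated like a
   * cache miss (assumption) instead of dereferencing the missing reply.
   */
  public String handle(long callId, Supplier<String> execute) {
    CachedEntry entry = cache.get(callId);
    if (entry != null && entry.failure == null) {
      return entry.reply; // idempotent retry: reuse the existing reply
    }
    try {
      String reply = execute.get();
      cache.put(callId, new CachedEntry(reply, null));
      return reply;
    } catch (RuntimeException e) {
      // Remember the failure so a later retry can recognize it.
      cache.put(callId, new CachedEntry(null, e));
      throw e;
    }
  }
}
```

In this sketch, a retry of a successful call gets the stored reply, while a retry of a call that failed (for example, because no leader was available) runs again rather than reading a reply that was never recorded.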
   
   ## What is the link to the Apache JIRA?
   https://issues.apache.org/jira/browse/HDDS-13621
   ## How was this patch tested?
   Ran `ozone freon ommg` on a cluster where the OM process continuously 
suffers from JVM pauses. Without the patch, retried operations never 
succeeded, so the number of correct writes was always lower than expected. With 
the patch, the success rate of retries was always 100%.
   I attempted to write a unit test, but found it quite challenging to 
reproduce the scenario: a pause occurs after the request passes the check 
for the active leader, so the request ends up in the RetryCache but 
fails because no leader is available.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

