steveloughran opened a new pull request, #6716: URL: https://github.com/apache/hadoop/pull/6716
Improve the resilience of the task commit save and rename operations with retries.

* `save()` is retried for up to 5 attempts, with a 500 millisecond sleep between them. This is not configurable. Open issue: should we make it configurable?
* Split `delete(path, recursive)` into `deleteFile` and `rmdir` so that each has separate statistics.

The test simulation expands to:

* Support recovery through a countdown of calls to fail.
* Simulate timeout before *and after* rename calls.

This is based on #6596 but skips the rate limiting logic spanning common and azure; it only contains changes in the manifest committer, which makes it easier to backport.

### How was this patch tested?

* manual test of new tests
* full test suite left to yetus
* azure test run in progress.

### For code changes:

- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
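
For readers who have not seen the patch, the retry behaviour described in this PR (a fixed number of attempts with a fixed sleep between them) and the countdown-style failure simulation used in the tests might look roughly like the sketch below. This is an illustration only, not the code from the pull request: `SaveRetrySketch`, `IOOperation`, `retry` and `failingOperation` are hypothetical names, and the real manifest committer classes are not used.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; all names here are hypothetical, not from the patch. */
final class SaveRetrySketch {

    /** An operation that may fail with an IOException (hypothetical interface). */
    @FunctionalInterface
    interface IOOperation<T> {
        T apply() throws IOException;
    }

    /**
     * Invoke the operation up to {@code attempts} times, sleeping
     * {@code sleepMillis} between failed attempts.
     * The last failure is rethrown once all attempts are used up.
     */
    static <T> T retry(int attempts, long sleepMillis, IOOperation<T> operation)
            throws IOException {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try {
                return operation.apply();
            } catch (IOException e) {
                lastFailure = e;
                if (attempt < attempts) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new InterruptedIOException("interrupted during retry sleep");
                    }
                }
            }
        }
        throw lastFailure;
    }

    /**
     * Fault injection by countdown: the returned operation fails on its first
     * {@code failures} calls and succeeds afterwards. Every call is counted
     * in {@code calls} so a test can assert on the number of attempts.
     */
    static IOOperation<String> failingOperation(int failures, AtomicInteger calls) {
        AtomicInteger countdown = new AtomicInteger(failures);
        return () -> {
            calls.incrementAndGet();
            if (countdown.getAndDecrement() > 0) {
                throw new IOException("simulated failure");
            }
            return "saved";
        };
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger calls = new AtomicInteger();
        // 5 attempts and a 500 ms sleep, matching the values quoted in the PR text.
        String result = retry(5, 500, failingOperation(2, calls));
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

With two simulated failures, the third call succeeds, so `main` reports three calls. Hard-coding the attempt count and sleep as arguments mirrors the "no configuration" choice in the description; making them configurable would only change where those two values come from.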