keith-turner commented on PR #2967: URL: https://github.com/apache/accumulo/pull/2967#issuecomment-1264478349
@ctubbsii @cshannon @dlmarion I did three experiments in [406b9f8](https://github.com/apache/accumulo/pull/2967/commits/406b9f84e96faf936e071797e125f11b8118d00b) where I tried different ways of limiting the retries and returning the final map of props. Did this based on reading to the feedback from yesterday just to see how different approaches looked. I did the following : * I made instanceOperations().modifyProperties() take a timeout duration. * I renamed namespaceOperations().modifyProperties() to modifyPropertiesAsync() and it returns a CompletableFuture. It creates a cached thread pool (can't shut the thread pool down before method exits so the cached thread pool is good option) to execute the task asynchronously. I was wary of using the common pool from fork join because it shared JVM wide and is intended for computational task and not I/O task. I added Async to the name to make it clear that the operation may not be complete when the method returns. * I made tableOperations().modifyProperties() return a CompleteableFuture and take an Executor as parameter that it uses to execute the task. This avoids the Accumulo code having to create and manager an executor to run the async task. I made all of the operations return a `Map<String, String>`. The returned map is the final map that was accepted on the last retry. I think this will be useful for conditional updates as it allows the user to inspect what the final result was w/o having to do anything odd in their lambda like setting an atomic reference. To get a feel for how all of these changes would look when used I added three new test that make two conditional updates using each API. These three test were added at the end of PropStoreConfigIT.java. After making all these change I was digging a bit deeper in the code and found that the current code that is committed will already retry forever. The currently committed code calls [executeVoidTableCommand()](https://github.com/apache/accumulo/blob/8a7ed07d74a0ba0d127a37a5fced23d34a270dfe/core/src/main/java/org/apache/accumulo/core/rpc/clients/ManagerThriftClient.java#L101-L136) which retries forever when there are problems communicating with the Manager. After seeing this I think retrying forever is a good option in this PR for the following reasons : * The code to communicate with the manager retries forever and many table operations use this. * When things are working well (Manager and Zookeeper are healthy) I think its extremely unlikely that any modifyProperties call would retry forever given they are doing exponential backoff. When that code does have a high chance of retrying forever is when ZK and/or manager are unhealthy and in some code already retries forever in those cases. It would still be nice to return the final Map that was accepted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
