[ https://issues.apache.org/jira/browse/HBASE-19441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406687#comment-16406687 ]
Josh Elser commented on HBASE-19441: ------------------------------------ {code:java} [WARNING] /testptch/hbase/hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupManager.java:[87,19] [MissingOverride] run implements method in Runnable; expected @Override [WARNING] /testptch/hbase/hbase-backup/src/test/java/org/apache/hadoop/hbase/backup/TestBackupManager.java:[120,24] [ThreadJoinLoop] Thread.join needs to be surrounded by a loop until it succeeds, as in Uninterruptibles.joinUninterruptibly.{code} Also please fix the javac warnings {code:java} + throw new IOException(); ... + } catch (IOException | InterruptedException e) { + fail("Unexpected exception: "+ e.getMessage());{code} Can you please add a message to the IOException thrown in the test? If this test would fail for some reason, we'd get a really non-intuitive explanation. {noformat} + assertTrue(startTimes.get(1) - startTimes.get(0) >= sleepTime); + assertTrue(stopTimes.get(1) - stopTimes.get(0) >= sleepTime);{noformat} Would be nice to print out the {{startTimes}} and {{stopTimes}} in the respective messages for these assertions. Otherwise, this looks fine to me. A nice improvement over what is presently in master. Good work, Vlad. > Implement retry logic around starting exclusive backup operation > ---------------------------------------------------------------- > > Key: HBASE-19441 > URL: https://issues.apache.org/jira/browse/HBASE-19441 > Project: HBase > Issue Type: Improvement > Components: backup&restore > Reporter: Josh Elser > Assignee: Vladimir Rodionov > Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-19441-v1.patch, HBASE-19441-v2.patch > > > {quote} > Specifically, the client does a checkAndPut to specifics coordinates in the > backup table and throws an exception when that fails. Remember that backups > are client driven (per some design review from a long time ago), so queuing > is tough to reason about (we have no "centralized" execution system to use). > At a glance, it seems pretty straightforward to add some retry/backoff > semantics to BackupSystemTable#startBackupExclusiveOperation(). > {quote} > While we are in a state in which backup operations cannot be executed in > parallel, it would be nice to provide some retry logic + configuration. This > would alleviate users from having to build this themselves. -- This message was sent by Atlassian JIRA (v7.6.3#76005)