[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-9397:
----------------------------
    Attachment: HBASE-9397-trunk.patch
                HBASE-9397-0.94.patch

> Snapshots with the same name are allowed to proceed concurrently
> ----------------------------------------------------------------
>
>                 Key: HBASE-9397
>                 URL: https://issues.apache.org/jira/browse/HBASE-9397
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.2, 0.94.11
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 0.94.12, 0.96.0
>
>         Attachments: HBASE-9397-0.94.patch, HBASE-9397-trunk.patch
>
>
> Snapshots with the same name (but on different tables) are allowed to proceed concurrently.
> This seems to be a loophole created by allowing multiple snapshots (on different tables) to run concurrently.
> There are two checks in SnapshotManager, but they fail to catch this particular case.
> In isSnapshotCompleted(), we only check the completed snapshot directory.
> In isTakingSnapshot(), we only check for the same table name.
> The end result is that concurrently running snapshots with the same name overlap and interfere with each other, for example by cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
> {code}
> 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
>         at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321)
>         at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123)
> {code}
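
The gap described in this issue is that the in-progress check looks only at the table name, so two snapshots that share a name but target different tables both pass. One way to close it might be to also reject a request when any running snapshot already uses the requested name. The following is a minimal sketch of that idea only. SnapshotNameGuard and all of its methods are hypothetical names invented for illustration; this is not the code in SnapshotManager and not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not HBase's SnapshotManager.
class SnapshotNameGuard {
    // table name -> name of the snapshot currently running on that table
    private final Map<String, String> inProgressByTable = new HashMap<>();

    // Mirrors the existing per-table check described in the issue.
    synchronized boolean isTakingSnapshotOnTable(String table) {
        return inProgressByTable.containsKey(table);
    }

    // The additional check: is a snapshot with this name running on any table?
    synchronized boolean isTakingSnapshotNamed(String snapshotName) {
        return inProgressByTable.containsValue(snapshotName);
    }

    // Registers the snapshot and returns true only if neither check trips.
    synchronized boolean tryStart(String snapshotName, String table) {
        if (isTakingSnapshotOnTable(table) || isTakingSnapshotNamed(snapshotName)) {
            return false;
        }
        inProgressByTable.put(table, snapshotName);
        return true;
    }

    // Called when the snapshot on this table completes or fails.
    synchronized void finish(String table) {
        inProgressByTable.remove(table);
    }
}
```

With this guard, tryStart("mysnapshot", "TestTable") succeeds, while a concurrent tryStart("mysnapshot", "OtherTable") is rejected until finish("TestTable") is called, so the two requests never share the .tmp working directory. Checking and registering inside one synchronized method is a deliberate choice in the sketch: it keeps two requests from both passing the check before either is recorded.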