[ https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673171#comment-16673171 ]

Ted Yu commented on HBASE-21387:
--------------------------------

From https://builds.apache.org/job/PreCommit-HBASE-Build/14932/console :
{code}
00:38:23 +1 overall
00:38:23 
00:38:23 | Vote |       Subsystem |  Runtime   | Comment
00:38:23 ============================================================================
00:38:23 |   0  |         reexec  |   0m 11s   | Docker mode activated. 
00:38:23 |   0  |          patch  |   0m  2s   | The patch file was not named according 
00:38:23 |      |                 |            | to hbase's naming conventions. Please
00:38:23 |      |                 |            | see
00:38:23 |      |                 |            | https://yetus.apache.org/documentation/0.
00:38:23 |      |                 |            | 8.0/precommit-patchnames for
00:38:23 |      |                 |            | instructions.
00:38:23 |      |                 |            | Prechecks 
00:38:23 |  +1  |      hbaseanti  |   0m  0s   | Patch does not have any anti-patterns. 
00:38:23 |  +1  |        @author  |   0m  0s   | The patch does not contain any @author 
00:38:23 |      |                 |            | tags.
00:38:23 |  -0  |     test4tests  |   0m  0s   | The patch doesn't appear to include any 
00:38:23 |      |                 |            | new or modified tests. Please justify
00:38:23 |      |                 |            | why no new tests are needed for this
00:38:23 |      |                 |            | patch. Also please list what manual
00:38:23 |      |                 |            | steps were performed to verify this
00:38:23 |      |                 |            | patch.
00:38:23 |      |                 |            | master Compile Tests 
00:38:23 |  +1  |     mvninstall  |   4m 49s   | master passed 
00:38:23 |  +1  |        compile  |   1m 46s   | master passed 
00:38:23 |  +1  |     checkstyle  |   1m  7s   | master passed 
00:38:23 |  +1  |     shadedjars  |   4m  2s   | branch has no errors when building our 
00:38:23 |      |                 |            | shaded downstream artifacts.
00:38:23 |  +1  |       findbugs  |   2m  1s   | master passed 
00:38:23 |  +1  |        javadoc  |   0m 30s   | master passed 
00:38:23 |      |                 |            | Patch Compile Tests 
00:38:23 |  +1  |     mvninstall  |   4m 45s   | the patch passed 
00:38:23 |  +1  |        compile  |   1m 50s   | the patch passed 
00:38:23 |  +1  |          javac  |   1m 50s   | the patch passed 
00:38:23 |  +1  |     checkstyle  |   1m  4s   | the patch passed 
00:38:23 |  +1  |     whitespace  |   0m  0s   | The patch has no whitespace issues. 
00:38:23 |  +1  |     shadedjars  |   4m  6s   | patch has no errors when building our 
00:38:23 |      |                 |            | shaded downstream artifacts.
00:38:24 |  +1  |    hadoopcheck  |   9m 53s   | Patch does not cause any errors with 
00:38:24 |      |                 |            | Hadoop 2.7.4 or 3.0.0.
00:38:24 |  +1  |       findbugs  |   2m 11s   | the patch passed 
00:38:24 |  +1  |        javadoc  |   0m 29s   | the patch passed 
00:38:24 |      |                 |            | Other Tests 
00:38:24 |  +1  |           unit  | 128m 21s   | hbase-server in the patch passed. 
00:38:24 |  +1  |     asflicense  |   0m 25s   | The patch does not generate ASF License 
00:38:24 |      |                 |            | warnings.
00:38:24 |      |                 | 168m  0s   | 
00:38:24 
00:38:24 
00:38:24 || Subsystem || Report/Notes ||
00:38:24 ============================================================================
00:38:24 | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
00:38:24 | JIRA Issue | HBASE-21387 |
00:38:24 | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946617/21387.v3.txt |
{code}

> Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21387
>                 URL: https://issues.apache.org/jira/browse/HBASE-21387
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>              Labels: snapshot
>         Attachments: 21387.v1.txt, 21387.v2.txt, 21387.v3.txt
>
>
> In a recent customer report, ExportSnapshot failed with:
> {code}
> 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] snapshot.SnapshotReferenceUtil: Can't find hfile: 44f6c3c646e84de6a63fe30da4fcb3aa in the real (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) or archive (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) directory for the primary table. 
> {code}
> We found the following in the log:
> {code}
> 2018-10-09 18:54:23,675 DEBUG [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] cleaner.HFileCleaner: Removing: hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa from archive
> {code}
> The root cause is a race condition in the handling of in-progress snapshots between refreshCache() and getUnreferencedFiles().
> refreshCache() has two callers: RefreshCacheTask#run and SnapshotHFileCleaner.
> Consider this check in refreshCache():
> {code}
>       if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) {
> {code}
> It excludes only the temporary directory, not in-progress snapshots.
> Suppose that when RefreshCacheTask runs refreshCache(), the SnapshotDirectoryInfo for an in-progress snapshot does not yet include all of its store files, leaving a hole in the cache.
> When SnapshotHFileCleaner later calls getUnreferencedFiles(), it sees that lastModifiedTime is up to date, so the cache is not rebuilt and the cleaner goes on to check the in-progress snapshots. By then, however, the snapshot has completed and is no longer in progress, so the store files missing from the cache are deemed unreferenced. This is how the HFileCleaner came to remove the archived hfile shown in the log above.
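To make the gap in the refreshCache() check concrete, here is a minimal, hypothetical Java sketch (not HBase code). It assumes the temporary directory is named ".tmp" and that, as the description implies, an in-progress snapshot can appear under the snapshot root under its own name. The directory names, the inProgress set, and both method names are made up for illustration; the second method is shown only for contrast and is not the attached patch.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of the directory check in refreshCache(); not HBase code.
class SnapshotDirFilterSketch {
  // Assumed value of SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME.
  static final String SNAPSHOT_TMP_DIR_NAME = ".tmp";

  // Mirrors the quoted condition: only the temp dir is skipped.
  static List<String> scannedByCurrentCheck(List<String> dirNames) {
    return dirNames.stream()
        .filter(name -> !name.equals(SNAPSHOT_TMP_DIR_NAME))
        .collect(Collectors.toList());
  }

  // For contrast only (hypothetical): also skip snapshots known to be in progress.
  static List<String> scannedSkippingInProgress(List<String> dirNames, Set<String> inProgress) {
    return dirNames.stream()
        .filter(name -> !name.equals(SNAPSHOT_TMP_DIR_NAME))
        .filter(name -> !inProgress.contains(name))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> dirs = List.of(".tmp", "snap-done", "snap-in-progress");
    // The in-progress snapshot is still scanned by the current-style check.
    System.out.println(scannedByCurrentCheck(dirs));
    System.out.println(scannedSkippingInProgress(dirs, Set.of("snap-in-progress")));
  }
}
```

With the current-style check, the partially written "snap-in-progress" directory is scanned along with "snap-done", which is how a partial view of its files can end up in the cache.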

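The interleaving described in the issue can be replayed in a small single-threaded Java model. This is a sketch under stated assumptions, not HBase's SnapshotFileCache: the class, its method signatures, and the file names are hypothetical; there is no HDFS, locking, or manifest parsing; and the snapshot directory's modification time is assumed not to change between the two calls, matching the "lastModifiedTime is up to date" observation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-threaded model of the race in HBASE-21387.
// Not HBase's SnapshotFileCache: no HDFS, no locking, no manifest parsing.
class SnapshotCacheRaceSketch {
  private final Set<String> cache = new HashSet<>();
  private long lastModifiedTime = Long.MIN_VALUE;

  // Simplified refreshCache(): rebuild only if the snapshot dir looks newer.
  // filesVisibleInSnapshots stands in for what a rebuild would read from disk now.
  void refreshCache(long snapshotDirModTime, Set<String> filesVisibleInSnapshots) {
    if (snapshotDirModTime <= lastModifiedTime) {
      return; // considered up to date, so any hole in the cache stays
    }
    cache.clear();
    cache.addAll(filesVisibleInSnapshots);
    lastModifiedTime = snapshotDirModTime;
  }

  // Simplified getUnreferencedFiles(): refresh at most once, then consult the
  // cache and the files of snapshots that are still in progress.
  List<String> getUnreferencedFiles(List<String> files, long snapshotDirModTime,
      Set<String> filesVisibleInSnapshots, Set<String> inProgressFiles) {
    List<String> unreferenced = new ArrayList<>();
    boolean refreshed = false;
    for (String file : files) {
      if (!refreshed && !cache.contains(file)) {
        refreshCache(snapshotDirModTime, filesVisibleInSnapshots);
        refreshed = true;
      }
      if (cache.contains(file) || inProgressFiles.contains(file)) {
        continue;
      }
      unreferenced.add(file);
    }
    return unreferenced;
  }

  public static void main(String[] args) {
    SnapshotCacheRaceSketch model = new SnapshotCacheRaceSketch();
    long dirModTime = 100L; // assumed unchanged between steps 1 and 3

    // 1. RefreshCacheTask runs while the snapshot has recorded only hfile-A.
    model.refreshCache(dirModTime, Set.of("hfile-A"));

    // 2. The snapshot completes referencing hfile-A and hfile-B;
    //    nothing is in progress any more.
    Set<String> completedSnapshotFiles = Set.of("hfile-A", "hfile-B");
    Set<String> inProgressFiles = Set.of();

    // 3. The cleaner asks which archived files are unreferenced.
    List<String> unreferenced = model.getUnreferencedFiles(
        List.of("hfile-A", "hfile-B"), dirModTime, completedSnapshotFiles, inProgressFiles);
    System.out.println(unreferenced); // [hfile-B]
  }
}
```

Running main prints [hfile-B]: the file that fell into the cache hole is reported as unreferenced even though the completed snapshot references it. If dirModTime is advanced before step 3, the model rebuilds the cache and reports nothing.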


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
