[
https://issues.apache.org/jira/browse/HBASE-22190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830373#comment-16830373
]
Duo Zhang commented on HBASE-22190:
-----------------------------------
Even hit this...
{noformat}
2019-04-30 22:58:08,151 WARN [snapshot-hfile-cleaner-cache-refresher] snapshot.SnapshotFileCache$RefreshCacheTask(294): Failed to refresh snapshot hfile cache!
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: unable to parse data manifest Message missing required fields: table_schema
  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.readDataManifest(SnapshotManifest.java:561)
  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.load(SnapshotManifest.java:389)
  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.open(SnapshotManifest.java:142)
  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.visitTableStoreFiles(SnapshotReferenceUtil.java:113)
  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:348)
  at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:331)
  at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:102)
  at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.refreshCache(SnapshotFileCache.java:269)
  at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.access$0(SnapshotFileCache.java:216)
  at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache$RefreshCacheTask.run(SnapshotFileCache.java:292)
  at java.util.TimerThread.mainLoop(Timer.java:555)
  at java.util.TimerThread.run(Timer.java:505)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: table_schema
  at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:79)
  at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:68)
  at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86)
  at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:91)
  at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
  at org.apache.hbase.thirdparty.com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:335)
  at org.apache.hadoop.hbase.shaded.protobuf.generated.SnapshotProtos$SnapshotDataManifest.parseFrom(SnapshotProtos.java:5816)
  at org.apache.hadoop.hbase.snapshot.SnapshotManifest.readDataManifest(SnapshotManifest.java:557)
  ... 11 more
{noformat}
I think the problem is a race between the SnapshotFileCache refresh and
snapshot file generation. The directory for the snapshot may already have been
created while the snapshot manifest is not ready yet. If we try to load the
snapshot into the cache at that point, we just see an empty file list, since
the manifest has not been generated yet, and we may also see the above
exception if the manifest is half written...
What's more, we update the recorded modification time before actually
loading anything, and when we hit an exception like the one above, we do
not reset the recorded modification time. This can also leave the cache in an
incorrect state, and we have no chance to update it until a new
snapshot comes in...
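
The ordering issue with the recorded modification time can be illustrated with a minimal sketch. This is not the actual SnapshotFileCache code: the class name, method names and the loader callback below are all hypothetical, and the sketch only shows one possible ordering, where the modification time is recorded after the load has succeeded, so that a failed load is retried on the next refresh.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical cache used for illustration only; not HBase's SnapshotFileCache. */
class MtimeGuardedCache {
    // Modification time of the last load that completed without an exception.
    private long lastModified = -1L;
    private List<String> files = new ArrayList<>();

    /**
     * Reloads the file list if dirMtime is newer than the recorded time.
     * The recorded time is updated only after the loader returns normally,
     * so an exception from the loader leaves the cache state untouched.
     *
     * @return true if the cache was reloaded
     */
    boolean refresh(long dirMtime, Supplier<List<String>> loader) {
        if (dirMtime <= lastModified) {
            return false; // nothing newer than the last successful load
        }
        List<String> loaded = loader.get(); // may throw, e.g. on a half-written manifest
        files = new ArrayList<>(loaded);
        lastModified = dirMtime; // commit the new time only after success
        return true;
    }

    List<String> files() {
        return files;
    }

    long lastModified() {
        return lastModified;
    }
}
```

With this ordering, a refresh that fails at modification time T does not record T, so the next refresh that sees the same T attempts the load again instead of skipping it. It does not address the first problem (reading a manifest that is missing or half written), which would need separate handling.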
I think this is a very critical bug, as snapshots are sometimes used to retain
critical data which may be needed for recovery. If they are not stable then...
> TestSnapshotFromMaster is flakey
> --------------------------------
>
> Key: HBASE-22190
> URL: https://issues.apache.org/jira/browse/HBASE-22190
> Project: HBase
> Issue Type: Task
> Reporter: Duo Zhang
> Priority: Blocker
>
> And it seems that it is not only a test issue: we do delete the files under
> the archive directory, which is incorrect.
> Need to find out why; this may be a serious bug.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)