[
https://issues.apache.org/jira/browse/HBASE-29111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925416#comment-17925416
]
Hudson commented on HBASE-29111:
--------------------------------
Results for branch branch-2.5
[build #657 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(x) {color:red}-1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop ${HADOOP_THREE_VERSION} backward compatibility
checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(x) {color:red}-1 source release artifact{color}
-- Something went wrong with this stage, [check relevant console
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657//console].
(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console
output|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/657//console].
> Data loss in table cloned from a snapshot
> -----------------------------------------
>
> Key: HBASE-29111
> URL: https://issues.apache.org/jira/browse/HBASE-29111
> Project: HBase
> Issue Type: Bug
> Components: dataloss, snapshots
> Reporter: Junegunn Choi
> Assignee: Junegunn Choi
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.2, 2.5.12
>
>
> We experienced permanent data loss in a table cloned from a snapshot.
> Here's what we found.
> * If you clone a table from a snapshot that contains split regions and
> reference files, and immediately delete the snapshot, HBase can prematurely
> delete the original HFiles causing data loss.
> h2. How to reproduce
> To quickly reproduce the issue, adjust the cleaner and janitor intervals and
> set HFile TTL to zero. Also disable compaction so that reference files are
> not compacted away. Or, you can put a lot of data so that compaction doesn't
> finish during the test.
> {code:xml}
> <property>
>   <name>hbase.master.cleaner.interval</name>
>   <value>1000</value>
> </property>
> <property>
>   <name>hbase.catalogjanitor.interval</name>
>   <value>1000</value>
> </property>
> <property>
>   <name>hbase.master.hfilecleaner.ttl</name>
>   <value>0</value>
> </property>
> <property>
>   <name>hbase.regionserver.compaction.enabled</name>
>   <value>false</value>
> </property>
> {code}
> And run this code in the HBase shell.
> {code:ruby}
> # Create test table and write some data
> create 't', 'd'
> 10.times do |i|
>   put 't', i, 'd:foo', '_' * 1024
> end
> # Split in the middle and take the snapshot
> split 't', '5'
> snapshot 't', 's'
> # Drop the table and clone it from the snapshot
> disable 't'
> drop 't'
> clone_snapshot 's', 't'
> # Immediately delete the snapshot
> delete_snapshot 's'
> # Try disabling and re-enabling the table
> sleep 2
> disable 't'
> enable 't'
> # java.io.FileNotFoundException: HFileLink locations=[...]
> {code}
> h2. What actually happens
> The user clones a table from a snapshot containing split regions and
> reference files.
> {noformat}
> snapshot.RestoreSnapshotHelper: clone region=a23be88470c13611f6f24f20e0cf00ed
> as a23be88470c13611f6f24f20e0cf00ed in snapshot s
> ...
> regionserver.HRegion: creating {ENCODED => a23be88470c13611f6f24f20e0cf00ed,
> NAME => 't,40000000,1738562472443.a23be88470c13611f6f24f20e0cf00ed.',
> STARTKEY => '40000000', ENDKEY => '80000000', OFFLINE => true, SPLIT =>
> true}, tableDescriptor='t', {TABLE_ATTRIBUTES => {METADATA =>
> {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}, {NAME => 'd',
> INDEX_BLOCK_ENCODING => 'NONE', VERSIONS => '1', KEEP_DELETED_CELLS =>
> 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS =>
> '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false',
> COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536 B (64KB)'},
> regionDir=file:/Users/jg/github/hbase/tmp/hbase
> ...
> snapshot.RestoreSnapshotHelper: finishing restore table regions using
> snapshot=name: "s"
> {noformat}
> And the user deletes the snapshot.
> {noformat}
> snapshot.SnapshotManager: Deleting snapshot: s
> {noformat}
> After a while, CatalogJanitor garbage-collects the split parents as it sees
> no daughter information in the meta table.
> {noformat}
> janitor.CatalogJanitor: Cleaning parent region {ENCODED =>
> a23be88470c13611f6f24f20e0cf00ed, NAME =>
> 't,40000000,1738562472443.a23be88470c13611f6f24f20e0cf00ed.', STARTKEY =>
> '40000000', ENDKEY => '80000000', OFFLINE => true, SPLIT => true}
> janitor.CatalogJanitor: Deleting region a23be88470c13611f6f24f20e0cf00ed
> because daughters -- null, null -- no longer hold references
> {noformat}
> (see the "{{null, null}}" part)
> This causes the HFileLinks to be archived.
> {noformat}
> backup.HFileArchiver: Archived from FileablePath,
> file:/Users/jg/github/hbase/tmp/hbase/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/t=a23be88470c13611f6f24f20e0cf00ed-ecdd6aa22a6146599467839c56767522
> to
> file:/Users/jg/github/hbase/tmp/hbase/archive/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/t=a23be88470c13611f6f24f20e0cf00ed-ecdd6aa22a6146599467839c56767522
> {noformat}
> And the cleaners unanimously agree to delete the original HFile.
> * We have already deleted the snapshot, so SnapshotHFileCleaner won't complain.
> * Because the HFileLink has been archived, HFileLinkCleaner won't complain.
> So the HFile is deleted before the daughter regions manage to rebuild the
> data in it through compaction.
> {noformat}
> cleaner.HFileCleaner: Removing
> file:/Users/jg/github/hbase/tmp/hbase/archive/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/ecdd6aa22a6146599467839c56767522
> {noformat}
> And the data is lost:
> {noformat}
> regionserver.CompactSplit: Compaction selection failed
> region=t,6,1738562566034.8ad3785a3afe89e59b72db5d5d3a1bf5.,
> storeName=8ad3785a3afe89e59b72db5d5d3a1bf5/d, priority=14,
> startTime=1738562622689
> java.io.FileNotFoundException: HFileLink locations=[
>
> file:/Users/jg/github/hbase/tmp/hbase/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/ecdd6aa22a6146599467839c56767522,
>
> file:/Users/jg/github/hbase/tmp/hbase/archive/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/ecdd6aa22a6146599467839c56767522,
>
> file:/Users/jg/github/hbase/tmp/hbase/.tmp/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/ecdd6aa22a6146599467839c56767522,
>
> file:/Users/jg/github/hbase/tmp/hbase/mobdir/data/default/t/a23be88470c13611f6f24f20e0cf00ed/d/ecdd6aa22a6146599467839c56767522
> ]
> {noformat}
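> The cleaner consensus described above can be modeled with a minimal,
> self-contained sketch (plain Java; the class and method names are
> hypothetical illustrations, not the real HBase cleaner-chain API).
> An archived HFile survives only while at least one cleaner delegate
> still vetoes its deletion:

```java
import java.util.Set;

// Toy model of the HFile cleaner chain: a delegate that still "holds a
// reference" vetoes deletion; the file is removed only when no delegate
// objects. Names are hypothetical, for illustration only.
public class CleanerChainModel {

    // SnapshotHFileCleaner analogue: vetoes deletion of files that are
    // still referenced by a live snapshot.
    static boolean snapshotStillReferences(Set<String> snapshotRefs, String hfile) {
        return snapshotRefs.contains(hfile);
    }

    // HFileLinkCleaner analogue: vetoes deletion while an HFileLink to the
    // file still exists in the data directory (i.e. was not archived).
    static boolean linkStillExists(Set<String> liveLinks, String hfile) {
        return liveLinks.contains(hfile);
    }

    static boolean isDeletable(Set<String> snapshotRefs, Set<String> liveLinks,
                               String hfile) {
        // Deleted only when no delegate vetoes.
        return !snapshotStillReferences(snapshotRefs, hfile)
                && !linkStillExists(liveLinks, hfile);
    }

    public static void main(String[] args) {
        String hfile = "ecdd6aa22a6146599467839c56767522";
        // While the snapshot and the link exist, the file is protected.
        System.out.println(isDeletable(Set.of(hfile), Set.of(hfile), hfile)); // false
        // Snapshot deleted and link archived by CatalogJanitor: no veto left.
        System.out.println(isDeletable(Set.of(), Set.of(), hfile)); // true
    }
}
```

> The point of the model: neither cleaner is individually wrong; the loss
> happens because both protections are withdrawn before the daughters have
> materialized the data.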
> h2. Fix
> Make sure to write the split information into the meta table when cloning a
> table. The information was already there; we just didn't use it.
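> The effect of the fix can be sketched as a toy model (plain Java; the row
> layout is simplified and the names are hypothetical, not the real
> MetaTableAccessor API). CatalogJanitor only saw "{{null, null}}" because the
> clone never wrote the daughter columns for the split parent:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a split parent's meta row. Hypothetical names for
// illustration: a real meta row uses info:splitA / info:splitB qualifiers
// to record the daughter regions of a split parent.
public class CloneMetaModel {

    static Map<String, String> cloneParentRow(boolean writeSplitInfo) {
        Map<String, String> row = new HashMap<>();
        row.put("info:regioninfo", "a23be88470c13611f6f24f20e0cf00ed OFFLINE SPLIT");
        if (writeSplitInfo) {
            // The fix: carry the daughter information over when restoring
            // the split parent from the snapshot.
            row.put("info:splitA", "daughterA");
            row.put("info:splitB", "daughterB");
        }
        return row;
    }

    // CatalogJanitor analogue: a split parent whose daughters are unknown
    // ("null, null") looks unreferenced and gets garbage-collected right away.
    static boolean janitorWouldCleanParent(Map<String, String> row) {
        return row.get("info:splitA") == null && row.get("info:splitB") == null;
    }

    public static void main(String[] args) {
        // Before the fix: no daughter info, parent GC'd prematurely.
        System.out.println(janitorWouldCleanParent(cloneParentRow(false))); // true
        // After the fix: daughters visible, parent kept until references clear.
        System.out.println(janitorWouldCleanParent(cloneParentRow(true)));  // false
    }
}
```

> With the daughter columns present, the janitor must first verify that the
> daughters no longer reference the parent's files before cleaning it up.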
> Let me open pull requests on both master and branch-2.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)