Jan Van Besien created HBASE-29774:
--------------------------------------
Summary: incremental backup fails on empty WAL files
Key: HBASE-29774
URL: https://issues.apache.org/jira/browse/HBASE-29774
Project: HBase
Issue Type: Bug
Components: backup&restore
Affects Versions: 2.6.4
Reporter: Jan Van Besien
Incremental backup fails in
{{IncrementalTableBackupClient.convertWALsToHFiles}} when one of the WAL
files to be converted is empty (zero bytes).
The map tasks fail like this:
{code:java}
2025-12-12 06:22:21,785 INFO [main] org.apache.hadoop.hbase.mapreduce.WALInputFormat: Opening hdfs://hdfsns/hbase/hbase/oldWALs/hbase-region-0.hbase-region.qa03-shared.svc.cluster.local%2C16020%2C1764740248035.hbase-region-0.hbase-region.qa03-shared.svc.cluster.local%2C16020%2C1764740248035.regiongroup-0.1764754214409 for hdfs://hdfsns/hbase/hbase/oldWALs/hbase-region-0.hbase-region.qa03-shared.svc.cluster.local%2C16020%2C1764740248035.hbase-region-0.hbase-region.qa03-shared.svc.cluster.local%2C16020%2C1764740248035.regiongroup-0.1764754214409 (-9223372036854775808:9223372036854775807) length:0
2025-12-12 06:22:21,810 INFO [main] org.apache.hadoop.hbase.mapreduce.WALInputFormat: Closing reader
2025-12-12 06:22:21,811 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2025-12-12 06:22:21,815 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2025-12-12 06:22:21,904 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child :
org.apache.hadoop.hbase.regionserver.wal.WALHeaderEOFException: EOF while reading PB WAL magic
    at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufWALReader.readHeader(AbstractProtobufWALReader.java:221)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufWALReader.init(AbstractProtobufWALReader.java:147)
    at org.apache.hadoop.hbase.wal.WALFactory.createStreamReader(WALFactory.java:360)
    at org.apache.hadoop.hbase.wal.WALFactory.createStreamReader(WALFactory.java:481)
    at org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.openReader(WALInputFormat.java:162)
    at org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.openReader(WALInputFormat.java:204)
    at org.apache.hadoop.hbase.mapreduce.WALInputFormat$WALRecordReader.initialize(WALInputFormat.java:197)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:561)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.EOFException
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:210)
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:179)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufWALReader.readHeader(AbstractProtobufWALReader.java:219)
    ... 14 more
{code}
The file mentioned in the above log snippet is indeed zero bytes.
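For reference, the failure is straightforward to reproduce outside the backup job. A minimal sketch follows; the file path is made up, and the exact {{WALFactory.createStreamReader}} overload is an assumption based on the stack trace above:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.wal.WALFactory;

public class EmptyWALRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path; any zero-byte file will do.
    Path emptyWal = new Path("/tmp/empty-wal-repro");
    fs.create(emptyWal, true).close(); // zero bytes, so no PB WAL magic to read

    // Expected to throw WALHeaderEOFException ("EOF while reading PB WAL magic"),
    // matching the map task failure above.
    WALFactory.createStreamReader(fs, emptyWal, conf).close();
  }
}
{code}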
The calling code fails like this:
{code}
2025-12-12 06:22:45,140 ERROR org.apache.hadoop.hbase.backup.impl.TableBackupClient: Unexpected exception in incremental-backup: incremental copy backup_1765519365442WAL Player failed
java.io.IOException: WAL Player failed
    at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.walToHFiles(IncrementalTableBackupClient.java:448)
    at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.convertWALsToHFiles(IncrementalTableBackupClient.java:414)
    at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:311)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
{code}
The javadoc of {{WALHeaderEOFException}} says: "This usually means the WAL file
just contains nothing and we are safe to skip over it." So perhaps empty WAL
files should simply be skipped here?
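If skipping is indeed the right behaviour, one possible approach would be to drop zero-length files before handing the WAL list to the WALPlayer job. This is only a sketch; {{filterEmptyWALs}} is a hypothetical helper, not existing HBase API:
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class EmptyWALFilter {
  private EmptyWALFilter() {}

  /**
   * Returns only the WAL paths that contain at least one byte. A zero-byte
   * file cannot contain the PB WAL magic, so readHeader() is guaranteed to
   * throw WALHeaderEOFException for it, and per that exception's javadoc the
   * file is safe to skip.
   */
  public static List<Path> filterEmptyWALs(FileSystem fs, List<Path> walPaths)
      throws IOException {
    List<Path> nonEmpty = new ArrayList<>();
    for (Path wal : walPaths) {
      if (fs.getFileStatus(wal).getLen() > 0) {
        nonEmpty.add(wal);
      }
    }
    return nonEmpty;
  }
}
{code}
An alternative might be to catch {{WALHeaderEOFException}} in {{WALInputFormat.WALRecordReader}} and treat such a file as containing no entries, which would also cover files that become visible before the header is written.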