Hernan Gelaf-Romer created HBASE-29197:
------------------------------------------
Summary: Deleting bulk loaded rows from the backup system table
can result in large batch rejection failures
Key: HBASE-29197
URL: https://issues.apache.org/jira/browse/HBASE-29197
Project: HBase
Issue Type: Bug
Components: backup&restore
Reporter: Hernan Gelaf-Romer
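At my company, we're experimenting with the new incremental backup system.
We've run into failures when deleting a large number of bulk-loaded rows from
the backup system table. When a single batch sent to a region server exceeds
its batch-size limit (2048 rows requested against a threshold of 1500 in the
log below), the whole batch is rejected and the backup fails: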
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl - id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last exception=java.io.IOException: java.io.IOException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
    ... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259, tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 -- final attempt!
2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
    at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
    at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
    at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
    at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
    at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
    at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 6144 actions: IOException: 6144 times, servers with issues: na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
We should split these deletes into chunks that stay below the region server's
batch-size threshold, so that no single batch is rejected.
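A minimal sketch of the proposed chunking, assuming the caller already holds the rows to delete in a list. The class and method names (`ChunkedDeletes`, `partition`, `deleteInChunks`) and the chunk size of 1000 are hypothetical; the only grounded number is the 1500-row threshold from the log above. The sink is a plain `Consumer` so the sketch runs without a cluster. In a real fix it might wrap `Table#delete(List<Delete>)`, or a `BufferedMutator` that is flushed after each chunk so that no single flush carries more than one chunk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedDeletes {

  // Hypothetical cap, chosen to stay below the 1500-row threshold seen in the log.
  public static final int DEFAULT_CHUNK_SIZE = 1000;

  /** Splits items into consecutive sublists of at most chunkSize elements. */
  public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, items.size());
      // Copy so each chunk is independent of the backing list.
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }

  /**
   * Hands rows to the sink one bounded chunk at a time and returns the number
   * of batches sent. The sink stands in for the real delete call (an assumption).
   */
  public static <T> int deleteInChunks(List<T> rows, int chunkSize, Consumer<List<T>> sink) {
    int batches = 0;
    for (List<T> chunk : partition(rows, chunkSize)) {
      sink.accept(chunk);
      batches++;
    }
    return batches;
  }

  public static void main(String[] args) {
    // 75776 rows, matching the number of failed actions in the log.
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 75776; i++) {
      rows.add("row-" + i);
    }
    List<Integer> sizes = new ArrayList<>();
    int batches = deleteInChunks(rows, DEFAULT_CHUNK_SIZE, chunk -> sizes.add(chunk.size()));
    System.out.println(batches + " batches, largest " + Collections.max(sizes));
  }
}
```

With 75776 rows and a chunk size of 1000, `main` prints `76 batches, largest 1000`: 75 full chunks plus a final chunk of 776 rows, none of which would exceed a 1500-row threshold.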
--
This message was sent by Atlassian Jira
(v8.20.10#820010)