Hernan Gelaf-Romer created HBASE-29197:
------------------------------------------

             Summary: Deleting bulk loaded rows from the backup system table 
can result in large batch rejection failures
                 Key: HBASE-29197
                 URL: https://issues.apache.org/jira/browse/HBASE-29197
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
            Reporter: Hernan Gelaf-Romer


At my company, we're experimenting with the new incremental backup system. 
We've run into failures when deleting a large number of bulk-loaded rows from 
the backup system table: once a delete batch exceeds the region server's batch 
size limit, the whole batch is rejected.

 
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl
- id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last
exception=java.io.IOException: java.io.IOException: Rejecting large batch
operation for current batch with firstRegionName:
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException:
Rejecting large batch operation for current batch with firstRegionName:
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
    ... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259,
tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 --
final attempt!

2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR
o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776
actions: IOException: 75776 times, servers with issues:
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
75776 actions: IOException: 75776 times, servers with issues:
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
    at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
    at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
    at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
    at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
    at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
    at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed:
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
6144 actions: IOException: 6144 times, servers with issues:
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
 
We should split these deletes into smaller chunks so that no single batch 
exceeds the size threshold and gets rejected.
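
To make the chunking idea concrete, here is a minimal sketch in plain Java. It 
is not the actual fix: `BatchChunker` and `partition` are hypothetical names, 
and the limit of 1500 is only taken from the "Size Threshold" value in the log 
above. A real change would presumably read the limit from configuration and 
submit each chunk of deletes to the backup system table separately.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a list of pending operations into bounded chunks. */
final class BatchChunker {
  private BatchChunker() {}

  /**
   * Returns consecutive sublists of at most maxBatchSize elements each.
   * Every chunk is copied into its own list so a caller can hand it to a
   * client API without affecting the original list.
   */
  static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, items.size());
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }
}
```

With the numbers from the log, the 2048 rejected rows would be split into one 
chunk of 1500 and one of 548, each at or below the reported threshold. The 
caller would then issue one delete call per chunk instead of a single large 
batch.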
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
