Hernan Gelaf-Romer created HBASE-29197:
------------------------------------------
             Summary: Deleting bulk loaded rows from the backup system table can result in large batch rejection failures
                 Key: HBASE-29197
                 URL: https://issues.apache.org/jira/browse/HBASE-29197
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
            Reporter: Hernan Gelaf-Romer

At my company, we're experimenting with the new incremental backup system. We've run into failures when deleting a large number of bulk-loaded rows from the system table: once a batch exceeds the server's batch size threshold (2048 rows requested against a threshold of 1500 in the log below), the region server rejects it.

2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl - id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last exception=java.io.IOException: java.io.IOException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
    ... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259, tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 -- final attempt!

2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 75776 actions: IOException: 75776 times, servers with issues: na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177, na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
    at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
    at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
    at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
    at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
    at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
    at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 6144 actions: IOException: 6144 times, servers with issues: na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)

We should split these deletes into chunks that stay under the batch size threshold, so that the region server does not reject them.
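
To illustrate the chunking idea, here is a minimal sketch rather than the actual patch. The class name BatchChunker, the method chunk, and the maxChunkSize parameter are hypothetical and exist only for this example. It splits a list of pending operations into consecutive sublists, each no larger than a given limit, so that every sublist could be submitted as its own batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a list into bounded chunks.
final class BatchChunker {

  private BatchChunker() {
  }

  /**
   * Returns consecutive sublists of items, each holding at most maxChunkSize
   * elements. Order is preserved and every element appears exactly once.
   */
  static <T> List<List<T>> chunk(List<T> items, int maxChunkSize) {
    if (maxChunkSize <= 0) {
      throw new IllegalArgumentException("maxChunkSize must be positive");
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxChunkSize) {
      int end = Math.min(start + maxChunkSize, items.size());
      // Copy the view so each chunk is independent of the source list.
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }
}
```

With the numbers from the log above, 2048 rows and a limit of 1500 would yield two chunks of 1500 and 548 rows. In the backup code, each chunk of Delete objects could then be handed to something like Table.delete(List<Delete>) one at a time. How the real fix sizes its chunks and reads the threshold is an open choice that this sketch does not settle.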