[
https://issues.apache.org/jira/browse/HBASE-29197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hernan Gelaf-Romer updated HBASE-29197:
---------------------------------------
Description:
At my company, we're experimenting with the new incremental backup system.
We've experienced issues deleting large numbers of bulk-loaded rows from the
system table when a delete batch exceeds the batch limit:
2025-03-18 13:03:01.208 [htable-pool-6] WARN o.a.h.h.c.AsyncRequestFutureImpl -
id=10, table=backup:system_bulk, attempt=15/13, failureCount=2048ops, last
exception=java.io.IOException: java.io.IOException: Rejecting large batch
operation for current batch with firstRegionName:
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:511)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.CallRunnerWithContext.run(CallRunnerWithContext.java:103)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:105)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:85)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException:
Rejecting large batch operation for current batch with firstRegionName:
backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested
Number of Rows: 2048 , Size Threshold: 1500
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443)
    ... 4 more
 on na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259,
tracking started Tue Mar 18 13:01:12 UTC 2025; NOT retrying, failed=2048 --
final attempt!
2025-03-18 13:03:01.275 [pool-116-thread-1] ERROR
o.a.h.h.b.impl.TableBackupClient - Unexpected BackupException : Failed 75776
actions: IOException: 75776 times, servers with issues:
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
75776 actions: IOException: 75776 times, servers with issues:
na1-tart-soft-mountain.iad03.hubinternal.net,60020,1741890145177,
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:209)
    at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:431)
    at org.apache.hadoop.hbase.backup.impl.BackupManager.deleteBulkLoadedRows(BackupManager.java:362)
    at org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:201)
    at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:594)
    at com.hubspot.hbase.recovery.core.factories.HBaseBackupAdminFactory$HBaseBackupAdmin.backupTables(HBaseBackupAdminFactory.java:92)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.lambda$runTableBackup$2(BackupManager.java:524)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.lambda$doAs$9(HadoopAuthHelper.java:590)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at com.hubspot.hadoop.auth.utils.HadoopAuthHelper.doAs(HadoopAuthHelper.java:603)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.runTableBackup(BackupManager.java:521)
    at com.hubspot.hbase.recovery.core.backup.BackupManager$MonitoredTableBackupRunner.run(BackupManager.java:449)
    at com.hubspot.hbase.recovery.core.backup.BackupManager.runBackups(BackupManager.java:103)
    at com.hubspot.hbase.recovery.jobs.BackupJob.takeBackups(BackupJob.java:166)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed:
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
6144 actions: IOException: 6144 times, servers with issues:
na1-grand-steamed-salmon.iad03.hubinternal.net,60020,1741889101259
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.makeException(BufferedMutatorImpl.java:343)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:317)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:246)
        at org.apache.hadoop.hbase.backup.impl.BackupSystemTable.deleteBulkLoadedRows(BackupSystemTable.java:424)
We should split these delete batches into chunks that stay under the batch
limit, so the region server no longer rejects them.
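As a rough illustration of the chunking idea (not the actual patch), the
sketch below splits a list of row operations into sublists no larger than a
given limit. The class name BatchChunker, the method partition, and the use of
1500 as the limit (taken from the "Size Threshold" in the log above) are
assumptions made for this example. In HBase, each chunk would presumably be
sent as its own request, for instance through Table.delete(List<Delete>),
instead of letting a single flush carry all 2048 rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a list into bounded chunks.
final class BatchChunker {
  private BatchChunker() {}

  /** Splits items into consecutive sublists of at most maxBatchSize elements. */
  static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int start = 0; start < items.size(); start += maxBatchSize) {
      int end = Math.min(start + maxBatchSize, items.size());
      // Copy the view so each chunk is independent of the source list.
      chunks.add(new ArrayList<>(items.subList(start, end)));
    }
    return chunks;
  }

  public static void main(String[] args) {
    // Stand-in for the 2048 row deletes seen in the rejected batch.
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 2048; i++) {
      rows.add(i);
    }
    // Assumed limit: the "Size Threshold: 1500" reported by the region server.
    for (List<Integer> chunk : partition(rows, 1500)) {
      // A real fix would issue one delete request per chunk here.
      System.out.println("would delete " + chunk.size() + " rows");
    }
  }
}
```

With a limit of 1500, the 2048-row batch from the log becomes two requests,
each below the threshold. The limit itself should come from configuration
rather than a constant, since the threshold is set on the server side.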
> Deleting bulk loaded rows from the backup system table can result in large
> batch rejections failures
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-29197
> URL: https://issues.apache.org/jira/browse/HBASE-29197
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Reporter: Hernan Gelaf-Romer
> Priority: Major
>
> At my company, we're experimenting with the new incremental backup system.
> We've experienced issues deleting large numbers of bulk-loaded rows from the
> system table when a delete batch exceeds the batch limit.
>
> We should split these delete batches into chunks that stay under the batch
> limit, so the region server no longer rejects them.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)