[ https://issues.apache.org/jira/browse/HBASE-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629369#comment-15629369 ]
Ted Yu commented on HBASE-14417:
--------------------------------
I observed this in the TestHRegionServerBulkLoad output for the patch versions (v11
and earlier) where the bulk load marker is written directly to the hbase:backup table
in the postAppend hook:
{code}
2016-09-13 23:10:14,072 DEBUG [B.defaultRpcServer.handler=4,queue=0,port=35667] ipc.CallRunner(112): B.defaultRpcServer.handler=4,queue=0,port=35667: callId: 10646 service: ClientService methodName: Scan size: 264 connection: 172.18.128.12:59780
org.apache.hadoop.hbase.RegionTooBusyException: failed to get a lock in 60000 ms. regionName=atomicBulkLoad,,1473808150804.6b6c67612b01bce3348c144b959b7f0e., server=cn012.l42scl.hortonworks.com,35667,1473808145352
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7744)
  at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:7725)
  at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:7634)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2588)
  at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2582)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2569)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33516)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2229)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:136)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)
{code}
Here was the state of the stuck BulkLoadHandler thread:
{code}
"RS:0;cn012:36301.append-pool9-t1" #453 prio=5 os_prio=0 tid=0x00007fc3945bb000
nid=0x18ec in Object.wait() [0x00007fc30dada000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at
org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1727)
- locked <0x0000000794750580> (a java.util.concurrent.atomic.AtomicLong)
at
org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1756)
at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:241)
at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:191)
- locked <0x0000000794750048> (a
org.apache.hadoop.hbase.client.BufferedMutatorImpl)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:949)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:569)
at
org.apache.hadoop.hbase.backup.impl.BackupSystemTable.writeBulkLoadDesc(BackupSystemTable.java:227)
at
org.apache.hadoop.hbase.backup.impl.BulkLoadHandler.postAppend(BulkLoadHandler.java:83)
at
org.apache.hadoop.hbase.regionserver.wal.FSHLog.postAppend(FSHLog.java:1448)
{code}
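The two traces add up to a stall: the WAL append thread (append-pool9-t1) is blocked inside postAppend on a synchronous put to hbase:backup, and while it waits, operations on the region behind it (like the scan in the first trace) time out waiting for the region lock. A minimal sketch of the blocking pattern, for illustration only (the actual BulkLoadHandler / BackupSystemTable code is part of the v11 patch and is not reproduced here; class, method, and column names below are placeholders):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical condensation of the v11 approach; names are illustrative only.
public class BulkLoadMarkerWriter {
  private final Connection conn;

  public BulkLoadMarkerWriter(Configuration conf) throws IOException {
    this.conn = ConnectionFactory.createConnection(conf);
  }

  // Called from the WAL append path (FSHLog.postAppend in the dump above).
  // The put is synchronous: it flushes through AsyncProcess and waits for the
  // write to hbase:backup to complete. While this thread waits, the append
  // pool makes no progress, so other operations on the region (the scan in the
  // first trace) eventually fail with RegionTooBusyException.
  public void writeBulkLoadMarker(byte[] rowKey, byte[] descriptor) throws IOException {
    try (Table backupTable = conn.getTable(TableName.valueOf("hbase:backup"))) {
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("bulkload"), descriptor);
      backupTable.put(put); // blocking RPC issued from inside the WAL append path
    }
  }
}
{code}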
Even increasing the handler count didn't help:
{code}
diff --git a/hbase-server/src/test/resources/hbase-site.xml b/hbase-server/src/test/resources/hbase-site.xml
index bca90a3..829fcc9 100644
--- a/hbase-server/src/test/resources/hbase-site.xml
+++ b/hbase-server/src/test/resources/hbase-site.xml
@@ -30,6 +30,10 @@
     </description>
   </property>
   <property>
+    <name>hbase.backup.enable</name>
+    <value>true</value>
+  </property>
+  <property>
     <name>hbase.defaults.for.version.skip</name>
     <value>true</value>
   </property>
@@ -48,11 +52,11 @@
   </property>
   <property>
     <name>hbase.regionserver.handler.count</name>
-    <value>5</value>
+    <value>50</value>
   </property>
   <property>
{code}
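For reference, the same two settings can be applied programmatically in a test instead of editing hbase-site.xml; a minimal sketch, assuming hbase.backup.enable is the feature switch introduced by the patch:
{code}
import org.apache.hadoop.hbase.HBaseTestingUtility;

public class BulkLoadBackupTestSetup {
  public static HBaseTestingUtility newUtil() {
    HBaseTestingUtility util = new HBaseTestingUtility();
    // Same properties as in the hbase-site.xml diff above.
    util.getConfiguration().setBoolean("hbase.backup.enable", true);
    // Raising the handler count did not avoid the stall: the blocked thread is
    // the WAL append thread, not an RPC handler.
    util.getConfiguration().setInt("hbase.regionserver.handler.count", 50);
    return util;
  }
}
{code}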
Post v11, the data stored in zookeeper is temporary: once an incremental backup
is run for the table receiving bulk loads, the data is recorded under that
backup Id and removed from zookeeper.
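A minimal sketch of that temporary-znode lifecycle, using the plain ZooKeeper client (the znode path and payload layout here are illustrative assumptions, not the patch's actual layout):
{code}
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BulkLoadZkTracker {
  // Illustrative base path; parent znodes are assumed to exist already.
  private static final String BASE = "/hbase/backup/bulkload";

  private final ZooKeeper zk;

  public BulkLoadZkTracker(ZooKeeper zk) {
    this.zk = zk;
  }

  // Record a completed bulk load for a table as a sequential child znode.
  public void recordBulkLoad(String table, byte[] hfileRefs)
      throws KeeperException, InterruptedException {
    zk.create(BASE + "/" + table + "/load-", hfileRefs,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
  }

  // At incremental backup time: read every pending bulk load for the table,
  // persist it under the backup Id elsewhere, then delete the znodes so the
  // data kept in zookeeper stays temporary.
  public void drainBulkLoads(String table) throws KeeperException, InterruptedException {
    String parent = BASE + "/" + table;
    List<String> loads = zk.getChildren(parent, false);
    for (String child : loads) {
      byte[] data = zk.getData(parent + "/" + child, false, null);
      // ... store `data` under the current backup Id in the backup system table ...
      zk.delete(parent + "/" + child, -1);
    }
  }
}
{code}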
> Incremental backup and bulk loading
> -----------------------------------
>
> Key: HBASE-14417
> URL: https://issues.apache.org/jira/browse/HBASE-14417
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 2.0.0
> Reporter: Vladimir Rodionov
> Assignee: Ted Yu
> Priority: Critical
> Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14417.v1.txt, 14417.v11.txt, 14417.v13.txt,
> 14417.v2.txt, 14417.v21.txt, 14417.v23.txt, 14417.v24.txt, 14417.v25.txt,
> 14417.v6.txt
>
>
> Currently, incremental backup is based on WAL files. Bulk data loading
> bypasses WALs for obvious reasons, breaking incremental backups. The only way
> to continue backups after bulk loading is to create a new full backup of a
> table. This may not be feasible for customers who do bulk loading regularly
> (say, every day).
> Google doc for design:
> https://docs.google.com/document/d/1ACCLsecHDvzVSasORgqqRNrloGx4mNYIbvAU7lq5lJE