Rushabh Shah created HBASE-26195:
------------------------------------
Summary: Data is present in replicated cluster but not visible on
primary cluster.
Key: HBASE-26195
URL: https://issues.apache.org/jira/browse/HBASE-26195
Project: HBase
Issue Type: Bug
Components: Replication, wal
Affects Versions: 1.7.0
Reporter: Rushabh Shah
Assignee: Rushabh Shah
We encountered a case where we can see some rows (via Phoenix) in the replicated
cluster, but they are not present in the source/active cluster.
Triaging further, we found memstore rollback logs in a few of the region servers.
{noformat}
2021-07-28 14:17:59,353 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,353 DEBUG [,queue=25,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,354 DEBUG [,queue=25,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,355 DEBUG [3,queue=3,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,355 DEBUG [,queue=25,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
2021-07-28 14:17:59,356 DEBUG [,queue=25,port=60020] regionserver.HRegion -
rollbackMemstore rolled back 23
{noformat}
Looking further into the logs, we found that there were HDFS-layer issues syncing
the WAL to HDFS.
It was taking around 6 minutes to sync the WAL. Logs below:
{noformat}
2021-07-28 14:19:30,511 WARN [sync.0] hdfs.DataStreamer - Slow
waitForAckedSeqno took 391210ms (threshold=30000ms). File being written:
/hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
pipeline datanodes:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,589 WARN [sync.1] hdfs.DataStreamer - Slow
waitForAckedSeqno took 391148ms (threshold=30000ms). File being written:
/hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
pipeline datanodes:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,589 WARN [sync.2] hdfs.DataStreamer - Slow
waitForAckedSeqno took 391147ms (threshold=30000ms). File being written:
/hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
pipeline datanodes:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,591 INFO [sync.0] wal.FSHLog - Slow sync cost: 391289 ms,
current pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 INFO [sync.1] wal.FSHLog - Slow sync cost: 391227 ms,
current pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.1] wal.FSHLog - Requesting log roll because
we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms, current
pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 INFO [sync.2] wal.FSHLog - Slow sync cost: 391227 ms,
current pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.2] wal.FSHLog - Requesting log roll because
we exceeded slow sync threshold; time=391227 ms, threshold=10000 ms, current
pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
2021-07-28 14:19:30,591 WARN [sync.3] hdfs.DataStreamer - Slow
waitForAckedSeqno took 391217ms (threshold=30000ms). File being written:
/hbase/WALs/<rs-node>,60020,1626191371499/<rs-node>%2C60020%2C1626191371499.1627480615620,
block: BP-958889176-<namenode>-1567030695029:blk_1689647875_616028364, Write
pipeline datanodes:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]].
2021-07-28 14:19:30,591 INFO [sync.3] wal.FSHLog - Slow sync cost: 391217 ms,
current pipeline:
[DatanodeInfoWithStorage[<dn-1>:50010,DS-b5747702-8ab9-4a5e-916e-5fae6e305738,DISK],
DatanodeInfoWithStorage[<dn-2>:50010,DS-505dabb0-0fd6-42d9-b25d-f25e249fe504,DISK],
DatanodeInfoWithStorage[<dn-3>:50010,DS-6c585673-d4d0-4ec6-bafe-ad4cd861fb4b,DISK]]
{noformat}
To add more detail, we also found that there was a namenode failover around the
same time. It might be a contributing factor to why the syncs were slow, but we
are not sure.
{noformat}
2021-07-28 14:19:30,390 WARN [<rs-host>:60020-3] retry.RetryInvocationHandler
- A failover has occurred since the start of call #8491727 Client
NamenodeProtocolTranslatorPB.mkdirs over <namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,390 WARN [<rs-host>:60020-7] retry.RetryInvocationHandler
- A failover has occurred since the start of call #8491726
ClientNamenodeProtocolTranslatorPB.mkdirs over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [20%2C1626191371499,1]
retry.RetryInvocationHandler - A failover has occurred since the start of call
#8491723 ClientNamenodeProtocolTranslatorPB.getBlockLocations over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [0.40:60020.logRoller]
retry.RetryInvocationHandler - A failover has occurred since the start of call
#8491729 ClientNamenodeProtocolTranslatorPB.getFileInfo over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [MemStoreFlusher.1] retry.RetryInvocationHandler
- A failover has occurred since the start of call #8491730
ClientNamenodeProtocolTranslatorPB.getFileInfo over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [371499.1627480615620]
retry.RetryInvocationHandler - A failover has occurred since the start of call
#8491731 ClientNamenodeProtocolTranslatorPB.addBlock over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [c@prod] retry.RetryInvocationHandler - A
failover has occurred since the start of call #8491725
ClientNamenodeProtocolTranslatorPB.renewLease over
<namenode-hostname>/<namenode-ip>:8020
2021-07-28 14:19:30,391 WARN [91371499@<Replicatonsource-thread>]
retry.RetryInvocationHandler - A failover has occurred since the start of call
#8491724 ClientNamenodeProtocolTranslatorPB.getBlockLocations over
<namenode-hostname>/<namenode-ip>:8020
{noformat}
We use the default value of hbase.regionserver.wal.sync.timeout, which is 5
minutes.
Since the sync took more than 6 minutes (longer than
hbase.regionserver.wal.sync.timeout), the client received an exception and we
rolled back the mutations in the memstore on the primary cluster.
But the HDFS sync succeeded +eventually+, so the row was present in the WAL and
was eligible for replication.
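To make the sequence concrete, here is a minimal, self-contained sketch of the
race. Timings are scaled down, class and variable names are illustrative rather
than the actual FSHLog/HRegion internals, and only the
hbase.regionserver.wal.sync.timeout key and its 5-minute default come from HBase.
{code:java}
import java.util.concurrent.*;

public class WalSyncTimeoutSketch {

    // Timings scaled down for the sketch; in the incident the wait used the
    // 5-minute default of hbase.regionserver.wal.sync.timeout (300000 ms) and
    // the HDFS ack took ~391 s per the DataStreamer warnings above.
    static final long SYNC_TIMEOUT_MS = 2_000L;   // stands in for 300_000 ms
    static final long HDFS_ACK_DELAY_MS = 3_000L; // stands in for ~391_000 ms

    public static void main(String[] args) throws Exception {
        ExecutorService hdfs = Executors.newSingleThreadExecutor();

        // 1. The edit is appended to the WAL and added to the memstore; the
        //    handler then waits for HDFS to ack the WAL sync.
        Future<Void> walSync = hdfs.submit(() -> {
            Thread.sleep(HDFS_ACK_DELAY_MS); // slow pipeline / namenode failover
            return null;                     // ...but the sync eventually succeeds
        });

        boolean rolledBackOnPrimary = false;
        try {
            walSync.get(SYNC_TIMEOUT_MS, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // 2. The wait exceeded hbase.regionserver.wal.sync.timeout: the
            //    region server rolls the edit back out of the memstore
            //    ("rollbackMemstore rolled back ..." above) and the client
            //    sees the write fail on the primary cluster.
            rolledBackOnPrimary = true;
        }

        // 3. Timing out the wait does not cancel the underlying sync. HDFS
        //    finishes it later, the edit is durable in the WAL file, and
        //    replication ships WAL entries, so the peer cluster replays a row
        //    the primary no longer has.
        walSync.get();
        System.out.println("rolled back on primary: " + rolledBackOnPrimary
                + ", durable in WAL and eligible for replication: true");
        hdfs.shutdown();
    }
}
{code}
The key point is that timing out the wait does not abort the underlying HDFS
sync, so an edit that was rolled back (and failed to the client) on the primary
can still become durable in the WAL and be shipped to the peer cluster.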
[~bharathv] [[email protected]] [~gjacoby] helped brainstorm possible
solutions. Will post the possible approaches in the comments section.