Dong0829 created HBASE-28971:
--------------------------------
Summary: FSHLog can not roll the WAL log properly
Key: HBASE-28971
URL: https://issues.apache.org/jira/browse/HBASE-28971
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 2.5.10, 2.6.1, 2.4.18
Reporter: Dong0829
Assignee: Dong0829
For FSHLog, when we try to roll the writer, we
# initiate the zigzagLatch and wait for the safe point;
# once the safe point is attained, continue to close the writer.
For the above process, it looks like we have an
[assumption|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388]
that highestSyncedTxid must be bigger than highestUnsyncedTxid. I do not think
this is always true, because in
[attainSafePoint|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L1119]
we do not require that the safe point be reached on a sync workload. If it
stops on an append workload, highestUnsyncedTxid will always be bigger than
highestSyncedTxid, right?
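To make the concern concrete, here is a minimal toy model of the two counters. It is only a sketch and not the FSHLog code: the class TxidModel and its methods are hypothetical, and it assumes that an append advances highestUnsyncedTxid while a sync moves highestSyncedTxid up to it.

```java
// Toy model only; TxidModel and its methods are hypothetical, not HBase APIs.
final class TxidModel {
  private long highestUnsyncedTxid = 0; // txid of the last appended entry
  private long highestSyncedTxid = 0;   // txid of the last entry covered by a sync

  // Assumption: every append hands out the next txid.
  void append() {
    highestUnsyncedTxid++;
  }

  // Assumption: a sync covers everything appended so far.
  void sync() {
    highestSyncedTxid = highestUnsyncedTxid;
  }

  // True when at least one appended entry has not been synced yet.
  boolean hasUnsyncedEntries() {
    return highestUnsyncedTxid > highestSyncedTxid;
  }

  public static void main(String[] args) {
    TxidModel wal = new TxidModel();
    wal.append();
    wal.sync();
    wal.append(); // the handler stops here, right after an append
    System.out.println("stopped after append, unsynced entries: " + wal.hasUnsyncedEntries());
    wal.sync();   // the handler stops here, right after a sync
    System.out.println("stopped after sync, unsynced entries: " + wal.hasUnsyncedEntries());
  }
}
```

In this model, a safe point taken right after an append always leaves highestUnsyncedTxid ahead of highestSyncedTxid, while one taken right after a sync does not.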
In our environment we can reproduce the issue, which causes WAL logs to pile
up very quickly under heavy write load. If we want to make sure the existing
logic works, we need to add a check so that attainSafePoint only happens
on a sync workload.
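A rough sketch of what such a check could look like, again as a toy simulation rather than a patch: SafePointGateSketch, EventType, and runUntilSafePoint are hypothetical names, and the event stream stands in for the append and sync workloads seen by the handler. With syncOnly set, the pause for the roll is deferred until the next sync event.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed check; not the FSHLog implementation.
final class SafePointGateSketch {
  enum EventType { APPEND, SYNC }

  /**
   * Replays events and returns {highestUnsyncedTxid, highestSyncedTxid} at the
   * point where the handler would pause for the roll, or null if it never pauses.
   * rollRequestedAt is the index of the first event handled after the roll request.
   */
  static long[] runUntilSafePoint(List<EventType> events, int rollRequestedAt, boolean syncOnly) {
    long highestUnsyncedTxid = 0;
    long highestSyncedTxid = 0;
    for (int i = 0; i < events.size(); i++) {
      EventType type = events.get(i);
      if (type == EventType.APPEND) {
        highestUnsyncedTxid++;                   // assumption: append hands out the next txid
      } else {
        highestSyncedTxid = highestUnsyncedTxid; // assumption: sync covers all appends so far
      }
      boolean rollPending = i >= rollRequestedAt;
      // The proposed check: only allow the safe point right after a sync event.
      boolean allowedHere = !syncOnly || type == EventType.SYNC;
      if (rollPending && allowedHere) {
        return new long[] { highestUnsyncedTxid, highestSyncedTxid };
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<EventType> events = Arrays.asList(
        EventType.APPEND, EventType.SYNC, EventType.APPEND, EventType.APPEND, EventType.SYNC);
    System.out.println("without check: " + Arrays.toString(runUntilSafePoint(events, 2, false)));
    System.out.println("with check:    " + Arrays.toString(runUntilSafePoint(events, 2, true)));
  }
}
```

Without the check, the simulated handler pauses on the append at index 2 with the unsynced txid ahead of the synced one; with the check, it keeps going until the sync at index 4, where the two txids are equal. Whether deferring the safe point this way is acceptable for roll latency in the real handler is a separate question.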
--
This message was sent by Atlassian Jira
(v8.20.10#820010)