[ https://issues.apache.org/jira/browse/HBASE-26960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-26960:
-----------------------------
    Description:
Besides HBASE-26768, there is another case in which replication in {{RegionReplicationSink}} would be suspended:

For {{RegionReplicationSink}}, when there is a replication error, {{RegionReplicationSink}} invokes {{MemStoreFlusher#requestFlush}} to request a flush, and after receiving the {{FlushAction#START_FLUSH}} or {{FlushAction#CANNOT_FLUSH}} flush marker, it resumes replication.

But when {{MemStoreFlusher}} flushes, it invokes the following method {{HRegion.flushcache}} with {{writeFlushRequestWalMarker}} set to false:
{code:java}
public FlushResultImpl flushcache(List<byte[]> families,
    boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker)
    throws IOException {
}
{code}
When {{writeFlushRequestWalMarker}} is set to false, {{HRegion.flushcache}} does not write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} when the memstore is empty. So when a replication error occurs while the memstore is empty (e.g. while replicating the {{FlushAction#START_FLUSH}} or {{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until the next memstore flush, even though later user writes could be replicated normally.

I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} parameter was introduced by HBASE-11580 and only determines whether the {{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the memstore is empty, so I think for simplicity we could always set it to true for {{MemStoreFlusher}}.

  was:
There is another case in which replication in {{RegionReplicationSink}} would be suspended:

For {{RegionReplicationSink}}, when there is a replication error, {{RegionReplicationSink}} invokes {{MemStoreFlusher#requestFlush}} to request a flush, and after receiving the {{FlushAction#START_FLUSH}} or {{FlushAction#CANNOT_FLUSH}} flush marker, it resumes replication.

But when {{MemStoreFlusher}} flushes, it invokes the following method {{HRegion.flushcache}} with {{writeFlushRequestWalMarker}} set to false:
{code:java}
public FlushResultImpl flushcache(List<byte[]> families,
    boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker)
    throws IOException {
}
{code}
When {{writeFlushRequestWalMarker}} is set to false, {{HRegion.flushcache}} does not write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} when the memstore is empty. So when a replication error occurs while the memstore is empty (e.g. while replicating the {{FlushAction#START_FLUSH}} or {{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until the next memstore flush, even though later user writes could be replicated normally.

I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} parameter was introduced by HBASE-11580 and only determines whether the {{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the memstore is empty, so I think for simplicity we could always set it to true for {{MemStoreFlusher}}.

> Another case for unnecessary replication suspending in RegionReplicationSink
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26960
>                 URL: https://issues.apache.org/jira/browse/HBASE-26960
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 3.0.0-alpha-2
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
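
The suspension scenario described in this issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: {{ToySink}}, {{toyFlush}} and {{suspendedAfterFlush}} are hypothetical names invented for illustration and are not HBase APIs. The sketch assumes, as the description states, that a non-empty flush emits {{START_FLUSH}} and {{COMMIT_FLUSH}} markers, and that an empty-memstore flush emits {{CANNOT_FLUSH}} only when {{writeFlushRequestWalMarker}} is true.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushMarkerSketch {
    // Mirrors the marker names mentioned in the issue; not the real HBase enum.
    enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

    /** Hypothetical stand-in for the sink: suspended on error, resumed by certain markers. */
    static final class ToySink {
        private boolean suspended;

        void onReplicationError() {
            suspended = true; // the real sink would also request a flush here
        }

        void onFlushMarker(FlushAction action) {
            // Per the issue text, only START_FLUSH or CANNOT_FLUSH resumes replication.
            if (action == FlushAction.START_FLUSH || action == FlushAction.CANNOT_FLUSH) {
                suspended = false;
            }
        }

        boolean isSuspended() {
            return suspended;
        }
    }

    /** Hypothetical stand-in for a flush: returns the markers that would reach the WAL. */
    static List<FlushAction> toyFlush(boolean memstoreEmpty, boolean writeFlushRequestWalMarker) {
        List<FlushAction> markers = new ArrayList<>();
        if (memstoreEmpty) {
            // Assumption from the issue: the marker is written only when the flag is true.
            if (writeFlushRequestWalMarker) {
                markers.add(FlushAction.CANNOT_FLUSH);
            }
            return markers;
        }
        markers.add(FlushAction.START_FLUSH);
        markers.add(FlushAction.COMMIT_FLUSH);
        return markers;
    }

    /** Simulates: replication error, then one flush; reports whether the sink is still suspended. */
    static boolean suspendedAfterFlush(boolean memstoreEmpty, boolean writeFlushRequestWalMarker) {
        ToySink sink = new ToySink();
        sink.onReplicationError();
        for (FlushAction action : toyFlush(memstoreEmpty, writeFlushRequestWalMarker)) {
            sink.onFlushMarker(action);
        }
        return sink.isSuspended();
    }

    public static void main(String[] args) {
        System.out.println(suspendedAfterFlush(true, false));  // true: no marker, stays suspended
        System.out.println(suspendedAfterFlush(true, true));   // false: CANNOT_FLUSH resumes it
        System.out.println(suspendedAfterFlush(false, false)); // false: START_FLUSH resumes it
    }
}
```

In this toy model, the only combination that leaves the sink suspended is an empty memstore with the flag set to false, which matches the case the issue proposes to eliminate by always passing true from {{MemStoreFlusher}}.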