[ 
https://issues.apache.org/jira/browse/HBASE-26960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated HBASE-26960:
-----------------------------
    Description: 
Besides HBASE-26768, there is another case in which replication in 
{{RegionReplicationSink}} is suspended unnecessarily:
When a replication error occurs, {{RegionReplicationSink}} invokes 
{{MemStoreFlusher#requestFlush}} to request a flush, and it resumes 
replication after receiving the {{FlushAction#START_FLUSH}} or 
{{FlushAction#CANNOT_FLUSH}} flush marker. But when {{MemStoreFlusher}} 
flushes, it invokes the following method, {{HRegion.flushcache}}, with 
{{writeFlushRequestWalMarker}} set to false:
{code:java}
  public FlushResultImpl flushcache(List<byte[]> families,
      boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker)
      throws IOException {
  }
{code}
When {{writeFlushRequestWalMarker}} is false, {{HRegion.flushcache}} does not 
write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} if the 
memstore is empty. So if a replication error occurs while the memstore is 
empty (e.g. while replicating the {{FlushAction#START_FLUSH}} or 
{{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until the 
next memstore flush, even though user writes arrive later and could be 
replicated normally.
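
As a rough illustration of the marker decision described above, here is a 
minimal, self-contained sketch. It is not HBase code: {{FlushMarkerSketch}}, 
{{markerFor}} and the two-value enum are hypothetical names invented for this 
example, and the real {{HRegion.flushcache}} does considerably more.

```java
import java.util.Optional;

// Simplified model of which flush marker ends up in the WAL.
// All names here are hypothetical; this is not the HBase implementation.
public class FlushMarkerSketch {

  /** Only the two markers that let the sink resume, per this issue. */
  enum FlushAction { START_FLUSH, CANNOT_FLUSH }

  /**
   * Returns the marker a flush request would write to the WAL,
   * or empty if nothing is written at all.
   */
  static Optional<FlushAction> markerFor(boolean memstoreEmpty,
      boolean writeFlushRequestWalMarker) {
    if (!memstoreEmpty) {
      // There is data to flush, so a real flush starts.
      return Optional.of(FlushAction.START_FLUSH);
    }
    // Empty memstore: a marker is written only when the caller asked for it.
    return writeFlushRequestWalMarker
        ? Optional.of(FlushAction.CANNOT_FLUSH)
        : Optional.empty();
  }

  public static void main(String[] args) {
    // The problematic combination: empty memstore and flag set to false.
    System.out.println(markerFor(true, false));
    // With the flag set to true, a CANNOT_FLUSH marker is produced.
    System.out.println(markerFor(true, true));
  }
}
```

In this model, the combination of an empty memstore and a false flag is the 
only one that yields no marker, which is the case this issue is about.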

I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} 
parameter was introduced by HBASE-11580 and only determines whether the 
{{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the 
memstore is empty, so I think that, for simplicity, we could always set it to 
true for {{MemStoreFlusher}}.
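
To make the effect of the proposal concrete, the following is a toy model of 
the suspend/resume behavior described in this issue. It is only a sketch 
under stated assumptions: {{ReplicationSinkSketch}}, its methods and the 
{{WalEntry}} enum are hypothetical and do not correspond to the real 
{{RegionReplicationSink}} API. It assumes the sink drops entries while 
suspended and resumes on {{START_FLUSH}} or {{CANNOT_FLUSH}}.

```java
// Toy model of the sink's suspend/resume behavior; not HBase code.
// Every name in this class is hypothetical.
public class ReplicationSinkSketch {

  enum WalEntry { USER_WRITE, START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

  private boolean suspended = false;
  private int replicatedUserWrites = 0;

  /** A replication error suspends the sink (the real sink also requests a flush). */
  void onReplicationError() {
    suspended = true;
  }

  /** Handles one WAL entry; entries are dropped while the sink is suspended. */
  void onWalEntry(WalEntry entry) {
    if (suspended) {
      if (entry == WalEntry.START_FLUSH || entry == WalEntry.CANNOT_FLUSH) {
        suspended = false; // a flush marker lets replication resume
      }
      return;
    }
    if (entry == WalEntry.USER_WRITE) {
      replicatedUserWrites++;
    }
  }

  /** Error while the memstore is empty, then three user writes arrive. */
  static int run(boolean writeFlushRequestWalMarker) {
    ReplicationSinkSketch sink = new ReplicationSinkSketch();
    sink.onReplicationError();
    // The requested flush finds an empty memstore, so the only possible
    // marker is CANNOT_FLUSH, and only if the flag allows writing it.
    if (writeFlushRequestWalMarker) {
      sink.onWalEntry(WalEntry.CANNOT_FLUSH);
    }
    for (int i = 0; i < 3; i++) {
      sink.onWalEntry(WalEntry.USER_WRITE);
    }
    return sink.replicatedUserWrites;
  }

  public static void main(String[] args) {
    System.out.println("flag=false replicated writes: " + run(false));
    System.out.println("flag=true  replicated writes: " + run(true));
  }
}
```

In this model the later user writes are replicated only when the flag is 
true, which is the motivation for always passing true from 
{{MemStoreFlusher}}.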

  was:
There is another case in which replication in {{RegionReplicationSink}} is 
suspended unnecessarily:
When a replication error occurs, {{RegionReplicationSink}} invokes 
{{MemStoreFlusher#requestFlush}} to request a flush, and it resumes 
replication after receiving the {{FlushAction#START_FLUSH}} or 
{{FlushAction#CANNOT_FLUSH}} flush marker. But when {{MemStoreFlusher}} 
flushes, it invokes the following method, {{HRegion.flushcache}}, with 
{{writeFlushRequestWalMarker}} set to false:
{code:java}
  public FlushResultImpl flushcache(List<byte[]> families,
      boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker)
      throws IOException {
  }
{code}
When {{writeFlushRequestWalMarker}} is false, {{HRegion.flushcache}} does not 
write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} if the 
memstore is empty. So if a replication error occurs while the memstore is 
empty (e.g. while replicating the {{FlushAction#START_FLUSH}} or 
{{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until the 
next memstore flush, even though user writes arrive later and could be 
replicated normally.

I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} 
parameter was introduced by HBASE-11580 and only determines whether the 
{{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the 
memstore is empty, so I think that, for simplicity, we could always set it to 
true for {{MemStoreFlusher}}.


> Another case for unnecessary replication suspending in RegionReplicationSink
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26960
>                 URL: https://issues.apache.org/jira/browse/HBASE-26960
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 3.0.0-alpha-2
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>
> Besides HBASE-26768, there is another case in which replication in 
> {{RegionReplicationSink}} is suspended unnecessarily:
> When a replication error occurs, {{RegionReplicationSink}} invokes 
> {{MemStoreFlusher#requestFlush}} to request a flush, and it resumes 
> replication after receiving the {{FlushAction#START_FLUSH}} or 
> {{FlushAction#CANNOT_FLUSH}} flush marker. But when {{MemStoreFlusher}} 
> flushes, it invokes the following method, {{HRegion.flushcache}}, with 
> {{writeFlushRequestWalMarker}} set to false:
> {code:java}
>   public FlushResultImpl flushcache(List<byte[]> families,
>       boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker)
>       throws IOException {
>   }
> {code}
> When {{writeFlushRequestWalMarker}} is false, {{HRegion.flushcache}} does 
> not write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} if 
> the memstore is empty. So if a replication error occurs while the memstore 
> is empty (e.g. while replicating the {{FlushAction#START_FLUSH}} or 
> {{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until 
> the next memstore flush, even though user writes arrive later and could be 
> replicated normally.
> I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} 
> parameter was introduced by HBASE-11580 and only determines whether the 
> {{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the 
> memstore is empty, so I think that, for simplicity, we could always set it 
> to true for {{MemStoreFlusher}}.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
