[ https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087100#comment-16087100 ]

Amrit Sarkar commented on SOLR-11069:
-------------------------------------

Regarding {{updateLogSynchronizer}}:

Every time we call {{DISABLEBUFFER}} or {{ENABLEBUFFER}}, {{CdcrBufferManager::stateUpdate}} gets invoked:
{code}
  @Override
  public synchronized void stateUpdate() {
    CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();
    // If I am not the leader, I should always buffer my updates
    if (!leaderStateManager.amILeader()) {
      ulog.enableBuffer();
      return;
    }
    // If I am the leader, I should buffer my updates only if buffer is enabled
    else if (bufferStateManager.getState().equals(CdcrParams.BufferState.ENABLED)) {
      ulog.enableBuffer();
      return;
    }
    // otherwise, disable the buffer
    ulog.disableBuffer();
  }
{code}
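To make the branches explicit, here is a minimal, self-contained sketch of the same decision written as a pure function. {{BufferDecisionSketch}}, {{shouldBuffer}} and the local {{BufferState}} enum are hypothetical names for illustration only, not Solr API; the real method toggles the {{CdcrUpdateLog}} directly instead of returning a boolean.

{code:java}
// Hypothetical, simplified model of the decision in CdcrBufferManager.stateUpdate().
// None of these names are Solr API; they only mirror the branches shown above.
public class BufferDecisionSketch {

    // Stand-in for CdcrParams.BufferState
    enum BufferState { ENABLED, DISABLED }

    /** Returns true if the update log should buffer, following the branches of stateUpdate(). */
    static boolean shouldBuffer(boolean amILeader, BufferState state) {
        // Non-leaders always buffer, regardless of the configured buffer state
        if (!amILeader) {
            return true;
        }
        // Leaders buffer only if buffering is explicitly enabled
        return state == BufferState.ENABLED;
    }

    public static void main(String[] args) {
        // Print the full decision table: leader/non-leader x ENABLED/DISABLED
        for (boolean leader : new boolean[] {true, false}) {
            for (BufferState s : BufferState.values()) {
                System.out.printf("leader=%-5s state=%-8s -> buffer=%s%n",
                        leader, s, shouldBuffer(leader, s));
            }
        }
    }
}
{code}

Running {{main}} prints the four combinations; the only one where buffering ends up off is {{leader=true}} with {{DISABLED}}, i.e. every non-leader buffers no matter what state was set via {{DISABLEBUFFER}}.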

Non-leader nodes always have buffering enabled, by default:
{code}
    if (!leaderStateManager.amILeader()) {
      ulog.enableBuffer();
      return;
    }
{code}
Although the LPV is always calculated on the leader, this has serious drawbacks, explained below.

In {{CdcrUpdateLogSynchronizer::run}}, if buffering is enabled:
{code}
        // if we received -1, it means that the log reader on the leader has not yet started to read log entries
        // do nothing
        if (lastVersion == -1) {
          return;
        }
        try {
          CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();
          if (ulog.isBuffering()) {
            log.debug("Advancing replica buffering tlog reader to {} @ {}:{}", lastVersion, collection, shardId);
            ulog.getBufferToggle().seek(lastVersion);
          }
        }
        // ... (rest of the method elided)
{code}
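It always returns early when {{lastVersion == -1}}. Note the comment {{if we received -1, it means that the log reader on the leader has not yet started to read log entries}}: that is misleading, because the leader reports {{-1}} whenever buffering is enabled, irrespective of how many tlog entries its log reader has already read.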

Since {{lastVersion}} is never positive in that case, the buffer reader on the corresponding non-leader nodes is never advanced (seeked) to the appropriate LPV.
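A toy model of this interaction, assuming (as described in the issue) that the leader reports the LPV as {{-1}} whenever buffering is enabled. {{SyncSketch}}, {{LeaderSketch}}, {{ReplicaSketch}} and {{bufferReaderPosition}} are hypothetical names; assigning {{bufferReaderPosition}} merely stands in for {{ulog.getBufferToggle().seek(lastVersion)}}, and this is not how Solr is actually structured.

{code:java}
// Hypothetical, simplified model of the LPV / update log synchronizer interaction.
// Assumption (from the issue description): while buffering is enabled, the leader
// reports LASTPROCESSEDVERSION as -1 regardless of how far its log reader has read.
public class SyncSketch {

    static class LeaderSketch {
        boolean bufferEnabled;
        long lastReadVersion; // highest version the leader's log reader has processed

        LeaderSketch(boolean bufferEnabled, long lastReadVersion) {
            this.bufferEnabled = bufferEnabled;
            this.lastReadVersion = lastReadVersion;
        }

        /** Models the reported LPV: -1 while buffering, the real value otherwise. */
        long lastProcessedVersion() {
            return bufferEnabled ? -1 : lastReadVersion;
        }
    }

    static class ReplicaSketch {
        long bufferReaderPosition = 0; // version the replica's buffer reader points at
        boolean buffering = true;      // non-leaders always buffer

        /** One synchronizer tick: early return on -1, otherwise "seek" to the leader's LPV. */
        void synchronize(LeaderSketch leader) {
            long lastVersion = leader.lastProcessedVersion();
            if (lastVersion == -1) {
                return; // same early return as in CdcrUpdateLogSynchronizer
            }
            if (buffering) {
                bufferReaderPosition = lastVersion; // stands in for getBufferToggle().seek(lastVersion)
            }
        }
    }

    public static void main(String[] args) {
        LeaderSketch leader = new LeaderSketch(true, 42L);
        ReplicaSketch replica = new ReplicaSketch();

        replica.synchronize(leader);
        System.out.println("buffer enabled:  replica position = " + replica.bufferReaderPosition);

        leader.bufferEnabled = false;
        replica.synchronize(leader);
        System.out.println("buffer disabled: replica position = " + replica.bufferReaderPosition);
    }
}
{code}

With buffering enabled on the leader, the replica's reader stays at its initial position (0 here) even though the leader has read up to version 42; it only catches up once buffering is disabled.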

Now if the leader goes down and a non-leader becomes the new leader, its LPV is not set properly, resulting in improper sync; I am not yet sure what the impact would be in that case.

Also, since buffering is always on for non-leader nodes, if such a node later becomes the leader, its status and behavior will be {{buffer enabled}} even if buffering has been disabled for the source cluster's collection. Again, I am not sure of the impact; this needs a closer look.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>
>                 Key: SOLR-11069
>                 URL: https://issues.apache.org/jira/browse/SOLR-11069
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>    Affects Versions: 7.0
>            Reporter: Amrit Sarkar
>
> {{LASTPROCESSEDVERSION}} (a.k.a. LPV) action for CDCR breaks down due to a 
> poorly initialised and maintained buffer log on the core nodes of either the 
> source or the target cluster.
> If buffering is enabled for the cores of either the source or target cluster, 
> it returns {{-1}}, *irrespective of the number of tlog entries read by the 
> {{leader}}* node of each shard of the respective collection of the respective 
> cluster. Once disabled, it starts reporting the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work as 
> expected, i.e. it may not provide the correct seek position for the 
> {{non-leader}} nodes to advance to. I am not sure whether this is intended 
> behavior for the sync, but it surely doesn't feel right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
