[ https://issues.apache.org/jira/browse/SOLR-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087100#comment-16087100 ]
Amrit Sarkar commented on SOLR-11069:
-------------------------------------

Regarding {{updateLogSynchronizer}}:

Every time we call {{DISABLEBUFFER}} or {{ENABLEBUFFER}}, {{CdcrBufferManager::stateUpdate}} gets invoked:

{code}
@Override
public synchronized void stateUpdate() {
  CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();

  // If I am not the leader, I should always buffer my updates
  if (!leaderStateManager.amILeader()) {
    ulog.enableBuffer();
    return;
  }
  // If I am the leader, I should buffer my updates only if buffer is enabled
  else if (bufferStateManager.getState().equals(CdcrParams.BufferState.ENABLED)) {
    ulog.enableBuffer();
    return;
  }

  // otherwise, disable the buffer
  ulog.disableBuffer();
}
{code}

Non-leader nodes always have buffering enabled by default:

{code}
if (!leaderStateManager.amILeader()) {
  ulog.enableBuffer();
  return;
}
{code}

Although the LPV is always calculated on the leader, this has serious drawbacks, explained below. In {{CdcrUpdateLogSynchronizer::run}}, if buffering is {{enabled}}:

{code}
// if we received -1, it means that the log reader on the leader has not yet started to read log entries
// do nothing
if (lastVersion == -1) {
  return;
}

try {
  CdcrUpdateLog ulog = (CdcrUpdateLog) core.getUpdateHandler().getUpdateLog();
  if (ulog.isBuffering()) {
    log.debug("Advancing replica buffering tlog reader to {} @ {}:{}", lastVersion, collection, shardId);
    ulog.getBufferToggle().seek(lastVersion);
  }
}
{code}

It always returns at {{lastVersion == -1}}, and the comment {{if we received -1, it means that the log reader on the leader has not yet started to read log entries}} is misleading: the leader reports -1 whenever buffering is enabled, regardless of how many tlog entries it has actually read. Since {{lastVersion}} is never positive, the seek position on the non-leader nodes is never advanced to the appropriate LPV. If the leader then goes down and a non-leader becomes the leader, its LPV is not set properly, resulting in an improper sync, and I do not yet know what the impact would be in that case.
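To make the interaction between the two snippets concrete, here is a minimal, self-contained Java model. It is a sketch, not Solr code: {{CdcrBufferModel}}, {{shouldBuffer}}, {{leaderLastProcessedVersion}} and {{synchronize}} are hypothetical names, and the leader-side rule (LPV is {{-1}} while buffering is enabled) is assumed from the issue description rather than taken from the Solr source.

```java
// Hypothetical model of the CDCR buffer/LPV interaction described above.
// None of these names are Solr APIs; they only mirror the quoted logic.
public class CdcrBufferModel {

  public enum BufferState { ENABLED, DISABLED }

  /** Mirrors CdcrBufferManager::stateUpdate: true if the update log should buffer. */
  public static boolean shouldBuffer(boolean amILeader, BufferState bufferState) {
    if (!amILeader) {
      return true; // non-leaders always buffer, whatever the buffer state
    }
    return bufferState == BufferState.ENABLED; // the leader buffers only if enabled
  }

  /**
   * ASSUMPTION (from the issue description): while buffering is enabled the
   * leader reports -1, otherwise the version of the last tlog entry it read.
   */
  public static long leaderLastProcessedVersion(BufferState bufferState, long lastReadVersion) {
    return bufferState == BufferState.ENABLED ? -1L : lastReadVersion;
  }

  /** Mirrors the synchronizer: returns the replica's new buffer-reader seek position. */
  public static long synchronize(long replicaSeek, long lastVersion, boolean replicaBuffering) {
    if (lastVersion == -1) {
      return replicaSeek; // early return: the seek position never advances
    }
    return replicaBuffering ? lastVersion : replicaSeek;
  }

  public static void main(String[] args) {
    for (BufferState state : BufferState.values()) {
      // The replica is not the leader, so it buffers in both states.
      boolean replicaBuffering = shouldBuffer(false, state);
      // Assume the leader has already read tlog entries up to version 42.
      long lpv = leaderLastProcessedVersion(state, 42L);
      long replicaSeek = synchronize(0L, lpv, replicaBuffering);
      System.out.println(state + ": LPV=" + lpv + " replicaSeek=" + replicaSeek);
    }
  }
}
```

Running {{main}} prints {{ENABLED: LPV=-1 replicaSeek=0}} followed by {{DISABLED: LPV=42 replicaSeek=42}}: in this model, the replica's seek position only moves once the buffer is disabled, even though the leader had already read up to version 42 in both cases.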
Also, because buffering is always on for non-leader nodes, if one of them later becomes the leader, its buffer status and behavior will be {{buffer enabled}} even if we have disabled the buffer for the source collection's cluster. Again, I am not sure of the impact; this needs a closer look.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>
> Key: SOLR-11069
> URL: https://issues.apache.org/jira/browse/SOLR-11069
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: CDCR
> Affects Versions: 7.0
> Reporter: Amrit Sarkar
>
> The {{LASTPROCESSEDVERSION}} (a.k.a. LPV) action for CDCR breaks down due to a
> poorly initialised and maintained buffer log on the core nodes of either the
> source or the target cluster.
> If the buffer is enabled for the cores of either the source or target cluster, it returns
> {{-1}}, *irrespective of the number of tlog entries read by the {{leader}}*
> node of each shard of the respective collection in the respective cluster. Once the buffer is
> disabled, it starts reporting the correct LPV for each core.
> Because of the same flawed behavior, the Update Log Synchroniser may not work
> as expected, i.e. it may not provide the {{non-leader}} nodes with the correct
> position to seek to. I am not sure whether this is intended behavior for sync,
> but it certainly doesn't feel right.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)