[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158133#comment-17158133 ]

Erick Erickson commented on SOLR-14648:
---

[~sayan.das] Hmmm, I suppose we could automatically check the version stamp on all the PULL replicas; they should be comparable.

[~tflobbe] Right, I think I'd prioritize things this way:

1> Don't create a new TLOG replica if there's no leader. You can fix this manually, but it'd be something like:
1.1> copy the index from some selected PULL replica somewhere
1.2> nuke the collection
1.3> create a single TLOG replica
1.4> shut down Solr, copy the saved index back and start Solr
1.5> build out the additional PULL replicas
1.6> yuck. But at least it doesn't leave you with nothing.

2> Implement the expert flag to get you out of this mess automatically.

I'll add parenthetically that running with a single TLOG replica has no HA/DR built in by definition; it's not a good practice. Even so, we shouldn't nuke a collection, of course.

> Creating TLOG with pure multiple PULL replica, leading to 0 doc count
> ---------------------------------------------------------------------
>
> Key: SOLR-14648
> URL: https://issues.apache.org/jira/browse/SOLR-14648
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 8.3.1
> Reporter: Sayan Das
> Priority: Major
>
> With only PULL replicas, whenever we create a new TLOG replica as leader, a fresh
> replication happens, flushing the older indexes from the existing PULL replicas.
> Steps to replicate:
> # Create 1 NRT or 1 TLOG replica as leader with multiple PULL replicas
> # Index a few documents and let them replicate to all the replicas
> # Delete all the TLOG/NRT replicas, leaving only PULL types
> # Create a new TLOG/NRT replica as leader; once recovery completes it replaces all
> the older indexes
> In the ideal scenario it should have replicated from whichever PULL replica
> has the latest index, and only after that should the TLOG/NRT replica be
> registered as leader

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
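The four "Steps to replicate" in the issue description map roughly onto Collections API calls. The following is only a sketch that builds the request URLs without sending them. The base URL, the collection name `repro`, the shard name `shard1` and the replica name `core_node2` are assumptions for illustration; the real replica name would have to be read from CLUSTERSTATUS, and step 2 is a POST whose document payload is omitted here.

```python
from urllib.parse import urlencode

# Assumed base URL of a local SolrCloud node; adjust for your cluster.
SOLR = "http://localhost:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


def reproduction_steps(collection="repro", leader_replica="core_node2"):
    """Return the ordered (description, url) pairs for the reported scenario.

    `leader_replica` is a hypothetical replica name used for illustration.
    """
    return [
        ("1. create one TLOG leader plus two PULL replicas",
         collections_url("CREATE", name=collection, numShards=1,
                         tlogReplicas=1, pullReplicas=2)),
        ("2. index a few documents and commit (POST, payload omitted)",
         f"{SOLR}/{collection}/update?commit=true"),
        ("3. delete the only TLOG replica, leaving PULL replicas",
         collections_url("DELETEREPLICA", collection=collection,
                         shard="shard1", replica=leader_replica)),
        ("4. add a new TLOG replica, which then becomes leader",
         collections_url("ADDREPLICA", collection=collection,
                         shard="shard1", type="tlog")),
    ]


if __name__ == "__main__":
    for description, url in reproduction_steps():
        print(description)
        print("   ", url)
```

Running this against a throwaway cluster only; on the affected version, step 4 is the one reported to wipe the PULL replicas' indexes.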
[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158009#comment-17158009 ]

Sayan Das commented on SOLR-14648:
--

I think #2, which [~ichattopadhyaya] suggested, is the safest way to tackle this issue. We can have a flag that records the user's consent to replicating a fresh set of indexes to the PULL replicas. By default it should be the other way around: Solr checks which PULL replica has the latest index/segments (not sure if that is even possible, or whether we can just skip this step?). Once we know where to replicate from, it starts reverse replication from the "pseudo leader" PULL replica to the leader-elect TLOG replica.

Maybe we can give the user the option to choose which PULL replica to replicate from?
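On the question of whether the PULL replica with the latest index can be identified: each core's replication handler reports an index version and generation via `command=indexversion`, so a comparison along these lines seems feasible. This is a minimal sketch, not how Solr does or would do it. The `IndexStamp` type, the function names and the ordering (index version first, generation as tie-breaker) are assumptions, and the stamps are passed in as plain data rather than fetched.

```python
from typing import Dict, NamedTuple, Optional


class IndexStamp(NamedTuple):
    """Values reported by a core's replication handler (assumed comparable)."""
    indexversion: int
    generation: int


def replication_url(core_base_url: str) -> str:
    """URL that asks one core for its current index version and generation."""
    return f"{core_base_url}/replication?command=indexversion&wt=json"


def freshest_replica(stamps: Dict[str, IndexStamp]) -> Optional[str]:
    """Return the name of the replica with the highest stamp, or None if empty.

    Tuples compare field by field, so indexversion is compared first and
    generation breaks ties.
    """
    if not stamps:
        return None
    return max(stamps, key=lambda name: stamps[name])


if __name__ == "__main__":
    observed = {
        "pull1": IndexStamp(indexversion=100, generation=3),
        "pull2": IndexStamp(indexversion=250, generation=5),
    }
    print(freshest_replica(observed))
```

This only ranks the replicas that actually respond; it says nothing about updates that never reached any of them.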
[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157773#comment-17157773 ]

Tomas Eduardo Fernandez Lobbe commented on SOLR-14648:
--

bq. When a TLOG replica comes up and sees there are no other TLOG/NRT replicas to replicate from, either

+1. I think #1 is very important (don't nuke a running cluster), and #2 is desired (it'll help users get unstuck once they've fallen into #1).

bq. Hmmm, or would the ability to "promote" a replica from PULL to TLOG work?

No other "replica type change" is currently supported. And still, this would be an operation that causes data loss. We should make sure it's not used under regular circumstances, etc. I think this would be tricky.
[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157579#comment-17157579 ]

Erick Erickson commented on SOLR-14648:
---

(2) seems safer, assuming that the TLOG replica never gets created without the special flag. Maybe the expert flag would name the PULL replica to grab the index from?

(1) seems trickier. It'd have to be generalized so the replica _never_ becomes the leader, even if the cluster is restarted afterwards. Having a TLOG replica successfully created seems like it'd have more places to go wrong. Also, how would the cluster get out of that state? You'd have to do something like FORCELEADER, which would have some of the same problems...
[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157456#comment-17157456 ]

Ishan Chattopadhyaya commented on SOLR-14648:
-

Agree, Erick. This will result in data loss. I propose we fix this as follows:

When a TLOG replica comes up and sees there are no other TLOG/NRT replicas to replicate from, either
# The TLOG replica doesn't become a leader, since it is behind the PULL replicas. This will stop the scary situation where it becomes leader with an empty index.
# Or, when the ADDREPLICA command contains some special expert-level flag, it replicates from the PULL replica (with the full knowledge that there will be a data loss situation).

WDYT, [~sayan.das], [~erickerickson]?
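The two options proposed above can be read as a small decision rule for a newly started TLOG replica. The sketch below is purely illustrative and is not Solr code: the `Decision` enum, the `decide` function and the `seed_from_pull` parameter (standing in for the not-yet-defined expert flag) are all hypothetical names, and treating a shard with no replicas at all as safe to lead is an added assumption.

```python
from enum import Enum
from typing import Iterable, Optional


class Decision(Enum):
    RECOVER_FROM_LEADER = "recover from an existing TLOG/NRT replica"
    BECOME_LEADER = "shard is empty, safe to become leader"
    REFUSE_LEADERSHIP = "only PULL replicas exist, do not become leader"
    SEED_FROM_PULL = "expert flag given, replicate from the named PULL replica"


def decide(existing_types: Iterable[str],
           seed_from_pull: Optional[str] = None) -> Decision:
    """Choose what a newly started TLOG replica should do.

    existing_types: types ("NRT", "TLOG", "PULL") of the other replicas
    in the shard. seed_from_pull: hypothetical expert flag naming the
    PULL replica to copy the index from.
    """
    types = {t.upper() for t in existing_types}
    if types & {"NRT", "TLOG"}:
        return Decision.RECOVER_FROM_LEADER
    if not types:
        # Assumption: with no replicas at all there is no index to lose.
        return Decision.BECOME_LEADER
    if seed_from_pull is not None:
        return Decision.SEED_FROM_PULL
    return Decision.REFUSE_LEADERSHIP


if __name__ == "__main__":
    print(decide(["PULL", "PULL"]).value)
    print(decide(["PULL", "PULL"], seed_from_pull="core_node5").value)
```

Keeping refusal as the default means the data-losing path is only reachable when the operator names a source replica explicitly.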
[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count
[ https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157424#comment-17157424 ]

Erick Erickson commented on SOLR-14648:
---

First, I agree that nuking all the existing PULL replica indexes is A Bad Thing in the scenario you point out.

That said, I don't think automatically picking a PULL replica to grab the index from is a good solution, since there's no guarantee that any PULL replica is up to date. Consider this scenario: PULL replica1 is offline while the TLOG and other PULL replicas get lots of updates. For some unfathomable reason, everything except replica1 is taken down. Now a new TLOG replica is created and it gets the index from replica1, which isn't up to date at all.

This is just the most egregious scenario. In a more realistic scenario, the PULL replicas may or may not have gotten the latest index changes when the TLOG replica goes down, so how would it be possible to choose among them? In fact, there's no guarantee at all that _any_ of the remaining PULL replicas have the latest updates.

I'm thinking of something along the lines of making the creation of a new TLOG replica fail in your scenario. More generally, it would fail if there are no active TLOG replicas (leaders) in the existing collection. We'd need a way to fix this, perhaps a way to "promote" a PULL replica to a TLOG replica. There'd still be the possibility of losing documents, but just like FORCELEADER we can document this and let people decide whether it's worth the risk or whether they should just reindex everything.

We should not risk data inconsistency without somehow making sure that the users understand the risk.
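The replica1 scenario described above can be made concrete with a toy model. This is a simulation only, with made-up version numbers and hypothetical names (`Replica`, `replicate`, `run_scenario`); it does not talk to Solr. It shows that the best version among the surviving replicas can be well behind what the lost leader had.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    version: int = 0
    online: bool = True


def replicate(leader_version: int, replicas: List[Replica]) -> None:
    """Copy the leader's index version to every replica that is online."""
    for replica in replicas:
        if replica.online:
            replica.version = leader_version


def best_surviving_version(replicas: List[Replica]) -> Optional[int]:
    """Highest index version among online replicas, or None if all are down."""
    versions = [r.version for r in replicas if r.online]
    return max(versions) if versions else None


def run_scenario() -> Tuple[int, Optional[int]]:
    """Return (last leader version, best version left among PULL replicas)."""
    pulls = [Replica("replica1"), Replica("replica2"), Replica("replica3")]
    leader_version = 1
    replicate(leader_version, pulls)   # all PULL replicas reach version 1
    pulls[0].online = False            # replica1 goes offline
    leader_version = 5                 # the TLOG leader takes many updates
    replicate(leader_version, pulls)   # only replica2 and replica3 catch up
    for replica in pulls:              # everything except replica1 goes down
        replica.online = replica.name == "replica1"
    return leader_version, best_surviving_version(pulls)


if __name__ == "__main__":
    print(run_scenario())
```

In this run the only replica left to seed a new TLOG replica from is four versions behind, which is why an automatic choice cannot promise the latest updates.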