[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-15 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158133#comment-17158133
 ] 

Erick Erickson commented on SOLR-14648:
---

[~sayan.das] Hmmm, I suppose we could automatically check the version stamp on 
all the PULL replicas; they should be comparable.
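
A minimal sketch of that check, assuming each PULL replica's stamp is read from the replication handler (`/replication?command=indexversion`, which reports `indexversion` and `generation`). The `Stamp` type, the `pick_freshest` helper and the core URL are hypothetical, and the stamps are passed in rather than fetched so the comparison stands alone. Ordering by index version first and generation second is an assumption, and whether stamps from different replicas are truly comparable is still the open question here.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import urlencode


class Stamp(NamedTuple):
    generation: int    # commit generation reported by the replica
    indexversion: int  # index version reported by the replica


def indexversion_url(core_url: str) -> str:
    """Build the replication-handler request that reports a core's stamp."""
    query = urlencode({"command": "indexversion", "wt": "json"})
    return f"{core_url.rstrip('/')}/replication?{query}"


def pick_freshest(stamps: Dict[str, Stamp]) -> Optional[str]:
    """Return the replica with the highest (indexversion, generation), or None."""
    if not stamps:
        return None
    return max(
        stamps,
        key=lambda name: (stamps[name].indexversion, stamps[name].generation),
    )
```

A caller would fetch one stamp per PULL replica, then pass the resulting mapping to `pick_freshest`.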

[~tflobbe] Right, I think I'd prioritize things this way:
1> don't create a new TLOG replica if there's no leader. You can fix this 
manually, but it'd be something like:
1.1> copy the index from some selected PULL replica somewhere
1.2> nuke the collection
1.3> create a single TLOG replica
1.4> shut down Solr, copy the saved index back and start Solr
1.5> build out the additional PULL replicas
1.6> yuck. but at least it doesn't leave you with nothing.

2> Implement the expert flag to get you out of this mess automatically.
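
For what it's worth, steps 1.1 through 1.5 could be written down roughly as below. This is only a sketch: it assumes a single-shard collection and a Solr node at `localhost:8983`, the `recovery_plan` helper is hypothetical, and the file-copy steps are left as manual notes because the index paths depend on the installation. The Collections API actions used (DELETE, CREATE, ADDREPLICA) are the standard ones; the code only builds the URLs and does not call them.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed Solr address


def api(action: str, **params: str) -> str:
    """Build a Collections API URL for one action."""
    return f"{BASE}/admin/collections?" + urlencode({"action": action, **params})


def recovery_plan(collection: str, pull_replicas: int) -> list:
    """Ordered (step, what to do) pairs mirroring 1.1 through 1.5."""
    plan = [
        ("1.1", "manual: copy the index directory of the chosen PULL replica somewhere safe"),
        ("1.2", api("DELETE", name=collection)),
        ("1.3", api("CREATE", name=collection, numShards="1",
                    nrtReplicas="0", tlogReplicas="1")),
        ("1.4", "manual: stop Solr, copy the saved index back, start Solr"),
    ]
    # 1.5: one ADDREPLICA call per PULL replica to rebuild
    plan += [("1.5", api("ADDREPLICA", collection=collection,
                         shard="shard1", type="pull"))] * pull_replicas
    return plan
```

Printing `recovery_plan("test", 2)` gives a checklist that can be followed by hand.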

I'll add parenthetically that running with a single TLOG replica has no HA/DR 
built in by definition, so it's not a good practice. Even so, we shouldn't nuke 
a collection, of course.

> Creating TLOG with pure multiple PULL replica, leading to 0 doc count
> -
>
> Key: SOLR-14648
> URL: https://issues.apache.org/jira/browse/SOLR-14648
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 8.3.1
> Reporter: Sayan Das
> Priority: Major
>
> With only PULL replicas left, whenever we create a new TLOG replica as 
> leader, a fresh replication happens, which flushes the older indexes from the 
> existing PULL replicas.
> Steps to reproduce:
>  # Create 1 NRT or 1 TLOG replica as leader with multiple PULL replicas
>  # Index a few documents and let them replicate to all the replicas
>  # Delete all the TLOG/NRT replicas, leaving only PULL types
>  # Create a new TLOG/NRT replica as leader; once recovery completes, it 
> replaces all the older indexes
> In the ideal scenario it should have replicated from whichever PULL replica 
> has the latest index, and only after that should the TLOG/NRT replica be 
> registered as leader.
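
The reported steps map onto Collections API calls roughly as follows. This is a sketch under assumptions, not the reporter's actual commands: the Solr address, the collection name and the leader's replica name (`core_node2` in the usage note) are made up, and the indexing step is omitted. The code only builds the URLs.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed Solr address


def collections_api(action: str, **params: str) -> str:
    """Build a Collections API URL for the given action and parameters."""
    return f"{BASE}/admin/collections?" + urlencode({"action": action, **params})


def repro_steps(collection: str, leader_replica: str) -> list:
    """URLs for the reported sequence; indexing happens after the first call."""
    return [
        # Step 1: one TLOG leader plus two PULL replicas
        collections_api("CREATE", name=collection, numShards="1",
                        nrtReplicas="0", tlogReplicas="1", pullReplicas="2"),
        # Step 3: delete the only TLOG replica, leaving just the PULL replicas
        collections_api("DELETEREPLICA", collection=collection,
                        shard="shard1", replica=leader_replica),
        # Step 4: add a fresh TLOG replica, which starts with an empty index
        collections_api("ADDREPLICA", collection=collection,
                        shard="shard1", type="tlog"),
    ]
```

For example, `repro_steps("test", "core_node2")` returns the three URLs in the order they would be issued.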



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-15 Thread Sayan Das (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158009#comment-17158009
 ] 

Sayan Das commented on SOLR-14648:
--

I think #2, which [~ichattopadhyaya] suggested, is the safest way to tackle this 
issue. We can have a flag that records the user's consent to freshly 
replicating a new set of indexes to the PULL replicas. By default it should be 
the other way around: it would check which PULL replica has the latest 
index/segments (not sure if that is even possible, or can we just skip this 
step?). Once we know where to replicate from, it would start reverse 
replication from the "pseudo leader" PULL replica to the leader-elect TLOG 
replica.
Maybe we can give the user the option to choose which PULL replica to 
replicate from?




[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-14 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157773#comment-17157773
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-14648:
--

> When a TLOG replica comes up and sees there are no other TLOG/NRT replicas 
> to replicate from, either

+1. I think #1 is very important (don't nuke a running cluster), #2 is desired 
(it'll help users get unstuck once they've fallen into #1).

> Hmmm, or would the ability to "promote" a replica from PULL to TLOG work?

No other "replica type change" is currently supported. And still, this would be 
an operation that causes data loss. We should make sure it's not used under 
regular circumstances, etc. I think this would be tricky.




[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157579#comment-17157579
 ] 

Erick Erickson commented on SOLR-14648:
---

(2) seems safer, assuming that the TLOG replica never gets created without the 
special flag. Maybe the expert flag would be the PULL replica to grab the index 
from?

(1) seems trickier. It'd have to be generalized to _never_ become the leader, 
even if the cluster was restarted afterwards. Having a TLOG replica 
successfully created seems like it'd have more places to go wrong. Also, how 
would the cluster get out of that state? You'd have to do something like 
FORCELEADER which would have some of the same problems...




[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157456#comment-17157456
 ] 

Ishan Chattopadhyaya commented on SOLR-14648:
-

Agree, Erick. This will result in data loss.
I propose we fix this as follows.
When a TLOG replica comes up and sees there are no other TLOG/NRT replicas to 
replicate from, either
# TLOG replica doesn't become a leader, since it is behind the PULL replicas. 
This will stop the scary situation where it becomes leader with an empty index.
# Or, when the ADDREPLICA command contains some special expert level flag, it 
replicates from the PULL replica (with the full knowledge that there will be a 
data loss situation).
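
The two options can be sketched as one decision function. This is illustrative only and is not Solr code: `replicate_from_pull` stands in for the proposed expert flag and is a hypothetical name, not an existing ADDREPLICA parameter, and the empty-collection case is an added assumption.

```python
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    REPLICATE_FROM_LEADER = "replicate from the existing leader"
    BECOME_LEADER = "become leader (no other replicas, nothing to lose)"
    REPLICATE_FROM_PULL = "replicate from the named PULL replica"
    STAY_NON_LEADER = "do not become leader"


def on_tlog_replica_start(other_replica_types: List[str],
                          replicate_from_pull: Optional[str] = None) -> Outcome:
    """Decide what a new TLOG replica does, given the other replicas' types."""
    if any(t in ("TLOG", "NRT") for t in other_replica_types):
        return Outcome.REPLICATE_FROM_LEADER  # normal case
    if not other_replica_types:
        return Outcome.BECOME_LEADER          # brand-new shard (assumed case)
    if replicate_from_pull is not None:
        return Outcome.REPLICATE_FROM_PULL    # option 2: explicit consent
    return Outcome.STAY_NON_LEADER            # option 1: never lead while empty
```

With only PULL replicas present and no flag, the function returns `Outcome.STAY_NON_LEADER`, which is the safe default argued for above.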

WDYT, [~sayan.das], [~erickerickson]?




[jira] [Commented] (SOLR-14648) Creating TLOG with pure multiple PULL replica, leading to 0 doc count

2020-07-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157424#comment-17157424
 ] 

Erick Erickson commented on SOLR-14648:
---

First, I agree that nuking all the existing PULL replica indexes is A Bad Thing 
in the scenario you point out.

That said I don't think picking a PULL replica to grab the index from 
automatically is a good solution since there's no guarantee that any PULL 
replica is up to date. Consider this scenario:

PULL replica1 is offline

The TLOG and other PULL replicas get lots of updates. For some unfathomable 
reason, everything except replica1 is taken down. Now a new TLOG replica is 
created and it gets the index from replica1, which isn't up to date at all.

This is just the most egregious scenario. In a more realistic scenario, the 
PULL replicas may or may not have gotten the latest index changes when the TLOG 
replica goes down, so how would it be possible to choose among them? In fact 
there's no guarantee at all that _any_ of the PULL replicas remaining have the 
latest updates.
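
To make that concrete, here is a toy sketch with made-up version numbers (nothing below comes from a real cluster, and `best_available` is a hypothetical helper). Picking the best replica that happens to be online still leaves the new leader behind the old one, in both the egregious case and the more realistic one.

```python
from typing import Dict, Optional, Set


def best_available(versions: Dict[str, int], online: Set[str]) -> Optional[str]:
    """Pick the online replica with the highest index version."""
    candidates = {name: v for name, v in versions.items() if name in online}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical index versions: replica1 was offline while the others caught up.
versions = {"replica1": 10, "replica2": 42, "replica3": 41}
leader_version = 43  # the deleted TLOG leader was ahead of every PULL replica

# Egregious case: only the stale replica1 is up.
chosen = best_available(versions, {"replica1"})
behind_by = leader_version - versions[chosen]

# More realistic case: the best online replica is still behind the old leader.
realistic = best_available(versions, {"replica2", "replica3"})
still_behind = versions[realistic] < leader_version
```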

I'm thinking of something along the lines of having the creation of a new TLOG 
replica fail in your scenario. More generally, it would fail if there are no 
active TLOG replicas (leaders) in the existing collection. We'd need a way to fix this, 
perhaps a way to "promote" a PULL replica to a TLOG replica. There'd still be 
the possibility of losing documents, but just like FORCELEADER we can document 
this and let people decide whether it's worth the risk or they should just 
reindex everything.

We should not risk data inconsistency without somehow making sure that the 
users understand the risk.
