[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2013-03-22 Thread Commit Tag Bot (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610637#comment-13610637
]

Commit Tag Bot commented on SOLR-3939:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1402362

SOLR-3933: Distributed commits are not guaranteed to be ordered within a
request.

SOLR-3939: An empty or just replicated index cannot become the leader of a
shard after a leader goes down.

SOLR-3971: A collection that is created with numShards=1 turns into a
numShards=2 collection after starting up a second core and not specifying
numShards.

SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.

An empty or just replicated index cannot become the leader of a shard after a
leader goes down.
---

Key: SOLR-3939
URL: https://issues.apache.org/jira/browse/SOLR-3939
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0-BETA, 4.0
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Critical
Labels: 4.0.1_Candidate
Fix For: 4.1, 5.0

Attachments: cloud2.log, cloud.log, SOLR-3939.patch, SOLR-3939.patch

When a leader core is unloaded using the CoreAdmin API, the followers in the
shard go into recovery but do not come out. Leader election doesn't take
place and the shard goes down.
This affects the ability to move a micro-shard from one Solr instance to
another Solr instance.
The problem does not occur 100% of the time, but it does occur a large
percentage of the time.
To set up a test, start up SolrCloud with a single shard. Add cores to that
shard as replicas using core admin. Then unload the leader core using core
admin.
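
The reproduction steps above can be sketched as CoreAdmin requests. This is only an illustration: the host names, ports, core names, and collection name are hypothetical, and the snippet builds the URLs without sending them.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, **params):
    """Build a CoreAdmin request URL; nothing is sent over the network."""
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Hypothetical nodes, core names, and collection name (not from the report).
NODE_A = "http://localhost:8983/solr"
NODE_B = "http://localhost:7574/solr"

repro_steps = [
    # Add a second core to shard1 as a replica.
    core_admin_url(NODE_B, action="CREATE", name="replica1",
                   collection="collection1", shard="shard1"),
    # Unload the core that currently leads shard1.
    core_admin_url(NODE_A, action="UNLOAD", core="leader1"),
]

for url in repro_steps:
    print(url)
```

After the UNLOAD request, the remaining core would be expected to win the leader election; the bug is that it stays in recovery instead.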

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2013-03-22 Thread Commit Tag Bot (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610642#comment-13610642
]

Commit Tag Bot commented on SOLR-3939:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1402361

SOLR-3933: Distributed commits are not guaranteed to be ordered within a
request.

SOLR-3939: An empty or just replicated index cannot become the leader of a
shard after a leader goes down.

SOLR-3971: A collection that is created with numShards=1 turns into a
numShards=2 collection after starting up a second core and not specifying
numShards.

SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2013-03-22 Thread Commit Tag Bot (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610678#comment-13610678
]

Commit Tag Bot commented on SOLR-3939:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1397672

SOLR-3939: Consider a sync attempt from leader to replica that fails due to 404
a success.
SOLR-3940: Rejoining the leader election incorrectly triggers the code path for
a fresh cluster start rather than fail over.

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-26 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485000#comment-13485000
]

Mark Miller commented on SOLR-3939:
---

Okay, I'm going to resolve this - we can make a new issue for the case where a
replica comes up and is ahead somehow.

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-25 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484611#comment-13484611
]

Mark Miller commented on SOLR-3939:
---

I've committed my latest work to 4x, Joel. Can you do a bit more testing with a
recent checkout?

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-25 Thread Joel Bernstein (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484652#comment-13484652
]

Joel Bernstein commented on SOLR-3939:
--

I ran the Oct 14th test and the leader election worked perfectly. Then I tested
shutting down the leader VM instead of unloading the leader core, and this
worked fine.

Then I tried a leader with two replicas that had both just been replicated to.
When I unloaded the leader, neither replica became leader. But this was the case
that was not yet accounted for, I believe.

I can't think of a use case where the second scenario would happen though.

The first scenario though is critical for migrating micro-shards, so it's great
that you committed this.

Thanks for your work on this issue.

Joel

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-25 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484683#comment-13484683
]

Yonik Seeley commented on SOLR-3939:

bq. Isn't that what capturing the starting versions is all about?

For a node starting up, yeah. For a leader syncing to someone else - I don't
think it should matter.

bq. but if you want to peer sync from the leader to a replica that is coming
back up, if updates are coming in, you are going to force a replication anyway.

If updates were coming in fast enough during the bounce... I guess so.

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-24 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483567#comment-13483567
]

Yonik Seeley commented on SOLR-3939:

Trying to think if this could happen when there are versions too... say that
instead of having no versions, we just have old versions from before we did the
replication. This may argue for somehow marking the start of a replication in
the transaction log and then never retrieving versions older than that.
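
A minimal sketch of that idea, assuming a toy in-memory log. The `TransactionLog` class and its method names are hypothetical and are not Solr's update log API; the point is only that a marker written when replication starts hides every older version from later lookups.

```python
class TransactionLog:
    """Toy update log: versions in arrival order plus a replication marker."""

    def __init__(self):
        self.entries = []            # versions in arrival order
        self.replication_start = 0   # index of the first entry after the marker

    def add(self, version):
        self.entries.append(version)

    def mark_replication_start(self):
        # Everything logged so far predates the replicated index.
        self.replication_start = len(self.entries)

    def recent_versions(self, limit=100):
        # Never hand out versions older than the marker.
        return self.entries[self.replication_start:][-limit:]


log = TransactionLog()
for version in (1, 2, 3):
    log.add(version)

log.mark_replication_start()
print(log.recent_versions())  # [] instead of the stale [1, 2, 3]

log.add(1000001)
print(log.recent_versions())  # [1000001]
```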

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-24 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483595#comment-13483595
]

Yonik Seeley commented on SOLR-3939:

Thinking of some scenarios where this could happen:

1. R1,R2 both up and active, add docs 1,2,3
2. bring R2 down
3. add docs 4 through 1million
4. bring R2 up, peersync fails, replication is kicked off
5. R2 finishes replication and becomes active, but its recent versions still
list 1,2,3
6. bring R1 down, R2 becomes the leader
7. bring R1 up, it does a peer-sync with R2, which looks like it has really old
versions (and succeeds because of that)
8. if the leader (R2) does a peer-sync back with R1, it will fail (not sure of
the consequences of this)

Another variation... if there's an update between 6 and 7:
6.5. add doc 1million+1

This will cause recent versions of R2 to be 1,2,3,1million+1
It would be good to verify that peersync to the leader will either fail
(causing full replication), or pick up the new document.
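
The scenario can be walked through with a toy model. This is a sketch under loud assumptions: the `Replica` class is hypothetical, the document count is scaled down from one million to 1000, and replication is modelled as an index copy that leaves the recent-versions list untouched, which is the behaviour the steps describe rather than a claim about Solr's code.

```python
class Replica:
    """Toy replica: an index plus the versions recorded in its update log."""

    def __init__(self, name):
        self.name = name
        self.index = set()   # versions present in the index
        self.recent = []     # versions recorded in the update log

    def update(self, version):
        self.index.add(version)
        self.recent.append(version)

    def replicate_from(self, other):
        # Full index copy; the update log is not touched (the problem case).
        self.index = set(other.index)


r1, r2 = Replica("R1"), Replica("R2")

for version in (1, 2, 3):        # step 1: both replicas receive docs 1,2,3
    r1.update(version)
    r2.update(version)

for version in range(4, 1001):   # steps 2-3: R2 is down, R1 keeps indexing
    r1.update(version)

r2.replicate_from(r1)            # steps 4-5: R2 comes back and replicates
print(len(r2.index))             # 1000
print(r2.recent)                 # [1, 2, 3]

r2.update(1001)                  # step 6.5: one more doc while R2 leads
print(r2.recent)                 # [1, 2, 3, 1001]
```

R2's index is complete, yet its recent versions suggest it has seen almost nothing, which is why a peer sync against it can appear to succeed.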

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-24 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483600#comment-13483600
]

Mark Miller commented on SOLR-3939:
---

Currently the leader does not peer sync back to a replica coming up because it
would have to buffer updates.

I think that if a replica is somehow ahead of the leader when coming back,
peersync should fail and it should replicate. I think since this is not a
common case, that is much simpler than trying to peersync back from the leader
to the replica in this case.

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-24 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483649#comment-13483649
]

Yonik Seeley commented on SOLR-3939:

bq. Currently the leader does not peer sync back to a replica coming up because
it would have to buffer updates.

peer sync doesn't require buffering updates. AFAIK, we don't do that until we
realize we need to replicate?

[jira] [Commented] (SOLR-3939) An empty or just replicated index cannot become the leader of a shard after a leader goes down.

2012-10-24 Thread Mark Miller (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483664#comment-13483664
]

Mark Miller commented on SOLR-3939:
---

As far as I remember, if updates are coming in when you try and peer sync, we
fail it? Isn't that what capturing the starting versions is all about?

When a leader syncs with its replicas on leader election, we know docs are not
coming in, so we don't worry about that starting versions check - but if you
want to peer sync from the leader to a replica that is coming back up, if
updates are coming in, you are going to force a replication anyway. Since it's
already an uncommon case, it doesn't seem worth tackling. I mention buffering
because it seemed you would have to buffer in order to peer sync when updates
are coming in (or block updates).
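
A rough sketch of the starting-versions idea as described in this comment. It is a simplified reading for illustration, not Solr's PeerSync logic: the function name, its parameters, and the strict equality check are all assumptions.

```python
def peer_sync(starting_versions, current_versions, peer_versions,
              check_starting=True):
    """Return the versions to fetch from the peer, or None to force replication.

    starting_versions: versions captured when the node started up
    current_versions:  versions the node holds when the sync is attempted
    peer_versions:     recent versions reported by the peer
    check_starting:    False models the leader-election case, where no
                       updates are arriving and the check is skipped
    """
    if check_starting and list(current_versions) != list(starting_versions):
        # Updates arrived after startup, so give up and replicate instead.
        return None
    ours = set(current_versions)
    return [v for v in peer_versions if v not in ours]


# No updates arrived since startup: fetch only what is missing.
print(peer_sync([1, 2, 3], [1, 2, 3], [2, 3, 4, 5]))     # [4, 5]

# An update (6) arrived during the sync attempt: fall back to replication.
print(peer_sync([1, 2, 3], [1, 2, 3, 6], [2, 3, 4, 5]))  # None
```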
