[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248516#comment-13248516 ]

Michael Garski commented on SOLR-2358:

I have a use case for shard distribution based on something other than a hash on the document's unique id, and I was wondering if there are any thoughts as to how such functionality should be implemented. It looks like SOLR-2341 (Shard distribution policy) and SOLR-2592 (pluggable shard lookup mechanism) complement each other for indexing and searching, and I was wondering if anyone had thoughts as to the approach to take.

Distributing Indexing
  Key: SOLR-2358
  URL: https://issues.apache.org/jira/browse/SOLR-2358
  Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
  Reporter: William Mayor
  Priority: Minor
  Fix For: 4.0
  Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar

The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195732#comment-13195732 ]

Robert Muir commented on SOLR-2358:

{quote}
I can't currently get into the hudson machine - used the wrong username the other day and seemed to get ip banned pretty much right away. Looking into getting that undone.
{quote}

Yeah, that's probably the best way to move forward. Otherwise you have to wait like an hour just to see if one tweak to a single test worked.

{quote}
Which tricks? This could be part of it by the sound of things.
{quote}

It depends on what the test is doing, but just a few ideas:
* any client operations in tests should have a low connect() timeout/so_timeout. if you always set this then it will never hang for long periods of time.
* if you absolutely need to test the case where you don't get a timeout but another exception, use an ipv6 test address (eg [ff01::114]). because jenkins has no ipv6, it fails fast always. this won't work forever...
* in a situation where you have A talking to B, and you want to test a condition where B goes down, instead of just bringing B down you can consider mocking up a remote node to test failures. bring up a mock downed server (e.g. just a ServerSocket on that same port with reuseAddress=true). this one can return whatever error you want, or just disconnect, and even assert that A tried to connect to it. maybe instead of using real remote jettys at all, most tests could even be totally implemented this way: it would be faster and simpler than spinning up so many jettys in all the tests.
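The timeout and mock-server ideas above could look roughly like this in plain Java. This is only a sketch: `MockDownedServer`, `probe`, and the 2-second timeout are made-up names and values for illustration, not anything from the Solr test framework. It binds a `ServerSocket` to an ephemeral loopback port (rather than reusing the real node's port), drops every connection as soon as it is accepted, and counts attempts so a test can assert that A really tried to reach B.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a downed node: accepts connections and drops them at once. */
class MockDownedServer implements AutoCloseable {
  private final ServerSocket server;
  private final AtomicInteger attempts = new AtomicInteger();
  private final Thread acceptor;

  MockDownedServer() throws IOException {
    server = new ServerSocket();
    server.setReuseAddress(true); // must be set before bind()
    server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // port 0 = any free port
    acceptor = new Thread(() -> {
      while (!server.isClosed()) {
        try (Socket s = server.accept()) {
          attempts.incrementAndGet(); // record the attempt, then close the socket (disconnect)
        } catch (IOException e) {
          return; // the server socket was closed; stop accepting
        }
      }
    });
    acceptor.setDaemon(true);
    acceptor.start();
  }

  int port() { return server.getLocalPort(); }

  int attempts() { return attempts.get(); }

  @Override
  public void close() throws IOException { server.close(); }

  /**
   * Client side: always sets a connect timeout and an so_timeout (read timeout).
   * Returns the result of read() (-1 on a clean disconnect) or -2 on any I/O error.
   */
  static int probe(int port, int timeoutMs) {
    try (Socket client = new Socket()) {
      client.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), timeoutMs);
      client.setSoTimeout(timeoutMs);
      return client.getInputStream().read();
    } catch (IOException e) {
      return -2;
    }
  }

  public static void main(String[] args) throws IOException {
    try (MockDownedServer down = new MockDownedServer()) {
      int result = probe(down.port(), 2000);
      System.out.println("read() result: " + result + ", attempts seen: " + down.attempts());
    }
  }
}
```

Since the client sets both timeouts, `probe` should return within a bounded time either way; a real test would substitute whatever client code talks to the remote node.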
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195759#comment-13195759 ]

Mark Miller commented on SOLR-2358:

These tests really need to be done with real jetty instances (at least some of them). I'll try adding some timeouts where we are not currently using them (generally they are used from any test code but not always in non-test code).
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195761#comment-13195761 ]

Yonik Seeley commented on SOLR-2358:

We should be careful of using socket read timeouts in non-test code for operations that could potentially take a long time... commit, optimize, and even query requests (depending on what the request is). By default, solr does not currently time out requests because we don't know what the upper bound is.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195783#comment-13195783 ]

Mark Miller commented on SOLR-2358:

Yup, I agree - in general in non-test code we don't want to time out by default - that is why I've stuck to only using them in the tests until now. I've tried adding one to the Solr cmd distributor for a bit though - just to see if that helps on Jenkins any. I'd like to narrow in and at least know if this is the problem or not (blackhole hangups).

For some things, like a request to recover, timeouts may be fine I think.

Once I am able to log into jenkins again, I can hopefully narrow down what is happening a lot faster.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195787#comment-13195787 ]

Yonik Seeley commented on SOLR-2358:

bq. For some things, like a request to recover, timeouts may be fine I think.

Definitely - we have a lot better handle on Solr created requests. Replication (although it can take a long time to send a big file, there shouldn't be long periods where no packets are sent), PeerSync, etc. Although IIRC, a new cloud-style replication request involves the recipient doing a commit?
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195689#comment-13195689 ]

Mark Miller commented on SOLR-2358:

bq. Should another issue be opened for the tests?

I have another issue for the test problem: SOLR-3066

bq. Do the failures reproduce if you ssh into the hudson machine itself and test from there?

I can't currently get into the hudson machine - used the wrong username the other day and seemed to get ip banned pretty much right away. Looking into getting that undone.

bq. Do any tests rely upon not being able to connect to a tcp/udp port

Sometimes, yes - because jetties are going up and down during these tests, sometimes you wouldn't be able to connect - I wouldn't say we rely on it, but it seems it could happen.

bq. unless you do some tricks.

Which tricks? This could be part of it by the sound of things.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194360#comment-13194360 ]

Robert Muir commented on SOLR-2358:

Should another issue be opened for the tests?

Do the failures reproduce if you ssh into the hudson machine itself and test from there? I've found this useful before when things are hard to reproduce.

Do any tests rely upon *not* being able to connect to a tcp/udp port (even localhost)? Our hudson machine has an interesting network configuration: it blackholes connections to closed ports, so any tests that rely upon this will just hang (for a very long time!) unless you do some tricks. This is actually great for testing (imo), because it simulates how a real outage can behave, but it is likely different from anyone's local machine.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193254#comment-13193254 ]

Mark Miller commented on SOLR-2358:

Okay, I just hit commit. I expect I'll have to do some more test hardening, but I will be pretty responsive to that initially.

I have not worked out the whole changes entry and how to handle all of these sub issues - but I will start on that and leave this issue unresolved until I get that done (today or tomorrow depending on how it goes).
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193434#comment-13193434 ]

Mark Miller commented on SOLR-2358:

I knew hudson would get me - that series of tubes runs stuff in some funny land I always have a hard time reproducing. I've ignored a couple tests for the very short term while I try and replicate the first fails on my mac, linux box, or windows VM. So far, it's proving difficult to replicate those fails, but I'll keep banging away over the short term.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192219#comment-13192219 ]

Yonik Seeley commented on SOLR-2358:

+1, looks good!
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191394#comment-13191394 ]

Mark Miller commented on SOLR-2358:

Okay, tests are passing on my linux box, mac and windows vm. I am working on a patch right now to highlight the changes, then I plan on committing this issue in a day or two. From there, we can iterate on any rough edges on trunk.

Distributing Indexing
  Key: SOLR-2358
  URL: https://issues.apache.org/jira/browse/SOLR-2358
  Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
  Reporter: William Mayor
  Priority: Minor
  Fix For: 4.0
  Attachments: 2shard4server.jpg, SOLR-2358.patch

The indexing side of SolrCloud - the goal of this issue is to provide durable, fault tolerant indexing to an elastic cluster of Solr instances.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190181#comment-13190181 ]

Mark Miller commented on SOLR-2358:

I'm ready to start looking at merging this branch to trunk - the primary blocker to that that I see at the moment is that org.apache.solr.search.TestRecovery does not pass on Windows. After that is resolved, I hope to start the merge process!
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190188#comment-13190188 ]

Yonik Seeley commented on SOLR-2358:

bq. the primary blocker to that that I see at the moment is that org.apache.solr.search.TestRecovery does not pass on Windows

Yeah, it's the old transaction logs that are still open after a shutdown (and the test tries to remove those log files). I'm in the middle of some deleteByQuery stuff right now, but I should be able to figure out a workaround for the TestRecovery issue this weekend.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189347#comment-13189347 ]

Mark Miller commented on SOLR-2358:

I've tried to make good use of atLeast to minimize the times of some of the larger new solrcloud tests, but they are still not super light weight (a few of the new ones spin up multiple jetty instances).

Here is where they currently stand in comparison to current tests without any nightly or multiplier boosts:

{noformat}
Worst Times:
test:org.apache.solr.cloud.FullSolrCloudTest time:33.933
test:org.apache.solr.handler.TestReplicationHandler time:30.002
test:org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest time:24.572
test:org.apache.solr.cloud.ChaosMonkeySafeLeaderTest time:24.271
test:org.apache.solr.cloud.RecoveryZkTest time:22.875
test:org.apache.solr.cloud.FullSolrCloudDistribCmdsTest time:22.161
test:org.apache.solr.cloud.BasicDistributedZkTest time:16.696
test:org.apache.solr.search.TestRealTimeGet time:16.385
test:org.apache.solr.TestDistributedGrouping time:15.136
test:org.apache.solr.TestDistributedSearch time:14.609
{noformat}
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184537#comment-13184537 ]

Mark Miller commented on SOLR-2358:

This came up in a conversation with a user in #solr IRC - we really want to change the search param distrib to default to true rather than false when in SolrCloud mode.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184621#comment-13184621 ]

Mark Miller commented on SOLR-2358:

I've made the above change in the branch.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183238#comment-13183238 ]

Darren Govoni commented on SOLR-2358:

Great job Mark. Thanks!
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183436#comment-13183436 ]

Mark Miller commented on SOLR-2358:

bq. Perhaps within couple/few weeks, after we stabilize and finish up some hanging work?

I think we are pretty close to this! There are only a few more nocommits to work down. There is more to add, but I think we will have something stable enough to start iterating on in trunk - hopefully that will trigger even more testing and feedback - it is getting toward the point where the cost of the branch is starting to outweigh the benefits.
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182305#comment-13182305 ]

Mark Miller commented on SOLR-2358:

Hey Darren - I have rewritten the description a bit, attached a little diagram, and started working on an updated version of the solrcloud wiki page (http://wiki.apache.org/solr/SolrCloud) at http://wiki.apache.org/solr/SolrCloud2.

If you have any user level questions, it might be more useful to ask those on the user mailing list. Anything more related to development, fire away right here.

Loosely, this issue covers the indexing side of the solrcloud vision - the search side had already been largely done in an earlier phase (though some of that has been improved as well in this phase).
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177713#comment-13177713 ]

Mark Miller commented on SOLR-2358:

As I was working on transforming the old distrib update processor code into something we needed for solrcloud, I dropped its ability to buffer updates. It just made work quicker and I wasn't really sure how much re-factoring would end up happening, so I didn't want to spend too much time on something that only related to performance so early.

I'm going to work on adding back buffering to the new SolrCmdDistributor class shortly - I think it means I have to move 'forward failures' retry logic back into the SolrCmdDistributor - I had this there before, but it was ugly, so I pulled it up a level into the distrib update processor. I think with buffering though, it needs to go back. (When a forward to leader fails, we would often like to pause and retry, as it is possible the leader went down and now there is a new one.)

Distributing Indexing
  Key: SOLR-2358
  URL: https://issues.apache.org/jira/browse/SOLR-2358
  Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
  Reporter: William Mayor
  Priority: Minor
  Fix For: 4.0
  Attachments: SOLR-2358.patch

The first steps towards creating distributed indexing functionality in Solr
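To make the buffering-plus-retry idea in this comment concrete, here is a rough, self-contained sketch. It is not the SolrCmdDistributor code: `BufferingForwarder`, `Sender`, the batch size, the retry count, and the pause are all hypothetical names and values chosen for illustration. The point it tries to show is why retries sit next to the buffer: once updates are batched, the component holding the batch is the one that can pause, look up the (possibly new) leader again, and resend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: buffers updates and retries failed forwards to the leader. */
class BufferingForwarder {
  /** Hypothetical transport; throws if the target node cannot be reached. */
  interface Sender {
    void send(String leaderUrl, List<String> batch) throws Exception;
  }

  private final List<String> buffer = new ArrayList<>();
  private final int batchSize;
  private final int maxRetries;
  private final long pauseMs;
  private final Supplier<String> leaderLookup; // asked again on every attempt
  private final Sender sender;

  BufferingForwarder(int batchSize, int maxRetries, long pauseMs,
                     Supplier<String> leaderLookup, Sender sender) {
    this.batchSize = batchSize;
    this.maxRetries = Math.max(0, maxRetries);
    this.pauseMs = pauseMs;
    this.leaderLookup = leaderLookup;
    this.sender = sender;
  }

  /** Buffers one update and flushes once the batch is full. */
  void add(String update) throws Exception {
    buffer.add(update);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  /** Sends the buffered batch; on failure, pauses and retries against the current leader. */
  void flush() throws Exception {
    if (buffer.isEmpty()) {
      return;
    }
    List<String> batch = new ArrayList<>(buffer);
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        sender.send(leaderLookup.get(), batch); // the leader may have changed since the last try
        buffer.clear();
        return;
      } catch (Exception e) {
        last = e;
        if (attempt < maxRetries) {
          Thread.sleep(pauseMs); // give a new leader time to be elected
        }
      }
    }
    throw last; // every attempt failed; the batch stays in the buffer
  }

  public static void main(String[] args) throws Exception {
    int[] lookups = {0};
    BufferingForwarder f = new BufferingForwarder(2, 3, 10L,
        () -> lookups[0]++ == 0 ? "old-leader" : "new-leader",
        (url, batch) -> {
          if (url.equals("old-leader")) throw new java.io.IOException("leader is down");
          System.out.println("sent " + batch + " to " + url);
        });
    f.add("doc1");
    f.add("doc2"); // fills the batch and triggers a flush with one retry
  }
}
```

In this sketch a batch that fails on every attempt stays buffered and the last exception propagates; whether the real code should drop, requeue, or report such updates is a separate design question that the comment does not settle.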
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177743#comment-13177743 ] Mark Miller commented on SOLR-2358: --- Okay - I've got basic buffering back - I've lost forwarding retries for the moment though - I'll wait to commit to the branch until I've brought that back. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177864#comment-13177864 ] Mark Miller commented on SOLR-2358: --- Buffering is back in with retries on failed forwards to leaders. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
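The buffering and forward-retry behaviour discussed in the last three comments can be sketched roughly as follows. This is only an illustration and is not taken from SolrCmdDistributor: the class name BufferingForwarder, the leaderLookup and send callbacks, and the retry count are all hypothetical, and a real implementation would pause between attempts to give a new leader time to be elected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Supplier;

/** Hypothetical sketch: buffer updates, flush them as a batch, retry against a re-resolved leader. */
class BufferingForwarder {
    private final int bufferSize;
    private final int maxRetries;
    private final Supplier<String> leaderLookup;            // assumed: returns the current leader's address
    private final BiPredicate<String, List<String>> send;   // assumed: true if the leader accepted the batch
    private final List<String> buffer = new ArrayList<>();

    BufferingForwarder(int bufferSize, int maxRetries,
                       Supplier<String> leaderLookup,
                       BiPredicate<String, List<String>> send) {
        this.bufferSize = bufferSize;
        this.maxRetries = maxRetries;
        this.leaderLookup = leaderLookup;
        this.send = send;
    }

    /** Buffers one document and flushes once the buffer is full. */
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** Sends the buffered batch; on failure, looks the leader up again and retries. */
    boolean flush() {
        if (buffer.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(buffer);
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String leader = leaderLookup.get(); // the leader may have changed since the last attempt
            if (send.test(leader, batch)) {
                buffer.clear();
                return true;
            }
            // A real implementation would pause here before retrying.
        }
        return false; // the batch stays buffered so the caller can decide what to do
    }
}
```

Keeping the retry loop next to the buffer means the whole batch is retried as a unit, which is one plausible reason the retry logic has to live in the same class as the buffering.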
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163178#comment-13163178 ] Mark Miller commented on SOLR-2358: --- We are starting to get some stable, usable stuff here (even though there is much to do!). We are also starting to get some users that are interested in using this stuff (critical feedback there). So I'd like to propose we try to merge the branch into trunk sooner rather than later, and then iterate from there. Anything too experimental in the future could move back onto a branch again. This will make the merge a bit more digestible as well - rather than building up a crazy amount of differences on the branch. There are also a variety of improvements and fixes in the testing framework and elsewhere that would be nice to get back into trunk. Perhaps within a few weeks, after we stabilize and finish up some hanging work? Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163346#comment-13163346 ] Mark Miller commented on SOLR-2358: --- I just made it so that a version can be specified on deletes in solrxml, and did the work necessary for distrib deletes to work with versioning. You can do delete by id now. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162464#comment-13162464 ] Yonik Seeley commented on SOLR-2358: I've made the distrib update processor default. I had to @Ignore BasicZkTest for some reason though. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161759#comment-13161759 ] Mark Miller commented on SOLR-2358: --- Note: distrib delete by id is not working at the moment - we need to start propagating versions on SolrCmd objects - right now they are lost on conversion to an update request, and the versioning code is not happy. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155693#comment-13155693 ] Lance Norskog commented on SOLR-2358: - Lamport Clocks are a time-tested way to sequence actions across a network. In this case, you can use an iterate-until-happy algorithm using the locks. [Google Lamport Clock|https://www.google.com/search?q=lamport+clock] Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
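For readers unfamiliar with the Lamport clock mentioned above, a minimal sketch follows. It shows only the standard logical-clock rules (increment on a local event or send, take max-plus-one on receipt); the class name LamportClock is illustrative, and nothing here is taken from the Solr code base or implies that Solr adopted this approach.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal Lamport logical clock (illustrative only). */
class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    /** Local event or message send: advance the clock and return the new timestamp. */
    long tick() {
        return time.incrementAndGet();
    }

    /** Message receipt: move past both the local time and the sender's timestamp. */
    long receive(long remoteTimestamp) {
        return time.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    /** Current logical time, without advancing it. */
    long current() {
        return time.get();
    }
}
```

A node stamps each outgoing message with tick(), and the receiver calls receive() with that stamp, so any event that causally follows another always carries a larger timestamp.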
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13125421#comment-13125421 ] Mark Miller commented on SOLR-2358: --- P.S. This lock is simply for auto layout of the cluster - if you are going to manually specify the layout, it wouldn't be used. If we ended up with an overseer, this lock could happen on it instead. Basically, if all the nodes fire up at the same time, you still want them to be sanely assigned as a shard / replica, which requires knowing the assignments that have already happened. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123702#comment-13123702 ] Mark Miller commented on SOLR-2358: --- Initially, a request will be fully synchronous and will not return success to the client until the request has been sent to each replica. So if a leader goes down before all replicas receive and ACK the request, the client will not get an ACK. A new leader will be elected. When the downed previous leader comes back, it will come up in recovery mode. I expect recovery to be a difficult part and we have not fully worked it out yet. To recover, the node will have to talk to the leader and figure out what it has that it should not, what it doesn't have, etc. Then the recovering node either receives replays or replaces the entire index. Lots of details to work out here. You have an interesting problem in that some replica leader candidates may have an update while others don't, as the leader may have died in the middle of relaying requests. We might prefer a new leader with the greatest versioned doc? Most client retries in this case will be fine (globally unique ids are required, so no worry about dupes). Then replicas talk to the leader and sync up. Or when a new leader is elected, replicas just talk amongst each other and sync up, or… If the leader fails right before sending an ACK, the client will likely repeat the request. In the case of doc adds/updates with the same id, it will just replace the previous success, or the client will be able to use optimistic locking to figure out that either its update or someone else's actually went through already. The client would already know that its update may have gone through, because the connection would have timed out rather than returning a failure. Eventually, we might consider a mode where the request is ACK'd before it's on all replicas, in which case you might accept a higher risk of data loss.
bq. indexes diverge because some replicas commit a change while others do not
It's an area we have not fully worked out (though Yonik has likely thought about a lot of this more than I have yet) - initially though, I think Yonik's point was that you can usually expect success on all nodes unless the issue is something that would require the node to come down and then come back in recovery mode. We certainly want to be resilient here eventually though. As we work through recovery scenarios, I think this will become more clear. Long story short, we have been discussing and thinking about these various scenarios, but largely we are also taking things an issue at a time. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
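The two retry cases described in the comment above (a repeated add with the same id simply overwriting, and optimistic locking revealing that some update already went through) can be sketched as below. This is an assumption-laden illustration, not Solr's versioning code: VersionedIndex, add, and updateIfVersion are hypothetical names, and the in-memory map stands in for a real index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index keyed by unique id, with a version per document. */
class VersionedIndex {
    private static final class Doc {
        final String body;
        final long version;
        Doc(String body, long version) { this.body = body; this.version = version; }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextVersion = 1;

    /** Plain add: the same id overwrites, so a client retry cannot create a duplicate. */
    synchronized long add(String id, String body) {
        long version = nextVersion++;
        docs.put(id, new Doc(body, version));
        return version;
    }

    /**
     * Optimistic update: applied only if the stored version equals expectedVersion
     * (0 means "document must not exist"). Returns the new version, or -1 on conflict.
     */
    synchronized long updateIfVersion(String id, String body, long expectedVersion) {
        Doc current = docs.get(id);
        long currentVersion = (current == null) ? 0 : current.version;
        if (currentVersion != expectedVersion) {
            return -1; // someone else's update (or our own earlier attempt) already went through
        }
        return add(id, body);
    }

    synchronized int size() {
        return docs.size();
    }
}
```

A client whose connection timed out can resend with the version it last saw; a conflict result tells it that the stored document has changed since then, without the index ever holding two copies.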
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123706#comment-13123706 ] Mark Miller commented on SOLR-2358: --- bq. Generally speaking, it seems like we should avoid locks as much as possible. Should be more scalable... Yeah, I had the same initial reaction - a collection-wide lock? Who likes locks? In reality, I'm not too worried though - it's a simple, very short lock for changing the cluster layout for a collection - this is not a normal thing that will happen - normally the cluster layout will be stable - this is mostly just as the cluster is coming up. So for simplicity and in the spirit of getting something working, it's easy to just start with a simple lock here - if it's really a problem (I doubt it myself), it's easy enough to do this differently later. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122953#comment-13122953 ] Mark Miller commented on SOLR-2358: --- Okay, I'm going to commit some really early stuff to the branch here...ugly code and lots of System.out calls still there...but we can start tying in versioning and what not... The commit adds the distrib update processor and makes it cloud aware. If you add a doc to a replica, it forwards it to the leader. If a doc comes to the leader, it versions it (super mock/fake at the moment - param is set to docversion=yes) and forwards it to each replica in the shard (including itself). Also a couple of basic tests added around this, and other little fixes that were found/needed along the way... The current main test for this fires up a control and 3 shards, each with 1 replica (6 cores total). Indexing is then round-robined to each shard (randomly adding either to the leader or the replica). Then the standard distrib search tests are run (with load balancing across replicas) and results compared with control. Early, early stuff - but it's a start. None of the hashing stuff we will be doing is involved yet. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
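The routing rule in the comment above (a replica forwards to its leader; the leader assigns a version and sends the doc to every replica in the shard) can be sketched as follows. ShardNode and its fields are hypothetical names invented for this illustration, the counter is a stand-in for whatever versioning the real code uses, and the real distrib update processor is not structured this way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical node in one shard; a node with leader == null is the leader. */
class ShardNode {
    final String name;
    final Map<String, Long> index = new LinkedHashMap<>(); // doc id -> version
    final List<ShardNode> replicas = new ArrayList<>();     // populated on the leader only
    ShardNode leader;                                       // null on the leader itself
    private long versionCounter = 0;

    ShardNode(String name) {
        this.name = name;
    }

    /** Entry point for a client add arriving at any node of the shard. */
    void add(String docId) {
        if (leader != null) {
            leader.add(docId); // replica: forward to the leader, do not index yet
            return;
        }
        long version = ++versionCounter;  // leader assigns the version
        applyLocally(docId, version);     // leader indexes its own copy
        for (ShardNode replica : replicas) {
            replica.applyLocally(docId, version); // then fans out to every replica
        }
    }

    /** Indexes a document that already carries a leader-assigned version. */
    void applyLocally(String docId, long version) {
        index.put(docId, version);
    }
}
```

Because only the leader assigns versions, a document ends up with the same version on every copy regardless of which node the client originally contacted.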
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123004#comment-13123004 ] Mark Miller commented on SOLR-2358: --- Actually, commit will be a bit delayed - new test likes to hang when running in parallel to others with ant test - will have to dig... Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123198#comment-13123198 ] Yonik Seeley commented on SOLR-2358: As far as locking vs leader, I think maybe both can make sense. Some things are logically more node specific and a lock can make more sense there (so that a node can modify its own state). Also, something like a command to create a new collection might be easier with a cluster lock. The node that received the command can just do it, rather than introducing logic to forward the command to the cluster leader (or put the request in a ZK queue or something, to be pulled by someone, which still needs coordination to make sure only one node is trying to do it). On the other hand, cluster overseer code that might want to watch the cluster and change the configuration... a single cluster leader makes sense there (and they may end up also grabbing some sort of lock to avoid conflicts with what other nodes may do). Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123217#comment-13123217 ] Ted Dunning commented on SOLR-2358: --- I think locks should be completely out of bounds if only because they are hell to deal with in the presence of failures. This is a major reason that ZK is not a lock manager but supports atomic updates at a fundamental level. State of a node doesn't need a lock. The node should just update its own state and that state should be ephemeral so if the node disappears, the state reflects that. Anybody who cares in a real-time kind of way about the state of that node should put a watch on that node's state. Creating a new collection is relatively trivial without a lock as well. One of the simplest ways is to simply put a specification of the new collection into a collections directory in ZK. The cluster overseer sees the addition and it parcels out shard assignments to nodes. The nodes see the assignments change and they take actions to conform to the specification, advertising their progress in their state files. All that is needed here is atomic update which ZK does just fine. If it helps, there is a simplified form of this in Chapter 16 of Mahout in Action. The source code for this example is available at https://github.com/tdunning/Chapter-16. This example only has nodes, but the basic idea of parcelling out assignments is the same. A summary of what I would suggest is this: - three directories: {code} /collections /node-assignments /node-states {code} The /collections directory is updated by anybody wishing to advertise or delete a collection. The /node-assignments directory is updated only by the overseer. The /node-states directory is updated by each node. - one leader election file {code} /cluster-leader {code} All of the potential overseers try to create this file (ephemerally) and insert their IP and port.
The one that succeeds is the overseer; the others watch for the file to disappear. On disconnect from ZK, the overseer stops acting as overseer, but does not tear down local state. On reconnect, the overseer continues acting as overseer. On session expiration, the overseer tears down local state and attempts to regain the leadership position. The cluster overseer never needs to grab locks since atomic read-modify-write to node state is all that is required. Again for emphasis, 1) cluster-wide locks are a bug in a scalable clustered system. Leader election is an allowable special case. 2) locks are not required for clustered SOLR. 3) a lock-free design is incredibly simple to implement. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
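The /cluster-leader election described above relies on two properties: creating the file is an atomic create-if-absent, and the file vanishes when its creator's session expires. The sketch below simulates just those two properties in memory so the race can be followed end to end. FakeCoordinator and OverseerCandidate are hypothetical names, and this is deliberately NOT the ZooKeeper client API; watches and reconnect handling are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a coordination service; this is NOT the ZooKeeper client API. */
class FakeCoordinator {
    /** An ephemeral node remembers which session created it. */
    private static final class Node {
        final String sessionId;
        final String value;
        Node(String sessionId, String value) { this.sessionId = sessionId; this.value = value; }
    }

    private final Map<String, Node> nodes = new ConcurrentHashMap<>();

    /** Atomic create-if-absent: exactly one caller gets true for a given path. */
    boolean createEphemeral(String path, String sessionId, String value) {
        return nodes.putIfAbsent(path, new Node(sessionId, value)) == null;
    }

    /** Returns the stored value, or null if the node does not exist. */
    String read(String path) {
        Node node = nodes.get(path);
        return node == null ? null : node.value;
    }

    /** Session expiry deletes every ephemeral node that session owned. */
    void expireSession(String sessionId) {
        nodes.values().removeIf(node -> node.sessionId.equals(sessionId));
    }
}

/** Hypothetical overseer candidate that races to create /cluster-leader. */
class OverseerCandidate {
    static final String LEADER_PATH = "/cluster-leader";
    final String sessionId;
    final String hostAndPort;

    OverseerCandidate(String sessionId, String hostAndPort) {
        this.sessionId = sessionId;
        this.hostAndPort = hostAndPort;
    }

    /** True if this candidate created the leader file and is now the overseer. */
    boolean tryBecomeOverseer(FakeCoordinator coordinator) {
        return coordinator.createEphemeral(LEADER_PATH, sessionId, hostAndPort);
    }
}
```

In this model a losing candidate would simply call tryBecomeOverseer again once it learns the file is gone; with a real coordination service that notification would come from a watch rather than polling.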
[jira] [Commented] (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123231#comment-13123231 ] Ted Dunning commented on SOLR-2358: --- Mark, How do you handle failure scenarios? The failures I am curious about are: - the leader fails, but a transaction is still sent to it because the client didn't get the memo in time - the leader fails but has already written a transaction locally without having a chance to forward it to the followers - the leader fails after writing locally and to the replicas but before sending an ACK - a replica is partitioned from the cluster, a transaction is received and committed by all live replicas and then the failed index returns from the land of the living dead. The bad behaviors that need to be avoided include: - document acked but not inserted - document not acked, inserted again and two copies wind up in the index - indexes diverge because some replicas commit a change while others do not. Two-phase commit is not generally a viable solution for this in a cluster where failures can occur because it requires locks to be taken. Once these locks are taken, the cluster cannot proceed until the locks are cleared and this cannot be done reliably in the presence of failures. Zookeeper avoids this to a large degree by making updates idempotent before they are inserted into the update queue. This means that if the updates are applied more than once, most importantly during error recovery, no error actually occurs. This is what makes ZK able to take snapshots without stopping the world. It does not entirely resolve the case of transactions that are committed but not acked. 
Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Components: SolrCloud, update Reporter: William Mayor Priority: Minor Fix For: 4.0 Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995570#comment-12995570 ] Alex Cowell commented on SOLR-2358: --- bq. Since this functionality is core to Solr and should always be present, it would be natural to either build it into the DirectUpdateHandler2 or to add this processor to the set of default UpdateProcessors that are executed if no update.processor parameter is specified. What advantage would we gain from moving this functionality into DirectUpdateHandler2? From what I understand, the UpdateHandler deals directly with the index whereas the DistributedUpdateRequestProcessor merely takes requests deemed to be distributed by the request handler and distributes them to a list of shards based on a distribution policy. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Reporter: William Mayor Priority: Minor Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995628#comment-12995628 ] Jan Høydahl commented on SOLR-2358: --- I'm not sure if DirectUpdateHandler2 is the right location either. My point is that the user should not need to manually make sure that the UpdateProcessor is present in all his UpdateChains for distributed indexing to work. See new issue SOLR-2370 for a suggestion on how to tackle this. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Reporter: William Mayor Priority: Minor Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994766#comment-12994766 ] Jan Høydahl commented on SOLR-2358: --- See SOLR-2293 for some thoughts. Since this functionality is core to Solr and should always be present, it would be natural to either build it into the DirectUpdateHandler2 or to add this processor to the set of default UpdateProcessors that are executed if no update.processor parameter is specified. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Reporter: William Mayor Priority: Minor Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr