[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114:
Attachment: SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch

I promised Robert Muir to make a test of the feature introduced here in SOLR-4114 as a unit-test directly on OverseerCollectionProcessor. I did this in the attached SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch. It fits on top of revision 1420194 of branch_4x, but shouldn't be hard to port to other branches, since it is basically just a new test, OverseerCollectionProcessorTest. Besides the new test, OverseerCollectionProcessor has been modified a little so that it can easily be extended in the test.

OverseerCollectionProcessorTest tests OverseerCollectionProcessor alone, by mocking the components it interacts with directly:
* DistributedQueue - the work-queue with messages from ZK
* ZkStateReader
* ClusterState
* ShardHandler - the component handling/distributing the CoreAdmin requests coming out of OverseerCollectionProcessor

I wanted to use mockito but found that you are already using easymock, so I decided to use that. I had to upgrade easymock from version 2.0 to version 3.0, because I wanted to mock classes (not only interfaces) - nothing is interfaces in Solr. I guess no one would mind that.

OverseerCollectionProcessorTest tests a few things, including the feature introduced here in SOLR-4114, and to some extent eliminates the additional test-parts added to BasicDistributedZkTest here in SOLR-4114. Among others, it covers the controversial 10-60 sec wait test:
{code}
int liveNodes = getCommonCloudSolrServer().getZkStateReader().getClusterState().getLiveNodes().size();
// (liveNodes/2 + 1) slices with numReplica + 1 shards each need more than
// liveNodes * maxShardsPerNode slots, so nothing should be created
int numShards = (liveNodes/2) + 1;
int numReplica = 1;
int maxShardsPerNode = 1;
collectionInfos = new HashMap<String,List<Integer>>();
createCollection(collectionInfos, cnt, numShards, numReplica, maxShardsPerNode);
checkCollectionIsNotCreated(collectionInfos.keySet().iterator().next());
{code}
OverseerCollectionProcessorTest establishes a nice platform for testing OverseerCollectionProcessor at the unit level using mocking, and can probably be extended to eliminate further tests in BasicDistributedZkTest.testCollectionAPI. It can also be extended to do more than just create-tests, e.g. reload-tests and remove-tests.

Collection API: Allow multiple shards from one collection on the same Solr server
Key: SOLR-4114
URL: https://issues.apache.org/jira/browse/SOLR-4114
Project: Solr
Issue Type: New Feature
Components: multicore, SolrCloud
Affects Versions: 4.0
Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
Labels: collection-api, multicore, shard, shard-allocation
Attachments: SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114_trunk.patch

We should support running multiple shards from one collection on the same Solr server, e.g. run a collection with 8 shards on a 4 Solr server cluster (each Solr server running 2 shards). Performance tests on our side have shown that this is a good idea. It is also a good idea for easy elasticity later on: it is much easier to move an entire existing shard from one Solr server to another one that just joined the cluster than it is to split an existing shard between the Solr server that used to run it and the new Solr server. See dev mailing list discussion "Multiple shards for one collection on the same Solr server".
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
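The comment above describes two things: testing OverseerCollectionProcessor in isolation by replacing its collaborators, and making the processor easy to extend in a test. Below is a minimal sketch of that subclass-and-override pattern. It uses a hand-written stub instead of EasyMock so it needs only the JDK. SimpleCollectionProcessor, CoreRequestSender, RecordingSender and createSender() are hypothetical names, not Solr classes, and the round-robin placement is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator standing in for something like ShardHandler:
// in real code it would send a CoreAdmin "create core" request to a node.
interface CoreRequestSender {
    void submit(String nodeName, String coreName);
}

// Hypothetical, heavily simplified processor: places one core per slice
// round-robin over the live nodes and hands each request to the sender.
class SimpleCollectionProcessor {
    private final List<String> liveNodes;

    SimpleCollectionProcessor(List<String> liveNodes) {
        this.liveNodes = liveNodes;
    }

    // Test seam: production code would build the real sender here;
    // a test subclass overrides this method to inject a stub.
    protected CoreRequestSender createSender() {
        throw new UnsupportedOperationException("no real sender in this sketch");
    }

    void createCollection(String collection, int numSlices) {
        CoreRequestSender sender = createSender();
        for (int i = 0; i < numSlices; i++) {
            String node = liveNodes.get(i % liveNodes.size());
            sender.submit(node, collection + "_slice" + (i + 1));
        }
    }
}

// Hand-written stub: records requests instead of sending them,
// so a test can assert on exactly what the processor asked for.
class RecordingSender implements CoreRequestSender {
    final List<String> submitted = new ArrayList<>();

    @Override
    public void submit(String nodeName, String coreName) {
        submitted.add(nodeName + "/" + coreName);
    }
}
```

A test creates the processor as an anonymous subclass whose createSender() returns a RecordingSender. It then calls createCollection and asserts on the recorded requests. EasyMock 3.0 can mock concrete classes, which removes the need for hand-written stubs like this one when the collaborators are classes rather than interfaces. That is the reason the comment gives for upgrading.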
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4114:
Fix Version/s: 5.0, 4.1
Assignee: Mark Miller (was: Per Steffensen)
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4114:
Attachment: SOLR-4114.patch

My latest patch - I'll commit this soon and we can iterate from there.
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114:
Attachment: SOLR-4114_trunk.patch

Here is the patch for trunk (5.x). The main mistake was that you didn't use the calculated shardName as the shardName - instead you used collectionName. This caused different shards on the same node to share name and data-dir - not so cool :-)
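The naming problem described above can be made concrete with a minimal sketch. If every core is named after the collection, two shards placed on the same node collide on both core name and data dir. A computed per-shard name avoids that. The "_shardN_replicaM" pattern, the data-dir layout and the CoreNaming helper are hypothetical, chosen for illustration rather than taken from the patch.

```java
// Hypothetical helpers illustrating the naming bug; the "_shardN_replicaM"
// pattern and the data-dir layout are assumptions for this sketch.
final class CoreNaming {
    private CoreNaming() {}

    // Buggy variant: every core of the collection gets the collection name,
    // so two cores on one node collide on name (and on the derived data dir).
    static String buggyCoreName(String collectionName, int sliceIndex, int replicaIndex) {
        return collectionName;
    }

    // Fixed variant: the computed name encodes slice and replica (0-based
    // indexes in, 1-based numbers out), so cores on the same node differ.
    static String coreName(String collectionName, int sliceIndex, int replicaIndex) {
        return collectionName + "_shard" + (sliceIndex + 1) + "_replica" + (replicaIndex + 1);
    }

    // Data dir derived from the core name, so distinct names give distinct dirs.
    static String dataDir(String solrHome, String coreName) {
        return solrHome + "/" + coreName + "/data";
    }
}
```

For example, coreName("coll", 1, 0) yields "coll_shard2_replica1", and its data dir under "/solr" is "/solr/coll_shard2_replica1/data".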
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4114:
Attachment: SOLR-4114.patch

Here is a patch of my quick attempted merge. The test fails in the collections API test while waiting for recoveries to finish after creating collection(s).
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114:
Attachment: SOLR-4114.patch

New patch SOLR-4114.patch attached (not including the only-spread-shards-over-solrs-mentioned-in-provided-list thingy).

New, compared to the first patch:
* maxShardsPerNode implemented
* Tests (BasicDistributedZkTest.testCollectionAPI) now test additional stuff
** That the expected number of shards is actually created
** That if there is not room for all the shards due to the provided maxShardsPerNode, nothing is created
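The "nothing is created" behaviour implies an all-or-nothing capacity check before any core is created. Below is a minimal sketch of such a check. It assumes the terminology used elsewhere in this issue, where replication-factor X means X+1 shards per slice. The CapacityCheck class and its methods are hypothetical.

```java
// Hypothetical all-or-nothing capacity check for maxShardsPerNode.
// Assumes replication-factor X means X+1 shards per slice.
final class CapacityCheck {
    private CapacityCheck() {}

    static int requiredShards(int numSlices, int replicationFactor) {
        return numSlices * (replicationFactor + 1);
    }

    static int availableSlots(int liveNodes, int maxShardsPerNode) {
        return liveNodes * maxShardsPerNode;
    }

    // Evaluated before any core is created: if this returns false,
    // nothing is created instead of leaving a partial collection behind.
    static boolean fits(int liveNodes, int numSlices, int replicationFactor, int maxShardsPerNode) {
        return requiredShards(numSlices, replicationFactor)
                <= availableSlots(liveNodes, maxShardsPerNode);
    }
}
```

Take the numbers from the test quoted in this issue (numShards = liveNodes/2 + 1, numReplica = 1, maxShardsPerNode = 1). With 4 live nodes that test needs 6 shards but has only 4 slots, so fits(4, 3, 1, 1) returns false. The motivating example of 8 shards on 4 servers with 2 shards per server, fits(4, 8, 0, 2), returns true.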
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114:
Summary: Collection API: Allow multiple shards from one collection on the same Solr server (was: Allow multiple shards from one collection on the same Solr server)
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114:
Attachment: SOLR-4114.patch

About SOLR-4114.patch:
* It fits on top of revision 1412602 of branch lucene_solr_4_0
* The shard allocation algorithm explained
** Shards are allocated to Solr servers one by one. The next shard is always assigned to the next server in a shuffled list of live servers. Whenever you reach the end of the list of live servers, you start over again.
** Replicas of a certain shard are allocated to the next #replication-factor servers in the list.
** replication-factor is reduced if it is requested to be higher than the number of live servers - 1. It is kind of pointless to run two shards belonging to the same slice on the same server.
*** Unfortunately it is only possible to log the decision about such a replication-factor reduction. There is no easy way to get the info back to the caller, since the job is handled asynchronously by the Overseer.
* Besides that, a bug-fix is included
** OverseerCollectionProcessor.createCollection and .collectionCmd reused params-objects too much. The same params-object was used for several submits to ShardHandler. Since the ShardHandler issues asynchronous jobs, the params-object might be changed by the OverseerCollectionProcessor before the asynchronous job is executed - resulting in a lot of fun :-) Comments added around the fixes.
*** This bug does not appear to be fixed on lucene_solr_4_0
*** It appears to be partly fixed on branch_4x - fixed in collectionCmd (used for delete and reload) but not in createCollection (used for create)
* Besides that, a little cleaning up - I know you don't like it, but my eyes cannot handle such a mess :-)
** BasicDistributedZkTest: Introduced method getCommonCloudSolrServer to be used instead of just using solrj. The solrj variable was initialized in method queryServer but used in lots of other places. For this to work, your test needs to call queryServer before any of the other methods using solrj. This is fragile when you change the test, and when you (as I did) comment out parts of the test.
** HttpShardHandler: Made getURLs thread-safe so that you do not have to be so careful using it
** General: Took a small step towards consistent usage of the terms collection, node-name, node-base-url, slice, shard and replica. All over the code the terms are mixed up, and I took the opportunity to clean up the code near my changes. IMHO you should do a lot more cleaning up in this project. I will try to sneak in clean-ups whenever I can :-) My view on the correct meaning of the terms:
*** collection: A big logical bucket to fill data into
*** slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping.
*** shard: A physical instance of a slice. Running without replicas there is one shard per slice. Running with replication-factor X there are X+1 shards per slice.
*** node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server
*** node-name: A logical name for the Solr server - the same as node-base-url except that /'s are replaced by _'s and the protocol part (http(s)://) is removed

If you don't want the cleaning-up stuff, the following parts of the patch can be left out:
* BasicDistributedZkTest: Everything except maybe the change from new ZkCoreNodeProps(node).getCoreUrl() to ZkCoreNodeProps.getCoreUrl(node.getStr(ZkStateReader.BASE_URL_PROP), collection) in method getUrlFromZk
* ShardHandler: Everything
* HttpShardHandler: Everything
* OverseerCollectionProcessor: The renaming stuff

The important stuff is in OverseerCollectionProcessor: the modified shard allocation algorithm that allows for multiple shards from the same collection on each Solr server, and the bug-fix dealing with too eager reuse of params-objects.
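The allocation rules described above can be sketched in a few lines:
* shuffle the live servers;
* walk them with a single wrapping cursor, one shard at a time;
* cap replication-factor at the number of live servers - 1.

This is a sketch of the rules as described, not the patch's code. ShardAllocator and allocate() are hypothetical names, and slices are named "slice1", "slice2", ... purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of the allocation rules described in the comment; names are hypothetical.
final class ShardAllocator {
    private ShardAllocator() {}

    // Returns slice name -> servers hosting that slice's shards, in allocation order.
    static Map<String, List<String>> allocate(List<String> liveNodes, int numSlices,
                                              int replicationFactor, Random random) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("no live nodes");
        }
        List<String> nodes = new ArrayList<>(liveNodes);
        Collections.shuffle(nodes, random);

        // Cap replication-factor at liveNodes - 1 so no two shards of one
        // slice share a server (the patch can only log this reduction).
        int effectiveRf = Math.min(replicationFactor, nodes.size() - 1);

        Map<String, List<String>> plan = new LinkedHashMap<>();
        int cursor = 0; // one cursor for all shards, wrapping at the end of the list
        for (int s = 0; s < numSlices; s++) {
            List<String> hosts = new ArrayList<>();
            for (int r = 0; r <= effectiveRf; r++) {
                hosts.add(nodes.get(cursor % nodes.size()));
                cursor++;
            }
            plan.put("slice" + (s + 1), hosts);
        }
        return plan;
    }
}
```

The shards of a slice occupy consecutive positions in the shuffled list, and there are at most liveNodes of them. They therefore always land on distinct servers. With 8 slices, replication-factor 0 and 4 servers, each server ends up with exactly 2 shards, as in the issue description.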
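The params-object bug described above can be shown with plain JDK types. A HashMap stands in for the params object and an ExecutorService stands in for the asynchronous ShardHandler. Neither is the Solr type, and ParamsReuseDemo is a hypothetical name. A latch delays every job until all submits are done. This makes the race deterministic: a shared, mutated object is read only after its last mutation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Demonstrates the params-reuse hazard; HashMap and ExecutorService stand in
// for the params object and the asynchronous shard handler.
final class ParamsReuseDemo {
    private ParamsReuseDemo() {}

    // Submits one async job per core name; each job reports the "name" param it sees.
    static List<String> submitAll(List<String> coreNames, boolean reuseParams) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch allSubmitted = new CountDownLatch(1);
        try {
            Map<String, String> shared = new HashMap<>();
            List<Future<String>> results = new ArrayList<>();
            for (String core : coreNames) {
                // Buggy style reuses one object; fixed style builds a fresh one per submit.
                Map<String, String> params = reuseParams ? shared : new HashMap<String, String>();
                params.put("name", core);
                Callable<String> job = () -> {
                    allSubmitted.await(); // job runs only after all submits are done
                    return params.get("name");
                };
                results.add(pool.submit(job));
            }
            allSubmitted.countDown();
            List<String> seen = new ArrayList<>();
            for (Future<String> f : results) {
                seen.add(f.get());
            }
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            allSubmitted.countDown(); // never leave jobs blocked if something failed early
            pool.shutdown();
        }
    }
}
```

With a fresh object per submit, submitAll(List.of("c1", "c2", "c3"), false) reports c1, c2, c3. With one reused object, every job sees c3.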
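The node-name definition in the terminology list is a simple string transformation. This sketch applies the rule exactly as stated there: remove the http(s):// part, then replace /'s with _'s. NodeNames.toNodeName is a hypothetical helper, not Solr's own code, which may handle more cases.

```java
// Hypothetical helper applying the node-name rule as stated in the
// terminology list; it is not taken from Solr's own code.
final class NodeNames {
    private NodeNames() {}

    static String toNodeName(String nodeBaseUrl) {
        // Drop the protocol part (http:// or https://) ...
        String withoutProtocol = nodeBaseUrl.replaceFirst("^https?://", "");
        // ... then replace every '/' with '_'.
        return withoutProtocol.replace('/', '_');
    }
}
```

For example, toNodeName("http://127.0.0.1:8983/solr") returns "127.0.0.1:8983_solr".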