[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-11 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: 
SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch

I promised Robert Muir to make a test of the feature introduced here in 
SOLR-4114 as a unit-test directly on OverseerCollectionProcessor. I did this in 
the attached SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch. 
It fits on top of revision 1420194 of branch_4x, but shouldn't be hard to port 
to other branches, since it is basically just a new test, 
OverseerCollectionProcessorTest.

Besides the new test, OverseerCollectionProcessor has been modified a little in 
order to easily be able to extend it in the test.

OverseerCollectionProcessorTest tests OverseerCollectionProcessor alone, by 
mocking the components it interacts with directly:
* DistributedQueue - the work-queue with messages from ZK
* ZkStateReader
* ClusterState
* ShardHandler - the component handling/distributing the CoreAdmin requests 
coming out of OverseerCollectionProcessor.
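
As a rough illustration of the approach (not the code in the patch), the sketch below tests a processor in isolation by handing it stand-ins for its collaborators. It uses hand-written fakes rather than easymock so that it runs on a plain JDK. The names WorkQueue, RequestSink, RecordingSink and MiniProcessor are hypothetical and only loosely mirror DistributedQueue, ShardHandler and OverseerCollectionProcessor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the work queue (DistributedQueue in Solr).
class WorkQueue {
  private final Queue<String> messages = new ArrayDeque<>();
  void offer(String message) { messages.add(message); }
  String poll() { return messages.poll(); } // returns null when the queue is empty
}

// Hypothetical stand-in for the component that sends CoreAdmin requests.
class RequestSink {
  void submit(String nodeName, String request) {
    throw new UnsupportedOperationException("real network call, not wanted in a unit test");
  }
}

// Fake that records requests instead of sending them.
// It subclasses a class, not an interface, which is the situation described above.
class RecordingSink extends RequestSink {
  final List<String> submitted = new ArrayList<>();
  @Override
  void submit(String nodeName, String request) {
    submitted.add(nodeName + " <- " + request);
  }
}

// Simplified processor: turns each "create:<collection>" message into one request per live node.
class MiniProcessor {
  private final WorkQueue queue;
  private final RequestSink sink;
  private final List<String> liveNodes;

  MiniProcessor(WorkQueue queue, RequestSink sink, List<String> liveNodes) {
    this.queue = queue;
    this.sink = sink;
    this.liveNodes = liveNodes;
  }

  // Drains the queue and returns the number of messages handled.
  int processAll() {
    int handled = 0;
    for (String msg = queue.poll(); msg != null; msg = queue.poll()) {
      if (msg.startsWith("create:")) {
        String collection = msg.substring("create:".length());
        for (String node : liveNodes) {
          sink.submit(node, "CREATE core for " + collection);
        }
      }
      handled++;
    }
    return handled;
  }
}
```

A test then puts a message on the WorkQueue, runs processAll() and inspects RecordingSink.submitted, without any ZooKeeper or HTTP traffic involved.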

I wanted to use mockito but found that you are already using easymock, so I 
decided to use that. I had to upgrade easymock from version 2.0 to version 3.0, 
because I wanted to mock classes (not only interfaces) - nothing is an 
interface in Solr. I guess no one will mind that.

OverseerCollectionProcessorTest tests a few things including the feature 
introduced here in SOLR-4114, and to some extent eliminates the need for the 
additional test-parts added to BasicDistributedZkTest here in SOLR-4114. Among 
others the controversial 10-60 sec wait test:
{code}
// Ask for more shards than the live nodes can host with maxShardsPerNode = 1
int liveNodes = getCommonCloudSolrServer().getZkStateReader().getClusterState().getLiveNodes().size();
int numShards = (liveNodes/2) + 1;
int numReplica = 1;
int maxShardsPerNode = 1;
collectionInfos = new HashMap<String,List<Integer>>();
createCollection(collectionInfos, cnt, numShards, numReplica, maxShardsPerNode);
// Nothing must have been created, since not all shards fit
checkCollectionIsNotCreated(collectionInfos.keySet().iterator().next());
{code}

OverseerCollectionProcessorTest establishes a nice platform for testing 
OverseerCollectionProcessor on unit-level using mocking, and can probably be 
extended to further eliminate tests in 
BasicDistributedZkTest.testCollectionAPI. And it can be extended to do more 
than just create-tests - also do reload-tests and remove-tests.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: 
 SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, 
 SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
 SOLR-4114_trunk.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr that used to 
 run it and the new Solr.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4114:
--

Fix Version/s: 5.0
   4.1
 Assignee: Mark Miller  (was: Per Steffensen)

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Mark Miller
  Labels: collection-api, multicore, shard, shard-allocation
 Fix For: 4.1, 5.0

 Attachments: 
 SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch, 
 SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
 SOLR-4114_trunk.patch





[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-03 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4114:
--

Attachment: SOLR-4114.patch

My latest patch - I'll commit this soon and we can iterate from there.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
 SOLR-4114.patch, SOLR-4114_trunk.patch





[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-02 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114_trunk.patch

Here is the patch for trunk (5.x). The main mistake was that you didn't use the 
calculated shardName as the shardName - instead you used collectionName. This 
caused different shards on the same node to share name and data-dir - not so 
cool :-)

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch, 
 SOLR-4114_trunk.patch





[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-12-01 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4114:
--

Attachment: SOLR-4114.patch

Here is a patch of my quick attempted merge. The collections API test fails 
while waiting for recoveries to finish after creating a collection(s).

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch, SOLR-4114.patch





[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114.patch

New patch SOLR-4114.patch attached (not including the 
only-spread-shards-over-solrs-mentioned-in-provided-list thingy)

New, compared to the first patch:
* maxShardsPerNode implemented
* Tests (BasicDistributedZkTest.testCollectionAPI) now test additional stuff
** That the expected number of shards are actually created
** That if there is not room for all the shards due to the provided 
maxShardsPerNode, nothing is created
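
The "nothing is created" rule can be pictured as a capacity check made before any core is requested. The sketch below is an assumption about the arithmetic, not the code in the patch: it treats numReplica as the number of extra copies per slice, so each slice needs numReplica + 1 shards. The class and method names are hypothetical.

```java
// Hypothetical helper; not part of Solr.
final class ShardCapacity {
  private ShardCapacity() {}

  // Total physical shards needed: one per slice plus numReplica extra copies of each.
  static int shardsNeeded(int numSlices, int numReplica) {
    return numSlices * (numReplica + 1);
  }

  // True if the live nodes can host every shard without exceeding maxShardsPerNode.
  static boolean fits(int liveNodes, int numSlices, int numReplica, int maxShardsPerNode) {
    if (liveNodes <= 0 || numSlices <= 0 || numReplica < 0 || maxShardsPerNode <= 0) {
      throw new IllegalArgumentException("counts must be positive (numReplica may be 0)");
    }
    // long arithmetic avoids int overflow for very large inputs
    long capacity = (long) liveNodes * maxShardsPerNode;
    long needed = (long) numSlices * (numReplica + 1);
    return needed <= capacity;
  }
}
```

Under these assumptions, 4 live nodes with 3 slices, numReplica = 1 and maxShardsPerNode = 1 need 6 shards but only have room for 4, so the request is rejected as a whole. Raising maxShardsPerNode to 2 gives room for 8 and the request fits.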

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch





[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Summary: Collection API: Allow multiple shards from one collection on the 
same Solr server  (was: Allow multiple shards from one collection on the same 
Solr server)

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation




[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114.patch

About SOLR-4114.patch:
* It fits on top of revision 1412602 of branch lucene_solr_4_0
* The shard allocation algorithm explained
** Shards are allocated to Solr servers one by one. The next shard is always 
assigned to the next server in a shuffled list of live servers. Whenever you 
reach the end of the list of live servers you start over again.
** Replicas for a certain shard are allocated to the next #replication-factor 
servers in the list
** replication-factor is reduced if it is requested to be higher than the 
number of live servers - 1. Kinda pointless to run two shards belonging to the 
same slice on the same server
*** Unfortunately I am only able to log the decision about such a 
replication-factor reduction - there is no easy way to get info back to the 
caller since the job is handled asynchronously by the Overseer
* Besides that, a bug-fix is included
** OverseerCollectionProcessor.createCollection and .collectionCmd reused 
params-objects too much. The same params-object was used for several submits to 
ShardHandler, but since the ShardHandler issues asynchronous jobs, the 
params-object might be changed by the OverseerCollectionProcessor before the 
asynchronous job is executed - resulting in a lot of fun :-) Comments added 
around the fixes
*** This bug does not appear to be fixed on lucene_solr_4_0
*** It appears to be partly fixed on branch_4x - fixed in collectionCmd (used 
for delete and reload) but not in createCollection (used for create)
* Besides that, a little cleaning up - I know you don't like it, but my eyes 
cannot handle such a mess :-)
** BasicDistributedZkTest: Introduced method getCommonCloudSolrServer to be 
used instead of just using solrj. The solrj variable was initialized in method 
queryServer but used in lots of other places. For this to work your test needs 
to call queryServer before any of the other methods using solrj. This is 
fragile when you change the test, or if you (as I did) comment out parts of 
the test.
** HttpShardHandler: Made getURLs thread-safe so that you do not have to be so 
careful using it
** General: Took a small step towards consistent usage of the terms collection, 
node-name, node-base-url, slice, shard and replica. All over the code the terms 
are mixed up, so I took the opportunity to clean up the code near my changes. 
IMHO you should do a lot more cleaning up in this project. I will try to sneak 
in clean-ups whenever I can :-) My view on the correct meaning of the terms
*** collection: A big logical bucket to fill data into
*** slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping
*** shard: A physical instance of a slice. Running without replicas there is 
one shard per slice. Running with replication-factor X there are X+1 shards per 
slice.
*** node-base-url: The prefix/base (up to and including the webapp-context) of 
the URL for a specific Solr server
*** node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed
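
The allocation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names are hypothetical, and it assumes a single running position in the shuffled list, so that the shards of one slice land on consecutive servers and the next slice continues where the previous one stopped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the allocation rules; not part of Solr.
final class ShardAllocator {
  private ShardAllocator() {}

  // Caps the replication factor at liveServers - 1 so no server holds two shards of one slice.
  static int effectiveReplicationFactor(int requested, int liveServers) {
    return Math.max(0, Math.min(requested, liveServers - 1));
  }

  // Returns slice name -> servers hosting its shards (replication factor + 1 entries per slice).
  static Map<String, List<String>> allocate(List<String> liveServers, int numSlices,
                                            int replicationFactor, Random random) {
    if (liveServers.isEmpty() || numSlices <= 0) {
      throw new IllegalArgumentException("need at least one live server and one slice");
    }
    List<String> servers = new ArrayList<>(liveServers);
    Collections.shuffle(servers, random); // shuffled list of live servers
    int rf = effectiveReplicationFactor(replicationFactor, servers.size());

    Map<String, List<String>> assignment = new LinkedHashMap<>();
    int next = 0; // position of the next server to receive a shard
    for (int slice = 1; slice <= numSlices; slice++) {
      List<String> hosts = new ArrayList<>();
      for (int copy = 0; copy <= rf; copy++) {
        hosts.add(servers.get(next));
        next = (next + 1) % servers.size(); // start over at the end of the list
      }
      assignment.put("slice" + slice, hosts);
    }
    return assignment;
  }
}
```

With 4 live servers, 8 slices and replication-factor 0, every server ends up hosting 2 shards, which is the example from the issue description. A requested replication-factor of 10 on 4 servers is reduced to 3, giving each slice one shard on every server.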

If you don't want the cleaning up stuff, the following parts of the patch can 
be left out
* BasicDistributedZkTest: Everything except maybe the change from new 
ZkCoreNodeProps(node).getCoreUrl() to 
ZkCoreNodeProps.getCoreUrl(node.getStr(ZkStateReader.BASE_URL_PROP), 
collection) in method getUrlFromZk
* ShardHandler: Everything
* HttpShardHandler: Everything
* OverseerCollectionProcessor: The renaming stuff

The important stuff is in OverseerCollectionProcessor - the modified shard 
allocation algorithm that allows for multiple shards from the same collection 
on each Solr server, and the bug-fix dealing with too eager reuse of 
params-objects.
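
The params-object problem is easy to reproduce in miniature. The sketch below is illustrative only: DeferredHandler is a hypothetical stand-in for a handler that runs its jobs later, and a plain map stands in for the params-object. It shows why sharing one mutable object across several submits goes wrong, and that giving each submit its own copy avoids it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the params-reuse problem; names are illustrative, not Solr's.
final class ParamsReuseDemo {
  private ParamsReuseDemo() {}

  // Stand-in for an asynchronous handler: it only stores the job and runs it later.
  static final class DeferredHandler {
    private final List<Runnable> pending = new ArrayList<>();
    final List<String> sent = new ArrayList<>();

    void submit(Map<String, String> params) {
      // The params object is read when the job runs, not when it is submitted.
      pending.add(() -> sent.add(params.get("name")));
    }

    void runAll() {
      for (Runnable job : pending) {
        job.run();
      }
      pending.clear();
    }
  }

  // Buggy variant: one mutable params object is shared by every submit.
  static List<String> submitShared(List<String> coreNames) {
    DeferredHandler handler = new DeferredHandler();
    Map<String, String> params = new HashMap<>();
    for (String core : coreNames) {
      params.put("name", core); // overwrites what earlier, not yet executed jobs will read
      handler.submit(params);
    }
    handler.runAll();
    return handler.sent;
  }

  // Fixed variant: each submit gets its own copy of the common parameters.
  static List<String> submitCopied(List<String> coreNames) {
    DeferredHandler handler = new DeferredHandler();
    Map<String, String> template = new HashMap<>();
    template.put("action", "CREATE");
    for (String core : coreNames) {
      Map<String, String> params = new HashMap<>(template);
      params.put("name", core);
      handler.submit(params);
    }
    handler.runAll();
    return handler.sent;
  }
}
```

For the names a, b and c, submitShared sends c three times, while submitCopied sends a, b and c as intended.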

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an