[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397362#comment-13397362 ]

Sami Siren commented on SOLR-3488:
----------------------------------

Mark, nice work.

bq. I'm still somewhat unsure about handling failures though...

IMO, fail fast: at minimum, an error should be reported back (via the completed
queue Yonik mentions?). It seems that in the latest patch the job is removed
from the queue even when it fails.
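
A minimal sketch of the fail-fast idea. It assumes plain in-memory java.util
queues and a hypothetical execute() action, not the ZooKeeper-backed queues the
Overseer actually uses: the job is only dequeued after its outcome, success or
error, has been recorded in a "completed" queue where the caller can read it.

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class FailFastProcessor {
    // Hypothetical outcome of one job: error is null on success.
    static final class Result {
        final String job;
        final String error;
        Result(String job, String error) { this.job = job; this.error = error; }
        boolean ok() { return error == null; }
    }

    // In-memory stand-ins (assumption) for the work queue and the "completed" queue.
    final Queue<String> workQueue = new ArrayDeque<>();
    final List<Result> completedQueue = new ArrayList<>();

    // Hypothetical action: any job whose name contains "bad" fails.
    void execute(String job) {
        if (job.contains("bad")) {
            throw new IllegalStateException("cannot process " + job);
        }
    }

    void processAll() {
        while (!workQueue.isEmpty()) {
            String job = workQueue.peek(); // inspect without removing yet
            Result result;
            try {
                execute(job);
                result = new Result(job, null);
            } catch (RuntimeException e) {
                // Fail fast: keep the error instead of silently dropping it.
                result = new Result(job, e.getMessage());
            }
            completedQueue.add(result); // outcome is now visible to the caller
            workQueue.remove();         // only then is the job dequeued
        }
    }

    public static void main(String[] args) {
        FailFastProcessor p = new FailFastProcessor();
        p.workQueue.add("create-good");
        p.workQueue.add("create-bad");
        p.processAll();
        for (Result r : p.completedQueue) {
            System.out.println(r.job + " -> " + (r.ok() ? "OK" : "ERROR: " + r.error));
        }
    }
}
{code}

Recording the result before removing the job means a crash between the two
steps could re-run the job, which is arguably preferable to losing the error.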

bq. I also have not switched to requiring or respecting a replication factor - 
I was thinking perhaps specifying nothing or -1 would give you what you have 
now? An infinite rep factor? And we would enforce a lower rep factor if 
requested?

Sounds good to me.

bq. I'm not sure how replication factor would be enforced though? The Overseer
just periodically prunes and adds given what it sees and what the rep factor
is? Is that how failures should be handled? Don't re-add to the queue, just let
the periodic job attempt to fix things later?

I would implement the simplest case first: if not enough nodes are available
to satisfy the number of shards and/or the replication factor, report an error
to the user and do not try to create the collection. Or did you mean
enforcement at runtime, after the collection has been created?
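
A minimal sketch of that pre-flight check. validateCreate() is a hypothetical
helper, not part of the patch; it assumes one core per node (so an N * R
collection needs N * R live nodes) and follows the suggestion above that a
replication factor of -1 means "no limit", in which case one node per shard is
the minimum.

{code:java}
import java.util.List;

public class CreateCollectionCheck {
    // Assumption, following the thread: -1 means "no limit" (today's behavior).
    static final int UNLIMITED_REPLICATION = -1;

    /**
     * Hypothetical pre-flight check: returns an error message for the user,
     * or null if the collection can be created.
     */
    static String validateCreate(int numShards, int replicationFactor, List<String> liveNodes) {
        if (numShards < 1) {
            return "numShards must be >= 1, got " + numShards;
        }
        if (replicationFactor != UNLIMITED_REPLICATION && replicationFactor < 1) {
            return "replicationFactor must be >= 1 or -1, got " + replicationFactor;
        }
        // With no limit, one node per shard is the minimum.
        int required = replicationFactor == UNLIMITED_REPLICATION
                ? numShards
                : numShards * replicationFactor;
        if (liveNodes.size() < required) {
            return "need " + required + " live nodes but only "
                    + liveNodes.size() + " are available";
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> live = List.of("node1", "node2", "node3");
        System.out.println(validateCreate(2, 2, live));  // needs 4 nodes: error message
        System.out.println(validateCreate(3, -1, live)); // enough nodes: prints null
    }
}
{code}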

I have one question about the patch, specifically where OverseerCollectionProcessor
creates the collection: why do you need the collection param?
In the context of creating an N * R cluster, why not just go through the live
nodes to find available ones and then, perhaps based on some "strategy" class,
create specific shards (with shard ids) on specific nodes? The rest of the
Overseer would have to respect that same strategy (instead of the dummy
AssignShard that is used now) so that things would not break when new nodes
are attached to the collection. Perhaps this "strategy" could also handle
things like time-based sharding?
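
A minimal sketch of such a pluggable strategy. ShardAssignmentStrategy,
RoundRobinStrategy and the "shardN" ids are hypothetical names for
illustration, not Solr classes; round-robin is just one possible policy, and a
time-based policy could implement the same interface.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable policy deciding which live node hosts which shard replica.
interface ShardAssignmentStrategy {
    Map<String, List<String>> assign(List<String> liveNodes, int numShards, int replicationFactor);
}

// One possible policy: deal replicas out to the live nodes round-robin.
class RoundRobinStrategy implements ShardAssignmentStrategy {
    @Override
    public Map<String, List<String>> assign(List<String> liveNodes, int numShards, int replicationFactor) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("no live nodes");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>(); // keeps shard order
        int next = 0;
        for (int s = 1; s <= numShards; s++) {
            List<String> nodes = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                nodes.add(liveNodes.get(next % liveNodes.size()));
                next++;
            }
            plan.put("shard" + s, nodes);
        }
        return plan;
    }
}

public class AssignmentDemo {
    public static void main(String[] args) {
        ShardAssignmentStrategy strategy = new RoundRobinStrategy();
        Map<String, List<String>> plan =
                strategy.assign(List.of("nodeA", "nodeB", "nodeC", "nodeD"), 2, 2);
        System.out.println(plan); // {shard1=[nodeA, nodeB], shard2=[nodeC, nodeD]}
    }
}
{code}

With fewer live nodes than replicas this sketch wraps around, so two replicas
of one shard can share a node; a check like the fail-fast one above would
normally run first.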

bq. It should be easy to merge, but I think it'd also be good to start
committing your patch and improving things in SVN from now on, to ease code
review (no patch merging) and concurrent work.

+1 for committing this as is. There are some minor weak spots in the current
patch, like checking the input of Collections API requests (nonexistent params
cause OverseerCollectionProcessor to die with an NPE) and reporting input errors
back, but let's just put this in and open more JIRA issues to cover the
improvement tasks and bugs?
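
A minimal sketch of the kind of input checking meant here. missingParams() and
intParam() are hypothetical helpers over a plain Map (not Solr's request
classes), and "name"/"numShards" are only example parameter names: all missing
parameters are collected into one user-facing error instead of surfacing later
as an NPE.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CollectionsApiParams {
    /** Returns the names of required parameters that are absent or blank. */
    static List<String> missingParams(Map<String, String> params, String... required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = params.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Parses an integer parameter, turning bad input into a descriptive error. */
    static int intParam(Map<String, String> params, String name) {
        String value = params.get(name);
        try {
            return Integer.parseInt(value); // also throws NumberFormatException for null
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "parameter '" + name + "' must be an integer, got: " + value, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> request = Map.of("name", "collection1");
        List<String> missing = missingParams(request, "name", "numShards");
        if (!missing.isEmpty()) {
            // Report the problem back instead of failing deep inside the processor.
            System.out.println("400 Bad Request: missing " + missing);
        }
    }
}
{code}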

One more thing: I am seeing BasicDistributedZkTest fail consistently (not just
sporadically). I'm not sure if it is related, but the error is:

{code}
   [junit4] ERROR   0.00s J1 | BasicDistributedZkTest (suite)
   [junit4]    > Throwable #1: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=496 closes=494
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([F1C0A91EB78BAB39]:0)
   [junit4]    >        at org.junit.Assert.fail(Assert.java:93)
   [junit4]    >        at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
{code}



                
> Create a Collections API for SolrCloud
> --------------------------------------
>
>                 Key: SOLR-3488
>                 URL: https://issues.apache.org/jira/browse/SOLR-3488
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>         Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch
>
>


