[ https://issues.apache.org/jira/browse/SOLR-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397362#comment-13397362 ]
Sami Siren commented on SOLR-3488:
----------------------------------

Mark, nice work.

bq. I'm still somewhat unsure about handling failures though...

IMO fail fast: at minimum an error should be reported back (the completed queue Yonik mentions?). It seems that in the latest patch the job is removed from the queue even in case of failure.

bq. I also have not switched to requiring or respecting a replication factor - I was thinking perhaps specifying nothing or -1 would give you what you have now? An infinite rep factor? And we would enforce a lower rep factor if requested?

Sounds good to me.

bq. I'm not sure how replication factor would be enforced though? The Overseer just periodically prunes and adds given what it sees and what the rep factor is? Is that how failures should be handled? Don't readd to the queue, just let the periodic job attempt to fix things later?

I would implement the simplest case first: if not enough nodes are available to meet #shards and/or #replication factor, report an error to the user and do not try to create the collection. Or did you mean at runtime, after the collection has been created?

I have one question about the patch, specifically in the OverseerCollectionProcessor where you create the collection: why do you need the collection param? In the context of creating an N * R cluster: why don't you just go through the live nodes to find available nodes and then, perhaps based on some "strategy" class, create specific shards (with shardids) on specific nodes? The rest of the overseer would have to respect that same strategy (instead of the dummy AssignShard that is now used) so that things would not break when new nodes are attached to the collection. Perhaps this "strategy" could also handle things like time-based sharding and whatnot?

bq. it should be easy to merge but I think that it'd be also good to start committing your patch and improve things on SVN from now on to ease code review (no patch merging) and concurrent works.
+1 for committing this as is. There are some minor weak spots in the current patch, like checking the input for the collections API requests (nonexistent params cause OverseerCollectionProcessor to die with an NPE), reporting back input errors, etc., but let's just put this in and open more JIRA issues to cover the improvement tasks and bugs?

One more thing: I am seeing BasicDistributedZkTest failing (not just sporadically), not sure if it is related, with the following error:

{code}
[junit4] ERROR   0.00s J1 | BasicDistributedZkTest (suite)
[junit4]    > Throwable #1: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=496 closes=494
[junit4]    >    at __randomizedtesting.SeedInfo.seed([F1C0A91EB78BAB39]:0)
[junit4]    >    at org.junit.Assert.fail(Assert.java:93)
[junit4]    >    at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:190)
{code}

> Create a Collections API for SolrCloud
> --------------------------------------
>
>                 Key: SOLR-3488
>                 URL: https://issues.apache.org/jira/browse/SOLR-3488
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>         Attachments: SOLR-3488.patch, SOLR-3488.patch, SOLR-3488.patch, SOLR-3488_2.patch
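
For illustration only, the fail-fast check and the "strategy" idea from the comment could be sketched roughly as below. This is not code from the patch and not a Solr API: the names ShardPlacementSketch, PlacementStrategy and RoundRobinStrategy are hypothetical, the sketch assumes at most one replica per live node, and it requires an explicit replication factor of at least 1 (the "nothing or -1" case is not modelled).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Solr. */
public class ShardPlacementSketch {

    /** Hypothetical strategy: maps each shard id to the nodes hosting its replicas. */
    public interface PlacementStrategy {
        Map<String, List<String>> assign(List<String> liveNodes, int numShards, int replicationFactor);
    }

    /** Simplest case: one replica per node, fail fast when there are too few nodes. */
    public static class RoundRobinStrategy implements PlacementStrategy {
        @Override
        public Map<String, List<String>> assign(List<String> liveNodes, int numShards, int replicationFactor) {
            if (numShards < 1 || replicationFactor < 1) {
                throw new IllegalArgumentException("numShards and replicationFactor must be >= 1");
            }
            // Assumption: one replica per node, so N * R nodes are needed.
            int needed = numShards * replicationFactor;
            if (liveNodes.size() < needed) {
                // Fail fast: report the error instead of creating a partial collection.
                throw new IllegalStateException(
                    "Need " + needed + " nodes but only " + liveNodes.size() + " are live");
            }
            Map<String, List<String>> placement = new LinkedHashMap<>();
            int next = 0;
            for (int s = 1; s <= numShards; s++) {
                List<String> replicas = new ArrayList<>();
                for (int r = 0; r < replicationFactor; r++) {
                    replicas.add(liveNodes.get(next++)); // take the next unused live node
                }
                placement.put("shard" + s, replicas);
            }
            return placement;
        }
    }

    public static void main(String[] args) {
        PlacementStrategy strategy = new RoundRobinStrategy();
        List<String> liveNodes = Arrays.asList("node1", "node2", "node3", "node4");
        // 2 shards x 2 replicas fits on 4 live nodes.
        System.out.println(strategy.assign(liveNodes, 2, 2));
        try {
            strategy.assign(liveNodes, 3, 2); // would need 6 nodes
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A different strategy (for example time-based sharding) would implement the same hypothetical interface, which is the point of keeping the assignment logic behind one type that the rest of the overseer also consults.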