[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200470#comment-15200470 ]

Hoss Man commented on SOLR-8862:
--------------------------------

bq. The first call to createEphemeralLiveNode() is not actually called from the 
constructor; it's called from the OnReconnect handler much later, if you lose 
your ZK session and have to create a new one. At least, that's the theory. Are 
you seeing it actually get called early?

Ah ... ok ... I'm probably wrong then; I thought the "OnReconnect" handler 
was also used on the _initial_ connect.

I'll edit my other comment to reduce confusion.

bq. This works reasonably well for things like routing search requests. I can 
see how it might fall over if you're depending on live_nodes for doing cluster 
level operations.

That's my concern: CloudSolrClient consults {{/live_nodes}} (via 
{{ClusterState.getLiveNodes()}}) to decide which nodes are up for any request 
that isn't an explicitly routable update. In my particular case I'm getting 
burned by collection API calls...
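A rough sketch of the selection pattern described above, assuming live node names of the form {{host:port_context}} (e.g. {{127.0.0.1:8983_solr}}). The {{LiveNodeSelection}} class and its methods are hypothetical and are not CloudSolrClient's actual code; the point is only that every entry in the live set becomes a candidate URL, whether or not that node's HTTP listener is up yet:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; NOT CloudSolrClient's real implementation.
final class LiveNodeSelection {
    private LiveNodeSelection() {}

    // Assumes node names look like "host:port_context", e.g. "127.0.0.1:8983_solr".
    static String baseUrl(String nodeName) {
        int sep = nodeName.indexOf('_');
        if (sep < 0) {
            return "http://" + nodeName; // no context path in the name
        }
        return "http://" + nodeName.substring(0, sep) + "/" + nodeName.substring(sep + 1);
    }

    // Every node in the live set is treated as fair game. That is exactly the
    // problem if a node registers itself before its HTTP listener is accepting.
    static List<String> candidateUrls(Set<String> liveNodes) {
        List<String> urls = new ArrayList<>();
        for (String node : liveNodes) {
            urls.add(baseUrl(node));
        }
        Collections.sort(urls); // deterministic order for the example; a real client would shuffle
        return urls;
    }
}
```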

I guess I see your point though ... for any request involving specific 
collection(s), clients can use the replica state to see if the replicas are 
ACTIVE (or which one is the LEADER, for updates), and CloudSolrClient does that 
even for searches. So I guess the "practical" impact of this isn't as 
severe as I initially thought ... 
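A minimal sketch of that replica-state filtering, using a hypothetical {{ReplicaInfo}} model with a simplified state enum (the real Solr replica and cluster-state classes differ): searches go to any ACTIVE replica on a live node, while updates go only to a leader that is both ACTIVE and live.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical replica model for illustration; not Solr's own Replica class.
final class ReplicaInfo {
    // Simplified subset of replica states.
    enum State { ACTIVE, RECOVERING, DOWN }

    final String nodeName;
    final State state;
    final boolean leader;

    ReplicaInfo(String nodeName, State state, boolean leader) {
        this.nodeName = nodeName;
        this.state = state;
        this.leader = leader;
    }
}

final class ReplicaRouting {
    private ReplicaRouting() {}

    // Searches: any ACTIVE replica whose node is also in the live set.
    static List<String> searchTargets(List<ReplicaInfo> replicas, Set<String> liveNodes) {
        List<String> targets = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            if (r.state == ReplicaInfo.State.ACTIVE && liveNodes.contains(r.nodeName)) {
                targets.add(r.nodeName);
            }
        }
        return targets;
    }

    // Updates: only the leader, and only if it is ACTIVE and on a live node.
    static List<String> updateTargets(List<ReplicaInfo> replicas, Set<String> liveNodes) {
        List<String> targets = new ArrayList<>();
        for (ReplicaInfo r : replicas) {
            if (r.leader && r.state == ReplicaInfo.State.ACTIVE && liveNodes.contains(r.nodeName)) {
                targets.add(r.nodeName);
            }
        }
        return targets;
    }
}
```

Note that this filtering still trusts the live set for the "is the node up" half of the check, which is why it softens, rather than removes, the concern about {{/live_nodes}}.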

But I still feel like we need something per-node in ZK that isn't set to 
"true" until that node is actually listening on its port. 

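One way to express that idea, as a sketch only: publish a per-node readiness marker after confirming that the node's port actually accepts TCP connections. Here an in-memory set stands in for a ZK directory (a hypothetical {{/ready_nodes}} path; the name is made up), and {{ReadinessGate}} is a hypothetical class, not existing Solr code. In ZooKeeper the marker would presumably be an ephemeral znode, so it disappears when the node's session expires.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the in-memory set stands in for a ZK "ready" directory
// (e.g. a made-up /ready_nodes path holding ephemeral znodes).
final class ReadinessGate {
    private final Set<String> readyNodes = ConcurrentHashMap.newKeySet();

    // Returns true if something accepts TCP connections on host:port within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket probe = new Socket()) {
            probe.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    // Publish the readiness marker only once the port is actually accepting connections.
    boolean publishIfListening(String nodeName, String host, int port) {
        if (!isListening(host, port, 1000)) { // 1s probe timeout, arbitrary for the sketch
            return false;
        }
        readyNodes.add(nodeName);
        return true;
    }

    boolean isReady(String nodeName) {
        return readyNodes.contains(nodeName);
    }
}
```

Probing one's own port is a crude check; creating the marker at the point where the HTTP connector has finished starting would likely be more robust, but that depends on Solr/Jetty startup internals not covered here.
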
> /live_nodes is populated too early to be very useful for clients -- 
> CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other 
> ephemeral zk node to know which servers are "ready"
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8862
>                 URL: https://issues.apache.org/jira/browse/SOLR-8862
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> {{/live_nodes}} is populated surprisingly early (and multiple times) in the 
> lifecycle of a Solr node startup, and as a result probably shouldn't be used 
> by {{CloudSolrClient}} (or other "smart" clients) for deciding which servers 
> are fair game for requests.
> We should either fix {{/live_nodes}} to be created later in the lifecycle, or 
> add some new ZK node for this purpose.
> {panel:title=original bug report}
> I haven't been able to make sense of this yet, but what I'm seeing in a new 
> SolrCloudTestCase subclass I'm writing is that the code below, which 
> (reasonably) attempts to create a collection immediately after configuring 
> the MiniSolrCloudCluster, gets a "SolrServerException: No live SolrServers 
> available to handle this request", in spite of the fact that (as far as I 
> can tell at first glance) MiniSolrCloudCluster's constructor is supposed to 
> block until all the servers are live.
> {code}
>     configureCluster(numServers)
>       .addConfig(configName, configDir.toPath())
>       .configure();
>     Map<String, String> collectionProperties = ...;
>     assertNotNull(cluster.createCollection(COLLECTION_NAME, numShards, repFactor,
>                                            configName, null, null, collectionProperties));
> {code}
> {panel}
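
Until readiness is published reliably, a test like the one above could poll for a condition before calling {{createCollection}}. A minimal sketch of such a polling helper; the {{WaitFor}} class is hypothetical and not part of the Solr test framework:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of Solr's test framework.
final class WaitFor {
    private WaitFor() {}

    // Polls `condition` until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false on timeout.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

The condition is left to the caller, for example a check that each server's port accepts TCP connections. This only papers over the race inside the test; it does not fix {{/live_nodes}} itself.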



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
