[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator

David Smiley (Jira) Sat, 11 Jan 2025 23:35:34 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912241#comment-17912241
 ]


David Smiley commented on SOLR-16116:
-------------------------------------

Also, a different failing run of the same test, on a PR of mine 
[here|https://github.com/apache/solr/actions/runs/12721792820/job/35465118919?pr=3025],
shows interesting behavior.  ObjectReleaseTracker reports a ZkCollectionTerms 
and a shard terms object as not closed.  I looked at the logs closely while 
examining ZkController, and they show that ZkController.reconnect was called 
on one of the 4 nodes _after_ the nodes were shut down (thus after 
ZkController.close).  This is not an area I'm very comfortable in, but the 
easy/naive answer is that maybe onReconnect() needs to check for the shutdown 
state?  Adding shutdown checks blindly is hacky, though, because there's 
usually still a race between the check and the shutdown.  Thinking out loud 
here: objects that are "ref-counted" are more resilient.  If ZkController were 
ref-counted, then onReconnect would first have to incref; if that failed it 
would quit, and if it succeeded the held reference would block a concurrent 
shutdown from closing the controller underneath it.  I'm not actually 
recommending ref-counting here, just pointing out its merits.  I suppose a 
better solution is to ensure that the executor that actually runs the 
onReconnect logic is shut down before ZkController closes.
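
To make the ref-counting idea concrete, here is a minimal sketch.  It is not 
Solr code: RefCountedController, tryIncref, decref, and isReleased are 
hypothetical names chosen for illustration, and the real ZkController has no 
such API.  It assumes the owner holds one initial reference, which close() 
drops, and that resources are released only when the count reaches zero.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a closeable component; not Solr's ZkController. */
final class RefCountedController {
  // Starts at 1: the owner's reference, dropped by close().
  private final AtomicInteger refCount = new AtomicInteger(1);
  private final AtomicBoolean closeRequested = new AtomicBoolean(false);
  private volatile boolean released = false;

  /** Takes a reference unless the count has already reached zero. */
  boolean tryIncref() {
    while (true) {
      int current = refCount.get();
      if (current <= 0) {
        return false; // already fully closed; caller must not proceed
      }
      if (refCount.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  /** Drops a reference; the last one out releases the resources. */
  void decref() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      released = true; // placeholder for the real cleanup work
    } else if (remaining < 0) {
      throw new IllegalStateException("decref called too many times");
    }
  }

  /** Owner's close: drops the initial reference exactly once. */
  void close() {
    if (closeRequested.compareAndSet(false, true)) {
      decref();
    }
  }

  boolean isReleased() {
    return released;
  }

  /** Runs the reconnect logic only while holding a reference. */
  boolean onReconnect(Runnable reconnectLogic) {
    if (!tryIncref()) {
      return false; // closed before we got here: skip quietly
    }
    try {
      reconnectLogic.run();
      return true;
    } finally {
      decref(); // may perform the deferred release if close() ran meanwhile
    }
  }
}
```

With this shape, a close() that arrives while onReconnect is running only 
drops the owner's reference; the actual release is deferred until the 
reconnect logic finishes, and a reconnect that arrives after the release is 
skipped.  Note that in this sketch a new reference can still be taken after 
close() is requested as long as another holder keeps the count above zero.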

> Refactor the Solr Zookeeper logic to use Apache Curator
> -------------------------------------------------------
>
>                 Key: SOLR-16116
>                 URL: https://issues.apache.org/jira/browse/SOLR-16116
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Houston Putman
>            Assignee: Houston Putman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: main (10.0)
>
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
