[
https://issues.apache.org/jira/browse/SOLR-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913105#comment-17913105
]
David Smiley commented on SOLR-17519:
-------------------------------------
Houston:
When I said "the operator might want to use a load balancer" I meant the
user/deployer, not _necessarily_ the k8s Solr Operator. You imply a load
balancer invalidates the utility of CloudSolrClient, but CSC is designed to
reduce network hops. You don't know what HttpClusterStateProvider is for...
See the parent issue for an explanation, but I suspect I misunderstand you. You
are making a distinction between CSC and the provider; the provider's sole job
is to provide the ClusterState (including live nodes). CSC's job is to route
user requests to the optimal node.
{quote}I think the dynamic part should be opt-out-able in the
HttpClusterStateProvider
{quote}
But since it exists, we still need to deal with the accompanying
implementation details & tests. So we don't get a simplicity win. Ah well; shrug.
> CloudSolrClient with HTTP ClusterState can forget live nodes and then fail
> --------------------------------------------------------------------------
>
> Key: SOLR-17519
> URL: https://issues.apache.org/jira/browse/SOLR-17519
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud, SolrJ
> Reporter: David Smiley
> Priority: Major
> Labels: newdev, pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> When using CloudSolrClient with HTTP URLs to Solr for the cluster state:
> If all live nodes disappear temporarily (hard cluster restart?), the client
> can permanently fail to talk to the cluster, and thus would need to be
> restarted to recover.
> Credit [~ilan] on the dev list:
> {quote}The current implementation removes non-live nodes from the set of
> nodes to connect to. Getting the live nodes requires connecting to a specific
> node in the cluster that is therefore live when that happens. Worst case, if
> there is a single node up in the cluster, the client ends with a single node
> in its connection candidates list. For the issue to manifest, that Solr node
> then has to go down. Subsequently, even if other nodes are up, the client
> only has the address of a down node and can't connect.
> The fix is not a big deal. Nodes initially passed as configuration to the
> client should never be removed from the set of candidate nodes to connect to,
> even if they are not live. Other live nodes could be added to that set (and
> removed from it if we so desire when they are no longer live) to increase
> resiliency in case the cluster does have live nodes but all initially
> configured nodes are not live. The design issue is treating the configured
> set of nodes to connect to and the set of live nodes as one thing.
> {quote}
> See org.apache.solr.client.solrj.impl.BaseHttpClusterStateProvider
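
The separation proposed in the description above (configured nodes are
permanent, discovered live nodes come and go) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
in BaseHttpClusterStateProvider: the class CandidateNodes, its method names,
and the example URLs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of SolrJ) that tracks which nodes to try. */
final class CandidateNodes {
  // Nodes passed in as configuration; never removed, even when not live.
  private final Set<String> configured;
  // Live nodes learned from the cluster that were not configured up front.
  private Set<String> discoveredLive = Collections.emptySet();

  CandidateNodes(Collection<String> configuredUrls) {
    if (configuredUrls.isEmpty()) {
      throw new IllegalArgumentException("at least one configured node is required");
    }
    this.configured = Collections.unmodifiableSet(new LinkedHashSet<>(configuredUrls));
  }

  /** Replaces the discovered set; the configured set is left untouched. */
  synchronized void updateLiveNodes(Collection<String> liveUrls) {
    Set<String> next = new LinkedHashSet<>(liveUrls);
    next.removeAll(configured); // avoid listing a configured node twice
    this.discoveredLive = next;
  }

  /** Configured nodes first, then any additional live nodes. */
  synchronized List<String> candidates() {
    List<String> all = new ArrayList<>(configured);
    all.addAll(discoveredLive);
    return all;
  }
}
```

With this shape, a fetch that reports zero live nodes (for example during a
hard cluster restart) only empties the discovered set, so the client still
has the configured addresses to retry once the cluster comes back.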