[
https://issues.apache.org/jira/browse/SOLR-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987993#comment-17987993
]
Chris M. Hostetter commented on SOLR-17804:
-------------------------------------------
some concerns after skimming your branch...
----
{code:java}
protected final AtomicReference<CloudSolrClient> solrClientRef = ...{code}
I'm a little concerned that future devs might accidentally use
{{solrClientRef}} directly (w/o going through the Supplier) since it's got
protected scope in the {{KafkaCrossDcConsumer}} class – it would be safer to
encapsulate it inside the Supplier so no other code in this class can access it
----
{code:java}
if (existingClient != null &&
existingClient.getClusterStateProvider().isClosed()) {
log.info("- re-creating Solr client because the previous one
was closed");
{code}
# this comment is missleading – we don't know that the previous Solr client
was closed – we only know that the previous client's cluster state provider was
closed
# AFAICT if we go dow this code path, nothing ever ensures that
{{existingClient}} gets closed – so it will leak it's internals (including the
Executor inside the underlying {{LBSolrClient}}
## the test where you confirm we get a new client when the state reader is
closed should also confirm the old client gets closed
----
This alternate code path (used to create the very first client) ...
{code:java}
} else {
synchronized (solrClientRef) {
CloudSolrClient newClient = createSolrClient(conf);
solrClientRef.set(newClient);
return newClient;
}
} {code}
...is missing the "double check" inside the sync block (to see if
{{solrClientRef.get()}} is _still_ null)
Which means if thread-2 is waiting to enter that sync block because thread-1 is
already in it, thread-2 won't notice that thread-1 already called
{{createSolrClient(...)}} and we'll leak another CloudSolrCLient
I think if we change the {{Supplier}} to something like...
{code:java}
private interface ClientSupplier extends Supplier<CloudSolrClient> {
// call this method in the places that currently call
IOUtils.closeQuietly(solrClientRef.get()) directly
public void quietCloseClient();
}
protected final ClientSupplier solrClientSupplier;{code}
... then we can do something like...
{code:java}
solrClientSupplier = new ClientSupplier() {
private final AtomicRefrence<CloudSolrClient> solrClientRef = new
AtomicRefrence(createSolrClient(conf));
@Override
public CloudSolrClient get() {
CloudSolrClient existingClient = solrClientRef.get();
if (existingClient.getClusterStateProvider().isClosed()) {
// re-open the client if CSP was closed
synchronized (solrClientRef) {
existingClient = solrClientRef.get();
if (existingClient.getClusterStateProvider().isClosed()) {
log.info("- re-creating Solr client because it's CSP is closed");
CloudSolrClient newClient = createSolrClient(conf);
solrClientRef.set(newClient);
IOUtils.closeQuietly(existingClient);
return newClient;
}
}
}
return existingClient;
}
@Override
public void quietCloseClient() {
IOUtils.closeQuietly(solrClientRef.get())
}
}{code}
..and completely protect the {{solrClientRef}} from miss-use, ensure any "old"
client" gets closedwhen no longer in use, and simplify the locking logic.
WDYT?
> CrossDC Consumer tries to reuse an already closed SolrClient
> ------------------------------------------------------------
>
> Key: SOLR-17804
> URL: https://issues.apache.org/jira/browse/SOLR-17804
> Project: Solr
> Issue Type: Bug
> Components: module - crossDC
> Affects Versions: 9.8.1
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 9.9
>
>
> When network is down for a while a CloudSolrClient used by the Consumer
> application becomes unusable because its ZkClientStateProvider gets closed:
> {code:java}
> 272746 ERROR (KafkaCrossDcConsumerWorker) [n: c:ab-test s: r: x: t:]
> o.a.s.c.m.SolrMessageProcessor Unable to connect to solr server. Not
> consuming.
>
> => org.apache.solr.common.AlreadyClosedException
>
>
>
> at
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getZkStateReader(ZkClientClusterStateProvider.java:222)
>
>
> org.apache.solr.common.AlreadyClosedException: null
>
>
>
> at
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getZkStateReader(ZkClientClusterStateProvider.java:222)
>
>
> at
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:217)
>
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:364)
>
>
> at
> org.apache.solr.crossdc.messageprocessor.SolrMessageProcessor.connectToSolrIfNeeded(SolrMessageProcessor.java:363)
>
>
> at
> org.apache.solr.crossdc.messageprocessor.SolrMessageProcessor.handleItem(SolrMessageProcessor.java:74)
>
>
> at
> org.apache.solr.crossdc.consumer.KafkaCrossDcConsumer.lambda$sendBatch$2(KafkaCrossDcConsumer.java:375)
>
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>
>
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>
>
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
>
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
>
>
> at java.lang.Thread.run(Thread.java:829) [?:?] {code}
> The Consumer keeps trying to use this client instance, which results in
> multiple errors and no recovery without restarting the Consumer application.
> The immediate fix is for the Consumer to try to recover from this state by
> re-creating the SolrClient.
> A broader question (to address in another ticket) is what should be the
> proper behavior of CloudSolrClient when its ClusterStateProvider gets closed
> - IMHO the client should be closed too.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]