[ 
https://issues.apache.org/jira/browse/SOLR-17804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987993#comment-17987993
 ] 

Chris M. Hostetter commented on SOLR-17804:
-------------------------------------------

some concerns after skimming your branch...
----
{code:java}
protected final AtomicReference<CloudSolrClient> solrClientRef = ...{code}
I'm a little concerned that future devs might accidentally use 
{{solrClientRef}} directly (w/o going through the Supplier) since it's got 
protected scope in the {{KafkaCrossDcConsumer}} class – it would be safer to 
encapsulate it inside the Supplier so no other code in this class can access it

 
----
{code:java}
                if (existingClient != null && 
existingClient.getClusterStateProvider().isClosed()) {
                  log.info("- re-creating Solr client because the previous one 
was closed");
{code}
 # this comment is missleading – we don't know that the previous Solr client 
was closed – we only know that the previous client's cluster state provider was 
closed
 # AFAICT if we go dow this code path, nothing ever ensures that 
{{existingClient}} gets closed – so it will leak it's internals (including the 
Executor inside the underlying {{LBSolrClient}}
 ## the test where you confirm we get a new client when the state reader is 
closed should also confirm the old client gets closed

 
----
This alternate code path (used to create the very first client) ...
{code:java}
          } else {
            synchronized (solrClientRef) {
              CloudSolrClient newClient = createSolrClient(conf);
              solrClientRef.set(newClient);
              return newClient;
            }
          } {code}
...is missing the "double check" inside the sync block (to see if 
{{solrClientRef.get()}} is _still_ null)

Which means if thread-2 is waiting to enter that sync block because thread-1 is 
already in it, thread-2 won't notice that thread-1 already called 
{{createSolrClient(...)}} and we'll leak another CloudSolrCLient

 

I think if we change the {{Supplier}} to something like...
{code:java}
private interface ClientSupplier extends Supplier<CloudSolrClient> {
  // call this method in the places that currently call 
IOUtils.closeQuietly(solrClientRef.get()) directly
  public void quietCloseClient();
}
protected final ClientSupplier solrClientSupplier;{code}
... then we can do something like...
{code:java}
solrClientSupplier = new ClientSupplier() {
  private final AtomicRefrence<CloudSolrClient> solrClientRef = new 
AtomicRefrence(createSolrClient(conf));
  @Override
  public CloudSolrClient get() {
    CloudSolrClient existingClient = solrClientRef.get();
    if (existingClient.getClusterStateProvider().isClosed()) {
      // re-open the client if CSP was closed
      synchronized (solrClientRef) {
        existingClient = solrClientRef.get();
        if (existingClient.getClusterStateProvider().isClosed()) {
          log.info("- re-creating Solr client because it's CSP is closed");
          CloudSolrClient newClient = createSolrClient(conf);
          solrClientRef.set(newClient);
          IOUtils.closeQuietly(existingClient);
          return newClient;
        }
      }
    }
    return existingClient;
  }
  @Override
  public void quietCloseClient() {
    IOUtils.closeQuietly(solrClientRef.get())
  }
}{code}
..and completely protect the {{solrClientRef}} from miss-use, ensure any "old" 
client" gets closedwhen no longer in use, and simplify the locking logic.

WDYT?

> CrossDC Consumer tries to reuse an already closed SolrClient
> ------------------------------------------------------------
>
>                 Key: SOLR-17804
>                 URL: https://issues.apache.org/jira/browse/SOLR-17804
>             Project: Solr
>          Issue Type: Bug
>          Components: module - crossDC
>    Affects Versions: 9.8.1
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: 9.9
>
>
> When network is down for a while a CloudSolrClient used by the Consumer 
> application becomes unusable because its ZkClientStateProvider gets closed:
> {code:java}
> 272746 ERROR (KafkaCrossDcConsumerWorker) [n: c:ab-test s: r: x: t:] 
> o.a.s.c.m.SolrMessageProcessor Unable to connect to solr server. Not 
> consuming.                                                                    
>                                 
>           => org.apache.solr.common.AlreadyClosedException                    
>                                                                               
>                                                                               
>                   
>     at 
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getZkStateReader(ZkClientClusterStateProvider.java:222)
>                                                                               
>                                                  
> org.apache.solr.common.AlreadyClosedException: null                           
>                                                                               
>                                                                               
>                   
>     at 
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getZkStateReader(ZkClientClusterStateProvider.java:222)
>                                                                               
>             
>     at 
> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:217)
>                                                                               
>                    
>     at 
> org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:364)
>                                                                               
>     
>     at 
> org.apache.solr.crossdc.messageprocessor.SolrMessageProcessor.connectToSolrIfNeeded(SolrMessageProcessor.java:363)
>                                                                               
>                
>     at 
> org.apache.solr.crossdc.messageprocessor.SolrMessageProcessor.handleItem(SolrMessageProcessor.java:74)
>                                                                               
>                           
>     at 
> org.apache.solr.crossdc.consumer.KafkaCrossDcConsumer.lambda$sendBatch$2(KafkaCrossDcConsumer.java:375)
>                                                                               
>                          
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] 
>                                                                               
>                                                                               
>            
>     at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]         
>                                                                               
>                                                                               
>                   
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]                                                                        
>                                                                               
>           
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]                                                                        
>                                                                               
>           
>     at java.lang.Thread.run(Thread.java:829) [?:?] {code}
> The Consumer keeps trying to use this client instance, which results in 
> multiple errors and no recovery without restarting the Consumer application.
> The immediate fix is for the Consumer to try to recover from this state by 
> re-creating the SolrClient.
> A broader question (to address in another ticket) is what should be the 
> proper behavior of CloudSolrClient when its ClusterStateProvider gets closed 
> - IMHO the client should be closed too.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to