Hoss Man created SOLR-11484:
-------------------------------

             Summary: Possible bug with CloudSolrClient directedUpdates & 
cached collection state -- 
TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete
                 Key: SOLR-11484
                 URL: https://issues.apache.org/jira/browse/SOLR-11484
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man



{{TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete}} 
seems to fail with non-trivial frequency, so I grabbed the logs from a recent 
failure and started trying to follow along with the actions to figure out what 
exactly is happening....

https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/20662/

{noformat}
   [junit4] ERROR   20.3s J1 | TestCollectionsAPIViaSolrCloudCluster.testCollectionCreateSearchDelete <<<
   [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error from server at https://127.0.0.1:42959/solr/testcollection_shard1_replica_n3: Expected mime type application/octet-stream but got text/html. <html>
   [junit4]    > <head>
   [junit4]    > <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
   [junit4]    > <title>Error 404 </title>
{noformat}

The crux of this failure appears to be a genuine bug in how CloudSolrClient 
uses its cached ClusterState info when doing (direct) updates.  The key bits 
seem to be:

* CloudSolrClient does _something_ (update,query,etc...) with a collection 
causing the current cluster state for the collection to be cached
* The actual collection changes such that a Solr node/core no longer exists as 
part of the collection
* CloudSolrClient is asked to process an UpdateRequest which triggers the code 
paths for the {{directUpdate()}} method -- which attempts to route the updates 
directly to a replica of the appropriate shard using the (cached) collection 
state info
* CloudSolrClient (may) attempt to send that UpdateRequest to a node/core that 
doesn't exist, getting a 404 -- which does not (seem to) trigger a state 
refresh, or retry to find a correct URL to resend the update to.
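
To make the sequence above concrete, here is a minimal toy model of it. None of the names below ({{StaleRouteSketch}}, {{FakeCluster}}, {{CachingClient}}, the core names) are SolrJ APIs; they are hypothetical stand-ins, and the "refresh and retry once on 404" variant is only one possible mitigation, shown for contrast, not a description of how CloudSolrClient behaves or should be fixed.

```java
/** Toy model only: every name here is hypothetical, none are SolrJ APIs. */
class StaleRouteSketch {

    /** Stands in for the live cluster: exactly one core currently serves the shard. */
    static class FakeCluster {
        String leaderCore;

        /** Returns an HTTP-like status: 200 if the core exists, 404 otherwise. */
        int sendUpdate(String coreUrl) {
            return coreUrl.equals(leaderCore) ? 200 : 404;
        }
    }

    /** Stands in for a client that caches collection state on first use. */
    static class CachingClient {
        private final FakeCluster cluster;
        private String cachedLeader; // null until the first request populates it
        int refreshes = 0;           // how often state was (re)loaded

        CachingClient(FakeCluster cluster) {
            this.cluster = cluster;
        }

        private String leader() {
            if (cachedLeader == null) {
                cachedLeader = cluster.leaderCore;
                refreshes++;
            }
            return cachedLeader;
        }

        /** The reported behaviour: trust the cache and surface whatever comes back. */
        int updateWithoutRetry() {
            return cluster.sendUpdate(leader());
        }

        /** Assumed mitigation: on 404, drop the cached state and retry once. */
        int updateWithRefreshOnNotFound() {
            int status = cluster.sendUpdate(leader());
            if (status == 404) {
                cachedLeader = null;
                status = cluster.sendUpdate(leader());
            }
            return status;
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        cluster.leaderCore = "shard1_replica_old";
        CachingClient client = new CachingClient(cluster);

        // Step 1: any request caches the current state.
        System.out.println(client.updateWithoutRetry());          // 200

        // Step 2: the collection changes; the old core no longer exists.
        cluster.leaderCore = "shard1_replica_new";

        // Steps 3-4: the update is routed from the stale cache and gets a 404.
        System.out.println(client.updateWithoutRetry());          // 404

        // Contrast: refreshing state on 404 finds the new core.
        System.out.println(client.updateWithRefreshOnNotFound()); // 200
    }
}
```

In this model the second call fails only because nothing invalidates {{cachedLeader}} when the cluster changes, which mirrors the 404 in the log above.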

Details to follow in comment....




