[jira] [Commented] (SOLR-12415) Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr node goes down, it is unable to detect when it become live again due to 404 error

Shawn Heisey (JIRA) Wed, 30 May 2018 06:10:33 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495126#comment-16495126
 ]


Shawn Heisey commented on SOLR-12415:
-------------------------------------

bq. This may limit flexibility of the client that was created without any core 
information.

If the zombie check remains a standard query (which is defined internally as 
the all docs query), it's going to need an index name, or it can't possibly 
succeed.  If the zombie check is changed to call a global admin handler 
instead, then setting a default collection would not be required.

bq. Maybe this would work. But! Isn't a call to get list of valid cores enough 
to know that server is not a zombie? What if there is no core at all?

My initial reaction was to say that if a server has no cores at all, it is not 
a very useful server.  After a little more thought, I realized that there might 
be situations where a server is set up with no cores, and then CoreAdmin is 
used to create them.

The thing about that situation is that you really wouldn't want to use 
LBHttpSolrClient.  You'd want to use HttpSolrClient to talk to one specific 
server.  Load balancing CoreAdmin requests doesn't make any sense -- there 
would be no way to predict which server is going to receive them.


> Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr 
> node goes down, it is unable to detect when it become live again due to 404 
> error
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12415
>                 URL: https://issues.apache.org/jira/browse/SOLR-12415
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.2.1, 7.3.1, 7.4
>         Environment: Solr 7.2.1
> 2 servers - master and slave.
>            Reporter: Grzegorz Lebek
>            Priority: Critical
>
> *Context*
>  When LBHttpSolrClient has been constructed using *base root urls*, and when 
> a slave goes down, and then back again, the client is unable to mark it as 
> alive again due to 404 error.
> Logs  below:
> {code:java}
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET 
> /solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2 
> HTTP/1.1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> 
> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 
> 1.0[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host: 
> localhost:8984[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> 
> "Connection: Keep-Alive[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 
> 404 Not Found[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Content-Type: text/html;charset=iso-8859-1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Content-Length: 243[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta 
> http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "<title>Error 404 Not Found</title>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "<body><h2>HTTP ERROR 404</h2>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem 
> accessing /solr/select. Reason:[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not 
> Found</pre></p>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</body>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</html>[\n]"{code}
> *Analysis*
>  when using only *base root urls* in a LBHttpSolrClient we need to pass a 
> "*collection*" paramter when sending a request. It works fine except that in 
> a method 
> {code:java}
> private void checkAZombieServer(ServerWrapper zombieServer){code}
> it tries to query a solr without the collection parameter, to check if the 
> server is alive. This causes a html content (apparently dashboard) to be 
> returned, and as a result it will move to the exception clause in the method 
> therefore even if the server is back it will never be marked as alive again.
>  I debugged this and if we pass a collection name there as a second param it 
> will respond in a right manner.
> Suggestion is either to somehow pass the collection name or to change the way 
> zombie servers are pinged.
> *Steps to reproduce*
> Run 2 servers - master and slave. Create client using base urls. Index, test 
> search etc.
> Turn off slave server and after couple of seconds turn it on again.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-12415) Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr node goes down, it is unable to detect when it become live again due to 404 error

Reply via email to