[ 
https://issues.apache.org/jira/browse/SOLR-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493945#comment-16493945
 ] 

Shawn Heisey commented on SOLR-12415:
-------------------------------------

Other thoughts:

In 7.5, deprecate handling of URLs whose path includes the collection name, and 
remove that functionality in 8.0.

Strongly recommend setDefaultCollection in 7.x and make it a requirement in 8.0 
(not sure if this is really needed).

In 8.0, deprecate the request/client methods in SolrJ that do NOT take a 
collection name, and remove them in 9.0.  I think setDefaultCollection would 
also disappear in 9.0.  I also think that this paragraph assumes that the zombie 
checks in LBHttpSolrClient use an admin handler.

Maybe LBHttpSolrClient should be able to query each server periodically for a 
list of cores it can handle, so when a user requests core X, it won't be sent 
to a server unless that server is definitely able to handle the request.  This 
will require that /solr/admin/cores is implicitly or explicitly defined.  If no 
servers are found with the ability to handle the request, it could either be 
sent to one of them anyway (possibly failing), or the client could refresh the 
core list on all servers and fail fast if the core still isn't available.  When 
LBHttpSolrClient is used inside CloudSolrClient, it will require special 
handling because collection names are valid even though none of the servers has 
a core with that name.  
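
To make the core-list idea a bit more concrete, here is a minimal sketch of the 
routing logic only. It is not SolrJ code: CoreAwareRouter, coreLister, and 
pickServer are hypothetical names, and the function that lists cores stands in 
for whatever call to /solr/admin/cores a real client would make. It takes the 
"refresh and fail fast" branch described above and does not cover the special 
handling needed inside CloudSolrClient.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: only route a request to a server known to host the core. */
class CoreAwareRouter {
    private final List<String> servers;
    // Assumption: given a server base URL, returns the core names that server reports.
    private final Function<String, Set<String>> coreLister;
    private final Map<String, Set<String>> coresByServer = new HashMap<>();

    CoreAwareRouter(List<String> servers, Function<String, Set<String>> coreLister) {
        this.servers = new ArrayList<>(servers);
        this.coreLister = coreLister;
        refresh();
    }

    /** Re-query every server for its core list (a real client would do this periodically). */
    final void refresh() {
        for (String server : servers) {
            // Store a snapshot so later changes on the server are only seen after a refresh.
            coresByServer.put(server, new HashSet<>(coreLister.apply(server)));
        }
    }

    /** Pick a server for the core; on a miss, refresh once and then fail fast. */
    String pickServer(String core) {
        Optional<String> hit = find(core);
        if (!hit.isPresent()) {
            refresh();
            hit = find(core);
        }
        return hit.orElseThrow(
                () -> new IllegalStateException("No server hosts core " + core));
    }

    private Optional<String> find(String core) {
        return servers.stream()
                .filter(s -> coresByServer
                        .getOrDefault(s, Collections.<String>emptySet())
                        .contains(core))
                .findFirst();
    }
}
```

The alternative mentioned above, sending the request to some server anyway, 
would replace the exception with a fallback to any entry in the server list.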


> Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr 
> node goes down, it is unable to detect when it become live again due to 404 
> error
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12415
>                 URL: https://issues.apache.org/jira/browse/SOLR-12415
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.2.1, 7.3.1, 7.4
>         Environment: Solr 7.2.1
> 2 servers - master and slave.
>            Reporter: Grzegorz Lebek
>            Priority: Critical
>
> *Context*
>  When LBHttpSolrClient has been constructed using *base root urls* and a 
> slave goes down and then comes back up, the client is unable to mark it as 
> alive again due to a 404 error.
> Logs below:
> {code:java}
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET 
> /solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2 
> HTTP/1.1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> 
> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 
> 1.0[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host: 
> localhost:8984[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> 
> "Connection: Keep-Alive[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 
> 404 Not Found[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Content-Type: text/html;charset=iso-8859-1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "Content-Length: 243[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta 
> http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "<title>Error 404 Not Found</title>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "<body><h2>HTTP ERROR 404</h2>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem 
> accessing /solr/select. Reason:[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not 
> Found</pre></p>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</body>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << 
> "</html>[\n]"{code}
> *Analysis*
>  When using only *base root urls* in an LBHttpSolrClient, we need to pass a 
> "*collection*" parameter when sending a request. This works fine, except that 
> the method 
> {code:java}
> private void checkAZombieServer(ServerWrapper zombieServer){code}
> queries Solr without the collection parameter to check whether the server is 
> alive. This causes HTML content (apparently the dashboard) to be returned, so 
> the method falls into its exception clause. As a result, even if the server 
> is back, it will never be marked as alive again.
>  I debugged this, and if we pass a collection name there as a second param, 
> the server responds correctly.
> The suggestion is either to somehow pass the collection name or to change the 
> way zombie servers are pinged.
> *Steps to reproduce*
> Run 2 servers - master and slave. Create a client using base urls. Index, 
> test search, etc.
> Turn off the slave server and, after a couple of seconds, turn it on again.
>  
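
As a rough illustration of the analysis quoted above, the sketch below shows 
how the request path differs with and without a collection name. It is not the 
SolrJ implementation: AliveCheckPath and buildPath are hypothetical names, and 
"mycore" is a made-up core name. The host and port come from the quoted log.

```java
/** Hypothetical sketch: build a handler URL, optionally scoped to a collection or core. */
class AliveCheckPath {
    static String buildPath(String baseUrl, String collection, String handler) {
        // Drop a trailing slash so the pieces join cleanly.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        if (collection == null || collection.isEmpty()) {
            // No collection: the handler hangs directly off the root URL,
            // which is the /solr/select path that returned 404 in the log.
            return base + handler;
        }
        return base + "/" + collection + handler;
    }

    public static void main(String[] args) {
        String root = "http://localhost:8984/solr";
        System.out.println(buildPath(root, null, "/select"));
        System.out.println(buildPath(root, "mycore", "/select"));
    }
}
```

With a root URL and no collection, the first call yields the /solr/select path 
seen in the log; the second yields a path that includes the core name.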



