[ 
https://issues.apache.org/jira/browse/SOLR-12415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493840#comment-16493840
 ] 

Shawn Heisey commented on SOLR-12415:
-------------------------------------

With this SolrJ code and two servers, I was able to replicate the problem.

{code:java}
        public static void main(String[] args) throws InterruptedException
        {
                LBHttpSolrClient lc = new LBHttpSolrClient.Builder()
                                .withBaseSolrUrls("http://localhost:8983/solr",
                                                "http://localhost:8984/solr").build();
                doQ(lc);
                System.out.println("stop one solr.");
                Thread.sleep(30000);
                doQ(lc);
                System.out.println("start solr back up.");
                Thread.sleep(30000);
                doQ(lc);
                System.exit(0);
        }

        private static void doQ(LBHttpSolrClient lc) throws InterruptedException
        {
                QueryResponse r = null;
                for (int i = 0; i < 5; i++)
                {
                        try
                        {
                                r = null;
                                r = lc.query("foo", new SolrQuery("*"));
                        }
                        catch (Exception e)
                        {
                                System.out.println(e.getClass().getName() + ":" + e.getMessage());
                        }
                        Thread.sleep(10);
                        if (r != null)
                        {
                                System.out.println(r.getResults().getNumFound());
                        }
                        else
                        {
                                System.out.println("null response");
                        }
                }
        }
{code}

It gave the following output. There was a 30-second pause after each of the 
stop and start notes, during which I performed the action mentioned.
{noformat}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
0
1
0
1
0
stop one solr.
1
1
1
1
1
start solr back up.
1
1
1
1
1
{noformat}

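I added one document to the server running on port 8984 so there would be two 
different responses: 0 from the server on port 8983 and 1 from the server on 
port 8984. The alternating 0/1 at the start shows both servers being used. 
After the restart the output is still all 1s, so the restarted server was 
never put back into rotation.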
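
One way to read the output above is with a toy model of the behaviour 
described in the issue. The sketch below is not SolrJ code and does not 
reproduce the real LBHttpSolrClient internals. FakeNode, ToyBalancer and the 
status value 0 for "connection refused" are hypothetical names and conventions 
made up for illustration. The only things taken from this thread are the 
collection name "foo", the two ports, and the fact that the alive check 
requested /solr/select and got a 404.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZombieCheckSketch {

    /** Hypothetical stand-in for one Solr node; not a SolrJ class. */
    static class FakeNode {
        final String port;
        final long numFound;
        boolean up = true;

        FakeNode(String port, long numFound) {
            this.port = port;
            this.numFound = numFound;
        }

        /** 0 = connection refused (node down), 200 = OK, 404 = no handler at that path. */
        int get(String path) {
            if (!up) {
                return 0;
            }
            return path.equals("/solr/foo/select") ? 200 : 404;
        }
    }

    /** Toy balancer: round-robin over alive nodes; nodes that fail become zombies. */
    static class ToyBalancer {
        final List<FakeNode> alive = new ArrayList<>();
        final List<FakeNode> zombies = new ArrayList<>();
        final String pingPath;
        int counter = 0;

        ToyBalancer(String pingPath, List<FakeNode> nodes) {
            this.pingPath = pingPath;
            alive.addAll(nodes);
        }

        /** Queries collection "foo" on the next alive node, demoting nodes that fail. */
        long query() {
            while (!alive.isEmpty()) {
                FakeNode n = alive.get(counter++ % alive.size());
                if (n.get("/solr/foo/select") == 200) {
                    return n.numFound;
                }
                alive.remove(n);
                zombies.add(n);
            }
            throw new IllegalStateException("no live servers");
        }

        /** Pings every zombie at pingPath; only a 200 moves it back to the alive list. */
        void aliveCheck() {
            for (FakeNode z : new ArrayList<>(zombies)) {
                if (z.get(pingPath) == 200) {
                    zombies.remove(z);
                    alive.add(z);
                }
            }
        }
    }

    /** Replays the three phases of the test: query, stop 8983 and query, restart and query. */
    public static String run(String pingPath) {
        FakeNode a = new FakeNode("8983", 0);
        FakeNode b = new FakeNode("8984", 1);
        ToyBalancer lb = new ToyBalancer(pingPath, Arrays.asList(a, b));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) out.append(lb.query());
        a.up = false; // "stop one solr."
        out.append('|');
        for (int i = 0; i < 5; i++) out.append(lb.query());
        a.up = true;  // "start solr back up."
        lb.aliveCheck();
        out.append('|');
        for (int i = 0; i < 5; i++) out.append(lb.query());
        return out.toString();
    }

    public static void main(String[] args) {
        // Ping without a collection, as in the quoted wire log.
        System.out.println(run("/solr/select"));     // prints 01010|11111|11111
        // Ping that includes the collection name.
        System.out.println(run("/solr/foo/select")); // prints 01010|11111|01010
    }
}
```

With the collection-less ping path, the toy model gives the same pattern as the 
real output: the restarted node answers the ping with a 404 and stays a zombie. 
With the collection in the ping path it returns to rotation.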


> Solr Loadbalancer client LBHttpSolrClient not working as expected, if a Solr 
> node goes down, it is unable to detect when it become live again due to 404 
> error
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12415
>                 URL: https://issues.apache.org/jira/browse/SOLR-12415
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.2.1, 7.3.1, 7.4
>         Environment: Solr 7.2.1
> 2 servers - master and slave.
>            Reporter: Grzegorz Lebek
>            Priority: Critical
>
> *Context*
>  When LBHttpSolrClient has been constructed using *base root urls* and a 
> slave goes down and then comes back up, the client is unable to mark it as 
> alive again because of a 404 error.
> Logs below:
> {code:java}
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "GET /solr/select?q=%3A&rows=0&sort=docid+asc&distrib=false&wt=javabin&version=2 HTTP/1.1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Host: localhost:8984[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "Connection: Keep-Alive[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 >> "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "HTTP/1.1 404 Not Found[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Cache-Control: must-revalidate,no-cache,no-store[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Content-Type: text/html;charset=iso-8859-1[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "Content-Length: 243[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "[\r][\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<html>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<title>Error 404 Not Found</title>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</head>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<body><h2>HTTP ERROR 404</h2>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<p>Problem accessing /solr/select. Reason:[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "<pre> Not Found</pre></p>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</body>[\n]"
>  DEBUG [aliveCheckExecutor-1-thread-1] [wire] http-outgoing-83 << "</html>[\n]"{code}
> *Analysis*
>  When only *base root urls* are used in an LBHttpSolrClient, a 
> "*collection*" parameter must be passed with each request. That works fine, 
> except that the method 
> {code:java}
> private void checkAZombieServer(ServerWrapper zombieServer){code}
> queries Solr without the collection parameter to check whether the server is 
> alive. This causes HTML content (apparently the dashboard) to be returned, so 
> execution falls into the exception clause of the method, and the server is 
> never marked as alive again even after it is back up.
>  I debugged this, and if a collection name is passed there as a second 
> parameter, the server responds correctly.
> The suggestion is either to pass the collection name somehow or to change 
> the way zombie servers are pinged.
> *Steps to reproduce*
> Run two servers, master and slave. Create a client using base URLs. Index, 
> run test searches, etc.
> Turn off the slave server and, after a couple of seconds, turn it on again.
>  


