[ https://issues.apache.org/jira/browse/SOLR-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316873#comment-15316873 ]

Hoss Man commented on SOLR-9189:
--------------------------------


My initial, paranoid gut reaction while skimming the jenkins emails this morning 
was to assume that this might be because of SOLR-5776. The hypothesis: "The 
increased randomized use of ssl (factoring in tests.nightly / tests.multiplier) 
is causing more tests to slow down due to the crypto calculations"

... but that hypothesis started to seem weak once I looked at the logs. There 
is a "Randomized ssl" line in the logs for every SolrTestCaseJ4 subclass showing 
whether or not ssl is being used...

* http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/834/
** 25 test failures
** only 7 of those were using ssl
* https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1034/
** 44 test failures
** only 17 of those were using ssl

...even if we assume that every failure in a test using ssl was directly caused 
by ssl, that still leaves 18 and 27 failures (respectively) in tests that were 
not using ssl at all, which is still a very high number of failed tests for 
those two runs.

So my amended (paranoid) hypothesis is "The increased randomized use of ssl 
(factoring in tests.nightly / tests.multiplier) is causing more tests to slow 
down due to the crypto calculations *EVEN IN OTHER TESTS RUNNING AT THE SAME 
TIME, DUE TO CPU STARVATION*"

I'm going to commit a blanket disable of all SSL randomization _on master_ ASAP 
to test this hypothesis.

Part of me feels like this is an overkill reaction, and that a more rational 
response would simply be to undo the "increased odds of using ssl" portion of 
SOLR-5776, but I'd really like to get a definitive understanding of whether 
SSL usage is really having such a pronounced effect on other tests in the same 
jenkins run -- OR -- *whether it's just a red herring, and some other recent 
change has caused serious timeout issues.*



> explosion of timeout related failures in jenkins the past few days
> ------------------------------------------------------------------
>
>                 Key: SOLR-9189
>                 URL: https://issues.apache.org/jira/browse/SOLR-9189
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Critical
>
> In the past few days, something has gone seriously wonky with our jenkins 
> tests, causing a serious explosion in the number of test failures, 
> notably due to various sorts of timeouts...
> * "Unable to create core ... Timed out getting coreNodeName for ..."
> * "msg=SolrCore is loading,code=503"
> * "Timeout occured while waiting response from server"
> * "No registered leader was found after waiting for 30000ms"
> * "Unable to create core ... Caused by: Timed out getting shard id for core: 
> ..."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
