Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Michael Gibney
Dawid, sorry for the incomplete information (and for hijacking the thread)!
The "no runnable tests" error indeed isn't an issue for the gradle build.
But for ant I just confirmed that on a clean checkout of
`releases/lucene-solr/8.11.0`:
`ant test -Dtests.class='org.apache.solr.search.facet.*'` yields:
   [junit4] Completed [13/13 (1!)] on J0 in 129.17s, 162 tests
   [junit4]
   [junit4]
   [junit4] Tests with failures [seed: 37C421761CE5038]:
   [junit4]   - org.apache.solr.search.facet.DebugAgg.initializationError
   [junit4]
   [junit4]
   [junit4] JVM J0: 0.68 ..   135.06 =   134.38s
   [junit4] JVM J1: 0.68 ..    73.02 =    72.34s
   [junit4] JVM J2: 0.66 ..    57.78 =    57.12s
   [junit4] JVM J3: 0.66 ..    60.63 =    59.97s
   [junit4] Execution time total: 2 minutes 15 seconds
   [junit4] Tests summary: 13 suites, 234 tests, 1 error, 3 ignored (3 assumptions)
So, the 12 actual test suites pass, but DebugAgg (which is basically a utility
class used across the various tests) fails at the suite level. To be clear,
I'm not suggesting it's worth doing anything about this; but when running
tests in the ant build and selecting them via a wildcard pattern (a special
case!), it's worth knowing that this type of spurious error can be ignored.

On Mon, Nov 22, 2021 at 1:18 PM Dawid Weiss wrote:

> > FWIW: the test-suite-level "no runnable tests" errors usually crop up
> for me when specifying tests with wildcard expressions, e.g.:
> `-Dtests.class='org.apache.solr.search.facet.*'` (with `DebugAgg` being the
> culprit in that case). I just ignore these "no runnable tests" errors
> (though perhaps there's a better way?). Not sure if that applies in this
> case (and come to think of it, I don't recall whether this applies to both
> `ant` or `gradle` builds), but I thought I'd mention it.
>
> I'm not sure I understand - can you provide a full command that
> produces this error?
>
> D.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>


Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Chris Hostetter


You didn't really say much about what the failure *looks* like (in terms
of whether you get an assertion failure, if so where, and what the logs say
etc...) but I tried to run this test (with this seed) locally a few times
and skimmed the logs...

what i see is that some form of "time allowed exceeded" warning gets
logged, and then there are a bunch of (apparently inter-node) connection
failures.

In looking at the test, there is one place where it uses timeAllowed, and
then tries to make an assertion about the result -- either it will be
'complete', or if incomplete then it will meet some other guarantee...

// Regression check on timeAllowed in combination with sorting, SOLR-14758
// Should see either a complete result or a partial result, but never an NPE
rsp = queryAllowPartialResults("q", "text:d", "fl", "id", "sort", "payload desc", "rows", "20", "timeAllowed", "1");
if (!Objects.equals(Boolean.TRUE, rsp.getHeader().getBooleanArg(SolrQueryResponse.RESPONSE_HEADER_PARTIAL_RESULTS_KEY))) {
  assertFieldValues(rsp.getResults(), id, "11", "13", "12");
}

...but it never seems to fail with an assertion failure: it always seems 
to fail with a "Connection refused" (maybe from an internal retry in 
HttpSolrClient?)


my guess is that 2 things are colliding here:

1) for some/many seeds, this index only has one segment per shard, so
timeAllowed never matters ... but in other seeds, the randomized
indexConfig causes a shard to have multiple segments, and the 1ms
timeAllowed (essentially) always trips.


2) at some point in the (recent) past, something changed in how
timeAllowed works in distributed searching (that may work fine in "real"
solr cloud, but maybe doesn't work as well in "fake"
BaseDistributedSearchTestCase type situations?) so that when the
timeAllowed condition is hit, the request doesn't fail the same way it used
to -- and the "client" (test code) is getting a RemoteSolrException
instead of the partial response it's expecting...



org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest > test FAILED

org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:33555/collection1: org.apache.solr.client.solrj.SolrServerException: Connection refused
    at __randomizedtesting.SeedInfo.seed([5B595A296E3623CE:D30D65F3C0CA4E36]:0)
    at app//org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:668)
    at app//org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:239)
    at app//org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:222)
    at app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:226)
    at app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1004)
    at app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1019)
    at app//org.apache.solr.BaseDistributedSearchTestCase.queryServer(BaseDistributedSearchTestCase.java:623)




: Date: Mon, 22 Nov 2021 07:30:25 -0500
: From: Eric Pugh 
: Reply-To: dev@solr.apache.org
: To: dev@solr.apache.org
: Subject: Can someone explain how -Ptest.seed works (Oh and
: DistributedQueryComponentCustomSortTest keeps failing!)?
: 
: Can someone explain how the -Ptest.seed works, and how I use it to figure out if a bug exists?
: 
: I ran the full set of tests, and had this message:
: ERROR: The following test(s) have failed:
:   - org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test (:solr:core)
: Test output: /Users/epugh/Documents/projects/solr-epugh-2/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.txt
: Reproduce with: gradlew :solr:core:test --tests "org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test" -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5B595A296E3623CE -Ptests.file.encoding=US-ASCII
: 
: 
: I then tried the same tests, but without the -Ptests.seed, and it passed! Does this mean there is a bug or not?
: 
: I did go and check
: 
: http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test
: 
: And it appears that this test has been failing more since July.  
: 
: Eric
: 
: 
: 
: ___
: Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
: Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 

  
: This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly s

Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Dawid Weiss
> FWIW: the test-suite-level "no runnable tests" errors usually crop up for me 
> when specifying tests with wildcard expressions, e.g.: 
> `-Dtests.class='org.apache.solr.search.facet.*'` (with `DebugAgg` being the 
> culprit in that case). I just ignore these "no runnable tests" errors (though 
> perhaps there's a better way?). Not sure if that applies in this case (and 
> come to think of it, I don't recall whether this applies to both `ant` or 
> `gradle` builds), but I thought I'd mention it.

I'm not sure I understand - can you provide a full command that
produces this error?

D.




Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Michael Gibney
>LUCENE-9660. Perhaps it's worth porting over to Solr too.
+1 to this, I had been wondering about that!

The above explanation of the test suite error messages is great, and
perhaps would be nice to have in "contributing" docs too.

FWIW: the test-suite-level "no runnable tests" errors usually crop up for
me when specifying tests with wildcard expressions, e.g.:
`-Dtests.class='org.apache.solr.search.facet.*'` (with `DebugAgg` being the
culprit in that case). I just ignore these "no runnable tests" errors
(though perhaps there's a better way?). Not sure if that applies in this
case (and come to think of it, I don't recall whether this applies to both
`ant` or `gradle` builds), but I thought I'd mention it.

Michael

On Mon, Nov 22, 2021 at 11:59 AM Dawid Weiss wrote:

> > The test.seed will cause the randomized testing to choose the same
> locale/timezone/encoding/etc so that if you're hitting a bug that only
> shows up with swahili or for the kiribati timezone it will reproduce
> reliably.
>
> Correct. Some things will be "derived" from this property at build
> time (in gradle), other things will be derived from this seed at
> runtime. Ideally we'd pick everything at runtime but some things can
> only be done prior to the JVM launching (file.encoding, for example).
>
> > A) it's key to run cleanTest before running that "reproduce with"
> command so that the build doesn't pass thinking the test is up to date (the
> reproduce line probably ought to include this by default).
>
> This has been changed on Lucene side -- see LUCENE-9660. Perhaps it's
> worth porting over to Solr too.
>
> > The thread leak version fails a test called "class method" and the
> reproduce string always fails with a message about junit not finding tests,
> because there isn't any test named .classMethod
>
> I can explain this. JUnit reports historically didn't have any way of
> passing exceptions (failures) for non-test methods. This synthetic
> "class method" is used by the test framework to report exceptions that
> happened in class constructors, class-level before/after hooks and in
> other scopes that are outside of the regular test methods.
>
> These errors are also aggregating: a thread leak error in Solr tests
> is typically a result of some other, prior failure that leaves
> unclosed file descriptors, non-interruptible threads running in the
> background, etc. In an ideal world, a try-with-resources would close
> these resources (such as thread pools), even if an exception is thrown
> in the test body. Things being as they are, you get the source
> exception first, followed by the framework complaining about the
> inability to clean up test class threads (which results in completely
> broken test suite isolation).
>
> Dawid
>
>
>


Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Dawid Weiss
> The test.seed will cause the randomized testing to choose the same 
> locale/timezone/encoding/etc so that if you're hitting a bug that only shows 
> up with swahili or for the kiribati timezone it will reproduce reliably.

Correct. Some things will be "derived" from this property at build
time (in gradle), other things will be derived from this seed at
runtime. Ideally we'd pick everything at runtime but some things can
only be done prior to the JVM launching (file.encoding, for example).
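
To illustrate the idea of "derived from the seed", here is a minimal Java sketch. It is not the randomizedtesting framework's actual algorithm; the class `SeedDemo`, its methods, and the short candidate lists are hypothetical and exist only for this example. It shows that feeding the same hex seed into a pseudo-random generator yields the same locale and time zone choice on every run, which is why a "reproduce with" line can pin down environment-dependent failures.

```java
import java.util.Random;

class SeedDemo {
  // Hypothetical candidate lists; the real framework draws from what the JVM offers.
  static final String[] LOCALES = {"en-US", "sw-KE", "tr-TR", "ja-JP"};
  static final String[] ZONES = {"UTC", "Pacific/Kiritimati", "Europe/Warsaw", "America/New_York"};

  /** Parses a hex seed such as the one printed in a "Reproduce with" line. */
  static long parseSeed(String hex) {
    return Long.parseUnsignedLong(hex, 16);
  }

  /** Picks a locale and a time zone from the seed: same seed, same picks. */
  static String pickEnvironment(String hexSeed) {
    Random random = new Random(parseSeed(hexSeed));
    String locale = LOCALES[random.nextInt(LOCALES.length)];
    String zone = ZONES[random.nextInt(ZONES.length)];
    return locale + " / " + zone;
  }

  public static void main(String[] args) {
    // Running with the same seed twice prints the same environment twice.
    System.out.println(pickEnvironment("5B595A296E3623CE"));
    System.out.println(pickEnvironment("5B595A296E3623CE"));
    // A different seed may (or may not) select a different environment.
    System.out.println(pickEnvironment("5B595A296E3623CC"));
  }
}
```

A failure that depends only on such seed-derived choices reproduces reliably with the same seed; a failure caused by timing or machine load does not, whatever the seed.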

> A) it's key to run cleanTest before running that "reproduce with" command so 
> that the build doesn't pass thinking the test is up to date (the reproduce 
> line probably ought to include this by default).

This has been changed on Lucene side -- see LUCENE-9660. Perhaps it's
worth porting over to Solr too.

> The thread leak version fails a test called "class method" and the reproduce 
> string always fails with a message about junit not finding tests, because 
> there isn't any test named .classMethod

I can explain this. JUnit reports historically didn't have any way of
passing exceptions (failures) for non-test methods. This synthetic
"class method" is used by the test framework to report exceptions that
happened in class constructors, class-level before/after hooks and in
other scopes that are outside of the regular test methods.

These errors are also aggregating: a thread leak error in Solr tests
is typically a result of some other, prior failure that leaves
unclosed file descriptors, non-interruptible threads running in the
background, etc. In an ideal world, a try-with-resources would close
these resources (such as thread pools), even if an exception is thrown
in the test body. Things being as they are, you get the source
exception first, followed by the framework complaining about the
inability to clean up test class threads (which results in completely
broken test suite isolation).
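
As a rough sketch of the try-with-resources point above (not code from Solr's test framework; `PoolCleanupDemo` and `ClosablePool` are hypothetical names made up for this example), a thread pool can be wrapped in an `AutoCloseable` so that it is shut down even when the test body throws:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolCleanupDemo {
  /** Hypothetical wrapper that makes a thread pool usable in try-with-resources. */
  static final class ClosablePool implements AutoCloseable {
    final ExecutorService executor = Executors.newFixedThreadPool(2);

    @Override
    public void close() throws InterruptedException {
      // Stop worker threads and wait briefly so none outlive the test.
      executor.shutdownNow();
      executor.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  /** Simulates a failing test body; returns the pool so callers can inspect it. */
  static ExecutorService runFailingBody() throws InterruptedException {
    ExecutorService used = null;
    try (ClosablePool pool = new ClosablePool()) {
      used = pool.executor;
      pool.executor.submit(() -> { });
      throw new IllegalStateException("simulated test failure");
    } catch (IllegalStateException expected) {
      // The source exception surfaces here; close() has already run by now.
    }
    return used;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = runFailingBody();
    System.out.println("pool shut down: " + pool.isShutdown());
  }
}
```

With this shape only the source exception is reported, since no worker threads are left behind for a leak check to complain about.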

Dawid




Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Eric Pugh
Interesting! So, I changed my test.seed from 5B595A296E3623CE to
5B595A296E3623CC by changing the last letter from E to C, and reran the test,
and it passed!

I'd be curious whether it fails for anyone else.

The failing seed version was complaining about various timeouts etc. in dealing
with the collections. Is this worth opening a ticket about, or is it just
something we live with? (Maybe that is the thrust of your other thread!)


> On Nov 22, 2021, at 10:11 AM, Gus Heck wrote:
> 
> The test.seed will cause the randomized testing to choose the same 
> locale/timezone/encoding/etc so that if you're hitting a bug that only shows 
> up with swahili or for the kiribati timezone it will reproduce reliably.
> 
> This only works if your test is tripped by something influenced by that seed. 
> Otherwise:
> 
> A) it's key to run cleanTest before running that "reproduce with" command so 
> that the build doesn't pass thinking the test is up to date (the reproduce 
> line probably ought to include this by default).
> 
> B) The bad news is what you are probably seeing is flakey test issues. 
> Unfortunately, there is something wrong either with the test harness or the 
> server wherein at least two or maybe more things seem to happen. Test shut 
> down leaks threads relating to zkclient/httpclient, or collections that don't 
> seem to exist when they should. There may be more failure modes than that, 
> but those are two that I've noticed. These issues resolve when the tests are 
> run individually, and probably relate to something being slow due to the high 
> load of the tests. The thread leak version fails a test called "class method" 
> and the reproduce string always fails with a message about junit not finding 
> tests, because there isn't any test named .classMethod
> 
> I recently started a thread titled "unit tests" where I so far have 
> documented how bad this is but have not yet found any solution.
> 
> -Gus
> 
> On Mon, Nov 22, 2021 at 7:30 AM Eric Pugh wrote:
> Can someone explain how the -Ptest.seed works, and how I use it to figure out 
> if a bug exists?   
> 
> I ran the full set of tests, and had this message:
> ERROR: The following test(s) have failed:
>   - 
> org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test
>  (:solr:core)
> Test output: 
> /Users/epugh/Documents/projects/solr-epugh-2/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.txt
> Reproduce with: gradlew :solr:core:test --tests 
> "org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test"
>  -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 
> -Ptests.seed=5B595A296E3623CE -Ptests.file.encoding=US-ASCII
> 
> 
> I then tried the same tests, but without the -Ptests.seed, and it passed!
> Does this mean there is a bug or not?
> 
> I did go and check
> 
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test
>  
> 
> 
> And it appears that this test has been failing more since July.  
> 
> Eric
> 
> 
> 
> 
> 
> 
> -- 
> http://www.needhamsoftware.com  (work)
> http://www.the111shift.com  (play)

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 


This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



Re: Can someone explain how -Ptest.seed works (Oh and DistributedQueryComponentCustomSortTest keeps failing!)?

2021-11-22 Thread Gus Heck
The test.seed will cause the randomized testing to choose the same
locale/timezone/encoding/etc so that if you're hitting a bug that only
shows up with swahili or for the kiribati timezone it will reproduce
reliably.

This only works if your test is tripped by something influenced by that
seed. Otherwise:

A) it's key to run cleanTest before running that "reproduce with" command
so that the build doesn't pass thinking the test is up to date (the
reproduce line probably ought to include this by default).

B) The bad news is what you are probably seeing is flakey test issues.
Unfortunately, there is something wrong either with the test harness or the
server wherein at least two or maybe more things seem to happen. Test shut
down leaks threads relating to zkclient/httpclient, or collections that
don't seem to exist when they should. There may be more failure modes than
that, but those are two that I've noticed. These issues resolve when the
tests are run individually, and probably relate to something being slow due
to the high load of the tests. The thread leak version fails a test called
"class method" and the reproduce string always fails with a message about
junit not finding tests, because there isn't any test named .classMethod

I recently started a thread titled "unit tests" where I so far have
documented how bad this is but have not yet found any solution.

-Gus

On Mon, Nov 22, 2021 at 7:30 AM Eric Pugh wrote:

> Can someone explain how the -Ptest.seed works, and how I use it to figure
> out if a bug exists?
>
> I ran the full set of tests, and had this message:
> ERROR: The following test(s) have failed:
>   -
> org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test
> (:solr:core)
> Test output:
> /Users/epugh/Documents/projects/solr-epugh-2/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.txt
> Reproduce with: gradlew :solr:core:test --tests
> "org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test"
> -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1
> -Ptests.seed=5B595A296E3623CE -Ptests.file.encoding=US-ASCII
>
>
> I then tried the same tests, but without the -Ptests.seed, and it
> passed!  Does this mean there is a bug or not?
>
> I did go and check
>
>
> http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.handler.component.DistributedQueryComponentCustomSortTest.test
>
> And it appears that this test has been failing more since July.
>
> Eric
>
>
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)