Re: Help requested: o.a.c.cql3.validation unit tests periodically hang

2016-09-01 Thread Benjamin Lerer
o.a.c.cql3.validation contains a lot of tests. Most of them are not real
unit tests, in the sense that they start and use a C* server.
I am not sure how it is done, but if we run multiple tests in parallel we
have to be careful that those tests do not interfere with each other. For
example, some tests use a Java driver to connect to the server. How do we
guarantee that they connect to the expected server, or that the servers
do not try to use the same sockets?
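
To illustrate the socket concern, here is a minimal sketch of one common
way to keep parallel runners off each other's ports: let the OS hand out
unused ports instead of hard-coding them. The class and method names
(FreePortSketch, findFreePorts) are hypothetical, and I am not claiming
this is how our test harness works today.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePortSketch {
    /**
     * Asks the OS for `count` unused TCP ports by binding to port 0.
     * All sockets are held open until every port has been picked, so the
     * returned ports are distinct from each other.
     */
    static int[] findFreePorts(int count) {
        ServerSocket[] sockets = new ServerSocket[count];
        int[] ports = new int[count];
        try {
            try {
                for (int i = 0; i < count; i++) {
                    sockets[i] = new ServerSocket(0); // 0 = any free port
                    ports[i] = sockets[i].getLocalPort();
                }
            } finally {
                // Release the ports so the server under test can bind them.
                for (ServerSocket s : sockets) {
                    if (s != null) {
                        s.close();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ports;
    }

    public static void main(String[] args) {
        // Hypothetical use: one port per listener that a runner starts.
        int[] ports = findFreePorts(2);
        System.out.println("first port for this runner:  " + ports[0]);
        System.out.println("second port for this runner: " + ports[1]);
    }
}
```

Note that this only narrows the window: another process can still grab a
port between the close() here and the moment the server binds it, so it
is a mitigation rather than a guarantee.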

Another problem that I have seen with unit tests is that the order of the
tests is random. Because of that, some tests may interfere with others
within the same test class if the setup and cleanup are not done properly.
It also means that running the tests once might not be enough to identify
that type of problem.
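
A rough sketch of how order-dependent failures could be chased down:
shuffle the test order with a recorded seed, so that a failing order can
be replayed exactly. The class name, method name and test names below
are hypothetical, and this is not a description of how JUnit or our
build picks the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class OrderDependenceSketch {
    /** Returns the test names in a pseudo-random order that is reproducible for a given seed. */
    static List<String> shuffledOrder(List<String> testNames, long seed) {
        List<String> order = new ArrayList<>(testNames); // leave the input untouched
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical test method names, for illustration only.
        List<String> tests = Arrays.asList("testInsert", "testUpdate", "testDelete");
        for (long seed = 0; seed < 5; seed++) {
            // Log the seed with each run so a failing order can be replayed.
            System.out.println("seed " + seed + " -> " + shuffledOrder(tests, seed));
        }
    }
}
```

Running the same class under several seeds, and keeping the seed of any
run that fails, turns "it fails sometimes" into an order that can be
reproduced.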

In my opinion, we should first determine whether we have some interaction
between the tests by setting the number of runners to one for a reasonable
number of runs. If the problem goes away, we can assume that it is caused
by some interaction between the tests that are run in parallel. If that is
the case, we should open a JIRA to modify the tests so that they can run
in parallel.

Now, I am just guessing what the problem can be and somebody else might
have a better idea.

Benjamin


On Thu, Sep 1, 2016 at 4:59 AM, Michael Shuler wrote:

> Another couple of ABORT examples have presented themselves tonight, one
> of which has logs.
>
> Usually we'll see unit tests finish similar to:
>
> 01:57:39 [junit] Testsuite:
> org.apache.cassandra.cql3.statements.PropertyDefinitionsTest
> 01:57:39 [junit] Testsuite:
> org.apache.cassandra.cql3.statements.PropertyDefinitionsTest Tests run:
> 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.362 sec
>
> This trunk_testall job has 4 different tests, none of which finished
> with the "Tests run:..." output.
>
> http://cassci.datastax.com/job/trunk_testall/1158/console
>
> Those log files are:
>
> TEST-org.apache.cassandra.cql3.validation.entities.CountersTest.log
> TEST-org.apache.cassandra.cql3.ViewTest.log
> TEST-org.apache.cassandra.cql3.ViewFilteringTest.log
> TEST-org.apache.cassandra.cql3.validation.entities.CollectionsTest.log
>
> CountersTest appears to have never really started running. ViewTest and
> ViewFilteringTest both logged shutdown entries. The CollectionsTest log
> shows a leak error at the end.
>
> ERROR [Strong-Reference-Leak-Detector:1] 2016-09-01 01:58:43,170 Strong
> self-ref loop detected..
>
> Logs are here:
> http://cassci.datastax.com/job/trunk_testall/1158/artifact/jenkins-trunk_testall-1158_logs.tar.gz
>
> These are completely different concurrently running tests than the last
> ABORT I posted from trunk, which is why I'm asking for help getting to
> the bottom of this. I have yet to find a rhyme or reason to these job
> halts.
>
> The other ABORT was
> http://cassci.datastax.com/job/cassandra-2.2_testall/578/console
>
> This appears to have hung on the cql3.DropKeyspaceCommitLogRecycleTest
> long-test, and failed to fetch logs.
>
> --
> Kind regards,
> Michael
>


Re: Help requested: o.a.c.cql3.validation unit tests periodically hang

2016-08-31 Thread Michael Shuler
Another couple of ABORT examples have presented themselves tonight, one
of which has logs.

Usually we'll see unit tests finish similar to:

01:57:39 [junit] Testsuite:
org.apache.cassandra.cql3.statements.PropertyDefinitionsTest
01:57:39 [junit] Testsuite:
org.apache.cassandra.cql3.statements.PropertyDefinitionsTest Tests run:
2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.362 sec

This trunk_testall job has 4 different tests, none of which finished
with the "Tests run:..." output.

http://cassci.datastax.com/job/trunk_testall/1158/console

Those log files are:

TEST-org.apache.cassandra.cql3.validation.entities.CountersTest.log
TEST-org.apache.cassandra.cql3.ViewTest.log
TEST-org.apache.cassandra.cql3.ViewFilteringTest.log
TEST-org.apache.cassandra.cql3.validation.entities.CollectionsTest.log

CountersTest appears to have never really started running. ViewTest and
ViewFilteringTest both logged shutdown entries. The CollectionsTest log
shows a leak error at the end.

ERROR [Strong-Reference-Leak-Detector:1] 2016-09-01 01:58:43,170 Strong
self-ref loop detected..

Logs are here:
http://cassci.datastax.com/job/trunk_testall/1158/artifact/jenkins-trunk_testall-1158_logs.tar.gz

These are completely different concurrently running tests than the last
ABORT I posted from trunk, which is why I'm asking for help getting to
the bottom of this. I have yet to find a rhyme or reason to these job halts.

The other ABORT was
http://cassci.datastax.com/job/cassandra-2.2_testall/578/console

This appears to have hung on the cql3.DropKeyspaceCommitLogRecycleTest
long-test, and failed to fetch logs.

-- 
Kind regards,
Michael