[ 
https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104880#comment-17104880
 ] 

Yifan Cai edited comment on CASSANDRA-15797 at 5/11/20, 11:17 PM:
------------------------------------------------------------------

PR: https://github.com/apache/cassandra/pull/588
Code: https://github.com/yifan-c/cassandra/tree/CASSANDRA-15797-Flaky-BinLogTest
Test: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra?branch=CASSANDRA-15797-Flaky-BinLogTest

_update: unit and jvm dtest passed in java 8 and java 11_

*Failed tests*
* BinLogTest.testPutAfterStop: there is a chance that at the time of assertion, 
the consumer thread in BinLog has not yet drain the queue, so that assertion 
get the last NO_OP object and fail. The behavior is expected. The fix to the 
test is to assert none of the records get from the queue is the one inserted 
after stopping the binLog.
* BinLogTest.testBinLogStartStop: there is no barrier in the test to block 
assertion until records in the queue are consumed. Added the coundownlatch as 
the barrier. 

*Chronicle reference counter history trace*
The trace indicates the last release operation sees the reference counter is 
already 0, so that it prints the history. It is caused by the resources has 
already been released by the try-with-resources statement. And according to 
StoreComponentReferenceHandler#processWireQueue, *the exception can be ignored 
as the resources has already been released*. In this case, all test that calls 
readBinLogRecords can see the reference counter history.
Why the history trace is not always printed every time? Because test already 
ended and there is no time to print it. If adding a sleep time (5 seconds) at 
the end of the test, it is guaranteed to print reference count history. 


was (Author: yifanc):
PR: https://github.com/apache/cassandra/pull/588
Code: https://github.com/yifan-c/cassandra/tree/CASSANDRA-15797-Flaky-BinLogTest
Test: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra?branch=CASSANDRA-15797-Flaky-BinLogTest

*Failed tests*
* BinLogTest.testPutAfterStop: there is a chance that at the time of assertion, 
the consumer thread in BinLog has not yet drain the queue, so that assertion 
get the last NO_OP object and fail. The behavior is expected. The fix to the 
test is to assert none of the records get from the queue is the one inserted 
after stopping the binLog.
* BinLogTest.testBinLogStartStop: there is no barrier in the test to block 
assertion until records in the queue are consumed. Added the coundownlatch as 
the barrier. 

*Chronicle reference counter history trace*
The trace indicates the last release operation sees the reference counter is 
already 0, so that it prints the history. It is caused by the resources has 
already been released by the try-with-resources statement. And according to 
StoreComponentReferenceHandler#processWireQueue, *the exception can be ignored 
as the resources has already been released*. In this case, all test that calls 
readBinLogRecords can see the reference counter history.
Why the history trace is not always printed every time? Because test already 
ended and there is no time to print it. If adding a sleep time (5 seconds) at 
the end of the test, it is guaranteed to print reference count history. 

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-15797
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Jon Meredith
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> An internal CI system is failing BinLogTest somewhat frequently under JDK11.  
> Configuration was recently changed to reduce the number of cores the tests 
> run with, however it is reproducible on an 8 core laptop.
> {code}
> [junit-timeout] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC 
> was deprecated in version 9.0 and will likely be removed in a future release.
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [Junit-timeout] WARNING: An illegal reflective access operation has occurred
> [junit-timeout] WARNING: Illegal reflective access by 
> net.openhft.chronicle.core.Jvm (file:/.../lib/chronicle-core-1.16.4.jar) to 
> field java.nio.Bits.RESERVED_MEMORY
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of 
> net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of 
> further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a 
> future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.895 sec
> [junit-timeout]
> [junit-timeout] Testcase: 
> testPutAfterStop(org.apache.cassandra.utils.binlog.BinLogTest):     FAILED
> [junit-timeout] expected:<null> but 
> was:<org.apache.cassandra.utils.binlog.BinLog$1@5ed190be>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<null> but 
> was:<org.apache.cassandra.utils.binlog.BinLog$1@5ed190be>
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testPutAfterStop(BinLogTest.java:431)
> [junit-timeout]         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit-timeout]         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit-timeout]         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> There's also a different failure under JDK8
> {code}
> junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.273 sec
> [junit-timeout]
> [junit-timeout] Testcase: 
> testBinLogStartStop(org.apache.cassandra.utils.binlog.BinLogTest):  FAILED
> [junit-timeout] expected:<2> but was:<0>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<2> but was:<0>
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testBinLogStartStop(BinLogTest.java:172)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> Reproducer
> {code}
> PASSED=0; time  { while ant testclasslist -Dtest.classlistprefix=unit 
> -Dtest.classlistfile=<(echo 
> org/apache/cassandra/utils/binlog/BinLogTest.java); do PASSED=$((PASSED+1)); 
> echo PASSED $PASSED; done }; echo FAILED after $PASSED runs.
> {code}
> In the last four attempts it has taken 31, 38, 27 and 10 rounds respectively 
> under JDK11 and took 51 under JDK8 (about 15 minutes).
> I have not tried running in a cpu-limited container or anything like that yet.
> Additionally, this went past in the logs a few times (under JDK11).  No idea 
> if it's just an artifact of weird test setup, or something more serious.
> {code}
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of 
> net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of 
> further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a 
> future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.839 sec
> [junit-timeout]
> [junit-timeout] java.lang.Throwable: 1e53135d-main creation ref-count=1
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.newRefCountHistory(ReferenceCounter.java:45)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.<init>(ReferenceCounter.java:35)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.onReleased(ReferenceCounter.java:40)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.AbstractBytes.<init>(AbstractBytes.java:38)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.MappedBytes.<init>(MappedBytes.java:59)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.MappedBytes.<init>(MappedBytes.java:55)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.MappedBytes.mappedBytes(MappedBytes.java:113)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.bytes(SingleChronicleQueueStore.java:358)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.resetWires(SingleChronicleQueueExcerpts.java:1702)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.cycle(SingleChronicleQueueExcerpts.java:1907)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.toStart(SingleChronicleQueueExcerpts.java:1614)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.createTailer(SingleChronicleQueue.java:401)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:495)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Reserve ref-count=2
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.recordResevation(ReferenceCounter.java:66)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.reserve(ReferenceCounter.java:59)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.AbstractBytes.reserve(AbstractBytes.java:464)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounted.tryReserve(ReferenceCounted.java:81)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.writeEOF(SingleChronicleQueueStore.java:461)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.checkMoveToNextCycle(SingleChronicleQueueExcerpts.java:1437)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycleNone(SingleChronicleQueueExcerpts.java:1416)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycle(SingleChronicleQueueExcerpts.java:1356)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:1168)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readDocument(SingleChronicleQueueExcerpts.java:1096)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:498)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Release ref-count=1
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:79)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.writeEOF(SingleChronicleQueueStore.java:463)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.checkMoveToNextCycle(SingleChronicleQueueExcerpts.java:1437)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycleNone(SingleChronicleQueueExcerpts.java:1416)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.inACycle(SingleChronicleQueueExcerpts.java:1356)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readingDocument(SingleChronicleQueueExcerpts.java:1168)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.readDocument(SingleChronicleQueueExcerpts.java:1096)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:498)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 1e53135d-main Release ref-count=0
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:79)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreTailer.close(SingleChronicleQueueExcerpts.java:1117)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.lambda$close$4(SingleChronicleQueue.java:537)
> [junit-timeout]         at 
> java.base/java.util.WeakHashMap.forEach(WeakHashMap.java:1026)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.close(SingleChronicleQueue.java:537)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.readBinLogRecords(BinLogTest.java:506)
> [junit-timeout]         at 
> org.apache.cassandra.utils.binlog.BinLogTest.testCleanupOnOversize(BinLogTest.java:407)
> ...
> [junit-timeout] java.lang.Throwable: 
> 1e53135d-main/queue-thread-local-cleaner-daemon Release ref-count=0
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.recordRelease(ReferenceCounter.java:88)
> [junit-timeout]         at 
> net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:74)
> [junit-timeout]         at 
> net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:469)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.StoreComponentReferenceHandler.processWireQueue(StoreComponentReferenceHandler.java:74)
> [junit-timeout]         at 
> net.openhft.chronicle.queue.impl.single.StoreComponentReferenceHandler.lambda$static$0(StoreComponentReferenceHandler.java:43)
> [junit-timeout]         at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> [junit-timeout]         at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> [junit-timeout]         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [junit-timeout]         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [junit-timeout]         at java.base/java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to