[ https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376518#comment-15376518 ]
Stefania commented on CASSANDRA-11465:
--------------------------------------

The additional warnings are a problem with the ccm code; here is an extract of the log file taken from the last seen [failing test|http://cassci.datastax.com/job/trunk_dtest/1276/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test/] on June 15:

{code}
ERROR [ScheduledTasks:1] 2016-06-15 03:06:41,111 Tracing.java:106 - Cannot use class junk for tracing (Unable to find Tracing class 'junk'), ignoring by defaulting on normal tracing
WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:123 - jemalloc shared library could not be preloaded to speed up memory allocations
WARN [main] 2016-06-15 03:06:41,119 StartupChecks.java:156 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
{code}

PR [#518|https://github.com/pcmanus/ccm/pull/518] should have fixed this problem; it was merged on June 24.

--

The missing {{'/127.0.0.1'}} happens at least on 3.8, 3.9 and trunk; I couldn't see any similar failures on 3.0 or 3.7. It seems the [oldest test|http://cassci.datastax.com/view/trunk/job/trunk_novnode_dtest/409/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_default_impl_test/] that failed is from June 27. Locally, I can reproduce it about once every 10 runs, so I am currently trying to bisect between cassandra-3.7 and cassandra-3.8.

What's happening is that we are missing the tracing events of nodes 2 and 3; we only have the coordinator events.

Looking at how tracing works, technically this can happen if the coordinator is very slow and pre-empted, because {{TraceStateImpl}} mutates at CL.ANY and the driver queries {{system_traces.events}} at CL.LOCAL_ONE. The coordinator writes a session-end entry in {{system_traces.sessions}} just before the response is sent to the client; the driver waits for this entry before querying {{system_traces.events}}.
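As a rough illustration of the sequence above, here is a toy Python model. It is not Cassandra or driver code: {{ToyTraceStore}}, {{fetch_trace_sources}}, the node names and the write tuples are all hypothetical, and the in-memory dictionaries only stand in for {{system_traces.sessions}} and {{system_traces.events}}. It applies writes in a given order and reads the events as soon as the session-end entry is visible, which is roughly what the driver does.

```python
from collections import defaultdict


class ToyTraceStore:
    """Hypothetical in-memory stand-in for the two system_traces tables."""

    def __init__(self):
        self.sessions = {}               # session_id -> duration, set by the session-end write
        self.events = defaultdict(list)  # session_id -> [(source, activity), ...]

    def apply(self, write):
        kind, session_id, payload = write
        if kind == "event":
            self.events[session_id].append(payload)
        elif kind == "session_end":
            self.sessions[session_id] = payload
        else:
            raise ValueError(f"unknown write kind: {kind}")


def fetch_trace_sources(writes, session_id):
    """Apply writes in order and read events once the session-end entry is visible.

    Returns the sorted list of sources that had events at that moment,
    or None if the session-end entry never showed up.
    """
    store = ToyTraceStore()
    for write in writes:
        store.apply(write)
        if store.sessions.get(session_id) is not None:
            # The client stops waiting here and queries the events table.
            return sorted({source for source, _ in store.events[session_id]})
    return None


SESSION = "s1"
coordinator_event = ("event", SESSION, ("node1", "coordinator activity"))
replica_events = [
    ("event", SESSION, ("node2", "replica activity")),
    ("event", SESSION, ("node3", "replica activity")),
]
session_end = ("session_end", SESSION, 1234)  # payload is an arbitrary duration

# Replica trace mutations applied before the session-end entry.
in_order = [coordinator_event, *replica_events, session_end]
# Replica trace mutations applied after the session-end entry.
late_replicas = [coordinator_event, session_end, *replica_events]

complete = fetch_trace_sources(in_order, SESSION)
partial = fetch_trace_sources(late_replicas, SESSION)
```

In this model, {{partial}} contains only the coordinator's source whenever the replica writes land after the session-end write, which matches the symptom of seeing coordinator events only.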
Although the replicas send a tracing mutation request before sending the response to the actual request with tracing enabled, which in this case is at CL.ALL, the coordinator is operating in a multi-threaded environment, so there is no guarantee that the tracing event mutations will be inserted before the session-end mutation. This is a race, but I cannot find anything in the code that suggests that this was changed recently. It should only be a problem with a very slow C* coordinator, or if the tracing mutations were dropped due to overload, which clearly isn't the case. Therefore I am hoping the bisect may shed some light.

> dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-11465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11465
> Project: Cassandra
> Issue Type: Bug
> Reporter: Philip Thompson
> Assignee: Stefania
> Labels: dtest
>
> Failing on the following assert, on trunk only:
> {{self.assertEqual(len(errs[0]), 1)}}
> Is not failing consistently.
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test
> Failed on CassCI build trunk_dtest #1087

--
This message was sent by Atlassian JIRA (v6.3.4#6332)