[jira] [Commented] (CASSANDRA-11465) dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test

Stefania (JIRA) Thu, 14 Jul 2016 00:50:06 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376518#comment-15376518
 ]


Stefania commented on CASSANDRA-11465:
--------------------------------------

The additional warnings are caused by a problem in the ccm code. Here is an 
extract of the log file from the most recent [failing 
test|http://cassci.datastax.com/job/trunk_dtest/1276/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test/]
 on June 15:

{code}
ERROR [ScheduledTasks:1] 2016-06-15 03:06:41,111 Tracing.java:106 - Cannot use 
class junk for tracing (Unable to find Tracing class 'junk'), ignoring by 
defaulting on normal tracing
WARN  [main] 2016-06-15 03:06:41,119 StartupChecks.java:123 - jemalloc shared 
library could not be preloaded to speed up memory allocations
WARN  [main] 2016-06-15 03:06:41,119 StartupChecks.java:156 - JMX is not 
enabled to receive remote connections. Please see cassandra-env.sh for more 
info.
{code}

PR [#518|https://github.com/pcmanus/ccm/pull/518] should have fixed this 
problem; it was merged on June 24.

--

The missing {{'/127.0.0.1'}} occurs on at least 3.8, 3.9 and trunk; I couldn't 
find any similar failures on 3.0 or 3.7. The [oldest failing 
test|http://cassci.datastax.com/view/trunk/job/trunk_novnode_dtest/409/testReport/junit/cql_tracing_test/TestCqlTracing/tracing_default_impl_test/]
 that I could find is from June 27. Locally, I can reproduce it about once in 
every 10 runs, so I am currently bisecting between cassandra-3.7 and 
cassandra-3.8.

What's happening is that we are missing the tracing events of nodes 2 and 3; we 
only have the coordinator events. Looking at how tracing works, technically 
this can happen if the coordinator is very slow and pre-empted, because 
{{TraceStateImpl}} mutates at CL.ANY and the driver queries 
{{system_traces.events}} at CL.LOCAL_ONE. The coordinator writes a session-end 
entry in {{system_traces.sessions}} just before the response is sent to the 
client; the driver waits for this entry before querying 
{{system_traces.events}}. The replicas do send a tracing mutation request 
before responding to the traced request itself, which in this case is at 
CL.ALL. However, since the coordinator operates in a multi-threaded 
environment, there is no guarantee that the tracing event mutations will be 
applied before the session-end mutation. This 
is a race, but I cannot find anything in the code that suggests that this was 
changed recently. It should only be a problem with a very slow C* coordinator, 
or if the tracing mutations were dropped due to overload, which clearly isn't 
the case. Therefore I am hoping the bisect may shed some light.
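
To make the ordering problem concrete, here is a minimal sketch of the race as 
a toy model. It is not Cassandra or driver code: {{driver_fetch_trace}}, the 
mutation tuples and the node names are all hypothetical and exist only for 
illustration. It assumes the driver reads the events exactly once, as soon as 
the session-end entry is visible.

```python
# Toy model of the tracing race described above. All names here are
# hypothetical; nothing in this block is a Cassandra or driver API.

def driver_fetch_trace(applied_mutations):
    """Return the event sources the driver would see.

    applied_mutations is the order in which the coordinator applied
    tracing mutations. The driver is assumed to read the events once,
    as soon as the session-end entry is visible, so any event applied
    after that entry is missed.
    """
    seen = []
    for kind, source in applied_mutations:
        if kind == "session_end":
            return seen
        seen.append(source)
    return None  # the session-end entry was never written


# Expected order: replica events are applied before the session-end entry.
in_order = [
    ("event", "node1"),
    ("event", "node2"),
    ("event", "node3"),
    ("session_end", "node1"),
]

# Racy order: the session-end entry overtakes the replica events.
raced = [
    ("event", "node1"),
    ("session_end", "node1"),
    ("event", "node2"),
    ("event", "node3"),
]

print(driver_fetch_trace(in_order))  # events from all three nodes
print(driver_fetch_trace(raced))     # coordinator events only
```

In the second ordering the model returns only the coordinator's events, which 
matches the symptom seen in the failing tests.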

> dtest failure in cql_tracing_test.TestCqlTracing.tracing_unknown_impl_test
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11465
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11465
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Stefania
>              Labels: dtest
>
> Failing on the following assert, on trunk only: 
> {{self.assertEqual(len(errs[0]), 1)}}
> Is not failing consistently.
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1087/testReport/cql_tracing_test/TestCqlTracing/tracing_unknown_impl_test
> Failed on CassCI build trunk_dtest #1087



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
