[ https://issues.apache.org/jira/browse/CASSANDRA-14047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent White updated CASSANDRA-14047:
--------------------------------------
    Attachment: trunk-patch_passes_test-debug.log
                3_11-debug.log
                trunk-debug.log
                trunk-debug.log-2

On 3.11 I still see the {{UnknownColumnFamilyException}}, which does cause the test to fail because it triggers the "Unexpected error in log" assertion error when tearing down the test. Strangely, on trunk with my patch the test passes and doesn't hit this, even though the logs still contain the {{UnknownColumnFamilyException}} (I'm not sure if that's related to C*-version-specific config in the dtests or something else). So the netty issue is unrelated to the flakiness of this test; I'm not sure if it should have its own ticket. I've attached a few sets of debug logs that demonstrate the various behaviours with/without netty and with/without my patch from the previous comment.

In regard to the test itself: it appears that the reads triggering the {{UnknownColumnFamilyException}} are actually from the initialisation of CassandraRoleManager, since they are for {{system_auth.roles}} (I believe {{hasExistingRoles()}} in {{setupDefaultRole()}}). I'm not exactly sure what the best way to resolve this is. The error isn't an issue for the role manager itself, as it will simply retry later, and it doesn't affect the tests apart from triggering the "unexpected error in log" check. For the tests I guess we could leave a gap between starting nodes, but it's probably more correct to just ignore these errors. I've tested that [https://github.com/vincewhite/cassandra-dtest/commit/7e48704713123a253a914802975f7163474ede9b] resolves the failures, and I assume it's probably safe to ignore this error for all of the tests in consistency_test, but I haven't looked into that at this stage. Also, these tests don't do anything fancy in how they start the cluster; they just use the normal {{cluster.start(wait_for_binary_proto=True, wait_other_notice=True)}} call, so I guess this could cause random failures in a lot of tests.
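For illustration, the "just ignore these errors" approach amounts to filtering the known-benign exception out of the log lines before the teardown check asserts on unexpected errors. The helper below is a standalone sketch of that idea, not the actual cassandra-dtest API; the function name and log lines are made up for the example.

```python
import re

# Sketch only: treat UnknownColumnFamilyException entries as benign when
# scanning node logs, since CassandraRoleManager simply retries its
# system_auth.roles read later. Pattern list mirrors what a dtest would
# register as an ignored log pattern.
IGNORED_PATTERNS = [re.compile(r'UnknownColumnFamilyException')]

def unexpected_errors(log_lines, ignored=IGNORED_PATTERNS):
    """Return ERROR log lines that do not match any ignored pattern."""
    return [line for line in log_lines
            if 'ERROR' in line
            and not any(p.search(line) for p in ignored)]

# Example (made-up log lines):
logs = [
    "ERROR [MessagingService-Incoming-/127.0.0.1] 2017-11-15 11:21:40,001 "
    "UnknownColumnFamilyException: Couldn't find table with id 5bc52802",
    "ERROR [main] 2017-11-15 11:21:41,002 SomeOtherException: boom",
]
# Only the second line survives the filter.
remaining = unexpected_errors(logs)
```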
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails:
> Missing: ['127.0.0.3.* now UP']:
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14047
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14047
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Michael Kjellman
>            Assignee: Vincent White
>         Attachments: 3_11-debug.log, trunk-debug.log, trunk-debug.log-2, trunk-patch_passes_test-debug.log
>
> test_simple_strategy_each_quorum_users - consistency_test.TestAccuracy fails:
> Missing: ['127.0.0.3.* now UP']:
> 15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:
> INFO [main] 2017-11-15 11:21:32,452 YamlConfigura.....
> See system.log for remainder
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS
> dtest: DEBUG: Done setting configuration options:
> { 'initial_token': None,
>   'num_tokens': '32',
>   'phi_convict_threshold': 5,
>   'range_request_timeout_in_ms': 10000,
>   'read_request_timeout_in_ms': 10000,
>   'request_timeout_in_ms': 10000,
>   'truncate_request_timeout_in_ms': 10000,
>   'write_request_timeout_in_ms': 10000}
> dtest: DEBUG: Testing single dc, users, each quorum reads
> --------------------- >> end captured logging << ---------------------
> File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>   testMethod()
> File "/home/cassandra/cassandra-dtest/tools/decorators.py", line 48, in wrapped
>   f(obj)
> File "/home/cassandra/cassandra-dtest/consistency_test.py", line 621, in test_simple_strategy_each_quorum_users
>   self._run_test_function_in_parallel(TestAccuracy.Validation.validate_users, [self.nodes], [self.rf], combinations)
> File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in _run_test_function_in_parallel
>   self._start_cluster(save_sessions=True, requires_local_reads=requires_local_reads)
> File "/home/cassandra/cassandra-dtest/consistency_test.py", line 141, in _start_cluster
>   cluster.start(wait_for_binary_proto=True, wait_other_notice=True)
> File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", line 428, in start
>   node.watch_log_for_alive(other_node, from_mark=mark)
> File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 520, in watch_log_for_alive
>   self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename)
> File "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/node.py", line 488, in watch_log_for
>   raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + reads[:50] + ".....\nSee {} for remainder".format(filename))
> "15 Nov 2017 11:23:37 [node1] Missing: ['127.0.0.3.* now UP']:\nINFO [main] 2017-11-15 11:21:32,452 YamlConfigura.....\nSee system.log for remainder\n-------------------- >> begin captured logging << --------------------\ndtest: DEBUG: cluster ccm directory: /tmp/dtest-v3VgyS\ndtest: DEBUG: Done setting configuration options:\n{ 'initial_token': None,\n 'num_tokens': '32',\n 'phi_convict_threshold': 5,\n 'range_request_timeout_in_ms': 10000,\n 'read_request_timeout_in_ms': 10000,\n 'request_timeout_in_ms': 10000,\n 'truncate_request_timeout_in_ms': 10000,\n 'write_request_timeout_in_ms': 10000}\ndtest: DEBUG: Testing single dc, users, each quorum reads\n--------------------- >> end captured logging << ---------------------"

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
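The traceback above ends in ccmlib's {{watch_log_for}}: it polls a node's log from a saved mark until every expected regex (here {{'127.0.0.3.* now UP'}}) has appeared, and raises {{TimeoutError}} otherwise. A simplified, self-contained sketch of that mechanism (not the actual ccmlib code; the function signature is illustrative):

```python
import re
import time

def watch_log_for(path, patterns, from_mark=0, timeout=10.0, poll=0.1):
    """Poll the log file at `path` from byte offset `from_mark` until every
    regex in `patterns` has matched, raising TimeoutError otherwise.
    Simplified stand-in for ccmlib's node.watch_log_for."""
    tofind = [re.compile(p) for p in patterns]
    deadline = time.time() + timeout
    while tofind:
        with open(path) as f:
            f.seek(from_mark)
            text = f.read()
        # Drop every pattern that has now appeared in the log.
        tofind = [p for p in tofind if not p.search(text)]
        if tofind and time.time() > deadline:
            raise TimeoutError(
                "Missing: %s" % [p.pattern for p in tofind])
        if tofind:
            time.sleep(poll)
```

When a node never gossips "now UP" within the timeout (the flaky case in this ticket), the raised message is exactly the {{Missing: ['127.0.0.3.* now UP']}} failure shown above.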