[ https://issues.apache.org/jira/browse/CASSANDRA-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-3520:
----------------------------------------
    Attachment: 3520.patch

So, the whole problem is due to our handling of non-durable writes in the shutdown hook. For those, we flush the CFS as part of shutdown. However, flush tries to grab a commitlog context, which blocks because the commit log has been shut down *before* all this (and for some reason, executor.submit() doesn't throw any exception if the executor is shut down).

The reason why r1185960 was triggering this is that it actually fixed a bug: before this commit, adding a new column family to a keyspace would reset the durableWrites option to true, hence hiding the bug as far as CliTest is concerned.

One simple solution is to move the commit log shutdown after the flushes of the non-durable CFs (which 1.0 does, and that's why it isn't affected). Truth is, it doesn't feel like the right fix, in that non-durable CFs shouldn't query the commit log at all, even during flushes. However, changing that introduces the possibility of some CL segments being retained forever when upgrading a keyspace from non-durable to durable if we're not careful.

So overall, just pushing the CL shutdown down in the shutdown hook to match 1.0 seems good enough, at least for 0.8. Attaching a patch to do just that. We can then look at making things cleaner with respect to flushing non-durable CFS in 1.0/trunk if we so wish.

Note that while having a non-durable system keyspace was not directly the problem, I think it was a fairly bad idea, and we should leave it durable for 0.8 and turn it back to durable for 1.0 and trunk.
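To illustrate the ordering issue described above, here is a minimal sketch in plain Java. It is not the Cassandra code: the class `ShutdownOrderSketch` and the methods `newCommitLogExecutor` and `flush` are hypothetical stand-ins. It also assumes that the executor silently drops rejected tasks (here via `ThreadPoolExecutor.DiscardPolicy`), which is one way submit() can return without throwing after shutdown; the comment above does not say what the actual cause is. A timeout is used so that the sketch reports the hang instead of reproducing it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ShutdownOrderSketch {
    // Hypothetical stand-in for the commit log executor. ASSUMPTION: rejected
    // tasks are dropped silently, so submit() does not throw after shutdown().
    static ThreadPoolExecutor newCommitLogExecutor() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(),
                new ThreadPoolExecutor.DiscardPolicy());
    }

    // Hypothetical stand-in for a flush: it asks the commit log executor for a
    // context and waits for the answer. Returns false if no answer arrives in
    // time; with a plain get() and no timeout, that case would block forever.
    static boolean flush(ExecutorService commitLog) {
        Future<Long> context = commitLog.submit(() -> 42L);
        try {
            context.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Order that hangs: commit log shut down before the flush.
    static boolean shutdownCommitLogFirst() {
        ThreadPoolExecutor commitLog = newCommitLogExecutor();
        commitLog.shutdown();
        return flush(commitLog);
    }

    // Order proposed in the comment: flush first, then shut the commit log down.
    static boolean flushFirst() {
        ThreadPoolExecutor commitLog = newCommitLogExecutor();
        try {
            return flush(commitLog);
        } finally {
            commitLog.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("commit log shut down first, flush completed: "
                + shutdownCommitLogFirst());
        System.out.println("flush first, flush completed: " + flushFirst());
    }
}
```

Under these assumptions, the first ordering never gets a context back because the submitted task is discarded and its future is never completed, while the second ordering completes normally.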
> Unit test are hanging on 0.8 branch
> -----------------------------------
>
>                 Key: CASSANDRA-3520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3520
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tests
>         Environment: Linux
>            Reporter: Sylvain Lebresne
>             Fix For: 0.8.8
>
>         Attachments: 0001-Use-durable-writes-for-system-ks.patch, 3520.patch
>
>
> As the summary says, the unit tests on the current 0.8 branch are just
> hanging after CliTest (it's apparently not the case on Windows, but it is on
> Linux and Mac OS X).
> Not sure what's going on, but what I can tell is that it's enough to run
> CliTest to have it hang after the test successfully passes (i.e., JUnit just
> waits indefinitely for the VM to exit). Even weirder, it seems that it is the
> counter increment in CliTest that makes it hang: if you comment out those
> statements, it stops hanging. However, nothing seems to go wrong with the
> increment itself (the test passes) and it doesn't even trigger anything
> (typically sendToHintedEndpoint is not called because there is only one node).
> Looking at the stack when the VM is hanging (attached), there is nothing
> specific to counters in there, and nothing that strikes me as odd (but I could
> be missing something). There are a few Thrift threads running
> (CASSANDRA-3335), but why that would only be a problem for the tests in that
> situation is a mystery to me.