[ 
https://issues.apache.org/jira/browse/CASSANDRA-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-3520:
----------------------------------------

    Attachment: 3520.patch

So, the whole problem is due to our handling of non-durable writes in the 
shutdown hook. For those, we flush the CFS as part of shutdown. However, flush 
tries to grab a commitlog context, which blocks because the commit log has been 
shut down *before* all this (and for some reason, executor.submit() doesn't 
throw any exception if the executor is shut down).
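
To illustrate the hang, here is a minimal sketch. It is not Cassandra's 
actual code: QueueExecutor is a hypothetical hand-rolled executor, assumed 
here only to show how a queue-backed submit() can accept work after its 
worker has stopped, whereas a standard java.util.concurrent executor rejects 
it. The 200 ms timeout stands in for the unbounded wait seen in the real hang.

```java
import java.util.concurrent.*;

final class ShutdownSubmitSketch {
    /** Hypothetical queue-backed executor: one worker drains a queue until stopped. */
    static final class QueueExecutor {
        private final BlockingQueue<FutureTask<?>> queue = new LinkedBlockingQueue<>();
        private volatile boolean running = true;
        private final Thread worker = new Thread(() -> {
            try {
                while (running) {
                    FutureTask<?> task = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (task != null) task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        QueueExecutor() {
            worker.setDaemon(true);
            worker.start();
        }

        <T> Future<T> submit(Callable<T> callable) {
            FutureTask<T> task = new FutureTask<>(callable);
            queue.add(task); // note: no check that the worker is still running
            return task;
        }

        void shutdown() {
            running = false;
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** A standard JDK executor refuses work once it has been shut down. */
    static boolean standardExecutorRejects() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        try {
            executor.submit(() -> 1);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    /** The queue-backed executor accepts the task, but nothing ever runs it. */
    static boolean queueExecutorNeverCompletes() {
        QueueExecutor executor = new QueueExecutor();
        executor.shutdown();
        Future<Integer> future = executor.submit(() -> 1);
        try {
            future.get(200, TimeUnit.MILLISECONDS); // bounded wait instead of a real hang
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("standard executor rejects: " + standardExecutorRejects());
        System.out.println("queue executor never completes: " + queueExecutorNeverCompletes());
    }
}
```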

The reason r1185960 triggered this is that it actually fixed a bug: before 
that commit, adding a new column family to a keyspace would reset the 
durableWrites option to true, which hid the problem as far as CliTest is 
concerned.

One simple solution is to move the commit log shutdown after the flushes of the 
non-durable CFs (which 1.0 does, and that's why it isn't affected). Truth is, 
it doesn't feel like the right fix, in that non-durable CFs shouldn't query the 
commit log at all, even during flushes. However, changing that could leave 
some CL segments retained forever when upgrading a keyspace from non-durable 
to durable if we're not careful. So overall, just pushing the CL shutdown down 
in the shutdown hook to match 1.0 seems good enough, at least for 0.8. 
Attaching a patch to do just that. We can then look at making things cleaner 
with respect to flushing non-durable CFS in 1.0/trunk if we so wish.
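
The effect of the ordering can be sketched as follows. This is an 
illustration under stated assumptions, not the attached patch: FakeCommitLog, 
flush() and runShutdown() are hypothetical names, and a ThreadPoolExecutor 
with DiscardPolicy is used only as a stand-in for an executor that silently 
drops work submitted after shutdown.

```java
import java.util.concurrent.*;

final class ShutdownOrderSketch {
    /** Stand-in commit log: tasks submitted after shutdown are silently dropped. */
    static final class FakeCommitLog {
        private final ExecutorService executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(),
                new ThreadPoolExecutor.DiscardPolicy());

        Future<Long> getContext() {
            return executor.submit(() -> 42L); // placeholder context value
        }

        void shutdown() {
            executor.shutdown();
        }
    }

    /** Stand-in flush: it needs a commit log context before it can proceed. */
    static boolean flush(FakeCommitLog log) {
        try {
            log.getContext().get(200, TimeUnit.MILLISECONDS); // bounded wait for the sketch
            return true;
        } catch (TimeoutException e) {
            return false; // the real code would block here indefinitely
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    /** Runs the two shutdown steps in the given order; returns whether the flush completed. */
    static boolean runShutdown(boolean commitLogFirst) {
        FakeCommitLog log = new FakeCommitLog();
        boolean flushed;
        if (commitLogFirst) {
            log.shutdown();       // problematic order: commit log goes away first
            flushed = flush(log);
        } else {
            flushed = flush(log); // order matching 1.0: flush, then stop the commit log
            log.shutdown();
        }
        return flushed;
    }

    public static void main(String[] args) {
        System.out.println("commit log shut down first, flush completed: " + runShutdown(true));
        System.out.println("flush first, flush completed: " + runShutdown(false));
    }
}
```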

Note that while having a non-durable system keyspace was not directly the 
problem, I think it was a fairly bad idea, and we should leave it durable 
for 0.8 and turn it back to durable for 1.0 and trunk.

                
> Unit test are hanging on 0.8 branch
> -----------------------------------
>
>                 Key: CASSANDRA-3520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3520
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tests
>         Environment: Linux
>            Reporter: Sylvain Lebresne
>             Fix For: 0.8.8
>
>         Attachments: 0001-Use-durable-writes-for-system-ks.patch, 3520.patch
>
>
> As the summary says, the unit tests on current 0.8 are just hanging after 
> CliTest (it's apparently not the case on Windows, but it is on Linux and 
> Mac OS X).
> Not sure what's going on, but what I can tell is that it's enough to run 
> CliTest to have it hang after the test successfully passes (i.e., JUnit just 
> waits indefinitely for the VM to exit). Even weirder, it seems that it is the 
> counter increments in CliTest that make it hang: if you comment out those 
> statements, it stops hanging. However, nothing seems to go wrong with the 
> increment itself (the test passes) and it doesn't even trigger anything 
> (typically sendToHintedEndpoint is not called because there is only one node).
> Looking at the stack when the VM is hanging (attached), there is nothing 
> specific to counters in there, and nothing that struck me as odd (but I could 
> be missing something). There are a few thrift threads running (CASSANDRA-3335), 
> but why that would only be a problem for the tests in that situation is a 
> mystery to me.
