[ 
https://issues.apache.org/jira/browse/CASSANDRA-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728818#comment-14728818
 ] 

Benedict commented on CASSANDRA-10109:
--------------------------------------

bq. Not for the final record if you recall, we carry on in this case

Right, but we apply more stringent checks, i.e. that the state on disk matches 
our log file exactly. If it does not, we abort. For listing, however, we appear 
not to apply this logic, and instead filter out the problematic records.

The problem with eliminating this extra step, as you propose, is that a lister 
could quite reasonably encounter corruption twice in a row just due to tearing 
of writes (admittedly this is unlikely, but we're deployed widely, so it will 
happen eventually). In that case, without the extra logic, it would simply 
fail under completely healthy operation. The only reason we can safely stop 
deterministically after just two attempts is that we know that (under normal 
operation) if the disk state mismatches then the transaction has been 
committed, so we should either read a completed transaction file or no file at 
all.

Does that sound like a reasonable assessment to you?

In which case I think I would prefer to just apply the behaviour we decided on 
for cleanup to listers as well (i.e. ignore the txn log entirely in the event 
it does not {{validate}} - but leave that validation logic as is).
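
As a rough illustration of that preference (a sketch only, not the actual 
lifecycle code: {{TxnLog}}, {{temporaryFiles}} and {{listLive}} are hypothetical 
names, and it assumes that "ignoring" the log means returning the directory 
contents unfiltered):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TxnLogListerSketch {
    /** Hypothetical stand-in for a parsed transaction log. */
    interface TxnLog {
        /** True if the on-disk state matches the log exactly. */
        boolean validate();

        /** Files the log would hide from a lister (assumed for illustration). */
        Set<String> temporaryFiles();
    }

    /**
     * Lists files for a lister. If the log is missing or does not validate,
     * it is ignored entirely and no filtering is applied (assumption).
     */
    public static List<String> listLive(List<String> onDisk, TxnLog log) {
        if (log == null || !log.validate())
            return new ArrayList<>(onDisk); // ignore the txn log entirely

        List<String> live = new ArrayList<>();
        for (String file : onDisk)
            if (!log.temporaryFiles().contains(file))
                live.add(file);
        return live;
    }
}
```

The validation step itself is left opaque here, since the suggestion is to 
leave that logic as is and only change how a lister reacts to its result.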

bq. Should we have a dedicated package inside lifecycle?

Missed responding to this: I'm easy. It could make the API cleaner still, but 
it's also not essential.

> Windows dtest 3.0: ttl_test.py failures
> ---------------------------------------
>
>                 Key: CASSANDRA-10109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10109
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joshua McKenzie
>            Assignee: Stefania
>              Labels: Windows
>             Fix For: 3.0.0 rc1
>
>
> ttl_test.py:TestTTL.update_column_ttl_with_default_ttl_test2
> ttl_test.py:TestTTL.update_multiple_columns_ttl_test
> ttl_test.py:TestTTL.update_single_column_ttl_test
> Errors locally are different from those on CI yesterday. Yesterday on CI we had 
> timeouts and general node hangs. Today on all 3 tests when run locally I see:
> {noformat}
> Traceback (most recent call last):
>   File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown
>     raise AssertionError('Unexpected error in %s node log: %s' % (node.name, 
> errors))
> AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 
> 16:53:43,120 NoSpamLogger.java:97 - This platform does not support atomic 
> directory streams (SecureDirectoryStream); race conditions when loading 
> sstable files could occurr']
> {noformat}
> This traces back to the commit for CASSANDRA-7066 today by [~Stefania] and 
> [~benedict].  Stefania - care to take this ticket and also look further into 
> whether or not we're going to have issues with 7066 on Windows? That error 
> message certainly *sounds* like it's not a good thing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
