[jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state

Joshua McKenzie (JIRA) Mon, 23 Mar 2015 14:44:09 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376703#comment-14376703
 ]


Joshua McKenzie commented on CASSANDRA-8984:
--------------------------------------------

All unit tests are timing out on Windows - attaching a sample of one to this 
ticket. I've reviewed most of the patch and will leave feedback for what I have 
thus far - I still need to digest the changes to SSTableWriter as they're 
fairly extensive.

Comments:
* I'm not too keen on prepareToCommit being part of the usage pattern without 
actually being present in the interface but I can see why you went that route. 
I don't have any alternative suggestions unfortunately.
* The passing around of Throwables to merge deep in the stack (Transactional, 
SafeMemory*, Ref* etc) is a little cludgy and has some pretty deeply nested 
tag-along variables. Not sure why we can't just return a Throwable up that 
stack and merge at the top level when we know we might need to merge rather 
than passing the Throwable all the way down from the top to merge at the 
bottom...?
* The StateManager abstraction and the Transactional Interface seem to be a 
poor fit to several of the implementers. Having 2/3 of the methods resolve to 
the noOpTransition seems like we're conflating the idea of "classes that have 
resources that need to be cleaned up" with "classes that have a set of state 
transitions they go through and a logical abort process".
* With regard to the StateManager requiring a beginTransition / 
rejectedTransition combo and specific completeTransition - we're trading one 
set of manually managed states for another. Still error-prone and has quite a 
bit of duplication where it's implemented (rather than noOpTransition)
* autoclose seems to have some redundant assignment - we switch on state and if 
it's COMMITTED, we set state = COMMITTED, ABORTED we set it to ABORTED.
* Consider renaming StateManager.autoclose(). Something like 'finalize()' might 
be more accurate, as 'state.autoclose()' describes the context in which it's 
called rather than what it's doing.

nits:
* Inconsistent prepareForCommit vs. prepareToCommit in comment in Transactional
* Unused Logger added to IndexSummary -> was this intentional?
* You left a comment in SSTRW.prepareToCommitAndMaybeThrow that should be 
removed:
{noformat}
// No early open to finalize and replace
{noformat}

In general this patch and the recent trend in our code-base on the 2.1+ 
branches makes me uneasy. Moving the state tracking logic from within the SSTRW 
and SSTW into their own abstraction helps separate our concerns and increase 
modularity at the cost of increased complexity w/regards to the depth of the 
type system and object interaction, similarly to the introduction of the 
formalized ref-counting infrastructure. Each additional step we've take to 
shore up our stability w/regards to SSTable lifecycles is increasing our net 
complexity and the contrast between where we started and where we are now is 
pretty striking. Now, that's not to say that I prefer the alternative of being 
back where we started with regards to having an error-prone brittle interface 
for ref-counting for instance, but in general I'm left feeling wary when I see 
more wide-spread changes in the same vein particularly as we're approaching a 
.4 release on 2.1.

Note: When I refer to wide-spread changes, I have no hard and fast rule as to 
what qualifies however this change touches [many 
files|https://github.com/belliottsmith/cassandra/commit/16d92cc5926d54667609fb8300f3c573bea5c89f].

As we're not attempting to address any current pain-point with this ticket, 
there's the outstanding potential missed close() w/regards to commit(), and 
this commit rewrites some of the hotter areas in SSTRW w/regards to recent 
errors, I'm of the opinion this change is better targeted towards 3.0 similarly 
to CASSANDRA-8568. This would give us more time to beef up our testing 
infrastructure to better test changes in these portions of the code-base that 
are historically vulnerable to races w/regards to state and would also give the 
rest of the developers working on the code-base more time to get familiar with 
these changes rather than having them out in the wild immediately.

So all that being said, on the whole I believe this approach is a net 
improvement and once we get the details hammered out I believe this will be 
harder to get wrong compared to our previous implementation. The operations 
it's modifying are subtle enough that I don't feel like 2.1 is the right place 
for it at this time though (this could be the whole "Once bitten, twice shy" 
problem though...)

I'll dig into SSTW and update the ticket when I've gone through that.

> Introduce Transactional API for behaviours that can corrupt system state
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8984
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8984
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>
>         Attachments: 8984_windows_timeout.txt
>
>
> As a penultimate (and probably final for 2.1, if we agree to introduce it 
> there) round of changes to the internals managing sstable writing, I've 
> introduced a new API called "Transactional" that I hope will make it much 
> easier to write correct behaviour. As things stand we conflate a lot of 
> behaviours into methods like "close" - the recent changes unpicked some of 
> these, but didn't go far enough. My proposal here introduces an interface 
> designed to support four actions (on top of their normal function):
> * prepareToCommit
> * commit
> * abort
> * cleanup
> In normal operation, once we have finished constructing a state change we 
> call prepareToCommit; once all such state changes are prepared, we call 
> commit. If at any point everything fails, abort is called. In _either_ case, 
> cleanup is called at the very last.
> These transactional objects are all AutoCloseable, with the behaviour being 
> to rollback any changes unless commit has completed successfully.
> The changes are actually less invasive than it might sound, since we did 
> recently introduce abort in some places, as well as have commit like methods. 
> This simply formalises the behaviour, and makes it consistent between all 
> objects that interact in this way. Much of the code change is boilerplate, 
> such as moving an object into a try-declaration, although the change is still 
> non-trivial. What it _does_ do is eliminate a _lot_ of special casing that we 
> have had since 2.1 was released. The data tracker API changes and compaction 
> leftover cleanups should finish the job with making this much easier to 
> reason about, but this change I think is worthwhile considering for 2.1, 
> since we've just overhauled this entire area (and not released these 
> changes), and this change is essentially just the finishing touches, so the 
> risk is minimal and the potential gains reasonably significant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state

Reply via email to