[jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state

Joshua McKenzie (JIRA) Tue, 24 Mar 2015 09:26:12 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378094#comment-14378094
 ]


Joshua McKenzie commented on CASSANDRA-8984:
--------------------------------------------

SSTW thoughts:
* Naming in SSTW is insufficient to give usage context: openEarly vs. openFinal 
vs. openFinalEarly. While I can read through SSTRW to get the gist of the 
expected usage patterns of these methods, we should comment them in SSTW rather 
than leaving the logical interdependency and current usage as our only 
documentation of them.
* As a general observation, rewriting the logic of SSTRW and parts of SSTW 
repeatedly isn't giving this code time to settle. The bug fixes we've made to 
previous implementations are out the window each time we re-write this code and 
this patch is again doing that on much of the SSTable early open, renaming, 
closing, etc.

That being said, SSTW's changes look good to me and the tests all passing on 
linux speaks to some measure of correctness. Once we get the Windows issues 
(which are likely trivial ordering issues w/regards to SSTRW) ironed out I'm +1 
on it.

bq. On the whole my current raft of changes is all about trying to make 
semantics clear and well defined, which in my opinion reduces complexity, 
significantly, even if it increases LOC.
While I agree they make things more clear, more well defined, and make things 
*easier* for us to use correctly, IMO they don't make the code-base *simpler*. 
It's not an argument against inclusion of this patch, just an observation that 
we're adding a raft of objects and interfaces (Ref, Refs, RefCounted, 
SharedCloseable, Tidy, Transactional, StateManager, etc) to codify some things 
that we previously just handled manually. And often poorly - this stuff is 
notoriously prone to human error so we definitely needed to address it. While 
each individual component we've added to shore these problems up is simple and 
straightforward, it's a different kind of complexity and price to be paid for 
having 5 objects with 20x complexity vs. 100 objects with 1x each; I'm also not 
trying to say your approach is deficient with regards to this, just that it's 
something to keep in mind.

bq.  But we are definitely broken in 2.1 for dealing with failures in the 
middle of state changes, especially if those failures happen during rollback of 
a change, and it's not clear what the system state will be after this.
That's a strong argument. Not sure how we want to handle this given our 
proximity to 3.0 and our point in the 2.1 release life-cycle. [~jbellis]?


> Introduce Transactional API for behaviours that can corrupt system state
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8984
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8984
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>
>         Attachments: 8984_windows_timeout.txt
>
>
> As a penultimate (and probably final for 2.1, if we agree to introduce it 
> there) round of changes to the internals managing sstable writing, I've 
> introduced a new API called "Transactional" that I hope will make it much 
> easier to write correct behaviour. As things stand we conflate a lot of 
> behaviours into methods like "close" - the recent changes unpicked some of 
> these, but didn't go far enough. My proposal here introduces an interface 
> designed to support four actions (on top of their normal function):
> * prepareToCommit
> * commit
> * abort
> * cleanup
> In normal operation, once we have finished constructing a state change we 
> call prepareToCommit; once all such state changes are prepared, we call 
> commit. If at any point everything fails, abort is called. In _either_ case, 
> cleanup is called at the very last.
> These transactional objects are all AutoCloseable, with the behaviour being 
> to rollback any changes unless commit has completed successfully.
> The changes are actually less invasive than it might sound, since we did 
> recently introduce abort in some places, as well as have commit like methods. 
> This simply formalises the behaviour, and makes it consistent between all 
> objects that interact in this way. Much of the code change is boilerplate, 
> such as moving an object into a try-declaration, although the change is still 
> non-trivial. What it _does_ do is eliminate a _lot_ of special casing that we 
> have had since 2.1 was released. The data tracker API changes and compaction 
> leftover cleanups should finish the job with making this much easier to 
> reason about, but this change I think is worthwhile considering for 2.1, 
> since we've just overhauled this entire area (and not released these 
> changes), and this change is essentially just the finishing touches, so the 
> risk is minimal and the potential gains reasonably significant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8984) Introduce Transactional API for behaviours that can corrupt system state

Reply via email to