[ https://issues.apache.org/jira/browse/CASSANDRA-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376703#comment-14376703 ]
Joshua McKenzie commented on CASSANDRA-8984: -------------------------------------------- All unit tests are timing out on Windows - attaching a sample of one to this ticket. I've reviewed most of the patch and will leave feedback for what I have thus far - I still need to digest the changes to SSTableWriter as they're fairly extensive. Comments: * I'm not too keen on prepareToCommit being part of the usage pattern without actually being present in the interface but I can see why you went that route. I don't have any alternative suggestions unfortunately. * The passing around of Throwables to merge deep in the stack (Transactional, SafeMemory*, Ref* etc) is a little cludgy and has some pretty deeply nested tag-along variables. Not sure why we can't just return a Throwable up that stack and merge at the top level when we know we might need to merge rather than passing the Throwable all the way down from the top to merge at the bottom...? * The StateManager abstraction and the Transactional Interface seem to be a poor fit to several of the implementers. Having 2/3 of the methods resolve to the noOpTransition seems like we're conflating the idea of "classes that have resources that need to be cleaned up" with "classes that have a set of state transitions they go through and a logical abort process". * With regard to the StateManager requiring a beginTransition / rejectedTransition combo and specific completeTransition - we're trading one set of manually managed states for another. Still error-prone and has quite a bit of duplication where it's implemented (rather than noOpTransition) * autoclose seems to have some redundant assignment - we switch on state and if it's COMMITTED, we set state = COMMITTED, ABORTED we set it to ABORTED. * Consider renaming StateManager.autoclose(). Something like 'finalize()' might be more accurate, as 'state.autoclose()' describes the context in which it's called rather than what it's doing. nits: * Inconsistent prepareForCommit vs. prepareToCommit in comment in Transactional * Unused Logger added to IndexSummary -> was this intentional? * You left a comment in SSTRW.prepareToCommitAndMaybeThrow that should be removed: {noformat} // No early open to finalize and replace {noformat} In general this patch and the recent trend in our code-base on the 2.1+ branches makes me uneasy. Moving the state tracking logic from within the SSTRW and SSTW into their own abstraction helps separate our concerns and increase modularity at the cost of increased complexity w/regards to the depth of the type system and object interaction, similarly to the introduction of the formalized ref-counting infrastructure. Each additional step we've take to shore up our stability w/regards to SSTable lifecycles is increasing our net complexity and the contrast between where we started and where we are now is pretty striking. Now, that's not to say that I prefer the alternative of being back where we started with regards to having an error-prone brittle interface for ref-counting for instance, but in general I'm left feeling wary when I see more wide-spread changes in the same vein particularly as we're approaching a .4 release on 2.1. Note: When I refer to wide-spread changes, I have no hard and fast rule as to what qualifies however this change touches [many files|https://github.com/belliottsmith/cassandra/commit/16d92cc5926d54667609fb8300f3c573bea5c89f]. As we're not attempting to address any current pain-point with this ticket, there's the outstanding potential missed close() w/regards to commit(), and this commit rewrites some of the hotter areas in SSTRW w/regards to recent errors, I'm of the opinion this change is better targeted towards 3.0 similarly to CASSANDRA-8568. This would give us more time to beef up our testing infrastructure to better test changes in these portions of the code-base that are historically vulnerable to races w/regards to state and would also give the rest of the developers working on the code-base more time to get familiar with these changes rather than having them out in the wild immediately. So all that being said, on the whole I believe this approach is a net improvement and once we get the details hammered out I believe this will be harder to get wrong compared to our previous implementation. The operations it's modifying are subtle enough that I don't feel like 2.1 is the right place for it at this time though (this could be the whole "Once bitten, twice shy" problem though...) I'll dig into SSTW and update the ticket when I've gone through that. > Introduce Transactional API for behaviours that can corrupt system state > ------------------------------------------------------------------------ > > Key: CASSANDRA-8984 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8984 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Benedict > Fix For: 2.1.4 > > Attachments: 8984_windows_timeout.txt > > > As a penultimate (and probably final for 2.1, if we agree to introduce it > there) round of changes to the internals managing sstable writing, I've > introduced a new API called "Transactional" that I hope will make it much > easier to write correct behaviour. As things stand we conflate a lot of > behaviours into methods like "close" - the recent changes unpicked some of > these, but didn't go far enough. My proposal here introduces an interface > designed to support four actions (on top of their normal function): > * prepareToCommit > * commit > * abort > * cleanup > In normal operation, once we have finished constructing a state change we > call prepareToCommit; once all such state changes are prepared, we call > commit. If at any point everything fails, abort is called. In _either_ case, > cleanup is called at the very last. > These transactional objects are all AutoCloseable, with the behaviour being > to rollback any changes unless commit has completed successfully. > The changes are actually less invasive than it might sound, since we did > recently introduce abort in some places, as well as have commit like methods. > This simply formalises the behaviour, and makes it consistent between all > objects that interact in this way. Much of the code change is boilerplate, > such as moving an object into a try-declaration, although the change is still > non-trivial. What it _does_ do is eliminate a _lot_ of special casing that we > have had since 2.1 was released. The data tracker API changes and compaction > leftover cleanups should finish the job with making this much easier to > reason about, but this change I think is worthwhile considering for 2.1, > since we've just overhauled this entire area (and not released these > changes), and this change is essentially just the finishing touches, so the > risk is minimal and the potential gains reasonably significant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)