[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-01-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298807#comment-14298807
 ] 

Benedict commented on CASSANDRA-8707:
-

I've uploaded a patch 
[here|https://github.com/belliottsmith/cassandra/tree/8707-safercleanup]

There are a couple of minor semi-extraneous commits tweaking slightly the 
design just introduced in 7705, in particular:

* introducing a ref() method for those instances where we expect never to fail
* Letting Ref and RefCounted hold a reference to the thing they're counting, so 
that Ref.referent() can be supported

The patch itself then has some ground works:

* Encapsulate SSTableReader.readMeter, so it isn't accessed directly
* Introduce a new SharedCloseable interface that represents a Closeable that 
can be shared by creating a new instance via sharedCopy(); each instance can 
treated as distinct, but the underlying resources will only be closed once all 
of them are. it is backed by RefCounted, and just provides a convenient 
abstraction.
* Make IndexSummary, IFilter and SegmentedFile implement SharedCloseable

Finally we do the guts of the patch, which is refactoring the cleanup in 
SSTableReader so that instead of maintaining a chain of replacements, we have 
three tiers of resource management: instance-only, type-shared 
(TEMPLINK/FINAL), and global. Each have their own reference count managed to 
ensure safe cleanup. The instance-only fields include the major ones: dfile, 
ifile, summary and filter, as these are all managed directly via 
SharedCloseable. The type-shared and global manage general cleanup, such as 
dropping page cache, trashing read meter data, deleting files, etc.

Hopefully only the last bit is the only that requires considerable attention, 
and whilst it is a little verbose the logic should be much easier to understand 
the before, and hence much safer. It will also give us all of the new debug 
info if we get it wrong.

This patch exposes a number of new LEAK DETECTED warnings in unit tests which I 
want to work through, but I suspect mostly are down to creating and discarding 
sstables directly in the test methods. 

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-01-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299286#comment-14299286
 ] 

Benedict commented on CASSANDRA-8707:
-

I've pushed an update to the repository with numerous fixes to the test cases 
suppressing all leak detection (except in RefCountedTest, where they're 
intended). In doing so I came across several meaningful memory leaks that I've 
also fixed. The worst offender being the bloom filter from SSTableWriter, which 
was not being freed. Users performing bulk transformations could have 
presumably been badly affected by this. 

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-01-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299930#comment-14299930
 ] 

Benedict commented on CASSANDRA-8707:
-

I was playing with another small refactor to the ref counting code, and somehow 
managed to butcher my git history so that I overrode my backup copy with the 
new version as well (still not sure what I did), and uploaded it. The upshot is 
there's an extra refactor in there. I can attempt to rollback the change if you 
prefer, but it isn't too drastic and I think makes the API cleaner.

Essentially, I remerged RefCountedImpl with Ref, but in the opposite way. Now 
you just create a Ref object, which sorts out creating all of the background 
gubbins. The Ref object both manages a single reference and the creation of new 
references. This simplifies some of the new helper classes I've created, and I 
think creates a lower cognitive burden on the user of the API. The biggest 
upshot is there is no concept of a sharedRef(); the first Ref you create you 
can simply stash somewhere (which we do in SSTableReader). I think this is 
cleaner, as it's not always necessary to have a "sharedRef". 

Let me know what you think. I've done my best to keep all of the changes into 
relatively small easily digested commits, so hopefully the review burden won't 
be too problematic. But we can rework it if necessary.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303053#comment-14303053
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


* SSTableDeletingTask - why not notify when deleting tmplink files?
* Memory - looks like a fix for some other issue included, break that out in a 
new ticket?
* BloomFilter - the protected copy constructor, why not use the hashCount and 
bitset from the copy that is passed in?

In general I think the new class hierarchy inside SSTableReader 
(TypeManager/GlobalManager/TidyManager/TypeTidy/InstanceTidier) needs to be 
simplified and documented (basically no comments at all right now). I find it 
very hard to follow, perhaps it is because of the naming of the classes (for 
example, the name "TypeManager" is quite generic and tells me very little) or 
that they are nested and access each others private fields.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303099#comment-14303099
 ] 

Benedict commented on CASSANDRA-8707:
-

bq. SSTableDeletingTask - why not notify when deleting tmplink files?

We never did before - in fact we never called this for tmplink files 
previously, but since they could fail to delete for whatever reason, it seemed 
sensible to make it be called for those whilst cleaning this up. I'm assuming 
listeners to deletion notifications care about _actual_ deletion, as opposed to 
deletion of an interim file, so I expect the prior behaviour is correct.

bq. Memory - looks like a fix for some other issue included, break that out in 
a new ticket?

Not so much a fix, as simply throwing in an assertion error if we fail to 
allocate. It's an oversight that _has_ been seen as a possible problem in a 
couple of places recently. Happy to split it out if you prefer, but it seemed 
innoccuous with the other changes.

bq. BloomFilter - the protected copy constructor, why not use the hashCount and 
bitset from the copy that is passed in?

Good point!

bq. In general I think the new class hierarchy inside SSTableReader 
(TypeManager/GlobalManager/TidyManager/TypeTidy/InstanceTidier) needs to be 
simplified and documented (basically no comments at all right now). I find it 
very hard to follow, perhaps it is because of the naming of the classes (for 
example, the name "TypeManager" is quite generic and tells me very little) or 
that they are nested and access each others private fields.

OK, I'll see what I can do to comment that better. The minimal comments were 
"One Per X" which perhaps weren't very clear - they are expanding umbrellas of 
shared state: instance, descriptor "type", and global ("per logical sstable"). 
The XManager is just a shared abstraction for tracking the umbrellas, so when a 
new reader is constructed it asks a manager for a reference to its shared 
state. I'll see if I can make this much clearer.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303145#comment-14303145
 ] 

Benedict commented on CASSANDRA-8707:
-

OK, I've force pushed a version with comments in a directly proceeding commit 
to the introduction of that refactor. Let me know if it is still insufficient.

It is _possible_ we could simplify by not distinguishing between the global and 
per-descriptor-type state. We could, for instance, only support 
getCurrentReplacement() for like-types, and only persist read meter stats for 
FINAL files. The only thing that would not be explicitly wrong would be the 
dropping of the page cache, which might happen prematurely. The reason I like 
the separation, however, is that it directly maps onto the actual lifecycles, 
so (excepting the initial acclimation) there is a lower cognitive burden for 
understanding how and when cleanup occurs; and there are no caveats to the 
other pieces of state and where you can use them. My current view is there is 
too much special casing going on, and it's introducing bugs, so I want to move 
to abstractions that map as closely to use as possible.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303167#comment-14303167
 ] 

Benedict commented on CASSANDRA-8707:
-

I've split the Memory change out into CASSANDRA-8726, and will upload a version 
omitting it and the BF change shortly.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303566#comment-14303566
 ] 

Joshua McKenzie commented on CASSANDRA-8707:


For the record - I'm vary wary of anything touching SSTableDeletingTask on 2.1 
w/regards to Windows.  That being said - changes there look fine and don't 
appear to break anything obvious to my testing over here.  Only thing that 
popped up with a mixed stress and major compaction running concurrently was 
CASSANDRA-8535 as expected.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303571#comment-14303571
 ] 

Benedict commented on CASSANDRA-8707:
-

The behaviour is the same on Windows (well, some stuff unrelated to deletion 
has been moved out). it is largely the same on Linux, it's just run for early 
opened files as well.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304901#comment-14304901
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


* class comments in RefCounted and Ref out of date (we don't have any 
RefCounted.Impl anymore for example)
* make SegmentedFile.Cleanup abstract and remove the empty tidy implementation. 
There is an instance created in BufferedSegmentedFile, I think it would be 
clearer if the empty tidy() method is implemented there.
* A few unused SharedClosable imports in the *SegmentedFile files

Still waiting for a simplification of the class hierarchy in SSTableReader as 
it is very hard to follow (though, yes, it is perhaps simpler than the linked 
list approach).

If it is as simple as it can be, I'm almost inclined to suggest that we should 
stop doing early opening of compaction results as I doubt the complexity is 
worth it, we have had so many issues with this since we released it (quite a 
few caused by me trying to fix other issues with it).

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305068#comment-14305068
 ] 

Benedict commented on CASSANDRA-8707:
-

Well, all I will say is that I wash my hands of resource cleanup problems 
without some variant of this patch. It has already caught several memory leaks 
before even being committed and used in the wild (all unrelated to the reasons 
I started it).

bq. Still waiting for a simplification of the class hierarchy

I offered one suggestion, that I think makes it worse but does simplify it. 
Another is to eliminate all of the XManager classes and just have them 
duplicate the code for lookup in a statically referenced ConcurrentMap. I did 
it this way first, but found the code duplication very ugly. I'm honestly not 
sure what the problem is here, though, as it seems to map pretty 
straightforwardly onto how things work.

bq.  I'm almost inclined to suggest that we should stop doing early opening of 
compaction results 

We would have to also stop doing summary resampling, as that was broken prior 
to the introduction of the linked-list chain, and is probably still subtly 
broken and brittle. The only distinction with this patch would be the ability 
to (without caveat) merge GlobalTidy and TypeTidy, but this seems like a small 
win, since we could do this now, as I suggested, but would still prefer not to.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305114#comment-14305114
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


bq. simplify by not distinguishing between the global and per-descriptor-type 
state.
bq. eliminate all of the XManager classes.
bq. merge GlobalTidy and TypeTidy

lets try that?

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305137#comment-14305137
 ] 

Benedict commented on CASSANDRA-8707:
-

Like I said, I'd _really_ rather not merge GlobalTidy and TypeTidy, but can if 
you're really set on it. By doing so we trade some slight class hierarchy 
simplcity for special casing, and the latter always leads to problems further 
down the line. I'll offer up a version with the XManager classes removed for 
now, and see where that gets us.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305146#comment-14305146
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


Imo "special casing" is fine if the special case is the only thing you need 
right now, I'd prefer making it more generic when we need to

bq. I'll offer up a version with the XManager classes removed for now,
sounds good

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305155#comment-14305155
 ] 

Benedict commented on CASSANDRA-8707:
-

Special casing is where many of our recent bugs have come from, as far as I can 
tell. They interplay with each other in ways we cannot easily predict, 
especially when focusing on one area of the codebase. We think "it'll be fine 
here, as it doesn't touch anything else, and I've got a handle on what it does" 
- then we remember validation compaction in LCS that breaks our assumption, for 
instance.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305447#comment-14305447
 ] 

Benedict commented on CASSANDRA-8707:
-

OK, uploaded a version with all of the outstanding nits addressed and the 
XManager hierarchy removed. It doesn't look so ugly, to be honest, so we can 
definitely manage with this if it makes it sufficiently clearer. I have my 
fingers crossed :)

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307337#comment-14307337
 ] 

T Jake Luciani commented on CASSANDRA-8707:
---

I want to give this code time to bake and give users some relief in the 
meantime with the other ~100 fixes in 2.1.3. 

I suggest we push this to 2.1.4 as well as CASSANDRA-8683 and CASSANDRA-8744.  
I suggest we disable incremental compaction for 2.1.3 which should avoid most 
of these bugs.  


> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308989#comment-14308989
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


I think the penny has finally dropped, the thing I'm missing now is more 
"why?"-comments, for example, why do we need to copy CompressionMetadata in 
CM.Writer#open(..) if finish type is final and not otherwise? In general, more 
"why?"-comments than "what?" would be helpful, at least around the tricky 
parts, I think future readers will appreciate that.

Also, I think we need some high level documentation with concrete examples 
about the life cycle of sstables. I think this part of the code would be much 
simpler to maintain with a description of why and when the different cases 
occur.

So, I'm +1 once we have some documentation (perhaps as a big comment on top in 
SSTableReader?), and a bit of a tweak to the current code comments to focus 
more on why we do certain things.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309098#comment-14309098
 ] 

Benedict commented on CASSANDRA-8707:
-

bq. Also, I think we need some high level documentation with concrete examples 
about the life cycle of sstables. ... (perhaps as a big comment on top in 
SSTableReader?)

This seems like an excellent idea, and this ticket seems like a good 
opportunity to introduce it, although it will have to be revised with each of 
my follow ons. It didn't really occur to me to do this since that is all 
unchanged here, but it has grown overtime to be very non-obvious. It would be 
helpful if, once I've introdiced a high-level overview, the individuals 
familiar with the _tool_ or special compaction lifecycles (or others) could 
amend the SSTableReader description to make it clear how they are expected to 
behave and treat sstables, and what assumptions they utilise, as that is often 
subtly different and I am not at all an expert in those areas.

bq. I think the penny has finally dropped for me, the thing I'm missing now is 
more "why?"-comments, for example, why do we need to copy CompressionMetadata 
in CM.Writer#open(..) if finish type is final and not otherwise?

Could you list the places you think need this service? Since the code outside 
of SSTableReader tidying is largely functionally unchanged, only swapping in a 
new API, it's not clear which bits need this, outside of the high level 
overview. This particular spot, for instance, had this behaviour previously 
(and it's about resizing, rather than copying, since we now know the final size 
of the data so can release it). One of the follow tickets also cleans this 
particular spot up a little, by introducing an enum specific to it that 
describes the reasons for opening (which should make that more obvious). I'm 
hoping a small follow up I've yet to publish will also simplify that particular 
bit further. I'm more than happy to flesh out comments, but would appreciate an 
outline of which bits you think have been lacking historically.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309166#comment-14309166
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


bq. It would be helpful if, once I've introdiced a high-level overview, the 
individuals familiar with the tool or special compaction lifecycles (or others) 
could amend the SSTableReader description
Absolutely, for now I think that a concrete explanation of what happens to an 
sstable that is early opened and one that is cloneWithNewStart(..):ed is enough

bq. Could you list the places you think need this service? Since the code 
outside of SSTableReader tidying is largely functionally unchanged
I don't think *I* need the service, it is mostly to avoid future confusion, and 
I figured now was a good time to clear up the confusing bits since we are 
touching those code parts (when would we do that otherwise?). I know most of 
the code was just moved around from other places and it was terrible before 
this, but that doesn't really stop us from making it easier for the future.

I just want a small 'why' instead of a comment stating the obvious:
{code}
// we don't want to alert deletions if not final
{code}
{code}
// get a new reference to the shared descriptor-type tidy
{code}

Those are just two examples, once the life cycle description is done, I can add 
the comments about the details I found confusing

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309214#comment-14309214
 ] 

Benedict commented on CASSANDRA-8707:
-

bq. I just want a small 'why' instead of a comment stating the obvious:

I'm honestly not meaning to be obtuse here, I'm just looking for help. For some 
of this code I struggle(d) to see how I could make it clearer, so just dropped 
in a few comments willynilly. In your latter example, I do my best to explain 
the why at the top of each of the Tidy classes, and especially the 
InstanceTidier, that outlines the hierarchy. Here I was just duplicating some 
information so that the concept that was being related to was directly 
mentioned. This is why I ask for guidance, because whilst it may _now_ be clear 
to you it wasn't before, but conversely it is not at all clear to me what 
_isn't/wasn't_ clear, and I really need some help knowing where to fill in. The 
why comments seem to already be there for me in that second example, excepting 
the class-level outline of the lifecycle.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309217#comment-14309217
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


As I said, I'll walk over the code properly and explain the parts I struggled 
with.

I think the problem is that once you spend the time to understand it, it is 
obvious

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309226#comment-14309226
 ] 

Benedict commented on CASSANDRA-8707:
-

bq. As I said, I'll walk over the code properly and explain the parts I 
struggled with.

Thanks. I do want the code to be clear, and would also like to contribute to 
that where I am able, so if there are specific things you think I can elaborate 
on better, please do point me to them and leave it to me. 

Like you say, the code is made up of a lot of individually obvious things, that 
aren't obvious as a whole, and I've been lost in the weeds long enough to not 
be able to tell the difference between the two.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312612#comment-14312612
 ] 

Benedict commented on CASSANDRA-8707:
-

I've pushed an update with a single mega comment at the head of sstablereader. 
I'm still not 100% sure I've either nailed all of the stuff that should be 
covered, or that it's clear, so please do criticise and point out places it 
could be improved. Obviously it will also get tweaks in all of the upcoming 
patches that modify its behaviour.

I took the liberty of backporting the new OpenReason MOVED_START since it helps 
with the docs as well.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313868#comment-14313868
 ] 

Marcus Eriksson commented on CASSANDRA-8707:


ok, lgtm, +1

pushed 3 small comments to 
https://github.com/krummas/cassandra/commits/bes/8707-2 (with a rebase)

I'll fill out the details about compaction in the comment in SSTableReader in 
CASSANDRA-8764

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313887#comment-14313887
 ] 

Benedict commented on CASSANDRA-8707:
-

Thanks. I'll commit once 2.1.3 is out. I've actually pushed a really tiny 
update to your version 
[here|https://github.com/belliottsmith/cassandra/commits/bes/8707-2] that makes 
SharedCloseableImpl.close() idempotent; this isn't important here, but will be 
useful later, and is encouraged by the API in AutoCloseable.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315199#comment-14315199
 ] 

Benedict commented on CASSANDRA-8707:
-

Pushed a small fix that explains the behaviour encountered by [~aboudrealt] in 
CASSANDRA-8683, which I will rebase off this

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8707) Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted

2015-02-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316158#comment-14316158
 ] 

Benedict commented on CASSANDRA-8707:
-

Updated [this|https://github.com/belliottsmith/cassandra/commits/bes/8707-2] 
branch to include all of the follow up nits. If I could get a final +1 on these 
nits, i'll merge.

> Move SegmentedFile, IndexSummary and BloomFilter to utilising RefCounted
> 
>
> Key: CASSANDRA-8707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8707
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Benedict
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.3
>
>
> There are still a few bugs with resource management, especially around 
> SSTableReader cleanup, esp. when intermixing with compaction. This migration 
> should help. We can simultaneously "simplify" the logic in SSTableReader to 
> not track the replacement chain, only to take a new reference to each of the 
> underlying resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)