[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-08-01 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726788#comment-13726788
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


An initial idea for the implementation:

Based on the recent (last 15m?) read rate (reads/sec), periodically down-sample 
the summary for SSTables which fall below the mean rate.  The down-sampling 
rate could use a sliding scale based on the ratio of the mean to that SSTable's 
rate.  As a basic example implementation, keep X% of the samples, where {{X = 
max(25, min(100, 100 * (rate / mean_rate)))}}, so the coldest SSTables keep 
only 25% of the samples in memory.
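
To make the sliding scale concrete, here is a minimal Python sketch of that formula. The function name {{keep_percent}} and the {{floor}} parameter are hypothetical, and the handling of a zero mean rate is an assumption that the comment does not cover.

```python
def keep_percent(rate, mean_rate, floor=25.0):
    """Percentage of summary samples to keep for one SSTable.

    Implements X = max(floor, min(100, 100 * (rate / mean_rate))).
    """
    if mean_rate <= 0:
        # Assumption: with no measurable read traffic there is nothing to
        # rank SSTables by, so keep the full summary.
        return 100.0
    return max(floor, min(100.0, 100.0 * (rate / mean_rate)))
```

An SSTable read at half the mean rate keeps 50% of its samples, while anything at or above the mean keeps all of them.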

Presenting a way for the user to tune this (other than a simple on/off) is a 
little trickier.  Perhaps make the min (default 25%) adjustable?  Or start 
down-sampling at a configurable point (the default is the mean)?  Those could 
also be automatically adjusted based on memory pressure.

> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
> Key: CASSANDRA-5519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Fix For: 2.0.1
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-08-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731589#comment-13731589
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

Good start, but it seems a little fragile to me if a bunch of sstables are 
suddenly warmed up.

What about this?

We could define a fixed-size memory pool, similar to what we do for memtables 
or cache, and allocate it to the sstables proportional to their hotness.  Every 
15 minutes (which seems like a lot, maybe hourly?) we recalculate and rebuild 
the summaries.  Maybe we only rebuild the ones that are X% off of where they 
should be to make it lighter-weight.  Or if we're downsampling by more than 2x 
then we can just resample what we already have in memory instead of rebuilding 
"correctly."



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-08-07 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732588#comment-13732588
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


bq. We could define a fixed-size memory pool, similar to what we do for 
memtables or cache, and allocate it to the sstables proportional to their 
hotness.

It would be hard to describe this in text, so here's my pythonic pseudocode for 
distributing the fixed-size memory pool:

{noformat}
total_reads_per_sec = sum(sstable.reads_per_sec for sstable in sstables)
sstables_to_downsample = set()
leftover_entries = 0
for sstable in sstables:
    # share of the pool proportional to this sstable's read rate
    allocated_space = total_space * (sstable.reads_per_sec / total_reads_per_sec)
    # space per entry = token + position + overhead
    num_entries = int(allocated_space / SPACE_PER_ENTRY)
    if num_entries > sstable.max_index_summary_entries:
        # more space than a full summary needs; cap it and bank the rest
        sstable.num_index_summary_entries = sstable.max_index_summary_entries
        leftover_entries += num_entries - sstable.max_index_summary_entries
    else:
        sstable.num_index_summary_entries = num_entries
        sstables_to_downsample.add(sstable)

# distribute leftover_entries among sstables_to_downsample based on read rates
# (this probably ends up looking like a recursive or iterative function)
{noformat}
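
One way the leftover distribution could be made iterative is sketched below. This is only an illustration under assumptions: the {{SSTable}} dataclass, the {{allocate}} function, and the even split when nothing is being read are all hypothetical, and the pool is expressed directly in entries rather than bytes.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str              # assumed unique; used as the lookup key below
    reads_per_sec: float
    max_entries: int       # entries in the full, un-downsampled summary
    num_entries: int = 0   # entries assigned by allocate()


def allocate(sstables, pool_entries):
    """Split pool_entries among sstables in proportion to their read rates.

    An sstable whose proportional share meets or exceeds its full summary
    size is capped at that size; the entries it cannot use are re-split
    among the remaining sstables on the next pass.
    """
    remaining = list(sstables)
    budget = pool_entries
    while remaining:
        total_rate = sum(s.reads_per_sec for s in remaining)
        shares = {}
        for s in remaining:
            if total_rate > 0:
                shares[s.name] = int(budget * s.reads_per_sec / total_rate)
            else:
                # Assumption: with no reads at all, split the budget evenly.
                shares[s.name] = budget // len(remaining)
        capped = {s.name for s in remaining if shares[s.name] >= s.max_entries}
        for s in remaining:
            s.num_entries = s.max_entries if s.name in capped else shares[s.name]
        if not capped:
            return
        # Capped sstables are settled; redistribute what is left to the rest.
        budget -= sum(s.max_entries for s in remaining if s.name in capped)
        remaining = [s for s in remaining if s.name not in capped]
```

Each pass settles at least one sstable or terminates, so the loop runs at most once per sstable.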

bq. Maybe we only rebuild the ones that are X% off of where they should be to 
make it lighter-weight.

That's a good idea. (I was thinking of using a step function.)  Instead of "X% 
off of where they should be", I would more precisely phrase that as "X% away 
from their previous proportion".
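
A possible shape for that check, as a sketch only: the name {{needs_resample}}, the 10% default, and the choice of a relative (rather than absolute) difference are all assumptions.

```python
def needs_resample(previous_fraction, target_fraction, threshold=0.10):
    """Return True if the target differs from the previous sampling
    fraction by more than `threshold`, measured relative to the previous one."""
    if previous_fraction <= 0:
        # Nothing was kept before, so any positive target is a change.
        return target_fraction > 0
    change = abs(target_fraction - previous_fraction) / previous_fraction
    return change > threshold
```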

bq.  Or if we're downsampling by more than 2x then we can just resample what we 
already have in memory instead of rebuilding "correctly."

If you down-sample with a particular pattern, you can always down-sample using 
just the in-memory points; only up-samples need to read from disk.

I'm trying to generalize the down-sampling pattern, but the two main points are 
(assuming 1% granularity):
* For every 1% you down-sample, the number of points to remove from the 
in-memory summary is equal to 1% of the original (on-disk) count
* Each 1% down-sampling run starts at a different offset to evenly space the 
down-sampling

For example, to down-sample from 100% to 99%, you would remove every hundredth 
point, starting from index 0.  To down-sample from 99% to 98%, you would remove 
every 99th point, starting from index 50.  To down-sample from 98% to 97%, you 
would remove every 98th point, starting from index 24 or 74, and so on.
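
The first three rounds of that example can be sketched as follows. This is not the attached downsample.py; the helper name {{remove_every}} is hypothetical, and it assumes the indexes refer to positions in the current in-memory list.

```python
def remove_every(points, step, offset):
    """Drop the points at positions offset, offset + step, offset + 2*step, ..."""
    return [p for i, p in enumerate(points)
            if i < offset or (i - offset) % step != 0]


# 1000 on-disk samples, so each 1% round should remove 10 points.
summary = list(range(1000))
sizes = [len(summary)]
for step, offset in [(100, 0), (99, 50), (98, 24)]:
    summary = remove_every(summary, step, offset)
    sizes.append(len(summary))
```

Because the list shrinks by 1% of the original size each round while the step shrinks by one, every round removes the same number of points when the original count is a multiple of 100.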



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-08-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733593#comment-13733593
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

Sounds reasonable.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-10-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802408#comment-13802408
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

LGTM

> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
> Key: CASSANDRA-5519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Tyler Hobbs
> Priority: Minor
> Fix For: 2.1
>
> Attachments: downsample.py
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-10-31 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810813#comment-13810813
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


How do we want to handle the memory pool not being large enough to accommodate 
all of the index summaries (even after downsampling)?  Just make it a 
best-effort?



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-12 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820653#comment-13820653
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


I need to put this through more thorough testing and benchmarking, but I think 
it's at a good point for a preliminary review: 
https://github.com/thobbs/cassandra/compare/CASSANDRA-5519

A few comments/questions:
* I went with best-effort for the memory pool (if all summaries don't fit in 
the allotted space even at the minimum sampling level, there's nothing we can 
do about it).  The amount of memory used may also temporarily exceed the limit 
while building new summaries.
* There are two new cassandra.yaml options: one for controlling the memory pool 
size and one for regulating how frequently summaries are resized.  These can 
also be set through JMX. We could conceivably also make the down/upsample 
thresholds and the minimum sampling level configurable.  All of these default 
values are just guesses.
* I went with a reference counting strategy for freeing the IndexSummary's 
Memory.  This makes the API a bit unpleasant (mostly in SSTR), but it should 
have low overhead.  A ReadWriteLock might also work well instead of this with a 
cleaner API; let me know if I should benchmark the two for comparison.
* I'm triggering the IndexSummaryManager singleton's initialization in 
DatabaseDescriptor; this feels wrong, so I'm open to suggestions.
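
For readers unfamiliar with the reference-counting approach mentioned in the third bullet, the general idea can be sketched like this. It is a generic Python illustration, not the patch's Java code; the class {{RefCountedSummary}} and its methods are hypothetical.

```python
import threading


class RefCountedSummary:
    """Holds summary data that is freed once the last reference is released."""

    def __init__(self, entries):
        self._entries = entries
        self._refs = 1              # the owner's own reference
        self._lock = threading.Lock()
        self.freed = False

    def acquire(self):
        """Take a reference; returns False if the data was already freed."""
        with self._lock:
            if self._refs == 0:
                return False
            self._refs += 1
            return True

    def release(self):
        """Drop a reference and free the data when none remain."""
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self._entries = None    # stands in for freeing off-heap memory
                self.freed = True
```

Every reader has to pair {{acquire()}} with {{release()}}, which is the API unpleasantness the bullet refers to.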



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821991#comment-13821991
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

What is the relationship between BASE_SAMPLING_LEVEL and MIN_SAMPLING_LEVEL 
with indexInterval?

How many rows do we get for 5% of an 8GB heap?

Isn't it a minor bug to just ignore compacting sstables?  Suggest reducing 
memory pool to allocate to the uncompacting ones, by the amount allocated to 
the compacting ones.

Could we just resample at compaction time instead of dealing with refcounting 
or locking?  That probably gives up too much of the potential benefits.  But I 
think we could make it almost as elegant by using the datatracker replace 
mechanism originally for compaction, to build a new SSTR and swap it in w/o 
extra concurrency controls.

Is the idea behind touching it in DD to force the mbean to be loaded, or is 
there a circular dependency that breaks w/o that?



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-14 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822623#comment-13822623
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


bq. What is the relationship between BASE_SAMPLING_LEVEL and MIN_SAMPLING_LEVEL 
with indexInterval?

{{BASE/MIN_SAMPLING_LEVEL}} are orthogonal to {{indexInterval}}.  
{{BASE_SAMPLING_LEVEL}} essentially sets the granularity at which you can 
down/upsample.  {{MIN_SAMPLING_LEVEL}} sets a limit on how low you can 
downsample.  (I'll note that we could potentially raise {{indexInterval}} 
alongside these changes in order to have more summary entries for hot sstables.)
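
As a rough illustration of how the two constants interact, the sketch below maps a target fraction onto a sampling level. The value 128 for {{BASE_SAMPLING_LEVEL}} is the one mentioned elsewhere in this thread; the value 32 for {{MIN_SAMPLING_LEVEL}} (a 25% floor) and both function names are assumptions.

```python
BASE_SAMPLING_LEVEL = 128  # number of steps between "nothing" and the full summary
MIN_SAMPLING_LEVEL = 32    # assumption: never keep less than 32/128 = 25%


def sampling_level_for(target_fraction):
    """Nearest sampling level for a target fraction, clamped to the allowed range."""
    level = round(target_fraction * BASE_SAMPLING_LEVEL)
    return max(MIN_SAMPLING_LEVEL, min(BASE_SAMPLING_LEVEL, level))


def effective_fraction(level):
    """Fraction of the full summary kept at a given sampling level."""
    return level / BASE_SAMPLING_LEVEL
```

With 128 levels, adjacent levels differ by 1/128 of the full summary, so resizing can happen in steps of a little under 1%.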

bq. How many rows do we get for 5% of an 8GB heap?

That gives us ~410 MiB to work with.  If we assume the average key length is 8 
bytes, each summary entry uses 20 bytes of space, giving us ~21 million summary 
entries.

At full sampling, that's 21MM * 128 = 2.7 billion rows, assuming no overlap 
across sstables. At minimum sampling, that's ~11 billion rows.

If the avg key size is 16 bytes, that drops to ~2 and ~8 billion rows.
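
The arithmetic above can be reproduced with a few lines of Python. Two values here are inferred rather than stated: a fixed 12 bytes of per-entry overhead (20 bytes minus the 8-byte key) and a minimum sampling fraction of 25% (from the roughly 4x gap between the two row counts). The function name is hypothetical.

```python
HEAP_BYTES = 8 * 1024**3        # 8 GiB heap
POOL_FRACTION = 0.05            # 5% of the heap
INDEX_INTERVAL = 128            # rows covered by one summary entry at full sampling
ENTRY_OVERHEAD_BYTES = 12       # assumption: 20-byte entry minus an 8-byte key
MIN_SAMPLING_FRACTION = 0.25    # assumption: inferred from 2.7 vs. 11 billion rows


def rows_covered(avg_key_bytes):
    """Return (summary entries, rows at full sampling, rows at minimum sampling)."""
    pool_bytes = HEAP_BYTES * POOL_FRACTION
    entries = pool_bytes / (avg_key_bytes + ENTRY_OVERHEAD_BYTES)
    full = entries * INDEX_INTERVAL
    minimum = full / MIN_SAMPLING_FRACTION
    return entries, full, minimum


for key_bytes in (8, 16):
    entries, full, minimum = rows_covered(key_bytes)
    print(f"{key_bytes}-byte keys: {entries / 1e6:.1f}M entries, "
          f"{full / 1e9:.1f}B rows at full sampling, "
          f"{minimum / 1e9:.1f}B rows at minimum sampling")
```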

bq. Isn't it a minor bug to just ignore compacting sstables? Suggest reducing 
memory pool to allocate to the uncompacting ones, by the amount allocated to 
the compacting ones.

Good point, I agree.

bq. Could we just resample at compaction time instead of dealing with 
refcounting or locking? That probably gives up too much of the potential 
benefits.

Yeah, that would probably be okay for small sstables that are compacted 
frequently, but the large sstables would be tuned poorly, and those make up the 
majority of the memory use.

bq. I think we could make it almost as elegant by using the datatracker replace 
mechanism originally for compaction, to build a new SSTR and swap it in w/o 
extra concurrency controls.

That's a good idea; I think it would be fairly clean.  I'll give that a shot.

bq. Is the idea behind touching it in DD to force the mbean to be loaded, or is 
there a circular dependency that breaks w/o that?

Neither the {{IndexSummaryManager}} singleton nor the mbean is loaded without 
that.  No other classes use the {{IndexSummaryManager}}, so the static fields 
are never initialized.  (Just importing the classes doesn't seem to trigger 
the class loader.)



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823091#comment-13823091
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

Pushed my cleanup to https://github.com/jbellis/cassandra/commits/5519.

(Moved the ISM init to StorageService, where we have some existing examples of 
similar initialization.)



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823099#comment-13823099
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

Rather than expose MSL directly as a config option, how about changing 
index_interval to max_index_interval and adding a min_index_interval?  We could 
compute (as close as possible) MSL from min_index_interval.

(I don't think users will need to tune BSL.  128 lets us be accurate to within 
1%, which seems totally reasonable to me.)



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-14 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823118#comment-13823118
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


The cleanup looks good overall, thanks.

bq. Rather than expose MSL directly as a config option, how about changing 
index_interval to max_index_interval and adding a min_index_interval? We could 
compute (as close as possible) MSL from min_index_interval.

That sounds good to me.

bq. (I don't think users will need to tune BSL. 128 lets us be accurate to 
within 1%, which seems totally reasonable to me.)

Agreed.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827060#comment-13827060
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


Would it be alright to split the replacement of {{index_interval}} by 
{{max_index_interval}} and {{min_index_interval}} into another ticket just for 
sanity's sake?  It looks like a lot of changes need to be done for that, and 
they're independent of the changes for this ticket.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827100#comment-13827100
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

WFM.  What does that leave for this one?



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827108#comment-13827108
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


bq. WFM. What does that leave for this one?

I still need to account for space used by compacting SSTables, and I'm putting 
it through some more thorough testing.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827114#comment-13827114
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


Created CASSANDRA-6379 for the {{index_interval}} changes.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829406#comment-13829406
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

I'm pretty sure we can get rid of the isReplaced flag now.

> Reduce index summary memory use for cold sstables
> -------------------------------------------------
>
> Key: CASSANDRA-5519
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5519
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Tyler Hobbs
> Priority: Minor
> Fix For: 2.1
>
> Attachments: 5519-v1.txt, downsample.py
>
>






[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829417#comment-13829417
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


bq. I'm pretty sure we can get rid of the isReplaced flag now.

We still need it in order to do the proper cleanup on the replaced SSTR once 
all references are released, unless I'm missing something.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829170#comment-13829170
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


This should be good for a second round of reviewing.  I opened a pull request 
against my own repo so that you can comment inline, if you'd like: 
https://github.com/thobbs/cassandra/pull/1

Changes since the last review:
* The entire {{SSTableReader}} is replaced instead of just the IndexSummary.
* Space used by compacting SSTables is accounted for
* Enough extra space is reserved to cover rebuilding the largest summary
* In order to stay within the memory usage limit on startup, the on-disk 
Summary is replaced whenever it is resampled.  I increased the threshold for 
downsampling to make this less frequent.  The alternative would be to always 
keep the full summary on disk and have a somewhat more complicated startup 
procedure.  I would appreciate your thoughts on this.
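
The space accounting in the second and third bullets might look roughly like the following. How the patch actually applies the reserve is not spelled out here, so treat the function name, its parameters, and the simple subtraction as assumptions.

```python
def redistributable_bytes(capacity_bytes, compacting_summary_bytes, largest_summary_bytes):
    """Bytes of the pool left to share among non-compacting sstables.

    compacting_summary_bytes: sizes of the summaries held by compacting sstables,
        which are left alone but still count against the pool.
    largest_summary_bytes: headroom reserved for rebuilding the largest summary.
    """
    reserved = sum(compacting_summary_bytes) + largest_summary_bytes
    return max(0, capacity_bytes - reserved)
```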



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829475#comment-13829475
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

Hmm.  I see two uses of isReplaced:

# releaseReference, which can be reverted back to trunk form since isReplaced 
== !isCompacted
# close, which is only called by snapshot repair (and releaseReference) which 
will never do any index summary replacements



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829201#comment-13829201
 ] 

Jonathan Ellis commented on CASSANDRA-5519:
---

bq. the on-disk Summary is replaced whenever it is resampled

Good call; startup time is a big pain point for some people and we don't want 
to make that worse.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-21 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829178#comment-13829178
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


I should also mention that if you want to test it out, I suggest setting 
logging to TRACE for o.a.c.io.sstable.IndexSummaryManager, 
{{index_summary_capacity_in_mb}} to 1, and 
{{index_summary_resize_interval_in_minutes}} to 1.  That should give you a good 
picture of what's going on.



[jira] [Commented] (CASSANDRA-5519) Reduce index summary memory use for cold sstables

2013-11-22 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830123#comment-13830123
 ] 

Tyler Hobbs commented on CASSANDRA-5519:


bq. releaseReference, which can be reverted back to trunk form since isReplaced 
== !isCompacted

True

bq. close, which is only called by snapshot repair (and releaseReference) which 
will never do any index summary replacements

We still need to have different behavior for the {{close()}} call by snapshot 
repair, as it needs to perform the full close even though {{isCompacted}} will 
be false.  While we could add a parameter to close() or define a 
{{closeReplacedReader()}} method, it seems clearer and more future-proof to 
keep the isReplaced flag.
