[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-27 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383976#comment-14383976
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


I'm partial to the more explicit {{nextSamplePosition = 0}}, so I've committed 
that as f7856c22.  Thanks, I think that should wrap this ticket up.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383561#comment-14383561
 ] 

Benedict commented on CASSANDRA-8993:
-

I hate it when the explanation is simply that I'm an idiot (or if I'm 
charitable, that I forgot the reason). Given this explanation, I think either 
piece of code is as good as the other, so since it's your baby perhaps you can 
decide which you prefer?

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-26 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382867#comment-14382867
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


Since you +1'ed the current patch, I went ahead and committed it as 7ff25f0d.  
I can make a follow-up commit for the nextSamplePosition change if the proposed 
logic sounds correct to you.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-26 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382831#comment-14382831
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


bq. without a setNextSamplePosition(-minIndexInterval) it doesn't pass its test 
cases (i.e. initiating the first sample index deterministically to zero caused 
unit test failures)

I looked into this.  When I simply replaced the 
{{setNextSamplePosition(-minIndexInterval)}} call with {{nextSamplePosition = 
0}}, I got failures in IndexSummaryManagerTest.  Perhaps that's what you were 
seeing?  It looks like the correct replacement for that line is:

{noformat}
nextSamplePosition = 0;
indexIntervalMatches++;
{noformat}

since we implicitly have an index interval match at 0.  With that code, the 
index summary related tests pass.

bq.  I wonder if we couldn't get a lot of the benefit of downsampling by 
sticking to powers of 2, as it might simplify the code significantly? 

That's true, it would simplify the code, we would just lose granularity in how 
the index summary memory pool is distributed.  I wouldn't be opposed to doing 
that, but I don't think it's a pressing issue yet.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382505#comment-14382505
 ] 

Benedict commented on CASSANDRA-8993:
-

OK, so it makes a lot more sense now that I realise the downsampling 
granularity can be so small - I was thrown by the "BSL must be a power of two" 
without an equivalent statement for the sampling itself, and in my head I just 
assumed it was all dealing with powers of 2 (so nothing technically wrong with 
the comments, just my interpretation of them). This also explains why the 
effective index intervals were always the same - with powers of 2 they would 
be. I wonder if we couldn't get a lot of the benefit of downsampling by 
sticking to powers of 2, as it might simplify the code significantly? The 
original indices, indices to skip, and effective intervals could each be 
implemented with approximately one simple statement. Not pushing for it, mind, 
just airing the question.

Thanks for taking the time to explain, anyway, and with that clarification I am 
+1 the patch as stands.

On the topic of the zero index always being present: I can vouch that this 
assumption breaks somewhere, because I assumed this to be the case when 
modifying IndexSummaryBuilder, and without a 
setNextSamplePosition(-minIndexInterval) it doesn't pass its test cases (i.e. 
initiating the first sample index deterministically to zero caused unit test 
failures). So we should perhaps track down where the logical flaw is, however 
minor it may be.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-26 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382302#comment-14382302
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


I'll try to explain a bit about how downsampling works overall so that more 
people besides myself understand how it works :)

I can put whatever info is useful into comments for posterity.

bq. If I print out the "original indices" and "effective intervals", it seems 
that at the first downsampling level (64)

The sampling level after minimal downsampling is 127, not 64.  The sampling 
level can be anywhere between 0 and BASE_SAMPLING_LEVEL.  When a summary moves 
from sampling level 128 to level 127, it will drop one summary entry with an 
index between \[0, 127\], one entry between \[127, 255\], and so on for the 
rest of the summary.  The index to drop is determined by 
{{Downsampling.getSamplingPattern()}}. The list of integers returned from 
{{Downsampling.getSamplingPattern(BASE_SAMPLING_LEVEL)}} are the indexes that 
we'll drop for each round of downsampling.

As an example, suppose BASE_SAMPLING_LEVEL is 16 instead of 128.  
{{Downsampling.getSamplingPattern(16)}} returns the following pattern:

{noformat}
15, 7, 11, 3, 13, 5, 9, 1, 14, 6, 10, 2, 12, 4, 8, 0
{noformat}

So, when we move from sampling level 16 to 15, we'll drop the entry at index 15 
(and repeat that for indexes 15 + (16 * 1), 15 + (16 * 2), 15 + (16 * 3), etc). 
 When we move from sampling level 15 to 14, we'll drop the entry at index 7 
(and repeat as before, but take into account the fact that we've already 
dropped the entry at index 15).  This pattern of dropping minimizes the maximum 
distance between remaining summary entries.

Now, in practice, we will never move from sampling level 128 directly to level 
127 because of IndexSummaryManager's {{DOWNSAMPLE_THRESHOLD}}.  However, an 
index summary could go through multiple rounds of down and upsampling and 
arrive at level 127, so we need to be able to handle that.

bq. Further confusion to understanding Downsampling as a whole stems from the 
permission of a -1 index into getEffectiveIndexIntervalAfterIndex without 
explanation

Hmm, yeah, looking at the code, I don't think we actually need to handle that.  
I believe it is leftover logic from earlier in the development of the code when 
downsampling would remove the 0th index in an earlier round.  With the current 
code, the 0th index entry should always be present.  I'll make some changes to 
remove that.

bq. and the fact that every effective interval is the same despite there being 
multiple avenues for calculating it

I'm not sure what you mean here. 

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381053#comment-14381053
 ] 

Benedict commented on CASSANDRA-8993:
-

bq.  that it should hopefully clarify the reason.

Unfortunately not, at least for me. I am afraid I don't really follow the 
Downsampling logic, and the comment actually confused me further. I tried to 
print out the various static method states in Downsampling to visualise them 
better, and it still doesn't make much sense to me, so I'll outline my 
confusion and we can decide either to help me understand, or to perhaps get 
[~jbellis] (the original reviewer iirc) to step in. I should note that, either 
way, I expect the test you've introduced to be sound for solving this problem, 
so if this takes too long we can roll a version without further ceremony, but I 
think it may be an unnecessarily lengthy test on startup for old format files, 
which could be onerous.

bq. Unfortunately, the first entry to be dropped is the entry at index 
(BASE_SAMPLING_LEVEL - 1), so we need to check a full set of 
BASE_SAMPLING_LEVEL entries.

If I print out the "original indices" and "effective intervals", it seems that 
at the first downsampling level (64) alternating summary entries are dropped 
(and again for each extra level), up to dropping all but each 128th (beginning 
with zero), so the first half of the comment doesn't seem to match with the 
behaviour I see in a way I understand. If the behaviour matches what I see, and 
not the comment, then it seems we could safely just check the first two 
"expected" intervals and if they mismatch we've downsampled and need to 
rebuild. This would translate into always just checking 2 * minIndexInterval 
records in the index, or 1/64th the data.

Further confusion to understanding Downsampling as a whole stems from the 
permission of a -1 index into getEffectiveIndexIntervalAfterIndex without 
explanation, and the fact that every effective interval is the same despite 
there being multiple avenues for calculating it (it would be much clearer if it 
were just minIndexInterval * (BASE_SAMPLING_LEVEL / samplingLevel).

I apologise if I'm just being a bit dimwitted.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-25 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380553#comment-14380553
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


bq. Might it be easier to filter the list of readers earlier

Sure, there's no good reason not to do this earlier, so I've moved it to the 
suggested location.

bq. Could you clarify SSTableReader Line 875: why do we check 
BASE_SAMPLING_LEVEL entries?

I meant to explain this a little better, thanks for catching that.  I've 
improved the comment enough that it should hopefully clarify the reason.

My [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-8993] has been 
updated in addition to the v2 patch.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1-v2.txt, 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380252#comment-14380252
 ] 

Benedict commented on CASSANDRA-8993:
-

* Might it be easier to filter the list of readers earlier, in 
{{redistributeSummaries(List compacting, List 
nonCompacting, long memoryPoolBytes)}} ?
* Could you clarify SSTableReader Line 875: why do we check BASE_SAMPLING_LEVEL 
entries?

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993-2.1.txt, 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377985#comment-14377985
 ] 

Benedict commented on CASSANDRA-8993:
-

[~Hachmann]: Yes, that's the expectation - I'm as certain as I can be that this 
will prevent the underlying bug causing any problems. We would love for you to 
corroborate this in the wild, though, by deploying the patch if you can.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377981#comment-14377981
 ] 

Björn Hachmann commented on CASSANDRA-8993:
---

Hi @Benedict,

as I do not completely understand the discussion in this ticket, I ask myself, 
if the attached patch might also be a cure for the NPEs reported for 
CASSANDRA-8851? 

Thank you + Kind regards!

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370218#comment-14370218
 ] 

Benedict commented on CASSANDRA-8993:
-

Huh, you're right, it's CQL rows we track. 

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370213#comment-14370213
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


I don't think we actually track a row count for sstables anywhere (including 
SSTable/StatsMetadata), so we'll need to compare summary entries against index 
entries for the first 2048 index entries.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370187#comment-14370187
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


For the fix, we'll need to do two things. First, stop overwriting summaries 
when the sstable is not a 2.1 format.  Second, to recover from this bug, when 
we load sstables with the old format, we need to check the summary against the 
actual index to verify that it's at full sampling.  If not, trigger a rebuild 
of the summary.  Unfortunately, I think that will need to be done every time 
until the sstables have been converted to the new format.  We could avoid that 
by persisting a flag somewhere, but I don't think the additional complexity is 
worth the savings in startup time.

I'll work on the patch for this.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370178#comment-14370178
 ] 

Benedict commented on CASSANDRA-8993:
-

Could we corroborate this using SSTableMetadata rowCount when deserializing?

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370150#comment-14370150
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


Okay, I think I have an idea of what's happening.  The 2.0-format SSTable is 
loaded by 2.1 and is eventually downsampled.  This results in the summary being 
overwritten on disk (with every 2048th index entry).  However, we overwrite it 
with the old 2.0 format, which does _not_ include the sampling level.  Whenever 
the sstable and summary are reloaded (due to a restart, etc), there's no 
sampling level to read from the serialized summary, so it's assumed to be fully 
sampled (with every 128th index entry).  I'm pretty sure that's what's leading 
to this mismatch.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370092#comment-14370092
 ] 

Benedict commented on CASSANDRA-8993:
-

bq. I'm assuming you inspected the serialized index summary and saw an (old) 
interval of 2048?

No, I'm using the summary to search the index, and logging how far I have to go 
to find the record I want. It ranges up to 2048 items. The value we deserialize 
is 128, so presumably it was serialized incorrectly, but I'm really not at all 
familiar with this to know it isn't a conversion problem of some kind.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370087#comment-14370087
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


bq. I have had another closer look. It seems that we're reading an interval of 
128, when in fact it is 2048.

Are you referring to when we deserialize the index summary?  The default 
{{index_interval}} in 2.0 was 128.  In 2.1, we use the old value of 
{{index_interval}} for {{min_index_interval}} and set {{max_index_interval}} to 
2048.  Unfortunately there was a bug in cqlsh in 2.1, so neither property is 
shown in the {{DESCRIBE}} output.  I'm assuming you inspected the serialized 
index summary and saw an (old) interval of 2048?

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370027#comment-14370027
 ] 

Benedict commented on CASSANDRA-8993:
-

I've pushed the current mitigation, anyway

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369998#comment-14369998
 ] 

Benedict commented on CASSANDRA-8993:
-

I have had another closer look. It seems that we're reading an interval of 128, 
when in fact it is 2048. Exactly why this is happening, I don't know. I've 
tried to poll git to see where it's gone wrong, but it's probably more easily 
done by someone with knowledge of the history here.

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369973#comment-14369973
 ] 

Benedict commented on CASSANDRA-8993:
-

Either way, I think it would be helpful to expand the current test coverage to 
include summaries after the serialization (under different de/serialization 
settings)

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369967#comment-14369967
 ] 

Benedict commented on CASSANDRA-8993:
-

Early opening doesn't touch the index or index summary contents. I've 
reproduced the problem on the raw sstable directly with some hacky code to just 
open the index and summary directly. The sstable is a prior format, so it has 
never been successfully subjected to early opening or rewriting. You can 
download the files from CASSANDRA-8851 yourself, and I can provide you with the 
gpg key. I suspect the problem is related to the elimination of "indexInterval" 
from CfMetaData prematurely. It looks like it is needed to establish the actual 
sampling level - users that have modified this will have the incorrect level 
set after upgrade until their sstables are rewritten.



> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8993) EffectiveIndexInterval calculation is incorrect

2015-03-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369903#comment-14369903
 ] 

Tyler Hobbs commented on CASSANDRA-8993:


I'm okay with the patch to mitigate incorrect calculations, but at the moment 
I'm at a loss as to where the calculation is going wrong.  We have pretty 
thorough test coverage around downsampling (especially with the modifications 
to the test in your patch).  I can only guess that it may be some interaction 
between downsampling and early opening.  We don't have a way to reproduce yet, 
correct?  Do you think extending the SSTableRewriterTests with checks on 
{{getPosition(..., EQ)}} and downsampling of index summaries would yield 
something?

> EffectiveIndexInterval calculation is incorrect
> ---
>
> Key: CASSANDRA-8993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8993
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Blocker
> Fix For: 2.1.4
>
> Attachments: 8993.txt
>
>
> I'm not familiar enough with the calculation itself to understand why this is 
> happening, but see discussion on CASSANDRA-8851 for the background. I've 
> introduced a test case to look for this during downsampling, but it seems to 
> pass just fine, so it may be an artefact of upgrading.
> The problem was, unfortunately, not manifesting directly because it would 
> simply result in a failed lookup. This was only exposed when early opening 
> used firstKeyBeyond, which does not use the effective interval, and provided 
> the result to getPosition().
> I propose a simple fix that ensures a bug here cannot break correctness. 
> Perhaps [~thobbs] can follow up with an investigation as to how it actually 
> went wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)