RE: DTCS Question

2016-03-22 Thread Anubhav Kale
I tried the patch, and while it works almost the same way as I described below, 
it does deviate at times due to the "alignment" logic.

I opened https://issues.apache.org/jira/browse/CASSANDRA-11407 and provided a 
patch there. I would appreciate thoughts. 

-Original Message-
From: Anubhav Kale [mailto:anubhav.k...@microsoft.com] 
Sent: Thursday, March 17, 2016 12:08 PM
To: dev@cassandra.apache.org
Subject: RE: DTCS Question

Thanks for the long explanation, I looked at the link you pointed to and it 
does seem to concur with my mental model. 

Do you see any issues with that model, and with simplifying this logic to:

1. Create windows from start (min) to end (max) going from maximum possible 
size.
2. Scan all SS Tables and put them in appropriate buckets.

To be honest, the Target class is really difficult to reason about. The reason 
I investigated this was we wanted to reason about how our SS Tables are 
looking, and I unfortunately can't.

Thanks again for the explanation !!

-Original Message-
From: Björn Hegerfors [mailto:bj...@spotify.com]
Sent: Thursday, March 17, 2016 11:19 AM
To: dev@cassandra.apache.org
Subject: Re: DTCS Question

That is probably close to the actual way it works, but not quite equal. My 
mental model when making this went backwards in time, towards 0, not forwards.

It's something like this (using the numbers from your first example): make a 
bucket of the specified "timeUnit" size (1000), that contains the "now"
timestamp (4050), where the starting (and therefore also the ending) timestamp 
of the bucket is 0 modulo the size of the bucket. That last point is perhaps 
the trickiest point to follow. There is only one such place for the bucket, 
[4000-5000) in this case. No other bucket that is aligned with the 1000s can 
contain 4050.

Now, the next bucket (backwards) is computed based on this [4000-5000) bucket. 
Most of the time it will simply be the same-sized bucket right before it, i.e. 
[3000-4000), but if the start timestamp of our bucket (4000), divided by its 
size (so 4), is 0 modulo "base" (2 in this case), which it happens to be here, 
then we increase our bucket size "base" times, and instead make the bucket of 
*that* size that ends right before our current bucket. So the result will be 
[2000-4000).

This method of getting the next bucket is repeated until we reach timestamp 0. 
Using the above logic, we don't increase the size of the bucket this time, 
because we have a start timestamp of 2000 which becomes 1 when divided by the 
size (2000). So we end up with [0-2000), and we're done.
The buckets were [4000-5000), [2000-4000) and [0-2000).
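
The backwards walk described above can be sketched in a few lines. This is only an illustration of the rules as stated in this message, not the code in Cassandra's Target class; the function name dtcs_buckets and its parameter names are made up for the example.

```python
def dtcs_buckets(now, time_unit, base):
    """Walk backwards from `now` to 0, returning [start, end) buckets.

    Hypothetical helper that follows the rules described in the email;
    it is not taken from the Cassandra source.
    """
    size = time_unit
    # The first bucket is the only size-aligned bucket that contains `now`.
    start = now - now % size
    buckets = [(start, start + size)]
    while start > 0:
        # If start / size is 0 modulo base, the next bucket is `base` times larger.
        if (start // size) % base == 0:
            size *= base
        # Either way, the next bucket ends right where the current one starts.
        start -= size
        buckets.append((start, start + size))
    return buckets


print(dtcs_buckets(4050, 1000, 2))
# [(4000, 5000), (2000, 4000), (0, 2000)]
```

With now=4050, timeUnit=1000 and base=2 this reproduces the three buckets listed above.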

What's more important than understanding these rules is of course getting some 
kind of intuition for this. Here's what it boils down to: we want there to be 
"base" equally sized buckets right next to each other before we
*coalesce* them. Every bucket is aligned with its own size (as an analogy, 
compilers typically align 4-byte integers on addresses divisible by 4, same 
concept). So, by extension, the bigger bucket they coalesce into must be 
aligned with *its* size. Not just any "base" adjacent buckets will do, it will 
be those that align with the next size.

The remaining question is when do they coalesce? There will always be at least 
1 and at most "base" buckets of every size. Say "base"=4, then there can be 4 
buckets of some size (by necessity next to each other and aligned on 4 times 
their size). The moment a new bucket of the same size appears, the 4 buckets 
become one and this "fifth" bucket will be alone with its size (and the start 
of a new group of 4 such buckets). (The rule for making the bucket where the 
"now" timestamp lives is where new buckets come from.)
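
To make the coalescing concrete, here is a small sketch that applies the same backwards walk with "base"=4 and a bucket size of 1, and checks the two properties claimed above: every bucket is aligned with its own size, and there are between 1 and "base" buckets of each size present. The helper names (dtcs_buckets, check_invariants) are hypothetical and only follow the rules described in this message, not the Cassandra source.

```python
from collections import Counter


def dtcs_buckets(now, time_unit, base):
    """Hypothetical helper: backwards bucket walk as described in the email."""
    size = time_unit
    start = now - now % size
    buckets = [(start, start + size)]
    while start > 0:
        if (start // size) % base == 0:
            size *= base
        start -= size
        buckets.append((start, start + size))
    return buckets


def check_invariants(now, time_unit, base):
    """Return True if the alignment and count-per-size properties both hold."""
    buckets = dtcs_buckets(now, time_unit, base)
    # Every bucket starts on a multiple of its own size.
    aligned = all(start % (end - start) == 0 for start, end in buckets)
    # Between 1 and `base` buckets exist for each size that appears.
    counts = Counter(end - start for start, end in buckets)
    bounded = all(1 <= n <= base for n in counts.values())
    return aligned and bounded


# Four size-1 buckets at now=3; at now=4 they coalesce into [0, 4).
print(dtcs_buckets(3, 1, 4))  # [(3, 4), (2, 3), (1, 2), (0, 1)]
print(dtcs_buckets(4, 1, 4))  # [(4, 5), (0, 4)]
print(all(check_invariants(now, 1, 4) for now in range(1000)))  # True
```

At now=3 there are four adjacent size-1 buckets; the moment the "fifth" one appears at now=4, the first four become the single bucket [0-4).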

I wish this was easier to explain in simple terms. I personally find this to 
have very nice properties, in that it gives every bucket a fair amount of time 
to settle before it's time for the next compaction.

Interestingly, I proposed an alternative algorithm in this ticket, 
including a patch implementing it. My gut tells me that the mental model that 
you've used here is actually equivalent to that algorithm in the ticket. It's 
just expressed in a very different way. Might be something for me to try to 
prove when I'm really bored :)

Hope this helped! Any particular reason you're investigating this?

/
Bj0rn

On Thu, Mar 17, 2016 at 5:43 PM, Anubhav Kale 
wrote:

> Hello,
>
> I am trying to concretely understand how DTCS makes buckets and I am 
> looking at the DateTieredCompactionStrategyTest.testGetBuckets method 
> and played with some of the parameters to GetBuckets method call 
> (Cassandra 2.1.12).
>
> I don't think I fully understand som

[GitHub] cassandra pull request: Change token validation of Murmur3Partitio...

2016-03-22 Thread verma7
Github user verma7 closed the pull request at:

https://github.com/apache/cassandra/pull/64


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] cassandra pull request: Change token validation of Murmur3Partitio...

2016-03-22 Thread verma7
GitHub user verma7 opened a pull request:

https://github.com/apache/cassandra/pull/64

Change token validation of Murmur3Partitioner to display a helpful error
message if the token is invalid

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/verma7/cassandra 9348-trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/64.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #64


commit 11c52b904bc284dd0738181ff9b208671f7f5522
Author: Abhishek Verma 
Date:   2016-03-22T21:18:28Z

Change token validation of Murmur3Partitioner to display a helpful error 
message if the token is invalid



