Re: DTCS Question

2016-03-19 Thread Marcus Eriksson
On Wed, Mar 16, 2016 at 6:49 PM, Anubhav Kale wrote:

> I am using Cassandra 2.1.13, which has all the latest DTCS fixes (it does
> STCS within the DTCS windows). It also introduced a setting called
> MAX_WINDOW_SIZE, which defaults to one day.
>
> So in my data folders I may see SSTables that span more than a day
> (generated from old data via repairs or commit logs), but whenever I
> see a “Compacted Foo” message in the logs (meaning the SSTable in question
> was definitely the result of a compaction), the “Foo” SSTable should never
> contain data spanning more than a day. Is this understanding accurate?
>
No, not until https://issues.apache.org/jira/browse/CASSANDRA-10496 (read
the ticket for an explanation).
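To see why the answer is no, here is a toy model (an assumption for illustration, not Cassandra's implementation): sstables are bucketed into one-day windows by their minimum timestamp, and compaction merges a whole bucket into a single output whose time span is simply the union of its inputs. Nothing in this model splits the output by time, so one wide sstable streamed in by repair makes the compacted result wide too. The names `SSTable`, `bucket_by_window` and `compact` are hypothetical, and timestamps are plain seconds rather than Cassandra's microseconds.

```python
from collections import defaultdict
from typing import NamedTuple

DAY = 86_400  # seconds

class SSTable(NamedTuple):
    min_ts: int  # oldest write timestamp in the sstable (toy units: seconds)
    max_ts: int  # newest write timestamp

def bucket_by_window(sstables, window=DAY):
    """Toy assumption: group sstables by the window holding their min timestamp."""
    buckets = defaultdict(list)
    for s in sstables:
        buckets[s.min_ts // window].append(s)
    return buckets

def compact(bucket):
    """Merge a bucket into one sstable; its span is the union of the inputs' spans."""
    return SSTable(min(s.min_ts for s in bucket), max(s.max_ts for s in bucket))

# Three small flushed sstables from day 10, plus one streamed in by repair
# whose data runs from day 10 all the way to day 13.
flushed = [SSTable(10 * DAY + i * 3600, 10 * DAY + i * 3600 + 600) for i in range(3)]
repaired = SSTable(10 * DAY + 100, 13 * DAY)

buckets = bucket_by_window(flushed + [repaired])
out = compact(buckets[10])
span_days = (out.max_ts - out.min_ts) / DAY
print(f"compacted output spans {span_days:.1f} days")  # prints "compacted output spans 3.0 days"
```

All four inputs land in the same day-10 bucket, so the “Compacted” output inherits the three-day span of the repaired sstable even though the window is one day wide.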


>
> If repairs pull in old data, should MAX_WINDOW_SIZE instead be set to a
> larger value so that we don’t run the risk of too many SSTables lying
> around and never getting compacted?
>
No. With CASSANDRA-10280, that old data will get compacted if needed
(assuming you have default settings). If the remote node is correctly
date-tiered, the streamed sstable will also be correctly date-tiered. That
streamed sstable is then placed in a time window, and if there are enough
sstables in that old window, we compact them.
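A rough sketch of that rule, under loudly stated assumptions: this is a simplified model, not Cassandra's code. It walks time windows backwards from `now`, letting them grow by a factor as data ages but never past a maximum size (the role MAX_WINDOW_SIZE plays in the question), then flags a window for compaction once it holds at least `min_threshold` sstables. The function names, the growth factor of 4, the threshold of 4, and representing each sstable by a single timestamp are all hypothetical choices for illustration; real DTCS also runs STCS inside each window to pick which sstables to merge.

```python
from collections import Counter

def dtcs_style_windows(now, base, factor, max_window, count):
    """Return up to `count` (start, size) windows, newest first (toy model).

    Windows grow by `factor` as they get older but never beyond `max_window`.
    """
    size = base
    div = now // size
    windows = []
    while len(windows) < count and div >= 0:
        windows.append((div * size, size))
        if div % factor != 0 or size * factor > max_window:
            div -= 1                 # next older window of the same size
        else:
            size *= factor           # switch to windows `factor` times larger
            div = div // factor - 1  # first larger window ending where this one starts
    return windows

def window_of(ts, windows):
    """Find the window containing timestamp `ts`, or None if it is too old."""
    for start, size in windows:
        if start <= ts < start + size:
            return (start, size)
    return None

def windows_to_compact(sstable_ts, windows, min_threshold=4):
    """Windows holding at least `min_threshold` sstables (hypothetical rule)."""
    counts = Counter(window_of(ts, windows) for ts in sstable_ts)
    return sorted(w for w, n in counts.items() if w is not None and n >= min_threshold)

# Toy units: base window 10, growth factor 4, windows capped at 40.
ws = dtcs_style_windows(now=100, base=10, factor=4, max_window=40, count=10)
print(ws)  # prints [(100, 10), (90, 10), (80, 10), (40, 40), (0, 40)]

old = [41, 50, 60, 95]                  # three sstables in the old (40, 40) window
print(windows_to_compact(old, ws))      # prints []
streamed = old + [70]                   # a streamed sstable lands in that window
print(windows_to_compact(streamed, ws)) # prints [(40, 40)]
```

In this model the streamed sstable is not left stranded: it joins the old window its timestamp falls into, and that window becomes a compaction candidate once it reaches the threshold, which is why raising the maximum window size is not needed for that purpose.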

/Marcus


RE: DTCS Question

2016-03-18 Thread Anubhav Kale
Thanks for the explanation.

From: Marcus Eriksson [mailto:krum...@gmail.com]
Sent: Thursday, March 17, 2016 12:56 AM
To: user@cassandra.apache.org
Subject: Re: DTCS Question