Re: concurrent sstable read

2022-10-25 Thread Jeff Jirsa
Sequentially, and yes - for some definition of "directly". It's not just
because the reads are sequential: each sstable also has a cost to read
(e.g. JVM garbage created when you open/seek it that has to be collected
after the read).

On Tue, Oct 25, 2022 at 8:27 AM Grzegorz Pietrusza 
wrote:

> HI all
>
> I can't find any information about how cassandra handles reads involving
> multiple sstables. Are sstables read concurrently or sequentially? Is read
> latency directly connected to the number of opened sstables?
>
> Regards
> Grzegorz
>


concurrent sstable read

2022-10-25 Thread Grzegorz Pietrusza
Hi all,

I can't find any information about how Cassandra handles reads involving
multiple sstables. Are sstables read concurrently or sequentially? Is read
latency directly connected to the number of open sstables?

Regards
Grzegorz


sstable-to-arrow

2021-07-28 Thread Sebastian Estevez
Hi folks,

There was some discussion on here a couple of weeks ago about using the
Apache Arrow in-memory format for Cassandra data, so I thought I'd share the
following posts / code we just released as alpha (Apache 2 license).


Code:
https://github.com/datastax/sstable-to-arrow

Post Part 1:
https://www.datastax.com/blog/analyzing-cassandra-data-using-gpus-part-1
Post Part 2:
https://www.datastax.com/blog/analyzing-cassandra-data-using-gpus-part-2

I also think the cross-language sstable parsing code and visual
documentation are a tremendous contribution to the project and would love
to see more folks pick it up and use it for other purposes.

If anyone is interested, feel free to reach out or join our live workshop on
this topic in mid-August:

https://www.eventbrite.com/e/workshop-analyzing-cassandra-data-with-gpus-tickets-164294668777

--Seb


Re: sstable processing times

2020-10-24 Thread Erick Ramirez
The operation will run in a single anti-compaction thread, so it won't
consume more than 1 CPU. The operation will mostly be IO-bound, with the
disk being the main bottleneck. Are you running it on a direct-attached SSD?
It won't perform well if you're running it on an EBS volume or some other
slow disk. Cheers!
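
If you want to confirm it really is IO-bound rather than CPU-bound, standard
Linux tools give a rough picture (a hedged sketch; requires the sysstat
package, and device names will differ):

    # High %util / await on the data disk with mostly-idle CPUs suggests IO-bound
    iostat -xz 2

    # Confirm only ~1 core is busy
    mpstat -P ALL 2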


sstable processing times

2020-10-23 Thread James A. Robinson
Hi folks,

I'm running a job on an offline node to test how long it takes to run
sstablesplit on several large sstables.

I'm a bit dismayed to see it took about 22 hours to process a 1.5
gigabyte sstable!  I worry about the 32 gigabyte sstable that is my
ultimate target to split.

This is running on an otherwise unloaded Linux 3.10.0 CentOS 7 server
with 4 cpus and 24 gigabytes of ram.  Cassandra 3.11.0 and OpenJDK
1.8.0_252 are the installed versions of the software.

The machine isn't very busy itself: it looks as though Java is only
making use of 1 of the 4 processors, and it's not using much of the
available 24 gigabytes of memory either. All the memory usage is in
the Linux buffer cache, which I guess makes sense if it's just working
through these large files without needing to do a lot of heavy computation
on what it reads from them.

When you folks run sstablesplit, do you provide specific
CASSANDRA_INCLUDE settings to increase its performance?

Jim
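
For reference, this is roughly how I invoke the offline split (a sketch, not
a tuning recipe: the node must be stopped first, the path and the 50 MB
target size are just examples, and as far as I know CASSANDRA_INCLUDE only
points the tool at an environment file rather than improving performance):

    # Stop Cassandra before running offline sstable tools
    nodetool drain && sudo service cassandra stop

    # Split into ~50 MB chunks without taking a snapshot first
    sstablesplit --no-snapshot -s 50 \
        /var/lib/cassandra/data/my_ks/my_table-*/mc-*-big-Data.db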




Re: Corrupt SSTable

2020-08-13 Thread Nitan Kainth
If you are not deleting or updating data, then it should be safe to use the
2nd approach.

Regards,
Nitan
Cell: 510 449 9629

> On Aug 13, 2020, at 11:48 AM, Pushpendra Rajpoot 
>  wrote:
> 
> 
> Hi,
> 
> I have a cluster of 2 DC, each DC has 5 nodes in production. This cluster is 
> based on active-passive model i.e. application is writing data on one DC 
> (Active) & it's replicated to other DC (Passive).
> 
> My Passive DC has corrupt sstables (3 nodes out of 5 nodes) whereas there are 
> no corrupt sstables on any node in Active DC.
> 
> Now, I am planning to do the following activity on affected nodes in Passive 
> DC:
> 1. delete keyspace having corrupt sstables
> 2. run 'nodetool rebuild'
> 
> DO you see any problem in the above approach ?
> 
> Another approach which I am considering as 'Second Option' is as given below:
> 1. Stop the affected node
> 2. Remove all corrupt sstables from the affected node
> 3. Start the affected node
> 4. Run repair on the affected node
> 
> In my case, data is not written to Passive DC and has corrupt sstables.
> 
> Does the 2nd approach lead to data resurrection ?
> 
> I found in one of the threads in 'Mailing List' that data can ressurect if I 
> run 'nodetool repair'. Because of this reason, I considered this as a 
> secondary approach.
> 
> Do you have any other approach to fix this problem OR I can go with one of 
> the above approaches?
> 
> Regards,
> Pushpendra


Corrupt SSTable

2020-08-13 Thread Pushpendra Rajpoot
Hi,

I have a cluster of 2 DCs; each DC has 5 nodes in production. This cluster
is based on an active-passive model, i.e. the application writes data to one
DC (Active) and it's replicated to the other DC (Passive).

My Passive DC has corrupt sstables (on 3 nodes out of 5), whereas there
are no corrupt sstables on any node in the Active DC.

Now, I am planning to do the following activity on the affected nodes in the
Passive DC:
1. Delete the keyspace that has the corrupt sstables
2. Run 'nodetool rebuild'

Do you see any problem with the above approach?

Another approach, which I am considering as a 'second option', is given
below (a command sketch follows at the end of this mail):
1. Stop the affected node
2. Remove all corrupt sstables from the affected node
3. Start the affected node
4. Run repair on the affected node

In my case, data is not written directly to the Passive DC, which has the
corrupt sstables.

Does the 2nd approach lead to data resurrection?

I found in one of the threads on the mailing list that data can resurrect if
I run 'nodetool repair'. For this reason, I consider this the secondary
approach.

Do you have any other approach to fix this problem, or can I go with one of
the above approaches?

Regards,
Pushpendra
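
(For the second option above, a rough command sketch with placeholder
keyspace/table names and paths; I would move the files aside rather than
delete them, so they can be restored if needed:)

    # 1. Stop the affected node
    nodetool drain && sudo service cassandra stop

    # 2. Move the corrupt sstables (all components of the affected generation) aside
    mkdir -p /var/lib/cassandra/corrupt_sstables
    mv /var/lib/cassandra/data/my_ks/my_table-*/mc-1234-big-* /var/lib/cassandra/corrupt_sstables/

    # 3. Start the affected node
    sudo service cassandra start

    # 4. Repair the affected table on this node
    nodetool repair -full my_ks my_table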


Re: Records in table after deleting sstable manually

2020-08-11 Thread Kunal
Thanks Jeff, appreciate your reply. As you said, it looks like there were
entries in the commitlogs, and when Cassandra was brought up after deleting
the sstables, the data from the commitlog was replayed. Maybe next time I
will let the replay happen after deleting the sstables and then truncate the
table using CQL; this will ensure the table is empty. I could not truncate
from CQL in the first place as one of the nodes was not up.

Regards,
Kunal

On Tue, Aug 11, 2020 at 8:45 AM Jeff Jirsa  wrote:

> The data probably came from either hints or commitlog replay.
>
> If you use `truncate` from CQL, it solves both of those concerns.
>
>
> On Tue, Aug 11, 2020 at 8:42 AM Kunal  wrote:
>
>> HI,
>>
>> We have a 3 nodes cassandra cluster and one of the table grew big, around
>> 2 gb while it was supposed to be few MBs. During nodetool repair, one of
>> the cassandra went down. Even after multiple restart, one of the node was
>> going down after coming up for few mins. We decided to truncate the table
>> by removing the corresponding sstable from the disk since truncating a
>> table from cqlsh needs all the nodes to be up which was not the case in our
>> env. After deleting sstable from disk on all the 3 nodes, we brought up
>> cassandra and all the nodes came up fine and dont see any issue , but we
>> observed the size of the sstable is~100MB which was bit strange and the
>> table has old rows (around 20K) from previous date, before removing the
>> rows were 500K. Not sure how the table has old records and sstable is of
>> ~100M even after removing the sstable.
>> Any ideas ? Any help to understand this would be appreciated.
>>
>> Regards,
>> Kunal
>>
>

-- 



Regards,
Kunal Vaid


Re: Records in table after deleting sstable manually

2020-08-11 Thread Jeff Jirsa
The data probably came from either hints or commitlog replay.

If you use `truncate` from CQL, it solves both of those concerns.
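
(As a minimal sketch of that, with a placeholder table name: TRUNCATE needs
all replicas reachable, and it records a truncation point so the truncated
data is not brought back by commitlog replay; clearing pending hints is an
extra, optional step.)

    # Truncate via CQL instead of deleting files on disk
    cqlsh -e "TRUNCATE my_ks.my_table;"

    # Optionally drop any hints still queued on this node
    nodetool truncatehints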


On Tue, Aug 11, 2020 at 8:42 AM Kunal  wrote:

> HI,
>
> We have a 3 nodes cassandra cluster and one of the table grew big, around
> 2 gb while it was supposed to be few MBs. During nodetool repair, one of
> the cassandra went down. Even after multiple restart, one of the node was
> going down after coming up for few mins. We decided to truncate the table
> by removing the corresponding sstable from the disk since truncating a
> table from cqlsh needs all the nodes to be up which was not the case in our
> env. After deleting sstable from disk on all the 3 nodes, we brought up
> cassandra and all the nodes came up fine and dont see any issue , but we
> observed the size of the sstable is~100MB which was bit strange and the
> table has old rows (around 20K) from previous date, before removing the
> rows were 500K. Not sure how the table has old records and sstable is of
> ~100M even after removing the sstable.
> Any ideas ? Any help to understand this would be appreciated.
>
> Regards,
> Kunal
>


Records in table after deleting sstable manually

2020-08-11 Thread Kunal
HI,

We have a 3-node Cassandra cluster, and one of the tables grew big (around 2
GB) while it was supposed to be a few MBs. During nodetool repair, one of the
Cassandra nodes went down. Even after multiple restarts, that node kept going
down a few minutes after coming up. We decided to truncate the table by
removing the corresponding sstables from disk, since truncating a table from
cqlsh needs all the nodes to be up, which was not the case in our env. After
deleting the sstables from disk on all 3 nodes, we brought up Cassandra and
all the nodes came up fine with no visible issues, but we observed that the
size of the sstable is ~100 MB, which was a bit strange, and the table has
old rows (around 20K) from a previous date; before the removal there were
500K rows. Not sure how the table has old records and an sstable of ~100 MB
even after removing the sstables.
Any ideas? Any help to understand this would be appreciated.

Regards,
Kunal


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Nitan Kainth
Yeah, I meant the down node can’t participate in repairs


Regards,
Nitan
Cell: 510 449 9629

> On May 27, 2020, at 2:09 PM, Leon Zaruvinsky  wrote:
> 
> 
> Yep, Jeff is right, the intention would be to run a repair limited to the 
> available nodes.
> 
>> On Wed, May 27, 2020 at 2:59 PM Jeff Jirsa  wrote:
>> The "-hosts " flag tells cassandra to only compare trees/run repair on the 
>> hosts you specify, so if you have 3 replicas, but 1 replica is down, you can 
>> provide -hosts with the other two, and it will make sure those two are in 
>> sync (via merkle trees, etc), but ignore the third.
>> 
>> 
>> 
>>> On Wed, May 27, 2020 at 10:45 AM Nitan Kainth  wrote:
>>> Jeff,
>>> 
>>> If Cassandra is down how will it generate merkle tree to compare?
>>> 
>>> 
>>> Regards,
>>> Nitan
>>> Cell: 510 449 9629
>>> 
>>>>> On May 27, 2020, at 11:15 AM, Jeff Jirsa  wrote:
>>>>> 
>>>> 
>>>> You definitely can repair with a node down by passing `-hosts 
>>>> specific_hosts`
>>>> 
>>>>> On Wed, May 27, 2020 at 9:06 AM Nitan Kainth  
>>>>> wrote:
>>>>> I didn't get you Leon,
>>>>> 
>>>>> But, the simple thing is just to follow the steps and you will be fine. 
>>>>> You can't run the repair if the node is down.
>>>>> 
>>>>>> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky 
>>>>>>  wrote:
>>>>>> Hey Jeff/Nitan,
>>>>>> 
>>>>>> 1) this concern should not be a problem if the repair happens before the 
>>>>>> corrupted node is brought back online, right?
>>>>>> 2) in this case, is option (3) equivalent to replacing the node? where 
>>>>>> we repair the two live nodes and then bring up the third node with no 
>>>>>> data
>>>>>> 
>>>>>> Leon
>>>>>> 
>>>>>>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>>>>>> There’s two problems with this approach if you need strict correctness 
>>>>>>> 
>>>>>>> 1) after you delete the sstable and before you repair you’ll violate 
>>>>>>> consistency, so you’ll potentially serve incorrect data for a while
>>>>>>> 
>>>>>>> 2) The sstable May have a tombstone past gc grace that’s shadowing data 
>>>>>>> in another sstable that’s not corrupt and deleting it may resurrect 
>>>>>>> that deleted data. 
>>>>>>> 
>>>>>>> The only strictly safe thing to do here, unfortunately, is to treat the 
>>>>>>> host as failed and rebuild it from it’s neighbors (and again being 
>>>>>>> pedantic here, that means stop the host, while it’s stopped repair the 
>>>>>>> surviving replicas, then bootstrap a replacement on top of the same 
>>>>>>> tokens)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky 
>>>>>>> >  wrote:
>>>>>>> > 
>>>>>>> > 
>>>>>>> > Hi all,
>>>>>>> > 
>>>>>>> > I'm looking to understand Cassandra's behavior in an sstable 
>>>>>>> > corruption scenario, and what the minimum amount of work is that 
>>>>>>> > needs to be done to remove a bad sstable file.
>>>>>>> > 
>>>>>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>>>>>> > SStable corruption exception on one node at 
>>>>>>> > keyspace1/table1/lb-1-big-Data.db
>>>>>>> > Sstablescrub does not work.
>>>>>>> > 
>>>>>>> > Is it safest to, after running a repair on the two live nodes,
>>>>>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>>>>>> > 2) Delete all files associated with that sstable (i.e., 
>>>>>>> > keyspace1/table1/lb-1-*),
>>>>>>> > 3) Delete all files under keyspace1/table1/, or
>>>>>>> > 4) Any of the above are the same from a correctness perspective.
>>>>>>> > 
>>>>>>> > Thanks,
>>>>>>> > Leon
>>>>>>> > 
>>>>>>> 
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>>>> 


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Leon Zaruvinsky
Yep, Jeff is right, the intention would be to run a repair limited to the
available nodes.

On Wed, May 27, 2020 at 2:59 PM Jeff Jirsa  wrote:

> The "-hosts " flag tells cassandra to only compare trees/run repair on the
> hosts you specify, so if you have 3 replicas, but 1 replica is down, you
> can provide -hosts with the other two, and it will make sure those two are
> in sync (via merkle trees, etc), but ignore the third.
>
>
>
> On Wed, May 27, 2020 at 10:45 AM Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> If Cassandra is down how will it generate merkle tree to compare?
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On May 27, 2020, at 11:15 AM, Jeff Jirsa  wrote:
>>
>> 
>> You definitely can repair with a node down by passing `-hosts
>> specific_hosts`
>>
>> On Wed, May 27, 2020 at 9:06 AM Nitan Kainth 
>> wrote:
>>
>>> I didn't get you Leon,
>>>
>>> But, the simple thing is just to follow the steps and you will be fine.
>>> You can't run the repair if the node is down.
>>>
>>> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky <
>>> leonzaruvin...@gmail.com> wrote:
>>>
>>>> Hey Jeff/Nitan,
>>>>
>>>> 1) this concern should not be a problem if the repair happens before
>>>> the corrupted node is brought back online, right?
>>>> 2) in this case, is option (3) equivalent to replacing the node? where
>>>> we repair the two live nodes and then bring up the third node with no data
>>>>
>>>> Leon
>>>>
>>>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>>>
>>>>> There’s two problems with this approach if you need strict correctness
>>>>>
>>>>> 1) after you delete the sstable and before you repair you’ll violate
>>>>> consistency, so you’ll potentially serve incorrect data for a while
>>>>>
>>>>> 2) The sstable May have a tombstone past gc grace that’s shadowing
>>>>> data in another sstable that’s not corrupt and deleting it may resurrect
>>>>> that deleted data.
>>>>>
>>>>> The only strictly safe thing to do here, unfortunately, is to treat
>>>>> the host as failed and rebuild it from it’s neighbors (and again being
>>>>> pedantic here, that means stop the host, while it’s stopped repair the
>>>>> surviving replicas, then bootstrap a replacement on top of the same 
>>>>> tokens)
>>>>>
>>>>>
>>>>>
>>>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky <
>>>>> leonzaruvin...@gmail.com> wrote:
>>>>> >
>>>>> > 
>>>>> > Hi all,
>>>>> >
>>>>> > I'm looking to understand Cassandra's behavior in an sstable
>>>>> corruption scenario, and what the minimum amount of work is that needs to
>>>>> be done to remove a bad sstable file.
>>>>> >
>>>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>>>> > SStable corruption exception on one node at
>>>>> keyspace1/table1/lb-1-big-Data.db
>>>>> > Sstablescrub does not work.
>>>>> >
>>>>> > Is it safest to, after running a repair on the two live nodes,
>>>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>>>> > 2) Delete all files associated with that sstable (i.e.,
>>>>> keyspace1/table1/lb-1-*),
>>>>> > 3) Delete all files under keyspace1/table1/, or
>>>>> > 4) Any of the above are the same from a correctness perspective.
>>>>> >
>>>>> > Thanks,
>>>>> > Leon
>>>>> >
>>>>>
>>>>> -
>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>>
>>>>>


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Jeff Jirsa
The "-hosts " flag tells cassandra to only compare trees/run repair on the
hosts you specify, so if you have 3 replicas, but 1 replica is down, you
can provide -hosts with the other two, and it will make sure those two are
in sync (via merkle trees, etc), but ignore the third.
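
(A hedged example with placeholder addresses and names; check `nodetool help
repair` for the exact flag spelling in your version:)

    # Repair only between the two live replicas, ignoring the down node
    nodetool repair -full -hosts 10.0.0.1 -hosts 10.0.0.2 my_ks my_table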



On Wed, May 27, 2020 at 10:45 AM Nitan Kainth  wrote:

> Jeff,
>
> If Cassandra is down how will it generate merkle tree to compare?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On May 27, 2020, at 11:15 AM, Jeff Jirsa  wrote:
>
> 
> You definitely can repair with a node down by passing `-hosts
> specific_hosts`
>
> On Wed, May 27, 2020 at 9:06 AM Nitan Kainth 
> wrote:
>
>> I didn't get you Leon,
>>
>> But, the simple thing is just to follow the steps and you will be fine.
>> You can't run the repair if the node is down.
>>
>> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky <
>> leonzaruvin...@gmail.com> wrote:
>>
>>> Hey Jeff/Nitan,
>>>
>>> 1) this concern should not be a problem if the repair happens before the
>>> corrupted node is brought back online, right?
>>> 2) in this case, is option (3) equivalent to replacing the node? where
>>> we repair the two live nodes and then bring up the third node with no data
>>>
>>> Leon
>>>
>>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>>
>>>> There’s two problems with this approach if you need strict correctness
>>>>
>>>> 1) after you delete the sstable and before you repair you’ll violate
>>>> consistency, so you’ll potentially serve incorrect data for a while
>>>>
>>>> 2) The sstable May have a tombstone past gc grace that’s shadowing data
>>>> in another sstable that’s not corrupt and deleting it may resurrect that
>>>> deleted data.
>>>>
>>>> The only strictly safe thing to do here, unfortunately, is to treat the
>>>> host as failed and rebuild it from it’s neighbors (and again being pedantic
>>>> here, that means stop the host, while it’s stopped repair the surviving
>>>> replicas, then bootstrap a replacement on top of the same tokens)
>>>>
>>>>
>>>>
>>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky <
>>>> leonzaruvin...@gmail.com> wrote:
>>>> >
>>>> > 
>>>> > Hi all,
>>>> >
>>>> > I'm looking to understand Cassandra's behavior in an sstable
>>>> corruption scenario, and what the minimum amount of work is that needs to
>>>> be done to remove a bad sstable file.
>>>> >
>>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>>> > SStable corruption exception on one node at
>>>> keyspace1/table1/lb-1-big-Data.db
>>>> > Sstablescrub does not work.
>>>> >
>>>> > Is it safest to, after running a repair on the two live nodes,
>>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>>> > 2) Delete all files associated with that sstable (i.e.,
>>>> keyspace1/table1/lb-1-*),
>>>> > 3) Delete all files under keyspace1/table1/, or
>>>> > 4) Any of the above are the same from a correctness perspective.
>>>> >
>>>> > Thanks,
>>>> > Leon
>>>> >
>>>>
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>
>>>>


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Nitan Kainth
Jeff,

If Cassandra is down how will it generate merkle tree to compare?


Regards,
Nitan
Cell: 510 449 9629

> On May 27, 2020, at 11:15 AM, Jeff Jirsa  wrote:
> 
> 
> You definitely can repair with a node down by passing `-hosts specific_hosts`
> 
>> On Wed, May 27, 2020 at 9:06 AM Nitan Kainth  wrote:
>> I didn't get you Leon,
>> 
>> But, the simple thing is just to follow the steps and you will be fine. You 
>> can't run the repair if the node is down.
>> 
>>> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky  
>>> wrote:
>>> Hey Jeff/Nitan,
>>> 
>>> 1) this concern should not be a problem if the repair happens before the 
>>> corrupted node is brought back online, right?
>>> 2) in this case, is option (3) equivalent to replacing the node? where we 
>>> repair the two live nodes and then bring up the third node with no data
>>> 
>>> Leon
>>> 
>>>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>>> There’s two problems with this approach if you need strict correctness 
>>>> 
>>>> 1) after you delete the sstable and before you repair you’ll violate 
>>>> consistency, so you’ll potentially serve incorrect data for a while
>>>> 
>>>> 2) The sstable May have a tombstone past gc grace that’s shadowing data in 
>>>> another sstable that’s not corrupt and deleting it may resurrect that 
>>>> deleted data. 
>>>> 
>>>> The only strictly safe thing to do here, unfortunately, is to treat the 
>>>> host as failed and rebuild it from it’s neighbors (and again being 
>>>> pedantic here, that means stop the host, while it’s stopped repair the 
>>>> surviving replicas, then bootstrap a replacement on top of the same tokens)
>>>> 
>>>> 
>>>> 
>>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky  
>>>> > wrote:
>>>> > 
>>>> > 
>>>> > Hi all,
>>>> > 
>>>> > I'm looking to understand Cassandra's behavior in an sstable corruption 
>>>> > scenario, and what the minimum amount of work is that needs to be done 
>>>> > to remove a bad sstable file.
>>>> > 
>>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>>> > SStable corruption exception on one node at 
>>>> > keyspace1/table1/lb-1-big-Data.db
>>>> > Sstablescrub does not work.
>>>> > 
>>>> > Is it safest to, after running a repair on the two live nodes,
>>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>>> > 2) Delete all files associated with that sstable (i.e., 
>>>> > keyspace1/table1/lb-1-*),
>>>> > 3) Delete all files under keyspace1/table1/, or
>>>> > 4) Any of the above are the same from a correctness perspective.
>>>> > 
>>>> > Thanks,
>>>> > Leon
>>>> > 
>>>> 
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>> 


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Jeff Jirsa
You definitely can repair with a node down by passing `-hosts
specific_hosts`

On Wed, May 27, 2020 at 9:06 AM Nitan Kainth  wrote:

> I didn't get you Leon,
>
> But, the simple thing is just to follow the steps and you will be fine.
> You can't run the repair if the node is down.
>
> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky 
> wrote:
>
>> Hey Jeff/Nitan,
>>
>> 1) this concern should not be a problem if the repair happens before the
>> corrupted node is brought back online, right?
>> 2) in this case, is option (3) equivalent to replacing the node? where we
>> repair the two live nodes and then bring up the third node with no data
>>
>> Leon
>>
>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>
>>> There’s two problems with this approach if you need strict correctness
>>>
>>> 1) after you delete the sstable and before you repair you’ll violate
>>> consistency, so you’ll potentially serve incorrect data for a while
>>>
>>> 2) The sstable May have a tombstone past gc grace that’s shadowing data
>>> in another sstable that’s not corrupt and deleting it may resurrect that
>>> deleted data.
>>>
>>> The only strictly safe thing to do here, unfortunately, is to treat the
>>> host as failed and rebuild it from it’s neighbors (and again being pedantic
>>> here, that means stop the host, while it’s stopped repair the surviving
>>> replicas, then bootstrap a replacement on top of the same tokens)
>>>
>>>
>>>
>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky 
>>> wrote:
>>> >
>>> > 
>>> > Hi all,
>>> >
>>> > I'm looking to understand Cassandra's behavior in an sstable
>>> corruption scenario, and what the minimum amount of work is that needs to
>>> be done to remove a bad sstable file.
>>> >
>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>> > SStable corruption exception on one node at
>>> keyspace1/table1/lb-1-big-Data.db
>>> > Sstablescrub does not work.
>>> >
>>> > Is it safest to, after running a repair on the two live nodes,
>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>> > 2) Delete all files associated with that sstable (i.e.,
>>> keyspace1/table1/lb-1-*),
>>> > 3) Delete all files under keyspace1/table1/, or
>>> > 4) Any of the above are the same from a correctness perspective.
>>> >
>>> > Thanks,
>>> > Leon
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Nitan Kainth
I didn't get you, Leon.

But the simple thing is just to follow the steps and you will be fine. You
can't run the repair if the node is down.

On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky 
wrote:

> Hey Jeff/Nitan,
>
> 1) this concern should not be a problem if the repair happens before the
> corrupted node is brought back online, right?
> 2) in this case, is option (3) equivalent to replacing the node? where we
> repair the two live nodes and then bring up the third node with no data
>
> Leon
>
> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>
>> There’s two problems with this approach if you need strict correctness
>>
>> 1) after you delete the sstable and before you repair you’ll violate
>> consistency, so you’ll potentially serve incorrect data for a while
>>
>> 2) The sstable May have a tombstone past gc grace that’s shadowing data
>> in another sstable that’s not corrupt and deleting it may resurrect that
>> deleted data.
>>
>> The only strictly safe thing to do here, unfortunately, is to treat the
>> host as failed and rebuild it from it’s neighbors (and again being pedantic
>> here, that means stop the host, while it’s stopped repair the surviving
>> replicas, then bootstrap a replacement on top of the same tokens)
>>
>>
>>
>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky 
>> wrote:
>> >
>> > 
>> > Hi all,
>> >
>> > I'm looking to understand Cassandra's behavior in an sstable corruption
>> scenario, and what the minimum amount of work is that needs to be done to
>> remove a bad sstable file.
>> >
>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>> > SStable corruption exception on one node at
>> keyspace1/table1/lb-1-big-Data.db
>> > Sstablescrub does not work.
>> >
>> > Is it safest to, after running a repair on the two live nodes,
>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>> > 2) Delete all files associated with that sstable (i.e.,
>> keyspace1/table1/lb-1-*),
>> > 3) Delete all files under keyspace1/table1/, or
>> > 4) Any of the above are the same from a correctness perspective.
>> >
>> > Thanks,
>> > Leon
>> >
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Leon Zaruvinsky
Hey Jeff/Nitan,

1) this concern should not be a problem if the repair happens before the
corrupted node is brought back online, right?
2) in this case, is option (3) equivalent to replacing the node? where we
repair the two live nodes and then bring up the third node with no data

Leon

On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:

> There’s two problems with this approach if you need strict correctness
>
> 1) after you delete the sstable and before you repair you’ll violate
> consistency, so you’ll potentially serve incorrect data for a while
>
> 2) The sstable May have a tombstone past gc grace that’s shadowing data in
> another sstable that’s not corrupt and deleting it may resurrect that
> deleted data.
>
> The only strictly safe thing to do here, unfortunately, is to treat the
> host as failed and rebuild it from it’s neighbors (and again being pedantic
> here, that means stop the host, while it’s stopped repair the surviving
> replicas, then bootstrap a replacement on top of the same tokens)
>
>
>
> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky 
> wrote:
> >
> > 
> > Hi all,
> >
> > I'm looking to understand Cassandra's behavior in an sstable corruption
> scenario, and what the minimum amount of work is that needs to be done to
> remove a bad sstable file.
> >
> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
> > SStable corruption exception on one node at
> keyspace1/table1/lb-1-big-Data.db
> > Sstablescrub does not work.
> >
> > Is it safest to, after running a repair on the two live nodes,
> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
> > 2) Delete all files associated with that sstable (i.e.,
> keyspace1/table1/lb-1-*),
> > 3) Delete all files under keyspace1/table1/, or
> > 4) Any of the above are the same from a correctness perspective.
> >
> > Thanks,
> > Leon
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Is deleting live sstable safe in this scenario?

2020-05-26 Thread Jeff Jirsa
There are two problems with this approach if you need strict correctness:

1) After you delete the sstable and before you repair, you’ll violate
consistency, so you’ll potentially serve incorrect data for a while.

2) The sstable may have a tombstone past gc grace that’s shadowing data in
another sstable that’s not corrupt, and deleting it may resurrect that deleted
data.

The only strictly safe thing to do here, unfortunately, is to treat the host as
failed and rebuild it from its neighbors (and again, being pedantic here, that
means: stop the host, while it’s stopped repair the surviving replicas, then
bootstrap a replacement on top of the same tokens).
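
(Sketching that sequence with placeholder addresses; the replace flag is a
JVM option set on the replacement node, e.g. in cassandra-env.sh:)

    # On the failed/corrupt host: stop it
    nodetool drain && sudo service cassandra stop

    # From a surviving replica: repair the keyspace across the live hosts only
    nodetool repair -full -hosts 10.0.0.1 -hosts 10.0.0.2 my_ks

    # On the replacement node, before first start (takes over the dead node's tokens):
    # JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.3"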



> On May 26, 2020, at 4:46 PM, Leon Zaruvinsky  wrote:
> 
> 
> Hi all,
> 
> I'm looking to understand Cassandra's behavior in an sstable corruption 
> scenario, and what the minimum amount of work is that needs to be done to 
> remove a bad sstable file.
> 
> Consider: 3 node, RF 3 cluster, reads/writes at quorum
> SStable corruption exception on one node at keyspace1/table1/lb-1-big-Data.db
> Sstablescrub does not work.
> 
> Is it safest to, after running a repair on the two live nodes,
> 1) Delete only keyspace1/table1/lb-1-big-Data.db,
> 2) Delete all files associated with that sstable (i.e., 
> keyspace1/table1/lb-1-*),
> 3) Delete all files under keyspace1/table1/, or
> 4) Any of the above are the same from a correctness perspective.
> 
> Thanks,
> Leon
> 




Re: Is deleting live sstable safe in this scenario?

2020-05-26 Thread Nitan Kainth
Stop the node
Delete as per option 2
Run repair 


Regards,
Nitan
Cell: 510 449 9629

> On May 26, 2020, at 6:46 PM, Leon Zaruvinsky  wrote:
> 
> 
> Hi all,
> 
> I'm looking to understand Cassandra's behavior in an sstable corruption 
> scenario, and what the minimum amount of work is that needs to be done to 
> remove a bad sstable file.
> 
> Consider: 3 node, RF 3 cluster, reads/writes at quorum
> SStable corruption exception on one node at keyspace1/table1/lb-1-big-Data.db
> Sstablescrub does not work.
> 
> Is it safest to, after running a repair on the two live nodes,
> 1) Delete only keyspace1/table1/lb-1-big-Data.db,
> 2) Delete all files associated with that sstable (i.e., 
> keyspace1/table1/lb-1-*),
> 3) Delete all files under keyspace1/table1/, or
> 4) Any of the above are the same from a correctness perspective.
> 
> Thanks,
> Leon
> 


Is deleting live sstable safe in this scenario?

2020-05-26 Thread Leon Zaruvinsky
Hi all,

I'm looking to understand Cassandra's behavior in an sstable corruption
scenario, and what the minimum amount of work is that needs to be done to
remove a bad sstable file.

Consider: 3 node, RF 3 cluster, reads/writes at quorum
SStable corruption exception on one node at
keyspace1/table1/lb-1-big-Data.db
Sstablescrub does not work.

Is it safest to, after running a repair on the two live nodes,
1) Delete only keyspace1/table1/lb-1-big-Data.db,
2) Delete all files associated with that sstable (i.e.,
keyspace1/table1/lb-1-*),
3) Delete all files under keyspace1/table1/, or
4) Any of the above are the same from a correctness perspective.

Thanks,
Leon


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-25 Thread manish khandelwal
Thanks all for your support.

I executed the discussed process (barring repair, as the table is only read
for reporting) and it worked fine in production.

Regards
Manish



Re: Corrupt SSTable Cassandra 3.11.2

2020-02-14 Thread Jeff Jirsa
The risk is you violate consistency while you run repair

Assume you have three replicas for that range: a, b, and c.

At some point b misses a write, but it’s committed on a and c for quorum.
Now c has a corrupt sstable.

You empty c and bring it back with no data and start repair

Then the app reads at quorum and selects b and c

You don’t see the data when you do a quorum read - this is technically incorrect

You could:

Stop the host with corrupt sstable
Run repair on the token range impacted using just the surviving hosts (this 
makes sure the two survivors have all of the data)
Clear all the data for that table on the host for the corrupt sstable (I think 
you can leave the commitlog in place but you probably want to flush and drain 
before you stop the host)
Then bring that host up and run repair


I think that’s strictly safe and you’re just rebuilding 5-6gb
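
(Roughly, as commands, with placeholder keyspace/table, data path and host
addresses; double-check the flags against your nodetool version:)

    # On the host with the corrupt sstable: flush, drain, stop
    nodetool flush my_ks my_table
    nodetool drain
    sudo service cassandra stop

    # From a surviving replica: repair that table across the live hosts only
    nodetool repair -full -hosts 10.0.0.1 -hosts 10.0.0.2 my_ks my_table

    # Back on the affected host: clear the table's data, start, then repair normally
    rm -rf /var/lib/cassandra/data/my_ks/my_table-*/*
    sudo service cassandra start
    nodetool repair -full my_ks my_table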


> On Feb 13, 2020, at 11:23 PM, manish khandelwal 
>  wrote:
> 
> 
> Thanks Jeff for your response.
> 
> Do you see any risk in following approach
> 
> 1. Stop the node.
> 2. Remove all sstable files from  
> /var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33 
> directory.
> 3. Start the node.
> 4. Run full repair on this particular table
> 
> I wanted to go this way because this table is small (5-6 GB).  I would like 
> to avoid 2-3 days of streaming in case of replacing the whole host.
> 
> Regards
> Manish
> 
>> On Fri, Feb 14, 2020 at 12:28 PM Jeff Jirsa  wrote:
>> Agree this is both strictly possible and more common with LCS. The only 
>> thing that's strictly correct to do is treat every corrupt sstable exception 
>> as a failed host, and replace it just like you would a failed host.
>> 
>> 
>>> On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal 
>>>  wrote:
>>> Thanks Erick
>>> 
>>> I would like to explain how data resurrection can take place with single 
>>> SSTable deletion.
>>> 
>>> Consider this case of table with Levelled Compaction Strategy
>>> 
>>> 1. Data A written a long time back.
>>> 2. Data A is deleted and tombstone is created.
>>> 3. After GC grace tombstone is purgeable.
>>> 4. Now the SSTable containing purgeable tombstone in one node is corruputed.
>>> 4. The node with corrupt SSTable cannot compact the data and purgeable 
>>> tombstone
>>> 6. From other two nodes Data A is removed after compaction.
>>> 7. Remove the corrupt SSTable from impacted node.
>>> 8. When you run repair Data A is copied to all the nodes.
>>> 
>>> This table in quesiton is using Levelled Compaction Strategy.
>>> 
>>> Regards
>>> Manish
>>> 
>>>> On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez 
>>>>  wrote:
>>>> The log shows that the the problem occurs when decompressing the SSTable 
>>>> but there's not much actionable info from it.
>>>> 
>>>>> I would like to know what will be "ordinary hammer" in this  case. Do you 
>>>>> want to suggest that  deleting only corrupt sstable file ( in this case 
>>>>> mc-1234-big-*.db) would be suffice ?
>>>> 
>>>> Exactly. I mean if it's just a one-off, why go through the trouble of 
>>>> blowing away all the files? :)
>>>> 
>>>>> I am afraid that this may cause data resurrection (I have prior 
>>>>> experience with same). 
>>>> 
>>>> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>>>> 
>>>>> Note that i am not willing to run the entire node rebuild as it will take 
>>>>> lots of time due to presence of multiple big tables (I am keeping it as 
>>>>> my last option)
>>>> 
>>>> 
>>>> I wasn't going to suggest that at all. I didn't like the sledge hammer 
>>>> approach. I certainly wouldn't recommend bringing in a wrecking ball. 😁
>>>> 
>>>> Cheers!


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks Jeff for your response.

Do you see any risk in following approach

1. Stop the node.
2. Remove all sstable files from the
/var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33
directory.
3. Start the node.
4. Run full repair on this particular table.

I wanted to go this way because this table is small (5-6 GB).  I would like
to avoid 2-3 days of streaming in case of replacing the whole host.

Regards
Manish

On Fri, Feb 14, 2020 at 12:28 PM Jeff Jirsa  wrote:

> Agree this is both strictly possible and more common with LCS. The only
> thing that's strictly correct to do is treat every corrupt sstable
> exception as a failed host, and replace it just like you would a failed
> host.
>
>
> On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>> Thanks Erick
>>
>> I would like to explain how data resurrection can take place with single
>> SSTable deletion.
>>
>> Consider this case of table with Levelled Compaction Strategy
>>
>> 1. Data A written a long time back.
>> 2. Data A is deleted and tombstone is created.
>> 3. After GC grace tombstone is purgeable.
>> 4. Now the SSTable containing purgeable tombstone in one node is
>> corruputed.
>> 4. The node with corrupt SSTable cannot compact the data and purgeable
>> tombstone
>> 6. From other two nodes Data A is removed after compaction.
>> 7. Remove the corrupt SSTable from impacted node.
>> 8. When you run repair Data A is copied to all the nodes.
>>
>> This table in quesiton is using Levelled Compaction Strategy.
>>
>> Regards
>> Manish
>>
>> On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez <
>> erick.rami...@datastax.com> wrote:
>>
>>> The log shows that the the problem occurs when decompressing the SSTable
>>> but there's not much actionable info from it.
>>>
>>> I would like to know what will be "ordinary hammer" in this  case. Do
>>>> you want to suggest that  deleting only corrupt sstable file ( in this case
>>>> mc-1234-big-*.db) would be suffice ?
>>>
>>>
>>> Exactly. I mean if it's just a one-off, why go through the trouble of
>>> blowing away all the files? :)
>>>
>>> I am afraid that this may cause data resurrection (I have prior
>>>> experience with same).
>>>
>>>
>>> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>>>
>>> Note that i am not willing to run the entire node rebuild as it will
>>>> take lots of time due to presence of multiple big tables (I am keeping it
>>>> as my last option)
>>>
>>>
>>> I wasn't going to suggest that at all. I didn't like the sledge hammer
>>> approach. I certainly wouldn't recommend bringing in a wrecking ball. 😁
>>>
>>> Cheers!
>>>
>>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Jeff Jirsa
Agree this is both strictly possible and more common with LCS. The only
thing that's strictly correct to do is treat every corrupt sstable
exception as a failed host, and replace it just like you would a failed
host.


On Thu, Feb 13, 2020 at 10:55 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Thanks Erick
>
> I would like to explain how data resurrection can take place with single
> SSTable deletion.
>
> Consider this case of table with Levelled Compaction Strategy
>
> 1. Data A written a long time back.
> 2. Data A is deleted and tombstone is created.
> 3. After GC grace tombstone is purgeable.
> 4. Now the SSTable containing purgeable tombstone in one node is
> corruputed.
> 4. The node with corrupt SSTable cannot compact the data and purgeable
> tombstone
> 6. From other two nodes Data A is removed after compaction.
> 7. Remove the corrupt SSTable from impacted node.
> 8. When you run repair Data A is copied to all the nodes.
>
> This table in quesiton is using Levelled Compaction Strategy.
>
> Regards
> Manish
>
> On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez 
> wrote:
>
>> The log shows that the the problem occurs when decompressing the SSTable
>> but there's not much actionable info from it.
>>
>> I would like to know what will be "ordinary hammer" in this  case. Do you
>>> want to suggest that  deleting only corrupt sstable file ( in this case
>>> mc-1234-big-*.db) would be suffice ?
>>
>>
>> Exactly. I mean if it's just a one-off, why go through the trouble of
>> blowing away all the files? :)
>>
>> I am afraid that this may cause data resurrection (I have prior
>>> experience with same).
>>
>>
>> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>>
>> Note that i am not willing to run the entire node rebuild as it will take
>>> lots of time due to presence of multiple big tables (I am keeping it as my
>>> last option)
>>
>>
>> I wasn't going to suggest that at all. I didn't like the sledge hammer
>> approach. I certainly wouldn't recommend bringing in a wrecking ball. 😁
>>
>> Cheers!
>>
>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Thanks Erick

I would like to explain how data resurrection can take place with single
SSTable deletion.

Consider this case of a table with Leveled Compaction Strategy:

1. Data A was written a long time back.
2. Data A is deleted and a tombstone is created.
3. After GC grace the tombstone is purgeable.
4. Now the SSTable containing the purgeable tombstone on one node is
corrupted.
5. The node with the corrupt SSTable cannot compact away the data and the
purgeable tombstone.
6. On the other two nodes Data A is removed after compaction.
7. Remove the corrupt SSTable from the impacted node.
8. When you run repair, Data A is copied to all the nodes.

The table in question is using Leveled Compaction Strategy.

Regards
Manish

On Fri, Feb 14, 2020 at 12:00 PM Erick Ramirez 
wrote:

> The log shows that the the problem occurs when decompressing the SSTable
> but there's not much actionable info from it.
>
> I would like to know what will be "ordinary hammer" in this  case. Do you
>> want to suggest that  deleting only corrupt sstable file ( in this case
>> mc-1234-big-*.db) would be suffice ?
>
>
> Exactly. I mean if it's just a one-off, why go through the trouble of
> blowing away all the files? :)
>
> I am afraid that this may cause data resurrection (I have prior experience
>> with same).
>
>
> Whoa! That's a long bow to draw. Sounds like there's more history to it.
>
> Note that i am not willing to run the entire node rebuild as it will take
>> lots of time due to presence of multiple big tables (I am keeping it as my
>> last option)
>
>
> I wasn't going to suggest that at all. I didn't like the sledge hammer
> approach. I certainly wouldn't recommend bringing in a wrecking ball. 😁
>
> Cheers!
>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
The log shows that the problem occurs when decompressing the SSTable, but
there's not much actionable info from it.

I would like to know what will be "ordinary hammer" in this  case. Do you
> want to suggest that  deleting only corrupt sstable file ( in this case
> mc-1234-big-*.db) would be suffice ?


Exactly. I mean if it's just a one-off, why go through the trouble of
blowing away all the files? :)

I am afraid that this may cause data resurrection (I have prior experience
> with same).


Whoa! That's a long bow to draw. Sounds like there's more history to it.

Note that i am not willing to run the entire node rebuild as it will take
> lots of time due to presence of multiple big tables (I am keeping it as my
> last option)


I wasn't going to suggest that at all. I didn't like the sledge hammer
approach. I certainly wouldn't recommend bringing in a wrecking ball. 😁

Cheers!


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick

Thanks for your quick response. I have attached the full stacktrace, which
shows the exception during the validation phase of the table repair.

I would like to know what would be an "ordinary hammer" in this case. Do you
want to suggest that deleting only the corrupt sstable file (in this case
mc-1234-big-*.db) would suffice? I am afraid that this may cause data
resurrection (I have prior experience with the same).
Or are you pointing towards running scrub? Kindly explain.

Note that I am not willing to run an entire node rebuild, as it will take
lots of time due to the presence of multiple big tables (I am keeping it as
my last option).

Regards
Manish

On Fri, Feb 14, 2020 at 11:11 AM Erick Ramirez 
wrote:

> It will achieve the outcome you are after but I doubt anyone would
> recommend that approach. It's like using a sledgehammer when an ordinary
> hammer would suffice. And if you were hitting some bug then you'd run into
> the same problem anyway.
>
> Can you post the full stack trace? It might provide us some clues as to
> why you ran into the problem. Cheers!
>


error.log
Description: Binary data


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
It will achieve the outcome you are after but I doubt anyone would
recommend that approach. It's like using a sledgehammer when an ordinary
hammer would suffice. And if you were hitting some bug then you'd run into
the same problem anyway.

Can you post the full stack trace? It might provide us some clues as to why
you ran into the problem. Cheers!


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi Erick

Thanks for reply.

The reason for the corruption is unknown to me. I just found the corrupt
table when a scheduled repair failed with the logs showing:

ERROR [ValidationExecutor:16] 2020-01-21 19:13:18,123 CassandraDaemon.java:228 - Exception in thread Thread[ValidationExecutor:16,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33/mc-1234-big-Data.db
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:134) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.2.jar:3.11.2]


Regarding your question about removing all SSTable files of a table (column
family): I want a quick recovery without any inconsistency. Since I have a
3-node cluster with RF=3, my expectation is that repair will stream the data
from the other two nodes. I just wanted to know whether it is correct to do
it this way:

1. Stop the node.
2. Remove all sstable files from the
/var/lib/cassandra/data/keyspace/tablename-23dfadf32adf33d33s333s33s3s33
directory.
3. Start the node.
4. Run full repair on this particular table.

Regards
Manish







On Fri, Feb 14, 2020 at 4:44 AM Erick Ramirez 
wrote:

> You need to stop C* in order to run the offline sstable scrub utility.
> That's why it's referred to as "offline". :)
>
> Do you have any idea on what caused the corruption? It's highly unusual
> that you're thinking of removing all the files for just one table.
> Typically if the corruption was a result of a faulty disk or hardware
> failure, it wouldn't be isolated to just one table. If you provide a bit
> more background information, we would be able to give you a better
> response. Cheers!
>
> Erick Ramirez  |  Developer Relations
>
> erick.rami...@datastax.com | datastax.com <http://www.datastax.com>
>
>
>
> On Fri, 14 Feb 2020 at 04:39, manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>> Hi
>>
>> I see a corrupt SSTable in one of my keyspace table on one node. Cluster
>> is 3 nodes with replication 3. Cassandra version is 3.11.2.
>> I am thinking on following lines to resolve the corrupt SSTable issue.
>> 1. Run nodetool scrub.
>> 2. If step 1 fails, run offline sstabablescrub.
>> 3. If step 2 fails, stop node. Remove all SSTables from problematic
>> table.Start the node and run full repair on table.I am removing all
>> SSTABLES of the particular table so as to avoid resurrection of data or any
>> data corruption.
>>
>> I would like to know are there any side effects of executing step 3 if
>> step 1 and step 2 fails.
>>
>> Regards
>> Manish
>>
>>
>>
>>
>>


Re: Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread Erick Ramirez
You need to stop C* in order to run the offline sstable scrub utility.
That's why it's referred to as "offline". :)

Do you have any idea on what caused the corruption? It's highly unusual
that you're thinking of removing all the files for just one table.
Typically if the corruption was a result of a faulty disk or hardware
failure, it wouldn't be isolated to just one table. If you provide a bit
more background information, we would be able to give you a better
response. Cheers!

Erick Ramirez  |  Developer Relations

erick.rami...@datastax.com | datastax.com <http://www.datastax.com>



On Fri, 14 Feb 2020 at 04:39, manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Hi
>
> I see a corrupt SSTable in one of my keyspace table on one node. Cluster
> is 3 nodes with replication 3. Cassandra version is 3.11.2.
> I am thinking on following lines to resolve the corrupt SSTable issue.
> 1. Run nodetool scrub.
> 2. If step 1 fails, run offline sstabablescrub.
> 3. If step 2 fails, stop node. Remove all SSTables from problematic
> table.Start the node and run full repair on table.I am removing all
> SSTABLES of the particular table so as to avoid resurrection of data or any
> data corruption.
>
> I would like to know are there any side effects of executing step 3 if
> step 1 and step 2 fails.
>
> Regards
> Manish
>
>
>
>
>


Corrupt SSTable Cassandra 3.11.2

2020-02-13 Thread manish khandelwal
Hi

I see a corrupt SSTable in one of my keyspace tables on one node. The cluster
is 3 nodes with replication factor 3. The Cassandra version is 3.11.2.
I am thinking along the following lines to resolve the corrupt SSTable issue:
1. Run nodetool scrub.
2. If step 1 fails, run the offline sstablescrub.
3. If step 2 fails, stop the node, remove all SSTables from the problematic
table, start the node, and run a full repair on the table. I am removing all
SSTables of the particular table so as to avoid resurrection of data or any
data corruption.

I would like to know whether there are any side effects of executing step 3
if steps 1 and 2 fail.

Regards
Manish
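
For steps 1 and 2, the invocations would look roughly like this (keyspace and
table names are placeholders; the offline tool requires Cassandra to be
stopped on that node):

    # Step 1: online scrub of just the affected table
    nodetool scrub my_ks my_table

    # Step 2: offline scrub, with Cassandra stopped
    nodetool drain && sudo service cassandra stop
    sstablescrub my_ks my_table
    sudo service cassandra start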


Re: Find large partition https://github.com/tolbertam/sstable-tools

2019-11-22 Thread Sergio Bilello
Thanks! I will look into it

On 2019/11/22 19:22:15, Jeff Jirsa  wrote: 
> Brian Gallew has a very simple script that does something similar:
> https://github.com/BrianGallew/cassandra_tools/blob/master/poison_pill_tester
> 
> You can also search the logs for messages about writing large partitions
> during compaction.
> 
> 
> 
> 
> 
> On Thu, Nov 21, 2019 at 6:33 PM Sergio Bilello 
> wrote:
> 
> > Hi guys!
> > Just for curiosity do you know anything beside
> > https://github.com/tolbertam/sstable-tools to find a large partition?
> > Best,
> >
> > Sergio
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
> >
> 




Re: Find large partition https://github.com/tolbertam/sstable-tools

2019-11-22 Thread Jeff Jirsa
Brian Gallew has a very simple script that does something similar:
https://github.com/BrianGallew/cassandra_tools/blob/master/poison_pill_tester

You can also search the logs for messages about writing large partitions
during compaction.
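
(For the log search, something like this on a default install; the log
location and exact message wording vary by version, so treat it as a hedged
example:)

    # Compaction warns when a partition exceeds compaction_large_partition_warning_threshold_mb
    grep -i "large partition" /var/log/cassandra/system.log*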





On Thu, Nov 21, 2019 at 6:33 PM Sergio Bilello 
wrote:

> Hi guys!
> Just for curiosity do you know anything beside
> https://github.com/tolbertam/sstable-tools to find a large partition?
> Best,
>
> Sergio
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Find large partition https://github.com/tolbertam/sstable-tools

2019-11-22 Thread Jeff Jirsa
It's apache licensed:
https://github.com/instaclustr/cassandra-sstable-tools/blob/cassandra-3.11/LICENSE



On Fri, Nov 22, 2019 at 12:06 AM Ahmed Eljami 
wrote:

> I found this project on instaclustr github  but I dont have any idea about
> license:
>
>
> https://github.com/instaclustr/cassandra-sstable-tools/blob/cassandra-3.11/README.md
>
>
>
> Le ven. 22 nov. 2019 à 03:33, Sergio Bilello 
> a écrit :
>
>> Hi guys!
>> Just for curiosity do you know anything beside
>> https://github.com/tolbertam/sstable-tools to find a large partition?
>> Best,
>>
>> Sergio
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>


Re: Find large partition https://github.com/tolbertam/sstable-tools

2019-11-22 Thread Ahmed Eljami
I found this project on the Instaclustr GitHub, but I don't have any idea
about its license:

https://github.com/instaclustr/cassandra-sstable-tools/blob/cassandra-3.11/README.md



On Fri, Nov 22, 2019 at 03:33, Sergio Bilello  wrote:

> Hi guys!
> Just for curiosity do you know anything beside
> https://github.com/tolbertam/sstable-tools to find a large partition?
> Best,
>
> Sergio
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Find large partition https://github.com/tolbertam/sstable-tools

2019-11-21 Thread Sergio Bilello
Hi guys!
Just out of curiosity, do you know of anything besides
https://github.com/tolbertam/sstable-tools to find a large partition?
Best,

Sergio




Re: Breaking up major compacted Sstable with TWCS

2019-07-15 Thread Jeff Jirsa
No 

Sent from my iPhone

> On Jul 15, 2019, at 9:14 AM, Carl Mueller 
>  wrote:
> 
> Does sstablesplit properly restore the time-bucket the data? That appears to 
> be size-based only.
> 
>> On Fri, Jul 12, 2019 at 5:55 AM Rhys Campbell 
>>  wrote:
>> https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTableSplit.html
>> 
>> Leon Zaruvinsky  schrieb am Fr., 12. Juli 2019, 
>> 00:06:
>>> Hi,
>>> 
>>> We are switching a table to run using TWCS. However, after running the 
>>> alter statement, we ran a major compaction without understanding the 
>>> implications.
>>> 
>>> Now, while new sstables are properly being created according to the time 
>>> window, there is a giant sstable sitting around waiting for expiration.
>>> 
>>> Is there a way we can break it up again?  Running the alter statement again 
>>> doesn’t seem to be touching it.
>>> 
>>> Thanks,
>>> Leon
>>> 
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>> 


Re: Breaking up major compacted Sstable with TWCS

2019-07-15 Thread Carl Mueller
Does sstablesplit properly restore the time-bucket the data? That appears
to be size-based only.

On Fri, Jul 12, 2019 at 5:55 AM Rhys Campbell
 wrote:

>
> https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTableSplit.html
>
> Leon Zaruvinsky  schrieb am Fr., 12. Juli 2019,
> 00:06:
>
>> Hi,
>>
>> We are switching a table to run using TWCS. However, after running the
>> alter statement, we ran a major compaction without understanding the
>> implications.
>>
>> Now, while new sstables are properly being created according to the time
>> window, there is a giant sstable sitting around waiting for expiration.
>>
>> Is there a way we can break it up again?  Running the alter statement
>> again doesn’t seem to be touching it.
>>
>> Thanks,
>> Leon
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>


Re: Breaking up major compacted Sstable with TWCS

2019-07-12 Thread Rhys Campbell
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/toolsSStables/toolsSSTableSplit.html
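A minimal sketch of how that looks (paths and file names are hypothetical). Note that the split is size-based only, so it will not recreate TWCS time windows, and the tool must be run against a stopped node:

```
# Stop the node first -- sstablesplit must not run against a live instance.
sudo systemctl stop cassandra

# Split the oversized sstable into ~50 MB chunks (-s is the target size in MB);
# a snapshot of the original is kept unless --no-snapshot is passed.
sstablesplit -s 50 /var/lib/cassandra/data/my_ks/my_table-*/mc-1234-big-Data.db

sudo systemctl start cassandra
```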

Leon Zaruvinsky wrote on Fri., July 12, 2019, at 00:06:

> Hi,
>
> We are switching a table to run using TWCS. However, after running the
> alter statement, we ran a major compaction without understanding the
> implications.
>
> Now, while new sstables are properly being created according to the time
> window, there is a giant sstable sitting around waiting for expiration.
>
> Is there a way we can break it up again?  Running the alter statement
> again doesn’t seem to be touching it.
>
> Thanks,
> Leon
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Breaking up major compacted Sstable with TWCS

2019-07-11 Thread Leon Zaruvinsky
Hi,

We are switching a table to run using TWCS. However, after running the alter 
statement, we ran a major compaction without understanding the implications.

Now, while new sstables are properly being created according to the time 
window, there is a giant sstable sitting around waiting for expiration.

Is there a way we can break it up again?  Running the alter statement again 
doesn’t seem to be touching it.

Thanks,
Leon

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Can sstable corruption cause schema mismatch?

2019-05-29 Thread Nitan Kainth
All ports are open.
We tried a rolling restart and a full cluster shutdown, starting one node at a time.
Changes done were:
Storage addition
DDL for column drop and recreate

The schema version is the same on a few nodes, and a few show it as unavailable.

The network has been verified in detail, with no severe packet drops.


Regards,
Nitan
Cell: 510 449 9629

> On May 29, 2019, at 10:04 AM, Alain RODRIGUEZ  wrote:
> 
> Ideas that come mind are:
> 
> - Rolling restart of the cluster
> - Use of 'nodetool resetlocalschema'  --> function name speaks for itself. 
> Note that this is to be ran on each node you think is having schema issues
> - Are all nodes showing a schema version showing the same one?
> - Port not fully open across all nodes?
> - Anything in the logs?
> 
> Do you know what triggered this situation in the first place?
> 
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
>> Le mar. 28 mai 2019 à 18:28, Nitan Kainth  a écrit :
>> Thank you Alain.
>> 
>> Nodetool describecluster shows some nodes unreachable, different output from 
>> each node. 
>> Node1 can see all 4 nodes up.
>> Node 2 says node 4 and node 5 unreachable
>> Node 3 complains about node node 2 and node 1
>> 
>> Nodetool status shows all nodes up and read writes are working for most most 
>> operations. 
>> 
>> Network looks good. Any other ideas?
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629
>> 
>>> On May 28, 2019, at 11:21 AM, Alain RODRIGUEZ  wrote:
>>> 
>>> Hello Nitan,
>>> 
>>>> 1. Can sstable corruption in application tables cause schema mismatch?
>>> 
>>> I would say it should not. I could imagine in the case that the corrupted 
>>> table hits some 'system' keyspace sstable. If not I don' see how corrupted 
>>> data can impact the schema on the node.
>>>  
>>>> 2. Do we need to disable repair while adding storage while Cassandra is 
>>>> down?
>>> 
>>> I think you don't have to, but that it's a good idea.
>>> Repairs would fail as soon/long as you have a node down that should be 
>>> involved (I think there is an option to change that behaviour now).
>>> Anyway, stopping repair and restarting it when all nodes are probably 
>>> allows you a better understanding/control of what's going on. Also, it 
>>> reduces the load in time of troubles or maintenance, when the cluster is 
>>> somewhat weaker.
>>> 
>>> C*heers,
>>> ---
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>> 
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>> 
>>> 
>>> 
>>>> Le mar. 28 mai 2019 à 17:13, Nitan Kainth  a écrit :
>>>> Hi,
>>>> 
>>>> Two questions:
>>>> 1. Can sstable corruption in application tables cause schema mismatch?
>>>> 2. Do we need to disable repair while adding storage while Cassandra is 
>>>> down?
>>>> 
>>>> 
>>>> Regards,
>>>> Nitan
>>>> Cell: 510 449 9629


Re: Can sstable corruption cause schema mismatch?

2019-05-29 Thread Alain RODRIGUEZ
Ideas that come mind are:

- Rolling restart of the cluster
- Use of 'nodetool resetlocalschema'  --> function name speaks for itself.
Note that this is to be ran on each node you think is having schema issues
- Are all nodes showing a schema version showing the same one?
- Port not fully open across all nodes?
- Anything in the logs?

Do you know what triggered this situation in the first place?
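For reference, a minimal sketch of the first two checks above (plain nodetool, no cluster-specific names needed):

```
# Compare the schema version each node reports (and which peers are UNREACHABLE):
nodetool describecluster

# On a node that disagrees, drop its locally cached schema so it re-pulls
# the schema from the rest of the cluster:
nodetool resetlocalschema
```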

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Tue, May 28, 2019 at 18:28, Nitan Kainth wrote:

> Thank you Alain.
>
> Nodetool describecluster shows some nodes unreachable, different output
> from each node.
> Node1 can see all 4 nodes up.
> Node 2 says node 4 and node 5 unreachable
> Node 3 complains about node node 2 and node 1
>
> Nodetool status shows all nodes up and read writes are working for most
> most operations.
>
> Network looks good. Any other ideas?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On May 28, 2019, at 11:21 AM, Alain RODRIGUEZ  wrote:
>
> Hello Nitan,
>
> 1. Can sstable corruption in application tables cause schema mismatch?
>>
>
> I would say it should not. I could imagine in the case that the corrupted
> table hits some 'system' keyspace sstable. If not I don' see how corrupted
> data can impact the schema on the node.
>
>
>> 2. Do we need to disable repair while adding storage while Cassandra is
>> down?
>>
>
> I think you don't have to, but that it's a good idea.
> Repairs would fail as soon/long as you have a node down that should be
> involved (I think there is an option to change that behaviour now).
> Anyway, stopping repair and restarting it when all nodes are probably
> allows you a better understanding/control of what's going on. Also, it
> reduces the load in time of troubles or maintenance, when the cluster is
> somewhat weaker.
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> Le mar. 28 mai 2019 à 17:13, Nitan Kainth  a
> écrit :
>
>> Hi,
>>
>> Two questions:
>> 1. Can sstable corruption in application tables cause schema mismatch?
>> 2. Do we need to disable repair while adding storage while Cassandra is
>> down?
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>


Re: Can sstable corruption cause schema mismatch?

2019-05-28 Thread Nitan Kainth
Thank you Alain.

Nodetool describecluster shows some nodes unreachable, different output from 
each node. 
Node1 can see all 4 nodes up.
Node 2 says node 4 and node 5 unreachable
Node 3 complains about node 2 and node 1

Nodetool status shows all nodes up, and reads and writes are working for most
operations.

Network looks good. Any other ideas?


Regards,
Nitan
Cell: 510 449 9629

> On May 28, 2019, at 11:21 AM, Alain RODRIGUEZ  wrote:
> 
> Hello Nitan,
> 
>> 1. Can sstable corruption in application tables cause schema mismatch?
> 
> I would say it should not. I could imagine in the case that the corrupted 
> table hits some 'system' keyspace sstable. If not I don' see how corrupted 
> data can impact the schema on the node.
>  
>> 2. Do we need to disable repair while adding storage while Cassandra is down?
> 
> I think you don't have to, but that it's a good idea.
> Repairs would fail as soon/long as you have a node down that should be 
> involved (I think there is an option to change that behaviour now).
> Anyway, stopping repair and restarting it when all nodes are probably allows 
> you a better understanding/control of what's going on. Also, it reduces the 
> load in time of troubles or maintenance, when the cluster is somewhat weaker.
> 
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> 
> 
>> Le mar. 28 mai 2019 à 17:13, Nitan Kainth  a écrit :
>> Hi,
>> 
>> Two questions:
>> 1. Can sstable corruption in application tables cause schema mismatch?
>> 2. Do we need to disable repair while adding storage while Cassandra is down?
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629


Re: Can sstable corruption cause schema mismatch?

2019-05-28 Thread Alain RODRIGUEZ
Hello Nitan,

1. Can sstable corruption in application tables cause schema mismatch?
>

I would say it should not. I could imagine it if the corruption hits some
'system' keyspace sstable. If not, I don't see how corrupted
data can impact the schema on the node.


> 2. Do we need to disable repair while adding storage while Cassandra is
> down?
>

I think you don't have to, but it's a good idea.
Repairs would fail as soon as (and as long as) you have a node down that should be
involved (I think there is an option to change that behaviour now).
Anyway, stopping repair and restarting it once all nodes are up probably
gives you a better understanding of, and control over, what's going on. Also, it
reduces the load in times of trouble or maintenance, when the cluster is
somewhat weaker.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



On Tue, May 28, 2019 at 17:13, Nitan Kainth wrote:

> Hi,
>
> Two questions:
> 1. Can sstable corruption in application tables cause schema mismatch?
> 2. Do we need to disable repair while adding storage while Cassandra is
> down?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>


Can sstable corruption cause schema mismatch?

2019-05-28 Thread Nitan Kainth
Hi,

Two questions:
1. Can sstable corruption in application tables cause schema mismatch?
2. Do we need to disable repair while adding storage while Cassandra is down?


Regards,
Nitan
Cell: 510 449 9629

Re: SStable format change in 3.0.18 ?

2019-04-04 Thread Léo FERLIN SUTTON
Thank you guys !



On Thu, Apr 4, 2019 at 5:49 PM Dmitry Saprykin 
wrote:

> Hello,
>
> I think it was done in the following issue: Sstable min/max metadata can
> cause data loss (CASSANDRA-14861)
>
>
> https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f
> src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
> <https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f#diff-62875acfa21fb24c7167a0a2d761780e>
>  :
> 129
> // md (3.0.18, 3.11.4): corrected sstable min/max clustering
>
> Dmitry Saprykin
>
> On Thu, Apr 4, 2019 at 11:23 AM Léo FERLIN SUTTON
>  wrote:
>
>> Hello !
>>
>> I have noticed something since I upgraded to cassandra 3.0.18.
>>
>> Before all my Sstable used to be named this way :
>> ```
>> mc-130817-big-CompressionInfo.db
>> mc-130817-big-Data.db
>> mc-130817-big-Digest.crc32
>> mc-130817-big-Filter.db
>> mc-130817-big-Index.db
>> mc-130817-big-Statistics.db
>> mc-130817-big-Summary.db
>> mc-130817-big-TOC.txt
>> ```
>>
>> Since the update I have a new type of files :
>>
>> ```
>> md-20631-big-Statistics.db
>> md-20631-big-Filter.db
>> md-20631-big-TOC.txt
>> md-20631-big-Summary.db
>> md-20631-big-CompressionInfo.db
>> md-20631-big-Data.db
>> md-20631-big-Digest.crc32
>> md-20631-big-Index.db
>> ```
>>
>> Starting with `md` mixed with my the ancient format starting with "mc".
>>
>> Other than the name these files seems identical to regular Sstables. I
>> haven't seen any information about this in the changelog :
>> ``` (lines with "sstables" from the changelog)
>>
>>  * Fix handling of collection tombstones for dropped columns from legacy 
>> sstables (CASSANDRA-14912)
>>  * Fix missing rows when reading 2.1 SSTables with static columns in 3.0 
>> (CASSANDRA-14873)
>>  * Sstable min/max metadata can cause data loss (CASSANDRA-14861)
>>  * Dropped columns can cause reverse sstable iteration to return prematurely 
>> (CASSANDRA-14838)
>>  * Legacy sstables with  multi block range tombstones create invalid bound 
>> sequences (CASSANDRA-14823)
>>  * Handle failures in parallelAllSSTableOperation 
>> (cleanup/upgradesstables/etc) (CASSANDRA-14657)
>>  * sstableloader should use discovered broadcast address to connect 
>> intra-cluster (CASSANDRA-14522)
>>
>> ```
>>
>> I am asking because I have read online that : The "mc" is the SSTable
>> file version. This changes whenever a new release of Cassandra changes
>> anything in the way data is stored in any of the files listed in the table
>> above.
>> https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/
>>
>> Does anyone have any information about this ?
>>
>> Regards,
>>
>> Leo
>>
>


Re: SStable format change in 3.0.18 ?

2019-04-04 Thread Dmitry Saprykin
Hello,

I think it was done in the following issue: Sstable min/max metadata can
cause data loss (CASSANDRA-14861)

https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f
src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
<https://github.com/apache/cassandra/commit/d60c78358b6f599a83f3c112bfd6ce72c1129c9f#diff-62875acfa21fb24c7167a0a2d761780e>
:
129
// md (3.0.18, 3.11.4): corrected sstable min/max clustering

Dmitry Saprykin
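In other words, 'md' is simply a newer on-disk format version, not a different kind of file. A sketch of how to survey the versions on disk and force a rewrite, assuming the default data path and hypothetical keyspace/table names:

```
# Count sstable format versions for one table (e.g. "12 mc" / "3 md"):
ls /var/lib/cassandra/data/my_ks/my_table-*/ \
  | grep -- '-big-Data.db$' \
  | sed 's/-.*//' | sort | uniq -c

# Old-format sstables are rewritten as they get compacted anyway; to force it:
nodetool upgradesstables my_ks my_table   # only rewrites older-version sstables
```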

On Thu, Apr 4, 2019 at 11:23 AM Léo FERLIN SUTTON
 wrote:

> Hello !
>
> I have noticed something since I upgraded to cassandra 3.0.18.
>
> Before all my Sstable used to be named this way :
> ```
> mc-130817-big-CompressionInfo.db
> mc-130817-big-Data.db
> mc-130817-big-Digest.crc32
> mc-130817-big-Filter.db
> mc-130817-big-Index.db
> mc-130817-big-Statistics.db
> mc-130817-big-Summary.db
> mc-130817-big-TOC.txt
> ```
>
> Since the update I have a new type of files :
>
> ```
> md-20631-big-Statistics.db
> md-20631-big-Filter.db
> md-20631-big-TOC.txt
> md-20631-big-Summary.db
> md-20631-big-CompressionInfo.db
> md-20631-big-Data.db
> md-20631-big-Digest.crc32
> md-20631-big-Index.db
> ```
>
> Starting with `md` mixed with my the ancient format starting with "mc".
>
> Other than the name these files seems identical to regular Sstables. I
> haven't seen any information about this in the changelog :
> ``` (lines with "sstables" from the changelog)
>
>  * Fix handling of collection tombstones for dropped columns from legacy 
> sstables (CASSANDRA-14912)
>  * Fix missing rows when reading 2.1 SSTables with static columns in 3.0 
> (CASSANDRA-14873)
>  * Sstable min/max metadata can cause data loss (CASSANDRA-14861)
>  * Dropped columns can cause reverse sstable iteration to return prematurely 
> (CASSANDRA-14838)
>  * Legacy sstables with  multi block range tombstones create invalid bound 
> sequences (CASSANDRA-14823)
>  * Handle failures in parallelAllSSTableOperation 
> (cleanup/upgradesstables/etc) (CASSANDRA-14657)
>  * sstableloader should use discovered broadcast address to connect 
> intra-cluster (CASSANDRA-14522)
>
> ```
>
> I am asking because I have read online that : The "mc" is the SSTable file
> version. This changes whenever a new release of Cassandra changes anything
> in the way data is stored in any of the files listed in the table above.
> https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/
>
> Does anyone have any information about this ?
>
> Regards,
>
> Leo
>


Re: SStable format change in 3.0.18 ?

2019-04-04 Thread Jeff Jirsa
This is CASSANDRA-14861



-- 
Jeff Jirsa


> On Apr 4, 2019, at 8:23 AM, Léo FERLIN SUTTON  
> wrote:
> 
> Hello !
> 
> I have noticed something since I upgraded to cassandra 3.0.18.
> 
> Before all my Sstable used to be named this way :
> ```
> mc-130817-big-CompressionInfo.db
> mc-130817-big-Data.db
> mc-130817-big-Digest.crc32
> mc-130817-big-Filter.db
> mc-130817-big-Index.db
> mc-130817-big-Statistics.db
> mc-130817-big-Summary.db
> mc-130817-big-TOC.txt
> ```
> 
> Since the update I have a new type of files :
> 
> ```
> md-20631-big-Statistics.db
> md-20631-big-Filter.db
> md-20631-big-TOC.txt
> md-20631-big-Summary.db
> md-20631-big-CompressionInfo.db
> md-20631-big-Data.db
> md-20631-big-Digest.crc32
> md-20631-big-Index.db
> ```
> 
> Starting with `md` mixed with my the ancient format starting with "mc".
> 
> Other than the name these files seems identical to regular Sstables. I 
> haven't seen any information about this in the changelog :
> ``` (lines with "sstables" from the changelog)
>  * Fix handling of collection tombstones for dropped columns from legacy 
> sstables (CASSANDRA-14912)
>  * Fix missing rows when reading 2.1 SSTables with static columns in 3.0 
> (CASSANDRA-14873)
>  * Sstable min/max metadata can cause data loss (CASSANDRA-14861)
>  * Dropped columns can cause reverse sstable iteration to return prematurely 
> (CASSANDRA-14838)
>  * Legacy sstables with  multi block range tombstones create invalid bound 
> sequences (CASSANDRA-14823)
>  * Handle failures in parallelAllSSTableOperation 
> (cleanup/upgradesstables/etc) (CASSANDRA-14657)
>  * sstableloader should use discovered broadcast address to connect 
> intra-cluster (CASSANDRA-14522)
> ```
> 
> I am asking because I have read online that : The "mc" is the SSTable file 
> version. This changes whenever a new release of Cassandra changes anything in 
> the way data is stored in any of the files listed in the table above. 
> https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/
> 
> Does anyone have any information about this ?
>  
> Regards,
> 
> Leo


SStable format change in 3.0.18 ?

2019-04-04 Thread Léo FERLIN SUTTON
Hello !

I have noticed something since I upgraded to cassandra 3.0.18.

Before all my Sstable used to be named this way :
```
mc-130817-big-CompressionInfo.db
mc-130817-big-Data.db
mc-130817-big-Digest.crc32
mc-130817-big-Filter.db
mc-130817-big-Index.db
mc-130817-big-Statistics.db
mc-130817-big-Summary.db
mc-130817-big-TOC.txt
```

Since the update I have a new type of files :

```
md-20631-big-Statistics.db
md-20631-big-Filter.db
md-20631-big-TOC.txt
md-20631-big-Summary.db
md-20631-big-CompressionInfo.db
md-20631-big-Data.db
md-20631-big-Digest.crc32
md-20631-big-Index.db
```

Starting with `md`, mixed in with the old format starting with `mc`.

Other than the name, these files seem identical to regular SSTables. I
haven't seen any information about this in the changelog:
``` (lines with "sstables" from the changelog)

 * Fix handling of collection tombstones for dropped columns from
legacy sstables (CASSANDRA-14912)
 * Fix missing rows when reading 2.1 SSTables with static columns in
3.0 (CASSANDRA-14873)
 * Sstable min/max metadata can cause data loss (CASSANDRA-14861)
 * Dropped columns can cause reverse sstable iteration to return
prematurely (CASSANDRA-14838)
 * Legacy sstables with  multi block range tombstones create invalid
bound sequences (CASSANDRA-14823)
 * Handle failures in parallelAllSSTableOperation
(cleanup/upgradesstables/etc) (CASSANDRA-14657)
 * sstableloader should use discovered broadcast address to connect
intra-cluster (CASSANDRA-14522)

```

I am asking because I have read online that: The "mc" is the SSTable file
version. This changes whenever a new release of Cassandra changes anything
in the way data is stored in any of the files listed in the table above.
https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/

Does anyone have any information about this ?

Regards,

Leo


can i delete a sstable with Estimated droppable tombstones > 1, manually?

2019-03-19 Thread onmstester onmstester
Running:
SSTablemetadata /THE_KEYSPACE_DIR/mc-1421-big-Data.db



result was:

Estimated droppable tombstones: 1.2



Having STCS and data disk usage of 80% (not enough free space for a
normal compaction), is it OK to just: 1. stop Cassandra, 2. delete mc-1421*, and
then 3. start Cassandra?
Sent using https://www.zoho.com/mail/
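For context, a quick survey of the droppable-tombstone estimate across all sstables of the table could look like this (a sketch; the data path and file pattern are assumptions):

```
# Highest estimated droppable tombstone ratios first:
for f in /var/lib/cassandra/data/my_ks/my_table-*/mc-*-big-Data.db; do
  ratio=$(sstablemetadata "$f" | grep -i "droppable tombstones" | awk '{print $NF}')
  echo "$ratio  $f"
done | sort -rn | head
```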

Re: About the relationship between the sstable compaction and the read path

2019-01-09 Thread Jinhua Luo
> We stop at the memtable if we know that’s all we need. This depends on a lot 
> of factors (schema, point read vs slice, etc)

The code seems to search sstables without checking whether the query
is already satisfied by the memtable alone.
Could you point out the related code snippets for what you said?




Could you give a quick and simple answer to my questions about the complex types:

For collection, when I select a column of collection type, e.g.
map, to ensure the whole set of map fields is collected,
it is necessary to search in all sstables.

For cdt, it needs to ensure all fields of the cdt is collected.

For counter, it needs to merge all mutations distributed in all
sstables to give a final state of counter value.




Another related question: since the sstable only contains a partitioning
key index and a clustering key index (inline within the index file), but no
index for collections like map and set, does Cassandra need to iterate over
all fields to get a single field, or can it do a quick search based on a
sorted array?

Jeff Jirsa wrote on Wed, Jan 9, 2019 at 10:43 PM:
>
> You’re comparing single machine key/value stores to a distributed db with a 
> much richer data model (partitions/slices, statics, range reads, range 
> deletions, etc). They’re going to read very differently. Instead of 
> explaining why they’re not like rocks/ldb, how about you tell us what you’re 
> trying to do / learn so we can answer the real question?
>
> Few other notes inline.
>
> --
> Jeff Jirsa
>
>
> > On Jan 8, 2019, at 10:51 PM, Jinhua Luo  wrote:
> >
> > Thanks. Let me clarify my questions more.
> >
> > 1) For memtable, if the selected columns (assuming they are in simple
> > types) could be found in memtable only, why bother to search sstables
> > then? In leveldb and rocksdb, they would stop consulting sstables if
> > the memtable already fulfill the query.
>
> We stop at the memtable if we know that’s all we need. This depends on a lot 
> of factors (schema, point read vs slice, etc)
>
> >
> > 2) For STCS and LCS, obviously, the sstables are grouped in
> > generations (old mutations would promoted into next level or bucket),
> > so why not search the columns level by level (or bucket by bucket)
> > until all selected columns are collected? In leveldb and rocksdb, they
> > do in this way.
>
> They’re single machine and Cassandra isn’t. There’s no guarantee in Cassandra 
> that the small sstables in stcs or low levels in LCS are newest:
>
> - you can write arbitrary timestamps into the memtable
> - read repair can put old data in the memtable
> - streaming (bootstrap/repair) can put old data into new files
> - user processes (nodetool refresh) can put old data into new files
>
>
> >
> > 3) Could you explain the collection, cdt and counter types in more
> > detail? Does they need to iterate all sstables? Because they could not
> > be simply filtered by timestamp or value range.
> >
>
> I can’t (combination of time available and it’s been a long time since I’ve 
> dealt with that code and I don’t want to misspeak).
>
>
> > For collection, when I select a column of collection type, e.g.
> > map, to ensure the whole set of map fields is collected,
> > it is necessary to search in all sstables.
> >
> > For cdt, it needs to ensure all fields of the cdt is collected.
> >
> > For counter, it needs to merge all mutations distributed in all
> > sstables to give a final state of counter value.
> >
> > Am I correct? If so, then there three complex types seems less
> > efficient than simple types, right?
> >
> > Jeff Jirsa  于2019年1月8日周二 下午11:58写道:
> >>
> >> First:
> >>
> >> Compaction controls how sstables are combined but not how they’re read. 
> >> The read path (with one tiny exception) doesn’t know or care which 
> >> compaction strategy you’re using.
> >>
> >> A few more notes inline.
> >>
> >>> On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> The compaction would organize the sstables, e.g. with LCS, the
> >>> sstables would be categorized into levels, and the read path should
> >>> read sstables level by level until the read is fulfilled, correct?
> >>
> >> LCS levels are to minimize the number of sstables scanned - at most one 
> >> per level - but there’s no attempt to fulfill the read with low levels 
> >> beyond the filtering done by timestamp.
> >>
> >>>
> >>> For STCS, it would search sstables in buckets from smallest to largest?
> >>
> >> Nope. No attempt to do this.
> >>
>

Re: About the relationship between the sstable compaction and the read path

2019-01-09 Thread Jeff Jirsa
You’re comparing single machine key/value stores to a distributed db with a 
much richer data model (partitions/slices, statics, range reads, range 
deletions, etc). They’re going to read very differently. Instead of explaining 
why they’re not like rocks/ldb, how about you tell us what you’re trying to do 
/ learn so we can answer the real question?

Few other notes inline.

-- 
Jeff Jirsa


> On Jan 8, 2019, at 10:51 PM, Jinhua Luo  wrote:
> 
> Thanks. Let me clarify my questions more.
> 
> 1) For memtable, if the selected columns (assuming they are in simple
> types) could be found in memtable only, why bother to search sstables
> then? In leveldb and rocksdb, they would stop consulting sstables if
> the memtable already fulfill the query.

We stop at the memtable if we know that’s all we need. This depends on a lot of 
factors (schema, point read vs slice, etc)

> 
> 2) For STCS and LCS, obviously, the sstables are grouped in
> generations (old mutations would promoted into next level or bucket),
> so why not search the columns level by level (or bucket by bucket)
> until all selected columns are collected? In leveldb and rocksdb, they
> do in this way.

They’re single machine and Cassandra isn’t. There’s no guarantee in Cassandra 
that the small sstables in stcs or low levels in LCS are newest:

- you can write arbitrary timestamps into the memtable
- read repair can put old data in the memtable
- streaming (bootstrap/repair) can put old data into new files
- user processes (nodetool refresh) can put old data into new files
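One way to see this on disk (a sketch; data paths are assumptions): sstablemetadata reports the minimum and maximum cell timestamps per file, and a freshly written sstable can easily carry very old timestamps.

```
# Min/max timestamps per sstable -- file age says little about data age:
for f in /var/lib/cassandra/data/my_ks/my_table-*/*-big-Data.db; do
  echo "== $f"
  sstablemetadata "$f" | grep -i timestamp
done
```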


> 
> 3) Could you explain the collection, cdt and counter types in more
> detail? Does they need to iterate all sstables? Because they could not
> be simply filtered by timestamp or value range.
> 

I can’t (combination of time available and it’s been a long time since I’ve 
dealt with that code and I don’t want to misspeak).


> For collection, when I select a column of collection type, e.g.
> map, to ensure the whole set of map fields is collected,
> it is necessary to search in all sstables.
> 
> For cdt, it needs to ensure all fields of the cdt is collected.
> 
> For counter, it needs to merge all mutations distributed in all
> sstables to give a final state of counter value.
> 
> Am I correct? If so, then there three complex types seems less
> efficient than simple types, right?
> 
> Jeff Jirsa  于2019年1月8日周二 下午11:58写道:
>> 
>> First:
>> 
>> Compaction controls how sstables are combined but not how they’re read. The 
>> read path (with one tiny exception) doesn’t know or care which compaction 
>> strategy you’re using.
>> 
>> A few more notes inline.
>> 
>>> On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
>>> 
>>> Hi All,
>>> 
>>> The compaction would organize the sstables, e.g. with LCS, the
>>> sstables would be categorized into levels, and the read path should
>>> read sstables level by level until the read is fulfilled, correct?
>> 
>> LCS levels are to minimize the number of sstables scanned - at most one per 
>> level - but there’s no attempt to fulfill the read with low levels beyond 
>> the filtering done by timestamp.
>> 
>>> 
>>> For STCS, it would search sstables in buckets from smallest to largest?
>> 
>> Nope. No attempt to do this.
>> 
>>> 
>>> What about other compaction cases? They would iterate all sstables?
>> 
>> In all cases, we’ll use a combination of bloom filters and sstable metadata 
>> and indices to include / exclude sstables. If the bloom filter hits, we’ll 
>> consider things like timestamps and whether or not the min/max clustering of 
>> the sstable matches the slice we care about. We don’t consult the compaction 
>> strategy, though the compaction strategy may have (in the case of LCS or 
>> TWCS) placed the sstables into a state that makes this read less expensive.
>> 
>>> 
>>> But in the codes, I'm confused a lot:
>>> In 
>>> org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
>>> it seems that no matter whether the selected columns (except the
>>> collection/cdt and counter cases, let's assume here the selected
>>> columns are simple cell) are collected and satisfied, it would search
>>> both memtable and all sstables, regardless of the compaction strategy.
>> 
>> There’s another that includes timestamps that will do some smart-ish 
>> exclusion of sstables that aren’t needed for the read command.
>> 
>>> 
>>> Why?
>>> 
>>> Moreover, for collection/cdt (non-frozen) and counter types, it would
>>> need to iterate al

Re: About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jinhua Luo
Thanks. Let me clarify my questions more.

1) For the memtable: if the selected columns (assuming they are of simple
types) can be found in the memtable alone, why bother to search sstables
then? In leveldb and rocksdb, they stop consulting sstables if
the memtable already fulfills the query.

2) For STCS and LCS, obviously, the sstables are grouped in
generations (old mutations would be promoted into the next level or bucket),
so why not search the columns level by level (or bucket by bucket)
until all selected columns are collected? In leveldb and rocksdb, they
do it this way.

3) Could you explain the collection, cdt and counter types in more
detail? Do they need to iterate over all sstables? Because they cannot
be simply filtered by timestamp or value range.

For collection, when I select a column of collection type, e.g.
map, to ensure the whole set of map fields is collected,
it is necessary to search in all sstables.

For cdt, it needs to ensure all fields of the cdt is collected.

For counter, it needs to merge all mutations distributed in all
sstables to give a final state of counter value.

Am I correct? If so, then these three complex types seem less
efficient than simple types, right?

Jeff Jirsa wrote on Tue, Jan 8, 2019 at 11:58 PM:
>
> First:
>
> Compaction controls how sstables are combined but not how they’re read. The 
> read path (with one tiny exception) doesn’t know or care which compaction 
> strategy you’re using.
>
> A few more notes inline.
>
> > On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
> >
> > Hi All,
> >
> > The compaction would organize the sstables, e.g. with LCS, the
> > sstables would be categorized into levels, and the read path should
> > read sstables level by level until the read is fulfilled, correct?
>
> LCS levels are to minimize the number of sstables scanned - at most one per 
> level - but there’s no attempt to fulfill the read with low levels beyond the 
> filtering done by timestamp.
>
> >
> > For STCS, it would search sstables in buckets from smallest to largest?
>
> Nope. No attempt to do this.
>
> >
> > What about other compaction cases? They would iterate all sstables?
>
> In all cases, we’ll use a combination of bloom filters and sstable metadata 
> and indices to include / exclude sstables. If the bloom filter hits, we’ll 
> consider things like timestamps and whether or not the min/max clustering of 
> the sstable matches the slice we care about. We don’t consult the compaction 
> strategy, though the compaction strategy may have (in the case of LCS or 
> TWCS) placed the sstables into a state that makes this read less expensive.
>
> >
> > But in the codes, I'm confused a lot:
> > In 
> > org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
> > it seems that no matter whether the selected columns (except the
> > collection/cdt and counter cases, let's assume here the selected
> > columns are simple cell) are collected and satisfied, it would search
> > both memtable and all sstables, regardless of the compaction strategy.
>
> There’s another that includes timestamps that will do some smart-ish 
> exclusion of sstables that aren’t needed for the read command.
>
> >
> > Why?
> >
> > Moreover, for collection/cdt (non-frozen) and counter types, it would
> > need to iterate all sstable to ensure the whole set of the fields are
> > collected, correct? If so, such multi-cell or counter types are
> > heavyweight in performance, correct?
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jeff Jirsa
First: 

Compaction controls how sstables are combined but not how they’re read. The 
read path (with one tiny exception) doesn’t know or care which compaction 
strategy you’re using. 

A few more notes inline. 

> On Jan 8, 2019, at 3:04 AM, Jinhua Luo  wrote:
> 
> Hi All,
> 
> The compaction would organize the sstables, e.g. with LCS, the
> sstables would be categorized into levels, and the read path should
> read sstables level by level until the read is fulfilled, correct?

LCS levels are to minimize the number of sstables scanned - at most one per 
level - but there’s no attempt to fulfill the read with low levels beyond the 
filtering done by timestamp.

> 
> For STCS, it would search sstables in buckets from smallest to largest?

Nope. No attempt to do this. 

> 
> What about other compaction cases? They would iterate all sstables?

In all cases, we’ll use a combination of bloom filters and sstable metadata and 
indices to include / exclude sstables. If the bloom filter hits, we’ll consider 
things like timestamps and whether or not the min/max clustering of the sstable 
matches the slice we care about. We don’t consult the compaction strategy, 
though the compaction strategy may have (in the case of LCS or TWCS) placed the 
sstables into a state that makes this read less expensive.
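A practical way to observe the net effect is the per-read sstable histogram (a sketch; keyspace and table names are placeholders):

```
# "SSTables" column = number of sstables touched per read, by percentile:
nodetool tablehistograms my_keyspace my_table

# Bloom filter effectiveness for the same table:
nodetool tablestats my_keyspace.my_table | grep -i "bloom filter"
```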
 
> 
> But in the codes, I'm confused a lot:
> In 
> org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
> it seems that no matter whether the selected columns (except the
> collection/cdt and counter cases, let's assume here the selected
> columns are simple cell) are collected and satisfied, it would search
> both memtable and all sstables, regardless of the compaction strategy.

There’s another that includes timestamps that will do some smart-ish exclusion 
of sstables that aren’t needed for the read command.  

> 
> Why?
> 
> Moreover, for collection/cdt (non-frozen) and counter types, it would
> need to iterate all sstable to ensure the whole set of the fields are
> collected, correct? If so, such multi-cell or counter types are
> heavyweight in performance, correct?
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



About the relationship between the sstable compaction and the read path

2019-01-08 Thread Jinhua Luo
Hi All,

The compaction would organize the sstables, e.g. with LCS, the
sstables would be categorized into levels, and the read path should
read sstables level by level until the read is fulfilled, correct?

For STCS, it would search sstables in buckets from smallest to largest?

What about other compaction cases? They would iterate all sstables?

But in the codes, I'm confused a lot:
In 
org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
it seems that no matter whether the selected columns (except the
collection/cdt and counter cases, let's assume here the selected
columns are simple cell) are collected and satisfied, it would search
both memtable and all sstables, regardless of the compaction strategy.

Why?

Moreover, for collection/cdt (non-frozen) and counter types, it would
need to iterate over all sstables to ensure the whole set of fields is
collected, correct? If so, such multi-cell or counter types are
heavyweight performance-wise, correct?

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



sstable corruption and schema migration issues

2018-10-24 Thread David Payne
Which versions of Cassandra 2.x and 3.x are best for avoiding SSTable
corruption and schema migration slowness?

Is this a "Cassandra is not a set-it-and-forget-it system" kind of concept?


Re: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?))

2018-09-18 Thread Oleksandr Shulgin
On Tue, Sep 18, 2018 at 10:38 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

>
> any indications in Cassandra log about insufficient disk space during
> compactions?
>

Bingo!  The following was logged around the time compaction was started
(and I only looked around when it was finishing):

Not enough space for compaction, 284674.12MB estimated.  Reducing scope.
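For anyone hitting the same thing, a quick check across nodes might look like this (a sketch; the log and data paths are assumptions):

```
# Did compactions on other nodes get their scope reduced the same way?
grep -i "not enough space for compaction" /var/log/cassandra/system.log*

# Free space on the data volume, for comparison with the estimate:
df -h /var/lib/cassandra
```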

That still leaves the question of why the estimate doesn't take into account the
tombstones that will be dropped in the process, because the result actually takes
only slightly more than 100 GB in the end, as seen on the other nodes.

Thanks, Thomas!
--
Alex


RE: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?))

2018-09-18 Thread Steinmaurer, Thomas
Alex,

any indications in Cassandra log about insufficient disk space during 
compactions?

Thomas

From: Oleksandr Shulgin 
Sent: Tuesday, 18 September 2018 10:01
To: User 
Subject: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files 
(due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a 
or scrub?))

On Mon, Sep 17, 2018 at 4:29 PM Oleksandr Shulgin wrote:

Thanks for your reply!  Indeed it could be coming from single-SSTable 
compaction, this I didn't think about.  By any chance looking into 
compaction_history table could be useful to trace it down?

Hello,

Yet another unexpected thing we are seeing is that after a major compaction 
completed on one of the nodes there are two SSTables instead of only one (time 
is UTC):

-rw-r--r-- 1 999 root 99G Sep 18 00:13 mc-583-big-Data.db
-rw-r--r-- 1 999 root 70G Mar  8  2018 mc-74-big-Data.db

The more recent one must be the result of major compaction on this table, but 
why the other one from March was not included?

The logs don't help to understand the reason, and from compaction history on 
this node the following record seems to be the only trace:

@ Row 1
---+--
 id| b6feb180-bad7-11e8-9f42-f1a67c22839a
 bytes_in  | 223804299627
 bytes_out | 105322622473
 columnfamily_name | XXX
 compacted_at  | 2018-09-18 00:13:48+
 keyspace_name | YYY
 rows_merged   | {1: 31321943, 2: 11722759, 3: 382232, 4: 23405, 5: 2250, 
6: 134}

This also doesn't tell us a lot.

This has happened only on one node out of 10 where the same command was used to 
start major compaction on this table.

Any ideas what could be the reason?

For now we have just started major compaction again to ensure these two last 
tables are compacted together, but we would really like to understand the 
reason for this behavior.

Regards,
--
Alex



Major compaction ignoring one SSTable? (was Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?))

2018-09-18 Thread Oleksandr Shulgin
On Mon, Sep 17, 2018 at 4:29 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

>
> Thanks for your reply!  Indeed it could be coming from single-SSTable
> compaction, this I didn't think about.  By any chance looking into
> compaction_history table could be useful to trace it down?
>

Hello,

Yet another unexpected thing we are seeing is that after a major compaction
completed on one of the nodes there are two SSTables instead of only one
(time is UTC):

-rw-r--r-- 1 999 root 99G Sep 18 00:13 mc-583-big-Data.db
-rw-r--r-- 1 999 root 70G Mar  8  2018 mc-74-big-Data.db

The more recent one must be the result of major compaction on this table,
but why the other one from March was not included?

The logs don't help to understand the reason, and from compaction history
on this node the following record seems to be the only trace:

@ Row 1
---+--
 id| b6feb180-bad7-11e8-9f42-f1a67c22839a
 bytes_in  | 223804299627
 bytes_out | 105322622473
 columnfamily_name | XXX
 compacted_at  | 2018-09-18 00:13:48+
 keyspace_name | YYY
 rows_merged   | {1: 31321943, 2: 11722759, 3: 382232, 4: 23405, 5:
2250, 6: 134}

This also doesn't tell us a lot.

This has happened only on one node out of 10 where the same command was
used to start major compaction on this table.

Any ideas what could be the reason?

For now we have just started major compaction again to ensure these two
last tables are compacted together, but we would really like to understand
the reason for this behavior.

Regards,
--
Alex


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-18 Thread Oleksandr Shulgin
On Mon, Sep 17, 2018 at 4:41 PM Jeff Jirsa  wrote:

> Marcus’ idea of row lifting seems more likely, since you’re using STCS -
> it’s an optimization to “lift” expensive reads into a single sstable for
> future reads (if a read touches more than - I think - 4? sstables, we copy
> it back into the memtable so it’s flushed into a single sstable), so if you
> have STCS and you’re still doing reads, it could definitely be that.
>

A-ha, that's eye-opening: it could definitely be that.  Thanks for
explanation!

--
Alex


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-17 Thread Jeff Jirsa



> On Sep 17, 2018, at 7:29 AM, Oleksandr Shulgin  
> wrote:
> 
> On Mon, Sep 17, 2018 at 4:04 PM Jeff Jirsa  wrote:
>>> Again, given that the tables are not updated anymore from the application 
>>> and we have repaired them successfully multiple times already, how can it 
>>> be that any inconsistency would be found by read-repair or normal repair?
>>> 
>>> We have seen this on a number of nodes, including SSTables written at the 
>>> time there was guaranteed no repair running.
>> Not obvious to me where the sstable is coming from - you’d have to look in 
>> the logs. If it’s read repair, it’ll be created during a memtable flush. If 
>> it’s nodetool repair, it’ll be streamed in. It could also be compaction 
>> (especially tombstone compaction), in which case it’ll be in the compaction 
>> logs and it’ll have an sstable ancestor in the metadata.
> 
> Jeff,
> 
> Thanks for your reply!  Indeed it could be coming from single-SSTable 
> compaction, this I didn't think about.  By any chance looking into 
> compaction_history table could be useful to trace it down?
> 

Maybe. Also check your normal system / debug logs (depending on your version), 
which will usually tell you inputs and outputs
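A sketch of both checks (table name, generation number and log paths are hypothetical; the exact log wording varies by version):

```
# Per-node record of what was compacted, when, and how big it was:
nodetool compactionhistory | grep my_table

# The compaction log lines name the input and output sstable files
# ("Compacting [...]" / "Compacted (...)"); look for the surprising generation:
grep -iE "compact(ing|ed)" /var/log/cassandra/*.log* | grep "mc-1234"
```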

Marcus’ idea of row lifting seems more likely, since you’re using STCS - it’s 
an optimization to “lift” expensive reads into a single sstable for future 
reads (if a read touches more than - I think - 4? sstables, we copy it back 
into the memtable so it’s flushed into a single sstable), so if you have STCS 
and you’re still doing reads, it could definitely be that.

- Jeff

Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-17 Thread Oleksandr Shulgin
On Mon, Sep 17, 2018 at 4:04 PM Jeff Jirsa  wrote:

> Again, given that the tables are not updated anymore from the application
> and we have repaired them successfully multiple times already, how can it
> be that any inconsistency would be found by read-repair or normal repair?
>
> We have seen this on a number of nodes, including SSTables written at the
> time there was guaranteed no repair running.
>
> Not obvious to me where the sstable is coming from - you’d have to look in
> the logs. If it’s read repair, it’ll be created during a memtable flush. If
> it’s nodetool repair, it’ll be streamed in. It could also be compaction
> (especially tombstone compaction), in which case it’ll be in the compaction
> logs and it’ll have an sstable ancestor in the metadata.
>

Jeff,

Thanks for your reply!  Indeed it could be coming from single-SSTable
compaction, this I didn't think about.  By any chance looking into
compaction_history table could be useful to trace it down?

--
Alex


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-17 Thread Marcus Eriksson
It could also be https://issues.apache.org/jira/browse/CASSANDRA-2503

On Mon, Sep 17, 2018 at 4:04 PM Jeff Jirsa  wrote:

>
>
> On Sep 17, 2018, at 2:34 AM, Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
> On Tue, Sep 11, 2018 at 8:10 PM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, 11 Sep 2018, 19:26 Jeff Jirsa,  wrote:
>>
>>> Repair or read-repair
>>>
>>
>> Could you be more specific please?
>>
>> Why any data would be streamed in if there is no (as far as I can see)
>> possibilities for the nodes to have inconsistency?
>>
>
> Again, given that the tables are not updated anymore from the application
> and we have repaired them successfully multiple times already, how can it
> be that any inconsistency would be found by read-repair or normal repair?
>
> We have seen this on a number of nodes, including SSTables written at the
> time there was guaranteed no repair running.
>
>
> Not obvious to me where the sstable is coming from - you’d have to look in
> the logs. If it’s read repair, it’ll be created during a memtable flush. If
> it’s nodetool repair, it’ll be streamed in. It could also be compaction
> (especially tombstone compaction), in which case it’ll be in the compaction
> logs and it’ll have an sstable ancestor in the metadata.
>
>
>


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-17 Thread Jeff Jirsa


> On Sep 17, 2018, at 2:34 AM, Oleksandr Shulgin  
> wrote:
> 
>> On Tue, Sep 11, 2018 at 8:10 PM Oleksandr Shulgin 
>>  wrote:
>>> On Tue, 11 Sep 2018, 19:26 Jeff Jirsa,  wrote:
>>> Repair or read-repair
>> 
>> 
>> Could you be more specific please?
>> 
>> Why any data would be streamed in if there is no (as far as I can see) 
>> possibilities for the nodes to have inconsistency?
> 
> Again, given that the tables are not updated anymore from the application and 
> we have repaired them successfully multiple times already, how can it be that 
> any inconsistency would be found by read-repair or normal repair?
> 
> We have seen this on a number of nodes, including SSTables written at the 
> time there was guaranteed no repair running.
> 

Not obvious to me where the sstable is coming from - you’d have to look in the 
logs. If it’s read repair, it’ll be created during a memtable flush. If it’s 
nodetool repair, it’ll be streamed in. It could also be compaction (especially 
tombstone compaction), in which case it’ll be in the compaction logs and it’ll 
have an sstable ancestor in the metadata.




Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-17 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 8:10 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, 11 Sep 2018, 19:26 Jeff Jirsa,  wrote:
>
>> Repair or read-repair
>>
>
> Could you be more specific please?
>
> Why any data would be streamed in if there is no (as far as I can see)
> possibilities for the nodes to have inconsistency?
>

Again, given that the tables are not updated anymore from the application
and we have repaired them successfully multiple times already, how can it
be that any inconsistency would be found by read-repair or normal repair?

We have seen this on a number of nodes, including SSTables written at the
time there was guaranteed no repair running.

Regards,
--
Alex


Re: Scrub a single SSTable only?

2018-09-11 Thread Jeff Jirsa
Doing this can resurrect deleted data and violate consistency - if that’s a 
problem for you, it may be easier to treat the whole host as failed, run 
repairs and replace it.

-- 
Jeff Jirsa


> On Sep 11, 2018, at 2:41 PM, Rahul Singh  wrote:
> 
> What’s the RF for that data ? If you can manage downtime one node I’d 
> recommend just bringing it down, and then repairing after you delete the bad 
> file and bring it back up.
> 
> Rahul Singh
> Chief Executive Officer
> m 202.905.2818
> 
> Anant Corporation
> 1010 Wisconsin Ave NW, Suite 250
> Washington, D.C. 20007
> 
> We build and manage digital business technology platforms.
>> On Sep 11, 2018, 2:55 AM -0400, Steinmaurer, Thomas 
>> , wrote:
>> Hello,
>> 
>>  
>> 
>> is there a way to Online scrub a particular SSTable file only and not the 
>> entire column family?
>> 
>>  
>> 
>> According to the Cassandra logs we have a corrupted SSTable smallish 
>> compared to the entire data volume of the column family in question.
>> 
>>  
>> 
>> To my understanding, both, nodetool scrub and sstablescrub operate on the 
>> entire column family and can’t work on a single SSTable, right?
>> 
>>  
>> 
>> There is still the way to shutdown Cassandra and remove the file from disk, 
>> but ideally I want to have that as an online operation.
>> 
>>  
>> 
>> Perhaps there is something JMX based?
>> 
>>  
>> 
>> Thanks,
>> 
>> Thomas
>> 
>>  
>> 


Re: Scrub a single SSTable only?

2018-09-11 Thread Rahul Singh
What’s the RF for that data? If you can manage downtime on one node, I’d recommend
just bringing it down, and then repairing after you delete the bad file and
bring it back up.
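A minimal sketch of that approach (paths and the generation number are hypothetical; as Jeff notes elsewhere in the thread, removing an sstable can resurrect deleted data, so weigh this against replacing the node):

```
# On the affected node only, and only with RF > 1 and healthy replicas:
nodetool drain && sudo systemctl stop cassandra

# Move aside every component of the corrupted generation, not just -Data.db:
mkdir -p /var/lib/cassandra/quarantine
mv /var/lib/cassandra/data/my_ks/my_table-*/mc-1234-big-* /var/lib/cassandra/quarantine/

sudo systemctl start cassandra

# Re-sync the data that lived in the removed file from the other replicas:
nodetool repair my_ks my_table
```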

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Sep 11, 2018, 2:55 AM -0400, Steinmaurer, Thomas 
, wrote:
> Hello,
>
> is there a way to Online scrub a particular SSTable file only and not the 
> entire column family?
>
> According to the Cassandra logs we have a corrupted SSTable smallish compared 
> to the entire data volume of the column family in question.
>
> To my understanding, both, nodetool scrub and sstablescrub operate on the 
> entire column family and can’t work on a single SSTable, right?
>
> There is still the way to shutdown Cassandra and remove the file from disk, 
> but ideally I want to have that as an online operation.
>
> Perhaps there is something JMX based?
>
> Thanks,
> Thomas
>


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Oleksandr Shulgin
On Tue, 11 Sep 2018, 19:26 Jeff Jirsa,  wrote:

> Repair or read-repair
>

Jeff,

Could you be more specific please?

Why any data would be streamed in if there is no (as far as I can see)
possibilities for the nodes to have inconsistency?

--
Alex

On Tue, Sep 11, 2018 at 12:58 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
>>> thomas.steinmau...@dynatrace.com> wrote:
>>>
>>>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>>>> compact offers a ‘-s’ command-line option to split the output into files
>>>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>>>> anymore. By default, without -s, it is a single SSTable though.
>>>>
>>>
>>> Thanks Thomas, I've also spotted the option while testing this
>>> approach.  I understand that doing major compactions is generally not
>>> recommended, but do you see any real drawback of having a single SSTable
>>> file in case we stopped writing new data to the table?
>>>
>>
>> A related question is: given that we are not writing new data to these
>> tables, it would make sense to exclude them from the routine repair
>> regardless of the option we use in the end to remove the tombstones.
>>
>> However, I've just checked the timestamps of the SSTable files on one of
>> the nodes and to my surprise I can find some files written only a few weeks
>> ago (most of the files are half a year ago, which is expected because it
>> was the time we were adding this DC).  But we've stopped writing to the
>> tables about a year ago and we repair the cluster very week.
>>
>> What could explain that we suddenly see these new SSTable files?  They
>> shouldn't be there even due to overstreaming, because one would need to
>> find some differences in the Merkle tree in the first place, but I don't
>> see how that could actually happen in our case.
>>
>> Any ideas?
>>
>> Thanks,
>> --
>> Alex
>>
>>


Re: Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Jeff Jirsa
Repair or read-repair


On Tue, Sep 11, 2018 at 12:58 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
>> thomas.steinmau...@dynatrace.com> wrote:
>>
>>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>>> compact offers a ‘-s’ command-line option to split the output into files
>>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>>> anymore. By default, without -s, it is a single SSTable though.
>>>
>>
>> Thanks Thomas, I've also spotted the option while testing this approach.
>> I understand that doing major compactions is generally not recommended, but
>> do you see any real drawback of having a single SSTable file in case we
>> stopped writing new data to the table?
>>
>
> A related question is: given that we are not writing new data to these
> tables, it would make sense to exclude them from the routine repair
> regardless of the option we use in the end to remove the tombstones.
>
> However, I've just checked the timestamps of the SSTable files on one of
> the nodes and to my surprise I can find some files written only a few weeks
> ago (most of the files are half a year ago, which is expected because it
> was the time we were adding this DC).  But we've stopped writing to the
> tables about a year ago and we repair the cluster very week.
>
> What could explain that we suddenly see these new SSTable files?  They
> shouldn't be there even due to overstreaming, because one would need to
> find some differences in the Merkle tree in the first place, but I don't
> see how that could actually happen in our case.
>
> Any ideas?
>
> Thanks,
> --
> Alex
>
>


Fresh SSTable files (due to repair?) in a static table (was Re: Drop TTLd rows: upgradesstables -a or scrub?)

2018-09-11 Thread Oleksandr Shulgin
On Tue, Sep 11, 2018 at 9:47 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas <
> thomas.steinmau...@dynatrace.com> wrote:
>
>> As far as I remember, in newer Cassandra versions, with STCS, nodetool
>> compact offers a ‘-s’ command-line option to split the output into files
>> with 50%, 25% … in size, thus in this case, not a single largish SSTable
>> anymore. By default, without -s, it is a single SSTable though.
>>
>
> Thanks Thomas, I've also spotted the option while testing this approach.
> I understand that doing major compactions is generally not recommended, but
> do you see any real drawback of having a single SSTable file in case we
> stopped writing new data to the table?
>

A related question is: given that we are not writing new data to these
tables, it would make sense to exclude them from the routine repair
regardless of the option we use in the end to remove the tombstones.

However, I've just checked the timestamps of the SSTable files on one of
the nodes and to my surprise I can find some files written only a few weeks
ago (most of the files are from half a year ago, which is expected because that
was the time we were adding this DC).  But we stopped writing to the
tables about a year ago and we repair the cluster every week.

What could explain that we suddenly see these new SSTable files?  They
shouldn't be there even due to overstreaming, because one would need to
find some differences in the Merkle tree in the first place, but I don't
see how that could actually happen in our case.

Any ideas?

Thanks,
--
Alex
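
For reference, a minimal sketch of the split-output major compaction mentioned in
the quoted message above (keyspace and table names are placeholders):

    nodetool compact -s my_keyspace my_table

With -s, STCS writes the result as several sstables of roughly 50%, 25%, ... of the
data instead of a single large file. And since nodetool repair accepts an explicit
table list, a table that is no longer written to can simply be left off the routine
repair run, e.g.:

    nodetool repair --full my_keyspace active_table_1 active_table_2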


Scrub a single SSTable only?

2018-09-10 Thread Steinmaurer, Thomas
Hello,

is there a way to Online scrub a particular SSTable file only and not the 
entire column family?

According to the Cassandra logs we have a corrupted SSTable smallish compared 
to the entire data volume of the column family in question.

To my understanding, both, nodetool scrub and sstablescrub operate on the 
entire column family and can't work on a single SSTable, right?

There is still the way to shutdown Cassandra and remove the file from disk, but 
ideally I want to have that as an online operation.

Perhaps there is something JMX based?

Thanks,
Thomas



Re: SSTable Compression Ratio -1.0

2018-08-28 Thread Vitaliy Semochkin
Thank you ZAIDI, can you please explain why the mentioned ratio is negative?
On Tue, Aug 28, 2018 at 8:18 PM ZAIDI, ASAD A  wrote:
>
> Compression ratio is ratio of compression to its original size - smaller is 
> better; see it like compressed/uncompressed
> 1 would mean no change in size after compression!
>
>
>
> -Original Message-
> From: Vitaliy Semochkin [mailto:vitaliy...@gmail.com]
> Sent: Tuesday, August 28, 2018 12:03 PM
> To: user@cassandra.apache.org
> Subject: SSTable Compression Ratio -1.0
>
> Hello,
>
> nodetool tablestats my_kespace
> returns SSTable Compression Ratio -1.0
>
> Can someone explain, what does -1.0 mean?
>
> Regards,
> Vitaliy
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: SSTable Compression Ratio -1.0

2018-08-28 Thread ZAIDI, ASAD A
Compression ratio is the ratio of the compressed size to the original size - smaller is 
better; think of it as compressed/uncompressed.
A value of 1 would mean no change in size after compression!
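As a quick worked example (numbers made up): sstables holding 100 GB of data
before compression that occupy 25 GB on disk would report a ratio of 25/100 = 0.25.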



-Original Message-
From: Vitaliy Semochkin [mailto:vitaliy...@gmail.com] 
Sent: Tuesday, August 28, 2018 12:03 PM
To: user@cassandra.apache.org
Subject: SSTable Compression Ratio -1.0

Hello,

nodetool tablestats my_kespace
returns SSTable Compression Ratio -1.0

Can someone explain, what does -1.0 mean?

Regards,
Vitaliy

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


SSTable Compression Ratio -1.0

2018-08-28 Thread Vitaliy Semochkin
Hello,

nodetool tablestats my_kespace
returns SSTable Compression Ratio -1.0

Can someone explain, what does -1.0 mean?

Regards,
Vitaliy

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Infinite loop of single SSTable compactions

2018-07-30 Thread Martin Mačura
Hi Rahul,

the table TTL is 24 months. Oldest data is 22 months, so no
expirations yet.  Compacted partition maximum bytes: 17 GB - yeah, I
know that's not good, but we'll have to wait for the TTL to make it go
away.  More recent partitions are kept under 100 MB by bucketing.

The data model:
CREATE TABLE keyspace.table (
   group int,
   status int,
   bucket timestamp,
   ts timeuuid,
   source int,
...
   PRIMARY KEY ((group, status, bucket), ts, source)
) WITH CLUSTERING ORDER BY (ts DESC, source ASC)

There are no INSERT statements with the same 'ts' and 'source'
clustering columns.

Regards,

Martin
On Thu, Jul 26, 2018 at 12:16 PM Rahul Singh
 wrote:
>
> Few questions
>
>
> What is your maximumcompactedbytes across the cluster for this table ?
> What’s your TTL ?
> What does your data model look like as in what’s your PK?
>
> Rahul
> On Jul 25, 2018, 1:07 PM -0400, James Shaw , wrote:
>
> nodetool compactionstats  --- see compacting which table
> nodetool cfstats keyspace_name.table_name  --- check partition side, 
> tombstones
>
> go the data file directories:  look the data file size, timestamp,  --- 
> compaction will write to new temp file with _tmplink...,
>
> use sstablemetadata ...    look the largest or oldest one first
>
> of course, other factors may be,  like disk space, etc
> also what are compaction_throughput_mb_per_sec in cassandra.yaml
>
> Hope it is helpful.
>
> Thanks,
>
> James
>
>
>
>
> On Wed, Jul 25, 2018 at 4:18 AM, Martin Mačura  wrote:
>>
>> Hi,
>> we have a table which is being compacted all the time, with no change in 
>> size:
>>
>> Compaction History:
>> compacted_at            bytes_in    bytes_out   rows_merged
>> 2018-07-25T05:26:48.101 57248063878 57248063878 {1:11655}
>> 2018-07-25T01:09:47.346 57248063878 57248063878 {1:11655}
>> 2018-07-24T20:52:48.652 57248063878 57248063878 {1:11655}
>> 2018-07-24T16:36:01.828 57248063878 57248063878 {1:11655}
>> 2018-07-24T12:11:00.026 57248063878 57248063878 {1:11655}
>> 2018-07-24T07:28:04.686 57248063878 57248063878 {1:11655}
>> 2018-07-24T02:47:15.290 57248063878 57248063878 {1:11655}
>> 2018-07-23T22:06:17.410 57248137921 57248063878 {1:11655}
>>
>> We tried setting unchecked_tombstone_compaction to false, had no effect.
>>
>> The data is a time series, there will be only a handful of cell
>> tombstones present. The table has a TTL, but it'll be least a month
>> before it takes effect.
>>
>> Table properties:
>>AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
>> 'max_threshold': '32', 'min_threshold': '4',
>> 'unchecked_tombstone_compaction': 'false'}
>>AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>AND crc_check_chance = 1.0
>>AND dclocal_read_repair_chance = 0.0
>>AND default_time_to_live = 63072000
>>AND gc_grace_seconds = 10800
>>AND max_index_interval = 2048
>>AND memtable_flush_period_in_ms = 0
>>AND min_index_interval = 128
>>AND read_repair_chance = 0.0
>>AND speculative_retry = 'NONE';
>>
>> Thanks for any help
>>
>>
>> Martin
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Infinite loop of single SSTable compactions

2018-07-26 Thread Rahul Singh
Few questions


What is your maximumcompactedbytes across the cluster for this table ?
What’s your TTL ?
What does your data model look like as in what’s your PK?

Rahul
On Jul 25, 2018, 1:07 PM -0400, James Shaw , wrote:
> nodetool compactionstats  --- see compacting which table
> nodetool cfstats keyspace_name.table_name  --- check partition side, 
> tombstones
>
> go the data file directories:  look the data file size, timestamp,  --- 
> compaction will write to new temp file with _tmplink...,
>
> use sstablemetadata ...    look the largest or oldest one first
>
> of course, other factors may be,  like disk space, etc
> also what are compaction_throughput_mb_per_sec in cassandra.yaml
>
> Hope it is helpful.
>
> Thanks,
>
> James
>
>
>
>
> > On Wed, Jul 25, 2018 at 4:18 AM, Martin Mačura  wrote:
> > > Hi,
> > > we have a table which is being compacted all the time, with no change in 
> > > size:
> > >
> > > Compaction History:
> > > compacted_at            bytes_in    bytes_out   rows_merged
> > > 2018-07-25T05:26:48.101 57248063878 57248063878 {1:11655}
> > > 2018-07-25T01:09:47.346 57248063878 57248063878 {1:11655}
> > > 2018-07-24T20:52:48.652 57248063878 57248063878 {1:11655}
> > > 2018-07-24T16:36:01.828 57248063878 57248063878 {1:11655}
> > > 2018-07-24T12:11:00.026 57248063878 57248063878 {1:11655}
> > > 2018-07-24T07:28:04.686 57248063878 57248063878 {1:11655}
> > > 2018-07-24T02:47:15.290 57248063878 57248063878 {1:11655}
> > > 2018-07-23T22:06:17.410 57248137921 57248063878 {1:11655}
> > >
> > > We tried setting unchecked_tombstone_compaction to false, had no effect.
> > >
> > > The data is a time series, there will be only a handful of cell
> > > tombstones present. The table has a TTL, but it'll be least a month
> > > before it takes effect.
> > >
> > > Table properties:
> > >    AND compaction = {'class':
> > > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> > > 'max_threshold': '32', 'min_threshold': '4',
> > > 'unchecked_tombstone_compaction': 'false'}
> > >    AND compression = {'chunk_length_in_kb': '64', 'class':
> > > 'org.apache.cassandra.io.compress.LZ4Compressor'}
> > >    AND crc_check_chance = 1.0
> > >    AND dclocal_read_repair_chance = 0.0
> > >    AND default_time_to_live = 63072000
> > >    AND gc_grace_seconds = 10800
> > >    AND max_index_interval = 2048
> > >    AND memtable_flush_period_in_ms = 0
> > >    AND min_index_interval = 128
> > >    AND read_repair_chance = 0.0
> > >    AND speculative_retry = 'NONE';
> > >
> > > Thanks for any help
> > >
> > >
> > > Martin
> > >
> > > -
> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: user-h...@cassandra.apache.org
> > >
>


Re: Infinite loop of single SSTable compactions

2018-07-25 Thread James Shaw
nodetool compactionstats  --- see which table is compacting
nodetool cfstats keyspace_name.table_name  --- check partition size,
tombstones

go to the data file directories:  look at the data file size, timestamp,  ---
compaction will write to a new temp file with _tmplink...,

use sstablemetadata ...    look at the largest or oldest one first

of course, other factors may be involved,  like disk space, etc
also check what compaction_throughput_mb_per_sec is set to in cassandra.yaml

Hope it is helpful.

Thanks,

James
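
For reference, a minimal sketch of the sstablemetadata check suggested above, to
spot the files with the highest droppable-tombstone estimate (the data path is a
placeholder; run it on a node with the Cassandra tools installed):

    for f in /var/lib/cassandra/data/my_keyspace/my_table-*/*-big-Data.db; do
      echo "$f"
      sstablemetadata "$f" | grep -i "droppable tombstones"
    done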




On Wed, Jul 25, 2018 at 4:18 AM, Martin Mačura  wrote:

> Hi,
> we have a table which is being compacted all the time, with no change in
> size:
>
> Compaction History:
> compacted_at            bytes_in    bytes_out   rows_merged
> 2018-07-25T05:26:48.101 57248063878 57248063878 {1:11655}
> 2018-07-25T01:09:47.346 57248063878 57248063878 {1:11655}
> 2018-07-24T20:52:48.652 57248063878 57248063878 {1:11655}
> 2018-07-24T16:36:01.828 57248063878 57248063878 {1:11655}
> 2018-07-24T12:11:00.026 57248063878 57248063878 {1:11655}
> 2018-07-24T07:28:04.686 57248063878 57248063878 {1:11655}
> 2018-07-24T02:47:15.290 57248063878 57248063878 {1:11655}
> 2018-07-23T22:06:17.410 57248137921 57248063878 {1:11655}
>
> We tried setting unchecked_tombstone_compaction to false, had no effect.
>
> The data is a time series, there will be only a handful of cell
> tombstones present. The table has a TTL, but it'll be least a month
> before it takes effect.
>
> Table properties:
>AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> 'max_threshold': '32', 'min_threshold': '4',
> 'unchecked_tombstone_compaction': 'false'}
>AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>AND crc_check_chance = 1.0
>AND dclocal_read_repair_chance = 0.0
>AND default_time_to_live = 63072000
>AND gc_grace_seconds = 10800
>AND max_index_interval = 2048
>AND memtable_flush_period_in_ms = 0
>AND min_index_interval = 128
>AND read_repair_chance = 0.0
>AND speculative_retry = 'NONE';
>
> Thanks for any help
>
>
> Martin
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Infinite loop of single SSTable compactions

2018-07-25 Thread Martin Mačura
Hi,
we have a table which is being compacted all the time, with no change in size:

Compaction History:
compacted_at            bytes_in    bytes_out   rows_merged
2018-07-25T05:26:48.101 57248063878 57248063878 {1:11655}
2018-07-25T01:09:47.346 57248063878 57248063878 {1:11655}
2018-07-24T20:52:48.652 57248063878 57248063878 {1:11655}
2018-07-24T16:36:01.828 57248063878 57248063878 {1:11655}
2018-07-24T12:11:00.026 57248063878 57248063878 {1:11655}
2018-07-24T07:28:04.686 57248063878 57248063878 {1:11655}
2018-07-24T02:47:15.290 57248063878 57248063878 {1:11655}
2018-07-23T22:06:17.410 57248137921 57248063878 {1:11655}

We tried setting unchecked_tombstone_compaction to false, had no effect.

The data is a time series, there will be only a handful of cell
tombstones present. The table has a TTL, but it'll be least a month
before it takes effect.

Table properties:
   AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
'max_threshold': '32', 'min_threshold': '4',
'unchecked_tombstone_compaction': 'false'}
   AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
   AND crc_check_chance = 1.0
   AND dclocal_read_repair_chance = 0.0
   AND default_time_to_live = 63072000
   AND gc_grace_seconds = 10800
   AND max_index_interval = 2048
   AND memtable_flush_period_in_ms = 0
   AND min_index_interval = 128
   AND read_repair_chance = 0.0
   AND speculative_retry = 'NONE';

Thanks for any help


Martin

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Maximum SSTable size

2018-06-27 Thread Jeff Jirsa
That’s a target and not a maximum, and it can go over that even on LCS with 
wide partitions

In general, the practical limit is hundreds of gigabytes (I’ve seen 500-600 a 
lot, but not more than a terabyte very often). You’ll typically hit other 
issues and need to tune some params before you hit hard caps (unless you know 
what you’re doing).

-- 
Jeff Jirsa


> On Jun 27, 2018, at 3:11 PM, ZAIDI, ASAD A  wrote:
> 
> With  Leveled compaction strategy  , you can set target size with 
> sstable_size_in_mb attribute however actual size can still be larger than 
> target given large partition size.
>  
> Thanks/Asad
>  
> From: Lucas Benevides [mailto:lu...@maurobenevides.com.br] 
> Sent: Wednesday, June 27, 2018 7:02 AM
> To: user@cassandra.apache.org
> Subject: Maximum SSTable size
>  
> Hello Community,
>  
> Is there a maximum SSTable Size?  
> If there is not, does it go up to the maximum Operational System values? 
>  
> Thanks in advance,
> Lucas Benevides


Re: [External] Maximum SSTable size

2018-06-27 Thread Tom van der Woerdt
I’ve had SSTables as big as 11TB. It works, read performance is fine. But,
compaction is hell, because you’ll need twice that in disk space and it
will take many hours 🙂

Avoid large SSTables unless you really know what you’re doing. LCS is a
great default for almost every workload, especially if your cluster has a
single large table. STCS is the actual Cassandra default but it often
causes more trouble than it solves, because of large SSTables 🙂

Hope that helps!

Tom


On Wed, 27 Jun 2018 at 08:02, Lucas Benevides 
wrote:

> Hello Community,
>
> Is there a maximum SSTable Size?
> If there is not, does it go up to the maximum Operational System values?
>
> Thanks in advance,
> Lucas Benevides
>
-- 
Tom van der Woerdt
Site Reliability Engineer

Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207153426


RE: Maximum SSTable size

2018-06-27 Thread ZAIDI, ASAD A
With  Leveled compaction 
strategy<http://cassandra.apache.org/doc/latest/operating/compaction.html#lcs>  
, you can set target size with sstable_size_in_mb attribute however actual size 
can still be larger than target given large partition size.

Thanks/Asad
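
For reference, a minimal sketch of setting that LCS target via CQL (keyspace/table
are placeholders; 160 MB is the usual default for sstable_size_in_mb):

    ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': '160'};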

From: Lucas Benevides [mailto:lu...@maurobenevides.com.br]
Sent: Wednesday, June 27, 2018 7:02 AM
To: user@cassandra.apache.org
Subject: Maximum SSTable size

Hello Community,

Is there a maximum SSTable Size?
If there is not, does it go up to the maximum Operational System values?

Thanks in advance,
Lucas Benevides


Maximum SSTable size

2018-06-27 Thread Lucas Benevides
Hello Community,

Is there a maximum SSTable Size?
If there is not, does it go up to the maximum Operational System values?

Thanks in advance,
Lucas Benevides


Re: Snapshot SSTable modified??

2018-05-29 Thread Max C.
Oh, thanks Elliott for the explanation!  I had no idea about that little tidbit 
concerning ctime.   Now it all makes sense!

- Max

> On May 28, 2018, at 10:24 pm, Elliott Sims  wrote:
> 
> Unix timestamps are a bit odd.  "mtime/Modify" is file changes, 
> "ctime/Change/(sometimes called create)" is file metadata changes, and a link 
> count change is a metadata change.  This seems like an odd decision on the 
> part of GNU tar, but presumably there's a good reason for it.
> 
> When the original sstable is compacted away, it's removed and therefore the 
> link count on the snapshot file is decremented.  The file's contents haven't 
> changed so mtime is identical, but ctime does get updated.  BSDtar doesn't 
> seem to interpret link count changes as a file change, so it's pretty 
> effective as a workaround.
> 
> 
> 
> On Fri, May 25, 2018 at 8:00 PM, Max C  <mailto:mc_cassan...@core43.com>> wrote:
> I looked at the source code for GNU tar, and it looks for a change in the 
> create time or (more likely) a change in the size.
> 
> This seems very strange to me — I would think that creating a snapshot would 
> cause a flush and then once the SSTables are written, hardlinks would be 
> created and the SSTables wouldn't be written to after that.
> 
> Our solution is to wait 5 minutes and retry the tar if an error occurs.  This 
> isn't ideal - but it's the best I could come up with.  :-/
> 
> Thanks Jeff & others for your responses.
> 
> - Max
> 
>> On May 25, 2018, at 5:05pm, Elliott Sims > <mailto:elli...@backblaze.com>> wrote:
>> 
>> I've run across this problem before - it seems like GNU tar interprets 
>> changes in the link count as changes to the file, so if the file gets 
>> compacted mid-backup it freaks out even if the file contents are unchanged.  
>> I worked around it by just using bsdtar instead.
>> 
>> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth > <mailto:nitankai...@gmail.com>> wrote:
>> Jeff,
>> 
>> Shouldn't Snapshot get consistent state of sstables? -tmp file shouldn't 
>> impact backup operation right?
>> 
>> 
>> Regards,
>> Nitan K.
>> Cassandra and Oracle Architect/SME
>> Datastax Certified Cassandra expert
>> Oracle 10g Certified
>> 
>> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa > <mailto:jji...@gmail.com>> wrote:
>> In versions before 3.0, sstables were written with a -tmp filename and 
>> copied/moved to the final filename when complete. This changes in 3.0 - we 
>> write into the file with the final name, and have a journal/log to let uss 
>> know when it's done/final/live.
>> 
>> Therefore, you can no longer just watch for a -Data.db file to be created 
>> and uploaded - you have to watch the log to make sure it's not being written.
>> 
>> 
>> On Wed, May 23, 2018 at 2:18 PM, Max C. > <mailto:mc_cassan...@core43.com>> wrote:
>> Hi Everyone,
>> 
>> We’ve noticed a few times in the last few weeks that when we’re doing 
>> backups, tar has complained with messages like this:
>> 
>> tar: 
>> /var/lib/cassandra/data/mars/test_instances_by_test_id-6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>>  file changed as we read it
>> 
>> Any idea what might be causing this?
>> 
>> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our 
>> backup process:
>> 
>> 
>> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
>> nodetool snapshot -t $SNAPSHOT_NAME
>> 
>> for each keyspace
>> - dump schema to “schema.cql"
>> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz 
>> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>> 
>> nodetool clearsnapshot -t $SNAPSHOT_NAME
>> 
>> Thanks.
>> 
>> - Max
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
>> <mailto:user-unsubscr...@cassandra.apache.org>
>> For additional commands, e-mail: user-h...@cassandra.apache.org 
>> <mailto:user-h...@cassandra.apache.org>
>> 
>> 
>> 
>> 
> 
> 



Re: Snapshot SSTable modified??

2018-05-28 Thread Elliott Sims
Unix timestamps are a bit odd.  "mtime/Modify" is file changes,
"ctime/Change/(sometimes called create)" is file metadata changes, and a
link count change is a metadata change.  This seems like an odd decision on
the part of GNU tar, but presumably there's a good reason for it.

When the original sstable is compacted away, it's removed and therefore the
link count on the snapshot file is decremented.  The file's contents
haven't changed so mtime is identical, but ctime does get updated.  BSDtar
doesn't seem to interpret link count changes as a file change, so it's
pretty effective as a workaround.
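
For reference, a minimal sketch of that workaround, reusing the variables from
Max's backup pseudocode below (assumes bsdtar is available on the host):

    bsdtar -czf /file_server/backup_${HOSTNAME}_${KEYSPACE}_YYYYMMDD_HHMMSS.tgz \
        schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME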



On Fri, May 25, 2018 at 8:00 PM, Max C  wrote:

> I looked at the source code for GNU tar, and it looks for a change in the
> create time or (more likely) a change in the size.
>
> This seems very strange to me — I would think that creating a snapshot
> would cause a flush and then once the SSTables are written, hardlinks would
> be created and the SSTables wouldn't be written to after that.
>
> Our solution is to wait 5 minutes and retry the tar if an error occurs.
> This isn't ideal - but it's the best I could come up with.  :-/
>
> Thanks Jeff & others for your responses.
>
> - Max
>
> On May 25, 2018, at 5:05pm, Elliott Sims  wrote:
>
> I've run across this problem before - it seems like GNU tar interprets
> changes in the link count as changes to the file, so if the file gets
> compacted mid-backup it freaks out even if the file contents are
> unchanged.  I worked around it by just using bsdtar instead.
>
> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> Shouldn't Snapshot get consistent state of sstables? -tmp file shouldn't
>> impact backup operation right?
>>
>>
>> Regards,
>> Nitan K.
>> Cassandra and Oracle Architect/SME
>> Datastax Certified Cassandra expert
>> Oracle 10g Certified
>>
>> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa  wrote:
>>
>>> In versions before 3.0, sstables were written with a -tmp filename and
>>> copied/moved to the final filename when complete. This changes in 3.0 - we
>>> write into the file with the final name, and have a journal/log to let uss
>>> know when it's done/final/live.
>>>
>>> Therefore, you can no longer just watch for a -Data.db file to be
>>> created and uploaded - you have to watch the log to make sure it's not
>>> being written.
>>>
>>>
>>> On Wed, May 23, 2018 at 2:18 PM, Max C.  wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> We’ve noticed a few times in the last few weeks that when we’re doing
>>>> backups, tar has complained with messages like this:
>>>>
>>>> tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-6a944
>>>> 0a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>>>> file changed as we read it
>>>>
>>>> Any idea what might be causing this?
>>>>
>>>> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of
>>>> our backup process:
>>>>
>>>> 
>>>> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
>>>> nodetool snapshot -t $SNAPSHOT_NAME
>>>>
>>>> for each keyspace
>>>> - dump schema to “schema.cql"
>>>> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz
>>>> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>>>>
>>>> nodetool clearsnapshot -t $SNAPSHOT_NAME
>>>>
>>>> Thanks.
>>>>
>>>> - Max
>>>> -
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>
>>>>
>>>
>>
>
>


Re: Snapshot SSTable modified??

2018-05-25 Thread Max C
I looked at the source code for GNU tar, and it looks for a change in the 
create time or (more likely) a change in the size.

This seems very strange to me — I would think that creating a snapshot would 
cause a flush and then once the SSTables are written, hardlinks would be 
created and the SSTables wouldn't be written to after that.

Our solution is to wait 5 minutes and retry the tar if an error occurs.  This 
isn't ideal - but it's the best I could come up with.  :-/

Thanks Jeff & others for your responses.

- Max

> On May 25, 2018, at 5:05pm, Elliott Sims  wrote:
> 
> I've run across this problem before - it seems like GNU tar interprets 
> changes in the link count as changes to the file, so if the file gets 
> compacted mid-backup it freaks out even if the file contents are unchanged.  
> I worked around it by just using bsdtar instead.
> 
> On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth  > wrote:
> Jeff,
> 
> Shouldn't Snapshot get consistent state of sstables? -tmp file shouldn't 
> impact backup operation right?
> 
> 
> Regards,
> Nitan K.
> Cassandra and Oracle Architect/SME
> Datastax Certified Cassandra expert
> Oracle 10g Certified
> 
> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa  > wrote:
> In versions before 3.0, sstables were written with a -tmp filename and 
> copied/moved to the final filename when complete. This changes in 3.0 - we 
> write into the file with the final name, and have a journal/log to let uss 
> know when it's done/final/live.
> 
> Therefore, you can no longer just watch for a -Data.db file to be created and 
> uploaded - you have to watch the log to make sure it's not being written.
> 
> 
> On Wed, May 23, 2018 at 2:18 PM, Max C.  > wrote:
> Hi Everyone,
> 
> We’ve noticed a few times in the last few weeks that when we’re doing 
> backups, tar has complained with messages like this:
> 
> tar: 
> /var/lib/cassandra/data/mars/test_instances_by_test_id-6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>  file changed as we read it
> 
> Any idea what might be causing this?
> 
> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our 
> backup process:
> 
> 
> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
> nodetool snapshot -t $SNAPSHOT_NAME
> 
> for each keyspace
> - dump schema to “schema.cql"
> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz 
> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
> 
> nodetool clearsnapshot -t $SNAPSHOT_NAME
> 
> Thanks.
> 
> - Max
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> 
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> 
> 
> 
> 
> 



Re: Snapshot SSTable modified??

2018-05-25 Thread Elliott Sims
I've run across this problem before - it seems like GNU tar interprets
changes in the link count as changes to the file, so if the file gets
compacted mid-backup it freaks out even if the file contents are
unchanged.  I worked around it by just using bsdtar instead.

On Thu, May 24, 2018 at 6:08 AM, Nitan Kainth  wrote:

> Jeff,
>
> Shouldn't Snapshot get consistent state of sstables? -tmp file shouldn't
> impact backup operation right?
>
>
> Regards,
> Nitan K.
> Cassandra and Oracle Architect/SME
> Datastax Certified Cassandra expert
> Oracle 10g Certified
>
> On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa  wrote:
>
>> In versions before 3.0, sstables were written with a -tmp filename and
>> copied/moved to the final filename when complete. This changes in 3.0 - we
>> write into the file with the final name, and have a journal/log to let uss
>> know when it's done/final/live.
>>
>> Therefore, you can no longer just watch for a -Data.db file to be created
>> and uploaded - you have to watch the log to make sure it's not being
>> written.
>>
>>
>> On Wed, May 23, 2018 at 2:18 PM, Max C.  wrote:
>>
>>> Hi Everyone,
>>>
>>> We’ve noticed a few times in the last few weeks that when we’re doing
>>> backups, tar has complained with messages like this:
>>>
>>> tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-6a944
>>> 0a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>>> file changed as we read it
>>>
>>> Any idea what might be causing this?
>>>
>>> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our
>>> backup process:
>>>
>>> 
>>> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
>>> nodetool snapshot -t $SNAPSHOT_NAME
>>>
>>> for each keyspace
>>> - dump schema to “schema.cql"
>>> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz
>>> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>>>
>>> nodetool clearsnapshot -t $SNAPSHOT_NAME
>>>
>>> Thanks.
>>>
>>> - Max
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>
>


Re: Snapshot SSTable modified??

2018-05-24 Thread Nitan Kainth
Jeff,

Shouldn't Snapshot get consistent state of sstables? -tmp file shouldn't
impact backup operation right?


Regards,
Nitan K.
Cassandra and Oracle Architect/SME
Datastax Certified Cassandra expert
Oracle 10g Certified

On Wed, May 23, 2018 at 6:26 PM, Jeff Jirsa  wrote:

> In versions before 3.0, sstables were written with a -tmp filename and
> copied/moved to the final filename when complete. This changes in 3.0 - we
> write into the file with the final name, and have a journal/log to let uss
> know when it's done/final/live.
>
> Therefore, you can no longer just watch for a -Data.db file to be created
> and uploaded - you have to watch the log to make sure it's not being
> written.
>
>
> On Wed, May 23, 2018 at 2:18 PM, Max C.  wrote:
>
>> Hi Everyone,
>>
>> We’ve noticed a few times in the last few weeks that when we’re doing
>> backups, tar has complained with messages like this:
>>
>> tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-6a944
>> 0a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
>> file changed as we read it
>>
>> Any idea what might be causing this?
>>
>> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our
>> backup process:
>>
>> 
>> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
>> nodetool snapshot -t $SNAPSHOT_NAME
>>
>> for each keyspace
>> - dump schema to “schema.cql"
>> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz
>> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>>
>> nodetool clearsnapshot -t $SNAPSHOT_NAME
>>
>> Thanks.
>>
>> - Max
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>


Re: Snapshot SSTable modified??

2018-05-23 Thread Jeff Jirsa
In versions before 3.0, sstables were written with a -tmp filename and
copied/moved to the final filename when complete. This changes in 3.0 - we
write into the file with the final name, and have a journal/log to let uss
know when it's done/final/live.

Therefore, you can no longer just watch for a -Data.db file to be created
and uploaded - you have to watch the log to make sure it's not being
written.


On Wed, May 23, 2018 at 2:18 PM, Max C.  wrote:

> Hi Everyone,
>
> We’ve noticed a few times in the last few weeks that when we’re doing
> backups, tar has complained with messages like this:
>
> tar: /var/lib/cassandra/data/mars/test_instances_by_test_id-
> 6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
> file changed as we read it
>
> Any idea what might be causing this?
>
> We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our
> backup process:
>
> 
> SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
> nodetool snapshot -t $SNAPSHOT_NAME
>
> for each keyspace
> - dump schema to “schema.cql"
> - tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz
> schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME
>
> nodetool clearsnapshot -t $SNAPSHOT_NAME
>
> Thanks.
>
> - Max
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Snapshot SSTable modified??

2018-05-23 Thread Max C.
Hi Everyone,

We’ve noticed a few times in the last few weeks that when we’re doing backups, 
tar has complained with messages like this:

tar: 
/var/lib/cassandra/data/mars/test_instances_by_test_id-6a9440a04cc111e8878675f1041d7e1c/snapshots/backup_20180523_024502/mb-63-big-Data.db:
 file changed as we read it

Any idea what might be causing this?

We’re running Cassandra 3.0.8 on RHEL 7.  Here’s rough pseudocode of our backup 
process:


SNAPSHOT_NAME=backup_YYYMMDD_HHMMSS
nodetool snapshot -t $SNAPSHOT_NAME

for each keyspace
- dump schema to “schema.cql"
- tar -czf /file_server/backup_$HOSTNAME_$KEYSPACE_MMDD_HHMMSS.tgz 
schema.cql /var/lib/cassandra/data/$KEYSPACE/*/snapshots/$SNAPSHOT_NAME

nodetool clearsnapshot -t $SNAPSHOT_NAME

Thanks.

- Max
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: SSTable count in Nodetool tablestats(LevelCompactionStrategy)

2018-04-20 Thread Vishal1.Sharma
I used version: 3.11.2

I want to add that both counts (SSTables, and the sum of the numbers shown per 
level) change after some time and become equal (i.e. the mismatch does not last 
forever), which has led me to believe that the mismatch happens only while a 
compaction is in progress; once the compaction is complete, the counts become 
equal.

Regards,
Vishal Sharma
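
One way to confirm that pattern is to capture both outputs back to back while the
mismatch is visible (the table name is a placeholder):

    nodetool compactionstats
    nodetool tablestats my_keyspace.my_table | grep -i sstable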

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Friday, April 20, 2018 12:27 PM
To: User
Subject: Re: SSTable count in Nodetool tablestats(LevelCompactionStrategy)

I'm currently investigating this issue on one of our clusters (but much worse, 
we're seeing >100 SSTables and only 2 in the levels) on 3.11.1. What version 
are you using? It's definitely a bug.

On 17 April 2018 at 10:09, 
mailto:vishal1.sha...@ril.com>> wrote:
Dear Community,

One of the tables in my keyspace is using LevelCompactionStrategy and when I 
used the nodetool tablestats keyspace.table_name command, I found some mismatch 
in the count of SSTables displayed at 2 different places. Please refer the 
attached image.

The command is giving SSTable count = 6 but if you add the numbers shown 
against SSTables in each level, then that comes out as 5. Why is there a 
difference?

Thanks and regards,
Vishal Sharma

"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s), are confidential and may be 
privileged. If you are not the intended recipient, you are hereby notified that 
any review, re-transmission, conversion to hard copy, copying, circulation or 
other use of this message and any attachments is strictly prohibited. If you 
are not the intended recipient, please notify the sender immediately by return 
email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. The company cannot accept responsibility 
for any loss or damage arising from the use of this email or attachment."


-
To unsubscribe, e-mail: 
user-unsubscr...@cassandra.apache.org<mailto:user-unsubscr...@cassandra.apache.org>
For additional commands, e-mail: 
user-h...@cassandra.apache.org<mailto:user-h...@cassandra.apache.org>

"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."


Re: SSTable count in Nodetool tablestats(LevelCompactionStrategy)

2018-04-19 Thread kurt greaves
I'm currently investigating this issue on one of our clusters (but much
worse, we're seeing >100 SSTables and only 2 in the levels) on 3.11.1. What
version are you using? It's definitely a bug.

On 17 April 2018 at 10:09,  wrote:

> Dear Community,
>
>
>
> One of the tables in my keyspace is using LevelCompactionStrategy and when
> I used the nodetool tablestats keyspace.table_name command, I found some
> mismatch in the count of SSTables displayed at 2 different places. Please
> refer the attached image.
>
>
>
> The command is giving SSTable count = 6 but if you add the numbers shown
> against SSTables in each level, then that comes out as 5. Why is there a
> difference?
>
>
>
> Thanks and regards,
>
> Vishal Sharma
>
>
> "*Confidentiality Warning*: This message and any attachments are intended
> only for the use of the intended recipient(s), are confidential and may be
> privileged. If you are not the intended recipient, you are hereby notified
> that any review, re-transmission, conversion to hard copy, copying,
> circulation or other use of this message and any attachments is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately by return email and delete this message and any attachments
> from your system.
>
> *Virus Warning:* Although the company has taken reasonable precautions to
> ensure no viruses are present in this email. The company cannot accept
> responsibility for any loss or damage arising from the use of this email or
> attachment."
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


SSTable count in Nodetool tablestats(LevelCompactionStrategy)

2018-04-17 Thread Vishal1.Sharma
Dear Community,

One of the tables in my keyspace is using LevelCompactionStrategy and when I 
used the nodetool tablestats keyspace.table_name command, I found a mismatch 
in the count of SSTables displayed in 2 different places. Please refer to the 
attached image.

The command is giving SSTable count = 6 but if you add the numbers shown 
against SSTables in each level, then that comes out as 5. Why is there a 
difference?

Thanks and regards,
Vishal Sharma
"Confidentiality Warning: This message and any attachments are intended only 
for the use of the intended recipient(s). 
are confidential and may be privileged. If you are not the intended recipient. 
you are hereby notified that any 
review. re-transmission. conversion to hard copy. copying. circulation or other 
use of this message and any attachments is 
strictly prohibited. If you are not the intended recipient. please notify the 
sender immediately by return email. 
and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure 
no viruses are present in this email. 
The company cannot accept responsibility for any loss or damage arising from 
the use of this email or attachment."

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Re: Single sstable file compaction issue

2018-03-26 Thread wxn...@zjqunshuo.com
Thanks Jeff for the reply. Answers inlined.

Tombstones probably aren't clearing because the same partition exists with 
older timestamps in other files (this is the "sstableexpiredblockers" problem, 
or "overlaps"). 
>>The RF is 2, so there are two copies of each partition on two nodes. So my 
>>method to clear expired data doesn't work because of the "overlaps" you 
>>mentioned. Is my understanding correct? One more question: nodetool cleanup 
>>may work for me, but how does cleanup deal with the sstable files in TWCS mode? I 
>>have large sstable files from before changing from STCS to TWCS, and newer ones 
>>in time buckets with TWCS. How does the command deal with them? Compact all 
>>of them into large sstable files?

If you're certain you are ok losing that data, then you could stop the node, 
remove lb-143951-big-* , and start the node. This is usually a bad idea in data 
models that aren't ttl-only time-series, but if you KNOW the data is all 
expired, and you didnt manually delete any other data, it may work for you.
>>My data model is indeed ttl-only time-series.

Cheers,
-Simon
 
From: Jeff Jirsa
Date: 2018-03-27 11:52
To: cassandra
Subject: Re: Single sstable file compaction issue
Tombstones probably aren't clearing because the same partition exists with 
older timestamps in other files (this is the "sstableexpiredblockers" problem, 
or "overlaps"). 

If you're certain you are ok losing that data, then you could stop the node, 
remove lb-143951-big-* , and start the node. This is usually a bad idea in data 
models that aren't ttl-only time-series, but if you KNOW the data is all 
expired, and you didnt manually delete any other data, it may work for you.



On Mon, Mar 26, 2018 at 8:03 PM, wxn...@zjqunshuo.com  
wrote:
Hi All,
I changed STCS to TWCS months ago and left some old sstable files. Some are 
almost tombstones. To release disk space, I issued compaction command on one 
file by JMX. After the compaction is done, I got one new file with almost the 
same size of the old one. Seems no tombstones are cleaned during the 
compaction. 

Before compaciton:
Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 
0.9115440366604225 53G Jan 16 00:36 lb-124337-big-Data.db
After compaction:
Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 
0.9114708007586322 53G Mar 27 00:17 lb-143951-big-Data.db

Questions:
1. Why the compaction didn't clean the tombstones?
2. If one file are all tombstones and I want to it manually(including data, 
index, filter, etc), do I need to shutdown the node?

Cheers,
-Simon



Re: Single sstable file compaction issue

2018-03-26 Thread Jeff Jirsa
Tombstones probably aren't clearing because the same partition exists with
older timestamps in other files (this is the "sstableexpiredblockers"
problem, or "overlaps").

If you're certain you are ok losing that data, then you could stop the
node, remove lb-143951-big-* , and start the node. This is usually a bad
idea in data models that aren't ttl-only time-series, but if you KNOW the
data is all expired, and you didnt manually delete any other data, it may
work for you.
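
For reference, the overlap check behind this can be run offline with
sstableexpiredblockers, which lists the sstables whose older data is blocking fully
expired sstables from being dropped (keyspace and table are placeholders):

    sstableexpiredblockers my_keyspace my_table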



On Mon, Mar 26, 2018 at 8:03 PM, wxn...@zjqunshuo.com 
wrote:

> Hi All,
> I changed STCS to TWCS months ago and left some old sstable files. Some
> are almost tombstones. To release disk space, I issued compaction command
> on one file by JMX. After the compaction is done, I got one new file with
> almost the same size of the old one. Seems no tombstones are cleaned during
> the compaction.
>
> Before compaciton:
> Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 0.
> 9115440366604225 53G Jan 16 00:36 lb-124337-big-Data.db
> After compaction:
> Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 0.
> 9114708007586322 53G Mar 27 00:17 lb-143951-big-Data.db
>
> Questions:
> 1. Why the compaction didn't clean the tombstones?
> 2. If one file are all tombstones and I want to it manually(including
> data, index, filter, etc), do I need to shutdown the node?
>
> Cheers,
> -Simon
>


Single sstable file compaction issue

2018-03-26 Thread wxn...@zjqunshuo.com
Hi All,
I changed STCS to TWCS months ago and left some old sstable files. Some are 
almost all tombstones. To release disk space, I issued a compaction command on one 
file via JMX. After the compaction was done, I got one new file with almost the 
same size as the old one. It seems no tombstones were cleaned during the 
compaction. 

Before compaction:
Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 
0.9115440366604225 53G Jan 16 00:36 lb-124337-big-Data.db
After compaction:
Max: 01/12/2017 Min: 11/25/2016 Estimated droppable tombstones: 
0.9114708007586322 53G Mar 27 00:17 lb-143951-big-Data.db

Questions:
1. Why didn't the compaction clean the tombstones?
2. If one file is all tombstones and I want to remove it manually (including data, 
index, filter, etc.), do I need to shut down the node?

Cheers,
-Simon
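
A rough sketch of the per-file JMX compaction mentioned above, driven through
jmxterm (the jmxterm jar name, JMX port and sstable file name are assumptions; the
operation is forceUserDefinedCompaction on the CompactionManager MBean):

    echo 'run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction lb-124337-big-Data.db' | \
      java -jar jmxterm-1.0.2-uber.jar -l localhost:7199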


Re: Is this SSTable restore merging scenario possible ?

2018-03-21 Thread Carlos Rolo
As said before, as long as you rename the UUIDs to match you should be good.

The Production "win out" depends on the timestamps. In Cassandra last write
wins, so as long as, for the same row, the production timestamps are more
recent than the secondary cluster, the production data would "win over".

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Wed, Mar 21, 2018 at 2:04 PM, Rahul Singh 
wrote:

> If its not on the same “cluster” and you are not using something like
> OpsCenter, the snapshotted files will have a diferent schema UUID for each
> entity. If you rename the files to have the matching UUID in the file
> names, then you should be able to do what you are talking about.
>
> On Mar 21, 2018, 4:50 AM -0500, Andrew Voumard ,
> wrote:
>
> Hi All,
>
> I am using Cassandra 3.10
>
> I would like to know if the following SSTable row level merging scenario
> is possible:
>
> 1. On a Production Cluster
> - Take a full snapshot on every node
>
> 2. On a new, empty Secondary Cluster with the same topology
> - Create a matching schema (keyspaces + tables as the production cluster).
> Note: The schema is very simple: No secondary or SASI indices, materialized
> views, etc.
> - Restore the full snapshots from production on each corresponding
> secondary node
> - INSERT rows into tables on the secondary node, but with keys that are
> guaranteed to be different to any INSERTs that were restored from the
> production cluster
> - UPDATE some of the existing rows in the secondary cluster (these are
> rows originally from the production cluster)
>
> 3. On the Production Cluster
> - Apply INSERTs, UPDATEs and DELETEs to tables
> - Take an incremental snapshot on every node
>
> 4. On the Secondary Cluster
> - Attempt to restore the incremental snapshots from the Production Cluster
> on each corresponding secondary cluster node, using nodetool refresh
>
> Question: Will Step 4 be successful ?, reiterating that:
> - The INSERTs will not have conflicting keys (as noted above)
> - The Production Cluster may have made UPDATES to rows that the Secondary
> Cluster has also made UPDATEs to
> - The Production Cluster may have DELETEd rows which the Secondary Cluster
> has UPDATED in the meantime
>
> If it works, I would expect the changes from Production to "win out" over
> any made independently on the Secondary Cluster.
>
> Many Thanks for any help you can provide
>
>
>
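
As a concrete sketch of the rename-and-load approach discussed above (keyspace,
table and IDs are placeholders; the table directory name embeds the table ID, as in
the snapshot paths seen elsewhere on this list):

    # copy the incoming sstables into the secondary cluster's own table directory
    cp /backups/prod_snapshot/mb-63-big-* \
       /var/lib/cassandra/data/my_keyspace/my_table-<secondary_table_id>/
    # then load them without a restart
    nodetool refresh my_keyspace my_table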


Re: Is this SSTable restore merging scenario possible ?

2018-03-21 Thread Rahul Singh
If it's not on the same “cluster” and you are not using something like 
OpsCenter, the snapshotted files will have a different schema UUID for each 
entity. If you rename the files to have the matching UUID in the file names, 
then you should be able to do what you are talking about.

On Mar 21, 2018, 4:50 AM -0500, Andrew Voumard , wrote:
> Hi All,
>
> I am using Cassandra 3.10
>
> I would like to know if the following SSTable row level merging scenario is 
> possible:
>
> 1. On a Production Cluster
> - Take a full snapshot on every node
>
> 2. On a new, empty Secondary Cluster with the same topology
> - Create a matching schema (keyspaces + tables as the production cluster). 
> Note: The schema is very simple: No secondary or SASI indices, materialized 
> views, etc.
> - Restore the full snapshots from production on each corresponding secondary 
> node
> - INSERT rows into tables on the secondary node, but with keys that are 
> guaranteed to be different to any INSERTs that were restored from the 
> production cluster
> - UPDATE some of the existing rows in the secondary cluster (these are rows 
> originally from the production cluster)
>
> 3. On the Production Cluster
> - Apply INSERTs, UPDATEs and DELETEs to tables
> - Take an incremental snapshot on every node
>
> 4. On the Secondary Cluster
> - Attempt to restore the incremental snapshots from the Production Cluster on 
> each corresponding secondary cluster node, using nodetool refresh
>
> Question: Will Step 4 be successful ?, reiterating that:
> - The INSERTs will not have conflicting keys (as noted above)
> - The Production Cluster may have made UPDATES to rows that the Secondary 
> Cluster has also made UPDATEs to
> - The Production Cluster may have DELETEd rows which the Secondary Cluster 
> has UPDATED in the meantime
>
> If it works, I would expect the changes from Production to "win out" over any 
> made independently on the Secondary Cluster.
>
> Many Thanks for any help you can provide
>
>


Is this SSTable restore merging scenario possible ?

2018-03-21 Thread Andrew Voumard
Hi All,

I am using Cassandra 3.10

I would like to know if the following SSTable row level merging scenario is 
possible:

1. On a Production Cluster
- Take a full snapshot on every node

2. On a new, empty Secondary Cluster with the same topology
- Create a matching schema (keyspaces + tables as the production cluster). 
Note: The schema is very simple: No secondary or SASI indices, materialized 
views, etc.
- Restore the full snapshots from production on each corresponding secondary 
node
- INSERT rows into tables on the secondary node, but with keys that are 
guaranteed to be different to any INSERTs that were restored from the 
production cluster
- UPDATE some of the existing rows in the secondary cluster (these are rows 
originally from the production cluster)

3. On the Production Cluster
- Apply INSERTs, UPDATEs and DELETEs to tables
- Take an incremental snapshot on every node

4. On the Secondary Cluster
- Attempt to restore the incremental snapshots from the Production Cluster on 
each corresponding secondary cluster node, using nodetool refresh

Question: Will Step 4 be successful ?, reiterating that:
- The INSERTs will not have conflicting keys (as noted above)
- The Production Cluster may have made UPDATES to rows that the Secondary 
Cluster has also made UPDATEs to
- The Production Cluster may have DELETEd rows which the Secondary Cluster has 
UPDATED in the meantime

If it works, I would expect the changes from Production to "win out" over any 
made independently on the Secondary Cluster.

Many Thanks for any help you can provide




Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread kurt greaves
>
> Also, I was wondering if the key cache maintains a count of how many local
> accesses a key undergoes. Such information might be very useful for
> compactions of sstables by splitting data by frequency of use so that those
> can be preferentially compacted.

No we don't currently have metrics for that, only overall cache
hits/misses. Measuring individual local accesses would probably have a
performance and memory impact but there's probably a way to do it
efficiently.

Has this been exploited... ever?

Not that I know of. I've theorised about using it previously with some
friends, but never got around to trying it. I imagine if you did you'd
probably have to fix some parts of the code to make it work (like
potentially discoverComponentsFor).
Typically I think any conversation that is relevant to the internals of
Cassandra is fine for the dev list, and that's the desired audience. Not
every dev watches the user list and only developers will really be able to
answer these questions. Lets face it, the dev list is pretty dead so not
sure why we care about a few emails landing there.


Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Also, I was wondering if the key cache maintains a count of how many local
accesses a key undergoes. Such information might be very useful for
compactions of sstables by splitting data by frequency of use so that those
can be preferentially compacted.

On Wed, Feb 21, 2018 at 5:08 PM, Carl Mueller 
wrote:

> Looking through the 2.1.X code I see this:
>
> org.apache.cassandra.io.sstable.Component.java
>
> In the enum for component types there is a CUSTOM enum value which seems
> to indicate a catchall for providing metadata for sstables.
>
> Has this been exploited... ever? I noticed in some of the patches for the
> archival options on TWCS there are complaints about being able to identify
> sstables that are archived and those that aren't.
>
> I would be interested in order to mark the sstables with metadata
> indicating the date range an sstable is targetted at for compactions.
>
> discoverComponentsFor seems to explicitly exclude the loadup of any
> files/sstable components that are CUSTOM in SStable.java
>
> On Wed, Feb 21, 2018 at 10:05 AM, Carl Mueller <
> carl.muel...@smartthings.com> wrote:
>
>> jon: I am planning on writing a custom compaction strategy. That's why
>> the question is here, I figured the specifics of memtable -> sstable and
>> cassandra internals are not a user question. If that still isn't deep
>> enough for the dev thread, I will move all those questions to user.
>>
>> On Wed, Feb 21, 2018 at 9:59 AM, Carl Mueller <
>> carl.muel...@smartthings.com> wrote:
>>
>>> Thank you all!
>>>
>>> On Tue, Feb 20, 2018 at 7:35 PM, kurt greaves 
>>> wrote:
>>>
>>>> Probably a lot of work but it would be incredibly useful for vnodes if
>>>> flushing was range aware (to be used with RangeAwareCompactionStrategy).
>>>> The writers are already range aware for JBOD, but that's not terribly
>>>> valuable ATM.
>>>>
>>>> On 20 February 2018 at 21:57, Jeff Jirsa  wrote:
>>>>
>>>>> There are some arguments to be made that the flush should consider
>>>>> compaction strategy - would allow a bug flush to respect LCS filesizes or
>>>>> break into smaller pieces to try to minimize range overlaps going from l0
>>>>> into l1, for example.
>>>>>
>>>>> I have no idea how much work would be involved, but may be worthwhile.
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Feb 20,  2018, at 1:26 PM, Jon Haddad  wrote:
>>>>>
>>>>> The file format is independent from compaction.  A compaction strategy
>>>>> only selects sstables to be compacted, that’s it’s only job.  It could 
>>>>> have
>>>>> side effects, like generating other files, but any decent compaction
>>>>> strategy will account for the fact that those other files don’t exist.
>>>>>
>>>>> I wrote a blog post a few months ago going over some of the nuance of
>>>>> compaction you mind find informative: http://thelastpic
>>>>> kle.com/blog/2017/03/16/compaction-nuance.html
>>>>>
>>>>> This is also the wrong mailing list, please direct future user
>>>>> questions to the user list.  The dev list is for development of Cassandra
>>>>> itself.
>>>>>
>>>>> Jon
>>>>>
>>>>> On Feb 20, 2018, at 1:10 PM, Carl Mueller <
>>>>> carl.muel...@smartthings.com> wrote:
>>>>>
>>>>> When memtables/CommitLogs are flushed to disk/sstable, does the
>>>>> sstable go
>>>>> through sstable organization specific to each compaction strategy, or
>>>>> is
>>>>> the sstable creation the same for all compactionstrats and it is up to
>>>>> the
>>>>> compaction strategy to recompact the sstable if desired?
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


  1   2   3   4   5   6   >