Re: Backups eating up disk space

2017-02-27 Thread Vladimir Yudovin
Yes, you can. It's just hardlinks to table files, so if some file is still
active it will remain intact.

Best regards, Vladimir Yudovin,
Winguzone - Cloud Cassandra Hosting

 On Mon, 27 Feb 2017 09:27:50 -0500 Kunal Gangakhedkar
kgangakhed...@gmail.com wrote 

Hi all,

Is it safe to delete the backup folders from various CFs from the 'system'
keyspace too?
I seem to have missed them in the last cleanup - and now, the size_estimates
and compactions_in_progress seem to have grown large (>200G and ~6G
respectively).

Can I remove them too?

Thanks,
Kunal
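Vladimir's point (backups/ entries are hardlinks, so deleting them never harms a file the table directory still references) can be demonstrated with plain coreutils. This is a toy sketch on throwaway paths under /tmp, not real Cassandra files; `stat -c %h` is the GNU coreutils spelling of "print the link count":

```shell
# Simulate an "active" sstable plus its hardlink in backups/.
mkdir -p /tmp/hl-demo/backups
echo "sstable contents" > /tmp/hl-demo/data-file.db
ln /tmp/hl-demo/data-file.db /tmp/hl-demo/backups/data-file.db

stat -c %h /tmp/hl-demo/data-file.db   # link count is now 2

# Deleting the backup link leaves the active file untouched.
rm /tmp/hl-demo/backups/data-file.db
cat /tmp/hl-demo/data-file.db          # still prints: sstable contents
stat -c %h /tmp/hl-demo/data-file.db   # back to 1
```

The data blocks are freed only when the last link is removed, which is why clearing backups/ is safe while the table directory still holds a link.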




Re: Backups eating up disk space

2017-02-27 Thread Kunal Gangakhedkar
Hi all,

Is it safe to delete the backup folders from various CFs from 'system'
keyspace too?
I seem to have missed them in the last cleanup - and now, the
size_estimates and compactions_in_progress seem to have grown large ( >200G
and ~6G respectively).

Can I remove them too?

Thanks,
Kunal


Re: Backups eating up disk space

2017-01-15 Thread Chris Mawata
You don't have a viable solution because you are not making a snapshot as a
starting point. After a while you will have a lot of backup data. Using
the backups to get your cluster to a given state will involve copying a
very large amount of backup data, possibly more than the capacity of
your cluster, followed by a tremendous amount of compaction. If your
topology changes, life could really get miserable. I would counsel taking
periodic snapshots so that your possible bad day in the future is less bad.

Re: Backups eating up disk space

2017-01-13 Thread Kunal Gangakhedkar
Great, thanks a lot to all for the help :)

I finally took the dive and went with Razi's suggestions.
In summary, this is what I did:

   - turn off incremental backups on each of the nodes in rolling fashion
   - remove the 'backups' directory from each keyspace on each node.

This ended up freeing up almost 350GB on each node - yay :)

Again, thanks a lot for the help, guys.

Kunal
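For anyone repeating this cleanup, the removal step can be sketched with find. Below is a dry run against a throwaway mock of the data-directory layout (the keyspace/table names are made up); on a real node you would first disable incremental_backups, then point the find at the actual Cassandra data directory:

```shell
# Mock data directory: one keyspace, one table, one stale backup link.
DATA=/tmp/cass-demo/data
mkdir -p "$DATA/my_keyspace/my_table-abc123/backups"
touch "$DATA/my_keyspace/my_table-abc123/backups/old-sstable.db"
touch "$DATA/my_keyspace/my_table-abc123/live-sstable.db"

# Remove every per-table backups/ directory; live sstables are untouched.
find "$DATA" -type d -name backups -prune -exec rm -rf {} +

ls "$DATA/my_keyspace/my_table-abc123"   # only live-sstable.db remains
```

The -prune keeps find from descending into a directory it is about to delete, so the command stays quiet even when many backups/ directories are removed in one pass.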


Re: Backups eating up disk space

2017-01-12 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
snapshots are slightly different from backups.

In my explanation of the hardlinks created in the backups folder, notice that
compacted sstables never end up in the backups folder.

On the other hand, a snapshot is meant to represent the data at a particular
moment in time. Thus, the snapshots directory contains hardlinks to all active
sstables at the time the snapshot was taken. That includes compacted sstables,
as well as any sstables from a memtable flush or streamed from other nodes,
which exist in both the table directory and the backups directory.

So, that would be the difference between snapshots and backups.

Best regards,
-Razi


From: Alain RODRIGUEZ <arodr...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, January 12, 2017 at 9:16 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Backups eating up disk space

My 2 cents,

As I mentioned earlier, we're not currently using snapshots - it's only the 
backups that are bothering me right now.

I believe backups folder is just the new name for the previously called 
snapshots folder. But I can be completely wrong, I haven't played that much 
with snapshots in new versions yet.

Anyway, some operations in Apache Cassandra can trigger a snapshot:

- Repair (when not using parallel option but sequential repairs instead)
- Truncating a table (by default)
- Dropping a table (by default)
- Maybe other I can't think of... ?

If you want to clean space but still keep a backup you can run:

"nodetool clearsnapshot"
"nodetool snapshot "

This way and for a while, data won't be taking space as old files will be 
cleaned and new files will be only hardlinks as detailed above. Then you might 
want to work at a proper backup policy, probably implying getting data out of 
production server (a lot of people uses S3 or similar services). Or just do 
that from time to time, meaning you only keep a backup and disk space behaviour 
will be hard to predict.

C*heers,
---
Alain Rodriguez - @arodream - 
al...@thelastpickle.com<mailto:al...@thelastpickle.com>
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
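Since snapshot entries are hardlinks too, a fresh snapshot costs almost no extra disk space at first, which is why Alain's clearsnapshot-then-snapshot sequence frees space without losing the backup. A rough illustration on a mock directory (the paths are made up; real snapshots live under each table directory):

```shell
D=/tmp/snap-demo
mkdir -p "$D/snapshots/s1"

# A ~1 MiB stand-in for an sstable, then a "snapshot" hardlink to it.
dd if=/dev/zero of="$D/big.db" bs=1024 count=1024 2>/dev/null
ln "$D/big.db" "$D/snapshots/s1/big.db"

du -sk "$D"   # still about 1 MiB: du counts the shared inode once
```

The snapshot only starts to cost real space later, once compaction replaces the live sstables and the snapshot links become the last owners of the old data.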


Re: Backups eating up disk space

2017-01-12 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Thanks, Prasenjit, I appreciate the compliment ☺

Kunal, to add to Prasenjit's comment, it doesn't make sense to keep backups
unless they are moved to secondary storage. This means that if you don't plan to
move the backups to secondary storage, you should set incremental_backups:
false, and instead rely on replication and full repair in order to rebuild a
node that has had catastrophic failure.

I assume that you are not moving backups to secondary storage, so to save 
space, I would turn it off.

Best regards,
-Razi


From: Prasenjit Sarkar <prasenjit.sar...@datos.io>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, January 12, 2017 at 12:42 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Backups eating up disk space

Hi Kunal,

Razi's post does give a very lucid description of how Cassandra manages the
hard links inside the backup directory.

Where it needs clarification is the following:
--> incremental backups is a system-wide setting, so it's an all-or-nothing
approach

--> as multiple people have stated, incremental backups do not create hard 
links to compacted sstables. however, this can bloat the size of your backups

--> again as stated, it is a general industry practice to place backups in a 
different secondary storage location than the main production site. So best to 
move it to the secondary storage before applying rm on the backups folder

In my experience with production clusters, managing the backups folder across 
multiple nodes can be painful if the objective is to ever recover data. With 
the usual disclaimers, better to rely on third party vendors to accomplish the 
needful rather than scripts/tablesnap.

Regards
Prasenjit

On Wed, Jan 11, 2017 at 7:49 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
<raziuddin.kh...@nih.gov<mailto:raziuddin.kh...@nih.gov>> wrote:
Hello Kunal,

Caveat: I am not a super-expert on Cassandra, but it helps to explain to 
others, in order to eventually become an expert, so if my explanation is wrong, 
I would hope others would correct me. ☺

The active sstables/data files are all the files located in the directory
for the table.
You can safely remove all files under the backups/ directory and the directory 
itself.
Removing any files that are current hard-links inside backups won’t cause any 
issues, and I will explain why.

Have you looked at your cassandra.yaml file and checked the setting for
incremental_backups? If it is set to true and you don't want to take new
backups, you can set it to false so that, after you clean up, you will not have
to clean up the backups again.
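For reference, the setting in question is a single cassandra.yaml line (shown here with its shipped default, per the DataStax documentation quoted later in this thread):

```yaml
# When true, Cassandra hardlinks each flushed or streamed SSTable into the
# table's backups/ subdirectory; removing those links is the operator's job.
incremental_backups: false
```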

Explanation:
Let's look at the definition of incremental backups again: “Cassandra
creates a hard link to each SSTable flushed or streamed locally in a backups
subdirectory of the keyspace data.”

Suppose we have a directory path: my_keyspace/my_table-some-uuid/backups/
In the rest of the discussion, when I refer to “table directory”, I explicitly 
mean the directory: my_keyspace/my_table-some-uuid/
When I refer to backups/ directory, I explicitly mean: 
my_keyspace/my_table-some-uuid/backups/

Suppose that you have an sstable-A that was either flushed from a memtable or 
streamed from another node.
At this point, you have a hardlink to sstable-A in your table directory, and a 
hardlink to sstable-A in your backups/ directory.
Suppose that you have another sstable-B that was also either flushed from a 
memtable or streamed from another node.
At this point, you have a hardlink to sstable-B in your table directory, and a 
hardlink to sstable-B in your backups/ directory.

Next, suppose compaction were to occur, where say sstable-A and sstable-B would 
be compacted to produce sstable-C, representing all the data from A and B.
Now, sstable-C will live in your main table directory, and the hardlinks to 
sstable-A and sstable-B will be deleted in the main table directory, but 
sstable-A and sstable-B will continue to exist in /backups.
At this point, in your main table directory, you will have a hardlink to 
sstable-C. In your backups/ directory you will have hardlinks to sstable-A, and 
sstable-B.

Thus, your main table directory is not cluttered with old un-compacted 
sstables, and only has the sstables along with other files that are actively 
being used.

To drive the point home, …
Suppose that you have another sstable-D that was either flushed from a memtable 
or streamed from another node.
At this point, in your main table directory, you will have sstable-C and 
sstable-D. In your backups/ directory you will have hardlinks to sstable-A, 
sstable-B, and sstable-D.

Next, suppose compaction were to occur where say sstable-C and sstable-D would
be compacted to produce sstable-E, representing all the data from C and D.
Now, sstable-E will live in your main table directory, and the hardlinks to
sstable-C and sstable-D will be deleted in the main table directory, but
sstable-D will continue to exist in /backups.
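The flush-then-compact lifecycle Razi describes can be acted out with plain files and ln. This is a toy model (real sstables are sets of files and compaction is not simple concatenation), but the hardlink bookkeeping is the same:

```shell
T=/tmp/compaction-demo   # toy stand-in for a table directory
mkdir -p "$T/backups"

# Flush of sstable-A and sstable-B: one link in the table dir, one in backups/.
for s in sstable-A sstable-B; do
  echo "$s data" > "$T/$s"
  ln "$T/$s" "$T/backups/$s"
done

# "Compaction": A and B merge into sstable-C; the table-dir links to A and B
# are dropped, but backups/ still holds its own links to the old inputs.
cat "$T/sstable-A" "$T/sstable-B" > "$T/sstable-C"
rm "$T/sstable-A" "$T/sstable-B"

ls "$T/backups"   # sstable-A and sstable-B linger here, eating space
```

After a few rounds of this, the table directory holds only the latest compacted output while backups/ accumulates every pre-compaction input, which is exactly the growth pattern described above.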

Re: Backups eating up disk space

2017-01-12 Thread Alain RODRIGUEZ
>> Next, suppose compaction were to occur where say sstable-C and sstable-D
>> would be compacted to produce sstable-E, representing all the data from C
>> and D.
>>
>> Now, sstable-E will live in your main table directory, and the hardlinks
>> to sstable-C and sstable-D will be deleted in the main table directory, but
>> sstable-D will continue to exist in /backups.
>>
>> At this point, in your main table directory, you will have a hardlink to
>> sstable-E. In your backups/ directory you will have hardlinks to sstable-A,
>> sstable-B and sstable-D.
>>
>>
>>
>> As you can see, the /backups directory quickly accumulates all the
>> un-compacted sstables and progressively uses up more and more space.
>>
>> Also, note that the /backups directory does not contain sstables
>> generated from compaction, such as sstable-C and sstable-E.
>>
>> It is safe to delete the entire backups/ directory because all the data
>> is represented in the compacted sstable-E.
>>
>> I hope this explanation was clear and gives you confidence in using rm to
>> delete the directory for backups/.
>>
>>
>>
>> Best regards,
>>
>> -Razi
>>
>>
>>
>>
>>
>>
>>
>> *From: *Kunal Gangakhedkar <kgangakhed...@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Wednesday, January 11, 2017 at 6:47 AM
>>
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *Re: Backups eating up disk space
>>
>>
>>
>> Thanks for the reply, Razi.
>>
>>
>>
>> As I mentioned earlier, we're not currently using snapshots - it's only
>> the backups that are bothering me right now.
>>
>>
>>
>> So my next question is pertaining to this statement of yours:
>>
>>
>>
>> As far as I am aware, using *rm* is perfectly safe to delete the
>> directories for snapshots/backups as long as you are careful not to delete
>> your actively used sstable files and directories.
>>
>>
>>
>> How do I find out which are the actively used sstables?
>>
>> If by that you mean the main data files, does that mean I can safely
>> remove all files ONLY under the "backups/" directory?
>>
>> Or, removing any files that are current hard-links inside backups can
>> potentially cause any issues?
>>
>>
>>
>> Thanks,
>>
>> Kunal
>>
>>
>>
>> On 11 January 2017 at 01:06, Khaja, Raziuddin (NIH/NLM/NCBI) [C] <
>> raziuddin.kh...@nih.gov> wrote:
>>
>> Hello Kunal,
>>
>>
>>
>> I would take a look at the following configuration options in the
>> Cassandra.yaml
>>
>>
>>
>> *Common automatic backup settings*
>>
>> *Incremental_backups:*
>>
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups
>>
>>
>>
>> (Default: false) Backs up data updated since the last snapshot was taken.
>> When enabled, Cassandra creates a hard link to each SSTable flushed or
>> streamed locally in a backups subdirectory of the keyspace data. Removing
>> these links is the operator's responsibility.
>>
>>
>>
>> *snapshot_before_compaction*:
>>
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configC

Re: Backups eating up disk space

2017-01-11 Thread Prasenjit Sarkar

Re: Backups eating up disk space

2017-01-11 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Hello Kunal,

Caveat: I am not a super-expert on Cassandra, but it helps to explain to 
others, in order to eventually become an expert, so if my explanation is wrong, 
I would hope others would correct me. ☺

The active sstables/data files are all the files located in the directory 
for the table.
You can safely remove all files under the backups/ directory and the directory 
itself.
Removing any files that are current hard-links inside backups won’t cause any 
issues, and I will explain why.

Have you looked at your cassandra.yaml file and checked the setting for 
incremental_backups?  If it is set to true and you don’t want to make new 
backups, you can set it to false, so that after you clean up, you will not have 
to clean up the backups again.
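
A quick way to check the current setting on a node might look like this (a 
sketch; the config path below is an assumption and varies by install — package 
installs often use /etc/cassandra/, tarball installs use <install_dir>/conf/):

```shell
# Hypothetical config path; adjust for your installation.
CONF=/etc/cassandra/cassandra.yaml

# Print this node's incremental_backups setting
grep -E '^[[:space:]]*incremental_backups:' "$CONF"
```

Note that editing cassandra.yaml only takes effect after a restart; if your 
version has them, `nodetool statusbackup` and `nodetool disablebackup` let you 
check and toggle incremental backups on a live node.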

Explanation:
Let's look at the definition of incremental backups again: “Cassandra creates a 
hard link to each SSTable flushed or streamed locally in a backups subdirectory 
of the keyspace data.”

Suppose we have a directory path: my_keyspace/my_table-some-uuid/backups/
In the rest of the discussion, when I refer to “table directory”, I explicitly 
mean the directory: my_keyspace/my_table-some-uuid/
When I refer to backups/ directory, I explicitly mean: 
my_keyspace/my_table-some-uuid/backups/

Suppose that you have an sstable-A that was either flushed from a memtable or 
streamed from another node.
At this point, you have a hardlink to sstable-A in your table directory, and a 
hardlink to sstable-A in your backups/ directory.
Suppose that you have another sstable-B that was also either flushed from a 
memtable or streamed from another node.
At this point, you have a hardlink to sstable-B in your table directory, and a 
hardlink to sstable-B in your backups/ directory.

Next, suppose compaction were to occur, where say sstable-A and sstable-B would 
be compacted to produce sstable-C, representing all the data from A and B.
Now, sstable-C will live in your main table directory, and the hardlinks to 
sstable-A and sstable-B will be deleted in the main table directory, but 
sstable-A and sstable-B will continue to exist in /backups.
At this point, in your main table directory, you will have a hardlink to 
sstable-C. In your backups/ directory you will have hardlinks to sstable-A, and 
sstable-B.

Thus, your main table directory is not cluttered with old un-compacted 
sstables, and only has the sstables along with other files that are actively 
being used.

To drive the point home, …
Suppose that you have another sstable-D that was either flushed from a memtable 
or streamed from another node.
At this point, in your main table directory, you will have sstable-C and 
sstable-D. In your backups/ directory you will have hardlinks to sstable-A, 
sstable-B, and sstable-D.

Next, suppose compaction were to occur where say sstable-C and sstable-D would 
be compacted to produce sstable-E, representing all the data from C and D.
Now, sstable-E will live in your main table directory, and the hardlinks to 
sstable-C and sstable-D will be deleted in the main table directory, but 
sstable-D will continue to exist in /backups.
At this point, in your main table directory, you will have a hardlink to 
sstable-E. In your backups/ directory you will have hardlinks to sstable-A, 
sstable-B and sstable-D.

As you can see, the backups/ directory quickly accumulates all un-compacted 
sstables, progressively using up more and more space.
Also, note that the /backups directory does not contain sstables generated from 
compaction, such as sstable-C and sstable-E.
It is safe to delete the entire backups/ directory because all the data is 
represented in the compacted sstable-E.
I hope this explanation was clear and gives you confidence in using rm to 
delete the directory for backups/.
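
The walkthrough above can be reproduced with plain files and ln, since these 
backups are ordinary hardlinks (illustrative names only; no real Cassandra 
directory is touched):

```shell
# Stand-in for my_keyspace/my_table-some-uuid/ and its backups/ subdirectory
mkdir -p my_table/backups

# "Flush" sstable-A: the data file and its backup are two links to one inode
echo "data-A" > my_table/sstable-A
ln my_table/sstable-A my_table/backups/sstable-A

# "Compaction" drops the active link; the backup link keeps the data alive
rm my_table/sstable-A
cat my_table/backups/sstable-A    # prints: data-A

# Deleting backups/ is safe: active sstables are untouched, space is freed
rm -rf my_table/backups
```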

Best regards,
-Razi




Re: Backups eating up disk space

2017-01-11 Thread Kunal Gangakhedkar
Thanks for the reply, Razi.

As I mentioned earlier, we're not currently using snapshots - it's only the
backups that are bothering me right now.

So my next question is pertaining to this statement of yours:

> As far as I am aware, using *rm* is perfectly safe to delete the
> directories for snapshots/backups as long as you are careful not to delete
> your actively used sstable files and directories.


How do I find out which are the actively used sstables?
If by that you mean the main data files, does that mean I can safely remove
all files ONLY under the "backups/" directory?
Or, removing any files that are current hard-links inside backups can
potentially cause any issues?

Thanks,
Kunal


Re: Backups eating up disk space

2017-01-10 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Hello Kunal,

I would take a look at the following configuration options in the Cassandra.yaml

Common automatic backup settings
Incremental_backups:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups

(Default: false) Backs up data updated since the last snapshot was taken. When 
enabled, Cassandra creates a hard link to each SSTable flushed or streamed 
locally in a backups subdirectory of the keyspace data. Removing these links is 
the operator's responsibility.

snapshot_before_compaction:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction

(Default: false) Enables or disables taking a snapshot before each compaction. 
A snapshot is useful to back up data when there is a data format change. Be 
careful using this option: Cassandra does not clean up older snapshots 
automatically.


Advanced automatic backup setting
auto_snapshot:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot

(Default: true) Enables or disables whether Cassandra takes a snapshot of the 
data before truncating a keyspace or dropping a table. To prevent data loss, 
Datastax strongly advises using the default setting. If you set auto_snapshot 
to false, you lose data on truncation or drop.


nodetool also provides methods to manage snapshots. 
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
See the specific commands:

  *   nodetool clearsnapshot
      <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsClearSnapShot.html>
      Removes one or more snapshots.
  *   nodetool listsnapshots
      <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsListSnapShots.html>
      Lists snapshot names, size on disk, and true size.
  *   nodetool snapshot
      <http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsSnapShot.html>
      Take a snapshot of one or more keyspaces, or of a table, to backup data.

As far as I am aware, using rm is perfectly safe to delete the directories for 
snapshots/backups as long as you are careful not to delete your actively used 
sstable files and directories.  I think the nodetool clearsnapshot command is 
provided so that you don’t accidentally delete actively used files.  Last I 
used clearsnapshot, (a very long time ago), I thought it left behind the 
directory, but this could have been fixed in newer versions (so you might want 
to check that).

HTH
-Razi



Re: Backups eating up disk space

2017-01-10 Thread Jonathan Haddad
If you remove the files from the backup directory, you would not have data
loss in the case of a node going down.  They're hard links to the same
files that are in your data directory, and are created when an sstable is
written to disk.  At the time, they take up (almost) no space, so they
aren't a big deal, but when the sstable gets compacted, they stick around,
so they end up not freeing space up.
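
The "almost no space" point is visible directly with ls -li, which shows the
shared inode and link count (illustrative file names, not real sstables):

```shell
mkdir -p demo/backups
echo "sstable contents" > demo/sstable-A

# Hardlink into backups/, as incremental_backups does on each flush
ln demo/sstable-A demo/backups/sstable-A

# Both entries share one inode; link count is 2, bytes are stored once
ls -li demo/sstable-A demo/backups/sstable-A
```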

Usually you use incremental backups as a means of moving the sstables off
the node to a backup location.  If you're not doing anything with them,
they're just wasting space and you should disable incremental backups.

Some people take snapshots then rely on incremental backups.  Others use
the tablesnap utility which does sort of the same thing.
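
A cleanup along these lines might look like the following (a sketch; the data
directory path is an assumption — check data_file_directories in your
cassandra.yaml first):

```shell
# Hypothetical default path; adjust to your data_file_directories setting
DATA_DIR=/var/lib/cassandra/data

# First, see how much each backups/ subdirectory is consuming
du -sh "$DATA_DIR"/*/*/backups 2>/dev/null | sort -h

# Then remove them; active sstables in the table directories are untouched
find "$DATA_DIR" -mindepth 3 -maxdepth 3 -type d -name backups -exec rm -rf {} +
```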



Re: Backups eating up disk space

2017-01-10 Thread Kunal Gangakhedkar
Thanks for quick reply, Jon.

But, what about in case of node/cluster going down? Would there be data
loss if I remove these files manually?

How is it typically managed in production setups?
What are the best-practices for the same?
Do people take snapshots on each node before removing the backups?

This is my first production deployment - so, still trying to learn.

Thanks,
Kunal



Re: Backups eating up disk space

2017-01-10 Thread Jonathan Haddad
You can just delete them off the filesystem (rm)



Backups eating up disk space

2017-01-10 Thread Kunal Gangakhedkar
Hi all,

We have a 3-node cassandra cluster with incremental backup set to true.
Each node has 1TB data volume that stores cassandra data.

The load in the output of 'nodetool status' comes up at around 260GB each
node.
All our keyspaces use replication factor = 3.

However, the df output shows the data volumes consuming around 850GB of
space.
I checked the keyspace directory structures - most of the space goes in
/data///backups.

We have never manually run snapshots.

What is the typical procedure to clear the backups?
Can it be done without taking the node offline?

Thanks,
Kunal