RE: Removal of old data files

2011-09-28 Thread hiroyuki.watanabe
Thank you.  You have been very helpful.

We store only one day's worth of data for now; however, we eventually want to
store 5 days' worth, which is 5 times more disk space.
That is the main reason we are looking at older SSTables that appear to be
holding only tombstones.

- yuki




Re: Removal of old data files

2011-09-28 Thread aaron morton
For background:

Minor compaction will bucket files by size (see
https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L989)
and then compact a bucket once it holds at least min_compaction_threshold files
(configured per CF).
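As an aside, that bucketing idea can be sketched in a few lines of Python. This is an illustrative approximation of the behaviour described above, not the actual CompactionManager code; the 0.5x/1.5x bounds and the threshold of 4 are assumed defaults:

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group file sizes into buckets of 'similar' size.

    A file joins an existing bucket when its size falls within
    [bucket_low * avg, bucket_high * avg] of that bucket's current
    average; otherwise it starts a new bucket.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def buckets_to_compact(buckets, min_threshold=4):
    # Only buckets holding at least min_compaction_threshold files
    # are candidates for a minor compaction.
    return [b for b in buckets if len(b) >= min_threshold]
```

A 5 GB file therefore never gets grouped with four ~100 MB files; it waits until there are enough similarly sized peers.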

It will then purge a row's tombstones only if the row is contained solely in the
SSTables involved in the compaction (see
https://github.com/apache/cassandra/blob/cassandra-0.8.6/src/java/org/apache/cassandra/db/compaction/CompactionController.java#L84).

So there are a couple of approaches you can take if you want to ensure all
TTL'd data is purged ASAP. Note that tombstones and TTL'd data will be
automatically purged at some point, but if you want more precise control you may
need to take a few steps.

First, if all the data you are storing has a 24-hour TTL, you can include a
manual major compaction via nodetool in your maintenance routine. We normally
advise against it because it tries to create one large file, but if all your
data is going to be removed it's probably OK.

Second, play around with the minor compaction settings to increase the chances
that data is purged soon after the TTL expires.

Third, monkey up a process to kick off user-defined compaction runs for SSTables
that are over 24 hours old.
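A minimal sketch of that third approach, using a hypothetical helper name (this only finds the candidate files; actually submitting them to a user-defined compaction would go through JMX):

```python
import os
import time

def sstables_older_than(data_dir, max_age_seconds):
    """Return -Data.db files whose modification time is older than the cutoff.

    Hypothetical helper: feed the result to a user-defined compaction,
    one SSTable at a time.
    """
    cutoff = time.time() - max_age_seconds
    old = []
    for name in os.listdir(data_dir):
        if not name.endswith("-Data.db"):
            continue
        path = os.path.join(data_dir, name)
        if os.path.getmtime(path) < cutoff:
            old.append(path)
    return sorted(old)
```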

I know disk space can be an issue, but if you have the spare capacity you can 
just let cassandra manage things. Also 1.0 has some major changes in this area.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com


RE: Removal of old data files

2011-09-27 Thread hiroyuki.watanabe

Now we use a TTL of 12 hours and a GC grace period of 8 hours to encourage
Cassandra to remove old data/files more aggressively.

Cassandra does remove a fair amount of old data files;
it tends to remove 4 out of every 5 files.
I can tell because each data file has a sequence number as part of its name.

I also noticed that when Cassandra generated *-Compacted files, it generated 4
files at a time.
They have consecutive numbers in their file names, but skip one number from the
previous group of 4.
The missing one is the file that fails to be removed in the end and stays
forever.
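For reference, the generation numbers Yuki is reading off the file names can be extracted like this. An illustrative sketch only; the name pattern is inferred from examples such as Orders-g-6517-Compacted elsewhere in this thread:

```python
import re

# SSTable component names in 0.8 look like "Orders-g-6517-Data.db":
# <cf>-<version>-<generation>-<component>.  Parsing out the generation
# lets you spot gaps like the 'one missing number per group of 4' above.
NAME_RE = re.compile(r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>.+)$")

def generations(filenames):
    """Sorted generation numbers of the -Data.db files present."""
    gens = set()
    for name in filenames:
        m = NAME_RE.match(name)
        if m and m.group("component") == "Data.db":
            gens.add(int(m.group("gen")))
    return sorted(gens)

def missing_generations(filenames):
    """Generation numbers skipped between the smallest and largest present."""
    gens = generations(filenames)
    if not gens:
        return []
    return sorted(set(range(gens[0], gens[-1] + 1)) - set(gens))
```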

I looked at the keys in an index file that failed to be removed.  If I query
any of those keys, Cassandra indicates that there is no data, which is correct
because these files are older than 24 hours; all the data must be obsolete due
to TTL.

I am wondering why Cassandra does not remove all data files whose timestamps
are much older than TTL + grace period.

Does anybody have a similar experience?


Yuki Watanabe





Re: Removal of old data files

2011-09-27 Thread aaron morton
Short Answer: Cassandra will actively delete files when it needs to make space.
Otherwise they will be deleted some time later. Unless you are getting
out-of-disk-space errors, it's not normally something to worry about.

Longer:
The TTL guarantee is "do not return this data to get requests after
this many seconds."
 
Data is purged from an SSTable when we run compactions (either
minor/auto or major/manual). Purging means it will not be written into the new
SSTable created by the compaction process. The main criterion for purging is
that either gc_grace_seconds OR the TTL has expired on the column.
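Taking the numbers from the original question in this thread (a 24-hour TTL and a 12-hour gc_grace_seconds), here is a back-of-the-envelope sketch of the two moments involved. This is illustrative arithmetic only, not Cassandra code, and it uses the conservative "TTL + grace period" bound that Yuki applies elsewhere in the thread:

```python
TTL = 86400        # 24 hours: get requests stop returning the column after this
GC_GRACE = 43200   # 12 hours: tombstone grace period

def expires_at(write_time, ttl=TTL):
    # Moment after which get requests no longer return the column.
    return write_time + ttl

def earliest_purge(write_time, ttl=TTL, gc_grace=GC_GRACE):
    # Conservative bound: the expired column can certainly be dropped by a
    # compaction that runs after both the TTL and the grace period have passed.
    return write_time + ttl + gc_grace
```

A column written at t=0 stops being readable at t=86400 and is certainly purge-eligible by t=129600, i.e. the 36-hour mark mentioned in the original question.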

After compaction completes, it writes the -Compacted marker files for the
SSTables that were compacted. But there is a bunch of logic around which files
get compacted; it does not simply compact the oldest 3 files.  Basically it
tries to compact files which are about the same size.

Remember, we *never* modify data on disk. If we want to remove data from
an SSTable we have to write a new SSTable. It's one of the reasons
writes are fast: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com


Re: Removal of old data files

2011-09-02 Thread Sylvain Lebresne
On Fri, Sep 2, 2011 at 12:11 AM, hiroyuki.watan...@barclayscapital.com wrote:
 Yes, I see files with names like
     Orders-g-6517-Compacted

 However, all of those files have a size of 0.

 From Monday to Thursday we have 5642 -Data.db, -Filter.db
 and -Statistics.db files, and only 128 -Compacted files,
 and all of the -Compacted files have a size of 0.

 Is this normal, or are we doing something wrong?

You are not doing anything wrong. The -Compacted files are just
markers, to indicate
that the corresponding -Data files (with the same number) are, in fact,
compacted and
will eventually be removed. So those files will always have a size of 0.

--
Sylvain
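As a sketch of what Sylvain describes, one could pair each zero-byte marker with its -Data.db file like this (a hypothetical helper, not part of Cassandra):

```python
import os

def compacted_data_files(data_dir):
    """List -Data.db files that already have a -Compacted marker.

    The zero-byte <cf>-<version>-<gen>-Compacted marker means the
    -Data.db file of the same generation is obsolete and will
    eventually be deleted.
    """
    names = set(os.listdir(data_dir))
    obsolete = []
    for name in sorted(names):
        if name.endswith("-Compacted"):
            data_name = name[:-len("-Compacted")] + "-Data.db"
            if data_name in names:
                obsolete.append(data_name)
    return obsolete
```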





RE: Removal of old data files

2011-09-02 Thread hiroyuki.watanabe
 
I see. Thank you for the helpful information.

Yuki





RE: Removal of old data files

2011-09-01 Thread hiroyuki.watanabe
Yes, I see files with names like
    Orders-g-6517-Compacted

However, all of those files have a size of 0.

From Monday to Thursday we have 5642 -Data.db, -Filter.db
and -Statistics.db files, and only 128 -Compacted files,
and all of the -Compacted files have a size of 0.

Is this normal, or are we doing something wrong?


yuki





___

This e-mail may contain information that is confidential, privileged or 
otherwise protected from disclosure. If you are not an intended recipient of 
this e-mail, do not duplicate or redistribute it by any means. Please delete it 
and any attachments and notify the sender that you have received it in error. 
Unless specifically indicated, this e-mail is not an offer to buy or sell or a 
solicitation to buy or sell any securities, investment products or other 
financial product or service, an official confirmation of any transaction, or 
an official statement of Barclays. Any views or opinions presented are solely 
those of the author and do not necessarily represent those of Barclays. This 
e-mail is subject to terms available at the following link: 
www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the 
foregoing.  Barclays Capital is the investment banking division of Barclays 
Bank PLC, a company registered in England (number 1026167) with its registered 
office at 1 Churchill Place, London, E14 5HP.  This email may relate to or be 
sent from other members of the Barclays Group.
___


Removal of old data files

2011-08-25 Thread yuki watanabe
We are using Cassandra 0.8.0 with an 8-node ring and only one CF.
Every column has a TTL of 86400 (24 hours); we also set 'GC grace second' to
43200
(12 hours).  We have to store a massive amount of data for one day now, and
eventually for five days if we get more disk space.
Even for one day, we run out of disk space on a busy day.

We run the nodetool compact command at night or as necessary, then we run GC
from jconsole. We observed that GC did remove files, but not necessarily the
oldest ones.
Data files from more than 36 hours ago, and quite often from three days ago,
are still there.

Is this behavior expected, or do we need to adjust some other parameters?


Yuki Watanabe


Re: Removal of old data files

2011-08-25 Thread aaron morton
If Cassandra does not have enough disk space to create a new file it will
provoke a JVM GC, which should result in compacted SSTables that are no longer
needed being deleted. Otherwise they are deleted at some time in the future.

Compacted SSTables have a marker file written out with a -Compacted extension.

Do you see compacted SSTables in the data directory?

Cheers.

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
