On Wed, Jan 13, 2010 at 2:30 PM, Mark Robson <mar...@gmail.com> wrote:
> I also agree: Some mechanism to expire rolling data would be really good if
> we can incorporate it. Using the existing client interface, deleting old
> data is very cumbersome.
>
> We want to store lots of audit data in Cassandra, this will need to be
> expired eventually.
>
> Nodes should be able to do expiry locally without needing to talk to other
> nodes in the cluster. As we have a timestamp on everything anyway, can we
> not use that somehow?
>
> If we only ever append data rather than update it (or update it very
> rarely), can we somehow store timestamp ranges in each sstable file and then
> have the server know when it's time to expire one?
>

I personally like this last option of expiring entire sstables. It seems
significantly more efficient than scrubbing data. The granularity might be a
bit coarse, but setting the expire time per ColumnFamily seems a reasonable
trade-off in the short run for an easier solution.

For apps that don't want to see the old data, a read could also ignore any
data whose timestamp is older than the expire time on the ColumnFamily. Then,
once every timestamp in an sstable is older than that expire time, the whole
sstable can be truncated. Logs are a great example of this.

- August
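
A rough sketch of the sstable-level expiry idea above, in Python for brevity. Everything here is hypothetical and is not Cassandra code: `SSTableMeta`, `is_visible`, and `partition_expired` are made-up names, and it assumes each sstable records the minimum and maximum column timestamp it contains and that the ColumnFamily carries a single expire cutoff.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SSTableMeta:
    """Hypothetical per-sstable metadata: the timestamp range of its columns."""
    name: str
    min_timestamp: int
    max_timestamp: int


def is_visible(column_timestamp: int, expire_before: int) -> bool:
    """Read path: hide any column older than the ColumnFamily expire cutoff."""
    return column_timestamp >= expire_before


def partition_expired(
    sstables: List[SSTableMeta], expire_before: int
) -> Tuple[List[SSTableMeta], List[SSTableMeta]]:
    """Split sstables into (keep, drop).

    An sstable is dropped only when its newest column is older than the
    cutoff, so the decision needs no data scan and no other nodes.
    """
    keep: List[SSTableMeta] = []
    drop: List[SSTableMeta] = []
    for table in sstables:
        if table.max_timestamp < expire_before:
            drop.append(table)
        else:
            keep.append(table)
    return keep, drop


# Example: with a cutoff of 300, only sstable "a" is entirely expired.
tables = [
    SSTableMeta("a", 100, 200),
    SSTableMeta("b", 150, 400),
    SSTableMeta("c", 450, 500),
]
keep, drop = partition_expired(tables, expire_before=300)
```

In this example sstable "b" straddles the cutoff, so it stays on disk, and the read-side check in `is_visible` is what hides its older columns until the whole file ages out.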