We regularly get questions from users about querying new data and aging off old data. I was thinking about how we could better support this need in 1.5. One thing that occurred to me is having locality groups that are based on timestamp instead of column family, for example a locality group for each month. Alternatively, we could have groups for data that is < day old, < week old, < month old, and < year old. We would need a way for users to define these.
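To make the two bucketing schemes concrete, here is a minimal Java sketch of mapping a key's timestamp to a group name. The class `TimeLocalityGroups`, its methods, and the group names (`lt_day`, `2012-03`, and so on) are hypothetical and not part of Accumulo. It assumes timestamps are epoch milliseconds and, for the relative scheme, that a "month" is 30 days and a "year" is 365 days.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of two ways to map a key's timestamp (epoch millis)
 * to a time-based locality group name. Not an existing Accumulo API.
 */
public class TimeLocalityGroups {
    private static final long DAY = TimeUnit.DAYS.toMillis(1);
    private static final long WEEK = TimeUnit.DAYS.toMillis(7);
    // Assumption: a "month" is 30 days and a "year" is 365 days.
    private static final long MONTH = TimeUnit.DAYS.toMillis(30);
    private static final long YEAR = TimeUnit.DAYS.toMillis(365);

    /** Absolute scheme: one group per UTC calendar month, e.g. "2012-03". */
    public static String monthGroup(long timestamp) {
        return YearMonth.from(Instant.ofEpochMilli(timestamp).atZone(ZoneOffset.UTC)).toString();
    }

    /** Relative scheme: group by age as of `now`; future timestamps land in "lt_day". */
    public static String ageGroup(long timestamp, long now) {
        long age = now - timestamp;
        if (age < DAY) return "lt_day";
        if (age < WEEK) return "lt_week";
        if (age < MONTH) return "lt_month";
        if (age < YEAR) return "lt_year";
        return "older";
    }

    public static void main(String[] args) {
        long now = Instant.parse("2012-03-15T12:00:00Z").toEpochMilli();
        System.out.println(monthGroup(now));                                 // 2012-03
        System.out.println(ageGroup(now - TimeUnit.HOURS.toMillis(3), now)); // lt_day
        System.out.println(ageGroup(now - TimeUnit.DAYS.toMillis(10), now)); // lt_month
    }
}
```

One design difference shows up even in this sketch: under the relative scheme a key's group changes as it ages, so compactions would have to keep moving keys into older groups, while under the monthly scheme a key's group never changes, so aging off everything before a given month could mean dropping whole groups.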
This would make scanning a table for recent data much faster. Dropping old data could also be made much faster by simply dropping entire locality groups at compaction time. One thing that irks me about this: should column family and time-based locality groups be mutually exclusive (i.e., an RFile has one or the other, not both)? If they are not, then the order in which data is partitioned (by time first or by column family first) matters for query performance and would probably need to be user-configurable.

Thoughts?

Keith
