You should be able to set split points for every customer/date combination of interest. This would allow you to localize the data to be deleted to single regions.
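To make that a bit more concrete, here is a rough sketch of how the split points could be generated. It is plain Python and does not talk to HBase at all. The function name `split_points` and the `YYYYMMDD` date encoding are my assumptions (the thread only says "date"); any encoding works as long as it sorts lexicographically in date order.

```python
from datetime import date


def split_points(customers, dates):
    """Return sorted row-key prefixes, one per customer/date combination.

    Assumes keys look like b"<customer>-<YYYYMMDD>-<rest>" (the date
    format is an assumption, not something stated in the thread).
    """
    keys = {
        f"{customer}-{day.strftime('%Y%m%d')}-".encode("utf-8")
        for customer in customers
        for day in dates
    }
    # HBase orders rows by raw bytes, which matches Python's bytes ordering.
    return sorted(keys)


if __name__ == "__main__":
    for key in split_points(["customerA", "customerB"],
                            [date(2011, 4, 1), date(2011, 5, 1)]):
        print(key)
```

A list like this is the sort of thing you would hand over as split keys at table creation time, for example through the `splitKeys` argument of `HBaseAdmin.createTable(HTableDescriptor, byte[][])` on the Java side.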
On Sat, May 14, 2011 at 12:25 PM, Ophir Cohen <oph...@gmail.com> wrote:

> About the partitioning: I was talking about something more automatic.
>
> My use case: I have data that comes from different customers that have
> different retention policies and different behavior.
> For example, if I have keys such as:
> *customerA-date-other_parts_of_key*
> *customerB-date-other_parts_of_key*
> *customerC-date-other_parts_of_key*
> *customerD-date-other_parts_of_key*
>
> I would like to have some kind of option to tell HBase that each distinct
> first part of the key (say, from the start to the '-' sign) *has* to be in a
> different region, and that from now on, even with a new customer, the
> partitioning will happen automatically rather than manually as it does
> right now.
> I'm not sure how it should be implemented, but this is my use case...
> And yes, I can do it manually...
>
> About the region deletion: exactly what you say: a tool to which I provide
> a region (or even better: provide *start* and *end* keys) and it deletes
> it in a bulk way.
> It should do something as follows:
>
> 1. Split region (or more) by the start/end key.
> 2. Close this region/regions.
> 3. Remove the directories from the HDFS.
> 4. Remove those regions from .META.
>
> It sounds to me like a useful tool to have.
> As you suggested, I'm going to add an issue and maybe will even try
> to implement it...
>
> Ophir
>
> On Fri, May 13, 2011 at 10:00 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
> > I haven't read the whole thread, but I'll try some answers anyway.
> >
> > J-D
> >
> > > What do I mean by partitioning? - an option to state where the regions
> > > are split.
> >
> > You can already do that, either at creation time or when doing a split
> > via the shell or HBA you can tell on which row it should try to split.
> > >
> > > This is a standard capability of databases and can be used for various
> > > things:
> > >
> > > - Load balancing - I can split an overloaded read/write region into two
> > >   or more regions.
> > > - Retention - (say data sorted by time) I can delete old regions.
> > >
> > > Another feature I think can be useful is region delete.
> > > It is especially good for deleting large amounts of data that are sorted
> > > together (e.g. delete old rows if the key has a date)
> >
> > You can already do it in a very expensive way, so I guess you're more
> > talking about some sort of "bulk" delete where instead of issuing one
> > Delete per row you would delete the whole folder altogether, right? And
> > then do the required .META. fixup... Doesn't sound too bad, and could be
> > part of online merging, please open a jira.
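
For what it's worth, the planning part of the bulk delete described above (which regions can be dropped whole, and where a split is needed first) can be sketched without touching HBase. This is only an illustration: `plan_bulk_delete` is a made-up name, region boundaries are passed in as plain `(start_key, end_key)` byte pairs with `b""` marking the open ends of the first and last region, and the delete range is the half-open interval from `start` up to but not including `end`, with both keys non-empty. Closing regions, removing the HDFS directories and fixing up .META. are not shown.

```python
def plan_bulk_delete(regions, start, end):
    """Plan a bulk delete of the half-open key range [start, end).

    regions: sorted, non-overlapping (start_key, end_key) byte pairs,
             where an end_key of b"" means "no upper bound".
    Returns (droppable, splits): regions lying fully inside the range,
    and the keys at which a partially covered region would need a split.
    """
    droppable, splits = [], []
    for r_start, r_end in regions:
        unbounded = r_end == b""
        # Skip regions that do not overlap the delete range at all.
        if r_start >= end or (not unbounded and r_end <= start):
            continue
        if r_start >= start and not unbounded and r_end <= end:
            droppable.append((r_start, r_end))
            continue
        # Partially covered: split wherever a range boundary falls
        # strictly inside this region.
        for key in (start, end):
            if r_start < key and (unbounded or key < r_end):
                splits.append(key)
    return droppable, splits


if __name__ == "__main__":
    regions = [
        (b"", b"customerA-20110501-"),
        (b"customerA-20110501-", b"customerB-"),
        (b"customerB-", b"customerC-"),
        (b"customerC-", b""),
    ]
    print(plan_bulk_delete(regions, b"customerA-20110501-", b"customerB-20110601-"))
```

If the split points already line up with the customer/date boundaries, the `splits` list comes back empty and only whole regions are left to remove.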