Re: Thoughts about partitioning retention and other stuff...

Ted Dunning Sat, 14 May 2011 12:42:50 -0700

You should be able to set split points for every customer/date combination
of interest.  This would allow you to localize the data to be deleted to
single regions.
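
A minimal Python sketch of the split-point idea. It assumes the "customer-date-..." key layout from the quoted message, with fixed-width YYYYMMDD dates. The customer names, dates, and the helpers split_points and region_index are hypothetical. It builds one split key per customer/date combination and checks which region a row would land in. HBase orders row keys lexicographically by bytes, and each region covers a [start key, end key) range, so rows sharing a customer/date prefix fall into a single region.

```python
from bisect import bisect_right
from itertools import product


def split_points(customers, dates):
    """Return one split key per "customer-date" combination, sorted the way
    HBase orders row keys (lexicographically by bytes)."""
    keys = {f"{c}-{d}".encode("utf-8") for c, d in product(customers, dates)}
    return sorted(keys)


def region_index(row_key, splits):
    """Index of the region that would hold row_key.

    Region 0 covers everything before the first split key; region i covers
    [splits[i-1], splits[i]), i.e. the start key is inclusive.
    """
    return bisect_right(splits, row_key)


if __name__ == "__main__":
    # Hypothetical customers and fixed-width YYYYMMDD dates.
    splits = split_points(["customerA", "customerB"], ["20110501", "20110502"])
    for key in splits:
        print(key.decode("utf-8"))
    # Rows sharing the prefix "customerA-20110501" map to the same region,
    # so deleting that customer/date touches only that region.
    print(region_index(b"customerA-20110501-x", splits)
          == region_index(b"customerA-20110501-y", splits))
```

A list like this is what one would hand to the create-time split option J-D mentions further down. Splitting per day rather than per coarser period buys finer-grained deletes at the cost of more regions.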


On Sat, May 14, 2011 at 12:25 PM, Ophir Cohen <oph...@gmail.com> wrote:

> About the partitioning: I was talking about something more automatic.
>
> My use case: I have data from different customers, each with different
> retention policies and different behavior.
> For example, if I have keys such as:
> *customerA-date-other_parts_of_key*
> *customerB-date-other_parts_of_key*
> *customerC-date-other_parts_of_key*
> *customerD-date-other_parts_of_key*
>
> I would like to have some kind of option to tell HBase that each distinct
> first part of the key (say, from the start up to the '-' sign) *has* to be
> in a different region, and that from now on, even with a new customer, the
> partitioning will happen automatically rather than manually, as it is right
> now. I'm not sure how it should be implemented, but this is my use case...
> And yes, I can do it manually...
>
> About region deletion: exactly what you said: a tool to which I provide a
> region (or, even better, a *start* and *end* key) and which deletes it in
> bulk.
> It should do something like the following:
>
>   1. Split the region (or regions) at the start/end keys.
>   2. Close the resulting region(s).
>   3. Remove their directories from HDFS.
>   4. Remove those regions from .META.
>
> It sounds to me like a useful tool to have.
> As you suggested, I'm going to open an issue and maybe even try to
> implement it...
>
> Ophir
>
> On Fri, May 13, 2011 at 10:00 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
> > I haven't read the whole thread, but I'll try to give some answers anyway.
> >
> > J-D
> >
> > > What do I mean by partitioning? - an option to state where the regions are
> > > split.
> >
> > You can already do that: either at creation time, or when doing a split
> > via the shell or HBaseAdmin, you can tell it on which row it should try
> > to split.
> >
> > >
> > > This is a standard capability of databases and can be used for various
> > > things:
> > >
> > >   - Load balancing - I can split an overloaded read/write region into two or
> > >   more regions.
> > >   - Retention - (say data sorted by time) I can delete old regions.
> > >
> > > Another feature I think can be useful is region deletion.
> > > It is especially good for deleting large amounts of data that are sorted
> > > together (e.g. deleting old rows if the key contains a date).
> >
> > You can already do it in a very expensive way, so I guess you're more
> > talking about some sort of "bulk" delete where, instead of issuing one
> > Delete per row, you would delete the whole folder altogether, right? And
> > then do the required .META. fixup... Doesn't sound too bad, and it could
> > be part of online merging; please open a jira.
> >
>
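
Ophir's "automatic" partitioning could be approximated outside HBase by a periodic job. The following is a hypothetical helper, not an HBase API; customer_prefix, missing_prefix_splits and the '-' delimiter are assumptions for illustration. It takes the table's current split keys and a batch of recently written row keys. It returns the split keys, each a prefix plus the delimiter, that a new customer would need so that its rows start a region of their own.

```python
DELIMITER = b"-"  # assumption: the customer prefix ends at the first '-'


def customer_prefix(row_key, delimiter=DELIMITER):
    """Return the part of row_key before the first delimiter, or None if the
    key contains no delimiter."""
    head, sep, _ = row_key.partition(delimiter)
    return head if sep else None


def missing_prefix_splits(existing_splits, incoming_keys, delimiter=DELIMITER):
    """Return, sorted, the split keys (prefix + delimiter) that do not exist
    yet for the customer prefixes seen in incoming_keys."""
    existing = set(existing_splits)
    needed = set()
    for key in incoming_keys:
        prefix = customer_prefix(key, delimiter)
        if prefix is None:
            continue  # keys without a customer prefix are left alone
        split = prefix + delimiter
        if split not in existing:
            needed.add(split)
    return sorted(needed)


if __name__ == "__main__":
    # Hypothetical state: only customerA has its own split point so far.
    current = [b"customerA-"]
    recent = [b"customerA-20110501-x", b"customerC-20110501-y",
              b"customerB-20110502-z", b"customerC-20110503-w"]
    print(missing_prefix_splits(current, recent))
```

The job would split at each returned key, the way J-D describes splitting at a chosen row. There is a reason to split at prefix + '-' rather than at the bare prefix. As long as every customer present has such a split, the region starting at "customerX-" can only hold rows whose prefix is exactly customerX. The reason is that any key sorting between two such split keys must start with the lower one.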

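Ophir's four steps can be planned without touching HBase internals. This minimal Python sketch covers only that bookkeeping. Given the table's split keys and a [start, end) range, it returns the keys that step 1 must split at, plus the whole regions that would then be dropped. The helper names, retention_range, and the "customer-YYYYMMDD-..." layout are assumptions for illustration. Steps 2 to 4 are not shown: closing the regions, deleting their HDFS directories, and the .META. fixup J-D mentions.

```python
def plan_range_delete(splits, start, end):
    """Plan a bulk delete of every row key in [start, end).

    splits: the table's current region split keys (bytes).
    Returns (new_splits, doomed_regions):
      new_splits: boundaries that must be split first (step 1) so that
                  [start, end) is covered exactly by whole regions;
      doomed_regions: those regions as (start_key, end_key) pairs, i.e. what
                  steps 2-4 would close and remove from HDFS and .META.
    """
    if not start < end:
        raise ValueError("start must sort before end")
    existing = set(splits)
    # b"" stands for the table's first boundary, which never needs a split.
    new_splits = [k for k in (start, end) if k and k not in existing]
    # After step 1 the region edges inside the range are start, end and every
    # existing split strictly between them.
    edges = sorted({start, end} | {s for s in existing if start < s < end})
    return new_splits, list(zip(edges, edges[1:]))


def retention_range(customer, cutoff_date):
    """Key range holding every row of `customer` dated before cutoff_date,
    assuming the hypothetical "customer-YYYYMMDD-..." key layout."""
    prefix = customer.encode("utf-8") + b"-"
    return prefix, prefix + cutoff_date.encode("utf-8")


if __name__ == "__main__":
    # Hypothetical current split keys of the table.
    splits = [b"customerA-20110401", b"customerA-20110501", b"customerB-"]
    start, end = retention_range("customerA", "20110415")
    new_splits, doomed = plan_range_delete(splits, start, end)
    print("split at:", new_splits)
    print("drop regions:", doomed)
```

For the retention case in the thread, one customer's range runs from "customer-" up to "customer-" plus the cutoff date. After the two splits, the old data sits in whole regions that could be dropped instead of deleted row by row.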