Chinmay, how are you configuring your job? Have you looked at using setScan
to select only the keys you want to run MR over? See
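One reading of the setScan suggestion: if row keys start with the day an event arrived, the daily job can be given a Scan whose start row is yesterday's prefix and whose stop row is today's, so rows handled by earlier runs never reach the mapper. The sketch below is plain Java with no HBase dependency. A `TreeMap` stands in for the table, and the `yyyyMMdd-<eventId>` key layout and the `scanRange` helper are hypothetical. It only models the start-inclusive, stop-exclusive row range that a Scan applies.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class KeyRangeScanSketch {

    // Returns the rows whose keys fall in [startRow, stopRow): start inclusive,
    // stop exclusive. This models how a Scan's start/stop rows bound the rows
    // a MapReduce job sees; it is NOT the HBase client API itself.
    public static NavigableMap<String, String> scanRange(
            NavigableMap<String, String> table, String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        // Hypothetical row-key layout: "<yyyyMMdd>-<eventId>". Keys are sorted
        // lexicographically, as HBase sorts row keys by their bytes.
        NavigableMap<String, String> events = new TreeMap<>();
        events.put("20110909-0001", "login");
        events.put("20110909-0002", "click");
        events.put("20110910-0001", "login");
        events.put("20110910-0002", "logout");
        events.put("20110911-0001", "click");

        // The run on 2011-09-11 reads only 2011-09-10's rows: older rows were
        // processed by earlier runs, and rows arriving today wait for tomorrow.
        NavigableMap<String, String> batch = scanRange(events, "20110910", "20110911");
        for (Map.Entry<String, String> e : batch.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the stop row is exclusive, events written while the job runs fall outside the window and are picked up by the next run. A purely date-prefixed key does send a whole day's writes to one region, so restricting by cell timestamp with `Scan.setTimeRange` may be the better fit for a busy table.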

As a shameless plug: for your reports, see whether Crux would help:

Best Regards,
Crux: Reporting for HBase
Nube Technologies


On Sat, Sep 10, 2011 at 2:53 PM, Eugene Kirpichov wrote:

> I believe HBase supports a TTL (time-to-live expiry) for
> records and can clean them up on its own.
> On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay wrote:
> > Hello,
> > I have a setup where a bunch of clients store 'events' in an HBase table.
> > Also, periodically (once a day), I run a MapReduce job that goes over the
> > table and computes some reports.
> >
> > Now my issue is that the next time, I don't want the MapReduce job to
> > process the 'events' it has already processed. I know that I can mark
> > processed events in the HBase table and have the mapper filter them out
> > during the next run. But what I would really like is for previously
> > processed events not to hit the mapper at all.
> >
> > One solution I can think of is to back up the HBase table after running
> > the job and then clear the table. But this has a lot of problems:
> > 1) Clients may have inserted events while the job was running.
> > 2) I could disable, drop, and re-create the table, but then the clients
> > would complain about the short window of unavailability.
> >
> >
> > What do people using HBase (live) + MapReduce typically do?
> >
> > Thanks!
> > Chinmay
> >
> >
> --
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc.
> Editor,
