Chinmay, how are you configuring your job? Have you tried using setScan
to select only the keys you want to run MR over? See

http://ofps.oreilly.com/titles/9781449396107/mapreduce.html
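
To make the scan idea concrete, here is a minimal sketch in plain Python (no HBase client involved) of how a start/stop key window keeps already-processed events away from the mapper entirely. It assumes row keys that begin with a zero-padded event timestamp, which is an assumption about the table design and not something stated in this thread. The names make_row_key, scan_range and next_window are hypothetical helpers made up for illustration.

```python
from bisect import bisect_left


def make_row_key(ts_millis: int, client_id: str) -> str:
    # Hypothetical key layout: zero-padded timestamp first, so that
    # lexicographic key order matches chronological order.
    return f"{ts_millis:013d}-{client_id}"


def scan_range(sorted_keys, start_key, stop_key):
    # Return the keys in [start_key, stop_key): start inclusive, stop
    # exclusive, similar in spirit to a scan with start and stop rows.
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


def next_window(last_stop_ts: int, now_ts: int):
    # Each run covers [last_stop_ts, now_ts). now_ts is then saved as the
    # checkpoint for the next run, so an event written while the job is
    # running is simply picked up by the following window.
    return make_row_key(last_stop_ts, ""), make_row_key(now_ts, "")


# Stand-in for the sorted row keys of the events table.
keys = sorted([
    make_row_key(1000, "a"),
    make_row_key(1500, "b"),
    make_row_key(2000, "a"),
    make_row_key(2500, "c"),
])

# First daily run: checkpoint 1000, "now" 2000.
first_run = scan_range(keys, *next_window(1000, 2000))

# Second run starts exactly where the first one stopped.
second_run = scan_range(keys, *next_window(2000, 3000))
```

On the HBase side, the rough equivalent would be a Scan with start and stop rows, or Scan.setTimeRange on cell timestamps if the keys are not time-ordered, handed to the job through TableMapReduceUtil.initTableMapperJob. Note that purely timestamp-prefixed keys tend to concentrate writes on one region, so treat the key layout above as illustrative only.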

As a shameless plug: for your reports, you may want to look at Crux:
https://github.com/sonalgoyal/crux

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Sat, Sep 10, 2011 at 2:53 PM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:

> I believe HBase has some kind of TTL (timeout-based expiry) for
> records and it can clean them up on its own.
>
> On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay
> <chinm...@qualcomm.com> wrote:
> > Hello,
> > I have a setup where a bunch of clients store 'events' in an HBase table.
> > Also, periodically (once a day), I run a mapreduce job that goes over the
> > table and computes some reports.
> >
> > Now my issue is that the next time I don't want the mapreduce job to process
> > the 'events' that it has already processed previously. I know that I can
> > mark processed events in the HBase table and the mapper can filter them
> > out during the next run. But what I would really like is that
> > previously processed events don't even hit the mapper.
> >
> > One solution I can think of is to back up the HBase table after running
> > the job and then clear the table. But this has a lot of problems:
> > 1) Clients may have inserted events while the job was running.
> > 2) I could disable and drop the table and then create it again, but then
> > the clients would complain about this short window of unavailability.
> >
> >
> > What do people using HBase (live) + mapreduce typically do?
> >
> > Thanks!
> > Chinmay
> >
> >
>
>
>
> --
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
> Editor, http://fprog.ru/
>
