Chinmay, how are you configuring your job? Have you tried using setScan to select only the keys you want to run MR over? See
http://ofps.oreilly.com/titles/9781449396107/mapreduce.html

As a shameless plug: for your reports, see if you want to leverage Crux:
https://github.com/sonalgoyal/crux

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Sat, Sep 10, 2011 at 2:53 PM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:

> I believe HBase has some kind of TTL (timeout-based expiry) for
> records and it can clean them up on its own.
>
> On Sat, Sep 10, 2011 at 1:54 AM, Dhodapkar, Chinmay
> <chinm...@qualcomm.com> wrote:
> > Hello,
> > I have a setup where a bunch of clients store 'events' in an HBase
> > table. Also, periodically (once a day), I run a mapreduce job that goes
> > over the table and computes some reports.
> >
> > Now my issue is that the next time I don't want the mapreduce job to
> > process the 'events' that it has already processed previously. I know
> > that I can mark processed events in the HBase table and the mapper can
> > filter them out during the next run. But what I would really like is
> > that previously processed events don't even hit the mapper.
> >
> > One solution I can think of is to back up the HBase table after running
> > the job and then clear the table. But this has a lot of problems:
> > 1) Clients may have inserted events while the job was running.
> > 2) I could disable and drop the table and then create it again, but then
> > the clients would complain about this short window of unavailability.
> >
> > What do people using HBase (live) + mapreduce typically do?
> >
> > Thanks!
> > Chinmay
>
> --
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
> Editor, http://fprog.ru/
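
A minimal sketch of the key-selection idea from the top of this reply, in case it helps. It assumes (this is not something stated in the thread) that event row keys start with a zero-padded epoch-millisecond timestamp, so that "everything since the last run" is one contiguous key range. The class and method names are hypothetical, and the code only computes the bounds; the HBase calls that would consume them are shown as comments.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Sketch: compute [startRow, stopRow) bounds for an incremental scan.
 * ASSUMPTION: row keys begin with a 19-digit, zero-padded epoch-millisecond
 * timestamp, e.g. "0000001315612440000-client42". Adjust to your key design.
 */
final class IncrementalScanRange {

    private IncrementalScanRange() {}

    /** Zero-pads a timestamp so that lexicographic order matches numeric order. */
    static String keyPrefix(long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return String.format(Locale.ROOT, "%019d", epochMillis);
    }

    /** Returns {startRow, stopRow}: start is inclusive, stop is exclusive. */
    static byte[][] rowRange(long lastRunMillis, long thisRunMillis) {
        if (thisRunMillis < lastRunMillis) {
            throw new IllegalArgumentException("thisRun must not precede lastRun");
        }
        return new byte[][] {
            keyPrefix(lastRunMillis).getBytes(StandardCharsets.UTF_8),
            keyPrefix(thisRunMillis).getBytes(StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        long lastRun = 1315612440000L;                 // stored by the previous job run
        long thisRun = lastRun + 24L * 60 * 60 * 1000; // one day later
        byte[][] range = rowRange(lastRun, thisRun);
        System.out.println(new String(range[0], StandardCharsets.UTF_8)
                + " .. " + new String(range[1], StandardCharsets.UTF_8));

        // With HBase on the classpath, the bounds would be applied roughly as:
        //   Scan scan = new Scan();
        //   scan.setStartRow(range[0]);  // inclusive
        //   scan.setStopRow(range[1]);   // exclusive
        //   TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class, ...);
    }
}
```

Using the previous run's stop bound as the next run's start bound means events written while a job is running are picked up by the following run rather than lost. If the row keys do not carry a timestamp, Scan.setTimeRange(minStamp, maxStamp) on the cell timestamps may give a similar effect without changing the key design; timestamp-prefixed keys can also concentrate writes on one region, so treat this layout as one option rather than a recommendation.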