Re: Custom Input Split

stack Wed, 22 Apr 2009 10:06:51 -0700

So you need the result to feed a program?

Maybe someone else knows how to ask a finished MapReduce job about its
counters? There must be a way.


Or, now that I think of it, I don't believe RowCounter writes the count to the
filesystem. You'd need to add that if you can't figure out a way to ask the
finished RowCounter job what the value of its Counter.ROWS counter was.

St.Ack

On Wed, Apr 22, 2009 at 9:50 AM, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:

> Hi St.Ack,
>          Well, I did go through the usage, where we are supposed to
> specify 3 parameters: OutputDir, TableName and Columns.
> What I actually want is an int count containing the number of
> rows in the table.
> I guess this program stores its output in some output dir... correct me
> if I am wrong.
>
> Thanks,
> Raakhi
>
> On Wed, Apr 22, 2009 at 8:25 AM, stack <st...@duboce.net> wrote:
>
> > Oh, and the reason to use an MR job to count rows is that if there are
> > many, a single process would take too long. (If you know you have a small
> > table, use the 'count' command in the shell.)
> >
> > St.Ack
> >
> > On Wed, Apr 22, 2009 at 9:06 AM, Stack <saint....@gmail.com> wrote:
> >
> > > If you run
> > >
> > > ./bin/hadoop jar hbase.jar rowcounter
> > >
> > > It will emit usage. You are a smart fellow. I think you can take it
> > > from there.
> > >
> > > Stack
> > >
> > >
> > >
> > >
> > > On Apr 22, 2009, at 5:48, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:
> > >
> > >> Hi Lars,
> > >>          Thanks for the suggestion. I also figured out my problem
> > >> using TableInputFormatBase.
> > >>
> > >> But my table had only one region, and I still wanted to split the
> > >> input into 4 maps, so I am basically overriding the getSplits() method
> > >> in TableInputFormatBase.
> > >>
> > >> One more question:
> > >> is there any method in the HBase API which can count the number of rows
> > >> in a table?
> > >> I tried googling it, and all I came across is a RowCounter class, which
> > >> is a MapReduce job to count the number of rows, but I really don't know
> > >> how to use it. Any suggestions?
> > >>
> > >> Thanks,
> > >> Raakhi
> > >>
> > >>
> > >> On Wed, Apr 22, 2009 at 4:30 AM, Lars George <l...@worldlingo.com> wrote:
> > >>
> > >>> Hi Rakhi,
> > >>>
> > >>> This is all done in the TableInputFormatBase class, which you can
> > >>> extend and then override the getSplits() function:
> > >>>
> > >>> http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> > >>>
> > >>> This is where you can then specify how many rows per map are assigned.
> > >>> Really straightforward as I see it. I have used it to implement a
> > >>> special "only use N regions" support where I can run a sample subset
> > >>> against an MR job. For example, only map 5 out of 8K regions of a table.
> > >>>
> > >>> The default one will always split all regions into N maps. Hence the
> > >>> recommendation to set the number of maps to the number of regions in a
> > >>> table. If you set it to something lower, then it will split the regions
> > >>> into a smaller number of maps but with more rows per map, i.e. each map
> > >>> gets more than one region to process.
> > >>>
> > >>> Look into the source of the above class and it should be obvious, I
> > >>> hope.
> > >>>
> > >>> Lars
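A plain-Java model of the grouping Lars describes may make it concrete. This is a sketch, not the HBase source: it assumes the default getSplits() hands each map a contiguous run of regions, never creates more maps than there are regions (consistent with Rakhi's one-region table yielding only one map), and gives the first (regions % maps) maps one extra region. Region start keys are Strings here, whereas HBase uses byte[] keys, and the names RegionGrouping, KeyRange and groupRegions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not HBase code) of a getSplits() that groups a table's
 * regions into map tasks. Keys are Strings here; HBase itself uses byte[].
 */
public class RegionGrouping {

    /** A [startKey, endKey) range for one map; "" means table start/end. */
    public static final class KeyRange {
        public final String startKey;
        public final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /**
     * Groups regions, given by their sorted start keys, into contiguous ranges.
     * Assumption: the number of ranges is capped at the number of regions.
     */
    public static List<KeyRange> groupRegions(List<String> regionStartKeys, int numMaps) {
        int regions = regionStartKeys.size();
        if (regions == 0 || numMaps <= 0) {
            throw new IllegalArgumentException("need at least one region and one map");
        }
        int maps = Math.min(numMaps, regions); // one region => one map
        int base = regions / maps;             // regions every map gets
        int extra = regions % maps;            // the first 'extra' maps get one more
        List<KeyRange> ranges = new ArrayList<>();
        int startPos = 0;
        for (int i = 0; i < maps; i++) {
            int endPos = startPos + base + (i < extra ? 1 : 0);
            // The last range is open-ended: it runs to the end of the table.
            String endKey = (i + 1 < maps) ? regionStartKeys.get(endPos) : "";
            ranges.add(new KeyRange(regionStartKeys.get(startPos), endKey));
            startPos = endPos;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 8 regions (the first region's start key is empty), 3 requested maps.
        List<String> starts = List.of("", "b", "d", "f", "h", "j", "l", "n");
        for (KeyRange r : groupRegions(starts, 3)) {
            System.out.println(r);
        }
    }
}
```

With 8 regions and 3 requested maps this yields ranges of 3, 3 and 2 regions. Asking for 4 maps on a one-region table still yields a single range, which is why Rakhi had to override getSplits() for that case.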
> > >>>
> > >>>
> > >>>
> > >>> Rakhi Khatwani wrote:
> > >>>
> > >>>> Hi,
> > >>>>   I have a table with N records,
> > >>>>   and now I want to run a MapReduce job with 4 maps and 0 reduces.
> > >>>>   Is there a way I can create my own custom input split so that I
> > >>>>   can send 'n' records to each map?
> > >>>>   If there is a way, can I have a sample code snippet to gain a better
> > >>>>   understanding?
> > >>>>
> > >>>> Thanks,
> > >>>> Raakhi
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> >
>
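For Rakhi's original case (4 maps over a table whose rows may all sit in one region), an overridden getSplits() needs its own boundary rows rather than region boundaries. A minimal sketch, assuming the sorted row keys are already available in memory (for a large table you would sample them instead). EvenRowSplitter, RowRange and split() are hypothetical names, and turning each range into an actual HBase split object is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of HBase): cuts a sorted list of row keys into
 * at most numMaps contiguous [startRow, endRow) ranges of near-equal size.
 */
public class EvenRowSplitter {

    /** One map's share of the table. "" as startRow/endRow means table start/end. */
    public static final class RowRange {
        public final String startRow; // inclusive
        public final String endRow;   // exclusive
        public final int rows;        // how many of the known keys fall inside

        RowRange(String startRow, String endRow, int rows) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.rows = rows;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ") rows=" + rows;
        }
    }

    public static List<RowRange> split(List<String> sortedRowKeys, int numMaps) {
        int total = sortedRowKeys.size();
        if (total == 0 || numMaps <= 0) {
            throw new IllegalArgumentException("need at least one row and one map");
        }
        int maps = Math.min(numMaps, total); // avoid empty ranges
        List<RowRange> ranges = new ArrayList<>();
        for (int i = 0; i < maps; i++) {
            // Integer boundaries i*total/maps keep range sizes within one row of each other.
            int from = (int) ((long) i * total / maps);
            int to = (int) ((long) (i + 1) * total / maps);
            // Open the outer ends so the ranges cover the whole key space.
            String start = (i == 0) ? "" : sortedRowKeys.get(from);
            String end = (i + 1 < maps) ? sortedRowKeys.get(to) : "";
            ranges.add(new RowRange(start, end, to - from));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add(String.format("row%02d", i));
        }
        for (RowRange r : split(keys, 4)) {
            System.out.println(r);
        }
    }
}
```

With 10 rows and 4 maps the ranges hold 2, 3, 2 and 3 rows. The first range starts at the empty row and the last one ends at it, so rows added before the first or after the last known key are still covered.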
