So you need the result to feed a program? Maybe someone else knows how to ask a finished mapreduce job questions about its counters? There must be a way?
Or, yeah, I suppose, I don't believe RowCounter writes the count to the filesystem. You'd need to add that if you can't figure a way to ask the finished RowCounter job what the value of its Counter.ROWS counter was.

St.Ack

On Wed, Apr 22, 2009 at 9:50 AM, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:

> Hi St Ack,
>       well i did go through the usage... where we were supposed to
> mention 3 parameters, OutputDir, TableName and Columns
> what i actually wanted is an int value count, which contains the number of
> rows in the table.
> i guess this program seems to store the o/p in some output dir... correct me
> if i am going wrong.
>
> Thanks,
> Raakhi
>
> On Wed, Apr 22, 2009 at 8:25 AM, stack <st...@duboce.net> wrote:
>
> > Oh, and the reason to use a MR job counting rows is because if many, a
> > single process would take too long (If you know you have a small table, use
> > the 'count' command in shell).
> >
> > St.Ack
> >
> > On Wed, Apr 22, 2009 at 9:06 AM, Stack <saint....@gmail.com> wrote:
> >
> > > If you run
> > >
> > > ./bin/hadoop -jar hbase.jar rowcounter
> > >
> > > It will emit usage. You are a smart fellow. I think you can take it from
> > > there.
> > >
> > > Stack
> > >
> > > On Apr 22, 2009, at 5:48, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:
> > >
> > >> Hi Lars,
> > >>       Thanks for the suggesstion, I also figured out my problem using
> > >> TableInputFormatBase.
> > >>
> > >> but my table had only one region but i still wanted to split the input into
> > >> 4 maps.
> > >> so i am basically overriding the getInputSplits() method in
> > >> TableInputFormatBase.
> > >>
> > >> One more question
> > >> is there any method in hbase API which can count the number of rows in a
> > >> table?
> > >> i tried googling it and all i came across is a RowCounter class which is a
> > >> mapreduce job to count the number of rows. but i really dont know how to use
> > >> it. any suggestions?
> > >>
> > >> thanks,
> > >> Raakhi
> > >>
> > >> On Wed, Apr 22, 2009 at 4:30 AM, Lars George <l...@worldlingo.com> wrote:
> > >>
> > >>> Hi Rakhi,
> > >>>
> > >>> This is all done in the TableInputFormatBase class, which you can extend
> > >>> and then override the getSplits() function:
> > >>>
> > >>> http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> > >>>
> > >>> This is where you can then specify how many rows per map are assigned.
> > >>> Really straight forward as I see it. I have used it to implement a special
> > >>> "only use N regions" support where I can run a sample subset against a MR
> > >>> job. For example only map 5 out if 8K regions of a table.
> > >>>
> > >>> The default one will always split all regions into N maps. Hence the
> > >>> recommendation to set the number of maps to the number of regions in a
> > >>> table. If you set it to something lower than it will split the regions into
> > >>> a smaller number but with more rows per map, i.e. each map gets more than
> > >>> one region to process.
> > >>>
> > >>> Look into the source of the above class and it should be obvious - I hope.
> > >>>
> > >>> Lars
> > >>>
> > >>> Rakhi Khatwani wrote:
> > >>>
> > >>>> Hi,
> > >>>>  I have a table with N records,
> > >>>>  now i want to run a map reduce job with 4 maps and 0 reduces.
> > >>>>  is there a way i can create my own custom input split so that i can
> > >>>> send 'n' records to each map??
> > >>>>  if there is a way, can i have a sample code snippet to gain better
> > >>>> understanding?
> > >>>>
> > >>>> Thanks
> > >>>> Raakhi.
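
On the split question in the quoted thread: the grouping Lars describes (fewer maps than regions means each map gets more than one region) can be sketched in plain Java with no HBase dependency. This is only an illustration of the arithmetic, not the actual TableInputFormatBase code; the class name SplitSketch and the method groupRegions are made up for the example, and regions are stood in for by integer indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not HBase code. Regions are represented
// by their index 0..regionCount-1 instead of real start/end row keys.
public class SplitSketch {

    // Groups regionCount regions into at most requestedSplits contiguous
    // ranges. Each int[] is {start, end} with end exclusive.
    static List<int[]> groupRegions(int regionCount, int requestedSplits) {
        List<int[]> ranges = new ArrayList<>();
        if (regionCount <= 0 || requestedSplits <= 0) {
            return ranges;
        }
        // Assumption: a region is never cut in two, so there cannot be
        // more splits than regions.
        int splits = Math.min(regionCount, requestedSplits);
        int base = regionCount / splits;   // regions every split gets
        int extra = regionCount % splits;  // first 'extra' splits get one more
        int start = 0;
        for (int i = 0; i < splits; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 regions, 4 maps requested
        for (int[] r : groupRegions(10, 4)) {
            System.out.println("map gets regions [" + r[0] + ", " + r[1] + ")");
        }
        // 1 region, 4 maps requested: only one range comes back
        System.out.println("splits for one region: " + groupRegions(1, 4).size());
    }
}
```

Under that no-cutting assumption a one-region table yields a single split however many maps are requested, which is the situation Rakhi describes, so getting 4 maps from one region means overriding getSplits() to divide the region's row-key range yourself.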