Hi Stack,

Yes, I needed the result to feed a program. Thanks for the suggestions, though. I'll try out the Counter.ROWS approach tomorrow.

Thanks,
Raakhi

On Wed, Apr 22, 2009 at 10:36 PM, stack <st...@duboce.net> wrote:

> So you need the result to feed a program?
>
> Maybe someone else knows how to ask a finished mapreduce job questions about
> its counters? There must be a way?
>
> Or, yeah, I suppose, I don't believe RowCounter writes the count to the
> filesystem. You'd need to add that if you can't figure a way to ask the
> finished RowCounter job what the value of its Counter.ROWS counter was.
>
> St.Ack
>
> On Wed, Apr 22, 2009 at 9:50 AM, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:
>
> > Hi St Ack,
> >
> > Well, I did go through the usage, where we were supposed to mention 3
> > parameters: OutputDir, TableName and Columns.
> > What I actually wanted is an int value count, which contains the number of
> > rows in the table.
> > I guess this program seems to store the o/p in some output dir... correct me
> > if I am going wrong.
> >
> > Thanks,
> > Raakhi
> >
> > On Wed, Apr 22, 2009 at 8:25 AM, stack <st...@duboce.net> wrote:
> >
> > > Oh, and the reason to use a MR job for counting rows is that if there are
> > > many rows, a single process would take too long (if you know you have a
> > > small table, use the 'count' command in the shell).
> > >
> > > St.Ack
> > >
> > > On Wed, Apr 22, 2009 at 9:06 AM, Stack <saint....@gmail.com> wrote:
> > >
> > > > If you run
> > > >
> > > > ./bin/hadoop jar hbase.jar rowcounter
> > > >
> > > > It will emit usage. You are a smart fellow. I think you can take it from
> > > > there.
> > > >
> > > > Stack
> > > >
> > > > On Apr 22, 2009, at 5:48, Rakhi Khatwani <rakhi.khatw...@gmail.com> wrote:
> > > >
> > > >> Hi Lars,
> > > >> Thanks for the suggestion. I also figured out my problem using
> > > >> TableInputFormatBase.
> > > >>
> > > >> My table had only one region, but I still wanted to split the input into
> > > >> 4 maps.
> > > >> So I am basically overriding the getSplits() method in
> > > >> TableInputFormatBase.
> > > >>
> > > >> One more question:
> > > >> Is there any method in the HBase API which can count the number of rows in a
> > > >> table?
> > > >> I tried googling it and all I came across is a RowCounter class, which is a
> > > >> mapreduce job to count the number of rows. But I really don't know how to
> > > >> use it. Any suggestions?
> > > >>
> > > >> Thanks,
> > > >> Raakhi
> > > >>
> > > >> On Wed, Apr 22, 2009 at 4:30 AM, Lars George <l...@worldlingo.com> wrote:
> > > >>
> > > >>> Hi Rakhi,
> > > >>>
> > > >>> This is all done in the TableInputFormatBase class, which you can extend
> > > >>> and then override the getSplits() function:
> > > >>>
> > > >>> http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> > > >>>
> > > >>> This is where you can then specify how many rows per map are assigned.
> > > >>> Really straightforward as I see it. I have used it to implement a special
> > > >>> "only use N regions" support where I can run a sample subset against a MR
> > > >>> job. For example, only map 5 out of 8K regions of a table.
> > > >>>
> > > >>> The default one will always split all regions into N maps. Hence the
> > > >>> recommendation to set the number of maps to the number of regions in a
> > > >>> table. If you set it to something lower, then it will split the regions
> > > >>> into a smaller number but with more rows per map, i.e. each map gets more
> > > >>> than one region to process.
> > > >>>
> > > >>> Look into the source of the above class and it should be obvious, I
> > > >>> hope.
> > > >>>
> > > >>> Lars
> > > >>>
> > > >>> Rakhi Khatwani wrote:
> > > >>>
> > > >>>> Hi,
> > > >>>> I have a table with N records.
> > > >>>> Now I want to run a map reduce job with 4 maps and 0 reduces.
> > > >>>> Is there a way I can create my own custom input split so that I can
> > > >>>> send 'n' records to each map?
> > > >>>> If there is a way, can I have a sample code snippet to gain a better
> > > >>>> understanding?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Raakhi.
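
On the original question in the quoted thread (sending roughly 'n' records to each of 4 maps): the arithmetic that a custom getSplits() override needs can be sketched without any HBase dependency. This is only a minimal sketch under stated assumptions, not the TableInputFormatBase implementation. The class SplitRangeSketch, its Range type and the plan() method are hypothetical names, and it assumes you already have a sorted list of row keys (for example, collected by a scan), which a one-region table does not hand you for free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: divides sorted row keys into contiguous key ranges. */
final class SplitRangeSketch {

    /** Half-open range [startRow, endRow); an empty endRow means "to the end of the table". */
    static final class Range {
        final String startRow;
        final String endRow;

        Range(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /** Returns at most numMaps ranges whose sizes differ by no more than one row. */
    static List<Range> plan(List<String> sortedKeys, int numMaps) {
        if (numMaps < 1) {
            throw new IllegalArgumentException("numMaps must be >= 1");
        }
        List<Range> ranges = new ArrayList<Range>();
        int n = sortedKeys.size();
        int parts = Math.min(numMaps, n); // never create an empty range
        int start = 0;
        for (int i = 0; i < parts; i++) {
            // The first (n % parts) ranges take one extra row each.
            int size = n / parts + (i < n % parts ? 1 : 0);
            int end = start + size;
            String endRow = (end < n) ? sortedKeys.get(end) : "";
            ranges.add(new Range(sortedKeys.get(start), endRow));
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Made-up sample keys, for illustration only.
        List<String> keys = Arrays.asList(
                "row01", "row02", "row03", "row04", "row05",
                "row06", "row07", "row08", "row09", "row10");
        for (Range r : plan(keys, 4)) {
            System.out.println(r);
        }
    }
}
```

With 10 keys and 4 maps, the ranges cover 3, 3, 2 and 2 rows. In an override of getSplits(), each range would supply the start and end row of one split; how the split object itself is constructed depends on the HBase version, so check the TableInputFormatBase source as Lars suggests.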