Hi J-D,
Thanks for the reply.
Regarding the filesize parameter: if I'm not mistaken, it applies to
all tables, right? Or can it be configured for a specific table? I ask
because this table has a lot of entries, but each entry is very small.
If I set the filesize parameter low enough to split at the number of
entries I need (around 1.5K) and the setting applies to every table,
some of my other tables would end up with a region per entry :).
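For reference, my understanding is that the override would go in
hbase-site.xml roughly as sketched below. The property name is the one
you gave; the value is only the 256M default written out in bytes (I am
assuming the setting is expressed in bytes), not a value I am proposing:

```xml
<!-- Sketch of a site-wide override, assumed to live in hbase-site.xml. -->
<!-- 268435456 = 256 * 1024 * 1024, i.e. the stated 256M default.       -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>268435456</value>
</property>
```

As far as I can tell, a setting placed there is picked up for every
table, which is exactly my concern.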
Regarding the other option, I'm glad to try to implement it, but I would
appreciate any guidance. It would definitely need a row counter and a
means of getting the nth row. I seem to recall a JIRA about the counter,
but I don't know of anything covering the second issue. Any thoughts?
David
On Thu, 2008-07-31 at 09:49 -0400, Jean-Daniel Cryans wrote:
> David,
>
> If having regions splitting below the default threshold is what you want,
> you can change the configuration parameter "hbase.hregion.max.filesize"
> which by default is set to 256M. Regarding the other option, I don't think
> it's easily doable.
>
> J-D
>
> On Thu, Jul 31, 2008 at 9:06 AM, David Alves
> <[EMAIL PROTECTED]> wrote:
>
> > Hi Guys
> >
> > I use hbase (amongst other things) to crawl some repos of information,
> > and until now I've been using the Nutch segment generation paradigm. I
> > would very much like to skip the segment generation step and use hbase as
> > source and sink directly, but in order to do that I would need to either
> > allow more than one split to be generated for a single region or make
> > the regions in this particular table split with far fewer entries than
> > those of other tables.
> > Is either of these possible?
> >
> > Regards
> > David Alves
> >
> > PS: Thanks Jim and Stack for your hard work on this great piece of
> > software. Hoping to see you guys committing again soon, but either way
> > you have already done great work.
> >
> >