There are some unmatched hunks in Scan.java and HRegion.java, and in particular the nextInternal method of HRegion.java differs substantially. As an ordinary user I find it somewhat difficult and risky to merge them by hand, so I would appreciate it if the patch could be applied to our current working version (0.20.3) directly. Maybe I could try to pull the related code from trunk. :)
Thanks for your comments!

On Fri, Mar 12, 2010 at 2:41 PM, Stack <[email protected]> wrote:

> On Thu, Mar 11, 2010 at 10:18 PM, Yi Liang <[email protected]> wrote:
> > Hi St.Ack,
> >
> > Can hbase-1537 be applied to 0.20.3? It should be very useful, but the
> > patch leaves Scan.java and HRegion.java uncompilable for me.
> >
> I took a quick look. It looks like it wouldn't take too much massaging
> getting the patch to apply to trunk. What kinda errors are you seeing?
> If you get it to work, it'd be good to backport.
>
> St.Ack
>
> > Thanks,
> > Yi
> >
> > On Tue, Mar 9, 2010 at 2:26 PM, Stack <[email protected]> wrote:
> >
> >> On Mon, Mar 8, 2010 at 6:58 PM, William Kang <[email protected]>
> >> wrote:
> >> > Hi,
> >> > Can you give me some more details about how the information in a row
> >> > can be fetched? I understand that a file of, say, 1.5 GB may span
> >> > multiple HFiles in a region server. If the client wants to access a
> >> > column label value in that row, what is going to happen?
> >>
> >> Only that cell is fetched if you specify an explicit column name
> >> (column family + qualifier).
> >>
> >> > After HBase finds the region storing this row, it goes to the region
> >> > .meta and finds the index of the HFile that stores the column family.
> >> > And the HFile has the offsets of key/value pairs. Then HBase can go
> >> > to the key/value pair and get the value for a certain column label.
> >>
> >> Yes.
> >>
> >> > Why does the whole row need to be read into memory?
> >>
> >> If you ask for the whole row, it will try to load it all to deliver it
> >> all to you. There is no "streaming" API per se. Rather, a Result
> >> object is passed from server to client which has in it everything in a
> >> row, keyed by column name.
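Stack's description above -- an explicit family + qualifier fetches just that cell, while a whole-row Get ships a Result keyed by column name -- can be sketched with a small plain-Java model. This is a stand-in for the real client (which needs a running cluster); the column names here are made up:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of the Result semantics described above: the server
// ships row contents as a map keyed by "family:qualifier" column names.
// Asking for one explicit column means only that cell is fetched.
public class RowFetchSketch {
    static NavigableMap<String, byte[]> wholeRow() {
        NavigableMap<String, byte[]> row = new TreeMap<>();
        row.put("cf:a", "value-a".getBytes());
        row.put("cf:b", "value-b".getBytes());
        row.put("cf:c", "value-c".getBytes());
        return row;
    }

    // A get with an explicit column family + qualifier returns one cell,
    // not the whole row.
    static byte[] getCell(NavigableMap<String, byte[]> row, String column) {
        return row.get(column);
    }

    public static void main(String[] args) {
        NavigableMap<String, byte[]> row = wholeRow();
        // Whole-row get: all three cells are materialized and sent over.
        System.out.println("whole row cells: " + row.size());
        // Explicit column: only one cell is needed.
        System.out.println("cf:b = " + new String(getCell(row, "cf:b")));
    }
}
```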
> >> That said, if you want the whole row and you are scanning as opposed
> >> to getting, TRUNK has hbase-1537 applied, which allows for intra-row
> >> scanning -- you call setBatch to set the maximum number of cells
> >> returned within a row -- and the 0.20 branch has HBASE-1996, which
> >> allows you to set the maximum size returned on a next invocation (in
> >> both cases, if the row is not exhausted, the next 'next' invocation
> >> will return more out of the current row, and so on, until the row is
> >> exhausted).
> >>
> >> > If HBase does not read the whole row at once, what caused its
> >> > inefficiency?
> >>
> >> I think Ryan is just allowing that the above means of scanning parts
> >> of rows may have bugs that we've not yet squashed.
> >>
> >> St.Ack
> >>
> >> > Thanks.
> >> >
> >> > William
> >> >
> >> > On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> At this time, truly massive rows such as the one you described may
> >> >> behave non-optimally in HBase. While in previous versions of HBase
> >> >> reading an entire row required you to be able to actually read and
> >> >> send the entire row in one go, there is a new API that allows you
> >> >> to effectively stream rows. There are still some read paths that
> >> >> may read more data than necessary, so your performance mileage may
> >> >> vary.
> >> >>
> >> >> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
> >> >> <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This might prove to be a blatantly obvious question, but wouldn't
> >> >> > it make sense to store large files directly in HDFS and keep the
> >> >> > metadata about the file in HBase? One could for instance
> >> >> > serialize the details of the HDFS file in a Java object and store
> >> >> > that in HBase.
> >> >> > This object could export the reading of the HDFS file, for
> >> >> > instance, so that one is left with clean code. Anything wrong
> >> >> > with implementing things this way?
> >> >> >
> >> >> > Cheers
> >> >> > su./hail
> >> >> >
> >> >> > On 07/03/2010 09:21, tsuna wrote:
> >> >> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
> >> >> >> <[email protected]> wrote:
> >> >> >>>
> >> >> >>> I have a table which may contain super big rows, e.g. with
> >> >> >>> millions of cells in one row, 1.5 GB in size.
> >> >> >>>
> >> >> >>> Now I have a problem emitting data into the table, probably
> >> >> >>> because these super big rows are too large for my regionserver
> >> >> >>> (with only 1 GB heap).
> >> >> >>>
> >> >> >> A row can't be split, and whatever you do that needs that row
> >> >> >> (like reading it) requires that HBase load the entire row into
> >> >> >> memory. If the row is 1.5 GB and your regionserver has only 1 GB
> >> >> >> of memory, it won't be able to use that row.
> >> >> >>
> >> >> >> I'm not 100% sure about that because I'm still an HBase n00b
> >> >> >> too, but that's my understanding.
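The intra-row scanning discussed in this thread (HBASE-1537's setBatch, and HBASE-1996's per-next size cap) can be illustrated with a self-contained sketch. This is not the HBase client itself, just a model of how repeated next() calls drain one wide row a batch at a time:

```java
import java.util.ArrayList;
import java.util.List;

// Model of intra-row scanning: a row with many cells is returned across
// several next() calls, at most `batch` cells per call (cf. Scan.setBatch).
public class IntraRowScanSketch {
    private final int totalCells;
    private final int batch;
    private int served = 0;

    IntraRowScanSketch(int totalCells, int batch) {
        this.totalCells = totalCells;
        this.batch = batch;
    }

    // Returns the next chunk of cell indices, or an empty list once the
    // row is exhausted -- mirroring how each 'next' invocation keeps
    // returning more of the current row until nothing is left.
    List<Integer> next() {
        List<Integer> chunk = new ArrayList<>();
        while (served < totalCells && chunk.size() < batch) {
            chunk.add(served++);
        }
        return chunk;
    }

    public static void main(String[] args) {
        // A 10-cell row scanned with batch=4 drains in 3 calls: 4 + 4 + 2.
        IntraRowScanSketch scanner = new IntraRowScanSketch(10, 4);
        int calls = 0;
        List<Integer> chunk;
        while (!(chunk = scanner.next()).isEmpty()) {
            calls++;
            System.out.println("next() #" + calls + " returned "
                    + chunk.size() + " cells");
        }
    }
}
```

The point of the batch knob is exactly the regionserver-heap problem raised below in the thread: no single RPC has to materialize all 1.5 GB at once.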
