There are some unmatched hunks in Scan.java and HRegion.java, and in particular the nextInternal method of HRegion.java differs substantially. As an ordinary user I find it somewhat difficult and risky to merge them by hand, so I would appreciate it if the patch could be applied to our current working version (0.20.3) directly. Maybe I could try to pull the related code from trunk. :)
Thanks for your comments!

On Fri, Mar 12, 2010 at 2:41 PM, Stack <[email protected]> wrote:

> On Thu, Mar 11, 2010 at 10:18 PM, Yi Liang <[email protected]> wrote:
> > Hi St.Ack,
> >
> > Can hbase-1537 be applied to 0.20.3? It should be very useful, but the
> > patch leaves Scan.java and HRegion.java uncompilable for me.
> >
> I took a quick look. It looks like it wouldn't take too much massaging
> getting the patch to apply to trunk. What kinda errors are you seeing?
> If you get it to work, it'd be good to backport.
>
> St.Ack
>
> > Thanks,
> > Yi
> >
> > On Tue, Mar 9, 2010 at 2:26 PM, Stack <[email protected]> wrote:
> >
> >> On Mon, Mar 8, 2010 at 6:58 PM, William Kang <[email protected]>
> >> wrote:
> >> > Hi,
> >> > Can you give me some more details about how the information in a row
> >> > can be fetched? I understand that a file of, say, 1.5 GB may span
> >> > multiple HFiles in a region server. If the client wants to access a
> >> > column label value in that row, what is going to happen?
> >>
> >> Only that cell is fetched if you specify an explicit column name
> >> (column family + qualifier).
> >>
> >> > After HBase finds the region storing this row, it goes to the region
> >> > .meta and finds the index of the HFile that stores the column family.
> >> > And the HFile has the offsets of key/value pairs. Then HBase can go
> >> > to the key/value pair and get the value for a certain column label.
> >>
> >> Yes.
> >>
> >> > Why does the whole row need to be read into memory?
> >>
> >> If you ask for the whole row, it will try to load it all to deliver it
> >> all to you. There is no "streaming" API per se. Rather, a Result
> >> object is passed from server to client which has in it everything in a
> >> row, keyed by column name.
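Stack's description above -- an explicit family + qualifier fetches just that cell, while a whole-row Get ships a Result keyed by column name -- can be sketched with a small plain-Java model. This is a stand-in for the real client (which needs a running cluster); the column names here are made up:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of the Result semantics described above: the server
// ships row contents as a map keyed by "family:qualifier" column names.
// Asking for one explicit column means only that cell is fetched.
public class RowFetchSketch {
    static NavigableMap<String, byte[]> wholeRow() {
        NavigableMap<String, byte[]> row = new TreeMap<>();
        row.put("cf:a", "value-a".getBytes());
        row.put("cf:b", "value-b".getBytes());
        row.put("cf:c", "value-c".getBytes());
        return row;
    }

    // A get with an explicit column family + qualifier returns one cell,
    // not the whole row.
    static byte[] getCell(NavigableMap<String, byte[]> row, String column) {
        return row.get(column);
    }

    public static void main(String[] args) {
        NavigableMap<String, byte[]> row = wholeRow();
        // Whole-row get: all three cells are materialized and sent over.
        System.out.println("whole row cells: " + row.size());
        // Explicit column: only one cell is needed.
        System.out.println("cf:b = " + new String(getCell(row, "cf:b")));
    }
}
```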
> >> That said, if you want the whole row and you are scanning as opposed
> >> to getting, TRUNK has hbase-1537 applied, which allows for intra-row
> >> scanning -- you call setBatch to set the maximum number of cells
> >> returned within a row -- and the 0.20 branch has HBASE-1996, which
> >> allows you to set the maximum size returned on a next invocation (in
> >> both cases, if the row is not exhausted, the next 'next' invocation
> >> will return more out of the current row, and so on, until the row is
> >> exhausted).
> >>
> >> > If HBase does not read the whole row at once, what caused its
> >> > inefficiency?
> >>
> >> I think Ryan is just allowing that the above means of scanning parts
> >> of rows may have bugs that we've not yet squashed.
> >>
> >> St.Ack
> >>
> >> > Thanks.
> >> >
> >> > William
> >> >
> >> > On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> At this time, truly massive rows such as the one you described may
> >> >> behave non-optimally in HBase. While in previous versions of HBase
> >> >> reading an entire row required you to be able to actually read and
> >> >> send the entire row in one go, there is a new API that allows you
> >> >> to effectively stream rows. There are still some read paths that
> >> >> may read more data than necessary, so your performance mileage may
> >> >> vary.
> >> >>
> >> >> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
> >> >> <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This might prove to be a blatantly obvious question, but wouldn't
> >> >> > it make sense to store large files directly in HDFS and keep the
> >> >> > metadata about the file in HBase? One could for instance
> >> >> > serialize the details of the HDFS file in a Java object and store
> >> >> > that in HBase.
> >> >> > This object could export the reading of the HDFS file, for
> >> >> > instance, so that one is left with clean code. Anything wrong
> >> >> > with implementing things this way?
> >> >> >
> >> >> > Cheers
> >> >> > su./hail
> >> >> >
> >> >> > On 07/03/2010 09:21, tsuna wrote:
> >> >> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
> >> >> >> <[email protected]> wrote:
> >> >> >>>
> >> >> >>> I have a table which may contain super big rows, e.g. with
> >> >> >>> millions of cells in one row, 1.5 GB in size.
> >> >> >>>
> >> >> >>> Now I have a problem emitting data into the table, probably
> >> >> >>> because these super big rows are too large for my regionserver
> >> >> >>> (with only 1 GB heap).
> >> >> >>>
> >> >> >> A row can't be split, and whatever you do that needs that row
> >> >> >> (like reading it) requires that HBase load the entire row into
> >> >> >> memory. If the row is 1.5 GB and your regionserver has only 1 GB
> >> >> >> of memory, it won't be able to use that row.
> >> >> >>
> >> >> >> I'm not 100% sure about that because I'm still an HBase n00b
> >> >> >> too, but that's my understanding.
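The intra-row scanning discussed in this thread (HBASE-1537's setBatch, and HBASE-1996's per-next size cap) can be illustrated with a self-contained sketch. This is not the HBase client itself, just a model of how repeated next() calls drain one wide row a batch at a time:

```java
import java.util.ArrayList;
import java.util.List;

// Model of intra-row scanning: a row with many cells is returned across
// several next() calls, at most `batch` cells per call (cf. Scan.setBatch).
public class IntraRowScanSketch {
    private final int totalCells;
    private final int batch;
    private int served = 0;

    IntraRowScanSketch(int totalCells, int batch) {
        this.totalCells = totalCells;
        this.batch = batch;
    }

    // Returns the next chunk of cell indices, or an empty list once the
    // row is exhausted -- mirroring how each 'next' invocation keeps
    // returning more of the current row until nothing is left.
    List<Integer> next() {
        List<Integer> chunk = new ArrayList<>();
        while (served < totalCells && chunk.size() < batch) {
            chunk.add(served++);
        }
        return chunk;
    }

    public static void main(String[] args) {
        // A 10-cell row scanned with batch=4 drains in 3 calls: 4 + 4 + 2.
        IntraRowScanSketch scanner = new IntraRowScanSketch(10, 4);
        int calls = 0;
        List<Integer> chunk;
        while (!(chunk = scanner.next()).isEmpty()) {
            calls++;
            System.out.println("next() #" + calls + " returned "
                    + chunk.size() + " cells");
        }
    }
}
```

The point of the batch knob is exactly the regionserver-heap problem raised below in the thread: no single RPC has to materialize all 1.5 GB at once.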
