OK, OK, OK. If data is stored row-by-row in HBase, how do you explain the text under the "Physical Storage View" section of http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture? Is the page stale, or is something else wrong?
On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<[email protected]> wrote:
> Data is stored row-by-row in the hbase store files (aka hfiles).
> HBase is not a column-oriented-store as described in the wikipedia
> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>
> Have a look at the bigtable paper, do some searches, lots of material
> out there describing the benefits of a flexible store like
> bigtable/hbase.
>
> -ryan
>
> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<[email protected]> wrote:
>> Hi Ryan,
>>
>> You cannot equate the "column" in that wikipedia article with the
>> "column" in HBase.
>>
>> We should assume that the word "column" in "column-oriented" is
>> predefined, otherwise, it is meaningless.
>>
>> So we should consider the "column" in wikipedia as "column-family" in
>> HBase. In this way, the article can answer 宏明's question.
>>
>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<[email protected]> wrote:
>>> Hey,
>>>
>>> The bigtable paper talks more about column families, but in HBase each
>>> column family is stored in its own file. That means there is disk
>>> locality for different column families. The canonical use is to put
>>> web crawl data in one family, and meta data (like derived meta data)
>>> in another. That way scanning just the meta data is not as expensive
>>> as scanning the web page crawl dump.
>>>
>>> Column families are pre-defined - the "schema" for what it's worth -
>>> but the 'qualifier' within a family is dynamically determined by the
>>> client.
>>>
>>> In the terminology of the article, hbase would be more 'row oriented',
>>> but with the column family snag, it isn't that simple. Since rows from
>>> different families are stored in different files, reading efficiency
>>> is related to which column families you are reading in a query.
>>>
>>> -ryan
>>>
>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
>>>> Hi Ryan,
>>>>
>>>> 1. If it is not the case, what is the purpose of introducing
>>>> "column family"?
>>>> Are the contents of different column families stored in different
>>>> files in HBase?
>>>>
>>>> BTW, in the bigtable paper, we can find the following text:
>>>> "Access control and both disk and memory accounting are performed at
>>>> the column-family level."
>>>>
>>>> 2. I was wondering if HBase shares the benefits described in the
>>>> "Benefits" section of the wikipedia article. If not, what is the
>>>> meaning of "column-stores" in HBase?
>>>>
>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>>>>> HBase and bigtable are referred to as column-stores, but we aren't a
>>>>> 'column oriented dbms' as described in the wikipedia.
>>>>>
>>>>> At the storage level, hbase stores key-values, where the key is a
>>>>> triple of row / column / timestamp. Files are ordered lists of these
>>>>> key/values, and they are sorted in that order, hence rows are stored
>>>>> together, then sorted by column then reverse by timestamp (newest on
>>>>> top).
>>>>>
>>>>> Thus hbase is not a 'column store' in the sense listed in the wikipedia
>>>>> entry.
>>>>>
>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>>>>> Why don't you try to google it first?
>>>>>> After googling with the keyword "Column-oriented", the first result is
>>>>>> exactly what you want.
>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>
>>>>>> 2009/7/31 <[email protected]>:
>>>>>>> Hi,
>>>>>>> Can anyone tell me the benefit of the Column-oriented data model?
>>>>>>> Thank you
>>>>>>>
>>>>>>> Fleming
>>>>>>> 宏明
>>>>>>> ---------------------------------------------------------------------------
>>>>>>> TSMC PROPERTY
>>>>>>> This email communication (and any attachments) is proprietary
>>>>>>> information for the sole use of its intended recipient. Any
>>>>>>> unauthorized review, use or distribution by anyone other than the
>>>>>>> intended recipient is strictly prohibited. If you are not the
>>>>>>> intended recipient, please notify the sender by replying to this
>>>>>>> email, and then delete this email and any copies of it
>>>>>>> immediately. Thank you.
>>>>>>> ---------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Angus
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Regards
>>>> Angus
>>>>
>>>
>>
>> --
>> Regards
>> Angus
>>
>
--
Regards
Angus
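
P.S. To make the storage layout discussed in the thread concrete, here is a minimal Python sketch of the ordering Ryan describes: key-values keyed by row / column / timestamp, sorted by row, then column, then newest timestamp first, with one sorted file per column family. This is only an illustration under my own assumptions. The names KeyValue, sort_key and build_store_files are hypothetical and are not HBase's API, and the plain Python lists stand in for store files without modelling the real hfile format.

```python
from collections import defaultdict
from typing import NamedTuple


class KeyValue(NamedTuple):
    """Hypothetical stand-in for one stored cell (not an HBase class)."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def sort_key(kv):
    # Row first, then column (family + qualifier), then timestamp
    # negated so that the newest version of a cell sorts first.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)


def build_store_files(cells):
    """Group cells by column family, then sort each group.

    Each returned list plays the role of one store file: all cells of a
    row sit next to each other, but only within a single column family.
    """
    by_family = defaultdict(list)
    for kv in cells:
        by_family[kv.family].append(kv)
    return {fam: sorted(kvs, key=sort_key) for fam, kvs in by_family.items()}


# Made-up sample data: a "crawl" family and a "meta" family.
cells = [
    KeyValue("row2", "meta", "lang", 5, "en"),
    KeyValue("row1", "crawl", "html", 3, "<html>v1</html>"),
    KeyValue("row1", "meta", "lang", 7, "fr"),
    KeyValue("row1", "crawl", "html", 9, "<html>v2</html>"),
    KeyValue("row1", "meta", "lang", 8, "en"),
]

store_files = build_store_files(cells)

for family, kvs in sorted(store_files.items()):
    print(family)
    for kv in kvs:
        print("  ", kv.row, f"{kv.family}:{kv.qualifier}", kv.timestamp, kv.value)
```

In this sketch a scan that only needs the "meta" family never touches the "crawl" list, which is the disk-locality point made above. Within each list the data is still laid out row by row, which is why the layout is closer to 'row oriented' per family than to a column-oriented DBMS in the wikipedia sense.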
