On Fri, Jul 31, 2009 at 4:23 PM, Ryan Rawson<[email protected]> wrote:
> Not really, only storing 1 value per column family is a fairly
> degenerate case and not really the primary mechanism by which people
> use hbase. The column family storage model may superficially appear
> to be like a column-store, but it can do so much more and is much more
> flexible.

Yes, I couldn't agree more, Ryan. That's why we choose hbase over other
column-oriented DBMSs: it gives us much more flexibility.

But from a conceptual point of view, hbase and Google bigtable are indeed
column-family oriented database systems, and consequently they share the
benefits described in http://en.wikipedia.org/wiki/Column-oriented_DBMS .

> On Fri, Jul 31, 2009 at 1:20 AM, Angus He<[email protected]> wrote:
>>> If you stored only 1 column per family, it would resemble a
>>> column-store, however as you stored more columns per family, they
>>> would be stored in "row order", ie: columns from the same row are
>>> stored next to each other.
>>
>> I know. In my previous post I said: "You cannot equate the
>> "column" in that wikipedia article to the "column" in HBase.
>> So we should consider the "column" in wikipedia as "column-family" in
>> HBase".
>>
>> Anyway,
>> Ryan, do you agree that hbase is a "column-family oriented db system"?
>>
>>>
>>> On Fri, Jul 31, 2009 at 1:05 AM, Angus He<[email protected]> wrote:
>>>> OK,OK,OK.
>>>>
>>>> If data is stored row-by-row in hbase, how do you explain the text
>>>> under the section "Physical Storage View" in
>>>> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture ?
>>>> Is the page stale, or is something else wrong?
>>>>
>>>> On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<[email protected]> wrote:
>>>>> Data is stored row-by-row in the hbase store files (aka hfiles).
>>>>> HBase is not a column-oriented-store as described in the wikipedia
>>>>> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>
>>>>> Have a look at the bigtable paper, do some searches, lots of material
>>>>> out there describing the benefits of a flexible store like
>>>>> bigtable/hbase.
>>>>>
>>>>> -ryan
>>>>>
>>>>> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<[email protected]> wrote:
>>>>>> Hi Ryan,
>>>>>>
>>>>>> You cannot equate the "column" in that wikipedia article to the
>>>>>> "column" in HBase.
>>>>>>
>>>>>> We should assume that the word "column" in "column-oriented" is
>>>>>> predefined, otherwise, it is meaningless.
>>>>>>
>>>>>> So we should consider the "column" in wikipedia as "column-family" in
>>>>>> HBase. In this way, the article can answer 宏明's question.
>>>>>>
>>>>>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<[email protected]> wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> The bigtable paper talks more about column families, but in HBase each
>>>>>>> column family is stored in its own file. That means there is disk
>>>>>>> locality for different column families. The canonical use is to put
>>>>>>> web crawl data in one family, and meta data (like derived meta data)
>>>>>>> in another. That way scanning just the meta data is not as expensive
>>>>>>> as scanning the web page crawl dump.
>>>>>>>
>>>>>>> Column families are pre-defined - the "schema" for what it's worth -
>>>>>>> but the 'qualifier' within a family is dynamically determined by the
>>>>>>> client.
>>>>>>>
>>>>>>> In the terminology of the article, hbase would be more 'row oriented',
>>>>>>> but with the column family snag, it isn't that simple. Since rows from
>>>>>>> different families are stored in different files, reading efficiency
>>>>>>> is related to which column families you are reading in a query.
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
>>>>>>>> Hi Ryan,
>>>>>>>>
>>>>>>>> 1. If that is not the case, what is the purpose of introducing the
>>>>>>>> "column family"?
>>>>>>>> Are the contents of different column families stored in different
>>>>>>>> files in HBase?
>>>>>>>>
>>>>>>>> BTW, in the bigtable paper, we can find the following text:
>>>>>>>> "Access control and both disk and memory accounting are performed at
>>>>>>>> the column-family level."
>>>>>>>>
>>>>>>>> 2. I was wondering if HBase shares the benefits described in the
>>>>>>>> "Benefits" section of the wikipedia article. If not, what is the
>>>>>>>> meaning of "column-stores" in HBase?
>>>>>>>>
>>>>>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>>>>>>>>> HBase and bigtable are referred to as column-stores, but we aren't a
>>>>>>>>> 'column oriented dbms' as described in the wikipedia.
>>>>>>>>>
>>>>>>>>> At the storage level, hbase stores key-values, where the key is a
>>>>>>>>> triple of row / column / timestamp. Files are ordered lists of these
>>>>>>>>> key/values, and they are sorted in that order, hence rows are stored
>>>>>>>>> together, then sorted by column then reverse by timestamp (newest on
>>>>>>>>> top).
>>>>>>>>>
>>>>>>>>> Thus hbase is not a 'column store' in the sense listed in the
>>>>>>>>> wikipedia entry.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>>>>>>>>> Why don't you try to google it first?
>>>>>>>>>> After googling with the keyword "Column-oriented", the first result
>>>>>>>>>> is exactly what you want.
>>>>>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>>>>>
>>>>>>>>>> 2009/7/31 <[email protected]>:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Can anyone tell me the benefits of the column-oriented data model?
>>>>>>>>>>> Thank you
>>>>>>>>>>>
>>>>>>>>>>> Fleming
>>>>>>>>>>> 宏明

--
Regards
Angus
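
P.S. For anyone reading this in the archive, the layout Ryan describes
earlier in the thread (one file per column family, key/values inside each
file sorted by row, then column, then newest timestamp first) can be
sketched roughly as below. This is only a toy model in Python, not HBase
code: the cell tuples, the sample data and the build_store_files name are
all made up for illustration, and real store files carry much more detail.

```python
from collections import defaultdict

# Hypothetical sample cells: (row, family, qualifier, timestamp, value).
# The "content" / "meta" families mirror the web-crawl example in the thread.
cells = [
    ("row2", "meta", "lang", 1, "en"),
    ("row1", "content", "html", 1, "<html>v1</html>"),
    ("row1", "meta", "lang", 1, "en"),
    ("row1", "content", "html", 2, "<html>v2</html>"),
    ("row1", "meta", "title", 1, "Home"),
]


def build_store_files(cells):
    """Group cells by column family (one 'file' per family), then sort
    each group by row, qualifier, and timestamp descending."""
    files = defaultdict(list)
    for row, family, qualifier, ts, value in cells:
        files[family].append((row, qualifier, ts, value))
    for kvs in files.values():
        # Negating the timestamp puts the newest version first.
        kvs.sort(key=lambda kv: (kv[0], kv[1], -kv[2]))
    return dict(files)


store_files = build_store_files(cells)
for family, kvs in sorted(store_files.items()):
    print(family)
    for kv in kvs:
        print("   ", kv)
```

In this toy model, scanning only the "meta" family never touches the
"content" list, which is the per-family locality discussed above, while
inside one family all columns of a row still sit next to each other.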
