Hi Ryan,

You cannot equate the "column" in that Wikipedia article with the "column" in HBase.
We should assume that the word "column" in "column-oriented" is predefined; otherwise, it is meaningless. So we should read the "column" in the Wikipedia article as "column family" in HBase. Read this way, the article can answer 宏明's question.

On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<[email protected]> wrote:
> Hey,
>
> The bigtable paper talks more about column families, but in HBase each
> column family is stored in its own file. That means there is disk
> locality for different column families. The canonical use is to put
> web crawl data in one family, and meta data (like derived meta data)
> in another. That way scanning just the meta data is not as expensive
> as scanning the web page crawl dump.
>
> Column families are pre-defined - the "schema" for what it's worth -
> but the 'qualifier' within a family is dynamically determined by the
> client.
>
> In the terminology of the article, hbase would be more 'row oriented',
> but with the column family snag, it isn't that simple. Since rows from
> different families are stored in different files, reading efficiency
> is related to which column families you are reading in a query.
>
> -ryan
>
> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
>> Hi Ryan,
>>
>> 1. If that is not the case, what is the purpose of introducing
>> "column family"?
>> Are the contents of different column families stored in different
>> files in HBase?
>>
>> BTW, in the bigtable paper, we can find the following text:
>> "Access control and both disk and memory accounting are performed at
>> the column-family level."
>>
>> 2. I was wondering if HBase shares the benefits described in the
>> "Benefits" section of the wikipedia article. If not, what is the
>> meaning of "column-store" in HBase?
>>
>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>>> HBase and bigtable are referred to as column-stores, but we aren't a
>>> 'column oriented dbms' as described in the wikipedia.
>>>
>>> At the storage level, hbase stores key-values, where the key is a
>>> triple of row / column / timestamp. Files are ordered lists of these
>>> key/values, and they are sorted in that order, hence rows are stored
>>> together, then sorted by column then reverse by timestamp (newest on
>>> top).
>>>
>>> Thus hbase is not a 'column store' in the sense listed in the wikipedia
>>> entry.
>>>
>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>>> Why don't you try to google it first?
>>>> After googling with the keyword "Column-oriented", the first result is
>>>> exactly what you want.
>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>
>>>> 2009/7/31 <[email protected]>:
>>>>> Hi,
>>>>> Can anyone tell me the benefit of the column-oriented data model?
>>>>> Thank you
>>>>>
>>>>> Fleming
>>>>> 宏明
>>>>
>>>> --
>>>> Regards
>>>> Angus
>>
>> --
>> Regards
>> Angus

--
Regards
Angus
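
P.S. For what it's worth, the layout Ryan describes in the thread can be sketched in a few lines of Python. This is only an illustrative in-memory model, not HBase code: the tuple layout and the names `sort_key` and `build_store_files` are hypothetical, and real store files are on-disk structures. It assumes each cell is (row, family, qualifier, timestamp, value), keeps one sorted list per column family, and orders each list by row, then column, then newest timestamp first.

```python
from collections import defaultdict

# Hypothetical cell layout for illustration only (not an HBase API):
# (row, family, qualifier, timestamp, value)


def sort_key(kv):
    """Order by row, then column, then timestamp descending (newest on top)."""
    row, family, qualifier, timestamp, _value = kv
    return (row, family, qualifier, -timestamp)


def build_store_files(key_values):
    """Group cells by column family, one sorted list ("file") per family."""
    files = defaultdict(list)
    for kv in key_values:
        files[kv[1]].append(kv)
    return {family: sorted(kvs, key=sort_key) for family, kvs in files.items()}


cells = [
    ("row2", "meta", "lang", 1, "en"),
    ("row1", "crawl", "html", 1, "<html>old</html>"),
    ("row1", "crawl", "html", 2, "<html>new</html>"),
    ("row1", "meta", "lang", 1, "en"),
]

store_files = build_store_files(cells)

# Scanning only the "meta" family never touches the "crawl" list,
# which mirrors the disk-locality point made in the thread.
meta_scan = store_files["meta"]
```

In this toy model a query's cost depends on which families it reads, since each family lives in its own list; within a family, cells for the same row sit next to each other, which is why the thread calls the layout closer to 'row oriented'.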
