The diagram is mostly correct; what it doesn't show is the other
rows that bracket the row shown there.  Each column family
gets stored in a different file, but within each file, things are
stored in row order.
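Here is a minimal sketch of that layout. It uses a toy model rather than HBase's real file format or API. A cell is a `(row, family, qualifier, timestamp, value)` tuple, and each family's "file" is just an in-memory list. The names `Cell`, `sort_key` and `write_store_files` are hypothetical, as are the `content`/`meta` family names, which echo the web crawl example from earlier in the thread. Cells are split by family, and each family's cells are sorted by row, then column, then newest timestamp first. That matches the key ordering described further down the thread.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical toy model of a stored cell; not HBase's real KeyValue class.
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def sort_key(cell: Cell):
    # Row ascending, then column (qualifier) ascending, then timestamp
    # descending, so the newest version of a cell comes first.
    return (cell.row, cell.qualifier, -cell.timestamp)


def write_store_files(cells):
    """Partition cells into one toy "file" per column family, each in row order."""
    files = defaultdict(list)
    for cell in cells:
        files[cell.family].append(cell)
    return {family: sorted(fcells, key=sort_key) for family, fcells in files.items()}


cells = [
    Cell("row2", "meta", "lang", 1, "en"),
    Cell("row1", "content", "html", 2, "<html>v2</html>"),
    Cell("row1", "meta", "lang", 1, "fr"),
    Cell("row1", "content", "html", 1, "<html>v1</html>"),
    Cell("row2", "content", "html", 1, "<html>b</html>"),
]

files = write_store_files(cells)
for family, fcells in sorted(files.items()):
    print(family, [(c.row, c.qualifier, c.timestamp) for c in fcells])
```

Reading only the `meta` family touches only that family's list. This is the toy version of why scanning a small family is cheaper than scanning a large one.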

If you stored only one column per family, it would resemble a
column store. However, as you store more columns per family, they
are stored in "row order", i.e. columns from the same row are
stored next to each other.
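To make the contrast concrete, here is a sketch that lays out the same two-row table in two ways. It again uses a hypothetical toy model, not HBase's storage code. With one qualifier per family, each toy file holds a single column for every row, much like a column store. With two qualifiers in one family, the file holds whole rows, and a row's columns sit next to each other. The `layout` function and the table and family names are illustrative assumptions.

```python
def layout(table, families):
    """Return {family: [(row, qualifier, value), ...]} in row order.

    `table` maps row -> {qualifier: value}; `families` maps each family name
    to the qualifiers assigned to it. Both are hypothetical toy structures.
    """
    files = {}
    for family, qualifiers in families.items():
        entries = []
        for row in sorted(table):
            for qualifier in sorted(qualifiers):
                if qualifier in table[row]:
                    entries.append((row, qualifier, table[row][qualifier]))
        files[family] = entries
    return files


table = {
    "r1": {"a": 1, "b": 2},
    "r2": {"a": 3, "b": 4},
}

# One column per family: each file is a single column, like a column store.
column_like = layout(table, {"fa": ["a"], "fb": ["b"]})

# Two columns in one family: a row's columns are stored next to each other.
row_like = layout(table, {"f": ["a", "b"]})

print(column_like)
print(row_like)
```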

-ryan

On Fri, Jul 31, 2009 at 1:05 AM, Angus He<[email protected]> wrote:
> OK, OK, OK.
>
> If data is stored row by row in HBase, how do you explain the text
> under the section "Physical Storage View" in
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture.
> Is the page stale, or is something else wrong?
>
> On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<[email protected]> wrote:
>> Data is stored row by row in the HBase store files (aka HFiles).
>> HBase is not a column-oriented store as described in the Wikipedia
>> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>
>> Have a look at the Bigtable paper and do some searches; there is lots
>> of material out there describing the benefits of a flexible store like
>> Bigtable/HBase.
>>
>> -ryan
>>
>>
>>
>> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<[email protected]> wrote:
>>> Hi Ryan,
>>>
>>> You cannot equate the "column" in that Wikipedia article with the
>>> "column" in HBase.
>>>
>>> We should assume that the word "column" in "column-oriented" is
>>> predefined; otherwise, it is meaningless.
>>>
>>> So we should treat the "column" in the Wikipedia article as the
>>> "column family" in HBase.  Read that way, the article can answer 宏明's question.
>>>
>>>
>>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<[email protected]> wrote:
>>>> Hey,
>>>>
>>>> The Bigtable paper talks more about column families, but in HBase each
>>>> column family is stored in its own file.  That means the data of each
>>>> column family is kept together on disk.  The canonical use is to put
>>>> web crawl data in one family, and metadata (like derived metadata)
>>>> in another.  That way scanning just the metadata is not as expensive
>>>> as scanning the web page crawl dump.
>>>>
>>>> Column families are predefined (the "schema", for what it's worth),
>>>> but the 'qualifier' within a family is dynamically determined by the
>>>> client.
>>>>
>>>> In the terminology of the article, HBase would be more 'row oriented',
>>>> but with the column family snag, it isn't that simple.  Since the data
>>>> of different families is stored in different files, reading efficiency
>>>> depends on which column families a query reads.
>>>>
>>>> -ryan
>>>>
>>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
>>>>> Hi Ryan,
>>>>>
>>>>> 1. If that is not the case, what is the purpose of introducing
>>>>> "column families"?
>>>>> Are the contents of different column families stored in different
>>>>> files in HBase?
>>>>>
>>>>> BTW, in the Bigtable paper, we can find the following text:
>>>>> "Access control and both disk and memory accounting are performed at
>>>>> the column-family level."
>>>>>
>>>>> 2. I was wondering if HBase shares the benefits described in the
>>>>> "Benefits" section of the Wikipedia article. If not, what does
>>>>> "column store" mean for HBase?
>>>>>
>>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>>>>>> HBase and Bigtable are referred to as column stores, but we aren't a
>>>>>> 'column-oriented DBMS' as described in the Wikipedia article.
>>>>>>
>>>>>> At the storage level, HBase stores key-values, where the key is a
>>>>>> triple of row / column / timestamp.  Files are ordered lists of these
>>>>>> key-values, sorted in that order: rows are stored together, then
>>>>>> sorted by column, then by timestamp in reverse (newest first).
>>>>>>
>>>>>> Thus HBase is not a 'column store' in the sense described in the
>>>>>> Wikipedia entry.
>>>>>>
>>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>>>>>> Why don't you try to google it first?
>>>>>>> After googling with the keyword "Column-oriented", the first result is
>>>>>>> exactly what you want.
>>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2009/7/31  <[email protected]>:
>>>>>>>> Hi,
>>>>>>>> Can anyone tell me the benefits of the column-oriented data model?
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Fleming
>>>>>>>> 宏明
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>> Angus
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Angus
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Angus
>>>
>>
>
>
>
> --
> Regards
> Angus
>
