"The canonical use is to put
web crawl data in one family, and meta data (like derived meta data)
in another.  That way scanning just the meta data is not as expensive
as scanning the web page crawl dump."

I have this very same canonical use case. HBase provides very clear
benefits here. One can have a very deep archival store of web content --
and with multiversioning and timestamps, the ability to reconstitute
snapshots of change over time -- and yet run very efficient analytics
over derived metadata. The metadata and archival data are colocated in
the sense that they are both in the same table, but because they are in
different column families, access to one is I/O-independent of access to
the other. So I might get a few K ops/sec/node scanning over content,
but > 100K ops/sec/node scanning over metadata only.

   - Andy





________________________________
From: Ryan Rawson <[email protected]>
To: [email protected]
Sent: Friday, July 31, 2009 12:18:31 AM
Subject: Re: Column-oriented data modal

Hey,

The Bigtable paper talks more about column families, but in HBase each
column family is stored in its own file.  That means there is disk
locality for different column families.  The canonical use is to put
web crawl data in one family, and meta data (like derived meta data)
in another.  That way scanning just the meta data is not as expensive
as scanning the web page crawl dump.

Column families are predefined (the "schema", for what it's worth),
but the 'qualifier' within a family is chosen dynamically by the
client.
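To make that split concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API; the `ToyTable` class and the family names `content` and `meta` are hypothetical. The set of families is fixed when the table is created, while qualifiers are picked freely on each write.

```python
class ToyTable:
    """Toy model (not HBase's API): fixed column families, free-form qualifiers."""

    def __init__(self, families):
        # The "schema" is just the set of family names declared up front.
        self.families = frozenset(families)
        self.cells = {}

    def put(self, row, family, qualifier, value):
        # Writes to an undeclared family are rejected...
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        # ...but any qualifier is accepted; nothing declares it in advance.
        self.cells[(row, family, qualifier)] = value


table = ToyTable(["content", "meta"])
table.put("com.example/", "content", "html", "<html>...</html>")
table.put("com.example/", "meta", "lang", "en")
table.put("com.example/", "meta", "outlinks", "42")  # new qualifier, no schema change

try:
    table.put("com.example/", "anchors", "text", "home")  # family never declared
except KeyError as err:
    print("rejected:", err)
```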

In the terminology of the article, HBase would be more 'row oriented',
but with the column family snag, it isn't that simple.  Since the data
for different families of a row is stored in different files, reading
efficiency depends on which column families a query reads.
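A rough sketch of why reading efficiency depends on the families a query touches, again as a toy Python model rather than HBase's real storage format: each family keeps its own sorted list of cells (standing in for its separate file), and `scan` reads only the stores for the requested families. The `FamilyStores` class and the byte counter used as a proxy for I/O are illustrative assumptions.

```python
import bisect


class FamilyStores:
    """Toy model: one sorted cell list per column family, standing in for one file each."""

    def __init__(self, families):
        self.stores = {family: [] for family in families}

    def put(self, row, family, qualifier, value):
        # Keep each family's "file" ordered by (row, qualifier).
        bisect.insort(self.stores[family], (row, qualifier, value))

    def scan(self, families):
        """Return (cells, bytes_read), touching only the stores of `families`."""
        cells, bytes_read = [], 0
        for family in families:
            for row, qualifier, value in self.stores[family]:
                bytes_read += len(value)  # crude stand-in for I/O cost
                cells.append((row, family, qualifier, value))
        cells.sort()  # merge the per-family results back into row order
        return cells, bytes_read


stores = FamilyStores(["content", "meta"])
for i in range(1000):
    row = f"page{i:04d}"
    stores.put(row, "content", "html", "x" * 100)  # bulky crawl data
    stores.put(row, "meta", "lang", "en")          # small derived metadata

_, meta_bytes = stores.scan(["meta"])
_, all_bytes = stores.scan(["content", "meta"])
print(meta_bytes, all_bytes)  # 2000 102000
```

The metadata-only scan never reads the `content` store, so its cost tracks the size of the metadata alone rather than the size of the crawl dump.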

-ryan

On Fri, Jul 31, 2009 at 12:02 AM, Angus He<[email protected]> wrote:
> Hi Ryan,
>
> 1. If that is not the case, what is the purpose of introducing
> "column families"?
> Are the contents of different column families stored in different
> files in HBase?
>
> BTW, in the bigtable paper, we can find the following text:
> "Access control and both disk and memory accounting are performed at
> the column-family level."
>
> 2. I was wondering whether HBase shares the benefits described in the
> "Benefits" section of the Wikipedia article. If not, what does
> "column-store" mean for HBase?
>
> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<[email protected]> wrote:
>> HBase and Bigtable are referred to as column stores, but we aren't a
>> 'column-oriented DBMS' as described in the Wikipedia article.
>>
>> At the storage level, HBase stores key-values, where the key is a
>> triple of row / column / timestamp.  Files are ordered lists of these
>> key-values, sorted by row first, so a row's data is stored together;
>> within a row they are sorted by column, then in reverse by timestamp
>> (newest on top).
>>
>> Thus HBase is not a 'column store' in the sense used in the Wikipedia
>> entry.
>>
>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<[email protected]> wrote:
>>> Why don't you try Googling it first?
>>> If you search for the keyword "Column-oriented", the first result is
>>> exactly what you want.
>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>
>>>
>>>
>>> 2009/7/31  <[email protected]>:
>>>> Hi,
>>>> Can anyone tell me the benefits of a column-oriented data model?
>>>> Thank you
>>>>
>>>> Fleming
>>>> 宏明
>>>
>>>
>>>
>>> --
>>> Regards
>>> Angus
>>>
>>
>
>
>
> --
> Regards
> Angus
>

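The on-disk ordering described in the quoted thread above (sorted by row, then by column, then by timestamp in reverse so the newest version comes first) can be expressed as a sort key. This is a simplified Python sketch: `KeyValue` and `storage_order` are hypothetical names rather than HBase classes, it ignores any other fields a real key may carry, and it compares Python strings rather than raw bytes (equivalent for ASCII keys).

```python
from typing import NamedTuple


class KeyValue(NamedTuple):
    row: str
    column: str     # "family:qualifier"
    timestamp: int  # larger means newer
    value: str


def storage_order(kv):
    # Row ascending, then column ascending, then timestamp descending
    # (negated so that the newest version of a cell sorts first).
    return (kv.row, kv.column, -kv.timestamp)


cells = [
    KeyValue("row2", "meta:lang", 100, "en"),
    KeyValue("row1", "meta:lang", 100, "en"),
    KeyValue("row1", "meta:lang", 200, "fr"),
    KeyValue("row1", "content:html", 150, "<html>"),
]

for kv in sorted(cells, key=storage_order):
    print(kv.row, kv.column, kv.timestamp, kv.value)
```

All of `row1`'s cells come out contiguous, and its newest `meta:lang` version (timestamp 200) precedes the older one, so reading the latest value of a cell means taking the first match in this order.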


      
