On Fri, May 7, 2010 at 12:03 AM, Takayuki Tsunakawa
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> If versioning is not necessary from your requirement, you can ignore
> timestamps (do not have to specify timestamp in API call).

Yes, it's actually recommended not to specify timestamps manually in
API calls, particularly when inserting data, unless you absolutely
need to.  Setting them by hand can cause very confusing situations.
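
To give an idea of the kind of confusion I mean, here's a toy Python
model.  It is not HBase code: the `Cell` class and its methods are
made up for this example.  The only assumption is that a read returns
the value carrying the highest timestamp.  A write that arrives later
but carries an older, hand-picked timestamp is silently shadowed:

```python
class Cell:
    """Toy model of one cell: a read returns the value with the highest timestamp."""

    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        # Store the value under the caller-supplied timestamp.
        self.versions[timestamp] = value

    def get(self):
        # The newest timestamp wins, regardless of the order of the writes.
        if not self.versions:
            return None
        return self.versions[max(self.versions)]


cell = Cell()
cell.put("written first", timestamp=2000)
cell.put("written second", timestamp=1000)  # explicit, older timestamp
print(cell.get())  # the second write is not what the read returns
```

If you let the server assign timestamps, the write order and the
timestamp order agree and this surprise goes away.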

> Although HBase keeps three versions by default and it may be a bit
> wasteful for memory and disk, turning on compression for column

It's not wasteful if you never re-write any cell.

> families can minimize the waste as much as you can ignore (is it
> true?).

It's true for disk storage.  Compression doesn't help save memory
because the data is always stored uncompressed in memory (at least in
the current implementation – this may change in the future).

> If saving memory (=keep memtable as small as possible) is important,
> you can set the maximum number of versions to 1.

I don't think you'll be saving much memory anyway.  As Ryan already
pointed out, when you overwrite a cell, a new version is created,
regardless of the maximum number of versions you allow.  If you
rewrite a cell 100 times in 5s, you'll end up with 101 versions, but
only the last version will be visible to you.  You'll have to wait
until a compaction "garbage collects" the 100 "unreachable", older
versions.
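
Here's a rough sketch of that behaviour as a toy Python model.  Again,
this is not HBase code: `VersionedCell`, `put`, `get` and `compact` are
hypothetical names, and the "compaction" is just a list truncation.
It only shows that the version limit affects what you can read, not
what is stored, until a compaction runs:

```python
import itertools


class VersionedCell:
    """Toy model: every write is kept until compact() is called."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.stored = []                  # list of (timestamp, value)
        self._clock = itertools.count(1)  # stand-in for server-side timestamps

    def put(self, value):
        # A write always adds a new version; nothing is overwritten in place.
        self.stored.append((next(self._clock), value))

    def get(self):
        # Only the newest max_versions values are visible to the reader.
        newest_first = sorted(self.stored, reverse=True)
        return [value for _, value in newest_first[:self.max_versions]]

    def compact(self):
        # Drop the versions that are no longer reachable.
        self.stored = sorted(self.stored, reverse=True)[:self.max_versions]


cell = VersionedCell(max_versions=1)
cell.put("initial")
for i in range(100):
    cell.put(f"rewrite {i}")

stored_before = len(cell.stored)  # all writes are still held
visible = cell.get()              # but only the newest one is readable
cell.compact()
stored_after = len(cell.stored)   # older versions are gone now
print(stored_before, visible, stored_after)
```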

> The reason that the default is 3 is to rescue users from their
> mistakes.

You could run a benchmark and compare the performance difference (both
in speed and space) with 1 vs 3 and tell us what you find.


Also, you didn't mention the most important reason why there are
versions and timestamps.  They're *required* by the HBase
implementation (same thing in Bigtable).  They could well have been
hidden as an implementation detail (as some RDBMSs do), but the choice
was made to expose them in order to give users an extra feature.  This
way you can look at the past few versions of a cell if you want to.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
