On Fri, May 7, 2010 at 12:03 AM, Takayuki Tsunakawa <tsunakawa.ta...@jp.fujitsu.com> wrote:
> If versioning is not necessary from your requirement, you can ignore
> timestamps (do not have to specify timestamp in API call).

Yes, it's actually recommended not to specify timestamps manually in API calls, particularly when inserting data, unless you absolutely need to. Setting them by hand can cause very confusing situations.

> Although HBase keeps three versions by default and it may be a bit
> wasteful for memory and disk, turning on compression for column

It's not wasteful if you never rewrite any cell.

> families can minimize the waste as much as you can ignore (is it
> true?).

It's true for disk storage. Compression doesn't help save memory because the data is always stored uncompressed in memory (at least in the current implementation; this may change in the future).

> If saving memory (=keep memtable as small as possible) is important,
> you can set the maximum number of versions to 1.

I don't think you'll be saving much memory anyway. As Ryan already pointed out, when you overwrite a cell, a new version is created, regardless of the maximum number of versions you allow. If you rewrite a cell 100 times in 5 seconds, you'll end up with 101 versions, but only the last version will be visible to you. You'll have to wait until a compaction "garbage collects" the 100 older, "unreachable" versions.

> The reason that the default is 3 is to rescue users from their
> mistakes.

You could run a benchmark comparing 1 vs 3 versions (both in speed and in space) and tell us what you find.

Also, you didn't mention the most important reason why there are versions and timestamps: they're *required* by the HBase implementation (same thing in Bigtable). They could well have been hidden as an implementation detail (as some RDBMSs do), but the choice was made to expose them in order to give users an extra feature. This way you can look at the past few versions of a cell if you want to.

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
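
To make the overwrite-then-compact behaviour described above concrete, here is a toy Python model of a single cell. It is only a sketch under stated assumptions: `ToyCell` and its methods are hypothetical names, the counter stands in for server-assigned timestamps, and none of this is HBase's actual code or API.

```python
# Toy model of one versioned cell. NOT HBase code; all names are hypothetical.
import itertools


class ToyCell:
    """Keeps every write as a (timestamp, value) pair until compact() runs."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._stored = []                 # all versions physically kept, oldest first
        self._clock = itertools.count(1)  # assumption: stand-in for server timestamps

    def put(self, value):
        # A write never modifies an existing version; it appends a new one.
        self._stored.append((next(self._clock), value))

    def get(self, versions=1):
        # Readers see at most max_versions versions, newest first.
        limit = min(versions, self.max_versions)
        return [value for _, value in self._stored[::-1][:limit]]

    def stored_count(self):
        # Number of versions still occupying space.
        return len(self._stored)

    def compact(self):
        # "Garbage collect" everything older than the newest max_versions.
        self._stored = self._stored[-self.max_versions:]


cell = ToyCell(max_versions=1)
cell.put("v0")                      # initial write
for i in range(1, 101):             # 100 rewrites of the same cell
    cell.put(f"v{i}")

print(cell.stored_count())          # 101 versions are held before compaction
print(cell.get())                   # ['v100'], only the latest is visible
cell.compact()
print(cell.stored_count())          # 1 version remains after compaction
```

In this model, lowering `max_versions` changes what a reader can see and what survives `compact()`, but not how many versions pile up between compactions, which is the point made above about memory savings.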