Excellent summary Matt.  Some notes in the below.

On Sun, Sep 18, 2011 at 6:43 PM, Matt Corgan <mcor...@hotpads.com> wrote:
> ... All of this is relatively easy for the data
> and index blocks because they're immutable.  Doing it for the memstore is
> another story...
>

We'd need another data structure completely, wouldn't we?  Have you
given it any thought?


> Here's some contrived data to illustrate.  Row key is a (reversed
> domain) url.  Column family is called "familyName".  ColQualifier is a
> browser type.  Value is the number of views from that browser type.  Here's
> the current representation <http://pastebin.com/7ks8kzJ2>, the prefix trie's
> toString output <http://pastebin.com/cL4AkCPC> for the row keys, and removing
> the whitespace <http://pastebin.com/4qiSXNh9> from the toString output.

Thats a big diff.

> The second problem is the lack of random access to cell offsets within the
> data block.  (I'm not 100% sure on this one, so please correct me if i'm
> wrong).  I noticed how bad this problem is when i was storing historical
> event logs with 8 byte keys and small values (so ~30b per cell).  I had to
> increase block size to 256KB because the block indexes were too big to fit
> in memory.  Then I needed fast random access to these events.  The problem
> is that there are ~10,000 cells per block, so without random lookups inside
> the block, it's seeking through ~5,000 keys for the average lookup.  That's
> a crazy amount of overhead to retrieve a single cell.  Probably the quickest
> solution to this problem is to store the offset of each key in a list of
> integers at the end of the block so that it's possible to do a binary search
> inside the block.  That would reduce it to ~14 avg memory accesses to find
> the cell.
>

I'd imagine that we'd not want the index always, just when keys per
block went over some size.

hfilev2 should help here because we don't load all indices all the time.

> But, the prefix trie should actually be way faster than a binary search
> because you don't have to keep comparing the beginning of the key to the one
> you're looking for.  I'll save the details for another email or for code
> comments.  In general, with a smarter block/index format we could be much
> more efficient than the current method of comparing full key byte[] over and
> over again.
>
> Same random access problem also applies to the block index i think (correct
> me if i'm wrong here too).
>

You should stick the above in the issue Matt, or at least refer to
this mail in the issue; its great stuff.

Thanks,
St.Ack

Reply via email to