Re: Lucene's FST for the block index

Jason Rutherglen Thu, 02 Jun 2011 09:32:39 -0700

> The FST is more compact [than] keeping every Nth row id in RAM.

> It would be nice to support pluggable block index implementations

Maybe we should try to support this before HFile v2, which instead
uses a tree structure to lay out the blocks?  Once that is in place, a
pluggable block index becomes more difficult.  I think HFile v2 lists
the memory usage of the bloom filter and the block index as the primary
motivations for its creation.  There has also been work on turning the
FST into a bloom-filter-like data structure.

> Perhaps we do this in the scope of HFile "v2"? 
> https://issues.apache.org/jira/browse/HBASE-3857

I'm not sure.

Another possible use of the FST is to simply store all row ids
(compactly) in it and lay it out on disk, in which case a separate
block index should not be required.  We could test and benchmark these
scenarios with a pluggable HFile system.
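
A rough sketch of what such a plug-in point might look like, in Java.
BlockIndex and SparseBlockIndex are names made up for illustration;
they are not existing HBase classes.  The baseline keeps the first row
id of each block in a sorted in-memory map and assumes those row ids
are unique and given in ascending order.  An FST-backed index could
presumably implement the same interface and be benchmarked against it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plug-in point; not an actual HBase interface.
interface BlockIndex {
    // Returns the ordinal of the block that may contain rowId,
    // or -1 if rowId sorts before the first block.
    int blockFor(String rowId);
}

// Baseline implementation: one in-memory entry per block,
// keyed by the first row id stored in that block.
final class SparseBlockIndex implements BlockIndex {
    private final TreeMap<String, Integer> firstRowIdToBlock = new TreeMap<>();

    // firstRowIds is assumed to be sorted ascending with no duplicates.
    SparseBlockIndex(List<String> firstRowIds) {
        for (int i = 0; i < firstRowIds.size(); i++) {
            firstRowIdToBlock.put(firstRowIds.get(i), i);
        }
    }

    @Override
    public int blockFor(String rowId) {
        // The greatest first-row-id <= rowId identifies the candidate block.
        Map.Entry<String, Integer> e = firstRowIdToBlock.floorEntry(rowId);
        return e == null ? -1 : e.getValue();
    }
}
```

A reader would then only depend on BlockIndex.blockFor(), so swapping
implementations for a benchmark would not touch the lookup path.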

On Thu, Jun 2, 2011 at 9:09 AM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> Lucene has a compact FST (Finite State Transducer) that's used for the
> sorted terms index.  I think this is the same type of functionality as
> the HBase block index, i.e., a sorted index of row ids?  The FST is more
> compact than keeping every Nth row id in RAM.  Does the HFile format
> allow pluggable block index implementations?
>
> I posted this to Jira; however, that's probably not the best place.
>
