> The FST is more compact [than] keeping every Nth row id in RAM.
> It would be nice to support pluggable block index implementations.

Maybe we should try to support this before HFile v2, which instead uses
a tree structure to lay out the blocks? With that layout, a pluggable
block index becomes more difficult. I think HFile v2 lists the memory
usage of the bloom filter and the block index as primary motivations for
its creation. There has also been work to try to turn the FST into a
bloom-filter-like data structure.

> Perhaps we do this in the scope of HFile "v2"?
> https://issues.apache.org/jira/browse/HBASE-3857

I'm not sure. The other possible use of the FST is to simply store all
row ids (compactly) in it and lay it out on disk, in which case a
separate block index should not be required. We could test and benchmark
these scenarios with a pluggable HFile system.

On Thu, Jun 2, 2011 at 9:09 AM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> Lucene has a compact FST (Finite State Transducer) that's used for the
> sorted terms index. I think this is the same type of functionality as
> the HBase block index, e.g., a sorted index of row ids? The FST is more
> compact [than] keeping every Nth row id in RAM. Does the HFile format
> allow pluggable block index implementations?
>
> I posted this to Jira issues, however that's probably not the best place.
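
For reference, here is a rough sketch of the baseline being compared against: keeping every Nth row id in RAM and binary-searching it to find the block that may hold a key. This is only an illustration of the idea, not HBase's actual block index code; the function names and the `every_n` parameter are made up for the example.

```python
from bisect import bisect_right

def build_sparse_index(sorted_row_ids, every_n):
    """Keep only every Nth row id; entry i is the first row id of block i."""
    return [sorted_row_ids[i] for i in range(0, len(sorted_row_ids), every_n)]

def find_block(sparse_index, row_id):
    """Return the number of the block that could contain row_id.

    Returns -1 when row_id sorts before the first indexed row id.
    """
    return bisect_right(sparse_index, row_id) - 1

# Hypothetical data: ten sorted row ids, four rows per block.
row_ids = [f"row{i:04d}" for i in range(10)]
index = build_sparse_index(row_ids, every_n=4)
block = find_block(index, "row0005")
```

The in-memory cost here grows with the number of blocks times the key length, which is the overhead a shared-prefix structure like an FST would be trying to reduce.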