Lucene has a compact FST (Finite State Transducer) that it uses for its sorted terms index. I think this provides the same kind of functionality as the HBase block index, i.e., a sorted index of row ids? The FST is a more compact way to keep every Nth row id in RAM. Does the HFile format allow pluggable block index implementations?
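To make the comparison concrete, here is a minimal sketch of the plain approach I have in mind when I say "every Nth row id in RAM": a sorted array of sampled keys plus a binary search. This is not HBase's actual block index code and not Lucene's FST; the class name `SparseRowIndex`, the method `seekStart`, and the `interval` parameter are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: neither HBase's HFile block index nor Lucene's FST.
final class SparseRowIndex {
    private final String[] sampledKeys; // every Nth row id, still in sorted order
    private final int interval;         // N: distance between sampled row ids

    SparseRowIndex(List<String> sortedRowIds, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        int n = (sortedRowIds.size() + interval - 1) / interval; // ceiling division
        this.sampledKeys = new String[n];
        for (int i = 0; i < n; i++) {
            sampledKeys[i] = sortedRowIds.get(i * interval);
        }
    }

    // Returns the position in the full sorted list where a scan for rowId
    // should start, or -1 if rowId sorts before the first row id.
    int seekStart(String rowId) {
        int pos = Arrays.binarySearch(sampledKeys, rowId);
        if (pos < 0) {
            // Not sampled: binarySearch returns -(insertionPoint) - 1, and the
            // last sampled key <= rowId sits at insertionPoint - 1.
            pos = -pos - 2;
        }
        return pos < 0 ? -1 : pos * interval;
    }
}
```

Every sampled key is stored in full here, so shared prefixes between neighbouring row ids are repeated in memory. My understanding is that an FST shares that common structure between keys, which is where the space saving would come from.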
I also posted this to the Jira issues; however, that's probably not the best place for it.