Nick Dimiduk created HBASE-15352:
------------------------------------

             Summary: FST BlockEncoder
                 Key: HBASE-15352
                 URL: https://issues.apache.org/jira/browse/HBASE-15352
             Project: HBase
          Issue Type: New Feature
          Components: regionserver
            Reporter: Nick Dimiduk
             Fix For: 2.0.0, 1.4.0


We could improve on the existing [PREFIX_TREE 
block|http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/codec/prefixtree/package-summary.html]
 encoder by upgrading the persistent data structure from a trie to a finite 
state transducer (FST). This would theoretically allow us to reuse bytes not 
just for rowkey prefixes, but for infixes and suffixes as well. My reading of 
the literature suggests we may also be able to encode values, further reducing 
storage size when values are repeated (e.g., a "customer id" field with very 
low cardinality, which probably happens a lot in our denormalized world). There's a 
really nice [blog post|http://blog.burntsushi.net/transducers/] about this data 
structure, and apparently our siblings in Lucene make heavy use of [their 
implementation|http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/util/fst/package-summary.html#package_description].
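
To make the prefix-versus-suffix distinction concrete, here is a minimal sketch in plain Java. It is neither HBase nor Lucene code, and every name in it (SuffixSharingSketch, buildTrie, countMinimizedStates, the sample keys) is hypothetical. It builds a trie over a few rowkey-like strings and then merges equivalent subtrees, which is the state-sharing step behind a minimal acyclic automaton. A real FST would additionally carry outputs on its arcs; that part is left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of HBase or Lucene.
class SuffixSharingSketch {

    // One trie node: sorted outgoing edges plus an end-of-key flag.
    static final class Node {
        final TreeMap<Character, Node> edges = new TreeMap<>();
        boolean terminal;
    }

    // Build a plain trie, which shares common prefixes only.
    static Node buildTrie(List<String> keys) {
        Node root = new Node();
        for (String key : keys) {
            Node cur = root;
            for (char c : key.toCharArray()) {
                cur = cur.edges.computeIfAbsent(c, k -> new Node());
            }
            cur.terminal = true;
        }
        return root;
    }

    // Count every node in the trie, root included.
    static int countTrieNodes(Node node) {
        int count = 1;
        for (Node child : node.edges.values()) {
            count += countTrieNodes(child);
        }
        return count;
    }

    // Return the id of the node's equivalence class. Two nodes are equivalent
    // when they agree on the terminal flag and have the same labelled edges
    // leading to equivalent children, so identical suffixes collapse together.
    static int canonicalId(Node node, Map<String, Integer> registry) {
        StringBuilder sig = new StringBuilder(node.terminal ? "T" : "N");
        for (Map.Entry<Character, Node> e : node.edges.entrySet()) {
            sig.append('|').append((int) e.getKey().charValue())
               .append(':').append(canonicalId(e.getValue(), registry));
        }
        String key = sig.toString();
        Integer id = registry.get(key);
        if (id == null) {
            id = registry.size();
            registry.put(key, id);
        }
        return id;
    }

    // Number of states left after merging equivalent subtrees.
    static int countMinimizedStates(Node root) {
        Map<String, Integer> registry = new HashMap<>();
        canonicalId(root, registry);
        return registry.size();
    }

    public static void main(String[] args) {
        // Assumed sample rowkeys: distinct ids followed by a shared suffix.
        List<String> keys = Arrays.asList("u1:cart", "u2:cart", "u3:cart");
        Node root = buildTrie(keys);
        System.out.println("trie nodes:       " + countTrieNodes(root));
        System.out.println("minimized states: " + countMinimizedStates(root));
    }
}
```

For these three keys the trie stores the ":cart" suffix three times, while the minimized form stores it once. A block encoder would still have to serialize the structure and attach cell data, which this sketch does not attempt.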



