
ASF GitHub Bot updated HBASE-30174:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add start offset option to ROWPREFIX_FIXED_LENGTH bloom filter
> --------------------------------------------------------------
>
>                 Key: HBASE-30174
>                 URL: https://issues.apache.org/jira/browse/HBASE-30174
>             Project: HBase
>          Issue Type: Task
>          Components: master
>            Reporter: JinHyuk Kim
>            Assignee: JinHyuk Kim
>            Priority: Minor
>              Labels: pull-request-available
>
> h2. Problem
> The {{ROWPREFIX_FIXED_LENGTH}} bloom filter always hashes a prefix taken from
> the start of the row key. This works well in many cases, but in some schemas
> the leading bytes hold low-cardinality or repetitive data, such as a salt or
> bucket id.
> For example, row keys like:
> {code:java}
> {salt}:{id1}:{id2}
> {code}
> may benefit more from building the bloom filter on {{id1}} than on the leading
> salt bytes.
> In such cases, hashing from offset 0 makes the bloom filter less effective,
> because part of the fixed-length prefix is spent on bytes that do little to
> distinguish HFiles.
> h2. Suggestion
> Introduce a new optional configuration:
> {code:java}
> RowPrefixBloomFilter.prefix_start_offset
> {code}
> This allows the bloom filter to skip a configurable number of leading bytes
> before extracting the fixed-length prefix used for hashing. The default is 0,
> which preserves the current behavior.
> The goal is to support row key layouts where the meaningful lookup prefix does
> not start at byte {{0}}.
> h2. Usage
> {code:java}
> create 'test', {
>   NAME => 'cf',
>   BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH',
>   CONFIGURATION => {
>     'RowPrefixBloomFilter.prefix_length' => '8',
>     'RowPrefixBloomFilter.prefix_start_offset' => '4'
>   }
> }
> {code}
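The offset-then-length extraction described under Suggestion can be sketched in Java as follows. This is a minimal illustration, not the code from the pull request: the class {{RowPrefixSketch}} and its methods are hypothetical, and clamping row keys shorter than {{prefix_start_offset + prefix_length}} (instead of rejecting them) is an assumption about how that edge case might be handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowPrefixSketch {
  // Hypothetical helper, not an HBase API: returns the bytes that would be
  // hashed into the bloom filter for a given row key.
  static byte[] bloomKey(byte[] row, int startOffset, int prefixLength) {
    if (startOffset < 0 || prefixLength <= 0) {
      throw new IllegalArgumentException("startOffset must be >= 0 and prefixLength > 0");
    }
    // Assumption: rows shorter than startOffset + prefixLength are clamped
    // to whatever bytes remain instead of being rejected.
    int from = Math.min(startOffset, row.length);
    // Widen to long so a very large prefixLength cannot overflow int.
    int to = (int) Math.min((long) from + prefixLength, row.length);
    return Arrays.copyOfRange(row, from, to);
  }

  static String ascii(byte[] bytes) {
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] row = "a1f:user1234:order99".getBytes(StandardCharsets.US_ASCII);
    // Offset 0 (current behavior): the salt "a1f:" fills half of the 8-byte prefix.
    System.out.println(ascii(bloomKey(row, 0, 8))); // prints "a1f:user"
    // Offset 4 skips the salt and separator, so all 8 bytes come from id1.
    System.out.println(ascii(bloomKey(row, 4, 8))); // prints "user1234"
  }
}
```

With the values from the Usage example (offset 4, length 8), the key {{a1f:user1234:order99}} contributes {{user1234}} to the bloom filter instead of {{a1f:user}}, so the salt no longer occupies half of the hashed prefix.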



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
