[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964558#comment-13964558 ]

Alex Baranau commented on HBASE-6618:
-------------------------------------

Right. I mean, it still won't be human-friendly, though... I was thinking of something more like this:

{code}
new FuzzyRowFilter.Builder()
    .any(<length>) // meaning "????" for 4
    .range(<range_start_bytes>, <range_end_bytes>) // builder will check that both have the same length
    .any(<length>)
    .fixed(<couple_fixed_bytes>)
    .build();
{code}

We may also add overloads that accept strings, if that makes sense, so that e.g. "???(11-88)??AAA" could be built with:

{code}
new FuzzyRowFilter.Builder()
    .any(3)
    .range("11", "88")
    .any(2)
    .fixed("AAA")
    .build();
{code}

Thoughts?

> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Alex Baranau
>            Assignee: Alex Baranau
>            Priority: Minor
>             Fix For: 0.99.0
>
>         Attachments: HBASE-6618-algo-desc-bits.png, HBASE-6618-algo.patch,
> HBASE-6618.patch, HBASE-6618_2.path, HBASE-6618_3.path, HBASE-6618_4.patch,
> HBASE-6618_5.patch
>
>
> Apart from the current ability to specify a fuzzy row filter, e.g. for the
> <userId_actionId> format as ????_0004 (where 0004 is the actionId), it would
> be great to also be able to specify a "fuzzy range", e.g. ????_0004,
> ..., ????_0099.
> See the initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: it is currently possible to provide multiple fuzzy row rules to the
> existing FuzzyRowFilter, but when the range is big (contains thousands of
> values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is
> what distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e.
> one not included in the standard filter set), it looks like the filter may be
> very reusable. We may judge based on the implementation that will hopefully
> be added.

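
For illustration, the builder proposed in the comment above could be sketched roughly as follows. This is only a sketch under assumptions: FuzzyPattern, its nested Builder, describe() and matches() are hypothetical names that are not part of HBase, ranges are compared as unsigned bytes in lexicographic order, and the pattern is matched against the leading bytes of the row key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: FuzzyPattern and its Builder are illustrative names, not HBase classes.
final class FuzzyPattern {
    // One pattern segment: lo == null means "any bytes"; lo equal to hi means fixed bytes.
    private static final class Segment {
        final byte[] lo;
        final byte[] hi;
        final int length;
        Segment(byte[] lo, byte[] hi, int length) { this.lo = lo; this.hi = hi; this.length = length; }
    }

    private final List<Segment> segments;

    private FuzzyPattern(List<Segment> segments) { this.segments = segments; }

    static final class Builder {
        private final List<Segment> segments = new ArrayList<>();

        public Builder any(int length) {
            if (length <= 0) throw new IllegalArgumentException("length must be positive");
            segments.add(new Segment(null, null, length));
            return this;
        }

        public Builder range(byte[] start, byte[] end) {
            // The check mentioned in the comment: both bounds must have the same length.
            if (start.length != end.length) throw new IllegalArgumentException("range bounds must have the same length");
            if (compare(start, end) > 0) throw new IllegalArgumentException("range start must not exceed range end");
            segments.add(new Segment(start.clone(), end.clone(), start.length));
            return this;
        }

        public Builder range(String start, String end) { return range(bytes(start), bytes(end)); }
        public Builder fixed(byte[] value) { return range(value, value); }
        public Builder fixed(String value) { return fixed(bytes(value)); }
        public FuzzyPattern build() { return new FuzzyPattern(new ArrayList<>(segments)); }
    }

    // Renders the pattern in the "???(11-88)??AAA" notation used in the comment.
    public String describe() {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            if (s.lo == null) {
                for (int i = 0; i < s.length; i++) sb.append('?');
            } else if (Arrays.equals(s.lo, s.hi)) {
                sb.append(text(s.lo));
            } else {
                sb.append('(').append(text(s.lo)).append('-').append(text(s.hi)).append(')');
            }
        }
        return sb.toString();
    }

    // Naive check of the leading bytes of a row key against the pattern (no fast-forwarding).
    public boolean matches(byte[] row) {
        int offset = 0;
        for (Segment s : segments) {
            if (offset + s.length > row.length) return false;
            if (s.lo != null) {
                byte[] part = Arrays.copyOfRange(row, offset, offset + s.length);
                if (compare(part, s.lo) < 0 || compare(part, s.hi) > 0) return false;
            }
            offset += s.length;
        }
        return true;
    }

    // Unsigned lexicographic comparison of two equal-length byte arrays.
    private static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return 0;
    }

    private static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    private static String text(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
}
```

With this sketch, new FuzzyPattern.Builder().any(3).range("11", "88").any(2).fixed("AAA").build() describes itself as "???(11-88)??AAA". The matches() method only tests a single key; it does not compute the next-row hint that the issue asks for, so the efficient fast-forwarding during a scan would still have to come from the actual filter implementation.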
-- This message was sent by Atlassian JIRA (v6.2#6252)