[ https://issues.apache.org/jira/browse/HBASE-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439311#comment-13439311 ]
Anil Gupta commented on HBASE-6618:
-----------------------------------

Hi Alex,

I agree with your idea of a range-based fuzzy filter. However, I would like to take a phased approach in developing this.

In your proposal, the user can provide multiple fuzzy ranges in a single scan, i.e.:
<any 4 bytes><any 6 bytes value between "_0001" and "_0099"><any 3 bytes><any 4 bytes value between "_001" and "_099">

Instead of the above, IMO let's first try to make a filter for
"<any 4 bytes><any 6 bytes value between "_0001" and "_0099"><any 3 bytes>"
or
"<any 4 bytes><any 6 bytes value between "_0001" and "_0099">".

Once we have developed this, we can enhance it to support multiple fuzzy ranges. This is just my thought on how to approach the development. Let me know your opinion.

From this week, at work I had to shift focus from HBase to Hive and HCatalog for another POC. So, I'll be squeezing time for this JIRA out of my work schedule. I'll start by looking into the current implementation of FuzzyRowFilter to get an idea of how it works.

Thanks,
Anil Gupta
Software Engineer II, Intuit, Inc

> Implement FuzzyRowFilter with ranges support
> --------------------------------------------
>
>                 Key: HBASE-6618
>                 URL: https://issues.apache.org/jira/browse/HBASE-6618
>             Project: HBase
>          Issue Type: New Feature
>          Components: filters
>            Reporter: Alex Baranau
>            Priority: Minor
>
> Apart from the current ability to specify a fuzzy row filter, e.g. for the <userId_actionId> format as ????_0004 (where 0004 is the actionId), it would be great to also have the ability to specify a "fuzzy range", e.g. ????_0004, ..., ????_0099.
> See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65
> Note: currently it is possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter, but when the range is big (contains thousands of values) this is not efficient.
> The filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from a regex row filter).
> While such functionality may seem like a proper fit for a custom filter (i.e. one not included in the standard filter set), it looks like the filter may be very reusable. We may judge based on the implementation that will hopefully be added.
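
For illustration, here is a minimal sketch of the first phase discussed in the comment: a single fuzzy range at a fixed offset in the row key, together with the fast-forward step mentioned in the issue description. It is plain Java, not the HBase filter API. The class name, the method names, and the chosen layout (4 arbitrary bytes followed by a 5-byte value between "_0001" and "_0099", as in the ????_0004 form above) are assumptions made for this example only. A real filter would have to plug equivalent logic into HBase's row check and seek-hint mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch (not HBase API): one fuzzy range at a fixed offset in the row key. */
final class FuzzyRangeSketch {

    /** Unsigned lexicographic comparison of a[aOff, aOff + len) with b[0, len). */
    static int compareSegment(byte[] a, int aOff, byte[] b, int len) {
        for (int i = 0; i < len; i++) {
            int diff = (a[aOff + i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    /** True when the segment at offset lies in [lower, upper]; all other bytes are ignored. */
    static boolean matches(byte[] row, int offset, byte[] lower, byte[] upper) {
        int len = lower.length; // assumption: lower and upper have the same length
        if (row.length < offset + len) {
            return false; // row too short to contain the ranged segment
        }
        return compareSegment(row, offset, lower, len) >= 0
            && compareSegment(row, offset, upper, len) <= 0;
    }

    /**
     * Smallest key >= row that can still match, or null when no such key exists.
     * Assumes row.length >= offset + lower.length.
     */
    static byte[] nextCandidate(byte[] row, int offset, byte[] lower, byte[] upper) {
        int len = lower.length;
        if (compareSegment(row, offset, upper, len) > 0) {
            // Segment is past the range: increment the fuzzy prefix, restart at lower.
            byte[] next = Arrays.copyOf(row, offset + len);
            int i = offset - 1;
            while (i >= 0 && next[i] == (byte) 0xFF) {
                next[i--] = 0; // carry into the previous prefix byte
            }
            if (i < 0) {
                return null; // prefix was all 0xFF: nothing left to scan
            }
            next[i]++;
            System.arraycopy(lower, 0, next, offset, len);
            return next;
        }
        if (compareSegment(row, offset, lower, len) < 0) {
            // Segment is before the range: keep the prefix, jump to lower.
            byte[] next = Arrays.copyOf(row, offset + len);
            System.arraycopy(lower, 0, next, offset, len);
            return next;
        }
        return row; // segment already in range
    }

    public static void main(String[] args) {
        byte[] lower = "_0001".getBytes(StandardCharsets.US_ASCII);
        byte[] upper = "_0099".getBytes(StandardCharsets.US_ASCII);
        byte[] inRange = "user_0042".getBytes(StandardCharsets.US_ASCII);
        byte[] pastRange = "user_0100xyz".getBytes(StandardCharsets.US_ASCII);

        System.out.println(matches(inRange, 4, lower, upper));   // true
        System.out.println(matches(pastRange, 4, lower, upper)); // false
        byte[] hint = nextCandidate(pastRange, 4, lower, upper);
        System.out.println(new String(hint, StandardCharsets.US_ASCII)); // uses_0001
    }
}
```

The point of `nextCandidate` is that a scan can skip straight to the returned key instead of testing every row in between, which is the property that separates this approach from listing thousands of individual fuzzy rules. Supporting several ranges in one key, as in the original proposal, would need this step generalized across all ranged segments.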