[ 
https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393389#comment-15393389
 ] 

Duo Zhang commented on HBASE-16225:
-----------------------------------

{quote}
You mean delete markers are now passed to filters (The per cell filter)?
{quote}
In the old SQM implementation, if KeepDeletedCells is TRUE, we do not track 
delete markers. Instead, we use the column tracker to check whether there are 
already enough versions, in which case we can drop the delete marker. On this 
code path, the delete marker is passed to the filter. Of course, 
KeepDeletedCells can only be TRUE during a compaction or a raw scan. Without a 
coprocessor hook there is no filter during compaction, and this is also why I 
said above that we should disable filters for raw scans.

{quote}
So this is for cases where there are special CPs written to deal with delete 
markers in compaction?
{quote}
I think most CPs do not need to deal with delete markers. Usually they only 
want to drop some cells during compaction? I mean that if a CP really wants to 
deal with delete markers, it had better implement a new scanner instead of 
using a filter, since the delete logic in HBase is really complicated. And for 
the normal CPs that only want to drop some cells during compaction, I suggest 
we add a new type of filter which is only used for compaction? This is safer 
and clearer.
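
To make the suggestion concrete, here is a rough sketch of what a compaction-only filter could look like. Everything in it is hypothetical: {{ToyCell}}, {{CompactionCellFilter}} and {{compact}} are illustration names, not existing HBase types, and letting delete markers bypass the filter is just one possible design, not something decided here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in HBase. */
final class CompactionFilterSketch {

    /** Toy cell: either a put or a delete marker, with a timestamp and a value. */
    static final class ToyCell {
        final long timestamp;
        final boolean deleteMarker;
        final String value;

        ToyCell(long timestamp, boolean deleteMarker, String value) {
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
            this.value = value;
        }
    }

    /** Hypothetical compaction-only filter: it only decides whether a put cell is kept. */
    interface CompactionCellFilter {
        boolean keep(ToyCell cell);
    }

    /**
     * Assumed design: delete markers are never shown to the filter, so the
     * filter author does not have to understand the delete logic.
     */
    static List<ToyCell> compact(List<ToyCell> cells, CompactionCellFilter filter) {
        List<ToyCell> out = new ArrayList<>();
        for (ToyCell cell : cells) {
            if (cell.deleteMarker || filter.keep(cell)) {
                out.add(cell);
            }
        }
        return out;
    }
}
```

A CP that only wants to drop some cells would then supply just the {{keep}} predicate, and anything that needs to reason about delete markers would still go through its own scanner.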

{quote}
What do you reckon? If we change the behaviour some existing use cases built 
with filters may break.
{quote}
But I think the use cases are not reliable?

For example, with max versions = 2 and 3 cells with timestamps T1 < T2 < T3:

For a normal scan, T3 and T2 are returned.
If your filter eats T3, then T2 and T1 are returned.

But this is not reliable. After a compaction (it does not need to be a major 
compaction), T1 will be gone forever and you cannot get it with your filter...
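
The example above can be played through with a toy model. This is only a sketch and not HBase code: timestamps are plain longs (3, 2, 1 standing in for T3, T2, T1), {{scan}} and {{compact}} are made-up helpers, and it assumes the filter runs before versions are counted and that the compaction itself runs without a filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the versions of one column; not HBase code. */
final class VersionFilterDemo {

    /** Scan: apply the filter first, then return at most maxVersions of the newest timestamps. */
    static List<Long> scan(List<Long> newestFirst, int maxVersions, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (out.size() == maxVersions) {
                break;
            }
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    /** Compaction without a filter: only the maxVersions newest cells survive. */
    static List<Long> compact(List<Long> newestFirst, int maxVersions) {
        int keep = Math.min(maxVersions, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> cells = List.of(3L, 2L, 1L);   // T3, T2, T1
        LongPredicate dropT3 = ts -> ts != 3L;    // the filter "eats" T3

        System.out.println(scan(cells, 2, ts -> true));  // [3, 2]
        System.out.println(scan(cells, 2, dropT3));      // [2, 1]

        List<Long> compacted = compact(cells, 2);        // T1 is dropped here
        System.out.println(scan(compacted, 2, dropT3));  // [2]
    }
}
```

The same filtered scan returns T2 and T1 before the compaction but only T2 after it, which is the unreliability I mean.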

Thanks.

> Refactor ScanQueryMatcher
> -------------------------
>
>                 Key: HBASE-16225
>                 URL: https://issues.apache.org/jira/browse/HBASE-16225
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>         Attachments: HBASE-16225-v1.patch, HBASE-16225-v2.patch, 
> HBASE-16225.patch
>
>
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. 
> I suggest that we abstract an interface and implement several subclasses 
> that separate the different logic into different implementations. For 
> example, the requirements of compaction and user scan are different, but 
> currently we also need to consider the logic of user scan even if we only 
> want to add logic for compaction. And at the very least, the raw scan does 
> not need a query matcher... we can implement a dummy query matcher for it.
> Suggestions are welcome. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
