[
https://issues.apache.org/jira/browse/PHOENIX-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410452#comment-15410452
]
James Taylor commented on PHOENIX-3156:
---------------------------------------
Thanks for tracking this down, [~lhofhansl]. Probably need a whiteboard to
fully understand the issue, but here are some ideas/comments:
- Meta comment - we need to get a release out. It's been 5 months. There's a
lot of good stuff in 4.8. How about a quick JIRA to disable usage of
DistinctPrefixFilter for all scenarios where we're not sure if it works while
we figure out the ultimate solution?
- We can know from the client what the sort order is for all columns involved
in the evaluation. Would the DistinctPrefixFilter work depending on the column
names and their sort order?
- For the transaction visibility filter, we had to wrap the nested filter
rather than use a FilterList (for reasons similar to the ones you're
describing), evaluating it ourselves based on the evaluation of the nested
filter. See TransactionVisibilityFilter. Would an approach like that help?
- For the 4.9 release, for immutable tables we'll have a single KeyValue per
row with all values encoded. In this case, does it simplify the issue?
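
To make the wrapping idea concrete, here is a toy sketch. It does not use the
HBase Filter API and is not the code of TransactionVisibilityFilter; the names
CellFilter, Decision, WrappingFilter and RecordingFilter are hypothetical
stand-ins. The point is only that a wrapper decides for itself when the nested
filter gets consulted, instead of leaving the combination to a generic list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class WrappingFilterSketch {
    // Simplified stand-in for a per-cell decision (hypothetical, not HBase's ReturnCode).
    enum Decision { INCLUDE, SKIP }

    // Hypothetical minimal filter interface; the real HBase Filter class is much larger.
    interface CellFilter {
        Decision filterCell(String qualifier);
    }

    // Wraps a nested filter and controls whether it is evaluated at all.
    static final class WrappingFilter implements CellFilter {
        private final Predicate<String> outerCheck;
        private final CellFilter nested;

        WrappingFilter(Predicate<String> outerCheck, CellFilter nested) {
            this.outerCheck = outerCheck;
            this.nested = nested;
        }

        @Override
        public Decision filterCell(String qualifier) {
            if (!outerCheck.test(qualifier)) {
                // The nested filter never sees cells rejected by the outer check.
                return Decision.SKIP;
            }
            return nested.filterCell(qualifier);
        }
    }

    // Nested filter that records which cells it was asked about.
    static final class RecordingFilter implements CellFilter {
        final List<String> seen = new ArrayList<>();

        @Override
        public Decision filterCell(String qualifier) {
            seen.add(qualifier);
            return Decision.INCLUDE;
        }
    }

    public static void main(String[] args) {
        RecordingFilter nested = new RecordingFilter();
        // Assumption for illustration: only the _0 cell is passed on to the nested filter.
        CellFilter wrapped = new WrappingFilter(q -> q.equals("_0"), nested);
        for (String q : new String[] {"C1", "C2", "_0"}) {
            System.out.println(q + " -> " + wrapped.filterCell(q));
        }
        System.out.println("nested filter saw: " + nested.seen);
    }
}
```

Whether something along these lines fits DistinctPrefixFilter depends on the
real filter contract (row-level callbacks, seek hints), which this sketch
leaves out.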
> Bug in DistinctPrefixFilter
> ---------------------------
>
> Key: PHOENIX-3156
> URL: https://issues.apache.org/jira/browse/PHOENIX-3156
> Project: Phoenix
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Blocker
> Fix For: 4.8.0
>
> Attachments: 3156-v2.txt, 3156.txt
>
>
> There's a corner case I found where a DISTINCT and GROUP BY query along a
> prefix of a compound row key might return incorrect results.
> The filter relies on seeing the _0 column absolutely last, and not seeing all
> Cells that should be filtered. That breaks in two scenarios:
> # we have a table with key (key1, key2, key3) and columns (c1 and c2). Now
> construct a WHERE <a clause that always matches c1>, <a clause that filters
> by c2> GROUP BY key1, key2. Now the filter would mis-skip when it sees the
> Cell for c1.
> # we force lower-case column names. In that case those would sort after the _0
> column. The DistinctPrefixFilter would see the _0 column first and skip.
> In both cases we are effectively changing the order in which the filters are
> applied. The DistinctPrefixFilter is no longer applied last for the row.
> I can fix #1 (by ignoring all Cells other than the _0 one). I do not know
> how to fix case #2.
> I think this is a blocker and we may have to undo the entire DISTINCT and
> GROUP BY prefix optimization.
> [[email protected]], [~giacomotaylor], [~samarthjain].
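
Scenario #2 in the description above comes down to byte ordering of column
qualifiers. The sketch below assumes qualifiers are compared as unsigned bytes,
which is how HBase orders them within a column family; the class and method
names are made up for illustration. The underscore (0x5F) sorts after ASCII
upper-case letters but before lower-case ones, so _0 comes last among
upper-case names and first among lower-case names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class QualifierOrder {
    // Lexicographic comparison treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns a sorted copy of the given qualifier names.
    static String[] sortQualifiers(String... names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted, (x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Upper-case names: _0 sorts last.
        System.out.println(Arrays.toString(sortQualifiers("_0", "C2", "C1")));
        // Lower-case names: _0 sorts first.
        System.out.println(Arrays.toString(sortQualifiers("c2", "_0", "c1")));
    }
}
```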