Lars Hofhansl created PHOENIX-3156:
--------------------------------------

             Summary: Bug in DistinctPrefixFilter
                 Key: PHOENIX-3156
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3156
             Project: Phoenix
          Issue Type: Bug
            Reporter: Lars Hofhansl
            Priority: Blocker
             Fix For: 4.8.0


There's a corner case I found where a DISTINCT or GROUP BY query along a 
prefix of a compound row key might return incorrect results.

The filter relies on seeing the _0 column absolutely last, and on not seeing all 
Cells that should be filtered. That breaks in two scenarios:
# we have a table with key (key1, key2, key3) and columns (c1, c2). Now 
construct a WHERE <a clause that always matches c1>, <a clause that filters by 
c2> GROUP BY key1, key2. Now the filter would mis-skip when it sees the Cell 
for c1.
# we force lower-case column names. In that case those would sort after the _0 
column. The DistinctPrefixFilter would see the _0 column first and skip.
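
To illustrate the ordering behind scenario #2, here is a minimal sketch. It 
assumes column qualifiers within a row are ordered by unsigned lexicographic 
byte comparison, and that unquoted column names end up upper-cased. The class 
QualifierOrder and its methods are hypothetical helpers, not Phoenix or HBase 
code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QualifierOrder {
    // Unsigned lexicographic byte comparison. Assumption: this mirrors how
    // column qualifiers are ordered within a single row.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns the qualifiers in the order a filter would encounter them,
    // under the ordering assumption above.
    static List<String> sortQualifiers(String... names) {
        List<String> out = new ArrayList<>(Arrays.asList(names));
        out.sort((x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return out;
    }

    public static void main(String[] args) {
        // Upper-case names: 'C' (0x43) sorts before '_' (0x5F), so _0 is last.
        System.out.println(sortQualifiers("_0", "C1", "C2")); // [C1, C2, _0]
        // Lower-case names: 'c' (0x63) sorts after '_' (0x5F), so _0 is first.
        System.out.println(sortQualifiers("_0", "c1", "c2")); // [_0, c1, c2]
    }
}
```

With upper-case names the _0 Cell arrives last, which is what the filter 
expects. With lower-case names it arrives first, so a skip triggered on _0 
would happen before c1 and c2 are seen.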

I can fix #1 (by ignoring all Cells other than the _0 one). I do not know how 
to fix case #2.

I think this is a blocker and we may have to undo the entire DISTINCT and GROUP 
BY prefix optimization.

[~an...@apache.org], [~giacomotaylor], [~samarthjain].




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
