[ https://issues.apache.org/jira/browse/PHOENIX-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644889#comment-15644889 ]
James Taylor edited comment on PHOENIX-3438 at 11/9/16 7:30 PM: ---------------------------------------------------------------- Make sure to push down the limit to the aggregation: {code} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id IN ('org1','org3') ORDER BY score DESC LIMIT 3 {code} was (Author: jamestaylor): Make sure to push down the limit to the aggregation: {code} SELECT DISTINCT entity_id, score FROM test.test WHERE organization_id IN ('org1','org3') ORDER BY max_score DESC LIMIT 3 {code} > Phoenix should be able to optimize LIMIT query with IN clause > ------------------------------------------------------------- > > Key: PHOENIX-3438 > URL: https://issues.apache.org/jira/browse/PHOENIX-3438 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.8.0 > Reporter: Joel Palmert > Assignee: James Taylor > Priority: Critical > Fix For: 4.8.3 > > > This is taken from a real production issue we're having but has been > simplified here. Steps to repro: > CREATE TABLE IF NOT EXISTS TEST.TEST ( > ORGANIZATION_ID CHAR(15) NOT NULL, > SCORE INTEGER NOT NULL, > ENTITY_ID CHAR(15) > CONSTRAINT TEST_PK PRIMARY KEY ( > ORGANIZATION_ID, > SCORE DESC > ) > ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200 > EXPLAIN > SELECT entity_id, MAX(score) max_score > FROM test.test > WHERE organization_id IN ('org1','org3') > GROUP BY entity_id > ORDER BY max_score DESC > LIMIT 1 > CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1 > '] - ['org2 '] > SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID] > CLIENT MERGE SORT > CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC] > Notice that Phoenix gets and sends all the rows for the two organizations to > the client. Since the ORDER BY is in PK order I would expect it to just > 1. Get the first row for each organization (in index order) rather than > scanning all the rows. > 2. Merge them (only 2 rows in this case) > 3. Limit 1 > Consider the following query and plan for what I would expect to see: > EXPLAIN > SELECT entity_id, MAX(score) max_score > FROM( > SELECT entity_id, score > FROM( > SELECT entity_id, score > FROM test.test > WHERE organization_id = 'org1' > ORDER BY score DESC > LIMIT 1) > UNION ALL > SELECT entity_id, score > FROM( > SELECT entity_id, score > FROM test.test > WHERE organization_id = 'org2' > ORDER BY score DESC > LIMIT 1)) > GROUP BY entity_id > ORDER BY max_score DESC > LIMIT 1 > UNION ALL OVER 2 QUERIES > CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1 '] > SERVER 1 ROW LIMIT > CLIENT 1 ROW LIMIT > CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2 '] > SERVER 1 ROW LIMIT > CLIENT 1 ROW LIMIT > CLIENT SORTED BY [ENTITY_ID] > CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID] > CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC] -- This message was sent by Atlassian JIRA (v6.3.4#6332)