[
https://issues.apache.org/jira/browse/PHOENIX-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Palmert updated PHOENIX-3438:
----------------------------------
Summary: Phoenix should be able to optimize LIMIT query with IN clause
(was: Phoenix should be able to optimize LIMUT query with IN clause)
> Phoenix should be able to optimize LIMIT query with IN clause
> -------------------------------------------------------------
>
> Key: PHOENIX-3438
> URL: https://issues.apache.org/jira/browse/PHOENIX-3438
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
>
> This is taken from a real production issue we're having but has been
> simplified here. Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> SCORE INTEGER NOT NULL,
> ENTITY_ID CHAR(15)
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> SCORE DESC
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=43200
> EXPLAIN
> SELECT entity_id, MAX(score) max_score
> FROM test.test
> WHERE organization_id IN ('org1','org3')
> GROUP BY entity_id
> ORDER BY max_score DESC
> LIMIT 1
> CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 KEYS OVER TEST.TEST ['org1
> '] - ['org2 ']
> SERVER AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
> CLIENT MERGE SORT
> CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]
> Notice that Phoenix gets and sends all the rows for the two organizations to
> the client. Since the ORDER BY is in PK order I would expect it to just
> 1. Get the first row for each organization (in index order) rather than
> scanning all the rows.
> 2. Merge them (only 2 rows in this case)
> 3. Limit 1
> Consider the following query and plan for what I would expect to see:
> EXPLAIN
> SELECT entity_id, MAX(score) max_score
> FROM(
> SELECT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> ORDER BY score DESC
> LIMIT 1)
> UNION ALL
> SELECT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> ORDER BY score DESC
> LIMIT 1))
> GROUP BY entity_id
> ORDER BY max_score DESC
> LIMIT 1
> UNION ALL OVER 2 QUERIES
> CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org1 ']
> SERVER 1 ROW LIMIT
> CLIENT 1 ROW LIMIT
> CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER TEST.TEST ['org2 ']
> SERVER 1 ROW LIMIT
> CLIENT 1 ROW LIMIT
> CLIENT SORTED BY [ENTITY_ID]
> CLIENT AGGREGATE INTO DISTINCT ROWS BY [ENTITY_ID]
> CLIENT TOP 1 ROW SORTED BY [MAX(SCORE) DESC]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)