[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Palmert updated PHOENIX-3451:
----------------------------------
    Summary: Secondary index and query using distinct: LIMIT doesn't return the 
first rows  (was: Secondary index brakes ordering when using distinct)

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-3451
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3451
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Joel Palmert
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     CONTAINER_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected out out would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> Further notice that the outputed 2 rows is not the first two rows if you run 
> the query without the limit.
> The rows being returned are the ones assigned to container1. It looks like 
> Phoenix is first getting the rows from the first container and when it finds 
> that to be enough it stops the scan. What it should be doing is getting 2 
> results for each container and then merge then and then limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to