[ https://issues.apache.org/jira/browse/PHOENIX-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218205#comment-15218205 ]
Samarth Jain commented on PHOENIX-2758: --------------------------------------- +1, looks good. > Ordered GROUP BY not occurring with leading PK equality expression > ------------------------------------------------------------------ > > Key: PHOENIX-2758 > URL: https://issues.apache.org/jira/browse/PHOENIX-2758 > Project: Phoenix > Issue Type: Bug > Reporter: James Taylor > Attachments: PHOENIX-2758.patch, PHOENIX-2758_wip.patch > > > The following query: > {code} > SELECT SUM(DUP_COUNT) FROM ( > SELECT COUNT(1) As DUP_COUNT > FROM DATACLOUD.DATA_ASSESSMENT_RECORD > WHERE JOURNEY_ID='07ixx000000004J' AND > DATASOURCE=0 AND MATCH_STATUS <= 1 and > ORGANIZATION_ID='07ixx000000004J' > GROUP BY MATCH_STATUS, EXTERNAL_DATASOURCE_KEY > HAVING COUNT(1) > 1); > {code} > Should use an ORDERED DISTINCT, but doesn't. > This is the DDL: > {code} > CREATE TABLE DATACLOUD.DATA_ASSESSMENT_RECORD (ORGANIZATION_ID char(15) not > null, > JOURNEY_ID char(15) not null, > DATASOURCE SMALLINT not null, > MATCH_STATUS TINYINT not null, > EXTERNAL_DATASOURCE_KEY varchar(30), > ENTITY_ID char(15) not null, > CONSTRAINT PK PRIMARY KEY ( > ORGANIZATION_ID, > JOURNEY_ID, > DATASOURCE, > MATCH_STATUS, > EXTERNAL_DATASOURCE_KEY, > ENTITY_ID)) > {code} > The optimization does occur if you include the PK columns in the GROUP BY > like this: > {code} > SELECT SUM(DUP_COUNT) FROM ( > SELECT COUNT(1) As DUP_COUNT > FROM DATACLOUD.DATA_ASSESSMENT_RECORD > WHERE JOURNEY_ID='07ixx000000004J' AND > DATASOURCE=0 AND MATCH_STATUS <= 1 and > ORGANIZATION_ID='07ixx000000004J' > GROUP BY ORGANIZATION_ID, JOURNEY_ID, DATASOURCE, MATCH_STATUS, > EXTERNAL_DATASOURCE_KEY > HAVING COUNT(1) > 1); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)