Daniel Wong created PHOENIX-5280: ------------------------------------ Summary: Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys Key: PHOENIX-5280 URL: https://issues.apache.org/jira/browse/PHOENIX-5280 Project: Phoenix Issue Type: Improvement Reporter: Daniel Wong
Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys Recently a user has had an issue where they have a composite pk with 2 columns say (organizationId varchar, departmentId varchar). They want to query all their data with a condition where department is fully qualified department. Example SELECT * FROM TABLE WHERE departmentId='123'. They also know that 95% of the organization leading edge contains the qualified trailing edge. However department = '123' is less than 5% of the total data in the table. Based on the explain plan today for this we would run a Round Robin Full Scan with a filter on departmentId='123'. While one possible approach to not run a full table scan is to build an index on department. Another approach could be to construct a new version of a skipscan like filter to control this scan. Essentially we could use 1 lookup to find the organizaitonId then additoin skipscan for the trailing key. For a given region assume the data looks like this. ||organizationId||departmentId|| |org1|100| |org4|100| |org4|101| |org4|123| |org5|100| |org5|123| First query the initial row in the region. We get 'org1','100'. From this we can construct the next rows of ['org1','123' - 'org1','123\x0'). After proessing that block (in our case 0 rows) we would run to the row at or greater than nextKey(current orgnaziationId),'123'. This would give us org4,101. We would then run to the row of 'org4','123'. Essentailly 1 step to find the orgId and then a scan of all the departments for that value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)