Daniel Wong created PHOENIX-5280:
------------------------------------

             Summary: Provide Improvements to Scan on Composite PK where 
Leading Edge not fully Specified but the edge next columns are in most leading 
keys
                 Key: PHOENIX-5280
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5280
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Daniel Wong


Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizaitonId then additoin skipscan for the trailing key.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|


First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
proessing that block (in our case 0 rows) we would run to the row at or greater 
than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We 
would then run to the row of 'org4','123'.  Essentailly 1 step to find the 
orgId and then a scan of all the departments for that value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to