[jira] [Updated] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

2019-05-14 Thread Daniel Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Wong updated PHOENIX-5280:
-
Description: 
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
 While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizationId then additional skipscan for the trailing key.  This 
could be triggered with a sql syntax hint or in the future data driven.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|

First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
proessing that block (in our case 0 rows) we would run to the row at or greater 
than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We 
would then run to the row of 'org4','123'.  Essentailly 1 step to find the 
orgId and then a scan of all the departments for that value.

  was:
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizaitonId then additoin skipscan for the trailing key.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|


First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
proessing that block (in our case 0 rows) we would run to the row at or greater 
than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We 
would then run to the row of 'org4','123'.  Essentailly 1 step to find the 
orgId and then a scan of all the departments for that value.


> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> --
>
> Key: PHOENIX-5280
> URL: https://issues.apache.org/jira/browse/PHOENIX-5280
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Daniel Wong
>Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2 
> columns say (organizationId varchar, departmentId varchar).  They want to 
> query all their data with a condition where department is fully qualified 
> department.  Example SELECT * FROM TABLE WHERE  departmentId='123'.  They 
> also know that 95% of the organization leading edge contains the qualified 
> trailing edge.  However department = '123' is less than 5% of the total data 
> in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan 
> with a filter on departmentId='123'.
>  While one possible approach to not run a full table scan is to build an 
> index on department. Another approach could be to construct a new version of 
> a skipscan like 

[jira] [Updated] (PHOENIX-5280) Provide Improvements to Scan on Composite PK where Leading Edge not fully Specified but the edge next columns are in most leading keys

2019-05-14 Thread Daniel Wong (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Wong updated PHOENIX-5280:
-
Description: 
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
 While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizationId then additional skipscan for the trailing key.  This 
could be triggered with a sql syntax hint or in the future data driven.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|

First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
processing that block (in our case 0 rows) we would run to the row at or 
greater than  nextKey(current orgnaziationId),'123'.  This would give us 
org4,101.  We would then run to the row of 'org4','123'.  Essentially 1 step to 
find the orgId and then a scan of all the departments for that value.

  was:
Provide Improvements to Scan on Composite PK where Leading Edge not fully 
Specified but the edge next columns are in most leading keys

Recently a user has had an issue where they have a composite pk with 2 columns 
say (organizationId varchar, departmentId varchar).  They want to query all 
their data with a condition where department is fully qualified department.  
Example SELECT * FROM TABLE WHERE  departmentId='123'.  They also know that 95% 
of the organization leading edge contains the qualified trailing edge.  However 
department = '123' is less than 5% of the total data in the table.

Based on the explain plan today for this we would run a Round Robin Full Scan 
with a filter on departmentId='123'.
 While one possible approach to not run a full table scan is to build an index 
on department. Another approach could be to construct a new version of a 
skipscan like filter to control this scan.  Essentially we could use 1 lookup 
to find the organizationId then additional skipscan for the trailing key.  This 
could be triggered with a sql syntax hint or in the future data driven.

For a given region assume the data looks like this.
||organizationId||departmentId||
|org1|100|
|org4|100|
|org4|101|
|org4|123|
|org5|100|
|org5|123|

First query the initial row in the region.  We get 'org1','100'.  From this we 
can construct the next rows of ['org1','123' - 'org1','123\x0').  After 
proessing that block (in our case 0 rows) we would run to the row at or greater 
than  nextKey(current orgnaziationId),'123'.  This would give us org4,101.  We 
would then run to the row of 'org4','123'.  Essentailly 1 step to find the 
orgId and then a scan of all the departments for that value.


> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> --
>
> Key: PHOENIX-5280
> URL: https://issues.apache.org/jira/browse/PHOENIX-5280
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Daniel Wong
>Priority: Minor
>
> Provide Improvements to Scan on Composite PK where Leading Edge not fully 
> Specified but the edge next columns are in most leading keys
> Recently a user has had an issue where they have a composite pk with 2 
> columns say (organizationId varchar, departmentId varchar).  They want to 
> query all their data with a condition where department is fully qualified 
> department.  Example SELECT * FROM TABLE WHERE  departmentId='123'.  They 
> also know that 95% of the organization leading edge contains the qualified 
> trailing edge.  However department = '123' is less than 5% of the total data 
> in the table.
> Based on the explain plan today for this we would run a Round Robin Full Scan 
> with a filter on departmentId='123'.
>  While one possible approach to not run a full table scan is to build an 
> index on