[ 
https://issues.apache.org/jira/browse/PHOENIX-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehdi Salarkia updated PHOENIX-5950:
------------------------------------
    Description: 
For a table with a composite primary
{code:java}
CREATE TABLE MY_TABLE (K1 INTEGER NOT NULL, K2 VARCHAR NOT NULL,K3 INTEGER NOT 
NULL, V1 DECIMAL, CONSTRAINT pk PRIMARY KEY (K1, K2, K3))
{code}
when a view is created that has some (and not all) of primary key columns
{code:java}
CREATE VIEW MY_VIEW(v2 VARCHAR, V3 VARCHAR ) AS SELECT * FROM MY_TABLE WHERE K2 
= 'A'
{code}
if you run a query on the view without providing all the primary columns
{code:java}
EXPLAIN SELECT K1, K2, K3, V1 FROM MY_VIEW WHERE (K1,K3) IN ((1,2),(3,4));
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                              PLAN                             
                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SKIP SCAN ON 2 KEYS OVER MY_TABLE 
[1,'A'] - [3,'A']  | null            | null           | null         |
|     SERVER FILTER BY (K1, K3) IN 
([128,0,0,1,128,0,0,2],[128,0,0,3,128,0,0,4])                 | null            
| null           | null         |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.047 seconds)
{code}
 the query generated is scan rather than a point look up
 same query on the parent table (with all the keys) looks like this
{code:java}
EXPLAIN SELECT K1, K2, K3, V1 FROM MY_TABLE WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4));
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                            
                   | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 2 ROWS 268 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 
KEYS OVER MY_TABLE  | 268             | 2              | 0            |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
{code}
The issue is view condition is always added as `AND` to user provided where 
clause and in this case query optimizer is failing to optimize this query to a 
point look up.

 

 -------------------------------------------------------------- *[Affected Use 
Case]*  
----------------------------------------------------------------------------

The impact of this issue is most visible when you try to run IndexScrutiny tool 
a view with an index which generates queries like 
{code:java}
SELECT /*+ NO_INDEX */ CAST("K1" AS INTEGER),CAST("K2" AS VARCHAR),CAST("V1" AS 
DECIMAL),CAST("0"."V2" AS VARCHAR),CAST("0"."V3" AS VARCHAR) FROM MY_VIEW WHERE 
("K1","K3") IN ((?,?),(?,?));
{code}
and has very poor performance and causes performance degradation. 





-------------------------------------------------------------- *[POSSIBLE 
WORKAROUND]*  
----------------------------------------------------------------------------

One possible workaround is to provide all the pk (including the view pk 
columns) 
{code:java}
 EXPLAIN SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4));
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                            
                   | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 2 ROWS 632 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 
KEYS OVER MY_TABLE  | 632             | 2              | 0            |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
{code}
but as you can see the projected _EST_BYTES_READ_ goes up because the 
underlying query that gets executed is something like:
{code:java}
SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4)) AND K2 = 'A';
{code}
and certainly the `_AND K2 = 'A'_` is redundant.

 

 

------------------------------------------------------------- -- *[PROPOSED 
SOLUTION]* -- 
-----------------------------------------------------------------------------
 we can make the view condition to be injected into any partial primary key 
lookup (tuple style conditions) respecting the same order for columns defined 
in the parent table

  was:
For a table with a composite primary
{code:java}
CREATE TABLE MY_TABLE (K1 INTEGER NOT NULL, K2 VARCHAR NOT NULL,K3 INTEGER NOT 
NULL, V1 DECIMAL, CONSTRAINT pk PRIMARY KEY (K1, K2, K3))
{code}
when a view is created that has some (and not all) of primary key columns
{code:java}
CREATE VIEW MY_VIEW(v2 VARCHAR, V3 VARCHAR ) AS SELECT * FROM MY_TABLE WHERE K2 
= 'A'
{code}
if you run a query on the view without providing all the primary columns
{code:java}
EXPLAIN SELECT K1, K2, K3, V1 FROM MY_VIEW WHERE (K1,K3) IN ((1,2),(3,4));
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                              PLAN                             
                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SKIP SCAN ON 2 KEYS OVER MY_TABLE 
[1,'A'] - [3,'A']  | null            | null           | null         |
|     SERVER FILTER BY (K1, K3) IN 
([128,0,0,1,128,0,0,2],[128,0,0,3,128,0,0,4])                 | null            
| null           | null         |
+------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.047 seconds)
{code}
 the query generated is scan rather than a point look up
 same query on the parent table (with all the keys) looks like this
{code:java}
EXPLAIN SELECT K1, K2, K3, V1 FROM MY_TABLE WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4));
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                            
                   | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 2 ROWS 268 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 
KEYS OVER MY_TABLE  | 268             | 2              | 0            |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.025 seconds)
{code}
The issue is view condition is always added as `AND` to user provided where 
clause and in this case query optimizer is failing to optimize this query to a 
point look up.

 

 

-------------------------------------------------------------- *[POSSIBLE 
WORKAROUND]*  -**- 
--------------------------------------------------------------------------

One possible workaround is to provide all the pk (including the view pk 
columns) 
{code:java}
 EXPLAIN SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4));
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                               PLAN                            
                   | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 2 ROWS 632 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 2 
KEYS OVER MY_TABLE  | 632             | 2              | 0            |
+--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
{code}
but as you can see the projected _EST_BYTES_READ_ goes up because the 
underlying query that gets executed is something like:
{code:java}
SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
((1,'A',2),(3,'A',4)) AND K2 = 'A';
{code}
and certainly the `_AND K2 = 'A'_` is redundant.

 

 

-------------------------------------------------------------- -*[PROPOSED 
SOLUTION]*- 
------------------------------------------------------------------------------
 we can make the view condition to be injected into any partial primary key 
lookup (tuple style conditions) respecting the same order for columns defined 
in the parent table


> View With Where Clause On A Table With Composite Key Should Be Able To 
> Optimize Queries 
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5950
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5950
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Mehdi Salarkia
>            Assignee: Mehdi Salarkia
>            Priority: Major
>
> For a table with a composite primary
> {code:java}
> CREATE TABLE MY_TABLE (K1 INTEGER NOT NULL, K2 VARCHAR NOT NULL,K3 INTEGER 
> NOT NULL, V1 DECIMAL, CONSTRAINT pk PRIMARY KEY (K1, K2, K3))
> {code}
> when a view is created that has some (and not all) of primary key columns
> {code:java}
> CREATE VIEW MY_VIEW(v2 VARCHAR, V3 VARCHAR ) AS SELECT * FROM MY_TABLE WHERE 
> K2 = 'A'
> {code}
> if you run a query on the view without providing all the primary columns
> {code:java}
> EXPLAIN SELECT K1, K2, K3, V1 FROM MY_VIEW WHERE (K1,K3) IN ((1,2),(3,4));
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> |                                              PLAN                           
>                    | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN SKIP SCAN ON 2 KEYS OVER MY_TABLE 
> [1,'A'] - [3,'A']  | null            | null           | null         |
> |     SERVER FILTER BY (K1, K3) IN 
> ([128,0,0,1,128,0,0,2],[128,0,0,3,128,0,0,4])                 | null          
>   | null           | null         |
> +------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> 2 rows selected (0.047 seconds)
> {code}
>  the query generated is scan rather than a point look up
>  same query on the parent table (with all the keys) looks like this
> {code:java}
> EXPLAIN SELECT K1, K2, K3, V1 FROM MY_TABLE WHERE (K1,K2,K3) IN 
> ((1,'A',2),(3,'A',4));
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> |                                               PLAN                          
>                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK 2 ROWS 268 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 
> 2 KEYS OVER MY_TABLE  | 268             | 2              | 0            |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> 1 row selected (0.025 seconds)
> {code}
> The issue is view condition is always added as `AND` to user provided where 
> clause and in this case query optimizer is failing to optimize this query to 
> a point look up.
>  
>  -------------------------------------------------------------- *[Affected 
> Use Case]*  
> ----------------------------------------------------------------------------
> The impact of this issue is most visible when you try to run IndexScrutiny 
> tool a view with an index which generates queries like 
> {code:java}
> SELECT /*+ NO_INDEX */ CAST("K1" AS INTEGER),CAST("K2" AS VARCHAR),CAST("V1" 
> AS DECIMAL),CAST("0"."V2" AS VARCHAR),CAST("0"."V3" AS VARCHAR) FROM MY_VIEW 
> WHERE ("K1","K3") IN ((?,?),(?,?));
> {code}
> and has very poor performance and causes performance degradation. 
> -------------------------------------------------------------- *[POSSIBLE 
> WORKAROUND]*  
> ----------------------------------------------------------------------------
> One possible workaround is to provide all the pk (including the view pk 
> columns) 
> {code:java}
>  EXPLAIN SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
> ((1,'A',2),(3,'A',4));
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> |                                               PLAN                          
>                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 1-CHUNK 2 ROWS 632 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 
> 2 KEYS OVER MY_TABLE  | 632             | 2              | 0            |
> +--------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
> {code}
> but as you can see the projected _EST_BYTES_READ_ goes up because the 
> underlying query that gets executed is something like:
> {code:java}
> SELECT K1, K2, K3, V1, V2, V3 FROM MY_VIEW WHERE (K1,K2,K3) IN 
> ((1,'A',2),(3,'A',4)) AND K2 = 'A';
> {code}
> and certainly the `_AND K2 = 'A'_` is redundant.
>  
>  
> ------------------------------------------------------------- -- *[PROPOSED 
> SOLUTION]* -- 
> -----------------------------------------------------------------------------
>  we can make the view condition to be injected into any partial primary key 
> lookup (tuple style conditions) respecting the same order for columns defined 
> in the parent table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to