[ 
https://issues.apache.org/jira/browse/TRAFODION-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259908#comment-16259908
 ] 

David Wayne Birdsall commented on TRAFODION-2813:
-------------------------------------------------

I've debugged this. Here are my findings:

There is a layer of heuristics that tries to reduce the number of indexes that 
the Optimizer will consider during its costing phases. I will guess these 
heuristics were added in Neoview days when compilation time was a concern. The 
layer begins at Scan::addIndexInfo (optimizer/RelExpr.cpp). It analyzes the 
available indexes for such things as whether they cover all the required 
columns (so we can do index-only access), whether the index satisfies a 
required order, and so on. When INDEX_ELIMINATION_LEVEL is AGGRESSIVE, it also 
tries to prune out indexes that begin with the same column, keeping only the 
one with the smallest I/O. The latter logic is in 
IndexProperty::updatePossibleIndexes (optimizer/IndexDesc.cpp) and 
IndexProperty::compareIndexPromise (optimizer/IndexDesc.cpp). The latter 
routine has a very curious check:

    // currently it is the same for indexOnlyScans ans alternateIndexScans.
    // If index key starts from the same (single column) then the smaller index 
    // (having smaller KbPerVolume attribute) is MORE promising, the bigger 
index
    // is LESS promising, and the same index size has the SAME promise.


    const IndexDesc * index = getIndexDesc();
    const IndexDesc * otherIndex = ixProp->getIndexDesc();
    if ( ((IndexColumn 
*)(index->getIndexKey()[0]).getItemExpr())->getDefinition() != 
         ((IndexColumn 
*)(otherIndex->getIndexKey()[0]).getItemExpr())->getDefinition() )

      return INCOMPATIBLE;

The goal of this code appears to be to keep around a representative of each 
class of indexes that begin on a particular column. So, for example, if the 
base table had primary key A, B, C, and we had an index B, C, A, this logic 
would keep both. Though if there were two indexes, B, A, C and B, C, A, this 
code would only keep one of them.

That code was written before we had salting and divisioning, however. In the 
case of Q3, both the base table and the index begin with "_SALT_", so this "if" 
was not taken. Logic after this attempts to compute how much I/O would be done 
through each index, and gets it wrong, and we keep the wrong index.

So, I think this code should be changed so we skip any matching leading salt or 
divisioning columns before doing the comparison. That will get us back to the 
original intent of this code. But this is a tentative conclusion; this 
particular code is new to me so there may be a better solution.

As a result of this analysis, I found a work-around: By setting CQD 
INDEX_ELIMINATION_LEVEL to 'MAXIMUM', the pruning heuristic is bypassed, the 
costing code considers both base table and index paths, and correctly picks an 
efficient MDAM plan on the base table. (The default for that CQD is 
'AGGRESSIVE'.)

> Salting + heuristics prevent MDAM plan on base table from being considered
> --------------------------------------------------------------------------
>
>                 Key: TRAFODION-2813
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2813
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.3-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> When a salted table also has salted indexes, it may happen that a query that 
> has a good MDAM plan on the base table gets a full scan on a salted index 
> instead.
> The problem is that there is a defective heuristic that rules out base table 
> access before we get to the costing code. So the costing code only sees one 
> access path, namely the index, and (correctly) picks a full scan on it.
> The script below reproduces the problem:
> ?section setup
> drop table if exists trafodion.seabase.t1 cascade;
> CREATE TABLE TRAFODION.SEABASE.T1
>   ( 
>     key1                       NUMERIC(9, 0) NO DEFAULT NOT NULL NOT
>       DROPPABLE NOT SERIALIZED
>   , key2                      VARCHAR(64) CHARACTER SET ISO88591 COLLATE
>       DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
>   , key3                       TIMESTAMP(6) NO DEFAULT NOT NULL NOT
>       DROPPABLE NOT SERIALIZED
>   , nonkey                       VARCHAR(1024) CHARACTER SET ISO88591
>       COLLATE DEFAULT DEFAULT NULL NOT SERIALIZED
>   , PRIMARY KEY (key1 ASC, key2 ASC, key3 ASC)
>   )
>   SALT USING 4 PARTITIONS
>        ON (key2)
>   DIVISION BY (DATE_TRUNC('HOUR',key3)
>      NAMED AS ("_DIVISION_1_"))
>  ATTRIBUTES ALIGNED FORMAT 
>   HBASE_OPTIONS 
>   ( 
>     DATA_BLOCK_ENCODING = 'FAST_DIFF',
>     MEMSTORE_FLUSH_SIZE = '1073741824' 
>   ) 
> ;
> upsert using load into TRAFODION.SEABASE.T1
> select 50000 + c0,
>  'user ' || cast(c5 + 10*c0 + 100*c1 + 1000*c2 + 10000*c3 as varchar(20)),
>  timestamp '2017-07-01 12:00:00.000000' + cast(c1 + 10*c2 + 100*c3 + 1000*c4 
> + 1000*c5 as interval minute(6)),
>  'valid prod ' || cast(7*c3 + 4*c4 + 11*c2 as varchar(20))
> from (values(1)) t
> transpose 0,1,2,3,4,5,6,7,8,9 as c0
> transpose 0,1,2,3,4,5,6,7,8,9 as c1
> transpose 0,1,2,3,4,5,6,7,8,9 as c2
> transpose 0,1,2,3,4,5,6,7,8,9 as c3
> transpose 0,1,2,3,4,5,6,7,8,9 as c4
> transpose 0,1,2,3,4,5,6,7,8,9 as c5;
> update statistics for table TRAFODION.SEABASE.T1 on every column;
> CREATE INDEX T1_IDX1 ON TRAFODION.SEABASE.T1
>   ( 
>     key2 ASC
>   , key3 ASC
>   )
>  ATTRIBUTES ALIGNED FORMAT 
>   HBASE_OPTIONS 
>   ( 
>     DATA_BLOCK_ENCODING = 'FAST_DIFF',
>     MEMSTORE_FLUSH_SIZE = '1073741824' 
>   ) 
>  SALT LIKE TABLE 
> ;
> ?section queries
> prepare s1 from
> SELECT key3, key2 FROM trafodion.seabase.t1 WHERE key1 = 50001 AND key3 
> >= to_date(cast (20170705080000 as char(20)),'YYYYMMDDHH24MISS') AND key3 
> <= to_date(cast (20170705085959 as char(20)),'YYYYMMDDHH24MISS') ;
> explain s1;
> In this example, S1 gets full scans on the index, even though an MDAM plan on 
> the base table would be far more efficient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to