[ https://issues.apache.org/jira/browse/TRAFODION-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259908#comment-16259908 ]
David Wayne Birdsall commented on TRAFODION-2813: ------------------------------------------------- I've debugged this. Here are my findings: There is a layer of heuristics that tries to reduce the number of indexes that the Optimizer will consider during its costing phases. I will guess these heuristics were added in Neoview days when compilation time was a concern. The layer begins at Scan::addIndexInfo (optimizer/RelExpr.cpp). It analyzes the available indexes for such things as whether they cover all the required columns (so we can do index-only access), whether the index satisfies a required order, and so on. When INDEX_ELIMINATION_LEVEL is AGGRESSIVE, it also tries to prune out indexes that begin with the same column, keeping only the one with the smallest I/O. The latter logic is in IndexProperty::updatePossibleIndexes (optimizer/IndexDesc.cpp) and IndexProperty::compareIndexPromise (optimizer/IndexDesc.cpp). The latter routine has a very curious check: // currently it is the same for indexOnlyScans ans alternateIndexScans. // If index key starts from the same (single column) then the smaller index // (having smaller KbPerVolume attribute) is MORE promising, the bigger index // is LESS promising, and the same index size has the SAME promise. const IndexDesc * index = getIndexDesc(); const IndexDesc * otherIndex = ixProp->getIndexDesc(); if ( ((IndexColumn *)(index->getIndexKey()[0]).getItemExpr())->getDefinition() != ((IndexColumn *)(otherIndex->getIndexKey()[0]).getItemExpr())->getDefinition() ) return INCOMPATIBLE; The goal of this code appears to be to keep around a representative of each class of indexes that begin on a particular column. So, for example, if the base table had primary key A, B, C, and we had an index B, C, A, this logic would keep both. Though if there were two indexes, B, A, C and B, C, A, this code would only keep one of them. That code was written before we had salting and divisioning, however. In the case of Q3, both the base table and the index begin with "_SALT_", so this "if" was not taken. Logic after this attempts to compute how much I/O would be done through each index, and gets it wrong, and we keep the wrong index. So, I think this code should be changed so we skip any matching leading salt or divisioning columns before doing the comparison. That will get us back to the original intent of this code. But this is a tentative conclusion; this particular code is new to me so there may be a better solution. As a result of this analysis, I found a work-around: By setting CQD INDEX_ELIMINATION_LEVEL to 'MAXIMUM', the pruning heuristic is bypassed, the costing code considers both base table and index paths, and correctly picks an efficient MDAM plan on the base table. (The default for that CQD is 'AGGRESSIVE'.) > Salting + heuristics prevent MDAM plan on base table from being considered > -------------------------------------------------------------------------- > > Key: TRAFODION-2813 > URL: https://issues.apache.org/jira/browse/TRAFODION-2813 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-cmp > Affects Versions: 2.3-incubating > Environment: All > Reporter: David Wayne Birdsall > Assignee: David Wayne Birdsall > > When a salted table also has salted indexes, it may happen that a query that > has a good MDAM plan on the base table gets a full scan on a salted index > instead. > The problem is that there is a defective heuristic that rules out base table > access before we get to the costing code. So the costing code only sees one > access path, namely the index, and (correctly) picks a full scan on it. > The script below reproduces the problem: > ?section setup > drop table if exists trafodion.seabase.t1 cascade; > CREATE TABLE TRAFODION.SEABASE.T1 > ( > key1 NUMERIC(9, 0) NO DEFAULT NOT NULL NOT > DROPPABLE NOT SERIALIZED > , key2 VARCHAR(64) CHARACTER SET ISO88591 COLLATE > DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED > , key3 TIMESTAMP(6) NO DEFAULT NOT NULL NOT > DROPPABLE NOT SERIALIZED > , nonkey VARCHAR(1024) CHARACTER SET ISO88591 > COLLATE DEFAULT DEFAULT NULL NOT SERIALIZED > , PRIMARY KEY (key1 ASC, key2 ASC, key3 ASC) > ) > SALT USING 4 PARTITIONS > ON (key2) > DIVISION BY (DATE_TRUNC('HOUR',key3) > NAMED AS ("_DIVISION_1_")) > ATTRIBUTES ALIGNED FORMAT > HBASE_OPTIONS > ( > DATA_BLOCK_ENCODING = 'FAST_DIFF', > MEMSTORE_FLUSH_SIZE = '1073741824' > ) > ; > upsert using load into TRAFODION.SEABASE.T1 > select 50000 + c0, > 'user ' || cast(c5 + 10*c0 + 100*c1 + 1000*c2 + 10000*c3 as varchar(20)), > timestamp '2017-07-01 12:00:00.000000' + cast(c1 + 10*c2 + 100*c3 + 1000*c4 > + 1000*c5 as interval minute(6)), > 'valid prod ' || cast(7*c3 + 4*c4 + 11*c2 as varchar(20)) > from (values(1)) t > transpose 0,1,2,3,4,5,6,7,8,9 as c0 > transpose 0,1,2,3,4,5,6,7,8,9 as c1 > transpose 0,1,2,3,4,5,6,7,8,9 as c2 > transpose 0,1,2,3,4,5,6,7,8,9 as c3 > transpose 0,1,2,3,4,5,6,7,8,9 as c4 > transpose 0,1,2,3,4,5,6,7,8,9 as c5; > update statistics for table TRAFODION.SEABASE.T1 on every column; > CREATE INDEX T1_IDX1 ON TRAFODION.SEABASE.T1 > ( > key2 ASC > , key3 ASC > ) > ATTRIBUTES ALIGNED FORMAT > HBASE_OPTIONS > ( > DATA_BLOCK_ENCODING = 'FAST_DIFF', > MEMSTORE_FLUSH_SIZE = '1073741824' > ) > SALT LIKE TABLE > ; > ?section queries > prepare s1 from > SELECT key3, key2 FROM trafodion.seabase.t1 WHERE key1 = 50001 AND key3 > >= to_date(cast (20170705080000 as char(20)),'YYYYMMDDHH24MISS') AND key3 > <= to_date(cast (20170705085959 as char(20)),'YYYYMMDDHH24MISS') ; > explain s1; > In this example, S1 gets full scans on the index, even though an MDAM plan on > the base table would be far more efficient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)