Repository: incubator-trafodion Updated Branches: refs/heads/master c7199d5aa -> dabad9ae9
Rework to address compGeneral/TEST006 failure (cherry picked from commit 6320b901b6f123da9697dc8a898785168958dc10) Project: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/commit/682c48e2 Tree: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/tree/682c48e2 Diff: http://git-wip-us.apache.org/repos/asf/incubator-trafodion/diff/682c48e2 Branch: refs/heads/master Commit: 682c48e260d6bde50b13bf40567c9353e3b3dcda Parents: c66fca2 Author: Dave Birdsall <dave.birds...@esgyn.com> Authored: Wed Oct 18 21:45:04 2017 +0000 Committer: Dave Birdsall <dbirds...@apache.org> Committed: Wed Oct 18 22:43:20 2017 +0000 ---------------------------------------------------------------------- core/sql/optimizer/ScanOptimizer.cpp | 87 +++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-trafodion/blob/682c48e2/core/sql/optimizer/ScanOptimizer.cpp ---------------------------------------------------------------------- diff --git a/core/sql/optimizer/ScanOptimizer.cpp b/core/sql/optimizer/ScanOptimizer.cpp index b753be6..5ca7d86 100644 --- a/core/sql/optimizer/ScanOptimizer.cpp +++ b/core/sql/optimizer/ScanOptimizer.cpp @@ -5702,6 +5702,63 @@ FileScanOptimizer::runMdamTests } #endif + +// The following method is now dead code and will soon be replaced functionally. +// This method attempts to decide whether MDAM should be used on a table that +// lack statistics. It does so by analyzing the key predicates, considering +// the upper bounds on UECs implied by them and the number of ranges they +// define. +// +// The calling method, FileScanOptimizer::computeCostForMultipleSubset, forces +// MDAM when this method returns TRUE. There is one problem though: The costing +// logic is bypassed so there is no computation of what stop columns to use. +// So, we use the default settings which are to traverse *EVERY* key column. +// This can be very bad from a performance perspective. +// +// Apart from this design flaw, there are some other specific bugs in this +// method and the methods it calls. I'm documenting them here in case someone +// after me feels they need to use this code after all. (Note that JIRA +// TRAFODION-2645 intends to ultimately replace this code.) Good luck! Here's +// the (probably not exhaustive) list: +// +// 1. Int32 totalUec should be an Int64, because that's what +// RangeSpec::getTotalDistinctValues returns. (The implied cast of Int64 down +// to Int32 essentially does a mod 2^32 operation, resulting in a garbage value.) +// +// 2. Similarly, ARRAY(Int32) uecsByKeyColumns should be ARRAY(Int64). +// +// 3. In the Subrange object (qmscommon/Range.h), the "end" member is uninitialized +// for predicates of the form A < 10. So you get a random value back from +// RangeSpec::getTotalDistinctValues. Similarly, the "start" member is +// uninitialized for predicates of the form A > 10. +// +// 4. Fixing the uninitialized member problem above is tricky. The value to use +// for "start" and "end" depends on the data type of the key column it is +// compared to. Because of VEG effects, this can vary. For example, if we have +// predicates of the form A < 10 AND A = B, and A happens to be an INTEGER +// while B happens to be a SMALLINT UNSIGNED, the "end" member value used should +// be -2^32 for A, but 0 for B. So, the NAType of the key column needs to be +// passed in to RangeSpec::getTotalDistinctValues. +// +// 5. Subrange::getStartScalarValElem and Subrange::getEndScalarValElem need +// to return NULL if startIsMin_ or endIsMax_ are set respectively. (These +// are the cases where "start" and "end" are uninitialized. And as mentioned +// in point 4, they can't be initialized until we pick the key column out of +// any VEG.) +// +// 6. SubrangeBase::getTotalDistinctValues needs to treat a NULL return from +// Subrange::getStartScalarValElem or Subrange::getEndScalarValElem as a +// minimum or maximum value respectively by the data type of the key column. +// SubrangeBase::getExactNumericMinMax can be used to obtain those values. +// +// It is not obvious to me why these issues were not noticed sooner. In my +// case I found them when investigating a non-deterministic failure in test +// compGeneral/TEST006, where an index access query was (sometimes!) flipping +// over to MDAM. This was unmasked by the change I made in JIRA TRAFODION-2765; +// formerly *another* set of heuristcs prevented us from reaching this code. +// However, that is no guarantee that this code was never reached before. + + NABoolean FileScanOptimizer::isMDAMFeasibleForHBase(const IndexDesc* idesc, ValueIdSet& preds) { Lng32 threshold = (Lng32)(ActiveSchemaDB()->getDefaults()). @@ -5804,11 +5861,41 @@ FileScanOptimizer::computeCostForMultipleSubset ValueIdSet exePreds; ValueIdSet selPreds = getRelExpr().getSelectionPred(); + /* + Commenting this code out pending a rewrite of the MDAM costing code. + + What the code below tries to do is if the table has no statistics, + it looks at the predicates of any RangeSpec and tries to determine + if they imply an MDAM plan of just a few positionings. And if so, + the code returns a zero Cost object, which in effect forces the + caller to use MDAM. + + What is wrong-headed about this approach is that the MdamKey object + passed in is set to use all key column predicates as MDAM predicates + and has the stop columns set to their maximums. That is, by returning + here we have bypassed the costing logic that figures out the optimal + stop column setting. This can result in *TREMENDOUSLY* inefficient + plans. + + Fixing this requires essentially rewriting the costing code. And since + I am already doing that elsewhere using a different approach (see + JIRA TRAFODION-2645), there is no point in doing it yet another time + here. + + However if someone comes along after me and insists on fixing this + particular code, I've documented some additional issues in the method + FileScanOptimizer::isMDAMFeasibleForHBase. + + The effect of commenting this code out is that tables without statistics + will tend to get single subset plans. This is probably OK for the moment + as most such tables are small. Often they are metadata tables. + const IndexDesc *indexDesc = getFileScan().getIndexDesc(); NABoolean isHbaseTable = indexDesc->getPrimaryTableDesc()->getNATable()->isHbaseTable(); if ( isHbaseTable && isMDAMFeasibleForHBase(indexDesc, selPreds) ) { return new (HEAP) Cost(); } + */ if ((CmpCommon::getDefault(RANGESPEC_TRANSFORMATION) == DF_ON ) && (selPreds.entries()))