Question: Multi-column histograms for key columns on salted tables

Dave Birdsall Fri, 12 Feb 2016 09:10:21 -0800

Hi,



Reaching out to the development community to learn a little history.



For a single-partitioned table T, with primary key columns A, B, C, if I do
an UPDATE STATISTICS ON EVERY KEY (or ON EVERY COLUMN), I will get
multi-column histograms for prefixes of the key. So, I’ll get a
multi-column histogram for (A,B) and for (A,B,C).



However, if I now salt that table, then an additional column, “_SALT_”, is
prepended to the primary key.



And the behavior of UPDATE STATISTICS is different. Instead of getting
multi-column histograms for (“_SALT_”,A), (“_SALT_”,A,B), and
(“_SALT_”,A,B,C), I get only (“_SALT_”,A,B,C).



Indeed, JIRA TRAFODION-1467 complains about this very behavior. See
https://issues.apache.org/jira/browse/TRAFODION-1467.



Now, I was looking into fixing this JIRA, and I found the following code in
sql/ustat/hs_parser.cpp:



        // For salted table, generate only the longest MC for the key
(subject

        // to max cols determined above) unless a cqd is set to gen all MCs
of

        // allowable sizes.

        if (CmpCommon::getDefault(USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC) ==
DF_OFF &&

            hs_globals->objDef->getColNum("_SALT_", FALSE) >= 0)

          minMCGroupSz = numKeys;



And indeed this CQD has the default value of ‘OFF’.



So it appears this behavior is intentional.



My question is, why? On the surface it seems a bit arbitrary to me.



Depending on the answer, I’ll address the JIRA in one of the following ways:



1.       Say, it’s working as intended. And if you want all the
multi-column histograms, set CQD USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC ‘ON’
prior to issuing the UPDATE STATISTICS command.

2.       Change the default to ‘ON’. And you’ll have to set CQD
USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC ‘OFF’ to reduce the number of
multi-column histograms on key prefixes in salted tables.

3.       Remove the CQD altogether if there was no good reason for it.
Which would give us the ‘ON’ behavior all the time.



Dave

Question: Multi-column histograms for key columns on salted tables

Reply via email to