Re: Query optimizer-miss with unqualified expressions, bug or feature?

Roy Lyseng Mon, 19 Oct 2015 13:56:54 -0700

Hi Shawn,

On 19.10.15 22.33, shawn l.green wrote:



On 10/19/2015 3:48 PM, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and
then use
two starts: one on 'a > 0' and one on 'a < 0', taking a union of the
result?
Which might make a significant result to something?


That is correct. However, if the substitution type for BOOLEAN was
UNSIGNED TINYINT instead of TINYINT, the range 'a < 0' would not even be
needed, and the quite cumbersome UNION would be avoided. But the best
solution would of course be a two-valued boolean type, where 'a <> 0'
would easily be transformed to a = 1. It would also mean that statistics
for the columns would be better, with TINYINT each value has the
estimated probability 1/256, whereas a boolean value would have
probability 1/2.


<joking around>
256 possible values of BOOLEAN? I've heard of fuzzy logic but this is awesome!
Some new literal value names to consider: maybe, sort_of, nearly_always,
certainly, practically_never, likely, ...
</joking>*

Well, it is the practical consequence of using TINYINT as the substitution typefor BOOLEAN...



On a more serious note, indexes with limited cardinality are less useful than
those with excellent cardinality. Cardinality is an approximation (or
calculation. It depends on your storage engine) of how many unique values there
are in the index.

If the Optimizer estimates (based on a calculation based on the Cardinality)
that more than about 30% of a table would need to be retrieved in random order
based on an index, then that index is disallowed.  Why? Because the physical
disk overhead of doing random access averages just slightly more than 3x the
overhead used to scan a much larger block of data.

http://dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html

An index on just a Boolean value would have at best a cardinality of 2. So, any
indexes on Boolean values should include other columns to help the index become
more selective.

http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html

You are right about the index use, so it would be interesting only with asignificant skew, say 10% TRUE values. However, the optimizer is not only aboutindexing, but also about calculating the filtering effect of a predicate. Usinga true BOOLEAN rather than a TINYINT would give a better estimate of thefiltering effect, and thus of the estimated number of rows as the outcome of aquery.



*Actually, fuzzy logic has lots of practical application in real world
situations. They are just not using the MySQL BOOLEAN data type to store the
value for comparison.


Thanks,
Roy

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: Query optimizer-miss with unqualified expressions, bug or feature?

Reply via email to