Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-20 Thread Johan De Meersman
- Original Message -
> From: "Shawn Green" <shawn.l.gr...@oracle.com>
> Subject: Re: Query optimizer-miss with unqualified expressions, bug or 
> feature?
> 
> On a more serious note, indexes with limited cardinality are less useful
> than those with excellent cardinality. Cardinality is an approximation
> (or calculation. It depends on your storage engine) of how many unique
> values there are in the index.

On a related note, are there any plans (and could you offer a rough timeframe?) 
to include bitmap indices in MySQL?


Thanks,
Johan

-- 
Unhappiness is discouraged and will be corrected with kitten pictures.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Ben Clewett


I have noticed that an unqualified boolean expression cannot be 
optimized by MySQL to use an index in 5.6.24.


For example:

CREATE TABLE t (
  i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a BOOLEAN NOT NULL,
  KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps 
something else?


Thanks,

Ben Clewett.




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Ben Clewett

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and 
then use two starts: one on 'a > 0' and one on 'a < 0', taking a union 
of the result?  Which might make a significant result to something?


Ben.


On 2015-10-19 14:19, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be 
optimized by

MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually 
interpreted as


  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and 
hence no index will be used.


Thanks,
Roy



Thanks,

Ben Clewett.








--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be optimized by
MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually interpreted as

  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and hence no index 
will be used.


Thanks,
Roy



Thanks,

Ben Clewett.







--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Shawn,

On 19.10.15 22.33, shawn l.green wrote:



On 10/19/2015 3:48 PM, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and
then use
two starts: one on 'a > 0' and one on 'a < 0', taking a union of the
result?
Which might make a significant result to something?


That is correct. However, if the substitution type for BOOLEAN was
UNSIGNED TINYINT instead of TINYINT, the range 'a < 0' would not even be
needed, and the quite cumbersome UNION would be avoided. But the best
solution would of course be a two-valued boolean type, where 'a <> 0'
would easily be transformed to a = 1. It would also mean that statistics
for the columns would be better, with TINYINT each value has the
estimated probability 1/256, whereas a boolean value would have
probability 1/2.




256 possible values of BOOLEAN? I've heard of fuzzy logic but this is awesome!
Some new literal value names to consider: maybe, sort_of, nearly_always,
certainly, practically_never, likely, ...
*


Well, it is the practical consequence of using TINYINT as the substitution type 
for BOOLEAN...





On a more serious note, indexes with limited cardinality are less useful than
those with excellent cardinality. Cardinality is an approximation (or
calculation. It depends on your storage engine) of how many unique values there
are in the index.

If the Optimizer estimates (based on a calculation based on the Cardinality)
that more than about 30% of a table would need to be retrieved in random order
based on an index, then that index is disallowed.  Why? Because the physical
disk overhead of doing random access averages just slightly more than 3x the
overhead used to scan a much larger block of data.

http://dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html

An index on just a Boolean value would have at best a cardinality of 2. So, any
indexes on Boolean values should include other columns to help the index become
more selective.

http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html


You are right about the index use, so it would be interesting only with a 
significant skew, say 10% TRUE values. However, the optimizer is not only about 
indexing, but also about calculating the filtering effect of a predicate. Using 
a true BOOLEAN rather than a TINYINT would give a better estimate of the 
filtering effect, and thus of the estimated number of rows as the outcome of a 
query.





*Actually, fuzzy logic has lots of practical application in real world
situations. They are just not using the MySQL BOOLEAN data type to store the
value for comparison.



Thanks,
Roy

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread shawn l.green



On 10/19/2015 3:48 PM, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and
then use
two starts: one on 'a > 0' and one on 'a < 0', taking a union of the
result?
Which might make a significant result to something?


That is correct. However, if the substitution type for BOOLEAN was
UNSIGNED TINYINT instead of TINYINT, the range 'a < 0' would not even be
needed, and the quite cumbersome UNION would be avoided. But the best
solution would of course be a two-valued boolean type, where 'a <> 0'
would easily be transformed to a = 1. It would also mean that statistics
for the columns would be better, with TINYINT each value has the
estimated probability 1/256, whereas a boolean value would have
probability 1/2.




256 possible values of BOOLEAN? I've heard of fuzzy logic but this is 
awesome! Some new literal value names to consider: maybe, sort_of, 
nearly_always, certainly, practically_never, likely, ...

*


On a more serious note, indexes with limited cardinality are less useful 
than those with excellent cardinality. Cardinality is an approximation 
(or calculation. It depends on your storage engine) of how many unique 
values there are in the index.


If the Optimizer estimates (based on a calculation based on the 
Cardinality) that more than about 30% of a table would need to be 
retrieved in random order based on an index, then that index is 
disallowed.  Why? Because the physical disk overhead of doing random 
access averages just slightly more than 3x the overhead used to scan a 
much larger block of data.


http://dev.mysql.com/doc/refman/5.6/en/how-to-avoid-table-scan.html

An index on just a Boolean value would have at best a cardinality of 2. 
So, any indexes on Boolean values should include other columns to help 
the index become more selective.


http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html


*Actually, fuzzy logic has lots of practical application in real world 
situations. They are just not using the MySQL BOOLEAN data type to store 
the value for comparison.


--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/ 
for details.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Query optimizer-miss with unqualified expressions, bug or feature?

2015-10-19 Thread Roy Lyseng

Hi Ben,

On 19.10.15 16.07, Ben Clewett wrote:

Hi Roy,

Thanks for the clear explanation.

I guess (hypothetically) the optimizer could see if it has a key, and then use
two starts: one on 'a > 0' and one on 'a < 0', taking a union of the result?
Which might make a significant result to something?


That is correct. However, if the substitution type for BOOLEAN was UNSIGNED 
TINYINT instead of TINYINT, the range 'a < 0' would not even be needed, and the 
quite cumbersome UNION would be avoided. But the best solution would of course 
be a two-valued boolean type, where 'a <> 0' would easily be transformed to a = 
1. It would also mean that statistics for the columns would be better, with 
TINYINT each value has the estimated probability 1/256, whereas a boolean value 
would have probability 1/2.


Thanks,
Roy



Ben.


On 2015-10-19 14:19, Roy Lyseng wrote:

Hi Ben,

On 19.10.15 15.10, Ben Clewett wrote:


I have noticed that an unqualified boolean expression cannot be optimized by
MySQL to use an index in 5.6.24.

For example:

CREATE TABLE t (
   i INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   a BOOLEAN NOT NULL,
   KEY a (a)
) ENGINE=InnoDB;

This will hit key 'a':

SELECT * FROM t WHERE a = TRUE;

This will return the same results, but not use key 'a':

SELECT * FROM t WHERE a;

Is this a bug, or deliberate behaviour, or a missing feature, or perhaps
something else?


MySQL does not have a true boolean type, so this is actually interpreted as

  SELECT * FROM t WHERE a <> 0;

The optimizer is not able to see that "a <> 0" means "a = 1", and hence no
index will be used.

Thanks,
Roy



Thanks,

Ben Clewett.









--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql