Hello Rick,
On 5/23/2013 7:08 PM, Rick James wrote:
Watch out for CAST(), DATE(), and any other function. In a WHERE clause, if
you hide an indexed column inside a function, the index cannot be used for
optimization.
INDEX(datetime_col)
...
WHERE DATE(datetime_col) = '2013-01-01'
will not use the index!
The workaround is messy, but worth it (for performance):
WHERE datetime_col >= '2013-01-01'
AND datetime_col < '2013-01-01' + INTERVAL 1 DAY
(or any of a zillion variants)
(Yeah, it seems like the optimizer could do the obvious transformation for you.
Hint, hint, Shawn.)
Or we could coerce datetime values back to their date values when both
are being used. The trick now becomes choosing between rounding the
datetime value (times past noon round to the next date) or do we use the
floor() function all the time.
This has been discussed and the consensus was that the most predictable
and performant behavior was to extend a date value to become a datetime
value by associating it with midnight (0000).
Let's look at some examples:
a) '2013-05-14 07:00:00' = '2013-05-14'
This is true as the datetime value is 7 hours after midnight.
b) '2013-05-14 07:00:00' >= '2013-05-14'
AND '2013-05-14 07:00:00' < '2013-05-15'
This is true as the time value is somewhen between both midnights.
c) '2013-05-14 07:00:00' > '2013-05-14' + INTERVAL 8 HOURS
This is false. The offset applied to the date term means the time
portion of the resulting datetime value is 0800, not 0000. (0700 > 0800)
is false.
d) And what if instead of comparing against the FLOOR() of each date we
rounded datetime values up or down to their nearest dates?
'2013-05-14 17:00:00' = '2013-05-14'
This would be false because the datetime value would have rounded up to
'2013-05-15'.
There is also a strong desire to make a database server try to do
exactly what the user tells it to do. If the user wants to compare a
value to another value with an equality check, we should do that. It
would be very odd behavior if an equality check suddenly turns into a
ranged check. I realize how much time it would save people to not need
to include both ends of the range:
> WHERE datetime_col >= '2013-01-01'
> AND datetime_col < '2013-01-01' + INTERVAL 1 DAY
but for predictability and reliability, this is one rewrite that may not
always be true.
Perhaps a new operator like IN_DATE is what you are looking for?
(example)
'2013-05-14 17:00:00' IN_DATE '2013-05-14'
This would be true.
But what if the date field were a due date and all submissions needed to
be in by noon on those dates? This IN_DATE operator would return true
for submissions beyond the deadline, too (a wrong result). But a direct
comparison would be true:
'2013-05-14 17:00:00' >= '2013-01-01'
AND '2013-05-14 17:00:00' <= '2013-05-14' + INTERVAL 12 HOUR
Of course you could always write that as a stored function, too.
(pseudo declaration)
CREATE FUNCTION IN_DATE(dateval, datetimeval) (...)
(example usage)
IN_DATE('2013-05-14','2013-05-14 17:00:00')
This could be true, too. But a stored function would interfere with the
use of indexes to resolve the query.
Then we get into the problems of how would this operator or function
would handle comparisons to TIMESTAMP columns and many other variations
like how to see if a datetime is within a 2 day span instead of one.
It's a major ball of wax to rewrite these queries in the optimizer just
to avoid one extra line of SQL code per comparison. To us "Keep It
Simple, Stupid!" (KISS) seemed like the better approach to the problem.
We do one thing (make dates represent midnight on that date when they
need to be compared to datetime values) and allow the users to decide
how to handle the rest of the comparison according to their specific needs.
Respectfully,
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql