puzzled by date functions (long)

2006-03-01 Thread Giuseppe Maxia
Yesterday I was analyzing the behavior of the query optimizer, and I stumbled 
into a most curious case.
I created two functions returning the extremes of a date range, and I wanted to 
see how many times those functions were
called when used in a WHERE clause So I added log tracing instructions to both 
of them. The result was quite surprising.
Let's set the environment first.

create database if not exists test ;
use test ;

delimiter //

drop function if exists today_start //
CREATE FUNCTION today_start() RETURNS datetime
begin
   insert into mylog (routine_name) values ('today_start');
   --   return current_date();
   return '2006-02-28 00:00:00';
end //

drop function if exists today_end //
CREATE FUNCTION today_end() RETURNS datetime
begin
   insert into mylog (routine_name) values ('today_end');
   -- return current_date() + interval 1 day - interval 1 second;
   return '2006-02-28 23:59:59';
end //

delimiter ;

drop table if exists t1;
create table t1 (
 id int(11) NOT NULL auto_increment,
 dt datetime NOT NULL,
 PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

drop table if exists mylog;
create table mylog (
   id int not null auto_increment primary key,
   routine_name varchar(20) not null,
   TS timestamp
);

INSERT INTO `t1` VALUES
   (1,'2006-02-28 11:19:35'), (2,'2006-02-28 11:19:38'),
   (3,'2006-02-28 11:19:40'), (4,'2006-03-01 11:20:09'),
   (5,'2006-03-01 11:20:11'), (6,'2006-03-01 11:20:12'),
   (7,'2006-03-01 11:20:13');

select * from t1;
++-+
| id | dt  |
++-+
|  1 | 2006-02-28 11:19:35 |
|  2 | 2006-02-28 11:19:38 |
|  3 | 2006-02-28 11:19:40 |
|  4 | 2006-03-01 11:20:09 |
|  5 | 2006-03-01 11:20:11 |
|  6 | 2006-03-01 11:20:12 |
|  7 | 2006-03-01 11:20:13 |
++-+

Now I select all today's rows from t1.

select * from t1 where dt between today_start() and today_end();
++-+
| id | dt  |
++-+
|  1 | 2006-02-28 11:19:35 |
|  2 | 2006-02-28 11:19:38 |
|  3 | 2006-02-28 11:19:40 |
++-+

That's correct. Now, let's see how many times the function was called:

select * from mylog;
++--+-+
| id | routine_name | TS  |
++--+-+
|  1 | today_start  | 2006-02-28 12:26:24 |
|  2 | today_end| 2006-02-28 12:26:24 |
++--+-+

And that too was what I expected. But the story changes if I use a slightly 
different table. This one has the same
columns as t1, but the primary key is the datetime column.

drop table if exists t2;
create table t2 (
 id int not null,
 dt datetime NOT NULL,
 PRIMARY KEY (dt)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into t2 (id, dt) select id, dt from t1;

Now I did the same experiment with this table:

truncate mylog;
select * from t2 where dt between today_start() and today_end();
++-+
| id | dt  |
++-+
|  1 | 2006-02-28 11:19:35 |
|  2 | 2006-02-28 11:19:38 |
|  3 | 2006-02-28 11:19:40 |
++-+

The query finds the same records. Let's see what happens to mylog:

select * from mylog;
++--+-+
| id | routine_name | TS  |
++--+-+
|  1 | today_start  | 2006-02-28 12:30:00 |
|  2 | today_end| 2006-02-28 12:30:00 |
|  3 | today_start  | 2006-02-28 12:30:00 |
|  4 | today_end| 2006-02-28 12:30:00 |
++--+-+

I can't imagine why this is happening. The only difference is that dt is now 
primary key. Instead of being called once,
the routine is called twice. If I simply drop the primary key in t2, then the 
routine is called once per query, as expected.
The result does not change if I use InnoDB tables instead of MyISAM.

Can anyone explain what is happening here?

Thanks
Giuseppe


-- 
 _  _ _  _
(_|| | |(_|  The Data Charmer
 _|
http://datacharmer.blogspot.com/


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: puzzled by date functions (long)

2006-03-01 Thread Dan Nelson
In the last episode (Mar 01), Giuseppe Maxia said:
 Yesterday I was analyzing the behavior of the query optimizer, and I
 stumbled into a most curious case. I created two functions returning
 the extremes of a date range, and I wanted to see how many times
 those functions were called when used in a WHERE clause So I added
 log tracing instructions to both of them. The result was quite
 surprising. Let's set the environment first.
[..] 
 I can't imagine why this is happening. The only difference is that dt
 is now primary key. Instead of being called once, the routine is
 called twice. If I simply drop the primary key in t2, then the
 routine is called once per query, as expected. The result does not
 change if I use InnoDB tables instead of MyISAM.

My guess is that the query optimizer is comparing the range endpoints
against the index to see whether it needs to do a full table scan,
index range scan, or optimize the table away.  Depending on what data
is shared between the optimizer and the query enging itself, it may
need to evaluate the WHERE clause multiple times.  Just a guess though;
examining the source, or replacing your stored procedure with a UDF
function that printed its own stack trace to a text file, would tell
you for certain.

-- 
Dan Nelson
[EMAIL PROTECTED]

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]