Re: Query Performance

SGreen Thu, 14 Apr 2005 12:34:33 -0700

Fernando Henrique Giorgetti <[EMAIL PROTECTED]> wrote on 04/14/2005 
02:34:30 PM:


> Hi Folks!
> 
> Here, I have the following table:
> 
> CREATE TABLE `accesses` (
>   `time` varchar(15) NOT NULL default '',
>   `duration` int(10) default NULL,
>   `user` varchar(25) NOT NULL default '',
>   `ipaddr` varchar(15) NOT NULL default '',
>   `result` varchar(30) default NULL,
>   `bytes` int(10) default NULL,
>   `reqmethod` varchar(10) default NULL,
>   `urlparent` varchar(100) NOT NULL default '',
>   KEY `usuario` (`usuario`),
>   KEY `time_result` (`time`, `result`)
> );
> 
> If my table has a great number of rows (something like 5 millions), 
> the result time is too much longer.
> 
> select user, count(distinct concat(date_format(from_unixtime(time), 
> "%d/%m/%Y"), " - ", time_format(from_unixtime(time), "%H:%i")), 
> ipaddr, urlparent) as qtd, sec_to_time(sum(duration)/1000) as 
> duration, sum(bytes) as bytes from acessos where time >= 1109646000 
> and time <= 1112324399 and result <> "TCP_DENIED/403" group by user 
> order by user;
> 
> PS: explaining this select, the time_result key is a possible_key, 
> but, in the key field I have the NULL value (the NULL persists even 
> if I force with use index()).
> 
> Can anybody help me what can I do to make this query faster 
> (indexes, tuning, or, change the table structure or the query).
> 
> Thank you !
> -- 
> Fernando Henrique Giorgetti
> [EMAIL PROTECTED]
> Departamento de Tecnologia
> http://www.gruponet.com.br
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> 

If I unfold and indent your query I get:

select user
        , count(distinct 
                concat(
                        date_format(from_unixtime(time), "%d/%m/%Y")
                        , " - "
                        , time_format(from_unixtime(time), "%H:%i")
                )
                , ipaddr
                , urlparent
        ) as qtd
        , sec_to_time(sum(duration)/1000) as duration
        , sum(bytes) as bytes 
from acessos 
where time >= 1109646000 
        and time <= 1112324399 
        and result <> "TCP_DENIED/403" 
group by user 
order by user;

Your COUNT() operator seems to be trying to execute a COUNT((concatenated 
date to nearest minute), ipaddr, urlparent). I may have unfolded it 
incorrectly but that's how it seems to me. I think you meant to put the 
ipaddr and urlparent fields INTO the CONCAT() but I am just working from 
what I got.

There is a faster way to compute time to the nearest minute than what you 
are doing with the string conversions. Just do an integer division of your 
TIME value by 60 and throw away the remainder like this:

time DIV 60

or like this:
FLOOR(time/60)

(http://dev.mysql.com/doc/mysql/en/arithmetic-functions.html)

If I understand your COUNT(DISTINCT ) statement correctly, you want to 
know how in how many different minutes the user used either a unique 
ipaddr or a different urlparent. Am I close? You can also compute <> as 
the OR of two ranges (which may end up using the index or it may not...) 
So this could be a valid revision of your original query:

select user
        , count(distinct 
                concat(
                        FLOOR(time/60)
                        , ipaddr
                        , urlparent
                )
        ) as qtd
        , sec_to_time(sum(duration)/1000) as duration
        , sum(bytes) as bytes 
from acessos 
where time >= 1109646000 
        and time <= 1112324399 
        and (
                result < 'TCP_DENIED/403' 
                OR result > 'TCP_DENIED/403'
        )
group by user;

Note: GROUP BY includes a free ORDER BY unless you specify otherwise.

HTH,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: Query Performance

Reply via email to