Re: Optimizing SQL statement

Puiu Hrenciuc Thu, 20 Apr 2006 17:31:30 -0700

First of all thanks all for your answers.
Second I'll explain what I manage to do in regard with this issue,
maybe someone else may need it in the future.


So, first I have started by changing the way I have stored the `ip`
field from varchar(15) to int unsigned and populated this field with
the 4 bytes value of the ip ( INET_ATON ). Then I have done some
benchmarking using the varchar field and ORDER BY NULL to avoid
sorting. After that I have done some benchmarks using the ip stored as
numbers ( of course I have redefined the primary key, etc, etc ). I was
surprised to find out that grouping by a varchar was FASTER then by an
int column. I was surprised to find out this because my logic tells me that
it should be easyer to group by a 4 byte data than by a 15 bytes data.
The tests were done using same computer, same database engine.
After that, I also tryed to convert the table to InnoDB ( originally it was
MyISAM ) and made same tests ( ip as int and as varchar ). The results
were completly different, the grouping by ip as int was faster than ip
as varchar and overall both queries were faster than if executed against
a MyISAM table. I decided to keep the IP as INT and table as InnoDB,
the time is now reduced to 2.3s ( avg ), still too much, but faster than
before anyway. I think table partitions from 5.1 will help by splitting 
records
by year and month , but waiting to get stable though.

Queries used :

For IP as VARCHAR :

SELECT `ip`, `type`,SUM(`inbound`) AS `in`, SUM(`outbound`) AS`out`
FROM `accounting`
GROUP BY `ip`,`type`
WHERE `record_time` BETWEEN '20060101000000' AND '20060201000000'
ORDER BY NULL

For IP as INT UNSIGNED :

SELECT INET_NTOA(`ip`) AS `ip`, `type`,SUM(`inbound`) AS `in`, 
SUM(`outbound`) AS`out`
FROM `accounting`
GROUP BY `ip`,`type`
WHERE `record_time` BETWEEN '20060101000000' AND '20060201000000'
ORDER BY NULL

Results :

table_engine              ip_type               avg_query_time (s)
MyISAM                VARCHAR(15)                  6.7
MyISAM                INT UNSIGNED                 9.4
InnoDB                  VARCHAR(15)                 2.8
InnoDB                  INT UNSIGNED                2.3

That would be all, thanks again.

Puiu Hrenciuc

P.S.: Removing ORDER BY NULL adds "using filesort" and takes longer.


"Joerg Bruehe" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Hi all!
>
> (Sorry for the late reply.)
>
> Puiu Hrenciuc wrote (re-ordered):
>> "Barry" <[EMAIL PROTECTED]> wrote in message 
>> news:[EMAIL PROTECTED]
>>> Puiu Hrenciuc wrote:
>>>> Hi,
>>>>
>>>> I have a table ( MyISAM, MySQL 5.0.20, FreeBSD ) containing network 
>>>> traffic data :
>>>>
>>>> record_time datetime  - time when recording was added
>>>> ip char(15) - ip that generated the traffic
>>>> type tinyint(3) - traffic type ( 1 - local, 2 - internet )
>>>> inbound int(10) - in bytes
>>>> outbound int(10) - out bytes
>>>>
>>>> Records are inserted each 5 minutes through a cron script.
>>>> Currently there are 3,330,367 rows.
>>>>
>>>> Primary index is defined on ( ip, type, record_time ), columns in that 
>>>> order.
>>>> Also there is an index defined only on record_time
>>>>
>>>> Now for an example to get traffic for this month, I use :
>>>>
>>>> SELECT `ip`,`type`,SUM(`inbound`) AS `in`, SUM(`outbound`) as `out`
>>>> FROM `accounting`
>>>> WHERE `record_time` BETWEEN 20060401000000 AND 20060413160000
>>>> GROUP BY `ip`,`type`
>>>>
>>>> this query takes aprox 7 seconds
>>>>
>>>> Using EXPLAIN gives :
>>>>
>>>> select_type    table              type      possible_keys         key 
>>>> key_len      ref          rows                                  Extra
>>>> SIMPLE         accounting   range       record_time     record_time 8 
>>>> NULL     362410     Using where; Using temporary; Using filesort
>>>>
>>>> If I remove the SUM functions I am getting also "Using index in 
>>>> group-by"
>>>> and the query takes only 0.25 sec
>>>>
>>>> Is there anyway to optimize this query to get faster responses ?
>
>
> For the original query, the index on "record_time" can be used to select 
> the records to be considered, and the base table must be accessed to sum 
> the "inbound" and "outbound" values.
>
> Without the summing, all information needed for the answer is in the 
> primary key, so an "index-only" strategy can be used (does not need the 
> base table, profits from key cache, ...).
> Also, by the "group by" clause each distinct combination of (ip, type) is 
> needed only once, so the amount of data that need to be handled is much 
> smaller.
>
> The summing must make a big difference, this cannot be avoided.
>
> You _might_ try an index (record_time, ip, type), because here the leading 
> (= most significant) part can be used for your "where condition", and the 
> next components support the "group by", so there is a slight chance to 
> avoid the sorting.
> Disclaimer: This is pure speculation from my part!
>
>
>>>>
>>> Set an index on ip and type and "probably" also on record_time
> >
> > Hmmm, I have omited that :
> >
> > I also have an index on (ip,type) in that order
>
>
> The index on (ip,type) is a prefix of the primary key (ip, type, 
> record_time), and in general any prefix of an existing key can be used.
>
> In other words: I doubt whether there are any statements that make use of 
> this additional index but would not make the same use of the primary key.
> But as an index needs to be maintained on each insert/update/delete, IMO 
> all this additional index does is slow down these statements (and take up 
> disk space).
>
> You might reduce workload by dropping that index, but it has no effect on 
> the two queries (except reduced file size, amount of data, and all these 
> general consequences).
>
>
> Regards,
> Joerg
>
> -- 
> Joerg Bruehe, Senior Production Engineer
> MySQL AB, www.mysql.com
> Office:  (+49 30) 417 01 487     VoIP: [EMAIL PROTECTED] 



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Optimizing SQL statement

Reply via email to