First of all, thanks to all for your answers. Second, I'll explain what I managed to do regarding this issue; maybe someone else will need it in the future.
So, first I changed the way the `ip` field is stored, from VARCHAR(15) to INT UNSIGNED, and populated it with the 4-byte value of the IP ( INET_ATON ). Then I did some benchmarking using the VARCHAR field, with ORDER BY NULL to avoid sorting. After that I ran the same benchmarks with the IP stored as a number ( of course I redefined the primary key, etc. ). I was surprised to find out that grouping by a VARCHAR column was FASTER than by an INT column. This surprised me because my logic says it should be easier to group by 4 bytes of data than by 15 bytes of data. The tests were done on the same computer, with the same database engine.

After that, I also converted the table to InnoDB ( originally it was MyISAM ) and ran the same tests ( IP as INT and as VARCHAR ). The results were completely different: grouping by IP as INT was faster than by IP as VARCHAR, and overall both queries were faster than when executed against the MyISAM table. I decided to keep the IP as INT and the table as InnoDB; the time is now reduced to 2.3 s ( avg ) - still too much, but faster than before anyway. I think table partitions from 5.1 will help by splitting records by year and month, but I'm waiting for it to become stable.

Queries used :

For IP as VARCHAR :

  SELECT `ip`, `type`, SUM(`inbound`) AS `in`, SUM(`outbound`) AS `out`
  FROM `accounting`
  WHERE `record_time` BETWEEN '20060101000000' AND '20060201000000'
  GROUP BY `ip`, `type`
  ORDER BY NULL

For IP as INT UNSIGNED :

  SELECT INET_NTOA(`ip`) AS `ip`, `type`, SUM(`inbound`) AS `in`, SUM(`outbound`) AS `out`
  FROM `accounting`
  WHERE `record_time` BETWEEN '20060101000000' AND '20060201000000'
  GROUP BY `ip`, `type`
  ORDER BY NULL

Results :

  table_engine   ip_type        avg_query_time (s)
  MyISAM         VARCHAR(15)    6.7
  MyISAM         INT UNSIGNED   9.4
  InnoDB         VARCHAR(15)    2.8
  InnoDB         INT UNSIGNED   2.3

That would be all, thanks again.

Puiu Hrenciuc

P.S.: Removing ORDER BY NULL adds "Using filesort" and the query takes longer.
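For reference, the conversion MySQL's INET_ATON / INET_NTOA perform can be mimicked in Python. This is only a sketch to show what the 4-byte storage looks like; it assumes well-formed IPv4 dotted-quad input, which is also what the queries above rely on:

```python
# Python equivalents of MySQL's INET_ATON / INET_NTOA for IPv4
# (sketch only; assumes well-formed dotted-quad input).
import socket
import struct

def inet_aton(ip: str) -> int:
    """Dotted-quad string -> unsigned 32-bit int, like MySQL INET_ATON."""
    # socket.inet_aton packs the address into 4 network-order bytes;
    # struct.unpack("!I", ...) reads them as one big-endian unsigned int.
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def inet_ntoa(n: int) -> str:
    """Unsigned 32-bit int -> dotted-quad string, like MySQL INET_NTOA."""
    return socket.inet_ntoa(struct.pack("!I", n))

print(inet_aton("192.168.0.1"))   # 3232235521
print(inet_ntoa(3232235521))      # 192.168.0.1
```

Storing this integer in an INT UNSIGNED column keeps the row 4 bytes wide instead of up to 15, and preserves ordering by network address.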
"Joerg Bruehe" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi all!
>
> (Sorry for the late reply.)
>
> Puiu Hrenciuc wrote (re-ordered):
>> "Barry" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>>> Puiu Hrenciuc wrote:
>>>> Hi,
>>>>
>>>> I have a table ( MyISAM, MySQL 5.0.20, FreeBSD ) containing network
>>>> traffic data :
>>>>
>>>>   record_time datetime  - time when the record was added
>>>>   ip char(15)           - ip that generated the traffic
>>>>   type tinyint(3)       - traffic type ( 1 - local, 2 - internet )
>>>>   inbound int(10)       - in bytes
>>>>   outbound int(10)      - out bytes
>>>>
>>>> Records are inserted every 5 minutes through a cron script.
>>>> Currently there are 3,330,367 rows.
>>>>
>>>> The primary index is defined on ( ip, type, record_time ), columns in
>>>> that order. There is also an index defined only on record_time.
>>>>
>>>> Now, for example, to get traffic for this month I use :
>>>>
>>>>   SELECT `ip`, `type`, SUM(`inbound`) AS `in`, SUM(`outbound`) AS `out`
>>>>   FROM `accounting`
>>>>   WHERE `record_time` BETWEEN 20060401000000 AND 20060413160000
>>>>   GROUP BY `ip`, `type`
>>>>
>>>> This query takes approx. 7 seconds.
>>>>
>>>> Using EXPLAIN gives :
>>>>
>>>>   select_type:   SIMPLE
>>>>   table:         accounting
>>>>   type:          range
>>>>   possible_keys: record_time
>>>>   key:           record_time
>>>>   key_len:       8
>>>>   ref:           NULL
>>>>   rows:          362410
>>>>   Extra:         Using where; Using temporary; Using filesort
>>>>
>>>> If I remove the SUM functions I also get "Using index for group-by"
>>>> and the query takes only 0.25 sec.
>>>>
>>>> Is there any way to optimize this query to get faster responses ?
>
>
> For the original query, the index on "record_time" can be used to select
> the records to be considered, and the base table must be accessed to sum
> the "inbound" and "outbound" values.
>
> Without the summing, all information needed for the answer is in the
> primary key, so an "index-only" strategy can be used (it does not need
> the base table, profits from the key cache, ...).
> Also, because of the "group by" clause each distinct combination of
> (ip, type) is needed only once, so the amount of data that needs to be
> handled is much smaller.
>
> The summing must make a big difference; this cannot be avoided.
>
> You _might_ try an index (record_time, ip, type), because here the
> leading (= most significant) part can be used for your "where" condition,
> and the next components support the "group by", so there is a slight
> chance to avoid the sorting.
> Disclaimer: This is pure speculation on my part!
>
>
>>> Set an index on ip and type and "probably" also on record_time
>>
>> Hmmm, I omitted that :
>>
>> I also have an index on (ip, type), in that order
>
> The index on (ip, type) is a prefix of the primary key (ip, type,
> record_time), and in general any prefix of an existing key can be used.
>
> In other words: I doubt whether there are any statements that make use
> of this additional index but would not make the same use of the primary
> key. But as an index needs to be maintained on each insert/update/delete,
> IMO all this additional index does is slow down these statements (and
> take up disk space).
>
> You might reduce the workload by dropping that index, but it has no
> effect on the two queries (except reduced file size, amount of data, and
> all these general consequences).
>
>
> Regards,
> Joerg
>
> --
> Joerg Bruehe, Senior Production Engineer
> MySQL AB, www.mysql.com
> Office: (+49 30) 417 01 487   VoIP: [EMAIL PROTECTED]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
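Joerg's suggested (record_time, ip, type) index can be tried out on a toy copy of the schema. The sketch below uses Python's sqlite3 as a stand-in for MySQL (the index-ordering idea is the same: range column first, then the GROUP BY columns); the table layout follows the thread, but the sample rows and index name are invented for illustration:

```python
# Toy demonstration of the (record_time, ip, type) index idea from the
# thread, using SQLite in place of MySQL. Sample data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE accounting (
    record_time TEXT,   -- datetime stored as 'YYYYMMDDHHMMSS'
    ip INTEGER,         -- IP as unsigned 32-bit int (INET_ATON style)
    type INTEGER,       -- 1 = local, 2 = internet
    inbound INTEGER,
    outbound INTEGER)""")

# Range column first, then the GROUP BY columns, as Joerg suggests.
cur.execute("CREATE INDEX idx_time_ip_type "
            "ON accounting (record_time, ip, type)")

rows = [
    ("20060105120000", 3232235521, 1, 100, 50),
    ("20060110120000", 3232235521, 1, 200, 75),
    ("20060115120000", 3232235522, 2, 300, 25),
    ("20060301120000", 3232235521, 1, 999, 999),  # outside the range
]
cur.executemany("INSERT INTO accounting VALUES (?,?,?,?,?)", rows)

# The query from the thread, with WHERE restricting the range and
# GROUP BY collapsing each (ip, type) combination to one row.
cur.execute("""SELECT ip, type, SUM(inbound), SUM(outbound)
               FROM accounting
               WHERE record_time BETWEEN '20060101000000'
                                     AND '20060201000000'
               GROUP BY ip, type""")
result = sorted(cur.fetchall())
print(result)
```

On MySQL, `EXPLAIN` on the real table would show whether the new index actually removes "Using temporary; Using filesort"; as Joerg says, that part is speculation until measured.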