Re: stunningly slow query

SGreen Sun, 02 Apr 2006 21:36:27 -0700

[EMAIL PROTECTED] wrote on 04/02/2006 05:35:59 PM:

<snip>
> > > Can you post your show create table tbl_name statement for 
> > > these tables that involve slow queries?
> > 
> > | old_crumb |CREATE TABLE `old_crumb` (
> >   `link_ID` bigint(20) default NULL,
> >   `dir_Travel` char(1) default NULL,
> >   `customer_ID` int(11) NOT NULL default '0',
> >   `source_ID` int(11) NOT NULL default '0',
> >   `vehicle_ID` int(11) NOT NULL default '0',
> >   `actual_Time` datetime NOT NULL default '0000-00-00 00:00:00',
<snip>
> >   `last_Modified` datetime default NULL,
> >   PRIMARY KEY  (`customer_ID`,`source_ID`,`vehicle_ID`,`actual_Time`),
> >   KEY `old_crumb_ix_reported_Time` (`reported_Time`),
> >   KEY `old_crumb_ix_link_ID` (`link_ID`,`dir_Travel`)
> > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=1000000000 
> COMMENT='List of breadcrumbs already rolled up.' INDEX 
> DIRECTORY='/var/mysql_idx/trimble/' |
> 
> I'm no DB expert Chris but do you really need to create a 
> primary key index over 4 columns?
> 
> What about something simple and possibly faster like adding 
> a seperate ID primary key column to the table like:
> 
>  | old_crumb |CREATE TABLE `old_crumb` (
>    `ID` int unsigned not null auto_increment
>    `link_ID` bigint(20) default NULL,
>    `dir_Travel` char(1) default NULL,
>    `customer_ID` int(11) NOT NULL default '0',
>    `source_ID` int(11) NOT NULL default '0',
>    `vehicle_ID` int(11) NOT NULL default '0',
> snip
>    PRIMARY KEY (`ID`),
> snip
> 
> An unsigned int will take an extra 4 bytes of storage space 
> per row, and will give you an index range of 0 - 4294967295.
> 
> If that is not enough range, an unsigned bigint will take an 
> extra 8 bytes of storage space, and will give you an index 
> range of 0 - 18446744073709551615.
> 
> Although this will increase the amount of storage space 
> required in the .MYD file, it may also decrease the amount 
> of space required in the .MYI index file, as you would not 
> be needing to store multi-column indexes.
>


Keith,
Your method won't guarantee that there are no rows where the combination 
of the values in those four columns fails to repeat in any other row. To 
do that would require an EXTRA four-column unique index of type UNIQUE. 
Your proposal would actually make the situation worse as now there would 
be two indexes to maintain to achieve the same effect as the previous 
single PK.

> > This is the other
> > 
> >  link_area | CREATE TABLE `link_area` (
> >   `link_ID` bigint(20) NOT NULL default '0',
> >   `dir_Travel` char(1) NOT NULL default '',
> >   `area_ID` int(11) NOT NULL default '0',
> >   PRIMARY KEY  (`link_ID`,`dir_Travel`),
> >   KEY `link_area_ix_area_ID` (`area_ID`)
> > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='Link area table' 
> INDEX DIRECTORY='/var/mysql_idx/landsonar/' |
> > 
> > Inserts into the link_area were going very very slowly while data
> > was being moved into old_crumb. old_crumb is large - my suspicion
> > at this point is that the process of looking for key conflicts was
> > slowing things down and starving other query traffic.
> 
> The same could be applied to the link_area table:
> Do you need dir_travel as part of the primary key?

If dir_travel is part of what makes each row different than every other 
row, then YES he needs that column as part of his primary key. 

> 
> >  link_area | CREATE TABLE `link_area` (
> >   `link_ID` bigint(20) NOT NULL default '0',
> >   `dir_Travel` char(1) NOT NULL default '',
> >   `area_ID` int(11) NOT NULL default '0',
> >   PRIMARY KEY  (`link_ID`),
> >   KEY `link_area_ix_area_ID` (`area_ID`)
> > ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='Link area 
> table' INDEX DI$
> 
> Regards 
> 
> Keith
> 

Your suggestions were well intended. However, this seems to me that his 
key caches are just not large enough to keep the whole key in memory. It 
may be possible for him to maintain a smaller "current" or "daily" table 
that is then batch merged into the larger "historic" copy of his old-crum 
table. I believe he is correct in guessing that his insert traffic to 
old_crum is interfering with the inserts into link_area and that the most 
likely cause is the need to both validate the new rows against the PK and 
add the new rows into the PK.

In the spirit of Keith's suggestion, is there any reason why you cannot 
make a hash or lookup table of all of your 
(`customer_ID`,`source_ID`,`vehicle_ID`) triplets and replace those 
columns in old_crum (and it's PK) with the single value?  That way you 
don't lose your row uniqueness but gain space in your PK. You could also 
reduce your actual_time column to an integer value (instead of a date 
value) so that you are comparing against a numeric value when you compare 
against the PK?

When you are dealing with 10s of millions of rows like you are, these 
little changes can make some big differences.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: stunningly slow query

Reply via email to