RE: optimizing a select statement over a database with >50 million recs

Donny Simonton Sun, 29 Feb 2004 16:05:57 -0800

>From looking at this, you have a using temporary, on table b.  Which may be
the biggest slow down, it's hard to somewhat tell.  I assume you have
indexes on everything...  I'm not sure where you are running explain from,
but when I run one it looks like below.  You can easily see where problems
are, like Using Filesort, but that's because I am using an order by.  And
This query is going against 6 tables with over 50 million rows in each.


+----+-------------+------------------+--------+-------------------------+--
-------+---------+--------------------------------+------+------------------
--------+
| id | select_type | table            | type   | possible_keys           |
key     | key_len | ref                            | rows | Extra
|
+----+-------------+------------------+--------+-------------------------+--
-------+---------+--------------------------------+------+------------------
--------+
|  1 | SIMPLE      | Phrase           | const  | PRIMARY,phrase          |
phrase  |      80 | const                          |    1 | Using filesort
|
|  1 | SIMPLE      | WordStem         | const  | PRIMARY,stem            |
PRIMARY |       4 | const                          |    1 |
|
|  1 | SIMPLE      | Word             | const  | PRIMARY,word            |
PRIMARY |       4 | const                          |    1 |
|
|  1 | SIMPLE      | WordScoreTemp    | ref    | PRIMARY,word_exist,word |
word    |      80 | const                          |   15 | Using where
|
|  1 | SIMPLE      | RelatedStopWords | eq_ref | PRIMARY                 |
PRIMARY |      45 | WordMagic.WordScoreTemp.phrase |    1 | Using where;
Using index |
|  1 | SIMPLE      | Price            | eq_ref | PRIMARY                 |
PRIMARY |      45 | WordMagic.WordScoreTemp.phrase |    1 |
|
+----+-------------+------------------+--------+-------------------------+--
-------+---------+--------------------------------+------+------------------
--------+

But looking at your explain with is hard to understand as I mentioned above,
it look like your rows are 5281 * 1 * 1 * 4 * 2 or 42248 rows it has to go
through.  Hell, I'm going through it and it's baffling my mind exactly what
those numbers are for.  Can you run explain from the command line and not
remove all the headers so we can make everything out.

Also see if you can change WHERE (f.name_id =45 OR f.name_id =56) to use IN.
It has proven to be much faster for me.  But that won't speed it up .3
seconds.

Donny


> -----Original Message-----
> From: karthik viswanathan [mailto:[EMAIL PROTECTED]
> Sent: Sunday, February 29, 2004 5:41 PM
> To: Donny Simonton; 'karthik viswanathan'; [EMAIL PROTECTED]
> Subject: RE: optimizing a select statement over a database with >50
> million recs
> 
> Thanks for looking at my problem. here is the explain for this version of
> select
> 
> b | ref | id,p_id,combine | combine | 4 | const | 5281 | Using where;
> Using index;
> Using temporary |
> c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | b.p_id | 1 |  |  |
> d | ref | name_id,out_id | out_id | 4 | c.out_id | 1 |  |  |
> a | ref | p_id | p_id | 4 | b.p_id | 3 | Using where |
> e | eq_ref | PRIMARY,start_id,combine,name_id | PRIMARY | 4 | d.name_id |
> 1 |
> Distinct |
> f | range | PRIMARY,name_id | PRIMARY | 4 | NULL | 2 | Using | where; |
> Distinct |
> 
> Karthik
> 
> 
> > Can you send us a new explain on this version?
> >
> > Donny
> >
> > > -----Original Message-----
> > > From: karthik viswanathan [mailto:[EMAIL PROTECTED]
> > > Sent: Sunday, February 29, 2004 4:12 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: optimizing a select statement over a database with >50
> > > million recs
> > >
> > > here is the updated select statement i came with but still needs
> > > improvement
> > >
> > > SELECT Distinct (a.id)
> > > From table1 a
> > > INNER JOIN table1 b
> > > USING ( p_id )
> > > INNER JOIN table2 c
> > > USING ( p_id )
> > > INNER JOIN table3 d
> > > USING ( out_id )
> > > INNER JOIN table4 e ON ( d.name_id = e.name_id )
> > > INNER JOIN table4 f  ON ( e.start_id
> > > BETWEEN f.left_id AND f.end_id )
> > > WHERE (f.name_id =45 OR f.name_id =56)
> > > AND b.id =275 AND a.id != b.id
> > >
> > > This reduced the time from 0.5 secs to 0.3 secs but still i am looking
> for
> > > some major
> > > improvemnts. I am using this query in PHP and its been repeated
> several
> > > times to
> > > display a list. Any advice on this will be really useful. Is there
> > > anything I could
> > > do with the mysql and php configuration to increase the performance?
> The
> > > server has
> > > just 1GB ram, will adding more memory help?
> > >
> > > Thanks
> > > Karthik.
> > >
> > > > Need help on optimizing the select statement:
> > > >
> > > > Table structure
> > > >
> > > > table1
> > > > ------
> > > > id
> > > > p_id
> > > >
> > > > table2
> > > > ------
> > > > p_id
> > > > out_id
> > > >
> > > > table3
> > > > ------
> > > > out_id
> > > > name_id
> > > >
> > > > table4
> > > > -------
> > > > name_id (unique)
> > > > prev_id
> > > > start_id (unique)
> > > > end_id (unique)
> > > >
> > > > Only table4 has unique fields all other fields are not unique.
> > > >
> > > > The following select statement does what is required but the tables
> are
> > > really
> > > > huge
> > > > (> 50 million records) so need to be optimized
> > > >
> > > > SELECT DISTINCT (a.id)
> > > > FROM table1 a, table1 b, table2 c, table3 d, table4 e, table4 f
> > > > WHERE a.id =275 AND a.p_id = b.p_id AND a.id != b.id
> > > >   AND a.p_id = c.p_id AND c.out_id = d.out_id AND d.name_id =
> e.name_id
> > > >   AND e.start_id >= f.start_id AND e.end_id <= f.end_id AND e.end_id
> !=0
> > > >   AND (f.name_id =45 OR f.name_id =56)
> > > > GROUP BY b.id
> > > >
> > > > The explain for the above statement is
> > > >
> > > > a | ref | id,p_id,combine | combine | 4 | const | 5281 | Using
> where;
> > > Using
> > > > index;
> > > > Using temporary; Using f...
> > > > c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | a.p_id | 1 |  |  |
> > > > d | ref | name_id,out_id | out_id | 4 | b.out_id | 1 |  |  |
> > > > b | ref | p_id | p_id | 4 | a.p_id | 3 | Using where |
> > > > e | range | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 |
> NULL
> > > | 2 |
> > > > Using
> > > > where |
> > > > f | eq_ref | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 |
> > > c.name_id |
> > >
> > > > 1 |
> > > > Using where |
> > > >
> > > > I am sure there should be some better way to do this using Inner
> join or
> > > > something
> > > > similar but I am not sure how. It will be helpful if you could
> suggest
> > > me some
> > > > improvements for this query. If you need any further explanation
> please
> > > let me
> > > > know.
> > > >
> > > > Thanks for your help
> > > > Karu.
> > > >
> > > >
> > > >
> > > > --
> > > > MySQL General Mailing List
> > > > For list archives: http://lists.mysql.com/mysql
> > > > To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe:
> > > http://lists.mysql.com/[EMAIL PROTECTED]
> > >
> >
> >
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> >
> 
> 
> 
> 
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
> 




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: optimizing a select statement over a database with >50 million recs

Reply via email to