Re: Table joins are slow things to deal with. . .

2003-01-22 Thread Stefan Hinz, iConnect \(Berlin\)
Steve,

 ([Defendant] Query WITH a join - 8.79 seconds!
 EXPLAIN SELECT Defendants.CaseNumber FROM Defendants, Cases WHERE
 Cases.CaseNumber = Defendants.CaseNumber AND Filed = 1999-01-01 AND
 (Defendant LIKE owen% OR Defendant LIKE pitts% OR Defendant LIKE
 general%);

First thing that comes into mind is: You said you indexed the join
fields (i.e. CaseNumber), but what about Defendants.Defendant? Okay, you
did. Hmm.

Next thing would be to check if the appropriate keys were used
(CaseNumber, Defendant). Yes, they were used.

Next thing would be to check if the optimizer chose a good join type:

 | Defendants | range  | CaseNumber,Defendant | Defendant   |  30 |
 | Cases  | eq_ref | CasesNumber,Filed| CasesNumber |  30 |

The join types are range and eq_ref, that's fine.

The number of examined rows in Defendants seems okay, too:

 rows  | Extra  |
 82756 | where used |

Okay, I leave this one to the gurus :-/

Regards,
--
  Stefan Hinz [EMAIL PROTECTED]
  Geschäftsführer / CEO iConnect GmbH http://iConnect.de
  Heesestr. 6, 12169 Berlin (Germany)
  Tel: +49 30 7970948-0  Fax: +49 30 7970948-3

- Original Message -
From: Steve Quezadas [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, January 22, 2003 12:32 AM
Subject: Table joins are slow things to deal with. . .


 Maybe I'm dumb for saying this, but sql joins seems expensive to do in
 terms of performance (yes, I indexed the joined fields). If I do a
query
 search of a 2,600,000 record defendant table WITHOUT a join (SELECT
 DISTINCT CaseNumber FROM Defendants WHERE Defendant LIKE owen% OR
 Defendant LIKE pitts% OR Defendant LIKE general%;). Performance is
 generally zippy at .53 seconds (which ends up pulling about 34,000
rows).

 HOWEVER, once I join this table with the much smaller [Cases] table
 (about 140,000 rows), performance plummets to 8.79 seconds! Quite a
 drop! The SQL statement is: 'SELECT Defendants.CaseNumber FROM
 Defendants, Cases WHERE Cases.CaseNumber = Defendants.CaseNumber AND
 Filed = 1999-01-01 AND (Defendant LIKE owen% OR Defendant LIKE
 pitts% OR Defendant LIKE general%);'. I get about 10,500 rows
 returned here, but man, what an increase in time.

 So my main question is: Is this degradation in performance to be
 expected with a join? What I am thinking about doing is perhaps doing
 two smaller queries instead of one larger one. My first query can just
 query the very large Defendants table without a join and create a
 temporary table of CaseNumbers (takes about .53 seconds). Then I can
 join that temporary table into the much smaller [Cases] table and then
 run that part of the query (which takes about 1.23 seconds).

 So bottom line:
 Results of big query with one join: 8.79 seconds
 Results of creating two smaller queries: 1.76 seconds (.53 seconds +
 1.23 seconds).

 So I am thinking about optimizing the query in my PHP program to make
 two smaller queries instead of one bigger query. Is this a standard
 thing to do to increase speed? Or am I doing something stupid and
 inefficient in the join that I am not aware of?

 - Steve

 PS Here is some information about my tables and indexes:

 ([Defendant] Query WITHOUT a join) - .53 seconds.
 EXPLAIN SELECT DISTINCT CaseNumber FROM Defendants WHERE Defendant
LIKE
 owen% OR Defendant LIKE pitts% OR Defendant LIKE general%;


++---+---+---+-+--+-
--+-+
 | table  | type  | possible_keys | key   | key_len | ref  |
 rows  | Extra   |

++---+---+---+-+--+-
--+-+
 | Defendants | range | Defendant | Defendant |  30 | NULL |
 82756 | where used; Using temporary |

++---+---+---+-+--+-
--+-+

 ([Defendant] Query WITH a join - 8.79 seconds!
 EXPLAIN SELECT Defendants.CaseNumber FROM Defendants, Cases WHERE
 Cases.CaseNumber = Defendants.CaseNumber AND Filed = 1999-01-01 AND
 (Defendant LIKE owen% OR Defendant LIKE pitts% OR Defendant LIKE
 general%);

+++--+-+-+--
-+---++
 | table  | type   | possible_keys| key | key_len |
 ref   | rows  | Extra  |

+++--+-+-+--
-+---++
 | Defendants | range  | CaseNumber,Defendant | Defendant   |  30 |
 NULL  | 82756 | where used |
 | Cases  | eq_ref | CasesNumber,Filed| CasesNumber |  30 |
 Defendants.CaseNumber | 1 | where used |

+++--+-+-+--
-+---++

 Table and index information
 Cases Table - 140,000 records
 +---+---+--+-+-+---+
 | Field

Re: Table joins are slow things to deal with. . .

2003-01-22 Thread Michael T. Babcock
Steve Quezadas wrote:


PS Here is some information about my tables and indexes:



Maybe I missed it, but where's the EXPLAIN on the JOIN query?

--
Michael T. Babcock
C.T.O., FibreSpeed Ltd.
http://www.fibrespeed.net/~mbabcock



-
Before posting, please check:
  http://www.mysql.com/manual.php   (the manual)
  http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Table joins are slow things to deal with. . .

2003-01-22 Thread Benjamin Pflugmann
Hello.

On Wed 2003-01-22 at 09:13:20 +0100, [EMAIL PROTECTED] wrote:
 Steve,
 
  ([Defendant] Query WITH a join - 8.79 seconds!
  EXPLAIN SELECT Defendants.CaseNumber FROM Defendants, Cases WHERE
  Cases.CaseNumber = Defendants.CaseNumber AND Filed = 1999-01-01 AND
  (Defendant LIKE owen% OR Defendant LIKE pitts% OR Defendant LIKE
  general%);
 
 First thing that comes into mind is: You said you indexed the join
 fields (i.e. CaseNumber), but what about Defendants.Defendant? Okay, you
 did. Hmm.
 
 Next thing would be to check if the appropriate keys were used
 (CaseNumber, Defendant). Yes, they were used.
 
 Next thing would be to check if the optimizer chose a good join type:
 
  | Defendants | range  | CaseNumber,Defendant | Defendant   |  30 |
  | Cases  | eq_ref | CasesNumber,Filed| CasesNumber |  30 |
 
 The join types are range and eq_ref, that's fine.
 
 The number of examined rows in Defendants seems okay, too:
 
  rows  | Extra  |
  82756 | where used |

Good check list. :-)

Two things I noticed: Using a 30-byte wide index (CaseNumber) is not
fastest. Probably it would help to create an additional INT
AUTO_INCREMENT with key and join via that. That reduces the amount of
data to be read from disk for the index by the factor 4.25
((30+4)/(4+4)), and also cuts down the time for comparisions and
improves relative effictivness of the key cache. Whether this the main
reason for the slowness, I don't know, but it cannot hurt to try.

Another thing that I would try is to to create a combined index on
(Filed,CaseNumber) and (CaseNumber,Filed). Drop the one that does not
get used afterwards. This should enable MySQL to resolve the join only
using the index: currently it has to load the data records in order to
check whether Filed is in range (i.e. it has to load 34,000 records
although only 10,500 are returned afterwards that is about 24,000
seeks and reads without need). An index on (Defendend,CaseNumber) will
also help. Of course, if you introduce an id (the INT) column as
suggested above, use that instead of CaseNumber in the combined
indexes.

And at last, be sure to run OPTIMIZE and ANALYZE on the tables (of,
course, you can also start with that :-).

Please check speed after each change and report back. I am very
interested to hear whether any of the suggestion had real effect in
your case (the EXPLAIN for each would also be nice).

Btw, what size is your key_cache? And how many reads/misses does it
have?

HTH,

Benjamin.


 Okay, I leave this one to the gurus :-/

Not sure that I'd call me like this, but I chose to add my 2 cents
anyhow. ;)

[...]
  Maybe I'm dumb for saying this, but sql joins seems expensive to do in
  terms of performance (yes, I indexed the joined fields). If I do a
 query
  search of a 2,600,000 record defendant table WITHOUT a join (SELECT
  DISTINCT CaseNumber FROM Defendants WHERE Defendant LIKE owen% OR
  Defendant LIKE pitts% OR Defendant LIKE general%;). Performance is
  generally zippy at .53 seconds (which ends up pulling about 34,000
 rows).
 
  HOWEVER, once I join this table with the much smaller [Cases] table
  (about 140,000 rows), performance plummets to 8.79 seconds! Quite a
  drop! The SQL statement is: 'SELECT Defendants.CaseNumber FROM
  Defendants, Cases WHERE Cases.CaseNumber = Defendants.CaseNumber AND
  Filed = 1999-01-01 AND (Defendant LIKE owen% OR Defendant LIKE
  pitts% OR Defendant LIKE general%);'. I get about 10,500 rows
  returned here, but man, what an increase in time.
[...]
  ([Defendant] Query WITHOUT a join) - .53 seconds.
  EXPLAIN SELECT DISTINCT CaseNumber FROM Defendants WHERE Defendant
 LIKE
  owen% OR Defendant LIKE pitts% OR Defendant LIKE general%;
 
 
 ++---+---+---+-+--+-
 --+-+
  | table  | type  | possible_keys | key   | key_len | ref  |
  rows  | Extra   |
 
 ++---+---+---+-+--+-
 --+-+
  | Defendants | range | Defendant | Defendant |  30 | NULL |
  82756 | where used; Using temporary |
 
 ++---+---+---+-+--+-
 --+-+
 
  ([Defendant] Query WITH a join - 8.79 seconds!
  EXPLAIN SELECT Defendants.CaseNumber FROM Defendants, Cases WHERE
  Cases.CaseNumber = Defendants.CaseNumber AND Filed = 1999-01-01 AND
  (Defendant LIKE owen% OR Defendant LIKE pitts% OR Defendant LIKE
  general%);
 
 +++--+-+-+--
 -+---++
  | table  | type   | possible_keys| key | key_len |
  ref   | rows  | Extra  |
 
 +++--+-+-+--
 -+---++
  | Defendants | range  | CaseNumber,Defendant | Defendant   |  30 |
  NULL 

Table joins are slow things to deal with. . .

2003-01-21 Thread Steve Quezadas
Maybe I'm dumb for saying this, but sql joins seems expensive to do in 
terms of performance (yes, I indexed the joined fields). If I do a query 
search of a 2,600,000 record defendant table WITHOUT a join (SELECT 
DISTINCT CaseNumber FROM Defendants WHERE Defendant LIKE owen% OR 
Defendant LIKE pitts% OR Defendant LIKE general%;). Performance is 
generally zippy at .53 seconds (which ends up pulling about 34,000 rows).

HOWEVER, once I join this table with the much smaller [Cases] table 
(about 140,000 rows), performance plummets to 8.79 seconds! Quite a 
drop! The SQL statement is: 'SELECT Defendants.CaseNumber FROM 
Defendants, Cases WHERE Cases.CaseNumber = Defendants.CaseNumber AND 
Filed = 1999-01-01 AND (Defendant LIKE owen% OR Defendant LIKE 
pitts% OR Defendant LIKE general%);'. I get about 10,500 rows 
returned here, but man, what an increase in time.

So my main question is: Is this degradation in performance to be 
expected with a join? What I am thinking about doing is perhaps doing 
two smaller queries instead of one larger one. My first query can just 
query the very large Defendants table without a join and create a 
temporary table of CaseNumbers (takes about .53 seconds). Then I can 
join that temporary table into the much smaller [Cases] table and then 
run that part of the query (which takes about 1.23 seconds).

So bottom line:
Results of big query with one join: 8.79 seconds
Results of creating two smaller queries: 1.76 seconds (.53 seconds + 
1.23 seconds).

So I am thinking about optimizing the query in my PHP program to make 
two smaller queries instead of one bigger query. Is this a standard 
thing to do to increase speed? Or am I doing something stupid and 
inefficient in the join that I am not aware of?

- Steve

PS Here is some information about my tables and indexes:

([Defendant] Query WITHOUT a join) - .53 seconds.
EXPLAIN SELECT DISTINCT CaseNumber FROM Defendants WHERE Defendant LIKE 
owen% OR Defendant LIKE pitts% OR Defendant LIKE general%;

++---+---+---+-+--+---+-+
| table  | type  | possible_keys | key   | key_len | ref  | 
rows  | Extra   |
++---+---+---+-+--+---+-+
| Defendants | range | Defendant | Defendant |  30 | NULL | 
82756 | where used; Using temporary |
++---+---+---+-+--+---+-+

([Defendant] Query WITH a join - 8.79 seconds!
EXPLAIN SELECT Defendants.CaseNumber FROM Defendants, Cases WHERE 
Cases.CaseNumber = Defendants.CaseNumber AND Filed = 1999-01-01 AND 
(Defendant LIKE owen% OR Defendant LIKE pitts% OR Defendant LIKE 
general%);
+++--+-+-+---+---++
| table  | type   | possible_keys| key | key_len | 
ref   | rows  | Extra  |
+++--+-+-+---+---++
| Defendants | range  | CaseNumber,Defendant | Defendant   |  30 | 
NULL  | 82756 | where used |
| Cases  | eq_ref | CasesNumber,Filed| CasesNumber |  30 | 
Defendants.CaseNumber | 1 | where used |
+++--+-+-+---+---++

Table and index information
Cases Table - 140,000 records
+---+---+--+-+-+---+
| Field | Type  | Null | Key | Default | Extra |
+---+---+--+-+-+---+
| CaseNumber| char(30)  |  | PRI | |   |
| County| char(30)  |  | | |   |
| CaseName  | char(120) |  | | |   |
| Court | char(30)  |  | | |   |
| Filed | datetime  | YES  | MUL | NULL|   |
| NumPlaintiffs | tinyint(4)| YES  | | NULL|   |
| DateAdded | timestamp(14) | YES  | | NULL|   |
+---+---+--+-+-+---+

Defendants table - 2,600,000 records
+-+---+--+-+-+---+
| Field   | Type  | Null | Key | Default | Extra |
+-+---+--+-+-+---+
| CaseNumber  | char(30)  |  | MUL | |   |
| Defendant   | char(30)  |  | MUL | |   |
| DateEntered | timestamp(14) | YES  | | NULL|   |
+-+---+--+-+-+---+

Index of Defendants
++++--+-+---+-+--++-+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | 
Collation | Cardinality