Howto optimize Distinct query over 1.3mil rows?

2010-09-28 Thread John Stoffel

Hi,

I'm running MySQL 5.0.51a-24+lenny3-log  on a Debian Lenny box with
2Gb of RAM and a pair of dual core 2.6Ghz CPUs.  I'm using a pair of
40Gb disks mirrored using MD (Linux software RAID) for both the OS and
the storage of the mysql DBs.  

My problem child is doing this simple query:

   mysql select distinct Call_No from Newspaper_Issues
   mysql WHERE BIB_ID = 464;
   +--+
   | Call_No  |
   +--+
   | News | 
   | NewsD CT | 
   +--+
   2 rows in set (2.98 sec)

The Newspaper_Issues table has 1.3 million rows, and has a structure
like this:

   mysql describe  Newspaper_Issues;
   ++-+--+-+-++
   | Field  | Type| Null | Key | Default | Extra  |
   ++-+--+-+-++
   | Record_No  | int(11) | NO   | PRI | NULL| auto_increment | 
   | BIB_ID | varchar(38) | NO   | MUL | NULL|| 
   | Issue_Date | date| NO   | MUL | NULL|| 
   | Type_Code  | char(1) | NO   | | r   || 
   | Condition_Code | char(1) | NO   | | o   || 
   | Format_Code| char(1) | NO   | | p   || 
   | Edition_Code   | char(1) | NO   | | n   || 
   | Date_Type_Code | char(1) | NO   | | n   || 
   | Ed_Type| tinyint(1)  | NO   | | 1   || 
   | RS_Code| char(1) | NO   | | c   || 
   | Call_No| varchar(36) | YES  | MUL | NULL|| 
   | Printed_Date   | varchar(10) | YES  | | NULL|| 
   | Update_Date| date| NO   | | NULL|| 
   ++-+--+-+-++
   13 rows in set (0.00 sec)


I've tried adding various indexes, and reading up on howto optimize
DISTINCT or GROUP BY queries, but I'm hitting a wall here.  My current indexes 
are:

mysql show index from Newspaper_Issues;
+--++--+--+-+---+-+--++--++-+
| Table| Non_unique | Key_name | Seq_in_index | 
Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | 
Comment |
+--++--+--+-+---+-+--++--++-+
| Newspaper_Issues |  0 | PRIMARY  |1 | 
Record_No   | A | 1333298 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | BIB_ID   |1 | BIB_ID  
| A |   14980 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | Call_No  |1 | Call_No 
| A | 927 | NULL | NULL   | YES  | BTREE  | 
| 
| Newspaper_Issues |  1 | Issue_Date   |1 | 
Issue_Date  | A |   49381 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | BIB_ID_Issue_Date|1 | BIB_ID  
| A |   14980 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | BIB_ID_Issue_Date|2 | 
Issue_Date  | A | 1333298 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | call_no_short|1 | Call_No 
| A |  30 |6 | NULL   | YES  | BTREE  | 
| 
| Newspaper_Issues |  1 | BIB_ID_call_no_short |1 | BIB_ID  
| A |   14980 | NULL | NULL   |  | BTREE  | 
| 
| Newspaper_Issues |  1 | BIB_ID_call_no_short |2 | Call_No 
| A |   15503 | NULL | NULL   | YES  | BTREE  | 
| 
| Newspaper_Issues |  1 | call_no_bib_id   |1 | Call_No 
| A | 927 | NULL | NULL   | YES  | BTREE  | 
| 
| Newspaper_Issues |  1 | call_no_bib_id   |2 | BIB_ID  
| A |   15503 | NULL | NULL   |  | BTREE  | 
| 
+--++--+--+-+---+-+--++--++-+
11 rows in set (0.00 sec)


So now when I do an explain on my query I get:

mysql explain select distinct(Call_No) from Newspaper_Issues WHERE BIB_ID 
= 464;

++-+--+---+---++-+--+-+--+
| id | select_type | table| 

Re: Howto optimize Distinct query over 1.3mil rows?

2010-09-28 Thread Johnny Withers
BIB_ID is VARCHAR, you should probably try

WHERE BIB_ID='464' so MySQL treats the value as a string

JW


On Tue, Sep 28, 2010 at 10:02 AM, John Stoffel j...@stoffel.org wrote:


 Hi,

 I'm running MySQL 5.0.51a-24+lenny3-log  on a Debian Lenny box with
 2Gb of RAM and a pair of dual core 2.6Ghz CPUs.  I'm using a pair of
 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
 the storage of the mysql DBs.

 My problem child is doing this simple query:

   mysql select distinct Call_No from Newspaper_Issues
   mysql WHERE BIB_ID = 464;
   +--+
   | Call_No  |
   +--+
   | News |
   | NewsD CT |
   +--+
   2 rows in set (2.98 sec)

 The Newspaper_Issues table has 1.3 million rows, and has a structure
 like this:

   mysql describe  Newspaper_Issues;
   ++-+--+-+-++
   | Field  | Type| Null | Key | Default | Extra  |
   ++-+--+-+-++
   | Record_No  | int(11) | NO   | PRI | NULL| auto_increment |
   | BIB_ID | varchar(38) | NO   | MUL | NULL||
   | Issue_Date | date| NO   | MUL | NULL||
   | Type_Code  | char(1) | NO   | | r   ||
   | Condition_Code | char(1) | NO   | | o   ||
   | Format_Code| char(1) | NO   | | p   ||
   | Edition_Code   | char(1) | NO   | | n   ||
   | Date_Type_Code | char(1) | NO   | | n   ||
   | Ed_Type| tinyint(1)  | NO   | | 1   ||
   | RS_Code| char(1) | NO   | | c   ||
   | Call_No| varchar(36) | YES  | MUL | NULL||
   | Printed_Date   | varchar(10) | YES  | | NULL||
   | Update_Date| date| NO   | | NULL||
   ++-+--+-+-++
   13 rows in set (0.00 sec)


 I've tried adding various indexes, and reading up on howto optimize
 DISTINCT or GROUP BY queries, but I'm hitting a wall here.  My current
 indexes are:

 mysql show index from Newspaper_Issues;

 +--++--+--+-+---+-+--++--++-+
 | Table| Non_unique | Key_name | Seq_in_index |
 Column_name | Collation | Cardinality | Sub_part | Packed | Null |
 Index_type | Comment |

 +--++--+--+-+---+-+--++--++-+
 | Newspaper_Issues |  0 | PRIMARY  |1 |
 Record_No   | A | 1333298 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID   |1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | Call_No  |1 |
 Call_No | A | 927 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | Issue_Date   |1 |
 Issue_Date  | A |   49381 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_Issue_Date|1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_Issue_Date|2 |
 Issue_Date  | A | 1333298 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_short|1 |
 Call_No | A |  30 |6 | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_call_no_short |1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_call_no_short |2 |
 Call_No | A |   15503 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_bib_id   |1 |
 Call_No | A | 927 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_bib_id   |2 |
 BIB_ID  | A |   15503 | NULL | NULL   |  | BTREE
  | |

 +--++--+--+-+---+-+--++--++-+
 11 rows in set (0.00 sec)


 So now when I do an explain on my query I get:

mysql explain select distinct(Call_No) from Newspaper_Issues WHERE
 BIB_ID = 464;

  
 

Re: Howto optimize Distinct query over 1.3mil rows?

2010-09-28 Thread Johan De Meersman
If Cal_NO is a recurring value, then yes, that is the way it should be done
in a relational schema.

Your index cardinality of 15.000 against 1.3 million rows is reasonable,
although not incredible; is your index cache large enough to acccomodate all
your indices ?


On Tue, Sep 28, 2010 at 5:02 PM, John Stoffel j...@stoffel.org wrote:


 Hi,

 I'm running MySQL 5.0.51a-24+lenny3-log  on a Debian Lenny box with
 2Gb of RAM and a pair of dual core 2.6Ghz CPUs.  I'm using a pair of
 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
 the storage of the mysql DBs.

 My problem child is doing this simple query:

   mysql select distinct Call_No from Newspaper_Issues
   mysql WHERE BIB_ID = 464;
   +--+
   | Call_No  |
   +--+
   | News |
   | NewsD CT |
   +--+
   2 rows in set (2.98 sec)

 The Newspaper_Issues table has 1.3 million rows, and has a structure
 like this:

   mysql describe  Newspaper_Issues;
   ++-+--+-+-++
   | Field  | Type| Null | Key | Default | Extra  |
   ++-+--+-+-++
   | Record_No  | int(11) | NO   | PRI | NULL| auto_increment |
   | BIB_ID | varchar(38) | NO   | MUL | NULL||
   | Issue_Date | date| NO   | MUL | NULL||
   | Type_Code  | char(1) | NO   | | r   ||
   | Condition_Code | char(1) | NO   | | o   ||
   | Format_Code| char(1) | NO   | | p   ||
   | Edition_Code   | char(1) | NO   | | n   ||
   | Date_Type_Code | char(1) | NO   | | n   ||
   | Ed_Type| tinyint(1)  | NO   | | 1   ||
   | RS_Code| char(1) | NO   | | c   ||
   | Call_No| varchar(36) | YES  | MUL | NULL||
   | Printed_Date   | varchar(10) | YES  | | NULL||
   | Update_Date| date| NO   | | NULL||
   ++-+--+-+-++
   13 rows in set (0.00 sec)


 I've tried adding various indexes, and reading up on howto optimize
 DISTINCT or GROUP BY queries, but I'm hitting a wall here.  My current
 indexes are:

 mysql show index from Newspaper_Issues;

 +--++--+--+-+---+-+--++--++-+
 | Table| Non_unique | Key_name | Seq_in_index |
 Column_name | Collation | Cardinality | Sub_part | Packed | Null |
 Index_type | Comment |

 +--++--+--+-+---+-+--++--++-+
 | Newspaper_Issues |  0 | PRIMARY  |1 |
 Record_No   | A | 1333298 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID   |1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | Call_No  |1 |
 Call_No | A | 927 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | Issue_Date   |1 |
 Issue_Date  | A |   49381 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_Issue_Date|1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_Issue_Date|2 |
 Issue_Date  | A | 1333298 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_short|1 |
 Call_No | A |  30 |6 | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_call_no_short |1 |
 BIB_ID  | A |   14980 | NULL | NULL   |  | BTREE
  | |
 | Newspaper_Issues |  1 | BIB_ID_call_no_short |2 |
 Call_No | A |   15503 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_bib_id   |1 |
 Call_No | A | 927 | NULL | NULL   | YES  | BTREE
  | |
 | Newspaper_Issues |  1 | call_no_bib_id   |2 |
 BIB_ID  | A |   15503 | NULL | NULL   |  | BTREE
  | |

 +--++--+--+-+---+-+--++--++-+
 11 rows in set (0.00 sec)


 So now when I do an explain on my query I get:


Re: Howto optimize Distinct query over 1.3mil rows?

2010-09-28 Thread John Stoffel

Johnny BIB_ID is VARCHAR, you should probably try
Johnny WHERE BIB_ID='464' so MySQL treats the value as a string

Wow!  What a difference that makes!  Time to A) update my queries, or
B) fix the DB schema to NOT use varchar there.  

mysql select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues
mysql WHERE BIB_ID = 464;
+--+
| Call_No  |
+--+
| News | 
| NewsD CT | 
+--+
2 rows in set (3.06 sec)

mysql select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues
mysql WHERE BIB_ID = '464';
+--+
| Call_No  |
+--+
| News | 
| NewsD CT | 
+--+
2 rows in set (0.02 sec)

Thanks a ton for your help, I would have never figured this out, esp
since I was looking down all the wrong rat holes.  

Thanks,
John

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org