Howto optimize Distinct query over 1.3mil rows?
Hi, I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with 2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of 40Gb disks mirrored using MD (Linux software RAID) for both the OS and the storage of the mysql DBs. My problem child is doing this simple query: mysql select distinct Call_No from Newspaper_Issues mysql WHERE BIB_ID = 464; +--+ | Call_No | +--+ | News | | NewsD CT | +--+ 2 rows in set (2.98 sec) The Newspaper_Issues table has 1.3 million rows, and has a structure like this: mysql describe Newspaper_Issues; ++-+--+-+-++ | Field | Type| Null | Key | Default | Extra | ++-+--+-+-++ | Record_No | int(11) | NO | PRI | NULL| auto_increment | | BIB_ID | varchar(38) | NO | MUL | NULL|| | Issue_Date | date| NO | MUL | NULL|| | Type_Code | char(1) | NO | | r || | Condition_Code | char(1) | NO | | o || | Format_Code| char(1) | NO | | p || | Edition_Code | char(1) | NO | | n || | Date_Type_Code | char(1) | NO | | n || | Ed_Type| tinyint(1) | NO | | 1 || | RS_Code| char(1) | NO | | c || | Call_No| varchar(36) | YES | MUL | NULL|| | Printed_Date | varchar(10) | YES | | NULL|| | Update_Date| date| NO | | NULL|| ++-+--+-+-++ 13 rows in set (0.00 sec) I've tried adding various indexes, and reading up on howto optimize DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current indexes are: mysql show index from Newspaper_Issues; +--++--+--+-+---+-+--++--++-+ | Table| Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +--++--+--+-+---+-+--++--++-+ | Newspaper_Issues | 0 | PRIMARY |1 | Record_No | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | Call_No |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | Issue_Date |1 | Issue_Date | A | 49381 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|2 | Issue_Date | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | call_no_short|1 | Call_No | A | 30 |6 | NULL | YES | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |2 | Call_No | A | 15503 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |2 | BIB_ID | A | 15503 | NULL | NULL | | BTREE | | +--++--+--+-+---+-+--++--++-+ 11 rows in set (0.00 sec) So now when I do an explain on my query I get: mysql explain select distinct(Call_No) from Newspaper_Issues WHERE BIB_ID = 464; ++-+--+---+---++-+--+-+--+ | id | select_type | table|
Re: Howto optimize Distinct query over 1.3mil rows?
BIB_ID is VARCHAR, you should probably try WHERE BIB_ID='464' so MySQL treats the value as a string JW On Tue, Sep 28, 2010 at 10:02 AM, John Stoffel j...@stoffel.org wrote: Hi, I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with 2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of 40Gb disks mirrored using MD (Linux software RAID) for both the OS and the storage of the mysql DBs. My problem child is doing this simple query: mysql select distinct Call_No from Newspaper_Issues mysql WHERE BIB_ID = 464; +--+ | Call_No | +--+ | News | | NewsD CT | +--+ 2 rows in set (2.98 sec) The Newspaper_Issues table has 1.3 million rows, and has a structure like this: mysql describe Newspaper_Issues; ++-+--+-+-++ | Field | Type| Null | Key | Default | Extra | ++-+--+-+-++ | Record_No | int(11) | NO | PRI | NULL| auto_increment | | BIB_ID | varchar(38) | NO | MUL | NULL|| | Issue_Date | date| NO | MUL | NULL|| | Type_Code | char(1) | NO | | r || | Condition_Code | char(1) | NO | | o || | Format_Code| char(1) | NO | | p || | Edition_Code | char(1) | NO | | n || | Date_Type_Code | char(1) | NO | | n || | Ed_Type| tinyint(1) | NO | | 1 || | RS_Code| char(1) | NO | | c || | Call_No| varchar(36) | YES | MUL | NULL|| | Printed_Date | varchar(10) | YES | | NULL|| | Update_Date| date| NO | | NULL|| ++-+--+-+-++ 13 rows in set (0.00 sec) I've tried adding various indexes, and reading up on howto optimize DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current indexes are: mysql show index from Newspaper_Issues; +--++--+--+-+---+-+--++--++-+ | Table| Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +--++--+--+-+---+-+--++--++-+ | Newspaper_Issues | 0 | PRIMARY |1 | Record_No | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | Call_No |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | Issue_Date |1 | Issue_Date | A | 49381 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|2 | Issue_Date | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | call_no_short|1 | Call_No | A | 30 |6 | NULL | YES | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |2 | Call_No | A | 15503 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |2 | BIB_ID | A | 15503 | NULL | NULL | | BTREE | | +--++--+--+-+---+-+--++--++-+ 11 rows in set (0.00 sec) So now when I do an explain on my query I get: mysql explain select distinct(Call_No) from Newspaper_Issues WHERE BIB_ID = 464;
Re: Howto optimize Distinct query over 1.3mil rows?
If Cal_NO is a recurring value, then yes, that is the way it should be done in a relational schema. Your index cardinality of 15.000 against 1.3 million rows is reasonable, although not incredible; is your index cache large enough to acccomodate all your indices ? On Tue, Sep 28, 2010 at 5:02 PM, John Stoffel j...@stoffel.org wrote: Hi, I'm running MySQL 5.0.51a-24+lenny3-log on a Debian Lenny box with 2Gb of RAM and a pair of dual core 2.6Ghz CPUs. I'm using a pair of 40Gb disks mirrored using MD (Linux software RAID) for both the OS and the storage of the mysql DBs. My problem child is doing this simple query: mysql select distinct Call_No from Newspaper_Issues mysql WHERE BIB_ID = 464; +--+ | Call_No | +--+ | News | | NewsD CT | +--+ 2 rows in set (2.98 sec) The Newspaper_Issues table has 1.3 million rows, and has a structure like this: mysql describe Newspaper_Issues; ++-+--+-+-++ | Field | Type| Null | Key | Default | Extra | ++-+--+-+-++ | Record_No | int(11) | NO | PRI | NULL| auto_increment | | BIB_ID | varchar(38) | NO | MUL | NULL|| | Issue_Date | date| NO | MUL | NULL|| | Type_Code | char(1) | NO | | r || | Condition_Code | char(1) | NO | | o || | Format_Code| char(1) | NO | | p || | Edition_Code | char(1) | NO | | n || | Date_Type_Code | char(1) | NO | | n || | Ed_Type| tinyint(1) | NO | | 1 || | RS_Code| char(1) | NO | | c || | Call_No| varchar(36) | YES | MUL | NULL|| | Printed_Date | varchar(10) | YES | | NULL|| | Update_Date| date| NO | | NULL|| ++-+--+-+-++ 13 rows in set (0.00 sec) I've tried adding various indexes, and reading up on howto optimize DISTINCT or GROUP BY queries, but I'm hitting a wall here. My current indexes are: mysql show index from Newspaper_Issues; +--++--+--+-+---+-+--++--++-+ | Table| Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +--++--+--+-+---+-+--++--++-+ | Newspaper_Issues | 0 | PRIMARY |1 | Record_No | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | Call_No |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | Issue_Date |1 | Issue_Date | A | 49381 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_Issue_Date|2 | Issue_Date | A | 1333298 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | call_no_short|1 | Call_No | A | 30 |6 | NULL | YES | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |1 | BIB_ID | A | 14980 | NULL | NULL | | BTREE | | | Newspaper_Issues | 1 | BIB_ID_call_no_short |2 | Call_No | A | 15503 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |1 | Call_No | A | 927 | NULL | NULL | YES | BTREE | | | Newspaper_Issues | 1 | call_no_bib_id |2 | BIB_ID | A | 15503 | NULL | NULL | | BTREE | | +--++--+--+-+---+-+--++--++-+ 11 rows in set (0.00 sec) So now when I do an explain on my query I get:
Re: Howto optimize Distinct query over 1.3mil rows?
Johnny BIB_ID is VARCHAR, you should probably try Johnny WHERE BIB_ID='464' so MySQL treats the value as a string Wow! What a difference that makes! Time to A) update my queries, or B) fix the DB schema to NOT use varchar there. mysql select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues mysql WHERE BIB_ID = 464; +--+ | Call_No | +--+ | News | | NewsD CT | +--+ 2 rows in set (3.06 sec) mysql select SQL_NO_CACHE distinct(Call_No) from Newspaper_Issues mysql WHERE BIB_ID = '464'; +--+ | Call_No | +--+ | News | | NewsD CT | +--+ 2 rows in set (0.02 sec) Thanks a ton for your help, I would have never figured this out, esp since I was looking down all the wrong rat holes. Thanks, John -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org