Re: Howto optimize Distinct query over 1.3mil rows?

Johnny Withers Tue, 28 Sep 2010 08:50:00 -0700

BIB_ID is VARCHAR, you should probably try

WHERE BIB_ID='464' so MySQL treats the value as a string


JW


On Tue, Sep 28, 2010 at 10:02 AM, John Stoffel <[email protected]> wrote:

>
> Hi,
>
> I'm running MySQL 5.0.51a-24+lenny3-log  on a Debian Lenny box with
> 2Gb of RAM and a pair of dual core 2.6Ghz CPUs.  I'm using a pair of
> 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
> the storage of the mysql DBs.
>
> My problem child is doing this simple query:
>
>   mysql> select distinct Call_No from Newspaper_Issues
>   mysql> WHERE BIB_ID = 464;
>   +----------+
>   | Call_No  |
>   +----------+
>   | News     |
>   | NewsD CT |
>   +----------+
>   2 rows in set (2.98 sec)
>
> The Newspaper_Issues table has 1.3 million rows, and has a structure
> like this:
>
>   mysql> describe  Newspaper_Issues;
>   +----------------+-------------+------+-----+---------+----------------+
>   | Field          | Type        | Null | Key | Default | Extra          |
>   +----------------+-------------+------+-----+---------+----------------+
>   | Record_No      | int(11)     | NO   | PRI | NULL    | auto_increment |
>   | BIB_ID         | varchar(38) | NO   | MUL | NULL    |                |
>   | Issue_Date     | date        | NO   | MUL | NULL    |                |
>   | Type_Code      | char(1)     | NO   |     | r       |                |
>   | Condition_Code | char(1)     | NO   |     | o       |                |
>   | Format_Code    | char(1)     | NO   |     | p       |                |
>   | Edition_Code   | char(1)     | NO   |     | n       |                |
>   | Date_Type_Code | char(1)     | NO   |     | n       |                |
>   | Ed_Type        | tinyint(1)  | NO   |     | 1       |                |
>   | RS_Code        | char(1)     | NO   |     | c       |                |
>   | Call_No        | varchar(36) | YES  | MUL | NULL    |                |
>   | Printed_Date   | varchar(10) | YES  |     | NULL    |                |
>   | Update_Date    | date        | NO   |     | NULL    |                |
>   +----------------+-------------+------+-----+---------+----------------+
>   13 rows in set (0.00 sec)
>
>
> I've tried adding various indexes, and reading up on howto optimize
> DISTINCT or GROUP BY queries, but I'm hitting a wall here.  My current
> indexes are:
>
> mysql> show index from Newspaper_Issues;
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Table            | Non_unique | Key_name             | Seq_in_index |
> Column_name | Collation | Cardinality | Sub_part | Packed | Null |
> Index_type | Comment |
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Newspaper_Issues |          0 | PRIMARY              |            1 |
> Record_No   | A         |     1333298 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID               |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | Call_No              |            1 |
> Call_No     | A         |         927 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | Issue_Date           |            1 |
> Issue_Date  | A         |       49381 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_Issue_Date    |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_Issue_Date    |            2 |
> Issue_Date  | A         |     1333298 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_short        |            1 |
> Call_No     | A         |          30 |        6 | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_call_no_short |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_call_no_short |            2 |
> Call_No     | A         |       15503 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_bib_id       |            1 |
> Call_No     | A         |         927 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_bib_id       |            2 |
> BIB_ID      | A         |       15503 |     NULL | NULL   |      | BTREE
>  |         |
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> 11 rows in set (0.00 sec)
>
>
> So now when I do an explain on my query I get:
>
>    mysql> explain select distinct(Call_No) from Newspaper_Issues WHERE
> BIB_ID = 464;
>
>  
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    | id | select_type | table            | type  | possible_keys
>                       | key            | key_len | ref  | rows    | Extra
>                  |
>
>  
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    |  1 | SIMPLE      | Newspaper_Issues | index |
> BIB_ID,BIB_ID_Issue_Date,BIB_ID_call_no_short | call_no_bib_id | 227     |
> NULL | 1333298 | Using where; Using index |
>
>  
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    1 row in set (0.00 sec)
>
> Interestingly enough, I get much better performance if I just drop the
> WHERE clause, but that doesn't help me get what I want though.  *grin*
>
>    mysql> explain select distinct(Call_No) from Newspaper_Issues;
>
>  
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    | id | select_type | table            | type  | possible_keys | key
> | key_len | ref  | rows | Extra                    |
>
>  
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    |  1 | SIMPLE      | Newspaper_Issues | range | NULL          | Call_No
> | 111     | NULL |  928 | Using index for group-by |
>
>  
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    1 row in set (0.00 sec)
>
>
>
> Would it make sense to split the Call_No data off into it's own table, and
> put in a proper numeric ID, instead of the current VARCHAR(36) it uses?  So
> in Newspaper_Issues I'd just have a Call_No_ID and a new Call_No table with:
>
>      Call_No_ID    INT
>      Call_No       Char(36)
>
> which would hopefully index better?  I only have 928 distinct Call_No
> strings, so I'm not afraid of wasting space or anything.
>
> Thanks,
> John
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[email protected]
>
>


-- 
-----------------------------
Johnny Withers
601.209.4985
[email protected]

Re: Howto optimize Distinct query over 1.3mil rows?

Reply via email to