Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

Jocelyn Fournier Wed, 18 Dec 2002 19:56:33 -0800

Hi,

I think you'd better add an unique ID to both table defined as int
corresponding to each seq_ID, and then do the join on this ID rather than on
Seq_ID (join on varchar is far from the fastest solution :)) (unless seq_ID
could be converted into int directly ?)


(but it takes time, even for me (bi athlon MP 2200+) :


mysql> SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN
searchjoinhardwarefr7 ON
searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse;
+----------+
| COUNT(*) |
+----------+
| 39396361 |
+----------+
1 row in set (3 min 23.15 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN
searchjoinhardwarefr7 ON
searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse;
+----+-------------+-----------------------+--------+---------------+-------
-----+---------+----------------------------------+----------+-------------+
| id | select_type | table                 | type   | possible_keys | key
| key_len | ref                              | rows     | Extra       |
+----+-------------+-----------------------+--------+---------------+-------
-----+---------+----------------------------------+----------+-------------+
|  1 | SIMPLE      | searchmainhardwarefr7 | index  | NULL          |
numreponse |       4 | NULL                             | 39396576 | Using
index |
|  1 | SIMPLE      | searchjoinhardwarefr7 | eq_ref | numreponse    |
numreponse |       4 | searchmainhardwarefr7.numreponse |        1 | Using
index |
+----+-------------+-----------------------+--------+---------------+-------
-----+---------+----------------------------------+----------+-------------+
)


Regards,
  Jocelyn
----- Original Message -----
From: "Qunfeng Dong" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 18, 2002 9:17 PM
Subject: Re: Can MySQL handle 120 million records? - Ok, If you guys really
can handle tens of millions records, you have to help me to enjoy MySQL too
:-)


> Boy, you guys are die-hard MySQL fans :-) I think your
> strong defending convinced us MySQL can handle 120
> million records :-) But I know some ordinary users out
> there like me who are not experts on tuning the MySQL
> performance (they did send me private emails saying
> they encountered the similar slow join problem). So
> please help us to keep the faith.
>
> We are trying to develop a simple biology database to
> maintain some DNA Sequence information. My problem is
> coming from the following two tables:
>
> CREATE TABLE NewSequence
> (
>         Seq_ID          varchar(50)     NOT NULL,
>         GenBank_Acc     varchar(10),
>         Organism        varchar(50)     NOT NULL,
>         Seq_Type  enum("EST","GSS","EST Contig","EST
> Singlet","GSS Contig","GSS Singlet","GSS Plasmid
> Contig","Protein") NOT NULL,
>         Seq_Length      int             NOT NULL,
>         Seq_Title       text            NOT NULL,
>         Comment         text,
>         Entry_Date      date            NOT NULL,
>         PRIMARY KEY     (Seq_ID),
>         UNIQUE  (GenBank_Acc),
>         INDEX (Seq_Type),
>         INDEX (Organism)
> );
>
> This NewSequence table is used to track some general
> info about sequence. Notice I have to use text
> datatype to describe "Comment" and "Seq_Title" fields;
> therefore I have to use varchar for other string
> fields. In addition, the Seq_ID is not numerical.
> BTW, I found indexing on Seq_Type. Organism which are
> very repeative still helps with accessing. This table
> has 2676711 rows.
>
>
> CREATE TABLE NewSequence_Homolog
> (
>         Seq_ID          varchar(50)     NOT NULL,
>         Homolog_PID     int             NOT NULL,
>         Homolog_Desc    varchar(50)     NOT NULL,
>         Homolog_Species varchar(50),
>         PRIMARY KEY     (Seq_ID, Homolog_PID)
> );
>
> This NewSequence_Homolog table is to track which
> protein sequences (homolog) are similar to the
> sequence I store in the NewSequence table. This table
> has 997654 rows.
>
> mysql> select count(*) from NewSequence s left join
> NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
> +----------+
> | count(*) |
> +----------+
> |  3292029 |
> +----------+
> 1 row in set (1 min 30.50 sec)
>
> So a simple left join took about 1 min and half.
> First, is this slow or I am too picky?
>
> This is the "Explain".
> mysql> explain select count(*) from NewSequence s left
> join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
>
+-------+-------+---------------+---------+---------+----------+---------+--
-----------+
> | table | type  | possible_keys | key     | key_len |
> ref      | rows    | Extra       |
>
+-------+-------+---------------+---------+---------+----------+---------+--
-----------+
> | s     | index | NULL          | PRIMARY |      50 |
> NULL     | 2676711 | Using index |
> | h     | ref   | PRIMARY       | PRIMARY |      50 |
> s.Seq_ID |    9976 | Using index |
>
+-------+-------+---------------+---------+---------+----------+---------+--
-----------+
>
>
> I am running MySQL 3.23.49 on RedHat linux 7.3 on a
> dedicated server with 4 GB memory. The only setting I
> changed is to copy the my-huge.cnf into /etc/my.cnf.
>
> Qunfeng
>
> --- "Michael T. Babcock" <[EMAIL PROTECTED]>
> wrote:
> > Qunfeng Dong wrote:
> >
> > >not-so-good performance (join on tables much
> > smaller
> > >than yours takes minutes even using index) and I
> > seem
> > >to read all the docs I could find on the web about
> > how
> > >to optimize but they are not working for me (I am
> > >
> >
> > Have you stored a slow query log to run them through
> > 'explain' and see
> > why they're slow?  Do you want to post some of them
> > here so we can
> > suggest what might be done to make them faster?
> >
> > --
> > Michael T. Babcock
> > C.T.O., FibreSpeed Ltd.
> > http://www.fibrespeed.net/~mbabcock
> >
> >
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
<[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>
>
>
>


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

Reply via email to