Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-19 Thread Brent Baisley
Why isn't Seq_ID not an unsigned int? Primary key should always be 
something generated by the system that has no other significance than 
being a primary key. If there actually is a seq_id piece of data that 
has some other significance, I wouldn't use it to link all your data. 
Then you don't have to have a compound index (Seq_ID, Homolog_PID) in 
your NewSequence_Homolog table.

I could understand why you may need to use text for the Comment, but 
would char(255) be good enough for the title? Then you could split 
Comment out to a separate table so you can get fixed length records. If 
comments should always be displayed with the data, then perhaps putting 
a flag field in the table to indicate there are comments. Then comments 
can be viewed individually.

Your simple left join does seem to be taking quite a long time. I 
wouldn't consider myself an expert in MySQL, but I would think that your 
index key length of 50 has to be slowing things down. The left join 
example you have is also scanning an entire table of 2676711 records. I 
don't know how big the table is in disk space, but I would guess the 
entire table is not cached in RAM.

A few weeks back I did read something about a company that was doing 
modeling on grain falling in a silo or something very complex like fluid 
dynamics. They were having severe performance issue where it would take 
10 hours to model something. They used all the various Unix tools to 
determine where the bottleneck was (disk, memory, or CPU). It was RAM 
and disk I/O (due to low RAM) that was slowing things down. They started 
adding disks for scratch areas and virtual memory and made sure there 
were no hot disks. They knocked about two hours off of the time. They 
then added a whole bunch of RAM so the entire table could be loaded into 
RAM and got things down to something like 17 minutes.
Regardless of whether you are using Oracle or MySQL, you are still 
limited by the hardware you are running it on. Finding what the 
bottleneck is (disk, memory, cpu, or network) is the key.

On Wednesday, December 18, 2002, at 04:17 PM, Qunfeng Dong wrote:

This NewSequence table is used to track some general
info about sequence. Notice I have to use text
datatype to describe Comment and Seq_Title fields;
therefore I have to use varchar for other string
fields. In addition, the Seq_ID is not numerical.
BTW, I found indexing on Seq_Type. Organism which are
very repeative still helps with accessing. This table
has 2676711 rows.


--
Brent Baisley
Systems Architect
Landover Associates, Inc.
Search  Advisory Services for Advanced Technology Environments
p: 212.759.6400/800.759.0577


-
Before posting, please check:
  http://www.mysql.com/manual.php   (the manual)
  http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Thanks! Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-19 Thread Qunfeng Dong
Well, thanks to all of your great help! I am able to
speed up the query {select count(*) from NEW_Sequence
s left join NEW_Sequence_Homolog h on s.Seq_ID =
h.Seq_ID;} from 1 min 52.61 sec down to 20.62 sec now.
The only thing I changed so far was the Seq_ID from
type varchar to bigint. The Seq_ID was not all
numerical for different type of Sequences; but I
managed to assign numerical code to those
non-numerical ones now. 

Qunfeng

  CREATE TABLE NewSequence
  (
  Seq_ID  varchar(50) NOT NULL,
  GenBank_Acc varchar(10),
  Organismvarchar(50) NOT NULL,
  Seq_Type  enum(EST,GSS,EST
 Contig,EST
  Singlet,GSS Contig,GSS Singlet,GSS Plasmid
  Contig,Protein) NOT NULL,
  Seq_Length  int NOT NULL,
  Seq_Title   textNOT NULL,
  Comment text,
  Entry_Date  dateNOT NULL,
  PRIMARY KEY (Seq_ID),
  UNIQUE  (GenBank_Acc),
  INDEX (Seq_Type),
  INDEX (Organism)
  );
 
  This NewSequence table is used to track some
 general
  info about sequence. Notice I have to use text
  datatype to describe Comment and Seq_Title
 fields;
  therefore I have to use varchar for other string
  fields. In addition, the Seq_ID is not numerical.
  BTW, I found indexing on Seq_Type. Organism which
 are
  very repeative still helps with accessing. This
 table
  has 2676711 rows.
 
 
  CREATE TABLE NewSequence_Homolog
  (
  Seq_ID  varchar(50) NOT NULL,
  Homolog_PID int NOT NULL,
  Homolog_Descvarchar(50) NOT NULL,
  Homolog_Species varchar(50),
  PRIMARY KEY (Seq_ID, Homolog_PID)
  );
 
  This NewSequence_Homolog table is to track which
  protein sequences (homolog) are similar to the
  sequence I store in the NewSequence table. This
 table
  has 997654 rows.
 
  mysql select count(*) from NewSequence s left
 join
  NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
  +--+
  | count(*) |
  +--+
  |  3292029 |
  +--+
  1 row in set (1 min 30.50 sec)
 
  So a simple left join took about 1 min and half.
  First, is this slow or I am too picky?
 
  This is the Explain.
  mysql explain select count(*) from NewSequence s
 left
  join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
 

+---+---+---+-+-+--+-+--
 ---+
  | table | type  | possible_keys | key |
 key_len |
  ref  | rows| Extra   |
 

+---+---+---+-+-+--+-+--
 ---+
  | s | index | NULL  | PRIMARY | 
 50 |
  NULL | 2676711 | Using index |
  | h | ref   | PRIMARY   | PRIMARY | 
 50 |
  s.Seq_ID |9976 | Using index |
 

+---+---+---+-+-+--+-+--
 ---+
 
 
  I am running MySQL 3.23.49 on RedHat linux 7.3 on
 a
  dedicated server with 4 GB memory. The only
 setting I
  changed is to copy the my-huge.cnf into
 /etc/my.cnf.
 
  Qunfeng
 
  --- Michael T. Babcock [EMAIL PROTECTED]
  wrote:
   Qunfeng Dong wrote:
  
   not-so-good performance (join on tables much
   smaller
   than yours takes minutes even using index) and
 I
   seem
   to read all the docs I could find on the web
 about
   how
   to optimize but they are not working for me (I
 am
   
  
   Have you stored a slow query log to run them
 through
   'explain' and see
   why they're slow?  Do you want to post some of
 them
   here so we can
   suggest what might be done to make them faster?
 
=== message truncated ===


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-18 Thread Qunfeng Dong
Boy, you guys are die-hard MySQL fans :-) I think your
strong defending convinced us MySQL can handle 120
million records. But I know some ordinary users out
there like me who are not experts on tuning the MySQL
performance (they did send me private emails saying
they encountered the similar slow join problem). So
please help us to keep the faith.  

We are trying to develop a simple biology database to
maintain some DNA Sequence information. My problem is
coming from the following two tables:

CREATE TABLE NewSequence
(
Seq_ID  varchar(50) NOT NULL,
GenBank_Acc varchar(10),
Organismvarchar(50) NOT NULL,
Seq_Type  enum(EST,GSS,EST Contig,EST
Singlet,GSS Contig,GSS Singlet,GSS Plasmid
Contig,Protein) NOT NULL,
Seq_Length  int NOT NULL,
Seq_Title   textNOT NULL,
Comment text,
Entry_Date  dateNOT NULL,
PRIMARY KEY (Seq_ID),
UNIQUE  (GenBank_Acc),
INDEX (Seq_Type),
INDEX (Organism)
);

CREATE TABLE NewSequence_Homolog
(
Seq_ID  varchar(50) NOT NULL,
Homolog_PID int NOT NULL,
Homolog_Descvarchar(50) NOT NULL,
Homolog_Species varchar(50),
PRIMARY KEY (Seq_ID, Homolog_PID)
);






--- Michael T. Babcock [EMAIL PROTECTED]
wrote:
 Qunfeng Dong wrote:
 
 not-so-good performance (join on tables much
 smaller
 than yours takes minutes even using index) and I
 seem
 to read all the docs I could find on the web about
 how
 to optimize but they are not working for me (I am
 
 
 Have you stored a slow query log to run them through
 'explain' and see 
 why they're slow?  Do you want to post some of them
 here so we can 
 suggest what might be done to make them faster?
 
 -- 
 Michael T. Babcock
 C.T.O., FibreSpeed Ltd.
 http://www.fibrespeed.net/~mbabcock
 
 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-18 Thread Qunfeng Dong
Boy, you guys are die-hard MySQL fans :-) I think your
strong defending convinced us MySQL can handle 120
million records :-) But I know some ordinary users out
there like me who are not experts on tuning the MySQL
performance (they did send me private emails saying
they encountered the similar slow join problem). So
please help us to keep the faith.  

We are trying to develop a simple biology database to
maintain some DNA Sequence information. My problem is
coming from the following two tables:

CREATE TABLE NewSequence
(
Seq_ID  varchar(50) NOT NULL,
GenBank_Acc varchar(10),
Organismvarchar(50) NOT NULL,
Seq_Type  enum(EST,GSS,EST Contig,EST  
Singlet,GSS Contig,GSS Singlet,GSS Plasmid
Contig,Protein) NOT NULL,
Seq_Length  int NOT NULL,
Seq_Title   textNOT NULL,
Comment text,
Entry_Date  dateNOT NULL,
PRIMARY KEY (Seq_ID),
UNIQUE  (GenBank_Acc),
INDEX (Seq_Type),
INDEX (Organism)
);

This NewSequence table is used to track some general
info about sequence. Notice I have to use text
datatype to describe Comment and Seq_Title fields;
therefore I have to use varchar for other string
fields. In addition, the Seq_ID is not numerical. 
BTW, I found indexing on Seq_Type. Organism which are
very repeative still helps with accessing. This table
has 2676711 rows.


CREATE TABLE NewSequence_Homolog
(
Seq_ID  varchar(50) NOT NULL,
Homolog_PID int NOT NULL,
Homolog_Descvarchar(50) NOT NULL,
Homolog_Species varchar(50),
PRIMARY KEY (Seq_ID, Homolog_PID)
);

This NewSequence_Homolog table is to track which
protein sequences (homolog) are similar to the
sequence I store in the NewSequence table. This table
has 997654 rows. 

mysql select count(*) from NewSequence s left join
NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
+--+
| count(*) |
+--+
|  3292029 |
+--+
1 row in set (1 min 30.50 sec)

So a simple left join took about 1 min and half.
First, is this slow or I am too picky?

This is the Explain.
mysql explain select count(*) from NewSequence s left
join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
+---+---+---+-+-+--+-+-+
| table | type  | possible_keys | key | key_len |
ref  | rows| Extra   |
+---+---+---+-+-+--+-+-+
| s | index | NULL  | PRIMARY |  50 |
NULL | 2676711 | Using index |
| h | ref   | PRIMARY   | PRIMARY |  50 |
s.Seq_ID |9976 | Using index |
+---+---+---+-+-+--+-+-+


I am running MySQL 3.23.49 on RedHat linux 7.3 on a
dedicated server with 4 GB memory. The only setting I
changed is to copy the my-huge.cnf into /etc/my.cnf.

Qunfeng

--- Michael T. Babcock [EMAIL PROTECTED]
wrote:
 Qunfeng Dong wrote:
 
 not-so-good performance (join on tables much
 smaller
 than yours takes minutes even using index) and I
 seem
 to read all the docs I could find on the web about
 how
 to optimize but they are not working for me (I am
 
 
 Have you stored a slow query log to run them through
 'explain' and see 
 why they're slow?  Do you want to post some of them
 here so we can 
 suggest what might be done to make them faster?
 
 -- 
 Michael T. Babcock
 C.T.O., FibreSpeed Ltd.
 http://www.fibrespeed.net/~mbabcock
 
 


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-18 Thread Jocelyn Fournier
Hi,

I think you'd better add an unique ID to both table defined as int
corresponding to each seq_ID, and then do the join on this ID rather than on
Seq_ID (join on varchar is far from the fastest solution :)) (unless seq_ID
could be converted into int directly ?)

(but it takes time, even for me (bi athlon MP 2200+) :


mysql SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN
searchjoinhardwarefr7 ON
searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse;
+--+
| COUNT(*) |
+--+
| 39396361 |
+--+
1 row in set (3 min 23.15 sec)

mysql EXPLAIN SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN
searchjoinhardwarefr7 ON
searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse;
++-+---++---+---
-+-+--+--+-+
| id | select_type | table | type   | possible_keys | key
| key_len | ref  | rows | Extra   |
++-+---++---+---
-+-+--+--+-+
|  1 | SIMPLE  | searchmainhardwarefr7 | index  | NULL  |
numreponse |   4 | NULL | 39396576 | Using
index |
|  1 | SIMPLE  | searchjoinhardwarefr7 | eq_ref | numreponse|
numreponse |   4 | searchmainhardwarefr7.numreponse |1 | Using
index |
++-+---++---+---
-+-+--+--+-+
)


Regards,
  Jocelyn
- Original Message -
From: Qunfeng Dong [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, December 18, 2002 9:17 PM
Subject: Re: Can MySQL handle 120 million records? - Ok, If you guys really
can handle tens of millions records, you have to help me to enjoy MySQL too
:-)


 Boy, you guys are die-hard MySQL fans :-) I think your
 strong defending convinced us MySQL can handle 120
 million records :-) But I know some ordinary users out
 there like me who are not experts on tuning the MySQL
 performance (they did send me private emails saying
 they encountered the similar slow join problem). So
 please help us to keep the faith.

 We are trying to develop a simple biology database to
 maintain some DNA Sequence information. My problem is
 coming from the following two tables:

 CREATE TABLE NewSequence
 (
 Seq_ID  varchar(50) NOT NULL,
 GenBank_Acc varchar(10),
 Organismvarchar(50) NOT NULL,
 Seq_Type  enum(EST,GSS,EST Contig,EST
 Singlet,GSS Contig,GSS Singlet,GSS Plasmid
 Contig,Protein) NOT NULL,
 Seq_Length  int NOT NULL,
 Seq_Title   textNOT NULL,
 Comment text,
 Entry_Date  dateNOT NULL,
 PRIMARY KEY (Seq_ID),
 UNIQUE  (GenBank_Acc),
 INDEX (Seq_Type),
 INDEX (Organism)
 );

 This NewSequence table is used to track some general
 info about sequence. Notice I have to use text
 datatype to describe Comment and Seq_Title fields;
 therefore I have to use varchar for other string
 fields. In addition, the Seq_ID is not numerical.
 BTW, I found indexing on Seq_Type. Organism which are
 very repeative still helps with accessing. This table
 has 2676711 rows.


 CREATE TABLE NewSequence_Homolog
 (
 Seq_ID  varchar(50) NOT NULL,
 Homolog_PID int NOT NULL,
 Homolog_Descvarchar(50) NOT NULL,
 Homolog_Species varchar(50),
 PRIMARY KEY (Seq_ID, Homolog_PID)
 );

 This NewSequence_Homolog table is to track which
 protein sequences (homolog) are similar to the
 sequence I store in the NewSequence table. This table
 has 997654 rows.

 mysql select count(*) from NewSequence s left join
 NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;
 +--+
 | count(*) |
 +--+
 |  3292029 |
 +--+
 1 row in set (1 min 30.50 sec)

 So a simple left join took about 1 min and half.
 First, is this slow or I am too picky?

 This is the Explain.
 mysql explain select count(*) from NewSequence s left
 join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID;

+---+---+---+-+-+--+-+--
---+
 | table | type  | possible_keys | key | key_len |
 ref  | rows| Extra   |

+---+---+---+-+-+--+-+--
---+
 | s | index | NULL  | PRIMARY |  50 |
 NULL | 2676711 | Using index |
 | h | ref   | PRIMARY   | PRIMARY |  50 |
 s.Seq_ID |9976 | Using index |

+---+---+---+-+-+--+-+--
---+


 I am running MySQL 3.23.49 on RedHat linux 7.3 on a
 dedicated server with 4 GB memory. The only setting I
 changed is to copy

Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)

2002-12-18 Thread Ryan Fox

- Original Message -
From: Qunfeng Dong [EMAIL PROTECTED]

 We are trying to develop a simple biology database to
 maintain some DNA Sequence information. My problem is
 coming from the following two tables:
snip

Making indexes smaller will help.  Does it need to be varchar(50)?
Also, I'd consider creating a numeric auto_increment primary key on your
NewSequence table, and using it to relate the 2 tables together.  It may
make for some more complex SQL statements to describe the relationship, but
you'll gain the time back in performance.  Consider using the following.

CREATE TABLE NewSequence
(
id  int(11) not null auto_increment,
Seq_ID  varchar(50) NOT NULL,
GenBank_Acc varchar(10),
Organismvarchar(50) NOT NULL,
Seq_Type  enum(EST,GSS,EST Contig,EST
Singlet,GSS Contig,GSS Singlet,GSS Plasmid
Contig,Protein) NOT NULL,
Seq_Length  int NOT NULL,
Seq_Title   textNOT NULL,
Comment text,
Entry_Date  dateNOT NULL,
PRIMARY KEY (id),
UNIQUE (Seq_ID),
UNIQUE  (GenBank_Acc),
INDEX (Seq_Type),
INDEX (Organism)
);

CREATE TABLE NewSequence_Homolog
(
id  int(11) NOT NULL,
Homolog_PID int NOT NULL,
Homolog_Descvarchar(50) NOT NULL,
Homolog_Species varchar(50),
PRIMARY KEY (id, Homolog_PID)
);

This would make your example query:
select count(*) from NewSequence s left join NewSequence_Homolog h on s.id =
h.id;
And this would run much quicker, as instead of searching through 50
character indexes for each table, it would only have to look at 11 digit
indexes.  Much quicker.

Ryan

sql, query and stuff


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php