Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
Why isn't Seq_ID not an unsigned int? Primary key should always be something generated by the system that has no other significance than being a primary key. If there actually is a seq_id piece of data that has some other significance, I wouldn't use it to link all your data. Then you don't have to have a compound index (Seq_ID, Homolog_PID) in your NewSequence_Homolog table. I could understand why you may need to use text for the Comment, but would char(255) be good enough for the title? Then you could split Comment out to a separate table so you can get fixed length records. If comments should always be displayed with the data, then perhaps putting a flag field in the table to indicate there are comments. Then comments can be viewed individually. Your simple left join does seem to be taking quite a long time. I wouldn't consider myself an expert in MySQL, but I would think that your index key length of 50 has to be slowing things down. The left join example you have is also scanning an entire table of 2676711 records. I don't know how big the table is in disk space, but I would guess the entire table is not cached in RAM. A few weeks back I did read something about a company that was doing modeling on grain falling in a silo or something very complex like fluid dynamics. They were having severe performance issue where it would take 10 hours to model something. They used all the various Unix tools to determine where the bottleneck was (disk, memory, or CPU). It was RAM and disk I/O (due to low RAM) that was slowing things down. They started adding disks for scratch areas and virtual memory and made sure there were no hot disks. They knocked about two hours off of the time. They then added a whole bunch of RAM so the entire table could be loaded into RAM and got things down to something like 17 minutes. Regardless of whether you are using Oracle or MySQL, you are still limited by the hardware you are running it on. Finding what the bottleneck is (disk, memory, cpu, or network) is the key. On Wednesday, December 18, 2002, at 04:17 PM, Qunfeng Dong wrote: This NewSequence table is used to track some general info about sequence. Notice I have to use text datatype to describe Comment and Seq_Title fields; therefore I have to use varchar for other string fields. In addition, the Seq_ID is not numerical. BTW, I found indexing on Seq_Type. Organism which are very repeative still helps with accessing. This table has 2676711 rows. -- Brent Baisley Systems Architect Landover Associates, Inc. Search Advisory Services for Advanced Technology Environments p: 212.759.6400/800.759.0577 - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Thanks! Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
Well, thanks to all of your great help! I am able to speed up the query {select count(*) from NEW_Sequence s left join NEW_Sequence_Homolog h on s.Seq_ID = h.Seq_ID;} from 1 min 52.61 sec down to 20.62 sec now. The only thing I changed so far was the Seq_ID from type varchar to bigint. The Seq_ID was not all numerical for different type of Sequences; but I managed to assign numerical code to those non-numerical ones now. Qunfeng CREATE TABLE NewSequence ( Seq_ID varchar(50) NOT NULL, GenBank_Acc varchar(10), Organismvarchar(50) NOT NULL, Seq_Type enum(EST,GSS,EST Contig,EST Singlet,GSS Contig,GSS Singlet,GSS Plasmid Contig,Protein) NOT NULL, Seq_Length int NOT NULL, Seq_Title textNOT NULL, Comment text, Entry_Date dateNOT NULL, PRIMARY KEY (Seq_ID), UNIQUE (GenBank_Acc), INDEX (Seq_Type), INDEX (Organism) ); This NewSequence table is used to track some general info about sequence. Notice I have to use text datatype to describe Comment and Seq_Title fields; therefore I have to use varchar for other string fields. In addition, the Seq_ID is not numerical. BTW, I found indexing on Seq_Type. Organism which are very repeative still helps with accessing. This table has 2676711 rows. CREATE TABLE NewSequence_Homolog ( Seq_ID varchar(50) NOT NULL, Homolog_PID int NOT NULL, Homolog_Descvarchar(50) NOT NULL, Homolog_Species varchar(50), PRIMARY KEY (Seq_ID, Homolog_PID) ); This NewSequence_Homolog table is to track which protein sequences (homolog) are similar to the sequence I store in the NewSequence table. This table has 997654 rows. mysql select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +--+ | count(*) | +--+ | 3292029 | +--+ 1 row in set (1 min 30.50 sec) So a simple left join took about 1 min and half. First, is this slow or I am too picky? This is the Explain. mysql explain select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +---+---+---+-+-+--+-+-- ---+ | table | type | possible_keys | key | key_len | ref | rows| Extra | +---+---+---+-+-+--+-+-- ---+ | s | index | NULL | PRIMARY | 50 | NULL | 2676711 | Using index | | h | ref | PRIMARY | PRIMARY | 50 | s.Seq_ID |9976 | Using index | +---+---+---+-+-+--+-+-- ---+ I am running MySQL 3.23.49 on RedHat linux 7.3 on a dedicated server with 4 GB memory. The only setting I changed is to copy the my-huge.cnf into /etc/my.cnf. Qunfeng --- Michael T. Babcock [EMAIL PROTECTED] wrote: Qunfeng Dong wrote: not-so-good performance (join on tables much smaller than yours takes minutes even using index) and I seem to read all the docs I could find on the web about how to optimize but they are not working for me (I am Have you stored a slow query log to run them through 'explain' and see why they're slow? Do you want to post some of them here so we can suggest what might be done to make them faster? === message truncated === __ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
Boy, you guys are die-hard MySQL fans :-) I think your strong defending convinced us MySQL can handle 120 million records. But I know some ordinary users out there like me who are not experts on tuning the MySQL performance (they did send me private emails saying they encountered the similar slow join problem). So please help us to keep the faith. We are trying to develop a simple biology database to maintain some DNA Sequence information. My problem is coming from the following two tables: CREATE TABLE NewSequence ( Seq_ID varchar(50) NOT NULL, GenBank_Acc varchar(10), Organismvarchar(50) NOT NULL, Seq_Type enum(EST,GSS,EST Contig,EST Singlet,GSS Contig,GSS Singlet,GSS Plasmid Contig,Protein) NOT NULL, Seq_Length int NOT NULL, Seq_Title textNOT NULL, Comment text, Entry_Date dateNOT NULL, PRIMARY KEY (Seq_ID), UNIQUE (GenBank_Acc), INDEX (Seq_Type), INDEX (Organism) ); CREATE TABLE NewSequence_Homolog ( Seq_ID varchar(50) NOT NULL, Homolog_PID int NOT NULL, Homolog_Descvarchar(50) NOT NULL, Homolog_Species varchar(50), PRIMARY KEY (Seq_ID, Homolog_PID) ); --- Michael T. Babcock [EMAIL PROTECTED] wrote: Qunfeng Dong wrote: not-so-good performance (join on tables much smaller than yours takes minutes even using index) and I seem to read all the docs I could find on the web about how to optimize but they are not working for me (I am Have you stored a slow query log to run them through 'explain' and see why they're slow? Do you want to post some of them here so we can suggest what might be done to make them faster? -- Michael T. Babcock C.T.O., FibreSpeed Ltd. http://www.fibrespeed.net/~mbabcock __ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
Boy, you guys are die-hard MySQL fans :-) I think your strong defending convinced us MySQL can handle 120 million records :-) But I know some ordinary users out there like me who are not experts on tuning the MySQL performance (they did send me private emails saying they encountered the similar slow join problem). So please help us to keep the faith. We are trying to develop a simple biology database to maintain some DNA Sequence information. My problem is coming from the following two tables: CREATE TABLE NewSequence ( Seq_ID varchar(50) NOT NULL, GenBank_Acc varchar(10), Organismvarchar(50) NOT NULL, Seq_Type enum(EST,GSS,EST Contig,EST Singlet,GSS Contig,GSS Singlet,GSS Plasmid Contig,Protein) NOT NULL, Seq_Length int NOT NULL, Seq_Title textNOT NULL, Comment text, Entry_Date dateNOT NULL, PRIMARY KEY (Seq_ID), UNIQUE (GenBank_Acc), INDEX (Seq_Type), INDEX (Organism) ); This NewSequence table is used to track some general info about sequence. Notice I have to use text datatype to describe Comment and Seq_Title fields; therefore I have to use varchar for other string fields. In addition, the Seq_ID is not numerical. BTW, I found indexing on Seq_Type. Organism which are very repeative still helps with accessing. This table has 2676711 rows. CREATE TABLE NewSequence_Homolog ( Seq_ID varchar(50) NOT NULL, Homolog_PID int NOT NULL, Homolog_Descvarchar(50) NOT NULL, Homolog_Species varchar(50), PRIMARY KEY (Seq_ID, Homolog_PID) ); This NewSequence_Homolog table is to track which protein sequences (homolog) are similar to the sequence I store in the NewSequence table. This table has 997654 rows. mysql select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +--+ | count(*) | +--+ | 3292029 | +--+ 1 row in set (1 min 30.50 sec) So a simple left join took about 1 min and half. First, is this slow or I am too picky? This is the Explain. mysql explain select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +---+---+---+-+-+--+-+-+ | table | type | possible_keys | key | key_len | ref | rows| Extra | +---+---+---+-+-+--+-+-+ | s | index | NULL | PRIMARY | 50 | NULL | 2676711 | Using index | | h | ref | PRIMARY | PRIMARY | 50 | s.Seq_ID |9976 | Using index | +---+---+---+-+-+--+-+-+ I am running MySQL 3.23.49 on RedHat linux 7.3 on a dedicated server with 4 GB memory. The only setting I changed is to copy the my-huge.cnf into /etc/my.cnf. Qunfeng --- Michael T. Babcock [EMAIL PROTECTED] wrote: Qunfeng Dong wrote: not-so-good performance (join on tables much smaller than yours takes minutes even using index) and I seem to read all the docs I could find on the web about how to optimize but they are not working for me (I am Have you stored a slow query log to run them through 'explain' and see why they're slow? Do you want to post some of them here so we can suggest what might be done to make them faster? -- Michael T. Babcock C.T.O., FibreSpeed Ltd. http://www.fibrespeed.net/~mbabcock __ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
Hi, I think you'd better add an unique ID to both table defined as int corresponding to each seq_ID, and then do the join on this ID rather than on Seq_ID (join on varchar is far from the fastest solution :)) (unless seq_ID could be converted into int directly ?) (but it takes time, even for me (bi athlon MP 2200+) : mysql SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN searchjoinhardwarefr7 ON searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse; +--+ | COUNT(*) | +--+ | 39396361 | +--+ 1 row in set (3 min 23.15 sec) mysql EXPLAIN SELECT COUNT(*) FROM searchmainhardwarefr7 LEFT JOIN searchjoinhardwarefr7 ON searchjoinhardwarefr7.numreponse=searchmainhardwarefr7.numreponse; ++-+---++---+--- -+-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++---+--- -+-+--+--+-+ | 1 | SIMPLE | searchmainhardwarefr7 | index | NULL | numreponse | 4 | NULL | 39396576 | Using index | | 1 | SIMPLE | searchjoinhardwarefr7 | eq_ref | numreponse| numreponse | 4 | searchmainhardwarefr7.numreponse |1 | Using index | ++-+---++---+--- -+-+--+--+-+ ) Regards, Jocelyn - Original Message - From: Qunfeng Dong [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, December 18, 2002 9:17 PM Subject: Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-) Boy, you guys are die-hard MySQL fans :-) I think your strong defending convinced us MySQL can handle 120 million records :-) But I know some ordinary users out there like me who are not experts on tuning the MySQL performance (they did send me private emails saying they encountered the similar slow join problem). So please help us to keep the faith. We are trying to develop a simple biology database to maintain some DNA Sequence information. My problem is coming from the following two tables: CREATE TABLE NewSequence ( Seq_ID varchar(50) NOT NULL, GenBank_Acc varchar(10), Organismvarchar(50) NOT NULL, Seq_Type enum(EST,GSS,EST Contig,EST Singlet,GSS Contig,GSS Singlet,GSS Plasmid Contig,Protein) NOT NULL, Seq_Length int NOT NULL, Seq_Title textNOT NULL, Comment text, Entry_Date dateNOT NULL, PRIMARY KEY (Seq_ID), UNIQUE (GenBank_Acc), INDEX (Seq_Type), INDEX (Organism) ); This NewSequence table is used to track some general info about sequence. Notice I have to use text datatype to describe Comment and Seq_Title fields; therefore I have to use varchar for other string fields. In addition, the Seq_ID is not numerical. BTW, I found indexing on Seq_Type. Organism which are very repeative still helps with accessing. This table has 2676711 rows. CREATE TABLE NewSequence_Homolog ( Seq_ID varchar(50) NOT NULL, Homolog_PID int NOT NULL, Homolog_Descvarchar(50) NOT NULL, Homolog_Species varchar(50), PRIMARY KEY (Seq_ID, Homolog_PID) ); This NewSequence_Homolog table is to track which protein sequences (homolog) are similar to the sequence I store in the NewSequence table. This table has 997654 rows. mysql select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +--+ | count(*) | +--+ | 3292029 | +--+ 1 row in set (1 min 30.50 sec) So a simple left join took about 1 min and half. First, is this slow or I am too picky? This is the Explain. mysql explain select count(*) from NewSequence s left join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; +---+---+---+-+-+--+-+-- ---+ | table | type | possible_keys | key | key_len | ref | rows| Extra | +---+---+---+-+-+--+-+-- ---+ | s | index | NULL | PRIMARY | 50 | NULL | 2676711 | Using index | | h | ref | PRIMARY | PRIMARY | 50 | s.Seq_ID |9976 | Using index | +---+---+---+-+-+--+-+-- ---+ I am running MySQL 3.23.49 on RedHat linux 7.3 on a dedicated server with 4 GB memory. The only setting I changed is to copy
Re: Can MySQL handle 120 million records? - Ok, If you guys really can handle tens of millions records, you have to help me to enjoy MySQL too :-)
- Original Message - From: Qunfeng Dong [EMAIL PROTECTED] We are trying to develop a simple biology database to maintain some DNA Sequence information. My problem is coming from the following two tables: snip Making indexes smaller will help. Does it need to be varchar(50)? Also, I'd consider creating a numeric auto_increment primary key on your NewSequence table, and using it to relate the 2 tables together. It may make for some more complex SQL statements to describe the relationship, but you'll gain the time back in performance. Consider using the following. CREATE TABLE NewSequence ( id int(11) not null auto_increment, Seq_ID varchar(50) NOT NULL, GenBank_Acc varchar(10), Organismvarchar(50) NOT NULL, Seq_Type enum(EST,GSS,EST Contig,EST Singlet,GSS Contig,GSS Singlet,GSS Plasmid Contig,Protein) NOT NULL, Seq_Length int NOT NULL, Seq_Title textNOT NULL, Comment text, Entry_Date dateNOT NULL, PRIMARY KEY (id), UNIQUE (Seq_ID), UNIQUE (GenBank_Acc), INDEX (Seq_Type), INDEX (Organism) ); CREATE TABLE NewSequence_Homolog ( id int(11) NOT NULL, Homolog_PID int NOT NULL, Homolog_Descvarchar(50) NOT NULL, Homolog_Species varchar(50), PRIMARY KEY (id, Homolog_PID) ); This would make your example query: select count(*) from NewSequence s left join NewSequence_Homolog h on s.id = h.id; And this would run much quicker, as instead of searching through 50 character indexes for each table, it would only have to look at 11 digit indexes. Much quicker. Ryan sql, query and stuff - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php