Well, thanks to all of your great help! I am able to speed up the query {select count(*) from NEW_Sequence s left join NEW_Sequence_Homolog h on s.Seq_ID = h.Seq_ID;} from 1 min 52.61 sec down to 20.62 sec now. The only thing I changed so far was the Seq_ID from type varchar to bigint. The Seq_ID was not all numerical for different type of Sequences; but I managed to assign numerical code to those non-numerical ones now.
Qunfeng > > CREATE TABLE NewSequence > > ( > > Seq_ID varchar(50) NOT NULL, > > GenBank_Acc varchar(10), > > Organism varchar(50) NOT NULL, > > Seq_Type enum("EST","GSS","EST > Contig","EST > > Singlet","GSS Contig","GSS Singlet","GSS Plasmid > > Contig","Protein") NOT NULL, > > Seq_Length int NOT NULL, > > Seq_Title text NOT NULL, > > Comment text, > > Entry_Date date NOT NULL, > > PRIMARY KEY (Seq_ID), > > UNIQUE (GenBank_Acc), > > INDEX (Seq_Type), > > INDEX (Organism) > > ); > > > > This NewSequence table is used to track some > general > > info about sequence. Notice I have to use text > > datatype to describe "Comment" and "Seq_Title" > fields; > > therefore I have to use varchar for other string > > fields. In addition, the Seq_ID is not numerical. > > BTW, I found indexing on Seq_Type. Organism which > are > > very repeative still helps with accessing. This > table > > has 2676711 rows. > > > > > > CREATE TABLE NewSequence_Homolog > > ( > > Seq_ID varchar(50) NOT NULL, > > Homolog_PID int NOT NULL, > > Homolog_Desc varchar(50) NOT NULL, > > Homolog_Species varchar(50), > > PRIMARY KEY (Seq_ID, Homolog_PID) > > ); > > > > This NewSequence_Homolog table is to track which > > protein sequences (homolog) are similar to the > > sequence I store in the NewSequence table. This > table > > has 997654 rows. > > > > mysql> select count(*) from NewSequence s left > join > > NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; > > +----------+ > > | count(*) | > > +----------+ > > | 3292029 | > > +----------+ > > 1 row in set (1 min 30.50 sec) > > > > So a simple left join took about 1 min and half. > > First, is this slow or I am too picky? > > > > This is the "Explain". > > mysql> explain select count(*) from NewSequence s > left > > join NewSequence_Homolog h on s.Seq_ID = h.Seq_ID; > > > +-------+-------+---------------+---------+---------+----------+---------+-- > -----------+ > > | table | type | possible_keys | key | > key_len | > > ref | rows | Extra | > > > +-------+-------+---------------+---------+---------+----------+---------+-- > -----------+ > > | s | index | NULL | PRIMARY | > 50 | > > NULL | 2676711 | Using index | > > | h | ref | PRIMARY | PRIMARY | > 50 | > > s.Seq_ID | 9976 | Using index | > > > +-------+-------+---------------+---------+---------+----------+---------+-- > -----------+ > > > > > > I am running MySQL 3.23.49 on RedHat linux 7.3 on > a > > dedicated server with 4 GB memory. The only > setting I > > changed is to copy the my-huge.cnf into > /etc/my.cnf. > > > > Qunfeng > > > > --- "Michael T. Babcock" <[EMAIL PROTECTED]> > > wrote: > > > Qunfeng Dong wrote: > > > > > > >not-so-good performance (join on tables much > > > smaller > > > >than yours takes minutes even using index) and > I > > > seem > > > >to read all the docs I could find on the web > about > > > how > > > >to optimize but they are not working for me (I > am > > > > > > > > > > Have you stored a slow query log to run them > through > > > 'explain' and see > > > why they're slow? Do you want to post some of > them > > > here so we can > > > suggest what might be done to make them faster? > === message truncated === __________________________________________________ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php