Hi,
Sorry for the late reply, out for the week-end!

Here is the information you asked for (I modified the columns' type as you suggested)
mysql> show create table matches\G
*************************** 1. row ***************************
Table: matches
Create Table: CREATE TABLE `matches` (
`protID` varchar(50) default NULL,
`drugID` int(11) default NULL,
`sentID` int(11) default NULL,
KEY `sentenceId` (`sentID`),
KEY `drugID` (`drugID`),
KEY `protID` (`protID`)
) TYPE=MyISAM
1 row in set (0.00 sec)


mysql> show create table sentence\G
*************************** 1. row ***************************
      Table: sentence
Create Table: CREATE TABLE `sentence` (
 `id` int(10) unsigned NOT NULL auto_increment,
 `text` text,
 `pmid` int(11) default NULL,
 PRIMARY KEY  (`id`)
) TYPE=MyISAM
1 row in set (0.00 sec)

mysql> show create table synonyms\G
*************************** 1. row ***************************
      Table: synonyms
Create Table: CREATE TABLE `synonyms` (
 `nameID` varchar(50) default NULL,
 `syn` text,
 UNIQUE KEY `c` (`nameID`(20),`syn`(20))
) TYPE=MyISAM
1 row in set (0.01 sec)

I think I used int as much as possible,here some data samples:

INSERT INTO `matches` VALUES ('Q9UP51', 202531, 4);
INSERT INTO `matches` VALUES ('SVC2_HUMAN', 202037, 5);
INSERT INTO `matches` VALUES ('PF2R_SHEEP', 202096, 6);
INSERT INTO `matches` VALUES ('CAQS_RAT', 202037, 7);


INSERT INTO `sentence` VALUES (4, 'Of <NP>10 contraceptives </NP>tested , <NP><span class=\'drug\'>Ortho-Gynol</span> </NP>was found to be the most rapidly acting , followed by <NP>Cooper Creme </NP>, <a href=\'http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[UNIPROT-acc:(Q9UP51)]+-id+008+-view+UniprotDateView\'>Lactikol- B</a>\n, <NP>Vagi-Serol </NP>, <NP>Marvosan </NP>, <NP>Clinicol </NP>, <NP>Jelly-X </NP>, <NP>Bor-Oxyquin </NP>, <NP>Cellikol </NP>, and <NP>Lanteen Blue Jelly </NP>in <NP>that order </NP>. ', 12305459);


(the text is usually longer than 255 characters, so I think text is the only choice, except longtext which can only be worst for indexation in my opinion)

INSERT INTO `synonyms` VALUES ('202037', 'testosterone');
INSERT INTO `synonyms` VALUES ('Q91XV1', 'gonadotropin');
INSERT INTO `synonyms` VALUES ('202037', 'estradiol');
INSERT INTO `synonyms` VALUES ('Q91XV1', 'gonadotropin-suppressing');

I tried to run the query you gave me (with temporary tables) it is still too long so I guess that my only solution now is to use indexes.
When modifying the columns' type from text to varchar, even if the text was indexed as unique mysql complained about duplicates, and actually there were few duplicates, I thought it was as you said because only the beginning of the text field is indexed but I had 2 cases were the names were short (less than 15 characters) and the same (even the spaces).Removing one and the index on varchar was created. Any idea why?


I will try to play with the indexes, if I understand well I'd better index the three columns in once, because that will automatically index each of them?or am I wrong?

Anyway, thank you for all your advices, I'm really learning a lot of things with that case!
Melanie






From: [EMAIL PROTECTED]
To: "mel list_php" <[EMAIL PROTECTED]>
CC: mysql@lists.mysql.com
Subject: Re: help on query/group by
Date: Fri, 18 Mar 2005 12:43:06 -0500

"mel list_php" <[EMAIL PROTECTED]> wrote on 03/18/2005 10:35:30 AM:

> Hi again,
>
> Thanks for the explanation about the join and the group by.
>
> I wanted to test your query (almost a simple copy/paste :-)) ).
>
> The first 2 queries are ok, but the third one still is too long :
> mysql> CREATE  TEMPORARY  TABLE tmpSynDrug(  KEY ( nameID (20))  )
SELECT
> nameID, max( syn )  AS drugSyn
>     -> FROM synonyms
>     -> WHERE syn
>     -> LIKE  'a%'
>     -> GROUP  BY nameID;
> Query OK, 9693 rows affected (1.07 sec)
> Records: 9693  Duplicates: 0  Warnings: 0
>
> mysql>
> mysql> CREATE TEMPORARY TABLE tmpSynProt (key (nameID(20)))
>     -> SELECT nameID, max(syn) as protSyn
>     -> FROM synonyms
>     -> WHERE syn LIKE 'a%'
>     -> GROUP BY nameID;
> Query OK, 9693 rows affected (1.03 sec)
> Records: 9693  Duplicates: 0  Warnings: 0
>
> mysql>
> mysql> CREATE  TEMPORARY  TABLE tmpMatch(  KEY ( sentID)  ) SELECT
m.sentID,
> m.drugID, m.protID, tsd.drugSyn, tsp.protSyn
>     -> FROM matches m
>     -> INNER  JOIN tmpSynDrug tsd ON tsd.nameID = m.drugID
>     -> INNER  JOIN tmpSynProt tsp ON tsp.nameID = m.protID;
>
>
> I've tried once this morning (GMT time), it ran for one hour nothing I
> killed mysql (btw, how can I kill only one query when mysql "freezes"?I
> aborted but then mysqladmin shutdown didn't work anymore..)
> I thought it was maybe because I have other heavy stuff running, but I
tried
> a second time now and it's been running for 2 hours now, with almost
nothing
> else on the desktop.
> So apparently the join between matches and the other tables is still too

> heavy....
> Any idea?
<snip>

OK, I reviewed what you have posted so far and I found a performance
killer.  On the table "matches", the columns protID and drugID are
declared as text. This is "bad" for searching as you can only index the
first portion of any text column. Those columns should be declared as CHAR
 or VARCHAR or better yet, some integer value. If all 3 columns in the
"matches" table are integers (INT or  BIGINT, preferably UNSIGNED) then
this becomes a fixed-width table and lookups become exceedingly fast. All
indexes on those columns also become number-based and numeric comparisons
occur *much* faster than string comparisons. I very rarely use non-numeric
primary keys for just this reason.

You mentioned there were indexes on the table and provided the output of
DESC for the table  but DESC does a very poor job of actually describing
indexes. I prefer the output of SHOW CREATE TABLE xxxx\G  as it gives me a
complete table creation statement(Use /G and not ; to eliminate a lot of
excess formatting in the output). Can you generate that for me, please? I
practically guarantee that if we re-tool that table (including the
indexes), our query times will drop like rocks.

You can kill a single query through the commands SHOW [FULL] PROCESSLIST
(to identify the # of the process you want to kill) and KILL # (using the
# you just looked up). This usually drops the connection to the client
running the query you killed, too (so be prepared to reconnect).

http://dev.mysql.com/doc/mysql/en/show-processlist.html
http://dev.mysql.com/doc/mysql/en/kill.html

I just realized that we are only querying for the search conditions DRUG
like 'a%' AND PROTIEN like 'a%'. If we wanted to do the OR of that (DRUG
like ... OR Protien like ...) we could use a UNION query to generate
tmpMatch

CREATE TEMPORARY TABLE tmpMatch (key sentID)
(SELECT m.sentID, m.drugID, m.protID, tsd.drugSyn, tsp.protSyn
FROM matches m
INNER JOIN tmpSynDrugs tsd
        ON tsd.nameID = m.drugID
INNER JOIN tmpSynProt tsp
        ON tsp.nameID = m.protID)
UNION
(SELECT m.sentID, m.drugID, m.protID, tsd.drugSyn, tsp.protSyn
FROM matches m
INNER JOIN tmpSynDrugs tsd
        ON tsd.nameID = m.drugID
INNER JOIN tmpSynProt tsp
        ON tsp.nameID = m.protID);

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

_________________________________________________________________
It's fast, it's easy and it's free. Get MSN Messenger today! http://www.msn.co.uk/messenger



-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to