Hi Jeroen,

Have you tried creating a new table by selecting the data that you want and
then removing the old table and replacing it with the new one.

Eg.

CREATE TABLE new_g2d (
        <the precise definition for your g2d table goes here i imagine something
like the following>
        chr char(10) NOT NULL DEFAULT '',
        start int NOT NULL DEFAULT 0,
        stop int NOT NULL DEFAULT 0,
        gene_name char(64) NOT NULL DEFAULT '',
        score decimal(6,3) NOT NULL DEFAULT 000.000,
        strand char(5) NOT NULL DEFAULT '-'
) SELECT
        chr,
        MIN(start),
        MAX(stop),
        gene_name,
        AVG(score),
        strand
FROM
        g2d
GROUP BY gene_name;

ALTER TABLE g2d RENAME TO g2d_old;

ALTER TABLE new_g2d RENAME TO g2d;


When you are satisfied that your new g2d table is what you want you can
safely drop your g2d_old table.

HTH,
Lachlan


-----Original Message-----
From: Jeroen Van Goey [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 11 August 2004 9:16 PM
To: [EMAIL PROTECTED]
Subject: removing duplicates and giving average of other field.


Hi again,

thanks for the quick reply on my previous question, the solution given
by Michael Stassen worked like a charm. But now I've again run into a
situation where I'm stuck with my limited knowledge of MySQL.

Given the table:
mysql> select * from g2d;
+-------+----------+----------+-----------------+---------+--------+
| chr   | start    | stop     | gene_name       | score   | strand |
+-------+----------+----------+-----------------+---------+--------+
| chr13 | 58214248 | 58214733 | ENSG00000139734 | 774.518 | -      |
| chr13 | 58214732 | 58215034 | ENSG00000139734 | 774.518 | -      |
| chr13 | 58215238 | 58215432 | ENSG00000139734 | 774.518 | -      |
| chr13 | 58215021 | 58215182 | ENSG00000139734 | 774.518 | -      |
| chr13 | 58214732 | 58215031 | ENSG00000139734 | 755.930 | -      |
| chr13 | 58214296 | 58214727 | ENSG00000139734 | 755.930 | -      |
| chr13 | 58215045 | 58215098 | ENSG00000139734 | 755.930 | -      |
| chr13 | 65481104 | 65481424 | ENSG00000184226 | 896.699 | -      |
| chr13 | 65481119 | 65481424 | ENSG00000184226 | 809.759 | -      |
| chr13 | 65481116 | 65481355 | ENSG00000184226 | 809.759 | -      |
| chr13 | 65481113 | 65481421 | ENSG00000184226 | 808.113 | -      |
| chr13 | 65481092 | 65481424 | ENSG00000184226 | 796.108 | -      |
| chr13 | 65481110 | 65481424 | ENSG00000184226 | 796.011 | -      |
| chr13 | 65481098 | 65481358 | ENSG00000184226 | 796.011 | -      |
....

I would like to reduce the table, by combining all the records with
the same gene_name. The score field should then give the average of
all the scores for that gene_name.

The table above would become then:

mysql> select * from g2d;
+-------+----------+----------+-----------------+---------+--------+
| chr   | start    | stop     | gene_name       | score   | strand |
+-------+----------+----------+-----------------+---------+--------+
| chr13 | 58214248 | 58215432 | ENSG00000139734 | 766.551 | -      |
| chr13 | 65481092 | 65481424 | ENSG00000184226 | 819.408 | -      |

where 766.551 = (((774.518 * 4) + (755.930 *3)) / 7)
 and 819.408 = ((896.699 + (809.759 *2) + 808.113 + 796.108 + 796.011) / 6)

If there's also an easy way to make the start field the lowest value
found for that gene_name, and the stop field the highest value, that
would be nice, but it isn't that important for my application, so
don't bother to much with it.

Thanks in advance, I appreciate your input very much.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to