Hi

I am having some trouble figuring out how mySQL interprets the group by
clause in a particular set of data that I have (the ensembl database if
anyone is familar with it!)

Now, I have some data like this:

+---------+------------+----------+
| gene_id | display_id | db_name  |
+---------+------------+----------+
|   24173 | Q9H701     | SPTREMBL |
|   24173 | Q96GS5     | SPTREMBL |
+---------+------------+----------+

and this is produced by the following SQL:

 select t.gene_id, x.display_id,e.db_name
      from ensembl_core_test.objectXref     as ox,
            ensembl_core_test.Xref           as x,
            ensembl_core_test.transcript     as t,
            ensembl_core_test.externalDB     as e
     where e.db_name='SPTREMBL' and
            x.xrefID         = ox.xrefID and
           t.translation_id = ox.ensembl_id and
           e.externalDBId=x.externalDBId and gene_id =24173;

Now, the SQL is not important, what is is that we have two display_ids
for one gene_id.  Now, if we add a "group by gene_id" clause into the
above SQL, then presumably mySQL must make an arbitrary decision on
which display_id to choose.  And the odd thing is that in the main
ensembl database it chooses one, and in my local copy it chooses the
other!

So what I want to figure out is how mySQL makes that arbitrary decision
- is it based on which it comes across first in memory, which it comes
across last, alphabetical order, random choice (though mySQL is always
consistent in which it chooses) ... or is there some other way it will
make the decision?

Furthermore, and more confusingly, mySQL chooses differently if I
parameterise the SQL.  For example:

mysql> create table test
    -> select t.gene_id, x.display_id,e.db_name
    ->       from ensembl_core_test.objectXref     as ox,
    ->            ensembl_core_test.Xref           as x,
    ->            ensembl_core_test.transcript     as t,
    ->       ensembl_core_test.externalDB     as e
    ->      where e.db_name='SPTREMBL' and
    ->            x.xrefID         = ox.xrefID and
    ->            t.translation_id = ox.ensembl_id and
    ->            e.externalDBId=x.externalDBId and gene_id = 24173
    ->   group by gene_id
    ->   order by gene_id;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test;
+---------+------------+----------+
| gene_id | display_id | db_name  |
+---------+------------+----------+
|   24173 | Q9H701     | SPTREMBL |
+---------+------------+----------+
1 row in set (0.00 sec)


compare this to:


mysql> create table test
    -> select t.gene_id, x.display_id,e.db_name
    ->       from ensembl_core_test.objectXref     as ox,
    ->            ensembl_core_test.Xref           as x,
    ->            ensembl_core_test.transcript     as t,
    ->       ensembl_core_test.externalDB     as e
    ->      where e.db_name='SPTREMBL' and
    ->            x.xrefID         = ox.xrefID and
    ->            t.translation_id = ox.ensembl_id and
    ->            e.externalDBId=x.externalDBId
    ->   group by gene_id
    ->   order by gene_id;
Query OK, 11674 rows affected (6.84 sec)
Records: 11674  Duplicates: 0  Warnings: 0

mysql> select * from test where gene_id = 24173;
+---------+------------+----------+
| gene_id | display_id | db_name  |
+---------+------------+----------+
|   24173 | Q96GS5     | SPTREMBL |
+---------+------------+----------+
1 row in set (0.01 sec)


So here we see that mySQL has chosen differently simply because of the
presence or absence of the gene_id = 24173 in the create table command

I want to try and figure out why mySQL is making these rather
inconsistant decisions and see if it's possible to remove this "feature"

Thanks for your time, if there is another mailing list which is more
appropriate, please tell me!

Thanks
Mick


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to