Hi I am having some trouble figuring out how mySQL interprets the group by clause in a particular set of data that I have (the ensembl database if anyone is familar with it!)
Now, I have some data like this: +---------+------------+----------+ | gene_id | display_id | db_name | +---------+------------+----------+ | 24173 | Q9H701 | SPTREMBL | | 24173 | Q96GS5 | SPTREMBL | +---------+------------+----------+ and this is produced by the following SQL: select t.gene_id, x.display_id,e.db_name from ensembl_core_test.objectXref as ox, ensembl_core_test.Xref as x, ensembl_core_test.transcript as t, ensembl_core_test.externalDB as e where e.db_name='SPTREMBL' and x.xrefID = ox.xrefID and t.translation_id = ox.ensembl_id and e.externalDBId=x.externalDBId and gene_id =24173; Now, the SQL is not important, what is is that we have two display_ids for one gene_id. Now, if we add a "group by gene_id" clause into the above SQL, then presumably mySQL must make an arbitrary decision on which display_id to choose. And the odd thing is that in the main ensembl database it chooses one, and in my local copy it chooses the other! So what I want to figure out is how mySQL makes that arbitrary decision - is it based on which it comes across first in memory, which it comes across last, alphabetical order, random choice (though mySQL is always consistent in which it chooses) ... or is there some other way it will make the decision? Furthermore, and more confusingly, mySQL chooses differently if I parameterise the SQL. For example: mysql> create table test -> select t.gene_id, x.display_id,e.db_name -> from ensembl_core_test.objectXref as ox, -> ensembl_core_test.Xref as x, -> ensembl_core_test.transcript as t, -> ensembl_core_test.externalDB as e -> where e.db_name='SPTREMBL' and -> x.xrefID = ox.xrefID and -> t.translation_id = ox.ensembl_id and -> e.externalDBId=x.externalDBId and gene_id = 24173 -> group by gene_id -> order by gene_id; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> select * from test; +---------+------------+----------+ | gene_id | display_id | db_name | +---------+------------+----------+ | 24173 | Q9H701 | SPTREMBL | +---------+------------+----------+ 1 row in set (0.00 sec) compare this to: mysql> create table test -> select t.gene_id, x.display_id,e.db_name -> from ensembl_core_test.objectXref as ox, -> ensembl_core_test.Xref as x, -> ensembl_core_test.transcript as t, -> ensembl_core_test.externalDB as e -> where e.db_name='SPTREMBL' and -> x.xrefID = ox.xrefID and -> t.translation_id = ox.ensembl_id and -> e.externalDBId=x.externalDBId -> group by gene_id -> order by gene_id; Query OK, 11674 rows affected (6.84 sec) Records: 11674 Duplicates: 0 Warnings: 0 mysql> select * from test where gene_id = 24173; +---------+------------+----------+ | gene_id | display_id | db_name | +---------+------------+----------+ | 24173 | Q96GS5 | SPTREMBL | +---------+------------+----------+ 1 row in set (0.01 sec) So here we see that mySQL has chosen differently simply because of the presence or absence of the gene_id = 24173 in the create table command I want to try and figure out why mySQL is making these rather inconsistant decisions and see if it's possible to remove this "feature" Thanks for your time, if there is another mailing list which is more appropriate, please tell me! Thanks Mick --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php