Thank you :-)

I do realise the SQL is not being used as it should be, but as it is not my
database and nor is it my SQL, there's not much I can do to really change
either of these.... I merely have to work with a database that has been
produced by a third party group, and I must say at this point that I have
nothing but praise for the ensembl group for the difficult job that they have
done very well (see http://www.ensembl.org)

So, I realise this is not the problem of mySQL, but what I want to try and
find out is really the internal workings of mySQL when it performs this
operation so that I can understand why my copy of ensembl, which is produced
from direct dumps of the main ensembl, behaves differently to the main
ensembl.

I guess from what you are saying, that the data is stored randomly, that there
is very little I can do to actually make my database behave the same as the
main database?  Could it be affected in any way by operating system and/or
file system?  Superficially the data is organised in exactly the same way in
both databases, but I have no doubt that things like memory locations are
completely different, but possibly if I could understand what the variables
are that affect this behaviour I could minimise the inconsistancy...?

Thanks for your time

Mick

Richard Emery wrote:

> mysql is acting correctly.
>
> GROUP BY is used to consolidate data for SUMming, COUNTing, etc.  Your
> SELECT statement makes not such request.  You have simply requested the
> value of a specific field.  Data are stored in mysql databases randomly.
> Therefore, when you request a field's data, you are getting whatever is
> first in the list of records matching your WHERE clause.
>
> Bottom line: you are NOT using GROUP BY as it is supposed to be used.  Your
> SQL is in error, not mysql.
>
> hope this helps...
> ----- Original Message -----
> From: Mick Watson <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 6:01 AM
> Subject: group by help?
>
> Hi
>
> I am having some trouble figuring out how mySQL interprets the group by
> clause in a particular set of data that I have (the ensembl database if
> anyone is familar with it!)
>
> Now, I have some data like this:
>
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q9H701     | SPTREMBL |
> |   24173 | Q96GS5     | SPTREMBL |
> +---------+------------+----------+
>
> and this is produced by the following SQL:
>
>  select t.gene_id, x.display_id,e.db_name
>       from ensembl_core_test.objectXref     as ox,
>             ensembl_core_test.Xref           as x,
>             ensembl_core_test.transcript     as t,
>             ensembl_core_test.externalDB     as e
>      where e.db_name='SPTREMBL' and
>             x.xrefID         = ox.xrefID and
>            t.translation_id = ox.ensembl_id and
>            e.externalDBId=x.externalDBId and gene_id =24173;
>
> Now, the SQL is not important, what is is that we have two display_ids
> for one gene_id.  Now, if we add a "group by gene_id" clause into the
> above SQL, then presumably mySQL must make an arbitrary decision on
> which display_id to choose.  And the odd thing is that in the main
> ensembl database it chooses one, and in my local copy it chooses the
> other!
>
> So what I want to figure out is how mySQL makes that arbitrary decision
> - is it based on which it comes across first in memory, which it comes
> across last, alphabetical order, random choice (though mySQL is always
> consistent in which it chooses) ... or is there some other way it will
> make the decision?
>
> Furthermore, and more confusingly, mySQL chooses differently if I
> parameterise the SQL.  For example:
>
> mysql> create table test
>     -> select t.gene_id, x.display_id,e.db_name
>     ->       from ensembl_core_test.objectXref     as ox,
>     ->            ensembl_core_test.Xref           as x,
>     ->            ensembl_core_test.transcript     as t,
>     ->       ensembl_core_test.externalDB     as e
>     ->      where e.db_name='SPTREMBL' and
>     ->            x.xrefID         = ox.xrefID and
>     ->            t.translation_id = ox.ensembl_id and
>     ->            e.externalDBId=x.externalDBId and gene_id = 24173
>     ->   group by gene_id
>     ->   order by gene_id;
> Query OK, 1 row affected (0.00 sec)
> Records: 1  Duplicates: 0  Warnings: 0
>
> mysql> select * from test;
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q9H701     | SPTREMBL |
> +---------+------------+----------+
> 1 row in set (0.00 sec)
>
> compare this to:
>
> mysql> create table test
>     -> select t.gene_id, x.display_id,e.db_name
>     ->       from ensembl_core_test.objectXref     as ox,
>     ->            ensembl_core_test.Xref           as x,
>     ->            ensembl_core_test.transcript     as t,
>     ->       ensembl_core_test.externalDB     as e
>     ->      where e.db_name='SPTREMBL' and
>     ->            x.xrefID         = ox.xrefID and
>     ->            t.translation_id = ox.ensembl_id and
>     ->            e.externalDBId=x.externalDBId
>     ->   group by gene_id
>     ->   order by gene_id;
> Query OK, 11674 rows affected (6.84 sec)
> Records: 11674  Duplicates: 0  Warnings: 0
>
> mysql> select * from test where gene_id = 24173;
> +---------+------------+----------+
> | gene_id | display_id | db_name  |
> +---------+------------+----------+
> |   24173 | Q96GS5     | SPTREMBL |
> +---------+------------+----------+
> 1 row in set (0.01 sec)
>
> So here we see that mySQL has chosen differently simply because of the
> presence or absence of the gene_id = 24173 in the create table command
>
> I want to try and figure out why mySQL is making these rather
> inconsistant decisions and see if it's possible to remove this "feature"
>
> Thanks for your time, if there is another mailing list which is more
> appropriate, please tell me!
>
> Thanks
> Mick
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
> <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to