It's ok, I do actually fully understand relational database theory and what the
group by statement is used for, but like I say, I personally have no control
over the database or the SQL :-)

I was hoping to get hold of someone who maybe knows the source code quite well
already and could  help me out before I take the drastic step of trawling
through the source code myself.

It does strike me that, on the rare occasions when a truly arbitrary decision
must be made, when there is a choice but it really doesn't matter which is
chosen, there is no method in relational databases to make this choice in a
consistent manner.  A tough task probably, and obviously group by is not the
perfect answer, but you can see why the developer made that choice as it does
make mySQL make an arbitrary decision... just not a consistent one :-)

Thanks

Mick

Richard Emery wrote:

> As a relational database, it doe not matter the order in which data are
> stored.
>
> Rather, what matters are the methods by which data are extracted via the
> SELECT statement.  In order to derive benefits of GROUP BY, you must use it
> with SELECTs that use GROUP BY functionality, such as SUM(), COUNT().
>
> If you need to know the internal machinations of mysql, I can only suggest
> that you study the source code.
> ----- Original Message -----
> From: Mick Watson <[EMAIL PROTECTED]>
> To: Richard Emery <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 7:34 AM
> Subject: Re: group by help?
>
> Thank you :-)
>
> I do realise the SQL is not being used as it should be, but as it is not my
> database and nor is it my SQL, there's not much I can do to really change
> either of these.... I merely have to work with a database that has been
> produced by a third party group, and I must say at this point that I have
> nothing but praise for the ensembl group for the difficult job that they
> have
> done very well (see http://www.ensembl.org)
>
> So, I realise this is not the problem of mySQL, but what I want to try and
> find out is really the internal workings of mySQL when it performs this
> operation so that I can understand why my copy of ensembl, which is produced
> from direct dumps of the main ensembl, behaves differently to the main
> ensembl.
>
> I guess from what you are saying, that the data is stored randomly, that
> there
> is very little I can do to actually make my database behave the same as the
> main database?  Could it be affected in any way by operating system and/or
> file system?  Superficially the data is organised in exactly the same way in
> both databases, but I have no doubt that things like memory locations are
> completely different, but possibly if I could understand what the variables
> are that affect this behaviour I could minimise the inconsistancy...?
>
> Thanks for your time
>
> Mick
>
> Richard Emery wrote:
>
> > mysql is acting correctly.
> >
> > GROUP BY is used to consolidate data for SUMming, COUNTing, etc.  Your
> > SELECT statement makes not such request.  You have simply requested the
> > value of a specific field.  Data are stored in mysql databases randomly.
> > Therefore, when you request a field's data, you are getting whatever is
> > first in the list of records matching your WHERE clause.
> >
> > Bottom line: you are NOT using GROUP BY as it is supposed to be used.
> Your
> > SQL is in error, not mysql.
> >
> > hope this helps...
> > ----- Original Message -----
> > From: Mick Watson <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, April 19, 2002 6:01 AM
> > Subject: group by help?
> >
> > Hi
> >
> > I am having some trouble figuring out how mySQL interprets the group by
> > clause in a particular set of data that I have (the ensembl database if
> > anyone is familar with it!)
> >
> > Now, I have some data like this:
> >
> > +---------+------------+----------+
> > | gene_id | display_id | db_name  |
> > +---------+------------+----------+
> > |   24173 | Q9H701     | SPTREMBL |
> > |   24173 | Q96GS5     | SPTREMBL |
> > +---------+------------+----------+
> >
> > and this is produced by the following SQL:
> >
> >  select t.gene_id, x.display_id,e.db_name
> >       from ensembl_core_test.objectXref     as ox,
> >             ensembl_core_test.Xref           as x,
> >             ensembl_core_test.transcript     as t,
> >             ensembl_core_test.externalDB     as e
> >      where e.db_name='SPTREMBL' and
> >             x.xrefID         = ox.xrefID and
> >            t.translation_id = ox.ensembl_id and
> >            e.externalDBId=x.externalDBId and gene_id =24173;
> >
> > Now, the SQL is not important, what is is that we have two display_ids
> > for one gene_id.  Now, if we add a "group by gene_id" clause into the
> > above SQL, then presumably mySQL must make an arbitrary decision on
> > which display_id to choose.  And the odd thing is that in the main
> > ensembl database it chooses one, and in my local copy it chooses the
> > other!
> >
> > So what I want to figure out is how mySQL makes that arbitrary decision
> > - is it based on which it comes across first in memory, which it comes
> > across last, alphabetical order, random choice (though mySQL is always
> > consistent in which it chooses) ... or is there some other way it will
> > make the decision?
> >
> > Furthermore, and more confusingly, mySQL chooses differently if I
> > parameterise the SQL.  For example:
> >
> > mysql> create table test
> >     -> select t.gene_id, x.display_id,e.db_name
> >     ->       from ensembl_core_test.objectXref     as ox,
> >     ->            ensembl_core_test.Xref           as x,
> >     ->            ensembl_core_test.transcript     as t,
> >     ->       ensembl_core_test.externalDB     as e
> >     ->      where e.db_name='SPTREMBL' and
> >     ->            x.xrefID         = ox.xrefID and
> >     ->            t.translation_id = ox.ensembl_id and
> >     ->            e.externalDBId=x.externalDBId and gene_id = 24173
> >     ->   group by gene_id
> >     ->   order by gene_id;
> > Query OK, 1 row affected (0.00 sec)
> > Records: 1  Duplicates: 0  Warnings: 0
> >
> > mysql> select * from test;
> > +---------+------------+----------+
> > | gene_id | display_id | db_name  |
> > +---------+------------+----------+
> > |   24173 | Q9H701     | SPTREMBL |
> > +---------+------------+----------+
> > 1 row in set (0.00 sec)
> >
> > compare this to:
> >
> > mysql> create table test
> >     -> select t.gene_id, x.display_id,e.db_name
> >     ->       from ensembl_core_test.objectXref     as ox,
> >     ->            ensembl_core_test.Xref           as x,
> >     ->            ensembl_core_test.transcript     as t,
> >     ->       ensembl_core_test.externalDB     as e
> >     ->      where e.db_name='SPTREMBL' and
> >     ->            x.xrefID         = ox.xrefID and
> >     ->            t.translation_id = ox.ensembl_id and
> >     ->            e.externalDBId=x.externalDBId
> >     ->   group by gene_id
> >     ->   order by gene_id;
> > Query OK, 11674 rows affected (6.84 sec)
> > Records: 11674  Duplicates: 0  Warnings: 0
> >
> > mysql> select * from test where gene_id = 24173;
> > +---------+------------+----------+
> > | gene_id | display_id | db_name  |
> > +---------+------------+----------+
> > |   24173 | Q96GS5     | SPTREMBL |
> > +---------+------------+----------+
> > 1 row in set (0.01 sec)
> >
> > So here we see that mySQL has chosen differently simply because of the
> > presence or absence of the gene_id = 24173 in the create table command
> >
> > I want to try and figure out why mySQL is making these rather
> > inconsistant decisions and see if it's possible to remove this "feature"
> >
> > Thanks for your time, if there is another mailing list which is more
> > appropriate, please tell me!
> >
> > Thanks
> > Mick
> >
> > ---------------------------------------------------------------------
> > Before posting, please check:
> >    http://www.mysql.com/manual.php   (the manual)
> >    http://lists.mysql.com/           (the list archive)
> >
> > To request this thread, e-mail <[EMAIL PROTECTED]>
> > To unsubscribe, e-mail
> > <[EMAIL PROTECTED]>
> > Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
> <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to