Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Tom Lane
Corey Tisdale co...@eyewantmedia.com writes:
 SELECT
   meaningful_data,
   event_type,
   event_date
 FROM
   event_log
 GROUP BY
   event_type
 ORDER BY
   event_date DESC

Is event_type a primary key, or at least a candidate key, for this
table?  (I would guess not based on the name.)

If it is, then the above is actually well-defined, because there is
only one possible input row for each group.  The GROUP BY is actually
kinda pointless in that case.

If it is not, then the above is *not* well-defined --- there are
multiple possible meaningful_data and event_date values for each
event_type value, and you have absolutely no idea which ones you
will get.  This is not allowed per SQL standard, and MySQL has
done you no service by failing to detect the ambiguity.

What you might be after is something like Postgres' DISTINCT ON
feature, which allows you to resolve the ambiguity by specifying
a sort order for the rows within each group (and then taking the
first row in each group).  See the weather reports example in
our SELECT reference page.

I have never really played around with this aspect of MySQL ...
but looking at this example, and presuming that you find that
it actually does something useful, I wonder whether they interpret
the combination of GROUP BY and ambiguous-per-spec ORDER BY
in some fashion similar to DISTINCT ON.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Sam Mason
On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
 What you might be after is something like Postgres' DISTINCT ON
 feature

Yup, looks that way to me as well.

 I have never really played around with this aspect of MySQL ...

Me neither.

 but looking at this example, and presuming that you find that
 it actually does something useful, I wonder whether they interpret
 the combination of GROUP BY and ambiguous-per-spec ORDER BY
 in some fashion similar to DISTINCT ON.

Yup, does look that way doesn't it.  It's still a weird pair of
semantics to conflate.

Hum, if they were assuming that you'd always have to implement GROUP BY
by doing a sort step first then I can see why they'd end up with this.
But if you want to do *anything* else (i.e. hash aggregate in PG) then
you want to keep the semantics of GROUP BY and ORDER BY separate as the
spec and indeed PG does.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Tom Lane
Sam Mason s...@samason.me.uk writes:
 On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
 but looking at this example, and presuming that you find that
 it actually does something useful, I wonder whether they interpret
 the combination of GROUP BY and ambiguous-per-spec ORDER BY
 in some fashion similar to DISTINCT ON.

 Yup, does look that way doesn't it.  It's still a weird pair of
 semantics to conflate.

I poked around in the MySQL 5.1 manual to see if this is true.
I think it isn't --- it says very clearly here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
that you simply get an arbitrary choice among the possible values
when you reference an ambiguous column.  It's possible that Corey's
query actually does give him the answers he wants, but apparently
it would be an implementation artifact that they're not promising
to maintain.

 Hum, if they were assuming that you'd always have to implement GROUP BY
 by doing a sort step first then I can see why they'd end up with this.

It's worse than that --- they actually are promising that GROUP BY
orders the results!  In
http://dev.mysql.com/doc/refman/5.1/en/select.html
I find

If you use GROUP BY, output rows are sorted according to the
GROUP BY columns as if you had an ORDER BY for the same
columns. To avoid the overhead of sorting that GROUP BY
produces, add ORDER BY NULL:

SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

MySQL extends the GROUP BY clause so that you can also specify
ASC and DESC after columns named in the clause:

SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

The first of these examples implies that they allow ORDER BY to override
the default GROUP BY sorting, which would mean that the ORDER BY sort
has to happen after the GROUP BY operation, unlike the approach we take
for DISTINCT ON.  So that means the ORDER BY *isn't* going to affect
which row gets chosen out of each event_type group.

What I am currently betting is that Corey's query does not really do
what he thinks it does in MySQL.  It probably is selecting a random
representative row in each group and then sorting on the basis of the
event_dates in those rows.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Corey Tisdale
We're coming from mysql 4, and changing the sort order changes the  
values of all columns as you would expect, given that you would expect  
a sort statement to affect grouping.  This certainly isn't the only  
time I've used this syntax. I've been mysql user for ten years, and  
the outcome has been consistant across hundreds of tables and millions  
of rows and thousands of queries. If you ever have to use or modify a  
mysql db, just keep this in mind in case it saves you some time.


That being said, we've discovered a few instances where docs were  
wrong, found numerous bugs with bitshifting and blob objects and cache  
usage and io buffering. We even sarted working on our own storage  
engine until we came to our senses and switched RDBMSeses.


5.1 has chased more than a few folks off, and rather than upgrade to  
it, we started porting to postgres. I didn't mean for my comparison to  
appearas a knock against postgres, merely to explain why I was having  
such a problem with such a simple issue. Thanks again for the help.


Corey Tisdale

On Oct 3, 2009, at 3:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Sam Mason s...@samason.me.uk writes:

On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:

but looking at this example, and presuming that you find that
it actually does something useful, I wonder whether they interpret
the combination of GROUP BY and ambiguous-per-spec ORDER BY
in some fashion similar to DISTINCT ON.



Yup, does look that way doesn't it.  It's still a weird pair of
semantics to conflate.


I poked around in the MySQL 5.1 manual to see if this is true.
I think it isn't --- it says very clearly here:
http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
that you simply get an arbitrary choice among the possible values
when you reference an ambiguous column.  It's possible that Corey's
query actually does give him the answers he wants, but apparently
it would be an implementation artifact that they're not promising
to maintain.

Hum, if they were assuming that you'd always have to implement  
GROUP BY
by doing a sort step first then I can see why they'd end up with  
this.


It's worse than that --- they actually are promising that GROUP BY
orders the results!  In
http://dev.mysql.com/doc/refman/5.1/en/select.html
I find

   If you use GROUP BY, output rows are sorted according to the
   GROUP BY columns as if you had an ORDER BY for the same
   columns. To avoid the overhead of sorting that GROUP BY
   produces, add ORDER BY NULL:

   SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

   MySQL extends the GROUP BY clause so that you can also specify
   ASC and DESC after columns named in the clause:

   SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;

The first of these examples implies that they allow ORDER BY to  
override

the default GROUP BY sorting, which would mean that the ORDER BY sort
has to happen after the GROUP BY operation, unlike the approach we  
take

for DISTINCT ON.  So that means the ORDER BY *isn't* going to affect
which row gets chosen out of each event_type group.

What I am currently betting is that Corey's query does not really do
what he thinks it does in MySQL.  It probably is selecting a random
representative row in each group and then sorting on the basis of the
event_dates in those rows.

   regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Martin Gainty

Most Database Administrators dont allow jpg/png/gifs into BLOB columns simply 
because its Run-length
encoding and MUCH easier to store the picture's link e.g. 
http://www.mywebsite.com/PictureOfFido.jpg

Oracle on the other hand can store multi-gb images into blobs then again you're 
paying for that 'luxury'

Also keep in mind Postgres is under BSD license so you're getting what you pay 
for
LegalStuff/

PostgreSQL Database Management System

(formerly known as Postgres, then as Postgres95)



Portions Copyright (c) 1996-2009, The PostgreSQL Global Development Group



Portions Copyright (c) 1994, The Regents of the University of California



Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.



IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.



THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
ON AN AS IS BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
/LegalStuff

what types of caching issues are you experencing?
Martin Gainty 
__ 
Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité
 
Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger 
sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung 
oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem 
Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. 
Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung 
fuer den Inhalt uebernehmen.
Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le 
destinataire prévu, nous te demandons avec bonté que pour satisfaire informez 
l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est 
interdite. Ce message sert à l'information seulement et n'aura pas n'importe 
quel effet légalement obligatoire. Étant donné que les email peuvent facilement 
être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité 
pour le contenu fourni.




 From: co...@eyewantmedia.com
 To: t...@sss.pgh.pa.us
 Subject: Re: [GENERAL] Embarassing GROUP question 
 Date: Sat, 3 Oct 2009 16:56:02 -0500
 CC: s...@samason.me.uk; pgsql-general@postgresql.org
 
 We're coming from mysql 4, and changing the sort order changes the  
 values of all columns as you would expect, given that you would expect  
 a sort statement to affect grouping.  This certainly isn't the only  
 time I've used this syntax. I've been mysql user for ten years, and  
 the outcome has been consistant across hundreds of tables and millions  
 of rows and thousands of queries. If you ever have to use or modify a  
 mysql db, just keep this in mind in case it saves you some time.
 
 That being said, we've discovered a few instances where docs were  
 wrong, found numerous bugs with bitshifting and blob objects and cache  
 usage and io buffering. We even sarted working on our own storage  
 engine until we came to our senses and switched RDBMSeses.
 
 5.1 has chased more than a few folks off, and rather than upgrade to  
 it, we started porting to postgres. I didn't mean for my comparison to  
 appearas a knock against postgres, merely to explain why I was having  
 such a problem with such a simple issue. Thanks again for the help.
 
 Corey Tisdale
 
 On Oct 3, 2009, at 3:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Sam Mason s...@samason.me.uk writes:
  On Sat, Oct 03, 2009 at 01:05:49PM -0400, Tom Lane wrote:
  but looking at this example, and presuming that you find that
  it actually does something useful, I wonder whether they interpret
  the combination of GROUP BY and ambiguous-per-spec ORDER BY
  in some fashion similar to DISTINCT ON.
 
  Yup, does look that way doesn't it.  It's still a weird pair of
  semantics to conflate.
 
  I poked around in the MySQL 5.1 manual to see if this is true.
  I think it isn't --- it says very clearly here:
  http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html
  that you simply get an arbitrary choice among the possible values
  when you reference an ambiguous column.  It's possible that Corey's
  query actually does give him the answers he wants, but apparently
  it would be an implementation artifact that they're not promising
  to maintain

Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Tom Lane
Corey Tisdale co...@eyewantmedia.com writes:
 We're coming from mysql 4, and changing the sort order changes the  
 values of all columns as you would expect, given that you would expect  
 a sort statement to affect grouping.  This certainly isn't the only  
 time I've used this syntax. I've been mysql user for ten years, and  
 the outcome has been consistant across hundreds of tables and millions  
 of rows and thousands of queries. If you ever have to use or modify a  
 mysql db, just keep this in mind in case it saves you some time.

Okay, I got sufficiently interested to drag out the nearest copy of
mysql and try it ...

mysql create table t (f1 int, f2 int, f3 int);
Query OK, 0 rows affected (0.00 sec)

mysql insert into t values(1,11,111), (1,22,222), (1,44,444), (1,33,333);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql insert into t values(2,55,555), (2,22,222), (2,44,444), (2,33,333);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql insert into t values(3,55,555), (3,22,222), (3,44,444), (3,77,777);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql select * from t group by f1 order by f2;
+--+--+--+
| f1   | f2   | f3   |
+--+--+--+
|1 |   11 |  111 | 
|2 |   55 |  555 | 
|3 |   55 |  555 | 
+--+--+--+
3 rows in set (0.00 sec)

mysql select * from t group by f1 order by f2 desc;
+--+--+--+
| f1   | f2   | f3   |
+--+--+--+
|2 |   55 |  555 | 
|3 |   55 |  555 | 
|1 |   11 |  111 | 
+--+--+--+
3 rows in set (0.00 sec)

Looks to me like we're arbitrarily getting the physically-first row in
each f1 group.  It's certainly not looking for the minimum or maximum f2.

The above is with 5.1.37, but I find essentially the same wording in
the 3.x/4.x manual as in the 5.1 manual.

Now it's certainly possible that in particular circumstances you might
happen to get the right results --- for example, a scan that was using
an index might happen to deliver the rows in the right order.  But I
don't see any evidence that mysql is reliably producing groupwise
minimums or maximums with this syntax.  The long discussions in the
comments here:
http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.html
don't suggest that anyone else believes it works, either.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Corey Tisdale
You may have nailed it. Everythig would have been indexed I. The order  
it was grouped by, so perhaps the order in which things are indexed  
and accesse is the kicker, or perhaps we've been consistantly lucky.


We also weren't adding image data to blobs, we were bit mapping  
faceted data to blob and shifting to allow people to shop by artist or  
color or subject matter across millions of posters. Normalized tables  
just weren't cutting it, and bit shifting up to 32 bit was crazy fast.  
After we rolled it out in production, we found mysql converts blobs to  
32 bit unsigned ints before shifting. Postgres appears to not do this  
at all, or our arbitrarily large test data did not trigger it on  
postgres.


After the last few days, it is becoming apparent how much of a joke  
mysql has been. Thanks again for such quick direction!


Corey Tisdale

On Oct 3, 2009, at 5:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:


Corey Tisdale co...@eyewantmedia.com writes:

We're coming from mysql 4, and changing the sort order changes the
values of all columns as you would expect, given that you would  
expect

a sort statement to affect grouping.  This certainly isn't the only
time I've used this syntax. I've been mysql user for ten years, and
the outcome has been consistant across hundreds of tables and  
millions

of rows and thousands of queries. If you ever have to use or modify a
mysql db, just keep this in mind in case it saves you some time.


Okay, I got sufficiently interested to drag out the nearest copy of
mysql and try it ...

mysql create table t (f1 int, f2 int, f3 int);
Query OK, 0 rows affected (0.00 sec)

mysql insert into t values(1,11,111), (1,22,222), (1,44,444),  
(1,33,333);

Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql insert into t values(2,55,555), (2,22,222), (2,44,444),  
(2,33,333);

Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql insert into t values(3,55,555), (3,22,222), (3,44,444),  
(3,77,777);

Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql select * from t group by f1 order by f2;
+--+--+--+
| f1   | f2   | f3   |
+--+--+--+
|1 |   11 |  111 |
|2 |   55 |  555 |
|3 |   55 |  555 |
+--+--+--+
3 rows in set (0.00 sec)

mysql select * from t group by f1 order by f2 desc;
+--+--+--+
| f1   | f2   | f3   |
+--+--+--+
|2 |   55 |  555 |
|3 |   55 |  555 |
|1 |   11 |  111 |
+--+--+--+
3 rows in set (0.00 sec)

Looks to me like we're arbitrarily getting the physically-first row in
each f1 group.  It's certainly not looking for the minimum or  
maximum f2.


The above is with 5.1.37, but I find essentially the same wording in
the 3.x/4.x manual as in the 5.1 manual.

Now it's certainly possible that in particular circumstances you might
happen to get the right results --- for example, a scan that was using
an index might happen to deliver the rows in the right order.  But I
don't see any evidence that mysql is reliably producing groupwise
minimums or maximums with this syntax.  The long discussions in the
comments here:
http://dev.mysql.com/doc/refman/4.1/en/example-maximum-column-group-row.html
don't suggest that anyone else believes it works, either.

   regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Embarassing GROUP question

2009-10-03 Thread Sam Mason
On Sat, Oct 03, 2009 at 06:12:20PM -0500, Corey Tisdale wrote:
 We also weren't adding image data to blobs, we were bit mapping  
 faceted data to blob and shifting to allow people to shop by artist or  
 color or subject matter across millions of posters. Normalized tables  
 just weren't cutting it, and bit shifting up to 32 bit was crazy fast.  

Just out of interest; have you tried PG's support of fancier index
types?  HStore or intarray would appear to help with what you're doing.
Not quite sure what you're actually doing so my guess could be a long
way off!

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general