Re: [SQL] Time differences between rows, not columns?

2005-08-30 Thread Anthony Molinaro
Amit,
  You say "I want to know how much time it took for a bill to be
accepted after it was submitted"

So, do you want between 10 and 40, not 10 and 20?
I assume you meant 10 and 40.
OK, there are a few approaches to your question. First, to get "how much
time it took for a bill to be accepted after it was submitted":

select bill_id,
       max(case when statusid = 40 then statustime end) -
       max(case when statusid = 10 then statustime end)
  from bill_status
 where statusid in ( 10,40 )
 group by bill_id;

that will give you the amount of time from submission to acceptance
(simple date arithmetic).

You also want a simple distribution; you can use the technique above,
but with SUM:

select sum(case when diff < 7 then 1 else 0 end) as lt_7,
       sum(case when diff between 7 and 15 then 1 else 0 end) as btw_7_15,
       sum(case when diff > 15 then 1 else 0 end) as gt_15
  from (
select bill_id,
       max(case when statusid = 40 then statustime end) -
       max(case when statusid = 10 then statustime end) as diff
  from bill_status
 where statusid in ( 10,40 )
 group by bill_id
       ) x;
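
One thing to double-check: if statustime is a timestamp (as the sample data
suggests), the subtraction yields an interval rather than a number of days,
so the comparisons against 7 and 15 above may need adjusting. A sketch of one
way to get whole days, with the "submitted between t1 and t2" filter from the
question bolted on ('2005-08-29' and '2005-08-30' just stand in for t1 and t2):

select bill_id,
       extract(epoch from max(case when statusid = 40 then statustime end)
                        - max(case when statusid = 10 then statustime end)) / 86400 as diff
  from bill_status
 where statusid in ( 10,40 )
 group by bill_id
having max(case when statusid = 10 then statustime end)
       between '2005-08-29' and '2005-08-30';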


I have a recipe for this in my upcoming book ('The SQL Cookbook' from O'Reilly),
so I'd like to know if what I suggested works out for you.
Obviously, you'll need whatever tweaks make it perfect for your
system; the feedback I'm concerned about is the technique.

hope that helps,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 5:45 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Time differences between rows, not columns?
Importance: High

Using PostgreSQL 8.0 for Windows, running on Windows 2003 Server, 16 GB
RAM, 3 GHz dual P4.
Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar


I have an application with 3 tables (in this context that is)


Table bills

  bill_id NOT NULL serial
  ... And other columns


Table bill_status

  bill_id (references bills.bill_id)
  statusid int4 (references bill_statuslookup.statusid)
  statustime datetime


Table bill_statuslookup

  statusid serial not null
  statusname varchar(255)




The application basically tracks a workflow of bills flowing from one
department to another.
Every time the bill moves one step, an entry is made into the bill_status
table.

Eg.
Bills table 
-
Bill_id otherfield1

1   


Bill_status table:
-
Bill_id  statusid  statustime

1        10        2005-04-04 00:34:31
1        20        2005-04-05 00:55:00


Bill_statuslookup table:
-
Statusid  Statusname

10submitted
20received
30rejected
40accepted
..
..



Now my problem is this:

1. Find the time taken for each bill to go from status 10 to status 20,
given that the time of status 10 should be between t1 and t2.

Eg I want to know how much time it took for a bill to be accepted after
it was submitted (criteria: submitted between yesterday and today)

2. I want to know how many bills took <7 days, how many took 7-15 days,
how many took >15 days etc.


The status is a lookup table because the workflow constantly changes,
and I can't have submitted_on, recd_on accepted_on times etc in the main
bills table as columns because there are way too many statuses in the
life of a bill (read invoice).


Hope it's clear what I'm looking for.
Any help is greatly appreciated!!

Regards,
Amit



Re: [SQL] Time differences between rows, not columns?

2005-08-30 Thread Anthony Molinaro
Amit,
  Glad it worked out :)

As for the performance, lemme say that while I'm a huge fan
of Postgres, my experience in regards to optimization in a production
environment is limited to Oracle and DB2.

In Oracle, for example, if you have an index on a numeric field
and perform min/max on it, there's a very cool algorithm
that allows very fast retrieval of that data (basically the index
scan is optimized cuz Oracle knows you want only the extreme values).
So, on Oracle I'd use the MAX and SUM versions I suggested, because
there's a specific optimization to facilitate it, and the query, since
it accesses the table only once, is doing fewer logical reads.

So, at the very least, in the version I suggested, since you are not
joining, you are performing fewer logical reads, which is always good.
But, ultimately, you have to test and see what works for you.
The self join might be just fine.
10k rows should not be a problem for either method,
assuming you have an index on statusid and bill_id.
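
If those indexes aren't in place yet, a rough sketch of what that might look
like, using the table from your tests (the index layout is just a suggestion):

create index vits_statusupdate_status_bill_idx
    on ap.vits_statusupdate (statusid, bill_id);

-- then compare the two approaches with explain analyze, e.g.:
explain analyze
select bill_id,
       max(case when statusid = 40 then statustime end) -
       max(case when statusid = 10 then statustime end)
  from ap.vits_statusupdate
 where statusid in ( 10,40 )
 group by bill_id;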

Give it a spin and update this thread.

Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 30, 2005 8:08 AM
To: Anthony Molinaro; pgsql-sql@postgresql.org
Subject: RE: [SQL] Time differences between rows, not columns?
Importance: High

Thanks!

> Amit,
>   You say "I want to know how much time it took for a bill to be
> accepted after it was submitted"
> So, do you want between 10 and 40, not 10 and 20?
> I assume you meant 10 and 40...

...Could be any status to any status; I wanted to generalize the concept
for future usage.


I tried the below:
1.

select bill_id,
       max(case when statusid = 40 then statustime end) -
       max(case when statusid = 10 then statustime end)
  from ap.vits_statusupdate
 where statusid in ( 10,40 )
 group by bill_id;

Took 16 ms...

But meanwhile I also tried:

select a.bill_id, b.statustime - a.statustime
  from ap.vits_statusupdate a,
       ap.vits_statusupdate b
 where a.statusid = 10
   and b.statusid = 40
   and a.bill_id = b.bill_id

Took 15 ms.

The second query is taking approximately 1 ms less time (have tried
about 30 times in pgAdmin).

Got about 10 records in my test resultset.

Surprisingly, if I add the ORDER BY bill_id clause at the end of both
queries, then your query performs 1 ms faster than mine. Don't know why.
Are there any performance issues if this is run over 10,000+ rows?
Which (yours or mine?) do you think will be faster, and more efficient?
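
One behavioural difference worth noting between the two: the self join only
returns bills that already have both a status 10 and a status 40 row, while
the MAX/CASE version also returns bills still in progress, with a NULL diff.
If you want the join version to behave the same way, a sketch (untested here):

select a.bill_id, b.statustime - a.statustime as diff
  from ap.vits_statusupdate a
  left join ap.vits_statusupdate b
    on a.bill_id = b.bill_id
   and b.statusid = 40
 where a.statusid = 10;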

2.

select sum(case when diff < 7 then 1 else 0 end) as lt_7,
       sum(case when diff between 7 and 15 then 1 else 0 end) as btw_7_15,
       sum(case when diff > 15 then 1 else 0 end) as gt_15
  from (select bill_id,
               max(case when statusid = 40 then statustime end) -
               max(case when statusid = 10 then statustime end) as diff
          from ap.vits_statusupdate
         where statusid in ( 10,40 )
         group by bill_id) x;

Tried this one too with both of the queries in the sub-select (yours and
mine), and it works perfectly well for me, with accurate answers, and
exactly what I'm looking for! Shows 0 ms (??)...

Next question.. Which one should I use? The joined query? Or the max()
query?

Thanks a lot for your time!

Regards,
Amit



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Anthony Molinaro
Sent: Tuesday, August 30, 2005 4:59 PM
To: Wadhwa, Amit; pgsql-sql@postgresql.org
Subject: Re: [SQL] Time differences between rows, not columns?

[original message snipped; quoted in full above]

Re: [SQL] how to replace

2005-09-08 Thread Anthony Molinaro
Michael,
  You practically solved it yourself in the subject of the email ;)

 select replace('abcd','b','') from your_table;
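
If the goal is to change the stored values rather than just display them,
the same idea applies in an UPDATE (the column name here is just a placeholder):

 update your_table set some_col = replace(some_col, 'B', '');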

Hope that helps,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of "Michael Höller"
Sent: Thursday, September 08, 2005 5:25 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] how to replace

Hello,

I initially thought this was simple... I want to replace a character with
nothing, e.g. replace "B" with "" -> ABCD to ACD.

All my approaches failed, but I am sure that I have seen it already and
think it was not tricky...

Can someone please help me ?

Thanks a lot 
Michael






Re: [SQL] how to do 'deep queries'?

2005-09-26 Thread Anthony Molinaro








that query is 100% correct.

it's just an equijoin (a type of inner join) between 3 tables.

the syntax you show is how queries should be written, and it is more
representative of what joins between relations really are:
Cartesian products with filters applied.

the ansi syntax, the explicit JOIN ... ON stuff, is (imho) unnecessary,
useful only for outer joins, since all the vendors did those differently.

what you have will work for postgreSQL; I used the syntax you show in my book
for every single join recipe except for outer joins.

are you seeing errors?
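
for reference, the same three-table query from the question below, written
with the explicit JOIN ... ON syntax (just a sketch; both forms should
return the same rows):

select foo.aaa,
       bar.bbb,
       baz.ccc
  from foo
  join bar on foo.bar_id = bar.id
  join baz on bar.baz_id = baz.id;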

 

regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of jeff sacksteder
Sent: Monday, September 26, 2005 8:34 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] how to do 'deep queries'?

 

Is there supported syntax to do 'deep' queries? That is, where A relates to B
relates to C, returning fields from each table?

This doesn't seem to work. Is there a google-able term for this sort of query?

select
   foo.aaa,
   bar.bbb,
   baz.ccc
from
   foo, bar, baz
where
   foo.bar_id = bar.id
and
   bar.baz_id = baz.id











Re: [SQL] how to do 'deep queries'?

2005-09-27 Thread Anthony Molinaro
Daryl,

> Whether you feel that is unnecessary or not, it *is* the ANSI Standard
> and is thus, by definition, "how queries should be written."
I disagree 100%.  Oracle and DB2 introduced window functions years before
ANSI added them. Should we not have used them? It's absurd to avoid using
a feature cuz it's not ANSI.

Honestly, don't be a slave to ANSI; you miss out on all the great
vendor-specific functionality *that you're already paying for*.

> it was added to make the *intention* of the query clearer.

Clearer to whom?

Certainly not developers who have been working for many years 
using the old syntax. 

The intention of the old syntax is perfectly clear. Realize that the problem is
not the old syntax; the problem is the watered-down database field today.
I see this more and more with each interview I conduct looking
for DBAs and developers.

You know, it used to be that database developers 
had a solid background in math and relational theory.   
Sadly, that's not the case anymore...

   select * from a,b where a.id=b.id  

Suggests a Cartesian product between two relations then a filter to keep
only matching rows. 

That's a join. And that syntax is a *perfect* representation of it.
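
to make the comparison concrete, here is the same join written both ways,
plus the outer-join case where the vendor syntaxes really did diverge
(the old vendor-specific form shown here is Oracle's (+); just a sketch):

-- inner join: old style vs ansi, identical results
select * from a, b where a.id = b.id;
select * from a join b on a.id = b.id;

-- outer join: oracle's old (+) form vs ansi
select * from a, b where a.id = b.id(+);
select * from a left join b on a.id = b.id;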

So to whom is ansi more clear? To the person who knows nothing about
databases and decided one day to get a certification and call themselves
an expert?

Or maybe the person who decided one day on a whim to get into databases
and doesn't realize that tons of code from the prior decade uses the old-style
syntax?

> Because others are likely to read your query many more times than you 
> write it, clarity of intent *is* important.

I've never worked in a place that used ANSI-only syntax, and I've never
had a problem with clarity, nor have any developers I've worked with.
So, I don't at all get what you're saying...

Old style is short and sweet and perfect.
ANSI dumbed it down, that's the bottom line.
And for people who've been developing for some time,
it's wholly unnecessary.

Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Daryl Richter
Sent: Tuesday, September 27, 2005 9:24 AM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] how to do 'deep queries'?

Anthony Molinaro wrote:
> that query is 100% correct.
>  
> it's just an equijoin (a type of inner join) between 3 tables.
>  
> the syntax you show is how queries should be written and is more
> representative of what a joins between relations really are:
> Cartesian products with filters applied
>  
> the ansi syntax, the explicit JOIN ... ON stuff is (imho) unnecessary,
> useful only for outer joins since all the vendors did it differently.
>

Whether you feel that is unnecessary or not, it *is* the ANSI Standard 
and is thus, by definition, "how queries should be written."

In addition to cleaning up the outer join issue, it was added to make 
the *intention* of the query clearer.

Because others are likely to read your query many more times than you 
write it, clarity of intent *is* important.

> what you have will work for postgreSQL, I used the syntax you show in my
> book for every single join recipe except for outjoins.
>  
> are you seeing errors?
>  
> regards,
>   Anthony

[original snipped]

-- 
Daryl

"We want great men who, when fortune frowns, will not be discouraged."
 -- Colonel Henry Knox, 1776




Re: [SQL] how to do 'deep queries'?

2005-09-27 Thread Anthony Molinaro
> Well, perhaps you will one day and a developer will hose your server 
> with a "accidental" cross join and then you will understand.

Hehe :)) 

hey man, that's what testing and code review are all about
(dev teams still do that, don't they?)

Accidental Cartesians don't get to production ;)

Regards,
  Anthony

-Original Message-
From: Daryl Richter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 27, 2005 11:35 AM
To: Anthony Molinaro
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] how to do 'deep queries'?

Anthony Molinaro wrote:
 > Daryl,
 >
 >
 >> Whether you feel that is unnecessary or not, it *is* the ANSI Standard
 >> and is thus, by definition, "how queries should be written."
 >
 > I disagree 100%.  Oracle and db2 introduced window functions years before
 > Ansi added them. Should we not have used them? It absurd to avoid using
 > a feature cuz it's not ansi.
 >

Of course it would be absurd, I have not suggested otherwise.  Joins are
not a *new* feature.

 > Honestly, Don't be a slave to ansi, you miss out on all the great
 > vendor specific functionality *that you're already paying for*
 >
 >> it was added to make the *intention* of the query clearer.
 >
 > More clearer to whom?
 >
 > Certainly not developers who have been working for many years
 > using the old syntax.
 >
 > The intention of the old syntax is perfect. Realize that the problem is
 > not the old syntax, the problem is the watered down database field today.
 > I see this more and more with each interview I conduct looking
 > for dba's and developers.
 >

I generally agree with your assessment of the state of database
knowledge (particularly re developers).  It is, however, the reality we
live in.

[snipped nostalgia and back-patting]

 > I've never worked in a place that used ANSI only syntax and I've never
 > had a problem with clarity nor any developers I've worked with.
 > So, I don't at all get what you're saying...

 > Old style is short and sweet and perfect.
 > Ansi dumbed it down, that's the bottom line.
 > And for people who've been developing for sometime,
 > It's wholly unnecessary.
 >

Well, perhaps you will one day, and a developer will hose your server
with an "accidental" cross join, and then you will understand.

But hopefully not.  ;)

 > Regards,
 >   Anthony
 >

[rest snipped]

-- 
Daryl
Director of Technology

(( Brandywine Asset Management  )
  ( "Expanding the Science of Global Investing"  )
  (  http://www.brandywine.com   ))





Re: [SQL] Selecting records not present in related tables

2005-10-06 Thread Anthony Molinaro








Jeziel,

there are a couple of techniques you can try; two I like are set
difference and anti-joins.

here's the set difference:

select id
  from messages
except
select idmessage
  from usermessages

that will return all ids from messages that are not in usermessages.

if the id is indexed on both tables, you may wanna try an anti-join:

select m.id
  from messages m
  left join usermessages um
    on ( m.id = um.idmessage )
 where um.idmessage is null

both techniques can be viciously efficient.
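
another variant worth timing, closer in spirit to your correlated-count
query but without the counting, is NOT EXISTS (just a sketch, same tables):

select m.id
  from messages m
 where not exists (select 1
                     from usermessages um
                    where um.idmessage = m.id);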

 

good luck,
  Anthony

  


 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hector Rosas
Sent: Thursday, October 06, 2005 3:44 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Selecting records not present in related tables

 

Hello, I'm trying to select records in a table that are not present in a
related table. For example, I've a table with message information
(subject, message, date, etc) and another (usermessages) with which user(s)
have that message, its state, etc. Records in this table will be deleted in a
certain time (just some extra info).
I want to select messages records that aren't present in the other table
(usermessages). I got the next two queries, maybe someone can suggest a better
one.

SELECT m.id FROM messages AS m
WHERE (SELECT count(um.*) FROM usermessages AS um WHERE um.idmessage=m.id)=0;

SELECT m.id FROM messages AS m
WHERE id NOT IN (SELECT um.idmessage FROM usermessages um);

Both queries work, but doing an EXPLAIN ANALYZE I got the next results.

bd=# explain analyze SELECT m.id FROM messages AS m
bd-# WHERE (SELECT count(um.*) FROM usermessages AS um WHERE um.idmessage=m.id)=0;
                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on messages m  (cost=0.00..3915.75 rows=3 width=4) (actual time=40.531..40.531 rows=0 loops=1)
   Filter: ((subplan) = 0)
   SubPlan
     ->  Aggregate  (cost=9.11..9.11 rows=1 width=4) (actual time=0.098..0.104 rows=1 loops=355)
           ->  Index Scan using message_selection on usermessages um  (cost=0.00..9.10 rows=3 width=4) (actual time=0.067..0.078 rows=1 loops=355)
                 Index Cond: (idmessage = $0)
 Total runtime: 40.605 ms
(7 rows)

bd=# explain analyze select m.id FROM messages AS m
bd-# where id NOT IN (select um.idmessage FROM usermessages um);
                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on messages m  (cost=9.68..43.00 rows=213 width=4) (actual time=20.329..20.329 rows=0 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on usermessages um  (cost=0.00..8.54 rows=454 width=4) (actual time=0.008..13.094 rows=454 loops=1)
 Total runtime: 20.386 ms
(5 rows)

In the first query, the cost can be between 0 and almost 4 sec, and I also see
that loops value, and I don't know what performance issues could arise.
In the second query, I see a seq scan, which I don't like; I think that with too
many records this query could take ages, or maybe not, but the loops value is 1.

I hope someone can give some advice with those queries, or maybe a better
query. I've not decided which query I'm going to use, thanks!

Jeziel.








Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-11 Thread Anthony Molinaro
You're 100% correct, this is a bug in mysql.

Sadly, they tout this as a feature!

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rick Schumeyer
Sent: Tuesday, October 11, 2005 5:12 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] pg, mysql comparison with "group by" clause

I'm not sure what I was thinking, but I tried the following query in pg:

SELECT * FROM t GROUP BY state;

pg returns an error.

Mysql, OTOH, returns the first row for each state.  (The first row with
"AK", the first row with "PA", etc.)  

I'm no SQL expert, but it seems to me that the pg behavior is correct, and
the mysql result is just weird.  Am I correct?




Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Anthony Molinaro
> Not always, but I'd rather get the right answer with difficulty than the
> wrong one with ease. :)

agreed. 

I made it a point to mention this so-called "feature" in my book.

This is a bug they never fixed and they decided to call it a feature.

It is, imo, *ridiculous*.

Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scott Marlowe
Sent: Wednesday, October 12, 2005 6:25 PM
To: Greg Stark
Cc: Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

On Wed, 2005-10-12 at 16:54, Greg Stark wrote:
> Stephan Szabo <[EMAIL PROTECTED]> writes:
> 
> > On Tue, 11 Oct 2005, Rick Schumeyer wrote:
> > 
> > > I'm not sure what I was thinking, but I tried the following query in pg:
> > >
> > > SELECT * FROM t GROUP BY state;
> > >
> > > pg returns an error.
> > >
> > > Mysql, OTOH, returns the first row for each state.  (The first row with
> > > "AK", the first row with "PA", etc.)
> > >
> > > I'm no SQL expert, but it seems to me that the pg behavior is correct, and
> > > the mysql result is just weird.  Am I correct?
> > 
> > In your case, it sounds like the mysql result is wrong. I believe SQL99
> > would allow it if the other columns were functionally dependant upon state
> > (as there'd by definition only be one value for the other columns per
> > group).
> 
> I believe this is a documented feature.

Hehe.  When I turn on my windshield wipers and my airbag deploys, is it
a documented "feature" if the dealership told me about this behaviour
ahead of time?  In much the same way, while this behaviour may be
documented by MySQL, I can't imagine it really being called a feature. 
But at least this misbehaviour is documented.  However, I think most
people in the MySQL universe just stumble onto it by accident when they
try it and it works.  I'd at least prefer it to throw a warning or
notice or something.

> MySQL treats "select a,b from t group by a" equivalently to Postgres's
> "select distinct on (a) a,b from t"
> 
> I suppose "equivalent" isn't quite true. It's more general since it allows
> aggregate functions as well. The equivalently general Postgres syntax is to
> have a first() aggregate function and do "select a,first(b) from t group by a".

A Subselect would let you do such a thing as well, and while it's more
complicated to write, it is likely to be easier to tell just what it's
doing.
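
For example, a sketch of that kind of subselect, alongside a hand-rolled
first() aggregate of the sort Greg describes (hypothetical table t with
columns a and b; the first() aggregate is not built in):

-- subselect: pick one deterministic b per group
select a,
       (select b from t t2 where t2.a = t.a order by b limit 1) as b,
       count(*)
  from t
 group by a;

-- a minimal first() aggregate (returns the first non-null value it sees
-- for each group)
create function first_agg(anyelement, anyelement) returns anyelement
    as 'select $1' language sql immutable strict;

create aggregate first (
    basetype = anyelement,
    sfunc    = first_agg,
    stype    = anyelement
);

select a, first(b), count(*) from t group by a;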

> I'm sure it's very convenient.

Not always, but I'd rather get the right answer with difficulty than the
wrong one with ease. :)



Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Anthony Molinaro
> I don't see why you think people stumble on this by accident. I think it's
> actually an extremely common need.

I can't imagine how that's a common need at all.

It makes no sense.

When you add an additional column in the select, it must be included in
the group by as it changes the meaning of the query.

Consider:

select deptno, count(*) from emp group by deptno;

DEPTNO   COUNT(*)
-- --
10  3
20  5
30  6


the query above counts the number of employees in each department.


Now consider the following query:

select deptno,job,count(*) from emp group by deptno,job;

DEPTNO JOB COUNT(*)
-- - --
10 CLERK  1
10 MANAGER1
10 PRESIDENT  1
20 CLERK  2
20 ANALYST2
20 MANAGER1
30 CLERK  1
30 MANAGER1
30 SALESMAN   4


the query above counts the number of employees in each job type within each
department.

In mysql, you would be allowed to do the following:

select deptno,job,count(*) from emp group by deptno;

but it makes no sense. What value would it return and what does it mean?
How can that possibly represent reliable data? What would the result set
above look like? It would be meaningless in a production system,
honestly.


It's a silly bug that mysql has touted as a feature,
and I can't imagine why people think it's useful.


> I think most MySQL users don't stumble on it, they learn it as the way to
> handle the common use case when you join a master table against a detail
> table and then want to aggregate all the detail records.

Huh? I don't follow that at all.
Perhaps you're confusing the concept with window functions, which neither pg
nor mysql have implemented yet?
For example, using window functions allows you to return aggregate and
detailed info simultaneously:

select ename,
       deptno,
       job,
       count(*) over (partition by deptno) as emps_per_dept,
       count(*) over (partition by deptno, job) as job_per_dept,
       count(*) over () as total
  from emp

ENAME   DEPTNO  JOB        EMPS_PER_DEPT  JOB_PER_DEPT  TOTAL
------  ------  ---------  -------------  ------------  -----
MILLER      10  CLERK                  3             1     14
CLARK       10  MANAGER                3             1     14
KING        10  PRESIDENT              3             1     14
SCOTT       20  ANALYST                5             2     14
FORD        20  ANALYST                5             2     14
SMITH       20  CLERK                  5             2     14
ADAMS       20  CLERK                  5             2     14
JONES       20  MANAGER                5             1     14
JAMES       30  CLERK                  6             1     14
BLAKE       30  MANAGER                6             1     14
ALLEN       30  SALESMAN               6             4     14
MARTIN      30  SALESMAN               6             4     14
TURNER      30  SALESMAN               6             4     14
WARD        30  SALESMAN               6             4     14


But this is not a group by, this is aggregating and windowing, 
which is quite different from mysql adding that nasty little bug 
and calling it a feature.

 - a

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark
Sent: Wednesday, October 12, 2005 9:13 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Scott Marlowe <[EMAIL PROTECTED]> writes:

> Hehe.  When I turn on my windshield wipers and my airbag deploys, is it
> a documented "feature" if the dealership told me about this behaviour
> ahead of time?

Well it's more like my car where the dashboard dims when I turn on my
headlights which annoys me to no end since I learned to always put my
headlights on even in the day.

> In much the same way, while this behaviour may be documented by MySQL, I
> can't imagine it really being called a feature. But at least this
> misbehaviour is documented. However, I think most people in the MySQL
> universe just stumble onto it by accident when they try it and it works. I'd
> at least prefer it to throw a warning or notice or something.

I don't see why you think people stumble on this by accident. I think it's
actually an extremely common need. So common that Postgres has the same
feature (though less general) and invented a whole syntax to handle it.

I think most MySQL users don't stumble on it, they learn it as the way to
handle the common use case when you join a master table against a detail
table and then want to aggregate all the detail records. In standard SQL
you have to write GROUP BY ... and list every single column you need from
the master table. Forcing the database to do a lot of redundant comparisons
and sort on uselessly long keys where in fact you only really need it to
sort and group by the primary key.

Remember, m

Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Anthony Molinaro
Tom,
  Don't do it man...
  It's not the same. The standard has it wrong (that, or I misunderstood
  your explanation). But this statement:

> you shouldn't have to explicitly list the other columns from that
> table --- they are guaranteed to have unique values per group anyway

...is way off.
By changing the values in the select/group by you are changing the
group! How can you arbitrarily add or exclude a column?
You can't do it.

Look, all it takes is one example to prove the standard wrong...



Consider this simple example (in oracle, but the same in pg):


create table foo (id number primary key, name varchar2(10));

insert into foo values (1,'joe');

insert into foo values (2,'joe');

insert into foo values (3,'jim');


select * from foo;

ID NAME
-- --
 1 joe
 2 joe
 3 jim


ok, let's get the count by ID (which is unique):

select id, count(*) from foo group by id;

ID   COUNT(*)
-- --
 1  1
 2  1
 3  1


makes sense...

the values for ID, 1,2,3 are our groups
and the count aggregate op shows that there's
one member per group.


now let's get the count by name:

select name, count(*) from foo group by name;

NAME COUNT(*)
-- --
jim 1
joe 2

so far so good. The queries and results are representative of the actual
data.  The values for name, "jim" and "joe" are our groups
and the count aggregate op shows that there's 1 member in the "jim"
group and two members in the "joe" group.


But, as soon as we add id to the group by...

select name, count(*) from foo group by name,id;

NAME COUNT(*)
-- --
jim 1
joe 1
joe 1

it changes the query! Even tho id is unique, it changes the query.
The group, by definition (it's in the group by), is no longer
name or id; it's both name and id.
How can you simply remove id? Which result set should it return,
the first or the second? Makes no sense...

If it ain't obvious why, simply plug id into the select:

select id, name, count(*) from foo group by name,id;

ID NAME COUNT(*)
--- -- --
  3 jim 1
  1 joe 1
  2 joe 1



so, how can it be that you don't have to list the other columns
(in this case "name")? It makes no sense, because if you remove either
column from the group by (id or name) it changes the meaning of the
query.
The way you guys do it now is correct. MySQL has it wrong.
And if the standard states that you can leave out the pk from a group by,
they are wrong too, as the simple examples above prove.

Fyi, Oracle just bought InnoDB, so I'd not be too concerned with mysql
and their so-called "features" anyway.


Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Wednesday, October 12, 2005 9:51 PM
To: Greg Stark
Cc: Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause 

Greg Stark <[EMAIL PROTECTED]> writes:
> I think most MySQL users don't stumble on it, they learn it as the way
> to handle the common use case when you join a master table against a
> detail table and then want to aggregate all the detail records. In
> standard SQL you have to write GROUP BY ... and list every single
> column you need from the master table. Forcing the database to do a
> lot of redundant comparisons and sort on uselessly long keys where in
> fact you only really need it to sort and group by the primary key.

Actually, if you're grouping by a table's primary key, the SQL99 spec
says you shouldn't have to explicitly list the other columns from that
table --- they are guaranteed to have unique values per group anyway.
This is a single case in the "functional dependency" stuff.  That
verbiage is incredibly dense and I don't think we want to tackle all of
it any time soon, but the primary-key case probably wouldn't be very
hard to implement.  We really ought to have this in TODO ... I'm sure
it's been discussed before.
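
Concretely, the SQL99 behaviour described here would make something like the
following legal (using the foo table from the example above, where id is the
primary key), with name allowed in the select list because it is functionally
dependent on id:

select id, name, count(*)
  from foo
 group by id;

which would mean the same thing as grouping by both id and name.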

regards, tom lane



Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Anthony Molinaro
Greg,
  You'll have to pardon me...
 
I saw this comment:

"I don't see why you think people stumble on this by accident. 
I think it's actually an extremely common need."

Which, if referring to the ability to have items in the select that do not
need to be included in the group by (excluding constants and the like),
is just silly.

OTOH, if you're all poking fun at a mysql bug that they try to pass
off as a feature, then yes, I'm a clod and I missed that the first time
around :)

Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 12, 2005 11:43 PM
To: Anthony Molinaro
Cc: Tom Lane; Greg Stark; Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> By changing the values in the select/group by you are changing 
> Group! How can you arbitrarily add or exclude a column?
> You can't do it.

Go back and reread the previous posts again. You missed the whole point.

-- 
greg




Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-12 Thread Anthony Molinaro
Greg,

Ok, I think I see where you're going (I don't agree, but I think
I get you now).

So, using your example of: 
"dept_name is guaranteed to be the same for all records with the
same dept_id."

Here:

select d.deptno,d.dname 
  from emp  e, dept d 
 where e.deptno=d.deptno

DEPTNO DNAME
-- --
10 ACCOUNTING
10 ACCOUNTING
10 ACCOUNTING
20 RESEARCH
20 RESEARCH
20 RESEARCH
20 RESEARCH
20 RESEARCH
30 SALES
30 SALES
30 SALES
30 SALES
30 SALES
30 SALES


 ok, so there's your scenario.

And you're suggesting that one should be able to
do the following query?

select d.deptno,d.dname,count(*) 
  from emp  e, dept d 
 where e.deptno=d.deptno


if that's what you suggest, then we'll just have to agree to disagree.

That query needs a group by. What you're suggesting is, imo,
a wholly unnecessary shortcut (almost as bad as that ridiculous "natural
join" - whoever came up with that should be tarred and feathered).

I think I see your point now, I just disagree.
You're depending on syntax to work based on data integrity?
Hmmm, don't think I like that idea...

What performance improvement do you get from leaving group by out?
Look at the query above: doesn't a count of distinct deptno,dname pairs
have to take place anyway? What do you save by excluding group by?
Are you suggesting COUNT be computed for each row (windowed) or that
COUNT is computed for each group?

If you want repeating rows, then you want windowing. 
For example:

select d.deptno,d.dname,count(*)over(partition by d.deptno,d.dname) cnt 
  from emp  e, dept d 
 where e.deptno=d.deptno

DEPTNO DNAME  CNT
-- -- ---
10 ACCOUNTING   3
10 ACCOUNTING   3
10 ACCOUNTING   3
20 RESEARCH 5
20 RESEARCH 5
20 RESEARCH 5
20 RESEARCH 5
20 RESEARCH 5
30 SALES6
30 SALES6
30 SALES6
30 SALES6
30 SALES6
30 SALES6



if you want "groups", then use group by:

select d.deptno,d.dname,count(*) cnt 
  from emp  e, dept d
 where e.deptno=d.deptno 
 group by d.deptno,d.dname

DEPTNO DNAME  CNT
-- -- ---
10 ACCOUNTING   3
20 RESEARCH 5
30 SALES6


what you're suggesting doesn't seem to fit in at all,
particularly when pg implements window functions.

If you're suggesting the pg optimizer isn't doing the right thing
with group by queries, then this is an optimizer issue and 
that should be hacked, not group by. If you're suggesting certain
rows be ditched or shortcuts be taken, then the optimizer should do
that, not the programmer writing sql.

Db2 and oracle have no problem doing these queries, I don't see
why pg should have a problem.

imo, the only items that should not be listed in the group by
are:

1. constants and deterministic functions
2. scalar subqueries
3. window functions

1 - because the value is the same for each row
2&3 - because they are evaluated after the grouping takes place
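
for instance, in something like the following (emp/dept as above), the
constant and the correlated scalar subquery stay out of the group by for
exactly those reasons (just a sketch):

select d.deptno,
       'hdr' as row_type,                          -- 1: constant
       count(*) as emps,
       (select count(*)
          from emp e2
         where e2.deptno = d.deptno
           and e2.job = 'CLERK') as clerks         -- 2: scalar subquery
  from emp e, dept d
 where e.deptno = d.deptno
 group by d.deptno;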

regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 13, 2005 12:25 AM
To: Anthony Molinaro
Cc: [EMAIL PROTECTED]; Tom Lane; Scott Marlowe; Stephan Szabo; Rick
Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> Greg,
>   You'll have to pardon me...
>  
> I saw this comment:
> 
> "I don't see why you think people stumble on this by accident. 
> I think it's actually an extremely common need."
> 
> Which, if referring to the ability to have items in the select that do not
> need to be included in the group, (excluding constants and the like) is just
> silly.

Well the "constants and the like" are precisely the point. There are
plenty of
cases where adding the column to the GROUP BY is unnecessary and since
Postgres makes no attempt to prune them out, inefficient. And constants
aren't
the only such case. The most common case is columns that are coming from
a
table where the primary key is already included in the GROUP BY list.

In the case of columns coming from a table where the primary key is
already in
the GROUP BY list it's possible for the database to deduce that it's
unnecessary to group on that column. 

But it's also possible to have cases where the programmer has out of
band
knowledge that it's unnecessary but the database doesn't have that
knowledge.
The most obvious case that comes to mind is a denormalized data model
that
includes a redundant column.

  select dept_id, dept_name, count(*) from employee_list

For example if dept_name is guaranteed to be the same f

Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-13 Thread Anthony Molinaro
Greg,
 
"
You would prefer:

select user_id, 
   any(username) as username, any(firstname) as firstname, 
   any(lastname) as lastname, any(address) as address,
   any(city) as city, any(street) as street, any(phone) as phone,
   any(last_update) as last_update, any(last_login) as last_login,
   any(referrer_id) as referrer_id, any(register_date) as register_date,
   ...
   sum(money) as balance,
   count(money) as num_txns
  from user join user_money using (user_id) group by user_id
"

yes, that's right!

Guess what? It's been that way for years. Why change it now?

You're arguing against something that works perfectly
and has been understood for years.
Changing the syntax cuz pg doesn't optimize it the way you like is
ridiculous.

Perhaps this change would make the newbies happy, but I can't imagine
an experienced developer asking for this, let alone arguing for it.

> I'm pretty unsympathetic to the "we should make a language less powerful
> and more awkward because someone might use it wrong" argument.

More awkward? What *you're* suggesting is more awkward. You realize that,
right? How can syntax that has been understood and accepted for years be more
awkward?

Again, you're asking for changes that no one but a newbie would ask for...

I'm not at all suggesting you are/aren't a newbie 
(so don't take offense to this :), 
all I'm saying is that for experienced developers, 
we'd hope that the source code developers for pg/oracle/db2 etc 
are focusing on more important things, not rewriting things that already
work because someone doesn't wanna type out column names...

regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark
Sent: Thursday, October 13, 2005 2:27 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause


Scott Marlowe <[EMAIL PROTECTED]> writes:

> Sorry, but it's worse than that.  It is quite possible that two people
> could run this query at the same time and get different data from the
> same set and the same point in time.  That shouldn't happen accidentally
> in SQL, you should know it's coming.

I'm pretty unsympathetic to the "we should make a language less powerful and
more awkward because someone might use it wrong" argument.

> > In standard SQL you have to
> > write GROUP BY ... and list every single column you need from the master
> > table. Forcing the database to do a lot of redundant comparisons and sort on
> > uselessly long keys where in fact you only really need it to sort and group by
> > the primary key.
> 
> But again, you're getting whatever row the database feels like giving
> you.  A use of a simple, stupid aggregate like an any() aggregate would
> be fine here, and wouldn't require a lot of overhead, and would meet the
> SQL spec.

Great, so I have a user table with, oh, say, 40 columns. And I want to return
all those columns plus their current account balance in a single query.

The syntax under discussion would be:

select user.*, sum(money) from user join user_money using (user_id)
group by user_id

You would prefer:

select user_id, 
   any(username) as username, any(firstname) as firstname, 
   any(lastname) as lastname, any(address) as address,
   any(city) as city, any(street) as street, any(phone) as phone,
   any(last_update) as last_update, any(last_login) as last_login,
   any(referrer_id) as referrer_id, any(register_date) as register_date,
   ...
   sum(money) as balance,
   count(money) as num_txns
  from user join user_money using (user_id) group by user_id


Having safeties is fine, but when I have to disengage the safety for every
single column it starts to get more than a little annoying.

Note that you cannot write the above as a subquery since there are two
aggregates. You could write it as a join against a view but don't expect to
get the same plans from Postgres for that.


> Actually, for things like aggregates, I've often been able to improve
> performance with sub selects in PostgreSQL.  

If your experience is like mine it's a case of two wrongs cancelling each
other out. The optimizer underestimates the efficiency of nested loops which
is another problem. Since subqueries' only eligible plan is basically a nested
loop it often turns out to be faster than the more exotic plans a join can
reach.

In an ideal world subqueries would be transformed into the equivalent join (or
some more general join structure that can cover both sets of semantics) and
then planned through the same code path. In an ideal world the user should be
guaranteed that equivalent queries would always result in the same plan
regardless of how they're written.

-- 
greg



Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-13 Thread Anthony Molinaro
Well...

An additional gripe is that this isn't a good feature (standard or not).
Oracle doesn't do it. Db2 doesn't do it. I strongly suggest you guys
don't 
do it.

If you wanna do the optimizations under the covers, cool, but I can't
imagine how this would be useful other than for saving some typing...

Seems more trouble than it's worth and changes a concept that's tried
and true for many years.

Regards,
  Anthony

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
Sent: Thursday, October 13, 2005 2:50 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause 

> In standard SQL you have to
> write GROUP BY ... and list every single column you need from the master
> table.

This thread seems to have gone off on a tangent that depends on the
assumption that the above is a correct statement.  It's not.  It *was*
true, in SQL92, but SQL99 lets you omit unnecessary GROUP BY columns.

The gripe against mysql, I think, is that they don't enforce the
conditions that guarantee the query will give a unique result.

The gripe against postgres is that we haven't implemented the SQL99
semantics yet.

regards, tom lane



Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-13 Thread Anthony Molinaro
Tom,

  I'm sure there are all sorts of cool optimizations under the covers
  to perhaps maybe do this short-circuiting,
  but as far as the sql goes, yeah, I'm sure.

Here's an example on oracle 10g release 2 (copy paste from my screen so
you can see the error messages and all):



SQL> create table foo(id number primary key, name varchar2(10));

Table created.

SQL> insert into foo values (1,'sam');

1 row created.

SQL> insert into foo values (2,'sam');

1 row created.

SQL> commit;

Commit complete.

SQL> select id, count(*) from foo;
select id, count(*) from foo
   *
ERROR at line 1:
ORA-00937: not a single-group group function


SQL> select name, count(*) from foo;
select name, count(*) from foo
   *
ERROR at line 1:
ORA-00937: not a single-group group function


SQL> select name, count(*) from foo group by id;
select name, count(*) from foo group by id
   *
ERROR at line 1:
ORA-00979: not a GROUP BY expression


SQL> select name, count(*) from foo group by name;

NAME COUNT(*)
-- --
sam 2

SQL> select name, count(*) from foo group by name,id;

NAME COUNT(*)
-- --
sam 1
sam 1

SQL>


I can't imagine Oracle making a change such as the one we're discussing
at this point. Perhaps in 8.1.6, ~7 years ago, when *tons* of sql
changes were implemented (analytics, CASE, ordering in inline views,
CUBE, ROLLUP), 
but not now...

then again, oracle is 100% completely driven by money, so, if enough
customers ask for it, it will happen eventually. I just can't imagine
anyone asking for this feature when we're paying 40k per cpu just to
run oracle; there are much more important things for them to be working
on...


Regards,
  Anthony

-Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 13, 2005 3:17 PM
To: Anthony Molinaro
Cc: Scott Marlowe; Greg Stark; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause 

"Anthony Molinaro" <[EMAIL PROTECTED]> writes:
> An additional gripe is that this isn't a good feature (standard or
not).
> Oracle doesn't do it. Db2 doesn't do it.

You sure about that?  It's hard to believe that the SQL committee would
put a feature into the spec that neither Oracle nor IBM intended to
implement.  Those two pretty much control the committee after all ...

regards, tom lane



Re: [SQL] pg, mysql comparison with "group by" clause

2005-10-14 Thread Anthony Molinaro
Sheesh...

I work with 2 MIT guys, and man, you guys will never admit you're wrong.
Must be something in the water down there ;) 

Hey man, you know what, to each his own
(but Apostol is one of my favorites, so maybe we have that in common? :)

You apparently like this shortcut, so be it.

I'll say this tho, Oracle and db2 don't do it even tho
it's in the standard and their optimizer is already doing
the right thing. That's gotta tell ya something, no?

don't look for this feature to be something you can do everywhere.
If for some reason postgres implements it, it will be the only
vendor to do so (though, this seems like a very MySQL-ish thing to do
so maybe not just Postgres)

take care,
  Anthony

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 14, 2005 3:21 AM
To: Anthony Molinaro
Cc: Greg Stark; Scott Marlowe; Stephan Szabo; Rick Schumeyer;
pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause


"Anthony Molinaro" <[EMAIL PROTECTED]> writes:

> More awkward? What *you're* suggesting is more awkward. You realize that
> right? How can syntax that is understood and accepted for years be more
> awkward?

Well gosh, I would say that that's something only a newbie could say about
SQL of all things...

I had a whole thing written after that but I just deleted it. I grow tired of
this thread.

I am pretty happy to hear that the SQL standard endorsed the idea of having
the right thing happen if the primary key is present in the grouping list.
That would be a wonderful feature for Postgres.

-- 
greg

