[SQL] Table design question

2006-06-01 Thread David Clarke

I'm reading Joe Celko's book SQL Programming Style for the second time
and although I've been an OO developer for quite a few years I'm
fairly green wrt SQL. Joe is obviously something of a curmudgeon and I
would fall squarely into his newbie OO developer ordinal scale and I'm
trying to avoid the slide into stupid newbie OO developer.

So I'm designing a table and I'm looking for an appropriate key. The
natural key is a string from a few characters up to a maximum of
perhaps 100. Joe gets quite fierce about avoiding the use of a serial
id column as a key. The string is unique in the table and fits the
criteria for a key. So should I follow Joe's advice and use my natural
key as the primary key? It sounds reasonable but it will mean at least
one other table will have the string as a foreign key. My postgres
intro book has id columns all over the place but is it really that big
an issue these days to have a 100 character primary key? Are there
postgres-specific implications for either approach?

Thanks
Dave



[SQL] SELECT DISTINCT too slow

2006-06-01 Thread Miroslav Šulc




Hello,

I have a table with approximately 400,000 rows. The table contains a column
"key" of type varchar(20) holding 10 distinct values. I want to find out
which distinct values are present in the column. I use this simple
query, which is very slow:

SELECT DISTINCT Key FROM MRTPContactValue

Here is the query plan:

                              QUERY PLAN
 Unique  (cost=64882.26..66964.59 rows=9 width=9) (actual time=26139.972..29593.164 rows=10 loops=1)
   ->  Sort  (cost=64882.26..65923.43 rows=416466 width=9) (actual time=26139.964..27975.944 rows=416466 loops=1)
         Sort Key: "key"
         ->  Seq Scan on mrtpcontactvalue  (cost=0.00..8669.66 rows=416466 width=9) (actual time=0.026..2460.535 rows=416466 loops=1)
 Total runtime: 29603.159 ms


I've tried an index on the "key" column, but it brought no improvement.

Is there a way to speed the SELECT up?

Thank you for any suggestions.
-- 
Miroslav Šulc






Re: [SQL] Table design question

2006-06-01 Thread Rod Taylor
> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the
> criteria for a key. So should I follow Joe's advice and use my natural
> key as the primary key? It sounds reasonable but it will mean at least
> one other table will have the string as a foreign key. My postgres

Unfortunately, as a result of the way most commercial databases (and
PostgreSQL) are implemented, using a string as the primary key will cause
a performance hit.

Yes, a natural key is normally the better choice, and for things like status
flags and other constant values that rarely change it is a good approach.


I would tend to add a SERIAL column and make it the primary key (using it
in the referencing tables), while also making the string column unique and
not nullable.

Essentially it gives the table two primary keys in the hope that some
day PostgreSQL will add a layer of abstraction between primary/foreign
key lookups and presentation that allows for usage of strings directly
without a performance hit.
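
A minimal sketch of that layout, using hypothetical table and column names:

-- Hypothetical tables (names invented for illustration): the SERIAL column
-- is the primary key that other tables reference, while the natural string
-- key keeps its uniqueness guarantee through a separate constraint.
CREATE TABLE widget (
    id   SERIAL PRIMARY KEY,
    code VARCHAR(100) NOT NULL UNIQUE   -- the natural key, still enforced
);

CREATE TABLE widget_note (
    id        SERIAL PRIMARY KEY,
    widget_id INTEGER NOT NULL REFERENCES widget (id),
    note      TEXT
);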
-- 




Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Alvaro Herrera
Miroslav Šulc wrote:
> Hello,
> 
> I have a table with cca 400,000 rows. The table contains column "key" of
> varchar(20) type containing 10 distinct values. I want to get out what
> distinct values are present in the column. I use this simple query,
> which is very slow:
> 
> SELECT DISTINCT Key FROM MRTPContactValue

You could get the universe of values from the table where this is a
primary key, and use an IN clause (which apparently is more efficient
than an EXISTS in some cases, but try that too) to search for values
that exist in the MRTPContactValue table.

I assume you do have the other table, don't you?
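
A rough sketch of that idea, assuming a hypothetical lookup table named
contact_key that holds the possible key values:

-- Keep only the candidate keys that actually occur in the big table.
SELECT ck.key
FROM contact_key ck
WHERE ck.key IN (SELECT key FROM MRTPContactValue);

-- The EXISTS variant, worth comparing against the IN form:
SELECT ck.key
FROM contact_key ck
WHERE EXISTS (SELECT 1 FROM MRTPContactValue v WHERE v.key = ck.key);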

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Miroslav Šulc




Well, "key" is not primary key from another table. It is just a column
in pair "key" => "value".
The structure of the table is this:

Id (primary key)
MRTPContactId (id of contact from table MRTPContact)
Key (key from pair key => value)
Value (value from pair key => value)

So I want to get the list of keys used in the table.

Miroslav Šulc


Alvaro Herrera wrote:

  Miroslav Šulc wrote:
  
  
Hello,

I have a table with cca 400,000 rows. The table contains column "key" of
varchar(20) type containing 10 distinct values. I want to get out what
distinct values are present in the column. I use this simple query,
which is very slow:

SELECT DISTINCT Key FROM MRTPContactValue

  
  
You could get the universe of values from the table where this is a
primary key, and use an IN clause (which apparently is more efficient
than an EXISTS in some cases, but try that too) to search for values
that exist in the MRTPContactValue table.

I assume you do have the other table, don't you?

  







Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Alvaro Herrera
Miroslav Šulc wrote:
> Well, "key" is not primary key from another table. It is just a column
> in pair "key" => "value".
> The structure of the table is this:
> 
> Id (primary key)
> MRTPContactId (id of contact from table MRTPContact)
> Key (key from pair key => value)
> Value (value from pair key => value)
> 
> So I want the get the list of keys used in the table.

The plan you get is the most efficient possible for that query.  If you
had a table of possible keys (which the "Key" column should of course
reference as a foreign key), you could get a much faster version :-)
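
One possible shape for such a setup, with a hypothetical lookup table name
(the varchar(20) type matches the column described earlier in the thread):

-- Hypothetical lookup table of all allowed keys; MRTPContactValue.Key
-- then references it, so the set of possible keys is known in advance.
CREATE TABLE MRTPContactKey (
    Key varchar(20) PRIMARY KEY
);

ALTER TABLE MRTPContactValue
    ADD FOREIGN KEY (Key) REFERENCES MRTPContactKey (Key);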

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Miroslav Šulc
It might be a good solution :-)

Thank you for your help.

Miroslav Šulc


Alvaro Herrera wrote:
> Miroslav Šulc wrote:
>   
>> Well, "key" is not primary key from another table. It is just a column
>> in pair "key" => "value".
>> The structure of the table is this:
>>
>> Id (primary key)
>> MRTPContactId (id of contact from table MRTPContact)
>> Key (key from pair key => value)
>> Value (value from pair key => value)
>>
>> So I want the get the list of keys used in the table.
>> 
>
> The plan you get is the most efficient possible for that query.  If you
> had a table of possible keys (which should of course be FK of "Key"),
> you could get a much faster version :-)
>
>   




Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Tom Lane
Miroslav Šulc <[EMAIL PROTECTED]> writes:
> I have a table with cca 400,000 rows. The table contains column "key" of
> varchar(20) type containing 10 distinct values. I want to get out what
> distinct values are present in the column. I use this simple query,
> which is very slow:

> SELECT DISTINCT Key FROM MRTPContactValue

Try
SELECT Key FROM MRTPContactValue GROUP BY Key

The "select distinct" code is a bit old and crufty, GROUP BY is usually
smarter.

regards, tom lane



Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Miroslav Šulc
The GROUP BY is really fast :-)

Thank you.

Miroslav Šulc


Tom Lane wrote:
> Try
>   SELECT Key FROM MRTPContactValue GROUP BY Key
>
> The "select distinct" code is a bit old and crufty, GROUP BY is usually
> smarter.
>
>   regards, tom lane
>   




Re: [SQL] Table design question

2006-06-01 Thread Greg Stark

"David Clarke" <[EMAIL PROTECTED]> writes:

> is it really that big an issue these days to have a 100 character primary
> key? Are there postgres-specific implications for either approach?

It's exactly the same size issue as ever. A 20% increase in space usage is a
20% performance hit in certain types of queries regardless of how fast or
cheap your hardware has become.

This is an issue where reasonable people differ quite vociferously so you may
get contradictory responses. But really it depends heavily on the exact
circumstances and requires a judgement call based on experience. Any time
someone tries to sum it up with a blanket rule it's going to be wrong some of
the time.

But that said I also tend to tilt towards creating serial ids. Unless the
string is already a synthetic unique identifier created by your application
you can't really trust its "uniqueness" for all time. Plenty of people have
built databases using natural unique keys that turned out to be not so unique
or turned out to just be a poor choice for external reasons (think of Social
Security Numbers, for example).

I've had my string unique identifiers corrupted by uppercasing, declared case
insensitive, declared private information that couldn't be leaked, and
declared offensive words that had to be updated. Each of which is a pain to
deal with when it's your primary key.

-- 
greg




Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Alvaro Herrera
Miroslav Šulc wrote:
> The GROUP BY is really fast :-)

Doh!  How does it do it?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Greg Stark
Alvaro Herrera <[EMAIL PROTECTED]> writes:

> Miroslav Šulc wrote:
> > Well, "key" is not primary key from another table. It is just a column
> > in pair "key" => "value".
> > The structure of the table is this:
> > 
> > Id (primary key)
> > MRTPContactId (id of contact from table MRTPContact)
> > Key (key from pair key => value)
> > Value (value from pair key => value)
> > 
> > So I want the get the list of keys used in the table.
> 
> The plan you get is the most efficient possible for that query.  If you
> had a table of possible keys (which should of course be FK of "Key"),
> you could get a much faster version :-)

Actually you could try the equivalent query:

 SELECT Key FROM MRTPContactValue GROUP BY Key

This may or may not be faster because it can use a hash aggregate plan. I
would expect it to be faster here because there are few distinct keys and the
planner predicts that. 

Eventually these two queries should be handled the same by Postgres but Hash
Aggregates are a new addition and DISTINCT/DISTINCT ON hasn't been adapted to
make use of them.

Also, incidentally, I don't see how a table of possible keys could help you
here. Nothing forces the table MRTPContactValue to use all possible keys...

-- 
greg




Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Miroslav Šulc




Greg Stark wrote:

  Actually you could try the equivalent query:

 SELECT Key FROM MRTPContactValue GROUP BY Key

This may or may not be faster because it can use a hash aggregate plan. I
would expect it to be faster here because there are few distinct keys and the
planner predicts that. 

Eventually these two queries should be handled the same by Postgres but Hash
Aggregates are a new addition and DISTINCT/DISTINCT ON hasn't been adapted to
make use of them.

Also, incidentally, I don't see how a table of possible keys could help you
here. Nothing forces they table MRTPContactValue to use all possible keys...
  


I simplified the case because it was slow by itself. GROUP BY really
makes this a lot faster.

The table contains properties for each contact; I cannot control how many
properties there will be or what their names will be. In my scenario the
user can export the data through the user interface, and I need to know
which keys are used there in order to create appropriate column names.
There is one more constraint: the contacts are grouped into groups, so I
need to get only the keys from a selected group. The real query is this
(not as fast as the plain SELECT ... GROUP BY ... because the other table
is also fairly large, but still faster than before):

SELECT Key FROM MRTPContactValue
INNER JOIN MRTPContact
ON MRTPContactValue.MRTPContactId = MRTPContact.Id
WHERE MRTPContact.MRTPWaveQuestionnaireId = 1
GROUP BY Key


Here's the query plan:

                              QUERY PLAN
 HashAggregate  (cost=32639.67..32639.76 rows=9 width=9) (actual time=19407.116..19407.146 rows=10 loops=1)
   ->  Hash Join  (cost=8070.36..31598.51 rows=416466 width=9) (actual time=5917.367..17607.502 rows=416466 loops=1)
         Hash Cond: ("outer".mrtpcontactid = "inner".id)
         ->  Seq Scan on mrtpcontactvalue  (cost=0.00..8669.66 rows=416466 width=17) (actual time=9.094..4233.131 rows=416466 loops=1)
         ->  Hash  (cost=7119.80..7119.80 rows=137824 width=8) (actual time=5096.750..5096.750 rows=137824 loops=1)
               ->  Seq Scan on mrtpcontact  (cost=0.00..7119.80 rows=137824 width=8) (actual time=9.312..4337.647 rows=137824 loops=1)
                     Filter: (mrtpwavequestionnaireid = 1)
 Total runtime: 19417.873 ms


The same query using DISTINCT takes about 40 sec to complete.


Thank you.

--
Miroslav Šulc






Re: [SQL] Table design question

2006-06-01 Thread postgres
On Thu, 01 Jun 2006, David Clarke wrote:

> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the

The use of surrogate keys is a mostly religious issue. Celko is an
Orthodox, many others are Reform. Where you want to align yourself
is a personal choice. 

In defense of the Reform movement, I'd note that modern DBs are more
performant with ints than varchars. More importantly, I've found it
much easier to modify DBs designed with surrogate keys than natural
keys, especially when natural keys span columns. It allows a rather
simple convention for coders to write against, and avoids some messy
modification issues when the spec changes.

It is not my intention to bash the purists, and there are good arguments
on the Ortho side, too. I'm merely giving a bit of advice from the
point of view of someone who lives in the constant evolution side
of DB usage.

-j

-- 
Jamie Lawrence    [EMAIL PROTECTED]
When I was a boy I was told that anybody could become President. 
Now I'm beginning to believe it.
   - Clarence Darrow





[SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Collin Peters

I am having some serious mental block here.  Here is the abstract
version of my problem.  I have a table like this:

unique_id (PK)   broadcast_id   date_sent    status
1                1              2005-04-04   30
2                1              2005-04-01   30
3                1              2005-05-20   10
4                2              2005-05-29   30

So it is a table that stores broadcasts including the broadcast_id,
the date sent, and the status of the broadcast.

What I would like to do is simply get the last date_sent and its
status for every broadcast.  I can't do a GROUP BY because I can't put
an aggregate on the status column.

SELECT MAX(date_sent), status
FROM broadcast_history
GROUP BY broadcast_id

How do I get the status for the most recent date_sent using GROUP BY?

DISTINCT also doesn't work

SELECT DISTINCT ON (email_broadcast_id) *
FROM email_broadcast_history
ORDER BY date_sent

That doesn't work because the DISTINCT ON fields have to match the leading
ORDER BY fields. I have to ORDER BY date_sent, but I can't do a DISTINCT ON date_sent.

I keep thinking I must be missing something.  Does anybody have any ideas?



Re: [SQL] Table design question

2006-06-01 Thread codeWarrior
I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my 
PKEY's -- as an absolute rule -- I guess I am a purist...  Everything else 
(the other columns) can have unique constraints, etcetera and be FOREIGN 
KEYS, etc...

Try INSERTING your 100 character "natural" key into a table with 10M++ rows 
only to find out that there is already a duplicate -- talk about a 
performance hit. The same goes for SELECT -- you end up using way too much 
RAM and bandwidth, unnecessarily...


IMHO: You ought to use a numeric, auto-generated sequence (SERIAL) for your 
PKEYs ...



""David Clarke"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> I'm reading Joe Celko's book SQL Programming Style for the second time
> and although I've been an OO developer for quite a few years I'm
> fairly green wrt SQL. Joe is obviously something of a curmudgeon and I
> would fall squarely into his newbie OO developer ordinal scale and I'm
> trying to avoid the slide into stupid newbie OO developer.
>
> So I'm designing a table and I'm looking for an appropriate key. The
> natural key is a string from a few characters up to a maximum of
> perhaps 100. Joe gets quite fierce about avoiding the use of a serial
> id column as a key. The string is unique in the table and fits the
> criteria for a key. So should I follow Joe's advice and use my natural
> key as the primary key? It sounds reasonable but it will mean at least
> one other table will have the string as a foreign key. My postgres
> intro book has id columns all over the place but is it really that big
> an issue these days to have a 100 character primary key? Are there
> postgres-specific implications for either approach?
>
> Thanks
> Dave
>





Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Oisin Glynn

Collin Peters wrote:

I am having some serious mental block here.  Here is the abstract
version of my problem.  I have a table like this:

unique_id (PK)   broadcast_id   date_sent    status
1                1              2005-04-04   30
2                1              2005-04-01   30
3                1              2005-05-20   10
4                2              2005-05-29   30

So it is a table that stores broadcasts including the broadcast_id,
the date sent, and the status of the broadcast.

What I would like to do is simply get the last date_sent and it's
status for every broadcast.  I can't do a GROUP BY because I can't put
an aggregate on the status column.

SELECT MAX(date_sent), status
FROM broadcast_history
GROUP BY broadcast_id

How do I get the status for the most recent date_sent using GROUP BY?

DISTINCT also doesn't work

SELECT DISTINCT ON (email_broadcast_id) *
FROM email_broadcast_history
ORDER BY date_sent

As you have to have the DISTINCT fields matching the ORDER BY fields.
I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent

I keep thinking am I missing something.  Does anybody have any ideas?



May not be the most efficient but seems to work here.

select broadcast_id, status
  from broadcast_history bh1
 where bh1.date_sent = (select max(date_sent)
                          from broadcast_history bh2
                         where bh1.broadcast_id = bh2.broadcast_id)
 order by bh1.broadcast_id;


Oisin




Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Aaron Bono
select my_sub.max_date, broadcast_history.status
  from (SELECT MAX(date_sent) max_date, broadcast_id
          FROM broadcast_history
         GROUP BY broadcast_id) my_sub
 inner join broadcast_history
    on (broadcast_history.broadcast_id = my_sub.broadcast_id
        and broadcast_history.date_sent = my_sub.max_date);

This should work if the combined broadcast_id, date_sent is unique.  If not,
you will need to decide what record to pick in case of a tie.

On 6/1/06, Collin Peters <[EMAIL PROTECTED]> wrote:
> I am having some serious mental block here.  Here is the abstract
> version of my problem.  I have a table like this:
>
> unique_id (PK)   broadcast_id   date_sent    status
> 1                1              2005-04-04   30
> 2                1              2005-04-01   30
> 3                1              2005-05-20   10
> 4                2              2005-05-29   30
>
> So it is a table that stores broadcasts including the broadcast_id,
> the date sent, and the status of the broadcast.
>
> What I would like to do is simply get the last date_sent and it's
> status for every broadcast.  I can't do a GROUP BY because I can't put
> an aggregate on the status column.
>
> SELECT MAX(date_sent), status
> FROM broadcast_history
> GROUP BY broadcast_id
>
> How do I get the status for the most recent date_sent using GROUP BY?
>
> DISTINCT also doesn't work
>
> SELECT DISTINCT ON (email_broadcast_id) *
> FROM email_broadcast_history
> ORDER BY date_sent
>
> As you have to have the DISTINCT fields matching the ORDER BY fields.
> I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent
>
> I keep thinking am I missing something.  Does anybody have any ideas?

-- 
==
Aaron Bono
President
Aranya Software Technologies, Inc.
http://www.aranya.com
We take care of your technology needs.
Phone: (816) 695-6071
==


Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Yasir Malik

> What I would like to do is simply get the last date_sent and it's
> status for every broadcast.  I can't do a GROUP BY because I can't put
> an aggregate on the status column.
>
> SELECT MAX(date_sent), status
> FROM broadcast_history
> GROUP BY broadcast_id


You could try the following:
select status
  from broadcast_history bh
 where bh.date_sent =
   (select max(bh2.date_sent)
  from broadcast_history bh2);

This reminds me of an interview question:  I was asked how to get a 
maximum column from a table without using max.  How would you do that?


Thanks,
Yasir



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Yasir Malik

> select max(date_sent) from table;
> would equal
> select date_sent from broadcast_history order by date_sent DESC limit 1;

That sounds like a hack.  Is LIMIT in the SQL-99 standard?  Is there 
another way to do this?

Sorry to take over your topic, Collin.

Thanks,
Yasir



Re: [SQL] SELECT DISTINCT too slow

2006-06-01 Thread Florian Weimer
* Alvaro Herrera:

> Miroslav Šulc wrote:
>> The GROUP BY is really fast :-)
>
> Doh!  How does it do it?

It uses a hash table and can therefore discard duplicate rows more
quickly (essentially linear time in the number of rows if the number
of different rows is bounded).
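
One way to see the difference on one's own data is to compare the two plans;
the queries below are the ones from this thread, and the actual numbers will
of course vary:

-- DISTINCT (in this era of PostgreSQL) sorts the whole column first, while
-- GROUP BY can use a HashAggregate when few distinct values are expected.
EXPLAIN ANALYZE SELECT DISTINCT Key FROM MRTPContactValue;
EXPLAIN ANALYZE SELECT Key FROM MRTPContactValue GROUP BY Key;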



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Tom Lane
"Collin Peters" <[EMAIL PROTECTED]> writes:
> What I would like to do is simply get the last date_sent and it's
> status for every broadcast.  I can't do a GROUP BY because I can't put
> an aggregate on the status column.

You missed the key idea about how to use DISTINCT ON.

SELECT DISTINCT ON (email_broadcast_id) *
FROM email_broadcast_history
ORDER BY email_broadcast_id, date_sent DESC

You order by the DISTINCT ON fields, then one or more additional fields
to select the representative row you want within each DISTINCT ON group.

regards, tom lane
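
Applied to the sample broadcast_history table posted earlier in the thread
(the real tables use the email_broadcast_* names), a sketch of what this
looks like:

-- One row per broadcast_id: the one with the latest date_sent.
SELECT DISTINCT ON (broadcast_id) broadcast_id, date_sent, status
FROM broadcast_history
ORDER BY broadcast_id, date_sent DESC;
-- With the sample rows this should return:
--   1 | 2005-05-20 | 10
--   2 | 2005-05-29 | 30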



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Aaron Bono

It is a hack, but when someone wants you to do something in a way
different from the norm, aren't they asking for a hack?

SQL Server does something like
select top (1) from 

I am thinking this is NOT a SQL-99 standard.

-Aaron

On 6/1/06, Yasir Malik <[EMAIL PROTECTED]> wrote:

> select max(date_sent) from table;
> would equal
> select date_sent from broadcast_history order by date_sent DESC limit 1;
>
> That sounds like a hack.  Is limit a SQL-99 standard?  Is there are
> another way to do this?
> Sorry to take over your topic, Collin.
>
> Thanks,
> Yasir




Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Yasir Malik

> It is a hack, but when someone wants you to do something in a way
> different from the norm, aren't they asking for a hack?
>
> SQL Server does something like
> select top (1) from 
>
> I am thinking this is NOT a SQL-99 standard.

This was an interview with Amazon, and I don't think Amazon wanted a hack. 
I hope Amazon doesn't use hacks.  There has to be another way.  It would 
be cruel if they expected me to know some database-specific functionality.

Here's what I gave them:
select value
  from table t
 where t.value >
  (select t2.value
 from table t2);

which would be fine if the sub-select returned simply a list, but that 
isn't even valid SQL (I didn't get the job, BTW).


Yasir



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Alvaro Herrera
Yasir Malik wrote:
> >It is a hack, but when someone wants you to do something in a way
> >different from the norm, aren't they asking for a hack?
> >
> >SQL Server does something like
> >select top (1) from 
> >
> >I am thinking this is NOT a SQL-99 standard.
> >
> This was an interview with Amazon, and I don't think Amazon wanted a hack. 
> I hope Amazon doesn't use hacks.  There has to be another way.  It would 
> be cruel if they expected me know some database specific functionality.

Do they use PostgreSQL at Amazon?  Maybe they wanted to know if you are
aware of common non-standard tricks for whatever RDBMS they use.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Scott Marlowe
On Thu, 2006-06-01 at 14:47, Yasir Malik wrote:
> > It is a hack, but when someone wants you to do something in a way
> > different from the norm, aren't they asking for a hack?
> >
> > SQL Server does something like
> > select top (1) from 
> >
> > I am thinking this is NOT a SQL-99 standard.
> >
> This was an interview with Amazon, and I don't think Amazon wanted a hack. 
> I hope Amazon doesn't use hacks.  There has to be another way.  It would 
> be cruel if they expected me know some database specific functionality.
> Here's what I gave them:
> select value
>from table t
>   where t.value >
>(select t2.value
>   from table t2);
> 
> which would be fine if the sub-select returned simply a list, but that 
> isn't even valid SQL (I didn't get the job, BTW).

Sometimes the only reasonable answer is a hack.

Much like in the older versions of PostgreSQL where select max(id) from
sometable was slower than watching grass grow, the standard "hack" was
to do select id from sometable order by id desc limit 1.

If the SQL way kills your server and the hack doesn't, you use the hack
and move on.  



Re: [SQL] Table design question

2006-06-01 Thread Chris Browne
"codeWarrior" <[EMAIL PROTECTED]> writes:
> I never use anything other than "id SERIAL NOT NULL PRIMARY KEY" for my 
> PKEY's -- as an absolute rule -- I guess I am a purist...  Everything else 
> (the other columns) can have unique constraints, etcetera and be FOREIGN 
> KEYS, etc...
>
> Try INSERTING your 100 character "natural" key into a table with 10M++ rows 
> only to find out that there is already a duplicate -- talk about a 
> performance hit. The same goes for SELECT -- you end up using way too much 
> RAM and bandwidth, unnecessarily...

Celko is decidedly *NOT* promoting the notion that you should use a
100 byte long "natural key."

Jamie's comments of "Orthodox versus Reform" seem reasonably
appropriate in outlining something of the difference between the
positions.  

Just because Celko is "Orthodox" doesn't mean he's *stupid*; he makes
all sorts of noises about using some ISO standard (11179) such that
your primary key values are actually described in some authoritative
manner.



If you go about using ISO-11179, then, in principle, you could
register your scheme for describing IDs for the objects in your system
with some ISO people so that others might even become aware of the
official status of this.

I may not care for doing this; you may not either; a company that
builds auto parts that they want to sell into the automotive industry
may care about standardizing their part IDs quite a lot.

They're not interested in generating stupidly long identifiers; that's
inefficient in many ways...

> IMHO: You ought to use a numeric, auto-generated sequence (SERIAL)
> for you PKEY's ...

That's definitely the "Reform" position ;-), and it is a position that
can be rationally held.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/multiplexor.html
Rules of  the Evil  Overlord #185.  "If I capture  an enemy  known for
escaping via  ingenious and fantastic  little gadgets, I will  order a
full cavity  search and confiscate all personal  items before throwing
him in my dungeon."  



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread mark.dingee
Collin,

I have a similar circumstance in one of my own apps.  I operate under
the simple presumption that the unique_id is sequential and thus the
record with the highest unique_id is the most recent entry.  In that
case I use a query such as 

select * 
from broadcast_history
where unique_id in (
    select max(unique_id)
    from broadcast_history
    group by broadcast_id)

which permits me to examine the entire record which is necessary in my
situation.

Good luck
Mark

On Thu, 2006-06-01 at 10:43 -0700, Collin Peters wrote:
> I am having some serious mental block here.  Here is the abstract
> version of my problem.  I have a table like this:
> 
> unique_id (PK)   broadcast_id   date_sent    status
> 1                1              2005-04-04   30
> 2                1              2005-04-01   30
> 3                1              2005-05-20   10
> 4                2              2005-05-29   30
> 
> So it is a table that stores broadcasts including the broadcast_id,
> the date sent, and the status of the broadcast.
> 
> What I would like to do is simply get the last date_sent and it's
> status for every broadcast.  I can't do a GROUP BY because I can't put
> an aggregate on the status column.
> 
> SELECT MAX(date_sent), status
> FROM broadcast_history
> GROUP BY broadcast_id
> 
> How do I get the status for the most recent date_sent using GROUP BY?
> 
> DISTINCT also doesn't work
> 
> SELECT DISTINCT ON (email_broadcast_id) *
> FROM email_broadcast_history
> ORDER BY date_sent
> 
> As you have to have the DISTINCT fields matching the ORDER BY fields.
> I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent
> 
> I keep thinking am I missing something.  Does anybody have any ideas?
> 



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Rod Taylor
On Thu, 2006-06-01 at 14:13 -0400, Yasir Malik wrote:
> > What I would like to do is simply get the last date_sent and it's
> > status for every broadcast.  I can't do a GROUP BY because I can't put
> > an aggregate on the status column.
> >
> > SELECT MAX(date_sent), status
> > FROM broadcast_history
> > GROUP BY broadcast_id
> >
> You could try the following:
> select status
>from broadcast_history bh
>   where bh.date_sent =
> (select max(bh2.date_sent)
>from broadcast_history bh2);
> 
> This reminds me of an interview question:  I was asked how to get a 
> maximum column from a table without using max.  How would you do that?

Find the list of everything that isn't the highest value, then invert
it.

Don't expect it to perform very well though.

select col
  from foo
 where col not in
       (select f1.col
          from foo as f1
          join foo as f2 on (f1.col < f2.col)
       );


-- 




[SQL] Advanced Query

2006-06-01 Thread operationsengineer1
hi all, i posted this problem on the novice thread,
but it makes much more sense to post it here, instead.
 sorry for the double posting, i'll be sure to post
advanced SQL questions here in the future.

i have the following two tables (trimmed down for
simplicity's sake):

t_inspect
id, inspect_timestamp

t_inspect_result
id, inspect_id, inspect_pass

yes, i need both tables, although it might not be
obvious since i trimmed down the columns in this
simple example.

inspect_pass (bool): pass = true, fail = false

let's say i have the following values:

t_inspect
1, 2006-05-31...
2, 2006-06-01...

t_inspect_result
1, 1, true
2, 2, false
3, 2, false
4, 2, false
5, 2, true

iow, the first inspection passed the first time, while the
second inspection (t_inspect.id = 2) had to be
inspected 4 times before it passed.  you can
assume it was reworked in between inspections and more
defects were found upon reinspection.

i'm trying to develop a query that will provide the
first pass yield.  iow, the yield generated by
counting *only* the results associated with the first
time a unit is inspected for a given inspect.id.

t_inspect_result
1, 1, *true* -- first inspect for t_inspect.id = 1
2, 2, *false* -- first inspect for t_inspect.id = 2
3, 2, false
4, 2, false
5, 2, true

specifically, this case would yield 50%  (1 pass / 2
total) since the first inspection passed the first
time and the second inspection failed the first time.

i think i can get the first pass results through a
given inspection by using "distinct on
(t_inspect.id)..."  i say think b/c the actual query
is quite complex and i'm not 100% sure my results are
consistent with what i'm expecting.

i think i can get the results of the entire
t_inspect_result table using the count function - get
#passes, get #total and do some math.

what i can't seem to do is to get both - a count of
the total number of t_inspect_result.inspect_pass
where the value is true and a total count, by unique
t_inspect.id.

any guidance would be much appreciated.




Re: [SQL] Table design question

2006-06-01 Thread postgres
On Thu, 01 Jun 2006, Chris Browne wrote:

> Celko is decidedly *NOT* promoting the notion that you should use a
> 100 byte long "natural key."
> 
> Jamie's comments of "Orthodox versus Reform" seem reasonably
> appropriate in outlining something of the difference between the
> positions.  

Just to be clear, that was all I was trying to do. I probably should
have mentioned that any attempt to use such an attribute as a PK should 
be met with a baseball bat or other shillelagh-ish implement, but was 
interrupted several times during that email drafting.

> I may not care for doing this; you may not either; a company that
> builds auto parts that they want to sell into the automotive industry
> may care about standardizing their part IDs quite a lot.

This is another important point. In some situations, a rigid data model
can be a godsend to coders. If you happen to sit in such an enviable
position, I would encourage you to take advantage of it. (This doesn't
mean picking bad keys, of course.) 

I liberally sprinkle surrogate keys around simply because most of the
projects I work on have transient requirements, so spontaneous rejiggery 
and various pokery are both commonplace, and SKs provide "enough" data
integrity that the cost/benefit curve seems to peak there. Were I doing
projects that had longer release cycles, I'd re-evaluate that position,
and likely see a marginal reduction in bugs.

None of this should be taken as bashing Celko - he's a smart man and an
excellent source of advice.

-j

-- 
Jamie Lawrence    [EMAIL PROTECTED]
When I talked to the president, he was loaded.
  - Brent Scowcroft, Kissinger's assistant, 10/11/73





Re: [SQL] Advanced Query

2006-06-01 Thread Michael Fuhr
On Thu, Jun 01, 2006 at 04:09:21PM -0700, [EMAIL PROTECTED] wrote:
> what i can't seem to do is to get both - a count of
> the total number of t_inspect_result.inspect_pass
> where the value is true and a total count, by unique
> t_inspect.id.

Are you looking for something like this?

SELECT 1.0 * sum(CASE WHEN inspect_pass THEN 1 ELSE 0 END) / count(*)
FROM (
  SELECT DISTINCT ON (inspect_id) inspect_id, inspect_pass
  FROM t_inspect_result
  ORDER BY inspect_id, id
) AS s;

Multiply by 100.0 instead of 1.0 if you want percent.

If you have a cast from boolean to integer (built-in in 8.1, easily
created in earlier versions) then you could replace the CASE
expression with a cast (inspect_pass::integer).  Whether to use the
more explicit CASE or the more concise cast is a matter of style.
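
For reference, a sketch of that cast variant, scaled to a percentage (same
assumed schema as the query above):

-- First-pass yield as a percentage, using the boolean-to-integer cast
-- instead of the CASE expression.
SELECT 100.0 * sum(inspect_pass::integer) / count(*) AS first_pass_yield_pct
FROM (
  SELECT DISTINCT ON (inspect_id) inspect_id, inspect_pass
  FROM t_inspect_result
  ORDER BY inspect_id, id
) AS s;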

-- 
Michael Fuhr



Re: [SQL] Advanced Query

2006-06-01 Thread operationsengineer1
> On Thu, Jun 01, 2006 at 04:09:21PM -0700, [EMAIL PROTECTED] wrote:
> > what i can't seem to do is to get both - a count of
> > the total number of t_inspect_result.inspect_pass
> > where the value is true and a total count, by unique
> > t_inspect.id.
>
> Are you looking for something like this?
>
> SELECT 1.0 * sum(CASE WHEN inspect_pass THEN 1 ELSE 0 END) / count(*)
> FROM (
>   SELECT DISTINCT ON (inspect_id) inspect_id, inspect_pass
>   FROM t_inspect_result
>   ORDER BY inspect_id, id
> ) AS s;
>
> Multiply by 100.0 instead of 1.0 if you want percent.
>
> If you have a cast from boolean to integer (built-in in 8.1, easily
> created in earlier versions) then you could replace the CASE
> expression with a cast (inspect_pass::integer).  Whether to use the
> more explicit CASE or the more concise cast is a matter of style.

Michael, wow!  i had never heard of CASE or seen the
"if/then" style in sql.  i need to get out more... or
maybe less. ;-)

i have been working through a simplified version of
the problem and i am accurately getting the "pieces"
of data that i need (#pass, #total) - it is similar to
your example following your first FROM statement.

i just need to work the complexities back in w/o
destroying my current results and then perform the
math on the results - either in pgsql or in my app.

i'll play around with the more advanced stuff
tomorrow.

thanks - i think i have enough pieces to get this
done.  if not...  "i'll be baaawck."




Re: [SQL] Table design question

2006-06-01 Thread David Clarke

On 6/2/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Thu, 01 Jun 2006, Chris Browne wrote:

> Celko is decidedly *NOT* promoting the notion that you should use a
> 100 byte long "natural key."
>
> Jamie's comments of "Orthodox versus Reform" seem reasonably
> appropriate in outlining something of the difference between the
> positions.

Just to be clear, that was all I was trying to do. I probably should
have mentioned that any attempt to use such an attribute as a PK should
be met with a baseball bat or other shillelagh-ish implement, but was
interrupted several times during that email drafting.

> I may not care for doing this; you may not either; a company that
> builds auto parts that they want to sell into the automotive industry
> may care about standardizing their part IDs quite a lot.

This is another important point. In some situations, a rigid data model
can be a godsend to coders. If you happen to sit in such an enviable
position, I would encourage you to take advantage of it. (This doesn't
mean picking bad keys, of course.)

None of this should be taken as bashing Celko - he's a smart man and an
excellent source of advice.

-j



Thanks everyone who replied (and also for the insightful and measured
responses, not every news group is so lucky). I had progressed down
the path of the serial id column but re-reading Celko's book - he
spends some pages railing against "proprietary auto-numbering
features" - I wanted to feel confident I was making the right choice.

Thanks again
Dave



Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Aaron Bono
Is this SQL-99 compliant or a PostgreSQL specific query?  I really like it
and have never seen this before.

-Aaron

On 6/1/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Collin Peters" <[EMAIL PROTECTED]> writes:
> > What I would like to do is simply get the last date_sent and it's
> > status for every broadcast.  I can't do a GROUP BY because I can't put
> > an aggregate on the status column.
>
> You missed the key idea about how to use DISTINCT ON.
>
> SELECT DISTINCT ON (email_broadcast_id) *
> FROM email_broadcast_history
> ORDER BY email_broadcast_id, date_sent DESC
>
> You order by the DISTINCT ON fields, then one or more additional fields
> to select the representative row you want within each DISTINCT ON group.


Re: [SQL] Am I crazy or is this SQL not possible

2006-06-01 Thread Tom Lane
"Aaron Bono" <[EMAIL PROTECTED]> writes:
> Is this SQL-99 compliant or a PostgreSQL specific query?  I really like it
> and have never seen this before.

DISTINCT ON is a Postgres-ism, I'm afraid.  It's pretty handy though.

regards, tom lane
