[SQL] inner join is much faster! is that right?

2006-09-19 Thread James Im

I really thought that Postgresql would rewrite a query from

select *
from file a, file_tag b, tag c
where a.id_file=b.id_file and b.id_tag=c.id_tag and a.id_file=100100

to something like:

select *
from (file a inner join file_tag b on (a.id_file=b.id_file)) inner join
tag c on (b.id_tag=c.id_tag)
where a.id_file=100100


However this is not the case!

I have a many-to-many relation between 2 tables (tag and file). tag has
4 rows and file has 5 million rows. file_tag has 50 million rows.

In this setup, the first select takes a couple of minutes to complete
while the second query takes only a second to complete.

Is this a normal expected behavior? Shouldn't Postgresql be able to
rewrite the query correctly?



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[SQL] Borland Database Engine assumes Varchar(255) as Memo

2006-09-19 Thread Ezequias Rodrigues da Rocha
Hi list,

I don't know if there is anybody using PostgreSQL with Delphi, but I would
like to exchange some knowledge with some of you. BDE assumes my
Varchar(255) is a Memo, and this causes many problems in my programming.

Has someone already noticed it, and can you help me?

Regards ...

Atenciosamente (Sincerely)
Ezequias Rodrigues da Rocha
The worst of democracies is still better than the best of dictatorships
http://ezequiasrocha.blogspot.com/


Re: [SQL] inner join is much faster! is that right?

2006-09-19 Thread Chris Mair

> I really thought that Postgresql would rewrite a query from
> 
> select *
> from file a, file_tag b, tag c
> where a.id_file=b.id_file and b.id_tag=c.id_tag and a.id_file=100100
> 
> to something like:
> 
> select *
> from (file a inner join file_tag b on (a.id_file=b.id_file)) inner join
> tag c on (b.id_tag=c.id_tag)
> where a.id_file=100100

These should just be two ways of expressing the same (inner)
join.

Your timings seem to suggest that in the first case PG computes
the whole join between the 3 tables and only then applies the filter
on a.

Can you send the output of "explain" for these two queries and
let us know what version of PG this is?

Bye, Chris.


-- 

Chris Mair
http://www.1006.org




Re: [SQL] inner join is much faster! is that right?

2006-09-19 Thread James Im

I was doing the explain thing when I discovered my mistake!

The thing is that I oversimplified my examples. Instead of searching
for one ID, I was searching for several IDs, and I forgot to put
parentheses around the OR, like this.

I was doing this:
select *
from file a, file_tag b, tag c
where a.id_file=b.id_file and b.id_tag=c.id_tag and
a.id_file=100100 or a.id_file=1001000200

But I should have done this:

select *
from file a, file_tag b, tag c
where a.id_file=b.id_file and b.id_tag=c.id_tag and
(a.id_file=100100 or a.id_file=1001000200)

This changes everything. The performance is now acceptable!
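For the record, AND binds more tightly than OR in SQL, so the unparenthesized predicate is parsed like this (a sketch against the tables above):

```sql
-- What the WHERE clause without parentheses actually means
-- (AND binds before OR):
SELECT *
FROM file a, file_tag b, tag c
WHERE (a.id_file = b.id_file
       AND b.id_tag = c.id_tag
       AND a.id_file = 100100)
   OR a.id_file = 1001000200;
-- The second branch of the OR carries no join conditions at all, so for
-- every file row with id_file = 1001000200 the planner must consider every
-- combination of file_tag and tag rows -- hence the huge runtime.
```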






[SQL] Nested loops are killing throughput

2006-09-19 Thread CG
Postgresql 8.1

I've tried turning off nested loops to see what the query planner would choose 
instead of nested loops. It chose a hash join, and it still had a nested loop 
in the mix! How can I entice the query planner to use a simpler join scheme? 
What criteria is used to determine whether or not a merge join will be used?

Here's an explain analyze...

Subquery Scan foo  (cost=762.21..762.50 rows=1 width=12) (actual 
time=10784.849..10786.992 rows=9 loops=1)
  ->  GroupAggregate  (cost=762.21..762.24 rows=1 width=30) (actual 
time=10767.702..10768.539 rows=9 loops=1)
->  Sort  (cost=762.21..762.21 rows=1 width=30) (actual 
time=10767.525..10767.989 rows=360 loops=1)
  Sort Key: (dpdl.field_value)::timestamp with time zone
  ->  Nested Loop  (cost=2.01..762.20 rows=1 width=30) (actual 
time=283.824..10749.007 rows=360 loops=1)
->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) 
(actual time=31.227..6980.765 rows=5436 loops=1)
  ->  Nested Loop  (cost=0.00..135.29 rows=101 
width=16) (actual time=25.514..273.660 rows=5436 loops=1)
->  Index Scan using ut_company_name_idx on 
user_table ut  (cost=0.00..21.96 rows=5 width=11) (actual time=6.596..6.649 
rows=9 loops=1)
  Index Cond: ((company_name)::text = 'Acme, Inc.'::text)
->  Index Scan using packet_user_idx on packet 
dp  (cost=0.00..19.89 rows=222 width=27) (actual time=16.939..28.025 rows=604 
loops=9)
  Index Cond: ((dp.username)::text = 
("outer".username)::text)
  Filter: (trans_date > 
(date_trunc('month'::text, (now() - '1 mon'::interval)) - '1 year 6 
mons'::interval))
  ->  Index Scan using packet_status_puuid_pkey on 
packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual time=1.226..1.228 
rows=1 loops=5436)
Index Cond: (dps.packet_uuid = 
"outer".packet_uuid)
Filter: ((status & 2) = 0)
->  Bitmap Heap Scan on packet_datalink dpdl  
(cost=2.01..17.87 rows=4 width=30) (actual time=0.685..0.688 rows=0 loops=5436)
  Recheck Cond: (dpdl.packet_uuid = "outer".packet_uuid)
  Filter: (((field_name)::text = 'event_date'::text) 
AND ((field_value)::date >= ('now'::text)::date))
  ->  Bitmap Index Scan on packet_dl_puuid_idx  
(cost=0.00..2.01 rows=4 width=0) (actual time=0.402..0.402 rows=4 loops=5436)
Index Cond: (dpdl.packet_uuid = 
"outer".packet_uuid)
Total runtime: 10787.198 ms

Also, no one here can figure out why the row count prediction is off on the 
packet table either. It's vacuumed and analyzed.

...



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread Tom Lane
CG <[EMAIL PROTECTED]> writes:
> ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) 
> (actual time=31.227..6980.765 rows=5436 loops=1)
>   ->  Nested Loop  (cost=0.00..135.29 rows=101 
> width=16) (actual time=25.514..273.660 rows=5436 loops=1)
> ...
>   ->  Index Scan using packet_status_puuid_pkey on 
> packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual 
> time=1.226..1.228 rows=1 loops=5436)
> Index Cond: (dps.packet_uuid = 
> "outer".packet_uuid)
> Filter: ((status & 2) = 0)

One problem you've got is that the planner has no stats about the
selectivity of that status condition.  My advice would be to forget the
cute bitmask and store one or more plain boolean columns.  Easier to
write queries against and way more tractable statistically.
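A minimal sketch of that change, assuming bit 2 means something like a "deleted" flag (the column name here is made up, not part of CG's schema):

```sql
-- Sketch only: replace the bit test with an explicit boolean column.
-- "is_deleted" is a hypothetical name; pick one matching what bit 2 means.
ALTER TABLE packet_status ADD COLUMN is_deleted boolean;
UPDATE packet_status SET is_deleted = ((status & 2) <> 0);
ANALYZE packet_status;  -- collect statistics on the new column

-- The opaque filter ((status & 2) = 0) then becomes the
-- planner-friendly:
--   WHERE NOT is_deleted
-- whose selectivity can be estimated directly from pg_stats.
```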

regards, tom lane



[SQL] Count and list only where count is bigger than 1

2006-09-19 Thread Ezequias Rodrigues da Rocha
Hi list,

Is it possible to make a count select and only display rows where the count
column is bigger than 1?

My SQL is like this:

SELECT distinct cli.bairro, COUNT( * ) as qtd
FROM base.cliente cli
GROUP BY cli.cidade, cli.bairro
ORDER BY 2

I noticed that I cannot use "where qtd > 1", ok? What to do?

Regards ...

Atenciosamente (Sincerely)
Ezequias Rodrigues da Rocha
The worst of democracies is still better than the best of dictatorships
http://ezequiasrocha.blogspot.com/


Re: [SQL] Count and list only where count is bigger than 1

2006-09-19 Thread Mezei Zoltán
Ezequias Rodrigues da Rocha wrote:
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
>
> I noticed that I cannot use "where qtd > 1" ok ?
>
> What to do ?

Google for, or read a tutorial on, HAVING. E.g.:

SELECT distinct cli.bairro, COUNT( * ) as qtd
FROM base.cliente cli
GROUP BY cli.cidade, cli.bairro
HAVING COUNT(*) > 1

Zizi




Re: [SQL] Count and list only where count is bigger than 1

2006-09-19 Thread bnichols
> Hi list,
>
> It is possible to make a count select and only display where count column
> is
> bigger than 1 ?
>
> My SQL is like this
>
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
>
> I noticed that I cannot use "where qtd > 1" ok ?

SELECT distinct cli.bairro, COUNT( * ) as qtd
FROM base.cliente cli
GROUP BY cli.cidade, cli.bairro
HAVING count(*) > 1;






Re: [SQL] Count and list only where count is bigger than 1

2006-09-19 Thread A. Kretschmer
On Tue, 19 Sep 2006 at 12:04:55 -0300, Ezequias Rodrigues da Rocha
wrote the following:
> Hi list,
> 
> It is possible to make a count select and only display where count column is
> bigger than 1 ?
> 
> My SQL is like this
> 
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
> 
> I noticed that I cannot use "where qtd > 1" ok ?
> 
> What to do ?

HAVING.

17:14 < akretschmer> ??having
17:14 < pg_docbot_ads> For information about 'having' see:
17:14 < pg_docbot_ads> 
http://www.postgresql.org/docs/current/interactive/queries-table-expressions.html#QUERIES-GROUP


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: CG <[EMAIL PROTECTED]>
Cc: pgsql-sql@postgresql.org
Sent: Tuesday, September 19, 2006 11:03:07 AM
Subject: Re: [SQL] Nested loops are killing throughput


CG <[EMAIL PROTECTED]> writes:
> ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) 
> (actual time=31.227..6980.765 rows=5436 loops=1)
>   ->  Nested Loop  (cost=0.00..135.29 rows=101 
> width=16) (actual time=25.514..273.660 rows=5436 loops=1)
> ...
>   ->  Index Scan using packet_status_puuid_pkey on 
> packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual 
> time=1.226..1.228 rows=1 loops=5436)
> Index Cond: (dps.packet_uuid = 
> "outer".packet_uuid)
> Filter: ((status & 2) = 0)

>One problem you've got is that the planner has no stats about the
>selectivity of that status condition.  My advice would be to forget the
>cute bitmask and store one or more plain boolean columns.  Easier to
>write queries against and way more tractable statistically.

n/p ... I'll ditch it. 
 
Even if I simplify that query down to a straight-forward example...
 
select dp.*, dps.status from dpo.packet dp, dpo.packet_status dps where 
dp.packet_uuid = dps.packet_uuid and dp.username='joeuser';
 
I'm still getting the nested loop for a join.
 
Nested Loop  (cost=1.00..100013378.98 rows=2206 width=145) (actual 
time=46.743..18202.318 rows=2225 loops=1)
  ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..88.03 
rows=2206 width=125) (actual time=42.263..124.519 rows=2225 loops=1)
Index Cond: ((username)::text = 'joeuser'::text)
  ->  Index Scan using packet_status_puuid_pkey on packet_status dps  
(cost=0.00..6.01 rows=1 width=20) (actual time=8.115..8.117 rows=1 loops=2225)
Index Cond: ("outer".packet_uuid = dps.packet_uuid)
Total runtime: 18205.880 ms

 
These indexes are being used, and the tables are freshly vacuum-analyzed...
 
CREATE UNIQUE INDEX packet_puuid_idx
  ON dpo.packet
  USING btree
  (packet_uuid);
 
CREATE INDEX packet_user_idx
  ON dpo.packet
  USING btree
  (username);
 
CREATE UNIQUE INDEX packet_status_puuid_pkey
  ON dpo.packet_status
  USING btree
  (packet_uuid);
 
I have no idea why I'm having such a difficult time digging through this data. 
It should be very straight-forward.
 
CG
 



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread Tom Lane
CG <[EMAIL PROTECTED]> writes:
> I'm still getting the nested loop for a join.
 
> Nested Loop  (cost=1.00..100013378.98 rows=2206 width=145) (actual 
> time=46.743..18202.318 rows=2225 loops=1)
>   ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..88.03 
> rows=2206 width=125) (actual time=42.263..124.519 rows=2225 loops=1)
> Index Cond: ((username)::text = 'joeuser'::text)
>   ->  Index Scan using packet_status_puuid_pkey on packet_status dps  
> (cost=0.00..6.01 rows=1 width=20) (actual time=8.115..8.117 rows=1 loops=2225)
> Index Cond: ("outer".packet_uuid = dps.packet_uuid)
> Total runtime: 18205.880 ms

If packet_status is large, that seems like a perfectly reasonable plan
to me.  If not ... what data type is packet_uuid?  Is its equality op
marked mergeable or hashable?

regards, tom lane



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
>If packet_status is large, that seems like a perfectly reasonable plan
>to me.  If not ... what data type is packet_uuid?  Is its equality op
>marked mergeable or hashable?

It is of type uniqueidentifier ... 
 
CREATE OPERATOR =(
  PROCEDURE = uniqueidentifier_eq,
  LEFTARG = uniqueidentifier,
  RIGHTARG = uniqueidentifier,
  COMMUTATOR = =,
  RESTRICT = eqsel,
  JOIN = eqjoinsel,
  HASHES,
  MERGES,
  SORT1 = <,
  SORT2 = <,
  LTCMP = <,
  GTCMP = >);
 
It looks like it is marked both mergable and hashable.
 
So what size counts as "large"? (I have no idea what units to use to think 
about table size... kilobytes? pages? rows?) It seems like we recently stepped 
over that imaginary line between medium and large.
 
We could re-tune PostgreSQL... What settings would make sense to tweak? Sort 
Memory? 
 
We could partition the tables... I understand the theory behind table 
partitioning and partial indexes, but I've never come across a howto or even a 
best practices list.
 
We could remove data from the table... I don't like this as much, but if that's 
the only way, that's the only way.



[SQL] Make Trigger run after completion of ENTIRE transaction

2006-09-19 Thread Henry Ortega
This may be more of a theoretical question: can you actually make a trigger run
after completion of the entire transaction?
 
Here's what I have:
LOG
user   |    startdate   |   enddate
 
enddate is getting updated by a trigger (on insert or update).
 
I have the following transaction:
BEGIN;
Insert into LOG(user,startdate) values('jdoe','2006-08-13');

Insert into LOG(user,startdate) values('jdoe','2006-08-14');
Insert into LOG(user,startdate) values('jdoe','2006-08-15');
Insert into LOG(user,startdate) values('jdoe','2006-08-16');
Insert into LOG(user,startdate) values('jdoe','2006-08-17');
... another 20-30 more inserts..
COMMIT;
 
The trigger actually runs on each Insert and therefore slows down the
Insert quite a bit. 
 
My question is, can you tell the trigger to run after Commit? 
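Not strictly after COMMIT, but one can get close with a constraint trigger deferred to commit time. A sketch, where "set_enddate" stands in for whatever function the existing trigger calls (that name is an assumption, not from the post):

```sql
-- Sketch: defer the trigger's firing to COMMIT time. It still runs once
-- per inserted row, but all the work happens together at commit rather
-- than interleaved with each INSERT.
CREATE CONSTRAINT TRIGGER log_enddate_deferred
    AFTER INSERT OR UPDATE ON log
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    EXECUTE PROCEDURE set_enddate();
```

Note that this only moves the work to commit time; it does not reduce the total number of trigger invocations.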


Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread Tom Lane
CG <[EMAIL PROTECTED]> writes:
>> If packet_status is large, that seems like a perfectly reasonable plan
>> to me.  If not ... what data type is packet_uuid?  Is its equality op
>> marked mergeable or hashable?

> It is of type uniqueidentifier ... 

Is that the one off gborg?  It's broken because the equality function is
marked volatile, and so the planner is afraid to try to use it for
merging or hashing.  (It's also not marked strict, which means you can
trivially crash the backend by passing it a null ...)

regards, tom lane



Re: [SQL] Nested loops are killing throughput

2006-09-19 Thread CG
>Is that the one off gborg?  It's broken because the equality function is
>marked volatile, and so the planner is afraid to try to use it for
>merging or hashing.  (It's also not marked strict, which means you can
>trivially crash the backend by passing it a null ...)

It is the one from gborg. That's good to know.
 
So to fix, I should alter the supporting functions and set the ones that should 
not be VOLATILE to IMMUTABLE (newid() should be VOLATILE and that's about it) 
and set them to STRICT as well. I haven't had a problem with it crashing my 
backend in the three years we've been using it, but I'd rather not take any 
chances.
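The fix CG describes might look like this (a sketch; it assumes the server's ALTER FUNCTION accepts volatility/strictness changes, otherwise re-issuing CREATE OR REPLACE FUNCTION with the new markings works too):

```sql
-- Sketch: re-mark the uniqueidentifier support functions so the planner
-- can consider merge and hash joins. Repeat for the other comparison
-- functions (<, >, cmp, ...); only the generator newid() stays VOLATILE.
ALTER FUNCTION uniqueidentifier_eq(uniqueidentifier, uniqueidentifier)
    IMMUTABLE STRICT;
```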
 
I know the hacker list has been batting back and forth the idea of a built-in 
UUID type. I'll be interested to see if you end up with a standard UUID type, a 
fixed field type with multiple generators for differing levels of paranoia, or 
the status quo.
 
CG



[SQL] Need a SQL to create sets of hobbies

2006-09-19 Thread CN
Hi!

CREATE TABLE x1 (name TEXT, hobby TEXT, PRIMARY KEY(name,hobby));

INSERT INTO x1 VALUES('John','music');
INSERT INTO x1 VALUES('John','arts');
INSERT INTO x1 VALUES('Bob','arts');
INSERT INTO x1 VALUES('Bob','music');
INSERT INTO x1 VALUES('Rocky','computer');
INSERT INTO x1 VALUES('Steve','arts');
INSERT INTO x1 VALUES('Steve','football');
INSERT INTO x1 VALUES('Tom','computer');
INSERT INTO x1 VALUES('Tom','music');

select * from x1;

 name  |  hobby   
---+--
 John  | music
 John  | arts
 Bob   | arts
 Bob   | music
 Rocky | computer
 Steve | arts
 Steve | football
 Tom   | computer
 Tom   | music
(9 rows)

John and Bob have the same hobbies - music and arts. So music and arts
are treated as one set of hobbies.

Rocky has a unique set of interests - computer.

Steve also likes arts just as John and Bob do, but he also has an
exclusive interest - football. Thus, his set of hobbies is unique -
arts, football.

One of Tom's hobbies, music, overlaps John's and Bob's; his other hobby,
computer, overlaps Rocky's; but the combination is new. Hence his hobbies,
computer and music, form a new set of hobbies.

Now we have 4 sets of hobbies:

set 1: music, arts
set 2: computer
set 3: arts, football
set 4: computer, music

I am looking for an SQL that creates sets of hobbies in table x2 by
selecting from table x1:

CREATE TABLE x2 (sid INTEGER, hobby TEXT, PRIMARY KEY(sid,hobby));

and makes x2 contain rows:

 sid   |  hobby   
---+--
1  | music
1  | arts
2  | computer
3  | arts
3  | football
4  | computer
4  | music

where sid starts from 1.
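One possible answer, sketched with features from much newer PostgreSQL releases (array_agg, window functions, and data-modifying CTEs), so treat it as an illustration of the idea rather than something that would have run on a 2006-era server. The exact sid numbering is arbitrary, as long as equal hobby sets share one:

```sql
-- Group each person's hobbies into a sorted array, deduplicate the
-- arrays, number the distinct sets, then unnest them back into x2 rows.
WITH per_person AS (
    SELECT name, array_agg(hobby ORDER BY hobby) AS hobbies
    FROM x1
    GROUP BY name
),
distinct_sets AS (
    SELECT DISTINCT hobbies,
           dense_rank() OVER (ORDER BY hobbies) AS sid
    FROM per_person
)
INSERT INTO x2 (sid, hobby)
SELECT sid, unnest(hobbies)
FROM distinct_sets;
```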

Thank you in advance!
CN


