[SQL] inner join is much faster! is that right?
I really thought that PostgreSQL would rewrite a query from

    select *
    from file a, file_tag b, tag c
    where a.id_file=b.id_file and b.id_tag=c.id_tag and a.id_file=100100

to something like:

    select *
    from (file a inner join file_tag b on (a.id_file=b.id_file))
         inner join tag c on (b.id_tag=c.id_tag)
    where a.id_file=100100

However, this is not the case! I have a many-to-many relation between two tables (tag and file): tag has 4 rows, file has 5 million rows, and file_tag has 50 million rows. In this setup, the first select takes a couple of minutes to complete while the second query takes only a second.

Is this normal, expected behavior? Shouldn't PostgreSQL be able to rewrite the query correctly?
[SQL] Borland Database Engine assumes Varchar(255) as Memo
Hi list,

I don't know if there is anybody using PostgreSQL with Delphi, but I would like to exchange some knowledge with some of you. The BDE treats my Varchar(255) columns as Memo fields, and that causes many problems in my programming. Has anyone else noticed this, and can you help me?

Regards,
Ezequias Rodrigues da Rocha
http://ezequiasrocha.blogspot.com/
Re: [SQL] inner join is much faster! is that right?
> I really thought that PostgreSQL would rewrite a query from
>
>     select *
>     from file a, file_tag b, tag c
>     where a.id_file=b.id_file and b.id_tag=c.id_tag and a.id_file=100100
>
> to something like:
>
>     select *
>     from (file a inner join file_tag b on (a.id_file=b.id_file))
>          inner join tag c on (b.id_tag=c.id_tag)
>     where a.id_file=100100

These should be nothing more than two ways of expressing the same (inner) join. Your timings seem to suggest that in the first case PG computes the whole join between the 3 tables and only then applies the filter on a. Can you send the output of "explain" for these two queries and let us know what version of PG this is?

Bye,
Chris.

--
Chris Mair
http://www.1006.org
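In concrete terms, the plans being asked for can be gathered by prefixing each variant with EXPLAIN (a minimal sketch; the table names are taken from the original post, and EXPLAIN ANALYZE additionally reports actual row counts and timings):

    EXPLAIN ANALYZE
    SELECT *
    FROM file a, file_tag b, tag c
    WHERE a.id_file = b.id_file
      AND b.id_tag = c.id_tag
      AND a.id_file = 100100;

    -- then repeat with the explicit INNER JOIN form and compare the two plans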
Re: [SQL] inner join is much faster! is that right?
I was doing the explain thing when I discovered my mistake! The thing is that I over-simplified my examples: instead of searching for one ID I was searching for several IDs, and I forgot to put parentheses around the OR. I was doing this:

    select *
    from file a, file_tag b, tag c
    where a.id_file=b.id_file and b.id_tag=c.id_tag and a.id_file=100100
       or a.id_file=1001000200

But I should have done this:

    select *
    from file a, file_tag b, tag c
    where a.id_file=b.id_file and b.id_tag=c.id_tag
      and (a.id_file=100100 or a.id_file=1001000200)

Since AND binds more tightly than OR, the first version matches the entire unconstrained join result for the second ID instead of restricting the join to the two IDs. This changes everything. The performance is now acceptable!
[SQL] Nested loops are killing throughput
PostgreSQL 8.1

I've tried turning off nested loops to see what the query planner would choose instead. It chose a hash join, and it still had a nested loop in the mix! How can I entice the query planner to use a simpler join scheme? What criteria are used to determine whether or not a merge join will be used? Here's an explain analyze...

    Subquery Scan foo  (cost=762.21..762.50 rows=1 width=12) (actual time=10784.849..10786.992 rows=9 loops=1)
      ->  GroupAggregate  (cost=762.21..762.24 rows=1 width=30) (actual time=10767.702..10768.539 rows=9 loops=1)
            ->  Sort  (cost=762.21..762.21 rows=1 width=30) (actual time=10767.525..10767.989 rows=360 loops=1)
                  Sort Key: (dpdl.field_value)::timestamp with time zone
                  ->  Nested Loop  (cost=2.01..762.20 rows=1 width=30) (actual time=283.824..10749.007 rows=360 loops=1)
                        ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) (actual time=31.227..6980.765 rows=5436 loops=1)
                              ->  Nested Loop  (cost=0.00..135.29 rows=101 width=16) (actual time=25.514..273.660 rows=5436 loops=1)
                                    ->  Index Scan using ut_company_name_idx on user_table ut  (cost=0.00..21.96 rows=5 width=11) (actual time=6.596..6.649 rows=9 loops=1)
                                          Index Cond: ((company_name)::text = 'Acme, Inc.'::text)
                                    ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..19.89 rows=222 width=27) (actual time=16.939..28.025 rows=604 loops=9)
                                          Index Cond: ((dp.username)::text = ("outer".username)::text)
                                          Filter: (trans_date > (date_trunc('month'::text, (now() - '1 mon'::interval)) - '1 year 6 mons'::interval))
                              ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual time=1.226..1.228 rows=1 loops=5436)
                                    Index Cond: (dps.packet_uuid = "outer".packet_uuid)
                                    Filter: ((status & 2) = 0)
                        ->  Bitmap Heap Scan on packet_datalink dpdl  (cost=2.01..17.87 rows=4 width=30) (actual time=0.685..0.688 rows=0 loops=5436)
                              Recheck Cond: (dpdl.packet_uuid = "outer".packet_uuid)
                              Filter: (((field_name)::text = 'event_date'::text) AND ((field_value)::date >= ('now'::text)::date))
                              ->  Bitmap Index Scan on packet_dl_puuid_idx  (cost=0.00..2.01 rows=4 width=0) (actual time=0.402..0.402 rows=4 loops=5436)
                                    Index Cond: (dpdl.packet_uuid = "outer".packet_uuid)
    Total runtime: 10787.198 ms

Also, no one here can figure out why the row count prediction is off on the packet table either. It's vacuumed and analyzed.
Re: [SQL] Nested loops are killing throughput
CG <[EMAIL PROTECTED]> writes:
>     ->  Nested Loop  (cost=0.00..744.28 rows=1 width=16) (actual time=31.227..6980.765 rows=5436 loops=1)
>           ->  Nested Loop  (cost=0.00..135.29 rows=101 width=16) (actual time=25.514..273.660 rows=5436 loops=1)
> ...
>           ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.02 rows=1 width=16) (actual time=1.226..1.228 rows=1 loops=5436)
>                 Index Cond: (dps.packet_uuid = "outer".packet_uuid)
>                 Filter: ((status & 2) = 0)

One problem you've got is that the planner has no stats about the selectivity of that status condition. My advice would be to forget the cute bitmask and store one or more plain boolean columns. Easier to write queries against and way more tractable statistically.

			regards, tom lane
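In concrete terms, the suggestion might look like this (a sketch; the column name is_deleted is an assumption for illustration, since the thread only shows that bit 2 of status is tested):

    -- replace the bitmask test with a plain boolean column
    ALTER TABLE dpo.packet_status ADD COLUMN is_deleted boolean NOT NULL DEFAULT false;
    UPDATE dpo.packet_status SET is_deleted = ((status & 2) <> 0);
    ANALYZE dpo.packet_status;  -- gather selectivity stats for the new column

    -- the filter ((status & 2) = 0) then becomes:
    --   WHERE NOT is_deleted

With a plain boolean, ANALYZE records the fraction of true/false values, so the planner can estimate how many rows survive the filter instead of guessing.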
[SQL] Count and list only where count is bigger than 1
Hi list,

Is it possible to make a count select and only display rows where the count column is bigger than 1?

My SQL is like this:

    SELECT distinct cli.bairro, COUNT( * ) as qtd
    FROM base.cliente cli
    GROUP BY cli.cidade, cli.bairro
    ORDER BY 2

I noticed that I cannot use "where qtd > 1", OK? What to do?

Regards,
Ezequias Rodrigues da Rocha
http://ezequiasrocha.blogspot.com/
Re: [SQL] Count and list only where count is bigger than 1
Ezequias Rodrigues da Rocha wrote:
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
>
> I noticed that I cannot use "where qtd > 1" ok ?
> What to do ?

Google / read a tutorial on HAVING. E.g.:

    SELECT distinct cli.bairro, COUNT( * ) as qtd
    FROM base.cliente cli
    GROUP BY cli.cidade, cli.bairro
    HAVING COUNT(*) > 1

Zizi
Re: [SQL] Count and list only where count is bigger than 1
> Hi list,
>
> It is possible to make a count select and only display where count column
> is bigger than 1 ?
>
> My SQL is like this
>
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
>
> I noticed that I cannot use "where qtd > 1" ok ?

    SELECT distinct cli.bairro, COUNT( * ) as qtd
    FROM base.cliente cli
    GROUP BY cli.cidade, cli.bairro
    HAVING count(*) > 1;
Re: [SQL] Count and list only where count is bigger than 1
On Tue, 19 Sep 2006, at 12:04:55 -0300, Ezequias Rodrigues da Rocha wrote the following:
> Hi list,
>
> It is possible to make a count select and only display where count column is
> bigger than 1 ?
>
> My SQL is like this
>
> SELECT distinct cli.bairro, COUNT( * ) as qtd
> FROM base.cliente cli
> GROUP BY cli.cidade, cli.bairro
> ORDER BY 2
>
> I noticed that I cannot use "where qtd > 1" ok ?
>
> What to do ?

HAVING.

17:14 < akretschmer> ??having
17:14 < pg_docbot_ads> For information about 'having' see:
17:14 < pg_docbot_ads> http://www.postgresql.org/docs/current/interactive/queries-table-expressions.html#QUERIES-GROUP

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47215, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
Re: [SQL] Nested loops are killing throughput
Tom Lane <[EMAIL PROTECTED]> wrote:
> One problem you've got is that the planner has no stats about the
> selectivity of that status condition. My advice would be to forget the
> cute bitmask and store one or more plain boolean columns. Easier to
> write queries against and way more tractable statistically.

n/p ... I'll ditch it.

Even if I simplify that query down to a straightforward example...

    select dp.*, dps.status
    from dpo.packet dp, dpo.packet_status dps
    where dp.packet_uuid = dps.packet_uuid and dp.username='joeuser';

...I'm still getting the nested loop for a join.

    Nested Loop  (cost=1.00..100013378.98 rows=2206 width=145) (actual time=46.743..18202.318 rows=2225 loops=1)
      ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..88.03 rows=2206 width=125) (actual time=42.263..124.519 rows=2225 loops=1)
            Index Cond: ((username)::text = 'joeuser'::text)
      ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.01 rows=1 width=20) (actual time=8.115..8.117 rows=1 loops=2225)
            Index Cond: ("outer".packet_uuid = dps.packet_uuid)
    Total runtime: 18205.880 ms

These indexes are being used, and the tables are freshly vacuum-analyzed...

    CREATE UNIQUE INDEX packet_puuid_idx ON dpo.packet USING btree (packet_uuid);
    CREATE INDEX packet_user_idx ON dpo.packet USING btree (username);
    CREATE UNIQUE INDEX packet_status_puuid_pkey ON dpo.packet_status USING btree (packet_uuid);

I have no idea why I'm having such a difficult time digging through this data. It should be very straightforward.

CG
Re: [SQL] Nested loops are killing throughput
CG <[EMAIL PROTECTED]> writes:
> I'm still getting the nested loop for a join.
>
> Nested Loop  (cost=1.00..100013378.98 rows=2206 width=145) (actual time=46.743..18202.318 rows=2225 loops=1)
>   ->  Index Scan using packet_user_idx on packet dp  (cost=0.00..88.03 rows=2206 width=125) (actual time=42.263..124.519 rows=2225 loops=1)
>         Index Cond: ((username)::text = 'joeuser'::text)
>   ->  Index Scan using packet_status_puuid_pkey on packet_status dps  (cost=0.00..6.01 rows=1 width=20) (actual time=8.115..8.117 rows=1 loops=2225)
>         Index Cond: ("outer".packet_uuid = dps.packet_uuid)
> Total runtime: 18205.880 ms

If packet_status is large, that seems like a perfectly reasonable plan to me. If not ... what data type is packet_uuid? Is its equality op marked mergeable or hashable?

			regards, tom lane
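One way to check how an operator is marked is to query pg_operator directly (a sketch; the oprcanmerge column exists in PostgreSQL 8.3 and later, while 8.1 and 8.2 expose merge-ability through oprlsortop/oprrsortop instead):

    SELECT oprname, oprcanmerge, oprcanhash
    FROM pg_operator
    WHERE oprname = '='
      AND oprleft = 'uniqueidentifier'::regtype;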
Re: [SQL] Nested loops are killing throughput
> If packet_status is large, that seems like a perfectly reasonable plan
> to me. If not ... what data type is packet_uuid? Is its equality op
> marked mergeable or hashable?

It is of type uniqueidentifier ...

    CREATE OPERATOR = (
        PROCEDURE = uniqueidentifier_eq,
        LEFTARG = uniqueidentifier,
        RIGHTARG = uniqueidentifier,
        COMMUTATOR = =,
        RESTRICT = eqsel,
        JOIN = eqjoinsel,
        HASHES,
        MERGES,
        SORT1 = <,
        SORT2 = <,
        LTCMP = <,
        GTCMP = >);

It looks like it is marked both mergeable and hashable.

So what size is "large"? (I have no idea what units to use to think about table size... kilobytes? pages? rows?) It seems like we recently stepped over that imaginary line between medium and large.

We could re-tune PostgreSQL... What settings would make sense to tweak? Sort memory?

We could partition the tables... I understand the theory behind table partitioning and partial indexes, but I've never come across a howto or even a best-practices list.

We could remove data from the table... I don't like this as much, but if that's the only way, that's the only way.
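For the tuning question, the "sort memory" setting is called work_mem as of 8.0 (sort_mem before that). For the partitioning question, here is a bare-bones sketch of the 8.1-era approach, which uses inheritance plus CHECK constraints; the monthly split on trans_date is an assumption for illustration:

    -- the parent stays empty; each child carries a non-overlapping CHECK constraint
    CREATE TABLE dpo.packet_2006_09 (
        CHECK (trans_date >= DATE '2006-09-01' AND trans_date < DATE '2006-10-01')
    ) INHERITS (dpo.packet);

    CREATE INDEX packet_2006_09_user_idx ON dpo.packet_2006_09 (username);

    -- let the planner skip partitions whose CHECK contradicts the WHERE clause
    SET constraint_exclusion = on;

New rows then have to be routed to the right child (via application logic or rules), and existing rows moved over; queries filtering on trans_date only touch the relevant partitions.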
[SQL] Make Trigger run after completion of ENTIRE transaction
This may be more of a theoretical question: can you actually make a trigger run after completion of the entire transaction? Here's what I have:

    LOG
    user | startdate | enddate

enddate is getting updated by a trigger (on insert or update). I have the following transaction:

    BEGIN;
    Insert into LOG(user,startdate) values('jdoe','2006-08-13');
    Insert into LOG(user,startdate) values('jdoe','2006-08-14');
    Insert into LOG(user,startdate) values('jdoe','2006-08-15');
    Insert into LOG(user,startdate) values('jdoe','2006-08-16');
    Insert into LOG(user,startdate) values('jdoe','2006-08-17');
    ... another 20-30 more inserts..
    COMMIT;

The trigger actually runs on each insert and therefore slows the inserts down quite a bit. My question is: can you tell the trigger to run after COMMIT?
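One built-in mechanism that comes close is a deferred constraint trigger (a sketch, not from the thread; set_enddate() stands in for whatever trigger function is already in use). It still fires once per row, but all the firings are postponed until COMMIT:

    CREATE CONSTRAINT TRIGGER log_enddate_trg
        AFTER INSERT OR UPDATE ON log
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW
        EXECUTE PROCEDURE set_enddate();

Note that a deferred trigger is an AFTER trigger, so it cannot modify NEW in place; it would have to issue its own UPDATE (and guard against re-firing itself). There is no true "run once after COMMIT" trigger, so if per-row overhead is the problem, one set-based UPDATE at the end of the transaction is the usual workaround.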
Re: [SQL] Nested loops are killing throughput
CG <[EMAIL PROTECTED]> writes:
>> If packet_status is large, that seems like a perfectly reasonable plan
>> to me. If not ... what data type is packet_uuid? Is its equality op
>> marked mergeable or hashable?

> It is of type uniqueidentifier ...

Is that the one off gborg? It's broken because the equality function is marked volatile, and so the planner is afraid to try to use it for merging or hashing. (It's also not marked strict, which means you can trivially crash the backend by passing it a null ...)

			regards, tom lane
Re: [SQL] Nested loops are killing throughput
> Is that the one off gborg? It's broken because the equality function is
> marked volatile, and so the planner is afraid to try to use it for
> merging or hashing. (It's also not marked strict, which means you can
> trivially crash the backend by passing it a null ...)

It is the one from gborg. That's good to know. So to fix it, I should alter the supporting functions: set the ones that should not be VOLATILE to IMMUTABLE (newid() should stay VOLATILE, and that's about it) and mark them STRICT as well.

I haven't had a problem with it crashing my backend in the three years we've been using it, but I'd rather not take any chances.

I know the hackers list has been batting around the idea of a built-in UUID type. I'll be interested to see if you end up with a standard UUID type, a fixed field type with multiple generators for differing levels of paranoia, or the status quo.

CG
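A sketch of that fix (the function name comes from the CREATE OPERATOR definition earlier in the thread; recent releases let ALTER FUNCTION change volatility and strictness, and where that is not supported, CREATE OR REPLACE FUNCTION with the corrected flags achieves the same):

    ALTER FUNCTION uniqueidentifier_eq(uniqueidentifier, uniqueidentifier)
        IMMUTABLE STRICT;

    -- the comparison support functions (<, <=, >, >=, <>) need the same
    -- treatment; only the generator, e.g. newid(), should remain VOLATILE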
[SQL] Need a SQL to create sets of hobbies
Hi!

    CREATE TABLE x1 (name TEXT, hobby TEXT, PRIMARY KEY(name,hobby));
    INSERT INTO x1 VALUES('John','music');
    INSERT INTO x1 VALUES('John','arts');
    INSERT INTO x1 VALUES('Bob','arts');
    INSERT INTO x1 VALUES('Bob','music');
    INSERT INTO x1 VALUES('Rocky','computer');
    INSERT INTO x1 VALUES('Steve','arts');
    INSERT INTO x1 VALUES('Steve','football');
    INSERT INTO x1 VALUES('Tom','computer');
    INSERT INTO x1 VALUES('Tom','music');

    select * from x1;
     name  |  hobby
    -------+----------
     John  | music
     John  | arts
     Bob   | arts
     Bob   | music
     Rocky | computer
     Steve | arts
     Steve | football
     Tom   | computer
     Tom   | music
    (9 rows)

John and Bob have the same hobbies, music and arts, so music and arts are treated as one set of hobbies. Rocky has a unique set of interests: computer. Steve also likes arts, just as John and Bob do, but he also has an exclusive interest, football; thus his set of hobbies, arts and football, is unique. Tom shares music with John and Bob, and computer with Rocky, but nobody has exactly the combination of computer and music, so his hobbies form a new set.

Now we have 4 sets of hobbies:

    set 1: music, arts
    set 2: computer
    set 3: arts, football
    set 4: computer, music

I am looking for an SQL statement that creates the sets of hobbies in table x2 by selecting from table x1:

    CREATE TABLE x2 (sid INTEGER, hobby TEXT, PRIMARY KEY(sid,hobby));

and makes x2 contain these rows, where sid starts from 1:

     sid |  hobby
    -----+----------
       1 | music
       1 | arts
       2 | computer
       3 | arts
       3 | football
       4 | computer
       4 | music

Thank you in advance!

CN
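One way to build x2 is to collapse each person's hobbies into a sorted array, de-duplicate the arrays, and number them (a sketch assuming PostgreSQL 8.4 or later for unnest() and row_number(), which postdates this thread; the sid numbering is arbitrary as long as it starts from 1):

    INSERT INTO x2 (sid, hobby)
    SELECT s.sid, unnest(s.hobbies) AS hobby
    FROM (
        SELECT d.hobbies,
               row_number() OVER (ORDER BY d.hobbies) AS sid
        FROM (
            -- one sorted hobby array per person, de-duplicated across people
            SELECT DISTINCT
                   ARRAY(SELECT hobby FROM x1 i
                         WHERE i.name = o.name
                         ORDER BY hobby) AS hobbies
            FROM x1 o
        ) d
    ) s;

With the sample data this yields exactly four sets, since John's and Bob's identical arrays collapse into one under DISTINCT.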