[SQL] using LIMIT only on primary table
This is an extension of the problem solved by
http://archives.postgresql.org/pgsql-sql/2002-03/msg00020.php but with a
slightly different complication.

I want to get the last 100 port commits from the database. Commits are
stored in commit_log, and commit_log_ports relates commits to ports. A
given commit may affect more than one port (i.e. there is a 1-N
relationship between commit_log and commit_log_ports).

This gives me the last 100 commits:

  SELECT commit_log.*
  FROM commit_log
  ORDER BY commit_date DESC, id
  LIMIT 100;

So a starting point for the last 100 port commits is:

  explain analyze
  SELECT distinct commit_log.*
  FROM commit_log_ports, commit_log
  WHERE commit_log.id = commit_log_ports.commit_log_id
  ORDER BY commit_log.commit_date DESC, commit_log_ports.commit_log_id
  LIMIT 100;

But it has a very high cost:

  Limit  (cost=11275.92..11283.42 rows=100 width=55) (actual time=5769.07..5771.92 rows=100 loops=1)
    ->  Unique  (cost=11275.92..11643.73 rows=4904 width=55) (actual time=5769.05..5770.93 rows=101 loops=1)
          ->  Sort  (cost=11275.92..11275.92 rows=49042 width=55) (actual time=5769.04..5769.68 rows=112 loops=1)
                ->  Hash Join  (cost=3478.15..6387.22 rows=49042 width=55) (actual time=1263.69..4319.53 rows=49042 loops=1)
                      ->  Seq Scan on commit_log_ports  (cost=0.00..825.42 rows=49042 width=4) (actual time=0.12..346.32 rows=49042 loops=1)
                      ->  Hash  (cost=1749.51..1749.51 rows=42951 width=51) (actual time=1074.15..1074.15 rows=0 loops=1)
                            ->  Seq Scan on commit_log  (cost=0.00..1749.51 rows=42951 width=51) (actual time=0.14..396.99 rows=42953 loops=1)
  Total runtime: 6158.10 msec

I'd like to get that time down. I suspect the cost is high because the
entire table is joined before the LIMIT is applied.
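One direction worth trying (a sketch against the schema described above, untested): replace the DISTINCT join with an EXISTS subquery, so the LIMIT applies to commit_log alone and the join never produces duplicate rows that need de-duplicating:

```sql
-- sketch, untested: the DISTINCT join rewritten as EXISTS, so the
-- LIMIT applies to commit_log alone
SELECT commit_log.*
FROM commit_log
WHERE EXISTS (
    SELECT 1 FROM commit_log_ports
    WHERE commit_log_ports.commit_log_id = commit_log.id
)
ORDER BY commit_log.commit_date DESC, commit_log.id
LIMIT 100;
```

Whether the planner then walks an index on commit_date and stops after 100 matches, rather than scanning everything, is for EXPLAIN ANALYZE to confirm.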
When I add "commit_log.id = 1" to the WHERE clause, the query plan
changes to:

  Limit  (cost=30.42..30.43 rows=1 width=55) (actual time=0.96..0.99 rows=1 loops=1)
    ->  Unique  (cost=30.42..30.43 rows=1 width=55) (actual time=0.95..0.97 rows=1 loops=1)
          ->  Sort  (cost=30.42..30.42 rows=1 width=55) (actual time=0.94..0.94 rows=1 loops=1)
                ->  Nested Loop  (cost=0.00..30.41 rows=1 width=55) (actual time=0.61..0.64 rows=1 loops=1)
                      ->  Index Scan using commit_log_pkey on commit_log  (cost=0.00..5.93 rows=1 width=51) (actual time=0.31..0.32 rows=1 loops=1)
                      ->  Index Scan using commit_log_ports_cli on commit_log_ports  (cost=0.00..24.41 rows=6 width=4) (actual time=0.25..0.27 rows=1 loops=1)

I think that confirms that my indexes and PK/FK are set correctly.

I could do something like:

  SELECT distinct commit_log_id
  FROM commit_log_ports
  ORDER BY commit_log_id desc
  LIMIT 100;

But that doesn't take into consideration the fact that commits can be
added out of order and are not necessarily added in commit_date order.

Any clues please? thanks
--
Dan Langille
And yes, I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] How to update record in a specified order
On 9 Aug 2002 at 14:37, Josh Berkus wrote:

> JLL,
>
> > I want to update a field with a 'NEXTVAL', but I want the record
> > updated in a specific order.
> > Any simple way of doing this other than having to create a temp table?
>
> Please be more specific. What do you mean, "specified order"?

My reading of what is required:

- sort a given result set according to some criteria
- then do something like UPDATE table SET myfield = NEXTVAL
- each row in the result set would get a value one more than the
  previous row
--
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php
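A sketch of one way to do that without a permanent temp table (all names here are invented for illustration: table t(id, myfield, ord) and sequence myseq): loop over the rows in the desired order inside PL/pgSQL and update them one at a time, so the nextval() assignments follow the sort order:

```sql
-- invented names throughout: table t(id, myfield, ord), sequence myseq
create or replace function number_in_order() returns integer as '
DECLARE
    r record;
BEGIN
    -- rows are visited in ord order, so nextval() increases with ord
    FOR r IN SELECT id FROM t ORDER BY ord LOOP
        UPDATE t SET myfield = nextval(''myseq'') WHERE id = r.id;
    END LOOP;
    RETURN 0;
END;' language 'plpgsql';

select number_in_order();
```

This is slower than a single UPDATE, but a bare UPDATE gives no guarantee about the order in which rows are visited, which is the whole point here.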
Re: [SQL] bulk imports with sequence
On 20 Aug 2002 at 7:55, Aaron Held wrote:

> I am importing a large number of records monthly using the \copy
> (from text file) command.
>
> I would like to use a sequence as a unique row ID for display in my
> app.
>
> Is there any way to let postgresql generate the sequence itself?
> Currently the only way I can make it work is to grab the next seq
> value and insert my own numbers into the file.

Yes:

  create sequence mytable_id_seq;
  alter table mytable alter column id
      set default nextval('mytable_id_seq'::text);
--
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php
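With that default in place, the import can leave the id column out entirely so it is generated on load. A sketch (column names invented; naming columns in COPY requires a newer release, 7.3 or later if memory serves):

```sql
-- invented column names; COPY column lists need PostgreSQL 7.3+
\copy mytable (name, city) from 'data.txt'
```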
Re: [SQL] Case Sensitive "WHERE" Clauses?
On Thu, 26 Sep 2002, Andrew Perrin wrote:

> No, I don't think it's supposed to be case-sensitive. In any case,
> whether it's supposed to be or not, it certainly isn't in practice.

AFAIK, they are case sensitive by design. It is the right thing to do.
Re: [SQL] Case Sensitive "WHERE" Clauses?
On Thu, 26 Sep 2002, Jordan Reiter wrote:

> Are string comparisons in postgresql case sensitive?

Yes, AFAIK. I disagree with your comments and recommendations posted
at http://www.postgresql.org/idocs/index.php?datatype-character.html
because my testing shows that variable-length text and fixed-length
character comparisons are both case sensitive.

  testing=# \d casetest
        Table "casetest"
   Column |     Type      | Modifiers
  --------+---------------+-----------
   name   | text          |
   city   | character(10) |

  testing=# select * from casetest;
   name |    city
  ------+------------
   Dan  | Ottawa
  (1 row)

  testing=# select * from casetest where name = 'Dan';
   name
  ------
   Dan
  (1 row)

  testing=# select * from casetest where name = 'dan';
   name
  ------
  (0 rows)

  testing=# select * from casetest where city = 'ottawa';
   name | city
  ------+------
  (0 rows)
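Since the comparisons are case sensitive, case-insensitive matching has to be asked for explicitly. Two common ways, using the casetest table above:

```sql
-- fold the column to lower case before comparing
select * from casetest where lower(name) = 'dan';

-- or use ILIKE, the case-insensitive LIKE variant
select * from casetest where name ilike 'dan';
```

Note that lower(name) defeats a plain index on name; that matters once the table grows.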
Re: [SQL] [GENERAL] arrays
On 30 Sep 2002 at 8:54, Josh Berkus wrote:

> As such, I'd recommend one of two approaches for you:
>
> 1) Post some of your schema ideas here, and let us show you how they
> are better done relationally. The relational data model has 30 years
> of thought behind it -- it can solve a lot of problems.

Mike,

Just in case you or others think Josh is some crazed lunatic[1] who
doesn't know what he's talking about, I support his views on this
topic. Avoid arrays. Normalize your data.

[1] - Actually, I don't think I know anything about Josh, except that
he's right about normalizing your data.
--
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php
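For the record, the normalized alternative to an array column looks like this (a generic sketch with invented names): the repeating values move into a child table with a foreign key back to the parent:

```sql
-- invented example: instead of person(phones text[]) ...
create table person (
    id   integer primary key,
    name text not null
);

-- ... each repeated value becomes its own row
create table person_phone (
    person_id integer not null references person (id),
    phone     text not null
);
```

Each phone is then a plain row you can index, join, and constrain, which is exactly what the array representation gives up.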
Re: [SQL] [GENERAL] arrays
On 30 Sep 2002 at 12:09, Bruce Momjian wrote:

> Dan Langille wrote:
> > Just in case you or others think Josh is some crazed lunatic[1] who
> > doesn't know what he's talking about, I support his views on this
> > topic. Avoid arrays. Normalize your data.
> >
> > [1] - Actually, I don't think I know anything about Josh, except
> > that he's right about normalizing your data.
>
> Yes, arrays have a very small window of usefulness, but the window
> does exist, so we haven't removed them.

I do not advocate removing them. I do advocate data normalization.
Let's say it's a matter of Do The Right Thing(tm) unless you know what
you're doing.
--
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php
Re: [SQL] Suggestion: md5/crypt functions in sql
On Sun, 6 Oct 2002, Joe Conway wrote:

> Aasmund Midttun Godal wrote:
> > It would be very useful to have these in sql, so that it is even
> > easier to create tables with encrypted passwords.
>
> See contrib/pgcrypto

See also http://www.freebsddiary.org/postgresql-7.2.php which shows
how I installed the above.
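A quick sketch of what contrib/pgcrypto gives you once installed (the users table here is invented for the example): crypt() with gen_salt() both sets and checks a password:

```sql
-- invented table; crypt() and gen_salt() come from contrib/pgcrypto
insert into users (name, password)
    values ('dan', crypt('secret', gen_salt('md5')));

-- checking: re-crypt the candidate using the stored value as the salt
select name from users
    where name = 'dan' and password = crypt('secret', password);
```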
Re: [SQL] Need some SQL help
On Sun, 6 Oct 2002, Brian Ward wrote:

> I have a table
>   id        int
>   status    int
>   create_dt datetime
>
> When users log in and out I insert a row into this table throughout
> the day. I'm trying though to come up with a query to tell me an
> ordering of users as they report in in the morning. So I'm trying to
> select each user and the earliest time they logged in on a given day,
> then order that by the create_dt column.
>
> Having trouble writing the SQL though.

select * from table order by id, create_dt

If you want this information only for a specific date, then add in a
where clause:

  where create_dt > 'some date'
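If the goal is strictly "each user and the earliest time they logged in on a given day, ordered by that time", an aggregate gets it directly. A sketch (the table name is invented, since the original message didn't give one):

```sql
-- invented table name; earliest login per user for one day,
-- ordered by who arrived first
select id, min(create_dt) as first_login
from logins
where create_dt >= '2002-10-06' and create_dt < '2002-10-07'
group by id
order by first_login;
```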
Re: [SQL] timestamp
On Sun, 6 Oct 2002, Brian Ward wrote:

> How do I create a timestamp column in a table that defaults to the
> current time when a row is inserted? I tried putting in now() as the
> default but I have something wrong with the syntax or something, I
> think. Anyone have an example of a table creation script that has a
> timestamp column?

When you have an example which does not do what you want, it is better
to provide that so we can see how to correct it. I am using:

  firstlogin timestamp default current_timestamp
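A complete table creation script of the kind asked for, built around that column (table and other column names are just for illustration):

```sql
-- illustration only: firstlogin is stamped automatically on insert
create table logins (
    id         integer,
    firstlogin timestamp default current_timestamp
);

insert into logins (id) values (1);
select * from logins;   -- firstlogin is filled in by the default
```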
Re: [SQL] timestamp
On Sun, 6 Oct 2002, Brian Ward wrote:

And Brian, since this *is* a mailing list, it would be polite to use a
valid email address. I suggest that you either unsubscribe or fix the
email address. Mail to [EMAIL PROTECTED] bounces with the message:
'550 Invalid recipient: <[EMAIL PROTECTED]>'.
[SQL] using deferred on PK/FK relationships
Can deferrable etc. be used when deleting primary key records (master
table), then reinserting them, without losing foreign key records
(slave table)? I ask because in our testing we can't; we lose the
foreign key records in the slave table. I'm guessing we are trying to
abuse the feature.

Here's a test script we tried:

  drop table master;
  CREATE TABLE master (
      id integer NOT NULL,
      Primary Key (id)
  );
  insert into master values (1);
  insert into master values (2);

  drop table slave;
  create table slave (
      id int,
      foreign key (id) references master (id)
          on update restrict
          on delete cascade
          INITIALLY DEFERRED
  );
  insert into slave values (1);
  insert into slave values (1);

Then:

  test=# BEGIN;
  BEGIN
  test=# SET CONSTRAINTS ALL DEFERRED;
  SET CONSTRAINTS
  test=# delete from master;
  DELETE 2
  test=# insert into master values (1);
  INSERT 20959595 1
  test=# insert into master values (2);
  INSERT 20959596 1
  test=# select * from slave;
   id
  ----
    1
    1
  (2 rows)

  test=# commit;
  COMMIT
  test=# select * from slave;
   id
  ----
  (0 rows)

Our hope was that after the commit, slave would retain the original
rows.

cheers
--
Dan Langille
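An editor's note on one likely culprit, to my understanding (not verified against this exact script): referential actions such as ON DELETE CASCADE fire at the time of the delete, and only the default NO ACTION check honours SET CONSTRAINTS ... DEFERRED. A sketch of the variant worth trying with the master/slave script above:

```sql
-- sketch: no cascade action, just a deferrable check; within one
-- transaction the master rows can be deleted and reinserted before
-- the slave rows are ever checked
create table slave (
    id int,
    foreign key (id) references master (id)
        deferrable initially deferred
);
```

The trade-off is that a delete from master without a matching reinsert will now error at commit instead of cleaning up slave.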
[SQL] Trees: maintaining pathnames
My existing tree implementation reflects the files contained on disk.
The full pathname to a particular file is obtained from the path to the
parent directory. I am now considering putting this information into a
field in the table.

Attached you will find the pg_dump from my test database (2.4k) if you
want to test with this setup and in case what I have pasted below
contains an error.

Here is the table and the test data:

  create table tree(id int not null, parent_id int, name text not null,
      pathname text not null, primary key (id));

  insert into tree (id, name, pathname) values (1, 'usr', '/usr');
  insert into tree (id, name, parent_id, pathname) values (2, 'ports', 1, '/usr/ports');
  insert into tree values (3, 2, 'security', 'test');

  select * from tree;

  test=# select * from tree;
   id | parent_id |   name   |      pathname
  ----+-----------+----------+---------------------
    1 |           | usr      | /usr
    2 |         1 | ports    | /usr/ports
    3 |         2 | security | /usr/ports/security
  (3 rows)

The goal is to ensure that pathname always contains the correct value.
Here are the functions/triggers which I created in order to attain
that goal.

This function ensures that the pathname is set correctly when a row is
inserted or changed:

  create or replace function tree_pathname_set() returns opaque as '
  DECLARE
      parent_pathname text;
  BEGIN
      RAISE NOTICE \'into tree_pathname_set with %:%:%\', new.id, new.name, new.pathname;
      select pathname into parent_pathname from tree where id = new.parent_id;
      if found then
          new.pathname = parent_pathname || \'/\' || new.name;
      else
          new.pathname = \'/\' || new.name;
      end if;
      RETURN new;
  END;' language 'plpgsql';

  create trigger tree_pathname_set before insert or update on tree
      for each row execute procedure tree_pathname_set();

This function ensures that any children of a recently modified row are
also kept up to date.
  create or replace function tree_pathname_set_children() returns opaque as '
  BEGIN
      RAISE NOTICE \'into tree_pathname_set_children with %:%:%\', new.id, new.name, new.pathname;
      update tree set pathname = new.pathname || \'/\' || name where parent_id = new.id;
      RETURN new;
  END;' language 'plpgsql';

  create trigger tree_pathname_set_children after insert or update on tree
      for each row execute procedure tree_pathname_set_children();

NOTE: the above is "insert or update" but as I typed this I realize
that update alone is sufficient.

A change to the top level row is shown below:

  test=# update tree set name = 'dan' where id = 1;
  NOTICE:  into tree_pathname_set with 1:dan:/usr
  NOTICE:  into tree_pathname_set_children with 1:dan:/dan
  NOTICE:  into tree_pathname_set with 2:ports:/dan/ports
  NOTICE:  into tree_pathname_set_children with 2:ports:/dan/ports
  NOTICE:  into tree_pathname_set with 3:security:/dan/ports/security
  NOTICE:  into tree_pathname_set_children with 3:security:/dan/ports/security
  UPDATE 1
  test=# select * from tree;
   id | parent_id |   name   |      pathname
  ----+-----------+----------+---------------------
    1 |           | dan      | /dan
    2 |         1 | ports    | /dan/ports
    3 |         2 | security | /dan/ports/security
  (3 rows)

Suggestions, comments, open ridicule, most welcome. thanks.
--
-- Selected TOC Entries:
--
\connect - pgsql

--
-- TOC Entry ID 3 (OID 15830772)
--
-- Name: "plpgsql_call_handler" () Type: FUNCTION Owner: pgsql
--
CREATE FUNCTION "plpgsql_call_handler" () RETURNS opaque
    AS '$libdir/plpgsql', 'plpgsql_call_handler' LANGUAGE 'C';

--
-- TOC Entry ID 4 (OID 15830773)
--
-- Name: plpgsql Type: PROCEDURAL LANGUAGE Owner:
--
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql'
    HANDLER "plpgsql_call_handler" LANCOMPILER '';

\connect - dan

--
-- TOC Entry ID 6 (OID 15830774)
--
-- Name: "tree_pathname_set" () Type: FUNCTION Owner: dan
--
CREATE FUNCTION "tree_pathname_set" () RETURNS opaque AS '
DECLARE
    parent_pathname text;
BEGIN
    RAISE NOTICE ''into tree_pathname_set with %:%:%'', new.id, new.name, new.pathname;
    select pathname into parent_pathname from tree where id = new.parent_id;
    if found then
        new.pathname = parent_pathname || ''/'' || new.name;
    else
        new.pathname = ''/'' || new.name;
    end if;
    RETURN new;
END;' LANGUAGE 'plpgsql';

--
-- TOC Entry ID 2 (OID 15832154)
--
-- Name: tree Type: TABLE Owner: dan
--
CREATE TABLE "tree" (
    "id" integer NOT NULL,
    "parent_id" integer,
    "name" text NOT NULL,
    "pathname" text NOT NULL,
    Constraint "tree_pkey" Primary Key ("id")
);

--
-- TOC Entry ID 5 (OID 15834571)
--
-- Name: "tree_pathname_set_children" () Type: FUNCTION Owner: dan
--
CREATE FUNCTION "tree_pathname_set_children" () RETURNS opaque AS 'BEGIN RAISE
Re: [SQL] Trees: maintaining pathnames
On 18 Nov 2002 at 1:09, [EMAIL PROTECTED] wrote:

> Instead of storing the path in each row, why not let Postgres
> take care of computing it with a function? Then make a view
> and you've got the same table, without all the triggers.

This is how it is now done. I wanted to be able to do this fairly
quickly:

  select * from tree where pathname like '/usr/local/%'

in order to get the subtree below a given point. Sorry I didn't
mention that before.

> CREATE TABLE tree (
>     id        INTEGER NOT NULL,
>     parent_id INTEGER,
>     "name"    TEXT NOT NULL,
>     PRIMARY KEY (id)
> );
>
> INSERT INTO tree VALUES (1,NULL,'');
> INSERT INTO tree VALUES (2,1,'usr');
> INSERT INTO tree VALUES (3,1,'tmp');
> INSERT INTO tree VALUES (4,1,'home');
> INSERT INTO tree VALUES (5,4,'greg');
> INSERT INTO tree VALUES (6,5,'etc');
>
> CREATE OR REPLACE FUNCTION pathname(INTEGER)
> RETURNS TEXT AS
> '
> DECLARE
>     mypath TEXT;
>     myname TEXT;
>     myid   INTEGER;
>
> BEGIN
>
>     SELECT parent_id,name FROM tree WHERE id=$1 INTO myid,mypath;
>     IF mypath IS NULL THEN
>         RETURN ''No such id\n'';
>     END IF;
>
>     LOOP
>         SELECT parent_id,name FROM tree WHERE id=myid INTO myid,myname;
>         mypath := ''/'' || mypath;
>         EXIT WHEN myid IS NULL;
>         mypath := myname || mypath;
>     END LOOP;
>
>     RETURN mypath;
>
> END;
> ' LANGUAGE 'plpgsql';
>
> CREATE VIEW mytree AS SELECT *, PATHNAME(id) AS path FROM tree;
>
> SELECT * FROM tree ORDER BY id;
>
>  id | parent_id | name
> ----+-----------+------
>   1 |           |
>   2 |         1 | usr
>   3 |         1 | tmp
>   4 |         1 | home
>   5 |         4 | greg
>   6 |         5 | etc
> (6 rows)
>
> SELECT * FROM mytree ORDER BY id;
>
>  id | parent_id | name | path
> ----+-----------+------+----------------
>   1 |           |      | /
>   2 |         1 | usr  | /usr
>   3 |         1 | tmp  | /tmp
>   4 |         1 | home | /home
>   5 |         4 | greg | /home/greg
>   6 |         5 | etc  | /home/greg/etc
> (6 rows)
>
> UPDATE tree SET name='users' WHERE id=4;
>
> SELECT * FROM mytree ORDER BY id;
>
>  id | parent_id | name  | path
> ----+-----------+-------+-----------------
>   1 |           |       | /
>   2 |         1 | usr   | /usr
>   3 |         1 | tmp   | /tmp
>   4 |         1 | users | /users
>   5 |         4 | greg  | /users/greg
>   6 |         5 | etc   | /users/greg/etc
> (6 rows)

That's good. Thank you.
--
Dan Langille : http://www.langille.org/
Re: [SQL] Trees: maintaining pathnames
On 17 Nov 2002 at 14:51, Josh Berkus wrote:

> Dan,
>
> > My existing tree implementation reflects the files contained on
> > disk. The full pathname to a particular file is obtained from the
> > path to the parent directory. I am now considering putting this
> > information into a field in the table.
> >
> > Suggestions, comment, open ridicule, most welcome. thanks.
>
> This is a fine implementation using the adjacency list model of tree
> design. However, I think you may find that the string-based tree
> implementation in /contrib/ltree is more suited to your purposes, and
> easier to maintain.

That looks interesting. I have installed that onto a test server and
I'm playing around with it.[1] The contrib/ltree project implements a
tree via text parsing. Below I show the test data it created.

For my usage, I'm not sure I need it. I have implemented the
"Adjacency List" tree design (that's what I've been told it is
called). In short, my tree contains three basic fields: id, name,
parent_id. Given that I'm considering adding a new field path_name to
the tree, I can't see that the ltree package will give me anything
more than I can get from LIKE. My main reason for adding path_name
was doing queries such as:

  select * from tree where path_name like '/path/to/parent/%'

which will return me all the descendants of a given node (in this case
'/path/to/parent/').[2]

I have discussed [offlist] the option of using a secondary table to
store the pathname (i.e. a cache table) which would be updated using a
loop in the trigger instead of using cascading triggers. I would
prefer to keep the pathname in the same table.

In my application, I have about 120,000 nodes in the tree. I am using
PL/pgSQL quite a lot. Perhaps moving the triggers to C at a later date
may provide a speed increase if the tree expands considerably. Also,
it is noted that those triggers set the pathname twice, once in the
before trigger and once in the after trigger. I'll try to optimize
that for a future "release".
  ltreetest=# \d
          List of relations
   Name | Type  | Owner
  ------+-------+-------
   test | table | dan
  (1 row)

  ltreetest=# select * from test;
                       path
  -----------------------------------------------
   Top
   Top.Science
   Top.Science.Astronomy
   Top.Science.Astronomy.Astrophysics
   Top.Science.Astronomy.Cosmology
   Top.Hobbies
   Top.Hobbies.Amateurs_Astronomy
   Top.Collections
   Top.Collections.Pictures
   Top.Collections.Pictures.Astronomy
   Top.Collections.Pictures.Astronomy.Stars
   Top.Collections.Pictures.Astronomy.Galaxies
   Top.Collections.Pictures.Astronomy.Astronauts
  (13 rows)

[1] - For others following along, I had to do the following:
  - downloaded the 7.2 version of the code from
    http://www.sai.msu.su/~megera/postgres/gist/ltree/
  - installed using gmake not make
  - grabbed the sample file from
    http://developer.postgresql.org/cvsweb.cgi/pgsql-server/contrib/ltree/ltreetest.sql

[2] - My application involves mirroring a file system (directories and
files). FWIW, in this instance, files are not renamed; they are
deleted and recreated elsewhere.
--
Dan Langille : http://www.langille.org/
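For comparison, the subtree query that the pathname LIKE was written for has a direct ltree spelling against the test table above (operators as documented in the contrib/ltree README):

```sql
-- descendants of Top.Science, ltree style (<@ means "is a descendant
-- of, or equal to, the right-hand path")
select path from test where path <@ 'Top.Science';

-- the same expressed as an lquery pattern
select path from test where path ~ 'Top.Science.*';
```

Both can use ltree's GiST index support, which is the main thing LIKE on a pathname column cannot match.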
[SQL] connectby questions
I just installed 7.3rc1 and added contrib/tablefunc. I am able to get
the example in the README document to work. I am having trouble
understanding how to get my real data to behave.

The table is:

  freshports=# \d element
                                     Table "public.element"
         Column        |     Type     |                    Modifiers
  ---------------------+--------------+--------------------------------------------------
   id                  | integer      | not null default nextval('element_id_seq'::text)
   name                | text         | not null
   parent_id           | integer      |
   directory_file_flag | character(1) | not null
   status              | character(1) | not null

I have been able to get simple examples to work:

  freshports=# select id, parent_id from connectby('element', 'id', 'parent_id', '104503', 0)
               as t(id int, parent_id int, level int);
     id   | parent_id
  --------+-----------
   104503 |
   104504 |    104503
   104505 |    104503
  (3 rows)

Why does level not appear here? I see a similar problem with this
query:

  freshports=# select id, parent_id from connectby('element', 'id', 'parent_id', '104503', 0, '/')
               as t(id int, parent_id int, level int, branch text);
     id   | parent_id
  --------+-----------
   104503 |
   104504 |    104503
   104505 |    104503
  (3 rows)

Here is the actual data for the above nodes:

  freshports=# select * from element where id in (104503, 104504, 104505);
     id   |     name     | parent_id | directory_file_flag | status
  --------+--------------+-----------+---------------------+--------
   104503 | multimedia   |     77344 | D                   | A
   104504 | Makefile     |    104503 | F                   | A
   104505 | chapter.sgml |    104503 | F                   | A
  (3 rows)

What I would like to include in the output is all of the above fields.
But I can't seem to get that to work:

  freshports=# select id, parent_id, name from connectby('element', 'id', 'parent_id', '104503', 0, '/')
               as t(id int, parent_id int, level int, branch text, name text);
  ERROR:  Query-specified return tuple not valid for Connectby: wrong number of columns

I was able to do this with a view:

  freshports=# create view simple_element as select id, parent_id from element;
  CREATE VIEW
  freshports=# select * from connectby('simple_element', 'id', 'parent_id', '104503', 0, '/')
               as t(id int, parent_id int, level int, branch text);
     id   | parent_id | level |    branch
  --------+-----------+-------+---------------
   104503 |           |     0 | 104503
   104504 |    104503 |     1 | 104503/104504
   104505 |    104503 |     1 | 104503/104505
  (3 rows)

This is expected, given what I see in the README. But there doesn't
seem to be any way to get the name field out:

  freshports=# drop view simple_element;
  DROP VIEW
  freshports=# create view simple_element as select id, parent_id, name from element;
  CREATE VIEW
  freshports=# select * from connectby('simple_element', 'id', 'parent_id', '104503', 0, '/')
               as t(id int, parent_id int, level int, branch text);
     id   | parent_id | level |    branch
  --------+-----------+-------+---------------
   104503 |           |     0 | 104503
   104504 |    104503 |     1 | 104503/104504
   104505 |    104503 |     1 | 104503/104505
  (3 rows)

  freshports=# select * from connectby('simple_element', 'id', 'parent_id', '104503', 0, '/')
               as t(id int, parent_id int, level int, branch text, name text);
  ERROR:  Query-specified return tuple not valid for Connectby: wrong number of columns
  freshports=#

I hope it's just that it's late and I'm missing something. Cheers.
Re: [SQL] connectby questions
On Fri, 22 Nov 2002, Dan Langille wrote:

> Here is the actual data for the above nodes:
>
>   freshports=# select * from element where id in (104503, 104504, 104505);
>      id   |     name     | parent_id | directory_file_flag | status
>   --------+--------------+-----------+---------------------+--------
>    104503 | multimedia   |     77344 | D                   | A
>    104504 | Makefile     |    104503 | F                   | A
>    104505 | chapter.sgml |    104503 | F                   | A
>   (3 rows)
>
> What I would like to include in the output is all of the above fields.

DOH! Use a join, stupid!

  freshports=# select t.*, tt.name
  freshports-# from connectby('simple_element', 'id', 'parent_id', '104503', 0, '/')
  freshports-#     as t(id int, parent_id int, level int, branch text), element tt
  freshports-# where t.id = tt.id;
     id   | parent_id | level |    branch     |     name
  --------+-----------+-------+---------------+--------------
   104503 |           |     0 | 104503        | multimedia
   104504 |    104503 |     1 | 104503/104504 | Makefile
   104505 |    104503 |     1 | 104503/104505 | chapter.sgml
  (3 rows)

Ok, that works. But I have two issues:

1 - speed: that join takes 7 seconds:

  freshports=# explain analyze
  freshports-# select t.*, tt.name
  freshports-# from connectby('simple_element', 'id', 'parent_id', '104503', 0, '/')
  freshports-#     as t(id int, parent_id int, level int, branch text), element tt
  freshports-# where t.id = tt.id;
                                 QUERY PLAN
  ------------------------------------------------------------------------------
   Merge Join  (cost=62.33..3050.43 rows=1000 width=60) (actual time=7420.23..7421.03 rows=3 loops=1)
     Merge Cond: ("outer".id = "inner".id)
     ->  Index Scan using element_pkey on element tt  (cost=0.00..2708.97 rows=104649 width=16) (actual time=1.69..5933.32 rows=104505 loops=1)
     ->  Sort  (cost=62.33..64.83 rows=1000 width=44) (actual time=10.84..10.87 rows=3 loops=1)
           Sort Key: t.id
           ->  Function Scan on connectby t  (cost=0.00..12.50 rows=1000 width=44) (actual time=10.12..10.17 rows=3 loops=1)
   Total runtime: 7421.78 msec
  (7 rows)

2 - What I really want in the output is the branch defined by the name
fields, not by the id fields (e.g. instead of 104503/104504, show
multimedia/Makefile).
For what it's worth, I did populate my test database with the full
pathname field, maintained by triggers. However, the initial
population of that data took 160 minutes... Luckily, the triggers are
there as constraints (of a sort) rather than actually used to cascade
changes. In practice, nodes do not get renamed in my application.

Cheers
Re: [SQL] calculating interval
On 22 Nov 2002, praveen vejandla wrote:

> Dear All,
>
> Is there any way in postgresql to calculate the interval between
> two times excluding specific days, specific durations?
>
> Ex:
>   timestamp1 : 2002-10-01 10:30AM
>   timestamp2 : 2002-15-01 04:50PM
>
> Suppose I need the difference between timestamp1 and timestamp2, but
> I don't want to count Sundays; I want to ignore all Sundays in
> between, and I also want to ignore certain timings (say 10.00 AM to
> 5:00PM). How can I get the duration this way?

My guess: write a function. Calculating the number of days between the
two dates is easy. To avoid certain days of the week, in your case
Sunday, I would count the number of whole weeks between the two dates.

  test=# select '2002-10-01 10:30AM'::timestamp - '2002-15-01 04:50PM'::timestamp;
     ?column?
  ---------------
   258 days 16:40

In this case 258/7 = 36.85..., so you know you have 36 Sundays in
there. This will need adjusting for non-full weeks.

Hope that gets you started.
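The adjusting-for-partial-weeks step is fiddly enough that a direct count may be simpler. A sketch of a PL/pgSQL helper (untested; the function name is invented):

```sql
-- invented helper: walk the days between two dates and count Sundays
create or replace function count_sundays(date, date) returns integer as '
DECLARE
    d date := $1;
    n integer := 0;
BEGIN
    WHILE d <= $2 LOOP
        IF date_part(''dow'', d) = 0 THEN   -- dow 0 is Sunday
            n := n + 1;
        END IF;
        d := d + 1;
    END LOOP;
    RETURN n;
END;' language 'plpgsql';
```

Multiply the count by 24 hours (and handle the excluded 10AM-5PM windows similarly, per non-excluded day) and subtract from the raw interval.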
Re: [SQL] Trees: maintaining pathnames
On 20 Nov 2002 at 15:20, Dan Langille wrote:

> On 17 Nov 2002 at 14:51, Josh Berkus wrote:
>
> > This is a fine implementation using the adjacency list model of
> > tree design. However, I think you may find that the string-based
> > tree implementation in /contrib/ltree is more suited to your
> > purposes, and easier to maintain.
>
> That looks interesting. I have installed that onto a test server and
> I'm playing around with it.

FWIW, ltree seems to implement a tree through text manipulation. I
already have a tree (using a single table with id, parent_id).
Therefore, I think ltree is not an option in this situation. My
creation of the pathname was to save processing time. I'll talk more
about that in my next post.
--
Dan Langille : http://www.langille.org/
Re: [SQL] Trees: maintaining pathnames
On 17 Nov 2002 at 11:39, Dan Langille wrote:

> My existing tree implementation reflects the files contained on disk.
> The full pathname to a particular file is obtained from the path to
> the parent directory. I am now considering putting this information
> into a field in the table.
>
> Here is the table and the test data:
>
>   create table tree(id int not null, parent_id int, name text not null,
>       pathname text not null, primary key (id));
>
>   insert into tree (id, name, pathname) values (1, 'usr', '/usr');
>   insert into tree (id, name, parent_id, pathname) values (2, 'ports', 1, '/usr/ports');
>   insert into tree values (3, 2, 'security', 'test');
>
>   test=# select * from tree;
>    id | parent_id |   name   |      pathname
>   ----+-----------+----------+---------------------
>     1 |           | usr      | /usr
>     2 |         1 | ports    | /usr/ports
>     3 |         2 | security | /usr/ports/security
>   (3 rows)
>
> The goal is to ensure that pathname always contains the correct value.

I am now trying another method, which involves the use of a cache
table. In short, we store the pathname in another table:

  create table tree_pathnames (
      id int4 not null,
      pathname text not null,
      primary key (id),
      foreign key (id) references tree (id)
          on delete cascade
          on update cascade
  );

I populated this table with the following:

  insert into tree_pathnames select id, pathname from tree;

My next task was to create a function which would cascade a change to
tree.name throughout tree_pathnames.
Here is what I came up with (note: table names normalized to the tree /
tree_pathnames example above; my pasted original mixed in the element
names from my real application):

  create or replace function tree_pathname_set_children(int4, text) returns int as '
  DECLARE
      node ALIAS FOR $1;
      path ALIAS FOR $2;
      children record;
  BEGIN
      FOR children IN SELECT tp.id, tp.pathname, t.name
                      FROM tree_pathnames tp, tree t
                      WHERE tp.id = t.id
                        AND t.parent_id = node LOOP
          -- children.pathname = path || ''/'' || children.name;
          RAISE NOTICE ''in tree_pathname_set_children %/%'', path, children.name;
          UPDATE tree_pathnames
             SET pathname = path || ''/'' || children.name
           WHERE id = children.id;
          PERFORM tree_pathname_set_children(children.id, path || ''/'' || children.name);
      END LOOP;
      RETURN 0;
  END;' language 'plpgsql';

This function is invoked from within the trigger on tree:

  create or replace function tree_pathnames() returns opaque as '
  DECLARE
      parent_pathname text;
      my_pathname text;
  BEGIN
      if old.name <> new.name then
          select pathname into parent_pathname from tree_pathnames where id = new.parent_id;
          if found then
              my_pathname = parent_pathname || \'/\' || new.name;
          else
              my_pathname = \'/\' || new.name;
          end if;
          new.pathname = my_pathname;
          update tree_pathnames set pathname = my_pathname where id = new.id;
          perform tree_pathname_set_children(new.id, my_pathname);
      end if;
      RETURN new;
  END;' language 'plpgsql';

  drop trigger tree_pathnames on tree;
  create trigger tree_pathnames before update on tree
      for each row execute procedure tree_pathnames();

I have done only preliminary testing on this, but it seems to work
fine for my application. Comments please.
--
Dan Langille : http://www.langille.org/
[SQL] subselect instead of a view...
Create view WLE_pathnames as
SELECT E.name, EP.pathname
  FROM element E, element_pathnames EP, watch_list_element WLE
 WHERE WLE.watch_list_id = 3724
   AND WLE.element_id = E.id
   AND E.id = EP.id;

    name     |          pathname
-------------+------------------------------
 euchre      | /ports/games/euchre
 reallyslick | /ports/graphics/reallyslick

The above query is pretty well optimized:

Nested Loop  (cost=0.00..647.08 rows=62 width=61) (actual time=0.99..1.19 rows=2 loops=1)
  ->  Nested Loop  (cost=0.00..437.06 rows=62 width=20) (actual time=0.66..0.78 rows=2 loops=1)
        ->  Index Scan using watch_list_element_pkey on watch_list_element wle  (cost=0.00..229.64 rows=62 width=4) (actual time=0.34..0.36 rows=2 loops=1)
        ->  Index Scan using element_pkey on element e  (cost=0.00..3.34 rows=1 width=16) (actual time=0.16..0.17 rows=1 loops=2)
  ->  Index Scan using element_pathnames_pkey on element_pathnames ep  (cost=0.00..3.38 rows=1 width=41) (actual time=0.16..0.17 rows=1 loops=2)
Total runtime: 1.44 msec

Now I want to get everything from element_pathnames matching pathname || '/%'. Does that make sense? Essentially, I want this using the above view:

explain analyze
SELECT E.id, CLE.commit_log_id, E.name, EP.pathname
  FROM element E, element_pathnames EP, commit_log_elements CLE, WLE_pathnames WLEP
 WHERE E.id = EP.id
   AND EP.pathname like WLEP.pathname || '/%'
   AND CLE.element_id = E.id
 ORDER BY EP.pathname;

I know this can be done better, I just can't figure out how. I keep thinking of a subselect but I'm totally blocked. It must be bed time.
Sort  (cost=285579.85..285579.85 rows=67012 width=114) (actual time=9463.95..9464.01 rows=11 loops=1)
  ->  Hash Join  (cost=264060.42..272748.13 rows=67012 width=114) (actual time=9154.69..9463.55 rows=11 loops=1)
        ->  Seq Scan on commit_log_elements cle  (cost=0.00..3936.75 rows=216575 width=8) (actual time=0.18..1762.38 rows=216575 loops=1)
        ->  Hash  (cost=263370.92..263370.92 rows=36997 width=106) (actual time=5716.62..5716.62 rows=0 loops=1)
              ->  Hash Join  (cost=258032.99..263370.92 rows=36997 width=106) (actual time=5524.78..5695.47 rows=10 loops=1)
                    ->  Seq Scan on element e  (cost=0.00..2286.70 rows=119570 width=16) (actual time=0.15..892.40 rows=119570 loops=1)
                    ->  Hash  (cost=257416.50..257416.50 rows=36997 width=90) (actual time=3481.05..3481.05 rows=0 loops=1)
                          ->  Nested Loop  (cost=0.00..257416.50 rows=36997 width=90) (actual time=1847.01..3465.54 rows=10 loops=1)
                                ->  Nested Loop  (cost=0.00..647.08 rows=62 width=49) (actual time=0.96..1.50 rows=2 loops=1)
                                      ->  Nested Loop  (cost=0.00..437.06 rows=62 width=8) (actual time=0.64..0.94 rows=2 loops=1)
                                            ->  Index Scan using watch_list_element_pkey on watch_list_element wle  (cost=0.00..229.64 rows=62 width=4) (actual time=0.34..0.37 rows=2 loops=1)
                                            ->  Index Scan using element_pkey on element e  (cost=0.00..3.34 rows=1 width=4) (actual time=0.21..0.22 rows=1 loops=2)
                                      ->  Index Scan using element_pathnames_pkey on element_pathnames ep  (cost=0.00..3.38 rows=1 width=41) (actual time=0.21..0.23 rows=1 loops=2)
                                ->  Seq Scan on element_pathnames ep  (cost=0.00..2355.70 rows=119570 width=41) (actual time=0.08..858.74 rows=119570 loops=2)
Total runtime: 9464.51 msec

Clues please?

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] subselect instead of a view...
On 25 Nov 2002 at 22:15, Dan Langille wrote:
> I know this can be done better, I just can't figure out how. I keep
> thinking of a subselect but I'm totally blocked. It must be bed time.

It's odd what reading the paper, relaxing with a book, and then trying to sleep can generate. There I was, almost dropping off, when I realised I needed this:

SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
  FROM element E, element_pathnames EP, watch_list_element WLE,
       element_pathnames EP2, element E2, commit_log_elements CLE, commit_log CL
 WHERE WLE.watch_list_id = 3724
   AND WLE.element_id = E.id
   AND E.id = EP.id
   AND EP2.pathname like EP.pathname || '/%'
   AND EP2.id = E2.id
   AND E2.id = CLE.element_id
   AND CLE.commit_log_id = CL.id;

I am still suspicious of that like. It seems to be the performance killer here. There is an index which can be used:

# explain select * from element_pathnames WHERE pathname like 'abc%';
NOTICE: QUERY PLAN:
Index Scan using element_pathnames_pathname on element_pathnames  (cost=0.00..5.80 rows=1 width=41)

But in the main query, it doesn't get picked up. The explain appears below (and at http://www.freshports.org/tmp/explain.txt which will be easier to read than this text-wrapped version). There are quite a few sequential scans there. I'm confused as to why the indexes are not being used. A "vacuum analyze" has been run. Thanks.
Hash Join  (cost=266574.28..279596.82 rows=67012 width=118)
  ->  Hash Join  (cost=263685.03..272372.74 rows=67012 width=114)
        ->  Seq Scan on commit_log_elements cle  (cost=0.00..3936.75 rows=216575 width=8)
        ->  Hash  (cost=262995.54..262995.54 rows=36997 width=106)
              ->  Hash Join  (cost=2994.62..262995.54 rows=36997 width=106)
                    ->  Nested Loop  (cost=0.00..257416.50 rows=36997 width=102)
                          ->  Nested Loop  (cost=0.00..647.08 rows=62 width=61)
                                ->  Nested Loop  (cost=0.00..437.06 rows=62 width=20)
                                      ->  Index Scan using watch_list_element_pkey on watch_list_element wle  (cost=0.00..229.64 rows=62 width=4)
                                      ->  Index Scan using element_pkey on element e  (cost=0.00..3.34 rows=1 width=16)
                                ->  Index Scan using element_pathnames_pkey on element_pathnames ep  (cost=0.00..3.38 rows=1 width=41)
                          ->  Seq Scan on element_pathnames ep2  (cost=0.00..2355.70 rows=119570 width=41)
                    ->  Hash  (cost=2286.70..2286.70 rows=119570 width=4)
                          ->  Seq Scan on element e2  (cost=0.00..2286.70 rows=119570 width=4)
  ->  Hash  (cost=2543.20..2543.20 rows=58420 width=4)
        ->  Seq Scan on commit_log cl  (cost=0.00..2543.20 rows=58420 width=4)

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] subselect instead of a view...
On 26 Nov 2002 at 0:29, Tom Lane wrote:
> "Dan Langille" <[EMAIL PROTECTED]> writes:
> > SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
> > ...
> > and EP2.pathname like EP.pathname || '/%'
>
> > I am still suspicious of that like. It seems to be the performance
> > killer here. There is an index which can be used:
>
> It won't be, though. The LIKE-to-indexscan transformation happens at
> plan time, and that means it can only happen if the pattern is a
> constant. Which it surely will not be in your example.

Thanks. I'll see if I can come up with something else to do this.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [SQL] escape single quote in INSERT command
On 27 Nov 2002 at 0:01, [EMAIL PROTECTED] wrote:
> > Hi Group -
> >
> > I have a perl application for a registration form.
>
> Same here. Why don't you use prepare and execute, in case you are using
> DBI? A sample program looks like this:
>
> $dbh = DBI->connect( ".." );
> $sth = $dbh->prepare("insert into tab (a,b) values (?,?)");
> $sth->execute($a, $b);
> $sth->finish();
> $dbh->commit();
> $dbh->disconnect();

IIRC, there is a $dbh->quote() method as well. That should properly escape anything.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
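The same placeholder idea works in any driver, not just Perl's DBI. A minimal sketch in Python's sqlite3 module (sqlite stands in for PostgreSQL here; the table name `tab` is borrowed from the quoted example), showing that a value containing a single quote round-trips without any manual escaping:

```python
import sqlite3

# Hypothetical stand-in for the registration table from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (a TEXT, b TEXT)")

# The driver binds the value; the embedded single quote needs no escaping.
name = "O'Reilly"
conn.execute("INSERT INTO tab (a, b) VALUES (?, ?)", (name, "x"))
conn.commit()

stored = conn.execute("SELECT a FROM tab").fetchone()[0]
```

Because the value never passes through string interpolation, this also closes the SQL-injection hole that hand-built quoting tends to leave open.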
Re: [SQL] master-detail relationship and count
On 29 Nov 2002 at 13:16, Achilleus Mantzios wrote: > On Fri, 29 Nov 2002, Gary Stainburn wrote: > > > As you can see from the extract below, your statement has worked for all > > landmarks that have links, but ignores any landmarks with out links. How can > > I adjust this so that all landmarks are listed, but with a zero count where > > appropriate? > > Then, use LEFT OUTER JOIN ... USING (), > in combination with COALESCE(). > > (read the docs) When it comes to outer joins, this page is quite useful: http://www.postgresql.org/idocs/index.php?explicit-joins.html I was reading it yesterday when improving the speed of some JOINs. The JOIN went from 3440ms to about 18ms when following the advice on that page. I've put the queries and the explain output at http://www.freshports.org/tmp/outer-join.txt. This will be an example of an OUTER JOIN but not the master-detail situation which occurs elsewhere in the database. cheers -- Dan Langille : http://www.langille.org/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
[SQL] adding a GROUP BY to an outer join
This select gives me the number of times a given element appears on each of the watch lists owned by user 2:

SELECT COUNT(watch_list_id), element_id
  FROM watch_list WL, watch_list_element WLE
 WHERE WL.user_id = 2
   AND WL.id = WLE.watch_list_id
 GROUP BY WLE.element_id;

This query assumes there is only one watch list per person, and it tells me whether or not a given item in commits_latest_ports appears on that single watch list:

SELECT category, port,
       CASE WHEN WLE.element_id IS NULL THEN 0 ELSE 1 END AS watch
  FROM watch_list_element WLE
 RIGHT OUTER JOIN (SELECT * FROM commits_latest_ports) AS TEMP
    ON WLE.watch_list_id = 32
   AND WLE.element_id = TEMP.element_id
 ORDER BY commit_date_raw DESC, category, port
 LIMIT 10;

My goal is to combine the two queries (i.e. allow multiple watch lists). What I came up with works well. Can you see another solution?

SELECT category, port, commits_latest_ports.element_id, commit_date_raw, TEMP.watch
  FROM commits_latest_ports
  LEFT OUTER JOIN (SELECT element_id, COUNT(watch_list_id) AS watch
                     FROM watch_list
                     JOIN watch_list_element
                       ON watch_list.id = watch_list_element.watch_list_id
                      AND watch_list.user_id = 2
                    GROUP BY watch_list_element.element_id) AS TEMP
    ON TEMP.element_id = commits_latest_ports.element_id
 ORDER BY commit_date_raw, category, port;

She runs pretty well:

Sort  (cost=1046.27..1046.27 rows=115 width=44) (actual time=6.18..6.75 rows=115 loops=1)
  ->  Hash Join  (cost=1034.57..1042.34 rows=115 width=44) (actual time=1.94..4.88 rows=115 loops=1)
        ->  Seq Scan on commits_latest_ports  (cost=0.00..7.15 rows=115 width=32) (actual time=0.09..1.51 rows=115 loops=1)
        ->  Hash  (cost=1034.55..1034.55 rows=6 width=12) (actual time=1.74..1.74 rows=0 loops=1)
              ->  Subquery Scan temp  (cost=1034.24..1034.55 rows=6 width=12) (actual time=1.18..1.64 rows=10 loops=1)
                    ->  Aggregate  (cost=1034.24..1034.55 rows=6 width=12) (actual time=1.17..1.52 rows=10 loops=1)
                          ->  Group  (cost=1034.24..1034.39 rows=63 width=12) (actual time=1.11..1.32 rows=10 loops=1)
                                ->  Sort  (cost=1034.24..1034.24 rows=63 width=12) (actual time=1.10..1.15 rows=10 loops=1)
                                      ->  Nested Loop  (cost=0.00..1032.35 rows=63 width=12) (actual time=0.64..0.97 rows=10 loops=1)
                                            ->  Index Scan using watch_list_user_id on watch_list  (cost=0.00..15.25 rows=4 width=4) (actual time=0.29..0.31 rows=3 loops=1)
                                            ->  Index Scan using watch_list_element_pkey on watch_list_element  (cost=0.00..272.63 rows=75 width=8) (actual time=0.12..0.16 rows=3 loops=3)
Total runtime: 19.78 msec

Phew! That's fast!

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
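The shape of the combined query — an outer table LEFT JOINed to a grouped subquery, so unmatched rows get NULL instead of being dropped — can be sketched with Python's sqlite3 module (sqlite stands in for PostgreSQL; the sample rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE watch_list (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE watch_list_element (watch_list_id INTEGER, element_id INTEGER);
CREATE TABLE commits_latest_ports (element_id INTEGER, port TEXT);
INSERT INTO watch_list VALUES (1, 2), (2, 2);              -- user 2 owns two lists
INSERT INTO watch_list_element VALUES (1, 10), (2, 10), (1, 20);
INSERT INTO commits_latest_ports VALUES (10, 'euchre'), (30, 'vim');
""")

# Element 10 is on both of user 2's lists (watch = 2); element 30 is
# on none, so the LEFT JOIN yields NULL for its watch count.
rows = c.execute("""
SELECT clp.port, temp.watch
  FROM commits_latest_ports clp
  LEFT OUTER JOIN (SELECT element_id, COUNT(watch_list_id) AS watch
                     FROM watch_list
                     JOIN watch_list_element
                       ON watch_list.id = watch_list_element.watch_list_id
                      AND watch_list.user_id = 2
                    GROUP BY element_id) AS temp
    ON temp.element_id = clp.element_id
 ORDER BY clp.port
""").fetchall()
```

Pushing the per-user aggregation into the subquery is what lets a single outer join handle any number of watch lists.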
Re: [SQL] ON DELETE CASCADE
On 12 Dec 2002 at 11:01, Tim Perdue wrote: > That command works, but now I think I have 2x as many triggers as I > want. How do I get rid of the original triggers? I had to do something similar and documented it at http://www.freebsddiary.org/postgresql-dropping-constraints.php hth -- Dan Langille : http://www.langille.org/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Oracle outer join porting question
On 15 Jan 2003 at 16:31, Marko Asplund wrote:
> i'm trying to port an existing application from Oracle8i to PostgreSQL but
> i'm having problems understanding a certain outer join query type used in
> the application. the query includes a normal outer join between two tables
> but also uses outer join syntax to join a table with a constant. here's a
> simplified version of the query:
>
> SELECT doc.id,doc.title,sub.user_id,sub.operation
> FROM document doc, document_subscription sub
> WHERE 6 = sub.user_id(+) AND sub.document_id(+) = doc.id;
>
> what does the '6 = sub.user_id(+)' condition exactly do in this query?
> how would this be translated to the SQL92 join syntax used by PostgreSQL?
>
> i've tried converting it to:
>
> SELECT doc.id,doc.title,sub.user_id,sub.operation
> FROM document doc LEFT OUTER JOIN document_subscription sub
> ON sub.document_id = doc.id
> WHERE (sub.user_id = 6 OR sub.user_id IS NULL);
>
> but this query is missing the rows in the documents table which have a
> corresponding document_subscription row with 'not user_id = 6'.

What about this:

SELECT doc.id,doc.title,sub.user_id,sub.operation
FROM document doc LEFT OUTER JOIN document_subscription sub
ON sub.document_id = doc.id;

 id | title | user_id | operation
----+-------+---------+-----------
  1 | doc1  |       5 | op1
  2 | doc2  |       5 | op2
  2 | doc2  |       6 | op2
  4 | doc4  |         |
(4 rows)

> here're also simplified definitions of the two tables used in the query
> and some test data:

Thanks for supplying the table and data. That makes things much easier.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
Re: [SQL] Oracle outer join porting question
On 15 Jan 2003 at 16:40, Tambet Matiisen wrote:
> Try this:
>
> SELECT doc.id,doc.title,sub.user_id,sub.operation
> FROM document doc LEFT OUTER JOIN document_subscription sub
> ON sub.document_id = doc.id AND sub.user_id = 6;

FWIW:

test=# SELECT doc.id,doc.title,sub.user_id,sub.operation
test-# FROM document doc LEFT OUTER JOIN document_subscription sub
test-# ON sub.document_id = doc.id AND sub.user_id = 6;
 id | title | user_id | operation
----+-------+---------+-----------
  1 | doc1  |         |
  2 | doc2  |       6 | op2
  4 | doc4  |         |

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
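The key is where the user filter lives: in the ON clause it only restricts which subscription rows are eligible to match, so every document row survives; in the WHERE clause it filters the joined result and discards documents whose only subscriptions belong to other users. A sketch using Python's sqlite3 (standing in for PostgreSQL) with the test data visible in the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE document (id INTEGER, title TEXT);
CREATE TABLE document_subscription (document_id INTEGER, user_id INTEGER, operation TEXT);
INSERT INTO document VALUES (1, 'doc1'), (2, 'doc2'), (4, 'doc4');
INSERT INTO document_subscription VALUES (1, 5, 'op1'), (2, 5, 'op2'), (2, 6, 'op2');
""")

# user_id = 6 in the ON clause: all documents kept, subscription
# columns are NULL wherever user 6 has no subscription.
rows = c.execute("""
SELECT doc.id, doc.title, sub.user_id, sub.operation
  FROM document doc LEFT OUTER JOIN document_subscription sub
    ON sub.document_id = doc.id AND sub.user_id = 6
 ORDER BY doc.id
""").fetchall()
```

This reproduces the posted result set: doc1 and doc4 appear with NULLs, and only doc2 carries user 6's subscription.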
[SQL] design review, FreshPorts change
Hi folks, I know a number of you use FreeBSD and my FreshPorts website. I've just posted http://www.freshports.org/docs/404-for-virtual-pages.php which contains some proposed changes. Of note is the use of a rule to update a cross reference table. I'd appreciate feedback please, both technical and user. Cheers -- Dan Langille : http://www.langille.org/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[SQL] 7.3 "group by" issue
Hi folks,

This query:

SELECT element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
    ON watch_list.id = watch_list_element.watch_list_id
   AND watch_list.user_id = 1
 GROUP BY watch_list_element.element_id

gives this error:

ERROR: Attribute unnamed_join.element_id must be GROUPed or used in an aggregate function

Note that in the select the table name is not mentioned but it is in the GROUP BY. To solve the problem, you either have to name the table in both locations or not name it in either location. Why?

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [SQL] 7.3 "group by" issue
On Fri, 21 Feb 2003, Josh Berkus wrote: > Dan, Chad, > > > I see the distinction you are making. > > > > Maybe Tom or Josh could throw out a better answer, but I think that youve > > called it one thing in your select and tried to group by it using a > > syntaticly different name. > > This looks like a bug to me. Please write it up and send it to BUGS. Will do. > 7.3.2, I assume? Yes. FWIW, I'm upgrading FreshPorts.org from 7.2.3. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] 7.3 "group by" issue
On 21 Feb 2003 at 19:18, Gaetano Mendola wrote:
> > Hi folks,
> >
> > This query:
> >
> > SELECT element_id as wle_element_id, COUNT(watch_list_id)
> >   FROM watch_list JOIN watch_list_element
> >     ON watch_list.id = watch_list_element.watch_list_id
> >    AND watch_list.user_id = 1
> >  GROUP BY watch_list_element.element_id
>
> Try:
>
> SELECT element_id as wle_element_id, COUNT(watch_list_id)
>   FROM watch_list JOIN watch_list_element
>     ON watch_list.id = watch_list_element.watch_list_id
>  WHERE watch_list.user_id = 1
>  GROUP BY watch_list_element.element_id

ERROR: Attribute unnamed_join.element_id must be GROUPed or used in an aggregate function

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [SQL] 7.3 "group by" issue
On 21 Feb 2003 at 13:00, Chad Thompson wrote:
> > On 21 Feb 2003 at 19:18, Gaetano Mendola wrote:
> > > > Hi folks,
> > > >
> > > > This query:
> > > >
> > > > SELECT element_id as wle_element_id, COUNT(watch_list_id)
> > > >   FROM watch_list JOIN watch_list_element
> > > >     ON watch_list.id = watch_list_element.watch_list_id
> > > >    AND watch_list.user_id = 1
> > > >  GROUP BY watch_list_element.element_id
> > >
> > > Try:
> > >
> > > SELECT element_id as wle_element_id, COUNT(watch_list_id)
> > >   FROM watch_list JOIN watch_list_element
> > >     ON watch_list.id = watch_list_element.watch_list_id
> > >  WHERE watch_list.user_id = 1
> > >  GROUP BY watch_list_element.element_id
> >
> > ERROR: Attribute unnamed_join.element_id must be GROUPed or used in
> > an aggregate function
>
> I think that the wrong problem was solved here. Items in the order by
> clause must be in the target list.
>
> Here's what it says in the docs:
>
> The ORDER BY clause specifies the sort order:
>
>    SELECT select_list
>      FROM table_expression
>     ORDER BY column1 [ASC | DESC] [, column2 [ASC | DESC] ...]
>
> column1, etc., refer to select list columns. These can be either the output
> name of a column (see Section 4.3.2) or the number of a column. Some
> examples:
>
> Note that "column1, etc., refer to select list"

I don't see how ORDER BY enters into this situation. It's not used. What are you saying?

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
[SQL] Analyse article
Hi folks, I've just finished writing up a bit on analyze. http://www.freebsddiary.org/postgresql-analyze.php It's not so much an explanation of the command as it is a series of examples which show a query evolving from a 4 second monster to a 14 ms speed daemon. Corrections, comments, etc appreciated. -- Dan Langille : http://www.langille.org/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] 7.3 "group by" issue
On 21 Feb 2003 at 13:30, Chad Thompson wrote:
> The same applies to group by... Sorry for the confusion.
>
> If the column is not in the select section of the statement, it can't group
> by it. Try this:
>
> SELECT element_id as wle_element_id, COUNT(watch_list_id)
>   FROM watch_list JOIN watch_list_element
>     ON watch_list.id = watch_list_element.watch_list_id
>  WHERE watch_list.user_id = 1
>  GROUP BY wle_element_id

Yes, that works. But so do these:

SELECT watch_list_element.element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
    ON watch_list.id = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY watch_list_element.element_id

SELECT element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
    ON watch_list.id = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY element_id

The original situation which did not work is:

SELECT watch_list_element.element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
    ON watch_list.id = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY element_id

My question: why should it not work? It's referring to the same column as the previous two examples which do work.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
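The qualified and unqualified spellings do name the same column, and on engines without this parser quirk the mixed form works fine. A sketch with Python's sqlite3 (standing in for PostgreSQL; the two watch lists and three elements are made-up data) showing both GROUP BY spellings producing identical counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE watch_list (id INTEGER, user_id INTEGER);
CREATE TABLE watch_list_element (watch_list_id INTEGER, element_id INTEGER);
INSERT INTO watch_list VALUES (1, 1), (2, 1);
INSERT INTO watch_list_element VALUES (1, 7), (2, 7), (1, 8);
""")

# Same query, grouped once by the qualified name and once by the bare name.
q = """SELECT watch_list_element.element_id AS wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
    ON watch_list.id = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY {grp}
 ORDER BY 1"""
qualified   = c.execute(q.format(grp="watch_list_element.element_id")).fetchall()
unqualified = c.execute(q.format(grp="element_id")).fetchall()
```

Element 7 is on both lists and element 8 on one, so each variant returns [(7, 2), (8, 1)] — supporting the point that the failing case refers to the same column as the ones that work.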
Re: [SQL] timestamp with postgresql 7.3
On 4 Apr 2003 at 22:18, Claude wrote: > > Hi, > > I have a table a field with timestamps in seconds since epoch and I > would like to get a human readable date... but it seems that > postgresql 7.3 does not support the datetime(), timestamp(), > timestamptz() functions... > > I tried the example in: > http://archives.postgresql.org/pgsql-bugs/2002-07/msg00117.php > > and get: > > DB=# select val, datetime(val), "timestamp"(val), timestamptz(val) > from test_table; ERROR: Function datetime(integer) does not exist > Unable to identify a function that satisfies the given > argument types You may need to add explicit typecasts > > And tried various typecasts without any success. > > Any help? I think you need to read the 7.3 documentation and release notes. -- Dan Langille : http://www.langille.org/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[SQL] inet versus text for ip addresses
The PostgreSQL inet datatype stores an IP host address, and optionally the identity of the subnet it is in, all in one field. This requires 12 bytes. Using my "random" data of approximately 8000 IP addresses collected during previous polls, I've found the average length of an IP address is 13.1 bytes. An integer requires 4 bytes.

First question: why not offer an option to store just an IP address? That should require less than the 12 bytes for inet.

On to the real question. The existing tables are:

create table recount_ips (
   ipid      serial not null,
   ipaddress inet   not null,
   primary key (ipid)
);

create unique index recount_ips_ip_address on recount_ips (ipaddress);

create table recount_iptopolls (
   pollid integer not null,
   ipid   integer not null,
   primary key (pollid, ipid)
);

alter table recount_iptopolls add foreign key (pollid) references recount_polls (pollid)
   on update restrict on delete restrict;

alter table recount_iptopolls add foreign key (ipid) references recount_ips (ipid)
   on update restrict on delete restrict;

I think a better solution is one table:

create table recount_iptopolls (
   pollid    integer not null,
   ipaddress inet    not null,
   primary key (pollid, ipaddress)
);

alter table recount_iptopolls add foreign key (pollid) references recount_polls (pollid)
   on update restrict on delete restrict;

It removes a table and the associated primary key, and removes a foreign key from the modified recount_iptopolls table. Comments?

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
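One property worth checking before dropping the lookup table: in the single-table design the composite primary key still prevents the same address being recorded twice for one poll. A sketch with Python's sqlite3 (which has no inet type, so TEXT stands in; the sample address is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("""CREATE TABLE recount_iptopolls (
    pollid    INTEGER NOT NULL,
    ipaddress TEXT    NOT NULL,   -- inet in PostgreSQL; TEXT here
    PRIMARY KEY (pollid, ipaddress))""")

c.execute("INSERT INTO recount_iptopolls VALUES (1, '10.0.0.1')")
try:
    # Same (pollid, ipaddress) pair again: the composite PK rejects it.
    c.execute("INSERT INTO recount_iptopolls VALUES (1, '10.0.0.1')")
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
```

The same address can still vote in a different poll, since only the (pollid, ipaddress) pair is constrained.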
[SQL] Getting the return type right for SETOF
Hi folks,

I'm playing with SETOF on functions, but I can't get the return type correct. What have I missed? A cast?

CREATE OR REPLACE FUNCTION elementGet (text) RETURNS SETOF element_type AS '
   select 1,
          \'test\',
          \'F\',
          \'A\',
          FALSE,
          FALSE
' LANGUAGE sql stable;

ERROR: function declared to return element_type returns "unknown" instead of text at column 2

\d element_type
Composite type "public.element_type"
   Column   |  Type
------------+---------
 id         | integer
 name       | text
 type       | text
 status     | text
 iscategory | boolean
 isport     | boolean

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [SQL] Getting the return type right for SETOF
On 30 Aug 2003 at 13:59, Stephan Szabo wrote:
> On Sat, 30 Aug 2003, Dan Langille wrote:
> > I'm playing with SETOF on functions. But I can't get the return type
> > correct. What have I missed? A cast?
> >
> > ERROR: function declared to return element_type returns "unknown"
> > instead of text at column 2
>
> I think you'll need to explicitly make the three text columns text rather
> than just a plain literal (so ''test''::text for example)

Right you are! Here is the real function:

CREATE OR REPLACE FUNCTION elementGet (text) RETURNS SETOF element_type AS '
   select id,
          name::text,
          directory_file_flag::text,
          status::text,
          case when IsPort(Pathname_ID($1))     IS NULL THEN FALSE ELSE TRUE END,
          case when IsCategory(Pathname_ID($1)) IS NULL THEN FALSE ELSE TRUE END
     FROM element
    WHERE id = PathName_ID($1);
' LANGUAGE sql stable;

select * from elementGet('ports/security/logcheck');
  id   |   name   | type | status | iscategory | isport
-------+----------+------+--------+------------+--------
 37342 | logcheck | D    | A      | t          | f

And it's fast too:

explain analyse select * from elementGet('ports/security/logcheck');
                          QUERY PLAN
--------------------------------------------------------------
 Function Scan on elementget  (cost=0.00..12.50 rows=1000 width=102) (actual time=64.28..64.28 rows=1 loops=1)
 Total runtime: 64.35 msec

Thank you.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [SQL] Backup of multiple tables
On 23 Sep 2003 at 0:34, Andreas Joseph Krogh wrote:
> On Friday 19 September 2003 17:38, Tom Lane wrote:
> > pg_dump can only handle one -t option at a time. It'd make sense to
> > allow multiple -t options (likewise -n) but no one's got round to
> > improving the code in that particular direction. I don't think it would
> > be hard; want to fix it and send in a patch?
>
> I've never looked at the code-base of pgsql before, but I'll give it a try.
> Don't expect anything real soon tho.

If you do deliver, I for one will buy you a beer should we ever meet. I'm sure others would feel similarly obliged.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 8: explain analyze is your friend
[SQL] Removing similar elements from a set
Hi folks,

I'm trying to remove items from a set which are similar to items in another set.

This is the set (MASTER) from which I wish to remove items:

/ports/Mk/bsd.python.mk
/ports/lang/python-doc-html/distinfo
/ports/lang/python/Makefile
/ports/lang/python/distinfo
/ports/lang/python/files/patch-Modules-Setup.dist

These are the items (MATCHES) whose pathnames are the prefixes to be removed:

/ports/lang/python-doc-html
/ports/lang/python

In this case, the answer would be:

/ports/Mk/bsd.python.mk

In short, we remove all items from MASTER which are under the directories specified in MATCHES. My first attempt, which works only if MATCHES contains one item:

SELECT *
  FROM MASTER JOIN MATCHES
    ON NOT (MASTER.pathname ~ ('^' || MATCHES.pathname || '/.+'));

However, if there is more than one row in MATCHES, this will not work. Clues please?

-- 
Dan Langille - http://www.langille.org/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] Removing similar elements from a set
On 26 Sep 2003 at 16:55, Josh Berkus wrote:
> Dan,
>
> > I'm trying to remove items from a set which are similar to items in
> > another set.
> >
> > In short, we remove all items from MASTER which are under the directories
> > specified in MATCHES.
>
> from your example, you are trying to remove all directories which do *not*
> match. What do you want, exactly?

Josh and I talked on IRC about this. This is the result set I want:

/ports/Mk/bsd.python.mk

I want things from MASTER which do not match things in MATCHES. Josh suggested this:

SELECT * FROM master
 WHERE NOT EXISTS (SELECT * FROM matches
                    WHERE master.pathname LIKE (matches.pathname || '/%'));

Cheers.

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
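The NOT EXISTS with a correlated LIKE handles any number of prefix rows, which is exactly what the single-join first attempt could not do. A runnable sketch with Python's sqlite3 (standing in for PostgreSQL) using the MASTER and MATCHES data from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE master (pathname TEXT);
CREATE TABLE matches (pathname TEXT);
INSERT INTO master VALUES
  ('/ports/Mk/bsd.python.mk'),
  ('/ports/lang/python-doc-html/distinfo'),
  ('/ports/lang/python/Makefile'),
  ('/ports/lang/python/distinfo'),
  ('/ports/lang/python/files/patch-Modules-Setup.dist');
INSERT INTO matches VALUES
  ('/ports/lang/python-doc-html'),
  ('/ports/lang/python');
""")

# Keep only master rows that fall under none of the matches prefixes.
# Appending '/%' means '/ports/lang/python' excludes its own subtree
# without also excluding '/ports/lang/python-doc-html/...'.
rows = c.execute("""
SELECT * FROM master
 WHERE NOT EXISTS (SELECT * FROM matches
                    WHERE master.pathname LIKE (matches.pathname || '/%'))
""").fetchall()
```

Only /ports/Mk/bsd.python.mk survives, matching the expected answer in the original post.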
Re: [SQL]
On 28 Sep 2003 at 15:45, Tom Lane wrote:
> Dan Langille <[EMAIL PROTECTED]> writes:
> > WHERE lastlogin between current_date - interval \''' ||
> >       quote_literal(i - 1) || '' days\'
> >   AND current_date - interval \''' ||
> >       quote_literal(i) || '' days\''';
>
> IIRC, quote_literal() puts single quotes around its result. So you have
> too many quotes there. Given that you know i is an integer, you don't
> really need quote_literal for it. Actually, you don't need EXECUTE
> here at all. Why not just
>
> FOR i IN 1..MaxDays LOOP
>    SELECT count(*)
>      INTO r
>      FROM users
>     WHERE lastlogin between current_date - (i-1) * interval ''1 day''
>                         AND current_date - i * interval ''1 day'';
>    RETURN NEXT r;
> END LOOP;

Thank you. I had to replace the " with \', but here is what I came up with (after adding another item to the SELECT):

CREATE OR REPLACE FUNCTION LoginCounts(int) RETURNS SETOF logincounts_record AS '
DECLARE
   MaxDays ALIAS for $1;
   r       logincounts_record%rowtype;
   i       integer;
BEGIN
   raise notice ''MaxDays'';
   FOR i IN 1..MaxDays LOOP
      SELECT 1 AS days, count(*) as count
        INTO r
        FROM users
       WHERE lastlogin between current_date - (i-1) * interval \'1 day\'
                           AND current_date - i * interval \'1 day\';
      RETURN NEXT r;
   END LOOP;
   RETURN;
END
' LANGUAGE plpgsql;

However, the results are confusing. I'm getting the wrong number of parameters. The value being returned appears to be the value supplied. But the log results show an interesting pattern in the number of selects being run.
working-copy.freshports.org=# select count(*) from LoginCounts(1);
NOTICE:  MaxDays
 count
-------
     1
(1 row)

The log says:

2003-09-28 16:01:54 [32813] LOG: query: select count(*) from LoginCounts(1);
2003-09-28 16:01:54 [32813] NOTICE: MaxDays
2003-09-28 16:01:54 [32813] LOG: query: select cast($1 as timestamp without time zone) - $2;

working-copy.freshports.org=# select count(*) from LoginCounts(2);
NOTICE:  MaxDays
 count
-------
     2
(1 row)

And the log says:

2003-09-28 16:02:04 [32813] LOG: query: select count(*) from LoginCounts(2);
2003-09-28 16:02:04 [32813] NOTICE: MaxDays
2003-09-28 16:02:04 [32813] LOG: query: select cast($1 as timestamp without time zone) - $2;
2003-09-28 16:02:04 [32813] LOG: query: select cast($1 as timestamp without time zone) - $2;
2003-09-28 16:02:04 [32813] LOG: query: select cast($1 as timestamp without time zone) - $2;

The type in question is:

CREATE TYPE logincounts_record AS (
   days  integer,
   count integer
);

-- 
Dan Langille : http://www.langille.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[SQL]
I'm trying to create a function which returns a result set using a dynamic query. The problem occurs when it executes. I suspect it's my quoting, but I'm not sure of the cause.

CREATE OR REPLACE FUNCTION LoginCounts(int) RETURNS SETOF logincounts_record AS '
DECLARE
   MaxDays ALIAS for $1;
   r       logincounts_record%rowtype;
   i       integer;
BEGIN
   FOR i IN 1..MaxDays LOOP
      EXECUTE ''
         SELECT count(*)
           INTO r
           FROM users
          WHERE lastlogin between current_date - interval \''' || quote_literal(i - 1) || '' days\'
                              AND current_date - interval \''' || quote_literal(i) || '' days\''';
      RETURN NEXT r;
   END LOOP;
   RETURN;
END
' LANGUAGE plpgsql;

# select * from LoginCounts(2);
WARNING: Error occurred while executing PL/pgSQL function logincounts
WARNING: line 9 at execute statement
ERROR: parser: parse error at or near "days" at character 151

thanks

-- 
Dan Langille - http://www.langille.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] now() in loop statement
On 29 Sep 2003 at 15:58, Kerv wrote:

> What is wrong with this function because the mytime variable contains
> the same value in each iteration:

From http://www.postgresql.org/docs/7.3/static/functions-datetime.html:

    It is important to realize that CURRENT_TIMESTAMP and related
    functions return the start time of the current transaction; their
    values do not change during the transaction. timeofday() returns
    the wall clock time and does advance during transactions.

--
Dan Langille : http://www.langille.org/
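[Editor's note: a quick way to see the difference from psql (a sketch; the exact timestamps will vary, and pg_sleep() only exists in 8.2+, so on older versions just pause between statements):]

```sql
BEGIN;
SELECT now(), timeofday();   -- note both values
SELECT pg_sleep(2);          -- or simply wait a moment
SELECT now(), timeofday();   -- now() is unchanged; timeofday() has advanced
COMMIT;
```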
Re: [SQL]
On 29 Sep 2003 at 10:04, Jean-Luc Lachance wrote:

> Wouldn't:
>
>   insert into r
>   select count(*)
>   from users
>   where date( lastlogin) > current_date - MaxDays * interval '' 1 day''
>   group by date( lastlogin);
>
> be more efficient?

Yes it would, by a factor of 5.

freshports=# explain analyse select * from LoginCounts(3);
                               QUERY PLAN
--------------------------------------------------------------------------
 Function Scan on logincounts  (cost=0.00..12.50 rows=1000 width=8)
   (actual time=1141.04..1141.06 rows=3 loops=1)
 Total runtime: 1141.13 msec
(2 rows)

freshports=# explain analyse select count(*)
freshports-# from users
freshports-# where date( lastlogin) > current_date - 3 * interval ' 1 day'
freshports-# group by date( lastlogin);
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=539.78..552.75 rows=173 width=8)
   (actual time=197.54..198.97 rows=3 loops=1)
   ->  Group  (cost=539.78..548.42 rows=1730 width=8)
         (actual time=196.97..198.43 rows=110 loops=1)
         ->  Sort  (cost=539.78..544.10 rows=1730 width=8)
               (actual time=196.95..197.39 rows=110 loops=1)
               Sort Key: date(lastlogin)
               ->  Seq Scan on users  (cost=0.00..446.75 rows=1730 width=8)
                     (actual time=0.87..195.38 rows=110 loops=1)
                     Filter: ((date(lastlogin))::timestamp without time zone >
                       (('now'::text)::date - '3 days'::interval))
 Total runtime: 199.33 msec
(7 rows)

freshports=#

Thank you.

--
Dan Langille : http://www.langille.org/
Re: [SQL]
On 29 Sep 2003 at 10:04, Jean-Luc Lachance wrote:

> Wouldn't:
>
>   insert into r
>   select count(*)
>   from users
>   where date( lastlogin) > current_date - MaxDays * interval '' 1 day''
>   group by date( lastlogin);
>
> be more efficient?

Yes it would, by a factor of 5.

P.S. But it would not show dates for which there are no logins. The
above can return zero rows. The previous example always returns
MaxDays rows.

--
Dan Langille : http://www.langille.org/
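[Editor's note: if the zero-login days matter, one way to keep the set-based form and still get one row per day is an outer join against a generated list of dates. A sketch, assuming PostgreSQL 8.0+ for generate_series() (it is not available on the 7.x versions discussed here):]

```sql
-- one row per day for the last 3 days, 0 when nobody logged in
SELECT d.day, count(U.lastlogin) AS logins
  FROM (SELECT current_date - s.i AS day
          FROM generate_series(0, 2) AS s(i)) AS d
  LEFT JOIN users U ON date(U.lastlogin) = d.day
 GROUP BY d.day
 ORDER BY d.day;
```

count(U.lastlogin) counts only non-NULL values, so days with no matching rows come back as 0 rather than disappearing.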
Re: [SQL] Multiple table join
On 6 Oct 2003 at 10:26, Louise Cofield wrote:

> I am attempting to select fields Location and Item_Num from table A
> where A.Location = B.Location,
>
> AND
>
> select Item_Description from table C, where A.Item_Num = C.Item_Num.

Try:

    select A.Location, A.Item_Num, C.Item_Description
      from A, B, C
     where A.Location = B.Location
       and A.Item_Num = C.Item_Num;

--
Dan Langille : http://www.langille.org/
[SQL] UPDATE one table with values from another
I know there is a simple solution, but I can't remember what it is. :(

I have two similar tables. I want to update the fields in one table to
contain the values from the other. The two tables are:

laptop.freshports.org=# \d commit_log_ports
       Table "public.commit_log_ports"
    Column     |   Type   | Modifiers
---------------+----------+-----------
 commit_log_id | integer  | not null
 port_id       | integer  | not null
 needs_refresh | smallint | not null
 port_version  | text     |
 port_revision | text     |
Indexes: commit_log_ports_pkey primary key btree (commit_log_id, port_id),
         needs_refresh btree (needs_refresh)
Foreign Key constraints:
    $1 FOREIGN KEY (commit_log_id) REFERENCES commit_log(id)
       ON UPDATE CASCADE ON DELETE CASCADE,
    $2 FOREIGN KEY (port_id) REFERENCES ports(id)
       ON UPDATE CASCADE ON DELETE CASCADE
Triggers: commit_log_ports_insert

laptop.freshports.org=# \d commit_log_ports_elements
    Table "public.commit_log_ports_elements"
    Column     |   Type   | Modifiers
---------------+----------+-----------
 commit_log_id | integer  | not null
 element_id    | integer  | not null
 needs_refresh | smallint | not null
 port_version  | text     |
 port_revision | text     |
Indexes: commit_log_ports_elements_pkey primary key btree
         (commit_log_id, element_id)
Foreign Key constraints:
    $1 FOREIGN KEY (commit_log_id) REFERENCES commit_log(id)
       ON UPDATE CASCADE ON DELETE CASCADE,
    $2 FOREIGN KEY (element_id) REFERENCES element(id)
       ON UPDATE CASCADE ON DELETE CASCADE

laptop.freshports.org=#

I can obtain the values I want with this query:

SELECT CLP.*
  FROM commit_log_ports CLP, ports P, commit_log_ports_elements X
 WHERE CLP.port_id = P.id
   AND CLP.commit_log_id = X.commit_log_id
   AND X.element_id = P.element_id;

I started writing the UPDATE and got as far as this before brain
fatigue set in:

UPDATE commit_log_ports_elements X
   SET X.needs_refresh = CLP.needs_refresh,
       X.port_version  = CLP.port_version,
       X.port_revision = CLP.port_revision
 WHERE X.commit_log_id = commit_log_ports CLP
   AND X.

A clue please? Thank you.
Re: [SQL] UPDATE one table with values from another
On Wed, 8 Oct 2003, Josh Berkus wrote:

> Dan,
>
> > UPDATE commit_log_ports_elements X
> >    SET X.needs_refresh = CLP.needs_refresh,
> >        X.port_version  = CLP.port_version,
> >        X.port_revision = CLP.port_revision
>
> FROM commit_log_ports CLP
> WHERE X.commit_log_id = CLP.commit_log_id

Thanks Josh. After a 5 hour drive to Hamilton, my brain was only
capable of doing the email.

> You can always ask this kind of thing on IRC.

If I'd been at home, I would have. This laptop of mine is getting
pretty old. It took pretty close to 15 minutes for it to set a field
to zero in 91,295 rows. I need more RAM and a faster laptop!

cheers
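[Editor's note: for the archive, Josh's UPDATE ... FROM skeleton combined with the ports join from the original SELECT would look something like the following. A sketch only; note that in PostgreSQL the SET columns must be unqualified:]

```sql
UPDATE commit_log_ports_elements
   SET needs_refresh = CLP.needs_refresh,
       port_version  = CLP.port_version,
       port_revision = CLP.port_revision
  FROM commit_log_ports CLP, ports P
 WHERE commit_log_ports_elements.commit_log_id = CLP.commit_log_id
   AND CLP.port_id  = P.id
   AND P.element_id = commit_log_ports_elements.element_id;
```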
[SQL] people who buy A, also buy C, D, E
The goal of my query is: given a book, what did other people who
bought this book also buy? I plan to list the 5 most popular such
books. In reality, this isn't about books, but I think that makes it
easier to understand.

We have a table of customer_id (watch_list_id) and book_id
(element_id):

freshports.org=# \d watch_list_element
  Table "public.watch_list_element"
    Column     |  Type   | Modifiers
---------------+---------+-----------
 watch_list_id | integer | not null
 element_id    | integer | not null
Indexes:
    "watch_list_element_pkey" primary key, btree (watch_list_id, element_id)
    "watch_list_element_element_id" btree (element_id)
Foreign-key constraints:
    "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id)
         ON UPDATE CASCADE ON DELETE CASCADE
    "$1" FOREIGN KEY (element_id) REFERENCES element(id)
         ON UPDATE CASCADE ON DELETE CASCADE

freshports.org=#

I have a query which returns the needed results:

SELECT W.element_id
  FROM watch_list_element W
 WHERE W.watch_list_id in (select watch_list_id from watch_list_element
                           where element_id = 54968)
 GROUP BY W.element_id
 ORDER BY count(W.watch_list_id) DESC
 LIMIT 5;

But performance is an issue here, so I'm planning to calculate all the
possible values and cache them. That is, given each element_id in a
watch_list, what are the top 5 element_id values on all the lists on
which the original element_id appears?

I'm having trouble constructing the query. I'm not even sure I can do
this in one select, but that would be nice. Examples and clues are
appreciated.

Any ideas? Thank you.

--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
NEW brochure available at http://www.bsdcan.org/2005/advocacy/
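[Editor's note: one way to precompute the whole cache in a single statement is a self-join that counts, for every pair of elements, how many lists they share. A sketch, untested; on 7.4 trimming each element's result to its top 5 would still need a loop or application code, since there is no row_number():]

```sql
-- co-occurrence counts for every pair of elements sharing a list
SELECT A.element_id AS element_id,
       B.element_id AS also_element_id,
       count(*)     AS lists_in_common
  FROM watch_list_element A
  JOIN watch_list_element B
    ON A.watch_list_id = B.watch_list_id
   AND A.element_id <> B.element_id
 GROUP BY A.element_id, B.element_id
 ORDER BY A.element_id, lists_in_common DESC;
```

The rows for a given element_id, taken in order, are exactly the "people who bought A also bought ..." list for that element; inserting them into a cache table keyed on (element_id, also_element_id) makes the per-book lookup a cheap indexed read.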
Re: [SQL] people who buy A, also buy C, D, E
On 26 Apr 2005 at 14:24, Christoph Haller wrote:

> Dan Langille wrote:
> >
> > The goal of my query is: given a book, what did other people who
> > bought this book also buy? I plan to list the 5 most popular such
> > books. In reality, this isn't about books, but I think that makes
> > it easier to understand.
> >
> > We have a table of customer_id (watch_list_id) and book_id
> > (element_id).
> >
> > freshports.org=# \d watch_list_element
> >   Table "public.watch_list_element"
> >     Column     |  Type   | Modifiers
> > ---------------+---------+-----------
> >  watch_list_id | integer | not null
> >  element_id    | integer | not null
> > Indexes:
> >     "watch_list_element_pkey" primary key, btree (watch_list_id,
> >     element_id)
> >     "watch_list_element_element_id" btree (element_id)
> > Foreign-key constraints:
> >     "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON
> >     UPDATE CASCADE ON DELETE CASCADE
> >     "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE
> >     CASCADE ON DELETE CASCADE
> >
> > freshports.org=#
> >
> > I have a query which returns the needed results:
> >
> > SELECT W.element_id
> >   FROM watch_list_element W
> >  WHERE w.watch_list_id in (select watch_list_id from
> >        watch_list_element where element_id = 54968)
> >  GROUP BY W.element_id
> >  ORDER BY count(W.watch_list_id) DESC
> >  LIMIT 5;
> >
> > But performance is an issue here. So I'm planning to calculate all
> > the possible values and cache them. That is, given each element_id
> > in a watch_list, what are the top 5 element_id values on all the
> > lists on which the original element_id appears?
> >
> > I'm having trouble constructing the query. I'm not even sure I can
> > do this in one select, but that would be nice. Examples and clues
> > are appreciated.
> >
> > Any ideas?
> >
> > Thank you.
>
> Just two ideas.
>
> 1) Older Postgres versions are notorious for being slow
> on "IN" clauses.
> Does this one (untested) perform better:
>
> SELECT W.element_id, count(W.watch_list_id)
> FROM watch_list_element W
> WHERE EXISTS
>   (SELECT * FROM watch_list_element E
>    WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id)
> GROUP BY W.element_id ORDER BY 2 DESC LIMIT 5;

I'm on PostgreSQL 7.4.7:

freshports.org=# explain analyse
freshports.org-# SELECT W.element_id, count(W.watch_list_id)
freshports.org-# FROM watch_list_element W
freshports.org-# WHERE EXISTS
freshports.org-# (SELECT * FROM watch_list_element E
freshports.org(# WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id)
freshports.org-# GROUP BY W.element_id
freshports.org-# ORDER BY 2 DESC
freshports.org-# LIMIT 5;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=417905.49..417905.51 rows=5 width=8)
        (actual time=3142.480..3142.528 rows=5 loops=1)
   ->  Sort  (cost=417905.49..417908.08 rows=1033 width=8)
              (actual time=3142.471..3142.486 rows=5 loops=1)
         Sort Key: count(watch_list_id)
         ->  HashAggregate  (cost=417851.20..417853.78 rows=1033 width=8)
                            (actual time=3074.170..3112.294 rows=7338 loops=1)
               ->  Seq Scan on watch_list_element w
                     (cost=0.00..417506.76 rows=6 width=8)
                     (actual time=0.129..2619.989 rows=94018 loops=1)
                     Filter: (subplan)
                     SubPlan
                       ->  Index Scan using watch_list_element_pkey on
                           watch_list_element e
                             (cost=0.00..3.02 rows=1 width=8)
                             (actual time=0.011..0.011 rows=1 loops=137776)
                             Index Cond: (($0 = watch_list_id) AND
                                          (element_id = 54968))
 Total runtime: 3143.304 ms
(10 rows)

freshports.org=#

Compare that to the original query:

freshports.org=# explain analyse
freshports.org-# SELECT W.element_id
freshports.org-#   FROM watch_list_element W
freshports.org-#  WHERE w.watch_list_id in (select watch_list_id from
freshports.org(#  watch_list_element where element_id = 54968)
freshports.org-#  GROUP BY W.element_id
freshports.org-#  ORDER BY count(W.watch_list_id) DESC
freshports.org-#  LIMIT 5;
                                  QUERY PLAN
------------------------------------------------------------------------------