Re: [SQL] UPDATE WITH ORDER BY
> Rodrigo Carvalhaes wrote: > > Hi Guys! > > I need to make an UPDATE on a column reordering it with a sequence > using order by a description. > Confusing??? Well.. Let me give an example... > > Today, my table it's organized like this: > > Code / Description > 9 Orange > 15 Apple > 1 Pear > 3 Tomato > > I wanna to reorganize (reordering the code from 1 to ... ordering by > description) > > Code / Description > 1 Apple > 2 Orange > 3 Pear > 4 Tomato > > I created a sequence but I am having no succes to use it because > UPDATE don't accept ORDER BY CLAUSE. The "ideal SQL" is UPDATE table > SET code = nextval('sequence') ORDER BY description > > I searched a lot on the NET without ant "tip" for my case. > It's a very simple need but I am not able to solve it... > > Anyone knows how I can do it? > > Cheers, > > -- > Rodrigo Carvalhaes > I doubt this can be done by a single SQL command. My approach is a function. I did: CREATE TABLE fruittable( fruitcode INTEGER, fruitname TEXT ); INSERT INTO fruittable VALUES( 9, 'Orange' ); INSERT INTO fruittable VALUES( 15, 'Apple' ); INSERT INTO fruittable VALUES( 1, 'Pear' ); INSERT INTO fruittable VALUES( 3, 'Tomato' ); SELECT * FROM fruittable ORDER BY fruitname ; fruitcode | fruitname ---+--- 15 | Apple 9 | Orange 1 | Pear 3 | Tomato CREATE OR REPLACE FUNCTION reorder_fruitcode() RETURNS INTEGER AS ' DECLARE newcode INTEGER ; fruitrecord RECORD ; BEGIN newcode := 1 ; FOR fruitrecord IN SELECT * FROM fruittable ORDER BY fruitname LOOP RAISE NOTICE ''fruitname is %'', fruitrecord.fruitname ; UPDATE fruittable SET fruitcode = newcode WHERE fruitname = fruitrecord.fruitname ; newcode := newcode + 1 ; END LOOP ; RETURN 1; END; ' LANGUAGE plpgsql; SELECT reorder_fruitcode(); SELECT * FROM fruittable ORDER BY fruitname ; fruitcode | fruitname ---+--- 1 | Apple 2 | Orange 3 | Pear 4 | Tomato (4 rows) Voila. Regards, Christoph ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] people who buy A, also buy C, D, E
Dan Langille wrote: > > The goal of my query is: given a book, what did other people who > bought this book also buy? I plan the list the 5 most popular such > books. In reality, this isn't about books, but that makes it easier > to understand I think. > > We have a table of customer_id (watch_list_id) and book_id > (element_id). > > freshports.org=# \d watch_list_element > Table "public.watch_list_element" > Column | Type | Modifiers > ---+-+--- > watch_list_id | integer | not null > element_id| integer | not null > Indexes: > "watch_list_element_pkey" primary key, btree (watch_list_id, > element_id) > "watch_list_element_element_id" btree (element_id) > Foreign-key constraints: > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > UPDATE CASCADE ON DELETE CASCADE > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > CASCADE ON DELETE CASCADE > > freshports.org=# > > I have a query which returns the needed results: > > SELECT W.element_id >FROM watch_list_element W > WHERE w.watch_list_id in (select watch_list_id from > watch_list_element where element_id = 54968) >GROUP BY W.element_id >ORDER BY count(W.watch_list_id) DESC > LIMIT 5; > > But performance is an issue here. So I'm planning to calculate all > the possible values and cache them. That is, given each element_id in > a watch_list, what are the top 5 element_id values on all the lists > on which the original element_id appears? > > I'm having trouble constructing the query. I'm not even sure I can > do this in one select, but that would be nice. Examples and clues > are appreciated. > > Any ideas? > > Thank you. > -- Just two ideas. 1) Older Postgres versions are notorious for being slow on "IN" clauses. Does this one (untested) perform better: SELECT W.element_id, count(W.watch_list_id) FROM watch_list_element W WHERE EXISTS (SELECT * FROM watch_list_element E WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) GROUP BY W.element_id ORDER BY 2 DESC LIMIT 5; 2) I suspect calculating all possible values would require time and an enormous cache buffer in size as well as re-calculating pretty often. So my approach would be trying to tune the query before introducing cached results. HTH Regards, Christoph ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] people who buy A, also buy C, D, E
On 26 Apr 2005 at 14:24, Christoph Haller wrote: > Dan Langille wrote: > > > > The goal of my query is: given a book, what did other people who > > bought this book also buy? I plan the list the 5 most popular such > > books. In reality, this isn't about books, but that makes it easier > > to understand I think. > > > > We have a table of customer_id (watch_list_id) and book_id > > (element_id). > > > > freshports.org=# \d watch_list_element > > Table "public.watch_list_element" > > Column | Type | Modifiers > > ---+-+--- > > watch_list_id | integer | not null > > element_id| integer | not null > > Indexes: > > "watch_list_element_pkey" primary key, btree (watch_list_id, > > element_id) > > "watch_list_element_element_id" btree (element_id) > > Foreign-key constraints: > > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > > UPDATE CASCADE ON DELETE CASCADE > > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > > CASCADE ON DELETE CASCADE > > > > freshports.org=# > > > > I have a query which returns the needed results: > > > > SELECT W.element_id > >FROM watch_list_element W > > WHERE w.watch_list_id in (select watch_list_id from > > watch_list_element where element_id = 54968) > >GROUP BY W.element_id > >ORDER BY count(W.watch_list_id) DESC > > LIMIT 5; > > > > But performance is an issue here. So I'm planning to calculate all > > the possible values and cache them. That is, given each element_id > > in a watch_list, what are the top 5 element_id values on all the > > lists on which the original element_id appears? > > > > I'm having trouble constructing the query. I'm not even sure I can > > do this in one select, but that would be nice. Examples and clues > > are appreciated. > > > > Any ideas? > > > > Thank you. > > -- > > Just two ideas. > > 1) Older Postgres versions are notorious for being slow > on "IN" clauses. > Does this one (untested) perform better: > > SELECT W.element_id, count(W.watch_list_id) > FROM watch_list_element W > WHERE EXISTS > (SELECT * FROM watch_list_element E > WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) > GROUP BY W.element_id ORDER BY 2 DESC LIMIT 5; I'm on PostgreSQL 7.4.7: freshports.org=# explain analyse freshports.org-# SELECT W.element_id, count(W.watch_list_id) freshports.org-# FROM watch_list_element W freshports.org-# WHERE EXISTS freshports.org-# (SELECT * FROM watch_list_element E freshports.org(# WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) freshports.org-# GROUP BY W.element_id freshports.org-# ORDER BY 2 DESC freshports.org-# LIMIT 5; QUERY PLAN -- -- - Limit (cost=417905.49..417905.51 rows=5 width=8) (actual time=3142.480..3142.528 rows=5 loops=1) -> Sort (cost=417905.49..417908.08 rows=1033 width=8) (actual time=3142.471..3142.486 rows=5 loops=1) Sort Key: count(watch_list_id) -> HashAggregate (cost=417851.20..417853.78 rows=1033 width=8) (actual time=3074.170..3112.294 rows=7338 loops=1) -> Seq Scan on watch_list_element w (cost=0.00..417506.76 rows=6 width=8) (actual time=0.129..2619.989 rows=94018 loops=1) Filter: (subplan) SubPlan -> Index Scan using watch_list_element_pkey on watch_list_element e (cost=0.00..3.02 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=137776) Index Cond: (($0 = watch_list_id) AND (element_id = 54968)) Total runtime: 3143.304 ms (10 rows) freshports.org=# Compare that to the original query: freshports.org=# explain analyse freshports.org-# SELECT W.element_id freshports.org-#FROM watch_list_element W freshports.org-# WHERE w.watch_list_id in (select watch_list_id from freshports.org(# watch_list_element where element_id = 54968) freshports.org-#GROUP BY W.element_id freshports.org-#ORDER BY count(W.watch_list_id) DESC freshports.org-# LIMIT 5; QUERY PLAN -- -- Limit (cost=1174.26..1174.28 rows=5 width=8) (actual time=1786.628..1786.675 rows=5 loops=1) -> Sort (cost=1174.26..1176.16 rows=758 width=8) (actual time=1786.618..1786.634 rows=5 loops=1) Sort Key: count(w.watch_list_id) -> HashAggregate (cost=1136.11..1138.01 rows=758 width=8) (actual time=1718.439..1756.290 rows=7338 loops=1) ->
Re: [SQL] people who buy A, also buy C, D, E
O Christoph Haller έγραψε στις Apr 26, 2005 : > Dan Langille wrote: > > > > The goal of my query is: given a book, what did other people who > > bought this book also buy? I plan the list the 5 most popular such > > books. In reality, this isn't about books, but that makes it easier > > to understand I think. > > > > We have a table of customer_id (watch_list_id) and book_id > > (element_id). > > > > freshports.org=# \d watch_list_element > > Table "public.watch_list_element" > > Column | Type | Modifiers > > ---+-+--- > > watch_list_id | integer | not null > > element_id| integer | not null > > Indexes: > > "watch_list_element_pkey" primary key, btree (watch_list_id, > > element_id) > > "watch_list_element_element_id" btree (element_id) > > Foreign-key constraints: > > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > > UPDATE CASCADE ON DELETE CASCADE > > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > > CASCADE ON DELETE CASCADE > > > > freshports.org=# > > > > I have a query which returns the needed results: > > > > SELECT W.element_id > >FROM watch_list_element W > > WHERE w.watch_list_id in (select watch_list_id from > > watch_list_element where element_id = 54968) > >GROUP BY W.element_id > >ORDER BY count(W.watch_list_id) DESC > > LIMIT 5; > > > > But performance is an issue here. So I'm planning to calculate all > > the possible values and cache them. That is, given each element_id in > > a watch_list, what are the top 5 element_id values on all the lists > > on which the original element_id appears? > > > > I'm having trouble constructing the query. I'm not even sure I can > > do this in one select, but that would be nice. Examples and clues > > are appreciated. > > > > Any ideas? > > > > Thank you. > > -- > > Just two ideas. > > 1) Older Postgres versions are notorious for being slow > on "IN" clauses. > Does this one (untested) perform better: > > SELECT W.element_id, count(W.watch_list_id) > FROM watch_list_element W > WHERE EXISTS > (SELECT * FROM watch_list_element E > WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) > GROUP BY W.element_id > ORDER BY 2 DESC > LIMIT 5; > > 2) I suspect calculating all possible values would require time and > an enormous cache buffer in size as well as re-calculating pretty often. > So my approach would be trying to tune the query before introducing > cached results. AFAIK, problems like this fall into the "Data Mining" field, and often their solution go beyond some DB arrangments. A little research wouldn't hurt, IMO. > > HTH > > Regards, Christoph > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > -- -Achilleus ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] people who buy A, also buy C, D, E
Dan Langille wrote: > > On 26 Apr 2005 at 14:24, Christoph Haller wrote: > > > Dan Langille wrote: > > > > > > The goal of my query is: given a book, what did other people who > > > bought this book also buy? I plan the list the 5 most popular such > > > books. In reality, this isn't about books, but that makes it easier > > > to understand I think. > > > > > > We have a table of customer_id (watch_list_id) and book_id > > > (element_id). > > > > > > freshports.org=# \d watch_list_element > > > Table "public.watch_list_element" > > > Column | Type | Modifiers > > > ---+-+--- > > > watch_list_id | integer | not null > > > element_id| integer | not null > > > Indexes: > > > "watch_list_element_pkey" primary key, btree (watch_list_id, > > > element_id) > > > "watch_list_element_element_id" btree (element_id) > > > Foreign-key constraints: > > > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON > > > UPDATE CASCADE ON DELETE CASCADE > > > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE > > > CASCADE ON DELETE CASCADE > > > > > > freshports.org=# > > > > > > I have a query which returns the needed results: > > > > > > SELECT W.element_id > > >FROM watch_list_element W > > > WHERE w.watch_list_id in (select watch_list_id from > > > watch_list_element where element_id = 54968) > > >GROUP BY W.element_id > > >ORDER BY count(W.watch_list_id) DESC > > > LIMIT 5; > > > > > > But performance is an issue here. So I'm planning to calculate all > > > the possible values and cache them. That is, given each element_id > > > in a watch_list, what are the top 5 element_id values on all the > > > lists on which the original element_id appears? > > > > > > I'm having trouble constructing the query. I'm not even sure I can > > > do this in one select, but that would be nice. Examples and clues > > > are appreciated. > > > > > > Any ideas? > > > > > > Thank you. > > > -- > > > > Just two ideas. > > > > 1) Older Postgres versions are notorious for being slow > > on "IN" clauses. > > Does this one (untested) perform better: > > > > SELECT W.element_id, count(W.watch_list_id) > > FROM watch_list_element W > > WHERE EXISTS > > (SELECT * FROM watch_list_element E > > WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id) > > GROUP BY W.element_id ORDER BY 2 DESC LIMIT 5; > > I'm on PostgreSQL 7.4.7: > > freshports.org=# explain analyse > freshports.org-# SELECT W.element_id, count(W.watch_list_id) > freshports.org-# FROM watch_list_element W > freshports.org-# WHERE EXISTS > freshports.org-# (SELECT * FROM watch_list_element E > freshports.org(# WHERE E.element_id = 54968 AND W.watch_list_id = > E.watch_list_id) > freshports.org-# GROUP BY W.element_id > freshports.org-# ORDER BY 2 DESC > freshports.org-# LIMIT 5; > >QUERY PLAN > -- > -- > - > Limit (cost=417905.49..417905.51 rows=5 width=8) (actual > time=3142.480..3142.528 rows=5 loops=1) >-> Sort (cost=417905.49..417908.08 rows=1033 width=8) (actual > time=3142.471..3142.486 rows=5 loops=1) > Sort Key: count(watch_list_id) > -> HashAggregate (cost=417851.20..417853.78 rows=1033 > width=8) (actual time=3074.170..3112.294 rows=7338 loops=1) >-> Seq Scan on watch_list_element w > (cost=0.00..417506.76 rows=6 width=8) (actual > time=0.129..2619.989 rows=94018 loops=1) > Filter: (subplan) > SubPlan >-> Index Scan using watch_list_element_pkey > on watch_list_element e (cost=0.00..3.02 rows=1 width=8) (actual > time=0.011..0.011 rows=1 loops=137776) > Index Cond: (($0 = watch_list_id) AND > (element_id = 54968)) > Total runtime: 3143.304 ms > (10 rows) > > freshports.org=# > > Compare that to the original query: > > freshports.org=# explain analyse > freshports.org-# SELECT W.element_id > freshports.org-#FROM watch_list_element W > freshports.org-# WHERE w.watch_list_id in (select watch_list_id > from > freshports.org(# watch_list_element where element_id = 54968) > freshports.org-#GROUP BY W.element_id > freshports.org-#ORDER BY count(W.watch_list_id) DESC > freshports.org-# LIMIT 5; > > QUERY PLAN > -- > -- > > Limit (cost=1174.26..1174.28 rows=5 width=8) (actual > time=1786.628..1786.675 rows=5 loops=1) >-> Sort (cost=1174.26..1176.16 rows=758 width=8) (actual > time=1786.618..1786.634 rows=5 loops=1) > Sort Key: count(w.watch_list_id) > -> HashAggregate (co
Re: [SQL] UPDATE WITH ORDER BY
Thanksyou and Franz for your help. Simple and efficient... I was blind The plpgsql "for" is the perfect solution It was great. Have a nice week!!! Cheers, Rodrigo Carvalhaes Christoph Haller wrote: Rodrigo Carvalhaes wrote: Hi Guys! I need to make an UPDATE on a column reordering it with a sequence using order by a description. Confusing??? Well.. Let me give an example... Today, my table it's organized like this: Code / Description 9 Orange 15 Apple 1 Pear 3 Tomato I wanna to reorganize (reordering the code from 1 to ... ordering by description) Code / Description 1 Apple 2 Orange 3 Pear 4 Tomato I created a sequence but I am having no succes to use it because UPDATE don't accept ORDER BY CLAUSE. The "ideal SQL" is UPDATE table SET code = nextval('sequence') ORDER BY description I searched a lot on the NET without ant "tip" for my case. It's a very simple need but I am not able to solve it... Anyone knows how I can do it? Cheers, -- Rodrigo Carvalhaes I doubt this can be done by a single SQL command. My approach is a function. I did: CREATE TABLE fruittable( fruitcode INTEGER, fruitname TEXT ); INSERT INTO fruittable VALUES( 9, 'Orange' ); INSERT INTO fruittable VALUES( 15, 'Apple' ); INSERT INTO fruittable VALUES( 1, 'Pear' ); INSERT INTO fruittable VALUES( 3, 'Tomato' ); SELECT * FROM fruittable ORDER BY fruitname ; fruitcode | fruitname ---+--- 15 | Apple 9 | Orange 1 | Pear 3 | Tomato CREATE OR REPLACE FUNCTION reorder_fruitcode() RETURNS INTEGER AS ' DECLARE newcode INTEGER ; fruitrecord RECORD ; BEGIN newcode := 1 ; FOR fruitrecord IN SELECT * FROM fruittable ORDER BY fruitname LOOP RAISE NOTICE ''fruitname is %'', fruitrecord.fruitname ; UPDATE fruittable SET fruitcode = newcode WHERE fruitname = fruitrecord.fruitname ; newcode := newcode + 1 ; END LOOP ; RETURN 1; END; ' LANGUAGE plpgsql; SELECT reorder_fruitcode(); SELECT * FROM fruittable ORDER BY fruitname ; fruitcode | fruitname ---+--- 1 | Apple 2 | Orange 3 | Pear 4 | Tomato (4 rows) Voila. Regards, Christoph ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Abraço, Rodrigo Carvalhaes DBA PostgreSQL Moderador grupo siga-br -- Esta mensagem foi verificada pelo sistema de antivírus e acredita-se estar livre de perigo.
[SQL] several questions about R-tree index
According to the manual at: http://www.postgresql.org/docs/7.4/static/functions-geometry.html " The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is involved in a comparison using one of these operators: <<, &<, &>, >>, @, ~=, && (Refer to Section 9.9 about the meaning of these operators.)" Shouldn't the ~ (contains) operator be included also? Isn't ~ the commutator of @ ? I am considering using R-tree's for other-than-geometric purposes. Here is the story: I have string data (column smiles) which represents chemical structures. Comparing strings for equality works perfectly using an ordinary B-tree index. But locating chemical substructures requires a more elaborate (read time-consuming) match function. I search for substructures like this: Select count(smiles) from structures where matches(smiles, subsmiles); where subsmiles is the search pattern desired. This is not a fast search - about 15 seconds for 0.25 million rows. To speed up this search I have a function (called fingerprint) which produces a bit string representation of some common and uncommon substructures. I have populated my table with column fp = fingerprint(smiles) So, this is about 5 times faster: Select count(smiles) from structures where (fp & fingerprint(subsmiles)) = fingerprint(subsmiles) & matches(smiles, subsmiles); The sequence scan using the first where-clause takes about 3 seconds and the final matches on the resulting subset is insignificant when the subset is small. But I think I might be able to do better (faster) using R-trees. Bitstrings can be thought of as "containing" when one bitstring has all the same bits set as another, even if it has other bits set too - this is the gist of the first where-clause above. Can I expect to be able to use R-tree's to do this? Will I simply have to define a new datatype and three R-tree functions (union, intersection and size). Will the use of the ~ (contains) operator cause the planner to consider using an index (this is my first question, way above)? I hope someone has had the patience and interest to read this far. Thanks, TJ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [SQL] several questions about R-tree index
On 2005-04-26, "TJ O'Donnell" <[EMAIL PROTECTED]> wrote: > But I think I might be able to do better (faster) using R-trees. > Bitstrings can be thought of as "containing" when one bitstring has all the > same bits set as another, even if it has other bits set too - this is the > gist of the first where-clause above. > > Can I expect to be able to use R-tree's to do this? You may want to use GIST instead - it is more flexible. > Will I simply have to define a new datatype and three R-tree functions > (union, intersection and size). > Will the use of the ~ (contains) operator cause the planner to consider > using an index (this is my first question, way above)? What the planner actually looks for is, having identified the specific operator and types, whether an index opclass exists for them, and if so whether an index using that opclass exists. The actual names of the operators are irrelevent. So for either rtree or GIST, all you need is to define your new datatype, with its associated operators, and create an operator class for it with appropriate support functions, and create indexes using that opclass. Once all that is done, the planner will consider using them. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] several questions about R-tree index
"TJ O'Donnell" <[EMAIL PROTECTED]> writes: > Shouldn't the ~ (contains) operator be included also? > Isn't ~ the commutator of @ ? Yeah, it looks like the documentation is in error: regression=# select amopopr::regoperator,amopstrategy from pg_amop where amopclaid in regression-# (select oid from pg_opclass where opcamid = 402); amopopr | amopstrategy -+-- <<(box,box) |1 &<(box,box) |2 &&(box,box) |3 &>(box,box) |4 >>(box,box) |5 ~=(box,box) |6 ~(box,box) |7 @(box,box) |8 <<(polygon,polygon) |1 &<(polygon,polygon) |2 &&(polygon,polygon) |3 &>(polygon,polygon) |4 >>(polygon,polygon) |5 ~=(polygon,polygon) |6 ~(polygon,polygon) |7 @(polygon,polygon) |8 (16 rows) Of course, the thing that leaps out here is that there are only two built-in opclasses for rtree. Not exactly confidence building. > I am considering using R-tree's for other-than-geometric purposes. I concur with Andrew's suggestion to consider GIST. GIST has its own issues, but at least there are people looking at it/using it/working on it. R-tree doesn't seem to have any user community that really cares. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])