[SQL] SEVEN cross joins?!?!?
I have a table with only 434 rows in it. Two important columns are itemid and locn. Each item must be in one of seven locations. We need to create a combo by selecting one item from each of the seven locations, then determine which combo is the best according to our analysis (see below). A subselect for the items in a location looks something like:

  (SELECT * FROM suit_item WHERE locn='Head' AND username='Walter' ORDER BY itemid LIMIT 10) AS Head

One subselect for each location, cross join them all, and the query generates 10,000,000 combinations! Without the LIMIT 10, there are 78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for username='Walter' (the only user at the moment). The large volume is causing a problem for my systems! The ORDER BY itemid was added only so that the same 10 items were processed on the different computer systems I tested this query on. There is only one item for the 7th locn in the database at the moment.

Every item has three key properties: val1, val2 and val3. For each combo, we calculate:

  (Head.val1 + Arm.val1 + ... Leg.val1) AS calc1
  (Head.val2 + Arm.val2 + ... Leg.val2) AS calc2
  (Head.val3 + Arm.val3 + ... Leg.val3) AS calc3

Each calculation has a pseudo-max value coded so that values above this max are considered equal:

  CASE WHEN calc1 > 70 THEN 70 ELSE calc1 END AS ordcalc1
  CASE WHEN calc2 > 15 THEN 15 ELSE calc2 END AS ordcalc2
  CASE WHEN calc3 > 60 THEN 60 ELSE calc3 END AS ordcalc3

Then I use:

  ORDER BY ordcalc1 DESC, ordcalc2 DESC, ordcalc3 DESC

When I activated a couple of my brain cells, I realised that adding

  WHERE ordcalc1 = 70 AND ordcalc2 = 15 AND ordcalc3 = 60

after the cross joins might help things out a bit. The 10,000,000 results were reduced significantly (8K - 30K with different samples). Because an ordcalc alias cannot be used in a WHERE clause, the entire CASE expression had to be repeated. I used PHP to generate the query from pieces so that I could avoid lots of repetition in coding (the repetition is still there in the final query).
The query itself is about 6K when assembled. After that big introduction, I have a couple of questions:

1) Did I approach the problem incorrectly? Is there another way to approach this query so that fewer combos are analysed?

2) Are there any optimisations that could improve query speed? Since the table is so small, I guessed that indexes wouldn't help. I created an index on (username, itemid), but it doesn't get used. Output of EXPLAIN ANALYSE found here: http://www.execulink.com/~fbax/JOINS/

3) When run on P2 and P4 systems, I would expect to see a huge improvement in the time taken to process the query, but I don't (only 35-40% better)?

  i   = number of items in LIMIT of subselect
  rc  = raw record count
  rcw = record count with limits in WHERE clause
  p2  = seconds for query to run on P2-400M pg=7.4.3 ram=32M
  p4  = seconds for query to run on P4-2.8G pg=7.3.5 ram=1G

  i=10 - rc=1,000,000 rcw=27,086 p2=81  p4=49
  i=11 - rc=1,771,561 rcw=41,121 p2=141 p4=86
  i=12 - rc=2,985,984 rcw=56,425 p2=216 p4=142
  i=13 - rc=4,826,809 rcw=81,527 p2=??? p4=228

On the P2 system the i=13 query returns an empty page with no errors on the server. On the P4 system, i=15 results in:

  PostgreSQL Error: 1 (ERROR: tuplestore: write failed)

I suppose this is a temp file - is it created in $DATA? OpenBSD has several partitions, so I'll need to know which one is too small.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
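[Editor's note: the full 6K query is only available at the URL above. From the fragments in the message, the assembled query presumably has roughly the following shape; the location names beyond Head and Arm, and the exact column list, are assumptions. The key point is that the CASE expressions must be repeated verbatim in the WHERE clause, since the aliases cannot be referenced there.]

```sql
SELECT Head.itemid, Arm.itemid, /* ... one itemid per location ... */
       CASE WHEN (Head.val1 + Arm.val1 /* + ... */) > 70
            THEN 70 ELSE (Head.val1 + Arm.val1 /* + ... */) END AS ordcalc1
       /* ordcalc2 (capped at 15) and ordcalc3 (capped at 60) likewise */
FROM (SELECT * FROM suit_item WHERE locn='Head' AND username='Walter'
      ORDER BY itemid LIMIT 10) AS Head
CROSS JOIN
     (SELECT * FROM suit_item WHERE locn='Arm' AND username='Walter'
      ORDER BY itemid LIMIT 10) AS Arm
/* ... five more cross-joined subselects, one per remaining location ... */
WHERE CASE WHEN (Head.val1 + Arm.val1 /* + ... */) > 70
           THEN 70 ELSE (Head.val1 + Arm.val1 /* + ... */) END = 70
  /* and the ordcalc2 = 15 and ordcalc3 = 60 conditions, likewise expanded */
ORDER BY ordcalc1 DESC, ordcalc2 DESC, ordcalc3 DESC
LIMIT 100;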
Re: [SQL] SEVEN cross joins?!?!?
Frank Bax wrote:
> I have a table with only 434 rows in it. Two important columns are itemid
> and locn. Each item must be in one of seven locations. We need to create a
> combo by selecting one item from each of seven locations; then determine
> which combo is the best according to our analysis (see below). A subselect
> for items in a location looks something like:
>   (select * from suit_item where locn='Head' AND username='Walter' ORDER BY itemid LIMIT 10) as Head
> One subselect for each location, cross join them all and the query
> generates 10,000,000 combinations! Without the LIMIT 10, there are
> 78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for
> username='Walter' (the only user at the moment). The large volume is
> causing a problem for my systems! The ORDER BY itemid was added only so
> that the same 10 items were processed on the different computer systems I
> tested this query on. There is only one item for the 7th locn in the
> database at the moment.

Frank - it might just be me, but I've read your email twice and despite all the information I still don't have any idea what you are trying to do. Are you saying that you WANT to generate a cross-join, score the millions of results and then pick the best 10? It's doing what you want, but you'd like it to be faster. Or are you saying that you'd like to avoid the explosion in rows altogether?

In either case - I don't suppose you could provide a real example of the query, so we can see exactly what you're trying to do.

-- 
Richard Huxton
Archonet Ltd
[SQL] question re. count, group by, and having
The following query returns an error (column "c" does not exist) in pg 8.0.3. (The column state is the two-letter abbreviation for a US state.)

-- get the number of rows for each state; list in descending order;
-- include only states with at least 6 rows
select state, count(state) as c
from t
group by state
having c > 5
order by c desc;    -- gives error

If I leave the having clause out, I get the expected results:

select state, count(state) as c
from t
group by state
order by c desc;    -- this works

Is this a bug or a feature? I'm not sure why I can use c in the order by clause but not the having clause. pg is much happier with the full "having count(state) > 5". Will this cause count to be evaluated twice? If it matters, state is varchar(2).
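[Editor's note: for reference, the form that works is the one where the aggregate expression, not the alias, appears in HAVING; table and column names are as in the message.]

```sql
-- Works: HAVING must reference the aggregate expression, not the alias.
SELECT state, count(state) AS c
FROM t
GROUP BY state
HAVING count(state) > 5
ORDER BY c DESC;   -- ORDER BY, by contrast, may use the output alias
```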
Re: [SQL] question re. count, group by, and having
On 10/11/05 8:50 AM, Rick Schumeyer [EMAIL PROTECTED] wrote:
> The following query returns an error (column "c" does not exist) in pg 8.0.3:
> (The column 'state' is the two letter abbreviation for a US state)
> -- get the number of rows for each state; list in descending order;
> -- include only states with at least 6 rows
> select state, count(state) as c from t group by state having c > 5 order by c desc;  -- gives error
> If I leave the having clause out, I get the expected results:
> select state, count(state) as c from t group by state order by c desc;  -- this works
> Is this a bug or a feature? I'm not sure why I can use 'c' in the order by
> clause but not the having clause. pg is much happier with the full
> having count(state) > 5. Will this cause count to be evaluated twice?

I think that postgres is smart enough to do the evaluation only once. This might be version-dependent, though; one of the gurus will have to comment on which version (if any) first made this improvement.

Sean
Re: [SQL] question re. count, group by, and having
Rick Schumeyer [EMAIL PROTECTED] writes:
> Is this a bug or a feature? I'm not sure why I can use 'c' in the order by
> clause but not the having clause. pg is much happier with the full
> having count(state) > 5.

Actually, referring to any of the output columns in any of the modifier clauses is logically suspect. Original SQL (back around '89 or so) required ORDER BY items to be output column names, thus wiring in an assumption that sorting happens after calculation of the output values, but that is surely not true for any of the other clauses. And it's pretty bogus even for sorting, since you might wish to sort on a value you're not displaying.

If we were working in a green field we'd doubtless get rid of the output-column-reference feature entirely. But for backward compatibility's sake we're stuck with allowing ORDER BY items to be simple output column names, per ancient versions of the SQL spec. At one point or another somebody thought it a good idea to propagate that special rule into GROUP BY, which in hindsight was an awful idea. (It's not in the spec. I'm not sure if this is just a Postgres-ism or if we borrowed someone else's bad idea.) But we're stuck with supporting that odd case too, now. We certainly aren't going to add more.

> Will this cause count to be evaluated twice?

Recent versions of PG are smart enough to merge duplicate aggregates. This isn't necessarily true for other forms of common subexpressions, but it works for aggregate functions.

regards, tom lane
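[Editor's note: to illustrate the distinction Tom describes, using the table and column from earlier in the thread: ORDER BY accepts either an output alias or a full expression, and can even sort on a value that is not displayed, while HAVING always needs the expression itself.]

```sql
-- ORDER BY may use an output alias...
SELECT state, count(state) AS c FROM t GROUP BY state ORDER BY c DESC;

-- ...or an expression that is not in the output list at all:
SELECT state FROM t GROUP BY state ORDER BY count(state) DESC;

-- HAVING needs the aggregate expression itself:
SELECT state, count(state) AS c FROM t GROUP BY state HAVING count(state) > 5;
```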
[SQL] ichar
I moved from postgres 7.0.2 to 8.0.1. In the 7.0.2 version I ran the following query, and it succeeded:

select ichar(letra_ascii) || '-' || substr('0' || num_factura, length(num_factura)+1, 5) as factura
from facturas

but in 8.0.1 it outputs this error:

ERROR: function ichar(integer) does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
[SQL] Returning NULL results?
_SQL := 'SELECT TmessageId FROM tmpBids WHERE TphoneNumber = '
        || quote_literal(phoneNumber) || ' AND Tbid = ' || aBid;

FOR rec IN EXECUTE _SQL LOOP
    bidCount := rec._qv;
END LOOP;

This works ok as long as the SELECT query returns rows. Of course, if it returns multiple rows the bidCount variable will hold just the last row's value, but the design of the application is such that the query returns either exactly one row or no rows. If it returns no rows, I'm presented with an error saying:

ERROR: record "rec" has no field "_qv"

This is logical. My question would be: is there a way around this without first doing SELECT COUNT(*) FROM tmpBids WHERE ..., and then if COUNT(*) is zero THEN bidCount := NULL ELSE if COUNT(*) is not zero then do the SELECT from the beginning?

Mike
-- 
Mario Splivalo
Mob-Art
[EMAIL PROTECTED]

"I can do it quick, I can do it cheap, I can do it well. Pick any two."
Re: [SQL] Difference from average
Neil Saunders wrote:
> Hi all, I'm developing a property rental database. One of the tables tracks
> the price per week for different properties:
>
> CREATE TABLE public.prices (
>   id SERIAL,
>   property_id INTEGER,
>   start_date TIMESTAMP WITHOUT TIME ZONE,
>   end_date TIMESTAMP WITHOUT TIME ZONE,
>   price DOUBLE PRECISION NOT NULL
> ) WITH OIDS;
>
> CREATE INDEX prices_idx ON public.prices USING btree (property_id);
>
> I'd like to display the prices per property in a table, with each row
> coloured different shades; darker shades representing the more expensive
> periods for that property. To do this, I propose to calculate the
> percentage difference of each row's price from the average for that
> property. So if, for example, I have two rows, one for price=200 and one
> for price=300, I'd like to retrieve both records along with a calculated
> field indicating that the rows are -20% and +20% from the average,
> respectively. I've started with the following query, but since I'm still
> learning how PostgreSQL works, I'm confused as to the efficiency of the
> following statement:
>
> SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices;

I'd personally write it something like:

SELECT prices.property_id,
       prices.price AS actual_price,
       averages.avg_price,
       (prices.price - averages.avg_price) AS price_diff,
       ((prices.price - averages.avg_price) / averages.avg_price) AS pc_diff
FROM prices,
     (SELECT property_id, avg(price) AS avg_price
      FROM prices
      GROUP BY property_id) AS averages
WHERE prices.property_id = averages.property_id;

That's as much to do with how I think about the problem as to any testing though.

-- 
Richard Huxton
Archonet Ltd
Re: [SQL] ichar
Judith Altamirano Figueroa [EMAIL PROTECTED] writes:
> ERROR: function ichar(integer) does not exist

[ digs in archives... ] Looks like we renamed ichar() to chr() quite some time ago.

regards, tom lane
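[Editor's note: applying the rename to the query from the original message, it should then read:]

```sql
-- chr() replaces ichar() in later PostgreSQL releases
SELECT chr(letra_ascii) || '-' ||
       substr('0' || num_factura, length(num_factura) + 1, 5) AS factura
FROM facturas;
```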
Re: [SQL] Returning NULL results?
Mario Splivalo wrote:
> Of course, if it returns no rows, I'm presented with an error saying:
> ERROR: record "rec" has no field "_qv"
> This is logical. My question would be: is there a way around this without
> first doing SELECT COUNT(*) FROM tmpBids WHERE ..., and then if COUNT(*)
> is zero THEN bidCount := NULL ELSE if COUNT(*) is not zero then do the
> SELECT from the beginning?

Could you check the FOUND variable? As in:

IF NOT FOUND THEN
    RETURN NULL;
END IF;

-- 
Richard Huxton
Archonet Ltd
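[Editor's note: a sketch of how Richard's suggestion might fit the loop from the original post; variable names follow the fragments in the thread. In PL/pgSQL, a FOR loop sets FOUND to true if it iterated at least one row, so the check belongs immediately after the loop.]

```sql
-- Hypothetical reworking of the original loop.
FOR rec IN EXECUTE _SQL LOOP
    bidCount := rec._qv;
END LOOP;

IF NOT FOUND THEN      -- the loop never iterated: no matching rows
    bidCount := NULL;
END IF;
```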
Re: [SQL] ichar
Judith Altamirano Figueroa wrote:
> I moved from postgres 7.0.2 to 8.0.1,

If you only have one compatibility problem, I'd be surprised.

> in the 7.0.2 version I ran the following query, and it succeeded:
> select ichar(letra_ascii) || '-' || substr('0' || num_factura, length(num_factura)+1, 5) as factura from facturas
> but in 8.0.1 it outputs this error:
> ERROR: function ichar(integer) does not exist

I'll point you to Tom Lane's thoughts on this back in 2000:
http://archives.postgresql.org/pgsql-hackers/2000-09/msg00418.php

HTH
-- 
Richard Huxton
Archonet Ltd
Re: [SQL] SEVEN cross joins?!?!?
At 08:29 AM 10/11/05, Richard Huxton wrote:
> Frank Bax wrote:
>> I have a table with only 434 rows in it. [...] There is only one item for
>> the 7th locn in the database at the moment.
>
> Frank - it might just be me, but I've read your email twice and despite all
> the information I still don't have any idea what you are trying to do. Are
> you saying that you WANT to generate a cross-join, score the millions of
> results and then pick the best 10? It's doing what you want, but you'd like
> it to be faster. Or are you saying that you'd like to avoid the explosion
> in rows altogether? In either case - I don't suppose you could provide a
> real example of the query, so we can see exactly what you're trying to do.

There is no "best 10". I currently limit each subselect to 10 items so that the query will actually run without crashing. I would like to remove the ORDER BY itemid LIMIT 10 mentioned above. At the end of the query I have a LIMIT 100 clause which will stay and produces a list of the best 100 combos. Either of your solutions would be acceptable, since avoiding the explosion would also make the query faster.

Current calculations indicate that running the query without LIMIT 10 in the subselects would take years to process. The query is filled with expressions. I'm not sure I can shorten it without making typos or deleting something important, so I'll make it available on the web here: http://www.execulink.com/~fbax/JOINS/ Results of explain analyse are also there.
[SQL] pg, mysql comparison with group by clause
I'm not sure what I was thinking, but I tried the following query in pg:

SELECT * FROM t GROUP BY state;

pg returns an error. Mysql, OTOH, returns the first row for each state. (The first row with AK, the first row with PA, etc.) I'm no SQL expert, but it seems to me that the pg behavior is correct, and the mysql result is just weird. Am I correct?
Re: [SQL] pg, mysql comparison with group by clause
On Tue, 2005-10-11 at 16:12, Rick Schumeyer wrote:
> I'm not sure what I was thinking, but I tried the following query in pg:
> SELECT * FROM t GROUP BY state;
> pg returns an error. Mysql, OTOH, returns the first row for each state.
> (The first row with AK, the first row with PA, etc.) I'm no SQL expert,
> but it seems to me that the pg behavior is correct, and the mysql result
> is just weird. Am I correct?

Yes, you are correct. The SQL standard is quite clear that in order to appear in the select list, an entry must either be in the group by or be operated upon by an aggregate function. PostgreSQL supports the same kind of behaviour by way of the non-standard

SELECT DISTINCT ON (field_list) field_list FROM ...

Since you don't know for sure which answer you'll get each time, it's better to KNOW you're doing something that may not be reproducible than to accidentally do it when your database SHOULD be throwing an error. That's just one of many many things MySQL does that makes my head hurt. For more, search google for "mysql gotchas".
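[Editor's note: a sketch of the DISTINCT ON form; the columns beyond state are hypothetical. Unlike the MySQL behaviour, the row chosen per group is well-defined, because the ORDER BY says which row wins.]

```sql
-- One row per state; the leading ORDER BY expression(s) must match
-- the DISTINCT ON expression(s).
SELECT DISTINCT ON (state) state, city, population
FROM t
ORDER BY state, population DESC;  -- picks the most populous row per state
```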
Re: [SQL] pg, mysql comparison with group by clause
You're 100% correct, this is a bug in mysql. Sadly, they tout this as a feature!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rick Schumeyer
Sent: Tuesday, October 11, 2005 5:12 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] pg, mysql comparison with group by clause

> I'm not sure what I was thinking, but I tried the following query in pg:
> SELECT * FROM t GROUP BY state;
> pg returns an error. Mysql, OTOH, returns the first row for each state.
> [...] Am I correct?
Re: [SQL] question re. count, group by, and having
Hi,

Logically, HAVING is executed after the GROUP BY, and it may reference only the columns in the GROUP BY or aggregate functions.

select state, count(state) as c from t group by state having c > 5

The above query groups only on state, so HAVING can be used only with the column state or the aggregate function count(state).

Regards,
R. Muralidharan

Rick Schumeyer [EMAIL PROTECTED] wrote:
> The following query returns an error (column "c" does not exist) in pg 8.0.3:
> (The column state is the two letter abbreviation for a US state)
> -- get the number of rows for each state; list in descending order;
> -- include only states with at least 6 rows
> select state, count(state) as c from t group by state having c > 5 order by c desc;  -- gives error
> If I leave the having clause out, I get the expected results:
> select state, count(state) as c from t group by state order by c desc;  -- this works
> Is this a bug or a feature? I'm not sure why I can use c in the order by
> clause but not the having clause. pg is much happier with the full
> having count(state) > 5. Will this cause count to be evaluated twice?
> If it matters, state is varchar(2).
Re: [SQL] pg, mysql comparison with group by clause
On Tue, 11 Oct 2005, Rick Schumeyer wrote:
> I'm not sure what I was thinking, but I tried the following query in pg:
> SELECT * FROM t GROUP BY state;
> pg returns an error. Mysql, OTOH, returns the first row for each state.
> (The first row with AK, the first row with PA, etc.) I'm no SQL expert,
> but it seems to me that the pg behavior is correct, and the mysql result
> is just weird. Am I correct?

In your case, it sounds like the mysql result is wrong. I believe SQL99 would allow it if the other columns were functionally dependent upon state (since by definition there'd then be only one value for each of the other columns per group).
Re: [SQL] SEVEN cross joins?!?!?
Aha! A gamer... playing with armor and hit points and things.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank Bax
Sent: Tuesday, October 11, 2005 1:06 PM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] SEVEN cross joins?!?!?

> There is no "best 10". I currently limit each subselect to 10 items so
> that the query will actually run without crashing. I would like to remove
> the ORDER BY itemid LIMIT 10 mentioned above. At the end of the query I
> have a LIMIT 100 clause which will stay and produces a list of the best
> 100 combos. Either of your solutions would be acceptable, since avoiding
> the explosion would also make the query faster. [...]