[SQL] SEVEN cross joins?!?!?

2005-10-11 Thread Frank Bax
I have a table with only 434 rows in it.  Two important columns are 
itemid and locn.  Each item must be in one of seven locations.  We need 
to create a combo by selecting one item from each of seven locations; 
then determine which combo is the best according to our analysis (see 
below).


A subselect for items in a location looks something like:
(select * from suit_item where locn='Head' AND username='Walter' ORDER BY 
itemid LIMIT 10) as Head


One subselect for each location, cross join them all, and the query 
generates 10,000,000 combinations!  Without the LIMIT 10, there are 
78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for 
username='Walter' (the only user at the moment).  The large volume is 
causing a problem for my systems!  The ORDER BY itemid was added only so 
that the same 10 items were processed on the different computer systems I 
tested this query on.  There is only one item for the 7th locn in the 
database at the moment.


Every item has three key properties val1, val2 and val3.  For each combo, 
we calculate:

(Head.val1 + Arm.val1 + ... Leg.val1) AS Calc1
(Head.val2 + Arm.val2 + ... Leg.val2) AS Calc2
(Head.val3 + Arm.val3 + ... Leg.val3) AS Calc3
Each calculation has a pseudo max value coded so that values above this 
max are considered equal:

CASE WHEN calc1 > 70 then 70 else calc1 END as ordcalc1
CASE WHEN calc2 > 15 then 15 else calc2 END as ordcalc2
CASE WHEN calc3 > 60 then 60 else calc3 END as ordcalc3
Then I use:
ORDER BY ordcalc1 DESC, ordcalc2 DESC, ordcalc3 DESC

When I activated a couple of my brain cells, I realised that adding WHERE 
ordcalc1 = 70 AND ordcalc2 = 15 AND ordcalc3 = 60 after the cross joins 
might help things out a bit.  The 10,000,000 results were reduced 
significantly (8K - 30K with different samples).  Because the ordcalc 
aliases cannot be used in a WHERE clause, the entire expressions were 
repeated there.
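
For readers who don't follow the link below, here is a rough sketch of the 
shape of the generated query, cut down to three of the seven locations 
(Head, Arm and Leg come from the calc expressions above; the other location 
names, and the exact spelling of the repeated expressions, are only 
illustrative, and the final LIMIT 100 is from the follow-up later in the 
thread):

select Head.itemid, Arm.itemid, Leg.itemid,
       (Head.val1 + Arm.val1 + Leg.val1) as calc1,
       (Head.val2 + Arm.val2 + Leg.val2) as calc2,
       (Head.val3 + Arm.val3 + Leg.val3) as calc3,
       case when (Head.val1 + Arm.val1 + Leg.val1) > 70 then 70
            else (Head.val1 + Arm.val1 + Leg.val1) end as ordcalc1
       -- ordcalc2 and ordcalc3 are built the same way with caps 15 and 60
from
    (select * from suit_item where locn='Head' and username='Walter'
     order by itemid limit 10) as Head
    cross join
    (select * from suit_item where locn='Arm' and username='Walter'
     order by itemid limit 10) as Arm
    cross join
    (select * from suit_item where locn='Leg' and username='Walter'
     order by itemid limit 10) as Leg
where
    (Head.val1 + Arm.val1 + Leg.val1) >= 70
    -- equivalent to repeating the CASE expression and testing = 70;
    -- the val2 and val3 sums are tested against 15 and 60 the same way
order by ordcalc1 desc, ordcalc2 desc, ordcalc3 desc
limit 100;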


I used PHP to generate the query from pieces so that I could avoid lots of 
repetition in my code (the repetition is still there in the final query).  The 
query itself is about 6K when assembled.


After that big introduction, I have a couple of questions:

1) Did I approach the problem incorrectly?  Is there another way to 
approach this query so that fewer combos are analysed?


2) Are there any optimisations that could improve query speed?  Since the 
table is so small, I guessed that indexes wouldn't help.  I created an 
index on (username, itemid), but it doesn't get used.  Output of EXPLAIN 
ANALYSE found here:

http://www.execulink.com/~fbax/JOINS/

3) When the query is run on the P2 and P4 systems, I would expect to see a huge 
improvement in the time taken, but I don't (the P4 is only 35-40% faster).  Why?


i = number of items in LIMIT of subselect
rc = raw record count
rcw = record count with limits in WHERE clause
p2 = seconds for query to run on P2-400M pg=7.4.3 ram=32M
p4 = seconds for query to run on P4-2.8G pg=7.3.5 ram=1G

i=10 - rc=1,000,000 rcw=27,086 p2=81  p4=49
i=11 - rc=1,771,561 rcw=41,121 p2=141 p4=86
i=12 - rc=2,985,984 rcw=56,425 p2=216 p4=142
i=13 - rc=4,826,809 rcw=81,527 p2=??? p4=228

On the P2 system, the i=13 query returns an empty page with no errors on the server.

On the P4 system, i=15 results in:
PostgreSQL Error: 1 (ERROR: tuplestore: write failed)

I suppose this is a temp file - is it created in $DATA?  OpenBSD has 
several partitions, so I'll need to know which one is too small. 





Re: [SQL] SEVEN cross joins?!?!?

2005-10-11 Thread Richard Huxton

Frank Bax wrote:
I have a table with only 434 rows in it.  Two important columns are 
itemid and locn.  Each item must be in one of seven locations.  We 
need to create a combo by selecting one item from each of seven 
locations; then determine which combo is the best according to our 
analysis (see below).


A subselect for items in a location looks something like:
(select * from suit_item where locn='Head' AND username='Walter' ORDER 
BY itemid LIMIT 10) as Head


One subselect for each location, cross join them all and the query 
generates 10,000,000 combinations!  Without the LIMIT 10,  there are  
78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for 
username='Walter' (the only user at the moment).  The large volume is 
causing a problem for my systems!  The ORDER BY itemid was added only 
so that same 10 items were processed on different computer systems I 
tested this query on.  Only one item for 7th locn in the database at the 
moment.


Frank - it might just be me, but I've read your email twice and despite 
all the information I still don't have any idea what you are trying to do.


Are you saying that you WANT to generate a cross-join, score the 
millions of results and then pick the best 10? It's doing what you want, 
but you'd like it to be faster.


Or are you saying that you'd like to avoid the explosion in rows altogether?

In either case - I don't suppose you could provide a real example of the 
query, so we can see exactly what you're trying to do.

--
  Richard Huxton
  Archonet Ltd



[SQL] question re. count, group by, and having

2005-10-11 Thread Rick Schumeyer

The following query returns an error ("column c does not exist") in pg 8.0.3:

(The column 'state' is the two letter abbreviation for a US state)

-- get the number of rows for each state; list in descending order;
-- include only states with at least 6 rows
select state, count(state) as c from t group by state having
c > 5 order by c desc; -- gives error

If I leave the having clause out, I get the expected results:

select state, count(state) as c from t group by state order
by c desc; -- this works

Is this a bug or a feature?  I'm not sure why I
can use 'c' in the order by clause but not the having clause.
pg is much happier with the full "having count(state) > 5".
Will this cause count to be evaluated twice?

If it matters, state is varchar(2).


Re: [SQL] question re. count, group by, and having

2005-10-11 Thread Sean Davis
On 10/11/05 8:50 AM, Rick Schumeyer [EMAIL PROTECTED] wrote:

> The following query returns an error (column c does not exist) in pg 8.0.3:
> 
> (The column 'state' is the two letter abbreviation for a US state)
> 
> -- get the number of rows for each state; list in descending order; include
> only states with at least 6 rows
> 
> select state, count(state) as c from t group by state having c > 5 order by c
> desc; -- gives error
> 
> If I leave the having clause out, I get the expected results:
> 
> select state, count(state) as c from t group by state order by c desc; -- this
> works
> 
> Is this a bug or a feature?  I'm not sure why I can use 'c' in the order by
> clause but not the having clause.  pg is much happier with the full having
> count(state) > 5.  Will this cause count to be evaluated twice?

I think that postgres is smart enough to do the evaluation only once, though
this might be version-dependent; one of the gurus will have to comment
on which version (if there is a version dependence) first made this
improvement.  

Sean




Re: [SQL] question re. count, group by, and having

2005-10-11 Thread Tom Lane
Rick Schumeyer [EMAIL PROTECTED] writes:
> Is this a bug or a feature?  I'm not sure why I can use 'c' in the order by
> clause but not the having clause.  pg is much happier with the full having
> count(state) > 5.

Actually, referring to any of the output columns in any of the modifier
clauses is logically suspect.  Original SQL (back around 89 or so)
required ORDER BY items to be output column names, thus wiring in an
assumption that sorting happens after calculation of the output values,
but that is surely not true for any of the other clauses.  And it's
pretty bogus even for sorting, since you might wish to sort on a value
you're not displaying.

If we were working in a green field we'd doubtless get rid of the
output-column-reference feature entirely.  But for backward
compatibility's sake we're stuck with allowing ORDER BY items to
be simple output column names, per ancient versions of the SQL spec.
At one point or another somebody thought it a good idea to propagate
that special rule into GROUP BY; which in hindsight was an awful idea.
(It's not in the spec.  I'm not sure if this is just a Postgres-ism
or if we borrowed someone else's bad idea.)  But we're stuck with
supporting that odd case too, now.  We certainly aren't going to add
more.

> Will this cause count to be evaluated twice?

Recent versions of PG are smart enough to merge duplicate aggregates.
This isn't necessarily true for other forms of common subexpressions,
but it works for aggregate functions.
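
For example (not part of the original message), Rick's query can simply
spell the aggregate out in both clauses, and a recent PG will still compute
count(state) only once:

select state, count(state) as c
from t
group by state
having count(state) > 5
order by count(state) desc;
-- ORDER BY c DESC works too; either way the count is evaluated once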

regards, tom lane



[SQL] ichar

2005-10-11 Thread Judith Altamirano Figueroa
I moved from postgres 7.0.2 to 8.0.1.

In the 7.0.2 version I ran the following query, and it succeeded:

select ichar(letra_ascii) || '-' || substr('0' ||num_factura,length
(num_factura)+1,5) as factura from facturas 

but in 8.0.1 it outputs the following error:

ERROR:  function ichar(integer) does not exist
HINT:  No function matches the given name and argument types.
You may need to add explicit type casts.





[SQL] Returning NULL results?

2005-10-11 Thread Mario Splivalo
_SQL := 'SELECT TmessageId FROM tmpBids WHERE TphoneNumber = ' 
|| quote_literal(phoneNumber) 
|| ' AND Tbid = ' || aBid;
FOR rec IN EXECUTE _SQL LOOP bidCount := rec._qv; END LOOP;

This works ok as long as the SELECT query returns rows. Of course, if it
returns multiple rows the bidCount variable will hold just the last row's
value, but the design of the application is such that the query
returns only one row, or no rows.

Of course, if it returns no rows, I'm presented with an error, saying:

ERROR:  record rec has no field _qv

This is logical. My question would be: is there a way around this
without first doing SELECT COUNT(*) FROM tmpBids WHERE ..., and then, if
COUNT(*) is zero, setting bidCount := NULL, else (if COUNT(*) is not zero)
doing the SELECT from the beginning?

Mike
-- 
Mario Splivalo
Mob-Art
[EMAIL PROTECTED]

I can do it quick, I can do it cheap, I can do it well. Pick any two.





Re: [SQL] Difference from average

2005-10-11 Thread Richard Huxton

Neil Saunders wrote:

Hi all,

I'm developing a property rental database. One of the tables tracks
the price per week for different properties:

CREATE TABLE public.prices (
  id SERIAL,
  property_id INTEGER,
  start_date TIMESTAMP WITHOUT TIME ZONE,
  end_date TIMESTAMP WITHOUT TIME ZONE,
  price DOUBLE PRECISION NOT NULL
) WITH OIDS;

CREATE INDEX prices_idx ON public.prices
  USING btree (property_id);

I'd like to display the prices per property in a table, with each row
coloured a different shade; darker shades representing the more
expensive periods for that property. To do this, I propose to
calculate the percentage difference of each row's price from the
average for that property, so if for example I have two rows, one for
price=200 and one for price=300, I'd like to retrieve both records
along with a calculated field indicating that the rows are -20%,
+20% from the average, respectively.

I've started with the following query, but since I'm still learning
how PostgreSQL works, I'm confused as to the efficiency of the
following statement:

SELECT *, (price - (SELECT avg(price) from prices)) as diff FROM prices;


I'd personally write it something like:

SELECT
   prices.property_id,
   prices.price AS actual_price,
   averages.avg_price,
   (averages.avg_price - prices.price) AS price_diff,
   ((averages.avg_price - prices.price)/averages.avg_price) AS pc_diff
FROM
   prices,
   (SELECT property_id, avg(price) AS avg_price
      FROM prices GROUP BY property_id) AS averages
WHERE
   prices.property_id = averages.property_id
;

That's as much to do with how I think about the problem as to any 
testing though.
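
A quick arithmetic check of those expressions against Neil's own example
(prices of 200 and 300, so an average of 250); note the sign convention
comes out the reverse of the -20%/+20% in the question, so swap the
operands of the subtraction if you want it the other way round:

select (250 - 200) as price_diff_low,          -- +50
       (250 - 300) as price_diff_high,         -- -50
       (250 - 200) / 250.0 as pc_diff_low,     -- +0.20
       (250 - 300) / 250.0 as pc_diff_high;    -- -0.20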


--
  Richard Huxton
  Archonet Ltd



Re: [SQL] ichar

2005-10-11 Thread Tom Lane
Judith Altamirano Figueroa [EMAIL PROTECTED] writes:
> ERROR:  function ichar(integer) does not exist

[ digs in archives... ]  Looks like we renamed ichar() to chr() quite
some time ago.
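
If that's right, the query from the original post should run on 8.0.1 with
only that one substitution (the rest is copied from Judith's message):

select chr(letra_ascii) || '-' ||
       substr('0' || num_factura, length(num_factura)+1, 5) as factura
from facturas;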

regards, tom lane



Re: [SQL] Returning NULL results?

2005-10-11 Thread Richard Huxton

Mario Splivalo wrote:


Of course, if it returns no rows, I'm presented with an error, saying:

ERROR:  record rec has no field _qv

This is logical. My question would be: is there a way around this
without first doing SELECT COUNT(*) FROM tmpBids WHERE ..., and then, if
COUNT(*) is zero, setting bidCount := NULL, else (if COUNT(*) is not zero)
doing the SELECT from the beginning?


Could you check the FOUND variable? As in IF NOT FOUND THEN RETURN NULL
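
A minimal sketch of that approach in Mario's function (the AS _qv alias is
an assumption added here so the loop variable matches his rec._qv;
everything else follows his original fragment):

_SQL := 'SELECT TmessageId AS _qv FROM tmpBids WHERE TphoneNumber = '
     || quote_literal(phoneNumber)
     || ' AND Tbid = ' || aBid;
FOR rec IN EXECUTE _SQL LOOP
    bidCount := rec._qv;
END LOOP;
-- the FOR loop sets FOUND: true if it iterated at least once
IF NOT FOUND THEN
    bidCount := NULL;
END IF;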

--
  Richard Huxton
  Archonet Ltd



Re: [SQL] ichar

2005-10-11 Thread Richard Huxton

Judith Altamirano Figueroa wrote:

I moved from postgres 7.0.2 to 8.0.1,


If you only have one compatibility problem, I'd be surprised.


In the 7.0.2 version I ran the following query, and it succeeded:

select ichar(letra_ascii) || '-' || substr('0' ||num_factura,length
(num_factura)+1,5) as factura from facturas 


but in 8.0.1 it outputs the following error:

ERROR:  function ichar(integer) does not exist
HINT:  No function matches the given name and argument types.
You may need to add explicit type casts.


I'll point you to Tom Lane's thoughts on this back in 2000:
 http://archives.postgresql.org/pgsql-hackers/2000-09/msg00418.php

HTH
--
  Richard Huxton
  Archonet Ltd



Re: [SQL] SEVEN cross joins?!?!?

2005-10-11 Thread Frank Bax

At 08:29 AM 10/11/05, Richard Huxton wrote:


Frank Bax wrote:
I have a table with only 434 rows in it.  Two important columns are 
itemid and locn.  Each item must be in one of seven locations.  We 
need to create a combo by selecting one item from each of seven 
locations; then determine which combo is the best according to our 
analysis (see below).

A subselect for items in a location looks something like:
(select * from suit_item where locn='Head' AND username='Walter' ORDER BY 
itemid LIMIT 10) as Head
One subselect for each location, cross join them all and the query 
generates 10,000,000 combinations!  Without the LIMIT 10,  there are
78 * 37 * 91 * 81 * 99 * 47 * 1 = 98,981,901,018 results returned for 
username='Walter' (the only user at the moment).  The large volume is 
causing a problem for my systems!  The ORDER BY itemid was added only 
so that same 10 items were processed on different computer systems I 
tested this query on.  Only one item for 7th locn in the database at the 
moment.


Frank - it might just be me, but I've read your email twice and despite 
all the information I still don't have any idea what you are trying to do.


Are you saying that you WANT to generate a cross-join, score the millions 
of results and then pick the best 10? It's doing what you want, but you'd 
like it to be faster.


Or are you saying that you'd like to avoid the explosion in rows altogether?

In either case - I don't suppose you could provide a real example of the 
query, so we can see exactly what you're trying to do.



There is no best 10.  I currently limit each subselect to 10 items so 
that the query will actually run without crashing.  I would like to remove the 
ORDER BY itemid LIMIT 10 mentioned above.  At the end of the query I have 
a LIMIT 100 clause which will stay and produces a list of the best 100 combos.


Either of your solutions would be acceptable, since avoiding the 
explosion would also make the query faster.  Current calculations 
indicate that running the query without LIMIT 10 in the subselects would take 
years to process.


The query is filled with expressions.  I'm not sure I can shorten it 
without making typos or deleting something important, so I'll make it 
available on the web here:

http://www.execulink.com/~fbax/JOINS/
Results of explain analyse are also there. 





[SQL] pg, mysql comparison with group by clause

2005-10-11 Thread Rick Schumeyer
I'm not sure what I was thinking, but I tried the following query in pg:

SELECT * FROM t GROUP BY state;

pg returns an error.

Mysql, OTOH, returns the first row for each state.  (The first row with
AK, the first row with PA, etc.)  

I'm no SQL expert, but it seems to me that the pg behavior is correct, and
the mysql result is just weird.  Am I correct?




Re: [SQL] pg, mysql comparison with group by clause

2005-10-11 Thread Scott Marlowe
On Tue, 2005-10-11 at 16:12, Rick Schumeyer wrote:
> I'm not sure what I was thinking, but I tried the following query in pg:
> 
> SELECT * FROM t GROUP BY state;
> 
> pg returns an error.
> 
> Mysql, OTOH, returns the first row for each state.  (The first row with
> AK, the first row with PA, etc.)
> 
> I'm no SQL expert, but it seems to me that the pg behavior is correct, and
> the mysql result is just weird.  Am I correct?

Yes, you are correct.  The SQL standard is quite clear that in order to
appear in the select list, an entry must either be in the GROUP BY or be
operated on by an aggregate function.  PostgreSQL supports something like
MySQL's behaviour by way of the non-standard

select distinct on (fieldlist) fieldlist from ...

Since you don't know for sure which answer you'll get each time, it's
better to KNOW you're doing something that may not be reproducible than
to accidentally do it when your database SHOULD be throwing an error.
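
A sketch of that for the table in the question (the id column in the ORDER
BY is only a stand-in; whatever you sort on after state decides which row
is kept for each state):

select distinct on (state) *
from t
order by state, id;
-- returns one row per state: the first one according to the ORDER BY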

That's just one of many many things MySQL does that makes my head hurt. 
For more, search google for mysql gotchas



Re: [SQL] pg, mysql comparison with group by clause

2005-10-11 Thread Anthony Molinaro
You're 100% correct, this is a bug in mysql.

Sadly, they tout this as a feature!

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rick Schumeyer
Sent: Tuesday, October 11, 2005 5:12 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] pg, mysql comparison with group by clause



Re: [SQL] question re. count, group by, and having

2005-10-11 Thread Muralidharan Ramakrishnan
Hi

Logically, HAVING is evaluated after the GROUP BY, and it may reference only the columns in the GROUP BY list or aggregate functions.

select state, count(state) as c from t group by state having c > 5 

The above query groups only on state, so HAVING can reference only the column state or an aggregate such as count(state).

Regards,
R.Muralidharan

Re: [SQL] pg, mysql comparison with group by clause

2005-10-11 Thread Stephan Szabo
On Tue, 11 Oct 2005, Rick Schumeyer wrote:

> I'm not sure what I was thinking, but I tried the following query in pg:
> 
> SELECT * FROM t GROUP BY state;
> 
> pg returns an error.
> 
> Mysql, OTOH, returns the first row for each state.  (The first row with
> AK, the first row with PA, etc.)
> 
> I'm no SQL expert, but it seems to me that the pg behavior is correct, and
> the mysql result is just weird.  Am I correct?

In your case, it sounds like the mysql result is wrong. I believe SQL99
would allow it if the other columns were functionally dependent upon state
(as there would, by definition, only be one value for the other columns per
group).




Re: [SQL] SEVEN cross joins?!?!?

2005-10-11 Thread Greg Patnude
Aha! A gamer... playing with armor and hit points and things 




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Frank Bax
Sent: Tuesday, October 11, 2005 1:06 PM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] SEVEN cross joins?!?!?
