Re: [GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-30 Thread Ed Fialkowski
Hey guys, thanks so much for all the help.  I had never seen the HAVING
clause anywhere.  That solved so many problems.

As for speed, I do not have indexing (most of my sql experience so far has
been on fairly 'small' tables, which I guess would be considered tiny for
everyone else).

test=# VACUUM nei_area_val;
VACUUM
test=# ANALYZE nei_area_val;
ANALYZE

and... I think one of those two helped out quite a bit.  Here's an EXPLAIN
ANALYZE for one of my more complicated queries: (nei_area_val = huge,
nei_area = small)

test=# EXPLAIN ANALYZE SELECT nei_area.fips, AsText(nei_area.the_geom) as
fs_text_geom, nei_area.name, nei_area_val.strpollutantcode,
SUM(nei_area_val.dblemissionnumericvalue_ton) FROM nei_area INNER JOIN
nei_area_val ON nei_area.fips = nei_area_val.fips WHERE
nei_area_val.strpollutantcode='CO' AND the_geom &&
SetSRID('BOX3D(-100.00 40.00,-90.00 50.00)'::box3d, 4269)
and intersects(the_geom, SetSRID('BOX3D(-100.00 40.00,-90.00 50.00)'::box3d, 4269))
GROUP BY nei_area.fips, nei_area.the_geom,
nei_area.name, nei_area_val.strpollutantcode HAVING
SUM(nei_area_val.dblemissionnumericvalue_ton) > 500;

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=106998.22..107035.89 rows=39 width=3797) (actual time=4712.708..5743.313 rows=230 loops=1)
   Filter: (sum(dblemissionnumericvalue_ton) > 500::double precision)
   ->  Sort  (cost=106998.22..107003.49 rows=2108 width=3797) (actual time=4708.411..5330.771 rows=15679 loops=1)
         Sort Key: nei_area.fips, nei_area.the_geom, nei_area.name, nei_area_val.strpollutantcode
         ->  Hash Join  (cost=30627.47..103430.84 rows=2108 width=3797) (actual time=2555.057..3938.329 rows=15679 loops=1)
               Hash Cond: (nei_area_val.fips = nei_area.fips)
               ->  Seq Scan on nei_area_val  (cost=0.00..72346.21 rows=116288 width=25) (actual time=46.964..2446.264 rows=122885 loops=1)
                     Filter: (strpollutantcode = 'CO'::text)
               ->  Hash  (cost=30626.84..30626.84 rows=50 width=3781) (actual time=1193.834..1193.834 rows=415 loops=1)
                     ->  Seq Scan on nei_area  (cost=0.00..30626.84 rows=50 width=3781) (actual time=1038.950..1187.324 rows=415 loops=1)
                           Filter: ((the_geom && '010320AD100100050059C0444059C04940008056C04940008056C0444059C04440'::geometry) AND intersects(the_geom, '010320AD100100050059C0444059C04940008056C04940008056C0444059C04440'::geometry))
 Total runtime: 5762.061 ms
(12 rows)



5.7 seconds!  I can live with that! So, maybe it was just the vacuum I
needed?

Anyway, I don't have indexing, I'll read about it and try not to screw
anything up when I add it, but seriously thanks so much all of you!
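
(As a rough sketch of where to start, assuming the table and column names in
the query above; the spatial index requires PostGIS, and the names should be
checked against the real schema before running anything:

  -- index the filter/join columns on the big fact table
  CREATE INDEX nei_area_val_pollutant_fips_idx
      ON nei_area_val (strpollutantcode, fips);

  -- spatial index so the && bounding-box test can avoid a full scan
  CREATE INDEX nei_area_the_geom_gist
      ON nei_area USING gist (the_geom);

  -- refresh planner statistics afterwards
  ANALYZE nei_area_val;
  ANALYZE nei_area;

With only a few hundred matching rows in nei_area the spatial index may not
change much, but the composite index on the ~900,000-row table is usually the
bigger win.)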

-Ed


Re: [GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-29 Thread Alban Hertroys


On May 28, 2008, at 9:27 PM, edfialk wrote:


example rows from huge table (almost 900,000 rows):
fips, pollutant, value


That's not really huge in comparison with other pgsql databases.  
There are terabyte installations out there ;)



SELECT small.fips, small.geom, small.name, SUM(huge.value) from small
JOIN huge on huge.fips = small.fips WHERE (SUM(huge.value)) > 500 AND
huge.pollutant='co';

wonder if that makes sense.  Obviously, can't have an aggregate in
where clause, so I've tried a couple WHERE (SELECT) kind of things,
nothing working out too well.

So first, if anyone has any idea on the best way I can do a WHERE
(sum(huge.value) > 500)
or...
any ideas on how I could speed up the query, I would be so extremely
grateful.


That's what GROUP BY and HAVING are for:

SELECT fips, small.geom, small.name, SUM(huge.value)
FROM small
JOIN huge USING (fips)
WHERE huge.pollutant='co'
GROUP BY fips, small.geom, small.name
HAVING SUM(huge.value) > 500;

Guessing from your performance problem you may not have an index on  
huge.fips? And did you vacuum/analyse those tables anytime recently?
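
A minimal sketch of both, using the placeholder names from your example (the
real table names will differ):

  CREATE INDEX huge_fips_idx ON huge (fips);
  VACUUM ANALYZE huge;
  VACUUM ANALYZE small;

The index gives the planner the option of something better than a full
sequential scan for the join, and VACUUM ANALYZE refreshes the row estimates
it plans with.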


Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.







[GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-28 Thread edfialk
Hey everybody, I'm having an issue executing the query I want.  I've
let a couple queries run a good 20 minutes and it still hasn't
finished.  For my app, this is way too long to be useful, and I'm not
real sure what else I can do, let alone if my query is even what I
want.

So I'm really hoping the great internets will help me out.

example rows from huge table (almost 900,000 rows):
fips, pollutant, value
1, co, 0.1
1, co, 0.2
1, co, 0.3
1, so, 1.0
1, so, 2.0
2, co, 0.5
2, co, 0.3
etc.

example from small table (3233 rows)
fips, geom, name
1, some geometry, Some County, Some State
2, some geometry, Some Other County, Some State

Every fips in the small table has MULTIPLE rows in the huge table
(roughly 300 per).

So, what I need is all kinds of things, but to start (pseudo-query):

SELECT small.fips, small.geom, small.name, SUM(huge.value) from small
JOIN huge on huge.fips = small.fips WHERE (SUM(huge.value)) > 500 AND
huge.pollutant='co';

wonder if that makes sense.  Obviously, can't have an aggregate in
where clause, so I've tried a couple WHERE (SELECT) kind of things,
nothing working out too well.

So first, if anyone has any idea on the best way I can do a WHERE
(sum(huge.value) > 500)
or...
any ideas on how I could speed up the query, I would be so extremely
grateful.

Thanks in advance!
-Ed

P.S. Sorry, the cake was a lie.



Re: [GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-28 Thread Stephen Denne
edfialk wrote:
 SELECT small.fips, small.geom, small.name, SUM(huge.value) from small
 JOIN huge on huge.fips = small.fips WHERE (SUM(huge.value)) > 500 AND
 huge.pollutant='co';
 
 wonder if that makes sense.  Obviously, can't have an aggregate in
 where clause, so I've tried a couple WHERE (SELECT) kind of things,
 nothing working out too well.
 
 So first, if anyone has any idea on the best way I can do a WHERE
 (sum(huge.value) > 500)
 or...
 any ideas on how I could speed up the query, I would be so extremely
 grateful.

SELECT small.fips, small.geom, small.name, SUM(huge.value)
from small JOIN huge on huge.fips = small.fips
WHERE huge.pollutant='co'
GROUP BY small.fips, small.geom, small.name
HAVING SUM(huge.value) > 500;

Regards,
Stephen Denne





Re: [GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-28 Thread Gregory Stark
edfialk [EMAIL PROTECTED] writes:

 So, what I need is all kinds of things, but to start (pseudo-query):

 SELECT small.fips, small.geom, small.name, SUM(huge.value) from small
 JOIN huge on huge.fips = small.fips WHERE (SUM(huge.value)) > 500 AND
 huge.pollutant='co';

 wonder if that makes sense.  Obviously, can't have an aggregate in
 where clause, so I've tried a couple WHERE (SELECT) kind of things,
 nothing working out too well.

For this case HAVING will suffice:

select small.*, sum(huge.value)
  from small
  join huge on (huge.fips = small.fips)
 where huge.pollutant='co'
 group by small.fips, small.geom, small.name
having sum(huge.value) > 500

But in more complex cases you may have to use a subquery and further where
clauses or even joins outside the subquery.

You could write this, for example, as:

select *
  from small join (
        select fips, sum(huge.value) as sum
          from huge
         where pollutant='co'
         group by fips
       ) as huge_sum using (fips)
 where huge_sum.sum > 500
 
Which may actually run faster. (Unfortunately Postgres doesn't use the
foreign-key relationship when planning, so it can't reorder the join and the
WHERE clause; it doesn't know that every huge record will have a matching
small record.)

 any ideas on how I could speed up the query, I would be so extremely
 grateful.

You could try an index on pollutant or fips,pollutant but with the numbers
you're talking about they probably won't help unless you have a *lot* of
different pollutants and not all that many records for each pollutant.
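
For example (just a sketch using the placeholder names from your post; whether
it gets used depends on how selective 'co' actually is):

  CREATE INDEX huge_pollutant_fips_idx ON huge (pollutant, fips);

If 'co' matches a large fraction of the 900,000 rows, the planner will likely
stick with a sequential scan anyway, which EXPLAIN will show.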

To get anything more you would have to post an EXPLAIN output and preferably
an EXPLAIN ANALYZE output if you can find a query which completes.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [GENERAL] small table, huge table, and a join = slow and tough query. cake inside!

2008-05-28 Thread Klint Gore

edfialk wrote:

So, what I need is all kinds of things, but to start (pseudo-query):

SELECT small.fips, small.geom, small.name, SUM(huge.value) from small
JOIN huge on huge.fips = small.fips WHERE (SUM(huge.value)) > 500 AND
huge.pollutant='co';

wonder if that makes sense.  Obviously, can't have an aggregate in
where clause, so I've tried a couple WHERE (SELECT) kind of things,
nothing working out too well.
So first, if anyone has any idea on the best way I can do a WHERE
(sum(huge.value) > 500)
  

See GROUP BY and HAVING


or...
any ideas on how I could speed up the query, I would be so extremely
grateful.

  

What columns are primary keys or indexed?

Run this and post what it says:

 vacuum;
 analyze;
 explain <insert query here>

If you can wait for the query to finish, change the last line to
explain analyze ...
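
With the placeholder names from the original post, that last line would look
something like this (just an illustration; substitute the real query):

 EXPLAIN ANALYZE
 SELECT small.fips, small.geom, small.name, SUM(huge.value)
   FROM small JOIN huge ON huge.fips = small.fips
  WHERE huge.pollutant = 'co'
  GROUP BY small.fips, small.geom, small.name
  HAVING SUM(huge.value) > 500;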


klint.

--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350

Ph: 02 6773 3789  
Fax: 02 6773 3266

EMail: [EMAIL PROTECTED]

