In response to Christopher Condit <con...@sdsc.edu>:

> > > I have two tables that are georeferenced (although in this case I'm
> > not using PostGIS) that I need to join.
> > > A ( lat | lon | depth | value)
> > > |A| = 1,100,000
> > >
> > > B ( lat | lon | attributes)
> > > |B| = 14,000,000
> > >
> > > A is a special case because the lat / lon values are all at half
> > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > > I've written a function in B called getSpecialLat(latitude) and
> > getSpecialLon(longitude)  to calculate the correct A latitude and built
> > an index on both functions.
> > >
> > > Here's the query that I'm trying, but it's rather slow:
> > > SELECT B.* FROM B,
> > > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> > AS foo
> > > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > foo.lon
> > >
> > > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6 
> > > width=16)"
> > > "        Index Cond: ((value > 0) AND (value < 2))"
> > > "        Filter: (depth = 0)"
> > > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424 width=422)"
> > > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) AND 
> > > (getSpecialLat((B.lat)::numeric) = A.lat))"
> > > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38 
> > > rows=84859 width=0)"
> > > "                    Index Cond: (getSpecialLon((B.lon)::numeric) = 
> > > A.lon)"
> > > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81 
> > > rows=84859 width=0)"
> > > "                    Index Cond: (getSpeicalLat((B.latitude)::numeric) = 
> > > A.lat)"
> > >
> > > Am I missing something in terms of speeding up this query?
> > 
> > I'd be interested to see if the query rewritten as a JOIN would be
> > faster.
> 
> I can write it like this:
> select b.*
> from b join a on (getwoalatitude(b.latitude::numeric) = a.lat
> and getwoalongitude(b.longitude::numeric) = a.lon)
> where 
> a.value > 0 and a.value < 2 and a.depth = 0
> 
> which results in this plan:
> "Nested Loop  (cost=1387.20..13152982.35 rows=1625767 width=422)"
> "  ->  Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897 
> width=16)"
> "        Index Cond: (depth = 0)"
> "        Filter: ((value > 0::numeric) AND (value < 2::numeric))"
> "  ->  Bitmap Heap Scan on b  (cost=1387.20..1686.37 rows=424 width=422)"
> "        Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND 
> (getSpecialLat((b.lat)::numeric) = a.lat))"
> "        ->  BitmapAnd  (cost=1387.20..1387.20 rows=424 width=0)"
> "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..672.15 
> rows=84859 width=0)"
> "                    Index Cond: (getSpecialLon((b.lon)::numeric) = a.lon)"
> "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..672.36 
> rows=84859 width=0)"
> "                    Index Cond: (getSpecialLat((b.lat)::numeric) = a.lat)"
> 
> However it's still taking ages to execute (over five minutes - I stopped it 
> before it finished)

Do you really expect that query to return 1.6M rows?  I doubt it, since
the subselect version only returns 1400.  If you do a vacuum analyze on
those two tables, does it speed either of the queries up?

Try "set enable_nestloop=off" and rerun the two queries.  If that helps
and analyze didn't then perhaps you need to increase the statistics
target on those two tables, or perhaps you'll just have to use that
set command to force the planner to avoid a nested loop.

Hope one of these helps.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to