I have a database with the following tables:

zip {
   zip_id bigserial,
   zip_cd varchar(6) -- this is indexed
}

city {
   city_id bigserial,
   city_name varchar(50)
}

zip_city {
   city_id bigint (FK to city table),
   zip_id bigint (FK to zip table),
   longitude numeric,
   latitude numeric
}

We want to find the zip_cd values for all zip codes within 50 miles of a
specified zip code and have this query so far (assume our given zip code is
64131):

   select
       zip_city.zip_id,
       gps_distance(zip_city.longitude::numeric,
zip_city.latitude::numeric, my_location.longitude::numeric,
my_location.latitude::numeric) AS distance
   from zip_city
   inner join (
       select
           longitude,
           latitude
       from zip
       inner join zip_city on (
           zip.zip_id = zip_city.zip_id
       )
       where zip_cd = '64131'
   ) my_location on (
       -- We use this as an approximation to filter out as many records as
possible before doing the gps_distance() function call
       -- It works well as a outer bounds to get the most obvious outside
zip_city values.
       abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)
       AND abs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)
   )

This runs in about 2 seconds on Windows and gives me 650 records - not bad.

When I add just a little bit:

   select
       zip_city.zip_id,
       gps_distance(zip_city.longitude::numeric,
zip_city.latitude::numeric, my_location.longitude::numeric,
my_location.latitude::numeric) AS distance
   from zip_city
   -->>>> THE ONLY DIFFERENCE IS THE NEXT 3 LINES <<<<--
   inner join zip on (
       zip_city.zip_id = zip.zip_id
   )
   -->>>> End of added code <<<<--
   inner join (
       select
           longitude,
           latitude
       from zip
       inner join zip_city on (
           zip.zip_id = zip_city.zip_id
       )
       where zip_cd = '64131'
   ) my_location on (
       abs(zip_city.longitude - my_location.longitude) <= (50.0 / 60.0)
       AND abs(zip_city.latitude - my_location.latitude) <= (50.0 / 60.0)
   )

Performance tanks with this query - it takes over 120 seconds (that is where
I set the timeout).  I read that a foreign key doesn't help performance and
my guess is that PostgreSQL isn't just joining on the 650 records but rather
on the over 800,000 records in the zip table (yes we have that many - we
include Canada zip codes which is 90% of the data - Blame Canada! ;).

Is there something I can do to boost performance?  I tried putting the first
query above in an inner join before joining with the zip table with the same
result.

BTW, on our Linux box the full query we run (which adds 3 more tables on the
whole operation along with more filtering on the zip table) finishes in
under 10 seconds.  Problem is our development is on Windows and this is a
real pain for developers to test.

Any assistance is appreciated.  I did look at the explain plans but I don't
see anything other than what I mentioned above.  I can include those if
necessary.

Thanks!
Aaron

--
==================================================================
  Aaron Bono
  Aranya Software Technologies, Inc.
  http://www.aranya.com
  http://codeelixir.com
==================================================================

Reply via email to