Gregory,
I have been using within() to do the same, and have found many of the same 
performance issues.  There are a couple of points that I think are useful:

1) the "&&" queries are generally fast, especially when using a GIST index, 
because it is a very effective index - nothing comparable exists (to my 
knowledge) for distance and within().
2) In my investigations it seems that MULTIPOLYGONS may be the culprit in 
slowing things down. That is, it is not the complexity of the geometry per se, 
but rather, the number of isolated geometries within a given shape column, as 
indicated by the command "select my_id, numGeometries(the_geom) from my_table 
group by my_id;".  I believe that if you are operating on shapes where the 
"numGeometries" command returns a value of 1, they will perform within() 
queries very quickly, however, when that number becomes considerably greater 
than 1, the queries bog down.  This occurs often when you have shoreline type 
geometries that are non-continuous, or geometries that have a number of "holes" 
in them.

I think that if you can insure that your geometries are single continuous 
entities that you can improve performance.  I have thought to also investigate 
the code that underlies the within() command, but have been unsuccessful in 
understanding the code.  Some optimization here would be very worthwhile.  If 
you were to be able to progress in this area, I would be interested in seeing 
what you can find, and perhaps contributing to the investigation.

I would be interested to know if anyone who knows more about this topic thinks 
my conclusions are correct.

Hope this is at least somewhat helpful,
r.b.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gregory 
Williamson
Sent: Thursday, July 05, 2007 8:06 AM
To: PostGIS Users Discussion; postgis-users@postgis.refractions.net
Subject: RE: [postgis-users] Large geometry issue

FWIW, this is PostgreSQL 8.2.4,  POSTGIS="1.2.1" GEOS="3.0.0rc4-CAPI-1.3.3" 
USE_STATS
on a linux box with 8 gigs o' RAM ...
GSW


-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Gregory Williamson
Sent: Thu 7/5/2007 3:09 AM
To: postgis-users@postgis.refractions.net
Subject: [postgis-users] Large geometry issue

Dear peoples,

I have a problem with a query that uses an absurdly large geometry (2530 in a 
single polygon). This is srid -1 (part of a large test of postgres vs some 
other database product). Everything has been vacuumed and analyzed.

The initial search to find candidates in a target table is quite fast:
catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE x.bbox && 
w.geometry AND x.id_as_int = 114672;
 count
-------
 13168
(1 row)

Time: 9.472 ms

Trying to get the list narrowed to geometries that are completely contained by 
the requested shape is slow:
catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE x.bbox && 
w.geometry AND distance(x.geometry,w.geometry) = 0 and x.id_as_int = 114672;
 count
-------
  1112
(1 row)

Time: 69277.780 ms

So I have two questions:
  a) anything better to use than "distance(x,y) = 0) ? I tried st_within -- it 
is about the same speed but returns no polys, which is strange to me, but I 
also haven't looked at these in detail yet. For example:
catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE x.bbox && 
w.geometry AND st_within(x.geometry,w.geometry) and x.id_as_int = 114672;
 count
-------
     0
(1 row)

Time: 1173.185 ms
(same results with st_within(w.geometry,x.geometry):
catest=# select count(*) from wtm_sub_cell w, order_line_item x WHERE x.bbox && 
w.geometry AND st_within(w.geometry,x.geometry) and x.id_as_int = 114672;
 count
-------
     0
(1 row)


  b) anything I can do to speed things up ? I have tried boosting work mem to 
16 megs (from 1) and it made no apparent difference.



I have a self contained test case that shows the same behavior -- the one large 
poly and all the candidates in another table. Apologies for the size; hopefully 
it's not been mangled in the transfers.

Explain analyze of the sample (the sequential is sensible since there is only 
one row in the table):
catest=# explain analyze select count(*) from wsc_candidates w, oli_req x WHERE 
w.geometry && x.bbox AND distance(w.geometry,x.oli_req_geom) > 0 AND 
x.oli_req_id = 114672;
                                                                      QUERY 
PLAN                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20.28..20.29 rows=1 width=0) (actual 
time=77232.858..77232.859 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..9.30 rows=4389 width=0) (actual 
time=6.389..77221.506 rows=12056 loops=1)
         Join Filter: (distance(w.geometry, x.oli_req_geom) > 0::double 
precision)
         ->  Seq Scan on oli_req x  (cost=0.00..1.01 rows=1 width=40602) 
(actual time=0.007..0.009 rows=1 loops=1)
               Filter: (oli_req_id = 114672)
         ->  Index Scan using wsc_c_spatial_ndx on wsc_candidates w  
(cost=0.00..8.27 rows=1 width=109) (actual time=0.022..25.991 rows=13168 
loops=1)
               Index Cond: (w.geometry && x.bbox)
               Filter: (w.geometry && x.bbox)
 Total runtime: 77232.901 ms
(9 rows)

Time: 77233.773 ms


And for the real thing:
catest=# explain analyze select count(*) from wtm_sub_cell w, order_line_item x 
WHERE w.geometry && x.bbox AND distance(w.geometry,x.geometry) = 0 AND 
x.id_as_int = 114672;
                                                                 QUERY 
PLAN                                           
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=141.83..141.84 rows=1 width=0) (actual 
time=77457.587..77457.588 rows=1 loops=1)
   ->  Nested Loop  (cost=5.99..141.83 rows=1 width=0) (actual 
time=15.682..77456.541 rows=1112 loops=1)
         Join Filter: (distance(w.geometry, x.geometry) = 0::double precision)
         ->  Index Scan using oli_id_ndx on order_line_item x  (cost=0.00..8.30 
rows=1 width=383) (actual time=0.012..0.018 rows=1 loops=1)
               Index Cond: (id_as_int = 114672)
         ->  Bitmap Heap Scan on wtm_sub_cell w  (cost=5.99..132.97 rows=32 
width=109) (actual time=2.988..21.796 rows=13168 loops=1)
               Filter: (w.geometry && x.bbox)
               ->  Bitmap Index Scan on wsc_geom_idx1  (cost=0.00..5.98 rows=32 
width=0) (actual time=2.828..2.828 rows=13168 loops=1)
                     Index Cond: (w.geometry && x.bbox)
 Total runtime: 77457.633 ms
(10 rows)

Time: 77458.458 ms


The tables involved by size:
catest=# select count(*) from wsc_candidates;
 count
-------
 13168
(1 row)

Time: 2.586 ms
catest=# select count(*) from oli_req;
 count
-------
     1
(1 row)

Time: 0.193 ms
catest=# select count(*) from wtm_sub_cell;
  count
---------
 6399928
(1 row)

Time: 1776.508 ms
catest=# select count(*) from order_line_item;
 count
--------
 395921
(1 row)

Time: 176.083 ms


Many thanks for your time and bandwidth!

Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information and must be protected in accordance with those 
provisions. Any unauthorized review, use, disclosure or distribution is 
prohibited. If you are not the intended recipient, please contact the sender by 
reply e-mail and destroy all copies of the original message.

(My corporate masters made me say this.)

_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to