[SQL] GiST index question: performance

2007-03-05 Thread Steve Midgley

Hi,

First off, can I say how much I love GiST? It's already solved a few 
problems for me that seemed impossible to solve in real-time queries. 
Thanks to everyone who works on that project!


I'm developing a geographic index based on a set of zip code 
boundaries. Points of interest (POI) will fall within some boundaries 
and not others. I need to search to find which POI are within a 
specified boundary.


I think have two options (see below) and I'm wondering if anyone has an 
opinion or experience as to whether one or the other will have 
substantially different performance characteristics. I can obviously 
test when I get that far, but I'd prefer to try the anticipated faster 
route first, if anyone has existing experience they can share:


1) Index a series of circles of NN radius around each boundary marker 
(lat/long point). Run a search on POI for those that fall within any of 
the specified circles.


2) Index a set of polygons that mark the "minimum area" around the 
boundary markers in question. Run a search on POI that fall within this 
single polygon.


The polygon will have more points, but there will be more circles to 
search - my understanding of GiST is limited so I'm not sure if there's 
a performance benefit to searching many circles or a few polygons.


My tables are of this size:

# of POI: 50,000
# of zip blocks (with and without regions): 217,000
# of zip blocks in a given city (and hence in a given polygon): ~5

Any thoughts or ideas?

Thank you,

Steve

p.s. I could use a GIS system alongside of Postgres but performance and 
efficiency are key to this system, and it seems to me that raw GiST 
indexed SQL queries are going to be fastest and create the lowest load 
on the server?



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [SQL] GiST index question: performance

2007-03-05 Thread Oleg Bartunov

On Mon, 5 Mar 2007, Steve Midgley wrote:


Hi,

First off, can I say how much I love GiST? It's already solved a few problems 
for me that seemed impossible to solve in real-time queries. Thanks to 
everyone who works on that project!


Thanks, Steve !



I'm developing a geographic index based on a set of zip code boundaries. 
Points of interest (POI) will fall within some boundaries and not others. I 
need to search to find which POI are within a specified boundary.


You POI is what we call ConeSearch query in astronomy.
Please, take a look on Q3C algorithm available from http://q3c.sf.net.
Some information 
http://www.sai.msu.su/~megera/wiki/SkyPixelization


This is what we use in our Virtual Observatory project and we're able to
work with 10^9 objects on moderate hardware. It doesn't use GiST but
special pixelization scheme allow to use standard Btree.



I think have two options (see below) and I'm wondering if anyone has an 
opinion or experience as to whether one or the other will have substantially 
different performance characteristics. I can obviously test when I get that 
far, but I'd prefer to try the anticipated faster route first, if anyone has 
existing experience they can share:


1) Index a series of circles of NN radius around each boundary marker 
(lat/long point). Run a search on POI for those that fall within any of the 
specified circles.


2) Index a set of polygons that mark the "minimum area" around the boundary 
markers in question. Run a search on POI that fall within this single 
polygon.


The polygon will have more points, but there will be more circles to search - 
my understanding of GiST is limited so I'm not sure if there's a performance 
benefit to searching many circles or a few polygons.


My tables are of this size:

# of POI: 50,000
# of zip blocks (with and without regions): 217,000
# of zip blocks in a given city (and hence in a given polygon): ~5

Any thoughts or ideas?

Thank you,

Steve

p.s. I could use a GIS system alongside of Postgres but performance and 
efficiency are key to this system, and it seems to me that raw GiST indexed 
SQL queries are going to be fastest and create the lowest load on the server?



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

  http://www.postgresql.org/about/donate



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [SQL] GiST index question: performance

2007-03-05 Thread Steve Midgley

Thanks Oleg - very interesting stuff you are working on.

You may recall I exchanged emails with you on openfts a little while 
ago - my ISP that manages my Pg SQL server is (in my interests) 
concerned about installing anything non-standard (read: unstable) onto 
their server. I was able to get them to install your TSearch2 b/c it's 
been proven many times, but I'm hesitant to even bring up Q3C since 
it's less widely deployed.


The search method I proposed in my first email is not totally accurate 
but just searching circles with radii using a GiST index and standard 
Pg circle datatypes seems like a "close enough" solution for me (as 
opposed to Q3C's conical search intersections with a spherical 
projection). I realize that at higher latitudes my circles will be 
elliptical but our needs are for approximations that are very fast 
rather than accurate and the radii being searched are small relative to 
the size of the sphere (I.e. when searching Nome, find everything in 
+/- 40 miles and especially don't return Anchorage POI)..


It's an end user database, so if the query takes 500ms, that's really 
too long. On the Q3C site, I see that your measure of speed is 
processing many, many rows in 20 hours, which is a whole different 
ballgame. :)


Do you have a thought as to whether GiST is going to be faster/more 
efficient with Pg standard types of polygons or circles? I suppose I 
should just test out both, and quit wasting your time. I'll certainly 
repost to the list with whatever I uncover.


I really do appreciate the help you've provided.

Sincerely,

Steve



At 12:21 PM 3/5/2007, you wrote:

On Mon, 5 Mar 2007, Steve Midgley wrote:


Hi,

First off, can I say how much I love GiST? It's already solved a few 
problems for me that seemed impossible to solve in real-time queries. 
Thanks to everyone who works on that project!


Thanks, Steve !



I'm developing a geographic index based on a set of zip code 
boundaries. Points of interest (POI) will fall within some boundaries 
and not others. I need to search to find which POI are within a 
specified boundary.


You POI is what we call ConeSearch query in astronomy.
Please, take a look on Q3C algorithm available from http://q3c.sf.net.
Some information http://www.sai.msu.su/~megera/wiki/SkyPixelization

This is what we use in our Virtual Observatory project and we're able 
to

work with 10^9 objects on moderate hardware. It doesn't use GiST but
special pixelization scheme allow to use standard Btree.



I think have two options (see below) and I'm wondering if anyone has 
an opinion or experience as to whether one or the other will have 
substantially different performance characteristics. I can obviously 
test when I get that far, but I'd prefer to try the anticipated 
faster route first, if anyone has existing experience they can share:


1) Index a series of circles of NN radius around each boundary marker 
(lat/long point). Run a search on POI for those that fall within any 
of the specified circles.


2) Index a set of polygons that mark the "minimum area" around the 
boundary markers in question. Run a search on POI that fall within 
this single polygon.


The polygon will have more points, but there will be more circles to 
search - my understanding of GiST is limited so I'm not sure if 
there's a performance benefit to searching many circles or a few 
polygons.


My tables are of this size:

# of POI: 50,000
# of zip blocks (with and without regions): 217,000
# of zip blocks in a given city (and hence in a given polygon): ~5

Any thoughts or ideas?

Thank you,

Steve

p.s. I could use a GIS system alongside of Postgres but performance 
and efficiency are key to this system, and it seems to me that raw 
GiST indexed SQL queries are going to be fastest and create the 
lowest load on the server?



---(end of 
broadcast)---

TIP 7: You can help support the PostgreSQL project by donating at

  http://www.postgresql.org/about/donate


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83