Re: [PERFORM] Help optimizing a slow index scan
I may be wrong but we in astronomy have several sky indexing schemes, which allows to effectively use classical btree index. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization for details. Sergei Koposov has developed Q3C contrib module for PostgreSQL 8.1+ and we use it with billiard size astronomical catalogs. Oleg On Fri, 17 Mar 2006, Michael Fuhr wrote: On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: Dan Harris [EMAIL PROTECTED] writes: Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are. The OpenGIS Simple Features Specification[1] is a step in that direction, no? PostGIS[2], MySQL[3], and Oracle Spatial[4] implement to varying degrees. With PostGIS you do have to add non-standard operators to a query's predicate to benefit from GiST indexes on spatial columns, but the rest of the query can be straight out of the SQL and OGC standards. [1] http://www.opengeospatial.org/docs/99-049.pdf [2] http://www.postgis.org/ [3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html [4] http://www.oracle.com/technology/products/spatial/index.html Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Help optimizing a slow index scan
Try contrib/btree_gist. I've tried that one, but for my case it didn't help much. The performance was almost equal or even slower than built-in btree. On Fri, 17 Mar 2006 08:53:44 -0700 Dan Harris [EMAIL PROTECTED] wrote: Dan Harris wrote: Markus Bertheau wrote: Have you tried using a GIST index on lat long? These things are meant for two-dimensional data, whereas btree doesn't handle two-dimensional data that well. How many rows satisfy either of the long / lat condition? According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. -Dan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster When I try to create a GIST index, I get the following error: create index eventgeo_lat_idx on eventgeo using GIST (lat); ERROR: data type double precision has no default operator class for access method gist HINT: You must specify an operator class for the index or define a default operator class for the data type. I'm not sure what a default operator class is, exactly.. -Dan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Evgeny Gridasov Software Engineer I-Free, Russia ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Help optimizing a slow index scan
On Fri, 17 Mar 2006, Evgeny Gridasov wrote: Try contrib/btree_gist. contrib/btree_gist does nothing more than built-in btree - it's just an support for multicolumn GiST indices. I've tried that one, but for my case it didn't help much. The performance was almost equal or even slower than built-in btree. On Fri, 17 Mar 2006 08:53:44 -0700 Dan Harris [EMAIL PROTECTED] wrote: Dan Harris wrote: Markus Bertheau wrote: Have you tried using a GIST index on lat long? These things are meant for two-dimensional data, whereas btree doesn't handle two-dimensional data that well. How many rows satisfy either of the long / lat condition? According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. -Dan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster When I try to create a GIST index, I get the following error: create index eventgeo_lat_idx on eventgeo using GIST (lat); ERROR: data type double precision has no default operator class for access method gist HINT: You must specify an operator class for the index or define a default operator class for the data type. I'm not sure what a default operator class is, exactly.. -Dan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Help optimizing a slow index scan
Markus Bertheau wrote: Have you tried using a GIST index on lat long? These things are meant for two-dimensional data, whereas btree doesn't handle two-dimensional data that well. How many rows satisfy either of the long / lat condition? According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. -Dan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Help optimizing a slow index scan
Dan Harris wrote: Markus Bertheau wrote: Have you tried using a GIST index on lat long? These things are meant for two-dimensional data, whereas btree doesn't handle two-dimensional data that well. How many rows satisfy either of the long / lat condition? According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. -Dan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster When I try to create a GIST index, I get the following error: create index eventgeo_lat_idx on eventgeo using GIST (lat); ERROR: data type double precision has no default operator class for access method gist HINT: You must specify an operator class for the index or define a default operator class for the data type. I'm not sure what a default operator class is, exactly.. -Dan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Help optimizing a slow index scan
On 3/16/06, Dan Harris [EMAIL PROTECTED] wrote: explain analyze select distinct eventmain.incidentid, eventmain.entrydate, eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy from eventmain, eventgeo where eventmain.incidentid = eventgeo.incidentid and ( long -104.998027962962 and long -104.985957781349 ) and ( lat 39.7075542720006 and lat 39.7186195832938 ) and eventmain.entrydate '2006-1-1 00:00' and eventmain.entrydate = '2006-3-17 00:00' order by eventmain.entrydate; As others will probably mention, effective queries on lot/long which is a spatial problem will require r-tree or gist. I don't have a lot of experience with exotic indexes but this may be the way to go. One easy optimization to consider making is to make an index on either (incidentid, entrydate) or (incident_id,long) which ever is more selective. This is 'yet another query' that would be fun to try out and tweak using the 8.2 upcoming row-wise comparison. merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help optimizing a slow index scan
On Fri, Mar 17, 2006 at 08:34:26 -0700, Dan Harris [EMAIL PROTECTED] wrote: Markus Bertheau wrote: Have you tried using a GIST index on lat long? These things are meant for two-dimensional data, whereas btree doesn't handle two-dimensional data that well. How many rows satisfy either of the long / lat condition? According to the analyze, less than 500 rows matched. I'll look into GIST indexes, thanks for the feedback. Have you looked at using the Earth Distance contrib module? If a spherical model of the earth is suitable for your application, then it may work for you and might be easier than trying to create Gist indexes yourself. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Help optimizing a slow index scan
On 3/17/06, Bruno Wolff III [EMAIL PROTECTED] wrote: Have you looked at using the Earth Distance contrib module? If a spherical model of the earth is suitable for your application, then it may work for you and might be easier than trying to create Gist indexes yourself. earth distance = great stuff. If the maximum error is known then you can just pad the distance and filter the result on the client if exact precision is needed. Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Help optimizing a slow index scan
Merlin Moncure wrote: As others will probably mention, effective queries on lot/long which is a spatial problem will require r-tree or gist. I don't have a lot of experience with exotic indexes but this may be the way to go. One easy optimization to consider making is to make an index on either (incidentid, entrydate) or (incident_id,long) which ever is more selective. This is 'yet another query' that would be fun to try out and tweak using the 8.2 upcoming row-wise comparison. merlin Thanks to everyone for your suggestions. One problem I ran into is that apparently my version doesn't support the GIST index that was mentioned. function 'box' doesn't exist ).. So I'm guessing that both this as well as the Earth Distance contrib require me to add on some more pieces that aren't there. Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. If I had full control over the query designs, I could make stored procedures to abstract this. However, I have to deal with a gray box third-party reporting library that isn't so flexible. I'll certainly consider going with something postgre-specific, but only as a last resort. I tried the multi-column index as mentioned above but didn't see any noticeable improvement in elapsed time, although the planner did use the new index. What is the real reason for the index not being very effective on these columns? Although the numbers are in a very limited range, it seems that the records would be very selective as it's not terribly common for multiple rows to share the same coords. Is the 8.2. upcoming row-wise comparison something that would be likely to help me? Thanks again for your input ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Help optimizing a slow index scan
On 3/17/06, Dan Harris [EMAIL PROTECTED] wrote: Merlin Moncure wrote: Thanks to everyone for your suggestions. One problem I ran into is that apparently my version doesn't support the GIST index that was mentioned. function 'box' doesn't exist ).. So I'm guessing that both this as well as the Earth Distance contrib require me to add on some more pieces that aren't there. earth distance is a contrib module that has to be built and installed. it does use some pg-isms so I guess that can be ruled out. GIST is a bit more complex and I would consider reading the documentation very carefully regarding them and make your own determination. Furthermore, by doing so, I am tying my queries directly to postgres-isms. [snip] I tried the multi-column index as mentioned above but didn't see any noticeable improvement in elapsed time, although the planner did use the new index. did you try both flavors of the multiple key index I suggested? (there were other possiblities, please experiment) Is the 8.2. upcoming row-wise comparison something that would be likely to help me? possibly. good news is that rwc is ansi sql. you can see my blog about it here: http://people.planetpostgresql.org/merlin/ Specifically, if you can order your table with an order by statement such that the records you want are contingous, then yes. However, even though it's ansi sql, various commercial databases implement rwc improperly or not at all (mysql, to their credit, gets it right) and I still feel like an exotic index or some other nifty pg trick might be the best performance approach here). Merlin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help optimizing a slow index scan
Dan Harris [EMAIL PROTECTED] writes: Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are. You should accept right now that you can have portability or you can have good performance, not both. Merlin's enthusiasm for row-comparison queries is understandable because that fix definitely helped a common problem. But row comparison has nothing to do with searches in two independent dimensions. Row comparison basically makes it easier to exploit the natural behavior of multicolumn btree indexes ... but a multicolumn btree index does not efficiently support queries that involve separate range limitations on each index column. (If you think about the index storage order you'll see why: the answer entries are not contiguous in the index.) To support two-dimensional searches you really need a non-btree index structure, such as GIST. Since this isn't standard, demanding a portable answer won't get you anywhere. (I don't mean to suggest that Postgres is the only database that has such functionality, just that the DBs that do have it don't agree on any common API.) regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Help optimizing a slow index scan
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: Dan Harris [EMAIL PROTECTED] writes: Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are. The OpenGIS Simple Features Specification[1] is a step in that direction, no? PostGIS[2], MySQL[3], and Oracle Spatial[4] implement to varying degrees. With PostGIS you do have to add non-standard operators to a query's predicate to benefit from GiST indexes on spatial columns, but the rest of the query can be straight out of the SQL and OGC standards. [1] http://www.opengeospatial.org/docs/99-049.pdf [2] http://www.postgis.org/ [3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html [4] http://www.oracle.com/technology/products/spatial/index.html -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Help optimizing a slow index scan
explain analyze select distinct eventmain.incidentid, eventmain.entrydate, eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy from eventmain, eventgeo where eventmain.incidentid = eventgeo.incidentid and ( long -104.998027962962 and long -104.985957781349 ) and ( lat 39.7075542720006 and lat 39.7186195832938 ) and eventmain.entrydate '2006-1-1 00:00' and eventmain.entrydate = '2006-3-17 00:00' order by eventmain.entrydate; QUERY PLAN - Unique (cost=121313.81..121330.72 rows=451 width=178) (actual time=723719.761..723726.875 rows=1408 loops=1) - Sort (cost=121313.81..121314.94 rows=451 width=178) (actual time=723719.755..723721.807 rows=1408 loops=1) Sort Key: eventmain.entrydate, eventmain.disposition, eventmain.incidentid, eventgeo.reportingarea, eventgeo.beatid, eventmain.finaltype, eventmain.casenumber, eventgeo.eventlocation, eventmain.insertdate, eventmain.priority, eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy - Nested Loop (cost=0.00..121293.93 rows=451 width=178) (actual time=1916.230..723712.900 rows=1408 loops=1) - Index Scan using eventgeo_lat_idx on eventgeo (cost=0.00..85488.05 rows=10149 width=76) (actual time=0.402..393376.129 rows=22937 loops=1) Index Cond: ((lat 39.7075542720006::double precision) AND (lat 39.7186195832938::double precision)) Filter: ((long -104.998027962962::double precision) AND (long -104.985957781349::double precision)) - Index Scan using eventmain_incidentid_idx on eventmain (cost=0.00..3.52 rows=1 width=119) (actual time=14.384..14.392 rows=0 loops=22937) Index Cond: ((eventmain.incidentid)::text = (outer.incidentid)::text) Filter: ((entrydate '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate = '2006-03-17 00:00:00'::timestamp without time zone)) Total runtime: 723729.238 ms(!) I'm trying to figure out why it's consuming so much time on the index scan for eventgeo_lat_idx. Also, I have an index on long that the planner does not appear to find helpful. There are 3.3 million records in eventmain and eventgeo. The server has a reasonably fast RAID10 setup with 16x 15k RPM drives and 12GB of RAM ( 11GB listed as cache by vmstat ). Running version 8.0.2 on linux kernel 2.6.12. I have just vacuum analyze'd both tables, rebuilt the eventgeo_lat_idx index and reran the query multiple times to see if caching helped ( it didn't help much ). The server seems to be fine utilizing other fields from this table but using long and lat seem to drag it down significantly. Is it because there's such slight differences between the records, since they are all within a few hundredths of a degree from each other? Thanks for your time and ideas. -Dan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match