subject:"\[PERFORM\] Help optimizing a slow index scan"

Re: [PERFORM] Help optimizing a slow index scan

2006-03-18 Thread Oleg Bartunov


I may be wrong but we in astronomy have several sky indexing schemes, which
allows to effectively use classical btree index. See 
http://www.sai.msu.su/~megera/oddmuse/index.cgi/SkyPixelization
for details. Sergei Koposov has developed Q3C contrib module for 
PostgreSQL 8.1+ and we use it with billiard size astronomical catalogs.



Oleg

On Fri, 17 Mar 2006, Michael Fuhr wrote:


On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote:

Dan Harris [EMAIL PROTECTED] writes:

Furthermore, by doing so, I am tying my queries directly to
postgres-isms.  One of the long term goals of this project is to be
able to fairly transparently support any ANSI SQL-compliant back end
with the same code base.


Unfortunately, there isn't any portable or standard (not exactly the
same thing ;-)) SQL functionality for dealing gracefully with
two-dimensional searches, which is what your lat/long queries are.


The OpenGIS Simple Features Specification[1] is a step in that
direction, no?  PostGIS[2], MySQL[3], and Oracle Spatial[4] implement
to varying degrees.  With PostGIS you do have to add non-standard
operators to a query's predicate to benefit from GiST indexes on
spatial columns, but the rest of the query can be straight out of
the SQL and OGC standards.

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Help optimizing a slow index scan

2006-03-18 Thread Evgeny Gridasov

Try contrib/btree_gist.
I've tried that one, but for my case it didn't help much.
The performance was almost equal or even slower than built-in btree.

On Fri, 17 Mar 2006 08:53:44 -0700
Dan Harris [EMAIL PROTECTED] wrote:

 Dan Harris wrote:
  Markus Bertheau wrote:
  Have you tried using a GIST index on lat  long? These things are
  meant for two-dimensional data, whereas btree doesn't handle
  two-dimensional data that well. How many rows satisfy either of the
  long / lat condition?
 
   
  
  According to the analyze, less than 500 rows matched.  I'll look into 
  GIST indexes, thanks for the feedback.
 
  -Dan
 
  ---(end of broadcast)---
  TIP 2: Don't 'kill -9' the postmaster
 
 When I try to create a GIST index, I get the following error:
 
 create index eventgeo_lat_idx on eventgeo using GIST (lat);
 
 ERROR:  data type double precision has no default operator class for 
 access method gist
 HINT:  You must specify an operator class for the index or define a 
 default operator class for the data type.
 
 I'm not sure what a default operator class is, exactly..
 
 -Dan
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings
 


-- 
Evgeny Gridasov
Software Engineer 
I-Free, Russia

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Help optimizing a slow index scan

2006-03-18 Thread Oleg Bartunov


On Fri, 17 Mar 2006, Evgeny Gridasov wrote:


Try contrib/btree_gist.


contrib/btree_gist does nothing more than  built-in btree - it's just
an support for multicolumn GiST indices.


I've tried that one, but for my case it didn't help much.
The performance was almost equal or even slower than built-in btree.

On Fri, 17 Mar 2006 08:53:44 -0700
Dan Harris [EMAIL PROTECTED] wrote:


Dan Harris wrote:

Markus Bertheau wrote:

Have you tried using a GIST index on lat  long? These things are
meant for two-dimensional data, whereas btree doesn't handle
two-dimensional data that well. How many rows satisfy either of the
long / lat condition?





According to the analyze, less than 500 rows matched.  I'll look into
GIST indexes, thanks for the feedback.

-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


When I try to create a GIST index, I get the following error:

create index eventgeo_lat_idx on eventgeo using GIST (lat);

ERROR:  data type double precision has no default operator class for
access method gist
HINT:  You must specify an operator class for the index or define a
default operator class for the data type.

I'm not sure what a default operator class is, exactly..

-Dan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings







Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris


Markus Bertheau wrote:

Have you tried using a GIST index on lat  long? These things are
meant for two-dimensional data, whereas btree doesn't handle
two-dimensional data that well. How many rows satisfy either of the
long / lat condition?

  

According to the analyze, less than 500 rows matched.  I'll look into 
GIST indexes, thanks for the feedback.


-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris


Dan Harris wrote:

Markus Bertheau wrote:

Have you tried using a GIST index on lat  long? These things are
meant for two-dimensional data, whereas btree doesn't handle
two-dimensional data that well. How many rows satisfy either of the
long / lat condition?

 

According to the analyze, less than 500 rows matched.  I'll look into 
GIST indexes, thanks for the feedback.


-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


When I try to create a GIST index, I get the following error:

create index eventgeo_lat_idx on eventgeo using GIST (lat);

ERROR:  data type double precision has no default operator class for 
access method gist
HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.


I'm not sure what a default operator class is, exactly..

-Dan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure

On 3/16/06, Dan Harris [EMAIL PROTECTED] wrote:
 explain analyze
 select distinct eventmain.incidentid, eventmain.entrydate,
 eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy
 from eventmain, eventgeo
 where
 eventmain.incidentid = eventgeo.incidentid and
 ( long  -104.998027962962 and long  -104.985957781349 ) and
 ( lat  39.7075542720006 and lat  39.7186195832938 ) and
 eventmain.entrydate  '2006-1-1 00:00' and
 eventmain.entrydate = '2006-3-17 00:00'
 order by
 eventmain.entrydate;

As others will probably mention, effective queries on lot/long which
is a spatial problem will require r-tree or gist.  I don't have a lot
of experience with exotic indexes but this may be the way to go.

One easy optimization to consider making is to make an index on either
(incidentid, entrydate) or (incident_id,long) which ever is more
selective.

This is 'yet another query' that would be fun to try out and tweak
using the 8.2 upcoming row-wise comparison.

merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Bruno Wolff III

On Fri, Mar 17, 2006 at 08:34:26 -0700,
  Dan Harris [EMAIL PROTECTED] wrote:
 Markus Bertheau wrote:
 Have you tried using a GIST index on lat  long? These things are
 meant for two-dimensional data, whereas btree doesn't handle
 two-dimensional data that well. How many rows satisfy either of the
 long / lat condition?
 
   
 
 According to the analyze, less than 500 rows matched.  I'll look into 
 GIST indexes, thanks for the feedback.

Have you looked at using the Earth Distance contrib module? If a spherical
model of the earth is suitable for your application, then it may work for you
and might be easier than trying to create Gist indexes yourself.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure

On 3/17/06, Bruno Wolff III [EMAIL PROTECTED] wrote:
 Have you looked at using the Earth Distance contrib module? If a spherical
 model of the earth is suitable for your application, then it may work for you
 and might be easier than trying to create Gist indexes yourself.

earth distance = great stuff.  If the maximum error is known then you
can just pad the distance and filter the result on the client if exact
precision is needed.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris


Merlin Moncure wrote:


As others will probably mention, effective queries on lot/long which
is a spatial problem will require r-tree or gist.  I don't have a lot
of experience with exotic indexes but this may be the way to go.

One easy optimization to consider making is to make an index on either
(incidentid, entrydate) or (incident_id,long) which ever is more
selective.

This is 'yet another query' that would be fun to try out and tweak
using the 8.2 upcoming row-wise comparison.

merlin
  
Thanks to everyone for your suggestions.  One problem I ran into is that 
apparently my version doesn't support the GIST index that was 
mentioned.  function 'box' doesn't exist ).. So I'm guessing that both 
this as well as the Earth Distance contrib require me to add on some 
more pieces that aren't there.


Furthermore, by doing so, I am tying my queries directly to 
postgres-isms.  One of the long term goals of this project is to be 
able to fairly transparently support any ANSI SQL-compliant back end 
with the same code base.  If I had full control over the query designs, 
I could make stored procedures to abstract this.  However, I have to 
deal with a gray box third-party reporting library that isn't so 
flexible.  I'll certainly consider going with something 
postgre-specific, but only as a last resort.


I tried the multi-column index as mentioned above but didn't see any 
noticeable improvement in elapsed time, although the planner did use the 
new index.


What is the real reason for the index not being very effective on these 
columns?  Although the numbers are in a very limited range, it seems 
that the records would be very selective as it's not terribly common for 
multiple rows to share the same coords.


Is the 8.2. upcoming row-wise comparison something that would be 
likely to help me?


Thanks again for your input

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure

On 3/17/06, Dan Harris [EMAIL PROTECTED] wrote:
 Merlin Moncure wrote:
 Thanks to everyone for your suggestions.  One problem I ran into is that
 apparently my version doesn't support the GIST index that was
 mentioned.  function 'box' doesn't exist ).. So I'm guessing that both
 this as well as the Earth Distance contrib require me to add on some
 more pieces that aren't there.

earth distance is a contrib module that has to be built and installed.
it does use some pg-isms so I guess that can be ruled out.  GIST is a
bit more complex and I would consider reading the documentation very
carefully regarding them and make your own determination.

 Furthermore, by doing so, I am tying my queries directly to
 postgres-isms.  [snip]

 I tried the multi-column index as mentioned above but didn't see any
 noticeable improvement in elapsed time, although the planner did use the
 new index.

did you try both flavors of the multiple key index I suggested? (there
were other possiblities, please experiment)

 Is the 8.2. upcoming row-wise comparison something that would be
 likely to help me?

possibly. good news is that rwc is ansi sql.  you can see my blog
about it here: http://people.planetpostgresql.org/merlin/

Specifically, if you can order your table with an order by statement
such that the records you want are contingous, then yes.  However,
even though it's ansi sql, various commercial databases implement rwc
improperly or not at all (mysql, to their credit, gets it right) and I
still feel like an exotic index or some other nifty pg trick might be
the best performance approach here).

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Tom Lane

Dan Harris [EMAIL PROTECTED] writes:
 Furthermore, by doing so, I am tying my queries directly to 
 postgres-isms.  One of the long term goals of this project is to be 
 able to fairly transparently support any ANSI SQL-compliant back end 
 with the same code base.

Unfortunately, there isn't any portable or standard (not exactly the
same thing ;-)) SQL functionality for dealing gracefully with
two-dimensional searches, which is what your lat/long queries are.
You should accept right now that you can have portability or you can
have good performance, not both.

Merlin's enthusiasm for row-comparison queries is understandable because
that fix definitely helped a common problem.  But row comparison has
nothing to do with searches in two independent dimensions.  Row
comparison basically makes it easier to exploit the natural behavior of
multicolumn btree indexes ... but a multicolumn btree index does not
efficiently support queries that involve separate range limitations on
each index column.  (If you think about the index storage order you'll
see why: the answer entries are not contiguous in the index.)

To support two-dimensional searches you really need a non-btree index
structure, such as GIST.  Since this isn't standard, demanding a
portable answer won't get you anywhere.  (I don't mean to suggest that
Postgres is the only database that has such functionality, just that
the DBs that do have it don't agree on any common API.)

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Michael Fuhr

On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote:
 Dan Harris [EMAIL PROTECTED] writes:
  Furthermore, by doing so, I am tying my queries directly to 
  postgres-isms.  One of the long term goals of this project is to be 
  able to fairly transparently support any ANSI SQL-compliant back end 
  with the same code base.
 
 Unfortunately, there isn't any portable or standard (not exactly the
 same thing ;-)) SQL functionality for dealing gracefully with
 two-dimensional searches, which is what your lat/long queries are.

The OpenGIS Simple Features Specification[1] is a step in that
direction, no?  PostGIS[2], MySQL[3], and Oracle Spatial[4] implement
to varying degrees.  With PostGIS you do have to add non-standard
operators to a query's predicate to benefit from GiST indexes on
spatial columns, but the rest of the query can be straight out of
the SQL and OGC standards.

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

[PERFORM] Help optimizing a slow index scan

2006-03-16 Thread Dan Harris


explain analyze
select distinct eventmain.incidentid, eventmain.entrydate, 
eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy

from eventmain, eventgeo
where
   eventmain.incidentid = eventgeo.incidentid and
   ( long  -104.998027962962 and long  -104.985957781349 ) and
   ( lat  39.7075542720006 and lat  39.7186195832938 ) and
   eventmain.entrydate  '2006-1-1 00:00' and
   eventmain.entrydate = '2006-3-17 00:00'
order by
   eventmain.entrydate;

  QUERY 
PLAN  


-
Unique  (cost=121313.81..121330.72 rows=451 width=178) (actual 
time=723719.761..723726.875 rows=1408 loops=1)
  -  Sort  (cost=121313.81..121314.94 rows=451 width=178) (actual 
time=723719.755..723721.807 rows=1408 loops=1)
Sort Key: eventmain.entrydate, eventmain.disposition, 
eventmain.incidentid, eventgeo.reportingarea, eventgeo.beatid, 
eventmain.finaltype, eventmain.casenumber, eventgeo.eventlocation, 
eventmain.insertdate, eventmain.priority, eventgeo.long, eventgeo.lat, 
eventgeo.geox, eventgeo.geoy
-  Nested Loop  (cost=0.00..121293.93 rows=451 width=178) 
(actual time=1916.230..723712.900 rows=1408 loops=1)
  -  Index Scan using eventgeo_lat_idx on eventgeo  
(cost=0.00..85488.05 rows=10149 width=76) (actual time=0.402..393376.129 
rows=22937 loops=1)
Index Cond: ((lat  39.7075542720006::double 
precision) AND (lat  39.7186195832938::double precision))
Filter: ((long  -104.998027962962::double 
precision) AND (long  -104.985957781349::double precision))
  -  Index Scan using eventmain_incidentid_idx on 
eventmain  (cost=0.00..3.52 rows=1 width=119) (actual 
time=14.384..14.392 rows=0 loops=22937)
Index Cond: ((eventmain.incidentid)::text = 
(outer.incidentid)::text)
Filter: ((entrydate  '2006-01-01 
00:00:00'::timestamp without time zone) AND (entrydate = '2006-03-17 
00:00:00'::timestamp without time zone))


Total runtime:   723729.238 ms(!) 



I'm trying to figure out why it's consuming so much time on the index 
scan for eventgeo_lat_idx.  Also, I have an index on long that the 
planner does not appear to find helpful.


There are 3.3 million records in eventmain and eventgeo.  The server has 
a reasonably fast RAID10 setup with 16x 15k RPM drives and 12GB of RAM ( 
11GB listed as cache by vmstat ).  Running version 8.0.2 on linux 
kernel 2.6.12.


I have just vacuum analyze'd both tables, rebuilt the eventgeo_lat_idx 
index and reran the query multiple times to see if caching helped ( it 
didn't help much ).   The server seems to be fine utilizing other fields 
from this table but using long and lat seem to drag it down 
significantly.


 Is it because there's such slight differences between the records, 
since they are all within a few hundredths of a degree from each other?


Thanks for your time and ideas.

-Dan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

Re: [PERFORM] Help optimizing a slow index scan

[PERFORM] Help optimizing a slow index scan

13 matches

Site Navigation

Mail list logo

Footer information