Re: [postgis-users] speed of query

2011-03-02 Thread Mark Cave-Ayland

On 01/03/11 21:48, Michael Smedberg wrote:


OK, next guess.  It looks like your SQL might be pretty inefficient.
  For example, consider this SQL statement:

SELECT a, b FROM ll
WHERE
ST_Within(
ST_Point(
ST_X(ST_Transform(the_geom, 4326)),
ST_Y(ST_Transform(the_geom, 4326))
),
ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
)

I think that says something like:
Look through the II table
For each row, transform the_geom to 4326, and take the X
For each row, transform the_geom to 4326, and take the Y
Make a point from that X,Y
Check if that point is within a box

I don't think that'll use an index, and I think it will do a bunch of
transformation work for every row.

I think that instead of transforming every row in II to 4326, you'd
probably be better served by transforming your bounding box to 2163 one
time.  I think the SQL would look something like this:

SELECT
a,
b
FROM
ll
WHERE
ST_Within(
the_geom,
ST_Transform(
ST_MakeBox2D(
ST_Point(
-91.048,
45.956
),
ST_Point(
-90.973,
46.007
)
),
2163
)
)


In any case, you should probably try looking at the output of EXPLAIN or
EXPLAIN ANALYZE to understand whether your index is being used, etc.


Hi Michael,

Yes indeed - I think you summarised this brilliantly :)  If your 
geometries are stored as SRID 2163, then the index bounding boxes will 
also be stored in SRID 2163 - hence the index can only be used for 
queries involving other SRID 2163 bounding boxes. Otherwise PostgreSQL 
assumes it has to convert your entire geometry column to SRID 4326 first 
in order to calculate the intersection, which involves scanning the 
entire table and converting all the geometries on the fly...



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] speed of query

2011-03-01 Thread Ben Madin
Puneet,

I'm thinking you are doing a lot of extra calculations here by constantly 
transforming the points.

Why did you turn a geometry into two points and then make the geometry again... 
this would invalidate the use of an index on the same column?  (Neither did you 
define a SRID for your box in the first query)

Have a look at :

file:///usr/local/pgsql/share/doc/contrib/postgis-1.5/postgis.html#ST_MakeBox2D

I think the query (slightly modified here)

--Return all features that fall reside or partly reside in a US national atlas 
coordinate bounding box
--It is assumed here that the geometries are stored with SRID = 2163 (US 
National atlas equal area)
SELECT feature_id, feature_name, the_geom
FROM features
WHERE the_geom && st_transform(ST_SetSRID(ST_MakeBox2D(ST_Point(-91.048, 
45.956), ST_Point(-90.973, 46.007)),4326),2163)

is what you are after (obviously you will have to transform the lat long points 
(or the box.

cheers

Ben



On 02/03/2011, at 3:40 AM, Puneet Kishor wrote:

> I have a table with ~ 13.25 million points.
> 
> CREATE TABLE ll (
> gid serial NOT NULL,
> latitude double precision,
> longitude double precision,
> a integer,
> b integer,
> the_geom geometry,
> CONSTRAINT ll_pkey PRIMARY KEY (gid),
> CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
> CONSTRAINT enforce_geotype_the_geom CHECK (
> geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
> ),
> CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
> )
> WITH (
> OIDS=FALSE
> );
> 
> I want to select the columns a,b for the rows that lie within a box made by 
> points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --
> 
> Query 1
> 
> SELECT a, b FROM ll 
> WHERE 
> ST_Within(
> ST_Point(
> ST_X(ST_Transform(the_geom, 4326)), 
> ST_Y(ST_Transform(the_geom, 4326))
> ),
> ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
> )
> 
> 31 rows returned in 46125 ms
> 
> Query 2 
> 
> SELECT a, b FROM ll 
> WHERE 
> ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND 
> ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND 
> ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND 
> ST_Y(ST_Transform(the_geom, 4326)) <= 46.007
> 
> 31 rows returned in 25729 ms
> 
> Query 3
> 
> SELECT a, b FROM ll 
> WHERE 
> longitude >= -91.048 AND 
> longitude <= -90.973 AND 
> latitude >= 45.956 AND 
> latitude <= 46.007
> 
> 31 rows returned in 4011 ms
> 
> Query 4
> 
> I also have the same data in a SQLite database with an R*Tree index on 
> lat/lon. A query analogous to Query 3 returns fast enough to not even 
> register a time... a few milliseconds; effectively 0 seconds.
> 
> What gives? 
> -- 
> Puneet Kishor 
> 
> 
> 
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] speed of query

2011-03-01 Thread Jean-François Gigand
Hi,

Indeed: ST_Point, ST_MakeBox2D and ST_Transform are IMMUTABLE, so the
whole expression is computed once for all rows in this case. The only
function called every time is ST_Within, which normally makes use of
the spatial index.

I wonder if the '&&' operator (ST_Overlaps) would be faster or not...

JF


2011/3/1 Michael Smedberg :
> OK, next guess.  It looks like your SQL might be pretty inefficient.  For
> example, consider this SQL statement:
> SELECT a, b FROM ll
> WHERE
> ST_Within(
> ST_Point(
> ST_X(ST_Transform(the_geom, 4326)),
> ST_Y(ST_Transform(the_geom, 4326))
> ),
> ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
> )
> I think that says something like:
> Look through the II table
> For each row, transform the_geom to 4326, and take the X
> For each row, transform the_geom to 4326, and take the Y
> Make a point from that X,Y
> Check if that point is within a box
> I don't think that'll use an index, and I think it will do a bunch of
> transformation work for every row.
> I think that instead of transforming every row in II to 4326, you'd probably
> be better served by transforming your bounding box to 2163 one time.  I
> think the SQL would look something like this:
>
> SELECT
> a,
> b
> FROM
> ll
> WHERE
> ST_Within(
> the_geom,
> ST_Transform(
> ST_MakeBox2D(
> ST_Point(
> -91.048,
> 45.956
> ),
> ST_Point(
> -90.973,
> 46.007
> )
> ),
> 2163
> )
> )
>
> In any case, you should probably try looking at the output of EXPLAIN or
> EXPLAIN ANALYZE to understand whether your index is being used, etc.
>
>
>
>
> On Tue, Mar 1, 2011 at 11:40 AM, Puneet Kishor  wrote:
>>
>> I have a table with ~ 13.25 million points.
>>
>> CREATE TABLE ll (
>> gid serial NOT NULL,
>> latitude double precision,
>> longitude double precision,
>> a integer,
>> b integer,
>> the_geom geometry,
>> CONSTRAINT ll_pkey PRIMARY KEY (gid),
>> CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
>> CONSTRAINT enforce_geotype_the_geom CHECK (
>> geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
>> ),
>> CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
>> )
>> WITH (
>> OIDS=FALSE
>> );
>>
>> I want to select the columns a,b for the rows that lie within a box made
>> by points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --
>>
>> Query 1
>>
>> SELECT a, b FROM ll
>> WHERE
>> ST_Within(
>> ST_Point(
>> ST_X(ST_Transform(the_geom, 4326)),
>> ST_Y(ST_Transform(the_geom, 4326))
>> ),
>> ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
>> )
>>
>> 31 rows returned in 46125 ms
>>
>> Query 2
>>
>> SELECT a, b FROM ll
>> WHERE
>> ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND
>> ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND
>> ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND
>> ST_Y(ST_Transform(the_geom, 4326)) <= 46.007
>>
>> 31 rows returned in 25729 ms
>>
>> Query 3
>>
>> SELECT a, b FROM ll
>> WHERE
>> longitude >= -91.048 AND
>> longitude <= -90.973 AND
>> latitude >= 45.956 AND
>> latitude <= 46.007
>>
>> 31 rows returned in 4011 ms
>>
>> Query 4
>>
>> I also have the same data in a SQLite database with an R*Tree index on
>> lat/lon. A query analogous to Query 3 returns fast enough to not even
>> register a time... a few milliseconds; effectively 0 seconds.
>>
>> What gives?
>> --
>> Puneet Kishor
>>
>>
>>
>> ___
>> postgis-users mailing list
>> postgis-users@postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] speed of query

2011-03-01 Thread Michael Smedberg
OK, next guess.  It looks like your SQL might be pretty inefficient.  For
example, consider this SQL statement:

SELECT a, b FROM ll
WHERE
ST_Within(
ST_Point(
ST_X(ST_Transform(the_geom, 4326)),
ST_Y(ST_Transform(the_geom, 4326))
),
ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
)

I think that says something like:
Look through the II table
For each row, transform the_geom to 4326, and take the X
For each row, transform the_geom to 4326, and take the Y
Make a point from that X,Y
Check if that point is within a box

I don't think that'll use an index, and I think it will do a bunch of
transformation work for every row.

I think that instead of transforming every row in II to 4326, you'd probably
be better served by transforming your bounding box to 2163 one time.  I
think the SQL would look something like this:

SELECT
a,
b
FROM
ll
WHERE
ST_Within(
the_geom,
ST_Transform(
ST_MakeBox2D(
ST_Point(
-91.048,
45.956
),
ST_Point(
-90.973,
46.007
)
),
2163
)
)


In any case, you should probably try looking at the output of EXPLAIN or
EXPLAIN ANALYZE to understand whether your index is being used, etc.





On Tue, Mar 1, 2011 at 11:40 AM, Puneet Kishor  wrote:

> I have a table with ~ 13.25 million points.
>
> CREATE TABLE ll (
> gid serial NOT NULL,
> latitude double precision,
> longitude double precision,
> a integer,
> b integer,
> the_geom geometry,
> CONSTRAINT ll_pkey PRIMARY KEY (gid),
> CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
> CONSTRAINT enforce_geotype_the_geom CHECK (
> geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
> ),
> CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
> )
> WITH (
> OIDS=FALSE
> );
>
> I want to select the columns a,b for the rows that lie within a box made by
> points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --
>
> Query 1
>
> SELECT a, b FROM ll
> WHERE
> ST_Within(
> ST_Point(
> ST_X(ST_Transform(the_geom, 4326)),
> ST_Y(ST_Transform(the_geom, 4326))
> ),
> ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
> )
>
> 31 rows returned in 46125 ms
>
> Query 2
>
> SELECT a, b FROM ll
> WHERE
> ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND
> ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND
> ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND
> ST_Y(ST_Transform(the_geom, 4326)) <= 46.007
>
> 31 rows returned in 25729 ms
>
> Query 3
>
> SELECT a, b FROM ll
> WHERE
> longitude >= -91.048 AND
> longitude <= -90.973 AND
> latitude >= 45.956 AND
> latitude <= 46.007
>
> 31 rows returned in 4011 ms
>
> Query 4
>
> I also have the same data in a SQLite database with an R*Tree index on
> lat/lon. A query analogous to Query 3 returns fast enough to not even
> register a time... a few milliseconds; effectively 0 seconds.
>
> What gives?
> --
> Puneet Kishor
>
>
>
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] speed of query

2011-03-01 Thread Puneet Kishor

On Tuesday, March 1, 2011 at 3:30 PM, Michael Smedberg wrote: 
> It looks like maybe you don't have an index on your the_geom column? If so, 
> you might want to read the "Indexing the data" section of 
> http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01
> 

I should have added the following information --

CREATE INDEX ll_the_geom_gist
ON ll
USING gist
(the_geom);

I do have an index. Ironically, I don't have an index on longitude and latitude 
columns, yet, searching on those columns is faster than on the_geom.

> On Tue, Mar 1, 2011 at 11:40 AM, Puneet Kishor  wrote:
> >  I have a table with ~ 13.25 million points.
> > 
> >  CREATE TABLE ll (
> >  gid serial NOT NULL,
> >  latitude double precision,
> >  longitude double precision,
> >  a integer,
> >  b integer,
> >  the_geom geometry,
> >  CONSTRAINT ll_pkey PRIMARY KEY (gid),
> >  CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
> >  CONSTRAINT enforce_geotype_the_geom CHECK (
> >  geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
> >  ),
> >  CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
> >  )
> >  WITH (
> >  OIDS=FALSE
> >  );
> > 
> >  I want to select the columns a,b for the rows that lie within a box made 
> > by points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --
> > 
> >  Query 1
> > 
> >  SELECT a, b FROM ll
> >  WHERE
> >  ST_Within(
> >  ST_Point(
> >  ST_X(ST_Transform(the_geom, 4326)),
> >  ST_Y(ST_Transform(the_geom, 4326))
> >  ),
> >  ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
> >  )
> > 
> >  31 rows returned in 46125 ms
> > 
> >  Query 2
> > 
> >  SELECT a, b FROM ll
> >  WHERE
> >  ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND
> >  ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND
> >  ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND
> >  ST_Y(ST_Transform(the_geom, 4326)) <= 46.007
> > 
> >  31 rows returned in 25729 ms
> > 
> >  Query 3
> > 
> >  SELECT a, b FROM ll
> >  WHERE
> >  longitude >= -91.048 AND
> >  longitude <= -90.973 AND
> >  latitude >= 45.956 AND
> >  latitude <= 46.007
> > 
> >  31 rows returned in 4011 ms
> > 
> >  Query 4
> > 
> >  I also have the same data in a SQLite database with an R*Tree index on 
> > lat/lon. A query analogous to Query 3 returns fast enough to not even 
> > register a time... a few milliseconds; effectively 0 seconds.
> > 
> >  What gives?
> >  --
> >  Puneet Kishor
> > 
> > 
> > 
> >  ___
> >  postgis-users mailing list
> > postgis-users@postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> 

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] speed of query

2011-03-01 Thread Michael Smedberg
It looks like maybe you don't have an index on your the_geom column?  If so,
you might want to read the "Indexing the data" section of
http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_tut01

On Tue, Mar 1, 2011 at 11:40 AM, Puneet Kishor  wrote:

> I have a table with ~ 13.25 million points.
>
> CREATE TABLE ll (
> gid serial NOT NULL,
> latitude double precision,
> longitude double precision,
> a integer,
> b integer,
> the_geom geometry,
> CONSTRAINT ll_pkey PRIMARY KEY (gid),
> CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
> CONSTRAINT enforce_geotype_the_geom CHECK (
> geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
> ),
> CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
> )
> WITH (
> OIDS=FALSE
> );
>
> I want to select the columns a,b for the rows that lie within a box made by
> points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --
>
> Query 1
>
> SELECT a, b FROM ll
> WHERE
> ST_Within(
> ST_Point(
> ST_X(ST_Transform(the_geom, 4326)),
> ST_Y(ST_Transform(the_geom, 4326))
> ),
> ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
> )
>
> 31 rows returned in 46125 ms
>
> Query 2
>
> SELECT a, b FROM ll
> WHERE
> ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND
> ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND
> ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND
> ST_Y(ST_Transform(the_geom, 4326)) <= 46.007
>
> 31 rows returned in 25729 ms
>
> Query 3
>
> SELECT a, b FROM ll
> WHERE
> longitude >= -91.048 AND
> longitude <= -90.973 AND
> latitude >= 45.956 AND
> latitude <= 46.007
>
> 31 rows returned in 4011 ms
>
> Query 4
>
> I also have the same data in a SQLite database with an R*Tree index on
> lat/lon. A query analogous to Query 3 returns fast enough to not even
> register a time... a few milliseconds; effectively 0 seconds.
>
> What gives?
> --
> Puneet Kishor
>
>
>
> ___
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


[postgis-users] speed of query

2011-03-01 Thread Puneet Kishor
I have a table with ~ 13.25 million points.

CREATE TABLE ll (
gid serial NOT NULL,
latitude double precision,
longitude double precision,
a integer,
b integer,
the_geom geometry,
CONSTRAINT ll_pkey PRIMARY KEY (gid),
CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
CONSTRAINT enforce_geotype_the_geom CHECK (
geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL
),
CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 2163)
)
WITH (
OIDS=FALSE
);

I want to select the columns a,b for the rows that lie within a box made by 
points [-91.048, 45.956] and [-90.973, 46.007]. Here are my results --

Query 1

SELECT a, b FROM ll 
WHERE 
ST_Within(
ST_Point(
ST_X(ST_Transform(the_geom, 4326)), 
ST_Y(ST_Transform(the_geom, 4326))
),
ST_MakeBox2D(ST_Point(-91.048, 45.956), ST_Point(-90.973, 46.007))
)

31 rows returned in 46125 ms

Query 2 

SELECT a, b FROM ll 
WHERE 
ST_X(ST_Transform(the_geom, 4326)) >= -91.048 AND 
ST_X(ST_Transform(the_geom, 4326)) <= -90.973 AND 
ST_Y(ST_Transform(the_geom, 4326)) >= 45.956 AND 
ST_Y(ST_Transform(the_geom, 4326)) <= 46.007

31 rows returned in 25729 ms

Query 3

SELECT a, b FROM ll 
WHERE 
longitude >= -91.048 AND 
longitude <= -90.973 AND 
latitude >= 45.956 AND 
latitude <= 46.007

31 rows returned in 4011 ms

Query 4

I also have the same data in a SQLite database with an R*Tree index on lat/lon. 
A query analogous to Query 3 returns fast enough to not even register a time... 
a few milliseconds; effectively 0 seconds.

What gives? 
-- 
Puneet Kishor 



___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users