Sorry,

The query run times are significantly slower on outdb as that using indb
here are the run times on 2 queries.

ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
Counties)
OutDB: 873.564s (14 minutes 33s)     InDB:  127.36s (2 minutes 7s)

ST_Count(select single band here)/ST_Clip(on all bands)/Inner
Join/ST_Transform (US Counties)
OutDB: 9537.371s (2 hours 38minutes)   InDB:  310s (5 minutes 10 seconds)

In the query planner it shows a large change in the number of columns
(width) that are picked up in the CTE_rast_select.
These extra columns slow down the ability to process the data.

OUT DB
CTE rast_select
>     ->  Nested Loop  (cost=0.28..76131.41 rows=62033 *width=1086)*
>           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)


In DB
Nested Loop  (cost=0.28..51767.41 rows=62033 *width=272*)
>           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)

On Wed, May 27, 2015 at 4:31 PM, PT <wmo...@potentialtech.com> wrote:

> On Tue, 26 May 2015 12:52:24 -0500
> David Haynes II <dahay...@umn.edu> wrote:
>
> > Hello,
> >
> > I have a question about the query optimizer and its performance on
> spatial
> > datasets, specifically rasters. My use case is rather unique, the
> > application that I am developing allows users to request summarizations
> of
> > various geographic boundaries around the world. Therefore our raster
> > datasets are global. We are in the process of conducting some benchmarks
> > for our system and we noticed something unexpected.
> >
> > The query is the same except the first is run on a raster (46gigs) in out
> > of database (outdb) and the second is the same raster (46gigs) stored in
> > database (indb). The raster is multibanded (13), with each band
> > representing one entire MODIS global scene. A single year of MODIS is
> > approximately 3.6 gigs.
> >
> > The outdb is being out performed by indb, because the query optimizer
> gets
> > smarter. But what is also interesting is all the extra pieces that are
> > brought in with outdb.
> >
> > with poly as
> > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
> > us_counties )
> > , rast_select as
> > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
> s.geom) )
> > select r.id, r.name, ST_Count(r.rast, 1, True)
> >
> >
> >    QUERY PLAN With Outdb
> >
> --------------------------------------------------------------------------------------------------
> > Sort   (cost=93911.29..93926.80 rows=6204 width=254)
> >   Sort Key: r.id, r.name
> >   CTE poly
> >     ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
> width=62247)
> >   CTE rast_select
> >     ->  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
> >           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> >           ->  Index Scan using modis_rast_gist on modis r_1
> >  (cost=0.28..24.40 rows=2 width=836)
> >                 Index Cond: ((rast)::geometry && s.geom)
> >                 Filter: _st_intersects(s.geom, rast, NULL::integer)
> >   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
> >         ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> > width=254)
> >
> >                                                 QUERY PLAN With Indb
> >
> >
> -------------------------------------------------------------------------------------------------------------
> > Sort   (cost=69547.29..69562.80 rows=6204 width=254)
> >   Sort Key: r.id, r.name
> >   CTE poly
> >     ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
> width=62247)
> >   CTE rast_select
> >     ->  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
> >           ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
> >           ->  Index Scan using modis_noout_rast_gist on modis_noout r_1
> >  (cost=0.28..16.56 rows=2 width=22)
> >                 Index Cond: ((rast)::geometry && s.geom)
> >                 Filter: _st_intersects(s.geom, rast, NULL::integer)
> >   ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
> >         ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
> > width=254)
>
> I could be missing something here, but I don't see how long the queries
> actually take to run. Have you actually run the queries and timed them?
> Keep in mind that analyze does not actually run the query, it only plans
> it, so the actual run time is unknown if all you do is analyze.
>
> The query plans appear to be equal, assuming there are slight variances
> in the names of tables from one DB to another (and I assume that your
> description of indb and outdb reflects the fact that there are (for
> reasons unknown) two copies of the data).
>
> The only purpose to those estimates is to choose a good plan. If the
> plan is bad for one database and both databases have the same data, then
> the plan will be bad for both.
>
> Since there have been no other responses, I'm guessing that others are
> confused by your question as well. Can you describe the actual problem
> that you're seeing?
>
> --
> Bill Moran <wmo...@potentialtech.com>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org

Reply via email to