Sorry, The query run times are significantly slower on outdb as that using indb here are the run times on 2 queries.
ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US Counties) OutDB: 873.564s (14 minutes 33s) InDB: 127.36s (2 minutes 7s) ST_Count(select single band here)/ST_Clip(on all bands)/Inner Join/ST_Transform (US Counties) OutDB: 9537.371s (2 hours 38minutes) InDB: 310s (5 minutes 10 seconds) In the query planner it shows a large change in the number of columns (width) that are picked up in the CTE_rast_select. These extra columns slow down the ability to process the data. OUT DB CTE rast_select > -> Nested Loop (cost=0.28..76131.41 rows=62033 *width=1086)* > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) In DB Nested Loop (cost=0.28..51767.41 rows=62033 *width=272*) > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) On Wed, May 27, 2015 at 4:31 PM, PT <[email protected]> wrote: > On Tue, 26 May 2015 12:52:24 -0500 > David Haynes II <[email protected]> wrote: > > > Hello, > > > > I have a question about the query optimizer and its performance on > spatial > > datasets, specifically rasters. My use case is rather unique, the > > application that I am developing allows users to request summarizations > of > > various geographic boundaries around the world. Therefore our raster > > datasets are global. We are in the process of conducting some benchmarks > > for our system and we noticed something unexpected. > > > > The query is the same except the first is run on a raster (46gigs) in out > > of database (outdb) and the second is the same raster (46gigs) stored in > > database (indb). The raster is multibanded (13), with each band > > representing one entire MODIS global scene. A single year of MODIS is > > approximately 3.6 gigs. > > > > The outdb is being out performed by indb, because the query optimizer > gets > > smarter. But what is also interesting is all the extra pieces that are > > brought in with outdb. > > > > with poly as > > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom FROM > > us_counties ) > > , rast_select as > > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from > > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast, > s.geom) ) > > select r.id, r.name, ST_Count(r.rast, 1, True) > > > > > > QUERY PLAN With Outdb > > > -------------------------------------------------------------------------------------------------- > > Sort (cost=93911.29..93926.80 rows=6204 width=254) > > Sort Key: r.id, r.name > > CTE poly > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 > width=62247) > > CTE rast_select > > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086) > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) > > -> Index Scan using modis_rast_gist on modis r_1 > > (cost=0.28..24.40 rows=2 width=836) > > Index Cond: ((rast)::geometry && s.geom) > > Filter: _st_intersects(s.geom, rast, NULL::integer) > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > > width=254) > > > > QUERY PLAN With Indb > > > > > ------------------------------------------------------------------------------------------------------------- > > Sort (cost=69547.29..69562.80 rows=6204 width=254) > > Sort Key: r.id, r.name > > CTE poly > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 > width=62247) > > CTE rast_select > > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272) > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250) > > -> Index Scan using modis_noout_rast_gist on modis_noout r_1 > > (cost=0.28..16.56 rows=2 width=22) > > Index Cond: ((rast)::geometry && s.geom) > > Filter: _st_intersects(s.geom, rast, NULL::integer) > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033 > > width=254) > > I could be missing something here, but I don't see how long the queries > actually take to run. Have you actually run the queries and timed them? > Keep in mind that analyze does not actually run the query, it only plans > it, so the actual run time is unknown if all you do is analyze. > > The query plans appear to be equal, assuming there are slight variances > in the names of tables from one DB to another (and I assume that your > description of indb and outdb reflects the fact that there are (for > reasons unknown) two copies of the data). > > The only purpose to those estimates is to choose a good plan. If the > plan is bad for one database and both databases have the same data, then > the plan will be bad for both. > > Since there have been no other responses, I'm guessing that others are > confused by your question as well. Can you describe the actual problem > that you're seeing? > > -- > Bill Moran <[email protected]> > > > -- > Sent via pgsql-general mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- David Haynes, Ph.D. Research Associate Terra Populus Minnesota Population Center www.terrapop.org
