Re: [postgis-users] Is this a perverse use-case for raster types?

Bborie Park Mon, 29 Oct 2012 16:33:45 -0700

I believe ST_Intersects() works on out-of-db rasters in the 2.0 series,
possibly 2.0.1.


As for performance of in-db vs out-db, in-db is slightly faster but my
benchmarks are rather old.  I hope to do some testing soon to see if I
can improve out-db performance.

Tile size is critical regardless of whether or not you're going to store
your rasters in-db or out-db.  Generally, tiles should be 100x100 or
smaller.  Ideal tile size depends upon the input raster's dimensions and
what tile dimension is cleanly divisible from the raster's dimension.

I wonder what your benchmark's performance would be like if the raster
is out-db.  I'd expect a flat line with little change regardless the #
of bands.

-bborie

On 10/29/2012 04:23 PM, James Hiebert wrote:
>> If you've got a large number of bands (100s or more), you may want to
>> consider having the rasters be out-of-db.
> 
> I had considered that (better, actually, than duplicating our data, right?), 
> but was finding that st_intersects wasn't yet implemented for out of db 
> storage.  Looking through the trunk code, though, it appears that maybe 
> you've gone ahead and implemented that since 2.0.1?  If so, great!  
> ST_PixelAsPoints() is another good reason for me to seriously consider 
> working out of trunk...
> 
>> Part of the problem is that
>> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
>> deserialized (and probably decompressed).  So, if the serialized raster
>> is big (more bands), the deTOASTing will take longer.
> 
> Thanks; good to know.
> 
>> Another problem with your benchmark query is that the ST_Clip() is
>> running twice (for height and width).
> 
> Ah, that changes the picture pretty dramatically (see attached plot).  Since 
> it improves by a lot more than a factor of two, I suspect maybe I'm having 
> some desktop scaling issues or something.  I'll go ahead and actually put 
> this on our database server, try the trunk version, and go from there.  This 
> is at least somewhat encouraging :)  Thanks for the suggestions.
> 
> ~James
> 
> On Mon, Oct 29, 2012 at 03:50:04PM -0700, Bborie Park wrote:
>> James,
>>
>> I use PostGIS raster for a similar purpose (model outputs) though my
>> model outputs are for a specific day (average temperature for a specific
>> date).  So, one raster with one band per day per variable.  I could
>> combine a year's worth of bands into one raster but I decided against that.
>>
>> If you've got a large number of bands (100s or more), you may want to
>> consider having the rasters be out-of-db.  Part of the problem is that
>> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
>> deserialized (and probably decompressed).  So, if the serialized raster
>> is big (more bands), the deTOASTing will take longer.
>>
>> Another problem with your benchmark query is that the ST_Clip() is
>> running twice (for height and width).
>>
>> If you're in the evaluation stage and you're compiling PostGIS yourself,
>> I'd recommend trying SVN -trunk (will become 2.1) as it has additional
>> capabilities and performance improvements.  I'm already using -trunk in
>> production as I needed the new features (full disclosure: I wrote almost
>> the new features in -trunk).
>>
>> -bborie
>>
>> On 10/29/2012 03:32 PM, James Hiebert wrote:
>>> Hi All,
>>>
>>> I'm considering using PostGIS rasters for storage of raster data at my 
>>> organization and I'm looking for some advice (or perhaps a reality check).  
>>> I work for a region climate services provider and the vast majority of our 
>>> data (by volume, not necessarily complexity) are output from climate 
>>> models.  These are generally a n-by-m raster with one band for each 
>>> timestep.  There could be upwards of 36k to 72k timesteps for a typical 
>>> model run.  We have hundreds of model runs.
>>>
>>> So my question is, is it insane to be thinking of storing that many bands 
>>> in a PostGIS raster?  Or more specifically, is this _not_ a use case for 
>>> which PostGIS rasters were designed?  I notice that most of the examples in 
>>> the docs and in "PostGIS In Action" focus only on images and I can imagine 
>>> that handling multispectral satellite images as being more of the intended 
>>> use case.
>>>
>>> I did a little benchmarking of a typical use case of ours ("What's the 
>>> average temperature inside a some polygon, e.g. a river basin?").  I 
>>> noticed that the run time for doing a ST_Clip(raster, band, geometry) and 
>>> ST_Intersects(raster, band, geometry) appears to be super-linear even when 
>>> doing it on just a single band.  I ran the following query:
>>> SELECT rid, st_height(st_clip(rast, 1, the_geom)), st_width(st_clip(rast, 
>>> the_geom)) FROM basins INNER JOIN bcsd ON ST_Intersects(rast, 1, the_geom) 
>>> WHERE rid = <rid> (where basins is table of river basins with one single 
>>> polygon and bcsd is a table with a raster column "rast").
>>> for a set of rasters with increasing number of bands, and the time to run 
>>> the query is shown in the attached plot.  Since the raster properties are 
>>> presumably shared across all the bands, it seems odd to me that run time 
>>> would increase.  I would expect it to be _contant_ (with constant number of 
>>> pixels), but I suppose that that's my own ignorance as to how the PG type 
>>> extensions work?
>>>
>>> Comments or explanations are welcome.
>>>
>>> ~James
> 
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 

-- 
Bborie Park
Programmer
Center for Vectorborne Diseases
UC Davis
530-752-8380
bkp...@ucdavis.edu
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] Is this a perverse use-case for raster types?

Reply via email to