This says 100:
http://stackoverflow.com/questions/6104774/how-many-table-partitions-is-too-many-in-postgres

This says 1000 is too many:
http://dba.stackexchange.com/questions/95977/maximum-partitioning-postgresql

Honestly you'd have to benchmark it, because I have no idea if there is a 
difference between 100 and 1000.

That being said, I'm surprised a good index isn't fast enough.  Partitions do 
cut the index size down, which is good, but the planner still has to check 
every child table's constraint to figure out which ones can match.
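
For what it's worth, you can check whether the planner is actually excluding 
partitions.  A minimal sketch, assuming the parent table is named flow and 
each child carries a CHECK constraint on timeid (adjust the names to match 
your schema):

    -- constraint_exclusion defaults to 'partition', which prunes children
    -- of inheritance-based partition sets; make sure it hasn't been turned off.
    SHOW constraint_exclusion;

    -- With a literal timeid in the WHERE clause, the plan should list only
    -- the matching child (plus the empty parent), not all 10,000 children.
    EXPLAIN SELECT * FROM flow WHERE timeid = 101;

Exclusion only works when timeid is compared to something the planner can 
treat as a constant, so if GeoServer sends the value as a bound parameter of 
a prepared statement the pruning may not happen at all.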

Do you ever update or delete data from the flow table?
Correct me if I'm wrong, but it looks like your WHERE clause only uses the 
fields timeid, wkb_geometry, and streamflow.  Yes?


Your EXPLAIN ANALYZE includes the table conus_flow, which isn't in your query 
or view.  Are you sure that's the right EXPLAIN ANALYZE?

-Andy


On 03/31/2017 08:18 PM, Andrew Gaydos wrote:
Thanks for the help!

I originally tried putting everything into a single non-partitioned table but 
the performance was horrible! Since each set of 2.3M rows shares the same 
timestamp, I thought this would be a good way to divide up the data when 
partitioning - I set a constraint on each, e.g.

table 1: constraint: timeid=101
table 2: constraint: timeid=102
etc.
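
In DDL terms each partition looks roughly like this (a sketch using the 
9.5-style inheritance scheme; flow and flow_101 are placeholder names, since 
the actual table names aren't shown here):

    -- One child table per timestamp.  The CHECK constraint is what lets
    -- the planner skip this child when the query asks for a different timeid.
    CREATE TABLE flow_101 (
        CHECK (timeid = 101)
    ) INHERITS (flow);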

I could try grouping times into a single table, e.g.

table 1: constraint: 100 <= timeid < 110
table 2: constraint: 110 <= timeid < 120
etc.

so that would give me 1,000 partitions of about 23 million rows each.
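
The grouped version would just carry a range CHECK instead (same placeholder 
names as above):

    -- Ten timestamps per child; exclusion still works because the planner
    -- can prove a literal timeid falls inside exactly one range.
    CREATE TABLE flow_100_109 (
        CHECK (timeid >= 100 AND timeid < 110)
    ) INHERITS (flow);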

Is this what you were suggesting? What do you think the optimal balance of 
partitions and rows would be? 100 partitions of 230 million rows each? 10 
partitions of 2.3 billion rows each? At some point I think I would run into the 
insufferable performance I was getting with a single table, though.

Actually, now that I check, the number of partitions is closer to 17,000, and 
the number of rows per partition is 2.7M, so about 46 billion rows altogether...
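
For an exact count, this is one way to ask the catalogs, assuming the parent 
table is named flow:

    -- pg_inherits records every parent/child pair in an inheritance tree.
    SELECT count(*) FROM pg_inherits WHERE inhparent = 'flow'::regclass;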

Thanks again!
-Andy

On Fri, Mar 31, 2017 at 6:15 PM, Andy Colson <a...@squeakycode.net> wrote:

    On 03/31/2017 11:38 AM, Andrew Gaydos wrote:

        Hi,



        My questions are

         1. It seems that for every session, there is a one-time penalty for
            the first query (several minutes), after which queries tend to
            run much quicker (about 10 seconds for all the tiles to be
            served). What is going on here?
         2. Is there a way to optimize GeoServer's queries against this
            schema, or a more efficient query to try?
         3. Other postgres optimizations that might help?

        I'm pretty new to both GeoServer and PostGIS and have a sinking
        feeling that I could be structuring this dataset and queries more
        efficiently, but I've run out of ideas and don't have any postgres
        experts at work to ask, so I'm posting here.

        Thanks for any insight!

        -Andy


    Andys unite!

    err.. anyway, here is the problem:

        data table: (10,000 partitions, each with 2.3 million rows)



    Lots of partitions will kill planning time. Look at the very bottom of:
    https://www.postgresql.org/docs/9.5/static/ddl-partitioning.html
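
    You can see the planning cost directly on your own data, too.  A sketch, 
    assuming the parent table is named flow:

        -- Since 9.4, EXPLAIN ANALYZE reports "Planning time" separately from
        -- "Execution time"; with thousands of children the planning line is
        -- usually the one that explodes.
        EXPLAIN ANALYZE SELECT count(*) FROM flow WHERE timeid = 101;

    That may also be part of the one-time penalty per session: the first 
    query has to populate the backend's catalog caches for every child table, 
    and later queries in the same session get to reuse them.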

    Do you have your heart set on lots of partitions?  How'd you feel about
    100? Or maybe 1000?

    -Andy

