Hi Stefan, > I haven’t tested the newly released postgis 3.0.1 since there is no easy installation so far (as long as I see).
If you can create a minimal reproducible example [ https://en.wikipedia.org/wiki/Minimal_working_example ] then the core developers can easily replicate, debug and fix this issue. ( on the github ? ; with some docker + docker-compose scripts. ) In the last weeks - some new postgis docker images created - and imho this is perfect for testing. ( images: https://hub.docker.com/r/postgis/postgis/tags repo: https://github.com/postgis/docker-postgis ) postgis/postgis:10-2.5 postgis/postgis:10-3.0 postgis/postgis:12-2.5 postgis/postgis:12-3.0 Regards, Imre Stefan Duling <stefan.dul...@mapz.com> ezt írta (időpont: 2020. febr. 25., K, 10:28): > Hello everyone, > > I just want to report the outcomes of my latest performance research. > > Postgres 12.1 and Postgres 12.2 are both restrictively slow in performing > our geometry processing queries - no matter what postgis version (3.0.0, > 2.5) I use and how the raw data came into the db. Setting jit on or off > and/or allow parallel workers and ANALYZE before the queries hasn’t a > significant effect on my test query. > > I installed Postgres 10.4 with postgis 2.4 on the same machine on a > separate cluster and it runs the queries in fraction of time. See explain > analyze output below. > > I haven’t tested the newly released postgis 3.0.1 since there is no easy > installation so far (as long as I see). > As long as I don’t see a clear way to break it down further we will stay > with postgres 10.4. > > Regards, > > Stefan > > ————————— > NEW SETUP WITH POSTGRES 12.4, POSTGIS 2.4 > > > EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS > SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, > ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry > FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, > is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo > GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; > > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ProjectSet (cost=272668581.44..3513224466.44 rows=612003000000 width=65) > (actual time=56997.276..91803.763 rows=739733 loops=1) > -> GroupAggregate (cost=272668581.44..297148701.44 rows=612003000 > width=65) (actual time=56997.263..69736.892 rows=611704 loops=1) > Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, > osm_admin_areas.admin_level, osm_admin_areas.name_en, > osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, > osm_admin_areas.is_in > -> Sort (cost=272668581.44..274198588.94 rows=612003000 > width=65) (actual time=56997.204..61484.384 rows=708178 loops=1) > Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, > osm_admin_areas.admin_level, osm_admin_areas.name_en, > osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, > osm_admin_areas.is_in > Sort Method: external merge Disk: 4087800kB > -> Result (cost=0.00..162527467.79 rows=612003000 > width=65) (actual time=6.103..44923.073 rows=708178 loops=1) > -> ProjectSet (cost=0.00..3406687.79 rows=612003000 > width=65) (actual time=6.095..44633.240 rows=708178 loops=1) > -> Seq Scan on osm_admin_areas > (cost=0.00..190612.03 rows=612003 width=6983) (actual time=0.103..4156.204 > rows=611707 loops=1) > Planning time: 4.031 ms > Execution time: 193917.392 ms > (11 rows) > > Am 20.02.2020 um 14:49 schrieb Stefan Duling <stefan.dul...@mapz.com>: > > Hi Imre, > > thanks for showing the other discussions. It really seems like it’s a very > similar problem. > > I tried: > > - jit = off > - max_parallel_workers_per_gather = 0 > - jit = off with max_parallel_workers_per_gather = 0 > - ANALYZE osm_admin_areas; before the query > > Sadly those options have not the effect to bring the query time close to > the old setup with postgres 10 > > Next ideas I plan to test: > - There is one further difference between the 2 setups: I use Imposm to > import the data. On the old setup (postgres10) I use v0.8.1, on the new > setup (postgres12) I use v0.10.0. Although the geometry column contains > exactly the same data in both setups, it could be worth to test it. Imposm > was used by the others in your linked discussions too. > - Try to upgrade Postgres to 12.2 > > Regards, > Stefan > > > Am 20.02.2020 um 13:08 schrieb Imre Samu <pella.s...@gmail.com>: > > Hi Stefan, > > > I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any > significant differences. > > Please re-test with: > - "ANALYZE osm_admin_areas; " AND " jit=off > max_parallel_workers_per_gather=0" parameters > based on similar problem: > https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252 > > And check the debugging thread: > https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org > ( > maybe there is a fix ) > > -------- > > And is it possible to upgrade to PG12.2 ( from 12.1 ) ? > - There are lot of fixes : https://www.postgresql.org/docs/release/12.2/ > imho: your old database: * "Postgres 10.12"* is the latest PG10 ; and it > is correct to compare with the latest PG12 ) > > regards, > Imre > > > > > Stefan Duling <stefan.dul...@mapz.com> ezt írta (időpont: 2020. febr. > 20., Cs, 11:36): > >> Hi Paul, >> >> thanks for your response! >> >> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any >> significant differences. >> I also tried to use only one worker per gatherer. Although there is a >> little improvement, the query time still is doubled compared to the old >> setup with Postgres 10. >> >> Almost all geometry related queries of my data processing pipeline need >> multiple calculation times while indexing and clustering is faster with >> Postgres 12. >> >> I continue in trying to break it down further. >> >> Regards, >> >> Stefan >> >> ————————— >> >> NEW SETUP WITH POSTGIS 2.5 >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ProjectSet (cost=40703445.82..43928566.78 rows=609088000 width=65) >> (actual time=51587.707..506619.624 rows=750429 loops=1) >> -> GroupAggregate (cost=40703445.82..40727809.34 rows=609088 >> width=65) (actual time=51587.663..67802.117 rows=611704 loops=1) >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> -> Sort (cost=40703445.82..40704968.54 rows=609088 width=65) >> (actual time=51586.580..57029.848 rows=708178 loops=1) >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> Sort Method: external merge Disk: 4087808kB >> -> Gather (cost=1000.00..40638832.88 rows=609088 >> width=65) (actual time=1240.826..35018.389 rows=708178 loops=1) >> Workers Planned: 4 >> Workers Launched: 4 >> -> Result (cost=0.00..40576924.08 rows=152272000 >> width=65) (actual time=988.020..29198.984 rows=141636 loops=5) >> -> ProjectSet (cost=0.00..986204.08 >> rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636 >> loops=5) >> -> Parallel Seq Scan on >> osm_admin_areas (cost=0.00..186014.72 rows=152272 width=6545) (actual >> time=0.369..5959.617 rows=122341 loops=5) >> Planning Time: 14.213 ms >> JIT: >> Functions: 110 >> Options: Inlining true, Optimization true, Expressions true, Deforming >> true >> Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization >> 2369.206 ms, Emission 1896.395 ms, Total 4954.282 ms >> Execution Time: 620994.272 ms >> (18 rows) >> >> ————————— >> >> NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1 >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ProjectSet (cost=95352014.80..98577135.76 rows=609088000 width=65) >> (actual time=51995.515..493885.155 rows=750429 loops=1) >> -> GroupAggregate (cost=95352014.80..95376378.32 rows=609088 >> width=65) (actual time=51995.438..66093.453 rows=611704 loops=1) >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> -> Sort (cost=95352014.80..95353537.52 rows=609088 width=65) >> (actual time=51995.267..55931.535 rows=708178 loops=1) >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> Sort Method: external merge Disk: 4087808kB >> -> Gather (cost=1000.00..95287401.86 rows=609088 >> width=65) (actual time=969.364..38077.753 rows=708178 loops=1) >> Workers Planned: 1 >> Workers Launched: 1 >> -> Result (cost=0.00..95225493.06 rows=358287000 >> width=65) (actual time=755.966..34508.959 rows=354089 loops=2) >> -> ProjectSet (cost=0.00..2070873.06 >> rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089 >> loops=2) >> -> Parallel Seq Scan on >> osm_admin_areas (cost=0.00..188074.87 rows=358287 width=6545) (actual >> time=4.870..4186.864 rows=305854 loops=2) >> Planning Time: 0.310 ms >> JIT: >> Functions: 53 >> Options: Inlining true, Optimization true, Expressions true, Deforming >> true >> Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization >> 789.374 ms, Emission 455.139 ms, Total 1490.353 ms >> Execution Time: 602556.200 ms >> (18 rows) >> >> >> >> > Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pram...@cleverelephant.ca>: >> > >> > If you can cut this down to a smaller, shorter query that shows the >> same characteristics, I would like to see it in a profiler to determine if >> the hot spots have moved. The explain seems more or less structurally the >> same, which leaves a couple possibilities: >> > - the st_collect running in parallel is actually an antipattern, >> (there’s no performance benefit in the collect itself, so the only >> potential win is in allowing other things under the collect to go parallel) >> > - one of the other functions in your query has gotten a lot hotter >> > The external sort is the same size in both cases, so that’s not the >> issue. >> > You can probably confirm if parallelism is the problem by just turning >> it off in pg12 and re-running, see if things get better. set >> max_workers_per_gather to 1 or one of the other parallel config options. >> > Thanks for gathering data, >> > P >> > >> >> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.dul...@mapz.com> >> wrote: >> >> >> >> Hi everyone, >> >> >> >> at the moment I am trying out a new setup for our geo database server. >> Primarily I upgraded from Postgres and Postgis to newer versions. Beside of >> the memory both databases run on the same machine setups and contain >> identical data. Differences are: >> >> >> >> OLD SETUP >> >> >> >> 16GB RAM >> >> Postgres 10.12 >> >> Postgis 2.4 >> >> shared_buffers = 4GB >> >> work_mem = 128MB >> >> maintenance_work_mem = 1536MB >> >> >> >> NEW SETUP >> >> >> >> 32GB RAM >> >> Postgres 12.1 >> >> Postgis 3.0 >> >> shared_buffers = 6GB >> >> work_mem = 256MB >> >> maintenance_work_mem = 3GB >> >> >> >> >> >> Sadly I observe several queries that take multiple times to execute >> with the new setup. While the most expensive query took ~10h on the old >> setup, nearly 3 days are needed on the new setup. Currently I am analysing >> a more lightweight query, that execution time doubled with the new setup. >> >> >> >> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 / >> Postgis 3 slower in general? Is the use of parallel workers ineffective in >> my use case? >> >> >> >> Thanks in advance! >> >> >> >> Stefan Duling >> >> >> >> ————————— >> >> >> >> OLD SETUP >> >> >> >> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS >> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> >> QUERY PLAN >> >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> ProjectSet (cost=274906731.77..3540274986.77 rows=616689000000 >> width=65) (actual time=97024.487..141100.410 rows=739733 loops=1) >> >> -> GroupAggregate (cost=274906731.77..299574291.77 rows=616689000 >> width=65) (actual time=97024.476..118509.455 rows=611704 loops=1) >> >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> >> -> Sort (cost=274906731.77..276448454.27 rows=616689000 >> width=65) (actual time=97024.431..108902.529 rows=708178 loops=1) >> >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> >> Sort Method: external merge Disk: 4087720kB >> >> -> Result (cost=0.00..163888355.59 rows=616689000 >> width=65) (actual time=42.029..84168.985 rows=708178 loops=1) >> >> -> ProjectSet (cost=0.00..3549215.58 >> rows=616689000 width=65) (actual time=42.019..83866.290 rows=708178 loops=1) >> >> -> Seq Scan on osm_admin_areas >> (cost=0.00..308514.89 rows=616689 width=6931) (actual time=16.288..9536.779 >> rows=611707 loops=1) >> >> Planning time: 0.364 ms >> >> Execution time: 244169.613 ms >> >> (11 rows) >> >> >> >> ————————— >> >> >> >> NEW SETUP >> >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> >> QUERY PLAN >> >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> ProjectSet (cost=390996247.38..409798165.99 rows=620679000 width=65) >> (actual time=38272.152..512018.424 rows=750429 loops=1) >> >> -> GroupAggregate (cost=390996247.38..391174692.59 rows=620679 >> width=65) (actual time=38272.119..58302.727 rows=611704 loops=1) >> >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> >> -> Sort (cost=390996247.38..390997799.08 rows=620679 >> width=65) (actual time=38272.021..46632.706 rows=708178 loops=1) >> >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, >> osm_admin_areas.admin_level, osm_admin_areas.name_en, >> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, >> osm_admin_areas.is_in >> >> Sort Method: external merge Disk: 4087328kB >> >> -> Gather (cost=1000.00..390930320.45 rows=620679 >> width=65) (actual time=1035.032..23429.953 rows=708178 loops=1) >> >> Workers Planned: 4 >> >> Workers Launched: 4 >> >> -> Result (cost=0.00..390867252.55 rows=155170000 >> width=65) (actual time=817.294..20634.253 rows=141636 loops=5) >> >> -> ProjectSet (cost=0.00..1390552.55 >> rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636 >> loops=5) >> >> -> Parallel Seq Scan on >> osm_admin_areas (cost=0.00..226001.70 rows=155170 width=6702) (actual >> time=0.381..3937.298 rows=122341 loops=5) >> >> Planning Time: 13.173 ms >> >> JIT: >> >> Functions: 110 >> >> Options: Inlining true, Optimization true, Expressions true, >> Deforming true >> >> Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization >> 2265.257 ms, Emission 1222.527 ms, Total 4096.260 ms >> >> Execution Time: 629042.211 ms >> >> (18 rows) >> >> _______________________________________________ >> >> postgis-users mailing list >> >> postgis-users@lists.osgeo.org >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >> > >> > _______________________________________________ >> > postgis-users mailing list >> > postgis-users@lists.osgeo.org >> > https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> _______________________________________________ >> postgis-users mailing list >> postgis-users@lists.osgeo.org >> https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > https://lists.osgeo.org/mailman/listinfo/postgis-users > > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/postgis-users