Hello everyone, I just want to report the outcomes of my latest performance research.
Postgres 12.1 and Postgres 12.2 are both restrictively slow in performing our geometry processing queries - no matter what postgis version (3.0.0, 2.5) I use and how the raw data came into the db. Setting jit on or off and/or allow parallel workers and ANALYZE before the queries hasn’t a significant effect on my test query. I installed Postgres 10.4 with postgis 2.4 on the same machine on a separate cluster and it runs the queries in fraction of time. See explain analyze output below. I haven’t tested the newly released postgis 3.0.1 since there is no easy installation so far (as long as I see). As long as I don’t see a clear way to break it down further we will stay with postgres 10.4. Regards, Stefan ————————— NEW SETUP WITH POSTGRES 12.4, POSTGIS 2.4 EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ProjectSet (cost=272668581.44..3513224466.44 rows=612003000000 width=65) (actual time=56997.276..91803.763 rows=739733 loops=1) -> GroupAggregate (cost=272668581.44..297148701.44 rows=612003000 width=65) (actual time=56997.263..69736.892 rows=611704 loops=1) Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in -> Sort (cost=272668581.44..274198588.94 rows=612003000 width=65) (actual time=56997.204..61484.384 rows=708178 loops=1) Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in Sort Method: external merge Disk: 4087800kB -> Result (cost=0.00..162527467.79 rows=612003000 width=65) (actual time=6.103..44923.073 rows=708178 loops=1) -> ProjectSet (cost=0.00..3406687.79 rows=612003000 width=65) (actual time=6.095..44633.240 rows=708178 loops=1) -> Seq Scan on osm_admin_areas (cost=0.00..190612.03 rows=612003 width=6983) (actual time=0.103..4156.204 rows=611707 loops=1) Planning time: 4.031 ms Execution time: 193917.392 ms (11 rows) > Am 20.02.2020 um 14:49 schrieb Stefan Duling <stefan.dul...@mapz.com>: > > Hi Imre, > > thanks for showing the other discussions. It really seems like it’s a very > similar problem. > > I tried: > > - jit = off > - max_parallel_workers_per_gather = 0 > - jit = off with max_parallel_workers_per_gather = 0 > - ANALYZE osm_admin_areas; before the query > > Sadly those options have not the effect to bring the query time close to the > old setup with postgres 10 > > Next ideas I plan to test: > - There is one further difference between the 2 setups: I use Imposm to > import the data. On the old setup (postgres10) I use v0.8.1, on the new setup > (postgres12) I use v0.10.0. Although the geometry column contains exactly the > same data in both setups, it could be worth to test it. Imposm was used by > the others in your linked discussions too. > - Try to upgrade Postgres to 12.2 > > Regards, > Stefan > > >> Am 20.02.2020 um 13:08 schrieb Imre Samu <pella.s...@gmail.com >> <mailto:pella.s...@gmail.com>>: >> >> Hi Stefan, >> >> > I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any >> > significant differences. >> >> Please re-test with: >> - "ANALYZE osm_admin_areas; " AND " jit=off >> max_parallel_workers_per_gather=0" parameters >> based on similar problem: >> https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252 >> >> <https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252> >> >> And check the debugging thread: >> https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org >> >> <https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org> >> ( maybe there is a fix ) >> >> -------- >> >> And is it possible to upgrade to PG12.2 ( from 12.1 ) ? >> - There are lot of fixes : https://www.postgresql.org/docs/release/12.2/ >> <https://www.postgresql.org/docs/release/12.2/> >> imho: your old database: "Postgres 10.12" is the latest PG10 ; and it is >> correct to compare with the latest PG12 ) >> >> regards, >> Imre >> >> >> >> >> Stefan Duling <stefan.dul...@mapz.com <mailto:stefan.dul...@mapz.com>> ezt >> írta (időpont: 2020. febr. 20., Cs, 11:36): >> Hi Paul, >> >> thanks for your response! >> >> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any >> significant differences. >> I also tried to use only one worker per gatherer. Although there is a little >> improvement, the query time still is doubled compared to the old setup with >> Postgres 10. >> >> Almost all geometry related queries of my data processing pipeline need >> multiple calculation times while indexing and clustering is faster with >> Postgres 12. >> >> I continue in trying to break it down further. >> >> Regards, >> >> Stefan >> >> ————————— >> >> NEW SETUP WITH POSTGIS 2.5 >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ProjectSet (cost=40703445.82..43928566.78 rows=609088000 width=65) (actual >> time=51587.707..506619.624 rows=750429 loops=1) >> -> GroupAggregate (cost=40703445.82..40727809.34 rows=609088 width=65) >> (actual time=51587.663..67802.117 rows=611704 loops=1) >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> osm_admin_areas.iso, osm_admin_areas.is_in >> -> Sort (cost=40703445.82..40704968.54 rows=609088 width=65) >> (actual time=51586.580..57029.848 rows=708178 loops=1) >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> osm_admin_areas.iso, osm_admin_areas.is_in >> Sort Method: external merge Disk: 4087808kB >> -> Gather (cost=1000.00..40638832.88 rows=609088 width=65) >> (actual time=1240.826..35018.389 rows=708178 loops=1) >> Workers Planned: 4 >> Workers Launched: 4 >> -> Result (cost=0.00..40576924.08 rows=152272000 >> width=65) (actual time=988.020..29198.984 rows=141636 loops=5) >> -> ProjectSet (cost=0.00..986204.08 >> rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636 loops=5) >> -> Parallel Seq Scan on osm_admin_areas >> (cost=0.00..186014.72 rows=152272 width=6545) (actual time=0.369..5959.617 >> rows=122341 loops=5) >> Planning Time: 14.213 ms >> JIT: >> Functions: 110 >> Options: Inlining true, Optimization true, Expressions true, Deforming >> true >> Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization 2369.206 >> ms, Emission 1896.395 ms, Total 4954.282 ms >> Execution Time: 620994.272 ms >> (18 rows) >> >> ————————— >> >> NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1 >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> QUERY PLAN >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ProjectSet (cost=95352014.80..98577135.76 rows=609088000 width=65) (actual >> time=51995.515..493885.155 rows=750429 loops=1) >> -> GroupAggregate (cost=95352014.80..95376378.32 rows=609088 width=65) >> (actual time=51995.438..66093.453 rows=611704 loops=1) >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> osm_admin_areas.iso, osm_admin_areas.is_in >> -> Sort (cost=95352014.80..95353537.52 rows=609088 width=65) >> (actual time=51995.267..55931.535 rows=708178 loops=1) >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> osm_admin_areas.iso, osm_admin_areas.is_in >> Sort Method: external merge Disk: 4087808kB >> -> Gather (cost=1000.00..95287401.86 rows=609088 width=65) >> (actual time=969.364..38077.753 rows=708178 loops=1) >> Workers Planned: 1 >> Workers Launched: 1 >> -> Result (cost=0.00..95225493.06 rows=358287000 >> width=65) (actual time=755.966..34508.959 rows=354089 loops=2) >> -> ProjectSet (cost=0.00..2070873.06 >> rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089 loops=2) >> -> Parallel Seq Scan on osm_admin_areas >> (cost=0.00..188074.87 rows=358287 width=6545) (actual time=4.870..4186.864 >> rows=305854 loops=2) >> Planning Time: 0.310 ms >> JIT: >> Functions: 53 >> Options: Inlining true, Optimization true, Expressions true, Deforming >> true >> Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization 789.374 >> ms, Emission 455.139 ms, Total 1490.353 ms >> Execution Time: 602556.200 ms >> (18 rows) >> >> >> >> > Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pram...@cleverelephant.ca >> > <mailto:pram...@cleverelephant.ca>>: >> > >> > If you can cut this down to a smaller, shorter query that shows the same >> > characteristics, I would like to see it in a profiler to determine if the >> > hot spots have moved. The explain seems more or less structurally the >> > same, which leaves a couple possibilities: >> > - the st_collect running in parallel is actually an antipattern, (there’s >> > no performance benefit in the collect itself, so the only potential win is >> > in allowing other things under the collect to go parallel) >> > - one of the other functions in your query has gotten a lot hotter >> > The external sort is the same size in both cases, so that’s not the issue. >> > You can probably confirm if parallelism is the problem by just turning it >> > off in pg12 and re-running, see if things get better. set >> > max_workers_per_gather to 1 or one of the other parallel config options. >> > Thanks for gathering data, >> > P >> > >> >> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.dul...@mapz.com >> >> <mailto:stefan.dul...@mapz.com>> wrote: >> >> >> >> Hi everyone, >> >> >> >> at the moment I am trying out a new setup for our geo database server. >> >> Primarily I upgraded from Postgres and Postgis to newer versions. Beside >> >> of the memory both databases run on the same machine setups and contain >> >> identical data. Differences are: >> >> >> >> OLD SETUP >> >> >> >> 16GB RAM >> >> Postgres 10.12 >> >> Postgis 2.4 >> >> shared_buffers = 4GB >> >> work_mem = 128MB >> >> maintenance_work_mem = 1536MB >> >> >> >> NEW SETUP >> >> >> >> 32GB RAM >> >> Postgres 12.1 >> >> Postgis 3.0 >> >> shared_buffers = 6GB >> >> work_mem = 256MB >> >> maintenance_work_mem = 3GB >> >> >> >> >> >> Sadly I observe several queries that take multiple times to execute with >> >> the new setup. While the most expensive query took ~10h on the old setup, >> >> nearly 3 days are needed on the new setup. Currently I am analysing a >> >> more lightweight query, that execution time doubled with the new setup. >> >> >> >> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 / >> >> Postgis 3 slower in general? Is the use of parallel workers ineffective >> >> in my use case? >> >> >> >> Thanks in advance! >> >> >> >> Stefan Duling >> >> >> >> ————————— >> >> >> >> OLD SETUP >> >> >> >> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS >> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> >> >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> ProjectSet (cost=274906731.77..3540274986.77 rows=616689000000 width=65) >> >> (actual time=97024.487..141100.410 rows=739733 loops=1) >> >> -> GroupAggregate (cost=274906731.77..299574291.77 rows=616689000 >> >> width=65) (actual time=97024.476..118509.455 rows=611704 loops=1) >> >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name >> >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> >> osm_admin_areas.iso, osm_admin_areas.is_in >> >> -> Sort (cost=274906731.77..276448454.27 rows=616689000 >> >> width=65) (actual time=97024.431..108902.529 rows=708178 loops=1) >> >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name >> >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> >> osm_admin_areas.iso, osm_admin_areas.is_in >> >> Sort Method: external merge Disk: 4087720kB >> >> -> Result (cost=0.00..163888355.59 rows=616689000 >> >> width=65) (actual time=42.029..84168.985 rows=708178 loops=1) >> >> -> ProjectSet (cost=0.00..3549215.58 rows=616689000 >> >> width=65) (actual time=42.019..83866.290 rows=708178 loops=1) >> >> -> Seq Scan on osm_admin_areas >> >> (cost=0.00..308514.89 rows=616689 width=6931) (actual >> >> time=16.288..9536.779 rows=611707 loops=1) >> >> Planning time: 0.364 ms >> >> Execution time: 244169.613 ms >> >> (11 rows) >> >> >> >> ————————— >> >> >> >> NEW SETUP >> >> >> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS >> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, >> >> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry >> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, >> >> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo >> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in; >> >> >> >> QUERY PLAN >> >> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> ProjectSet (cost=390996247.38..409798165.99 rows=620679000 width=65) >> >> (actual time=38272.152..512018.424 rows=750429 loops=1) >> >> -> GroupAggregate (cost=390996247.38..391174692.59 rows=620679 >> >> width=65) (actual time=38272.119..58302.727 rows=611704 loops=1) >> >> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name >> >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> >> osm_admin_areas.iso, osm_admin_areas.is_in >> >> -> Sort (cost=390996247.38..390997799.08 rows=620679 width=65) >> >> (actual time=38272.021..46632.706 rows=708178 loops=1) >> >> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name >> >> <http://osm_admin_areas.name/>, osm_admin_areas.admin_level, >> >> osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, >> >> osm_admin_areas.iso, osm_admin_areas.is_in >> >> Sort Method: external merge Disk: 4087328kB >> >> -> Gather (cost=1000.00..390930320.45 rows=620679 >> >> width=65) (actual time=1035.032..23429.953 rows=708178 loops=1) >> >> Workers Planned: 4 >> >> Workers Launched: 4 >> >> -> Result (cost=0.00..390867252.55 rows=155170000 >> >> width=65) (actual time=817.294..20634.253 rows=141636 loops=5) >> >> -> ProjectSet (cost=0.00..1390552.55 >> >> rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636 >> >> loops=5) >> >> -> Parallel Seq Scan on osm_admin_areas >> >> (cost=0.00..226001.70 rows=155170 width=6702) (actual >> >> time=0.381..3937.298 rows=122341 loops=5) >> >> Planning Time: 13.173 ms >> >> JIT: >> >> Functions: 110 >> >> Options: Inlining true, Optimization true, Expressions true, Deforming >> >> true >> >> Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization 2265.257 >> >> ms, Emission 1222.527 ms, Total 4096.260 ms >> >> Execution Time: 629042.211 ms >> >> (18 rows) >> >> _______________________________________________ >> >> postgis-users mailing list >> >> postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org> >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users> >> > >> > _______________________________________________ >> > postgis-users mailing list >> > postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org> >> > https://lists.osgeo.org/mailman/listinfo/postgis-users >> > <https://lists.osgeo.org/mailman/listinfo/postgis-users> >> >> _______________________________________________ >> postgis-users mailing list >> postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org> >> https://lists.osgeo.org/mailman/listinfo/postgis-users >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>_______________________________________________ >> postgis-users mailing list >> postgis-users@lists.osgeo.org <mailto:postgis-users@lists.osgeo.org> >> https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/postgis-users