Hi,

Kai has made a number of interesting improvements to osm2pgsql over the last few weeks. I believe some bits are still work in progress, but on the whole osm2pgsql has become a lot more efficient: it makes better use of cache memory and can even use multiple processes for some tasks. Anyone who regularly spends time waiting for osm2pgsql to complete is encouraged to check out a recent version from svn and see whether that improves things for them.

I think it would be great to share results of osm2pgsql runs among users - how long does it take to import X on infrastructure Y?

I've made a start here, please add/modify as you see fit:

http://wiki.openstreetmap.org/wiki/Osm2pgsql/Benchmarks

There's one particular use case that osm2pgsql did not cover so well in the past - the "I don't want to apply updates but I need to use slim mode nonetheless because I don't have enough memory for non-slim" use case.

osm2pgsql is not very well suited for this because it puts all its temporary information into the database instead of a more efficient random-access structure. This is something I'll leave for someone else to fix, but I did one thing to make this use case a bit better: I introduced a "--drop" flag that makes osm2pgsql drop the temporary tables after import, and that also skips creating the indexes on way id and relation id that a --slim import normally creates. So now, after importing a data set with --drop and --slim, you should have a database that looks almost the same as one imported without --slim. Because the unnecessary tables and indexes are gone, the database is usually only 25% of the size of a complete --slim import (but of course it is unsuitable for updates).
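For anyone who wants to script such an import, for example to time it for the benchmarks page, here is a minimal Python sketch. The only osm2pgsql options it uses are the --slim and --drop flags described above. The helper names and the input file name are made up for illustration, and any database options your setup needs are left out.

```python
import subprocess
import time


def build_import_command(input_file):
    """Build the argument list for a one-off, no-updates slim import.

    Only --slim and --drop are taken from this mail; add whatever
    database options your own setup requires.
    """
    return ["osm2pgsql", "--slim", "--drop", input_file]


def timed_import(input_file):
    """Run the import and return the wall-clock duration in seconds.

    Hypothetical helper: it assumes osm2pgsql is on the PATH and that
    the default database settings are what you want.
    """
    start = time.monotonic()
    subprocess.run(build_import_command(input_file), check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # "extract.osm.bz2" is a placeholder file name, not a real data set.
    print(" ".join(build_import_command("extract.osm.bz2")))
```

Calling timed_import() on a real extract gives a number you could add to the wiki page together with a description of your hardware.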

There's one strange thing I noticed. When I dropped the creation of indexes (more precisely, primary keys) on way id and polygon id, suddenly osm2pgsql took ages to run - even though these indexes are clearly not created in non-slim mode and therefore should not be required.

I found out that the culprit is in the multipolygon code, where after finding out that a one-way outer ring is tagged the same as the multipolygon relation itself, a "delete_way_from_output" is issued, presumably to remove that already-generated ring. This leads to a "DELETE from <table> where osm_id=<id>", which requires a table scan because the primary keys are missing.
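To illustrate why the missing key hurts so much, here is a small self-contained Python sketch. It uses SQLite from the standard library as a stand-in, not PostgreSQL, and a plain index rather than a primary key. The table and index names are invented for the demo. The point is only the general effect: a DELETE by osm_id has to scan the whole table unless there is an index on that column.

```python
import sqlite3


def delete_plan(conn, table):
    """Return SQLite's query-plan text for a DELETE by osm_id."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN DELETE FROM {table} WHERE osm_id = ?", (1,)
    ).fetchall()
    # The human-readable plan description is the last column of each row.
    return " | ".join(row[-1] for row in rows)


# In-memory stand-in for a rendering table; not the real osm2pgsql schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_polygon (osm_id INTEGER, tags TEXT)")
conn.executemany(
    "INSERT INTO demo_polygon VALUES (?, ?)",
    [(i, "natural=water") for i in range(1000)],
)

# Without an index, the planner can only walk every row of the table.
plan_without_index = delete_plan(conn, "demo_polygon")

# With an index on osm_id, the same DELETE becomes an index lookup.
conn.execute("CREATE INDEX demo_polygon_osm_id ON demo_polygon (osm_id)")
plan_with_index = delete_plan(conn, "demo_polygon")

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan differs between SQLite versions, but the first plan is a scan of the table and the second is a search that uses the index. On a table the size of a planet import, one such scan per deleted ring adds up quickly.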

I have now disabled this for --slim --drop mode (the change will not affect normal --slim mode), but I have to investigate further. This will likely create some extra areas for outer rings, but since non-slim mode doesn't have these indexes either, it should exhibit the same behaviour.

Is anyone aware of multipolygon handling not working right when not using --slim? We might have to (re)introduce the primary key for osm_id at least on the polygon table to allow this deletion of duplicate areas.


_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
