On Saturday 28 July 2012 13:39:48, Jukka Rahkonen wrote:
> Even Rouault <even.rouault <at> mines-paris.org> writes:
> > I've committed in r24707 a change that is mainly a custom indexation
> > mechanism for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to
> > improve performance (by about a factor of 2 on a 1 GB PBF
> > on my PC)
>
> I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit
> binaries containing that change. Conversion of the Finnish OSM data with
> ogr2ogr and the default osmconf.ini into Spatialite format took about 5
> minutes and it was a minute or two faster than it used to be. Conversion
> of German data took 17 hours and it was about as slow as before.

Yes, the performance improvement isn't so obvious when I/O is the
limiting factor.

However, the conversion of germany.osm.pbf seemed very slow on your PC.
After testing on mine, it takes ~9 hours, which still seemed too slow,
since converting the full planet-latest.osm.pbf (17 GB) into "null" (a
debug output driver, not compiled by default, that doesn't write
anything) took ~30 h (which, judging by
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't
particularly bad).

After investigation, most of the slowdown is due to the building of the
spatial index of the output Spatialite DB. When the spatial index is
created at DB initialization and updated at each feature insertion,
performance is clearly affected. For example, when adding
-lco SPATIAL_INDEX=NO to the command line, the conversion of germany
only takes 2 hours. Manually adding the spatial index at the end with

ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')"

(and the same for lines, polygons, multilines, multipolygons,
other_relations) takes ~22 minutes, so overall this is about 4 times
faster than the ~9 hours above.

In r24715, I've implemented deferred spatial index creation, and indeed
the whole process now takes ~2h20.

> I guess it may be the output to spatialite format that gets so slow when
> database size gets bigger. CPU usage was only couple of percents during
> the last 10 hours and process took only 100-200 MB of memory.
> What other output format could you recommend for testing?

I don't think the output format would change performance so much. What
takes time is disk seeking to get nodes to build way geometries, or to
get ways to build multi geometries. So having RAID disks might help.
Writing the output data might certainly reduce the efficiency of OS I/O
caching, but unless an output format is particularly verbose compared to
others, that should have little influence.

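For builds older than r24715, the manual two-step approach described earlier (convert with -lco SPATIAL_INDEX=NO, then call CreateSpatialIndex once per layer) could be scripted roughly as below. This is only a sketch under assumptions: the layer names are copied from the list in this message and may not match what your build actually produces (check with `ogrinfo the.db`), the file names are placeholders, and the helper function names are made up for illustration. The script only prints the commands; it does not run them.

```python
import shlex

# Layer names as listed in the message above (assumption: verify them
# against the real output DB with `ogrinfo the.db` before relying on them).
LAYERS = ["points", "lines", "polygons", "multilines",
          "multipolygons", "other_relations"]


def conversion_command(pbf_path, db_path):
    """Build the ogr2ogr call that writes a Spatialite DB with the
    spatial index disabled, so it is not updated on every insert."""
    return [
        "ogr2ogr",
        "-f", "SQLite",
        "-dsco", "SPATIALITE=YES",
        "-lco", "SPATIAL_INDEX=NO",
        db_path,
        pbf_path,
    ]


def index_commands(db_path, layers=LAYERS, geom_column="GEOMETRY"):
    """Build one ogrinfo call per layer that creates its spatial index."""
    return [
        ["ogrinfo", db_path, "-sql",
         f"SELECT CreateSpatialIndex('{layer}', '{geom_column}')"]
        for layer in layers
    ]


def build_plan(pbf_path, db_path):
    """Conversion first, then all index creations, in execution order."""
    return [conversion_command(pbf_path, db_path)] + index_commands(db_path)


# Dry run: print the commands in a form that can be pasted into a shell.
for cmd in build_plan("germany.osm.pbf", "the.db"):
    print(shlex.join(cmd))
```

To actually execute the plan, each list can be passed to subprocess.run(cmd, check=True). With r24715 or later this should not be needed, since the index creation is deferred automatically.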
What can speed things up is to have lots of RAM and to specify a huge
value for OSM_MAX_TMPFILE_SIZE. Typically this would be 4 times the size
of the PBF. However, if the temp file(s) don't fit entirely into that
size, this will not bring any advantage.

> -Jukka Rahkonen-

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev