Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even Rouault wrote:

Another set of tests with a brand new and quite powerful laptop. Specs for the computer: Intel i7-2760QM @2.4 GHz processor (8 threads), Hitachi Travelstar Z7K320 7200 rpm SATA disk, 8 GB of memory, Windows 7 64-bit, GDAL version r24717, Win64 build from gisinternals.com.

Timings for germany.osm.pbf (1.3 GB)

A) Default settings with the command
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
- reading the data: 67 minutes
- creating spatial indexes: 38 minutes
- total: 105 minutes

B) Using an in-memory Spatialite db for the first step by giving SET OSM_MAX_TMPFILE_SIZE=7000
- reading the data: 16 minutes
- creating spatial indexes: 38 minutes
- total: 54 minutes
Peak memory usage during this conversion was 4.4 GB.

Conclusions
===
* The initial reading of the data is heavily I/O bound. This phase is really fast if there is enough memory for keeping the OSM tempfile in memory, but an SSD disk seems to offer equally good performance.
* Creating spatial indexes for the Spatialite tables is also I/O bound. The hardware sets the speed limit and there are no other tricks for improving the performance. The multi-core CPU is quite idle during this phase, at 10-15% load.
* If the user does not plan to do spatial queries, it may be handy to save some time and create the Spatialite db without spatial indexes by using the -lco SPATIAL_INDEX=NO option.
* Windows disk I/O may be a limiting factor.

I consider that for small OSM datasets the speed starts to be good enough. For me it is about the same whether converting the Finnish OSM data (137 MB in .pbf format) takes 160 or 140 seconds, using the default settings or the in-memory temporary database, respectively.

Interesting findings. An SSD is of course the ideal hardware to get efficient random access to the nodes.

I've just introduced in r24719 a new config option, OSM_COMPRESS_NODES, that can be set to YES. The effect is to use a compression algorithm while storing the temporary node DB. This can compress by a factor of 3 or 4, and helps keep the node DB below the RAM size so that the OS can cache it effectively (at least on Linux). This can be efficient for OSM extracts of the size of a country, but probably not for a planet file. In the case of Germany and France, here's the effect on my PC (SATA disk):

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real 25m34.029s
user 15m11.530s
sys 0m36.470s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real 74m33.077s
user 15m38.570s
sys 1m31.720s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real 7m46.594s
user 7m24.990s
sys 0m11.880s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real 108m48.967s
user 7m47.970s
sys 2m9.310s

I didn't turn it to YES by default, because I'm unsure of the performance impact on SSD. Perhaps you have a chance to test.

I cannot test with an SSD before the weekend, but otherwise the new configuration option really makes a difference in some circumstances.
I have ended up using the following base command in the speed tests:

ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO

Writing into Spatialite is pretty fast with these options, and even your null driver does not seem to be very much faster. What happens after this step (like creating indexes) has nothing to do with the OSM driver.

Test with an Intel i7-2760QM @2.4 GHz processor, 7200 rpm SATA disk and the 1.3 GB input file germany.osm.pbf:
--config OSM_COMPRESS_NODES NO: 67 minutes
--config OSM_COMPRESS_NODES YES: 15 minutes
That means 52 minutes less time, or 4.5 times the speed. Out of curiosity I tried what happens if I do the whole file input/output on a 2.5" external USB 2.0 drive: 19 minutes!

I also made a few tests with an old and much slower Windows computer. Running osm2pgsql with the Finnish OSM data on that machine nowadays takes about 3 hours.

Test with a single Intel Xeon @2.4 GHz processor and the same external USB 2.0 disk as in the previous test:
Input file finland.osm.pbf, 122 MB. Result: 7 minutes for both OSM_COMPRESS_NODES NO and OSM_COMPRESS_NODES YES.
Input file germany.osm.pbf, 1.3 GB. Result: 112 minutes with OSM_COMPRESS_NODES YES.

Conclusions:
* When the input file has reached the limit where disk I/O cannot utilize the cache properly, the OSM_COMPRESS_NODES setting makes a big difference.
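For reference, a sketch of the complete command sequence on Windows that combines the options discussed in this thread (setting the config options as environment variables with SET, as in the examples above; file names are just the ones used in these tests and the combination is untested as a whole):

SET OSM_COMPRESS_NODES=YES
ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO

The same options can also be passed with --config on the command line instead of SET, exactly as in the timings quoted elsewhere in the thread.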
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Selon Rahkonen Jukka jukka.rahko...@mmmtike.fi:

Interesting results. I'll wait a bit for your tests with SSD before turning OSM_COMPRESS_NODES to YES. Even if it doesn't bring clear advantages, I don't think it would hurt a lot, because the extra CPU load introduced by the compression/decompression shouldn't be that high (the compression algorithm used is just Protocol Buffer encoding of the differences in longitude/latitude between consecutive nodes, in chunks of 64 nodes).

Just a word of caution to remind you that the temporary node DB is written in the directory pointed to by the CPL_TMPDIR config option / environment variable if defined; if not defined, in TMPDIR; if not defined, in TEMP; and if not defined, in the current directory from which ogr2ogr is started. On Windows systems the TEMP environment variable is generally defined, so when you test with your external USB drive, it is very likely that the node DB is written in the temporary directory associated with your Windows account.

As far as CPU load is concerned, the conversion is a single-threaded process, so on an 8-core system it is expected to top out at 100 / 8 = 12.5% of the global CPU power. With which hardware configuration and input PBF file do you manage to reach 100% CPU? Is that load constant during the process? I imagine it could change according to the stage of the conversion.

There might be potential for parallelizing some stuff. What comes to mind for now would be PBF decoding (when profiling only the PBF decoding, the gzip decompression is the major CPU user, but I'm not sure it matters that much in a real-life ogr2ogr job) or way resolution (currently we group ways into a batch until 1 million nodes or 75 000 ways have to be resolved, which leads to more efficient searches in the node DB since we can sort the nodes by increasing id and avoid useless seeks). But it is not obvious which of those would lead to increased efficiency. Parallelization also generally requires more RAM if you need work buffers for each thread.
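Given the lookup order described above, the temporary node DB can be forced onto the disk actually being tested by pointing CPL_TMPDIR at it before the run. A minimal sketch, with the drive letter and path chosen here purely as examples:

On Windows:
SET CPL_TMPDIR=E:\osm_tmp
ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF

On Linux:
$ CPL_TMPDIR=/mnt/usbdisk/osm_tmp ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF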
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Hi,

Right, the temporary DB was written on the system disk, so I repeated the test. Now everything was for sure done on the same USB 2.0 disk (reading the pbf, writing the results to Spatialite and handling the temporary DB). It took a bit longer, but the difference was not very big: 26 minutes, vs. 19 minutes when the temporary DB was on the system disk.

CPU stays close to 100% on the single-processor XP machine throughout the conversion. Same thing with a laptop having a two-core processor and running 32-bit Vista: both cores seem to burn at full power.

I made a rough parallelization test by making 4 copies of finland.osm.pbf and running ogr2ogr in four separate windows. This way the total CPU load of the 8 cores stayed around 50%. Result: all four conversions were ready after 3 minutes (45 seconds per conversion), while a single conversion takes 2 minutes. Conclusion: 4 parallel conversions in 3 minutes is much faster than the 8 minutes they would take as serial runs. The 50% CPU load may tell that the speed of the SATA disk is the limiting factor now; a test with an SSD drive should give more information about this. I also tried 6 parallel runs, but that was slower, with all runs ready in 6 minutes, which makes 60 seconds per conversion. With 8 runs the computer jammed when all the progress bars were at 95%.

The result was not a surprise, because my experience of doing image processing with gdalwarp and gdal_translate on an 8-core server is that I can get the maximum throughput with our hardware by running processes in 4-6 windows. If there are too many image conversions going on, they start to disturb each other because the disks cannot serve them all properly. However, that computer behaves better when running conversions in 6 or more windows than this laptop does. Somehow it feels like the laptop has only 4 real processors/cores even though the resource manager shows eight.

I believe that by parallelizing the conversion program itself it would be hard to take the juice as effectively from all the cores. It may be difficult to feed a rendering chain from a bunch of source databases, but it strongly looks like, by splitting Germany into four distinct OSM source files, it would be possible to import the whole country in 15 minutes with a good laptop. The size of the OSM planet file is under 20 GB. A simple calculation suggests that importing the whole planet might be possible in 5 hours. With a laptop. Who will make a try?

-Jukka-
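A sketch of how such a rough parallel test can be launched from a Windows command prompt, assuming four copies of the extract have already been made (the file names are only illustrative); start opens each conversion in its own window, and on Linux the same effect comes from appending & to background each process:

start ogr2ogr -f SQLite -dsco spatialite=yes finland1.sqlite finland1.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO
start ogr2ogr -f SQLite -dsco spatialite=yes finland2.sqlite finland2.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO
start ogr2ogr -f SQLite -dsco spatialite=yes finland3.sqlite finland3.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO
start ogr2ogr -f SQLite -dsco spatialite=yes finland4.sqlite finland4.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO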
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
I made a rough parallelizing test by making 4 copies of finland.osm.pbf and running ogr2ogr in four separate windows. This way the total CPU load of the 8 cores was staying around 50%. Result: All four conversions were ready after 3 minutes (45 seconds per conversion) while a single conversion takes 2 minutes.

In my opinion, 45 seconds per conversion isn't really a good summary: I'd say that your computer could handle 4 conversions in parallel in 3 minutes. But the fact of running the conversions in parallel didn't make them *individually* faster (that would be nonsense) than running a single one. We probably agree; it's just the way of presenting the information that is a bit strange.

Conclusion: 4 parallel conversions in 3 minutes vs. 8 minutes if performed as serial runs is much faster. 50% CPU load may tell that the speed of the SATA disk is the limiting factor now. A test with an SSD drive should give more information about this.

Yes, at some point the disk is the limiting factor whatever the number of CPUs you have.

Somehow it feels like the laptop has only 4 real processors/cores even though the resource manager is showing eight.

I've not followed what the CPU state of the art is currently, but perhaps it is a quad-core with hyper-threading? The hyper-threaded virtual cores wouldn't be as efficient as normal cores.

I believe that by parallelizing the conversion program it is hard to take the juice as effectively from all the cores.

Yes, if you parallelize I/O operations, there's a risk that it actually makes things slower. Only the CPU-intensive operations should be parallelized to limit that risk. But when reading OSM data there isn't that much computation involved. Way resolving is somewhat dumb and mostly about I/O after all. Only the resolving of multipolygons might involve CPU-intensive operations to compute the spatial relations between rings, but that's a tiny amount of the data of an OSM file, and even if it is slow, it is perhaps 10 or 20% of the global conversion time.

It may be difficult to feed a rendering chain by having a bunch of source databases, but it strongly looks like, by splitting Germany into four distinct OSM source files, it would be possible to import the whole country in 15 minutes with a good laptop.

I still maintain that splitting a file is a non-trivial task. I strongly believe that to do so, you must import the whole country and do spatial requests afterwards. So, if the data producer doesn't do it for you, there's no point in doing it at your end. However, if you get it split, then it might indeed be beneficial to operate on smaller extracts (with a risk of some duplicated and/or truncated and/or missing objects at the borders of the tiles).
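A sketch of what "import the whole country and do spatial requests afterwards" could look like with the tools already used in this thread: once germany.sqlite has been built, a sub-area can be cut out with ogr2ogr's -spat bounding-box filter. The output name and the coordinates (roughly a Bavaria-sized lon/lat window) are only illustrative, and a spatial index on the source database obviously helps such a query:

ogr2ogr -f SQLite -dsco spatialite=yes bavaria.sqlite germany.sqlite -spat 10.0 47.2 13.9 50.6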
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even Rouault wrote:

I made a rough parallelizing test by making 4 copies of finland.osm.pbf and running ogr2ogr in four separate windows. This way the total CPU load of the 8 cores was staying around 50%. Result: All four conversions were ready after 3 minutes (45 seconds per conversion) while a single conversion takes 2 minutes.

In my opinion, 45 seconds per conversion isn't really a good summary: I'd say that your computer could handle 4 conversions in parallel in 3 minutes. But the fact of running the conversions in parallel didn't make them *individually* faster (that would be nonsense) than running a single one. We probably agree; it's just the way of presenting the information that is a bit strange.

OK, let's use other units. Some suggestions:
- data processing rate as MB/sec or MB/minute (input file size in pbf format)
- node conversion rate as nodes/sec
- way or feature conversion rate as count/sec

None of them is a perfect speed unit. Nodes/sec feels the most exact, but the practical speed depends on the nature of the data, especially on the amount of relations and how complicated they are. Megabytes of pbf data per minute could be a rather good measure too. In my single process vs. four parallel processes example the conversion rates were 60 MB/minute and 160 MB/minute, respectively. By looking at the file sizes at http://download.geofabrik.de/osm/europe/ one can make a quick estimate that converting the 300 MB of data for Spain should take about 5 minutes. With parallel runs Finland, Sweden and Norway would also be ready at the same time without any extra cost.

..

It may be difficult to feed a rendering chain by having a bunch of source databases, but it strongly looks like, by splitting Germany into four distinct OSM source files, it would be possible to import the whole country in 15 minutes with a good laptop.

I still maintain that splitting a file is a non-trivial task. I strongly believe that to do so, you must import the whole country and do spatial requests afterwards. So, if the data producer doesn't do it for you, there's no point in doing it at your end. However, if you get it split, then it might indeed be beneficial to operate on smaller extracts (with a risk of some duplicated and/or truncated and/or missing objects at the borders of the tiles).

I agree. Splitting OSM data files on the client side was my ancient idea from more than a week ago. It does not make sense nowadays. Data should come split from the data producer. It would need some thinking about how to split the data so that there would not be trouble at the dataset seams. This GSoC project seems to aim at something similar: http://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2012/Data_Tile_Service

-Jukka-
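A rough sanity check of those rates, using only figures quoted in this thread (so the arithmetic is approximate): a single run handled one 137 MB finland.osm.pbf in about 2 minutes, i.e. on the order of 137 / 2 ≈ 70 MB/minute, consistent with the quoted ~60 MB/minute; four parallel runs handled 4 × 137 ≈ 550 MB in about 3 minutes, i.e. roughly 550 / 3 ≈ 180 MB/minute, in the same ballpark as the quoted 160 MB/minute. At ~60 MB/minute, a 300 MB extract such as Spain works out to 300 / 60 = 5 minutes, matching the estimate above.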
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Another set of tests with a brand new and quite powerful laptop. Specs for the computer: Intel i7-2760QM @2.4 GHz processor (8 threads), Hitachi Travelstar Z7K320 7200 rpm SATA disk, 8 GB of memory, Windows 7 64-bit, GDAL version r24717, Win64 build from gisinternals.com.

Timings for germany.osm.pbf (1.3 GB)

A) Default settings with the command
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
- reading the data: 67 minutes
- creating spatial indexes: 38 minutes
- total: 105 minutes

B) Using an in-memory Spatialite db for the first step by giving SET OSM_MAX_TMPFILE_SIZE=7000
- reading the data: 16 minutes
- creating spatial indexes: 38 minutes
- total: 54 minutes
Peak memory usage during this conversion was 4.4 GB.

Conclusions
===
* The initial reading of the data is heavily I/O bound. This phase is really fast if there is enough memory for keeping the OSM tempfile in memory, but an SSD disk seems to offer equally good performance.
* Creating spatial indexes for the Spatialite tables is also I/O bound. The hardware sets the speed limit and there are no other tricks for improving the performance. The multi-core CPU is quite idle during this phase, at 10-15% load.
* If the user does not plan to do spatial queries, it may be handy to save some time and create the Spatialite db without spatial indexes by using the -lco SPATIAL_INDEX=NO option.
* Windows disk I/O may be a limiting factor.

I consider that for small OSM datasets the speed starts to be good enough. For me it is about the same whether converting the Finnish OSM data (137 MB in .pbf format) takes 160 or 140 seconds, using the default settings or the in-memory temporary database, respectively.

Interesting findings. An SSD is of course the ideal hardware to get efficient random access to the nodes.

I've just introduced in r24719 a new config option, OSM_COMPRESS_NODES, that can be set to YES. The effect is to use a compression algorithm while storing the temporary node DB. This can compress by a factor of 3 or 4, and helps keep the node DB below the RAM size so that the OS can cache it effectively (at least on Linux). This can be efficient for OSM extracts of the size of a country, but probably not for a planet file. In the case of Germany and France, here's the effect on my PC (SATA disk):

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real 25m34.029s
user 15m11.530s
sys 0m36.470s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real 74m33.077s
user 15m38.570s
sys 1m31.720s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real 7m46.594s
user 7m24.990s
sys 0m11.880s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real 108m48.967s
user 7m47.970s
sys 2m9.310s

I didn't turn it to YES by default, because I'm unsure of the performance impact on SSD. Perhaps you have a chance to test.
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Jukka Rahkonen jukka.rahkonen at mmmtike.fi writes:

I borrowed my son's computer and made one more test. Important numbers about the computer: Windows 7 64-bit, four-core Intel i5 2500k @3.3 GHz, SSD disk. Timings for germany.osm.pbf using -lco SPATIAL_INDEX=NO (times are total times from the beginning of the test): 70% progress - 5 minutes (I suppose resolving ways begins at about this phase); 100% progress - 17 minutes; manual index creation for all the layers ready - 45 minutes. Results are interesting. Import to Spatialite without indexing was six times faster for me, and I suppose it is mostly because of the SSD drive. But creating the indexes took me 30 minutes while your timing was 22 minutes. Perhaps there is something sub-optimal in the combination of Windows/Spatialite/create spatial index. Anyway, the score for germany.osm.pbf is now 45 minutes. Someone with Linux and an SSD drive is perhaps the one to beat the record. For comparison, converting finland.osm.pbf took 2 min 40 sec. Converting Germany took 20 times more time and I believe that the relation is OK now.

Another set of tests with a brand new and quite powerful laptop. Specs for the computer: Intel i7-2760QM @2.4 GHz processor (8 threads), Hitachi Travelstar Z7K320 7200 rpm SATA disk, 8 GB of memory, Windows 7 64-bit, GDAL version r24717, Win64 build from gisinternals.com.

Timings for germany.osm.pbf (1.3 GB)

A) Default settings with the command
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
- reading the data: 67 minutes
- creating spatial indexes: 38 minutes
- total: 105 minutes

B) Using an in-memory Spatialite db for the first step by giving SET OSM_MAX_TMPFILE_SIZE=7000
- reading the data: 16 minutes
- creating spatial indexes: 38 minutes
- total: 54 minutes
Peak memory usage during this conversion was 4.4 GB.

Conclusions
===
* The initial reading of the data is heavily I/O bound. This phase is really fast if there is enough memory for keeping the OSM tempfile in memory, but an SSD disk seems to offer equally good performance.
* Creating spatial indexes for the Spatialite tables is also I/O bound. The hardware sets the speed limit and there are no other tricks for improving the performance. The multi-core CPU is quite idle during this phase, at 10-15% load.
* If the user does not plan to do spatial queries, it may be handy to save some time and create the Spatialite db without spatial indexes by using the -lco SPATIAL_INDEX=NO option.
* Windows disk I/O may be a limiting factor.

I consider that for small OSM datasets the speed starts to be good enough. For me it is about the same whether converting the Finnish OSM data (137 MB in .pbf format) takes 160 or 140 seconds, using the default settings or the in-memory temporary database, respectively.

-Jukka Rahkonen-
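A sketch of how run B above was set up on Windows, assuming OSM_MAX_TMPFILE_SIZE is given in megabytes (the 7000 value together with the 4.4 GB peak memory figure above suggests that, but this is an assumption rather than something stated in the thread); the value obviously has to fit in RAM alongside everything else:

SET OSM_MAX_TMPFILE_SIZE=7000
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF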
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Le samedi 28 juillet 2012 13:39:48, Jukka Rahkonen a écrit :

Even Rouault even.rouault at mines-paris.org writes:

I've committed in r24707 a change that is mainly a custom indexing mechanism for nodes (it can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve performance (it improves it by about a factor of 2 on a 1 GB PBF on my PC).

I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit binaries containing that change. Conversion of the Finnish OSM data with ogr2ogr and the default osmconf.ini into Spatialite format took about 5 minutes and was a minute or two faster than it used to be. Conversion of the German data took 17 hours and was about as slow as before.

Yes, the performance improvement isn't so obvious when I/O is the limiting factor. However, the performance on germany.osm.pbf seemed very slow on your PC: after testing on mine it takes ~9 hours, which still seemed too slow, since a conversion of the full planet-latest.osm.pbf (17 GB) into null (this is a debug output driver, not compiled by default, that doesn't write anything) has taken ~30 h (which, looking at http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't particularly bad).

After investigation, most of the slowdown is due to the building of the spatial index of the output Spatialite DB. When the spatial index is created at DB initialization and updated at each feature insertion, the performance is clearly affected. For example, when adding -lco SPATIAL_INDEX=NO to the command line, the conversion of Germany only takes 2 hours. Manually adding the spatial index at the end with ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')" (and the same for lines, polygons, multilines, multipolygons, other_relations) takes ~22 minutes, so overall this is 4 times faster. In r24715 I've implemented deferred spatial index creation, and indeed the whole process now takes ~2h20.

I guess it may be the output to Spatialite format that gets so slow when the database size gets bigger. CPU usage was only a couple of percent during the last 10 hours and the process took only 100-200 MB of memory. What other output format could you recommend for testing?

I don't think the output format would change the performance that much. What takes time is disk seeking to get the nodes to build way geometries, or to get the ways to build multi geometries. So having RAID disks might help. The writing of the output data might certainly reduce the efficiency of the OS I/O caching, but unless an output format is particularly verbose compared to others, that should have little influence. What can speed things up is to have lots of RAM and specify a huge value for OSM_MAX_TMPFILE_SIZE. Typically this would be 4 times the size of the PBF. However, if the temp file(s) don't fit entirely into that size, this will not bring any advantage.

-Jukka Rahkonen-
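For completeness, a sketch of the manual index-creation step for all the layers, following the ogrinfo pattern above; the.db stands for the output Spatialite file, and the layer names are taken from the debug output quoted later in this thread:

ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')"
ogrinfo the.db -sql "SELECT CreateSpatialIndex('lines', 'GEOMETRY')"
ogrinfo the.db -sql "SELECT CreateSpatialIndex('polygons', 'GEOMETRY')"
ogrinfo the.db -sql "SELECT CreateSpatialIndex('multilinestrings', 'GEOMETRY')"
ogrinfo the.db -sql "SELECT CreateSpatialIndex('multipolygons', 'GEOMETRY')"
ogrinfo the.db -sql "SELECT CreateSpatialIndex('other_relations', 'GEOMETRY')"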
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even Rouault even.rouault at mines-paris.org writes:

However, the performance on germany.osm.pbf seemed very slow on your PC: after testing on mine it takes ~9 hours, which still seemed too slow, since a conversion of the full planet-latest.osm.pbf (17 GB) into null (this is a debug output driver, not compiled by default, that doesn't write anything) has taken ~30 h (which, looking at http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't particularly bad). After investigation, most of the slowdown is due to the building of the spatial index of the output Spatialite DB. When the spatial index is created at DB initialization and updated at each feature insertion, the performance is clearly affected. For example, when adding -lco SPATIAL_INDEX=NO to the command line, the conversion of Germany only takes 2 hours. Manually adding the spatial index at the end with ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')" (and the same for lines, polygons, multilines, multipolygons, other_relations) takes ~22 minutes, so overall this is 4 times faster. In r24715 I've implemented deferred spatial index creation, and indeed the whole process now takes ~2h20.

Hi,

I borrowed my son's computer and made one more test. Important numbers about the computer: Windows 7 64-bit, four-core Intel i5 2500k @3.3 GHz, SSD disk.

Timings for germany.osm.pbf using -lco SPATIAL_INDEX=NO. Times are total times from the beginning of the test:
70% progress - 5 minutes (I suppose resolving ways begins at about this phase)
100% progress - 17 minutes
manual index creation for all the layers ready - 45 minutes

Results are interesting. Import to Spatialite without indexing was six times faster for me, and I suppose it is mostly because of the SSD drive. But creating the indexes took me 30 minutes while your timing was 22 minutes. Perhaps there is something sub-optimal in the combination of Windows/Spatialite/create spatial index. Anyway, the score for germany.osm.pbf is now 45 minutes. Someone with Linux and an SSD drive is perhaps the one to beat the record. For comparison, converting finland.osm.pbf took 2 min 40 sec. Converting Germany took 20 times more time and I believe that the relation is OK now.

-Jukka Rahkonen-
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even Rouault even.rouault at mines-paris.org writes:

I've committed in r24707 a change that is mainly a custom indexing mechanism for nodes (it can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve performance (it improves it by about a factor of 2 on a 1 GB PBF on my PC).

I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit binaries containing that change. Conversion of the Finnish OSM data with ogr2ogr and the default osmconf.ini into Spatialite format took about 5 minutes and was a minute or two faster than it used to be. Conversion of the German data took 17 hours and was about as slow as before. I guess it may be the output to Spatialite format that gets so slow when the database size gets bigger. CPU usage was only a couple of percent during the last 10 hours and the process took only 100-200 MB of memory. What other output format could you recommend for testing?

-Jukka Rahkonen-
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Le lundi 23 juillet 2012 19:25:22, Smith, Michael ERDC-CRREL-NH a écrit :

Even,

[osmusr@bigserver-proc osm]$ ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326 -lco geometry_name=geometry -lco launder=yes --debug on 2>osm_debug.log
0...10...20...30...40...50...60...70
[osmusr@bigserver-proc osm]$

From the debug output

Michael,

The debug output would suggest that there was no more data to process, which is strange. I've tested a bit with a planet file dating back a few weeks, with a modified OSM driver that does basically no processing except the parsing, and it managed to parse until the end of the file. So in your situation I'd assume that there was a parsing error, but I'm not 100% positive (might it be something wrong in the interleaved reading mode?).

I've committed in r24707 a change that is mainly a custom indexing mechanism for nodes (it can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve performance (it improves it by about a factor of 2 on a 1 GB PBF on my PC).

Along with that change, I've added some facilities for extra error output. If a parsing error occurred, an error message will be printed. And, before recompiling, you can edit ogr/ogrsf_frmts/osm/gpb.h and uncomment line 40 (by removing the // at the beginning of //#define DEBUG_GPB_ERRORS). This should report a more precise error if there's something wrong during the GPB parsing. You might also retry with --debug OSM and, at the end of the processing, you'll see a trace "Number of bytes read in file : XXX"; you can check that the value is the same as the size of the PBF file.

Even
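A sketch of that bytes-read check on Linux, assuming the trace appears as quoted above; the output format, file names and the grep pattern here are only illustrative, and the trace goes to standard error:

$ ogr2ogr -progress -f SQLite planet.sqlite planet-latest.osm.pbf --debug OSM 2> osm_debug.log
$ grep "Number of bytes read in file" osm_debug.log
$ ls -l planet-latest.osm.pbf

If the byte count reported in the trace is smaller than the file size shown by ls, the parsing stopped before the end of the PBF.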
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
OK, I'll retest with these changes. Thanks!

Mike
[gdal-dev] OSM Driver and World Planet file (pbf format)
I'm finding that the new OSM driver (I tested again with r24699) has a problem when working with the whole planet file. When I tried with the US Northeast subset, I got multipolygon and multilinestring entries. When reading the whole planet file, I did not. It gets to 70% and then ends (but without an error message). I also got fewer polygons than I was expecting. It seems like the reading got interrupted by some unreported error. I was writing to Oracle for this import but got the same results writing to sqlite. It seems that smaller extracts work fine but there are some reading issues with the whole planet file (in pbf format). I can try with the .osm format.
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Le lundi 23 juillet 2012 12:56:12, Smith, Michael ERDC-RDE-CRREL-NH a écrit :

I'm finding that the new OSM driver (I tested again with r24699) has a problem when working with the whole planet file. When I tried with the US Northeast subset, I got multipolygon and multilinestring entries. When reading the whole planet file, I did not. It gets to 70% and then ends (but without an error message). I also got fewer polygons than I was expecting. It seems like the reading got interrupted by some unreported error. I was writing to Oracle for this import but got the same results writing to sqlite. It seems that smaller extracts work fine but there are some reading issues with the whole planet file (in pbf format). I can try with the .osm format.

I didn't try yet with whole planet files. Takes too much time :-) Which command line did you use exactly? Did it stop cleanly or with a segfault? In the latter case (assuming you are on Linux), running under gdb might be useful. What is your OS, 32/64 bit? Perhaps you could add --debug on. I'd suggest redirecting standard error to a file, because the log can be huge.
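A sketch of both suggestions combined; the Oracle connection string and file names are just the placeholders already used in this thread, and gdb is only useful if the run actually crashes:

$ gdb --args ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326
(gdb) run
(gdb) bt

$ ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf --debug on 2> osm_debug.log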
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even,

It stopped cleanly (no segfault) at 70%. OS is RHEL 6.2, 64-bit. Import time was about 340 min. The command was:

ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326 -lco geometry_name=geometry -lco launder=yes

I'm rerunning now with the debug log going to a file.

Mike

--
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers
Re: [gdal-dev] OSM Driver and World Planet file (pbf format)
Even,

[osmusr@bigserver-proc osm]$ ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326 -lco geometry_name=geometry -lco launder=yes --debug on 2>osm_debug.log
0...10...20...30...40...50...60...70
[osmusr@bigserver-proc osm]$

From the debug output:

OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OSM: Switching to 'lines' as they are too many features in 'polygons'
OGR2OGR: 32827 features written in layer 'POLYGONS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE MULTILINESTRINGS ( OGR_FID INTEGER, geometry MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'MULTILINESTRINGS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE MULTIPOLYGONS ( OGR_FID INTEGER, geometry MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'MULTIPOLYGONS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE OTHER_RELATIONS ( OGR_FID INTEGER, geometry MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'OTHER_RELATIONS'
OCI: Flushing 23 features on layer POINTS
OCI: Flushing 99 features on layer LINES
OCI: Flushing 27 features on layer POLYGONS
OSM: nNodeSelectBetween = 50006
OSM: nNodeSelectIn = 94362
VSI: ~VSIUnixStdioFilesystemHandler() : nTotalBytesRead = 12682608949

(note that I removed some alter table lines for clarity)

Mike