Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-08-01 Thread Rahkonen Jukka
Even Rouault wrote:
 
 
  Another set of tests with a brand new and quite powerful laptop.
   Specs for the
  computer:
  Intel i7-2760QM @2.4 GHz processor (8 threads) Hitachi Travelstar
  Z7K320 7200 rpm SATA disk
  8 GB of memory
  Windows 7, 64-bit
 
  GDAL-version r24717, Win64 build from gisinternals.com
 
  Timings for germany.osm.pbf (1.3 GB)
  
 
  A) Default settings with command
  ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf
  -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
 
  - reading the data   67 minutes
  - creating spatial indexes   38 minutes
  - total 105 minutes
 
  B) Using in-memory Spatialite db for the first step by giving SET
  OSM_MAX_TMPFILE_SIZE=7000
 
  - reading the data  16 minutes
  - creating spatial indexes  38 minutes
  - total 54 minutes
 
  Peak memory usage during this conversion was 4.4 GB.
 
  Conclusions
  ===
  * The initial reading of data is heavily i/o bound. This phase is
  really fast if there is enough memory for keeping the OSM tempfile in
  memory but SSD disk seems to offer equally good performance.
  * Creating spatial indexes for the Spatialite tables is also i/o
  bound. The hardware sets the speed limit and there are no other tricks
  for improving the performance. Multi-core CPU is quite idle during
  this phase with 10-15% load.
  * If the user does not plan to do spatial queries then it may be
  handy to save some time and create the Spatialite db without spatial
  indexes by using the -lco SPATIAL_INDEX=NO option.
  * Windows disk i/o may be a limiting factor.
 
  I consider that for small OSM datasets the speed starts to be good
  enough. For me it is about the same if converting the Finnish OSM data
  (137 MB in .pbf format) takes 160 or 140 seconds when using the
  default settings or in-memory temporary database, respectively.
 
 Interesting findings.
 
 A SSD is of course the ideal hardware to get efficient random access to the
 nodes.
 
 I've just introduced in r24719 a new config option OSM_COMPRESS_NODES
 that can be set to YES. The effect is to use a compression algorithm while
 storing the temporary node DB. This can compress by a factor of 3 or 4, and
 helps keep the node DB at a size below the RAM size, so that the OS can
 cache it effectively (at least on Linux). This can be efficient for
 OSM extracts the size of a country, but probably not for a planet file. In
 the case of Germany and France, here's the effect on my PC (SATA disk):
 
 $ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES YES [...]
 real    25m34.029s
 user    15m11.530s
 sys     0m36.470s
 
 $ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES NO [...]
 real    74m33.077s
 user    15m38.570s
 sys     1m31.720s
 
 $ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES YES [...]
 real    7m46.594s
 user    7m24.990s
 sys     0m11.880s
 
 $ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES NO [...]
 real    108m48.967s
 user    7m47.970s
 sys     2m9.310s
 
 I didn't turn it to YES by default, because I'm unsure of the performance
 impact on SSD. Perhaps you have a chance to test.

I cannot test with an SSD before the weekend, but otherwise the new configuration
option really makes a difference in some circumstances.

I have ended up using the following base command in the speed tests:
ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2
-progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO
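For the OSM_COMPRESS_NODES timings below, the same base command simply gets the
extra config switch appended, for example (an illustrative combination rather than
a paste from my console):

ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO --config OSM_COMPRESS_NODES YES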

Writing into Spatialite is pretty fast with these options, and even your null
driver does not seem to be very much faster. What happens after this step (like
creating indexes) has nothing to do with the OSM driver.

Test with the Intel i7-2760QM @2.4 GHz processor, 7200 rpm SATA disk and the 1.3 GB
input file 'germany.osm.pbf':
--config OSM_COMPRESS_NODES NO    67 minutes
--config OSM_COMPRESS_NODES YES   15 minutes

That means 52 minutes less time, or about 4.5 times the speed.
Out of curiosity I tried what happens if I do all the file input/output on a
2.5-inch external USB 2.0 drive:
19 minutes!

I also made a few tests with an old and much slower Windows computer. Running
osm2pgsql with the Finnish OSM data on that machine nowadays takes about 3 hours.

Test with a single Intel Xeon @2.4 GHz processor and the same external USB 2.0
disk as in the previous test:
Input file 'finland.osm.pbf' (122 MB)
Result: 7 minutes with both OSM_COMPRESS_NODES NO and OSM_COMPRESS_NODES YES
Input file 'germany.osm.pbf' (1.3 GB)
Result: 112 minutes with OSM_COMPRESS_NODES YES

Conclusions:
* When the input file has grown past the point where disk i/o can no longer be
served properly from the cache, the compress_nodes setting makes a big difference
(67 vs. 15 minutes for Germany on the laptop), while for the small Finnish file
it changed nothing.

Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-08-01 Thread Even Rouault
Rahkonen Jukka jukka.rahko...@mmmtike.fi wrote:

Interesting results. I'll wait a bit for your tests with SSD before turning
OSM_COMPRESS_NODES to YES by default. Even if it doesn't bring clear advantages,
I don't think it would hurt a lot, because the extra CPU load introduced by the
compression/decompression shouldn't be that high (the compression algorithm used
is just a Protocol Buffer encoding of the differences in longitude/latitude
between consecutive nodes, in chunks of 64 nodes).

Just a word of caution to remind you that the temporary node DB will be written
in the directory pointed to by the CPL_TMPDIR config option/environment variable if
it is defined; otherwise in TMPDIR; otherwise in TEMP; and otherwise in the
current directory from which ogr2ogr is started. On Windows systems the TEMP
environment variable is generally defined, so when you test with your external USB
drive, it is very likely that the node DB is written in the temporary directory
associated with your Windows account.
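If you want to force the node DB onto a particular disk, setting that variable
before launching ogr2ogr is enough; a sketch for the Windows cmd shell (E:\osm_tmp
is a hypothetical directory on the USB drive):

set CPL_TMPDIR=E:\osm_tmp
rem then run ogr2ogr as usual; passing --config CPL_TMPDIR E:\osm_tmp on the command line works too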

As far as CPU load is concerned, the conversion is a single-threaded process,
so on an 8-core system it is expected to top out at 100 / 8 = 12.5% of the
global CPU power. With which hardware configuration and input PBF file do you
manage to reach 100% CPU? Is that load constant during the process? I imagine
it could change according to the stage of the conversion.
There might be potential for parallelizing some things. What comes to mind for
now would be PBF decoding (when profiling only the PBF decoding, the gzip
decompression is the major CPU user, but I'm not sure it matters that much in a
real-life ogr2ogr job) or way resolution (currently we group ways into a batch
until 1 million nodes or 75 000 ways have to be resolved, which leads to more
efficient searches in the node DB since we can sort the nodes by increasing id and
avoid useless seeks). But it is not obvious that either would be straightforward
or would lead to increased efficiency. Parallelization also generally requires
more RAM if you need work buffers for each thread.

Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-08-01 Thread Rahkonen Jukka
Hi,

Right, the temporary DB was written on the system disk, so I repeated the test. Now
everything was for sure done on the same USB 2.0 disk (reading the pbf, writing the
results to Spatialite and handling the temporary DB). It took a bit longer, but the
difference was not very big: 26 minutes vs. 19 minutes when the temporary DB was on
the system disk.

CPU stays close to 100% on the single-processor XP machine throughout the conversion.
The same goes for a laptop with a two-core processor running 32-bit Vista:
both cores seem to burn at full power.

I made a rough parallelizing test by making 4 copies of finland.osm.pbf and
running ogr2ogr in four separate windows. This way the total CPU load of the 8
cores stayed around 50%.
Result: all four conversions were ready after 3 minutes (45 seconds per
conversion), while a single conversion takes 2 minutes.
Conclusion: 4 parallel conversions in 3 minutes, versus 8 minutes if
performed as serial runs, is much faster. The 50% CPU load may indicate that the
speed of the SATA disk is now the limiting factor. A test with an SSD drive should
give more information about this.
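Scripted, such a parallel launch would look roughly like this (Windows cmd at an
interactive prompt, so single % signs; the numbered copies of finland.osm.pbf and
the output names are invented for the example):

for %i in (1 2 3 4) do start "osm%i" ogr2ogr -f SQLite -dsco spatialite=yes fi%i.sqlite finland%i.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF -lco SPATIAL_INDEX=NO

Each start opens its own console window, so the four conversions run side by side.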

I also tried 6 parallel runs, but that was slower, with all runs ready
in 6 minutes, which makes 60 seconds per conversion. With 8 runs the computer jammed
when all the progress bars were at 95%.

The result was not a surprise, because my experience from doing image processing
with gdalwarp and gdal_translate on an 8-core server is that I can get the
maximum throughput from our hardware by running processes in 4-6 windows. If
there are too many image conversions going on, they start to disturb each other
because the disks cannot serve them all properly. However, that computer behaves
better when running conversions in 6 or more windows than this laptop does. Somehow
it feels like the laptop has only 4 real processors/cores, even though the resource
manager shows eight.

I believe that even by parallelizing the conversion program itself it would be hard
to squeeze the juice from all the cores as effectively.

It may be difficult to feed a rendering chain from a bunch of source
databases, but it looks very much like splitting Germany into four distinct
OSM source files would make it possible to import the whole country in 15 minutes
with a good laptop. The OSM planet file is under 20 GB, so a simple
calculation suggests that importing the whole planet might be possible in
about 5 hours. With a laptop. Who will give it a try?

-Jukka-


Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-08-01 Thread Even Rouault

 I made a rough parallelizing test by making 4 copies of finland.osm.pbf and
 running ogr2ogr in four separate windows.  This way the total CPU load of the
 8 cores was staying around 50%.
 Result: All four conversions were ready after 3 minutes (45 seconds per
 conversion) while a single conversion takes 2 minutes.

In my opinion, 45 seconds per conversion isn't really a good summary: I'd say
that your computer could handle 4 conversions in parallel in 3 minutes. But the
fact of running conversions in parallel didn't make them *individually* faster
(that would be nonsense) than running a single one. We probably agree; it's
just the way of presenting the info that is a bit strange.

 Conclusion: 4 parallel conversions in 3 minutes vs. within 8 minutes if
 performed as serial runs is much faster. 50% CPU load may tell that the speed
 of SATA disk is the limiting factor now.  Test with SSD drive should give
 more information about this.

Yes, at some point the disk is the limiting factor, whatever the number of CPUs
you have.

 Somehow it feels like the laptop has only 4 real processors/cores
 even the resource manager is showing eight.

I've not followed what the current CPU state of the art is, but perhaps it is
a quad-core with hyper-threading? The hyper-threaded virtual cores wouldn't be
as efficient as real cores.


 I believe that by parallelizing the conversion program it is hard to take the
 juice as effectively from all the cores.

Yes, if you parallelize I/O operations, there's a risk that it actually makes things
slower. Only the CPU-intensive operations should be parallelized to
limit that risk. But when reading OSM data, there isn't that much computation
involved. Way resolving is somewhat dumb and mostly about I/O after all. Only
the resolving of multipolygons might involve CPU-intensive operations to compute
the spatial relations between rings, but that's a tiny fraction of the data of an
OSM file, and even if it is slow, it is perhaps 10 or 20% of the global
conversion time.


 It may be difficult to feed rendering chain by having a bunch of source
 databases but it looks strongly that by splitting Germany into four distinct
 OSM source files it would be possible to import the whole country in 15
 minutes with a good laptop.

I still maintain that splitting a file is a non-trivial task. I strongly believe
that to do so, you must import the whole country and run spatial queries
afterwards. So if the data producer doesn't do it for you, there's no point in
doing it at your end. However, if you do get it split, then it might indeed be
beneficial to operate on smaller extracts (with a risk of some duplicated
and/or truncated and/or missing objects at the borders of the tiles).










Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-08-01 Thread Rahkonen Jukka
Even Rouault wrote:


 I made a rough parallelizing test by making 4 copies of finland.osm.pbf and
 running ogr2ogr in four separate windows.  This way the total CPU load of the
 8 cores was staying around 50%.
 Result: All four conversions were ready after 3 minutes (45 seconds per
 conversion) while a single conversion takes 2 minutes.

 In my opinion, 45 seconds per conversion isn't really a good summary : I'd 
 say
 that your computer could handle 4 conversions in parallel in 3 minutes. But 
 the
 fact of running conversions in parallel didn't make them *individually* faster
 (that would be nonsense) than running a single one. We probably agree, that's
 just the way of presenting the info that is a bit strange.

Ok, let's use other units. Some suggestions:
- data processing rate as MB/sec or MB/minute (input file size in pbf format)
- node conversion rate as nodes/sec
- way or feature conversion rate as count/sec

None of them is a perfect speed unit. Nodes/sec feels the most exact, but the
practical speed depends on the nature of the data, especially on the number of
relations and how complicated they are. Megabytes of pbf data per minute could be
a rather good measure too. In my single-process vs. four-parallel-processes
example the conversion rates were 60 MB/minute and 160 MB/minute,
respectively. By looking at the file sizes at
http://download.geofabrik.de/osm/europe/
one can quickly estimate that converting the 300 MB of Spanish data should
take about 5 minutes. With parallel runs, Finland, Sweden and Norway would
also be ready at the same time at no extra cost.

..
 It may be difficult to feed rendering chain by having a bunch of source
 databases but it looks strongly that by splitting Germany into four distinct
 OSM source files it would be possible to import the whole country in 15
 minutes with a good laptop.

 I still maintain that splitting a file is a non trivial task. I strongly 
 believe
 that to do so, you must import the whole country and do spatial requests
 afterwards. So, if the data producer doesn't do it for you, there's no point 
 in
 doing it at your end. However if you get it split, then it might indeed be
 beneficial to operate on smaller extracts. (With a risk of some duplicated
 and/or truncated and/or missing objects at the border of the tiles)

I agree. Splitting OSM data files on the client side was my ancient idea from more
than a week ago; it does not make sense nowadays. The data should come pre-split
from the data producer. It would need some thought on how to split the
data so that there would be no trouble at the dataset seams. This GSoC
project seems to aim at something similar:
http://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2012/Data_Tile_Service

-Jukka-


Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-31 Thread Even Rouault

 Another set of tests with a brand new and quite powerful laptop.
  Specs for the
 computer:
 Intel i7-2760QM @2.4 GHz processor (8 threads)
 Hitachi Travelstar Z7K320 7200 rpm SATA disk
 8 GB of memory
 Windows 7, 64-bit
 
 GDAL-version r24717, Win64 build from gisinternals.com
 
 Timings for germany.osm.pbf (1.3 GB)
 
 
 A) Default settings with command
 ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite
 germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF
 
 - reading the data   67 minutes
 - creating spatial indexes   38 minutes
 - total 105 minutes
 
 B) Using in-memory Spatialite db for the first step by giving
 SET OSM_MAX_TMPFILE_SIZE=7000
 
 - reading the data  16 minutes
 - creating spatial indexes  38 minutes
 - total 54 minutes
 
 Peak memory usage during this conversion was 4.4 GB.
 
 Conclusions
 ===
 * The initial reading of data is heavily i/o bound. This phase
 is really fast if there is enough memory for keeping the OSM
 tempfile in memory but SSD disk seems to offer equally good
 performance.
 * Creating spatial indexes for the Spatialite tables is also
 i/o bound. The hardware sets the speed limit and there are
 no other tricks for improving the performance. Multi-core
 CPU is quite idle during this phase with 10-15% load.
 * If user does not plan to do spatial queries then it
 may be handy to save some time and create the Spatialite db
 without spatial indexes by using -lco SPATIAL_INDEX=NO option.
 * Windows disk i/o may be a limiting factor.
 
 I consider that for small OSM datasets the speed starts to be
 good enough. For me it is about the same if converting the
 Finnish OSM data (137 MB in .pbf format) takes 160 or 140
 seconds when using the default settings or in-memory temporary
 database, respectively.

Interesting findings.

A SSD is of course the ideal hardware to get efficient random access to the 
nodes.

I've just introduced in r24719 a new config option OSM_COMPRESS_NODES that can
be set to YES. The effect is to use a compression algorithm while storing the
temporary node DB. This can compress by a factor of 3 or 4, and helps keep
the node DB at a size below the RAM size, so that the OS can
cache it effectively (at least on Linux). This can be efficient for OSM
extracts the size of a country, but probably not for a planet file. In the
case of Germany and France, here's the effect on my PC (SATA disk):

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real    25m34.029s
user    15m11.530s
sys     0m36.470s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/france_new.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real    74m33.077s
user    15m38.570s
sys     1m31.720s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES YES
[...]
real    7m46.594s
user    7m24.990s
sys     0m11.880s

$ time ogr2ogr -f null null /home/even/gdal/data/osm/germany.osm.pbf -progress --config OSM_COMPRESS_NODES NO
[...]
real    108m48.967s
user    7m47.970s
sys     2m9.310s

I didn't turn it to YES by default, because I'm unsure of the performance 
impact on SSD. Perhaps you have a chance to test.





Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-30 Thread Jukka Rahkonen
Jukka Rahkonen jukka.rahkonen at mmmtike.fi writes:


 I borrowed my son's computer and made one more test. Important
 numbers about the
 computer:
 Windows 7, 64 -bit
 Four-core Intel i5 2500k @3,3 GHz
 SSD disk
 
 Timings for germany.osm.pbf by using  -lco SPATIAL_INDEX=NO. Times are 
total
 times from the beginning of the test
 
 70% progress  - 5 minutes (I suppose resolving ways begins about at this 
 phase)
 100 % progress - 17 minutes
 manual index creation for all the layers ready - 45 minutes
 
 Results are interesting. Import to Spatialite without indexing was six times
 faster for me and I suppose it is mostly because of the SSD drive. But creating
 indexes took me 30 minutes while your timing was 22 minutes. Perhaps there is
 something sub-optimal in the combination of Windows/Spatialite/Create spatial index.
 
 Anyway, the score for germany.osm.pbf is now 45 minutes. Someone
 with Linux and
 SSD drive is perhaps the one to beat the record.
 
 For comparison, converting finland.osm.pbf took 2 min 40 sec.
 Converting Germany
 took 20 times more time and I believe that the relation is OK now.

Another set of tests with a brand new and quite powerful laptop.
 Specs for the
computer:
Intel i7-2760QM @2.4 GHz processor (8 threads)
Hitachi Travelstar Z7K320 7200 rpm SATA disk
8 GB of memory
Windows 7, 64-bit

GDAL-version r24717, Win64 build from gisinternals.com

Timings for germany.osm.pbf (1.3 GB)


A) Default settings with command
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite 
germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF

- reading the data   67 minutes
- creating spatial indexes   38 minutes
- total 105 minutes

B) Using an in-memory temporary db for the first step by giving
SET OSM_MAX_TMPFILE_SIZE=7000

- reading the data  16 minutes
- creating spatial indexes  38 minutes
- total 54 minutes

Peak memory usage during this conversion was 4.4 GB.
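Put together, test B amounts to running the test A command after raising the
temp file limit, i.e. roughly (Windows cmd, combining the command from A with the
SET line above):

SET OSM_MAX_TMPFILE_SIZE=7000
ogr2ogr -f sqlite -dsco spatialite=yes germany.sqlite germany.osm.pbf -gt 2 -progress --config OGR_SQLITE_SYNCHRONOUS OFF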

Conclusions
===
* The initial reading of the data is heavily i/o bound. This phase
is really fast if there is enough memory for keeping the OSM
tempfile in memory, but an SSD disk seems to offer equally good
performance.
* Creating spatial indexes for the Spatialite tables is also
i/o bound. The hardware sets the speed limit and there are
no other tricks for improving the performance. A multi-core
CPU is quite idle during this phase, at 10-15% load.
* If the user does not plan to do spatial queries, it
may be handy to save some time and create the Spatialite db
without spatial indexes by using the -lco SPATIAL_INDEX=NO option.
* Windows disk i/o may be a limiting factor.

I consider that for small OSM datasets the speed starts to be
good enough. For me it makes little difference whether converting the
Finnish OSM data (137 MB in .pbf format) takes 160 or 140
seconds, using the default settings or the in-memory temporary
database, respectively.

-Jukka Rahkonen-






Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-29 Thread Even Rouault
On Saturday, 28 July 2012 13:39:48, Jukka Rahkonen wrote:
 Even Rouault even.rouault at mines-paris.org writes:
  I've commited in r24707 a change that is mainly a custom indexation
  mechanism for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to
  improve performances (Improve them about by a factor of 2 on a 1 GB PBF
  on my PC)
 
 I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit
 binaries containing that change. Conversion of the Finnish OSM data with
 ogr2ogr and the default osmconf.ini into Spatialite format took about 5
 minutes and it was a minute or two faster than it used to be. Conversion
  of German data took 17 hours and it was about as slow as before.

Yes, the performance improvement isn't so obvious when I/O is the limiting 
factor.

However, the performance on germany.osm.pbf seemed very slow on your PC, and
after testing on mine it takes ~9 hours, which also seemed too slow, since a
conversion of the full planet-latest.osm.pbf (17 GB) into null (this is a
debug output driver, not compiled by default, that doesn't write anything) has
taken ~30 h (which, looking at
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't particularly
bad).

After investigation, most of the slowdown is due to the building of the
spatial index of the output Spatialite DB. When the spatial index is created
at DB initialization and updated at each feature insertion, the performance
is clearly affected. For example, when adding -lco SPATIAL_INDEX=NO to the
command line, the conversion of Germany only takes 2 hours. Adding the spatial
index manually at the end with ogrinfo the.db -sql "SELECT
CreateSpatialIndex('points', 'GEOMETRY')" (and the same for lines, polygons,
multilines, multipolygons, other_relations) takes ~22 minutes, so overall
this is 4 times faster.

In r24715, I've implemented deferred spatial index creation. And indeed the
whole process now takes ~2h20.
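Spelled out, that manual indexing step is one ogrinfo call per layer, along these
lines (the.db and the GEOMETRY column name are taken from the example above; the
layer names follow the table names reported in the debug output elsewhere in this
thread):

$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('points', 'GEOMETRY')"
$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('lines', 'GEOMETRY')"
$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('polygons', 'GEOMETRY')"
$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('multilinestrings', 'GEOMETRY')"
$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('multipolygons', 'GEOMETRY')"
$ ogrinfo the.db -sql "SELECT CreateSpatialIndex('other_relations', 'GEOMETRY')"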

 I guess it may be the output to spatialite format that gets so slow when
 database size gets bigger. CPU usage was only couple of percents during
 the last 10 hours and process took only 100-200 MB of memory.
 What other output format could you recommend for testing?

I don't think the output format would change the performance that much. What takes
time is disk seeking to get nodes to build way geometries, or to get ways to
build multi-geometries. So having RAID disks might help. The writing of the
output data might certainly reduce the efficiency of OS I/O caching, but unless
an output format is particularly verbose compared to the others, that should
have little influence.

What can speed things up is to have lots of RAM and to specify a huge value for
OSM_MAX_TMPFILE_SIZE. Typically this would be 4 times the size of the PBF.
However, if the temp file(s) don't fit entirely within that size, this will not
bring any advantage.
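For a 1.3 GB PBF like germany.osm.pbf, that rule of thumb means something on the
order of 5000-6000 MB (the value is in MB, as in the SET OSM_MAX_TMPFILE_SIZE=7000
example earlier in the thread); an illustrative invocation, with the value and the
output name chosen only as examples:

$ ogr2ogr -f SQLite -dsco spatialite=yes germany.sqlite germany.osm.pbf -progress --config OSM_MAX_TMPFILE_SIZE 5500 --config OGR_SQLITE_SYNCHRONOUS OFF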


 
 -Jukka Rahkonen-
 


Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-29 Thread Jukka Rahkonen
Even Rouault even.rouault at mines-paris.org writes:


 
 However, the performance on germany.osm.pbf seemed very slow on your PC, but 
 after testing on mine it takes ~9 hours, which seemed too slow since a 
 conversion for the full planet-latest.osm.pbf (17 GB)  into null (this is a 
 debug output driver, not compiled by default, that doesn't write anything) 
 has 
 taken ~ 30h (which, while looking at 
 http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks, isn't particularly 
 bad)
 
 After investigations, most of the slowdown it is due to the building of the 
 spatial index of the output spatialite DB. When the spatial index is created 
 at DB initialization, and updated at each feature insertion, the performance 
 is clearly affected. For example, when adding  -lco SPATIAL_INDEX=NO to the 
 command line, the conversion of germany only takes 2 hours. Adding manually 
 the spatial index at the end with ogrinfo the.db -sql SELECT 
 CreateSpatialIndex('points', 'GEOMETRY') (and the same for lines, polygons, 
 multilines, multipolygons, other_relations) takes ~ 22 minutes, so overall,  
 this is 4 times faster.
 
 In r24715, I've implemented defered spatial index creation. And indeed the 
 whole process takes now ~ 2h20.

Hi,

I borrowed my son's computer and made one more test. Important numbers about the
computer:
Windows 7, 64-bit
Four-core Intel i5 2500K @3.3 GHz
SSD disk

Timings for germany.osm.pbf using -lco SPATIAL_INDEX=NO. Times are total
times from the beginning of the test:

70% progress - 5 minutes (I suppose resolving ways begins at about this phase)
100% progress - 17 minutes
manual index creation for all the layers ready - 45 minutes

The results are interesting. Import to Spatialite without indexing was six times
faster for me, and I suppose it is mostly because of the SSD drive. But creating
the indexes took me 30 minutes while your timing was 22 minutes. Perhaps there is
something sub-optimal in the Windows/Spatialite/CreateSpatialIndex combination.

Anyway, the score for germany.osm.pbf is now 45 minutes. Someone with Linux and
an SSD drive is perhaps the one to beat the record.

For comparison, converting finland.osm.pbf took 2 min 40 sec. Converting Germany
took 20 times longer, and I believe that ratio is reasonable now.

-Jukka Rahkonen-





Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-28 Thread Jukka Rahkonen
Even Rouault even.rouault at mines-paris.org writes:


 
 I've commited in r24707 a change that is mainly a custom indexation mechanism 
 for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve 
 performances (Improve them about by a factor of 2 on a 1 GB PBF on my PC) 

I had a try with finland.osm.pbf and germany.osm.pbf with Windows 64-bit
binaries containing that change. Conversion of the Finnish OSM data with
ogr2ogr and the default osmconf.ini into Spatialite format took about 5 minutes,
which was a minute or two faster than it used to be. Conversion of the German data
took 17 hours and was about as slow as before.
I guess it may be the output to the Spatialite format that gets so slow when
the database size gets bigger. CPU usage was only a couple of percent during
the last 10 hours and the process used only 100-200 MB of memory.
What other output format could you recommend for testing?

-Jukka Rahkonen-



Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-24 Thread Even Rouault
On Monday, 23 July 2012 19:25:22, Smith, Michael ERDC-CRREL-NH wrote:
 Even,
 
 [osmusr@bigserver-proc osm]$ ogr2ogr -progress -f oci
 oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326 -lco
 geometry_name=geometry -lco launder=yes --debug on 2> osm_debug.log
 0...10...20...30...40...50...60...70
 [osmusr@bigserver-proc osm]$
 
 
 
 From the debug output
 

Michael,

The debug output would suggest that there was no more data to process, which
is strange. I've tested a bit with a planet file dating back a few weeks,
with a modified OSM driver that does basically no processing except the
parsing, and it managed to parse until the end of the file. So in your situation
I'd assume that there was a parsing error, but I'm not 100% positive (it might be
something wrong in the interleaved reading mode?).

I've committed in r24707 a change that is mainly a custom indexation mechanism
for nodes (it can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve
performance (by about a factor of 2 on a 1 GB PBF on my PC).
Along with that change, I've added some facilities for extra error output. If a
parsing error occurs, an error message will be printed. And, before
recompiling, you can edit ogr/ogrsf_frmts/osm/gpb.h and uncomment line 40 (by
removing the // at the beginning of //#define DEBUG_GPB_ERRORS). This should
report a more precise error if there's something wrong during the GPB parsing.
You might also retry with --debug OSM and, at the end of the processing,
you'll see a trace "Number of bytes read in file : XXX": you can check
that the value is the same as the size of the PBF file.
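A sketch of that check (the output driver/file and the grep are only illustrative;
the debug trace goes to standard error):

$ ogr2ogr -f SQLite planet.sqlite planet-latest.osm.pbf -progress --debug OSM 2> osm_debug.log
$ grep "Number of bytes read in file" osm_debug.log
$ ls -l planet-latest.osm.pbf

The byte count in the trace should match the file size reported by ls.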

Even


Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-24 Thread Smith, Michael ERDC-RDE-CRREL-NH
OK, I'll retest with these changes.

Thanks!

Mike

On 7/24/12 6:08 PM, Even Rouault even.roua...@mines-paris.org wrote:

On Monday, 23 July 2012 19:25:22, Smith, Michael ERDC-CRREL-NH wrote:
 Even,
 
 [osmusr@bigserver-proc osm]$ ogr2ogr -progress -f oci
 oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326
-lco
 geometry_name=geometry -lco launder=yes --debug on 2> osm_debug.log
 0...10...20...30...40...50...60...70
 [osmusr@bigserver-proc osm]$
 
 
 
 From the debug output
 

Michael,

The debug output would suggest that there was no more data to process,
which 
is strange. I've tested a bit with a planet file dating back to a few
weeks, 
with a modified OSM driver that does basically no processing except the
parsing, and it managed to parse until the end of file. So in your
situation, 
I'd assume that there was a parsing error, but I'm not 100% positive
(might be 
something wrong in the interleaved reading mode ?)

I've commited in r24707 a change that is mainly a custom indexation
mechanism 
for nodes (can be disabled with OSM_USE_CUSTOM_INDEXING=NO) to improve
performances (Improve them about by a factor of 2 on a 1 GB PBF on my PC)
Along with that change, I've added some facility for extra error outputs.
If a 
parsing error occured, an error message will be printed. And, before
recompiling, you can edit ogr/ogrsf_frmts/osm/gpb.h and uncomment (by
removing 
the // at the beginning of //#define DEBUG_GPB_ERRORS) line 40. This
should 
report a more precise error if there's something wrong during the GPB
parsing.
You might also retry with --debug OSM and, at the end of the processing,
you'll see a trace Number of bytes read in file : XXX : you can
check 
that the value is the same as the size of the PBF file.

Even



[gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-23 Thread Smith, Michael ERDC-RDE-CRREL-NH
I'm finding that the new OSM driver (I tested again with r24699) has a problem
when working with the whole planet file. When I tried with the US Northeast
subset, I got multipolygon and multilinestring entries. When reading the whole
planet file, I did not. It gets to 70% and then ends (but without an error
message). I also got fewer polygons than I was expecting. It seems like the
reading got interrupted by some unreported error.

I was writing to Oracle for this import but got the same results writing to
sqlite. It seems that smaller extracts work fine but there are some reading
issues with the whole planet file (in pbf format). I can try with the .osm
format.





Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-23 Thread Even Rouault
On Monday, 23 July 2012 12:56:12, Smith, Michael ERDC-RDE-CRREL-NH wrote:
 I'm finding that the new OSM Driver (I tested again with r24699) has a
 problem when working with the whole planet file. When I tried with the US
 Northeast subset, I got multipolygons and multilinestring entries. When
 reading the whole planet file, I did not. It gets to 70% and then ends
 (but without an error message). I also got fewer polygons than I was
 expecting. It seems like the reading got interrupted by some non reported
 error.
 
 I was writing to Oracle for this importing but got the same results writing
 to sqlite. It seems that smaller extracts work fine but the are some
 reading issues with the whole planet file (in pbf format). I can try with
 the .osm format.

I haven't tried with whole planet files yet. Takes too much time :-)

Which command line did you use exactly?

Did it stop cleanly or with a segfault? In the latter case (assuming you are
on Linux), running it under gdb might be useful.

What is your OS, 32/64 bit? Perhaps you could add --debug on. I'd suggest
redirecting standard error to a file, because the log file can be huge.

Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-23 Thread Smith, Michael ERDC-CRREL-NH
Even,

It stopped cleanly (no segfault) at 70%. OS is RHEL 6.2 64 bit. Import
time was about 340 min.

Command was 

ogr2ogr -progress -f oci oci:user/pass@tns:tmp planet-latest.osm.pbf -lco
dim=2 -lco srid=4326 -lco geometry_name=geometry -lco launder=yes

I'm rerunning now with the debug log to a file.

Mike

-- 
Michael Smith

Remote Sensing/GIS Center
US Army Corps of Engineers



On 7/23/12  7:05 AM, Even Rouault even.roua...@mines-paris.org wrote:

On Monday, 23 July 2012 12:56:12, Smith, Michael ERDC-RDE-CRREL-NH
wrote:
 I'm finding that the new OSM Driver (I tested again with r24699) has a
 problem when working with the whole planet file. When I tried with the
US
 Northeast subset, I got multipolygons and multilinestring entries. When
 reading the whole planet file, I did not. It gets to 70% and then ends
 (but without an error message). I also got fewer polygons than I was
 expecting. It seems like the reading got interrupted by some non
reported
 error.
 
 I was writing to Oracle for this importing but got the same results
writing
 to sqlite. It seems that smaller extracts work fine but the are some
 reading issues with the whole planet file (in pbf format). I can try
with
 the .osm format.

I didn't try yet with whole planet files. Takes too much time :-)

Which command line did you use exactly ?

Did it stop cleanly or with a segfault ? In the latter case, (assuming
you are 
on Linux), running under gdb might be useful.

What is your OS, 32/64 bit ? Perhaps, you could add --debug on. I'd
suggest 
redirecting standard error file to a file because the log file can be
huge.


Re: [gdal-dev] OSM Driver and World Planet file (pbf format)

2012-07-23 Thread Smith, Michael ERDC-CRREL-NH
Even, 

[osmusr@bigserver-proc osm]$ ogr2ogr -progress -f oci
oci:user/pass@tns:tmp planet-latest.osm.pbf -lco dim=2 -lco srid=4326 -lco
geometry_name=geometry -lco launder=yes --debug on 2> osm_debug.log
0...10...20...30...40...50...60...70
[osmusr@bigserver-proc osm]$



From the debug output

OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS
OCI: Flushing 100 features on layer POLYGONS

OCI: Flushing 100 features on layer POLYGONS
OSM: Switching to 'lines' as they are too many features in 'polygons'
OGR2OGR: 32827 features written in layer 'POLYGONS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE MULTILINESTRINGS ( OGR_FID INTEGER, geometry
MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'MULTILINESTRINGS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE MULTIPOLYGONS ( OGR_FID INTEGER, geometry
MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'MULTIPOLYGONS'
OCI: In Create Layer ...
OCI: Prepare(CREATE TABLE OTHER_RELATIONS ( OGR_FID INTEGER, geometry
MDSYS.SDO_GEOMETRY ))
OGR2OGR: 0 features written in layer 'OTHER_RELATIONS'
OCI: Flushing 23 features on layer POINTS
OCI: Flushing 99 features on layer LINES
OCI: Flushing 27 features on layer POLYGONS
OSM: nNodeSelectBetween = 50006
OSM: nNodeSelectIn = 94362
VSI: ~VSIUnixStdioFilesystemHandler() : nTotalBytesRead = 12682608949


(note that I removed some alter table lines for clarity)


Mike

