Hello,

on my day-job I recently had to solve the Problem of setting up a Postgis
database contaning a full-planet extract using one of those cheap 180GB SSD
on a semi-potent machine (only 8GB of RAM).

I first tried to use osm2pgsql for this purpose which is almost impossible
for a couple of reasons:

* I would have needed more than 180GB of disk space (at least during import)
* The import would take a _very_ long time (several days rather than hours)

So I had to look for a backup strategy!

I found one which proved to be that good, that I would like to discuss it
here as an alternative way for rendering-database setup.

The first thing I discovered is the fact, that we currently use roughly twice
the disk-space for intermediate tables (or file in case of flatnode) than
for the processed data itself.

Here is how this looks like on tile.openstreetmap.de:

         relation          | total_size 
---------------------------+------------
 public.planet_osm_line    | 61 GB
 public.planet_osm_polygon | 59 GB
 public.planet_osm_point   | 10 GB
 public.planet_osm_roads   |  8 GB

 public.planet_osm_ways    | 174 GB
 public.planet_osm_rels    |   4 GB

+ flatnode.dat 20 GB

processed data: 61GB+59GB+10GB+8GB=138GB
intermediate data: 20GB+174GB+4GB=202GB

The most annoying part is the 174GB planet_osm_ways table.

So what I did to solve my problem was using pg_dump for the processed data
tables only and setting up my target database using pg_restore.

Advantages:
* Very fast data import even on machines where import using osm2pgsql would
  be practically impossible
* Decent size of database dump (32GB in case of a --hstore-match-only database
  which is about the same size as a planet dump)

Disadvantages:
* Currently no update strategy available
* Will need a "master" osm2pgsql database
* Will "inherit" table scheme from "master" database

Conclusion:

IMO the disadvantages can be resolved or at least mitigated in the following
way:

* we generate a downloadable dump of the database on
  tile.openstreetmap.de (e.g. weekly)
* osm2pgsql needs to be patched to output the changes to the processed data
  tables as SQL commands which can then be used to replicated the slave 
databases
* We already use --hstore-match-only database format. So flexibility in
  table-layout is not that much of a concern as views to hstore column can be 
used
  for rendering instead of tables.

I would like to hear your comments to this proposial.

Regards

Sven

-- 
Threading is a performance hack.
(The Art of Unix Programming by Eric S. Raymond)

/me is giggls@ircnet, http://sven.gegg.us/ on the Web


-- 
"Thinking of using NT for your critical apps?
                                  Isn't there enough suffering in the world?"
                   (Advertisement of Sun Microsystems in Wall Street Journal)
/me is giggls@ircnet, http://sven.gegg.us/ on the Web

_______________________________________________
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev

Reply via email to