Hello, on my day-job I recently had to solve the Problem of setting up a Postgis database contaning a full-planet extract using one of those cheap 180GB SSD on a semi-potent machine (only 8GB of RAM).
I first tried to use osm2pgsql for this purpose which is almost impossible for a couple of reasons: * I would have needed more than 180GB of disk space (at least during import) * The import would take a _very_ long time (several days rather than hours) So I had to look for a backup strategy! I found one which proved to be that good, that I would like to discuss it here as an alternative way for rendering-database setup. The first thing I discovered is the fact, that we currently use roughly twice the disk-space for intermediate tables (or file in case of flatnode) than for the processed data itself. Here is how this looks like on tile.openstreetmap.de: relation | total_size ---------------------------+------------ public.planet_osm_line | 61 GB public.planet_osm_polygon | 59 GB public.planet_osm_point | 10 GB public.planet_osm_roads | 8 GB public.planet_osm_ways | 174 GB public.planet_osm_rels | 4 GB + flatnode.dat 20 GB processed data: 61GB+59GB+10GB+8GB=138GB intermediate data: 20GB+174GB+4GB=202GB The most annoying part is the 174GB planet_osm_ways table. So what I did to solve my problem was using pg_dump for the processed data tables only and setting up my target database using pg_restore. Advantages: * Very fast data import even on machines where import using osm2pgsql would be practically impossible * Decent size of database dump (32GB in case of a --hstore-match-only database which is about the same size as a planet dump) Disadvantages: * Currently no update strategy available * Will need a "master" osm2pgsql database * Will "inherit" table scheme from "master" database Conclusion: IMO the disadvantages can be resolved or at least mitigated in the following way: * we generate a downloadable dump of the database on tile.openstreetmap.de (e.g. weekly) * osm2pgsql needs to be patched to output the changes to the processed data tables as SQL commands which can then be used to replicated the slave databases * We already use --hstore-match-only database format. So flexibility in table-layout is not that much of a concern as views to hstore column can be used for rendering instead of tables. I would like to hear your comments to this proposial. Regards Sven -- Threading is a performance hack. (The Art of Unix Programming by Eric S. Raymond) /me is giggls@ircnet, http://sven.gegg.us/ on the Web -- "Thinking of using NT for your critical apps? Isn't there enough suffering in the world?" (Advertisement of Sun Microsystems in Wall Street Journal) /me is giggls@ircnet, http://sven.gegg.us/ on the Web _______________________________________________ dev mailing list dev@openstreetmap.org https://lists.openstreetmap.org/listinfo/dev