Re: [OSM-dev] Complete history of OSM data - questions and discussion
Hi,

Have you been considering how to handle the history of old anonymous edits? This new history data should not reveal those user names but keep them anonymous.

-Jukka Rahkonen-

___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Complete history of OSM data - questions and discussion
> Have you been considering how to handle the history of old anonymous edits? This new history data should not reveal those user names but keep them anonymous.

User IDs and usernames for those anonymous edits are left out of the dump, so the elements may not have a uid or user attribute. These are the relevant lines from the source code:

http://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java#cl-176
http://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java#cl-230
http://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java#cl-441
Re: [OSM-dev] Complete history of OSM data - questions and discussion
On 12/11/09 16:28, Lars Francke wrote:
> I am partly done with my Java version. There are a few questions/problems/remarks:

Is Java really up to this job from a performance point of view?

> - Is there a dump of the database available from just prior to the switch from API 0.4 to 0.5? I could try to use that to merge the history of the segments to the ways (as briefly discussed by Frederik)

There is a dump, but it's a mysql dump so not easily readable. There may be a planet around somewhere as well but I don't think it will be synchronised to the actual shutdown time or have any history.

> - Any information on the size (in rows) of the tables would be nice (for testing purposes)

It should be fairly obvious for the main tables as they just contain a row for each object.

> - What is the default_statistics_target for the columns/tables in question? Are there any other options set that would affect the query planner? I've seen the query planner make wildly inappropriate decisions so I'll try to check if the statements I use will work. I used the same technique as planet.c and only adapted the queries to versions and history tables.

I'm not quite sure what you think knowing the value of that setting is going to help with. You only need to worry about optimising your queries if it turns out the planner gets them wrong, but it's rarely a problem with Postgres, especially with the kind of simple queries a dumper uses.

> - Do I have to take precautions in regards to database/machine/disk load? I could do something like the Auto-Vacuum daemon[2] or monitoring the load.

Auto vacuum is on by default these days I believe. It's not something an ordinary user has any control over anyway.

Tom

--
Tom Hughes (t...@compton.nu) http://www.compton.nu/
Re: [OSM-dev] Complete history of OSM data - questions and discussion
>> I am partly done with my Java version. There are a few questions/problems/remarks:
>
> Is Java really up to this job from a performance point of view?

I haven't done any performance comparisons between planet.c and my program but I believe that there won't be much of a difference. I could rip out the history part and produce a current-planet-only program to compare the speeds. I've taken care to select an XML writer that is known for its performance. There is Aalto[1] but I've never used it before so I'm hesitant to use it. I could put the database reading and the XML writing in two different threads but again I'm not sure if that'd help or hurt. I know that this is quite a religious topic but I rarely see big differences in speed between C and Java, especially in this case as most of the time will probably be spent in I/O.

>> - Is there a dump of the database available from just prior to the switch from API 0.4 to 0.5? I could try to use that to merge the history of the segments to the ways (as briefly discussed by Frederik)
>
> There is a dump, but it's a mysql dump so not easily readable. There may be a planet around somewhere as well but I don't think it will be synchronised to the actual shutdown time or have any history.

If you'd be willing to share the mysql dump (I of course wouldn't need user or any other sensitive data) I'd try my best. It can't hurt. Planet won't be as useful because the history is missing.

>> - Any information on the size (in rows) of the tables would be nice (for testing purposes)
>
> It should be fairly obvious for the main tables as they just contain a row for each object.

For the main (current_*) tables, yes. But not for the history tables. I have no estimate how many versions there are. I could count the current versions from all elements but if you have a number that'd be great.

>> - What is the default_statistics_target for the columns/tables in question? Are there any other options set that would affect the query planner?
>> I've seen the query planner make wildly inappropriate decisions so I'll try to check if the statements I use will work. I used the same technique as planet.c and only adapted the queries to versions and history tables.
>
> I'm not quite sure what you think knowing the value of that setting is going to help with. You only need to worry about optimising your queries if it turns out the planner gets them wrong but it's rarely a problem with Postgres especially with the kind of simple queries a dumper uses.

I've had problems in the past with exactly this: easy queries resorting to thousands of seqscans. I just want to configure my database as close as possible to the live one for tests. I don't think there'll be problems either but it doesn't hurt to check. The statements will look something like this:

1) SELECT n.id, n.version, n.timestamp, n.changeset_id, c.user_id, n.visible, n.latitude, n.longitude FROM nodes n JOIN changesets c ON n.changeset_id=c.id ORDER BY n.id, n.version

2) SELECT id, version, k, v FROM node_tags ORDER BY id, version, k

Perhaps you could just check them?

>> - Do I have to take precautions in regards to database/machine/disk load? I could do something like the Auto-Vacuum daemon[2] or monitoring the load.
>
> Auto vacuum is on by default these days I believe. It's not something an ordinary user has any control over anyway.

That's not what I meant but I was unclear :) I meant that I could use the same method as the Auto-Vacuum daemon, which pauses regularly (cost-based) to alleviate load. Just as an example. Again: until now the program just reads from the DB and dumps to the output stream. No special concerns as to the statements or the performance/load. I just want to pick the low-hanging fruit as early as possible and those were the questions I thought of. Some of them (especially those about the query planner) came from problems I've experienced with osmdoc.
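The cost-based pausing idea mentioned above can be sketched in a few lines of Java. This is only an illustration of the autovacuum-style technique, not code from the dumper; the class name, budget and delay are all invented:

```java
// Sketch of autovacuum-style cost-based throttling: charge a small "cost"
// per row read from the database and sleep once a budget is exhausted.
// All names and numbers here are illustrative, not from the real dumper.
public class CostThrottle {
    private final int costLimit;    // budget before a pause is taken
    private final long delayMillis; // length of each pause
    private int accumulated = 0;

    public CostThrottle(int costLimit, long delayMillis) {
        this.costLimit = costLimit;
        this.delayMillis = delayMillis;
    }

    // Call once per row (or per batch); returns true if a pause was taken.
    public boolean charge(int cost) {
        accumulated += cost;
        if (accumulated < costLimit) {
            return false;
        }
        accumulated = 0;
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }
}
```

A dump loop would then call charge(1) per row, trading a bounded slowdown for a smoother load on the live machine.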
In the end it is up to you (or whoever decides that) whether you want to use my program, write one from scratch or adapt planet.c to dump the history. I don't really care either way as long as the end result is that we eventually have historical OSM data ;-)

Lars
Re: [OSM-dev] Complete history of OSM data - questions and discussion
On Thu, Nov 12, 2009 at 4:28 PM, Lars Francke lars.fran...@gmail.com wrote:
> - As of now the XML is not indented. I use Woodstox[1] for XML output and that doesn't have an option to pretty print the output. It is not a problem for me but if it is requested I can use StaxMate or something else to properly indent the XML

i'm pretty sure no-one minds. as you say, anyone who wants it indented can easily do it with xmllint and friends.

> - Changesets: num_changes from the database isn't dumped in planet.c. It is queried from the database but not used anywhere. The data _can_ be calculated but it isn't that easy if not using the standard db schema and not easily done by reading the XML stream. I could just dump it too. I haven't had a look at the API if this field is set correctly at all?!

it should be set correctly. you're welcome to dump it out on the changesets if you think it's useful.

> - I'm using the same technique as planet.c in regards to the output of the data (just streaming it to standard output), I just assume that this is okay? Are there any other things I'll have to change in comparison to the way planet.c works?

yeah. the output will be piped directly into pbzip2, most probably.

cheers, matt
Re: [OSM-dev] Complete history of OSM data - questions and discussion
On 12/11/09 17:37, Lars Francke wrote:
>> There is a dump, but it's a mysql dump so not easily readable. There may be a planet around somewhere as well but I don't think it will be synchronised to the actual shutdown time or have any history.
>
> If you'd be willing to share the mysql dump (I of course wouldn't need user or any other sensitive data) I'd try my best. It can't hurt. Planet won't be as useful because the history is missing.

The problem is we'll have to load the dump into mysql to remove the sensitive data...

> For the main (current_*) tables, yes. But not for the history tables. I have no estimate how many versions there are. I could count the current versions from all elements but if you have a number that'd be great.

Approximate row counts:

nodes - 860 million
ways - 72 million
relations - 1.4 million

> The statements will look something like this:
> 1) SELECT n.id, n.version, n.timestamp, n.changeset_id, c.user_id, n.visible, n.latitude, n.longitude FROM nodes n JOIN changesets c ON n.changeset_id=c.id ORDER BY n.id, n.version
> 2) SELECT id, version, k, v FROM node_tags ORDER BY id, version, k
> Perhaps you could just check them?

They should be fine - the sort means they will take a while to start returning data but they're not doing anything silly.

Tom

--
Tom Hughes (t...@compton.nu) http://www.compton.nu/
Re: [OSM-dev] Complete history of OSM data - questions and discussion
A quick status update and a link to the code.

- I decided to dump num_changes too
- One thing that startled me: planet.c converts _all_ relation member roles to lower case before dumping them. I'd consider this a bug but I'm sure there is a reason for it. Considering that neither the API nor Osmosis nor the mysql versions of planet.c do this, it is bound to create inconsistencies. I had a look at the documentation in the wiki and couldn't find a reference to roles having to be lower case. So I decided to dump them as they are in the database
- I'm done with the basic functionality. I've run _very limited_ tests but I plan to generate test data tomorrow to see how it fares (thanks Tom for the row counts)

The source can be found at: http://bitbucket.org/lfrancke/historydump/

It is written in Java and uses Maven2. A simple mvn package should build a jar file but I can upload one if necessary. It has three dependencies: Woodstox, the PostgreSQL JDBC driver and Apache Commons CLI. Any feedback is welcome. Run it like this:

java -jar historydump-1.0-SNAPSHOT-jar-with-dependencies.jar --help

I'll write again once I've tested this more thoroughly and then the decision will be in your hands :)

Good Night, Lars
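For anyone curious what the Woodstox output side amounts to: Woodstox implements the standard StAX API (javax.xml.stream), so the core of a dumper's writer boils down to something like the following minimal sketch. All attribute values here are invented for illustration, and the real program's structure will of course differ:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class NodeWriterSketch {
    // Writes a single OSM node element with one tag, in the style of a
    // history dump. All values are made up; this is not the dumper's code.
    public static String writeNode() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("node");
            w.writeAttribute("id", "60078445");
            w.writeAttribute("version", "2");
            w.writeAttribute("visible", "true");
            w.writeAttribute("lat", "52.5170");
            w.writeAttribute("lon", "13.3889");
            w.writeStartElement("tag");
            w.writeAttribute("k", "highway");
            w.writeAttribute("v", "residential");
            w.writeEndElement(); // tag
            w.writeEndElement(); // node
            w.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeNode());
    }
}
```

With Woodstox on the classpath the same code picks up its faster writer automatically, which is presumably why the choice of StAX implementation matters for performance here.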
Re: [OSM-dev] Complete history of OSM data - questions and discussion
>>> There is a dump, but it's a mysql dump so not easily readable. There may be a planet around somewhere as well but I don't think it will be synchronised to the actual shutdown time or have any history.
>>
>> If you'd be willing to share the mysql dump (I of course wouldn't need user or any other sensitive data) I'd try my best. It can't hurt. Planet won't be as useful because the history is missing.
>
> The problem is we'll have to load the dump into mysql to remove the sensitive data...

That's what occurred to me too after I sent the mail. This is a nice-to-have but by no means a necessity. Please let me know if I can be of any assistance as I'm very interested in the data. Otherwise I'll just bug you every few months ;-)

If Matt is correct when he says that we'd need a full dump anyway on a license change, wouldn't that include this old mysql data? I may be far off here. License issues are way over my head.

Cheers, Lars
Re: [OSM-dev] Complete history of OSM data - questions and discussion
On Wed, Nov 11, 2009 at 6:41 AM, Lars Francke lars.fran...@gmail.com wrote:
> There are a few questions that probably need answering first and I hope we can start a discussion about this:
>
> - Am I correct in assuming that there are no general objections from the OSM server folks against such a dump? (Which would render the rest of this E-Mail useless ;-)

the response has always been if someone writes it, and it's good, we'll run it :-)

> - Is anyone else currently working on this?

for some values of working, yes. it's on my list of things to do for the license change plan - clearly we'll need a full data dump before we can re-license.

> - Which format should the data be dumped in

(3) is the easiest to get done and most easily supported, in my opinion.

> - Distribution of the data and storage space requirements

i have a feeling that the data, while big, won't be so big that the usual method of planet.osm.org + heanet mirror won't work.

> - Interval of dumps

based on back-of-the-envelope calculations, a full dump in planet format would take something like 7-10 days to do in parallel with normal server activity. so it couldn't be run every week and would probably be cumbersome to do every month. in my opinion, we should be looking at every 3-6 months.

> 3) A dump of all OSM elements in OSM format (http://www.openstreetmap.org/api/0.6/node/60078445/history)

this is my favourite method as well. the easiest approach would be to modify planet.c to dump the full history, instead of just the current_* tables. note that brett has been working on option (2) by using osmosis to dump very historical diffs going back to the inception of the database. you can see the experimental results in http://planet.openstreetmap.org/history/

for my money, if we do both (2) and (3), then we cater for all consumers, and in a standard format.
the output of the COPY command, while good for backups, isn't really suited to dumping the information that we have in the planet (given there will be edits by users who are still not public, etc...)

if you want to get started hacking on planet.c then i'm happy to help. otherwise i'm hoping to get around to it by the end of the month, but there are never any guarantees ;-)

cheers, matt
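To make option (3) concrete: an element-grouped history dump would put all versions of an object next to each other, roughly in the shape of the 0.6 history API call. The values below are invented for illustration, and (as discussed elsewhere in this thread) versions from anonymous edits would simply lack the uid/user attributes, as version 2 does here:

```xml
<node id="60078445" version="1" timestamp="2008-11-02T13:10:45Z"
      changeset="812984" uid="4711" user="someone"
      visible="true" lat="52.5170" lon="13.3889"/>
<node id="60078445" version="2" timestamp="2009-03-17T09:02:11Z"
      changeset="1023442" visible="true" lat="52.5171" lon="13.3890">
  <tag k="highway" v="residential"/>
</node>
```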
Re: [OSM-dev] Complete history of OSM data - questions and discussion
Hi,

Lars Francke wrote:
> I understand that a lot of this data is available throughout the web using old snapshots and diffs but this comes in outdated formats and is by no means complete or easy to use.

Keep in mind that while a full database dump will give you some things that are not in the old planet files, the reverse is true as well - there is information in the old planet files (pre-0.5) that is not in the database and thus will not be part of a history dump. Specifically this applies to pre-0.5 way history. This is not really a big deal except for those who would hope to make OSM history animations going back farther than API 0.5.

Bye Frederik
Re: [OSM-dev] Complete history of OSM data - questions and discussion
>> I understand that a lot of this data is available throughout the web using old snapshots and diffs but this comes in outdated formats and is by no means complete or easy to use.
>
> Keep in mind that while a full database dump will give you some things that are not in the old planet files, the reverse is true as well - there is information in the old planet files (pre-0.5) that is not in the database and thus will not be part of a history dump. Specifically this applies to pre-0.5 way history. This is not really a big deal except for those who would hope to make OSM history animations going back farther than API 0.5.

I had not thought of that. When I first used OSM, segments were long gone so I tend to forget about them. There are a few planet dumps from those times but none before 060403 and no diffs (which would be required to fully reconstruct the history). While I'm of course interested in the most complete history possible I don't know if this data would be easy to integrate.

As I understand it all the old segments that weren't part of a way were converted to a way and all 0.4 ways were just migrated to 0.5 ways (segments to node references). I suppose segments that were part of at least one way were not converted to a way? So we'd need to:

- Find the segments for previously unwayed segment-ways and incorporate their history into the way. As the new way starts at version 1 this would have to be a hack (version 0, counting backwards or something like that).
- For ways we'd need to find the history data of the segments they were made of and merge that into the history of the way, again requiring versions before 1

I don't know if it is worth the trouble but on the other hand it would be nice to have a complete history, especially as this would have to be done only once. But as I said: my knowledge of pre-0.5 times is limited at best and I'd be happy if you/someone else could tell me if what I wrote makes sense. I'd certainly be willing to have _a look_ at this, too.
Thanks for pointing this out!

Lars
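If anyone wants to experiment with the "versions before 1" idea, the renumbering itself is simple bookkeeping. This hypothetical sketch (not part of any existing tool) assigns synthetic version numbers counting backwards from 0 for the prepended segment-era history, ahead of the real versions starting at 1:

```java
import java.util.ArrayList;
import java.util.List;

public class SyntheticVersions {
    // Given the number of pre-0.5 (segment-era) versions to prepend and the
    // number of real 0.5+ versions, produce the full version-number sequence:
    // synthetic versions ..., -2, -1, 0 followed by real versions 1, 2, ...
    // Purely illustrative; how the segment history is actually matched to
    // ways is the hard part and is not addressed here.
    public static List<Integer> numberHistory(int segmentEraVersions, int realVersions) {
        List<Integer> versions = new ArrayList<>();
        for (int i = 0; i < segmentEraVersions; i++) {
            versions.add(i - segmentEraVersions + 1); // e.g. 3 versions -> -2, -1, 0
        }
        for (int v = 1; v <= realVersions; v++) {
            versions.add(v);
        }
        return versions;
    }
}
```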
Re: [OSM-dev] Complete history of OSM data - questions and discussion
Andy,

> Ultimately the formation of a mini project is probably needed. Input from those like yourself willing to work on it and the will and time from others who would need to support the work, including sysadmins.

I had hoped that this could be (re-)solved rather unbureaucratically and without the need to involve the wiki :) The wiki tends to drag discussions on.

> I'm sure it's all possible, but like so many things in OSM it also has to be practical and realistic to have any real chance of gathering momentum.

Your and the other answers suggest to me that it is practical and realistic, thanks! At the moment I'm fully motivated to do what I can/need to get the data I want. Hopefully that's enough momentum.

Cheers, Lars
Re: [OSM-dev] Complete history of OSM data - questions and discussion
Hi,

Lars Francke wrote:
> I had not thought of that. When I first used OSM, segments were long gone so I tend to forget about them. There are a few planet dumps from those times but none before 060403 and no diffs (which would be required to fully reconstruct the history).

No, diffs only show the changes between two points in time, not what happened in between; so they cannot be used to fully reconstruct history.

> While I'm of course interested in the most complete history possible I don't know if this data would be easy to integrate.

No, it would probably be hard.

> As I understand it all the old segments that weren't part of a way were converted to a way and all 0.4 ways were just migrated to 0.5 ways (segments to node references). I suppose segments that were part of at least one way were not converted to a way?

I think so too, but I am unsure what happened to tagged segments. Also we used to have lots of unordered ways where a number of segments were part of a way but not in a sorted order. Sometimes they weren't even contiguous and thus had to be split into multiple ways when 0.5 was introduced. Also, we used this deliberately to model areas with holes (two chains of segments, one clockwise, one counter-clockwise, being part of the same way - that was your area with a hole!). These would have to be retro-fitted into multipolygons for every point in history. It is not difficult to do it once but to do it for two points in history and hope to assign the resulting changes to the same virtual relation id is... a challenge.

> - For ways we'd need to find the history data of the segments they were made of and merge that into the history of the way, again requiring versions before 1

Yes, we briefly thought about something like that when we did the 0.4-0.5 migration (a synthesized history if you will) but dropped the idea due to its complexity.

Bye Frederik
Re: [OSM-dev] Complete history of OSM data - questions and discussion
>> - Am I correct in assuming that there are no general objections from the OSM server folks against such a dump? (Which would render the rest of this E-Mail useless ;-)
>
> the response has always been if someone writes it, and it's good, we'll run it :-)

That's all I wanted to hear :)

> (3) is the easiest to get done and most easily supported, in my opinion.

Once more: that's all I wanted to hear!

>> - Distribution of the data and storage space requirements
>
> i have a feeling that the data, while big, won't be so big that the usual method of planet.osm.org + heanet mirror won't work.

I'll have to rely on your word there but that sounds good too. One thing less to worry about.

>> - Interval of dumps
>
> based on back-of-the-envelope calculations, a full dump in planet format would take something like 7-10 days to do in parallel with normal server activity. so it couldn't be run every week and would probably be cumbersome to do every month. in my opinion, we should be looking at every 3-6 months.

Sounds reasonable enough. I don't know how much demand there is for this data anyway.

> note that brett has been working on option (2) by using osmosis to dump very historical diffs going back to the inception of the database. you can see the experimental results in http://planet.openstreetmap.org/history/ for my money, if we do both (2) and (3), then we cater for all consumers, and in a standard format. the output of the COPY command, while good for backups, isn't really suited to dumping the information that we have in the planet (given there will be edits by users who are still not public, etc...)

Indeed, (2) _and_ (3) would be the best solution! I had noticed the history diffs but didn't know their status. Thanks Brett for clarifying it!

> if you want to get started hacking on planet.c then i'm happy to help.
> otherwise i'm hoping to get around to it by the end of the month, but there are never any guarantees ;-)

The last time I programmed in C is quite a while back (same goes for C++, which is used by the postgres part of the dump program if I'm not mistaken)... so I'll have a look at it but I'm more comfortable with Java (or Python, Erlang, Ruby, ...). So I'll see what I can do and inform you about my progress. The worst that can happen is that we have two working solutions for the same problem. Not too bad :)

I had brief discussions with Brett about Osmosis and incorporating certain changes into it so I've spent quite some time in its source code. Having said that: I probably won't program this as a new task for Osmosis but as a standalone program, as this probably won't be used widely and doesn't justify the extra work required to incorporate it into Osmosis.

Thanks for your response. I'm hopeful now that this can be done!

Lars
Re: [OSM-dev] Complete history of OSM data - questions and discussion
On Wed, Nov 11, 2009 at 1:29 PM, Lars Francke lars.fran...@gmail.com wrote:
> I had brief discussions with Brett about Osmosis and incorporating certain changes into it so I've spent quite some time in its source code. Having said that: I probably won't program this as a new task for Osmosis but as a standalone program as this probably won't be used widely and doesn't justify the extra work required to incorporate this into Osmosis.

just remember that new code = new bugs ;-)

cheers, matt
Re: [OSM-dev] Complete history of OSM data - questions and discussion
>> I had not thought of that. When I first used OSM, segments were long gone so I tend to forget about them. There are a few planet dumps from those times but none before 060403 and no diffs (which would be required to fully reconstruct the history).
>
> No, diffs only show the changes between two points in time, not what happened in between; so they cannot be used to fully reconstruct history.

That's why I like the new replicate diffs. Has anyone done - and kept - a complete database dump before migrating from 0.4 to 0.5, or is the history (at least partially) lost?

>> As I understand it all the old segments that weren't part of a way were converted to a way and all 0.4 ways were just migrated to 0.5 ways (segments to node references). I suppose segments that were part of at least one way were not converted to a way?
>
> I think so too, but I am unsure what happened to tagged segments. Also we used to have lots of unordered ways where a number of segments were part of a way but not in a sorted order. Sometimes they weren't even contiguous and thus had to be split into multiple ways when 0.5 was introduced. Also, we used this deliberately to model areas with holes (two chains of segments, one clockwise, one counter-clockwise, being part of the same way - that was your area with a hole!). These would have to be retro-fitted into multipolygons for every point in history. It is not difficult to do it once but to do it for two points in history and hope to assign the resulting changes to the same virtual relation id is... a challenge.

Thanks for the history tour. I didn't know all that. Sounds a bit like the Wild West of OSM :) But I can't quite follow the multipolygon problem. I thought that _every_ old segment had been migrated to 0.5 in one way [sic!] or another and I would only prepend the history I can find to these existing ways. Am I thinking too simple here?
>> - For ways we'd need to find the history data of the segments they were made of and merge that into the history of the way, again requiring versions before 1
>
> Yes, we briefly thought about something like that when we did the 0.4-0.5 migration (a synthesized history if you will) but dropped the idea due to its complexity.

I can certainly appreciate that decision. I just want to understand what would be needed and decide if it is worthwhile to do something about it or not.

Lars
Re: [OSM-dev] Complete history of OSM data - questions and discussion
Lars Francke [mailto:lars.fran...@gmail.com]
Sent: 11 November 2009 1:16 PM
To: Andy Robinson (blackadder-lists)
Cc: OpenStreetMap Dev
Subject: Re: [OSM-dev] Complete history of OSM data - questions and discussion

> Andy,
>
>> Ultimately the formation of a mini project is probably needed. Input from those like yourself willing to work on it and the will and time from others who would need to support the work, including sysadmins.
>
> I had hoped that this could be (re-)solved rather unbureaucratically and without the need to involve the wiki :) The wiki tends to drag discussions on.

The community likes to be kept informed of developments and there may be others out there that wish to help with stuff or have a real interest in the form and function of what is done. Only a small number of the OSM community read the dev list. Putting stuff on the wiki doesn't necessarily mean you are asking for discussion, but rather it's a means of communicating what you are doing and providing a conduit for the community to give feedback.

Cheers Andy
[OSM-dev] Complete history of OSM data - questions and discussion
Hi!

I and many (okay, at least a few) others have shown interest in the complete history data of OSM. I understand that a lot of this data is available throughout the web using old snapshots and diffs but this comes in outdated formats and is by no means complete or easy to use. I also had a look at the System Admin page on the wiki but I don't really know whom to contact, thus this post on the mailing list.

My question would be what would have to be done for a complete dump of the data. I read previous requests for this data and it seems as if there is no general objection to such a dump but that no one has written the proper tool for the job so far. As I have some free time on my hands (and about a hundred ideas/requests for the data for osmdoc) I'd be willing to at least _try_ to get something done.

There are a few questions that probably need answering first and I hope we can start a discussion about this:
- Am I correct in assuming that there are no general objections from the OSM server folks against such a dump? (Which would render the rest of this E-Mail useless ;-)
- Is anyone else currently working on this?
- Which format should the data be dumped in
- Distribution of the data and storage space requirements
- Interval of dumps

* Format *

1) The easiest would be to just use the PostgreSQL COPY command (http://www.postgresql.org/docs/8.3/interactive/sql-copy.html). This would produce a file suitable to be read into any other PostgreSQL database.

Pros:
- Easy to do
- Probably one of the fastest options
- Low overhead in the file format

Cons:
- As far as I know there is no way to compress the data stream so everything would have to be written uncompressed first
- The binary format is not really portable or easy to use, forced to use PostgreSQL as target, not able to filter data (text formats available)
- Even using text formats the data would be scattered (i.e. tags wouldn't be stored with the elements, node references wouldn't be stored with the ways, ...)
- No OSM tools for this format

2) A dump of all changesets in OsmChange mode (e.g. http://www.openstreetmap.org/api/0.6/changeset/3010332/download ). As I understand it changesets have been created for every change. I don't quite understand why the first changeset (and nodes/ways) come from sometime in 2005 and not 2004 but I bet someone here can enlighten me.

Pros:
- Well known data format, many tools can work with OsmChange
- Good if the user wants to rebuild/relive the change events as the changesets should come roughly in the correct order/timeline
- Possibility to split the process into multiple parts (e.g. history files with 50,000 changesets each)
- Easy to update - just add the new changesets (with the long running transactions that are 'haunting' the diffs posing the same problem)

Cons:
- XML file size overhead (doesn't matter that much compressed)
- Probably a lot slower than the COPY method
- Custom code would have to be written to do this export but it shouldn't be too hard to iterate over every changeset. The necessary indexes already seem to exist
- Potentially bad if one is interested mainly in the elements themselves, the history data for a single element could be scattered throughout the whole file

3) A dump of all OSM elements in OSM format (http://www.openstreetmap.org/api/0.6/node/60078445/history)

Pros:
- Good if the user is interested in the elements and their history and not the flow of changes
- Easily split into smaller files (nodes, ways, relations, changesets, further subdivided by id ranges or something else)
- Easy to process although tools might not work out of the box
- Best format to rebuild a custom database of OSM as it is grouped by element and not rather arbitrarily by changeset/date

Cons:
- XML file size overhead, custom code needed (or does Osmosis already have the possibility to do this?), slower than COPY
- This format doesn't have much tool support as far as I know (multiple versions of an element in a single file)
- Not very easy to update, the whole process would have to be redone (or changesets would have to be examined)

A few personal remarks:
- I personally favor option 3) but that is mainly because of my requirements for osmdoc.
- I don't see missing tool support as a big problem as I suspect that the majority of the users of this data will have/want their own tools to analyze or store the data (just guessing).

*Distribution and space requirements*

I really can't say much about this as I have no idea of the size of the database or the space available on the server(s). But I hope one of the admins can tell me more about this. The planet has been distributed using BitTorrent in the past so this might be a possible solution for the history dump but it really is too early to tell.

*Interval of the dumps*

Theoretically only one dump would be needed as there are now the replicate diffs which should provide every change to the database. But as they are - at the moment - only available in 'minute'