Lars,

As you say, I don't think there is any objection in principle to a dump of
the whole history, but as your list suggests there are many issues to
resolve before it can happen, and perhaps not yet the overwhelming will
from the community to form a mini project to make it happen. It's good to
see that you have given it some serious thought yourself.

Ultimately the formation of a mini project is probably needed: input from
those like yourself willing to work on it, and the will and time from
others who would need to support the work, including the sysadmins.

So I would suggest some work on the wiki, fleshing out ideas on what needs
to be done, and let's see if some other folks come out of the woodwork.
That will also allow the sysadmin team to state what the technical and
time issues will be from their end. Armed with sufficient information, the
logical way forward and the specific needs and tasks will emerge.

I'm sure it's all possible, but like so many things in OSM it also has to
be practical and realistic to have any real chance of gathering momentum.

Cheers

Andy

>-----Original Message-----
>From: dev-boun...@openstreetmap.org [mailto:dev-boun...@openstreetmap.org]
>On Behalf Of Lars Francke
>Sent: 11 November 2009 6:42 AM
>To: OpenStreetMap Dev
>Subject: [OSM-dev] Complete history of OSM data - questions and discussion
>
>Hi!
>
>I and many (okay, at least a few) others have shown interest in the
>complete history data of OSM. I understand that a lot of this data is
>available throughout the web in old snapshots and diffs, but it comes
>in outdated formats and is by no means complete or easy to use. I also
>had a look at the System Admin page on the wiki, but I don't really
>know whom to contact, hence this post on the mailing list.
>
>My question is: what would have to be done to get a complete dump of
>the data? I have read previous requests for this data, and it seems
>there is no general objection to such a dump, but that no one has
>written the proper tool for the job so far. As I have some free time
>on my hands (and about a hundred ideas/requests for the data for
>osmdoc) I'd be willing to at least _try_ to get something done.
>
>There are a few questions that probably need answering first and I
>hope we can start a discussion about this:
>- Am I correct in assuming that there are no general objections from
>the OSM server folks against such a dump? (Which would render the rest
>of this e-mail useless ;-)
>- Is anyone else currently working on this?
>- Which format should the data be dumped in?
>- How should the data be distributed, and what are the storage space
>requirements?
>- At what interval should dumps be produced?
>
>* Format *
>1) The easiest would be to just use the PostgreSQL COPY command
>(http://www.postgresql.org/docs/8.3/interactive/sql-copy.html). This
>would produce a file suitable for reading into any other PostgreSQL
>database.
>
>Pros:
>- Easy to do
>- Probably one of the fastest options
>- Low overhead in the file formats
>
>Cons:
>- As far as I know there is no way to compress the data stream, so
>everything would have to be written uncompressed first
>- The binary format is not really portable or easy to use: you are
>forced to use PostgreSQL as the target and cannot filter the data
>(text formats are available, though)
>- Even using the text formats the data would be scattered (i.e. tags
>wouldn't be stored with the elements, node references wouldn't be
>stored with the ways, ...)
>- No OSM tools for these formats
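To illustrate the "scattered data" point above: a consumer of the COPY text format would essentially be splitting tab-delimited rows, one table at a time. The following is a minimal sketch under assumptions of my own; the column layout (id, lat, lon, version, timestamp) is a hypothetical example, not the actual OSM schema, and the full COPY escape rules are not handled.

```python
# Minimal sketch: splitting a row of PostgreSQL COPY text-format output.
# COPY's text format is tab-delimited, one row per line, with \N marking
# SQL NULL. (This ignores COPY's backslash-escape sequences for brevity.)

def parse_copy_line(line):
    """Split a tab-delimited COPY text-format row, mapping \\N to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == r"\N" else f for f in fields]

# Hypothetical node row: id, lat, lon, version, timestamp.
row = parse_copy_line("60078445\t53.5511\t9.9937\t3\t2009-11-11 06:42:00\n")
```

Note that even with this, tags and way-node references would arrive as separate tables and have to be joined back together by the consumer.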
>
>2) A dump of all changesets in OsmChange format (e.g.
>http://www.openstreetmap.org/api/0.6/changeset/3010332/download ). As
>I understand it, changesets have been created for every change. I don't
>quite understand why the first changeset (and nodes/ways) dates from
>sometime in 2005 and not 2004, but I bet someone here can enlighten me.
>
>Pros:
>- Well-known data format; many tools can work with OsmChange
>- Good if the user wants to rebuild/relive the change events, as the
>changesets should come roughly in the correct order/timeline
>- Possibility to split the process into multiple parts (e.g. history
>files with 50,000 changesets each)
>- Easy to update: just append the new changesets (though the
>long-running transactions that are 'haunting' the diffs pose the same
>problem here)
>
>Cons:
>- XML file size overhead (doesn't matter that much once compressed)
>- Probably a lot slower than the COPY method
>- Custom code would have to be written to do this export, but it
>shouldn't be too hard to iterate over every changeset; the necessary
>indexes already seem to exist
>- Potentially bad if one is mainly interested in the elements
>themselves, as the history data for a single element could be
>scattered throughout the whole file
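For what it's worth, consuming an OsmChange dump needs nothing beyond a standard XML parser. A minimal sketch, using a tiny inline document standing in for a real changeset download such as /api/0.6/changeset/3010332/download:

```python
# Minimal sketch: iterating over an OsmChange document with the Python
# standard library. The inline sample is a made-up two-change document.
import xml.etree.ElementTree as ET

OSMCHANGE = """<osmChange version="0.6">
  <create>
    <node id="1" lat="53.5" lon="9.9" version="1" changeset="3010332"/>
  </create>
  <modify>
    <node id="1" lat="53.6" lon="9.9" version="2" changeset="3010333"/>
  </modify>
</osmChange>"""

def iter_changes(xml_text):
    """Yield (action, element_type, id, version) for each change."""
    root = ET.fromstring(xml_text)
    for action in root:          # <create>, <modify> or <delete> blocks
        for elem in action:      # <node>, <way> or <relation> inside them
            yield (action.tag, elem.tag,
                   int(elem.get("id")), int(elem.get("version")))

changes = list(iter_changes(OSMCHANGE))
```

The same loop structure would work streamed over a multi-gigabyte dump with an incremental parser instead of fromstring.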
>
>3) A dump of all OSM elements in OSM format
>(http://www.openstreetmap.org/api/0.6/node/60078445/history)
>
>Pros:
>- Good if the user is interested in the elements and their history
>rather than the "flow" of changes
>- Easily split into smaller files (nodes, ways, relations, changesets,
>further subdivided by id ranges or something else)
>- Easy to process, although tools might not work out of the box
>- Best format for rebuilding a "custom" database of OSM, as it is
>grouped by element rather than "arbitrarily" by changeset/date
>
>Cons:
>- XML file size overhead, custom code needed (or does Osmosis already
>have the ability to do this?), slower than COPY
>- This format doesn't have much tool support as far as I know
>(multiple versions of an element in a single file)
>- Not very easy to update; the whole process would have to be redone
>(or changesets would have to be examined)
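The "multiple versions of an element in a single file" property is easy to handle with custom code: group by (type, id) while reading. A minimal sketch, with an inline sample mimicking the /api/0.6/node/.../history output:

```python
# Minimal sketch: collecting all versions of each element from a
# history-style OSM document. The inline sample is made up.
import xml.etree.ElementTree as ET
from collections import defaultdict

HISTORY = """<osm version="0.6">
  <node id="60078445" version="1" lat="53.5" lon="9.9"/>
  <node id="60078445" version="2" lat="53.6" lon="9.9"/>
  <node id="7" version="1" lat="0.0" lon="0.0"/>
</osm>"""

def versions_by_id(xml_text):
    """Map (element_type, id) to the list of version numbers seen."""
    history = defaultdict(list)
    for elem in ET.fromstring(xml_text):
        key = (elem.tag, int(elem.get("id")))
        history[key].append(int(elem.get("version")))
    return history

hist = versions_by_id(HISTORY)
```

If the dump were additionally sorted by id, the grouping could be done streaming with constant memory, which is what makes this format attractive for rebuilding a per-element database.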
>
>
>A few personal remarks:
>- I personally favor option 3), but that is mainly because of my
>requirements for osmdoc.
>- I don't see missing tool support as a big problem, as I suspect that
>the majority of the users of this data will have/want their own tools
>to analyze or store the data (just guessing).
>
>
>*Distribution and space requirements*
>I really can't say much about this, as I have no idea of the size of
>the database or the space available on the server(s), but I hope one
>of the admins can tell me more. The planet has been distributed using
>BitTorrent in the past, so this might be a possible solution for the
>history dump, but it really is too early to tell.
>
>*Interval of the dumps*
>Theoretically only one dump would be needed, as there are now the
>replication diffs, which should provide every change to the database.
>But as they are - at the moment - only available in 'minute' format,
>one might dump the history regularly (whatever that means; again, it
>depends on the space requirements and on whether there is demand for
>this at all).
>
>
>I have probably forgotten some important aspects/problems/points and I
>hope to receive some feedback on this. I know that any "dump" program
>would have to be written so as not to interfere with normal
>operations (there is only one DB server, if I'm correct), but the
>current planet dump program probably gives a good indication of the
>load such a dump produces. Again, I have no data about this.
>
>Any pointers from the system administrators about the specifics and
>whom best to contact would be very welcome. Remarks about the data or
>its potential format (or possible uses for the data) are welcome too
>of course!
>
>Cheers,
>Lars
>
>_______________________________________________
>dev mailing list
>dev@openstreetmap.org
>http://lists.openstreetmap.org/listinfo/dev

