Re: [OSM-dev] OSM History Retriever
Hi Martijn,

On Thu, Jul 29, 2010 at 7:02 PM, Martijn van Exel wrote:

> Hi Brett,
>
> What kind of trouble do you envisage when doing a bbox operation on a full
> history dump? I guess the movement of features over time makes an accurate
> determination of what is and what isn't in the bounding box less trivial,
> but for my purposes having all the historical data for which the current
> version is within the given bbox would be adequate.

This is educated guesswork because I haven't actually tried it ...

Yep, the movement of features over time is one issue. The current bounding
box task will only send node versions through that are inside the bounding
box, so it could potentially include some versions of a node and not others.

The next problem is that it will track all node ids that have been included
regardless of how many versions of the node were included. These tracked
node ids are then used during way processing to determine which ways to
include. The way processing won't take into account which version of a node
was included, so some ways may be included that shouldn't be.

Finally, the relation processing may also have issues where relations are
included that point to ways for which only some versions have been included.

It might mostly work because most data won't shift drastically during its
lifetime, but there will be exceptions.

The bigger issue is that Osmosis has no concept of a "visible" attribute.
Osmosis deals with two main types of data: entity streams (i.e. nodes, ways
and relations) and change streams (as per entity streams but with an action
of create, modify or delete). Entity streams are used for processing osm
files, and change streams deal with osc files. The full history file has
much more in common with an osc file than an osm file. The "visible"
attribute is another way of representing what Osmosis already represents
using create, modify and delete actions on change streams.

Trying to use Osmosis to process these full history files as if they were
normal osm files may kind of work, but entity streams are not designed to
work this way. I think a better solution is to write a new XML change reader
for Osmosis that reads these full history files and uses the visible
attribute to determine whether each version is a create, modify or delete.
It can then send them into the pipeline as a normal change stream. The next
step after that is to write a new bounding box task that can deal with
change streams.

Brett

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
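The change-reader idea described above can be sketched roughly as follows. This is only an illustration in Python, not Osmosis code: the function name `history_to_changes` and the sample data are made up, and the rule "version 1 is a create, an invisible version is a delete, anything else is a modify" is an assumption, since the thread only says that the visible attribute should drive the decision.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the full-history layout: each element may occur
# several times, once per version, and carries a "visible" attribute.
SAMPLE = b"""<osm version="0.6">
<node id="1" version="1" visible="true" lat="52.0" lon="5.0"/>
<node id="1" version="2" visible="true" lat="52.1" lon="5.0"/>
<node id="1" version="3" visible="false"/>
<node id="2" version="1" visible="true" lat="51.0" lon="4.0"/>
</osm>"""


def history_to_changes(stream):
    """Yield (action, type, id, version) for each entity version in the stream.

    Assumed rule (not confirmed by the thread): an invisible version is a
    delete, version 1 is a create, and every other version is a modify.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag not in ("node", "way", "relation"):
            continue
        version = int(elem.get("version"))
        if elem.get("visible", "true") == "false":
            action = "delete"
        elif version == 1:
            action = "create"
        else:
            action = "modify"
        yield action, elem.tag, int(elem.get("id")), version
        elem.clear()  # drop the element's attributes and children


changes = list(history_to_changes(io.BytesIO(SAMPLE)))
for change in changes:
    print(change)
```

An entity that is deleted and later restored would come out as a modify under this rule; whether that is the right action for a change stream is left open here.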
Re: [OSM-dev] OSM History Retriever
Hi Brett,

What kind of trouble do you envisage when doing a bbox operation on a full
history dump? I guess the movement of features over time makes an accurate
determination of what is and what isn't in the bounding box less trivial,
but for my purposes having all the historical data for which the current
version is within the given bbox would be adequate.

I would not hesitate to get some of my team members busy with adapting
osmosis to deal better with full history planet extracts if necessary, but
I'd love to hear from you what the likely caveats are.

Best,
Martijn

Sent from my iPad

On Jul 23, 2010, at 2:20 PM, Brett Henderson wrote:

> On Thu, Jul 22, 2010 at 6:14 PM, Martijn van Exel wrote:
>> On 22 Jul 2010, at 06:57, Brett Henderson wrote:
>>> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:
>>>> [...]
>>> That format is fine and exactly what I would have expected. I suspect
>>> Osmosis would parse it okay, but without support for the visible
>>> attribute it won't be particularly useful.
>>
>> Not for visualization purposes maybe, but for analysis purposes the
>> visible attribute is not really an issue. My goal is to extract full
>> history dumps for certain spatial extents and import them into PostGIS,
>> in order to calculate historical metrics exposing the crowd dynamics of
>> OSM - for example number of contributors over time, version growth over
>> time, movement of nodes over time. All this calculated for grid cells.
>>
>> The full history dump is 13GB bz2 compressed. Anyone got a rough idea how
>> long it would take for osmosis to extract, say, a bbox of the Netherlands
>> out of that on a 4GB AMD Opteron quad core machine? More RAM would
>> probably help?
>
> Osmosis is unlikely to work well on a full history dump. The --bounding-box
> task is really only designed to work with data from a single point in time.
> Bounding box filtering is much more difficult to perform accurately on data
> spanning a time range, although it might be good enough. A bigger issue is
> that it will ignore visible attributes and strip those attributes from the
> output.
>
> A relatively small amount of RAM is used for a single --bounding-box task
> if you specify the idTrackerType=BitSet option.
>
> Brett
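The simplified rule Martijn asks for (keep all history of anything whose current version lies inside the bbox) could look something like the two-pass sketch below. It handles nodes only, and everything in it (`NodeVersion`, `filter_by_current_version`, the coordinates and the bbox) is hypothetical illustration rather than anything from Osmosis or the dump tooling.

```python
from typing import NamedTuple


class NodeVersion(NamedTuple):
    id: int
    version: int
    lat: float
    lon: float


def in_bbox(node, left, bottom, right, top):
    """True when the node version lies inside the box, edges included."""
    return left <= node.lon <= right and bottom <= node.lat <= top


def filter_by_current_version(versions, bbox):
    """Keep every version of each node whose latest version is inside bbox."""
    # Pass 1: find the latest version of every node.
    latest = {}
    for v in versions:
        if v.id not in latest or v.version > latest[v.id].version:
            latest[v.id] = v
    # Pass 2: keep all versions of the nodes selected in pass 1.
    keep = {node_id for node_id, v in latest.items() if in_bbox(v, *bbox)}
    return [v for v in versions if v.id in keep]


BBOX = (4.0, 51.0, 6.0, 53.0)  # left, bottom, right, top (hypothetical)
VERSIONS = [
    NodeVersion(1, 1, 50.0, 5.0),  # starts outside the box
    NodeVersion(1, 2, 52.0, 5.0),  # current version is inside
    NodeVersion(2, 1, 52.0, 5.0),  # starts inside the box
    NodeVersion(2, 2, 54.0, 5.0),  # current version is outside
]
kept = filter_by_current_version(VERSIONS, BBOX)
print(kept)
```

The sketch ignores deleted versions, which carry no usable position, and it needs two passes over the data (or an index held in memory), which matters for a 13GB compressed file.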
Re: [OSM-dev] OSM History Retriever
On Thu, Jul 22, 2010 at 6:14 PM, Martijn van Exel wrote:

> On 22 Jul 2010, at 06:57, Brett Henderson wrote:
>> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:
>>> [...]
>> That format is fine and exactly what I would have expected. I suspect
>> Osmosis would parse it okay, but without support for the visible
>> attribute it won't be particularly useful.
>
> Not for visualization purposes maybe, but for analysis purposes the visible
> attribute is not really an issue. My goal is to extract full history dumps
> for certain spatial extents and import them into PostGIS, in order to
> calculate historical metrics exposing the crowd dynamics of OSM - for
> example number of contributors over time, version growth over time,
> movement of nodes over time. All this calculated for grid cells.
>
> The full history dump is 13GB bz2 compressed. Anyone got a rough idea how
> long it would take for osmosis to extract, say, a bbox of the Netherlands
> out of that on a 4GB AMD Opteron quad core machine? More RAM would
> probably help?

Osmosis is unlikely to work well on a full history dump. The --bounding-box
task is really only designed to work with data from a single point in time.
Bounding box filtering is much more difficult to perform accurately on data
spanning a time range, although it might be good enough. A bigger issue is
that it will ignore visible attributes and strip those attributes from the
output.

A relatively small amount of RAM is used for a single --bounding-box task if
you specify the idTrackerType=BitSet option.

Brett
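To give a feel for why a bit-set style id tracker stays small, here is a minimal Python sketch that stores one bit per id. It assumes that this is roughly what the BitSet option does, as the name suggests; it is not the Osmosis implementation, and the class name `BitSetIdTracker` is only borrowed for illustration.

```python
class BitSetIdTracker:
    """Remember which non-negative ids were seen, using one bit per id."""

    def __init__(self):
        self._bits = bytearray()

    def add(self, entity_id):
        byte_index, bit = divmod(entity_id, 8)
        if byte_index >= len(self._bits):
            # Grow the array with zero bytes up to the required index.
            self._bits.extend(bytes(byte_index + 1 - len(self._bits)))
        self._bits[byte_index] |= 1 << bit

    def __contains__(self, entity_id):
        byte_index, bit = divmod(entity_id, 8)
        return byte_index < len(self._bits) and bool(self._bits[byte_index] & (1 << bit))

    def size_in_bytes(self):
        return len(self._bits)


tracker = BitSetIdTracker()
tracker.add(5)
tracker.add(8_000_000)
print(5 in tracker, 6 in tracker, tracker.size_in_bytes())
```

Memory here depends on the highest id seen rather than on how many ids are tracked: the largest id of 8,000,000 above costs about 1 MB whether one node or millions of nodes are included.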
Re: [OSM-dev] OSM History Retriever
On 22 Jul 2010, at 06:57, Brett Henderson wrote:

> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:
>> [...]
> That format is fine and exactly what I would have expected. I suspect
> Osmosis would parse it okay, but without support for the visible attribute
> it won't be particularly useful.

Not for visualization purposes maybe, but for analysis purposes the visible
attribute is not really an issue. My goal is to extract full history dumps
for certain spatial extents and import them into PostGIS, in order to
calculate historical metrics exposing the crowd dynamics of OSM - for example
number of contributors over time, version growth over time, movement of
nodes over time. All this calculated for grid cells.

The full history dump is 13GB bz2 compressed. Anyone got a rough idea how
long it would take for osmosis to extract, say, a bbox of the Netherlands out
of that on a 4GB AMD Opteron quad core machine? More RAM would probably help?

Best
--
Martijn
Re: [OSM-dev] OSM History Retriever
On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:

> On Wed, Jul 21, 2010 at 15:42, Brett Henderson wrote:
>> I haven't looked at the full history dump to be honest so I'm not in a
>> great position to comment, but I'll comment anyway ;-)
>>
>> I'm curious what the format of the full history dump is. I'd like to
>> understand how nodes, ways and relations are represented in the file but
>> I can't do so without downloading the whole thing and decompressing it.
>
> The format is pretty standard .osm XML.
> It currently outputs a visible attribute for every object.
>
> Apart from that everything should be standard[1] and modeled after the
> tool that writes the planet except that each element may occur
> multiple times.
> There are only two things to look out for: It is not "pretty-printed",
> so everything is just one huge line (I know that some people do
> line-based parsing but that won't work here) and as there's old data
> mixed in not every element has uid and user attributes.
>
> Let me know if you have any questions or if a format change would help
> in any way but I believe the current format is pretty normal and
> should be parseable by almost every tool that already parses .osm
> files.

That format is fine and exactly what I would have expected. I suspect
Osmosis would parse it okay, but without support for the visible attribute
it won't be particularly useful.

> Cheers,
> Lars
>
> [1] https://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java
Re: [OSM-dev] OSM History Retriever
On 21.07.2010 15:42, Brett Henderson wrote:

> I'm curious what the format of the full history dump is. I'd like to
> understand how nodes, ways and relations are represented in the file but I
> can't do so without downloading the whole thing and decompressing it.

FYI, you can decompress and access it as it comes in via HTTP:

    wget -O - -q http://planet.osm.org/full-experimental/full-planet-100214.osm.bz2 | bzip2 -dc | more

Peter
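The same "look at the start without downloading everything" trick can be done programmatically with an incremental decompressor. The sketch below uses Python's standard `bz2.BZ2Decompressor`; the helper name `head_of_bz2_stream` is made up, and a small in-memory document stands in for the network download so the example runs on its own.

```python
import bz2


def head_of_bz2_stream(chunks, limit):
    """Return the first `limit` decompressed bytes from compressed chunks."""
    decompressor = bz2.BZ2Decompressor()
    out = bytearray()
    for chunk in chunks:
        out += decompressor.decompress(chunk)
        if len(out) >= limit or decompressor.eof:
            break  # stop early; remaining chunks are never requested
    return bytes(out[:limit])


# Stand-in for the network: compress a small document and cut it into chunks.
document = (
    b'<osm version="0.6">'
    + b'<node id="1" version="1" visible="true"/>' * 1000
    + b"</osm>"
)
compressed = bz2.compress(document)
chunks = (compressed[i:i + 64] for i in range(0, len(compressed), 64))

head = head_of_bz2_stream(chunks, 60)
print(head)
```

For a real download the chunks would come from repeated reads on an HTTP response. bzip2 works in blocks of up to 900 kB, so some compressed data has to arrive before the first output appears.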
Re: [OSM-dev] OSM History Retriever
On Wed, Jul 21, 2010 at 15:42, Brett Henderson wrote:

> I haven't looked at the full history dump to be honest so I'm not in a
> great position to comment, but I'll comment anyway ;-)
>
> I'm curious what the format of the full history dump is. I'd like to
> understand how nodes, ways and relations are represented in the file but I
> can't do so without downloading the whole thing and decompressing it.

The format is pretty standard .osm XML. It currently outputs a visible
attribute for every object.

Apart from that everything should be standard[1] and modeled after the tool
that writes the planet except that each element may occur multiple times.

There are only two things to look out for: It is not "pretty-printed", so
everything is just one huge line (I know that some people do line-based
parsing but that won't work here) and as there's old data mixed in not every
element has uid and user attributes.

Let me know if you have any questions or if a format change would help in
any way but I believe the current format is pretty normal and should be
parseable by almost every tool that already parses .osm files.

Cheers,
Lars

[1] https://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java
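Both caveats Lars mentions (no line breaks, and elements without uid/user) are easy to accommodate with an event-based XML parser instead of line-based reading. A minimal sketch with Python's `xml.etree.ElementTree.iterparse` follows; the sample data, the function name `versions_per_user` and the "(anonymous)" label are all invented for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: no line breaks at all, and the first node lacks
# the uid and user attributes.
SAMPLE = (
    b'<osm version="0.6">'
    b'<node id="7" version="1" visible="true" lat="52.0" lon="5.0"/>'
    b'<node id="7" version="2" visible="true" lat="52.0" lon="5.1"'
    b' uid="42" user="alice"/>'
    b'<node id="8" version="1" visible="true" lat="51.0" lon="4.0"'
    b' uid="43" user="bob"/>'
    b"</osm>"
)


def versions_per_user(stream):
    """Count entity versions per user name, tolerating missing attributes."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            # elem.get returns None when the attribute is absent.
            counts[elem.get("user") or "(anonymous)"] += 1
            elem.clear()
    return counts


counts = versions_per_user(io.BytesIO(SAMPLE))
print(counts)
```

Because the parser works on elements rather than lines, it makes no difference that the whole file is one huge line.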
Re: [OSM-dev] OSM History Retriever
I haven't looked at the full history dump to be honest so I'm not in a great
position to comment, but I'll comment anyway ;-)

I'm curious what the format of the full history dump is. I'd like to
understand how nodes, ways and relations are represented in the file but I
can't do so without downloading the whole thing and decompressing it.

A full history dump would presumably require at least the addition of the
"visible" attribute to the standard set of XML attributes. Currently Osmosis
has support for normal entity streams, change streams, and dataset streams
(random access to data, not in common use). Entities with visible attributes
would require a new stream type, which is not terribly difficult but requires
a few new interfaces and task managers to be defined.

The bigger task is then writing tasks to support these new data types. In
particular, the existing --bounding-box task can't be used because it assumes
that only a single version of each entity exists, and that nodes reside in a
single location. With full history files you need to take into account that
each way may refer to several different versions of nodes through time
depending on timestamp, and that each version of a node might reside in a
completely different location. It's not as simple as the current bounding
box task, which just tracks which nodes it has included and then includes
ways which reference them.

I'll do my best to answer any questions if somebody wants to take this on,
but it doesn't sound trivial. Not much existing code could be re-used other
than the generic pipeline management.

On Wed, Jul 21, 2010 at 8:53 PM, Andy Allan wrote:

> On Wed, Jul 21, 2010 at 10:06 AM, Martijn van Exel wrote:
>>> Well we don't really want to be running that script lots of times for
>>> different extents either - the idea would be to take the dump that it
>>> produces and process it to produce subsets of the data as people do
>>> with the ordinary planet dumps.
>>
>> Are there any existing tools that could do the processing though? Would
>> osmosis for example be able to extract a bbox-defined subset of the
>> history.osm file?
>
> Not as far as I'm aware, unless osmosis happens to magically work!
> It's the best tool for the job though, so I'd think some extra osmosis
> tasks (--read-history, --write-history) would be the best approach.
> I've no idea how much internal plumbing would be required though to
> support this - anyone want to comment?
>
> Cheers,
> Andy
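The point about ways referring to different node versions through time can be made concrete with a small lookup: for a given way version, each referenced node has to be resolved to the node version that was current at the way's timestamp. The Python sketch below uses invented data and names (`NODE_HISTORY`, `node_at`) and is not taken from Osmosis.

```python
from bisect import bisect_right

# Hypothetical history of one node: (timestamp, lat, lon), sorted by time.
# ISO 8601 UTC timestamps of equal length sort correctly as plain strings.
NODE_HISTORY = {
    10: [
        ("2008-01-01T00:00:00Z", 52.00, 5.00),
        ("2009-06-01T00:00:00Z", 52.50, 5.50),
        ("2010-03-01T00:00:00Z", 53.00, 6.00),
    ],
}


def node_at(node_id, timestamp, history=NODE_HISTORY):
    """Return the node version that was current at `timestamp`, or None."""
    versions = history[node_id]
    timestamps = [t for t, _lat, _lon in versions]
    index = bisect_right(timestamps, timestamp) - 1
    if index < 0:
        return None  # the node did not exist yet
    return versions[index]


# A way version saved at the end of 2009 sees the second node position.
print(node_at(10, "2009-12-31T00:00:00Z"))
print(node_at(10, "2007-01-01T00:00:00Z"))
```

A way's geometry can also change between two way versions when one of its nodes moves, so a bounding box test on history data has to consider every node version that was current at some point during the lifetime of each way version.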
Re: [OSM-dev] OSM History Retriever
On Wed, Jul 21, 2010 at 10:06 AM, Martijn van Exel wrote:

>> Well we don't really want to be running that script lots of times for
>> different extents either - the idea would be to take the dump that it
>> produces and process it to produce subsets of the data as people do with
>> the ordinary planet dumps.
>
> Are there any existing tools that could do the processing though? Would
> osmosis for example be able to extract a bbox-defined subset of the
> history.osm file?

Not as far as I'm aware, unless osmosis happens to magically work! It's the
best tool for the job though, so I'd think some extra osmosis tasks
(--read-history, --write-history) would be the best approach. I've no idea
how much internal plumbing would be required though to support this - anyone
want to comment?

Cheers,
Andy
Re: [OSM-dev] OSM History Retriever
> At SOTM I talked to Lars Francke about his history dump script that was run
> once or twice a few months ago. He said it would definitely be feasible to
> implement an extent parameter but that would require some significant
> rewriting effort. I am willing to look into this but it's in Java which is
> not my weapon of choice.. I would love to have a full history for a few
> larger areas at some point so it would make sense for me to implement this.

The code is here: https://bitbucket.org/lfrancke/historydump/src (but as you
mentioned: it's Java)

And as Tom has mentioned this would probably have to be an external service
but that requires quite a few resources. If anyone has servers to spare I bet
we can put them to good use :)

And there's always the problem with ways crossing the area and things moving
in and out of the area in question over time etc.

If there is anything I can do to help let me know!

Cheers,
Lars
Re: [OSM-dev] OSM History Retriever
On Jul 21, 2010, at 11:04 AM, Tom Hughes wrote:

> On 21/07/10 10:01, Martijn van Exel wrote:
>
>> At SOTM I talked to Lars Francke about his history dump script that was
>> run once or twice a few months ago. He said it would definitely be
>> feasible to implement an extent parameter but that would require some
>> significant rewriting effort. I am willing to look into this but it's in
>> Java which is not my weapon of choice.. I would love to have a full
>> history for a few larger areas at some point so it would make sense for
>> me to implement this.
>
> Well we don't really want to be running that script lots of times for
> different extents either - the idea would be to take the dump that it
> produces and process it to produce subsets of the data as people do with
> the ordinary planet dumps.

Are there any existing tools that could do the processing though? Would
osmosis for example be able to extract a bbox-defined subset of the
history.osm file?

Martijn van Exel +++ m...@rtijn.org
Laziness – Impatience – Hubris
http://schaaltreinen.nl
twitter: mvexel skype: mvexel flickr: rhodes
Re: [OSM-dev] OSM History Retriever
On 21/07/10 10:01, Martijn van Exel wrote:

> At SOTM I talked to Lars Francke about his history dump script that was run
> once or twice a few months ago. He said it would definitely be feasible to
> implement an extent parameter but that would require some significant
> rewriting effort. I am willing to look into this but it's in Java which is
> not my weapon of choice.. I would love to have a full history for a few
> larger areas at some point so it would make sense for me to implement this.

Well we don't really want to be running that script lots of times for
different extents either - the idea would be to take the dump that it
produces and process it to produce subsets of the data as people do with the
ordinary planet dumps.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
Re: [OSM-dev] OSM History Retriever
On Jul 21, 2010, at 10:48 AM, Tom Hughes wrote:

> On 21/07/10 09:26, Martijn van Exel wrote:
>
>> I started this because there seems no direct way to retrieve the full
>> history for a particular area. What I am worried about is that this is an
>> abuse of the API, so please advise if this is the case.
>
> Well it's certainly going to be rather expensive, so if people use it more
> than occasionally or start scripting it to scrape larger areas then we are
> clearly going to have a problem.
>
> Generating extracts from the full history dump would be preferable - not
> sure what the status of getting that set up is though.
>
> Tom

For that reason I restricted the bbox size to a maximum of 5km. Of course
someone could remove this limitation in the code quite easily. I realize that
in urban / dense areas even 5km would mean a lot of requests, so I want to
build in a check once the initial map API request returns (a maximum of a few
thousand features or something).

The use case for this program for me is being able to retrieve the history
for a number of small areas easily, for analysis purposes related to my Crowd
Quality research I talked about at SOTM.

At SOTM I talked to Lars Francke about his history dump script that was run
once or twice a few months ago. He said it would definitely be feasible to
implement an extent parameter but that would require some significant
rewriting effort. I am willing to look into this but it's in Java which is
not my weapon of choice.. I would love to have a full history for a few
larger areas at some point so it would make sense for me to implement this.

Martijn van Exel +++ m...@rtijn.org
Laziness – Impatience – Hubris
http://schaaltreinen.nl
twitter: mvexel skype: mvexel flickr: rhodes
Re: [OSM-dev] OSM History Retriever
On 21/07/10 09:26, Martijn van Exel wrote:

> I started this because there seems no direct way to retrieve the full
> history for a particular area. What I am worried about is that this is an
> abuse of the API, so please advise if this is the case.

Well it's certainly going to be rather expensive, so if people use it more
than occasionally or start scripting it to scrape larger areas then we are
clearly going to have a problem.

Generating extracts from the full history dump would be preferable - not sure
what the status of getting that set up is though.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/