Re: [OSM-dev] OSM History Retriever

2010-07-29 Thread Brett Henderson
Hi Martijn,

On Thu, Jul 29, 2010 at 7:02 PM, Martijn van Exel  wrote:

> Hi Brett,
>
> What kind of trouble do you envisage when doing a bbox operation on a full
> history dump? I guess the movement of features over time makes an accurate
> determination of what is and what isn't in the bounding box less trivial,
> but for my purposes having all the historical data for which the current
> version is within the given bbox would be adequate.
>

This is educated guesswork because I haven't actually tried it ...

Yep, the movement of features over time is one issue.  The current bounding
box task will only send node versions through that are inside the bounding
box so it could potentially include some versions of a node and not others.
The next problem is that it will track all node ids that have been included
regardless of how many versions of the node were included.  These tracked
node ids are then used during way processing to determine which ways to
include.  The way processing won't take into account which version of a node
was included so some ways may be included that shouldn't be.  Finally the
relation processing may also have issues where relations are included that
point to ways for which only some versions have been included.  It might
mostly work because most data won't shift drastically during its lifetime
but there will be exceptions.
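The sketch below makes the version-mixing problem concrete. It is Python rather than Osmosis's Java. `NodeVersion`, `per_version_filter` and `current_version_filter` are hypothetical names. It assumes every node version carries coordinates; in the real dump a deleted version may not.

The first filter tests each version on its own, as described above. The second implements the rule Martijn says would be adequate: keep all versions of a node whose current version is inside the bbox.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

# (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class NodeVersion:
    id: int
    version: int
    lat: float
    lon: float


def inside(node: NodeVersion, bbox: BBox) -> bool:
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= node.lon <= max_lon and min_lat <= node.lat <= max_lat


def per_version_filter(nodes: Iterable[NodeVersion], bbox: BBox) -> List[NodeVersion]:
    """Test each version on its own: a node that moved may be only partly kept."""
    return [n for n in nodes if inside(n, bbox)]


def current_version_filter(nodes: Iterable[NodeVersion], bbox: BBox) -> List[NodeVersion]:
    """Keep every version of each node whose latest version lies in the bbox."""
    nodes = list(nodes)
    latest: Dict[int, NodeVersion] = {}
    for n in nodes:
        if n.id not in latest or n.version > latest[n.id].version:
            latest[n.id] = n
    keep: Set[int] = {node_id for node_id, n in latest.items() if inside(n, bbox)}
    return [n for n in nodes if n.id in keep]


if __name__ == "__main__":
    history = [
        NodeVersion(1, 1, 52.0, 4.0),  # starts inside the box
        NodeVersion(1, 2, 52.0, 6.0),  # later moved outside it
        NodeVersion(2, 1, 52.5, 4.5),
    ]
    box = (3.0, 51.0, 5.0, 53.0)
    print([(n.id, n.version) for n in per_version_filter(history, box)])
    print([(n.id, n.version) for n in current_version_filter(history, box)])
```

In the demo, node 1 starts inside the box and later moves out. The per-version filter keeps only its first version and prints `[(1, 1), (2, 1)]`. The current-version filter drops node 1 entirely and prints `[(2, 1)]`.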

The bigger issue is that Osmosis has no concept of a "visible" attribute.
Osmosis deals with two main types of data, entity streams (i.e. nodes, ways
and relations) and change streams (as per entity streams but with an action
create, modify or delete).  Entity streams are what is used for processing
osm files, and change streams deal with osc files.  The full history file
has much more in common with an osc file than an osm file.  The use of a
"visible" attribute is another way of representing what Osmosis already
represents using create, modify and delete actions on change streams.

Trying to use Osmosis to process these full history files as if they were
normal osm files may kind of work but entity streams are not designed to
work this way.  I think a better solution is to write a new XML change
reader for Osmosis that reads these full history files and uses the visible
attribute to determine if it's a create, modify or delete.  It can then
send them into the pipeline as a normal change stream.  The next step after
that is to write a new bounding box task that can deal with change streams.
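The change reader suggested here could look roughly like the following Python sketch. It is not Osmosis code, and `history_to_changes` is a hypothetical name.

The mapping is an assumption:

- `visible="false"` becomes a delete.
- The first appearance of an entity in the stream becomes a create.
- Everything else becomes a modify, so an undelete comes out as a modify.

It uses a streaming parser because the dump is far too large to load whole.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Set, Tuple

ENTITY_TAGS = {"node", "way", "relation"}


def history_to_changes(source: BinaryIO) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (action, entity_type, id, version) for each entity version.

    Assumed mapping: visible="false" -> delete; first appearance of an
    (entity_type, id) in the stream -> create; anything else -> modify.
    """
    seen: Set[Tuple[str, int]] = set()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the enclosing <osm> element
    for event, elem in context:
        if event != "end" or elem.tag not in ENTITY_TAGS:
            continue  # skip start events and child elements such as <tag>, <nd>
        key = (elem.tag, int(elem.get("id")))
        if elem.get("visible") == "false":
            action = "delete"
        elif key not in seen:
            action = "create"
        else:
            action = "modify"
        seen.add(key)
        yield action, elem.tag, key[1], int(elem.get("version"))
        root.clear()  # drop processed elements so memory stays flat


if __name__ == "__main__":
    sample = (
        b'<osm version="0.6">'
        b'<node id="1" version="1" visible="true" lat="52.0" lon="4.0"/>'
        b'<node id="1" version="2" visible="true" lat="52.1" lon="4.1"/>'
        b'<node id="1" version="3" visible="false"/>'
        b'<way id="5" version="1" visible="true"><nd ref="1"/></way>'
        b"</osm>"
    )
    for change in history_to_changes(io.BytesIO(sample)):
        print(change)
```

On the sample, this yields create, modify and delete for node 1, then a create for way 5. The `seen` set grows with the number of distinct entities. If the file were known to be sorted by type, id and version, comparing each element with the previous one would avoid that.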

Brett
___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] OSM History Retriever

2010-07-29 Thread Martijn van Exel
Hi Brett,

What kind of trouble do you envisage when doing a bbox operation on a full 
history dump? I guess the movement of features over time makes an accurate 
determination of what is and what isn't in the bounding box less trivial, but 
for my purposes having all the historical data for which the current version is 
within the given bbox would be adequate.

I would not hesitate to get some of my team members busy with adapting osmosis 
to deal with full history planet extracts better if necessary, but I'd love to 
hear from you what the likely caveats are.

Best, 
Martijn 

Sent from my iPad

On Jul 23, 2010, at 2:20 PM, Brett Henderson  wrote:

> On Thu, Jul 22, 2010 at 6:14 PM, Martijn van Exel  wrote:
> 
> On 22 jul 2010, at 06:57, Brett Henderson wrote:
> 
>> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke  
>> wrote:
>> [...]
>> 
>> That format is fine and exactly what I would have expected.  I suspect 
>> Osmosis would parse it okay, but without support for the visible attribute 
>> it won't be particularly useful.
>>  
> Not for visualization purposes maybe, but for analysis purposes the visible 
> attribute is not really an issue. My goal is to extract full history dumps 
> for certain spatial extents and import them into a PostGIS, in order to 
> calculate historical metrics exposing the crowd dynamics of OSM - for example 
> number of contributors over time, version growth over time, movement of nodes 
> over time. All this calculated for grid cells. 
> 
> The full history dump is 13GB bz2 compressed. Anyone got a rough idea how 
> long it would take for osmosis to extract, say, a bbox of the Netherlands out 
> of that on a 4GB AMD Opteron quad core machine? More RAM would probably help?
> 
> Osmosis is unlikely to work well on a full history dump.  The --bounding-box 
> task is really only designed to work with data from a single point in time.  
> Accurate bounding box filtering is much more difficult on data spanning a
> time range, although it might be good enough.  A bigger issue is
> that it will ignore visible attributes and strip those attributes from the 
> output.
> 
> A relatively small amount of RAM is used for a single --bounding-box task if 
> you specify the idTrackerType=BitSet option.
> 
> Brett
> 


Re: [OSM-dev] OSM History Retriever

2010-07-23 Thread Brett Henderson
On Thu, Jul 22, 2010 at 6:14 PM, Martijn van Exel  wrote:

>
> On 22 jul 2010, at 06:57, Brett Henderson wrote:
>
> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:
> [...]
>
> That format is fine and exactly what I would have expected.  I suspect
> Osmosis would parse it okay, but without support for the visible attribute
> it won't be particularly useful.
>
>
> Not for visualization purposes maybe, but for analysis purposes the visible
> attribute is not really an issue. My goal is to extract full history dumps
> for certain spatial extents and import them into a PostGIS, in order to
> calculate historical metrics exposing the crowd dynamics of OSM - for
> example number of contributors over time, version growth over time, movement
> of nodes over time. All this calculated for grid cells.
>
> The full history dump is 13GB bz2 compressed. Anyone got a rough idea how
> long it would take for osmosis to extract, say, a bbox of the Netherlands
> out of that on a 4GB AMD Opteron quad core machine? More RAM would probably
> help?
>

Osmosis is unlikely to work well on a full history dump.  The --bounding-box
task is really only designed to work with data from a single point in time.
Accurate bounding box filtering is much more difficult on data spanning a
time range, although it might be good enough.  A bigger issue is
that it will ignore visible attributes and strip those attributes from the
output.

A relatively small amount of RAM is used for a single --bounding-box task if
you specify the idTrackerType=BitSet option.
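The bitset sketch below shows why the BitSet option keeps memory low. It is written in Python and is not Osmosis's Java implementation; `BitSetIdTracker` is a hypothetical name.

The cost is one bit per possible id, up to the highest id set, no matter how many ids are actually tracked. For example, ids up to 1,000,000,000 would need about 125,000,000 bytes (roughly 125 MB).

```python
class BitSetIdTracker:
    """Sketch of a bitset id tracker: one bit per possible id.

    Memory grows with the largest id set (about max_id / 8 bytes), not with
    how many ids are set, so it stays small and predictable for dense ids.
    """

    def __init__(self) -> None:
        self._bits = bytearray()

    def set(self, entity_id: int) -> None:
        if entity_id < 0:
            raise ValueError("this sketch only handles non-negative ids")
        byte_index, bit = divmod(entity_id, 8)
        if byte_index >= len(self._bits):
            # grow with zero bytes up to and including byte_index
            self._bits.extend(bytes(byte_index + 1 - len(self._bits)))
        self._bits[byte_index] |= 1 << bit

    def get(self, entity_id: int) -> bool:
        if entity_id < 0:
            return False
        byte_index, bit = divmod(entity_id, 8)
        return byte_index < len(self._bits) and bool(self._bits[byte_index] & (1 << bit))

    def nbytes(self) -> int:
        return len(self._bits)


if __name__ == "__main__":
    tracker = BitSetIdTracker()
    for node_id in (3, 17, 1_000_000):
        tracker.set(node_id)
    print(tracker.get(17), tracker.get(18), tracker.nbytes())
```

The demo prints `True False 125001`. Setting id 1,000,000 alone forces the array out to byte 125,000.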

Brett


Re: [OSM-dev] OSM History Retriever

2010-07-22 Thread Martijn van Exel

On 22 jul 2010, at 06:57, Brett Henderson wrote:

> On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke  wrote:
> [...]
> That format is fine and exactly what I would have expected.  I suspect 
> Osmosis would parse it okay, but without support for the visible attribute it 
> won't be particularly useful.
>  
Not for visualization purposes maybe, but for analysis purposes the visible 
attribute is not really an issue. My goal is to extract full history dumps for 
certain spatial extents and import them into a PostGIS, in order to calculate 
historical metrics exposing the crowd dynamics of OSM - for example number of 
contributors over time, version growth over time, movement of nodes over time. 
All this calculated for grid cells. 

The full history dump is 13GB bz2 compressed. Anyone got a rough idea how long 
it would take for osmosis to extract, say, a bbox of the Netherlands out of 
that on a 4GB AMD Opteron quad core machine? More RAM would probably help?

Best

-- Martijn



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Brett Henderson
On Wed, Jul 21, 2010 at 11:52 PM, Lars Francke wrote:

> On Wed, Jul 21, 2010 at 15:42, Brett Henderson  wrote:
> > I haven't looked at the full history dump to be honest so I'm not in a
> > great position to comment, but I'll comment anyway ;-)
> >
> > I'm curious what the format of the full history dump is.  I'd like to
> > understand how nodes, ways and relations are represented in the file but
> > I can't do so without downloading the whole thing and decompressing it.
>
> The format is pretty standard .osm XML.
> It currently outputs a visible attribute for every object.
>
> Apart from that everything should be standard[1] and modeled after the
> tool that writes the planet except that each element may occur
> multiple times.
> There are only two things to look out for: It is not "pretty-printed",
> so everything is just one huge line (I know that some people do
> line-based parsing but that won't work here) and as there's old data
> mixed in not every element has uid and user attributes.
>
> Let me know if you have any questions or if a format change would help
> in any way but I believe the current format is pretty normal and
> should be parseable by almost every tool that already parses .osm
> files.
>

That format is fine and exactly what I would have expected.  I suspect
Osmosis would parse it okay, but without support for the visible attribute
it won't be particularly useful.


>
> Cheers,
> Lars
>
> [1]
> https://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java
>


Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Peter Körner

On 21.07.2010 15:42, Brett Henderson wrote:

> I'm curious what the format of the full history dump is.  I'd like to
> understand how nodes, ways and relations are represented in the file but
> I can't do so without downloading the whole thing and decompressing it.

FYI, you can decompress and read it as it streams in over HTTP:

wget -O - -q \
  http://planet.osm.org/full-experimental/full-planet-100214.osm.bz2 | \
  bzip2 -dc | more
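For scripted access, the same streaming idea can be sketched in Python with only the standard library. `decompressed_chunks` and `peek_remote` are hypothetical helpers. The URL is the one from this message and may no longer resolve.

The sketch reads fixed-size chunks rather than lines, since the dump is written as one huge line.

```python
import bz2
import io
import itertools
import urllib.request
from typing import BinaryIO, Iterator

DUMP_URL = "http://planet.osm.org/full-experimental/full-planet-100214.osm.bz2"


def decompressed_chunks(compressed: BinaryIO, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Yield fixed-size chunks of decompressed data as compressed bytes are read.

    Chunks rather than lines, because the dump is written as one huge line.
    bz2.open also copes with multi-stream .bz2 files.
    """
    with bz2.open(compressed, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                return
            yield chunk


def peek_remote(url: str = DUMP_URL, n_chunks: int = 1) -> bytes:
    """Roughly `wget -O - -q URL | bzip2 -dc | head -c ...`: stop after n_chunks."""
    with urllib.request.urlopen(url) as response:
        return b"".join(itertools.islice(decompressed_chunks(response), n_chunks))


if __name__ == "__main__":
    # Offline demo: compress a sample in memory and stream it back.
    sample = b"<osm>" + b"<node/>" * 10_000 + b"</osm>"
    parts = list(decompressed_chunks(io.BytesIO(bz2.compress(sample)), chunk_size=4096))
    print(len(parts), b"".join(parts) == sample)
```

The `__main__` demo runs offline on an in-memory sample. `peek_remote()` stops reading from the network once it has the requested number of decompressed chunks.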


Peter



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Lars Francke
On Wed, Jul 21, 2010 at 15:42, Brett Henderson  wrote:
> I haven't looked at the full history dump to be honest so I'm not in a great
> position to comment, but I'll comment anyway ;-)
>
> I'm curious what the format of the full history dump is.  I'd like to
> understand how nodes, ways and relations are represented in the file but I
> can't do so without downloading the whole thing and decompressing it.

The format is pretty standard .osm XML.
It currently outputs a visible attribute for every object.

Apart from that everything should be standard[1] and modeled after the
tool that writes the planet except that each element may occur
multiple times.
There are only two things to look out for: It is not "pretty-printed",
so everything is just one huge line (I know that some people do
line-based parsing but that won't work here) and as there's old data
mixed in not every element has uid and user attributes.

Let me know if you have any questions or if a format change would help
in any way but I believe the current format is pretty normal and
should be parseable by almost every tool that already parses .osm
files.
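Two points follow from this format. Because the file is one huge line, a streaming XML parser is needed rather than a line-based reader. Code must also tolerate versions that lack `uid` and `user`.

The sketch below uses Python's standard-library `iterparse` to compute one of the metrics mentioned in this thread: contributors over time. `contributors_per_year` is a hypothetical helper. It assumes each version has a `timestamp` attribute in ISO 8601 form.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import BinaryIO, Dict, Set, Tuple

ENTITY_TAGS = {"node", "way", "relation"}


def contributors_per_year(source: BinaryIO) -> Tuple[Dict[str, int], int]:
    """Count distinct uids per calendar year of the version timestamp.

    Returns ({year: distinct_uid_count}, versions_without_uid). Versions
    lacking a uid (older data) are counted rather than silently dropped.
    """
    uids_by_year: Dict[str, Set[str]] = defaultdict(set)
    missing_uid = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the enclosing <osm> element
    for event, elem in context:
        if event != "end" or elem.tag not in ENTITY_TAGS:
            continue
        uid = elem.get("uid")
        year = (elem.get("timestamp") or "")[:4]  # assumes "YYYY-MM-DDTHH:MM:SSZ"
        if uid is None:
            missing_uid += 1
        elif year:
            uids_by_year[year].add(uid)
        root.clear()  # keep memory flat on a single-line, multi-GB file
    return {y: len(u) for y, u in sorted(uids_by_year.items())}, missing_uid


if __name__ == "__main__":
    sample = (
        b'<osm><node id="1" version="1" timestamp="2007-05-01T00:00:00Z" lat="0" lon="0"/>'
        b'<node id="1" version="2" uid="7" user="a" timestamp="2009-01-01T00:00:00Z" lat="0" lon="0"/>'
        b"</osm>"
    )
    print(contributors_per_year(io.BytesIO(sample)))
```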

Cheers,
Lars

[1] 
https://bitbucket.org/lfrancke/historydump/src/tip/src/main/java/org/openstreetmap/util/Dumper.java



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Brett Henderson
I haven't looked at the full history dump to be honest so I'm not in a great
position to comment, but I'll comment anyway ;-)

I'm curious what the format of the full history dump is.  I'd like to
understand how nodes, ways and relations are represented in the file but I
can't do so without downloading the whole thing and decompressing it.

A full history dump would presumably require at least the addition of the
"visible" attribute to the standard set of XML attributes.  Currently
Osmosis has support for normal entity streams, change streams, and dataset
streams (random access to data, not in common use).  Entities with visible
attributes would require a new stream type which is not terribly difficult,
but requires a few new interfaces and task managers to be defined.

The bigger task is then writing tasks to support these new data types.  In
particular the existing --bounding-box task can't be used because it assumes
that only a single version of each entity exists, and that nodes reside in a
single location.  With full history files you need to take into account that
each way may refer to several different versions of nodes through time
depending on timestamp and that each version of a node might reside in a
completely different location.  It's not as simple as the current bounding
box task which just tracks which nodes it has included and then includes
ways which reference them.
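The timestamp-dependent lookup described here can be sketched as follows. For a given way version, it takes each referenced node's latest version at or before the way's timestamp. The names (`TimedNodeVersion`, `NodeHistory`, `way_geometry_at`) are hypothetical.

The sketch assumes timestamps in the fixed `YYYY-MM-DDTHH:MM:SSZ` form, so text order equals time order. It also ignores deleted node versions.

Note that a node can move later without a new way version being created. A full solution would therefore also have to re-evaluate ways at their nodes' edit times.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class TimedNodeVersion:
    id: int
    version: int
    timestamp: str  # "YYYY-MM-DDTHH:MM:SSZ"; this fixed format sorts correctly as text
    lat: float
    lon: float


class NodeHistory:
    """Index of node versions by id, ordered by timestamp."""

    def __init__(self, versions: Iterable[TimedNodeVersion]) -> None:
        self._versions: Dict[int, List[TimedNodeVersion]] = {}
        for v in sorted(versions, key=lambda v: (v.id, v.timestamp)):
            self._versions.setdefault(v.id, []).append(v)
        self._times = {i: [v.timestamp for v in vs] for i, vs in self._versions.items()}

    def at(self, node_id: int, timestamp: str) -> Optional[TimedNodeVersion]:
        """Return the latest version of node_id with timestamp <= the given one."""
        times = self._times.get(node_id)
        if not times:
            return None
        idx = bisect.bisect_right(times, timestamp)
        return self._versions[node_id][idx - 1] if idx else None


def way_geometry_at(
    node_refs: List[int], timestamp: str, history: NodeHistory
) -> List[Optional[Tuple[float, float]]]:
    """Coordinates of a way's nodes as they were at `timestamp` (None if unknown)."""
    coords: List[Optional[Tuple[float, float]]] = []
    for ref in node_refs:
        v = history.at(ref, timestamp)
        coords.append((v.lat, v.lon) if v is not None else None)
    return coords
```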

I'll do my best to answer any questions if somebody wants to take this on
but it doesn't sound trivial.  Not much existing code could be re-used other
than the generic pipeline management.

On Wed, Jul 21, 2010 at 8:53 PM, Andy Allan  wrote:

> On Wed, Jul 21, 2010 at 10:06 AM, Martijn van Exel 
> wrote:
>
> >> Well we don't really want to be running that script lots of times for
> >> different extents either - the idea would be to take the dump that
> >> produces and process it to produce subsets of the data as people do with
> >> the ordinary planet dumps.
> >>
> > Are there any existing tools that could do the processing though? Would
> > osmosis for example be able to extract a bbox-defined subset of the
> > history.osm file?
>
> Not as far as I'm aware, unless osmosis happens to magically work!
> It's the best tool for the job though, so I'd think some extra osmosis
> tasks (--read-history, --write-history) would be the best approach.
> I've no idea how much internal plumbing would be required though to
> support this - anyone want to comment?
>
> Cheers,
> Andy


Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Andy Allan
On Wed, Jul 21, 2010 at 10:06 AM, Martijn van Exel  wrote:

>> Well we don't really want to be running that script lots of times for 
>> different extents either - the idea would be to take the dump that produces and 
>> process it to produce subsets of the data as people do with the ordinary 
>> planet dumps.
>>
> Are there any existing tools that could do the processing though? Would 
> osmosis for example be able to extract a bbox-defined subset of the 
> history.osm file?

Not as far as I'm aware, unless osmosis happens to magically work!
It's the best tool for the job though, so I'd think some extra osmosis
tasks (--read-history, --write-history) would be the best approach.
I've no idea how much internal plumbing would be required though to
support this - anyone want to comment?

Cheers,
Andy



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Lars Francke
> At SOTM I talked to Lars Francke about his history dump script that was run 
> once or twice a few months ago. He said it would definitely be feasible to 
> implement an extent parameter but that would require some significant 
> rewriting effort. I am willing to look into this but it's in Java which is 
> not my weapon of choice.. I would love to have a full history for a few 
> larger areas at some point so it would make sense for me to implement this.

The code is here: https://bitbucket.org/lfrancke/historydump/src (but
as you mentioned: it's Java)

And as Tom has mentioned this would probably have to be an external
service but that requires quite a few resources. If anyone has servers
to spare I bet we can put them to good use :)
And there's always the problem with ways crossing the area and things
moving in and out of the area in question over time etc.
If there is anything I can do to help let me know!

Cheers,
Lars



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Martijn van Exel
On Jul 21, 2010, at 11:04 AM, Tom Hughes wrote:

> On 21/07/10 10:01, Martijn van Exel wrote:
> 
>> At SOTM I talked to Lars Francke about his history dump script that was run 
>> once or twice a few months ago. He said it would definitely be feasible to 
>> implement an extent parameter but that would require some significant 
>> rewriting effort. I am willing to look into this but it's in Java which is 
>> not my weapon of choice. I would love to have a full history for a few 
>> larger areas at some point so it would make sense for me to implement this.
> 
> Well we don't really want to be running that script lots of times for 
> different extents either - the idea would be to take the dump that produces and 
> process it to produce subsets of the data as people do with the ordinary 
> planet dumps.
> 
Are there any existing tools that could do the processing though? Would osmosis 
for example be able to extract a bbox-defined subset of the history.osm file?

Martijn van Exel +++ m...@rtijn.org
Laziness – Impatience – Hubris

http://schaaltreinen.nl
twitter: mvexel
skype: mvexel
flickr: rhodes






Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Tom Hughes

On 21/07/10 10:01, Martijn van Exel wrote:


At SOTM I talked to Lars Francke about his history dump script that was run 
once or twice a few months ago. He said it would definitely be feasible to 
implement an extent parameter but that would require some significant rewriting 
effort. I am willing to look into this but it's in Java which is not my weapon 
of choice. I would love to have a full history for a few larger areas at some 
point so it would make sense for me to implement this.


Well we don't really want to be running that script lots of times for 
different extents either - the idea would be to take the dump that produces 
and process it to produce subsets of the data as people do with the 
ordinary planet dumps.


Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Martijn van Exel
On Jul 21, 2010, at 10:48 AM, Tom Hughes wrote:

> On 21/07/10 09:26, Martijn van Exel wrote:
> 
>> I started this because there seems to be no direct way to retrieve the full 
>> history for a particular area. What I am worried about is that this is an 
>> abuse of the API, so please advise if this is the case.
> 
> Well it's certainly going to be rather expensive, so if people use it more 
> than occasionally or start scripting it to scrape larger areas, then we are 
> clearly going to have a problem.
> 
> Generating extracts from the full history dump would be preferable - not sure 
> what the status of getting that setup is though.
> 
> Tom
> 

For that reason I restricted the bbox size to a maximum of 5km. Of course 
someone could remove this limitation in the code quite easily. 
I realize that in urban / dense areas even 5km would mean a lot of requests so 
I want to build in a check once the initial map API request returns (maximum of 
a few thousand features or something).
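A check along these lines could be sketched as below. It makes three assumptions:

- The 5 km limit applies to each side of the box.
- An equirectangular approximation with a mean Earth radius of 6371 km is accurate enough at this scale.
- `bbox_size_km` and `check_request` are hypothetical helpers, and the `max_features=2000` default stands in for "a few thousand features".

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_KM / 360  # about 111.2 km


def bbox_size_km(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> Tuple[float, float]:
    """Approximate (width, height) of a bbox in km (equirectangular approximation)."""
    mid_lat = math.radians((min_lat + max_lat) / 2)
    width = (max_lon - min_lon) * KM_PER_DEGREE * math.cos(mid_lat)
    height = (max_lat - min_lat) * KM_PER_DEGREE
    return width, height


def check_request(
    bbox: Tuple[float, float, float, float],
    feature_count: Optional[int] = None,
    max_side_km: float = 5.0,
    max_features: int = 2000,  # hypothetical stand-in for "a few thousand features"
) -> None:
    """Raise ValueError if the bbox or the initial map response is too large."""
    width, height = bbox_size_km(*bbox)
    if max(width, height) > max_side_km:
        raise ValueError(f"bbox is {width:.1f} x {height:.1f} km; limit is {max_side_km} km per side")
    if feature_count is not None and feature_count > max_features:
        raise ValueError(f"{feature_count} features in the initial response; limit is {max_features}")


if __name__ == "__main__":
    print(bbox_size_km(4.88, 52.36, 4.92, 52.38))  # small box in central Amsterdam
    check_request((4.88, 52.36, 4.92, 52.38), feature_count=500)  # passes silently
```

A box of about 0.04° by 0.02° in central Amsterdam comes out at roughly 2.7 km by 2.2 km and passes. A 1° by 1° box raises `ValueError`.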

The use case for this program for me is being able to retrieve the history for 
a number of small areas easily, for analysis purposes related to my Crowd 
Quality research I talked about at SOTM. 

At SOTM I talked to Lars Francke about his history dump script that was run 
once or twice a few months ago. He said it would definitely be feasible to 
implement an extent parameter but that would require some significant rewriting 
effort. I am willing to look into this but it's in Java which is not my weapon 
of choice. I would love to have a full history for a few larger areas at some 
point so it would make sense for me to implement this.







Re: [OSM-dev] OSM History Retriever

2010-07-21 Thread Tom Hughes

On 21/07/10 09:26, Martijn van Exel wrote:


I started this because there seems to be no direct way to retrieve the full history 
for a particular area. What I am worried about is that this is an abuse of the 
API, so please advise if this is the case.


Well it's certainly going to be rather expensive, so if people use it 
more than occasionally or start scripting it to scrape larger areas, then 
we are clearly going to have a problem.


Generating extracts from the full history dump would be preferable - not 
sure what the status of getting that setup is though.


Tom

