Re: [OSM-dev] Experimental history files too

2014-02-03 Thread Matt Amos
On Tue, Jan 28, 2014 at 11:11 AM, Peter Körner wrote:
> On 28.01.2014 03:18, Matt Amos wrote:
>> On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner wrote:
>>> What do you think about adopting the osmium-naming-scheme for history files?
>>
>> personally, i think it's misleading
>> [...]
>> in the case of .osm files, they're all potentially history files, and
>> the file format does not change depending on whether multiple versions
>> are present for a single ID or not.
>
> History-Files carry an extra attribute (bool visible) that distinguishes
> them from regular osm files.

all .osm files can carry the visible attribute, and it is present in
all responses from the API. it is only elided from the "current"
planet dump because it would have the value "true" on every element,
so would simply be a waste of space. to be clear: the presence of the
visible attribute does not mean the file contains multiple versions
with the same ID.

> Also it's not only about the parser. [...]
> So from a data-user point of view it does not matter if the format is
> actually-quite-similar, as long as there are separate programs and tools
> required to handle the two types of file, they are actually different.

the "current" planet is a strict subset of the "history" planet -
anything which can load and process a "history" planet file should be
able to process the "current" planet. also, it is trivial to turn a
"history" planet into a "current" planet by discarding old versions
and deleted elements.
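
a minimal sketch of that reduction, assuming a small OSM XML document that fits in memory and carries the usual id, version and visible attributes. the function name is made up for illustration, and a missing visible attribute is treated as "true":

```python
import xml.etree.ElementTree as ET

def current_from_history(xml_text):
    """Return the latest visible version of each (type, id) pair."""
    latest = {}
    for elem in ET.fromstring(xml_text):
        if elem.tag not in ("node", "way", "relation"):
            continue  # skip bounds, changesets, etc.
        key = (elem.tag, elem.get("id"))
        version = int(elem.get("version"))
        # discard old versions: keep only the highest version per key
        if key not in latest or version > int(latest[key].get("version")):
            latest[key] = elem
    # discard deleted elements: the latest version must be visible
    return [e for e in latest.values() if e.get("visible", "true") == "true"]
```

for a real planet file this would need a streaming parser rather than loading everything at once; the sketch only shows the logic.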

> I'd compare it more to tiff/aiff. While they are actually quite similar
> from a file-format point of view, no one would argue that audio-files
> and image-files should be handled as if there were no difference.

i would agree that audio files and image files are completely different :-)

but that is not the case with "history" and "current" planet files -
as i've said above, the former is a strict superset of the latter, so
the comparison to tiff/aiff is misleading. i can't think of a good
comparison - the best one i've thought of so far is one version of
HTML compared with a previous version, and programs which understand
only previous versions will ignore some elements of a later version
file.

>> whether something is a "history"
>> osm file or a "current" osm file is a matter of the content - so
>> wanting a different extension is a bit like wanting .png for
>> truecolour images and .pgr for greyscale images (in the same PNG
>> format).
> The main difference here is that all applications capable of reading png
> MUST be capable of reading pgr as well. That's not true for osm/osh
> applications.
>
> Also the tasks you can perform on both file formats are the same
> (display, crop, combine, ...). This is also not true for osm/osh files.
> One can import both into a database but the database-format required and
> the actions possible on those databases are quite different.

indeed, the analogy with image files was not good.

however, there are tasks that one can perform on osm files regardless
of whether they contain multiple versions with the same ID - one can
pull out elements matching certain tags & attributes, perform
transformations on the basis of tags & attributes, reproject nodes or
extract those within certain areas, etc...
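
as an illustration, a bounding-box extract can be written without caring whether an ID appears once or many times. this is only a sketch: the function name and the in-memory parsing are assumptions, and node versions without lat/lon (as deleted versions may be) are simply skipped:

```python
import xml.etree.ElementTree as ET

def nodes_in_bbox(xml_text, min_lon, min_lat, max_lon, max_lat):
    """Return (id, version) for every node version inside the box."""
    hits = []
    for node in ET.fromstring(xml_text).iter("node"):
        lat, lon = node.get("lat"), node.get("lon")
        if lat is None or lon is None:
            continue  # no coordinates on this version, nothing to test
        if min_lat <= float(lat) <= max_lat and min_lon <= float(lon) <= max_lon:
            hits.append((node.get("id"), node.get("version")))
    return hits
```

on a "current" file each ID shows up at most once in the result; on a "history" file every matching version is reported, which is the only difference.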

>> having said that, it would seem reasonable to add a flag to
>> the document element to indicate whether the .osm file is a special
>> case, having a single version for each ID, as many programs seem to
>> rely on this assumption and it would be better to be able to check it.
> Isn't that already implemented via the required_features header of
> pbf-files?
>
> See
> 
> for a reference.

no, actually, that flag has a different meaning. setting
"HistoricalInformation" in the PBF header simply means that some
elements may be deleted (have visible="false"). one can create a
conforming file, with all elements having visible="true", and yet
still have multiple versions for the same ID.

to correct this, PBF should probably deprecate
"HistoricalInformation"[1] and have a header field
"CurrentInformationOnly", which indicates *both* that there are no
visible="false" elements and that no two elements of the same type
share an ID. and our XML should probably have the same attribute on
the header.
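
the condition such a flag would assert can be sketched as a check over an XML document. "CurrentInformationOnly" is only the name proposed above, not an existing field, and the function below is a hypothetical illustration that assumes the document fits in memory:

```python
import xml.etree.ElementTree as ET

def is_current_only(xml_text):
    """True if no element is deleted and no (type, id) pair repeats."""
    seen = set()
    for elem in ET.fromstring(xml_text):
        if elem.tag not in ("node", "way", "relation"):
            continue
        # condition 1: no visible="false" elements (missing means visible)
        if elem.get("visible", "true") != "true":
            return False
        # condition 2: no two elements of the same type share an ID
        key = (elem.tag, elem.get("id"))
        if key in seen:
            return False
        seen.add(key)
    return True
```

a file with two visible versions of the same node fails this check even though it contains no visible="false" element, which is exactly the case "HistoricalInformation" does not cover.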

cheers,

matt

[1]: one could simply change what "HistoricalInformation" meant, but
that runs the risk of already-written (and previously conforming)
files becoming non-conforming, which will be confusing.

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Experimental history files too

2014-01-28 Thread Peter Körner
On 28.01.2014 03:18, Matt Amos wrote:
> On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner wrote:
>> What do you think about adopting the osmium-naming-scheme for history files?
> 
> personally, i think it's misleading
> [...]
> in the case of .osm files, they're all potentially history files, and
> the file format does not change depending on whether multiple versions
> are present for a single ID or not.

History-Files carry an extra attribute (bool visible) that distinguishes
them from regular osm files.

Also it's not only about the parser. osm2pgsql's parser is absolutely
capable of parsing files with multiple versions of an object, but the
whole processing chain will crash with weird errors.

Same for osmosis. Its parser will work but only a small fraction of
tasks will.

So from a data-user point of view it does not matter if the format is
actually-quite-similar, as long as there are separate programs and tools
required to handle the two types of file, they are actually different.

I'd compare it more to tiff/aiff. While they are actually quite similar
from a file-format point of view, no one would argue that audio-files
and image-files should be handled as if there were no difference.


> whether something is a "history"
> osm file or a "current" osm file is a matter of the content - so
> wanting a different extension is a bit like wanting .png for
> truecolour images and .pgr for greyscale images (in the same PNG
> format).
The main difference here is that all applications capable of reading png
MUST be capable of reading pgr as well. That's not true for osm/osh
applications.

Also the tasks you can perform on both file formats are the same
(display, crop, combine, ...). This is also not true for osm/osh files.
One can import both into a database but the database-format required and
the actions possible on those databases are quite different.

> having said that, it would seem reasonable to add a flag to
> the document element to indicate whether the .osm file is a special
> case, having a single version for each ID, as many programs seem to
> rely on this assumption and it would be better to be able to check it.
Isn't that already implemented via the required_features header of
pbf-files?

See

for a reference.

> the generation is synced to the backup database dumps, so the clock
> starts running early Tuesday, when Monday's backup is complete. they
> seem to be fairly reliably finished by Wednesday morning, so it's
> probably safe to start looking for them then - although they'll be
> named for Monday's date.

Thank you for that info, I'll see when I can set up my regular splitting
task and will announce it separately.

>> If xml-writing takes only half as long as xml-reading I'd think twice
>> about supplying xml-based files. nobody really has fun reading such huge
>> files with expat. And if it's really necessary, there's always
>> osmium_convert which will generate xmls from pbf-dumps or -extracts locally.
> 
> this is a discussion which could probably continue forever. my opinion
> is that it's worthwhile distributing files which are sort-of
> human-readable, in a well-known format/markup for which many libraries
> exist in many languages, compressed with standard tools, and in the
> same format as the API.
got your point.

Peter




Re: [OSM-dev] Experimental history files too

2014-01-27 Thread Matt Amos
On Fri, Jan 24, 2014 at 2:56 PM, Peter Körner wrote:
> On 23.01.2014 18:28, Matt Amos wrote:
>> i encourage everyone to take a look and report back any problems you
>> find. my thanks to Peter Körner, who seems to already be doing this -
>> with no problems?
>
> No problems yet. Have run two splits of those files already
> (http://osm.personalwerk.de/full-history-extracts/)
>
> What do you think about adopting the osmium-naming-scheme for history files?
>  .osm.[bz2|gz|pbf] -> regular osm files
>  .osh.[bz2|gz|pbf] -> history files
>  .osc.[bz2|gz]     -> changeset-files
>
> That would make detecting the kind of file at first glance easier and
> it also fits nicely into the .osc-file naming convention.

personally, i think it's misleading. osmchange is a related, but
different, format from osm xml and a parser which works for one will
not necessarily work for the other. therefore, having a different
extension seems reasonable.

in the case of .osm files, they're all potentially history files, and
the file format does not change depending on whether multiple versions
are present for a single ID or not. whether something is a "history"
osm file or a "current" osm file is a matter of the content - so
wanting a different extension is a bit like wanting .png for
truecolour images and .pgr for greyscale images (in the same PNG
format). having said that, it would seem reasonable to add a flag to
the document element to indicate whether the .osm file is a special
case, having a single version for each ID, as many programs seem to
rely on this assumption and it would be better to be able to check it.

> I'm going to implement a regular run that generates fresh extracts every
> week from the available file. Is there any note on which weekday the
> full-history-dumps are generated, so I can loosely sync my split-script
> to that rhythm?

great! :-)

the generation is synced to the backup database dumps, so the clock
starts running early Tuesday, when Monday's backup is complete. they
seem to be fairly reliably finished by Wednesday morning, so it's
probably safe to start looking for them then - although they'll be
named for Monday's date.

> If xml-writing takes only half as long as xml-reading I'd think twice
> about supplying xml-based files. nobody really has fun reading such huge
> files with expat. And if it's really necessary, there's always
> osmium_convert which will generate xmls from pbf-dumps or -extracts locally.

this is a discussion which could probably continue forever. my opinion
is that it's worthwhile distributing files which are sort-of
human-readable, in a well-known format/markup for which many libraries
exist in many languages, compressed with standard tools, and in the
same format as the API.

this way, it's possible for people to develop tools which work against
small map call downloads, then scale them to extracts and even the
whole planet. of course, it's a widely-held belief that xml sucks
irretrievably and, while it's certainly true that pbf is smaller and
parses faster, distributing only pbf would mean someone would have to
learn those extra tools/commands to start using the data.

xml, despite its many flaws, at least has myriad libraries, bindings
and tools which make it easier to experiment with processing and
transforming osm data. these experimental planet/history files are
also line-oriented, which means one can even do quick-and-dirty
grep/sed/awk work for ad-hoc analysis.
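
in the same quick-and-dirty spirit, here is a sketch of a line-based count of element types. it assumes each element's opening tag starts its own line (possibly indented), which is what "line-oriented" is taken to mean here; the function name is invented for the example:

```python
import re
from collections import Counter

# matches the opening tag of a node, way or relation at the start of a line
OPENING = re.compile(r"^\s*<(node|way|relation)\b")

def count_elements(lines):
    """Count element opening tags per type, one line at a time."""
    counts = Counter()
    for line in lines:
        match = OPENING.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

any iterable of lines works, so a compressed file can be streamed with something like `bz2.open(path, "rt", encoding="utf-8")` and passed in directly. on a history file this counts versions, not distinct IDs.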

cheers,

matt



Re: [OSM-dev] Experimental history files too

2014-01-24 Thread Peter Körner
Hi

On 23.01.2014 18:28, Matt Amos wrote:
> i encourage everyone to take a look and report back any problems you
> find. my thanks to Peter Körner, who seems to already be doing this -
> with no problems?

No problems yet. Have run two splits of those files already
(http://osm.personalwerk.de/full-history-extracts/)

What do you think about adopting the osmium-naming-scheme for history files?
 .osm.[bz2|gz|pbf] -> regular osm files
 .osh.[bz2|gz|pbf] -> history files
 .osc.[bz2|gz]     -> changeset-files

That would make detecting the kind of file at first glance easier and
it also fits nicely into the .osc-file naming convention.
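
A rough sketch of how a script could sort files by this scheme, assuming only the three extensions listed above. The function name and the returned labels are made up for illustration, and the result only reflects the file name, not what the file actually contains:

```python
def classify(filename):
    """Guess the kind of OSM file from the naming scheme above."""
    kinds = {"osm": "regular", "osh": "history", "osc": "change"}
    # look at every dot-separated part after the base name,
    # so "x.osh.pbf" and "x.osm.bz2" are both recognised
    for part in filename.lower().split(".")[1:]:
        if part in kinds:
            return kinds[part]
    return None  # not covered by the scheme
```

For example, `classify("berlin.osh.pbf")` gives `"history"`, while a name without any of the three extensions gives `None`.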

I'm going to implement a regular run that generates fresh extracts every
week from the available file. Is there any note on which weekday the
full-history-dumps are generated, so I can loosely sync my split-script
to that rhythm?

If xml-writing takes only half as long as xml-reading I'd think twice
about supplying xml-based files. nobody really has fun reading such huge
files with expat. And if it's really necessary, there's always
osmium_convert which will generate xmls from pbf-dumps or -extracts locally.

Regards, Peter

