On Fri, Nov 23, 2012 at 5:03 AM, <[email protected]> wrote:

> Hi Scott,
>
> in brief to the 1-degrees granularity:
>
> 1. Do whole processing in 64 bit:
> This would mean to need much more RAM space when processing ways'
> coordinates. We should not do this unless this granularity is really
> required.
>

If you want your program to do all processing with 100 nanodegree
granularity instead of 1 nanodegree granularity, then you can use ints
throughout. Your software will have the limitation that if a PBF file
contains data with 1 nanodegree granularity, there will be data loss,
which is probably not a limitation in practice. AFAIK, there are no PBF
files with granularity that is not a multiple of 100 or with lat_offset
and lon_offset != 0.

>
> 2. Your formula:
> latitude_int = ((lat_offset + granularity*lat)/50+1)/2
> Good idea, but again, this would mean one more multiplication, one more
> division (and two additions, one shift). These operations usually can be
> done in no time, however that's different if you need to do them a Billion
> times.
>

I'm curious, have you benchmarked the difference?

> There are still people out there who have 32 bit machines, I presume they
> do not have 64 bits hardware multiplication units, hence the processing
> time will increase.
>

In any case, if the file has a granularity that is a multiple of 100, you
can use this specialized formula instead:

  latitude_int = (lat_offset/50+1)/2 + (granularity/100)*lat
  // This calculation can be done using 32-bit ints.

This can be further specialized for when the granularity is 100 to:

  latitude_int = (lat_offset/50+1)/2 + lat
  // This calculation can be done using 32-bit ints.

> 3. Process sequence:
> Using the granularity factor, lon/lat of every node in an OSMData
> fileblock must be read, stored temporarily and transformed later. Thus you
> have to access every data twice: first to read it, and a second time when
> you transform its granularity. This might be a flaw in PBF data model...
> Could we at least change this in that manner that the granularity
> information comes _before_ the real data? Same applies to lon/lat offset
> and date granularity.
>

No can do.
Google's protobuf format doesn't specify the order in which the
components of a message are serialized (this is to support concatenation
of messages without decoding them). Their implementation serializes in
tag-order, and I chose larger numbers for the granularity tags than for
the primitive block tags.

>
> In the end - there always will be a lot of programs which do not need this
> quasi "optional feature" "granularity" and simply will not support it.
>
> Metadata...
>
> We had the same discussion a year ago. Do you remember?
> https://wiki.openstreetmap.org/wiki/Talk:PBF_Format#File_Timestamp.3F
> I'm curious if - and I hope that - we manage to extend the PBF data format
> this time. :-)
> The file time stamp I added was meant as an interim solution: I took the
> already defined "optional feature" and stored a key-val pair in it, for
> example "timestamp=2011-10-16T15:45:00Z".
>
> I think this example shows what we really need: a flexible format for file
> related meta data. With key-val pairs, everyone could add optional data
> whenever they are needed in a toolchain. This is the flexibility we are
> used to have from OSM XML format.
>

I understand the desire for this, but I want to put some thought into it
to avoid the situation that created this thread, where the same metadata
is stored in different locations, and in different formats. How about two
types of metadata storage, one type is standardized in the OSMHeader
object directly:

message HeaderBlock {
  optional HeaderBBox bbox = 1;

  /* Additional tags to aid in parsing this dataset */
  repeated string required_features = 4;
  repeated string optional_features = 5;

  /* Other ad-hoc metadata */
  repeated AdHocMetadata adhoc_metadata = 6; // See below.

  optional string writingprogram = 16;
  optional string source = 17; // From the bbox field.

  optional string timestamp = 18; // from OSM planet header.
  optional int64 replication_timestamp = 19; // In microseconds since 1970 UTC.
  optional string copyright = 20;
  optional string contributors = 21;
  optional string license = 22;
}

(new fields taken from the new planet header). Question, since I haven't
reviewed OSM replication options, do we want one timestamp, two
timestamps, and should they be int64 or string?

> To combine this flexibility with the advantages of Protobuf format
> (compressed storage of different data types) we need to allow meta
> formatted objects - or something like this:
>
> message HeaderBlock {
> ...
> repeated HeaderMeta = 20;
> }
>
> message HeaderMeta {
> required string HeaderKey = 1;
> optional HeaderMetaVarint = 10;
> optional HeaderMetaString = 12;
> // see type definitions there:
> https://wiki.openstreetmap.org/wiki/PBF#Format_example
> // Only _one_ of the three optional objects should be used; did not know
> how to define this in Protobuf without an additional hierarchy layer.
> }
>
> What do you think about this suggestion?
>

And, I agree with your idea of having key-value metadata, but, IMHO,
ad-hoc non-standardized metadata keys should be scoped to the author or
creator of that key-value. Say, something like this:

message AdHocMetadata {
  required string author = 1;
  // Fully qualified URI of the author of this metadata, e.g.,
  // a website for toolchain, program, a company using this for
  // internal tracking data, or an email address of the person who
  // created it. The author has exclusive ownership of all keys and
  // values assigned under their ID.
  required string key = 2; // Key assigned by the author.
  required bool copied_into_derived = 3; // Should this key be copied into derived data.

  // These are generic fields; the supplier is free to use any subset of them.
  repeated sint64 value_int = 8;
  repeated string value_string = 9;
  repeated double value_double = 10;
  repeated bytes value_bytes = 11; // byte fields can contain other serialized protobuf objects.
}

Question, should I keep field #3? Useful for helping to track processing
pipelines, or do OSM processing pipelines currently not handle pushing
through arbitrary metadata?

Thoughts on both proposals for metadata?

Scott
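
P.S. For anyone who wants to sanity-check the specialized latitude
formulas above against the general one, here is a minimal Python sketch.
It is not taken from any existing tool, and the function names are made up
for illustration. It assumes floor division, which is what Python's //
does; C's / truncates toward zero, so a C port can give different results
for negative inputs. Python ints are also unbounded, so this only checks
that the formulas agree, not that they fit in 32 bits.

```python
def to_100_nanodegrees(offset, granularity, raw):
    """General formula from the thread.

    offset + granularity * raw is the coordinate in nanodegrees; the
    result is that coordinate in units of 100 nanodegrees, rounded to
    the nearest unit (floor division, so halves round up).
    """
    return ((offset + granularity * raw) // 50 + 1) // 2


def to_100_nanodegrees_fast(offset, granularity, raw):
    """Specialized formula, only valid when granularity is a multiple of 100."""
    if granularity % 100 != 0:
        raise ValueError("granularity must be a multiple of 100")
    return (offset // 50 + 1) // 2 + (granularity // 100) * raw


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise both signs.
    for raw in (-1_800_000_000, -1, 0, 1, 515_000_000):
        for offset in (-50, 0, 49, 50, 149):
            general = to_100_nanodegrees(offset, 100, raw)
            fast = to_100_nanodegrees_fast(offset, 100, raw)
            assert general == fast
    print("formulas agree")
```

With granularity == 100 the second function reduces to
(offset // 50 + 1) // 2 + raw, which is the further specialized form.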
_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

