Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline
On Wed, Jun 8, 2011 at 6:33 PM, Jochen Topf wrote:
> [...]
> So I think this needs to be a bit more formalized so Osmosis can still make
> those checks.
>
> The Map could be connected to some registry for the strings
> (or the strings could instead be objects carrying some extra information). For
> each one you would have at least two options. If a task encounters an object
> of this type in the pipeline and doesn't understand it,
> * it will just ignore it and optionally pass it on, or
> * it has to complain.

A central registry makes sense for the reasons you say. I'd prefer to make registration optional, though. In other words, it would be nice to be able to register attributes if you want them to be checked properly, but to have unrecognised attributes ignored and passed along unchanged.

I imagine it would be useful to be able to pass custom attributes for custom plugins, or even to allow special attributes in a custom XML file to be passed through the pipeline.

It's largely academic at this point, though; I don't think I'm going to get it done :-)

Brett
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev
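The "unrecognised attributes are passed along unchanged" idea for custom XML input could look roughly like the following. This is a minimal sketch using the JDK's StAX parser (javax.xml.stream), not Osmosis's actual XML reader; the class name CustomAttributeReader, the set of core attributes and the choice to return one map per node are all assumptions made for illustration. Unknown attributes are kept as Strings (nothing more is known about them), and a node without any gets null rather than an empty map, mirroring the "collection may be null" suggestion in the original proposal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CustomAttributeReader {

    // Hypothetical: attributes a reader would map onto core entity fields.
    private static final Set<String> CORE_ATTRIBUTES = Set.of(
            "id", "version", "timestamp", "uid", "user", "changeset", "lat", "lon", "visible");

    /**
     * Returns, for each <node> element, the attributes that are not core OSM
     * attributes (values kept as Strings), or null when a node has none.
     */
    public static List<Map<String, Object>> readExtraData(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Map<String, Object>> result = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "node".equals(reader.getLocalName())) {
                Map<String, Object> extra = null; // stays null in the common case
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    String name = reader.getAttributeLocalName(i);
                    if (!CORE_ATTRIBUTES.contains(name)) {
                        if (extra == null) {
                            extra = new LinkedHashMap<>();
                        }
                        extra.put(name, reader.getAttributeValue(i)); // passed along unchanged
                    }
                }
                result.add(extra);
            }
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<osm>"
                + "<node id=\"1\" version=\"1\" lat=\"0\" lon=\"0\" mutated=\"true\"/>"
                + "<node id=\"2\" version=\"1\" lat=\"1\" lon=\"1\"/>"
                + "</osm>";
        System.out.println(readExtraData(xml)); // [{mutated=true}, null]
    }
}
```

Checking registered keys (the "if you want them to be checked properly" half) could be layered on top of this without changing how unknown keys flow through.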
Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline
On Wed, May 11, 2011 at 10:59:34PM +1000, Brett Henderson wrote:
> [...]
> Hmm, a somewhat rambling email :-) Any thoughts?

A flexible mechanism like this would be very interesting and useful, but I also see a lot of potential for confusion. Some tasks can handle certain extra data, some tasks can't. Some would only work if extra data is present, some would silently do wrong things when expected data is not present, etc.
Currently all tasks can be plugged together, and if you try to combine them in a way that doesn't work (for instance, a task reading change data instead of one reading plain data), Osmosis will complain.

So I think this needs to be a bit more formalized so Osmosis can still make those checks.

The Map could be connected to some registry for the strings (or the strings could instead be objects carrying some extra information). For each one you would have at least two options. If a task encounters an object of this type in the pipeline and doesn't understand it,
* it will just ignore it and optionally pass it on, or
* it has to complain.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298
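A rough sketch of the registry with the two per-key options described above, under the assumption that every extra-data key must be registered before use (so an unregistered key is itself an error). ExtraDataRegistry, Policy and check() are hypothetical names, not part of Osmosis, and a task is identified only by a plain string label. A task declares which keys it understands; for every other key the registered policy decides whether the data is passed on or the pipeline aborts.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ExtraDataRegistry {

    /** What a task must do with an extra-data key it does not understand. */
    public enum Policy { IGNORE_AND_PASS_ON, COMPLAIN }

    private final Map<String, Policy> policies = new HashMap<>();

    public void register(String key, Policy policy) {
        policies.put(key, policy);
    }

    /**
     * Checks an entity's extra data against the keys a task understands.
     * Returns the data unchanged if the task may continue; throws if any
     * non-understood key is unregistered or registered as COMPLAIN.
     */
    public Map<String, Object> check(String taskName, Set<String> understood,
                                     Map<String, Object> extra) {
        if (extra == null) {
            return null; // nothing attached, nothing to check
        }
        for (String key : extra.keySet()) {
            if (understood.contains(key)) {
                continue;
            }
            Policy policy = policies.get(key);
            if (policy == null) {
                throw new IllegalStateException(
                        taskName + ": unregistered extra data key '" + key + "'");
            }
            if (policy == Policy.COMPLAIN) {
                throw new IllegalStateException(
                        taskName + " does not understand extra data key '" + key + "'");
            }
            // IGNORE_AND_PASS_ON: leave the entry in place for downstream tasks.
        }
        return extra;
    }

    public static void main(String[] args) {
        ExtraDataRegistry registry = new ExtraDataRegistry();
        registry.register("mutated", Policy.IGNORE_AND_PASS_ON);
        registry.register("geometry", Policy.COMPLAIN);

        Map<String, Object> extra = new LinkedHashMap<>();
        extra.put("mutated", Boolean.TRUE);
        System.out.println(registry.check("example-task", Set.of(), extra)); // {mutated=true}

        extra.put("geometry", "LINESTRING(0 0, 1 1)");
        try {
            registry.check("example-task", Set.of(), extra);
        } catch (IllegalStateException e) {
            // prints: example-task does not understand extra data key 'geometry'
            System.out.println(e.getMessage());
        }
    }
}
```

This check runs per entity at processing time; catching incompatible task combinations before the pipeline starts would additionally need each task to declare the keys it requires and produces.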
Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline
Additional metadata will also enable much more flexible filters. Right now, if you want to find all nodes, ways and relations carrying a particular tag plus the primitives they reference, you have to set up three copies of the input stream, filter each separately and then merge the results. With additional metadata, you could easily chain the required filters into a single pipeline, passing information about which ways/nodes/relations have been selected via an additional flag.

In this case, the additional metadata would be used internally and would not be stored in the resulting OSM file. In the use cases you described, the OSM file receives the metadata. Note that for the PBF format, this will require extending the protocol definition.

- Bartosz
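A minimal sketch of the single-pipeline idea, assuming a simplified in-memory entity model rather than Osmosis's real classes (Entity, Type, markTagged and keepSelectedAndReferenced are all hypothetical). The first stage flags matching entities with an internal "selected" entry instead of dropping anything; the second stage keeps flagged entities plus the nodes referenced by flagged ways. Because nodes come before ways in an OSM stream, the second stage has to see the whole input before deciding which nodes to keep, so it works on a buffered list here. Relation members are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelectionFlagPipeline {

    enum Type { NODE, WAY }

    /** Minimal stand-in for an OSM entity with an optional extra-data map. */
    static class Entity {
        final Type type;
        final long id;
        final Map<String, String> tags;
        final List<Long> nodeRefs;   // empty for nodes
        Map<String, Object> extra;   // null until something is attached

        Entity(Type type, long id, Map<String, String> tags, List<Long> nodeRefs) {
            this.type = type;
            this.id = id;
            this.tags = tags;
            this.nodeRefs = nodeRefs;
        }

        boolean isSelected() {
            return extra != null && Boolean.TRUE.equals(extra.get("selected"));
        }
    }

    /** Stage 1: flag (rather than drop) entities carrying key=value. */
    static List<Entity> markTagged(List<Entity> in, String key, String value) {
        for (Entity e : in) {
            if (value.equals(e.tags.get(key))) {
                if (e.extra == null) {
                    e.extra = new HashMap<>();
                }
                e.extra.put("selected", Boolean.TRUE);
            }
        }
        return in;
    }

    /** Stage 2: keep flagged entities plus nodes referenced by flagged ways. */
    static List<Entity> keepSelectedAndReferenced(List<Entity> in) {
        Set<Long> neededNodes = new HashSet<>();
        for (Entity e : in) {
            if (e.type == Type.WAY && e.isSelected()) {
                neededNodes.addAll(e.nodeRefs);
            }
        }
        List<Entity> out = new ArrayList<>();
        for (Entity e : in) {
            boolean referenced = e.type == Type.NODE && neededNodes.contains(e.id);
            if (e.isSelected() || referenced) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entity> stream = List.of(
                new Entity(Type.NODE, 1, Map.of(), List.of()),
                new Entity(Type.NODE, 2, Map.of(), List.of()),
                new Entity(Type.NODE, 3, Map.of("amenity", "cafe"), List.of()),
                new Entity(Type.WAY, 10, Map.of("highway", "primary"), List.of(1L, 2L)),
                new Entity(Type.WAY, 11, Map.of("building", "yes"), List.of(3L)));
        for (Entity e : keepSelectedAndReferenced(markTagged(stream, "highway", "primary"))) {
            System.out.println(e.type + " " + e.id); // NODE 1, NODE 2, WAY 10
        }
    }
}
```

The "selected" entry lives only in memory; a writer at the end of the pipeline would simply not serialise it, which matches the internal-only use described above.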
[osmosis-dev] Proposal for Allowing Additional Data in Pipeline
Hi All,

This email is just musings at this point. I'm not sure if I'll be able to implement anything anytime soon, but I'd be interested in people's thoughts on this.

Until now I've intentionally kept the core data classes in Osmosis as simple as possible to simplify maintenance and ensure consistency across all tasks. I've only added attributes that are required to support basic OSM data and have kept any extensions from creeping in.

However, it can be quite limiting when there is no way of passing additional data through the pipeline. Examples of additional data might be:
- A "mutated" flag of some kind, to mark when a particular entity has been changed and shouldn't be uploaded to the main API. An example is when ways are clipped at bounding box boundaries.
- A "visible" flag. I hesitate to include this one because Osmosis supports this via change streams, not optional visible attributes.
- Header information to be attached to the Bound element, such as replication timestamp information, source URLs, etc.
- Custom data exchanged between specialised tasks. For example, a polygon processing task might add full geometric information to a way.

To add some flexibility I'm thinking along the following lines:
- Add a new collection to entities that can optionally be populated with String/Object pairs. Conceptually similar to a Map, but possibly stored like the existing Tag objects in a simple Collection (currently implemented as an ArrayList) for efficiency.
- The collection may be null when no data is required, to minimise overhead in the common case. Consumers would need to explicitly check for null, which is a tad ugly but I think warranted here.
- Modify key tasks such as the XML tasks to support serialising these additional values as attributes on the entities themselves (eg. ... version=1 ... mutated="true" />). Alternatively, represent them as sub-elements (eg. a metatag stored as ... v="myvalue">). The object would simply have its toString method called to get a string representation. Reading from XML would result in a String object.
- Tasks not caring about the data would simply pass the objects on without modification.
- Some Sink tasks, such as the PostgreSQL database tasks, would ignore the additional data.
- Some tasks such as --bounding-box could add a flag such as "mutated".
- Rename the existing Bound entity to something more generic like Header to allow more file attributes to be persisted.

I think this approach would allow additional data to be attached to entities in a generic fashion without Osmosis itself having to add special support for it. It would keep the pipeline generic but allow specialised tasks to exchange their own custom data. I think representing the value part of the data as an Object rather than a String makes more sense, because it allows custom tasks to exchange complete objects instead of forcing serialisation to and from String.

The additional data could in theory be represented as Tags without changing the pipeline at all, but it gets messy mixing real data with metadata.

I'm not sure if it makes sense to add support for this to the Bound object, or to simply allow Tag objects to be added instead. Perhaps tags make more sense here? The whole Bound concept has always fitted awkwardly in Osmosis, so I'm not sure how to tackle this one.

Hmm, a somewhat rambling email :-) Any thoughts?

Cheers
Brett
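A minimal sketch of the proposed entity extension, assuming a stripped-down Node class rather than Osmosis's real entity classes; ExtraDatum, addExtraData and toXml are hypothetical names. The extra data is a lazily created ArrayList of key/value pairs (null when unused, so consumers must check for null), and the writer turns each value into an XML attribute through toString(), escaping it first. Keys are assumed to already be valid XML attribute names.

```java
import java.util.ArrayList;
import java.util.Collection;

public class ExtraDataSketch {

    /** One String/Object pair, analogous to how a Tag holds a key and a value. */
    static final class ExtraDatum {
        final String key;
        final Object value;

        ExtraDatum(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Stripped-down node with an optional extra-data collection. */
    static final class Node {
        final long id;
        final int version;
        Collection<ExtraDatum> extraData; // null in the common case

        Node(long id, int version) {
            this.id = id;
            this.version = version;
        }

        void addExtraData(String key, Object value) {
            if (extraData == null) {
                extraData = new ArrayList<>(); // allocate only when first needed
            }
            extraData.add(new ExtraDatum(key, value));
        }
    }

    /** Escapes characters that may not appear raw in a double-quoted XML attribute. */
    static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Writes the node, appending each extra datum as an attribute via toString(). */
    static String toXml(Node node) {
        StringBuilder sb = new StringBuilder();
        sb.append("<node id=\"").append(node.id)
          .append("\" version=\"").append(node.version).append('"');
        if (node.extraData != null) { // explicit null check, as the proposal notes
            for (ExtraDatum d : node.extraData) {
                // String.valueOf calls toString() on non-null values
                sb.append(' ').append(d.key).append("=\"")
                  .append(escapeAttribute(String.valueOf(d.value))).append('"');
            }
        }
        return sb.append("/>").toString();
    }

    public static void main(String[] args) {
        Node plain = new Node(1, 1);
        Node clipped = new Node(2, 3);
        clipped.addExtraData("mutated", Boolean.TRUE); // e.g. set by a clipping task
        System.out.println(toXml(plain));   // <node id="1" version="1"/>
        System.out.println(toXml(clipped)); // <node id="2" version="3" mutated="true"/>
    }
}
```

Reading such a file back would yield the value as the String "true" rather than a Boolean, which is the round-trip limitation the proposal already accepts.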