Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline

2011-06-12 Thread Brett Henderson
On Wed, Jun 8, 2011 at 6:33 PM, Jochen Topf wrote:

> > Hmm, a somewhat rambling email :-)  Any thoughts?
>
> A flexible mechanism like this would be very interesting and useful, but I
> also see a lot of potential for confusion. Some tasks can handle certain
> extra data, some tasks can't. Some would only work if extra data is present,
> some would silently do wrong things when expected data is not present, etc.
> Currently all tasks can be plugged together and if you are trying to combine
> them in a way that doesn't work (for instance a task reading change data
> instead of one reading plain data), Osmosis will complain.
>
> So I think this needs to be a bit more formalized so Osmosis can still make
> those checks.
>
> The Map could be connected to some registry for the strings
> (or the strings would be objects instead with some extra information.) For
> each one you would have at least the two options:
> If a task encounters an object of this type in the pipeline and doesn't
> understand it,
> * it will just ignore it and optionally pass it on
> * it has to complain.
>

A central registry makes sense for the reasons you say.  I'd prefer to make
registration optional though.  In other words, it would be nice to be able
to register attributes if you want them to be checked properly, but for
unrecognised attributes to be ignored and passed along unchanged.  I imagine
it would be useful to be able to pass custom attributes for custom plugins,
or even to allow special attributes in a custom XML file to be passed
through the pipeline.
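
To make the optional-registration idea concrete, here is a minimal sketch.
Nothing in it exists in Osmosis: AttributeRegistry, UnknownPolicy and the
check method are hypothetical names, and the rule that unregistered keys
default to being passed along is just one reading of the preference stated
above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical: what a task must do with an attribute it does not understand. */
enum UnknownPolicy { PASS_THROUGH, COMPLAIN }

/** Hypothetical registry; not part of the Osmosis code base. */
class AttributeRegistry {
    private final Map<String, UnknownPolicy> policies = new HashMap<>();

    void register(String key, UnknownPolicy policy) {
        policies.put(key, policy);
    }

    /** Unregistered keys fall back to PASS_THROUGH, so registration stays optional. */
    UnknownPolicy policyFor(String key) {
        return policies.getOrDefault(key, UnknownPolicy.PASS_THROUGH);
    }

    /** A task calls this per key it sees; it throws only for keys that demand it. */
    void check(String taskName, Set<String> understoodKeys, String key) {
        if (!understoodKeys.contains(key) && policyFor(key) == UnknownPolicy.COMPLAIN) {
            throw new IllegalStateException(
                    taskName + " cannot handle attribute '" + key + "'");
        }
    }
}

class AttributeRegistryDemo {
    public static void main(String[] args) {
        AttributeRegistry registry = new AttributeRegistry();
        registry.register("mutated", UnknownPolicy.COMPLAIN);

        Set<String> none = Collections.emptySet();
        // An unregistered custom attribute is tolerated and can be passed along.
        registry.check("some-task", none, "x-custom");
        try {
            // A registered attribute with COMPLAIN stops a task that ignores it.
            registry.check("some-task", none, "mutated");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A real check would presumably run when the pipeline is assembled rather than
per entity, but that depends on how tasks would declare the attributes they
understand.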

It's largely academic at this point though, I don't think I'm going to get
it done :-)

Brett
___
osmosis-dev mailing list
osmosis-dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/osmosis-dev


Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline

2011-06-08 Thread Jochen Topf
On Wed, May 11, 2011 at 10:59:34PM +1000, Brett Henderson wrote:
> This email is just musings at this point.  I'm not sure if I'll be able to
> implement anything anytime soon, but I'd be interested in people's thoughts
> on this.
> 
> Until now I've intentionally kept the core data classes in Osmosis as simple
> as possible to simplify maintenance and ensure consistency across all
> tasks.  I've only added attributes that are required to support basic OSM
> data and avoided any extensions from creeping in.
> 
> However it can be quite limiting when there is no way of passing additional
> data through the pipeline.  Examples of additional data might be:
> 
>    - A "mutated" flag of some kind to flag when a particular entity has been
>    changed and shouldn't be uploaded to the main API.  An example is when ways
>    are clipped at bounding box boundaries.
>    - A "visible" flag.  I hesitate to include this one because Osmosis
>    supports this via change streams, not optional visible attributes.
>    - Header information to be attached to the Bound element such as
>    replication timestamp information, source URLs, etc.
>    - Custom data exchanged between specialised tasks.  For example, a
>    polygon processing task might add full geometric information to a way.
> 
> To add some flexibility I'm thinking along the following lines:
> 
>    - Add a new collection to entities that can be optionally populated with
>    String/Object pairs.  Conceptually similar to a Map but
>    possibly stored like existing Tag objects in a simple Collection (currently
>    implemented as an ArrayList) for efficiency.
>    - The collection may be null when no data is required to minimise
>    overhead in the common case.  Consumers would need to explicitly check for
>    null which is a tad ugly but I think warranted here.
>    - Modify key tasks such as XML tasks to support serialising these
>    additional values as attributes on the entities themselves (eg. version=1 ... mutated="true" /> ).  Alternatively represent them as
>    sub-elements (eg. metatag stored as v="myvalue">) .  The object would simply have the toString method
>    called on it to get a string representation.  Reading from XML would result
>    in a String object.
>    - Tasks not caring about the data would simply pass the objects on
>    without modification.
>    - Some Sink tasks such as PostgreSQL database tasks would ignore the
>    additional data.
>    - Some tasks such as --bounding-box could add a flag such as "mutated".
>    - Rename the existing Bound entity to something more generic like Header
>    to allow more file attributes to be persisted.
> 
> I think this approach would allow additional data to be attached to entities
> in a generic fashion without Osmosis itself having to add special support
> for it.  It would keep the pipeline generic but allow specialised tasks to
> exchange their own custom data.  I think representing the value part of data
> as an Object rather than String makes more sense because it allows custom
> tasks to exchange complete objects instead of forcing serialisation to and
> from String.
> 
> The additional data could in theory be represented as Tags without changing
> the pipeline at all, but it gets messy mixing real data with metadata.
> 
> I'm not sure if it makes sense to add support for this to the Bound object,
> or to simply allow Tag objects to be added instead.  Perhaps tags make more
> sense here?  The whole Bound concept has always fitted awkwardly in Osmosis,
> so I'm not sure how to tackle this one.
> 
> Hmm, a somewhat rambling email :-)  Any thoughts?

A flexible mechanism like this would be very interesting and useful, but I also
see a lot of potential for confusion. Some tasks can handle certain extra data,
some tasks can't. Some would only work if extra data is present, some would
silently do wrong things when expected data is not present, etc. Currently
all tasks can be plugged together and if you are trying to combine them in a
way that doesn't work (for instance a task reading change data instead of one
reading plain data), Osmosis will complain.

So I think this needs to be a bit more formalized so Osmosis can still make
those checks.

The Map could be connected to some registry for the strings
(or the strings would be objects instead with some extra information.) For
each one you would have at least the two options:
If a task encounters an object of this type in the pipeline and doesn't
understand it,
* it will just ignore it and optionally pass it on
* it has to complain.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298




Re: [osmosis-dev] Proposal for Allowing Additional Data in Pipeline

2011-05-11 Thread Bartosz Fabianowski
Additional meta data will also enable much more flexible filters. Right 
now, if you want to find all nodes, ways and relations carrying a 
particular tag plus the primitives referenced by these, you have to set 
up three copies of the input stream, filtering each separately and then 
merging the results. With additional meta data, you could easily chain 
the required filters into a single pipeline, passing information about 
which ways/nodes/relations have been selected via an additional flag. In 
this case, the additional meta data would be used internally and would not 
get stored in the resulting OSM file.
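
The chaining idea can be sketched roughly as follows. This is only an
illustration: Prim, SelectionChain and the "selected" key are made-up names,
the data sits in an in-memory list rather than a real Osmosis stream, and
only the way/node case is shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified primitive; not an Osmosis class. */
class Prim {
    final String type;                                 // "node" or "way"
    final long id;
    final Map<String, String> tags = new HashMap<>();
    final List<Long> nodeRefs = new ArrayList<>();     // only used by ways
    final Map<String, Object> meta = new HashMap<>();  // the additional meta data

    Prim(String type, long id) {
        this.type = type;
        this.id = id;
    }
}

class SelectionChain {
    static final String SELECTED = "selected";         // assumed flag name

    /** Stage 1: flag every primitive carrying the given tag key. */
    static void markTagged(List<Prim> data, String tagKey) {
        for (Prim p : data) {
            if (p.tags.containsKey(tagKey)) {
                p.meta.put(SELECTED, Boolean.TRUE);
            }
        }
    }

    /** Stage 2: flag the nodes referenced by ways flagged in stage 1. */
    static void markReferencedNodes(List<Prim> data) {
        Set<Long> wanted = new HashSet<>();
        for (Prim p : data) {
            if (p.type.equals("way") && p.meta.containsKey(SELECTED)) {
                wanted.addAll(p.nodeRefs);
            }
        }
        for (Prim p : data) {
            if (p.type.equals("node") && wanted.contains(p.id)) {
                p.meta.put(SELECTED, Boolean.TRUE);
            }
        }
    }

    /** Stage 3: keep flagged primitives and drop the flag so it is never written. */
    static List<Prim> keepSelected(List<Prim> data) {
        List<Prim> out = new ArrayList<>();
        for (Prim p : data) {
            if (p.meta.remove(SELECTED) != null) {
                out.add(p);
            }
        }
        return out;
    }
}
```

In a real stream nodes arrive before the ways that reference them, so stage 2
would need buffering or a second pass. The sketch sidesteps that by holding
everything in a list.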


In the use cases you described, the OSM file receives the meta data. 
Note that for the PBF format, this will require extending the protocol 
definition.


- Bartosz



[osmosis-dev] Proposal for Allowing Additional Data in Pipeline

2011-05-11 Thread Brett Henderson
Hi All,

This email is just musings at this point.  I'm not sure if I'll be able to
implement anything anytime soon, but I'd be interested in people's thoughts
on this.

Until now I've intentionally kept the core data classes in Osmosis as simple
as possible to simplify maintenance and ensure consistency across all
tasks.  I've only added attributes that are required to support basic OSM
data and avoided any extensions from creeping in.

However it can be quite limiting when there is no way of passing additional
data through the pipeline.  Examples of additional data might be:

   - A "mutated" flag of some kind to flag when a particular entity has been
   changed and shouldn't be uploaded to the main API.  An example is when ways
   are clipped at bounding box boundaries.
   - A "visible" flag.  I hesitate to include this one because Osmosis
   supports this via change streams, not optional visible attributes.
   - Header information to be attached to the Bound element such as
   replication timestamp information, source URLs, etc.
   - Custom data exchanged between specialised tasks.  For example, a
   polygon processing task might add full geometric information to a way.

To add some flexibility I'm thinking along the following lines:

   - Add a new collection to entities that can be optionally populated with
   String/Object pairs.  Conceptually similar to a Map but
   possibly stored like existing Tag objects in a simple Collection (currently
   implemented as an ArrayList) for efficiency.
   - The collection may be null when no data is required to minimise
   overhead in the common case.  Consumers would need to explicitly check for
   null which is a tad ugly but I think warranted here.
   - Modify key tasks such as XML tasks to support serialising these
   additional values as attributes on the entities themselves (eg. version=1
   ... mutated="true" />).  Alternatively represent them as sub-elements (eg.
   metatag stored as v="myvalue">).  The object would simply have the toString method
   called on it to get a string representation.  Reading from XML would result
   in a String object.
   - Tasks not caring about the data would simply pass the objects on
   without modification.
   - Some Sink tasks such as PostgreSQL database tasks would ignore the
   additional data.
   - Some tasks such as --bounding-box could add a flag such as "mutated".
   - Rename the existing Bound entity to something more generic like Header
   to allow more file attributes to be persisted.
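
As a rough illustration of the first two points in the list above, here is a
sketch of an entity with an optional String/Object collection that stays null
until something is attached. SketchEntity and MetaTag are hypothetical names
and do not mirror the real Osmosis Entity or Tag classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical String/Object pair, loosely modelled on a key/value tag. */
final class MetaTag {
    final String key;
    final Object value;

    MetaTag(String key, Object value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical stand-in for an entity; not the real Osmosis Entity class. */
class SketchEntity {
    private final long id;
    // Stays null until the first value is attached, so the common case
    // carries no extra collection.
    private List<MetaTag> metaTags;

    SketchEntity(long id) {
        this.id = id;
    }

    long getId() {
        return id;
    }

    /** May return null; consumers have to check explicitly. */
    List<MetaTag> getMetaTags() {
        return metaTags;
    }

    /** Adds a pair, replacing any existing pair with the same key. */
    void putMeta(String key, Object value) {
        if (metaTags == null) {
            metaTags = new ArrayList<>();
        }
        for (int i = 0; i < metaTags.size(); i++) {
            if (metaTags.get(i).key.equals(key)) {
                metaTags.set(i, new MetaTag(key, value));
                return;
            }
        }
        metaTags.add(new MetaTag(key, value));
    }

    /** Linear scan, which is fine for the handful of pairs expected here. */
    Object getMeta(String key) {
        if (metaTags == null) {
            return null;
        }
        for (MetaTag tag : metaTags) {
            if (tag.key.equals(key)) {
                return tag.value;
            }
        }
        return null;
    }
}

class SketchEntityDemo {
    public static void main(String[] args) {
        SketchEntity way = new SketchEntity(42L);
        System.out.println(way.getMetaTags() == null);
        // A task such as --bounding-box could flag a clipped way like this.
        way.putMeta("mutated", Boolean.TRUE);
        System.out.println(way.getMeta("mutated"));
    }
}
```

A list of pairs is used instead of a HashMap to stay close to the "stored like
existing Tag objects" idea. Whether that is actually cheaper would need
measuring.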

I think this approach would allow additional data to be attached to entities
in a generic fashion without Osmosis itself having to add special support
for it.  It would keep the pipeline generic but allow specialised tasks to
exchange their own custom data.  I think representing the value part of data
as an Object rather than String makes more sense because it allows custom
tasks to exchange complete objects instead of forcing serialisation to and
from String.
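
The Object-versus-String point has a consequence for XML that a small sketch
can show: values go out via toString and come back as plain Strings.
MetaXmlSketch and its methods are hypothetical, the <way> element is only an
example, and for brevity every attribute on the element is treated as meta
data, which a real reader would not do.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper; not how the Osmosis XML tasks are implemented. */
class MetaXmlSketch {

    /** Renders each value as an XML attribute by calling toString on it. */
    static String toAttributes(Map<String, Object> meta) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : meta.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(escape(String.valueOf(e.getValue()))).append('"');
        }
        return sb.toString();
    }

    /** Minimal attribute escaping; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Reads the attributes of the root element back; every value is a String. */
    static Map<String, Object> fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, Object> meta = new LinkedHashMap<>();
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                meta.put(attr.getNodeName(), attr.getNodeValue());
            }
            return meta;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sketch XML", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("mutated", Boolean.TRUE);          // a Boolean inside the pipeline
        String xml = "<way" + toAttributes(meta) + "/>";
        Object back = fromXml(xml).get("mutated");  // a String after the round trip
        System.out.println(xml + " -> " + back.getClass().getSimpleName());
    }
}
```

So a task that expects a Boolean or a geometry object would have to cope with
a String whenever the data has passed through a file.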

The additional data could in theory be represented as Tags without changing
the pipeline at all, but it gets messy mixing real data with metadata.

I'm not sure if it makes sense to add support for this to the Bound object,
or to simply allow Tag objects to be added instead.  Perhaps tags make more
sense here?  The whole Bound concept has always fitted awkwardly in Osmosis,
so I'm not sure how to tackle this one.

Hmm, a somewhat rambling email :-)  Any thoughts?

Cheers
Brett