Re: [OSM-dev] Server-side data validation

Peter Wendorff Fri, 13 Jul 2012 13:38:59 -0700

On 13.07.2012 20:54, Paweł Paprota wrote:

Hi Peter,


Thanks for the response.

1) The OSM API is a RESTful API that allows "live" editing: the editor
software a) opens a changeset, b) creates a node, c) adds a tag - same
for ways.
Between b and c there's an untagged osm element in the database (even if
it's in most cases a very short time).

I think that is a rather orthogonal issue to validation, meaning that
some validation should probably be launched when a changeset closes for
example - true - but more important is the fact that even with the API
calls that you described it is not possible to _end up_ with broken
data.

Of course it is.

Imagine the user's internet connection breaks after (b) and the
changeset gets closed later by a server-side timeout.
The database then has that empty node, and other users are probably
already using it (by downloading and editing).
It's not possible to invalidate that node completely afterwards, because
there may be conflicts if you try.
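
A minimal sketch of that failure mode, assuming a toy in-memory store
rather than the real OSM API (the class and method names below are
hypothetical and only mirror steps a, b and c from above):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


class ToyDatabase:
    """Hypothetical in-memory stand-in for the server, not the real OSM API."""

    def __init__(self):
        self.nodes = {}
        self.open_changesets = set()
        self._next_changeset = 1
        self._next_node = 1

    def open_changeset(self):  # step (a)
        cs = self._next_changeset
        self._next_changeset += 1
        self.open_changesets.add(cs)
        return cs

    def create_node(self, cs, lat, lon):  # step (b)
        if cs not in self.open_changesets:
            raise ValueError("changeset is not open")
        node = Node(self._next_node, lat, lon)
        self._next_node += 1
        # The node is stored, and visible to others, immediately.
        self.nodes[node.id] = node
        return node.id

    def add_tag(self, cs, node_id, key, value):  # step (c)
        if cs not in self.open_changesets:
            raise ValueError("changeset is not open")
        self.nodes[node_id].tags[key] = value

    def timeout_changeset(self, cs):
        # Server-side timeout: the changeset closes, its edits stay.
        self.open_changesets.discard(cs)


def untagged_nodes(db):
    """Return the ids of all nodes that carry no tags."""
    return [n.id for n in db.nodes.values() if not n.tags]


db = ToyDatabase()
cs = db.open_changeset()
node_id = db.create_node(cs, 52.0, 13.0)
# The connection drops here, so step (c) never happens.
db.timeout_changeset(cs)
leftover = untagged_nodes(db)
```

In this toy model `leftover` still contains the node created in step
(b) after the changeset has timed out.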

  So for now I'm trying to discuss this at a more abstract level -
that the contract would be "we can't have X in the database" but how it
is implemented (at changeset close maybe?) - I cannot say (yet) as I am
no expert in OSM. For now more important is whether this kind of
thinking even makes sense for you.

The idea makes sense IMHO, but I don't have an idea how to intelligently
handle these checks without big changes to the API style.

[...]

3) the free tagging scheme would allow similar stuff for nodes, too
(while I don't know any issue where that's used currently). A
theoretical example would be a set of nodes, which are defined points
inside a fuzzy area/region and others which are defined points outside
(where there's no concrete, hard boundary defined, e.g. for "the Alps").

I understand the benefits of "free tagging" approach. On the other hand
it is kind of strange that even for "core" keys (e.g. "highway" or
"surface") there is no validation/schema/whatever one calls it.

It's the big question what "core keys" are - and what they are allowed
to contain or not.

What do you want to check for the highway key, for example?

The most prominent values are easy - but there are tons of other values,
too, that are less easy to "validate", and as long as
highway=emergency_access_point, highway=give_way and similar are
"allowed", how do you reject invalid highway tags?

In this case, what is more efficient:

1. Adding one more possible value for "highway" when it is needed and
deploying such a change to production.
2. Constantly cleaning up the database when there are inconsistent
entries (typos etc). In fact I think there is no such process as global
cleanup - there are a couple of bots that do so here and there but overall
the data can be inconsistent.

Sure.

But if you decide for the first variant, Mary Mapper cannot decide to
add a useful highway=electrocycleway for the increasing number of
e-bikes, because the server would reject that tag.
The rails coders have to decide to change that before anyone can add
that tag.
Sure: we can assume that a new value for highway will appear only
seldom - but is that helpful?
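
To make the trade-off concrete, here is a small sketch of such a
server-side allow-list. The function name and the (deliberately
incomplete) value set are assumptions for illustration only, not
anything that exists in the rails code:

```python
# Illustrative and incomplete; the real set of highway values in use
# is far larger.
ALLOWED_HIGHWAY_VALUES = {
    "motorway",
    "primary",
    "residential",
    "emergency_access_point",
    "give_way",
}


def validate_highway(tags, allowed=ALLOWED_HIGHWAY_VALUES):
    """Accept tags with no highway key or with an allow-listed value."""
    value = tags.get("highway")
    return value is None or value in allowed


# A typo is rejected, which is the intended benefit.
typo_ok = validate_highway({"highway": "residental"})

# A sensible new value is rejected as well, until the list is changed.
new_value_ok = validate_highway({"highway": "electrocycleway"})

# Only after the allow-list has been extended and deployed is it accepted.
extended = ALLOWED_HIGHWAY_VALUES | {"electrocycleway"}
after_deploy_ok = validate_highway({"highway": "electrocycleway"}, extended)
```

The validator cannot tell the typo from the new value: both fail until
somebody extends the list.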

I think, again, multipolygons are some kind of special case, but from
another point of view: the multipolygon is in fact currently the
fourth basic datatype of OSM.

Pushing this validation to the server side has several drawbacks:
- usually server load is the bottleneck in osm, not client load.

I understand infrastructure constraints but I think (very-)long-term
pushing stuff to the client-side will cause much more trouble than
dealing with load issues but having consistent database and business
logic (validation) in place.

True, as long as you really can restrict the tagging. As mentioned: that
brings other problems, as it kills the free tagging even in corner cases,
where that's the big benefit of OSM.

- a check on server side would fix the corresponding tagging scheme and
probably make other tagging schemes invalid, a contradiction to the free
tagging scheme we have.
- the API would have to change to use transaction-like semantics, which
is again higher server load, but the only way to make sure not to create
this invalid stuff.
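
One way to picture the transaction-like semantics from the last point
is the sketch below: edits are only staged, and reach the database
when the changeset closes and every staged element passes validation.
All names are hypothetical, and the rule "a node must have at least one
tag" is just an example check, not an actual OSM requirement:

```python
class StagedChangeset:
    """Hypothetical changeset that buffers edits until close()."""

    def __init__(self, db):
        self.db = db      # maps node id -> tags dict
        self.staged = {}  # edits not yet visible to anyone else

    def create_node(self, node_id):
        self.staged[node_id] = {}

    def add_tag(self, node_id, key, value):
        self.staged[node_id][key] = value

    def close(self, validate):
        """Commit all staged nodes, or none if any fails validation."""
        failed = [nid for nid, tags in self.staged.items() if not validate(tags)]
        if failed:
            self.staged.clear()  # roll back: nothing reaches the database
            return False
        self.db.update(self.staged)
        self.staged.clear()
        return True


def has_tags(tags):
    # Example rule only: reject nodes without any tag.
    return bool(tags)


db = {}

# Interrupted edit: the node was created but never tagged.
cs = StagedChangeset(db)
cs.create_node(1)
incomplete_committed = cs.close(has_tags)

# Complete edit: the node is created and tagged before the close.
cs = StagedChangeset(db)
cs.create_node(2)
cs.add_tag(2, "highway", "give_way")
complete_committed = cs.close(has_tags)
```

The price is that the server has to hold the staged edits and run the
checks on every close, which is the extra load mentioned above.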

For now it is just a thought exercise and discussion but if I could
propose some changes and perhaps implement some proof of concept, would
it be taken seriously? You can say that "open source is about working
not talking" and I should rather do something instead of discussing but
as you can see these are pretty high level things that go against status
quo - that's why I want to make sure my time is well spent...

well...

I fear I'm not someone really big in coding (for OSM) yet; I looked
into the rails code, but don't know rails ;)

So probably I'm the wrong one to ask.

But as far as I can see, solutions are usually taken seriously -
if they follow the right path; and that right path might not be what
someone imagines - especially if it's not discussed before.


regards
Peter

_______________________________________________
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
