Re: [Imports] New module to merge-sort imports over time (osmfetch python)

Jaak Laineste Fri, 26 Aug 2011 02:57:03 -0700

On 26.08.2011, at 8:34, Bryce Nesbitt wrote:

>> On Thu, 25 Aug 2011, Bryce Nesbitt wrote: 
>> 
>>> I disagree here.  If the external source is "ground truth", then that's the 
>>> data that should take precedence.  A car share operator for example won't 
>>> want a disused location shown, and may well make that a requirement of 
>>> permitting the merge-sort.  An import from a car share reservation system 
>>> is definitive ground truth. 
>> 
>> A car share operator, or anyone else can go in and mark a location as closed 
>> or delete it but the operator isn't automatically considered more 
>> authoritative than any other mapper.  For example let us say a fast-food 
>> chain imports a database of their restaurant locations and marks some of 
>> them as wheelchair=yes but I go visit the location and feel it doesn't 
>> qualify for that tag. The operator's database can't claim ground truth or 
>> authority over someone who has actually visited that location and an 
>> automated script shouldn't 'undo' my change every month.
> 
> That's the cool thing about the proposed approach.  Who is the authority for 
> each tag is scriptable.  wheelchair=yes can (and would) be mastered in osm. 
> Exact lat/lon would always be mastered in osm.  Heck the script could even 
> set an OpenStreetBug if it cared to resolve a minor-tag discrepancy 
> ("Chain-store says toilets=permissive, local mapper says toilets=no, who is 
> right?").  But in general I'd leave all those tags to humans.
> 
> But if the fast-food chain claims a store is closed, well... I'd go with that 
> as first cut.
> Similarly if fast-food chain says a store is now open.
> If the car-share reservation system says there is now a "Prius" and a 
> "Batmobile" for hire, I'd go with that over older community data that is 
> likely stale.
> 
> The car share data produced by the community process was highly spotty.  The 
> reservation system data is complete.  But you can have it both ways: osm 
> contributors can add all sorts of tags (description, photos, etc.) and the 
> merge process will keep the best of both on the same node, with full history.
> 
> The automated tool in question already shows the human operator the diff: so 
> a human is still in control.  Perhaps it could be extended to detect and flag 
> any potential edit wars (e.g. same tag 'corrected' twice)?  Would that 
> satisfy the objection?
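
The per-tag authority idea quoted above could be sketched roughly as follows.
This is only an illustration, not osmfetch's actual code: the function names,
the EXTERNAL_MASTERED table and the choice of which tags the external source
masters are all assumptions. Tags mastered externally are overwritten, all
other tags keep the OSM value, and disagreements are returned for a human to
review. The second helper flags a possible edit war when the script is about
to push a value it has already pushed once before.

```python
# Hypothetical table of tags mastered by the external source (assumption).
# Everything else, e.g. wheelchair=*, stays mastered in OSM.
EXTERNAL_MASTERED = {"name", "operator", "disused"}


def merge_tags(osm_tags, external_tags, external_mastered=EXTERNAL_MASTERED):
    """Return (merged_tags, conflicts) for one matched object."""
    merged = dict(osm_tags)
    conflicts = []
    for key, ext_value in external_tags.items():
        if key in external_mastered:
            merged[key] = ext_value          # external source wins
        elif key not in osm_tags:
            merged[key] = ext_value          # new information, nothing overridden
        elif osm_tags[key] != ext_value:
            # OSM-mastered tag disagrees: keep OSM value, report for review
            conflicts.append((key, osm_tags[key], ext_value))
    return merged, conflicts


def is_edit_war(previous_pushes, key, new_value, current_osm_value):
    """True if the script already pushed key=new_value earlier and OSM
    currently holds a different value, i.e. someone changed it back."""
    return current_osm_value != new_value and (key, new_value) in previous_pushes
```

With this split, the monthly run would never undo a mapper's wheelchair=no,
but it would still pick up a renamed or closed location.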


What is this "car share location" really? Some special spot, or co-located 
with other amenities (gas stations, bus stations, buildings, etc.)? If the 
objects are really autonomous nodes, connected to nothing, then your solution 
could work well. I can imagine a similar situation with other very specific 
datasets: say elevation info (DEM), or half-virtual objects like geocaches. 

If the spots are shared, then you have to merge them with existing (possibly 
conflicting) tags and locations, link points with ways, etc. How exactly would 
that work with your script? 

I cannot resist proposing the OpenMetaMap (OMM) solution for your case as well:

a) If the spots are autonomous:
1. The car share operator publishes their data as an OSM file. They need to do 
that for an import anyway.
2. They register its URL in the OMM data directory.
3. Users (like the main OSM Mapnik renderer) find the latest data from there 
and add it as a data layer. I guess this is your main reason to import it.

No data duplication, sync etc needed. 
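
Step 1 (publishing the operator's data as an OSM file) might look roughly like
this. It is a minimal sketch: the input record layout and the function name
are assumptions, and the negative node ids follow the usual convention for
objects that do not exist in the OSM database.

```python
import xml.etree.ElementTree as ET


def locations_to_osm_xml(locations):
    """Serialize records with 'lat', 'lon' and 'tags' into an OSM XML string.

    The record layout is an assumption made for this example.
    """
    root = ET.Element("osm", version="0.6", generator="carshare-export-sketch")
    for index, loc in enumerate(locations, start=1):
        # Negative ids mark objects that are not (yet) in the OSM database.
        node = ET.SubElement(
            root, "node",
            id=str(-index), lat=str(loc["lat"]), lon=str(loc["lon"]),
        )
        for key, value in sorted(loc["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(locations_to_osm_xml([
        {"lat": 37.8, "lon": -122.27,
         "tags": {"amenity": "car_sharing", "name": "Pod 1"}},
    ]))
```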

b) If the dataset is linked to existing OSM objects, it would be more 
complicated:
1. and 2. Same as above.
3. You open JOSM and download both datasets (OSM and yours).
4. You merge the data: select 2 points, click "Edit > Merge points". Resolve 
tag conflicts if found, check the location. This will create OMM Links for 
you. You do exactly the same amount of work as with manual merging anyway.
5. Save the data: the OSM object is updated only if you moved the point or 
changed tags. Mostly you save links to OMM.
6. Users take data from OSM, your database and OMM Links, and merge them on 
the fly. Broken links are not rendered, just like with your import/sync 
script.
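
The on-the-fly merge in step 6 could be sketched like this. OMM does not exist
yet, so the link format (plain pairs of OSM id and external id), the function
name and the rule that OSM tags win on conflict are all assumptions made for
the example.

```python
def merge_on_the_fly(osm_objects, external_objects, links):
    """Join OSM and external objects through links at read time.

    osm_objects, external_objects: dict mapping id -> tag dict
    links: iterable of (osm_id, external_id) pairs (hypothetical OMM Links)
    """
    merged = []
    for osm_id, ext_id in links:
        if osm_id not in osm_objects or ext_id not in external_objects:
            continue  # broken link: one side is gone, so skip it
        tags = dict(external_objects[ext_id])
        tags.update(osm_objects[osm_id])  # assumption: OSM tags win on conflict
        merged.append({"osm_id": osm_id, "ext_id": ext_id, "tags": tags})
    return merged


if __name__ == "__main__":
    osm = {101: {"amenity": "car_sharing", "wheelchair": "no"}}
    ext = {"pod-7": {"name": "Pod 7", "wheelchair": "yes"},
           "pod-8": {"name": "Pod 8"}}
    links = [(101, "pod-7"), (999, "pod-8")]  # second link is broken
    print(merge_on_the_fly(osm, ext, links))
```

Nothing is written back to either dataset; if the operator removes "pod-7",
the link simply stops resolving and the object drops out of the merged view.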


Advantages:
 - Maintenance-free: the data provider does not need to re-run sync scripts. 
Data goes stale not only because there are no manual edits, but also because 
sooner or later you stop running the script, and then the foreign key tags in 
OSM become outdated. 
 - The principal difference is that you give full control over the data links 
to the OSM community. If you are not there to update them, they will be, at 
least if the data is really relevant for the community.

Disadvantages / limitations:
- Much more tooling/code is needed than one Python script. It cannot be done 
today.
- One more decision for the OSMer to make: should my contribution be added to 
OSM, or kept in the external dataset? There should be best-practice guidelines 
for this, or maybe the tools could make the decision for the user. For the car 
sharing operator it does not really matter: they would get the contribution 
either way, from OMM or from the OSM database (with the help of OMM).
- The external API will become unavailable at some point. Maybe it takes 
years, but it will happen. For this, OMM should have a "persistent cache" 
(archive) option, if the data source allows it.
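
A "persistent cache" could be as simple as the sketch below: every successful
fetch refreshes a local archive, and the archive is served when the source no
longer answers. The function name, the JSON archive file and the use of
OSError to signal an unreachable source are assumptions for illustration.

```python
import json
from pathlib import Path


def fetch_with_archive(fetch, archive_path):
    """Return (data, from_archive).

    fetch: zero-argument callable returning JSON-serializable data and
           raising OSError when the external source is unreachable.
    """
    archive = Path(archive_path)
    try:
        data = fetch()
    except OSError:
        if archive.exists():
            # Source is gone: fall back to the last archived copy.
            return json.loads(archive.read_text(encoding="utf-8")), True
        raise  # nothing archived yet, so the failure is passed on
    # Source answered: refresh the archive for later use.
    archive.write_text(json.dumps(data), encoding="utf-8")
    return data, False
```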

Jaak

_______________________________________________
Imports mailing list
Imports@openstreetmap.org
http://lists.openstreetmap.org/listinfo/imports
