Hi, as you probably know, the EU data protection rules compel us to be a bit less open in handing out personal data to everyone. Following LWG's analyses and recommendations, the OSMF has decided to implement restrictions on publishing user names and changeset IDs.
The general plan is to allow everyone "in OSM" (i.e. with an OSM account) to fully access all data as before (and have a policy that says you must only use the personal data for OSM purposes), while removing user names, user IDs, and changeset IDs from the publicly available data (i.e. what you can get without an OSM account). This requires changes to the API, which I've started to sketch here:

https://wiki.openstreetmap.org/wiki/GDPR/Affected_Services

but this message is about changes to the downloads on planet.openstreetmap.org.

Here's a three-phase plan for changing the way we run planet.openstreetmap.org, and I would like to hear feedback about its feasibility from users and from those familiar with running the site alike. I haven't run this by the sysadmins, so if there are any bloopers I hope they will be pointed out.

(I will put this up on https://wiki.openstreetmap.org/wiki/GDPR/Planet.osm_Migration and try to work in any results from the discussion here, but if you're more comfortable editing directly on the wiki, that's fine too.)

Cheers
Frederik

Phase 1 - Introduction of no-userdata files
-------------------------------------------

This does not require software development and could start immediately, but some scripting is required.

1a. set up a new domain for OSM-internal data downloads, e.g.
    "osm-internal.planet.openstreetmap.org", initially duplicating all data.

    Issue: name of domain?
    Issue: ironbelly disk usage is at 70%; possible to add space?

1b. modify the planetdump.erb in the planet chef cookbook to generate
    versions without user information of all the weekly dumps, in addition
    to the versions with user information; have the versions without user
    information stored in the old "planet.openstreetmap.org" tree, and the
    versions with user information in the new "osm-internal" tree.

    Issue: should files have the same names on the internal and public
    site, or should they be called "planet-with-userdata" and "planet" or
    something?
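To illustrate what "without user information" means in practice for the XML flavours: the attributes to scrub are user, uid, and changeset, while version and timestamp can stay. Here is a rough Python sketch of my own, purely for illustration — the real pipeline would of course run the files through osmium rather than hand-rolled XML handling:

```python
# Illustration only: drop the personal-data attributes (user, uid,
# changeset) from an OSM XML document, keeping version and timestamp.
import xml.etree.ElementTree as ET

PERSONAL_ATTRS = ("user", "uid", "changeset")

def strip_userdata(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    for elem in root.iter():          # root plus all nodes/ways/relations
        for attr in PERSONAL_ATTRS:
            elem.attrib.pop(attr, None)
    return ET.tostring(root, encoding="unicode")

SAMPLE = (
    '<osm version="0.6">'
    '<node id="1" lat="49.0" lon="8.39" version="2" '
    'timestamp="2018-04-01T00:00:00Z" user="alice" uid="42" changeset="7"/>'
    '</osm>'
)

print(strip_userdata(SAMPLE))
```

(Recent osmium releases can, I believe, do the equivalent wholesale on .osc and .pbf files via the add_metadata output-format option, keeping version and timestamp while dropping the rest.)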
1c. modify the replication.cron.erb as follows:

    * have osmosis write minutely replication files to the new "internal"
      tree
    * run a shell script after generating the replication files that will
      find the newly generated file, pipe it through osmium stripping user
      information, and write the result to the old "planet" tree, copying
      the state.txt files as needed
    * run the osmosis "merge-diff" tasks separately on both trees, OR run
      on the internal tree only and pipe the result through osmium as above
    * write changeset replication XMLs to the new "internal" tree only

    For step 1c, it might make sense to announce a maintenance window
    beforehand during which the changes will be made, so that consumers who
    rely on user data can stop their replication for a few hours and then
    make the switch.

1d. modify the planet.openstreetmap.org index pages to point to the
    internal page in case people wish to download stuff with user data;
    place a marker on the internal page that these files come with user
    data.

At the end of phase 1, we will have this situation:

* new changeset diffs only on the "internal" tree
* regular diffs come in two flavours, with and without user data
* planet dumps etc. also come in two flavours
* old files are unchanged
* consumers will automatically get the stuff without user data
* consumers who need user data will have to change their URLs

Phase 2 - Cleaning out old files that contain user data
-------------------------------------------------------

This can be done slowly in the background over the course of however long it takes:

2a. remove all changeset dumps and changeset diffs from the public tree.

2b. run all .osc, .osm.pbf, and .osm.bz2 files on the public tree through
    osmium, scrubbing user data (retaining file timestamps if possible) and
    re-creating .md5 files where necessary.

Phase 3 - Controlling access to files with user data
----------------------------------------------------

Once the parallel systems are up and running, we will want to:
3a. issue guidelines about what you are allowed to do with the user data
    files,

3b. ensure that everyone who has an OSM account agrees to these guidelines
    one way or the other,

3c. start requiring an OSM login for all downloads from the internal,
    "with userdata" tree.

One possible technical solution for 3c is https://github.com/geofabrik/sendfile_osm_oauth_protector which also comes with a guide for users on how to run it in a scripted setup.

-- 
Frederik Ramm  ##  eMail [email protected]  ##  N49°00'09" E008°23'33"

_______________________________________________
dev mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/dev
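One footnote on step 2b's "retain file timestamp if possible": the scrubbing wrapper can record the mtime before rewriting and restore it afterwards, so mirrors that compare timestamps don't needlessly re-fetch everything. A small illustrative Python sketch — the helper name and the transform callback are my own stand-ins; in reality this would wrap an osmium invocation, probably from shell:

```python
import os
import tempfile

def rewrite_preserving_mtime(path, transform):
    """Rewrite `path` through `transform`, then restore its timestamps."""
    st = os.stat(path)                           # remember atime/mtime
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(transform(data))                 # stand-in for osmium run
    os.utime(path, (st.st_atime, st.st_mtime))   # put timestamps back

# tiny demonstration on a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"user data here")
path = tmp.name
os.utime(path, (1500000000, 1500000000))         # pretend it's an old dump
rewrite_preserving_mtime(path, lambda b: b.replace(b"user data", b"scrubbed"))
```

The .md5 files mentioned in 2b would still need re-creating afterwards, since the content (and hence the checksum) does change.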

