Re: [Talk-GB] TfL cycle data published - proposed conflation process
> As for the proposal, I agree that a slow and steady approach is required.
> Although I do think we should set a target date. A date by which we are
> happy to start the conflation or have agreed that it is not viable. Would be
> a shame to see it just drag out. Happy to help as much as I can to ensure
> this.

Yes, good point. With such a large dataset, setting a timescale to ensure that momentum can be maintained would be sensible. Clearly it would depend on resources/time/willingness. I've included a new sentence to this effect.

Martin,
Developer, CycleStreets
CycleStreets - For Cyclists, By Cyclists
https://www.cyclestreets.net/

___
Talk-GB mailing list
Talk-GB@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-gb
Re: [Talk-GB] TfL cycle data published - proposed conflation process
On Sun, 13 Oct 2019, Mateusz Konieczny wrote:

> (1) I would suggest also generating big OSM file with this data (without conflation, just what would be imported into unmapped area) and running JOSM validator on it. It may find bugs in data, proposed conversion and in JOSM itself.

That's a really great suggestion - have added that in at the end of point 4:

"This data should be published as an .osm file for community validation. It should be run against the JOSM Validator, essentially checking its correctness against a theoretical blank map."

> (2) I would advise also consulting OSM community after steps 4, 5, 6. Just post on talk-gb and process feedback.

Yes, I think this is very important and I included that - see mentions in 5) v. 5) vii. 6) xiii. 6) xiv. and the addition of the JOSM Validation step noted above.

> (3) What is also missing is
> - posting in imports mailing list
> - obtaining permission from OSM community for import (Assuming that process continues to be as great as so far it should be without problems).

I've added in a new point 4, covering these two:

4. Post in the imports mailing list a full description of the proposed process, seeking consent: https://lists.openstreetmap.org/listinfo/imports

> - documenting new proposed tags on OSM Wiki and getting feedback

Added in explicitly as part of point 3.

> (Full proposal process is not necessary, but may be considered, but at least post about new proposed tags on tagging mailing list. Things like that often benefit from additional review)
>
> But mappers should be able to check what exactly will be changed. Agreeing on principle that data may be useful does not mean that any import is ok.

Agreed, and added in mention of the tagging list.

> (4) Have you considered importing some topics separately? For example - in the first run import just bicycle parkings.

Useful suggestion. I thought just starting with a small area would be a sensible alpha stage, but actually within that having a single asset type first, and then following with the other types when that is successful, is a good idea.

Martin,
Developer, CycleStreets
CycleStreets - For Cyclists, By Cyclists
https://www.cyclestreets.net/
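To make the two suggestions above concrete (publishing the converted data as an .osm file for validation, and starting with a single asset type), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the actual conversion script: the input layout, the `asset_type` and `capacity` property names, and the sample coordinates are all hypothetical, and the real CID fields and agreed tag translation may well differ. Negative node IDs are used as placeholders for objects not yet uploaded to OSM.

```python
import xml.etree.ElementTree as ET

# Hypothetical GeoJSON-like sample; real CID property names may differ.
SAMPLE_FEATURES = [
    {"geometry": {"type": "Point", "coordinates": [-0.1000, 51.5000]},
     "properties": {"asset_type": "cycle_parking", "capacity": 10}},
    {"geometry": {"type": "Point", "coordinates": [-0.1010, 51.5010]},
     "properties": {"asset_type": "signal", "capacity": None}},
]


def parking_to_osm(features):
    """Build an .osm XML tree holding only the cycle parking points."""
    root = ET.Element("osm", version="0.6", generator="cid-sketch")
    next_id = -1  # negative IDs: placeholder for objects not yet in OSM
    for feature in features:
        props = feature["properties"]
        # Single asset type first: skip everything that is not cycle parking.
        if props.get("asset_type") != "cycle_parking":
            continue
        if feature["geometry"]["type"] != "Point":
            continue
        lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is lon, lat
        node = ET.SubElement(
            root, "node", id=str(next_id), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        ET.SubElement(node, "tag", k="amenity", v="bicycle_parking")
        if props.get("capacity") is not None:
            ET.SubElement(node, "tag", k="capacity", v=str(props["capacity"]))
        next_id -= 1
    return root


osm_root = parking_to_osm(SAMPLE_FEATURES)
osm_xml = ET.tostring(osm_root, encoding="unicode")
print(osm_xml)
```

The resulting XML could be saved with an .osm extension and opened in JOSM as a standalone layer, so that the Validator checks it in isolation before any merge is attempted.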
Re: [Talk-GB] TfL cycle data published - proposed conflation process
(1) I would suggest also generating big OSM file with this data (without conflation, just what would be imported into unmapped area) and running JOSM validator on it. It may find bugs in data, proposed conversion and in JOSM itself.

(2) I would advise also consulting OSM community after steps 4, 5, 6. Just post on talk-gb and process feedback.

(3) What is also missing is

- posting in imports mailing list
- obtaining permission from OSM community for import (Assuming that process continues to be as great as so far it should be without problems).
- documenting new proposed tags on OSM Wiki and getting feedback (Full proposal process is not necessary, but may be considered, but at least post about new proposed tags on tagging mailing list. Things like that often benefit from additional review)

But mappers should be able to check what exactly will be changed. Agreeing on principle that data may be useful does not mean that any import is ok.

(4) Have you considered importing some topics separately? For example - in the first run import just bicycle parkings.

13 Oct 2019, 15:17 by list-osm-talk...@cyclestreets.net:

> I've been looking at the various tools available, e.g. JOSM Conflation,
> Hootenanny, OpenStreetMap Live Conflation, etc.
>
> Whatever tool is best, a process is needed. May I seek comments on this
> proposal which would be put to TfL in my report to them:
>
> A proposed process for conflation would be:
>
> 1. Seek OSM community agreement in principle that the CID data is useful for
> OSM (done).
>
> 2. Confirm licensing compatibility (done).
>
> 3. Consult on proposed technical translation of data (as per discussion in
> this talk-gb thread, ongoing).
> https://bikedata.cyclestreets.net/tflcid/conversion/
>
> 4. Write a programming script based on this technical translation, which
> converts the CID data (using the version containing OSM IDs) to .osm format.
Re: [Talk-GB] TfL cycle data published - proposed conflation process
Martin wrote:
> e.g. JOSM Conflation, Hootenanny, OpenStreetMap Live Conflation

OK, a few I am aware of, one I'm not. Suspect others won't know them all either. For those interested:

JOSM Conflation: https://wiki.openstreetmap.org/wiki/JOSM/Plugins/Conflation
Hootenanny: https://www.youtube.com/watch?v=LeaTLxVCFmc
OpenStreetMap Live Conflation: https://www.openstreetmap.org/user/Richard/diary/368524

As for the proposal, I agree that a slow and steady approach is required. Although I do think we should set a target date. A date by which we are happy to start the conflation or have agreed that it is not viable. Would be a shame to see it just drag out. Happy to help as much as I can to ensure this.

Best regards,
*Rob*
Re: [Talk-GB] TfL cycle data published - proposed conflation process
I've been looking at the various tools available, e.g. JOSM Conflation, Hootenanny, OpenStreetMap Live Conflation, etc.

Whatever tool is best, a process is needed. May I seek comments on this proposal which would be put to TfL in my report to them:

A proposed process for conflation would be:

1. Seek OSM community agreement in principle that the CID data is useful for OSM (done).

2. Confirm licensing compatibility (done).

3. Consult on proposed technical translation of data (as per discussion in this talk-gb thread, ongoing).
https://bikedata.cyclestreets.net/tflcid/conversion/

4. Write a programming script based on this technical translation, which converts the CID data (using the version containing OSM IDs) to .osm format. The fundamental aim of this is to get the CID data to be as compatible with OSM norms as possible, so that the amount of effort needed in the eventual conflation tool will be as low as possible. This converted dataset is referred to below as the “External OSM-compatible format dataset”. This will require expert programming work undertaken by a programmer fully conversant with the OSM data model. Estimate: 10 days.

5. ALPHA STAGE: small-scale merge of data into OSM. This stage aims to prove that the data is capable of being converted, and to demonstrate to the OSM community that it can be undertaken sensitively and accurately. It does not seek to produce a tool selection recommendation. This work should be undertaken by someone with experience of JOSM and the JOSM Conflation plugin. Estimate: 3-5 days.

   i. Identify a suitable extract of the CID data covering only an area of 10-20 smaller streets. This should be an outer London area, and avoid main roads, so that in the event that problems materialise, the effect on real users of OSM data is low. It should include both point-based and line-based assets, giving a good overview. It should aim to have a good variety of CID assets rather than the same type of asset dominating.

   ii. Install the JOSM editor and the JOSM Conflation plugin, which provides a toolset for this alpha project. JOSM Conflation is the most sensible option, as this is the most widely used conflation tool in the OSM community. Although it requires manual inspection, it is workable for an alpha project at this smaller scale.

   iii. Attempt a merge of the External OSM-compatible format dataset using this tool.

   iv. Carefully and thoroughly observe the correctness of the data, iterating the script output and repeating these alpha steps until correctness is achieved.

   v. Save the merged import data into the live OSM dataset and request community feedback.

   vi. Manually fix up any identified problems arising from this feedback so that there is correctness, and fix the underlying problem in the script.

   vii. At this point, feasibility of conversion has been established, and community confidence will be much stronger.

6. BETA STAGE: larger-scale merge of data for one area. This stage aims to identify the best merging tool for a fuller conversion with a view to creating a fully-optimised workflow. Estimate: 4-8 weeks.

   i. Identify a suitable extract of the CID data to undertake a pilot conversion project. One of the 25 CID data packages would be an ideal size for such an evaluation, and each package is likely to contain sufficient variety.

   ii. Identify 2-3 most likely merging tools, e.g. JOSM Conflation and Hootenanny (see below).

   iii. Install each such merging tool and learn and practice its use. The time required for such installation and evaluation should not be underestimated. These systems involve widely different technologies (even requiring different operating systems to be installed using a Virtual Machine), so this step could easily take 5 days. Test data will need to be prepared, trial runs created, questions are likely to need to be asked on mailing lists, etc.

   iv. Identify the pros and cons of each tool and move towards a recommended solution based on trialing with the data and the amount of manual fixing up required.

   v. Determine and iterate the workflow required for the tool.

   vi. Adapt the now near-final script to perform conversion of this larger dataset for the selected tool. It is likely that the bulk of the conversion script will be unchanged, but that the final output format (e.g. .osm/Shapefile/GeoJSON) would need to be different based on the tool’s expectations.

   vii. Substantial iteration of the conversion script and/or tool workflow is then likely to be required. For instance, merging will involve conflating data from a cycle lane in the CID data to the cycle lane present in OSM nearby. This scenario is likely to throw up several potential issues. For instance, the OSM ID may in fact now have changed; it might now be represented by multiple separate OSM IDs; there might be multiple cycle lanes nearby which need to be disambiguated, etc. Another example would