Re: [Talk-GB] TfL cycle data published - proposed conflation process

2019-10-13 Thread Martin - CycleStreets



As for the proposal, I agree that a slow and steady approach is required. 
Although I do think we should set a target date. A date by which we are 
happy to start the conflation or have agreed that it is not viable. Would 
be a shame to see it just drag out. Happy to help as much as I can to 
ensure this.


Yes, good point. With such a large dataset, setting a timescale to ensure 
that momentum can be maintained would be sensible. Clearly it would 
depend on resources/time/willingness. I've included a new sentence to 
this effect.



Martin, **  CycleStreets - For Cyclists, By Cyclists
Developer, CycleStreets **  https://www.cyclestreets.net/


___
Talk-GB mailing list
Talk-GB@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-gb


Re: [Talk-GB] TfL cycle data published - proposed conflation process

2019-10-13 Thread Martin - CycleStreets



On Sun, 13 Oct 2019, Mateusz Konieczny wrote:


(1)

I would suggest also generating a big OSM file with this data (without 
conflation, just what would be imported into an unmapped area) and running 
the JOSM validator on it.


It may find bugs in the data, in the proposed conversion and in JOSM itself.


That's a really great suggestion - have added that in at the end of point 
4:


"This data should be published as an .osm file for community validation. It 
should be run against the JOSM Validator, essentially checking its 
correctness against a theoretical blank map."
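
As a rough illustration of what producing such a file could involve, here is 
a minimal Python sketch that writes GeoJSON-style point features out as .osm 
XML. It is not the actual conversion script. The function name, the generator 
string and the sample feature are assumptions made up for this example, and 
it presumes the properties have already been translated into OSM tags. New 
objects get negative IDs, which is how JOSM marks objects that do not yet 
exist on the server, and upload="false" is set so that JOSM discourages 
uploading the file by accident.

```python
import xml.etree.ElementTree as ET


def features_to_osm(features):
    """Return an .osm XML string built from GeoJSON-style point features.

    Each feature's properties are assumed to be OSM tags already.
    """
    root = ET.Element("osm", version="0.6", generator="cid-sketch", upload="false")
    for index, feature in enumerate(features, start=1):
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"]
        node = ET.SubElement(
            root,
            "node",
            id=str(-index),  # negative ID: a new object, not yet in OSM
            lat=f"{lat:.7f}",
            lon=f"{lon:.7f}",
            visible="true",
        )
        for key, value in sorted(feature["properties"].items()):
            ET.SubElement(node, "tag", k=key, v=str(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample feature, for illustration only.
sample = [
    {
        "geometry": {"type": "Point", "coordinates": [-0.1, 51.5]},
        "properties": {"amenity": "bicycle_parking", "capacity": 10},
    }
]

if __name__ == "__main__":
    with open("cid-sample.osm", "w", encoding="utf-8") as handle:
        handle.write(features_to_osm(sample))
```

The resulting file could then be opened in JOSM as a local layer and checked 
with the Validator without touching the live data.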






(2)

I would advise also consulting
OSM community after steps 4, 5, 6.

Just post on talk-gb and process feedback.


Yes, I think this is very important and I included that - see mentions in
5) v.
5) vii.
6) xiii.
6) xiv.
and the addition of the JOSM Validation step noted above.



--
(3)

What is also missing is
- posting in imports mailing list
- obtaining permission from OSM community for import (Assuming that 
process continues to be as great as so far it should be without 
problems).


I've added in a new point 4, covering these two:

4. Post in the imports mailing list a full description of the proposed 
process, seeking consent:

https://lists.openstreetmap.org/listinfo/imports



- documenting new proposed tags on OSM Wiki and getting feedback


Added in explicitly as part of point 3.


(A full proposal process is not necessary, though it may be considered, but 
at least post about the new proposed tags on the tagging mailing list. 
Things like that often benefit from additional review.)


But mappers should be able to check what exactly will be changed.

Agreeing on principle that data may be useful
does not mean that any import is ok.


Agreed, and added in mention of the tagging list.



-
(4)

Have you considered importing 
some topics separately?

For example - in the first run import just
bicycle parking.


Useful suggestion. I thought just starting with a small area would be a 
sensible alpha stage, but actually within that having a single asset type 
first, and then following with the other types when that is successful, is 
a good idea.
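
To make the idea concrete, a minimal Python sketch of selecting a single 
asset type from a set of features before conversion might look like the 
following. The "asset_type" property and its values are hypothetical names 
chosen for this example; the real CID field names may well differ.

```python
from collections import Counter


def count_by_asset_type(features):
    """Count features per asset type, to help decide what to import first."""
    return Counter(f.get("properties", {}).get("asset_type") for f in features)


def select_asset_type(features, asset_type):
    """Return only the features of one asset type, e.g. cycle parking."""
    return [
        f for f in features
        if f.get("properties", {}).get("asset_type") == asset_type
    ]


# Hypothetical sample data, for illustration only.
features = [
    {"properties": {"asset_type": "cycle_parking"}},
    {"properties": {"asset_type": "cycle_lane"}},
    {"properties": {"asset_type": "cycle_parking"}},
]

parking_only = select_asset_type(features, "cycle_parking")
```

Only the selected subset would then go through conversion and merging, with 
the other asset types following once the first run has been accepted.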




Martin, **  CycleStreets - For Cyclists, By Cyclists
Developer, CycleStreets **  https://www.cyclestreets.net/


Re: [Talk-GB] TfL cycle data published - proposed conflation process

2019-10-13 Thread Mateusz Konieczny
(1)

I would suggest also generating 
a big OSM file with this data (without conflation, 
just what would be imported into an unmapped area)
and running the JOSM validator on it.
It may find bugs in the data, in the proposed
conversion and in JOSM itself.


(2)
I would advise also consulting
OSM community after steps 4, 5, 6.

Just post on talk-gb and process feedback.

--
(3)
What is also missing is 
- posting in imports mailing list
- obtaining permission from OSM community for import
(Assuming that process continues to be as great
as so far it should be without problems).
- documenting new proposed tags on OSM Wiki
and getting feedback
(A full proposal process is not necessary,
though it may be considered, but at least
post about the new proposed tags on the tagging mailing list. 
Things like that often benefit from additional review.)

But mappers should be able to check
what exactly will be changed.

Agreeing on principle that data may be useful
does not mean that any import is ok.


-
(4)

Have you considered importing 
some topics separately?

For example - in the first run import just
bicycle parking.

---++

 13 Oct 2019, 15:17 by list-osm-talk...@cyclestreets.net:

>
>
> I've been looking at the various tools available, e.g. JOSM Conflation, 
> Hootenanny, OpenStreetMap Live Conflation, etc.
>
> Whatever tool is best, a process is needed. May I seek comments on this 
> proposal which would be put to TfL in my report to them:
>
>
> 
>
>
> A proposed process for conflation would be:
>
>
> 1. Seek OSM community agreement in principle that the CID data is useful for 
> OSM (done).
>
>
> 2. Confirm licensing compatibility (done).
>
>
> 3. Consult on proposed technical translation of data (as per discussion in 
> this talk-gb thread, ongoing).
> https://bikedata.cyclestreets.net/tflcid/conversion/
>
>
> 4. Write a programming script based on this technical translation, which 
> converts the CID data (using the version containing OSM IDs) to .osm format. 
> The fundamental aim of this is to get the CID data to be as compatible with 
> OSM norms as possible, so that the amount of effort needed in the eventual 
> conflation tool will be as low as possible. This converted dataset is 
> referred to below as the “External OSM-compatible format dataset”. This will 
> require expert programming work undertaken by a programmer fully conversant 
> with the OSM data model. Estimate: 10 days.
>
>
> 5. ALPHA STAGE: small-scale merge of data into OSM. This stage aims to prove 
> that the data is capable of being converted, and to demonstrate to the OSM 
> community that it can be undertaken sensitively and accurately. It does not 
> seek to produce a tool selection recommendation. This work should be 
> undertaken by someone with experience of JOSM and the JOSM Conflation plugin. 
> Estimate: 3-5 days.
>
> i. Identify a suitable extract of the CID data covering only an area of 10-20 
> smaller streets. This should be an outer London area, and avoid main roads, 
> so that in the event that problems materialise, the effect on real users of 
> OSM data is low. It should include both point-based and line-based assets, 
> giving a good overview. It should aim to have a good variety of CID assets 
> rather than the same type of asset dominating.
>
> ii. Install the JOSM editor and the JOSM Conflation plugin, which provides a 
> toolset for this alpha project. JOSM Conflation is the most sensible option, 
> as this is the most widely used conflation tool in the OSM community. Although it 
> requires manual inspection, it is workable for an alpha project at this 
> smaller scale.
>
> iii. Attempt a merge of the External OSM-compatible format dataset using this 
> tool.
>
> iv. Carefully and thoroughly observe the correctness of the data, iterating 
> the script output and repeating these alpha steps until correctness is 
> achieved.
>
> v. Save the merged import data into the live OSM dataset and request 
> community feedback.
>
> vi. Manually fix up any identified problems arising from this feedback so 
> that there is correctness, and fix the underlying problem in the script.
>
> vii. At this point, feasibility of conversion has been established, and 
> community confidence will be much stronger.
>
>
> 6. BETA STAGE: larger-scale merge of data for one area. This stage aims to 
> identify the best merging tool for a fuller conversion with a view to 
> creating a fully-optimised workflow. Estimate: 4-8 weeks.
>
> i. Identify a suitable extract of the CID data to undertake a pilot 
> conversion project. One of the 25 CID data packages would be an ideal size 
> for such an evaluation, and each package is likely to contain sufficient 
> variety.
>
> ii. Identify 2-3 most likely merging tools, e.g. JOSM Conflation and 
> Hootenanny (see below).
>
> iii. Install each such merging tool and learn and practice its use. The time 
> required for such installation and evaluation should not be underestimated. 
> These systems involve widely different 

Re: [Talk-GB] TfL cycle data published - proposed conflation process

2019-10-13 Thread Rob Nickerson
Martin wrote:
> e.g. JOSM Conflation, Hootenanny, OpenStreetMap Live Conflation

OK, a few I am aware of, one I'm not. Suspect others won't know them all
either. For those interested:

JOSM Conflation:
https://wiki.openstreetmap.org/wiki/JOSM/Plugins/Conflation

Hootenanny:
https://www.youtube.com/watch?v=LeaTLxVCFmc

OpenStreetMap Live Conflation:
https://www.openstreetmap.org/user/Richard/diary/368524

As for the proposal, I agree that a slow and steady approach is required.
Although I do think we should set a target date. A date by which we are
happy to start the conflation or have agreed that it is not viable. Would
be a shame to see it just drag out. Happy to help as much as I can to
ensure this.

Best regards,
*Rob*


Re: [Talk-GB] TfL cycle data published - proposed conflation process

2019-10-13 Thread Martin - CycleStreets



I've been looking at the various tools available, e.g. JOSM Conflation, 
Hootenanny, OpenStreetMap Live Conflation, etc.


Whatever tool is best, a process is needed. May I seek comments on this 
proposal which would be put to TfL in my report to them:






A proposed process for conflation would be:


1. Seek OSM community agreement in principle that the CID data is useful 
for OSM (done).



2. Confirm licensing compatibility (done).


3. Consult on proposed technical translation of data (as per discussion in 
this talk-gb thread, ongoing).

https://bikedata.cyclestreets.net/tflcid/conversion/


4. Write a programming script based on this technical translation, which 
converts the CID data (using the version containing OSM IDs) to .osm 
format. The fundamental aim of this is to get the CID data to be as 
compatible with OSM norms as possible, so that the amount of effort needed in 
the eventual conflation tool will be as low as possible. This converted 
dataset is referred to below as the “External OSM-compatible format 
dataset”. This will require expert programming work undertaken by a 
programmer fully conversant with the OSM data model. Estimate: 10 days.



5. ALPHA STAGE: small-scale merge of data into OSM. This stage aims to 
prove that the data is capable of being converted, and to demonstrate to 
the OSM community that it can be undertaken sensitively and accurately. It 
does not seek to produce a tool selection recommendation. This work should 
be undertaken by someone with experience of JOSM and the JOSM Conflation 
plugin. Estimate: 3-5 days.


i. Identify a suitable extract of the CID data covering only an area of 
10-20 smaller streets. This should be an outer London area, and avoid main 
roads, so that in the event that problems materialise, the effect on real 
users of OSM data is low. It should include both point-based and line-based 
assets, giving a good overview. It should aim to have a good variety of CID 
assets rather than the same type of asset dominating.


ii. Install the JOSM editor and the JOSM Conflation plugin, which provides 
a toolset for this alpha project. JOSM Conflation is the most sensible 
option, as this is the most widely used conflation tool in the OSM community. 
Although it requires manual inspection, it is workable for an alpha project 
at this smaller scale.


iii. Attempt a merge of the External OSM-compatible format dataset using 
this tool.


iv. Carefully and thoroughly observe the correctness of the data, iterating 
the script output and repeating these alpha steps until correctness is 
achieved.


v. Save the merged import data into the live OSM dataset and request 
community feedback.


vi. Manually fix up any identified problems arising from this feedback so 
that there is correctness, and fix the underlying problem in the script.


vii. At this point, feasibility of conversion has been established, and 
community confidence will be much stronger.



6. BETA STAGE: larger-scale merge of data for one area. This stage aims to 
identify the best merging tool for a fuller conversion with a view to 
creating a fully-optimised workflow. Estimate: 4-8 weeks.


i. Identify a suitable extract of the CID data to undertake a pilot 
conversion project. One of the 25 CID data packages would be an ideal size 
for such an evaluation, and each package is likely to contain sufficient 
variety.


ii. Identify 2-3 most likely merging tools, e.g. JOSM Conflation and 
Hootenanny (see below).


iii. Install each such merging tool and learn and practice its use. The 
time required for such installation and evaluation should not be 
underestimated. These systems involve widely different technologies (even 
requiring different operating systems to be installed using a Virtual 
Machine), so this step could easily take 5 days. Test data will need to be 
prepared, trial runs created, questions are likely to need to be asked on 
mailing lists, etc.


iv. Identify the pros and cons of each tool and move towards a recommended 
solution based on trialing with the data and the amount of manual fixing up 
required.


v. Determine and iterate the workflow required for the tool.

vi. Adapt the now near-final script to perform conversion of this larger 
dataset for the selected tool. It is likely that the bulk of the conversion 
script will be unchanged, but that the final output format (e.g. 
.osm/Shapefile/GeoJSON) would need to be different based on the tool’s 
expectations.


vii. Substantial iteration of the conversion script and/or tool workflow is 
then likely to be required. For instance, merging will involve conflating 
data from a cycle lane in the CID data to the cycle lane present in the OSM 
nearby. This scenario is likely to throw up several potential issues. For 
instance, the OSM ID may in fact now have changed; it might now be 
represented by multiple separate OSM IDs; there might be multiple cycle 
lanes nearby which need to be disambiguated, etc. Another example would