Re: [Talk-ca] OSM data quality in Canada
On Wed, Jun 17, 2015 at 4:12 PM, Martijn van Exel m...@rtijn.org wrote:

Hello list,

My name is Martijn van Exel, I am on the OSM US board and work at Telenav. I’ve written to this list a few times before, but this time I am doing so with my Telenav hat on. Perhaps you know that we have the Scout apps (iOS, Android) which run on OSM data. (If you haven’t yet, please give Scout a try some time and let me know what you think!)

Also, the Scout app is not available in Canada right now. Are you planning to make it available in Canada in the future?

___ Talk-ca mailing list Talk-ca@openstreetmap.org https://lists.openstreetmap.org/listinfo/talk-ca
Re: [Talk-ca] OSM data quality in Canada
On Wed, 17 Jun 2015, Martijn van Exel wrote:

Hi Andrew, Thanks for elaborating on the CanVec / Geobase imports! This also raises new questions. See below.

On Jun 17, 2015, at 3:00 PM, Andrew MacKinnon andrew...@gmail.com wrote:

A lot of the data in Canada was imported from CanVec and Geobase, some of it by me several years ago. The imported data is pretty poor quality in many places. I haven't done much work on this recently, as imports have a bad reputation in OSM and I am mostly concerned with surveying. For example:

- Some older road data comes from an import which combined CanVec and Statistics Canada road names, attempting to match the road names in Statistics Canada with roads without names from CanVec, and this data is poor quality.

Is this described in more detail anywhere? Are the data / scripts / process still available? Which data was poor quality, CanVec or Statistics Canada?

The StatsCan geometries were really poor, at least as bad as the original TIGER stuff, but they were the only source of road names in some places.

The scripts used for the geobase-osm import (and for attaching StatsCan names) are available at http://svn.openstreetmap.org/applications/utils/import/geobase2osm

I only did this in Alberta and Ontario. We tried to use RoadMatcher to only include road segments that we were pretty sure didn't already exist in OSM. This often left gaps in road segments where RoadMatcher wasn't sure if something was or wasn't included. Also, we didn't have any way of automatically connecting the existing OSM ways with the new Geobase ways, which left A LOT of unconnected roads. This has mostly been fixed (often thanks to Keep Right and MapRoulette), but it took many years. Some of the initial sections also didn't connect new Geobase roads with each other due to a bug in the import script; we tried to fix this with repair scripts at the time.

Steve
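The unconnected road ends described above can be looked for mechanically. What follows is only a minimal sketch, not how Keep Right, MapRoulette, or the original repair scripts work: it assumes the ways have already been loaded into plain dictionaries, and the function names, the input layout, and the 10 m tolerance are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def almost_connected(ways, coords, tol_m=10.0):
    """Find way end nodes that lie close to another way without sharing a node.

    ways:   {way_id: [node_id, ...]}   (hypothetical input layout)
    coords: {node_id: (lat, lon)}
    Returns a list of (way_id, end_node_id, nearby_way_id) tuples.
    """
    # Count how many distinct ways use each node.
    usage = Counter(n for nodes in ways.values() for n in set(nodes))
    hits = []
    for way_id, nodes in ways.items():
        # dict.fromkeys de-duplicates the two ends of a closed way.
        for end in dict.fromkeys((nodes[0], nodes[-1])):
            if usage[end] > 1:
                continue  # already shared with another way, so connected
            for other_id, other_nodes in ways.items():
                if other_id == way_id:
                    continue
                if any(haversine_m(coords[end], coords[n]) <= tol_m
                       for n in other_nodes):
                    hits.append((way_id, end, other_id))
                    break  # report only the first nearby way per end node
    return hits
```

This compares end nodes only against the nodes of other ways, not against the segments between them, so a road end that stops beside the middle of a long straight segment would be missed. It is also quadratic in the number of nodes, so anything beyond a small extract would need a spatial index.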
Re: [Talk-ca] OSM data quality in Canada
Paul Norman wrote:

Address interpolation indicating roads where there are no roads is an interesting one, and might be suitable to a QA tool.

Just recall that there are two issues with this one:

- Imports of address data have been done without importing the corresponding roads (there are roads on the ground but not in OSM);
- Some address data are wrong, so there are no roads associated with them.

Daniel

-----Original Message-----
From: Paul Norman [mailto:penor...@mac.com]
Sent: June-18-15 17:15
To: talk-ca@openstreetmap.org
Subject: Re: [Talk-ca] OSM data quality in Canada

On 6/17/2015 1:12 PM, Martijn van Exel wrote:

* What is the imports history, particularly in relation to road network, POIs and addresses? (Beyond what’s in the import catalogue page on the wiki, if anything)

CanVec, National Hydro Network (NHN), and National Road Network (NRN), all out of Natural Resources Canada (NRCan). CanVec is a product supplied in .osm format composed of multiple government datasources, including the NHN and NRN. The sources used vary by region, so what is true somewhere may not be true elsewhere.

* What external (government and otherwise) open geospatial data sources are out there that have been or may be considered for improving OSM?

There is probably an equivalent to TIGER address ranges that should be used by a geocoder as a fallback in the same manner. I'm not aware of anything really under consideration. Data released by the federal government under their OGL variant is okay license-wise, but the same is not always true for the provincial and municipal data.

* Are there any Canada-specific mapping and tagging conventions?

Because roads are largely the responsibility of provinces, road classification varies province by province.

* Are there any known big (national) issues in the Canadian OSM data? (misguided imports / bots, major tagging disputes, that kind of thing)

CanVec has left parts of the country a colossal mess. I would say the forest/water data is the worst, often coming from different sources from the 70s, and these sources often do not agree with each other. When faced with 40 year old imported landcover data that doesn't resemble reality, the best option is often to just delete it.

There are some regional quirks with CanVec. These include

- Poor alignment of water or trees with each other
- Forests on what are now residential areas
- Incorrect surface or lanes values
- Invalid housenumbers (-1)
- Interpolation used for what should be a single number
- Interpolation where there aren't roads in the data
- Extra spaces in some road names
- Unclassified roads tagged as residential

NRN and NHN were less widely imported. Not having landcover, they don't have those problems, but do have some of their own

- Incorrect surface or lanes values (NRN)
- Lots of tag cruft (Both)
- Badly overnoded streams (NHN)
- Streams with oneway (NHN)
- Non-standard tagging (NHN)

* Which (other) companies / organizations / government agencies use OSM data for Canada?

NRCan used to use CanVec and OSM matching to find locations missing in their dataset, but I'm not sure if they do this anymore.

* Any suggestions for QA tools that would help the community, either existing or new?

Beyond the standard international ones, I'm not sure. The incorrect surface, lanes, housenumbers, and extra spaces are probably all amenable to a mechanical edit rather than a QA tool. Some headway has been made with mechanical edits. The tag cruft will remove itself over time as people edit the objects. Overlapping water/trees from CanVec are so easy to find, and I'm not sure a QA tool is the best choice where the time to fix hugely outweighs the time to find. Address interpolation indicating roads where there are no roads is an interesting one, and might be suitable to a QA tool.
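A check for address interpolation with no road nearby could start from something like the sketch below. It is an illustration only: the input layout (interpolation ways as lists of coordinates, roads as a flat list of node coordinates), the function names, and the 100 m threshold are assumptions, not part of any existing QA tool.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def orphan_interpolations(interpolations, road_nodes, max_dist_m=100.0):
    """Return ids of interpolation ways with no road node within max_dist_m.

    interpolations: {way_id: [(lat, lon), ...]}  nodes of each interpolation way
    road_nodes:     [(lat, lon), ...]            nodes of all highway ways
    (both input layouts are hypothetical)
    """
    orphans = []
    for way_id, nodes in interpolations.items():
        near_road = any(
            haversine_m(lat, lon, rlat, rlon) <= max_dist_m
            for lat, lon in nodes
            for rlat, rlon in road_nodes
        )
        if not near_road:
            orphans.append(way_id)
    return orphans
```

A flagged way only says that the addresses and the road data disagree. It cannot tell the two cases above apart (a real road that is missing from OSM versus wrong address data), so each hit would still need a look at imagery or a survey.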
Re: [Talk-ca] OSM data quality in Canada
On 6/17/2015 1:12 PM, Martijn van Exel wrote:

* What is the imports history, particularly in relation to road network, POIs and addresses? (Beyond what’s in the import catalogue page on the wiki, if anything)

CanVec, National Hydro Network (NHN), and National Road Network (NRN), all out of Natural Resources Canada (NRCan). CanVec is a product supplied in .osm format composed of multiple government datasources, including the NHN and NRN. The sources used vary by region, so what is true somewhere may not be true elsewhere.

* What external (government and otherwise) open geospatial data sources are out there that have been or may be considered for improving OSM?

There is probably an equivalent to TIGER address ranges that should be used by a geocoder as a fallback in the same manner. I'm not aware of anything really under consideration. Data released by the federal government under their OGL variant is okay license-wise, but the same is not always true for the provincial and municipal data.

* Are there any Canada-specific mapping and tagging conventions?

Because roads are largely the responsibility of provinces, road classification varies province by province.

* Are there any known big (national) issues in the Canadian OSM data? (misguided imports / bots, major tagging disputes, that kind of thing)

CanVec has left parts of the country a colossal mess. I would say the forest/water data is the worst, often coming from different sources from the 70s, and these sources often do not agree with each other. When faced with 40 year old imported landcover data that doesn't resemble reality, the best option is often to just delete it.

There are some regional quirks with CanVec. These include

- Poor alignment of water or trees with each other
- Forests on what are now residential areas
- Incorrect surface or lanes values
- Invalid housenumbers (-1)
- Interpolation used for what should be a single number
- Interpolation where there aren't roads in the data
- Extra spaces in some road names
- Unclassified roads tagged as residential

NRN and NHN were less widely imported. Not having landcover, they don't have those problems, but do have some of their own

- Incorrect surface or lanes values (NRN)
- Lots of tag cruft (Both)
- Badly overnoded streams (NHN)
- Streams with oneway (NHN)
- Non-standard tagging (NHN)

* Which (other) companies / organizations / government agencies use OSM data for Canada?

NRCan used to use CanVec and OSM matching to find locations missing in their dataset, but I'm not sure if they do this anymore.

* Any suggestions for QA tools that would help the community, either existing or new?

Beyond the standard international ones, I'm not sure. The incorrect surface, lanes, housenumbers, and extra spaces are probably all amenable to a mechanical edit rather than a QA tool. Some headway has been made with mechanical edits. The tag cruft will remove itself over time as people edit the objects. Overlapping water/trees from CanVec are so easy to find, and I'm not sure a QA tool is the best choice where the time to fix hugely outweighs the time to find. Address interpolation indicating roads where there are no roads is an interesting one, and might be suitable to a QA tool.
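The tag-level problems listed above (lanes or housenumber of -1, extra spaces in names) are simple enough to detect per object. The following is a rough sketch of such a check, assuming each object's tags are available as a plain dict; the function names and the choice of keys to inspect are illustrative assumptions, and this is not the code of any mechanical edit that was actually run.

```python
def collapse_spaces(value):
    """Trim a tag value and reduce runs of whitespace to single spaces."""
    return " ".join(value.split())


def find_tag_problems(tags):
    """Return a list of human-readable problems found in one object's tags.

    tags: dict of OSM key -> value strings, e.g. {"name": "Rue  Sherbrooke"}.
    """
    problems = []
    # Names and street names with leading, trailing, or repeated spaces.
    for key in ("name", "addr:street"):
        value = tags.get(key)
        if value is not None and value != collapse_spaces(value):
            problems.append(f"{key}: extra whitespace")
    # -1 is not a meaningful lane count or house number.
    for key in ("lanes", "addr:housenumber"):
        if tags.get(key, "").strip() == "-1":
            problems.append(f"{key}: invalid value -1")
    return problems
```

For example, `find_tag_problems({"name": "Rue  Sherbrooke", "lanes": "-1"})` reports both the whitespace and the lanes problem. A wrong surface value cannot be detected from the tags alone, since surface=unpaved is a perfectly valid value; that one needs imagery, local knowledge, or a source-specific rule.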
[Talk-ca] OSM data quality in Canada
Hello list,

My name is Martijn van Exel, I am on the OSM US board and work at Telenav. I’ve written to this list a few times before, but this time I am doing so with my Telenav hat on. Perhaps you know that we have the Scout apps (iOS, Android) which run on OSM data. (If you haven’t yet, please give Scout a try some time and let me know what you think!)

We are always looking into ways to make significant contributions to OSM, in the US, Canada and elsewhere. We’re starting to look into Canada more, and I could really use your help with a few key questions:

* What is the imports history, particularly in relation to road network, POIs and addresses? (Beyond what’s in the import catalogue page on the wiki, if anything)
* What external (government and otherwise) open geospatial data sources are out there that have been or may be considered for improving OSM?
* Are there any Canada-specific mapping and tagging conventions?
* Are there any known big (national) issues in the Canadian OSM data? (misguided imports / bots, major tagging disputes, that kind of thing)
* Which (other) companies / organizations / government agencies use OSM data for Canada?
* Any suggestions for QA tools that would help the community, either existing or new?

I’m happy to discuss on-list or off.

Thanks!
Martijn
Re: [Talk-ca] OSM data quality in Canada
Unrelated, but I noticed that talk-ca is not archived on Nabble yet - this makes it hard to share and follow a conversation as a non-subscriber. I don’t know what’s involved in adding this list, or whether anyone would object.

Martijn

On Jun 17, 2015, at 4:47 PM, Martijn van Exel m...@rtijn.org wrote:

Hi Andrew, Thanks for elaborating on the CanVec / Geobase imports! This also raises new questions. See below.

On Jun 17, 2015, at 3:00 PM, Andrew MacKinnon andrew...@gmail.com wrote:

A lot of the data in Canada was imported from CanVec and Geobase, some of it by me several years ago. The imported data is pretty poor quality in many places. I haven't done much work on this recently, as imports have a bad reputation in OSM and I am mostly concerned with surveying. For example:

- Some older road data comes from an import which combined CanVec and Statistics Canada road names, attempting to match the road names in Statistics Canada with roads without names from CanVec, and this data is poor quality.

Is this described in more detail anywhere? Are the data / scripts / process still available? Which data was poor quality, CanVec or Statistics Canada?

- Road data in some areas is missing entirely.

This is probably easy to visualize, but do you happen to know where / why?

- The CanVec address data is low quality and often broken (e.g. address ranges are split in half at tile boundaries), and it comes from several different versions of CanVec.
- Other CanVec layers such as woods, lakes and so on were imported in some areas but not others. Much of this data is low quality.

Was some sort of progress page kept so we could see where certain features were imported or not (yet)? Has a followup ever been considered to augment / fix these botched / low quality imports?

- Some road names have too many spaces, e.g. "John Street" is stored as "John  Street". Some address ranges are like that as well.
- lanes=-1 and surface=unpaved for roads that are really paved in Quebec.
- Better quality municipal GIS datasets are now available in some cities like Toronto, Peel Region and York Region and if they are properly licensed, these should be used whenever possible. There generally are some minor errors in these datasets, but they are far better quality than CanVec/Geobase.

Ah, interesting. Is there already a list of these candidates or would it make sense to start one and look into proper licensing?

I really like the TO-FIX Tiger Delta layer at http://osmlab.github.io/to-fix/#/task/tigerdelta which matches TIGER data with OSM data and tries to find errors. It would be helpful if a similar tool were created for Canada.

Obviously I am partial to MapRoulette, but sure, let me check it out, I am sure we can come up with something similar for Canada. What would the reference data be instead of TIGER?

Again, thanks for your insights, Andrew.

Martijn
Re: [Talk-ca] OSM data quality in Canada
See http://wiki.openstreetmap.org/wiki/CanVec.

CanVec data was converted to OSM format and is stored at http://ftp2.cits.rncan.gc.ca/OSM/pub/, split into files based on the National Topographic System. Data was then imported in some parts of Canada by manually cutting and pasting from these files into JOSM. I did this in a large part of southern Ontario and some other users have done this as well. Importing CanVec data this way and correcting all the errors is tedious and hasn't been completed for all of Canada, and I haven't done very much with this for several years.

Before this was done there were more primitive imports, perhaps around 2008-2009 or so, and these imports are extremely low quality. I can't remember which OSM user did this.

When OSM was new there was not much data in OSM, so a lot of imports were done and many of these imports were poor quality; now that OSM is more mature, imports are increasingly viewed unfavourably and there is a general attitude that data should be collected by surveying whenever possible.

It would probably be best to use the newest version of the Geobase National Road Network (http://www.geobase.ca/), compare it to the data in OSM, and make corrections that way. Keep in mind that this data has errors and municipal datasets (where available) are always better quality.
Re: [Talk-ca] OSM data quality in Canada
Also see Ordnance Survey Locator Musical Chairs http://wiki.openstreetmap.org/wiki/OS_Locator_Musical_Chairs and http://ris.dev.openstreetmap.org/oslmusicalchairs/map for a tool comparing UK Ordnance Survey data with OSM data, similar to the TIGER fixup tool.

On Wed, Jun 17, 2015 at 7:10 PM, Andrew MacKinnon andrew...@gmail.com wrote:

See http://wiki.openstreetmap.org/wiki/CanVec.

CanVec data was converted to OSM format and is stored at http://ftp2.cits.rncan.gc.ca/OSM/pub/, split into files based on the National Topographic System. Data was then imported in some parts of Canada by manually cutting and pasting from these files into JOSM. I did this in a large part of southern Ontario and some other users have done this as well. Importing CanVec data this way and correcting all the errors is tedious and hasn't been completed for all of Canada, and I haven't done very much with this for several years.

Before this was done there were more primitive imports, perhaps around 2008-2009 or so, and these imports are extremely low quality. I can't remember which OSM user did this.

When OSM was new there was not much data in OSM, so a lot of imports were done and many of these imports were poor quality; now that OSM is more mature, imports are increasingly viewed unfavourably and there is a general attitude that data should be collected by surveying whenever possible.

It would probably be best to use the newest version of the Geobase National Road Network (http://www.geobase.ca/), compare it to the data in OSM, and make corrections that way. Keep in mind that this data has errors and municipal datasets (where available) are always better quality.
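The core idea behind a name-comparison tool of this kind can be sketched in a few lines: take the street names from a reference dataset for an area, take the names present in OSM for the same area, and list what is in the reference but not in OSM. This is only a toy version under stated assumptions: the function names are made up, the name lists are assumed to be extracted already, and it is not how OS Locator Musical Chairs or the TIGER delta layer is implemented.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivial differences don't count."""
    return " ".join(name.split()).casefold()


def missing_names(reference_names, osm_names):
    """Return reference street names that have no match among the OSM names.

    Both arguments are iterables of name strings for the same area
    (for example one tile); how they are extracted is left open here.
    """
    in_osm = {normalize(n) for n in osm_names}
    missing = {
        " ".join(n.split())
        for n in reference_names
        if normalize(n) not in in_osm
    }
    return sorted(missing)
```

Because the comparison ignores location within the area, it works best on small tiles, and every hit is only a hint: the reference data has errors of its own, so a mismatch may mean OSM is right and the reference is wrong.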
Re: [Talk-ca] OSM data quality in Canada
Hi Andrew, Thanks for elaborating on the CanVec / Geobase imports! This also raises new questions. See below.

On Jun 17, 2015, at 3:00 PM, Andrew MacKinnon andrew...@gmail.com wrote:

A lot of the data in Canada was imported from CanVec and Geobase, some of it by me several years ago. The imported data is pretty poor quality in many places. I haven't done much work on this recently, as imports have a bad reputation in OSM and I am mostly concerned with surveying. For example:

- Some older road data comes from an import which combined CanVec and Statistics Canada road names, attempting to match the road names in Statistics Canada with roads without names from CanVec, and this data is poor quality.

Is this described in more detail anywhere? Are the data / scripts / process still available? Which data was poor quality, CanVec or Statistics Canada?

- Road data in some areas is missing entirely.

This is probably easy to visualize, but do you happen to know where / why?

- The CanVec address data is low quality and often broken (e.g. address ranges are split in half at tile boundaries), and it comes from several different versions of CanVec.
- Other CanVec layers such as woods, lakes and so on were imported in some areas but not others. Much of this data is low quality.

Was some sort of progress page kept so we could see where certain features were imported or not (yet)? Has a followup ever been considered to augment / fix these botched / low quality imports?

- Some road names have too many spaces, e.g. "John Street" is stored as "John  Street". Some address ranges are like that as well.
- lanes=-1 and surface=unpaved for roads that are really paved in Quebec.
- Better quality municipal GIS datasets are now available in some cities like Toronto, Peel Region and York Region and if they are properly licensed, these should be used whenever possible. There generally are some minor errors in these datasets, but they are far better quality than CanVec/Geobase.

Ah, interesting. Is there already a list of these candidates or would it make sense to start one and look into proper licensing?

I really like the TO-FIX Tiger Delta layer at http://osmlab.github.io/to-fix/#/task/tigerdelta which matches TIGER data with OSM data and tries to find errors. It would be helpful if a similar tool were created for Canada.

Obviously I am partial to MapRoulette, but sure, let me check it out, I am sure we can come up with something similar for Canada. What would the reference data be instead of TIGER?

Again, thanks for your insights, Andrew.

Martijn
Re: [Talk-ca] OSM data quality in Canada
A lot of the data in Canada was imported from CanVec and Geobase, some of it by me several years ago. The imported data is pretty poor quality in many places. I haven't done much work on this recently, as imports have a bad reputation in OSM and I am mostly concerned with surveying. For example:

- Some older road data comes from an import which combined CanVec and Statistics Canada road names, attempting to match the road names in Statistics Canada with roads without names from CanVec, and this data is poor quality.
- Road data in some areas is missing entirely.
- The CanVec address data is low quality and often broken (e.g. address ranges are split in half at tile boundaries), and it comes from several different versions of CanVec.
- Other CanVec layers such as woods, lakes and so on were imported in some areas but not others. Much of this data is low quality.
- Some road names have too many spaces, e.g. "John Street" is stored as "John  Street". Some address ranges are like that as well.
- lanes=-1 and surface=unpaved for roads that are really paved in Quebec.
- Better quality municipal GIS datasets are now available in some cities like Toronto, Peel Region and York Region and if they are properly licensed, these should be used whenever possible. There generally are some minor errors in these datasets, but they are far better quality than CanVec/Geobase.

I really like the TO-FIX Tiger Delta layer at http://osmlab.github.io/to-fix/#/task/tigerdelta which matches TIGER data with OSM data and tries to find errors. It would be helpful if a similar tool were created for Canada.

On Wed, Jun 17, 2015 at 4:27 PM, Harald Kliems kli...@gmail.com wrote:

A few things I can think of:

On Wed, Jun 17, 2015 at 3:13 PM Martijn van Exel m...@rtijn.org wrote:

* Are there any Canada-specific mapping and tagging conventions?

- There seems to be a strong consensus that what elsewhere would be highway=unclassified is highway=residential, no matter if the road is in a populated area or not.

* Are there any known big (national) issues in the Canadian OSM data? (misguided imports / bots, major tagging disputes, that kind of thing)

I believe these mostly affect Quebec, but there are two import problems that never got systematically fixed, as far as I know:

- CanVec import of highways where lanes=-1 and surface=unpaved.
- CanVec or Geobase import where there is an extra blank between the street type designation and the name. E.g. Rue__Sherbrooke instead of Rue_Sherbrooke.

Harald (now in the US)
Re: [Talk-ca] OSM data quality in Canada
If this is the consensus, I've been blissfully unaware and the wiki needs to be updated. The Canadian tagging guidelines (https://wiki.openstreetmap.org/wiki/Canadian_tagging_guidelines#Regional_roadways_.28below_provincially_controlled.29) recommend using unclassified when not in residential areas, and that's how I've been tagging. The CanVec imports generally use residential as you describe, which has led to a lot of mis-tagged highways, but I wouldn't say this is a consensus agreement that this is how we want it to be. It’s just how the data was imported. I'm gradually re-tagging such highways in my area, but there's a lot that need to be fixed across very large areas and not many people working on it.

Andrew Lester
Victoria, BC, Canada

From: Harald Kliems [mailto:kli...@gmail.com]
Sent: Wednesday, June 17, 2015 1:27 PM
To: Martijn van Exel; talk-ca@openstreetmap.org
Subject: Re: [Talk-ca] OSM data quality in Canada

A few things I can think of:

On Wed, Jun 17, 2015 at 3:13 PM Martijn van Exel m...@rtijn.org wrote:

* Are there any Canada-specific mapping and tagging conventions?

- There seems to be a strong consensus that what elsewhere would be highway=unclassified is highway=residential, no matter if the road is in a populated area or not.
Re: [Talk-ca] OSM data quality in Canada
A few things I can think of:

On Wed, Jun 17, 2015 at 3:13 PM Martijn van Exel m...@rtijn.org wrote:

* Are there any Canada-specific mapping and tagging conventions?

- There seems to be a strong consensus that what elsewhere would be highway=unclassified is highway=residential, no matter if the road is in a populated area or not.

* Are there any known big (national) issues in the Canadian OSM data? (misguided imports / bots, major tagging disputes, that kind of thing)

I believe these mostly affect Quebec, but there are two import problems that never got systematically fixed, as far as I know:

- CanVec import of highways where lanes=-1 and surface=unpaved.
- CanVec or Geobase import where there is an extra blank between the street type designation and the name. E.g. Rue__Sherbrooke instead of Rue_Sherbrooke.

Harald (now in the US)