Re: [Talk-ca] OSM data quality in Canada

2015-06-28 Thread Andrew MacKinnon
On Wed, Jun 17, 2015 at 4:12 PM, Martijn van Exel m...@rtijn.org wrote:
 Hello list —

 My name is Martijn van Exel, I am on the OSM US board and work at Telenav. 
 I’ve written to this list a few times before, but this time I am doing so 
 with my Telenav hat on. Perhaps you know that we have the Scout apps (iOS, 
 Android) which run on OSM data. (If you haven’t yet, please give Scout a try 
 some time and let me know what you think!)

Also, the Scout app is not available in Canada right now. Are you
planning to make it available in Canada in the future?

___
Talk-ca mailing list
Talk-ca@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-ca


Re: [Talk-ca] OSM data quality in Canada

2015-06-20 Thread Steve Singer

On Wed, 17 Jun 2015, Martijn van Exel wrote:

Hi Andrew, 


Thanks for elaborating on the CanVec / Geobase imports! This also raises new 
questions. See below.


On Jun 17, 2015, at 3:00 PM, Andrew MacKinnon andrew...@gmail.com wrote:

A lot of the data in Canada was imported from CanVec and Geobase,
some of it by me several years ago. The imported data is pretty poor
quality in many places. I haven't done much work on this recently, as
imports have a bad reputation in OSM and I am mostly concerned with
surveying. For example:

- Some older road data comes from an import which combined CanVec and
Statistics Canada road names, attempting to match the road names in
Statistics Canada with roads without names from CanVec, and this data
is poor quality.


Is this described in more detail anywhere? Are the data / scripts / 
process still available? Which data was poor quality, CanVec or Statistics 
Canada?


The StatsCan geometries were really poor, at least as bad as the original 
TIGER stuff, but they were the only source of road names in some places.


The scripts used for the geobase-osm import (and for attaching StatsCan names) are 
available at 
http://svn.openstreetmap.org/applications/utils/import/geobase2osm

I only did this in Alberta and Ontario.

We tried to use roadmatcher to only include road segments that we were 
pretty sure didn't already exist in OSM.  This often left gaps in road 
segments where roadmatcher wasn't sure whether something was already included. 
Also, we didn't have any way of automatically connecting the existing OSM 
ways to the new Geobase ways, which left A LOT of unconnected roads.  This 
has mostly been fixed (often thanks to KeepRight and MapRoulette), but it 
took many years.
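Unconnected roads like the ones described above can be found mechanically, because an imported way that was never joined to the existing network shares no nodes with it. A minimal sketch, assuming the ways have already been parsed into a plain mapping of way IDs to ordered node-ID lists (no real OSM library is used here): it lists every way endpoint that no other way references. Legitimate dead ends such as cul-de-sacs will be reported too, so the output is a review list, not a list of errors.

```python
from collections import Counter


def dangling_endpoints(ways):
    """Return sorted (way_id, node_id) pairs for endpoints no other way shares.

    `ways` maps a way ID to its ordered list of node IDs (a simplified,
    hypothetical stand-in for parsed OSM data).
    """
    # Count how many distinct ways reference each node.
    usage = Counter()
    for nodes in ways.values():
        usage.update(set(nodes))

    flagged = []
    for way_id, nodes in ways.items():
        if nodes[0] == nodes[-1]:
            continue  # closed ways have no free ends
        for end in (nodes[0], nodes[-1]):
            if usage[end] == 1:  # only this way uses the endpoint
                flagged.append((way_id, end))
    return sorted(flagged)


if __name__ == "__main__":
    # Ways 1 and 2 share node 12; way 3 is completely isolated.
    print(dangling_endpoints({1: [10, 11, 12], 2: [12, 13], 3: [20, 21]}))
```

A way whose two endpoints are both flagged is the typical signature of an import that was never stitched into the existing network.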


Some of the initial sections also didn't connect new Geobase roads with each 
other due to a bug in the import script; we tried to fix this with repair 
scripts at the time.



Steve




Re: [Talk-ca] OSM data quality in Canada

2015-06-19 Thread Daniel Begin
Paul Norman wrote: "Address interpolation indicating roads where there are no 
roads is an interesting one, and might be suitable to a QA tool."

Just remember that there are two separate issues here:
- Address data has been imported without importing the corresponding 
roads (there are roads on the ground, but not in OSM);
- Some address data is wrong, so there are no roads associated with it.

Daniel






Re: [Talk-ca] OSM data quality in Canada

2015-06-18 Thread Paul Norman

On 6/17/2015 1:12 PM, Martijn van Exel wrote:

* What is the imports history, particularly in relation to road network, POIs 
and addresses? (Beyond what’s in the import catalogue page on the wiki, if 
anything)
CanVec, National Hydrographic Network (NHN), and National Road Network 
(NRN), all out of Natural Resources Canada (NRCan).


CanVec is a product supplied in .osm format composed of multiple 
government data sources, including the NHN and NRN. The sources used vary 
by region, so what is true somewhere may not be true elsewhere.



* What external (government and otherwise) open geospatial data sources are out 
there that have been or may be considered for improving OSM?
There is probably an equivalent to TIGER address ranges that should be 
used by a geocoder as a fallback in the same manner.


I'm not aware of anything really under consideration. Data released by 
the federal government under their OGL variant is okay license-wise, but 
the same is not always true for the provincial and municipal data.

* Are there any Canada-specific mapping and tagging conventions?
Because roads are largely the responsibility of provinces, road 
classification varies province by province.

* Are there any known big (national) issues in the Canadian OSM data? 
(misguided imports / bots, major tagging disputes, that kind of thing)
CanVec has left parts of the country a colossal mess. I would say the 
forest/water data is the worst, often coming from different sources from 
the 70s, and these sources often do not agree with each other. When 
faced with 40-year-old imported landcover data that doesn't resemble 
reality, the best option is often to just delete it.


There are some regional quirks with CanVec. These include:

- Poor alignment of water or trees with each other
- Forests on what are now residential areas
- Incorrect surface or lanes values
- Invalid housenumbers (-1)
- Interpolation used for what should be a single number
- Interpolation where there aren't roads in the data
- Extra spaces in some road names
- Unclassified roads tagged as residential

NRN and NHN were less widely imported. Since they contain no landcover, they 
don't have those problems, but they do have some of their own:


- Incorrect surface or lanes values (NRN)
- Lots of tag cruft (Both)
- Badly overnoded streams (NHN)
- Streams with oneway (NHN)
- Non-standard tagging (NHN)
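For the overnoded NHN streams, one common approach (not one the thread itself prescribes) is Ramer-Douglas-Peucker simplification: a vertex is kept only if it lies farther than a tolerance from the chord being approximated. This sketch assumes planar coordinates, e.g. already projected into metres, and the tolerance is whatever the mapper chooses; comparing the vertex count before and after gives a rough measure of how overnoded a stream is.

```python
import math


def _perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # |cross product of (b - a) and (p - a)| divided by |b - a|
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification of a polyline given as (x, y) pairs."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord between the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest vertex and recurse on both halves.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


if __name__ == "__main__":
    wiggle = [(0, 0), (1, 0.01), (2, -0.01), (3, 0.02), (4, 0)]
    print(simplify(wiggle, 0.1))
```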


* Which (other) companies / organizations / government agencies use OSM data 
for Canada?
NRCan used to use CanVec and OSM matching to find locations missing in 
their dataset, but I'm not sure if they do this anymore.

* Any suggestions for QA tools that would help the community, either existing 
or new?
Beyond the standard international ones, I'm not sure. The incorrect 
surface, lanes, housenumbers, and extra spaces are probably all amenable 
to a mechanical edit rather than a QA tool. Some headway has been made 
with mechanical edits. The tag cruft will remove itself over time as 
people edit the objects.
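A sketch of what such a mechanical edit might check, assuming each object's tags are already available as a plain dict (downloading and uploading are out of scope). It proposes deleting the -1 placeholder values and collapsing repeated spaces in names; it deliberately leaves surface alone, since whether a road is really paved cannot be decided from the tags. Any real mechanical edit would still have to go through the community process in the OSM Automated Edits code of conduct.

```python
import re

_MULTISPACE = re.compile(r" {2,}")


def proposed_fixes(tags):
    """Return proposed tag changes for a few known CanVec quirks.

    A value of None means "delete this tag". Nothing is applied here;
    the result is meant for human review before any mechanical edit.
    """
    fixes = {}
    # Extra spaces in names, e.g. "Rue  Sherbrooke" -> "Rue Sherbrooke".
    for key in ("name", "addr:street"):
        value = tags.get(key)
        if value is not None:
            cleaned = _MULTISPACE.sub(" ", value).strip()
            if cleaned != value:
                fixes[key] = cleaned
    # lanes=-1 and addr:housenumber=-1 are placeholders, not real data.
    for key in ("lanes", "addr:housenumber"):
        if tags.get(key) == "-1":
            fixes[key] = None
    return fixes


if __name__ == "__main__":
    road = {"highway": "residential", "name": "Rue  Sherbrooke",
            "lanes": "-1", "surface": "unpaved"}
    print(proposed_fixes(road))
```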


Overlapping water/trees from CanVec are so easy to find that I'm not 
sure a QA tool is the best choice when the time to fix hugely outweighs 
the time to find.


Address interpolation indicating roads where there are no roads is an 
interesting one, and might be suitable to a QA tool.
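A minimal sketch of such a QA check, assuming the interpolation ways have been reduced to hypothetical records with a street name and start/end house numbers, and that the set of road names in the same area is already known. It also folds in two of the CanVec quirks listed above (the -1 house numbers and interpolations covering a single number). As Daniel Begin notes in his reply, a missing road can mean either a road that was never imported or an address that is simply wrong, so every hit still needs a human to decide which.

```python
def interpolation_issues(interpolations, road_names):
    """Flag address-interpolation ways that look suspicious.

    `interpolations` is a list of hypothetical dicts with keys "id",
    "street", "start" and "end" (house numbers as strings); `road_names`
    is the set of highway names in the same area.
    """
    issues = []
    for way in interpolations:
        street = " ".join(way["street"].split())  # ignore extra-space quirks
        if street not in road_names:
            issues.append((way["id"], "no road with this name nearby"))
        if "-1" in (way["start"], way["end"]):
            issues.append((way["id"], "invalid house number -1"))
        elif way["start"] == way["end"]:
            issues.append((way["id"], "single number mapped as interpolation"))
    return issues


if __name__ == "__main__":
    ways = [
        {"id": 1, "street": "Main Street", "start": "2", "end": "40"},
        {"id": 2, "street": "Ghost  Road", "start": "1", "end": "1"},
        {"id": 3, "street": "Main Street", "start": "-1", "end": "9"},
    ]
    for issue in interpolation_issues(ways, {"Main Street"}):
        print(issue)
```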




[Talk-ca] OSM data quality in Canada

2015-06-17 Thread Martijn van Exel
Hello list — 

My name is Martijn van Exel, I am on the OSM US board and work at Telenav. I’ve 
written to this list a few times before, but this time I am doing so with my 
Telenav hat on. Perhaps you know that we have the Scout apps (iOS, Android) 
which run on OSM data. (If you haven’t yet, please give Scout a try some time 
and let me know what you think!)

We are always looking into ways to make significant contributions to OSM, in 
the US, Canada and elsewhere. We’re starting to look into Canada more, and I 
could really use your help with a few key questions:

* What is the imports history, particularly in relation to road network, POIs 
and addresses? (Beyond what’s in the import catalogue page on the wiki, if 
anything)
* What external (government and otherwise) open geospatial data sources are out 
there that have been or may be considered for improving OSM?
* Are there any Canada-specific mapping and tagging conventions?
* Are there any known big (national) issues in the Canadian OSM data? 
(misguided imports / bots, major tagging disputes, that kind of thing)
* Which (other) companies / organizations / government agencies use OSM data 
for Canada?
* Any suggestions for QA tools that would help the community, either existing 
or new?

I’m happy to discuss on-list or off. Thanks!

Martijn


Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Martijn van Exel
Unrelated, but I noticed that talk-ca is not archived on Nabble yet - this 
makes it hard to share and follow a conversation as a non-subscriber. I don’t 
know what’s involved in adding this list, or whether anyone would object.

Martijn





Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Andrew MacKinnon
See http://wiki.openstreetmap.org/wiki/CanVec. CanVec data was
converted to OSM format and is stored at
http://ftp2.cits.rncan.gc.ca/OSM/pub/, and is split into files based
on the National Topographic System, and then data was imported in some
parts of Canada by manually cutting and pasting data from these files
into JOSM. I did this in a large part of southern Ontario and some
other users have done this as well. Importing CanVec data this way and
correcting all the errors is tedious and hasn't been completed for all
of Canada, and I haven't done very much with this for several years.
Before this, more primitive imports were done, perhaps around
2008-2009, and these imports are extremely low quality. I can't
remember which OSM user did them. When OSM was new there was not
much data in OSM, so a lot of imports were done and many of these
imports were poor quality; now that OSM is more mature, imports are
increasingly viewed unfavourably and there is a general attitude that
data should be collected by surveying whenever possible.

It would probably be best to use the newest version of the Geobase
National Road Network (http://www.geobase.ca/) and compare this to the
data in OSM and make corrections that way. Keep in mind that this data
has errors and municipal datasets (where available) are always better
quality.
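A very reduced sketch of the NRN-versus-OSM comparison suggested above, in the spirit of the TIGER delta tool: it lists reference road names that have no counterpart in OSM for the same area. It assumes both name lists have already been extracted for one area (for example one map sheet); real comparison tools also match geometry, which this sketch does not attempt.

```python
def normalize(name):
    """Case-fold and collapse whitespace so trivial differences don't count."""
    return " ".join(name.split()).casefold()


def missing_from_osm(reference_names, osm_names):
    """Road names in the reference dataset with no normalized match in OSM."""
    osm = {normalize(n) for n in osm_names}
    return sorted({n for n in reference_names if normalize(n) not in osm})


if __name__ == "__main__":
    nrn = ["Main Street", "Oak  Avenue", "Elm Road"]
    osm = ["main street", "Elm Road"]
    print(missing_from_osm(nrn, osm))
```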



Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Andrew MacKinnon
Also see Ordnance Survey Locator Musical Chairs
http://wiki.openstreetmap.org/wiki/OS_Locator_Musical_Chairs and
http://ris.dev.openstreetmap.org/oslmusicalchairs/map for a
comparison tool that compares UK Ordnance Survey data with OSM data,
similar to the TIGER fixup tool.




Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Martijn van Exel
Hi Andrew, 

Thanks for elaborating on the CanVec / Geobase imports! This also raises new 
questions. See below.

 On Jun 17, 2015, at 3:00 PM, Andrew MacKinnon andrew...@gmail.com wrote:
 
 A lot of the data in Canada was imported from CanVec and Geobase,
 some of it by me several years ago. The imported data is pretty poor
 quality in many places. I haven't done much work on this recently, as
 imports have a bad reputation in OSM and I am mostly concerned with
 surveying. For example:
 
 - Some older road data comes from an import which combined CanVec and
 Statistics Canada road names, attempting to match the road names in
 Statistics Canada with roads without names from CanVec, and this data
 is poor quality.

Is this described in more detail anywhere? Are the data / scripts / process 
still available? Which data was poor quality, CanVec or Statistics Canada?

 - Road data in some areas is missing entirely.

This is probably easy to visualize, but do you happen to know where / why?

 - The CanVec address data is low quality and often broken (e.g. address
 ranges are split in half at tile boundaries), and it comes from several
 different versions of CanVec.
 - Other CanVec layers such as woods, lakes and so on were imported in
 some areas but not others. Much of this data is low quality.

Was some sort of progress page kept so we could see where certain features were 
imported or not (yet)? Has a followup ever been considered to augment / fix 
these botched / low quality imports? 

 - Some road names have too many spaces, e.g. "John Street" is tagged as
 "John  Street" (two spaces). Some address ranges are like that as well.
 - lanes=-1 and surface=unpaved for roads that are really paved in Quebec.
 - Better quality municipal GIS datasets are now available in some
 cities like Toronto, Peel Region and York Region and if they are
 properly licensed, these should be used whenever possible. There
 generally are some minor errors in these datasets, but they are far
 better quality than CanVec/Geobase.

Ah, interesting. Is there already a list of these candidates or would it make 
sense to start one and look into proper licensing?

 
 I really like the TO-FIX Tiger Delta layer at
 http://osmlab.github.io/to-fix/#/task/tigerdelta which matches TIGER
 data with OSM data and tries to find errors. It would be helpful if a
 similar tool were created for Canada.

Obviously I am partial to MapRoulette, but sure, let me check it out, I am sure 
we can come up with something similar for Canada. What would the reference data 
be instead of TIGER?

Again, thanks for your insights, Andrew.

Martijn


Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Andrew MacKinnon
A lot of the data in Canada was imported from CanVec and Geobase,
some of it by me several years ago. The imported data is pretty poor
quality in many places. I haven't done much work on this recently, as
imports have a bad reputation in OSM and I am mostly concerned with
surveying. For example:

- Some older road data comes from an import which combined CanVec and
Statistics Canada road names, attempting to match the road names in
Statistics Canada with roads without names from CanVec, and this data
is poor quality.
- Road data in some areas is missing entirely.
- The CanVec address data is low quality and often broken (e.g. address
ranges are split in half at tile boundaries), and it comes from several
different versions of CanVec.
- Other CanVec layers such as woods, lakes and so on were imported in
some areas but not others. Much of this data is low quality.
- Some road names have too many spaces, e.g. "John Street" is tagged as
"John  Street" (two spaces). Some address ranges are like that as well.
- lanes=-1 and surface=unpaved for roads that are really paved in Quebec.
- Better quality municipal GIS datasets are now available in some
cities like Toronto, Peel Region and York Region and if they are
properly licensed, these should be used whenever possible. There
generally are some minor errors in these datasets, but they are far
better quality than CanVec/Geobase.

I really like the TO-FIX Tiger Delta layer at
http://osmlab.github.io/to-fix/#/task/tigerdelta which matches TIGER
data with OSM data and tries to find errors. It would be helpful if a
similar tool were created for Canada.





Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Andrew Lester
If this is the consensus, I've been blissfully unaware and the wiki needs to be 
updated. The Canadian tagging guidelines 
(https://wiki.openstreetmap.org/wiki/Canadian_tagging_guidelines#Regional_roadways_.28below_provincially_controlled.29)
recommend using unclassified when not in residential areas, and that's how 
I've been tagging. The CanVec imports generally use residential, as you describe, 
which has led to a lot of mis-tagged highways, but I wouldn't say this is a 
consensus agreement that this is how we want it to be. It’s just how the data 
was imported. I'm gradually re-tagging such highways in my area, but there are a 
lot that need to be fixed across very large areas and not many people working 
on it.
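The guideline described above (unclassified outside residential areas) can be approximated with a simple geometric test. This is a hypothetical heuristic, not the wiki's own wording: a road is suggested as residential if any of its vertices falls inside a residential-area polygon, using an even-odd ray-casting test on planar coordinates. A road that crosses an area without having a vertex inside it will be missed, so the suggestions are for review, not for blind re-tagging.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def suggest_highway_tag(road_points, residential_areas):
    """Suggest 'residential' if any road vertex lies in a residential area,
    otherwise 'unclassified' (a heuristic reading of the Canadian guideline)."""
    for x, y in road_points:
        if any(point_in_polygon(x, y, ring) for ring in residential_areas):
            return "residential"
    return "unclassified"


if __name__ == "__main__":
    town = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(suggest_highway_tag([(5, 5), (15, 5)], [town]))
    print(suggest_highway_tag([(20, 20), (30, 20)], [town]))
```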

 

Andrew Lester
Victoria, BC, Canada

From: Harald Kliems [mailto:kli...@gmail.com] 
Sent: Wednesday, June 17, 2015 1:27 PM
To: Martijn van Exel; talk-ca@openstreetmap.org
Subject: Re: [Talk-ca] OSM data quality in Canada

A few things I can think of:

On Wed, Jun 17, 2015 at 3:13 PM Martijn van Exel m...@rtijn.org wrote:

* Are there any Canada-specific mapping and tagging conventions?

- There seems to be a strong consensus that what elsewhere would be 
highway=unclassified is highway=residential, no matter if the road is in a 
populated area or not.



Re: [Talk-ca] OSM data quality in Canada

2015-06-17 Thread Harald Kliems
A few things I can think of:

On Wed, Jun 17, 2015 at 3:13 PM Martijn van Exel m...@rtijn.org wrote:

 * Are there any Canada-specific mapping and tagging conventions?

- There seems to be a strong consensus that what elsewhere would be
highway=unclassified is highway=residential, no matter if the road is in a
populated area or not.

* Are there any known big (national) issues in the Canadian OSM data?
 (misguided imports / bots, major tagging disputes, that kind of thing)

I believe these mostly affect Quebec, but there are two import problems
that never got systematically fixed, as far as I know:
- CanVec import of highways where lanes=-1 and surface=unpaved.
- CanVec or Geobase import where there is an extra blank between the street
type designation and the name. E.g. Rue__Sherbrooke instead of
Rue_Sherbrooke.

 Harald (now in the US)