Re: [OSM-talk] Canadian OSM POI quality

2014-06-09 Thread Minh Nguyen

On 2014-06-08 21:57, Paul Norman wrote:

> Inconsistent tagging was observed with some shops, such as variability
> between amenity=restaurant and amenity=fast_food. This should reflect
> differences between locations but may not. Inconsistent names were also
> observed, such as "Walmart", "Wal-mart", and "Wal Mart". These issues
> are not as significant as the large percentage of missing shops.


Inconsistency with Walmart is to be expected, since signs would've read 
"Wal★Mart" until at least 2008. [1]


All the apostrophe-esses in your results made me wonder how common it is 
to curl apostrophes in POI names. Overpass turbo shows 17,258 nodes and 
1,263 ways with a name containing "’s" (in any language). [2] Few of 
them appear in Canada. (Most of the results there are for David Thompson 
Explorer’s Trail in Alberta and Cook’s Creek in Manitoba.)
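Here is a minimal Python sketch of the same check, for names already extracted locally rather than queried through Overpass. It assumes the name tags are available as a list of strings. The three sample names are illustrative, not query results.

```python
import re

# U+2019 (right single quotation mark) followed by "s": the same pattern as
# the Overpass turbo filter "name"~"’s" in the link above.
CURLY_POSSESSIVE = re.compile("\u2019s")


def count_curly_possessives(names):
    """Count names that contain a curled apostrophe followed by 's'."""
    return sum(1 for name in names if CURLY_POSSESSIVE.search(name))


# Illustrative names only; a real run would read the name tags of an extract.
sample_names = [
    "David Thompson Explorer\u2019s Trail",  # curled apostrophe
    "Cook\u2019s Creek",                      # curled apostrophe
    "Tim Horton's",                           # straight ASCII apostrophe
]

print(count_curly_possessives(sample_names))  # prints 2
```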


[1] http://corporate.walmart.com/our-story/history/walmart-logo-timeline
[2] http://overpass-turbo.eu/?w=%22name%22~%22%E2%80%99s%22

--
m...@nguyen.cincinnati.oh.us


___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


[OSM-talk] Canadian OSM POI quality

2014-06-08 Thread Paul Norman

I was curious how complete OpenStreetMap shop data was, so I decided to
analyze some Canadian chains.

The results were mixed. Starting with a Canada extract, I loaded the
data into PostGIS and ran queries against the name, brand and franchise
tags of objects where amenity, office or shop was not null. Brands that
have recently changed names (e.g. Zellers/Target) were avoided.

OSM completeness varied from 7% to 81%, with no overwhelming trends. The
four major fast food and restaurant chains considered ranged from 33%
to 51%. Shops opening and closing affect the accuracy of these results,
and the external sources for the number of shops may vary in accuracy.

Method
==

The Geofabrik extract for Canada was imported with osm2pgsql using a
custom .style file containing name, operator, brand, franchise, amenity,
building, office and shop columns. The last four columns caused an object
to be placed in the polygon table. After import, the tables were filtered
to remove rows with none of the amenity, office or shop tags.
Appropriate indexes were added. A view was created combining the point
and polygon tables and exposing lower-case versions of the name,
operator, brand and franchise tags.
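You can approximate the filtering and view steps outside PostGIS for experimentation. This minimal Python sketch uses the standard-library sqlite3 module in place of PostGIS/osm2pgsql. It assumes osm2pgsql's default planet_osm_point and planet_osm_polygon table names, leaves out geometry and indexes, and uses hypothetical rows. The lname, lbrand and lfranchise column names come from the query in the post. The name loperator is an assumption.

```python
import sqlite3

# In-memory SQLite stands in for PostGIS; geometry columns are left out.
# Table names assume osm2pgsql's default "planet_osm" prefix.
conn = sqlite3.connect(":memory:")
TAG_COLUMNS = ("name", "operator", "brand", "franchise",
               "amenity", "building", "office", "shop")
for table in ("planet_osm_point", "planet_osm_polygon"):
    cols = ", ".join(f'"{c}" TEXT' for c in TAG_COLUMNS)
    conn.execute(f"CREATE TABLE {table} (osm_id INTEGER, {cols})")

# Hypothetical rows: two POIs and two objects without a POI tag.
conn.executemany(
    "INSERT INTO planet_osm_point (osm_id, name, amenity) VALUES (?, ?, ?)",
    [(1, "McDonald's", "fast_food"), (2, "Bench", None)],
)
conn.executemany(
    "INSERT INTO planet_osm_polygon (osm_id, name, building, shop) "
    "VALUES (?, ?, ?, ?)",
    [(3, "Walmart", "retail", "supermarket"), (4, None, "house", None)],
)

# Keep only rows carrying at least one of amenity, office or shop.
for table in ("planet_osm_point", "planet_osm_polygon"):
    conn.execute(f"DELETE FROM {table} "
                 "WHERE amenity IS NULL AND office IS NULL AND shop IS NULL")

# One view over both tables, with lower-cased copies of the matched tags.
# SQLite's built-in lower() only folds ASCII letters.
conn.execute("""
    CREATE VIEW shops AS
    SELECT osm_id, name, amenity, office, shop,
           lower(name) AS lname, lower("operator") AS loperator,
           lower(brand) AS lbrand, lower(franchise) AS lfranchise
      FROM planet_osm_point
    UNION ALL
    SELECT osm_id, name, amenity, office, shop,
           lower(name), lower("operator"), lower(brand), lower(franchise)
      FROM planet_osm_polygon
""")

print(conn.execute("SELECT osm_id, lname FROM shops ORDER BY osm_id").fetchall())
```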

Queries of the following form were run:

    SELECT COUNT(*) FROM shops
      WHERE lname LIKE :'name' OR lbrand LIKE :'name'
        OR lfranchise LIKE :'name';

psql substituted the search pattern for :'name', for example 'mcdonald%'
for McDonald's. The queries were intended to catch all possible shops,
even at the cost of false positives. Brands were not selected in any
systematic manner.
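Here is a sketch of the counting loop in Python, with the standard-library sqlite3 module standing in for psql and PostGIS. The shops table is a hypothetical, hand-filled stand-in for the view. Its fourth row shows the kind of false positive that a broad pattern such as 'mcdonald%' accepts. PostgreSQL's LIKE is case-sensitive, which is why the view holds lower-cased tags. SQLite's LIKE ignores ASCII case anyway.

```python
import sqlite3

# Hypothetical miniature of the "shops" view, reduced to the matched columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (lname TEXT, lbrand TEXT, lfranchise TEXT)")
conn.executemany(
    "INSERT INTO shops VALUES (?, ?, ?)",
    [
        ("mcdonald's", "mcdonald's", None),
        ("mcdonalds", None, None),          # variant spelling, name only
        (None, "mcdonald's", None),         # brand tagged, no name
        ("mcdonald hardware", None, None),  # false positive the pattern admits
        ("subway", "subway", None),
    ],
)

# ":name" is sqlite3's named-parameter style, standing in for psql's :'name'.
# The pattern is bound as a string value, not spliced into the SQL text.
QUERY = """
    SELECT COUNT(*) FROM shops
     WHERE lname LIKE :name OR lbrand LIKE :name OR lfranchise LIKE :name
"""


def count_matches(pattern):
    """Count rows whose name, brand or franchise matches a LIKE pattern."""
    return conn.execute(QUERY, {"name": pattern}).fetchone()[0]


for pattern in ("mcdonald%", "subway%"):
    print(pattern, count_matches(pattern))
```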

Public sources were used for the true number of shops of a particular
chain, generally Wikipedia or public data aggregators.

Results
===
Chain               OSM   True   Completeness
Tim Horton's       1480   4304   34%
Subway              849   2563   33%
McDonald's          722   1417   51%
Starbucks           592   1363   43%
  (Both OSM and Google use both "Starbucks" and "Starbucks Coffee".)
A & W               292    800   37%
Domino's             67    383   17%
Wendy's             224    369   61%
Burger King         150    281   53%
East Side Mario's    46     85   54%
Milestones           32     44   73%
Chili's              10     16   63%

Sears               105   1570    7%
Rona                122    500   24%
The Bay              50    421   12%
Walmart             265    382   69%
Home Depot           95    180   53%

Canadian Tire       400    491   81%
  (May be double-counting automotive centers.)
Chapters             47    233   20%
Sleep Country        22    179   12%
London Drugs         54     78   69%
  (May be double-counting some stores with a pharmacy inside.)
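Completeness is the OSM count divided by the true count. This short Python check recomputes the column from the counts in the table. Rounding half up is an assumption, but it reproduces every listed percentage. That includes A & W (292/800 = 36.5%) and Chili's (10/16 = 62.5%), the two rows where the rounding rule matters.

```python
# (chain, OSM count, true count, reported completeness %) from the table above.
ROWS = [
    ("Tim Horton's", 1480, 4304, 34),
    ("Subway", 849, 2563, 33),
    ("McDonald's", 722, 1417, 51),
    ("Starbucks", 592, 1363, 43),
    ("A & W", 292, 800, 37),
    ("Domino's", 67, 383, 17),
    ("Wendy's", 224, 369, 61),
    ("Burger King", 150, 281, 53),
    ("East Side Mario's", 46, 85, 54),
    ("Milestones", 32, 44, 73),
    ("Chili's", 10, 16, 63),
    ("Sears", 105, 1570, 7),
    ("Rona", 122, 500, 24),
    ("The Bay", 50, 421, 12),
    ("Walmart", 265, 382, 69),
    ("Home Depot", 95, 180, 53),
    ("Canadian Tire", 400, 491, 81),
    ("Chapters", 47, 233, 20),
    ("Sleep Country", 22, 179, 12),
    ("London Drugs", 54, 78, 69),
]


def completeness(osm, actual):
    """100 * osm / actual, rounded half up, using only integer arithmetic."""
    return (200 * osm + actual) // (2 * actual)


mismatches = [name for name, osm, actual, pct in ROWS
              if completeness(osm, actual) != pct]
values = [completeness(osm, actual) for _, osm, actual, _ in ROWS]
print(mismatches, min(values), max(values))  # prints [] 7 81
```

Python's built-in round() uses round-half-to-even and would give 36 and 62 for those two rows. That is why the sketch rounds with integer arithmetic instead.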

Remarks
===

It took significantly longer to find the true number of stores than
to get results from the OSM data. Part of this is my greater familiarity
with OSM tools, but a large part is that OSM does not require tracking
down many different sources to get store counts.

Although no urban/rural analysis was performed, OSM is generally expected
to be more complete in populated urban areas than in low-density rural
areas, and completeness in these urban areas is often more important
for many uses.

No proprietary data sources were available for comparison, but it should
not be assumed that they are any more complete, nor that their naming or
similar tagging is any more consistent. As an example, Google's data was
observed to use both "Starbucks" and "Starbucks Coffee" for the coffee
chain, sometimes listing both for what was really the same location.

The tools used to generate the counts could just as easily extract the
shop data itself for further work.
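Extracting rows instead of counting them only changes the SELECT list. This minimal Python sketch uses sqlite3 and a hypothetical, hand-filled shops table in place of the PostGIS view. The CSV output format and the 'wal%mart%' pattern are illustrative choices. That pattern also catches the "Wal-Mart" spelling.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the "shops" view with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shops (osm_id INTEGER, name TEXT, "
             "lname TEXT, lbrand TEXT, lfranchise TEXT)")
conn.executemany(
    "INSERT INTO shops VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Walmart", "walmart", "walmart", None),
        (2, "Wal-Mart", "wal-mart", None, None),
        (3, "Rona", "rona", "rona", None),
    ],
)


def export_matches(pattern, out):
    """Write the rows matched by a LIKE pattern to `out` as CSV."""
    cur = conn.execute(
        "SELECT osm_id, name FROM shops "
        "WHERE lname LIKE :name OR lbrand LIKE :name OR lfranchise LIKE :name "
        "ORDER BY osm_id",
        {"name": pattern},
    )
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cur.description])  # header row
    writer.writerows(cur)


buffer = io.StringIO()
export_matches("wal%mart%", buffer)
print(buffer.getvalue())
```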

Improving the data
==
Inconsistent tagging was observed with some shops, such as variability
between amenity=restaurant and amenity=fast_food. This should reflect
differences between locations but may not. Inconsistent names were also
observed, such as "Walmart", "Wal-mart", and "Wal Mart". These issues
are not as significant as the large percentage of missing shops.
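Name variants like these can be grouped before you fix them in an editor. This minimal Python sketch assumes the names have already been pulled out of the database. The normalization rule (case-fold, then drop everything except ASCII letters and digits) and the alias table are hypothetical choices, not an OSM convention. The rule only suits Latin-script brand names: it drops accented letters, and names in other scripts collapse to an empty key.

```python
import re
from collections import defaultdict

# Hypothetical alias table for variants that differ by more than punctuation.
ALIASES = {"starbuckscoffee": "starbucks"}


def brand_key(name):
    """Case-fold, keep only ASCII letters and digits, then apply aliases."""
    key = re.sub(r"[^0-9a-z]", "", name.casefold())
    return ALIASES.get(key, key)


def group_variants(names):
    """Map each normalized key to the set of spellings seen for it."""
    groups = defaultdict(set)
    for name in names:
        groups[brand_key(name)].add(name)
    return dict(groups)


# Spellings mentioned in this thread, including the older "Wal★Mart" logo form.
observed = ["Walmart", "Wal-mart", "Wal Mart", "Wal\u2605Mart",
            "Starbucks", "Starbucks Coffee"]
for key, variants in sorted(group_variants(observed).items()):
    print(key, sorted(variants))
```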
