Re: [OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags

2016-11-25 Thread Yuri Astrakhan
Hi Martin,

On the first pass, I am not checking individual Q-ID numbers, mostly
because the existing tooling is very poor for that, and the rate of error
is very low. JOSM simply looks up the ID and adds it. BUT, once it is
added, I run an Overpass Turbo (OT) query for the tags, match them against
the Wikidata query results, and check that the names and other tags match
(in a spreadsheet), which lets me quickly catch the very few non-matching
or broken items. This method has proven far more valuable than simply
letting people copy/paste IDs, as humans tend to make quite a few mistakes:
I saw incorrect language codes for Wikipedia links (probably typed by
hand), or simply stale or non-existent WP links. None of the approaches is
perfect, but I hope mine will result in much higher quality and more
completeness.
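The second-pass cross-check described above could look roughly like the following Python sketch. It assumes the Overpass Turbo results and the Wikidata query results have already been exported into simple records; the class names, field names and the loose language-code pattern are hypothetical, not part of any existing tool. It flags malformed wikipedia tags, QIDs missing from the query results, sitelinks that disagree with the tag (stale links or wrong language codes) and names that differ from the Wikidata label.

```python
import re
from dataclasses import dataclass, field

@dataclass
class OsmObject:
    osm_id: str
    name: str
    wikipedia: str  # OSM convention: "<lang>:<Article title>"
    wikidata: str   # e.g. "Q123"

@dataclass
class WikidataItem:
    qid: str
    label: str
    sitelinks: dict = field(default_factory=dict)  # language code -> article title

# Loose language-prefix pattern ("en", "de", "zh-yue"); an assumption, not an authoritative list.
WP_TAG = re.compile(r"^([a-z]{2,3}(?:-[a-z]+)*):(.+)$")

def check(obj: OsmObject, items: dict) -> list:
    """Return human-readable problems for one OSM object (empty list if it looks fine)."""
    m = WP_TAG.match(obj.wikipedia)
    if not m:
        return [f"{obj.osm_id}: malformed wikipedia tag {obj.wikipedia!r}"]
    lang, title = m.group(1), m.group(2).replace("_", " ")
    item = items.get(obj.wikidata)
    if item is None:
        return [f"{obj.osm_id}: {obj.wikidata} not found in the Wikidata results"]
    problems = []
    linked = item.sitelinks.get(lang)
    if linked != title:
        # Stale/renamed article, wrong language code, or wrong item.
        problems.append(f"{obj.osm_id}: {lang} sitelink is {linked!r}, tag says {title!r}")
    if item.label.casefold() != obj.name.casefold():
        problems.append(f"{obj.osm_id}: name {obj.name!r} differs from label {item.label!r}")
    return problems

items = {"Q1": WikidataItem("Q1", "Springfield", {"en": "Springfield"})}
for obj in [OsmObject("r1", "Springfield", "en:Springfield", "Q1"),
            OsmObject("r2", "Springfield", "en:Old_Springfield", "Q1")]:
    for problem in check(obj, items):
        print(problem)
```

The name comparison is likely the noisiest check, since OSM names and Wikidata labels can legitimately differ, so a manual spreadsheet review is still needed.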

You are correct that Wikidata sometimes interlinks unrelated articles
(usually this gets fixed right away), or articles with different scopes
(somewhat more common and permanent). In rare cases, that would mean the
Wikidata ID is wrong or not specific enough, but that is very easy to
catch and correct on the second pass, when the actual data is compared.
The more common case is the one I mentioned before: the Wikipedia
articles (in all of the linked languages) cover multiple concepts (e.g.
administrative and ceremonial district together), but there exists another
Wikidata ID, not linked to any articles, just for the admin district. This
means the linked one is more about the ceremonial district and should be
fixed (usually by some additional searching).
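A rough Python sketch of the fix described above, assuming a small in-memory extract of Wikidata items. The "kind" field is a hypothetical stand-in for an "instance of" (P31) lookup, and the IDs are made up. If the tagged item is not administrative, the function looks for exactly one same-named administrative item with no linked articles; otherwise it returns None so the case goes to manual searching.

```python
from typing import NamedTuple, Optional

class Item(NamedTuple):
    label: str
    kind: str       # hypothetical stand-in for an "instance of" (P31) lookup
    sitelinks: int  # number of Wikipedia articles linked to the item

def suggest_admin_item(tagged_qid: str, items: dict) -> Optional[str]:
    """Suggest a same-named admin item with no articles, if the tagged one is not admin.

    Returns None when no change seems needed or when the match is ambiguous,
    leaving those cases for manual searching.
    """
    tagged = items[tagged_qid]
    if tagged.kind == "administrative":
        return None
    candidates = [qid for qid, it in items.items()
                  if qid != tagged_qid
                  and it.label == tagged.label
                  and it.kind == "administrative"
                  and it.sitelinks == 0]
    return candidates[0] if len(candidates) == 1 else None

# Illustrative data with made-up IDs.
items = {
    "Q_CER": Item("Exampleshire", "ceremonial", 12),
    "Q_ADM": Item("Exampleshire", "administrative", 0),
}
print(suggest_admin_item("Q_CER", items))
```

Matching on the exact label is deliberately strict; a real search would probably also compare location or parent entities.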

So yes, blindly adding tags would work fine for >99% of objects, and would
not be good for the other <1% (a guess). Yet it would still be good to have
that 1% tagged, because Wikidata IDs allow much better further validation
and correction, whereas a Wikipedia link is just a string of text that is
much harder to work with when cross-verifying with other sources. And BTW,
Wikidata is far from perfect either: the admin tree structure for most of
England is frequently incorrect and should also be fixed, which this work
will also help with. A win-win for everyone :)

On Fri, Nov 25, 2016 at 6:24 PM Martin Koppenhoefer wrote:

> sent from a phone
>
>> On 25 Nov 2016, at 22:55, Yuri Astrakhan <yuriastrak...@gmail.com> wrote:
>>
>> I am simply converting existing Wikipedia tag into the Wikidata tags,
>> because there is always a 1 to 1 matching between them,
>
> Are you checking individually and critically whether the OSM objects fit
> the Wikidata object definitions, or are you just adding Wikidata tags for
> Wikipedia articles that are already linked from OSM?
>
> AFAIK many Wikidata objects are linked to several Wikipedia articles
> (because WP articles are written in different languages). Using
> Wikipedia quite a bit in 3 languages, I have found that inconsistencies
> aren't that rare ("wrong" articles interlinked). Partly this is because WP
> articles in different languages are mostly not translations, but articles
> with varying coverage, level of detail and focus (i.e. a Wikidata object
> that fits an English article does not necessarily fit the German article
> that is linked to the English article). Some linked articles are also
> simply wrong.
>
> One example: in the field of geographic places and settlements,
> socio-geographic places and political territorial entities may either be
> mixed in the same article or be split over different articles, and this
> might also differ between languages (some languages might have 1 article
> dealing with both, others might have 2 or more). Wikidata seems to have a
> preference for administrative entities (not sure, it is just a first
> impression) and related statements in all cases I have seen so far (even
> when there's a different object that also deals with the administrative
> entity).
>
> Misguided Wikipedia tags are not very frequent in OSM, but they do occur,
> of course. Blindly adding the corresponding Wikidata tags might make it
> look more consistent even if the tag is wrong, because both tags seem to
> confirm each other.
>
> cheers,
> Martin
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags

2016-11-25 Thread Martin Koppenhoefer


sent from a phone

> On 25 Nov 2016, at 22:55, Yuri Astrakhan wrote:
> 
> I am simply converting existing Wikipedia tag into the Wikidata tags, 
> because there is always a 1 to 1 matching between them,


Are you checking individually and critically whether the OSM objects fit the 
Wikidata object definitions, or are you just adding Wikidata tags for Wikipedia 
articles that are already linked from OSM?

AFAIK many Wikidata objects are linked to several Wikipedia articles (because 
WP articles are written in different languages). Using Wikipedia quite a bit in 
3 languages, I have found that inconsistencies aren't that rare ("wrong" 
articles interlinked). Partly this is because WP articles in different 
languages are mostly not translations, but articles with varying coverage, 
level of detail and focus (i.e. a Wikidata object that fits an English article 
does not necessarily fit the German article that is linked to the English 
article). Some linked articles are also simply wrong.

One example: in the field of geographic places and settlements, 
socio-geographic places and political territorial entities may either be mixed 
in the same article or be split over different articles, and this might also 
differ between languages (some languages might have 1 article dealing with 
both, others might have 2 or more). Wikidata seems to have a preference for 
administrative entities (not sure, it is just a first impression) and related 
statements in all cases I have seen so far (even when there's a different 
object that also deals with the administrative entity).

Misguided Wikipedia tags are not very frequent in OSM, but they do occur, of 
course. Blindly adding the corresponding Wikidata tags might make it look more 
consistent even if the tag is wrong, because both tags seem to confirm each 
other.

cheers,
Martin 


[OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags

2016-11-25 Thread Yuri Astrakhan
Hi, I am exploring ways to make more educational maps in Wikipedia. For
example, this graph shows all US state governors. It works by querying
Wikidata for the governors' info, and drawing state overlays using OSM
relations tagged with the Wikidata IDs.

https://www.mediawiki.org/wiki/Help:Extension:Kartographer#GeoShapes_via_Wikidata_Query
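As a rough illustration of why untagged relations matter, the join between query results and OSM shapes can be sketched in Python as below. Kartographer performs the real lookup itself; this is not its code, and the record layouts (an "id" QID per query row, "id" and "tags" per relation) are assumptions. Any QID returned by the query that no OSM relation carries simply produces no shape.

```python
def build_overlays(query_rows, relations):
    """Join query results to OSM relations by their wikidata tag.

    query_rows: dicts with an "id" (QID) plus extra properties, e.g. a governor's name
    relations:  dicts with "id" and "tags"
    Returns (overlays, unmatched_qids).
    """
    by_qid = {}
    for rel in relations:
        qid = rel.get("tags", {}).get("wikidata")
        if qid:
            by_qid[qid] = rel
    overlays, unmatched = [], []
    for row in query_rows:
        rel = by_qid.get(row["id"])
        if rel is None:
            unmatched.append(row["id"])  # no OSM relation carries this QID yet, so no shape
            continue
        props = {k: v for k, v in row.items() if k != "id"}
        overlays.append({"relation": rel["id"], "properties": props})
    return overlays, unmatched
```

A relation that has only a wikipedia tag is invisible to this join, which is the motivation for adding the wikidata tags.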

This new technology should (hopefully) enhance location- and
politics-related articles. To work, it relies on Wikidata-tagged
objects in OSM, so the more objects are tagged, the more interesting maps
the community can create. While the top levels (countries, states)
are already tagged, the smaller areas tend to have just the Wikipedia tag.
I have been adding the matching Wikidata tag to many admin-level relations
by using JOSM's "Fetch Wikidata ID" command (Wikipedia plugin). This works
great most of the time, but occasionally it is not perfect. For example, in
England there are administrative and ceremonial (historical) parishes. Both
would be tagged with the same Wikipedia tag because both concepts are
described in the same article, yet the matching Wikidata ID would usually
cover just one aspect (usually the ceremonial one), not the admin one. I
plan to do the following:

* Going from admin_level 1..10+, for all locations that have a Wikipedia
tag but no Wikidata tag, add the matching Wikidata IDs using the Wikipedia
plugin's "Fetch Wikidata ID" command. At the moment, the Wikipedia plugin
does not automatically resolve Wikipedia page redirects (if a page was
renamed), so I often have to do it by hand.
* Once all areas are tagged, I would like to ensure that Wikidata and OSM
are in sync, by checking that Wikidata tags actually point to admin
areas, and that the tree structures in OSM and in Wikidata match. E.g. this
query shows the tree structure of Wikidata. If anyone has any CC0 sources
of the admin structure of the countries, please msg me.

https://www.wikidata.org/w/index.php?title=User:Yurik/Admin_regions
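The first step, including the redirect resolution the plugin currently lacks, can be sketched in Python as follows. It assumes a sitelink index ((language, title) -> QID) and a redirect table ((language, title) -> target title) were fetched beforehand; both inputs and the function names are hypothetical, and the result is a list of proposed edits for review rather than an upload.

```python
def _level(obj: dict) -> int:
    """Numeric admin_level; missing or non-numeric values sort last."""
    try:
        return int(obj["tags"].get("admin_level", ""))
    except ValueError:
        return 99

def propose_wikidata_tags(objects, sitelinks, redirects, max_hops=5):
    """Propose wikidata tags for objects that have a wikipedia tag but no wikidata tag.

    objects:   dicts with "id" and "tags" (OSM-style key/value pairs)
    sitelinks: (lang, title) -> QID        (assumed pre-fetched from Wikidata)
    redirects: (lang, title) -> new title  (assumed pre-fetched from Wikipedia)
    Returns ({osm_id: qid}, [osm_ids that could not be resolved]).
    """
    proposed, unresolved = {}, []
    for obj in sorted(objects, key=_level):  # admin_level 1, 2, ... 10+
        tags = obj["tags"]
        if "wikidata" in tags or "wikipedia" not in tags:
            continue
        lang, _, title = tags["wikipedia"].partition(":")
        title = title.replace("_", " ")
        for _ in range(max_hops):  # follow renames; bounded in case of redirect loops
            target = redirects.get((lang, title))
            if target is None:
                break
            title = target
        qid = sitelinks.get((lang, title))
        if qid:
            proposed[obj["id"]] = qid
        else:
            unresolved.append(obj["id"])
    return proposed, unresolved
```

Unresolved objects (deleted articles, malformed tags) are returned separately so they can be checked by hand.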

To clarify: I am NOT adding Wikidata IDs by some magical GPS coordinate
resolution or name matching. I am simply converting existing Wikipedia tags
into Wikidata tags, because there is always a 1-to-1 match between them,
and adding a Wikidata tag ensures that even if the WP article is renamed or
deleted, at least the Wikidata tag stays valid. Adding a WD tag that
describes the ceremonial parish rather than the admin district is
"incrementally beneficial", in the sense that it is still relevant: it
points to the right Wikipedia article, and it also makes it easier to
improve it further so it points to the admin district, via semi-automated
validation (spreadsheet/text checks) or checking for dups.
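The duplicate check mentioned above could be as simple as the Python sketch below, which groups OSM objects by their wikidata tag (the object layout is an assumption). Not every hit is an error; for example, a place node and its boundary relation may legitimately share a QID, so the output is a review list.

```python
from collections import defaultdict

def find_duplicate_qids(objects):
    """Map each wikidata QID used by more than one OSM object to those objects' ids."""
    users = defaultdict(list)
    for obj in objects:
        qid = obj.get("tags", {}).get("wikidata")
        if qid:
            users[qid].append(obj["id"])
    return {qid: ids for qid, ids in users.items() if len(ids) > 1}
```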

Thanks!