Re: [OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags
Hi Martin,

On the first pass, I am not checking individual Q-ID numbers, mostly because the existing tooling for that is very poor and the error rate is very low. JOSM simply looks up the ID and adds it. BUT, once it is added, I run a query (OT) for the tags, match them against the Wikidata query results, and check in a spreadsheet that the names and other tags match, which lets me quickly catch the very few non-matching or broken items.

This method has shown much, much more value than simply letting people copy/paste IDs, as humans tend to make quite a few mistakes - I saw incorrect language codes for Wikipedia links (probably typed by hand), or simply stale or non-existent WP links. No approach is perfect, but I hope mine will result in much higher quality and better completeness.

You are correct that Wikidata sometimes interlinks non-related articles (this usually gets fixed right away), or articles with a different scope (somewhat more common and more permanent). In rare cases, that would mean the Wikidata ID is wrong or not specific enough, but that is very easy to catch and correct on the second pass, when the actual data is compared. The more common case is the one I mentioned before: the Wikipedia articles (in all of the linked languages) are about multiple concepts (e.g. administrative and ceremonial district together), but there exists another Wikidata ID, not linked to any articles, just for the admin district. That means the linked one is more for the ceremonial district, and the tag should be fixed (usually with some additional searching).

So yes, blindly adding tags would work fine for >99%, and would not be good for the other <1% (guessing). Yet it would still be good to have tags on that 1%, because they allow much better further validation and correction, whereas a Wikipedia link is just a string of text that is much harder to work with when cross-verifying with other sources.
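The second-pass check described above could be sketched roughly as follows. This is only an illustration, not the actual spreadsheet workflow: the function name `find_mismatches` and the input shapes (`osm_elements` as a list of dicts with `id` and `tags`, `wikidata_labels` as a mapping from Q-ID to a dict of labels keyed by language) are assumptions made for the example, as is the choice to compare only names.

```python
import re

# A Wikidata item ID is "Q" followed by a positive integer.
QID_RE = re.compile(r"Q[1-9][0-9]*")


def find_mismatches(osm_elements, wikidata_labels):
    """Return (element id, reason) pairs for items that need a manual look.

    osm_elements    -- hypothetical export: [{"id": ..., "tags": {...}}, ...]
    wikidata_labels -- hypothetical query result: {"Q123": {"en": "Label"}}
    """
    problems = []
    for element in osm_elements:
        tags = element["tags"]
        qid = tags.get("wikidata", "")
        if not QID_RE.fullmatch(qid):
            problems.append((element["id"], "malformed wikidata tag"))
            continue
        labels = wikidata_labels.get(qid)
        if labels is None:
            problems.append((element["id"], "Q-ID missing from Wikidata results"))
            continue
        # Compare every OSM name/name:xx value against every Wikidata label,
        # ignoring case; one shared name is treated as a match.
        osm_names = {
            value.casefold()
            for key, value in tags.items()
            if key == "name" or key.startswith("name:")
        }
        wd_names = {label.casefold() for label in labels.values()}
        if osm_names.isdisjoint(wd_names):
            problems.append((element["id"], "no name in common"))
    return problems
```

A name mismatch here only means "look at this one by hand"; it does not by itself say which side is wrong.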
And BTW, Wikidata is far from perfect either - the admin tree structure for most of England is frequently incorrect there and should also be fixed, which is something this work will also help with - win-win for everyone :)

On Fri, Nov 25, 2016 at 6:24 PM Martin Koppenhoefer wrote:

> sent from a phone
>
> > On 25 Nov 2016, at 22:55, Yuri Astrakhan <yuriastrak...@gmail.com> wrote:
> >
> > I am simply converting existing Wikipedia tag into the Wikidata tags, because there is always a 1 to 1 matching between them,
>
> are you checking individually and critically whether the osm objects fit the wikidata object definitions, or are you just adding wikidata tags for wikipedia articles that are already linked from osm?
>
> Afaik many wikidata objects are linked to several wikipedia articles (because of wp articles being written in different languages). Using wikipedia quite a bit in 3 languages, I have found that inconsistencies aren't that rare ("wrong" articles interlinked). Partly this is because wp articles in different languages are mostly not translations but articles with varying coverage, levels of detail and focus (i.e. a wikidata object that fits an English article does not necessarily fit the German article that is linked to the English one). Some linked articles are also simply wrong.
>
> One example: in the field of geographic places and settlements, socio-geographic places and political territorial entities can be either mixed in the same article or split over different articles, and this might also differ between languages (some languages might have 1 article dealing with both, others might have 2 or more). Wikidata seems to have a preference for administrative entities and related statements (not sure, it is just a first impression) in all cases I have seen so far (even when there's a different object that also deals with the administrative entity).
>
> Misguided wikipedia tags are not very frequent in osm, but they do occur of course. Blindly adding corresponding wikidata tags might make it look more consistent even if the tag is wrong, because both tags seem to confirm each other.
>
> cheers,
> Martin

___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk
Re: [OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags
sent from a phone

> On 25 Nov 2016, at 22:55, Yuri Astrakhan wrote:
>
> I am simply converting existing Wikipedia tag into the Wikidata tags, because there is always a 1 to 1 matching between them,

are you checking individually and critically whether the osm objects fit the wikidata object definitions, or are you just adding wikidata tags for wikipedia articles that are already linked from osm?

Afaik many wikidata objects are linked to several wikipedia articles (because of wp articles being written in different languages). Using wikipedia quite a bit in 3 languages, I have found that inconsistencies aren't that rare ("wrong" articles interlinked). Partly this is because wp articles in different languages are mostly not translations but articles with varying coverage, levels of detail and focus (i.e. a wikidata object that fits an English article does not necessarily fit the German article that is linked to the English one). Some linked articles are also simply wrong.

One example: in the field of geographic places and settlements, socio-geographic places and political territorial entities can be either mixed in the same article or split over different articles, and this might also differ between languages (some languages might have 1 article dealing with both, others might have 2 or more). Wikidata seems to have a preference for administrative entities and related statements (not sure, it is just a first impression) in all cases I have seen so far (even when there's a different object that also deals with the administrative entity).

Misguided wikipedia tags are not very frequent in osm, but they do occur of course. Blindly adding corresponding wikidata tags might make it look more consistent even if the tag is wrong, because both tags seem to confirm each other.

cheers,
Martin
[OSM-talk] Semi-auto converting Wikipedia -> Wikidata tags
Hi,

I am exploring ways to make more educational maps in Wikipedia. For example, this graph shows all US state governors. It works by querying Wikidata for the governors' info, and drawing state overlays using OSM relations tagged with the Wikidata IDs.

https://www.mediawiki.org/wiki/Help:Extension:Kartographer#GeoShapes_via_Wikidata_Query

This new technology should (hopefully) enhance location and politics related articles. To work, it relies on the Wikidata-tagged objects in OSM, so the more objects are tagged, the more interesting maps the community can create.

While the top levels (countries, states) are already tagged, the smaller areas tend to have just the Wikipedia tag. I have been adding the matching Wikidata tag to many admin-level relations using JOSM's "Fetch Wikidata ID" command (Wikipedia plugin). This works great most of the time, but on occasion it is not perfect. For example, in England there are Administrative and Ceremonial (historical) parishes. Both would be tagged with the same Wikipedia tag because both concepts are described in the same article, yet the matching Wikidata ID would usually cover just one aspect (usually the ceremonial one), but not the admin one.

I plan to do the following:

* Going from admin_level 1..10+, for all locations that have a Wikipedia tag but no Wikidata tag, add the matching Wikidata IDs using the Wikipedia plugin's "Fetch Wikidata ID" command. At the moment, the Wikipedia plugin does not automatically resolve Wikipedia page redirects (if a page was renamed), so I often have to do it by hand.

* Once all areas are marked, I would like to ensure that Wikidata and OSM are in sync, by checking that the Wikidata tags actually point to admin areas, and that the tree structures in OSM and in Wikidata match. E.g. this query shows the tree structure of Wikidata. If anyone has any CC0 sources of the admin structure of the countries, please msg me.
  https://www.wikidata.org/w/index.php?title=User:Yurik/Admin_regions

To clarify - I am NOT adding Wikidata IDs by some magical GPS coordinate resolution or name matching. I am simply converting the existing Wikipedia tag into the Wikidata tag, because there is always a 1 to 1 matching between them, and adding a Wikidata tag ensures that even if the WP article is renamed or deleted, at least the Wikidata tag stays valid. Adding a WD tag that describes the ceremonial parish rather than the admin district is "incrementally beneficial", in the sense that it is still relevant: it points to the right Wikipedia article, and it also makes it easier to improve the tag later so that it points to the admin district, via semi-automated (spreadsheet/text checks) validation or checking for dups.

Thanks!
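The first step of the plan above (picking objects that have a wikipedia tag but no wikidata tag, then looking up the matching ID) could be sketched like this. It does not reproduce what JOSM's Wikipedia plugin does internally. Instead it builds a request to the standard MediaWiki `action=query` API, where `redirects=1` follows renamed pages and the `wikibase_item` page property carries the Q-ID. The helper names, and the assumption that the API host is `<lang>.wikipedia.org`, are illustrative only.

```python
from urllib.parse import urlencode


def needs_wikidata(tags):
    """True for objects that have a wikipedia tag but no wikidata tag."""
    return "wikipedia" in tags and "wikidata" not in tags


def parse_wikipedia_tag(value):
    """Split an OSM wikipedia tag value 'lang:Title' into (lang, title)."""
    lang, sep, title = value.partition(":")
    if not sep or not lang.strip() or not title.strip():
        raise ValueError(f"not a 'lang:Title' value: {value!r}")
    # Underscores and spaces are interchangeable in page titles.
    return lang.strip().lower(), title.strip().replace("_", " ")


def lookup_url(lang, title):
    """Build the MediaWiki API URL that returns the page's Wikidata ID.

    Assumes the wiki lives at <lang>.wikipedia.org (hypothetical shortcut;
    a few language codes may need special handling).
    """
    params = {
        "action": "query",
        "titles": title,
        "redirects": 1,            # follow page renames
        "prop": "pageprops",
        "ppprop": "wikibase_item", # the linked Wikidata Q-ID
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)
```

Values that do not parse as `lang:Title` raise an error rather than being guessed at, so hand-typed or malformed tags surface early instead of being silently converted.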