mrephabricator added a comment.
Another example to consider: the dinosaur Changdusaurus (en) was first described in Chinese sources as 昌都龍. In Vietnamese it is called Changtusaurus, transliterated from Chinese using the Vietnamese Latin script. (That other languages have duplicated the English name is likely incidental; there is no reason to prefer one over the other, and like many dinosaur names, this one represents a genus but not one with a Latin taxon name.) If a different dinosaur name derived the same way in Vietnamese and English happened to match, that would not mean the two languages share a name, since the shared letters don't represent the same sounds.

If that "duplicate" were removed, we could say it doesn't matter because a query would return the same fallback anyway, but the same would be true for dinosaurs which never had a Vietnamese name entered to begin with. The information about exactly which labels are homographic between which languages would be lost, and it could not be recovered. This would make working with data within a given language harder, as there would be no way to distinguish mul (fallback added for English and Swedish) from mul (differently pronounced English and Hawaiian words that happen to be written the same way), further skewing data quality outside of a handful of popular languages. At least ensuring that "mul" is understood as meaning "multiple languages" and not "Latin script" could prevent some of this from happening.

I think it would be fitting to give preference to labels which would not fit anywhere else but would be legible in other languages. For example, if the Balti name of a town in Gilgit-Baltistan is added to mul in the absence of a bft Balti code, it would likely be legible to Urdu readers, Kashmiri readers and so on. Then readers of those uncoded languages who use Urdu or English as a locale would still get these names as a fallback.
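To make the lost-information argument concrete, here is a rough Python sketch. It is only an illustration and not how Wikibase implements fallback: `resolve_label`, `collapse_into_mul` and the name "Examplosaurus" are invented for the example, and the fallback chain is assumed to go straight from a language to mul.

```python
from __future__ import annotations

from typing import Optional


def resolve_label(labels: dict[str, str], lang: str) -> Optional[str]:
    """Return the label stored for lang, or the 'mul' label if none is stored.

    Assumption: every language falls back directly to 'mul'.
    """
    if lang in labels:
        return labels[lang]
    return labels.get("mul")


def collapse_into_mul(labels: dict[str, str]) -> dict[str, str]:
    """Drop every per-language label spelled exactly like the 'mul' label."""
    mul = labels.get("mul")
    if mul is None:
        return dict(labels)
    return {
        lang: text
        for lang, text in labels.items()
        if lang == "mul" or text != mul
    }


# Hypothetical item where someone deliberately entered a Vietnamese label
# that happens to be spelled like the English one.
entered_vi = {"mul": "Examplosaurus", "en": "Examplosaurus", "vi": "Examplosaurus"}

# Hypothetical item where no Vietnamese label was ever entered.
never_had_vi = {"mul": "Examplosaurus", "en": "Examplosaurus"}

collapsed_a = collapse_into_mul(entered_vi)
collapsed_b = collapse_into_mul(never_had_vi)

# A query returns the same string either way...
same_answer = resolve_label(collapsed_a, "vi") == resolve_label(collapsed_b, "vi")

# ...but the two items, which differed before, can no longer be told apart.
distinguishable_before = entered_vi != never_had_vi
distinguishable_after = collapsed_a != collapsed_b
```

Under these assumptions the query result is unchanged, yet `distinguishable_after` is false: nothing in the collapsed data records that a Vietnamese label was ever checked or entered.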
TASK DETAIL https://phabricator.wikimedia.org/T306918