mrephabricator added a comment.

  Another example to consider - the dinosaur Changdusaurus (en) was first 
described in Chinese sources as 昌都龍. In Vietnamese, this dinosaur is called 
Changtusaurus, having been transliterated from Chinese using Vietnamese Latin 
script. (That other languages have duplicated the English name is likely 
incidental - there is no reason to prefer one over the other, and like many 
dinosaur names, this represents a genus but not one with a Latin taxon name.) 
If a different dinosaur name derived the same way in Vietnamese and English 
happened to match, that would not mean they have the same name in each 
language, since the shared letters don't represent the same sound. If that 
"duplicate" were removed, one could argue that it does not matter because a 
query would return the same fallback anyway, but the same would be true for 
dinosaurs which never had a Vietnamese name entered to begin with. The 
information about exactly which labels are homographic between which languages 
would be lost, and it could not be recovered afterwards. This would make 
working with data within a given language harder, as there would be no way to 
distinguish a mul label added as a fallback for English and Swedish from a mul 
label that exists only because differently pronounced English and Hawaiian 
words happened to be written the same way. That would further skew data quality 
outside of a handful of popular languages. At the very least, ensuring that 
"mul" is understood as meaning "multiple languages" and not "Latin script" 
could prevent some of this from happening.
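
  A rough sketch of the information loss I mean. This is a simplified model of 
label storage, not Wikibase's actual code; the function names and the 
placeholder label "Examplosaurus" are made up for illustration.

```python
# Hypothetical, simplified model: an item's labels are a dict of
# language code -> text. This is NOT how Wikibase stores labels.

def resolve_label(labels, lang, fallback="mul"):
    """Return the label for lang, or the fallback label if lang has none."""
    return labels.get(lang, labels.get(fallback))


def has_own_label(labels, lang):
    """True only if a label was explicitly entered for lang."""
    return lang in labels


def remove_duplicates(labels, fallback="mul"):
    """Drop every label whose text is identical to the fallback label."""
    base = labels.get(fallback)
    return {
        code: text
        for code, text in labels.items()
        if code == fallback or text != base
    }


# Item A: a Vietnamese label was entered and happens to match the mul label.
item_a = {"mul": "Examplosaurus", "vi": "Examplosaurus"}
# Item B: nobody ever entered a Vietnamese label.
item_b = {"mul": "Examplosaurus"}

item_a_cleaned = remove_duplicates(item_a)

# A query cannot tell the two items apart any more, although only item A
# ever had a Vietnamese name recorded.
same_result = resolve_label(item_a_cleaned, "vi") == resolve_label(item_b, "vi")
record_lost = has_own_label(item_a, "vi") and not has_own_label(item_a_cleaned, "vi")
```

  After the cleanup, item A and item B are the same dict, so nothing in the 
data says which of them once had a confirmed Vietnamese label.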
  
  I think it would be fitting to give preference to labels which would not fit 
anywhere else but would be legible to readers of other languages. For example, 
if the Balti name of a town in Gilgit-Baltistan is added to mul in the absence 
of a bft (Balti) code, it would likely be legible to Urdu readers, Kashmiri 
readers and so on. Then, if readers of such uncoded languages use Urdu or 
English as their locale, they would still get these names as a fallback.
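
  To illustrate with a small sketch: the fallback chains below are assumed for 
the example and are not the configured Wikidata chains, and the label text is a 
placeholder rather than a real town name.

```python
# Assumed fallback chains, for illustration only.
FALLBACK_CHAINS = {
    "ur": ["ur", "mul", "en"],
    "en": ["en", "mul"],
}


def label_for_locale(labels, locale):
    """Walk the locale's fallback chain and return (text, code_used)."""
    chain = FALLBACK_CHAINS.get(locale, [locale, "mul", "en"])
    for code in chain:
        if code in labels:
            return labels[code], code
    return None, None


# A town whose only label is its Balti name, stored under mul because
# no bft code is available. The text is a placeholder.
town = {"mul": "BALTI_NAME_PLACEHOLDER"}

urdu_view = label_for_locale(town, "ur")
english_view = label_for_locale(town, "en")
```

  Both locales end up with the mul label here, which is the behaviour I would 
hope for when no Urdu or English label has been entered.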

TASK DETAIL
  https://phabricator.wikimedia.org/T306918

To: mrephabricator
Cc: mrephabricator, Lucas_Werkmeister_WMDE, Lydia_Pintscher, Nikki, Mahir256, 
Manuel, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331