[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names
Smalyshev added a comment.

> The sitelinks are distinguishable by the URL (https://simple.wikipedia.org/)

The URL is a different triple than the data, and matching on URLs means that the client has to maintain its own database that says which wiki URL matches which language, and do pattern matching on the URL data. This does not sound to me like a good solution, either performance-wise or design-wise. It also couples two things (language and URL) which should not be coupled, as they describe two different things. Since we have a triple using schema:inLanguage and data using language tags, we should ensure those have the right values, instead of relying on other information to fix wrong values there.

> en is also a standard language code.

True, but it does not adequately describe the data, which refers to "Simple English", not just "English". Just as "nl" would not adequately describe data that refers to "nl-informal", etc. That would be a loss of information, and we should avoid that when exporting data.

> When you change the language codes at the right position the RDF export gets
> automatically the correct language codes.

So far I have seen no code that makes such a change, and I do not feel comfortable starting a project to refactor language handling across the whole of MediaWiki (which would be required if we just change `Site::getLanguageCode()`) when I just need the right language tag in RDF. If that refactoring ever happens and solves this particular problem, I would be glad to refactor this particular fix. But remaining without a fix until some undefined moment when that happens does not sound like a good way to go to me.

> I think it is bad programming style to make several workarounds instead of
> fixing the core problem.

If somebody volunteers to fix the core problem, I think that would be excellent. If not, as has evidently been the case since 2012, I don't see what use it is to discuss a hypothetical fix that might have been instead of fixing the actual thing that needs to be fixed. Following this strategy would only lead us to discussing bigger and bigger global solutions in theory without actually getting anything done in practice.

TASK DETAIL
  https://phabricator.wikimedia.org/T105430

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Fomafix, gerritbot, Smalyshev, Aklapper, daniel, mkroetzsch, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko, P.Copp

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
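The objection to URL matching in the comment above can be illustrated with a rough Python sketch. Everything named here is hypothetical: the `HOST_TO_LANGUAGE` table, both functions, and the sample record are made up for illustration and are not taken from the export or the patch, and `en-x-simple` is just the tag proposed in this thread.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical table a client would have to build and keep up to date
# if the language could only be recovered from the sitelink URL.
HOST_TO_LANGUAGE = {
    "en.wikipedia.org": "en",
    "simple.wikipedia.org": "en-x-simple",
}


def language_from_url(url: str) -> Optional[str]:
    """Guess the language by matching the URL's host against the table."""
    return HOST_TO_LANGUAGE.get(urlparse(url).hostname)


def language_from_export(sitelink: dict) -> str:
    """Read the language directly from the exported inLanguage value."""
    return sitelink["inLanguage"]


# Illustrative record only, not real export output.
sitelink = {
    "url": "https://simple.wikipedia.org/wiki/Example",
    "inLanguage": "en-x-simple",
}
```

With the first function, any wiki missing from the client's table yields no language at all; with the second, the client needs no table, provided the exported value is correct in the first place.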
[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names
Smalyshev added a comment.

> This language code should be used for the sitelinks. In HTML and in the RDF
> export

That would lead to a situation where links to the Simple English wiki and to the English wiki are indistinguishable, which is not good.

> If you want to change this to en-x-simple then create a separate task.

This is that task.

> This must be changed everywhere where a HTML attribute lang is generated.

That has no relation to the RDF export and is thus outside the scope of this task.

> When https://phabricator.wikimedia.org/T43723 is fixed most of your patch for
> the RDF export is superfluous.

When it is fixed, we can consider revisiting this code, and if the fix makes it possible to remove the special cases, they will be removed. However, since that ticket seems to have been open since 2012, I'd rather fix the RDF export now (which would otherwise be confusing for third-party users - the main audience of the export) than wait for https://phabricator.wikimedia.org/T43723.
[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names
Smalyshev added a comment.

@Fomafix we're not talking about user interface languages here. We're talking about the language specification in the RDF export, which should follow BCP 47 and commonly accepted language codes; otherwise third-party tools would not be able to tell which language these strings are in. Of course, for something like "Simple English" there might not be a standard code (correct me if I'm wrong), but it should at least be one that is standard-compliant and not the same as "en"; otherwise a sitelink to Simple English and a sitelink to English would not be distinguishable. As far as I can see, Simple English is a separate wiki from English - I see "Search the 113,945 articles in the Simple English Wikipedia" on the homepage, so it's not the same articles. Thus, I think we need a separate code for it.

> Changing this codes to de-x-formal and nl-x-informal may be possible when
> this is necessary to be conform to BCP 47.

That's what I am doing in the patch, along with several others that also need to be changed for standards compliance.
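The canonicalization discussed in this comment can be sketched as a small lookup. This is only an illustration in Python, not the code from the patch: the table `NONSTANDARD_TO_BCP47` and the function `canonical_language_tag` are hypothetical names. The target tags are the ones mentioned in this thread (`en-x-simple` is the tag proposed in the discussion and may differ from what the export finally emits), while the input codes `simple` and `de-formal` are assumptions inferred from the wiki host name and from `de-x-formal`.

```python
# Hypothetical mapping from non-standard wiki language codes to BCP 47
# compliant tags. Only codes discussed in this thread are listed; the
# real patch may cover more codes and may choose different targets.
NONSTANDARD_TO_BCP47 = {
    "simple": "en-x-simple",       # input code assumed from simple.wikipedia.org
    "de-formal": "de-x-formal",    # input code assumed from the target tag
    "nl-informal": "nl-x-informal",
}


def canonical_language_tag(code: str) -> str:
    """Return the mapped tag for a known non-standard code, else the code unchanged."""
    return NONSTANDARD_TO_BCP47.get(code, code)


if __name__ == "__main__":
    for code in ("en", "simple", "de-formal", "nl-informal"):
        print(code, "->", canonical_language_tag(code))
```

Codes that are not in the table pass through unchanged, so already-standard codes such as `en` are unaffected, and the special cases stay in one place where they can be removed if the underlying codes are ever fixed.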
[Wikidata-bugs] [Maniphest] [Commented On] T105430: Ensure that language tags generated in RDF output are standard language names
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.

Change 225518 had a related patch set uploaded (by Smalyshev):
https://phabricator.wikimedia.org/T105430: canonicalize language codes

https://gerrit.wikimedia.org/r/225518

To: Smalyshev, gerritbot
Cc: gerritbot, Smalyshev, Aklapper, daniel, mkroetzsch, jkroll, Wikidata-bugs, Jdouglas, aude, Manybubbles, JanZerebecki, Malyacko, P.Copp