Hi Adrian,

your diagnosis is completely right. By the way, I have been filing bugs
about this kind of mess for a few years. Things have improved only
gradually :-(

IMHO, the message object needs to be able to return a direction and a
language code (a BCP 47 code, to be more precise) that reflect the true
values, including for fallback messages etc. Currently I do not see a
real use case for the question "Is this message a fallback message?",
but I bet someone will find one, so I suggest making that queryable, too.

LocalisationCache should keep "is_fallback" and "has language code X"
for messages. As an alternative for the latter, a pointer to a language
object might do as well.
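
To make the idea concrete, here is a minimal sketch in Python (not
MediaWiki code) of a merged cache that keeps both pieces of information
per message. Everything in it is an assumption made for illustration:
the names CachedMessage and merge_with_tags, the message key
"label-empty", the placeholder text, and the tiny RTL_LANGS set, which
stands in for whatever a real language object would report.

```python
from dataclasses import dataclass

# Illustrative data only; a real system would ask the language object.
RTL_LANGS = {"ar", "arc", "fa", "he"}


@dataclass(frozen=True)
class CachedMessage:
    text: str
    lang: str          # BCP 47 code of the language the text is really in
    is_fallback: bool  # True if the text was merged in from a fallback

    @property
    def dir(self) -> str:
        # Direction follows the language of the text, not the requested one.
        return "rtl" if self.lang.split("-")[0] in RTL_LANGS else "ltr"


def merge_with_tags(own, own_lang, fallback, fallback_lang):
    """Merge a fallback table into a language's table, tagging every entry."""
    merged = {k: CachedMessage(t, fallback_lang, True) for k, t in fallback.items()}
    # Own translations override fallback entries and are tagged accordingly.
    merged.update({k: CachedMessage(t, own_lang, False) for k, t in own.items()})
    return merged


# Hypothetical message keys and placeholder text, for illustration only.
arc_cache = merge_with_tags(
    own={"search": "(arc text)"},
    own_lang="arc",
    fallback={"search": "Search", "label-empty": "No label defined"},
    fallback_lang="en",
)

msg = arc_cache["label-empty"]
print(msg.lang, msg.dir, msg.is_fallback)
```

With such tags in place, a caller asking for "label-empty" in arc can
see that the text is really English and left-to-right.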

We have no chance of producing correct HTML with mixed languages if we
do not even know what language a string is in. Wherever we output
language strings, however, we must
- either check the DOM for the current language, and enclose messages
  in a proper language wrapper if needed, or
- emit language wrappers unconditionally and have tidy clean them up.
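
The first variant could look roughly like the following Python sketch.
It assumes the caller already knows the language and direction of the
surrounding element; wrap_message and its parameters are hypothetical
names, not an existing API.

```python
from html import escape


def wrap_message(text, msg_lang, msg_dir, ctx_lang, ctx_dir):
    """Return escaped text, wrapped in a span only if it differs from context."""
    safe = escape(text)
    if msg_lang == ctx_lang and msg_dir == ctx_dir:
        # Same language and direction as the enclosing element: no wrapper.
        return safe
    return f'<span lang="{escape(msg_lang)}" dir="{escape(msg_dir)}">{safe}</span>'


# An English fallback message shown inside an Aramaic (arc, rtl) context.
print(wrap_message("No label defined", "en", "ltr", "arc", "rtl"))
```

The second variant would simply drop the comparison and always emit the
span, leaving the cleanup of redundant wrappers to a later pass.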

Purodha

On 12.04.2016 13:01, Adrian Heine wrote:
Hi everyone,

as some of you might know, I'm a software developer at Wikimedia
Deutschland, working on Wikidata. I'm currently focusing on improving
Wikidata's support for languages we as a team are not using on a daily
basis. As part of my work I stumbled over a shortcoming in MediaWiki's
message system that – as far as I see it – prevents me from doing the
right thing(tm). I'm asking you to verify that the issue I see indeed
is an issue and that we want to fix it. Subsequently, I'm interested
in hearing your plans or goals for MediaWiki's message system so that
I can align my implementation with them. Finally, I am hoping to find
someone who is willing to help me fix it.

== The issue ==

On Wikidata, we regularly have content in different languages on the
same page. We use the HTML lang and dir attributes accordingly. For
example, we have a table with terms for an entity in different
languages. For missing terms, we would display a message in the UI
language within this table. The corresponding HTML (simplified) might
look like this:

<div id="mw-content-text" lang="UILANG" dir="UILANG_DIR">
  <table class="entity-terms">
    <tr class="entity-terms-for-OTHERLANG1" lang="OTHERLANG1" dir="OTHERLANG1_DIR">
      <td class="entity-terms-for-OTHERLANG1-label">
        <div class="wb-empty" lang="UILANG" dir="UILANG_DIR">
          <!-- missing label message -->
        </div>
      </td>
    </tr>
  </table>
</div>

This works great as long as the missing label message is available in
the UI language. If that is not the case, though, the message is
translated according to the defined language fallbacks. In that case,
we might end up with something like this:

<div class="wb-empty" lang="arc" dir="rtl">No label defined</div>

That's obviously wrong, and I'd like to fix it.

== Fixing it ==

For fixing this, I tried to make MessageCache provide the language a
message was taken from [1]. That's not too straightforward to begin
with, but while working on it I realized that MessageCache is only
responsible for following the language fallback chain for database
translations. For file-based translations, the fallbacks are directly
merged in by LocalisationCache, so the information is not there
anymore at the time of translating a message. I see some ways to fix
this:

* Don't merge messages in LocalisationCache, but perform the fallback
on request (possibly caching the result)
* Tag message strings in LocalisationCache with the language they are
in (sounds expensive to me)
* Tag message strings as being a fallback in LocalisationCache (that
way we could follow the fallback until we find a language in which the
message string is not tagged as being a fallback)
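
As a rough illustration of the first option, the following Python
sketch (not MediaWiki code) keeps per-language tables unmerged and walks
the fallback chain at request time, returning the language the string
was actually found in. The tables, the message key "label-empty", and
the fallback chain are made-up sample data, and resolve is a
hypothetical helper.

```python
# Unmerged per-language message tables (sample data for illustration).
MESSAGES = {
    "en": {"label-empty": "No label defined"},
    "arc": {},  # no translation of this message in arc
}

# Fallback chains per language (sample data for illustration).
FALLBACKS = {"arc": ["en"]}


def resolve(key, lang, messages, fallbacks):
    """Return (text, source_lang) for key, following the fallback chain."""
    for candidate in [lang, *fallbacks.get(lang, [])]:
        text = messages.get(candidate, {}).get(key)
        if text is not None:
            return text, candidate
    return None  # message is not defined in any language of the chain


print(resolve("label-empty", "arc", MESSAGES, FALLBACKS))
```

Since the lookup now happens per request, the (key, language) result
would be a natural thing to cache.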

What do you think?

[1] https://gerrit.wikimedia.org/r/282133

Thanks,
--
Adrian Heine né Lang
SOFTWARE DEVELOPER

Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de

Imagine a world, in which every single human being can freely share
in the sum of all knowledge. That's our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.

_______________________________________________
Mediawiki-i18n mailing list
Mediawiki-i18n@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n

