Re: [Geoserver-users] Fwd: GitHub integration - Encoding issue

Andrea Aime Sat, 08 Jan 2022 09:06:45 -0800

On Wed, Jan 5, 2022 at 6:46 PM Alexandre Gacon <alexandre.ga...@gmail.com>
wrote:


> ---------------------------
> In order to understand the following explanation, keep in mind that:
>
>    - UTF-8 is the encoding that will preserve properly all non-ascii,
>    non-latin1 characters
>    - ISO-8859-1 (aka latin1) is an ascii-based encoding that contains all
>    the ascii characters plus some additional ones used in the latin alphabet
>    (e.g. é, è etc.)
>
Probably key to understanding the rest: latin1 and ISO-8859-1 are the same
<https://en.wikipedia.org/wiki/ISO/IEC_8859-1> (it confused me at first).
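
As a quick illustration of the point above, here is a minimal Python sketch (not part of the original exchange; the helper names are made up for the example). It checks that "latin1" and "ISO-8859-1" resolve to the same codec in Python's registry, and shows how many bytes a character takes in each encoding.

```python
import codecs


def same_codec(name_a: str, name_b: str) -> bool:
    """Return True when both names resolve to the same codec in Python's registry."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


def byte_length(text: str, encoding: str) -> int:
    """Number of bytes needed to store `text` in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    print(same_codec("latin1", "ISO-8859-1"))   # two names, one codec
    print(byte_length("é", "latin1"))           # one byte in latin1
    print(byte_length("é", "utf-8"))            # two bytes in UTF-8
    # Pure ascii text yields identical bytes in ascii, latin1 and UTF-8.
    print("key=value".encode("ascii") == "key=value".encode("latin1"))
```

Only Python's standard `codecs` module is used here; the behaviour of other tools' alias tables may differ.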

>
>    - us-ascii is the standard encoding for electronic communication and,
>    as we already mentioned, a subset of the latin1 encoding.
>
>
> After the new tests regarding the retaining of the encoding of the file
> given in the ticket, we noticed the following:
>
>    - If a non-latin1, non-ascii character exists in the translation
>    (UTF-8 characters) then the final translation file will contain the UTF-8
>    escaped corresponding characters (i.e. \u0420 corresponds to some Cyrillic
>    letter).
>
Ok, so Transifex won't support the Wicket ".utf8.properties" convention,
and will just escape chars so that the file can be encoded in ISO-8859-1
instead.
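
To make the escaping concrete, here is a hedged Python sketch of what such a step could look like. It is an assumption about the general technique, not Transifex's actual code, and `escape_non_latin1` is a hypothetical helper: every character outside latin1 is replaced by a Java-style \uXXXX escape (using a surrogate pair above U+FFFF), so the result can be stored as ISO-8859-1.

```python
def escape_non_latin1(text: str) -> str:
    """Replace characters outside latin1 with Java-style \\uXXXX escapes.

    Hypothetical helper for illustration only; latin1 characters such as
    'é' are left as they are.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            # Representable in ISO-8859-1: keep as-is.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Java escapes use UTF-16 code units, so split into a surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


if __name__ == "__main__":
    escaped = escape_non_latin1("Р é")   # Cyrillic letter U+0420, then é
    print(escaped)
    # The escaped text now fits in ISO-8859-1 without errors.
    escaped.encode("iso-8859-1")
```

A variant that also escapes everything above 0x7F would produce a pure us-ascii file instead.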

>
>    - In our case, the latin1 character wasn’t part of the translated
>    strings but part of the structure of the file, at the template of the file.
>    This means that we don’t want to change it to the UTF-8 escaped character.
>
I don't understand what "the structure of the file", as opposed to "part of
the translated strings", means. Maybe the latin1 character was in a key
rather than in a value? Or maybe in a comment.


>
>    - But on the other hand, the library that we are using in order to
>    integrate github with transifex is not supporting latin1 but UTF-8 so when
>    a non-ascii character appears it converts the whole file to the best
>    encoding that can represent that character. In our case that is UTF-8.
>
It seems they have a technical limitation: they can do either us-ascii or
escaped UTF-8, but do not support latin1 (ISO-8859-1).
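
Why the difference matters can be shown with a small Python sketch (an illustration added here, with a hypothetical helper name). The same "é" is stored as different bytes in the two encodings, so a reader that assumes ISO-8859-1, as `java.util.Properties.load(InputStream)` is documented to do, will garble a file that was silently rewritten as UTF-8.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    print("é".encode("iso-8859-1"))        # a single byte, 0xE9
    print("é".encode("utf-8"))             # two bytes, 0xC3 0xA9
    print(misread_utf8_as_latin1("é"))     # two wrong characters instead of one
    print(misread_utf8_as_latin1("abc"))   # pure ascii survives unchanged
```

This is also why a us-ascii file with escapes is safe either way: its bytes mean the same thing under ascii, latin1 and UTF-8.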

>
> In order to preserve the us-ascii encoding (not the latin1) in github one
> must make sure that the source keys and the comments of the file do not
> contain any non ascii characters.
>

It seems that we can either use only us-ascii chars (and encode anything else,
including accented letters, using \uXXXX escape codes),
or maybe go fully UTF-8? Either way, it seems ISO-8859-1 is simply out of the
equation?
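
If the us-ascii route is taken, the source file has to be checked for stray non-ascii characters in keys and comments. A minimal Python sketch of such a check follows; it is only an illustration, `find_non_ascii` is a hypothetical helper, and the sample content is invented for the example.

```python
def find_non_ascii(text: str) -> list[tuple[int, str]]:
    """Return (line number, offending characters) for each line with non-ascii chars."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        bad = sorted({ch for ch in line if ord(ch) > 0x7F})
        if bad:
            hits.append((line_no, "".join(bad)))
    return hits


if __name__ == "__main__":
    # Invented sample: a literal "é" in a comment, an escaped one in a value.
    sample = "# Généré\nname=Nom\ntitle=Titre \\u00e9\n"
    for line_no, chars in find_non_ascii(sample):
        print(f"line {line_no}: {chars}")
```

Only the comment line is reported here: the \u00e9 escape in the last line is made of plain ascii characters.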


> ---------------------------
>
> In case something wasn't clear, what this means is that because the source
> file had a latin1 character (é) even though the translations for the
> strings did not, this character was kept as-is (not escaped) as part of the
> "template". Therefore, the translation files sent back to GitHub are being
> encoded with UTF-8 by the library being used. We do not think we can do
> anything about this, unfortunately. So, the translation files for the Java
> Properties file format must be retrieved from Transifex directly instead of
> using the GitHub integration.
>

I believe the "é" character was added in a comment, as an attempt to force
Transifex to use ISO-8859-1?
And Transifex is simply incapable of doing that?

Hum... well, Wicket does not really care and will, I believe, support
translation files made of us-ascii with UTF-8 escapes just fine, but
translators who do direct commits, rather than going through Transifex,
might be less than pleased.
I believe Jody at one point mentioned a different platform, but I cannot
remember which one that was.
Thinking out loud, I see two avenues ahead:

   - Put up with Transifex limitations
   - Try to extract the good work present in Transifex once, and then
   migrate to another translation system, if you can find one that works
   better for translators

Cheers
Andrea


Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions Group
phone: +39 0584 962313

fax:     +39 0584 1660272

mob:   +39  333 8128928

https://www.geosolutionsgroup.com/

http://twitter.com/geosolutions_it

