Re: UTF-8 in string literals and translation strings in particular

Guillaume Munch Thu, 08 Oct 2015 15:43:12 -0700

On 08/10/2015 22:11, Georg Baum wrote:

Jean-Marc Lasgouttes wrote:

The problem with the patch is that it does not have a clear goal. The
discussion would have been much easier if you had split it in 3 from
the start:

1/ easy use of utf8 in docstring
2/ allow utf8 in translatable strings
3/ use … instead of ... in UI


4) use of unicode string literals in C++ source files



Thank you for the disambiguation. I included this in 2).

This would have been easier indeed. For example, I have no real opinion
about 2) and 3).

4) is not possible as long as we support C++98 (because the source encoding
is not standardized and especially MSVC has a horrible interpretation of
it).

Then, I agree this cannot go into 2.2. For 2.3, on the other hand, C++11 opens better possibilities like directly writing docstring literals, which is no doubt better than extending the string -> docstring conversions.


Concerning 1) I have a strong opinion which needs a bit of history
explained: When unicode support was introduced in LyX the idea was to
replace all strings which can contain non-ASCII contents with docstring. The
only exceptions would be interfaces to third party libraries or
import/export, where it is sometimes needed to use std::string with a
certain encoding. Unfortunately this conversion was never completely
finished (this is the reason for all the "FIXME UNICODE" comments).


Good to know.

Therefore, after finishing this task, all occurrences of std::string would
contain ASCII contents with very rare exceptions.
The alternative which was also discussed was to use docstring everywhere.
This would have been less work to do, but the advantages of the mixed
docstring/std::string approach were bugs found during the transition
process, more memory and runtime efficiency, and (if it was completed) a
clear picture where one can expect ASCII and where user visible contents is
used.

The proposed changes to docstring weaken the clear separation of ASCII/non-
ASCII contents. They are not needed if the unicode transition is finished
(i.e. all "FIXME UNICODE" comments addressed). They are not needed either if
we change our mind and use the alternative approach of docstring everywhere
instead. For me, the disadvantages weigh much more than the advantages,
therefore I would suggest either finishing the unicode transition or using
docstring everywhere. The only exception would be unicode string literals in
C++11 mode. Support for these in docstring is both safe and useful in any
case.


I am now convinced that string must remain ASCII.

Even independently of issue 4) with C++98, I agree that it is better to wait for 2.3 and C++11 support, and not cast in stone a situation that would have been created by C++98 limitations.

So, is the plan to change char_type from wchar_t to char32_t in 2.3 and use the syntax U"..." to directly define docstring literals? Do you see any issues with this change? Then we do not need any conversion method, we can just use docstring for all purposes when non-ASCII chars are involved.
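
To make the question concrete, here is a minimal sketch of what I have in mind. It assumes C++11 and that char_type simply becomes char32_t; the typedefs below are stand-ins written for this mail, not the actual LyX definitions, and open_label() and is_ascii() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: stand-ins for the LyX types, for illustration only.
typedef char32_t char_type;
typedef std::basic_string<char_type> docstring;

// A docstring built directly from a C++11 UTF-32 literal.
// No std::string -> docstring conversion is involved.
inline docstring open_label()
{
	// \u2026 is HORIZONTAL ELLIPSIS; using the escape keeps the
	// source file itself pure ASCII.
	return U"Open\u2026";
}

// Returns true if every code point of s is in the ASCII range.
inline bool is_ascii(docstring const & s)
{
	for (std::size_t i = 0; i != s.size(); ++i)
		if (s[i] > 0x7f)
			return false;
	return true;
}
```

With this, the ellipsis is a single element of the docstring (one char32_t code point), while std::string can stay ASCII-only. Whether the translation machinery can take such literals directly is a separate question that this sketch does not address.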


Now for the patch under discussion the plan becomes:

1/ wait for 2.3
2/ allow utf8 in translatable strings, using unicode literals.
3/ use … instead of ... in the UI (as before)

Does it make sense?

Thank you for the detailed answer.


Guillaume
