Hi Chris,
wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is
the proper way to go. Consider the differences between:

1.  UCS_string yyy(UTF8_string(xxx));   // almost proper, but ambiguous (most vexing parse error)
2.  UCS_string yyy(xxx);                // now private: so never use it
3a. UTF8_string utf(xxx);               // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx))); // also proper (this is 1. without the most vexing parse error)
5.  UCS_ASCII_string yyy(xxx)

If *xxx* is entirely *ASCII* then all of the above are equivalent.
Otherwise the difference is that 1. properly decodes UTF8-encoded
strings while the old 2. (which is now disabled by private:) did not
(and the compiler has no way to detect an incorrect usage of 2.).
Even worse, C++ would sometimes do 2. automatically (and incorrectly)
and without notice. Probably some of the recent Tokenization Errors
reported on bug-apl were caused by this.
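
To make the difference between decoding and not decoding concrete, here is a minimal sketch. It does not use the GNU APL classes; widen_bytes() and decode_utf8() are hypothetical helpers invented for this illustration, and the decoder assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: one code point per byte.
// This is only correct when every byte is ASCII (< 0x80).
std::vector<std::uint32_t> widen_bytes(const std::string &s) {
    std::vector<std::uint32_t> out;
    for (unsigned char c : s) out.push_back(c);
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder.
// Assumes well-formed input; no validation of continuation bytes.
std::vector<std::uint32_t> decode_utf8(const std::string &s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For a pure ASCII string such as "abc" both helpers return the same three values. For the APL character U+2374 (bytes E2 8D B4 in UTF-8) the decoder returns one code point, while the byte-wise version returns three unrelated ones, which is the kind of silent damage described above.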
Although 1. was throwing an assertion when used incorrectly, some
people wrapped a *try {} catch {}* around it which caused the error
to slip through unnoticed (at least up to the tokenizer).
A somewhat unfortunate decision in the C++11 ff. standards was to
resolve *yyy* in 1. (which is ambiguous at a closer look) into a
declaration of a function *yyy()* and not (as gcc still does) into two
constructor calls: *UTF8_string(xxx)* followed by *UCS_string()* with the
result of the first. This problem can apparently be avoided by using 4.
instead of 1. (note the extra pair of () which is NOT redundant).
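
A small sketch of forms 1. and 4., assuming a compiler that applies the declaration reading described above. The two structs are hypothetical stand-ins, not the real GNU APL classes, and form1_is_function() and form4_is_object() are names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the real classes.
struct UTF8_string {
    std::string bytes;
    explicit UTF8_string(const char *s) : bytes(s) {}
};

struct UCS_string {
    std::size_t length;
    explicit UCS_string(const UTF8_string &utf) : length(utf.bytes.size()) {}
};

// Form 1: the statement is read as the declaration of a function
//   UCS_string yyy(UTF8_string xxx);
// so no object is constructed (compilers may warn about this).
bool form1_is_function() {
    const char *xxx = "abc";
    (void)xxx;  // unused: the xxx below is a parameter name
    UCS_string yyy(UTF8_string(xxx));
    return std::is_function<decltype(yyy)>::value;
}

// Form 4: the extra parentheses force an expression, so yyy is an object.
bool form4_is_object() {
    const char *xxx = "abc";
    UCS_string yyy((UTF8_string(xxx)));
    return !std::is_function<decltype(yyy)>::value && yyy.length == 3;
}
```

With the declaration reading, any later use of *yyy* as an object in form 1. (for example yyy.length) fails to compile, which is usually how the problem shows up. Form 3a./3b. avoids the question entirely by naming the intermediate object.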
Finally, 5. is a safe replacement for 2. The comment in the *.hh* file
is still valid (so *xxx* MUST be ASCII), but the separate class should
hopefully avoid the automatic use of 2. by the compiler. It is also easier
to find with *grep* in order to spot the (still possible) incorrect usage of 5.
Hope this helps,
Jürgen
On 6/6/23 22:13, Chris Moller wrote:
Yeah, I saw your comment in one of the .hh files. What I did was wrap
all the edif ASCII strings in UTF8_string() calls. That works, but if
it's circumventing what you're trying to do, let me know and I'll
think of something else.
Even after a lot of years, I'm still not sure of the differences
between UTF, UCS, Unicode, etc, etc.
--cm
On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,
sorry for that. The reason for making it private is to entirely
prevent its usage.
The former implementation of it only worked for ASCII strings. There was
a note about that in the header file, but I have seen quite a few incorrect
usages of it (read: with UTF8-encoded strings) which then caused other,
difficult to find, errors later on.
Best Regards,
Jürgen
On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,
Just pushed a fix for edif if you want to give it a try. Works for
me on SVN 1706 and yesterday's SVN 1708.
--cm
On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif
Juergen made UCS_string (const char *) a private member of the class
so a lot of compile errors in edif.cc ...
Not sure if this can be fixed. I reverted to SVN 1702 meanwhile.
There is no way I'll revert to the "DEL Editor" !
Xtian.