Hi Chris,

Wrapping arbitrary (i.e. UTF8-encoded) strings in a UTF8_string first
is the proper way to go. Consider the differences between:

  1.  UCS_string yyy(UTF8_string(xxx));     // almost proper, but ambiguous (most vexing parse)
  2.  UCS_string yyy(xxx);                  // now private: so never use it
  3a. UTF8_string utf(xxx);                 // really proper
  3b. UCS_string yyy(utf);
  4.  UCS_string yyy((UTF8_string(xxx)));   // also proper (this is 1. without the most vexing parse)
  5.  UCS_ASCII_string yyy(xxx);

If *xxx* is entirely *ASCII* then all of the above are equivalent.

Otherwise the difference is that 1. properly decodes UTF8-encoded
strings, while the old 2. (which is now disabled by making it private)
did not, and the compiler has no way to detect an incorrect usage of 2.
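To make the difference concrete, here is a minimal sketch that uses plain std::u32string as a stand-in for UCS_string (an assumption; the real GNU APL classes are not shown). The hypothetical widen_bytes() models the old 2. by turning every byte into one character, which is what "only worked for ASCII strings" suggests; the hypothetical decode_utf8() models 1. by decoding UTF-8 byte sequences into Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the old UCS_string(const char *): every byte
// becomes one character. Correct only if the input is pure ASCII.
std::u32string widen_bytes(const char * s)
{
   std::u32string result;
   for (; *s; ++s)
       result += static_cast<char32_t>(static_cast<unsigned char>(*s));
   return result;
}

// Hypothetical model of UCS_string(UTF8_string(s)): decode UTF-8 into
// Unicode code points. Minimal sketch: malformed input is not validated.
std::u32string decode_utf8(const std::string & utf)
{
   std::u32string result;
   std::size_t i = 0;
   while (i < utf.size())
      {
        const unsigned char b0 = utf[i];
        int len;        // number of bytes in this UTF-8 sequence
        char32_t cp;    // code point being assembled
        if      (b0 < 0x80)           { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else                          { cp = b0 & 0x07; len = 4; }

        // append 6 payload bits from each continuation byte
        for (int k = 1; k < len && i + k < utf.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf[i + k]) & 0x3F);

        result += cp;
        i += len;
      }
   return result;
}
```

For ASCII input both functions agree. For the APL symbol ⍴ (U+2374, encoded in UTF-8 as the three bytes E2 8D B4), widen_bytes() produces three meaningless characters, while decode_utf8() produces the single code point U+2374. Garbled characters like these are the kind of input that could later trip up a tokenizer.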

Even worse, C++ would sometimes apply 2. automatically (as an implicit
conversion from a const char *) and incorrectly, without any notice.
Some of the recent tokenization errors reported on bug-apl were
probably caused by this.
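The "automatic" use of 2. is C++'s implicit conversion through a non-explicit single-argument constructor. The sketch below uses two hypothetical stand-in classes (Implicit_UCS and Private_UCS are illustrative names, not GNU APL classes) to show that a public UCS_string(const char *)-style constructor lets any const char * silently become a string object, while a private one blocks the conversion at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the OLD UCS_string: public converting constructor.
struct Implicit_UCS
{
   Implicit_UCS(const char * s) : bytes(s) {}   // neither explicit nor private
   std::string bytes;
};

// Hypothetical stand-in for the NEW UCS_string: the constructor is private.
class Private_UCS
{
public:
   Private_UCS() = default;
private:
   Private_UCS(const char * s) : bytes(s) {}
   std::string bytes;
};

// Expects a "UCS string"; the compiler silently builds an Implicit_UCS from
// any const char * argument, UTF-8 encoded or not, without a warning.
std::size_t char_count(const Implicit_UCS & ucs) { return ucs.bytes.size(); }

// The silent conversion exists for the old-style class only.
static_assert(std::is_convertible<const char *, Implicit_UCS>::value,
              "const char * converts silently");
static_assert(!std::is_convertible<const char *, Private_UCS>::value,
              "a private constructor blocks the conversion");
```

Declaring the constructor explicit would also block the silent conversion while still allowing direct construction such as Implicit_UCS u("abc"); making it private, as described above, rules out both.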

Although 1. was throwing an assertion when used incorrectly, some
people wrapped a *try {} catch {}* around it, which caused the error
to slip through unnoticed (at least up to the tokenizer).
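A minimal sketch of why the catch-all hides the problem, assuming a hypothetical conversion to_ucs_strict() that signals bad input by throwing (modeled here with std::invalid_argument and an ASCII-only rule; how GNU APL actually reports its assertion is not shown). The hypothetical to_ucs_swallowing() wrapper turns a detectable error into a silently wrong, empty result.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical conversion that rejects input it cannot handle
// (here: anything that is not 7-bit ASCII).
std::u32string to_ucs_strict(const std::string & bytes)
{
   std::u32string result;
   for (unsigned char b : bytes)
       {
         if (b >= 0x80)   throw std::invalid_argument("non-ASCII input");
         result += static_cast<char32_t>(b);
       }
   return result;
}

// Anti-pattern: the catch-all converts a detectable error into a wrong
// (here: empty) result that only surfaces much later, e.g. in a tokenizer.
std::u32string to_ucs_swallowing(const std::string & bytes)
{
   try { return to_ucs_strict(bytes); }
   catch (...) { return std::u32string(); }
}
```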

A somewhat unfortunate decision in the C++11 ff. standards was to
resolve *yyy* in 1. (which is ambiguous on closer inspection) into a
declaration of a function *yyy()* (taking a UTF8_string parameter named
*xxx*) and not (as gcc still does) into two constructor calls:
*UTF8_string(xxx)*, followed by *UCS_string()* applied to the result of
the first. This problem can apparently be avoided by using 4. instead
of 1. (note the extra pair of () which is NOT redundant).
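The different parses of 1. and 4. can be checked at compile time. This sketch uses hypothetical minimal classes Utf8_str and Ucs_str in place of UTF8_string and UCS_string. Note that the ambiguity arises only when xxx is a plain identifier, because only then can UTF8_string(xxx) also be read as a parameter declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for UTF8_string: just stores the raw bytes.
struct Utf8_str
{
   explicit Utf8_str(const char * s) : bytes(s) {}
   std::string bytes;
};

// Hypothetical stand-in for UCS_string: a real class would decode here;
// this one only records the byte count.
struct Ucs_str
{
   explicit Ucs_str(const Utf8_str & utf) : count(utf.bytes.size()) {}
   std::size_t count;
};

const char * xxx = "hello";

// Form 1: read as a DECLARATION of a function vexing() that takes a
// Utf8_str parameter named xxx and returns Ucs_str. No object is created.
Ucs_str vexing(Utf8_str(xxx));

// Form 4: a parameter declaration cannot start with '(', so this
// defines a variable initialized from Utf8_str(xxx).
Ucs_str proper((Utf8_str(xxx)));

// Brace initialization (C++11 and later) is also unambiguous.
Ucs_str braced{Utf8_str(xxx)};

static_assert(std::is_function<decltype(vexing)>::value,
              "form 1 declares a function");
static_assert(!std::is_function<decltype(proper)>::value,
              "form 4 declares a variable");
```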

Finally, 5. is a safe replacement for 2. The comment in the *.hh* file
is still valid (so *xxx* MUST be ASCII), but using the separate
*UCS_ASCII_string* type should hopefully avoid the automatic use of 2.
by the compiler. It is also easier to find with *grep*, which helps to
spot (still possible) incorrect usages of 5.
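A sketch of what an ASCII-only replacement might look like, assuming it simply widens each byte and checks its input; Ascii_ucs and is_ascii() are hypothetical names, and whether the real UCS_ASCII_string performs such a check is not stated here. The explicit constructor keeps the compiler from applying it as a silent conversion, and the distinct class name is what makes it easy to grep for.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte of s is 7-bit ASCII.
bool is_ascii(const char * s)
{
   for (; *s; ++s)
       if (static_cast<unsigned char>(*s) >= 0x80)   return false;
   return true;
}

// Hypothetical ASCII-only string class in the spirit of UCS_ASCII_string
// (NOT the real GNU APL class). Each byte becomes one character, which is
// only correct because the input is required to be ASCII.
class Ascii_ucs
{
public:
   // explicit: the compiler never applies this as an implicit conversion
   explicit Ascii_ucs(const char * ascii)
      {
        assert(is_ascii(ascii) && "Ascii_ucs: input MUST be ASCII");
        for (const char * p = ascii; *p; ++p)
            chars += static_cast<char32_t>(static_cast<unsigned char>(*p));
      }

   const std::u32string & str() const { return chars; }

private:
   std::u32string chars;
};
```

Call sites then read Ascii_ucs name("APL");, and grepping for Ascii_ucs lists every place that relies on the ASCII-only assumption.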

Hope this helps,
Jürgen


On 6/6/23 22:13, Chris Moller wrote:
Yeah, I saw your comment in one of the .hh files. What I did was wrap all the edif ASCII strings in UTF8_string() calls. That works, but if it's circumventing what you're trying to do, let me know and I'll think of something else.

Even after a lot of years, I'm still not sure of the differences between UTF, UCS, Unicode, etc., etc.

--cm

On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,

Sorry for that. The reason for making it private is to prevent its usage entirely. The former implementation of it only worked for ASCII strings. There was a note about that in the header file, but I have seen quite a few incorrect usages of it (read: with UTF8-encoded strings), which then caused other, difficult-to-find errors later on.

Best Regards,
Jürgen


On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,

Just pushed a fix for edif if you want to give it a try. Works for me on SVN 1706 and yesterday's SVN 1708.

--cm

On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif

Juergen made the constructor UCS_string(const char *) a private member
of the class, so there are a lot of compile errors in edif.cc ...

Not sure if this can be fixed. I reverted to SVN 1702 in the meantime. There is no way I'll revert to the "DEL Editor"!


Xtian.



