Re: SVN 1704 completely broke libedif

Dr. Jürgen Sauermann Wed, 07 Jun 2023 07:25:16 -0700

Hi Chris,

wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is
the proper way to go. Consider the differences between:

1.  UCS_string yyy(UTF8_string(xxx));     // almost proper, but ambiguous (most vexing parse error)
2.  UCS_string yyy(xxx);                  // now private: so never use it
3a. UTF8_string utf(xxx);                 // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx)));   // also proper (this is 1. without the most vexing parse error)
5.  UCS_ASCII_string yyy(xxx)

If *xxx* is entirely *ASCII* then all of the above are equivalent.


Otherwise the difference is that 1. properly decodes UTF8-encoded
strings while the old 2. (which is now disabled by private:) did not
(and the compiler has no way to detect an incorrect usage of 2.).
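
To make the difference concrete, here is a minimal sketch. It does not use the GNU APL classes: widen_bytes() and decode_utf8() are hypothetical stand-ins. The sketch assumes that the old 2. turned every byte into one character, and that the UTF8_string route decodes multi-byte sequences into single code points. The decoder is deliberately simple (no checks for overlong or surrogate encodings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the old 2.: one code point per byte (assumption).
std::vector<std::uint32_t> widen_bytes(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Hypothetical stand-in for the UTF8_string route: decode UTF-8 sequences.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); ) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;          // number of continuation bytes
        if      (b0 < 0x80)           { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("bad UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
            if ((bk & 0xC0) != 0x80)
                throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (bk & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Compare both routes on an ASCII string and on the APL character rho.
void compare_on_rho()
{
    const std::string rho = "\xE2\x8D\xB4";            // U+2374, 3 bytes in UTF-8
    assert(widen_bytes("abc") == decode_utf8("abc"));  // ASCII: no difference
    assert(widen_bytes(rho).size() == 3);              // three bogus characters
    assert(decode_utf8(rho).size() == 1);              // one character
    assert(decode_utf8(rho)[0] == 0x2374);
}
```

For pure ASCII both functions agree, which is why the incorrect usage can go unnoticed until a non-ASCII character such as an APL symbol shows up.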

Even worse, C++ would sometimes do 2. automatically (and incorrectly)
and without notice. Probably some of the recent Tokenization Errors
reported on bug-apl were caused by this.

Although 1. was throwing an assertion when used incorrectly, some
people wrapped a *try {} catch {}* around it which caused the error
to slip through unnoticed (at least up to the tokenizer).

A somewhat unfortunate decision in the C++11 ff. standards was to
resolve *yyy* in 1. (which is ambiguous at a closer look) into a declaration
of function *yyy()* and not (as gcc still does) into two constructor calls
*UTF8_string(xxx)* followed by *UCS_string()* with the first. This problem
can apparently be avoided by using 4. instead of 1. (note the extra pair
of () which is NOT redundant).
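
The effect of the extra parentheses can be checked with a small sketch. The two structs below are hypothetical, minimal stand-ins and NOT the real GNU APL classes; only the shape of the declarations matters. When *xxx* is a plain identifier, the standard's disambiguation rule reads form 1. as a function declaration. With an argument that cannot be a parameter name (a string literal, say), the same form constructs an object, which may be why the problem does not show up everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins, for illustration only.
struct UTF8_string {
    explicit UTF8_string(const char* s) : bytes(s) {}
    std::string bytes;
};

struct UCS_string {
    explicit UCS_string(const UTF8_string& u) : length(u.bytes.size()) {}
    std::size_t length;
};

// Form 1. with a plain identifier: yyy is a function, not an object.
bool form1_declares_a_function()
{
    const char* xxx = "abc";
    (void)xxx;                            // not used by the declaration below
    UCS_string yyy(UTF8_string(xxx));     // read as: UCS_string yyy(UTF8_string xxx);
    return std::is_function<decltype(yyy)>::value;
}

// Form 1. with a string literal: cannot be a parameter, so yyy is an object.
bool literal_argument_declares_an_object()
{
    UCS_string yyy(UTF8_string("abc"));
    return std::is_same<decltype(yyy), UCS_string>::value && yyy.length == 3;
}

// Forms 3a./3b.: a named intermediate, no ambiguity at all.
std::size_t form3_length()
{
    const char* xxx = "abc";
    UTF8_string utf(xxx);
    UCS_string yyy(utf);
    return yyy.length;
}

// Form 4.: the extra parentheses force an expression, so yyy is an object.
bool form4_declares_an_object()
{
    const char* xxx = "abc";
    UCS_string yyy((UTF8_string(xxx)));
    return std::is_same<decltype(yyy), UCS_string>::value && yyy.length == 3;
}
```

Recent compilers may warn about the declaration in form1_declares_a_function() (for example with a vexing-parse warning); that warning is a good hint to switch to 3a./3b. or 4.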

Finally, 5. is a safe replacement for 2. (the comment in the *.hh* file
is still valid, so *xxx* MUST be ASCII), which should hopefully avoid the
automatic use of 2. by the compiler. It is also easier to use with *grep*
in order to spot the (still possible) incorrect usage of 5.
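
For callers who want to enforce the "MUST be ASCII" rule before using 5., a check along the following lines is one option. This is only a sketch: ascii_to_codepoints() and rejects() are hypothetical helpers, and nothing here says whether the real UCS_ASCII_string checks its argument.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: accept only 7-bit ASCII, one code point per byte.
std::vector<std::uint32_t> ascii_to_codepoints(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b >= 0x80)
            throw std::invalid_argument("non-ASCII byte in ASCII-only string");
        out.push_back(b);
    }
    return out;
}

// Hypothetical helper: true if the string would be refused.
bool rejects(const std::string& bytes)
{
    try {
        ascii_to_codepoints(bytes);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}

void check_ascii_rule()
{
    assert(!rejects("edif"));             // plain ASCII passes
    assert(rejects("\xE2\x8D\xB4"));      // UTF-8 encoded rho is refused
}
```

Failing loudly at this point keeps a UTF8-encoded string from travelling on to the tokenizer as garbage.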

Hope this helps,
Jürgen


On 6/6/23 22:13, Chris Moller wrote:

Yeah, I saw your comment in one of the .hh files. What I did was wrap all the edif ASCII strings in UTF8_string() calls. That works, but if it's circumventing what you're trying to do, let me know and I'll think of something else.
Even after a lot of years, I'm still not sure of the differences between UTF, UCS, Unicode, etc, etc.
--cm

On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,
sorry for that. The reason for making it private is to entirely prevent its usage.
The former implementation of it only worked for ASCII strings. There was
a note about that in the header file, but I have seen quite a few incorrect
usages of it (read: with UTF8-encoded strings) which then caused other, difficult
to find, errors later on.

Best Regards,
Jürgen


On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,
Just pushed a fix for edif if you want to give it a try. Works for me on SVN 1706 and yesterday's SVN 1708.
--cm

On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif

Juergen made UCS_string (const char *)  a private member of the class
so a lot of compile errors in edif.cc ...
Not sure if this can be fixed. I reverted to SVN 1702 meanwhile. There is no way I'll revert to the "DEL Editor" !
Xtian.
