Re: Question about Perl5 extended UTF-8 design
On 11/06/2015 01:32 PM, Richard Wordingham wrote:
> On Thu, 05 Nov 2015 13:41:42 -0700
> "Doug Ewell" wrote:
>
> > Richard Wordingham wrote:
> >
> > > No-one's claiming it is for a Unicode Transformation Format (UTF).
> >
> > Then they ought not to call it "UTF-8" or "extended" or "modified"
> > UTF-8, or anything of the sort, even if the bit-shifting algorithm is
> > based on UTF-8.
> >
> > "UTF-8 encoding form" is defined as a mapping of Unicode scalar values
> > -- not arbitrary integers -- onto byte sequences. [D92]
>
> If it extends the mapping of Unicode scalar values *into* byte
> sequences, then it's an extension. A non-trivial extension of a
> mapping of scalar values has to have a larger domain.
>
> I'm assuming that 'UTF-8' and 'UTF' are not registered trademarks.
>
> Richard.

I have no idea how my original message ended up being marked to send to this list. I'm sorry. It was meant to be a personal message for someone who I believe was involved in the original design.
Re: Question about Perl5 extended UTF-8 design
On Thu, 05 Nov 2015 13:41:42 -0700
"Doug Ewell" wrote:

> Richard Wordingham wrote:
>
> > No-one's claiming it is for a Unicode Transformation Format (UTF).
>
> Then they ought not to call it "UTF-8" or "extended" or "modified"
> UTF-8, or anything of the sort, even if the bit-shifting algorithm is
> based on UTF-8.
>
> "UTF-8 encoding form" is defined as a mapping of Unicode scalar values
> -- not arbitrary integers -- onto byte sequences. [D92]

If it extends the mapping of Unicode scalar values *into* byte sequences, then it's an extension. A non-trivial extension of a mapping of scalar values has to have a larger domain.

I'm assuming that 'UTF-8' and 'UTF' are not registered trademarks.

Richard.
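To make the "larger domain" point concrete, here is a minimal Python sketch. It is not Perl's actual implementation. It assumes the older 1-to-6-byte bit-shifting scheme that covers integers below 2**31, and the names is_scalar_value, encode_generalized and encode_utf8 are made up for this illustration. The two encoders produce identical bytes for every Unicode scalar value; they differ only in which inputs they accept.

```python
def is_scalar_value(cp: int) -> bool:
    """True for Unicode scalar values: 0..0x10FFFF excluding surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encode_generalized(cp: int) -> bytes:
    """UTF-8-style bit shifting for any integer below 2**31.

    Hypothetical helper: it follows the 1-to-6-byte layout of the
    original UTF-8 design, not any particular Perl internals.
    """
    if not 0 <= cp < (1 << 31):
        raise ValueError("integer outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])
    # Smallest sequence length n whose payload bits can hold cp.
    for n, limit in ((2, 0x800), (3, 0x10000), (4, 0x200000),
                     (5, 0x4000000), (6, 0x80000000)):
        if cp < limit:
            break
    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n one-bits, then a zero bit
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    out.append(lead_prefix | cp)        # lead byte carries the top bits
    return bytes(reversed(out))


def encode_utf8(cp: int) -> bytes:
    """Same mapping, restricted to the domain that D92 allows."""
    if not is_scalar_value(cp):
        raise ValueError("not a Unicode scalar value")
    return encode_generalized(cp)
```

For example, encode_utf8(0x20AC) and encode_generalized(0x20AC) both give the bytes E2 82 AC, while encode_generalized(0x110000) gives F4 90 80 80 and encode_utf8(0x110000) raises ValueError. That difference in accepted inputs is the extension being argued about.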
Re: Question about Perl5 extended UTF-8 design
On 05.11.2015 at 23:11, Ilya Zakharevich wrote:

> First of all, “reserved” means that they have no meaning. Right?

Almost. “Reserved” means that they currently have no meaning but may be assigned a meaning later; hence you ought not to use them, lest your programs or data be invalidated by later amendments of the pertinent specification.

In contrast, “invalid”, or “ill-formed” (the Unicode term), means that the particular bit pattern may never be used in a sequence that purports to represent Unicode characters. In practice, that means that no program is allowed to send those ill-formed patterns in Unicode-based data exchange, and every program should refuse to accept those ill-formed patterns in Unicode-based data exchange.

What a program does internally is at the discretion (or should I say: “whim”?) of its author, of course, as long as the overall effect of the program complies with the standard.

Best wishes,
Otto Stolz
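As an illustration of refusing ill-formed patterns at the point of data exchange, here is a minimal Python sketch. The function name accept_utf8 is made up for this example; it relies only on Python's built-in strict UTF-8 decoder, which rejects overlong forms, encoded surrogates, values above U+10FFFF and the old 5- and 6-byte lead bytes. Whether a code point is assigned or merely reserved plays no part in the check; only well-formedness does.

```python
def accept_utf8(data: bytes) -> str:
    """Decode incoming bytes, refusing anything that is ill-formed UTF-8.

    Hypothetical boundary check: internal representations may be more
    liberal, but data received from outside is validated strictly.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"ill-formed UTF-8 at byte offset {exc.start}"
        ) from exc
```

With this check, accept_utf8(b"\xe2\x82\xac") returns the euro sign, whereas the overlong form b"\xc0\x80", the encoded surrogate b"\xed\xa0\x80" and the out-of-range sequence b"\xf4\x90\x80\x80" all raise ValueError.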