Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Karl Williamson

On 11/06/2015 01:32 PM, Richard Wordingham wrote:
> On Thu, 05 Nov 2015 13:41:42 -0700
> "Doug Ewell" wrote:
>
> > Richard Wordingham wrote:
> >
> > > No-one's claiming it is for a Unicode Transformation Format (UTF).
> >
> > Then they ought not to call it "UTF-8" or "extended" or "modified"
> > UTF-8, or anything of the sort, even if the bit-shifting algorithm is
> > based on UTF-8.
> >
> > "UTF-8 encoding form" is defined as a mapping of Unicode scalar values
> > -- not arbitrary integers -- onto byte sequences. [D92]
>
> If it extends the mapping of Unicode scalar values *into* byte
> sequences, then it's an extension.  A non-trivial extension of a
> mapping of scalar values has to have a larger domain.
>
> I'm assuming that 'UTF-8' and 'UTF' are not registered trademarks.
>
> Richard.



I have no idea how my original message ended up being marked to send to 
this list.  I'm sorry.  It was meant to be a personal message for 
someone who I believe was involved in the original design.


Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Richard Wordingham
On Thu, 05 Nov 2015 13:41:42 -0700
"Doug Ewell"  wrote:

> Richard Wordingham wrote:
> 
> > No-one's claiming it is for a Unicode Transformation Format (UTF).
> 
> Then they ought not to call it "UTF-8" or "extended" or "modified"
> UTF-8, or anything of the sort, even if the bit-shifting algorithm is
> based on UTF-8.

> "UTF-8 encoding form" is defined as a mapping of Unicode scalar values
> -- not arbitrary integers -- onto byte sequences. [D92]

If it extends the mapping of Unicode scalar values *into* byte
sequences, then it's an extension.  A non-trivial extension of a
mapping of scalar values has to have a larger domain.
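To make the "larger domain" point concrete, here is a minimal Python sketch, not Perl's actual scheme. It assumes the historical six-byte UTF-8 bit pattern, which covers integers below 2**31; the name `encode_extended` is made up for illustration. On Unicode scalar values it should agree with standard UTF-8, and it also accepts integers that the standard mapping excludes.

```python
def encode_extended(n: int) -> bytes:
    """Encode 0 <= n < 2**31 with the UTF-8 bit-shifting pattern.

    Hypothetical helper: follows the historical 6-byte layout,
    not any particular Perl implementation.
    """
    if n < 0 or n > 0x7FFFFFFF:
        raise ValueError("out of range for this sketch")
    if n < 0x80:
        return bytes([n])  # single byte, identical to ASCII
    # (sequence length, exclusive upper limit, lead-byte marker)
    for length, limit, marker in (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    ):
        if n < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (n & 0x3F))  # continuation byte: 10xxxxxx
        n >>= 6
    return bytes([marker | n] + tail[::-1])


# Same result as standard UTF-8 on a scalar value (U+20AC):
same = encode_extended(0x20AC) == "\u20ac".encode("utf-8")

# Larger domain: 0x110000 is not a scalar value, yet it gets a byte sequence.
beyond = encode_extended(0x110000)
```

A strict UTF-8 decoder rejects the bytes in `beyond`, which is the sense in which the extended mapping is not the UTF-8 encoding form of D92.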

I'm assuming that 'UTF-8' and 'UTF' are not registered trademarks.

Richard.


Re: Question about Perl5 extended UTF-8 design

2015-11-06 Thread Otto Stolz

On 05.11.2015 at 23:11, Ilya Zakharevich wrote:

> First of all, “reserved” means that they have no meaning.  Right?


Almost.

“Reserved” means that they currently have no meaning
but may be assigned a meaning later; hence you ought
not to use them, lest your programs or data be invalidated
by later amendments to the pertinent specification.

In contrast, “invalid” or “ill-formed” (the Unicode term)
means that the particular bit pattern may never be used
in a sequence that purports to represent Unicode characters.
In practice, that means that no program is allowed to
send those ill-formed patterns in Unicode-based data exchange,
and every program should refuse to accept them in
Unicode-based data exchange.

What a program does internally is at the discretion (or should
I say: “whim”?) of its author, of course – as long as the
overall effect of the program complies with the standard.
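As a rough illustration of that boundary, here is a small Python sketch. It assumes Python's built-in strict UTF-8 decoder is the check applied to incoming data, and takes an unassigned code point (U+0378 at the time of writing) as one example of "reserved"; the name `accept_utf8` is hypothetical.

```python
from typing import Optional


def accept_utf8(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if the bytes are ill-formed UTF-8.

    Hypothetical gatekeeper for data arriving from outside the program.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


# Well-formed encoding of an unassigned ("reserved") code point: accepted,
# because well-formedness does not depend on assignment.
reserved = accept_utf8(b"\xcd\xb8")

# Ill-formed sequences: refused.
overlong = accept_utf8(b"\xc0\x80")            # overlong form of U+0000
surrogate = accept_utf8(b"\xed\xa0\x80")       # encoded surrogate D800
too_large = accept_utf8(b"\xf4\x90\x80\x80")   # beyond U+10FFFF
```

Whatever representation the program uses internally, a check of this kind on input, and its counterpart on output, is what keeps the exchanged data conformant.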

Best wishes,
  Otto Stolz