Re: [precis] Applying the rules three times to get a stable output string?

Christian Schudt Sat, 09 Dec 2017 14:41:32 -0800

I just wrote a Java test that checks idempotency for all code points (0 - 
0x10FFFF) for all 4 profiles (opaque, username, username preserved, nickname).


The result is as you suspected:

Only the Nickname profile requires additional application of the rules in order 
to stabilize the output string.
But there is no code point which requires more than one additional iteration.

The other 3 profiles are idempotent on the first run.

I tested with Java 8, which uses Unicode 6.2.0.
If the tests fail on Java 9 (Unicode 8.0), I’ll report back.
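
For illustration, here is a minimal sketch of this kind of check. It is not the
test mentioned above. It assumes a simplified Nickname-like pipeline (map
Unicode space separators to ASCII space, strip and collapse spaces, lowercase,
then NFKC) built only on JDK classes, and it does not enforce the FreeformClass
rules, so a full scan with it may behave differently from a real PRECIS
implementation. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

final class NicknameIdempotencySketch {

    private static final Pattern EDGE_SPACES = Pattern.compile("^ +| +$");
    private static final Pattern SPACE_RUNS = Pattern.compile(" {2,}");

    /** One simplified pass: space handling, lowercasing, then NFKC. */
    static String applyOnce(String input) {
        // Assumption: every Zs code point is mapped to an ASCII space.
        StringBuilder mapped = new StringBuilder();
        input.codePoints().forEach(cp -> mapped.appendCodePoint(
                Character.getType(cp) == Character.SPACE_SEPARATOR ? ' ' : cp));
        String s = EDGE_SPACES.matcher(mapped).replaceAll("");
        s = SPACE_RUNS.matcher(s).replaceAll(" ");
        s = s.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    /**
     * Returns the smallest n >= 1 such that applying the pass once more to the
     * n-times processed string changes nothing, or -1 if not stable within limit.
     */
    static int passesToStabilize(String input, int limit) {
        String current = applyOnce(input);
        for (int passes = 1; passes <= limit; passes++) {
            String next = applyOnce(current);
            if (next.equals(current)) {
                return passes;
            }
            current = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        // The four code points quoted further down in this thread.
        for (String v : new String[] {"\u210c", "\u20a8", "\u00a8", "\u02dc"}) {
            System.out.printf("U+%04X needs %d pass(es)%n",
                    v.codePointAt(0), passesToStabilize(v, 5));
        }
        // Scan all assigned code points, skipping lone surrogates.
        int worst = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!Character.isDefined(cp) || (cp >= 0xD800 && cp <= 0xDFFF)) {
                continue;
            }
            worst = Math.max(worst, passesToStabilize(new String(Character.toChars(cp)), 5));
        }
        System.out.println("most passes needed by a single code point: " + worst);
    }
}
```

With this sketch, U+210C becomes "H" on the first pass because it has no
lowercase mapping of its own, and only the second pass lowercases it to "h".
U+00A8 decomposes to a space followed by a combining mark, so the leading
space is only stripped on the second pass.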


— Christian


> On 09.12.2017 at 23:09, William Fisher <william.w.fis...@gmail.com> wrote:
> 
> I did not come across any code points where IdentifierClass/Usernames
> required multiple passes to make the result idempotent. Only the
> Nickname profile is affected, due to the interaction between NFKC and
> the case/space rules.
> 
> My implementation applies an extra iteration for the Nickname profile.
> The other profiles verify that the result is idempotent and raise a
> DISALLOWED/not_idempotent error if this is violated. I do not believe
> there are legal inputs for Usernames which violate the idempotency
> requirement, so this is purely defensive.
> 
> 
> On Sat, Dec 9, 2017 at 2:27 PM, Christian Schudt
> <christian.sch...@gmx.de> wrote:
>> Great, thanks! These code points revealed some bugs :-). They should have 
>> been included in the Examples.
>> 
>> Are there any known code points for the IdentifierClass / Usernames as well?
>> Seems like all these code points are disallowed anyway.
>> 
>> If not, implementations could save 1-2 iterations and only apply the 
>> „3-times“-rule for FreeformClass.
>> 
>> 
>> 
>>> On 09.12.2017 at 20:34, William Fisher <william.w.fis...@gmail.com> wrote:
>>> 
>>> Where it makes a difference for NicknameCaseMapped:
>>> 
>>> "\u210c"
>>> "\u20a8"
>>> 
>>> Where it makes a difference for Nickname due to spaces:
>>> 
>>> "\u00a8"
>>> "\u02dc"
>>> 
>>> 
>>> On Sat, Dec 9, 2017 at 8:37 AM, Christian Schudt
>>> <christian.sch...@gmx.de> wrote:
>>>> Hi,
>>>> 
>>>> RFC 8264 introduced these new sentences:
>>>> 
>>>>  under certain circumstances, such as when Unicode
>>>>  Normalization Form KC is used, performing Unicode normalization after
>>>>  case mapping can still yield uppercase characters for certain code
>>>>  points
>>>> 
>>>>  Therefore, an implementation SHOULD apply the rules
>>>>  repeatedly until the output string is stable
>>>> 
>>>> 
>>>> I could imagine these sentences refer to code points of the „Unstable“ 
>>>> category, but this category is unused.
>>>> 
>>>> Are there any concrete code points or input strings which show this 
>>>> unstable behaviour?
>>>> I am asking for some test vectors, i.e. an input string, which doesn’t 
>>>> have the expected output string after the first rule application, but 
>>>> after the second one.
>>>> 
>>>> Thanks,
>>>> — Christian
>>>> _______________________________________________
>>>> precis mailing list
>>>> precis@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/precis
>> 

