Re: Let's discuss about unicode compositions for filenames!

Julian Foad Fri, 03 Feb 2012 06:02:52 -0800

Hiroaki Nakamura wrote:

>>>  It would be nice if we could normalize paths in the repository without
>>>  having to perform a dump/reload cycle, but I don't know how that
>>>  would work in FSFS.
>> 
>>  It won't.  Changing the encoding increases the length (in bytes) of the
>>  string (in the dirents hash, for example), and thus changes the offsets
>>  of the node-revs that are later in the file --- to which subsequent
>>  revisions, and the id's of those node-revs, refer.
> 
> Changing from NFD to NFC does not increase the length.
> The length will be the same or smaller, not larger.


You may well be correct that NFC is never longer than NFD, but that's not the 
question.  The question is whether the NFC form may be longer than the current 
paths, which are not normalized to either form C or form D.  And the answer 
is yes, it may be longer.  See 
<http://unicode.org/faq/normalization.html#11>.
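To make the point concrete, here is a minimal sketch using Python's standard
unicodedata module.  The helper name nfc_utf8_sizes and the sample strings are
made up for illustration and are not taken from any Subversion code.  It
compares the UTF-8 byte length of a string before and after conversion to NFC.

```python
import unicodedata


def nfc_utf8_sizes(path):
    """Return (bytes before, bytes after) NFC normalization, measured in UTF-8."""
    normalized = unicodedata.normalize("NFC", path)
    return len(path.encode("utf-8")), len(normalized.encode("utf-8"))


# Hypothetical path components, chosen only for illustration.
samples = {
    # Already in NFD; NFC composes it into a single code point.
    "decomposed e-acute": "e\u0301",
    # Precomposed, but on the composition exclusion list, so NFC decomposes it.
    "DEVANAGARI LETTER QA": "\u0958",
    # Also composition-excluded; NFC expands it into three code points.
    "MUSICAL SYMBOL EIGHTH NOTE": "\U0001D160",
}

for label, text in samples.items():
    before, after = nfc_utf8_sizes(text)
    print(f"{label}: {before} -> {after} bytes")
```

The first sample shrinks, which matches the NFD-to-NFC case.  The other two
are neither NFC nor NFD on input and grow when converted to NFC (U+0958 goes
from 3 bytes to 6), so a path stored un-normalized can get longer.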


> Here I quote from
> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>   > The proposed internal 'normal form' should be NFC, if only if
>   > it were because it's the most compact form of the two:  when
>   > allocating memory to store a conversion result, it won't be
>   > necessary (ever) to allocate more than the size of the input buffer.

That statement seems to be talking about converting between NFC and NFD, not 
from un-normalized to normalized.

- Julian
