On Sat, Nov 14, 2015 at 4:14 AM, Yuri Pankov <yuri.pan...@nexenta.com>
wrote:

> I'm trying to understand the idea behind the "normalization" property in
> ZFS.
>
> What's the original idea behind the normalization when "normalization" is
> set to "none" - is it "Or we could choose to be normalization-insensitive
> on LOOKUP and normalization-preserving on CREATE." as described in [1]?
>

According to the zfs.1m manpage: "File names are always stored unmodified,
names are normalized as part of any comparison process."  In general,
normalization and casesensitivity work similarly: we always store the
specified bytes, but depending on the settings, some byte sequences may be
considered "identical" from the point of view of lookup and create
operations (in terms of determining if an entry exists).

Therefore:
 - when you list the entries, you will always see the byte sequence you
used to create a file.
 - when you look up a byte sequence, it may match a file whose name is a
different byte sequence but which is considered equivalent according to
the normalization and casesensitivity properties.  (E.g. with
casesensitivity=insensitive and an existing file named "foo", a lookup of
"Foo" will match the existing file.)
 - when you create a file, it may fail with EEXIST if there is a file with
a name that is equivalent according to the normalization and
casesensitivity properties.
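
The three rules above can be sketched as a toy model in Python.  This is
only an illustration, not the ZFS implementation: the ToyDirectory class,
its parameters, and the use of unicodedata.normalize() and str.casefold()
as the comparison function are all assumptions made for the example.

```python
import unicodedata


class ToyDirectory:
    """Toy model of a directory (hypothetical; not how ZFS is implemented)."""

    def __init__(self, normalization=None, case_insensitive=False):
        self.normalization = normalization      # None, "NFC", "NFD", ...
        self.case_insensitive = case_insensitive
        self.entries = {}                       # comparison key -> name as created

    def _key(self, name):
        # The key is used only for comparison; it is never what gets listed.
        if self.normalization is not None:
            name = unicodedata.normalize(self.normalization, name)
        if self.case_insensitive:
            name = name.casefold()
        return name

    def create(self, name):
        key = self._key(name)
        if key in self.entries:
            raise FileExistsError(name)         # plays the role of EEXIST
        self.entries[key] = name                # store the name unmodified

    def lookup(self, name):
        return self.entries[self._key(name)]    # KeyError plays the role of ENOENT

    def list(self):
        return sorted(self.entries.values())


d = ToyDirectory(case_insensitive=True)
d.create("foo")
found = d.lookup("Foo")     # matches the existing entry "foo"
listing = d.list()          # the name exactly as it was created
```

Creating "FOO" in that directory would raise FileExistsError, while the
listing keeps showing "foo".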

normalization=none means that we do not do normalization, so even if two
characters look the same, if they use different byte sequences, they will
be considered to be distinct.  (Analogous to casesensitivity=sensitive.)
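
To make the "different byte sequences" point concrete, here is a small
Python check using the two spellings of "ü" from your example.  It only
illustrates the Unicode side of things; treating normalization=none as a
plain string comparison is an assumption of the sketch.

```python
import unicodedata

nfc = "\u00fc"      # precomposed U+00FC
nfd = "u\u0308"     # "u" followed by U+0308 COMBINING DIAERESIS

# These are the byte sequences passed to touch in the example.
nfc_bytes = nfc.encode("utf-8")     # b"\xc3\xbc"
nfd_bytes = nfd.encode("utf-8")     # b"\x75\xcc\x88"

# Without normalization the two names are simply different strings.
equal_raw = nfc == nfd

# After normalizing both sides to the same form they compare equal.
equal_nfc = unicodedata.normalize("NFC", nfc) == unicodedata.normalize("NFC", nfd)
```

Here equal_raw is False and equal_nfc is True, which is why both names can
coexist when normalization=none.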

Hopefully the answers to your specific questions below are obvious given
the above principles:


> When comparing filenames for other "normalization" values, which part of
> the comparison do we normalize - the stored filename, or the one in lookup
> request?


We normalize on lookup; the stored name is never modified.


>
> Currently I'm seeing that "normalization-preserving on CREATE" part is
> there, but "normalization-insensitive on LOOKUP" is not:


> # zfs create -o mountpoint=/norm/n -o utf8only=on -o normalization=none
> rpool/formN
>

That's because you requested that it not be, by setting normalization=none.


> # cd /norm/n
> # touch $( echo "\xc3\xbc" )
> # touch $( echo "\x75\xcc\x88" )
> # ls
> ü  ü
> # LC_ALL=C ls -b
> u\314\210 \303\274
>
> What of the following is correct per design, not as currently implemented
> (given we have the "same" filename with "ü" character in NFC and NFD forms
> as "fileC" and "fileD"):
>
> A. for all normalization settings the filename itself is NOT modified.
>

Correct.


>
> B. normalization=none
> - creating either of fileC OR fileD is OK, creating another form when one
> exists is NOT.
>

Incorrect, the names are not equivalent according to normalization=none, so
you can create both names.


>
> C.  normalization=formC
> - creating either of fileC OR fileD is OK
>

Correct.


> - C1. fileC exists, creating fileD is OK;


Incorrect, these names are equivalent according to normalization=formC, so
you will get EEXIST when creating fileD.


>

> fileD exists, creating fileC isn't OK - normalizing stored filename.
>

Correct, creating fileC will get EEXIST.


> - OR
> - C2. fileC exists, creating fileD isn't OK;


Correct, creating fileD will get EEXIST.


> fileD exists, creating fileC is OK - normalizing the looked up filename.
>

Incorrect, creating fileC will get EEXIST.


>
> D. normalization=formD, same as C, swapping the fileC and fileD.
>

Same as with formC, because you said the names are equivalent under both
formC and formD.
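
As a rough check of cases B through D, the following Python sketch
computes what happens to the second create for each setting and each
creation order.  The mapping of formC/formD to NFC/NFD and the
second_create() helper are assumptions made for the illustration, not
ZFS code.

```python
import unicodedata

FILE_C = "\u00fc"     # "ü" in NFC
FILE_D = "u\u0308"    # "ü" in NFD

# Assumed mapping from the ZFS property values to Unicode normalization forms.
FORMS = {"formC": "NFC", "formD": "NFD"}


def second_create(setting, first, second):
    """Outcome of creating `second` when `first` already exists (hypothetical helper)."""
    form = FORMS.get(setting)     # None models normalization=none
    if form is None:
        return "EEXIST" if first == second else "OK"
    same = unicodedata.normalize(form, first) == unicodedata.normalize(form, second)
    return "EEXIST" if same else "OK"


outcomes = {
    (setting, order): second_create(setting, *names)
    for setting in ("none", "formC", "formD")
    for order, names in (("C then D", (FILE_C, FILE_D)),
                         ("D then C", (FILE_D, FILE_C)))
}
```

In this model the order of creation does not matter: with none both
creates succeed, and with formC or formD the second create fails either
way.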


>
>
> 1. https://blogs.oracle.com/nico/entry/filesystem_i18n


That post seems accurate to me.

--matt


