On HFS+ (which appears to be the default Mac filesystem prior to High
Sierra), Unicode names are "normalized" before being recorded.  Thus
with a script like:

    mkdir tmp
    cd tmp

    auml=$(printf "\303\244")
    aumlcdiar=$(printf "\141\314\210")
    >"$auml"

    echo "auml:          " $(printf "%s" "$auml" | xxd)
    echo "aumlcdiar:     " $(printf "%s" "$aumlcdiar" | xxd)
    echo "Dir contents:  " $(printf "%s" * | xxd)

    echo "Stat auml:     " "$(stat -f "%i   %Sm   %Su %N" "$auml")"
    echo "Stat aumlcdiar:" "$(stat -f "%i   %Sm   %Su %N" "$aumlcdiar")"

We see output like:

    auml:           00000000: c3a4 ..
    aumlcdiar:      00000000: 61cc 88 a..
    Dir contents:   00000000: 61cc 88 a..
    Stat auml:      857473   Apr 26 09:40:40 2018   newren ä
    Stat aumlcdiar: 857473   Apr 26 09:40:40 2018   newren ä

On APFS, which appears to be the new default filesystem in Mac OS High
Sierra, we instead see:

    auml:           00000000: c3a4 ..
    aumlcdiar:      00000000: 61cc 88 a..
    Dir contents:   00000000: c3a4 ..
    Stat auml:      8591766636   Apr 26 09:40:59 2018   newren ä
    Stat aumlcdiar: 8591766636   Apr 26 09:40:59 2018   newren ä

That is, APFS appears to record the filename exactly as the user
specified it, but still lets the user access the file via any name
that normalizes to the same thing.  This difference causes the final
two tests in t0050-filesystem.sh to fail.  I could change the
"UTF8_NFD_TO_NFC" flag check in test-lib.sh to test the exit code of
stat instead, which would make these two tests pass, but I have no
idea whether that would just be papering over problems elsewhere.
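
For concreteness, here is a rough sketch of what a stat-based check
could look like.  This is not the actual test-lib.sh code; the function
name nfd_aliases_nfc and the use of mktemp are my own assumptions, made
only for illustration.  It creates the file under the precomposed name
and asks whether stat can find it under the decomposed one:

```shell
# Hypothetical helper (not taken from test-lib.sh): succeeds when a file
# created under its precomposed (NFC) name can also be reached through
# the decomposed (NFD) spelling of the same name.
nfd_aliases_nfc () {
	dir=$(mktemp -d) || return 1
	auml=$(printf '\303\244')            # U+00E4, precomposed
	aumlcdiar=$(printf '\141\314\210')   # "a" + U+0308, decomposed

	# Create the NFC-named file, then rely only on stat's exit code
	# for the NFD name; the directory listing is never inspected.
	if >"$dir/$auml" && stat "$dir/$aumlcdiar" >/dev/null 2>&1
	then
		status=0
	else
		status=1
	fi

	rm -rf "$dir"
	return $status
}

if nfd_aliases_nfc
then
	echo "NFD and NFC names refer to the same file"
else
	echo "NFD and NFC names are distinct files"
fi
```

Going by the output above, I would expect this to succeed on both HFS+
and APFS, since stat found the file under both names in each case,
whereas a check that compares the bytes of the directory listing
distinguishes the two filesystems.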

I dislike Mac OS and avoid it, so I'd prefer to find someone else
motivated to fix this.  If no one is, I may eventually try to fix this
up...in a year or three from now.  But is someone else interested?
Would this serve as a good microproject for our microprojects list (or
are the internals hairy enough that this is too big of a project for
that list)?


Elijah
