Thanks a lot for your insights! (I'm cc'ing the mailing list, just in case someone else later stumbles across this.)
Best, G.

> On 14. Mar 2022, at 14:57, Aandi Inston <aa...@quite.com> wrote:
>
> This is largely from memory, so details might be wrong.
>
> Normalisation is an insufficiently known thing to consider when working with
> Unicode. (We all know that Unicode is a list of code points, i.e. integers.)
>
> Here are some Unicode points for this discussion:
>
> U+0065 "e" Latin Small Letter E
> U+00E9 "é" Latin Small Letter E with Acute
> U+0301 "◌́" Combining Acute Accent - this may not display as expected
>
> Many languages have accents that change letters, so we have "e" plus "acute"
> to get "e acute". In Unicode there are two ways to get "e acute". One is the
> single Unicode point U+00E9. The other is the TWO characters "e" and
> "combining acute accent", so U+0065 followed by U+0301. U+0301 does not take
> any space for itself, but puts an acute accent over the character before it.
> (Not all accented letters have two representations like this, and some
> have more than two.)
>
> So, what's the difference between U+00E9 and U+0065 followed by U+0301?
> They will look the same, but a string with the second form will be 1
> character longer, and the offsets of all characters after it will be changed.
> Are they equal? Well, no, not in simple terms, because they are different
> lists of characters.
>
> Do we get both? YES. The Mac OS file systems store the long form. On a French
> keyboard, if you type e acute, you get the short form. If you copy and paste,
> it could be either. This can be bad. For example, if you get a list of the
> files in a folder, and allow the user to type a name to choose the file, there
> might not be a match, even though the user can see one.
>
> To get over this, we have normalisation forms. Unicode defines four of them:
> C, D, KC and KD. precomposedStringWithCanonicalMapping converts to form C.
> It doesn't really matter which form you pick, but if you run all your strings
> through precomposedStringWithCanonicalMapping, then you will get more
> predictable results when comparing strings.
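
A minimal sketch of the comparison problem described above, written in Python rather than Cocoa so it can be run anywhere. It assumes that Python's unicodedata.normalize("NFC", ...) plays the same role as precomposedStringWithCanonicalMapping (both produce form C). The helper names same_text and find_file, and the file name, are made up for illustration and are not from the original message.

```python
import unicodedata
from typing import List, Optional

# The two spellings of "e acute" from the discussion.
precomposed = "\u00e9"   # U+00E9, a single code point (the short form)
decomposed = "e\u0301"   # U+0065 followed by U+0301 (the long form)


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_file(typed: str, names: List[str]) -> Optional[str]:
    """Hypothetical helper: return the stored name that matches what the
    user typed, comparing in form C so both spellings match."""
    wanted = unicodedata.normalize("NFC", typed)
    for name in names:
        if unicodedata.normalize("NFC", name) == wanted:
            return name
    return None


if __name__ == "__main__":
    # The long form is one code point longer than the short form.
    print(len(precomposed), len(decomposed))
    # A plain comparison treats them as different strings.
    print(precomposed == decomposed)
    # After normalisation they compare equal.
    print(same_text(precomposed, decomposed))
    # A file name stored in the long form, looked up with the short form.
    stored = ["re\u0301sume\u0301.txt"]
    print(find_file("r\u00e9sum\u00e9.txt", stored) is not None)
```

In Cocoa the same idea would be to pass both the stored name and the typed name through precomposedStringWithCanonicalMapping before comparing them.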
_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com