On Sat, Feb 28, 2009 at 8:58 AM, Shawn Erickson <shaw...@gmail.com> wrote:
> On Sat, Feb 28, 2009 at 8:45 AM, Clark Cox <clarkc...@gmail.com> wrote:
>
>>>... not sure what Michael is
>>> talking about.
>>
>> On Leopard, invalid bytes will indeed be escaped:
>
> Ah going back over the email chain I now get the context of the
> conversation when Michael made his comment about escaping.
>
> Anyway, I was mostly pointing out that it isn't HFS+ doing this; it is
> the POSIX APIs, which expect UTF-8 and presumably now escape invalid
> (non-UTF-8) bytes somewhere. HFS+, as I noted, doesn't work with UTF-8.

Ah, so the escaping comes from utf8_decodestr (vfs_utfconv.c, shared by
all file systems) when the caller passes the UTF_ESCAPE_ILLEGAL option.
Without that option, an illegal sequence makes it return EINVAL.

/*
 * utf8_decodestr - Decodes a UTF-8 string into Unicode
 *
 * This function takes an UTF-8 input string, utf8p, of utf8len bytes
 * and produces the Unicode output into a buffer of buflen bytes pointed
 * to by ucsp. The size of the output in bytes (not including a NULL
 * termination byte) is returned in ucslen. Both buffers must reside
 * in kernel memory.
 *
 * If '/' chars are allowed in the Unicode output then an alternate
 * (replacement) char must be provided in altslash.
 *
 * FLAGS
 *    UTF_REV_ENDIAN:  Unicode byte order is opposite current runtime
 *
 *    UTF_BIG_ENDIAN:  Unicode byte order is always big endian
 *
 *    UTF_LITTLE_ENDIAN:  Unicode byte order is always little endian
 *
 *    UTF_DECOMPOSED:  generate fully decomposed output (NFD)
 *
 *    UTF_PRECOMPOSED:  generate precomposed output (NFC)
 *
 *    UTF_ESCAPE_ILLEGAL:  percent escape any illegal UTF-8 input
 *
 * ERRORS
 *    ENAMETOOLONG:  output did not fit; only ucslen bytes were decoded.
 *
 *    EINVAL:  illegal UTF-8 sequence encountered.
 */
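
To make the two behaviours concrete, here is a minimal user-space sketch
of the idea. It is not the XNU code: the function name, the flag name,
and the uppercase hex digits are my own assumptions, and unlike
utf8_decodestr it leaves the output as bytes instead of converting to
UTF-16. It only shows "escape the bad byte as %XX" versus "fail with
EINVAL".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag for this sketch; it only mirrors the idea of UTF_ESCAPE_ILLEGAL. */
#define SKETCH_ESCAPE_ILLEGAL 0x01

/* Length (1..4) of the well-formed UTF-8 sequence at p, or 0 if it is illegal. */
static size_t sketch_utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char b0 = p[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the second byte */
    size_t need, i;

    if (b0 < 0x80)
        return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;      /* reject overlong forms */
        if (b0 == 0xED) hi = 0x9F;      /* reject UTF-16 surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;      /* reject overlong forms */
        if (b0 == 0xF4) hi = 0x8F;      /* reject values above U+10FFFF */
    } else {
        return 0;                       /* stray continuation or invalid lead byte */
    }
    if (avail < need || p[1] < lo || p[1] > hi)
        return 0;
    for (i = 2; i < need; i++) {
        if (p[i] < 0x80 || p[i] > 0xBF)
            return 0;
    }
    return need;
}

/*
 * Copy inlen bytes from in to out (NUL-terminated, at most outsize bytes).
 * Well-formed UTF-8 is copied unchanged. An illegal byte is written as
 * "%XX" if SKETCH_ESCAPE_ILLEGAL is set; otherwise EINVAL is returned.
 * ENAMETOOLONG is returned if the output does not fit; *outlen then holds
 * the number of bytes written so far.
 */
int sketch_escape_utf8(const char *in, size_t inlen,
                       char *out, size_t outsize, size_t *outlen, int flags)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL && outlen != NULL);
    *outlen = 0;
    if (outsize == 0)
        return ENAMETOOLONG;

    while (i < inlen) {
        size_t n = sketch_utf8_seq_len(p + i, inlen - i);
        size_t want = (n > 0) ? n : 3;  /* a "%XX" escape takes 3 bytes */

        if (n == 0 && !(flags & SKETCH_ESCAPE_ILLEGAL)) {
            out[o] = '\0';
            *outlen = o;
            return EINVAL;
        }
        if (o + want >= outsize) {      /* keep room for the trailing NUL */
            out[o] = '\0';
            *outlen = o;
            return ENAMETOOLONG;
        }
        if (n > 0) {
            memcpy(out + o, p + i, n);
            i += n;
        } else {
            snprintf(out + o, 4, "%%%02X", (unsigned)p[i]);
            i += 1;                     /* escape one byte, then resynchronise */
        }
        o += want;
    }
    out[o] = '\0';
    *outlen = o;
    return 0;
}
```

With the flag set, the three input bytes 'a', 0xFF, 'z' come out as the
five characters a%FFz; with flags set to 0 the same input gives EINVAL.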

At this time it looks like only the HFS+ file system code specifies
this flag when converting incoming UTF-8 names to the HFS+ Unicode
encoding. Interestingly, it isn't applied universally when HFS+ gets
UTF-8 names from its callers. It appears to be used only on catalog
entry creation and lookup, not for attribute names or post-creation
name comparison. (I only took a very quick look over the HFS+ code in
XNU, so I could be misunderstanding the pathways a little.)

-Shawn
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
