On Mar 21, 2013, at 6:05 PM, Andrew Thompson <lordpi...@me.com> wrote:

> 
> 
> On Mar 21, 2013, at 2:10 PM, Aki Inoue <a...@apple.com> wrote:
> 
>> For that matter, even UTF-32 (aka UCS-4) data cannot be safely truncated 
>> at just any 4-byte boundary.
> 
> You're thinking of combining marks here?
Yes.

> It's generally claimed that one can multiply character offsets by 4 to index 
> into UCS-4 data… which I think I now see is only true depending on your 
> definition of character; i.e., whether one considers a decomposed sequence to 
> be one character or two.
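
To make the distinction concrete, here is a small Python sketch (not from the
original thread; the helper name code_point_at is hypothetical). It shows that
multiplying an offset by 4 does index code points in UTF-32 data, but that a
decomposed "é" occupies two such slots while the precomposed form occupies one.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def code_point_at(data: bytes, index: int) -> str:
    """Return the code point at `index` in UTF-32LE data (hypothetical helper)."""
    return data[index * 4:(index + 1) * 4].decode("utf-32-le")


data = decomposed.encode("utf-32-le")

# offset * 4 indexing works at the code point level...
first = code_point_at(data, 0)
second = code_point_at(data, 1)
print(first, unicodedata.name(second))

# ...but the same user-perceived character takes 8 bytes here and 4 bytes
# in its precomposed form.
print(len(data), len(composed.encode("utf-32-le")))
```

So "character offset times 4" holds only if "character" means code point,
not what a reader would see as one letter.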

> I see how truncation would be unsafe because you'd chop off the accents etc?
Yes.
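
As a rough illustration of the truncation problem (a sketch, not code from this
thread), the Python below backs the cut point up when the first dropped code
point is a combining mark. The function name truncate_utf32 is hypothetical,
and unicodedata.combining() only detects marks with a nonzero canonical
combining class, so this is a simplification of full composed character
sequence handling.

```python
import unicodedata


def truncate_utf32(data: bytes, max_bytes: int) -> bytes:
    """Truncate UTF-32LE data without separating a base from its combining marks.

    Simplified sketch: only code points with a nonzero canonical combining
    class are treated as attached to the preceding character.
    """
    text = data.decode("utf-32-le")
    end = min(len(text), max_bytes // 4)
    # If the first code point that would be dropped is a combining mark,
    # the cut falls inside a sequence; back up to before its base character.
    while 0 < end < len(text) and unicodedata.combining(text[end]) != 0:
        end -= 1
    return text[:end].encode("utf-32-le")


sample = "cafe\u0301s".encode("utf-32-le")   # c a f e + combining acute, s

naive = sample[:16].decode("utf-32-le")      # plain cut at a 4-byte boundary
safe = truncate_utf32(sample, 16).decode("utf-32-le")
print(repr(naive), repr(safe))
```

The naive cut keeps the bare "e" and silently drops its accent; the sketch
drops the whole "e + accent" sequence instead.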

Aki



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
