On Aug 26, 2009, at 10:43 AM, Michael Ash wrote:

On Wed, Aug 26, 2009 at 5:42 AM, Ken Thomases<k...@codeweavers.com> wrote:
On Aug 25, 2009, at 7:21 PM, Ross Carter wrote:

I haven't tried it, but this should work:

       NSAttributedString* original = whatever;
       NSMutableAttributedString* normalized = [[original mutableCopy] autorelease];
       // Take the proxy from the mutable copy; NSAttributedString has no -mutableString.
       CFMutableStringRef str = (CFMutableStringRef)[normalized mutableString];
       CFStringNormalize(str, kCFStringNormalizationFormD);

This works because -[NSMutableAttributedString mutableString] is a proxy
that automatically fixes up the attribute runs held by its owner.

Hmm, this seems dangerous in the sense that the conversion may be lossy. As far as I can see, there's no guarantee that CFStringNormalize will perform minimal replacements. If it does not, then whole ranges of characters may have their attributes reset to those of the first replaced character.

Even if testing shows it to be non-lossy in one environment, without a guarantee the behavior might differ in any other environment.

http://en.wikipedia.org/wiki/Unicode_equivalence

[... quote snipped ...]

I'm well aware of what it means. The question is which exact operations CFStringNormalize performs on the mutable string proxy. If it performs the minimal replace operations needed to get the result, then it will preserve the attributes closely. It's conceivable, though, that CFStringNormalize uses a side buffer to compute the normalized form and then does one big replace of the whole mutable string's range. Or it might do anything in between: for example, it might replace a series of precomposed characters with their decompositions in a single replace operation. In that case, the attributes of most of the characters will be lost (replaced with the attributes of the first character in the replace range).

So, it's clear that the _strings_ will always have a deterministic value as a result of normalization. That's the point of normalization. But the _attributed strings_ may not.
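
One way to avoid depending on which replace operations CFStringNormalize performs is to normalize each attribute run separately and reassemble the pieces. The following is only an untested sketch: NormalizedCopyByRun is a hypothetical helper name (it is not part of Foundation or of anything posted in this thread), and the code assumes manual reference counting, so the autorelease calls would have to go under ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: decompose (Form D) one attribute run at a time,
// so no single replacement can ever span two runs.
static NSAttributedString *NormalizedCopyByRun(NSAttributedString *original)
{
    NSMutableAttributedString *result =
        [[[NSMutableAttributedString alloc] init] autorelease];
    NSString *text = [original string];
    NSUInteger length = [original length];
    NSUInteger location = 0;

    while (location < length) {
        NSRange run;
        // Limit the search to the unprocessed tail so runs never overlap.
        NSDictionary *attrs =
            [original attributesAtIndex:location
                  longestEffectiveRange:&run
                                inRange:NSMakeRange(location, length - location)];

        // Canonical decomposition of this run's characters only.
        NSString *decomposed =
            [[text substringWithRange:run] decomposedStringWithCanonicalMapping];

        NSAttributedString *piece =
            [[[NSAttributedString alloc] initWithString:decomposed
                                             attributes:attrs] autorelease];
        [result appendAttributedString:piece];

        location = NSMaxRange(run);
    }
    return result;
}
```

The caveat is that the result is not guaranteed to be fully normalized. Form D also reorders combining marks, and that reordering can cross a run boundary when a combining mark carries different attributes from its neighbours, so the concatenation of separately decomposed runs may differ from the decomposition of the whole string. If the plain strings must compare equal, check [result string] against [[original string] decomposedStringWithCanonicalMapping] before relying on it.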


Also, it should be self-evident that normalizing to a precomposed form will obliterate attribute differences between a base character and any combining characters, as discussed elsewhere in this thread.

Good thing he went and normalized to a *de*composed form then, isn't it?

Martin's example used Form D, but Ross never quite said that's what he was normalizing to. He might have been adapting Martin's example but using a different form.

Regards,
Ken
