On 6 Jun '08, at 3:23 AM, Jason Stephenson wrote:

As a long time UNIX programmer, I'll suggest looking into the regexp library that already comes with OS X.
Run man regcomp on the command line to find out how to use it.

It doesn't look as though this library is Unicode-aware. The strings it takes are C strings (char*) with no indication of what encoding is used, and neither Unicode nor UTF-8 is mentioned in the man page. From that, I'd guess that this library only works with single-byte encodings (like ISO-Latin-1 or CP-1252, not UTF-8 or the various non-Roman encodings) and that it will treat all non-ASCII characters as being neither spaces nor letters.

In short, I think it only works correctly with plain ASCII. IMHO that's much too limited for most purposes nowadays. Even if you don't touch user-visible text with it, it's still pretty common to find non-ASCII characters in HTML, XML, even source code.
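
A quick way to check that guess is a small C sketch along these lines. It uses only the POSIX calls from the man page (regcomp, regexec, regfree); the helper name first_alpha_run_length is made up for illustration. The behavior depends on the current locale and on the regex implementation, so treat it as an experiment rather than a statement about how the library is specified.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte length of the first run of
 * [[:alpha:]] characters in text, or -1 if nothing matches or the
 * pattern fails to compile. The input is a plain char* string; the
 * API has no parameter that says which encoding the bytes are in. */
long first_alpha_run_length(const char *text)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, "[[:alpha:]]+", REG_EXTENDED) != 0)
        return -1;

    /* Ask for one match slot: the overall match. */
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);

    regfree(&re);
    return len;
}
```

Passing "naive" gives 5. Passing the UTF-8 bytes "na\xc3\xaf" "ve" (an i with a diaeresis in the middle) in the default "C" locale, I'd expect the match to stop after "na", because the two bytes of the accented letter are not classified as letters.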

Of the regex libraries mentioned so far, I recommend RegexKitLite. It's based on ICU, which is Unicode-savvy, already built into the OS, and used by lots of Apple apps.

—Jens


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
