On 6 Jun 2008, at 08:03, Jens Alfke wrote:

On 6 Jun '08, at 3:23 AM, Jason Stephenson wrote:

As a long time UNIX programmer, I'll suggest looking into the regexp library that already comes with OS X.
Run man regcomp on the command line to find out how to use it.

It doesn't look as though this library is Unicode-aware. The strings it takes are C strings (char*) with no indication of what encoding is used, and neither Unicode nor UTF-8 is mentioned in the man page. From that, I'd guess that this library only works with single-byte encodings (like ISO-Latin-1 or CP-1252, not UTF-8 or the various non-Roman encodings) and that it will treat all non-ASCII characters as being neither spaces nor letters.

In short, I think it only works correctly with plain ASCII. IMHO that's much too limited for most purposes nowadays. Even if you don't touch user-visible text with it, it's still pretty common to find non-ASCII characters in HTML, XML, even source code.

Of the regex libraries mentioned so far, I recommend RegexKitLite. It's based on ICU, which is Unicode-savvy, already built into the OS, and used by lots of Apple apps.

You are correct, but in my casual usage, feeding UTF-8 to the POSIX regex routines works just fine, provided you keep two things in mind. First, the defined character classes are ASCII-aware only. Second, the results you get back are byte offsets, not character offsets, so don't convert them to NSRanges and expect them to be correct against the NSString you got the UTF-8 from. Similar caveats apply to match counts and the like: ".{3}" will happily match two characters if they take up three bytes.
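To make the byte-offset caveat concrete, here is a minimal sketch in plain C. The helper names match_bytes and utf8_chars_before are hypothetical, made up for this illustration. It assumes the default "C" locale and a literal pattern, so it shows only the offset issue, not the behaviour of character classes or ".".

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match a POSIX extended regex against a UTF-8 C string.
 * Returns 0 on a match and stores the match bounds in *start and *end.
 * The bounds are BYTE offsets into utf8, exactly as regexec() reports them. */
static int match_bytes(const char *pattern, const char *utf8,
                       size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m;
    int rc;

    assert(pattern && utf8 && start && end);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                      /* pattern failed to compile */

    rc = regexec(&re, utf8, 1, &m, 0);  /* rc is REG_NOMATCH if nothing matched */
    if (rc == 0) {
        *start = (size_t)m.rm_so;
        *end = (size_t)m.rm_eo;
    }
    regfree(&re);
    return rc;
}

/* Hypothetical helper: count the UTF-8 code points that begin within the
 * first nbytes of s. Continuation bytes have the bit pattern 10xxxxxx,
 * so every other byte starts a new code point. */
static size_t utf8_chars_before(const char *s, size_t nbytes)
{
    size_t n = 0, i;

    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the string "caf\xC3\xA9 bar" (that is, "café bar" in UTF-8) and the pattern "bar", match_bytes reports the byte range 6 to 9, while utf8_chars_before gives 5 for the start of the match, because the accented letter occupies two bytes. Note that NSString indexes by UTF-16 code units rather than code points, so characters outside the Basic Multilingual Plane would need a further adjustment before building an NSRange.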

I wouldn't want to present the regexes to the user, of course, but for pre-defined regexes in code, it's okay (not great with those caveats obviously, but alright).

My main complaint about it is that it's /extremely slow/ compared to most modern regex libraries, but for casual usage, you at least don't have to link any extra libraries to use it.

I do think that good regex additions to NSString, or an NSRegex class, are long overdue in Cocoa.

Jamie.



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
