On Jun 6, 2008, at 5:23 AM, Jason Stephenson wrote:

Hi,

You've gotten a lot of decent answers so far.

As a longtime UNIX programmer, I'd suggest looking into the regexp library that already comes with OS X.

Run "man regcomp" on the command line to find out how to use it.

Note that NSStrings are usually stored internally as UTF-16, and regcomp requires a "char *", so at the very least you'll need to convert the NSString to UTF-8. That can be expensive: it means making a copy of a potentially very large string and walking through it before doing any regex work.

Worse, once the string is converted to UTF-8, nothing documents that regcomp works correctly on anything other than ASCII.

Even worse, converting an index in the UTF-8 string back to the corresponding index in the original NSString is also problematic: you basically have to walk through the UTF-8 string counting code points, with each code point that needs a surrogate pair in UTF-16 counting double.
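
To make that walk concrete, here is a rough sketch in C. The function name utf16_index_for_utf8_offset is made up for illustration, and the code assumes the input is valid UTF-8 and that the byte offset falls on a character boundary (which is where a regex match offset would normally land):

```c
#include <stddef.h>

/* Hypothetical helper: convert a byte offset into UTF-8 text to the
 * equivalent index in UTF-16 code units (the units NSString indices use).
 * Assumes valid UTF-8 and an offset on a character boundary. */
static size_t utf16_index_for_utf8_offset(const char *utf8, size_t byte_offset)
{
    size_t units = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        unsigned char b = (unsigned char)utf8[i];
        if ((b & 0xC0) == 0x80)
            continue;                  /* continuation byte, already counted */
        /* A lead byte of 0xF0 or above starts a 4-byte sequence, i.e. a
         * code point outside the BMP, which takes a surrogate pair. */
        units += (b >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

The cost is linear in the offset, so doing this for every match in a large string adds up quickly.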

As a result, regcomp works OK for shorter strings that are pure ASCII to start with, but longer strings or non-ASCII characters make these problems progressively worse.
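
For that pure-ASCII case, the calls involved look roughly like the sketch below. It is only a minimal illustration: first_match is a made-up helper name, not part of the regex library, and it assumes the NSString has already been converted to a plain ASCII C string. The offsets it reports are byte offsets into that C string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: find the first match of a POSIX extended regular
 * expression in an ASCII C string.
 * Returns 1 and stores the match's byte offsets on success,
 * 0 if there is no match, and -1 if the pattern fails to compile
 * or matching fails for another reason. */
static int first_match(const char *pattern, const char *text,
                       size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);                      /* always release the compiled pattern */

    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    *start = (size_t)m[0].rm_so;       /* offset of first matched byte */
    *end = (size_t)m[0].rm_eo;         /* offset one past the last matched byte */
    return 1;
}
```

Calling first_match("[0-9]+", "abc 123 def", &start, &end) should return 1 with start set to 4 and end set to 7.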


One other possible solution is to use JavaScriptCore: make a JSStringRef (which works with unichars, like NSString) and use JavaScript's regex support. That way the results will at least have consistent indices, work well with non-ASCII characters, etc.


Glenn Andreas                      [EMAIL PROTECTED]
 <http://www.gandreas.com/> wicked fun!
JSKit | the easy way to unite JavaScript and Objective C



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
