On Sun, Jan 17, 2010 at 4:15 PM, K.Darcy Otto <do...@csusb.edu> wrote:
> I've been working with RegexkitLite, and I'm wondering whether someone else
> who has RegexkitLite can reproduce this problem, or spot what I'm doing
> wrong:
>
> NSString *originalString =
> @"IMUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU";
>
> // Using the built-in "range:" option
> NSString *firstTry = [originalString
> stringByReplacingOccurrencesOfRegex:@"M(.*)"
> withString:@"M$1$1" range:NSMakeRange(1,57)];
> NSLog(@"firstTry result: %@",firstTry);
>
> // Using "substringWithRange:" first
> NSString *cutOriginalString = [originalString
> substringWithRange:NSMakeRange(1, 57)];
> NSString *secondTry = [cutOriginalString
> stringByReplacingOccurrencesOfRegex:@"M(.*)" withString:@"M$1$1"];
> NSLog(@"secondTry result: %@",secondTry);
>
> Output:
>
> firstTry result: (null)
> secondTry result:
> MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU
>
> I contend that the results of firstTry and secondTry should be the same.
> What am I missing? Thanks.
>

If something isn't working quite right, it's often a good idea to get the
NSError object if the API supports it. In this case:

    NSError *error = NULL;
    NSString *firstTry = [originalString stringByReplacingOccurrencesOfRegex:@"M(.*)"
                                                                   withString:@"M$1$1"
                                                                      options:RKLNoOptions
                                                                        range:NSMakeRange(1,57)
                                                                        error:&error];
    NSLog(@"firstTry result: %@",firstTry);
    NSLog(@"error: %@", error);
    NSLog(@"error: %@", [error userInfo]);

    2010-01-17 19:04:40.513 list_bug[73048:a0f] firstTry result: (null)
    2010-01-17 19:04:40.513 list_bug[73048:a0f] error: Error Domain=RKLICURegexErrorDomain Code=-124 UserInfo=0x409850 "The ICU library returned an unexpected error code."
    2010-01-17 19:04:40.514 list_bug[73048:a0f] error: {
        NSLocalizedDescription = "The ICU library returned an unexpected error code.";
        NSLocalizedFailureReason = "The error U_STRING_NOT_TERMINATED_WARNING occurred.";
        RKLICURegexErrorCode = "-124";
        RKLICURegexErrorName = "U_STRING_NOT_TERMINATED_WARNING";
        RKLICURegexRegex = "M(.*)";
        RKLICURegexRegexOptions = 0;
    }

The ICU functions that perform the search and replace functionality have been
a big source of bugs in RegexKitLite. The ICU functions have a particularly
error-prone and brittle calling syntax.

Since you're performing a search and replace, the size of the replaced string
can be quite a bit larger than the original string. Your example replacement
string essentially doubles the size of the final, replaced string.
RegexKitLite makes an "educated guess" at what the size of the final, replaced
string is going to be. The ICU library fills up whatever buffer you happen to
give it, but when it runs out of space, it returns the error code
"U_BUFFER_OVERFLOW_ERROR". Now, it's "supposed" to allow you to keep calling
the "append and replace" string functions so it can tally up the exact size of
the buffer that you would need to complete the replacement.

Naturally, there are bugs in the replacement code in at least some versions of
ICU where the first overflow error causes the append and replace functions to
stop processing because "There's an error!". The API says that you only ever
need to do "two passes" of a search and replace at most: if the first pass had
too small a buffer, you'll get the size of buffer you need, and therefore the
second run is "guaranteed" to succeed because it has calculated the required
sizes.

So, RegexKitLite has workarounds to compensate for this broken behavior. To do
this, RegexKitLite needs to detect the fact that a buffer overflow error has
occurred, reset the error status so that ICU thinks it can keep going, and
rinse and repeat until ICU says it's finished.
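To make the buffer-sizing contract described above a little more concrete, here is a rough sketch in plain C (which also compiles as Objective-C). It does not call ICU and it is not RegexKitLite's actual code: `sim_copy_out` and `sim_replace_two_pass` are hypothetical names made up for illustration, and the `SIM_*` constants only mirror the numeric values of ICU's `U_STRING_NOT_TERMINATED_WARNING` (-124), `U_ZERO_ERROR` (0) and `U_BUFFER_OVERFLOW_ERROR` (15). The point is the three outcomes: the result fits with room for a terminator, the result fits exactly (no terminator, which is the -124 seen in the log), or the result does not fit and the required length is reported for a second pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Values mirror ICU's UErrorCode; the names are local to this sketch. */
enum {
    SIM_STRING_NOT_TERMINATED_WARNING = -124, /* filled exactly, no room for NUL */
    SIM_ZERO_ERROR                    = 0,    /* fit, NUL terminated */
    SIM_BUFFER_OVERFLOW_ERROR         = 15    /* did not fit */
};

/* Hypothetical stand-in for an ICU-style "fill this buffer" call.
 * Always returns the length the full result needs (excluding the NUL),
 * so a caller can size a second attempt from the first one. */
static size_t sim_copy_out(const char *src, char *dest, size_t capacity, int *status)
{
    assert(src != NULL && dest != NULL && status != NULL);
    size_t needed = strlen(src);

    if (needed < capacity) {
        memcpy(dest, src, needed + 1);      /* result plus terminator */
        *status = SIM_ZERO_ERROR;
    } else if (needed == capacity) {
        memcpy(dest, src, needed);          /* complete result, but unterminated */
        *status = SIM_STRING_NOT_TERMINATED_WARNING;
    } else {
        memcpy(dest, src, capacity);        /* truncated result */
        *status = SIM_BUFFER_OVERFLOW_ERROR;
    }
    return needed;
}

/* Two-pass caller: try with a guessed capacity, and if the first pass was
 * anything other than a clean success, retry once with needed + 1 bytes.
 * The status of the first pass is reported through first_status.
 * Returns a malloc'd, NUL-terminated string, or NULL on failure. */
static char *sim_replace_two_pass(const char *src, size_t guess, int *first_status)
{
    char *buf = malloc(guess > 0 ? guess : 1);
    if (buf == NULL) return NULL;

    size_t needed = sim_copy_out(src, buf, guess, first_status);
    if (*first_status != SIM_ZERO_ERROR) {
        int second_status;
        free(buf);
        buf = malloc(needed + 1);
        if (buf == NULL) return NULL;
        sim_copy_out(src, buf, needed + 1, &second_status);
        if (second_status != SIM_ZERO_ERROR) {
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

Calling `sim_replace_two_pass("MUIUUIU", 7, &status)` takes the exact-fit path: the first pass reports -124 even though every character was written, and only the retry with one extra byte yields a terminated string. Retrying on the warning is a choice made for this sketch only; a caller may just as reasonably treat it as a hard failure.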
However, this introduces another problem: using this technique, you can really
only return one error condition. And if you've got a buffer overflow
condition, that's your error. If a second error pops up, then what? From past
experience, these routines are pretty brittle, and trying to compensate for
them usually just leads to more problems. Therefore, I've decided to take an
extremely conservative approach and abort if things start to go sideways.

While I'm sure someone thought it was a great idea to "warn" you that "your
string is not terminated", in reality it does nothing but complicate things,
especially because it's no longer unambiguous whether the
U_STRING_NOT_TERMINATED_WARNING warning/error is masking an underlying
U_BUFFER_OVERFLOW_ERROR, since the code that handles buffer overflow errors is
completely buggy.

In this particular case, it looks like you happened to create a replacement
string that is exactly the same size as the size RegexKitLite chose for its
temporary buffer.

A possible workaround is to use the pre-4.0 version that's in SVN.
It has support for the new Blocks syntax and you can use it to do a search and
replace like so:

    NSString *replacedString = [originalString stringByReplacingOccurrencesOfRegex:@"M(.*)"
        options:RKLNoOptions
        inRange:NSMakeRange(1,57)
        error:&error
        enumerationOptions:RKLRegexEnumerationNoOptions
        usingBlock:^NSString *(NSInteger captureCount,
                               NSString * const capturedStrings[captureCount],
                               const NSRange capturedRanges[captureCount],
                               volatile BOOL * const stop) {
            return([NSString stringWithFormat:@"M%@%@", capturedStrings[1], capturedStrings[1]]);
        }];
    NSLog(@"replacedString: %@", replacedString);

    2010-01-17 19:59:31.631 list_bug[73256:a0f] firstTry result: (null)
    2010-01-17 20:02:17.374 list_bug[73294:a0f] replacedString: MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU
    2010-01-17 19:59:31.635 list_bug[73256:a0f] secondTry result: MUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIUUIUIUIU

_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)