On 12.04.2014 at 06:56, Richard Frith-Macdonald <richardfrithmacdon...@gmail.com> wrote:
>
>> On 11 Apr 2014, at 22:54, Fred Kiefer <fredkie...@gmx.de> wrote:
>>
>>> On 08.04.2014 16:14, Mathias Bauer wrote:
>>> Hi,
>>>
>>> the following simple test program throws an exception:
>>>
>>>
>>>> #import <Foundation/Foundation.h>
>>>>
>>>> int main(int argc, const char * argv[])
>>>> {
>>>>     @autoreleasepool
>>>>     {
>>>>         NSString* text = @"h1. Real Acme\n\n||{noborder}{left}Item||{right}Price||\n|Testproduct|{right}2 x $59.50|\n| |{right}net amount: $100.00|\n| |{right}total amount: $119.00|\n\n\nh2. Thanks for your purchase!\n\n\n";
>>>>
>>>>         // NSRegularExpression* expr = [NSRegularExpression regularExpressionWithPattern:@".*?$" options:NSRegularExpressionAnchorsMatchLines error:NULL];
>>>>         // int currentIndex = 27;
>>>>
>>>>         NSRegularExpression* expr = [NSRegularExpression
>>>>             regularExpressionWithPattern:@"h[123]\\. "
>>>>             options:NSRegularExpressionCaseInsensitive error:NULL];
>>>>         int currentIndex = 33;
>>>>
>>>>         [expr firstMatchInString:text options:NSMatchingAnchored
>>>>             range:NSMakeRange(currentIndex, [text length]-currentIndex-1)];
>>>>     }
>>>>     return 0;
>>>> }
>>>
>>> The call to firstMatchInString will end up in calling uregex_lookingAt
>>> (thus carrying out a regex match) and afterwards calling uregex_start
>>> and uregex_end (thus retrieving the matched text range). The results of
>>> the two latter calls will be used to create an NSRange object in the
>>> prepareResult function of NSRegularExpression.m. And because the length
>>> of this range is negative, an exception is thrown.
>>>
>>> Let's have a look at the data:
>>>
>>> The matching region starts at position 33, it ends at the string end.
>>> This region has been set at the regex by calling uregex_setRegion (in
>>> the setupRegex function in NSRegularExpression.m).
>>>
>>> According to the documentation, uregex_start should return the index in
>>> the input string of the start of the text matched. In my book this
>>> should be the position of the "h2" near the end of the string.
>>>
>>> According to the documentation, uregex_end should return the index in
>>> the input string of the position following the end of the text matched.
>>> In my book that should be start + 4.
>>>
>>> But I get back: 33 for start and 4 for end. That obviously can't work.
>>>
>>> I can't believe that the ICU regex implementation (I'm using ICU4.8 on
>>> Ubuntu 13.10 64Bit) is broken to this extent, so probably the
>>> NSRegularExpression implementation uses it incorrectly. But OTOH I can't
>>> spot an obvious error.
>>>
>>> Any hints would be greatly appreciated.
>>
>> No hint, just some feedback. I was able to reproduce your problem on my
>> GNUstep installation but completely failed to understand why uregex
>> comes up with 4 as the result of uregex_end.
>
> I spent all night looking at this ... without a lot of success.
>
> I think this really is an ICU bug; I can consistently reproduce the problem
> and what I *think* is happening is that the call to uregex_lookingAt() is
> simply not working properly in this situation ... it's not honoring the range
> which was set for the regular expression, but neither is it ignoring it.
>
> It seems to me that:
> It's starting the matching at the start of the string rather than at the
> start of the range.
> So it actually matches the 'h1. ' right at the start of the string.
> It's then reporting the start index as if it had matched in the range, and
> since the range starts at 33 it reports a match at offset 0 in the string as
> being at index 33.
> But it's reporting the end index as if it had matched in the whole string
> (which it did) as 4.

After more thinking, I reached the same conclusion.

> That looks very broken, and despite spending ages looking at the ICU
> documentation etc, I have been unable to find any indication that there's any
> misinterpretation of the way it *should* be working.
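
To make the numbers above concrete: with start reported as 33 and end as 4, end - start is -29, which is where the negative length comes from. Below is a minimal C sketch of a guard that rejects such bounds before a range is built. It is not the code in NSRegularExpression.m; sketch_range and sketch_range_from_bounds are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NSRange; not the GNUstep type. */
typedef struct {
    uint64_t location;
    uint64_t length;
} sketch_range;

/*
 * Builds a range from the start/end indices reported by the regex engine.
 * Returns false and leaves *out untouched when the bounds are inconsistent,
 * for example start = 33 and end = 4 as in the report above.
 */
bool sketch_range_from_bounds(int32_t start, int32_t end, sketch_range *out)
{
    assert(out != NULL);
    if (start < 0 || end < start) {
        return false;   /* would give a negative length */
    }
    out->location = (uint64_t)start;
    out->length = (uint64_t)(end - start);
    return true;
}
```

A caller could treat a false return the same way as "no match", which is in line with returning a nil result instead of raising an exception.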
>
> When I tried the same test on OSX, it didn't raise an exception, but neither
> did it work ... on OSX the behavior was to return a nil object for the match.

OSX might even be correct here. An anchored search should only give a result if the expression is matched at the start of the range, and if I counted correctly there is no h at position 33.

> I modified base to do the same as OSX, but I think probably a bug report to
> the ICU project is the way to go as I can't see any way of working around it.
> I did consider extracting the substring from the range and then using ICU to
> match that substring (avoiding ICU's range code), but that would not work if
> the NSMatchingWithTransparentBounds is used, so it would only be a partial
> workaround.
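
For reference, the partial workaround described above (match against the extracted substring only, then shift the result back by the range offset) could be sketched roughly as follows. POSIX <regex.h> stands in for ICU here purely so the example is self-contained; that substitution is an assumption of the sketch, it works on bytes rather than UTF-16 code units, and sketch_anchored_match is a hypothetical name. Because the engine never sees text outside the substring, it cannot support NSMatchingWithTransparentBounds.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Anchored match of `pattern` inside text[offset, offset + length).
 * On success stores the match bounds, expressed as indices into the
 * full text, in *start and *end. POSIX regex is used instead of ICU
 * (an assumption made to keep the sketch self-contained).
 */
bool sketch_anchored_match(const char *pattern, const char *text,
                           size_t offset, size_t length,
                           size_t *start, size_t *end)
{
    size_t text_len = strlen(text);
    if (offset > text_len || length > text_len - offset) {
        return false;               /* range lies outside the text */
    }

    /* Copy the range so the engine only ever sees the substring. */
    char *sub = malloc(length + 1);
    if (sub == NULL) {
        return false;
    }
    memcpy(sub, text + offset, length);
    sub[length] = '\0';

    regex_t re;
    regmatch_t m;
    bool ok = false;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) == 0) {
        /* Anchored: accept the match only if it begins at offset 0
           of the substring. */
        if (regexec(&re, sub, 1, &m, 0) == 0 && m.rm_so == 0) {
            *start = offset;                    /* shift back into the text */
            *end = offset + (size_t)m.rm_eo;
            ok = true;
        }
        regfree(&re);
    }
    free(sub);
    return ok;
}
```

With this approach the reported start and end are always derived from the same base offset, so they cannot disagree the way 33 and 4 do in the report.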