> Am 08.04.2018 um 16:11 schrieb David Chisnall <gnus...@theravensnest.org>: > > On 8 Apr 2018, at 12:41, David Chisnall <gnus...@theravensnest.org> wrote: >> >> On 8 Apr 2018, at 10:55, Richard Frith-Macdonald >> <richard.frith-macdon...@theengagehub.com> wrote: >>> >>> >>> >>>> On 6 Apr 2018, at 11:00, David Chisnall <gnus...@theravensnest.org> wrote: >>>> >>>> It would probably help catch more bugs if we made use of NSString’s >>>> class-cluster nature more in -base. I have just fixed a bug in GSString >>>> where we were checking one object matched a particular class before >>>> dereferencing the _flags ivar of the other. I caught this because the >>>> other was a GSTinyString, which is almost never a valid pointer. >>> >>> Possibly, but performance *is* an issue here. The NSString code was >>> rewritten some years ago (moving away from them use of class cluster >>> features) as a result of extensive profiling of real-world applications >>> which were running too slow, precisely because NSString methods are very >>> heavily used in real apps. At the time somethjing like 20% of the CPU was >>> wasted in method dispatch overheads (the -characterAtIndex: method is one >>> of the cluster primitives and a major culprit) but there were also >>> performance issues due to buffer allocation and copying of internal >>> representations. The changes made a substantial improvement in general >>> performance as well as causing multipler orders of magnitude improvement in >>> a few pathological cases. >> >> I agree that we should be improving performance for critical code, but >> unfortunately it appears that we have done so at the expense of correctness >> in a number of places. As per my other email, >> -rangeOfComposedCharacterSequenceAtIndex: appears to give the wrong results >> in almost every nontrivial case, and is unfortunately one of the primitive >> methods for a lot of things. >> >> I also note that a lot of the NSString method implementations are not well >> optimised. In a number of places, -characterAtIndex: is called repeatedly, >> when -getCharacters:range: is normally significantly more efficient. The >> ICU UText interface provides something very similar to -getCharacters:range: >> as its primitive method (a callback that fills a buffer with UTF-16 >> characters) and has some carefully optimised routines. > > I’ve pushed my WIP changes to the newabi branch - review is welcome! This > branch disables the GSString implementation of > -rangeOfComposedCharacterSequenceAtIndex: and falls back to the NSString one > (which is also wrong, but now consistently wrong).
I think your implementation of characterAtIndex: is wrong for the UTF8 case. Sadly things are more complicated. Please have a look at the (slow) way the old code is handling this. - (unichar) characterAtIndex: (NSUInteger)index { NSUInteger l = 0; unichar u; unichar n = 0; unsigned i = 0; while (i < nxcslen || n > 0) { u = nextUTF8((const uint8_t *)nxcsptr, nxcslen, &i, &n); if (l++ == index) { return u; } } [NSException raise: NSInvalidArgumentException format: @"-characterAtIndex: index out of range"]; return 0; } _______________________________________________ Gnustep-dev mailing list Gnustep-dev@gnu.org https://lists.gnu.org/mailman/listinfo/gnustep-dev