> On 07.04.2018 at 20:51, David Chisnall <gnus...@theravensnest.org> wrote:
>
> I am testing out a new version of the compiler / runtime that is producing
> NSConstantString instances with UTF-16 data. I have currently disabled a lot
> of the NSConstantString optimisations, on the basis of ‘make it work then
> make it fast’ and I’m still seeing quite a lot of test failures. The most
> recent ones seem to come from the fact that GSUnicodeString’s implementation
> of rangeOfComposedCharacterSequenceAtIndex: calls rangeOfSequence_u(), which
> returns a different range to NSString’s implementation.
>
> I have ls (a GSUnicodeString) and indianLong (a UTF-16 NSConstantString)
> from the NSString/test00.m. If I call -getCharacters:range: on both, then I
> get the same set of characters for [indianLong length] characters. This is
> as expected. When searching for indianLong in ls, it is not found. Sticking
> in a lot of debugging code, I eventually tracked it down to this disagreement
> and when I comment out GSUnicodeString’s implementation of
> rangeOfComposedCharacterSequenceAtIndex: so that it uses the superclass
> implementation then this test passes.
>
> Please can someone who understands these bits of exciting unicode logic take
> a look and see if there’s any reason for the disagreement?

I am surely no expert here, but I had a quick look at the code and the two
algorithms seem to be very similar. The only difference is the set of code
points that the characters get compared to. NSString uses
[NSCharacterSet nonBaseCharacterSet], which looks correct to me. GSString,
on the other hand, uses uni_isnonsp(), which I would read as "non-spacing",
although the name is never explained. The code is as follows:

    BOOL
    uni_isnonsp(unichar u)
    {
      /*
       * Treating upper surrogates as non-spacing is a convenient solution
       * to a number of issues with UTF-16
       */
      if ((u >= 0xdc00) && (u <= 0xdfff))
        return YES;

      // FIXME check is uni_cop good for this
      if (GSPrivateUniCop(u))
        return YES;
      else
        return NO;
    }

As a side effect, this should handle the upper surrogates (0xdc00 to 0xdfff)
correctly, but not the lower ones. I also have no idea what GSPrivateUniCop
does, even after looking at the code several times. OK, it is a binary search
on uni_cop_table, but what is in that table?

_______________________________________________
Gnustep-dev mailing list
Gnustep-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/gnustep-dev
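
To illustrate why the choice of code point set matters, here is a minimal C sketch of the "walk back to a base character, then forward over non-base characters" loop, with the non-base test passed in as a function. This is not the GNUstep code: composed_range(), range_t and both predicates are hypothetical names, and the U+0300..U+036F check is only a stand-in for whatever nonBaseCharacterSet and uni_cop_table really contain. The only part taken from the code quoted above is the 0xdc00..0xdfff test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t unichar;             /* one UTF-16 code unit */
typedef bool (*nonbase_fn)(unichar);  /* "does this unit attach to the previous one?" */

typedef struct { size_t location; size_t length; } range_t;

/* Hypothetical predicate A: only the combining marks U+0300..U+036F
 * (an assumed stand-in for a real non-base character set). */
bool nonbase_marks_only(unichar u)
{
  return u >= 0x0300 && u <= 0x036f;
}

/* Hypothetical predicate B: the same marks plus the 0xdc00..0xdfff
 * check from the uni_isnonsp() code quoted above. */
bool nonbase_marks_and_trail(unichar u)
{
  if (u >= 0xdc00 && u <= 0xdfff)
    return true;
  return nonbase_marks_only(u);
}

/* Walk backwards from index until a unit that is not non-base is found,
 * then walk forwards over every following non-base unit. */
range_t composed_range(const unichar *s, size_t len, size_t index,
                       nonbase_fn is_nonbase)
{
  range_t r;
  size_t start = index;
  size_t end;

  assert(index < len);
  while (start > 0 && is_nonbase(s[start]))
    start--;
  end = start + 1;
  while (end < len && is_nonbase(s[end]))
    end++;
  r.location = start;
  r.length = end - start;
  return r;
}
```

For the five code units 0x0065 0x0301 0xd83d 0xde00 0x0078 and index 3, predicate A gives the range {3, 1} while predicate B gives {2, 2}. So two otherwise identical loops disagree as soon as their sets disagree on a single code unit, which would be enough to make a search fail in the way described.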