I kept my original question as brief as I could, but let me tell you what 
problem I’m trying to solve, and maybe someone will have good advice I haven’t 
yet considered.

I’m trying to code in pure Swift. I have an NSAttributedString which can 
potentially be very large, and I want to save off the result of 
attributedSubstringFromRange: for the range that covers the string with 
leading and trailing whitespace trimmed. I’m trying to avoid copying the giant 
string merely to determine the proper substring range for copying it again.

Foundation gives String a func stringByTrimmingCharactersInSet(set: NSCharacterSet) 
-> String, but it won’t help me because using it would copy the string and 
discard the attributes. Even using it for length-testing wouldn’t work, because 
I’d have no way to know how many characters were trimmed off the head versus 
the tail of the string.

What would be nice is a way to count leading and trailing characters in place 
while the thing is still an NSAttributedString--without using 
NSAttributedString.string to convert to a Swift string in the first place. If 
there were no conversion to the Unicode-compliant and amazingly 
difficult-to-do-anything-with-it Swift string, I’d be more confident that the 
shrunken range I calculate would be apples to apples with the UTF-16-based 
ranges that NSAttributedString itself uses.
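
To make the idea concrete, here is a rough sketch of what such an in-place scan might look like. It is written against current Swift spellings, which differ from the 2015-era names used elsewhere in this message. The function name trimmedAttributedSubstring is made up for illustration. The sketch assumes that casting the string to NSString does not copy the backing store, and that every whitespace character of interest is a single UTF-16 code unit.

```swift
import Foundation

/// Hypothetical helper (not a Foundation API): returns the attributed
/// substring with leading and trailing whitespace and newlines removed.
func trimmedAttributedSubstring(of attributed: NSAttributedString) -> NSAttributedString {
    // Assumption: this bridge hands back the backing NSString without copying.
    let text = attributed.string as NSString
    let whitespace = CharacterSet.whitespacesAndNewlines as NSCharacterSet

    var start = 0
    var end = text.length  // length in UTF-16 code units, same units as NSRange

    // Walk forward over leading whitespace code units.
    while start < end && whitespace.characterIsMember(text.character(at: start)) {
        start += 1
    }
    // Walk backward over trailing whitespace code units.
    while end > start && whitespace.characterIsMember(text.character(at: end - 1)) {
        end -= 1
    }

    // Only this call copies, and only the trimmed range, attributes included.
    return attributed.attributedSubstring(from: NSRange(location: start, length: end - start))
}
```

Because the indexes never leave NSString territory, the computed range is in the same units that attributedSubstring(from:) expects. A string made entirely of whitespace yields an empty attributed string.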

-- 

Charles

On April 2, 2015 at 01:25:40, Quincey Morris 
(quinceymor...@rivergatesoftware.com) wrote:

On Apr 1, 2015, at 21:17 , Charles Jenkins <cejw...@gmail.com> wrote:

for ch in String(char).utf16 {  
if !set.characterIsMember(ch) { found = false }  
}  

Except that this code can’t possibly be right, in general. 

1. A ‘unichar’ is a UTF-16 code value, but it’s not a Unicode code point. Some 
UTF-16 code values have no meaning as “characters” by themselves. I think you 
could mitigate this problem by using ‘longCharacterIsMember’, which takes a 
UTF-32 code value instead (and enumerating the string as UTF-32 instead of 
UTF-16).

2. A Swift ‘Character’ isn’t a Unicode code point, but rather a grapheme. That 
is, it might be a sequence of code points (and I mean code points, not code 
values). It might be such a sequence either because there’s no way of 
representing the grapheme by a single code point, or because it’s a composed 
character made up of a base code point and some combining characters.

In this case, you can’t validly test the individual code points for membership 
of the character set.

I’m not sure, but I suspect the underlying obstacle is that NSCharacterSet is 
at best a set of code points, and you cannot test a grapheme for membership of 
a set of code points.
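
The distinction between code values, code points and graphemes can be seen in a small sketch like the one below. It uses current Swift spellings, and the helper name allScalars(of:areIn:) is hypothetical. Requiring every scalar of a Character to be in the set is only a conservative approximation, not a true grapheme-level membership test.

```swift
import Foundation

/// Hypothetical helper: true only if every Unicode scalar (code point)
/// making up the Character is a member of the set.
func allScalars(of character: Character, areIn set: NSCharacterSet) -> Bool {
    for scalar in character.unicodeScalars {
        // longCharacterIsMember takes a UTF-32 value, so it also handles
        // code points outside the Basic Multilingual Plane.
        if !set.longCharacterIsMember(scalar.value) { return false }
    }
    return true
}

let whitespace = CharacterSet.whitespacesAndNewlines as NSCharacterSet

let emoji: Character = "\u{1F600}"   // one code point, two UTF-16 code values
let composed: Character = "e\u{301}" // one grapheme, two code points

print(String(emoji).utf16.count)      // 2
print(emoji.unicodeScalars.count)     // 1
print(composed.unicodeScalars.count)  // 2
print(allScalars(of: " ", areIn: whitespace))       // true
print(allScalars(of: composed, areIn: whitespace))  // false
```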

In your particular application, if it’s true that all** Unicode whitespace 
characters are represented as a single code point (via a single UTF-32 code 
value), or a single UTF-16 code value, then you can get away with one of the 
above solutions. Otherwise you’re going to need a more complex solution that 
doesn’t involve NSCharacterSet at all.



** Or at least the ones you happen to care about, but ignoring the others may 
be a perilous proceeding.
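
One way to check the single-UTF-16-code-value half of that condition, rather than take it on faith, might be to ask the character set which Unicode planes it has members in. This is only a sketch in current Swift spellings. It says nothing about whitespace that is followed by combining characters, so it does not settle the grapheme question.

```swift
import Foundation

let whitespaceSet = CharacterSet.whitespacesAndNewlines as NSCharacterSet

// Plane 0 is the Basic Multilingual Plane, whose code points each fit in one
// UTF-16 code value. Planes 1 through 16 require surrogate pairs.
let supplementaryPlanes = (1...16).filter { whitespaceSet.hasMemberInPlane(UInt8($0)) }

if supplementaryPlanes.isEmpty {
    print("Every member of the set fits in a single UTF-16 code value.")
} else {
    print("Members found in supplementary planes: \(supplementaryPlanes)")
}
```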

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
