When CFXMLCreateStringByUnescapingEntities() is passed the string “&#13207494”,
it returns a string of two unpaired surrogate code units, which causes an NSLog
containing it to not print, and also upsets Core Data.
// Define the problematic string
NSString* bomb1 = @"&#13207494" ;
NSLog(@"bomb1 length=%ld", (long)[bomb1 length]) ;
NSLog(@"bomb1 = '%@'", bomb1) ;
// Run it through CFXMLCreateStringByUnescapingEntities()
NSString* bomb2 = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL,
(CFStringRef)bomb1, NULL) ;
// Examine the result
NSLog(@"bomb2 length=%ld", (long)[bomb2 length]) ;
unichar char0 = [bomb2 characterAtIndex:0] ;
NSLog(@"char0 = '%c' = %x = %d", char0, char0, char0) ;
unichar char1 = [bomb2 characterAtIndex:1] ;
NSLog(@"char1 = '%c' = %x = %d", char1, char1, char1) ;
NSLog(@"bomb2 = '%@' THIS DOES NOT LOG AT ALL!!!", bomb2) ;
printf("printf bomb2: %s\n", [bomb2 UTF8String]) ;
Here is the result:
TestApp[13859:303] bomb1 length=10
TestApp[13859:303] bomb1 = '&#13207494'
TestApp[13859:303] bomb2 length=2
TestApp[13859:303] char0 = 'É' = dcc9 = 56521
TestApp[13859:303] char1 = '-' = df2d = 57133
printf bomb2: (null)
I don’t see why CFXMLCreateStringByUnescapingEntities() is even touching bomb1,
because it does not end in a semicolon. There is no HTML entity in bomb1.
The two characters in bomb2, U+DCC9 and U+DF2D, both fall in the “Low
Surrogates” block (U+DC00 to U+DFFF), so together they are not a valid UTF-16
surrogate pair. Changing the number “13207494” to a slightly different value
sometimes cures the problem.
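
For checking the output side, here is a minimal sketch in plain C (which
compiles unchanged inside an Objective-C file) of a well-formedness test for a
UTF-16 buffer. The function name utf16_is_well_formed is hypothetical, not a
CoreFoundation API. It relies only on the standard UTF-16 rule that a high
surrogate (0xD800-0xDBFF) must be followed directly by a low surrogate
(0xDC00-0xDFFF); the two code units logged above, 0xDCC9 and 0xDF2D, are both
low surrogates and so fail it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any Apple framework.
   Returns true if every high surrogate (0xD800-0xDBFF) is immediately
   followed by a low surrogate (0xDC00-0xDFFF) and no low surrogate
   appears on its own. */
bool utf16_is_well_formed(const uint16_t *units, size_t length)
{
    assert(units != NULL || length == 0);
    for (size_t i = 0; i < length; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i + 1 >= length) {
                return false;
            }
            uint16_t next = units[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) {
                return false;
            }
            i++; /* skip the low half of the valid pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no high surrogate before it. */
            return false;
        }
    }
    return true;
}
```

To apply it to an NSString, the code units could be copied out with
-getCharacters:range: into a unichar buffer first; a cast to const uint16_t *
may be needed, since unichar is declared as unsigned short.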
The Core Data upset occurs in -[NSManagedObjectContext save:], wherein an
object has bomb2 as the value of a String attribute. (Of course, that’s how I
“discovered” this problem; a user managed to get a string containing bomb1 in
their input XML data.) The returned error is:
Error Code: 1671
Error Domain: NSCocoaErrorDomain,
Localized Description: The operation couldn’t be completed. (Cocoa error 1671.)
NSValidationErrorKey: location (This is the name of the String attribute.)
NSValidationErrorValue:
Actually, the error viewer in my app, an NSTextView, displays the
NSValidationErrorValue as a string of two identical characters that look like a
square containing an upper-case letter A whose left half is blacked out. But
when I try to copy and paste that into any other app, including Mail.app, I get
0 characters.
The “validation” error is *not* because the value of the ‘location’ attribute
is nil. The ‘location’ attribute is optional in the data model, and often is
nil in working data sets.
This seems to me like a bug in CFXMLCreateStringByUnescapingEntities(), and
the proper workaround would seem to be to pre-flight its input value (bomb1) and
take evasive action if necessary. But since the problem occurs with numbers
other than 13207494, I need to know the bounds of the set of “bad” substrings.
Alternatively, a not-so-good workaround would be to detect “invalid” strings in
the output. I’m hoping that someone smarter than me will know an answer that
does not involve brute force :|
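
One possible pre-flight, sketched in plain C so it can sit in an Objective-C
file. It rests on an unconfirmed assumption: that the “bad” inputs are numeric
character references whose value is not a Unicode scalar value, meaning above
0x10FFFF or inside the surrogate range 0xD800-0xDFFF. The value 13207494
(0xC987C6) is above 0x10FFFF, which is consistent with that guess but does not
prove it. The names has_suspect_numeric_reference, is_unicode_scalar and
digit_value are hypothetical. The scan does not require a trailing semicolon,
since the function evidently does not either, and it also checks the
hexadecimal “&#x” form on the guess that the parser may accept it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if value is a Unicode scalar value. */
bool is_unicode_scalar(uint32_t value)
{
    return value <= 0x10FFFF && !(value >= 0xD800 && value <= 0xDFFF);
}

/* Returns the numeric value of c in the given base (10 or 16), or -1. */
static int digit_value(char c, unsigned base)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (base == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (base == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical pre-flight: true if s contains "&#<digits>" or
   "&#x<hex digits>" whose value is not a Unicode scalar value.
   A trailing semicolon is not required. */
bool has_suspect_numeric_reference(const char *s)
{
    assert(s != NULL);
    for (const char *p = s; *p != '\0'; p++) {
        if (p[0] != '&' || p[1] != '#') {
            continue;
        }
        const char *q = p + 2;
        unsigned base = 10;
        if (*q == 'x' || *q == 'X') {
            base = 16;
            q++;
        }
        uint32_t value = 0;
        bool any_digit = false;
        bool too_large = false;
        for (;; q++) {
            int d = digit_value(*q, base);
            if (d < 0) {
                break;
            }
            any_digit = true;
            if (!too_large) {
                /* value <= 0x10FFFF here, so this cannot overflow 32 bits. */
                value = value * base + (uint32_t)d;
                if (value > 0x10FFFF) {
                    too_large = true;
                }
            }
        }
        if (any_digit && (too_large || !is_unicode_scalar(value))) {
            return true;
        }
    }
    return false;
}
```

Under that assumption, has_suspect_numeric_reference([bomb1 UTF8String]) would
flag bomb1 before it ever reaches CFXMLCreateStringByUnescapingEntities(). If
the real set of bad values turns out to be narrower, this check errs on the
side of rejecting too much.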
Thanks,
Jerry Krinock