For the sake of the archives, and in case anyone else comes across this problem 
(anyone using the NSXML classes should at least be aware of it), here is the 
workaround I have settled on.

I have created my own NSXMLDocument category method that generates NSData using
-XMLDataWithOptions:, loads that data into an NSString, and then escapes any
potentially dodgy occurrences of '>'. Here is the category method:

@implementation NSXMLDocument (BugWorkaround)

/*
 * NOTE: This method works around a bug in the Cocoa NSXML classes, which refuse
 * to escape '>' in any situation.
 *
 * According to section 2.4 of the XML spec ( http://www.w3.org/TR/REC-xml/#syntax ),
 * '<' and '&' must always be escaped in character data, but '>' only needs
 * escaping when it appears in the character sequence ']]>' and does not mark the
 * end of a CDATA section (or a conditional section - see section 3.4). The Cocoa
 * NSXML classes correctly escape '<' and '&' to '&lt;' and '&amp;', but *never*
 * escape '>', even when it appears in the string ']]>' inside an element's
 * -stringValue. This generates invalid XML, and NSXMLDocument will refuse to read
 * invalid XML unless NSXMLDocumentTidyXML is set, but that messes up all the
 * white space.
 *
 * This method therefore generates NSData from the NSXMLDocument, then reads that
 * data into an NSString. It then looks for the ']]>' sequence. If it finds it, it
 * looks for the '<![' sequence, which opens a conditional section or CDATA
 * section. If '<![' is *not* found in the XML string, then we can safely assume
 * that any occurrences of ']]>' appear only in places where they must be escaped
 * to ']]&gt;' to create valid XML. This method replaces such instances with the
 * escaped version of the string. (Strictly, ']]>' is also legal inside comments
 * and processing instructions, where this replacement would alter the text; that
 * case is ignored here.)
 */
- (NSData *)XMLDataWithPrettyPrintAndExtraEscapes
{
    // First, get the data.
    NSData *data = [self XMLDataWithOptions:NSXMLNodePrettyPrint];

    // Then, read the resulting XML string. This assumes the document uses UTF-8
    // (the default encoding); if the data can't be decoded, return it unchanged.
    NSMutableString *str = [[NSMutableString alloc] initWithData:data
                                                         encoding:NSUTF8StringEncoding];
    if (str == nil)
        return data;

    // Does this XML contain the character sequence ']]>'?
    // If not, do nothing - just return the data as-is, as the NSXML classes
    // will have escaped everything that needed escaping.
    NSRange range = [str rangeOfString:@"]]>"];
    if (range.location == NSNotFound)
    {
        [str release];
        return data;
    }

    // If it does contain these characters, though, we need to do some checks.
    // This sequence of characters should only appear at the end of a CDATA
    // section or conditional section. If the ']]>' string appears anywhere
    // else - i.e. in content - then the '>' *must* be escaped to '&gt;'.
    // We thus check whether this XML contains any CDATA or conditional sections
    // by looking for '<![' (CDATA and conditional sections always begin with
    // '<![', e.g. '<![CDATA[...]]>').
    NSRange conditionalRange = [str rangeOfString:@"<!["];
    if (conditionalRange.location != NSNotFound)
    {
        // If the document contains any CDATA or conditional sections, bail and
        // do nothing - we'll just have to leave it to the user to ensure that
        // there are no invalid ']]>' sequences within the content. We don't want
        // to escape any ']]>' sequences that should not be escaped, and checking
        // which ones should and shouldn't be escaped gets too complicated for
        // our purposes.
        [str release];
        return data;
    }

    // If we got here, the XML document contains the ']]>' sequence only in
    // places that are *not* ending CDATA or conditional sections (because we
    // know there are no such sections in this document). This is bad XML, so we
    // replace all occurrences of this string with the correctly escaped ']]&gt;'
    // version, and then generate and return our own data object from our string.
    [str replaceOccurrencesOfString:@"]]>"
                         withString:@"]]&gt;"
                            options:0
                              range:NSMakeRange(range.location, [str length] - range.location)];
    data = [str dataUsingEncoding:NSUTF8StringEncoding];
    [str release];
    return data;
}

@end
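For anyone who wants to test the string-level logic of the category outside Cocoa, here is a rough C rendering of its three steps: return the input unchanged if it contains no ']]>', bail out unchanged if it contains '<![', and otherwise replace every ']]>' with ']]&gt;'. It is plain C, so it also compiles in an Objective-C .m file, and the function name escape_stray_cdata_ends is hypothetical, not part of any Apple API. Like the Objective-C method, it works on the already-serialized document, so it shares the same blind spots (comments and processing instructions).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C rendering of the category's post-processing step.
 * Returns a malloc'd copy of xml in which every "]]>" has been replaced by
 * "]]&gt;", unless xml contains no "]]>" or contains "<![" (a CDATA or
 * conditional section), in which case the copy is unchanged.
 * Returns NULL on allocation failure; the caller frees the result. */
char *escape_stray_cdata_ends(const char *xml)
{
    size_t len = strlen(xml);

    /* Count occurrences; zero, or any "<![" at all, means "leave it alone". */
    size_t count = 0;
    if (strstr(xml, "<![") == NULL) {
        for (const char *s = strstr(xml, "]]>"); s != NULL; s = strstr(s + 3, "]]>"))
            count++;
    }

    /* Each replacement grows the text by 3 bytes (">" becomes "&gt;"). */
    char *out = malloc(len + count * 3 + 1);
    if (out == NULL)
        return NULL;
    if (count == 0) {
        memcpy(out, xml, len + 1);
        return out;
    }

    /* Copy the text between matches, writing "]]&gt;" in place of each "]]>". */
    char *p = out;
    const char *s = xml;
    for (const char *hit = strstr(s, "]]>"); hit != NULL; hit = strstr(s, "]]>")) {
        size_t prefix = (size_t)(hit - s);
        memcpy(p, s, prefix);
        p += prefix;
        memcpy(p, "]]&gt;", 6);
        p += 6;
        s = hit + 3;
    }
    strcpy(p, s); /* copy the tail, including the terminating NUL */
    return out;
}
```

Fed the element from the original sample, "<Test> &lt; &amp; > ]]> </Test>" comes back as "<Test> &lt; &amp; > ]]&gt; </Test>", while a document containing "<![CDATA[" is returned untouched.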

I’ve also filed this as a bug - #7637981.

--- ORIGINAL MESSAGE ---

Still working on this and still getting nowhere, so another question:

Is there a way to prevent NSXMLElement from converting '&' into '&amp;' so that
I can resolve character entities myself in my own NSXMLElement category -init...
method?

To recap the problem, the NSXML classes change '<' into '&lt;' and '&' into
'&amp;' (when in string value content), just as they should according to the
XML spec. But they don't convert '>' into '&gt;'. That is fine in most
situations, as the XML spec doesn't require it, but if '>' appears in the string
']]>' (when not ending a CDATA section) then it must be escaped. Apple's NSXML
classes don't do this, so in this situation they generate invalid XML that
NSXMLDocument itself cannot open.
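To make the section 2.4 rule concrete, here is a minimal sketch of escaping character data by hand: '<' and '&' are always escaped, and '>' is escaped only when it would complete ']]>'. It is plain C (it compiles unchanged in an Objective-C .m file), and the name escape_char_data is hypothetical, not an NSXML API. It only helps when you write the XML text yourself; passing its result to -setStringValue: would just get the '&' escaped a second time, as described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: escapes XML character data (element content).
 * '<' and '&' are always escaped; '>' is escaped only when it is preceded
 * by "]]", i.e. when it would otherwise form the forbidden ']]>' sequence.
 * Returns a malloc'd string the caller must free, or NULL on allocation failure. */
char *escape_char_data(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every character becomes "&amp;" (5 bytes), plus the NUL. */
    char *out = malloc(len * 5 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '<') {
            memcpy(p, "&lt;", 4);
            p += 4;
        } else if (c == '&') {
            memcpy(p, "&amp;", 5);
            p += 5;
        } else if (c == '>' && i >= 2 && in[i - 1] == ']' && in[i - 2] == ']') {
            /* ']' is never rewritten, so the output also ends in "]]" here. */
            memcpy(p, "&gt;", 4);
            p += 4;
        } else {
            *p++ = c;
        }
    }
    *p = '\0';
    return out;
}
```

With the sample string used later in this thread, " < & > ]]> " becomes " &lt; &amp; > ]]&gt; ": the lone '>' is left alone, which the spec allows, and only the one completing ']]>' is escaped. Escaping every '>' would also be valid; this version just mirrors the minimum the spec requires.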

I tried creating my own -initWithName:validStringValue: method which did some
jiggery-pokery and then called -initWithXMLString:error:, thinking that this
wouldn't do any conversion, the idea being that I could force ']]>' to appear as
']]&gt;' myself by creating the XML string directly rather than going through
-setStringValue:. But no. If you try this:

NSError *error = nil;
NSXMLElement *element = [[NSXMLElement alloc] initWithXMLString:@"<test>&gt;</test>" error:&error];
NSLog (@"%@", element);

The output is:

<test>></test>

In other words, the NSXMLElement automatically *forces* any occurrences of 
'&gt;' to become '>', no matter how you try to work around it. And this means 
that if the user has entered the string ']]>' and you need to encode that in 
XML somewhere, then the NSXML classes force you to write invalid XML that 
cannot be read.

I've also tried creating the element like this:

element = [[NSXMLNode alloc] initWithKind:NSXMLElementKind options:NSXMLNodePreserveAll];
[element setName:@"Test"];
[element setObjectValue:@"&gt;"];

But this comes out as:

<Test>&amp;gt;</Test>
...

--- ORIGINAL MESSAGE ---

Just to follow up on this, yet again it seems that the NSXML classes are better
at detecting invalid XML when opening documents than at avoiding it when
generating XML data. If you include the string "]]>" inside the stringValue of
an NSXMLElement, the '>' does not get escaped as it should be according to the
XML spec, and when you generate XML document data including such an element and
then try to read it again, NSXMLDocument will fail and report the error:
"Sequence ']]>' not allowed in content". Some sample code to demonstrate the
issue:

// Create an element containing some characters that should be escaped to create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test" stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it should do in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element] autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %@\ndata: %@", xmlDoc, data); // Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error] autorelease];
if (xmlDoc == nil) // If it failed, try with tidy.
    xmlDoc = [[[NSXMLDocument alloc] initWithData:data options:NSXMLNodePreserveWhitespace error:&error] autorelease];
// Did it fail?
if (xmlDoc == nil)
{
    // Run the error.
    if (error)
        [[NSAlert alertWithError:error] runModal];
    // Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content",
    // because the '>' should have been escaped.
}

In other words, although the NSXML classes will escape '<' and '&' correctly,
they will not escape '>' at all - even when it occurs in the sequence ']]>',
which is invalid in content except when terminating a CDATA section. This then
causes the NSXML classes to fail when re-loading the document they just created
from the same data, because NSXML is more fussy about reading than writing.
...



_______________________________________________

Cocoa-dev mailing list ([email protected])
