Hi,

Just to follow up on this, as I'm still having problems: I have done some more 
testing and double-checked the XML specs.

Yet again it seems that the NSXML classes are stricter about rejecting invalid XML 
when opening documents than when generating XML data. If you include the string 
"]]>" inside the stringValue of an NSXMLElement, the '>' does not get escaped 
as it should according to the XML specs, and when you generate XML document 
data including such an element and then try to read it again, NSXMLDocument 
will fail and report the error: "Sequence ']]>' not allowed in content". Some 
sample code to demonstrate the issue:

// Create an element containing some characters that should be escaped to
// create valid XML.
NSXMLElement *element = [[[NSXMLElement alloc] initWithName:@"Test"
    stringValue:@" < & > ]]> "] autorelease];
// Note how the '<' and '&' get escaped, but not the '>' (even though it
// should be in the ']]>' sequence).
NSLog(@"%@", element); // OUTPUT: <Test> &lt; &amp; > ]]> </Test>
// Now create an XML doc from the element and generate the data.
NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithRootElement:element]
    autorelease];
NSData *data = [xmlDoc XMLDataWithOptions:NSXMLNodePrettyPrint];
// Check the doc and data:
NSLog(@"XML Doc: %@\ndata: %@", xmlDoc, data); // Yep, they are non-nil, all fine.
// Now load the data we created into an XML document.
NSError *error = nil;
xmlDoc = [[NSXMLDocument alloc] initWithData:data
    options:NSXMLNodePreserveWhitespace error:&error];
// If it failed, try again. (A real tidy retry would pass
// NSXMLDocumentTidyXML here instead.)
if (xmlDoc == nil)
    xmlDoc = [[NSXMLDocument alloc] initWithData:data
        options:NSXMLNodePreserveWhitespace error:&error];
// Did it fail?
if (xmlDoc == nil)
{
    // Present the error.
    if (error)
        [[NSAlert alertWithError:error] runModal];
    // Uh-oh... The error is: "Line 2: Sequence ']]>' not allowed in content".
    // Because the '>' should have been escaped.
}

In other words, although the NSXML classes will escape '<' and '&' correctly, 
they will not handle escaping '>' at all - even when it occurs in the invalid 
(except when terminating CDATA) sequence ']]>'.  This then causes the NSXML 
classes to fail when re-loading the document they just created from the same 
data, because NSXML is more fussy about reading than writing.

On the one hand, it is (sort of) fair enough to expect the user of these 
classes to ensure the string values are valid XML (even if it does mean every 
user of these classes having to be extra careful and become very familiar with 
the XML specs); on the other hand, how do I go about ensuring valid XML when 
this is user-generated data over which I have no control, and when the NSXML 
classes will tidy up the ampersands in any character entities I try to escape 
myself?

At first, I thought I could just replace all occurrences of ">" with "&gt;" 
using NSString's -stringByReplacingOccurrencesOfString:withString:, e.g.:

NSString *validXMLStr = [userStr stringByReplacingOccurrencesOfString:@">" 
withString:@"&gt;"];
NSXMLElement *element = [[NSXMLElement alloc] initWithName:@"Text" 
stringValue:validXMLStr];

Then, to restore it:
NSString *value = [element stringValue];
userStr = [value stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];

But of course, that won't work, because the "&gt;" I place in my "fixed" string 
will become "&amp;gt;" in the XML file. So, consider the user had written a 
string all about XML himself:

"It turns out that ']]>' needs changing to ']]&gt;' for valid XML..."

I then swap out the '>' in this situation to '&gt;':

"It turns out that ']]&gt;' needs changing to ']]&gt;' for valid XML..."

I then pass it to the stringValue of an NSXMLElement which encodes it as:

"It turns out that ']]&amp;gt;' needs changing to ']]&amp;gt;' for valid XML..."

Then I read it back out, get its string value, and swap all occurrences of 
'&gt;' for '>', and what we get on re-opening the file is:

"It turns out that ']]>' needs changing to ']]>' for valid XML..."

i.e. Not what the user wrote. An unlikely situation, I know, but not impossible 
and I have to account for it.

In other words, if NSXMLElement won't escape the '>' for me in situations where 
it should, how do I do it myself?
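
One thing that might work, and I offer it only as an untested sketch, is to make my own escaping reversible by escaping '&' before '>' on the way in, and undoing the two steps in the opposite order on the way out. That way a literal "&gt;" typed by the user and a '>' that I escaped myself can no longer be confused. The helper names below (KBEscapeAngle and KBUnescapeAngle) are hypothetical and not part of NSXML; this assumes NSXML faithfully round-trips whatever '&' characters end up in the stringValue, which is what I have seen so far.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not an NSXML API: escape '&' first, then '>'.
// After this, every '&' in the result starts either "&amp;" or "&gt;".
static NSString *KBEscapeAngle(NSString *userStr)
{
    NSString *s = [userStr stringByReplacingOccurrencesOfString:@"&"
                                                     withString:@"&amp;"];
    return [s stringByReplacingOccurrencesOfString:@">"
                                        withString:@"&gt;"];
}

// Hypothetical inverse: undo the steps in the opposite order.
// "&gt;" must be restored before "&amp;", otherwise "&amp;gt;"
// would wrongly collapse to ">".
static NSString *KBUnescapeAngle(NSString *storedStr)
{
    NSString *s = [storedStr stringByReplacingOccurrencesOfString:@"&gt;"
                                                       withString:@">"];
    return [s stringByReplacingOccurrencesOfString:@"&amp;"
                                        withString:@"&"];
}
```

The idea would be to pass KBEscapeAngle(userStr) as the stringValue when creating the element, and to run KBUnescapeAngle over [element stringValue] after loading. The cost is that the file on disk ends up double-escaped (for example "]]&amp;gt;" for a user-typed "]]>"), so it is only readable by code that knows about the scheme.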

Am I missing something obvious? Has anybody had to handle something similar?

Many thanks and all the best,
Keith

----- Original Message ----
From: Jens Alfke <j...@mooseyard.com>
To: Keith Blount <keithblo...@yahoo.com>
Cc: glenn andreas <gandr...@mac.com>; "cocoa-dev@lists.apple.com" 
<cocoa-dev@lists.apple.com>
Sent: Tue, February 9, 2010 9:37:46 PM
Subject: Re: NSXML and >


On Feb 9, 2010, at 1:03 PM, Keith Blount wrote:

> Great, many thanks for the reply, and for the location of the information in 
> the XML docs, that's very helpful. Unfortunately, it seems that the NSXML 
> classes don't fix the '>' in the ']]>' case either, though:
> NSXMLElement *element = [[[NSXMLElement alloc] 
> initWithName:@"Test" stringValue:@"< & > ]]>"] autorelease];
> NSLog (@"%@", element);

The ">" in "]]>" only needs to be escaped when it's inside a CDATA, I believe. 
(Since that string marks the end of a CDATA.)

—Jens