Re: NSXML Parsing Problem

2010-03-26 Thread Jeffrey Oleander


> On Thu, 2010/03/25, Keary Suska wrote:
>> Maybe a cool option for NSXML would be to be able to
>> specify the & pound ; sequence and have it map it to
>> whatever...
 
> My XML is a little rusty but IIRC this is an XML issue, and
> any XML parser would choke. You have to define (or perhaps
> more properly "declare") every named entity other than the
> pre-defined named entities such as &gt;, &lt; and
> &amp;.
>
> I believe you can use numeric references with impunity:
> &#nnn; but make sure it jibes with your character
> encoding.

Agreed.  pound is defined in HTML 4 but not in XML itself,
which has pre-defined entity references only for
amp, lt, gt, apos, and quot
http://www.w3.org/TR/2006/REC-xml11-20060816/#intern-replacement
But in the current fashion, the spec makes it difficult
to put the pieces together, though its authors may believe
they explain it clearly here
http://www.w3.org/TR/2006/REC-xml11-20060816/#intern-replacement
here
http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-entexpand
and here
http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EntityValue

But if you've declared the encoding as UTF-8 or UTF-16, the
British pound sterling symbol shouldn't need an ampersand
escape, since it has no special meaning in XML markup; in
that case you can just use the Unicode character directly.
But if you want to stay compatible with HTML 4 and keep the
named pound entity, you've got to declare it yourself.
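
To illustrate the point about using the character directly, here is a
minimal sketch in Python (not NSXMLParser; the standard-library
xml.etree.ElementTree parser is used only because it is easy to run).
The <price> element and the helper name price_text are made up for the
example. It assumes the document's encoding declaration matches the
bytes actually sent.

```python
import xml.etree.ElementTree as ET

POUND = "\u00a3"  # the pound sterling character itself, no entity involved


def price_text(document_bytes):
    """Parse raw bytes and return the text of the root element."""
    return ET.fromstring(document_bytes).text


# Same logical document, two encodings; each declares the encoding it uses.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><price>' + POUND + "5</price>"
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><price>' + POUND + "5</price>"
).encode("iso-8859-1")

for doc in (utf8_doc, latin1_doc):
    # ascii() keeps the output printable on any terminal
    print(ascii(price_text(doc)))
```

Both documents should parse to the same text even though their bytes
differ, which is why the choice of character set does not by itself fix
an undeclared named entity.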


  
___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: NSXML Parsing Problem

2010-03-25 Thread Jens Alfke

On Mar 25, 2010, at 7:34 PM, Dave wrote:

> I was wondering if changing the XML charset would solve the problem? From 
> searching the Web I think the problem could be that we are assuming UTF-8, I 
> was wondering if we changed it to one of the ISO char sets if this would 
> solve it.

No, it has nothing to do with character set. It sounds like the file is ASCII.

> Maybe a cool option for NSXML would be to be able to specify the & pound ; 
> sequence and have it map it to whatever...

Yes, that’s called a DTD in XML lingo. I don’t know if NSXMLParser supports 
those. Your documents would need to use some sort of DTD to be considered valid 
XML, due to the undefined entities.
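
A rough sketch of what declaring the entity in a DTD looks like, using
an internal DTD subset. This is Python's xml.etree.ElementTree, not
NSXMLParser, so it says nothing about whether NSXMLParser honours the
declaration. The root element name "prices", the constant
POUND_DOCTYPE and the helper parse_with_pound_entity are all invented
for the example, and the helper assumes the body carries no XML
declaration of its own (a DOCTYPE has to come after one).

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the single entity the documents rely on.
# "prices" is a made-up root element name for this sketch.
POUND_DOCTYPE = '<!DOCTYPE prices [\n  <!ENTITY pound "&#163;">\n]>\n'


def parse_with_pound_entity(body):
    """Parse an XML body (without an XML declaration) after prepending the DOCTYPE."""
    return ET.fromstring(POUND_DOCTYPE + body)


body = "<prices><price>&pound;5</price></prices>"
root = parse_with_pound_entity(body)
price = root.find("price").text

# ascii() keeps the output printable on any terminal
print(ascii(price))
```

With the declaration in place the named reference expands to the pound
character; parsing the same body without the DOCTYPE fails with an
undefined-entity error in this parser.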

—Jens




Re: NSXML Parsing Problem

2010-03-25 Thread Keary Suska
On Mar 25, 2010, at 8:34 PM, Dave wrote:

> Hi Jens,
> 
> Thanks for taking the time to reply. We are a startup and basically just 
> trying to get things going with what we have. I'm downloading the XML data via 
> a URL and I could just change the database and strip out the offending 
> characters. I was wondering if changing the XML charset would solve the 
> problem? From searching the Web I think the problem could be that we are 
> assuming UTF-8, I was wondering if we changed it to one of the ISO char sets 
> if this would solve it.
> 
> Maybe a cool option for NSXML would be to be able to specify the & pound ; 
> sequence and have it map it to whatever...

My XML is a little rusty but IIRC this is an XML issue, and any XML parser 
would choke. You have to define (or perhaps more properly "declare") every 
named entity other than the pre-defined named entities such as &gt;, &lt; and 
&amp;.

I believe you can use numeric references with impunity: &#nnn; but make sure it 
jibes with your character encoding.
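
For what it's worth, the behaviour is easy to reproduce outside Cocoa.
The sketch below uses Python's xml.etree.ElementTree purely as a
stand-in for "any XML parser"; the <price> element and the helper name
parse_text are invented for the example.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_source):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_source).text
    except ET.ParseError:
        return None


named = "<price>&pound;5</price>"                 # undeclared named entity
numeric = "<price>&#163;5</price>"                # numeric character reference
predefined = "<price>&lt;5 &amp; &gt;3</price>"   # XML's pre-defined entities

for source in (named, numeric, predefined):
    # ascii() keeps the output printable on any terminal
    print(ascii(parse_text(source)))
```

The undeclared named entity is rejected, while the numeric reference
and the pre-defined entities parse without any declaration.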

HTH,

Keary Suska
Esoteritech, Inc.
"Demystifying technology for your home or business"



Re: NSXML Parsing Problem

2010-03-25 Thread Jack Carbaugh
Typically what I do is download the XML into a string. Then, if
there are special characters that I know about in advance, I can use
string class methods to replace them in the string before passing it
off to the XML parser. Just another option to consider.
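
A minimal sketch of that pre-processing step, written in Python rather
than with NSString so it can be run anywhere. It rewrites HTML named
entities as numeric references before parsing. The function name
html_entities_to_numeric and the sample <price> element are invented
for the example, and it assumes the text contains no CDATA sections or
comments where the rewrite would be unwanted.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five names XML already understands; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(xml_source):
    """Rewrite known HTML named entities as numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave pre-defined or unknown names alone
        return "&#%d;" % name2codepoint[name]

    return ENTITY_PATTERN.sub(replace, xml_source)


downloaded = "<price>&pound;5 &amp; up</price>"
cleaned = html_entities_to_numeric(downloaded)
root = ET.fromstring(cleaned)

print(cleaned)
```

Unknown names are deliberately left as they are, so the parser still
reports them instead of silently dropping data.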



jack

On Mar 25, 2010, at 10:34 PM, Dave wrote:


Hi Jens,

Thanks for taking the time to reply. We are a startup and basically  
just trying to get things going with what we have. I'm downloading  
the XML data via a URL and I could just change the database and  
strip out the offending characters. I was wondering if changing the  
XML charset would solve the problem? From searching the Web I think  
the problem could be that we are assuming UTF-8, I was wondering if  
we changed it to one of the ISO char sets if this would solve it.


Maybe a cool option for NSXML would be to be able to specify the &  
pound ; sequence and have it map it to whatever...


Thanks again
Dave


On 25 Mar 2010, at 23:13, Jens Alfke wrote:



On Mar 25, 2010, at 8:47 AM, Dave wrote:

I am getting an error using NSXMLParser if it encounters a British  
Pound Sign - it's encoded as & pound ;   (minus the spaces).

Any idea on how to solve this??


Basic XML only defines a handful of character entities. The other  
common ones are part of HTML. Are you sure this document is valid  
XML?


I'm more familiar with NSXMLDocument than NSXMLParser, so I'm not  
sure how you tell the latter to handle arbitrary character  
entities. Sorry I can't be more help.


—Jens






Re: NSXML Parsing Problem

2010-03-25 Thread Dave

Hi Jens,

Thanks for taking the time to reply. We are a startup and basically  
just trying to get things going with what we have. I'm downloading the  
XML data via a URL and I could just change the database and strip out  
the offending characters. I was wondering if changing the XML charset  
would solve the problem? From searching the Web I think the problem  
could be that we are assuming UTF-8, I was wondering if we changed it  
to one of the ISO char sets if this would solve it.


Maybe a cool option for NSXML would be to be able to specify the &  
pound ; sequence and have it map it to whatever...


Thanks again
Dave


On 25 Mar 2010, at 23:13, Jens Alfke wrote:



On Mar 25, 2010, at 8:47 AM, Dave wrote:

I am getting an error using NSXMLParser if it encounters a British  
Pound Sign - it's encoded as & pound ;   (minus the spaces).

Any idea on how to solve this??


Basic XML only defines a handful of character entities. The other  
common ones are part of HTML. Are you sure this document is valid XML?


I'm more familiar with NSXMLDocument than NSXMLParser, so I'm not  
sure how you tell the latter to handle arbitrary character  
entities. Sorry I can't be more help.


—Jens






Re: NSXML Parsing Problem

2010-03-25 Thread Jens Alfke


On Mar 25, 2010, at 8:47 AM, Dave wrote:

I am getting an error using NSXMLParser if it encounters a British  
Pound Sign - it's encoded as & pound ;   (minus the spaces).

Any idea on how to solve this??


Basic XML only defines a handful of character entities. The other  
common ones are part of HTML. Are you sure this document is valid XML?


I'm more familiar with NSXMLDocument than NSXMLParser, so I'm not sure  
how you tell the latter to handle arbitrary character entities. Sorry  
I can't be more help.


—Jens