Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-24 Thread Peter Saint-Andre
Hi Matthias!

Matthias Wimmer wrote:
> Peter Saint-Andre wrote:
>>> Yes. In that case we would be able to use most (push) SAX parsers.
>>
>> Yes, you would still object? Or yes, you think that's fine? Please
>> choose one (phrased better this time):
> 
> Yes, I think it is fine then.
> 
>> If we specify that you must escape only the characters that the XML
>> specification says you must escape, then:
>>
>> (1) I think the stream error handling now in rfc3920bis is OK.
> 
> This one!

OK. I will find the relevant text in the XML spec and refer to that text
in rfc3920bis.
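
For reference, a minimal sketch of what "escape only what the XML spec
requires" could look like for text nodes. As far as I read XML 1.0
section 2.4, that means "&" and "<" always, and ">" only where it would
otherwise complete the string "]]>". The function name and the sample
string are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def escape_text_minimal(text):
    """Escape only what XML 1.0 requires inside element content (sketch)."""
    # "&" must be handled first so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    # ">" only needs escaping when it would complete the string "]]>".
    text = text.replace("]]>", "]]&gt;")
    return text


sample = "a > b, \"x\" & 'y' < z ]]> end"
escaped = escape_text_minimal(sample)

# Round trip through a standard parser to check the result is well-formed
# and decodes back to the original text.
roundtrip = ET.fromstring("<body>" + escaped + "</body>").text
print(roundtrip == sample)
```

The quotes and the first ">" stay literal in `escaped`, and an
unmodified parser still accepts the result.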

/psa






Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-24 Thread Matthias Wimmer

Peter Saint-Andre wrote:
>> Yes. In that case we would be able to use most (push) SAX parsers.
>
> Yes, you would still object? Or yes, you think that's fine? Please
> choose one (phrased better this time):

Yes, I think it is fine then.

> If we specify that you must escape only the characters that the XML
> specification says you must escape, then:
>
> (1) I think the stream error handling now in rfc3920bis is OK.

This one!


Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-23 Thread Peter Saint-Andre
Hi Matthias!

Matthias Wimmer wrote:
>>> Is it really necessary that RFC 3920bis mandates a server to reject such
>>> XMPP streams? I very much dislike this requirement, as it would require
>>> me to implement my own XML parser, as I don't know any parser I could
>>> use that could be configured to notify me that these characters have
>>> been received unescaped.
>> If we change the text regarding restricted XML features (i.e., say that
>> the characters that don't need to be escaped in XML don't need to be
>> escaped in XMPP), would you still object to the error handling?
> 
> Yes. In that case we would be able to use most (push) SAX parsers.

Yes, you would still object? Or yes, you think that's fine? Please
choose one (phrased better this time):

If we specify that you must escape only the characters that the XML
specification says you must escape, then:

(1) I think the stream error handling now in rfc3920bis is OK.

(2) I think the stream error handling now in rfc3920bis is evil.

> (Well one question left: Is RFC3920bis forbidding numeric character
> references? 

No!

> AFAIK numeric character references are NOT entities and are
> therefore not forbidden, but if they were, I'd have a problem
> generating the <restricted-xml/> error in all cases as well.)

Peter

-- 
Peter Saint-Andre
https://stpeter.im/





Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-23 Thread Matthias Wimmer
Hi Peter!

Peter Saint-Andre wrote:
>> RFC3920bis even requires a server to check that this type of XML is not
>> used and that a stream error has to be generated, if it is received.
> We tried to clarify the error handling in rfc3920bis, and that text
> reflected list consensus.

Yes ... I even followed this discussion, but I did not write anything
earlier, as I did not realize that the text, together with the rule
forbidding ' " and > from appearing unescaped in the XML document, would
prevent us from using standard XML parsers.

>> Why do these characters have to be escaped at all?
> 
> I don't know. IIRC, that was text from an early version of
> draft-ietf-xmpp-core and I think I agree with you that it should not be
> necessary to escape those characters in XMPP.
> 
>> Is it really necessary that RFC 3920bis mandates a server to reject such
>> XMPP streams? I very much dislike this requirement, as it would require
>> me to implement my own XML parser, as I don't know any parser I could
>> use that could be configured to notify me that these characters have
>> been received unescaped.
> 
> If we change the text regarding restricted XML features (i.e., say that
> the characters that don't need to be escaped in XML don't need to be
> escaped in XMPP), would you still object to the error handling?

Yes. In that case we would be able to use most (push) SAX parsers.

(Well, one question left: Is RFC3920bis forbidding numeric character
references? AFAIK numeric character references are NOT entities and are
therefore not forbidden, but if they were, I'd have a problem
generating the <restricted-xml/> error in all cases as well.)
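
To illustrate the distinction (a sketch using Python's standard
ElementTree, which is built on expat; the sample documents are made up):
numeric character references are resolved by the parser without any
entity declaration, whereas a named entity that is not one of the five
predefined ones is rejected.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references both decode to ">".
decimal = ET.fromstring("<body>a &#62; b</body>").text
hexadecimal = ET.fromstring("<body>a &#x3E; b</body>").text

# A named entity that is not predefined in XML needs a declaration,
# so a parser without a DTD refuses it.
try:
    ET.fromstring("<body>&nbsp;</body>")
    undefined_entity_rejected = False
except ET.ParseError:
    undefined_entity_rejected = True

# The same character written as a numeric reference is accepted.
no_break_space = ET.fromstring("<body>&#160;</body>").text

print(decimal == hexadecimal, undefined_entity_rejected)
```

As in the entity case, the application only sees the decoded character,
so it cannot tell from the parsed text which spelling was sent.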


Matthias

-- 
Matthias Wimmer      Fon +49-700 77 00 77 70
Züricher Str. 243    Fax +49-89 95 89 91 56
81476 München        http://ma.tthias.eu/



Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-23 Thread Peter Saint-Andre
Hi Matthias!

Matthias Wimmer wrote:
> Hi!
> 
> There are several characters that have predefined entities in XML, but
> that do not need to be escaped in XML.
> Examples of such characters are > ' and " in text nodes.
> 
> E.g. according to the XML standard the following stanza would be valid XML:
> 
> <message><body>Yes, a > b!</body></message>
> 
> ... while RFC 3920 forbids generating such XML when used as XMPP.

Correct.

> RFC3920bis even requires a server to check that this type of XML is not
> used and that a stream error has to be generated, if it is received.

We tried to clarify the error handling in rfc3920bis, and that text
reflected list consensus.

> So I have two questions regarding this:
> 
> Why do these characters have to be escaped at all?

I don't know. IIRC, that was text from an early version of
draft-ietf-xmpp-core and I think I agree with you that it should not be
necessary to escape those characters in XMPP.

> Is it really necessary that RFC 3920bis mandates a server to reject such
> XMPP streams? I very much dislike this requirement, as it would require
> me to implement my own XML parser, as I don't know any parser I could
> use that could be configured to notify me that these characters have
> been received unescaped.

If we change the text regarding restricted XML features (i.e., say that
the characters that don't need to be escaped in XML don't need to be
escaped in XMPP), would you still object to the error handling?

/psa






Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-22 Thread Robin Redeker
On Sun, Jul 22, 2007 at 11:25:23PM +0200, Matthias Wimmer wrote:
> Hi Robin!
> 
> 
> Robin Redeker wrote:
> >>Why do these characters have to be escaped at all?
> >I guess because many people did implement their own broken XML parsers
> >in the past and many couldn't handle real XML, so they enforced escaping
> >that character for backward compatibility. (just a guess)
> 
> I can't believe that there are any such problems. There is already
> software producing XML that is valid but does not escape all possible
> characters.
> 
> Examples of this are jabberd2 (up to a very recent SVN version), jadc2s
> (up to today), Psi (still not escaping " and ' in text nodes).

Hm, ok, then I also have absolutely no idea why these chars have to be
escaped. (And I'm also very curious!)

> So there is a lot of software out there that has worked for years, but
> introducing this unnecessary restriction in RFC 3920 broke it.
> 
> >If you use expat you could get the original string from a text node
> >and look for a '>' in that string. But this is an ugly hack that I also
> >consider unnecessary.
> 
> How do I do this with expat? I have never seen something like this.
> Normally, at least, expat is a SAX parser on which you set a
> CharacterDataHandler, and the function you register as the
> CharacterDataHandler gets passed unescaped UTF-8 data. Within the
> CharacterDataHandler I see no way to determine whether a > has been
> transferred as > or as &gt;.


The function I mean is XML_GetInputContext; the Perl API uses it to
give me the portion of the XML document that was parsed. With that you
might be able to find out whether it contains an unescaped >.

(see e.g.
http://www.math.ucla.edu/computing/docindex/expat-html-2/reference.html#XML_GetInputContext
)

You could also use XML_SetDefaultHandler and XML_DefaultCurrent to get
nearly the same data without the limit of XML_CONTEXT_BYTES (I actually
don't know precisely what the difference is; I'm not that intimate
with the C API of expat).
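
A rough sketch of the default-handler route, written against Python's
expat binding rather than the C API (its DefaultHandler attribute
corresponds to XML_SetDefaultHandler). It relies on my understanding
that expat hands otherwise unhandled character data, including
predefined entity references, to the default handler exactly as it
appeared in the document. The function names are made up for the
example.

```python
import xml.parsers.expat


def raw_character_data(document):
    """Return element content as it appeared in the input (sketch)."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Give tags their own no-op handlers so they are not routed to the
    # default handler; only otherwise unhandled data lands in `chunks`.
    parser.StartElementHandler = lambda name, attrs: None
    parser.EndElementHandler = lambda name: None
    # No CharacterDataHandler is set, so text and predefined entity
    # references should reach the default handler unexpanded.
    parser.DefaultHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


def has_unescaped_gt(document):
    """True if a literal '>' occurs in character data (assumes no
    comments or processing instructions, which XMPP does not allow)."""
    return ">" in raw_character_data(document)


print(has_unescaped_gt("<body>a > b</body>"))
print(has_unescaped_gt("<body>a &gt; b</body>"))
```

The price is that the application then has to decode entity and
character references itself, which is exactly the kind of work the
parser was supposed to take over.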

These are all, of course, very weird ways to get at the original XML
character data of a recognized element, and I would love it if XMPP
didn't require me to need such access to my XML parser.

All I want is a DOM tree, and all I want to care about when writing XMPP
out is passing my DOM tree to an XML generator, without having to convince
the XML generator that _I_ know better than it does how XML should be
generated and what it should look like so that an XMPP server doesn't get
confused.
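
A small illustration of that concern, using Python's standard
ElementTree serializer as a stand-in for "an XML generator" (the element
and text are made up): an off-the-shelf generator decides for itself
what to escape, and in text nodes it leaves " and ' alone.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree and let the library serialize it.
body = ET.Element("body")
body.text = "say \"hi\" & 'bye'"

serialized = ET.tostring(body, encoding="unicode")
print(serialized)
```

The output is perfectly good XML, yet under a rule that every character
with a predefined entity must be escaped it would count as an error,
unless the application post-processes or replaces the generator.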

(Let's not start with namespace madness here; we don't want to open the
can of worms that is still left over from the 'About stream namespaces'
discussion from 4 months ago. Hmm... IMO the issue should be brought
up regularly, even though discussions about it end in:
"We know, but we won't fix it, because the protocol should stay broken
and underspecified because if we fix it we break the implementations.").

> >The RFC should be fixed and software that doesn't parse unescaped > in
> >text nodes should be fixed (no one is forced in today's world to write his
> >own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to
> >handle partially transferred XML documents these days).
> 
> Yes ... I'd also say that because of reusing standards and
> implementations of them, we should not force software to not accept
> unescaped entities. We should even encourage software to accept these
> unescaped entities.

I agree, we should encourage conformance with the W3C recommendation.



Robin


Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-22 Thread Matthias Wimmer

Hi Robin!


Robin Redeker wrote:
>> Why do these characters have to be escaped at all?
>
> I guess because many people did implement their own broken XML parsers
> in the past and many couldn't handle real XML, so they enforced escaping
> that character for backward compatibility. (just a guess)

I can't believe that there are any such problems. There is already
software producing XML that is valid but does not escape all possible
characters.

Examples of this are jabberd2 (up to a very recent SVN version), jadc2s
(up to today), Psi (still not escaping " and ' in text nodes).

So there is a lot of software out there that has worked for years, but
introducing this unnecessary restriction in RFC 3920 broke it.

> If you use expat you could get the original string from a text node
> and look for a '>' in that string. But this is an ugly hack that I also
> consider unnecessary.

How do I do this with expat? I have never seen something like this.
Normally, at least, expat is a SAX parser on which you set a
CharacterDataHandler, and the function you register as the
CharacterDataHandler gets passed unescaped UTF-8 data. Within the
CharacterDataHandler I see no way to determine whether a > has been
transferred as > or as &gt;.
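
The problem can be reproduced with a few lines against Python's expat
binding (a sketch; the helper name and the two sample documents are
invented for the illustration). The CharacterDataHandler receives the
same data for both spellings:

```python
import xml.parsers.expat


def character_data(document):
    """Collect everything expat passes to the CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    # expat may deliver text in several pieces, so join them.
    return "".join(chunks)


literal = character_data("<body>a > b</body>")
entity = character_data("<body>a &gt; b</body>")

print(literal == entity)
```

By the time the handler runs, the information about how the character
was written on the wire is already gone.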



> The RFC should be fixed and software that doesn't parse unescaped > in
> text nodes should be fixed (no one is forced in today's world to write his
> own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to
> handle partially transferred XML documents these days).

Yes ... I'd also say that because of reusing standards and
implementations of them, we should not force software to not accept
unescaped entities. We should even encourage software to accept these
unescaped entities.



Matthias


Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-22 Thread Robin Redeker
On Sun, Jul 22, 2007 at 04:30:13PM +0200, Matthias Wimmer wrote:
> Hi!
> 
> There are several characters, that have predefined entities in XML, but
> that do not need to be escaped in XML.
> Examples of such characters are > ' and " in text nodes.
> 
[...]
> So I have two questions regarding this:
> 
> Why do these characters have to be escaped at all?

I guess because many people implemented their own broken XML parsers
in the past and many couldn't handle real XML, so they enforced escaping
that character for backward compatibility. (just a guess)

> Is it really necessary that RFC 3920bis mandates a server to reject such
> XMPP streams? I very much dislike this requirement, as it would require
> me to implement my own XML parser, as I don't know any parser I could
> use that could be configured to notify me that these characters have
> been received unescaped.

If you use expat you could get the original string from a text node
and look for a '>' in that string. But this is an ugly hack that I also
consider unnecessary.

The RFC should be fixed and software that doesn't parse unescaped > in
text nodes should be fixed (no one is forced in today's world to write his
own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to
handle partially transferred XML documents these days).



Robin


[Standards] Handling for characters that have entities, but XML does not require them to be escaped

2007-07-22 Thread Matthias Wimmer
Hi!

There are several characters that have predefined entities in XML, but
that do not need to be escaped in XML.
Examples of such characters are > ' and " in text nodes.

E.g. according to the XML standard the following stanza would be valid XML:

<message><body>Yes, a > b!</body></message>

... while RFC 3920 forbids generating such XML when used as XMPP.
RFC3920bis even requires a server to check that this type of XML is not
used, and to generate a stream error if it is received.

So I have two questions regarding this:

Why do these characters have to be escaped at all?

Is it really necessary that RFC 3920bis mandates a server to reject such
XMPP streams? I very much dislike this requirement, as it would require
me to implement my own XML parser, as I don't know any parser I could
use that could be configured to notify me that these characters have
been received unescaped.
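
As a quick check of the claim that such XML is accepted by stock parsers
(a sketch with Python's standard ElementTree, which uses expat
underneath; both stanza strings are invented for the example):

```python
import xml.etree.ElementTree as ET

# Same text, once with the characters left literal and once escaped
# with the predefined entities.
unescaped = "<message><body>Yes, a > b! 'x' \"y\"</body></message>"
escaped = ("<message><body>Yes, a &gt; b! "
           "&apos;x&apos; &quot;y&quot;</body></message>")

text_unescaped = ET.fromstring(unescaped).find("body").text
text_escaped = ET.fromstring(escaped).find("body").text

print(text_unescaped == text_escaped)
```

Both forms parse without error and yield the same text, which is why a
receiving application has nothing to base a stream error on.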


Matthias

-- 
Matthias Wimmer      Fon +49-700 77 00 77 70
Züricher Str. 243    Fax +49-89 95 89 91 56
81476 München        http://ma.tthias.eu/