Re: [Standards] JID Escaping
On Sat, Jul 21, 2007 at 08:17:19PM -0600, Peter Saint-Andre wrote:
> Robin Redeker wrote:
> > On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
> >>> I think the whole XEP should be renamed to something like:
> >>>
> >>> XEP-0106 - JID Mapping for Gateways
> >> This would be better. But it breaks the generic usage of JIDs for both
> >> users and gateways. It will create a lot of trouble.
> >
> > The XEP seems to already create a lot of trouble. Just remind me to
> > register '[EMAIL PROTECTED]' when every client unescapes JIDs ;-)
>
> No problem. The spec says:
>
> "The character sequence \20 MUST NOT be the first or last character of
> an escaped node identifier."
>
> But of course you can violate the spec if desired. ;-)

I don't violate the RFC here. I violate an optional extension.

XEP-0106 has to exclude JIDs whose node part starts or ends with '\20' from the escaping AND unescaping transformations. At the moment the paragraph says that '\20' MUST NOT be first or last in the node part, but it doesn't say WHAT to do when this perfectly fine JID arrives over the wire. Should the JID not be unescaped at all? Should only the parts after and before '\20' be unescaped? Should the client close the connection?

Am I missing something in the XEP? (If so, please ignore the rest of this mail.)

Please also note the nice, but maybe not so important, collision that happens here when the client just doesn't unescape:

   unescape ("\5c20foobar\5c20") => "\20foobar\20"
   unescape ("\20foobar\20")     => "\20foobar\20"

This is of course not really an important JID, and who cares about a few optical collisions in clients which confuse the user. And these only happen once someone else decides to put '\20' at the beginning or end of his name, and why would someone do that?

Hey, we could add security notes to all clients which tell the user: "Never attach '\20' to the beginning or end of your name, it is unsafe!" The U.S. Army will love this!
One might think of a case where they actually name their units by enumerating them with a \ at the end:

   Unescaped:    Escaped:      Unescaped:
   "Tank\1"      "Tank\5c1"    "Tank\1"
   "Tank\20"     "Tank\20"     "Tank\20"
   "Tank\22"     "Tank\5c22"   "Tank\22"
                 "Tank\5c20"   "Tank\20"
   ...

ps Ah... never... why would they do that... :-)

I propose to rename the XEP to make clear that this escaping/unescaping should only happen in very rare cases (only at gateways or heavily specialized client frontends), and to replace the terms 'escaping' and 'unescaping' with 'mapping' and 'unmapping', because that's what is happening here.

Robin
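
For illustration, the collision described above can be reproduced with a minimal Python sketch. It assumes a hypothetical client policy in which a leading or trailing \20 in the node part is left untouched while every other XEP-0106 escape sequence is unescaped in a single pass. The function name unescape_node and that policy are assumptions made for this example; the XEP text quoted in this thread does not say what a client should do.

```python
import re

# XEP-0106 escape sequences and the characters they stand for.
UNESCAPE_MAP = {
    r"\20": " ", r"\22": '"', r"\26": "&", r"\27": "'", r"\2f": "/",
    r"\3a": ":", r"\3c": "<", r"\3e": ">", r"\40": "@", r"\5c": "\\",
}
_SEQUENCE = re.compile(r"\\(?:20|22|26|27|2f|3a|3c|3e|40|5c)")


def unescape_node(node):
    """Unescape a node part, skipping a leading/trailing \\20.

    Assumed policy (not taken from the XEP): a \\20 at the very start or
    very end is copied through unchanged, everything in between is
    unescaped in one left-to-right pass.
    """
    start = 3 if node.startswith(r"\20") else 0
    end = len(node)
    if node.endswith(r"\20") and len(node) - 3 >= start:
        end = len(node) - 3
    middle = _SEQUENCE.sub(lambda m: UNESCAPE_MAP[m.group(0)], node[start:end])
    return node[:start] + middle + node[end:]


# Two different escaped node parts end up looking identical to the user.
print(unescape_node(r"\5c20foobar\5c20"))
print(unescape_node(r"\20foobar\20"))
print(unescape_node(r"Tank\5c20") == unescape_node(r"Tank\20"))
```

Under this assumed policy both node parts are displayed as the same string, which is exactly the ambiguity the renaming proposal is meant to keep away from ordinary clients.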
Re: [Standards] mutual authentication and XEP 178
Peter Saint-Andre wrote:
> Justin Karneges wrote:
>> On Wednesday 18 July 2007 12:31 pm, Peter Saint-Andre wrote:
>>> Server1 realizes that it needs an XML stream to Server2 in order to
>>> route some stanzas. So Server1 completes address resolution via SRV or
>>> whatever and opens a TCP connection to Server2. That happens on
>>> TCPconn1. Then Server1 sends a stream header to Server2. So far so good.
>>>
>>> RFC3920 says that for s2s there are 2 TCP connections. So in order to
>>> send a response stream header to Server1, I assume that Server2 opens a
>>> second TCP connection, which we'll call TCPconn2, and then sends the
>>> response stream header over TCPconn2.
>>>
>>> Correct?
>> Absolutely not. :)
>>
>>> I don't know if the spec needs to talk about this, but it couldn't hurt
>>> (since it's different for c2s vs. s2s).
>> It's the same. One XML document for each direction in the TCP connection.
>> However, with s2s, only the initiator of a TCP connection can send stanzas
>> (e.g. 'message', 'presence', and 'iq').
>
> I'll have to do some research on this so that I can specify it correctly. :)

I woke up at 3:30 this morning and realized that my previous email was horribly wrong. An XML stream always goes in one direction, it's just that in the c2s case both streams go over the same TCP connection, whereas in the s2s case there are two TCP connections. However, as Justin says, the directionality matters only for the sending of stanzas, not for the sending of XML elements that are used to establish the stream.

I'll clarify this in the -04 version of rfc3920bis, and will post to the list once I have proposed text.

/psa
Re: [Standards] JID Escaping
Robin Redeker wrote:
> On Sat, Jul 21, 2007 at 09:20:27AM +0200, Mats Bengtsson wrote:
>>> I think the whole XEP should be renamed to something like:
>>>
>>> XEP-0106 - JID Mapping for Gateways
>> This would be better. But it breaks the generic usage of JIDs for both users
>> and gateways. It will create a lot of trouble.
>
> The XEP seems to already create a lot of trouble. Just remind me to
> register '[EMAIL PROTECTED]' when every client unescapes JIDs ;-)

No problem. The spec says:

"The character sequence \20 MUST NOT be the first or last character of an escaped node identifier."

But of course you can violate the spec if desired. ;-)

/psa
Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped
On Sun, Jul 22, 2007 at 11:25:23PM +0200, Matthias Wimmer wrote:
> Hi Robin!
>
> Robin Redeker wrote:
> >>Why at all do these characters have to be escaped?
> >I guess because many people did implement their own broken XML parsers
> >in the past and many couldn't handle real XML, so they enforced escaping
> >that character for the backward compatibility. (just a guess)
>
> I can't believe that there are any such problems. There is already
> software producing XML that is valid but does not escape all possible
> characters.
>
> Examples for this are jabberd2 (up to a very new SVN version), jadc2s
> (up to today), Psi (still not escaping " and ' in text nodes).

Hm, ok, then I also have absolutely no idea why these chars have to be escaped. (And I'm also very curious!)

> So there is a lot of software out there that has worked for years, but
> introducing this unnecessary restriction in RFC 3920 made it broken.
>
> >If you use expat you could get the original string from a text node
> >and look for a '>' in that string. But this is an ugly hack that I also
> >consider unnecessary.
>
> How do I do this with expat? I have never seen something like this. At
> least normally expat is a SAX parser on which you set a
> CharacterDataHandler. And the function you register as the
> CharacterDataHandler gets passed unescaped UTF-8 data. Within the
> CharacterDataHandler I see no way to determine if a > has been
> transferred as > or as &gt;.

The function I mean is XML_GetInputContext; the Perl API uses it to give me the portion of the XML document that was parsed. Once you have that, you might be able to find out whether an unescaped > is in it. (See e.g. http://www.math.ucla.edu/computing/docindex/expat-html-2/reference.html#XML_GetInputContext )

You could also use XML_SetDefaultHandler and XML_DefaultCurrent to get nearly the same data without the limit of XML_CONTEXT_BYTES. (I actually don't know very precisely what the difference is; I'm not that intimate with the C API of expat.)
These are all of course very weird ways to get at the original XML character data of a recognized element, and I would love it if XMPP didn't require me to even need such access to my XML parser. All I want is a DOM tree, and all I want to care about when writing XMPP out is passing my DOM tree to an XML generator, without having to convince the XML generator that _I_ know better than it does how XML should be generated and what it should look like so that an XMPP server doesn't get confused.

(Let's not start with namespace madness here; we don't want to open the can of worms which is still left over from the 'About stream namespaces' discussion from 4 months ago. Hmm... IMO the issue should be brought up regularly, even though discussions about it end in: "We know, but we won't fix it, because the protocol should stay broken and underspecified because if we fix it we break the implementations.")

> >The RFC should be fixed and software that doesn't parse unescaped > in
> >text nodes should be fixed (no one is forced in today's world to write his
> >own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to
> >handle partial transferred XML documents these days).
>
> Yes ... I'd also say that because of reusing standards and
> implementations of them, we should not force software to not accept
> unescaped entities. We should even encourage software to accept these
> unescaped entities.

I agree, we should encourage conformance to the W3C recommendation.

Robin
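
As a rough illustration of the default-handler route mentioned above, here is a minimal sketch using Python's binding to expat (xml.parsers.expat) rather than the C or Perl API discussed in the thread. It relies on expat passing anything that has no dedicated handler to the default handler exactly as it appeared in the input, so no CharacterDataHandler is installed. The function names raw_text_chunks and has_unescaped_gt are made up for this example, and the sketch only considers documents that consist of elements and text.

```python
import xml.parsers.expat


def raw_text_chunks(document):
    """Return text content as it appeared on the wire, entities unexpanded.

    Start and end tags get their own (no-op) handlers so that only the
    remaining content reaches the default handler. Comments, processing
    instructions and CDATA sections would reach it as well; this sketch
    does not try to separate them out.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: None
    parser.EndElementHandler = lambda name: None
    parser.DefaultHandler = chunks.append  # receives the raw input text
    parser.Parse(document, True)
    return chunks


def has_unescaped_gt(document):
    """True if some text node contained a literal '>' instead of '&gt;'."""
    return any(">" in chunk for chunk in raw_text_chunks(document))


print(has_unescaped_gt("<body>Yes, a > b!</body>"))
print(has_unescaped_gt("<body>Yes, a &gt; b!</body>"))
```

This shows the kind of detour a server would need just to notice an unescaped >, which is the hack both posters would rather not depend on.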
Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped
Hi Robin!

Robin Redeker wrote:
>>Why at all do these characters have to be escaped?
>I guess because many people did implement their own broken XML parsers
>in the past and many couldn't handle real XML, so they enforced escaping
>that character for the backward compatibility. (just a guess)

I can't believe that there are any such problems. There is already software producing XML that is valid but does not escape all possible characters.

Examples for this are jabberd2 (up to a very new SVN version), jadc2s (up to today), Psi (still not escaping " and ' in text nodes).

So there is a lot of software out there that has worked for years, but introducing this unnecessary restriction in RFC 3920 made it broken.

>If you use expat you could get the original string from a text node
>and look for a '>' in that string. But this is an ugly hack that I also
>consider unnecessary.

How do I do this with expat? I have never seen something like this. At least normally expat is a SAX parser on which you set a CharacterDataHandler. And the function you register as the CharacterDataHandler gets passed unescaped UTF-8 data. Within the CharacterDataHandler I see no way to determine if a > has been transferred as > or as &gt;.

>The RFC should be fixed and software that doesn't parse unescaped > in
>text nodes should be fixed (no one is forced in today's world to write his
>own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to
>handle partial transferred XML documents these days).

Yes ... I'd also say that because of reusing standards and implementations of them, we should not force software to not accept unescaped entities. We should even encourage software to accept these unescaped entities.

Matthias
Re: [Standards] Handling for characters that have entities, but XML does not require them to be escaped
On Sun, Jul 22, 2007 at 04:30:13PM +0200, Matthias Wimmer wrote:
> Hi!
>
> There are several characters that have predefined entities in XML, but
> that do not need to be escaped in XML.
> Examples for such characters are >, ' and " in text nodes.
> [...]
> So I have two questions regarding this:
>
> Why at all do these characters have to be escaped?

I guess because many people did implement their own broken XML parsers in the past and many couldn't handle real XML, so they enforced escaping that character for the backward compatibility. (just a guess)

> Is it really necessary that RFC 3920bis mandates a server to reject such
> XMPP streams? I very much dislike this requirement, as it would require
> me to implement my own XML parser, as I don't know any parser I could
> use that could be configured to notify me that these characters have
> been received unescaped.

If you use expat you could get the original string from a text node and look for a '>' in that string. But this is an ugly hack that I also consider unnecessary.

The RFC should be fixed and software that doesn't parse unescaped > in text nodes should be fixed (no one is forced in today's world to write his own XML parser, libxml2 (afaik) and expat (for sure) can be convinced to handle partial transferred XML documents these days).

Robin
[Standards] Handling for characters that have entities, but XML does not require them to be escaped
Hi!

There are several characters that have predefined entities in XML, but that do not need to be escaped in XML. Examples for such characters are >, ' and " in text nodes.

E.g. according to the XML standard the following stanza would be valid XML:

   Yes, a > b!

... while RFC 3920 forbids generating such XML when used as XMPP. RFC3920bis even requires a server to check that this type of XML is not used, and to generate a stream error if it is received.

So I have two questions regarding this:

Why at all do these characters have to be escaped?

Is it really necessary that RFC 3920bis mandates a server to reject such XMPP streams? I very much dislike this requirement, as it would require me to implement my own XML parser, as I don't know any parser I could use that could be configured to notify me that these characters have been received unescaped.

Matthias

--
Matthias Wimmer      Fon +49-700 77 00 77 70
Züricher Str. 243    Fax +49-89 95 89 91 56
81476 München        http://ma.tthias.eu/
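
To make the point about off-the-shelf parsers concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. It parses the same text once with literal characters and once with the predefined entities, and compares what the application receives. The body element and the helper name text_of are assumptions for this example, since the markup of the stanza above did not survive in the archive.

```python
import xml.etree.ElementTree as ET


def text_of(document):
    """Parse a one-element document and return its character data."""
    return ET.fromstring(document).text


# Literal '>' versus the predefined entity: both are well-formed XML.
literal_gt = text_of("<body>Yes, a > b!</body>")
escaped_gt = text_of("<body>Yes, a &gt; b!</body>")

# The same holds for ' and " in text nodes.
literal_quotes = text_of("<body>It's \"fine\"</body>")
escaped_quotes = text_of("<body>It&apos;s &quot;fine&quot;</body>")

# The parser hands over identical strings in both cases, so the
# application cannot tell which form was sent.
print(literal_gt == escaped_gt)
print(literal_quotes == escaped_quotes)
```

Because the parsed result is the same either way, a server built on such a parser has no information on which to base the stream error that RFC3920bis asks for.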