Re: [Standards] well-formedness

2008-10-28 Thread Pedro Melo


On Oct 28, 2008, at 5:23 AM, Sergei Golovan wrote:

On Tue, Oct 28, 2008 at 1:28 AM, Peter Saint-Andre  
[EMAIL PROTECTED] wrote:


What about things like SOAP over XMPP? There are lots of prefixes in
that spec:

http://xmpp.org/extensions/xep-0072.html

However that's just about the only such spec I know of.


There's another extension (and it's even implemented):
http://code.google.com/apis/talk/jep_extensions/roster_attributes.html


No, the router part of your server doesn't need to look into the query  
part of the IQ in this case. Only the session manager.


In some servers, those two parts are quite separate entities.


And there's an extension which requires to parse every message packet
which goes through a server:
http://www.xmpp.org/extensions/xep-0079.html


Maybe a reason why I don't know of a single server who implements this.

Best regards,
--
Pedro Melo
Blog: http://www.simplicidade.org/notes/
XMPP ID: [EMAIL PROTECTED]
Use XMPP!




Re: [Standards] well-formedness

2008-10-28 Thread Sergei Golovan
On Tue, Oct 28, 2008 at 3:41 PM, Pedro Melo [EMAIL PROTECTED] wrote:

 On Oct 28, 2008, at 5:23 AM, Sergei Golovan wrote:

 There's another extension (and it's even implemented):
 http://code.google.com/apis/talk/jep_extensions/roster_attributes.html

 No, the router part of your server doesn't need to look into the query part
 of the IQ in this case. Only the session manager.

It doesn't matter which part of the server processes the stanza. The problem
is that if you really want to forbid XMLNS prefixes in XMPP, then this particular
extension will be considered a violation. Consequently, it wouldn't be
possible to extend XMPP by adding custom (properly namespaced) attributes
to existing elements. If that's OK, then go ahead.


 In some servers, those two parts are quite separate entities.

 And there's an extension which requires to parse every message packet
 which goes through a server:
 http://www.xmpp.org/extensions/xep-0079.html

 Maybe a reason why I don't know of a single server who implements this.

I guess it's just useless. At least in ejabberd (which creates a DOM for every
incoming stanza) implementing this XEP seems not to be a big problem.

-- 
Sergei Golovan


Re: [Standards] well-formedness

2008-10-28 Thread Peter Saint-Andre
Curtis King wrote:
 
 On 27-Oct-08, at 3:28 PM, Peter Saint-Andre wrote:
 
 What about things like SOAP over XMPP? There are lots of prefixes in
 that spec:

 http://xmpp.org/extensions/xep-0072.html

 However that's just about the only such spec I know of.
 
 I never said it would be painless :-)
 
 I think all sides have presented their arguments and no one is really
 going to change their mind until someone shows if there is or isn't a
 significant cost for the server to validate the XML.

So it seems.

Peter


Re: [Standards] well-formedness

2008-10-27 Thread Peter Saint-Andre
Brendan Taylor wrote:
 Throwing another idea out there; since we're not *really* using
 namespaces anyways, could we just reject elements and attributes with
 colons in their names? (besides the xml: attributes, of course)
 IIRC XMPP entities aren't expected to understand namespace prefixes
 anyhow.

That's always been the tradition in Jabber-land, although there are some
exceptions:

http://xmpp.org/extensions/xep-0072.html

/psa


Re: [Standards] well-formedness

2008-10-27 Thread Peter Saint-Andre
Curtis King wrote:
 
 On 21-Oct-08, at 6:47 AM, Peter Saint-Andre wrote:
 
 Curtis King wrote:

 On 20-Oct-08, at 7:37 PM, Peter Saint-Andre wrote:


 Please understand that even if we use MUST instead of SHOULD with
 respect to namespace-awareness, the existing servers are not going to
 be left behind. Newer servers and server versions are still going to
 continue to support their legacy counterparts. The benefit of course
 would be that eventually we will have a sterilized network, where
 clients wouldn't need to worry about rolling out their own
 (non-conforming) namespace handling. In my opinion this is a better
 long term direction.

 I too think that is a worthy goal. The question is: how can we get
 there
 in a reasonable fashion?

 Why not limit the scope of XML-NAMES ?

 I think xml like this should be prohibited by the xmpp spec.

 <snip/>

 Er yes, that *is* ugly. :)
 
 It's not only ugly, but the root of the problem. No?
 
 If we are going to make a change to the spec which will break most if
 not all server implementations. Why not do the correct fix, by changing
 the text to MUST not use prefixes as described in XML-NAMES. We are
 using XML to frame and encode an over the wire protocol not store a 500
 page document. Let's be smart and not use the parts which will cause us
 pain like prefixes.

What about things like SOAP over XMPP? There are lots of prefixes in
that spec:

http://xmpp.org/extensions/xep-0072.html

However that's just about the only such spec I know of.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/



Re: [Standards] well-formedness

2008-10-27 Thread Curtis King


On 27-Oct-08, at 3:28 PM, Peter Saint-Andre wrote:


What about things like SOAP over XMPP? There are lots of prefixes in
that spec:

http://xmpp.org/extensions/xep-0072.html

However that's just about the only such spec I know of.


I never said it would be painless :-)

I think all sides have presented their arguments and no one is really  
going to change their mind until someone shows if there is or isn't a  
significant cost for the server to validate the XML.


ck



Re: [Standards] well-formedness

2008-10-27 Thread Sergei Golovan
On Tue, Oct 28, 2008 at 1:28 AM, Peter Saint-Andre [EMAIL PROTECTED] wrote:

 What about things like SOAP over XMPP? There are lots of prefixes in
 that spec:

 http://xmpp.org/extensions/xep-0072.html

 However that's just about the only such spec I know of.

There's another extension (and it's even implemented):
http://code.google.com/apis/talk/jep_extensions/roster_attributes.html

And there's an extension which requires to parse every message packet
which goes through a server:
http://www.xmpp.org/extensions/xep-0079.html

-- 
Sergei Golovan


Re: [Standards] well-formedness

2008-10-25 Thread Artur Hefczyc

Hi,

I am glad somebody has responded to my post :-)

Without tests I can't really say how much the resource usage would grow
but I can imagine it could be significant.
One of the reasons for the good performance of the Tigase server is a very
lightweight XML parser I have written.



Just so you know, that parser is not a conforming XML parser. Tigase
happily accepts data that is not XML-well-formed, and happily routes
or delivers it.


That's true, but please note that an XMPP stream is not really an XML stream
either.
I would rather call my parser an XMPP parser, then.


And please note: all this increased resource usage would be needed only
because _sometimes_ it _may_ happen that maybe 1 packet in a million has
incorrect XMLNS.

I am not sure if this is worth the cost.



What is the cost? Has anyone actually tried determining the actual  
cost?






I just don't think the cost of simply validating namespaces is
significant, and it certainly is not prohibitive.



This cost might be ignored on the client side but on the server side
everything counts. Imagine you have to parse XMPP packets on
150k active connections. The traffic during my load tests was
10k packets/sec. Every instruction you add to the data processing is
multiplied by the number of packets.

Of course, if XMLNS validation were 1% of all operations
performed by the parser, it could probably be ignored.
I think, however, that XMLNS validation could require even
more processing than all other parser tasks.

Artur
--
Artur Hefczyc
http://www.tigase.org/
http://artur.hefczyc.net/



Re: [Standards] well-formedness

2008-10-25 Thread Waqas
On Sat, Oct 25, 2008 at 3:48 PM, Artur Hefczyc [EMAIL PROTECTED] wrote:
 Hi,

 I am glad somebody has responded to my post :-)

 Without tests I can't really say how much the resource usage would grow
 but I can imagine it could be significant.
 One of the reason for a good performance in Tigase server is a very
 lightweight XML parser I have written.


 Just so you know, that parser is not a conforming XML parser. Tigase
 happily accepts data that is not XML-well-formed, and happily routes
 or delivers it.

 That's true but please note that XMPP stream is not really XML stream
 either.
 I would rather call my parser: XMPP parser then.



I meant invalid XML, like [EMAIL PROTECTED]/. This when routed could result in
all sorts of DoS scenarios. Most servers and clients would terminate
connections when they receive this.

 And please note. All these increased resource usage would be only needed
 because _sometimes_ it _may_ happen that maybe 1/1mln packet might have
 incorrect XMLNS..

 I am not sure if this is worth the cost.


 What is the cost? Has anyone actually tried determining the actual cost?



 I just don't think the cost of simply validating namespaces is
 significant, and it certainly is not prohibitive.


 This cost might be ignored on the client side but on the server side
 everything counts. Imagine you have to parse XMPP packets on
 150k active connections. The traffic during my load tests was
 10k packets/sec. Every instruction you add to the data processing is
 multiplied by the number of packets.

 Of course if the XMLNS validation would be 1% of all operations
 performed by the parser it could be probably ignored.
 I think, however that XMLNS validation could require even
 more processing than all other parser tasks.

 Artur
 --
 Artur Hefczyc
 http://www.tigase.org/
 http://artur.hefczyc.net/




Re: [Standards] well-formedness

2008-10-25 Thread Kevin Smith
On Sat, Oct 25, 2008 at 11:48 AM, Artur Hefczyc
[EMAIL PROTECTED] wrote:
 Just so you know, that parser is not a conforming XML parser. Tigase
 happily accepts data that is not XML-well-formed, and happily routes
 or delivers it.
 That's true but please note that XMPP stream is not really XML stream
 either.
 I would rather call my parser: XMPP parser then.

It's true that legal XML can be illegal XMPP, but legal XMPP cannot be
illegal XML (quibbles about stream restarts aside), so if the parser
is letting through illegal XML, it's almost certainly letting through
illegal XMPP, too.

 Of course if the XMLNS validation would be 1% of all operations
 performed by the parser it could be probably ignored.
 I think, however that XMLNS validation could require even
 more processing than all other parser tasks.

I'd really love to see a measured figure, but I can't see why anyone
would want to implement it just to prove it's not worth implementing.
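
For anyone who does want a measured figure, a rough starting point might look like the sketch below. It is only an illustration in Python using the standard library's expat binding. The sample stanza, the function names and the repeat count are all assumptions made up for this example, and it creates a fresh parser for every stanza, which is not how a long-lived XMPP stream would be parsed.

```python
import timeit
import xml.parsers.expat

# Hypothetical sample stanza, invented for this sketch.
STANZA = ('<message xmlns="jabber:client" to="juliet@example.com" type="chat">'
          '<body>Art thou not Romeo?</body></message>')


def parse_once(namespace_aware):
    """Parse STANZA once, with or without expat's namespace processing."""
    # Passing a separator switches namespace processing on; None leaves it off.
    sep = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=sep)
    parser.StartElementHandler = lambda name, attrs: None
    parser.Parse(STANZA, True)


def seconds_per_stanza(namespace_aware, repeats=20000):
    """Average wall-clock seconds spent per stanza over `repeats` parses."""
    total = timeit.timeit(lambda: parse_once(namespace_aware), number=repeats)
    return total / repeats


if __name__ == "__main__":
    print("namespace processing off:", seconds_per_stanza(False))
    print("namespace processing on: ", seconds_per_stanza(True))
```

Comparing the two numbers gives a first-order idea of what namespace processing costs inside expat itself. It says nothing about a hand-written parser such as the one in Tigase, which would need its own measurement.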

/K


Re: [Standards] well-formedness

2008-10-24 Thread Waqas
On Fri, Oct 24, 2008 at 3:12 AM, Artur Hefczyc [EMAIL PROTECTED] wrote:
 Hi,

 I am the server developer so let me add something to the discussion
 even if this is not a direct response to anybody post.

 I think I understand the point but my opinion is if people want to push
 more and more processing on the server to make live easier on the
 client side then the server installations will become more and more
 expensive.

 If the server had to validate XMLNS as well it would significantly
 affect the server performance and memory consumption as it would need
 to keep information about XMLNS for stanzas currently parsed for all
 network connections. Don't even mention the CPU usage to perform
 all the validation.
 Without tests I can't really say how much the resource usage would grow
 but I can imagine it could be significant.
 One of the reason for a good performance in Tigase server is a very
 lightweight
 XML parser I have written.


Just so you know, that parser is not a conforming XML parser. Tigase
happily accepts data that is not XML-well-formed, and happily routes
or delivers it.

 And please note. All these increased resource usage would be only needed
 because _sometimes_ it _may_ happen that maybe 1/1mln packet might have
 incorrect XMLNS..

 I am not sure if this is worth the cost.


What is the cost? Has anyone actually tried determining the actual cost?

 I am not speaking just about XMLNS validation here. I would like you to keep
 it in mind on any occasion you want to push even more processing to the
 server.
 And this is not just for my own comfort to have less development work to do.
 It seems to me that fast and low resources consuming servers are good for us
 all.

 I don't mean I don't want to add more stuff to the server and put more
 processing
 on the server at all cost. I like software development and I am always happy
 to
 implement something new. Even if this is just iq:iq...
 Sometimes however it is better to do more on the client side if possible.

 Artur
 --
 Artur Hefczyc
 http://www.tigase.org/
 http://artur.hefczyc.net/



I just don't think the cost of simply validating namespaces is
significant, and it certainly is not prohibitive.

--
Waqas


Re: [Standards] well-formedness

2008-10-23 Thread Curtis King


On 21-Oct-08, at 6:47 AM, Peter Saint-Andre wrote:


Curtis King wrote:


On 20-Oct-08, at 7:37 PM, Peter Saint-Andre wrote:



Please understand that even if we use MUST instead of SHOULD with
respect to namespace-awareness, the existing servers are not going to
be left behind. Newer servers and server versions are still going to
continue to support their legacy counterparts. The benefit of course
would be that eventually we will have a sterilized network, where
clients wouldn't need to worry about rolling out their own
(non-conforming) namespace handling. In my opinion this is a better
long term direction.


I too think that is a worthy goal. The question is: how can we get there
in a reasonable fashion?


Why not limit the scope of XML-NAMES ?

I think xml like this should be prohibited by the xmpp spec.


<snip/>

Er yes, that *is* ugly. :)


It's not only ugly, but the root of the problem. No?

If we are going to make a change to the spec which will break most if
not all server implementations, why not do the correct fix by
changing the text to MUST NOT use prefixes as described in
XML-NAMES? We are using XML to frame and encode an over-the-wire protocol,
not to store a 500-page document. Let's be smart and not use the parts
which will cause us pain, like prefixes.


One of the big strengths of the xmpp model is that servers can
route and do all the heavy lifting for the business rules, etc. by just
parsing the outer parts of the xml stanza. If the server needs to
validate the xml-names then this will no longer be true and the
price will be very high. There will be increased latency and limited
scalability.


Pushing the validation to the clients won't work because it's most  
likely the servers breaking the prefixes as it routes the stanzas ;-)


I don't see this limitation as a big cost or loss, as I have only seen  
one client produce xml with prefixes. We can add a SHOULD saying  
servers and clients should accept prefixes where possible for  
historical reasons, but I don't know if it would be needed.



ck



Re: [Standards] well-formedness

2008-10-23 Thread Dave Cridland

On Thu Oct 23 19:58:32 2008, Sergei Golovan wrote:
On Thu, Oct 23, 2008 at 10:38 PM, Curtis King [EMAIL PROTECTED]  
wrote:


 One the of the big strength of the xmpp model is that servers can  
route and
 do all the heavy lifting for the business rules, etc by just  
parse the outer

 parts of the xml stanza.

XML is such a wonderful language where you can't recognize the outer
part of the stanza without entirely parsing it. So, the server has to
parse every byte it receives.


Wrong.

You have to lex, and that's all.

You need to parse, build lookup tables, maintain them dependent on
the document model, and all sorts, to detect undeclared namespace
prefixes, though. Hence this entire thread, essentially.
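
To make the "lookup tables" concrete, here is a minimal Python sketch of the bookkeeping needed to spot undeclared prefixes. It is an illustration only, not code from any server mentioned in this thread. The function name is hypothetical, and expat is used with namespace processing switched off purely as a convenient tokenizer, so that the prefix tracking is done by hand.

```python
import xml.parsers.expat


def find_undeclared_prefixes(xml_text):
    """Return the qualified names whose prefix has no binding in scope."""
    # One prefix -> URI table per open element. A server would have to keep
    # this state alive for every stream it is reading.
    scopes = [{"xml": "http://www.w3.org/XML/1998/namespace"}]
    undeclared = []

    def start(name, attrs):
        scope = dict(scopes[-1])            # inherit the parent's bindings
        for attr, value in attrs.items():
            if attr.startswith("xmlns:"):
                scope[attr[6:]] = value     # binding declared on this element
        scopes.append(scope)
        # Check the element name and every attribute that is not itself
        # a namespace declaration.
        candidates = [name] + [a for a in attrs
                               if a != "xmlns" and not a.startswith("xmlns:")]
        for qname in candidates:
            prefix, colon, _local = qname.partition(":")
            if colon and prefix not in scope:
                undeclared.append(qname)

    def end(name):
        scopes.pop()                        # bindings go out of scope here

    parser = xml.parsers.expat.ParserCreate()   # namespace processing off
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return undeclared
```

For example, `find_undeclared_prefixes('<message><b:body/></message>')` returns `['b:body']`. The copy made on every start tag and the table lookups are the extra allocations and string compares being argued about here.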


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] well-formedness

2008-10-23 Thread Artur Hefczyc

Hi,

I am a server developer, so let me add something to the discussion
even if this is not a direct response to anybody's post.

I think I understand the point, but my opinion is that if people want to push
more and more processing onto the server to make life easier on the
client side, then server installations will become more and more
expensive.


If the server had to validate XMLNS as well it would significantly
affect the server performance and memory consumption as it would need
to keep information about XMLNS for stanzas currently parsed for all
network connections. Don't even mention the CPU usage to perform
all the validation.
Without tests I can't really say how much the resource usage would grow
but I can imagine it could be significant.
One of the reasons for the good performance of the Tigase server is a very
lightweight XML parser I have written.

And please note: all this increased resource usage would be needed only
because _sometimes_ it _may_ happen that maybe 1 packet in a million has
incorrect XMLNS.

I am not sure if this is worth the cost.

I am not speaking just about XMLNS validation here. I would like you to keep
this in mind on any occasion you want to push even more processing to the server.
And this is not just for my own comfort, to have less development work to do.
It seems to me that fast servers with low resource consumption are good for us all.

I don't mean that I refuse to add more stuff to the server or to put more processing
on the server at all costs. I like software development and I am always happy to
implement something new, even if this is just iq:iq...
Sometimes, however, it is better to do more on the client side if possible.


Artur
--
Artur Hefczyc
http://www.tigase.org/
http://artur.hefczyc.net/



Re: [Standards] well-formedness

2008-10-23 Thread Curtis King


On 23-Oct-08, at 11:58 AM, Sergei Golovan wrote:


On Thu, Oct 23, 2008 at 10:38 PM, Curtis King [EMAIL PROTECTED] wrote:


One the of the big strength of the xmpp model is that servers can  
route and
do all the heavy lifting for the business rules, etc by just parse  
the outer

parts of the xml stanza.


XML is such a wonderful language where you can't recognize the outer
part of the stanza without entirely parsing it. So, the server has to
parse every byte it receives.


I'm amazed by the number of people on this list who seem to forget
about the difference between lexing and parsing when discussing XML.
It's like some kind of XML haze...


ck



Re: [Standards] well-formedness

2008-10-23 Thread Curtis King


On 23-Oct-08, at 3:12 PM, Artur Hefczyc wrote:


Hi,

If the server had to validate XMLNS as well it would significantly
affect the server performance and memory consumption as it would need
to keep information about XMLNS for stanzas currently parsed for all
network connections.


Good point. I was thinking about parsing of just individual stanzas.
But it needs to be for the whole XML stream! Ouch.




Don't even mention the CPU usage to perform
all the validation.
Without tests I can't really say how much the resource usage would  
grow

but I can imagine it could be significant.
One of the reason for a good performance in Tigase server is a very  
lightweight

XML parser I have written.


I did the same for M-Link, well actually I had already written it for  
that other protocol which uses XML.





And please note. All these increased resource usage would be only  
needed
because _sometimes_ it _may_ happen that maybe 1/1mln packet might  
have

incorrect XMLNS..

I am not sure if this is worth the cost.


It isn't, because in the end a client must protect itself and not
blindly trust the server.


ck



Re: [Standards] well-formedness

2008-10-23 Thread Brendan Taylor
Throwing another idea out there; since we're not *really* using
namespaces anyways, could we just reject elements and attributes with
colons in their names? (besides the xml: attributes, of course)
IIRC XMPP entities aren't expected to understand namespace prefixes
anyhow.
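
As a rough illustration of what such a check could look like, here is a Python sketch. It is not taken from any implementation. The function name is hypothetical, expat is used with namespace processing off so that colons are treated as ordinary name characters, and the choice to also flag `xmlns:foo` declarations is an assumption about how the rule would be applied.

```python
import xml.parsers.expat


def names_with_forbidden_colons(stanza_text):
    """List element and attribute names that contain a colon.

    Attributes in the reserved xml: space (xml:lang and friends) are
    allowed, as suggested above; everything else with a colon is flagged.
    """
    offending = []

    def start(name, attrs):
        if ":" in name:
            offending.append(name)
        for attr in attrs:
            if ":" in attr and not attr.startswith("xml:"):
                offending.append(attr)

    parser = xml.parsers.expat.ParserCreate()   # namespace processing off
    parser.StartElementHandler = start
    parser.Parse(stanza_text, True)
    return offending
```

Note that the stream header itself (`stream:stream`) carries a prefix, so a rule like this would have to apply to stanzas only or treat stream-level elements as a special case.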

(I've never written a parser, so I was not aware of the difference
between lexing and parsing. I'm more sympathetic to Dave Cridland's
case now.)




Re: [Standards] well-formedness

2008-10-22 Thread Peter Saint-Andre
Dave, what text would you propose?

As a reminder, the provisional text in version -08 of rfc3920bis is:

***

12.3.  Well-Formedness

   There are two varieties of well-formedness:

   o  XML-well-formedness in accordance with the definition of well-
  formed in Section 2.1 of [XML].
   o  Namespace-well-formedness in accordance with the definition of
  namespace-well-formed in Section 7 of [XML-NAMES].

   The following rules apply.

   An XMPP entity MUST NOT generate data that is not XML-well-formed.
   An XMPP entity MUST NOT accept data that is not XML-well-formed;
   instead it MUST return an <xml-not-well-formed/> stream error and
   close the stream over which the data was received.

   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not
   namespace-well-formed, and SHOULD return a stanza error of
   <not-acceptable/> or a stream error of <xml-not-well-formed/> in
   response to the receipt of such data.

  Note: Because these restrictions were underspecified in an earlier
  revision of this specification, it is possible that
  implementations based on that revision will send data that does
  not comply with the restrictions; an entity SHOULD be liberal in
  accepting such data.

***

Dave Cridland wrote:
 On Tue Oct 21 00:53:35 2008, Waqas wrote:
 The expat parser (as an example) in namespace-aware mode reports a
 fatal error on undeclared prefixes. This was added in response to this
 bug report:
 http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127

 which references this section of XML Names:
 http://www.w3.org/TR/REC-xml-names/#ns-qualnames


 Which doesn't say anything about mandatory fatal errors.
 
 If you're parsing a static document, it's quite reasonable to generate a
 fatal error, but I don't think that's the right thing at all with an XML
 stream.
 
 Ah yes, a namespace aware parser (expat) is indeed being used with
 namespace awareness disabled...
 
 Right - and then namespaces are handled, so the overall result is that a
 namespace aware parser is used. If you're mandating that all XMPP
 implementations MUST use somebody else's parser, then I don't know quite
 what to say.
 
 
 I looked at the Gajim sources, and using
 'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
 undeclared prefixes clearly does not conform with [XML-NAMES].
 See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance


 Nonsense.
 
 A processor MUST report violations of namespace well-formedness -
 Gajim is doing so, signalling this condition using a specific namespace
 URI, so it clearly *does* conform. You may argue that I should have used
 some special non-string object instead, if you like, and that how Gajim
 handles this signal - by treating it as the unknown namespace it (kind
 of) is - is sufficiently simple and neat as to warrant being maligned as
 a hack, but it's a damn sight better than terminating the connection.
 
 Gajim does not conform to XML-NAMES. I reviewed the code, and it
 appears to act correctly for most XML. But it does not act correctly
 for prefixes on attributes
 
 Not that it did when expat was used to handle the namespaces, either.
 Making it handle these properly would involve quite a bit more
 rewriting. (Possible and desirable rewriting, to be sure, but nothing to
 do with the issue at hand, sorry).
 
 . And it does not have a single one of all
 those required checks for non-conforming XML (except the undeclared
 prefix check on tag names). XML-NAMES requires a number of checks for
 conformance, some of which are in
 http://www.w3.org/TR/REC-xml-names/#Conformance while others are
 sprinkled throughout the spec.


 I'll accept that - I didn't make it check for multiple colons, etc, and
 I might well allow a redefinition of xml: and xmlns:, which'd be
 confusing. I ought to fix these at some point.
 
 Incidentally, by stating except the undeclared prefix check, aren't
 you arguing that the code *is* following XML-NAMES in this regard?
 
 Dave, I don't think you want to conform to XML-NAMES. I think you'd
 prefer to sanitize the XML instead to make it conform to XML-NAMES.
 One step closer to HTML ;)
 
 The mechanism by which I happen to have chosen to report undeclared
 namespaces is merely a convenient mechanism which happens to have result
 I desired with minimal programming. I happen to think the code is less
 hacky than Expat's rather bizarre API, which has namespace handling
 hacked on via character delimiters, especially given how Gajim then used
 this API. (Either Expat looks up namespaces and then leaves you a
 non-standard notation to parse, or else you parse the standard notation
 and lookup namespaces yourself, in a more resilient manner - not a hard
 choice, really).
 
 What I'm trying to do is look at where we are now, and describe the best
 option for developers wishing to deploy now, especially 

Re: [Standards] well-formedness

2008-10-21 Thread Peter Saint-Andre
Curtis King wrote:
 
 On 20-Oct-08, at 7:37 PM, Peter Saint-Andre wrote:
 

 Please understand that even if we use MUST instead of SHOULD with
 respect to namespace-awareness, the existing servers are not going to
 be left behind. Newer servers and server versions are still going to
 continue to support their legacy counterparts. The benefit of course
 would be that eventually we will have a sterilized network, where
 clients wouldn't need to worry about rolling out their own
 (non-conforming) namespace handling. In my opinion this is a better
 long term direction.

 I too think that is a worthy goal. The question is: how can we get there
 in a reasonable fashion?
 
 Why not limit the scope of XML-NAMES ?
 
 I think xml like this should be prohibited by the xmpp spec.

<snip/>

Er yes, that *is* ugly. :)

/psa





Re: [Standards] well-formedness

2008-10-21 Thread Matthew Wild
On Tue, Oct 21, 2008 at 2:47 PM, Peter Saint-Andre [EMAIL PROTECTED] wrote:
 Curtis King wrote:

 On 20-Oct-08, at 7:37 PM, Peter Saint-Andre wrote:


 Please understand that even if we use MUST instead of SHOULD with
 respect to namespace-awareness, the existing servers are not going to
 be left behind. Newer servers and server versions are still going to
 continue to support their legacy counterparts. The benefit of course
 would be that eventually we will have a sterilized network, where
 clients wouldn't need to worry about rolling out their own
 (non-conforming) namespace handling. In my opinion this is a better
 long term direction.

 I too think that is a worthy goal. The question is: how can we get there
 in a reasonable fashion?

 Why not limit the scope of XML-NAMES ?

 I think xml like this should be prohibited by the xmpp spec.

 <snip/>

 Er yes, that *is* ugly. :)


An XMPP entity MAY choose to use prefixes as described in
[XML-NAMES], on the condition that it does not generate XML which may
be considered UGLY to a receiving XMPP entity.

Moving forward, I'd also be in favour of specifying that RFC-compliant
implementations MUST NOT send XML deemed invalid per XML-NAMES.

Matthew.


Re: [Standards] well-formedness

2008-10-21 Thread Dave Cridland

On Tue Oct 21 00:53:35 2008, Waqas wrote:

The expat parser (as an example) in namespace-aware mode reports a
fatal error on undeclared prefixes. This was added in response to this
bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127

which references this section of XML Names:
http://www.w3.org/TR/REC-xml-names/#ns-qualnames



Which doesn't say anything about mandatory fatal errors.

If you're parsing a static document, it's quite reasonable to  
generate a fatal error, but I don't think that's the right thing at  
all with an XML stream.
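
The behaviour being discussed can be seen from Python's standard expat binding. The sketch below is only an illustration; the sample stanza and the helper name are made up for this example. With namespace processing on, expat rejects the undeclared prefix; with it off, the same data parses without complaint.

```python
import xml.parsers.expat

# Hypothetical stanza: the prefix "x" is never declared.
STANZA = '<message><x:body>hi</x:body></message>'


def parses_ok(xml_text, namespace_aware):
    """Return True if expat accepts xml_text, False if it raises an error."""
    if namespace_aware:
        # Supplying a separator turns on expat's namespace processing.
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

Here `parses_ok(STANZA, False)` is true and `parses_ok(STANZA, True)` is false. Once expat has raised the error the parser cannot continue, which on a stream means the connection is effectively lost.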



Ah yes, a namespace aware parser (expat) is indeed being used with
namespace awareness disabled...


Right - and then namespaces are handled, so the overall result is  
that a namespace aware parser is used. If you're mandating that all  
XMPP implementations MUST use somebody else's parser, then I don't  
know quite what to say.




I looked at the Gajim sources, and using
'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
undeclared prefixes clearly does not conform with [XML-NAMES].
See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance



Nonsense.

A processor MUST report violations of namespace well-formedness -  
Gajim is doing so, signalling this condition using a specific  
namespace URI, so it clearly *does* conform. You may argue that I  
should have used some special non-string object instead, if you like,  
and that how Gajim handles this signal - by treating it as the  
unknown namespace it (kind of) is - is sufficiently simple and neat  
as to warrant being maligned as a hack, but it's a damn sight better  
than terminating the connection.
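
The general idea of signalling an undeclared prefix through a sentinel namespace can be sketched as follows. This is not Gajim's actual code, only a minimal Python illustration for element names; the function name, the `bindings` dictionary and the default namespace argument are assumptions for the example, and only the sentinel URI comes from this thread.

```python
# Sentinel URI quoted earlier in this thread.
UNDECLARED_NS = "http://www.gajim.org/xmlns/undeclared-root"


def resolve(qname, bindings, default_ns="jabber:client"):
    """Map a qualified element name to (namespace URI, local name).

    `bindings` is the prefix -> URI table in scope for the element.
    An unknown prefix is reported through the sentinel namespace instead
    of an exception, so the caller can treat the element like any other
    element in a namespace it does not understand.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        return default_ns, qname
    return bindings.get(prefix, UNDECLARED_NS), local
```

A caller that receives `UNDECLARED_NS` can drop or bounce the single stanza and keep the stream open, which is the trade-off being defended here.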



Gajim does not conform to XML-NAMES. I reviewed the code, and it
appears to act correctly for most XML. But it does not act correctly
for prefixes on attributes


Not that it did when expat was used to handle the namespaces, either.  
Making it handle these properly would involve quite a bit more  
rewriting. (Possible and desirable rewriting, to be sure, but nothing  
to do with the issue at hand, sorry).



. And it does not have a single one of all
those required checks for non-conforming XML (except the undeclared
prefix check on tag names). XML-NAMES requires a number of checks for
conformance, some of which are in
http://www.w3.org/TR/REC-xml-names/#Conformance while others are
sprinkled throughout the spec.


I'll accept that - I didn't make it check for multiple colons, etc,  
and I might well allow a redefinition of xml: and xmlns:, which'd be  
confusing. I ought to fix these at some point.


Incidentally, by stating "except the undeclared prefix check", aren't
you arguing that the code *is* following XML-NAMES in this regard?



Dave, I don't think you want to conform to XML-NAMES. I think you'd
prefer to sanitize the XML instead to make it conform to XML-NAMES.
One step closer to HTML ;)


The mechanism by which I happen to have chosen to report undeclared
namespaces is merely a convenient mechanism which happens to have the
result I desired with minimal programming. I happen to think the code
is less hacky than Expat's rather bizarre API, which has namespace
handling hacked on via character delimiters, especially given how
Gajim then used this API. (Either Expat looks up namespaces and then
leaves you a non-standard notation to parse, or else you parse the
standard notation and look up namespaces yourself, in a more resilient
manner - not a hard choice, really).


What I'm trying to do is look at where we are now, and describe the
best option for developers wishing to deploy now, especially bearing
in mind we need to obtain the best result, where "best" is in terms
of interoperability and potential efficiency. If you disagree with
those goals, please say so - I don't think your goals are all that
different.


You appear to be arguing that the best interoperability (presumably)  
is achieved by producing only XML-NAMES conforming XML. I can agree  
with you there.


I also think this doesn't always happen right now, and that therefore  
clients are best advised to handle Bad XMLNS in a graceful manner,  
in particular, not generated a fatal stream-level error.


Furthermore, I note that if clients do this, the requirement to  
produce only Good XMLNS can be relaxed slightly, since no serious  
damage results. That is, avoid if possible - bad things may result,  
rather than avoid at all costs - bad things will result. SHOULD  
instead of MUST in RFC 2119 terms.


Finally, I note that the costs can be, in fact, remarkably high for a  
server in the case of forwarding stanzas, since in order to merely  
forward stanzas, a simple lexing pass is sufficient, whereas to check  
- in particular - for undeclared prefixes requires a full parse and  
lookup. These are expensive operations involving allocations, string  
compares, and other primitives that have a detrimental effect on  
short-term and long-term 

Re: [Standards] well-formedness

2008-10-20 Thread Dave Cridland

On Mon Oct 20 04:40:56 2008, Waqas wrote:
On Tue, Oct 14, 2008 at 3:28 AM, Peter Saint-Andre  
[EMAIL PROTECTED] wrote:

 an entity SHOULD be liberal in accepting such data.

This translates to:

  an entity SHOULD NOT use a namespace-validating parser (as defined
in [XML-NAMES])


No, I disagree, it translates as an entity SHOULD NOT use a parser  
that produces an unrecoverable fatal error on an undeclared namespace  
prefix. I disagree that's the same thing at all.



This is indeed the case. Entities in the XMPP world tend not to use
namespace aware parsers.


Um, wait...


 In
fact most do not care about namespaces at all (aside from a few
specific cases where the XEPs use
a namespace prefix in the examples, the implementations are often
coded to look for that prefix).



... because...


Testing with ejabberd and gajim (quite a popular combination), it was
quickly clear that both did
not deal with valid messages where a prefix was used, and both did
deal with messages with a
namespace other than jabber:client.



... Gajim was using, and still does use, a namespace aware parser.


All implementations must be namespace non-aware if they don't wish to
have the disconnection bug
that gajim had. I would like to argue that it was not a bug at all.


And Gajim is most certainly namespace aware. Please, review the code  
and tell me where it doesn't conform to XML-NAMES.


There are a few cases in its higher layers where it ignores the  
namespace, and switches only based on the local-name of the element -  
this is, indeed, an error, but it's one that has nothing at all to do  
with this. The fact that these existed when the namespace handling in  
Expat was used somewhat defeats your argument.


The behavior when a server receives a badly-namespaced stanza needs to
be clarified. I have
been working with Matthew Wild on a not-yet-released server. We are
wondering whether we should
discard the stanza, the element, or raise a stream error. After all,
there really is no reason
that any (non-malicious) entity should be sending invalid namespaces.
If they do then it is a bug,
just the same as if they sent invalid XML.


Wrong. It's impossible, in the current infrastructure, to receive  
invalid XML except via an error in the entity you're actually  
connected to.


If you receive invalid XML, there is no way to handle it in any  
useful manner.


If you receive invalid XMLNS, however, it might have come from  
anywhere, and merely been forwarded on to you. And there's at least  
three ways of handling it.


a) Assume that undeclared prefixes are bound to an arbitrary  
unknown namespace. (This is what Gajim does, now). From there,  
process the stanza as much as is possible, which might include doing  
nothing at all, or rejecting it with <service-unavailable/>, just as  
with any other unrecognized namespace.


b) Detect unbound namespaces as a special case, and bounce the stanza.

c) Emit a stream error. This is what Gajim did previously, and what  
you're recommending everyone does.
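
Option (a) can be sketched roughly as follows, in Python with the standard
expat binding and namespace processing switched off. This is an illustrative
sketch, not Gajim's actual code: the function name expanded_names, the
UNKNOWN_NS placeholder URI, and the sample stanza are all assumptions, and
it makes no attempt to reject names containing more than one colon.

```python
import xml.parsers.expat

# Hypothetical placeholder that undeclared prefixes get bound to.
UNKNOWN_NS = "urn:example:unbound-prefix"


def expanded_names(data):
    """Return (namespace URI, local name) pairs for every element in `data`.

    Namespace declarations are tracked by hand; a prefix that was never
    declared is bound to UNKNOWN_NS instead of raising an error.
    """
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    scopes = [{"xml": "http://www.w3.org/XML/1998/namespace"}]
    result = []

    def start(name, attrs):
        # Each element inherits its parent's bindings, then adds its own.
        scope = dict(scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        scopes.append(scope)
        prefix, _, local = name.rpartition(":")
        if prefix:
            uri = scope.get(prefix, UNKNOWN_NS)
        else:
            uri = scope.get("", "")
        result.append((uri, local))

    def end(name):
        scopes.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return result


# Hypothetical stanza with an undeclared "bad" prefix.
sample = '<message xmlns="jabber:client"><body>hi</body><bad:ext/></message>'
for uri, local in expanded_names(sample):
    print(uri, local)
```

An element that ends up in UNKNOWN_NS is then handled like any other element
in a namespace the entity does not recognize, which is the point of (a).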



Just discarding it has a problem. Someone could send a message with
invalid namespaces to a
conference.jabber.org room. Everyone (human) would see that, except
entities which care about
namespaces. From the protocol's perspective this would be correct,
but not from a normal user's
perspective.


And this will have exactly the same effect with any of the above  
solutions, unless one is mandated. And the interesting thing is if a  
server passes through - as existing servers do, essentially doing (a)  
- and the clients all do (c).


Because then, sending invalid XMLNS via a chat room kicks out all the  
users, and this doesn't seem to be appreciated.


Sorry if this sounds like a rant. I just don't like where we are  
headed.


I don't like where we are. I don't like where some people want us to  
go, because they seem to want to send us off into a fantasy land,  
where servers are redeployed in seconds.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] well-formedness

2008-10-20 Thread Peter Saint-Andre
Waqas wrote:
 On Mon, Oct 20, 2008 at 9:01 PM, Dave Cridland [EMAIL PROTECTED] wrote:

 If you receive invalid XMLNS, however, it might have come from anywhere, and
 merely been forwarded on to you. And there's at least three ways of handling
 it.

 a) Assume that undeclared prefixes are bound to an arbitrary unknown
 namespace. (This is what Gajim does, now). From there, process the stanza as
 much as is possible, which might include doing nothing at all, or rejecting
 it with <service-unavailable/>, just as with any other unrecognized
 namespace.

 b) Detect unbound namespaces as a special case, and bounce the stanza.

 c) Emit a stream error. This is what Gajim did previously, and what you're
 recommending everyone does.

 
 I am NOT suggesting everyone does that, or should do that. I'm saying
 everyone tends to do that because server implementations (e.g.,
 ejabberd) do not conform to [XML-NAMES].
 
 [XML-NAMES] does say To conform to this specification, a processor
 MUST report violations of namespace well-formedness, and parsers I
 have tested with all seem to interpret that as a required fatal error.

For our purposes, does it need to be a fatal error instead of a warning,
or (that is) a stream error instead of a stanza error?

 Just discarding it has a problem. Someone could send a message with
 invalid namespaces to a
 conference.jabber.org room. Everyone (human) would see that, except
 entities which care about
 namespaces. From the protocol's perspective this would be correct,
 but not from a normal user's
 perspective.
 And this will have exactly the same effect with any of the above solutions,
 unless one is mandated. And the interesting thing is if a server passes
 through - as existing servers do, essentially doing (a) - and the clients
 all do (c).

 Because then, sending invalid XMLNS via a chat room kicks out all the users,
 and this doesn't seem to be appreciated.

Believe me, we've seen it in the jdev room (not everyone gets kicked, but
it's still a selective DoS).

 Yep, and while servers should indeed support this until the majority
 of servers are namespace aware, this should be considered a bug, and
 not legitimized by the spec.

I tend to agree, but I'm also sympathetic to concerns about whether this
can be implemented reasonably.

 Sorry if this sounds like a rant. I just don't like where we are headed.
 I don't like where we are. I don't like where some people want us to go,
 because they seem to want to send us off into a fantasy land, where servers
 are redeployed in seconds.

 
 I understand this. But I do think that we should be stricter in the
 long term. What you are suggesting (and did in Gajim) is basically a
 hack. Something which needed to be done to cope with xmlns-unaware
 servers. All client developers should roll their own XMLNS processing
 code? They do, but they shouldn't have had to. I just don't think we
 should legitimize a hack.
 
 What I'm saying is that yes, we do need to support the existing
 deployments, but their (IMHO) incorrect behavior should be declared as
 non-conforming.

Right, and that's what the text I proposed (mostly) does.

BTW, where we are is really a function of jabberd 1.x, which in 2004
(not sure about now) was not namespace-aware or namespace-correct in
conformance with XML-NAMES, so we just punted and said it was OK to be
out of compliance with XML-NAMES. Perhaps not such a good precedent, but
we didn't want to break backwards-compatibility at the time -- in fact
that was an explicit goal of the XMPP WG.

 Dave.
 --
 Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
 Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

 
 I understand developers today simply have to accept this non-XML-NAMES
 conforming XML. But let's not force the developers writing clients in
 2015 to face the same problems. I think most would agree that that
 would be pretty sad.

Indeed.

 Please understand that even if we use MUST instead of SHOULD with
 respect to namespace-awareness, the existing servers are not going to
 be left behind. Newer servers and server versions are still going to
 continue to support their legacy counterparts. The benefit of course
 would be that eventually we will have a sterilized network, where
 clients wouldn't need to worry about rolling out their own
 (non-conforming) namespace handling. In my opinion this is a better
 long term direction.

I too think that is a worthy goal. The question is: how can we get there
in a reasonable fashion?

Peter



Re: [Standards] well-formedness

2008-10-19 Thread Waqas
On Tue, Oct 14, 2008 at 3:28 AM, Peter Saint-Andre [EMAIL PROTECTED] wrote:

 OK here's what I have now in my working copy of rfc3920bis:

 ***

 12.3.  Well-Formedness

   There are two varieties of well-formedness:

   o  XML-well-formedness in accordance with the definition of well-
  formed in Section 2.1 of [XML].
   o  Namespace-well-formedness in accordance with the definition of
  namespace-well-formed in Section 7 of [XML-NAMES].

   The following rules apply.

   An XMPP entity MUST NOT generate data that is not XML-well-formed.
   An XMPP entity MUST NOT accept data that is not XML-well-formed;
   instead it MUST return an <xml-not-well-formed/> stream error and
   close the stream over which the data was received.

   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not
   namespace-well-formed, but MUST NOT return a stream error in response
   to the receipt of such data.

  Note: Because these restrictions were underspecified in an earlier
  revision of this specification, it is possible that
  implementations based on that revision will send data that does
  not comply with the restrictions; an entity SHOULD be liberal in
  accepting such data.

 ***



I think we should understand the consequences of using SHOULD (and not MUST).

 an entity SHOULD be liberal in accepting such data.

This translates to:

  an entity SHOULD NOT use a namespace-validating parser (as defined
in [XML-NAMES])

This is indeed the case. Entities in the XMPP world tend not to use
namespace aware parsers. In
fact most do not care about namespaces at all (aside from a few
specific cases where the XEPs use
a namespace prefix in the examples, the implementations are often
coded to look for that prefix).

Testing with ejabberd and gajim (quite a popular combination), it was
quickly clear that both did
not deal with valid messages where a prefix was used, and both did
deal with messages with a
namespace other than jabber:client.
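
As a small illustration of the first point, the two documents below are the
same message as far as [XML-NAMES] is concerned, yet code that matches on
the literal tag text only recognizes one of them. This is a minimal Python
sketch using the standard xml.etree.ElementTree module; the stanzas and the
naive_is_message helper are invented for the example and are not taken from
ejabberd or gajim.

```python
import xml.etree.ElementTree as ET

# The same stanza written with a default namespace and with a prefix.
default_form = '<message xmlns="jabber:client"><body>hi</body></message>'
prefixed_form = (
    '<c:message xmlns:c="jabber:client"><c:body>hi</c:body></c:message>'
)


def expanded_tags(data):
    """Return the namespace-expanded tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(data).iter()]


def naive_is_message(data):
    """Hypothetical prefix-blind check of the kind described above."""
    return data.startswith("<message")


# A namespace-aware parser sees identical element names in both forms.
print(expanded_tags(default_form) == expanded_tags(prefixed_form))
# The prefix-blind check accepts only the unprefixed form.
print(naive_is_message(default_form), naive_is_message(prefixed_form))
```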

All implementations must be namespace non-aware if they don't wish to
have the disconnection bug
that gajim had. I would like to argue that it was not a bug at all.

If we are going to allow non-namespace aware servers, then we should
remove the reference to
[XML-NAMES] (except maybe to specify that this is something which
SHOULD NOT be complied to).
With non xmlns-aware entities allowed, we might as well remove the
illusion of the possibility
of using namespaces too. After all, many entities don't care about it,
and would very likely show
non-compliant behavior if a different namespace is used, or if a
prefix is used. At the moment
the use of namespaces just causes developers to assume that they can
use their own elements with
names similar to those used by various RFC/XEP elements with a
different namespace. Such use
would of course result in incorrect behavior in all non-xmlns-aware
entities. If non-xmlns-aware
entities are going to be allowed, we should declare that all entities
should be so, in order to
save everyone from headaches.

The behavior when a server receives a badly-namespaced stanza needs to
be clarified. I have
been working with Matthew Wild on a not-yet-released server. We are
wondering whether we should
discard the stanza, the element, or raise a stream error. After all,
there really is no reason
that any (non-malicious) entity should be sending invalid namespaces.
If they do then it is a bug,
just the same as if they sent invalid XML.

Just discarding it has a problem. Someone could send a message with
invalid namespaces to a
conference.jabber.org room. Everyone (human) would see that, except
entities which care about
namespaces. From the protocol's perspective this would be correct,
but not from a normal user's
perspective.

Sorry if this sounds like a rant. I just don't like where we are headed.

Waqas.


Re: [Standards] well-formedness

2008-10-17 Thread Brendan Taylor
On Wed, Oct 15, 2008 at 06:11:56AM -0600, Peter Saint-Andre wrote:
 Brendan Taylor wrote:
  I just noticed this clause:
  
... but MUST NOT return a stream error in response to the receipt
of [Bad XMLNS].
  
  Why is that there? Silently dropping the stanza seems like a bad idea.
  It also requires servers to be able to handle Bad XMLNS, which is a step
  backward from where we are now.
 
 That is the nub of the issue being discussed here. :)

Even if we're not willing to require servers to be draconian about Bad
XMLNS, the ones that are draconian about it should return the same kind
of error that they would have for Bad XML.

I don't see any reason to say that they MUST NOT return a stream error.




Re: [Standards] well-formedness

2008-10-17 Thread Peter Saint-Andre
Brendan Taylor wrote:
 On Wed, Oct 15, 2008 at 06:11:56AM -0600, Peter Saint-Andre wrote:
 Brendan Taylor wrote:
 I just noticed this clause:

   ... but MUST NOT return a stream error in response to the receipt
   of [Bad XMLNS].

 Why is that there? Silently dropping the stanza seems like a bad idea.
 It also requires servers to be able to handle Bad XMLNS, which is a step
 backward from where we are now.
 That is the nub of the issue being discussed here. :)
 
 Even if we're not willing to require servers to be draconian about Bad
 XMLNS, the ones that are draconian about it should return the same kind
 of error that they would have for Bad XML.
 
 I don't see any reason to say that they MUST NOT return a stream error.

In draft-saintandre-rfc3920bis-08, Section 12.3 *currently* reads:

***

12.3.  Well-Formedness

   There are two varieties of well-formedness:

   o  XML-well-formedness in accordance with the definition of well-
  formed in Section 2.1 of [XML].
   o  Namespace-well-formedness in accordance with the definition of
  namespace-well-formed in Section 7 of [XML-NAMES].

   The following rules apply.

   An XMPP entity MUST NOT generate data that is not XML-well-formed.
   An XMPP entity MUST NOT accept data that is not XML-well-formed;
   instead it MUST return an <xml-not-well-formed/> stream error and
   close the stream over which the data was received.

   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not
   namespace-well-formed, and SHOULD return a stanza error of
   <not-acceptable/> or a stream error of <xml-not-well-formed/> in
   response to the receipt of such data.

  Note: Because these restrictions were underspecified in an earlier
  revision of this specification, it is possible that
  implementations based on that revision will send data that does
  not comply with the restrictions; an entity SHOULD be liberal in
  accepting such data.

***

Further feedback is welcome.

/psa


Re: [Standards] well-formedness

2008-10-15 Thread Peter Saint-Andre
Brendan Taylor wrote:
 On Tue, Oct 14, 2008 at 02:23:52PM -0600, Peter Saint-Andre wrote:
 So how would you tweak the text I proposed?
 
 I would make the paragraph for namespace well-formedness identical to
 the one for plain well-formedness.
 
 I just noticed this clause:
 
   ... but MUST NOT return a stream error in response to the receipt
   of [Bad XMLNS].
 
 Why is that there? Silently dropping the stanza seems like a bad idea.
 It also requires servers to be able to handle Bad XMLNS, which is a step
 backward from where we are now.

That is the nub of the issue being discussed here. :)

Peter

-- 
Peter Saint-Andre
https://stpeter.im/




Re: [Standards] well-formedness

2008-10-14 Thread Sergei Golovan
On Tue, Oct 14, 2008 at 2:28 AM, Peter Saint-Andre [EMAIL PROTECTED] wrote:
 OK here's what I have now in my working copy of rfc3920bis:
   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not

Unfortunately, this 'SHOULD' doesn't solve what I think is one of the main
current issues with XMPP: to parse XMPP you have to use a non-standard XML
parser, because a buggy client (or malicious user) will certainly generate
XML that is not namespace-well-formed, and some servers will route it to all
clients. So, to write your own client or component you also have to write
your own XML parser. And this seems ridiculous to me.

BTW, two ejabberd developers (ejabberd is a server which happily routes
stanzas with unbound prefixes, and doesn't understand any XMLNS prefix
other than 'stream') have chatted tomorrow morning and said the following:

zinid An XMPP entity MUST NOT generate data that is not namespace-well-
formed. An XMPP server SHOULD NOT route or deliver data that is not
namespace-well-formed, but MUST NOT return a stream error in response
to the receipt of such data.
zinid вот что будет в bis
aleksey т.е. можно ничо не трогать
zinid ну получается так

Which means:
zinid this text will be in bis
aleksey so, we don't have to change anything
zinid it appears so

So, I think that the proposed change just clarified a bit that it's the
client's responsibility to work with non-xmlns-well-formed XML.

Cheers!
-- 
Sergei Golovan


Re: [Standards] well-formedness

2008-10-14 Thread Jonathan Schleifer
Dave Cridland [EMAIL PROTECTED] wrote:

 No it wasn't.

A patch for ejabberd was ready in 2 days, I'm using that on my server
and never got a problem. For Gajim, it took *MONTHS* to get a fix.
Wasn't that bug even open for more than 1 year? So what's easier to fix
now? Clearly ejabberd…

-- 
Jonathan




Re: [Standards] well-formedness

2008-10-14 Thread Jonathan Schleifer
Dave Cridland [EMAIL PROTECTED] wrote:

 Erm. I wrote the fix for Gajim, I'm well aware of how long it took  
 me, from a first look to a fix. I'd put it at two days elapsed, but  
 about three hours of actual coding. I'd dispute the idea I was
 trying to get it done for more than a year - I've not even used Gajim
 that long.

Several fixes had already been posted in the ticket, all of which had
side effects. That ticket was open for more than a year; someone
eventually removed the milestone and said we won't fix it as we'd have
to change even expat for it. I don't know what your fix does, but it
seemed that expat had to be used in non-namespace-aware mode, which
would mean even some Python stuff would have needed to be
rewritten. Which way did you choose?

As you can see, it's clearly more problematic for the client than for
the server, for example ejabberd had a fix ready after the bug was open
for 2 days or so.

-- 
Jonathan




Re: [Standards] well-formedness

2008-10-14 Thread Brendan Taylor
On Tue, Oct 14, 2008 at 10:36:09AM -0600, Peter Saint-Andre wrote:
 Maybe I agree with you (simple clients, complex servers), but I'd like
 to hear what other server and client developers think.

XMPP has mostly avoided Postel's Law. Nobody has to deal with ill-formed
XML because nobody sends ill-formed XML. Nobody sends ill-formed XML
because nobody accepts it, and what use is a client or server that
nobody can receive messages from?

I think this is ideal; bad producers never have a chance to enter the
ecosystem.

This unfortunate oversight in the original RFC has spoiled that, but we
can still fix it. Even if it takes years for most server deployments to
be updated, I'm expecting XMPP to be around for decades (centuries?).

The alternative (limiting what parsers non-toy clients can use, maybe even
requiring them to write their own or use liberal XML parsers) is ugly.
I don't want XMPP to end up like HTML. (though I doubt it would ever get
*that* bad :))

Also keep in mind that in this context servers only means the actual
stanza router; having to handle namespace ill-formed XML beyond rejecting
it complicates things for component developers too.




Re: [Standards] well-formedness

2008-10-14 Thread Peter Saint-Andre
Brendan Taylor wrote:
 On Tue, Oct 14, 2008 at 10:36:09AM -0600, Peter Saint-Andre wrote:
 Maybe I agree with you (simple clients, complex servers), but I'd like
 to hear what other server and client developers think.
 
 XMPP has mostly avoided Postel's Law. Nobody has to deal with ill-formed
 XML because nobody sends ill-formed XML. Nobody sends ill-formed XML
 because nobody accepts it, and what use is a client or server that
 nobody can receive messages from?
 
 I think this is ideal; bad producers never have a chance to enter the
 ecosystem.

I never looked at it that way, but I see what you mean.

 This unfortunate oversight in the original RFC has spoiled that, but we
 can still fix it. Even if it takes years for most server deployments to
 be updated, I'm expecting XMPP to be around for decades (centuries?).

Decades at least.

 The alternative (limiting what parsers non-toy clients can use, maybe even
 requiring them to write their own or use liberal XML parsers) is ugly.
 I don't want XMPP to end up like HTML. (though I doubt it would ever get
 *that* bad :))

Please not. :)

 Also keep in mind that in this context servers only means the actual
 stanza router; having to handle namespace ill-formed XML beyond rejecting
 it complicates things for component developers too.

This is true.

So how would you tweak the text I proposed?

Peter

-- 
Peter Saint-Andre
https://stpeter.im/



Re: [Standards] well-formedness

2008-10-14 Thread Robert Quattlebaum
I think you should also describe what a XMPP client should do upon  
receiving good XML 1.0 which is also bad XML 1.1.


My preference is that the client SHOULD NOT or MUST NOT interpret  
it as a stream error.


On Oct 13, 2008, at 3:28 PM, Peter Saint-Andre wrote:


OK here's what I have now in my working copy of rfc3920bis:

***

12.3.  Well-Formedness

  There are two varieties of well-formedness:

  o  XML-well-formedness in accordance with the definition of well-
 formed in Section 2.1 of [XML].
  o  Namespace-well-formedness in accordance with the definition of
 namespace-well-formed in Section 7 of [XML-NAMES].

  The following rules apply.

  An XMPP entity MUST NOT generate data that is not XML-well-formed.
  An XMPP entity MUST NOT accept data that is not XML-well-formed;
  instead it MUST return an <xml-not-well-formed/> stream error and
  close the stream over which the data was received.

  An XMPP entity MUST NOT generate data that is not namespace-well-
  formed.  An XMPP server SHOULD NOT route or deliver data that is not
  namespace-well-formed, but MUST NOT return a stream error in response
  to the receipt of such data.

 Note: Because these restrictions were underspecified in an earlier
 revision of this specification, it is possible that
 implementations based on that revision will send data that does
 not comply with the restrictions; an entity SHOULD be liberal in
 accepting such data.

***





__
Robert Quattlebaum
Jabber: [EMAIL PROTECTED]
eMail:  [EMAIL PROTECTED]
www:http://www.deepdarc.com/





Re: [Standards] well-formedness

2008-10-14 Thread Peter Saint-Andre
Sergei Golovan wrote:
 On Tue, Oct 14, 2008 at 2:28 AM, Peter Saint-Andre [EMAIL PROTECTED] wrote:
 OK here's what I have now in my working copy of rfc3920bis:
   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not
 
 Unfortunately, this 'SHOULD' doesn't solve what I think is one of the main
 current issues with XMPP: to parse XMPP you have to use a non-standard XML
 parser, because a buggy client (or malicious user) will certainly generate
 XML that is not namespace-well-formed, and some servers will route it to all
 clients. So, to write your own client or component you also have to write
 your own XML parser. And this seems ridiculous to me.
 
 BTW, two ejabberd developers (ejabberd is a server which happily routes
 stanzas with unbound prefixes, and doesn't understand any XMLNS prefix
 other than 'stream') have chatted tomorrow morning and said the following:
 
 zinid An XMPP entity MUST NOT generate data that is not namespace-well-
 formed. An XMPP server SHOULD NOT route or deliver data that is not
 namespace-well-formed, but MUST NOT return a stream error in response
 to the receipt of such data.
 zinid вот что будет в bis
 aleksey т.е. можно ничо не трогать
 zinid ну получается так
 
 Which means:
 zinid this text will be in bis
 aleksey so, we don't have to change anything
 zinid it appears so
 
 So, I think that the proposed change just clarified a bit that it's the
 client's responsibility to work with non-xmlns-well-formed XML.

And I take it you think that's bad, right?

Maybe I agree with you (simple clients, complex servers), but I'd like
to hear what other server and client developers think.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/



Re: [Standards] well-formedness

2008-10-14 Thread Dave Cridland

On Tue Oct 14 20:45:43 2008, Brendan Taylor wrote:
XMPP has mostly avoided Postel's Law. Nobody has to deal with ill-formed
XML because nobody sends ill-formed XML. Nobody sends ill-formed XML
because nobody accepts it, and what use is a client or server that
nobody can receive messages from?


Ah, be careful, you're convolving two distinct cases - ill-formed  
XML to me means XML which is not well-formed, and that's a stream  
error, pure and simple. I'll call this Bad XML.


Where Postel's Law applies is that XMPP speakers need to cope with  
XML which *is* well-formed, but might not be namespace well-formed.  
This, I'll call Bad XMLNS.


I think this is ideal; bad producers never have a chance to enter the
ecosystem.


But they have and do. It's not very often, I agree, but by stating  
that this situation exists, and you mustn't - or shouldn't - choke on  
it, I think we're documenting the facts as they are, not as we might  
wish them to be.


Moreover, we're then putting ourselves in the position of describing  
the situation in terms of interoperability requirements, which is  
what those MUSTs and SHOULDs describe. They're not statements of  
opinion. They're saying that:


1) Sending Bad XML will break things. So will trying to process it -  
the only safe thing to do is give up.


2) Sending Bad XMLNS will probably break some of the things, some of  
the time. Trying to process it, though, is fine, and refusing to  
process the entire stream will cause you pain.
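
The distinction between the two cases can be sketched in Python with the
standard expat binding. The classify function and the sample stanzas are
hypothetical, and expat's namespace mode is used here only because it
rejects undeclared prefixes, which is the case under discussion; this sketch
does not claim to check every constraint in XML-NAMES.

```python
import xml.parsers.expat


def classify(data):
    """Label `data` as "Bad XML", "Bad XMLNS", or "good"."""
    # Pass 1: plain well-formedness, with namespace processing off.
    plain = xml.parsers.expat.ParserCreate()
    try:
        plain.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return "Bad XML"
    # Pass 2: the same bytes with namespace processing on; an undeclared
    # prefix makes expat fail here even though pass 1 succeeded.
    aware = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    try:
        aware.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return "Bad XMLNS"
    return "good"


# Hypothetical inputs: a mismatched tag, an undeclared prefix, a clean stanza.
print(classify("<message><body>hi</message>"))
print(classify('<message xmlns="jabber:client"><x:ext/></message>'))
print(classify('<message xmlns="jabber:client"><body>hi</body></message>'))
```

Only the first label corresponds to case 1 above; the second is the case
where continuing to process the stream is still possible.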


I'm assuming that, thus far, you agree.

The alternative (limiting what parsers non-toy clients can use, maybe even
requiring them to write their own or use liberal XML parsers) is ugly.


Please, Brendan, I'm getting increasingly fed up with this persistent  
rumour that solving this issue in clients requires some kind of  
special parser, or difficult programming, or voodoo in the dead of  
night.


Gajim, prior to the fix, used expat. It then tracked namespace  
declarations, and built a DOM using a bespoke XML library.  
(Substitute Gajim by xmppy, if you prefer).


After the fix, it uses expat. It then tracks namespace declarations,  
and builds a DOM using a bespoke XML library.


In broad terms, it's as simple as that - there is really no  
difference.


Actually, by far the most complex portion of the patch is the hairy  
code trying to maintain the unique way of *generating* XML that Gajim  
uses, via the aforementioned bespoke DOM class.


And I'll certainly tell you that not only is the fix easier to do in  
Gajim than it would be in our server, it also results in a faster  
XMPP network. That margin might not be big in low traffic servers,  
but in higher traffic cases it'll really add up. This performance  
impact is, for us (and our customers) worth the additional  
consideration in choice of client caused by the potentially reduced  
interoperability.


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] well-formedness

2008-10-14 Thread Peter Saint-Andre
Robert Quattlebaum wrote:
 I think you should also describe what a XMPP client should do upon
 receiving good XML 1.0 which is also bad XML 1.1.
 
 My preference is that the client SHOULD NOT or MUST NOT interpret it
 as a stream error. 

XMPP 1.0 supports XML 1.0 only. A future version of XMPP might support
other versions of XML, but that is out of scope for RFC 3920 and
probably rfc3920bis. However, we need to talk about XML versioning so
that this is clear.

I think that your suggestion makes sense, but I need to look at XML 1.1
again to be sure.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/




Re: [Standards] well-formedness

2008-10-14 Thread Brendan Taylor
On Tue, Oct 14, 2008 at 11:21:43PM +0100, Dave Cridland wrote:
 Where Postel's Law applies is that XMPP speakers need to cope with XML 
 which *is* well-formed, but might not be namespace well-formed. This, I'll 
 call Bad XMLNS.

Nice coinage :).

I don't think Postel's Law should have to apply in either case (except
in the interim period where we're forced to deal with Bad XMLNS).

 I think this is ideal; bad producers never have a chance to enter the
 ecosystem.


 But they have and do. It's not very often, I agree, but by stating that 
 this situation exists, and you mustn't - or shouldn't - choke on it, I 
 think we're documenting the facts as they are, not as we might wish them to 
 be.

They do? Are there any widely deployed clients that generate Bad XML,
or servers that pass it on?

(Note that I'm just talking about plain Bad XML here, the point was
to demonstrate that refusing to handle Bad XML has had certain
benefits and that refusing to handle Bad XMLNS can have similar
ones.)

 Please, Brendan, I'm getting increasingly fed up with this persistent 
 rumour that solving this issue in clients requires some kind of special 
 parser, or difficult programming, or voodoo in the dead of night.

I certainly didn't mean to imply anything like that.

Having to handle Bad XMLNS limits the parsers a client can use. Not all
parsers will accept Bad XMLNS. Many people are already familiar with
XML parsers that don't. Some platforms may not have XML parsers that do.

It seems to me that the simplest solution is to follow the XML
specifications that we claim to be using.




Re: [Standards] well-formedness

2008-10-14 Thread Robert Quattlebaum
I was actually confused into thinking that XML 1.1 was just  
XML1.0+Namespaces... Which happens to not be the case.


So replace XML 1.1 with XML 1.0+Namespaces, and my original  
comment will make sense. :)


On Oct 14, 2008, at 3:49 PM, Peter Saint-Andre wrote:


Robert Quattlebaum wrote:

I think you should also describe what a XMPP client should do upon
receiving good XML 1.0 which is also bad XML 1.1.

My preference is that the client SHOULD NOT or MUST NOT interpret it
as a stream error.


XMPP 1.0 supports XML 1.0 only. A future version of XMPP might support
other versions of XML, but that is out of scope for RFC 3920 and
probably rfc3920bis. However, we need to talk about XML versioning so
that this is clear.

I think that your suggestion makes sense, but I need to look at XML 1.1
again to be sure.

Peter

--
Peter Saint-Andre
https://stpeter.im/





__
Robert Quattlebaum
Jabber: [EMAIL PROTECTED]
eMail:  [EMAIL PROTECTED]
www:http://www.deepdarc.com/





Re: [Standards] well-formedness

2008-10-14 Thread Brendan Taylor
On Tue, Oct 14, 2008 at 02:23:52PM -0600, Peter Saint-Andre wrote:
 So how would you tweak the text I proposed?

I would make the paragraph for namespace well-formedness identical to
the one for plain well-formedness.

I just noticed this clause:

  ... but MUST NOT return a stream error in response to the receipt
  of [Bad XMLNS].

Why is that there? Silently dropping the stanza seems like a bad idea.
It also requires servers to be able to handle Bad XMLNS, which is a step
backward from where we are now.

