Re: PaceMustBeWellFormed status

2005-01-26 Thread Robert Sayre
Sam Ruby wrote:
> The feedvalidator currently declares content served as text/plain as
> invalid.  I would very much like to keep this check, for a number of
> reasons.
>
> What I am saying is
> that the Atom spec should allow consumers enough leeway to process such
> resources as non-feed (specifically, I would hope that they would
> process such resources as plain text).

Section 2 says "Both kinds of Atom documents are specified in terms of 
the XML  Information Set, serialised as XML 1.0 [W3C.REC-xml-20040204] 
and identified with the  "application/atom+xml" media type. Atom 
Documents MUST be well-formed XML."

So, Atom documents are well-formed XML identified with the Atom media 
type. The specification doesn't talk about other media types or 
ill-formed XML documents. Is there something more we can add to the 
specification? I don't think PaceMustBeWellFormed is it.

Robert Sayre


Re: PaceMustBeWellFormed status

2005-01-26 Thread Sam Ruby
Martin Duerst wrote:
>>> 5.
>>> Publishers MUST NOT serve Atom feeds with a media type other than
>>> "application/atom+xml" (registered in Section 8 of this document) or one
>>> of the XML media types defined in RFC 3023 or its successor. In
>>> particular, "text/plain" is never an appropriate media type for an Atom
>>> feed. When retrieving an Atom feed served with a non-XML media type,
>>> clients MUST reject it as non-well-formed.
>>
>> We have no business stating this. I will serve Atom feeds as
>> text/plain if I want them processed as text documents.
>
> At which time I will claim that it's no longer an Atom feed, it just
> looks like one :-). The Atom spec should talk about Atom as Atom;
> that somebody might want to look at the Atom document as a text document,
> or even as a hex dump, isn't something we should be talking about.

I agree with Martin.

The feedvalidator currently declares content served as text/plain as
invalid.  I would very much like to keep this check, for a number of
reasons.  For now, I will simply state one:
text/plain can be used as part of a security breach.  The feedvalidator
can't stop such security breaches, but can discourage widescale
deployment of feeds served in this manner.  If we ever got to the point
where there were widescale deployment of feeds served with text/plain,
then consumers would essentially be forced to be liberal.

Note what I am not saying.  I am not saying that applications other than
the feedvalidator need to reject such feeds.  Nor am I saying that you
can't publish the same sequence of bytes that one would tend to find in
an Atom feed with a content type of text/plain.  What I am saying is
that the Atom spec should allow consumers enough leeway to process such
resources as non-feed (specifically, I would hope that they would
process such resources as plain text).

- Sam Ruby


Re: PaceMustBeWellFormed status

2005-01-25 Thread Bjoern Hoehrmann

* Tim Bray wrote:
>If there were no further discussion:  The WG completely failed to 
>converge to consensus on these issues last time around. Consensus can 
>still be found here. -Tim

I would suggest we contact implementers and ask them whether they will
conform to these requirements or not. If we are not convinced that they
will implement the spec as written, I consider the requirements
pointless and misleading.

How do we expect to manage updates to RFC3023 with/without this Pace?
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: PaceMustBeWellFormed status

2005-01-25 Thread Bjoern Hoehrmann

* Tim Bray wrote:
>On Jan 24, 2005, at 5:12 PM, Joe Gregorio wrote:
>> It's good work but it belongs in a primer or best practices document.
>
>+1.  I like it, I'd like to use it somewhere, but I don't think it 
>belongs in the core spec. -Tim

I am afraid that if requirements à la "clients MUST stop processing at the
first well-formedness error" are not in the "core spec", implementers
might miss them. In fact, I would argue that the requirement does not
apply if it is not part of the conformance requirements for Atom format
implementations.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: PaceMustBeWellFormed status

2005-01-25 Thread Bjoern Hoehrmann

* Robert Sayre wrote:
>I'm very -1 on this, since it makes the definition of the Atom format an 
>HTTP message, rather than an XML document.

It seems common practice to include requirements for implementations of
a format in the specification of the format. Why should these be kept
separate? In fact, I see a number of statements à la "Processors MUST" in
the current draft. Do you mean these should be removed?
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: PaceMustBeWellFormed status

2005-01-25 Thread Eric Scheid

On 26/1/05 8:58 AM, "Walter Underwood" <[EMAIL PROTECTED]> wrote:

>  An external transport protocol (e.g. HTTP with text/xml content-type)
>  may force the document to be decoded as US-ASCII. In that case ...

+1

e.



Re: PaceMustBeWellFormed status

2005-01-25 Thread Robert Sayre
Martin Duerst wrote:
>>> Atom feeds served over HTTP MUST be well-formed XML 1.0, as defined
>>> in Section 2.1 of the XML specification. Furthermore, the
>>> concept of XML well-formedness relies on first determining the
>>> character encoding of the XML document. RFC 3023 defines how to
>>> determine the character encoding of XML documents served over HTTP.
>>
>> The first sentence is redundant because all Atom feeds must be
>> well-formed. The second sentence is plainly false. The two concepts
>> are unrelated.
>
> Could you explain/substantiate your claim that the second sentence
> is plainly false? I understand it to be true, and I have implementation
> experience with the W3C Markup Validator to back it up.

Illegal characters are fatal errors, which are quite likely when 
processing a document with the wrong encoding. Fatal errors aren't 
necessarily the result of ill-formed documents. In this case, the term 
"well-formed" seems to come down to the "Char" production, which is 
arrived at after decoding. It's possible to process an XML document with 
the wrong character encoding and still have a well-formed XML document.
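
The distinction can be tried out with a small Python sketch. It is only an illustration, assuming the standard library's expat-based parser, whose `encoding` argument stands in here for a charset imposed by the transport. The helper name `parse_with_charset` and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

def parse_with_charset(data: bytes, charset: str) -> ET.Element:
    """Parse data, forcing charset over anything the document declares."""
    parser = ET.XMLParser(encoding=charset)
    parser.feed(data)
    return parser.close()

# Hypothetical one-element document, serialised as UTF-8.
doc = "<title>caf\u00e9</title>".encode("utf-8")

right = parse_with_charset(doc, "utf-8")
wrong = parse_with_charset(doc, "iso-8859-1")

print(repr(right.text))  # decoded as the author intended
print(repr(wrong.text))  # no parse error, but the text is garbled

try:
    parse_with_charset(doc, "us-ascii")
    ascii_ok = True
except ET.ParseError:
    # Bytes above 0x7F cannot be decoded as US-ASCII, so the parser
    # reports a fatal error.
    ascii_ok = False
print(ascii_ok)
```

Decoding with the wrong single-byte charset goes through without complaint, while a charset that cannot represent the bytes at all produces a fatal error. Which of those outcomes counts as "well-formedness" is the point under dispute.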

In any case, this Pace is totally wrapped up in HTTP, pedantic, and 
incorrect in numerous ways.

Robert Sayre


Re: PaceMustBeWellFormed status

2005-01-25 Thread Martin Duerst
At 07:35 05/01/26, Robert Sayre wrote:
>
> Walter Underwood wrote:
>> 6. Client processing requirements
>>
>> Atom feeds served over HTTP MUST be well-formed XML 1.0, as defined in
>> Section 2.1 of the XML specification. Furthermore, the concept
>> of XML well-formedness relies on first determining the character encoding
>> of the XML document. RFC 3023 defines how to determine the character
>> encoding of XML documents served over HTTP.
>
> The first sentence is redundant because all Atom feeds must be
> well-formed. The second sentence is plainly false. The two concepts are
> unrelated.

Could you explain/substantiate your claim that the second sentence
is plainly false? I understand it to be true, and I have implementation
experience with the W3C Markup Validator to back it up.

>> 5.
>> Publishers MUST NOT serve Atom feeds with a media type other than
>> "application/atom+xml" (registered in Section 8 of this document) or one of
>> the XML media types defined in RFC 3023 or its successor. In particular,
>> "text/plain" is never an appropriate media type for an Atom feed. When
>> retrieving an Atom feed served with a non-XML media type, clients MUST
>> reject it as non-well-formed.
>
> We have no business stating this. I will serve Atom feeds as text/plain
> if I want them processed as text documents.

At which time I will claim that it's no longer an Atom feed, it just
looks like one :-). The Atom spec should talk about Atom as Atom;
that somebody might want to look at the Atom document as a text document,
or even as a hex dump, isn't something we should be talking about.

Regards,    Martin.



Re: PaceMustBeWellFormed status

2005-01-25 Thread Robert Sayre
Walter Underwood wrote:
> --On Tuesday, January 25, 2005 04:21:29 PM -0500 Robert Sayre
> <[EMAIL PROTECTED]> wrote:
>
>> It's required for interop over HTTP. That's off-topic in the format
>> draft, which mentions HTTP once, in passing.
>
> So you suggest that we have additional format requirements in the
> protocol spec?

I suggest saying nothing about it. I will explain why below.

> The headers-ueber-alles rule in HTTP means that a legal Atom feed
> can become illegal when served as text/xml. That is going to
> surprise people and cause breakage.
>
> How about something along the wording of the XML spec's section 4.3.3:
>
>   In the absence of information provided by an external transport
>   protocol (e.g. HTTP or MIME), it is an error for ...

Point 1: It's already in the XML spec. This means we are targeting
implementors who will understand section six, but who haven't read the
XML spec. Not a set of people worth adding a whole section for. This is
really just a tour of Apache's mime.types file that inserts many
damaging requirements along the way. I will detail them:

> 6. Client processing requirements
>
> Atom feeds served over HTTP MUST be well-formed XML 1.0, as defined in
> Section 2.1 of the XML specification. Furthermore, the
> concept of XML well-formedness relies on first determining the
> character encoding of the XML document. RFC 3023 defines how to
> determine the character encoding of XML documents served over HTTP.

The first sentence is redundant because all Atom feeds must be
well-formed. The second sentence is plainly false. The two concepts are
unrelated.

> 6.1 Determining the character encoding of an Atom feed
>
> The rules for determining the character encoding of an Atom feed are
> the same as determining the character encoding of any XML document
> served over HTTP. The rules are wholly defined by RFC 3023, but they
> are summarized here because there has been widespread confusion over
> how RFC 3023 should be interpreted:

The text then goes on to state many requirements that are not in RFC 3023.

> 1. When serving an Atom feed, it is RECOMMENDED that publishers
> include the charset parameter along with the media type in the
> Content-type HTTP header. If the charset parameter is present, clients
> MUST parse the Atom feed in that charset, ignoring any charset
> declared in the encoding attribute of the XML declaration.
>
> 2. Publishers SHOULD serve all Atom feeds with the media type
> "application/atom+xml" (registered in Section 8 of this document).
> Clients MUST treat "application/atom+xml" as "application/xml" and
> determine the character encoding as per RFC 3023 or its successor.

Publishers should serve their documents with the MIME type they want
clients to use.

> 3. If a publisher wishes to serve an Atom feed over HTTP, but for some
> reason they are unable to use the "application/atom+xml" media type,
> the publisher SHOULD use "application/xml", and clients MUST determine
> the character encoding as per RFC 3023 or its successor.

Publishers should serve their documents with the MIME type they want
clients to use.

> 4. If a publisher is unable to serve their Atom feed with a Content-Type
> of "application/atom+xml" or "application/xml", they MAY use
> "text/xml". According to RFC 3023, XML documents served as "text/xml"
> with no charset parameter have a character encoding of "us-ascii".

Of course they can serve it as text/xml. They should do that if they
want people to view source. It's not appropriate to send the content to
an Atom processor.

> 5. Publishers MUST NOT serve Atom feeds with a media type other than
> "application/atom+xml" (registered in Section 8 of this document) or
> one of the XML media types defined in RFC 3023 or its successor. In
> particular, "text/plain" is never an appropriate media type for an
> Atom feed. When retrieving an Atom feed served with a non-XML media
> type, clients MUST reject it as non-well-formed.

We have no business stating this. I will serve Atom feeds as text/plain
if I want them processed as text documents. Clients shouldn't send them
to the XML processor at all. Well-formedness errors come from XML
processors, not passive-aggressive applications.

> 6.2 Handling well-formedness errors
>
> After determining the character encoding by the rules in section 6.1
> of this document, clients MUST use a conforming XML parser to parse an
> Atom feed. In particular, clients MUST stop processing at the first
> well-formedness error, although they MAY display any information they
> have parsed before the first well-formedness error.

The second sentence is incorrect, since it's acceptable for processors
to continue reporting errors.

> Here is a non-comprehensive list of things clients have been known to
> do after encountering a well-formedness error, which this document
> specifically prohibits:
>
> • Clients MUST NOT reparse the feed in any other character encoding.
> • Clients MUST NOT "tidy" the feed to attempt to fix mismatched start
>   and end tags.
> • Clien

Re: PaceMustBeWellFormed status

2005-01-25 Thread Walter Underwood
--On Tuesday, January 25, 2005 04:21:29 PM -0500 Robert Sayre <[EMAIL PROTECTED]> wrote:

> It's required for interop over HTTP. That's off-topic in the format draft,
> which mentions HTTP once, in passing.

So you suggest that we have additional format requirements in the
protocol spec?

The headers-ueber-alles rule in HTTP means that a legal Atom feed
can become illegal when served as text/xml. That is going to
surprise people and cause breakage.

How about something along the wording of the XML spec's section 4.3.3:

  In the absence of information provided by an external transport
  protocol (e.g. HTTP or MIME), it is an error for ...

Perhaps:

  An external transport protocol (e.g. HTTP with text/xml content-type)
  may force the document to be decoded as US-ASCII. In that case ...

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceMustBeWellFormed status

2005-01-25 Thread Robert Sayre
Walter Underwood wrote:
> --On Tuesday, January 25, 2005 03:39:13 PM -0500 Robert Sayre
> <[EMAIL PROTECTED]> wrote:
>
>> I'm very -1 on this, since it makes the definition of the Atom format
>> an HTTP message, rather than an XML document.
>> On top of that, most of the Pace is babysitting. To the Guide with it.
>
> Except that there is no implementors' guide, and this is required for
> interoperability. From a formal perspective, maybe it doesn't belong.

It's required for interop over HTTP. That's off-topic in the format
draft, which mentions HTTP once, in passing.

Robert Sayre


Re: PaceMustBeWellFormed status

2005-01-25 Thread Walter Underwood
--On Tuesday, January 25, 2005 03:39:13 PM -0500 Robert Sayre <[EMAIL PROTECTED]> wrote:

> I'm very -1 on this, since it makes the definition of the Atom format
> an HTTP message, rather than an XML document.
> On top of that, most of the Pace is babysitting. To the Guide with it.

Except that there is no implementors' guide, and this is required for
interoperability. From a formal perspective, maybe it doesn't belong.
From a practical perspective, it does.

It was clear from the WG discussion that this is not a well-understood
area, even for experts in blog protocols. We can't assume that implementors
already know RFC 3023 or even know it exists. If we assume that, we
increase interop problems for Atom.

It would be nice if everyone knew about RFC 3023 and did it right, but
they don't.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceMustBeWellFormed status

2005-01-25 Thread Robert Sayre
Walter Underwood wrote:
> --On Monday, January 24, 2005 04:17:40 PM -0800 Tim Bray
> <[EMAIL PROTECTED]> wrote:
>
>> If there were no further discussion:  The WG completely failed to
>> converge to consensus on these issues last time around. Consensus can
>> still be found here. -Tim
>
> I'm +1 on this, and feel that it belongs in the spec. This is a
> constraint on the format of the feed document, and is testable.
>
> I would add a note that 3023 is normative, and maybe move the
> notes in 6.1 to an appendix.

I'm very -1 on this, since it makes the definition of the Atom format an
HTTP message, rather than an XML document.
On top of that, most of the Pace is babysitting. To the Guide with it.

Robert Sayre


Re: PaceMustBeWellFormed status

2005-01-25 Thread Julian Reschke
Walter Underwood wrote:
> --On Monday, January 24, 2005 04:17:40 PM -0800 Tim Bray
> <[EMAIL PROTECTED]> wrote:
>
>> If there were no further discussion:  The WG completely failed to
>> converge to consensus on these issues last time around. Consensus can
>> still be found here. -Tim
>
> I'm +1 on this, and feel that it belongs in the spec. This is a
> constraint on the format of the feed document, and is testable.
> ...

+1 (if we can get consensus this time...)

Julian
--
greenbytes GmbH -- http://www.greenbytes.de -- tel:+492512807760


Re: PaceMustBeWellFormed status

2005-01-25 Thread Walter Underwood
--On Monday, January 24, 2005 04:17:40 PM -0800 Tim Bray <[EMAIL PROTECTED]> wrote:

> If there were no further discussion:  The WG completely failed to converge
> to consensus on these issues last time around. Consensus can still be
> found here. -Tim

I'm +1 on this, and feel that it belongs in the spec. This is a
constraint on the format of the feed document, and is testable.
Forbidding re-parsing (6.2) is OK, and not a restatement of the XML spec.
If you use a parser which isn't an XML parser, it might process
the doc. This says you can't do that.
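
To make the 6.2 behaviour concrete, here is a rough Python sketch of a client that stops at the first well-formedness error but keeps whatever it had already parsed. It is an illustration only: the function name `titles_before_first_error` and the sample feed are invented, and the elements are left un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

def titles_before_first_error(data: bytes):
    """Return (titles seen before the first fatal error, the error or None)."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(data)
    titles, error = [], None
    try:
        for _event, elem in parser.read_events():
            if elem.tag == "title":
                titles.append(elem.text)
    except ET.ParseError as exc:
        # Stop here: no re-parsing in another encoding, no "tidying" of tags.
        error = exc
    return titles, error

# Hypothetical feed with a mismatched end tag in the second entry.
broken = (b"<feed><entry><title>One</title></entry>"
          b"<entry><title>Two</title></wrong></feed>")
titles, error = titles_before_first_error(broken)
print(titles, error is not None)
```

A non-XML "tag soup" parser would happily carry on past the bad end tag, which is the behaviour 6.2 rules out.
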
I think that the rationale misstates the Pace. It says that Atom feeds
must always be ASCII, but the proposal only requires that for text/xml
feeds. application/xml feeds may use UTF-8, either in an encoding
declaration or with a charset parameter.
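
As a sketch of how I read the charset rules in 6.1 (not a full RFC 3023 implementation), the decision can be written as a few lines of Python. The function name `transport_charset` is invented here, only the three media types named in the Pace are handled, and raising an error for anything else is just one possible choice.

```python
from email.message import Message
from typing import Optional

def transport_charset(content_type: str) -> Optional[str]:
    """Charset imposed by the Content-Type header, or None when the XML
    processor should decide from the document itself (BOM, encoding
    declaration, UTF-8 default)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()
    charset = msg.get_param("charset")
    if charset is not None:
        return charset.lower()   # an explicit charset parameter wins
    if media_type == "text/xml":
        return "us-ascii"        # text/xml without charset is us-ascii
    if media_type in ("application/xml", "application/atom+xml"):
        return None              # fall back to the document itself
    # Assumption: anything else is not handled as a feed in this sketch.
    raise ValueError(f"not an XML media type: {media_type}")

print(transport_charset("text/xml"))
print(transport_charset("application/atom+xml; charset=ISO-8859-1"))
print(transport_charset("application/xml"))
```

So only the bare text/xml case forces ASCII; the other cases leave UTF-8 available.
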
I would add a note that 3023 is normative, and maybe move the
notes in 6.1 to an appendix.

Are we sure we want "RFC 3023 or its successor" instead of "RFC 3023"?
A successor could make some Atom feeds illegal without a change to
the Atom spec.

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek


Re: PaceMustBeWellFormed status

2005-01-25 Thread Asbjørn Ulsberg
On Mon, 24 Jan 2005 16:17:40 -0800, Tim Bray <[EMAIL PROTECTED]> wrote:
> If there were no further discussion:  The WG completely failed to
> converge to consensus on these issues last time around. Consensus can
> still be found here. -Tim

I think we should do something with it, even if we don't incorporate it
into the specification. I like it very much.

--
Asbjørn Ulsberg -=|=- http://virtuelvis.com/quark/
«He's a loathsome offensive brute, yet I can't look away»


Re: PaceMustBeWellFormed status

2005-01-24 Thread Graham
On 25 Jan 2005, at 12:17 am, Tim Bray wrote:
> If there were no further discussion:  The WG completely failed to
> converge to consensus on these issues last time around. Consensus can
> still be found here. -Tim

-1

Phrases like "must be parsed with" seem to be dictating implementation
rather than interop. Also, it largely duplicates stuff found in RFCs.

Graham



Re: PaceMustBeWellFormed status

2005-01-24 Thread Tim Bray

On Jan 24, 2005, at 5:12 PM, Joe Gregorio wrote:
> It's good work but it belongs in a primer or best practices document.

+1.  I like it, I'd like to use it somewhere, but I don't think it
belongs in the core spec. -Tim



Re: PaceMustBeWellFormed status

2005-01-24 Thread Joe Gregorio

It's good work but it belongs in a primer or best practices document.

   -joe


On Mon, 24 Jan 2005 16:17:40 -0800, Tim Bray <[EMAIL PROTECTED]> wrote:
> 
> If there were no further discussion:  The WG completely failed to
> converge to consensus on these issues last time around. Consensus can
> still be found here. -Tim
> 
> 


-- 
Joe Gregorio        http://bitworking.org