Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On 8 May 2005, at 4:30 am, Walter Underwood wrote: White space is not particularly meaningful in some of these languages, so we cannot expect them to suddenly pay attention to that just so they can use Atom. There will be plenty of content from other formats with this linguistically meaningless white space. So the idea is that whitespace should not appear at all in certain texts, and you'd like it to be stripped out at the consumer? There are only three possible ways for this to happen: 1) the consumer removes all whitespace, even in Western texts; 2) the consumer recognizes these languages and removes the whitespace automatically; 3) the consumer is told what to do by an attribute. (1) is obviously not plausible, but is included for completeness. (2) is impractical. (3) is plausible, but may or may not end up being implemented in all consumers, making it kind of useless. I don't see a better solution than for texts that shouldn't be displayed with whitespace to contain no whitespace in the first place, which is what we have. Graham
RE: Last Call: 'The Atom Syndication Format' to Proposed Standard
--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck [EMAIL PROTECTED] wrote: I have to agree with Paul. I don't believe that the issue of white space in the syndicated content is really an Atompub issue. It might be an issue for the content creator. It might be an issue for the reader. As long as the pipe between the two passes the content as submitted, though, the pipe has done its job. If publishers and subscribers have obstacles to using Atom, that sounds like a problem to me. "Everyone has this problem" is not a good reason to ignore it. Someone has to be the first to solve it, might as well be us. It is not acceptable to build formats for the English Wide Web. That doesn't exist any more. wunder -- Walter Underwood Principal Architect, Verity
RE: Last Call: 'The Atom Syndication Format' to Proposed Standard
At 8:16 AM -0700 5/10/05, Walter Underwood wrote: If publishers and subscribers have obstacles to using Atom, that sounds like a problem to me. It is a problem, of course. "Everyone has this problem" is not a good reason to ignore it. No one is ignoring it. This thread started because the format draft pointed out at least one aspect of the problem, which is more than most other RFCs do. Someone has to be the first to solve it, might as well be us. May I suggest that there are groups with more experience in the area than ours that would be more appropriate? Specifically, since this problem affects all internationalized text, the Unicode Consortium has a much higher chance of solving the problem than an IETF Working Group that is focused on a syndication format. If you have a proposed solution to the problem (you didn't include one in your message to the WG), the Unicode Consortium is quite open to outside input on this type of thing. It is not acceptable to build formats for the English Wide Web. That doesn't exist any more. That is both grossly insulting to those of us who have spent a great deal of time trying to make the Internet internationalization-friendly, and is also grossly technically inaccurate, unless you consider every written language other than Chinese, Japanese Kanji, Burmese, Khmer, Thai, Tagalog, Lao, and Tibetan to be English. (The folks who speak all the other languages might find you calling them English to be insulting too, of course.) --Paul Hoffman, Director --Internet Mail Consortium
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
Scott == Scott Hollenbeck [EMAIL PROTECTED] writes: I'm not asking for a lot of text; probably something about as long as this message. I believe that it can be a lot shorter: given the rationale above, it's not a problem for Atompub or any other XML-using protocol. For that matter, it's not really an XML problem at all: it affects text formats like HTML and RFC 2822 as well. Scott: I have to agree with Paul. I don't believe that the issue of white space in the syndicated content is really an Atompub issue. It might be an issue for the content creator. It might be an issue for the reader. As long as the pipe between the two passes the content as submitted, though, the pipe has done its job. Except that we try to build deployable protocols. If there aren't content creation tools that can do the right thing then it becomes a deployment issue for atompub. A perfectly reasonable response would be that you've thought about and understood the problem and there are sufficient tools available that can work with your proposed pipe that you don't need to care about the issue. --Sam
RE: Last Call: 'The Atom Syndication Format' to Proposed Standard
A perfectly reasonable response would be that you've thought about and understood the problem and there are sufficient tools available that can work with your proposed pipe that you don't need to care about the issue. Paul described text that's in the document to describe what MAY be done. I would argue that the existing text is evidence of the thought that has gone into understanding the issue and the alleged problem. -Scott-
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
At 2:14 PM -0400 5/10/05, Sam Hartman wrote: Except that we try to build deployable protocols. If there aren't content creation tools that can do the right thing then it becomes a deployment issue for atompub. True. Fortunately, there have been plenty of text editing tools that work with the "no spaces between words" languages for at least 20 years in the case of Chinese and Japanese Kanji (probably 15 years for the other languages). A perfectly reasonable response would be that you've thought about and understood the problem and there are sufficient tools available that can work with your proposed pipe that you don't need to care about the issue. I'll make that response. :-) --Paul Hoffman, Director --Internet Mail Consortium
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
A. Pagaltzis wrote: * Thomas Broyer [EMAIL PROTECTED] [2005-05-03 19:35]: This means type=text content is a single paragraph of text. If you need paragraphs, lists or any other structural formatting, you have to use type=html or type=xhtml with the appropriate content. Or type=text/plain, I'd assume? If you're talking about atom:content, not for Text Constructs. -- Thomas Broyer
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On 5/9/05, Sam Hartman [EMAIL PROTECTED] wrote: At least based on the discussion the IESG has been copied on, it doesn't sound like the working group has fully considered this issue. The responses have more of the character of those found from people trying to brush aside an issue than of people who have carefully considered something and concluded there is nothing to be done. Moreover, this issue cannot be unique to atom: it must affect many XML-based protocols both within the IETF and within other standards organizations. Martin, I agree with Sam on both points. Can you give us an example of an XML format that successfully deals with your issue? Does XHTML differ from Atom? Robert Sayre
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
At 9:33 AM -0400 5/9/05, Sam Hartman wrote: My personal opinion as someone who is very shortly going to have to evaluate the atom specification is that you've identified an issue (space and line breaking) for some languages that should be considered. Your proposed solution seems highly undesirable in that it requires us to understand the language of the text being displayed. In the past we've had all sorts of problems doing that. Your proposed solution also seems quite complicated. Fully agree. Please note the text in the spec we are working from: If the value is text, the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), and display the text using typographic techniques such as justification and proportional fonts. FWIW, this appears twice, identically, in the spec. Martin Dürst brought up CJK (well, actually CJT), saying that they don't use inter-word spacing. That's fine, but it is irrelevant to the text in the draft. If some text comes through with no spaces, there is no white space to collapse. His argument that some XML editors make long lines of text difficult to edit is clearly *way* out of scope for Atom, or any other XML-using protocol for that matter. It may well be that the solutions to this problem are worse than the problem itself. However I think it is important to specifically understand that is the case rather than failing to solve the problem because we failed to understand it. The case is that text that is supposed to be read by humans comes in many forms, with different line lengths, and so on. The paragraph from the spec says that Atom processors may alter these so that they can be presented better for the reader. Of course, they may also alter it to make it less readable, as many mail user agents do (sigh). 
Regardless, this says that the Atom processor is free to present things in text constructs in any fashion it deems suitable. This is particularly important for making Atom content accessible; for example, the Atom processor can use this rule to present text content by reading it aloud, by putting it on a screen greatly magnified one character at a time, and so on. At least based on the discussion the IESG has been copied on, it doesn't sound like the working group has fully considered this issue. The responses have more of the character of those found from people trying to brush aside an issue than of people who have carefully considered something and concluded there is nothing to be done. Sorry, but that's unfair. Alexey asked: "Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space?" Martin's response was orthogonal: "Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words." The rest of the thread wandered into the weeds because it was hard to figure out what was being discussed. Moreover, this issue cannot be unique to atom: it must affect many XML-based protocols both within the IETF and within other standards organizations. Any protocol that has XML that includes human-readable text has this issue. Well, the processors of that XML do; the protocols themselves do not. Anyway, as someone evaluating atompub's output, it would be very useful if the working group responded to this last call comment. In my mind a response would start with a researched description of the issue: either confirm that Chinese and Japanese and Thai tools work as described or explain how they actually work. Then describe what other standards have done about this problem. Finally describe what atompub has done about the problem and why. I'm not asking for a lot of text; probably something about as long as this message.
I believe that it can be a lot shorter: given the rationale above, it's not a problem for Atompub or any other XML-using protocol. For that matter, it's not really an XML problem at all: it affects text formats like HTML and RFC 2822 as well. --Paul Hoffman, Director --Internet Mail Consortium
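Paul's point ("if some text comes through with no spaces, there is no white space to collapse") and the case Martin is worried about can both be seen in a few lines of Python. This is a minimal sketch that assumes "collapse" means replacing each run of XML white space (space, tab, CR, LF) with one space and trimming the ends; the draft does not define the term, and the function name is hypothetical.

```python
import re

# XML 1.0 white space only: space, tab, carriage return, line feed.
# (Python's \s would also match U+3000 IDEOGRAPHIC SPACE, which is not XML white space.)
XML_WS = re.compile(r"[ \t\r\n]+")

def collapse_whitespace(text: str) -> str:
    """Hypothetical reading of 'collapse': each run of XML white space
    becomes a single U+0020, and leading/trailing white space is removed."""
    return XML_WS.sub(" ", text).strip()

western = "Atom processors\n    MAY collapse  white-space"
cjk_one_line = "日本語の文章"
cjk_wrapped = "日本語の\n文章"

print(collapse_whitespace(western))       # Atom processors MAY collapse white-space
print(collapse_whitespace(cjk_one_line))  # 日本語の文章  (nothing to collapse)
print(collapse_whitespace(cjk_wrapped))   # 日本語の 文章  (a space appears between two words)
```

The second case is Paul's: unwrapped CJK text passes through unchanged. The third is Martin's: a line break in the source turns into a visible space that the author never intended.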
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
Henri Sivonen wrote: On May 8, 2005, at 06:30, Walter Underwood wrote: White space is not particularly meaningful in some of these languages, so we cannot expect them to suddenly pay attention to that just so they can use Atom. Why not? We expect them not to insert other random characters there. What do the same producers do with XHTML? Opera 7.53 and Safari 1.3 render a space between the second and third Kanji in http://hsivonen.iki.fi/test/cjk-whitespace.xhtml See also Ishida's tests: http://www.w3.org/International/tests/results/white-space-ideograph Special handling of white-space in CJK context is accounted for in the CSS2.1 spec (and will be described in more detail in CSS3 Text). There will be plenty of content from other formats with this linguistically meaningless white space. Why not just get rid of it at the producer end, like you have to get rid of form feeds? Because form feeds are normally not used in source code files, whereas line breaks and indentation often are? ~fantasai
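One way a consumer can avoid the spurious space without knowing the language of the text is to look at the characters on either side of each line break. The sketch below is only similar in spirit to the East Asian Width based segment-break handling that has appeared in CSS Text Level 3 drafts; it is not the CSS algorithm, and both function names are hypothetical. It handles bare line feeds only (an XML parser already normalizes CRLF to LF); indentation after a break would still need collapsing separately.

```python
import unicodedata

def is_east_asian_wide(ch: str) -> bool:
    """True for Fullwidth (F), Wide (W) or Halfwidth (H) characters,
    excluding Hangul, because Korean text does put spaces between words."""
    if unicodedata.east_asian_width(ch) not in ("F", "W", "H"):
        return False
    return "HANGUL" not in unicodedata.name(ch, "")

def transform_line_breaks(text: str) -> str:
    """Drop a line feed that sits between two East Asian wide characters;
    turn every other line feed into a single space."""
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if before and after and is_east_asian_wide(before) and is_east_asian_wide(after):
            continue  # join the two ideographs directly
        out.append(" ")
    return "".join(out)

print(transform_line_breaks("日本語の\n文章"))  # 日本語の文章
print(transform_line_breaks("Atom\nformat"))    # Atom format
```

The design trade-off is the one Sam raises: this avoids language detection, but it is still a heuristic over Unicode properties rather than something the wire format guarantees.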
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On May 8, 2005, at 06:30, Walter Underwood wrote: White space is not particularly meaningful in some of these languages, so we cannot expect them to suddenly pay attention to that just so they can use Atom. Why not? We expect them not to insert other random characters there. What do the same producers do with XHTML? Opera 7.53 and Safari 1.3 render a space between the second and third Kanji in http://hsivonen.iki.fi/test/cjk-whitespace.xhtml There will be plenty of content from other formats with this linguistically meaningless white space. Why not just get rid of it at the producer end, like you have to get rid of form feeds? -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
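Henri's form-feed analogy rests on the fact that XML 1.0 has no way to carry U+000C at all: it is outside the Char production, so a producer must remove or replace it before serializing. A minimal producer-side filter might look like this; the helper names and the `replacement` parameter are hypothetical.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def clean_for_atom(text: str, replacement: str = "") -> str:
    """Replace every character XML 1.0 cannot carry (e.g. U+000C FORM FEED)
    with `replacement`; by default such characters are simply dropped."""
    return "".join(ch if is_xml10_char(ch) else replacement for ch in text)

print(repr(clean_for_atom("page one\x0cpage two", "\n")))  # 'page one\npage two'
```

The difference fantasai points out is that this filter is mandatory for form feeds, whereas line breaks and indentation are legal XML, so nothing forces a producer to remove them.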
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
At 02:27 05/05/04, Thomas Broyer wrote: Martin Duerst wrote: At 03:33 05/04/29, Alexey Melnikov wrote: If the value is text, the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. My understanding of type=text is that this is just text without any formatting. That's my understanding, too. Hence, it is not meant to be preformatted text such as text/plain or inside an (X)HTML pre. Yes. But that's exactly where the spacing problems with Chinese/Japanese/Thai are. There are no such problems for preformatted text, because the line breaking in the source (as sent) is the same as the line breaking when displayed. For free-flowing text, however, the line breaks in the source and those in the display are not (necessarily) the same, and so linebreaks have to be changed to spaces for Western languages, but to nothing for Chinese/Japanese (and most possibly to a zero-width non-breaking space for Thai), and the spec has to say something about this. Regards, Martin. This means type=text content is a single paragraph of text. If you need paragraphs, lists or any other structural formatting, you have to use type=html or type=xhtml with the appropriate content.
I was about to write a Pace about white-space handling in type=text (either using xml:space or an attribute that would have mimicked the white-space CSS property) when I understood/recalled that Text Constructs have accessibility in mind (hence their limitation to textual contents) and preformatted text is not accessible enough. -- Thomas Broyer
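Martin's rule (line breaks become spaces for Western languages, nothing for Chinese/Japanese, and a zero-width character for Thai) requires knowing the language of the text, which is the dependency Sam objects to elsewhere in the thread. A sketch that takes the language as an explicit input, for instance from xml:lang, makes that dependency visible. The table, the function name, and the choice of U+200B ZERO WIDTH SPACE for Thai (which keeps a line-break opportunity) are all assumptions for illustration.

```python
# Replacement for a source line break, keyed by primary language subtag.
# Illustrative assumption only; the draft defines no such table.
LINE_BREAK_REPLACEMENT = {
    "zh": "",        # Chinese: no inter-word spaces
    "ja": "",        # Japanese: no inter-word spaces
    "th": "\u200b",  # Thai: assumed U+200B ZERO WIDTH SPACE
}

def unwrap(text: str, lang: str) -> str:
    """Turn source line breaks in free-flowing type=text content into
    display text. `lang` is a tag such as "ja" or "en-US"; only the
    primary subtag is consulted, and unknown languages get a space."""
    primary = lang.split("-")[0].lower()
    replacement = LINE_BREAK_REPLACEMENT.get(primary, " ")
    # Strip ASCII indentation only, so a leading U+3000 (ideographic space) survives.
    lines = [line.strip(" \t\r") for line in text.split("\n")]
    return replacement.join(lines)

print(unwrap("日本語の\n文章", "ja"))      # 日本語の文章
print(unwrap("free\n  flowing", "en-US"))  # free flowing
```

The weakness is the one raised in the thread: if the language tag is missing or wrong, the output is wrong, and mixed-language text cannot be handled with a single tag.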
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote: Why would you put line breaks in the CJK source, then? Isn't the problem solved with the least heuristics by the producer not putting breaks there? It would be even better if they would just speak English. :-) White space is not particularly meaningful in some of these languages, so we cannot expect them to suddenly pay attention to that just so they can use Atom. There will be plenty of content from other formats with this linguistically meaningless white space. If we get this wrong, Atom-delivered content will look broken in some languages, and a bunch of extra-spec practice will build up about how to fix it. Much better to get it right in 1.0. wunder -- Walter Underwood Principal Architect, Verity
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
* Thomas Broyer [EMAIL PROTECTED] [2005-05-03 19:35]: This means type=text content is a single paragraph of text. If you need paragraphs, lists or any other structural formatting, you have to use type=html or type=xhtml with the appropriate content. Or type=text/plain, I'd assume? Regards, -- Aristotle
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On Apr 29, 2005, at 12:17, Martin Duerst wrote: Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. You seem to be assuming that the length of a line is restricted in XML source. Why? As far as I can tell, it should be permissible to produce Atom documents that contain no LF or CR characters. Can't languages without spaces use long source lines and apply soft wrapping in a source view if necessary? Why is this a wire format problem? -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
At 03:33 05/04/29, Alexey Melnikov wrote: The file can be obtained via http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt 3.1.1.1 Text If the value is text, the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. and display the text using typographic techniques such as justification and proportional fonts. 4.1.3.3 Processing Model ... 2. If the value of type is html, the content of atom:content MUST NOT contain child elements, and SHOULD be suitable for handling as HTML [HTML]. The HTML markup must be escaped; for Should the "must" be changed to "MUST" here? Yes, please! 6.3 Software Processing of Foreign Markup ... When unknown foreign markup is encountered in a Text Construct or atom:content element, software SHOULD ignore the markup and process any text content of foreign elements as though the surrounding markup were not present. I reread this paragraph a few times and I am still not quite sure what the paragraph is trying to say. Is it trying to say that if the content of a foreign element looks like XML with an unrecognized schema, just strip the markup and process the text? Reading this, I got confused because we have both "Text Construct" and "Text" as subtitles.
I suggest changing the subtitle "Text" to something like "Text Construct with type='text'" or so. Also, starting a section with just an example looks weird. Please add an introductory sentence. The same of course applies to the parallel subsections. Regards, Martin.
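Alexey's reading of section 6.3 ("strip the markup and process the text") can be stated precisely with a standard XML parser: keep every piece of character content and drop the tags around it. The sketch below assumes that reading is the intended one; the extension namespace `http://example.com/ext` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

def strip_foreign_markup(fragment: str) -> str:
    """Process a fragment as though unknown child elements were not present:
    concatenate all text and tail content in document order."""
    root = ET.fromstring(fragment)
    return "".join(root.itertext())

sample = ('<summary xmlns:x="http://example.com/ext">'
          'Atom <x:em>really</x:em> works</summary>')
print(strip_foreign_markup(sample))  # Atom really works
```

Note that `itertext()` includes the tail text after each child element (" works" here); dropping it would lose part of the surrounding sentence, which is why "as though the surrounding markup were not present" matters.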
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On 4/29/05, Martin Duerst [EMAIL PROTECTED] wrote: At 03:33 05/04/29, Alexey Melnikov wrote: Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. I believe the intent of this text was to match HTML's text treatment, so that implementations can avoid preprocessing whitespace. http://www.w3.org/TR/html4/struct/text.html#h-9.1 Suggestions for less vague text are welcome, but I want to make sure the text remains comprehensible to non-experts. Robert Sayre
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
Martin Duerst wrote: At 03:33 05/04/29, Alexey Melnikov wrote: If the value is text, the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. My understanding of type=text is that this is just text without any formatting. Hence, it is not meant to be preformatted text such as text/plain or inside an (X)HTML pre. This means type=text content is a single paragraph of text. If you need paragraphs, lists or any other structural formatting, you have to use type=html or type=xhtml with the appropriate content. I was about to write a Pace about white-space handling in type=text (either using xml:space or an attribute that would have mimicked the white-space CSS property) when I understood/recalled that Text Constructs have accessibility in mind (hence their limitation to textual contents) and preformatted text is not accessible enough. -- Thomas Broyer
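Thomas mentions xml:space as one possible hook. Purely as an illustration of what such a Pace might have enabled (it was never written, and the draft defines no such behaviour), a consumer could skip collapsing when `xml:space="preserve"` is present on the element. The function name is hypothetical, and for brevity the sketch ignores inheritance of xml:space from ancestor elements.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports the predeclared xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WS = re.compile(r"[ \t\r\n]+")

def display_text(element: ET.Element) -> str:
    """Return the element's text, collapsing XML white space unless
    xml:space="preserve" is set on the element itself."""
    text = element.text or ""
    if element.get(XML_SPACE) == "preserve":
        return text
    return XML_WS.sub(" ", text).strip()

collapsed = ET.fromstring('<title type="text">a\n   b</title>')
preserved = ET.fromstring('<title type="text" xml:space="preserve">a\n   b</title>')
print(repr(display_text(collapsed)))  # 'a b'
print(repr(display_text(preserved)))  # 'a\n   b'
```

Thomas's reason for dropping the idea still applies: preserving line breaks turns type=text into preformatted text, which works against the accessibility goal of Text Constructs.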
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
On 28 Apr 2005, at 7:33 pm, Alexey Melnikov wrote: Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? Since the statement is a MAY, I don't think any exact meaning is necessary. It's simply a hint to publishers that whitespace may not be preserved. On 29 Apr 2005, at 10:17 am, Martin Duerst wrote: Making this more precise is definitely desirable. But there is also an i18n issue: This works fine for languages that use spaces between words. It doesn't work for languages that don't have spaces between words (Chinese, Japanese, Thai,...). If Text elements are only used for short things such as names or titles, that's not a big issue, the text in question can just be put on a single line. However, when the texts in question are long, it's a serious issue, and should be fixed. A consumer may do anything that can reasonably be described as collapsing whitespace, but is not required to. How does this cause problems in Asian languages? Graham
Re: Last Call: 'The Atom Syndication Format' to Proposed Standard
The IESG wrote: The IESG has received a request from the Atom Publishing Format and Protocol WG to consider the following document: - 'The Atom Syndication Format' draft-ietf-atompub-format-08.txt as a Proposed Standard The IESG plans to make a decision in the next few weeks, and solicits final comments on this action. Please send any comments to the iesg@ietf.org or ietf@ietf.org mailing lists by 2005-05-04. The file can be obtained via http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt In general the document looks good to me. Some minor comments (and a few questions), mostly nitpicking, below: 3.1.1.1 Text Example atom:title with text content: ... <title type="text">Less: &lt;</title> ... If the value is text, the content of the Text construct MUST NOT contain child elements. Such text is intended to be presented to humans in a readable fashion. Thus, Atom Processors MAY collapse white-space (including line-breaks), Ok, maybe it is just me, but what does it mean to collapse white-space? Does this mean to replace FWS (in RFC 2822 sense) with a single space? and display the text using typographic techniques such as justification and proportional fonts. 4.1.3.3 Processing Model ... 2. If the value of type is html, the content of atom:content MUST NOT contain child elements, and SHOULD be suitable for handling as HTML [HTML]. The HTML markup must be escaped; for Should the "must" be changed to "MUST" here? example, "<br>" as "&lt;br>". The HTML markup SHOULD be such that it could validly appear directly within an HTML DIV element. Atom Processors that display the content MAY use the markup to aid in displaying it. ... 6. For all other values of type, the content of atom:content MUST be a valid Base64 encoding [RFC3548], which when decoded SHOULD I should note that RFC 3548 has two base64 alphabets: one in section 3 and one in section 4. You probably want the more common one in section 3, but this has to be stated explicitly. 6.3 Software Processing of Foreign Markup ...
When unknown foreign markup is encountered in a Text Construct or atom:content element, software SHOULD ignore the markup and process any text content of foreign elements as though the surrounding markup were not present. I reread this paragraph a few times and I am still not quite sure what the paragraph is trying to say. Is it trying to say that if the content of a foreign element looks like XML with an unrecognized schema, just strip the markup and process the text? Regards, Alexey
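Alexey's Base64 point is easy to demonstrate with Python's standard library: `base64.b64encode` uses the alphabet of RFC 3548 section 3, and `base64.urlsafe_b64encode` uses the "URL and filename safe" alphabet of section 4. The two agree except for the characters for values 62 and 63, so two bytes chosen to hit those values encode differently. The sample bytes are arbitrary.

```python
import base64

# 0xFB 0xFF splits into the 6-bit groups 62, 63, 60 (plus padding).
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data)          # section 3 alphabet: 62 -> "+", 63 -> "/"
url_safe = base64.urlsafe_b64encode(data)  # section 4 alphabet: 62 -> "-", 63 -> "_"

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='

# Each form round-trips only with its own decoder.
assert base64.b64decode(standard) == data
assert base64.urlsafe_b64decode(url_safe) == data
```

A consumer that decodes with the wrong alphabet either rejects the content or misreads it, which is why the draft needs to name the section explicitly.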