Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Graham
On 8 May 2005, at 4:30 am, Walter Underwood wrote:
White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom. There will be plenty of content from other formats
with this linguistically meaningless white space.
So the idea is that whitespace should not appear at all in certain  
texts, and you'd like it to be stripped out at the consumer? There  
are only three possible ways for this to happen:

1) The consumer removes all whitespace, even in western texts
2) The consumer recognizes these languages and removes the whitespace  
automatically
3) The consumer is told what to do by an attribute

(1) is obviously not plausible, but included for completeness. (2) is  
impractical. (3) is plausible, but may or may not end up being  
implemented in all consumers, making it kind of useless.
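Option (3) can be made concrete with a small sketch. It assumes the attribute is XML's own xml:space, one of the candidates Thomas Broyer mentions elsewhere in this thread. Atom defines no such rule, and the function name display_text is hypothetical. A consumer that implements it keeps white space when the producer asks for "preserve" and otherwise collapses it:

```python
import re
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# XML 1.0 white space: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def display_text(element):
    """Hypothetical consumer rule: honour xml:space="preserve", else collapse."""
    text = element.text or ""
    if element.get(XML_SPACE) == "preserve":
        return text
    return _WS_RUN.sub(" ", text).strip(" ")


collapsed = ET.fromstring('<title type="text">\n  Less:\n  &lt;\n</title>')
preserved = ET.fromstring(
    '<title type="text" xml:space="preserve">line one\nline two</title>'
)
print(repr(display_text(collapsed)))
print(repr(display_text(preserved)))
```

The weakness Graham describes is visible here: a consumer that never checks the attribute just falls back to whatever it did before, so the attribute only helps where it is implemented.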

I don't see a better solution than the one we have: text that  
shouldn't be displayed with whitespace shouldn't contain whitespace  
in the first place.

Graham


RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Walter Underwood

--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck [EMAIL PROTECTED] wrote:

 I have to agree with Paul.  I don't believe that the issue of white space in
 the syndicated content is really an Atompub issue.  It might be an issue for
 the content creator.  It might be an issue for the reader.  As long as the
 pipe between the two passes the content as submitted, though, the pipe has
 done its job.

If publishers and subscribers have obstacles to using Atom, that sounds
like a problem to me.

"Everyone has this problem" is not a good reason to ignore it. Someone
has to be the first to solve it, might as well be us. It is not acceptable
to build formats for the English Wide Web. That doesn't exist any more.

wunder
--
Walter Underwood
Principal Architect, Verity



RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Paul Hoffman
At 8:16 AM -0700 5/10/05, Walter Underwood wrote:
If publishers and subscribers have obstacles to using Atom, that sounds
like a problem to me.
It is a problem, of course.
"Everyone has this problem" is not a good reason to ignore it.
No one is ignoring it. This thread started because the format draft 
pointed out at least one aspect of the problem, which is more than 
most other RFCs do.

Someone has to be the first to solve it, might as well be us.
May I suggest that there are groups with more experience in the area 
than ours that would be more appropriate? In specific, since this 
problem affects all internationalized text, the Unicode Consortium 
has a much higher chance of solving the problem than an IETF 
Working Group that is focused on a syndication format.

If you have a proposed solution to the problem (you didn't include 
one in your message to the WG), the Unicode Consortium is quite open 
to outside input on this type of thing.

It is not acceptable
to build formats for the English Wide Web. That doesn't exist any more.
That is both grossly insulting to those of us who have spent a great deal 
of time trying to make the Internet internationalization-friendly, 
and is also grossly technically inaccurate, unless you consider every 
written language other than Chinese, Japanese Kanji, Burmese, Khmer, 
Thai, Tagalog, Lao, and Tibetan to be English. (The folks who speak 
all the other languages might find you calling them English to be 
insulting too, of course.)

--Paul Hoffman, Director
--Internet Mail Consortium


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Sam Hartman

 Scott == Scott Hollenbeck [EMAIL PROTECTED] writes:

 I'm not asking for a lot of text; probably something about as
 long as this message.
 
 I believe that it can be a lot shorter: given the rationale
 above, it's not a problem for Atompub or any other XML-using
 protocol. For that matter, it's not really an XML problem at
 all: it affects text formats like HTML and RFC 2822 as well.

Scott I have to agree with Paul.  I don't believe that the issue
Scott of white space in the syndicated content is really an
Scott Atompub issue.  It might be an issue for the content
Scott creator.  It might be an issue for the reader.  As long as
Scott the pipe between the two passes the content as submitted,
Scott though, the pipe has done its job.

Except that we try to build deployable protocols.  If there aren't
content creation tools that can do the right thing then it becomes a
deployment issue for atompub.

A perfectly reasonable response would be that you've thought about and
understood the problem and there are sufficient tools available that
can work with your proposed pipe that you don't need to care about the
issue.

--Sam



RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Scott Hollenbeck

 A perfectly reasonable response would be that you've thought about and
 understood the problem and there are sufficient tools available that
 can work with your proposed pipe that you don't need to care about the
 issue.

Paul described text that's in the document to describe what MAY be done.  I
would argue that the existing text is evidence of the thought that has gone
into understanding the issue and the alleged problem.

-Scott-



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Paul Hoffman
At 2:14 PM -0400 5/10/05, Sam Hartman wrote:
Except that we try to build deployable protocols.  If there aren't
content creation tools that can do the right thing then it becomes a
deployment issue for atompub.
True. Fortunately, there have been plenty of text editing tools that 
work with the "no spaces between words" languages for at least 20 
years in the case of Chinese and Japanese Kanji (probably 15 years 
for the other languages).

A perfectly reasonable response would be that you've thought about and
understood the problem and there are sufficient tools available that
can work with your proposed pipe that you don't need to care about the
issue.
I'll make that response. :-)
--Paul Hoffman, Director
--Internet Mail Consortium


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Thomas Broyer
A. Pagaltzis wrote:
* Thomas Broyer [EMAIL PROTECTED] [2005-05-03 19:35]:
This means type=text content is a single paragraph of text.
If you need paragraphs, lists or any other structural
formatting, you have to use type=html or type=xhtml with
the appropriate content.

Or type=text/plain, I'd assume?
If you're talking about atom:content, not for Text Constructs.
--
Thomas Broyer


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Robert Sayre

On 5/9/05, Sam Hartman [EMAIL PROTECTED] wrote:
 At least based on the discussion the IESG has been copied on, it
 doesn't sound like the working group has fully considered this issue.
 The responses have more of the character of those found from people
 trying to brush aside an issue than of people who have carefully
 considered something and concluded there is nothing to be done.
 
 Moreover, this issue cannot be unique to atom: it must affect many
 XML based protocols both within the IETF and within other standards
 organizations.

Martin,

I agree with Sam on both points. Can you give us an example of an XML
format that successfully deals with your issue?

Does XHTML differ from Atom?

Robert Sayre



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Paul Hoffman
At 9:33 AM -0400 5/9/05, Sam Hartman wrote:
My personal opinion as someone who is very shortly going to have to
evaluate the atom specification is that you've identified an issue
(space and line breaking) for some languages that should be
considered.  Your proposed solution seems highly undesirable in that
it requires us to understand the language of the text being displayed.
In the past we've had all sorts of problems doing that.  Your proposed
solution also seems quite complicated.
Fully agree. Please note the text in the spec we are working from:
   If the value is text, the content of the Text construct MUST NOT
   contain child elements.  Such text is intended to be presented to
   humans in a readable fashion.  Thus, Atom Processors MAY collapse
   white-space (including line-breaks), and display the text using
   typographic techniques such as justification and proportional fonts.
FWIW, this appears twice, identically, in the spec.
Martin Dürst brought up CJK (well, actually CJT), saying that they 
don't use inter-word spacing. That's fine, but it is irrelevant to 
the text in the draft. If some text comes through with no spaces, 
there is no white space to collapse. His argument that some XML 
editors make long lines of text difficult to edit is clearly *way* 
out of scope for Atom, or any other XML-using protocol for that 
matter.

It may well be that the solutions to this problem are worse than the
problem itself.  However I think it is important to specifically
understand that is the case rather than failing to solve the problem
because we failed to understand it.
The case is that text that is supposed to be read by humans comes 
in many forms, with different line lengths, and so on. The paragraph 
from the spec says that Atom processors may alter these so that they 
can be presented better for the reader. Of course, they may also 
alter it to make it less readable, as many mail user agents do 
(sigh). Regardless, this says that the Atom processor is free to 
present things in text constructs in any fashion it deems suitable. 
This is particularly important for making Atom content accessible; 
for example, the Atom processor can use this rule to present text 
content by reading it aloud, by putting it on a screen greatly 
magnified one character at a time, and so on.

At least based on the discussion the IESG has been copied on, it
doesn't sound like the working group has fully considered this issue.
The responses have more of the character of those found from people
trying to brush aside an issue than of people who have carefully
considered something and concluded there is nothing to be done.
Sorry, but that's unfair. Alexey asked, "Ok, maybe it is just me, but 
what does it mean to collapse white-space? Does this mean to 
replace FWS (in RFC 2822 sense) with a single space?" Martin's 
response was orthogonal: "Making this more precise is definitely 
desirable. But there is also an i18n issue: This works fine for 
languages that use spaces between words." The rest of the thread 
wandered into the weeds because it was hard to figure out what was 
being discussed.

Moreover, this issue cannot be unique to atom: it must affect many
XML based protocols both within the IETF and within other standards
organizations.
Any protocol that has XML that includes human-readable text has this 
issue. Well, the processors of that XML do; the protocols 
themselves do not.

Anyway as someone evaluating atompub's output it would be very useful
if the working group responded to this last call comment.  In my mind
a response would start with a researched description of the issue:
either confirm that Chinese and Japanese and Thai tools work as
described or explain how they actually work.  Then describe what other
standards have done about this problem.  Finally describe what atompub
has done about the problem and why. 

I'm not asking for a lot of text; probably something about as long as
this message.
I believe that it can be a lot shorter: given the rationale above, 
it's not a problem for Atompub or any other XML-using protocol. For 
that matter, it's not really an XML problem at all: it affects text 
formats like HTML and RFC 2822 as well.

--Paul Hoffman, Director
--Internet Mail Consortium


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread fantasai
Henri Sivonen wrote:
On May 8, 2005, at 06:30, Walter Underwood wrote:
White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom.
Why not? We expect them not to insert other random characters there. 
What do the same producers do with XHTML? Opera 7.53 and Safari 1.3 
render a space between the second and third Kanji in
http://hsivonen.iki.fi/test/cjk-whitespace.xhtml
See also Ishida's tests:
http://www.w3.org/International/tests/results/white-space-ideograph
Special handling of white-space in CJK context is accounted for in the
CSS2.1 spec (and will be described in more detail in CSS3 Text).
There will be plenty of content from other formats
with this linguistically meaningless white space.
Why not just get rid of it in the producer end like you have to get rid 
of form feeds?
Because form feeds are normally not used in source code files whereas
line breaks and indentation often are?
~fantasai


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-08 Thread Henri Sivonen
On May 8, 2005, at 06:30, Walter Underwood wrote:
White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom.
Why not? We expect them not to insert other random characters there. 
What do the same producers do with XHTML? Opera 7.53 and Safari 1.3 
render a space between the second and third Kanji in
http://hsivonen.iki.fi/test/cjk-whitespace.xhtml

There will be plenty of content from other formats
with this linguistically meaningless white space.
Why not just get rid of it in the producer end like you have to get rid 
of form feeds?

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Martin Duerst
At 02:27 05/05/04, Thomas Broyer wrote:

Martin Duerst wrote:
 At 03:33 05/04/29, Alexey Melnikov wrote:
  If the value is text, the content of the Text construct MUST NOT
  contain child elements.  Such text is intended to be presented to
  humans in a readable fashion.  Thus, Atom Processors MAY collapse
  white-space (including line-breaks),
  
  Ok, maybe it is just me, but what does it mean to collapse 
white-space? Does this mean to replace FWS (in RFC 2822 sense) with a 
single space?
 Making this more precise is definitely desirable. But there is also
 an i18n issue: This works fine for languages that use spaces between
 words. It doesn't work for languages that don't have spaces between
 words (Chinese, Japanese, Thai,...). If Text elements are only used
 for short things such as names or titles, that's not a big issue,
 the text in question can just be put on a single line. However,
 when the texts in question are long, it's a serious issue, and
 should be fixed.

My understanding of type=text is that this is just text without any 
formatting.

That's my understanding, too.
Hence, it is not meant to be preformatted text such as text/plain or 
inside an (X)HTML <pre>.

Yes. But that's exactly where the spacing problems with Chinese/Japanese/Thai
are. There are no such problems for preformatted text, because the line breaking
in the source (as sent) is the same as the line breaking when displayed.
For free-flowing text, however, the line breaks in the source and those in
the display are not (necessarily) the same, and so linebreaks have to be
changed to spaces for Western languages, but to nothing for Chinese/Japanese
(and most possibly to a zero-width non-breaking space for Thai), and the spec
has to say something about this.
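Martin's rule can be sketched in Python. This is a sketch under stated assumptions, not a proposal from the thread: instead of a language tag it looks at the characters on either side of each source line break, treats East Asian Width F/W/H characters (other than Hangul, since Korean separates words with spaces) as joining without a space, and for Thai inserts U+200B ZERO WIDTH SPACE so a renderer can still break there. Martin's message suggests a zero-width non-breaking space for Thai; which character is right is an open assumption here. The function name join_source_lines is hypothetical.

```python
import re
import unicodedata

# One source line break, plus the spaces/tabs (indentation) around it.
_LINE_BREAK = re.compile(r"[ \t]*(?:\r\n|\r|\n)[ \t\r\n]*")


def _joins_without_space(ch):
    # Wide East Asian characters (Han, kana, fullwidth forms), but not Hangul:
    # Korean, unlike Chinese and Japanese, separates words with spaces.
    return (unicodedata.east_asian_width(ch) in ("F", "W", "H")
            and "HANGUL" not in unicodedata.name(ch, ""))


def _is_thai(ch):
    return "\u0e00" <= ch <= "\u0e7f"


def join_source_lines(text):
    """Turn source line breaks into what free-flowing display needs."""

    def replace(match):
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        if before and after:
            if _joins_without_space(before) and _joins_without_space(after):
                return ""  # Chinese/Japanese: the break was not a word boundary
            if _is_thai(before) and _is_thai(after):
                return "\u200b"  # Thai: invisible, but still a break opportunity
        return " "  # Western scripts: the break stood for an inter-word space

    return _LINE_BREAK.sub(replace, text)


print(join_source_lines("Atom is a\n  syndication format"))
print(join_source_lines("日本語の\n  テキスト"))
print(repr(join_source_lines("ภาษา\nไทย")))
```

Even this heuristic is imperfect (a break between a Japanese character and a Latin word still becomes a space), which is part of why several posters would rather producers not insert such breaks at all.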
Regards,
Martin.
This means type=text content is a single paragraph of text. If you 
need paragraphs, lists or any other structural formatting, you have to 
use type=html or type=xhtml with the appropriate content.

I was about to write a Pace about white-space handling in type=text 
(either using xml:space or an attribute that would have mimicked the 
white-space CSS property) when I understood/recalled that Text 
Constructs have accessibility in mind (hence their limitation to textual 
contents) and preformatted text is not accessible enough.

--
Thomas Broyer
 



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Walter Underwood

--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen [EMAIL PROTECTED] wrote:

 Why would you put line breaks in the CJK source, then? Isn't the problem
 solved with the least heuristics by the producer not putting breaks there?

It would be even better if they would just speak English. :-)

White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom. There will be plenty of content from other formats
with this linguistically meaningless white space.

If we get this wrong, Atom-delivered content will look broken in
some languages, and a bunch of extra-spec practice will build up about
how to fix it. Much better to get it right in 1.0.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-05 Thread A. Pagaltzis

* Thomas Broyer [EMAIL PROTECTED] [2005-05-03 19:35]:
 This means type=text content is a single paragraph of text.
 If you need paragraphs, lists or any other structural
 formatting, you have to use type=html or type=xhtml with
 the appropriate content.

Or type=text/plain, I'd assume?

Regards,
-- 
Aristotle



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-04 Thread Henri Sivonen
On Apr 29, 2005, at 12:17, Martin Duerst wrote:
Making this more precise is definitely desirable. But there is also
an i18n issue: This works fine for languages that use spaces between
words. It doesn't work for languages that don't have spaces between
words (Chinese, Japanese, Thai,...). If Text elements are only used
for short things such as names or titles, that's not a big issue,
the text in question can just be put on a single line. However,
when the texts in question are long, it's a serious issue, and
should be fixed.
You seem to be assuming that the length of a line is restricted in 
XML source. Why? As far as I can tell, it should be permissible to 
produce Atom documents that contain no LF or CR characters.

Can't languages without spaces use long source lines and apply soft 
wrapping in a source view if necessary? Why is this a wire format 
problem?

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Martin Duerst
At 03:33 05/04/29, Alexey Melnikov wrote:
The file can be obtained via
http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt
 3.1.1.1  Text
If the value is text, the content of the Text construct MUST NOT
contain child elements.  Such text is intended to be presented to
humans in a readable fashion.  Thus, Atom Processors MAY collapse
white-space (including line-breaks),

Ok, maybe it is just me, but what does it mean to collapse white-space? 
Does this mean to replace FWS (in RFC 2822 sense) with a single space?

Making this more precise is definitely desirable. But there is also
an i18n issue: This works fine for languages that use spaces between
words. It doesn't work for languages that don't have spaces between
words (Chinese, Japanese, Thai,...). If Text elements are only used
for short things such as names or titles, that's not a big issue,
the text in question can just be put on a single line. However,
when the texts in question are long, it's a serious issue, and
should be fixed.
  and display the text using
typographic techniques such as justification and proportional fonts.


 4.1.3.3  Processing Model
...
2.  If the value of type is html, the content of atom:content
MUST NOT contain child elements, and SHOULD be suitable for
handling as HTML [HTML].  The HTML markup must be escaped; for

Should the "must" be changed to "MUST" here?
Yes, please!
 6.3  Software Processing of Foreign Markup
 
...
When unknown foreign markup is encountered in a Text Construct or
atom:content element, software SHOULD ignore the markup and process
any text content of foreign elements as though the surrounding markup
were not present.

I reread this paragraph a few times and I am still not quite sure what the 
paragraph is trying to say. Is it trying to say if the content of a 
foreign element looks like XML with unrecognized schema - just strip the 
markup and process the text?

Reading this, I got confused because we have both "Text Construct"
and "Text" as subtitles. I suggest changing the subtitle "Text" to
something like "Text Construct with type='text'" or so. Also, starting
a section with just an example looks weird. Please add an introductory
sentence. The same of course applies to the parallel subsections.
Regards,
Martin.



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Robert Sayre

On 4/29/05, Martin Duerst [EMAIL PROTECTED] wrote:
 At 03:33 05/04/29, Alexey Melnikov wrote:
  
  Ok, maybe it is just me, but what does it mean to collapse white-space?
 Does this mean to replace FWS (in RFC 2822 sense) with a single space?
 
 Making this more precise is definitely desirable. But there is also
 an i18n issue: This works fine for languages that use spaces between
 words. It doesn't work for languages that don't have spaces between
 words (Chinese, Japanese, Thai,...). If Text elements are only used
 for short things such as names or titles, that's not a big issue,
 the text in question can just be put on a single line. However,
 when the texts in question are long, it's a serious issue, and
 should be fixed.

I believe the intent of this text was to match HTML's text treatment,
so that implementations can avoid preprocessing whitespace.

http://www.w3.org/TR/html4/struct/text.html#h-9.1

Suggestions for less vague text are welcome, but I want to make sure
the text remains comprehensible to non-experts.
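One plausible reading of "collapse white-space", in the spirit of the HTML text treatment Robert Sayre cites, is to turn every run of white space into a single space. The sketch below assumes only XML's four white-space characters count (HTML 4 defines its own, slightly different list), and the name collapse_whitespace is hypothetical. It shows both sides of the argument: text with no white space passes through unchanged, while Japanese text that was line-wrapped in the source picks up a space that was never part of the sentence.

```python
import re

# XML 1.0 white space (production S): space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text):
    """Replace each run of white space with one U+0020 and trim both ends."""
    return _XML_WS.sub(" ", text).strip(" ")


print(repr(collapse_whitespace("\n  Less:   <\n")))  # English: harmless
print(repr(collapse_whitespace("日本語のテキスト")))  # no white space: unchanged
print(repr(collapse_whitespace("日本語の\nテキスト")))  # wrapped source: stray space
```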

Robert Sayre



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Thomas Broyer
Martin Duerst wrote:
At 03:33 05/04/29, Alexey Melnikov wrote:
 If the value is text, the content of the Text construct MUST NOT
 contain child elements.  Such text is intended to be presented to
 humans in a readable fashion.  Thus, Atom Processors MAY collapse
 white-space (including line-breaks),
 
 Ok, maybe it is just me, but what does it mean to collapse 
 white-space? Does this mean to replace FWS (in RFC 2822 sense) with a 
 single space?

Making this more precise is definitely desirable. But there is also
an i18n issue: This works fine for languages that use spaces between
words. It doesn't work for languages that don't have spaces between
words (Chinese, Japanese, Thai,...). If Text elements are only used
for short things such as names or titles, that's not a big issue,
the text in question can just be put on a single line. However,
when the texts in question are long, it's a serious issue, and
should be fixed.
My understanding of type=text is that this is just text without any 
formatting. Hence, it is not meant to be preformatted text such as 
text/plain or inside an (X)HTML <pre>.

This means type=text content is a single paragraph of text. If you 
need paragraphs, lists or any other structural formatting, you have to 
use type=html or type=xhtml with the appropriate content.

I was about to write a Pace about white-space handling in type=text 
(either using xml:space or an attribute that would have mimicked the 
white-space CSS property) when I understood/recalled that Text 
Constructs have accessibility in mind (hence their limitation to textual 
contents) and preformatted text is not accessible enough.

--
Thomas Broyer


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Graham
On 28 Apr 2005, at 7:33 pm, Alexey Melnikov wrote:
Ok, maybe it is just me, but what does it mean to collapse white- 
space? Does this mean to replace FWS (in RFC 2822 sense) with a  
single space?
Since the statement is a MAY, I don't think any exact meaning is  
necessary. It's simply a hint to publishers that whitespace may not  
be preserved.

On 29 Apr 2005, at 10:17 am, Martin Duerst wrote:
Making this more precise is definitely desirable. But there is also
an i18n issue: This works fine for languages that use spaces between
words. It doesn't work for languages that don't have spaces between
words (Chinese, Japanese, Thai,...). If Text elements are only used
for short things such as names or titles, that's not a big issue,
the text in question can just be put on a single line. However,
when the texts in question are long, it's a serious issue, and
should be fixed.
A consumer may do anything that can reasonably be described as  
collapsing whitespace, but is not required to. How does this cause  
problems in Asian languages?

Graham


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-04-28 Thread Alexey Melnikov
The IESG wrote:
The IESG has received a request from the Atom Publishing Format and Protocol 
WG to consider the following document:

- 'The Atom Syndication Format'
  draft-ietf-atompub-format-08.txt as a Proposed Standard
The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send any comments to the
iesg@ietf.org or ietf@ietf.org mailing lists by 2005-05-04.
The file can be obtained via
http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt
 

In general the document looks good to me. Some minor comments (and a few 
questions), mostly nitpicking, below:

3.1.1.1  Text

   Example atom:title with text content:

   ...
   <title type="text">
     Less: &lt;
   </title>
   ...

   If the value is text, the content of the Text construct MUST NOT
   contain child elements.  Such text is intended to be presented to
   humans in a readable fashion.  Thus, Atom Processors MAY collapse
   white-space (including line-breaks),
Ok, maybe it is just me, but what does it mean to collapse 
white-space? Does this mean to replace FWS (in RFC 2822 sense) with a 
single space?

 and display the text using
   typographic techniques such as justification and proportional fonts.
4.1.3.3  Processing Model
...
   2.  If the value of type is html, the content of atom:content
   MUST NOT contain child elements, and SHOULD be suitable for
   handling as HTML [HTML].  The HTML markup must be escaped; for
Should the "must" be changed to "MUST" here?
   example, "<br>" as "&lt;br>".  The HTML markup SHOULD be such
   that it could validly appear directly within an HTML DIV
   element.  Atom Processors that display the content MAY use the
   markup to aid in displaying it.
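The "must be escaped" requirement in item 2 is what a generic XML serializer does to text anyway. A minimal sketch with Python's standard xml.etree.ElementTree (the helper name html_content_element is hypothetical): the producer assigns the raw HTML as element text, the serializer escapes it, and a consumer gets the original HTML back by reading the text. ElementTree also escapes ">" as "&gt;", which is equally well-formed.

```python
import xml.etree.ElementTree as ET


def html_content_element(html_fragment):
    """Build <content type="html"> whose text is the HTML fragment; the
    serializer escapes the markup (e.g. <br> becomes &lt;br&gt;)."""
    el = ET.Element("content", type="html")
    el.text = html_fragment  # stored as text, so no child elements are created
    return el


xml_text = ET.tostring(html_content_element("Line one<br>Line two"), encoding="unicode")
print(xml_text)
# A consumer recovers the original HTML by reading the element's text.
print(ET.fromstring(xml_text).text)
```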
...
   6.  For all other values of type, the content of atom:content MUST
   be a valid Base64 encoding [RFC3548], which when decoded SHOULD
I have to note that RFC 3548 has two base64 alphabets: one in section 3 
and one in section 4. You probably want the more common one in section 3, 
but this has to be stated explicitly.
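The difference Alexey points out is easy to see with Python's base64 module, which implements both alphabets: b64encode uses the section 3 alphabet ("+" and "/"), urlsafe_b64encode the section 4 one ("-" and "_"). The three sample bytes are arbitrary, chosen so that every output character falls on one of the two positions where the alphabets differ.

```python
import base64

# Arbitrary bytes whose 6-bit groups are 62, 63, 63, 62: exactly the two
# positions where the RFC 3548 section 3 and section 4 alphabets differ.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data)          # section 3 alphabet
url_safe = base64.urlsafe_b64encode(data)  # section 4 alphabet

print(standard)  # b'+//+'
print(url_safe)  # b'-__-'

# Each encoding decodes back to the same bytes with its matching decoder.
assert base64.b64decode(standard, validate=True) == data
assert base64.urlsafe_b64decode(url_safe) == data
```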

6.3  Software Processing of Foreign Markup

...
   When unknown foreign markup is encountered in a Text Construct or
   atom:content element, software SHOULD ignore the markup and process
   any text content of foreign elements as though the surrounding markup
   were not present.
I reread this paragraph a few times and I am still not quite sure what the 
paragraph is trying to say. Is it trying to say if the content of a 
foreign element looks like XML with unrecognized schema - just strip the 
markup and process the text?
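Alexey's reading ("strip the markup and process the text") can be sketched with ElementTree's itertext(), which yields an element's text together with the text and tails of all its descendants in document order. The namespace http://example.org/unknown and the helper name are placeholders; the sketch strips every tag, whereas a real processor would do so only for markup it does not recognize.

```python
import xml.etree.ElementTree as ET


def text_ignoring_markup(element):
    """Concatenate all character data under element, dropping the tags."""
    return "".join(element.itertext())


doc = ET.fromstring(
    '<content xmlns:x="http://example.org/unknown">'
    "Hello <x:b>big</x:b> world"
    "</content>"
)
print(text_ignoring_markup(doc))
```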

Regards,
Alexey