Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Martin Duerst
Hello Sam, others,
At 22:33 05/05/09, Sam Hartman wrote:
>
>> "Martin" == Martin Duerst <[EMAIL PROTECTED]> writes:
>
>Martin> At 17:29 05/05/07, Henri Sivonen wrote:
>>>  On May 4, 2005, at 04:39, Martin Duerst wrote:
>>>> For free-flowing text, however, the line breaks in the source
>>>> and those in the display are not (necessarily) the same, and
>>>> so linebreaks have to be changed to spaces for Western
>>>> languages, but to nothing for Chinese/Japanese (and most
>>>> possibly to a zero-width non-breaking space for Thai), and the
>>>> spec has to say something about this.
>>>  Why would you put line breaks in the CJK source, then? Isn't
>>> the "problem" solved with the least heuristics by the producer
>>> not putting breaks there?
>
>Martin> People in China, Japan, and so on (Korean uses spaces, so
>Martin> it's not CJK) tend to use similar tools to those in the
>Martin> western world. Tools for editing XML, e.g., usually don't
>Martin> make it easy to edit very long lines because they assume
>Martin> that such long lines can be broken. So it's not as easy as
>Martin> it looks for the producer.
>
>My personal opinion as someone who is very shortly going to have to
>evaluate the atom specification is that you've identified an issue
>(space and line breaking) for some languages that should be
>considered.  Your proposed solution seems highly undesirable in that
>it requires us to understand the language of the text being displayed.
>In the past we've had all sorts of problems doing that.  Your proposed
>solution also seems quite complicated.
It's the solution that is being adopted for HTML and CSS, based on a lot
of experience. See Fantasai's post. This also points to a way to solve
this, similar to how this is being addressed by other XML-based specs
starting with XHTML: Assume that getting space collapsing/removal right
isn't the job of the XML-based format, but the job of the rendering
engine and styling. In that sense, Paul and Scott are right that other
XML formats themselves don't deal with that.
So I think there would be two possible solutions:
- Leave the text as is, with the implication that this includes possible
  removal of all spacing around line ends in some cases. I.e. answering
  Alex's question in the negative, even if on the list only.
- Being more explicit about what is going on. This may include as little
  as saying that the details are up to rendering/styling mechanisms
  such as CSS or XSL-FO, but could go further if deemed necessary.
  In my eyes, this would be desirable, at least to give implementers
  (of viewers, not of "Atom processors") some hints.
A solution that, explicitly or implicitly, says that languages having
word spaces can decently format and indent their XML source, but
languages that don't have word spaces have to use very long lines
in the source for paragraphs, as suggested by some of the commenters
on this list, isn't appropriate at all.
>It may well be that the solutions to this problem are worse than the
>problem itself.  However I think it is important to specifically
>understand that is the case rather than failing to solve the problem
>because we failed to understand it.
The solutions are not worse than the problem itself. But Paul is
right that it may not be a good idea for Atom to deal with them,
because this is an issue on another layer, which shouldn't be
dealt with separately by each format spec.
Regards, Martin.



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Paul Hoffman
At 2:14 PM -0400 5/10/05, Sam Hartman wrote:
> Except that we try to build deployable protocols.  If there aren't
> content creation tools that can do the right thing then it becomes a
> deployment issue for atompub.
True. Fortunately, there have been plenty of text editing tools that 
work with the "no spaces between words" languages for at least 20 
years in the case of Chinese and Japanese Kanji (probably 15 years 
for the other languages).

> A perfectly reasonable response would be that you've thought about and
> understood the problem and there are sufficient tools available that
> can work with your proposed pipe that you don't need to care about the
> issue.
I'll make that response. :-)
--Paul Hoffman, Director
--Internet Mail Consortium


RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Scott Hollenbeck

> A perfectly reasonable response would be that you've thought about and
> understood the problem and there are sufficient tools available that
> can work with your proposed pipe that you don't need to care about the
> issue.

Paul described text that's in the document to describe what MAY be done.  I
would argue that the existing text is evidence of the thought that has gone
into understanding the issue and the alleged problem.

-Scott-



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Sam Hartman

> "Scott" == Scott Hollenbeck <[EMAIL PROTECTED]> writes:

>> >I'm not asking for a lot of text; probably something about as
>> >long as this message.
>> 
>> I believe that it can be a lot shorter: given the rationale
>> above, it's not a problem for Atompub or any other XML-using
>> protocol. For that matter, it's not really an XML problem at
>> all: it affects text formats like HTML and RFC 2822 as well.

Scott> I have to agree with Paul.  I don't believe that the issue
Scott> of white space in the syndicated content is really an
Scott> Atompub issue.  It might be an issue for the content
Scott> creator.  It might be an issue for the reader.  As long as
Scott> the pipe between the two passes the content as submitted,
Scott> though, the pipe has done its job.

Except that we try to build deployable protocols.  If there aren't
content creation tools that can do the right thing then it becomes a
deployment issue for atompub.

A perfectly reasonable response would be that you've thought about and
understood the problem and there are sufficient tools available that
can work with your proposed pipe that you don't need to care about the
issue.

--Sam



RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Paul Hoffman
At 8:16 AM -0700 5/10/05, Walter Underwood wrote:
> If publishers and subscribers have obstacles to using Atom, that sounds
> like a problem to me.
It is a problem, of course.
> "Everyone has this problem" is not a good reason to ignore it.
No one is ignoring it. This thread started because the format draft 
pointed out at least one aspect of the problem, which is more than 
most other RFCs do.

> Someone has to be the first to solve it, might as well be us.
May I suggest that there are groups with more experience in the area 
than ours that would be more appropriate? In specific, since this 
problem affects all internationalized text, the Unicode Consortium 
has a much higher chance of "solving" the problem than an IETF 
Working Group that is focused on a syndication format.

If you have a proposed solution to the problem (you didn't include 
one in your message to the WG), the Unicode Consortium is quite open 
to outside input on this type of thing.

> It is not acceptable
> to build formats for the "English Wide Web". That doesn't exist any more.
That is both grossly insulting to those of us who have spent a great deal 
of time trying to make the Internet internationalization-friendly, 
and is also grossly technically inaccurate, unless you consider every 
written language other than Chinese, Japanese Kanji, Burmese, Khmer, 
Thai, Tagalog, Lao, and Tibetan to be "English". (The folks who speak 
all the other languages might find you calling them "English" to be 
insulting too, of course.)

--Paul Hoffman, Director
--Internet Mail Consortium


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Henri Sivonen
On May 10, 2005, at 18:16, Walter Underwood wrote:
> "Everyone has this problem" is not a good reason to ignore it. Someone
> has to be the first to solve it, might as well be us. It is not acceptable
> to build formats for the "English Wide Web". That doesn't exist any more.
I believe the problem should be addressed in the producer end. Not in 
the wire format nor at the consumer end.

No one is preventing non-English text in Atom--just saying that if you 
don't want stuff to appear in the feed do not put the stuff there and 
expect the consumer to take it out. It's the usual "It hurts when I do 
this. -- Don't do it then." situation.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Scott Hollenbeck

> -----Original Message-----
> From: Walter Underwood [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 10, 2005 11:16 AM
> To: Scott Hollenbeck; 'Paul Hoffman'; iesg@ietf.org; 'Atom WG'
> Subject: RE: Last Call: 'The Atom Syndication Format' to 
> Proposed Standard
> 
> 
> 
> --On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck 
> <[EMAIL PROTECTED]> wrote:
> >
> > I have to agree with Paul.  I don't believe that the issue of white space in
> > the syndicated content is really an Atompub issue.  It might be an issue for
> > the content creator.  It might be an issue for the reader.  As long as the
> > pipe between the two passes the content as submitted, though, the pipe has
> > done its job.
> 
> If publishers and subscribers have obstacles to using Atom, that sounds
> like a problem to me.

Why?  If the "problem" is in the content creation and consumption processes,
why must the conduit provide the solution?  I agree that Atompub shouldn't
make the problem worse, but I don't believe that it should attempt to solve
endpoint problems.

> "Everyone has this problem" is not a good reason to ignore it. Someone
> has to be the first to solve it, might as well be us. It is not acceptable
> to build formats for the "English Wide Web". That doesn't exist any more.

Sorry, but I disagree with the "might as well be us" part.  Atompub is
chartered to define a syndication format and protocol, not to solve the
web's i18n problems.

-Scott-




RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Walter Underwood

--On May 10, 2005 8:57:47 AM -0400 Scott Hollenbeck <[EMAIL PROTECTED]> wrote:
>
> I have to agree with Paul.  I don't believe that the issue of white space in
> the syndicated content is really an Atompub issue.  It might be an issue for
> the content creator.  It might be an issue for the reader.  As long as the
> pipe between the two passes the content as submitted, though, the pipe has
> done its job.

If publishers and subscribers have obstacles to using Atom, that sounds
like a problem to me.

"Everyone has this problem" is not a good reason to ignore it. Someone
has to be the first to solve it, might as well be us. It is not acceptable
to build formats for the "English Wide Web". That doesn't exist any more.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Graham
On 8 May 2005, at 4:30 am, Walter Underwood wrote:
> White space is not particularly meaningful in some of these languages,
> so we cannot expect them to suddenly pay attention to that just so
> they can use Atom. There will be plenty of content from other formats
> with this linguistically meaningless white space.
So the idea is that whitespace should not appear at all in certain  
texts, and you'd like it to be stripped out at the consumer? There  
are only three possible ways for this to happen:

1) The consumer removes all whitespace, even in western texts
2) The consumer recognizes these languages and removes the whitespace  
automatically
3) The consumer is told what to do by an attribute

(1) is obviously not plausible, but included for completeness. (2) is  
impractical. (3) is plausible, but may or may not end up being  
implemented in all consumers, making it kind of useless.

I don't see how there's a better solution than texts that shouldn't  
be shown with whitespace not containing whitespace in the first  
place, which is what we have.

Graham
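
Editorial illustration of option (2) above: a minimal Python sketch of a consumer that looks at the characters on either side of a source line break and drops the break when both belong to a script written without word spaces. The function names and the prefix list are assumptions made for this example; they come from neither the Atom draft nor any existing consumer, and the list is far from complete. The sketch only touches white-space runs that contain a line break, so a plain space typed by the author is left alone.

```python
import re
import unicodedata

# Assumption: a character counts as belonging to an "unspaced" script when
# its Unicode name starts with one of these prefixes. Illustrative only.
_UNSPACED_PREFIXES = ("CJK", "IDEOGRAPHIC", "HIRAGANA", "KATAKANA", "THAI")

# A white-space run that contains at least one CR or LF.
_BREAK_RUN = re.compile(r"[ \t]*[\r\n]+[ \t\r\n]*")


def is_unspaced(ch):
    """Return True if ch looks like it belongs to a script without word spaces."""
    return unicodedata.name(ch, "").startswith(_UNSPACED_PREFIXES)


def collapse_by_script(text):
    """Collapse line-break runs: to nothing between unspaced characters,
    to a single space everywhere else."""
    text = text.strip(" \t\r\n")

    def replace(match):
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        if before and after and is_unspaced(before) and is_unspaced(after):
            return ""
        return " "

    return _BREAK_RUN.sub(replace, text)
```

Even this small version shows why the option was called impractical in the thread: the consumer has to carry script knowledge, and mixed-script boundaries (handled here by falling back to a space) need a policy of their own.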


RE: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-10 Thread Scott Hollenbeck

> >I'm not asking for a lot of text; probably something about as long as
> >this message.
> 
> I believe that it can be a lot shorter: given the rationale above, 
> it's not a problem for Atompub or any other XML-using protocol. For 
> that matter, it's not really an XML problem at all: it affects text 
> formats like HTML and RFC 2822 as well.

I have to agree with Paul.  I don't believe that the issue of white space in
the syndicated content is really an Atompub issue.  It might be an issue for
the content creator.  It might be an issue for the reader.  As long as the
pipe between the two passes the content as submitted, though, the pipe has
done its job.

-Scott-




Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread fantasai
Henri Sivonen wrote:
> On May 8, 2005, at 06:30, Walter Underwood wrote:
>> White space is not particularly meaningful in some of these languages,
>> so we cannot expect them to suddenly pay attention to that just so
>> they can use Atom.
> Why not? We expect them not to insert other random characters there.
> What do the same producers do with XHTML? Opera 7.53 and Safari 1.3
> render a space between the second and third Kanji in
> http://hsivonen.iki.fi/test/cjk-whitespace.xhtml
See also Ishida's tests:
http://www.w3.org/International/tests/results/white-space-ideograph
Special handling of white-space in CJK context is accounted for in the
CSS2.1 spec (and will be described in more detail in CSS3 Text).
>> There will be plenty of content from other formats
>> with this linguistically meaningless white space.
> Why not just get rid of it in the producer end like you have to get rid
> of form feeds?
Because form feeds are normally not used in source code files whereas
line breaks and indentation often are?
~fantasai


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Paul Hoffman
At 9:33 AM -0400 5/9/05, Sam Hartman wrote:
> My personal opinion as someone who is very shortly going to have to
> evaluate the atom specification is that you've identified an issue
> (space and line breaking) for some languages that should be
> considered.  Your proposed solution seems highly undesirable in that
> it requires us to understand the language of the text being displayed.
> In the past we've had all sorts of problems doing that.  Your proposed
> solution also seems quite complicated.
Fully agree. Please note the text in the spec we are working from:
   If the value is "text", the content of the Text construct MUST NOT
   contain child elements.  Such text is intended to be presented to
   humans in a readable fashion.  Thus, Atom Processors MAY collapse
   white-space (including line-breaks), and display the text using
   typographic techniques such as justification and proportional fonts.
FWIW, this appears twice, identically, in the spec.
Martin Dürst brought up CJK (well, actually CJT), saying that they 
don't use inter-word spacing. That's fine, but it is irrelevant to 
the text in the draft. If some text comes through with no spaces, 
there is no white space to collapse. His argument that some XML 
editors make long lines of text difficult to edit is clearly *way* 
out of scope for Atom, or any other XML-using protocol for that 
matter.

> It may well be that the solutions to this problem are worse than the
> problem itself.  However I think it is important to specifically
> understand that is the case rather than failing to solve the problem
> because we failed to understand it.
The "case" is that text that is supposed to be read by humans comes 
in many forms, with different line lengths, and so on. The paragraph 
from the spec says that Atom processors may alter these so that they 
can be presented better for the reader. Of course, they may also 
alter it to make it less readable, as many mail user agents do.
Regardless, this says that the Atom processor is free to 
present things in text constructs in any fashion it deems suitable. 
This is particularly important for making Atom content accessible; 
for example, the Atom processor can use this rule to present text 
content by reading it aloud, by putting it on a screen greatly 
magnified one character at a time, and so on.

> At least based on the discussion the IESG has been copied on, it
> doesn't sound like the working group has fully considered this issue.
> The responses have more of the character of those found from people
> trying to brush aside an issue than of people who have carefully
> considered something and concluded there is nothing to be done.
Sorry, but that's unfair. Alexey asked "Ok, maybe it is just me, but 
what does it mean to "collapse white-space"? Does this mean to 
replace FWS (in RFC 2822 sense) with a single space?" Martin's 
response was orthogonal: "Making this more precise is definitely 
desirable. But there is also an i18n issue: This works fine for 
languages that use spaces between words." The rest of the thread 
wandered into the weeds because it was hard to figure out what was 
being discussed.

> Moreover, this issue cannot be unique to atom: it must affect many
> XML based protocols both within the IETF and within other standards
> organizations.
Any protocol that has XML that includes human-readable text has this 
issue. Well, the processors of that XML do; the protocols 
themselves do not.

> Anyway as someone evaluating atompub's output it would be very useful
> if the working group responded to this last call comment.  In my mind
> a response would start with a researched description of the issue:
> either confirm that Chinese and Japanese and Thai tools work as
> described or explain how they actually work.  Then describe what other
> standards have done about this problem.  Finally describe what atompub
> has done about the problem and why.
>
> I'm not asking for a lot of text; probably something about as long as
> this message.
I believe that it can be a lot shorter: given the rationale above, 
it's not a problem for Atompub or any other XML-using protocol. For 
that matter, it's not really an XML problem at all: it affects text 
formats like HTML and RFC 2822 as well.

--Paul Hoffman, Director
--Internet Mail Consortium
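
Editorial illustration: the draft text quoted above only says that Atom Processors MAY collapse white-space (including line-breaks) and does not define the operation. One plausible reading, offered here purely as an assumption, is the usual XML-style normalization: every run of space, tab, CR and LF becomes a single space and the ends are trimmed. The function name is made up for this sketch.

```python
import re

# Runs of the four XML white-space characters: space, tab, CR, LF.
_XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text):
    """Replace each white-space run with one space and trim both ends.

    This is one possible interpretation of "collapse white-space";
    the draft itself leaves the exact meaning open.
    """
    return _XML_WS_RUN.sub(" ", text).strip(" ")
```

Applied to wrapped English text this gives the expected single-line result. Applied to Japanese text that was wrapped in the source, it leaves a space where the line break was, which is the behaviour the i18n objection in this thread is about.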


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Robert Sayre

On 5/9/05, Sam Hartman <[EMAIL PROTECTED]> wrote:
> At least based on the discussion the IESG has been copied on, it
> doesn't sound like the working group has fully considered this issue.
> The responses have more of the character of those found from people
> trying to brush aside an issue than of people who have carefully
> considered something and concluded there is nothing to be done.
> 
> Moreover, this issue cannot be unique to atom: it must affect many
> XML based protocols both within the IETF and within other standards
> organizations.

Martin,

I agree with Sam on both points. Can you give us an example of an XML
format that successfully deals with your issue?

Does XHTML differ from Atom?

Robert Sayre



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Sam Hartman

> "Martin" == Martin Duerst <[EMAIL PROTECTED]> writes:

Martin> At 17:29 05/05/07, Henri Sivonen wrote:
>>  On May 4, 2005, at 04:39, Martin Duerst wrote:
>>> For free-flowing text, however, the line breaks in the source
>>> and those in the display are not (necessarily) the same, and
>>> so linebreaks have to be changed to spaces for Western
>>> languages, but to nothing for Chinese/Japanese (and most
>>> possibly to a zero-width non-breaking space for Thai), and the
>>> spec has to say something about this.
>>  Why would you put line breaks in the CJK source, then? Isn't
>> the "problem" solved with the least heuristics by the producer
>> not putting breaks there?

Martin> People in China, Japan, and so on (Korean uses spaces, so
Martin> it's not CJK) tend to use similar tools to those in the
Martin> western world. Tools for editing XML, e.g., usually don't
Martin> make it easy to edit very long lines because they assume
Martin> that such long lines can be broken. So it's not as easy as
Martin> it looks for the producer.

My personal opinion as someone who is very shortly going to have to
evaluate the atom specification is that you've identified an issue
(space and line breaking) for some languages that should be
considered.  Your proposed solution seems highly undesirable in that
it requires us to understand the language of the text being displayed.
In the past we've had all sorts of problems doing that.  Your proposed
solution also seems quite complicated.

It may well be that the solutions to this problem are worse than the
problem itself.  However I think it is important to specifically
understand that is the case rather than failing to solve the problem
because we failed to understand it.

At least based on the discussion the IESG has been copied on, it
doesn't sound like the working group has fully considered this issue.
The responses have more of the character of those found from people
trying to brush aside an issue than of people who have carefully
considered something and concluded there is nothing to be done.

Moreover, this issue cannot be unique to atom: it must affect many
XML based protocols both within the IETF and within other standards
organizations.

It may be that the right people haven't spoken up or that the right
part of the discussion hasn't been copied to the IESG.  I'd sort of
expect the apps ADs and atompub chairs to be familiar with these
issues but so far have not seen them chime in.


Anyway as someone evaluating atompub's output it would be very useful
if the working group responded to this last call comment.  In my mind
a response would start with a researched description of the issue:
either confirm that Chinese and Japanese and Thai tools work as
described or explain how they actually work.  Then describe what other
standards have done about this problem.  Finally describe what atompub
has done about the problem and why.  

I'm not asking for a lot of text; probably something about as long as
this message.


Such a response would make evaluating this issue much easier when the
document comes before the IESG.



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Thomas Broyer
A. Pagaltzis wrote:
> * Thomas Broyer <[EMAIL PROTECTED]> [2005-05-03 19:35]:
>> This means type="text" content is a single paragraph of text.
>> If you need paragraphs, lists or any other "structural
>> formatting", you have to use type="html" or type="xhtml" with
>> the appropriate content.
>
> Or type="text/plain", I'd assume?
If you're talking about atom:content, not for Text Constructs.
--
Thomas Broyer


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Henri Sivonen

On May 8, 2005, at 09:38, Martin Duerst wrote:
>> Why would you put line breaks in the CJK source, then? Isn't the
>> "problem" solved with the least heuristics by the producer not
>> putting breaks there?
> People in China, Japan, and so on (Korean uses spaces, so it's not CJK)
> tend to use similar tools to those in the western world. Tools for
> editing XML, e.g., usually don't make it easy to edit very long lines
> because they assume that such long lines can be broken. So it's not
> as easy as it looks for the producer.
If the XML editing tools are broken, they should be fixed instead of 
requiring consumers to perform context-sensitive DWIM. Besides, Atom 
feeds are typically produced programmatically instead of editing 
manually, so I think the point about editors is moot. The content in 
Atom feeds may be manually edited, but putting stray line feeds in 
XHTML is not safe, so producers need to take care of not including 
stray line feeds anyway.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-09 Thread Martin Duerst
At 17:29 05/05/07, Henri Sivonen wrote:
>
>On May 4, 2005, at 04:39, Martin Duerst wrote:
>
>> For free-flowing text, however, the line breaks in the source and those in
>> the display are not (necessarily) the same, and so linebreaks have to be
>> changed to spaces for Western languages, but to nothing for Chinese/Japanese
>> (and most possibly to a zero-width non-breaking space for Thai), and the spec
>> has to say something about this.
>
>Why would you put line breaks in the CJK source, then? Isn't the
>"problem" solved with the least heuristics by the producer not putting
>breaks there?

People in China, Japan, and so on (Korean uses spaces, so it's not CJK)
tend to use similar tools to those in the western world. Tools for
editing XML, e.g., usually don't make it easy to edit very long lines
because they assume that such long lines can be broken. So it's not
as easy as it looks for the producer.
Regards, Martin.



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-08 Thread Henri Sivonen
On May 8, 2005, at 06:30, Walter Underwood wrote:
> White space is not particularly meaningful in some of these languages,
> so we cannot expect them to suddenly pay attention to that just so
> they can use Atom.
Why not? We expect them not to insert other random characters there. 
What do the same producers do with XHTML? Opera 7.53 and Safari 1.3 
render a space between the second and third Kanji in
http://hsivonen.iki.fi/test/cjk-whitespace.xhtml

> There will be plenty of content from other formats
> with this linguistically meaningless white space.
Why not just get rid of it in the producer end like you have to get rid 
of form feeds?

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Walter Underwood

--On May 7, 2005 11:29:07 AM +0300 Henri Sivonen <[EMAIL PROTECTED]> wrote:
>
> Why would you put line breaks in the CJK source, then? Isn't the "problem"
> solved with the least heuristics by the producer not putting breaks there?

It would be even better if they would just speak English. :-)

White space is not particularly meaningful in some of these languages,
so we cannot expect them to suddenly pay attention to that just so
they can use Atom. There will be plenty of content from other formats
with this linguistically meaningless white space.

If we get this wrong, Atom-delivered content will look broken in
some languages, and a bunch of extra-spec practice will build up about
how to fix it. Much better to get it right in 1.0.

wunder
--
Walter Underwood
Principal Architect, Verity



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-07 Thread Henri Sivonen
On May 4, 2005, at 04:39, Martin Duerst wrote:
> For free-flowing text, however, the line breaks in the source and those in
> the display are not (necessarily) the same, and so linebreaks have to be
> changed to spaces for Western languages, but to nothing for Chinese/Japanese
> (and most possibly to a zero-width non-breaking space for Thai), and the spec
> has to say something about this.
Why would you put line breaks in the CJK source, then? Isn't the 
"problem" solved with the least heuristics by the producer not putting 
breaks there?

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-06 Thread Martin Duerst
At 04:32 05/05/04, Graham wrote:
>
>On 28 Apr 2005, at 7:33 pm, Alexey Melnikov wrote:
>
>> Ok, maybe it is just me, but what does it mean to "collapse
>> white-space"? Does this mean to replace FWS (in RFC 2822 sense) with a
>> single space?
>
>Since the statement is a MAY, I don't think any exact meaning is
>necessary. It's simply a hint to publishers that whitespace may not
>be preserved.
>
>
>On 29 Apr 2005, at 10:17 am, Martin Duerst wrote:
>
>> Making this more precise is definitely desirable. But there is also
>> an i18n issue: This works fine for languages that use spaces between
>> words. It doesn't work for languages that don't have spaces between
>> words (Chinese, Japanese, Thai,...). If Text elements are only used
>> for short things such as names or titles, that's not a big issue,
>> the text in question can just be put on a single line. However,
>> when the texts in question are long, it's a serious issue, and
>> should be fixed.
>
>A consumer may do anything that can reasonably be described as
>"collapsing whitespace", but is not required to. How does this cause
>problems in Asian languages?

If the consumer does the right thing, it won't cause problems.
But the chance that the consumer does the right thing without
being told what this is (or, without being told that this is
different for different languages or scripts) is rather low.
So we had better improve the spec to help consumers do a better
job.
Regards, Martin.



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-06 Thread Martin Duerst
At 02:27 05/05/04, Thomas Broyer wrote:
>
>Martin Duerst wrote:
>> At 03:33 05/04/29, Alexey Melnikov wrote:
>>  > >   If the value is "text", the content of the Text construct MUST NOT
>>  > >   contain child elements.  Such text is intended to be presented to
>>  > >   humans in a readable fashion.  Thus, Atom Processors MAY collapse
>>  > >   white-space (including line-breaks),
>>  >
>>  >Ok, maybe it is just me, but what does it mean to "collapse
>>  >white-space"? Does this mean to replace FWS (in RFC 2822 sense) with a
>>  >single space?
>> Making this more precise is definitely desirable. But there is also
>> an i18n issue: This works fine for languages that use spaces between
>> words. It doesn't work for languages that don't have spaces between
>> words (Chinese, Japanese, Thai,...). If Text elements are only used
>> for short things such as names or titles, that's not a big issue,
>> the text in question can just be put on a single line. However,
>> when the texts in question are long, it's a serious issue, and
>> should be fixed.
>
>My understanding of type="text" is that this is "just text" without any
>"formatting".

That's my understanding, too.
>Hence, it is not meant to be "preformatted text" such as text/plain or
>inside an (X)HTML "pre".

Yes. But that's exactly where the spacing problems with Chinese/Japanese/Thai
are. There are no such problems for preformatted text, because the line breaking
in the source (as sent) is the same as the line breaking when displayed.
For free-flowing text, however, the line breaks in the source and those in
the display are not (necessarily) the same, and so linebreaks have to be
changed to spaces for Western languages, but to nothing for Chinese/Japanese
(and most possibly to a zero-width non-breaking space for Thai), and the spec
has to say something about this.
Regards, Martin.
>This means type="text" content is a single paragraph of text. If you
>need paragraphs, lists or any other "structural formatting", you have to
>use type="html" or type="xhtml" with the appropriate content.
>
>I was about to write a Pace about white-space handling in type="text"
>(either using xml:space or an attribute that would have mimicked the
>"white-space" CSS property) when I understood/recalled that Text
>Constructs have accessibility in mind (hence their limitation to textual
>contents) and preformatted text is not accessible enough.
>
>--
>Thomas Broyer
> 
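
Editorial illustration of the rule described in the message above (line breaks become spaces for Western languages, nothing for Chinese/Japanese, a zero-width character for Thai): a minimal Python sketch keyed on a language tag such as the value of xml:lang. The mapping table and function name are hypothetical. U+200B ZERO WIDTH SPACE is used for Thai as an assumption; the message only says "most possibly" a zero-width character, and nothing in the draft specifies this.

```python
import re

# A white-space run that contains at least one CR or LF.
_BREAK_RUN = re.compile(r"[ \t]*[\r\n]+[ \t\r\n]*")

# Hypothetical mapping from primary language subtag to the string that
# replaces a source line break. Anything not listed gets a plain space.
_JOINERS = {
    "zh": "",
    "ja": "",
    "th": "\u200b",  # assumption: ZERO WIDTH SPACE as a break opportunity
}


def join_source_lines(text, lang):
    """Replace source line breaks according to the language tag."""
    primary = lang.split("-")[0].lower()
    joiner = _JOINERS.get(primary, " ")
    return _BREAK_RUN.sub(joiner, text.strip(" \t\r\n"))
```

For example, `join_source_lines("one\n  two", "en")` yields `"one two"`, while the same call with `"ja"` on wrapped Japanese text joins the lines with nothing in between. The sketch depends entirely on the language tag being present and correct, which is the dependency the objections elsewhere in this thread are aimed at.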



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-05 Thread A. Pagaltzis

* Thomas Broyer <[EMAIL PROTECTED]> [2005-05-03 19:35]:
> This means type="text" content is a single paragraph of text.
> If you need paragraphs, lists or any other "structural
> formatting", you have to use type="html" or type="xhtml" with
> the appropriate content.

Or type="text/plain", I'd assume?

Regards,
-- 
Aristotle



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-04 Thread Henri Sivonen
On May 4, 2005, at 12:05, Brian E Carpenter wrote:
> Henri Sivonen wrote:
>> On Apr 29, 2005, at 12:17, Martin Duerst wrote:
>>> Making this more precise is definitely desirable. But there is also
>>> an i18n issue: This works fine for languages that use spaces between
>>> words. It doesn't work for languages that don't have spaces between
>>> words (Chinese, Japanese, Thai,...). If Text elements are only used
>>> for short things such as names or titles, that's not a big issue,
>>> the text in question can just be put on a single line. However,
>>> when the texts in question are long, it's a serious issue, and
>>> should be fixed.
>>
>> You seem to be assuming that the length of a "line" is restricted in
>> XML source. Why? As far as I can tell, it should be permissible to
>> produce Atom documents that contain no LF or CR characters.
>> Can't languages without spaces use long source "lines" and apply soft
>> wrapping in a source view if necessary? Why is this a wire format
>> problem?
>
> Are you suggesting that a canonical format without CRLF should be
> mandatory?

No, not mandatory, although I expect to produce such feeds myself (even 
in English or Finnish). I am suggesting that pretty-printing line 
breaks should not be introduced in places where normalizing them to a 
space would be inappropriate. So if you have a long string of Chinese, 
Japanese or Thai and you don't want spaces in that string, just don't 
put white space there.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Henri Sivonen
On Apr 29, 2005, at 12:17, Martin Duerst wrote:
> Making this more precise is definitely desirable. But there is also
> an i18n issue: This works fine for languages that use spaces between
> words. It doesn't work for languages that don't have spaces between
> words (Chinese, Japanese, Thai,...). If Text elements are only used
> for short things such as names or titles, that's not a big issue,
> the text in question can just be put on a single line. However,
> when the texts in question are long, it's a serious issue, and
> should be fixed.

You seem to be assuming that the length of a "line" is restricted in 
XML source. Why? As far as I can tell, it should be permissible to 
produce Atom documents that contain no LF or CR characters.

Can't languages without spaces use long source "lines" and apply soft 
wrapping in a source view if necessary? Why is this a wire format 
problem?

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Graham
On 28 Apr 2005, at 7:33 pm, Alexey Melnikov wrote:
> Ok, maybe it is just me, but what does it mean to "collapse
> white-space"? Does this mean to replace FWS (in RFC 2822 sense) with a
> single space?

Since the statement is a MAY, I don't think any exact meaning is
necessary. It's simply a hint to publishers that whitespace may not
be preserved.

On 29 Apr 2005, at 10:17 am, Martin Duerst wrote:

> Making this more precise is definitely desirable. But there is also
> an i18n issue: This works fine for languages that use spaces between
> words. It doesn't work for languages that don't have spaces between
> words (Chinese, Japanese, Thai,...). If Text elements are only used
> for short things such as names or titles, that's not a big issue,
> the text in question can just be put on a single line. However,
> when the texts in question are long, it's a serious issue, and
> should be fixed.

A consumer may do anything that can reasonably be described as
"collapsing whitespace", but is not required to. How does this cause
problems in Asian languages?

Graham


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Thomas Broyer
Martin Duerst wrote:
At 03:33 05/04/29, Alexey Melnikov wrote:
 > >   If the value is "text", the content of the Text construct MUST NOT
 > >   contain child elements.  Such text is intended to be presented to
 > >   humans in a readable fashion.  Thus, Atom Processors MAY collapse
 > >   white-space (including line-breaks),
 >
 >Ok, maybe it is just me, but what does it mean to "collapse 
 >white-space"? Does this mean to replace FWS (in RFC 2822 sense) with a 
 >single space?

> Making this more precise is definitely desirable. But there is also
> an i18n issue: This works fine for languages that use spaces between
> words. It doesn't work for languages that don't have spaces between
> words (Chinese, Japanese, Thai,...). If Text elements are only used
> for short things such as names or titles, that's not a big issue,
> the text in question can just be put on a single line. However,
> when the texts in question are long, it's a serious issue, and
> should be fixed.

My understanding of type="text" is that this is "just text" without any 
"formatting". Hence, it is not meant to be "preformatted text" such as 
text/plain or inside an (X)HTML "pre".

This means type="text" content is a single paragraph of text. If you 
need paragraphs, lists or any other "structural formatting", you have to 
use type="html" or type="xhtml" with the appropriate content.

I was about to write a Pace about white-space handling in type="text" 
(either using xml:space or an attribute that would have mimicked the 
"white-space" CSS property) when I understood/recalled that Text 
Constructs have accessibility in mind (hence their limitation to textual 
contents) and preformatted text is not accessible enough.

--
Thomas Broyer


Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Robert Sayre

On 4/29/05, Martin Duerst <[EMAIL PROTECTED]> wrote:
> At 03:33 05/04/29, Alexey Melnikov wrote:
>  >
>  >Ok, maybe it is just me, but what does it mean to "collapse white-space"?
>  >Does this mean to replace FWS (in RFC 2822 sense) with a single space?
> 
> Making this more precise is definitely desirable. But there is also
> an i18n issue: This works fine for languages that use spaces between
> words. It doesn't work for languages that don't have spaces between
> words (Chinese, Japanese, Thai,...). If Text elements are only used
> for short things such as names or titles, that's not a big issue,
> the text in question can just be put on a single line. However,
> when the texts in question are long, it's a serious issue, and
> should be fixed.

I believe the intent of this text was to match HTML's text treatment,
so that implementations can avoid preprocessing whitespace.

http://www.w3.org/TR/html4/struct/text.html#h-9.1

Suggestions for less vague text are welcome, but I want to make sure
the text remains comprehensible to non-experts.

Robert Sayre
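
As a rough illustration of the HTML-style treatment referred to above, the following Python sketch collapses each run of white space into a single space. The function name is hypothetical, the set of characters treated as white space (space, tab, form feed, CR, LF) is an assumption loosely following HTML 4.01 section 9.1, and trimming the ends is a choice of this sketch, not something the draft states.

```python
import re

# Assumption: these characters count as white space for collapsing purposes.
_WS_RUN = re.compile(r"[ \t\f\r\n]+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of white space with one space and trim both ends."""
    return _WS_RUN.sub(" ", text).strip()
```

Applied to a pretty-printed title such as a line break, some indentation, "Less: <" and another line break, this yields just "Less: <". Note that it always inserts a space, which is exactly the behaviour the i18n comment in this thread objects to for Chinese and Japanese text.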



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-05-03 Thread Martin Duerst
At 03:33 05/04/29, Alexey Melnikov wrote:
>>The file can be obtained via
>>http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt
> >3.1.1.1  Text
> >   If the value is "text", the content of the Text construct MUST NOT
> >   contain child elements.  Such text is intended to be presented to
> >   humans in a readable fashion.  Thus, Atom Processors MAY collapse
> >   white-space (including line-breaks),
>
>Ok, maybe it is just me, but what does it mean to "collapse white-space"?
>Does this mean to replace FWS (in RFC 2822 sense) with a single space?

Making this more precise is definitely desirable. But there is also
an i18n issue: This works fine for languages that use spaces between
words. It doesn't work for languages that don't have spaces between
words (Chinese, Japanese, Thai,...). If Text elements are only used
for short things such as names or titles, that's not a big issue,
the text in question can just be put on a single line. However,
when the texts in question are long, it's a serious issue, and
should be fixed.
> > and display the text using
> >   typographic techniques such as justification and proportional fonts.
>
>
> >4.1.3.3  Processing Model
>...
> >   2.  If the value of "type" is "html", the content of atom:content
> >   MUST NOT contain child elements, and SHOULD be suitable for
> >   handling as HTML [HTML].  The HTML markup must be escaped; for
>
>Should the "must" be changed to MUST here?
Yes, please!
> >6.3  Software Processing of Foreign Markup
> >
>...
> >   When unknown foreign markup is encountered in a Text Contruct or
> >   atom:content element, software SHOULD ignore the markup and process
> >   any text content of foreign elements as though the surrounding markup
> >   were not present.
>
>I reread this paragraph few times and I am still not quite sure what the
>paragraph is trying to say. Is it trying to say "if the content of a
>foreign element looks like XML with unrecognized schema - just strip the
>markup and process the text"?

Reading this, I got confused because we have both "Text Construct"
and "Text" as subtitles. I suggest changing the subtitle "Text" to
something like "Text Construct with type='text'" or so. Also, starting
a section with just an example looks weird. Please add an introductory
sentence. The same applies, of course, to the parallel subsections.
Regards, Martin.
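
One possible reading of the section 6.3 text quoted above ("process any text content of foreign elements as though the surrounding markup were not present") can be sketched in Python as follows. The function name, the sample element, and the example namespace URI are hypothetical; this shows only the "strip the markup, keep the text" interpretation that the question suggests, not a statement of what the draft requires.

```python
import xml.etree.ElementTree as ET


def text_ignoring_markup(xml_fragment: str) -> str:
    """Return the text content of an element, dropping all child markup."""
    root = ET.fromstring(xml_fragment)
    # itertext() walks the element and its descendants in document order,
    # yielding only character data.
    return "".join(root.itertext())


# Hypothetical content element containing an unknown foreign element.
sample = (
    '<content xmlns:x="http://example.org/unknown">'
    "Hello <x:blink>foreign</x:blink> world"
    "</content>"
)
result = text_ignoring_markup(sample)
```

Here `result` is the string "Hello foreign world": the foreign element's tags are gone and its text is kept in place.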



Re: Last Call: 'The Atom Syndication Format' to Proposed Standard

2005-04-28 Thread Alexey Melnikov
The IESG wrote:
> The IESG has received a request from the Atom Publishing Format and Protocol
> WG to consider the following document:
>
> - 'The Atom Syndication Format'
>    as a Proposed Standard
>
> The IESG plans to make a decision in the next few weeks, and solicits
> final comments on this action.  Please send any comments to the
> iesg@ietf.org or ietf@ietf.org mailing lists by 2005-05-04.
>
> The file can be obtained via
> http://www.ietf.org/internet-drafts/draft-ietf-atompub-format-08.txt
 

In general the document looks good to me. Some minor comments (and few 
questions), mostly nitpicking below:

>3.1.1.1  Text
>
>   Example atom:title with text content:
>
>   ...
>   <title type="text">
>     Less: &lt;
>   </title>
>   ...
>
>   If the value is "text", the content of the Text construct MUST NOT
>   contain child elements.  Such text is intended to be presented to
>   humans in a readable fashion.  Thus, Atom Processors MAY collapse
>   white-space (including line-breaks),
Ok, maybe it is just me, but what does it mean to "collapse 
white-space"? Does this mean to replace FWS (in RFC 2822 sense) with a 
single space?

> and display the text using
>   typographic techniques such as justification and proportional fonts.
>4.1.3.3  Processing Model
...
>   2.  If the value of "type" is "html", the content of atom:content
>   MUST NOT contain child elements, and SHOULD be suitable for
>   handling as HTML [HTML].  The HTML markup must be escaped; for
Should the "must" be changed to MUST here?

>   example, "<br>" as "&lt;br>".  The HTML markup SHOULD be such
>   that it could validly appear directly within an HTML <DIV>
>   element.  Atom Processors that display the content MAY use the
>   markup to aid in displaying it.
>...
>   6.  For all other values of "type", the content of atom:content MUST
>   be a valid Base64 encoding [RFC3548], which when decoded SHOULD

I have to note that RFC 3548 has two base64 alphabets: in section 3
and in section 4. You probably want the more common one in section 3,
but this has to be stated explicitly.

>6.3  Software Processing of Foreign Markup
>
>   ...
>   When unknown foreign markup is encountered in a Text Contruct or
>   atom:content element, software SHOULD ignore the markup and process
>   any text content of foreign elements as though the surrounding markup
>   were not present.

I reread this paragraph few times and I am still not quite sure what the
paragraph is trying to say. Is it trying to say "if the content of a
foreign element looks like XML with unrecognized schema - just strip the
markup and process the text"?

Regards,
Alexey