Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Martin J. Dürst

Hello Karl,

On 2012/07/21 0:41, Karl Pentzlin wrote:

Looking for an example of "plain text" which is obvious to anybody,
it seems to me that the "Subject" field of e-mails is a good example.
Common e-mail software lets you enter any text but never gives you
access to any higher-level protocol. Possibly you can select the font
in which the subject line is shown, but this is completely independent
of the font in which your subject line is shown to the recipient.
Thus, you transfer here plain text, and you can use exactly the
characters which either Unicode provides to you, or which are PUA
characters which you have agreed upon with the recipient before.

In fact, the de-facto-standard regulating the e-mail content (RFC 2822,
dated April 2001 http://www.ietf.org/rfc/rfc2822.txt , afaik)


No. If you go to http://tools.ietf.org/html/rfc2822, you'll see
Obsoleted by: 5322, Updated by: 5335, 5336.
RFC 5322 is the new version, dated October 2008, but doesn't change much.
RFC 5335 and 5336 are experimental for encoding the Subject (and a lot 
of other fields) as raw UTF-8 if the email infrastructure supports it. 
There are Standards Track updates for these two, RFC 6531 and 6532.


But what's more important for your question, at least in theory, is 
http://tools.ietf.org/html/rfc2231, which defines a way to add language 
information to header fields such as Subject:. With such information, it 
would cease to be plain text.
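
To make the RFC 2231 point concrete: it extends the encoded-word syntax so that a language tag may follow the charset after an asterisk, as in =?US-ASCII*EN?Q?Keith_Moore?=. The following is only a hand-written sketch of that syntax, not a library API; the names ENCODED_WORD and parse_encoded_word are made up for this illustration, and the regular expression is deliberately simplified.

```python
import base64
import quopri
import re

# Simplified pattern: =?charset[*language]?encoding?encoded-text?=
# (hypothetical helper for illustration, not a complete RFC 2047/2231 parser)
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?"
    r"\?(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)

def parse_encoded_word(word):
    """Return (decoded text, language tag or None) for one encoded-word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word: %r" % word)
    raw = m["text"].encode("ascii")
    if m["enc"].lower() == "q":
        # header=True makes "_" decode to a space, as the Q encoding requires
        data = quopri.decodestring(raw, header=True)
    else:
        data = base64.b64decode(raw)
    return data.decode(m["charset"]), m["lang"]

print(parse_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?="))
print(parse_encoded_word("=?iso-8859-1?q?D=FCrst?="))
```

The second value carries no language tag, so only the first one goes beyond what a bare character sequence can express.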


In practice, RFC 2231 is not well known, and even less used, so except 
for detailed technical discussion, your example should be good enough.


Regards,   Martin.



defines the content of the "Subject" line as "unstructured" (p.25),
which means that it has to consist of US-ASCII characters, which in
turn can denote other (e.g. Unicode) characters by the application of
MIME protocols. Thus, the result is an unstructured character
sequence.

There is e.g. no possibility to include superscripted/subscripted
characters in a "Subject" of an e-mail, unless these characters are
in fact included as superscript/subscript characters in Unicode
directly.

Thus, proving the necessity to include a character in the text of a
"Subject" line of an e-mail, is proving that the character has to be
available as a plain text character. If, additionally, the character
is used outside a closed group (which can be advised to use PUA
characters), then there is a valid argument to include such a
character in Unicode.

Is my assumption correct?

(I think of the SUBSCRIPT SOLIDUS proposed in WG2 N3980.
  It is in fact annoying that you cannot address DIN EN 13501
  requirements in an e-mail subject line written correctly,
  as Unicode, although being an industry standard, until now
  did not listen to an industry request at this special topic.)

- Karl







Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Asmus Freytag

On 7/20/2012 1:34 PM, Jukka K. Korpela wrote:

2012-07-20 20:19, Asmus Freytag wrote:


On 7/20/2012 8:41 AM, Karl Pentzlin wrote:

Looking for an example of "plain text" which is obvious to anybody,
it seems to me that the "Subject" field of e-mails is a good example.


By common convention, certain notational features have been relegated to
styled text. Super and subscript in mathematical, chemical and other
notation belong to that class.


I’m afraid I don’t quite follow.


Yeah, I think in this case you missed the point of what I was trying to say.

A./



Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Jukka K. Korpela

2012-07-20 20:19, Asmus Freytag wrote:


On 7/20/2012 8:41 AM, Karl Pentzlin wrote:

Looking for an example of "plain text" which is obvious to anybody,
it seems to me that the "Subject" field of e-mails is a good example.


By common convention, certain notational features have been relegated to
styled text. Super and subscript in mathematical, chemical and other
notation belong to that class.


I’m afraid I don’t quite follow. Superscripts and subscripts can be 
presented using styling or other higher-level protocols, or specialized 
superscript or subscript characters can be used, in many cases. But this 
does not seem to be relevant to the question whether “Subject” fields 
are a good example of plain text.



A much stronger case than subject lines are regulatory databases with
plain-text fields in their records.


It’s part of the database design to decide whether fields are plain 
text, so I don’t quite get the point. Sometimes people would like plain 
text to cover things that do not exist as Unicode characters now, but 
that’s a different topic.



If the users for which such "near plain text" notations are part of
their daily work were to report that subject lines, database "plain
text" fields and other such bottlenecks are causing serious issues, then
I think Unicode and WG2 should listen carefully.


Instead of getting into theoretical considerations of “near plain text”, 
I think the question is whether there is sufficient evidence of 
real-life needs for new subscript or superscript characters. In general, 
coding of new characters requires demonstrated *use* of symbols as text 
characters, rather than arguments about *need* to use them. But even the 
need is questionable: e-mail headings are supposed to be short texts 
that tell what the message is about, not complicated formulas. And it’s 
part of database design to decide that you use some fields for some 
purposes and make them plain text fields, instead of (somehow) allowing 
styling inside them.


Yucca






RE: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Shawn Steele
A) it can use quoted-printable
B) See RFC 6532/6530 - Now it can be UTF-8 :)
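
A small sketch of the difference between A) and B), using Python's standard email package. The helper name subject_bytes is made up for this example, and the exact encoded-word form the library chooses for the legacy case is an implementation detail, so it is printed rather than assumed.

```python
from email import policy
from email.message import EmailMessage

def subject_bytes(subject, pol):
    """Serialize a message that has only a Subject header under a policy."""
    msg = EmailMessage(policy=pol)
    msg["Subject"] = subject
    return msg.as_bytes()

subject = "C\u2082H\u2086O"  # C, SUBSCRIPT TWO, H, SUBSCRIPT SIX, O

# Classic transport: the header must be ASCII, so an encoded-word is used.
legacy = subject_bytes(subject, policy.SMTP)
# RFC 6532 style: the header may carry raw UTF-8.
raw_utf8 = subject_bytes(subject, policy.SMTPUTF8)

print(legacy)
print(raw_utf8)
```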

-Shawn










Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Jukka K. Korpela

2012-07-20 19:52, Philippe Verdy wrote:


The "Subject" field is subject to special encoding like
Quoted-Printable or Base64 using specific prefixes.


This is a matter of character encoding. All plain text inevitably has 
some encoding, and the encoding may vary without changing the plain text 
status. Admittedly, QP and Base64 may be interpreted as being a 
higher-level protocol, but they can be applied to any plain text, and I 
don’t think this changes plain text to non-plain.
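
To illustrate that the encoding changes only the representation and not the text, here is a minimal round trip with Python's email.header module. The helper name decode_subject is my own; the two literal encoded-words are hand-made examples of the Q and B forms.

```python
from email.header import Header, decode_header, make_header

def decode_subject(value):
    """Decode any encoded-words in a header value back to a plain string."""
    return str(make_header(decode_header(value)))

text = "C\u2082H\u2086O"

# Encode for transport: the result is pure ASCII with a charset prefix.
encoded = Header(text, charset="utf-8").encode()
print(encoded)

# Decoding yields exactly the characters that went in.
print(decode_subject(encoded))

# The same text can travel under different charsets and encodings.
print(decode_subject("=?iso-8859-1?q?D=FCrst?="))
print(decode_subject("=?utf-8?b?RMO8cnN0?="))
```

Both literal values decode to the same five characters, which is the sense in which QP and Base64 are just encodings of the plain text.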



Additionally it has specific formatting conventions related to the use
of spaces and continuation lines if needed.


This is a real deviation from plain text principles and applies to 
e-mail message headers in general. As per clause 2.2.3 of RFC 2822, the 
header is logically a single line but may contain CR LF, which will be 
unfolded.
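
The unfolding rule can be sketched in a few lines: a CR LF that is immediately followed by a space or tab is removed, and the white space itself stays. The function name unfold is just for this illustration.

```python
import re

def unfold(header_text):
    """Undo header folding: drop each CRLF that is followed by SP or HTAB."""
    return re.sub(r"\r\n(?=[ \t])", "", header_text)

folded = (
    "Subject: This is a long subject line that has been\r\n"
    " folded onto a second line"
)
print(unfold(folded))
```

So a CR LF inside the header is not part of the subject text, which is where this differs from ordinary plain text.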


Yucca





Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Asmus Freytag

On 7/20/2012 8:41 AM, Karl Pentzlin wrote:

Looking for an example of "plain text" which is obvious to anybody,
it seems to me that the "Subject" field of e-mails is a good example.


By common convention, certain notational features have been relegated to 
styled text. Super and subscript in mathematical, chemical and other 
notation belong to that class.


There have been occasional calls to add certain explicit characters, but 
they have been either rejected or met with such chilly response on 
preliminary inquiry that no formal submission was ever made.


Subscript and superscript are essential features of such a notation, but 
most people can "live with" not having access to the full notation in 
the subject line. (No mathematician expects to be able to place a fully 
built-up equation there, even if his software supports plain text math, 
as defined in UTN#28).


A much stronger case than subject lines are regulatory databases with 
plain-text fields in their records. A German company had approached 
Unicode with the problem that even the in-line formulas for chemical 
compounds needed a few subscript characters beyond digits, in particular 
the Greek letters alpha, beta and gamma (not the whole alphabet).


That request died before being taken up by the committee.

I have no idea how that industry solved its problem; after all, the 
regulatory mandate didn't disappear. However, as it stands, the de-facto 
precedent is to not accommodate such usage by coding characters. The 
situation with DIN EN 13501 seems to be entirely equivalent. In fact, I 
find it less likely that a subject line, to be intelligible and specific, 
would require the particular character in question than the letters 
needed to be able to write a full chemical formula (in the style of 
C₂H₆O). People just make do, writing C2H6O etc. (search for "chemical 
formula of alcohol" on Google to see what I mean). [Some organic compounds 
also use Greek letters; I don't have an example, not being a chemist.]
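
For what it's worth, the two spellings can be compared with Python's unicodedata module. This is only an illustration of the existing subscript digits and their compatibility mapping; it makes no claim about how any industry handles its data.

```python
import unicodedata

formula = "C\u2082H\u2086O"  # written with the dedicated subscript digits

# Show which characters are actually in the string.
for ch in formula:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# Compatibility normalization folds the subscripts to ordinary digits,
# giving the "make do" spelling.
flattened = unicodedata.normalize("NFKC", formula)
print(flattened)
```

The subscript digits carry a compatibility decomposition to the plain digits, so the same formula has two plain-text representations.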


If the users for which such "near plain text" notations are part of 
their daily work were to report that subject lines, database "plain 
text" fields and other such bottlenecks are causing serious issues, then 
I think Unicode and WG2 should listen carefully. However, this should be 
something that's broadly anchored in those user communities. Let them 
demonstrate that there's a real practical need that outweighs the dual 
representation issue.


A./



Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?

2012-07-20 Thread Philippe Verdy
The "Subject" field is subject to special encoding like
Quoted-Printable or Base64 using specific prefixes. This is necessary
because the MIME headers specifying the mail encoding apply only to
the mail body but not to the headers themselves.

For this reason it is not strictly plain text.

Additionally it has specific formatting conventions related to the use
of spaces and continuation lines if needed.

Not all mail reader agents will recognize the Quoted-Printable or
Base64 signatures found in these headers (notably in: subject, from,
to), but most now actually decode them properly, provided that the
prefixes specify a supported charset. UTF-8 is one of those
charsets that will be most frequently recognized, but ISO-8859-1 is
still much more often recognized. For Chinese or Japanese, UTF-8 is
rarely used.

There's no way to specify a font to render the encoded characters.
When the headers contain 8-bit byte values, there's some assumption
that they will be decoded with the encoding found or specified in
the mail body, but this is unreliable.
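
A quick sketch of why guessing is unreliable: the same unlabeled bytes decode to different text, or fail to decode, depending on the charset the reader assumes. The sample name and the choice of charsets are just an example.

```python
subject = "Martin D\u00fcrst"

# Unlabeled 8-bit bytes as they might appear in a raw header.
raw_utf8 = subject.encode("utf-8")
raw_latin1 = subject.encode("iso-8859-1")

# Right guess: the text survives.
print(raw_utf8.decode("utf-8"))

# Wrong guess: UTF-8 bytes read as ISO-8859-1 give garbled text.
garbled = raw_utf8.decode("iso-8859-1")
print(garbled)

# Wrong guess the other way round: not even valid UTF-8.
try:
    raw_latin1.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print(decodable)
```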

2012/7/20 Karl Pentzlin:
> Looking for an example of "plain text" which is obvious to anybody,
> it seems to me that the "Subject" field of e-mails is a good example.
> Common e-mail software lets you enter any text but never gives you
> access to any higher-level protocol. Possibly you can select the font
> in which the subject line is shown, but this is completely independent
> of the font in which your subject line is shown to the recipient.
> Thus, you transfer here plain text, and you can use exactly the
> characters which either Unicode provides to you, or which are PUA
> characters which you have agreed upon with the recipient before.