Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
Hello Karl,

On 2012/07/21 0:41, Karl Pentzlin wrote:

> Looking for an example of "plain text" which is obvious to anybody, it seems to me that the "Subject" field of e-mails is a good example. Common e-mail software lets you enter any text but gives you never access to any higher-level protocol. Possibly you can select the font in which the subject line is shown, but this is completely independent of the font your subject line is shown at the recipient. Thus, you transfer here plain text, and you can use exactly the characters which either Unicode provides to you, or which are PUA characters which you have agreed upon with the recipient before.
>
> In fact, the de-facto-standard regulating the e-mail content (RFC 2822, dated April 2001 http://www.ietf.org/rfc/rfc2822.txt , afaik)

No. If you go to http://tools.ietf.org/html/rfc2822, you'll see "Obsoleted by: 5322, Updated by: 5335, 5336". RFC 5322 is the new version, dated October 2008, but doesn't change much. RFC 5335 and 5336 are experimental specifications for encoding the Subject (and a lot of other fields) as raw UTF-8 if the email infrastructure supports it. There are Standards Track updates for these two, RFC 6531 and 6532.

But what's more important for your question, at least in theory, is http://tools.ietf.org/html/rfc2231, which defines a way to add language information to header fields such as Subject:. With such information, it would stop being plain text. In practice, RFC 2231 is not well known, and even less used, so except for detailed technical discussion, your example should be good enough.

Regards, Martin.

> defines the content of the "Subject" line as "unstructured" (p.25), which means that it has to consist of US-ASCII characters, which in turn can denote other (e.g. Unicode) characters by the application of MIME protocols. Thus, the result is an unstructured character sequence. There is e.g. no possibility to include superscripted/subscripted characters in a "Subject" of an e-mail, unless these characters are in fact included as superscript/subscript characters in Unicode directly.
>
> Thus, proving the necessity to include a character in the text of a "Subject" line of an e-mail, is proving that the character has to be available as a plain text character. If, additionally, the character is used outside a closed group (which can be advised to use PUA characters), then there is a valid argument to include such a character in Unicode. Is my assumption correct?
>
> (I think of the SUBSCRIPT SOLIDUS proposed in WG2 N3980. It is in fact annoying that you cannot address DIN EN 13501 requirements in an e-mail subject line written correctly, as Unicode, although being an industry standard, until now did not listen to an industry request at this special topic.)
>
> - Karl
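For readers who have not seen the RFC 2231 language tagging mentioned in this reply: as far as I understand that RFC, it extends the MIME encoded-word syntax so that a language tag may follow the charset after an asterisk, as in `=?US-ASCII*EN?Q?Keith_Moore?=`. The following is only a minimal sketch of splitting such a word into text and language; the names `ENCODED_WORD` and `parse_encoded_word` are made up for this illustration, and the regular expression is a simplification, not a full implementation of the grammar.

```python
import base64
import quopri
import re

# Simplified pattern for one encoded-word: =?charset[*language]?Q|B?text?=
# (hypothetical helper, not part of any mail library)
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?"
    r"\?(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)


def parse_encoded_word(word):
    """Return (decoded_text, language_tag_or_None) for a single encoded-word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word: %r" % word)
    raw = m["text"].encode("ascii")
    if m["enc"].lower() == "q":
        # header=True makes "_" decode to a space, as the Q encoding requires
        data = quopri.decodestring(raw, header=True)
    else:
        data = base64.b64decode(raw)
    return data.decode(m["charset"]), m["lang"]


text, lang = parse_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?=")
plain, no_lang = parse_encoded_word("=?utf-8?B?4oKC?=")  # SUBSCRIPT TWO, no tag
```

The point of the sketch is only that the second element of the result carries information beyond the character sequence itself, which is what would make such a Subject more than plain text.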
Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
On 7/20/2012 1:34 PM, Jukka K. Korpela wrote:

> 2012-07-20 20:19, Asmus Freytag wrote:
>
>> On 7/20/2012 8:41 AM, Karl Pentzlin wrote:
>>
>>> Looking for an example of "plain text" which is obvious to anybody, it seems to me that the "Subject" field of e-mails is a good example.
>>
>> By common convention, certain notational features have been relegated to styled text. Super and subscript in mathematical, chemical and other notation belongs to that class.
>
> I’m afraid I don’t quite follow.

Yeah, I think in this case you missed the point of what I was trying to say.

A./
Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
2012-07-20 20:19, Asmus Freytag wrote:

> On 7/20/2012 8:41 AM, Karl Pentzlin wrote:
>
>> Looking for an example of "plain text" which is obvious to anybody, it seems to me that the "Subject" field of e-mails is a good example.
>
> By common convention, certain notational features have been relegated to styled text. Super and subscript in mathematical, chemical and other notation belongs to that class.

I’m afraid I don’t quite follow. Superscripts and subscripts can be presented using styling or other higher-level protocols, or specialized superscript or subscript characters can be used, in many cases. But this does not seem to be relevant to the question whether “Subject” fields are a good example of plain text.

> A much stronger case than subject lines are regulatory databases with plain-text fields in their records.

It’s part of the database design to decide whether fields are plain text, so I don’t quite get the point. Sometimes people would like plain text to cover things that do not exist as Unicode characters now, but that’s a different topic.

> If the users for which such "near plain text" notations are part of their daily work were to report that subject lines, database "plain text" fields and other such bottlenecks are causing serious issues, then I think Unicode and WG2 should listen carefully.

Instead of getting into theoretical considerations of “near plain text”, I think the question is whether there is sufficient evidence of real-life needs for new subscript or superscript characters. In general, coding of new characters requires demonstrated *use* of symbols as text characters, rather than arguments about *need* to use them.

But even the need is questionable: e-mail headings are supposed to be short texts that tell what the message is about, not complicated formulas. And it’s part of database design to decide that you use some fields for some purposes and make them plain text fields, instead of (somehow) allowing styling inside them.

Yucca
RE: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
A) it can use quoted-printable B) See RFC 6532/6530 - Now it can be UTF-8 :) -Shawn
Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
2012-07-20 19:52, Philippe Verdy wrote:

> The "Subject" fi[el]d is subject to special encoding like Quoted-Printable or Base64 using specific prefixes.

This is a matter of character encoding. All plain text inevitably has some encoding, and the encoding may vary without changing the plain text status. Admittedly, QP and Base64 may be interpreted as being a higher-level protocol, but they can be applied to any plain text, and I don’t think this changes plain text to non-plain.

> Additionally it has specific formatting conventions related to the use of spaces and continuation lines if needed.

This is a real deviation from plain text principles and applies to e-mail message headers in general. As per clause 2.2.3 of RFC 2822, the header is logically a single line but may contain CR LF, which will be unfolded.

Yucca
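To make the unfolding rule concrete: RFC 2822 section 2.2.3 describes unfolding as removing any CRLF that is immediately followed by white space (a space or a tab). Below is a minimal sketch of just that one rule; the function name `unfold` is chosen for this illustration, and real mail parsers do considerably more.

```python
import re


def unfold(raw_header):
    """Remove every CRLF that is immediately followed by a space or tab."""
    # The lookahead keeps the white space itself; only the CRLF is dropped.
    return re.sub(r"\r\n(?=[ \t])", "", raw_header)


folded = "Subject: This\r\n is a test"
unfolded = unfold(folded)  # "Subject: This is a test"
```

The folded and unfolded forms denote the same logical header line, which is why the line breaks in a transmitted Subject cannot be treated as part of its text.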
Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
On 7/20/2012 8:41 AM, Karl Pentzlin wrote:

> Looking for an example of "plain text" which is obvious to anybody, it seems to me that the "Subject" field of e-mails is a good example.

By common convention, certain notational features have been relegated to styled text. Super and subscript in mathematical, chemical and other notation belong to that class. There have been occasional calls to add certain explicit characters, but they have been either rejected or met with such chilly response on preliminary inquiry that no formal submission was ever made.

Subscript and superscript are essential features of such a notation, but most people can "live with" not having access to the full notation in the subject line. (No mathematician expects to be able to place a fully built-up equation there, even if his software supports plain text math, as defined in UTN#28).

A much stronger case than subject lines are regulatory databases with plain-text fields in their records. A German company had approached Unicode with the problem that even the in-line formulas for chemical compounds needed a few subscript characters beyond digits, in particular the Greek letters alpha, beta and gamma (not the whole alphabet). That request died before being taken up by the committee. I have no idea how that industry solved their problem; after all, the regulatory mandate didn't disappear. However, as it stands, the de-facto precedent is to not accommodate such usage by coding characters.

The situation with DIN EN 13501 seems to be entirely equivalent. In fact, I find it less likely that a subject line, to be intelligible and specific, would require the particular character in question than the letters needed to be able to write a full chemical formula (in the style of C₂H₆O). People just make do, writing C2H6O etc. (check "chemical formula of alcohol" on google, to see what I mean). [Some organic compounds also use Greek letters, I don't have an example, not being a chemist.]

If the users for which such "near plain text" notations are part of their daily work were to report that subject lines, database "plain text" fields and other such bottlenecks are causing serious issues, then I think Unicode and WG2 should listen carefully. However, this should be something that's broadly anchored in those user communities. Let them demonstrate that there's a real practical need that outweighs the dual representation issue.

A./
Re: Is the "Subject" field of an e-mail an obvious example of "plain text" where no higher level protocol application is possible?
The "Subject" field is subject to special encoding like Quoted-Printable or Base64 using specific prefixes. This is necessary because the MIME headers specifying the mail encoding apply only to the mail body but not to the headers themselves. For this reason it is not strictly plain text. Additionally it has specific formatting conventions related to the use of spaces and continuation lines if needed.

Not all mail reader agents will recognize the Quoted-Printable or Base64 signatures found in these headers (notably in: subject, from, to), but most now actually decode them properly, provided that the prefixes specify a supported charset. UTF-8 is one of those charsets that will be most frequently recognized, but ISO-8859-1 is still much more often recognized. For Chinese or Japanese, UTF-8 is rarely used.

There's no way to specify a font to render the encoded characters. When the headers contain 8-bit byte values, there's some assumption that they will be decoded with the encoding found or specified in the mail body, but this is unreliable.

2012/7/20 Karl Pentzlin:
> Looking for an example of "plain text" which is obvious to anybody,
> it seems to me that the "Subject" field of e-mails is a good example.
> Common e-mail software lets you enter any text but gives you never
> access to any higher-level protocol. Possibly you can select the font
> in which the subject line is shown, but this is completely independent
> of the font your subject line is shown at the recipient.
> Thus, you transfer here plain text, and you can use exactly the
> characters which either Unicode provides to you, or which are PUA
> characters which you have agreed upon with the recipient before.
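As an illustration of the prefixed Quoted-Printable/Base64 encoding described in this message, here is a small sketch using Python's standard `email.header` module. The sample subject reuses the C₂H₆O formula from elsewhere in the thread; the variable names are arbitrary, and which of the two encodings the library picks for a given string is left to the library, so the sketch does not rely on it.

```python
from email.header import Header, decode_header, make_header

# A subject containing SUBSCRIPT TWO (U+2082) and SUBSCRIPT SIX (U+2086)
subject = "Formula C\u2082H\u2086O"

# On the wire the header value is pure ASCII: an encoded-word that names
# its charset in a "=?utf-8?...?" prefix.
encoded = Header(subject, charset="utf-8").encode()

# A receiving agent reverses the step to recover the character sequence.
decoded = str(make_header(decode_header(encoded)))

# The same mechanism with another charset: a Q-encoded ISO-8859-1 word.
latin1 = str(make_header(decode_header("=?iso-8859-1?q?Gr=FC=DFe?=")))
```

After the round trip, `decoded` equals `subject`: the encoded-word wrapper changes the byte representation, while the sequence of characters the recipient sees is the same, and nothing in it carries font or styling information.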