Re: I-D file formats and internationalization
On Thu, 1 Dec 2005 17:58:52 -0500 Gray, Eric wrote:

> -- Robert Sayre
> --
> -- I would have written a shorter letter, but I did not have
> -- the time.
>
> I love this quote. Too bad it's not attributed. Did you make it up yourself? I'd like to use it sometime...

(never saw an answer)

The internet shows a few attributions to some Mark Twain and lots of attributions to some Blaise Pascal.

---
~Randy
___
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
Randy.Dunlap wrote:

> -- I would have written a shorter letter, but I did not have
> -- the time.
>
> I love this quote. Too bad it's not attributed. Did you make it up yourself? I'd like to use it sometime...
>
> (never saw an answer)
>
> The internet shows a few attributions to some Mark Twain and lots of attributions to some Blaise Pascal

Anyway, the true ID internationalization is to make the IDs short and concise so that non-native users of English are not overwhelmed by the huge quantity and thin content of the IDs.

There is no point in expanding the character set only to make the IDs illegible in most international environments and, even where some environment supports it, to most international people.

Masataka Ohta
Re: I-D file formats and internationalization
Yaakov Stein wrote:

> Character sets are important, but there is more. I have had bad experiences with right-to-left writing in environments not specifically designed to handle it. And the worst case is embedding of left-to-right expressions inside right-to-left text (or vice versa).

The only thing we can do with bidirectionality at the plain text level is to spell words of the opposite directionality backward. Then, if we change the line length, the representation becomes inappropriate or wrong. But that is, of course, also the case if we use typeset ASCII.

If you handle bidirectionality in a more complicated manner, it involves nesting, which makes searching practically impossible (it takes time proportional to the third power of the length of the string being searched).

Masataka Ohta
Re: I-D file formats and internationalization
--On Monday, December 05, 2005 09:17:08 -0500 Marshall Eubanks [EMAIL PROTECTED] wrote:

> You may have sent it in UTF-8, but arrived here as ASCII :
>
> Content-Type: text/plain; charset=UTF-8; format=flowed
                                      ^
ASCII? -------------------------------+

And your response was in ISO-8859-1; the characters survived the transition. Perhaps there's hope..

> On Dec 5, 2005, at 9:00 AM, Brian E Carpenter wrote:
>
>> Hallam-Baker, Phillip wrote:
>>> The fact that Brian is English and lives in Zurich is irrelevant.
>>
>> As a matter of fact I don't live in Zürich; I live near Genève.
Re: I-D file formats and internationalization
Hello Harald;

On Dec 6, 2005, at 11:37 AM, Harald Tveit Alvestrand wrote:

> --On Monday, December 05, 2005 09:17:08 -0500 Marshall Eubanks [EMAIL PROTECTED] wrote:
>
>> You may have sent it in UTF-8, but arrived here as ASCII :
>>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> ASCII? (pointing at charset=UTF-8)
>
> And your response was in ISO-8859-1; the characters survived the transition. Perhaps there's hope..

Yes, things seem to have gone surprisingly well, which does give glimmers of hope.

Even this seems to go back and forth OK. Of course, if it doesn't, it might be hard to reconstruct...

Довяй но провяй !

Marshall

>> On Dec 5, 2005, at 9:00 AM, Brian E Carpenter wrote:
>>
>>> Hallam-Baker, Phillip wrote:
>>>> The fact that Brian is English and lives in Zurich is irrelevant.
>>>
>>> As a matter of fact I don't live in Zürich; I live near Genève.
Re: I-D file formats and internationalization
Marshall Eubanks wrote:

> Even this seems to go back and forth OK. Of course, if it doesn't, it might be hard to reconstruct...
>
> Дов�й но пров�й !

...test it. <g>

Seriously, nobody but me uses a pre-UTF-8 MUA.

Bye, Frank
Re: I-D file formats and internationalization
On Dec 6, 2005, at 11:08 PM, Frank Ellermann wrote:

> Marshall Eubanks wrote:
>
>> Довяй но провяй !

Don't know what it means but it looks great to me...:-)

> ...test it. <g>
>
> Seriously, nobody but me uses a pre-UTF-8 MUA.
>
> Bye, Frank

happy Santa Claus eve

marc

--
You see, all the problems in the world are created by those who want 'perfection.'
Sri Sri Ravi Shankar

http://montagsdemo.org/
my blog: http://brain.let.de
AIM, MSN, MAC = macbroadcast
Re: I-D file formats and internationalization
Douglas Otis wrote:

> this could also mean utilizing graphical characters to create clean lines, boxes, and borders. This could be a matter of the character-repertoire going beyond ASCII in conjunction with a drawing application. This approach should permit a simple translation back into ASCII-artwork for the ASCII only version.

This won't work. Forget it. The xml2rfc ASCII-approximations for Latin-1 and some additional windows-1252 characters are already dubious. Weird cp858 example added below, bye, Frank

??? ??? +-+
??? ??? |+|
??? ??? +-+

Title: ASCII isn't the point here

+-+
|+|
+-+
Re: I-D file formats and internationalization
On Dec 6, 2005, at 2:27 PM, Frank Ellermann wrote:

> Douglas Otis wrote:
>
>> this could also mean utilizing graphical characters to create clean lines, boxes, and borders. This could be a matter of the character-repertoire going beyond ASCII in conjunction with a drawing application. This approach should permit a simple translation back into ASCII-artwork for the ASCII only version.
>
> This won't work. Forget it.

The Unicode box drawing characters (2500-257F) you used are specifically for box and line drawing, but depend upon the line spacing matching their point size. They can be used to create clean drawings when formatting is controlled and the output is, for example, PDF or HTML. These same characters can be translated into the same dashed-line quality ASCII-artwork for the current US-ASCII version RFCs. Unicode does offer the possibility that artwork can look clean, but font selection and line spacing must be controlled, which is not practical in emails. :(

┌─┬─┐  +-+-+
├─┼─┤  |-+-|
└─┴─┘  +-+-+

-Doug
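The translation back to ASCII artwork that Doug describes can be sketched in a few lines of Python. This is only an illustration, not something posted to the list: the mapping table and the name `to_ascii_art` are assumptions, and the table sends every corner and junction to `+`, so the middle row comes out as `+-+-+` rather than the `|-+-|` shown in the message above.

```python
# Hypothetical mapping, chosen for illustration: horizontal strokes become
# '-', vertical strokes become '|', and every corner or junction becomes '+'.
BOX_TO_ASCII = str.maketrans({
    "─": "-", "│": "|",
    "┌": "+", "┐": "+", "└": "+", "┘": "+",
    "├": "+", "┤": "+", "┬": "+", "┴": "+", "┼": "+",
})


def to_ascii_art(text: str) -> str:
    """Replace the mapped box-drawing characters; leave everything else alone."""
    return text.translate(BOX_TO_ASCII)


unicode_box = "┌─┬─┐\n├─┼─┤\n└─┴─┘"
ascii_box = to_ascii_art(unicode_box)
print(ascii_box)
```

The table covers only the light single-line characters used in the example; a real converter would need entries for the rest of the 2500-257F block, and double-line or heavy strokes have no obvious single ASCII equivalent.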
RE: I-D file formats and internationalization
Character sets are important, but there is more. I have had bad experiences with right-to-left writing in environments not specifically designed to handle it. And the worst case is embedding of left-to-right expressions inside right-to-left text (or vice versa).

האם עברית עוברת נכון ?

Y(J)S
Re: I-D file formats and internationalization
Hallam-Baker, Phillip wrote:

> The fact that Brian is English and lives in Zurich is irrelevant.

As a matter of fact I don't live in Zürich; I live near Genève.

Of course this matters. The problem is that it's not quite as straightforward as people think. I'm attempting to send this in UTF-8; I wonder how many people will receive it correctly?

Brian
Re: I-D file formats and internationalization
You may have sent it in UTF-8, but arrived here as ASCII :

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mtagate3.uk.ibm.com id jB5E0WQg115170
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 68c8cc8a64a9d0402e43b8eee9fc4199
Content-Transfer-Encoding: quoted-printable

Was that the intent ?

Regards
Marshall

On Dec 5, 2005, at 9:00 AM, Brian E Carpenter wrote:

> Hallam-Baker, Phillip wrote:
>> The fact that Brian is English and lives in Zurich is irrelevant.
>
> As a matter of fact I don't live in Zürich; I live near Genève.
>
> Of course this matters. The problem is that it's not quite as straightforward as people think. I'm attempting to send this in UTF-8; I wonder how many people will receive it correctly?
>
> Brian
Re: I-D file formats and internationalization
Marshall Eubanks wrote:

> You may have sent it in UTF-8, but arrived here as ASCII :
>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> Content-Transfer-Encoding: quoted-printable
> X-MIME-Autoconverted: from 8bit to quoted-printable by mtagate3.uk.ibm.com id jB5E0WQg115170
> X-Spam-Score: 0.0 (/)
> X-Scan-Signature: 68c8cc8a64a9d0402e43b8eee9fc4199
> Content-Transfer-Encoding: quoted-printable
>
> Was that the intent ?

It wasn't *my* intent :-)

But in fact, that's the transfer encoding that got converted. My outgoing transfer encoding was

Content-Transfer-Encoding: 8bit

The content type was still tagged as UTF-8. OTOH your response is tagged

Content-Type: text/plain; charset=ISO-8859-1

and indeed did get converted somewhere, at your end I suspect.

It's not so easy to assert that UTF-8 just works.

Brian

> Regards
> Marshall
>
> On Dec 5, 2005, at 9:00 AM, Brian E Carpenter wrote:
>
>> Hallam-Baker, Phillip wrote:
>>> The fact that Brian is English and lives in Zurich is irrelevant.
>>
>> As a matter of fact I don't live in Zürich; I live near Genève.
>>
>> Of course this matters. The problem is that it's not quite as straightforward as people think. I'm attempting to send this in UTF-8; I wonder how many people will receive it correctly?
>>
>> Brian
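The distinction Brian draws, between a converted transfer encoding and a converted charset, can be checked with Python's standard `quopri` module. The sketch below is an illustration only and assumes the two place names from the thread as sample text; it does not reproduce what the relay at mtagate3.uk.ibm.com actually did.

```python
import quopri

text = "Zürich; Genève"

# What the sender put on the wire with Content-Transfer-Encoding: 8bit.
utf8_bytes = text.encode("utf-8")

# A relay downgrading 8bit to quoted-printable rewrites each non-ASCII
# byte as =XX. The bytes on the wire become pure ASCII, but the charset
# label (UTF-8) is untouched.
qp_bytes = quopri.encodestring(utf8_bytes)
print(qp_bytes)

# The receiver first undoes the transfer encoding, then applies the charset.
round_trip = quopri.decodestring(qp_bytes).decode("utf-8")

# A charset conversion is a different operation: the same characters are
# stored as different bytes (one byte each in ISO-8859-1).
latin1_bytes = text.encode("iso-8859-1")
```

In both cases the reader sees the same characters, which is consistent with Marshall's report that the umlaut and the grave accent survived. Only the byte-level representation differs.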
Re: I-D file formats and internationalization
One interesting thing is that the umlaut on the U in Zurich and the grave accent in Geneva came through, and came back as well (on the response to my response). They look fine, and are coded as

Zürich; Genève

So, if your use of UTF-8 was intended to display that, I think that (for me, OS X 10.4.3) the software did the right thing.

Marshall

On Dec 5, 2005, at 10:03 AM, Brian E Carpenter wrote:

> Marshall Eubanks wrote:
>
>> You may have sent it in UTF-8, but arrived here as ASCII :
>>
>> Content-Type: text/plain; charset=UTF-8; format=flowed
>> Content-Transfer-Encoding: quoted-printable
>> X-MIME-Autoconverted: from 8bit to quoted-printable by mtagate3.uk.ibm.com id jB5E0WQg115170
>> X-Spam-Score: 0.0 (/)
>> X-Scan-Signature: 68c8cc8a64a9d0402e43b8eee9fc4199
>> Content-Transfer-Encoding: quoted-printable
>>
>> Was that the intent ?
>
> It wasn't *my* intent :-)
>
> But in fact, that's the transfer encoding that got converted. My outgoing transfer encoding was
>
> Content-Transfer-Encoding: 8bit
>
> The content type was still tagged as UTF-8. OTOH your response is tagged
>
> Content-Type: text/plain; charset=ISO-8859-1
>
> and indeed did get converted somewhere, at your end I suspect.
>
> It's not so easy to assert that UTF-8 just works.
>
> Brian
>
>> Regards
>> Marshall
>>
>> On Dec 5, 2005, at 9:00 AM, Brian E Carpenter wrote:
>>
>>> Hallam-Baker, Phillip wrote:
>>>> The fact that Brian is English and lives in Zurich is irrelevant.
>>>
>>> As a matter of fact I don't live in Zürich; I live near Genève.
>>>
>>> Of course this matters. The problem is that it's not quite as straightforward as people think. I'm attempting to send this in UTF-8; I wonder how many people will receive it correctly?
>>>
>>> Brian
RE: I-D file formats and internationalization
Ted,

--
-- The IETF does not make any effort to be representative of the Internet
-- community.
--
-- 1) They do too.
--
-- Hmmm. I would have thought proof by assertion would be more fun.
--
-- Seriously, you can argue that the IETF is failing to reach the
-- stakeholders it claims to represent, but I think it's disingenuous to say
-- that the group doesn't try to reach them. There are low barriers to
-- entry for interested parties, and concentrated efforts to find and
-- coordinate with other networking standards bodies. Those aren't the
-- actions of a group of ostriches.

It is true that the barriers to entry are low. But, then, ostriches were not born with their heads in the sand. How is the IETF process of driving new people away because they say stuff nobody wants to hear, different from burying its head in the sand?

-- -----Original Message-----
-- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
-- On Behalf Of Ted Faber
-- Sent: Friday, December 02, 2005 11:25 PM
-- To: Hallam-Baker, Phillip
-- Cc: ietf@ietf.org
-- Subject: Re: I-D file formats and internationalization
Re: I-D file formats and internationalization
Hi Yaakov,

on 2005-12-04 08:17 Yaakov Stein said the following:

> Why should any electronic format be normative? The normative version should be the hardcopy print-out, and any editing tool that can produce a precisely reproducible print-out should be allowed. This should hold for artwork as well.

I guess here we differ to some degree in our views - I would rather say that if any single presentation format should be normative, it should be the one that people read during last call and during the IESG approval process. I think that's quite independent of the ability to produce precisely reproducible paper copies.

I'm not sure, though, how hard the requirement on one single normative presentation format is. If all 3 or 5 (or whatever) formats in common use can easily and consistently be generated from the same source, does it matter?

Today, any RFC2629 (or rfc2629bis) source can be consistently turned into HTML, XHTML, PDF, CHM and more, by using various combinations of xml2rfc, rfc2629xslt, xsl11toFop, xml2rfcpdf. The only major gap we have is really the ability to turn rfc2629bis sources which contain diagrams in image formats automatically into good ASCII artwork.

With enough confidence in the consistency of the generated output, we might not care so much any more about whether it was the PDF/A version, the ASCII version, or the printed page which was normative...

Of course, this only holds if the source from which the various formats can be produced is as easily available as any single presentation format.

Henrik
Re: I-D file formats and internationalization
on 2005-12-04 08:52 Doug Ewell said the following:

> Perhaps it's just me, but I find it bizarre that the question of limiting RFC text to ASCII vs. UTF-8 is being conflated with the question of limiting RFC illustrations to ASCII art vs. other graphic formats. I don't think the two have anything important in common.

These are two different issues. They are easily conflated because in both cases the ASCII format is limiting, and other richer formats are less limiting. But you're right, and a change of subject line would have been in order.

Henrik
Re: I-D file formats and internationalization
On Sun, 2005-12-04 at 12:22 +0100, Henrik Levkowetz wrote:

> on 2005-12-04 08:52 Doug Ewell said the following:
>
>> Perhaps it's just me, but I find it bizarre that the question of limiting RFC text to ASCII vs. UTF-8 is being conflated with the question of limiting RFC illustrations to ASCII art vs. other graphic formats. I don't think the two have anything important in common.
>
> It is two different issues. They are easily conflated because in both cases the ASCII format is limiting, and other richer formats are less limiting. But you're right, and a change of subject line would have been in order.

As for other graphics, this could also mean utilizing graphical characters to create clean lines, boxes, and borders. This could be a matter of the character-repertoire going beyond ASCII in conjunction with a drawing application. This approach should permit a simple translation back into ASCII-artwork for the ASCII only version.

If by graphics you mean something more complex in terms of graphs and charts, then having a means for reproducing these images will offer a greater challenge. It would seem some open-source application should be specified, where the input data is also made available. The ASCII only version could list just this input data in lieu of graphics.

-Doug
Re: I-D file formats and internationalization
At 11:52 PM -0800 12/3/05, Doug Ewell wrote:

> Perhaps it's just me, but I find it bizarre that the question of limiting RFC text to ASCII vs. UTF-8 is being conflated with the question of limiting RFC illustrations to ASCII art vs. other graphic formats. I don't think the two have anything important in common.

It is not just you. Every time any change is proposed to RFCs on this list, every desired change is conflated, and they all instantly achieve MUST status, and the IETF are idiots for not having done it five years ago. Maybe it will be the same for our grandchildren.

--Paul Hoffman, Director
--VPN Consortium
Re: I-D file formats and internationalization
At 05:24 03/12/2005, Ted Faber wrote:

> IETF standards documents reflect the consensus of the IETF participants at the time of submission to the publication queue. People who believe an IETF standards document represents other things are misinformed.

Ted, I am perfectly comfortable with this. I am therefore totally uncomfortable with the RFC 3935 doctrine that IETF standards documents are a way to influence, and therefore the tools of a policy (in favor of a set of undefined core values). Whatever your religion, a byte is a byte.

jfc
RE: I-D file formats and internationalization
> Now, the toughest question here is which presentation format should be normative.

Why should any electronic format be normative? The normative version should be the hardcopy print-out, and any editing tool that can produce a precisely reproducible print-out should be allowed. This should hold for artwork as well.

Y(J)S
Re: I-D file formats and internationalization
Perhaps it's just me, but I find it bizarre that the question of limiting RFC text to ASCII vs. UTF-8 is being conflated with the question of limiting RFC illustrations to ASCII art vs. other graphic formats. I don't think the two have anything important in common.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
Re: I-D file formats and internationalization
Hi Tim,

on 2005-12-02 02:44 Tim Bray said the following:

[snip]

> I will now shut up. It is clearly the case that there is tremendous resistance within the IETF to leaving their comfy ASCII enclave.

Following the debate from the sideline till now, it's clear to me that there are at least a few people who are adamantly against change. I'm not at all convinced that a large majority feels this way. A poll might reveal more than the relative proportions of highly engaged people voicing their views here.

As for the issue itself - no, make that some of the issues...

A lot of people seem to appreciate the html-ized and PDF-ized versions provided on the tools.ietf.org site, even if the convenience offered by those is small compared to what could be available if a richer format was available during document preparation.

Personally, I think we would benefit from

* having the XML format championed by xml2rfc as a common source format
* having support for the complete range of Unicode characters for author attribution in the source format
* having the ability to consistently generate multiple presentation formats from this - including ASCII and UTF-8 text for textual diffs, html with links for browsing, and pdf for consistent cross-platform printing.

Regarding the stability of the presentation format, I note that

- A new version of PDF called PDF/A (A for Archive) has recently been standardized in ISO 19005-1 as an "Electronic document file format for long-term preservation". This is going to be around for a while.
- HTML 4 is also stable; the latest published version is 4.01 from 1999.
- Text, as has been pointed out, is going to be readable long after other formats have come and gone. Yes, that's true, but readable doesn't necessarily mean as easily accessible as other formats. I think we're already seeing that richer formats are passing text as the format of choice for most people, and platform support for easy handling of the different formats is shifting.

However - aren't we lucky! If we have a standardized format (RFC 2629 or revisions thereof) from which we can generate multiple presentation formats, we could always use it to generate the format-du-jour if it should come to pass that PDF/A, HTML 4 and text should all be hopelessly outdated...

Now, the toughest question here is which presentation format should be normative. I think that something richer than ASCII would be good. It's not clear to me that there is one obvious choice for which format it should be. In 1 or 2 years I might say PDF/A if it has enough tool support, as it's open and free, cross-platform consistent, designed to be stable and long-term accessible. But I'd consider it only if we have a common source format from which we can reliably produce other versions such as HTML for browsing and ASCII/UTF-8 for diffing.

Henrik
Re: I-D file formats and internationalization
A fundamental problem, among many, of Unicode is that most people can't recognize most local characters.

In the internationalized context of the IETF, it is prohibitively impolite to spell names of people in a way not recognizable by others.

Thus, for internationalization, names of people MUST be spelled with characters recognized by all the international participants, which means ASCII is the only choice.

Masataka Ohta
Re: I-D file formats and internationalization
Keith Moore wrote:

> personally I find most of the HTML versions of RFCs quite annoying to read because of distracting embellishments.

+1

The rfcmarkup version based on the plain text is nice, e.g. http://tools.ietf.org/html/3834#section-4

It also supports e.g. http://tools.ietf.org/html/3834#page-14 - medieval or not, I like it.

Bye, Frank
Re: I-D file formats and internationalization
At 02:44 02/12/2005, Tim Bray wrote:

> On Dec 1, 2005, at 3:16 PM, Keith Moore wrote:
>
> And I will freely admit that I find the notion that a group of people designing global infrastructure think it's OK to use ASCII so morally and aesthetically offensive that it probably interferes with my evaluation of the trade-offs.
>
> I will now shut up. It is clearly the case that there is tremendous resistance within the IETF to leaving their comfy ASCII enclave.

Tim,

"ASCII enclave" is excellent. I felt like you for years. Then RFC 3935 came. I understood it was the same old group of people who believe that "the existence of the Internet, and its influence on economics, communication, and education, will help [them] to build a better human society" and this is why they are to "influence the way people design, use, and manage the Internet". To do so this group "uses the English language for its work is because of its utility for working in a global context."

This is an old religion. ASCII is part of it. From the very beginning. You cannot oppose a religion. But you can expose it, so everyone may knowingly adhere to or repudiate it. This is the reason for my RFC 3066 bis saga: the current outcome (simultaneous USG Tunis deal and IESG approval of the painstakingly trimmed Draft) brought the limitations I wanted. The appeal underway will clarify the IESG position: is the IETF exclusive or still possibly inclusive?

This is of key importance because two visions of the Internet oppose each other. Most of us share the vision of a functional Internet where our "goal is connectivity, the tool is the Internet Protocol, and the intelligence is end to end rather than hidden in the network" (RFC 1958). The vision of your "ASCII enclave" is for them to be the globalization core of a hardware Internet: "a large, heterogeneous collection of interconnected systems that can be used for communication of many different types between any interested parties connected to it."

BTW this is quite the same definition, and resulting objectives, as the 47 USC 230 (f)(1) legal US definition: "Internet" is "the international computer network of both Federal and non-Federal interoperable packet switched data networks."

This is a very consistent, time proven, stakeholder supported vision. If they do not want to move, I do not think you can change it without changing the Internet.

jfc
Re: I-D file formats and internationalization
Paul Hoffman wrote:

> Listing an author as Patrik F&auml;ltstr&ouml;m is not readable.

From my POV it's better than F=84ltstr=94m (858 QP) or F=E4ltstr=F6m (1252 QP) or F=C3=A4ltstr=C3=B6m (UTF-8 QP). With your proposal I'd see Fältström (intentionally raw UTF-8 encoded in the Latin-1 subset of windows-1252).

AFAIK xml2rfc would transform it to Faeltstroem for its ASCII output formats - perfectly readable, but wrong. With NCRs we'd get F&#xE4;ltstr&#xF6;m - not exactly the best readable version, but at least it's correct, and IMO better than F=C3=A4ltstr=C3=B6m. I'm not very good at decoding UTF-8 (QP or raw) manually.

> People would have to search using the escaped-HTML kludge, even though our documents are plain text.

Yes, we'd need a new text/plane or whatever, plain ASCII plus NCRs isn't the same as text/plain ASCII. Your idea text/plain UTF-8 also isn't the default for text/plain. Same problem, do you expect an update of 2046 introducing UTF-8 as new default for text/plain ?

I'd prefer to revisit your idea when the IEE / IMA folks have something that works somewhere - in other words 2007, not next year.

Bye, Frank
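The spellings Frank lists can be reproduced with Python's standard codecs and the `quopri` module. This is a sketch for comparison only: the helper name `qp` is hypothetical, and the last line shows what UTF-8 bytes look like when a reader misinterprets them as windows-1252, which is the effect Frank describes for a pre-UTF-8 MUA.

```python
import quopri

name = "Fältström"


def qp(charset: str) -> str:
    """Quoted-printable form of `name` after encoding it in `charset`."""
    return quopri.encodestring(name.encode(charset)).decode("ascii")


cp858_qp = qp("cp858")      # the DOS code page Frank mentions
cp1252_qp = qp("cp1252")    # windows-1252
utf8_qp = qp("utf-8")

# Hexadecimal numeric character references (NCRs) for non-ASCII characters.
ncr = "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in name)

# UTF-8 bytes displayed by software that assumes windows-1252.
misread = name.encode("utf-8").decode("cp1252")

for label, value in [("858 QP", cp858_qp), ("1252 QP", cp1252_qp),
                     ("UTF-8 QP", utf8_qp), ("NCR", ncr),
                     ("UTF-8 read as 1252", misread)]:
    print(f"{label:>20}: {value}")
```

All five forms carry the same name. They differ in how much a human can read without decoding, which is the trade-off the message is weighing.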
Re: I-D file formats and internationalization
JFC (Jefsey) Morfin wrote:

> To do so this group uses the English language for its work is because of its utility for working in a global context.. This is an old religion. ASCII is part of it. From the very beginning.

I'm tired of this kind of fallacy.

ASCII is not English. A charset is not tied to some content language.

For example, the line below is in the Japanese language, spelled in ASCII.

Uso ha yasumi yasumi tsuke.

Masataka Ohta
Re: I-D file formats and internationalization
Robert Sayre wrote:

> the current format is not easy to print with proper pagination on Microsoft Windows

Last time I tested `dir >prn` on a W2K box it worked, why would that be different for `type rfc4567.txt >prn` ?

> Hmm. What is the LCD? Is it NetBSD circa 1993? Is it a PDP-11?

How about a cell phone or PDA anywhere in the world. Not the ideal tool to write I-Ds, but viewing should work.

> At what point does internationalization trump (vociferous?) users of long obsolete platforms?

Cheap devices will never support the complete Unicode repertoire. Your own browser doesn't:

> I don't have East Asian fonts, so I got some squares

Bye, Frank
Re: I-D file formats and internationalization
At 13:32 02/12/2005, Masataka Ohta wrote:

> JFC (Jefsey) Morfin wrote:
>
>> To do so this group uses the English language for its work is because of its utility for working in a global context.. This is an old religion. ASCII is part of it. From the very beginning.
>
> I'm tired of this kind of fallacy. ASCII is not English. A charset is not tied to some content language.

I am also fed up with this kind of logic violation. Read the RFC and read my mail! The English language is selected by the IETF. English is tied to ASCII. This is one of its leading advantages: to support information interchange with the most limited charset. To read into this anything else, and in particular that ASCII would be limited to English, is absurd or biased.

jfc
Re: I-D file formats and internationalization
Frank Ellermann wrote:

> Paul Hoffman wrote:
>
>> Listing an author as Patrik F&auml;ltstr&ouml;m is not readable.
>
> From my POV it's better than F=84ltstr=94m (858 QP) or F=E4ltstr=F6m (1252 QP) or F=C3=A4ltstr=C3=B6m (UTF-8 QP). With your proposal I'd see Fältström (intentionally raw UTF-8 encoded in the Latin-1 subset of windows-1252).

Frank,

I'll make the following assumption: those who still use operating systems with no built-in support for UTF-8 are knowledgeable enough to run a text file - in particular if it has a UTF BOM - through GNU recode.

...

Best regards, Julian
Re: I-D file formats and internationalization
JFC (Jefsey) Morfin wrote:

> English is tied to ASCII.

All the languages in the world are tied to ASCII too. So?

> This is one of its leading advantage: to support information interchanges with the most limited charset.

ISO 646 IRV (International Reference Version) is defined not by the IETF but by ISO.

Masataka Ohta
Re: I-D file formats and internationalization
Julian Reschke wrote:

> those who still use operating systems with no builtin support for UTF-8 are knowledgeable enough to run a text file - in particular if it has a UTF BOM - through GNU recode.

A mandatory BOM for those I-Ds / RFCs actually using UTF-8 is an idea for Paul's next draft. Just for fun I test it here.

I don't know GNU recode so far - my UTF-8 script would only recode characters with a 1:1 correspondence in windows-1252. So maybe Paul would also need "fully normalized" as defined in http://www.w3.org/TR/charmod-norm/ (that's not yet ready).

Bye, Frank
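The kind of recoding discussed here, detecting a UTF-8 BOM and converting only what has a 1:1 equivalent in windows-1252, can be sketched with Python's standard codecs. This is not GNU recode and not Frank's script: the function names are hypothetical, and the fallback of writing decimal numeric character references for unmappable characters is an assumption made for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the three-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def recode_to_cp1252(data: bytes) -> bytes:
    """Recode UTF-8 bytes to windows-1252.

    Characters with a windows-1252 equivalent map 1:1; anything else is
    written as a decimal character reference (an assumed fallback).
    """
    text = data.decode("utf-8-sig")  # drops a leading BOM if one is present
    return text.encode("cp1252", errors="xmlcharrefreplace")


sample = codecs.BOM_UTF8 + "Zürich, 東京".encode("utf-8")
recoded = recode_to_cp1252(sample)
print(has_utf8_bom(sample), recoded)
```

The BOM check costs three bytes of inspection, which is why a mandatory BOM makes UTF-8 drafts easy to identify. The fallback choice is where tools differ, and it is also where normalization would start to matter.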
Re: I-D file formats and internationalization
From: Masataka Ohta [EMAIL PROTECTED]

> A fundamental problem .. of Unicode is that, most people can't recognize most of local characters. With the internationalized context of IETF, it is prohibitively impolite to spell names of people in a way not recognizable by others.

This is a good point; the way to handle it would be to print the person's name in transliterated ASCII (e.g. Masataka Ohta) on top, and in local script (Kanji for you, I assume - I don't know if your name uses Kana) below.

Although now that I make this point, I think it just points out that this is a silly idea - since we have to have the name in ASCII anyway, adding it in the local script is just bowing to nationalism and/or personal issues (for those who would feel slighted if we leave it out), and I would have hoped we were beyond that - but clearly not.. But I guess it does no harm to add it.

Noel
Re: I-D file formats and internationalization
On Thu, Dec 01, 2005 at 03:44:35PM -0800, Hallam-Baker, Phillip wrote:

> I don't think that the term 'authoritative' has much utility. The version I want is the one most likely to be trustworthy.
>
> The early church had a series of battles deciding which text should be considered canonical. And pretty unpleasant ones at that. If you have the medieval view of knowledge resulting uniquely from divine authority such disputes might make sense.

That's competent rhetoric. It doesn't address the actual state of affairs, but it reads well and is inflammatory. Nice work.

RFCs have authoritative versions for a couple reasons. Some are the result of the IETF consensus process and the exact wording on which consensus was achieved is important to know. There are a significant number of cases where small changes in the wording of an RFC or section thereof would not achieve consensus.

To the extent that the internet community agrees to abide by that consensus, the authoritative version is what was agreed to. For these RFCs the authority due them stems from the consensus process.

Now, the IETF consensus process is performed by humans, and therefore there will be mistakes. In principle, however, it is this process that conveys any authority on an RFC and not some imperative handed down from an oligarchy, tyrant, or supernatural power. I don't think your analogy to the(?) church is particularly illuminating for that reason.

If you have issues with the IETF consensus process and the quality of standards (and other documents) that process produces, I'm all ears.

--
Ted Faber
http://www.isi.edu/~faber
PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
RE: I-D file formats and internationalization
From: Ted Faber [mailto:[EMAIL PROTECTED] That's competent rhetoric. It doesn't address the actual state of affairs, but it reads well and is inflammatory. Nice work. It's an effective means of making a point in a memorable fashion. What we are arguing for here is that the IETF should choose a communication medium that makes its point in an effective manner. RFCs have authoritative versions for a couple reasons. Some are the result of the IETF consensus process and the exact wording on which consensus was achieved is important to know. There are a significant number of cases where small changes in the wording of an RFC or section thereof would not achieve consensus. Agreed, but I have yet to see an example that would be affected by the document format. To the extent that the internet community agrees to abide by that consensus, the authoritative version is what was agreed to. For these RFCs the authority due them stems from the consensus process. Ah here you make the mistake of thinking that the IETF community is the Internet community. Perhaps forty years ago, but certainly not today. The IETF does not make any effort to be representative of the Internet community. Now, the IETF consensus process is performed by humans, and therefore there will be mistakes. In principle, however, it is this process that conveys any authority on an RFC and not some imperative handed down from an oligarchy, tyrant, or supernatural power. I don't think your analogy to the(?) church is particularly illuminating for that reason. The analogy is very apt as your equation of the IETF with the Internet community demonstrates. IETFers behave like cardinals and priests far too often. The lack of a pope does not damage the analogy in any way. The early church was governed by independent bishops. The papacy was established after an external authority (the emperor) got tired of their infighting, doctrinal disputes etc.
Only one of the original four doctors of the church was Bishop of Rome and the supremacy was never accepted by the Eastern church.
IETF vs. global politics (was RE: I-D file formats and internationalization)
Phillip Hallam-Baker writes... Ah here you make the mistake of thinking that the IETF community is the Internet community. Perhaps forty years ago, but certainly not today. The IETF does not make any effort to be representative of the Internet community. I beg to differ. I think the IETF makes a very reasonable effort by its extraordinary attempts at openness and inclusiveness: no formal membership requirements, no voting status, no government-appointed delegates, freely available work-in-progress and final documents, decisions made on the mailing lists, no requirement to travel to face-to-face meetings to participate, streaming audio of meeting sessions, jabber rooms, and the list goes on. It seems to me that the primary objections of those who feel disenfranchised surround a specific set of existing IETF behaviors: conducting all its business in English and using US-ASCII for documents (instead of something that supports non-Latin character sets), and a set of alleged IETF behaviors: being somehow insensitive to the economic, social, political and cultural aspirations of non-native English speakers, and having some hidden agenda to retain US hegemony of the Internet. I think that's a lot of stuff and nonsense. Certainly there is room for inclusion of non-Latin character sets in documents for items such as names and addresses of contributors. If the suggestion is effectively that the IETF needs to conduct its business like the UN, with simultaneous translation into six languages, I think that's impractical, and probably detrimental.
Re: I-D file formats and internationalization
On Fri, Dec 02, 2005 at 11:57:43AM -0800, Hallam-Baker, Phillip wrote: From: Ted Faber [mailto:[EMAIL PROTECTED] RFCs have authoritative versions for a couple reasons. Some are the result of the IETF consensus process and the exact wording on which consensus was achieved is important to know. There are a significant number of cases where small changes in the wording of an RFC or section thereof would not achieve consensus. Agreed, but I have yet to see an example that would be affected by the document format. As far as pagination, no. But I didn't get the impression that your analogy was about the illumination, but about the text. For the record, I think an authoritative plain text document with a sufficiently rich character encoding to express authors' names is a reasonable RFC format. I think that generating those from a standard content markup language is a good idea and that the generating markup should be made commonly available to people revising documents. I expect to be driven to saying that again in 6 months. To the extent that the internet community agrees to abide by that consensus, the authoritative version is what was agreed to. For these RFCs the authority due them stems from the consensus process. Ah here you make the mistake of thinking that the IETF community is the Internet community. Perhaps forty years ago, but certainly not today. Point taken. In my message s/Internet community/IETF community/g. That's what I mean. I did not intend for those communities to become conflated. The IETF does not make any effort to be representative of the Internet community. 1) They do too. Hmmm. I would have thought proof by assertion would be more fun. Seriously, you can argue that the IETF is failing to reach the stakeholders it claims to represent, but I think it's disingenuous to say that the group doesn't try to reach them. 
There are low barriers to entry for interested parties, and concentrated efforts to find and coordinate with other networking standards bodies. Those aren't the actions of a group of ostriches. 2) Internet community is a noun that can't meaningfully be used with a definite article anymore. There are lots of Internet communities from webcomics artists through political bloggers and ebay merchants. The question of who makes up the IETF's constituency is a good one, though. I think there are significant numbers of Internet network engineers for whom the body still reflects consensus and who find the working dialogs conducted there valuable. I suspect there are folks who'll argue the opposite. I think most IETF participants think interested and interesting factions are represented, or they'd stop coming. A fair number of entities who are not participants also are interested in what the IETF publishes, even if their level of involvement does not extend to taking part in making the sausage. (By sausage I mean IETF standards documents.) Now, the IETF consensus process is performed by humans, and therefore there will be mistakes. In principle, however, it is this process that conveys any authority on an RFC and not some imperative handed down from an oligarchy, tyrant, or supernatural power. I don't think your analogy to the(?) church is particularly illuminating for that reason. The analogy is very apt as your equation of the IETF with the Internet community demonstrates. IETFers behave like cardinals and priests far too often. Though you've been misled by my poor substitution (Internet != IETF), my intent was not to indicate that I believe that the IETF speaks for anyone but the IETF. I would have hoped that the rest of my message made that clear, but I'm happy to say it explicitly: IETF standards documents reflect the consensus of the IETF participants at the time of submission to the publication queue. 
People who believe an IETF standards document represents other things are misinformed. If our only point of contention is that you believe I was misinformed, let me assure you I'm not. The lack of a pope does not damage the analogy in any way. The early church was governed by independent bishops. The papacy was established after an external authority (the emperor) got tired of their infighting, doctrinal disputes etc. Only one of the original four doctors of the church was Bishop of Rome and the supremacy was never accepted by the Eastern church. I still don't see how that applies to the IETF, unless it's something to do with all that IETF restructuring that I'm ignoring. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
Re: I-D file formats and internationalization
On Wed, 2005-11-30 at 18:29 -0800, Paul Hoffman wrote: No escape mechanism is needed. Non-displayable characters are still in the RFC, they simply can't be displayed by everyone (but they can be displayed by many). This is infinitely simpler, and a much better long-term solution, than an escape mechanism. Further, there would be no more ASCII version to be authoritative. The Internet Draft clearly says that there is a single RFC, and it has a single encoding. Why do you think there is a problem using all possible characters in an ID, but not in an RFC? Why would it be okay for the RFC not to be readable for some, but then ensure the ID is limited to ASCII? I liked the idea that Frank suggested, use the HTML escape sequence to declare the unicode character. This allows the ASCII version to remain authoritative. ... as well as unreadable and unsearchable using normal search mechanisms. The purpose of the proposal is to allow RFCs to be readable and searchable using the encoding that is common on the Internet, without resorting to sorta-kinda-HTML or an escape mechanism. Remaining with plain ASCII would be better than either of the latter. The suggestion of the HTML escape would ensure readability. As only ASCII would be used, there would be no issues related to searching. It would allow an alternative display that remains compatible with an ASCII limitation as the authoritative version. UTF-8 use would require additional considerations regarding searching however. -Doug ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
On 11/30/05, Bob Braden [EMAIL PROTECTED] wrote: This issue has been brought up before and has been on our list of things to worry about for at least two years. But we always run aground on the following consideration: there is a substantial constituency for having some least-common-denominator form of IETF documents, so people can read and print them with even the most primitive, old-fashioned (unfashionable?) tools. I agree that basic console tools should be all that's required to view the documents. That said, the current format is not easy to print with proper pagination on Microsoft Windows, so it is not perfect (as you know). I would characterize the current situation this way: "Some author names and Unicode protocol parameters are garbled for all viewers." I think it's reasonable to shoot for "Some author names and Unicode protocol parameters are garbled for some viewers." Allowing extended character sets for author names seems like a really nice idea to the RFC Editor as well, but we see no way to do that and keep the LCD. Hmm. What is the LCD? Is it NetBSD circa 1993? Is it a PDP-11? At what point does internationalization trump (vociferous?) users of long obsolete platforms? You need some kind of structured document that some people won't have the tools to display, search, print, ... The RFC Editor would welcome a way out of this dilemma. http://www.cl.cam.ac.uk/~mgk25/ucs/examples/quickbrown.txt So, which viewing tools fall over on that text file and need to be supported? I just tried it in Emacs, where most of the languages displayed correctly (I don't have East Asian fonts, so I got some squares), and Firefox displayed all of it correctly. -- Robert Sayre I would have written a shorter letter, but I did not have the time.
Re: I-D file formats and internationalization
On 11/30/05, Jeroen Massar [EMAIL PROTECTED] wrote: You can of course publish drafts and RFC's as XML which supports any character set you want. AFAIK, RFC2026 still applies: the ASCII text version is the definitive reference One can always start translating RFC's: This is true. But it has nothing to do with my original message, which suggested that the format for the definitive reference be changed to UTF-8 text documents. This is a simple, backward compatible change. Artwork and natural languages are interesting topics that could probably be debated separately. -- Robert Sayre
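The "backward compatible" claim above can be checked directly. The following is a minimal sketch, not part of the original message: the sample header text and the function names are illustrative assumptions. It shows that an ASCII-only document is already a valid UTF-8 file byte for byte, and that a non-ASCII author name only adds bytes for the non-ASCII letters.

```python
# Sketch only: sample strings and helper names are hypothetical.

def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True if an ASCII-only text encodes to identical bytes in both encodings."""
    return text.encode("ascii") == text.encode("utf-8")


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


# An existing ASCII document needs no conversion to be read as UTF-8.
ascii_header = "Network Working Group\nCategory: Best Current Practice\n"

# A name with two non-ASCII letters (a-umlaut and o-umlaut).
author = "Patrik F\u00e4ltstr\u00f6m"

if __name__ == "__main__":
    print(same_bytes_in_ascii_and_utf8(ascii_header))
    # 16 characters; each umlaut takes 2 bytes in UTF-8, so 18 bytes total.
    print(len(author), utf8_length(author))
```

Tools that treat the file as opaque bytes keep working on the ASCII portions; only the multi-byte sequences depend on UTF-8 support in the viewer or printer.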
Re: I-D file formats and internationalization
At 12:15 AM -0800 12/1/05, Douglas Otis wrote: Why do you think there is a problem using all possible characters in an ID, but not in an RFC? I don't. I simply believe that, given the way the IETF deals with process changes, it is easier to change one process at a time. Why would it be okay for the RFC not to be readable for some, but then ensure the ID is limited to ASCII? I don't. It makes good sense to have both publication formats be the same, but getting that to happen at the same time may be too much of a hurdle for the IETF. The suggestion of the HTML escape would ensure readability. Fully disagree. Listing an author as Patrik F&auml;ltstr&ouml;m is not readable. As only ASCII would be used, there would be no issues related to searching. Fully disagree. People would have to search using the escaped-HTML kludge, even though our documents are plain text. It would allow an alternative display that remains compatible with an ASCII limitation as the authoritative version. It would mix escaped HTML into text/plain documents. It would also make RFCs that talk about HTML extremely difficult to read, because the reader would not know if entities in examples are supposed to be the escaped or unescaped versions. UTF-8 use would require additional considerations regarding searching however. Please list those; they would be valuable for the Internet Draft. --Paul Hoffman, Director --VPN Consortium
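The two objections above (searching and ambiguity in documents about HTML) can be illustrated with a short sketch. It is an illustration under assumptions, not anything proposed in the thread: the sample lines and function names are hypothetical, and Python's standard `html.unescape` stands in for whatever conversion step a reader's tool would apply.

```python
import html

# Hypothetical line from an ASCII-only document that uses HTML named
# entities for non-ASCII letters.
ESCAPED_LINE = "Author: Patrik F&auml;ltstr&ouml;m"

# The name as a reader would type it into a search box.
QUERY = "F\u00e4ltstr\u00f6m"

# Hypothetical line from a document that discusses HTML itself.
HTML_EXAMPLE = "Write &amp; to get a literal ampersand."


def found_in_raw(text: str, query: str) -> bool:
    """Plain substring search, as grep or a text editor would do."""
    return query in text


def found_after_unescape(text: str, query: str) -> bool:
    """Search after decoding HTML entities back to Unicode characters."""
    return query in html.unescape(text)


if __name__ == "__main__":
    print(found_in_raw(ESCAPED_LINE, QUERY))          # plain search misses the name
    print(found_after_unescape(ESCAPED_LINE, QUERY))  # found only after decoding
    # Blanket decoding also rewrites the HTML example, losing its meaning.
    print(html.unescape(HTML_EXAMPLE))
```

A plain search only succeeds if the reader types the escaped form, and a tool that decodes entities everywhere cannot tell an escaped author name from an example that is meant to show the entity literally.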
Re: I-D file formats and internationalization
On Dec 1, 2005, at 8:34 AM, Paul Hoffman wrote: The suggestion of the HTML escape would ensure readability. Fully disagree. Listing an author as Patrik F&auml;ltstr&ouml;m is not readable. The suggestion was for an alternate field in the XML input file to contain non-ASCII versions of authors and titles of references used to augment the ASCII version when the output form allows. The idea of an escape mechanism was to allow the definition of non-ASCII characters elsewhere as perhaps could be needed to clarify protocol related issues. It would allow an alternative display that remains compatible with an ASCII limitation as the authoritative version. It would mix escaped HTML into text/plain documents. It would also make RFCs that talk about HTML extremely difficult to read, because the reader would not know if entities in examples are supposed to be the escaped or unescaped versions. This would require a convention with respect to what gets converted. How often are numeric HTML Unicode characters used? If this appears to be problematic, then simply retain the escape sequence in all output forms. UTF-8 use would require additional considerations regarding searching however. Please list those; they would be valuable for the Internet Draft. You talk about some of the issues in your draft. Even when just ASCII is used, there is difficulty discerning differences between characters, where one's ability is largely determined by familiarity with a particular font style. Cyrillic could be an example of there being more than one character-repertoire that may be used. Going from 94+ characters to thousands is obviously more of a challenge for those wishing to pose a search. Perhaps within a few years, as Unicode becomes more ubiquitous and such issues have been resolved with better tools, resistance against full adoption may be less. In my view, the place to start would be with the ID and alternative output forms, and not the ASCII RFC. 
It seems regimenting where exceptions are allowed makes sense, where an authoritative ASCII document is retained for now. At some point in time, you will be right. -Doug
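The search difficulties mentioned above can be made concrete. The sketch below is an illustration only: the choice of characters is an assumption, and the second half (composed versus decomposed forms) is a related search issue that the message does not name explicitly. It uses Python's standard `unicodedata` module.

```python
import unicodedata

# Two letters that look alike in many fonts but are different characters.
LATIN_A = "a"            # U+0061 LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"    # U+0430 CYRILLIC SMALL LETTER A

# The same visible name in precomposed form and in decomposed form
# (base letter followed by a combining diaeresis).
COMPOSED = "F\u00e4ltstr\u00f6m"
DECOMPOSED = unicodedata.normalize("NFD", COMPOSED)


def same_after_nfc(left: str, right: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    nfc_left = unicodedata.normalize("NFC", left)
    nfc_right = unicodedata.normalize("NFC", right)
    return nfc_left == nfc_right


if __name__ == "__main__":
    # Look-alike letters from different scripts never match.
    print(LATIN_A == CYRILLIC_A, unicodedata.name(CYRILLIC_A))
    # Naive comparison fails on differently composed text ...
    print(COMPOSED == DECOMPOSED, len(COMPOSED), len(DECOMPOSED))
    # ... but normalizing both sides first makes the search succeed.
    print(same_after_nfc(COMPOSED, DECOMPOSED))
    # Normalization does not merge look-alikes across scripts.
    print(same_after_nfc(LATIN_A, CYRILLIC_A))
```

Normalization handles the composed/decomposed case mechanically; the look-alike case has no such fix and depends on the reader knowing which repertoire the author used.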
Re: I-D file formats and internationalization
On Nov 30, 2005, at 2:54 PM, Frank Ellermann wrote: As Bob said raw UTF-8 characters won't fly with `cat rfc4567 >/dev/lpt1` and other simple uses of RFCs. 1. I wonder what proportion of the population prints things this way? 2. If the file is correctly encoded in UTF-8 and the above doesn't work, then your operating system is buggy. I print stuff with non-ASCII characters all the time on Linux, Windows, and Macintosh computers. Granted, I don't use cat >/dev/anything to do it. -Tim
Re: I-D file formats and internationalization
On Nov 30, 2005, at 2:54 PM, Frank Ellermann wrote: As Bob said raw UTF-8 characters won't fly with `cat rfc4567 >/dev/lpt1` and other simple uses of RFCs. 1. I wonder what proportion of the population prints things this way? 2. If the file is correctly encoded in UTF-8 and the above doesn't work, then your operating system is buggy. I guess that depends on your definition of buggy. The most popular OS in the world no longer natively supports printing of _any_ kind of flat files - you have to have a special application to do that. While most people would agree that that OS is buggy, its inability to print flat files was a deliberate design choice and is therefore more properly termed a crippling feature. Also, the vast majority of printers in use don't natively support printing of utf-8, thus forcing users to layer each of their computer systems with more and more buggy cruft just to do simple tasks like printing plain text. Perhaps those are buggy also? These days, your best bet for getting utf-8 files to print is to use a web browser's print command, which is doable but can be fairly cumbersome as compared to typing a simple lpr command. Unfortunately, most web browsers fail to preserve page breaks (FF characters) when printing flat text files, which makes the resulting documents hard to read. HTML with utf-8 actually displays and prints more portably than plain text with utf-8, though it's not clear how many browsers support the style sheet extensions enough to print page breaks in the right places. Also, HTML is still somewhat of a moving target and it is somewhat unclear whether any particular subset of HTML that is used today will still be effectively presented 10-20 years from now. 
The biggest problems with HTML are (a) no way to include images in the document without external links (yes I know about MHTML but it's not as widely supported); (b) difficulty in finding authoring tools that will produce output in a subset of HTML that we define; (c) avoiding the temptation to make the documents pretty rather than readable. It's hard to escape the conclusion that we're trying very hard to make our document processing much more complex for a very marginal gain. Keith
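The page breaks discussed above are ordinary form feed (FF, U+000C) characters embedded in the text file. As a minimal sketch of what a tool must do to honor them (the sample text and the function name are illustrative assumptions, not taken from any RFC tool), the pages can be recovered by splitting on that character:

```python
FORM_FEED = "\f"  # U+000C, the page-break character in paginated text files


def split_pages(text: str) -> list[str]:
    """Split plain text into pages at form feed characters.

    Empty pages (for example after a trailing form feed) are dropped.
    """
    pages = text.split(FORM_FEED)
    return [page.strip("\n") for page in pages if page.strip()]


# Hypothetical three-page document.
SAMPLE = "Page one text\n\fPage two text\n\fPage three text\n"

if __name__ == "__main__":
    for number, page in enumerate(split_pages(SAMPLE), start=1):
        print(number, repr(page))
```

A printing path that ignores the form feeds, as described for most web browsers above, loses exactly this structure, which is why page-number references stop lining up with the printed output.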
Re: I-D file formats and internationalization
On Dec 1, 2005, at 12:16 PM, Keith Moore wrote: Also, the vast majority of printers in use don't natively support printing of utf-8, thus forcing users to layer each of their computer systems with more and more buggy cruft just to do simple tasks like printing plain text. Perhaps those are buggy also? Uh, I print UTF-8 documents all the time. Normally I do it from the app in which I'm viewing them (word processor, web browser, RSS reader, xml editor, whatever). These days, your best bet for getting utf-8 files to print is to use a web browser's print command, which is doable but can be fairly cumbersome as compared to typing a simple lpr command. Hmm; control-P, enter. Unfortunately, most web browsers fail to preserve page breaks (FF characters) when printing flat text files, which makes the resulting documents hard to read. Turn this around; when printing HTML, the browser inserts appropriate page breaks depending on the combination of font, styling, and paper size that's in effect. This has the effect that when you're arguing about some text, you have to say Look at 5.2.1.3, 2nd para rather than Look at page 13, 2nd para. It's not clear that this is any better or worse. HTML with utf-8 actually displays and prints more portably than plain text with utf-8, though it's not clear how many browsers support the style sheet extensions enough to print page breaks in the right places. Given the above, I agree with the first half of the sentence. In fact, I am sitting behind a desk on which there's a macintosh and an Ubuntu linux box, and I wouldn't really know how to print plain-ASCII text on either of them, and when I've tried, the page breaks usually come out wrong. On either of them, I can and do print HTML effortlessly and with excellent results. Also, HTML is still somewhat of a moving target and it is somewhat unclear whether any particular subset of HTML that is used today will still be effectively presented 10-20 years from now. 
I think it is crystal clear that if you stick to HTML4 Strict or Transitional, that has an excellent chance of survival at least for decades. The biggest problems with HTML are (a) no way to include images in the document without external links (yes I know about MHTML but it's not as widely supported); (b) difficulty in finding authoring tools that will produce output in a subset of HTML that we define; (c) avoiding the temptation to make the documents pretty rather than readable. I grant problem (a). (b) and (c) can be solved using automated tools and compulsory stylesheets (or by using xml2rfc). It's hard to escape the conclusion that we're trying very hard to make our document processing much more complex for a very marginal gain. There are two large populations for whom the gain is not marginal. 1. Those, like me, who can't print ASCII files easily 2. Those whose names can't be spelled properly in ASCII. I claim that both those groups either already do, or will soon, constitute large majorities of the interested population. -Tim ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
On Dec 1, 2005, at 12:16 PM, Keith Moore wrote: Also, the vast majority of printers in use don't natively support printing of utf-8, thus forcing users to layer each of their computer systems with more and more buggy cruft just to do simple tasks like printing plain text. Perhaps those are buggy also? Uh, I print UTF-8 documents all the time. Normally I do it from the app in which I'm viewing them (word processor, web browser, RSS reader, xml editor, whatever). The point is that the apps have to support utf-8 or leverage OS support for UTF-8, in order to print those documents. Because the printer doesn't support UTF-8, it's not as simple as simply sending the characters to a printer anymore. And while this might work fine for you, and seem like a transparent change, it's inherently a much more fragile setup. (Which isn't really an argument against UTF-8, just a digression about what is or is not buggy.) These days, your best bet for getting utf-8 files to print is to use a web browser's print command, which is doable but can be fairly cumbersome as compared to typing a simple lpr command. Hmm; control-P, enter. Well, sure, if you spend all of your time inside a particular web browser. Unfortunately, most web browsers fail to preserve page breaks (FF characters) when printing flat text files, which makes the resulting documents hard to read. Turn this around; when printing HTML, the browser inserts appropriate page breaks depending on the combination of font, styling, and paper size that's in effect. This has the effect that when you're arguing about some text, you have to say Look at 5.2.1.3, 2nd para rather than Look at page 13, 2nd para. It's not clear that this is any better or worse. It's worse, because you really do want page numbers for when you print the document and it's quite natural to reference them. 
IMHO you really want the HTML for an RFC to preserve pagination and page numbers and make them visible (but not annoyingly so) even in a browser, while causing page breaks when printed and still printing correctly on either us-letter or a4 paper. But I'm not sure that this can actually be done. HTML with utf-8 actually displays and prints more portably than plain text with utf-8, though it's not clear how many browsers support the style sheet extensions enough to print page breaks in the right places. Given the above, I agree with the first half of the sentence. In fact, I am sitting behind a desk on which there's a macintosh and an Ubuntu linux box, and I wouldn't really know how to print plain-ASCII text on either of them, and when I've tried, the page breaks usually come out wrong. Sometimes this is because there are no FFs in the source document (especially for internet-drafts) and sometimes this is because the app that you're using to print doesn't respect them (most web browsers botch this). OTOH, the command-line apps tend to do this right - not because they are tied to a command-line but because they don't have to deal with moby GUI libraries. The biggest problems with HTML are (a) no way to include images in the document without external links (yes I know about MHTML but it's not as widely supported); (b) difficulty in finding authoring tools that will produce output in a subset of HTML that we define; (c) avoiding the temptation to make the documents pretty rather than readable. I grant problem (a). (b) and (c) can be solved using automated tools and compulsory stylesheets (or by using xml2rfc). Well, sure, if we can demand that everyone use the same tools we can define the file format however we want. What we want is to make the RFCs editable and displayable with existing tools. Compulsory stylesheets don't make problem (c) go away - they just move part of the problem. 
Plain text has the nice attribute that it encourages you to concentrate on substance rather than appearance. It's hard to escape the conclusion that we're trying very hard to make our document processing much more complex for a very marginal gain. There are two large populations for whom the gain is not marginal. 1. Those, like me, who can't print ASCII files easily I suspect it's not because you can't do so, but rather because you want to be able to do so without changing the tools you use. (after all, lpr works fine to print text files on every UNIX/linux system I've seen and also on my Mac) Guess what, nobody else wants to change tools either - and different kinds of people need different tools. Changing the file format won't solve this problem, it will just move it. 2. Those whose names can't be spelled properly in ASCII. It's a valid concern, but surely it's more important to communicate the protocol specification clearly than the authors' names? I certainly don't object to making the native spellings of authors' names visible if this can be done for free, but any change
RE: I-D file formats and internationalization
Behalf Of Tim Bray Unfortunately, most web browsers fail to preserve page breaks (FF characters) when printing flat text files, which makes the resulting documents hard to read. Turn this around; when printing HTML, the browser inserts appropriate page breaks depending on the combination of font, styling, and paper size that's in effect. This has the effect that when you're arguing about some text, you have to say Look at 5.2.1.3, 2nd para rather than Look at page 13, 2nd para. It's not clear that this is any better or worse. This is actually a rediscovery of something the medieval scribes discovered (and Tim almost certainly knows more on this than I do) Early codex manuscripts (books with pages) have foliation, not pagination. That is there is one number per sheet of paper rather than a separate number for each side. The original purpose of the marks was probably to assist the production of the books rather than allow citations but there is some use of citations. Once people started to use foliation for citations they started to see the disadvantages. Editions of the bible were particularly problematic once people started attempting to cross reference translations back to the original text. This was a particular problem with the old testament as some parts of the vulgate are actually translations of translations. The solution was to refer to scripture by chapter and verse. In HTML there are three separate options for creating citations. You can use style sheet pagination if you must but citations will be extremely fragile and can be broken by any edit to the text. You can also use section.subsection etc references which are more robust. The best method though is to label each section with a name. That allows citations to survive from one version of the document to another even if sections are added or deleted. On a point of information, most of the references I see in existing RFCs are to sections in any case. 
Phill
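The difference between numbered and named section references described above can be sketched in a few lines. This is an illustration under assumptions: the section names and both document versions are hypothetical, and the numbering function stands in for whatever a formatter does when it renders the document.

```python
def number_sections(names: list[str]) -> dict[str, str]:
    """Assign section numbers by position, as a formatter would on output."""
    return {name: str(index) for index, name in enumerate(names, start=1)}


def section_at(names: list[str], number: str) -> str:
    """Resolve a numeric citation such as 'Section 2' to a section name."""
    return names[int(number) - 1]


# Two hypothetical versions of one document; version 2 inserts a section.
VERSION_1 = ["introduction", "encoding", "security"]
VERSION_2 = ["introduction", "terminology", "encoding", "security"]

if __name__ == "__main__":
    # A citation written against version 1 as "Section 2" meant "encoding".
    print(section_at(VERSION_1, "2"))
    # The same number now points somewhere else.
    print(section_at(VERSION_2, "2"))
    # A citation by label still resolves; only its rendered number changes.
    print(number_sections(VERSION_1)["encoding"],
          number_sections(VERSION_2)["encoding"])
```

Page-number citations are more fragile still, since they can shift with any edit or change of paper size, not only with an added or removed section.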
Re: I-D file formats and internationalization
Once people started to use foliation for citations they started to see the disadvantages. Editions of the bible were particularly problematic once people started attempting to cross reference translations back to the original text. This was a particular problem with the old testament as some parts of the vulgate are actually translations of translations. The solution was to refer to scripture by chapter and verse. By the time RFCs take on the significance of scripture (and are as widely corrupted as scripture) I'm sure we'll all be using section numbers :) In HTML there are three separate options for creating citations. You can use style sheet pagination if you must but citations will be extremely fragile and can be broken by any edit to the text. Anyone who tries to use a page number from RFC 1342 to refer to text in RFC 2047 deserves whatever he gets. Keith ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: On a point of information, most of the references I see in existing RFCs are to sections in any case. I suspect this is because almost everyone refers to an HTML version in informal communication. But, I actually agree with Keith that keeping the format as a text file is the right thing to do. I agree with Tim about internationalization and printing reality, but think UTF-8 text files would be the best route. I don't like the idea of using HTML, because it breaks up the document and allows bitmap illustrations. I think Keith is spot-on when he says the text format encourages clear thinking. In the WG I frequent, I've found that the worst and most complicated ideas usually come with elaborate illustrations. Anyway, I'm still not clear on what must-have software is preventing the IETF from using UTF-8. My linux systems allow me to write Thai and Katakana in vi. Guess it must be the printing. I haven't owned a printer since 1998, so I find it hard to understand why some consider printing to be a frequent and important task. -- Robert Sayre I would have written a shorter letter, but I did not have the time.
RE: I-D file formats and internationalization
Robert, See below... -- -Original Message- -- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] -- On Behalf Of Robert Sayre -- Sent: Thursday, December 01, 2005 5:38 PM -- To: Hallam-Baker, Phillip -- Cc: [EMAIL PROTECTED]; Keith Moore; Tim Bray; ietf@ietf.org -- Subject: Re: I-D file formats and internationalization -- -- On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: -- -- On a point of information, most of the references I see -- in existing RFCs are to sections in any case. -- -- I suspect this is because almost everyone refers to an HTML -- version in informal communication. Actually, at least for my own part, the reason is simply that I am creating the text without any awareness of page numbers. Whether you use nroff or XML, the document is not broken into pages until you process it. Consequently, for at least some people, the only reasonable references are to section numbers - and even that gets messy if you need to re-organize text. -- But, I actually agree with Keith that keeping the format as -- a text file is the right thing to do. I agree with Tim about -- internationalization and printing reality, but think UTF-8 -- text files would be the best route. -- -- I don't like the idea of using HTML, because it breaks up -- the document and allows bitmap illustrations. I think Keith -- is spot-on when he says the text format encourages clear -- thinking. Unfortunately, as many of us are aware, not everybody thinks - or even comprehends - the same way. For some people, text is explicitly problematic; however, almost anybody is able to understand simple diagrams - at least, once they are explained. -- In the WG I frequent, I've found that the worst and most -- complicated ideas usually come with elaborate illustrations. Yes, but sometimes this is not causal in the way that you think it is. In some cases, an elaborate illustration is justified by the complicated ideas it helps to explain and not the other way around.
-- -- Anyway, I'm still not clear on the what must-have software is -- preventing the IETF from using UTF-8. My linux systems allow me to -- write Thai and Katakana in vi. Guess it must be the printing. I -- haven't owned a printer since 1998, so I find it hard understand why -- some consider printing to be a frequent and important task. :-) -- -- -- -- -- Robert Sayre -- -- I would have written a shorter letter, but I did not have -- the time. I love this quote. Too bad it's not attributed. Did you make it up yourself? I'd like to use it sometime... -- -- ___ -- Ietf mailing list -- Ietf@ietf.org -- https://www1.ietf.org/mailman/listinfo/ietf -- ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: I-D file formats and internationalization
-Original Message- From: Robert Sayre [mailto:[EMAIL PROTECTED] Sent: Thursday, December 01, 2005 5:38 PM To: Hallam-Baker, Phillip Cc: Tim Bray; Keith Moore; [EMAIL PROTECTED]; ietf@ietf.org Subject: Re: I-D file formats and internationalization On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: On a point of information, most of the references I see in existing RFCs are to sections in any case. I suspect this is because almost everyone refers to an HTML version in informal communication. Why do you consider the TXT version to be authoritative if as you admit the HTML version is the one that is read by reviewers and readers? Meaning is the result of usage. ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: On a point of information, most of the references I see in existing RFCs are to sections in any case. I suspect this is because almost everyone refers to an HTML version in informal communication. But, I actually agree with Keith that keeping the format as a text file is the right thing to do. Actually I don't think that UTF-8 plain text is the right thing to do. I think we should stick with ASCII for now, as the benefits of UTF-8 _for our particular purposes_ aren't compelling enough, and the ability to read and print UTF-8 in the field is still significantly worse than the ability to read and print ASCII. In a few years, perhaps, we should move to UTF-8, but by then I suspect we'd be better off doing (M)HTML than plain text. OS support for plain text files seems to be getting worse over time rather than better. Or maybe we'll just continue to have PDF versions of the RFCs but allow them to contain non-Latin characters for authors' names. (It seems our current rules more or less let us do that already; it would just require an extension to xml2rfc to allow such names to be specified and some changes to the document production toolchain to permit them to be included in the PDF version.) Keith
Re: I-D file formats and internationalization
Why do you consider the TXT version to be authoritative if as you admit the HTML version is the one that is read by reviewers and readers? I don't think that's actually true. The TXT versions are not only authoritative, they're also the ones available from official sources. And personally I find most of the HTML versions of RFCs quite annoying to read because of distracting embellishments. ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: I-D file formats and internationalization
Phillip, Two things: 1) Robert was speculating as to the reason why people use chapter and verse rather than pages in their references, and 2) He said informal communication. There is something a bit informal about referring to an informal communication as authoritative. There is something equally casual about inferring that what people read is necessarily authoritative. Usage is that typically it is what people refer to when they have a question, that is considered authoritative. Otherwise, Executive Summaries would be considered to be authoritative and this would beg the questions - why bother including the rest? -- Eric -- -Original Message- -- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] -- On Behalf Of Hallam-Baker, Phillip -- Sent: Thursday, December 01, 2005 6:07 PM -- To: Robert Sayre -- Cc: [EMAIL PROTECTED]; Keith Moore; Tim Bray; ietf@ietf.org -- Subject: RE: I-D file formats and internationalization -- -- -- -- -Original Message- -- From: Robert Sayre [mailto:[EMAIL PROTECTED] -- Sent: Thursday, December 01, 2005 5:38 PM -- To: Hallam-Baker, Phillip -- Cc: Tim Bray; Keith Moore; [EMAIL PROTECTED]; ietf@ietf.org -- Subject: Re: I-D file formats and internationalization -- -- On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: -- -- On a point of information, most of the references I see -- in existing -- RFCs are to sections in any case. -- -- I suspect this is because almost everyone refers to an HTML -- version in informal communication. -- -- Why do you consider the TXT version to be authoritative if -- as you admit -- the HTML version is the one that is read by reviewers and readers? -- -- Meaning is the result of usage. -- -- -- ___ -- Ietf mailing list -- Ietf@ietf.org -- https://www1.ietf.org/mailman/listinfo/ietf -- ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
(distro trimmed -- I assume everyone participating in this interminable discussion is on the IETF list) --On Thursday, 01 December, 2005 17:38 -0500 Robert Sayre [EMAIL PROTECTED] wrote: On 12/1/05, Hallam-Baker, Phillip [EMAIL PROTECTED] wrote: On a point of information, most of the references I see in existing RFCs are to sections in any case. I suspect this is because almost everyone refers to an HTML version in informal communication. Bad inference, I think. We have had explicit guidelines (some, I think, in various versions of Instructions to RFC Authors) that express a preference for section references. There are many ways to scan (visually) an RFC, or to find material in it online using an ASCII text editor (think ED and its clones, not just emacs/vi), that make section numbers much easier to find than page numbers and paragraphs. Section numbers are related to actual content and context, which is usually what people are trying to reference. I think that, if you went back to archives old enough that there was no HTML alternative because there was no HTML, you would still find that the overwhelming majority of references into RFCs, informal and formal, used section numbers. john
RE: I-D file formats and internationalization
I don't think that the term 'authoritative' has much utility. The version I want is the one most likely to be trustworthy. The early church had a series of battles deciding which text should be considered canonical. And pretty unpleasant ones at that. If you have the medieval view of knowledge resulting uniquely from divine authority such disputes might make sense. I think RFCs are closer to the wiki end of the scale. -Original Message- From: Gray, Eric [mailto:[EMAIL PROTECTED]] Sent: Thu Dec 01 15:24:59 2005 To: Hallam-Baker, Phillip Cc: Robert Sayre; [EMAIL PROTECTED]; Keith Moore; Tim Bray; ietf@ietf.org Subject: RE: I-D file formats and internationalization Phillip, Two things: 1) Robert was speculating as to the reason why people use chapter and verse rather than pages in their references, and 2) He said informal communication. There is something a bit informal about referring to an informal communication as authoritative. There is something equally casual about inferring that what people read is necessarily authoritative. Usage is that typically it is what people refer to when they have a question, that is considered authoritative. Otherwise, Executive Summaries would be considered to be authoritative and this would beg the questions - why bother including the rest?
Re: I-D file formats and internationalization
On Dec 1, 2005, at 3:16 PM, Keith Moore wrote: the ability to read and print UTF-8 in the field is still significantly worse than the ability to read and print ASCII. That assertion could use a little empirical backing. Empirically, there are people who find the ASCII versions easier to deal with; you are one. Empirically, there are those who simply do not observe problems with UTF-8; I am one. I could say "There are more like me than like you," and while I suspect that is correct, I have no evidence to back it up, so I won't assert it. I will state with some confidence, however, that the group of people in your position is shrinking, while that in mine is growing. And I will freely admit that I find the notion that a group of people designing global infrastructure think it's OK to use ASCII so morally and aesthetically offensive that it probably interferes with my evaluation of the trade-offs. I will now shut up. It is clearly the case that there is tremendous resistance within the IETF to leaving their comfy ASCII enclave. -Tim
Re: I-D file formats and internationalization
Robert, This is a good point. It even applies to the IETF secretariat. It used to be impossible to register with your real name if it contained non-ASCII characters. I think that has changed; I recall having seen Olafur Gudmundson's badge with the real Icelandic curly d (or whatever it is called in English) at a recent meeting. I have not seen Japanese or Chinese or Korean, which I guess would be the next logical step... Ole Ole J. Jacobsen Editor and Publisher, The Internet Protocol Journal Academic Research and Technology Initiatives, Cisco Systems Tel: +1 408-527-8972 GSM: +1 415-370-4628 E-mail: [EMAIL PROTECTED] URL: http://www.cisco.com/ipj On Tue, 29 Nov 2005, Robert Sayre wrote: I've noticed that the recent debate on the ASCII text format has often conflated formatting of artwork and Unicode support. I think finding a non-text artwork format that has free uniform authoring (including diffs) and viewer support will be impossible for the next 5-10 years. An XML equivalent to Postscript may eventually be widely implemented. The current effort, SVG, is a massive specification, unevenly implemented, and lacks a thorough test suite. Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me. So, I recommend that the IETF secretariat and the RFC Editor change their policies to allow UTF-8 text files. That way, older RFCs and I-Ds produced using the current tools would follow the same encoding. I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. Robert Sayre
Re: I-D file formats and internationalization
It was sad that people can accept this. To degrade the name of their friends. In every country this is insulting. It is good news you can type better. But this has not changed in RFC. If in the thanks section you hurt a name. The thanked person, will not be happy. Eduardo Mendez 2005/11/30, Ole Jacobsen [EMAIL PROTECTED]: Robert, This is a good point. It even applies to the IETF secretariat. It used to be impossible to register with your real name if it contained non-ASCII characters. I think that has changed, I recall having Seen Olafur Gudmundson's badge with the real Icelandic curly d (or whatever it is called in English) at a recent meeting. I have not seen Japanese or Chinese or Korean, which I guess would be the next logical step... Ole Ole J. Jacobsen Editor and Publisher, The Internet Protocol Journal Academic Research and Technology Initiatives, Cisco Systems Tel: +1 408-527-8972 GSM: +1 415-370-4628 E-mail: [EMAIL PROTECTED] URL: http://www.cisco.com/ipj On Tue, 29 Nov 2005, Robert Sayre wrote: I've noticed that the recent debate on the ASCII text format has often conflated formatting of artwork and Unicode support. I think finding a non-text artwork format that has free uniform authoring (including diffs) and viewer support will be impossible for the next 5-10 years. An XML equivalent to Postscript may eventually be widely implemented. The current effort, SVG, is a massive specification, unevenly implemented, and lacks a thorough test suite. Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me. So, I recommend that the IETF secretariat and the RFC Editor change their policies to allow UTF-8 text files. That way, older RFCs and I-Ds produced using the current tools would follow the same encoding. 
I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. Robert Sayre
RE: I-D file formats and internationalization
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Sayre Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me. So, I recommend that the IETF secretariat and the RFC Editor change their policies to allow UTF-8 text files. That way, older RFCs and I-Ds produced using the current tools would follow the same encoding. I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. It might seem odd to people whose names do fit in ASCII but there are a lot of people who care about this type of issue. In effect the message is sent out 'you do not really belong here', 'you are a second class citizen', 'the IETF is an American organization and the only people who really matter are going to be American'. The fact that Brian is English and lives in Zurich is irrelevant. People take their names very seriously and personally. It's a question of outreach. Having one meeting out of three held outside North America each year is not outreach, it is a holiday. I am currently at the W3C AC meeting. They are also involved in the ongoing 'internet governance' discussions, but the W3C is involved as a participant in the discussions while the IETF is one of the topics of the discussions. Needless to say it is better to be a participant than the topic. The W3C has avoided concern by being conspicuously international in its approach. The IETF has had the attitude 'this is the way we do things here, nobody asked you to like it'. So far 700 translations of W3C specifications have been made by volunteers. I don't know what the quality of the translations is; I would certainly be upset if one of my engineers used a translation as the basis for writing running code.
But those translations are certainly used by academics to teach comp-sci courses in those languages and a large number of students who would have found it difficult to understand the material in translation have come to understand and use the specification. It is simply a fact of modern life that the ability to speak English well is an essential qualification for almost all forms of knowledge work, particularly at the research and elite levels. That does not mean that a group of mostly English speakers should also make good English an essential qualification at the apprentice and journeyman stages of learning the craft. Phill ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
RE: I-D file formats and internationalization
Ole Jacobsen writes... This is a good point. It even applies to the IETF secretariat. It used to be impossible to register with your real name if it contained non-ASCII characters. I think that has changed, I recall having Seen Olafur Gudmundson's badge with the real Icelandic curly d (or whatever it is called in English) at a recent meeting. I have not seen Japanese or Chinese or Korean, which I guess would be the next logical step... While this is a bit of a digression, the purpose of IETF badges is (at least) two-fold. First, to show that the bearer has paid the registration fee, and second, to allow other attendees to address the bearer by name during sessions and informal conversations. If badges were in non-Latin character sets, it would make it difficult for me to address the bearer in English. It seems that there are parallels for RFCs and I-Ds. If the official language of these documents is English, then should we have portions of those documents represented in other languages, and more at issue, other character sets? In the attributions sections, one could, of course, provide a Latin character set representation in addition to the native national character set, for names, addresses, etc. ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
Phillip: I am currently at the W3C AC meeting. They are also involved in the ongoing 'internet governance' discussions but the W3C is involved a participant in the discussions while the IETF is one of the topics of the discussions. Needless to say it is better to be a participant than the topic. The only thing worse than being talked about is NOT being talked about. -- Oscar Wilde So far 700 translations of W3C specifications have been made by volunteers. Yes, VOLUNTEERS! Nobody is stopping anyone from translating the RFCs. I'd have to bet it's been done in quite a few cases. Eliot ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
For that matter, most Americans don't speak English Mark On Nov 30, 2005, at 10:43 AM, Hallam-Baker, Phillip wrote: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Sayre Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me. So, I recommend that the IETF secretariat and the RFC Editor change their policies to allow UTF-8 text files. That way, older RFCs and I-Ds produced using the current tools would follow the same encoding. I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. It might seem odd to people whose names do fit in ASCII but there are a lot of people who care about this type of issue. In effect the message is sent out 'you do not really belong here', 'you are a second class citizen', 'the IETF is an American organization and the only people who really matter are going to be American'. The fact that Brian is English and lives in Zurich is irrelevant. People take their names very seriously and personally. It's a question of outreach. Having one meeting out of three held outside North America each year is not outreach, it is a holiday. I am currently at the W3C AC meeting. They are also involved in the ongoing 'internet governance' discussions but the W3C is involved a participant in the discussions while the IETF is one of the topics of the discussions. Needless to say it is better to be a participant than the topic. The W3C has avoided concern by being conspicuously international in its approach. The IETF has had the attitude 'this is the way we do things here, nobody asked you to like it'. So far 700 translations of W3C specifications have been made by volunteers. 
I don't know what the quality of the translations are, I would certainly be upset if one of my engineers used a translation as the basis for writing running code. But those translations are certainly used by academics to teach comp-sci courses in those languages and a large number of students who would have found it difficult to understand the material in translation have come to understand and use the specification. It is simply a fact of modern life that the ability to speak English well is an essential qualification for almost all forms of knowledge work, particularly at the research and elite levels. That does not mean that a group of mostly English speakers should also make good English an essential qualification at the apprentice and journeyman stages of learning the craft. Phill ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
Hallam-Baker, Phillip wrote: SNIP previous posters text It might seem odd to people whose names do fit in ASCII but there are a lot of people who care about this type of issue. You can of course publish drafts and RFC's as XML which supports any character set you want. SNIP The IETF has had the attitude 'this is the way we do things here, nobody asked you to like it'. It would be better to state: if you don't like it, participate. Because if you didn't participate, then don't complain that it isn't like you wanted it to be. Yes that requires significant effort, time and thus cash, but that is mostly unavoidable. So far 700 translations of W3C specifications have been made by SNIP One can always start translating RFC's: 4267 RFC's and a long list of drafts to go, though there is a lot of material already translated by book authors. Note that many languages don't have translations for many English words. German is one of the good examples where they have a lot of German versions of English words, but they don't have one for 'Hyper Text Transfer Protocol' unless you babelfish it to Hyper Text-Übergangsprotokoll*, which is far from a useful translation. There is also the thing that sentence construction might cause misinterpretation from what the original working group meant. SHOULD and MUST both translate to Muß in German, which is thus not correct either. This can cause many issues. And German is somewhat in the same line as English, I am not even thinking in the area of Asian languages, which I unfortunately am far from familiar with except that they resemble small pictures of what they mean. I don't think it is a task for the IETF itself to translate documents. But it would indeed be nice to have a place for them, with a big note that they have not been verified and may include odd translations. Maybe there could be a separate 'translated documents' section? 
Greets, Jeroen * contains a U-umlaut (&Uuml;) and Ringel-S (&szlig;)
Re: I-D file formats and internationalization
Robert Sayre wrote: I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. http://tools.ietf.org/html/draft-hoffman-utf8-rfcs I really don't like this approach for various reasons. Bye, Frank ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
Re: I-D file formats and internationalization
On Nov 30, 2005, at 12:41 PM, Frank Ellermann wrote: Robert Sayre wrote: I'm sure someone has already suggested this approach, but I'll add my voice to the chorus. http://tools.ietf.org/html/draft-hoffman-utf8-rfcs I really don't like this approach for various reasons. Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests, there could be alternative UTF fields for an author's name and reference titles, and perhaps defined characters for simple line and table drawing that invoke automatic translation when an ASCII text version is generated. Being able to review the ID as it would appear as an RFC would also seem to be a requirement. It seems problematic for protocol examples to use non- ASCII characters owing to there not being ubiquitously displayable character-sets. -Doug ___ Ietf mailing list Ietf@ietf.org https://www1.ietf.org/mailman/listinfo/ietf
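One way to picture the "alternative field" idea in the message above is a record that always carries an ASCII form of a name and optionally a native-script form, with the renderer choosing between them depending on the output format. The sketch below is only an illustration under stated assumptions: the `Author` class, its field names, the `render` method, and the sample name are all hypothetical and are not part of xml2rfc or of any DTD proposed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    # Hypothetical record; the field names are assumptions for illustration.
    ascii_name: str                    # always present, safe for the ASCII text output
    native_name: Optional[str] = None  # optional "alternative" spelling in the author's own script

    def render(self, ascii_only: bool) -> str:
        """Return the name for an ASCII-only output or for a Unicode-capable one."""
        if ascii_only or self.native_name is None:
            return self.ascii_name
        # Show the native spelling first and keep the ASCII form so that
        # searches with plain ASCII tools still find the author.
        return f"{self.native_name} ({self.ascii_name})"


# Made-up sample author, not taken from any RFC.
author = Author(ascii_name="Jose Nunez", native_name="José Núñez")
print(author.render(ascii_only=True))
print(author.render(ascii_only=False))
```

The point of the design is that the ASCII rendering never changes, so existing tools keep working, while richer outputs can opt in to the native spelling.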
RE: I-D file formats and internationalization
* * Unicode support is a different matter. I find the current * IETF policy to be incredibly bigoted. Many RFCs and I-Ds are * currently forced to misspell the names of authors and * contributors, which doesn't seem like correct attribution to * me. So, I recommend that the IETF secretariat and the RFC * Editor change their policies to allow UTF-8 text files. That * way, older RFCs and I-Ds produced using the current tools * would follow the same encoding. This issue has been brought up before and has been on our list of things to worry about for at least two years. But we always run aground on the following consideration: there is a substantial constituency for having some least-common-denominator form of IETF documents, so people can read and print them with even the most primitive, old-fashioned (unfashionable?) tools. Allowing extended character sets for author names seems like a really nice idea to the RFC Editor as well, but we see no way to do that and keep the LCD. You need some kind of structured document that some people won't have the tools to display, search, print, ... The RFC Editor would welcome a way out of this dilemma. It is not bigotry, only realism. Bob Braden
Re: I-D file formats and internationalization
At 1:54 PM -0800 11/30/05, Douglas Otis wrote: http://tools.ietf.org/html/draft-hoffman-utf8-rfcs Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests, That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it. there could be alternative UTF fields for an author's name and reference titles, and perhaps defined characters for simple line and table drawing that invoke automatic translation when an ASCII text version is generated. That's a possibility (if you define what an alternative UTF field is). Why is it better than simply using UTF-8 everywhere? Being able to review the ID as it would appear as an RFC would also seem to be a requirement. That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time. It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets. Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example. --Paul Hoffman, Director --VPN Consortium
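For readers who want the distinction between "the character set is Unicode" and "the encoding is UTF-8" made concrete, here is a small sketch. It shows a code point versus its UTF-8 bytes, and also why existing ASCII documents would already be valid UTF-8. The sample strings and the helper name `is_ascii_only` are assumptions chosen for illustration.

```python
# Unicode assigns each character a number (a code point);
# UTF-8 is one way of writing that number down as bytes.
eth = "ð"                            # LATIN SMALL LETTER ETH
code_point = f"U+{ord(eth):04X}"     # the Unicode code point, as text
utf8_bytes = eth.encode("utf-8")     # the same character as UTF-8 bytes

print(code_point, utf8_bytes.hex())

# Pure ASCII text is byte-for-byte identical when encoded as UTF-8,
# so an ASCII-only document is already a valid UTF-8 document.
header = "Network Working Group"
assert header.encode("ascii") == header.encode("utf-8")


def is_ascii_only(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the file needs no UTF-8 support to read."""
    return all(b < 0x80 for b in data)


print(is_ascii_only(header.encode("utf-8")))
print(is_ascii_only("Guðmundsson".encode("utf-8")))
```

What this does not address is the display side of the argument: whether a given reader's terminal, printer, or font can actually show the decoded characters.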
Re: I-D file formats and internationalization
Douglas Otis wrote: there could be alternative UTF fields for an author's name and reference titles, For the new 3066bis language tags registry we adopted the known &#x12345; notation for U+012345. As Bob said, raw UTF-8 characters won't fly with `cat rfc4567 > /dev/lpt1` and other simple uses of RFCs. perhaps defined characters for simple line and table drawing Ugh. That's a case where I'd prefer PDF or something better. ASCII art is one thing, IMHO it's cute. But line drawing characters are a PITA: my local charset pc-multilingual-850+euro still has this, today it's just crap, let's forget it please, codepage 437 etc. and curses ACS_* are ancient history. OK, in theory Unicode has these characters, but if we'd really want this we could as well jump from plain text to MathML. It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets. Yes, OTOH I vaguely recall one of Martin's (?) texts, where his attempt to talk about non-ASCII issues in ASCII wasn't straightforward, to put it mildly, and it certainly wasn't his fault. Similar texts published by the W3C using UTF-8 are even worse from my POV (both with a pre-UTF-8 browser and otherwise, for the latter I just don't have the required fonts). Bye, Frank
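The hexadecimal character-reference notation mentioned above can be produced mechanically. The following is a minimal sketch, assuming a helper named `to_hex_refs` (a made-up name, not a function from any IETF tool), that rewrites every non-ASCII character as a hexadecimal reference so the result stays printable with ASCII-only tools.

```python
import html


def to_hex_refs(text: str) -> str:
    """Replace each non-ASCII character with an &#xHHHH; style reference."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )


original = "Hyper Text-Übergangsprotokoll"
escaped = to_hex_refs(original)
print(escaped)

# The escaped form contains only ASCII, and a reader with richer tools can
# turn it back into the original. Caveat: text that already contains a
# literal "&" would need extra care, which this sketch does not handle.
assert escaped.isascii()
assert html.unescape(escaped) == original
```

The cost, as the thread notes, is readability: a name with many non-ASCII characters becomes a run of references that a human cannot easily read in the plain-text form.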
Re: I-D file formats and internationalization
On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:

> At 1:54 PM -0800 11/30/05, Douglas Otis wrote:
>> Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests,
>
> That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it.

Unicode provides a unique number for every possible character within a current range of about 97,000 characters. These characters include punctuation marks, diacritics, mathematical and technical symbols, arrows, dingbats, etc. Displaying one of these characters requires a character-set (synonymous with a display system's font-set or character-repertoire), or using the Unicode vernacular, a script. It is not just a matter of which character is displayed or which character-repertoire is used; there are also Middle Eastern right-to-left issues.

>> there could be alternative UTF fields for an author's name and reference titles, and perhaps defined characters for simple line and table drawing that invoke automatic translation when an ASCII text version is generated.
>
> That's a possibility (if you define what an alternative UTF field is). Why is it better than simply using UTF-8 everywhere?

Such an alternative field could be an added element to the DTD or Schema defining the XML input document. When the output is other than ASCII, the alternative field could be displayed. To allow compatibility with existing tools, the ASCII version would not be affected. Permitting access to _some_ extended characters could improve the quality of some line-drawing for non-ASCII outputs. To avoid the pain-in-the-ass issue, improved drawings could be generated by a simple web-based drawing application, where the translation back into ASCII artwork would be straightforward, and yet provide comparable results. Currently, improved graphics are limited to the generation of HTML tables. The drawing application could even create the needed XML wrapper for an RFC.

>> Being able to review the ID as it would appear as an RFC would also seem to be a requirement.
>
> That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time.

As an ID becomes an RFC, it seems that expecting last-minute changes to the document would be even more daunting.

>> It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets.
>
> Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example.

Assume such characters cannot be displayed, at least not with the ASCII version that excludes the extended character-set allowed by Unicode. An escape mechanism would be needed to accommodate alternative text, where displaying '?' for the Unicode characters that extend beyond ASCII would not be a very satisfactory solution, as this would make the ASCII version less authoritative, to say the least, and break the way many use the RFC text files. I liked the idea that Frank suggested: use the HTML escape sequence to declare the Unicode character. This allows the ASCII version to remain authoritative.

-Doug
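[Editorial illustration, not part of the original message.] The escape idea mentioned at the end of the message above can be sketched in a few lines. This is only a sketch under assumptions: the thread does not say which escape form was meant, so decimal HTML numeric character references are assumed here, and the function names and the sample author name are made up for illustration.

```python
import html


def to_ascii_with_ncr(text: str) -> str:
    """Return an ASCII-only copy of text.

    Every character outside ASCII is replaced by a decimal HTML numeric
    character reference such as &#228; (assumed escape form, see lead-in).
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_ascii_with_ncr(escaped: str) -> str:
    """Turn character references back into the characters they name.

    Caveat: html.unescape also decodes references that were already in the
    input (for example a literal "&amp;"), so the round trip is only exact
    for text that contained no references to begin with.
    """
    return html.unescape(escaped)


# Hypothetical author name, used only as sample data.
author = "Jörg Müller"
escaped_author = to_ascii_with_ncr(author)

print(escaped_author)                       # ASCII-only form
print(from_ascii_with_ncr(escaped_author))  # original name restored
```

The ASCII form stays byte-for-byte displayable on any terminal, at the cost of showing `&#...;` sequences in place of the characters themselves.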
Re: I-D file formats and internationalization
At 5:59 PM -0800 11/30/05, Douglas Otis wrote:

> On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
>> At 1:54 PM -0800 11/30/05, Douglas Otis wrote:
>>> Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests,
>>
>> That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it.
>
> Unicode provides a unique number for every possible character within a current range of about 97,000 characters. These characters include punctuation marks, diacritics, mathematical and technical symbols, arrows, dingbats, etc. Displaying one of these characters requires a character-set (synonymous with a display system's font-set or character-repertoire), or using the Unicode vernacular, a script. It is not just a matter of which character is displayed or which character-repertoire is used; there are also Middle Eastern right-to-left issues.

It may be better to use a single vocabulary for discussing things such as internationalization and character sets. That's the purpose of RFC 3536.

>>> Being able to review the ID as it would appear as an RFC would also seem to be a requirement.
>>
>> That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time.
>
> As an ID becomes an RFC, it seems that expecting last-minute changes to the document would be even more daunting.

Yep, that's the tradeoff. We already make some automatic changes after an Internet Draft is approved by the IESG, and we allow others without IESG oversight. This would be another class. That scares some people, and not others. Having Internet Drafts use Unicode in UTF-8 instead of ASCII scares some people, and not others.

>>> It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets.
>>
>> Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example.
>
> Assume such characters cannot be displayed, at least not with the ASCII version that excludes the extended character-set allowed by Unicode. An escape mechanism would be needed to accommodate alternative text, where displaying '?' for the Unicode characters that extend beyond ASCII would not be a very satisfactory solution, as this would make the ASCII version less authoritative, to say the least, and break the way many use the RFC text files.

No escape mechanism is needed. Non-displayable characters are still in the RFC; they simply can't be displayed by everyone (but they can be displayed by many). This is infinitely simpler, and a much better long-term solution, than an escape mechanism. Further, there would be no more ASCII version to be authoritative. The Internet Draft clearly says that there is a single RFC, and it has a single encoding.

> I liked the idea that Frank suggested: use the HTML escape sequence to declare the Unicode character. This allows the ASCII version to remain authoritative.

... as well as unreadable and unsearchable using normal search mechanisms. The purpose of the proposal is to allow RFCs to be readable and searchable using the encoding that is common on the Internet, without resorting to sorta-kinda-HTML or an escape mechanism. Remaining with plain ASCII would be better than either of the latter.

--Paul Hoffman, Director
--VPN Consortium
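[Editorial illustration, not part of the original message.] The searchability objection above can be shown with a plain byte-level substring search, which is roughly what a simple search tool does. This is a sketch under assumptions: the escaped variant is assumed to use decimal HTML numeric character references, and the sample line, the author name, and the helper `found` are hypothetical.

```python
# Hypothetical line from a document, containing two non-ASCII characters.
document = "Author: Jörg Müller\n"

# The same line stored two ways: directly as UTF-8, and as ASCII with
# non-ASCII characters replaced by numeric character references.
as_utf8 = document.encode("utf-8")
as_escaped = document.encode("ascii", errors="xmlcharrefreplace")


def found(needle: str, haystack: bytes) -> bool:
    """Plain substring search on bytes, with the query encoded as UTF-8."""
    return needle.encode("utf-8") in haystack


# Searching for the name as a reader would naturally type it.
found_in_utf8 = found("Müller", as_utf8)
found_in_escaped = found("Müller", as_escaped)

# The escaped file only matches if the reader types the escape sequence.
found_escaped_query = found("M&#252;ller", as_escaped)

print(found_in_utf8, found_in_escaped, found_escaped_query)
```

Even in the UTF-8 case an exact match assumes that the query and the text use the same Unicode normalization form; a precomposed and a decomposed "ü" are different byte sequences.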
Internationalization by ASCII art (was Re: I-D file formats and internationalization)
Robert Sayre wrote:

> Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me.

It is your stupidity that you can't recognize people's names correctly represented in ASCII. See how widely non-ASCII domain names and mail addresses are used. However, if you insist, you may use ASCII art.

Masataka Ohta
Re: Internationalization by ASCII art (was Re: I-D file formats and internationalization)
On 11/30/05, Masataka Ohta [EMAIL PROTECTED] wrote:

> Robert Sayre wrote:
>> Unicode support is a different matter. I find the current IETF policy to be incredibly bigoted. Many RFCs and I-Ds are currently forced to misspell the names of authors and contributors, which doesn't seem like correct attribution to me.
>
> It is your stupidity that you can't recognize people's names correctly represented in ASCII.

Well, I'm not the sharpest knife in the drawer, but Google is even duller than I am. Anyway, I'm wondering what all the command line whinging is about, since I came home from work and tried some command line tools on http://www.cl.cam.ac.uk/~mgk25/ucs/examples/quickbrown.txt. I tried cat, vi/vim, more/less, and pico/nano on Ubuntu Linux 5.10.

-- Robert Sayre