Re: Accessing alternate glyphs from plain text (from Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)
Thank you for replying.

On Saturday, 7 August 2010, Doug Ewell wrote:

> I think the "alternate ending glyph" is supposed to be specified in more detail than that. The example Asmus gave was U+222A UNION with serifs. Even though the exact proportions of the serifs may differ from one font to the next, this is still a relatively precise and constrained definition, unlike "Latin small letter e with some 'alternate ending' which is completely up to the discretion of the font designer."
>
> Because of stylistic differences among calligraphers (this is a calligraphy question, not a poetry question), it is hard to imagine how this aspect of the proposal would not result in an unbounded number of glyphic variations. 'e' is not the only letter to which calligraphers like to attach special endings, and a swash cross-stroke is not the only special ending that calligraphers like to attach to 'e'.

It seems to me that there are at least two ways to have an alternate ending e. One is to extend the cross-stroke to the right beyond the e and end the extension with a flourish of some sort; another is to extend the lower line out to the right and end that extension in some way. I can imagine that a proposal would lead to wanting to choose between the two, or more, possible variants of a letter, should the font have alternate glyphs of both types. Then there is the question of what is to happen if the requested one is not available in the font: is the other alternate glyph displayed, or the basic glyph for the character?

> I'd like to see an FAQ page on "What is Plain Text?" written primarily by UTC officers. That might go a long way toward resolving the differences between William's interpretation of what plain text is, which people like me think is too broad, and mine, which some people have said is too narrow.

That is a good idea.
Thank you also for the careful precision with which you describe who thinks what. Yet is producing such a document an impossible task? Some years ago there was a suggestion in this mailing list to produce a Frequently Asked Questions (FAQ) page about what should not be encoded. Is the document that is now suggested effectively the same thing?

I thought of an analogy: trying to produce a FAQ document on "What is art?". Such a document produced in 1550 might well have been very different from one produced in 1910, both would differ from one produced in 1995, and all of those from one produced in 2010. Maybe the analogy is not perfect, but to me it suggests that if a "What is Plain Text?" document is produced, with a view to deciding what could and could not be encoded in Unicode as plain text in the future, then it could quickly become either out of date or a restriction on technological progress. The recent encoding of the emoticons shows a dramatic change from the situation some years ago in what can be encoded as plain text.

Some of my ideas have been refuted as not being suitable for encoding in plain text. Yet the refutations all seem to be based on unchangeable rules from about twenty years ago, and change is part of progress.

I remember once being referred, in this mailing list, to an ISO document about encoding. The document referred to a definition of character given within the document itself. The document was ISO/IEC TR 15285. I have found that the document is available here (the link used at the previous time no longer works).

http://openstandards.dk/jtc1/sc2/wg2/docs/TR%2015285%20-%20C027163e.pdf

The introduction includes the following.

quote
This Technical Report is written for a reader who is familiar with the work of SC 2 and SC 18. Readers without this background should first read Annex B, “Characters”, and Annex C, “Glyphs”.
end quote

Annex B has the following.
quote
In ISO/IEC 10646-1:1993, SC 2 defines a character as: A member of a set of elements used for the organisation, control, and representation of data.
end quote

On accessing alternate glyphs from plain text: as there are 256 variation selectors that could be used with each of the Latin letters, I feel that some variation sequences should be encoded so that alternate glyphs can be accessed from fonts, provided that no harm is done to those who choose not to use them.

Some readers might find the following of interest.

http://forum.high-logic.com/viewtopic.php?f=36&t=2229

It is a thread entitled "An unusual glyph of an Esperanto character in the Arno font". I had been looking through the following document.

http://store1.adobe.com/type/browser/pdfs/ARNP/ArnoPro-Italic.pdf

I had found an alternate ending glyph for the h circumflex character and had then tried to produce some text where it could be used. I felt that it was a case of typography inspiring creative writing. Readers who enjoyed th
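The figure of 256 variation selectors, and the idea that a variation sequence does no harm to software that ignores it, can be illustrated with a short Python sketch. This is an illustration only, not part of the original message: the pairing of 'e' with VARIATION SELECTOR-1 is purely hypothetical and is not claimed to be a registered variation sequence, and the helper names (`variation_selector`, `is_variation_selector`, `strip_variation_selectors`) are made up for this example. Stripping the selectors roughly mirrors what a renderer does when it does not support a sequence and shows the base glyph instead.

```python
import unicodedata

# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F;
# VS17..VS256 occupy the supplement block U+E0100..U+E01EF.
# 16 + 240 = 256 selectors in total.


def variation_selector(n: int) -> str:
    """Return the character for VARIATION SELECTOR-n (1 <= n <= 256)."""
    if 1 <= n <= 16:
        return chr(0xFE00 + n - 1)
    if 17 <= n <= 256:
        return chr(0xE0100 + n - 17)
    raise ValueError(f"no variation selector numbered {n}")


def is_variation_selector(ch: str) -> bool:
    """True if the single character ch lies in either selector range."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop every selector, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


if __name__ == "__main__":
    # Hypothetical sequence: 'e' followed by VS1 to request an
    # alternate-ending glyph. Not a registered sequence; illustration only.
    word = "th" + "e" + variation_selector(1)
    print([f"U+{ord(c):04X}" for c in word])          # ['U+0074', 'U+0068', 'U+0065', 'U+FE00']
    print(unicodedata.name(variation_selector(1)))    # VARIATION SELECTOR-1
    print(unicodedata.name(variation_selector(256)))  # VARIATION SELECTOR-256
    print(strip_variation_selectors(word))            # the
```

Because every selector lies in one of two fixed ranges, a process that does not care about glyph variants can discard them mechanically and still recover the underlying letters.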
Re: Apostrophe in transliteration
On Mon, 9 Aug 2010, Jukka K. Korpela wrote:

> It is of course transliteration standards that should say something normative about the matter. As far as I can remember, the authoritative versions of the relevant standards are the paper publications, which do not identify characters by Unicode numbers, just as ink on paper.

ISO standards have always identified the characters used for transliteration by reference to ISO 5426. German standards have always identified the characters by reference to DIN 31624; recent DIN standards identify them by reference to Unicode. Library of Congress rules have always identified the characters by reference to ANSI Z39.47.

I believe there are mapping tables from ISO 5426, DIN 31624, and ANSI Z39.47 to Unicode.
Re: Accessing alternate glyphs from plain text
Jukka K. Korpela wrote:

> Human writing did not originate as plain text, and at the surface level, it is never "plain text": it always has some specific physical appearance, and abstract "plain text" can only be found below the surface, as the underlying data format where only character identities (character numbers in a specific code) are encoded, with no reference to a particular rendering.

I have the same trouble with this argument that I had the last time it was made.

Your handwritten A and mine may look different, and both may differ from a typewritten A, but they have something in common that allows us to identify them with each other. The whole premise of reading and writing is that we look below the surface to the identity of the letters and the meaning of the words.

Saying that rendered text always has an appearance is not the same as saying that all text is rich text. The latter viewpoint is what leads some people to propose nonce variations in penmanship as Unicode characters.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
Re: Apostrophe in transliteration
Andreas Prilop wrote:

> On Mon, 9 Aug 2010, Jukka K. Korpela wrote:
>
>> It is of course transliteration standards that should say something normative about the matter. As far as I can remember, the authoritative versions of the relevant standards are the paper publications, which do not identify characters by Unicode numbers, just as ink on paper.
>
> ISO standards have always identified the characters used for transliteration by reference to ISO 5426.

Sorry, my memory did not serve me well. I think you have previously referred to such identifications in some discussions. I guess I had forgotten this due to my frustration: having tried to find definitive information on this, I got confused and found contradictions.

> I believe there are mapping tables from ISO 5426, DIN 31624, ANSI Z39.47 to Unicode.

Apparently there are _several_ mapping tables, with e.g. four (or more?) alternative mappings for PRIME, and whatever their status might be, they aren't part of a transliteration standard that refers to, say, ISO 5426.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
ISO/TC 37 Conference 2010
ISO/TC 37 will convene in Dublin next week, at the headquarters of NSAI (the National Standards Authority of Ireland), where decisions will be made regarding many standards within ISO/TC 37's remit, such as ISO 639. The conference will have 104 participants from 21 countries: Australia, Austria, Belgium, Canada, China, Colombia, Finland, France, Germany, Ireland, Korea (Republic of), Mexico, Netherlands, Norway, Poland, Portugal, South Africa, Spain, Sweden, UK, and US (more information, including the schedule of meetings and directions on how to get there, is available at www.nsai.ie).

Irish people and others resident here who wish to participate as members of the NSAI delegation should first contact the convener of NSAI's ISO/TC 37 group, Fidelma Ní Ghallchobhair (fnighallchobh...@forasnagaeilge.ie). For this group I run the Irish delegation's only official e-mail service (nsai-isotc3...@listserv.heanet.ie), courtesy of the Higher Education Authority of Ireland.

The NSAI delegation to ISO/TC 37 consists of sixteen members, who will also be happy to meet any members of the IETF and Unicode lists who may already have arrived here to attend the TKE conference at DCU this week. Those who would like to arrange to meet NSAI delegates may do so by replying to this e-mail.

Sincerely,
mg

--
Marion Gunn
eGteo (Estab.1991)
27 Páirc an Fhéithlinn, Baile an Bhóthair, An Charraig Dhubh, Co. Átha Cliath, Éire/Ireland
mg...@egt.ie
eam...@egt.ie
Re: Accessing alternate glyphs from plain text
On Tue, Aug 10, 2010 at 13:15, Doug Ewell wrote:

> Your handwritten A and mine may look different, and both may differ from a typewritten A, but they have something in common that allows us to identify them with each other.

I have problems with this argument too. For example, consider the following text:

YOURHANDWRITTENAANDMINEMAYLOOKDIFFERENTANDBOTHM
AYDIFFERFROMATYPEWRITTENABUTTHEYHAVESOMETHINGIN
COMMONTHATALLOWSUSTOIDENTIFYTHEMWITHEACHOTHER.

This is written in a manner similar to the way texts were written in the past, before spacing, punctuation, and lowercase came into being. It certainly has “something in common that allows us to identify” it with your original text. For most uses (but not all), we don’t mind adding modern punctuation and casing to ancient texts and saying it’s the “same” text. Nonetheless, by transforming your text I clearly lost some information. We don’t want to remove spacing and punctuation from plain text, even though the historic examples show that they’re not “strictly necessary”. (As you know, our plain text can even mark _different_ kinds of spacing, as you’re seeing if you’re reading this plain-text sentence in a variable-width font.)

Some information is lost when we render our “plain text” as ancient text. Similarly, some information is lost when we render handwritten text, typeset text, or computer “rich text” as plain text. It seems to me these two losses differ only in degree, not in kind.

To run with your example, my handwriting can certainly go well beyond just “looking different” from typewriting; it can actually encode significant linguistic information that the typewriter cannot. I have a letter whose author, in a moment of emotional distress, wrote the sentence “to hurt myself” several times, and each time the words get larger and more slanted, with more irregular forms. This graphic resource represents features of speech such as intensity, speed, and intonation, which is to say it has much the same role as punctuation. If you encode her text in plain text, or even in rich text, you lose this linguistic information. The only way to keep something I’m willing to call “the same text”, in this case, would be an image. It’s all a matter of intended use.

> The whole premise of reading and writing is that we look below the surface to the identity of the letters and the meaning of the words.

No, the whole premise of reading and writing is to represent language, which is spoken, in a visual manner. It has nothing to do with letters; letters are just tools for representing language. You cannot read without re-creating sound images in your head. Only after the sound image is recreated do you reach the “meaning” (even, contrary to popular myth, in the case of so-called “ideographs”). Plain text can encode some features of the spoken language, but (obviously) not all. Some of the features left out might be considered important for some texts, in some uses. Nietzsche’s prose employs a lot of italics (typographic marks of something like emphatic stress in speech); if you take away the italics, the resulting text simply isn’t “the same”. Everyone who uses Nietzsche’s texts (philosophy students, &c.) is interested in keeping the italics.

The question here is where the cutoff point is: where do we draw the line about what information goes into plain text, and why? In my humble opinion there seems to be no clear “why”; the line seems to be an entirely arbitrary technological artifact, a remnant of intuitions shaped by the limitations of the typewriter, teletypes, and early tty-style computer terminals.

This is not a bad thing. I’m not dissing plain text or saying we should abolish it or encode italics or anything like that. But by the same token I don’t consider it some special, unique representation of “true meaning”. Plain text is to me simply another attempt to represent language, and like all similar tools it has its strengths and weaknesses. In particular, like all language representation tools, it can encode some kinds of “meanings” and not others.

--
Leonardo Boiko
RE: Accessing alternate glyphs from plain text
On Tue Aug 10 2010, Leonardo Boiko (leobo...@gmail.com) wrote:

> No, the whole premise of reading and writing is to represent language, which is spoken, in a visual manner. Nothing to do with letters; letters are just tools for representing language. You cannot read without re-creating sound images in your head. Only after the sound image is recreated is that you reach the “meaning” (even, contrary to popular myth, in the case of so-called “ideographs”).

Hmm.
Readers, when they read, do imagine sounds to some degree; readers also seem to rely somewhat on punctuation of various kinds (when reading in languages that have punctuation); see:

http://www.eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED029763&ERICExtSearch_SearchType_0=no&accno=ED029763

But I do not know to what extent all the punctuation is translated into sound. Written and oral stories, for example, do share many features; but if you think about what writing has done to texts, you will start to think that the process of reading must be a bit different from the process of listening: according to many researchers, writing has changed texts. (For one thing, there are no longer so many "formulas" that are repeated with regularity in stories; other kinds of repetition are lost too in written texts; and there may be less syntactic and semantic parallelism, at least in English writing, though this depends in part on the writer.)

Yes, I do imagine sounds when I read. That's part of it. Most of your email I sounded out; however, I did not sound out "tty" in your text at all. I recognized it, though, and hardly tripped up on the fact that it was not pronounceable as a word, in the sense that I could not put the letters together into a syllable. I then went back, reread "tty", pronounced each letter, and asked myself whether I had missed anything by not doing so, but I don't think I had. (I'm can provi