Re: Accessing alternate glyphs from plain text (from Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-10 Thread William_J_G Overington
Thank you for replying.
 
On Saturday, 7 August 2010, Doug Ewell  wrote:
 
> I think the "alternate ending glyph" is supposed to be
> specified in more detail than that.  The example Asmus
> gave was U+222A UNION with serifs. Even though the exact
> proportions of the serifs may differ from one font to the
> next, this is still a relatively precise and constrained
> definition, unlike "Latin small letter e with some
> 'alternate ending' which is completely up to the discretion
> of the font designer."
> 
> Because of stylistic differences among calligraphers—this
> is a calligraphy question, not a poetry question—it is
> hard to imagine how this aspect of the proposal would not
> result in an unbounded number of glyphic variations. 
> 'e' is not the only letter to which calligraphers like to
> attach special endings, and a swash cross-stroke is not the
> only special ending that calligraphers like to attach to
> 'e'.
> 
 
It seems to me that there are at least two ways to have an alternate ending e. 
One is to extend the cross-stroke to the right beyond the e and end the 
extension with a flourish of some sort, another is to extend the lower line out 
to the right and end that extension in some way. I can imagine that a proposal 
would lead to wanting to be able to express a choice of the two, or more, 
possible variants of a letter, should the font have alternate glyphs of both 
types. Then there is the question of what is to happen if the requested one is 
not available in the font: does the other alternate glyph become displayed or 
does the basic character glyph become displayed?
 
> I'd like to see an FAQ page on "What is Plain Text?"
> written primarily by UTC officers.  That might go a
> long way toward resolving the differences between William's
> interpretation of what plain text is, which people like me
> think is too broad, and mine, which some people have said is
> too narrow.
 
That is a good idea.
 
Thank you also for the careful precision with which you describe the situation 
of who thinks what.
 
Yet is producing such a document an impossible task? Some years ago there was a 
suggestion in this mailing list to produce a Frequently Asked Questions (FAQ) 
page about what should not be encoded. Is the document that is now suggested 
effectively the same thing?
 
I thought of an analogy of trying to produce a FAQ document of "What is art?". 
Such a document produced in 1550 might well have been very different from one 
produced in 1910, and those different from one produced in 1995 and those all 
different from one produced in 2010. Maybe the analogy is not perfect, but it 
seems to convey the meaning to me that if a "What is Plain Text?" document is 
produced, with a view to being able to decide what could and could not in the 
future be encoded in Unicode as plain text, then it could quickly become either 
out of date or a restriction of progress in technology. The recent encoding of 
the emoticons shows a dramatic change in what can be encoded as plain text from 
the situation some years ago. Some of my ideas have been refuted as not being 
suitable for encoding in plain text. Yet the refutation all seems to be based 
on unchangeable rules from about twenty years ago.
 
Yet change is part of progress.
 
I remember once being referred, in this mailing list, to an ISO document about 
encoding. The document made reference to a definition of character within the 
same document.
 
The document was ISO/IEC TR 15285.
 
I have found that the document is available here (the link used at the previous 
time no longer works).
 
http://openstandards.dk/jtc1/sc2/wg2/docs/TR%2015285%20-%20C027163e.pdf
 
The introduction includes the following.
 
quote
 
This Technical Report is written for a reader who is familiar with the work of 
SC 2 and SC 18. Readers without this background should first read Annex B, 
“Characters”, and Annex C, “Glyphs”.
 
end quote
 
Annex B has the following.
 
quote
 
In ISO/IEC 10646-1:1993, SC 2 defines a character as:
 
A member of a set of elements used for the organisation, control, and 
representation of data.
 
end quote
 
On the accessing of alternate glyphs from plain text, I feel that, as there are 
256 variation selectors that could be used with each of the Latin letters, 
some should be encoded so that alternate glyphs can be accessed from fonts, 
provided that no harm is done to those who choose not to use them.
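 
As a rough illustration of the 256 variation selectors mentioned above, the 
following Python sketch lists them by code point and builds a sequence of a 
base letter followed by a selector. The function names are my own, and the 
pairing of e with VS1 is purely hypothetical: no such sequence is defined. The 
last function shows the fallback I would assume for software that does not 
recognise a sequence, namely ignoring the selector and keeping the base letter.

```python
import unicodedata

# VS1..VS16 occupy U+FE00..U+FE0F; VS17..VS256 occupy U+E0100..U+E01EF.
VARIATION_SELECTORS = [chr(cp) for cp in range(0xFE00, 0xFE10)] + \
                      [chr(cp) for cp in range(0xE0100, 0xE01F0)]


def variation_sequence(base: str, vs_number: int) -> str:
    """Return base followed by variation selector VS<vs_number> (1..256)."""
    if not 1 <= vs_number <= 256:
        raise ValueError("variation selector number must be in 1..256")
    return base + VARIATION_SELECTORS[vs_number - 1]


def strip_variation_selectors(text: str) -> str:
    """Assumed fallback: drop the selectors and keep only the base characters."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


# Hypothetical sequence: 'e' + VS1 (not a defined variation sequence).
seq = variation_sequence("e", 1)
print(len(VARIATION_SELECTORS), [f"U+{ord(c):04X}" for c in seq])
print(unicodedata.name(seq[1]))
print(strip_variation_selectors("th" + seq + " end"))
```

A reader who never uses the selectors loses nothing in this sketch: the 
stripped text is the same plain text that would have been written without them.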
 
Some readers might find the following of interest.
 
http://forum.high-logic.com/viewtopic.php?f=36&t=2229
 
It is a thread entitled "An unusual glyph of an Esperanto character in the Arno 
font".
 
I had been looking through the following document.
 
http://store1.adobe.com/type/browser/pdfs/ARNP/ArnoPro-Italic.pdf
 
I had found an alternate ending glyph for the h circumflex character and had 
then tried to produce some text where it could be used.
 
I felt that it was a situation of typography inspiring creative writing.
 
Readers who enjoyed th

Re: Apostrophe in transliteration

2010-08-10 Thread Andreas Prilop
On Mon, 9 Aug 2010, Jukka K. Korpela wrote:

> It is of course transliteration standards that should say something
> normative about the matter. As far as I can remember, the authoritative
> versions of the relevant standards are the paper publications, which
> do not identify characters by Unicode numbers, just as ink on paper.

ISO standards have always identified the characters used for
transliteration by reference to ISO 5426.

German standards have always identified the characters by reference
to DIN 31624.
Recent DIN standards identify the characters by reference to Unicode.

Library of Congress rules have always identified the characters
by reference to ANSI Z39.47.

I believe there are mapping tables from ISO 5426, DIN 31624, ANSI Z39.47
to Unicode.



Auto Reply: Re: Apostrophe in transliteration

2010-08-10 Thread eike . rathke
I won't have e-mail access before 2010-08-11




Re: Accessing alternate glyphs from plain text

2010-08-10 Thread Doug Ewell

Jukka K. Korpela wrote:

> Human writing did not originate as plain text, and at the surface 
> level, it is never "plain text": it always has some specific physical 
> appearance, and abstract "plain text" can only be found below the 
> surface, as the underlying data format where only character identities 
> (character numbers in a specific code) are encoded, with no reference 
> to a particular rendering.


I have the same trouble with this argument that I had last time it was 
made.  Your handwritten A and mine may look different, and both may 
differ from a typewritten A, but they have something in common that 
allows us to identify them with each other.  The whole premise of 
reading and writing is that we look below the surface to the identity of 
the letters and the meaning of the words.


Saying that rendering text always has an appearance is not the same as 
saying that all text is rich text.  The latter viewpoint is what leads 
some people to propose nonce variations in penmanship as Unicode 
characters.


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




Re: Apostrophe in transliteration

2010-08-10 Thread Jukka K. Korpela

Andreas Prilop wrote:

> On Mon, 9 Aug 2010, Jukka K. Korpela wrote:
>
>> It is of course transliteration standards that should say something
>> normative about the matter. As far as I can remember, the
>> authoritative versions of the relevant standards are the paper
>> publications, which do not identify characters by Unicode numbers,
>> just as ink on paper.
>
> ISO standards have always identified the characters used for
> transliteration by reference to ISO 5426.

Sorry, my memory did not serve me well. I think you have previously referred 
to such identifications in some discussions. I guess I had forgotten this 
due to my frustration: having tried to find definitive information on this, 
I got confused and found contradictions.

> I believe there are mapping tables from ISO 5426, DIN 31624, ANSI
> Z39.47 to Unicode.

Apparently there are _several_ mapping tables, with e.g. four (or more?) 
alternative mappings for PRIME, and whatever their status might be, they 
aren't part of a transliteration standard that refers to, say, ISO 5426.


--
Yucca, http://www.cs.tut.fi/~jkorpela/ 





ISO/TC 37 Conference 2010

2010-08-10 Thread Marion Gunn
*ISO/TC 37 will convene in Dublin next week, in the HQ of NSAI (The 
National Standards Authority of Ireland), where decisions will be made 
re many standards within ISO/TC 37's remit, such as ISO 639. The 
conference will have 104 participants from 21 countries: Australia, 
Austria, Belgium, Canada, China, Colombia, Finland, France, Germany, 
Ireland, Korea (Republic of), Mexico, Netherlands, Norway, Poland, 
Portugal, South Africa, Spain, Sweden, UK, US (more info on www.nsai.ie, 
including schedule of meetings and directions on how to get there).


Irish people and others resident here who wish to participate as members 
of the NSAI delegation should first contact the convener of NSAI's 
ISO/TC 37 group, Fidelma Ní Ghallchobhair 
(fnighallchobh...@forasnagaeilge.ie), for which I run the Irish 
Delegation's only official e-mail service 
(nsai-isotc3...@listserv.heanet.ie), courtesy of the Higher Education 
Authority of Ireland.


The NSAI delegation of ISO/TC 37 consists of sixteen members, who will 
also be happy to meet any members of IETF lists and Unicode lists who 
may already have arrived here to attend the TKE conference in DCU this 
week. Those who would like to make arrangements to meet up with NSAI 
delegates may do so by responding to this e-mail.


Sincerely,
mg

*
--

Marion Gunn * eGteo (Estab.1991)

27 Páirc an Fhéithlinn, Baile an

Bhóthair, An Charraig Dhubh,

Co. Átha Cliath, Éire/Ireland

* mg...@egt.ie * eam...@egt.ie *



Re: Accessing alternate glyphs from plain text

2010-08-10 Thread Leonardo Boiko
On Tue, Aug 10, 2010 at 13:15, Doug Ewell  wrote:
>  Your handwritten A and mine may look different, and both may differ from a
> typewritten A, but they have something in common that allows us to identify
> them with each other.

I have problems with this argument too.  For example, consider the
following text:

YOURHANDWRITTENAANDMINEMAYLOOKDIFFERENTANDBOTHM
AYDIFFERFROMATYPEWRITTENABUTTHEYHAVESOMETHINGIN
COMMONTHATALLOWSUSTOIDENTIFYTHEMWITHEACHOTHER.

This is written in a similar manner as texts were written in the past,
before spacing, punctuation and lowercase came into being.  Now it
certainly has “something in common that allows us to identify” it with
your original text.  E.g., for most uses (but not all), we don’t mind
adding modern punctuation and casing to ancient texts and saying it’s
the “same” text.  Nonetheless, by transforming your text I clearly
lost some information.  We don’t want to remove spacing and
punctuation from plain text, even though the historic examples show
that they’re not “strictly necessary”.  (As you know, our plain text
can even mark _different_ kinds of spacing, as you’re seeing if you’re
reading this plain-text sentence in a variable-width font.)
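
The transformation above can be sketched in a few lines of Python. This is 
only a minimal sketch, the function name is my own invention, and unlike my 
example it also drops the final period. It shows the loss directly: two 
different inputs collapse to the same output, so the original cannot be 
recovered from the result.

```python
def to_scriptio_continua(text: str) -> str:
    """Drop spacing and punctuation, and fold every letter to uppercase."""
    return "".join(ch for ch in text if ch.isalpha()).upper()


a = "They have something in common."
b = "They have some thing, in common!"

print(to_scriptio_continua(a))
# Both inputs map to the same string, so spacing and punctuation are lost.
print(to_scriptio_continua(a) == to_scriptio_continua(b))
```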

There’s some information lost when we render our “plain text” as
ancient text.  Similarly, there’s some information lost when we render
handwritten text, typeset text, or computer “rich text” to plain text.
It seems to me these two losses are different only in degree, not in
kind.

To run with your example, my handwriting certainly can go well beyond
just “looking different” than a typewriter; it can actually encode
significant linguistic information that the typewriter cannot.  I have
a letter whose author, in a moment of emotional distress, wrote the
sentence “to hurt myself” several times, and in each time the words
get larger and more slanted, with more irregular forms.  This graphic
resource is a representation of features of speech intensity, speed,
intonation &c., which is to say, it has pretty much the same role as
punctuation.  If you encode her text in plain text, and even in rich
text, you lose this linguistic information.  The only way to keep
something I’m willing to call “the same text”, in this case, would be
an image.

It’s all a matter of intended use.

> The whole premise of reading and writing is that we
> look below the surface to the identity of the letters and the meaning of the
> words.

No, the whole premise of reading and writing is to represent language,
which is spoken, in a visual manner.  Nothing to do with letters;
letters are just tools for representing language.  You cannot read
without re-creating sound images in your head.  Only after the sound
image is recreated do you reach the “meaning” (even, contrary to
popular myth, in the case of so-called “ideographs”).  Plain text can
encode some features of the spoken language, but (obviously) not all.
Some of the features left out might be considered important for some
texts, in some uses.  Nietzsche's prose employs a lot of italics (which
are typographic marks of something like emphatic stress in speech); if
you take away the italics, the resulting text simply isn’t “the same”
—everyone who uses Nietzsche texts (philosophy students, &c.) is
interested in keeping the italics.


The question here is what’s the cutoff point; where do we draw the
line about what information goes into plain text, and why.  In my
humble opinion there seems to be no clear “why”; the line seems an
entirely arbitrary technological artifact, a remnant of intuitions
developed due to limitations of the typewriter, the teletypes, and
early tty-style computer terminals.  This is not a bad thing.  I’m not
dissing plain-text or saying we should abolish it or encode italics or
anything like that.  But by the same token I don’t consider it some
special, unique representation of “true meaning”.  Plain text is to me
simply yet another attempt to represent language, and like all similar
tools, has its strengths and weaknesses—in particular, like all
language representation tools, it can encode some kinds of “meanings”
and not others.

-- 
Leonardo Boiko




RE: Accessing alternate glyphs from plain text

2010-08-10 Thread CE Whitehead

Re: Accessing alternate glyphs from plain text
From: Leonardo Boiko (leobo...@gmail.com)
Date: Tue Aug 10 2010 - 13:05:36 CDT


> On Tue, Aug 10, 2010 at 13:15, Doug Ewell wrote:
>> Your handwritten A and mine may look different, and both may differ from a
>> typewritten A, but they have something in common that allows us to identify
>> them with each other.

> I have problems with this argument too. For example, consider the
> following text:

> YOURHANDWRITTENAANDMINEMAYLOOKDIFFERENTANDBOTHM
> AYDIFFERFROMATYPEWRITTENABUTTHEYHAVESOMETHINGIN
> COMMONTHATALLOWSUSTOIDENTIFYTHEMWITHEACHOTHER.

> This is written in a similar manner as texts were written in the past,
> before spacing, punctuation and lowercase came into being. Now it
> certainly has “something in common that allows us to identify” it with
> your original text. E.g., for most uses (but not all), we don’t mind
> adding modern punctuation and casing to ancient texts and saying it’s
> the “same” text. Nonetheless, by transforming your text I clearly
> lost some information. We don’t want to remove spacing and
> punctuation from plain text, even though the historic examples show
> that they’re not “strictly necessary”. (As you know, our plain text
> can even mark _different_ kinds of spacing, as you’re seeing if you’re
> reading this plain-text sentence in a variable-width font.)

> There’s some information lost when we render our “plain text” as
> ancient text. Similarly, there’s some information lost when we render
> handwritten text, typeset text, or computer “rich text” to plain text.
> It seems to me these two losses are different only in degree, not in
> kind.

> To run with your example, my handwriting certainly can go well beyond
> just “looking different” than a typewriter; it can actually encode
> significant linguistic information that the typewriter cannot. I have
> a letter whose author, in a moment of emotional distress, wrote the
> sentence “to hurt myself” several times, and in each time the words
> get larger and more slanted, with more irregular forms. This graphic
> resource is a representation of features of speech intensity, speed,
> intonation &c., which is to say, it has pretty much the same role as
> punctuation. If you encode her text in plain text, and even in rich
> text, you lose this linguistic information. The only way to keep
> something I’m willing to call “the same text”, in this case, would be
> an image.

> It’s all a matter of intended use.

>> The whole premise of reading and writing is that we
>> look below the surface to the identity of the letters and the meaning of the
>> words.

> No, the whole premise of reading and writing is to represent language,
> which is spoken, in a visual manner. Nothing to do with letters;
> letters are just tools for representing language. You cannot read
> without re-creating sound images in your head. Only after the sound
> image is recreated do you reach the “meaning” (even, contrary to
> popular myth, in the case of so-called “ideographs”). Plain text can
> encode some features of the spoken language, but (obviously) not all.
> Some of the features left out might be considered important for some
> texts, in some uses. Nietzsche's prose employs a lot of italics (which
> are typographic marks of something like emphatic stress in speech); if
> you take away the italics, the resulting text simply isn’t “the same”
> —everyone who uses Nietzsche texts (philosophy students, &c.) is
> interested in keeping the italics.

Hmm.  Readers, when they read, do imagine sounds to some degree; readers 
also seem to rely somewhat on punctuation of various kinds (when reading in 
languages that have punctuation); see:
http://www.eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED029763&ERICExtSearch_SearchType_0=no&accno=ED029763

But I do not know to what extent all the punctuation is translated into sound.

Written and oral stories for example do share many features; but if you think 
about what writing has done to texts you will start to think that the process 
of reading must be a bit different than the process of listening:  writing has 
changed texts according to many researchers. (For one thing: there are no 
longer so many "formulas" that are repeated with regularity in stories; other 
kinds of repetition are lost too in written texts; there may be less syntactic 
and semantic parallelism at least in English writing -- but this depends in 
part on the writer.)

Yes I do imagine sounds when I read. That's part of it.

Most of your email I sounded out; however I did not sound out at all "tty" in 
your text; I recognized it though and hardly tripped up on the fact that it was 
not pronounceable as a word in the sense that I could put the letters together 
into a syllable; I then went back and reread "tty" and pronounced each letter 
and asked myself if I had missed anything by not doing so but I don't think I 
had.

(I'm can provi