from:"Lukas Pietsch"

Re: Letterforms based on p

2003-06-07 Thread Lukas Pietsch


> >I was hoping to find someone who had additional evidence for this
> >character.

I happened to come across it the other day in a modern printed edition
of 17th- to 19th century handwritten English letters (Miller, Kerby
A., Arnold Schrier, Bruce D. Boling, & David N. Doyle. 2002. _Irish
immigrants in the land of Canaan: letters and memoirs from colonial
and revolutionary America, 1675-1815._ Oxford: Oxford UP). I haven't
got it here just now, but if it is important I might be able to
provide a few scans.

If I remember correctly, it was being used just as a handwritten
ligature of the word "per", as in "per day", "per year", etc.

Lukas

Re: “book end” or in most languages?

2003-05-30 Thread Lukas Pietsch

> >
> >are they the right way round? so in german it'd be:
> >
> >otto said „So, there is not comprehensive list of
> openers vs. closers 
> >possible.“
> 
> Does not look right here. The following is more like it:
> 
> „So, there is not comprehensive list of openers vs.
> closers possible.”
> 

No, as far as I can tell the original version is the
correct one. Look at it with a font other than Courier New,
which has a rather uncommon glyph for the German closing
quotes. Times New Roman is much more representative. 

Lukas

Re: symbols for `born' and `died' + guarani sign

2003-02-24 Thread Lukas Pietsch

> For the "married" symbol use the mathematical "infinity" symbol:
> U+221E (no pun intended).
> Indeed, one could go a step further and introduce (?) a symbol for
> divorced:
> Either one of the following offers itself as a candidate:
> U+29DC INCOMPLETE INFINITY
> U+29DE INFINITY NEGATED WITH VERTICAL BAR

Actually, the symbol and several others already exist and seem to be
standardized. The "Duden", the authoritative source on German
orthography, describes them in its section on typesetting practices,
under the heading of "genealogical symbols". Besides the asterisk
(born), the dagger/cross (died), and the two overlapping rings (married)
it lists:
wavy horizontal line (= baptized)
a single ring (= engaged)
two rings separated by a vertical bar (= U+29DE?) (= divorced)
two rings joined by a horizontal line (= extramarital)
two swords crossed (= died in combat)
rectangle (= buried)
urn symbol (= cremated)

The "married" symbol, by the way, typically differs from the infinity
symbol, as it consists of two overlapping circles, not just circles
touching each other. The "born" and "died" symbols, on the other hand,
are clearly identical with the normal typographical asterisk and dagger.

The Duden also makes it clear that these are all for use in inline text
("können in entsprechenden Texten zur Raumersparnis verwendet werden").
I haven't got a scanner here, else I might put up a scan somewhere. I
haven't found the time yet to look up if any more of these are already
in Unicode, but I don't remember having seen them.
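
To check what the code points mentioned in this thread actually are, one can ask a Unicode character database for their names. A minimal Python sketch, assuming only the standard `unicodedata` module; the dictionary labels and the `describe` helper are made up for illustration, and the result depends on the Unicode version bundled with the interpreter:

```python
import unicodedata

# Code points named in this thread, plus the two ordinary typographic
# marks that the "born" and "died" symbols coincide with.
CANDIDATES = {
    "born": 0x002A,
    "died": 0x2020,
    "married (as proposed)": 0x221E,
    "divorced (proposed, option 1)": 0x29DC,
    "divorced (proposed, option 2)": 0x29DE,
}

def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' using the Unicode data bundled with Python."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"

for meaning, cp in CANDIDATES.items():
    print(f"{meaning:32} {describe(cp)}")
```

The same loop can be pointed at any other candidate once a code point is suspected.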

Lukas

Re: LATIN LETTER N WITH DIAERESIS?

2003-02-02 Thread Lukas Pietsch

> All characters are now mapped to Unicode characters or character
> sequences where I felt that this was possible. If there are obvious
> errors, please point them out and I'll update the listing.
>
> However, there are some unidentified characters, or ones that could be
> considered missing from Unicode 4.0, or which have mappings that for
> one or the other reason could be considered not ideal. These have been
> highlighted. I welcome suggestions for additions to or subtractions
> from this list, plus any help anyone could provide in identifying the
> characters or in locating places they are used.

Your F725 Unknown-2, to me, looks like a German SCRIPT CAPITAL S,
(compare with U+2112;SCRIPT CAPITAL L). Yes, we were taught to write an
S like this in school. Perhaps it's used somewhere in mathematics?

Your F7AA Unknown-8 could then be a SCRIPT CAPITAL C.

Your F747, spacing left hook below - doesn't it look very much like the
palatalization hooks used elsewhere in the list (which you mapped to
U+0321)?

Your combinations "with latin small letter dotless i" (e.g. F704, F731,
F77A) seem to be designed for use in phonetic transcriptions, and hence
are probably intended as IPA U+026A;LATIN LETTER SMALL CAPITAL I.

F737: the description in your list doesn't match the glyph shown, which
is "with triangular colon".

F70F "Latin small letter a with colon" shows a triangular colon glyph
and should hence be mapped to U+02D0, not U+003A.

F70E "Latin small letter a with tilde with modifier letter triangular
colon" shows a U+0251 "Latin small letter alpha" glyph.

F750 "Latin small letter i with palatalized hook below" shows an
inverted breve glyph, not a hook.

F751 "Latin small letter i with tilde with tilde" shows a macron and a
tilde.

F754 and F755 "Latin small letter J with..." show i, not j glyphs.

F79B "Latin small letter S with retroflex hook below" shows not a
retroflex hook, but something more like an ogonek. A retroflex hook
should be attached to the left side of the S, not in the middle below,
and has its own precomposed IPA codepoint U+0282.

F7AC "Latin small letter u with dot below with diaeresis" shows an
acute, not a diaeresis.

Lukas

Re: IPA for "hard g"

2002-12-14 Thread Lukas Pietsch


>> What is the correct IPA symbol for the "g" sound in "gig"?
>> Is it U+0067 LATIN SMALL LETTER G [g], or U+0261
>> LATIN SMALL LETTER SCRIPT G [ɡ]?

> It seems obvious to me that it should be U+0261, but I'm looking for a
> voice of authority to confirm this.

I'm certainly no "voice of authority", but perhaps you will accept as
such the _Handbook of the International Phonetic Association_, Cambridge
UP, 1999, p.163ff.

It states that the glyph represented in the standard IPA chart
("opentail g" in IPA terminology) is to be encoded as U+0261, but that
Ascii g ("looptail g") may be used as an equivalent.

As far as I can see, almost all professionally printed material that
uses IPA symbols (such as dictionaries etc.) uses the opentail g glyph,
i.e. U+0261.
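
For anyone who wants transcriptions to use U+0261 consistently, the substitution is mechanical. A small Python sketch, assuming the text in question is pure IPA (so that every Ascii g really is the voiced velar plosive); `to_ipa_g` is a hypothetical helper name:

```python
import unicodedata

ASCII_G = "\u0067"   # looptail g in most text fonts
SCRIPT_G = "\u0261"  # opentail g, the form shown in the IPA chart

def to_ipa_g(transcription: str) -> str:
    """Replace every Ascii g with U+0261 (hypothetical helper)."""
    return transcription.replace(ASCII_G, SCRIPT_G)

print(unicodedata.name(ASCII_G))
print(unicodedata.name(SCRIPT_G))
print(to_ipa_g("gɪg"))
```

Running this over mixed text (ordinary prose with embedded transcriptions) would of course also change every ordinary g, so it only makes sense on the transcription fields themselves.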

Lukas

Re: Localized names of character ranges

2002-12-02 Thread Lukas Pietsch

>
> Just a question: has anyone who is concerned about these considered
> sending the suggestions to someone at Microsoft, where they might do
> some good? It's nice to tell people on the Unicode list, but to have
> any impact, Microsoft needs to be involved.
>

True enough. Sorry if I used up bandwidth for people not concerned with
this. I was hoping that someone with the right connections would be
around here. Not exactly easy for simple Joe User, like me, to find the
right address at Microsoft and get listened to, would it be?

Lukas

Localized names of character ranges

2002-12-01 Thread Lukas Pietsch

Hello,

I just wondered if anybody at Microsoft has noticed that the names of
the Unicode ranges used in German localized editions of MS Office are
woefully inadequately translated. It's been a long-standing cause of
irritation when working with Word97, and if I remember correctly it
hasn't been corrected so far, at least not in Word2000. I'm referring to
the names as they are used in the Insert-Symbol dialog.

Some of these mistranslations are really far off. To the average user,
they will just make no sense at all, but for people on this list they
may actually be quite funny. So, just for your enjoyment, here goes:

"Spacing modifier letters" has been translated as if it meant "letters
that modify the spacing" ("Buchstaben zur Abstanddefinition"). The
average user would probably expect to find things like em-space and
en-space in that range? Or has somebody succeeded in getting control
characters added to Unicode that encode some kind of kerning
information? W.O., perhaps? ;-)

In a similar vein, "Alphabetic presentation forms" have come out as
"characters for alphabetic display". ("Zeichen zur alphabetischen
Darstellung".) Same goes for the Arabic presentation forms.

Less severely, "combining diacritical marks" have been mistaken for
"combined diacritical marks" ("kombinierte diakritische Kennzeichen").
What would you expect in such a range, things like Greek Dialytika
Tonos, or even precomposed letter combinations?
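
The distinction the translations blur can be read off the character properties: a combining mark attaches to the preceding base character, while a spacing modifier letter stands on its own. A short Python sketch using the standard `unicodedata` module; the three sample code points are chosen here for illustration, and `is_combining` is a hypothetical helper:

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True if the character is a nonspacing or enclosing mark (Mn/Me)."""
    return unicodedata.category(ch) in ("Mn", "Me")

samples = {
    "U+02CA (Spacing Modifier Letters)": "\u02CA",
    "U+0301 (Combining Diacritical Marks)": "\u0301",
    "U+20D0 (Combining Marks for Symbols)": "\u20D0",
}
for label, ch in samples.items():
    print(f"{label:40} {unicodedata.category(ch)}  combining={is_combining(ch)}")
```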

The same confusion about "combining/combined" goes for the combining
characters in the U+20Dx "Combining Marks for Symbols" range
("kombinierte diakritische Sonderzeichen"). Also, the "for Symbols" part
has not been rendered at all, and the difference between "Sonderzeichen"
and "Kennzeichen" will probably not mean anything to the average user.

Finally, "Georgian" has been translated as "Georgianisch" (perhaps by
analogy with "Gregorianisch"?) instead of correct "Georgisch".


Is there anybody here who could bring this to the attention of the
localization people at MS, if appropriate? I'd really hate having to use
"Buchstaben zur Abstanddefinition" for the next 20 years...

Just to be constructive, here are my suggestions for a better translation:

"Spacing modifier letters" = "Nichtkombinierende Diakritika" (I know
it's not very precise, but I couldn't come up with anything better)
"Combining diacritical marks" = "Kombinierende Diakritika"
"Combining marks for symbols" = "Kombinierende Symbolzusätze"
"Alphabetic presentation forms" = "Alphabetische Präsentationsformen"
"Arabic presentation forms" = "Arabische Präsentationsformen"
"Georgian" = "Georgisch"

Lukas

Re: Special characters

2002-11-05 Thread Lukas Pietsch

> Could someone tell me whether it is possible to produce
> the following characters please?

Sure:

k with a small line underneath
&#x1E35; &#7733; LATIN SMALL LETTER K WITH LINE BELOW
or: k&#x0331; k&#817;

K with a small line underneath
&#x1E34; &#7732; LATIN CAPITAL LETTER K WITH LINE BELOW
or: K&#x0331; K&#817;

H with a dot underneath
&#x1E24; &#7716; LATIN CAPITAL LETTER H WITH DOT BELOW
or: H&#x0323; H&#803;

h with a dot underneath
&#x1E25; &#7717; LATIN SMALL LETTER H WITH DOT BELOW
or: h&#x0323; h&#803;

B with a small line underneath
&#x1E06; &#7686; LATIN CAPITAL LETTER B WITH LINE BELOW
or: B&#x0331; B&#817;

b with a small line underneath
&#x1E07; &#7687; LATIN SMALL LETTER B WITH LINE BELOW
or: b&#x0331; b&#817;

D with a small line underneath
&#x1E0E; &#7694; LATIN CAPITAL LETTER D WITH LINE BELOW
or: D&#x0331; D&#817;

d with a small line underneath
&#x1E0F; &#7695; LATIN SMALL LETTER D WITH LINE BELOW
or: d&#x0331; d&#817;

G with a line on top
&#x1E20; &#7712; LATIN CAPITAL LETTER G WITH MACRON
or: G&#x0304; G&#772;

g with a line on top
&#x1E21; &#7713; LATIN SMALL LETTER G WITH MACRON
or: g&#x0304; g&#772;

E with an upside down ^ on top
&#x011A; &#282; LATIN CAPITAL LETTER E WITH CARON*

e with an upside down ^ on top
&#x011B; &#283; LATIN SMALL LETTER E WITH CARON*

Mirror image of a comma, but not at the bottom - should be higher, like
an '
&#x02BD; &#701; MODIFIER LETTER REVERSED COMMA (use as a letter);
or: &#x201B; &#8219; SINGLE HIGH-REVERSED-9 QUOTATION MARK* (use as a
punctuation mark);

The codes marked &#x... are the hexadecimal Unicode values, those marked
&#... are the decimal ones. You can use them in this form in html pages.
You will need a 'large' Unicode font for most of these - only the ones
marked with an * are found, for instance, in standard Windows Unicode
fonts such as Times New Roman. I suggest Gentium
(http://www.sil.org/~gaultney/gentium). The combining characters U+0304,
U+0331 and U+0323 are also in Lucida Sans Unicode and some other fonts.
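
The two notations for each character, and the relation between a precomposed letter and its base-plus-combining sequence, can be checked mechanically. A minimal Python sketch using the standard `unicodedata` module; `ncr_forms` is a hypothetical helper name, and the type hint needs Python 3.9 or later:

```python
import unicodedata

def ncr_forms(ch: str) -> tuple[str, str]:
    """Return the hexadecimal and decimal HTML character references for ch."""
    cp = ord(ch)
    return f"&#x{cp:04X};", f"&#{cp};"

k_line_below = unicodedata.lookup("LATIN SMALL LETTER K WITH LINE BELOW")
print(ncr_forms(k_line_below))

# The combining sequence and the precomposed letter are canonically
# equivalent: NFC composes one into the other, NFD splits it again.
sequence = "k\u0331"
print(unicodedata.normalize("NFC", sequence) == k_line_below)
print(unicodedata.normalize("NFD", k_line_below) == sequence)
```

Whether the two forms also look the same on screen still depends on the font, which is the reason for the font advice above.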

Hope this helps,

Lukas

Re: Common input methods for IPA

2002-07-10 Thread Lukas Pietsch


Hi Marc Wilhelm,

> In this brave new world of wonderful input methods, what is
> the current state of affairs for keyboard-based input methods
> for characters from the IPA block? Is there any de facto standard
> for this and, for that matter, for an IPA keyboard layout?

I'm not aware of a standard either (maybe the Keyman keyboards
distributed by SIL for their SIL phonetic fonts come closest to one, in
terms of widest distribution), but I guess any *international* standard
would be highly problematic anyway - any key mnemonics are bound to fail
for users that are accustomed to one national keyboard and not another.
Personally, I've come to use my own home-grown Keyman method for Unicode
IPA. Maybe I am and shall remain the only person in this world who finds
the layout intuitive, but it works for me. Since it's based on the layout
of a German keyboard, you might find it worth having a look at:

http://people.freenet.de/LukasPietsch/Keyman/Keyboards.html

Hope this helps,

Lukas

Re: Recent Threats

2002-02-27 Thread Lukas Pietsch




> Would you by chance mean 'threads' ?

> There is a difference, you know ;-)

Quite right. And, in order to prove Stefan's point: how about starting a
new thread/threat now about why we Germans are so prone to confuse these
letters, and what consequences that ought to have for a possible
unification of these two characters in Unicode? Any takers?
;-)

Lukas

Re: Smiles, faces, etc

2002-02-16 Thread Lukas Pietsch

"Falkor" wrote:

> I was thinking more that this would allow modern software to translate
> a lower-ASCII three-character sequence into a single unicode emoticon
> character that would be displayed properly regardless of OS and
> software, also alleviating the need for such developers to create
> proprietary artwork for each.  This multiple-keystroke-per-character
> input method does have precedent with Asian languages.

I'm starting to wonder about this thread. Really, why would anybody want
to have the Ascii-smilies replaced by single standardized "faces"
created by some font designer? The creative process of composing these
smilies from their Ascii components, together with the
open-endedness of the repertoire and the scope for creative variation
this involves - isn't that just the fun of the whole thing? The
playfulness? Isn't it exactly this that has made them so popular?

Lukas

Re: A few questions about decomposition, equvalence and rendering

2002-02-05 Thread Lukas Pietsch

John Cowan wrote:
>
> Eh?  U+1FC1 *is* nonspacing.  The U+1Fxx ones are the spacing
> compatibility equivalents, except for this one.
>

U+1FC1 is spacing in all the fonts that I've seen. And it decomposes to
U+00A8 U+0342 (canonically), i.e. to a sequence of spacing plus
non-spacing character. At least it did so in Unicode 3.0.
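
The decomposition can be inspected directly in a character database. A short Python sketch with the standard `unicodedata` module; it reports whatever Unicode version ships with the interpreter, which is an assumption as far as the 3.0 data discussed here is concerned:

```python
import unicodedata

ch = "\u1FC1"  # GREEK DIALYTIKA AND PERISPOMENI
print(unicodedata.name(ch), unicodedata.category(ch))

# decomposition() returns the mapping as space-separated hex code points;
# a compatibility mapping would start with a tag such as "<compat>".
mapping = unicodedata.decomposition(ch)
print(mapping)

for part in mapping.split():
    c = chr(int(part, 16))
    print(f"U+{part} {unicodedata.name(c)}: category {unicodedata.category(c)}, "
          f"combining class {unicodedata.combining(c)}")
```

A general category of Sk marks a spacing modifier symbol, Mn a nonspacing mark, so the output shows a spacing character followed by a nonspacing one.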

Not that I would bother much - I have no idea where that character
should ever be used.

Lukas Pietsch

Re: Unicode 3.1 and Roman numeral harmonic analysis

2001-07-18 Thread Lukas Pietsch


> Are the letters used in "Roman numeral harmonic analysis"  Roman 
> numerals or are some other letters also used ?

There are quite a number of different systems out there, but it's common to them all 
that they use some combination of Roman letters with numbers (often subscript or 
superscript), musical accidentals (flat / sharp signs); plus / minus / greater-than or 
smaller-than signs, and other graphical symbols such as strokes, brackets, circumflex 
accents...

Many systems (including the "Schenkerian" analysis that is fashionable in the 
Anglo-Saxon world) use capital Roman numerals as their base symbols, while other 
systems (such as the "Riemannian" analysis that is common in Germany) use letter 
combinations that stand for the harmonic functions: T, D, S, Tp, and so forth, 
including the "double-dominant" and "double-subdominant" symbols (partial overlay of 
two "D"s or two "S"s respectively.)

Do you need a few scans? 

Lukas

Re: Unicode transliterations (and other operations)

2001-07-04 Thread Lukas Pietsch

James Kass wrote:

> Indeed!  Or, at least if we need a correct definition of
> an English word, we should consult an English dictionary.
> The web page cited by Mr. Constable is simply misleading, unless
> it were to be amended to clearly state "for the purposes of
> this and related documents..." these words mean &c.

Well, the English dictionaries give usages of words in everyday language,
and that's fine. But in their usage as technical terms, the distinction between
"transcription" and "transliteration" (roughly along the lines of the
http://www.elot.gr/tc46sc2/purpose.html page) seems to me to be a fairly
well-established one, in the field of linguistics at least.

> No international body has any authority to alter the meaning of
> existing words in my language or any of our languages.

Sure, but we're dealing with a scholarly discipline's technical vocabulary here, and 
it's not such a bad idea in this case if computer people dealing with language adopt 
the usage of linguists, is it?

> what they call "transliteration" could easily be
> referred to as "reversible transliteration" in plain English,
> without 'breaking existing applications' like my dictionary.

You must understand: this isn't about "breaking existing applications", it's about a 
"higher-level protocol"! ;-)

Lukas Pietsch

Re: [OT] Call for contributions to new 1,000 Language Online Archive

2001-05-23 Thread Lukas Pietsch


> Thou art right saying that thou doest not normally speak like an
> international treaty. But wouldst thou say that these
>
 ype=vocab&version=1&scale=six> really are the 100 most common words in
thy
> language? :-)

You may want to have a look at, for example:
www.fitzroydearborn.com/chicago/linguistics/sample-swadesh-morris.php3
for a short appraisal of the sense or non-sense of "Swadesh lists".


Lukas

Re: Help in a HURRY !!!!!!!!!!!!!!!!!!!!!!!

2001-05-14 Thread Lukas Pietsch


Dear Dr Keihany,

I'm afraid I didn't quite understand if you want your output in actual
Unicode (UTF-16?) or as Ascii text with html-style numeric entities
("&#NNNN;" format). In the latter case, a simple Perl script might do
the job. For instance:


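use strict;
use warnings;

# Read the input as UTF-8. The output name "out.txt" is only a
# placeholder; point OUTFILE wherever the result should go.
open INFILE,  '<:encoding(UTF-8)', "test2.txt" or die "test2.txt: $!";
open OUTFILE, '>', "out.txt" or die "out.txt: $!";
while (my $OneLine = <INFILE>) {
 chomp $OneLine;
 my $NewLine = "";
 # Walk through the line one character at a time.
 foreach my $Char (split //, $OneLine) {
  my $OneChar = ord($Char);
  if ($OneChar >= 128) {
   # Non-Ascii: write a decimal numeric entity instead.
   $NewLine .= "&#$OneChar;";
  }
  else {
   $NewLine .= $Char;
  }
 }
 print OUTFILE "$NewLine\n";
}
close INFILE;
close OUTFILE;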

I just jotted this down without much testing, but I think it works in
principle.
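
For comparison, the same conversion is available ready-made in Python, whose codecs include an error handler that writes decimal numeric entities for anything the target encoding cannot hold. A minimal sketch; the function name `to_numeric_entities` is made up for illustration:

```python
def to_numeric_entities(text: str) -> str:
    """Return text as Ascii, with non-Ascii characters as decimal entities."""
    # "xmlcharrefreplace" is a standard codec error handler that writes
    # &#NNNN; for every character the target encoding cannot represent.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_numeric_entities("Grüße, σύμβολο"))
```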
Hope this helps,

Lukas Pietsch

Re: translation help desired: "symbols"

2001-05-02 Thread Lukas Pietsch


>>Greek = ?
>symbolo (symbolo)

Yes, but don't omit the accent:
σύμβολο, plural σύμβολα

(oh yes, and this *is* another UTF-8 message again, I couldn't help it.)

Lukas

RichEdit v.4 common control in Win98?

2001-04-07 Thread Lukas Pietsch


Hello,

from a recent posting by Peter Constable I take it that it is possible to
have Unicode keyboard input with Tavultesoft KeyMan 5.0 (using WM_UNICHAR
messages) in some applications under Win98, provided you have version 4 of
richedit20.dll installed, and that a new version of Wordpad supports this.
I have richedit20 v.3 on my system, which apparently came with IE5.5, and
it doesn't provide this functionality.

Questions:
(a) Is the new richedit control, and/or the new Wordpad version, available
for download somewhere?
(b) If you install the new version of richedit20.dll, does that actually
add the WM_UNICHAR functionality automatically to applications that
previously were using richedit20 v.3? (e.g. Outlook Express...?)
(c) Has anybody got a list of existing Win98 applications that can make use
of the WM_UNICHAR functionality?

Thanks,

Lukas


-
Lukas Pietsch
University of Freiburg
English Department

Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]

Re: Classical Greek on a Mac

2001-04-04 Thread Lukas Pietsch

David Perry wrote:

> I have a polytonic keyboard for
> Windows that I have created using Keyman, which is not available for the
> Mac.  I'd be happy to share the documentation for this

Would you be willing to share the Keyman keyboard itself? I just downloaded
Keyman 5.0 after some people on this list told us what wonderful things it
can do. But I have no keyboards as yet to go with it.
Thanks ever so much,

Lukas

Re: Polytonic Greek

2001-04-03 Thread Lukas Pietsch


Patrick Rourke wrote:
> I happen to like the Palatino Linotype font, though I
> don't quite understand why the combining diacriticals aren't working in
> IE5 - rather than zero space characters, one gets an artifact character,
> and a particularly obnoxious one, too).

Simple: the diacritics don't work because they aren't there. My version of
Palatino Linotype has only the following from the combining-diacritics
block:

U+0300 combining grave accent
U+0301 combining acute accent
U+0303 combining tilde (*not* combining Greek perispomeni)
U+0309 combining hook above (*not* U+0313 combining comma above / psili,
although it looks like one)
U+0312 combining turned comma above (*not* U+0314 combining inversed comma
above / dasia, and it's really a spacing glyph!)
U+0323 combining dot below
U+0326 combining comma below (*not* U+0345 combining ypogegrammeni)

The choice seems fairly random; maybe it was done by someone with Greek
in mind but only a very superficial idea of what you need for it?
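
The gap can be stated as a simple set difference. A Python sketch, where `PRESENT` is the list above and `WANTED` is an assumed minimum for polytonic Greek (grave, acute, psili, dasia, perispomeni, ypogegrammeni), not an authoritative requirement list; `missing_marks` is a hypothetical helper:

```python
import unicodedata

# Combining marks reported present in the font, per the list above.
PRESENT = {0x0300, 0x0301, 0x0303, 0x0309, 0x0312, 0x0323, 0x0326}

# Assumption: the marks a polytonic Greek text would ask for.
WANTED = {0x0300, 0x0301, 0x0313, 0x0314, 0x0342, 0x0345}

def missing_marks(wanted, present):
    """Return the wanted code points that the font lacks, in order."""
    return sorted(wanted - present)

for cp in missing_marks(WANTED, PRESENT):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```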

Lukas Pietsch

Re: Square and lozenge notes -- Musical Notation 3.1 -- Mensuralnotation

2001-03-07 Thread Lukas Pietsch


Patrick insisted:

> They also question the presence of a SEMIBREVIS+FLAG-2 in
> older material (pre-1420 let's say).

and the longer I look into the matter, the more I get the impression that
his informants are actually right and I was wrong. Looking into W. Apel,
probably still the most authoritative book on mensural notation, I don't
find evidence of a "Fusa" (with 2 flags) in Black Notation (i.e. "pre-1420"
as we've come to call it on this list.)

What we get is:

(A) Black Notation (until around 1400):
Maxima
Longa
Brevis (square head)
Semibrevis (lozenge head)
Minima (lozenge head with stem)

(B) Late Black Notation: lots of experimenting with note values one level
below the semibreve: black, white and red noteheads, half noteheads, stems;
double stems above and below; hooks, curls. Lots of different names too:
"semibrevis minima", "dragma", perhaps "fusa". Much too varied to get all
the possible shapes into Unicode at this point. What we do *not* get, not
even in the wildest dreams of the avantgardists of the time (and mind you
these guys were *very* avantgarde!) is a note *two* metrical levels below
the Minim. We don't get shapes with one stem with two parallel flags
either. And even less do we get a consistent terminology for such a note.

(C) White Notation: from the mid-15th century onwards we consistently get
white noteheads for the well-established note values:
maxima-longa-brevis-semibrevis-minim. Below that, we slowly find
established smaller note values, eventually down to three levels below the
minim: Semiminima and Fusa, plus much later the Semifusa. For Semiminim and
Fusa we initially get two alternative forms:

(C1):
Semiminim: white notehead + stem + flag
Fusa: white notehead + stem + two flags.

(C2):
Semiminim: black notehead + stem
Fusa: black notehead + stem + flag

Apel says about these (my translation, p. 93:) "Semiminim and Fusa occur in
two forms, of which the black ones are by far the more common. Occasionally
one finds both forms in the same manuscript or even within the same
composition, with no apparent difference in meaning."

What does this mean for a standardized terminology? In System (A) the small
note values play no role at all, and those note values that exist can
easily be unified with those in system (C). There'll just be a typeface
difference between black and white variants.
System (B) is far too chaotic to try to base Unicode terminology on.
System (C) has a well-established terminology. (C2) is so similar to modern
notation that its notes could even be unified with the modern ones.

The only serious terminological ambiguity is that a "black notehead + stem"
can be either a Semiminim in System (C2), or a Minim in System (A). "Black
notehead + stem + flag" can be either a Fusa in System (C2) or any kind of
shortish note (possibly also called a Fusa) in chaotic System (B). In
this second case, the choice is easy: Unicode should draw its terminology
from the more stable system, C2. Hence, to be consistent, we should also
prefer the C2 terminology over the A one in the first case.
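
The ambiguity can be made explicit by tabulating shapes against names. A Python sketch covering only the shapes spelled out in this posting for Systems A, C1 and C2 (System B is left out as too varied); the tuple layout and the helper `ambiguous_shapes` are choices made here for illustration:

```python
from collections import defaultdict

# (notehead colour, has stem, number of flags) -> note name, per system.
SYSTEMS = {
    "A":  {("black", False, 0): "Semibrevis",
           ("black", True, 0): "Minima"},
    "C1": {("white", True, 1): "Semiminima",
           ("white", True, 2): "Fusa"},
    "C2": {("black", True, 0): "Semiminima",
           ("black", True, 1): "Fusa"},
}

def ambiguous_shapes(systems):
    """Return the shapes that carry different names in different systems."""
    names_by_shape = defaultdict(dict)
    for system, shapes in systems.items():
        for shape, name in shapes.items():
            names_by_shape[shape][system] = name
    return {shape: names for shape, names in names_by_shape.items()
            if len(set(names.values())) > 1}

print(ambiguous_shapes(SYSTEMS))
```

With these entries the only clash reported is the black notehead with a plain stem, which is a Minima in System A and a Semiminima in System C2.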

So, I would now join Patrick in a suggestion to rename as follows:
        Current Draft       New Proposal
1D1B9   Semibrevis White    Semibrevis
1D1BA   Semibrevis Black    (can be left out)
1D1BB   Minima              Minima
1D1BC   Minima Black        Semiminima
1D1BD   Semiminima White    Semiminima White, or: alternative Semiminima
1D1BE   Semiminima Black    Fusa
1D1BF   Fusa White          Fusa White, or: alternative Fusa
1D1C0   Fusa Black          Semifusa

The only open question: when designing a font for use with 14th cent. Black
Notation, should one unify by meaning or by glyph shape? I.e., in order to
encode a Black-Notation Minima, should one use codepoint u+1D1BB (with the
same meaning but providing a black glyph for it, just as for the larger
notes), or should one use codepoint u+1D1BC (with the same glyph but a
different meaning?)

Lukas

Re: Square and lozenge notes -- Musical Notation 3.1 -- Mensural notation

2001-03-07 Thread Lukas Pietsch

In my last posting I wrote:
> I also notice that the "black maxima" seems to be missing. Since we
> have the "black" and "white" series, we ought to have them both
> complete, right? "black longa" can be thought of as unified with
> Gregorian 1d1d3 "virga", and "black brevis" with generic
> 1d147 "square notehead black", but the "black maxima" isn't there.

Patrick Andries has answered this point, suggesting that
the black and white variants should be seen as font variants.
I guess that's a valid point, but it raises the question why the other
musical notes aren't unified in the same way. There are separate characters
(1d1b9) "SEMIBREVIS WHITE" and (1d1ba)  "SEMIBREVIS BLACK". Note that these
symbols are *not* affected by the semantic ambiguity problem we were
discussing, which involves only the smaller note values minima, semiminima,
fusa and semifusa.
I'd be interested to learn the rationale behind these choices. Is the
original proposal available anywhere?

As for the other question, that of the stem of "longa" and "maxima": Yes,
Patrick's suggestion is right that the most common form of these notes has
a downwards stem (on the *right* side of the notehead, mind!). In earlier
mensural notation, the direction of the stems did not depend on the
position of the notehead on the stave, as today; rather, minims and other
small notes always had upwards stems and single longae and maximae mostly
had downward stems. However, the odd example of longae with upward stems
can be found even then. From the mid-16th century onwards the modern
convention of context-dependent stems seems to have emerged, and from then
on both the longae and the minim stems were placed according to it. So, it
seems consistent that the Unicode charts show all notes with upward stems,
implying that upward and downward stems are context-dependent glyph
variants.

Plenty of examples of all this can be found in: Willi Apel, Die Notation
der polyphonen Musik 900-1600. Leipzig 1962/1970.

Lukas

Re: Square and lozenge notes -- Musical Notation 3.1 -- Mensural notation

2001-03-07 Thread Lukas Pietsch


> All notes could have been given post-1420 names given the fact that the
> white notes appear only after 1420...

Well, not really, because there are quite a few symbols (black notes of
semibreve and above) which occur only in the pre-1420 notation. So the
series of "black" note names would have a confusing gap:

"black head with no stem" = "black semibrevis"
   = no "black minima" 
"black head with stem"  = "black semiminima" (new usage)
"black head with stem and flag1" = "black fusa" (new usage)
"black head with stem and flag2" = "black semifusa" (new usage)

"white head with no stem" = "white semibrevis"
"white head with stem" = "white minima"
"white head with stem and flag1" = "white semiminima"
"white head with stem and flag2" = "white fusa"
etc.

That's what your proposal boils down to, isn't it? Well, certainly
historically correct, but I find it even slightly more confusing than the
other way. I do think that the terminology Unicode has chosen is the more
consistent one. Confusing, yes, but it *will* be confusing to
non-specialist users either way, won't it?

>
> P.S. Incidentally, do your sources also show consistently the nominal
> form of the MAXIMA and LONGA with stems pointing downwards contrarily
> to the Unicode reference glyph ?
>

Oops, indeed, they do, and I hadn't noticed. (As I said, my musicology days
at university are way back...) -- This might very well be significant. Yes,
I think mensural notation did not have the modern convention that the
orientation of the stems depends on the position on the stave. Hold on,
I'll check.

I also notice that the "black maxima" seems to be missing. Since we have
the "black" and "white" series, we ought to have them both complete, right?
"black longa" can be thought of as unified with Gregorian 1d1d3 "virga", and
"black brevis" with generic 1d147 "square notehead black", but the "black
maxima" isn't there.

Lukas

Re: Square and lozenge notes -- Musical Notation 3.1 -- Mensural notation

2001-03-07 Thread Lukas Pietsch


Patrick Andries enquired:

> 2) U+1D1C0 seems to have an incorrect name (e.g. "fusa black"). This is
> character (SEMIBREVIS BLACK + STEM +  FLAG-2)

> I believe, this is black SEMI-FUSA. [snip]
> I believe the confusion may stem from the fact that some symbols have
> changed names and values through time (see below). Unicode seems to have
> aligned itself on the pre-1420 names (the smaller set of symbols) and
> have extrapolated from it the names of the black notes that appeared
> only after 1420.

No, I think the Unicode terminology is correct. The name "fusa black" has
not been extrapolated anachronistically. It was indeed used like this
pre-1420 (although the dictionary table you quote doesn't show it.) The
Unicode terminology is consistent in so far as all white notes are given
post-1420 names, and all black notes are given pre-1420 names,
notwithstanding the fact that these black notes were also used with
*different* names and values post-1420.

I only have small dictionaries at hand at the moment ("Meyers
Taschenlexikon Musik", p. 265, and "DTV Atlas zur Musik", p.214, 233). They
show me the following:

Pre-1420 (when all noteheads were black:)

semibr.  = (lozenge) head       (=1d1ba "semibrevis black")
minima   = head + stem          (=1d1bc "minima black")
semimin. = head + stem + flag1  (=1d1be "semiminima black")
fusa     = head + stem + flag2  (=1d1c0 "fusa black")

Post-1420 (when black vs white noteheads became distinctive:)

semibr.  = white head                 (=1d1b9 "semibrevis white")*
minima   = white head + stem          (=1d1bb "minima white")**
semimin. = white head + stem + flag1  (=1d1bd "semiminima white"), or:
           black head + stem          (=1d1bc "minima black")***
fusa     = white head + stem + flag2  (=1d1bf "fusa white"), or:
           black head + stem + flag1  (=1d1be "semiminima black")****
semifusa = black head + stem + flag2  (=1d1c0 "fusa black")*****

*=> modern whole note
**=> modern half note / minim
***=> modern quarter note / crotchet
****=> modern 8th note / quaver
*****=> modern 16th note / semiquaver
(Note: My sources don't show the alternative semifusa (white head + stem +
flag3) which your source shows, and which is not in Unicode. Maybe that one
is an anachronistic extrapolation?)


You can see that the Unicode terminology is consistent with all of the
pre-1420 symbols, and at least with one of the two sets of post-1420
symbols. (Even if that is not the set of symbols that eventually came to
dominate.)

If people wish to know with more certainty, I'd have to read up. I'm afraid
I've forgotten most of what they taught us in notation class during my
musicology days at the university. ;-)

Lukas

Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread Lukas Pietsch

Doug Ewell wrote:
>
> Aren't Serbian and Croatian the standard example of two "languages" that
are
> really the same language but are treated separately

This question about languages being "really" the same or not turns out to be
a rather moot one from a linguist's viewpoint, even more so once the issue
gets burdened with national feelings. I mean, are English and Scots the
same? Are Bulgarian and Macedonian the same? Are Rumanian and Aromanian the
same? Are Ancient Greek and Ancient Macedonian the same? Are Upper German
and Lower German the same? Are German, Schwitzerdytsch and Letzeburgsch the
same? Are Dutch and Flemish the same? Are British and American English the
same (that was an issue at one time!) -- There are probably as many such
issues as there are nations in the world, or more, and as a linguist you
get weary of getting asked what the "real" answer is in each case.

> Are there any linguistic or vocabulary differences between them?

Well, there are bound to be, at some level, and if not in the normative
standards, then in the actual spoken varieties of relevant population
centers. The question is just, how big are these, and--different and much
more important question--how big do people *want* to *perceive* these
differences to be?

Lukas

(P.S.: Sorry Doug, I meant to send this to the list in the first place.)

Re: Help with Greek special casing

2001-02-26 Thread Lukas Pietsch


Carl Brown asked:
> It is final when followed by a hyphen or combining diacritical mark?

Patrick Rourke answered:

> Don't know what the Unicode rules are, but the answer is no.  The final
> sigma form is not used if the sigma is in a medial position in the word but
> at the end of the line (e.g., when it occurs at the point of hyphenation in
> a hyphenated word at line end).

Just one addition: You do get a final sigma before explicit (hard) hyphens,
i.e. u+2010 and other kinds of dashes, as opposed to (soft) line-breaking
hyphens (u+00AD).
I guess explicit hyphenation isn't likely to occur in typesetting of
Ancient Greek, but it does occur in Modern Greek, in noun compounds of the
type κράτος-μέλος.
The Unicode rules will handle this correctly, as far as I can see.
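
A quick way to check this behaviour is Python's built-in str.lower(), which applies the Unicode final-sigma context rule. The following is only a sketch: the hard hyphen in the compound is the plain u+002D, and the second test string is a made-up sequence (not a real word) with a soft hyphen after a medial sigma.

```python
# Lowercasing capital sigma (U+03A3) with Python's built-in str.lower(),
# which applies the Unicode final-sigma context rule.
FINAL_SIGMA = "\u03c2"   # final form
MEDIAL_SIGMA = "\u03c3"  # medial form

# Compound with an explicit (hard) hyphen, as in the example above.
compound = "κράτος-μέλος"
round_trip = compound.upper().lower()

# Made-up string: capital letters with a soft hyphen (U+00AD) after a sigma
# that is followed by more letters.
soft = "\u039c\u0395\u03a3\u00ad\u039f\u03a3"
soft_lower = soft.lower()

print(round_trip)                       # both sigmas come back in final form
print(soft_lower[2] == MEDIAL_SIGMA)    # sigma before the soft hyphen stays medial
print(soft_lower[-1] == FINAL_SIGMA)    # sigma at the very end is final
```

The soft hyphen is ignored when the rule looks for a following letter, whereas the hard hyphen ends the word as far as the rule is concerned.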

Lukas

Re: What about musical notation?

2001-02-22 Thread Lukas Pietsch



>
> Am I right in thinking that in the days when hand set metal type on printing
> presses was the only method of printing that there were fonts of musical
> type?  I have never seen any font of such type myself, though I have seen
> fonts for such non-text matters as chess sets and crossword puzzles.
>

As far as I know, music printing with movable type of this kind was
indeed done, mostly back in the 16th/17th century. There were "letters"
which each represented one fragment of a stave with one or several
noteheads on them. It tended to look pretty rough, though. Almost as if we
were to put staves together from ASCII characters like:

---o---
---|
---|
---|



High-quality printing since the mid-18th century has been done by engraving
or etching in metal plates, where the graphics are either first drawn by
hand on the metal surface, or applied to it with stamps of some sort.

Lukas

Re: Inverted breve in Greek?

2001-02-22 Thread Lukas Pietsch


Seán,

these are "perispomeni"s. Not uncommon to see them printed like that.
Encode as u+0342.
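
For anyone who wants to check the code point, here is a minimal sketch using Python's standard unicodedata module. The choice of alpha as the base letter is just an assumption for the demo.

```python
import unicodedata

PERISPOMENI = "\u0342"
mark_name = unicodedata.name(PERISPOMENI)
print(mark_name)

# Base letter + combining mark composes to the precomposed Greek Extended letter.
composed = unicodedata.normalize("NFC", "\u03b1" + PERISPOMENI)
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```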

Best wishes,

Lukas

Re: extracting words

2001-01-29 Thread Lukas Pietsch



Christopher Fynn wrote:

>BTW without determining the language as well as the script, how do you
>propose to determine if a particular string actually matches a word in
>your "blacklist" (in terms of meaning) or not? The same string of
>characters might mean completely different things in two languages that
>share the same script (/Unicode block).

This is assuming that what we want is not just a matching of
*orthographical* words (character strings), but of *lexicographical* words
(aka lexemes). Which of course brings with it even more problems. If you
want to filter out all occurrences of, say, a particular verb, you'll have
to look out for all possible grammatical forms of that verb. 5 forms at
maximum in English (go, goes, went, gone, going), but maybe several
hundreds in a heavily inflectional or agglutinative language. In some
languages the set of possible forms of a lexeme may even be open-ended. No
way of doing that without a full-blown morphological parser (which of
course would have to be language-specific.) Looks like this goes a bit
beyond what Brahim is planning to do.
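
To make the distinction concrete, here is a small sketch in Python. The forms table and the function name find_lexeme are hypothetical and hand-made for this one English verb; a real system would need a language-specific morphological analyser instead of a lookup table.

```python
import re

# Hypothetical hand-made table: lexeme -> all of its orthographic forms.
LEXEME_FORMS = {"go": {"go", "goes", "went", "gone", "going"}}

def find_lexeme(text, lexeme, forms=LEXEME_FORMS):
    """Return every token in text that is a listed form of the lexeme."""
    wanted = forms[lexeme]
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t in wanted]

sample = "She went home; he is going later, but nobody goes by bus."

# Lexeme match: finds three forms of the verb.
print(find_lexeme(sample, "go"))

# Orthographic match on the string "go" alone finds nothing here.
print("go" in re.findall(r"\w+", sample.lower()))
```

A table like this is only workable because English has so few forms per verb; for a heavily inflecting language it would have to be generated by rules.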

Lukas

Re: Benefits of Unicode

2001-01-28 Thread Lukas Pietsch


>
> Francois M Richard wrote:
> >
> > Can Unicode conformance be applied to rtf (and how)?
> >
Newer Microsoft products (from Office 97 onwards?) seem to use constructs
of the form
\uNNNN\'YY to encode Unicode characters, where NNNN is the *decimal*
Unicode value and YY is the hexadecimal code of a replacement character in
ANSI as an alternative for non-Unicode-aware readers. The rtf source text
itself is encoded in 7-bit ASCII, and the codepage used to interpret the
\'YY commands is specified somewhere in a command in the header.
This is the method apparently used by many Windows applications
internally to exchange Unicode data, e.g. through the clipboard. Just save
a sample Word document with some Unicode characters to rtf to see how it
works.
There are more details on this somewhere
in the MSDN library, under "Specifications/Applications/Rich Text Format".
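
A rough sketch of such an encoder in Python follows. The function name rtf_escape and the choice of "?" as the fallback character are assumptions for illustration. It also assumes, as far as I understand the RTF specification, that the number after \u is a signed 16-bit decimal, so values above 32767 are written as negative numbers. Line breaks and control characters are not handled.

```python
def rtf_escape(text, fallback="?"):
    """Encode text as 7-bit RTF body text using \\uN plus a \\'hh fallback.

    Sketch only: assumes an ASCII fallback character and ignores line breaks.
    """
    fallback_code = "\\'%02x" % ord(fallback)
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch if ch in "\\{}" else ch)
            continue
        # Work on UTF-16 code units so that characters beyond U+FFFF
        # come out as two \uN groups (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            unit = int.from_bytes(units[i:i + 2], "big")
            if unit > 0x7FFF:
                unit -= 0x10000  # signed 16-bit representation
            out.append("\\u%d%s" % (unit, fallback_code))
    return "".join(out)

print(rtf_escape("\u03b1"))   # Greek small alpha, decimal 945
print(rtf_escape("a{b}"))
```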


As for html, you can either embed Unicode character entities of the form
&#NNNN; (NNNN again being the decimal Unicode value) in an otherwise 8-bit
source text, or have the whole source text in UTF-8. (This is probably
rather over-simplified, I guess... :-)
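
Both html options can be tried out with Python's standard codecs. This is only a sketch; the sample word is an arbitrary choice.

```python
import html

text = "\u03ba\u03b1\u03b9"  # three Greek letters: kappa, alpha, iota

# Option 1: decimal numeric character references in a 7-bit source text.
as_refs = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_refs)

# Option 2: the whole source text encoded as UTF-8.
as_utf8 = text.encode("utf-8")
print(as_utf8)

# A browser-style decoder recovers the same text from the references.
print(html.unescape(as_refs) == text)
```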

Hope this helps,

Lukas

Re: Greek questions, on- and off-topic

2001-01-24 Thread Lukas Pietsch


That ypogegrammeni/prosgegrammeni thing keeps cropping up, it seems. Looks
like a real stumbling block. I learned a lot during a discussion we had
about it on this list a few months ago.

You may want to refer to the following paper on (ancient + modern) Greek
typesetting and Unicode:

http://genepi.louis-jean.com/omega/boston99.pdf

There's some information both on the different ways of representing mute
iota, and on the use of uppercasing. It turns out that the range of variety
in printing traditions is much greater than many of us might have imagined
at first.



Lukas Pietsch

Re: Greek questions, on- and off-topic

2001-01-23 Thread Lukas Pietsch

Patrick Rourke wrote:
>I imagine that the capitals with
> diaresis are there for text that's in all capitals but is accented.

One modification: I have the impression that capitals with diaeresis are
also quite widely used, and may indeed be considered obligatory, with
normal all-uppercased text that is not otherwise accented. This goes for
both polytonic and monotonic Greek.
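
As an illustration of that convention, here is a hedged sketch in Python that uppercases a monotonic word while dropping the accent and keeping the diaeresis. The function name upper_without_accents is hypothetical. It only removes the acute/tonos, so polytonic accents and breathings are left untouched, and it does not add a diaeresis where dropping an accent would make a vowel pair ambiguous.

```python
import unicodedata

TONOS = "\u0301"  # combining acute, the decomposed form of the tonos

def upper_without_accents(word):
    """Uppercase a monotonic Greek word, dropping tonos but keeping diaeresis."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = decomposed.replace(TONOS, "")
    return unicodedata.normalize("NFC", stripped.upper())

# Sample word containing iota with dialytika and tonos (U+0390).
word = "\u03bc\u03b1\u0390\u03bf\u03c5"
result = upper_without_accents(word)
print(result)
print([f"U+{ord(c):04X}" for c in result])
```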

Lukas Pietsch

Re: Transcriptions of "Unicode"

2001-01-12 Thread Lukas Pietsch

Marco Cimarosti wrote:
>
> I don't fully agree with Mark Davis' API transcription of "Unicode":
>
> http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Neither do I, but partly for different reasons.

>
> 1) I think that IPA transcriptions should be in [square brackets], while
> phonemic transcriptions should be in /slashes/. If neither enclosing is
> present, the transcription is ambiguous.

Right. And that's actually part of the key to the problem's answer:

> 2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist
> in any standard pronunciation of contemporary English. It should rather be
> the diphthong [ou] (where the [u] would probably better be U+028A).

In America, transcribing the vowel in "code" as /o/ (and "made" as /e/) is
not uncommon, at least in *phonemic* transcription. Generally, American
accents have less diphthongization in these sounds than British accents
have, and phonemically it makes sense to see these sounds as part of the
series of "long vowels". A *narrow phonetic* transcription would have
something like [u+006F u+028A] for American, and [u+0259 u+028A] for
British.

> 3) The transcription shows the primary stress on the first syllable, and a
> secondary stress on the last one. In the few occasions when I heard native
> English speakers saying "Unicode", I had the impression that it rather was
> the other way round.

I can't tell, because where I live I don't get to talk to native speakers
about Unicode a lot. But: According to standard word-formation and
pronunciation patterns in English, the stress pattern shown ('uni,code) is
absolutely what you'd expect: as in "uniform", "unisex", "unicorn",
"universe". (D. Jones, English Pronouncing Dictionary, doesn't even mark a
secondary stress on the third syllable at all.)

> 4) As "Unicode" is the proper name of an international standard, and it is
> built with two English roots of French origin, it could as well be
> considered a French word, which would lead to a totally different
> transcription.

Right, but this particular pattern of merging word roots into a new word
does suggest English provenance, I think. And, historically, that's where
it did come from.

But there's another inconsistency in the transcription: the vowels in the
first ("u-") and third ("-code") syllable are both phonemically long.
Either you put the length mark on both (recommended for *phonetic*
transcription), or on neither (okay with *phonemic* transcription). (Of
course, if you transcribe the third syllable as a diphthong then you won't
get a length mark there.)

According to the conventions in D. Jones, English Pronouncing Dictionary,
you'd get something like:

[u+02C8 u+006A u+0075 u+02D0 u+006E u+026A  u+006B u+0259 u+028A u+0064]
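
For readers whose mail client can display IPA, the u+XXXX notation used in this message can be turned into actual characters with a few lines of Python. This is just a convenience sketch; the helper name decode_codepoints is made up, and spaces between the code points are dropped.

```python
import re
import unicodedata

def decode_codepoints(notation):
    """Turn a string like '[u+0259 u+028A]' into the characters it names."""
    hex_values = re.findall(r"[uU]\+([0-9A-Fa-f]{4,6})", notation)
    return "".join(chr(int(h, 16)) for h in hex_values)

jones = "[u+02C8 u+006A u+0075 u+02D0 u+006E u+026A  u+006B u+0259 u+028A u+0064]"
transcription = decode_codepoints(jones)
print(transcription)

# List the character names to double-check each code point.
for ch in transcription:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```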

Lukas

-
Lukas Pietsch
University of Freiburg
English Department

Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]

Re: OT (Kind of): Determining whether Locales are left-to-right or right-to-left.

2000-12-07 Thread Lukas Pietsch

Michael Kaplan wrote:

>
> > plus...
> > dumb question 1.  Is Aramaic (which doesn't seem to have a 2 character ISO
> > code) the same as Amharic (which does...AM)?   If not, Amharic appears to be
> > a Semitic language too, is that written right-to-left too?
>
> Amharic uses the Ethiopic script, and is not RTL as far as I know. Aramaic
> has no native speakers

As far as I know, there is still a (small) minority of speakers in Turkey
and Syria who speak the present-day descendant language of (biblical)
Aramaic. This present-day dialect is commonly called Aramaic too. I have
absolutely no idea what writing system, if any, they would use today
(although probably not the ancient Aramaic script? More likely Arabic?)

Lukas Pietsch

Greek Diacritics Again

2000-11-23 Thread Lukas Pietsch


Dear all,

there's another issue about Greek diacritics I'd like to ask the opinion of
the people who are in the know: the question of (monotonic) Greek "TONOS"
and (polytonic) Greek "OXIA" and their equivalence. I know this has had a
somewhat troublesome history in Unicode.

I seem to remember I read in some Unicode document that the Greek "TONOS"
could be realized *either* as an acute *or* as a vertical stroke. I can't
locate the reference at the moment. Unfortunately I haven't got the book at
hand here and I've been searching the website in vain. Is the standard
(still) actually saying this, or is my memory failing me?

On the other hand, the standard is of course quite unambiguous now about the
fact that the two accents are equivalent in principle. All the "Oxia"
codepoints in 1fxx are singletons (therefore deprecated?) and canonically
map to the corresponding "tonos" codepoints in 03xx.
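
The singleton mapping can be checked with Python's standard unicodedata module. A minimal sketch, using alpha as the sample letter:

```python
import unicodedata

oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS

# The oxia character has a single-character (singleton) canonical mapping.
oxia_mapping = unicodedata.decomposition(oxia)
print(oxia_mapping)

# After normalization the two are indistinguishable.
print(unicodedata.normalize("NFC", oxia) == tonos)
print(unicodedata.normalize("NFD", oxia) == unicodedata.normalize("NFD", tonos))
```

Since normalization replaces the oxia character with the tonos one, a font cannot rely on the two code points staying distinct in normalized text.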

Would it be fair to sum up the consequences of all this for font design in
the following way: If a font is designed for use with both monotonic and
polytonic Greek, then the "tonos" glyphs should *definitely* look like
acutes. If a font is designed for monotonic Greek only, a font designer can
choose to use either acutes or verticals (or any other shape, for that
matter: decorative typefaces in Greece are apparently using all sorts of
things from wedges to dots or squares...)
But can you think of any good reason for a font to have different (default)
glyphs for the "tonos" and for the "oxia" characters side by side?



Lukas Pietsch
Ferdinand-Kopf-Str. 11
D-79117 Freiburg
Tel. 0761-696 37 23

Universität Freiburg
Englisches Seminar

Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Lukas Pietsch

John Hudson wrote:

> At present, polytonic Greek is not supported in Uniscribe,
> I suspect because no one has determined that it needs to be.

So, would you agree that it does need to be? Keeping in mind what Kenneth
Whistler wrote:

> Not if the fonts they use map capital letter + ypogegrammeni character
> combinations into capital letter + full-size iota glyph sequences.
>
> Of course, if the fonts they use are not designed for correct use with
> polytonic Greek, then the default rendering behavior of the ypogegrammeni
> will not be what they expect or want. Time to upgrade the fonts.
>...
> This is not all that sophisticated. It should be a matter that can be
> wholly encapsulated within the fonts:
>
>                         Font I               Font II
>
> A. 0397 0313 0345  ==>  'H iota adscript     'H iota subscript
>
> B. 1F98            ==>  'H iota adscript     'H iota subscript
> ...
> Many of us have felt all along that polytonic Greek should always be
> represented decomposed, and that the ELOT polytonic "character" encoding
> was a dangerous conflation of glyph design and character encoding concerns.
>...
>
> Implementations that use full decomposition for polytonic Greek and fonts
> that correctly map the accentual and diacritic combinations are the
> best bet for consistency *and* good presentation in the long run.
>

Mind that the case-mapping question we were discussing is just one minor
aspect of the issue; the main task is much more general, and at the same
time more straightforward (If we leave aside the issue of automatic case
conversion and the fancy problems of, let's say, small-caps): the decomposed
character sequences simply need to be mapped to the precomposed ones. It
affects not only the iota subscripts/adscripts but also all the other
diacritics. Without some glyph processing most combinations will never
display readably. Since the precomposed glyphs already exist as Unicode
codepoints, I suppose that the implementation would probably not even be
very difficult, and not much of it would even depend on the individual font,
would it?
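
At the character level (as opposed to the glyph level), this mapping is exactly what canonical composition does. A small sketch with Python's unicodedata module, using Kenneth's example sequence; whether a rendering engine should go about it this way is of course a separate question.

```python
import unicodedata

decomposed = "\u0397\u0313\u0345"  # capital eta + psili + ypogegrammeni
precomposed = "\u1f98"

# Canonical composition (NFC) maps the sequence to the precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)

# Canonical decomposition (NFD) goes the other way.
parts = [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", precomposed)]
print(parts)
```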

By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have
the precomposed characters in Unicode in the first place. I'm very glad they
are there, since, as we see, the beautiful smart rendering features we are
talking about are simply not yet available in mainstream text processing
software. Much as I like the idea of the projects such as "Graphite" that
Marco mentioned, I do think there are quite a number of people out here who
would love to be able to handle Greek comfortably in their everyday
all-purpose text-processing and browsing software. The precomposed
characters are at present the only means they have to do so on a Windows
platform. Adding smart rendering support for the decomposed characters would
provide them with a much better means; I'd certainly agree with Kenneth
about that. And I'd also think it would be preferable if that could be done
system-wide and not just by some individual application, wouldn't it? So
Uniscribe looks like the best bet at the moment for Windows users.

What do the Microsoft people think? May we hope?

Lukas

Re: Greek Prosgegrammeni

2000-11-22 Thread Lukas Pietsch


Thanks to Asmus and Kenneth for their clarifying comments. Things are
beginning to seem to make sense to me... (:-)

Especially, I'm quite relieved to see now that:
- for any one of the common printing variants of mute iota that a user might
want to see,
- there is already at least one easily available truetype font, so that
- even *without* special glyph shaping or glyph substitution mechanisms in
display,
- there will be at least one way of encoding that will be stable, in the
sense that it will guarantee the desired display and not get corrupted when
undergoing canonical composition/decomposition;
and, most importantly:
- all these encodings will be recognized as equivalent by Unicode
applications when it comes to case-insensitive matching (because all these
character sequences case-fold to the same sequence of vowel + small iota
(03B9)). That's something, isn't it?
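
The case-folding claim can be checked with Python's str.casefold(), which applies Unicode full case folding. A sketch, with alpha as the sample vowel; the dictionary labels are only descriptive.

```python
# Different ways of encoding "alpha with mute iota".
variants = {
    "precomposed lowercase (U+1FB3)": "\u1fb3",
    "precomposed titlecase (U+1FBC)": "\u1fbc",
    "small alpha + combining ypogegrammeni": "\u03b1\u0345",
    "capital alpha + spacing prosgegrammeni": "\u0391\u1fbe",
    "small alpha + small iota": "\u03b1\u03b9",
    "capital alpha + capital iota": "\u0391\u0399",
}

folded = {label: text.casefold() for label, text in variants.items()}
for label, value in folded.items():
    print(label, "->", " ".join(f"{ord(c):04X}" for c in value))

# All variants fold to the same two-character sequence.
print(len(set(folded.values())) == 1)
```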

What will *not* work, for most users, is automatic case *conversion*. This
will lead to undesired or unexpected results in most cases. But there are
other independent reasons for that anyway: For most users, correct
uppercasing also involves the stripping of accents and breathings, and the
Unicode casing rules don't provide for that either. But then again: who
wants to use automatic case conversion for polytonic Greek anyway? (I can
hardly remember having ever used it even in the Latin script in all the text
processing I've done.) People will simply be typing sequences that Unicode
will see as irregular mixed-case strings, but who cares? I guess all the
computational features that really matter to most of us common mortals (like
sorting, word searches etc.) involve the "case-folding" feature used for
case-insensitive matching, and as I said above, this seems to work out in a
fairly intuitive and sensible way.

So, after all, the UTC people do deserve a pat on the back for their good
work? (:-)

I have another ignorant layman's sort of question, but I'll put it into a
second message because it really constitutes a different topic.

Lukas

Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Lukas Pietsch


Dear all,

a lot was said in this thread about intelligent rendering mechanisms, such
as fonts implementing automatic glyph substitution and things like that. The
notion appears to be quite commonplace to the experts, whereas I (being an
amateur) must admit it seemed just like a utopian dream to me when I first
heard of the possibility of such a thing, a few months ago. I figure that
people are mostly thinking of the technology called "Open Type", is that
right?

Can anybody enlighten me about how much support for that technology is
already available in standard software, say, in browsers or text processors
under Windows 9x? If I had a True-Type font that implemented the glyph
substitutions, say, for the Greek combining diacritics, could I make my
average standard word processing software actually use these features? Or
would I have to wait for specialized multilingual word processors to appear
on the market?

I found the documentation of the "GetCharacterPlacement" function in the
Windows API. It looks like that was the place where these things should be
implemented system-wide. But I played with it a bit and found it didn't
actually do any glyph replacements. Is that function actually implemented in
Win98, or is it just a stub? Or did I make a mistake in my testing, or is
something wrong with my system? Can Win2000 do more than Win98 in this
respect?

I also noticed that MS Internet Explorer does use glyph replacement features
on my system when it is displaying Arabic. How does it do that? Would there
be a way of making it use other Open-Type features too?




Lukas Pietsch

Re: Greek Prosgegrammeni

2000-11-19 Thread Lukas Pietsch


Sorry I'm going on about this again, but I feel still puzzled, so bear with
me once more.

I'm not quite sure if Mark's answer solves my problem. I can see that the
case mappings and decompositions as defined in the charts are internally
contradiction-free, no problem so far. Only, there still seems to be a
mismatch between what the charts show and what users will probably expect to
see. Let me repeat: as far as I can gather, there are several different
typographical traditions, but roughly speaking there are two: In one
tradition, readers expect to see full-size, spacing glyphs for mute iotas
*both* in titlecase and in uppercase (usually a small iota glyph in
titlecase, a small or capital iota glyph in uppercase). In the other
tradition, readers expect to see smaller, diacritic-like glyphs (either
centered under, or near the right corner of, the base letter), again *both*
in titlecase and in uppercase. All the printing I've seen so far seems to
adhere either to the one major pattern or the other; they apparently don't
often get mixed. And as we've seen, many people who are used to the one
pattern aren't even aware that the other exists.

The Unicode charts, somewhat arbitrarily, seem to dictate in favour of the
one tradition in the one case and of the second tradition in the other. In
titlecase you get some sort of a non-spacing diacritic, while in uppercase
you *must* use the full-size capital iota glyph. Users who want full-size
iota glyphs throughout will find it difficult to live with the decomposition
to u+0345 in titlecase, while users who want small diacritic glyphs
throughout will see no sense in the u+0399 (capital iota) in uppercase.
Without some *very* sophisticated rendering machine, neither group will be
able to get it all displayed to their taste. People will prefer encoding
their texts in ways deviating from the norm, rather sacrificing case
equivalence than what each of them will consider "correct" display.
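
The asymmetry described above can be seen directly in the default case mappings. A sketch using Python's built-in string methods, which follow the Unicode full case mappings; alpha is just the sample letter and the helper show is made up for the demo.

```python
lower = "\u1fb3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI

def show(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

title = lower.title()
upper = lower.upper()

print("title:", show(title))  # one precomposed titlecase character
print("upper:", show(upper))  # base letter followed by a full capital iota
```

So titlecasing keeps the mute iota inside a single precomposed character, while uppercasing turns it into an ordinary capital iota.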


Lukas

- Original Message -
From: "Mark Davis" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, November 19, 2000 8:18 PM
Subject: Re: Greek Prosgegrammeni


> I haven't had time to read this list recently, so here is a somewhat belated
> response.
>
> >But, even if you do so, we are left with a "wrong" canonical decomposition:
>
> >1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345;;;;N;;;;1FB3;
>
> >According to James' statement (which is not totally supported by others,
> >anyway), the decomposition should be U+0391 U+0399 (GREEK CAPITAL LETTER
> >ALPHA + GREEK CAPITAL LETTER IOTA).
>
> Unfortunately, due to historical reasons the characters are misnamed. They
> should be named:
>
> GREEK TITLECASE LETTER ALPHA WITH PROSGEGRAMMENI, etc.
>
> However, we can't change the names. See
> http://www.unicode.org/unicode/standard/policies.html. We can add
> annotations.
>
> Notice that the general category is "Lt" = Titlecase letter, so despite the
> name the character is the titlecase version. The decomposition is correct
> for that titlecase letter. The full case mapping, as provided in Unidata +
> SpecialCasing is also for the titlecase mapping (see
> http://www.unicode.org/unicode/reports/tr21/ ). You will also find that the
> combining ypogegrammeni cases correctly.
>
> The uppercase mappings in Unidata alone are not sufficient for full case
> mapping, but are the best that can be done without changing string lengths.
> For the full mapping, you have to use SpecialCasing.txt. You can see what
> results on
> http://www.unicode.org/unicode/reports/tr21/charts/CaseChart4.html (you'll
> need a font for the Greek characters). Search for 1FBC. You will find that
> it is the titlecase form. Some fonts will not show the 1FBC with the right
> iota, but you can see from its position in the chart what it should be.
>
> > However, the precomposed characters containing the prosgegrammeni, e.g.
> > "GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI" (u+1FBC) still canonically
> > decompose to base letter + "COMBINING GREEK YPOGEGRAMMENI" (u+0345), as if
> > prosgegrammeni and ypogegrammeni were the same thing. This means that, even
>
> Those are the right decompositions (see
> http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart17.html
> ), however, because the characters are misnamed it leads to confusion.
>
> Mark
>

Re: Greek Prosgegrammeni

2000-11-08 Thread Lukas Pietsch


> Anyone who really wants to get their brain in a twist over pros- and
> ypogegrammeni should consider the complications of designing and
> associating smallcap variants (lc-sc layout). The lowercase with
> ypogegrammeni maps to a smallcap with miniature prosgegrammeni, while
> the uppercase with prosgegrammeni maps to an uppercase variant with the
> miniature prosgegrammeni.


Gosh, sounds terrible, indeed!
But *is* polytonic Greek ever actually set in this way? Doesn't sound like a
very common thing to do. I can't remember having ever seen it.




Lukas Pietsch

Re: Re: Greek Prosgegrammeni

2000-11-08 Thread Lukas Pietsch


Thanks to Nick Nicholas for providing the excellent reference to the paper
by Yannis Haralambous, at http://genepi.louis-jean.com/omega/boston99.pdf .
I stand corrected as for the supposed incorrectness of diacritic mute iotas
with capitals. However, I am still puzzled about how to interpret the
Unicode standard in the light of Yannis' information.

If I understand correctly now, there have been at least three different
typographic traditions in the rendering of mute iota in titlecase, and up to
five in uppercase:

Titlecase:
(a) small diacritic subscript glyph centered under the base letter,
identical to lowercase iota subscript, as in the earlier Unicode charts.
(b) small diacritic glyph placed near the lower right corner of the base
letter, as in the Unicode 3.0 charts
(c) full-size lowercase iota glyph (according to Yannis, the most common
option, and apparently the only one familiar to many readers outside Greece)

Uppercase:
(a') centered subscript, as in (a)
(b') small diacritic adscript, as in (b)
(c') full-size lowercase iota glyph, as in (c)
(d) small-caps iota glyph
(e) full-size uppercase iota glyph (an option not mentioned by Yannis, but
clearly implied by Unicode's uppercasing rules. Also, this is the one option
familiar to me from German grammars of Greek.)

The question is, which of these glyphs are considered to be valid
realizations of the "letter with prosgegrammeni" code points in Unicode?

As for titlecase, Nick seems to think that all three options should be
regarded as valid typeface variants of the same codepoints, and I guess this
agrees with what Yannis implies. However, if the Unicode charts were
"corrected" to show form (b) instead of form (a) as of version 3.0, as John
Jenkins informs us, this seems to imply that form (a) was no longer
considered a valid representation? Or conversely, if we do consider (a) to
be a valid representation of these codepoints, then the codepoints should
not be *called * "prosgegrammeni", should they?

As for uppercase, the Unicode specification clearly rules out that the
precomposed "prosgegrammeni" codepoints should be used at all, even though
three of the five known typographical variants are identical in both
uppercase and titlecase (a', b', c'). All "prosgegrammeni" and
"ypogegrammeni" codepoints have uppercase mappings to capital iota. Form (e)
is thus the only option formally recognized by Unicode, while options (a')
through (d) are not catered for at all.

It seems to me now that this treatment is both a bit inconsistent and overly
restrictive (in so far as it excludes a', b', c' and d), while at the same
time leading to unnecessary complications in the statement of casing rules.
Given the wide range of typographic variation, wouldn't it be much easier if
we just called these codepoints "capital letter ... with mute iota", using
them for both titlecase and uppercase alike, and leaving it to font
designers to choose the realization type they prefer, and to handle any
positional variants between titlecase and uppercase through OpenType
features and the like?

Until support for systems such as OpenType becomes more widely available,
the bottom line is that I wouldn't advise users to employ the precomposed
characters anyway. Many users have probably never even dreamt of having
computing features such as automatic case conversion or case-insensitive
string comparison work for polytonic Greek; after what they have gone
through in the past they will be quite happy if they can get a line of Greek
text printed correctly on paper, and will be overjoyed if one day they can
paste it into an email. As long as people don't rely on automatic casing
features, it is quite safe for them to encode mute iotas in any number of
ways, using the spacing "prosgegrammeni", "ypogegrammeni", or "iota"
characters to get just the glyph shape they want. It may not be the ideal
Unicode way, though...

Lukas Pietsch

Greek Prosgegrammeni

2000-11-07 Thread Lukas Pietsch


Hello,

as a newcomer to this list, let me address a question that was probably
discussed a long time ago but seems to be still not quite solved, the
question of the encoding of the "iota prosgegrammeni" characters in Greek
Extended. Apparently a lot of confusion has followed from an initial
misunderstanding that an "iota prosgegrammeni" ("adscript iota") is a
diacritic that looks similar or identical to an "iota ypogegrammeni"
("subscript iota").
It doesn't. Correct me if I'm wrong, but the only "adscript iota" I know of
in traditional Greek orthography is simply a normal, full-sized iota glyph
(lower-case if the word is title-case or lower-case; upper-case if the word
is upper-case). The only difference between such an "adscript iota" and a
normal iota seems to be that the adscript is ignored in collation. The
adscript iota obligatorily replaces a "subscript iota" in titlecase or
uppercase, whereas it can also be used as an optional, slightly archaic
variant instead of the subscript in lowercase.
If anybody has evidence that small, diacritic-like iota glyphs were ever
used with capital base letters in Greek writing, please let me know and
ignore the rest of this message.
The misunderstanding of the diacritic-like adscript iota is unfortunately
still spreading through the world because the Unicode demonstration charts
show it this way. Most font designers have followed what the charts seemed
to dictate to them (even when they knew better), with the result that now
there are very few fonts that show these characters correctly. Microsoft's
"Palatino Linotype" is an exception, as is James Kass's "Code2000" in its
most recent update.

But the real question I'd like to raise is that of the character properties
defined for these characters in Unicode.

The current version correctly states that the standalone "GREEK
PROSGEGRAMMENI" (u+1FBE) is canonically equivalent to a lower-case "GREEK
LETTER IOTA" (u+03B9).

However, the precomposed characters containing the prosgegrammeni, e.g.
"GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI" (u+1FBC) still canonically
decompose to base letter + "COMBINING GREEK YPOGEGRAMMENI" (u+0345), as if
prosgegrammeni and ypogegrammeni were the same thing. This means that, even
if I have a font that shows u+1FBC correctly, if my text undergoes a
canonical decomposition the incorrect subscript glyphs will reappear.
Is there a logic to this that I don't understand, or is this just a hangover
from the time when people did think prosgegrammeni and ypogegrammeni were
the same thing? Wouldn't it be much more logical if precomposed capital
letters such as u+1FBC  decomposed to base letter + "GREEK PROSGEGRAMMENI"
(u+1FBE), or directly to base letter + "GREEK LETTER IOTA" (u+03B9)? To
ensure correct case conversion, it would probably be necessary then to
introduce another special casing rule, making sure that "COMBINING GREEK
YPOGEGRAMMENI" (u+0345) gets mapped to "GREEK PROSGEGRAMMENI" (u+1FBE) when
its preceding base letter gets title-cased.
Something like this is already being done for upper-case anyway.
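
For reference, the two decompositions in question can be read straight out of the character database, for instance with Python's unicodedata module (a sketch; the output reflects whatever Unicode version the Python build ships with):

```python
import unicodedata

standalone = "\u1fbe"   # GREEK PROSGEGRAMMENI
precomposed = "\u1fbc"  # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI

for ch in (standalone, precomposed):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "->", unicodedata.decomposition(ch))

# Consequence for text that gets canonically decomposed:
nfd = unicodedata.normalize("NFD", precomposed)
print(" ".join(f"U+{ord(c):04X}" for c in nfd))
```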

Does this make sense, and if yes, is there any way of getting it into the
standard?

Lukas Pietsch
