Re: Swastika to be banned by Microsoft?

2003-12-15 Thread Michael Everson
At 08:52 -0800 2003-12-15, Elaine Keown wrote:

Mark said:

I'm embarrassed to admit it, but I find myself thinking that the 
swastika, THE Nazi swastika, right-facing, tilted... the whole 
deal, should be encoded.

This looks to me like the ideal place for an extended
note in Unicode, not a code point. 

The note could describe the graphic differences
between the existing code point and the Nazi version.
I am not certain that the existing code position is satisfactory for 
non-CJK use. That is, Tibetan, Norse, Native American, Scouting use, 
and so on. Those NEVER show Han brush-stroke shapes. I would like to 
see some discussion about whether the properties those characters 
have are suitable for use in other contexts.

Some things are really too evil to facilitate even in
a small way in a computer code.
The tilted Nazi swastika is a DIFFERENT character again.
--
Michael Everson * Everson Typography * http://www.evertype.com


Re: Latin Capital Reversed K

2003-12-15 Thread Michael Everson
At 09:07 -0800 2003-12-15, Alex LeDonne wrote:

http://www.baseballscorecard.com/scoring.htm

This shows complex, non-plain-text notation.


Re: [OT] CJK -> CJC (Re: Corea?)

2003-12-15 Thread Michael Everson
At 12:09 -0800 2003-12-15, Peter Kirk wrote:

Then let's hope that ISO 10646 doesn't decide to break its own rules 
and change "KOREAN" to "COREAN" in character names e.g. U+321D. 
Think what that would do to the Unicode stability policy - although 
in fact only five names are affected.
It is offensive to suggest that WG2 would do so.

This thread could die now and it would be OK.
--
ME


Re: [OT] CJK -> CJC (Re: Corea?)

2003-12-15 Thread Michael Everson
At 13:55 -0800 2003-12-15, Peter Kirk wrote:
On 15/12/2003 12:25, Michael Everson wrote:

At 12:09 -0800 2003-12-15, Peter Kirk wrote:

Then let's hope that ISO 10646 doesn't decide to break its own 
rules and change "KOREAN" to "COREAN" in character names e.g. 
U+321D. Think what that would do to the Unicode stability policy - 
although in fact only five names are affected.
It is offensive to suggest that WG2 would do so.
Michael, I have never before heard of a committee or working group 
taking offence corporately. My remark was not ad hominem although it 
might have been considered ad comitatem (or whatever the correct 
Latin is). You may personally be very determined not to make such 
changes, but presumably there is a mechanism by which in principle 
you might be outvoted within WG2.
I object, rightly, to your suggestion that ISO/IEC JTC1/SC2/WG2 would 
violate its own rules and make changes which both the UTC and WG2 
have promised not to do. Your statement made it sound as though WG2 
were not a serious standardization body, but one that does not take its
responsibilities seriously.



RE: [OT] Corea? (was: Euro-English...)

2003-12-15 Thread Michael Everson
At 00:37 +0100 2003-12-16, Philippe Verdy wrote:

But all this is completely off-topic for Unicode (we are more concerned
here with language codes than with country/territory codes).
Yes, it is.

Still, ISO 3166, like the UN codes, is an incomplete standard, as it does not
correctly map all dependent territories (see "YT" for Mayotte, which the UN
still considers part of the Comoros in its World Map updated and published
last August 2003, but which it also wrongly documents as a French territorial
collectivity, although it is now a departmental collectivity, since its local
population approved the new status which integrates it more tightly within
France).
Other missing codes in ISO 3166 and in UN statistics are:
[snip]

If you have issues with the content of ISO 3166, Philippe, take them 
up with ISO TC46. You can contact the secretariat at AFNOR.



Re: Stability of WG2 (was: Re: [OT] CJK -> CJC)

2003-12-16 Thread Michael Everson
At 19:13 -0800 2003-12-15, Doug Ewell wrote:

The North Korean and Chinese national bodies have already made proposals
that violate both the letter and spirit of stability policies.
Yes. And we have rejected them.

I'm glad the U.S. national body will stay involved, but having to rely
on that does sound a bit like having to rely on enlightened statesmen,
doesn't it?
Better than if the whole thing were just left to the employees of 
large companies, Doug. We have good checks and balances.



RE: Case mapping of dotless lowercase letters

2003-12-16 Thread Michael Everson
At 11:03 +0100 2003-12-16, Philippe Verdy wrote:
Doug Ewell <[EMAIL PROTECTED]> writes:
 > Wrong here: I have found occurrences of dotless lowercase i, used
 > instead of soft-dotted lowercase i, as base letters for diacritics
 > added above it (it was an acute accent...)
 Don't do that.
What? It is VALID UNICODE to have texts coded like this.
In Irish, it is INCORRECT to spell "físeán" 
'video' with a DOTLESS I + COMBINING ACUTE. It is 
a spelling error, and will fail in 
spell-checking. The correct spelling is either I 
+ COMBINING ACUTE or precomposed I WITH ACUTE.

It is VALID UNICODE to follow LATIN CAPITAL 
LETTER Q with DEVANAGARI VOWEL SIGN E but that 
doesn't mean it's the right way to write anything.

For whatever reason, encoded texts exist before correct fonts are used to
render them. So there do exist texts which use dotless lowercase i before
a diacritic above, simply because the author of the text did not want it to
be rendered with a superposed dot.
Texts which contain spelling errors. Or old IPA 
texts using any number of ad-hoc IPA font 
solutions. Those texts have to be transcoded to 
proper Unicode at some stage. What you suggest is 
Not Recommended.



Re: Stability of WG2

2003-12-16 Thread Michael Everson
At 03:03 -0800 2003-12-16, Peter Kirk wrote:

The North Korean and Chinese national bodies have already made 
proposals that violate both the letter and spirit of stability 
policies.
Fortunately they each have only one vote in WG2.

But isn't that enough to outvote the US body?
Not with Ireland and Japan standing with the US on such an issue. ;-)

We really must get the UK back into SC2 ;-)


Re: Case mapping of dotless lowercase letters

2003-12-16 Thread Michael Everson
At 13:00 +0100 2003-12-16, Stefan Persson wrote:
Michael Everson wrote:
In Irish, it is INCORRECT to spell "físeán" 
'video' with a DOTLESS I + COMBINING ACUTE. It 
is a spelling error, and will fail in 
spell-checking. The correct spelling is either 
I + COMBINING ACUTE or precomposed I WITH ACUTE.
Isn't the sequence "dotless i + combining acute" 
canonically equivalent to "dotted i + combining 
acute"?
It is not.
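This is an illustrative sketch that is not part of the thread. It uses Python's standard `unicodedata` module to check the point. Two strings are canonically equivalent exactly when their NFD forms are identical. U+0131 has no decomposition, so nothing can turn it into U+0069. The helper name `canonically_equivalent` is an assumption made for the example.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence holds exactly when the NFD forms are identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

dotless_seq = "\u0131\u0301"  # LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT
dotted_seq = "i\u0301"        # LATIN SMALL LETTER I + COMBINING ACUTE ACCENT

print(canonically_equivalent(dotless_seq, dotted_seq))           # False
print(canonically_equivalent(dotted_seq, "\u00ed"))              # True: U+00ED is i + acute
print(unicodedata.normalize("NFC", dotless_seq) == dotless_seq)  # True: nothing composes
```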


Re: Stability of WG2

2003-12-16 Thread Michael Everson
At 04:36 -0800 2003-12-16, Peter Kirk wrote:

Seriously, can you remind us briefly what the situation is, why 
there is no current UK representation?
I will answer this off-line.


Re: Stability of WG2

2003-12-16 Thread Michael Everson
At 02:53 -0800 2003-12-16, Peter Kirk wrote:

Good point. Remember that the predicted life of Unicode (recently 
predicted by Michael, anyway) is longer than the lifetime of the 
current WG2 members
My point is that the work we do identifying characters and encoding 
them won't have to be done again. Once Manichaean is encoded, it's 
encoded.

One day, 200 years from now, there may be some Puricode revision 
which will do away with some of the duplicate encodings which we have 
for various legacy and round-trip "requirements". But that will not 
invalidate our work today.

Even if this is a millennial reign of peace and prosperity, 
processes of language change will not stop. A list of character 
names from 1000 years ago, even from 400 years ago, would look very 
strange today.
Nothing stops you from publishing a list of character names in proper 
English, in Portuguese, or in some Inglish which may exist a long 
time from now. Currently those strings are "required" to be 
changeless for stability. So we do not change them, as long as that 
requirement remains, which the vendors say it is.



RE: Case mapping of dotless lowercase letters

2003-12-16 Thread Michael Everson
At 16:48 +0100 2003-12-16, Philippe Verdy wrote:
Michael Everson wrote:
 At 11:03 +0100 2003-12-16, Philippe Verdy wrote:
 >Doug Ewell <[EMAIL PROTECTED]> writes:
 > > > Wrong here: I have found occurrences of dotless lowercase i, used
 > > > instead of soft-dotted lowercase i, as base letters for diacritics
 > > > added above it (it was an acute accent...)
 > >
 > > Don't do that.
 >
 >What? It is VALID UNICODE to have texts coded like this.
 In Irish, it is INCORRECT to spell "físeán"
 'video' with a DOTLESS I + COMBINING ACUTE. It is
 a spelling error, and will fail in
 spell-checking. The correct spelling is either I
 + COMBINING ACUTE or precomposed I WITH ACUTE.
Spelling was not the issue there. Only Unicode validity.
Apparently you should look up the word "valid".

Any character can follow any other character and 
be "valid". Any combining character can be 
applied to any base character, regardless of 
script.
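The following rough Python sketch shows how little "valid" demands. It assumes that "valid" means every code point is assigned and none is a surrogate. The conformance notion of a well-formed string is looser still, because it does not even require assignment. Nothing in the check looks at spelling or script. The function name `is_valid_sequence` is hypothetical.

```python
import unicodedata

def is_valid_sequence(text: str) -> bool:
    # Simplified notion of "valid": no unassigned (Cn) and no surrogate (Cs)
    # code points. Base/mark combinations and scripts are never examined.
    return all(unicodedata.category(ch) not in ("Cn", "Cs") for ch in text)

print(is_valid_sequence("Q\u0947"))       # Q + DEVANAGARI VOWEL SIGN E: True
print(is_valid_sequence("\u0131\u0301"))  # dotless i + combining acute: True
print(is_valid_sequence("\u0378"))        # unassigned code point: False
```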

 > Texts which contain spelling errors. Or old IPA
 texts using any number of ad-hoc IPA font
 solutions. Those texts have to be transcoded to
 proper Unicode at some stage. What you suggest is
 Not Recommended.
Not recommended but still valid (and actually used in Turkish as well!)
Case folding in Turkish and Azeri is DIFFERENT 
from everywhere else and you have to have a local 
tailoring for it.
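Here is a minimal Python sketch of what a local tailoring of this kind might look like. The function name `turkic_casefold` is hypothetical. Its two substitutions mirror the Turkic ("T") entries in the Unicode CaseFolding.txt data file: I folds to dotless ı, and İ folds to i. The sketch deliberately ignores harder cases, such as I followed by COMBINING DOT ABOVE.

```python
def turkic_casefold(text: str) -> str:
    # Hypothetical Turkish/Azeri tailoring: apply the two Turkic mappings
    # first, then fall back to default full case folding for everything else.
    # Simplified: sequences like I + U+0307 are not special-cased.
    return text.replace("I", "\u0131").replace("\u0130", "i").casefold()

turkish = turkic_casefold("DİYARBAKIR")  # "diyarbakır"
default = "DİYARBAKIR".casefold()        # "di\u0307yarbakir": İ -> i + U+0307, I -> i
```

The same input folds differently with and without the tailoring. This is why the tailoring has to be chosen locally rather than built into the default.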

used on some occasions because of defects in fonts that don't have a
precomposed glyph for letter i with the diacritic but have a separate glyph
for the combining diacritic and for the dotted and dotless letters i, or
that use renderers unable to remove the soft dot.
What defects there are in FONTS without UNICODE CMAPS is of no concern to us.

The IPA-93 font is such one, which allows good 
typesetting, but which needs glyph processing to 
select the appropriate base letter.
It isn't a Unicode font, and so it doesn't 
matter. Data represented in it has to be 
transcoded to Unicode, and the font has to have 
the right thing in it.

My main issue, however, is with Turkish names found in environments where
language identification is not possible (for example a simple filename, a
locale-neutral database field, or an international HTML form which requests
user names and uses them as case-insensitive identifiers); lowercase dotless
i does not work appropriately there.
Oh well.

I think it is completely illogical to match together with case-insensitive
compares, the three letters:
LATIN SMALL LETTER I (dotted)
LATIN CAPITAL LETTER I (dotless)
LATIN CAPITAL LETTER I WITH DOT ABOVE
but not:
LATIN SMALL LETTER DOTLESS I
when using locale-neutral compares, given that the normative uppercase mapping
of this fourth letter is the second letter above.
That is not what happens in locale-neutral comparisons, I believe.
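You can check what default, locale-neutral full case folding does with the four letters by using Python's `str.casefold`. This is an illustrative sketch, and its results follow the Unicode case-folding data bundled with the interpreter. In that data, capital I with dot above folds to i + COMBINING DOT ABOVE rather than to plain i, and dotless ı does not fold at all. So even the first three letters do not all match. The helper name `folded_code_points` is an assumption made for the example.

```python
import unicodedata

def folded_code_points(ch: str) -> str:
    # Default (locale-neutral) full case folding, as implemented by str.casefold.
    return " ".join(f"U+{ord(c):04X}" for c in ch.casefold())

for ch in ("i", "I", "\u0130", "\u0131"):
    print(f"{unicodedata.name(ch):40} -> {folded_code_points(ch)}")

# Dotless i uppercases to plain I, yet the two do not fold together.
print("\u0131".upper() == "I", "\u0131".casefold() == "I".casefold())  # True False
```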


Re: Case mapping of dotless lowercase letters

2003-12-16 Thread Michael Everson
At 20:30 +0100 2003-12-16, Chris Jacobs wrote:

 > NO. There's no canonical equivalence between distinct pairs of characters,
 if the first letter of each pair is not also canonically equivalent.
compare ę̈ (ë + COMBINING OGONEK) with ę̈ (ę + COMBINING DIAERESIS)

The first pair has e trema as its first letter, the second pair e ogonek.
Yet these pairs are canonically equivalent.
The base letter is "e"
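The equivalence in this example comes from canonical reordering. A short Python sketch using the standard `unicodedata` module makes this visible, and it is offered as an illustration rather than as part of the thread. NFD decomposes both strings to e plus two combining marks. It then sorts the marks by canonical combining class, which puts ogonek (202) before diaeresis (230).

```python
import unicodedata

first = "\u00eb\u0328"   # LATIN SMALL LETTER E WITH DIAERESIS + COMBINING OGONEK
second = "\u0119\u0308"  # LATIN SMALL LETTER E WITH OGONEK + COMBINING DIAERESIS

nfd_first = unicodedata.normalize("NFD", first)
nfd_second = unicodedata.normalize("NFD", second)

# Both decompose to e + marks, and the marks are sorted by combining class.
print(nfd_first == nfd_second == "e\u0328\u0308")                        # True
print(unicodedata.combining("\u0328"), unicodedata.combining("\u0308"))  # 202 230
```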


Re: Stability of WG2

2003-12-16 Thread Michael Everson
At 16:05 -0500 2003-12-16, [EMAIL PROTECTED] wrote:

Thus when Brontosaurus and Apatosaurus were found to be synonyms, 
Apatosaurus was chosen as the preferred name because it was 
published first; however, this is not properly describable as 
"changing the name of Brontosaurus to 'Apatosaurus'". "Brontosaurus" 
is a perfectly good name and may still be used even though it is 
dispreferred.
Brontosaurus was good enough for me when I was five, and it's good 
enough for me today. Hmpf. Dispreferred me elbow.



RE: Case mapping of dotless lowercase letters

2003-12-16 Thread Michael Everson
At 00:35 +0100 2003-12-17, Philippe Verdy wrote:

 >>NO. There's no canonical equivalence between distinct pairs of
 >>characters, if the first letter of each pair is not also canonically
 >>equivalent.
 >
 compare ę̈ with ę̈

 The first pair has e trema as its first letter, the second pair e ogonek.
 Yet these pairs are canonically equivalent.
True in the way you interpret my sentence, but when I say the "first letter"
of each pair, I mean the first non-decomposable character of each pair. In
your example, both letters are simple "e" vowels.
e-diaeresis is decomposable to e + combining 
diaeresis. e-ogonek-diaeresis is decomposable to 
e + combining diaeresis + combining ogonek or to 
e + combining ogonek + combining diaeresis. The 
last two are equivalent.

Both "dotted lowercase i" and "dotless lowercase i" are not decomposable...
unlike "dotted uppercase I"...
small letter i and small letter dotless i are as different as t and thorn.
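You can list the decomposition mappings behind this with Python's `unicodedata.decomposition`, which returns an empty string when a character has no decomposition. This is an illustrative check, not something from the thread.

```python
import unicodedata

for ch in ("i", "\u0131", "\u0130"):
    # An empty mapping means the character is atomic: no canonical decomposition.
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

Only U+0130 decomposes, to 0049 0307 (I + COMBINING DOT ABOVE). Both lowercase letters are atomic.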

Well Outlook 2000 is unable to represent any e with ogonek and trema of your
example.
Get a better browser.


RE: [OT] CJK -> CJC (Re: Corea?)

2003-12-17 Thread Michael Everson
At 11:30 + 2003-12-17, [EMAIL PROTECTED] wrote:

I doubt Christians mean offence when they refer to Jesus through any of the
countless transcriptions, spellings and pronunciations used in various
languages.
It's odd that in English Judas and Jude are distinguished; in the 
original they are not.



RE: [OT] CJK -> CJC (Re: Corea?)

2003-12-17 Thread Michael Everson
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote:

There is reason to rename "Colonia" to "Köln", "Augusta" to "Augsburg",
"Eboraco" to "York", "Provincia" to "Provence", and so on.
Nicely said. Subtle irony tends to go over some 
people's heads on this list though.

Eboraco is called Eabhrac in Irish. :-)


Re: [OT] Keyboards (was: American English translation of character names)

2003-12-18 Thread Michael Everson
At 14:53 + 2003-12-18, Arcane Jill wrote:
Oh wow. Well, the range of different keyboard 
layouts I see around me is something else! 
(Especially on laptops).

Now here's something weird. Just about every 
standard, full-size, desktop (British) QWERTY 
keyboard I have ever seen, has the legend for 
U+00A6 BROKEN BAR as the shifted symbol printed 
on the key to the immediate left of Z (with the 
unshifted symbol being backslash), and the 
legend for U+007C VERTICAL LINE as the third 
symbol printed on the key to the immediate left 
of 1 (with the unshifted and shifted symbols 
being backquote (U+0060, officially GRAVE 
ACCENT) and the aforementioned "not sign" 
(U+00AC) respectively). Thus, you would expect 
 to yield BROKEN BAR, and you 
would expect  to yield 
VERTICAL LINE, because that's what's printed on 
the keys.
On the Mac, the situation is a bit different. On 
older keyboards, the grave/tilde `~ key was to 
the left of the 1; on newer ones, that key is to 
the left of the Z, and to the left of the 1 is 
the section/plus-minus §± key. Then on the other 
side of the keyboard, older keyboards had the 
backslash/vertical-bar key to the right of the 
equals-sign; newer keyboards have this key to the 
right of the apostrophe key.



RE: American English translation of character names

2003-12-18 Thread Michael Everson
At 16:21 +0100 2003-12-18, Philippe Verdy wrote:
John Cowan wrote:
 The most mysterious term is "caron" for the hacek accent: this word
 seems to exist only in ISO standards, and nobody has any idea where it
 came from.
I think it may have occurred in some typographic terminology, because 
the initial glyph looked more like a crochet hook than a reversed 
circumflex, i.e. caron was not angular in handwritten form, as it is 
now in typeset fonts, but looked like a rounded and oblique check 
mark (a slight variation of the acute accent with a small rounded 
hook on its bottom end, but still much more distinct from the 
lower half-circle form used by breve).
This doesn't make any sense to me, but in any case it does not 
explain the origin of the word "caron". The most plausible suggestion 
I've ever come up with is folk-etymological: It's a CARet that sits 
ON the vowel. :-(



Re: American English translation of character names

2003-12-18 Thread Michael Everson
At 09:01 -0500 2003-12-18, John Cowan wrote:

"Underscore" would suggest rather U+0332, the combining low line.  As
for "pilcrow", it's probably descended from a perversion of "paragraph",
but nobody knows for sure.
The OED gives other forms for it:

15th-century pylcraft(e), pilecrafte; 16th-century pilcrowe; 
17th-century pilkrow, pill-crow, peelcrow, pilgrow. Apparently for 
pilled crow, cf. pilcord, pilgarlic. The application of the word, 
with the form pylcraft, has suggested that it originated in a 
perversion of PARAGRAPH, through pargrafte, *parcrafte, etc.: cf 
quote c 1460 and 1617. But the history of the word is obscure, and 
evidence is wanting.



RE: [OT] Keyboards (was: American English translation of character names)

2003-12-18 Thread Michael Everson
At 18:44 +0100 2003-12-18, Marco Cimarosti wrote:

 > They didn't add an extra key for the Euro though. We access that as
.
What OS is it? Most European keyboards I have seen have the euro on .
Not English. AltGr + E usually gets you the acute accent in the UK; 
certainly that is the case for Irish keyboards.



RE: Aramaic unification and information retrieval

2003-12-21 Thread Michael Everson
At 01:54 +0100 2003-12-21, Philippe Verdy wrote:

The way the various Indic scripts create ligatures and take contextual forms
makes each of them quite distinctive. The only thing they have in common
is a set of shared phonemes which are more or less close to each other,
with large variations between regional dialects.
They have a common structure, which we follow in encoding.

The way each of these scripts was then used to create its own
orthography for distinct languages, and adapted to allow writing one
language in another with irregular orthographic rules, matters so much that
simple 1-to-1 transliterations from one to another are very poor. You
can't simply transliterate without taking into account differences of
phonetics between regions speaking variants of the same language.
Nonsense. Of course you can. KA is KA is KA is KA and BHA is BHA is 
BHA is BHA. The *reading rules* for pronouncing what's been written 
differ, but the transliteration is by and large one-to-one. Tamil of 
course is an exception, having lost some consonants.
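The Unicode Indic blocks follow a largely parallel layout inherited from ISCII. Because of that, a one-to-one transliteration of letters such as KA and BHA can be sketched as a fixed code-point offset. This Python sketch assumes that the Devanagari block (U+0900..U+097F) and the Bengali block (U+0980..U+09FF) line up position by position. That holds for most letters, but some Bengali slots are empty and a few hold unrelated characters, so a real transliterator would use an explicit table. The function name `devanagari_to_bengali` is hypothetical.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
BENGALI_OFFSET = 0x80  # assumption: Bengali mirrors the Devanagari layout

def devanagari_to_bengali(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            target = chr(cp + BENGALI_OFFSET)
            # Only substitute when the parallel Bengali slot is assigned.
            if unicodedata.name(target, None) is not None:
                out.append(target)
                continue
        out.append(ch)  # outside the block, or no Bengali counterpart
    return "".join(out)

print(devanagari_to_bengali("\u0915\u094D\u0937"))  # KA + VIRAMA + SSA -> U+0995 U+09CD U+09B7
```

DEVANAGARI LETTER NNNA (U+0929) has no Bengali counterpart, so it passes through unchanged.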

Finally, not all Indian scripts share the "same" subset of characters. It's just
unfortunate that you think that because the ISCII standard tried to "unify"
them in the same encoding model, but still with distinct charsets.
This doesn't make any sense to me at all.

Indic scripts have much less in common than Greek, Latin and Cyrillic.
That isn't true.

They are just using smaller sets of letters (at the price of an extremely
elaborate system of contextual forms).
I don't know what you are talking about.


Re: Aramaic unification and information retrieval

2003-12-21 Thread Michael Everson
At 12:33 -0800 2003-12-21, Peter Kirk wrote:

Nonsense. Of course you can. KA is KA is KA is KA and BHA is BHA is 
BHA is BHA. The *reading rules* for pronouncing what's been written 
differ, but the transliteration is by and large one-to-one. Tamil 
of course is an exception, having lost some consonants.
Michael, in view of this do you think it might be sensible to treat 
the different Indic scripts as equivalent for collation purposes?
No, not at all. Not in the default template. The default template 
sorts scripts separately.

This might be especially useful with a corpus of material in one 
language e.g. Sanskrit but using different scripts.
Actually I rather think it would form a list which was an 
outrageously illegible mess.

And then, how about the Semitic scripts? After all, ALEF is ALEF is 
ALEF is ALEF and ...
Nope. It would also be an outrageously illegible mess. But you can 
tailor it locally if you want to.



Re: Aramaic unification and information retrieval

2003-12-21 Thread Michael Everson
At 14:18 -0800 2003-12-21, Peter Kirk wrote:

So, "KA is KA is KA is KA and BHA is BHA is BHA is BHA", and ALEF is 
ALEF is ALEF is ALEF, except when it comes to comparing them and 
collating them?
In the context which I was speaking, yes. The Indic KAs have a 
one-to-one relationship, historically. We know this. Likewise the 
Semitic ALEFs. That doesn't mean that we should unify the Indic 
scripts all into one (which we haven't) or that we should unify all 
the Semitic scripts into one.

If you have a multiscript database for Pali and you need to search 
all the KAs across scripts, you will have to have a local engine to 
do so. The scripts are distinct as encoded in the Unicode standard.

If you want to sort such a database, illegible as the result would 
be, you can do it, with a local tailoring for your specific purpose. 
The default table in the UCA will not interfile them, however, 
because it orders the scripts sequentially (apart from digits, which 
are treated differently because of their particular properties). I'm 
not saying you can't tailor. You can. I'm saying we're not going to 
change what we are doing in the UCA and ISO/IEC 14651 because it 
distinguishes scripts on purpose.
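A local tailoring of the sort described here could be sketched in Python as a fold applied before searching and sorting. The helper `fold_bengali_to_devanagari` is hypothetical and covers only Devanagari and Bengali. It assumes the parallel block layout (an offset of 0x80). Plain code-point order stands in for the default UCA table, which likewise keeps the scripts apart. This is not how the UCA or ISO/IEC 14651 actually compute keys.

```python
import unicodedata

BENGALI = range(0x0980, 0x0A00)
OFFSET = 0x80  # assumption: Bengali slots parallel Devanagari slots

def fold_bengali_to_devanagari(text: str) -> str:
    # Hypothetical local tailoring: map Bengali letters onto the parallel
    # Devanagari code points so that, e.g., Bengali KA matches Devanagari KA.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in BENGALI:
            target = chr(cp - OFFSET)
            if unicodedata.name(target, None) is not None:
                out.append(target)
                continue
        out.append(ch)
    return "".join(out)

# Bengali KA, Devanagari KHA, Devanagari KA, Bengali KHA
words = ["\u0995", "\u0916", "\u0915", "\u0996"]
default_order = sorted(words)                                   # scripts kept apart
tailored_order = sorted(words, key=fold_bengali_to_devanagari)  # scripts interfiled
```

The default order keeps each script together, while the tailored key interfiles them. The interfiled list is what a specialist searching a Pali corpus might want, and it is also the kind of list a general user might find illegible.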

Of course if one collates together a mixture of Latin script texts 
in very different fonts and styles one can get an outrageously messy 
list which is illegible to those who don't know all the different 
fonts.
I do not consider the Semitic nodes we are considering for eventual 
encoding to be font variants of each other.

But that is hardly the point. Anyway, I don't see the main purpose 
of collation as producing lists of legible words, but rather as 
matching in text and database searches.
Which you as an expert can do with special tools.

Michael, do you realise that I am trying to offer you an olive 
branch, and all I get is it thrown back in my face, nicely by you 
but rudely by someone else offlist.
No, I didn't. In the first place I didn't know that we were at war. 
In the second place, all I'm telling you is that we have practices 
which are generic to certain levels of our work, and we are not 
likely to deviate from those practices. That's not throwing something 
in your face. That's telling you what's what. We had a similar 
discussion about generic practice when we were putting Runic into the 
UCA. Swedish specialists wanted a Latin-based order. That's specific. 
Everyone else, though, would want the native Futhark order. The 
Japanese NB, which doesn't really worry about Runes much, thought 
that the generic order should be the basic historical one.

I think that it just might be acceptable to encode the various 
ancient Semitic scripts separately if they are unified for collation.
You can tailor a unified collation for them or indeed for anything you like.

But if you are saying that it must be all or nothing, I will 
continue to fight on behalf of the users of these scripts for all of 
what they want, rather than what you have apparently unilaterally 
(on the basis of a book which describes glyph shape differences 
rather than the systematic differences which really distinguish 
scripts) decided that they ought to want and have written into your 
Roadmap.
*I* have not decided on the basis of *one* book, thanks very much. 
Nor have I done anything unilaterally. Nor have we made decisions 
which aren't based on our normal working practice.

I'm not interested in worrying about these bits of the Roadmap right 
now. If I work on anything over the Christmas, it should be N'Ko. 
Then there is more work on Cuneiform. Then work on Manichaean and 
Avestan. Then I've got to prepare for the PDAM comments. This 
sniping, even when nice, isn't doing you any good, nor me. Can we 
drop this for a while, please?

Michael

(I am sorry you had rude private mail from someone. I also had 
private mail from someone which suggested that I didn't know anything 
about Indic scripts, while saying a whole lot of other rather 
incomprehensible things about ISCII and Unicode. Better forgotten.)



Re: Aramaic unification and information retrieval

2003-12-22 Thread Michael Everson
At 04:27 -0800 2003-12-22, Peter Kirk wrote:

In view of this, I call for a review of the roadmaps and in 
particular of the status of the Aramaic, Palmyrene, Nabataean, 
Elymaic and Hatran scripts.
We heard you the last time, Peter. We know that this is a concern of yours.

Serious consideration should be given to unifying these scripts with 
the Hebrew script, of which they appear to be glyph variants.
To you.

The separate status of Phoenician may also need to be reconsidered.
Absolutely not. Phoenician is the mother of these scripts and Greek 
and Old Italic besides. Greek and Old Italic did *not* descend from 
"Hebrew", and it is pernicious to go on suggesting that Phoenician 
should be unified with Hebrew. If you want, as some scholars do, to 
write Phoenician in Hebrew script, go right ahead. That is a 
perfectly reasonable transliteration choice. Nothing prevents you 
from doing it. But historical realities and relationships *do* have 
some relation to the content of the Unicode Standard and ISO/IEC 
10646. And that may include encoding things that you won't use, 
though *others* might.

Note that I am calling for a review only of scripts listed in N2311 
as not in current use.
Please do not force us to undertake this review NOW. We do not have 
the resources to do so effectively and already this thread has taken 
up far too much time and energy. We have explained to you that 
nothing actionable is happening with any of this material at present. 
How many times do I have to say that?



2003-12-22

2003-12-22 Thread Michael Everson
Grianstad faoi mhaise do chách! Happy Solstice to everyone!


Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 21:36 -0800 2003-12-22, Doug Ewell wrote:

Maybe not as far as whether it will actually be encoded.  We do know 
that "Accordance with the Roadmap" is often the sole justification 
for the code positions specified in proposals, as discussed in a 
thread some months ago.
Excuse me? Are you irritated about something, Doug?

When I fill out the proposal summary form, I do NOT bother to rehash 
all the reasons why we decided to put something on the BMP or the 
SMP. Why? Because it isn't a good use of our time to rehash all of 
these things and pour out the history of why we thought it would be 
good to put something where. "Accordance with the Roadmap" is often 
the sole justification that I bother to put in the Proposal Summary 
form. But it reflects consensus about where the Roadmap Committee 
thinks things ought to go. You may remember that Ken convinced me to 
move Phoenician to the SMP at one stage in favour of Arabic 
Extensions. I suppose that's in the archives somewhere, where some 
future Historian of Unicode (hi there!) can find it.



Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 01:59 -0800 2003-12-23, Doug Ewell wrote:

The impression I get, which is probably totally off base, is that when
Script X is first considered a candidate for possible future encoding,
Michael or somebody looks around for a big-enough empty spot in the
Roadmap and says, "Hmm, let's put it... there."  There are zones for RTL
scripts and a rough guideline for zones in the SMP, but in general it's
pretty much open territory.
You may remember that the Roadmaps were first made by me (not by a 
committee), oh, some time back in 1996 or 1997. I am not sure, 
actually. Actually I did find some copies of old versions and I 
thought I might dust them off and post them to the Roadmap site so 
people could see how things evolved, assuming that people care. I'm 
not sure when the first Roadmap version that I sent to WG2 was.

Of course at http://www.unicode.org/roadmaps/index.html we inform you:

"When scripts are actually proposed to the UTC or to WG2, the 
practice is to 'front' them in the zones to which they are 
tentatively allocated, and to adjust the block size with regard to 
the allocation proposed. The size and location of the unallocated 
script blocks are merely proposals based on the current state of 
planning. The size and location of a script may change during final 
allocation of the script."

Years later, when some of the adjoining allocations may not have 
gotten off the ground and others have suddenly sprung into being 
(like the FUPA extensions, which IIRC were never roadmapped until 
after they were proposed),
Alphabetic extensions or something was put in about the same time, if 
I recall. Usually when something pops up I roadmap it. It helps to 
know where things might fit.

the formal proposal for Script X is written and cites the Roadmap as 
the only justification for the proposed code points, even if there 
might be other reasons supporting (or controverting) that criterion.
The justification is only with regard to what plane the thing is on.

Usually it doesn't matter what code positions a script gets, as long 
as small alphabets are aligned on a half-block boundary (for SCSU), 
but it might be nice sometimes to see a rationale other than 
"Accordance with the Roadmap," or a short blurb explaining why the 
Roadmap had the script there in the first place.
Might it indeed. :-|

The justification is only with regard to what plane the thing is on.
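The SCSU point in the quoted text can be illustrated with a small Python check. SCSU (Unicode Technical Standard #6) encodes text through dynamic windows of 128 code points. Apart from a handful of fixed predefined positions, the window offsets are multiples of 0x80. So a small alphabet that starts on a half-block boundary can be covered by a single window. This is a simplified sketch under that assumption, and the example ranges other than Gothic are hypothetical.

```python
WINDOW_SIZE = 0x80  # an SCSU dynamic window spans 128 code points

def fits_one_window(first: int, last: int) -> bool:
    # Simplification: treat windows as 128-code-point runs aligned on
    # multiples of 0x80, ignoring SCSU's few fixed non-aligned offsets.
    return first // WINDOW_SIZE == last // WINDOW_SIZE

print(fits_one_window(0x10330, 0x1034F))  # Gothic block: True
print(fits_one_window(0x10380, 0x1039F))  # hypothetical 32 letters on a boundary: True
print(fits_one_window(0x10370, 0x1038F))  # same size, straddling 0x10380: False
```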

This is NOT a huge problem for me, just something I've noticed. 
With all the careful scrutiny that character proposals get, on 
everything from glyphs to properties, the code position assignments 
seem relatively arbitrary.
Ahem. The justification is only with regard to what plane the thing is on.

 >  I deliberately followed the roadmap codepoints for my recent
 'Phags-pa proposal even though I think 'Phags-pa probably belongs in
 the SMP (but I don't really care where 'Phags-pa is encoded as long as
 it is encoded, so I am happy to defer to Michael, Rick and Ken in this
 regard); and then WG2 in their wisdom decided to reallocate the block
 three rows north of the roadmapped codepoints ... so maybe you can't
 assume that roadmap codepoints are carved in stone.
I didn't see the minutes of the meeting where that decision was made.
What was the rationale for moving it?
It had been on the Roadmap to the BMP along with some other Brahmic 
scripts, and with Tibetan and Mongolian, as far as I recall.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 04:30 -0800 2003-12-23, Peter Kirk wrote:

As the subject line here is still about Aramaic, I shall remind you 
all that that script is a good example of a script which has been 
roadmapped for the BMP as a misunderstanding.
I am aware that this is your opinion, Peter.

If there is such a script at all distinct from the Hebrew script, it 
is one which died out, and was replaced by other encoded or 
roadmapped scripts, more than 2000 years ago.
Just for the sake of argument, and not with particular reference to 
the scripts currently under discussion, it is acceptable to us to 
encode extinct scripts even when some scholars prefer to use 
something else. Gothic is one such example.

So this is a case where the original decisions of the Roadmap 
Committee need reviewing.
You have stated this already.

That decision was based on N2311 which, as James points out, notes 
twice that "Further research is required".
Gosh. And I'm the one who wrote that. Isn't that something?

The UTC should make sure that such research has been done properly, 
and not allow provisional decisions taken on the basis of incomplete 
research to become standardised by default.
Don't be ridiculous. Nothing gets standardized by default.

Thank you for your input. Your input has been noted. Will you please 
give it a rest now? The matter will be reviewed in due course.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 17:41 -0800 2003-12-22, Kenneth Whistler wrote:

If there is, however, some consensus that Samaritan and Manichaean 
*do* deserve separate encoding consideration, how about pursuing the 
furthering of encoding proposals for those as distinct scripts and 
then come back around later to review the ancient forms once again 
after some more of the pieces have fallen into place?
Oh, Manichaean is certainly going to be encoded. The German scholars 
I met with in Prague last year have been extremely helpful in working 
out the specifications needed. And I am supposed to meet with Iranian 
experts later this year to finalize things.

Regarding Samaritan, there is a group of modern users certainly. This 
page http://www.orindalodge.org/kadoshsamaritan.php has a number of 
interesting links on it. Masonic scholars apparently differentiate 
between Hebrew and Samaritan.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 06:10 -0800 2003-12-23, Peter Kirk wrote:

If so you must have second sight, because I have not stated this 
point before, which is that the place for Aramaic, if encoded at 
all, is on the SMP together with other extinct scripts.
Ah. I thought you were complaining (again) about Aramaic being on any 
Roadmap, rather than making a distinction between SMP and BMP.

But extinct scripts should be encoded on the SMP, according to the 
rules in e.g. TUS 4.0 section 2.8. Gothic is an example of that. If 
Aramaic is encoded, it should be another example.
There are no RULES about where anything gets encoded. There are 
guidelines. Nevertheless, I have no problem with Aramaic being 
encoded on the SMP. I'll move it there now. Happy Christmas. :-)

The UTC should make sure that such research has been done 
properly, and not allow provisional decisions taken on the basis 
of incomplete research to become standardised by default.
Don't be ridiculous. Nothing gets standardized by default.

It was you, Michael, who wrote:

When I fill out the proposal summary form, I do NOT bother to 
rehash all the reasons why we decided to put something on the BMP 
or the SMP.
That implies that you expect the UTC to accept those reasons without 
further questioning,
No, it doesn't, but you are not taking into account other facets of 
our process that have to do with consensus in the meetings. I can't 
fault you for that, but please don't be so literalist. ;-)

without even any documentation explaining the earlier decision, and 
without checking whether, even according to that documentation, 
"Further research is required". That was my meaning.
The UTC doesn't allocate code positions. WG2 does. We assign things 
their places in WG2 meetings according to consensus.

Now, go have a mince pie. I'm going to. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 08:51 -0800 2003-12-23, Peter Kirk wrote:
On 23/12/2003 06:22, Michael Everson wrote:

...
There are no RULES about where anything gets encoded. There are 
guidelines. Nevertheless, I have no problem with Aramaic being 
encoded on the SMP. I'll move it there now. Happy Christmas. :-)

Thank you. I see it is done already. Happy Christmas!
I told you I was going to do it NOW. ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 20:24 + 2003-12-23, [EMAIL PROTECTED] wrote:
Peter Kirk wrote,
 ... But I do know of one person today who chooses to read the Hebrew
 > Bible rendered with palaeo-Hebrew glyphs.

http://www.crowndiamond.org/cd/torah.html

Yes, this is fascinating and I'd stumbled across it before.
Of course, to echo the observation John Hudson made regarding the 
Masonic Hebrew and Samaritan text, the text presented here 
http://www.crowndiamond.org/cd/genesis.html shows that Palaeo-Hebrew 
should obviously be unified with Latin.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: why Aramaic now

2003-12-23 Thread Michael Everson
Elaine,

Rick and I and Ken have all explained our position already. You're 
doing nothing but stirring up a whole bunch of stuff that we aren't 
working on now, and that we aren't going to be working on soon. 
You're not asking us to deal with anything actionable, and this is 
keeping us from doing work which IS actionable and necessary. We have 
received Peter Kirk's request for review. I moved Aramaic to the SMP. 
That doesn't mean that we will ever encode it. It does mean that 
further research is required. I do not have time or resources to 
invest in the work required to handle this request right now. There 
are few others in WG2 or in the UTC who would be prepared to do so 
either. I have asked you any number of times, courteously to accept 
this. Nothing is being encoded that endangers your use of Hebrew 
transliteration which you are currently using. If some day other 
things are encoded, nothing makes you have to use them.

Please stop pouring oil on this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 14:39 -0800 2003-12-23, John Hudson wrote:

Now, that said, I am very keen to have the Samaritan shin encoded, 
because this is used as a mark in the apparatus critici of the BHS 
and possibly other Bible editions (in BHS it is used in citations of 
Pentateuchi textus Hebraeo-Samaritanus secundum). I'd be perfectly 
happy to see it encoded as a Letterlike Symbol, since it is being 
used as a symbol and not as a Samaritan letter.
Perhaps it must be in any case, due to directionality issues.
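To make the directionality point concrete: letters in the Hebrew block carry the right-to-left bidirectional class R, while the Hebrew-derived symbols already in the Letterlike Symbols block (U+2135 ALEF SYMBOL through U+2138 DALET SYMBOL) are strong left-to-right (L). A quick check with Python's standard unicodedata module follows; its results reflect whatever Unicode version ships with the interpreter, and treating a future Samaritan shin symbol the same way is only an assumption by analogy.

```python
import unicodedata

# Hebrew letter shin versus the Hebrew-derived letterlike symbols.
SAMPLES = ["\u05E9", "\u2135", "\u2136", "\u2137", "\u2138"]


def bidi_report(chars):
    """Return (code point, character name, bidi class) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.bidirectional(c))
        for c in chars
    ]


for cp, name, bidi in bidi_report(SAMPLES):
    print(f"{cp}  {bidi:<2} {name}")
```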
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: why Aramaic now

2003-12-23 Thread Michael Everson
At 14:49 -0800 2003-12-23, John Hudson wrote:

Michael, I think you are missing the point that other people do have 
time and resources to devote to 'further research' at this time, and 
this is why these discussions are happening. Personally, I'm happy 
to accept that the position of Aramaic in the roadmap is an open 
issue and is going to remain so, but as Elaine pointed out there is 
a lot of interest in Unicode among Biblical scholars right now -- 
which is a Good Thing -- and some of these people are wanting to 
start addressing some of the questions and issues that they are 
confronting as they proceed.
The main answer to their question is that they can use Hebrew to 
transliterate whatever they want. Whether Phoenician or Samaritan 
needs to be encoded for OTHER purposes than those of these particular 
scholars (who are happy using Hebrew square letter fonts for them) is 
another question.

I don't think this means you personally need to do anything -- or 
Rick or Ken -- but there are going to be some proposals developed 
for additional Hebrew characters
I'm not complaining about that, and am helping with two of them.

and some documents on different approaches to unifying or not 
unifying the bewildering array of early semitic writing systems,
That *is* something that is going to impact on what I have to do, and 
I would really rather not be forced to give up doing other things to 
deal with that. Which I am, even now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 01:02 +0100 2003-12-24, Philippe Verdy wrote:
Michael Everson:
 Perhaps it must be in any case, due to directionality issues.
If you have looked at those pages, you have seen that they were coded as a
cypher of Latin, but with no implied association with these letters. It just
allows using the existing font technology in a way that is not Unicode
compliant as it shows unrelated glyphs for standard Latin letters.
Goodness, that would never have occurred to me.

The rest of your post has nothing whatsoever to do with the character 
John Hudson is referring to, nor to its properties, which is what I 
was discussing.

Please do not answer this with a lengthy response explaining what you meant.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Samaritan, was: Aramaic unification and information retrieval

2003-12-23 Thread Michael Everson
At 15:51 -0800 2003-12-23, Peter Kirk wrote:

Agreed that the Samaritan shin is urgent for this reason.
This could be added in the ballot comments to the symbol set 
currently under ballot. I would need a good scan of the character in 
context and its bibliographical reference.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Aramaic unification and information retrieval

2003-12-24 Thread Michael Everson
At 01:40 +0100 2003-12-24, Philippe Verdy wrote:
Michael Everson wrote:
 > Of course, to echo the observation John Hudson made regarding the
 > Masonic Hebrew and Samaritan text, the text presented here
 > http://www.crowndiamond.org/cd/genesis.html shows that Palaeo-Hebrew
 > should obviously be unified with Latin.
Instead of taking dogmatic positions on how proto-semitics scripts 
should be encoded, why not leaving this work to the people that will 
really use these scripts and are currently working with those texts 
and publishing them?
Because I am not taking "dogmatic positions". I know what I am doing. 
I am being careful, trying to manage the work within the larger 
context of the schedule we have set ourselves, and trying to do this 
in terms of realistic priorities.

It seems that there are enough people working there without 
needing to oppose all that they have to say.
That isn't what I am doing. Indeed, I accepted a useful suggestion on 
the part of Peter Kirk. I do, however, oppose overunification when it 
is warranted to do so. At the same time it takes time to do that. It 
took a great deal of time to disunify Coptic from Greek and Nuskhuri 
from Mkhedruli. I do NOT want to have to do that again with a hasty 
overunification of early Semitic alphabets.

Could you instead take the time to work on the missing Latin letters 
for African languages? Why isn't there any serious work about these 
living languages that don't have lot of universitary support and 
nearly no computer resources in Africa to make this job?
Thank you for proposing more topics requiring extensive research and 
proposal preparation, especially as the materials needed to make such 
proposals are not available to us. Please give generously to the 
Script Encoding Initiative to enable us to undertake such work. 
Alternatively, please collect the necessary materials and provide 
them to us.

There is still interesting work to do within the Latin and Arabic scripts.
Yes, there is. See N2692, for instance, and Ns247, and N2641, and 
N2640, and N2581R2.

It's a shame that someone like you invest so much in an area that 
would better be specified by other communities.
Is it indeed.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Aramaic unification and information retrieval

2003-12-24 Thread Michael Everson
At 11:50 +0100 2003-12-24, Philippe Verdy wrote:
John Jenkins wrote:
 > No, it was not.  Han would have been unified even if there had been
 > space not to do so.
I fully agree. Unicode would have been updated later to support 
surrogates if CJK had been extended so much that it could no more 
fit the full CJK set.
This has nothing to do with what John said.

ISO10646 could have followed a distinct path where each language 
could have been encoded separately, but the choice to encode only 
scripts has greatly reduced the needs for more planes, which was 
reasonable to project when you saw the explosion of encodings that 
were soon to exceed the capabilities of ISO2022 and similar 8-bit 
code repertoires).
This is, I am sorry to say, a completely unwarranted assumption. No 
one EVER suggested "encoding each language separately" in ISO/IEC 
10646 [sic]. This is but the latest of Philippe's pronouncements, 
presented as though he were an expert who had been following the 
Unicode project from the beginning. Unfortunately, it is as wrong as 
it is unsubstantiated.

Note to the historians of Unicode reading these archives: Caveat lector.

Note to Philippe: Over the past six months, you have written as 
though you were expert in all things Unicode; it is clear 
nevertheless that you are not, not yet, and that you have much to 
learn. You need to go and do the work of learning it. Doug Ewell did 
this, and went from being an amateur to a valued member of our team. 
Currently, I can't count the number of times that you have come out 
with "authoritative" pronouncements which had no basis in fact, and 
your credibility is nearing zero.

(That advice, Philippe, is a Christmas present for you. Please do not 
respond with a lengthy explanation. And please do not send me a 
private message about it. If you do, I promise I will blacklist you, 
as I know at least one other has.)

(Of course I am sure I have my own detractors reading this list, to 
whom I will look to some like Michael Curmudgeon McKnowitall Everson 
by saying this out loud, as opposed to sniggering quietly in offlist 
mail, but sometimes that's just my lot.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-24 Thread Michael Everson
At 07:11 -0800 2003-12-24, Elaine Keown wrote:

In the Dead Sea Scrolls, several other letters with Palaeo-Hebrew 
shapes are used as paragraph etc. markers.
Those would be Phoenician letters with RTL directionality used as 
markers in a traditional text. (That is given the current Roadmap 
which unifies Palaeo-Hebrew and Phoenician.)

So, if you wish, your shin could be submitted when they are--Elaine
The Samaritan shin is an LTR clone of, um, the Samaritan shin used in 
Western Biblical references.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-24 Thread Michael Everson
At 15:36 + 2003-12-24, Michael Everson wrote:

The Samaritan shin is an LTR clone of, um, the Samaritan shin used 
in Western Biblical references.
Recte: The Samaritan shin is an LTR clone of, um, the Samaritan shin, 
and is used in Western Biblical references.

--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: why Aramaic now

2003-12-24 Thread Michael Everson
At 11:47 -0800 2003-12-24, Elaine Keown wrote:

Michael, I have NO "master plan" for 2004 where this work on Aramaic 
unification (or near-unification) will be completed in a particular 
month, quarter, or even season.
Or not. It depends what kinds of criteria we select, or don't, and 
it's good to know that you aren't prioritizing that either.

In the meantime, if you *do* have contact with experts in Samaritan, 
could you inform Debbie Anderson of this. Samaritan is likely to be 
actionable in the shorter term rather than the longer, and is clearly 
a different script from Hebrew.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: why Aramaic now

2003-12-24 Thread Michael Everson
At 12:02 -0800 2003-12-24, Elaine Keown wrote:

Some of the sets of symbols I found---which I simply assumed could 
be added to "Hebrew"--are innately controversial because of the 
Roadmap.
Innately?

That's actually true for 3 subsets of symbols that I think of as 
"Extended Hebrew."
Try thinking of them as General RTL Punctuation.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: [hebrew] Re: Aramaic unification and information retrieval

2003-12-24 Thread Michael Everson
We have encoded 70,000 of them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: why Aramaic now lumpers and splitters

2003-12-24 Thread Michael Everson
At 12:29 -0800 2003-12-24, Elaine Keown wrote:

It appears to me that script experts may resemble experts in 
dialects/languages:  there are lumpers and splitters

I'm a lumper, but I am a thinking lumper... I will be thinking about 
Phoenician retrieval in early 2004
There is zero chance that Phoenician will be considered to be a glyph 
variant of Hebrew. Zero chance. The books about writing systems, 
from children's books to books for adults, which refer to the 
Phoenician alphabet as the parent of both Etruscan and Hebrew are 
legion.

Some scholars may decide to transliterate all Phoenician texts into 
Hebrew script and read only that, and retrieve it from their 
databases, and that is perfectly fine. Lots of people transliterate 
Sanskrit into Latin and never use Devanagari.

I would be happy to inform Debbie.  The font for the  Samaritan 
marks is still in rough draft due to what I did in fall
What "marks" are these?

and I had confusing email from a Samaritan expert I consulted that needs to be
processed (re vowels, not unification).
Documents available to me suggest that Samaritan can (but needn't) 
use Arabic fatha and kasra and others, and that there are 
orthographies for which some letters are used vocalically, a bit like 
Yiddish.

 > is clearly a different script from Hebrew.

Different is in the eye of the beholder, I'm afraid. Or, if you 
will, in the eye of the cyber-machine
No. It is a question of history and development.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: why Aramaic now

2003-12-24 Thread Michael Everson
At 12:38 -0800 2003-12-24, Curtis Clark wrote:
on 2003-12-24 12:02 Elaine Keown wrote:
Some of the sets of symbols I found---which I simply
assumed could be added to "Hebrew"--are innately
controversial because of the Roadmap.
I've been following these threads with interest, as an uninformed 
bystander. Michael's unwillingness to unify in haste seems correct 
in first principles, independent of his expertise and experience. 
But you have presented the first cogent (to me :-) argument for why 
delaying the decision is a problem.
Not at all. Punctuation marks are often shared between scripts.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: why Aramaic now lumpers and splitters

2003-12-24 Thread Michael Everson
At 14:08 -0800 2003-12-24, Elaine Keown wrote:

 > There is zero chance that Phoenician will be
 considered to be a glyph variant of Hebrew.
Many, many Semitists would be truly astonished to read this sentence.
They will need to get over it. Many, many other people will want 
Phoenician encoded as a script whether or not Semiticists choose to 
use it. It is a cultural matter, not just a matter of comparative 
Semitics.

Again, Germanicists may prefer Latin to Gothic, and Indo-Europeanists 
may prefer Latin to Kharoshthi or Devanagari, yet we encode all.

Samaritan Bibles have fascinating marks that indicate the emotion or 
dramatic interpretation to use in reading each verse... pretty 
nifty!
Can you please send bibliographical references and/or samples to me 
or to Debbie or both?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: why Aramaic now lumpers and splitters Samaritan

2003-12-26 Thread Michael Everson
At 08:43 -0800 2003-12-25, Elaine Keown wrote:

In addition, I was unable to find complete information on 
Samaritan--I couldn't find any running text with vowels that was 
large enough to scan for a proposal here in Texas.  So anything I 
would send you now would
not be enough to write a proposal.
I would rely on materials you were able to supply to supplement what 
I already have, which is not inconsiderable.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)

2003-12-26 Thread Michael Everson
c, etc. at the same time, and now there is
resistance to using Unicode characters with "Hebrew" in their names to
write Phoenician, Aramaic, etc.
I think the "real problem" here arises from the fact that some 
scholars, familiar with Hebrew, find it easier to read early Semitic 
texts in square script than in the originals. The same thing happens 
with Runic and Gothic and Glagolitic and Khutsuri, and indeed 
Cuneiform, where Latin is often preferred (regardless of the 
structure of the writing systems). The needs of those scholars are 
met: they can use Hebrew and Latin with diacritics. No problem. The 
needs of other clients of the Universal Character Set, no matter how 
"unscholarly" they may be, will be met by encoding appropriate nodes 
in the Semitic tree.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Aramaic unification and information retrieval

2003-12-26 Thread Michael Everson
At 17:46 + 2003-12-26, Christopher John Fynn wrote:

(Though the Roman style & Fraktur style of Latin script are probably more
different from each other than some of the separately encoded Indic 
scripts [e.g. Kannada / Telugu])
Sorry, Chris, this is unsubstantiated speculation, and it doesn't 
happen to be true.

In 1997, I showed some comparisons between Coptic, Greek, Cyrillic, 
and Gothic showing that all of them but Greek were similar enough to 
be read with a minimum of training and practice. I revised this a bit 
in 2001: http://www.evertype.com/standards/cy/coptic.html. German, 
English, and Irish can all be read with a similarly low learning curve 
whether the script is Fraktur or Gaelic; the number of letterforms 
which differ is small. Wedding invitations in English-speaking 
countries are routinely written in non-Latin garb. The identification 
is uncontested! No student of writing systems classes the "Gaelic 
script" as something different from "Latin script". The same cannot 
be said of Phoenician, Samaritan, and Hebrew, for instance.

So in the case of the ancient Semitic scripts - even if they are closely
related, is each associated with a particular written language   - or were the
different but related scripts being used to write a common language?
All of them can be used to write more than one language. Some of them 
may not have been. It's complex and needs review.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Ancient Northwest Semitic Script

2003-12-27 Thread Michael Everson
At 00:36 -0500 2003-12-27, Dean Snyder wrote:

This document by Michael Everson is particularly revealing and in the end
damning to his whole attempt at disunification of the Northwest Semitic
script.
I am not interested in participating in this kind of discourse. This 
is not "Michael Everson vs the Semitic scholars", Mr Snyder.

Your "Northwest Semitic" is the same as "my" Phoenician in any case; 
so, in fact, you agree with the Roadmap as regards some points.

Lumpers can use Hebrew. Splitters need more granularity. We will, 
eventually, be investigating the levels of granularity that will be 
useful.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Ancient Northwest Semitic Script

2003-12-27 Thread Michael Everson
At 11:20 -0500 2003-12-27, Dean Snyder wrote:

But my main objection is that you have ALREADY made up your mind 
about Phoenician and Hebrew, categorically and emphatically 
declaring that there is "zero chance" that they will be considered 
glyphic variants of one another.
I'm sorry you object. I remain convinced, however, that suggestion 
that Phoenician be unified with Hebrew and Phoenician is ridiculous 
in the extreme, and I will oppose it absolutely. Likewise, it is 
clear that Samaritan is also not to be unified with Hebrew. There may 
be some grey area regarding the relation of one variety or another of 
Aramaic to Phoenician, and to Hebrew and other descendants of 
Aramaic. That is what gave rise to this and related threads.

If you don't like this, that's fine. You can raise your objections 
when I eventually have the time and resources to push the Phoenician 
or Samaritan proposal forward. (Realistically, we can't expect that 
anyone else will be doing so.) I'm not going to do that now, nor am 
I going to engage in further academic debate with you. You've put far 
more weight on the niggly details in N2311, which is an informative 
document written two years ago in order to help make sense out of 
chaos. O'Connor's chart there is one of many charts; its being there 
is also informative.

In the meantime, the Roadmap will stay as it is, because these issues 
remain open. As I see it, it is a certainty that Phoenician and 
Samaritan will be encoded, for good reasons I shall not go into here. 
And in due course, it will be possible to discuss what remains.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: [hebrew] Re: Ancient Northwest Semitic Script

2003-12-27 Thread Michael Everson
At 13:36 -0500 2003-12-27, John Cowan wrote:
Michael Everson scripsit:

I remain convinced, however, that suggestion that Phoenician be 
unified with Hebrew and Phoenician is ridiculous in the extreme, 
and I will oppose it absolutely. Likewise, it is clear that 
Samaritan is also not to be unified with Hebrew.
There's clearly a slip here: the second occurrence of "Phoenician" must
mean something else, and I can't figure out what.  However, it is not
so clear to me that Phoenician and palaeo-Hebrew (and a fortiori
Samaritan) should not be unified.
Sorry.

I remain convinced, however, that suggestion that Phoenician be 
unified with Hebrew is ridiculous in the extreme, and I will oppose 
it absolutely. Likewise, it is clear that Samaritan is also not to be 
unified with Hebrew.

Currently we do think that Phoenician and Palaeo-Hebrew should be 
unified. Samaritan on the other hand is a later development of that 
line, which had the good fortune of taking on typographic 
regularization and development; it has interesting and unique 
features with regard to vowel representation, and a modern community 
of users; it is best disunified from Phoenician.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: [hebrew] Re: Aramaic unification and information retrieval

2003-12-27 Thread Michael Everson
At 14:44 -0800 2003-12-27, Peter Kirk wrote:

Doug, thanks for making this new point re ancient Semitic scripts. 
Fundamental identity of the characters is a strong reason for 
unifying these scripts as well as Han scripts. As I wrote a few days 
ago, ALEF is ALEF is ALEF is ALEF, whatever glyph shapes are used.
And ALPHA and A are just the same.

We disunified Nuskhuri from Mkhedruli, and familiarity and legibility 
were indeed criteria for the disunification. Mark Shoulson has just 
given his expert testimony that, one-to-one relation to the Semitic 
repertoire or not, Samaritan needs to be considered different from 
Hebrew. I'd say he'd probably feel the same about the older 
Phoenician as well.

I will say it again: You and every Semiticist specialist on the face 
of the earth can encode every Phoenician document transliterated into 
Hebrew script in your databases and never even look at an eventually 
encoded Phoenician script. That usage still doesn't mean that the 
Phoenician script is a glyph variant of square Hebrew even if they 
share a repertoire.

Even in antiquity these scripts were used distinctively in a number 
of instances, which will be discussed in the proposal documents in 
due course.

Scripts develop, and differentiate. The nodes of Semitic which we 
will encode have not all been investigated, but, like Indic, it makes 
sense to encode more than one of them. I believe that the distinction 
between Phoenician and Square Hebrew should be maintained in plain 
text; font markup is not sufficient.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: German 0364 COMBINING LATIN SMALL LETTER E

2003-12-28 Thread Michael Everson
Both s and long s are available for use if anyone wants to use them. 
What's the problem?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: [hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)

2003-12-29 Thread Michael Everson
At 06:40 -0800 2003-12-29, Elaine Keown wrote:

Michael Everson wrote:
 > And the mother of those scripts is Phoenician. She is *not* Hebrew.
The mother script is probably the southern Sinai or Wadi el-Hol 
script, written in about 1,700 B.C.E. by Aramaeans who worked either 
in the copper mines of the southern Sinai or were mercenaries in an 
Egyptian army in the Western Desert.
That would be the grandmother. :-)

I also think that your attitude is that of a Hellenist or 
Indo-Europeanist, who looks at everything from the perspective of 
Athens.
Think what you like.

Semitics is "Praeparatio Hellenika"--its other aspects are less important, and
hence not to be emphasized in computerization or anything else.
I cannot make sense of this at all.

Not all roads lead to Athens, Michael Everson--some of them go elsewhere
What the bejeesus are you on about, Elaine?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: [hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)

2003-12-29 Thread Michael Everson
At 06:55 -0800 2003-12-29, Peter Kirk wrote:

Yes, this is true at least of Azerbaijani, which mapped Cyrillic 
glyphs to Latin ones one-to-one. But with Serbo-Croat we are talking 
of two separate communities which prefer to use separate scripts for 
what is essentially the same language; and with Azerbaijani we are 
talking of a deliberate decision by a people, or at least its 
government, to change scripts.
In Sanhedrin and Mishnaic text deliberate distinction is made between 
Samaritan and Square Hebrew, as will be demonstrated in the Samaritan 
proposal.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Name Mixup Behind Air France Groundings

2004-01-02 Thread Michael Everson
At 10:08 -0800 2004-01-02, Joe Becker wrote:

French police officials, speaking on condition of anonymity, said 
errors in spelling and transcription of Arabic names played a role 
in the mix-up.
Figures, doesn't it?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-02 Thread Michael Everson
At 12:19 -0800 2004-01-02, D. Starner wrote:
I'm working with Distributed Proofreaders to produce some minimal
Unicode character selectors. Right now I'm working on the Latin
character selectors. Since we solely provide material for Project
Gutenberg, we usually only deal with characters pre-1923. After
stripping composable accents, which characters in the Latin blocks
only appeared after that date? Can I assume that both the Pan-Turkic
Latin orthography and the Pan-Nigerian alphabet postdate that?
No, you can't make assumptions like that.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-02 Thread Michael Everson
At 14:54 -0800 2004-01-02, Peter Kirk wrote:
On 02/01/2004 12:19, D. Starner wrote:

I'm working with Distributed Proofreaders to produce some minimal
Unicode character selectors. Right now I'm working on the Latin
character selectors. Since we solely provide material for Project
Gutenberg, we usually only deal with characters pre-1923. After 
stripping composable accents, which characters in the Latin blocks
only appeared after that date? Can I assume that both the Pan-Turkic
Latin orthography and the Pan-Nigerian alphabet postdate that?

You are probably safe with the Pan-Turkic Latin alphabet. It seems 
that this was adopted following the First Turkology Congress, held in 
Baku in 1926, see 
http://www.azer.com/aiweb/categories/magazine/81_folder/81_articles/81_turkology_congress.html.
You will find Turkic letters in that alphabet which predate that congress.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 16:42 -0800 2004-01-02, D. Starner wrote:
 > > Can I assume that both the Pan-Turkic
 >Latin orthography and the Pan-Nigerian alphabet postdate that?

 No, you can't make assumptions like that.
Yes, I can. And I will if I have to.
Your question was an historical one.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Caucasian Albanian Alphabet: Ancient Script Discovered in the Ashes

2004-01-03 Thread Michael Everson
At 15:47 -0800 2004-01-02, Peter Kirk wrote:
I have found a new script which may need to be encoded in Unicode. 
Well, I haven't found it myself, Zaza Alexidze has done that. I was 
previously aware of this Caucasian Albanian script, but I have only 
just found out that for the first time an extensive document - 300 
pages of a lectionary, dating probably from the 5th century CE - has 
been found written in this alphabet, and in an ancient form of the 
Udi language. It seems to be a truly separate alphabet, although 
distantly related to Georgian and Armenian.
Does it? The links you gave were a bit less than conclusive in that regard.

But it is not even roadmapped for Unicode.
Must you use such rhetoric?

It wasn't roadmapped because we had no comprehensive information on 
it. Now we have more information, which is excellent.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 16:56 -0800 2004-01-02, D. Starner wrote:
 > Not safe unless you *know* exactly when a character was invented.

Not safe for what? I've come across six characters that weren't in 
Unicode at all.
What are they?

Your assumption wasn't safe given your question.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 09:03 -0800 2004-01-03, Peter Kirk wrote:

In fact it should be considered a variant of g.
Or q.

The representative glyph for this character seems to be good.
It is. We went to a lot of trouble getting it that way too.

But, given that the name is so misleading but cannot be changed, it 
is good that there is a note "= gha" in the Unicode character charts.

But in the light of naming errors like this one implementers should 
be advised not to use character names, because they are not reliably 
helpful.
I wouldn't say that. It would be better to advise them, as we do, that 
they cannot rely on the names being perfect. That's different from 
not using them at all.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
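Python's `unicodedata` shows the split between a fixed formal name and a helpful annotation. Later versions of the standard, after this thread, added a formal name alias, "LATIN SMALL LETTER GHA", for U+01A3. `unicodedata.name()` still returns the unchangeable formal name, while `unicodedata.lookup()` also accepts aliases (Python 3.3 or later). A minimal sketch:

```python
import unicodedata

GHA_SMALL = "\u01a3"

# The formal name is frozen by the name stability policy, misleading or not.
formal_name = unicodedata.name(GHA_SMALL)

# lookup() also resolves formal name aliases (Python 3.3+), so the corrected
# name finds the same character; name() never returns the alias.
found = unicodedata.lookup("LATIN SMALL LETTER GHA")

print(f"U+{ord(GHA_SMALL):04X}", formal_name)
print(found == GHA_SMALL)
```

In other words, code can use names as identifiers, but should not infer what a character is from the words in its name.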



Re: Caucasian Albanian Alphabet: Ancient Script Discovered in the Ashes

2004-01-03 Thread Michael Everson
It looks a lot like what has been called the "Agvan alphabet". See 
http://www.evertype.com/alphabets/Agvan.jpg
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)

2004-01-03 Thread Michael Everson
Philippe said:

In Unicode, the glyphs are normative in a way that they allow 
character identification, but they are not mandatory, so they are 
mostly informative.
This is not true, Philippe. In fact, it is so dreadfully and 
misleadingly untrue that all I can suggest is that you go back to 
page one of the Unicode Standard and start over.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 11:15 -0800 2004-01-03, Michael \(michka\) Kaplan wrote:

It makes me wish we had a CouldaWouldaShoulda_CharacterName property that
contains what the name ought to be, and we document this as one that *will*
change any time there is a mistake made in the original character name. We
just make a nice informative property and go through all of our known
mistakes and the maintenance after the initial pass should be minimal
I am sure that eventually such a thing will be implemented. But it 
would be too early to do it now, I think. Things are still too 
volatile.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)

2004-01-03 Thread Michael Everson
At 21:50 +0100 2004-01-03, Philippe Verdy wrote privately to me:

From: "Michael Everson" <[EMAIL PROTECTED]>
 > Philippe said:
 >
 >In Unicode, the glyphs are normative in a way that they allow
 >character identification, but they are not mandatory, so they are
 >mostly informative.
 This is not true, Philippe. In fact, it is so dreadfully and
 misleadingly untrue that all I can suggest is that you go back to
 page one of the Unicode Standard and start over.
I have read it. Glyphs are just normative as a way to demonstrate a 
valid representation of the encoded code point, so that any other 
acceptable glyph should be unambiguously identified as the same 
character. So these glyphs are normative but not mandatory. Is that 
a more acceptable formulation?
NO, IT IS NOT.

Is that clear enough for you?

You are spreading MISINFORMATION about Unicode, and this is 
reprehensible. Particularly when people give you, time and again, 
accurate information.

The glyphs are not normative.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 22:37 +0100 2004-01-03, Philippe Verdy wrote:

Note that a fundamental property of character identity is its most common
classification as a vowel, consonant, or semi-vowel.
That isn't true. The letter "v" is a vowel in Cherokee, a consonant 
in Czech, and (often) a semivowel in Danish.

Please stop talking as though you are a Unicode authority, Philippe. 
You are an enthusiastic beginner. There is nothing wrong with that. 
Good luck with your studies. As I said once before, if you do your 
homework you could well be as valuable a participant to our work as 
Doug Ewell is. But for now, your pretense at expertise just makes a 
lot of people annoyed with you.

Perhaps Patrick Andries' French translation of the text of the 
standard will be of assistance to you.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
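Everson's point can be illustrated with the properties Python's `unicodedata` module exposes for "v": a name, a General_Category and a bidirectional class. None of them says anything about vowel or consonant, because that depends on the orthography, not on the character. A small sketch with a hypothetical helper:

```python
import unicodedata


def char_properties(ch):
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # General_Category, e.g. Ll
        "bidi": unicodedata.bidirectional(ch),
    }


# "a" and "v" get identical kinds of properties; nothing marks vowel vs consonant.
for letter in ("a", "v"):
    print(letter, char_properties(letter))
```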



Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)

2004-01-03 Thread Michael Everson
At 23:23 +0100 2004-01-03, Philippe Verdy wrote:
From: "Michael Everson" <[EMAIL PROTECTED]>
 > The glyphs are not normative.
But if you want to insist more with your position, why not simply dropping
completely all glyphs from the Unicode standard?
Because they are informative.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 00:00 +0100 2004-01-04, Philippe Verdy wrote:
From: "Michael Everson" <[EMAIL PROTECTED]>
 At 22:37 +0100 2004-01-03, Philippe Verdy wrote:

 >Note that a fundamental property of character identity is its most common
 >classification as a vowel, consonant, or semi-vowel.
 That isn't true. The letter "v" is a vowel in Cherokee, a consonant
 in Czech, and (often) a semivowel in Danish.
Also: what are you demonstrating here?
That the fundamental property of character identity of the letter "v" 
is NOT its use as a consonant, as a vowel, or as a semi-vowel.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Pre-1923 characters?

2004-01-03 Thread Michael Everson
At 23:40 +0100 2004-01-03, Philippe Verdy wrote:
From: "Michael Everson" <[EMAIL PROTECTED]>
 At 22:37 +0100 2004-01-03, Philippe Verdy wrote:

 >Note that a fundamental property of character identity is its most common
 >classification as a vowel, consonant, or semi-vowel.
 >
 That isn't true. The letter "v" is a vowel in Cherokee, a consonant
 in Czech, and (often) a semivowel in Danish.
Stop arguing against each of my words. And READ: I said "most common"
on purpose above. Once again you are voluntarily interpreting things that I
did not say just to find a way to contradict me.
No, I am not. "Vowel", "consonant", or "semi-vowel" is not a 
"fundamental property of character identity", and as I have shown, 
any given letter can have any number of these values. Which is why 
these "properties" are not "fundamental" to "character identity".

I feel now that you have your own reading of the Unicode standard.
I am sure that many will agree with you. (I am perfectly aware that 
sometimes I am less patient than I might be, as well. That's a 
character issue, perhaps.)

But stop saying always that your position is neutral, objective.
I didn't. I said that you said something that wasn't true.

You have the right to think that the representative glyphs are not
representative at all. I think the opposite. You may not like these glyphs,
because you, as a typographic expert, would have designed them
differently.
Actually, I vetted a great many of the chart glyphs (GHA especially) 
to ensure that they were as correctly representative as possible.

I really think that you are unable to accept any words that you have 
not said yourself, and you accept no compromise and prefer a 
systematic and, once again, dogmatic positions as THE only allowed 
and omnipotent expert for all questions regarding Unicode.
I'm not omnipotent, nor do I speak for the Unicode Consortium. I'm 
just an expert. When I am dogmatic, it is (as in this case) often due 
to the fact that we have a *standard* here. You were misunderstanding 
and misusing the terms "normative" and 
"informative". That distinction *is* dogma.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



LATIN SOFT SIGN

2004-01-05 Thread Michael Everson
At 05:30 -0800 2004-01-05, Peter Kirk wrote:

It seems that we do actually need two new character pairs, this one 
and also the soft sign lookalike - unless it is considered 
acceptable to use the Cyrillic characters in Latin text cf. the use 
of Latin Q and W in Cyrillic Kurdish.
LATIN LETTER TONE SIX **is** the SOFT SIGN clone into Latin, and 
should be used for Pan-Turkic. I've suggested, but perhaps not loudly 
enough, that the reference glyph be modified to be more soft-sign 
like.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: LATIN SOFT SIGN

2004-01-05 Thread Michael Everson
At 07:27 -0800 2004-01-05, Peter Kirk wrote:

If we are talking about U+0184/0185 (an inexact character name is 
not much help), yes, that is a sensible match, but in that case we 
need a note cf. for 01A3 that these are for Pan-Turkic Latin 
alphabets, and not just for Zhuang tones as the existing note 
suggests.
I know.

Also, the reference glyphs seem to have an attachment on their left 
sides, more than a normal serif, which is confusing and makes them 
look as much like a Cyrillic hard sign as a soft sign. A soft sign 
should have symmetrical serifs, or no serif at all.
I know. It will help if we can show a Zhuang text without the weird 
serifs; I've had my eye out for a while.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
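The look-alike letters under discussion are separate code points in separate scripts. Python's `unicodedata` does not expose the Script property, so the sketch below uses a crude, hypothetical heuristic: it takes the first word of the character name. With that heuristic, a Latin text that uses the Cyrillic soft sign ends up mixing scripts. U+0185 keeps the text Latin, including under case mapping.

```python
import unicodedata

TONE_SIX_CAPITAL, TONE_SIX_SMALL = "\u0184", "\u0185"
SOFT_SIGN_CAPITAL, SOFT_SIGN_SMALL = "\u042c", "\u044c"


def crude_script(ch):
    # Heuristic only: the first word of the character name ("LATIN",
    # "CYRILLIC", ...). This is NOT the real Unicode Script property.
    return unicodedata.name(ch).split()[0]


def scripts_in(text):
    """Return the set of crude script labels of the letters in text."""
    return {crude_script(c) for c in text if c.isalpha()}


for ch in (TONE_SIX_SMALL, SOFT_SIGN_SMALL):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(scripts_in("b" + TONE_SIX_SMALL))   # Latin only
print(scripts_in("b" + SOFT_SIGN_SMALL))  # Latin and Cyrillic mixed
print(TONE_SIX_SMALL.upper() == TONE_SIX_CAPITAL)
```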



Re: unicode Digest V4 #3

2004-01-05 Thread Michael Everson
At 16:27 +0100 2004-01-05, Philippe Verdy wrote:

Why not then use the Latin tone six for all texts in that period, and allow
glyph variants to show the I with right hook glyph used in early Latin
Azeri?
Because that wouldn't be right.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: LATIN SOFT SIGN

2004-01-05 Thread Michael Everson
At 08:31 -0800 2004-01-05, Andrew C. West wrote:

LATIN LETTER TONE SIX isn't a Latin clone of the Cyrillic soft sign 
per se, but is simply a character that is based on the Cyrillic 
letter that looks most like the digit "6". It was chosen to 
represent Zhuang Tone 6 purely on the shape of the glyph (likewise 
the letters for Zhuang Tones 1-5 were chosen simply for their 
resemblance to the digits "1" through "5"), and has no relation to 
the original phonetic usage of the Cyrillic letter.
It doesn't have to have. My point would be that soft sign was 
borrowed into Latin for Tatar as well as for Zhuang, and that though 
we have encoded it for Zhuang, it should be used for old Tatar as 
well.

To modify the reference glyph to be more soft-sign like 
would simply make the reference glyph less Zhuang Tone Six-like.
Only if Zhuang never uses the ordinary soft sign glyph. I am sure I 
have seen the ordinary soft sign glyph used for Zhuang (but cannot 
remember where, so I have to discover it again). I recognize that the 
burden of proof is on me for this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: unicode Digest V4 #3

2004-01-05 Thread Michael Everson
At 19:23 +0100 2004-01-05, Philippe Verdy wrote:
From: "Michael Everson" <[EMAIL PROTECTED]>

 At 16:27 +0100 2004-01-05, Philippe Verdy wrote:

 >Why not then use the Latin tone six for all texts in that period, and
allow
 >glyph variants to show the I with right hook glyph used in early Latin
 >Azeri?
 Because that wouldn't be right.
Even if it's encoded with a variant selector after the latin tone six?
Yes, even if such odious pseudo-coding were employed.

As this is an historic variant of the letter which was then changed to Latin
soft-sign during the first Latin period, I think it would allow "unifying"
Azeri texts coded in Latin in 1923-1933 and in 1933-1939.
It is NOT a variant of the soft sign. It is a variant of the letter i.

Were there other uses of this i with lower-right hook in other languages or
regions ?
Yes.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)

2004-01-05 Thread Michael Everson
Well, James, I think it would be A LOT better if we got some actual 
documents from Zhuangland.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Latin letter GHA or Latin letter IO ?

2004-01-05 Thread Michael Everson
At 14:37 -0800 2004-01-05, Peter Kirk wrote:

As you will see, I have requested precisely this clarification for 
U+0184/0185, to clarify that this letter is used in pan-Turkic 
alphabets as well as in Zhuang. I am also asking for a change in the 
reference glyph for U+0185, because in both Zhuang and pan-Turkic 
this should be much shorter, and distinguished from "b" primarily by 
its size.
In Pan-Turkic, though, it looks just like CYRILLIC SOFT SIGN in all 
the sources I have seen. For lots of languages.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Latin letter GHA or Latin letter IO ?

2004-01-05 Thread Michael Everson
At 15:51 -0800 2004-01-05, Peter Kirk wrote:

In Pan-Turkic, though, it looks just like CYRILLIC SOFT SIGN in all 
the sources I have seen. For lots of languages.
Precisely. I meant that the glyph must be clearly distinct from 
U+0062, and so should be identical to U+044C. The Pan-Turkic glyph 
was probably really identical to the soft sign because printers 
would have used the same type wherever possible.
We agree! We agree! We agree!
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


New document - N2694

2004-01-05 Thread Michael Everson
N2694 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2694
Proposal to encode two Bhutanese marks for Dzongkha in the UCS
Michael Everson and Chris Fynn
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Latin letter GHA or Latin letter IO ?

2004-01-05 Thread Michael Everson
At 02:00 +0100 2004-01-06, Philippe Verdy wrote:
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
 When the combination of character name and representative
 glyph and associated informative annotations is insufficient
 to correctly identify a character in the standard, the
 recourse is to Ask the Experts and request further annotation
 of the standard to assist future users from running into the
 same problem.
Thanks for your view on this issue. It is far less extreme than the Michael
position, which just consists in saying "informative" without more
justification, when you clearly admit that they are also mandatory.
Ken and I hold the same view and have the same position. Things may 
be mandatory and informative, or they may be mandatory and normative.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: New MS Mac Office and Unicode?

2004-01-06 Thread Michael Everson
At 12:48 -0700 2004-01-06, Tom Gewecke wrote:
MS Mac Office 2004 was announced at MacWorld SF today.  Does anyone know
whether this update finally brings the Unicode capabilities of the WinXP
version to the Mac OS X world?
It would be really wonderful news if it were to do so.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New MS Mac Office and Unicode?

2004-01-14 Thread Michael Everson
At 09:33 -0600 2004-01-14, David Perry wrote:

I am delighted to see a Unicode-native version of Office come out at 
long last; it lays the foundation for future developments.
Hear, hear.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New MS Mac Office and Unicode?

2004-01-14 Thread Michael Everson
At 09:11 -0800 2004-01-14, Peter Kirk wrote:

It strikes me that some people are reading the announcement as if it 
is what they want to hear, rather than what it actually says.
It strikes me that some people are wanting to see it as not being 
good enough because it's not as complete as they want. My view: Input 
beyond WorldScript? Huzzah!

If this was the great step forward that everyone wants, surely 
Microsoft would be telling everyone loud and clear.
No company tells all before release.

They are of course saying that their new product is wonderful and 
what everyone is waiting for (who wouldn't?), but when you read the 
small print they are promising rather little, certainly not full 
Unicode support, not even full support for non-complex scripts.
If they are permitting input via the "Unicode Hex Input" and "US 
Extended", then presumably it will allow input via the "Irish 
Extended" and "Devanagari-QWERTY" and "Arabic-QWERTY" keyboards. If 
it doesn't, there is something WRONG. If it does, and there are 
display issues regarding *rendering* of Devanagari or Arabic, that is 
a DIFFERENT issue, which Microsoft will address in due course, one 
expects.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Cuneiform - Dynamic vs. Static

2004-01-14 Thread Michael Everson
It is not useful to continue this thread on both the Unicode and the 
Cuneiform lists.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Samaritan shan

2004-01-15 Thread Michael Everson
At 22:10 -0800 2004-01-14, Peter Constable wrote:

 > >Now, that said, I am very keen to have the Samaritan shin encoded,
 >because this is used as a mark in the apparatus critici of the BHS
 >and possibly other Bible editions (in BHS it is used in citations of
 >Pentateuchi textus Hebraeo-Samaritanus secundum). I'd be perfectly
 >happy to see it encoded as a Letterlike Symbol, since it is being
 >used as a symbol and not as a Samaritan letter.
 Perhaps it must be in any case, due to directionality issues.
Apparently nobody noticed that I submitted a proposal for this thing
last year, the response to which was that it should be left until all of
Samaritan is encoded.
We did notice, when we started working on Samaritan. Nobody thought 
about the directionality issue at the time. D'oh!
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
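The directionality problem follows from the Bidi_Class property. Samaritan was eventually encoded in Unicode 5.2, some years after this thread. Its letters, like Hebrew letters, are strong right-to-left (R). The Latin letters and digits they sit beside in an apparatus criticus are L and EN. A minimal check with Python's `unicodedata` (it needs a Python whose Unicode data includes Samaritan):

```python
import unicodedata


def bidi_class(char_name):
    """Bidi_Class of the character with the given Unicode name."""
    return unicodedata.bidirectional(unicodedata.lookup(char_name))


for name in ("SAMARITAN LETTER SHAN", "HEBREW LETTER SHIN",
             "LATIN SMALL LETTER A", "DIGIT ONE"):
    print(f"{name}: {bidi_class(name)}")
```

A strong R character inside a left-to-right citation can cause adjacent digits and punctuation to be reordered on display. That is why the directionality of a symbol form matters.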



Re: Cuneiform - Dynamic vs. Static

2004-01-15 Thread Michael Everson
At 22:32 -0500 2004-01-14, Dean Snyder wrote:

I'm still hoping for even more technical feedback from the Unicode
community on this issue. I would like to be convinced that the dynamic
model is a bad idea.
Whether you are or are not convinced, it certainly is a bad idea. 
That's why we have been preparing proposals based on the static, 
sign-based model. Ken Whistler has gone to the trouble of rehearsing 
the refutation of all of the points on the Cuneiform list.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Klingon

2004-01-15 Thread Michael Everson
At 11:15 +0100 2004-01-15, Chris Jacobs wrote:

 > I had a problem with this too, for a while (previous discussion on this
 list helped clear it up).  Klingon letters had been placed in the PUA by
 the CSUR (ConsScript Unicode Registry, an unofficial allocation of PUA
 space to constructed alphabets),
Really?

And did the Klingon Language Institute endorse that?
Yes. See http://www.evertype.com/standards/csur/klingon.html
The original encoding was made for some Linux implementation in 1995 
or 1996 I suppose.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
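CSUR allocations live in the Private Use Area. The Unicode Character Database therefore gives those code points no names and only the General_Category Co (private use). The range below, U+F8D0..U+F8FF, is assumed here to be the CSUR Klingon allocation; verify it against the CSUR page before relying on it. A minimal sketch with hypothetical helper names:

```python
import unicodedata

# Assumed CSUR Klingon allocation (verify against the CSUR page).
CSUR_KLINGON = range(0xF8D0, 0xF900)


def is_private_use(ch):
    """True if the character has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def in_csur_klingon(ch):
    """True if the code point falls inside the assumed CSUR Klingon range."""
    return ord(ch) in CSUR_KLINGON


sample = chr(0xF8D0)
print(is_private_use(sample), in_csur_klingon(sample))
# Private-use code points have no character name in the UCD.
print(unicodedata.name(sample, "<no name>"))
```

Software can interpret such characters only by private agreement, for example by installing a font that follows the CSUR.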



OT, utterly OT

2004-01-15 Thread Michael Everson
Anyone know how I can read a .mdb file? Please respond to me directly 
and not on the list.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Klingon

2004-01-15 Thread Michael Everson
At 14:53 +0100 2004-01-15, Chris Jacobs wrote:
WHY THEN DISTRIBUTES THE KLI SUCH A BLATANTLY UNCONFORMANT FONT?
yIjachQo'. vItlhob.

{{{:-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Klingon

2004-01-15 Thread Michael Everson
At 18:06 +0100 2004-01-15, Philippe Verdy wrote:
From: <[EMAIL PROTECTED]>
 > Michael Everson scripsit:
 > >
 > > yIjachQo'. vItlhob.
 >
 Demonstrating once again that the One True Script for Klingon is Latin.
Not really: look at how uppercase letters are used: case mapping, which is
quite safe in languages written with the Latin script, completely breaks the
Klingon text...
Michael did not write: "Yijachqo'. Vitlhob."
Many Latin-script languages write capital letters 
in non-initial positions. Irish does quite 
regularly: "an tSín" 'China'. Breton does 
sometimes. It is common in transliterations of 
Tibetan.

Of course, Philippe seems to be suggesting that 
the One True Script for Klingon is *not* Latin, 
because he thinks that yIjachQo' is not Latin, 
while Yijachqo' is. Which is, well, incredible.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
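The claim that "case mapping breaks the text" can be made concrete. In the Latin orthography used above, capitals are not positional: q and Q are different letters, and the vowel I is always written as a capital. Naive lowercasing therefore changes letters, not just case. The Irish example shows the same property in a long-established orthography. A small sketch with a hypothetical helper:

```python
def letters_changed_by_lower(text):
    """List the characters that str.lower() would alter."""
    return [c for c in text if c.lower() != c]


klingon = "yIjachQo'. vItlhob."
irish = "an tSín"

print(klingon.lower())  # q and Q are no longer distinguishable
print(letters_changed_by_lower(klingon))
print(letters_changed_by_lower(irish))
```

Whether this argues for or against Latin as the script for Klingon is exactly what is disputed here. The code only shows that such text must not be case-folded blindly.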



Re: Klingon

2004-01-15 Thread Michael Everson
At 18:50 +0100 2004-01-15, Philippe Verdy wrote:

My remark is still valid: Klingon is not Latin, even if there's a 
font that tries to represent Latin letters by creating Latin digraph 
ligatures into Klingon letters that break the conformance 
requirement for Latin letters.
Oh, stop, stop, stop, stop, stop.

I wrote some words in Klingon. I wrote them in the Latin script. John 
observed that this was more proof that Klingon was conventionally 
written in the Latin script, which it is. It is not conventionally 
written in the pIqaD. That's why pIqaD has not been encoded in the 
Unicode Standard. Enthusiasts use it decoratively; that's why it was 
given a CSUR encoding.

It's embarrassing to see someone going to such lengths to show what 
an expert he is about this when he is just utterly wrong.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Klingon

2004-01-15 Thread Michael Everson
At 19:16 +0100 2004-01-15, Philippe Verdy wrote:

 > Many Latin-script languages write capital letters
 > in non-initial positions. Irish does quite
 > regularly: "an tSín" 'China'. Breton does
 sometimes. It is common in transliterations of
 Tibetan.
I admit this exists, I don't think it's a good idea to use such weak
conventions, which are justified only by the fact that one is technically
constrained to use a restricted subset of Latin. If people could use more
distinctive letters in Latin, such caveats would be avoided.
Well, golly. I guess we're not going to change 
1,000 years of orthographic practice because it 
fails to meet your r

For Breton, I don't agree with you.
Do you not? The practice is rare, but is 
sometimes used in placenames, as for instance in 
the written form "Inis gWenva". (Gosh, look. A fact.)

Words starting with the trigraph letter c'h are rare in Breton
Like the pronoun "c'hwi" 'you' or the numeral 
"c'hwec'h" 'six'. (Wow. Another fact.)

but even in that case, I see NO use of such 
"abuse" of Latin letter case other than a way to 
represent a missing diacritic or a missing 
letter.
Look again.

The presence of case distinctions as meaning strong primary letter
distinctions in these conventions just denotes a missing diacritic or
separate letter for the Latin transliteration...This is still a (very poor)
transliteration system, with its imperfections, and as with other
transliteration systems, it breaks the initial script design and semantic
structure and is a clear sign that this is a plain separate script (as it
was the intent of Tolkien when he created the script).
Heaven help us.

Of course, the original orthography for Klingon 
was Latin, as published in 1985 in Marc Okrand's 
Klingon Dictionary.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Klingon

2004-01-15 Thread Michael Everson
At 19:28 +0100 2004-01-15, Philippe Verdy wrote:

Even in the case of Irish, the uppercase "S" denotes a distinct variant
of "s", which should better be noted with some diacritic, such as a hacek or
cedilla... Imagine what happens when reading uppercased Irish book titles
and the confusion it produces?
Yes. Imagine.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
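In Irish capitalization practice, a lowercase mutation prefix stays lowercase when the rest of the word is set in capitals. POBLACHT NA hÉIREANN on Irish stamps is one example. A naive `upper()` therefore gives the wrong result. The following is a simplified, hypothetical sketch. It handles only words whose prefix is already written as lowercase letters immediately followed by a capital (tSín, hÉireann, nAthair). Hyphenated forms such as t-athair, and prefixes before lowercase letters, are not handled.

```python
import re


def irish_upper(text):
    """Uppercase text, keeping a lowercase prefix before a capital (e.g. tSín)."""
    parts = re.split(r"(\s+)", text)  # keep the whitespace runs
    out = []
    for word in parts:
        # Index of the first uppercase character, or -1 if there is none.
        first_cap = next((i for i, c in enumerate(word) if c.isupper()), -1)
        prefix = word[:first_cap]
        if first_cap > 0 and prefix.isalpha() and prefix.islower():
            out.append(prefix + word[first_cap:].upper())
        else:
            out.append(word.upper())
    return "".join(out)


print("an tSín".upper())       # naive: AN TSÍN
print(irish_upper("an tSín"))  # AN tSÍN
print(irish_upper("Poblacht na hÉireann"))
```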


Re: Samaritan shan symbol

2004-01-15 Thread Michael Everson
At 12:19 -0800 2004-01-15, John Hudson wrote:

Do you know if the directionality issue was considered at that time.
No, we didn't consider it at the time. We dropped the ball on that one.

I sent Michael a number of scans of the Samaritan shin in use as a 
symbol in BHS apparatus critici, including in use in direct 
proximity with LTR letters, numbers and other symbols.
I have that, yes.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Breton

2004-01-15 Thread Michael Everson
At 22:20 +0100 2004-01-15, Philippe Verdy wrote:

Look at this page to find why this happens:
http://www.kervarker.org/fr/grammar_01_kemmadur.html
Perhaps I won't. I know about Breton mutation. See 
http://www.evertype.com/gram/bg.html

By "rare" I mean words without mutation of the leading consonant.
The same number above would be "kwec'h" without the mutation...
This is incorrect. *kwec'h does not occur; neither does *kwi. In 
fact, no words in kw- occur.

Typical Breton dictionaries will list the word only at K, and not at C'H
This is incorrect. For instance the 1200-page monolingual Breton 
dictionary published in 1995 gives them under C'H.

(in fact the prefered Breton sorting order generally orders C'H 
between K and L, and GW between W and X).
This is incorrect. Alphabetical order is
A B C CH C'H D E F G H I J K L M N O P R S T U V W Y Z
X does not occur. GW is not a letter of its own.

The old alphabetical order was
A B K D E F G H CH C'H I Y J L M N O P R S T U V W Z
Sometimes, as in Kervella's _Yezhadur bras ar brezhoneg_, GW was 
separated out between G and H (where it would fall anyway).

--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Klingon

2004-01-15 Thread Michael Everson
At 22:11 +0100 2004-01-15, Philippe Verdy wrote:

The comment from Michael about the occurence of "gW" in Breton was wrong:
I said I had seen it in print, which was true, and I said that it was 
rare, which is also true. It is not standard.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


