Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Simon Montagu
Michael Everson wrote:
A new contribution.
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf
N2755
Proposal to add QAMATS QATAN to the BMP of the UCS
Michael Everson & Mark Shoulson

Nice.
 8a. Can any of the proposed characters be considered a presentation
 form of an existing character or character sequence?
 No.
Is this overstating the case? As Mark said on the Hebrew list a little 
while ago:

 Things like the Simanim Tehillim and the Simanim Tiqqun are almost a
 poster-case of fancy text.  Their very selling point is that they are
 clearer and make more distinctions than plain printing.  It's when
 such conventions enter the mainstream (and there's obviously a
 continuum in that regard, and room for disagreement) that we start to
 consider them plaintext distinctions and thus to be encoded
 separately.
I think it would be good if the proposal anticipated the objection that 
qamats qatan could be considered as a presentation form or glyph 
variation of qamats and provided the counter-arguments. (Or would 
answering Yes to 8a just guarantee rejection?)

<flippancy>Isn't it a little strange that a short qamats should be 
represented with a longer vertical than a regular qamats?</flippancy>




Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
No Georgian can read Nuskhuri without a key. I maintain that no Hebrew 
reader can read Phoenician without a key. I maintain that it is 
completely unacceptable to represent Yiddish text in a Phoenician font 
and have anyone recognize it at all.
But no one is going to do that. No one is talking about doing that. This is a complete 
irrelevancy.

JH
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Nice to join this forum....

2004-05-04 Thread John Hudson
Michael Everson wrote:
This is no different from Welsh:
A B C CH D DD E F FF G NG
All of those are considered letters in the Welsh alphabet. They are 
all significant. But that doesn't mean that ch and dd get encoded 
as single entities. They write c + h and d + d.

In Yoruba, you treat gb as a letter. That is fine. But you encode it 
with g + b.
Isn't there something in the FAQ about this? We've been through the discussion of digraph 
(and trigraph and tetragraph) encoding several times, and generally confusion stems from 
not understanding that higher level protocols are expected to handle rendering and things 
like sorting and spellchecking.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
  Hebrew has the same 22 characters, with the same character properties.
And a baroque set of additional marks and signs, none of which apply to 
any of the Phoenician letterforms, EVER, in the history of typography, 
reading, and literature.
And a baroque set of additional marks and signs, none of which apply to any of the STAM 
letterforms...

I'm not arguing against the 'Phoenician' proposal: I just don't find many of these 
arguments very convincing. The fact that one style of lettering sometimes has combining 
marks applied and another doesn't does not seem a compelling reason not to unify them.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
If you people, after all of this discussion, can think that it is 
possible to print a newspaper article in Hebrew language or Yiddish in 
Phoenician letters, then all I can say is that understanding of the 
fundamentals of script identity is at an all-time low. I'm really 
surprised.
I can't believe anyone is even talking about typesetting newspapers in Hebrew or 
'Phoenician' letters: this is a total irrelevancy. I wouldn't typeset a Russian newspaper 
in 'vyaz' style letters, either, but that doesn't make it a separate script from Cyrillic. 
Treating particular letterforms as glyph variants of existing characters does not imply 
that these letterforms are suitable for any text that might be encoded with those 
characters. So far as I can tell, no one is arguing such nonsense.

The issue is not whether Palaeo-Hebrew letterforms are readable by modern Jews, or whether 
they may be used in religious texts -- and I note that you are not suggesting that STAM 
should be separately encoded, even though it is the *only* style approved for use in Torah 
scrolls -- the issue is how ancient texts should be encoded.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread John Hudson
Mark Davis wrote:
The question for me is whether the scholarly representations of the Phoenician
would vary enough that in order to represent the palaeo-Hebrew (or the other
language/period variants), one would need to have font differences anyway. If so,
then it doesn't buy much to encode separately from Hebrew. If not, then it would
be reasonable to separate them.
Given the sophistication of today's font technology, I don't think the encoding question 
can be addressed in this way. Regardless of whether 'Phoenician' letterforms are 
separately encoded, it is perfectly easy to include glyphs for these and for typical 
Hebrew square script (or any of a number of other different Hebrew script styles) in a 
single font. If the 'Phoenician' forms are not separately encoded, they can still be 
accessed as glyph variants using a variety of different mechanisms. The question is 
whether the distinction is necessary in plain text.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: CJK(B) and IE6

2004-05-04 Thread Andrew C. West
On Sun, 2 May 2004 12:14:29 -0700, Doug Ewell wrote:
 
 jameskass at att dot net wrote:
 
  The BabelPad editor can easily convert between UTF-8 and NCRs...
 
 As can SC UniPad.

For $199 (unless you're only interested in editing files up to 1,000 characters
in length).

Andrew



Re: Nice to join this forum....

2004-05-04 Thread Philippe Verdy
From: John Hudson [EMAIL PROTECTED]
 Philippe Verdy wrote:

  I thought about missing African letters like barred-R, barred-W, etc... with
  combining overlay diacritics (whose usage has been strongly discouraged
  within Unicode).

  Maybe a font could handle these combinations gracefully with custom glyph
  substitution rules similar to the automatic detection of ligatures. But maybe
  they should not, if Unicode will in the end encode these characters separately
  without any canonical equivalence with the composed sequence.

 Having spent weeks researching African orthographies a few years ago, I'm
 inclined to think that such barred letters should be separately encoded: they
 constitute new Latin letters, not combinations of elements within
 orthographies such as base letters and combining marks.

 A problem, however, is that many such forms are found in unstable
 orthographies, and are difficult to document adequately for inclusion in
 proposals.

This last argument should not be a limitation on encoding them. After all, they
are used for living languages in danger of extinction, and even if documents
using them are rare, encoding them would help preserve these languages and
support the development of their literacy.

Without them, the instability of orthographies will always be a problem,
favored by the absence of a standard to represent them adequately in any
encoding or charset, so that even book publishers and authors will need to use
their own approximations or unstable private conventions to represent them.

The case of Berber (in Latin script) is significant, if you just look at the
number of resources on the web that use various conventions to represent its
alphabet (some hacks use symbols like '$', underscores, middle dots,
non-combining diacritics, Greek letters...).

Today, a stable encoding for missing letters is the first condition for
stabilizing orthographies, a required first step towards developing the
educational content needed to improve literacy in the corresponding languages.
This is really needed because electronic texts are the most cost-effective way
to create and publish texts. Other historical mechanical solutions cost too
much, and they won't be used before a sustainable practice of electronically
composed publication has developed.

For many languages using the Latin script, only a very limited number of
specific letters are needed. Encoding and documenting them will help foundries
improve their standard electronic fonts to include the few glyphs that are
needed. I do think that non-governmental educational organizations present in
Africa to help improve literacy would find a greater audience if they could
finance the production of educational documents in the native languages, and
not only in a few official languages (most often French, English and Arabic in
Africa) that are still foreign to local populations, who feel that these
languages are the languages of the empowered government.

Also, the cultural division between local populations does not help improve
peace in these often troubled regions, and the promotion of culture is
certainly one of the means to give back some power, pride and freedom to these
populations, as a factor for peaceful coexistence and development.




Re:CJK(B) and IE6

2004-05-04 Thread Raymond Mercier
[Earlier posting lost, it seems.]

James Kass writes:
 The lack of support for supplementary characters expressed in UTF-8
 in the Internet Explorer is a bug.  As Philippe Verdy mentions, the
 Mozilla browser does not have this same bug.  Also it should be
 noted that the Opera browser handles non-BMP UTF-8 just fine.

As I said in my starting message, Mozilla copes with everything, both UTF8
and NCR, over the whole CJK range.
However, Opera (in my experience) cannot do Ext B in either UTF8 or NCR.
IE6 cannot cope with Ext A in UTF8, but will do so in NCR.
I attach two short files (produced by Hanfind) that include both extensions,
one in UTF8 and the other NCR (except that characters given within the text
are all NCR).


 While working with NCRs may be an ugly nightmare, there are some shortcuts.

BabelPad is great, but it chokes in converting all the UTF8 in unihan.txt to
NCR at one go. I wrote a dedicated program to do that.
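
A minimal sketch of that kind of conversion (Python 3, shown purely as an
illustration -- this is not BabelPad's or Hanfind's actual code, and the file
names are placeholders): every character above U+007F is rewritten as a
hexadecimal numeric character reference.

def to_ncr(text):
    # Replace each non-ASCII character (including supplementary-plane
    # Extension B characters) with an &#x...; reference.
    return "".join(
        c if ord(c) <= 0x7F else "&#x{:X};".format(ord(c))
        for c in text
    )

with open("unihan-sample.txt", encoding="utf-8") as src, \
     open("unihan-sample-ncr.txt", "w", encoding="ascii") as dst:
    dst.write(to_ncr(src.read()))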


 I *think* that Windows 2000 uses Unicode always internally and uses an
 internal conversion chart if material is non-Unicode like GB-18030.

That at least is declared at http://www.i18nguy.com/surrogates.html.

Raymond Mercier
Title: Definition Search
[Attachment content omitted: Hanfind output listing CJK Extension A and B code
points with their dictionary definitions.]

Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Philippe Verdy
From: Dean Snyder [EMAIL PROTECTED]
 Patrick Andries wrote at 8:55 AM on Monday, May 3, 2004:

 I got this answer from a forum dedicated to Ancient Hebrew :
 
 « Very few people can read let alone recognize the paleo Hebrew font.
 Most modern Hebrew readers are not even aware that Hebrew was once
 written in the paleo Hebrew script.

 The same could be said for archaic Greek versus modern Greek - do you
 propose to encode archaic Greek separately?

Why not? If it helps better serve the scholars, researchers, students, and
script fans, so that they can represent this historic script more accurately
than with the modern form.

After all, when I look at some medieval French texts written in what we call
écriture gothique, with their historical orthography and letters (notably with
long s, the absence of modern accents, and very distinct and complex letter
shapes), many French natives have great difficulty recognizing them as French,
thinking that they could be written in Latin. They will recognize that the
letters are really beautiful, but will often be intrigued by some of them, with
some letters misidentified (b/p, o/u/v, d/a, i/n/u...); uppercase letters are
even more difficult to decipher... And this is what happens with publications
with careful typography. The situation is even worse with manuscripts written
with a quill (which are very similar to the German Sütterlin). We don't need to
go very far back in history: handwritten WW1 letters from soldiers to their
families use letter forms that were commonly taught in schools at that time
(most of these letters are extremely stable in their letter forms and carefully
drawn, from a typographic point of view), yet they are very difficult to read
for most French natives, even though they use the same modern popular French
language and vocabulary as is used and understood today...




Re: 05A2 or 05BA? (was: Proposal to add QAMATS QATAN to the BMP of the UCS)

2004-05-04 Thread Philippe Verdy
From: Michael Everson [EMAIL PROTECTED]
 A new contribution.
 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf
 N2755
 Proposal to add QAMATS QATAN to the BMP of the UCS
 Michael Everson & Mark Shoulson

I note that your document inconsistently uses two different code points: it
proposes the inclusion of U+05BA, but documents U+05A2 in the proposed Unicode
Character Properties... Both code points are unassigned in Unicode. Which one is
proposed?




RE: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Ernest Cline

 [Original Message]
 From: Michael Everson [EMAIL PROTECTED]

 A new contribution.

 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf
 N2755
 Proposal to add QAMATS QATAN to the BMP of the UCS
 Michael Everson & Mark Shoulson

Given the description in the proposal, which indicates that
this character has its origin as a glyphic variant of QAMATS,
it would seem to me that it would be appropriate for this new
character's canonical combining class to be either the
same as that of QAMATS, which is 18, or perhaps a new fixed
position combining class, and not the 220 given in the
proposal for QAMATS QATAN.

The sequence <05D2 05A4 05B8> normalizes to
<05D2 05B8 05A4>, placing the vowel point QAMATS
before the cantillation mark.  However, as proposed,
<05D2 05A4 05BA> would remain in that order, leaving
QAMATS QATAN as the only Hebrew vowel point that
does not uniformly normalize to be before cantillation marks.
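
A small Python illustration of the reordering described above, using only
characters that are already encoded -- QAMATS (U+05B8, combining class 18) and
the below-base accent MAHAPAKH (U+05A4, class 220); the proposed QAMATS QATAN
code point itself is left out, since it is still unassigned:

import unicodedata

seq = "\u05D2\u05A4\u05B8"            # GIMEL, MAHAPAKH, QAMATS
nfc = unicodedata.normalize("NFC", seq)
print([hex(ord(c)) for c in nfc])     # ['0x5d2', '0x5b8', '0x5a4']: QAMATS reorders first

# Marks with equal combining classes are never reordered relative to each
# other, so a QAMATS QATAN with class 220 would stay after such accents.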

***

Also, the proposal gives two different potential code points for
QAMATS QATAN, referring to it in one place as 05BA and in
another as 05A2.  While both are unused code points, it would
probably be better to place it among the other vowel points,
which would make 05BA the better choice.






Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Patrick Andries
Dean Snyder a écrit :
Patrick Andries wrote at 8:55 AM on Monday, May 3, 2004:
 

I got this answer from a forum dedicated to Ancient Hebrew :
« Very few people can read let alone recognize the paleo Hebrew font. 
Most modern Hebrew readers are not even aware that Hebrew was once 
written in the paleo Hebrew script.
   

The same could be said for archaic Greek versus modern Greek - do you
propose to encode archaic Greek separately?
 

[PA] I'm proposing nothing here, I'm just forwarding an answer.
When the text was written in the paleo Hebrew four of the 
Hebrew letters were used as vowels - aleph, hey, vav and yud, but were 
removed from the text when the masorites added the vowel pointings. This 
is evident in the Dead Sea Scrolls where the four letters are found in 
the words but removed in the Masoretic text.
   

This is simply not true.
[PA] So there were Dead Sea Scrolls written in Square Hebrew with matres 
lectionis? (I don't know, I would just like to know.)

P.A.




A binary file format for storing character properties

2004-05-04 Thread Theo Veenker
At this time there are about 160 different character properties defined
in the UCD. In practice most applications probably only use a limited set
of properties to work with. Nevertheless applications should be able to
lookup all the properties of a code point. Compiling-in lookup tables for
all defined properties (including Unihan) makes small applications become
rather big. This made me decide to create a binary file format for storing
character properties and initialize property lookup tables on demand.
Benefits of using run-time loadable lookup tables initialized from binary
files are:
  - no worries about total table size, since data will only be loaded
on demand
  - initializing lookup tables from a binary file is relatively fast
  - property lookup files can be locale specific (useful for character
names and case mappings for example)
  - new properties can be added quickly and never affect layout or
content of other tables
  - any number of properties can be supported including custom
(non-Unicode) properties
  - by initializing a lookup table from two sources (UCD-based and
vendor-based), applications can overload the default property
values assigned to PUA characters with private property values
The file format I've implemented is capable of storing any type of property.
Each file contains property values for one property (no more squeezing as
many property values as possible into as few bits as possible). The format
is called UPR (Unicode PRoperties).
I have written a tool to generate the necessary UPR files from the UCD. A
small C-library for reading a UPR file into a property lookup table, and
a high-level library which provides property lookup functions for *all*
Unicode properties in 4.0.0 are also available.
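
As a rough illustration of the on-demand idea only (this is not the actual UPR
layout or the C API -- see the URL below for those), a per-property file could
be loaded lazily along these lines in Python; the binary layout and the names
here are invented:

import struct
from functools import lru_cache

@lru_cache(maxsize=None)
def load_property(path):
    """Read a hypothetical file of little-endian (code point, value) pairs."""
    table = {}
    with open(path, "rb") as f:
        count, = struct.unpack("<I", f.read(4))
        for _ in range(count):
            cp, value = struct.unpack("<II", f.read(8))
            table[cp] = value
    return table

def lookup(prop_file, cp, default=0):
    # The table for a given property is read from disk only on first use.
    return load_property(prop_file).get(cp, default)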
For more information on the file format and related software see:
http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/. My primary
development platform is UNIX/Linux, but you can compile and run it under
Windows as well (less tested however). Current version supports UCD 4.0.0,
I will add support for 4.0.1 soon.
Please check it out. Feedback is welcome.
Regards,
Theo Veenker



[Fwd: Re: New contribution]

2004-05-04 Thread Patrick Andries
03/05/2004 05:19, Michael Everson wrote:

Suetterlin.

Oh shut UP about Sütterlin already. I don't know where you guys come 
up with this stuff. Sütterlin is a kind of stylized handwriting based on 
Fraktur letterforms and ductus. It is hard to read. It is not hard to 
learn, ...

Since when is this an argument? Neither is Phoenician hard to learn (22 
letters with no contextual variants, etc.)... Could we please remain 
courteous?


... and it is not hard to see the relationship between its forms and 
Fraktur. ...

The relationship is not at all apparent to someone who reads only the 
Latin Script and does not know the genealogy from the Fraktur Script to the 
German Script (as Sütterlin was also called). (I like mentioning that 
people saw them as different scripts.) Quite analogous to a set of 
historically related Northern Semitic scripts, and obviously if you have 
learned the genealogy of these scripts it is easy to recognize the 
relationship...

P. A.




Re: New contribution

2004-05-04 Thread Michael Everson
At 23:08 -0400 2004-05-03, John Cowan wrote:
[EMAIL PROTECTED] scripsit:
 Those objections are quite generic and could be made just as well
 for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c. 
But there is no clear-cut alternative for any of those.  N'ko encoding
is font-kludge, Unicode, or nothing.  Here there is a fourth possibility:
decide that Phoenician is a script variant in the sense of ISO 15924.
But it would be wrong to do that.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New contribution

2004-05-04 Thread Michael Everson
At 03:01 + 2004-05-04, [EMAIL PROTECTED] wrote:
John Cowan wrote,
  (And to the last, I'd be tempted to add:  If so, what on Earth could those
  objections be?)
 Expense.  Complication.  Delays while the encoding gets into the Standard
 and thence into popular operating systems, with all the accoutrements
 such as keyboard software.
Those objections are quite generic and could be made just as well
for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c. 

While those objections might be voiced by actual users, none of
those objections should impact the decision making process.
Hear, hear.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Michael Everson
At 00:19 -0400 2004-05-04, Ernest Cline wrote:
It would seem to me that it would be appropriate that this new 
character's canonical combining class should either be the same as 
that of QAMATS which is 18
That is correct. We overlooked the properties line in the proposal, 
the template for which was the earlier ATNAH HAFUKH document. Sorry 
about that. It should read:

05BA;HEBREW POINT QAMATS QATAN;Mn;18;NSM;N;;*;;;
... unless there is an additional error ;-)
Thanks for reading the proposal.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New contribution

2004-05-04 Thread Michael Everson
At 20:37 -0800 2004-05-03, D. Starner wrote:
Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see
how many objections you get.
I should think far fewer; the legibility quotient is much different.
I have said before:
Set a German or Danish or Icelandic wedding invitation in Fraktur. No problem.
Set an Irish wedding invitation in Gaelic. No problem.
Set a Hebrew wedding invitation in Palaeo-Hebrew. Problem.
It's easy to decry Fraktur and Gaelic as hard to read, but they 
AREN'T, and their use in invitations, menus, and signage is testament 
to that. The same does not obtain with Phoenician letterforms and 
Hebrew.

Again, no, you can't use archaic forms of letters in many 
situations, but that doesn't mean they aren't unified with the 
modern forms of letters.
From where I sit, it sure does.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New contribution

2004-05-04 Thread Michael Everson
At 11:42 -0700 2004-05-03, John Hudson wrote:
Michael Everson wrote:
  Hebrew has the same 22 characters, with the same character properties.
And a baroque set of additional marks and signs, none of which 
apply to any of the Phoenician letterforms, EVER, in the history of 
typography, reading, and literature.
And a baroque set of additional marks and signs, none of which apply to 
any of the STAM letterforms...
STAM letterforms clearly belong to the Square Hebrew 
tradition. Phoenician letterforms do not.

Historical relationships have, do, and will continue to inform some 
of the choices we make in determining what to encode in the Universal 
Character Set.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: New contribution

2004-05-04 Thread D. Starner
 A possible question to ask which is blatantly leading would be:
 
  Would you have any objections if your bibliographic database
  application suddenly began displaying all of your Hebrew
  book titles using the palaeo-Hebrew script rather than
  the modern Hebrew script and the only way to correct
  the problem would be to procure and install a new font?

Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see 
how many objections you get. Again, no, you can't use archaic forms
of letters in many situations, but that doesn't mean they aren't
unified with the modern forms of letters. No one would have to procure
and install a new font, because Arial/Helvetica/FreeSans/misc-fixed
have the modern form of Hebrew and will always have the modern form
of Hebrew and all other scripts that have a modern form.

I mean, maybe you're right and Phoenician has glyph forms too far from
Hebrew's to be useful, and it's connected with Syriac and Greek as
much as Hebrew, but this argument just doesn't fly.
-- 




Re: New contribution

2004-05-04 Thread Michael Everson
At 12:13 -0700 2004-05-03, John Hudson wrote:
Michael Everson wrote:
No Georgian can read Nuskhuri without a key. I maintain that no 
Hebrew reader can read Phoenician without a key. I maintain that it 
is completely unacceptable to represent Yiddish text in a 
Phoenician font and have anyone recognize it at all.
But no one is going to do that. No one is talking about doing that. 
This is a complete irrelevancy.
No, it is not. If Phoenician letterforms are just a font variant of 
Square Hebrew then it is reasonable to assume that readers of Square 
Hebrew will accept them in various contexts. Such as newspaper 
articles, or advertising copy, or restaurant menus, or wedding 
invitations. THAT is font switching.

I consider this fundamental to script identification. The accident of 
1:1 correspondence to another alphabet is not, in my view, sufficient 
justification for unifying them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Drumming them out

2004-05-04 Thread Michael Everson
At 11:53 +1000 2004-05-01, Nick Nicholas wrote:
Coptic could have stayed unified with Greek,
Certainly not!
and myself I'm still not convinced the distinction between Greek and 
Coptic in bilingual editions is not truly just a font issue.
Plain-text searching of Crum's dictionary, for instance, is a 
perfectly valid requirement, and one which was brought to bear on the 
disunification.

So the question again becomes, not whether the scripts are 
historically or graphemically distinct, but what the body of users 
is that wants them disunified.
The distinction itself is a strong reason to disunify. We've done 
that with other scripts. And we will again, I'll warrant.

And the "fonts are k00l" crowd of enthusiasts :-) which the review 
of hieroglyphics has already mentioned; and I know we shouldn't 
dismiss them out of hand and all, but why can't they be accommodated 
by a font switch too?
Because we are beyond ASCII font hacks. The Phoenician block will 
allow font switching between a recognizably similar family of writing 
systems. Same as we have for Syriac, or for Old Italic. And remember 
-- most Etruscan scholars transliterate. But Unicode is not elitist. 
It's universal.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re:CJK(B) and IE6

2004-05-04 Thread jameskass

Raymond Mercier wrote,

 BabelPad is great, but it chokes in converting all the UTF8 in unihan.txt to
 NCR at one
 go. I wrote a dedicated program to do that.

Options -> Advanced Options -> (Edit Options) ->
Make sure the box for Enable Undo/Redo is not checked.

Yes, when the commas in UNIHAN.TXT were being globally replaced
with middle dots here, BabelPad stopped responding.  But then,
Andrew wrote to the list with a tip about the undo/redo feature.
(Just in time, I was going to write a dedicated program.)

When making global changes in such a large file,
Options -> Advanced Options -> (Edit Options) ->
Make sure the box for Enable Undo/Redo is not checked.

Best regards,

James Kass





RE: New contribution

2004-05-04 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Hudson

  No Georgian can read Nuskhuri without a key. I maintain that no Hebrew
  reader can read Phoenician without a key. I maintain that it is
  completely unacceptable to represent Yiddish text in a Phoenician font
  and have anyone recognize it at all.

 But no one is going to do that. No one is talking about doing that.
 This is a complete irrelevancy.

Michael's argument here is based on the premise that if the communities
that use script A cannot readily interpret text in their language when
written with a written variety (and distinct-script candidate) B, then B
is distinct from A. It *is*, IMO, a valid consideration, but it alone
isn't a sufficient criterion. Note, for instance, that one could apply
that argument to try to justify a Latin cipher.



Peter Constable




Re: New contribution

2004-05-04 Thread Peter Constable
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Francois Yergeau

 Suppose I were to float a proposal to encode Old Latin, consisting of the
 original 23-letter unicameral alphabet.  Try this on for size:

  It is false to suggest that
  fully-[accented, cased Vietnamese] text can be rendered in
  [Old Latin] script and that this is perfectly
  acceptable to any [Vietnamese] reader (as would be the
  case for ordinary font change).

 Would you agree to encode Old Latin on those grounds?

I think there is a difference between this hypothetical example and the
PH case: the Old Latin doesn't have the accents, but if you used the 23
uni-cameral characters for Vietnamese text, then surely a Vietnamese
speaker would recognize it as caseless Vietnamese with the accents
stripped off. And it's easy to see how the accents could be added to Old
Latin to make it even closer: lower-cased Vietnamese text.

But if you took Biblical Hebrew text and set it with PH glyphs w/o
accents, there are a lot of people that know Biblical Hebrew who would
not recognize this sample as Biblical Hebrew. And there is no obvious
way to add the accents, but even if there were, I suspect those same
people still wouldn't recognize it as accented Hebrew with archaic
glyphs.

So, while Michael's argument was flawed in the way he expressed it, I
think your counter-argument also is flawed.



Peter Constable




Re: New Contribution: In support of Phoenician from a user

2004-05-04 Thread C J Fynn
 Peter Kirk [EMAIL PROTECTED] wrote:

 On 02/05/2004 11:57, Deborah W. Anderson wrote:

 As one coming from the world of ancient Indo-European (IE) and as editor of
a journal on IE out of UCLA, I am in support of the Phoenician proposal.

 Thank you, Deborah. You have given what is to me a much better argument
 for separate encoding of the Phoenician script than any that I have seen
 before, from the proposer or anyone else. I find your point about
 ensuring that XML documents are correctly displayed especially
 significant. If your support had been cited in the original proposal
 with your arguments, rather a lot of spilled electrons could have been
 saved. Well, I guess it is not too late to include them in a revised
 proposal.

No need to add it to the proposal itself - something like this should really be
formally submitted to WG2 as a separate document in support of the proposal.

- Chris




Re: Defined Private Use was: SSP default ignorable characters

2004-05-04 Thread C J Fynn

Doug Ewell [EMAIL PROTECTED] wrote:


 C J Fynn cfynn at gmx dot net wrote:

  Philippe Verdy [EMAIL PROTECTED] wrote:

  Certainly, but what is the distinction between downloading/
  distributing a font or downloading/distributing an XML file containing
  the PUA conventions?

  One file not two - and some assurance that the custom properties
  haven't been altered since the font and the document that uses it were
  created.

 I didn't see Philippe's original post, of course, for reasons that many
 list members will remember.  But this response from Chris piqued my
 curiosity.  So I went digging into my Deleted Items folder, found the
 relevant post from Philippe, and guess what?  A miracle happened.

 I AGREE WITH PHILIPPE.

 That is, if there is ever to be a mechanism for specifying properties of
 PUA characters at the user level (Mark Davis' expectation
 notwithstanding), I agree that it should live in an external file or
 table or other data structure, not within a font.  And XML would be a
 perfectly suitable format for distributing such a property file.

 Not all font formats, not even all smart font formats, can contain all
 of the property information for every character the font supports.
 OpenType/Uniscribe was mentioned as an example where the rendering
 engine does work that would be done by the font in other systems.  The
 division of labor between font and engine isn't the same across systems.
 And even if you can tell the font about the directionality and
 default-ignorability of your characters, there are still issues like
 line breaking and mirroring (and maybe others, or maybe those are bad
 examples) that have to be handled outside the font anyway.

 Putting all the property information inside the font forces the user to
 use *only that font* for his PUA needs.  There might be a choice of
 fonts that support a particular PUA usage (such as for Klingon -- Mark
 Shoulson, is this true?) and it would not make sense to require all of
 these fonts to be updated to include property information (if that is
 even possible).  Better to store the property information separately and
 make it work for any old font the user chooses.

 Storing the custom properties in the font doesn't really provide any
 assurance that they haven't been altered.

Philippe's suggestion is good. I've no real objection to storing the property
information in an external XML file - storing them in a font table was just a
suggestion.

However, even if some of the info has to be handled outside of the font
rendering system, you could store any kind of property info in any sfnt format
font (TT, OT, AAT, Graphite) which allows you to add custom tables - so long as
the specification for such a table was designed to hold all the properties
that might be needed.

I'm not sure whether anyone would want to use non-standard properties for such
PUA text where they didn't have a font that supported the properties for
display.
Given the nature of the PUA, generic property files supposed to work for any
old font the user chooses might be problematic. Where a script hasn't been
standardized, different developers might wish to use different character
properties.

One of the reasons I suggested putting the properties in the font was that you
would then be fairly certain of having the properties that font was designed to
work with (and avoid the need of having someone maintain something like a
Con-Script character properties registry).

Anything like this should of course be expressly limited to PUA characters.
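
Just to make the idea concrete, here is a hedged sketch of what such an
external property file and a loader for it might look like (Python); the XML
element and attribute names, and the example code points, are purely
hypothetical and not any standardized format:

import xml.etree.ElementTree as ET

PUA_PROPERTIES_XML = """
<pua-properties>
  <char cp="E000" gc="Lo" bidi="L" name="EXAMPLE LETTER ONE"/>
  <char cp="E001" gc="Mn" bidi="NSM" ccc="220" name="EXAMPLE COMBINING MARK"/>
</pua-properties>
"""

def load_pua_properties(xml_text):
    # Return a {code point: {property name: value}} mapping from the file.
    props = {}
    for el in ET.fromstring(xml_text).iter("char"):
        cp = int(el.get("cp"), 16)
        props[cp] = {k: v for k, v in el.attrib.items() if k != "cp"}
    return props

print(load_pua_properties(PUA_PROPERTIES_XML)[0xE000]["bidi"])  # 'L'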

- Chris




Re: New contribution

2004-05-04 Thread C J Fynn

 John Hudson [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] wrote:

  While the fact that it's called Phoenician script doesn't prove anything
  about its origin, it might be considered indicative of the path through
  which the script was borrowed.

 Indeed. This is the point I made earlier: Greco-centric European scholarship
 of writing systems calls the script 'Phoenician' because the Greeks derived
 their alphabet from trade contact with the Phoenicians. As should be obvious
 from recent debate, semiticists look at the old Canaanite writing systems in
 a different way.

So are Greco-centric European scholars / Indo-Europeanists the
user community which some were trying to say doesn't exist?

- Chris




RE: New contribution

2004-05-04 Thread Peter Constable
What are the directional properties of Phoenician? Is it RTL only, or
was it ever written with a different directionality?



Peter Constable





RE: New contribution

2004-05-04 Thread Francois Yergeau
Peter Constable wrote:
 the Old Latin doesn't have the accents, but if you 
 used the 23
 uni-cameral characters for Vietnamese text, then surely a Vietnamese
 speaker would recognize it as caseless Vietnamese with the accents
 stripped off.

...
 So, while Michael's argument was flawed in the way he expressed it, I
 think your counter-argument also is flawed.

Hmmm, I'm not sure it's flawed.  Sure, recognizability makes it
non-equivalent to the Phoenician-Hebrew case, but it still demonstrates that
a subset-superset relationship between purported scripts A and B does not
make them distinct.

Recognizability is a much better argument, IMHO, but then there's
Sütterlin...  And cyphers, as you mention in another message.

-- 
François




Re: New contribution

2004-05-04 Thread jcowan
Peter Constable scripsit:

 2) the characters in question are structurally / behaviourally very
 similar to square Hebrew characters, but not to the characters of other
 scripts

Not just very similar: structurally, behaviorally, and even phonemically
identical.

 Item 1, I think we'd agree, is just wrong. Item 2 is probably true. But
 is it enough to refer to square Hebrew as the modern form of
 Phoenician (Old Canaanite, whatever you want to call it)?

Well, one of the two modern forms, Samaritan being the other.

-- 
John Cowan  [EMAIL PROTECTED]  www.reutershealth.com  www.ccil.org/~cowan
It's the old, old story.  Droid meets droid.  Droid becomes chameleon. 
Droid loses chameleon, chameleon becomes blob, droid gets blob back
again.  It's a classic tale.  --Kryten, Red Dwarf



Re: New contribution

2004-05-04 Thread jcowan
Peter Constable scripsit:

 What are the directional properties of Pheonician? Is it RTL only, or
 was it ever written with a different directionality?

It's RTL only, except to the extent that you consider Archaic Greek a
script variant of Phoenician.  :-)

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
Any sufficiently-complicated C or Fortran program contains an ad-hoc,
informally-specified bug-ridden slow implementation of half of Common Lisp.
--Greenspun's Tenth Rule of Programming (rules 1-9 are unknown)



Re: Nice to join this forum....

2004-05-04 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 A problem, however, is that many such forms are found in unstable
 orthographies, and are difficult to document adequately for inclusion
 in proposals.

 This last argument should not be a limitation on encoding them. After
 all, they are used for living languages in danger of extinction, and
 even if documents using them are rare, encoding them would help
 preserve these languages and support the development of their
 literacy.

This is expressly NOT a goal of Unicode and ISO/IEC 10646: to encode
newly invented, possibly ephemeral, letters on the basis that doing so
might encourage literacy and save a language from extinction.

As someone once said -- I don't know who, but it sounds like John
Cowan -- we already have several hundred Latin letters in Unicode; it
shouldn't be difficult to pick one of those when developing a new
orthography, instead of inventing yet another way to write [t].

The danger of encoding novel characters on speculation that they might
be useful is that if they *don't* turn out to be useful, or if a revised
version of the orthography replaces them with something else, Unicode
and 10646 are stuck with unwanted characters, which cannot be removed
for stability reasons.

The Euro sign is a classic counterexample where strong promises of
stability and usefulness (which have been amply borne out) outweighed
the newly invented nature.

See the Principles and Procedures document for more information.

 Without them, the instability of orthographies will always be a
 problem, favored by the absence of a standard to represent them adequately
 in any encoding or charset, so that even book publishers and authors will
 need to use their own approximations or unstable private conventions
 to represent them.

This is a problem; in an increasingly Unicode world, it is more
difficult than ever to print and interchange one's characters if they
are *not* in Unicode.  But the burden should still be on the proponents
of such a character to prove that it is in actual, stable use, and that
the need to print and interchange is real.  Otherwise, it's PUA time.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




RE: New contribution

2004-05-04 Thread Peter Constable
 Hmmm, I'm not sure it's flawed.  Sure, recognizability makes it
 non-equivalent to the Phoenician-Hebrew case, but it still demonstrates
 that a subset-superset relationship between purported scripts A and B
 does not make them distinct.

Whatever the logic in the examples, I certainly agree that a superset
does not imply a distinct script.


Peter Constable




RE: New contribution

2004-05-04 Thread Peter Constable
  Item 1, I think we'd agree, is just wrong. Item 2 is probably true. But
  is it enough to refer to square Hebrew as the modern form of
  Phoenician (Old Canaanite, whatever you want to call it)?

 Well, one of the two modern forms, Samaritan being the other.

Ah, so the next protracted debate is going to be whether Samaritan
should also be encoded using the existing square Hebrew characters,
since it would appear that the argument for unifying PH with
Hebrew could also be an argument for unifying PH with Samaritan, or
all three.



Peter Constable




RE: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Michael Everson
At 07:34 -0700 2004-05-04, Peter Constable wrote:
  05BA;HEBREW POINT QAMATS QATAN;Mn;18;NSM;N;;*;;;
Well, of course, the effect of this is that a sequence of <qamats,
qamats qatan> is not canonically equivalent to <qamats qatan, qamats>.
No harm in that, but also not especially useful, I suspect.
Mark Shoulson says that since QAMATS QATAN is a flavour of QAMATS, 
it should behave like QAMATS. Regarding canonical equivalence, having 
both QAMATS and QAMATS QATAN on a single base letter would be 
pathological, so it doesn't really matter.

I would probably leave the value at 220. That is what all of the Hebrew
vowel points should have been, IMO. Though getting one right doesn't
make a huge difference -- people are still going to be using CGJ to
preserve particular sequences in the cases this will most likely be
needed.
Mark says that "should have been" is great, but fixing one point is 
of no particular utility.

For my own part, I have no strong view on this matter.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Nice to join this forum....

2004-05-04 Thread John Hudson
Philippe Verdy wrote:
A problem, however, is that many such forms are found in unstable
orthographies, and are difficult to document adequately for inclusion in
proposals.

This last argument should not be a limitation on encoding them. After all, they are
used for living languages in danger of extinction, and even if documents using
them are rare, encoding them would help preserve these languages and support
the development of their literacy.
You misunderstand me. I was not indicating the scarcity of documents (although that can 
also be a problem), and I certainly wasn't suggesting that documentation problems should 
impede encoding. I'm talking about unstable orthographies, such that the documents you may 
have -- even as recent as thirty years ago -- do not necessarily reflect current usage in 
the country in question. Some African countries have strong language standardisation 
organisations, e.g. Ghana, but in others orthographies are being developed by individual 
linguists and missionary translators, and there may be competing orthographies and 
disagreement over which should be adopted as official. On the one hand, one can make the 
argument that anything that is used or has been used in documents should be encoded -- 
which is also the approach I would favour --, but then you are likely to get African 
governments asking 'Why did you encode that? We don't use that. It isn't official.' You 
also get software developers coming along wanting to know what they need to support for a 
given language, and you can't give them a clear answer because the orthographies are 
unstable. Again, none of these factors prevent encoding of new characters, but it is a 
good idea to be aware of the uncertainty in the writing of many African languages, and 
prepared to respond to queries or objections regarding specific characters.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: Nice to join this forum....

2004-05-04 Thread Philippe Verdy
From: Doug Ewell [EMAIL PROTECTED]
 The danger of encoding novel characters on speculation that they might
 be useful is that if they *don't* turn out to be useful, or if a revised
 version of the orthography replaces them with something else, Unicode
 and 10646 are stuck with unwanted characters, which cannot be removed
 for stability reasons.

This depends on who is making such proposals. When a non-governmental
organization gets some support from a UN institution for education (UNICEF for
example), studies may be started to create or stabilize an orthography, create
a dictionary, or produce guides for a language's grammar or for its
translation. The phonetics of endangered languages then becomes important to
help maintain the language in its literary form.

Some languages have quite unique sounds, but could look ugly and be uneasy to
teach if they use too many diacritics or symbols from an IPA notation. Today it
seems reasonable to promote the adoption of an alphabet based on existing
alphabets, but avoiding digraphs can be a requirement, at least for the initial
promotion of the literary form of the spoken language. Also, the importance of
surrounding languages in the same area may ease the transition for teaching the
local language using the same letters if possible, so that the minority
language gets more immediate support from educated people in that country, who
are mainly taught another official language.

So there are reasonable cases where it is desirable to borrow some lateral
conventions on letter forms but also to respect the uniqueness of the language
to be represented with an orthographic system based on a new alphabet. To
achieve this goal, some letters sometimes need to be invented by modification
of other existing, similar letters.

When such a program succeeds, some representative books will be published in
that orthography, and the most useful ones will be for educational purposes
(including religious sacred books like the Bible and the Quran, if they can be
translated accurately into the minority language, as religion is a good
motivation to incite people to become literate, get themselves a correct
reading of the true text, and then use their literacy for commerce, local
economic development, or the preservation and transmission of their culture).




Re: New contribution

2004-05-04 Thread John Hudson
Michael Everson wrote:
No Georgian can read Nuskhuri without a key. I maintain that no 
Hebrew reader can read Phoenician without a key. I maintain that it 
is completely unacceptable to represent Yiddish text in a Phoenician 
font and have anyone recognize it at all.

But no one is going to do that. No one is talking about doing that. 
This is a complete irrelevancy.

No, it is not. If Phoenician letterforms are just a font variant of 
Square Hebrew then it is reasonable to assume that readers of Square 
Hebrew will accept them in various contexts. Such as newspaper articles, 
or advertising copy, or restaurant menus, or wedding invitations. THAT 
is font switching.

I consider this fundamental to script identification.
Okay, then I fundamentally disagree with you. Good to have that clear.
How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from 
those which you want to encode, if 1:1 correspondence is not sufficient grounds for 
unification but visual dissimilarity is grounds for disunification?

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


RE: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Peter Constable
 Mark Shoulson says that since QAMATS QATAN is a flavour of QAMATS,
 it should behave like QAMATS.

True, but giving it the same fixed-position class actually creates a
distinction, though not a particularly significant one.


 Regarding canonical equivalence, having
 both QAMATS and QAMATS QATAN on a single base letter would be
 pathological, so it doesn't really matter.

Agreed. But having qamats qatan and a class-220 accent would not.


 I would probably leave the value at 220. That is what all of the
Hebrew
 vowel points should have been, IMO. Though getting one right doesn't
 make a huge difference -- people are still going to be using CGJ to
 preserve particular sequences in the cases this will most likely be
 needed.
 
 Mark says that should have been is great, but fixing one point is
 of no particular utility.

It provides improvement for very rare possibilities, which is indeed
marginal and only a minor drop in the larger bucket.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




RE: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Michael Everson
OK, I don't care whether it is 18 or 220, and I am not qualified to 
decide. You and Mark (and whoever else cares) can duke this one out.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: New contribution

2004-05-04 Thread Christian Cooke
Hullo,
I'll claim the immunity of the ill-informed in contributing this but...
On 4 May 2004, at 17:04, John Hudson wrote:
Michael Everson wrote:
No, it is not. If Phoenician letterforms are just a font variant of 
Square Hebrew then it is reasonable to assume that readers of Square 
Hebrew will accept them in various contexts. Such as newspaper 
articles, or advertising copy, or restaurant menus, or wedding 
invitations. THAT is font switching.

I consider this fundamental to script identification.
How do you distinguish those scripts that are rejected as 'ciphers' of 
other scripts from those which you want to encode, if 1:1 
correspondence is not sufficient grounds for unification but visual 
dissimilarity is grounds for disunification?
Surely a cipher is by definition after the event, i.e. there must be 
the parent script before the child. Does it not follow that, by John's 
reasoning, if one is no more than a cipher of the other then it is 
Hebrew that is the cipher and so the only way Phoenician and Hebrew can 
be unified (a suggestion you'll have to assume is suitably showered 
with smileys :-) is for the latter to be deprecated and the former 
encoded as the /real/ parent script?

Christian



Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Peter Kirk
On 03/05/2004 11:47, Patrick Andries wrote:
Peter Kirk a écrit :
On 03/05/2004 05:55, Patrick Andries wrote:
...
When the Biblical text is written in paleo Hebrew there are no vowel 
pointings. When the text was written in the paleo Hebrew four of the 
Hebrew letters were used as vowels - aleph, hey, vav and yud, but 
were removed from the text when the masorites added the vowel 
pointings. This is evident in the Dead Sea Scrolls where the four 
letters are found in the words but removed in the Masoretic text.


No. The DSS, or nearly all of them, are in square script, and this 
indicates that the (partial) removal of these additional letters (if 
that is indeed a correct way to describe what happened) took place 
long after the transition from paleo-Hebrew to square script.

Do I understand from your remark that the Square Script DSS use matres 
lectionis ?

P. A.


Yes. The Masoretic Hebrew text uses matres lectionis (though not alef as 
one, except perhaps in the Aramaic portions). The earlier square script 
DSS use more of them. Most paleo-Hebrew texts use very few if any of 
them, because they were only starting to be used in pre-exilic times. 
I'm not sure about later paleo-Hebrew texts like the few paleo-Hebrew DSS.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Arid Canaanite Wasteland (was: Re: New contribution)

2004-05-04 Thread Peter Kirk
On 02/05/2004 16:26, Michael Everson wrote:
At 11:06 -0700 2004-05-02, Peter Kirk wrote:
Michael Everson, who knows so little Phoenician that he doesn't know 
how similar it is to Hebrew?

You are confusing language and script. I am not encoding the 
Phoenician language. ...

No, I am not, despite you and James trying to claim that I am, and
despite your attempt to label a script with the name of just one of the
languages using it, which is not only confusing but historically of
doubtful accuracy. My point was that you cannot claim to be a user of
the Phoenician script if you are not familiar with the Phoenician
language. More accurately I should have said, if you are not familiar
with any of the languages written with the Phoenician script. This group
is (apparently apart from the Edessa inscription just mentioned) a small
set of closely related languages in which you do not seem to be an expert.
...I am encoding a set of genetically related scripts with similar 
behaviours, which differ from Hebrew in shape (but which are similar 
in shape themselves) and in function (Hebrew has grown enormously 
complex with its representation. I believe that if you take a pointed 
and cantillated Hebrew text and were to change the font to 
Phoenician you would end up with something that is, plain and 
simply, utterly wrong.

Well, you would end up with something novel and not widely understood,
just as you would if you used a Fraktur font to display a Vietnamese
text complete with multiple diacritics. You can't stop people encoding
garbage in Unicode if they want to.
...
Anyone else? Perhaps one or two, and no evidence for a group. Not 
nearly as many as want Klingon encoded. Do they have an actual use 
for the script?

It is a Universal Character Set. It is not a character set for Certain 
Kinds of Semiticists Who Think That Everything Is Hebrew. The 
Phoenician script has other clients. ...

OK, if you say so, but then, name names, or at least demonstrate the
truth of this statement. According to your proposal, you have not been
in contact with any users of the Phoenician script, but I suppose you
could still know who they are. But then Deborah Anderson has just stated
that she is a user of it, and I know you have had extensive contact with
her. I thought of accusing you of lying in the proposal, but it is
possible that you were unaware that she is a user. I suggest that you
revise your proposal to mention your contact with her, and preferably to
summarise her good reasons for supporting your proposal.
... Runic has specialist and non-specialist clients. Gothic has 
specialist and non-specialist clients. Egyptian has specialist and 
non-specialist clients.

Children learning about the history of their alphabets are arguably 
more important than narrow-minded pedants who think that by bluster 
they can distract us from our goal.

Well then, show us a children's book which uses Phoenician plain text,
rather than a table of glyphs.
Which is to encode all of the world's writing systems in a Universal 
Character Set.

Including Klingon? Or are there some unstated conditions here that the
writing systems have to be actually in use?

Have they demonstrated a need for it or that, if encoded, anyone will 
actually use it? Surely these are the criteria for encoding a script, 
not just that one person has asked for it to be encoded and a few 
have supported him.

I guess it is just a misapprehension on your part about what you will 
be forced to do.

Let's rehearse it again.
Most Germanicists prefer to transliterate Gothic text into Latin to 
work with it, to study it, to publish it, to read it. We encoded 
Gothic anyway, because it is a separate script from Greek.

Most Germanicists prefer to transliterate Runic text into Latin to 
work with it, to study it, to publish it, to read it. We encoded Runic 
anyway, doubtless to the joy of adolescent Dungeons-and-Dragons 
players everywhere.

Most Semiticists (you claim) prefer to transliterate Phoenician (and 
other language) text into Hebrew (or Latin) to work with it, to study 
it, to publish it, to read it. We should encode the Phoenician
family of scripts anyway, because

Your claim that Phoenican is just a subset of Hebrew ignores the 
historical facts of the development of the Hebrew script, in 
particular with regard to the development of related scripts like 
Samaritan. The unification which we did for Phoenician correctly 
rounds up like with like, and leaves specialized branches of the West 
Semitic writing systems (like Hebrew and Samaritan) alone as separate 
scripts.

My claim was not quite this. It was rather that Phoenician can be
treated as a subset of Hebrew, and the need to treat it otherwise had not
been demonstrated. I think Deborah's contribution has now come close to
demonstrating that need.

Need is more than just want. I am thinking of people who would 
actually use this encoding, who would prefer to use it, and who are 
not adequately provided for by 

Re: New contribution

2004-05-04 Thread John Hudson
Christian Cooke wrote:
Surely a cipher is by definition after the event, i.e. there must be 
the parent script before the child. Does it not follow that, by John's 
reasoning, if one is no more than a cipher of the other then it is 
Hebrew that is the cipher and so the only way Phoenician and Hebrew can 
be unified (a suggestion you'll have to assume is suitably showered with 
smileys :-) is for the latter to be deprecated and the former encoded as 
the /real/ parent script?
The argument of at least some contributors to this discussion is that the 'Hebrew' block 
is misnamed. Even if one accepts that 'Phoenician' should be separately encoded, the 
Hebrew block should have been called 'Aramaic' :)

John Hudson
--
Tiro Typeworkswww.tiro.com
Vancouver, BC[EMAIL PROTECTED]
I often play against man, God says, but it is he who wants
  to lose, the idiot, and it is I who want him to win.
And I succeed sometimes
In making him win.
 - Charles Peguy


Re: New contribution

2004-05-04 Thread Peter Kirk
On 02/05/2004 16:28, Michael Everson wrote:
...
Common sense says that you should not use the Hebrew block for 
Phoenician script with a masquerading font, since the Hebrew script 
and the Phoenician script are different scripts.

OK, I get the point. Unicode doesn't tell anyone what to do, but common
sense does. Semiticists are allowed to continue to do what they are
doing if your proposal is accepted, but if they do, in your opinion
they lack common sense. Well, I suspect this negative opinion might be
mutual, and your proposal might be ignored.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: New contribution

2004-05-04 Thread Peter Kirk
On 02/05/2004 14:38, [EMAIL PROTECTED] wrote:
...
The Meshe Stele and the inscription of Edessa were originally written
in the same script.  If encoding the Edessa inscription using the
Hebrew range would be transliteration, then so would the encoding
of the Meshe Stele in the Hebrew range.
 

And if black is white, then white is black. On the other hand, if your
Edessa inscription (by the way, is there an Edessa in Macedonia as well
as the well-known Edessa in modern Turkey?) is written with
Phoenician glyphs (as you have stated, I think), and if Phoenician
glyphs are glyph variants of Hebrew glyphs (the hypothesis being
tested), then encoding the Edessa inscription with Hebrew characters is
not transliteration, just as encoding of a text written in Fraktur with
Latin characters is not transliteration but the standard way of encoding
the text. All this is quite independent of the language of the text.
If Phoenician is considered a glyphic variation of modern Hebrew, then
it can also be considered a glyphic variation of modern Greek.  Would
it then follow that modern Greek should have been unified with modern
Hebrew?  (Directionality aside.)
 

In principle, the only thing which makes these unifications impossible
is directionality. I am sure there are a number of other things which
would make them undesirable.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: New Contribution: In support of Phoenician from a user

2004-05-04 Thread Peter Kirk
On 03/05/2004 19:04, Michael Everson wrote:
At 09:41 -0700 2004-05-03, Peter Kirk wrote:
If your support had been cited in the original proposal with your 
arguments, rather a lot of spilled electrons could have been saved. 
Well, I guess it is not too late to include them in a revised proposal.

What format would you like that addition to have? ...

I'll leave that to you, but for a start you can name Deborah Anderson as 
a user of the script with whom you have had contact. And yourself if you 
like, as far as I am concerned.

... While I am pleased that you are happier, my own interest is in the 
technical accuracy of the code chart and character names, not in 
*justifying* its inclusion.

Well, I hope the UTC is concerned with the justification of new 
proposals, and not just their technical accuracy. They were obviously 
concerned that the Klingon proposal was not properly justified, and so 
rejected it. If your proposal is not to suffer the same fate, it needs 
proper justification.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: New contribution

2004-05-04 Thread Peter Kirk
On 03/05/2004 19:03, Michael Everson wrote:
At 10:25 -0700 2004-05-03, Peter Kirk wrote:
It is not possible to take an encoded Genesis text which is pointed 
and cantillated, and blithly change the font to Moabite or Punic and 
expect anyone to even recognize it as Hebrew.

Michael, you assert this, but do you actually know it to be true?

Yes. Yes, I do. Mark Shoulson did a test today with a group of 
well-educated young Hebrew-speaking computer programmers. They did not 
recognize it.

Thanks for the data. These are I suppose American Jews. A fairer test 
might be among Israeli native speakers of Hebrew.

...
But this text would be easily recognisable and readable by anyone 
familiar with both Hebrew and the Phoenician glyphs.

I do not believe that any Yiddish speaker would accept a text in a 
Phoenician font as Yiddish.

Well, someone somewhere (in Edessa apparently, but I still don't know 
which Edessa) accepted a Phoenician script text as Greek. And there are 
people today who accept Samaritan script text as English. As any script 
can be used for any language, we really can't try to decide for users 
which scripts go with which languages.

The field of application of Phoenician is so limited that the script 
just can't be mapped on to the rich typographic and font traditon of 
Square Hebrew with any sense at all.

Wedding invitations are routinely set in Blackletter and Gaelic 
typefaces. I bet you £20 that if an ordinary Hebrew speaker sent out a 
wedding invitation in Palaeo-Hebrew no one would turn up on the day.

And I bet you £20 that if an ordinary English speaker sent out a wedding 
invitation in Suetterlin no one would turn up on the day. Now we just 
need some gullible couples to put our challenges to the test!

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: For Phoenician

2004-05-04 Thread Peter Kirk
On 02/05/2004 17:35, Philippe Verdy wrote:
...
Please be polite Peter. You're talking to the official registrar appointed by
Unicode, the ISO 15924 Registration Agency.
Well, Michael is only the registrar. ISO 15924 will continue to have more
details about what is considered as a separate script for bibliographic
references and differentiation of publications.
 

I am really impressed - not! ;-)
...
The situation for Phoenician is quite different. The Hebrew script is already
extremely complex by itself. Some of its most complex rules, which would work and
produce desirable effects in the square Hebrew variant, would become disastrous
with another form. Can you really make semantic distinctions with the glyph
layout of hataf vowels applied on top of Phoenician/Old Canaanite glyphs? If you
had to create a special layout engine to handle multiple cantillation and vowel
marks applied safely on square hebrew, would it work as well with the Old
Canaanite base glyphs, which were not designed to support these diacritics and
allow differentiating them?
 

This would require some creative font design to avoid collisions with
descenders, but would be by no means impossible.
...
How will you handle the possible inclusion of new variants or additional letters
from the base Phoenician script, without breaking some of the modern Hebrew
script rules? There are probably lots of these additional variants and
extensions, used in the genesis or evolution of other languages and scripts. If
you integrate them into only the Phoenician script, with a more relaxed rule
than for Hebrew, which is strongly fixed today, you'll break the fragile building
of the Hebrew script.
 

Of course I cannot anticipate all hypothetical possible extensions
to the current scripts. But Unicode deals with existing scripts, not
hypothetical ones. If you have any evidence for any such variants in
actual use, please send it to me, and to Michael as he may wish to
incorporate it in his proposal.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: [OT] Europe (Was:: Defined Private Use)

2004-05-04 Thread Peter Kirk
On 02/05/2004 20:33, John Cowan wrote:
Ernest Cline scripsit:
 

Defining Europe is vague.  
   

Well, Michael Everson back in 1995 defined it thus:
Europe extends from the Arctic and Atlantic (including
Iceland and the Faroe Islands) southeastwards to the
Mediterranean (including Malta and Cyprus), with its
eastern and southern borders being the Ural Mountains,
the Ural River, the Caspian Sea, and Anatolia, inclusive
of Transcaucasia.
A more precise political definition can be found at
http://www.evertype.com/alphabets/index.html#a .
 

For once I agree with Michael! :-)
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: New contribution

2004-05-04 Thread Peter Kirk
On 03/05/2004 05:19, Michael Everson wrote:
...
Suetterlin.

Oh shut UP about Sütterlin already. I don't know where you guys come 
up with this stuff. Sütterlin is a kind of stylized handwriting based 
on Fraktur letterforms and ductus. It is hard to read. It is not hard 
to learn, ...

Nor is Phoenician.
... and it is not hard to see the relationship between its forms and 
Fraktur. ...

Nor is it hard to see the same relationship between Phoenician and
Hebrew with the help of alphabet development charts of the kind in your
proposal.
... Its existence is not the same kind of historical relationship that 
Phoenician letterforms have to Hebrew letterforms. People have letters 
in their attics written by their grandfathers in Sütterlin. ...

The Phoenicians, paleo-Hebrews etc were not as tidy as the Germans, and
so left their letters lying about on the ground, where (since they were
written on bits of pottery) they could be dug up millennia later and read.
... You can buy books to teach you how to learn Sütterlin. ...

... and Phoenician script.
... Germans who don't read Sütterlin recognize it as what it is -- a 
hard-to-read way that everyone used to write German not so long ago.

And modern Hebrews recognise paleo-Hebrew as a now hard-to-read way that
everyone used to write Hebrew a rather longer time ago.
Phoenician script, on the other hand, is so different that its use 
renders a ritual scroll unclean. If you ask me, who shall I believe, 
John Cowan who has a structural theory or the contemporary users of 
Phoenician/Palaeo-Hebrew vs Aramaic/Square-Hebrew in determining 
whether the scripts are unifiable or not, I shall believe the 
contemporary users, who considered the scripts anything BUT unifiable.

Which contemporary users? I thought you had not been in contact with any.
...
Either way, pointed and cantillated text displayed in a Phoenician 
font is a JOKE at best. And not a very good one.

It is not very good, but it is not a joke, just an anachronism -
although potentially rather a useful one for people who try to
reconstruct Phoenician pronunciation.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Peter Kirk
On 03/05/2004 05:55, Patrick Andries wrote:
...
When the Biblical text is written in paleo Hebrew there are no vowel 
pointings. When the text was written in the paleo Hebrew four of the 
Hebrew letters were used as vowels - aleph, hey, vav and yud, but were 
removed from the text when the masorites added the vowel pointings. 
This is evident in the Dead Sea Scrolls where the four letters are 
found in the words but removed in the Masoretic text.

No. The DSS, or nearly all of them, are in square script, and this
indicates that the (partial) removal of these additional letters (if
that is indeed a correct way to describe what happened) took place long
after the transition from paleo-Hebrew to square script.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: The Unicode.ORG Server is now moved

2004-05-04 Thread Peter Kirk
On 03/05/2004 18:40, Rick McGowan wrote:
The Unicode.ORG server move has gone more or less according to plan, and  
mail lists have been turned back on. Thank you for your patience.

During the next few weeks, if you notice any service on Unicode.ORG that  
previously worked but is now broken, or if you suspect that some HTML files  
are missing or corrupted please do not hesitate to contact me (off list  
please). I will investigate.

Regards,
Rick McGowan
Unicode, Inc.

 

I sent several messages to the list between 16:20 and 16:30 GMT which 
were simply lost. These were therefore sent some time before the 
announced time of the list being closed down - timing which I chose 
deliberately. This is not an acceptable way to manage a list server. You 
should refuse to accept messages which you are unable to deliver so that 
they are queued for retransmission when the server comes up again.

I am resending these messages, when I can get Internet access from here 
in Azerbaijan which is sometimes a problem. This is resulting in a 
considerable delay to important traffic, and significant expense to me 
as I have to pay by the minute for Internet access.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: Drumming them out

2004-05-04 Thread Peter Kirk
On 04/05/2004 06:10, Michael Everson wrote:
...
and myself I'm still not convinced the distinction between Greek and 
Coptic in bilingual editions is not truly just a font issue.

Plain-text searching of Crum's dictionary, for instance, is a 
perfectly valid requirement, and one which was brought to bear on the 
disunification.

Out of interest, are there any dictionaries e.g. of the Phoenician 
language which use both Phoenician and Hebrew script, with a plain text 
distinction? I can quite imagine that there are. If there are, they 
would provide a good justification for your proposal, helping to supply 
what is currently missing.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: New contribution

2004-05-04 Thread Peter Kirk
On 02/05/2004 16:48, Michael Everson wrote:
...
It is not possible to take an encoded Genesis text which is pointed 
and cantillated, and blithly change the font to Moabite or Punic and 
expect anyone to even recognize it as Hebrew.

Michael, you assert this, but do you actually know it to be true? After
all, this is not your area of expertise. I agree that this kind of
mixture is an anachronistic one, much like the example I mentioned
earlier of Vietnamese in Fraktur. But this text would be easily
recognisable and readable by anyone familiar with both Hebrew and the
Phoenician glyphs.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: Arid Canaanite Wasteland

2004-05-04 Thread Peter Kirk
On 03/05/2004 15:33, Simon Montagu wrote:
Peter Kirk wrote:
On 02/05/2004 05:27, [EMAIL PROTECTED] wrote:
Quoting from the jewfaq page,
The example of pointed text above uses Snuit's Web Hebrew AD font. 
These Hebrew fonts map to ASCII 224-250, high ASCII characters which 
are not normally available on the keyboard, but this is the mapping 
that most Hebrew websites use. I'm not sure how you use those 
characters on a Mac. In Windows, you can go to ...

Is this the same as ISO 8859-8 visual encoding?
Codepoint for codepoint, yes, but IIRC the Web Hebrew fonts only 
worked on sites that were declared (or assumed by default) to be in 
ISO-8859-1 encoding.



But presumably if the same sites were declared as ISO-8859-8 visual they
would be readable with standard Unicode Hebrew fonts, in browsers which
perform the correct mappings? Well, now if I find an unreadable page
which is supposed to be Hebrew, I know which encoding to select manually.
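
The mechanical part of that remapping is simple enough. Here is a minimal
sketch of my own in Python (assuming the page uses only the letter range
0xE0-0xFA, as the Web Hebrew fonts do, and glossing over proper per-line
bidi handling):

    # "Visual" pages store each line in left-to-right display order, so after
    # remapping the bytes to U+05D0..U+05EA the line has to be reversed to get
    # logical order for a standard Unicode Hebrew font.
    raw = bytes([0xED, 0xE5, 0xEC, 0xF9])      # "shalom" as served, in display order
    logical = raw.decode("iso-8859-8")[::-1]   # 0xE0-0xFA -> U+05D0-U+05EA, then reverse
    print(" ".join("U+%04X" % ord(c) for c in logical))   # U+05E9 U+05DC U+05D5 U+05DD
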
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: New contribution

2004-05-04 Thread Peter Kirk
On 04/05/2004 08:58, Peter Constable wrote:
Item 1, I think we'd agree, is just wrong. Item 2 is probably true.
 

But
 

is it enough to refer to square Hebrew as the modern form of
Phoenician (Old Canaanite, whatever you want to call it)?
 

Well, one of the two modern forms, Samaritan being the other.
   

Ah, so the next protracted debate is going to be whether Samaritan
should also be encoded using the existing square Hebrew characters.
Since it would appear that the argument for unification of PH with
Hebrew could also argue for unification of PH with Samaritan, or of all
three.

Peter Constable
 

From my point of view, Michael could have made a better case for a 
unified Phoenician and Samaritan proposal. But I think he intends a 
separate Samaritan proposal. And that I would not oppose, because there 
is an easily demonstrable user community of modern Samaritans. Although 
I would still want assurances that they don't consider Samaritan script 
to be glyph variants of Hebrew script.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: New contribution

2004-05-04 Thread Peter Kirk
On 03/05/2004 06:47, Michael Everson wrote:
...
And frankly, I don't consider that Snyder or Kirk or Cowan speak for 
the Semiticist community as they would have us think.

I admit freely that I don't. And I don't consider that Everson speaks
for the Phoenician script user community as it seems he would now have
us think. The reason? That he has explicitly denied, in his proposal,
having any contact with this community.
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/





Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Patrick Andries
Peter Kirk a écrit :
On 03/05/2004 05:55, Patrick Andries wrote:
Quoted...
...
When the Biblical text is written in paleo Hebrew there are no vowel 
pointings. When the text was written in the paleo Hebrew four of the 
Hebrew letters were used as vowels - aleph, hey, vav and yud, but 
were removed from the text when the masorites added the vowel 
pointings. This is evident in the Dead Sea Scrolls where the four 
letters are found in the words but removed in the Masoretic text.

No. 




Re: New contribution

2004-05-04 Thread Peter Kirk
On 04/05/2004 06:44, Peter Constable wrote:
...
But if you took Biblical Hebrew text and set it with PH glyphs w/o
accents, there are a lot of people that know Biblical Hebrew who would
not recognize this sample as Biblical Hebrew. ...
Well, Peter, that's not the point. A lot of Vietnamese people would not 
recognise accentless Suetterlin as Vietnamese; they might well guess it 
was a quite different script. But Suetterlin and Vietnamese are unified.

... And there is no obvious
way to add the accents, but even if there were, I suspect those same
people still wouldn't recognize it as accented Hebrew with archaic
glyphs.
 

I don't see any problem in adding the accents if anyone wants to do so. 
After all, they stand above and below the letters, and can be shifted 
out of the way of descenders and ascenders if necessary. No one would 
want to do so, but that's not the point.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



Re: New contribution

2004-05-04 Thread Michael Everson
At 09:43 -0700 2004-05-04, Peter Kirk wrote:
Mark Shoulson did a test today with a group of well-educated young 
Hebrew-speaking computer programmers. They did not recognize it.
Thanks for the data. These are I suppose American Jews. A fairer 
test might be among Israeli native speakers of Hebrew.
(*jaw drops*)
Excuse me?
I don't think I am going to be able to discuss user communities of 
the Universal Character Set with you if this kind of exclusivist 
rubbish is what you think can possibly apply.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Just if and where is the then?

2004-05-04 Thread African Oracle
If a can have U+0061 and have a composite that is U+00e2...U+...
If e can have U+0065 and have a composite that is U+00ea...U+...

Then why can't e with grave or acute accent and dot below be assigned
a single Unicode value instead of the combining sequence 1EB9 0301,
etc.?

Since Unicode is gradually becoming a de facto standard, I still think it will not be
a bad idea to have such composite values.

Dele Olawole





Re: New Contribution: In support of Phoenician from a user

2004-05-04 Thread Michael Everson
At 09:47 -0700 2004-05-04, Peter Kirk wrote:
On 03/05/2004 19:04, Michael Everson wrote:
At 09:41 -0700 2004-05-03, Peter Kirk wrote:
If your support had been cited in the original proposal with your 
arguments, rather a lot of spilled electrons could have been 
saved. Well, I guess it is not too late to include them in a 
revised proposal.
What format would you like that addition to have? ...
I'll leave that to you,
I'm not really all that interested in the justifications per se. I 
write proposals to encode things that I think should be encoded. That 
involves an investment of time and resources, which implies that I 
think it is worthwhile investing in. Does that make sense to you?

but for a start you can name Deborah Anderson as a user of the 
script with whom you have had contact. And yourself if you like, as 
far as I am concerned.
What if I had done that to start with?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Drumming them out

2004-05-04 Thread Michael Everson
At 10:00 -0700 2004-05-04, Peter Kirk wrote:
and myself I'm still not convinced the distinction between Greek 
and Coptic in bilingual editions is not truly just a font issue.
Plain-text searching of Crum's dictionary, for instance, is a 
perfectly valid requirement, and one which was brought to bear on 
the disunification.
Out of interest, are there any dictionaries e.g. of the Phoenician 
language which use both Phoenician and Hebrew script, with a plain 
text distinction?
James Kass presented a non-dictionary text the other day. I 
considered it plain text. Others didn't.

I can quite imagine that there are.
I don't know. Mostly I would expect to see Hebrew or Latin 
transliteration in such dictionaries. Encoding Phoenician in a 
scholarly context is likely to be more prominent in teaching 
students, preparing exams and grammars, etc (same thing has been said 
about other scripts which are often transliterated).

If there are, they would provide a good justification for your 
proposal, helping to supply what is currently missing.
Is enshrining justifications in the proposal documents really all that 
important? It sounds like busywork to me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: New contribution

2004-05-04 Thread Dominikus Scherkl \(MGW\)
 How do you distinguish those scripts that are rejected as 'ciphers'
 of other scripts from those which you want to encode, if 1:1 correspondence
 is not sufficient grounds for unification but visual dissimilarity
 is grounds for disunification?

As far as I can follow Michael's arguments he says the following:

Disunification for scripts with 1:1 correspondence requires
- having distinct glyphs
- being a relevant script (e.g. historically important, because
  other scripts also derive from it, not only the one with the 1:1
  correspondence).

The latter isn't true especially for Klingon, but it's also not true
for e.g. Fraktur, because Fraktur is the derived script, not Latin.

-- 
Dominikus Scherkl





Re: New contribution

2004-05-04 Thread Patrick Andries
Christian Cooke a écrit :
Surely a cipher is by definition after the event, i.e. there must be 
the parent script before the child. Does it not follow that, by John's 
reasoning, if one is no more than a cipher of the other then it is 
Hebrew that is the cipher and so the only way Phoenician and Hebrew 
can be unified (a suggestion you'll have to assume is suitably 
showered with smileys :-) is for the latter to be deprecated and the 
former encoded as the /real/ parent script? 
What is so important about genealogy ?
P. A. (immunity of the ill-informed also requested)


Re: New contribution

2004-05-04 Thread Michael Everson
At 15:16 -0400 2004-05-04, Patrick Andries wrote:
Christian Cooke a écrit :
Surely a cipher is by definition after the 
event, i.e. there must be the parent script 
before the child. Does it not follow that, by 
John's reasoning, if one is no more than a 
cipher of the other then it is Hebrew that is 
the cipher and so the only way Phoenician and 
Hebrew can be unified (a suggestion you'll have 
to assume is suitably showered with smileys :-) 
is for the latter to be deprecated and the 
former encoded as the /real/ parent script?
What is so important about genealogy ?
Historical origin of characters and scripts is 
one of the things which we take into account when 
determining their identity.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: The Unicode.ORG Server is now moved

2004-05-04 Thread Rick McGowan
Since Peter Kirk wrote, on the Unicode list, I'll CC the list.

Peter Kirk wrote:

 I sent several messages to the list between 16:20 and 16:30 GMT
 which were simply lost.

You are wrong. They were not lost -- at least not on this server. Check  
the archives. (OK, I've had some config trouble with bringing up the new  
real-time archives, but your messages are there, and you can check them. I  
can't guarantee that everything that left *your* machine arrived here, but  
everything that arrived here is in the archives.)

I am certain that everything which *arrived* at the Unicode.org server was  
*delivered* into the mail list process. (What happens after stuff leaves  
this machine for downstream delivery is someone else's problem.)

 These were therefore sent some time before the
 announced time of the list being closed down - timing which I chose
 deliberately. This is not an acceptable way to manage a list server.

Thank you for your concern, but, I know what I'm doing. Everything was  
properly shut down and the outgoing queue drained appropriately.

 You should refuse to accept messages which you are unable to
 deliver so that they are queued for retransmission when the server
 comes up again.

If you want to talk about how the lists are managed, please don't do it on  
this list. It's off-topic. Anyway, there was nothing to queue for  
re-transmission. Incoming mail acceptance was turned off at an appropriate  
juncture, and the outbound queue allowed to drain.

Rick



Re: The Unicode.ORG Server is now moved

2004-05-04 Thread Ernest Cline

Actually, I had already seen all of the messages you resent, Peter, so they
apparently did get through the first time. It may well be that something
happened to delay them getting through to you.  Some other threads have
appeared disjointed to me though, so there do appear to be real problems,
or else people have been posting replies to the list that didn't send the
original there.  It could be a side effect of the move, the current Sasser
Internet worm epidemic, or perhaps even the ghosts of unhappy Paleo-
Hebrew scribes warring in the ether(net) over whether Phoenician and
Hebrew should be encoded as separate scripts in Unicode.





Re: New contribution

2004-05-04 Thread jcowan
Michael Everson scripsit:

 Well. Depends what you mean by forms. Our taxonomy currently lists 
 Samaritan, Square Hebrew, Arabic, Syriac, and Mandaic as modern (RTL) 
 forms of the parent Phoenician.

Arabic and Syriac have very specialized shaping behavior which makes them
obviously distinct from their parent form.  I believe that Mandaic has
this property too.

 Ah, so the next protracted debate is going to be whether Samaritan 
 should also be encoded using the existing square Hebrew characters.
 
 So far participants on this discussion seem to have stipulated that 
 Samaritan be encoded as a modern and unique script.

I have merely postponed the question.  I would still prefer to see an
overall plan with justification (that is, an update of N2311) before any
of these scripts get encoded.

-- 
Evolutionary psychology is the theory   John Cowan
that men are nothing but horn-dogs, http://www.ccil.org/~cowan
and that women only want them for their money.  http://www.reutershealth.com
--Susan McCarthy (adapted)  [EMAIL PROTECTED]



RE: Just if and where is the then?

2004-05-04 Thread Ernest Cline



 [Original Message]
 From: African Oracle [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Date: 5/4/2004 7:04:48 PM
 Subject: Just if and where is the then?

 If a can have U+0061 and have a composite that is U+00e2...U+...
 If e can have U+0065 and have a composite that is U+00ea...U+...

 Then why is e with accented grave or acute and dot below cannot be
 assigned a single unicode value instead of the combinational values
 1EB9 0301 and etc

 Since UNICODE is gradually becoming a defacto, I still think it will not
 be a bad idea to have such composite values.

 Dele Olawole

Take a look at the Unicode Stability Policy [1].  While it does not make
it impossible for there to be a Unicode character LATIN SMALL LETTER E
WITH DOT BELOW AND ACUTE ACCENT that would decompose to
U+1EB9 U+0301, such a character would have to have the Composition
Exclusion property so that it would not appear in any of the Unicode
Normalization Forms.  A number of other standards, such as XML expect
the data they contain to be handled in normalized form, hence even if
the precomposed form were available, most software would still prefer
to work with the unprecomposed form.  The result is that unless there is
another official character standard that has LATIN SMALL LETTER E
WITH DOT BELOW AND ACUTE ACCENT as a character, there is no
benefit to be gained by encoding such a character in Unicode.  Even
then, the benefit is very small as it is only that a transformation from a
single codepoint of that other standard into a single codepoint of the
Unicode standard could be done.  That was an important consideration
when Unicode was getting started, but is not particularly important now.

[1] http://www.unicode.org/standard/stability_policy.html
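
For what it's worth, a quick illustration of my own (using Python's
unicodedata module, not anything from the standard's own text) of why the
existing sequence already does the job:

    import unicodedata

    s = "\u1EB9\u0301"   # E WITH DOT BELOW + COMBINING ACUTE ACCENT
    # No precomposed "e with dot below and acute" exists, so NFC keeps the pair:
    print(["%04X" % ord(c) for c in unicodedata.normalize("NFC", s)])  # ['1EB9', '0301']
    # NFD exposes the full canonical sequence:
    print(["%04X" % ord(c) for c in unicodedata.normalize("NFD", s)])  # ['0065', '0323', '0301']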





Re: New contribution

2004-05-04 Thread Mark Davis
I want to point out that the inclusion of a name in N2311 does not mean a
*guaranteed* place in Unicode for it. All it means is that according to our best
current information, we're trying to reserve space for what we think will be
there. But until we get and assess actual concrete proposals, we can't determine
whether two proposed scripts should be unified, or one proposed script should be
de-unified.

Mark
__
http://www.macchiato.com
  

- Original Message - 
From: [EMAIL PROTECTED]
To: Michael Everson [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tue, 2004 May 04 12:51
Subject: Re: New contribution


 Michael Everson scripsit:

  Well. Depends what you mean by forms. Our taxonomy currently lists
  Samaritan, Square Hebrew, Arabic, Syriac, and Mandaic as modern (RTL)
  forms of the parent Phoenician.

 Arabic and Syriac have very specialized shaping behavior which makes them
 obviously distinct from their parent form.  I believe that Mandaic has
 this property too.

  Ah, so the next protracted debate is going to be whether Samaritan
  should also be encoded using the existing square Hebrew characters.
 
  So far participants on this discussion seem to have stipulated that
  Samaritan be encoded as a modern and unique script.

 I have merely postponed the question.  I would still prefer to see an
 overall plan with justification (that is, an update of N2311) before any
 of these scripts get encoded.

 -- 
 Evolutionary psychology is the theory   John Cowan
 that men are nothing but horn-dogs, http://www.ccil.org/~cowan
 and that women only want them for their money.  http://www.reutershealth.com
 --Susan McCarthy (adapted)  [EMAIL PROTECTED]






Re: New contribution

2004-05-04 Thread Dean Snyder
Michael Everson wrote at 7:21 AM on Tuesday, May 4, 2004:

No, Proto-Sinaitic is out, actually, though it's still in the Summary 
Form by accident.

For similar reasons, Proto-Canaanite should be out.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





Re: Pal(a)eo-Hebrew and Square Hebrew

2004-05-04 Thread Dean Snyder
Patrick Andries wrote at 6:53 AM on Tuesday, May 4, 2004:

So there were Dead Sea Scrolls written in Square Hebrew with matres 
lectionis ? (I don't know, I just would like to know.)

Yes; and with final forms of the usual letters.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Michael Everson
At 10:09 +0200 2004-05-05, Simon Montagu wrote:
Proposal to add QAMATS QATAN to the BMP of the UCS
Michael Everson  Mark Shoulson
Nice.
Ta.
  8a. Can any of the proposed characters be considered a presentation
 form of an existing character or character sequence?
 No.
Is this overstating the case?
It's got a unique glyph representation, it's got its own name, and it 
has its own pronunciation, so in our judgement it is not a 
presentation form of QAMATS.

flippancyIsn't it a little strange that a short qamats should 
represented with a longer vertical than a regular qamats?/flippancy
Them's the facts.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: New contribution

2004-05-04 Thread Patrick Andries
Patrick Andries a écrit :
Christian Cooke a écrit :
Surely a cipher is by definition after the event, i.e. there must 
be the parent script before the child. Does it not follow that, by 
John's reasoning, if one is no more than a cipher of the other then 
it is Hebrew that is the cipher and so the only way Phoenician and 
Hebrew can be unified (a suggestion you'll have to assume is suitably 
showered with smileys :-) is for the latter to be deprecated and the 
former encoded as the /real/ parent script? 

What is so important about genealogy ?
Let me be more precise: what is so important about whether we encode the father 
or one of the sons?




Re: Just if and where is the then?

2004-05-04 Thread African Oracle
The existing composites were included only out of necessity so that new
Unicode implementations could interoperate with existing implementations
using legacy industry-standard encodings. - Peter Constable

Are we saying we have exhausted such necessity?

And what are these legacy-standard encodings?

No new composite values will be added. - Peter Constable

The above sounds dictatorial in nature.

Dele



- Original Message - 
From: Peter Constable [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, May 04, 2004 10:27 PM
Subject: RE: Just if and where is the then?


  If a can have U+0061 and have a composite that is U+00e2...U+...
  If e can have U+0065 and have a composite that is U+00ea...U+...
 
  Then why is e with accented grave or acute and dot below cannot be
 assigned
  a single unicode value instead of the combinational values 1EB9 0301
 and
  etc
 
  Since UNICODE is gradually becoming a defacto, I still think it will
 not be
  a bad idea to have such composite values.

 The existing composites were included only out of necessity so that new
 Unicode implementations could interoperate with existing implementations
 using legacy industry-standard encodings. Apart from the backward
 compatibility issue, these composites go against Unicode's design
 principles and are not needed.

 No new composite values will be added.



 Peter

 Peter Constable
 Globalization Infrastructure and Font Technologies
 Microsoft Windows Division









Re: New contribution

2004-05-04 Thread John Cowan
Dean Snyder scripsit:

 In gross terms, I would characterize the watershed events in scripts
 used to write Hebrew as:
 
 1) adoption of the Canaanite/Phoenician alphabet
 
 2) adoption, around the time of the Babylonian exile, of Imperial Aramaic
 script (coupled with some portions of the Hebrew Bible itself being
 written in Aramaic)
 
 3) adoption of the various supra-consonantal vowel and accent systems

4) The abandonment of most of the apparatus introduced in step 3, as far
as productive use of the script is concerned, reverting to the 22CWSA.

-- 
John Cowan  [EMAIL PROTECTED]http://www.ccil.org/~cowan
Is it not written, That which is written, is written?



Re: New contribution

2004-05-04 Thread Christian Cooke
Patrick,
On 4 May 2004, at 21:27, Patrick Andries wrote:

Patrick Andries a écrit :
Christian Cooke a écrit :
Surely a cipher is by definition after the event, i.e. there must 
be the parent script before the child. Does it not follow that, by 
John's reasoning, if one is no more than a cipher of the other then 
it is Hebrew that is the cipher and so the only way Phoenician and 
Hebrew can be unified (a suggestion you'll have to assume is 
suitably showered with smileys :-) is for the latter to be 
deprecated and the former encoded as the /real/ parent script?
What is so important about genealogy ?
Let me precise this : what is so important whether we encode the 
father or one of the sons ?
[again eschewing any claim to expertise...]
 On 4 May 2004, at 17:04, John Hudson wrote:
How do you distinguish those scripts that are rejected as 'ciphers' of 
other scripts from those which you want to encode, if 1:1 
correspondence is not sufficient grounds for unification but visual 
dissimilarity is grounds for disunification?
Leaving aside the fact that the son is already encoded, I suppose I'm 
asking how a script can predate a script (Hebrew, or Aramaic so I'm 
told) it is said to be the cipher of.

Regards,
Christian



RE: Just if and where is the then?

2004-05-04 Thread Peter Constable
 The existing composites were included only out of necessity so that
new
 Unicode implementations could interoperate with existing
implementations
 using legacy industry-standard encodings. - Peter Constable
 
 Are we saying we have exhausted such necessity?

Yes, because by definition legacy industry-standard encodings not in
widespread usage prior to 1993 do not qualify for the
backward-compatibility requirement.

The necessity had to do with interoperation with existing
implementations, not with the need to support particular languages /
writing systems. For the latter, it has never been a necessity to add
pre-composed characters.



 And what are these legacy-standard encodings?
 
 No new composite values will be added. - Peter Constable
 
 The above sounds dictatorial in nature.

I'm simply telling you what the policy of the Unicode Consortium is.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Re: Just if and where is the then?

2004-05-04 Thread Philippe Verdy
From: African Oracle [EMAIL PROTECTED]
 The existing composites were included only out of necessity so that new
 Unicode implementations could interoperate with existing implementations
 using legacy industry-standard encodings. - Peter Constable

 Are we saying we have exhausted such necessity?

 And what are these legacy-standard encodings?

I think this is the list shown in the References section of the Unicode
standard.
I don't think that this list is closed: there may be further standards
considered, notably if they reach an ISO standard status, or they start being
used extensively in some popular OS as a de-facto standard.

 No new composite values will be added. - Peter Constable
 The above sounds dictatorial in nature.

I think that the sentence is incomplete, or you interpret it the wrong way. The
key is the term 'composite', which here should mean a character that has a
canonical decomposition into a sequence of a base character with combining class
0, and one or more diacritics with a positive combining class.
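
A quick check of my own with Python's unicodedata, for the example being
discussed in this thread:

    import unicodedata

    print(unicodedata.decomposition("\u1EB9"))   # '0065 0323' (base e + dot below)
    print(unicodedata.combining("\u0065"),       # 0   (base letter, class 0)
          unicodedata.combining("\u0323"),       # 220 (dot below)
          unicodedata.combining("\u0301"))       # 230 (acute)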

However this is a general principle that applies to already encoded scripts that
are already widely used (notably Latin, Greek, Cyrillic, Hiragana/Katakana with
voicing marks, Han with tone marks, pointed Hebrew or Arabic, and Brahmic
scripts), but which may not apply to newly encoded scripts if they offer some
new combining diacritics and new base letters, where some compositions may be
desirable immediately due to difficulties to render the composite properly.
Some Semitic scripts, for example, have such complex rules for creating composites with
a base consonant and combining vowel modifiers that the whole script was
instead encoded as if it were a syllabary... (Here I think of Ethiopic, but
some have different opinions and argue that Ethiopic is a true syllabary, given
its current modern usage).




Re: Just if and where is the then?

2004-05-04 Thread African Oracle
Thanks to have taken the time to explain.

Regards

Dele

- Original Message - 
From: Peter Constable [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 05, 2004 12:50 AM
Subject: RE: Just if and where is the then?


  The existing composites were included only out of necessity so that
 new
  Unicode implementations could interoperate with existing
 implementations
  using legacy industry-standard encodings. - Peter Constable
  
  Are we saying we have exhausted such necessity?
 
 Yes, because by definition legacy industry-standard encodings not in
 widespread usage prior to 1993 do not qualify for the
 backward-compatibility requirement.
 
 The necessity had to do with interoperation with existing
 implementations, not with the need to support particular languages /
 writing systems. For the latter, it has never been a necessity to add
 pre-composed characters.
 
 
 
  And what are these legacy-standard encodings?
  
  No new composite values will be added. - Peter Constable
  
  The above sounds dictatorial in nature.
 
 I'm simply telling you what the policy of the Unicode Consortium is.
 
 
 
 Peter
  
 Peter Constable
 Globalization Infrastructure and Font Technologies
 Microsoft Windows Division
 
 
 





RE: UNIHAN.TXT

2004-05-04 Thread Mike Ayers

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of [EMAIL PROTECTED]
 Sent: Friday, April 30, 2004 12:12 AM


 Tabs... In addition to the points Mike made about the tab 
 character having
 different semantics depending on the application/platform, I 
 just don't
 think a control character like tab belongs in a *.TXT file 
 period.


 This is long past the point of opinion, however. Tabs in text files are an implementation fact, long past the point of discussion.

 Although
 UNIHAN.TXT is referred to as a database, it isn't.


 Yes, it is. Plain text databases are far more common than most people realize. The awk tool exists solely to work with them.
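
 For instance, a minimal sketch of my own -- assuming the usual three
 tab-separated columns (code point, field name, value), '#' comment lines,
 and UTF-8 -- pulling a single field out of the file:

    def unihan_field(path, field):
        # Yield (code point, value) pairs for one Unihan field.
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("#") or not line.strip():
                    continue
                cp, key, value = line.rstrip("\n").split("\t", 2)
                if key == field:
                    yield cp, value

    # e.g.: for cp, defn in unihan_field("Unihan.txt", "kDefinition"): print(cp, defn)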

 Unix -vs- DOS... I'll stick with the tools I've been using 
 for a quarter century
 and their descendants, thanks just the same.


 Hmmm? You know that doesn't narrow it down any...


 With respect to 
 the idea that a 
 text editor is not the proper tool with which to open a *.TXT 
 file, well...


 I think you misunderstand. I believe the point was that text files are not universally fully interoperable. Another fact of implementation, especially when it comes to large files or files with long lines.

 I could send you the CSV file for posting, if you think 
 anyone else would
 want it.


 Give 'em the conversion script, not the CSV file!


 Doug Ewell wrote,
 
  And as John said, converting LF to CRLF is quite a simple 
 task -- it can
  even be done by your FTP client, while downloading the file -- and
  should not be thought of as a deficiency in the current plain-text
  format.
 
 Right. It's not a deficiency, it simply adds one more step 
 to a multi-step
 process for some of us.


 That step is unnecessary. A little more research on your tools will eliminate it.


 In order to see non-Latin characters in the DOS-window of 
 Windows, it's 
 necessary to install a console font covering the 
 characters, and then
 activate (or enable) that font for the console window. Everson Mono
 Terminal should work fine for non-Han characters which don't require
 complex shaping.


 I found Everson Mono, but not Everson Mono Terminal. Am I looking in the wrong place?



/|/|ike





Re: Just if and where is the then?

2004-05-04 Thread Kenneth Whistler
Dele,

 No new composite values will be added. - Peter Constable
 
 The above sounds dictatorial in nature.

Peter has already explained that this is just the nature
of the current policy regarding such additions. The reason
for the policy others in this thread have attempted to
explain. The short answer is that it would disturb the
stability of the definition of normalization of data involving
Unicode characters, and stability of normalization is
extremely important to many implementations of the standard.

This said, you need to understand that there is a learning
curve for people coming new to the Unicode Standard.

The existence of a policy which constrains certain kinds of
additions to the standard is not a matter of dictatorial
proclamations -- it is not something that Peter Constable or
any other individual has the power to impose.

Such policies arise out of the consensus deliberations of
the Unicode Technical Committee, which involve many different
members, jointly responsible for the technical content of
the standard. They are also endorsed in the Principles and
Procedures document for the ISO committee, JTC1/SC2/WG2
responsible for the parallel, de jure international character
encoding standard, ISO/IEC 10646. And in that committee,
decisions are also made based on consensus after discussion
among members of many different participating national bodies.

As for the particular issue regarding characters like {e with
dot below and acute accent}, for example, the policy is not
in place as a matter of discrimination against particular
languages or orthographies.

The *glyph* for {e with dot below and acute accent} can and
should be in a font for use with a language that requires
it. Alternatively, the font and/or rendering system should be 
smart enough to be able to apply diacritics correctly.

But the *characters* needed to represent this are already in
the Unicode Standard, so the text in question can *already*
be handled by the standard. Trying to introduce a single,
precomposed character to do this, instead, would just introduce
normalization issues into the standard without actually
increasing its ability to represent what you need to
represent.

As Peter has explained, a letter or a grapheme doesn't
necessarily have a 1-to-1 relationship to the formal,
abstract character encoded in the Unicode Standard for use
in representing text.

You had one example already: gb is a letter in Edo. That
fact is important for education, for language learning, for
sorting, and various other things. But that letter is
represented by a sequence of *characters* already encoded
in Unicode: 0067, 0062.

Likewise, if you have an acute accented e with dot below, that
may constitute a single accented letter in Edo, but it is
represented by a sequence of *characters* already encoded
in Unicode: 0065, 0323, 0301.
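
As a concrete illustration (a sketch of my own, using Python's unicodedata,
not anything normative): whichever order the two marks are typed in,
canonical ordering during normalization produces the same sequence, so
software comparing normalized text sees no difference.

    import unicodedata

    typed_one_way   = "\u0065\u0323\u0301"   # e, dot below, acute
    typed_other_way = "\u0065\u0301\u0323"   # e, acute, dot below
    same = (unicodedata.normalize("NFD", typed_one_way) ==
            unicodedata.normalize("NFD", typed_other_way))
    print(same)                              # True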

These decisions regarding the underlying numbers representing
these elements of text are *not* required to be surfaced up
to the level of end users. Properly operating software supporting
a particular language should present the alphabetic units and
their behavior to users the way *they* expect they should
work. The fact that Unicode systems haven't gotten there in
many cases yet is an artifact of the enormous difficulty of
getting computers to work for *all* the writing systems and
languages of the world. People are working hard on the
problem, but it is a *big* problem to solve. 

--Ken




Re: Drumming them out

2004-05-04 Thread Michael Everson
At 16:11 -0400 2004-05-04, [EMAIL PROTECTED] wrote:
OTOH, I am quite ignorant of Egyptian demotic as mentioned in the Coptic
proposal, but I am rather surprised to find that it's not on the Roadmaps
anywhere.  Is it unified with hieroglyphic?
No. We don't know enough about its repertoire size.
Finally, I have read the Coptic proposal (I missed the announcement of
it, evidently) and praise it.
One is gratified to hear it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Just if and where is the then?

2004-05-04 Thread African Oracle
Ken I appreciate your detailed response and Peter has also provided an
insightful answer. It is a learning process and I am learning everyday.

Regards

Dele Olawole

- Original Message - 
From: Kenneth Whistler [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, May 05, 2004 2:38 AM
Subject: Re: Just if and where is the then?


 Dele,

  No new composite values will be added. - Peter Constable
 
  The above sounds dictatorial in nature.

 Peter has already explained that this is just the nature
 of the current policy regarding such additions. The reason
 for the policy others in this thread have attempted to
 explain. The short answer is that it would disturb the
 stability of the definition of normalization of data involving
 Unicode characters, and stability of normalization is
 extremely important to many implementations of the standard.

 This said, you need to understand that there is a learning
 curve for people coming new to the Unicode Standard.

 The existence of a policy which constrains certain kinds of
 additions to the standard is not a matter of dictatorial
 proclamations -- it is not something that Peter Constable or
 any other individual has the power to impose.

 Such policies arise out of the consensus deliberations of
 the Unicode Technical Committee, which involve many different
 members, jointly responsible for the technical content of
 the standard. They are also endorsed in the Principles and
 Procedures document for the ISO committee, JTC1/SC2/WG2
 responsible for the parallel, de jure international character
 encoding standard, ISO/IEC 10646. And in that committee,
 decisions are also made based on consensus after discussion
 among members of many different participating national bodies.

 As for the particular issue regarding characters like {e with
 dot below and acute accent}, for example, the policy is not
 in place as a matter of discrimination against particular
 languages or orthographies.

 The *glyph* for {e with dot below and acute accent} can and
 should be in a font for use with a language that requires
 it. Alternatively, the font and/or rendering system should be
 smart enough to be able to apply diacritics correctly.

 But the *characters* needed to represent this are already in
 the Unicode Standard, so the text in question can *already*
 be handled by the standard. Trying to introduced a single,
 precomposed character to do this, instead, would just introduce
 normalization issues into the standard without actually
 increasing its ability to represent what you need to
 represent.

 As Peter has explained, a letter or a grapheme doesn't
 necessarily have a 1-to-1 relationship to the formal,
 abstract character encoded in the Unicode Standard for use
 in representing text.

 You had one example already: gb is a letter in Edo. That
 fact is important for education, for language learning, for
 sorting, and various other things. But that letter is
 represented by a sequence of *characters* already encoded
 in Unicode: 0067, 0062.

 Likewise, if you have an acute accented e with dot below, that
 may constitute a single accented letter in Edo, but it is
 represented by a sequence of *characters* already encoded
 in Unicode: 0065, 0323, 0301.

 These decisions regarding the underlying numbers representing
 these elements of text are *not* required to be surfaced up
 to the level of end users. Properly operating software supporting
 a particular language should present the alphabetic units and
 their behavior to users the way *they* expect them to
 work. The fact that Unicode systems haven't gotten there in
 many cases yet is an artifact of the enormous difficulty of
 getting computers to work for *all* the writing systems and
 languages of the world. People are working hard on the
 problem, but it is a *big* problem to solve.

 --Ken
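
A minimal Python sketch of the sequences Ken mentions (0065, 0323, 0301),
using only the standard unicodedata module; the particular letter is purely
illustrative:

  import unicodedata

  # "e with dot below and acute" is already representable as a sequence of
  # existing characters.  NFC composes as far as the precomposed repertoire
  # allows (U+1EB9 LATIN SMALL LETTER E WITH DOT BELOW) and keeps the acute
  # as a combining mark, because no fully precomposed character exists.
  seq = "\u0065\u0323\u0301"                   # e, dot below, acute
  nfc = unicodedata.normalize("NFC", seq)
  print([f"U+{ord(ch):04X}" for ch in nfc])    # ['U+1EB9', 'U+0301']
  nfd = unicodedata.normalize("NFD", nfc)
  print([f"U+{ord(ch):04X}" for ch in nfd])    # ['U+0065', 'U+0323', 'U+0301']

Either form is canonically equivalent to the other, which is why a new
precomposed character would add nothing to what can be represented while
introducing normalization complications.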








Re: New contribution

2004-05-04 Thread Mark E. Shoulson
Peter Kirk wrote:
On 03/05/2004 19:03, Michael Everson wrote:
Wedding invitations are routinely set in Blackletter and Gaelic 
typefaces. I bet you 20 that if an ordinary Hebrew speaker sent out 
a wedding invitation in Palaeo-Hebrew no one would turn up on the day.

And I bet you 20 that if an ordinary English speaker sent out a 
wedding invitation in Suetterlin no one would turn up on the day. Now 
we just need some gullible couples to put our challenges to the test!
Well, it doesn't need to be a wedding invitation, does it?  I'll give it 
a try; I've downloaded a Sütterlin font, and I'll type up a small 
document and see if I can get some English-readers to read it or 
recognize it.

Even if they can't read it, I'll bet they can recognize it as Latin 
letters and possibly English, which was not so for Paleo-Hebrew and Hebrew.

~mark



Re: New contribution

2004-05-04 Thread Dean Snyder
Michael Everson wrote at 11:07 AM on Monday, May 3, 2004:

If you think that a Hebrew Gemara, with its baroque and 
wonderful typographic richness, can be represented in a Phoenician 
font, then you might as well give up using Unicode and go back to 
8859 font switching and font hacks for Indic.

If you think that a Roman funerary inscription, with its stately and
wonderful typographic formality, can be represented in a modern LED-
inspired font, then ...


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





Re: New contribution

2004-05-04 Thread Dean Snyder
Michael Everson wrote at 8:19 AM on Monday, May 3, 2004:

Phoenician script, on the other hand, is so 
different that its use renders a ritual scroll 
unclean. 

I'm just guessing that the same thing would be true for modern cursive
Hebrew? 

Regardless, since when is the ritual uncleanness of fonts a trigger for
encoding? Just do a Select All and change the font!


Either way, pointed and cantillated text 
displayed in a Phoenician font is a JOKE at best. 
And not a very good one.

The same could be said for accented archaic Greek - do you want to encode
archaic Greek separately?


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





Re: Just if and where is the then?

2004-05-04 Thread jcowan
African Oracle scripsit:

 Are we saying we have exhausted such necessity?

Yes.

 And what are these legacy-standard encodings?

Those devised by ISO, various national governments, IBM, Microsoft, and Apple,
roughly speaking.

 No new composite values will be added. - Peter Constable
 
 The above sounds dictatorial in nature.

It's a statement of fact about the current intentions of the Unicode Consortium.
The time for new precomposed characters has passed.

-- 
XQuery Blueberry DOMJohn Cowan
Entity parser dot-com   [EMAIL PROTECTED]
Abstract schemata   http://www.reutershealth.com
XPointer errata http://www.ccil.org/~cowan
Infoset Unicode BOM --Richard Tobin



Re: Drumming them out

2004-05-04 Thread Mark E. Shoulson
Michael Everson wrote:
At 10:00 -0700 2004-05-04, Peter Kirk wrote:
Out of interest, are there any dictionaries e.g. of the Phoenician 
language which use both Phoenician and Hebrew script, with a plain 
text distinction?

James Kass presented a non-dictionary text the other day. I considered 
it plain text. Others didn't. 
There is no such thing as plain text on paper.
~mark



Re: New contribution

2004-05-04 Thread Dean Snyder
Michael Everson wrote at 9:26 AM on Monday, May 3, 2004:

If you people, after all of this discussion, can think that it is 
possible to print a newspaper article in Hebrew language or Yiddish 
in Phoenician letters, then all I can say is that understanding of 
the fundamentals of script identity is at an all-time low. I'm really 
surprised.

Is it possible to print a newspaper article using archaic Greek letters
and have it still be legible to a modern Greek reader? If not, are you going
to propose encoding archaic Greek separately? (As a reference, one could,
for example, take a glance at the alphabetic chart you provide in figure
1 of your proposal.)


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





Re: New contribution

2004-05-04 Thread Simon Montagu
Mark E. Shoulson wrote:
I'd be interested in such a building.  Anyplace still using Phoenician 
script?  Aside from the Samaritans, whose script has evolved some as 
well...  Wow.
Yes, Wow was exactly my reaction too. I've put some pictures up at 
http://www.smontagu.org/PalaeoHebrew/

It's interesting that the inscription uses modern Hebrew spelling 
conventions and writes  with a mater lectionis, which it doesn't 
have in the Masoretic text of Kings.

The glyphs look to my unexpert eye more like Moabite than Paleo-Hebrew, 
but of course it's a work of art rather than a scholarly presentation, 
and the sculptor may have chosen them from aesthetic considerations.

Next time I'm there, I'll try asking some random passers-by what they 
think the script is. ;-)

Simon



Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread John Cowan
Mark E. Shoulson scripsit:

 If it were possible to do this, couldn't we rearrange everything so that 
 the points were NOT screwed up like they are?

No.  The numbers assigned to the various canonical combining classes
are arbitrary so they can be renumbered, but which characters belong to which
classes, and the order of the classes, are both immutable.

Think RESEQUENCE from Basic.

-- 
LEAR: Dost thou call me fool, boy?  John Cowan
FOOL: All thy other titles  http://www.ccil.org/~cowan
 thou hast given away:  [EMAIL PROTECTED]
  That thou wast born with. http://www.reutershealth.com
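
A rough Python illustration of the point about combining classes (a sketch
only, using the current published class values): canonical ordering in NFD
sorts marks by class value, so only the relative order of the classes is
observable, not the particular numbers.

  import unicodedata

  # Canonical ordering sorts a run of combining marks by class value; the
  # numbers are arbitrary labels, but their relative order determines the
  # normalized result.
  alef, patah, hiriq = "\u05D0", "\u05B7", "\u05B4"   # classes 0, 17, 14
  nfd = unicodedata.normalize("NFD", alef + patah + hiriq)
  print([unicodedata.combining(ch) for ch in nfd])    # [0, 14, 17] - marks reordered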



Re: Just if and where is the sense then?

2004-05-04 Thread R.C. Bakhuizen van den Brink [Rein]
So why can we have zillions of CJK code points and make a fuss about
a few single code points that must be composed by ever-growing 
intelligent display software that is also supposed to run on all 
platforms? 

So why are we unifying all middle east past and present
scripts? 

Why are the few academics here taking up all the bandwidth
in this group - how many messages has Mr. P.K. sent lately?

So how come the majority of Polish people living abroad - let's say
40 million against 40 million living in Poland - are not able to use
their native characters - also called 'ogonki' - in their e-mails?

Just let us KISS

gtx, Rein
 

On Tue, 4 May 2004, African Oracle wrote:
If 'a' can have U+0061 and have a composite that is U+00e2...U+...
If 'e' can have U+0065 and have a composite that is U+00ea...U+...

Then why cannot e with grave or acute accent and dot below be assigned
a single Unicode value instead of the combining sequence 1EB9 0301,
etc.?

Since Unicode is gradually becoming a de facto standard, I still think it
would not be a bad idea to have such composite values.

Dele Olawole








Re: Proposal to add QAMATS QATAN to the BMP of the UCS

2004-05-04 Thread Ernest Cline



 [Original Message]
 From: Mark E. Shoulson [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Date: 5/4/2004 7:49:45 PM
 Subject: Re: Proposal to add QAMATS QATAN to the BMP of the UCS

 Peter Kirk wrote:

  It would actually be possible, although I am not sure if it is useful, 
  to rearrange all the fixed position classes to make a space for QAMATS 
  QATAN next door to QAMATS. 

 If it were possible to do this, couldn't we rearrange everything so that 
 the points were NOT screwed up like they are?

Depends on what you mean by screwed up.

Let f(c) be a function that returns the current canonical combining
class of character c.  It is possible to change the classes so that
the values would instead be returned by a new function g(c), where f(c)
and g(c) need not be equal for all values of c, but there are restrictions
on how far the change could go.  In particular, for all characters x and y
currently defined in Unicode, the following must be true.

If f(x) < f(y) then g(x) < g(y).
If f(x) = f(y) then g(x) = g(y).
If f(x) > f(y) then g(x) > g(y).

Basically, all this allows is that if there were a need to give a character
a class between the current 18 and 19, Unicode could, for example,
add 1 to all of the classes that are 19 or greater and give the new
character a class of 19.  If Unicode allowed non-integral combining
classes, it would be simpler to give the new character a class of 18.5.
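
A hypothetical sketch of such a renumbering in Python (g below is invented
purely for illustration; no such change is actually planned):

  import unicodedata

  def f(ch: str) -> int:
      # Current canonical combining class of ch.
      return unicodedata.combining(ch)

  def g(ch: str) -> int:
      # Hypothetical renumbering that opens a gap above class 18, so a new
      # character could be assigned class 19 without changing the relative
      # order of any pair of existing characters.
      return f(ch) + 1 if f(ch) >= 19 else f(ch)

  # The three conditions above hold for any two existing characters, e.g.:
  x, y = "\u05B4", "\u0301"    # HEBREW POINT HIRIQ (14), COMBINING ACUTE ACCENT (230)
  assert (f(x) < f(y)) == (g(x) < g(y))
  assert (f(x) == f(y)) == (g(x) == g(y))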







Re: New contribution

2004-05-04 Thread Dean Snyder
[EMAIL PROTECTED] wrote at 12:44 PM on Monday, May 3, 2004:

Please take a look at the attached screen shot taken from:

www.yahweh.org/publications/sny/sn09Chap.pdf 

If anyone can look at the text in the screen shot and honestly
say that they do not believe that it should be possible to
encode it as plain text, then the solution is obvious:

We'll disagree.

Why, because you want to be able to retain in a plain text encoding the
larger font size in the heading "The First Syllable 'Yah'"?  ;-)

This whole document requires rich text.

If I substituted modern cursive Hebrew letter forms for the Palaeo-Hebrew
(to contrast them with the classical square Hebrew), would you want to
encode those too?


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi





RE: New contribution

2004-05-04 Thread Dean Snyder
Peter Constable wrote at 8:58 AM on Tuesday, May 4, 2004:

Ah, so the next protracted debate is going to be whether Samaritan
should also be encoded using the existing square Hebrew characters.
Since it would appear that the argument for unification of PH with
Hebrew could also argue for unification of PH with Samaritan, or of all
three.

Correct.

Samaritan, unlike Old Hebrew, which adopted Aramaic forms during and
after the Babylonian exile, has retained the Phoenician/Canaanite forms.

The main complication I see with encoding Samaritan, different from what
we are currently discussing, is the reality of its still-living,
long-preserved script and religious tradition.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi




