Re: Proposal to add QAMATS QATAN to the BMP of the UCS
Michael Everson wrote: A new contribution. http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf N2755 Proposal to add QAMATS QATAN to the BMP of the UCS Michael Everson Mark Shoulson

Nice.

8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? No.

Is this overstating the case? As Mark said on the Hebrew list a little while ago: Things like the Simanim Tehillim and the Simanim Tiqqun are almost a poster case of fancy text. Their very selling point is that they are clearer and make more distinctions than plain printing. It's when such conventions enter the mainstream (and there's obviously a continuum in that regard, and room for disagreement) that we start to consider them plain-text distinctions and thus to be encoded separately.

I think it would be good if the proposal anticipated the objection that qamats qatan could be considered a presentation form or glyph variation of qamats, and provided the counter-arguments. (Or would answering Yes to 8a just guarantee rejection?)

<flippancy>Isn't it a little strange that a short qamats should be represented with a longer vertical than a regular qamats?</flippancy>
Re: New contribution
Michael Everson wrote: No Georgian can read Nuskhuri without a key. I maintain that no Hebrew reader can read Phoenician without a key. I maintain that it is completely unacceptable to represent Yiddish text in a Phoenician font and have anyone recognize it at all.

But no one is going to do that. No one is talking about doing that. This is a complete irrelevancy.

JH

--
Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: Nice to join this forum....
Michael Everson wrote: This is no different from Welsh: A B C CH D DD E F FF G NG. All of those are considered letters in the Welsh alphabet. They are all significant. But that doesn't mean that ch and dd get encoded as single entities. They write c + h and d + d. In Yoruba, you treat gb as a letter. That is fine. But you encode it with g + b.

Isn't there something in the FAQ about this? We've been through the discussion of digraph (and trigraph and tetragraph) encoding several times, and generally confusion stems from not understanding that higher-level protocols are expected to handle rendering and things like sorting and spellchecking.

John Hudson

--
Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: New contribution
Michael Everson wrote: Hebrew has the same 22 characters, with the same character properties. And a baroque set of additional marks and signs, none of which apply to any of the Phoenician letterforms, EVER, in the history of typography, reading, and literature.

And a baroque set of additional marks and signs, none of which apply to any of the STAM letterforms...

I'm not arguing against the 'Phoenician' proposal: I just don't find many of these arguments very convincing. The fact that one style of lettering sometimes has combining marks applied and another doesn't does not seem a compelling reason not to unify them.

John Hudson

--
Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: New contribution
Michael Everson wrote: If you people, after all of this discussion, can think that it is possible to print a newspaper article in Hebrew language or Yiddish in Phoenician letters, then all I can say is that understanding of the fundamentals of script identity is at an all-time low. I'm really surprised.

I can't believe anyone is even talking about typesetting newspapers in Hebrew or 'Phoenician' letters: this is a total irrelevancy. I wouldn't typeset a Russian newspaper in vyaz'-style letters, either, but that doesn't make vyaz' a separate script from Cyrillic. Treating particular letterforms as glyph variants of existing characters does not imply that these letterforms are suitable for any text that might be encoded with those characters. So far as I can tell, no one is arguing such nonsense. The issue is not whether Palaeo-Hebrew letterforms are readable by modern Jews, or whether they may be used in religious texts -- and I note that you are not suggesting that STAM should be separately encoded, even though it is the *only* style approved for use in Torah scrolls. The issue is how ancient texts should be encoded.

John Hudson

--
Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: New contribution
Mark Davis wrote: The question for me is whether the scholarly representations of Phoenician would vary enough that in order to represent the palaeo-Hebrew (or the other language/period variants), one would need to have font differences anyway. If so, then it doesn't buy much to encode separately from Hebrew. If not, then it would be reasonable to separate them.

Given the sophistication of today's font technology, I don't think the encoding question can be addressed in this way. Regardless of whether 'Phoenician' letterforms are separately encoded, it is perfectly easy to include glyphs for these and for typical Hebrew square script (or any of a number of other different Hebrew script styles) in a single font. If the 'Phoenician' forms are not separately encoded, they can still be accessed as glyph variants using a variety of different mechanisms. The question is whether the distinction is necessary in plain text.

John Hudson

--
Tiro Typeworks  www.tiro.com
Vancouver, BC  [EMAIL PROTECTED]

I often play against man, God says, but it is he who wants to lose, the idiot, and it is I who want him to win. And I succeed sometimes in making him win. - Charles Peguy
Re: CJK(B) and IE6
On Sun, 2 May 2004 12:14:29 -0700, Doug Ewell wrote: jameskass at att dot net wrote: The BabelPad editor can easily convert between UTF-8 and NCRs... As can SC UniPad. For $199 (unless you're only interested in editing files up to 1,000 characters in length). Andrew
Re: Nice to join this forum....
From: John Hudson [EMAIL PROTECTED]

Philippe Verdy wrote: I thought about missing African letters like barred-R, barred-W, etc., with combining overlay diacritics (whose usage has been strongly discouraged within Unicode). Maybe a font could handle these combinations gracefully with custom glyph substitution rules similar to the automatic detection of ligatures. But maybe they should not, if Unicode will ultimately encode these characters separately without any canonical equivalence with the composed sequence.

Having spent weeks researching African orthographies a few years ago, I'm inclined to think that such barred letters should be separately encoded: they constitute new Latin letters, not combinations of elements within orthographies such as base letters and combining marks. A problem, however, is that many such forms are found in unstable orthographies, and are difficult to document adequately for inclusion in proposals.

This last argument should not be an obstacle to encoding them. After all, they are used for living languages in danger of extinction, and even if documents using them are rare, encoding them would help preserve these languages and support the development of literacy in them. Without them, the instability of the orthographies will always be a problem, aggravated by the absence of a standard to represent them adequately in any encoding or charset, so that even book publishers and authors will need to use their own approximations or unstable private conventions to represent them. The case of Berber (in the Latin script) is telling, if you just look at the number of resources on the web that use various conventions to represent its alphabet (some hacks use symbols like '$', underscores, middle dots, non-combining diacritics, Greek letters...)
Today, a stable encoding for the missing letters is the first condition for stabilizing the orthographies, a necessary first step toward developing the educational content needed to improve literacy in the corresponding languages. This is really needed because electronic forms of texts are the most cost-effective way to create and publish texts. The historic mechanical alternatives cost too much, and they won't be used before a sustainable practice of electronically composed publication has developed. For many languages using the Latin script, only a very limited number of specific letters are needed. Encoding and documenting them will help foundries extend their standard electronic fonts with the few glyphs needed for them. I do think that non-governmental educational organizations working in Africa to improve literacy would find a greater audience if they could finance the production of educational documents in the native languages, and not only in a few official languages (most often French, English and Arabic in Africa) that are still foreign to local populations, who feel that these are the languages of the empowered government. Also, the cultural division between local populations does not help peace in these often troubled regions, and the promotion of culture is certainly one of the means to give back some power, pride and freedom to these populations, as a factor for peaceful coexistence and development.
Re:CJK(B) and IE6
[Earlier posting lost, it seems.] James Kass writes: The lack of support for supplementary characters expressed in UTF-8 in Internet Explorer is a bug. As Philippe Verdy mentions, the Mozilla browser does not have this same bug. Also it should be noted that the Opera browser handles non-BMP UTF-8 just fine.

As I said in my starting message, Mozilla copes with everything, both UTF-8 and NCRs, over the whole CJK range. However, Opera (in my experience) cannot do Ext B in either UTF-8 or NCRs. IE6 cannot cope with Ext A in UTF-8, but will do so in NCRs. I attach two short files (produced by Hanfind) that include both extensions, one in UTF-8 and the other in NCRs (except that characters given within the text are all NCRs). While working with NCRs may be an ugly nightmare, there are some shortcuts. BabelPad is great, but it chokes when converting all the UTF-8 in unihan.txt to NCRs in one go; I wrote a dedicated program to do that. I *think* that Windows 2000 always uses Unicode internally and uses an internal conversion chart if material is non-Unicode like GB-18030. That at least is what is declared at http://www.i18nguy.com/surrogates.html.
Raymond Mercier

[Attachment: Definition Search -- a listing of Unihan entries (code point, reading, definition) for dozens of characters whose definitions mention 'house', spanning the URO and CJK Extension A; the CJK glyphs themselves and their cross-references did not survive the plain-text conversion, so the listing is not reproduced here.]
Re: Pal(a)eo-Hebrew and Square Hebrew
From: Dean Snyder [EMAIL PROTECTED]

Patrick Andries wrote at 8:55 AM on Monday, May 3, 2004: I got this answer from a forum dedicated to Ancient Hebrew: « Very few people can read let alone recognize the paleo Hebrew font. Most modern Hebrew readers are not even aware that Hebrew was once written in the paleo Hebrew script.

The same could be said for archaic Greek versus modern Greek - do you propose to encode archaic Greek separately?

Why not, if it helps better serve the scholars, researchers, students and script enthusiasts, so that they can represent this historic script more accurately than with the modern form? After all, when I look at some medieval French texts written in what we call écriture gothique (Gothic script), with its historic orthography and letters (with the long s notably, with the absence of modern accents, and with very distinct and complex letter shapes), many native French speakers have a lot of difficulty recognizing it as French, thinking that it could be written in Latin. They will recognize that the letters are really beautiful, but will often be intrigued by some of them, misidentifying others (b/p, o/u/v, d/a, i/n/u...); uppercase letters are even harder to decipher... And this is what happens with carefully typeset publications. The situation is even worse with manuscripts written with a quill (quite similar to the German Sütterlin). We don't need to go far back in history: WW1-era handwritten letters from soldiers to their families, using the letter forms commonly taught in schools at the time (most of these letters are extremely stable in their letter forms and carefully drawn, from a typographic point of view), are very difficult for most native French speakers to read, even though they use the same modern popular French language and vocabulary as used and understood today...
Re: 05A2 or 05BA? (was: Proposal to add QAMATS QATAN to the BMP of the UCS)
From: Michael Everson [EMAIL PROTECTED] A new contribution. http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf N2755 Proposal to add QAMATS QATAN to the BMP of the UCS Michael Everson Mark Shoulson

I note that your document inconsistently uses two different code points: it proposes the inclusion of U+05BA, but documents U+05A2 in the proposed Unicode Character Properties... Both code points are unassigned in Unicode. Which one is proposed?
RE: Proposal to add QAMATS QATAN to the BMP of the UCS
[Original Message] From: Michael Everson [EMAIL PROTECTED] A new contribution. http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2755.pdf N2755 Proposal to add QAMATS QATAN to the BMP of the UCS Michael Everson Mark Shoulson

Given the description in the proposal, which indicates that this character has its origin as a glyphic variant of QAMATS, it would seem to me that this new character's canonical combining class should either be the same as that of QAMATS, which is 18, or perhaps a new fixed-position combining class, and not the 220 given in the proposal for QAMATS QATAN. The sequence 05D2 05A4 05B8 normalizes to 05D2 05B8 05A4, placing the vowel point QAMATS before the cantillation mark. However, as proposed, 05D2 05A4 05BA would remain in that order, leaving QAMATS QATAN as the only Hebrew vowel point that does not uniformly normalize to a position before the cantillation marks.

Also, the proposal gives two different potential code points for QAMATS QATAN, referring to it in one place as 05BA and in another as 05A2. While both are unused code points, it would probably be better to place it among the other vowel points, which would make 05BA the better choice.
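The reordering behaviour Ernest describes can be checked directly with Python's unicodedata module (a quick illustration: U+05A4 is a cantillation mark below the letter with canonical combining class 220, while U+05B8 QAMATS has class 18, so canonical ordering moves the vowel point in front):

```python
import unicodedata

# GIMEL (05D2) + MAHAPAKH cantillation mark (05A4, ccc 220) + QAMATS (05B8, ccc 18)
seq = "\u05D2\u05A4\u05B8"
norm = unicodedata.normalize("NFD", seq)

# Canonical ordering sorts the combining marks by class, 18 before 220,
# so the vowel point ends up before the cantillation mark.
print([f"{ord(c):04X}" for c in norm])  # ['05D2', '05B8', '05A4']
```

Under the class 220 proposed for QAMATS QATAN, substituting it for QAMATS in the sequence above would leave the marks unreordered, which is exactly the inconsistency pointed out here.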
Re: Pal(a)eo-Hebrew and Square Hebrew
Dean Snyder wrote: Patrick Andries wrote at 8:55 AM on Monday, May 3, 2004: I got this answer from a forum dedicated to Ancient Hebrew: « Very few people can read let alone recognize the paleo Hebrew font. Most modern Hebrew readers are not even aware that Hebrew was once written in the paleo Hebrew script.

The same could be said for archaic Greek versus modern Greek - do you propose to encode archaic Greek separately?

[PA] I'm proposing nothing here; I'm just forwarding an answer.

When the text was written in the paleo Hebrew script, four of the Hebrew letters were used as vowels - aleph, hey, vav and yud - but were removed from the text when the Masoretes added the vowel pointings. This is evident in the Dead Sea Scrolls, where the four letters are found in the words but removed in the Masoretic text.

This is simply not true.

[PA] So there were Dead Sea Scrolls written in Square Hebrew with matres lectionis? (I don't know, I just would like to know.)

P.A.
A binary file format for storing character properties
At this time there are about 160 different character properties defined in the UCD. In practice most applications probably use only a limited set of properties. Nevertheless, applications should be able to look up all the properties of a code point. Compiling in lookup tables for all defined properties (including Unihan) makes small applications rather big. This made me decide to create a binary file format for storing character properties and to initialize property lookup tables on demand. Benefits of using run-time loadable lookup tables initialized from binary files are:
- no worries about total table size, since data will only be loaded on demand
- initializing lookup tables from a binary file is relatively fast
- property lookup files can be locale-specific (useful for character names and case mappings, for example)
- new properties can be added quickly and never affect the layout or content of other tables
- any number of properties can be supported, including custom (non-Unicode) properties
- by initializing a lookup table from two sources (UCD-based and vendor-based), applications can override the default property values assigned to PUA characters with private property values

The file format I've implemented is capable of storing any type of property. Each file contains the values for one property (no more squeezing as many property values as possible into as few bits as possible). The format is called UPR (Unicode PRoperties). I have written a tool to generate the necessary UPR files from the UCD. A small C library for reading a UPR file into a property lookup table, and a high-level library which provides property lookup functions for *all* Unicode properties in 4.0.0, are also available. For more information on the file format and related software see: http://www.let.uu.nl/~Theo.Veenker/personal/projects/upr/. My primary development platform is UNIX/Linux, but you can compile and run it under Windows as well (less tested, however).
Current version supports UCD 4.0.0, I will add support for 4.0.1 soon. Please check it out. Feedback is welcome. Regards, Theo Veenker
[Fwd: Re: New contribution]
03/05/2004 05:19, Michael Everson wrote: Suetterlin. Oh shut UP about Sütterlin already. I don't know where you guys come up with this stuff. Sütterlin is a kind of stylized handwriting based on Fraktur letterforms and ductus. It is hard to read. It is not hard to learn, ...

Since when is this an argument? Neither is Phoenician hard to learn (22 letters with no contextual variants, etc.)... Could we please remain courteous?

... and it is not hard to see the relationship between its forms and Fraktur. ...

The relationship is not at all apparent to someone who reads only the Latin script and does not know the genealogy from the Fraktur script to the German script (as Sütterlin was also called). (I like mentioning that people saw them as different scripts.) Quite analogous to a set of historically related Northern Semitic scripts; obviously, if you have learned the genealogy of these scripts, it is easy to recognize the relationship...

P. A.
Re: New contribution
At 23:08 -0400 2004-05-03, John Cowan wrote: [EMAIL PROTECTED] scripsit: Those objections are quite generic and could be made just as well for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c. But there is no clear-cut alternative for any of those. N'ko encoding is font-kludge, Unicode, or nothing. Here there is a fourth possibility: decide that Phoenician is a script variant in the sense of ISO 15924.

But it would be wrong to do that.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
At 03:01 + 2004-05-04, [EMAIL PROTECTED] wrote: John Cowan wrote, (And to the last, I'd be tempted to add: If so, what on Earth could those objections be?) Expense. Complication. Delays while the encoding gets into the Standard and thence into popular operating systems, with all the accoutrements such as keyboard software. Those objections are quite generic and could be made just as well for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c. While those objections might be voiced by actual users, none of those objections should impact the decision-making process.

Hear, hear.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Proposal to add QAMATS QATAN to the BMP of the UCS
At 00:19 -0400 2004-05-04, Ernest Cline wrote: It would seem to me that it would be appropriate that this new character's canonical combining class should either be the same as that of QAMATS which is 18 That is correct. We overlooked the properties line in the proposal, the template for which was the earlier ATNAH HAFUKH document. Sorry about that. It should read: 05BA;HEBREW POINT QAMATS QATAN;Mn;18;NSM;N;;*;;; ... unless there is an additional error ;-) Thanks for reading the proposal. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
At 20:37 -0800 2004-05-03, D. Starner wrote: Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see how many objections you get.

I should think far fewer; the legibility quotient is much different. I have said before: Set a German or Danish or Icelandic wedding invitation in Fraktur. No problem. Set an Irish wedding invitation in Gaelic. No problem. Set a Hebrew wedding invitation in Palaeo-Hebrew. Problem. It's easy to claim that Fraktur and Gaelic are hard to read, but they AREN'T, and their use in invitations, menus, and signage is testament to that. The same does not obtain with Phoenician letterforms and Hebrew.

Again, no, you can't use archaic forms of letters in many situations, but that doesn't mean they aren't unified with the modern forms of letters.

From where I sit, it sure does.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
At 11:42 -0700 2004-05-03, John Hudson wrote: Michael Everson wrote: Hebrew has the same 22 characters, with the same character properties. And a baroque set of additional marks and signs, none of which apply to any of the Phoenician letterforms, EVER, in the history of typography, reading, and literature. And a baroque set of additional marks and signs, none of which apply to any of the STAM letterforms...

Stam letterforms clearly belong to the Square Hebrew tradition. Phoenician letterforms do not. Historical relationships have informed, do inform, and will continue to inform some of the choices we make in determining what to encode in the Universal Character Set.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
A possible question to ask which is blatantly leading would be: Would you have any objections if your bibliographic database application suddenly began displaying all of your Hebrew book titles using the palaeo-Hebrew script rather than the modern Hebrew script, and the only way to correct the problem would be to procure and install a new font?

Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see how many objections you get. Again, no, you can't use archaic forms of letters in many situations, but that doesn't mean they aren't unified with the modern forms of letters. No one would have to procure and install a new font, because Arial/Helvetica/FreeSans/misc-fixed have the modern form of Hebrew and will always have the modern form of Hebrew and of all other scripts that have a modern form. I mean, maybe you're right and Phoenician has glyph forms too far from Hebrew's to be useful, and it's connected with Syriac and Greek as much as Hebrew, but this argument just doesn't fly.
Re: New contribution
At 12:13 -0700 2004-05-03, John Hudson wrote: Michael Everson wrote: No Georgian can read Nuskhuri without a key. I maintain that no Hebrew reader can read Phoenician without a key. I maintain that it is completely unacceptable to represent Yiddish text in a Phoenician font and have anyone recognize it at all. But no one is going to do that. No one is talking about doing that. This is a complete irrelevancy. No, it is not. If Phoenician letterforms are just a font variant of Square Hebrew then it is reasonable to assume that readers of Square Hebrew will accept them in various contexts. Such as newspaper articles, or advertising copy, or restaurant menus, or wedding invitations. THAT is font switching. I consider this fundamental to script identification. The accident of 1:1 correspondence to another alphabet is not, in my view, sufficient justification for unifying them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Drumming them out
At 11:53 +1000 2004-05-01, Nick Nicholas wrote: Coptic could have stayed unified with Greek,

Certainly not!

and myself I'm still not convinced the distinction between Greek and Coptic in bilingual editions is not truly just a font issue. Plain-text searching of Crum's dictionary, for instance, is a perfectly valid requirement, and one which was brought to bear on the disunification. So the question again becomes, not whether the scripts are historically or graphemically distinct, but what the body of users is that wants them disunified.

The distinction itself is a strong reason to disunify. We've done that with other scripts. And we will again, I'll warrant.

And the "fonts are k00l" crowd of enthusiasts :-) which the review of hieroglyphics has already mentioned; and I know we shouldn't dismiss them out of hand and all, but why can't they be accommodated by a font switch too?

Because we are beyond ASCII font hacks. The Phoenician block will allow font switching between a recognizably similar family of writing systems. Same as we have for Syriac, or for Old Italic. And remember -- most Etruscan scholars transliterate. But Unicode is not elitist. It's universal.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re:CJK(B) and IE6
Raymond Mercier wrote, BabelPad is great, but it chokes when converting all the UTF-8 in unihan.txt to NCRs in one go. I wrote a dedicated program to do that.

Yes, when the commas in UNIHAN.TXT were being globally replaced with middle dots here, BabelPad stopped responding. But then Andrew wrote to the list with a tip about the undo/redo feature (just in time; I was about to write a dedicated program). When making global changes in such a large file: Options - Advanced Options - (Edit Options) - make sure the box for Enable Undo/Redo is not checked.

Best regards, James Kass
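For anyone without BabelPad or UniPad to hand, the UTF-8-to-NCR conversion this thread keeps returning to takes only a few lines in, say, Python (a generic sketch, not a reimplementation of either tool: every character outside ASCII becomes a hexadecimal numeric character reference, and supplementary-plane characters become a single reference rather than a surrogate pair):

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with an &#x...; reference."""
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};"
        for ch in text
    )

# A URO character, an Extension A character, and an Extension B character:
print(to_ncr("\u6C49\u3400\U00020000"))  # &#x6C49;&#x3400;&#x20000;
```

Applied to a whole file, read it with encoding='utf-8' and write the result as plain ASCII; since every non-ASCII character is escaped, the output displays in any browser that handles NCRs, which is exactly the IE6 workaround described above.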
RE: New contribution
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Hudson No Georgian can read Nuskhuri without a key. I maintain that no Hebrew reader can read Phoenician without a key. I maintain that it is completely unacceptable to represent Yiddish text in a Phoenician font and have anyone recognize it at all. But no one is going to do that. No one is talking about doing that. This is a complete irrelevancy. Michael's argument here is based on the premise that if the communities that use script A cannot readily interpret text in their language when written with a written variety (and distinct-script candidate) B, then B is distinct from A. It *is*, IMO, a valid consideration, but it alone isn't a sufficient criterion. Note, for instance, that one could apply that argument to try to justify a Latin cipher. Peter Constable
Re: New contribution
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Francois Yergeau

Suppose I were to float a proposal to encode Old Latin, consisting of the original 23-letter unicameral alphabet. Try this on for size: It is false to suggest that fully-[accented, cased Vietnamese] text can be rendered in [Old Latin] script and that this is perfectly acceptable to any [Vietnamese] reader (as would be the case for an ordinary font change). Would you agree to encode Old Latin on those grounds?

I think there is a difference between this hypothetical example and the PH case: Old Latin doesn't have the accents, but if you used the 23 unicameral characters for Vietnamese text, then surely a Vietnamese speaker would recognize it as caseless Vietnamese with the accents stripped off. And it's easy to see how the accents could be added to the Old Latin letters, making the result even closer to Vietnamese text, only without the case distinction. But if you took Biblical Hebrew text and set it with PH glyphs without accents, there are a lot of people who know Biblical Hebrew who would not recognize the sample as Biblical Hebrew. And there is no obvious way to add the accents; but even if there were, I suspect those same people still wouldn't recognize it as accented Hebrew with archaic glyphs. So, while Michael's argument was flawed in the way he expressed it, I think your counter-argument is also flawed.

Peter Constable
Re: New Contribution: In support of Phoenician from a user
Peter Kirk [EMAIL PROTECTED] wrote: On 02/05/2004 11:57, Deborah W. Anderson wrote: As one coming from the world of ancient Indo-European (IE) and as editor of a journal on IE out of UCLA, I am in support of the Phoenician proposal. Thank you, Deborah. You have given what is to me a much better argument for separate encoding of the Phoenician script than any that I have seen before, from the proposer or anyone else. I find your point about ensuring that XML documents are correctly displayed especially significant. If your support had been cited in the original proposal with your arguments, rather a lot of spilled electrons could have been saved. Well, I guess it is not too late to include them in a revised proposal. No need to add it to the proposal itself - something like this should really be formally submitted to WG2 as a separate document in support of the proposal. - Chris
Re: Defined Private Use was: SSP default ignorable characters
Doug Ewell [EMAIL PROTECTED] wrote: C J Fynn cfynn at gmx dot net wrote: Philippe Verdy [EMAIL PROTECTED] wrote: Certainly, but what is the distinction between downloading/distributing a font or downloading/distributing an XML file containing the PUA conventions? One file not two - and some assurance that the custom properties haven't been altered since the font and the document that uses it were created. I didn't see Philippe's original post, of course, for reasons that many list members will remember. But this response from Chris piqued my curiosity. So I went digging into my Deleted Items folder, found the relevant post from Philippe, and guess what? A miracle happened. I AGREE WITH PHILIPPE. That is, if there is ever to be a mechanism for specifying properties of PUA characters at the user level (Mark Davis' expectation notwithstanding), I agree that it should live in an external file or table or other data structure, not within a font. And XML would be a perfectly suitable format for distributing such a property file. Not all font formats, not even all smart font formats, can contain all of the property information for every character the font supports. OpenType/Uniscribe was mentioned as an example where the rendering engine does work that would be done by the font in other systems. The division of labor between font and engine isn't the same across systems. And even if you can tell the font about the directionality and default-ignorability of your characters, there are still issues like line breaking and mirroring (and maybe others, or maybe those are bad examples) that have to be handled outside the font anyway. Putting all the property information inside the font forces the user to use *only that font* for his PUA needs. There might be a choice of fonts that support a particular PUA usage (such as for Klingon -- Mark Shoulson, is this true?) 
and it would not make sense to require all of these fonts to be updated to include property information (if that is even possible). Better to store the property information separately and make it work for any old font the user chooses. Storing the custom properties in the font doesn't really provide any assurance that they haven't been altered. Philippe's suggestion is good. I've no real objection to storing the property information in an external XML file - storing them in a font table was just a suggestion. However, even if some of the info has to be handled outside of the font rendering system you could store any kind of property info in any sfnt format font (TT, OT, AAT, Graphite) which allows you to add custom tables - so long as the specification for such a table was designed to hold all the properties that might be needed. I'm not sure whether anyone would want to use non-standard properties for such PUA text where they didn't have a font that supported the properties for display. Given the nature of the PUA, generic property files supposed to work for any old font the user chooses might be problematic. Where a script hasn't been standardized different developers might wish to use different character properties. One of the reasons I suggested putting the properties in the font was that you would then be fairly certain of having the properties that font was designed to work with (and avoid the need of having someone maintain something like a ConScript character properties registry). Anything like this should of course be expressly limited to PUA characters. - Chris
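[Editorial aside: since the thread above sketches what an external XML property file for PUA characters might contain, here is a minimal illustration. The XML vocabulary (puaProperties, char, cp, gc, bidi) is invented for this example; no such format was ever standardized. The F8D0 values follow the unofficial ConScript registry's Klingon allocation, purely as sample data.]

```python
# Hypothetical sketch of an external PUA property file, as discussed above.
# Element and attribute names are invented for illustration only.
import xml.etree.ElementTree as ET

PUA_PROPERTIES = """\
<puaProperties>
  <!-- One entry per PUA code point; fields mirror UnicodeData.txt. -->
  <char cp="F8D0" gc="Lo" bidi="L" name="KLINGON LETTER A"/>
  <char cp="F8D1" gc="Lo" bidi="L" name="KLINGON LETTER B"/>
</puaProperties>
"""

def load_pua_properties(xml_text):
    """Parse the property file into {codepoint: {property: value}}."""
    root = ET.fromstring(xml_text)
    props = {}
    for char in root.findall("char"):
        cp = int(char.get("cp"), 16)
        props[cp] = {k: v for k, v in char.attrib.items() if k != "cp"}
    return props

props = load_pua_properties(PUA_PROPERTIES)
print(props[0xF8D0]["gc"])  # general category supplied by the file, not the font
```

Such a file would work with any font covering those code points, which is exactly the advantage Doug argues for over font-embedded property tables.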
Re: New contribution
John Hudson [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: While the fact that it's called Phoenician script doesn't prove anything about its origin, it might be considered indicative of the path through which the script was borrowed. Indeed. This is the point I made earlier: Greco-centric European scholarship of writing systems calls the script 'Phoenician' because the Greeks derived their alphabet from trade contact with the Phoenicians. As should be obvious from recent debate, semiticists look at the old Canaanite writing systems in a different way. So are Greco-centric European scholars / Indo-Europeanists the user community which some were trying to say doesn't exist? - Chris
RE: New contribution
What are the directional properties of Phoenician? Is it RTL only, or was it ever written with a different directionality? Peter Constable
RE: New contribution
Peter Constable wrote: the Old Latin doesn't have the accents, but if you used the 23 uni-cameral characters for Vietnamese text, then surely a Vietnamese speaker would recognize it as caseless Vietnamese with the accents stripped off. ... So, while Michael's argument was flawed in the way he expressed it, I think your counter-argument also is flawed. Hmmm, I'm not sure it's flawed. Sure, recognizability makes it non-equivalent to the Phoenician-Hebrew case, but it still demonstrates that a subset-superset relationship between purported scripts A and B does not make them distinct. Recognizability is a much better argument, IMHO, but then there's Sütterlin... And cyphers, as you mention in another message. -- François
Re: New contribution
Peter Constable scripsit: 2) the characters in question are structurally / behaviourally very similar to square Hebrew characters, but not to the characters of other scripts Not just very similar: structurally, behaviorally, and even phonemically identical. Item 1, I think we'd agree, is just wrong. Item 2 is probably true. But is it enough to refer to square Hebrew as the modern form of Phoenician (Old Canaanite, whatever you want to call it)? Well, one of the two modern forms, Samaritan being the other. -- John Cowan [EMAIL PROTECTED] www.reutershealth.com www.ccil.org/~cowan It's the old, old story. Droid meets droid. Droid becomes chameleon. Droid loses chameleon, chameleon becomes blob, droid gets blob back again. It's a classic tale. --Kryten, Red Dwarf
Re: New contribution
Peter Constable scripsit: What are the directional properties of Phoenician? Is it RTL only, or was it ever written with a different directionality? It's RTL only, except to the extent that you consider Archaic Greek a script variant of Phoenician. :-) -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com Any sufficiently-complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp. --Greenspun's Tenth Rule of Programming (rules 1-9 are unknown)
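[Editorial aside: as a historical footnote, when Phoenician was eventually encoded (Unicode 5.0, block U+10900..U+1091F), its letters were indeed given bidirectional class R, matching Cowan's answer. A quick check with Python's unicodedata module, which post-dates this thread, confirms it.]

```python
import unicodedata

# PHOENICIAN LETTER ALF, as eventually encoded at U+10900 (Unicode 5.0).
alf = "\U00010900"
print(unicodedata.name(alf))           # PHOENICIAN LETTER ALF
print(unicodedata.bidirectional(alf))  # R (right-to-left, same class as Hebrew letters)
```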
Re: Nice to join this forum....
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: A problem, however, is that many such forms are found in unstable orthographies, and are difficult to document adequately for inclusion in proposals. This last argument should not be a limitation to encode them. After all they are used for living languages in danger of extinction, and even if documents using them are rare, encoding them would help preserve these languages and help the development of their literacy. This is expressly NOT a goal of Unicode and ISO/IEC 10646: to encode newly invented, possibly ephemeral, letters on the basis that doing so might encourage literacy and save a language from extinction. As someone once said -- I don't know who, but it sounds like John Cowan -- we already have several hundred Latin letters in Unicode; it shouldn't be difficult to pick one of those when developing a new orthography, instead of inventing yet another way to write [t]. The danger of encoding novel characters on speculation that they might be useful is that if they *don't* turn out to be useful, or if a revised version of the orthography replaces them with something else, Unicode and 10646 are stuck with unwanted characters, which cannot be removed for stability reasons. The Euro sign is a classic counterexample where strong promises of stability and usefulness (which have been amply borne out) outweighed the newly invented nature. See the Principles and Procedures document for more information. Without them, the instability of orthographies will always be a problem favored by the absence of a standard to represent them adequately in any encoding or charset, so that even book publishers and authors will need to use their own approximations or unstable private conventions to represent them. This is a problem; in an increasingly Unicode world, it is more difficult than ever to print and interchange one's characters if they are *not* in Unicode. 
But the burden should still be on the proponents of such a character to prove that it is in actual, stable use, and that the need to print and interchange is real. Otherwise, it's PUA time. -Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/
RE: New contribution
Hmmm, I'm not sure it's flawed. Sure, recognizability makes it non-equivalent to the Phoenician-Hebrew case, but it still demonstrates that a subset-superset relationship between purported scripts A and B does not make them distinct. Whatever the logic in the examples, I certainly agree that a superset does not imply a distinct script. Peter Constable
RE: New contribution
Item 1, I think we'd agree, is just wrong. Item 2 is probably true. But is it enough to refer to square Hebrew as the modern form of Phoenician (Old Canaanite, whatever you want to call it)? Well, one of the two modern forms, Samaritan being the other. Ah, so the next protracted debate is going to be whether Samaritan should also be encoded using the existing square Hebrew characters, since it would appear that the argument for unification of PH with Hebrew could also argue for unification of PH with Samaritan, or of all three. Peter Constable
RE: Proposal to add QAMATS QATAN to the BMP of the UCS
At 07:34 -0700 2004-05-04, Peter Constable wrote: 05BA;HEBREW POINT QAMATS QATAN;Mn;18;NSM;N;;*;;; Well, of course, the effect of this is that a sequence of qamats, qamats qatan is not canonically equivalent to qamats qatan, qamats. No harm in that, but also not especially useful, I suspect. Mark Shoulson says that since QAMATS QATAN is a flavour of QAMATS, it should behave like QAMATS. Regarding canonical equivalence, having both QAMATS and QAMATS QATAN on a single base letter would be pathological, so it doesn't really matter. I would probably leave the value at 220. That is what all of the Hebrew vowel points should have been, IMO. Though getting one right doesn't make a huge difference -- people are still going to be using CGJ to preserve particular sequences in the cases this will most likely be needed. Mark says that "should have been" is great, but fixing one point is of no particular utility. For my own part, I have no strong view on this matter. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Nice to join this forum....
Philippe Verdy wrote: A problem, however, is that many such forms are found in unstable orthographies, and are difficult to document adequately for inclusion in proposals. This last argument should not be a limitation to encode them. After all they are used for living languages in danger of extinction, and even if documents using them are rare, encoding them would help preserving these languages and helping the development of their litteracy. You misunderstand me. I was not indicating the scarcity of documents (although that can also be a problem), and I certainly wasn't suggesting that documentation problems should impede encoding. I'm talking about unstable orthographies, such that the documents you may have -- even as recent as thirty years ago -- do not necessarily reflect current usage in the country in question. Some African countries have strong language standardisation organisations, e.g. Ghana, but in others orthographies are being developed by individual linguists and missionary translators, and there may be competing orthographies and disagreement over which should be adopted as official. On the one hand, one can make the argument that anything that is used or has been used in documents should be encoded -- which is also the approach I would favour --, but then you are likely to get African governments asking 'Why did you encode that? We don't use that. It isn't official.' You also get software developers coming along wanting to know what they need to support for a given language, and you can't give them a clear answer because the orthographies are unstable. Again, none of these factors prevent encoding of new characters, but it is a good idea to be aware of the uncertainty in the writing of many African languages, and prepared to respond to queries or objections regarding specific characters. 
John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]
Re: Nice to join this forum....
From: Doug Ewell [EMAIL PROTECTED] The danger of encoding novel characters on speculation that they might be useful is that if they *don't* turn out to be useful, or if a revised version of the orthography replaces them with something else, Unicode and 10646 are stuck with unwanted characters, which cannot be removed for stability reasons. This depends who is making such proposals. When a non-governmental organization gets some support from a UN institution for education (UNICEF for example), some studies may be started to create or stabilize an orthography, create a dictionary, guides for a language grammar or for its translation. The phonetics of endangered languages then become important to help maintain the language in its literary form. Some languages have quite unique sounds, but could look ugly and be awkward to teach if they use too many diacritics or symbols from an IPA notation. Today it seems reasonable to promote the adoption of an alphabet based on existing alphabets, but avoiding digraphs can be a requirement, at least for the initial promotion of the literary form of the spoken language. Also, the importance of surrounding languages in the same area may ease the transition for teaching the local language using the same letters if possible, so that the minority language gets more immediate support from educated people in that country who are mainly taught another official language. So there are reasonable cases where it is desirable to borrow some lateral conventions on letter forms but also to respect the uniqueness of the language to be represented with an orthographic system based on a new alphabet. To achieve this goal, some letters sometimes need to be invented by modification of other existing, similar letters. 
When such a program succeeds, some representative books will be published in that orthography, and the most useful ones will be for educational purposes (including religious sacred books like the Bible and the Quran, if they can be translated accurately into the minority language, as religion is a good motivation to incite people to acquire literacy, and get themselves a correct reading of the true text, and then use their literacy for commerce, local economic development, or the preservation and transmission of their culture).
Re: New contribution
Michael Everson wrote: No Georgian can read Nuskhuri without a key. I maintain that no Hebrew reader can read Phoenician without a key. I maintain that it is completely unacceptable to represent Yiddish text in a Phoenician font and have anyone recognize it at all. But no one is going to do that. No one is talking about doing that. This is a complete irrelevancy. No, it is not. If Phoenician letterforms are just a font variant of Square Hebrew then it is reasonable to assume that readers of Square Hebrew will accept them in various contexts. Such as newspaper articles, or advertising copy, or restaurant menus, or wedding invitations. THAT is font switching. I consider this fundamental to script identification. Okay, then I fundamentally disagree with you. Good to have that clear. How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from those which you want to encode, if 1:1 correspondence is not sufficient grounds for unification but visual dissimilarity is grounds for disunification? John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]
RE: Proposal to add QAMATS QATAN to the BMP of the UCS
Mark Shoulson says that since QAMATS QATAN is a flavour of QAMATS, it should behave like QAMATS. True, but giving it the same fixed-position class actually creates a distinction, though not a particularly significant one. Regarding canonical equivalence, having both QAMATS and QAMATS QATAN on a single base letter would be pathological, so it doesn't really matter. Agreed. But having qamats qatan and a class-220 accent would not. I would probably leave the value at 220. That is what all of the Hebrew vowel points should have been, IMO. Though getting one right doesn't make a huge difference -- people are still going to be using CGJ to preserve particular sequences in the cases this will most likely be needed. Mark says that "should have been" is great, but fixing one point is of no particular utility. It provides improvement for very rare possibilities, which is indeed marginal and only a minor drop in the larger bucket. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
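[Editorial aside: as eventually encoded, QAMATS QATAN landed at U+05C7 with the same fixed-position class 18 as QAMATS (U+05B8), so the behaviour debated above can be checked directly with Python's unicodedata. Marks sharing a nonzero combining class are never reordered by normalization, so the two orders stay distinct; a class-18 point and a class-220 mark do reorder, so those orders are canonically equivalent.]

```python
import unicodedata

ALEF = "\u05D0"          # HEBREW LETTER ALEF
QAMATS = "\u05B8"        # combining class 18 (fixed-position class)
QAMATS_QATAN = "\u05C7"  # also class 18, as finally encoded
DOT_BELOW = "\u0323"     # COMBINING DOT BELOW, class 220

assert unicodedata.combining(QAMATS) == 18
assert unicodedata.combining(QAMATS_QATAN) == 18

# Equal nonzero classes: canonical ordering preserves the order,
# so the two sequences are NOT canonically equivalent.
a = unicodedata.normalize("NFD", ALEF + QAMATS + QAMATS_QATAN)
b = unicodedata.normalize("NFD", ALEF + QAMATS_QATAN + QAMATS)
print(a != b)  # True

# Different classes (18 vs 220): canonical ordering sorts them,
# so these two sequences ARE canonically equivalent.
c = unicodedata.normalize("NFD", ALEF + DOT_BELOW + QAMATS)
d = unicodedata.normalize("NFD", ALEF + QAMATS + DOT_BELOW)
print(c == d)  # True
```

This is exactly why the pathological double-qamats case "doesn't really matter": with equal classes the order is preserved but never needs disambiguating, while interaction with class-220 marks still normalizes predictably.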
RE: Proposal to add QAMATS QATAN to the BMP of the UCS
OK, I don't care whether it is 18 or 220, and I am not qualified to decide. You and Mark (and whoever else cares) can duke this one out. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
Hullo, I'll claim the immunity of the ill-informed in contributing this but... On 4 May 2004, at 17:04, John Hudson wrote: Michael Everson wrote: No, it is not. If Phoenician letterforms are just a font variant of Square Hebrew then it is reasonable to assume that readers of Square Hebrew will accept them in various contexts. Such as newspaper articles, or advertising copy, or restaurant menus, or wedding invitations. THAT is font switching. I consider this fundamental to script identification. How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from those which you want to encode, if 1:1 correspondence is not sufficient grounds for unification but visual dissimilarity is grounds for disunification? Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? Christian
Re: Pal(a)eo-Hebrew and Square Hebrew
On 03/05/2004 11:47, Patrick Andries wrote: Peter Kirk wrote: On 03/05/2004 05:55, Patrick Andries wrote: ... When the Biblical text is written in paleo-Hebrew there are no vowel pointings. When the text was written in the paleo-Hebrew script, four of the Hebrew letters were used as vowels - aleph, hey, vav and yud - but were removed from the text when the Masoretes added the vowel pointings. This is evident in the Dead Sea Scrolls where the four letters are found in the words but removed in the Masoretic text. No. The DSS, or nearly all of them, are in square script, and this indicates that the (partial) removal of these additional letters (if that is indeed a correct way to describe what happened) took place long after the transition from paleo-Hebrew to square script. Do I understand from your remark that the Square Script DSS use matres lectionis? P. A. Yes. The Masoretic text Hebrew uses matres lectionis (though not alef as one, except perhaps in the Aramaic portions). The earlier square script DSS use more of them. Most paleo-Hebrew texts use very few if any of them, because they were only starting to be used in pre-exilic times. I'm not sure about later paleo-Hebrew texts like the few paleo-Hebrew DSS. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Arid Canaanite Wasteland (was: Re: New contribution)
On 02/05/2004 16:26, Michael Everson wrote: At 11:06 -0700 2004-05-02, Peter Kirk wrote: Michael Everson, who knows so little Phoenician that he doesn't know how similar it is to Hebrew? You are confusing language and script. I am not encoding the Phoenician language. ... No, I am not, despite you and James trying to claim that I am, and despite your attempt to label a script with the name of just one of the languages using it, which is not only confusing but historically of doubtful accuracy. My point was that you cannot claim to be a user of the Phoenician script if you are not familiar with the Phoenician language. More accurately I should have said, if you are not familiar with any of the languages written with the Phoenician script. This group is (apparently apart from the Edessa inscription just mentioned) a small set of closely related languages in which you do not seem to be an expert. ...I am encoding a set of genetically related scripts with similar behaviours, which differ from Hebrew in shape (but which are similar in shape themselves) and in function (Hebrew has grown enormously complex with its representation. I believe that if you take a pointed and cantillated Hebrew text and were to change the font to Phoenician you would end up with something that is, plain and simply, utterly wrong. Well, you would end up with something novel and not widely understood, just as you would if you used a Fraktur font to display a Vietnamese text complete with multiple diacritics. You can't stop people encoding garbage in Unicode if they want to. ... Anyone else? Perhaps one or two, and no evidence for a group. Not nearly as many as want Klingon encoded. Do they have an actual use for the script? It is a Universal Character Set. It is not a character set for Certain Kinds of Semiticists Who Think That Everything Is Hebrew. The Phoenician script has other clients. ... OK, if you say so, but then, name names, or at least demonstrate the truth of this statement. 
According to your proposal, you have not been in contact with any users of the Phoenician script, but I suppose you could still know who they are. But then Deborah Anderson has just stated that she is a user of it, and I know you have had extensive contact with her. I thought of accusing you of lying in the proposal, but it is possible that you were unaware that she is a user. I suggest that you revise your proposal to mention your contact with her, and preferably to summarise her good reasons for supporting your proposal. ... Runic has specialist and non-specialist clients. Gothic has specialist and non-specialist clients. Egyptian has specialist and non-specialist clients. Children learning about the history of their alphabets are arguably more important than narrow-minded pedants who think that by bluster they can distract us from our goal. Well then, show us a children's book which uses Phoenician plain text, rather than a table of glyphs. Which is to encode all of the world's writing systems in a Universal Character Set. Including Klingon? Or are there some unstated conditions here that the writing systems have to be actually in use? Have they demonstrated a need for it or that, if encoded, anyone will actually use it? Surely these are the criteria for encoding a script, not just that one person has asked for it to be encoded and a few have supported him. I guess it is just a misapprehension on your part about what you will be forced to do. Let's rehearse it again. Most Germanicists prefer to transliterate Gothic text into Latin to work with it, to study it, to publish it, to read it. We encoded Gothic anyway, because it is a separate script from Greek. Most Germanicists prefer to transliterate Runic text into Latin to work with it, to study it, to publish it, to read it. We encoded Runic anyway, doubtless to the joy of adolescent Dungeons-and-Dragons players everywhere. 
Most Semiticists (you claim) prefer to transliterate Phoenician (and other language) text into Hebrew (or Latin) to work with it, to study it, to publish it, to read it. We should encode the Phoenician family of scripts anyway, because Your claim that Phoenician is just a subset of Hebrew ignores the historical facts of the development of the Hebrew script, in particular with regard to the development of related scripts like Samaritan. The unification which we did for Phoenician correctly rounds up like with like, and leaves specialized branches of the West Semitic writing systems (like Hebrew and Samaritan) alone as separate scripts. My claim was not quite this. It was rather that Phoenician can be treated as a subset of Hebrew, and the need to treat it otherwise had not been demonstrated. I think Deborah's contribution has now come close to demonstrating that need. Need is more than just want. I am thinking of people who would actually use this encoding, who would prefer to use it, and who are not adequately provided for by
Re: New contribution
Christian Cooke wrote: Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? The argument of at least some contributors to this discussion is that the 'Hebrew' block is misnamed. Even if one accepts that 'Phoenician' should be separately encoded, the Hebrew block should have been called 'Aramaic' :) John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]
Re: New contribution
On 02/05/2004 16:28, Michael Everson wrote: ... Common sense says that you should not use the Hebrew block for Phoenician script with a masquerading font, since the Hebrew script and the Phoenician script are different scripts. OK, I get the point. Unicode doesn't tell anyone what to do, but common sense does. Semiticists are allowed to continue to do what they are doing if your proposal is accepted, but if they don't, in your opinion they lack common sense. Well, I suspect this negative opinion might be mutual, and your proposal might be ignored. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 02/05/2004 14:38, [EMAIL PROTECTED] wrote: ... The Mesha Stele and the inscription of Edessa were originally written in the same script. If encoding the Edessa inscription using the Hebrew range would be transliteration, then so would the encoding of the Mesha Stele in the Hebrew range. And if black is white, then white is black. On the other hand, if your Edessa inscription (by the way is there an Edessa in Macedonia as well as the well known Edessa in modern Turkey?) is written with Phoenician glyphs (as you have stated, I think), and if Phoenician glyphs are glyph variants of Hebrew glyphs (the hypothesis being tested), then encoding the Edessa inscription with Hebrew characters is not transliteration, just as encoding of a text written in Fraktur with Latin characters is not transliteration but the standard way of encoding the text. All this is quite independent of the language of the text. If Phoenician is considered a glyphic variation of modern Hebrew, then it can also be considered a glyphic variation of modern Greek. Would it then follow that modern Greek should have been unified with modern Hebrew? (Directionality aside.) In principle, the only thing which makes these unifications impossible is directionality. I am sure there are a number of other things which would make them undesirable. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New Contribution: In support of Phoenician from a user
On 03/05/2004 19:04, Michael Everson wrote: At 09:41 -0700 2004-05-03, Peter Kirk wrote: If your support had been cited in the original proposal with your arguments, rather a lot of spilled electrons could have been saved. Well, I guess it is not too late to include them in a revised proposal. What format would you like that addition to have? ... I'll leave that to you, but for a start you can name Deborah Anderson as a user of the script with whom you have had contact. And yourself if you like, as far as I am concerned. ... While I am pleased that you are happier, my own interest is in the technical accuracy of the code chart and character names, not in *justifying* its inclusion. Well, I hope the UTC is concerned with the justification of new proposals, and not just their technical accuracy. They were obviously concerned that the Klingon proposal was not properly justified, and so rejected it. If your proposal is not to suffer the same fate, it needs proper justification. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 03/05/2004 19:03, Michael Everson wrote: At 10:25 -0700 2004-05-03, Peter Kirk wrote: It is not possible to take an encoded Genesis text which is pointed and cantillated, and blithely change the font to Moabite or Punic and expect anyone to even recognize it as Hebrew. Michael, you assert this, but do you actually know it to be true? Yes. Yes, I do. Mark Shoulson did a test today with a group of well-educated young Hebrew-speaking computer programmers. They did not recognize it. Thanks for the data. These are I suppose American Jews. A fairer test might be among Israeli native speakers of Hebrew. ... But this text would be easily recognisable and readable by anyone familiar with both Hebrew and the Phoenician glyphs. I do not believe that any Yiddish speaker would accept a text in a Phoenician font as Yiddish. Well, someone somewhere (in Edessa apparently, but I still don't know which Edessa) accepted a Phoenician script text as Greek. And there are people today who accept Samaritan script text as English. As any script can be used for any language, we really can't try to decide for users which scripts go with which languages. The field of application of Phoenician is so limited that the script just can't be mapped on to the rich typographic and font tradition of Square Hebrew with any sense at all. Wedding invitations are routinely set in Blackletter and Gaelic typefaces. I bet you £20 that if an ordinary Hebrew speaker sent out a wedding invitation in Palaeo-Hebrew no one would turn up on the day. And I bet you £20 that if an ordinary English speaker sent out a wedding invitation in Suetterlin no one would turn up on the day. Now we just need some gullible couples to put our challenges to the test! -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: For Phoenician
On 02/05/2004 17:35, Philippe Verdy wrote: ... Please be polite Peter. You're talking to the official registrar appointed by Unicode, the ISO 15924 Registration Agency. Well, Michael is only the registrar. ISO 15924 will continue to have more details about what is considered as a separate script for bibliographic references and differentiation of publications. I am really impressed - not! ;-) ... The situation for Phoenician is quite different. The Hebrew script is already extremely complex by itself. Some of its most complex rules, that would work and produce desirable effects in the square Hebrew variant, would become disastrous with another form. Can you really make semantic distinctions with the glyph layout of hataf vowels applied on top of Phoenician/Old Canaanite glyphs? If you had to create a special layout engine to handle multiple cantillation and vowel marks applied safely on square Hebrew, would it work as well with the Old Canaanite base glyphs, which were not designed to support these diacritics and allow differentiating them? This would require some creative font design to avoid collisions with descenders, but would be by no means impossible. ... How will you handle the possible inclusion of new variants or additional letters from the base Phoenician script, without breaking some of the modern Hebrew script rules? There are probably lots of these additional variants and extensions, used in the genesis or evolution of other languages and scripts. If you integrate them into only the Phoenician script, with a more relaxed rule than for Hebrew which is strongly fixed today, you'll break the fragile building of the Hebrew script. Of course I cannot already handle all hypothetical possible extensions to the current scripts. But Unicode deals with existing scripts, not hypothetical ones. If you have any evidence for any such variants in actual use, please send it to me, and to Michael as he may wish to incorporate it in his proposal.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: [OT] Europe (Was:: Defined Private Use)
On 02/05/2004 20:33, John Cowan wrote: Ernest Cline scripsit: Defining Europe is vague. Well, Michael Everson back in 1995 defined it thus: Europe extends from the Arctic and Atlantic (including Iceland and the Faroe Islands) southeastwards to the Mediterranean (including Malta and Cyprus), with its eastern and southern borders being the Ural Mountains, the Ural River, the Caspian Sea, and Anatolia, inclusive of Transcaucasia. A more precise political definition can be found at http://www.evertype.com/alphabets/index.html#a . For once I agree with Michael! :-) -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 03/05/2004 05:19, Michael Everson wrote: ... Suetterlin. Oh shut UP about Sütterlin already. I don't know where you guys come up with this stuff. Sütterlin is a kind of stylized handwriting based on Fraktur letterforms and ductus. It is hard to read. It is not hard to learn, ... Nor is Phoenician. ... and it is not hard to see the relationship between its forms and Fraktur. ... Nor is it hard to see the same relationship between Phoenician and Hebrew with the help of alphabet development charts of the kind in your proposal. ... Its existence is not the same kind of historical relationship that Phoenician letterforms have to Hebrew letterforms. People have letters in their attics written by their grandfathers in Sütterlin. ... The Phoenicians, paleo-Hebrews etc were not as tidy as the Germans, and so left their letters lying about on the ground, where (since they were written on bits of pottery) they could be dug up millennia later and read. ... You can buy books to teach you how to learn Sütterlin. ... ... and Phoenician script. ... Germans who don't read Sütterlin recognize it as what it is -- a hard-to-read way that everyone used to write German not so long ago. And modern Hebrews recognise paleo-Hebrew as a now hard-to-read way that everyone used to write Hebrew a rather longer time ago. Phoenician script, on the other hand, is so different that its use renders a ritual scroll unclean. If you ask me, who shall I believe, John Cowan who has a structural theory or the contemporary users of Phoenician/Palaeo-Hebrew vs Aramaic/Square-Hebrew in determining whether the scripts are unifiable or not, I shall believe the contemporary users, who considered the scripts anything BUT unifiable. Which contemporary users? I thought you had not been in contact with any. ... Either way, pointed and cantillated text displayed in a Phoenician font is a JOKE at best. And not a very good one.
It is not very good, but it is not a joke, just an anachronism - although potentially rather a useful one for people who try to reconstruct Phoenician pronunciation. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Pal(a)eo-Hebrew and Square Hebrew
On 03/05/2004 05:55, Patrick Andries wrote: ... When the Biblical text is written in paleo Hebrew there are no vowel pointings. When the text was written in the paleo Hebrew four of the Hebrew letters were used as vowels - aleph, hey, vav and yud, but were removed from the text when the masorites added the vowel pointings. This is evident in the Dead Sea Scrolls where the four letters are found in the words but removed in the Masoretic text. No. The DSS, or nearly all of them, are in square script, and this indicates that the (partial) removal of these additional letters (if that is indeed a correct way to describe what happened) took place long after the transition from paleo-Hebrew to square script. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: The Unicode.ORG Server is now moved
On 03/05/2004 18:40, Rick McGowan wrote: The Unicode.ORG server move has gone more or less according to plan, and mail lists have been turned back on. Thank you for your patience. During the next few weeks, if you notice any service on Unicode.ORG that previously worked but is now broken, or if you suspect that some HTML files are missing or corrupted please do not hesitate to contact me (off list please). I will investigate. Regards, Rick McGowan Unicode, Inc. I sent several messages to the list between 16:20 and 16:30 GMT which were simply lost. These were therefore sent some time before the announced time of the list being closed down - timing which I chose deliberately. This is not an acceptable way to manage a list server. You should refuse to accept messages which you are unable to deliver so that they are queued for retransmission when the server comes up again. I am resending these messages, when I can get Internet access from here in Azerbaijan which is sometimes a problem. This is resulting in a considerable delay to important traffic, and significant expense to me as I have to pay by the minute for Internet access. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Drumming them out
On 04/05/2004 06:10, Michael Everson wrote: ... and myself I'm still not convinced the distinction between Greek and Coptic in bilingual editions is not truly just a font issue. Plain-text searching of Crum's dictionary, for instance, is a perfectly valid requirement, and one which was brought to bear on the disunification. Out of interest, are there any dictionaries e.g. of the Phoenician language which use both Phoenician and Hebrew script, with a plain text distinction? I can quite imagine that there are. If there are, they would provide a good justification for your proposal, helping to supply what is currently missing. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 02/05/2004 16:48, Michael Everson wrote: ... It is not possible to take an encoded Genesis text which is pointed and cantillated, and blithely change the font to Moabite or Punic and expect anyone to even recognize it as Hebrew. Michael, you assert this, but do you actually know it to be true? After all, this is not your area of expertise. I agree that this kind of mixture is an anachronistic one, much like the example I mentioned earlier of Vietnamese in Fraktur. But this text would be easily recognisable and readable by anyone familiar with both Hebrew and the Phoenician glyphs. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: Arid Canaanite Wasteland
On 03/05/2004 15:33, Simon Montagu wrote: Peter Kirk wrote: On 02/05/2004 05:27, [EMAIL PROTECTED] wrote: Quoting from the jewfaq page, The example of pointed text above uses Snuit's Web Hebrew AD font. These Hebrew fonts map to ASCII 224-250, high ASCII characters which are not normally available on the keyboard, but this is the mapping that most Hebrew websites use. I'm not sure how you use those characters on a Mac. In Windows, you can go to ... Is this the same as ISO 8859-8 visual encoding? Codepoint for codepoint, yes, but IIRC the Web Hebrew fonts only worked on sites that were declared (or assumed by default) to be in ISO-8859-1 encoding. But presumably if the same sites were declared as ISO-8859-8 visual they would be readable with standard Unicode Hebrew fonts, in browsers which perform the correct mappings? Well, now if I find an unreadable page which is supposed to be Hebrew, I know which encoding to select manually. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
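Simon's description of the legacy layout can be checked in a few lines: bytes 0xE0-0xFA are exactly where ISO-8859-8 places the Hebrew letters U+05D0-U+05EA, which is why the "Web Hebrew" fonts on nominally Latin-1 pages and ISO-8859-8 visual pages line up codepoint for codepoint. A minimal Python sketch (the sample bytes are illustrative):

```python
# Bytes 0xE0-0xFA in ISO-8859-8 (and in the "Web Hebrew" fonts that
# piggybacked on Latin-1 pages) hold the Hebrew letters U+05D0-U+05EA.
visual_bytes = bytes([0xE0, 0xE1, 0xE2])  # alef, bet, gimel in the legacy layout

# Decoding with the real ISO-8859-8 codec yields proper Unicode Hebrew.
text = visual_bytes.decode("iso8859_8")
print([hex(ord(c)) for c in text])  # ['0x5d0', '0x5d1', '0x5d2']
```

Note that visual-Hebrew pages store each line in display (left-to-right) order, so converting them to logical-order Unicode also requires reversing the Hebrew runs; that step is outside this sketch.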
Re: New contribution
On 04/05/2004 08:58, Peter Constable wrote: Item 1, I think we'd agree, is just wrong. Item 2 is probably true. But is it enough to refer to square Hebrew as the modern form of Phoenician (Old Canaanite, whatever you want to call it)? Well, one of the two modern forms, Samaritan being the other. Ah, so the next protracted debate is going to be whether Samaritan should also be encoded using the existing square Hebrew characters. Since it would appear that the argument for unification of PH with Hebrew could also argue for unification of PH with Samaritan, or of all three. Peter Constable From my point of view, Michael could have made a better case for a unified Phoenician and Samaritan proposal. But I think he intends a separate Samaritan proposal. And that I would not oppose, because there is an easily demonstrable user community of modern Samaritans. Although I would still want assurances that they don't consider Samaritan script to be glyph variants of Hebrew script. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 03/05/2004 06:47, Michael Everson wrote: ... And frankly, I don't consider that Snyder or Kirk or Cowan speak for the Semiticist community as they would have us think. I admit freely that I don't. And I don't consider that Everson speaks for the Phoenician script user community as it seems he would now have us think. The reason? That he has explicitly denied, in his proposal, having any contact with this community. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
On 04/05/2004 06:44, Peter Constable wrote: ... But if you took Biblical Hebrew text and set it with PH glyphs w/o accents, there are a lot of people that know Biblical Hebrew who would not recognize this sample as Biblical Hebrew. ... Well, Peter, that's not the point. A lot of Vietnamese people would not recognise accentless Suetterlin as Vietnamese, they might well guess it was a quite different script. But Suetterlin and Vietnamese are unified. ... And there is no obvious way to add the accents, but even if there were, I suspect those same people still wouldn't recognize it as accented Hebrew with archaic glyphs. I don't see any problem in adding the accents if anyone wants to do so. After all, they stand above and below the letters, and can be shifted out of the way of descenders and ascenders if necessary. No one would want to do so, but that's not the point. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
Re: New contribution
At 09:43 -0700 2004-05-04, Peter Kirk wrote: Mark Shoulson did a test today with a group of well-educated young Hebrew-speaking computer programmers. They did not recognize it. Thanks for the data. These are I suppose American Jews. A fairer test might be among Israeli native speakers of Hebrew. (*jaw drops*) Excuse me? I don't think I am going to be able to discuss user communities of the Universal Character Set with you if this kind of exclusivist rubbish is what you think can possibly apply. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Just if and where is the then?
If a can have U+0061 and also a composite form at U+00E2..., and if e can have U+0065 and also a composite form at U+00EA..., then why can't e with grave or acute accent and dot below be assigned a single Unicode value instead of the combining sequence 1EB9 0301 and so on? Since Unicode is gradually becoming a de facto standard, I still think it would not be a bad idea to have such composite values. Dele Olawole
Re: New Contribution: In support of Phoenician from a user
At 09:47 -0700 2004-05-04, Peter Kirk wrote: On 03/05/2004 19:04, Michael Everson wrote: At 09:41 -0700 2004-05-03, Peter Kirk wrote: If your support had been cited in the original proposal with your arguments, rather a lot of spilled electrons could have been saved. Well, I guess it is not too late to include them in a revised proposal. What format would you like that addition to have? ... I'll leave that to you, I'm not really all that interested in the justifications per se. I write proposals to encode things that I think should be encoded. That involves an investment of time and resources, which implies that I think it is worthwhile investing in. Does that make sense to you? but for a start you can name Deborah Anderson as a user of the script with whom you have had contact. And yourself if you like, as far as I am concerned. What if I had done that to start with? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Drumming them out
At 10:00 -0700 2004-05-04, Peter Kirk wrote: and myself I'm still not convinced the distinction between Greek and Coptic in bilingual editions is not truly just a font issue. Plain-text searching of Crum's dictionary, for instance, is a perfectly valid requirement, and one which was brought to bear on the disunification. Out of interest, are there any dictionaries e.g. of the Phoenician language which use both Phoenician and Hebrew script, with a plain text distinction? James Kass presented a non-dictionary text the other day. I considered it plain text. Others didn't. I can quite imagine that there are. I don't know. Mostly I would expect to see Hebrew or Latin transliteration in such dictionaries. Encoding Phoenician in a scholarly context is likely to be more prominent in teaching students, preparing exams and grammars, etc (same thing has been said about other scripts which are often transliterated). If there are, they would provide a good justification for your proposal, helping to supply what is currently missing. Is enshrining justifications in the proposal documents really all that important? It sounds like busywork to me. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from those which you want to encode, if 1:1 correspondence is not sufficient grounds for unification but visual dissimilarity is grounds for disunification? As far as I can follow Michael's arguments he says the following: Disunification for scripts with 1:1 correspondence requires - having distinct glyphs - being a relevant script (e.g. historically important, because other scripts also derive from it, not only the one with the 1:1 correspondence). The latter isn't true, especially for Klingon, but it's also not true for e.g. Fraktur, because Fraktur is the derived script, not Latin. -- Dominikus Scherkl
Re: New contribution
Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy ? P. A. (immunity of the ill-informed also requested)
Re: New contribution
At 15:16 -0400 2004-05-04, Patrick Andries wrote: Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy ? Historical origin of characters and scripts is one of the things which we take into account when determining their identity. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: The Unicode.ORG Server is now moved
Since Peter Kirk wrote, on the Unicode list, I'll CC the list. Peter Kirk wrote: I sent several messages to the list between 16:20 and 16:30 GMT which were simply lost. You are wrong. They were not lost -- at least not on this server. Check the archives. (OK, I've had some config trouble with bringing up the new real-time archives, but your messages are there, and you can check them. I can't guarantee that everything that left *your* machine arrived here, but everything that arrived here is in the archives.) I am certain that everything which *arrived* at the Unicode.org server was *delivered* into the mail list process. (What happens after stuff leaves this machine for downstream delivery is someone else's problem.) These were therefore sent some time before the announced time of the list being closed down - timing which I chose deliberately. This is not an acceptable way to manage a list server. Thank you for your concern, but, I know what I'm doing. Everything was properly shut down and the outgoing queue drained appropriately. You should refuse to accept messages which you are unable to deliver so that they are queued for retransmission when the server comes up again. If you want to talk about how the lists are managed, please don't do it on this list. It's off-topic. Anyway, there was nothing to queue for re-transmission. Incoming mail acceptance was turned off at an appropriate juncture, and the outbound queue allowed to drain. Rick
Re: The Unicode.ORG Server is now moved
Actually, I had already seen all of the messages you resent, Peter, so they apparently did get through the first time. It may well be that something happened to delay them getting through to you. Some other threads have appeared disjointed to me though, so there do appear to be real problems, or else people have been posting replies to the list that didn't send the original there. It could be a side effect of the move, the current Sasser Internet worm epidemic, or perhaps even the ghosts of unhappy Paleo-Hebrew scribes warring in the ether(net) over whether Phoenician and Hebrew should be encoded as separate scripts in Unicode.
Re: New contribution
Michael Everson scripsit: Well. Depends what you mean by forms. Our taxonomy currently lists Samaritan, Square Hebrew, Arabic, Syriac, and Mandaic as modern (RTL) forms of the parent Phoenician. Arabic and Syriac have very specialized shaping behavior which makes them obviously distinct from their parent form. I believe that Mandaic has this property too. Ah, so the next protracted debate is going to be whether Samaritan should also be encoded using the existing square Hebrew characters. So far participants on this discussion seem to have stipulated that Samaritan be encoded as a modern and unique script. I have merely postponed the question. I would still prefer to see an overall plan with justification (that is, an update of N2311) before any of these scripts get encoded. -- Evolutionary psychology is the theory John Cowan that men are nothing but horn-dogs, http://www.ccil.org/~cowan and that women only want them for their money. http://www.reutershealth.com --Susan McCarthy (adapted) [EMAIL PROTECTED]
RE: Just if and where is the then?
[Original Message] From: African Oracle [EMAIL PROTECTED] To: [EMAIL PROTECTED] Date: 5/4/2004 7:04:48 PM Subject: Just if and where is the then? If a can have U+0061 and have a composite that is U+00e2...U+... If e can have U+0065 and have a composite that is U+00ea...U+... Then why is e with accented grave or acute and dot below cannot be assigned a single unicode value instead of the combinational values 1EB9 0301 and etc Since UNICODE is gradually becoming a defacto, I still think it will not be a bad idea to have such composite values. Dele Olawole Take a look at the Unicode Stability Policy [1]. While it does not make it impossible for there to be a Unicode character LATIN SMALL LETTER E WITH DOT BELOW AND ACUTE ACCENT that would decompose to U+1EB9 U+0301, such a character would have to have the Composition Exclusion property so that it would not appear in any of the Unicode Normalization Forms. A number of other standards, such as XML expect the data they contain to be handled in normalized form, hence even if the precomposed form were available, most software would still prefer to work with the unprecomposed form. The result is that unless there is another official character standard that has LATIN SMALL LETTER E WITH DOT BELOW AND ACUTE ACCENT as a character, there is no benefit to be gained by encoding such a character in Unicode. Even then, the benefit is very small as it is only that a transformation from a single codepoint of that other standard into a single codepoint of the Unicode standard could be done. That was an important consideration when Unicode was getting started, but is not particularly important now. [1] http://www.unicode.org/standard/stability_policy.html
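Ernest's point about normalization can be demonstrated directly with Python's unicodedata module: even though no precomposed "e with dot below and acute" exists, NFC yields the stable canonical sequence U+1EB9 U+0301 whatever order the marks are typed in, which is why a new precomposed character would be invisible to normalized data anyway. A small sketch:

```python
import unicodedata

# "e" + combining acute (U+0301) + combining dot below (U+0323),
# typed in the "wrong" order.
s = "e\u0301\u0323"

# NFC first decomposes and canonically reorders (dot below has combining
# class 220, acute 230, so the dot sorts first), then recomposes what it
# can: e + dot below -> U+1EB9. No precomposed form absorbs the acute,
# so it remains a combining mark.
nfc = unicodedata.normalize("NFC", s)
print([hex(ord(c)) for c in nfc])  # ['0x1eb9', '0x301']
```

The same call leaves an already-canonical U+1EB9 U+0301 sequence untouched, so both spellings compare equal after normalization.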
Re: New contribution
I want to point out that the inclusion of a name in N2311 does not mean a *guaranteed* place in Unicode for it. All it means is that according to our best current information, we're trying to reserve space for what we think will be there. But until we get and assess actual concrete proposals, we can't determine whether two proposed scripts should be unified, or one proposed script should be de-unified. Mark __ http://www.macchiato.com - Original Message - From: [EMAIL PROTECTED] To: Michael Everson [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tue, 2004 May 04 12:51 Subject: Re: New contribution Michael Everson scripsit: Well. Depends what you mean by forms. Our taxonomy currently lists Samaritan, Square Hebrew, Arabic, Syriac, and Mandaic as modern (RTL) forms of the parent Phoenician. Arabic and Syriac have very specialized shaping behavior which makes them obviously distinct from their parent form. I believe that Mandaic has this property too. Ah, so the next protracted debate is going to be whether Samaritan should also be encoded using the existing square Hebrew characters. So far participants on this discussion seem to have stipulated that Samaritan be encoded as a modern and unique script. I have merely postponed the question. I would still prefer to see an overall plan with justification (that is, an update of N2311) before any of these scripts get encoded. -- Evolutionary psychology is the theory John Cowan that men are nothing but horn-dogs, http://www.ccil.org/~cowan and that women only want them for their money. http://www.reutershealth.com --Susan McCarthy (adapted) [EMAIL PROTECTED]
Re: New contribution
Michael Everson wrote at 7:21 AM on Tuesday, May 4, 2004: No, Proto-Sinaitic is out, actually, though it's still in the Summary Form by accident. For similar reasons, Proto-Canaanite should be out. Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
Re: Pal(a)eo-Hebrew and Square Hebrew
Patrick Andries wrote at 6:53 AM on Tuesday, May 4, 2004: So there were Dead Sea Scrolls written in Square Hebrew with matres lectionis ? (I don't know, I just would like to know.) Yes; and with final forms of the usual letters. Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
Re: Proposal to add QAMATS QATAN to the BMP of the UCS
At 10:09 +0200 2004-05-05, Simon Montagu wrote: Proposal to add QAMATS QATAN to the BMP of the UCS Michael Everson Mark Shoulson Nice. Ta. 8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? No. Is this overstating the case? It's got a unique glyph representation, it's got its own name, and it has its own pronunciation, so in our judgement it is not a presentation form of QAMATS. flippancyIsn't it a little strange that a short qamats should represented with a longer vertical than a regular qamats?/flippancy Them's the facts. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New contribution
Patrick Andries a écrit : Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy? Let me put this more precisely: what is so important about whether we encode the father or one of the sons?
Re: Just if and where is the then?
The existing composites were included only out of necessity so that new Unicode implementations could interoperate with existing implementations using legacy industry-standard encodings. - Peter Constable Are we saying we have exhausted such necessity? And what are these legacy-standard encodings? No new composite values will be added. - Peter Constable The above sounds dictatorial in nature. Dele - Original Message - From: Peter Constable [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, May 04, 2004 10:27 PM Subject: RE: Just if and where is the then? If a can have U+0061 and have a composite that is U+00e2...U+... If e can have U+0065 and have a composite that is U+00ea...U+... Then why is e with accented grave or acute and dot below cannot be assigned a single unicode value instead of the combinational values 1EB9 0301 and etc Since UNICODE is gradually becoming a defacto, I still think it will not be a bad idea to have such composite values. The existing composites were included only out of necessity so that new Unicode implementations could interoperate with existing implementations using legacy industry-standard encodings. Apart from the backward compatibility issue, these composites go against Unicode's design principles and are not needed. No new composite values will be added. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: New contribution
Dean Snyder scripsit: In gross terms, I would characterize the watershed events in scripts used to write Hebrew as: 1) adoption of the Canaanite/Phoenician alphabet 2) adoption, around the time of the Babylonian exile, of Imperial Aramaic script (coupled with some portions of the Hebrew Bible itself being written in Aramaic) 3) adoption of the various supra-consonantal vowel and accent systems 4) The abandonment of most of the apparatus introduced in step 3, as far as productive use of the script is concerned, reverting to the 22CWSA. -- John Cowan [EMAIL PROTECTED]http://www.ccil.org/~cowan Is it not written, That which is written, is written?
Re: New contribution
Patrick, On 4 May 2004, at 21:27, Patrick Andries wrote: Patrick Andries a écrit : Christian Cooke a écrit : Surely a cipher is by definition after the event, i.e. there must be the parent script before the child. Does it not follow that, by John's reasoning, if one is no more than a cipher of the other then it is Hebrew that is the cipher and so the only way Phoenician and Hebrew can be unified (a suggestion you'll have to assume is suitably showered with smileys :-) is for the latter to be deprecated and the former encoded as the /real/ parent script? What is so important about genealogy ? Let me precise this : what is so important whether we encode the father or one of the sons ? [again eschewing any claim to expertise...] On 4 May 2004, at 17:04, John Hudson wrote: How do you distinguish those scripts that are rejected as 'ciphers' of other scripts from those which you want to encode, if 1:1 correspondence is not sufficient grounds for unification but visual dissimilarity is grounds for disunification? Leaving aside the fact that the son is already encoded, I suppose I'm asking how a script can predate a script (Hebrew, or Aramaic so I'm told) it is said to be the cipher of. Regards, Christian
RE: Just if and where is the then?
The existing composites were included only out of necessity so that new Unicode implementations could interoperate with existing implementations using legacy industry-standard encodings. - Peter Constable Are we saying we have exhausted such necessity? Yes, because by definition legacy industry-standard encodings not in widespread usage prior to 1993 do not qualify for the backward-compatibility requirement. The necessity had to do with interoperation with existing implementations, not with the need to support particular languages / writing systems. For the latter, it has never been a necessity to add pre-composed characters. And what are these legacy-standard encodings? No new composite values will be added. - Peter Constable The above sounds dictatorial in nature. I'm simply telling you what the policy of the Unicode Consortium is. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Just if and where is the then?
From: African Oracle [EMAIL PROTECTED] The existing composites were included only out of necessity so that new Unicode implementations could interoperate with existing implementations using legacy industry-standard encodings. - Peter Constable Are we saying we have exhausted such necessity? And what are these legacy-standard encodings? I think this is the list shown in the References section of the Unicode standard. I don't think that this list is closed: there may be further standards considered, notably if they reach an ISO standard status, or they start being used extensively in some popular OS as a de-facto standard. No new composite values will be added. - Peter Constable The above sounds dictatorial in nature. I think that the sentence is incomplete, or you interpret it the wrong way. The key is the composite term, which here should mean a character that has a canonical decomposition into a sequence of a base character with combining class 0, and one or more diacritics with a positive combining class. However this is a general principle that applies to already encoded scripts that are already widely used (notably Latin, Greek, Cyrillic, Hiragana/Katakana with voicing marks, Han with tone marks, pointed Hebrew or Arabic, and Brahmic scripts), but which may not apply to newly encoded scripts if they offer some new combining diacritics and new base letters, where some compositions may be desirable immediately due to difficulties to render the composite properly. Some Semitic scripts for example have such complex rules for creating composites with a base consonant and combining vowel modifiers, that the whole script was instead encoded as if it was a syllabary... (Here I think about Ethiopic, but some have different opinions and argue that Ethiopic is a true syllabary, given its current modern usage).
Re: Just if and where is the sense then?
Thanks for taking the time to explain. Regards Dele - Original Message - From: Peter Constable [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, May 05, 2004 12:50 AM Subject: RE: Just if and where is the sense then?
RE: UNIHAN.TXT
Title: RE: UNIHAN.TXT From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of [EMAIL PROTECTED] Sent: Friday, April 30, 2004 12:12 AM Tabs... In addition to the points Mike made about the tab character having different semantics depending on the application/platform, I just don't think a control character like tab belongs in a *.TXT file period. This is long past the point of opinion, however. Tabs in text files are an implementation fact, long past the point of discussion. Although UNIHAN.TXT is referred to as a database, it isn't. Yes, it is. Plain text databases are far more common than most people realize. The awk tool exists solely to work with them. Unix -vs- DOS... I'll stick with the tools I've been using for a quarter century and their descendants, thanks just the same. Hmmm? You know that doesn't narrow it down any... With respect to the idea that a text editor is not the proper tool with which to open a *.TXT file, well... I think you misunderstand. I believe the point was that text files are not universally fully interoperable. Another fact of implementation, especially when it comes to large files or files with long lines. I could send you the CSV file for posting, if you think anyone else would want it. Give 'em the conversion script, not the CSV file! Doug Ewell wrote, And as John said, converting LF to CRLF is quite a simple task -- it can even be done by your FTP client, while downloading the file -- and should not be thought of as a deficiency in the current plain-text format. Right. It's not a deficiency, it simply adds one more step to a multi-step process for some of us. That step is unnecessary. A little more research on your tools will eliminate it. In order to see non-Latin characters in the DOS-window of Windows, it's necessary to install a console font covering the characters, and then activate (or enable) that font for the console window. 
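As an aside on the plain-text-database point, a minimal Python sketch (in the spirit of awk) shows how little machinery the tab-delimited format needs; the three-field record layout (code point, field name, value) follows the published Unihan format, the sample record is illustrative only, and `lf_to_crlf` assumes bare-LF input:

```python
# Treat a tab-delimited file like UNIHAN.TXT as a plain-text database:
# skip comments and blanks, split each record on its two tabs.
def parse_unihan_lines(lines):
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue  # comment or blank line, not a record
        codepoint, field, value = line.rstrip('\n').split('\t', 2)
        yield codepoint, field, value

# The LF -> CRLF step some DOS-era tools want (assumes bare-LF input):
def lf_to_crlf(text):
    return text.replace('\n', '\r\n')

sample = "U+4E00\tkDefinition\tone; a, an; alone\n"
assert next(parse_unihan_lines([sample])) == (
    "U+4E00", "kDefinition", "one; a, an; alone")
```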
Everson Mono Terminal should work fine for non-Han characters which don't require complex shaping. I found Everson Mono, but not Everson Mono Terminal. Am I looking in the wrong place? /|/|ike
Re: Just if and where is the sense then?
Dele, No new composite values will be added. - Peter Constable The above sounds dictatorial in nature. Peter has already explained that this is just the nature of the current policy regarding such additions. Others in this thread have attempted to explain the reason for the policy. The short answer is that it would disturb the stability of the definition of normalization of data involving Unicode characters, and stability of normalization is extremely important to many implementations of the standard. This said, you need to understand that there is a learning curve for people coming new to the Unicode Standard. The existence of a policy which constrains certain kinds of additions to the standard is not a matter of dictatorial proclamations -- it is not something that Peter Constable or any other individual has the power to impose. Such policies arise out of the consensus deliberations of the Unicode Technical Committee, which involve many different members, jointly responsible for the technical content of the standard. They are also endorsed in the Principles and Procedures document for the ISO committee JTC1/SC2/WG2, responsible for the parallel, de jure international character encoding standard, ISO/IEC 10646. And in that committee, decisions are also made based on consensus after discussion among members of many different participating national bodies. As for the particular issue regarding characters like {e with dot below and acute accent}, for example, the policy is not in place as a matter of discrimination against particular languages or orthographies. The *glyph* for {e with dot below and acute accent} can and should be in a font for use with a language that requires it. Alternatively, the font and/or rendering system should be smart enough to be able to apply diacritics correctly. But the *characters* needed to represent this are already in the Unicode Standard, so the text in question can *already* be handled by the standard.
Trying to introduce a single, precomposed character to do this, instead, would just introduce normalization issues into the standard without actually increasing its ability to represent what you need to represent. As Peter has explained, a letter or a grapheme doesn't necessarily have a 1-to-1 relationship to the formal, abstract character encoded in the Unicode Standard for use in representing text. You had one example already: gb is a letter in Edo. That fact is important for education, for language learning, for sorting, and various other things. But that letter is represented by a sequence of *characters* already encoded in Unicode: 0067, 0062. Likewise, if you have an acute accented e with dot below, that may constitute a single accented letter in Edo, but it is represented by a sequence of *characters* already encoded in Unicode: 0065, 0323, 0301. These decisions regarding the underlying numbers representing these elements of text are *not* required to be surfaced up to the level of end users. Properly operating software supporting a particular language should present the alphabetic units and their behavior to users the way *they* expect them to work. The fact that Unicode systems haven't gotten there in many cases yet is an artifact of the enormous difficulty of getting computers to work for *all* the writing systems and languages of the world. People are working hard on the problem, but it is a *big* problem to solve. --Ken
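Ken's two examples can be verified with Python's `unicodedata` module; this is only a sketch of the standard normalization machinery, nothing Edo-specific:

```python
import unicodedata

# The letter gb is simply the character sequence <0067, 0062>:
assert '\u0067\u0062' == 'gb'

# e + combining dot below + combining acute: NFC composes what it can
# (e + dot below -> U+1EB9) and keeps the acute as a combining mark,
# because no fully precomposed character exists:
seq = '\u0065\u0323\u0301'
assert unicodedata.normalize('NFC', seq) == '\u1EB9\u0301'

# Canonical ordering puts the dot below (class 220) before the acute
# (class 230), whichever order the user happened to type them in:
assert unicodedata.normalize('NFD', '\u0065\u0301\u0323') == seq
```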
Re: Drumming them out
At 16:11 -0400 2004-05-04, [EMAIL PROTECTED] wrote: OTOH, I am quite ignorant of Egyptian demotic as mentioned in the Coptic proposal, but I am rather surprised to find that it's not on the Roadmaps anywhere. Is it unified with hieroglyphic? No. We don't know enough about its repertoire size. Finally, I have read the Coptic proposal (I missed the announcement of it, evidently) and praise it. One is gratified to hear it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Just if and where is the sense then?
Ken, I appreciate your detailed response, and Peter has also provided an insightful answer. It is a learning process and I am learning every day. Regards Dele Olawole - Original Message - From: Kenneth Whistler [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, May 05, 2004 2:38 AM Subject: Re: Just if and where is the sense then?
Re: New contribution
Peter Kirk wrote: On 03/05/2004 19:03, Michael Everson wrote: Wedding invitations are routinely set in Blackletter and Gaelic typefaces. I bet you 20 that if an ordinary Hebrew speaker sent out a wedding invitation in Palaeo-Hebrew no one would turn up on the day. And I bet you 20 that if an ordinary English speaker sent out a wedding invitation in Suetterlin no one would turn up on the day. Now we just need some gullible couples to put our challenges to the test! Well, it doesn't need to be a wedding invitation, does it? I'll give it a try; I've downloaded a Sütterlin font, and I'll type up a small document and see if I can get some English-readers to read it or recognize it. Even if they can't read it, I'll bet they can recognize it as Latin letters and possibly English, which was not so for Paleo-Hebrew and Hebrew. ~mark
Re: New contribution
Michael Everson wrote at 11:07 AM on Monday, May 3, 2004: If you think that a Hebrew Gemara, with its baroque and wonderful typographic richness, can be represented in a Phoenician font, then you might as well give up using Unicode and go back to 8859 font switching and font hacks for Indic. If you think that a Roman funerary inscription, with its stately and wonderful typographic formality, can be represented in a modern LED-inspired font, then ... Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
Re: New contribution
Michael Everson wrote at 8:19 AM on Monday, May 3, 2004: Phoenician script, on the other hand, is so different that its use renders a ritual scroll unclean. I'm just guessing that the same thing would be true for modern cursive Hebrew? Regardless, since when is the ritual uncleanness of fonts a trigger for encoding? Just do a Select All and change the font! Either way, pointed and cantillated text displayed in a Phoenician font is a JOKE at best. And not a very good one. The same could be said for accented archaic Greek - do you want to encode archaic Greek separately? Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
Re: Just if and where is the sense then?
African Oracle scripsit: Are we saying we have exhausted such necessity? Yes. And what are these legacy-standard encodings? Those devised by ISO, various national governments, IBM, Microsoft, and Apple, roughly speaking. No new composite values will be added. - Peter Constable The above sounds dictatorial in nature. It's a statement of fact about the current intentions of the Unicode Consortium. The time for new precomposed characters has passed. -- XQuery Blueberry DOM John Cowan Entity parser dot-com [EMAIL PROTECTED] Abstract schemata http://www.reutershealth.com XPointer errata http://www.ccil.org/~cowan Infoset Unicode BOM --Richard Tobin
Re: Drumming them out
Michael Everson wrote: At 10:00 -0700 2004-05-04, Peter Kirk wrote: Out of interest, are there any dictionaries e.g. of the Phoenician language which use both Phoenician and Hebrew script, with a plain text distinction? James Kass presented a non-dictionary text the other day. I considered it plain text. Others didn't. There is no such thing as plain text on paper. ~mark
Re: New contribution
Michael Everson wrote at 9:26 AM on Monday, May 3, 2004: If you people, after all of this discussion, can think that it is possible to print a newspaper article in the Hebrew language or Yiddish in Phoenician letters, then all I can say is that understanding of the fundamentals of script identity is at an all-time low. I'm really surprised. Is it possible to print a newspaper article using archaic Greek letters and have it still be legible to a modern Greek reader? If not, are you going to propose encoding archaic Greek separately? [As a reference, one could, for example, take a glance at the alphabetic chart you provide in figure 1 of your proposal.] Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
Re: New contribution
Mark E. Shoulson wrote: I'd be interested in such a building. Anyplace still using Phoenician script? Aside from the Samaritans, whose script has evolved some as well... Wow. Yes, Wow was exactly my reaction too. I've put some pictures up at http://www.smontagu.org/PalaeoHebrew/ It's interesting that the inscription uses modern Hebrew spelling conventions and writes with a mater lectionis, which it doesn't have in the Masoretic text of Kings. The glyphs look to my inexpert eye more like Moabite than Paleo-Hebrew, but of course it's a work of art rather than a scholarly presentation, and the sculptor may have chosen them from aesthetic considerations. Next time I'm there, I'll try asking some random passers-by what they think the script is. ;-) Simon
Re: Proposal to add QAMATS QATAN to the BMP of the UCS
Mark E. Shoulson scripsit: If it were possible to do this, couldn't we rearrange everything so that the points were NOT screwed up like they are? No. The numbers assigned to the various canonical combining classes are arbitrary so they can be renumbered, but which characters belong to which classes, and the order of the classes, are both immutable. Think RESEQUENCE from Basic. -- LEAR: Dost thou call me fool, boy? John Cowan FOOL: All thy other titles http://www.ccil.org/~cowan thou hast given away: [EMAIL PROTECTED] That thou wast born with. http://www.reutershealth.com
Re: Just if and where is the sense then?
So why can we have zillions of CJK code points and make a fuss about a few single code points that must be composed by ever-growing intelligent display software that is also supposed to run on all platforms? So why are we unifying all past and present Middle East scripts? Why are the few academics here taking up all the bandwidth in this group - how many messages has Mr. P.K. sent lately? So how come the majority of Polish people living abroad - let's say 40 million against 40 million living in Poland - are not able to use their native characters - also called 'ogonki' - in their e-mails? Just let us KISS gtx, Rein On Tue, 4 May 2004, African Oracle wrote: If a can have U+0061 and have a composite that is U+00e2...U+... If e can have U+0065 and have a composite that is U+00ea...U+... Then why can e with grave or acute accent and dot below not be assigned a single Unicode value instead of the combining sequence 1EB9 0301, etc.? Since Unicode is gradually becoming a de facto standard, I still think it will not be a bad idea to have such composite values. Dele Olawole
Re: Proposal to add QAMATS QATAN to the BMP of the UCS
[Original Message] From: Mark E. Shoulson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Date: 5/4/2004 7:49:45 PM Subject: Re: Proposal to add QAMATS QATAN to the BMP of the UCS Peter Kirk wrote: It would actually be possible, although I am not sure if it is useful, to rearrange all the fixed position classes to make a space for QAMATS QATAN next door to QAMATS. If it were possible to do this, couldn't we rearrange everything so that the points were NOT screwed up like they are? Depends on what you mean by screwed up. Let f(c) be a function that returns the current canonical combining class of character c. It is possible to change the classes so that the value would be returned by a new function g(c), where f(c) and g(c) are not equal for all values of c, but there are restrictions on how far the change could be made. In particular, for all characters x and y currently defined in Unicode, the following must be true: If f(x) < f(y) then g(x) < g(y). If f(x) = f(y) then g(x) = g(y). If f(x) > f(y) then g(x) > g(y). Basically, all this does is allow that if there were a need to give a character a class between the current 18 and 19, Unicode could, for example, add 1 to all of the classes that are 19 or greater and give the new character a class of 19. If Unicode allowed non-integral combining classes, it would be simpler to give the new character a class of 18.5.
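The order-preservation constraint is easy to state in code. Below is a Python sketch in which `shifted_class` is a purely hypothetical g(c) that opens a gap at class 19 (the slot between qamats, class 18, and holam, class 19, discussed for QAMATS QATAN); the check runs over the Hebrew points U+05B0..U+05BB, whose fixed-position classes span 10..20:

```python
import unicodedata
from itertools import combinations

def shifted_class(cc, new_slot=19):
    """Hypothetical g(c): shift every class >= new_slot up by one,
    leaving a gap at new_slot for a newly encoded character."""
    return cc + 1 if cc >= new_slot else cc

old = [unicodedata.combining(chr(cp)) for cp in range(0x05B0, 0x05BC)]
new = [shifted_class(cc) for cc in old]

# f(x) < f(y) iff g(x) < g(y), and equal classes stay equal:
for (a, x), (b, y) in combinations(zip(old, new), 2):
    assert (a < b) == (x < y) and (a == b) == (x == y)
```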
Re: New contribution
[EMAIL PROTECTED] wrote at 12:44 PM on Monday, May 3, 2004: Please take a look at the attached screen shot taken from: www.yahweh.org/publications/sny/sn09Chap.pdf If anyone can look at the text in the screen shot and honestly say that they do not believe that it should be possible to encode it as plain text, then the solution is obvious: We'll disagree. Why, because you want to be able to retain in a plain text encoding the larger font size in the heading The First Syllable 'Yah'? ;-) This whole document requires rich text. If I substituted modern cursive Hebrew letter forms for the Palaeo-Hebrew (to contrast them with the classical square Hebrew), would you want to encode those too? Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi
RE: New contribution
Peter Constable wrote at 8:58 AM on Tuesday, May 4, 2004: Ah, so the next protracted debate is going to be whether Samaritan should also be encoded using the existing square Hebrew characters. Since it would appear that the argument for unification of PH with Hebrew could also argue for unification of PH with Samaritan, or of all three. Correct. Samaritan, unlike Old Hebrew, which adopted Aramaic forms during and after the Babylonian exile, has retained the Phoenician/Canaanite forms. The main complication I see with encoding Samaritan, different from what we are currently discussing, is the reality of its still-living, long-preserved script and religious tradition. Respectfully, Dean A. Snyder Assistant Research Scholar Manager, Digital Hammurabi Project Computer Science Department Whiting School of Engineering 218C New Engineering Building 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 office: 410 516-6850 cell: 717 817-4897 www.jhu.edu/digitalhammurabi