Re: Response to Everson Phoenician and why June 7?
Peter Kirk wrote, The solution may be a catch-all, but the problem is a real one. Dr Kaufman's response makes it clear that to professionals in the field Everson's proposal is not just questionable but ridiculous. There is certainly some PR work to be done in this area, not name-calling. Does Dr. Kaufman speak for all professionals in the field, or would it be fair to say that Dr. Kaufman is speaking for only one such professional? Best regards, James Kass
Re: Response to Everson Phoenician and why June 7?
Peter Constable wrote, I'm sure even Youtie would go for this. Except that she's too busy writing new lyrics for Janis Joplin tunes. Ernest Cline wrote, ... This indicates to me that variation sequences are a potential solution that should be considered, even if it ends up being rejected in favor of disunification. In order for Phoenician to be disunified from Hebrew, it must first have been unified with Hebrew. This is not the case. (If anyone can cite from TUS any passage recommending that Phoenician text should be encoded using Hebrew characters, I'll stand corrected.) Variation sequences could be very helpful to distinguish variants in plain text. But, if every character in an entire text needs to have a corresponding variation selector in order for the text to render as expected, then that's a strong argument in favor of a separate encoding. Variation sequences could be used to distinguish glyph variants between Phoenician and neo-Punic, though, or even between one neo-Punic variant and another. If members of any discipline need such granularity in plain text, say epigraphers or numismatists, then they'll float a proposal and the proposal can be judged on its merits. Somebody: You should use graphics for such distinctions. Graphics aren't part of plain text. Somebody: Well then, you should just use mark-up. Neither is mark-up. Best regards, James Kass
Re: Multiple Directions (was: Re: Coptic/Greek (Re: Phoenician))
Philippe Verdy wrote, How can I get so much difference in Internet Explorer when rendering Ogham vertically (look at the truncated horizontal strokes), and is the absence of ligatures in Mongolian caused by lack of support in Internet Explorer or the version of the Code2000 font that I use (I thought I had the latest version)? The Ogham text shown in the graphic you attached is not from Code2000. Apparently, your browser is substituting Ogham glyphs from another font. The Mongolian positional variants which ligate well are not yet supported by released versions of Uniscribe (USP10.DLL), as far as I know. Best regards, James Kass
Re: ISO-15924 script nodes and UAX#24 script IDs
Philippe Verdy wrote, 140;Mnda;Mandaean;mandéen // Is it the same as Mende Kikakui Syllabic? Here's a good scan of the Mandaean alphabet: http://essenes.net/Nabc.htm It's not the same as Mende. Best regards, James Kass
RE: Archaic-Greek/Palaeo-Hebrew (was, interleaved ordering; was, Phoenician)
Jony Rosenne wrote, There is another option - to postpone the decision. If the question is controversial, and consent impossible to achieve, this is often the best choice. If it is impossible to achieve a consensus, it's disingenuous to suggest that a decision be postponed until an agreement is reached. Rather, if no consent is possible, it's pointless to postpone making a decision. Further, when everyone agrees, no decision is required. Suppose nobody celebrated the Sabbath until all of the World's religious experts agreed on the correct day of the week? Best regards, James Kass
Unicode fallback font
Around August of 2002 there was a discussion on this list about the possibility of having some kind of Unicode fall-back font which would have glyphs to display the hex code of any character. Bob Hallissy has just released such a font for the BMP. The font is now on-line at: http://scripts.sil.org/UnicodeBMPFallbackFont Best regards, James Kass
RE: interleaved ordering (was RE: Phoenician)
Dean A. Snyder wrote, The issue is not what we CAN do; the issue is what will we be FORCED to do that already happens right now by default in operating systems, Google, databases, etc. without any end user fiddling? That's the question. Since search engines like Google survive based on their ability to serve users' wants and find what users seek, why wouldn't Google make such a tailoring? I don't have any contacts at Google, so don't know who to ask. But, IMHO Google is one of the best search engines available. From observation, they seem to roll with the punches quite well. They seem to be first with multilingual and Unicode-based search capabilities, multilingual user interfaces, and they even have a beta translator which has given many hours of amusement. (Google interface in Hebrew, http://www.google.com/intl/iw/ ) Plus, they clearly *like* to be avant-garde, even if it takes a little extra work. (They also have user interfaces in Klingon and various other interesting languages. Although many of their language-based interfaces transliterate to Latin, one suspects that this is only because of the lack of widespread system support for many complex scripts, and that this will change when appropriate.) If giving Phoenician script and Hebrew script equivalence for searching purposes means that scholars can use their service to find what they want, it seems only natural that the good folks at Google would do the job right. Obviously for the statistically fewer custom applications we would write software. Although perhaps statistically fewer, it would seem to be just as obvious that the most useful applications in your work would be custom out of necessity. A custom application, for example, would allow the user to set a font for showing, say, cuneiform glyphs in the private use area to display custom file names. But, a default application might just substitute an inappropriate font willy-nilly. 
But it would seem that encoding defaults should mirror script-user defaults. Would it be fair to say that people who don't use the Phoenician script aren't members of its user community? Best regards, James Kass
RE: interleaved ordering (was RE: Phoenician)
Dean A. Snyder wrote, You only make a response regarding Google; but that is only one of the search engines; and it leaves issues with operating systems and database engines still unanswered. http://www.unicode.org/reports/tr10/#Tailoring The entire report contains much useful information about what a default collation table should and should not try to do. There are also handy examples illustrating that different users will have differing expectations even in the same script. Best regards, James Kass
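As a rough illustration of the kind of tailoring UTR #10 describes, a search engine or database could fold Phoenician letters onto their Hebrew counterparts before comparing. This is only a sketch: the block range U+10900..U+10915 is taken from the proposal under discussion, and the simple one-to-one mapping (the 22 Phoenician consonants onto the 22 Hebrew base letters, skipping the five Hebrew final forms) is an assumption made for illustration, not an endorsed tailoring.

```python
# Hebrew base letters U+05D0..U+05EA, minus the five final forms,
# leaves the 22 letters corresponding to the 22 Phoenician consonants.
FINALS = '\u05da\u05dd\u05df\u05e3\u05e5'
HEBREW = [c for c in map(chr, range(0x05D0, 0x05EB)) if c not in FINALS]

# Map each proposed Phoenician code point (U+10900 + i) onto the
# i-th Hebrew letter, for use as a fold before searching/sorting.
FOLD = {0x10900 + i: ord(h) for i, h in enumerate(HEBREW)}

def fold_key(s):
    # Produce a comparison key in which Phoenician and Hebrew
    # spellings of the same word become identical strings.
    return s.translate(FOLD)
```

With such a key, a query typed in Hebrew letters would match records stored with Phoenician code points, which is exactly the kind of script-equivalence tailoring discussed above.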
RE: interleaved ordering (was RE: Phoenician)
Dean A. Snyder asks, Why make something we do all the time more difficult and non-standard, when what we do now works very well? One thing to remember about default collation is that it's a default. It's only there when no other instructions exist. Another thing to remember about collation is that it's best when tailorable. Anyone wishing to sort anything will want to impose their own rules on the sort, and anyone who has done this in the past has already worked out a method for such imposition. If you're making a library database, do you want 1984 to sort under the digit 1, would you prefer that it be sorted under O for one, or would it be better if it sorted under N for nineteen? If the database is for biblios rather than books, you might prefer that the book title be sorted under M. If someone keys in nineteen eighty-four to a search box, and you want them to be able to find 1984 in your database, you will program for it. If you want Richard III to match with Richard the Third, a bit of extra work is required. If it's your purpose to set up a Hebrew script/Hebrew language database of Hebrew inscriptions, and the original script used in the inscription is irrelevant for your purposes, and you are importing data from multiple sources which may use alternate encodings, you will 'normalize' the data upon import. In this case 'normalize' would include converting the character set if necessary, transliterating/transcribing to Hebrew characters if necessary, stripping off points if they're present and not wanted, and so on. If you're importing data into a DSS Unicode database, and your source is using Web Hebrew or another ASCII-masquerade, then you're already performing normalization. If you're importing data originally entered in visual order rather than logical order, you're already normalizing.
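The 'stripping off points' step mentioned above is a few lines of code; here is a minimal sketch using Python's standard unicodedata module as a stand-in for whatever a real import pipeline would use:

```python
import unicodedata as ud

def strip_points(text):
    # Decompose, then drop all combining marks. Hebrew points and
    # cantillation marks have non-zero combining classes, so only
    # the base letters survive.
    decomposed = ud.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not ud.combining(ch))
```

For example, shin plus qamats (U+05E9 U+05B8) reduces to bare shin, giving a points-insensitive key for searching.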
If your database includes a field to indicate the original script, here presuming that the original script is of some interest, and you want to export something, you'll either export it as Hebrew text, or you'll 'normalize' it back into the original script on export. Either way, it's about as hard to program for as allowing for differences in case, like TROLL vs. troll. And, in either case, it should be done by the tools and trivial to the users, although any application which doesn't allow the user to set preferences and make rules in such an instance is next to worthless. Best regards, James Kass
RE: Phoenician
Peter Constable wrote, Of things already in Unicode, what have been boundary cases between unification and de-unification? Canadian Aboriginal Syllabics? Old Italic? Best regards, James Kass
Re: Phoenician
The author of the web site A Bequest Unearthed, Phoenicia ( http://phoenicia.org ) has kindly given permission for his response to a request for comments on the Phoenician proposal to be forwarded to Unicode's public list. Best regards, James Kass, forwarded message follows... Hello James, Thank you for visiting A Bequest Unearthed, Phoenicia and for taking the time to write such a kind yet very important message. I am indebted to you for having alerted me to this bit of information. I was aware that the proposal was underway though I had never had a chance to read it. Further, I was unaware of the attempt to smother Phoenician script by not allowing it to have its unique and separate Unicode identity. No one can deny that the modern Hebrew script is very useful in dealing with Phoenician script in the computer world. However, Hebrew is not the only medium script-wise which can be useful for Phoenician; in fact, Aramaic script as well as its Syriac branch are useful too. Many scholars find western Aramaic to be relatively modern Phoenician. Further, as far as I am concerned, I find it much easier to read Phoenician using the Phoenician script than to read it using Hebrew. I cannot recognize all the Hebrew characters while I can easily see Latin characters in the Phoenician alphabet. With due respect to Hebrew, I believe that it must not substitute for Phoenician in the computer medium. Phoenician Canaanite is separate, unique and independent of any language, despite its similarities with many ancient languages of the Middle East. I believe one of the strongest points made in the proposal is this: Phoenician is quintessentially illustrative of the historical problem of where to draw lines in an evolutionary tree of continuously changing scripts in use over thousands of years. 
The twenty-two letters in the Phoenician block may be used, with appropriate font changes, to express Punic, Neo-Punic, Phoenician proper, Late Phoenician cursive, Phoenician papyrus, Siloam Hebrew, Hebrew seals, Ammonite, Moabite, and Palaeo-Hebrew. The historical cut that has been made here considers the line from Phoenician to Punic to represent a single continuous branch of script evolution. The objection and use of Hebrew instead of the Phoenician script reminds me of the problem Champollion was faced with when he was trying to decipher Egyptian Hieroglyphics. He had access to the Coptic language which is the closest to ancient Egyptian. However, at some point in time, Coptic books were no longer written in Egyptian Hieroglyphics but in Greek; therefore, Egyptian was forgotten as a written medium. Refusing to encode Phoenician and using Hebrew is an intellectual crime against the Phoenician heritage and history which I very strongly condemn. I have already planned and started to contact my colleagues in the Aramaic, Coptic and Syriac computer community to lobby their support in approving the unicoding of the Phoenician script. Regretfully, I am not experienced or seasoned in the machinations of lobbying support among scholars of this field but I will do my best so to do, thanks to you. My site, a labor of love for preserving and disseminating information about my heritage, is continuously growing with new materials as time permits. Kind regards, Salim* George Khalaf, Byzantine Phoenician Descendent * perhaps from Shalim, Phoenician god of dusk A Bequest Unearthed, Phoenicia: Encyclopedia Phoeniciana http://phoenicia.org Center for Phoenician Studies Chapel Hill, NC USA Greetings, Your wonderful web site is keeping me on-line! Thank you so much for making all of this information available on the World wide web. 
There's currently a proposal before ISO/Unicode to encode the ancient Phoenician script so that it can have a unique range in the World's standard for the computer encoding of text. Interested scholars and users are invited to review this proposal and comment upon its merits. Some scholars have raised objections to this proposal, arguing that the ancient Phoenician writings should be encoded on computers using the modern Hebrew script range, and that Phoenician writing doesn't need its own computer encoding range because there is no need to distinguish between modern Hebrew writing and ancient Phoenician writing in computer plain text. There has been a lively discussion about this on the Unicode public mailing list recently. The author of the proposal has said that the proposal will be revised. This is why it is important that scholars and other users voice their opinions and why I am writing you. If you have any opinions about this and would like to respond, your response would be most welcome and would be forwarded to the responsible people. If you know of anyone interested who would like to offer an opinion, please feel free to forward this message along. The current proposal is on-line in PDF format at:
(OT) Sailing Greeks (was Re: New contribution)
Dean Snyder wrote, 2 Greeks are better sailors. Evidence supporting this can be seen here: http://www.greekshops.com/images/ChildrensVideoDVD/popayvideo.jpg It was a troll. And a good one! Best regards, James Kass
Re: Phoenician
Elaine Keown wrote, Hardly. If the rest of you hadn't agreed with his judgments most of the time, the Roadmap might look quite different. It's more like Potter Stewart on pornography. Who's Potter Stewart? (I don't own a TV). - Elaine Potter Stewart doesn't get on TV much these days. A while ago, when asked to define pornography (or, possibly it was obscenity?) his response was something like, 'I can't define it, but I know it when I see it'. So, his expert supporters could conclude from this that Potter Stewart was a just and righteous person who spoke the truth with conviction. Experts from the opposition, however, could infer that Potter Stewart must've seen a lot of pornography in order to be such an expert on distinguishing it. The above merely to illustrate that experts of any persuasion seldom agree on everything; if they did, they couldn't be contentious. Best regards, James Kass
Re: Nice to join this forum....
James Kass wrote, Enter the marks above (tone marks) first, then enter marks below. My error. Enter either the marks below first or the marks above first. It's equivalent and the display is supposed to be the same either way. There was a problem with the font here... The inside-out rule on page 125 (TUS 4.0) shows above marks coming before below marks in Figure 5-7. Canonical ordering (TUS 4.0, p. 84) would reverse this. Best regards, James Kass
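The equivalence claimed above can be checked directly with Python's standard unicodedata module; a minimal sketch using acute (above, combining class 230) and dot below (class 220):

```python
import unicodedata as ud

# Dot below has combining class 220; acute above has 230.
assert ud.combining('\u0323') == 220
assert ud.combining('\u0301') == 230

# Canonical ordering sorts marks of different classes by class,
# so the below mark ends up before the above mark regardless of
# which order they were typed in:
assert ud.normalize('NFD', 'e\u0301\u0323') == 'e\u0323\u0301'
assert ud.normalize('NFD', 'e\u0323\u0301') == 'e\u0323\u0301'
```

Since both typing orders are canonically equivalent, a conformant renderer must display them identically, which is the point made in the correction.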
Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)
Raymond Mercier wrote, Isn't it the other way round? I attach a file with three characters all in UTF8, representing CJK(A), CJK and CJK(B). The CJK(A) displays in IE6 only if <span lang=ZH>...</span> is included, but it *does* handle the CJK(B) without any reference to lang. In Mozilla all three display without the lang=ZH Well, I tested here before writing you privately. I've never been able to get IE6 to show non-BMP text encoded as UTF-8. And, I've never had a problem getting IE6 to show CJK-A in UTF-8. I attach a file with two lines of CJK characters. The first line is CJK-A, the second line is CJK-B. It's just a simple test file. Here, the first line displays just fine, the second doesn't. The second line won't display even with a FONT FACE inserted. Also attached is a small gif showing the HTML source as it appears in NotePad. So, some Windows apps *can* display CJK-B in UTF-8, but, AFAICT, IE6 cannot. Of course to see the CJK(B) you need the font Simsun (Founder Extended). I don't have the Founder Extended SimSun font, though. Best regards, James Kass 㓀㓁㓂㓃㓄㓅㓆㓇㓈㓉㓊㓋㓌㓍㓎㓏 cjka_b.gif
Re: CJK(B) and IE6 (was Re: Philippe's Management...)
(Many thanks to Raymond Mercier who has helped me resolve the display problem here with CJK-B, UTF-8, and MSIE6.) I just got the UTF-8 CJK-B in my test page to display in IE6. Here's how. The registry setting for Windows XP allows for a default font for the BMP, a different font for Plane One, a different font for Plane Two, and so forth. NotePad (and, presumably other Windows apps) use this registry setting for font switching in plain text. The browser does not seem to use this particular registry setting. The registry settings for Internet Explorer only allow for one font for surrogates. Naturally, I had that setting as Code2001, which only tries to cover Plane One. So, what I did was add the CJK-B font as the IEFixedFontName value in the appropriate registry setting, then put PRE tags around the CJK-B text. Voila! (Of course, I could have made the CJK-B font be the IEProp... font name, then it would work without the PRE tags, but I want to have the 'best of both worlds', so kept Code2001 as the proportional surrogate font.) So, IE6 *will* display Plane Two material in UTF-8. IE6 will not display Plane One material in UTF-8. That's the bug. (NotePad does display both Plane One and Plane Two UTF-8 text.) (If you're on Windows, and want to tweak your registry for supplementary character support, this page by Tex Texin will help... http://www.i18nguy.com/surrogates.html ) Best regards, James Kass
Re: Arid Canaanite Wasteland
Peter Kirk wrote, That might help, but living users are better than ones long dead. If you ask us to dig up members of a dead script's user community, it shouldn't surprise if we use a shovel. Best regards, James Kass
Re: New contribution
- Original Message - From: D. Starner [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, May 03, 2004 9:37 PM Subject: Re: New contribution A possible question to ask which is blatantly leading would be: Would you have any objections if your bibliographic database application suddenly began displaying all of your Hebrew book titles using the palaeo-Hebrew script rather than the modern Hebrew script and the only way to correct the problem would be to procure and install a new font? Again, change Hebrew to Latin and palaeo-Hebrew to Fraktur and see how many objections you get. Again, no, you can't use archaic forms of letters in many situations, but that doesn't mean they aren't unified with the modern forms of letters. No one would have to procure and install a new font, because Arial/Helvetica/FreeSans/misc-fixed have the modern form of Hebrew and will always have the modern form of Hebrew and all other scripts that have a modern form. I mean, maybe you're right and Phoenician has glyph forms too far from Hebrew's to be useful, and it's connected with Syriac and Greek as much as Hebrew, but this argument just doesn't fly. It was only a contrived example of a leading question devised to elicit a pre-determined specific response and was intended to be mildly funny. It was offered in response to a question proposed by John Hudson, which, although not exactly leading, I considered unfair. Yes, it's pretty far-fetched. But, your response supposes that bibliographic databases are always displayed in a fixed-width font. I have a bibliographic database which can display UTF-8 material in a proportional font. It works by exporting a record (or, group of records) in HTML format as a separate file and firing up the browser with this on-the-fly page loaded. Since the database application is stone-age, it has no awareness of anything as exotic as character sets. 
So, in order to edit these UTF-8 records, a record is exported in plain text format and my application fires up BabelPad, then re-imports to the database from the altered text file. This is a poor man's Unicode enabled multilingual database. Yeah, it's kludgey, but it sure does work! Suppose that, 1) Phoenician is unified with Hebrew. 2) A user has a bibliographic database which uses FreeSans. 3) The FreeSans developer is a Phoenician script enthusiast who removes the Hebrew glyphs from the font and replaces them with Phoenician glyphs. 4) The user updates FreeSans on the system and fails to make a back-up copy of the font. 5) Meanwhile, the FreeSans developer has pulled all of the previous editions of FreeSans off the internet... Hey, it *could* happen! (Yeah, and pigs could learn to fly.) Best regards, James Kass
Re:CJK(B) and IE6
Raymond Mercier wrote, BabelPad is great, but it chokes in converting all the UTF8 in unihan.txt to NCR at one go. I wrote a dedicated program to do that. Yes, when the commas in UNIHAN.TXT were being globally replaced with middle dots here, BabelPad stopped responding. But then, Andrew wrote to the list with a tip about the undo/redo feature. (Just in time, I was going to write a dedicated program.) When making global changes in such a large file: Options - Advanced Options - (Edit Options) - make sure the box for Enable Undo/Redo is not checked. Best regards, James Kass
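A dedicated converter like the one Raymond mentions is short to write; here is a hedged sketch (not his program) that turns every non-ASCII character into an XML/HTML numeric character reference (NCR):

```python
def to_ncr(text):
    # Replace each non-ASCII character with a hexadecimal numeric
    # character reference, e.g. U+4E2D becomes '&#x4E2D;'.
    # Supplementary-plane characters come out as five-digit
    # references such as '&#x20000;'.
    return ''.join(ch if ord(ch) < 0x80 else '&#x%04X;' % ord(ch)
                   for ch in text)
```

Run over a file read as UTF-8, this produces pure-ASCII output that any browser of the era could handle without font-linking surprises.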
Re: New contribution
Please take a look at the attached screen shot taken from: www.yahweh.org/publications/sny/sn09Chap.pdf If anyone can look at the text in the screen shot and honestly say that they do not believe that it should be possible to encode it as plain text, then the solution is obvious: We'll disagree. Best regards, James Kass tetra.gif
Re: Nice to join this forum....
Dele Olawole wrote, That is what I have said that gb is a letter, a single letter and not combination of letter. Look at this statement - Gbogbo awon are GB ti de. - All people from Great Britain have arrived. Going further to be a bit funny I can say Great Britain o great britain o awon ara Great Britain ti de. Mo gbó̩ Òyìnbó. (My e-mailer doesn't tag outgoing messages as UTF-8, so some people have to manually select UTF-8 encoding in their e-mail display if they want to see it.) Unicode considers such combinations of letters to be presentation forms of letters which are already covered in the Unicode Standard. Although for the Yoruba language, the gb digraph is treated as a single letter, for computer encoding it is a string of two characters, g plus b. I do not know what you were trying to say concerning the letter g - What about gangan, ganganran, gongo, gogongo, gudugudu and etc Since I do not know what you were trying to say, I will stop there. Philippe Verdy had commented on putting a mark under the letter g, and I only said that Yoruba doesn't use any marks with the letter g. I chose the 3rd options and that makes Ariya the best Yoruba fonts available today. It is exciting to know that you are making good fonts for Yoruba! Do you have any examples on-line? Best regards, James Kass
Re: Nice to join this forum....
Dele Olawole wrote, Here are few Yoruba alphabets which might not be new to you, so how can you equate G+B with GB even if you claimed it has significant. How significant is significant? A B D E E F G GB Please take a moment to visit this page: http://www.unicode.org/standard/where/ Notice that the ch digraph as used in Slovak (and Spanish) is simply encoded as U+0063 plus U+0068. For more details on characters versus glyphs, www.unicode.org/versions/Unicode4.0.0/ch02.pdf and http://www.unicode.org/reports/tr17/#Characters vs. Glyphs Best regards, James Kass
Re: Nice to join this forum....
Asmus Freytag wrote, This is only true if: a) there is no visual differentiation There is no visual differentiation in any of the examples I've ever seen. I would like to see a (small) picture of Yoruba text with these digraphs. I sent a small picture off-list taken from this on-line PDF: http://www.learnyoruba.com/ORTHOGRAPHY_1.pdf Wondering about casing when the gb digraph appears word-initially: I have a booklet for learning Yoruba which includes the proper name of the Rt. Rev. Isaac Gbekeleoluwa Abiodun Jadesimi in the bilingual dedication. In both the Yoruba and English versions of the dedication, only the letter G in Gbekeleoluwa is in upper case. Best regards, James Kass
Re: New contribution
John Hudson wrote, Again, I'm not opposing the encoding of 'Phoenician' on principle, but I do think it is more complex than Michael's proposal presumes, and that more consultation with potential users is desirable. I think one of the questions asked should be, frankly: Do you have any objections to encoding text in the Phoenician / Old Canaanite letters using existing 'Hebrew' characters? If so, what are these objections? That question only slightly misses being a 'leading question'. The easiest answer for the respondent is No, as then no further explanation on the respondent's part is necessary. Furthermore, if we are to believe the allegations about these users, they are already performing this reprehensible practice, and so have apparently surmounted any objections they might have once held. A possible question to ask which is blatantly leading would be: Would you have any objections if your bibliographic database application suddenly began displaying all of your Hebrew book titles using the palaeo-Hebrew script rather than the modern Hebrew script and the only way to correct the problem would be to procure and install a new font? A fairer question to ask might be: Would you have any objections if the Phoenician script were given a separate encoding in the Unicode Standard as long as such an encoding wouldn't interfere with your ability to continue encoding texts as you please? (And to the last, I'd be tempted to add: If so, what on Earth could those objections be?) Best regards, James Kass
Re: New contribution
John Hudson wrote, That said, I am very glad that Ms Anderson's further questions encourage users to review the Phoenician proposal and to comment on its merits. Encouraging users to review the proposal and comment on its merits strikes me as a fairer approach than the questions you and I have constructed. Best regards, James Kass
Re: New contribution
John Cowan wrote, (And to the last, I'd be tempted to add: If so, what on Earth could those objections be?) Expense. Complication. Delays while the encoding gets into the Standard and thence into popular operating systems, with all the accoutrements such as keyboard software. Those objections are quite generic and could be made just as well for N'ko, Ol Cemet', Egyptian Hieroglyphics, &c. While those objections might be voiced by actual users, none of those objections should impact the decision making process. Best regards, James Kass
Re: Nice to join this forum....
Philippe Verdy wrote, From: D. Starner [EMAIL PROTECTED] Unicode will not allocate any more codes for characters that can be made precomposed, as it would disrupt normalization. But what about characters that may theoretically be composed with combining sequences, but almost always fail to be represented successfully? Likewise. If such a ligature has a distinct semantic from a ligature created by ligaturing separate letters for presentation purposes, the character is not a ligature (the AE and OE ligated glyphs are distinct abstract characters). The gb combination mentioned in the original post is considered a letter in the Yoruba alphabet. It is not a ligature, it is a digraph. Likewise, in the Spanish alphabet, the ll combination is considered a letter. It is also a digraph. Both of these combinations are already handled by ASCII. (Note that the AE and OE ligated glyphs *are* ligatures.) The case of dot below however should be handled in fonts by proper glyph positioning and probably not by new assigned codepoints, unless this is only one possible presentation form for an actual distinct abstract character that may have other forms without this separate diacritic (for example if g with dot below was only one presentation for an abstract character that may be also rendered with a small gamma) Yoruba doesn't use any marks with the letter g. It does use some diacritics like acute, grave, and macron to indicate tones. It also uses a mark below the letters e, o, and s which alters the pronunciation of those letters. This is where there remains some controversy. One faction prefers the use of a vertical line below which should attach to the base letter, and the other faction prefers to use the dot below. Best regards, James Kass
Re: Nice to join this forum....
Dele Olawole wrote, Ẹ ́ the accent is at the edge of the E with dot below - It is the same no matter which font is used On this Ọ̀ it almost fell off éẹ́èẹ̀ - On all these ones they are not on the same level One reason that it displays badly is because it is encoded wrong. In the first example, you have E plus dot below plus space plus combining acute. This should be E plus combining acute + dot below. Likewise, the encoding is wrong for the other examples. Ẹ́ or, more properly (depending upon point of view) É̩ (Both of these display perfectly well here. If they do not display well there, then, assuming that the UTF-8 text survived transmission, either your system lacks a proper font, or your system does not support complex script shaping for the Latin script. In either case, this is beyond the scope of Unicode and is considered a display issue.) Enter the marks above (tone marks) first, then enter marks below. Don't use spaces between base letters and marks, that breaks the complex shaping. For display issue problems, please see: http://www.unicode.org/help/display_problems.html Best regards, James Kass
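The damage done by the stray space, and the equivalence of the two mark orders once the space is removed, can both be demonstrated with Python's unicodedata module (a small sketch; the character choices mirror the E-with-dot-below-and-acute example above):

```python
import unicodedata as ud

wrong = 'E\u0323 \u0301'   # dot below, then a SPACE, then acute:
                           # the acute now applies to the space
right = 'E\u0323\u0301'    # one unbroken combining sequence on E

# Normalization reorders marks within one combining sequence, but
# it cannot remove the intervening space, so the two strings stay
# distinct and will render differently:
assert ud.normalize('NFC', wrong) != ud.normalize('NFC', right)

# Without the space, both typing orders are canonically equivalent:
assert (ud.normalize('NFC', 'E\u0301\u0323')
        == ud.normalize('NFC', 'E\u0323\u0301'))
```

This is why the advice is to put combining marks directly after the base letter with no spaces in between: the spaces break the combining sequence in a way no later processing can repair.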
Re: Arid Canaanite Wasteland
D. Starner wrote, And there are sites that consider Gaelic and Fraktur separate scripts, including one by Michael Everson. Even if we assume knowledge and competence, we still can't assume they're using the same definition for a separate script as Unicode does. I agree with the second statement above, but would like to see the link to the Everson page(s) mentioned. Sure, there are people who consider Roman and Italic to be separate scripts, too. When someone requests evidence of how users treat something, we just try to find that evidence and factor it in accordingly. Imagine going back in time ten years or so and approaching the user community with the concept of a double-byte character encoding system which could be used to store and transfer electronic data in a standard fashion. If they'd responded to this notion by indicating that their needs were already being well-served by web-Hebrew, would the Unicode project have been scrapped? Yes. How many millions of dollars have gone into defining and implementing Unicode? Do you honestly think that Microsoft and IBM and Apple would have spent all the money they have if their users were well-served by what you call web-Hebrew? I don't think that the users were well-served by what is called web Hebrew and never said I did. Web Hebrew is a standard which involves what we now call the masquerading of Hebrew characters as upper-ASCII. Web Hebrew AD and Web Hebrew Monospace are the names of TrueType fonts. Other fonts use the same masquerade, thus it was an ad-hoc standard. http://www.brijnet.org/ivrit/webheb.htm http://www.stanford.edu/~nadav/hebrew.html http://www.jewfaq.org/alephbet.htm ... and many other pages give info about Web Hebrew. Quoting from the jewfaq page, The example of pointed text above uses Snuit's Web Hebrew AD font. These Hebrew fonts map to ASCII 224-250, high ASCII characters which are not normally available on the keyboard, but this is the mapping that most Hebrew websites use. 
I'm not sure how you use those characters on a Mac. In Windows, you can go to ... So now if you think that two scripts that are isomorphic and closely related should be unified, then you're exerting political pressure? Since no rational basis for the heated objections to the proposal seems apparent, political pressure appears to be a likely choice. Best regards, James Kass
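For anyone curious what the web-Hebrew masquerade amounts to in code, here is a minimal sketch. It assumes only the mapping quoted above (letter glyphs at byte positions 224-250, the same slots ISO 8859-8 assigns to alef through tav); the function name is mine, and note that real web-Hebrew pages store the text in visual order, which this sketch deliberately does not reorder.

```python
# Sketch: converting "web Hebrew" masqueraded bytes to real Unicode Hebrew.
# Assumes the published mapping quoted above: bytes 224-250 carry the
# letters alef (U+05D0) through tav (U+05EA), as in ISO 8859-8.
# Visual-to-logical reordering of the RTL text is left out.

def web_hebrew_to_unicode(data: bytes) -> str:
    out = []
    for b in data:
        if 224 <= b <= 250:
            out.append(chr(0x05D0 + (b - 224)))  # shift into the Hebrew block
        else:
            out.append(chr(b))                   # pass ASCII through untouched
    return ''.join(out)
```

Byte 224, which a Web Hebrew font would draw as alef, comes back as U+05D0.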
Re: New contribution
John Hudson wrote, This is a silly question, because the whole debate is about what constitutes 'properly encoded'. The Mesha Stele can be perfectly easily encoded using existing Hebrew codepoints and displayed in the Phoenician style with appropriate glyphs. I'm not saying that this is necessarily the best encoding for the Mesha Stele, but I'm certainly not convinced that there is anything improper about it, or that having a separate encoding for those glyphs would be more proper. There's nothing improper about transliteration. Likewise, the Phoenician inscription of Edessa in Macedonia could be easily encoded using existing Hebrew code points, even though its language is Greek. If one wanted to go through the trouble of setting up OpenType tables accordingly (to point to redundant glyphs mapped with positional variants to compensate for default shaping behaviour), the Mesha Stele could probably be easily encoded using existing Arabic code points, as well. Best regards, James Kass
Re: New contribution
John Hudson wrote, Again, you are missing the point because you are *assuming* that encoding the Mesha Stele with Unicode Hebrew characters = transliteration, i.e. that there is some other encoding that is more proper or even 'true'. The contra-argument is that the 'Phoenician' script is identical to the Hebrew script, the differences in letterforms being merely glyphic variants. The contra-argument disagrees with your premise that encoding the Mesha Stele with Hebrew characters is transliteration. You can't proceed past that argument simply by restating your premise. (The ISP here is line-breaking John's text inappropriately.) The Mesha Stele and the inscription of Edessa were originally written in the same script. If encoding the Edessa inscription using the Hebrew range would be transliteration, then so would the encoding of the Mesha Stele in the Hebrew range. If Phoenician is considered a glyphic variation of modern Hebrew, then it can also be considered a glyphic variation of modern Greek. Would it then follow that modern Greek should have been unified with modern Hebrew? (Directionality aside.) If Unicode were about encoding languages rather than scripts, then I would see nothing wrong with encoding the Mesha Stele using modern Hebrew characters and relegating correct display to a font switch. Best regards, James Kass
Re: For Phoenician
Peter Kirk wrote, This pedagogical usage is not in plain text, or at least plain text usage has not been demonstrated. I think I asked before and didn't receive an answer: should Unicode encode a script whose ONLY demonstrated usage is in alphabet charts? I think the answer is no, because essentially these charts are graphics of glyphs, not text. I wonder where the folks who made those charts got those glyphs? Best regards, James Kass
Re: New contribution
Peter Kirk wrote, This is based on a historically unproven assumption that this script originated with the Phoenicians. I don't think it's even true that the oldest surviving texts in this script are Phoenician. Would the oldest surviving texts in the Phoenician script be in a script other than Phoenician? The Mesha Stele (otherwise known as the Moabite Stone) is already available in Hebrew script. What is the need for a separate encoding of the same text? There are probably other transliterations of the text already available, too, such as Latin. Wouldn't it be nice to see the inscription displayed in its original script, properly encoded? Yes, this is what I have been talking about, mostly. Sorry to everyone for not making this clear. I take it as self-evident that a Phoenician etc text to be presented (transliterated if you like) with square Hebrew glyphs should be encoded with the Unicode Hebrew characters. What is in dispute is how a text to be presented with Phoenician or Old Canaanite glyphs should be encoded. If the current proposal isn't derailed and eventually accepted, then such a text should be encoded with Phoenician characters because texts should be encoded in the scripts in which they were written unless transliteration is the goal. If the current proposal is derailed, then such a text should be encoded in the PUA. Best regards, James Kass
Re: Arid Canaanite Wasteland (was: Re: New contribution)
- Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Kenneth Whistler [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Saturday, May 01, 2004 9:43 AM Subject: Re: Arid Canaanite Wasteland (was: Re: New contribution) Peter Kirk wrote, Understood. But on the other hand, the lack of a consensus among *any* people that they have a need for an encoding does seem to imply that there is no need for an encoding. I have yet to see ANY EVIDENCE AT ALL that ANYONE AT ALL has a need for this encoding. So I am asking simply that the proposer demonstrates that there is SOME community of users who actually have a need for this encoding, for plain text rather than graphics. I have asked for this over several months. The new proposal not only fails to demonstrate this, it indicates that the proposer has not even attempted to find any such community of users, because he admits to not contacting any user community. Let's find out how some actual users in the user community deal with this controversial issue. Googling for palaeo-Hebrew brings us this... http://ebionite.org/fonts.htm ... (it's the second or third hit, depending on how you count) a web site all about fonts and how they can be used to render Hebrew text on our computers. The Evyoni web site uses the good old symbol font to depict the occasional Greek glyph. Quoting from the page: We also use a font that uses upper ASCII to show Hebrew in the same manner as Web Hebrew fonts (with the same character assignments) but with added features. Included in the font is transliteration symbols for Hebrew in two schemes to make it backwards compatible with our first special font we used on our sites. And instead of using the square script used to represent Hebrew today and over the last few millennia, we use Palaeo-Hebrew script. Palaeo-Hebrew has been used in the past to archaize, that is, to preserve a link to an earlier state of things. 
That is, after all, what we are about, so Palaeo is the perfect script for us to use. (Note that this site considers Palaeo a separate script; this is quite clear in the paragraph quoted above.) some flippancy What a simple solution, using upper-ASCII for non-Latin glyph display. Why, with that novel approach, we could set up our computers to handle all kinds of script changes by simply changing the font-in-use to something different! Let's clean up our act and get in on this bandwagon. We could start with so-called Linear-B. That's just palaeo-Greek, if one prefers not to refer to the script of the Greeks as Linear, for whatever reason. So, we can deprecate the entire Linear-B range and put notes in the Standard explaining how Linear-B is actually a glyph or font variant of Greek. While we're at it, we can do Coptic the same way, by gosh. Shoot, if we use that clever upper-ASCII method delineated above, we can deprecate the Greek range, too. end flippancy Their home page has a graphic of Hebrew script surrounding a Menorah, a graphic showing Latin script with diacritics, and a graphic showing good, old palaeo-Hebrew. Let's move on to another web page, http://www.fossilizedcustoms.com/critic.html ...where the author has been criticized for his choice of using palaeo-Hebrew characters and is responding... Lew: YHWH Elohim used palaeo-Hebrew to write the Torah in the stone tablets, so I stand on my choice of characters with Him. In fact, most of the prophets wrote in the archaic, primary Hebrew; it was only during the Babylonian Captivity that the Yahudim took the Babylonian Hebrew characters on -- Belshatstsar needed Daniel to read this outlandish and ridiculous script, because the Babylonians knew nothing of it. Mosheh, Abraham, Enoch, Dawid, Shlomoh -- these men could not read modern Hebrew; they used that outlandish and ridiculous palaeo-Hebrew script. 
The Great Scroll of Isaiah (YeshaYahu) is a copy of the original, and it is on display in the Shrine of the Book Museum in Yerushaliyim -- the Name is preserved in its original outlandish and ridiculous palaeo-Hebrew script, while the rest of the text is in modern Hebrew. Another user heard from who apparently regards Phoenician and Hebrew as different scripts. Let's move on again to... www.yahweh.org/publications/sny/sn02Chap.pdf ... this PDF which doesn't need to be downloaded because we can see all we need in the Google blurb: ... In most cases he will come across a notation that the personal name Yahweh ( hwhy in palaeo-Hebrew and hwhy in Aramaic script) has M ... It's obvious that the good people at yahweh.org aren't complying with the upper-ASCII method for displaying non-Latin text in their PDF; apparently considering that both palaeo-Hebrew and Aramaic script can best be encoded with regular ASCII. Moving on, http://www.geocities.com/stojangr/transliterating___the___ancient.htm (Sorry, it's geocities.) ... here's a page all about the Phoenician inscription of Edessa in
Re: New contribution
Simon Montagu wrote, This misses the point. The question is whether the oldest surviving texts in the Phoenician script were written by Phoenicians. The fact that it's called Phoenician script doesn't prove anything about its origin: it may be analogous to the term Arabic numbers, which are Indian in origin but reached Europe via the Arabs. It's an interesting point, and I got it. Since we're all discussing scripts and script encoding, and since Peter Kirk had written, I don't think it's even true that the oldest surviving texts in this script are Phoenician, without specifying that he meant '...in this script are in the Phoenician *language*', I was only having a bit of fun with his wording. While the fact that it's called Phoenician script doesn't prove anything about its origin, it might be considered indicative of the path through which the script was borrowed. Best regards, James Kass
Re: CJK(B) and IE6
The lack of support for supplementary characters expressed in UTF-8 in Internet Explorer is a bug. As Philippe Verdy mentions, the Mozilla browser does not have this same bug. Also it should be noted that the Opera browser handles non-BMP UTF-8 just fine. While working with NCRs may be an ugly nightmare, there are some shortcuts. The BabelPad editor can easily convert between UTF-8 and NCRs. Also, even though Internet Explorer doesn't display the material, it doesn't destroy the encoded text, either. It can be copy/pasted from the browser window into any aware application and retain its content. The Internet Explorer browser itself can convert between UTF-8 and NCR encoding forms with the File - Save As command. The Windows registry settings allow a default font to be specified for any plane. I have one font set for Plane One and a different font set for Plane Two in my registry, and Windows seems to handle this well. (Except for the UTF-8 bug in Internet Explorer.) Note also that it is possible to set a font other than the default font for displaying non-BMP text, just as it's possible to change the font in an HTML file, either with CSS or with font-face/family tags. The registry settings should only supply the default; in other words, they apply if the application or mark-up has not specified another font. I *think* that Windows 2000 always uses Unicode internally and uses an internal conversion chart if material is non-Unicode like GB-18030. As far as I know, this means that GB-18030 support on Win2000 would be limited to Unicode's BMP unless the special registry settings were made. But, I could be wrong on this. Since GB-18030 is important to many, it's very possible that Microsoft already made allowances for this. Best regards, James Kass
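As a rough illustration of the UTF-8/NCR round trip described above (the function names here are mine, not BabelPad's or Internet Explorer's):

```python
# Sketch: the UTF-8 <-> NCR shortcut described above. A browser that
# mishandles supplementary characters in UTF-8 can often still render
# the same characters written as decimal numeric character references.
import re

def to_ncrs(text: str) -> str:
    """Replace each supplementary-plane character with a decimal NCR."""
    return ''.join(f'&#{ord(c)};' if ord(c) > 0xFFFF else c for c in text)

def from_ncrs(text: str) -> str:
    """Turn decimal NCRs back into literal characters."""
    return re.sub(r'&#(\d+);', lambda m: chr(int(m.group(1))), text)

sample = 'A' + chr(0x20000)   # a BMP letter plus a Plane Two ideograph
# to_ncrs(sample) leaves 'A' alone and writes the ideograph as &#131072;
```

Since the BMP characters pass through untouched, only the supplementary characters that trigger the bug end up as references.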
Re: Public Review Issues Updated
Kenneth Whistler wrote, What nobody seems to have noticed yet is that in that same document, Rev. J. Owen Dorsey also used an uppercase turned T (the capital letter form of U+0287 LATIN SMALL LETTER TURNED T, which also appears in this text). Those turned t's were used in Dorsey's orthography of Omaha and Ponca texts. Turned upper case T is also used in Fraser script. (Daniels & Bright, page 582) Best regards, James Kass
Re: New contribution
Dean Snyder wrote, 1) The script is wrongly called Phoenician - the same script was used for Old Phoenician, Old Aramaic, Old Hebrew, Moabite, Ammonite, and Edomite. That is why I propose it be named [Old] Canaanite. The Latin script is used for English, German, Tahitian, Apache, etc. But it remains the Latin script. Likewise, Phoenician is Phoenician, even if other users borrowed it. Dean Snyder wrote, Then why were Chinese, Japanese, and Korean unified? They weren't. There are three distinct writing systems involved with CJK. They share some common ideographs and this is where some unification has been involved. In the case of ideographic unification, one can look at the glyphs involved and clearly observe the similarity. This is not so with Phoenician and Hebrew, clearly. Unifying Phoenician and Hebrew would be akin to unifying Katakana and Hiragana. *That* would be silly. Peter Kirk wrote in response to Chris Fynn's Telugu/Kannada comparison: Yes, but two wrongs don't make a right. One past mistake of Unicode, or decision it had to take for compatibility reasons, does not create a precedent. Treating Telugu and Kannada as distinct scripts was not a mistake. Peter Kirk wrote, Not really. Acceptance of the proposal would create an expectation that Phoenician texts should be encoded with the new Phoenician characters, and so that existing practices are wrong and should be changed. Not necessarily. The existence of a Cyrillic range doesn't preclude Latin script users from writing Trotsky. ...That expectation is of course not acceptable to scholars. Also not acceptable is the inevitable result that Phoenician texts will be encoded in two different ways, leading to lack of searchability and potentially total confusion. Chris Fynn previously pointed out a similar issue with Sanskrit texts written in various Indic scripts. Having one language encoded in more than one script is not unprecedented. Search features can just be programmed accordingly. 
If there is such a small minority, let us hear from them. As far as I know this is a minority of one. Please. When the Phoenician script is approved, I will post a hypertext version of the Mesha Stele. ( http://home.att.net/~jameskass/phoeniciantest.htm ) John Hudson provided this scan: http://www.tiro.com/view/NorthSemitic.jpg ...which shows the Phoenician script at various stages. It's a bit misleading, though. If the only available reference were this scan, we could infer that, although the Phoenician language used the letters K, L, and M from 975 to 930 B.C.E., these letters were dropped from the language by 900 B.C.E. only to be added back into the repertoire by the Moabites around 830 B.C.E. Quoting Birnbaum from John Hudson's letter: To apply the term Phoenician to the script of the Hebrews is hardly suitable. I have therefore coined the term Palaeo-Hebrew. In one sense, it is OK to call Phoenician a Hebrew script, since Phoenician was used to write Hebrew. In another sense, calling Phoenician a Hebrew script would be just as incorrect as calling the Phoenicians Hebrews. To apply the term Phoenician to the script of the Phoenicians seems eminently suitable. Best regards, James Kass
UNIHAN.TXT
Like UNIHAN.TXT, brevity is not a feature of the following... Tabs... In addition to the points Mike made about the tab character having different semantics depending on the application/platform, I just don't think a control character like tab belongs in a *.TXT file, period. Although UNIHAN.TXT is referred to as a database, it isn't. Rather, it's the raw material for a database offered in plain-text form. Still, tabs are arguably OK. It's easy enough to strip them out when they're not wanted. (I'd rather deal with tabs in a text file which is to be imported into a database than ASCII quotes.) Unix -vs- DOS... I'll stick with the tools I've been using for a quarter century and their descendants, thanks just the same. With respect to the idea that a text editor is not the proper tool with which to open a *.TXT file, well... Trivial -vs- non-trivial... Once the raw data has been imported into a database, it's trivial to massage or manipulate it. It's easy enough to generate a CSV file from a database application, and I've done so. But, the only reason that I wanted it in CSV in the first place was to make it easy to import the data into the database application. This was *not* trivial to do; it involved a lot of coding and counting, and a bit of trial-and-error with various field lengths. Still, the task managed to keep me quiet for a few days... With a CSV file, importing data from a text file into a database file simply involves a single line command in the interactive mode (once the database file structure has been established). This is true for dBASE, FoxPro, and related database applications. Of course, the same kind of single line command can be (and was) used to import the data from the UNIHAN.TXT file into a database, but this produces a huge database file [266844944 bytes] which *still* does not have proper fields. It still has one record/one field just like the original UNIHAN.TXT file. 
Which means that if you want to get the information for a certain field of a certain character, you have to go skipping through all 1063127 records checking each one rather than the mere 71098 records that the database actually requires. (Of course, you'd use an index file rather than skipping through all those records in either case.) But, if you wanted to modify only one field, it's more efficient to skip through 71098 records reading and modifying only the appropriate field in the record than to go skipping through all 1063127. Easier to program, too. (Suppose you were a purist who wanted to see Stimson's pronunciations using the actual characters that Stimson used? Or, say you wanted pronunciations in lower case rather than upper case and preferred that the tone marks be superscripted? Hmmm, maybe you'd want those Japanese pronunciations in kana instead of romaji...) So, UNIHAN.TXT is 27592561 bytes, but the CSV text file is 13384544 bytes. Zipped, UNIHCSV.ZIP is 3477887 bytes. (The CSV file lacks the initial 802 lines of comments in the source UNIHAN.TXT file.) That only cut the size roughly in half, not as great a savings as I'd imagined. This is because many of the fields in the source UNIHAN.TXT are actually empty, and thus don't occupy a line in the file, while empty fields in the CSV file still require a single byte for that comma. D. Starner wrote, Because it's a data file, and it's easier to process without all that HTML junk to discard. Right on! John Jenkins wrote, Now that UTF-8 support is relatively common, we're moving more and more data in the file to non-ASCII form. It is a delight to observe this happening already. But, changing the format of the file might make it harder for some users to find the data they seek. So, I'm not necessarily proposing any change, but rather pointing out that alternatives exist. That's the *real* problem. Goodness knows the current format has real problems, and brevity is not among its virtues. 
(OTOH, the format it replaces was brief to the point of being incomprehensible.) Unfortunately, nobody's come up with a good strategy for migrating to something else. I could send you the CSV file for posting, if you think anyone else would want it. Doug Ewell wrote, And as John said, converting LF to CRLF is quite a simple task -- it can even be done by your FTP client, while downloading the file -- and should not be thought of as a deficiency in the current plain-text format. Right. It's not a deficiency, it simply adds one more step to a multi-step process for some of us. Benjamin Peterson wrote, Wow -- I'd hate to see your idea of a non-trivial solution! Me too! Edward H. Trager wrote, People tend to use what they know best, ... Exactly. Absolutely. The existence of Cygwin makes work on Windows much more tolerable, especially since Cygwin provides the OpenSSH client, XFree86, Perl, console vim, egrep, etc. However, I still haven't figured out how to display a UTF-8 file with non-latin
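The LF-to-CRLF step Doug Ewell mentions really is simple; a sketch (the function name is mine), working on raw bytes so the runtime does no newline translation of its own:

```python
# Sketch: LF-only to CRLF conversion for DOS/Windows tools such as Notepad.
def lf_to_crlf(data: bytes) -> bytes:
    # Normalise first so input that is already CRLF isn't doubled to CR CR LF.
    return data.replace(b'\r\n', b'\n').replace(b'\n', b'\r\n')
```

Run over the downloaded file's bytes, this is the one extra step in the multi-step process; an FTP client in ASCII mode does the same thing in transit.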
Re: Public Review Issues Updated
John Cowan wrote, Ah, I see the next battle line forming: Is Fraser a separate script, or just an oddball application of Latin caps for which we need a few new ones? Well, the Punic wars may not be over yet. But, I'd go with Fraser being just an oddball application of Latin caps for which we need a few new ones. Like the turned T and reversed K, which seem to have other uses, too. Fraser might need some special punctuation-style characters, or these might be treated as ligature presentation forms of existing Western punctuation. Best regards, James Kass
Re: Fraser
John Cowan wrote, Is there an explanation anywhere on the Net? I don't have D B. The Proel page on Miao has a good scan of Fraser script interspersed with several examples of Pollard script. Note that Proel fails to make the distinction between Fraser and Pollard. The Fraser example follows the text La figura inferior muestra el mismo texto, Juan 3:16, en caracteres lisu y en dialecto lisu occidental, hablado en China sudoccidental. http://www.proel.org/alfabetos/miaonew.html But, this example doesn't show the 'punctuation' strings that are in DB. Thank goodness for Omniglot! http://www.omniglot.com/writing/fraser.htm Here's the text example from DB making use of the PUA in UTF-8 (you might have to manually select UTF-8, my ISP doesn't tag outgoing e-mails) which can be viewed if you have a certain font installed... [from Daniels and Bright p. 582] Sample of Hwa (Western) Lisu NY N. NU MI: : SI KW ΛW FI DU FI U KO_ LO-. YI NY NU J GU YE T VU NY, G⅂_ BV_ LO= O: : DE KW L ℲO I RO U TY_ M S NY SI J GU NU W YE T VU NY, YI CƎ. TƎ, TƎ,; BE XY, B LO= (I just used existing ASCII punctuation in this example.) Best regards, James Kass
Re: Fraser
(I just used existing ASCII punctuation in this example.) Actually, I used PUA for these tonal marks, too, it appears. Best regards, James Kass
Re: Brahmic Unification (was Re: New contribution )
Andrew C. West wrote, No, not at all. The charts may show consonant-vowel syllables, but that does not mean that I believe that they should be proposed to be encoded as syllables. What I was saying was that all the glyphs needed for a proposal are nicely laid out here, not that there is necessarily a one-to-one correspondence between these glyphs and Unicode characters. Furthermore, Jost Gippert (the author of the Tocharian page) has long been a proponent of Unicode, has worked with other Indic scripts, and has a good understanding of Unicoding principles. What more could we ask? Best regards, James Kass
Re: New contribution
Dean Snyder wrote, In the case of ideographic unification, one can look at the glyphs involved and clearly observe the similarity. This is not so with Phoenician and Hebrew, clearly. Yes it is, for the ancient periods. Because the ancient Hebrews used the Phoenician script. Hebrew has been frequently used inexactly in the context here as a cover term for a wide range of script variants, spanning thousands of years. That may be, but not by me. This is useful in some contexts, but not when we are talking about the ancient periods. Hebrew (as a cover term for the scripts used by the Israelites down through the millennia) underwent several developmental stages. That is why I specifically use the phrase Old Hebrew when talking in a Phoenician context. They were contemporary scripts and in the earlier periods are practically indistinguishable (as is also Old Aramaic). I posted several glyph charts from several scholarly sources on the Unicode Hebrew list illustrating the marked similarities (and distinctions) that exist between most of the West Semitic diascripts. (Multiple columns of which, by the way, are entirely, and conveniently, missing from the current proposal.) Birnbaum apparently coined the phrase palaeo-Hebrew because he didn't like referring to a Hebrew script as Phoenician. But, that's what it was. When I speak in a Phoenician context, I'm pleased to use the word Phoenician. Old Hebrew, palaeo-Hebrew, Phoenician, and even Old Aramaic are, indeed, practically indistinguishable. One of the several glyph charts which you kindly provided came here, to the main Unicode public list. As I recall, it illustrated similarities in various scripts which were already unified in the (then) current proposal. Please. When the Phoenician script is approved, I will post a hypertext version of the Mesha Stele. You can do it right now - just specify one of the nice Phoenician (or better here, Moabite) fonts available for the text. 
There aren't any, because Phoenician hasn't been encoded yet. (Couldn't resist, could I?) John Hudson provided this scan: http://www.tiro.com/view/NorthSemitic.jpg ...which shows the Phoenician script at various stages. It's a bit misleading, though. If the only available reference were this scan, we could infer that, although the Phoenician language used the letters K, L, and M from 975 to 930 B.C.E., these letters were dropped from the language by 900 B.C.E. only to be added back into the repertoire by the Moabites around 830 B.C.E. They're missing from the charts because examples for those particular glyphs were not extant in the sparse data available when those charts were compiled. The scan John provided shows Phoenician as written by six different scribes at different times in slightly different places, as far as I can tell. (Or, it could have been the same scribe who lived a long life and moved around a bit.) The first three lines of the scan are Phoenician and labelled as such. The last line of the scan, Palaeo-Hebrew is a name coined by Birnbaum for Phoenician used to write Hebrew. Likewise, the Moabite and Aramaic examples are showing the Phoenician script used to write those languages. My guess would be that several letters weren't included in all of the examples because the original examples, some of which were apparently single inscriptions, were too short to include all the letters of the alphabet. It is the same script shared by the ancient Phoenicians, Hebrews, Samaritans, Aramaeans, Moabites, Ammonites, and Edomites. In short, the name Canaanite seems preferable. After the first few centuries of use of this script by these peoples, each of the major cultural groups developed this shared script along sometimes more, sometimes less, independent tracks. If a name like Canaanite or proto-Canaanite would be preferable, then so be it. Best regards, James Kass
Re: Unihan.txt and the four dictionary sorting algorithm
Raymond Mercier wrote, John Jenkins writes Also, even though the full Unihan database is 25+ Mb in size, given the cheapness of disk space nowadays, it's not all *that* big, surely. The problem of the size of Unihan has nothing at all to do with the cost of storage, and everything to do with the functioning of programs that might open and read it. Since the lines in Unihan are separated by 0x0A alone, not 0x0D 0x0A, this means that when opened in notepad the lines are not separated. Notepad does have the advantage that the UTF-8 encoding is recognized, and the characters are displayed. UNIHAN.TXT isn't going to get any smaller by itself. The trend indicates that it will just keep on growing, even if VS characters are used with CJK. The DOS editor chokes on such a large text file, so does my older hex editor. Thank goodness for BabelPad, otherwise it would've been hard to insert proper (for my system) line breaks into the file. The tab character is used in the file. Arguably, this character should never appear in a plain text file; rather, it should be converted to an appropriate number of U+0020 characters by the application on save. Of course, this would make the file even bigger. Instead of (for instance) KUA4, why not KUA⁴? Much of the text in UNIHAN.TXT is redundant, the hex character is repeated along with each field name over and over again. Putting the hex character at the beginning of each line, with one character per line and CSVs would make UNIHAN.TXT *much* smaller. Of course, commas would have to be removed from the definition fields. (Hmmm, maybe definition field commas could be replaced with MIDDLE DOT?) But, changing the format of the file might make it harder for some users to find the data they seek. So, I'm not necessarily proposing any change, but rather pointing out that alternatives exist. In spite of its unwieldy size, UNIHAN.TXT is a useful tool and I'm grateful for its existence. Best regards, James Kass
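The restructuring suggested above, one record per character instead of one line per field, is easy to sketch. The two sample records are real lines from the file (they reappear in the next message); everything else, the variable names and the in-memory dictionary standing in for a database, is illustrative:

```python
# Sketch: pivoting Unihan's one-field-per-line records (code point, TAB,
# field name, TAB, value) into one record per character.
from collections import defaultdict

sample = (
    "# comment lines like this head the real file\n"
    "U+3ADA\tkIRGKangXi\t0502.080\n"
    "U+66F6\tkKangXi\t0502.080\n"
)

records = defaultdict(dict)
for line in sample.splitlines():
    if not line or line.startswith('#'):
        continue  # skip the comment header and blank lines
    codepoint, field, value = line.split('\t', 2)
    records[codepoint][field] = value

# records['U+3ADA'] now holds every field for that character in one place,
# ready to be written out as a single CSV row.
```

Since most fields are empty for most characters, a fixed-column CSV emitted from `records` trades the repeated code points for a run of empty commas, which is exactly the size trade-off discussed earlier.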
CJK U+3ADA and U+66F6
Is there a difference between U+66F6 and U+3ADA? The newest UNIHAN.TXT file doesn't have a definition field for U+66F6. The glyphs in the Unicode 4.0 book appear identical for these two characters. One is placed with radical 72, the other with radical 73, although UNIHAN.TXT gives both as having radical 73. U+3ADA kIRGKangXi 0502.080 U+66F6 kKangXi 0502.080 (In UTF-8:) U+3ADA (㫚) U+66F6 (曶) Best regards, James Kass
Re: CJK U+3ADA and U+66F6
Asmus Freytag wrote, this is the kind of thing that you should report via our error reporting form. Here on the open list, it's liable to get lost (no-one owns excerpting issues from this forum). Before reporting it through proper channels, I wanted to try to find out which kind of error it is. It could either be a bad glyph in the font(s) or a truly duplicated character. Radicals 72 and 73 are similar in appearance, but the central horizontal line in Rad. 73 doesn't meet the vertical line on the right. But, when radical 73 is used as a component of other characters, it often looks just like radical 72. So, I'm not sure whether there are two separate characters with the same top component (U+52FF 勿) over two different radicals, or just a duplication. Best regards, James Kass
Re: New Currency sign in Unicode
See the currency symbol in use on postage stamps, http://www.bird-stamps.org/country/ghana.htm ...notice the different glyphs in the third and fourth rows. Best regards, James Kass
Re: New Currency sign in Unicode
Jim Allan wrote, This web page also has a slashed capital G for the Paraguayan guarani, another symbol not in Unicode. The guarani symbol has been accepted by the UTC. Here's the original proposal: http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2579.pdf Best regards, James Kass
Re: Printing and Displaying Dependent Vowels
John Cowan quoted, Well, it depends on what the equivoque combining marks in the title of Section 7.7 means. This is where (p. 187) the remarks about SP and NBSP appear: # Marks as Spacing Characters. By convention, combining marks may be exhibited # in (apparent) isolation by applying them to U+0020 SPACE or to U+00A0 NO-BREAK # SPACE. This approach might be taken, for example, when referring to the # diacritical mark itself as a mark, rather than using it in its normal way # in text. Note the use of may and might in the quoted text rather than must. The above could be interpreted in part as '... combining marks may be exhibited in (apparent) isolation by applying them to U+0020 SPACE, or they may not.' Such an interpretation might lead people to decide that the approach is up to the renderer. Semantics aside, if the default display appearance of a combining mark in isolation on a certain system is the mark on a dotted circle, then that system should be considered conformant when it displays space+mark as dotted_circle+mark. An observation, FWIW: on the system here, combiners in Indic scripts get the dotted circle, but combining diacritics from the (mostly) Western combining diacritics range don't. Space + U+0327 displays a stand-alone cedilla here; no dotted circle. Best regards, James Kass
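The conventions under discussion can be spelled out as code point sequences; a small sketch (the variable names are mine):

```python
# Sketch: the ways of exhibiting a combining mark in (apparent) isolation.
import unicodedata

CEDILLA = '\u0327'                # COMBINING CEDILLA

on_space  = '\u0020' + CEDILLA    # the TUS convention: mark applied to SPACE
on_nbsp   = '\u00A0' + CEDILLA    # same, but immune to line breaking
on_circle = '\u25CC' + CEDILLA    # the dotted-circle fallback some renderers show

# The mark really is a combining character: canonical combining class 202
# (the attached-below class), not zero.
assert unicodedata.combining(CEDILLA) == 202
```

Which of the first two a renderer draws as a bare cedilla and which it decorates with a dotted circle is, as noted above, up to the renderer.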
Re: What is the principle?
Asmus Freytag wrote, While applications predating VSs have no choice but to treat them as what they are (in that context), i.e. unassigned characters, applications of later date have no business treating unapproved VS sequences as unassigned *characters*. The intent of VSs is to mark a difference that falls below the distinction between separately encoded characters. Therefore I would expect that by default all VS characters are ignored in a full-blown collation implementation, leaving open the choice of supporting, say, a fourth-level difference between specific known variation sequences. They are also best ignored in any kind of identifier or name matching, as otherwise the presence of invisible characters can change the lookup--with all the consequences for spoofing and security. What you're saying makes perfect sense for purposes of forwards compatibility. Thanks to both you and Ernest Cline for pointing this out. I'd prefer to see some kind of toggle for file/archive searching with respect to ignoring VS characters, but can't argue with ignoring them for security/spoofing issues. Otherwise, the spam problem might well become even worse. Good collations are tailorable, so if the default condition is for collation to ignore VS characters, that shouldn't make problems for anyone. Best regards, James Kass
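The "ignore variation selectors by default" behavior described above can be sketched as a comparison key. This is a minimal illustration of the idea for string matching, not a real UCA implementation (a full collation would handle this through collation weights rather than filtering); the ranges below are VS1-VS16 (U+FE00..U+FE0F) and the Plane 14 VS17-VS256 (U+E0100..U+E01EF):

```python
def strip_variation_selectors(text):
    """Drop variation selectors so that variant sequences compare equal
    to their base characters, as suggested for matching and lookup."""
    return "".join(
        ch for ch in text
        if not (0xFE00 <= ord(ch) <= 0xFE0F or 0xE0100 <= ord(ch) <= 0xE01EF)
    )

def vs_insensitive_equal(a, b):
    """Compare two strings, ignoring any variation selectors in either."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)

# U+2268 + VS1 is a standardized math variation sequence; for matching
# purposes it should compare equal to plain U+2268.
assert vs_insensitive_equal("\u2268\ufe00", "\u2268")
```

A tailored collation could then reintroduce the VS distinction at a lower level for applications that want it.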
Re: Printing and Displaying Dependent Vowels
C J Fynn responded to John Hudson, If someone wants this, isn't it possible to put a specific lookup in the font so that any dependent vowel following a space character renders as a spacing (stand-alone) dependent vowel? Surely a specific lookup should override it being displayed on a dotted circle by default. Has anyone tried this? Would the space glyph U+0020 be expected to trigger a look-up in the Tamil GSUB table as if it were a Tamil base character? The reason that I haven't tried this is because, in the OpenType look-ups here for the re-ordrant vowel signs of Tamil, the vowel sign is INPUT1 and the base letter is INPUT2. This is because the rendering engine has already re-ordered the character string before this look-up is performed. It doesn't seem likely that a rendering engine would re-order a vowel sign before a space. It could be tested both ways, I suppose... This seems to be OT for this list, but, here it is, and it will probably keep popping up from time to time unless clarified. I can only make inferences and suppositions based on observation of the behavior and reasoning behind the behavior of the rendering engine used here, Microsoft's Uniscribe. People who know all about this do follow this list, so they're free to offer corrections. inference and supposition Uniscribe inserts the dotted circle into the display for complex scripts in order to give a visual indication of an encoding or spelling error. This seems quite useful whether text is being entered or merely displayed. Allowing dependent vowels to follow the space character breaks this utility. In other words, somebody could write a Tamil word in a web page starting with the E-vowel-sign (U+0BC6), and there'd be no indication that this is improper, either to the author or the visitor. Someone searching for that word on that page wouldn't find it, and so on.
Maybe some kind of spell-checker should be used by the original author, but, there seems to be no way to assure that spell-checking was performed by the author of any web page one visits. It is the very appearance of that dotted circle unexpectedly in our texts which alerts us to the fact that we have made a mistake. That dotted circle jumps out of the page into our vision exclaiming, Hey, I'm wrong! I'm so wrong, don't even bother running your spell-checker on me! This is the basis upon which Uniscribe renders text which includes dependent vowel signs, not just for Tamil, but for the other so-called complex scripts, too. The dotted circle plus the matra is the default rendering for combining marks *in isolation*. Uniscribe seems to rightly treat a vowel sign following a space as being in isolation, and, how could it do otherwise? What goes for the space character also seems to go for any other character which is not a valid base character *within the Unicode range*. Again, how could it be otherwise? If the first character in a string isn't a Tamil character, there's no reason for the renderer to consult the Tamil OpenType tables in a font. If it did, my gosh, imagine all the pointless look-ups just to display a page which was, for example, mostly Chinese with a few Tamil phrases. end of supposition and inference The good folks engineering Uniscribe have been most responsive to all kinds of special requests and pointers related to complex script shaping. I think asking them to break the existing mechanism in order to support vowel signs on spaces asks too much, though. People generating texts for educational purposes will always have special needs. So, they'll always need to make special effort to get special effects. Workarounds concerning the original question have already been suggested.
If this is treated as a Unicode issue rather than a display issue, then one solution would be for someone to propose a new character, (back on topic a little bit) COMBINING DOTTED CIRCLE FOR COMBINING MARKS. Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine could be changed to insert this new character. Then, these updated rendering engines could be distributed and font developers could add the new characters to fonts and distribute updated fonts. This might just take a while, but it wouldn't be too hard to find examples of the character in actual text use to accompany the proposal... If it ain't broke, don't fix it. So, is it 'broke'? Best regards, James Kass
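The "combining mark in isolation" condition described above can be sketched in a few lines. This is a simplified illustration of the rationale, not Uniscribe's actual algorithm (a real shaper works per script run and checks many more conditions):

```python
import unicodedata

def isolated_marks(text):
    """Return indices of combining marks that appear 'in isolation':
    at the start of the text, or directly after SPACE or NBSP. These
    are the positions where a renderer like Uniscribe would show a
    dotted circle (simplified sketch, not the real shaping logic)."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            if i == 0 or text[i - 1] in ("\u0020", "\u00A0"):
                hits.append(i)
    return hits

# A Tamil word starting with the E-vowel-sign U+0BC6 is flagged at index 0;
# the same vowel sign after a base consonant (U+0B95 KA) is not flagged.
assert isolated_marks("\u0bc6\u0b95") == [0]
assert isolated_marks("\u0b95\u0bc6") == []
```

This is exactly the utility the message above argues would be lost if vowel signs on spaces were silently rendered without the dotted circle.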
RE: Printing and Displaying Dependent Vowels
Peter Jacobi wrote, Using the Linux version of Abiword, which uses the Pango renderer, both the Code 2000 and the MS Latha font display the vowel signs without the unwanted dotted circle. NBSP and normal SPACE give identical results. For Code 2000 only, the dotted circle or a similar ersatz glyph (the screenshot is not that clear) is drawn for the two-part vowel signs U+0BCA, U+0BCB and U+0BCC between the two parts. U+0B82 TAMIL SIGN ANUSVARA is substituted and re-positioned in the compound glyphs of Code2000 for the normal dotted circle in the default glyphs for U+0BCA, U+0BCB, and U+0BCC. This is only expected to appear with a rendering system which does not support OpenType, because the default glyphs for these surroundrant vowel signs would never be drawn on the screen. Rather, the expected approach from the rendering engine is to use the component glyphs for these three vowel signs, such as U+0BC6 for the left part of U+0BCA, and U+0BBE for the right-side portion. If the presence of these default glyphs in Code2000 is causing problems, they can be adjusted. (Just because I expect a rendering engine to take a certain approach doesn't mean that a rendering engine will take that approach!) On Windows, as others have noted, the rendering engine (Uniscribe) inserts the dotted circle glyph (if the font has a dotted circle glyph) into the display. The dotted circle character is not inserted into the text, of course. So, if the question is how to make an OpenType font *not* display the dotted circle on Windows with Uniscribe, one idea would be to add a spacing glyph to U+25CC (DOTTED CIRCLE) in the font. This spacing glyph should be a no-contour glyph, perhaps with the same advance width as U+0020. I've not tried this, but it might just work. Another approach is to simply use a non-OpenType Unicode TrueType font for Tamil. The dotted circles don't seem to ever appear unless the font-in-use has OpenType tables covering the script-in-use.
Best regards, James Kass
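The component-glyph approach described above mirrors the canonical decompositions Unicode defines for the Tamil two-part vowel signs, which can be checked with Python's unicodedata (a quick sketch of the character data, not anything a rendering engine actually runs):

```python
import unicodedata

# The three two-part (surroundrant) Tamil vowel signs decompose canonically
# into the left and right components a renderer actually draws.
for v in ("\u0BCA", "\u0BCB", "\u0BCC"):
    parts = unicodedata.normalize("NFD", v)
    print("U+%04X -> %s" % (ord(v), " + ".join("U+%04X" % ord(p) for p in parts)))
# U+0BCA -> U+0BC6 + U+0BBE
# U+0BCB -> U+0BC7 + U+0BBE
# U+0BCC -> U+0BC6 + U+0BD7
```

So whether the text contains the precomposed sign or the decomposed pair, a conforming renderer ends up drawing the same two component glyphs around the base consonant.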
Re: What is the principle?
Asmus Freytag wrote, Surly not! Intentional pun, inadvertent one, or Freudian slip? Uninterpreted VS characters should *not* turn into black blobs. If we had wanted that to happen, we would have coded different characters. U+E000 COMBINING BLACK BLOB? Censors would probably love it. What does the collation standard say to do with unassigned codepoints anyhow? Variation selectors are not unassigned characters. But, they might be regarded as such by any application predating VSs. And, likewise for any VS sequences approved after the application was created. Best regards, James Kass
Re: tick, tick box, cross, cross box
Avarangal wrote, We are in need of tick, tick box, cross and cross box preferably as symbols with code points. Here are some symbols with code points which might work:
U+2610 BALLOT BOX
U+2611 BALLOT BOX WITH CHECK
U+2612 BALLOT BOX WITH X
U+22A0 SQUARED TIMES
U+229E SQUARED PLUS
U+2713 CHECK MARK
U+2714 HEAVY CHECK MARK
U+2715 MULTIPLICATION X
U+2716 HEAVY MULTIPLICATION X
U+2717 BALLOT X
U+2718 HEAVY BALLOT X
Best regards, James Kass
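The suggested code points can be verified against their character names with Python's unicodedata (just a lookup against the Unicode Character Database, so a list like the one above can be double-checked):

```python
import unicodedata

# Candidate tick/cross symbols and their official character names.
candidates = ["\u2610", "\u2611", "\u2612", "\u22A0", "\u229E",
              "\u2713", "\u2714", "\u2715", "\u2716", "\u2717", "\u2718"]
for ch in candidates:
    print("U+%04X  %s" % (ord(ch), unicodedata.name(ch)))
```

Running it prints, among others, U+2610 BALLOT BOX, U+2713 CHECK MARK, and U+2717 BALLOT X, confirming the list.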
RE: New What is Unicode translation.
Speaking of translations of What is Unicode?, I found this page: http://asuult.net/badaa/unicode.htm It is in Mongolian (Cyrillic). Best regards, James Kass Don, Offers to translate What is Unicode? to a particular language should be addressed to the Unicode office. This can be done through our reporting form http://www.unicode.org/reporting.html or by emailing me directly. Magda PS: For everybody's convenience, we provide an html template for the translation as well as a set of translation formatting instructions. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, March 18, 2004 6:30 AM To: [EMAIL PROTECTED] Subject: Re: New What is Unicode translation. If someone were interested in translating to an additional language(s), to whom should they write? TIA... Don Osborn Bisharat.net Quoting Magda Danish \\(Unicode\\) [EMAIL PROTECTED]: What is Unicode in Finnish is now online thanks to Jarkko Hietaniemi. Check it out at http://www.unicode.org/standard/translations/finnish.html
Re: Irish dotless I
Anyone who feels that past monetary contributions towards encoding efforts were made based on false pretenses may be able to seek legal redress. There's a certain barrister in Africa who might be able to help in this regard. Of course, this barrister works under conditions of strict confidentiality, so I can't tell you the exact nature of our business relationship. Perhaps we should wait and see if the big pile of money actually shows up in the bank account here before forwarding the barrister's contact information along. After all, just because someone puts something into an e-mail... that doesn't make it true... Best regards, James Kass
Re: Investigating: LATIN CAPITAL LETTER J WITH DOT ABOVE
Curtis Clark wrote, on 2004-03-18 01:05 Pavel Adamek wrote: So it would be convenient to have an empty diacritical mark, (COMBINING NOTHING ABOVE) which would cause the soft dot of j or i to disappear, without adding anything else. Assuming this could be added to any other character, my mind boggles at the implications, both for decomposition and for rendering. :-) The glyph could look like the old Pac-Man video game. It should remain visible until it has consumed the applicable diacritic, then vanish. Best regards, James Kass
Re: About the Kikaku script for Mende, and an existing font for it
Philippe Verdy wrote, So it seems that tone marks used in the latin transcription of Mende are not marked in the Kikaku script. It would be interesting to have some book prints available to see if there are punctuation signs or symbols to mark word separation, as well as digits or numbers (some syllables in the Kikaku script closely resemble the European digits, and I wonder if an alternate notation was used to mark numbers, or dates, or simply commercial quantities for market exchange and accounting or for marriage dowries, or for customary judiciary decisions, in countries where most negotiations were performed orally). Does anyone have access to a copy of the following?: Tuchscherer, K.T. 1996. The Kikakui (Mende) syllabary and number writing system. Ph.D., London School of Oriental and African Studies. Based on the title, it seems that Kikakui might have additional symbols for numbers. Best regards, James Kass
Re: Battles lost before they begin?
Chris Jacobs wrote, If you have the text in UniPad try the following: Edit Convert Decompose Combinations, and then Search Replace Text to find: \u0323 Replace with: ;. Replace All Or \u0329, depending on which diacritic was used in the source. Also, since the mark below can appear along with a combining grave or acute, those decompositions would have to be considered in the search and replace operations. Since the input for the search window already does not have to be the font codes a non-conformant font is not as bad as it first seemed. I think it should be not that hard to change the search window to let it accept unicode too. Perhaps they've chosen their custom encoding in order to side-step the mark-below issue. As you say, it shouldn't be hard to enable Unicode in their search window. And, once an agreement within the user community is reached on the mark below, it shouldn't be that hard to convert their entire web site and database to Unicode, too! Best regards, James Kass
Re: Battles lost before they begin?
Don Osborn wrote about the on-line Yoruba dictionary. Without some kind of an agreement among Yoruba users as to which combining mark should be used under certain letters (vert. line or dot), Unicode font development for Yoruba is pretty much stymied. This is really a shame. It's also too bad that the good folks behind the dictionary project didn't use an existing 8-bit encoding scheme rather than adding to the disarray. Best regards, James Kass
Re: Mende Kikakui syllabary
Konrad T. Tuchscherer, Ph.D. wrote, I write to the list from Cameroon where I am conducting research on the Bagam and Bamum scripts. The Proel page should not be consulted for information on the Mende syllabary (Kikakui) or any other African script (or system of graphic symbolism, like Adinkra). The Mende syllabary is not pictographic. The dubious map shows the Mende in Liberia, the Loma in Sierra Leone (they are in Liberia, known as Loma; in Guinea known as Toma), the Bamana in Sierra Leone, and Adinkra in Liberia!!! As I often explain to my students, any one can publish something on the internet -- lots of unreliable stuff out there! Indeed. Researching the Bagam and Bamum scripts sounds fascinating. (I've quoted Dr. Tuchscherer's entire message above, it was clearly intended for the list, but does not seem to have appeared there.) At least Proel's page on bamún shows them in Camerún. Although Proel's accuracy is questionable, they often have fairly good scans of some fairly obscure writing systems. http://www.proel.org/alfabetos/bamun.html Sadly, many of their examples of the evolution of the Bamum script are unclear. Best regards, James Kass
Re: Canadian Unified Syllabics
Chris Harvey wrote, ... I want the examples on my site to be legible (dot accents non-spaced in the middle of syllabics instead of above them aren't really acceptable), and I want the characters to look like what speakers are familiar with, otherwise they may very well choose not to use the font, keyboards, etc. My aim is that people can type their own language on the computer they have now. Once OpenType is available on my machine and others, I will release fonts which have OpenType tables, calling the same glyphs that are now in the PUA. This way, I am trying to make some humble attempt at backward compatibility. But for now, if people cannot use the OpenType substitutions, what else should I do? I am building specific fonts for specific languages, but I wanted one font that would display the lot. That way, if someone wanted to use languagegeek.com, they would only have to download one font, instead of one per language. These are all laudable goals with understandable intentions. As far as *characters* which aren't yet encoded, the PUA really seems to be the only method. Since you asked, however, an alternative to the current approach would be to: * Encode the pages as compliantly as possible. * Offer the one font to fit all the pages while awaiting either language-specific fonts or OpenType technology availability. * Note on the pages that the one font aims to cover all syllabics, but that language-specific variants exist which can't yet be covered in a single font due to technological limitations. * Use any combining dots and so forth from the COMBINING DIACRITIC range. (A font like Code2000 won't display these combiners well due to technology limitations, but, so what? In *your* font, you can place the combining glyphs so that their default position is acceptable and won't overstrike the base glyphs.) An advantage to doing something like the above is that backwardness isn't being perpetuated under the guise of backwards-compatibility. 
Another merit is that text (aside from necessary PUA matter) is correct, compliant, interchangeable, and permanent. Parsers, search engines, indexing operations, and all the rest, will work as they should. A disadvantage of the current approach is that users may be too easily tempted to also generate text, data, and web pages using a proprietary encoding. In the long run, many might view this as something other than a favor to the user communities. Best regards, James Kass
Re: Phonology [was: interesting SIL-document]
John Cowan wrote, Arcane Jill scripsit: Delenn said abso-fragging-lutely dammit on Babylon 5 once. Wasn't that American? Indeed. ... Nope, sorry. Not American -- Minbari. For more info on the Minbari, please see: http://www.sadgeezer.com/babylon5/minbari.htm Best regards, James Kass
Re: Panther PUA behavior
Doug Ewell wrote, ... On Windows, I can't even rely on being able to display real Unicode characters for Vietnamese in places like the Start menu or the title bar of the browser, because they're not in the one and only font used for each of those places. For the title bar of the browser, [Start] - [Control Panel] - [Display] - [Appearance] - [Advanced] - Select Inactive Title Bar in the box for Item, then select a font from the pop-up list that covers the encoding and range of characters. Select a size that looks good. [OK] - [Apply] - [OK] Then, exit Control Panel and try it. Note that there are other font settings besides Inactive Title Bar that can be changed in that same menu to customize the appearance of other items. Also note that Inactive Title Bar seems to apply to active ones, too! Best regards, James Kass
Re: Panther PUA behavior
Doug Ewell wrote, No, no, I know how ... I thought you might. ... I meant that because Windows doesn't do any fancy font switching in title bars to cover glyphs that aren't in the selected font ... It's too bad that these user-selectables don't allow for some kind of prioritized font list. For power users. Best regards, James Kass
RE: Infix profanity (Very OT) (was Phonology)
Arcane Jill wrote, ...However, at the time she said abso-fraggin-lutely, she did so because she was learning how to swear in English ... In this context, your initial observation appears to be spot on! Rescind your retraction, I'll recall my rhyme. Best regards, James Kass
Re: Examples of Cuneiform Ideographic Descriptor Usage
Dean Snyder wrote, In preparation for tomorrow's Unicode Technical Committee meeting, and for general review and comments, I have uploaded a 140kb PDF file that illustrates some usage examples of the proposed Cuneiform Ideographic Descriptors. http://www.jhu.edu/ice/basesigns/CuneiformDescriptorUsage.pdf Circumstances have forced me to get this out in a hurry and I know there are mistakes in it, but I believe it will still be useful as a point of departure for discussion. [As an exercise for the reader, see if you can find any mistakes ;-)] In circled 9 and 10, the same code point (1221B) is given for LU2 SQUARED and LU2 TENU. In circled 15, the same glyph is used for 1240A INFIX and 1240B OUTFIX. Glyph descriptors could theoretically be applied to any script. Once more than one or two strokes are used to form glyphs, there are bound to be recognizable components. So, I think I understand how the system you are proposing works, although some of the sequences are less than clear for me, perhaps because I'm not a Cuneiform expert. Please see attached 4KB GIF picture in which graphics from the PDF file were borrowed and applied to some Latin glyphs. What I'm not understanding is why this approach should be considered superior to the static approach underlying the current Cuneiform proposal. Cuneiform ideographic descriptors could be quite useful for illustrating the components of existing Unicode Cuneiform characters as well as providing a method for scholars to describe hitherto unknown and/or unencoded characters. But, I share the concern expressed by others on this list that bringing up an alternative encoding method for Cuneiform at this stage might derail the existing proposal, which appears to be on-track. Best regards, James Kass ideodesc.gif
Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)
- Original Message - From: John Jenkins [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Tuesday, January 20, 2004 9:32 AM Subject: Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors) . John Jenkins wrote, 1) U+9CE6 is a traditional Chinese character (a kind of swallow) without a SC counterpart encoded. However, applying the usual rules for simplifications, it would be easy to derive a simplified form which one could conceivably see in a book printed in the PRC. Rather than encode the simplified form, the UTC would prefer to represent the SC form using U+9CE6 + a variation selector. Except that this character is listed in CJK Extension C, on page 612. (File: IRGN9285.PDF 08/06/02) Best regards, James Kass .
Re: Combining down-pointing triangle above?
. Doug Ewell wrote, Is this just a fancified hacek, or a potential candidate for proposal? Naturally, from a Unicode standpoint I'm thinking about a combining character, not a precomposed c-with-triangle. It might be a caron, see: http://www.chumashlanguage.com/pronun/pronun-00-fr.html Best regards, James Kass .
Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)
. Dean Snyder wrote, SOMEONE at SOMETIME must have thought that free variation selectors were a good idea for Mongolian in Unicode. If the thinking has changed on this since then, I would love to hear about why it has changed. Is Mongolian functioning well in Unicode or not? If not, what specifically in it is broken, or is at least sub-optimal? And what are suggested solutions for fixing Mongolian in Unicode if it is indeed problematic? Andrew C. West offers test pages for both Mongolian and Manchu. These pages have some of the technical background that you seek concerning variation selectors and Mongolian, as well as explore many issues concerning Unicode Mongolian. There is some good information about Variation Selectors on the Mongolian page under the heading Mongolian Free Variaton Selectors. (Hello Andrew, ...Typo alert!) http://uk.geocities.com/BabelStone1357/Test/Mongolian.html Unicode for Mongolian is working perfectly on many platforms, (smile) but only if we're discussing Cyrillic script. Best regards, James Kass .
Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)
. Dean Snyder wrote, Tom Gewecke wrote at 2:26 PM on Sunday, January 18, 2004: ... Agreed. I can't imagine that anyone who has ever tried to actually do anything with Unicode Mongolian would recommend variation selectors as an encoding technique, unless perhaps they wanted to make sure the encoding was never implemented. Could you please elaborate? Has this model not been implemented? Either via Unicode or otherwise? Here's how it works: there are three factions involved. The OS and rendering-engine developers, the editor/processor/input developers, and the font developers. Each faction considers that the fancy stuff needed for Mongolian rendering should properly be handled through a combination provided by the other two factions. Seriously, it's my understanding that implementation guidelines for Mongolian script and Unicode are still being worked out. Aside from experimental set-ups, it's unlikely that anyone can yet correctly (or, even reasonably) display the Mongolian text on Andrew C. West's test pages. Best regards, James Kass .
U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)
- Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Philippe Verdy [EMAIL PROTECTED] Cc: Unicode Mailing List [EMAIL PROTECTED] Sent: Monday, January 05, 2004 8:16 AM Subject: Re: unicode Digest V4 #3 Peter Kirk wrote, I note an incorrect glyph for U+0185 in Code2000 and in Arial Unicode MS; this looks like b with no serif at the bottom but should be much shorter, like ь, the Cyrillic soft sign. The Arial Unicode MS glyph for U+04BB is also incorrect - it should look identical to Latin h - but this problem is well known. No comment on U+04BB. With regards to U+0185, could it be said that the informative glyph in TUS 2.0, 3.0 and 4.0 is a bit misleading, or does that glyph represent a variance from the text(s) with which you're familiar? http://www.unicode.org/charts/PDF/U0180.pdf Magnify U0180.pdf to 400% and put the row 0185 - 0195 - 01A5 towards the top of the screen so that the top of U+0185 touches the screen area border. Note that the top of U+0185 aligns with the top of U+0195, suggesting that these glyphs would have the same height. In THE LANGUAGES OF THE WORLD by Kenneth Katzner (1975), the example for Chuang seems to show a glyph covering U+0185 as you describe. (page 212) This page uses a scan from THE LANGUAGES OF THE WORLD as its Chuang example: http://www.worldlanguage.com/Languages/Chuang.htm No sample text, no lower case illustration: http://www.alphabets-world.com/chuang.html If the informative glyph in TUS *is* misleading, I'll be happy to make appropriate changes here. Best regards, James Kass .
Re: Saving in Unicode
. Jose Rodriguez wrote, Can anyone tell me if it is possible to save a file in Unicode format through Visual Basic and if so how to do it? I have a Visual Basic program which converts my client's file from one format to another. However the resulting file must be saved in Unicode. Please give any help you can or at least point me in the right direction. Internationalization with Visual Basic By Michael S. Kaplan http://www.i18nwithvb.com/ Best regards, James Kass .
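For the byte layout itself: what Windows tools of that era call a "Unicode" file is UTF-16LE with a byte order mark. The sketch below is in Python rather than Visual Basic (the VB specifics are what the book above covers), just to show the bytes a correctly saved file should start with:

```python
# "Unicode format" on Windows typically means UTF-16LE with a BOM.
# Sample text with a non-ASCII letter; any converted client data would
# be encoded the same way.
text = "Ch\u00E0o"
unicode_file_bytes = ("\uFEFF" + text).encode("utf-16-le")

# The result starts with the little-endian BOM FF FE, then each UTF-16
# code unit as two bytes, low byte first.
assert unicode_file_bytes[:2] == b"\xFF\xFE"
assert unicode_file_bytes[2:4] == b"C\x00"
```

Writing those bytes to disk (in VB, via its binary file output) produces a file that Notepad and other Windows applications of the time recognize as Unicode.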
Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)
. Michael Everson wrote, Well, James, I think it would be A LOT better if we got some actual documents from Zhuangland. Agreed. Meanwhile... The glyphs used in Everson Mono Terminal for U+0185 and U+044C appear to be identical. That's good enough for me. I'll fix things here accordingly. Best regards, James Kass .
Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)
. Kenneth Whistler wrote, Note that there are more modern representations of Zhuang that dispense with the special tone letters altogether and substitute out ordinary Latin letters, in a Pinyin-like simplification. See: http://www.liuzhou.co.uk/liuzhou/language.htm with a sign showing the substitution of Latin J, H, Z, X, W(?) for the 5 Zhuang tone letters. The chart on this Japanese page about the modern Latin based Zhuang writing systems appears to confirm that ASCII letters are now used for tone marking, but uses the q in place of your questioned W. http://www.geocities.co.jp/NatureLand/3973/zhuangyu_ch06.htm Best regards, James Kass .
Re: Ancient Northwest Semitic Script
. Dean Snyder wrote, But, in either case it is hoped that the needs of script taxonomists and paleographers won't be disregarded. So Unicode is now prepared to provide support, in plain text, for the needs of paleographers? Practitioners of many sciences need Unicode in order to store and exchange information. Mathematicians have successfully encoded what are essentially Latin glyph variants separately for usage as math variables in Plane One, including Fraktur and cursive styles. Epigraphers may elect to classify and codify specific variants for specific needs. They could organize and submit a proposal for these requirements using, say, the existing Unicode mechanism of variation selectors. If they did so, wouldn't the various bodies give such a proposal due consideration? Well I, for one, prefer to read in more paleographically relevant renderings; and fonts combined with markup will, of course, take care of everything. That's not very useful in plain text. Unicode is an encoding standard for plain text. Fraktur has precisely the same plain text rendering issues. Indeed it does. (Unless you're a mathematician, of course!) Quoting from N2311.PDF: This document by Michael Everson is particularly revealing and in the end damning to his whole attempt at disunification of the Northwest Semitic script. The document by Michael Everson is what I had thought had sparked this thread. If we compare this list to the taxonomic chart he reproduces on the next page (see the attachment), we see convenient, but nevertheless glaring, discrepancies between the two. Not mentioned in his list but appearing in the chart under Phoenician are Samaritan, Hebrew Square, Arabic, and Aramaic - including Nabatean, Palmyrene, Mandaic, Syriac, etc. (See the attachment.) It is an evolutionary chart. Everson's fuller quote here is: Phoenician is the catch-all for the largest group of related scripts including its ancestors, Proto-Sinaitic/Proto-Canaanite. 
Looking at tables 5.1, 5.3, and 5.4 (below) most of the scripts are so similar that there doesn't seem to be any point in trying to encode them separately. But he conveniently excludes any tables for Aramaic, Hebrew Square, and Samaritan paleography and also fails to mention the one column out of sixteen in these tables that IS devoted to Aramaic. A possible reason for omitting tables for Hebrew Square is that this is what is already encoded under HEBREW in Unicode, thus it doesn't need covering in a proposal for unencoded scripts. It's also possible that full tables for Aramaic were omitted because, as the document mentions, further research is required for Aramaic. Samaritan is covered (at least with a chart) in a different document, http://www.evertype.com/standards/iso10646/pdf/samaritan.pdf So once again I refer to other tables with broader paleographic attestation http://www.jhu.edu/ice/ancientnorthwestsemitic/gesenius.gif http://www.jhu.edu/ice/ancientnorthwestsemitic/gibson1.gif http://www.jhu.edu/ice/ancientnorthwestsemitic/gibson2.gif and, based on such tables, suggest, in Everson's words, that Looking at [THESE tables] most of the scripts are so similar that there doesn't seem to be any point in trying to encode them separately. gesenius.gif shows logical divisions between Old Hebrew, Samaritan, Old Aramaic, and Aramaic-Hebrew. It would seem to align well with Michael Everson's N2311.PDF. gibson1.gif is all about (palaeo-) Hebrew and Moabite, which would seem to already all be covered under Phoenician in N2311.PDF gibson2.gif appears to show the evolution of the Aramaic script. Some of the Hebrew legend glyphs at the extreme left bear a passing resemblance to some of the Aramaic glyphs. There is a resemblance between many of the Aramaic glyphs and many of the Phoenician (palaeo-Hebrew) glyphs. Again, further research is required on Aramaic.
Here's another interesting chart: http://phoenicia.org/imgs/evolchar.gif Quoting Herodotus (translated by Aubrey de Selincourt) quote The Phoenicians who came with Cadmus - amongst whom were the Gephyraei - introduced into Greece, after their settlement in the country, a number of accomplishments, of which the most important was writing, an art till then, I think, unknown to the Greeks. At first they used the same characters as all the other Phoenicians, but as time went on, and they changed their language, they also changed the shape of their letters. end quote Phoenician shouldn't be unified with either Greek or Hebrew. Best regards, James Kass .
Re: Ancient Northwest Semitic Script
. Peter Kirk wrote, Perhaps we should have a special block of Epigraphical Alphanumeric Symbols, to go with the Mathematical..., for which epigraphers can propose all manner of glyph variants which they might find useful, while the rest of us ignore these blocks and get on with encoding our texts using the existing Hebrew, Latin etc blocks with markup for glyph variants. That's an approach which would probably be workable. Variation selectors were mentioned for two reasons: we have some precedent for variation selectors being used for specific glyph forms for certain math symbol characters. And, variation selectors are supposed to be ignored in searching and indexing, more or less. (Default ignorable) So, that approach might meet epigraphers' needs while enabling painless cross-variant searching, and still permit scholars to get on with encoding their texts as they see fit. Best regards, James Kass .
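The default-ignorable point can be sketched in code. A minimal illustration (mine, not from the thread; `strip_variation_selectors` is a hypothetical helper name), assuming a search routine that normalizes away the variation-selector range U+FE00..U+FE0F before matching:

```python
# Sketch: variation selectors (U+FE00..U+FE0F) are Default_Ignorable,
# so a search routine might strip them before matching, letting
# queries ignore requested glyph variants.
VS_RANGE = range(0xFE00, 0xFE10)

def strip_variation_selectors(text):
    """Drop variation selectors so searches match across glyph variants."""
    return "".join(ch for ch in text if ord(ch) not in VS_RANGE)

# U+2268 LESS-THAN BUT NOT EQUAL TO followed by VS1 requests a
# particular glyph form; the base character alone still matches.
assert strip_variation_selectors("\u2268\uFE00") == "\u2268"
```

Real search engines would apply this (or something like it) as part of a broader normalization step; the point is only that the selectors carry no searchable content of their own.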
Re: Ancient Northwest Semitic Script
. Dean Snyder responded to Michael Everson, Sounds very similar to the development of the Latin script variants, doesn't it? Aren't there many common threads in the development of writing systems? Should Latin be separately encoded? Latin *has* been separately encoded. Not the Latin that is comparable to the Phoenician we are talking about. (smile) If you're referring to Old Italic, it's in Plane One. Ancient Latin, as a parent script, is roughly analogous to the Phoenician under discussion. Ancient Latin does not have a J, U, or W in it, and yet Unicode, in the Latin block, has LATIN CAPITAL LETTER J, etc. Some modern languages use extensions to the Latin script. Others, like some Polynesian languages, use only a subset. These are typically either paleographers, who are more interested in emphasizing glyphic variation than commonality, Is it possible that paleographers are interested in representing and reproducing stone inscriptions accurately? Could it be said that paleographers must be aware of commonality as well as variance? or they are script taxonomists intent on delineating lines of derivation and innovation. Taxonomy, from the Greek taxis, arrangement + nomos, law. It shouldn't be much of a semantic stretch to say that some Unicoders are taxonomists. So, hopefully there's nothing really wrong with taxonomy. In neither case are they encoders, Aren't they? The process is open and experts of any persuasion are generally welcomed. Besides, would it be fair to say that many paleographers and script taxonomists have been interested in computer encoding all along? and in neither case do they use the word script with that meaning invested in it by Unicodists. That may be. But, in either case it is hoped that the needs of script taxonomists and paleographers won't be disregarded. Well I, for one, prefer to read in more paleographically relevant renderings; and fonts combined with markup will, of course, take care of everything. 
That's not very useful in plain text. Unicode is an encoding standard for plain text. The same can be said for the Indic and Philippine and other scripts, yet we (properly) encoded them. Some of the nodes on the tree show enough variation to warrant separate encoding. But not the Phoenician, Punic, Moabite, Ammonite, Old Hebrew, and Old Aramaic nodes. In fact, the glyphic, or paleographic, variation is so slight at times between texts in these languages and dialects, that it is the extra-script evidence that is diagnostic for identification. Quoting from N2311.PDF: quote Phoenician encompasses: Proto-Sinaitic/Proto-Canaanite Punic Neo-Punic Phoenician proper Late Phoenician cursive Phoenician papyrus Siloam Hebrew Hebrew seals Ammonite Moabite Palaeo-Hebrew end quote quote ...most of the scripts are so similar that there doesn't seem to be any point to encoding them separately. end quote Best regards, James Kass .
Re: Aramaic unification and information retrieval
. Quoting from: http://www.jewishencyclopedia.com/view.jsp?artid=1308&letter=A quote ... In the letter מ the original bent stem was curved upward still more until it reached the upper horizontal stroke, so that the final Mem to-day has the form ם. The Palmyrene script possesses a final Nun with a lengthened stem; the Nabatean contains similarly final Kaph, Nun, Ẓade, and Shin, and further a closed final Mem and final He. ... end quote So, apparently we have contextual forms which differ a bit between scripts. (Hebrew has final KAF, MEM, NUN, PE, and TSADI.) *** If ancient Hebrew and modern Hebrew were the same script, we wouldn't need the modifiers, we could just say Hebrew and everyone would know what we were talking about. *** The opening line from the Moabite Stone (Mesha Stele) could be expressed as ANK MSO BN KMSMLD MLK MAB, but that's not a compelling argument in favor of unifying Phœnician and Latin. Likewise, the fact that some members of the user communities often transcribe such inscriptions into modern Hebrew is not a compelling argument in favor of unifying ancient and modern Hebrew. *** If it's perfectly acceptable to write old Aramaic using modern Hebrew glyphs, would the converse also be true? In other words, would it be perfectly acceptable to use old Aramaic glyphs along with cantillation marks and modern Hebrew points to represent the Bible? Or, would it be a travesty to do so? *** If referring generically to many of the Indic scripts won't float your boat, suppose we consider the Philippine scripts. Some of these are arguably glyph variants of each other, yet they were not unified. (Well, the punctuation was unified.) *** Referring to the 2311.PDF document, it should be noted that the phrase Further research is required is used twice in the short section on Aramaic. 
Michael Everson's submission doesn't strike me as by gosh and by golly - this is how we're going to do it, but rather seems to be a preliminary report offering guidelines derived from respected sources. *** Ideally, input would be solicited from members of the user communities who have read Daniels and Bright (as well as other germane publications) and who know something about computer encoding and the Unicode Standard. (smile) Rara avis. Best regards, James Kass .
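The point earlier in this post about Hebrew's separately encoded final forms can be checked against the character database. A minimal sketch (mine) using Python's standard unicodedata module:

```python
import unicodedata

# The five Hebrew letters with separately encoded final forms,
# as listed above: KAF, MEM, NUN, PE, TSADI. Each final form is a
# distinct code point, not a contextual glyph of the base letter.
finals = {
    "\u05DA": "HEBREW LETTER FINAL KAF",
    "\u05DD": "HEBREW LETTER FINAL MEM",
    "\u05DF": "HEBREW LETTER FINAL NUN",
    "\u05E3": "HEBREW LETTER FINAL PE",
    "\u05E5": "HEBREW LETTER FINAL TSADI",
}
for ch, expected in finals.items():
    assert unicodedata.name(ch) == expected
```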
Re: Aramaic unification and information retrieval
. Peter Kirk wrote, There are no distinctive features other than glyph shapes distinguishing Hebrew, Phoenician, Samaritan and Early Aramaic as proposed in ... Couldn't the same observation be made about many of the Indic scripts? Best regards, James Kass .
RE: Swastika to be banned by Microsoft?
verdy_p @ wanadoo.fr wrote, ... For now African languages are only representable on Windows with Arial Unicode MS ... What utter nonsense! Bosh. Balderdash. ␈. Yet another blatantly false statement from a generally unreliable source. This is really tiresome. .
RE: Swastika to be banned by Microsoft?
. May be the Unicode name should not be swastika but a transliteration of an Asian name (Tibetan, Chinese Pinyin...), ... How about Sanskrit? *** The swastika was also used as a symbol in scouting. (As in Boy Scouts.) http://www.pinetreeweb.com/bp-can3.htm http://www.scouting.milestones.btinternet.co.uk/badges.htm Best regards, James Kass .
Re: Swastika to be banned by Microsoft?
. Mark E. Shoulson wrote, I'm embarrassed to admit it, but I find myself thinking that the swastika, THE Nazi swastika, right-facing, tilted 45°, proper ratio of stroke-thickness, the whole deal, should be encoded in Unicode. As a matter of history: it *is* a symbol of profound significance in the history of the world. Indeed it is. Perhaps what is needed is a new combining character. Maybe some kind of COMBINING REPLACEMENT WHITEWASH CHARACTER could be proposed. It could be applied by the system wherever appropriate, as deemed by user preferences or regional insistence, in order to obliterate any characters or character strings which might offend. One suggestion for a display glyph would be an ostrich with its head buried. It is said that one who ignores history is doomed to repeat it. Or, we might consider that the same characters used to represent holy books or love poetry can also render 'Mein Kampf'. Ultimately, the ability to freely and openly exchange information and ideas may prove to be harmful only to despots and the like. Best regards, James Kass .
RE: Swastika to be banned by Microsoft?
. James Kass wrote, Yet another blatantly false statement from a generally unreliable source. That was not only ad hominem, it was probably redundant, as well, and I'm sorry for it. It would have been better left unsaid. Best regards, James Kass .
RE: Swastika to be banned by Microsoft?
. Philippe Verdy wrote, ... For now African languages are only representable on Windows with Arial Unicode MS ... What utter nonsense! Bosh. Balderdash. I spoke only of the default core fonts that come with Windows. It's too bad that Arial Unicode MS is not a Windows default core font, then. So please stop insults... I'll try to restrain myself. ... there was no offense in what I said ... Except that it was untrue. Best regards, James Kass .
RE: character map in Microsoft Word
. Philippe Verdy wrote, Note that Windows keyboard drivers do not support input of Unicode code points. Keyboard DLLs for modern Windows systems are Unicode-based. What you have is (below, replace AltGr by Alt+Ctrl on US keyboards that don't have an AltGr key): Alt+Ctrl + any sequence of digits from the numeric key pad produces nothing at all. (At least not on Win XP.) The right-hand Alt key on U.S. keyboards is the AltGr key, even though the physical keyboard may not be labelled as such. Either the right or left Alt key plus digits from the numeric key pad can be used to insert special characters. As Chris Jacobs mentioned, in WordPad (on Win XP, at least) Alt plus 8531 (from the numeric key pad) inserts the 1/3 character (U+2153). Chris said this doesn't work in Outlook Express, though. It also doesn't work in Notepad. Best regards, James Kass .
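The WordPad behavior described above follows directly from the arithmetic: the decimal Alt-code is just the code point's value in decimal. A quick illustrative check in Python:

```python
import unicodedata

# Alt + 8531 on the numeric key pad: 8531 decimal == 0x2153,
# i.e. U+2153 VULGAR FRACTION ONE THIRD.
alt_code = 8531
ch = chr(alt_code)
assert hex(alt_code) == "0x2153"
assert unicodedata.name(ch) == "VULGAR FRACTION ONE THIRD"
```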
Re: Glottal stops (bis) (was RE: Missing African Latin letters (bis))
. John Hudson wrote, ... If I'd been asked to design upper- and lowercase forms from scratch, I would make the cap form the same height as e.g. P, and as massive, and I would make the lowercase form a *descending* letter, with the bowl filling the x-height and with a straight descender terminating like that of p. Interesting approach. This should look quite pleasing in running text. If a new upper case glottal stop character were added to Unicode, I'd move the existing glottal stop glyph to the new upper case code point and make a lower case glyph which would match the t height and be a bit narrower than the upper case. This would represent a typographic compromise offering a distinction between cases while preserving, more or less, user expectations for existing data display. Best regards, James Kass .
RE: MS Windows and Unicode 4.0 ?
. Arcane Jill wrote, (Ah, well, it was apparently in rich text (or something other than plain text) format, so I guess I can't copy/paste it into my reply, and now it isn't visible on the screen, so I will have to do this from memory...) ... calligraphic (is that a word?) ... Yes. Best regards, James Kass .
Re: MS Windows and Unicode 4.0 ?
. Edward H. Trager wrote, WHY NOT just *give* away the Linear B, Ogham, Cherokee, and lots ... However, I would not suggest giving those fonts away to an OS vendor like ... It's hard to sell something you're giving away. Best regards, James Kass .
RE: Oriya: mba / mwa ?
. Michael Everson wrote, You should implement according to what is on page 238 of the Unicode Standard, and if there are people in India who think otherwise they had better argue their case convincingly to the UTC. I don't personally care which character is used. I *do*. Someone at the TDIL has decided he's got a bright idea about how to use WA, and that changes the traditional orthography. The TDIL document was published in April of 2002. At that time, page 238 of TUS 4.0 did not exist. The authors of the Oriya section of the report really only had the sparse information on page 227 of TUS 3.0 upon which to expand. Perhaps many of us on this list have, in the past, attempted to extrapolate the direction the consortium might take -- only to be surprised when a different path is chosen. Other than the fine work by Maurice Bauhahn on Khmer, the existence of these comprehensive TDIL reports written by technically-oriented expert members of the script user communities who also are familiar with computer encoding issues *and Unicode* appears to be unprecedented. We should rejoice that these TDIL reports exist and urge the various authors to contribute to discussions on any edge-case issues. Rather than revising history or revising encoding practices, maybe the TDIL reports could be revised where appropriate. Best regards, James Kass .
RE: Complex Combining
. Jonathan Coxhead wrote, ...http://www.doves.demon.co.uk/atomic.html. Quoting from the page, ... the longest word you can write upside-down in Unicode is `aftereffect?). In UTF-8: zʎxʍʌnʇsɹbdouɯլʞſ̣ı̣ɥɓɟəpɔqɐ Best regards, James Kass .
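The upside-down string above can be produced mechanically. A rough sketch (my own illustrative letter table, not a standard mapping; some "turned" forms are only approximations) that pairs each Latin letter with a visually similar turned character and reverses the result:

```python
# Illustrative mapping only; several turned forms are look-alikes
# borrowed from the IPA and phonetic extension blocks.
TURNED = str.maketrans(
    "abcdefghijklmnopqrstuvwxyz",
    "ɐqɔpǝɟƃɥᴉɾʞlɯuodbɹsʇnʌʍxʎz",
)

def flip(text):
    """Map letters to turned look-alikes and reverse the string."""
    return text.lower().translate(TURNED)[::-1]

assert flip("hello") == "ollǝɥ"
```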
Re: Latin Capital Letter Turned T/K?
Oh, yes, pictures of the characters: due to the miracles of modern technology, I can include them in plain text, but you'll have to stand on your head (-: T K LOL. Aren't these turned letters (and several others) used in the Fraser script? Best regards, James Kass .
Re: Oriya: mba / mwa ?
. Peter Constable wrote, The question, then, is how MBA should be encoded: as 0B2E MA, 0B4D VIRAMA, 0B2C BA , or as 0B2E MA, 0B4D VIRAMA, 0B71 WA ? MA + VIRAMA + BA, according to TUS 4.0, page 238. Best regards, James Kass .
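For reference, the recommended sequence as code points, verified against the character names (an illustrative check using Python's unicodedata module):

```python
import unicodedata

# MBA encoded as MA + VIRAMA + BA (per TUS 4.0, page 238),
# rather than MA + VIRAMA + WA.
mba = "\u0B2E\u0B4D\u0B2C"
assert [unicodedata.name(c) for c in mba] == [
    "ORIYA LETTER MA",
    "ORIYA SIGN VIRAMA",
    "ORIYA LETTER BA",
]
```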
Re: Oriya: nndda / nnta?
. Michael Everson wrote, It would be just so cool if you would say what page and column and line of that document you are referring to. Subjoined DDA (0B21) and TA (0B24) seem to be mentioned on page 8 of 59 in the right-hand column under the heading Consonant Signs. ...It must be noted here that when the consonant sign ... is attached below NNA (0B23) it is pronounced as DDA (0B21). In other words the same consonant sign represents both TA (0B24) and DDA (0B21). Ah, there's nothing quite like glyphic ambiguity... Best regards, James Kass .
Re: Oriya: nndda / nnta?
. Peter Constable wrote, The Indian gov't doc at http://tdil.mit.gov.in/ori-guru-telu.pdf describes the conjunct shown in the attached PNG as being pronounced as though NNA + VIRAMA + DDA (0B21). The component attached to the NNA otherwise represents TA (0B24), however. My question is this: should this conjunct be encoded as 0B23 NNA, 0B4D VIRAMA, 0B24 TA or as 0B23 NNA, 0B4D VIRAMA, 0B21 DDA ? Page 13 of 59, right-hand column shows four examples of the subjoined reduced TA under TA (Sign). The only example given for the subjoined reduced DDA immediately follows. It seems clear from the illustration that the authors of the document expect that the glyph in question would be encoded as NNA + VIRAMA + DDA. Since base letters TA and DDA are similar in appearance, their reduced form(s) could be identical. If this is the case, then probably NNA + VIRAMA + DDA. Or, if it's supposed to be the reduced form of TA and is only *pronounced* like DDA when it's under NNA, then probably NNA + VIRAMA + TA. Best regards, James Kass .
RE: Request
. Peter Constable wrote, On Behalf Of Ritu Malhotra Could someone kindly help me by providing an exe(Font utility) that will not only edit open type fonts(ex: Mangal.ttf)... Making changes to mangal.ttf or other Microsoft fonts would be in violation of the end-user license agreement which you agreed to when you installed the software, and is illegal. Peter is absolutely correct. And, it's not just Microsoft fonts which mustn't be altered by the end-user. Most font developers restrict rights on their fonts. Obtaining a legal copy of a font only grants the user the right to use the font; not to make changes. Best regards, James Kass . -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ritu Malhotra ... If you want a Devanagari font that uses a non-standard encoding, there are plenty of them out there that you can use without doing anything illegal. See, for instance, http://www.sil.org/computing/fonts/LANG/HINDI.HTML. Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Request
. John Hudson wrote, If in doubt, check your license agreement. Windows users can check the licensing material on many newer fonts with a program called TTFEXT.EXE, freely available from Microsoft: http://www.microsoft.com/typography/property/property.htm It's too bad that this feature is not included by default with the font folder. Best regards, James Kass .
RE: Definitions
. Peter Constable wrote, James: Inside a program, for instance... This is *very* faulty logic. ... Jeepers! ... Variable names exist in source code only, and have nothing whatsoever to do with the data actually processed. Exactly. Variable names are always internal while data may be external. You're also referring to an assigned character in your example, not a PUA codepoint. ... Since it was supposed to draw a correlation between ASCII-conformant and Unicode-conformant, an assigned ASCII character was used in the example. After all, ASCII didn't have much to offer in the way of Private Use Areas or unassigned code points. A software product could assign every single PUA codepoint to mean some kind of formatting instruction, and insert these into the text like markup. In that case, a user's PUA characters will be re-interpreted by that software as formatting instructions. HTML manages to use ASCII characters as formatting mark-up yet still allows ASCII text to be processed as expected. Briefly, it's my opinion that applications which claim to support and comply with Unicode should not 'step on' Unicode text. Any loopholes in the 'letter of the law' which allow applications to mung or reject Unicode text should be plugged. Best regards, James Kass .
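The PUA point above can be made concrete. A minimal sketch (mine, with a hypothetical is_private_use helper) of a check for whether a code point falls in one of the three Private Use Areas defined by the standard:

```python
import unicodedata

# The three Private Use Areas: the BMP block plus Planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_private_use(cp):
    """True if the code point lies in a Private Use Area."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

assert is_private_use(0xE000)
assert not is_private_use(ord("A"))
assert unicodedata.category("\uE000") == "Co"   # Co = private use
```

An application that claims Unicode conformance could run such a check before assigning any internal meaning to a code point, rather than silently re-purposing characters a user may already be employing.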
Re: creating a test font w/ CJKV Extension B characters.
. Gary P. Grosso wrote, On Win2K, Character Map (charmap.exe) does not show anything beyond the BMP. I haven't tried this on XP. Have you tried BabelMap or BabelPad? Both can show non-BMP... http://uk.geocities.com/BabelStone1357 Best regards, James Kass . Since we're comparing notes on font tools, I recently was asked to look over an experimental font which had, among other things, characters in the Supplemental Multilingual Plane and used CFF format. (I had to look up what CFF format even was.) PFAEdit was able to load the font. At least I could see the SMP characters; I didn't attempt any editing, kerning, etc. I've always been fairly impressed with PFAEdit, which probably deserves a name which reflects the fact that it goes well beyond PFA files or even Type 1 fonts. In fact, I'd like to see it ported to Windows. Font Creator Program couldn't load the font due to the CFF format, which was disappointing, because I like FCP's interface and other features, and was hoping to get an up close and personal look at some of the glyphs, which seemed to have some sort of height anomaly. On Win2K, Character Map (charmap.exe) does not show anything beyond the BMP. I haven't tried this on XP. Gary At 09:00 AM 11/20/2003 -0500, Mark E. Shoulson wrote: I haven't tested this myself, but from a look at the source code, it appears that pfaedit (pfaedit.sourceforge.net) can generate format12 TTFs. (Open Source, for UNIX). ~mark On 11/20/03 03:12, Arcane Jill wrote: Is anyone able to answer this? I for one would really like to know. Thanks -Original Message- From: Frank Yung-Fong Tang [mailto:[EMAIL PROTECTED] Sent: Thursday, November 20, 2003 2:29 AM To: John Jenkins Cc: [EMAIL PROTECTED] Subject: Re: creating a test font w/ CJKV Extension B characters. Does FontLab support generating TTF in format12 (32 bits)? Which cheaper solutions could generating TTF in format12 (32 bits)? --- Gary Grosso Arbortext, Inc. Ann Arbor, MI, USA