Re: Swastika to be banned by Microsoft?
At 08:52 -0800 2003-12-15, Elaine Keown wrote: Mark said: I'm embarrassed to admit it, but I find myself thinking that the swastika, THE Nazi swastika, right-facing, tilted .the whole deal, should be encoded This looks to me like the ideal place for an extended note in Unicode, not a code point. The note could describe the graphic differences between the existing code point and the Nazi version. I am not certain that the existing code position is satisfactory for non-CJK use. That is, Tibetan, Norse, Native American, Scouting use, and so on. Those NEVER show Han brush-stroke shapes. I would like to see some discussion about whether the properties those characters have are suitable for use in other contexts. Some things are really too evil to facilitate even in a small way in a computer code. The tilted Nazi swastika is a DIFFERENT character again. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin Capital Reversed K
At 09:07 -0800 2003-12-15, Alex LeDonne wrote: http://www.baseballscorecard.com/scoring.htm This shows complex, non-plain-text notation. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [OT] CJK -> CJC (Re: Corea?)
At 12:09 -0800 2003-12-15, Peter Kirk wrote: Then let's hope that ISO 10646 doesn't decide to break its own rules and change "KOREAN" to "COREAN" in character names e.g. U+321D. Think what that would do to the Unicode stability policy - although in fact only five names are affected. It is offensive to suggest that WG2 would do so. This thread could die now and it would be OK. -- ME
Re: [OT] CJK -> CJC (Re: Corea?)
At 13:55 -0800 2003-12-15, Peter Kirk wrote: On 15/12/2003 12:25, Michael Everson wrote: At 12:09 -0800 2003-12-15, Peter Kirk wrote: Then let's hope that ISO 10646 doesn't decide to break its own rules and change "KOREAN" to "COREAN" in character names e.g. U+321D. Think what that would do to the Unicode stability policy - although in fact only five names are affected. It is offensive to suggest that WG2 would do so. Michael, I have never before heard of a committee or working group taking offence corporately. My remark was not ad hominem although it might have been considered ad comitatem (or whatever the correct Latin is). You may personally be very determined not to make such changes, but presumably there is a mechanism by which in principle you might be outvoted within WG2. I object, rightly, to your suggestion that ISO/IEC JTC1/SC2/WG2 would violate its own rules and make changes which both the UTC and WG2 have promised not to make. Your statement made it sound as though WG2 were not a serious standardization body and did not take its responsibilities seriously. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: [OT] Corea? (was: Euro-English...)
At 00:37 +0100 2003-12-16, Philippe Verdy wrote: But all this is completely out of topic of Unicode (we are more concerned here by language codes than by country/territory codes). Yes, it is. Still, ISO 3166 or in UN codes is an incomplete standard, as it does not map correctly all dependant territories (see "YT" for Mayotte, which the UN still considers a part of Comores in its World Map updated and published in last August 2003, but that it also falsely documents as a French territorial collectivity, despite it is now a departmental collectivity, after its local population approved the new status which integrates it more tightly within France). Other missing codes in ISO 3166 and in UN statistics are: [snip] If you have issues with the content of ISO 3166, Philippe, take them up with ISO TC46. You can contact the secretariat in AFNOR. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Stability of WG2 (was: Re: [OT] CJK -> CJC)
At 19:13 -0800 2003-12-15, Doug Ewell wrote: The North Korean and Chinese national bodies have already made proposals that violate both the letter and spirit of stability policies. Yes. And we have rejected them. I'm glad the U.S. national body will stay involved, but having to rely on that does sound a bit like having to rely on enlightened statesmen, doesn't it? Better than if the whole thing were just left to the employees of large companies, Doug. We have good checks and balances. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Case mapping of dotless lowercase letters
At 11:03 +0100 2003-12-16, Philippe Verdy wrote: Doug Ewell <[EMAIL PROTECTED]> writes: > Wrong here: I have found occurrences of dotless lowercase i, used > instead of soft-dotted lowercase i, as base letters for diacritics > added above it (it was an acute accent...) Don't do that. What? This is VALID UNICODE to have texts coded like this. In Irish, it is INCORRECT to spell "físeán" 'video' with a DOTLESS I + COMBINING ACUTE. It is a spelling error, and will fail in spell-checking. The correct spelling is either I + COMBINING ACUTE or precomposed I WITH ACUTE. It is VALID UNICODE to follow LATIN CAPITAL LETTER Q with DEVANAGARI VOWEL SIGN E but that doesn't mean it's the right way to write anything. For whatever reason, encoded texts exist before correct fonts are used to render them. So there do exist texts which use dotless lowercase i before a diacritic above, simply because the author of the text did not want it to be rendered with a superposed dot. Texts which contain spelling errors. Or old IPA texts using any number of ad-hoc IPA font solutions. Those texts have to be transcoded to proper Unicode at some stage. What you suggest is Not Recommended. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Stability of WG2
At 03:03 -0800 2003-12-16, Peter Kirk wrote: The North Korean and Chinese national bodies have already made proposals that violate both the letter and spirit of stability policies. Fortunately they each have only one vote in WG2. But isn't that enough to outvote the US body? Not with Ireland and Japan standing with the US on such an issue. ;-) We really must get the UK back into SC2 ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Case mapping of dotless lowercase letters
At 13:00 +0100 2003-12-16, Stefan Persson wrote: Michael Everson wrote: In Irish, it is INCORRECT to spell "físeán" 'video' with a DOTLESS I + COMBINING ACUTE. It is a spelling error, and will fail in spell-checking. The correct spelling is either I + COMBINING ACUTE or precomposed I WITH ACUTE. Isn't the sequence "dotless i + combining acute" canonically equivalent to "dotted i + combining acute"? It is not. -- Michael Everson * * Everson Typography * * http://www.evertype.com
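Editorial illustration of the answer above: a minimal sketch using Python's standard unicodedata module. The variable and function names are chosen here for illustration and do not come from the thread. It suggests that normalization composes dotted i + acute into the precomposed letter but leaves the dotless sequence untouched, so the two spellings never compare as canonically equivalent.

```python
import unicodedata

ACUTE = "\u0301"             # COMBINING ACUTE ACCENT
dotted = "i" + ACUTE         # LATIN SMALL LETTER I + combining acute
dotless = "\u0131" + ACUTE   # LATIN SMALL LETTER DOTLESS I + combining acute
precomposed = "\u00ed"       # LATIN SMALL LETTER I WITH ACUTE


def nfc(text):
    """Return the canonical composed form (NFC) of text."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# The two spellings Michael calls correct are canonically equivalent.
dotted_matches_precomposed = nfc(dotted) == precomposed

# The dotless sequence is not equivalent to either of them.
dotless_matches_dotted = nfd(dotless) == nfd(dotted)

print(dotted_matches_precomposed, dotless_matches_dotted)
```

A spell-checker that normalizes its input before lookup would therefore still see the dotless spelling as a different string.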
Re: Stability of WG2
At 04:36 -0800 2003-12-16, Peter Kirk wrote: Seriously, can you remind us briefly what the situation is, why there is no current UK representation? I will answer this off-line. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Stability of WG2
At 02:53 -0800 2003-12-16, Peter Kirk wrote: Good point. Remember that the predicted life of Unicode (recently predicted by Michael, anyway) is longer than the lifetime of the current WG2 members. My point is that the work we do identifying characters and encoding them won't have to be done again. Once Manichaean is encoded, it's encoded. One day, 200 years from now, there may be some Puricode revision which will do away with some of the duplicate encodings which we have for various legacy and round-trip "requirements". But that will not invalidate our work today. Even if this is a millennial reign of peace and prosperity, processes of language change will not stop. A list of character names from 1000 years ago, even from 400 years ago, would look very strange today. Nothing stops you from publishing a list of character names in proper English, in Portuguese, or in some Inglish which may exist a long time from now. Currently those strings are "required" to be changeless for stability. So we do not change them, as long as that requirement remains, which the vendors say it does. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Case mapping of dotless lowercase letters
At 16:48 +0100 2003-12-16, Philippe Verdy wrote: Michael Everson wrote: At 11:03 +0100 2003-12-16, Philippe Verdy wrote: >Doug Ewell <[EMAIL PROTECTED]> writes: > > > Wrong here: I have found occurrences of dotless lowercase i, used > > > instead of soft-dotted lowercase i, as base letters for diacritics > > > added above it (it was an acute accent...) > > > > Don't do that. > >What? This is VALID UNICODE to have texts coded like this. In Irish, it is INCORRECT to spell "físeán" 'video' with a DOTLESS I + COMBINING ACUTE. It is a spelling error, and will fail in spell-checking. The correct spelling is either I + COMBINING ACUTE or precomposed I WITH ACUTE. Spelling was not the issue there. Only Unicode validity. Apparently you should look up the word "valid". Any character can follow any other character and be "valid". Any combining character can be applied to any base character, regardless of script. > Texts which contain spelling errors. Or old IPA texts using any number of ad-hoc IPA font solutions. Those texts have to be transcoded to proper Unicode at some stage. What you suggest is Not Recommended. Not recommended but still valid (and actually used in Turkish as well!) Case folding in Turkish and Azeri is DIFFERENT from everywhere else and you have to have a local tailoring for it. used in some occasions because of defects in fonts that don't have a precomposed glyph for letter i with the diacritic but have a separate glyph for the combining diacritic and for the dotted and dotless letters i, or that use renderers unable to remove the soft dot. What defects there are in FONTS without UNICODE CMAPS is of no concern to us. The IPA-93 font is one such, which allows good typesetting, but which needs glyph processing to select the appropriate base letter. It isn't a Unicode font, and so it doesn't matter. Data represented in it has to be transcoded to Unicode, and the font has to have the right thing in it. 
My main issue is, however with Turkish names found in environments where language identification is not possible (for example a simple filename or a locale-neutral database field or an international HTML form which requests user names and use them as case insensitive identifiers); lowercase dotless i do not work appropriately there. Oh well. I think it is completely illogical to match together with case-insensitive compares, the three letters: LATIN SMALL LETTER I (dotted) LATIN CAPITAL LETTER I (dotless) LATIN CAPITAL LETTER I WITH DOT ABOVE but not: LATIN SMALL LETTER DOTLESS I when use locale-neutral compares, given that the normative uppercase mapping of this fourth letter is the second letter above. That is not what happens in locale-neutral comparisons, I believe. -- Michael Everson * * Everson Typography * * http://www.evertype.com
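Editorial illustration of the case-mapping point above: a minimal sketch, assuming that Python's built-in str.upper() and str.casefold() reflect the default (untailored) Unicode mappings. The helper turkic_fold is hypothetical and only approximates the Turkish/Azeri tailoring Michael mentions; it is not an API of any library.

```python
DOTLESS_SMALL = "\u0131"    # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE


def caseless_equal(a, b):
    """Locale-neutral caseless match using default case folding."""
    return a.casefold() == b.casefold()


def turkic_fold(text):
    """Hypothetical Turkish/Azeri tailoring: dotted capital I folds to i,
    plain capital I folds to dotless i, then default folding is applied."""
    return text.replace(DOTTED_CAPITAL, "i").replace("I", DOTLESS_SMALL).casefold()


# Default uppercase mapping sends both small letters to plain capital I.
both_uppercase_to_I = DOTLESS_SMALL.upper() == "I" and "i".upper() == "I"

# Under default folding only i and I match each other.
default_matches = {
    "i~I": caseless_equal("i", "I"),
    "I~I-with-dot": caseless_equal("I", DOTTED_CAPITAL),
    "dotless~I": caseless_equal(DOTLESS_SMALL, "I"),
}

# Under the tailored folding the pairs regroup the Turkish way.
tailored_matches = {
    "dotless~I": turkic_fold(DOTLESS_SMALL) == turkic_fold("I"),
    "i~I-with-dot": turkic_fold("i") == turkic_fold(DOTTED_CAPITAL),
    "i~I": turkic_fold("i") == turkic_fold("I"),
}

print(both_uppercase_to_I, default_matches, tailored_matches)
```

If the assumption holds, the locale-neutral comparison does not group the dotted capital with i and I, which is consistent with the closing remark in the message above.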
Re: Case mapping of dotless lowercase letters
At 20:30 +0100 2003-12-16, Chris Jacobs wrote: > NO. There's no canonical equivalence between distinct pairs of characters, if the first letter of each pair are not also canonically equivalent. compare ë + combining ogonek with ę + combining diaeresis. The first pair has e trema as its first letter, the second pair e ogonek. Yet these pairs are canonically equivalent. The base letter is "e" -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Stability of WG2
At 16:05 -0500 2003-12-16, [EMAIL PROTECTED] wrote: Thus when Brontosaurus and Apatosaurus were found to be synonyms, Apatosaurus was chosen as the preferred name because it was published first; however, this is not properly describable as "changing the name of Brontosaurus to 'Apatosaurus'". "Brontosaurus" is a perfectly good name and may still be used even though it is dispreferred. Brontosaurus was good enough for me when I was five, and it's good enough for me today. Hmpf. Dispreferred me elbow. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Case mapping of dotless lowercase letters
At 00:35 +0100 2003-12-17, Philippe Verdy wrote: >>NO. There's no canonical equivalence between distinct pairs of >>characters, if the first letter of each pair are not also canonically >>equivalent. > compare ë + combining ogonek with ę + combining diaeresis. The first pair has e trema as its first letter, the second pair e ogonek. Yet these pairs are canonically equivalent. True in the way you interpret my sentence, but when I say the "first letter" of each pair, I mean the first non decomposable character of each pair. In your example, both letters are simple "e" vowels. e-diaeresis is decomposable to e + combining diaeresis. e-ogonek-diaeresis is decomposable to e + combining diaeresis + combining ogonek or to e + combining ogonek + combining diaeresis. The last two are equivalent. Both "dotted lowercase i" and "dotless lowercase i" are not decomposable... unlike "dotted uppercase I"... small letter i and small letter dotless i are as different as t and thorn. Well Outlook 2000 is unable to represent any e with ogonek and trema of your example. Get a better browser. -- Michael Everson * * Everson Typography * * http://www.evertype.com
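Editorial illustration of the decomposition argument above: a minimal sketch with Python's standard unicodedata module (names chosen here for illustration). It checks that the two e sequences reduce to the same decomposed string because the ogonek and the diaeresis have different combining classes, and that dotless i, like plain i, has no decomposition at all.

```python
import unicodedata

DIAERESIS = "\u0308"   # COMBINING DIAERESIS
OGONEK = "\u0328"      # COMBINING OGONEK

pair_a = "\u00eb" + OGONEK      # e with diaeresis, then combining ogonek
pair_b = "\u0119" + DIAERESIS   # e with ogonek, then combining diaeresis


def nfd(text):
    """Return the canonical decomposed form (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Canonical ordering sorts marks by combining class, so both orders agree.
classes = (unicodedata.combining(OGONEK), unicodedata.combining(DIAERESIS))
pairs_equivalent = nfd(pair_a) == nfd(pair_b)

# Plain i and dotless i are atomic; capital I with dot above is not.
decompositions = {
    "i": unicodedata.decomposition("i"),
    "dotless i": unicodedata.decomposition("\u0131"),
    "I with dot above": unicodedata.decomposition("\u0130"),
}

print(classes, pairs_equivalent, decompositions)
```

Both pairs decompose to e + ogonek + diaeresis, while nothing maps dotless i onto i, which matches the "as different as t and thorn" remark.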
RE: [OT] CJK -> CJC (Re: Corea?)
At 11:30 + 2003-12-17, [EMAIL PROTECTED] wrote: I doubt Christians mean offence when they refer to Jesus through any of the countless transcriptions, spellings and pronunciations used in various languages. It's odd that in English Judas and Jude are distinguished; in the original they are not. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: [OT] CJK -> CJC (Re: Corea?)
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote: There is reason to rename "Colonia" to "Köln", "Augusta" to "Augsburg", "Eboraco" to "York", "Provincia" to "Provence", and so on. Nicely said. Subtle irony tends to go over some people's heads on this list though. Eboraco is called Eabhrac in Irish. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [OT] Keyboards (was: American English translation of character names)
At 14:53 + 2003-12-18, Arcane Jill wrote: Oh wow. Well, the range of different keyboard layouts I see around me is something else! (Especially on laptops). Now here's something weird. Just about every standard, full-size, desktop, (British) QWERTY keyboard I have ever seen, has the legend for U+00A6 BROKEN BAR as the shifted symbol printed on the key to the immediate left of Z (with the unshifted symbol being backslash), and the legend for U+007C VERTICAL LINE as the third symbol printed on the key to the immediate left of 1 (with the unshifted and shifted symbols being backquote (U+0060, officially GRAVE ACCENT) and the aforementioned "not sign" (U+00AC) respectively). Thus, you would expect to yield BROKEN BAR, and you would expect to yield VERTICAL LINE, because that's what's printed on the keys. On the Mac, the situation is a bit different. On older keyboards, the grave/tilde `~ key was to the left of the 1; on newer ones, that key is to the left of the Z, and to the left of the 1 is the section/plus-minus §± key. Then on the other side of the keyboard, older keyboards had the backslash/vertical-bar key to the right of the equals-sign; newer keyboards have this key to the right of the apostrophe key. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: American English translation of character names
At 16:21 +0100 2003-12-18, Philippe Verdy wrote: John Cowan wrote: The most mysterious term is "caron" for the hacek accent: this word seems to exist only in ISO standards, and nobody has any idea where it came from. I think it may have occurred in some typographic terminology, because the initial glyph looked more like a crochet hook than a reversed circumflex, i.e. caron was not angular in handwritten form, as it is now in typeset fonts, but looked like a rounded and oblique check mark (a slight variation of the acute accent with a small rounded hook on its bottom end, but still much more distinct from the lower half-circle form used by breve). This doesn't make any sense to me, but in any case it does not explain the origin of the word "caron". The most plausible suggestion I've ever come up with is folk-etymological: It's a CARet that sits ON the vowel. :-( -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: American English translation of character names
At 09:01 -0500 2003-12-18, John Cowan wrote: "Underscore" would suggest rather U+0332, the combining low line. As for "pilcrow", it's probably descended from a perversion of "paragraph", but nobody knows for sure. The OED gives other forms for it: 15th-century pylcraft(e), pilecrafte; 16th-century pilcrowe; 17th-century pilkrow, pill-crow, peelcrow, pilgrow. Apparently for pilled crow, cf. pilcord, pilgarlic. The application of the word, with the form pylcraft, has suggested that it originated in a perversion of PARAGRAPH, through pargrafte, *parcrafte, etc.: cf quote c 1460 and 1617. But the history of the word is obscure, and evidence is wanting. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: [OT] Keyboards (was: American English translation of character names)
At 18:44 +0100 2003-12-18, Marco Cimarosti wrote: > They didn't add an extra key for the Euro though. We access that as . What OS is it? Most european keyboard I have seen have euro on . Not English. AltGr + E usually gets you the acute accent in the UK; certainly that is the case for Irish keyboards. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Aramaic unification and information retrieval
At 01:54 +0100 2003-12-21, Philippe Verdy wrote: The way the various Indic scripts create ligatures and take contextual forms make each of them very unique by themselves. The only common thing they have is a set of common phonemes which are more or less near from each other, with large variations between regional dialects. They have a common structure, which we follow in encoding. The way each of these scripts were then used and created their own orthograph for distinct languages and they were adapted to allow writing one language in another with irregular orthographic rules is so important that simple 1-to-1 transliterations from one to the other are very poor. You can't simply transliterate without taking into account difference of phonetics between regions speaking variants of the same language. Nonsense. Of course you can. KA is KA is KA is KA and BHA is BHA is BHA is BHA. The *reading rules* for pronouncing what's been written differ, but the transliteration is by and large one-to-one. Tamil of course is an exception, having lost some consonants. Finally, not all Indian share the "same" subset of characters. It's just unfortunate that you think that because the ISCII standard tried to "unify" them in the same encoding model, but still with distinct charsets. This doesn't make any sense to me at all. Indic scripts have much less in common than Greek, Latin and Cyrillic. That isn't true. They are just using smaller sets of letters (at the price of an extremely elaborate system of contextual forms). I don't know what you are talking about. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 12:33 -0800 2003-12-21, Peter Kirk wrote: Nonsense. Of course you can. KA is KA is KA is KA and BHA is BHA is BHA is BHA. The *reading rules* for pronouncing what's been written differ, but the transliteration is by and large one-to-one. Tamil of course is an exception, having lost some consonants. Michael, in view of this do you think it might be sensible to treat the different Indic scripts as equivalent for collation purposes? No, not at all. Not in the default template. The default template sorts scripts separately. This might be especially useful with a corpus of material in one language e.g. Sanskrit but using different scripts. Actually I rather think it would form a list which was an outrageously illegible mess. And then, how about the Semitic scripts? After all, ALEF is ALEF is ALEF is ALEF and ... Nope. It would also be an outrageously illegible mess. But you can tailor it locally if you wanted to. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 14:18 -0800 2003-12-21, Peter Kirk wrote: So, "KA is KA is KA is KA and BHA is BHA is BHA is BHA", and ALEF is ALEF is ALEF is ALEF, except when it comes to comparing them and collating them? In the context in which I was speaking, yes. The Indic KAs have a one-to-one relationship, historically. We know this. Likewise the Semitic ALEFs. That doesn't mean that we should unify the Indic scripts all into one (which we haven't) or that we should unify all the Semitic scripts into one. If you have a multiscript database for Pali and you need to search all the KAs across scripts, you will have to have a local engine to do so. The scripts are distinct as encoded in the Unicode standard. If you want to sort such a database, illegible as the result would be, you can do it, with a local tailoring for your specific purpose. The default table in the UCA will not interfile them, however, because it orders the scripts sequentially (apart from digits, which are treated differently because of their particular properties). I'm not saying you can't tailor. You can. I'm saying we're not going to change what we are doing in the UCA and ISO/IEC 14651 because it distinguishes scripts on purpose. Of course if one collates together a mixture of Latin script texts in very different fonts and styles one can get an outrageously messy list which is illegible to those who don't know all the different fonts. I do not consider the Semitic nodes we are considering for eventual encoding to be font variants of each other. But that is hardly the point. Anyway, I don't see the main purpose of collation as producing lists of legible words, but rather as matching in text and database searches. Which you as an expert can do with special tools. Michael, do you realise that I am trying to offer you an olive branch, and all I get is it thrown back in my face, nicely by you but rudely by someone else offlist. No, I didn't. In the first place I didn't know that we were at war. 
In the second place, all I'm telling you is that we have practices which are generic to certain levels of our work, and we are not likely to deviate from those practices. That's not throwing something in your face. That's telling you what's what. We had a similar discussion about generic practice when we were putting Runic into the UCA. Swedish specialists wanted a Latin-based order. That's specific. Everyone else, though, would want the native Futhark order. The Japanese NB, which doesn't really worry about Runes much, thought that the generic order should be the basic historical one. I think that it just might be acceptable to encode the various ancient Semitic scripts separately if they are unified for collation. You can tailor a unified collation for them or indeed for anything you like. But if you are saying that it must be all or nothing, I will continue to fight on behalf of the users of these scripts for all of what they want, rather than what you have apparently unilaterally (on the basis of a book which describes glyph shape differences rather than the systematic differences which really distinguish scripts) decided that they ought to want and have written into your Roadmap. *I* have not decided on the basis of *one* book, thanks very much. Nor have I done anything unilaterally. Nor have we made decisions which aren't based on our normal working practice. I'm not interested in worrying about these bits of the Roadmap right now. If I work on anything over the Christmas, it should be N'Ko. Then there is more work on Cuneiform. Then work on Manichaean and Avestan. Then I've got to prepare for the PDAM comments. This sniping, even when nice, isn't doing you any good, nor me. Can we drop this for a while, please? Michael (I am sorry you had rude private mail from someone. I also had private mail from someone which suggested that I didn't know anything about Indic scripts, while saying a whole lot of other rather incomprehensible things about ISCII and Unicode. 
Better forgotten.)
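Editorial illustration of the "local engine" for cross-script searching mentioned earlier in this message: a minimal sketch, assuming that the Unicode character names of the Indic letters share a common tail (for example "... LETTER KA"). The function letter_key and the sample lists are hypothetical and written for this note; plain code point order is used only as a rough stand-in for the default collation table, which it is not.

```python
import unicodedata


def letter_key(ch):
    """Hypothetical cross-script key: the part of the character name after
    ' LETTER ', or the character itself when the name has no such part."""
    name = unicodedata.name(ch, "")
    _, separator, tail = name.partition(" LETTER ")
    return tail if separator else ch


# KA and BHA in Devanagari, Bengali, Gujarati and Telugu.
KA = ["\u0915", "\u0995", "\u0a95", "\u0c15"]
BHA = ["\u092d", "\u09ad", "\u0aad", "\u0c2d"]

ka_keys = {letter_key(ch) for ch in KA}
bha_keys = {letter_key(ch) for ch in BHA}

# Untailored order keeps each script together (KA, BHA per script) ...
by_script = sorted(KA + BHA)
# ... while a local tailoring on the shared key interfiles the scripts.
interfiled = sorted(KA + BHA, key=lambda ch: (letter_key(ch), ch))

print(ka_keys, bha_keys)
print([letter_key(ch) for ch in by_script])
print([letter_key(ch) for ch in interfiled])
```

The tailored list groups all BHAs and then all KAs regardless of script, which is the kind of purpose-specific ordering the message says belongs in a local tailoring rather than in the default table.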
Re: Aramaic unification and information retrieval
At 04:27 -0800 2003-12-22, Peter Kirk wrote: In view of this, I call for a review of the roadmaps and in particular of the status of the Aramaic, Palmyrene, Nabataean, Elymaic and Hatran scripts. We heard you the last time, Peter. We know that this is a concern of yours. Serious consideration should be given to unifying these scripts with the Hebrew script, of which they appear to be glyph variants. To you. The separate status of Phoenician may also need to be reconsidered. Absolutely not. Phoenician is the mother of these scripts and Greek and Old Italic besides. Greek and Old Italic did *not* descend from "Hebrew", and it is pernicious to go on suggesting that Phoenician should be unified with Hebrew. If you want, as some scholars do, to write Phoenician in Hebrew script, go right ahead. That is a perfectly reasonable transliteration choice. Nothing prevents you from doing it. But historical realities and relationships *do* have some relation to the content of the Unicode Standard and ISO/IEC 10646. And that may include encoding things that you won't use, though *others* might. Note that I am calling for a review only of scripts listed in N2311 as not in current use. Please do not force us to undertake this review NOW. We do not have the resources to do so effectively and already this thread has taken up far too much time and energy. We have explained to you that nothing actionable is happening with any of this material at present. How many times do I have to say that? -- Michael Everson * * Everson Typography * * http://www.evertype.com
2003-12-22
Grianstad faoi mhaise do chách! Happy Solstice to everyone! -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 21:36 -0800 2003-12-22, Doug Ewell wrote: Maybe not as far as whether it will actually be encoded. We do know that "Accordance with the Roadmap" is often the sole justification for the code positions specified in proposals, as discussed in a thread some months ago. Excuse me? Are you irritated about something, Doug? When I fill out the proposal summary form, I do NOT bother to rehash all the reasons why we decided to put something on the BMP or the SMP. Why? Because it isn't a good use of our time to rehash all of these things and pour out the history of why we thought it would be good to put something where. "Accordance with the Roadmap" is often the sole justification that I bother to put in the Proposal Summary form. But it reflects consensus about where the Roadmap Committee thinks things ought to go. You may remember that Ken convinced me to move Phoenician to the SMP at one stage in favour of Arabic Extensions. I suppose that's in the archives somewhere, where some future Historian of Unicode (hi there!) can find it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 01:59 -0800 2003-12-23, Doug Ewell wrote: The impression I get, which is probably totally off base, is that when Script X is first considered a candidate for possible future encoding, Michael or somebody looks around for a big-enough empty spot in the Roadmap and says, "Hmm, let's put it... there." There are zones for RTL scripts and a rough guideline for zones in the SMP, but in general it's pretty much open territory. You may remember that the Roadmaps were first made by me (not by a committee), oh, some time back in 1996 or 1997. I am not sure, actually. Actually I did find some copies of old versions and I thought I might dust them off and post them to the Roadmap site so people could see how things evolved, assuming that people care. I'm not sure when the first Roadmap version that I sent to WG2 was. Of course at http://www.unicode.org/roadmaps/index.html we inform you: "When scripts are actually proposed to the UTC or to WG2, the practice is to 'front' them in the zones to which they are tentatively allocated, and to adjust the block size with regard to the allocation proposed. The size and location of the unallocated script blocks are merely proposals based on the current state of planning. The size and location of a script may change during final allocation of the script." Years later, when some of the adjoining allocations may not have gotten off the ground and others have suddenly sprung into being (like the FUPA extensions, which IIRC were never roadmapped until after they were proposed), Alphabetic extensions or something was put in about the same time, if I recall. Usually when something pops up I roadmap it. It helps to know where things might fit. the formal proposal for Script X is written and cites the Roadmap as the only justification for the proposed code points, even if there might be other reasons supporting (or controverting) that criterion. The justification is only with regard to what plane the thing is on. 
Usually it doesn't matter what code positions a script gets, as long as small alphabets are aligned on a half-block boundary (for SCSU), but it might be nice sometimes to see a rationale other than "Accordance with the Roadmap," or a short blurb explaining why the Roadmap had the script there in the first place. Might it indeed. :-| The justification is only with regard to what plane the thing is on. This is NOT a huge problem for me, just something I've noticed. With all the careful scrutiny that character proposals get, on everything from glyphs to properties, the code position assignments seem relatively arbitrary. Ahem. The justification is only with regard to what plane the thing is on. > I deliberately followed the roadmap codepoints for my recent 'Phags-pa proposal even though I think 'Phags-pa probably belongs in the SMP (but I don't really care where 'Phags-pa is encoded as long as it is encoded, so I am happy to defer to Michael, Rick and Ken in this regard); and then WG2 in their wisdom decided to reallocate the block three rows north of the roadmapped codepoints ... so maybe you can't assume that roadmap codepoints are carved in stone. I didn't see the minutes of the meeting where that decision was made. What was the rationale for moving it? It had been on the Roadmap to the BMP along with some other Brahmic scripts, and with Tibetan and Mongolian, as far as I recall. -- Michael Everson * * Everson Typography * * http://www.evertype.com
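Editorial illustration of the half-block remark above: a minimal sketch, assuming that a "half-block" means a run of 128 code points starting at a multiple of 0x80, which is also the size of an SCSU window. The function names and the third, straddling range are hypothetical; the Gothic and Deseret ranges are the published block ranges.

```python
HALF_BLOCK = 0x80   # 128 code points, assumed to match one SCSU window


def starts_on_half_block(first):
    """True when the first code point sits exactly on a half-block boundary."""
    return first % HALF_BLOCK == 0


def fits_one_half_block(first, last):
    """True when the whole range lies inside a single aligned half-block."""
    return first // HALF_BLOCK == last // HALF_BLOCK


ranges = {
    "Gothic": (0x10330, 0x1034F),
    "Deseret": (0x10400, 0x1044F),
    "hypothetical straddler": (0x10370, 0x1038F),
}

report = {
    name: (starts_on_half_block(first), fits_one_half_block(first, last))
    for name, (first, last) in ranges.items()
}

for name, (aligned, contained) in report.items():
    print(f"{name}: starts on boundary={aligned}, fits one half-block={contained}")
```

Under this reading, a small alphabet that stays inside one half-block can be covered by a single compression window even when it does not start exactly on the boundary.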
Re: Aramaic unification and information retrieval
At 04:30 -0800 2003-12-23, Peter Kirk wrote: As the subject line here is still about Aramaic, I shall remind you all that that script is a good example of a script which has been roadmapped for the BMP as a misunderstanding. I am aware that this is your opinion, Peter. If there is such a script at all distinct from the Hebrew script, it is one which died out, and was replaced by other encoded or roadmapped scripts, more than 2000 years ago. Just for the sake of argument, and not with particular reference to the scripts currently under discussion, it is acceptable to us to encode extinct scripts even when some scholars prefer to use something else. Gothic is one such example. So this is a case where the original decisions of the Roadmap Committee need reviewing. You have stated this already. That decision was based on N2311 which, as James points out, notes twice that "Further research is required". Gosh. And I'm the one who wrote that. Isn't that something? The UTC should make sure that such research has been done properly, and not allow provisional decisions taken on the basis of incomplete research to become standardised by default. Don't be ridiculous. Nothing gets standardized by default. Thank you for your input. Your input has been noted. Will you please give it a rest now? The matter will be reviewed in due course. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 17:41 -0800 2003-12-22, Kenneth Whistler wrote: If there is, however, some consensus that Samaritan and Manichaean *do* deserve separate encoding consideration, how about pursuing the furthering of encoding proposals for those as distinct scripts and then come back around later to review the ancient forms once again after some more of the pieces have fallen into place? Oh, Manichaean is certainly going to be encoded. The German scholars I met with in Prague last year have been extremely helpful in working out the specifications needed. And I am supposed to meet with Iranian experts later this year to finalize things. Regarding Samaritan, there is a group of modern users certainly. This page http://www.orindalodge.org/kadoshsamaritan.php has a number of interesting links on it. Masonic scholars apparently differentiate between Hebrew and Samaritan. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 06:10 -0800 2003-12-23, Peter Kirk wrote: If so you must have second sight, because I have not stated this point before, which is that the place for Aramaic, if encoded at all, is on the SMP together with other extinct scripts. Ah. I thought you were complaining (again) about Aramaic being on any Roadmap, rather than making a distinction between SMP and BMP. But extinct scripts should be encoded on the SMP, according to the rules in e.g. TUS 4.0 section 2.8. Gothic is an example of that. If Aramaic is encoded, it should be another example. There are no RULES about where anything gets encoded. There are guidelines. Nevertheless, I have no problem with Aramaic being encoded on the SMP. I'll move it there now. Happy Christmas. :-) The UTC should make sure that such research has been done properly, and not allow provisional decisions taken on the basis of incomplete research to become standardised by default. Don't be ridiculous. Nothing gets standardized by default. It was you, Michael, who wrote: When I fill out the proposal summary form, I do NOT bother to rehash all the reasons why we decided to put something on the BMP or the SMP. That implies that you expect the UTC to accept those reasons without further questioning, No, it doesn't, but you are not taking into account other facets of our process that have to do with consensus in the meetings. I can't fault you for that, but please don't be so literalist. ;-) without even any documentation explaining the earlier decision, and without checking whether, even according to that documentation, "Further research is required". That was my meaning. The UTC doesn't allocate code positions. WG2 does. We assign things their places in WG2 meetings according to consensus. Now, go have a mince pie. I'm going to. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 08:51 -0800 2003-12-23, Peter Kirk wrote: On 23/12/2003 06:22, Michael Everson wrote: ... There are no RULES about where anything gets encoded. There are guidelines. Nevertheless, I have no problem with Aramaic being encoded on the SMP. I'll move it there now. Happy Christmas. :-) Thank you. I see it is done already. Happy Christmas! I told you I was going to do it NOW. ;-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 20:24 + 2003-12-23, [EMAIL PROTECTED] wrote: Peter Kirk wrote, ... But I do know of one person today who chooses to read the Hebrew Bible rendered with palaeo-Hebrew glyphs. http://www.crowndiamond.org/cd/torah.html Yes, this is fascinating and I'd stumbled across it before. Of course, to echo the observation John Hudson made regarding the Masonic Hebrew and Samaritan text, the text presented here http://www.crowndiamond.org/cd/genesis.html shows that Palaeo-Hebrew should obviously be unified with Latin. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now
Elaine, Rick and I and Ken have all explained our position already. You're doing nothing but stirring up a whole bunch of stuff that we aren't working on now, and that we aren't going to be working on soon. You're not asking us to deal with anything actionable, and this is keeping us from doing work which IS actionable and necessary. We have received Peter Kirk's request for review. I moved Aramaic to the SMP. That doesn't mean that we will ever encode it. It does mean that further research is required. I do not have time or resources to invest in the work required to handle this request right now. There are few others in WG2 or in the UTC who would be prepared to do so either. I have asked you any number of times, courteously, to accept this. Nothing is being encoded that endangers the Hebrew transliteration which you are currently using. If some day other things are encoded, nothing makes you have to use them. Please stop pouring oil on this. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 14:39 -0800 2003-12-23, John Hudson wrote: Now, that said, I am very keen to have the Samaritan shin encoded, because this is used as a mark in the apparatus critici of the BHS and possibly other Bible editions (in BHS it is used in citations of Pentateuchi textus Hebraeo-Samaritanus secundum). I'd be perfectly happy to see it encoded as a Letterlike Symbol, since it is being used as a symbol and not as a Samaritan letter. Perhaps it must be in any case, due to directionality issues. -- Michael Everson * * Everson Typography * * http://www.evertype.com
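Editorial note: the "directionality issues" raised here can be illustrated with Python's standard `unicodedata` module. This is only a sketch and not from the thread. U+2135 ALEF SYMBOL is chosen as an example of an existing Letterlike Symbol derived from a Hebrew letter; no Samaritan shin symbol is assumed to exist, and the helper name is hypothetical.

```python
import unicodedata

def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

HEBREW_SHIN = "\u05E9"   # HEBREW LETTER SHIN, a right-to-left letter
ALEF_SYMBOL = "\u2135"   # ALEF SYMBOL, a Letterlike Symbol used in left-to-right text

for ch in (HEBREW_SHIN, ALEF_SYMBOL):
    # Print code point, character name, and bidi class side by side.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), bidi_class(ch))
```

The Hebrew letter reports class R while the letterlike clone reports class L, which is the kind of property difference that would keep a symbol from being unified with the letter it was borrowed from.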
Re: why Aramaic now
At 14:49 -0800 2003-12-23, John Hudson wrote: Michael, I think you are missing the point that other people do have time and resources to devote to 'further research' at this time, and this is why these discussions are happening. Personally, I'm happy to accept that the position of Aramaic in the roadmap is an open issue and is going to remain so, but as Elaine pointed out there is a lot of interest in Unicode among Biblical scholars right now -- which is a Good Thing -- and some of these people are wanting to start addressing some of the questions and issues that they are confronting as they proceed. The main answer to their question is that they can use Hebrew to transliterate whatever they want. Whether Phoenician or Samaritan needs to be encoded for OTHER purposes than those of these particular scholars (who are happy using Hebrew square letter fonts for them) is another question. I don't think this means you personally need to do anything -- or Rick or Ken -- but there are going to be some proposals developed for additional Hebrew characters I'm not complaining about that, and am helping with two of them. and some documents on different approaches to unifying or not unifying the bewildering array of early semitic writing systems, That *is* something that is going to impact on what I have to do, and I would really rather not be forced to give up doing other things to deal with that. Which I am, even now. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Aramaic unification and information retrieval
At 01:02 +0100 2003-12-24, Philippe Verdy wrote: Michael Everson: Perhaps it must be in any case, due to directionality issues. If you have looked at those pages, you have seen that they were coded as a cypher of Latin, but with no implied association with these letters. It just allows using the existing font technology in a way that is not Unicode compliant as it shows unrelated glyphs for standard Latin letters. Goodness, that would never have occurred to me. The rest of your post has nothing whatsoever to do with the character John Hudson is referring to, nor to its properties, which is what I was discussing. Please do not answer this with a lengthy response explaining what you meant. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Samaritan, was: Aramaic unification and information retrieval
At 15:51 -0800 2003-12-23, Peter Kirk wrote: Agreed that the Samaritan shin is urgent for this reason. This could be added in the ballot comments to the symbol set currently under ballot. I would need a good scan of the character in context and its bibliographical reference. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Aramaic unification and information retrieval
At 01:40 +0100 2003-12-24, Philippe Verdy wrote: Michael Everson wrote: > Of course, to echo the observation John Hudson made regarding the > Masonic Hebrew and Samaritan text, the text presented here > http://www.crowndiamond.org/cd/genesis.html shows that Palaeo-Hebrew > should obviously be unified with Latin. Instead of taking dogmatic positions on how proto-semitics scripts should be encoded, why not leaving this work to the people that will really use these scripts and are currently working with those texts and publishing them? Because I am not taking "dogmatic positions". I know what I am doing. I am being careful, trying to manage the work within the larger context of the schedule we have set ourselves, and trying to do this in terms of realistic priorities. It seems that there are much enough people working there without needing to oppose to all what they have to say. That isn't what I am doing. Indeed, I accepted a useful suggestion on the part of Peter Kirk. I do, however, oppose overunification when it is warranted to do so. At the same time it takes time to do that. It took a great deal of time to disunify Coptic from Greek and Nuskhuri from Mkhedruli. I do NOT want to have to do that again with a hasty overunification of early Semitic alphabets. Could you instead take the time to work on the missing Latin letters for African languages? Why isn't there any serious work about these living languages that don't have lot of universitary support and nearly no computer resources in Africa to make this job? Thank you for proposing more topics requiring extensive research and proposal preparation, especially as the materials needed to make such proposals are not available to us. Please give generously to the Script Encoding Initiative to enable us to undertake such work. Alternatively, please collect the necessary materials and provide them to us. There is still interesting work to do within the Latin and Arabic scripts. Yes, there is. 
See N2692, for instance, and Ns247, and N2641, and N2640, and N2581R2. It's a shame that someone like you invest so much in an area that would better be specified by other communities. Is it indeed. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Aramaic unification and information retrieval
At 11:50 +0100 2003-12-24, Philippe Verdy wrote: John Jenkins wrote: > No, it was not. Han would have been unified even if there had been > space not to do so. I fully agree. Unicode would have been updated later to support surrogates if CJK had been extended so much that it could no more fit the full CJK set. This has nothing to do with what John said. ISO10646 could have followed a distinct path where each language could have been encoded separately, but the choice to encode only scripts has greatly reduced the needs for more planes, which was reasonable to project when you saw the explosion of encodings that were soon to exceed the capabilities of ISO2022 and similar 8-bit code repertoires. This is, I am sorry to say, a completely unwarranted assumption. No one EVER suggested "encoding each language separately" in ISO/IEC 10646 [sic]. This is but the latest of Philippe's pronouncements, presented as though he were an expert who had been following the Unicode project from the beginning. Unfortunately, it is as wrong as it is unsubstantiated. Note to the historians of Unicode reading these archives: Caveat lector. Note to Philippe: Over the past six months, you have written as though you were expert in all things Unicode; it is clear nevertheless that you are not, not yet, and that you have much to learn. You need to go and do the work of learning it. Doug Ewell did this, and went from being an amateur to a valued member of our team. Currently, I can't count the number of times that you have come out with "authoritative" pronouncements which had no basis in fact, and your credibility is nearing zero. (That advice, Philippe, is a Christmas present for you. Please do not respond with a lengthy explanation. And please do not send me a private message about it. If you do, I promise I will blacklist you, as I know at least one other has.) 
(Of course I am sure I have my own detractors reading this list, to whom I will look to some like Michael Curmudgeon McKnowitall Everson by saying this out loud, as opposed to sniggering quietly in offlist mail, but sometimes that's just my lot.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 07:11 -0800 2003-12-24, Elaine Keown wrote: In the Dead Sea Scrolls, several other letters with Palaeo-Hebrew shapes are used as paragraph etc. markers. Those would be Phoenician letters with RTL directionality used as markers in a traditional text. (That is given the current Roadmap which unifies Palaeo-Hebrew and Phoenician.) So, if you wish, your shin could be submitted when they are--Elaine The Samaritan shin is an LTR clone of, um, the Samaritan shin used in Western Biblical references. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 15:36 + 2003-12-24, Michael Everson wrote: The Samaritan shin is an LTR clone of, um, the Samaritan shin used in Western Biblical references. Recte: The Samaritan shin is an LTR clone of, um, the Samaritan shin, and is used in Western Biblical references. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now
At 11:47 -0800 2003-12-24, Elaine Keown wrote: Michael, I have NO "master plan" for 2004 where this work on Aramaic unification (or near-unification) will be completed in a particular month, quarter, or even season. Or not. It depends what kinds of criteria we select, or don't, and it's good to know that you aren't prioritizing that either. In the meantime, if you *do* have contact with experts in Samaritan, could you inform Debbie Anderson of this. Samaritan is likely to be actionable in the shorter term rather than the longer, and is clearly a different script from Hebrew. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now
At 12:02 -0800 2003-12-24, Elaine Keown wrote: Some of the sets of symbols I found---which I simply assumed could be added to "Hebrew"--are innately controversial because of the Roadmap. Innately? That's actually true for 3 subsets of symbols that I think of as "Extended Hebrew." Try thinking of them as General RTL Punctuation. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: [hebrew] Re: Aramaic unification and information retrieval
We have encoded 70,000 of them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now lumpers and splitters
At 12:29 -0800 2003-12-24, Elaine Keown wrote: It appears to me that script experts may resemble experts in dialects/languages: there are lumpers and splitters. I'm a lumper, but I am a thinking lumper. I will be thinking about Phoenician retrieval in early 2004. There is zero chance that Phoenician will be considered to be a glyph variant of Hebrew. Zero chance. The number of books about writing systems, from children's books to books for adults, which contain references to the Phoenician alphabet as the parent to both Etruscan and Hebrew, is legion. Some scholars may decide to transliterate all Phoenician texts into Hebrew script and read only that, and retrieve it from their databases, and that is perfectly fine. Lots of people transliterate Sanskrit into Latin and never use Devanagari. I would be happy to inform Debbie. The font for the Samaritan marks is still in rough draft due to what I did in fall. What "marks" are these? and I had confusing email from a Samaritan expert I consulted that needs to be processed. (re vowels not unification) Documents available to me suggest that Samaritan can (but needn't) use Arabic fatha and kasra and others, and that there are orthographies for which some letters are used vocalically, a bit like Yiddish. > is clearly a different script from Hebrew. Different is in the eye of the beholder, I'm afraid. Or, if you will, in the eye of the cyber-machine. No. It is a question of history and development. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now
At 12:38 -0800 2003-12-24, Curtis Clark wrote: on 2003-12-24 12:02 Elaine Keown wrote: Some of the sets of symbols I found---which I simply assumed could be added to "Hebrew"--are innately controversial because of the Roadmap. I've been following these threads with interest, as an uninformed bystander. Michael's unwillingness to unify in haste seems correct in first principles, independent of his expertise and experience. But you have presented the first cogent (to me :-) argument for why delaying the decision is a problem. Not at all. Punctuation marks are often shared between scripts. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now lumpers and splitters
At 14:08 -0800 2003-12-24, Elaine Keown wrote: > There is zero chance that Phoenician will be considered to be a glyph variant of Hebrew. Many, many Semitists would be truly astonished to read this sentence. They will need to get over it. Many, many other people will want Phoenician encoded as a script whether or not Semiticists choose to use it. It is a cultural matter, not just a matter of comparative Semitics. Again, Germanicists may prefer Latin to Gothic, and Indo-Europeanists may prefer Latin to Kharoshthi or Devanagari, yet we encode all. Samaritan Bibles have fascinating marks that indicate the emotion or dramatic interpretation to use in reading each verse. Pretty nifty! Can you please send bibliographical references and/or samples to me or to Debbie or both? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: why Aramaic now lumpers and splitters Samaritan
At 08:43 -0800 2003-12-25, Elaine Keown wrote: In addition, I was unable to find complete information on Samaritan--I couldn't find any running text with vowels that was large enough to scan for a proposal here in Texas. So anything I would send you now would not be enough to write a proposal. I would rely on materials you were able to supply to supplement what I already have, which is not inconsiderable. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)
c, etc. at the same time, and now there is resistance to using Unicode characters with "Hebrew" in their names to write Phoenician, Aramaic, etc. I think the "real problem" here arises from the fact that some scholars, familiar with Hebrew, find it easier to read early Semitic texts in square script than in the originals. The same thing happens with Runic and Gothic and Glagolitic and Khutsuri, and indeed Cuneiform, where Latin is often preferred (regardless of the structure of the writing systems). The needs of those scholars are met: they can use Hebrew and Latin with diacritics. No problem. The needs of other clients of the Universal Character Set, no matter how "unscholarly" they may be, will be met by encoding appropriate nodes in the Semitic tree. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Aramaic unification and information retrieval
At 17:46 + 2003-12-26, Christopher John Fynn wrote: (Though the Roman style & Fraktur style of Latin script are probably more different from each other than some of the separately encoded Indic scripts [e.g. Kannada / Telugu]) Sorry, Chris, this is unsubstantiated speculation, and it doesn't happen to be true. In 1997, I showed some comparisons between Coptic, Greek, Cyrillic, and Gothic showing that all of them but Greek were similar enough to be read with a minimum of training and practice. I revised this a bit in 2001: http://www.evertype.com/standards/cy/coptic.html. German, English, and Irish can all be read with a similarly low learning curve whether the script is Fraktur or Gaelic; the number of letterforms which differ is small. Wedding invitations in English-speaking countries are routinely written in non-Latin garb. The identification is uncontested! No student of writing systems classes the "Gaelic script" as something different from "Latin script". The same cannot be said of Phoenician, Samaritan, and Hebrew, for instance. So in the case of the ancient Semitic scripts - even if they are closely related, is each associated with a particular written language - or were the different but related scripts being used to write a common language? All of them can be used to write more than one language. Some of them may not have been. It's complex and needs review. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ancient Northwest Semitic Script
At 00:36 -0500 2003-12-27, Dean Snyder wrote: This document by Michael Everson is particularly revealing and in the end damning to his whole attempt at disunification of the Northwest Semitic script. I am not interested in participating in this kind of discourse. This is not "Michael Everson vs the Semitic scholars", Mr Snyder. Your "Northwest Semitic" is the same as "my" Phoenician in any case; so, in fact, you agree with the Roadmap as regards some points. Lumpers can use Hebrew. Splitters need more granularity. We will, eventually, be investigating the levels of granularity that will be useful. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Ancient Northwest Semitic Script
At 11:20 -0500 2003-12-27, Dean Snyder wrote: But my main objection is that you have ALREADY made up your mind about Phoenician and Hebrew, categorically and emphatically declaring that there is "zero chance" that they will be considered glyphic variants of one another. I'm sorry you object. I remain convinced, however, that suggestion that Phoenician be unified with Hebrew and Phoenician is ridiculous in the extreme, and I will oppose it absolutely. Likewise, it is clear that Samaritan is also not to be unified with Hebrew. There may be some grey area regarding the relation of one variety or another of Aramaic to Phoenician, and to Hebrew and other descendants of Aramaic. That is what gave rise to this and related threads. If you don't like this, that's fine. You can raise your objections when I eventually have the time and resources to push the Phoenician or Samaritan proposal forward. (Realistically, we can't expect that any one else will be doing so.) I'm not going to do that now, nor am I going to engage in further academic debate with you. You've put far more weight on the niggly details in N2311, which is an informative document written two years ago in order to help make sense out of chaos. O'Connor's chart there is one of many charts; its being there is also informative. In the meantime, the Roadmap will stay as it is, because these issues remain open. As I see it, it is a certainty that Phoenician and Samaritan will be encoded, for good reasons I shall not go into here. And in due course, it will be possible to discuss what remains. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [hebrew] Re: Ancient Northwest Semitic Script
At 13:36 -0500 2003-12-27, John Cowan wrote: Michael Everson scripsit: I remain convinced, however, that suggestion that Phoenician be unified with Hebrew and Phoenician is ridiculous in the extreme, and I will oppose it absolutely. Likewise, it is clear that Samaritan is also not to be unified with Hebrew. There's clearly a slip here: the second occurrence of "Phoenician" must mean something else, and I can't figure out what. However, it is not so clear to me that Phoenician and palaeo-Hebrew (and a fortiori Samaritan) should not be unified. Sorry. I remain convinced, however, that suggestion that Phoenician be unified with Hebrew is ridiculous in the extreme, and I will oppose it absolutely. Likewise, it is clear that Samaritan is also not to be unified with Hebrew. Currently we do think that Phoenician and Palaeo-Hebrew should be unified. Samaritan on the other hand is a later development of that line, which had the good fortune of taking on typographic regularization and development; it has interesting and unique features with regard to vowel representation, and a modern community of users; it is best disunified from Phoenician. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [hebrew] Re: Aramaic unification and information retrieval
At 14:44 -0800 2003-12-27, Peter Kirk wrote: Doug, thanks for making this new point re ancient Semitic scripts. Fundamental identity of the characters is a strong reason for unifying these scripts as well as Han scripts. As I wrote a few days ago, ALEF is ALEF is ALEF is ALEF, whatever glyph shapes are used. And ALPHA and A, are just the same. We disunified Nuskhuri from Mkhedruli, and familiarity and legibility were indeed criteria for the disunification. Mark Shoulson has just given his expert testimony that, one-to-one relation to the Semitic repertoire or not, Samaritan needs to be considered different from Hebrew. I'd say he'd probably feel the same about the older Phoenician as well. I will say it again: You and every Semiticist specialist on the face of the earth can encode every Phoenician document transliterated into Hebrew script in your databases and never even look at an eventually encoded Phoenician script. That usage still doesn't mean that the Phoenician script is a glyph variant of square Hebrew even if they share a repertoire. Even in antiquity these scripts were used distinctively in a number of instances, which will be discussed in the proposal documents in due course. Scripts develop, and differentiate. The nodes of Semitic which we will encode have not all been investigated, but, like Indic, it makes sense to encode more than one of them. I believe that the distinction between Phoenician and Square Hebrew should be maintained in plain text; font markup is not sufficient. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: German 0364 COMBINING LATIN SMALL LETTER E
Both s and long s are available for use if anyone wants to use them. What's the problem? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)
At 06:40 -0800 2003-12-29, Elaine Keown wrote: Michael Everson wrote: > And the mother of those scripts is Phoenician. She is *not* Hebrew. The mother script is probably the southern Sinai or Wadi el-Hol script, written in about 1,700 B.C.E. by Aramaeans who worked either in the copper mines of the southern Sinai or were mercenaries in an Egyptian army in the Western Desert. That would be the grandmother. :-) I also think that your attitude is that of a Hellenist or Indo-Europeanist, who looks at everything from the perspective of Athens. Think what you like. Semitics is "Praeparatio Hellenika"--its other aspects are less important, and hence not to be emphasized in computerization or anything else. I cannot make sense of this at all. Not all roads lead to Athens, Michael Everson--some of them go elsewhere. What the bejeesus are you on about, Elaine? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: [hebrew] Re: Ancient Northwest Semitic Script (was Re: why Aramaic now)
At 06:55 -0800 2003-12-29, Peter Kirk wrote: Yes, this is true at least of Azerbaijani, which mapped Cyrillic glyphs to Latin ones one-to-one. But with Serbo-Croat we are talking of two separate communities which prefer to use separate scripts for what is essentially the same language; and with Azerbaijani we are talking of a deliberate decision by a people, or at least its government, to change scripts. In Sanhedrin and Mishnaic text deliberate distinction is made between Samaritan and Square Hebrew, as will be demonstrated in the Samaritan proposal. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Name Mixup Behind Air France Groundings
At 10:08 -0800 2004-01-02, Joe Becker wrote: French police officials, speaking on condition of anonymity, said errors in spelling and transcription of Arabic names played a role in the mix-up. Figures, doesn't it? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 12:19 -0800 2004-01-02, D. Starner wrote: I'm working with Distributed Proofreaders to produce some minimal Unicode character selectors. Right now I'm working on the Latin character selectors. Since we solely provide material for Project Gutenberg, we usually only deal with characters pre-1923. After stripping composable accents, which characters in the Latin blocks only appeared after that date? Can I assume that both the Pan-Turkic Latin orthography and the Pan-Nigerian alphabet postdate that? No, you can't make assumptions like that. -- Michael Everson * * Everson Typography * * http://www.evertype.com
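Editorial note: "stripping composable accents", as described in the question above, can be sketched with Python's standard `unicodedata` module. This is an illustration under assumptions, not the Distributed Proofreaders' actual procedure: it takes "composable" to mean "has a canonical decomposition into a base letter plus combining marks", and the function name is hypothetical.

```python
import unicodedata

def strip_composable_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks, keeping only base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)

# Letters with a canonical decomposition reduce to their base letter;
# letters without one (such as schwa or o with stroke) survive unchanged,
# and those are the ones whose history would have to be checked one by one.
for sample in ("\u00E9", "\u00F6", "\u0259", "\u00F8"):
    print(f"U+{ord(sample):04X}", "->", strip_composable_accents(sample))
```

The function says nothing about when a character first appeared in print; that remains a historical question to be answered per character.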
Re: Pre-1923 characters?
At 14:54 -0800 2004-01-02, Peter Kirk wrote: On 02/01/2004 12:19, D. Starner wrote: I'm working with Distributed Proofreaders to produce some minimal Unicode character selectors. Right now I'm working on the Latin character selectors. Since we solely provide material for Project Gutenberg, we usually only deal with characters pre-1923. After stripping composable accents, which characters in the Latin blocks only appeared after that date? Can I assume that both the Pan-Turkic Latin orthography and the Pan-Nigerian alphabet postdate that? You are probably safe with the Pan-Turkic Latin alphabet. It seems that this was adopted following the First Turkology Congress, held in Baku in 1926, see http://www.azer.com/aiweb/categories/magazine/81_folder/81_articles/81_turkology_congress.html. You will find Turkic letters in that alphabet which predate that congress. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 16:42 -0800 2004-01-02, D. Starner wrote: > > Can I assume that both the Pan-Turkic >Latin orthography and the Pan-Nigerian alphabet postdate that? No, you can't make assumptions like that. Yes, I can. And I will if I have to. Your question was an historical one. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Caucasian Albanian Alphabet: Ancient Script Discovered in the Ashes
At 15:47 -0800 2004-01-02, Peter Kirk wrote: I have found a new script which may need to be encoded in Unicode. Well, I haven't found it myself, Zaza Alexidze has done that. I was previously aware of this Caucasian Albanian script, but I have only just found out that for the first time an extensive document - 300 pages of a lectionary, dating probably from the 5th century CE - has been found written in this alphabet, and in an ancient form of the Udi language. It seems to be a truly separate alphabet, although distantly related to Georgian and Armenian. Does it? The links you gave were a bit less than conclusive in that regard. But it is not even roadmapped for Unicode. Must you use such rhetoric? It wasn't roadmapped because we had no comprehensive information on it. Now we have more information, which is excellent. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 16:56 -0800 2004-01-02, D. Starner wrote: > Not safe unless you *know* exactly when a character was invented. Not safe for what? I've come across six characters that weren't in Unicode at all. What are they? Your assumption wasn't safe given your question. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 09:03 -0800 2004-01-03, Peter Kirk wrote: In fact it should be considered a variant of g. Or q. The representative glyph for this character seems to be good. It is. We went to a lot of trouble getting it that way too. But, given that the name is so misleading but cannot be changed, it is good that there is a note "= gha" in the Unicode character charts. But in the light of naming errors like this one implementers should be advised not to use character names, because they are not reliably helpful. I wouldn't say that. It would be better to advise them, as we do, that they cannot rely on the names being perfect. That's different from not using them at all. -- Michael Everson * * Everson Typography * * http://www.evertype.com
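A minimal sketch of the point about names, using Python's standard `unicodedata` module: the formal name of U+01A2/U+01A3 says "OI", while the charts annotate the letter as gha. The helper `lookup_or_none` is a hypothetical name introduced here for illustration. Whether the lookup of a "GHA" name succeeds depends on the Unicode data shipped with the interpreter, so the sketch treats that as an assumption and does not rely on it.

```python
import unicodedata

def lookup_or_none(name):
    """Return the character with this name (or alias), or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# The formal, immutable character names for the "gha" letters.
for cp in (0x01A2, 0x01A3):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# Assumption: newer Unicode data may resolve a corrected name; older data will not.
alias_hit = lookup_or_none("LATIN SMALL LETTER GHA")
print("GHA lookup:", "not available" if alias_hit is None else f"U+{ord(alias_hit):04X}")
```

Either way, the formal name is usable as an identifier but is not a reliable description of the letter, which is the distinction drawn above.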
Re: Caucasian Albanian Alphabet: Ancient Script Discovered in the Ashes
It looks a lot like what has been called the "Agvan alphabet". See http://www.evertype.com/alphabets/Agvan.jpg -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)
Philippe said: In Unicode, the glyphs are normative in a way that they allow character identification, but they are not mandatory, so they are mostly informative. This is not true, Philippe. In fact, it is so dreadfully and misleadingly untrue that all I can suggest is that you go back to page one of the Unicode Standard and start over. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 11:15 -0800 2004-01-03, Michael \(michka\) Kaplan wrote: It makes me wish we had a CouldaWouldaShoulda_CharacterName property that contains what the name ought to be, and we document this as one that *will* change any time there is a mistake made in the original character name. We just make a nice informative property and go through all of our known mistakes and the maintenance after the initial pass should be minimal I am sure that eventually such a thing will be implemented. But it would be too early to do it now, I think. Things are still too volatile. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)
At 21:50 +0100 2004-01-03, Philippe Verdy wrote privately to me: From: "Michael Everson" <[EMAIL PROTECTED]> > Philippe said: > >In Unicode, the glyphs are normative in a way that they allow >character identification, but they are not mandatory, so they are >mostly informative. This is not true, Philippe. In fact, it is so dreadfully and misleadingly untrue that all I can suggest is that you go back to page one of the Unicode Standard and start over. I have read it. Glyphs are just normative as a way to demonstrate a valid representation of the encoded code point, so that any other acceptable glyph should be unambiguously identified as the same character. So these glyphs are normative but not mandatory. Is that a more acceptable formulation? NO, IT IS NOT. Is that clear enough for you? You are spreading MISINFORMATION about Unicode, and this is reprehensible. Particularly when people give you, time and again, accurate information. The glyphs are not normative. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 22:37 +0100 2004-01-03, Philippe Verdy wrote: Note that a fundamental property of character identity is its most common classification as a vowel, consonant, or semi-vowel. That isn't true. The letter "v" is a vowel in Cherokee, a consonant in Czech, and (often) a semivowel in Danish. Please stop talking as though you are a Unicode authority, Philippe. You are an enthusiastic beginner. There is nothing wrong with that. Good luck with your studies. As I said once before, if you do your homework you could well be as valuable a participant to our work as Doug Ewell is. But for now, your pretense at expertise just makes a lot of people annoyed with you. Perhaps Patrick Andries' French translation of the text of the standard will be of assistance to you. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ? (was: Pre-1923 characters?)
At 23:23 +0100 2004-01-03, Philippe Verdy wrote: From: "Michael Everson" <[EMAIL PROTECTED]> > The glyphs are not normative. But if you want to insist more with your position, why not simply dropping completely all glyphs from the Unicode standard? Because they are informative. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 00:00 +0100 2004-01-04, Philippe Verdy wrote: From: "Michael Everson" <[EMAIL PROTECTED]> At 22:37 +0100 2004-01-03, Philippe Verdy wrote: >Note that a fundamental property of character identity is its most common >classification as a vowel, consonant, or semi-vowel. That isn't true. The letter "v" is a vowel in Cherokee, a consonant in Czech, and (often) a semivowel in Danish. Also: what are you demonstrating here? That the fundamental property of character identity of the letter "v" is NOT its use as a consonant, as a vowel, or as a semi-vowel. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Pre-1923 characters?
At 23:40 +0100 2004-01-03, Philippe Verdy wrote: From: "Michael Everson" <[EMAIL PROTECTED]> At 22:37 +0100 2004-01-03, Philippe Verdy wrote: >Note that a fundamental property of character identity is its most common > >classification as a vowel, consonant, or semi-vowel. > That isn't true. The letter "v" is a vowel in Cherokee, a consonant in Czech, and (often) a semivowel in Danish. Stop arguing against each of my words. And READ: I said "most common" on purpose above. Once again you are voluntarily interpreting things that I did not say just to find a way to contradict me. No, I am not. "Vowel", "consonant", or "semi-vowel" is not a "fundamental property of character identity", and as I have shown, any given letter can have any number of these values. Which is why these "properties" are not "fundamental" to "character identity". I feel now that you have your own reading of the Unicode standard. I am sure that many will agree with you. (I am perfectly aware that sometimes I am less patient than I might be, as well. That's a character issue, perhaps.) But stop saying always that your position is neutral, objective. I didn't. I said that you said something that wasn't true. You have the right to think that the representative glyphs are not representative at all. I think the opposite. You may not like these glyphs, because you, as a typographic expert, would have designed them differently. Actually, I vetted a great many of the chart glyphs (GHA especially) to ensure that they were as correctly representative as possible. I really think that you are unable to accept any words that you have not said yourself, and you accept no compromise and prefer a systematic and, once again, dogmatic position as THE only allowed and omnipotent expert for all questions regarding Unicode. I'm not omnipotent, nor do I speak for the Unicode Consortium. I'm just an expert. When I am dogmatic, it is (as in this case) often due to the fact that we have a *standard* here. 
You were misunderstanding and misusing the terms "normative" and "informative". That distinction *is* dogma. -- Michael Everson * * Everson Typography * * http://www.evertype.com
LATIN SOFT SIGN
At 05:30 -0800 2004-01-05, Peter Kirk wrote: It seems that we do actually need two new character pairs, this one and also the soft sign lookalike - unless it is considered acceptable to use the Cyrillic characters in Latin text cf. the use of Latin Q and W in Cyrillic Kurdish. LATIN LETTER TONE SIX **is** the SOFT SIGN clone into Latin, and should be used for Pan-Turkic. I've suggested, but perhaps not loudly enough, that the reference glyph be modified to be more soft-sign like. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: LATIN SOFT SIGN
At 07:27 -0800 2004-01-05, Peter Kirk wrote: If we are talking about U+0184/0185 (an inexact character name is not much help), yes, that is a sensible match, but in that case we need a note cf. for 01A3 that these are for Pan-Turkic Latin alphabets, and not just for Zhuang tones as the existing note suggests. I know. Also, the reference glyphs seem to have an attachment on their left sides, more than a normal serif, which is confusing and makes them look as much like a Cyrillic hard sign as a soft sign. A soft sign should have symmetrical serifs, or no serif at all. I know. It will help if we can show a Zhuang text without the weird serifs; I've had my eye out for a while. -- Michael Everson * * Everson Typography * * http://www.evertype.com
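Since the thread keeps switching between code points and names, here is a small sketch that prints the formal names of the characters under discussion, using Python's standard `unicodedata` module. The selection of code points and the helper name `describe` are choices made for this illustration only. The sketch also shows that compatibility normalization does not fold the Latin tone-six letter into the Cyrillic soft sign: they remain separate characters however similar the glyphs look.

```python
import unicodedata

# Code points mentioned in the discussion, plus the Cyrillic soft sign for comparison.
CODE_POINTS = (0x0062, 0x0184, 0x0185, 0x01A3, 0x044C)

def describe(cp):
    """Format one code point as 'U+XXXX NAME'."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in CODE_POINTS:
    print(describe(cp))

# NFKC leaves U+0185 untouched, so it never becomes U+044C or U+0062.
normalized = unicodedata.normalize("NFKC", "\u0185")
print("U+0185 unchanged by NFKC:", normalized == "\u0185")
```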
Re: unicode Digest V4 #3
At 16:27 +0100 2004-01-05, Philippe Verdy wrote: Why not then use the Latin tone six for all texts in that period, and allow glyph variants to show the I with right hook glyph used in early Latin Azeri? Because that wouldn't be right. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: LATIN SOFT SIGN
At 08:31 -0800 2004-01-05, Andrew C. West wrote: LATIN LETTER TONE SIX isn't a Latin clone of the Cyrillic soft sign per se, but is simply a character that is based on the Cyrillic letter that looks most like the digit "6". It was chosen to represent Zhuang Tone 6 purely on the shape of the glyph (likewise the letters for Zhuang Tones 1-5 were chosen simply for their resemblance to the digits "1" through "5"), and has no relation to the original phonetic usage of the Cyrillic letter. It doesn't have to have. My point would be that soft sign was borrowed into Latin for Tatar as well as for Zhuang, and that though we have encoded it for Zhuang, it should be used for old Tatar as well. To modify the reference glyph to be more soft-sign like would simply make the reference glyph less Zhuang Tone Six-like. Only if Zhuang never uses the ordinary soft sign glyph. I am sure I have seen the ordinary soft sign glyph used for Zhuang (but cannot remember where, so I have to discover it again). I recognize that the burden of proof is on me for this. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: unicode Digest V4 #3
At 19:23 +0100 2004-01-05, Philippe Verdy wrote: From: "Michael Everson" <[EMAIL PROTECTED]> At 16:27 +0100 2004-01-05, Philippe Verdy wrote: >Why not then use the Latin tone six for all texts in that period, and allow >glyph variants to show the I with right hook glyph used in early Latin >Azeri? Because that wouldn't be right. Even if it's encoded with a variant selector after the Latin tone six? Yes, even if such odious pseudo-coding were employed. As this is an historic variant of the letter which was then changed to Latin soft-sign during the first Latin period, I think it would allow "unifying" Azeri texts coded in Latin in 1923-1933 and in 1933-1939. It is NOT a variant of the soft sign. It is a variant of the letter i. Were there other uses of this i with lower-right hook in other languages or regions? Yes. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)
Well, James, I think it would be A LOT better if we got some actual documents from Zhuangland. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ?
At 14:37 -0800 2004-01-05, Peter Kirk wrote: As you will see, I have requested precisely this clarification for U+0184/0185, to clarify that this letter is used in pan-Turkic alphabets as well as in Zhuang. I am also asking for a change in the reference glyph for U+0185, because in both Zhuang and pan-Turkic this should be much shorter, and distinguished from "b" primarily by its size. In Pan-Turkic, though, it looks just like CYRILLIC SOFT SIGN in all the sources I have seen. For lots of languages. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ?
At 15:51 -0800 2004-01-05, Peter Kirk wrote: In Pan-Turkic, though, it looks just like CYRILLIC SOFT SIGN in all the sources I have seen. For lots of languages. Precisely. I meant that the glyph must be clearly distinct from U+0062, and so should be identical to U+044C. The Pan-Turkic glyph was probably really identical to the soft sign because printers would have used the same type wherever possible. We agree! We agree! We agree! -- Michael Everson * * Everson Typography * * http://www.evertype.com
New document - N2694
N2694 http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2694 Proposal to encode two Bhutanese marks for Dzongkha in the UCS Michael Everson and Chris Fynn -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Latin letter GHA or Latin letter IO ?
At 02:00 +0100 2004-01-06, Philippe Verdy wrote: From: "Kenneth Whistler" <[EMAIL PROTECTED]> When the combination of character name and representative glyph and associated informative annotations is insufficient to correctly identify a character in the standard, the recourse is to Ask the Experts and request further annotation of the standard to assist future users from running into the same problem. Thanks for your view on this issue. It is far less extreme than the Michael position, which just consists in saying "informative" without more justification, when you clearly admit that they are also mandatory. Ken and I hold the same view and have the same position. Things may be mandatory and informative, or they may be mandatory and normative. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New MS Mac Office and Unicode?
At 12:48 -0700 2004-01-06, Tom Gewecke wrote: MS Mac Office 2004 was announced at MacWorld SF today. Does anyone know whether this update finally brings the Unicode capabilities of the WinXP version to the Mac OS X world? It would be really wonderful news if it were to do so. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New MS Mac Office and Unicode?
At 09:33 -0600 2004-01-14, David Perry wrote: I am delighted to see a Unicode-native version of Office come out at long last; it lays the foundation for future developments. Hear, hear. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: New MS Mac Office and Unicode?
At 09:11 -0800 2004-01-14, Peter Kirk wrote: It strikes me that some people are reading the announcement as if it is what they want to hear, rather than what it actually says. It strikes me that some people are wanting to see it as not being good enough because it's not as complete as they want. My view: Input beyond WorldScript? Huzzah! If this was the great step forward that everyone wants, surely Microsoft would be telling everyone loud and clear. No company tells all before release. They are of course saying that their new product is wonderful and what everyone is waiting for (who wouldn't?), but when you read the small print they are promising rather little, certainly not full Unicode support, not even full support for non-complex scripts. If they are permitting input via the "Unicode Hex Input" and "US Extended", then presumably it will allow input via the "Irish Extended" and "Devanagari-QWERTY" and "Arabic-QWERTY" keyboards. If it doesn't, there is something WRONG. If it does, and there are display issues regarding *rendering* of Devanagari or Arabic, that is a DIFFERENT issue, which Microsoft will address in due course, one expects. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Cuneiform - Dynamic vs. Static
It is not useful to continue this thread on both the Unicode and the Cuneiform lists. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Samaritan shan
At 22:10 -0800 2004-01-14, Peter Constable wrote: > >Now, that said, I am very keen to have the Samaritan shin encoded, >because this is used as a mark in the apparatus critici of the BHS >and possibly other Bible editions (in BHS it used in citations of >Pentateuchi textus Hebraeo-Samaritanus secundum). I'd be perfectly >happy to see it encoded as a Letterlike Symbol, since it is being >used as a symbol and not as a Samaritan letter. Perhaps it must be in any case, due to directionality issues. Apparently nobody noticed that I submitted a proposal for this thing last year, the response to which was that it should be left until all of Samaritan is encoded. We did notice, when we started working on Samaritan. Nobody thought about the directionality issue at the time. D'oh! -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Cuneiform - Dynamic vs. Static
At 22:32 -0500 2004-01-14, Dean Snyder wrote: I'm still hoping for even more technical feedback from the Unicode community on this issue. I would like to be convinced that the dynamic model is a bad idea. Whether you are or are not convinced, it certainly is a bad idea. That's why we have been preparing proposals based on the static, sign-based model. Ken Whistler has gone to the trouble of rehearsing the refutation of all of the points on the Cuneiform list. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Klingon
At 11:15 +0100 2004-01-15, Chris Jacobs wrote: > I had a problem with this too, for a while (previous discussion on this list helped clear it up). Klingon letters had been placed in the PUA by the CSUR (ConScript Unicode Registry, an unofficial allocation of PUA space to constructed alphabets), Really? And did the Klingon Language Institute endorse that? Yes. See http://www.evertype.com/standards/csur/klingon.html The original encoding was made for some Linux implementation in 1995 or 1996 I suppose. -- Michael Everson * * Everson Typography * * http://www.evertype.com
OT, utterly OT
Anyone know how I can read a .mdb file? Please respond to me directly and not on the list. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Klingon
At 14:53 +0100 2004-01-15, Chris Jacobs wrote: WHY THEN DISTRIBUTES THE KLI SUCH A BLATANTLY UNCONFORMANT FONT? yIjachQo'. vItlhob. {{{:-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Klingon
At 18:06 +0100 2004-01-15, Philippe Verdy wrote: From: <[EMAIL PROTECTED]> > Michael Everson scripsit: > > > > yIjachQo'. vItlhob. > Demonstrating once again that the One True Script for Klingon is Latin. Not really: look at how uppercase letters are used: case mapping, which is quite safe in languages written with the Latin script, completely breaks the Klingon text... Michael did not write: "Yijachqo'. Vitlhob." Many Latin-script languages write capital letters in non-initial positions. Irish does quite regularly: "an tSín" 'China'. Breton does sometimes. It is common in transliterations of Tibetan. Of course, Philippe seems to be suggesting that the One True Script for Klingon is *not* Latin, because he thinks that yIjachQo' is not Latin, while Yijachqo' is. Which is, well, incredible. -- Michael Everson * * Everson Typography * * http://www.evertype.com
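The case-mapping point in this exchange can be checked directly. The sketch below applies Python's built-in string case operations to the Klingon phrase and the Irish example quoted above; the variable names are illustrative only. It shows what generic case mapping does to text whose orthography uses capitals in non-initial positions. It says nothing about which script the text is written in.

```python
# Latin-script text where capitals inside a word carry meaning.
klingon = "yIjachQo'. vItlhob."
irish = "an tSín"

# Sentence-style capitalization: first letter up, everything else down.
recased = " ".join(part.capitalize() for part in klingon.split(" "))
print(recased)          # the q/Q and i/I distinctions are lost

# Title-casing the Irish phrase moves the capital from S to t.
print(irish.title())

# Uppercasing is lossy too: the original casing cannot be recovered from it.
print(irish.upper())
```

The practical consequence is that software should not apply automatic re-casing to such text, rather than that the text is in some script other than Latin.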
Re: Klingon
At 18:50 +0100 2004-01-15, Philippe Verdy wrote: My remark is still valid: Klingon is not Latin, even if there's a font that tries to represent Latin letters by creating Latin digraph ligatures into Klingon letters that break the conformance requirement for Latin letters. Oh, stop, stop, stop, stop, stop. I wrote some words in Klingon. I wrote them in the Latin script. John observed that this was more proof that Klingon was conventionally written in the Latin script, which it is. It is not conventionally written in the pIqaD. That's why pIqaD has not been encoded in the Unicode Standard. Enthusiasts use it decoratively; that's why it was given a CSUR encoding. It's embarrassing to see someone going to such lengths to show what an expert he is about this when he is just utterly wrong. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Klingon
At 19:16 +0100 2004-01-15, Philippe Verdy wrote: > Many Latin-script languages write capital letters > in non-initial positions. Irish does quite > regularly: "an tSín" 'China'. Breton does sometimes. It is common in transliterations of Tibetan. I admit this exists, I don't think it's a good idea to use such weak conventions, which are justified only by the fact that one is technically constrained to use a restricted subset of Latin. If people could use more distinctive letters in Latin, such caveats would be avoided. Well, golly. I guess we're not going to change 1,000 years of orthographic practice because it fails to meet your r For Breton, I don't agree with you. Do you not? The practice is rare, but is sometimes used in placenames, as for instance, "Inis gWenva" written. (Gosh, look. A fact.) Words starting with the trigraph letter are rare in Breton Like the pronoun "c'hwi" 'you' or the digit "c'hwec'h" 'six'. (Wow. Another fact.) but even in that case, I see NO use of such "abuse" of Latin letter case other than a way to represent a missing diacritic or a missing letter. Look again. The presence of case distinctions as meaning strong primary letter distinctions in these conventions just denotes a missing diacritic or separate letter for the Latin transliteration...This is still a (very poor) transliteration system, with its imperfections, and as with other transliteration systems, it breaks the initial script design and semantic structure and is a clear sign that this is a plain separate script (as it was the intent of Tolkien when he created the script). Heaven help us. Of course, the original orthography for Klingon was Latin, as published in 1985 in Marc Okrand's Klingon Dictionary. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Klingon
At 19:28 +0100 2004-01-15, Philippe Verdy wrote: Even in the case of Irish, the uppercase "S" denotes a distinct variant of "s", which should better be noted with some diacritic, such as a hacek or cedilla... Imagine what happens when reading uppercased Irish book titles and the confusion it produces? Yes. Imagine. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Samaritan shan symbol
At 12:19 -0800 2004-01-15, John Hudson wrote: Do you know if the directionality issue was considered at that time. No, we didn't consider it at the time. We dropped the ball on that one. I sent Michael a number of scans of the Samaritan shin in use as a symbol in BHS apparatus critici, including in use in direct proximity with LTR letters, numbers and other symbols. I have that, yes. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Breton
At 22:20 +0100 2004-01-15, Philippe Verdy wrote: Look at this page to find why this happens: http://www.kervarker.org/fr/grammar_01_kemmadur.html Perhaps I won't. I know about Breton mutation. See http://www.evertype.com/gram/bg.html By "rare" I mean words without mutation of the leading consonant. The same number above would be "kwec'h" without the mutation... This is incorrect. *kwec'h does not occur; neither does *kwi. In fact, no words in kw- occur. Typical Breton dictionaries will list the word only at K, and not at C'H This is incorrect. For instance the 1200-page monolingual Breton dictionary published in 1995 gives them under C'H. (in fact the preferred Breton sorting order generally orders C'H between K and L, and GW between W and X). This is incorrect. Alphabetical order is A B C CH C'H D E F G H I J K L M N O P R S T U V W Y Z. X does not occur. GW is not a letter of its own. The old alphabetical order was A B K D E F G H CH C'H I Y J L M N O P R S T U V W Z. Sometimes, as in Kervella's _Yezhadur bras ar brezhoneg_, GW was separated out between G and H (where it would fall anyway). -- Michael Everson * * Everson Typography * * http://www.evertype.com
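The modern alphabetical order given above, with CH and C'H as letters in their own right, can be sketched as a sort key. This is an illustration under stated assumptions, not an implementation of any official collation: the names `BRETON_LETTERS` and `breton_key` are hypothetical, the apostrophe is assumed to be the ASCII one, accented letters are simply pushed to the end, and the sample word list (apart from c'hwi and c'hwec'h, which appear in the thread) is chosen only to exercise the ordering.

```python
# Letter order as stated in the message above; CH and C'H are single letters.
BRETON_LETTERS = ["a", "b", "c", "ch", "c'h", "d", "e", "f", "g", "h", "i", "j",
                  "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z"]
RANK = {letter: i for i, letter in enumerate(BRETON_LETTERS)}
# Try the longest multi-character letters first so "c'h" wins over "c".
MULTI = sorted((l for l in BRETON_LETTERS if len(l) > 1), key=len, reverse=True)

def breton_key(word):
    """Turn a word into a list of letter ranks for use as a sort key."""
    s = word.lower()
    key, i = [], 0
    while i < len(s):
        for m in MULTI:
            if s.startswith(m, i):
                key.append(RANK[m])
                i += len(m)
                break
        else:
            # Characters outside the alphabet sort after all listed letters.
            key.append(RANK.get(s[i], len(BRETON_LETTERS) + ord(s[i])))
            i += 1
    return key

words = ["ki", "c'hwi", "dour", "chas", "bara", "c'hwec'h", "gwin"]
print(sorted(words, key=breton_key))
print(sorted(words))  # plain code point order, for comparison
```

With this key C'H sorts after CH and before D, and GW needs no special handling because it falls under G anyway, which matches the remark about Kervella's grammar.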
Re: Klingon
At 22:11 +0100 2004-01-15, Philippe Verdy wrote: The comment from Michael about the occurrence of "gW" in Breton was wrong: I said I had seen it in print, which was true, and I said that it was rare, which is also true. It is not standard. -- Michael Everson * * Everson Typography * * http://www.evertype.com