Re: Medieval CJK race-horse names (was Re: Bantu click letters )
John Jenkins wrote: And the proper solution for the race horse problem is for the People's Hong Kong Jockey Club to refuse to let a horse race unless its name is in Unicode. :-) Wouldn't that be like putting the cart before the horse?
Re: Medieval CJK race-horse names (was Re: Bantu click letters )
> On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote: > > > Despite the oft-mentioned cutesy Hong Kong race horse names, > > idiosyncratic > > invented Han ideographs are a negligible component of the encoded CJK > > repertoire. In my opinion there are thousands, possibly tens of > > thousands, of > > ideographs that should not really have been encoded individually as > > they are > > simply minor glyph variants, frequently only attested in a single > > source because > > the author simply wrote the character wrongly in the first place. This > > is the > > real issue with the over-encoding of CJKV, not the occasional race > > horse name. > > In particular, the decision to import en masse the repertoire of the > Hanyu Da Zidian was not a wise one, as a substantial number of the > entries are of the form "same as X". Andrew and John have correctly identified the bulk of the problem for CJKV overencoding. Unfortunately, given the nature of the Han script and the historical practice of Chinese lexicography, the result we have ended up with is almost inevitable. These historic mistakes, minor glyph variants, and such got carried into scholastic compendia *as characters*, where they become lexical headwords, repeated ad infinitum, in each further edition and each new compendium. The fact that they got carried into the Hanyu Da Zidian, the Chinese moral equivalent of the Oxford English Dictionary, means that inevitably they end up in the character encoding, as digital representation of the Hanyu Da Zidian is absolutely required. Leaving some out, no matter how mistaken or obsolete, would, from the Chinese point of view, be like deciding to leave some obsolete word out of the OED simply because there wasn't a "character" encoded for it. It would have been nice if a better mechanism for expressing Han glyphic (and other types of) variants had been feasible and in place before CJK Extension B went in, but that is water under the bridge now. 
One can only hope that some restraint and use of alternative mechanisms will be shown in the current effort to define and encode additional CJK extensions, which involve even *less* useful characters, for the most part, missed even by the major dictionary compendia. --Ken
Re: Medieval CJK race-horse names (was Re: Bantu click letters )
On Jun 11, 2004, at 1:20 PM, Kenneth Whistler wrote: It would have been nice if a better mechanism for expressing Han glyphic (and other types of) variants had been feasible and in place before CJK Extension B went in, but that is water under the bridge now. One can only hope that some restraint and use of alternative mechanisms will be shown in the current effort to define and encode additional CJK extensions, which involve even *less* useful characters, for the most part, missed even by the major dictionary compendia. FWIW, I was able to give my demo on variation selectors on Han in Chengdu after all, and I think it made the appropriate impression. John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
On 11/06/2004 10:51, James Kass wrote: ... Doesn't this mean that it isn't possible to stack a combining circumflex above a combining spanning inverted breve? Does this mean we'd need double-wide clones of all the combining marks in order to support such combos? Sounds like the same problem we found with Hebrew nearly a year ago, and solved by inserting CGJ to keep the non-canonical order which we needed. Perhaps this is another suitable application for CGJ. -- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/
RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
> Peter Constable wrote, > > > Don't forget canonical equivalence (I forgot about this as well): the > > double-width diacritics have a combining class of 234 rather than 230. > > This means that 0251 0361 0302 028A is canonically equivalent to 0251 > > 0302 0361 028A. Therefore, the first (for better or worse) should appear > > just the way Doulos SIL renders it. > Sure enough! Thanks. I didn't even think to check the combining class, > both were marks above. > > Doesn't this mean that it isn't possible to stack a combining circumflex > above a combining spanning inverted breve? Does this mean we'd need > double-wide clones of all the combining marks in order to support such > combos? Actually, no. The UTC has had discussions about this. The whole issue of how to display accents *above* a combining double diacritic (or for that matter *below* a combining double diacritic below) was debated at some length on the list last year -- I expect that a search of the archives would turn it up. In any case, the addition of U+034F COMBINING GRAPHEME JOINER, and the recent refinement of the definition of combining character sequence to explicitly allow ZWJ and ZWNJ, gives you a text mechanism for blocking what would otherwise result in a canonical reordering for such sequences. Thus: <0251, 0361, 0302, 028A> is canonically equivalent to: <0251, 0302, 0361, 028A> and both should result in the same display, with the circumflex over the "a" and the ligature tie spanning both base characters, *over* the circumflex. But: <0251, 0361, 034F, 0302, 028A> or <0251, 0361, 200D, 0302, 028A> are *not* canonically equivalent to: <0251, 0302, 034F, 0361, 028A> or <0251, 0302, 200D, 0361, 028A> And they should, in principle, at least, result in a display with the circumflex positioned *above* the ligature tie and with respect to it, rather than above the "a" and with respect to it. 
This is the same principle which is being used to enable textual distinctions for certain combinations of Hebrew points and accents, for example, which would otherwise be reordered into undesirable orders by any normalization process. Whether any existing rendering engine will do a decent job of implementing that, I don't actually know. --Ken
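[Illustration, not part of the original message.] The equivalences described above can be checked mechanically. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the helper name `canonically_equivalent` and the variable names are hypothetical, chosen only for this example. It compares sequences after NFD normalization, which applies canonical reordering, and shows that U+034F or U+200D between the two marks keeps them from being reordered. It says nothing about how any rendering engine will actually draw the result.

```python
import unicodedata

ALPHA = "\u0251"       # LATIN SMALL LETTER ALPHA
UPSILON = "\u028A"     # LATIN SMALL LETTER UPSILON
TIE = "\u0361"         # COMBINING DOUBLE INVERTED BREVE (the "ligature tie")
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT
CGJ = "\u034F"         # COMBINING GRAPHEME JOINER
ZWJ = "\u200D"         # ZERO WIDTH JOINER


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# <0251, 0361, 0302, 028A> versus <0251, 0302, 0361, 028A>
tie_first = ALPHA + TIE + CIRCUMFLEX + UPSILON
circumflex_first = ALPHA + CIRCUMFLEX + TIE + UPSILON

# The same two orders, with CGJ or ZWJ placed between the marks
tie_cgj = ALPHA + TIE + CGJ + CIRCUMFLEX + UPSILON
circumflex_cgj = ALPHA + CIRCUMFLEX + CGJ + TIE + UPSILON
tie_zwj = ALPHA + TIE + ZWJ + CIRCUMFLEX + UPSILON
circumflex_zwj = ALPHA + CIRCUMFLEX + ZWJ + TIE + UPSILON

plain_equivalent = canonically_equivalent(tie_first, circumflex_first)
cgj_equivalent = canonically_equivalent(tie_cgj, circumflex_cgj)
zwj_equivalent = canonically_equivalent(tie_zwj, circumflex_zwj)

print("plain sequences equivalent:", plain_equivalent)
print("CGJ sequences equivalent:  ", cgj_equivalent)
print("ZWJ sequences equivalent:  ", zwj_equivalent)
```

The joiner works here because it has combining class 0, so normalization treats it as a boundary and leaves the marks on either side in their typed order.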
RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
Peter Constable wrote, > Don't forget canonical equivalence (I forgot about this as well): the > double-width diacritics have a combining class of 234 rather than 230. > This means that 0251 0361 0302 028A is canonically equivalent to 0251 > 0302 0361 028A. Therefore, the first (for better or worse) should appear > just the way Doulos SIL renders it. and later wrote, > > That rule applies to combining marks in the *same* canonical combining > class. In this case, they are in different classes. Sure enough! Thanks. I didn't even think to check the combining class, both were marks above. Doesn't this mean that it isn't possible to stack a combining circumflex above a combining spanning inverted breve? Does this mean we'd need double-wide clones of all the combining marks in order to support such combos? (Well, at least I can give up on trying to make it display right here.) Best regards, James Kass
RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of James Kass > Hmmm. Further on the inside-out rule. Note the following pairs, which > are supposed to be in UTF-8: > > aÌ"Ì^ aÌ^Ì" > uÌ"Ì^ uÌ^Ì" [Why isn't UTF-8 coming through as such?] > The first "a" with combiners isn't displaying correctly here, it should > have the diaeresis above the macron, just like the first "u". This is a known problem in Uniscribe. Peter Constable
RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of James Kass > > Not sure what you are saying here or what you mean by the inside-out > rule. > > The two sequences are canonically equivalent and should look identical. > > The "inside-out" rule is explained and illustrated on page 125 (TUS 4.0). > > An "a" followed by combining umlaut followed by combining macron > is not the same as "a" plus combining macron plus combining umlaut. That rule applies to combining marks in the *same* canonical combining class. In this case, they are in different classes. Peter Constable
Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
[EMAIL PROTECTED] wrote: Even with OpenType experimental support here, my display looks like the GIF you sent. I'll try fixing this, Um, good luck. I am not sure it is possible to correctly position double-diacritics with OpenType logic. Specifically, the vertical position of the double-diacritic must be adjusted so that it is above the *taller* of the preceding and following combining sequence. AFAIK, such logic isn't feasible in OpenType. You could handle it fairly easily by contextually substituting a glyph variant of the double-diacritic at a different height. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Currently reading: Typespaces, by Peter Burnhill; White Mughals, by William Dalrymple; Hebrew manuscripts of the Middle Ages, by Colette Sirat
Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
> The "inside-out" rule is explained and illustrated on page 125 (TUS 4.0). > > An "a" followed by combining umlaut followed by combining macron > is not the same as "a" plus combining macron plus combining umlaut. Hmmm. Further on the inside-out rule. Note the following pairs, which are supposed to be in UTF-8: ā̈ ǟ ṻ ǖ The first "a" with combiners isn't displaying correctly here, it should have the diaeresis above the macron, just like the first "u". I attach a GIF showing the display using Doulos SIL/BabelPad. But, this isn't a font problem as this repros with at least one other font. Therefore, I think it's a bug in the rendering engine. It looks like the rendering engine is doing an unwanted reordering for the first "a" sequence. Best regards, James Kass <>
Re: Medieval CJK race-horse names (was Re: Bantu click letters )
On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote: Despite the oft-mentioned cutesy Hong Kong race horse names, idiosyncratic invented Han ideographs are a negligible component of the encoded CJK repertoire. In my opinion there are thousands, possibly tens of thousands, of ideographs that should not really have been encoded individually as they are simply minor glyph variants, frequently only attested in a single source because the author simply wrote the character wrongly in the first place. This is the real issue with the over-encoding of CJKV, not the occasional race horse name. In particular, the decision to import en masse the repertoire of the Hanyu Da Zidian was not a wise one, as a substantial number of the entries are of the form "same as X". Using variation selectors with Han is really the proper solution for that kind of thing. Nonce Latin forms such as experimental notations would probably best be handled via the PUA. And the proper solution for the race horse problem is for the People's Hong Kong Jockey Club to refuse to let a horse race unless its name is in Unicode. :-) John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jhjenkins/
Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)
Bob Hallissy wrote, > >Even with OpenType experimental support here, my display looks like > >the GIF you sent. I'll try fixing this, > > Um, good luck. I am not sure it is possible to correctly position > double-diacritics with OpenType logic. Specifically, the vertical position > of the double-diacritic must be adjusted so that it is above the *taller* > of the preceding and following combining sequence. AFAIK, such logic isn't > feasible in OpenType. > > > > >Following the "inside-out" rule, the first sequence should render > >correctly, the second sequence should not. > > Not sure what you are saying here or what you mean by the inside-out rule. > The two sequences are canonically equivalent and should look identical. The "inside-out" rule is explained and illustrated on page 125 (TUS 4.0). An "a" followed by combining umlaut followed by combining macron is not the same as "a" plus combining macron plus combining umlaut. So, I'd expect that entering a combiner before the spanning character would render the combiner below the spanning character, while reversing this order would render the combiner above the spanning character. Is this not the case? As you suggest for the double-wide combiners, this turns out not to be an easy fix. So far, I'm unsuccessful in getting a good display. I'll have to double-check everything in GDEF and GPOS to make sure I'm doing it right, but, it may simply not be possible yet. Best regards, James Kass
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of James Kass > > > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see > > > attached) is not productively different from U+0251 > > > U+0302 U+0361 U+028A (see attached)... > > Following the "inside-out" rule, the first sequence should render > correctly, Don't forget canonical equivalence (I forgot about this as well): the double-width diacritics have a combining class of 234 rather than 230. This means that 0251 0361 0302 028A is canonically equivalent to 0251 0302 0361 028A. Therefore, the first (for better or worse) should appear just the way Doulos SIL renders it. The only way to stack a diacritic on top of a double-width diacritic is to use another double-width diacritic. (Unfortunately, that wasn’t anticipated when Doulos SIL was being developed. Wouldn’t have been hard to support, though.) Peter Constable
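[Illustration, not part of the original message.] The combining classes mentioned above can be looked up directly. This is a minimal sketch in Python, assuming the standard `unicodedata` module; the variable names are hypothetical. It lists the canonical combining class of the double diacritics in the range U+035D..U+0362 and shows that canonical ordering moves the circumflex (class 230) ahead of the double inverted breve (class 234), which is why the two typed orders are equivalent.

```python
import unicodedata

# Canonical combining class of each double diacritic in U+035D..U+0362
double_marks = {cp: unicodedata.combining(chr(cp)) for cp in range(0x035D, 0x0363)}
for cp, ccc in double_marks.items():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: ccc={ccc}")

circumflex_class = unicodedata.combining("\u0302")  # ordinary mark above
print("U+0302 ccc =", circumflex_class)

# Canonical ordering sorts adjacent marks by combining class, so the
# circumflex is moved in front of the double inverted breve.
typed = "\u0251\u0361\u0302\u028A"
reordered = unicodedata.normalize("NFD", typed)
print(" ".join(f"{ord(ch):04X}" for ch in reordered))
```

Marks that share a combining class are never reordered relative to each other, which is the case the "inside-out" rule covers.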
Shavian (was: "Re: Bantu click letters")
On 2004.06.11, 06:25, Doug Ewell <[EMAIL PROTECTED]> wrote: > I concede that "Androcles and the Lion" was the only book published > in Shavian Check < http://katalogo.uea.org/index.php?inf=5522 > for one more. At least its last chapter is fully in Shavian script; Shavian letters are introduced gradually from chap. 2 (chap. 1 is fully in Latin script) as an attempt to enable smooth learning -- didn't work too well for me though. (And no, no specific Esperanto extensions for Shavian -- just the kind of grapheme value changes akin to, say, Polish and Welsh "w".) --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Rendering of sequences containing double diacritic (was Re: Bantu click letters)
On 11/06/2004 14:39:48 James Kass wrote: >-- Original message from "Anto'nio Martins-Tuva'lkin" : >-- >> On 2004.06.10, 17:11, I wrote: >> >> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see >> > attached) is not productively different from U+0251 >> > U+0302 U+0361 U+028A (see attached)... >> >> Now attached. (Both GIFs are identical, byte by byte, though I swear >> I made them separately: click the characters in BabelMap, PrtScr, >> paste into PhotoShop, crop, resample, save!) > >You're getting default positioning only, it looks like your system >doesn't support OpenType combining diacritic positioning for Latin. > >Even with OpenType experimental support here, my display looks like >the GIF you sent. I'll try fixing this, Um, good luck. I am not sure it is possible to correctly position double-diacritics with OpenType logic. Specifically, the vertical position of the double-diacritic must be adjusted so that it is above the *taller* of the preceding and following combining sequence. AFAIK, such logic isn't feasible in OpenType. >Fonts and rendering systems probably aren't ready for this kind of >combination yet. SIL Graphite handles it, but then we don't [yet] have wide-spread availability of Graphite-capable applications. >> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see >> > attached) is not productively different from U+0251 >> > U+0302 U+0361 U+028A (see attached)... > >Following the "inside-out" rule, the first sequence should render >correctly, the second sequence should not. Not sure what you are saying here or what you mean by the inside-out rule. The two sequences are canonically equivalent and should look identical. Bob
Re: Bantu click letters
-- Original message from "Anto'nio Martins-Tuva'lkin" : -- > On 2004.06.10, 17:11, I wrote: > > > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see > > attached) is not productively different from U+0251 > > U+0302 U+0361 U+028A (see attached)... > > Now attached. (Both GIFs are identical, byte by byte, though I swear > I made them separately: click the characters in BabelMap, PrtScr, > paste into PhotoShop, crop, resample, save!) You're getting default positioning only; it looks like your system doesn't support OpenType combining diacritic positioning for Latin. Even with OpenType experimental support here, my display looks like the GIF you sent. I'll try fixing this, now that I know there is a problem. But, the fix probably won't work on your system because OpenType Latin positioning support is needed. Attached is a GIF showing U+0251 U+0361 U+0302 U+028A as it appears in BabelPad with Doulos SIL. The Doulos font puts the combining double wide mark higher, and then the combining circumflex doesn't overstrike it. Fonts and rendering systems probably aren't ready for this kind of combination yet. > > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see > > attached) is not productively different from U+0251 > > U+0302 U+0361 U+028A (see attached)... Following the "inside-out" rule, the first sequence should render correctly, the second sequence should not. As for the combination which uses a combining mark below, that mark below is either going to apply to a previous mark below, or it is going to apply to the previously entered base letter. A mark below probably can't be configured to apply to a mark above. So, if we have a combining mark below which is to apply to a span of two base characters, then we need to have this combining mark added to the standard as a double-wide combining mark below, as far as I can tell. Best regards, James Kass
Medieval CJK race-horse names (was Re: Bantu click letters )
On Fri, 11 Jun 2004 03:04:17 +0100, Michael Everson wrote: > > How many people use medieval CJK race-horse-name characters? > Actually, the famous Song dynasty female poet Li Qingzhao (1084-c.1151) invented a board game (da3 ma3 tu2 in Chinese) which involved racing around a course in which each square was marked with the name of one of dozens of famous horses ancient and modern, most of which are written using idiosyncratic ideographs. I would have thought that Michael of all people would be in favour of encoding characters used in board games! Despite the oft-mentioned cutesy Hong Kong race horse names, idiosyncratic invented Han ideographs are a negligible component of the encoded CJK repertoire. In my opinion there are thousands, possibly tens of thousands, of ideographs that should not really have been encoded individually as they are simply minor glyph variants, frequently only attested in a single source because the author simply wrote the character wrongly in the first place. This is the real issue with the over-encoding of CJKV, not the occasional race horse name. Andrew
PUA - (was: Re: Bantu click letters)
D. Starner wrote: John Cowan <[EMAIL PROTECTED]> writes: We must be talking past one another somehow, but I don't understand how. To represent the text as originally written, I need a digital representation for each of the characters in it. Since all I want to do is reprint the book -- I don't need to use the unusual characters in interchange -- the PUA and a commissioned font seem just perfect to me. But that doesn't work if you're reprinting to XML or HTML, where you can't rely upon a commissioned font being installed and correctly used. I'm not even sure you can trust a commissioned font to be installable on the operating systems of the next few decades. Nor can you rely on PUA characters actually being usable. See: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=PUACharsInMSSotware If there is not some kind of guarantee that major OS vendors won't grab PUA characters for their own purposes, using PUA characters and a commissioned font to solve problems like this is not a workable solution in the real world. - Chris
Re: Bantu click letters
At 07:41 PM 6/10/2004, Kenneth Whistler wrote: > Yes, it's a scare claim. It is trying to bludgeon the committee into thinking that their encoding is scholastically incomplete if it doesn't represent every invented character by every idiosyncratic scholar creating his or her own conventions out there. I think the verb in question is inappropriate for the occasion and for this e-mail exchange, especially when used in the context of imputing intentions to your opponent, which is always a chancy thing to do. A./
RE: Bantu click letters
"Mike Ayers" <[EMAIL PROTECTED]> writes: > > > I'm not > > > even sure you can trust a commissioned font to be > > installable on the operating > > > systems of the next few decades. > > Font support has only improved with time. What causes you to > foresee a sharp reversal? I don't expect a reversal; but if I had commissioned a Type-1 font 15 years ago, I'd have a hard time installing it on a lot of computers nowadays. Just because OpenType is common now doesn't mean that everyone will support it in 20 years.
Double diacriticals (was: "Re: Bantu click letters")
On 2004.06.10, 18:45, Michael Everson <[EMAIL PROTECTED]> wrote: >> After a "double" diacritical, any further combining character could >> take as its base the "pair" of spacing characters "under" the said >> double diacritical, shouldn't it? > > I tried that in TextEdit, which is pretty smart, and the second > diacritic didn't centre over the pair, but rather over the 0251. But > I guess that's the only choice, and it would be a question of making > a precomposed glyph. With six combining double characters (U+035D..U+0362) and a zillion regular combining characters (101 alone in the U+0330 block), of which a full dozen would be in realistic need, we'd need at the very least 6×12=72 precomposed glyphs. Isn't the Standard explicit about the positioning of a regular diacritical after a double one? --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Re: Bantu click letters
On 2004.06.10, 21:54, Kenneth Whistler <[EMAIL PROTECTED]> wrote: > 9. n's with loops <...> I think PUA, markup, or other arbitrary text > representational mechanisms are sufficient here. Hm... U+023D LATIN SMALL LETTER N WITH RIGHT LOOP as U+0273 U+302D U+0240 LATIN SMALL LETTER ENG WITH LEFT LOOP as U+014B U+0325 U+0242 LATIN SMALL LETTER N WITH LEFT LOOP as U+006E U+0325 U+0245 LATIN SMALL LETTER N WITH LEFT HOOK AND RIGHT LOOP as U+0272 U+302D U+0248 LATIN SMALL LETTER N WITH LEFT LOOP AND RIGHT LOOP as U+0273 U+0325 U+302D Perhaps not bad for a kludge...? --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Re: Bantu click letters
On 2004.06.10, 17:11, I wrote: > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see > attached) is not productively different from U+0251 > U+0302 U+0361 U+028A (see attached)... Now attached. (Both GIFs are identical, byte by byte, though I swear I made them separately: click the characters in BabelMap, PrtScr, paste into PhotoShop, crop, resample, save!) --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|<><>
Re: Bantu click letters
On 2004.06.10, 20:50, Asmus Freytag <[EMAIL PROTECTED]> wrote: > In at least one case I suspect that a character named 'script' was > actually intended for an *italic* shape. In principle, all "holes" in the ranges U+1D434..U+1D49D, U+1D608..U+1D66F, U+1D6E2..U+1D755 and U+1D790..U+1D7C9 (those with italics style) correspond to already encoded "letter like" characters with italics style. These are... hm, only U+1D455 -- which points to U+210E : PLANCK CONSTANT (whose name does not include a misused "script")... I'd expect U+212F : SCRIPT SMALL E, but it is referred to at the nonexistent U+1D4BA, from the script block, not italics. --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
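[Illustration, not part of the original message.] The "holes" mentioned above can be listed by scanning for unassigned code points. This is a minimal sketch in Python, assuming the standard `unicodedata` module; the helper name `unassigned` and the sub-ranges chosen (mathematical italic letters U+1D434..U+1D467 and mathematical script small letters U+1D4B6..U+1D4CF) are assumptions made for the example, narrower than the ranges quoted in the message.

```python
import unicodedata


def unassigned(first: int, last: int) -> list[int]:
    """Return the code points in [first, last] with general category Cn."""
    return [
        cp for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


# Mathematical italic capital A through small z
italic_holes = unassigned(0x1D434, 0x1D467)
# Mathematical script small a through small z
script_holes = unassigned(0x1D4B6, 0x1D4CF)

print("italic holes:", [f"U+{cp:04X}" for cp in italic_holes])
print("script holes:", [f"U+{cp:04X}" for cp in script_holes])

# The letterlike symbols that the message pairs with two of the holes
print(unicodedata.name("\u210E"))
print(unicodedata.name("\u212F"))
```

The scan only finds the gaps; which letterlike symbol stands in for each gap has to be read from the code charts.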
Re: Bantu click letters
On 2004.06.10, 22:35, Asmus Freytag <[EMAIL PROTECTED]> wrote: > the fact that you are not conversant with mathematical notation, but > very familiar with linguistic notations, makes you treat these two > as worlds apart. <...> In that way, both are different from regular > 'language text' What is the difference between "language text" and "linguistic notation"? After all, the characters under discussion *could* have been adopted as the usual orthography of a writing community... --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Re: Bantu click letters
Michael, And now you are answering arguments with irrelevancies. > >But the argument in this particular case hinges on a particular, > >nonce set of characters. > > You use "nonce" very easily. Nonce: Occurring, used, or made only once or for a special occasion. You can, of course, quibble that this should be applied to only a single *token* of a character, but I think it applies fairly to the situation we are talking about: a single scholar's invention that developed no community of use, so saw no application beyond that one person's usage. > That they did not *adopt* them as standard representations does not > mean that there is no need to *use* them in interchangeable text. The case to standardize them for use in interchange is different from the case to make a particular orthography in a particular (small) set of documents available online. > > In fairness to Professor Doke, he published from 1925 to at least > 1966. Let's see what he did, shall we? Sure. > >Well, in terms of requirements, I consider that more than a little > >cart before the horse. I'd be more sympathetic if someone was > >actually *trying* to do this and had a technical problem with > >representing the text accurately for an online edition which was > >best resolved by adding a dozen character to the Unicode Standard. > >Then, at least there would be a valid *use* argument to be made, > >as opposed to a scare claim that 50 years from now someone *might* > >want to do this and not be able to if we don't encode these > >characters right now. > > Scare claim? You think I'm making a scare claim about the UCS? Our > visions of "universal" must differ rather a lot. Yes, it's a scare claim. It is trying to bludgeon the committee into thinking that their encoding is scholastically incomplete if it doesn't represent every invented character by every idiosyncratic scholar creating his or her own conventions out there. 
I claim that there are limits to what is useful to pursue in representing every squiggle. And *my* vision of Universal is that it is a hell of a lot more important to encode Avestan and Egyptian hieroglyphics, which have *large*, important literatures and large communities of users, rather than waste time on a dozen weird phonetic characters used by one scholar, characters rejected by his field, and not even significant enough to be listed in the premier work on phonetic symbol usage today, Pullum & Ladusaw. Wasting list time and committee time pursuing these things is *detracting* from the big prizes that need to be attained out there still, and fighting tooth and nail for Doke's "OWL" character is a strategic error on your part, undermining the good will and consensus you need to get the other important things done. > >Right *now* anyone could (if they had the rights) put a version of > >Dokes online using pdf and an embedded font, and it would be perfectly > >referenceable for anyone wanting access to the content of the > >document. True, the dozen or so "weird" characters in the > >orthography wouldn't have standard encodings, so searching inside > >the document for them wouldn't be optimal. > > Come clean, Ken. You suggested offline that it would be OK with you > for the Khoisan scholars to use Runic MADR or YR to represent the > VOICELESS and VOICED RETROFLEX CLICKs. *That* is not UCS philosophy, > and it is not good sense. O.k., *NOW* I'm pissed. If you are going to continue dragging things back to the Unicode list after I suggested that these discussions be dealt with offlist to argue out the issues, and THEN misrepresent my position, do me the courtesy of *quoting* the actual position you misrepresent: 8. The pitchforks The etiology of these is unexplained. 
Dokes may have been reusing an existing symbol (mathematical or runic or Greek) and then flipping it for an additional semantic, just as he apparently created the lateral click character by flipping the glottal stop. In any case, again because this is a nonce orthography, the rationale for creating *new* characters for a standard encoding of them is weak. As an approximation, it would make just as much sense to use a psi and inverted psi, or Runic long branch madr and yr (16D8, 16E6). Note, in particular, the already approved encoding of rotated and flipped versions of Greek letters as symbols used in Ancient Greek musical notation. 1D201, 1D218, 1D21E. A psi and the flipped psi symbol (1D218) would be sufficient to carry the distinction. Yeah, yeah, Michael, I know you are going to hit the roof about such a suggestion, since these symbols used by Dokes are part of a Latin phonetic orthography, and are not Runic or Greek. So spare us the detour into that lecture. My point is that given the unproductive nature of Doke's experiment here, and given that the conventions did *not* catch on to become part of any user community of Latin phonetic practice, there is no burning need to actually extend the *standard* list of Latin letters merely
RE: Bantu click letters
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Ayers > Reprinting the book brings with it the potential for its special > characters to gain currency, even if only in the context of discussing > the book. Um, Mike, let's get real. Linguists have had 80 years of opportunity during which Doke's writings have been accessible and Khoisan phonology has held at least some measure of interest, if for no other purpose than as material used by phonetics teachers and authors of books that teach phonetics who need to provide comprehensive coverage of phonetic symbols for the world's speech sounds. And during that time, they have *not* been using these symbols of Doke's for any purpose. A reprint of his 1926 book isn't going to suddenly change that. Peter Constable
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Michael Everson > Mark, come on. Doke's phonetic transcription of !Xung is a set of > explicit glyphs representing specific sounds, indeed more precisely > than IPA allows (I don't think IPA specifies a representation for > retroflex clicks). You've said that several times, as though having a bunch of distinct atomic symbols for close transcription were a good thing. Actually, the IPA is built on a principle of phonemic representation. I believe that atomic characters are never added for distinctions that are never phonemic. Where close allophonic transcription is desired, diacritics are used. So, for instance, the symbol for pre-palatal n wouldn't be added to IPA since there's no language that has a phonemic contrast between palatal nasal and pre-palatal nasal. If a linguist wants to indicate the forward position, 031F can be combined with 0272. If a linguist needs to explain *really* close details on consonants, then they resort to face diagrams (as Doke used in the samples you provided), palatographs, x-ray cinematographs or the like. This is not an argument against encoding these characters, though. It is simply pointing out that statements like "more precisely than IPA" do not constitute an argument in favour of encoding. The fact that after 80 years there are no conventional symbols for pre-palatal nasals speaks to the value and necessity of having symbols with such precise meanings. Peter Constable
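The diacritic approach Peter describes (031F combined with 0272) can be sketched in a few lines of Python. This is only an illustration using the standard `unicodedata` module; the helper name `advanced` is made up for the example and is not anyone's established API.

```python
import unicodedata

PALATAL_NASAL = "\u0272"    # LATIN SMALL LETTER N WITH LEFT HOOK
PLUS_SIGN_BELOW = "\u031F"  # COMBINING PLUS SIGN BELOW (IPA "advanced" mark)


def advanced(base: str) -> str:
    """Return `base` followed by the combining plus sign below, NFC-normalized."""
    return unicodedata.normalize("NFC", base + PLUS_SIGN_BELOW)


fronted_nasal = advanced(PALATAL_NASAL)

# NFC leaves this as two code points: no precomposed character exists for it.
for ch in fronted_nasal:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The point of the sketch is that a close transcription of a pre-palatal nasal needs no new atomic character; the base letter plus an existing combining mark already round-trips through normalization.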
Re: Bantu click letters
At 18:48 -0700 2004-06-10, Mark Davis wrote:

> There are two reasons we might not encode a particular image as a character. I had said: Many images are not appropriate for use in plain text, or have too small a user community. That is, you need to have something that is appropriate for use in plain text *and* have a significant user community.

"Significant"? How many people use medieval CJK race-horse-name characters?

> As far as I have seen from the email, there is no real evidence for a user community. If a character only occurs in a couple of works, that means there is simply not the utility in encoding it; PUA is the right choice.

I don't like shifting goalposts. We have encoded many characters which are extremely rare.

> There is a much larger set of documents containing the Prince icon, but we don't want to encode that either!

The Prince icon is a LOGO, Mark, and is out of scope by definition.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
At 18:10 -0700 2004-06-10, Kenneth Whistler wrote:

> But the argument in this particular case hinges on a particular, nonce set of characters.

You use "nonce" very easily.

> We have this one scholar, who invented a bunch of characters in the 20's to represent click sounds that nobody was doing justice to at that point, either in understanding their phonetics or making sufficiently accurate distinctions in their recording. Bully for Dokes -- it was an important advance in the field of Khoisan studies and the phonetics of clicks. But even though he published his analysis, using his characters, nobody else chose to adopt his character conventions.

That they did not *adopt* them as standard representations does not mean that there is no need to *use* them in interchangeable text. In fairness to Professor Doke, he published from 1925 to at least 1966. Let's see what he did, shall we?

> It comes down then to a *prospective* claim that someone *might* want to digitize the classic Dokes publication and that if they did so they would require that the particular set of weird phonetic letters used by Dokes would have to be representable in Unicode plain text in order for that one publication to be made available electronically. (Or a few other publications that might cite Dokes verbatim, of course.)

It seems reasonable to suppose that such might be the case.

> Well, in terms of requirements, I consider that more than a little cart before the horse. I'd be more sympathetic if someone was actually *trying* to do this and had a technical problem with representing the text accurately for an online edition which was best resolved by adding a dozen characters to the Unicode Standard. Then, at least there would be a valid *use* argument to be made, as opposed to a scare claim that 50 years from now someone *might* want to do this and not be able to if we don't encode these characters right now.

Scare claim? You think I'm making a scare claim about the UCS? Our visions of "universal" must differ rather a lot.

> Right *now* anyone could (if they had the rights) put a version of Dokes online using pdf and an embedded font, and it would be perfectly referenceable for anyone wanting access to the content of the document. True, the dozen or so "weird" characters in the orthography wouldn't have standard encodings, so searching inside the document for them wouldn't be optimal.

Come clean, Ken. You suggested offline that it would be OK with you for the Khoisan scholars to use Runic MADR or YR to represent the VOICELESS and VOICED RETROFLEX CLICKs. *That* is not UCS philosophy, and it is not good sense.

> But I don't hear people yelling that the online Unicode Standard is crippled for use by people who wish to refer to it because you can't do an automated search in it which will accurately find all instances of Devanagari ksha in the text.

KA + VIRAMA + SSA. Works every time, if you are using Unicode.

> Finally, if someone actually wants to do a redacted publication of Dokes for its *content*, as opposed to its orthographic antiquarian interest, it is perfectly possible to do so with an updated set of orthographic conventions that would make it more accessible to people used to modern IPA usage.

Many Uralicists prefer IPA today, but the baroque weirdness of UPA usage was encoded in order to allow them to cite original forms. Whether they also transcribe UPA into IPA is a different question.

> Usability of published or republished documents is not limited to slavish facsimile reproduction of their original form -- for that we have facsimiles. :-) I love Shakespeare, but I don't have to read his plays with long ess's and antique typefaces.

Face is irrelevant. And the long ess is encoded for those who need or want to use it.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
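For what it's worth, the "KA + VIRAMA + SSA" search mentioned in this message is easy to sketch in Python. The sample string and the helper `find_all` are invented for illustration; only the three code points come from the Unicode Devanagari block.

```python
import unicodedata

KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"
KSHA = KA + VIRAMA + SSA  # the conjunct is encoded as a three-code-point sequence


def find_all(text: str, needle: str) -> list:
    """Return the start index of every occurrence of `needle` in `text`."""
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start)
        start = text.find(needle, start + 1)
    return positions


# Made-up sample text containing the sequence twice.
sample = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F \u0914\u0930 \u092A\u0915\u094D\u0937"

for ch in KSHA:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
print(find_all(sample, KSHA))
```

This only works when the text really is encoded with standard code points, which is the disagreement in the exchange above: a PDF whose examples use non-standard font encodings gives a plain-text search nothing to match.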
Re: Bantu click letters
There are two reasons we might not encode a particular image as a character. I had said: >Many images are not appropriate for use in plain text, or have too small a user community. That is, you need to have something that is appropriate for use in plain text *and* have a significant user community. As far as I have seen from the email, there is no real evidence for a user community. If a character only occurs in a couple of works, that means there is simply not the utility in encoding it; PUA is the right choice. There is a much larger set of documents containing the Prince icon, but we don't want to encode that either! Mark __ http://www.macchiato.com - Original Message - From: "Michael Everson" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thu, 2004 Jun 10 17:00 Subject: Re: Bantu click letters > At 15:34 -0700 2004-06-10, Mark Davis wrote: > >This argument does not hold water. Simply because some images appear > >in some documents does not mean that they automatically should be > >represented as encoded characters. Many images are not appropriate > >for use in plain text, or have too small a user community. They > >should be represented as private use characters, or as literal > >images. The Prince glyph, on-beyond-zebra characters, the images on > >http://www.aperfectworld.org/animals.htm, etc. are in > >quite a number of documents, but that doesn't mean that any of them > >necessarily qualify as characters for encoding. > > Mark, come on. Doke's phonetic transcription of !Xung is a set of > explicit glyphs representing specific sounds, indeed more precisely > than IPA allows (I don't think IPA specifies a representation for > retroflex clicks). Apart from the question whether or not the > characters are important enough for people to want to be able to > interchange them as encoded UCS characters (which is stipulated as a > question), it's just not on to say that these are the same kinds of > things as Prince's logo or the Seussian extensions. 
> -- > Michael Everson * * Everson Typography * * http://www.evertype.com > >
RE: Bantu click letters
> From: [EMAIL PROTECTED] On Behalf Of Mark Davis
> Sent: Thursday, June 10, 2004 3:35 PM
> The Prince glyph, on-beyond-zebra characters, the images on http://www.aperfectworld.org/animals.htm, etc. are in quite a number of documents, but that doesn't mean that any of them necessarily qualify as characters for encoding.

...because none of them have ever been used as characters? Really, I'm quite surprised at having to mention this distinction.

> From: "D. Starner" <[EMAIL PROTECTED]>
> Sent: Thu, 2004 Jun 10 13:46
>
> John Cowan <[EMAIL PROTECTED]> writes:
>
> > > We must be talking past one another somehow, but I don't understand how.
> > > To represent the text as originally written, I need a digital representation
> > > for each of the characters in it. Since all I want to do is reprint
> > > the book -- I don't need to use the unusual characters in interchange --
> > > the PUA and a commissioned font seem just perfect to me.

I don't think "all I want to do is reprint the book" is a reasonable constraint upon future usage. Reprinting the book brings with it the potential for its special characters to gain currency, even if only in the context of discussing the book.

> > I'm not even sure you can trust a commissioned font to be installable on the operating
> > systems of the next few decades.

Font support has only improved with time. What causes you to foresee a sharp reversal?

/|/|ike
Re: Bantu click letters
> > Simply because some images appear in some > > documents does not mean that they automatically should be > > represented as encoded > > characters. > > These aren't images. They're clearly letters; they occur in running texts and > represent > the sounds of a spoken language. Well, I agree with that assessment. > If I were transcribing them, I wouldn't encode them > as pictures; I would encode them as PUA elements or XML elements (which are usually > more easier to use and more reliable than the PUA). And with that assessment, as well. > I'll admit that it's a bit sketchy encoding these characters based on one article by > one author. But I think it important to remember that more and more text is available > online, even stuff that might never get reprinted in hardcopy, and that needs > Unicode. And in generally, I can't find fault with that, either. But the argument in this particular case hinges on a particular, nonce set of characters. We have this one scholar, who invented a bunch of characters in the 20's to represent click sounds that nobody was doing justice to at that point, either in understanding their phonetics or making sufficiently accurate distinctions in their recording. Bully for Dokes -- it was an important advance in the field of Khoisan studies and the phonetics of clicks. But even though he published his analysis, using his characters, nobody else chose to adopt his character conventions. Subsequent scholars, and the IPA, chose *other* characters to represent the distinctions involved, in part because Dokes' inventions were just weird and hard to use, as well as neither (in my opinion) mnemonic nor aesthetically pleasing. Well, we've encoded ugly letters for ugly orthographies in ugly scripts before. That isn't the issue. But the non-use of these forms is. 
It comes down then to a *prospective* claim that someone *might* want to digitize the classic Dokes publication and that if they did so they would require that the particular set of weird phonetic letters used by Dokes would have to be representable in Unicode plain text in order for that one publication to be made available electronically. (Or a few other publications that might cite Dokes verbatim, of course.) Well, in terms of requirements, I consider that more than a little cart before the horse. I'd be more sympathetic if someone was actually *trying* to do this and had a technical problem with representing the text accurately for an online edition which was best resolved by adding a dozen character to the Unicode Standard. Then, at least there would be a valid *use* argument to be made, as opposed to a scare claim that 50 years from now someone *might* want to do this and not be able to if we don't encode these characters right now. Right *now* anyone could (if they had the rights) put a version of Dokes online using pdf and an embedded font, and it would be perfectly referenceable for anyone wanting access to the content of the document. True, the dozen or so "weird" characters in the orthography wouldn't have standard encodings, so searching inside the document for them wouldn't be optimal. But is the burden that might place on the dozen or so Khoisan orthographic historians and phonetic historians who might actually be interested in doing so out of scale with the burden placed permanently on the standard itself for adding a dozen or so nonce characters for that *one* document? After all those historians and scholars today are basically using the document in its printed-only (out-of-print) hard copy format, and we aren't exactly worried about the difficulties that *that* poses them, now are we? 
I might point out at this point that the Unicode Standard itself is published online using non-standard encodings for many of its textual examples, simply because of the limitations of FrameMaker and PDF and fonts and the specialized requirements of citing lots and lots of characters outside normal text contexts. But I don't hear people yelling about the online Unicode Standard is crippled for use by people who wish to refer to it because you can't do an automated search for in it which will accurately find all instances of Devanagari ksha in the text. And the *database* arguments just don't cut it. If anybody is seriously going to be using Dokes materials in comparative Khoisan studies, they will *normalize* the material in their text databases. After all, this is just one of a large variety of really varied material, in all kinds of orthographies, and in all levels of detail and quality. Arguing that making these particular dozen nonce characters searchable by giving them standard Unicode values just doesn't cut it for me, because if I were going to do that kind of work, a significant amount of philological work would be required to "massage" the data into comparable formats, anyway, and use of intermediate normalized conventions would not be a problem -- in fact, it would almost be mandatory. Finally, if so
Re: Bantu click letters
At 15:34 -0700 2004-06-10, Mark Davis wrote:

> This argument does not hold water. Simply because some images appear in some documents does not mean that they automatically should be represented as encoded characters. Many images are not appropriate for use in plain text, or have too small a user community. They should be represented as private use characters, or as literal images. The Prince glyph, on-beyond-zebra characters, the images on http://www.aperfectworld.org/animals.htm, etc. are in quite a number of documents, but that doesn't mean that any of them necessarily qualify as characters for encoding.

Mark, come on. Doke's phonetic transcription of !Xung is a set of explicit glyphs representing specific sounds, indeed more precisely than IPA allows (I don't think IPA specifies a representation for retroflex clicks). Apart from the question whether or not the characters are important enough for people to want to be able to interchange them as encoded UCS characters (which is stipulated as a question), it's just not on to say that these are the same kinds of things as Prince's logo or the Seussian extensions.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
> But Gutenberg may not care: they mostly (now exclusively?) publish texts > in the public domain. We publish anything previously published we can get permission on, but since we can't afford to pay for anything, we're primarily public domain. In any case, we have decades of the Reports of the Bureau of American ethnology plus many more public domain works of linguistics, so we really don't need to ask for more text. (This is really getting off topic, though.)
Re: Bantu click letters
This argument does not hold water. Simply because some images appear in some documents does not mean that they automatically should be represented as encoded characters. Many images are not appropriate for use in plain text, or have too small a user community. They should be represented as private use characters, or as literal images. The Prince glyph, on-beyond-zebra characters, the images on http://www.aperfectworld.org/animals.htm, etc. are in quite a number of documents, but that doesn't mean that any of them necessarily qualify as characters for encoding.

Mark __ http://www.macchiato.com

- Original Message - From: "D. Starner" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thu, 2004 Jun 10 13:46 Subject: Re: Bantu click letters

> John Cowan <[EMAIL PROTECTED]> writes:
>
> > We must be talking past one another somehow, but I don't understand how.
> > To represent the text as originally written, I need a digital representation
> > for each of the characters in it. Since all I want to do is reprint
> > the book -- I don't need to use the unusual characters in interchange --
> > the PUA and a commissioned font seem just perfect to me.
>
> But that doesn't work if you're reprinting to XML or HTML, where you can't
> rely upon a commissioned font being installed and correctly used. I'm not
> even sure you can trust a commissioned font to be installable on the operating
> systems of the next few decades.
Re: Bantu click letters
At 16:24 -0400 2004-06-10, [EMAIL PROTECTED] wrote:

> Asmus Freytag scripsit:
>
> > That doesn't mean that we stop asking all the hard questions, but that we allow a presumption of usefulness for characters that were in demonstrated use over some time and by several authors.
>
> I quite agree. Here, however, we have (as far as the evidence goes) a single use by a single author.

Many characters have been encoded with just as much.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
> Simply because some images appear in some > documents does not mean that they automatically should be represented as encoded > characters. These aren't images. They're clearly letters; they occur in running texts and represent the sounds of a spoken language. If I were transcribing them, I wouldn't encode them as pictures; I would encode them as PUA elements or XML elements (which are usually easier to use and more reliable than the PUA). I don't think any transcriber would treat them as images (maybe display them as images, but that's purely presentational.) I'll admit that it's a bit sketchy encoding these characters based on one article by one author. But I think it important to remember that more and more text is available online, even stuff that might never get reprinted in hardcopy, and that needs Unicode.
RE: Bantu click letters
In light of Ken's reply it's probably not worthwhile going into the details on all points of your answer. However there are a few points where, like John, I feel you and I are simply talking past each other. Let me pick just one item: At 01:07 PM 6/10/2004, Michael Everson wrote: In any case -- and I think this is the precedent I am looking for -- this is a "script" capital Q in the same way that U+0261 is a script g. It is **not** unified with U+210A SCRIPT SMALL G. It's not a precedent, since the use of the word 'script' has a different meaning in each case. No, it doesn't. Your mathematical "script" has a meaning which is different from the one which applies to the IPA [g] and from the one I had in mind when I named the character. The early namers used the term 'script' rather indiscriminately. For example they applied it to 2118 which they called SCRIPT CAPITAL P, even though, typographically it's a calligraphic lower case p and would have been better called *WEIERSTRASS ELLIPTIC FUNCTION (that is now annotated in the names list). Similarly, the character at 2113 so called SCRIPT SMALL L is now annotated as = mathematical symbol 'ell' * despite its character name, this symbol is derived from a special italicized version of the small letter l since that's what it is. We've in fact had to add a separate MATHEMATICAL SMALL SCRIPT L since. Similarly, the letters 0251 for which the Unicode 1.0 name was LATIN SMALL LETTER SCRIPT A and 0261 are not 'script' forms in the same way as used correctly for e.g. 2130, 2131, etc. in the Letterlike Symbols block. The mathematical alphanumerics are simply additional instances of letterlike symbols. If we can unify the historic symbol for Mark used in Germany with 2133, even though its shape allows less variation than that allowed for mathematical script fonts, we can certainly unify other uses that are letter-like. 
Sometimes I suspect that the fact that you are not conversant with mathematical notation, but very familiar with linguistic notations, makes you treat these two as worlds apart. However, both are specialized technical notations, and both share the feature that if you changed the font on any letter sufficiently far, you would destroy the meaning. In that way, both are different from regular 'language text' where you can transpose the text into different font styles, and preserve the meaning. A./
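One way to see how the standard itself treats the code points named in this message is to look at their names and compatibility decompositions. The following is a rough Python sketch using only the standard `unicodedata` module; the helper names are made up, and the output reflects whatever Unicode version ships with the Python in use, not the 2004 state of the standard.

```python
import unicodedata

# Code points cited in the message above.
CODE_POINTS = [0x0251, 0x0261, 0x210A, 0x2113, 0x2118, 0x2133]


def describe(cp: int) -> tuple:
    """Return (U+XXXX label, character name, decomposition mapping or '(none)')."""
    ch = chr(cp)
    decomposition = unicodedata.decomposition(ch) or "(none)"
    return (f"U+{cp:04X}", unicodedata.name(ch), decomposition)


def nfkc_folds_to_basic_letter(cp: int) -> bool:
    """True if NFKC replaces the character with a plain ASCII letter."""
    folded = unicodedata.normalize("NFKC", chr(cp))
    return folded != chr(cp) and folded.isascii()


for cp in CODE_POINTS:
    label, name, decomposition = describe(cp)
    folds = nfkc_folds_to_basic_letter(cp)
    print(f"{label}  {name:<32} {decomposition:<12} folds={folds}")
```

The letterlike symbols that carry a `<font>` decomposition fold to ordinary letters under NFKC, while the IPA letters do not. That is data about the encoding, not a ruling on which side of the argument is right.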
Re: Bantu click letters
Michael Everson scripsit:

> Unless one contacted whomever it is who owns "Bantu Studies" and
> simply *asked*.

Carfax (part of the Taylor and Francis Group). Here's contact information:

Reprints, permissions + electronic rights
Joanne Nerland
Taylor & Francis
PO Box 2562 Solli
N-0202 Oslo
Norway
+ 47 22 12 9880 or: +47 22 12 9884
Mobile: +47 90 11 3974
+47 22 12 9890

But Gutenberg may not care: they mostly (now exclusively?) publish texts in the public domain.

--
John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED]
Please leave your values at the front desk. --sign in Paris hotel
Check your assumptions. In fact, check your assumptions at the door. --Cordelia Vorkosigan
Re: Bantu click letters
At 13:35 -0800 2004-06-10, D. Starner wrote:

> >> Due to the latest US copyright extensions, it will take us a couple
> >> decades, but we'll want to transcribe this article.
> > In 2050. I wouldn't worry about it.
>
> It's 95 years from publication, so it's 2022. In any case, it's entirely likely that some commercial organization will license these and start digitally transcribing old linguistics documents for sale to libraries. And I hardly see how the issues will change in the next 18 years.

Unless one contacted whoever it is who owns "Bantu Studies" and simply *asked*.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
> From: Asmus Freytag [mailto:[EMAIL PROTECTED] > However, sometimes we have single citations where > we don't believe (for other reasons) that they are the only existing > ones, just the only ones found so far. True; I did mention that possibility at some point. > Then there is the issue brought up by D. Starner: is a work sufficiently > interesting that digital archivers like Project Gutenberg would be interested > in it. Yes, that would be a consideration. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Bantu click letters
D. Starner scripsit:

> There's at least a small user community; those people who are actively
> transcribing old works, like Project Gutenberg. Due to the latest US
> copyright extensions, it will take us a couple decades, but we'll want
> to transcribe this article.

In 2050. I wouldn't worry about it.

--
John Cowan [EMAIL PROTECTED] http://www.reutershealth.com http://www.ccil.org/~cowan
Do what you will, this Life's a Fiction
And is made up of Contradiction. --William Blake
RE: Bantu click letters
At 12:08 PM 6/10/2004, Michael Everson wrote: At 11:53 -0700 2004-06-10, Asmus Freytag wrote: It was understood that the mathematical symbols were not to be used in language text. What was understood is that if you need a run of text in a script font you wouldn't use these characters, but would use markup. But if you needed an isolated, out of context shape, where the font style has semantic meaning, you would use these characters. That's precisely the case here. Not so. That's a statement, not an argument. Nor does it address my contention that the phonetic extensions (all of them) that are styled Latin characters are in fact equivalent to mathematical usage in that in both cases you have a letter form that carries specific semantics based on what otherwise would be font style. There's no need to have yet another clone. I disagree. Leave the math characters, please, to the math fonts. For instance, the flowery style we use now for the math block is way too italic for harmonization with the use of the character in a phonetic context. This is a glyphic argument that doesn't hold water. The font you use is well within the range of 'script' fonts that can be used for mathematical use. In fact our font is not even the best script font for that purpose. There is nothing magically different about mathematical usage. Mathematicians will be happy to use any of the existing phonetic letters if and when the fancy strikes them. Now that Unicode is widespread I wouldn't be surprised if there were mathematicians already spelunking... I am also not very happy opening the door to splitting Latin characters off into Plane 1. That's an argument of convenience. The BMP will be full at some point in the very near future, and then there will be no choice. Opening the door for a historic extension makes more sense than for a commonly used modern orthography. I will be perfectly happy to rename the character LATIN LETTER VOICED PALATOALVEOLAR CLICK. 
It doesn't have an upper case property anyway. That's just hiding the issue. In any case -- and I think this is the precedent I am looking for -- this is a "script" capital Q in the same way that U+0261 is a script g. It is **not** unified with U+210A SCRIPT SMALL G. It's not a precedent, since the use of the word 'script' has different meaning in both cases. The early namers didn't have your benefit and applied these labels haphazardly. Look no further than 2118 !! In at least one case I suspect that a character named 'script' was actually intended for an *italic* shape. A./
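Since the BMP versus Plane 1 question comes up here, a small Python sketch shows what "splitting Latin characters off into Plane 1" means at the code-point level. The helper names are invented for illustration; the sample code points are the script g and SCRIPT SMALL G discussed above plus the mathematical script capital Q at U+1D4AC.

```python
import unicodedata


def plane_of(cp: int) -> int:
    """Plane number of a code point: 0 is the BMP, 1 is the first supplementary plane."""
    return cp >> 16


def utf16_code_units(cp: int) -> int:
    """Number of 16-bit code units needed to store the code point in UTF-16."""
    return len(chr(cp).encode("utf-16-le")) // 2


SAMPLES = [0x0261, 0x210A, 0x1D4AC]

for cp in SAMPLES:
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X}  plane {plane_of(cp)}  {utf16_code_units(cp)} UTF-16 unit(s)  {name}")
```

A character in Plane 1 costs a surrogate pair in UTF-16, which was the practical inconvenience behind reluctance to put letters there; it has no bearing on whether the character can be encoded at all.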
RE: Bantu click letters
At 01:04 PM 6/10/2004, Peter Constable wrote: > That doesn't mean that we stop asking all the hard questions, but that we > allow a presumption of usefulness for characters that were in demonstrated > use over some time and by several authors. But it is precisely that status that is called into question here. Unless your definition of "several" is '>=1'. I realize that. However, sometimes we have single citations where we don't believe (for other reasons) that they are the only existing ones, just the only ones found so far. Then there is the issue brought up by D. Starner: is a work sufficiently interesting that digital archivers like Project Gutenberg would be interested in it. I don't have an opinion on the merits of this particular set of characters, but I suspect there are many Han characters that equally represent nonce usage... A./
Re: Bantu click letters
John Cowan <[EMAIL PROTECTED]> writes: > We must be talking past one another somehow, but I don't understand how. > To represent the text as originally written, I need a digital representation > for each of the characters in it. Since all I want to do is reprint > the book -- I don't need to use the unusual characters in interchange -- > the PUA and a commissioned font seem just perfect to me. But that doesn't work if you're reprinting to XML or HTML, where you can't rely upon a commissioned font being installed and correctly used. I'm not even sure you can trust a commissioned font to be installable on the operating systems of the next few decades.
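The interchange problem with the PUA route can be made concrete with a short Python sketch. Everything specific here is hypothetical: the two PUA assignments and the `PRIVATE_AGREEMENT` table are invented for the example and do not come from any real font or proposal.

```python
import unicodedata

# Hypothetical private agreement: these assignments are made up for illustration.
PRIVATE_AGREEMENT = {
    "\uE000": "voiceless retroflex click (Doke)",
    "\uE001": "voiced retroflex click (Doke)",
}


def private_use_chars(text: str) -> list:
    """Return the distinct private-use characters in `text` (general category Co)."""
    return sorted({ch for ch in text if unicodedata.category(ch) == "Co"})


def unexplained_private_use(text: str, agreement: dict) -> list:
    """Private-use characters in `text` that the given agreement does not define."""
    return [ch for ch in private_use_chars(text) if ch not in agreement]


sample = "ab\uE000cd\uE001\uE123"
print([f"U+{ord(ch):04X}" for ch in private_use_chars(sample)])
print([f"U+{ord(ch):04X}" for ch in unexplained_private_use(sample, PRIVATE_AGREEMENT)])
```

A receiver can detect that a code point is private use, but its meaning lives only in the side agreement and the commissioned font. That is fine for a one-off reprint and fragile for HTML or XML that travels without them.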
Re: Bantu click letters
At 16:21 -0400 2004-06-10, [EMAIL PROTECTED] wrote:

> > You don't KNOW that. You assert that. This is the "adversarial" style I was objecting to, John. Could you please take this on board?
>
> Fair enough, Michael. But the burden of going forward with the evidence is still yours. (I'll do what I can.)

I have shown (1) that they exist, (2) that they have specific usage. I have not shown them in a second document, though I have shown that Pullum & Ladusaw have quoted one word in Doke's orthography, spelling it with his peculiar use of diacritics. I would be happy to find a second use of the letters, but I consider the usefulness of being able to cite Doke in the original to be perfectly legitimate. Let's see what turns up in NYPL and LOC.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
> > Due to the latest US > > copyright extensions, it will take us a couple decades, but we'll want > > to transcribe this article. > > In 2050. I wouldn't worry about it. It's 95 years from publication, so it's 2022. In any case, it's entirely likely that some commercial organization will license these and start digitally transcribing old linguistics documents for sale to libraries. And I hardly see how the issues will change in the next 18 years.
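The arithmetic behind "2022" and "18 years" can be checked with a few lines of Python. Two things here are assumptions rather than facts stated in this message: the publication year 1926 (the year given elsewhere in the thread for Doke's book) and the convention that a 95-year term runs to the end of the calendar year, so the work becomes free on the following 1 January.

```python
TERM_YEARS = 95  # the term cited in the message above


def first_public_domain_year(publication_year: int, term_years: int = TERM_YEARS) -> int:
    """First calendar year in which the work is free, assuming the term
    runs through the end of the year `publication_year + term_years`."""
    return publication_year + term_years + 1


publication_year = 1926  # assumption: not stated in this message
thread_year = 2004       # year of this discussion

free_year = first_public_domain_year(publication_year)
print(free_year, free_year - thread_year)
```

Under those assumptions the result agrees with the figures in the message; a different publication year would shift both numbers accordingly.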
RE: Bantu click letters
At 12:38 -0800 2004-06-10, D. Starner wrote:

> "Peter Constable" <[EMAIL PROTECTED]> writes:
>
> > If the small n with left loop is not accepted, it will be because it was a
> > proposal that never gained currency and has no user community.
>
> There's at least a small user community; those people who are actively transcribing old works, like Project Gutenberg. Due to the latest US copyright extensions, it will take us a couple decades, but we'll want to transcribe this article.

Hence the Universal Character Set and the effort I go to write up proposals for this kind of thing.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
At 16:21 -0400 2004-06-10, [EMAIL PROTECTED] wrote:

> > HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED LUNATE SIGMA that's under ballot?
>
> Yes, except these letters are Latin letters (indeed, letters used to write the Latin language). You if anyone should be against unifying them with Greek letters, particularly since they were applied for purposes very different from those of sigma or heta.

Then I am not sure what you are talking about (I don't know the Latin versions of these), but please take this up with me in July. My brain is full.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
Asmus Freytag scripsit:

> That doesn't mean that we stop asking all the hard questions, but that we
> allow a presumption of usefulness for characters that were in demonstrated
> use over some time and by several authors.

I quite agree. Here, however, we have (as far as the evidence goes) a single use by a single author.

--
John Cowan <[EMAIL PROTECTED]> http://www.ccil.org/~cowan
In politics, obedience and support are the same thing. --Hannah Arendt
RE: Bantu click letters
"Peter Constable" <[EMAIL PROTECTED]> writes: > If > the small n with left loop is not accepted, it will be because it was a > proposal that never gained currency and has no user community. There's at least a small user community; those people who are actively transcribing old works, like Project Gutenberg. Due to the latest US copyright extensions, it will take us a couple decades, but we'll want to transcribe this article. -- ___ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm
Re: Bantu click letters
Michael Everson scripsit: > You don't KNOW that. You assert that. This is the "adversarial" style > I was objecting to, John. Could you please take this on board? Fair enough, Michael. But the burden of going forward with the evidence is still yours. (I'll do what I can.) > But it is QUITE another thing for you to come out > and say that there are no other documents which make use of the same > characters. Quite so, and I retract all remarks implying that. > >In their day, there were probably a lot more documents using LATIN > >CAPITAL LETTER ANTISIGMA and LATIN CAPITAL LETTER H LEFT HALF than > >one, yet they are not encoded either. > > HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED > LUNATE SIGMA that's under ballot? Yes, except these letters are Latin letters (indeed, letters used to write the Latin language). You if anyone should be against unifying them with Greek letters, particularly since they were applied for purposes very different from those of sigma or heta. -- Newbies always ask: John Cowan "Elements or attributes? http://www.ccil.org/~cowan Which will serve me best?" http://www.reutershealth.com Those who know roar like lions; [EMAIL PROTECTED] Wise hackers smile like tigers. --a tonka, or extended haiku
RE: Some thoughts on encoding specialized notations: was RE: Bantu click letters
> From: Asmus Freytag [mailto:[EMAIL PROTECTED] > Any notation for a highly specialized subject would always tend to suffer > from a very small number of participants. This is not a-priori a reason to > force this notation into private use. Just to clarify: I have not at any point contended that the characters in Michael's proposal must be considered PUA. I simply commented that I had expected something with such little usage would be contested, which by implication raised the question as to whether these characters should be encoded in spite of their very limited usage. In relation to that question, your suggestion > One of our goals in this direction > would be to enable publishers to support online editions of a large number > of fields without running into a hodge-podge of supported vs. non-supported > characters. seems to me to be worth consideration. > For historical notations, issues are different. If a modern notations has > completely replaced the historical notation, it should be treated the in > the same manner as archaic scripts, that is, the focus should be on what's > needed or useful to support historians of the discipline. If a notation was > widespread before being supplanted, that would strengthen the case for > supporting it, as the likelihood that symbols will be referenced in modern > contexts is that much greater. In this particular case, the notation was clearly not in widespread use. The question then is whether it would be useful to linguists or documenters of the history of linguistics. So far after 80 years, there is no known indication that linguists have a use for these; Pullum and Ladusaw were, in part, the latter, and did not find these in need of documentation. Of course, that does not imply that other documenters have no need, and there may be linguists for whom these would be useful that are simply not known to us. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Asmus Freytag > As a matter of basic parity, I just don't > see why we take such great pains to standardize extremely rare forms of Han > ideographs, but baulk at supporting our own writing system and its > extensions equally faithfully. Point taken. > That doesn't mean that we stop asking all the hard questions, but that we > allow a presumption of usefulness for characters that were in demonstrated > use over some time and by several authors. But it is precisely that status that is called into question here. Unless your definition of "several" is '>=1'. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Bantu click letters
At 12:50 -0700 2004-06-10, Asmus Freytag wrote: That's a statement, not an argument. Nor does it address my contention that the phonetic extensions (all of them) that are styled Latin characters are in fact equivalent to mathematical usage in that in both cases you have a letter form that carries specific semantics based on what otherwise would be font style. "Style" per se is applied to Mathematical characters, regularly, and meaningfully. Just because "script" is part of the name of U+0261 does not mean that "style" as in HTML markup is what makes it look like that; that's just not the case, and you can't read that much into the name. It is for the same reason that I chose the name "script" when ***I*** named the voiced palatoalveolar click. I recognized its shape as similar to some forms of script Q. There are other forms of script Q. I did not consider it to be "styled" in the same way. By the way, I *made* the glyph out of U+0541 ARMENIAN LETTER JA, and it looks a lot more like that than a script Q, so PLEEEAASE let's not jump overboard on a crusade to unify this character with a mathematical character, OK? To do so would be really very silly. I disagree. Leave the math characters, please, to the math fonts. For instance, the flowery style we use now for the math block is way too italic for harmonization with the use of the character in a phonetic context. This is a glyphic argument that doesn't hold water. The font you use is well within the range of 'script' fonts that can be used for mathematical use. In fact our font is not even the best script font for that purpose. I'm aware of that; I still do not think we should start encouraging linguists to go off into the mathematical characters and press them into service for phonetics. Letters are letters. There is nothing magically different about mathematical usage. Mathematicians will be happy to use any of the existing phonetic letters if and when the fancy strikes them. 
Now that Unicode is widespread I wouldn't be surprised if there were mathematicians already spelunking... Mathematicians can do what they like. That's an argument of convenience. The BMP will be full at some point in the very near future, and then there will be no choice. Opening the door for a historic extension makes more sense than for a commonly used modern orthography. There is no value to unifying this with the maths character just because *I* named it that way for reasons which you misconstrue. I will be perfectly happy to rename the character LATIN LETTER VOICED PALATOALVEOLAR CLICK. It doesn't have an upper case property anyway. That's just hiding the issue. No, it's not. There is nothing particularly Q-like about the character in question; it's more JA-like anyway. It was a superficial identification I made; had I simply named it VOICED PALATOALVEOLAR CLICK, we would probably not be having this conversation. In any case -- and I think this is the precedent I am looking for -- this is a "script" capital Q in the same way that U+0261 is a script g. It is **not** unified with U+210A SCRIPT SMALL G. It's not a precedent, since the use of the word 'script' has different meaning in both cases. No, it doesn't. Your mathematical "script" has a meaning which is different from the one which applies to the IPA [g] and from the one I had in mind when I named the character. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 03:47 AM 6/10/2004, Michael Everson wrote: At 00:11 -0400 2004-06-10, Ernest Cline wrote: > [Original Message] From: Michael Everson <[EMAIL PROTECTED]> Practice your tongue-twisting. Proposal to add Bantu phonetic click characters to the UCS http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q work for the script capital Q? At the very least I feel that should be explained. It was understood that the mathematical symbols were not to be used in language text. Well, this isn't 'language text' as it would be for a modern orthography, but specialized notation. I see no reason to rule out a unification in this case. The general category of this character is 'Lu', letter upper. It does *not* have a case mapping to /script small q/ but that would be correct as there's no case pair for the proposed character. If this were the case of a modern orthography, where case pairs would be needed, and where development of font and usage could take place over time in unanticipated directions, then a disunification would make a lot more sense and I would support it. But for this very limited and technical notation, it appears unwarranted. A./
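The property claims in the message above (general category 'Lu', no case mapping) can be checked against the Unicode Character Database. What follows is only a minimal sketch, assuming Python's standard unicodedata module; its data reflects whatever Unicode version the interpreter ships with, not the 2004 state of the standard under discussion.

```python
import unicodedata

q = "\U0001D4AC"  # U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q

name = unicodedata.name(q)
category = unicodedata.category(q)

# str.lower() applies the UCD case mappings; a character that has no
# lowercase mapping comes back unchanged.
has_lowercase_mapping = q.lower() != q

print(name, category, has_lowercase_mapping)
```

The variable names are illustrative only. The check says nothing about whether unification is a good idea; it just confirms the two properties the argument relies on.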
Re: Bantu click letters
At 07:00 AM 6/10/2004, John Cowan wrote: (LATIN LETTER OWL, indeed.) This is an interesting symbol as a fairly similar symbol is used in Japan to annotate phone numbers - if I correctly understand those that have a taped message or automated response system. We don't have a symbol for the latter in Unicode, but a quick look at modern Japanese material finds instances of this quickly. The problem in adding a letter owl by itself is that it invites incorrect, shape based mappings from East Asian sets or fonts. We would be better off if we could pair the letter owl with a simultaneous but separate addition of a technical symbol. A./
RE: Some thoughts on encoding specialized notations: was RE: Bantu click letters
At 12:08 PM 6/10/2004, Peter Constable wrote: > From: Asmus Freytag [mailto:[EMAIL PROTECTED] > Any notation for a highly specialized subject would always tend to suffer > from a very small number of participants. This is not a-priori a reason to > force this notation into private use. Just to clarify: I have not at any point contended that the characters in Michael's proposal must be considered PUA. I simply commented that I had expected something with such little usage would be contested, which by implication raised the question as to whether these characters should be encoded in spite of their very limited usage. I think that was someone else.. In relation to that question, your suggestion > One of our goals in this direction > would be to enable publishers to support online editions of a large number > of fields without running into a hodge-podge of supported vs. non-supported > characters. seems to me to be worth consideration. I then wrote in the original thread: To represent the text as originally written, I need a digital representation for each of the characters in it. Since all I want to do is reprint the book -- I don't need to use the unusual characters in interchange -- the PUA and a commissioned font seem just perfect to me. In the modern world many forms of publication require interchange. For example, anything that's HTML based does poorly with non-standardized characters. So does storage in databases. If you can conceive of a digital re-edition of a prominent work (including citation from) and can assume that there's some realistic chance that technologies other than faximile or PDF would be brought to bear, then you have the interchange requirement, even if noone uses the notation for new text. Over time, I'm becoming more supportive of Michael's stance of inclusiveness in that direction. 
As a matter of basic parity, I just don't see why we take such great pains to standardize extremely rare forms of Han ideographs, but baulk at supporting our own writing system and its extensions equally faithfully. but this would belong better as part of this more generic discussion. > For historical notations, issues are different. If a modern notations has > completely replaced the historical notation, it should be treated the in > the same manner as archaic scripts, that is, the focus should be on what's > needed or useful to support historians of the discipline. If a notation was > widespread before being supplanted, that would strengthen the case for > supporting it, as the likelihood that symbols will be referenced in modern > contexts is that much greater. In this particular case, the notation was clearly not in widespread use. The question then is whether it would be useful to linguists or documenters of the history of linguistics. So far after 80 years, there is no known indication that linguists have a use for these; Pullum and Ladusaw were, in part, the latter, and did not find these in need of documentation. Of course, that does not imply that other documenters have no need, and there may be linguists for whom these would be useful that are simply not known to us. These are good questions. But remember, the notation in question is also limited in another way: it applies to features not shared by many languages. I'm not an expert enough to know whether that adds another level of rarity, because it means the potential number of users of these characters was always limited, then and now. But let's consider an extreme, but for now hypothetical example. Assume a seminal work, for example comparable to Newton's works, that spawns an entire field or discipline. 
If such a work used notation that was quickly replaced by something else, it would still be useful to consider it for its historic aspect, even if only one author used it - as the presumption that such a work and its notation will be cited or explained by historians is clearly quite strong. A./
Re: Bantu click letters
At 12:15 -0700 2004-06-10, Asmus Freytag wrote: Over time, I'm becoming more supportive of Michael's stance of inclusiveness in that direction. As a matter of basic parity, I just don't see why we take such great pains to standardize extremely rare forms of Han ideographs, but baulk at supporting our own writing system and its extensions equally faithfully. Thank you. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
At 07:46 AM 6/10/2004, John Cowan wrote: To represent the text as originally written, I need a digital representation for each of the characters in it. Since all I want to do is reprint the book -- I don't need to use the unusual characters in interchange -- the PUA and a commissioned font seem just perfect to me. In the modern world many forms of publication require interchange. For example, anything that's HTML based does poorly with non-standardized characters. So does storage in databases. If you can conceive of a digital re-edition of a prominent work (including citation from) and can assume that there's some realistic chance that technologies other than facsimile or PDF would be brought to bear, then you have the interchange requirement, even if no one uses the notation for new text. Over time, I'm becoming more supportive of Michael's stance of inclusiveness in that direction. As a matter of basic parity, I just don't see why we take such great pains to standardize extremely rare forms of Han ideographs, but baulk at supporting our own writing system and its extensions equally faithfully. That doesn't mean that we stop asking all the hard questions, but that we allow a presumption of usefulness for characters that were in demonstrated use over some time and by several authors. A./
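The interchange point in the message above can be made concrete. A private-use code point carries no standardized identity, so a database or HTML consumer has nothing to go on without a private agreement and a matching font. A rough sketch, assuming Python's unicodedata module; the helper name describe is hypothetical and the two code points are chosen only for illustration:

```python
import unicodedata

def describe(ch):
    """Return (code point label, general category, name or None) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, None))

pua = describe("\uE000")      # first code point of the BMP Private Use Area
encoded = describe("\u0297")  # an already encoded click letter

print(pua)
print(encoded)
```

The private-use character reports category Co and no name, while the encoded click letter has a name that any receiving system can look up.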
Re: Bantu click letters
At 13:50 -0400 2004-06-10, [EMAIL PROTECTED] wrote: Michael Everson scripsit: You have a weird view of the history of phonetics, John. You haven't addressed the substantive issue: these are Latin characters used to represent sounds which in 1925 could not easily be represented. And never have been represented thus since. You don't KNOW that. You assert that. This is the "adversarial" style I was objecting to, John. Could you please take this on board? It is one thing for me to make a proposal with evidence from one document and have it questioned. (I have on many other occasions proposed archaic phonetic characters with as much evidence and had them accepted, which is one reason I think the grilling is a bit gratuitious here.) But it is QUITE another thing for you to come out and say that there are no other documents which make use of the same characters. In their day, there were probably a lot more documents using LATIN CAPITAL LETTER ANTISIGMA and LATIN CAPITAL LETTER H LEFT HALF than one, yet they are not encoded either. HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED LUNATE SIGMA that's under ballot? > Indeed, there are click letters like the STRETCHED C which did get into IPA and were later deprecated. So you can represent the STRETCHED C in chu: as Doke writes it (as do Pullum and Ladusaw, using Doke's diacritics as well) but you can't represent Doke's other letters? This doesn't make sense. It makes sense because others used STRETCHED C (and indeed it was part of the standard for a while), but no one has used OWL before or since. Prove it. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 11:32 -0700 2004-06-10, Peter Constable wrote: We're talking about the same group of languages. I believe they use similar orthographies. > Also: What about upper case forms? The uppercase of !xhosa is !Xhosa. Uppercase versions of phonetic symbols are a concern only if the phonetic symbols gain currency, which is not the case here. True. Though it would be fun to draw some of them. I believe, by the way, that chû: is now written !Xung though whether or not the speakers are literate and make use of a practical orthography A couple of errors were corrected in the version which is on my web site (including the title); that document has been sent to UTC and WG2. In the first sentence it suggests that chû: is "Kxoe", but on further research this seems to be a different language. I think chû: is what the Ethnologue calls Kung-Ekoka (in Namibia). -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 11:53 -0700 2004-06-10, Asmus Freytag wrote: It was understood that the mathematical symbols were not to be used in language text. What was understood is that if you need a run of text in a script font you wouldn't use these characters, but would use markup. But if you needed an isolated, out of context shape, where the font style has semantic meaning, you would use these characters. That's precisely the case here. Not so. There's no need to have yet another clone. I disagree. Leave the math characters, please, to the math fonts. For instance, the flowery style we use now for the math block is way too italic for harmonization with the use of the character in a phonetic context. I am also not very happy opening the door to splitting Latin characters off into Plane 1. I will be perfectly happy to rename the character LATIN LETTER VOICED PALATOALVEOLAR CLICK. It doesn't have an upper case property anyway. In any case -- and I think this is the precedent I am looking for -- this is a "script" capital Q in the same way that U+0261 is a script g. It is **not** unified with U+210A SCRIPT SMALL G. -- Michael Everson * * Everson Typography * * http://www.evertype.com
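The U+0261 / U+210A precedent cited above has a visible footprint in the character data: the letterlike and mathematical "script" characters carry a compatibility (font) decomposition, while the IPA letter does not. The sketch below assumes Python's unicodedata module; the helper name compat is hypothetical, and the observation is about current data, not a point made by either correspondent.

```python
import unicodedata

def compat(ch):
    """Return the raw decomposition field and the NFKC form of one character."""
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKC", ch)

ipa_g = compat("\u0261")       # LATIN SMALL LETTER SCRIPT G (IPA)
math_g = compat("\u210A")      # SCRIPT SMALL G (letterlike symbol)
math_q = compat("\U0001D4AC")  # MATHEMATICAL SCRIPT CAPITAL Q

print(ipa_g, math_g, math_q)
```

One practical consequence: text passed through NFKC keeps the IPA letter intact but folds the mathematical characters to plain g and Q, so a click letter unified with U+1D4AC would be at risk of turning into an ordinary Q in any pipeline that applies compatibility normalization.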
RE: Bantu click letters
It was understood that the mathematical symbols were not to be used in language text. What was understood is that if you need a run of text in a script font you wouldn't use these characters, but would use markup. But if you needed an isolated, out of context shape, where the font style has semantic meaning, you would use these characters. That's precisely the case here. There's no need to have yet another clone. A./
Some thoughts on encoding specialized notations: was RE: Bantu click letters
Any notation for a highly specialized subject would always tend to suffer from a very small number of participants. This is not a-priori a reason to force this notation into private use. One of our goals in this direction would be to enable publishers to support online editions of a large number of fields without running into a hodge-podge of supported vs. non-supported characters. This issue is squarely faced by mathematicians all the time (in fact, mathematicians and linguists are very similar in their voraciousness of pressing unrelated or novel symbols into use in extending their notations to new sub-fields). If a notational extension is very new, and not widely adopted, it makes sense holding off on permanently adding characters to support it -- until it is more widely established. For historical notations, issues are different. If a modern notation has completely replaced the historical notation, it should be treated in the same manner as archaic scripts, that is, the focus should be on what's needed or useful to support historians of the discipline. If a notation was widespread before being supplanted, that would strengthen the case for supporting it, as the likelihood that symbols will be referenced in modern contexts is that much greater. If occasional use or reference to the historic notation can be documented, then it would be more appropriate to treat it like a rare script, or like historic additions to modern scripts, which see occasional use. If there's known ongoing use, or documented recent citations of older notation, then it's really a case of modern use of a specialized notation and it should be treated like that. A./
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Anto'nio Martins-Tuva'lkin > Something else: What is the usual spelling for these phonemes in > today's orthography? Clicks in Xhosa and Zulu are spelt nowadays with > usual Latin letters (c, q, x etc.). We're talking about the same group of languages. I believe they use similar orthographies. > Also: What about upper case forms? The uppercase of !xhosa is !Xhosa. Uppercase versions of phonetic symbols are a concern only if the phonetic symbols gain currency, which is not the case here. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Category of "Mathematic Alphanumeric Symbols" (was: "Re: Bantu click letters")
On 2004.06.10, 11:47, Michael Everson <[EMAIL PROTECTED]> answered: >> Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q work for the >> script capital Q? At the very least I feel that should be >> explained. > > It was understood that the mathematical symbols were not to be used > in language text. I thought the very same, but U+1D4AC's category is simply "Lu [Letter, Uppercase]". In fact all math symbols of the 1D400-1D7FF block are simply Lu or Ll (or Nd) and "can" be used as letters; I had somehow assumed otherwise. --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
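The observation above about the 1D400-1D7FF block is easy to tally. This is a sketch only, assuming Python's unicodedata module and whatever Unicode version it bundles; the variable name tally is illustrative.

```python
import unicodedata
from collections import Counter

# Mathematical Alphanumeric Symbols block: U+1D400..U+1D7FF
tally = Counter(unicodedata.category(chr(cp)) for cp in range(0x1D400, 0x1D800))

for category, count in sorted(tally.items()):
    print(category, count)
```

The counts depend on the Unicode version in use, and the listing may show categories beyond Lu, Ll and Nd: reserved holes in the block (where a letter was already encoded elsewhere, as with the Planck constant) report Cn, for instance.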
Re: Bantu click letters
Michael Everson scripsit: > You have a weird view of the history of phonetics, John. You haven't > addressed the substantive issue: these are Latin characters used to > represent sounds which in 1925 could not easily be represented. And never have been represented thus since. In their day, there were probably a lot more documents using LATIN CAPITAL LETTER ANTISIGMA and LATIN CAPITAL LETTER H LEFT HALF than one, yet they are not encoded either. (Though LATIN CAPITAL LETTER TURNED F is.) > Indeed, there are click letters like the STRETCHED C > which did get into IPA and were later deprecated. So you can > represent the STRETCHED C in chu: as Doke writes it (as do Pullum and > Ladusaw, using Doke's diacritics as well) but you can't represent > Doke's other letters? This doesn't make sense. It makes sense because others used STRETCHED C (and indeed it was part of the standard for a while), but no one has used OWL before or since. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] Be yourself. Especially do not feign a working knowledge of RDF where no such knowledge exists. Neither be cynical about RELAX NG; for in the face of all aridity and disenchantment in the world of markup, James Clark is as perennial as the grass. --DeXiderata, Sean McGrath
Re: Bantu click letters
At 17:11 +0100 2004-06-10, Anto'nio Martins-Tuva'lkin wrote: What about U+0251 U+0361 U+0302 U+028A ? After a "double" diacritical, any further combining character could take as its base the "pair" of spacing characters "under" the said double diacritical, shouldn't it? I tried that in TextEdit, which is pretty smart, and the second diacritic didn't centre over the pair, but rather over the 0251. But I guess that's the only choice, and it would be a question of making a precomposed glyph. Note that, U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see attached) is not productively different from U+0251 U+0302 U+0361 U+028A (see attached)... OS X does it correctly. (Though I didn't see your gif.) -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Michael Everson > I have an offprint of Doke's article in Bantu Studies. We have noted > that 70 years later Pullum and Ladusaw cite a word (the word > stretchedc-h-utildecaronbelow-triangularcolon chu:) in Doke's > orthography. Isn't that an indication that the work and its > characters have not been lost to history? It is, but it's that the stretched C that's been called into question. There is no question that that character gained currency -- it was adopted for a time by the IPA; so also did the qp ligature and db ligature gain currency -- and those have been accepted for encoding. If the small n with left loop is not accepted, it will be because it was a proposal that never gained currency and has no user community. > It's a little peculiar to suggest that data has to be printed in two > books in order to be considered "interchangeable". Books don't > interchange data between themselves. Users do. ;-) Books are only indicators of the users; a lack of attestation in books by anyone besides Doke is suggestive of a lack of a user community. P&L clearly indicated that these characters were excluded from their compilation because they never gained currency, and that strongly suggests a lack of user community. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
Re: Bantu click letters
On 2004.06.10, 03:28, Michael Everson <[EMAIL PROTECTED]> wrote: > Proposal to add Bantu phonetic click characters to the UCS > http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf On page 10, Michael asks: > UTC advice as to the correct encoding of these sequences would be > welcome. What about U+0251 U+0361 U+0302 U+028A ? After a "double" diacritical, any further combining character could take as its base the "pair" of spacing characters "under" the said double diacritical, shouldn't it? Note that, U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see attached) is not productively different from U+0251 U+0302 U+0361 U+028A (see attached)... --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
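The remark above that the two orderings are "not productively different" can be tested against canonical ordering. The two marks have different non-zero combining classes, so normalization sorts them into one order regardless of how they were typed. A minimal sketch, assuming Python's unicodedata module; variable names are illustrative:

```python
import unicodedata

seq_a = "\u0251\u0361\u0302\u028A"  # alpha, double inverted breve, circumflex, upsilon
seq_b = "\u0251\u0302\u0361\u028A"  # alpha, circumflex, double inverted breve, upsilon

# Canonical combining class of each mark; marks are sorted by this value.
classes = {f"U+{ord(c):04X}": unicodedata.combining(c) for c in "\u0361\u0302"}

# Two strings are canonically equivalent when their NFD forms are identical.
equivalent = unicodedata.normalize("NFD", seq_a) == unicodedata.normalize("NFD", seq_b)

print(classes, equivalent)
```

If the sequences are canonically equivalent, mark order alone cannot signal "circumflex centred over the pair" as opposed to "circumflex over the first letter". Under that reading, the placement would have to come from the font or rendering engine, which fits the later suggestion in the thread of a precomposed glyph.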
Re: Bantu click letters
On 2004.06.10, 15:14, John Wilcock <[EMAIL PROTECTED]> wrote: > it seems to me that this information could be important for the > proposal Something else: What is the usual spelling for these phonemes in today's orthography? Clicks in Xhosa and Zulu are spelt nowadays with usual Latin letters (c, q, x etc.). Also: What about upper case forms? (And BTW: thanks, Michael, for one more!) --. António MARTINS-Tuválkin | ()| <[EMAIL PROTECTED]>|| PT-1XXX-XXX LISBOA Não me invejo de quem tem| +351 934 821 700 carros, parelhas e montes| http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe| http://pagina.de/bandeiras/ a água em todas as fontes|
Re: Bantu click letters
Peter Constable scripsit: > Would you consider these too idiosyncratic? No. The "idio-" in "idiosyncratic" has to do with an individual. I forgot to point this out earlier, but !Xu phonology isn't idiosyncratic either -- it's just unusual. To the !Xu it's the normal thing. -- Is a chair finely made tragic or comic? Is the John Cowan portrait of Mona Lisa good if I desire to see [EMAIL PROTECTED] it? Is the bust of Sir Philip Crampton lyrical, www.ccil.org/~cowan epical or dramatic? If a man hacking in fury www.reutershealth.com at a block of wood make there an image of a cow, is that image a work of art? If not, why not? --Stephen Dedalus
Re: Bantu click letters
Patrick Andries wrote: Michael Everson wrote: Practice your tongue-twisting. Proposal to add Bantu phonetic click characters to the UCS http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf :-P Are these letters used in any other book than Doke's book on Kalahari Bushmen? P. A. [PA] I don't think I got a direct answer on these non-Bantu click symbols being used in any other book. If these symbols are indeed used in a single book and by a single author, I would put them in the PUA, I don't see any interchange requirement to do otherwise. If letters unique to an author may now be encoded in Unicode, I have many to propose to the enabling technology that Unicode is and people will be free to use them or not. P.A.
Re: Bantu click letters
Michael Everson scripsit: > > > Effort and expense was made to cut the letters for the publication. > > > >And today, if I were reprinting it, I'd commission a digital font > >(your effort, my expense) and put the characters in the PUA. > > Not if you wanted, as an Africanist, to be able to represent the text > as it was originally written. We must be talking past one another somehow, but I don't understand how. To represent the text as originally written, I need a digital representation for each of the characters in it. Since all I want to do is reprint the book -- I don't need to use the unusual characters in interchange -- the PUA and a commissioned font seem just perfect to me. > You don't know whether or not they were only used in a single > document. You know only that I *own* that single document. You are > declaring the characters guilty until proved innocent. That's > antagonistic. I intend no antagonism. We treat the Phaistos-disk characters as guilty until proven innocent, for the same reason -- there's only one text. (It's also true that we can't interpret them, which is additional evidence against them.) There's no *point* in encoding the PD characters because they aren't used in interchange -- see above. > >If I decided to start using thorn instead of theta in my otherwise > >IPA transcriptions, that would be an idiosyncratic use of it. > > Plenty of Germanist transcriptions use thorn. In any case, the > analogy isn't relevant, as both thorn and theta are encoded and > available for use. I was talking about what it means to be idiosyncratic. (Not that either of us need any real instruction on the subject!) > >(LATIN LETTER OWL, indeed.) > > COMBINING SEAGULL BELOW, indeed. LATIN LETTER OI, indeed. :-) > [OWL] is interesting, by the way. Asmus says it's similar to > something the Japanese use for telephone answering machines. I don't > know about that, though it looks familar to me. I wonder what Doke's > source for it was. 
It looks to me the sort of thing that would be easy to reinvent. Some of my habitual doodles are much like it. > I was astonished because I hadn't seen them before. That does not > mean I didn't believe that they weren't worthy of encoding. Just > because I hadn't seen them before doesn't mean they don't exist and > aren't worthy of encoding either. Khoisian phonology is rather > esoteric, after all. Sure. I was addressing the question of the *novelty* of the characters. If neither you nor I nor anyone else in this community has seen them before, they are most certainly novel. > I am gobsmacked. On what grounds are these not characters? They are > not glyph representations of other characters. They *are* characters. It's just not useful to encode them, any more than it's useful to encode most of the scripts in the Conscript Registry. Find more documents, and the picture changes. (Find more Phaistos-type disks, and that picture changes too.) -- If you have ever wondered if you are in hell, John Cowan it has been said, then you are on a well-traveled http://www.ccil.org/~cowan road of spiritual inquiry. If you are absolutely http://www.reutershealth.com sure you are in hell, however, then you must be [EMAIL PROTECTED] on the Cross Bronx Expressway. --Alan Feuer, NYTimes, 2002-09-20
Re: Bantu click letters
At 08:51 -0700 2004-06-10, Patrick Andries wrote:

> > Not if you wanted, as an Africanist, to be able to represent the text as it was originally written.
>
> Could you please explain this? How would using PUA characters prevent the text from being represented as it was originally written?

What would the value of that be? Doke was an important Africanist. His characters have specific (very, very specific) phonetic values. Why shouldn't a Khoisan database be able to represent these characters as written? Why should the PUA be proposed for these? Some formerly-used click letters are encoded and available for use. Why shouldn't these, in principle? Many of the UPA characters are not used productively today, but they remain important for citation. As does the LATIN SMALL LETTER INSULAR G for that matter, and other archaic phonetic characters which have been encoded.

> If these symbols are indeed used in a single book and by a single author, I would put them in the PUA; I don't see any interchange requirement to do otherwise. If letters unique to an author may now be encoded in Unicode, I have many to propose to the enabling technology that Unicode is and people will be free to use them or not.

I have an offprint of Doke's article in Bantu Studies. We have noted that 70 years later Pullum and Ladusaw cite a word (the word stretchedc-h-utildecaronbelow-triangularcolon chu:) in Doke's orthography. Isn't that an indication that the work and its characters have not been lost to history? It's a little peculiar to suggest that data has to be printed in two books in order to be considered "interchangeable". Books don't interchange data between themselves. Users do. ;-)

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 07:00 -0700 2004-06-10, Peter Constable wrote:

> What about Bell's Visible Speech?

They're on our list. As are i.t.a and the Phonotypy characters. I'll bring a lovely Phonotypic text with me to Toronto.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 08:36 -0700 2004-06-10, Peter Constable wrote:

> Don't you think the fact that P&L don't show them might suggest that, in fact, authors today *don't* particularly use them?

Not necessarily. Indeed, they do quote the name chu: with STRETCHED C, and with both diacritics, the TILDE for nasalization (which is standard) and the CARON BELOW for the rising tone (which is not). So Pullum and Ladusaw are *using* Doke's orthography. If they wanted to show a different word in that orthography they would have to use one of Doke's other letters.

> I looked through many publications last year searching for attested phonetic symbols not yet encoded, and while my search wasn't specifically focused on Africanist usage, I did go through a number of Africanist items and never once saw any of these.

Big world, isn't it? There's all those non-Slavic Cyrillic characters which haven't turned up again either.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
Michael Everson wrote:

> At 10:00 -0400 2004-06-10, John Cowan wrote:
>
> > And today, if I were reprinting it, I'd commission a digital font (your effort, my expense) and put the characters in the PUA.
>
> Not if you wanted, as an Africanist, to be able to represent the text as it was originally written.

Could you please explain this? How would using PUA characters prevent the text from being represented as it was originally written?

P. A.
Re: Bantu click letters
At 11:00 -0400 2004-06-10, John Cowan wrote:

> Michael Everson scripsit:
>
> > Although Pullum and Ladusaw don't show the glyphs, they refer specifically to Doke's characters (s.v. ///). They describe them as "ad hoc" which I suppose they were, in 1925, though "novel" would do as well as they aren't entirely arbitrary and they weren't "found" bits of lead type pressed into other service -- they were cut to order.
>
> If Sequoyah had had clout, we'd probably be using his original characters for Cherokee today.

Not only non sequitur, but an unreasonable assumption. My point was that I considered Pullum and Ladusaw's use of the word "ad hoc" to be unlikely. If they were ad-hoc, any printer's sorts might be used. (Pullum and Ladusaw are not infallible of course; cf Yogh.)

> > That Pullum and Ladusaw have not forgotten Doke's characters suggests that Africanists will also likely not forget them, and will find use in access to them as encoded characters in the UCS.
>
> It's P&L's business to remember what would otherwise be (mercifully, in some cases) forgotten, so that people who need to interpret old documents have some hope of doing so.

You have a weird view of the history of phonetics, John. You haven't addressed the substantive issue: these are Latin characters used to represent sounds which in 1925 could not easily be represented. That they didn't become the IPA standard for representing them is accidental. Indeed, there are click letters like the STRETCHED C which did get into IPA and were later deprecated. So you can represent the STRETCHED C in chu: as Doke writes it (as do Pullum and Ladusaw, using Doke's diacritics as well) but you can't represent Doke's other letters? This doesn't make sense.

I would like to know where the STRETCHED C comes from, actually. Pullum & Ladusaw note that Beach used it in 1938, and proposed a curly version in the same year, but it certainly predates that since Doke uses it.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Michael Everson > >Of course, it's an empirical question as to whether anyone else in that > >era did, in fact, adopt any of these symbols, or whether authors today > >ever use them (e.g. in citing Doke, whose work was of some importance in > >Africanist linguistics). > > It's reasonable to think that they would. Although Pullum and Ladusaw > don't show the glyphs, they refer specifically to Doke's characters > (s.v. ///). Don't you think the fact that P&L don't show them might suggest that, in fact, authors today *don't* particularly use them? I looked through many publications last year searching for attested phonetic symbols not yet encoded, and while my search wasn't specifically focused on Africanist usage, I did go through a number of Africanist items and never once saw any of these. > That Pullum and Ladusaw have not forgotten Doke's characters suggests > that Africanists will also likely not forget them, and will find use > in access to them as encoded characters in the UCS. I'm inclined to think there's probably greater likelihood that one of the few modifier letters I proposed but that weren't accepted, e.g. a MODIFIER LETTER SMALL TURNED Y, would be used than one of Doke's idiosyncratic symbols. But, they were indeed rejected, and for now remain PUA only (supported in the Doulos SIL font). Peter Constable
Re: Bantu click letters
At 10:46 -0400 2004-06-10, John Cowan wrote:

> We must be talking past one another somehow, but I don't understand how. To represent the text as originally written, I need a digital representation for each of the characters in it. Since all I want to do is reprint the book -- I don't need to use the unusual characters in interchange -- the PUA and a commissioned font seem just perfect to me.

Erm. You could say that about ANY additions to the Unicode Standard!

> I intend no antagonism.

It is perceived. "No! Bad characters! No biscuit!"

> We treat the Phaistos-disk characters as guilty until proven innocent, for the same reason -- there's only one text.

I would disagree. Say it were a bilingual and we could read it. Do you really think we wouldn't encode the script? In any case, it's not a true analogy, since Phaistos presents a script, and the Khoisan characters are phonetic additions to Latin.

> There's no *point* in encoding the PD characters because they aren't used in interchange -- see above.

This doesn't make any sense. I have the Phaistos text encoded with PUA characters and a font available for it. If you wanted to exchange the text (by sending it to someone else) you could do so. If Phaistos were encoded outside of the PUA, it would likewise be exchangeable. Bits of Phaistos could be inserted into Latin or Greek or Russian text describing them. And those texts could be interchanged.

> > > If I decided to start using thorn instead of theta in my otherwise IPA transcriptions, that would be an idiosyncratic use of it.
> >
> > Plenty of Germanist transcriptions use thorn. In any case, the analogy isn't relevant, as both thorn and theta are encoded and available for use.
>
> I was talking about what it means to be idiosyncratic.

That isn't what Doke was doing. He was representing what are to us extremely strange sounds in the Latin script.

> I was addressing the question of the *novelty* of the characters. If neither you nor I nor anyone else in this community has seen them before, they are most certainly novel.

That is not a reason to consign them to the PUA.

> > I am gobsmacked. On what grounds are these not characters? They are not glyph representations of other characters.
>
> They *are* characters. It's just not useful to encode them, any more than it's useful to encode most of the scripts in the Conscript Registry.

If they are encoded, then historians of Khoisan linguistics can make use of them. In what way is this "not useful"?

> Find more documents, and the picture changes.

Go to the NYPL and look up Bantu Studies and some of Doke's other works for me, will you? I'll be in Markham for the next fortnight. Of course I will do what I can when I'm at the Library of Congress in early July, but you are welcome to assist.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
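The disagreement over PUA interchange can be made concrete with a small sketch. It assumes a purely hypothetical private agreement: the code point assignments and the `PRIVATE_AGREEMENT` table below are invented for illustration and are not taken from the proposal or from any font. What it tries to show is that PUA text can be sent to anyone, but the recipient can only interpret it if the agreement (and a matching font) travels with it.

```python
import unicodedata

# Hypothetical private agreement: these assignments are invented for
# illustration only; no proposal or font in this thread defines them.
PRIVATE_AGREEMENT = {
    "\uE000": "PRE-PALATAL N",
    "\uE001": "SCRIPT CAPITAL Q (click letter)",
}


def is_private_use(ch: str) -> bool:
    """Return True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def explain(text: str, agreement: dict) -> list:
    """Label each private-use character, or flag it when no agreement covers it."""
    return [
        agreement.get(ch, "<unknown private-use character>")
        for ch in text
        if is_private_use(ch)
    ]


sample = "n\uE000a"                         # ordinary Latin letters plus one PUA character
print(explain(sample, PRIVATE_AGREEMENT))   # recipient who has the agreement
print(explain(sample, {}))                  # recipient who does not
```

Whether that dependence on an out-of-band agreement is acceptable for Doke's letters is exactly the point being argued in the messages above; the sketch takes no side.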
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Everson
>
> You don't know whether or not they were only used in a single document. You know only that I *own* that single document. You are declaring the characters guilty until proved innocent. That's antagonistic.

Indeed. Didn't everyone have to become a signatory to the Universal Declaration of Character Rights before subscribing?

> Khoisan phonology is rather esoteric, after all.

Esoteric?? (Do we perhaps need to review the meaning of this word?)

> > > Private use? Be serious, John. That's a pretty ridiculous suggestion.
> >
> > I am serious. The PUA is the proper place for these things.
>
> I am gobsmacked. On what grounds are these not characters? They are not glyph representations of other characters. The PRE-PALATAL N is described in terms of its phonology as being neither N nor N WITH LEFT HOOK.

If I publish a web page using DIAGONAL X WITH TURNED HOOK to represent something that's not quite this or that cardinal phonetic value, does it automatically become a character worthy of encoding?

This isn't about character rights. It's about criteria for deciding what to encode or not to encode.

Peter Constable
Re: Bantu click letters
Peter Constable wrote:

> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Cowan
> >
> > [T]he Unicode Standard does not encode idiosyncratic, personal, novel, or private use characters [...].
>
> What about Bell's Visible Speech? (I'm sure I've seen it discussed here or on qalam, but I've no recollection what might have been said.) I don't know what Bell might have published, but they were also used by Sweet:
>
> Sweet, Henry. 1906. A primer of phonetics. 3rd edn., revised. Oxford: Clarendon Press.
>
> Would you consider these too idiosyncratic?

I hope not. IMO Visible Speech *definitely* deserves encoding in Plane 1. Bell used it some, and I have several articles by Sweet in which he used it, and I even managed to find an article by someone else using Visible Speech. Everything is a novel invention once; the question is whether it has a life (or at least a significance) beyond its inventor. (cf. Shavian, which probably was used not more than, and likely less than, Visible Speech). In the movie of My Fair Lady, Visible Speech is, in fact, well, visible in Henry Higgins' notebook.

I have a font (my own), and a proposal for VS is languishing on my hard drive; it should someday be finished up and submitted.

~mark (owner of visiblespeech.info, which someday, I hope, will actually have useful VS information on it)
Re: Bantu click letters
Michael Everson scripsit:

> Although Pullum and Ladusaw don't show the glyphs, they refer specifically to Doke's characters (s.v. ///). They describe them as "ad hoc" which I suppose they were, in 1925, though "novel" would do as well as they aren't entirely arbitrary and they weren't "found" bits of lead type pressed into other service -- they were cut to order.

If Sequoyah had had clout, we'd probably be using his original characters for Cherokee today.

> That Pullum and Ladusaw have not forgotten Doke's characters suggests that Africanists will also likely not forget them, and will find use in access to them as encoded characters in the UCS.

It's P&L's business to remember what would otherwise be (mercifully, in some cases) forgotten, so that people who need to interpret old documents have some hope of doing so.

What we need is more evidence: either documentary evidence, or the evidence of breathing Africanists.

--
John Cowan <[EMAIL PROTECTED]>
http://www.ccil.org/~cowan
http://www.reutershealth.com
Charles li reis, nostre emperesdre magnes,
Set anz totz pleinz ad ested in Espagnes.
RE: Bantu click letters
At 07:11 -0700 2004-06-10, Peter Constable wrote:

> If no other author uses them, then I think it's not unreasonable to suggest that they are private-use: Doke puts the terms of the agreement into his product, his readers enter into that agreement when they decide to read the book. It is "private-use" as opposed to conventional use if the readers agree to read his symbols but don't adopt them for their own use.

It's not like it's samizdat, though.

> Of course, it's an empirical question as to whether anyone else in that era did, in fact, adopt any of these symbols, or whether authors today ever use them (e.g. in citing Doke, whose work was of some importance in Africanist linguistics).

It's reasonable to think that they would. Although Pullum and Ladusaw don't show the glyphs, they refer specifically to Doke's characters (s.v. ///). They describe them as "ad hoc" which I suppose they were, in 1925, though "novel" would do as well as they aren't entirely arbitrary and they weren't "found" bits of lead type pressed into other service -- they were cut to order.

That Pullum and Ladusaw have not forgotten Doke's characters suggests that Africanists will also likely not forget them, and will find use in access to them as encoded characters in the UCS.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Michael Everson > The sounds they represent are idiosyncratic and difficult to > describe, much less write. Personal? No: he published. Novel? Perhaps > (in 1925); Doke is likely to have devised them. Private use? Be > serious, John. That's a pretty ridiculous suggestion. If no other author uses them, then I think it's not unreasonable to suggest that they are private-use: Doke puts the terms of the agreement into his product, his readers enter into that agreement when they decide to read the book. It is "private-use" as opposed to conventional use if the readers agree to read his symbols but don't adopt them for their own use. Of course, it's an empirical question as to whether anyone else in that era did, in fact, adopt any of these symbols, or whether authors today ever use them (e.g. in citing Doke, whose work was of some importance in Africanist linguistics). Peter Constable
Re: Bantu click letters
On Thu, 10 Jun 2004 14:30:12 +0100, Michael Everson wrote: > They were published in Bantu Studies in 1925 in an article by a > rather important scholar in the field of African linguistics. Effort > and expense was made to cut the letters for the publication. But have they been used in other publications since? Are they used by scholars of African linguistics today? [I have no idea whether they are or not, but it seems to me that this information could be important for the proposal] John. -- -- Over 2400 webcams from ski resorts around the world - www.snoweye.com -- Translate your technical documents and web pages- www.tradoc.fr
Re: Bantu click letters
At 10:00 -0400 2004-06-10, John Cowan wrote:

> Michael Everson scripsit:
>
> > They were published in Bantu Studies in 1925 in an article by a rather important scholar in the field of African linguistics.
>
> We don't encode characters according to the clout of the user, or the Apple logo would have been in Unicode long since. :-)

False analogy. The Apple logo is a logo. Phonetic characters are phonetic characters.

> > Effort and expense was made to cut the letters for the publication.
>
> And today, if I were reprinting it, I'd commission a digital font (your effort, my expense) and put the characters in the PUA.

Not if you wanted, as an Africanist, to be able to represent the text as it was originally written.

> > The sounds they represent are idiosyncratic and difficult to describe, much less write.
>
> I think that characters used in a single document by a single scholar, however prestigious, can fairly be described as idiosyncratic to him.

You don't know whether or not they were only used in a single document. You know only that I *own* that single document. You are declaring the characters guilty until proved innocent. That's antagonistic.

> If I decided to start using thorn instead of theta in my otherwise IPA transcriptions, that would be an idiosyncratic use of it.

Plenty of Germanist transcriptions use thorn. In any case, the analogy isn't relevant, as both thorn and theta are encoded and available for use.

> If instead I used OVERCLOCKED HOOCHIMADINGER SYMBOL, that would be even more idiosyncratic. (LATIN LETTER OWL, indeed.)

COMBINING SEAGULL BELOW, indeed.

This symbol is interesting, by the way. Asmus says it's similar to something the Japanese use for telephone answering machines. I don't know about that, though it looks familiar to me. I wonder what Doke's source for it was.

> > Personal? No: he published.
>
> Fair enough.

Thank you.

> > Novel? Perhaps (in 1925); Doke is likely to have devised them.
>
> They are just as novel today as they were eighty years ago; I well remember how astonished you and I were, looking over the text.

I was astonished because I hadn't seen them before. That does not mean I believed that they weren't worthy of encoding. Just because I hadn't seen them before doesn't mean they don't exist and aren't worthy of encoding either. Khoisan phonology is rather esoteric, after all.

> > Private use? Be serious, John. That's a pretty ridiculous suggestion.
>
> I am serious. The PUA is the proper place for these things.

I am gobsmacked. On what grounds are these not characters? They are not glyph representations of other characters. The PRE-PALATAL N is described in terms of its phonology as being neither N nor N WITH LEFT HOOK.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Cowan
>
> [T]he Unicode Standard does not encode idiosyncratic, personal, novel, or private use characters [...].

What about Bell's Visible Speech? (I'm sure I've seen it discussed here or on qalam, but I've no recollection what might have been said.) I don't know what Bell might have published, but they were also used by Sweet:

Sweet, Henry. 1906. A primer of phonetics. 3rd edn., revised. Oxford: Clarendon Press.

Would you consider these too idiosyncratic?

Peter Constable
Re: Bantu click letters
Michael Everson scripsit: > They were published in Bantu Studies in 1925 in an article by a > rather important scholar in the field of African linguistics. We don't encode characters according to the clout of the user, or the Apple logo would have been in Unicode long since. :-) > Effort and expense was made to cut the letters for the publication. And today, if I were reprinting it, I'd commission a digital font (your effort, my expense) and put the characters in the PUA. > The sounds they represent are idiosyncratic and difficult to > describe, much less write. I think that characters used in a single document by a single scholar, however prestigious, can fairly be described as idiosyncratic to him. If I decided to start using thorn instead of theta in my otherwise IPA transcriptions, that would be an idiosyncratic use of it. If instead I used OVERCLOCKED HOOCHIMADINGER SYMBOL, that would be even more idiosyncratic. (LATIN LETTER OWL, indeed.) > Personal? No: he published. Fair enough. > Novel? Perhaps > (in 1925); Doke is likely to have devised them. They are just as novel today as they were eighty years ago; I well remember how astonished you and I were, looking over the text. > Private use? Be > serious, John. That's a pretty ridiculous suggestion. I am serious. The PUA is the proper place for these things. -- "May the hair on your toes never fall out!" John Cowan --Thorin Oakenshield (to Bilbo) [EMAIL PROTECTED]
Re: Bantu click letters
At 09:26 -0400 2004-06-10, John Cowan wrote:

> [T]he Unicode Standard does not encode idiosyncratic, personal, novel, or private use characters [...]. Whatever may have been done in the past, I don't think that one document is enough to support the introduction of new Latin letters; these look extremely idiosyncratic, personal, novel and private use to me.

They were published in Bantu Studies in 1925 in an article by a rather important scholar in the field of African linguistics. Effort and expense was made to cut the letters for the publication. The sounds they represent are idiosyncratic and difficult to describe, much less write. Personal? No: he published. Novel? Perhaps (in 1925); Doke is likely to have devised them. Private use? Be serious, John. That's a pretty ridiculous suggestion.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Bantu click letters
Michael Everson scripsit:

> Proposal to add Bantu phonetic click characters to the UCS
> http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf

[T]he Unicode Standard does not encode idiosyncratic, personal, novel, or private use characters [...]. Whatever may have been done in the past, I don't think that one document is enough to support the introduction of new Latin letters; these look extremely idiosyncratic, personal, novel and private use to me.

--
John Cowan
http://www.reutershealth.com
[EMAIL PROTECTED]
All Norstrilians knew what laughter was: it was "pleasurable corrigible malfunction". --Cordwainer Smith, Norstrilia
RE: Bantu click letters
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Michael Everson > >I had not proposed ones I know of before now as I expected they'd be > >about as well received as the two symbols created by Doke that I > >proposed last summer: the s and z with swash tail (they were not > >accepted at that time). > > Heavens, really? The bilabials? Desmond Cole discusses them (and > shows them) in his article "The History of African Linguistics to > 1945" in Current Trends in Linguistics, Vol. 7, Linguistics in > Sub-Saharan Africa. Are you serious? I missed that! In the very same volume (p. 648), the z (but not the s) is cited in A.N. Tucker's article "Orthographic systems and conventions in Sub-Saharan Africa." Peter Constable
RE: Bantu click letters
Heh. Of course, despite the fact that Doke published in Bantu Studies, chu: (Kxoe, SIL code XUU) is a Khoisan language. I'll be changing the title of the document, though for the purposes of discussion, it would be best not to change the title of this thread.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 00:30 -0700 2004-06-10, Peter Constable wrote:

> I had not proposed ones I know of before now as I expected they'd be about as well received as the two symbols created by Doke that I proposed last summer: the s and z with swash tail (they were not accepted at that time).

Those are also both used in N. V. Jushmanov, "Foneticheskie paralleli afrikanskix i jafeticheskix jazykov", in Africana (Transactions of the section of African languages), Moskva: Izdatel´stvo Akademii Nauk SSSR, 1937.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 00:30 -0700 2004-06-10, Peter Constable wrote:

> !! I had not assumed that we would encode symbols attested in single publications.

I am CERTAIN that we have many characters which were encoded with only one citation in the proposal.

> I know there are several more idiosyncratic phonetic symbols out there;

As do we all. I err on the side of generosity in encoding.

> I had not proposed ones I know of before now as I expected they'd be about as well received as the two symbols created by Doke that I proposed last summer: the s and z with swash tail (they were not accepted at that time).

Heavens, really? The bilabials? Desmond Cole discusses them (and shows them) in his article "The History of African Linguistics to 1945" in Current Trends in Linguistics, Vol. 7, Linguistics in Sub-Saharan Africa. That article suggests that Doke's use of those symbols could have been inspired by Daniel Jones' 1911 pamphlet on Chindau (which I have not seen).

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Bantu click letters
At 00:11 -0400 2004-06-10, Ernest Cline wrote:

> > [Original Message] From: Michael Everson <[EMAIL PROTECTED]>
> >
> > Practice your tongue-twisting.
> >
> > Proposal to add Bantu phonetic click characters to the UCS
> > http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf
>
> Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q work for the script capital Q? At the very least I feel that should be explained.

It was understood that the mathematical symbols were not to be used in language text.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
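One practical angle on Ernest Cline's question can be checked with Python's standard `unicodedata` module. The sketch below only inspects the properties of U+1D4AC; it is offered as a possible illustration of why a mathematical alphanumeric symbol is awkward in language text, not as the rationale the committees actually recorded. The helper name `describe` is made up for this example.

```python
import unicodedata

SCRIPT_Q = "\U0001D4AC"  # U+1D4AC, the character asked about above


def describe(ch: str) -> dict:
    """Collect the properties relevant to the math-versus-language-text question."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


info = describe(SCRIPT_Q)
for key, value in info.items():
    print(f"{key}: {value!r}")
```

The character carries a `<font>` compatibility decomposition to plain U+0051, so any process that applies NFKC normalization folds it to an ordinary Q and the distinction a phonetic transcription would rely on is lost.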
Re: Bantu click letters
At 23:30 -0400 2004-06-09, Mark E. Shoulson wrote:

> On the last page, the word spelled approximately n®ª? is translated as "to roast" when in fact that is approximately nwi (with a different n). The n®ª? word means "bow."

Error corrected. I hadn't submitted the document to WG2 and UTC yet.

--
Michael Everson * * Everson Typography * * http://www.evertype.com