Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread Simon Montagu
John Jenkins wrote:
> And the proper solution for the race horse problem is for the People's
> Hong Kong Jockey Club to refuse to let a horse race unless its name is
> in Unicode.  :-)

Wouldn't that be like putting the cart before the horse?


Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread Kenneth Whistler

> On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
> 
> > Despite the oft-mentioned cutesy Hong Kong race horse names,
> > idiosyncratic invented Han ideographs are a negligible component of
> > the encoded CJK repertoire. In my opinion there are thousands,
> > possibly tens of thousands, of ideographs that should not really have
> > been encoded individually as they are simply minor glyph variants,
> > frequently only attested in a single source because the author simply
> > wrote the character wrongly in the first place. This is the real
> > issue with the over-encoding of CJKV, not the occasional race horse
> > name.

> 
> In particular, the decision to import en masse the repertoire of the 
> Hanyu Da Zidian was not a wise one, as a substantial number of the 
> entries are of the form "same as X".

Andrew and John have correctly identified the bulk of the problem
for CJKV overencoding.

Unfortunately, given the nature of the Han script and the
historical practice of Chinese lexicography, the result we
have ended up with is almost inevitable.

These historic mistakes, minor glyph variants, and such got
carried into scholastic compendia *as characters*, where they
become lexical headwords, repeated ad infinitum, in each
further edition and each new compendium. The fact that they
got carried into the Hanyu Da Zidian, the Chinese moral
equivalent of the Oxford English Dictionary, means that
inevitably they end up in the character encoding, as digital
representation of the Hanyu Da Zidian is absolutely required.
Leaving some out, no matter how mistaken or obsolete, would,
from the Chinese point of view, be like deciding to leave
some obsolete word out of the OED simply because there
wasn't a "character" encoded for it.

It would have been nice if a better mechanism for expressing
Han glyphic (and other types of) variants had been feasible
and in place before CJK Extension B went in, but that is
water under the bridge now. One can only hope that some
restraint and use of alternative mechanisms will be shown
in the current effort to define and encode additional CJK
extensions, which involve even *less* useful characters, for
the most part, missed even by the major dictionary compendia.

--Ken




Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread John Jenkins
On Jun 11, 2004, at 1:20 PM, Kenneth Whistler wrote:
> It would have been nice if a better mechanism for expressing
> Han glyphic (and other types of) variants had been feasible
> and in place before CJK Extension B went in, but that is
> water under the bridge now. One can only hope that some
> restraint and use of alternative mechanisms will be shown
> in the current effort to define and encode additional CJK
> extensions, which involve even *less* useful characters, for
> the most part, missed even by the major dictionary compendia.

FWIW, I was able to give my demo on variation selectors on Han in 
Chengdu after all, and I think it made the appropriate impression.


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Peter Kirk
On 11/06/2004 10:51, James Kass wrote:
> ...
> Doesn't this mean that it isn't possible to stack a combining circumflex
> above a combining spanning inverted breve?  Does this mean we'd need
> double-wide clones of all the combining marks in order to support such
> combos?
 

Sounds like the same problem we found with Hebrew nearly a year ago, and 
solved by inserting CGJ to keep the non-canonical order which we needed. 
Perhaps this is another suitable application for CGJ.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Kenneth Whistler
> Peter Constable wrote,
> 
> > Don't forget canonical equivalence (I forgot about this as well): the
> > double-width diacritics have a combining class of 234 rather than 230.
> > This means that 0251 0361 0302 028A is canonically equivalent to 0251
> > 0302 0361 028A. Therefore, the first (for better or worse) should appear
> > just the way Doulos SIL renders it.

> Sure enough!  Thanks.  I didn't even think to check the combining class,
> both were marks above.
> 
> Doesn't this mean that it isn't possible to stack a combining circumflex
> above a combining spanning inverted breve?  Does this mean we'd need 
> double-wide clones of all the combining marks in order to support such
> combos?

Actually, no. The UTC has had discussions about this. The whole
issue of how to display accents *above* a combining double diacritic
(or for that matter *below* a combining double diacritic below)
was debated at some length on the list last year -- I expect that
a search of the archives would turn it up.

In any case, the addition of U+034F COMBINING GRAPHEME JOINER,
and the recent refinement of the definition of combining character
sequence to explicitly allow ZWJ and ZWNJ, gives you a text
mechanism for blocking what would otherwise result in a canonical
reordering for such sequences.

Thus:

<0251, 0361, 0302, 028A>

is canonically equivalent to:

<0251, 0302, 0361, 028A>

and both should result in the same display, with the circumflex
over the "a" and the ligature tie spanning both base characters,
*over* the circumflex.

But:

<0251, 0361, 034F, 0302, 028A> or
<0251, 0361, 200D, 0302, 028A>

are *not* canonically equivalent to:

<0251, 0302, 034F, 0361, 028A> or
<0251, 0302, 200D, 0361, 028A>

And they should, in principle, at least, result in a display with
the circumflex positioned *above* the ligature tie and with respect
to it, rather than above the "a" and with respect to it.

This is the same principle which is being used to enable textual
distinctions for certain combinations of Hebrew points and accents,
for example, which would otherwise be reordered into
undesirable orders by any normalization process.

Whether any existing rendering engine will do a decent job of
implementing that, I don't actually know.
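
The canonical equivalence claims above (though not the rendering) can be
checked mechanically. What follows is a minimal sketch, assuming Python's
standard unicodedata module; the helper names seq and
canonically_equivalent are invented for illustration only.

```python
import unicodedata


def seq(*code_points):
    """Build a string from integer code points."""
    return "".join(chr(cp) for cp in code_points)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Plain sequences: the tie (ccc 234) and the circumflex (ccc 230) are
# sorted by combining class during normalization.
tie_first = seq(0x0251, 0x0361, 0x0302, 0x028A)
circumflex_first = seq(0x0251, 0x0302, 0x0361, 0x028A)

# CGJ (U+034F) and ZWJ (U+200D) have combining class 0, so the marks on
# either side of them are never reordered across them.
tie_cgj_circumflex = seq(0x0251, 0x0361, 0x034F, 0x0302, 0x028A)
circumflex_cgj_tie = seq(0x0251, 0x0302, 0x034F, 0x0361, 0x028A)
tie_zwj_circumflex = seq(0x0251, 0x0361, 0x200D, 0x0302, 0x028A)
circumflex_zwj_tie = seq(0x0251, 0x0302, 0x200D, 0x0361, 0x028A)

print(canonically_equivalent(tie_first, circumflex_first))              # True
print(canonically_equivalent(tie_cgj_circumflex, circumflex_cgj_tie))   # False
print(canonically_equivalent(tie_zwj_circumflex, circumflex_zwj_tie))   # False
```

This only shows that the text distinction survives normalization; whether a
font and rendering engine act on it is a separate question.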

--Ken





RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread James Kass


Peter Constable wrote,

> Don't forget canonical equivalence (I forgot about this as well): the
> double-width diacritics have a combining class of 234 rather than 230.
> This means that 0251 0361 0302 028A is canonically equivalent to 0251
> 0302 0361 028A. Therefore, the first (for better or worse) should appear
> just the way Doulos SIL renders it.

and later wrote,

> 
> That rule applies to combining marks in the *same* canonical combining 
> class. In this case, they are in different classes. 

Sure enough!  Thanks.  I didn't even think to check the combining class,
both were marks above.

Doesn't this mean that it isn't possible to stack a combining circumflex
above a combining spanning inverted breve?  Does this mean we'd need 
double-wide clones of all the combining marks in order to support such
combos?

(Well, at least I can give up on trying to make it display right here.)

Best regards,

James Kass




RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of James Kass


> Hmmm.  Further on the inside-out rule.  Note the following pairs, which
> are supposed to be in UTF-8:
> 
> aÌ"Ì^ aÌ^Ì"
> uÌ"Ì^ uÌ^Ì"

[Why isn't UTF-8 coming through as such?]

 
> The first "a" with combiners isn't displaying correctly here, it should
> have the diaeresis above the macron, just like the first "u".

This is a known problem in Uniscribe.



Peter Constable




RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of James Kass


> > Not sure what you are saying here or what you mean by the inside-out
> > rule. The two sequences are canonically equivalent and should look
> > identical.
>
> The "inside-out" rule is explained and illustrated on page 125 (TUS
> 4.0).
> 
> An "a" followed by combining umlaut followed by combining macron
> is not the same as "a" plus combining macron plus combining umlaut.

That rule applies to combining marks in the *same* canonical combining
class. In this case, they are in different classes.
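
A small sketch of that distinction, assuming Python's standard unicodedata
module (swap_is_equivalent and the constant names are made up for this
example): swapping two marks of the same class changes the canonical form,
while swapping two marks of different non-zero classes does not.

```python
import unicodedata

DIAERESIS, MACRON = "\u0308", "\u0304"                  # both class 230
CIRCUMFLEX, DOUBLE_INVERTED_BREVE = "\u0302", "\u0361"  # classes 230 and 234


def swap_is_equivalent(base, mark1, mark2):
    """True if base+mark1+mark2 and base+mark2+mark1 normalize identically."""
    one = unicodedata.normalize("NFD", base + mark1 + mark2)
    two = unicodedata.normalize("NFD", base + mark2 + mark1)
    return one == two


for mark in (DIAERESIS, MACRON, CIRCUMFLEX, DOUBLE_INVERTED_BREVE):
    print(f"U+{ord(mark):04X} ccc={unicodedata.combining(mark)}")

# Same class: order is preserved, so the inside-out rule applies.
print(swap_is_equivalent("a", DIAERESIS, MACRON))                        # False
# Different classes: normalization sorts them, so either order is the same text.
print(swap_is_equivalent("\u0251", CIRCUMFLEX, DOUBLE_INVERTED_BREVE))   # True
```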



Peter Constable




Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread John Hudson
[EMAIL PROTECTED] wrote:
> > Even with OpenType experimental support here, my display looks like
> > the GIF you sent.  I'll try fixing this,
>
> Um, good luck. I am not sure it is possible to correctly position
> double-diacritics with OpenType logic. Specifically, the vertical position
> of the double-diacritic must be adjusted so that it is above the *taller*
> of the preceding and following combining sequence. AFAIK, such logic isn't
> feasible in OpenType.

You could handle it fairly easily by contextually substituting a glyph
variant of the double-diacritic at a different height.

John Hudson
--
Tiro Typeworks        www.tiro.com
Vancouver, BC         [EMAIL PROTECTED]
Currently reading:
Typespaces, by Peter Burnhill
White Mughals, by William Dalrymple
Hebrew manuscripts of the Middle Ages, by Colette Sirat


Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread James Kass

> The "inside-out" rule is explained and illustrated on page 125 (TUS 4.0). 
> 
> An "a" followed by combining umlaut followed by combining macron 
> is not the same as "a" plus combining macron plus combining umlaut. 


Hmmm.  Further on the inside-out rule.  Note the following pairs, which 
are supposed to be in UTF-8:

ā̈ ǟ
ṻ ǖ

The first "a" with combiners isn't displaying correctly here, it should
have the diaeresis above the macron, just like the first "u".  I attach
a GIF showing the display using Doulos SIL/BabelPad.

But, this isn't a font problem as this repros with at least one other
font.  Therefore, I think it's a bug in the rendering engine.  It looks
like the rendering engine is doing an unwanted reordering for the
first "a" sequence.
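
One checkable difference between the "a" and "u" cases is which mark orders
have precomposed characters. The sketch below assumes Python's standard
unicodedata module, and nfc_code_points is a name invented here. It does not
show where the rendering bug lies, only which sequences can be satisfied by
a single precomposed glyph and which need dynamic mark positioning.

```python
import unicodedata

MACRON, DIAERESIS = "\u0304", "\u0308"


def nfc_code_points(text):
    """Return the NFC form of text as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


for base in "au":
    for marks in (MACRON + DIAERESIS, DIAERESIS + MACRON):
        labels = " ".join(f"U+{ord(m):04X}" for m in marks)
        print(f"{base} + {labels} -> {nfc_code_points(base + marks)}")

# a + macron + diaeresis -> ['U+0101', 'U+0308']  (diaeresis stays a separate mark)
# a + diaeresis + macron -> ['U+01DF']
# u + macron + diaeresis -> ['U+1E7B']
# u + diaeresis + macron -> ['U+01D6']
```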

Best regards,

James Kass

Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread John Jenkins
On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
> Despite the oft-mentioned cutesy Hong Kong race horse names,
> idiosyncratic invented Han ideographs are a negligible component of
> the encoded CJK repertoire. In my opinion there are thousands,
> possibly tens of thousands, of ideographs that should not really have
> been encoded individually as they are simply minor glyph variants,
> frequently only attested in a single source because the author simply
> wrote the character wrongly in the first place. This is the real
> issue with the over-encoding of CJKV, not the occasional race horse
> name.

In particular, the decision to import en masse the repertoire of the 
Hanyu Da Zidian was not a wise one, as a substantial number of the 
entries are of the form "same as X".

Using variation selectors with Han is really the proper solution for 
that kind of thing.  Nonce Latin forms such as experimental notations 
would probably best be handled via the PUA.

And the proper solution for the race horse problem is for the People's 
Hong Kong Jockey Club to refuse to let a horse race unless its name is 
in Unicode.  :-)


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



Re: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread James Kass

Bob Hallissy wrote,

> >Even with OpenType experimental support here, my display looks like 
> >the GIF you sent. I'll try fixing this, 
> 
> Um, good luck. I am not sure it is possible to correctly position 
> double-diacritics with OpenType logic. Specifically, the vertical position 
> of the double-diacritic must be adjusted so that it is above the *taller* 
> of the preceding and following combining sequence. AFAIK, such logic isn't 
> feasible in OpenType. 
> 

> > 
> >Following the "inside-out" rule, the first sequence should render 
> >correctly, the second sequence should not. 
> 
> Not sure what you are saying here or what you mean by the inside-out rule. 
> The two sequences are canonically equivalent and should look identical. 

The "inside-out" rule is explained and illustrated on page 125 (TUS 4.0).

An "a" followed by combining umlaut followed by combining macron
is not the same as "a" plus combining macron plus combining umlaut.

So, I'd expect that entering a combiner before the spanning character would
render the combiner below the spanning character, while reversing this order
would render the combiner above the spanning character.  Is this not the
case?

As you suggest for the double-wide combiners, this turns out not to 
be an easy fix.  So far, I'm unsuccessful in getting a good display.  I'll
have to double-check everything in GDEF and GPOS to make sure I'm
doing it right, but, it may simply not be possible yet.

Best regards,

James Kass

 



RE: Bantu click letters

2004-06-11 Thread Peter Constable

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of James Kass

> > > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see
> > > attached) is not productively different from U+0251
> > > U+0302 U+0361 U+028A (see attached)...
>
> Following the "inside-out" rule, the first sequence should render
> correctly,

Don't forget canonical equivalence (I forgot about this as well): the
double-width diacritics have a combining class of 234 rather than 230. This
means that 0251 0361 0302 028A is canonically equivalent to 0251 0302 0361 028A.
Therefore, the first (for better or worse) should appear just the way Doulos
SIL renders it.
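
The reordering behind that equivalence can be sketched by hand. This is a
rough illustration of canonical ordering, assuming the input is already
fully decomposed and using Python's standard unicodedata module for the
combining classes; canonical_order is a hypothetical helper, not a
replacement for a real normalizer.

```python
import unicodedata


def canonical_order(text):
    """Swap adjacent marks until non-zero combining classes are ascending.

    Characters with combining class 0 (base letters, CGJ, ZWJ) are never
    moved and never crossed.
    """
    chars = list(text)
    changed = True
    while changed:
        changed = False
        for i in range(len(chars) - 1):
            left = unicodedata.combining(chars[i])
            right = unicodedata.combining(chars[i + 1])
            if left > right > 0:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                changed = True
    return "".join(chars)


original = "\u0251\u0361\u0302\u028A"   # tie (234) before circumflex (230)
reordered = canonical_order(original)
print([f"U+{ord(ch):04X}" for ch in reordered])
# ['U+0251', 'U+0302', 'U+0361', 'U+028A']
print(reordered == unicodedata.normalize("NFD", original))   # True
```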

 

The only way to stack a diacritic on top of a double-width diacritic is
to use another double-width diacritic. (Unfortunately, that wasn’t
anticipated when Doulos SIL was being developed. Wouldn’t have been hard
to support, though.)


Peter Constable








Shavian (was: "Re: Bantu click letters")

2004-06-11 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.11, 06:25, Doug Ewell <[EMAIL PROTECTED]> wrote:

> I concede that "Androcles and the Lion" was the only book published
> in Shavian

Check < http://katalogo.uea.org/index.php?inf=5522 > for one more. At
least its last chapter is fully in Shavian script; Shavian letters are
introduced gradually from chap. 2 (chap. 1 is fully in Latin script) as an
attempt to enable smooth learning -- didn't work too well for me though.
(And no, no specific Esperanto extensions for Shavian -- just the kind
of grapheme value changes akin to, say, Polish and Welsh "w".)

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|




Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Bob_Hallissy
On 11/06/2004 14:39:48 James Kass wrote:

>-- Original message from "Anto'nio Martins-Tuva'lkin" : 
>--
>> On 2004.06.10, 17:11, I wrote:
>>
>> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see
>> > attached) is not productively different from U+0251
>> > U+0302 U+0361 U+028A (see attached)...
>>
>> Now attached. (Both GIFs are identical, byte by byte, though I swear
>> I made them separately: click the characters in BabelMap, PrtScr,
>> paste into PhotoShop, crop, resample, save!)
>
>You're getting default positioning only, it looks like your system
>doesn't support OpenType combining diacritic positioning for Latin.
>
>Even with OpenType experimental support here, my display looks like
>the GIF you sent.  I'll try fixing this, 

Um, good luck. I am not sure it is possible to correctly position 
double-diacritics with OpenType logic. Specifically, the vertical position 
of the double-diacritic must be adjusted so that it is above the *taller* 
of the preceding and following combining sequence. AFAIK, such logic isn't 
feasible in OpenType.

>Fonts and rendering systems probably aren't ready for this kind of
>combination yet.

SIL Graphite handles it, but then we don't [yet] have wide-spread 
availability of Graphite-capable applications.

>> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see
>> > attached) is not productively different from U+0251
>> > U+0302 U+0361 U+028A (see attached)...
>
>Following the "inside-out" rule, the first sequence should render
>correctly, the second sequence should not.

Not sure what you are saying here or what you mean by the inside-out rule. 
The two sequences are canonically equivalent and should look identical.

Bob



Re: Bantu click letters

2004-06-11 Thread James Kass


-- Original message from "Anto'nio Martins-Tuva'lkin" : -- 
> On 2004.06.10, 17:11, I wrote: 
> 
> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see 
> > attached) is not productively different from U+0251 
> > U+0302 U+0361 U+028A (see attached)... 
> 
> Now attached. (Both GIFs are identical, byte by byte, though I swear 
> I made them separately: click the characters in BabelMap, PrtScr, 
> paste into PhotoShop, crop, resample, save!) 

You're getting default positioning only, it looks like your system
doesn't support OpenType combining diacritic positioning for Latin.

Even with OpenType experimental support here, my display looks like
the GIF you sent.  I'll try fixing this, now that I know there is a
problem.  But, the fix probably won't work on your system because
OpenType Latin positioning support is needed.

Attached is a GIF showing U+0251 U+0361 U+0302 U+028A as it appears
in BabelPad with Doulos SIL.  The Doulos font puts the combining
double wide mark higher, and then the combining circumflex doesn't
overstrike it.

Fonts and rendering systems probably aren't ready for this kind of
combination yet.

> > U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see 
> > attached) is not productively different from U+0251 
> > U+0302 U+0361 U+028A (see attached)... 

Following the "inside-out" rule, the first sequence should render
correctly, the second sequence should not.

As for the combination which uses a combining mark below, that mark
below is either going to apply to a previous mark below, or it is
going to apply to the previously entered base letter.  A mark below
probably can't be configured to apply to a mark above.

So, if we have a combining mark below which is to apply to a span
of two base characters, then we need to have this combining mark added
to the standard as a double-wide combining mark below, as far as I 
can tell.

Best regards,

James Kass



Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread Andrew C. West
On Fri, 11 Jun 2004 03:04:17 +0100, Michael Everson wrote:
> 
> How many people use medieval CJK race-horse-name characters?
> 

Actually, the famous Song dynasty female poet Li Qingzhao (1084-c.1151) invented
a board game (da3 ma3 tu2 in Chinese) which involved racing around a course in
which each square was marked with the name of one of dozens of famous horses
ancient and modern, most of which are written using idiosyncratic ideographs. I
would have thought that Michael of all people would be in favour of encoding
characters used in board games!

Despite the oft-mentioned cutesy Hong Kong race horse names, idiosyncratic
invented Han ideographs are a negligible component of the encoded CJK
repertoire. In my opinion there are thousands, possibly tens of thousands, of
ideographs that should not really have been encoded individually as they are
simply minor glyph variants, frequently only attested in a single source because
the author simply wrote the character wrongly in the first place. This is the
real issue with the over-encoding of CJKV, not the occasional race horse name.

Andrew



PUA - (was: Re: Bantu click letters)

2004-06-11 Thread Christopher Fynn
D. Starner wrote:
> John Cowan <[EMAIL PROTECTED]> writes:
>
> > We must be talking past one another somehow, but I don't understand how.
> > To represent the text as originally written, I need a digital representation
> > for each of the characters in it.  Since all I want to do is reprint
> > the book -- I don't need to use the unusual characters in interchange --
> > the PUA and a commissioned font seem just perfect to me.
>
> But that doesn't work if you're reprinting to XML or HTML, where you can't
> rely upon a commissioned font being installed and correctly used. I'm not
> even sure you can trust a commissioned font to be installable on the operating
> systems of the next few decades.

Nor on PUA characters actually being usable. See:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=PUACharsInMSSotware
If there is not some kind of guarantee that major OS vendors won't grab
PUA characters for their own purposes, using PUA characters and a
commissioned font to solve problems like this is not a workable
solution in the real world.
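
For anyone preparing such a text for HTML or XML, a first step is simply
finding out where PUA characters occur. A minimal sketch, assuming Python's
standard unicodedata module (private_use_chars is a name made up here); it
relies on the general category "Co", which covers the BMP private use area
and the two private use planes.

```python
import unicodedata


def private_use_chars(text):
    """Return the distinct private-use characters in text, sorted by code point."""
    return sorted({ch for ch in text if unicodedata.category(ch) == "Co"})


sample = "click letter: \ue000, again: \ue000, plane 15: \U000F0000"
for ch in private_use_chars(sample):
    print(f"U+{ord(ch):04X}")
```

What to do with the hits (markup, images, a bundled font) remains the
judgment call discussed in this thread.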

- Chris



Re: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 07:41 PM 6/10/2004, Kenneth Whistler wrote:
> Yes, it's a scare claim. It is trying to bludgeon the committee

I think the verb in question is inappropriate for the occasion and
for this e-mail exchange. Especially when used in the context of
imputing intentions to your opponent, which is always a chancy thing to do.

A./

> into thinking that their encoding is scholastically incomplete if
> it doesn't represent every invented character by every idiosyncratic
> scholar creating his or her own conventions out there.




RE: Bantu click letters

2004-06-10 Thread D. Starner
"Mike Ayers" <[EMAIL PROTECTED]> writes:

> > >  I'm not
> > > even sure you can trust a commissioned font to be 
> > installable on the operating
> > > systems of the next few decades.
> 
>   Font support has only improved with time.  What causes you to
> foresee a sharp reversal?

I don't expect a reversal; but if I commissioned a Type-1 font 15 years
ago, I'd have a hard time installing it on a lot of computers nowadays.
Just because OpenType is common now, doesn't mean that everyone will
support it in 20 years.






Double diacriticals (was: "Re: Bantu click letters")

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 18:45, Michael Everson <[EMAIL PROTECTED]> wrote:

>> After a "double" diacritical, any further combining character could
>> take as its base the "pair" of spacing characters "under" the said
>> double diacritical, shouldn't it?
>
> I tried that in TextEdit, which is pretty smart, and the second
> diacritic didn't centre over the pair, but rather over the 0251. But
> I guess that's the only choice, and it would be a question of making
> a precomposed glyph.

With six combining double characters (U+035D..U+0362) and a zillion
regular combining characters (101 alone in the U+0300 block), of which
a full dozen would be in realistic need, we'd need at the very least
6×12=72 precomposed glyphs.
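
The double diacritics can be listed from the character database rather than
counted by hand. A rough sketch, assuming Python's standard unicodedata
module and the fact that double diacritics carry canonical combining class
233 (double below) or 234 (double above); double_diacritics is a made-up
helper, and the count it gives depends on the Unicode version shipped with
the interpreter, so it may exceed the six cited here.

```python
import unicodedata

DOUBLE_BELOW, DOUBLE_ABOVE = 233, 234


def double_diacritics(first=0x0300, last=0x036F):
    """Code points in [first, last] whose combining class marks a double diacritic."""
    return [
        cp
        for cp in range(first, last + 1)
        if unicodedata.combining(chr(cp)) in (DOUBLE_BELOW, DOUBLE_ABOVE)
    ]


doubles = double_diacritics()
for cp in doubles:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} ccc={unicodedata.combining(chr(cp))}")

# Precomposed glyphs needed if each double mark had to pair with a dozen others.
print(len(doubles) * 12)
```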

Isn't the Standard explicit about the positioning of a regular
diacritical after a double one?





Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 21:54, Kenneth Whistler <[EMAIL PROTECTED]> wrote:

> 9. n's with loops <...> I think PUA, markup, or other arbitrary text
> representational mechanisms are sufficient here.

Hm...

U+023D LATIN SMALL LETTER N WITH RIGHT LOOP
as U+0273 U+302D

U+0240 LATIN SMALL LETTER ENG WITH LEFT LOOP
as U+014B U+0325

U+0242 LATIN SMALL LETTER N WITH LEFT LOOP
as U+006E U+0325

U+0245 LATIN SMALL LETTER N WITH LEFT HOOK AND RIGHT LOOP
as U+0272 U+302D

U+0248 LATIN SMALL LETTER N WITH LEFT LOOP AND RIGHT LOOP
as U+0273 U+0325 U+302D

Perhaps not bad for a kludge...?
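
To see what these kludge sequences are actually built from, the components
can be looked up by name and combining class. A small sketch, assuming
Python's standard unicodedata module; the KLUDGES table and describe helper
are invented for this example and cover only the existing characters on the
"as" lines, not the proposed code points.

```python
import unicodedata

# Proposed letter (description only) -> suggested fallback sequence.
KLUDGES = {
    "N WITH RIGHT LOOP": "\u0273\u302D",
    "ENG WITH LEFT LOOP": "\u014B\u0325",
    "N WITH LEFT LOOP": "\u006E\u0325",
    "N WITH LEFT HOOK AND RIGHT LOOP": "\u0272\u302D",
    "N WITH LEFT LOOP AND RIGHT LOOP": "\u0273\u0325\u302D",
}


def describe(sequence):
    """List (code point label, character name, combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in sequence
    ]


for label, sequence in KLUDGES.items():
    print(label)
    for entry in describe(sequence):
        print("   ", entry)
```

The two marks in the last sequence have different non-zero combining
classes, so either input order normalizes to the same text.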





Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 17:11, I wrote:

> U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000 (see
>  attached) is not productively different from U+0251
> U+0302 U+0361 U+028A (see  attached)...

Now attached. (Both GIFs are identical, byte by byte, though I swear
I made them separately: click the characters in BabelMap, PrtScr,
paste into PhotoShop, crop, resample, save!)


Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 20:50, Asmus Freytag <[EMAIL PROTECTED]> wrote:

> In at least one case I suspect that a character named 'script' was
> actually intended for an *italic* shape.

In principle, all "holes" in the ranges
   U+1D434..U+1D49D,
   U+1D608..U+1D66F,
   U+1D6E2..U+1D755 and
   U+1D790..U+1D7C9
(those with italics style) correspond to already encoded "letter like"
characters with italics style.

These are... hm, only U+1D455 -- which points to U+210E : PLANCK
CONSTANT (whose name does not include a misused "script")...

I'd expect U+212F : SCRIPT SMALL E, but it is referred to at the
nonexistent U+1D4BA, from the script block, not italics.
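
The holes can be enumerated rather than eyeballed. A minimal sketch,
assuming Python's standard unicodedata module; the ranges are copied from
this message as-is, and ITALIC_RANGES and holes are names invented for the
example. Unassigned code points have general category "Cn".

```python
import unicodedata

# Ranges exactly as listed above.
ITALIC_RANGES = [
    (0x1D434, 0x1D49D),
    (0x1D608, 0x1D66F),
    (0x1D6E2, 0x1D755),
    (0x1D790, 0x1D7C9),
]


def holes(ranges):
    """Return the unassigned code points inside the given inclusive ranges."""
    return [
        cp
        for first, last in ranges
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) == "Cn"
    ]


for cp in holes(ITALIC_RANGES):
    print(f"U+{cp:04X} is unassigned")

# The letterlike characters that fill the two holes discussed here.
for ch in ("\u210E", "\u212F"):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC {folded!r}")
```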





Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 22:35, Asmus Freytag <[EMAIL PROTECTED]> wrote:

> the fact that you are not conversant with mathematical notation, but
> very familiar with linguistic notations, makes you treat these two
> as worlds apart. <...> In that way, both are different from regular
> 'language text'

What is the difference between "language text" and "linguistic
notation"? After all, the characters under discussion *could* have
been adopted as the usual orthography of a writing community...





Re: Bantu click letters

2004-06-10 Thread Kenneth Whistler
Michael,

And now you are answering arguments with irrelevancies.

> >But the argument in this particular case hinges on a particular,
> >nonce set of characters.
> 
> You use "nonce" very easily.

Nonce: Occurring, used, or made only once or for a special occasion.

You can, of course, quibble that this should be applied to only
a single *token* of a character, but I think it applies fairly
to the situation we are talking about: a single scholar's invention
that developed no community of use, so saw no application beyond
that one person's usage.


> That they did not *adopt* them as standard representations does not 
> mean that there is no need to *use* them in interchangeable text.

The case to standardize them for use in interchange is different
from the case to make a particular orthography in a particular
(small) set of documents available online.

> 
> In fairness to Professor Doke, he published from 1925 to at least 
> 1966. Let's see what he did, shall we?

Sure.

> >Well, in terms of requirements, I consider that more than a little
> >cart before the horse. I'd be more sympathetic if someone was
> >actually *trying* to do this and had a technical problem with
> >representing the text accurately for an online edition which was
> >best resolved by adding a dozen character to the Unicode Standard.
> >Then, at least there would be a valid *use* argument to be made,
> >as opposed to a scare claim that 50 years from now someone *might*
> >want to do this and not be able to if we don't encode these
> >characters right now.
> 
> Scare claim? You think I'm making a scare claim about the UCS? Our 
> visions of "universal" must differ rather a lot.

Yes, it's a scare claim. It is trying to bludgeon the committee
into thinking that their encoding is scholastically incomplete if
it doesn't represent every invented character by every idiosyncratic
scholar creating his or her own conventions out there.

I claim that there are limits to what is useful to pursue in
representing every squiggle.

And *my* vision of Universal is that it is a hell of a lot more
important to encode Avestan and Egyptian hieroglyphics, which
have *large*, important literatures and large communities of
users, rather than waste time on a dozen weird phonetic characters
used by one scholar, characters rejected by his field, and not
even significant enough to be listed in the premier work on
phonetic symbol usage today, Pullum & Ladusaw.

Wasting list time and committee time pursuing these things is
*detracting* from the big prizes that need to be attained out
there still, and fighting tooth and nail for Doke's "OWL" character
is a strategic error on your part, undermining the good will and
consensus you need to get the other important things done.
 
> >Right *now* anyone could (if they had the rights) put a version of
> >Doke online using PDF and an embedded font, and it would be perfectly
> >referenceable for anyone wanting access to the content of the
> >document. True, the dozen or so "weird" characters in the
> >orthography wouldn't have standard encodings, so searching inside
> >the document for them wouldn't be optimal.
> 
> Come clean, Ken. You suggested offline that it would be OK with you 
> for the Khoisan scholars to use Runic MADR or YR to represent the 
> VOICELESS and VOICED RETROFLEX CLICKs. *That* is not UCS philosophy, 
> and it is not good sense.

O.k., *NOW* I'm pissed. If you are going to continue dragging things
back to the Unicode list after I suggested that these discussions
be dealt with offlist to argue out the issues, and THEN misrepresent
my position, do me the courtesy of *quoting* the actual position you
misrepresent:


8. The pitchforks

The etiology of these is unexplained. Doke may have been
reusing an existing symbol (mathematical or runic or Greek) and
then flipping it for an additional semantic, just as he
apparently created the lateral click character by flipping
the glottal stop.

In any case, again because this is a nonce orthography,
the rationale for creating *new* characters for a
standard encoding of them is weak. As an approximation, it
would make just as much sense to use a psi and inverted psi,
or Runic long branch madr and yr (16D8, 16E6).

Note, in particular, the already approved encoding of
rotated and flipped versions of Greek letters as symbols
used in Ancient Greek musical notation. 1D201, 1D218,
1D21E. A psi and the flipped psi symbol (1D218) would
be sufficient to carry the distinction.

Yeah, yeah, Michael, I know you are going to hit the roof
about such a suggestion, since these symbols used by
Dokes are part of a Latin phonetic orthography, and
are not Runic or Greek. So spare us the detour into
that lecture. My point is that given the unproductive
nature of Doke's experiment here, and given that the conventions
did *not* catch on to become part of any user community of
Latin phonetic practice, there is no burning need to actually
extend the *standard* list of Latin letters merely 

RE: Bantu click letters

2004-06-10 Thread Peter Constable
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Mike Ayers

> Reprinting the book brings with it the potential for its special
> characters to gain currency, even if only in the context of discussing
> the book.

Um, Mike, let's get real. Linguists have had 80 years of opportunity
during which Doke's writings have been accessible and Khoisan phonology
has held at least some measure of interest, if for no other purpose than
as material used by phonetics teachers and authors of books that teach
phonetics who need to provide comprehensive coverage of phonetic symbols
for the world's speech sounds. And during that time, they have *not*
been using these symbols of Doke's for any purpose. A reprint of his
1926 book isn't going to suddenly change that.


Peter Constable




RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Michael Everson


> Mark, come on. Doke's phonetic transcription of !Xung is a set of
> explicit glyphs representing specific sounds, indeed more precisely
> than IPA allows (I don't think IPA specifies a representation for
> retroflex clicks).

You've said that several times, as though having a bunch of distinct
atomic symbols for close transcription were a good thing. Actually, the
IPA is built on a principle of phonemic representation. I believe that
atomic characters are never added for distinctions that are never
phonemic. Where close allophonic transcription is desired, diacritics
are used. So, for instance, the symbol for pre-palatal n wouldn't be
added to IPA since there's no language that has a phonemic contrast
between palatal nasal and pre-palatal nasal. If a linguist wants to
indicate the forward position, 031F can be combined with 0272. If a
linguist needs to explain *really* close details on consonants, then
they resort to face diagrams (as Doke used in the samples you provided),
palatographs, x-ray cinematographs or the like.
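
For anyone who wants to check the 0272 + 031F combination mentioned above, here is a minimal sketch in Python using only the standard unicodedata module. The variable names are mine and purely illustrative; nothing here is taken from a proposal document.

```python
import unicodedata

# U+0272 (palatal nasal) followed by U+031F (the "advanced" diacritic).
advanced_palatal_nasal = "\u0272\u031F"

# Character names as recorded in the Unicode Character Database.
names = [unicodedata.name(ch) for ch in advanced_palatal_nasal]
print(names)

# A non-zero canonical combining class marks U+031F as a combining mark.
is_combining = unicodedata.combining("\u031F") != 0

# There is no precomposed form, so NFC keeps the two code points separate.
nfc_length = len(unicodedata.normalize("NFC", advanced_palatal_nasal))
print(is_combining, nfc_length)
```

The point is only that the close transcription is representable as a base letter plus a diacritic, without a new atomic character.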

This is not an argument against encoding these characters, though. It is
simply pointing out that statements like "more precisely than IPA" do
not constitute an argument in favour of encoding. The fact that after 80
years there are no conventional symbols for pre-palatal nasals speaks to
the value and necessity of having symbols with such precise meanings.


Peter Constable





Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 18:48 -0700 2004-06-10, Mark Davis wrote:
> There are two reasons we might not encode a particular image as a
> character.  I had said:
>
> > Many images are not appropriate for use in plain text, or have too
> > small a user community.
>
> That is, you need to have something that is appropriate for use in plain text
> *and* have a significant user community.

"Significant"? How many people use medieval CJK race-horse-name characters?

> As far as I have seen from the email, there is no real evidence for
> a user community. If a character only occurs in a couple of works,
> it means there is simply not the utility in encoding it; PUA is the
> right choice.

I don't like shifting goalposts. We have encoded many characters
which are extremely rare.

> There is a much larger set of documents containing the Prince icon,
> but we don't want to encode that either!

The Prince icon is a LOGO, Mark, and is out of scope by definition.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 18:10 -0700 2004-06-10, Kenneth Whistler wrote:
> But the argument in this particular case hinges on a particular,
> nonce set of characters.

You use "nonce" very easily.

> We have this one scholar, who invented a bunch of characters in the
> 20's to represent click sounds that nobody was doing justice to at
> that point, either in understanding their phonetics or making
> sufficiently accurate distinctions in their recording. Bully for
> Dokes -- it was an important advance in the field of Khoisan studies
> and the phonetics of clicks. But even though he published his
> analysis, using his characters, nobody else chose to adopt his
> character conventions.

That they did not *adopt* them as standard representations does not
mean that there is no need to *use* them in interchangeable text.

In fairness to Professor Doke, he published from 1925 to at least
1966. Let's see what he did, shall we?

> It comes down then to a *prospective* claim that someone *might*
> want to digitize the classic Dokes publication and that if they
> did so they would require that the particular set of weird
> phonetic letters used by Dokes would have to be representable
> in Unicode plain text in order for that one publication to be
> made available electronically. (Or a few other publications that
> might cite Dokes verbatim, of course.)

It seems reasonable to suppose that such might be the case.

> Well, in terms of requirements, I consider that more than a little
> cart before the horse. I'd be more sympathetic if someone was
> actually *trying* to do this and had a technical problem with
> representing the text accurately for an online edition which was
> best resolved by adding a dozen characters to the Unicode Standard.
> Then, at least there would be a valid *use* argument to be made,
> as opposed to a scare claim that 50 years from now someone *might*
> want to do this and not be able to if we don't encode these
> characters right now.

Scare claim? You think I'm making a scare claim about the UCS? Our
visions of "universal" must differ rather a lot.

> Right *now* anyone could (if they had the rights) put a version of
> Dokes online using pdf and an embedded font, and it would be perfectly
> referenceable for anyone wanting access to the content of the
> document. True, the dozen or so "weird" characters in the
> orthography wouldn't have standard encodings, so searching inside
> the document for them wouldn't be optimal.

Come clean, Ken. You suggested offline that it would be OK with you
for the Khoisan scholars to use Runic MADR or YR to represent the
VOICELESS and VOICED RETROFLEX CLICKs. *That* is not UCS philosophy,
and it is not good sense.

> But I don't hear people yelling that the online Unicode Standard is
> crippled for use by people who wish to refer to it because you can't
> do an automated search for  in it which will accurately find
> all instances of Devanagari ksha in the text.

KA + VIRAMA + SSA. Works every time, if you are using Unicode.
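
A minimal sketch of that search in Python, assuming the text is already properly encoded Unicode. The helper name find_ksha and the sample string are hypothetical, made up for illustration only.

```python
import unicodedata

# Devanagari ksha as a three-code-point conjunct: KA + VIRAMA + SSA.
KSHA = "\u0915\u094D\u0937"

def find_ksha(text):
    """Return the start index of every KA + VIRAMA + SSA sequence in text."""
    hits = []
    start = text.find(KSHA)
    while start != -1:
        hits.append(start)
        start = text.find(KSHA, start + 1)
    return hits

# Hypothetical sample: two words, each containing one ksha conjunct.
sample = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F \u0905\u0915\u094D\u0937\u0930"

print([unicodedata.name(ch) for ch in KSHA])
print(find_ksha(sample))
```

This only works on text that carries the real code points; a PDF using a font-specific glyph encoding gives the search nothing to match.
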
> Finally, if someone actually wants to do a redacted publication of
> Dokes for its *content*, as opposed to its orthographic antiquarian
> interest, it is perfectly possible to do so with an updated set of
> orthographic conventions that would make it more accessible to
> people used to modern IPA usage.

Many Uralicists prefer IPA today, but the baroque weirdness of UPA
usage was encoded in order to allow them to cite original forms.
Whether they also transcribe UPA into IPA is a different question.

> Usability of published or republished documents is not limited to
> slavish facsimile reproduction of their original form -- for that we
> have facsimiles. :-) I love Shakespeare, but I don't have to read
> his plays with long ess's and antique typefaces.

Face is irrelevant. And the long ess is encoded for those who need or
want to use it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread Mark Davis
There are two reasons we might not encode a particular image as a character.  I
had said:

>Many images are not appropriate for use in plain text, or have too
>small a user community.

That is, you need to have something that is appropriate for use in plain text
*and* have a significant user community. As far as I have seen from the email,
there is no real evidence for a user community. If a character only occurs in a
couple of works, it means there is simply not the utility in encoding it; PUA is
the right choice. There is a much larger set of documents containing the Prince
icon, but we don't want to encode that either!

Mark
__
http://www.macchiato.com

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thu, 2004 Jun 10 17:00
Subject: Re: Bantu click letters


> At 15:34 -0700 2004-06-10, Mark Davis wrote:
> >This argument does not hold water. Simply because some images appear
> >in some documents does not mean that they automatically should be
> >represented as encoded characters. Many images are not appropriate
> >for use in plain text, or have too small a user community. They
> >should be represented as private use characters, or as literal
> >images. The Prince glyph, on-beyond-zebra characters,  the images on
> >http://www.aperfectworld.org/animals.htm, etc. are in
> >quite a number of documents, but that doesn't mean that any of them
> >necessarily qualify as characters for encoding.
>
> Mark, come on. Doke's phonetic transcription of !Xung is a set of
> explicit glyphs representing specific sounds, indeed more precisely
> than IPA allows (I don't think IPA specifies a representation for
> retroflex clicks). Apart from the question whether or not the
> characters are important enough for people to want to be able to
> interchange them as encoded UCS characters (which is stipulated as a
> question), it's just not on to say that these are the same kinds of
> things as Prince's logo or the Seussian extensions.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>




RE: Bantu click letters

2004-06-10 Thread Mike Ayers
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
> Behalf Of Mark Davis
> Sent: Thursday, June 10, 2004 3:35 PM


> The Prince glyph, on-beyond-zebra characters, the images on
> http://www.aperfectworld.org/animals.htm, etc. are in quite a number
> of documents, but that doesn't mean that any of them necessarily qualify as
> characters for encoding.


    ...because none of them have ever been used as characters?  Really, I'm quite surprised at having to mention this distinction.

> From: "D. Starner" <[EMAIL PROTECTED]>
> Sent: Thu, 2004 Jun 10 13:46


> > John Cowan <[EMAIL PROTECTED]> writes:
> >
> > > We must be talking past one another somehow, but I don't understand how.
> > > To represent the text as originally written, I need a digital representation
> > > for each of the characters in it.  Since all I want to do is reprint
> > > the book -- I don't need to use the unusual characters in interchange --
> > > the PUA and a commissioned font seem just perfect to me.


    I don't think "all I want to do is reprint the book" is a reasonable constraint upon future usage.  Reprinting the book brings with it the potential for its special characters to gain currency, even if only in the context of discussing the book.

> > I'm not
> > even sure you can trust a commissioned font to be installable on the operating
> > systems of the next few decades.


    Font support has only improved with time.  What causes you to foresee a sharp reversal?



/|/|ike





Re: Bantu click letters

2004-06-10 Thread Kenneth Whistler

> > Simply because some images appear in some
> > documents does not mean that they automatically should be represented as encoded
> > characters.
>
> These aren't images. They're clearly letters; they occur in running texts and represent
> the sounds of a spoken language.

Well, I agree with that assessment. 

> If I were transcribing them, I wouldn't encode them
> as pictures; I would encode them as PUA elements or XML elements (which are usually
> easier to use and more reliable than the PUA).

And with that assessment, as well.

> I'll admit that it's a bit sketchy encoding these characters based on one article by
> one author. But I think it important to remember that more and more text is available
> online, even stuff that might never get reprinted in hardcopy, and that needs Unicode.

And in general, I can't find fault with that, either.

But the argument in this particular case hinges on a particular,
nonce set of characters. We have this one scholar, who invented
a bunch of characters in the 20's to represent click sounds that nobody
was doing justice to at that point, either in understanding their
phonetics or making sufficiently accurate distinctions in their
recording. Bully for Dokes -- it was an important advance in the
field of Khoisan studies and the phonetics of clicks. But even
though he published his analysis, using his characters, nobody
else chose to adopt his character conventions. Subsequent scholars,
and the IPA, chose *other* characters to represent the distinctions
involved, in part because Dokes' inventions were just weird and
hard to use, as well as neither (in my opinion) mnemonic nor
aesthetically pleasing.

Well, we've encoded ugly letters for ugly orthographies in ugly
scripts before. That isn't the issue. But the non-use of these
forms is.

It comes down then to a *prospective* claim that someone *might*
want to digitize the classic Dokes publication and that if they
did so they would require that the particular set of weird
phonetic letters used by Dokes would have to be representable
in Unicode plain text in order for that one publication to be
made available electronically. (Or a few other publications that
might cite Dokes verbatim, of course.)

Well, in terms of requirements, I consider that more than a little
cart before the horse. I'd be more sympathetic if someone was
actually *trying* to do this and had a technical problem with
representing the text accurately for an online edition which was
best resolved by adding a dozen characters to the Unicode Standard.
Then, at least there would be a valid *use* argument to be made,
as opposed to a scare claim that 50 years from now someone *might*
want to do this and not be able to if we don't encode these
characters right now.

Right *now* anyone could (if they had the rights) put a version of
Dokes online using pdf and an embedded font, and it would be perfectly
referenceable for anyone wanting access to the content of the
document. True, the dozen or so "weird" characters in the
orthography wouldn't have standard encodings, so searching inside
the document for them wouldn't be optimal. But is the burden that
might place on the dozen or so Khoisan orthographic historians and
phonetic historians who might actually be interested in doing so
out of scale with the burden placed permanently on the standard
itself for adding a dozen or so nonce characters for that *one*
document? After all those historians and scholars today are
basically using the document in its printed-only (out-of-print)
hard copy format, and we aren't exactly worried about the difficulties
that *that* poses them, now are we?

I might point out at this point that the Unicode Standard itself is
published online using non-standard encodings for many of its
textual examples, simply because of the limitations of FrameMaker
and PDF and fonts and the specialized requirements of citing lots
and lots of characters outside normal text contexts. But I don't
hear people yelling that the online Unicode Standard is crippled for
use by people who wish to refer to it because you can't do an
automated search for  in it which will accurately find all
instances of Devanagari ksha in the text.

And the *database* arguments just don't cut it. If anybody is seriously
going to be using Dokes materials in comparative Khoisan studies,
they will *normalize* the material in their text databases.
After all, this is just one of a large variety of really varied
material, in all kinds of orthographies, and in all levels of
detail and quality. Arguing for making these particular dozen
nonce characters searchable by giving them standard Unicode values
just doesn't cut it for me, because if I were going to do that kind
of work, a significant amount of philological work would be required
to "massage" the data into comparable formats, anyway, and use of
intermediate normalized conventions would not be a problem -- in fact,
it would almost be mandatory.

Finally, if so

Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 15:34 -0700 2004-06-10, Mark Davis wrote:
> This argument does not hold water. Simply because some images appear
> in some documents does not mean that they automatically should be
> represented as encoded characters. Many images are not appropriate
> for use in plain text, or have too small a user community. They
> should be represented as private use characters, or as literal
> images. The Prince glyph, on-beyond-zebra characters, the images on
> http://www.aperfectworld.org/animals.htm, etc. are in
> quite a number of documents, but that doesn't mean that any of them
> necessarily qualify as characters for encoding.

Mark, come on. Doke's phonetic transcription of !Xung is a set of
explicit glyphs representing specific sounds, indeed more precisely
than IPA allows (I don't think IPA specifies a representation for
retroflex clicks). Apart from the question whether or not the
characters are important enough for people to want to be able to
interchange them as encoded UCS characters (which is stipulated as a
question), it's just not on to say that these are the same kinds of
things as Prince's logo or the Seussian extensions.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread D. Starner
> But Gutenberg may not care: they mostly (now exclusively?) publish texts
> in the public domain.

We publish anything previously published we can get permission on, but since 
we can't afford to pay for anything, we're primarily public domain. In any
case, we have decades of the Reports of the Bureau of American ethnology
plus many more public domain works of linguistics, so we really don't need to 
ask for more text.

(This is really getting off topic, though.)
-- 
___
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm





Re: Bantu click letters

2004-06-10 Thread Mark Davis
This argument does not hold water. Simply because some images appear in some
documents does not mean that they automatically should be represented as encoded
characters. Many images are not appropriate for use in plain text, or have too
small a user community. They should be represented as private use characters, or
as literal images. The Prince glyph, on-beyond-zebra characters,  the images on
http://www.aperfectworld.org/animals.htm, etc. are in quite a number
of documents, but that doesn't mean that any of them necessarily qualify as
characters for encoding.

Mark
__
http://www.macchiato.com

- Original Message - 
From: "D. Starner" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thu, 2004 Jun 10 13:46
Subject: Re: Bantu click letters


> John Cowan <[EMAIL PROTECTED]> writes:
>
> > We must be talking past one another somehow, but I don't understand how.
> > To represent the text as originally written, I need a digital representation
> > for each of the characters in it.  Since all I want to do is reprint
> > the book -- I don't need to use the unusual characters in interchange --
> > the PUA and a commissioned font seem just perfect to me.
>
> But that doesn't work if you're reprinting to XML or HTML, where you can't
> rely upon a commissioned font being installed and correctly used. I'm not
> even sure you can trust a commissioned font to be installable on the operating
> systems of the next few decades.
>




Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 16:24 -0400 2004-06-10, [EMAIL PROTECTED] wrote:
> Asmus Freytag scripsit:
>
> > That doesn't mean that we stop asking all the hard questions, but that we
> > allow a presumption of usefulness for characters that were in demonstrated
> > use over some time and by several authors.
>
> I quite agree.  Here, however, we have (as far as the evidence goes) a
> single use by a single author.

Many characters have been encoded with just as much.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Bantu click letters

2004-06-10 Thread D. Starner
> Simply because some images appear in some
> documents does not mean that they automatically should be represented as encoded
> characters.

These aren't images. They're clearly letters; they occur in running texts and represent
the sounds of a spoken language. If I were transcribing them, I wouldn't encode them 
as pictures; I would encode them as PUA elements or XML elements (which are usually
easier to use and more reliable than the PUA). I don't think any transcriber would
treat them as images (maybe display them as images, but that's purely presentational.)
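
To make the PUA option concrete, here is a rough Python sketch of how a transcriber might flag private use code points in a transcription. The choice of U+E000 as a stand-in for one of the click letters is purely hypothetical; no such assignment exists in any standard or agreement I know of.

```python
import unicodedata

def is_private_use(ch):
    """True if ch is a Private Use code point (general category Co)."""
    return unicodedata.category(ch) == "Co"

# Hypothetical assignment: U+E000 stands in for one of the click letters.
HYPOTHETICAL_CLICK = "\uE000"

def private_use_positions(text):
    """Return the indices of all private use code points in text."""
    return [i for i, ch in enumerate(text) if is_private_use(ch)]

print(private_use_positions("a" + HYPOTHETICAL_CLICK + "b"))
```

Software can tell that such a code point is private use, but not what it means; that still depends on a private agreement and a matching font.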

I'll admit that it's a bit sketchy encoding these characters based on one article by
one author. But I think it important to remember that more and more text is available
online, even stuff that might never get reprinted in hardcopy, and that needs Unicode.





RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
In light of Ken's reply it's probably not worthwhile going into the details
on all points of your answer. However there are a few points where, like
John, I feel you and I are simply talking past each other. Let me pick just
one item:

At 01:07 PM 6/10/2004, Michael Everson wrote:
> > > In any case -- and I think this is the precedent I am looking for -- this
> > > is a "script" capital Q in the same way that U+0261 is a script g. It is
> > > **not** unified with U+210A SCRIPT SMALL G.
> >
> > It's not a precedent, since the use of the word 'script' has different
> > meaning in both cases.
>
> No, it doesn't. Your mathematical "script" has a meaning which is
> different from the one which applies to the IPA [g] and from the one I had
> in mind when I named the character.

The early namers used the term 'script' rather indiscriminately. For
example they applied it to 2118 which they called SCRIPT CAPITAL P, even
though, typographically it's a calligraphic lower case p and would have
been better called *WEIERSTRASS ELLIPTIC FUNCTION (that is now annotated in
the names list). Similarly, the character at 2113 so called SCRIPT SMALL L
is now annotated as

= mathematical symbol 'ell'
* despite its character name, this symbol is derived from a
  special italicized version of the small letter l
since that's what it is. We've in fact had to add a separate MATHEMATICAL 
SMALL SCRIPT L since. Similarly, the letters 0251 for which the Unicode 1.0 
name was LATIN SMALL LETTER SCRIPT A and 0261 are not 'script' forms in the 
same way as used correctly for e.g. 2130, 2131, etc. in the Letterlike 
Symbols block.
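
For reference, the names and compatibility decompositions of the code points discussed above can be listed with a short Python sketch. The describe helper is my own illustrative name; the data comes from whatever Unicode version the local Python ships with, and U+1D4C1 is my assumption for the separately added mathematical script l (the database lists it as MATHEMATICAL SCRIPT SMALL L).

```python
import unicodedata

def describe(cp):
    """Return (character name, decomposition mapping) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# Code points mentioned in this message, plus U+1D4C1 for comparison.
code_points = [0x2118, 0x2113, 0x0251, 0x0261, 0x210A, 0x2130, 0x2131, 0x1D4C1]

for cp in code_points:
    name, decomposition = describe(cp)
    print(f"U+{cp:04X}  {name}  decomposition: {decomposition or '(none)'}")
```

The letterlike 'script' symbols carry a font-style compatibility decomposition to the plain letter, while the phonetic letters such as 0261 have none, which is one mechanical reflection of the naming inconsistency.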

The mathematical alphanumerics are simply additional instances of 
letterlike symbols. If we can unify the historic symbol for Mark used in 
Germany with 2133, even though its shape allows less variation than that 
allowed for mathematical script fonts, we can certainly unify other uses 
that are letter-like.

Sometimes I suspect that the fact that you are not conversant with 
mathematical notation, but very familiar with linguistic notations, makes 
you treat these two as worlds apart. However, both are specialized 
technical notations, and both share the feature that if you changed the 
font on any letter sufficiently far, you would destroy the meaning.

In that way, both are different from regular 'language text' where you can 
transpose the text into different font styles, and preserve the meaning.

A./





Re: Bantu click letters

2004-06-10 Thread John Cowan
Michael Everson scripsit:

> Unless one contacted whomever it is who owns "Bantu Studies" and 
> simply *asked*.

Carfax (part of the Taylor and Francis Group).

Here's contact information:

Reprints, permissions + electronic rights   

Joanne Nerland
Taylor & Francis
PO Box 2562 Solli
N-0202 Oslo
Norway
+ 47 22 12 9880
or: +47 22 12 9884
Mobile: +47 90 11 3974
+47 22 12 9890

But Gutenberg may not care: they mostly (now exclusively?) publish texts
in the public domain.

-- 
John Cowan  http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values        Check your assumptions.  In fact,
   at the front desk.              check your assumptions at the door.
     --sign in Paris hotel           --Cordelia Vorkosigan



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 13:35 -0800 2004-06-10, D. Starner wrote:
> > > Due to the latest US copyright extensions, it will take us a couple
> > > decades, but we'll want to transcribe this article.
> >
> > In 2050.  I wouldn't worry about it.
>
> It's 95 years from publication, so it's 2022. In any case, it's
> entirely likely that some commercial organization will license these
> and start digitally transcribing old linguistics documents for sale
> to libraries. And I hardly see how the issues will change in the
> next 18 years.

Unless one contacted whomever it is who owns "Bantu Studies" and
simply *asked*.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: Asmus Freytag [mailto:[EMAIL PROTECTED]


> However, sometimes we have single citations where
> we don't believe (for other reasons) that they are the only existing
> ones, just the only ones found so far.

True; I did mention that possibility at some point.

 
> Then there is the issue brought up by D. Starner: is a work sufficiently
> interesting that digital archivers like Project Gutenberg would be interested
> in it.

Yes, that would be a consideration.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Re: Bantu click letters

2004-06-10 Thread jcowan
D. Starner scripsit:

> There's at least a small user community; those people who are actively
> transcribing old works, like Project Gutenberg. Due to the latest US
> copyright extensions, it will take us a couple decades, but we'll want
> to transcribe this article.

In 2050.  I wouldn't worry about it.

-- 
Do what you will,                   John Cowan
   this Life's a Fiction            [EMAIL PROTECTED]
And is made up of                   http://www.reutershealth.com
   Contradiction.  --William Blake  http://www.ccil.org/~cowan



RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 12:08 PM 6/10/2004, Michael Everson wrote:
At 11:53 -0700 2004-06-10, Asmus Freytag wrote:
It was understood that the mathematical symbols were not to be used in 
language text.
What was understood is that if you need a run of text in a script font 
you wouldn't use these characters, but would use markup. But if you 
needed an isolated, out of context shape, where the font style has 
semantic meaning, you would use these characters. That's precisely the 
case here.
Not so.
That's a statement, not an argument. Nor does it address my contention that 
the phonetic extensions (all of them) that are styled Latin characters are 
in fact equivalent to mathematical usage in that in both cases you have a 
letter form that carries specific semantics based on what otherwise would 
be font style.


There's no need to have yet another clone.
I disagree. Leave the math characters, please, to the math fonts. For
instance, the flowery style we use now for the math block is way too
italic for harmonization with the use of the character in a phonetic context.
This is a glyphic argument that doesn't hold water. The font you use is 
well within the range of 'script' fonts that can be used for mathematical 
use. In fact our font is not even the best script font of that purpose.

There is nothing magically different about mathematical usage. 
Mathematicians will be happy to use any of the existing phonetic letters if 
and when the fancy strikes them. Now that Unicode is widespread I wouldn't 
be surprised if there weren't any mathematicians already spelunking...

I am also not very happy opening the door to splitting Latin characters 
off into Plane 1.
That's an argument of convenience. The BMP will be full at some point in
the very near future, and then there will be no choice. Opening the door
for a historic extension makes more sense than for a commonly used modern
orthography.

I will be perfectly happy to rename the character LATIN LETTER VOICED
PALATOALVEOLAR CLICK. It doesn't have an upper case property anyway.
That's just hiding the issue.
In any case -- and I think this is the precedent I am looking for -- this 
is a "script" capital Q in the same way that U+0261 is a script g. It is 
**not** unified with U+210A SCRIPT SMALL G.
It's not a precedent, since the use of the word 'script' has different 
meaning in both cases. The early namers didn't have your benefit and 
applied these labels haphazardly. Look no further than 2118 !!  In at least 
one case I suspect that a character named 'script' was actually intended 
for an *italic* shape.

A./ 




RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 01:04 PM 6/10/2004, Peter Constable wrote:
> > That doesn't mean that we stop asking all the hard questions, but that we
> > allow a presumption of usefulness for characters that were in demonstrated
> > use over some time and by several authors.
>
> But it is precisely that status that is called into question here.
> Unless your definition of "several" is '>=1'.

I realize that. However, sometimes we have single citations where
we don't believe (for other reasons) that they are the only existing
ones, just the only ones found so far.
Then there is the issue brought up by D. Starner: is a work sufficiently
interesting that digital archivers like Project Gutenberg would be interested
in it. I don't have an opinion on the merits of this particular set of
characters, but I suspect there are many Han characters that equally
represent nonce usage...
A./ 




Re: Bantu click letters

2004-06-10 Thread D. Starner
John Cowan <[EMAIL PROTECTED]> writes:

> We must be talking past one another somehow, but I don't understand how.
> To represent the text as originally written, I need a digital representation
> for each of the characters in it.  Since all I want to do is reprint
> the book -- I don't need to use the unusual characters in interchange --
> the PUA and a commissioned font seem just perfect to me.

But that doesn't work if you're reprinting to XML or HTML, where you can't
rely upon a commissioned font being installed and correctly used. I'm not
even sure you can trust a commissioned font to be installable on the operating
systems of the next few decades.






Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 16:21 -0400 2004-06-10, [EMAIL PROTECTED] wrote:
> > You don't KNOW that. You assert that. This is the "adversarial" style
> > I was objecting to, John. Could you please take this on board?
>
> Fair enough, Michael.  But the burden of going forward with the evidence
> is still yours.  (I'll do what I can.)

I have shown (1) that they exist, (2) that they have specific usage.
I have not shown them in a second document, though I have shown that
Pullum & Ladusaw have quoted one word in Doke's orthography, spelling
it with his peculiar use of diacritics. I would be happy to find a
second use of the letters, but I consider the usefulness of being
able to cite Doke in the original to be perfectly legitimate. Let's
see what turns up in NYPL and LOC.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread D. Starner
> > Due to the latest US
> > copyright extensions, it will take us a couple decades, but we'll want
> > to transcribe this article.
> 
> In 2050.  I wouldn't worry about it.

It's 95 years from publication, so it's 2022. In any case, it's entirely likely
that some commercial organization will license these and start digitally transcribing
old linguistics documents for sale to libraries. And I hardly see how the issues will
change in the next 18 years.






RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 12:38 -0800 2004-06-10, D. Starner wrote:
"Peter Constable" <[EMAIL PROTECTED]> writes:
 If
 > the small n with left loop is not accepted, it will be because it was a
 > proposal that never gained currency and has no user community.
There's at least a small user community; those people who are actively
transcribing old works, like Project Gutenberg. Due to the latest US
copyright extensions, it will take us a couple decades, but we'll want
to transcribe this article.
Hence the Universal Character Set and the effort I go to write up 
proposals for this kind of thing.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 16:21 -0400 2004-06-10, [EMAIL PROTECTED] wrote:
 > HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED
 LUNATE SIGMA that's under ballot?
Yes, except these letters are Latin letters (indeed, letters used to
write the Latin language).  You if anyone should be against unifying them
with Greek letters, particularly since they were applied for purposes
very different from those of sigma or heta.
Then I am not sure what you are talking about (I don't know the Latin 
versions of these), but please take this up with me in July. My brain 
is full.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread jcowan
Asmus Freytag scripsit:

> That doesn't mean that we stop asking all the hard questions, but that we 
> allow a presumption of usefulness for characters that were in demonstrated 
> use over some time and by several authors.

I quite agree.  Here, however, we have (as far as the evidence goes) a
single use by a single author.

-- 
In politics, obedience and support      John Cowan <[EMAIL PROTECTED]>
are the same thing.  --Hannah Arendt     http://www.ccil.org/~cowan



RE: Bantu click letters

2004-06-10 Thread D. Starner
"Peter Constable" <[EMAIL PROTECTED]> writes:

> If
> the small n with left loop is not accepted, it will be because it was a
> proposal that never gained currency and has no user community.

There's at least a small user community; those people who are actively
transcribing old works, like Project Gutenberg. Due to the latest US
copyright extensions, it will take us a couple decades, but we'll want
to transcribe this article.






Re: Bantu click letters

2004-06-10 Thread jcowan
Michael Everson scripsit:

> You don't KNOW that. You assert that. This is the "adversarial" style 
> I was objecting to, John. Could you please take this on board?

Fair enough, Michael.  But the burden of going forward with the evidence
is still yours.  (I'll do what I can.)

> But it is QUITE another thing for you to come out 
> and say that there are no other documents which make use of the same 
> characters.

Quite so, and I retract all remarks implying that.

> >In their day, there were probably a lot more documents using LATIN 
> >CAPITAL LETTER ANTISIGMA and LATIN CAPITAL LETTER H LEFT HALF than 
> >one, yet they are not encoded either.
> 
> HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED 
> LUNATE SIGMA that's under ballot?

Yes, except these letters are Latin letters (indeed, letters used to
write the Latin language).  You if anyone should be against unifying them
with Greek letters, particularly since they were applied for purposes
very different from those of sigma or heta.

-- 
Newbies always ask: John Cowan
  "Elements or attributes?  http://www.ccil.org/~cowan
Which will serve me best?"  http://www.reutershealth.com
  Those who know roar like lions;   [EMAIL PROTECTED]
  Wise hackers smile like tigers.   --a tonka, or extended haiku



RE: Some thoughts on encoding specialized notations: was RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: Asmus Freytag [mailto:[EMAIL PROTECTED]

> Any notation for a highly specialized subject would always tend to suffer
> from a very small number of participants. This is not a-priori a reason to
> force this notation into private use.

Just to clarify: I have not at any point contended that the characters
in Michael's proposal must be considered PUA. I simply commented that I
had expected something with such little usage would be contested, which
by implication raised the question as to whether these characters should
be encoded in spite of their very limited usage.

In relation to that question, your suggestion

> One of our goals in this direction
> would be to enable publishers to support online editions of a large number
> of fields without running into a hodge-podge of supported vs. non-supported
> characters.

seems to me to be worth consideration.


> For historical notations, issues are different. If a modern notation has
> completely replaced the historical notation, it should be treated in
> the same manner as archaic scripts, that is, the focus should be on what's
> needed or useful to support historians of the discipline. If a notation was
> widespread before being supplanted, that would strengthen the case for
> supporting it, as the likelihood that symbols will be referenced in modern
> contexts is that much greater.

In this particular case, the notation was clearly not in widespread use.
The question then is whether it would be useful to linguists or
documenters of the history of linguistics. So far after 80 years, there
is no known indication that linguists have a use for these; Pullum and
Ladusaw were, in part, the latter, and did not find these in need of
documentation. Of course, that does not imply that other documenters
have no need, and there may be linguists for whom these would be useful
that are simply not known to us.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Asmus Freytag

> As a matter of basic parity, I just don't
> see why we take such great pains to standardize extremely rare forms of Han
> ideographs, but baulk at supporting our own writing system and its
> extensions equally faithfully.

Point taken.


> That doesn't mean that we stop asking all the hard questions, but that we
> allow a presumption of usefulness for characters that were in demonstrated
> use over some time and by several authors.

But it is precisely that status that is called into question here.
Unless your definition of "several" is '>=1'.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 12:50 -0700 2004-06-10, Asmus Freytag wrote:
That's a statement, not an argument. Nor does it address my 
contention that the phonetic extensions (all of them) that are 
styled Latin characters are in fact equivalent to mathematical usage 
in that in both cases you have a letter form that carries specific 
semantics based on what otherwise would be font style.
"Style" per se is applied to Mathematical characters, regularly, and 
meaningfully. Just because "script" is part of the name of U+0261 
does not mean that "style" as in HTML markup is what makes it look 
like that; that's just not the case, and you can't read that much 
into the name. It is for the same reason that I chose the name 
"script" when ***I*** named the voiced palatoalveolar click. I 
recognized its shape as similar to some forms of script Q. There are 
other forms of script Q. I did not consider it to be "styled" in the 
same way.

By the way, I *made* the glyph out of U+0541 ARMENIAN LETTER JA, and 
it looks a lot more like that than a script Q, so PLEEEAASE let's not 
jump overboard on a crusade to unify this character with a 
mathematical character, OK? To do so would be really very silly.

I disagree. Leave the math characters, please, to the math fonts. 
For instance, the flowery style we use now for the math block is 
way too italic for harmonization with the use of the character in 
a phonetic context.
This is a glyphic argument that doesn't hold water. The font you use 
is well within the range of 'script' fonts that can be used for 
mathematical use. In fact our font is not even the best script font 
of that purpose.
I'm aware of that; I still do not think we should start encouraging 
linguists to go off into the mathematical characters and press them 
into service for phonetics. Letters are letters.

There is nothing magically different about mathematical usage. 
Mathematicians will be happy to use any of the existing phonetic 
letters if and when the fancy strikes them. Now that Unicode is 
widespread I wouldn't be surprised if there weren't any 
mathematicians already spelunking...
Mathematicians can do what they like.
That's an argument of convenience. The BMP will be full at some 
point in the very near future, and then there will be no choice. 
Opening the door for a historic extension makes more sense than 
for a commonly used modern orthography.
There is no value to unifying this with the maths character just 
because *I* named it that way for reasons which you misconstrue.

I will be perfectly happy to rename the character LATIN LETTER 
VOICED PALATOALVEOLAR CLICK. It doesn't have an upper case property 
anyway.
That's just hiding the issue.
No, it's not. There is nothing particularly Q-like about the 
character in question; it's more JA-like anyway. It was a superficial 
identification I made; had I simply named it VOICED PALATOALVEOLAR 
CLICK, we would probably not be having this conversation.

In any case -- and I think this is the precedent I am looking for 
-- this is a "script" capital Q in the same way that U+0261 is a 
script g. It is **not** unified with U+210A SCRIPT SMALL G.
It's not a precedent, since the use of the word 'script' has 
different meaning in both cases.
No, it doesn't. Your mathematical "script" has a meaning which is 
different from the one which applies to the IPA [g] and from the one 
I had in mind when I named the character.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 03:47 AM 6/10/2004, Michael Everson wrote:
At 00:11 -0400 2004-06-10, Ernest Cline wrote:
 > [Original Message]
 From: Michael Everson <[EMAIL PROTECTED]>
 Practice your tongue-twisting.
 Proposal to add Bantu phonetic click characters to the UCS
 http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf
Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q
work for the script capital Q?  At the very least I feel that should
be explained.
It was understood that the mathematical symbols were not to be used in 
language text.
Well this isn't 'language text' as it would be for a modern orthography, 
but specialized notation. I see no reason to rule out a unification in this 
case. The general category of this character is 'Lu', letter upper. It does
*not* have a case mapping to /script small q/ but that would be correct as
there's no case pair for the proposed character.

If this was the case of a modern orthography, where case pairs would be 
needed, and where development of font and usage could take place over time 
in unanticipated directions, then a disunification would make a lot more 
sense and I would support it. But for this very limited and technical 
notation, it appears unwarranted.

A./




Re: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 07:00 AM 6/10/2004, John Cowan wrote:
(LATIN LETTER OWL, indeed.)
This is an interesting symbol as a fairly similar symbol is used in Japan 
to annotate phone numbers - if I correctly understand those that have a 
taped message or automated response system.

We don't have a symbol for the latter in Unicode, but a quick look at 
modern Japanese material finds instances of this quickly.

The problem in adding a letter owl by itself is that it invites incorrect, 
shape based mappings from East Asian sets or fonts. We would be better off 
if we could pair the letter owl with a simultaneous but separate addition 
of a technical symbol.

A./ 




RE: Some thoughts on encoding specialized notations: was RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 12:08 PM 6/10/2004, Peter Constable wrote:
> From: Asmus Freytag [mailto:[EMAIL PROTECTED]
> Any notation for a highly specialized subject would always tend to suffer
> from a very small number of participants. This is not a-priori a reason to
> force this notation into private use.
Just to clarify: I have not at any point contended that the characters
in Michael's proposal must be considered PUA. I simply commented that I
had expected something with such little usage would be contested, which
by implication raised the question as to whether these characters should
be encoded in spite of their very limited usage.
I think that was someone else..
In relation to that question, your suggestion
> One of our goals in this direction
> would be to enable publishers to support online editions of a large number
> of fields without running into a hodge-podge of supported vs. non-supported
> characters.
seems to me to be worth consideration.
I then wrote in the original thread:
To represent the text as originally written, I need a digital representation
for each of the characters in it.  Since all I want to do is reprint
the book -- I don't need to use the unusual characters in interchange --
the PUA and a commissioned font seem just perfect to me.
In the modern world many forms of publication require interchange. For 
example, anything that's HTML based does poorly with non-standardized 
characters. So does storage in databases. If you can conceive of a digital 
re-edition of a prominent work (including citation from) and can assume 
that there's some realistic chance that technologies other than facsimile 
or PDF would be brought to bear, then you have the interchange 
requirement, even if no one uses the notation for new text.

Over time, I'm becoming more supportive of Michael's stance of 
inclusiveness in that direction. As a matter of basic parity, I just don't 
see why we take such great pains to standardize extremely rare forms of 
Han ideographs, but baulk at supporting our own writing system and its 
extensions equally faithfully.
but this would belong better as part of this more generic discussion.
> For historical notations, issues are different. If a modern notation has
> completely replaced the historical notation, it should be treated in
> the same manner as archaic scripts, that is, the focus should be on what's
> needed or useful to support historians of the discipline. If a notation was
> widespread before being supplanted, that would strengthen the case for
> supporting it, as the likelihood that symbols will be referenced in modern
> contexts is that much greater.
In this particular case, the notation was clearly not in widespread use.
The question then is whether it would be useful to linguists or
documenters of the history of linguistics. So far after 80 years, there
is no known indication that linguists have a use for these; Pullum and
Ladusaw were, in part, the latter, and did not find these in need of
documentation. Of course, that does not imply that other documenters
have no need, and there may be linguists for whom these would be useful
that are simply not known to us.
These are good questions. But remember, the notation in question is also 
limited in another way: it applies to features not shared by many 
languages. I'm not an expert enough to know whether that adds another level 
of rarity, because it means the potential number of users of these 
characters was always limited, then and now.

But let's consider an extreme, but for now hypothetical example. Assume a 
seminal work, for example comparable to Newton's works, that spawns an 
entire field or discipline. If such a work used notation that was quickly 
replaced by something else, it would still be useful to consider it for its 
historic aspect, even if only one author used it - as the presumption that 
such a work and its notation will be cited or explained by historians is 
clearly quite strong.

A./ 




Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 12:15 -0700 2004-06-10, Asmus Freytag wrote:
Over time, I'm becoming more supportive of Michael's stance of 
inclusiveness in that direction. As a matter of basic parity, I just 
don't see why we take such great pains to standardize extremely rare 
forms of Han ideographs, but baulk at supporting our own writing 
system and its extensions equally faithfully.
Thank you.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Bantu click letters

2004-06-10 Thread Asmus Freytag
At 07:46 AM 6/10/2004, John Cowan wrote:
To represent the text as originally written, I need a digital representation
for each of the characters in it.  Since all I want to do is reprint
the book -- I don't need to use the unusual characters in interchange --
the PUA and a commissioned font seem just perfect to me.
In the modern world many forms of publication require interchange. For 
example, anything that's HTML based does poorly with non-standardized 
characters. So does storage in databases. If you can conceive of a digital 
re-edition of a prominent work (including citation from) and can assume 
that there's some realistic chance that technologies other than facsimile or 
PDF would be brought to bear, then you have the interchange requirement, 
even if no one uses the notation for new text.

Over time, I'm becoming more supportive of Michael's stance of 
inclusiveness in that direction. As a matter of basic parity, I just don't 
see why we take such great pains to standardize extremely rare forms of Han 
ideographs, but baulk at supporting our own writing system and its 
extensions equally faithfully.

That doesn't mean that we stop asking all the hard questions, but that we 
allow a presumption of usefulness for characters that were in demonstrated 
use over some time and by several authors.

A./ 




Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 13:50 -0400 2004-06-10, [EMAIL PROTECTED] wrote:
Michael Everson scripsit:
 You have a weird view of the history of phonetics, John. You haven't
 addressed the substantive issue: these are Latin characters used to
 represent sounds which in 1925 could not easily be represented. 
And never have been represented thus since.
You don't KNOW that. You assert that. This is the "adversarial" style 
I was objecting to, John. Could you please take this on board?

It is one thing for me to make a proposal with evidence from one 
document and have it questioned. (I have on many other occasions 
proposed archaic phonetic characters with as much evidence and had 
them accepted, which is one reason I think the grilling is a bit 
gratuitous here.) But it is QUITE another thing for you to come out 
and say that there are no other documents which make use of the same 
characters.

In their day, there were probably a lot more documents using LATIN 
CAPITAL LETTER ANTISIGMA and LATIN CAPITAL LETTER H LEFT HALF than 
one, yet they are not encoded either.
HETA is on my to-do list. Isn't ANTISIGMA the GREEK CAPITAL REVERSED 
LUNATE SIGMA that's under ballot?

 > Indeed, there are click letters like the STRETCHED C
 which did get into IPA and were later deprecated. So you can
 represent the STRETCHED C in chu: as Doke writes it (as do Pullum and
 Ladusaw, using Doke's diacritics as well) but you can't represent
 Doke's other letters? This doesn't make sense.
It makes sense because others used STRETCHED C (and indeed it was
part of the standard for a while), but no one has used OWL before or since.
Prove it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 11:32 -0700 2004-06-10, Peter Constable wrote:

We're talking about the same group of languages. I believe they use
similar orthographies.
 > Also: What about upper case forms?
The uppercase of !xhosa is !Xhosa. Uppercase versions of phonetic
symbols are a concern only if the phonetic symbols gain currency, which
is not the case here.
True. Though it would be fun to draw some of them.
I believe, by the way, that chû: is now written 
!Xung though whether or not the speakers are 
literate and make use of a practical orthography

A couple of errors were corrected in the version 
which is on my web site (including the title); 
that document has been sent to UTC and WG2. In 
the first sentence it suggests that chû: is 
"Kxoe", but on further research this seems to be 
a different language. I think chû: is what the 
Ethnologue calls Kung-Ekoka (in Namibia).
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 11:53 -0700 2004-06-10, Asmus Freytag wrote:
It was understood that the mathematical symbols were not to be used 
in language text.
What was understood is that if you need a run of text in a script 
font you wouldn't use these characters, but would use markup. But if 
you needed an isolated, out of context shape, where the font style 
has semantic meaning, you would use these characters. That's 
precisely the case here.
Not so.
There's no need to have yet another clone.
I disagree. Leave the math characters, please, to the math fonts. For 
instance, the flowery style we use now for the math block is way 
too italic for harmonization with the use of the character in a 
phonetic context. I am also not very happy opening the door to 
splitting Latin characters off into Plane 1.

I will be perfectly happy to rename the character LATIN LETTER VOICED 
PALATOALVEOLAR CLICK. It doesn't have an upper case property anyway.

In any case -- and I think this is the precedent I am looking for -- 
this is a "script" capital Q in the same way that U+0261 is a script 
g. It is **not** unified with U+210A SCRIPT SMALL G.
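
The character data itself appears to bear this distinction out. As a rough sketch, assuming Python's standard unicodedata module, one can compare the two code points named above; the variable names are only for illustration.

```python
import unicodedata

ipa_g = "\u0261"   # LATIN SMALL LETTER SCRIPT G, the IPA letter
math_g = "\u210A"  # SCRIPT SMALL G, the letterlike (mathematical) symbol

for ch in (ipa_g, math_g):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "decomposition:", unicodedata.decomposition(ch) or "(none)")

# NFKC folds the mathematical letter to plain "g" but leaves the IPA letter alone.
print(unicodedata.normalize("NFKC", math_g) == "g")
print(unicodedata.normalize("NFKC", ipa_g) == ipa_g)
```

The mathematical character carries a <font> compatibility decomposition to an ordinary g, while the phonetic letter has none, so the two behave differently under compatibility normalization.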
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Asmus Freytag

It was understood that the mathematical symbols were not to be used in 
language text.

What was understood is that if you need a run of text in a script font you 
wouldn't
use these characters, but would use markup. But if you needed an isolated, 
out of
context shape, where the font style has semantic meaning, you would use 
these characters.

That's precisely the case here. There's no need to have yet another clone.
A./




Some thoughts on encoding specialized notations: was RE: Bantu click letters

2004-06-10 Thread Asmus Freytag
Any notation for a highly specialized subject would always tend to suffer 
from a very small number of participants. This is not a-priori a reason to 
force this notation into private use. One of our goals in this direction 
would be to enable publishers to support online editions of a large number 
of fields without running into a hodge-podge of supported vs. non-supported 
characters.

This issue is squarely faced by mathematicians all the time (in fact, 
mathematicians and linguists are very similar in their voraciousness of 
pressing unrelated or novel symbols into use in extending their notations to 
new sub-fields).

If a notational extension is very new, and not widely adopted, it makes 
sense holding off on permanently adding characters to support it -- until 
it is more widely established.

For historical notations, issues are different. If a modern notation has 
completely replaced the historical notation, it should be treated in 
the same manner as archaic scripts, that is, the focus should be on what's 
needed or useful to support historians of the discipline. If a notation was 
widespread before being supplanted, that would strengthen the case for 
supporting it, as the likelihood that symbols will be referenced in modern 
contexts is that much greater.

If occasional use or reference to the historic notation can be documented, 
then it would be more appropriate to treat it like a rare script, or like 
historic additions to modern scripts, which see occasional use.

If there's known ongoing use, or documented recent citations of older 
notation, then it's really a case of modern use of a specialized notation 
and it should be treated like that.

A./



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Anto'nio Martins-Tuva'lkin

> Something else: What is the usual spelling for these phonemes in
> today's orthography? Clicks in Xhosa and Zulu are spelt nowadays with
> usual Latin letters (c, q, x etc.).

We're talking about the same group of languages. I believe they use
similar orthographies.

 
> Also: What about upper case forms?

The uppercase of !xhosa is !Xhosa. Uppercase versions of phonetic
symbols are a concern only if the phonetic symbols gain currency, which
is not the case here.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Category of "Mathematic Alphanumeric Symbols" (was: "Re: Bantu click letters")

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 11:47, Michael Everson <[EMAIL PROTECTED]> answered:

>> Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q work for the
>> script capital Q?  At the very least I feel that should be
>> explained.
>
> It was understood that the mathematical symbols were not to be used
> in language text.

I thought the very same, but U+1D4AC's category is simply "Lu [Letter,
Uppercase]". In fact all math symbols of the 1D400-1D7FF block are
simply Lu or Ll (or Nd) and "can" be used as letters; I had somehow
assumed otherwise.
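
For anyone who wants to check this, a small sketch using Python's standard unicodedata module follows. The tally depends on the Unicode version bundled with the interpreter, and unassigned positions in the block show up as Cn, so it also shows whether anything besides Lu, Ll and Nd turns up.

```python
import unicodedata
from collections import Counter

q = "\U0001D4AC"
print(unicodedata.name(q), unicodedata.category(q))

# No case mapping is defined, so lower() returns the character unchanged.
print(q.lower() == q)

# Tally general categories across the block U+1D400..U+1D7FF.
tally = Counter(unicodedata.category(chr(cp)) for cp in range(0x1D400, 0x1D800))
print(dict(tally))
```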

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|




Re: Bantu click letters

2004-06-10 Thread jcowan
Michael Everson scripsit:

> You have a weird view of the history of phonetics, John. You haven't 
> addressed the substantive issue: these are Latin characters used to 
> represent sounds which in 1925 could not easily be represented.  

And never have been represented thus since.  In their day, there were
probably a lot more documents using LATIN CAPITAL LETTER ANTISIGMA
and LATIN CAPITAL LETTER H LEFT HALF than one, yet they are not
encoded either.  (Though LATIN CAPITAL LETTER TURNED F is.)

> Indeed, there are click letters like the STRETCHED C 
> which did get into IPA and were later deprecated. So you can 
> represent the STRETCHED C in chu: as Doke writes it (as do Pullum and 
> Ladusaw, using Doke's diacritics as well) but you can't represent 
> Doke's other letters? This doesn't make sense.

It makes sense because others used STRETCHED C (and indeed it was
part of the standard for a while), but no one has used OWL before or since.

-- 
John Cowan  http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Be yourself.  Especially do not feign a working knowledge of RDF where
no such knowledge exists.  Neither be cynical about RELAX NG; for in
the face of all aridity and disenchantment in the world of markup,
James Clark is as perennial as the grass.  --DeXiderata, Sean McGrath



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 17:11 +0100 2004-06-10, Anto'nio Martins-Tuva'lkin wrote:
What about U+0251 U+0361 U+0302 U+028A ? After a "double" diacritical,
any further combining character could take as its base the "pair" of
spacing characters "under" the said double diacritical, shouldn't it?
I tried that in TextEdit, which is pretty smart, and the second 
diacritic didn't centre over the pair, but rather over the 0251. But 
I guess that's the only choice, and it would be a question of making 
a precomposed glyph.

Note that, U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000
(see  attached) is not productively different from U+0251
U+0302 U+0361 U+028A (see  attached)...
OS X does it correctly. (Though I didn't see your gif.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson


> I have an offprint of Doke's article in Bantu Studies. We have noted
> that 70 years later Pullum and Ladusaw cite a word (the word
> stretchedc-h-utildecaronbelow-triangularcolon chu:) in Doke's
> orthography. Isn't that an indication that the work and its
> characters have not been lost to history?

It is, but it's not the stretched C that's been called into question.
There is no question that that character gained currency -- it was
adopted for a time by the IPA; so also did the qp ligature and db
ligature gain currency -- and those have been accepted for encoding. If
the small n with left loop is not accepted, it will be because it was a
proposal that never gained currency and has no user community.

 
> It's a little peculiar to suggest that data has to be printed in two
> books in order to be considered "interchangeable". Books don't
> interchange data between themselves. Users do. ;-)

Books are only indicators of the users; a lack of attestation in books
by anyone besides Doke is suggestive of a lack of a user community. P&L
clearly indicated that these characters were excluded from their
compilation because they never gained currency, and that strongly
suggests a lack of user community.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 03:28, Michael Everson <[EMAIL PROTECTED]> wrote:

> Proposal to add Bantu phonetic click characters to the UCS
> http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf

On page 10, Michael asks:

> UTC advice as to the correct encoding of these sequences would be
> welcome.

What about U+0251 U+0361 U+0302 U+028A ? After a "double" diacritical,
any further combining character could take as its base the "pair" of
spacing characters "under" the said double diacritical, shouldn't it?

Note that, U+0251 U+0361 U+0302 U+028A as given by BabelMap+Code2000
(see  attached) is not productively different from U+0251
U+0302 U+0361 U+028A (see  attached)...
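
One likely reason the two orders come out alike: U+0361 has canonical combining class 234 and U+0302 has 230, so canonical ordering sorts the circumflex ahead of the double diacritic and the two sequences are canonically equivalent. A minimal sketch with Python's standard unicodedata module (variable names are only illustrative):

```python
import unicodedata

seq_a = "\u0251\u0361\u0302\u028A"  # alpha, double inverted breve, circumflex, upsilon
seq_b = "\u0251\u0302\u0361\u028A"  # alpha, circumflex, double inverted breve, upsilon

for ch in ("\u0361", "\u0302"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "ccc =", unicodedata.combining(ch))

# Canonical ordering sorts adjacent marks by combining class, lowest first.
nfd_a = unicodedata.normalize("NFD", seq_a)
nfd_b = unicodedata.normalize("NFD", seq_b)
print([f"U+{ord(c):04X}" for c in nfd_a])
print(nfd_a == nfd_b)
```

If that holds, a renderer cannot rely on the order of the two marks to decide where the circumflex sits; a distinction would need something else, such as a precomposed glyph in the font.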

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|




Re: Bantu click letters

2004-06-10 Thread Anto'nio Martins-Tuva'lkin
On 2004.06.10, 15:14, John Wilcock <[EMAIL PROTECTED]> wrote:

> it seems to me that this information could be important for the
> proposal

Something else: What is the usual spelling for these phonemes in
today's orthography? Clicks in Xhosa and Zulu are spelt nowadays with
usual Latin letters (c, q, x etc.).

Also: What about upper case forms?

(And BTW: thanks, Michael, for one more!)
--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|




Re: Bantu click letters

2004-06-10 Thread jcowan
Peter Constable scripsit:

> Would you consider these too idiosyncratic?

No.  The "idio-" in "idiosyncratic" has to do with an individual.
I forgot to point this out earlier, but !Xu phonology isn't idiosyncratic
either -- it's just unusual.  To the !Xu it's the normal thing.

-- 
Is a chair finely made tragic or comic? Is the  John Cowan
portrait of Mona Lisa good if I desire to see   [EMAIL PROTECTED]
it? Is the bust of Sir Philip Crampton lyrical, www.ccil.org/~cowan
epical or dramatic?  If a man hacking in fury   www.reutershealth.com
at a block of wood make there an image of a cow,
is that image a work of art? If not, why not?   --Stephen Dedalus



Re: Bantu click letters

2004-06-10 Thread Patrick Andries
Patrick Andries wrote:
Michael Everson wrote:
Practice your tongue-twisting.
Proposal to add Bantu phonetic click characters to the UCS
http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf
:-P

Are these letters used in any other book than Doke's book on Kalahari 
Bushmen?

P. A.
[PA] I don't think I got a direct answer on whether these non-Bantu click 
symbols are used in any other book.
If these symbols are indeed used in a single book and by a single 
author, I would put them in the PUA; I don't see any interchange 
requirement to do otherwise. If letters unique to an author may now be 
encoded in Unicode, I have many to propose for the enabling technology 
that Unicode is, and people will be free to use them or not.

P.A.





Re: Bantu click letters

2004-06-10 Thread John Cowan
Michael Everson scripsit:

> > > Effort and expense was made to cut the letters for the publication.
> >
> >And today, if I were reprinting it, I'd commission a digital font
> >(your effort, my expense) and put the characters in the PUA.
> 
> Not if you wanted, as an Africanist, to be able to represent the text 
> as it was originally written.

We must be talking past one another somehow, but I don't understand how.
To represent the text as originally written, I need a digital representation
for each of the characters in it.  Since all I want to do is reprint
the book -- I don't need to use the unusual characters in interchange --
the PUA and a commissioned font seem just perfect to me.
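
To make the PUA option concrete, here is a minimal sketch, assuming Python's standard unicodedata module. The code point U+E000 is an arbitrary, hypothetical choice for one of the unencoded letters; nothing in this thread assigns it. The point is only that a private-use code point is identifiable as such (General_Category Co) and carries no agreed meaning beyond what the font and the reader's private agreement supply.

```python
import unicodedata

def is_private_use(ch: str) -> bool:
    """Return True if the single character ch has General_Category Co."""
    return unicodedata.category(ch) == "Co"

# Hypothetical assignment for a reprint; not an assignment made by anyone here.
HYPOTHETICAL_DOKE_LETTER = "\uE000"

# An already encoded click letter, for contrast: U+0297 LATIN LETTER STRETCHED C.
ENCODED_STRETCHED_C = "\u0297"

print(is_private_use(HYPOTHETICAL_DOKE_LETTER))
print(is_private_use(ENCODED_STRETCHED_C))
```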

> You don't know whether or not they were only used in a single 
> document. You know only that I *own* that single document. You are 
> declaring the characters guilty until proved innocent. That's 
> antagonistic.

I intend no antagonism.

We treat the Phaistos-disk characters as guilty until proven innocent,
for the same reason -- there's only one text.  (It's also true that
we can't interpret them, which is additional evidence against them.)
There's no *point* in encoding the PD characters because they aren't
used in interchange -- see above.

> >If I decided to start using thorn instead of theta in my otherwise
> >IPA transcriptions, that would be an idiosyncratic use of it.
> 
> Plenty of Germanist transcriptions use thorn. In any case, the 
> analogy isn't relevant, as both thorn and theta are encoded and 
> available for use.

I was talking about what it means to be idiosyncratic.  (Not that
either of us need any real instruction on the subject!)

> >(LATIN LETTER OWL, indeed.)
> 
> COMBINING SEAGULL BELOW, indeed.

LATIN LETTER OI, indeed.  :-)

> [OWL] is interesting, by the way. Asmus says it's similar to 
> something the Japanese use for telephone answering machines. I don't 
> know about that, though it looks familiar to me. I wonder what Doke's 
> source for it was.

It looks to me the sort of thing that would be easy to reinvent.
Some of my habitual doodles are much like it.

> I was astonished because I hadn't seen them before. That does not 
> mean I believed that they weren't worthy of encoding. Just 
> because I hadn't seen them before doesn't mean they don't exist and 
> aren't worthy of encoding either. Khoisan phonology is rather 
> esoteric, after all.

Sure.  I was addressing the question of the *novelty* of the characters.
If neither you nor I nor anyone else in this community has seen them
before, they are most certainly novel.

> I am gobsmacked. On what grounds are these not characters? They are 
> not glyph representations of other characters. 

They *are* characters.  It's just not useful to encode them, any more
than it's useful to encode most of the scripts in the Conscript Registry.

Find more documents, and the picture changes.  (Find more Phaistos-type
disks, and that picture changes too.)

-- 
If you have ever wondered if you are in hell, John Cowan
it has been said, then you are on a well-traveled http://www.ccil.org/~cowan
road of spiritual inquiry.  If you are absolutely   http://www.reutershealth.com
sure you are in hell, however, then you must be [EMAIL PROTECTED]
on the Cross Bronx Expressway.  --Alan Feuer, NYTimes, 2002-09-20



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 08:51 -0700 2004-06-10, Patrick Andries wrote:
> > Not if you wanted, as an Africanist, to be able to represent the 
> > text as it was originally written.
>
> Could you please explain this, how would using PUA characters 
> prevent the text to be represented as it was originally written ?

What would the value of that be? Doke was an important Africanist. 
His characters have specific (very, very specific) phonetic values. 
Why shouldn't a Khoisan database be able to represent these 
characters as written? Why should the PUA be proposed for these?

Some formerly-used click letters are encoded and available for use. 
Why shouldn't these, in principle? Many of the UPA characters are not 
used productively today, but they remain important for citation. As 
does the LATIN SMALL LETTER INSULAR G for that matter, and other 
archaic phonetic characters which have been encoded.

> If these symbols are indeed used in a single book and by a single 
> author, I would put them in the PUA, I don't see any interchange 
> requirement to do otherwise. If letters unique to an author may now 
> be encoded in Unicode, I have many to propose to the enabling 
> technology that Unicode is and people will be free to use them or 
> not.

I have an offprint of Doke's article in Bantu Studies. We have noted 
that 70 years later Pullum and Ladusaw cite a word (the word 
stretchedc-h-utildecaronbelow-triangularcolon chu:) in Doke's 
orthography. Isn't that an indication that the work and its 
characters have not been lost to history?

It's a little peculiar to suggest that data has to be printed in two 
books in order to be considered "interchangeable". Books don't 
interchange data between themselves. Users do. ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 07:00 -0700 2004-06-10, Peter Constable wrote:
> What about Bell's Visible Speech?

They're on our list. As are i.t.a and the Phonotypy characters. I'll 
bring a lovely Phonotypic text with me to Toronto.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 08:36 -0700 2004-06-10, Peter Constable wrote:
> Don't you think the fact that P&L don't show them might suggest that, in
> fact, authors today *don't* particularly use them?

Not necessarily. Indeed, they do quote the name chu: with STRETCHED 
C, and with both diacritics, the TILDE for nasalization (which is 
standard) and the CARON BELOW for the rising tone (which is not). So 
Pullum and Ladusaw are *using* Doke's orthography. If they wanted to 
show a different word in that orthography they would have to use one 
of Doke's other letters.
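
For reference, the cited word can already be spelled with encoded characters. The sketch below assumes Python's standard unicodedata module and assumes that the ASCII description used in this thread (stretched c, h, u with tilde and caron below, triangular colon) maps to the code points shown; the relative order of the two diacritics in the string is my assumption, not something taken from Doke or from Pullum and Ladusaw.

```python
import unicodedata

# Assumed spelling: stretched c, h, u + tilde + caron below, triangular colon.
word = "\u0297" + "h" + "u\u0303\u032C" + "\u02D0"

# Look up the character name of every code point in the word.
names = [unicodedata.name(ch) for ch in word]

for ch, name in zip(word, names):
    print(f"U+{ord(ch):04X} {name}")
```

Every element of this one word is available today; the question in this thread concerns Doke's other letters, which have no such code points.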

> I looked through many publications last year searching for attested 
> phonetic symbols not yet encoded, and while my search wasn't 
> specifically focused on Africanist usage, I did go through a number 
> of Africanist items and never once saw any of these.

Big world, isn't it? There's all those non-Slavic Cyrillic characters 
which haven't turned up again either.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread Patrick Andries
Michael Everson wrote:
> At 10:00 -0400 2004-06-10, John Cowan wrote:
> > And today, if I were reprinting it, I'd commission a digital font
> > (your effort, my expense) and put the characters in the PUA.
>
> Not if you wanted, as an Africanist, to be able to represent the text 
> as it was originally written.

Could you please explain this: how would using PUA characters prevent 
the text from being represented as it was originally written?

P. A.



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 11:00 -0400 2004-06-10, John Cowan wrote:
> Michael Everson scripsit:
> > Although Pullum and Ladusaw don't show the glyphs, they refer 
> > specifically to Doke's characters (s.v. ///). They describe them as 
> > "ad hoc" which I suppose they were, in 1925, though "novel" would 
> > do as well as they aren't entirely arbitrary and they weren't 
> > "found" bits of lead type pressed into other service -- they were 
> > cut to order.
>
> If Sequoyah had had clout, we'd probably be using his original
> characters for Cherokee today.

Not only non sequitur, but an unreasonable assumption.
My point was that I considered Pullum and Ladusaw's use of the word 
"ad hoc" to be unlikely. If they were ad-hoc, any printer's sorts 
might be used. (Pullum and Ladusaw are not infallible of course; cf 
Yogh.)

> > That Pullum and Ladusaw have not forgotten Doke's characters suggests
> > that Africanists will also likely not forget them, and will find use
> > in access to them as encoded characters in the UCS.
>
> It's P&L's business to remember what would otherwise be (mercifully, 
> in some cases) forgotten, so that people who need to interpret old 
> documents have some hope of doing so.

You have a weird view of the history of phonetics, John. You haven't 
addressed the substantive issue: these are Latin characters used to 
represent sounds which in 1925 could not easily be represented. That 
they didn't become the IPA standard for representing them is 
accidental. Indeed, there are click letters like the STRETCHED C 
which did get into IPA and were later deprecated. So you can 
represent the STRETCHED C in chu: as Doke writes it (as do Pullum and 
Ladusaw, using Doke's diacritics as well) but you can't represent 
Doke's other letters? This doesn't make sense.

I would like to know where the STRETCHED C comes from, actually. 
Pullum & Ladusaw note that Beach used it in 1938, and proposed a 
curly version in the same year, but it certainly predates that since 
Doke uses it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Michael Everson


> >Of course, it's an empirical question as to whether anyone else in that
> >era did, in fact, adopt any of these symbols, or whether authors today
> >ever use them (e.g. in citing Doke, whose work was of some importance in
> >Africanist linguistics).
> 
> It's reasonable to think that they would. Although Pullum and Ladusaw
> don't show the glyphs, they refer specifically to Doke's characters
> (s.v. ///).

Don't you think the fact that P&L don't show them might suggest that, in
fact, authors today *don't* particularly use them? I looked through many
publications last year searching for attested phonetic symbols not yet
encoded, and while my search wasn't specifically focused on Africanist
usage, I did go through a number of Africanist items and never once saw
any of these.


> That Pullum and Ladusaw have not forgotten Doke's characters suggests
> that Africanists will also likely not forget them, and will find use
> in access to them as encoded characters in the UCS.

I'm inclined to think there's probably greater likelihood that one of
the few modifier letters I proposed but that weren't accepted, e.g. a
MODIFIER LETTER SMALL TURNED Y, would be used than one of Doke's
idiosyncratic symbols. But, they were indeed rejected, and for now
remain PUA only (supported in the Doulos SIL font).



Peter Constable




Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 10:46 -0400 2004-06-10, John Cowan wrote:
> We must be talking past one another somehow, but I don't understand how.
> To represent the text as originally written, I need a digital representation
> for each of the characters in it.  Since all I want to do is reprint
> the book -- I don't need to use the unusual characters in interchange --
> the PUA and a commissioned font seem just perfect to me.

Erm. You could say that about ANY additions to the Unicode Standard!

> I intend no antagonism.

It is perceived. "No! Bad characters! No biscuit!"

> We treat the Phaistos-disk characters as guilty until proven innocent,
> for the same reason -- there's only one text.

I would disagree. Say it were a bilingual and we could read it. Do 
you really think we wouldn't encode the script? In any case, it's not 
a true analogy, since Phaistos presents a script, and the Khoisan 
characters are phonetic additions to Latin.

> There's no *point* in encoding the PD characters because they aren't
> used in interchange -- see above.

This doesn't make any sense. I have the Phaistos text encoded with 
PUA characters and a font available for it. If you wanted to exchange 
the text (by sending it to someone else) you could do so. If Phaistos 
were encoded outside of the PUA, it would likewise be exchangeable. 
Bits of Phaistos could be inserted into Latin or Greek or Russian 
text describing them. And those texts could be interchanged.

> > >If I decided to start using thorn instead of theta in my otherwise
> > >IPA transcriptions, that would be an idiosyncratic use of it.
> >
> > Plenty of Germanist transcriptions use thorn. In any case, the
> > analogy isn't relevant, as both thorn and theta are encoded and
> > available for use.
>
> I was talking about what it means to be idiosyncratic.

That isn't what Doke was doing. He was representing what are to us 
extremely strange sounds in the Latin script.

> I was addressing the question of the *novelty* of the characters.
> If neither you nor I nor anyone else in this community has seen them
> before, they are most certainly novel.

That is not a reason to consign them to the PUA.

> > I am gobsmacked. On what grounds are these not characters? They are
> > not glyph representations of other characters.
>
> They *are* characters.  It's just not useful to encode them, any more
> than it's useful to encode most of the scripts in the Conscript Registry.

If they are encoded, then historians of Khoisan linguistics can make 
use of them. In what way is this "not useful"?

> Find more documents, and the picture changes.

Go to the NYPL and look up Bantu Studies and some of Doke's other 
works for me, will you? I'll be in Markham for the next fortnight. Of 
course I will do what I can when I'm at the Library of Congress in 
early July, but you are welcome to assist.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Michael Everson


> You don't know whether or not they were only used in a single
> document. You know only that I *own* that single document. You are
> declaring the characters guilty until proved innocent. That's
> antagonistic.

Indeed. Didn't everyone have to become a signatory to the Universal
Declaration of Character Rights before subscribing?


> Khoisan phonology is rather
> esoteric, after all.

Esoteric?? (Do we perhaps need to review the meaning of this word?)

 
> > > Private use? Be
> > > serious, John. That's a pretty ridiculous suggestion.
> >
> >I am serious.  The PUA is the proper place for these things.
> 
> I am gobsmacked. On what grounds are these not characters? They are
> not glyph representations of other characters. The PRE-PALATAL N is
> described in terms of its phonology as being neither N nor N WITH
> LEFT HOOK.

If I publish a web page using DIAGONAL X WITH TURNED HOOK to represent
something that's not quite this or that cardinal phonetic value, does it
automatically become a character worthy of encoding?

This isn't about character rights. It's about criteria for deciding what
to encode or not to encode. 


Peter Constable




Re: Bantu click letters

2004-06-10 Thread Mark E. Shoulson
Peter Constable wrote:

> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> > Behalf Of John Cowan
> >
> >   [T]he Unicode Standard does not encode idiosyncratic,
> >   personal, novel, or private use characters [...].
>
> What about Bell's Visible Speech? (I'm sure I've seen it discussed here
> or on qalam, but I've no recollection what might have been said.) I
> don't know what Bell might have published, but they were also used by
> Sweet:
>
> Sweet, Henry. 1906. A primer of phonetics. 3rd edn., revised. Oxford:
> Clarendon Press.
>
> Would you consider these too idiosyncratic?


I hope not.  IMO Visible Speech *definitely* deserves encoding in Plane 
1.  Bell used it some, and I have several articles by Sweet in which he 
used it, and I even managed to find an article by someone else using 
Visible Speech.  Everything is a novel invention once; the question is 
whether it has a life (or at least a significance) beyond its inventor.  
(cf. Shavian, which probably was used not more than, and likely less 
than, Visible Speech).  In fact, in the movie of My Fair Lady, Visible 
Speech is, in fact, well, visible in Henry Higgins' notebook.

I have a font (my own), and a proposal for VS is languishing on my hard 
drive; it should someday be finished up and submitted.

~mark
(owner of visiblespeech.info, which someday, I hope, will actually have 
useful VS information on it)



Re: Bantu click letters

2004-06-10 Thread John Cowan
Michael Everson scripsit:

> Although Pullum and Ladusaw 
> don't show the glyphs, they refer specifically to Doke's characters 
> (s.v. ///). They describe them as "ad hoc" which I suppose they were, 
> in 1925, though "novel" would do as well as they aren't entirely 
> arbitrary and they weren't "found" bits of lead type pressed into 
> other service -- they were cut to order.

If Sequoyah had had clout, we'd probably be using his original
characters for Cherokee today.

> That Pullum and Ladusaw have not forgotten Doke's characters suggests 
> that Africanists will also likely not forget them, and will find use 
> in access to them as encoded characters in the UCS.

It's P&L's business to remember what would otherwise be (mercifully,
in some cases) forgotten, so that people who need to interpret old
documents have some hope of doing so.

What we need is more evidence: either documentary evidence, or the
evidence of breathing Africanists.

-- 
John Cowan  <[EMAIL PROTECTED]>
http://www.ccil.org/~cowan  http://www.reutershealth.com
Charles li reis, nostre emperesdre magnes,
Set anz totz pleinz ad ested in Espagnes.



RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 07:11 -0700 2004-06-10, Peter Constable wrote:
> If no other author uses them, then I think it's not unreasonable to
> suggest that they are private-use: Doke puts the terms of the agreement
> into his product, his readers enter into that agreement when they decide
> to read the book. It is "private-use" as opposed to conventional use if
> the readers agree to read his symbols but don't adopt them for their own
> use.

It's not like it's samizdat, though.

> Of course, it's an empirical question as to whether anyone else in that
> era did, in fact, adopt any of these symbols, or whether authors today
> ever use them (e.g. in citing Doke, whose work was of some importance in
> Africanist linguistics).

It's reasonable to think that they would. Although Pullum and Ladusaw 
don't show the glyphs, they refer specifically to Doke's characters 
(s.v. ///). They describe them as "ad hoc" which I suppose they were, 
in 1925, though "novel" would do as well as they aren't entirely 
arbitrary and they weren't "found" bits of lead type pressed into 
other service -- they were cut to order.

That Pullum and Ladusaw have not forgotten Doke's characters suggests 
that Africanists will also likely not forget them, and will find use 
in access to them as encoded characters in the UCS.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Michael Everson


> The sounds they represent are idiosyncratic and difficult to
> describe, much less write. Personal? No: he published. Novel? Perhaps
> (in 1925); Doke is likely to have devised them. Private use? Be
> serious, John. That's a pretty ridiculous suggestion.

If no other author uses them, then I think it's not unreasonable to
suggest that they are private-use: Doke puts the terms of the agreement
into his product, his readers enter into that agreement when they decide
to read the book. It is "private-use" as opposed to conventional use if
the readers agree to read his symbols but don't adopt them for their own
use.

Of course, it's an empirical question as to whether anyone else in that
era did, in fact, adopt any of these symbols, or whether authors today
ever use them (e.g. in citing Doke, whose work was of some importance in
Africanist linguistics).


Peter Constable




Re: Bantu click letters

2004-06-10 Thread John Wilcock
On Thu, 10 Jun 2004 14:30:12 +0100, Michael Everson wrote:
> They were published in Bantu Studies in 1925 in an article by a 
> rather important scholar in the field of African linguistics. Effort 
> and expense was made to cut the letters for the publication.

But have they been used in other publications since? 
Are they used by scholars of African linguistics today?

[I have no idea whether they are or not, but it seems to me that this
information could be important for the proposal]

John.

-- 
-- Over 2400 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages- www.tradoc.fr





Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 10:00 -0400 2004-06-10, John Cowan wrote:
> Michael Everson scripsit:
> > They were published in Bantu Studies in 1925 in an article by a
> > rather important scholar in the field of African linguistics.
>
> We don't encode characters according to the clout of the user, or
> the Apple logo would have been in Unicode long since. :-)

False analogy. The Apple logo is a logo. Phonetic characters are 
phonetic characters.

> > Effort and expense was made to cut the letters for the publication.
>
> And today, if I were reprinting it, I'd commission a digital font
> (your effort, my expense) and put the characters in the PUA.

Not if you wanted, as an Africanist, to be able to represent the text 
as it was originally written.

> > The sounds they represent are idiosyncratic and difficult to
> > describe, much less write.
>
> I think that characters used in a single document by a single scholar,
> however prestigious, can fairly be described as idiosyncratic to him.

You don't know whether or not they were only used in a single 
document. You know only that I *own* that single document. You are 
declaring the characters guilty until proved innocent. That's 
antagonistic.

> If I decided to start using thorn instead of theta in my otherwise
> IPA transcriptions, that would be an idiosyncratic use of it.

Plenty of Germanist transcriptions use thorn. In any case, the 
analogy isn't relevant, as both thorn and theta are encoded and 
available for use.

> If instead I used OVERCLOCKED HOOCHIMADINGER SYMBOL, that would be 
> even more idiosyncratic.
>
> (LATIN LETTER OWL, indeed.)

COMBINING SEAGULL BELOW, indeed.
This symbol is interesting, by the way. Asmus says it's similar to 
something the Japanese use for telephone answering machines. I don't 
know about that, though it looks familiar to me. I wonder what Doke's 
source for it was.

> > Personal? No: he published.
>
> Fair enough.

Thank you.

> > Novel? Perhaps
> > (in 1925); Doke is likely to have devised them.
>
> They are just as novel today as they were eighty years ago; I well
> remember how astonished you and I were, looking over the text.

I was astonished because I hadn't seen them before. That does not 
mean I believed that they weren't worthy of encoding. Just 
because I hadn't seen them before doesn't mean they don't exist and 
aren't worthy of encoding either. Khoisan phonology is rather 
esoteric, after all.

> > Private use? Be
> > serious, John. That's a pretty ridiculous suggestion.
>
> I am serious.  The PUA is the proper place for these things.

I am gobsmacked. On what grounds are these not characters? They are 
not glyph representations of other characters. The PRE-PALATAL N is 
described in terms of its phonology as being neither N nor N WITH 
LEFT HOOK.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of John Cowan


>   [T]he Unicode Standard does not encode idiosyncratic,
>   personal, novel, or private use characters [...].

What about Bell's Visible Speech? (I'm sure I've seen it discussed here
or on qalam, but I've no recollection what might have been said.) I
don't know what Bell might have published, but they were also used by
Sweet:

Sweet, Henry. 1906. A primer of phonetics. 3rd edn., revised. Oxford:
Clarendon Press.

Would you consider these too idiosyncratic?



Peter Constable




Re: Bantu click letters

2004-06-10 Thread John Cowan
Michael Everson scripsit:

> They were published in Bantu Studies in 1925 in an article by a 
> rather important scholar in the field of African linguistics.  

We don't encode characters according to the clout of the user, or
the Apple logo would have been in Unicode long since. :-)

> Effort and expense was made to cut the letters for the publication.

And today, if I were reprinting it, I'd commission a digital font
(your effort, my expense) and put the characters in the PUA.

> The sounds they represent are idiosyncratic and difficult to 
> describe, much less write. 

I think that characters used in a single document by a single scholar,
however prestigious, can fairly be described as idiosyncratic to him.
If I decided to start using thorn instead of theta in my otherwise
IPA transcriptions, that would be an idiosyncratic use of it.  If
instead I used OVERCLOCKED HOOCHIMADINGER SYMBOL, that would be
even more idiosyncratic.

(LATIN LETTER OWL, indeed.)

> Personal? No: he published.

Fair enough.

> Novel? Perhaps
> (in 1925); Doke is likely to have devised them. 

They are just as novel today as they were eighty years ago; I well
remember how astonished you and I were, looking over the text.

> Private use? Be
> serious, John. That's a pretty ridiculous suggestion.

I am serious.  The PUA is the proper place for these things.

-- 
"May the hair on your toes never fall out!" John Cowan
--Thorin Oakenshield (to Bilbo) [EMAIL PROTECTED]



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 09:26 -0400 2004-06-10, John Cowan wrote:
> [T]he Unicode Standard does not encode idiosyncratic,
> personal, novel, or private use characters [...].
>
> Whatever may have been done in the past, I don't think that one
> document is enough to support the introduction of new Latin letters;
> these look extremely idiosyncratic, personal, novel and private use
> to me.

They were published in Bantu Studies in 1925 in an article by a 
rather important scholar in the field of African linguistics. Effort 
and expense was made to cut the letters for the publication.

The sounds they represent are idiosyncratic and difficult to 
describe, much less write. Personal? No: he published. Novel? Perhaps 
(in 1925); Doke is likely to have devised them. Private use? Be 
serious, John. That's a pretty ridiculous suggestion.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread John Cowan
Michael Everson scripsit:

> Proposal to add Bantu phonetic click characters to the UCS
> http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf

[T]he Unicode Standard does not encode idiosyncratic,
personal, novel, or private use characters [...].

Whatever may have been done in the past, I don't think that one
document is enough to support the introduction of new Latin letters;
these look extremely idiosyncratic, personal, novel and private use
to me.

-- 
All Norstrilians knew what laughter was:John Cowan
it was "pleasurable corrigible malfunction".http://www.reutershealth.com
--Cordwainer Smith, Norstrilia  [EMAIL PROTECTED]



RE: Bantu click letters

2004-06-10 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Michael Everson


> >I had not proposed ones I know of before now as I expected they'd be
> >about as well received as the two symbols created by Doke that I
> >proposed last summer: the s and z with swash tail (they were not
> >accepted at that time).
> 
> Heavens, really? The bilabials? Desmond Cole discusses them (and
> shows them) in his article "The History of African Linguistics to
> 1945" in Current Trends in Linguistics, Vol. 7, Linguistics in
> Sub-Saharan Africa. 

Are you serious? I missed that! In the very same volume (p. 648), the z
(but not the s) is cited in A.N. Tucker's article "Orthographic systems
and conventions in Sub-Saharan Africa."


Peter Constable




RE: Bantu click letters

2004-06-10 Thread Michael Everson
Heh. Of course despite the fact that Doke published in Bantu Studies, 
chu: (Kxoe, SIL code XUU) is a Khoisan language. I'll be changing 
the title of the document, though for the purposes of discussion, it 
would be best not to change the title of this thread.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 00:30 -0700 2004-06-10, Peter Constable wrote:
> I had not proposed ones I know of before now as 
> I expected they'd be about as well received as 
> the two symbols created by Doke that I proposed 
> last summer: the s and z with swash tail (they 
> were not accepted at that time).

Those are also both used in N. V. Jushmanov, 
"Foneticheskie paralleli afrikanskix i 
jafeticheskix jazykov", in Africana (Transactions 
of the section of African languages), Moskva: 
Izdatel´stvo Akademii Nauk SSSR, 1937.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 00:30 -0700 2004-06-10, Peter Constable wrote:
> !! I had not assumed that we would encode symbols attested in single
> publications.

I am CERTAIN that we have many characters which were encoded with 
only one citation in the proposal.

> I know there are several more idiosyncratic phonetic symbols out there;

As do we all. I err on the side of generosity in encoding.

> I had not proposed ones I know of before now as I expected they'd be 
> about as well received as the two symbols created by Doke that I 
> proposed last summer: the s and z with swash tail (they were not 
> accepted at that time).

Heavens, really? The bilabials? Desmond Cole discusses them (and 
shows them) in his article "The History of African Linguistics to 
1945" in Current Trends in Linguistics, Vol. 7, Linguistics in 
Sub-Saharan Africa. In that article it suggests that Doke's use of 
those symbols could have been inspired by Daniel Jones' 1911 pamphlet 
on Chindau (which I have not seen).
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Bantu click letters

2004-06-10 Thread Michael Everson
At 00:11 -0400 2004-06-10, Ernest Cline wrote:
> > [Original Message]
> > From: Michael Everson <[EMAIL PROTECTED]>
> > Practice your tongue-twisting.
> > Proposal to add Bantu phonetic click characters to the UCS
> > http://www.evertype.com/standards/iso10646/pdf/n2790-clicks.pdf
>
> Why wouldn't U+1D4AC MATHEMATICAL SCRIPT CAPITAL Q
> work for the script capital Q?  At the very least I feel that should
> be explained.

It was understood that the mathematical symbols were not to be used 
in language text.
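
One practical side effect of using the mathematical letter in running text can be sketched as follows, assuming Python's standard unicodedata module. This is an illustration only, not the stated rationale for the restriction: U+1D4AC carries a compatibility (font) decomposition to plain Q, so any process that applies NFKC folding would silently erase the distinction the script capital Q is meant to carry.

```python
import unicodedata

script_q = "\U0001D4AC"

# Character name and compatibility decomposition from the character database.
name = unicodedata.name(script_q)
decomposition = unicodedata.decomposition(script_q)

# Canonical normalization keeps the character; compatibility folding does not.
nfc = unicodedata.normalize("NFC", script_q)
nfkc = unicodedata.normalize("NFKC", script_q)

print(name)
print(decomposition)
print(nfc == script_q, nfkc)
```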
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Bantu click letters

2004-06-10 Thread Michael Everson
At 23:30 -0400 2004-06-09, Mark E. Shoulson wrote:
> On the last page, the word spelled approximately 
> n®ª? is translated as "to roast" when in fact 
> that is approximately nwi (with a different n). 
> the n®ª? word means "bow."

Error corrected. I hadn't submitted the document to WG2 and UTC yet.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


