Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread Lukas Pietsch

Doug Ewell wrote:

 Aren't Serbian and Croatian the standard example of two "languages" that are
 really the same language but are treated separately

This question about languages being "really" the same or not turns out to be
a rather moot one from a linguist's viewpoint, even more so once the issue
gets burdened with national feelings. I mean, are English and Scots the
same? Are Bulgarian and Macedonian the same? Are Rumanian and Aromunian the
same? Are Ancient Greek and Ancient Macedonian the same? Are Upper German
and Lower German the same? Are German, Schwitzerdytsch and Letzeburgsch the
same? Are Dutch and Flemish the same? Are British and American English the
same (that was an issue at one time!) -- There are probably as many such
issues as there are nations in the world, or more, and as a linguist you
get weary of getting asked what the "real" answer is in each case.

 Are there any linguistic or vocabulary differences between them?

Well, there are bound to be, at some level, and if not in the normative
standards, then in the actual spoken varieties of relevant population
centers. The question is just, how big are these, and--different and much
more important question--how big do people *want* to *perceive* these
differences to be?

Lukas

(P.S.: Sorry Doug, I meant to send this to the list in the first place.)




RE: What about musical notation?

2001-02-28 Thread Edward Cherlin

About 25 years ago, I and several friends set some music in our 
church publications (newsletters and handouts for the congregation) 
using transfer symbols and photocopying. The process is definitely 
not suitable for serious publication.

I saw a 19th century American music book set from movable type in the 
Newark, NJ public library about 35 years ago. Naturally I have no 
idea of the title or publisher now. :(

In addition to its inflexibility, movable type is not well suited to 
music because it is difficult to get lines to join smoothly.

At 8:26 AM -0800 2/22/01, Figge, Donald wrote:
About thirty years ago, I was involved in the production of a song book. At
that time, the notes were engraved directly onto copper plates by artisans
who specialized in music engraving. Repro proofs were made from the plates,
and then the words were pasted onto the proofs.

Don
//

-----Original Message-----
From: William Overington [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 22, 2001 3:53 AM
To: Unicode List
Subject: Re: What about musical notation?


Does anyone know of any details of metal music type please?

William Overington

22 February 2001

-- 

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland



Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread JÖRG KNAPPEN

Doug Ewell asked:

 Aren't Serbian and Croatian the standard example of two "languages" that are 
 really the same language but are treated separately (a) for political reasons 
 and (b) because Cyrillic is used to write the former and Latin to write the 
 latter?  Are there any linguistic or vocabulary differences between them?

The matter is much more complicated here. Linguistically speaking,
there is a South Slavonic dialect continuum from Slovenian to Bulgarian
with no sharp language boundaries. There are, of course, many feature
boundaries and isoglosses, as usual in dialect continua.

Any national language is a construction (where the degree of constructedness
varies considerably). Serbo-Croatian (as a single language) is essentially
a 19th century construction and became the national language of Yugoslavia
after WW I. Serbian, Croatian, Bosnian (and maybe Montenegrin soon) are more
recent constructions, made before and after the split of Yugoslavia.

There is a lot of prescriptive language planning going on in order to make
the three languages more different from each other. The national languages
do not map onto the major dialect boundaries in the dialect continuum.

If you can read German, I recommend the book

Detlev Blanke, Internationale Plansprachen (International Planned Languages), Akademie-Verlag Berlin

which contains lots of examples of how national languages contain
planned elements. It proceeds with a survey of planned languages and
Esperanto. Did you know that Slovak was reconstructed in the 19th century
in order to make it more different from Czech?

--J"org Knappen



Re: What about musical notation?

2001-02-28 Thread Edward Cherlin

At 3:52 AM -0800 2/22/01, William Overington wrote:
Having been advised recently about accessing 21-bit Unicode characters using
an example from musical notation, following up on that advice I have found
the document that details characters in the range U+1D100 to U+1D1FF,
entitled Musical Symbols.

[snip]

I find myself interested in the possibility that Unicode could be used to
encode as a sequence of characters a representation of the contents of the
composing stick of a hand set metal type printer, including the various
items of spacing material of which the viewer of a finished print is not
aware.

One application at present would be to produce fine-quality typeset
illustrations of music and mathematics by placing that sequence of codes
in the param statement of a Java applet in a web page.

Does anyone know of any details of metal music type please?

William Overington

22 February 2001

The TeX DVI output file format does something close to what you 
describe by putting items and expressions composed from basic 
characters into boxes, and specifying the location of each box. Both 
horizontal and vertical spacing are expressed in integer multiples of 
the basic unit, 1/65536 of a true printer's point (1/72.27 in.). Font 
sizes are also specified in the same unit.
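Because every dimension is an integer count of this tiny unit, converting between units is simple arithmetic. A minimal sketch, assuming only the figures given above (1 in = 72.27 pt, 1 pt = 65536 scaled points); the helper names are illustrative, and TeX's own rounding of user-supplied dimensions differs in detail:

```python
# TeX's length units, as described above: 1 in = 72.27 pt, 1 pt = 65536 sp.
SP_PER_PT = 65536
PT_PER_INCH = 72.27


def sp_to_pt(sp: int) -> float:
    """Convert an integer number of scaled points (sp) to printer's points."""
    return sp / SP_PER_PT


def sp_to_inches(sp: int) -> float:
    """Convert scaled points to inches."""
    return sp / SP_PER_PT / PT_PER_INCH


def pt_to_sp(pt: float) -> int:
    """Round a length in points to the nearest whole scaled point."""
    return round(pt * SP_PER_PT)


print(pt_to_sp(10))      # 655360: a 10pt font size
print(sp_to_pt(98304))   # 1.5
```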

The TeX source format includes codes for the common typographical 
spaces, several more specialized math spaces, and a general concept 
of "glue" spaces with numeric stretch and shrink parameters, 
including three orders of infinite stretchability. Further spacing 
control is provided in tables.
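The "glue" idea can be illustrated with a simplified sketch. It assumes a single line whose non-glue content has a fixed total width; the `Glue` class and `set_glue` function are hypothetical names, not TeX's internal routines, and badness, penalties and line breaking are ignored. As in TeX, only the highest order of infinity with nonzero total stretch (or shrink) absorbs the slack, and finite shrink is never exceeded:

```python
from dataclasses import dataclass
from typing import List

# Orders of stretchability: 0 = finite, 1 = fil, 2 = fill, 3 = filll.
ORDERS = range(4)


@dataclass
class Glue:
    natural: float            # natural width, in points
    stretch: float = 0.0
    stretch_order: int = 0
    shrink: float = 0.0
    shrink_order: int = 0


def set_glue(glues: List[Glue], box_width: float, target: float) -> List[float]:
    """Return the final width of each glue item so the line reaches `target`."""
    excess = target - (box_width + sum(g.natural for g in glues))
    if excess == 0:
        return [g.natural for g in glues]

    stretching = excess > 0
    totals = [0.0] * len(ORDERS)
    for g in glues:
        if stretching:
            totals[g.stretch_order] += g.stretch
        else:
            totals[g.shrink_order] += g.shrink

    # Only the highest order with a nonzero total takes part.
    order = max((o for o in ORDERS if totals[o] != 0), default=None)
    if order is None:
        return [g.natural for g in glues]  # nothing can stretch or shrink

    ratio = abs(excess) / totals[order]
    if not stretching and order == 0:
        ratio = min(ratio, 1.0)  # finite shrink is capped: the box is overfull

    widths = []
    for g in glues:
        if stretching and g.stretch_order == order:
            widths.append(g.natural + ratio * g.stretch)
        elif not stretching and g.shrink_order == order:
            widths.append(g.natural - ratio * g.shrink)
        else:
            widths.append(g.natural)
    return widths
```

For example, a single `fil` glue alongside finite glue takes all the extra space, which is how centering and flush-right alignment are typically achieved.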

Most software that handles mathematical expressions can translate 
them to TeX. This includes high-end math software such as 
Mathematica, technical publishing applications, notably FrameMaker, 
and ordinary word processors with built-in expression editors. In 
some cases, the translation from a word processing format requires an 
external utility.

I suggest, therefore, that writing a downloadable TeX DVI renderer 
plug-in for a Web browser is a more general long-term solution for 
your application. Most of the code you would need is available as 
open source in C. It would not surprise me if a DVI renderer in Java 
had been done somewhere, although I have not heard of one.

There is a Unicode TeX, called Omega TeX, capable of handling any 
writing system in principle, and supporting a fair number of writing 
systems in practice.
-- 

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland



RE: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread Marco Cimarosti

Doug Ewell wrote:
 Aren't Serbian and Croatian the standard example of two "languages" that are
 really the same language but are treated separately (a) for political reasons
 and (b) because Cyrillic is used to write the former and Latin to write the
 latter?  Are there any linguistic or vocabulary differences between them?

<OT: talking about languages, not scripts>

I think that the difference between the two is comparable to the difference
between British and American English. (Oops! I am a Latino, not an Anglo, so
change the last four words to "Spanish and American Castilian" :-)

I.e., relatively big differences in the colloquial languages, but just a
handful of spelling and vocabulary variations in the unified literary
language.

I think, for instance, that "river" would be "rijeka" in Croatian and "reka"
in Serbian ("ријека" and "река" in Cyrillic).

</OT>

_ Marco



Re: CJKV ideographic, was Re: Perception that Unicode is 16-bit

2001-02-28 Thread akerbeltz.alba

Tomas wrote:

 More importantly, Han \u6f22 (Cant. Hon) really isn't an ethnonym used
 by the Cantonese and other southern Chinese; rather, Tang \u5510 (Cant.
 Tong) is used instead, e.g., tangcan \u5510\u9910 'Chinese cuisine' (Cant.
 tongchaan), tanghua \u5510\u8a71 'Chinese (spoken) language' (Cant.
 tongwa), tangren \u5510\u4eba 'Chinese person' (Cant. tongyan), tangrenjie
 \u5510\u4eba\u8857 'Chinatown' (Cant. tongyangaai), tangshan \u5510\u5c71
 'China (lit. "Tang mountain")' (Cant. tongsaan), etc.  Some of these terms
 are kind of old-fashioned or rustic, though.

True, but it would be a bit unfair, since other groups use the same
ethnonym. If we're looking for a high register term for Cantonese
ideographs, how about 'YuhtJih' [\u7cb5\u5b57] (Mand. Yuezi)?

 I think I heard of a tangzi \u5510\u5b57 (Cant. tongji) term once; this
 would be most ideal to make use of, if one wanted to invent new English
 terminology.  But that still leaves the problem of distinguishing the
 "dialect" characters of other southern Chinese languages from the
 mainstream characters, and the Cantonese "dialectal" characters.

 Basically just linguistic transcription, like the recently-created Hong
 Kong-indigenous Jyutping \u7cb5\u62fc (Mand. Yuepin) system.  Unlike some
 other Chinese languages, romanization (usu. introduced by missionaries)
 didn't catch on, and the dominant (and conservative) trend is to write in
 Han characters, even if that means having to create new ones, hijacking
 existing ones, or resurrecting old ones.

Just for the sake of our sanity ; ) With the number of homophones we have,
writing entirely in romanization is ... an interesting pastime. Believe me,
I've tried ... and the Yale English Cantonese dictionary still drives me
nuts for not having characters ...
So apart from dictionaries you find romanization (a myriad of varieties) for
the transcription of names and shops, place names ... and that's about it I
would think. I believe 'status' comes into this a lot - educated people
"know how to write" ...
Michael




Re: Klingon silliness

2001-02-28 Thread Michael Everson

At 15:03 -0800 2001-02-27, David Starner wrote:

Then why do you berate people for working on Klingon? They're probably
no more an expert at the minority scripts than you are.

One begs to differ. ;-)

They're just
working on what they find interesting and are knowledgeable about.

I guess in a bit I'll have to say something about all this Klingon silliness.
--
Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire



Re: What about musical notation?

2001-02-28 Thread Florian Weimer

Edward Cherlin [EMAIL PROTECTED] writes:

 I suggest, therefore, that writing a downloadable TeX DVI renderer
 plug-in for a Web browser is a more general long-term solution for
 your application. Most of the code you would need is available as open
 source in C. It would not surprise me if a DVI renderer in Java had
 been done somewhere, although I have not heard of one.

There's a Java DVI viewer available at:

http://www.geom.umn.edu/java/idvi/

However, it's not free software (it's only for non-commercial use).



Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread G. Adam Stanislav

On Tue, Feb 27, 2001 at 08:38:04PM -0800, [EMAIL PROTECTED] wrote:
Aren't Serbian and Croatian the standard example of two "languages" that are 
really the same language but are treated separately (a) for political reasons 
and (b) because Cyrillic is used to write the former and Latin to write the 
latter?  Are there any linguistic or vocabulary differences between them?

They are very similar, but there are subtle differences. The reasons are
not just political but cultural and religious. The Serbians are mostly
Orthodox, which is why they use Cyrillic. The Croatians are mostly
Catholic, hence the use of the Roman alphabet. (It is a fairly precise
rule that you can tell whether a Slavic nation has been historically
Eastern Orthodox or Roman Catholic by which alphabet they use. Naturally,
there have been other developments, e.g., there is a strong Lutheran
minority in Slovakia, a strong Hussite tradition in Bohemia, etc. And,
of course, you cannot assume that any individual is of a specific
religion based on his nationality.)

Adam

-- 
Cogitans me cogito esse



Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread G. Adam Stanislav

On Wed, Feb 28, 2001 at 12:39:09AM -0800, JÖRG KNAPPEN wrote:
Did you know that Slovak was reconstructed in the 19th century
in order to make it more different from Czech?

Not true. Written documents dating back to the Middle Ages clearly show
that Slovak has been virtually unchanged since then.

What you are talking about is the difference between Bernolák and
Štúr, both of whom tried to codify Slovak in the 19th Century, that
is to say, to create a set of formal rules (just as Webster did in
America).

Bernolák did it first. He used a Western Slovak dialect. His attempt
failed, mostly because that dialect was not representative of the
way most Slovaks speak.

Štúr then attempted the same but using the Central Slovak dialect.
He was very successful because that dialect was something all Slovaks
could easily accept as the "official" language of Slovakia.

Since Western Slovakia is closer to Bohemia (where the Czechs live),
naturally, its dialect is closer to Czech (still quite distinct but
closer) than the Central Slovak dialect.

Because Bernolák's attempt predated Štúr's, I can see how it could
give the impression of "reconstruction", but that is not what
happened. Rather, Bernolák's attempt failed because, while it may have worked
for the Slovak intelligentsia living in Bratislava (which is in Western
Slovakia), it made no sense to the rest of the nation. Štúr was
successful (even Bernolák accepted him) because he did it right.

By the way, there was another attempt which preceded both. I cannot
remember the author's name off the top of my head. This gentleman
wrote what is now considered the first Slovak novel. He pretty much
created his own Slovak based on his own theories of what Slovak should
be. In the 20th Century someone translated that novel into real Slovak,
so people can actually understand it.

All of this happened during the 19th Century period of nationalism
(which arose not just in Slovakia) as a reaction to the attempt of
having Slovaks assimilated into Hungary after the Austrian Empire
turned into Austro-Hungarian Empire and Hungarian became the official
legal language of that part of the Empire (before that, Latin was the
official language of the entire Empire and all nationalities used
their own languages in day-to-day communications in their respective
territories -- incidentally, Slovak has two very different words for
Hungarian, one describing the entire territory of the Kingdom of
Hungary, one describing the specific language of modern Hungary --
that means that a 19th Century Slovak could say, yes I am a Hungarian,
as in someone from the Kingdom of Hungary, and at the same time, no,
I am certainly not a Hungarian, as in a member of the nation living
in modern Hungary -- the first word was Uhor, the second Maďar,
quite different).

The Czechs lived in the Austrian section of the Empire, the Slovaks
in the Hungarian section. Back then no one would have suggested that
Czech and Slovak were the same language (they have been distinct
linguistically, culturally, and politically, ever since the Slavs
moved to Europe). There was no need to "reconstruct" Slovak to make
it more distinct from Czech.

Cheers,
Adam

-- 
When two do the same, it's not the same
-- Slovak proverb



Re: CJKV ideographic, was Re: Perception that Unicode is 16-bit

2001-02-28 Thread John Jenkins


On Wednesday, February 28, 2001, at 01:54 AM, akerbeltz.alba wrote:

 So apart from dictionaries you find romanization (a myriad of varieties) for
 the transcription of names and shops, place names ... and that's about it I
 would think. I believe 'status' comes into this a lot - educated people
 "know how to write" ...

Most place names in Hong Kong seem to be modified versions of 
Wade-Giles, which I (having learned Yale first) tend to find almost 
incomprehensible.  Still, it was always fun to hear British news readers 
talk about what had just happened in, say, Sham Shui Po.

=
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/



utf-1.3 and utf-1.4

2001-02-28 Thread JÖRG KNAPPEN

On http://www.atm.ch.cam.ac.uk/acmsu/utf/

I found the acronym utf used in a very different way from how
Unicoders/ISO 10646ers use it. Fortunately, there never was
a utf-1.3 or utf-1.4 in our context.

--J"org Knappen



Re: [OT] What is DEL for?

2001-02-28 Thread Gaute B Strokkenes

On Thu, 22 Feb 2001, [EMAIL PROTECTED] wrote:

 Otto Stolz [mailto:[EMAIL PROTECTED]] wrote:
 
 Dear Unicoders,
 
 again, I have inadvertently sent a contribution to a member rather
 than to the whole list, because the Unicode list sets the Reply-to
 header in an utmost inconvenient and unexpected manner.
 
 Here is a copy for the list. I hope I will not mistype the address.
 I really wish that I simply could use the reply-to-sender function
 of my MUA to answer to the Unicode list.

...

 Or maybe you need a mail client that allows you to apply a special
 rule to messages that come from this list such that any reply you
 send to a list message defaults to the address in the To: line
 rather than that in the From: or Reply To: line of the original
 message.

I agree with you about the specific case of the Reply-To: header, but
I think that it might be a good idea to change the list setup in other
ways.  For instance, the current setup seems to remove all
In-Reply-To: and References: headers.  This is a problem since it
breaks the ability of my email program (Gnus) to do threading, for no
particularly good reason.  In fact, I believe that all headers except
for those in a particular list are removed.

Another annoyance is that no special headers are used to indicate that
the message is in fact from this mailing list, so that you have to use
the Sender: header etc. to do mail splitting, which is also annoying.

-- 
Gaute Strokkenes    http://www.srcf.ucam.org/~gs234/
I'm using my X-RAY VISION to obtain a rare glimpse of the
 INNER WORKINGS of this POTATO!!



Re: Klingon silliness

2001-02-28 Thread Marion Gunn

Michael Everson said:

 At 13:05 -0800 2001-02-27, Timothy Partridge wrote:
 
 How come the Klingons only have one
 language and script? :-)

 The victors successfully assimilated the conquered.
 --
 Michael Everson

Sure they can assimilate? I'm reliably informed that they only cling on.

mg

--
Marion Gunn
Everson Gunn Teoranta
http://www.egt.ie





Re: Latin digraph characters

2001-02-28 Thread William Overington


Germans transliterate a single cyrillic letter with TSCH, shouldn't
Unicode have also this tetragraph encoded?  (ducking...)



Is this the Cyrillic letter that is transliterated into English as CH
pronounced as CH in church?

There are in mathematics some polynomials called Chebyshev polynomials after
a mathematician whose name was written in Cyrillic characters.  I think that
he was Russian, but I am not congruently certain of that.

I remember seeing once that his name is sometimes expressed in roman
characters as Chebyshev and sometimes in another way that I do not precisely
remember and will not guess at but it began with the letter T.

It is an interesting circumstance that Chebyshev polynomials are represented
as

y = T0(x)
y = T1(x)
y = T2(x)
y = T3(x)
and so on for all non-negative integers, where the number following the
letter T should be written as a subscript but is not, because of the
limitations of this email format.
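Whatever the origin of the letter T, the polynomials themselves are easy to generate. A small sketch using the standard three-term recurrence for Chebyshev polynomials of the first kind, T_0(x) = 1, T_1(x) = x, T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x); the function names are illustrative only. The defining identity T_n(cos t) = cos(n t) gives a quick numerical check:

```python
import math
from typing import List


def chebyshev_t_coeffs(n: int) -> List[int]:
    """Coefficients of T_n(x), lowest degree first, via T_{k+1} = 2x T_k - T_{k-1}."""
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, cur = [1], [0, 1]               # T_0 = 1, T_1 = x
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]  # multiply T_k by 2x
        for i, c in enumerate(prev):      # subtract T_{k-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def evaluate(coeffs: List[int], x: float) -> float:
    """Evaluate a polynomial (lowest degree first) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


print(chebyshev_t_coeffs(3))  # [0, -3, 0, 4], i.e. 4x^3 - 3x
# T_5(cos 0.3) should equal cos(1.5):
print(evaluate(chebyshev_t_coeffs(5), math.cos(0.3)), math.cos(1.5))
```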

Perhaps there is some interesting footnote to this circumstance and that
maybe the T refers to the mathematician's surname and that for some
peculiarity of history of different routes being used at different times his
surname was transliterated by one method for defining the functions and by
another method for stating his name.

Does anyone know whether there is any evidence of that being the case?

William Overington

28 February 2001






RE: Latin digraph characters

2001-02-28 Thread Handwerker, Reinhard (ISS Atlanta)

William,
I can only second your assumption for naming the Chebyshev polynomials Tx(),
since the German transliteration is indeed Tschebyscheff (as the
mathematician in me remembers...).

FYI, there is one Cyrillic character (U+0429: Щ) that is transliterated as
SCHTSCH in German; for example, a few years ago there was one Nathan Tcharansky
(Schtscharanski in German).

getting a little [OT] now...

Reinhard G. Handwerker, 
Sr. i18n Engineer 
Internet Security Systems






Re: Latin digraph characters

2001-02-28 Thread Antoine Leca

[utf-8]

William Overington wrote:
 
 Germans transliterate a single cyrillic letter with TSCH, shouldn't
 Unicode have also this tetragraph encoded?  (ducking...)
 
 Is this the Cyrillic letter that is transliterated into English as CH
 pronounced as CH in church?

Yes.

 There are in mathematics some polynomials called Chebyshev polynomials after
 a mathematician whose name was written in Cyrillic characters.  I think that
 he was Russian, but I am not congruently certain of that.

Neither do I. But see below.

 
 I remember seeing once that his name is sometimes expressed in roman
 characters as Chebyshev and sometimes in another way that I do not precisely
 remember and will not guess at but it began with the letter T.

Perhaps "Tchébicheff", i.e. the French way. It happens that the international
way to spell Russian names is to use the French way of transliterating. This is
important for passports, for example.

The German way would be "Tschebyscheff", or something like that.

Incidentally, I had a look at Don Knuth's site (he has a long list of
non-Latin names of famous mathematicians), at
<URL:http://Sunburn.Stanford.EDU/~knuth/help.html#exotic>.
And Don Knuth gave Chebyshev and Tschebyscheff, and not the French
Tchébicheff. Also it appears that "Tchébicheff" is not very common on
the web, much less common than "Tschebyscheff", which is itself far
less common than "Chebyshev" (65, 1100 and 31000 hits respectively in Google).

Also, Don Knuth gives Пафнутий for his first name, which does not
sound very Russian to me. So it might be that Chebyshev came from
a region later dominated by Germany, hence had his name changed toward
the German orthographic rules. Just a thought.

 
 It is an interesting circumstance that Chebyshev polynomials are represented
 as
 
 y = T0(x)

Curiously, I do not remember using T, but rather the more usual P, to
represent the Chebyshev polynomials while at school (but it is quite a
long time ago, so I may easily be misremembering).


Antoine



Re: Latin digraph characters

2001-02-28 Thread Valeriy E. Ushakov

On Wed, Feb 28, 2001 at 11:19:37 -0800, Antoine Leca wrote:

 [utf-8]

[koi8-r] ;-(
I know I should upgrade my mailer.
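This kind of charset mix-up is easy to reproduce, and it is one plausible way a quoted name gets mangled in transit. A sketch using Python's built-in `koi8_r` and `latin-1` codecs: if the KOI8-R bytes of Пафнутий are displayed as ISO 8859-1, each byte turns into one accented Latin letter:

```python
name = "Пафнутий"

# The bytes a KOI8-R mailer would send for the name.
koi8_bytes = name.encode("koi8_r")

# A reader that assumes ISO 8859-1 shows one Latin-1 character per byte.
garbled = koi8_bytes.decode("latin-1")
print(garbled)  # ðÁÆÎÕÔÉÊ

# Decoding with the right charset recovers the original text.
assert koi8_bytes.decode("koi8_r") == name
```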

 Also, Don Knuth gives Пафнутий for his first name, which does not
 sounds very Russian to me.

It's Russian.  Though, surely, not of Russian/Slavic origin.

He was born on May 4 (Julian) in a village just a few miles away from a
monastery famous for, and named after, the Russian saint of the same name.
Incidentally, the Russian Orthodox Church celebrates the memory of this
saint on May 1 (Julian).  Hence the name choice is not that surprising
given the time and the place of birth.

http://users.kaluga.ru/school6/chebishev/Family.htm
http://www.days.ru/Life/life964.htm

SY, Uwe
-- 
[EMAIL PROTECTED] |   Zu Grunde kommen
http://www.ptc.spbu.ru/~uwe/|   Ist zu Grunde gehen



UTF-8, C1 controls, and UNIX

2001-02-28 Thread Frank da Cruz

The idea behind UTF-8 is to be able to use it in non-Unicode-aware UNIX
versions: It lets you have Unicode filenames, Unicode directory names,
Unicode file contents, Unicode email, etc.  But what it does not do is let
you *type* Unicode into regular UNIX applications or shells, if the UTF-8
happens to contain C1 control characters as do, for example, many of the
Cyrillic letters (e.g. capital A through PE).  Most UNIX terminal drivers
treat incoming C1 controls like their C0 counterparts, so 0x83 == 0x03 ==
Ctrl-C, which interrupts whatever process you are talking to.   Similarly
0x84 ==  Ctrl-D, which is EOF; 0x88 is backspace, and so on.
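To see which characters are affected, one can simply look for bytes 0x80 to 0x9F in the UTF-8 encoding. A small sketch; modelling "treat C1 like C0" as clearing the high bit is an assumption about the terminal driver, not a description of any particular UNIX, and the helper names are hypothetical:

```python
from typing import List


def c1_bytes(text: str) -> List[int]:
    """Bytes of the UTF-8 encoding of `text` that fall in the C1 range 0x80-0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


def as_c0(byte: int) -> int:
    """The C0 control a driver would see if it simply cleared the high bit."""
    return byte & 0x7F


# Capital А (U+0410) through П (U+041F) encode as D0 90 .. D0 9F,
# so each of them carries exactly one C1 byte.
for cp in range(0x0410, 0x0420):
    assert len(c1_bytes(chr(cp))) == 1

# Several lowercase letters hit exactly the controls mentioned above.
for letter, meaning in (("у", "Ctrl-C"), ("ф", "Ctrl-D"), ("ш", "backspace")):
    b = c1_bytes(letter)[0]
    print(f"{letter}: 0x{b:02X} -> 0x{as_c0(b):02X} ({meaning})")
```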

I suppose this is a statement of the obvious, but now that I'm using a
Unicode based terminal emulator with UTF-8 character set and trying to
compose e-mail and netnews containing Russian words in a Telnet session to
UNIX, the problem is suddenly concrete.  We have said that UTF-8 is a kind
of "transport form" that must be decoded before (e.g.) terminal escape
sequences are interpreted in the host-to-terminal direction.  That's fine, the terminal
emulator can (and does) do that.  But in the other direction there is no
such decoder on the UNIX end.  The bare C1 octets are read by the UNIX
terminal driver, which treats them as interrupt, suspend, xoff, tab,
carriage return, linefeed, and all the rest.  Here the model breaks down --
it is not symmetric.

The nice thing about ISO 8859-1 was that it could be freely used in UNIX,
in both directions, without UNIX knowing a thing about it.  The same is not
true for UTF-8.

- Frank




Re: Latin digraph characters

2001-02-28 Thread Pierpaolo BERNARDI


On Wed, 28 Feb 2001, Antoine Leca wrote:
 William Overington wrote:

  Germans transliterate a single cyrillic letter with TSCH, shouldn't
  Unicode have also this tetragraph encoded?  (ducking...)
  
  Is this the Cyrillic letter that is transliterated into English as CH
  pronounced as CH in church?
 
 Yes.
...
  I remember seeing once that his name is sometimes expressed in roman
  characters as Chebyshev and sometimes in another way that I do not precisely
  remember and will not guess at but it began with the letter T.

The initial character of the name is transliterated as CH in English, TCH
in French, TSCH in German, C or CI in Italian, C WITH CARON in the
official Russian transliteration. It's the same character as the first one
in Chajkovskij, Chekhov...
 
 Perhaps "Tchébicheff", i.e. the French way. It happens that the international
 way to spell Russian names is to use the French way of translating. This is
 important for passports, for example.

The Russian passports that I have seen (ok, only two) used the official
transliteration, which is different from the French one. The French
transliteration was popular once, as French was widely known among
upper-class Russians.

In material printed nowadays in Russia I have always seen used the
official transliteration, when needed. 

__
||oka,
  Pierpaolo





Re: What about musical notation?

2001-02-28 Thread Werner LEMBERG


 One application at present would be so that fine quality type set
 illustrations of music and mathematics could be produced by placing
 that sequence of codes in the param statement of a java applet in a
 web page.

You may have a look at Lilypond, which is a free musical typesetting
engine producing output for TeX (direct PS output is still
experimental).

  http://www.cs.uu.nl/~hanwen/lilypond/index.html


Werner



Re: UTF-8, C1 controls, and UNIX

2001-02-28 Thread Frank da Cruz

 Maybe one should make a transmission safe UTF that left C1 alone?
 
Remember this? --

From: Markus Scherer [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Date: Mon, 10 Apr 2000 15:23:53 -0800 (GMT-0800)
Subject: What if UTF-8 had been defined after UTF-16?

What if UTF-8 had been defined just for the code point range 0..0x10FFFF?
What if UTF-8 had been designed to be not just "File-System-Safe" but also
"Terminal-Safe"?

UTF-8 could have had all the nice features that it has now, plus:
- C1 control codes (0x80..0x9f) passed through as single bytes
- no sequences longer than 4 bytes, BMP still covered with 3 bytes
- no checking for code points > 0x10FFFF because
  it could have been designed just for that range
- no minimum-length problem - no security concerns
- all byte values used for some encoding

It would have been possible. Interested? See
http://www.mindspring.com/~markus.scherer/utf-8c1.html .

Note: This is _not_ an approved UTF. I am _not_ proposing this as a new
UTF. This is _not_ compatible with any existing UTF or other Unicode
implementation. It is just a play with bits and bytes, a "what if", a
"Gedankenexperiment".

Just to share a thought -

markus




Re: Question on Unicode data files

2001-02-28 Thread Jianping Yang

Mr Zhang is CEO of that company.

Regards,
Jianping.

John Jenkins wrote:

 On Monday, February 26, 2001, at 09:12 PM, Richard Cook wrote:

  Is there any connection between this http://www.unihan.com.cn/ site and
  IRG? What is UniHan Digital Tech Co.? Their website has some rather
  annoying graphics and windows, but no basic info that I can see ... the
  bottom buttons don't work at all, no?
 

 I don't know who they are.  They're not associated with the IRG as far as
 I'm aware.  I'm checking with Mr. Zhang to see if he's heard of them.

 =
 John H. Jenkins
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 http://homepage.mac.com/jenkins/





Re: UTF-8, C1 controls, and UNIX

2001-02-28 Thread Keld Jørn Simonsen

On Wed, Feb 28, 2001 at 01:11:20PM -0800, Frank da Cruz wrote:
 The idea behind UTF-8 is to be able to use it in non-Unicode-aware UNIX
 versions: It lets you have Unicode filenames, Unicode directory names,
 Unicode file contents, Unicode email, etc.  But what it does not do is let
 you *type* Unicode into regular UNIX applications or shells, if the UTF-8
 happens to contain C1 control characters as do, for example, many of the
 Cyrillic letters (e.g. capital A through PE).  Most UNIX terminal drivers
 treat incoming C1 controls like their C0 counterparts, so 0x83 == 0x03 ==
 Ctrl-C, which interrupts whatever process you are talking to. Similarly
 0x84 == Ctrl-D, which is EOF; 0x88 is backspace, and so on.
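The byte arithmetic in the quoted paragraph can be checked with a short Python sketch. It assumes, as the paragraph describes, a terminal driver that treats a C1 byte (0x80-0x9F) like the C0 control 0x80 below it; the helper names are made up for illustration.

```python
def c1_bytes(ch: str) -> list[int]:
    """Return the bytes of ch's UTF-8 encoding that fall in the C1 range 0x80-0x9F."""
    return [b for b in ch.encode("utf-8") if 0x80 <= b <= 0x9F]


def as_c0(b: int) -> int:
    """Model a terminal driver that treats a C1 byte like its C0 counterpart."""
    return b - 0x80


# Cyrillic capital A (U+0410) through PE (U+041F) encode as D0 90 .. D0 9F,
# so the second byte of each is a C1 control.
for cp in range(0x0410, 0x0420):
    ch = chr(cp)
    seen = ", ".join(f"0x{b:02X} (acts as 0x{as_c0(b):02X})" for b in c1_bytes(ch))
    print(f"U+{cp:04X} {ch}: {ch.encode('utf-8').hex(' ')} -> {seen}")
```

The Ctrl-C and Ctrl-D cases in the quote come from neighbouring letters: U+0403 CYRILLIC CAPITAL LETTER GJE encodes as D0 83, and U+0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE as D0 84. The lowercase letters from U+0430 up encode with a second byte of 0xB0 or higher and are not affected.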

Maybe one should make a transmission-safe UTF that left C1 alone?

keld



RE: Latin digraph characters

2001-02-28 Thread jarkko . hietaniemi

 The initial character of the name is transliterated as CH in English, TCH
 in French, TSCH in German, C or CI in Italian, C WITH CARON in the
 official Russian transliteration. It's the same character as the first one
 in Chajkovskij, Chekhov...

...and as TŠ (T followed by S WITH CARON) in the Finnish transliteration.
(Finnish uses S WITH CARON and Z WITH CARON only to transliterate Cyrillic
and in certain loanwords that contain the same sound, such as "šakki"
for "chess".) When the caron is not available, I think the allowed
fallback is TSH.
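The fallback rule in the last sentence can be sketched in Python, assuming a converter that keeps S WITH CARON when the target character set has it and otherwise writes SH. The table and function name are hypothetical; only the Š/š case mentioned above is covered, and writing "Sh" for the capital is a simplification (all-capitals text would want "SH").

```python
# Hypothetical fallback table: only S WITH CARON, the case discussed above.
CARON_FALLBACK = {"\u0160": "Sh", "\u0161": "sh"}  # Š, š


def with_fallback(text: str, encoding: str = "ascii") -> str:
    """Keep characters the target encoding can represent; replace the rest.

    Characters missing from CARON_FALLBACK become '?'.
    """
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(CARON_FALLBACK.get(ch, "?"))
    return "".join(out)


print(with_fallback("\u0161akki"))                 # shakki (ASCII has no š)
print(with_fallback("\u0161akki", "iso-8859-15"))  # šakki (Latin-9 includes š)
```

Checking the target encoding per character, rather than always substituting, keeps the preferred caron spelling wherever the character set allows it.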







Spacing diacritics in Greek Extended

2001-02-28 Thread Nick NICHOLAS


As you know, in the short term any texts out there in Unicode
polytonic Greek use precomposed characters, as people are not waiting for
the intelligent font engines of the future. To put texts in Unicode, they
convert them from existing codings. In all of these existing codings, be
they 8-bit or ASCII-based (Beta Code), a capital letter with diacritics
(titlecase) is rendered as two glyphs: the diacritics, as a spacing glyph,
and then the capital.

Since people have no familiarity with single-glyph
capitals-with-diacritics, they are doing the same with their precomposed
Unicode glyphs, using the spacing diacritics at the bottom of Greek
Extended. See for example
http://www.fordham.edu/halsall/basis/thomais-uni.html : the diacritics in
section 5, at least, are separate glyphs.

Unicode allows these spacing diacritic glyphs, but the Standard says that
"unless information is present to the contrary", they should be
interpreted as SPACE + the equivalent non-spacing diacritic (Unicode 3.0,
pp. 169-170). Would it be expedient to change this so that they apply to the
following character, as a legitimate legacy concern (which is why the
precomposed characters are there in the first place)?

Fortunately the main online resource for converting into Unicode
polytonic Greek (Sean Redmond's,
http://www.jiffycomp.com/smr/unicode/convert.php3) is well-behaved in
this regard.

-- 
Nick Nicholas. TLG, UCI, USA. [EMAIL PROTECTED]; www.tlg.uci.edu/~opoudjis
 Many among their proselytes had sold their lands and houses to increase
  the public riches of the sect --- at the expense, indeed, of their
  unfortunate children, who found themselves beggars because their
  parents had been saints. (Edward Gibbon, _Decline and Fall_.)




ELL: Aboriginal Language TV Series

2001-02-28 Thread Peter_Constable

FYI
- Forwarded by Peter Constable/IntlAdmin/WCT on 02/28/2001 07:23 PM -

From: "Paul M. Rickard" [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 02/28/2001 04:58 PM
Please respond to: endangered-languages-l
To: "[EMAIL PROTECTED]" [EMAIL PROTECTED]
cc:
Subject: ELL: Aboriginal Language TV Series




For those interested:

Mushkeg Media Inc., a native-owned production company, is currently
broadcasting a 13-part series on APTN (Aboriginal Peoples Television
Network) on the state of Aboriginal languages in Canada.

We are also planning season two of the series and are looking for any
interesting and unique language revitalization programs or initiatives
by individuals, communities or organizations across the land.

The series can be seen on APTN every Thursday at 2:30 pm and 11:30 pm
Eastern time. The current schedule as seen on APTN is listed below. For
more information about the series, or to suggest story ideas, you can
email [EMAIL PROTECTED] or [EMAIL PROTECTED]

Also check out www.aptn.ca for the schedule and the cable channel in
different parts of Canada. The series can also be picked up on Bell
ExpressVu and StarChoice satellite dishes for those in remote areas.

Meegwetch.

Paul M. Rickard
Mushkeg Media Inc.
103 Villeneauve West
Montreal, Quebec
H2T 1R6
[EMAIL PROTECTED]


APTN: FINDING OUR TALK: EPISODES
STARTING THURSDAY FEB. 1 AT 2:30 PM AND 11:30 PM EASTERN TIME

Episode 1 - Feb. 1: Language Among the Skywalkers: Mohawk: This is the
story of the legendary Mohawk ironworkers, and of new approaches to
language instruction for both adults and children within the
contemporary community of Kahnawake.

Episode 2 - Feb. 8: Language Immersion: Cree: This episode will trace
the history of the very successful Cree Language Immersion Program,
developed and implemented in schools in the Cree communities of Northern
Québec.

Episode 3 - Feb. 15: The Trees are Talking: Algonquin: George and
Maggie Wabanonick take a group of teens to the woods to initiate them in
their traditional culture and language. In the classroom, the kids and
teachers struggle with their Algonquin lessons, while the pop group
Anishnabe give the language new life.

Episode 4 - Feb. 22: The Power of Words: Inuktitut: At a language
conference in Puvirnituq, we witness efforts to keep Inuktitut alive and
up-to-date, largely through the knowledge and commitment of elders.

Episode 5 - March 1: Words Travel On Air: Attikamekw, Innu: Karin
Awashish, a young radio journalist working at SOCAM, makes a trip to her
home community to tape interviews and legends told by elders in
Attikamekw, as part of the network's language initiative.

Episode 6 - March 8: Language in the City: Ojibwe/Anishinabe: This
episode will focus on Isadore Toulouse's weekly circuit of four
different urban-based schools, where we witness first-hand, and with
raw immediacy, his efforts to pass on his own enthusiasm and passion for
the Ojibwe language.

Episode 7 - March 15: Getting Into Michif: Michif: We meet some of the
movers and shakers working politically and through the education
system to have Michif recognized as the official language of the Métis,
as well as those whose passion and dedication are evidenced at the
grass-roots level.

Episode 8 - March 22: Plains Talk: Saulteaux: This episode follows the
work of a virtually self-taught, highly motivated language teacher.
Stella Ketchemonia has devoted her life to teaching the Saulteaux
language. She is now a member of the dynamic staff of the Saskatchewan
Indian Federated College.

Episode 9 - March 29: Breaking New Ground: Mi'kmaw: This episode looks
at two projects: a pilot to have Mi'kmaw adopted as an

Re: Spacing diacritics in Greek Extended

2001-02-28 Thread Kenneth Whistler

Nick Nicholas said:

 As you know, in the short term any texts out there in Unicode
 polytonic Greek use precomposed characters, as people are not waiting for
 the intelligent font engines of the future. To put texts in Unicode, they
 convert them from existing codings. In all of these existing codings, be
 they 8-bit or ASCII-based (Beta Code), a capital letter with diacritics
 (titlecase) is rendered as two glyphs: the diacritics, as a spacing glyph,
 and then the capital.
 
 Since people have no familiarity with single-glyph
 capitals-with-diacritics, they are doing the same with their precomposed
 Unicode glyphs, using the spacing diacritics at the bottom of Greek
 Extended. See for example
 http://www.fordham.edu/halsall/basis/thomais-uni.html : the diacritics in
 section 5, at least, are separate glyphs.
 
 Unicode allows these spacing diacritic glyphs, but the Standard says that
 "unless information is present to the contrary", they should be
 interpreted as SPACE + non-spacing equivalent diacritic (Unicode 3.0,
 p.169-170). Would it be expedient to change this to having it postmodify the
 next character, as a legitimate legacy concern (which is why the
 precomposeds are there in the first place?)

No, if what you mean is a mechanical change of interpretation of such
a sequence, so that the Unicode Standard would specify that:

1F0A (for example) = 1FCD, 0391 = 0391, 0313, 0300

The intermediate node of that equivalence would be totally out of
whack for Unicode, formally, since it decomposes instead to:

0020, 0313, 0300, 0391

i.e., not the same as the recursive decomposition of 1F0A.

What the text on pp. 169-170 says, in full, is:

"Decomposition of [Greek Diacritic] Spacing Forms. When decomposing
the spacing forms, the spacing status of the implied usage must be
taken into account. Unless information is present to the contrary,
these spacing forms would be decomposed to U+0020 SPACE followed by
the nonspacing form equivalents shown in Table 7-2."

The exegesis of that passage is as follows.

If you are simply decomposing text by a general algorithm, as for
a Unicode Normalization Form (UAX #15), then you *must* use the
normative decomposition mappings, as specified by that algorithm.
I.e., 1FCD, 0391 normalized to NFKD is 0020, 0313, 0300, 0391
and nothing else.

However, if you have "information present to the contrary", as would
be the case if you were doing intelligent conversion of polytonic
Greek, then it is perfectly o.k. to take a Unicode representation
of a compatibility sequence, i.e. 1FCD, 0391, perhaps obtained by
a one-to-one mapping against an 8-bit implementation, and turn that
into the preferred Unicode representation of polytonic Greek,
i.e., 0391, 0313, 0300. This is a knowing transformation of the
data from one form to another form by a process aware of these
equivalences. But that is comparable, for example, to doing
a transliteration from one form to another form, rather than being
a built-in normative equivalence defined by the Unicode Standard
itself.
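A "knowing transformation" of this kind might look like the following Python sketch. It assumes input produced by a one-to-one mapping from a legacy encoding, in which a spacing diacritic precedes the letter it belongs to; the table covers only four spacing forms, and the table and function names are hypothetical. Each spacing diacritic is moved onto the following letter as combining marks, and NFC is applied afterwards, so 1FCD, 0391 becomes 1F0A, whereas NFKD alone would give 0020, 0313, 0300, 0391.

```python
import unicodedata

# Hypothetical converter table: spacing Greek diacritics (as produced by a
# one-to-one mapping from an 8-bit font) -> equivalent combining marks.
# Only a few entries are shown; a real converter would cover the full set.
SPACING_TO_COMBINING = {
    "\u1FBF": "\u0313",        # PSILI -> COMBINING COMMA ABOVE
    "\u1FFE": "\u0314",        # DASIA -> COMBINING REVERSED COMMA ABOVE
    "\u1FCD": "\u0313\u0300",  # PSILI AND VARIA
    "\u1FCE": "\u0313\u0301",  # PSILI AND OXIA
}


def attach_prefixed_diacritics(text: str) -> str:
    """Move each spacing diacritic onto the following letter, then compose with NFC."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in SPACING_TO_COMBINING and nxt.isalpha():
            # Legacy order is diacritic first, then the letter it modifies.
            out.append(nxt + SPACING_TO_COMBINING[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(f"{ord(attach_prefixed_diacritics(chr(0x1FCD) + chr(0x0391))):04X}")  # 1F0A
```

Keeping the mapping in the converter rather than in normalization mirrors the distinction drawn above: the Standard's decomposition is unchanged, and only a process that knows the data's origin reinterprets it. A spacing form at the end of the text, or one followed by another spacing form, is left unchanged.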

 
 Fortunately the main online resource for converting into Unicode
 polytonic Greek (Sean Redmond's,
 http://www.jiffycomp.com/smr/unicode/convert.php3) is well-behaved in
 this regard.

Good. I expect for these kinds of issues smart implementers ought
to be able to "do the right thing". ;-)

--Ken




RE: Handling UTF-8

2001-02-28 Thread thejokrishna

Hi,
What is your system architecture? A lot depends on it.
By the way, you could check www.unicode.org and go to Useful Resources; I
think it has a few tools and libraries that could be helpful.

Hope this helps
Thejo



 --
 From: Polykarpos Karamaoynas
 Sent: Tuesday, February 27, 2001 7:02 PM
 To:   Unicode List
 Subject:  Handling UTF-8
 
 Hello everybody,
 I'd like to know if there is any available library that handles UTF-8
 strings.
 I intend to build a non-graphical application interface that takes
 Unicode strings as input, processes them, and communicates with a database,
 querying it or sending it data in UTF-8 format.
 Is there anything available that could help me with this?
  
 Thank you in advance.
  
 Polykarpos
 
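As a minimal illustration of the operations such a library has to provide, independent of any particular database API, here is a sketch using only Python's built-in UTF-8 codec: validating and decoding incoming bytes, and noting that byte length and character count differ. The function name is hypothetical, and the sample word is just test data.

```python
def decode_utf8_field(raw: bytes) -> str:
    """Decode bytes received from (or bound for) a database as UTF-8.

    errors="strict" rejects malformed sequences instead of silently
    replacing them, which is usually what you want at a system boundary.
    """
    return raw.decode("utf-8", errors="strict")


text = decode_utf8_field("Καλημέρα".encode("utf-8"))
print(len(text))                  # 8 characters
print(len(text.encode("utf-8")))  # 16 bytes: each of these Greek letters takes 2

try:
    decode_utf8_field(b"\xc0\xaf")  # overlong encoding of '/', not valid UTF-8
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Whatever library is chosen, the same two points apply: decide where bytes become characters (ideally once, at the boundary), and never assume one byte per character when sizing buffers or database columns.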



Re: UTF-8, C1 controls, and UNIX

2001-02-28 Thread DougEwell2

In a message dated 2001-02-28 15:13:02 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

   Maybe one should make a transmission safe UTF that left C1 alone?
   
  Remember this? --
  
  From: Markus Scherer [EMAIL PROTECTED]
  To: "Unicode List" [EMAIL PROTECTED]
  Date: Mon, 10 Apr 2000 15:23:53 -0800 (GMT-0800)
  Subject: What if UTF-8 had been defined after UTF-16?
  
  What if UTF-8 had been defined just for the code point range 0..0x10?
  What if UTF-8 had been designed to be not just "File-System-Safe" but also
  "Terminal-Safe"?

Keld may have been referring to his own "UTF-7d5", described at 
http://www.uni-mainz.de/~knappen/jk009.html.  Like UTF-8, it can express 
BMP characters in no more than 3 code units, but unlike UTF-8 it requires 
the additional layer of UTF-16 to express supplementary characters (so they 
take 6 code units).

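UTF-1, the original UTF, was also designed not to use C0 or C1 bytes, or 
space or DEL, except to represent themselves.  However, UTF-1 could put the 
byte 0x2F (slash) inside a multi-byte sequence, which breaks Unix path names, 
and apparently that "slash" issue was deemed more critical than avoiding C1 
bytes when UTF-8 was designed.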
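The trade-off described here can be checked mechanically for UTF-8. This Python sketch scans every BMP code point outside the surrogate range and confirms that UTF-8 never puts an ASCII byte (such as 0x2F, the slash) inside a multi-byte sequence, while many sequences still contain a C1 byte (0x80-0x9F). It says nothing about UTF-1 or UTF-7d5, which Python's standard library does not implement; the function name is illustrative.

```python
def scan_utf8_bmp() -> tuple[bool, int]:
    """Return (slash_safe, c1_count) for UTF-8 over the BMP, surrogates excluded.

    slash_safe: no multi-byte sequence contains a byte below 0x80 (so never 0x2F).
    c1_count:   number of code points whose encoding contains a byte in 0x80-0x9F.
    """
    slash_safe = True
    c1_count = 0
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded in UTF-8
        encoded = chr(cp).encode("utf-8")
        if any(b < 0x80 for b in encoded):
            slash_safe = False
        if any(0x80 <= b <= 0x9F for b in encoded):
            c1_count += 1
    return slash_safe, c1_count


slash_safe, c1_count = scan_utf8_bmp()
print("no ASCII bytes inside multi-byte sequences:", slash_safe)
print("BMP code points whose UTF-8 form contains a C1 byte:", c1_count)
```

The first result is the "File-System-Safe" property UTF-8 was built for; the second count is what a "Terminal-Safe" variant would have to drive to zero.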

-Doug Ewell
 Fullerton, California



Re: Latin digraph characters

2001-02-28 Thread DougEwell2

In a message dated 2001-02-28 14:13:23 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

  The initial character of the name is transliterated as CH in English, TCH
  in French, TSCH in German, C or CI in Italian, C WITH CARON in the
  official Russian transliteration. It's the same character as the first one
  in Chajkovskij, Chekhov...

Oddly enough, however, the great romantic composer's name is generally 
rendered "Tchaikovsky" in English, with the "tch" combination that is not 
used in any other English transliterations of Russian names.  In olden days 
the German or German-like spelling "Tschaikowsky" was more common.

-Doug Ewell
 Fullerton, California