Re: Script use by Mathematicians (Was Re: Single Unicode Font)

2001-05-23 Thread J%ORG KNAPPEN

Yes,

script and symbol use by mathematicians and scientists is well researched. The
outcome of this research is the STIX proposal for additional math characters
to be added to Unicode.

You may also want to consult the pages on MathML at http://www.w3c.org

--J"org Knappen




[unicode] Re: x-bar character

2001-03-23 Thread J%ORG KNAPPEN

John Cowan wrote:

:Hmm.  If you multiply x-bar by y-bar, surely you want the bars to be
:separated, not run together into a single bar (which would be the mean
:of x times y), no?  In that case COMBINING MACRON would be better.
:Or should x-bar times y-bar be written with a THIN SPACE separating them?

With Unicode 3.2 you can place the character InvisibleTimes (Mathematica speak)
in between. But I agree, combining macron is better than combining bar in this
case. And the Invisible Times will be an optional embellishment for some
time to come -- though it is really useful in MathML and Computer Algebra.

--J"org Knappen
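
A minimal sketch of the sequence discussed above, assuming that "combining
bar" refers to U+0305 COMBINING OVERLINE and that InvisibleTimes corresponds
to U+2062 INVISIBLE TIMES. The helper name mean_product is hypothetical.

```python
import unicodedata

COMBINING_MACRON = "\u0304"    # short bar that stays over a single letter
COMBINING_OVERLINE = "\u0305"  # long bar (assumed to be the "combining bar")
INVISIBLE_TIMES = "\u2062"     # explicit but invisible multiplication operator


def mean_product(a: str, b: str) -> str:
    """Build 'a-bar times b-bar' with separate bars and an explicit operator."""
    return a + COMBINING_MACRON + INVISIBLE_TIMES + b + COMBINING_MACRON


expr = mean_product("x", "y")
# Character names of the resulting sequence, in order.
names = [unicodedata.name(ch) for ch in expr]
```

Printing names lists the five characters of the sequence, so it is easy to
check that the operator really sits between the two barred letters.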




[unicode] Re: Moving mail lists

2001-03-22 Thread J%ORG KNAPPEN


I don't like the [unicode] prefix on all subject lines, because it eats up
too much of the valuable human-readable subject line space. I could live with
it better if it could be shortened to something like [uc], but the best
for me would be dropping it altogether.

And of course, there MUST NOT be any "Re: " after the tag; it must stand
in front of the tag.

Scanning over the Subjects, they all look like

"Re: [unicode] Re:"

to me -- almost no significant information in the first three words. This is
awful.

--J"org Knappen




Script of Elbasan and other Albanian Alphabets

2001-03-09 Thread J%ORG KNAPPEN

Having consulted my references:

The discussion of the Albanian alphabets is in Jenssen, p 494 ff.
Haarmann has nothing about them, not even the pictures.

--J"org Knappen 



Re: Albanian alphabet

2001-03-08 Thread J%ORG KNAPPEN

The Alphabet of Elbasan is reproduced in typical alphabet collections
like

Carl Faulmann: Das Buch der Schrift 18??, many recent reprints
Hans Jenssen: Die Schrift, Akademie-Verlag Berlin 1969
Harald Haarmann: Universalgeschichte der Schrift, Campus, Frankfurt/Main 199?

In one of the latter two references (not having them on my desk, I can't
tell which, but I think it is Haarmann) there is a longer discussion of the
evolution of the Albanian alphabet. According to that reference, the Elbasan
alphabet derives from contemporary handwritten Greek. The reference also shows
a really fancy Latin alphabet used in the first two decades of the 20th century,
with Greek- and Cyrillic-derived letters augmenting the standard Latin
alphabet. Probably not all of those are yet encoded in Unicode/ISO 10646.

--J"org Knappen



Re: UTF-8, C1 controls, and UNIX

2001-03-01 Thread J%ORG KNAPPEN

Keld wrote:

> Maybe one should make a transmission safe UTF that left C1 alone?

There already is utf-7d5 created exactly for this purpose ... see
http://www.uni-mainz.de/~knappen/jk009.html and
http://www.uni-mainz.de/~knappen/jk010.html .

It also has the nice feature of escaping the Latin-1 letters (but not the
symbols!) with a pound sign, thus being almost human-readable in a Latin-1
context.

--J"org Knappen
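
To illustrate why a C1-safe transformation format is wanted at all, here is a
small sketch that lists the bytes of a UTF-8 encoding that fall into the C1
control range 0x80..0x9F. It does not implement utf-7d5; the function name
c1_range_bytes is hypothetical.

```python
def c1_range_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of text that lie in the C1 range 0x80..0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# U+00C4 encodes as C3 84, so its trail byte collides with a C1 control.
collides = c1_range_bytes("\u00c4")
# U+00E9 encodes as C3 A9, which stays clear of the C1 range.
clean = c1_range_bytes("\u00e9")
```

Any channel that interprets or strips C1 controls can therefore damage
ordinary UTF-8 text, even text limited to Latin-1 letters.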



utf-1.3 and utf-1.4

2001-02-28 Thread J%ORG KNAPPEN

On
 
http://www.atm.ch.cam.ac.uk/acmsu/utf/

I found the acronym utf used in a very different way than 
UNicoders/ISO10646ers use it. Fortunately, there never was 
a utf-1.3 or utf-1.4 in our context.

--J"org Knappen



Re: Latin digraph characters (was: Re: Klingon silliness)

2001-02-28 Thread J%ORG KNAPPEN

Doug Ewell asked:

> Aren't Serbian and Croatian the standard example of two "languages" that are 
> really the same language but are treated separately (a) for political reasons 
> and (b) because Cyrillic is used to write the former and Latin to write the 
> latter?  Are there any linguistic or vocabulary differences between them?

The matter is much more complicated here. Linguistically speaking,
there is a South Slavonic dialect continuum from Slovenian to Bulgarian
with no sharp language boundaries. There are, of course, many feature
boundaries and isoglosses, as usual in dialect continua.

Any national language is a construction (where the degree of constructedness
varies considerably). Serbo-Croatian (as a single language) is essentially
a 19th century construction and became the national language of Yugoslavia
after WW I. Serbian, Croatian, Bosnian (and maybe Montenegrin soon) are more
recent constructions from before and after the split of Yugoslavia into parts.

There is a lot of prescriptive language planning going on in order to make
the three languages more different from each other. The national languages
do not map onto the major dialect boundaries in the dialect continuum.

If you can read German, I recommend the book by

Detlev Blanke, Internationale Plansprachen, Akademie-Verlag Berlin

which contains lots of examples of how national languages contain
planned elements. It proceeds with a survey of planned languages and
Esperanto. Did you know that Slovak was reconstructed in the 19th century
in order to make it more different from Czech?

--J"org Knappen



Re: Inverted breve in Greek?

2001-02-22 Thread J%ORG KNAPPEN

Inverted breve is one of the possibilities to represent the
Greek circumflex accent (in Unicode called PERISPOMENI).
It looks very British to my eyes; here in Germany one usually sees
the tilde as representation.

Note that there is a floating PERISPOMENI at U+1FC0; it is not
unified with the Latin tilde accent.

--J"org Knappen



Re: Inverted breve in Greek?

2001-02-22 Thread J%ORG KNAPPEN

Erratum: the combining perispomeni is at U+0342; I first dug out
the non-combining one.

--J"org Knappen 
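
A short sketch that checks the two code points named in this thread against
the Unicode character database shipped with Python. The variable names are
only illustrative.

```python
import unicodedata

SPACING_PERISPOMENI = "\u1fc0"    # the free-standing ("floating") character
COMBINING_PERISPOMENI = "\u0342"  # the combining mark from the erratum
COMBINING_TILDE = "\u0303"        # the Latin tilde accent, a separate character

# Names as recorded in the character database.
spacing_name = unicodedata.name(SPACING_PERISPOMENI)
combining_name = unicodedata.name(COMBINING_PERISPOMENI)

# A non-zero combining class marks a combining character.
is_combining = unicodedata.combining(COMBINING_PERISPOMENI) != 0

# Alpha followed by the combining perispomeni composes to one character.
composed = unicodedata.normalize("NFC", "\u03b1" + COMBINING_PERISPOMENI)
```

The composed result is a single precomposed Greek letter, which shows that
the combining perispomeni is treated as a Greek accent in its own right and
not as a variant of the tilde.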



Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in

2001-02-20 Thread J%ORG KNAPPEN

Doug Ewell wrote:

> A few days ago I said there was a "widespread belief" that Unicode is a 
> 16-bit-only character set that ends at U+FFFF.  A corollary is that the 
> supplementary characters ranging from U+10000 to U+10FFFF are either 
> little-known or perceived to belong to ISO/IEC 10646 only, not to Unicode.

This still echoes the marketing hype of Unicode 1.0 (which was before the
merger with ISO 10646).

> At least one list member questioned whether this belief was really widespread.

Since there was much noise about Unicode 1.0, this belief is widely
entrenched. Only the technical experts who keep up with the updates know better.

> "A 16-bit character encoding standard developed by the Unicode Consortium 
> between 1988 and 1991.  By using two bytes to represent each character, 
> Unicode enables almost all of the written languages of the world to be 
> represented using a single character set.  By contrast, 8-bit ASCII is not 
> capable of representing all of the combinations of letters and diacritical 
> marks that are used just with the Roman alphabet.

A little out of date, but describing correctly the state of the art in 1991
before the merger. Even 8-bit ASCII is a correct term meaning ISO-8859-1.
A nit to pick: It's the Latin alphabet, not Roman. Roman is a kind of typeface,
contrasting with sans serif aka grotesque.
 
> "Approximately 39,000 of the 65,536 possible Unicode character codes have 
> been assigned to date, 21,000 of them being used for Chinese ideographs.  The 
> remaining combinations are open for expansion.

Also true (no Hangul syllables at that time).

> "See also ASCII."

> Exercise for the reader:  See how many misstatements about Unicode (and 
> ASCII) you can find in this text.

Fewer than you expect. Only the target described does not exist any longer.
Since the merger with ISO 10646 was foreseeable even at that time, there are
no implementations of Unicode 1.0 anyway.

--J"org Knappen
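
As a sketch of why Unicode is no longer a plain 16-bit code, the following
computes the UTF-16 code units for a code point. Anything above the 16-bit
range needs a surrogate pair. The function name utf16_units is hypothetical.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (one or two) for a Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # fits into a single 16-bit unit
    offset = cp - 0x10000  # 20 bits, split into two 10-bit halves
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


first_supplementary = utf16_units(0x10000)
last_supplementary = utf16_units(0x10FFFF)
```

The result can be cross-checked against Python's own codec, for instance by
comparing with chr(cp).encode("utf-16-be").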



Esperanto (estis: [OT] Close to latin)

2001-01-03 Thread J%ORG KNAPPEN

Antoine Leca wrote:

>  Esperanto
>  showed us that a fossilized language cannot aim at being lingua franca
>  (at least, this is what I learnt from the linguists I read; I welcome
>  counter arguments).

Several errors here: 

First of all, a fossilized language can indeed be the lingua franca of an epoch,
as the example of Latin in Europe for a long time shows.

Second, there is an error on the nature of Esperanto: Although it started as
a planned and designed language, it now shows all features of language evolution:
innovation in vocabulary and grammar (e.g. the male moving suffix -icho), while
some vocabulary and some grammatical features become obsolete and sound archaic.

Esperanto surely can _aim at being lingua franca_, however I doubt that it will
succeed in this aim. It has its merits, however, and will survive as the 
communication language of its own tribe.

There is another point: All languages have a certain degree of plannedness.
A rough ordering may look like:

Loglan -- Esperanto -- Ivrith -- Slovak, Estonian -- French -- German -- ...

Becoming on-topic again:

There seems to be a strong analogy between languages and character sets. Any 
character to be encoded is a human invention. There are no 'natural' characters
at all. For some characters, it is no longer known who invented them, when,
and why; for others we know these facts quite exactly (e. g. Latin letter j
with circumflex). The fact that a character (or a complete script) is 'made up'
or invented by someone gives no argument (neither pro nor contra) for its
inclusion in Unicode.

The need to put text containing a certain character or script onto the 
computer, and ongoing publication activity are arguments. Characters worth
encoding are like living languages in this respect. They need to have at
least some market share (which may be small compared to the 'big players').

--J"org Knappen




Re: Information about curly-tailed phonetic letters

2000-12-17 Thread J%ORG KNAPPEN

The curly-tail consonants t, d, n, l, c, z are also included in the
TeX IPA (tipa fonts). The documentation of those fonts is available
on 

ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz

--J"org Knappen





Re: Information about curly-tailed phonetic letters

2000-11-24 Thread J%ORG KNAPPEN

The curly-tail consonants t, d, n, l, c, z are also included in the
TeX IPA (tipa fonts). The documentation of those fonts is available
on 

ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz

--J"org Knappen





Missing mathematical character discovered

2000-10-04 Thread J%ORG KNAPPEN

Dear colleagues,

I noticed that the following mathematical character seems to be absent both
from current Unicode and from the STIX proposal:

|=| tautological equivalent sign
* German: gleichstark
* mathematical relation (R)
* Reference: Bauer and Wirsing, Elementare Aussagenlogik, Springer-Verlag
  Berlin/Heidelberg, 1991, page 32 ff.
* Looks like TeX's \models with a closing vertical bar added
* Simple ASCII graphics: |=|
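
A rough LaTeX sketch of how such a glyph could be approximated until it is
available in a font, modelled on the way plain TeX builds \models from a
vertical bar and an equals sign. The macro name \tauteq is hypothetical, and
the spacing is untested against the printed reference.

```latex
% Assumption: \models is built as \mathrel|\joinrel= ; this adds a closing bar.
\newcommand{\tauteq}{\mathrel{\mathrel|\joinrel=\joinrel\mathrel|}}

% Usage: $A \tauteq B$
```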

Yours,

J"org Knappen
Springer-Verlag Heidelberg


* See you at the MathML conference at Urbana/Champaign



Re: New Name Registry Using Unicode

2000-09-29 Thread J%ORG KNAPPEN

There is another serious problem:

Characters sharing the same glyph, but being different.

In Russia, users of TeX got annoyed when they got the error message
"Undefined control sequence" after typing in \TeX. The command is known
if and only if all three letters are Latin. There are 8 possible
spellings of TeX, 7 of them invalid. Greek adds more possibilities
if you allow for capital letters. Forcing lowercase makes the situation
better, but does not resolve it completely ("a", "e", "y" Latin/Cyrillic;
"o" Latin/Cyrillic/Greek are examples).

--J"org Knappen
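
The count of 8 spellings can be reproduced with a short sketch that picks,
for each of the three letters, either the Latin character or its Cyrillic
look-alike. The table LOOKALIKES is an assumption about which Cyrillic
letters are meant.

```python
from itertools import product

# (Latin letter, Cyrillic look-alike) for each position of "TeX".
LOOKALIKES = [
    ("T", "\u0422"),  # CYRILLIC CAPITAL LETTER TE
    ("e", "\u0435"),  # CYRILLIC SMALL LETTER IE
    ("X", "\u0425"),  # CYRILLIC CAPITAL LETTER HA
]

# Every way of choosing Latin or Cyrillic per position: 2 * 2 * 2 strings.
all_spellings = ["".join(choice) for choice in product(*LOOKALIKES)]

# Only the all-Latin spelling consists purely of ASCII letters.
valid = [s for s in all_spellings if s.isascii()]
```

All eight strings look alike in most fonts, yet only one of them names the
macro that TeX actually defines.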
 



Re: unicode + oracle query....... (suggestions needed...)

2000-09-27 Thread J%ORG KNAPPEN

Sandeep Krishna wrote:

>   * some unicode characters(or rather code points.) like' F95F' when encoded
> in UTF-8 was being encoded as EF A5 BF, when it should have been encoded as
> EF A5 9F..  in fact many unicode charcters whose encoded form had to had a
> byte in the range (80..9F) were being somehow changed to BF ... thus
> resulting in incorrect retrieval

Oops, it seems that this particular version of Oracle is only 7.5-bit clean ...
Hope they fix it soon, otherwise you need UTF-7d5 (unofficial) as a workaround.

--J"org Knappen
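
The reported symptom can be reproduced with a small sketch. The function
clobber_c1_range is a hypothetical model of the damage described in the
quoted message (every byte in 0x80..0x9F turned into 0xBF); it is not a
description of what Oracle actually does internally.

```python
def clobber_c1_range(data: bytes) -> bytes:
    """Model of the reported damage: bytes 0x80..0x9F are replaced by 0xBF."""
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)


good = "\uf95f".encode("utf-8")  # correct UTF-8 encoding of U+F95F
bad = clobber_c1_range(good)     # what reportedly ended up stored
wrong_char = bad.decode("utf-8") # still valid UTF-8, but another character
```

Because the damaged bytes still form valid UTF-8, the corruption is silent:
the text decodes without error to a different code point.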



[very OT] Rotwelsch

2000-09-26 Thread J%ORG KNAPPEN

Rotwelsch is an argot, spoken several hundred years ago throughout
Europe. It was based on French with many words from Hebrew. It was intentionally
obscure to outsiders.

If I remember right, Francois Villon wrote poems and songs in Rotwelsch.

--J"org Knappen



Re: [very OT] "Slavic"

2000-09-21 Thread J%ORG KNAPPEN



No, in German "welsch" always means a Romance language (in most
cases French, but also Italian and even Romanian can fill in). Note
also "rotwelsch".

The "generic" term for Slavonic languages is "wendisch" or "windisch",
derived from the formerly Slavonic "Wenden", who settled in a region
called "Wendland" (approximately identical to today's Landkreis
Lüchow-Dannenberg, north of Uelzen).

--J"org Knappen



Re: TATAP => TATAR

2000-09-19 Thread J%ORG KNAPPEN

Browsing the picture given at the Radio Free Europe site, there is one
pair of suspicious letters:

The Tatar letter Eng has a shape sufficiently different from standard Latin
eng to be considered unsupported by Unicode.

The O with bar I finally found to be already encoded.

However, Radio Free Europe is not what I'd call a primary source, more 
research is definitely needed.

--J"org Knappen

P.S. Bad news for the fans of the dark G -- it is not resurrected, at least
according to this source.



Re: TATAP => TATAR

2000-09-19 Thread J%ORG KNAPPEN

I'd really like to see the new Latin alphabet of Tatar. A transition can
be very smooth if the new alphabet is just a transliteration of the old
one. Then in Tatarstan there will be a situation like in Yugoslavia before
the split: One written language with two easily convertible alphabets.

For standardisation, there may occur further cases like "LJ"/"Lj"/"lj"
with Tatar.

But without further information, this is speculative only.

--J"org Knappen
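
For reference, a sketch of the three-way case distinction behind the
"LJ"/"Lj"/"lj" example, assuming it refers to the digraph characters
U+01C7..U+01C9 that exist for round-trip conversion with Cyrillic.

```python
import unicodedata

UPPER, TITLE, LOWER = "\u01c7", "\u01c8", "\u01c9"

# One lowercase character has two distinct "capital" forms.
as_upper = LOWER.upper()  # all-caps form
as_title = LOWER.title()  # form used at the start of a capitalised word

# Compatibility normalisation splits each digraph into two Latin letters.
expanded = [unicodedata.normalize("NFKC", ch) for ch in (UPPER, TITLE, LOWER)]
```

A new Latin alphabet with digraphs that map to single Cyrillic letters would
raise the same question of upper, title and lower case forms.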



Re: the Ethnologue

2000-09-14 Thread J%ORG KNAPPEN

What really makes me wonder is that the Ethnologue seems to ignore the
vast amount of published information on the German language and its dialects.
There is more than a century of dialectological research on German, and there
are easily accessible publications showing the major and minor subdivisions
of the German language.

The Ethnologue gives a very strange picture there, compared to the mainstream
German literature. Maybe because German dialectologists prefer to publish
in German?

--J"org Knappen

P.S. For fans of the German language, I recommend:

Werner König, DTV-Atlas zur deutschen Sprache, DTV München, 10th printing 1994,
ISBN 3-423-03025-9

Make sure to get the 10th printing or a later one; it contains more
fascinating material.




Re: the Ethnologue

2000-09-12 Thread J%ORG KNAPPEN

Rick McGowan asked:

> Can anyone point me to an existing list of languages that is more
> comprehensive and better researched than the Ethnologue?  If there is no
> such list, then we don't need to consider any alternatives, right?

Ask the closest university department of comparative linguistics, and you will
receive quite impressive lists. As a starter,
David Crystal's Cambridge Encyclopedia of Language contains a good list
of languages in one of its appendices.

I once looked at the Ethnologue, and its subdivision of the German language
is just ridiculous. Not small errors, a gross misconception. I don't trust
the Ethnologue in areas where I don't know the facts well, since it fails in the
one area where I do know them.

--J"org Knappen




Re: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion

2000-09-08 Thread J%ORG KNAPPEN

I am surprised that no one has suggested free recode (under GNU copyleft) yet.
It can do all the mentioned conversions and many more.

It also has a nice Perl module as a frontend. It runs under any operating system,
including WIN32.

--J"org Knappen



Swiss numerical format (war einmal: What is ` (U+0060) for?)

2000-08-10 Thread J%ORG KNAPPEN

As an aside:

Are there good (authoritative) references on the so-called
Swiss numerical format with its peculiar thousands separator?

I only know about a manual shipped with some Aldus software product
as a reference. I own several books printed in Switzerland and they
show the typical Swiss orthography (lack of ß), but all show one of
the two usual German number formats (. or \, (thin space) as thousands
separator).

--J"org Knappen
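
For comparison, a sketch that renders one number in the formats under
discussion. It assumes that the "peculiar" Swiss separator is the apostrophe,
which the message itself does not spell out; the function name
group_thousands is hypothetical.

```python
def group_thousands(n: int, sep: str) -> str:
    """Format an integer with sep between groups of three digits."""
    return f"{n:,}".replace(",", sep)


swiss = group_thousands(1234567, "'")           # assumed Swiss convention
german_dot = group_thousands(1234567, ".")      # German format with a dot
german_thin = group_thousands(1234567, "\u2009")  # German format, thin space
```

The separator is passed in explicitly so that the same routine covers all
three conventions without relying on installed system locales.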



Re: Addition of remaining two Maltese Characters to Unicode

2000-08-01 Thread J%ORG KNAPPEN

John Cowan asked:

> I have a recollection of seeing a list of Chinese words written in pinyin
> but alphabetized according to bopomofo rules.  Is this commonplace?

I have seen word lists of Indic languages (mostly Sanskrit) printed in Latin
transliteration but sorted according to the Devanagari alphabet. The audience
of the material is linguists who know how to sort Devanagari. It is for sure not
"commonplace", but also not really rare.

--J"org Knappen
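
A minimal sketch of such a sort, assuming an IAST-style transliteration and
the traditional order of vowels followed by consonant rows. The table and the
function name sort_key are illustrative, and real word lists may need further
rules.

```python
import unicodedata

# Traditional Devanagari order, written in Latin transliteration.
DEVANAGARI_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au", "ṃ", "ḥ",
    "k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "ś", "ṣ", "s", "h",
]
RANK = {unicodedata.normalize("NFC", unit): i
        for i, unit in enumerate(DEVANAGARI_ORDER)}


def sort_key(word: str) -> list[int]:
    """Map a transliterated word to a list of ranks in Devanagari order."""
    word = unicodedata.normalize("NFC", word.lower())
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in RANK:           # digraphs such as "kh" or "ai" come first
            key.append(RANK[pair])
            i += 2
        else:                      # raises KeyError for unknown letters
            key.append(RANK[word[i]])
            i += 1
    return key


words = ["deva", "karma", "agni", "bhakti", "indra"]
in_devanagari_order = sorted(words, key=sort_key)
```

Vowels sort before all consonants and "k" sorts before "d" and "bh", so the
result differs noticeably from plain Latin alphabetical order.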