Re: Script use by Mathematicians (Was Re: Single Unicode Font)
Yes, script and symbol use by mathematicians and scientists is well researched. The outcome of this research is the STIX proposal for additional math characters to be added to Unicode. You may also want to consult the pages on MathML at http://www.w3c.org --J"org Knappen
[unicode] Re: x-bar character
John Cowan wrote:
: Hmm. If you multiply x-bar by y-bar, surely you want the bars to be separated, not run together into a single bar (which would be the mean of x times y), no? In that case COMBINING MACRON would be better. Or should x-bar times y-bar be written with a THIN SPACE separating them?
With Unicode 3.2 you can place the character InvisibleTimes (in Mathematica speak) in between. But I agree, combining macron is better than combining bar in this case. And the invisible times will remain an optional embellishment for some time to come, though it is really useful in MathML and computer algebra. --J"org Knappen
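As a rough illustration of the character sequence discussed above, the following Python sketch builds x-bar times y-bar from COMBINING MACRON with INVISIBLE TIMES in between and prints the name of each code point. The variable names are made up for this example, and reading "combining bar" as COMBINING OVERLINE (U+0305) is an assumption.

```python
import unicodedata

COMBINING_MACRON = "\u0304"    # short bar, stays separate on each letter
COMBINING_OVERLINE = "\u0305"  # assumption: this is the "combining bar"; adjacent overlines connect
INVISIBLE_TIMES = "\u2062"     # the InvisibleTimes character mentioned in the message

x_bar = "x" + COMBINING_MACRON
y_bar = "y" + COMBINING_MACRON

# x-bar times y-bar: two separate bars with an invisible operator between them.
product = x_bar + INVISIBLE_TIMES + y_bar

for ch in product:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A renderer that does not know INVISIBLE TIMES is expected to show nothing for it, which is why it can be treated as optional.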
[unicode] Re: Moving mail lists
I don't like the [unicode] prefix on all subject lines, because it eats up too much of the valuable human-readable subject line space. I could live with it better if it were shortened to something like [uc], but best for me would be dropping it altogether. And of course, there MUST NOT be any "Re: " after the tag; it must stand in front of the tag. Scanning over the subjects, they all look like "Re: [unicode] Re:" to me -- almost no significant information in the first three words. This is awful. --J"org Knappen
Script of Elbasan and other Albanian Alphabets
Having consulted my references: the discussion of the Albanian alphabets is in Jenssen, p. 494 ff. Haarmann has nothing about them, not even the pictures. --J"org Knappen
Re: Albanian alphabet
The alphabet of Elbasan is reproduced in typical alphabet collections such as:
Carl Faulmann: Das Buch der Schrift, 18??, many recent reprints
Hans Jenssen: Die Schrift, Akademie-Verlag Berlin 1969
Harald Haarmann: Universalgeschichte der Schrift, Campus, Frankfurt/Main 199?
One of the latter two references (not having them at my desk I can't tell which, but I think it is Haarmann) has a longer discussion of the evolution of the Albanian alphabet. According to that reference, the Elbasan alphabet derives from contemporary handwritten Greek. The reference also shows a really fancy Latin alphabet used in the first two decades of the 20th century, with Greek- and Cyrillic-derived letters augmenting the standard Latin alphabet. Probably not all of those are yet encoded in Unicode/ISO 10646. --J"org Knappen
Re: UTF-8, C1 controls, and UNIX
Keld wrote:
> Maybe one should make a transmission safe UTF that left C1 alone?
There already is utf-7d5, created exactly for this purpose; see http://www.uni-mainz.de/~knappen/jk009.html and http://www.uni-mainz.de/~knappen/jk010.html . It also has the nice feature of escaping the Latin-1 letters (but not the symbols!) with a pound sign, thus being almost human readable in a Latin-1 context. --J"org Knappen
utf-1.3 and utf-1.4
On http://www.atm.ch.cam.ac.uk/acmsu/utf/ I found the acronym utf used in a very different way from how Unicoders/ISO 10646ers use it. Fortunately, there never was a utf-1.3 or utf-1.4 in our context. --J"org Knappen
Re: Latin digraph characters (was: Re: Klingon silliness)
Doug Ewell asked:
> Aren't Serbian and Croatian the standard example of two "languages" that are really the same language but are treated separately (a) for political reasons and (b) because Cyrillic is used to write the former and Latin to write the latter? Are there any linguistic or vocabulary differences between them?
The matter is much more complicated here. Linguistically speaking, there is a South Slavonic dialect continuum from Slovenian to Bulgarian with no sharp language boundaries. There are, of course, many feature boundaries and isoglosses, as usual in dialect continua.
Any national language is a construction (where the degree of constructedness varies considerably). Serbo-Croatian (as a single language) is essentially a 19th century construction and became the national language of Yugoslavia after WW I. Serbian, Croatian, Bosnian (and maybe Montenegrin soon) are more recent constructions from before and after the split of Yugoslavia into parts. There is a lot of prescriptive language planning going on in order to make the three languages more different from each other. The national languages do not map onto the major dialect boundaries in the dialect continuum.
If you can read German, I recommend the book by Detlev Blanke, Internationale Plansprachen, Akademie-Verlag Berlin, which contains lots of examples of how national languages contain planned elements. It proceeds with a survey of planned languages and Esperanto. Did you know that Slovak was reconstructed in the 19th century in order to make it more different from Czech? --J"org Knappen
Re: Inverted breve in Greek?
Inverted breve is one of the possibilities to represent the Greek circumflex accent (in Unicode called PERISPOMENI). It looks very British to my eyes; here in Germany one usually sees the tilde as representation. Note that there is a floating PERISPOMENI at U+1FC0; it is not unified with the Latin tilde accent. --J"org Knappen
Re: Inverted breve in Greek?
Erratum: the combining perispomeni is at U+0342; I first dug up the non-combining one. --J"org Knappen
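For readers who want to check the code points named in this erratum, here is a small Python sketch using the standard unicodedata module. The constant and function names are illustrative only; the output depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata

COMBINING_PERISPOMENI = "\u0342"     # the combining character from the erratum
SPACING_PERISPOMENI = "\u1fc0"       # the non-combining one found first
COMBINING_TILDE = "\u0303"           # the Latin tilde accent, encoded separately
COMBINING_INVERTED_BREVE = "\u0311"  # the alternative glyph shape under discussion

def describe(ch):
    """Return (code point, name, general category) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)

for ch in (COMBINING_PERISPOMENI, SPACING_PERISPOMENI,
           COMBINING_TILDE, COMBINING_INVERTED_BREVE):
    print(*describe(ch))

# A precomposed Greek letter decomposes to base letter + U+0342, not + U+0303.
alpha_perispomeni = "\u1fb6"
decomposed = unicodedata.normalize("NFD", alpha_perispomeni)
print([f"U+{ord(c):04X}" for c in decomposed])
```

Category Mn marks a non-spacing combining mark, which is one way to tell the two perispomeni characters apart programmatically.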
Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in
Doug Ewell wrote:
> A few days ago I said there was a "widespread belief" that Unicode is a 16-bit-only character set that ends at U+FFFF. A corollary is that the supplementary characters ranging from U+10000 to U+10FFFF are either little-known or perceived to belong to ISO/IEC 10646 only, not to Unicode.
This still echoes the marketing hype of Unicode 1.0 (which was before the merger with ISO 10646).
> At least one list member questioned whether this belief was really widespread.
Since there was much noise about Unicode 1.0, this belief is widely implemented. Only the technical experts who keep up with the updates know better.
> "A 16-bit character encoding standard developed by the Unicode Consortium between 1988 and 1991. By using two bytes to represent each character, Unicode enables almost all of the written languages of the world to be represented using a single character set. By contrast, 8-bit ASCII is not capable of representing all of the combinations of letters and diacritical marks that are used just with the Roman alphabet.
A little out of date, but correctly describing the state of the art in 1991 before the merger. Even "8-bit ASCII" is a correct term, meaning ISO-8859-1. A nit to pick: it's the Latin alphabet, not Roman. Roman is a kind of typeface, contrasting with sans serif aka grotesque.
> "Approximately 39,000 of the 65,536 possible Unicode character codes have been assigned to date, 21,000 of them being used for Chinese ideographs. The remaining combinations are open for expansion.
Also true (no Hangul syllables at that time).
> "See also ASCII."
> Exercise for the reader: See how many misstatements about Unicode (and ASCII) you can find in this text.
Fewer than you expect. Only the target described no longer exists. Since the merger with ISO 10646 was foreseeable even at that time, there are no implementations of Unicode 1.0 anyway. --J"org Knappen
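To see why "16-bit" no longer describes Unicode, the sketch below computes the UTF-16 code units for a code point by hand and compares the result with Python's own encoder. The function name utf16_code_units is made up for this illustration; the arithmetic is the standard surrogate-pair calculation.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units UTF-16 uses for one code point."""
    if code_point < 0x10000:
        # Fits in a single 16-bit unit (the range Unicode 1.0 covered).
        return [code_point]
    # Supplementary characters are split into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

for cp in (0x0041, 0xFFFD, 0x1D400, 0x10FFFF):
    units = utf16_code_units(cp)
    encoded = chr(cp).encode("utf-16-be")
    print(f"U+{cp:04X}", [f"{u:04X}" for u in units], encoded.hex())
```

Any code point above U+FFFF takes two 16-bit units, so software that assumes one unit per character mishandles exactly the supplementary range.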
Esperanto (was: [OT] Close to Latin)
Antoine Leca wrote:
> Esperanto showed us that a fossilized language cannot aim at being lingua franca (at least, this is what I learnt from the linguists I read; I welcome counter arguments).
Several errors here: First of all, a fossilized language can indeed be the lingua franca of an epoch, as the example of Latin in Europe over a long period shows. Second, there is an error about the nature of Esperanto: although it started as a planned and designed language, it now shows all the features of language evolution: innovation in vocabulary and grammar (e.g. the male-marking suffix -icho), while some vocabulary and some grammatical features become obsolete and sound archaic.
Esperanto surely can _aim at being lingua franca_; however, I doubt that it will succeed in this aim. It has its merits, though, and will survive as the communication language of its own tribe.
There is another point: All languages have a certain degree of plannedness. A rough ordering may look like: Loglan -- Esperanto -- Ivrith -- Slovak, Estonian -- French -- German -- ...
Becoming on-topic again: There seems to be a strong analogy between languages and character sets. Any character to be encoded is a human invention. There are no 'natural' characters at all. For some characters, it is no longer known who invented them, when, and why; for others we know these facts quite exactly (e.g. Latin letter j with circumflex). The fact that a character (or a complete script) is 'made up' or invented by someone gives no argument (neither pro nor contra) for its inclusion in Unicode. The need to put text containing a certain character or script onto the computer, and ongoing publication activity, are arguments. Characters worth encoding are like living languages in this respect. They need to have at least some market share (which may be small compared to the 'big players'). --J"org Knappen
Re: Information about curly-tailed phonetic letters
The curly-tail consonants t, d, n, l, c, z are also included in the TeX IPA fonts (tipa). The documentation of those fonts is available at ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz --J"org Knappen
Missing mathematical character discovered
Dear colleagues,
I noticed that the following mathematical character seems to be absent both from current Unicode and from the STIX proposal:
|=| tautological equivalent sign
* German: gleichstark
* mathematical relation (R)
* Reference: Bauer and Wirsing, Elementare Aussagenlogik, Springer-Verlag Berlin/Heidelberg, 1991, page 32 ff.
* Looks like TeX's \models with a closing vertical bar added
* Simple ASCII graphics: |=|
Yours, J"org Knappen
Springer-Verlag Heidelberg
* See you at the MathML conference at Urbana/Champaign
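The description "\models with a closing vertical bar added" can be approximated in LaTeX as follows. This is only a sketch: \tauteq is a hypothetical macro name not taken from any package, and the construction simply extends plain TeX's definition of \models (\mathrel|\joinrel=) by one more bar, so the spacing may need tuning for a given font.

```latex
\documentclass{article}
% \tauteq is a hypothetical name chosen for this illustration.
% It follows plain TeX's \models (\mathrel|\joinrel=) and appends
% a second vertical bar on the right; the outer \mathrel makes the
% whole construction behave as a single relation.
\newcommand{\tauteq}{\mathrel{\mathrel|\joinrel=\joinrel\mathrel|}}
\begin{document}
$A \tauteq B$ \quad compared with \quad $A \models B$
\end{document}
```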
Re: New Name Registry Using Unicode
There is another serious problem: characters that share the same glyph but are different. In Russia, users of TeX got annoyed when they got the error message "Undefined control sequence" after typing \TeX. The command is known if and only if all three letters are Latin. There are 8 possible spellings of TeX, 7 of them invalid. Greek adds more possibilities if you allow for capital letters. Forcing lowercase makes the situation better, but does not resolve it completely ("a", "e", "y" Latin/Cyrillic; "o" Latin/Cyrillic/Greek are examples). --J"org Knappen
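The count of 8 spellings can be reproduced with a few lines of Python. This is a minimal sketch: the table of look-alike pairs and the helper names are made up for the illustration, and only the Latin/Cyrillic pairs for T, e, and X are considered.

```python
from itertools import product
import unicodedata

# Each position of "TeX" has a Latin letter and a Cyrillic look-alike.
LOOKALIKES = [
    ("T", "\u0422"),  # LATIN CAPITAL LETTER T / CYRILLIC CAPITAL LETTER TE
    ("e", "\u0435"),  # LATIN SMALL LETTER E  / CYRILLIC SMALL LETTER IE
    ("X", "\u0425"),  # LATIN CAPITAL LETTER X / CYRILLIC CAPITAL LETTER HA
]

def spellings():
    """All strings that look like 'TeX' when the two scripts are mixed."""
    return ["".join(chars) for chars in product(*LOOKALIKES)]

def is_all_latin(word):
    """True if every character's Unicode name starts with LATIN."""
    return all(unicodedata.name(ch).startswith("LATIN") for ch in word)

variants = spellings()
valid = [w for w in variants if is_all_latin(w)]
print(len(variants), "spellings,", len(valid), "valid")
```

All eight strings are expected to render identically in most fonts, yet only the all-Latin one compares equal to the command name TeX.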
Re: unicode + oracle query....... (suggestions needed...)
Sandeep Krishna wrote:
> * some unicode characters(or rather code points.) like' F95F' when encoded in UTF-8 was being encoded as EF A5 BF, when it should have been encoded as EF A5 9F.. in fact many unicode charcters whose encoded form had to had a byte in the range (80..9F) were being somehow changed to BF ... thus resulting in incorrect retrieval
Oops, it seems that this particular version of Oracle is only 7.5-bit clean ... I hope they fix it soon; otherwise you need UTF-7d5 (unofficial) as a workaround. --J"org Knappen
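The byte values in the report can be checked with a short Python sketch. The function simulate_damage is hypothetical: it only mimics the symptom described in the quoted message (bytes in 80..9F turning into BF) and says nothing about what the database actually does internally.

```python
def utf8_hex(ch):
    """UTF-8 encoding of one character as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

def c1_bytes(data):
    """Bytes that fall into the C1 control range 0x80..0x9F."""
    return [b for b in data if 0x80 <= b <= 0x9F]

def simulate_damage(data):
    """Hypothetical model of the reported symptom: 0x80..0x9F become 0xBF."""
    return bytes(0xBF if 0x80 <= b <= 0x9F else b for b in data)

original = "\uf95f"
encoded = original.encode("utf-8")
damaged = simulate_damage(encoded)

print(utf8_hex(original))                 # correct encoding
print([f"{b:02X}" for b in c1_bytes(encoded)])
print(" ".join(f"{b:02X}" for b in damaged))
print(f"U+{ord(damaged.decode('utf-8')):04X}")  # what is retrieved instead
```

The damaged sequence is still well-formed UTF-8, so it decodes without error to a different character, which matches the "incorrect retrieval" in the report.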
[very OT] Rotwelsch
Rotwelsch is an argot, spoken several hundred years ago throughout Europe. It was based on French with many words from Hebrew. It was intentionally obscure to outsiders. If I remember right, Francois Villon wrote poems and songs in Rotwelsch. --J"org Knappen
Re: [very OT] "Slavic"
No, in German "welsch" always means a Romance language (in most cases French, but Italian and even Romanian can also fill in). Note also "rotwelsch". The "generic" term for Slavonic languages is "wendisch" or "windisch", derived from the formerly Slavonic "Wenden", who settled in a region called "Wendland" (approximately identical to today's Landkreis Lüchow-Dannenberg, north of Uelzen). --J"org Knappen
Re: TATAP => TATAR
Browsing the picture given at the Radio Free Europe site, there is one pair of suspicious letters: the Tatar letter Eng has a shape sufficiently different from the standard Latin eng to be considered unsupported by Unicode. The O with bar I finally found to be already encoded. However, Radio Free Europe is not what I'd call a primary source; more research is definitely needed. --J"org Knappen
P.S. Bad news for the fans of the dark G -- it is not resurrected, at least according to this source.
Re: TATAP => TATAR
I'd really like to see the new Latin alphabet of Tatar. A transition can be very smooth if the new alphabet is just a transliteration of the old one. Then in Tatarstan there will be a situation like in Yugoslavia before the split: one written language with two easily convertible alphabets. For standardisation, there may occur further cases like "LJ"/"Lj"/"lj" with Tatar. But without further information, this is speculative only. --J"org Knappen
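The "LJ"/"Lj"/"lj" case refers to digraphs that Unicode encodes as single characters with three case forms. A small Python sketch shows the existing Latin example; the variable names are illustrative, and whether Tatar would need similar characters is, as the message says, speculative.

```python
import unicodedata

lj = "\u01c9"  # single-character digraph, lowercase form

# One character, three case forms: upper, title, and lower.
forms = {
    "upper": lj.upper(),
    "title": lj.title(),
    "lower": lj.lower(),
}

for label, ch in forms.items():
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))

# With the two-letter spelling, the same operations touch two characters.
print("lj".upper(), "lj".title(), "lj".lower())
```

The titlecase form is the reason a simple upper/lower pair is not enough when converting such text between alphabets.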
Re: the Ethnologue
What really makes me wonder is that the Ethnologue seems to ignore the vast amount of published information on the German language and its dialects. There is more than a century of dialectological research on German, and there are easily accessible publications showing the major and minor subdivisions of the German language. The Ethnologue gives a very strange picture there, compared to the mainstream German literature. Maybe because German dialectologists prefer to publish in German? --J"org Knappen
P.S. For fans of the German language, I recommend: Werner König, DTV-Atlas zur deutschen Sprache, DTV München, 10th printing 1994, ISBN 3-423-03025-9. Make sure to get the 10th printing or a later one; it contains more fascinating material.
Re: the Ethnologue
Rick McGowan asked:
> Can anyone point me to an existing list of languages that is more comprehensive and better researched than the Ethnologue? If there is no such list, then we don't need to consider any alternatives, right?
Ask the closest university department of comparative linguistics, and you will receive quite impressive lists. As a starter, David Crystal's Cambridge Encyclopedia of Language contains a good list of languages in one of its appendices. I once looked at the Ethnologue, and its subdivision of the German language is just ridiculous. Not small errors, a gross misconception. I don't trust the Ethnologue in areas where I don't know the facts well, since it fails in one area where I do know them. --J"org Knappen
Re: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion
I am surprised that no one has suggested free recode (under GNU copyleft) yet. It can do all the mentioned conversions and many more. It also has a nice Perl module as a frontend. It runs under any operating system, including Win32. --J"org Knappen
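For comparison, the conversion chain named in the subject line can also be sketched with Python's built-in codecs. This is not recode; it is a stand-in illustration, and treating "ANSI" as Windows code page 1252 is an assumption that only holds for Western European systems. The helper name convert is made up.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode a byte string from one character encoding to another."""
    return data.decode(source).encode(target)

# Assumption: "ANSI" means cp1252 here.
ansi = "Gr\u00fc\u00dfe".encode("cp1252")

utf8 = convert(ansi, "cp1252", "utf-8")
utf16 = convert(utf8, "utf-8", "utf-16")      # Python prepends a byte order mark
back_utf8 = convert(utf16, "utf-16", "utf-8")  # the mark is consumed on decoding
back_ansi = convert(back_utf8, "utf-8", "cp1252")

print(ansi.hex(), utf8.hex(), back_ansi == ansi)
```

The round trip is lossless only for characters that exist in the 8-bit code page; anything else raises an error at the final step unless an error handler is supplied.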
Swiss numerical format (was: What is ` (U+0060) for?)
As an aside: Are there good (authoritative) references on the so-called Swiss numerical format with its peculiar thousands separator? I only know about a manual shipped with some Aldus software product as a reference. I own several books printed in Switzerland, and they show the typical Swiss orthography (lack of ß), but all show one of the two usual German number formats (. or \, (thin space) as thousands separator). --J"org Knappen
Re: Addition of remaining two Maltese Characters to Unicode
John Cowan asked:
> I have a recollection of seeing a list of Chinese words written in pinyin but alphabetized according to bopomofo rules. Is this commonplace?
I have seen word lists of Indic languages (mostly Sanskrit) printed in Latin transliteration but sorted according to the Devanagari alphabet. The audience of the material is linguists who know how to sort Devanagari. It is for sure not "commonplace", but also not really rare. --J"org Knappen
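Sorting transliterated words in Devanagari order amounts to sorting with a custom alphabet. The sketch below assumes the IAST transliteration scheme (the message does not name one) and uses made-up helper names; it handles only lowercase input and raises KeyError on letters outside the table.

```python
# IAST letters in Devanagari alphabet order: vowels first, then consonants
# grouped by place of articulation. Assumption: IAST is the scheme in use.
DEVANAGARI_ORDER = [
    "a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au", "ṃ", "ḥ",
    "k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m", "y", "r", "l", "v", "ś", "ṣ", "s", "h",
]
RANK = {letter: i for i, letter in enumerate(DEVANAGARI_ORDER)}

def sort_key(word):
    """Map a word to a list of alphabet ranks, preferring two-letter units."""
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in RANK:          # digraphs such as "kh", "dh", "ai"
            key.append(RANK[pair])
            i += 2
        else:                     # single letter; KeyError if unknown
            key.append(RANK[word[i]])
            i += 1
    return key

words = ["deva", "agni", "karma", "indra", "yoga", "buddha"]
print(sorted(words))                 # Latin alphabetical order
print(sorted(words, key=sort_key))   # Devanagari order
```

The greedy two-letter match is a simplification: it cannot tell a true aspirate from a chance sequence of two letters, which a real collation tailoring would have to address.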