RE: Character identities
Peter Constable wrote:

then *any* font having a unicode cmap is a Unicode font.

No, not if the glyphs (for the supported characters) are inappropriate for the characters given.

Kent is quite right here. There are a *lot* of fonts out there with Unicode cmaps that do not at all conform to the Unicode standard --- custom-encoded (some call them hacked) fonts, usually abusing the characters that make up Windows cp1252.

IMHO, you are confusing two very different things here:

1) Assigning arbitrary glyphs to some Unicode characters. E.g., assigning the $ character to long S; the ASCII letters to Greek letters; the whole Latin-1 range to Devanagari characters, etc.

2) Choosing strange or unorthodox glyph variants for some Unicode characters.

The hacked fonts you mention are case (1); what is being discussed in this thread is case (2).

Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard.

Of course, this only applies to German. The fact that umlaut and dieresis have been unified in Unicode makes such a variant glyph applicable only to a font targeted at German. You could not use that font to, e.g., typeset English or French, because the ¨ in coöperation or naïve is a dieresis, not an umlaut sign.

There are other cases out there of Unicode fonts suitable for Chinese but not Japanese, Italian but not Polish, Arabic but not Urdu, etc. Why should a Unicode font suitable for German but not for English be any worse?

Marco
Re: Character identities
- Original Message -
From: Marco Cimarosti [EMAIL PROTECTED]
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, October 25, 2002 10:42 AM
Subject: RE: Character identities

Of course, this only applies to German.

And Swedish.

Stefan
RE: Character identities
... Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German.

Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker.

From: [EMAIL PROTECTED]
... We've implemented this successfully in OpenType fonts using the Historical Forms 'hist' feature.

If the umlaut to overscript e transformation is put under this feature for some fonts, I see no major reason to complain... (As others have noted, it does not really work for the long s, unless the language is labelled 'en'...)

/Kent K
Re: Character identities
To all contributors to this thread: Please cease cc-ing [EMAIL PROTECTED]! The CC was meant for my remark on fuzzy search wrt. long-s and round-s. Google are certainly not interested in any and all other turns this thread has taken, or may take later.

David J. Perry had written:

An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution.

To which I had replied:

This will not work, cf. infra.

John Hudson wrote:

To be accurate, it works for display of English but not for German.

David's remark was about German Fraktur orthography. My quote was too short, so this detail was lost. I apologize for any misunderstandings possibly caused by my omission.

Best wishes,
Otto Stolz
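The fuzzy-search remark about long s and round s can be illustrated with a small sketch. It assumes that a search tool compares folded keys rather than raw text, and it relies on Python's built-in `str.casefold()`, which applies Unicode case folding and thereby maps U+017F LATIN SMALL LETTER LONG S to a plain s. The helper name `search_key` is hypothetical, and nothing here describes how any real search engine works.

```python
def search_key(text: str) -> str:
    """Return a folded form of text for loose matching (hypothetical helper).

    str.casefold() applies Unicode case folding, which maps
    U+017F LATIN SMALL LETTER LONG S to U+0073 's'.
    """
    return text.casefold()


# A Fraktur-style spelling with long s, and the modern spelling
historic = "Wach\u017ftube"
modern = "Wachstube"

# Both spellings fold to the same key, so a search for one finds the other
print(search_key(historic) == search_key(modern))
```

Case folding is a blunt instrument: it also turns ß into ss, which may or may not be wanted for German text.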
RE: need open source tools to convert indic font encoding into ISCII or Unicode
Frank Tang wrote:

I am looking for open source tools (C / C++ / Perl or Java) to convert between (UTF-8/UTF-16 or ISCII) and different Indic font encodings. Please let me know if you know of anything available.

Language: C, [...]
Convert from A to / from B, where
A means UTF-8, UTF-16, or ISCII
B means
  font encoding of Nadunia font
  font encoding of Shusha font
  font encoding of DV-Suresh font
  font encoding of DV-Yogesh font
  font encoding of Mangal font (that is just OpenType, is it?)
Also conversion between UTF-16 and ISCII

Check out these ongoing projects:

- ISCIIlib (http://www.cse.iitk.ac.in/users/isciig/isciilib/main.html)
- iconverter (http://www.cse.iitk.ac.in/users/isciig/iconverter/main.html)

Both are hosted at Linux Technology Development for Indian Languages (http://www.cse.iitk.ac.in/users/isciig/).

Marco
Superscript e (was: Character identities)
Marco Cimarosti (amongst others, using the same term) wrote:

superscript e *is* the same diacritic that later became ¨

The term superscript e does not aptly describe the situation. Rather, the German a-Umlaut is derived from U+0061 U+0364 (LATIN SMALL LETTER A + COMBINING LATIN SMALL LETTER E), cf. http://www.unicode.org/charts/PDF/U0300.pdf.

Best wishes,
Otto Stolz
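For readers who want to check the character names involved, here is a minimal sketch using Python's standard `unicodedata` module. The variable names are illustrative only; the code points are the ones named in the message plus U+0308 COMBINING DIAERESIS for comparison.

```python
import unicodedata

# Historical spelling: base letter a followed by the combining small e
historic = "a\u0364"
# Modern spelling: base letter a followed by the combining diaeresis
modern = "a\u0308"

for label, text in (("historic", historic), ("modern", modern)):
    # Look up the formal Unicode name of every code point in the sequence
    names = [unicodedata.name(ch) for ch in text]
    print(label, names)
```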
Re: The character @ and gender studies...
At 05:31 -0700 2002-10-25, Ramiro Espinoza wrote:

In some Latin countries the people involved in gender studies are using the character @ to mean a/o. Example: Tod@s nosotr@s (instead of todos nosotros -All of us-). They try to give a male and female approach to the Spanish generic words.

That's pretty horrible. Why don't they just use the letter schwa? :-)

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
At 14:04 25.10.2002 +0200, Kent Karlsson wrote:

Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker.

My wholehearted support! DIN asked for the combining letter small e as well as the other combining small letters specifically to cater for the requirements of scholars in a number of countries, notably Germany.

In a large number of editions and scholarly dictionaries, both diacritics, the combining diaeresis and the combining letter e, are used on the very same page, even directly next to each other. The former is used for modern German words, the latter for medieval German words. The combining letter small e does not even necessarily stand for what today is the umlaut; it may have a number of different interpretations.

For modern and medieval German words, the base font is in these cases the same -- editions are not normally printed in some sort of pseudo-archaic font. For this reason it is quite impermissible to render the combining letter small e as a diaeresis or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisements).

As to the long s, it is not used for writing present-day German except in rare cases, notably in some scholarly editions and in the Fraktur script. Very few texts beyond the names of newspapers are nowadays produced in Fraktur. To put the long s on the German keyboard would be quite contrary to user requirements -- and if a requirement existed, it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12 to cater for it.

Best regards,
Marc

Marc Wilhelm Küster
Saphor GmbH
Fronländer 22
D-72072 Tübingen
Tel.: (+49) / (0)7472 / 949 100
Fax: (+49) / (0)7472 / 949 114
Re: The character @ and gender studies...
Yes - imagine the burden on open relay mailers when they try to blast spam to ill-formed email addresses they harvested! Hey wait - maybe this is a *good* idea!

Barry
www.i18n.com

At 02:12 PM 10/25/2002 +0100, Michael Everson wrote:

At 05:31 -0700 2002-10-25, Ramiro Espinoza wrote:

In some Latin countries the people involved in gender studies are using the character @ to mean a/o. Example: Tod@s nosotr@s (instead of todos nosotros -All of us-). They try to give a male and female approach to the Spanish generic words.

That's pretty horrible. Why don't they just use the letter schwa? :-)

--
Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
Marc Wilhelm Küster wrote:

At 14:04 25.10.2002 +0200, Kent Karlsson wrote:

Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker.

My wholehearted support! [...] For this reason it is quite impermissible to render the combining letter small e as a diaeresis

So far so good. There would be no reason for doing such a thing. If the author of a scholarly work used U+0364 (COMBINING LATIN SMALL LETTER E), this character should be displayed as either a letter e superscript to the base letter, or as an empty square (for fonts not caring about that character).

or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisements).

This is the case I thought we were discussing, and it is a very different case.

Given Kent's opinion and Marc's wholehearted support, it would follow that those infrequent advertisements should be encoded using U+0364... But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small collection of Medieval superscript letter diacritics, which is supposed to appear primarily in medieval Germanic manuscripts, or to reproduce some usage as late as the 19th century in some languages.

Using such a character to encode 21st century advertisements is doomed to cause problems:

1) The glyph for U+0364 is more likely found in the font collection of the Faculty of Germanic Studies than on the PC of people wishing to read the advertisement for Ye Olde Küster Pub. So, most people will be unable to view the advertisement correctly.

2) The designer of the advertisement will be unable to use his spell-checker and hyphenator on the advertisement's text.

3) Users will be unable to find the Küster Pub by searching for Küster in a search engine.

What will actually happen is that everybody will see an empty square, so they'll think that the web designer is an idiot, apart from the professors at the Faculty of Germanic Studies, who'll think that the designer is an idiot because she doesn't know the difference between U+0308 and U+0364 in ancient German.

The real error (IMHO) is the idea that font designers should stick to the *sample* glyphs printed in the Unicode book, because this would force graphic designers to change the *encoding* of their text in order to get the desired result.

Another big error (IMHO, once again) is the idea that two different Unicode characters must always look different. The difference must be preserved when it is useful -- e.g., U+0308 should not look like U+0364 in a font designed for publishing books on the history of German!

What should really happen, IMHO, is that modern German should be encoded as modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308, regardless of whether the corresponding glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font, like U+0304 (COMBINING MACRON) in another font, like two five-pointed stars side by side in a third font, or like Mickey Mouse's ears in Disney.ttf...

Marco
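Problem 3 above can be checked with a small sketch. It assumes that a search tool compares text after canonical normalization (NFC), which is a common but by no means universal approach; the helper name `nfc` and the variable names are illustrative only.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


query = nfc("K\u00fcster")        # what a user would type: Küster
with_diaeresis = "Ku\u0308ster"   # u + U+0308 COMBINING DIAERESIS
with_small_e = "Ku\u0364ster"     # u + U+0364 COMBINING LATIN SMALL LETTER E

# u + U+0308 composes to U+00FC, so it matches the query
print(nfc(with_diaeresis) == query)
# u + U+0364 has no precomposed form, so it stays a different string
print(nfc(with_small_e) == query)
```

Under this assumption, text encoded with U+0308 is found regardless of which glyph a font shows for it, while text encoded with U+0364 is a different string as far as the search is concerned.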
FW: Toned Greek Capital Vocals
-Original Message-
Date/Time: Fri Oct 25 08:12:22 EDT 2002
Contact: [EMAIL PROTECTED]
Report Type: Other Question, Problem, or Feedback

Most browsers do not support Toned Greek Capital Vocals and I can't find this code in Uni-Coding. If you can read Greek, the letters I'm referring to are: Έ , Ά , Ύ , Ό , Ώ , Ί and Ή . Is there a code that I can't find?

(End of Report)
RE: Character identities
Kent Karlsson wrote:

... Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German.

Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker.

It is certainly up to the author of the document to decide. But, as I explained at more length in my reply to Marc, there are two different approaches for deciding this:

1. When this decision is a matter of *content* (as may be the case when writing about linguistics, to differentiate spellings with o^e from spellings with ö), it is more appropriate to make the difference at the *encoding* level, by using the appropriate code point.

2. When this decision is only a matter of *presentation*, it is more appropriate to make the difference by using a font which uses the desired glyph for the normal ¨.

If the umlaut to overscript e transformation is put under this feature for some fonts, I see no major reason to complain... (As others have noted, it does not really work for the long s, unless the language is labelled 'en'...)

And, of course, in an ideal world option 2 will be done by switching a font feature, rather than switching to an ad-hoc font. This makes it possible for font designers to provide a single font which covers both needs. But this is just optimization, not compliance!

Marco
hacked fonts in MS-Windows: rev. solidus vs Yen/Won(was..RE: Characteridentities)
On Fri, 25 Oct 2002, Marco Cimarosti wrote:

1) Assigning arbitrary glyphs to some Unicode characters. E.g., assigning the $ character to long S; the ASCII letters to Greek letters; the whole Latin-1 range to Devanagari characters, etc.

There are several Japanese and Korean fonts with exactly this problem that are shipped with MS Windows. As is well known, at U+005C they have glyphs for Yen sign and Won sign instead of reverse solidus. IMO, MS is trying to solve a well-known problem of 'Yen/Won vs reverse solidus' in a totally inappropriate way (a font hack!). Nobody would argue that reverse solidus and Yen/Won signs are identical. However, these fonts treat them as exactly that. Just switching fonts suddenly turns every reverse solidus into a Yen/Won sign.

MS-Windows has to provide distinct ways to enter 'reverse solidus' and 'Yen/Won' sign (both full-width and half-width) in Japanese and Korean IMEs. It must be very easy to modify their IMEs that way. In Japanese and Korean input mode, pressing the key marked with 'vertical bar, Yen/Won' should generate the 'Yen/Won' sign, while in non-Japanese/Korean mode (usually English) the same key should generate 'reverse solidus'. The marking on the key also has to be changed to show all three of 'vertical bar, reverse solidus and Yen/Won', the last in a different color. The half-width/full-width version can be controlled the same way as for US-ASCII characters; both Japanese and Korean IMEs in MS Windows offer this run-time configurable option.

Somebody may argue that this would be problematic because there are a lot of old documents in traditional/legacy encodings (Shift_JIS/CP932 and CP949) that use 0x5c for Yen/Won sign. There's no easy solution other than some heuristics combined with manual correction for this problem. This conversion has to be done for all old documents. It may be painful and time-consuming to lift the degeneracy between 'reverse solidus' and 'Yen/Won sign' (in financial documents, most of them must be 'Yen/Won sign'; in TeX/LaTeX, most of them must be 'reverse solidus'), but it has to be done at some point and it's better to do it sooner rather than later. After that, there should be no old documents/data with this problem IF MS Windows stops tacitly encouraging its users to produce documents with U+005C meant for Yen/Won sign by shipping the hacked fonts mentioned above.

Another argument against this change may be that it's quite inconvenient to toggle between EN mode and JA/KO mode when typing a long file path (in Japanese and Korean) with a lot of path separators (U+005C in MS-Windows) embedded. Well, most Windows users rarely type file paths directly. For the small number of users who do type them often but do not type currency signs as often (another group of people that need this option is TeX/LaTeX users and perl/shell/etc. programmers), the JA/KO IME can have another run-time option: whether or not hitting the key labelled with 'reverse solidus/vertical bar/Yen/Won' always produces 'reverse solidus'.

Jungshik Shin
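The "heuristics combined with manual correction" mentioned above could start from something like the following sketch. The rule is an assumption made purely for illustration and is not taken from any existing tool: a reverse solidus directly followed by a digit is taken to be a currency sign, and every other U+005C is left alone. The function name `relabel_currency` is hypothetical, and the input is assumed to be already decoded to Unicode text.

```python
import re

YEN_SIGN = "\u00a5"   # U+00A5 YEN SIGN
WON_SIGN = "\u20a9"   # U+20A9 WON SIGN

# Hypothetical heuristic: U+005C directly followed by a digit is read
# as a currency sign; any other U+005C is kept as a reverse solidus.
_CURRENCY = re.compile(r"\\(?=\d)")


def relabel_currency(text: str, sign: str = YEN_SIGN) -> str:
    """Replace currency-like uses of U+005C in text with sign."""
    return _CURRENCY.sub(lambda match: sign, text)


print(relabel_currency("Total: \\1,200"))            # currency-like use
print(relabel_currency("\\documentclass{article}"))  # TeX command, untouched
print(relabel_currency("C:\\2002\\notes"))           # false positive
```

The last call shows why manual correction is still needed: a path component that begins with a digit is wrongly relabelled, so output from a rule like this would have to be reviewed rather than trusted.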
Re: Sarati
At 00:30 -0400 2002-10-25, Robert wrote:

Another language alphabetic script that reads left-to-right vertically from top-to-bottom is Sarati, another of the fantasy scripts from the late J. R. R. Tolkien's *Lord Of The Rings* book series.

Perhaps another of the scripts J.R.R. Tolkien devised for his fictional universe. (He's not late. He died in 19.)

Featured in Sarati are the consonant symbols (sarat) that form the backbone of the vertical reading line; the vowels are small marks that go on either side (left for before, right for after) of the involved consonant -- the name *Illuvatar* (for example) would be written thus in Sarati:

We haven't roadmapped Sarati yet, by the way.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Toned Greek Capital Vocals
- Original Message -
From: Magda Danish (Unicode) [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, October 25, 2002 6:54 PM
Subject: FW: Toned Greek Capital Vocals

-Original Message-
Date/Time: Fri Oct 25 08:12:22 EDT 2002
Contact: [EMAIL PROTECTED]
Report Type: Other Question, Problem, or Feedback

Most browsers do not support Toned Greek Capital Vocals and I can't find this code in Uni-Coding. If you can read greek the letters I'm reffered to are: Î , Î , Î , Î , Î , Î and Î . Is there a code that I can't find?

(End of Report)

First of all, please don't misinterpret UTF-8 as CP1252 and then convert that into UTF-8; that just causes problems.

Secondly, how were you able to include those characters in your mail if you couldn't find them?

Finally, this has nothing to do with browsers; it is caused by the fact that many people don't have any font supporting those characters. If browser means a web browser, I'd suggest that you ask the page visitors to download appropriate fonts.

Stefan
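The first point, reading UTF-8 as CP1252 and then re-encoding the result as UTF-8, can be reproduced with Python's built-in codecs. This is only a sketch of the failure mode, not a reconstruction of what actually happened to the forwarded report; the variable names are illustrative.

```python
original = "\u0386"                    # GREEK CAPITAL LETTER ALPHA WITH TONOS
utf8_bytes = original.encode("utf-8")  # the two bytes 0xCE 0x86

# The mistake: reading the UTF-8 bytes as if they were CP1252
garbled = utf8_bytes.decode("cp1252")

# Sending the garbled text on as UTF-8 makes the damage look legitimate
double_encoded = garbled.encode("utf-8")

# Reversing the two steps recovers the letter, but only while every
# byte of the original has a mapping in CP1252
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

print(garbled, double_encoded, repaired == original)
```

Some bytes, such as 0x8F (the second byte of Ώ in UTF-8), have no mapping in Python's `cp1252` codec, so the same misreading of that letter raises an error instead of producing a character. Other software handles such bytes in its own ways, which is one reason double-encoded text cannot always be repaired.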
Re: Toned Greek Capital Vocals
evlagoutaris at hotmail dot com wrote:

Most browsers do not support Toned Greek Capital Vocals and I can't find this code in Uni-Coding. If you can read Greek, the letters I'm referring to are: Έ , Ά , Ύ , Ό , Ώ , Ί and Ή . Is there a code that I can't find?

But you did provide Unicode code points for these seven letters in your message:

Έ  U+0388
Ά  U+0386
Ύ  U+038E
Ό  U+038C
Ώ  U+038F
Ί  U+038A
Ή  U+0389

Other letters with tonos can be found in the Greek and Coptic block at U+03xx, in the Greek Extended block at U+1Fxx, or by adding U+0301 COMBINING ACUTE ACCENT to a base vowel.

As far as the browsers are concerned, it is likely that most *browsers* can handle these letters just fine but many *fonts* do not include glyphs for them. Go to www.unicode.org and click on Display Problems? for more information.

-Doug Ewell
Fullerton, California
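The relationship between the precomposed letters and the "base vowel plus U+0301" spelling can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the loop simply prints each of the seven letters with its name and its canonical decomposition.

```python
import unicodedata

# The seven capital vowels with tonos listed above
letters = "\u0388\u0386\u038e\u038c\u038f\u038a\u0389"

for ch in letters:
    # NFD splits a precomposed letter into base vowel + combining accent
    decomposed = unicodedata.normalize("NFD", ch)
    parts = " ".join(f"U+{ord(c):04X}" for c in decomposed)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} = {parts}")
```

Going the other way, normalizing a base vowel followed by U+0301 to NFC yields the precomposed letter, so both spellings are canonically equivalent.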
Java Unicode support
What level of Unicode does Java currently fully support? Carl
Re: Apple news!
Michael Everson wrote:

Interesting news.

APPLE DEVELOPER CONNECTION NEWS
Issue 318, September 13, 2002

[3] Mac OS X-Only Booting For 2003

Starting in January 2003, all new Mac models will only boot into Mac OS X as the start-up operating system, though they will retain the ability to run most Mac OS 9 applications through Apple's bundled Classic software. There are nearly 4,000 native applications now available for Mac OS X.

http://www.apple.com/pr/library/2002/sep/10macosx.html

Great news. How many of those 4,000 allow Unicode input? AppleWorks doesn't. Adobe products don't. Office X doesn't.

Netscape7 DOES
Mozilla DOES
Mac AOL 7 DOES
Chimera DOES

We wait more or less patiently. On the other hand, OmniWeb is by far and away the finest Unicode browser I've ever seen.
Re: hacked fonts in MS-Windows: rev. solidus vs Yen/Won(was..RE: Character identities)
Jungshik Shin jshin at mailaps dot org wrote:

... MS-Windows has to provide distinct ways to enter 'reverse solidus' and 'Yen/Won' sign (both full-width and half-width) in Japanese and Korean IMEs. ...

Good points, well stated. To make matters worse, the keyboard references at Microsoft's Global Development subsite [1] show:

1. for Korean, a won sign and the legend U+005C Reverse Solidus\nWon Sign
2. for Japanese, a yen sign and the legend U+005C Reverse Solidus\nYen Sign

This helps perpetuate the idea that U+005C could be either a reverse solidus, a won sign, or a yen sign, depending on the font. This is exactly what Unicode is *not* about. Microsoft usually understands this.

-Doug Ewell
Fullerton, California

[1] http://www.microsoft.com/globaldev/keyboards/keyboards.asp
ANN: Sadiss 1.0
application: Sadiss 1.0, Unicode text editor for Ethiopic
platform: Java 1.4.0 or above
source code: included
dev-language: Java
home page: http://www.senamirmir.com/projects/
download (zip): http://www.senamirmir.com/download/sadiss1-0.zip
download (exe): http://www.senamirmir.com/download/sadiss1-0.exe
license: LaTeX Project Public License (LPPL)
features: Ethiopic font, keyboard layout for Ethiopic, Unicode character entry, text search, UTF-8, locale support, help, samples,...

Thank You,
-abass alamnehe
Call for Papers - 2nd notice - Draft 1
Call for Papers!

Twenty-third Internationalization and Unicode Conference (IUC23)
Unicode, Internationalization, the Web: The Global Connection
Week of March 24-28, 2003
Prague, Czech Republic

Send in your submission now!

Submissions due: November 15, 2002
Notification date: November 29, 2002
Completed papers due: January 6, 2003 (in electronic form and camera-ready paper form)

Just 3 weeks to go!

The Internationalization and Unicode Conference is the premier technical conference worldwide for both software and Web internationalization. The conference (renamed from Unicode Conference to more accurately reflect its content) features tutorials, lectures, and panel discussions that provide coverage of standards, best practices, and recent advances in the globalization of software and the Internet. Attendees benefit from the wide range of basic to advanced topics and the opportunities for dialog and idea exchange with experts in the field. The conference runs multiple sessions simultaneously to maximize the value provided.

New technologies, innovative Internet applications, and the evolving Unicode Standard bring new challenges along with their new capabilities. This technical conference will explore the opportunities created by the latest advances and how to leverage them for global users, as well as potential pitfalls to be aware of, and problem areas that need further research. There will also be demonstrations of best practices for designing applications that can accommodate any language.

We invite you to submit papers that relate to Unicode or any aspect of software and Web Internationalization. You can view the programs of previous conferences at: http://www.unicode.org/unicode/conference/about-conf.html

CONFERENCE ATTENDEES

Conference attendees are generally involved in either the development and deployment of Unicode software, or the globalization of software and the Internet. They include managers, software engineers, systems analysts, font designers, graphic designers, content developers, web designers, web administrators, technical writers, and product marketing personnel.

THEME TOPICS

International computing is the overall theme of the Conference. Presentations should be geared towards a technical audience. Topics of interest include, but are not limited to, the following (within the context of Unicode, internationalization or localizability):

- Internationalization issues with new technologies
- XML and Web protocols
- The World Wide Web (WWW)
- Security concerns, e.g. avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Implementing new features of recent versions of Unicode
- Evaluations (case studies, usability studies)
- Natural language processing
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- Optimizing performance of systems and applications
- Search engines
- Library and archival concerns
- Portable devices
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Operating systems
- Databases
- Large scale networks
- Government applications
- Testing applications
- Business models for software development (e.g. Open source)

We invite you to submit papers which define tomorrow's computing, demonstrate best practices in computing today, or articulate problems that must be solved before further advances can occur.

SESSIONS

The Conference Program will provide a wide range of sessions including:

- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute duration. In some cases, two consecutive 40 minute program slots may be devoted to a single session. The Workshops/Tutorials will each last approximately three hours. They should be designed to stimulate discussion and participation, using slides and demonstrations.

PUBLICATIONS

If your paper is accepted, your details will be included in the Conference brochure and Web pages and the paper itself will appear on a Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English. All submissions, papers and presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose, paper description, and your conclusions or final summary. Also, if this is a paper for an intermediate or advanced audience, please specify what assumptions you are making about the attendees' prior knowledge.

2. A brief biography.

3. The details listed below:

SESSION TITLE: _
DUTR #29: Text Boundaries
There is a UTC meeting on November 5. If there is any public feedback on text boundary issues from http://www.unicode.org/unicode/reports/tr29/ (or the related http://www.unicode.org/unicode/reports/tr14/), that feedback should be in by next Thursday at the latest. This will be one of the last opportunities to make any changes before Unicode 4.0.

Please send any feedback to me; I will collect it for the meeting.

Mark
http://www.macchiato.com
► “Eppur si muove” ◄
FYI: Last call WDs: css3-text, css3-ruby
The following last-call drafts of CSS3 modules: Text and Ruby have been posted. They contain quite a bit of non-western typography, so will be of interest to people on this list. The preferred place for comments is the public mailing list [EMAIL PROTECTED]

css3-ruby
CSS3 module: Ruby
http://www.w3.org/TR/2002/WD-css3-ruby-20021024
This document proposes a set of CSS properties associated with the 'Ruby' elements.

css3-text
CSS3 module: text
http://www.w3.org/TR/2002/WD-css3-text-20021024
This document presents a set of text formatting properties for CSS3. Many of these properties already existed in CSS 2. Many of the new properties have been added to address basic requirements in international text layout, particularly for East Asian and bidirectional text.