Re: Character identities
William Overington WOverington at ngo dot globalnet dot co dot uk wrote:

Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well?

It would be possible for the Unicode Technical Committee to define such a standardized variant, though they have not elected to do so. It would *not* be possible for end users such as you or me to do so.

-Doug Ewell
Fullerton, California
RE: Character identities
Let me take a few comparable examples; 1. Some (I think font makers) a few years ago argued that the Lithuanian i-dot-circumflex was just a glyph variant (Lithuanian specific) of i-circumflex, and a few other similar characters. Still, the Unicode standard now does not regard those as glyph variants (anymore, if it ever did), and embodies that the Lithuanian i-dot-circumflex is a different character in its casing rules (see SpecialCasing.txt). There are special rules for inserting (when lowercasing) or removing (when uppercasing) dot-aboves on i-s and I-s for Lithuanian. I can only conclude that it would be wrong even for a Lithuanian specific font to display an i-circumflex character as an i-dot-circumflex glyph, even though an i-circumflex glyph is never used for Lithuanian. 2. The Khmer script got allocated a KHMER SIGN BEYYAL. It stands (stood...) for any abbreviation of the Khmer correspondence to etc.; there are at least four different abbreviations, much like etc, etc., c, et c., ... It would be up to the font maker to decide exactly which abbreviation, and would vary by font. However, it is now targeted for deprecation for precisely that reason: it is *not* the font (maker) that should decide which abbreviation convention to use in a document, it is the *author* of the document who should decide. Just as for the Latin script, the author decides how to abbreviate et cetera. The way of abbreviating should stay the same *regardless of font*. Note that the font may be chosen at a much later time, and not for wanting to change abbreviation convention. That convention one may want to have the same throughout a document also when using several different fonts in it, not having to carefully consider abbreviation conventions when choosing fonts. 3. 
Marco would even allow (by default; I cannot get away from that caveat since some (not all) font technologies do what they do) displaying the ROMAN NUMERAL ONE THOUSAND C D (U+2180) as an M, and it would be up to the font designer. While the glyphs are informative, this glyphic substitution definitely goes too far. If the author chose to use U+2180, a glyph having at least some similarity to the sample glyph should be shown, unless and until someone makes a (permanent or transient) explicit character change.

4. Some people write è instead of é (I claim they cannot spell...). So is it up to a font designer to display é as è if the font is made for a context where many people do not make a distinction? Can a correctly spelled name (say) be turned into an apparent misspelling by just choosing such a font? And that would be a Unicode font?

5. I can't leave the ö vs. ø; these are just different ways of writing the same letter; and it is not the case that ø is used instead of ö for any 7-bit reasons. It is conventional to use ø for ö in Norway and Denmark for any Swedish name (or word) containing it. The same goes for ä vs. æ. Why shouldn't this one be up to the font makers too? If the font is made purely for Norwegian, why not display ö as ø, as is the convention? This is *exactly* the same situation as with ä vs. a^e.

I say, let the *author* decide in all these cases, and let that decision stand, *regardless of font changes*. [There is an implicit qualification there, but I'm tired of writing it.]

Kent Karlsson wrote:

I insist that you can talk about character-to-character mappings only when the so-called backing store is affected in some way.

No, why? It is perfectly permissible to do the equivalent of print(to_upper(mystring)) without changing the backing store (mystring in the pseudocode); to_upper here would return a NEW string without changing the argument.

And that, conceptually, is a character-to-glyph mapping.

Now I have lost you. How can it be that?
The print part, yes. But not the to_upper part; that is a character-to-character mapping, inserted between the backing store and the mapping of characters to glyphs. It is still an (apparent) character-to-character mapping even if it is not stored in the backing store.

In my mind, you are so much into the OpenType architecture, and so much used to the concept that glyphization is what a font does, that you can't view the big picture.

Now I have lost you again. Some fonts (in some font technologies) do more than pure glyphization. This is why I have been putting in caveats, since many people seem to think that all fonts *only* do glyphization, which is not the case. But to be general I was referring to such mappings regardless of whether they are built into some font (using character code points or, as in OT/AAT, using glyph indices) or (better) are external to the font. I was trying to use general formulations, but I cannot avoid having caveats for certain mappings that certain technologies do
[OT] Göthe (was: Re: RE: Character identities)
Adam Twardoch list dot adam at twardoch dot com wrote: Should an English language font render ö as oe, so that Göthe appears automatically in the more normal English form Goethe? If you refer to Johann Wolfgang von Goethe, his name is *not* spelled with an ö anyway. Somebody thinks so: http://www.transkription.de/gb_seiten/beispiele/goethe.htm -Doug Ewell Fullerton, California
Re: [OT] Göthe (was: Re: RE: Character identities)
At 08:32 31.10.2002 -0800, Doug Ewell wrote: Adam Twardoch list dot adam at twardoch dot com wrote: Should an English language font render ö as oe, so that Göthe appears automatically in the more normal English form Goethe? If you refer to Johann Wolfgang von Goethe, his name is *not* spelled with an ö anyway. Somebody thinks so: http://www.transkription.de/gb_seiten/beispiele/goethe.htm Both forms are permissible and used, even though Goethe is today by far the more frequent version -- remember that there was no standardized German orthography before the late 19th century and that the idea that a person's name has exactly one spelling is a fairly young idea in Europe. Taking such facts into account for matching purposes is a good idea, but changing the version for rendering is not. Best regards, Marc * Marc Wilhelm Küster Saphor GmbH Fronländer 22 D-72072 Tübingen Tel.: (+49) / (0)7472 / 949 100 Fax: (+49) / (0)7472 / 949 114
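The distinction drawn above, between folding spellings for matching and leaving the stored text alone for rendering, can be sketched in a few lines. This is only an illustration: the folding table `UMLAUT_FOLD` and the function `match_key` are hypothetical names invented here, and a real matcher would need many more rules than these three.

```python
import unicodedata

# Hypothetical folding table, for illustration only.
UMLAUT_FOLD = {"ä": "ae", "ö": "oe", "ü": "ue"}

def match_key(name: str) -> str:
    """Build a comparison key; the stored name itself is never modified."""
    composed = unicodedata.normalize("NFC", name).casefold()
    return "".join(UMLAUT_FOLD.get(ch, ch) for ch in composed)

stored = "G\u00f6the"          # the spelling the author actually wrote
query = "Goethe"               # the spelling a reader searches for

same_person = match_key(stored) == match_key(query)
print(same_person)             # both spellings produce the same key
print(stored)                  # what gets rendered is still the author's spelling
```

The key is computed on the fly and thrown away, so both spellings find each other while each document keeps whichever form its author chose.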
Re: Character identities
(After inadvertently sending this to Dominikus only, here it is for the list as well...)

On 2002.10.30, 16:26, Dominikus Scherkl [EMAIL PROTECTED] wrote:

A font representing my mother's handwriting (German only :-) would render u as u with breve above to distinguish it from the representation of n. I don't know how my mother would write a text containing a u with breve above,

FWIW, I've seen the handwriting of an elderly German Esperantist, and he does exactly that: he puts breves above each and every u, both on those which have it and on those which don't -- slightly confusing...

On the brink of off-topic-ness, something of that sort is done in handwritten Cyrillic (at least in the Russian tradition): the triple wave of a lower case t is distinguished from the triple wave of a lower case shch (*) by means of a stroke above the former and a stroke below the latter.

(*) Not that I'm an enthusiast of this transliteration...

-- . António MARTINS-Tuválkin, | ()| [EMAIL PROTECTED] || R. Laureano de Oliveira, 64 r/c esq. | PT-1885-050 MOSCAVIDE (LRS) Não me invejo de quem tem | +351 917 511 549 carros, parelhas e montes | http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe | http://pagina.de/bandeiras/ a água em todas as fontes |
Re: Character identities
In Unicode, code point U+0308 is assigned to COMBINING DIAERESIS. There are a number of precomposed forms with diaeresis. Let's take one of these, ü:

1. The diaeresis may mean separate pronunciation of the u, indicating that it is not merged with a preceding or following letter but is pronounced distinctly, as in the classical Greek name Peirithoüs or Spanish antigüedad. Similarly in Catalan. It is identified with the Greek dialytika of the same meaning, which is indeed the ultimate known origin of the symbol.

2. The diaeresis indicates umlaut modification of u, as in German über, a use also found in Finnish, Turkish, Pinyin Chinese Romanization and in many other languages.

3. In Magyar ü indicates a sound like French eu.

4. In IPA it indicates u with a centralized pronunciation.

There may be other phonic interpretations. Of these uses, only for the second (and possibly the third) might combining superscript e be used instead of the diaeresis. The second certainly represents the most common use of ü today, but not the only one.

Unicode encodes the character COMBINING DIAERESIS, not a generic UMLAUT MARKER which might take various forms. It provides in itself no way of distinguishing between uses of diaeresis. All the above uses might occur in German text, or Swedish text, or Finnish text, or any text which might introduce personal names or geographical names or particular words or phrases from various languages outside the main language of the text. The same applies for ä and ö. Indeed individual words with vowels and umlaut marker, whether represented as a COMBINING DIAERESIS or COMBINING LATIN SMALL LETTER E or following e, may appear in text in any language because of use of technical vocabulary, eg. Senhnscht, or in personal or place names.

Now any use of diaeresis meaning umlaut in any language might, it seems to me, be reasonably replaced by superscript e meaning umlaut. But it is incorrect to replace diaeresis used for any other purpose by superscript e.
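To make the encoding point concrete, here is a small sketch using Python's standard `unicodedata` module. It only inspects code points; the helper name `diaeresis_count` and the sample words are chosen here for illustration. It suggests that a separating diaeresis and an umlaut diaeresis are the same U+0308 at the plain-text level, while a superscript e is a different character, U+0364, with no precomposed form.

```python
import unicodedata

u_diaeresis = "\u00fc"                       # precomposed u with diaeresis
decomposed = unicodedata.normalize("NFD", u_diaeresis)
names = [unicodedata.name(ch) for ch in decomposed]
print(names)                                 # base letter plus U+0308

def diaeresis_count(word: str) -> int:
    """Count U+0308 after canonical decomposition, whatever its function."""
    return unicodedata.normalize("NFD", word).count("\u0308")

# Spanish "separate pronunciation" and German umlaut use the same mark.
print(diaeresis_count("antig\u00fcedad"), diaeresis_count("\u00fcber"))

# The e-above spelling is a distinct sequence that NFC leaves untouched.
u_e_above = "u\u0364"
print(unicodedata.name("\u0364"))
print(unicodedata.normalize("NFC", u_e_above) == u_e_above)
print(u_e_above == decomposed)
```

Nothing in the decomposed data records which function the diaeresis has, which is why the choice between the two marks has to be made by whoever enters the text.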
In straight, plain Unicode, if you want to use diaeresis for umlaut, use diaeresis. If you want to use combining superscript e to indicate umlaut, use COMBINING LATIN SMALL LETTER E. Leave any other occurrences of diaeresis alone. This is the only possibility at the plain text level, and the most robust way of choosing between diaeresis and superscript e at any level.

Given a higher protocol, we can do more. We might, as suggested, have a font which uses superscript e instead of diaeresis, at least for the combination characters with the base characters a, o, or u and in place of the diaeresis symbol itself. If we have another generally identical font with a true diaeresis instead, we can switch between fonts as necessary depending on whether diaeresis is used for umlaut or not, or whether in particular cases we wish to use one or the other symbol for umlaut. Switching between such alternate fonts has long been a standby when fancy typography is required. Yet I don't see that there is any advantage to switching between fonts over switching between the Unicode characters COMBINING DIAERESIS and COMBINING LATIN SMALL LETTER E. And it makes us dependent on a particular set of fonts. That is probably not good. :-(

A better solution might be an intelligent font that recognizes some kinds of tagging and which allows us to turn on different glyphs for diaeresis according to the tagging, one of these glyphs being a superscript e. So we tag words and phrases. And, magically, if that particular font works properly, we see diaeresis where we want diaeresis and superscript e where we want superscript e. But it is not evident that tagging for this purpose is any easier than entering the different Unicode characters from the beginning. And we are again dependent on the intelligence of a particular font. Of course, we might expect that there will soon be many such intelligent fonts.
It is less likely that they will all work exactly the same, and understand exactly the same tags in the same way. And we are restricted to such intelligent fonts as understand a particular system of tagging rather than using almost any font. :-( We might propose introducing a tag or indicator of some kind at some level to indicate a diaeresis has umlaut function, but such a tag or indicator would probably only be used when a user wanted to use a superscript e, in which case it is not clear that using it would have any advantage over actually entering COMBINING LATIN SMALL LETTER E. :-( We might go to a still higher level of protocol, to a routine or plugin in an application or a new style feature added to HTML or XML which allows diaeresis replacement. Just as Microsoft Word and some other programs now allow capitalization and small capitalization as an effect, though the underlying text is still actually in upper and lower case, so we might show a diaeresis as a superscript e, though in fact at the plain text level the text has a diaeresis. Presumably for viewing
RE: Character identities
Keld Jørn Simonsen wrote:

On Tue, Oct 29, 2002 at 09:07:16PM +0100, Marco Cimarosti wrote:

Kent Karlsson wrote: Marco, Keld, please allow me to begin with the end of your post:

I really have not contributed much to this thread, I think you mean Kent.

Oh No! Again! Apologies to both of you! I am seriously starting to worry about my dyslexia...

_ Marco
Re: Character identities
Summary: Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. It seems to me that there is middle ground between the two views being expressed. Suppose, for example, hypothetically, that there is a font available in Germany, named Volksmusik which is a display font intended for setting headings in modern German, such as for the headings in advertisements for restaurants and so on, and that in that font the a umlaut, o umlaut and u umlaut are all expressed using a mark which is something like a small letter e. Then, it seems to me that if a theatre restaurant manager has set out the text required for a menu for the restaurant for some special gala evening to be held soon using a plain text editor on a PC using a font such as Arial, with a umlaut characters appearing many times, sometimes in headings and sometimes in the main body of the text, then stored the text on a floppy disc and walked down the road to the print shop and explained to the print shop manager that here is the text content for the menus in Arial, could the print shop please supply 500 menus using that text content yet jazzing it up a bit so that the headings on each of the four pages is in a fancy typeface in a different colour, then it should be quite straightforward for the print shop manager to copy the text onto the clipboard from the Arial file, and paste it into some other file, then change the font for each of the page headings to the Volksmusik font, and make the font for the rest of the menu some plainer font. 
Thus, some a umlaut characters originally keyed by the restaurant manager would display on the final menu as a with two dots above and some a umlaut characters keyed by the restaurant manager would display on the final menu as a with a small letter e above. The restaurant manager is, however, studying part-time for a research degree at the local university. This involves producing essays about various aspects of the printing of German literature, including quoting passages from earlier times, taking care to distinguish clearly between a with two dots above it and a with an e above it, all within using a plain text file, so that there is maximum portability in sending copies of the essay to various people, including the project supervisor at the University and the editors of various learned journals. How is the a with an e above it set, bearing in mind that there is no precomposed a with an e accent above character in regular Unicode and also that it would be nice if the text could be searched for keywords using just the usual search methods? Would it be possible to define the U+FE00 variant sequence for a with two dots above it to be a with an e above it, and similarly U+FE00 variant sequences for o with two dots above it and for u with two dots above it, and possibly for e with two dots above it as well? I may not have got the details right about this suggestion, but, if the general idea is thought good, I am sure that one of the experts on this list could codify it properly. William Overington 30 October 2002
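The searching concern raised above can be sketched as follows. Note the loud assumption: the sequence a-umlaut followed by U+FE00 is *not* a standardized variation sequence; it is used here purely as a hypothetical stand-in for the proposal. The function name `strip_variation_selectors` and the sample essay text are likewise invented for illustration.

```python
# U+FE00..U+FE0F are VARIATION SELECTOR-1 through VARIATION SELECTOR-16.
VARIATION_SELECTORS = {chr(cp) for cp in range(0xFE00, 0xFE10)}

def strip_variation_selectors(text: str) -> str:
    """Return a copy of the text with variation selectors removed, for searching."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)

# Hypothetical essay text: the first word uses the proposed (non-standard)
# a-umlaut + U+FE00 sequence, the second a plain a-umlaut.
essay = "Die M\u00e4\ufe00nner und die M\u00e4nner"
keyword = "M\u00e4nner"

naive_hits = essay.count(keyword)
folded_hits = strip_variation_selectors(essay).count(keyword)
print(naive_hits, folded_hits)
```

A naive code-point search misses the word that carries the selector, so "the usual search methods" would only work if the search tool ignored variation selectors, as this sketch does in a preprocessing step.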
RE: Character identities
At 21:46 2002-10-29, Michael Everson wrote:

At 13:27 -0800 2002-10-29, Kenneth Whistler wrote:

Michael asked: My eyes have glazed over reading this discussion. What am I being asked to agree with?

Here's the executive summary for those without the time to plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a tilde, or a little e above), since that is what Germans do.

Kent: It is *not* o.k. -- that constitutes changing a character.

[Michael] Kent can't be right here.

[Alain] However, I agree with Kent. Let's say a text identified as German quotes a French word with a U DIAERESIS *in the German text* (a word like capharnaüm). It would be a heresy to show a macron in a printed text in this context. In French *nobody* uses this practice, which is frequent in German handwriting (but not in printing, unless I am wrong). One has to respect characters for what they are. A U DIAERESIS is not a U MACRON even if its code point is shared with a German U UMLAUT that may be handwritten with a *vague* resemblance to a U MACRON.

Alain LaBonté
Québec
Re: RE: Character identities
At 22:21 2002-10-29, Michael Everson wrote:

At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote:

Is it compliant with Unicode to have a font where a-umlaut has a glyph of a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone who writes a-umlaut that way)?

Of course it is. Glyphs are informative.

[Alain] (: If they are informative, they should inform, not disinform... (;

Alain
Re: RE: Character identities
John Cowan jcowan at reutershealth dot com wrote: If I find your Suetterlin font unreadable, however, and switch to an Antiqua font to read your German, I expect to find the text littered with diaereses, not macrons, although the Suetterlin umlaut-mark looks pretty much like a macron. Actually, the Sütterlin umlaut-mark is a small italicized e, which is very similar to an n. What it really ends up looking like, from a distance, is a double acute. (John's point is still perfectly valid, of course.) Sütterlin does use a macron over m and n to indicate that the letter should be doubled, and it uses a breve over u to differentiate it from the otherwise identical n. -Doug Ewell Fullerton, California
RE: Character identities
At 10:53 -0500 2002-10-30, Alain LaBonté wrote:

At 21:46 2002-10-29, Michael Everson wrote:

At 13:27 -0800 2002-10-29, Kenneth Whistler wrote:

Michael asked: My eyes have glazed over reading this discussion. What am I being asked to agree with?

Here's the executive summary for those without the time to plow through the longer exchange:

Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a tilde, or a little e above), since that is what Germans do.

Kent: It is *not* o.k. -- that constitutes changing a character.

[Michael] Kent can't be right here.

[Alain] However I agree with Kent. Let's say a text identified as German quotes a French word with a U DIAERESIS *in the German text* (a word like capharnaüm). It would be a heresy to show a macron in a printed text in this context. In French *nobody* uses this practice that is frequent in German handwriting (but not in printing, unless I am wrong).

All that means is that the German font which did that would not be useful for French. The underlying coded character is the same, and the glyph is INFORMATIVE.

-- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
[Alain] However I agree with Kent. Let's say a text identified as German quotes a French word with a U DIAERESIS *in the German text* (a word like capharnaüm). It would be a heresy to show a macron in a printed text in this context.

Hm. A font representing my mother's handwriting (German only :-) would render u as u with breve above to distinguish it from the representation of n. I don't know how my mother would write a text containing a u with breve above, but nevertheless the u glyph has to have a breve, even if it may conflict with another character.

If you have a text with such ambiguities, why not use another font for the quotations? That has the additional advantage of pointing out visually that it is a quotation.

-- Dominikus Scherkl [EMAIL PROTECTED]
Re: RE: Character identities
At 10:54 -0500 2002-10-30, Alain LaBonté wrote:

At 22:21 2002-10-29, Michael Everson wrote:

At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote:

Is it compliant with Unicode to have a font where a-umlaut has a glyph of a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone who writes a-umlaut that way)?

Of course it is. Glyphs are informative.

[Alain] (: If they are informative, they should inform, not disinform... (;

I think this thread has about worn itself out ;-)

-- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
I insist that you can talk about character-to-character mappings only when the so-called backing store is affected in some way. No, why? It is perfectly permissible to do the equivalent of print(to_upper(mystring)) without changing the backing store (mystring in the pseudocode); to_upper here would return a NEW string without changing the argument. If the backing store is not changed, it is only a character-to-glyph mapping, however complicate and indirect it may be. Yes. But with several font technologies the user can affect the mapping in some ways, via features. Including what *amounts to* mapping to uppercase (an x-height A glyph is an A not an a, even if you have an a in the backing store), or various other changes, like changing diaeresis to e-above (they are still not glyph variants of eachother, even in German, which is why DIN asked for e-above, etc.). My claim is that it is a bad idea for fonts (I don't dare say Unicode font at this point) to do what *amounts to* such in-effect character mappings *without explicit request* from whoever is in charge of the text in some way (author, editor, graphic designer, reader who like to make changes to the text, ...). Such changes should NOT be the result of JUST changing font. (I still think it is a bad idea to build such *in effect* transient character to character mappings into fonts; but people are doing that anyway, so...) I totally agree with Doug's careful definition, and I am glad that you agree as well. Doug indicates two key points that a font must respect to be suitable for Unicode: « [...] calling a font a Unicode font implies two things: 1. It must be based on Unicode code points. [...] 2. The glyphs must reflect the essential characteristics of the Unicode character to which they are mapped. [...] 
» If we agree that the only requirement for a glyph representing a certain Unicode character is to respect the essential characteristics which make it recognizable, then all our discussion is simply about determining which essential characteristics a particular character is supposed to have. So far we agree completely re. that definition. To me, a glyph floating atop of letters a, o and u is recognizably a German umlaut if (a) the text is written in German, and (b) the glyph has one of the following shapes: 1. Two small blobs (e.g. circles, squares, acute accents) places side by side; I'm going to opt staying on the restrictive side here. Except for the last one, that is a diaeresis, yes. That is the modern standard way of writing umlaut in typeset German. The last one is a double acute, which is normally not used for this in German, and it is stretching things a bit too far to consider it a glyph variant of diaeresis. 2. A straight horizontal line; That's a macron. Not used in *standard orthography* for German. Using that as a glyph variant for diaeresis is stretching things quite a lot, even if it occurs in particular forms of handwriting or some signs. (In handwriting, some people use I-dot-above, or even I-ring-above. Does that make them glyph variants of I, in a (non-Turkish) font (that mimic handwriting)? I hope not. If you want I-ring-above, then do what *in effect* amounts to a (permanent or transient) mapping to I, combining-ring-above.) 3. A wavy horizontal line; That's a tilde. Not used in *standard orthography* for German. Using that as a glyph variant for diaeresis is stretching things quite a lot. Though it is quite common to use tilde instead of diaeresis in handwriting. (If there were a handwriting font feature, what amounts to a transient mapping from diaeresis to tilde would be expected under that feature. For some fonts I might even agree that it might have that feature on by default; but possible to turn off.) 4. 
a small lowercase e, or something recalling it. Our major point of disagreement (along with M vs. Roman Numeral One Thousand C D ;-). Historically that is the origin of the umlaut. It is definitely distinct from diaeresis, just as much as æ is distinct from ä, even in a German context. This is not just stretching it very far, I'd say it's plain wrong, also in a purely German context. That does not at all prevent a hist feature (or whatever; but never on by default) to do what amounts to a transient mapping from diaeresis to e-above. I don't argue this for caprice or provocation, but because these particular shapes are commonly attested in one context or another: be it modern typography, traditional typography, handwriting, fancy graphics, etc. Yes. ... If (and only if!) the author/editor of the text asks for an overscript e should the font produce one. It is not up to the font maker to make such substitutions without request, Yes. But a font which displays U+0308 with a glyph resembling the typical glyph for U+0364 is not producing anything; it is not substituting anything with anything else: it is just
RE: Character identities
Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a tilde, or a little e above), since that is what Germans do. Kent: It is *not* o.k. -- that constitutes changing a character. Kent can't be right here. 1. We have all seen examples, in print, in signage, and in handwriting of German umlauts being displayed in each of those ways. Obviously the underlying encoding of them is the same, as is the intent. The underlying encoding *may* be the same (if there is an encoding at all...). Still, I claim, it should not be up to the font designer to make a font that shows e.g. an a-with-e-above glyph for a-diaeresis *without also* the font being explicitly requested (via some higher-level protocol) to do such a mapping, via a hist feature (off by default) or whatever other mechanism. Such a mapping *amounts to* a transient character-to-character mapping. Just as I think an author (I use that in a general sense) should be in charge of the spelling in a document, the author should be in charge of what diacritics are used. Would it be a good idea for a British font to change color to colour, i18n to internationalisation? AAT fonts can in principle do that (via glyph index mappings executed through a finite automaton, but that is beside the point), so should they? Is such a font (if it did this mapping by default) a Unicode font? Each item in these two example pairs are seen in print (etc.) and they are known to mean the same within each pair... There are signs (and printed texts) that say Gøteborg; but we usually spell that Göteborg. Does that mean that the underlying encoding (if any) therefore must be the same (the same city is intended...), and ø is just a glyph variant of ö (or the other way around), and a ((Unicode)) font may display ö as ø (without being asked to perform any extraneous mapping). Say the font is made for Norwegian. Is this all up to the font designer? This is an exact parallel to what we started off with. /Kent K
Re: RE: Character identities
Doug Ewell scripsit: Actually, the Sütterlin umlaut-mark is a small italicized e, which is very similar to an n. What it really ends up looking like, from a distance, is a double acute. Oops, yes. Brain fart. Sütterlin does use a macron over m and n to indicate that the letter should be doubled, This I think is a true COMBINING MACRON. and it uses a breve over u to differentiate it from the otherwise identical n. Part of the u glyph. -- XQuery Blueberry DOMJohn Cowan Entity parser dot-com [EMAIL PROTECTED] Abstract schemata http://www.reutershealth.com XPointer errata http://www.ccil.org/~cowan Infoset Unicode BOM --Richard Tobin
RE: RE: Character identities
Sütterlin does use a macron over m and n to indicate that the letter should be doubled So should a Sütterlin font then by default replace mm with an m-macron glyph? Or should the author decide which orthography to use? /Kent K
Re: Character identities
Hello Doug,

DE Actually, the Sütterlin umlaut-mark is a small italicized e,
DE which is very similar to an n. What it really ends up looking
DE like, from a distance, is a double acute. [...] Sütterlin does use
DE a macron over m and n to indicate that the letter should be
DE doubled,

Actually, when I learned it in school about seventeen years ago, I was taught to use double acutes as umlaut markers, and there were no macrons to indicate doubled letters. Double m was not very legible, however.

Cheers - Philipp Reichmuth mailto:mailinglistenprozessor;gmx.net

-- You step in the stream, / but the water has moved on / This page is not here
RE: Character identities
Alain LaBonté wrote: [Alain] However I agree with Kent. Let's say a text identified as German quotes a French word with an U DIAERESIS *in the German text* (a word like capharnaüm). A Fraktur font designed solely for German should not be used for typesetting French words. (And, BTW, that is probably why German Fraktur books used roman type for foreign words). In general, you cannot expect a good result using a font designed for one language to typeset another: see, in the attached image, what your capharnaüm looks like in a font designed for Chinese. Nice typography, eh? That ü is so weird because it is designed to be used in conjunction with the full width letters in U+FF41..U+FF5A, which is perhaps the right choice for Chinese, but not for French. _ Marco attachment: cafarnao.gif
RE: RE: Character identities
I said: Ah! I never realized that the Sütterlin zig-zag-shaped e was the missing with the ¨ glyph! ^ Sorry: ... the missing LINK with _ Marco
RE: RE: Character identities
Doug Ewell wrote: Actually, the Sütterlin umlaut-mark is a small italicized e, which is very similar to an n. What it really ends up looking like, from a distance, is a double acute. Ah! I never realized that the Sütterlin zig-zag-shaped e was the missing with the ¨ glyph! Thanks! After all, this discussion has not been completely useless. :-) _ Marco
RE: Character identities
Kent Karlsson wrote: I insist that you can talk about character-to-character mappings only when the so-called backing store is affected in some way. No, why? It is perfectly permissible to do the equivalent of print(to_upper(mystring)) without changing the backing store (mystring in the pseudocode); to_upper here would return a NEW string without changing the argument. And that, conceptually, is a character-to-glyph mapping. In my mind, you are so much into the OpenType architecture, and so much used to the concept that glyphization is what a font does, that you can't view the big picture. If you look at Unicode from a platform independent perspective, fonts do not necessarily do something. In some architectures, fonts are just inert repository of glyphs, and the display intelligence is somewhere out of the font. If the backing store is not changed, it is only a character-to-glyph mapping, however complicate and indirect it may be. Yes. But with several font technologies the user can affect the mapping in some ways, via features. [...] Even in the simplest of technologies, the user can affect the mapping in some way, e.g. using a different font. My claim is that it is a bad idea for fonts (I don't dare say Unicode font at this point) to do what *amounts to* such in-effect character mappings *without explicit request* from whoever is in charge of the text in some way (author, editor, graphic designer, reader who like to make changes to the text, ...). Such changes should NOT be the result of JUST changing font. All undue generalizations of the OpenType paradigm. Not all fonts do something (let alone doing what you wish them to do); not all font technologies have modes (better said, *no* font technologies have modes, if not in theory). To me, a glyph floating atop of letters a, o and u is recognizably a German umlaut if (a) the text is written in German, and (b) the glyph has one of the following shapes: 1. Two small blobs (e.g. 
circles, squares, acute accents) places side by side; I'm going to opt staying on the restrictive side here. Except for the last one, that is a diaeresis, yes. That is the modern standard way of writing umlaut in typeset German. The last one is a double acute, which is normally not used for this in German, and it is stretching things a bit too far to consider it a glyph variant of diaeresis. I think stretching things is not seeing that the umlaut of most Fraktur fonts looks like a double acute: a shape which is consistent with the usual shape of the dots on i and j. BTW, strangely, you don't seem to be worried by the fact that also i and í look the same... What if I use Fraktur for Spanish? [...] If (and only if!) the author/editor of the text asks for an overscript e should the font produce one. It is not up to the font maker to make such substitutions without request, Yes. But a font which displays U+0308 with a glyph resembling the typical glyph for U+0364 is not producing anything; it is not substituting anything with anything else: it is just faithfully reproducing the text, according to the content decided by the author *and* according to the typographical style decided by the font designer. This is not a typographic decision, it is a spelling decision, and not up to the font designer, I'd say. It is a typographic decision whether the diaeresis digs into the glyph below, or if an e-above looks like a capital e inside. But spelling changes, whether transient or permanent, should be the author's call. It is a cat biting its tail (*). If you consider it a glyph variation, it is just a typographic decision; if you consider it a character change, it becomes an orthographic issue. But considering a character change the fact that a certain code point is displayed with a certain glyph is, IMHO, totally out of the letter and spirit of the Unicode character-glyph model. (*: Am I exporting an Italian idiom or is this used in English too? 
Anyway, it means a chicken-egg issue) _ Marco
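The print(to_upper(mystring)) pseudocode quoted above can be sketched in Python. This is only an illustration under stated assumptions: to_upper and mystring are the names used in the message, the sample word is arbitrary, and Python's built-in str.upper() stands in for whatever case mapping was intended.

```python
def to_upper(text: str) -> str:
    """Return an upper-cased copy of text.

    Python strings are immutable, so the caller's string (the
    "backing store" in the thread's terms) cannot be modified here.
    """
    return text.upper()


# The stored text: "Küster", with the u-umlaut written as an escape.
mystring = "K\u00fcster"

# What gets displayed is a new, temporary string.
displayed = to_upper(mystring)
print(displayed)

# The backing store still holds the original lowercase text.
print(mystring)
```

Whether that temporary copy counts as a character-to-character mapping or as one step of a character-to-glyph mapping is exactly the point in dispute; the sketch only shows that the stored text is untouched either way.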
RE: Character identities
This is not a typographic decision, it is a spelling decision, and not up to the font designer, I'd say. It is a typographic decision whether the diaeresis digs into the glyph below, or if an e-above looks like a capital e inside. But spelling changes, whether transient or permanent, should be the author's call. No, it is not a spelling decision. Both are umlauts: one with a letter form of /e/ and one with a letter form of ¨ . Any textual editor in the world would make that judgment call, and typeset according to the graphic expectations of his (or her) readers, not according to the graphic usage of the author, no matter how conservative the text.
Re: Character identities
On Wed, Oct 30, 2002 at 10:53:10AM -0500, Alain LaBonté wrote: [Alain] However I agree with Kent. Let's say a text identified as German quotes a French word with a U DIAERESIS *in the German text* (a word like capharnaüm). It would be a heresy to show a macron in a printed text in this context. It would be heresy not to change the font, since in the typesetting convention used with Fraktur fonts, French quotes were set in Roman fonts different from the surrounding text. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
RE: Character identities
Unicode captures the ice-age during the global warming era! Do we have codepoints for images found on the walls of caves? :) CRO-MAGNON PAINTING HUMAN SPEARING A MAMMOTH CRO-MAGNON PAINTING MAMMOTH STOMPING A HUMAN ...
RE: Character identities
-Original Message- From: Marco Cimarosti [mailto:marco.cimarosti;essetre.it] Sent: 28 October 2002 16:23 To: 'Kent Karlsson'; Marco Cimarosti Cc: [EMAIL PROTECTED] Subject: RE: Character identities Kent Karlsson wrote: For this reason it is quite impermissible to render the combining letter small e as a diaeresis So far so good. There would be no reason for doing such a thing. ... or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisement). This is the case I thought we were discussing, and it is a very different case. No, the claim was that diaeresis and overscript e are the same, The claim was that dieresis and overscript e are the same in *modern* *standard* German. Or, better stated, that overscript e is just a glyph variant of dieresis, in *modern* *standard* German typeset in Fraktur. Well, we strongly disagree about that then. Marc and I clearly see them as different. More about this below. Sorry if I haven't stated this clearly enough. You have several times. No need to emphasise it anymore. We still don't agree. ... Some of them (overscript e in particular) should be(come) quite commonly occurring in any Fraktur Unicode font. Commonly sounds funny near Fraktur... We were talking about Fraktur fonts (which may not be all that common.) Using such a character to encode 21st century advertisements is doomed to cause problems: 1) The glyph for U+0364 is more likely found in the font collection of the Faculty of Germanic Studies than on the PC of people wishing to read the advertisement for Ye Olde Küster Pub. So, most people will be unable to view the advertisement correctly. 2) The designer of the advertisement will be unable to use his spell-checker and hyphenator on the advertisement's text. Advertisements should invariably be final spell-checked and hyphenated by humans!
Automated spell checkers and hyphenators for German (as well as Scandinavian languages) have (so far) not been good enough even for running text that you want to publish... This has no connection with this discussion. Well, you brought it up. I'm usually rather picky about spelling, so a spell checker can only suggest corrections, often to be rejected as wrong or even silly. However, IMHO, the presence U+0364 (COMBINING LATIN SMALL LETTER E) in a modern German or Swedish text is just a plain spelling error, and even the naivest spellchecker should flag it as such. So what? Naïve spell checkers flag all kinds of correctly spelled words! ... Most modern use of Fraktur seem to use diaeresis or double acute for this. U+0308 (COMBINING DIAERESIS) should be the only umlaut to be found in modern German text. What that diacritic *looks* like (two dots, an e, a double acute, a macron, Mickey Mouse's ears), is a choice of the font designer. Not quite. Please note that some characters are defined to have very specific glyphs, e.g. the estimated sign, there is no shape variability except for size. Others are glyphically allocated/ unified, like the diacritics, and some glyphic variability is expected. But a diaeresis is two dots (of some shape, and it would be a margin case to have them elongated), never a tilde, macron or overscript e. Those are other characters, not just a glyph variation. Other characters have more glyphic variability (informally) associated with them, like A, but some of them have compatibility variants that have a somewhat more restricted glyphic variability, like the Math Fraktur A in plane 1. Some scripts have by tradition some very strong ligatures; strong in the sense that may be hard to recognise the ligated pieces in the result glyph. That does not mean that you can legitimately use an M glyph for One Thousand C D, just because they mean the same. 
Nor does that mean that diacritics can be substituted for each other, asking for a diaeresis and getting a tilde. Yes, it is common practice with many to use a tilde instead of a diaeresis in handwriting, but it is still character substitution, not a glyphic variant (since that is the way diacritics are allocated in Unicode). (But the web designer could use a dynamically downloaded font fragment, if there is worry that all glyphs might not be supported by the fonts used by the vast majority of the target audience.) This too has no connection with this discussion, and is OT. Unicode is concerned with how text is *encoded*; the details of fonts and display technology are out of scope. We were talking about fonts. What Unicode really mandates is that the encoding should not change to obtain a certain graphic effect. You can do any character mappings you like before you apply any font, or make it into graphics... ... And overscript small e will also vary with the font, looking like a shrunken ordinary e
Re: Character identities
At 23:21 -0800 2002-10-28, Barry Caplan wrote: Do we have codepoints for images found on the walls of caves? No. The closest we come to that is wondering about the Tartaria proto-script, which we haven't roadmapped. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
Kent Karlsson wrote: The claim was that dieresis and overscript e are the same in *modern* *standard* German. Or, better stated, that overscript e is just a glyph variant of dieresis, in *modern* *standard* German typeset in Fraktur. Well, we strongly disagree about that then. Marc and I clearly see them as different. More about this below. We could simply agree to disagree, weren't it for the fact that we both claim that each other's view violates the principles of Unicode. I have tried to show that glyphic variation is part the principles of Unicode, as per TUS 3.0. You might wish to point us to where the current Unicode Standard support your view, or contradicts mine. However, IMHO, the presence U+0364 (COMBINING LATIN SMALL LETTER E) in a modern German or Swedish text is just a plain spelling error, and even the naivest spellchecker should flag it as such. So what? Naïve spell checkers flag all kinds of correctly spelled words! Yes but, IMHO, in this case they would be right: I never heard that U+0364 (COMBINING LATIN SMALL LETTER E) is part of the spelling of modern German or Swedish. Not quite. Please note that some characters are defined to have very specific glyphs, e.g. the estimated sign, there is no shape variability except for size. A small set of *symbols* like the estimate sign and some dingbats are an exception to the rule that Unicode encodes character but not glyphs. Others are glyphically allocated/ unified, like the diacritics, and some glyphic variability is expected. But a diaeresis is two dots (of some shape, and it would be a margin case to have them elongated), never a tilde, macron or overscript e. Would you care to go in Germany and have a look at shop signs? The umlaut is more often a straight line than not. But this doesn't make it a macron: there is no macron in German. Those are other characters, not just a glyph variation. So I was wrong: German orthography uses macrons! Can you please explain the German pronunciation of ā, ō and ū? 
Other characters have more glyphic variability (informally) associated with them, like A, but some of them have compatibility variants that have a somewhat more restricted glyphic variability, like the Math Fraktur A in plane 1. More *symbol* characters which escape the general rule. Some scripts have by tradition some very strong ligatures; strong in the sense that may be hard to recognise the ligated pieces in the result glyph. That does not mean that you can legitimately use an M glyph for One Thousand C D, just because they mean the same. Perhaps. It could have been a poor example. But the opposite is much more important: you cannot use a character in place of another which means a different thing just because you want a different look. Nor does that mean that diacritics can be substituted for each other, asking for a diaeresis and get a tilde. Substituting diacritics for each other is what *you* seem to suggest! Yes, it is common practice with many to use a tilde instead of a diaresis in handwriting, but it is still character substitution, not a glyphic variant (since that is the way diacritics are allocated in Unicode). So, German orthography uses tildes too! Can you please explain the German pronunciation of ã, õ and ũ? What Unicode really mandates is that the encoding should not change to obtain a certain graphic effect. You can do any character mappings you like before you apply any font, or make it into graphics... There can be no character-to-character mapping inside a font or a display engine! Applications are allowed to do character-to-character mappings only when they want to *change* the text in some way (e.g., a case conversion, a transliteration, etc.), not when they want to display it. Displaying Unicode only implies character-to-glyph mappings. Internally, there can be some glyph-to-glyph mapping, but never a character-to-character mapping. 
Even character-to-character mappings done on a temporary copy of the text are, conceptually, a step on the character-to-glyph mapping. This fundamental error spreads throughout all your post, and makes it impossible to go into the details without keeping on saying: you can't do any character-to-character mappings during display; you can't do any character-to-character mappings during display; you can't do any... I was trying to be general (not fancy) and not just talk about Opentype. But yes, I meant (at least) the case where no features (or similar) are invoked. Who tells you that there are any features to be invoked? There is no similar requirement in Unicode! What I was aiming at excluding were features that implicitly involve character mappings, [...] You see? You can't do any character-to-character mappings during display. For simplicity, I will simply cut off all passages where you assume this. A font that by default (that is ordinary English, not a fancy term) Who tells you that
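For readers following the code points, the two diacritics under dispute can be inspected with Python's standard unicodedata module. This is a minimal sketch, not anyone's proposed implementation: the helper flags_overscript_e is a hypothetical name, and it only mimics the "naive spellchecker" idea from the message by reporting any use of U+0364.

```python
import unicodedata

DIAERESIS = "\u0308"      # the umlaut mark used in modern German text
OVERSCRIPT_E = "\u0364"   # the combining small e discussed in the thread

# The two marks are distinct encoded characters with distinct names.
print(unicodedata.name(DIAERESIS))
print(unicodedata.name(OVERSCRIPT_E))

# a + diaeresis composes to the precomposed a-umlaut under NFC ...
composed = unicodedata.normalize("NFC", "a" + DIAERESIS)
# ... while a + overscript e has no precomposed form and stays as two code points.
uncomposed = unicodedata.normalize("NFC", "a" + OVERSCRIPT_E)


def flags_overscript_e(text: str) -> bool:
    """Hypothetical naive check: does the text contain U+0364 at all?"""
    return OVERSCRIPT_E in unicodedata.normalize("NFD", text)


print(flags_overscript_e("K\u00fcster"))            # ordinary u-umlaut
print(flags_overscript_e("Ku" + OVERSCRIPT_E + "ster"))  # u + overscript e
```

Nothing here says how either mark should be drawn; it only shows that the encoding keeps them apart, which both sides of the argument accept.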
RE: Character identities
Marco, Standard orthography, and orthography that someone may choose to use on a sign, or in handwriting, are often not the same. And I did say that current font technologies (e.g. OT) does not actually do character to character mappings, but the net effect is *as if* they did (if, and I hope only if, certain features are invoked, like smallcaps). It would be more honest to do them as character-to-character mappings though, either inside (which OT does not support) or outside of the font. Capital A, even at x-height, is not a glyph variant of small a (even though, centuries ago, that was the case, but then I and J were the same, and U and V, et and , ad and , ...). But displaying U as V (in effect doing a character replacement on a copy of the input) would be ok in a non-default mode (using the hist feature, say). My point here is that that replacement (effectively) should not be done by default in a Unicode font (see Doug's explanation for what a Unicode font is, if you don't like mine). [...] I never heard that U+0364 (COMBINING LATIN SMALL LETTER E) is part of the spelling of modern German or Swedish. True (that is not part of modern standard orthography), but I don't see how that could imply some kind of support for your (rather surprising and extreme) position. If (and only if!) the author/editor of the text asks for an overscript e should the font produce one. It is not up to the font maker to make such substitutions without request, either by the author/(human) editor changing the text, or by the author/editor invoking a non-default font feature (via some higher-level protocol, can't be done in plain text). The default mode (for lack of a better term) would be the one used, well, by default; e.g. on plain text. Other characters have more glyphic variability (informally) associated with them, like A, but some of them have compatibility variants that have a somewhat more restricted glyphic variability, like the Math Fraktur A in plane 1. 
More *symbol* characters which escape the general rule. Math Fraktur A is a letter (of course!). Many letters, including ordinary A, are used as symbols too. You seem to argue that for symbols (whichever those are, I'm sure you *don't* mean general categories S*...) there is total rigidity, while for non-symbols (whichever those are) there is near total anarchy and font makers can change glyphs to something entirely different. I claim that there are no characters for which there is total anarchy (except possibly for view invisibles of normally invisible characters), but that there are several degrees of flexibility (I'm sure someone can list more than three, but here is a coarse division): 1. glyph (almost) fixed: Dingbats, estimated sign, ... [could possibly be given a rugged look, or texture if you want to mimic e.g. a typewriter look] 2. abstract glyph is fixed but there can be minor shape variations: diacritics, math symbols (Sm), math letters (there are several Math Fraktur designs, several Math sans-serif designs, etc. that could suit), Arabic presentation forms (initial/ medial/final/isolated decided but other aspects are not fixed, maybe this case is between 2 and 3), ... 3. fairly free as long as (some) readers recognise the character from the glyph (modulo compatibility/ canonical variants and what should have been compatibility/canonical variants...): nominal digits/letters/punctuation, ... [This, however, does NOT allow, e.g., the One Thousand C D character to be shown with an M glyph, nor display € as EUR, ... in a Unicode font in...; if it did so in default mode [by default], it would not be a Unicode font.] [4. Near anarchy; you seem to argue that a large part of case 2 and all of case 3 fall here...] Yes, you can have glyphic variation, but for the diacritics there is (by design, but maybe not sufficiently explicit stated in the book) a limit to how much it can vary (in default mode). 
There are limits also for, e.g., 'nominal' letters and roman numeral characters, that are (by design) somewhat less constrained. In addition you may note that those who asked for the inclusion of overscript e does not regard an overscript e glyph to be an acceptable way of displaying a diaeresis [in a Uni..., you know]. These things come up quite often in discussions about proposals to add characters, even though it is not formally stated. If some of the Unicode elders care to elaborate, please feel free. Marco, I'm not sure it is of any use to try to explain in more detail, since you don't appear to be listening. However, I think I, Marc, Doug, and Mark (at the very least) seem to be in approximate agreement on this (at least, I have yet to see any major disagreement). I'm sure Michael
RE: Character identities
Kent Karlsson wrote: Marco, Keld, please allow me to begin with the end of your post: Marco, please calm down and reread every sentence of my previous message. You seem to have misread quite a few things, but it is better you reread calmly before I try to clear up any remaining misunderstandings. I have been absolutely calm, and I apologize if I gave a different impression. I may happen to heat up when discussing things like ethics, politics, religions, racism, war, etc., but definitely not when discussing about the details of the Unicode character-glyph model. I wish to recall that we are just discussing about a glyph variation for a diacritic character: a variation that I consider acceptable and you consider undesirable. Please let's not make this bigger than it could reasonably be. Standard orthography, and orthography that someone may choose to use on a sign, or in handwriting, are often not the same. And I did say that current font technologies (e.g. OT) does not actually do character to character mappings, but the net effect is *as if* they did (if, and I hope only if, certain features are invoked, like smallcaps). It would be more honest to do them as character-to-character mappings though, either inside (which OT does not support) or outside of the font. Capital A, even at x-height, is not a glyph variant of small a (even though, centuries ago, that was the case, but then I and J were the same, and U and V, et and , ad and , ...). But displaying U as V (in effect doing a character replacement on a copy of the input) would be ok in a non-default mode (using the hist feature, say). I insist that you can talk about character-to-character mappings only when the so-called backing store is affected in some way. If the backing store is not changed, it is only a character-to-glyph mapping, however complicate and indirect it may be. Whether these mappings takes part inside or outside a font is irrelevant as far, again, as the backing store is not changed. 
My point here is that that replacement (effectively) should not be done by default in a Unicode font (see Doug's explanation for what a Unicode font is, if you don't like mine). I totally agree with Doug's careful definition, and I am glad that you agree as well. Doug indicates two key points that a font must respect to be suitable for Unicode: « [...] calling a font a Unicode font implies two things: 1. It must be based on Unicode code points. [...] 2. The glyphs must reflect the essential characteristics of the Unicode character to which they are mapped. [...] » If we agree that the only requirement for a glyph representing a certain Unicode character is to respect the essential characteristics which make it recognizable, then all our discussion is simply about determining which essential characteristics a particular character is supposed to have. To me, a glyph floating atop of letters a, o and u is recognizably a German umlaut if (a) the text is written in German, and (b) the glyph has one of the following shapes: 1. Two small blobs (e.g. circles, squares, acute accents) places side by side; 2. A straight horizontal line; 3. A wavy horizontal line; 4. a small lowercase e, or something recalling it. I don't argue this for caprice or provocation, but because these particular shapes are commonly attested in one context or another: be it modern typography, traditional typography, handwriting, fancy graphics, etc. You seem to argue that only case 1 is acceptable, and probably also add some constraints on the shape of the blobs (e.g., I think I understood that you find that a double acute shape would be unacceptable). As I see it, the only reason for which you say this is because the other shapes are similar or identical to the typical shapes of other Unicode characters. 
As I said, I don't find that this is valid reason, unless the font we are talking about is to be used in contexts (e.g., linguistics, or languages other than German) in which the distinction is meaningful. [...] I never heard that U+0364 (COMBINING LATIN SMALL LETTER E) is part of the spelling of modern German or Swedish. True (that is not part of modern standard orthography), but I don't see how that could imply some kind of support for your (rather surprising and extreme) position. (Frankly, I find surprising and extreme your position -- perhaps we're only choosing bad examples.) What I meant is that if (a) U+0364 is not supposed to appear in modern German, and (b) the font we are considering is designed to be used for modern German only, then (c) the possibility of confusing U+0364 with U+0308 is a non issue. If (and only if!) the author/editor of the text asks for an overscript e should the font produce one. It is not up to the font maker to make such substitutions without request, Yes. But a font which displays U+0308 with a glyph resembling the typical glyph for U+0364 is not producing anything; it is not
Re: Character identities
Standard orthography, and orthography that someone may choose to use on a sign, or in handwriting, are often not the same. If someone writes an a-umlaut, no matter what it looks like, it should be encoded as an a-umlaut. That's the identity of the character they wrote. I'm sure my German teacher would not appreciate us typing up our homework and using A-macron, even if the symbol she used for a-umlaut on the blackboard looked like a macron. Math Fraktur A is a letter (of course!). Many letters, including ordinary A, are used as symbols too. If it were a letter, then no one would have a problem with you writing language with it. But there are warnings all over the place, about how A and an appropriate font should be used for Fraktur A. Math Fraktur A is a symbol - it doesn't stand for a sound or a word. You seem to argue that for symbols (whichever those are, I'm sure you *don't* mean general categories S*...) there is total rigidity, while for non-symbols (whichever those are) there is near total anarchy and font makers can change glyphs to something entirely different. Font makers can change the glyphs to whatever they want, so long as it is uniquely that character. Marco, I'm not sure it is of any use to try to explain in more detail, since you don't appear to be listening. However, I think I, Marc, Doug, and Mark (at the very least) seem to be in approximate agreement on this (at least, I have yet to see any major disagreement). I'm sure Michael would agree too (at least I hope so), and many others. Interesting. I don't agree totally with Marco, but I'm of the opinion that glyphs of a with e above, a with macron above, and a with Disney ears above can be suitable glyphs for a-umlaut, and I got the impression that Mark and Doug agreed with me.
RE: Character identities
At 21:07 +0100 2002-10-29, Marco Cimarosti wrote: I'm sure Michael would agree too (at least I hope so), and many others. There are many Michaels and many others here... If any of them wish to intervene, I hope they'll rather say something new to take the discussion out of the loop, rather than joining one faction. My eyes have glazed over reading this discussion. What am I being asked to agree with? -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
Michael asked: My eyes have glazed over reading this discussion. What am I being asked to agree with? Here's the executive summary for those without the time to plow through the longer exchange: Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a tilde, or a little e above), since that is what Germans do. Kent: It is *not* o.k. -- that constitutes changing a character. [Sorry, guys, if I have ridden roughshod over the nuances... ;-)] Michael, you might have to recuse yourself, however, since when it was suggested that displaying Devanagari characters with snowpeaked glyphs for a Nepali hiking company would be o.k., you misunderstood and suggested private use characters! --Ken
RE: Character identities
At 13:27 -0800 2002-10-29, Kenneth Whistler wrote: Michael asked: My eyes have glazed over reading this discussion. What am I being asked to agree with? Here's the executive summary for those without the time to plow through the longer exchange: Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a tilde, or a little e above), since that is what Germans do. Kent: It is *not* o.k. -- that constitutes changing a character. Kent can't be right here. 1. We have all seen examples, in print, in signage, and in handwriting of German umlauts being displayed in each of those ways. Obviously the underlying encoding of them is the same, as is the intent. 2. The fact that a + diaeresis with a superscript e glyph could be mistaken for a + superscript-e is not more troublesome than the possibility of mistaking Latin or Cyrillic o with Greek omicron. Michael, you might have to recuse yourself, however, since when it was suggested that displaying Devanagari characters with snowpeaked glyphs for a Nepali hiking company would be o.k., you misunderstood and suggested private use characters! I did admit that I did not read the sentence entirely -- Michael Everson * * Everson Typography * * http://www.evertype.com
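The comparison with Latin o, Cyrillic o and Greek omicron can be checked directly. The following is a small sketch using Python's standard unicodedata module; the variable names are illustrative only.

```python
import unicodedata

# Three characters that most fonts draw with near-identical glyphs.
lookalikes = ["\u006f", "\u043e", "\u03bf"]

# Their character names show that the encoding keeps them distinct.
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

# Identical appearance does not make the code points equal.
all_distinct = len(set(lookalikes)) == len(lookalikes)
print(all_distinct)
```

The same reasoning is what the message applies to a diaeresis drawn as a small e: a shared glyph shape does not merge the underlying characters.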
Re: RE: Character identities
At 21:07 +0100 2002-10-29, Marco Cimarosti wrote: I'm sure Michael would agree too (at least I hope so), and many others. There are many Michaels and many others here... If any of them wish to intervene, I hope they'll rather say something new to take the discussion out of the loop, rather than joining one faction. My eyes have glazed over reading this discussion. What am I being asked to agree with? Is it compliant with Unicode to have a font where a-umlaut has a glyph of a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone who writes a-umlaut that way)?
Re: RE: Character identities
At 15:56 -0600 2002-10-29, [EMAIL PROTECTED] wrote: Is it compliant with Unicode to have a font where a-umlaut has a glyph of a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone who writes a-umlaut that way)? Of course it is. Glyphs are informative. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Character identities
On Tue, Oct 29, 2002 at 09:07:16PM +0100, Marco Cimarosti wrote: Kent Karlsson wrote: Marco, Keld, please allow me to begin with the end of your post: I really have not contributed much to this thread, I think you mean Kent. Best regards keld
Re: RE: Character identities
At 14:56 10/29/2002, [EMAIL PROTECTED] wrote: Is it compliant with Unicode to have a font where a-umlaut has a glyph of a with e above? What about a glyph of a-macron (e.g. a handwriting font for someone who writes a-umlaut that way)? Yes, I would say that it is compliant with Unicode because there is absolutely nothing in the Unicode Standard to say that it is non-compliant. I have seen German display types in which the umlaut is indicated by a miniature uppercase E *inside* the uppercase O. The point is that the small e is an accepted traditional German convention for indicating an umlaut, and any recognisable glyph variant of that convention fits the cognitive model for many competent readers reading German. The example of a handwriting font in which the umlaut is represented by something that looks like a macron, or a tilde, or a duckbilled platypus, should be judged by the same criteria: does the reader recognise the glyph as representing a vowel with umlaut? If so, it is a perfectly valid glyph representation of the umlaut character. It is, of course, a perfectly valid response to a typeface design to say 'I don't want to use this font because it has a weird umlaut', but it is equally valid for a typeface to have a weird umlaut; it may limit the popularity of the typeface, but so might the shape of the lowercase f or the curl of the tail of the Q, but would you say that these forms need to be a certain way to be valid or compliant? Although the line between glyph variants that are recognised by readers as valid representations of characters and those that are not is difficult to define, in practice readers are capable of making these decisions (and even of recognising, accepting or learning new forms that they have not encountered before): it is a bit like the distinction between pornography and erotica, which is hard to define but which magistrates and juries regularly decide on with confidence, competence and consensus.
John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: RE: Character identities
Do we again need an intelligent font that understands language tagging? This should be achievable with OpenType, no? Do we now have different flavors of Unicode, one for English, one for Icelandic, one for French, one for German ... ? In most of the cases described by you, you can still have just one Unicode character but different glyphs representing it. In OpenType, you could assign glyph substitutions to some features such as historical forms and do it on a language-dependent level. Should an English language font render ö as oe, so that Göthe appears automatically in the more normal English form Goethe? If you refer to Johann Wolfgang von Goethe, his name is *not* spelled with an ö anyway. The use of macron for dieresis is somewhat a different matter. If a particular style of German script uses a line for a diaeresis, then indeed the diaeresis in that script has fallen together in appearance with the macron. But this doesn't mean that you have to encode it just once. Unicode should be about what characters *mean*, not what characters look like. Unfortunately, for spatial reasons, many lookalikes have been consolidated. But you can intelligently split them with OpenType. You can have stylistic sets that you choose based on your preferred writing. Adam
Re: RE: Character identities
On Tue, Oct 29, 2002 at 08:53:59PM -0500, Jim Allan wrote: Using the Unicode method makes far more sense than creating fonts that work for particular languages only, provided no foreign words or names appear, or which require language tagging. Why does the Unicode method exclude creating fonts that work for a particular language only? A lot of fontmakers specialize in the one purpose font, and may not want or need to put in the time to cover multiple languages. Marco's desire to use a font to indicate combining superscript e instead of the way Unicode wants it done seems prompted because currently most Unicode fonts do not support the combining superscript characters and he wishes a fallback to normal diaeresis instead of to an undefined character indicator. It was my wish, and it had nothing to do with that. I was looking at the book mentioned in my first message, which was printed in 1920 and yet used the superscript e instead of an umlaut. I thought about encoding that font in a computer, and then about printing a text in the font. If I take a sample German text, and want to print it in this font, why should I have to change the text? The text hasn't changed, just the presentation. While _I_ could change the text, the average user would probably find it prohibitively complex, and even if walked through it, would be frustrated to have to put so much work into it. As for the concerns brought up by you and Marc, I find them absurd in this case. This font won't support other languages, because the book doesn't have the glyphs for them. (Not even ô or ï, if you're one of the people who think English needs them.) The font's not made for academic or scholarly work, and even if I were to encode the a-e in an a-e slot, it probably won't have a proper a-diaeresis. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
Re: Character identities
At 11:37 25.10.2002 -0700, Doug Ewell wrote: Marc Wilhelm Küster kuester at saphor dot net wrote: As to the long s, it is not used for writing present-day German except in rare cases, notably in some scholarly editions and in the Fraktur script. Very few texts beyond the names of newspapers are nowadays produced in Fraktur. To put the long s on the German keyboard would be quite contrary to user requirements -- and if a requirement existed, it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12 to cater for it. Irrelevant, sure, but contrary? I don't see what harm could come from adding a character to a previously unassigned key, especially in the relatively obscure AltGr zone (Level 3). Most users could safely ignore it, and most would never even know it was there. In principle, you are right. Unfortunately, there's quite a bit of software around that (mis-)uses unassigned AltGr-Keys for their own purposes - this includes, on Windows NT ff at least, software such as the localized MS Word. So, adding new assignments potentially clashes with existing software and should only be done if there is a sufficiently high public interest in doing so. But yes, of course it would be DIN's job to standardize such a thing (or not). Patrick Andries asked if a revised German keyboard standard would be ignored in the market with the same cavalier attitude seen in Canada (and the U.S.). My impression is that European manufacturers are held more closely to conformance with national and international standards than North American manufacturers, but I'd want some Europeans to back me up on this. Speaking of Europe, it differs from country to country. In Germany certainly DIN 2137 is widely adhered to and changes to it would in all likelihood be taken up fast on the market. Best regards, Marc Küster -Doug Ewell Fullerton, California * Marc Wilhelm Küster Saphor GmbH Fronländer 22 D-72072 Tübingen Tel.: (+49) / (0)7472 / 949 100 Fax: (+49) / (0)7472 / 949 114
RE: Character identities
... For this reason it is quite impermissible to render the combining letter small e as a diaeresis So far so good. There would be no reason for doing such a thing. ... or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisement). This is the case I thought we were discussing, and it is a very different case. No, the claim was that diaeresis and overscript e are the same, so the reversed case Marc is talking about is not different at all. Standing Keld's opinion and Marc's wholehearted support, it Please don't confuse me with Keld! follows that those infrequent advertisements should be encoded using U+0364... But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small collection of Medieval superscript letter diacritics, which is supposed to appear primarily in medieval Germanic manuscripts, or to reproduce some usage as late as the 19th century in some languages. Yes, but you should not read too much into the explanation, which, while correct, does not limit the existence of their glyphs to fonts used only by germanic professors... Some of them (overscript e in particular) should be(come) quite commonly occurring in any Fraktur Unicode font. Using such a character to encode 21st century advertisements is doomed to cause problems: 1) The glyph for U+0364 is more likely found in the font collection of the Faculty of Germanic Studies than on the PC of people wishing to read the advertisement for Ye Olde Küster Pub. So, most people will be unable to view the advertisement correctly. 2) The designer of the advertisement will be unable to use his spell-checker and hyphenator on the advertisement's text. Advertisements should invariably be final spell-checked and hyphenated by humans! Automated spell checkers and hyphenators for German (as well as Scandinavian languages) have (so far) not been good enough even for running text that you want to publish... 
3) Users will be unable to find the Küster Pub by searching Küster in a search engine. Depends on the search engine, and if it uses a correct collation table (for the language) or not... What will actually happen is that everybody will see an empty square, so they'll think that the web designer is an idiot, apart from the professors at the Faculty of Germanic Studies, who'll think that the designer is an idiot because she doesn't know the difference between U+0308 and U+0364 in ancient German. Most modern use of Fraktur seems to use diaeresis or double acute for this. (But the web designer could use a dynamically downloaded font fragment, if there is worry that all glyphs might not be supported by the fonts used by the vast majority of the target audience.) The real error (IMHO) is the idea that font designers should stick to the *sample* glyphs printed on the Unicode book, because this would force Well, the diacritics are allocated/unified on glyphic grounds. While a diaeresis may look different from font to font, it is basically two dots (of some shape in line with the design of the font), never an e shape. At least not in the *default mode* of a *Unicode font*. And overscript small e will also vary with the font, looking like a shrunken ordinary e glyph of (ideally) the same font. But never like two dots (in the default mode of a Unicode font). graphic designer to change the *encoding* of their text in order to get the desired result. A graphic designer is likely to turn the whole thing into 2-d or 3-d graphics, probably distorted, possibly animated, to get the desired result! At which point the original, or intermediary, encoding of any text elements is not very relevant to the end result. Another big error (IMHO, once again) is the idea that two different Unicode characters should look different. I have never said that! E.g., a µ as well as an Å (both of which are allocated twice!) should look the same (resp.) 
regardless of which of their respective code points is used. There are many more examples of characters that definitely should (e.g. capital K and Kelvin sign, small i and small roman numeral one) or may (capital A, capital Alpha, ...) look the same. There are also lots of characters that mean the same, but always (in a Unicode font in default mode) should/must look different. Like M and Roman Numeral One Thousand C D (just to take an example closer to Italy... ;-). The difference must be preserved when it is useful -- e.g., U+0308 should not look like U+0364 in a should not -- must never font designed for publishing books on the history of German! a font . -- any Unicode font in default mode (Bad example, Marco!) What should really happen, IMHO, is that modern German should be encoded as modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308, regardless that the corresponding glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font, and it looks
Re: Character identities
On Mon, Oct 28, 2002 at 11:21:30AM +0100, Kent Karlsson wrote: No, the claim was that diaeresis and overscript e are the same, so the reversed case Marc is talking about is not different at all. The claim is, that for certain fonts, it is appropriate to image the a-umlaut character as an a^e. That doesn't imply anything about the other way around, or else t' could legally be displayed as a t with caron above. A U+0308 (COMBINING DIAERESIS) should remain a U+0308, regardless that the corresponding glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font, and it looks like U+0304 (COMBINING MACRON) in another font, and it looks like two five-pointed stars side-by-side in a third font, and it looks like Mickey Mouse's ears in Disney.ttf... These are all unacceptable variations in a *Unicode font (in default mode)*. But you can have all kinds of silly variations in *non*-Unicode fonts applied to Unicode text, including ciphers or rebuses... (ok, there are degrees...) Basically, any decorative or handwriting font can't be a Unicode font. (The glyph for my German teacher's umlaut was definitely a macron.) Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts, but that's the only way I can read your last statement. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
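The practical stake in keeping U+0308 as U+0308 can be illustrated with Python's standard unicodedata module. This is a minimal sketch, not anything proposed in the thread: the sample name is the Küster example used in the discussion, and NFC comparison stands in for whatever matching a real search engine or spell checker performs, which is an assumption.

```python
import unicodedata


def nfc(s):
    """Normalize to NFC so that canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", s)


precomposed = "K\u00fcster"       # ü as the single code point U+00FC
with_diaeresis = "Ku\u0308ster"   # u followed by U+0308 COMBINING DIAERESIS
with_small_e = "Ku\u0364ster"     # u followed by U+0364 COMBINING LATIN SMALL LETTER E

print(unicodedata.name("\u0308"))
print(unicodedata.name("\u0364"))

# u + U+0308 is canonically equivalent to U+00FC, so the two spellings match.
print(nfc(with_diaeresis) == precomposed)

# u + U+0364 is a different character sequence; normalization does not merge it.
print(nfc(with_small_e) == precomposed)
```

Under this comparison the text encoded with U+0364 is a different string from Küster, whatever glyph a font happens to draw for either diacritic.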
RE: Character identities
Kent Karlsson wrote: For this reason it is quite impermissible to render the combining letter small e as a diaeresis So far so good. There would be no reason for doing such a thing. ... or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisement). This is the case I thought we were discussing, and it is a very different case. No, the claim was that diaeresis and overscript e are the same, The claim was that dieresis and overscript e are the same in *modern* *standard* German. Or, better stated, that overscript e is just a glyph variant of dieresis, in *modern* *standard* German typeset in Fraktur. Sorry if I haven't stated this clearly enough. so the reversed case Marc is talking about is not different at all. It is. In the first case, we are talking about a glyph variant in *modern* *standard* German, in the second case, we are talking about two different diacritics in some *other* context. (Ancient German? ancient Swedish?). Standing Keld's opinion and Marc's wholehearted support, it Please don't confuse me with Keld! Oooops! My apologies! follows that those infrequent advertisements should be encoded using U+0364... But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small collection of Medieval superscript letter diacritics, which is supposed to appear primarily in medieval Germanic manuscripts, or to reproduce some usage as late as the 19th century in some languages. Yes, but you should not read too much into the explanation, which, while correct, does not limit the existence of their glyphs to fonts used only by germanic professors... Some of them (overscript e in particular) should be(come) quite commonly occurring in any Fraktur Unicode font. Commonly sounds funny near Fraktur... 
Using such a character to encode 21st century advertisements is doomed to cause problems: 1) The glyph for U+0364 is more likely found in the font collection of the Faculty of Germanic Studies than on the PC of people wishing to read the advertisement for Ye Olde Küster Pub. So, most people will be unable to view the advertisement correctly. 2) The designer of the advertisement will be unable to use his spell-checker and hyphenator on the advertisement's text. Advertisements should invariably be final spell-checked and hyphenated by humans! Automated spell checkers and hyphenators for German (as well as Scandinavian languages) have (so far) not been good enough even for running text that you want to publish... This has no connection with this discussion. However, IMHO, the presence of U+0364 (COMBINING LATIN SMALL LETTER E) in a modern German or Swedish text is just a plain spelling error, and even the naivest spellchecker should flag it as such. 3) Users will be unable to find the Küster Pub by searching Küster in a search engine. Depends on the search engine, and if it uses a correct collation table (for the language) or not... What will actually happen is that everybody will see an empty square, so they'll think that the web designer is an idiot, apart from the professors at the Faculty of Germanic Studies, who'll think that the designer is an idiot because she doesn't know the difference between U+0308 and U+0364 in ancient German. Most modern use of Fraktur seems to use diaeresis or double acute for this. U+0308 (COMBINING DIAERESIS) should be the only umlaut to be found in modern German text. What that diacritic *looks* like (two dots, an e, a double acute, a macron, Mickey Mouse's ears), is a choice of the font designer. (But the web designer could use a dynamically downloaded font fragment, if there is worry that all glyphs might not be supported by the fonts used by the vast majority of the target audience.) 
This too has no connection with this discussion, and is OT. Unicode is concerned with how text is *encoded*; the details of fonts and display technology are out of scope. What Unicode really mandates is that the encoding should not change to obtain a certain graphic effect. The real error (IMHO) is the idea that font designers should stick to the *sample* glyphs printed on the Unicode book, because this would force Well, the diacritics are allocated/unified on glyphic grounds. While a diaeresis may look different from font to font, it is basically two dots (of some shape in line with the design of the font), never an e shape. At least not in the *default mode* of a *Unicode font*. And overscript small e will also vary with the font, looking like a shrunken ordinary e glyph of (ideally) the same font. But never like two dots (in the default mode of a Unicode font). You haven't yet defined your meaning of Unicode font and, now, you add a new fancy term: default mode! What's a default mode? Unicode does not require fonts to have any kind of modes. You seem to be
Re: Character identities
Marco Cimarosti marco dot cimarosti at essetre dot it wrote: There are also lots of characters that mean the same, but always (in a Unicode font in default mode) should/must look different. Like M and Roman Numeral One Thousand C D (just to take an example closer to Italy... ;-). Well, the first and only time I have seen that Thousand C D was on the Unicode charts... However, if I were asked which glyph is more appropriate for that character, I would say: the same as capital M. I would disagree with this. It seems to me the whole reason for both U+216F ROMAN NUMERAL ONE THOUSAND and U+2180 ROMAN NUMERAL ONE THOUSAND C D to exist is that they should have different glyphs. This is not necessarily in keeping with the purest spirit of Unicode (which might regard these as two glyphs of a single character), but in reality they are encoded as two characters. Note, however, that there is nothing wrong with using the same glyph for U+004D and U+216F, although in many fonts they are different for no obvious reason. -Doug Ewell Fullerton, California
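The relationship among these three code points can be inspected with Python's standard unicodedata module. This is a small illustrative sketch; the helper name describe is made up for the example, and the results depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata


def describe(cp):
    """Return (character name, NFKC compatibility folding) for a code point."""
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch)
    return unicodedata.name(ch), [f"U+{ord(c):04X}" for c in folded]


for cp in (0x004D, 0x216F, 0x2180):
    name, folded = describe(cp)
    print(f"U+{cp:04X} {name}: NFKC -> {folded}")
```

U+216F carries a compatibility decomposition to U+004D, so NFKC folds it to a plain capital M, while U+2180 has no decomposition and stays a distinct character. That is consistent with the observation that sharing a glyph between U+004D and U+216F is harmless.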
Re: Character identities
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? -- . António MARTINS-Tuválkin| ()| [EMAIL PROTECTED] || R. Laureano de Oliveira, 64 r/c esq. | PT-1885-050 MOSCAVIDE (LRS) Não me invejo de quem tem | +351 917 511 549 carros, parelhas e montes | http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe | http://pagina.de/bandeiras/ a água em todas as fontes |
Re: Character identities
On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hello? Who says decorative or handwriting fonts can't be Unicode fonts? I've got dozens of fonts on my system that prove this wrong. Zapfino, which ships with OS X and which I had the privilege to work on, is about as decorative a handwriting font as you could wish for, and of course it has a Unicode cmap. Or are you working with some definition of 'Unicode font' other than 'font with a Unicode cmap'? John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: Character identities
At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote: On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote: On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com -- I don't think so. He seems to be talking about a specific typographic style. Code points don't care about style, whether it's Franklin Gothic or Snowcapped Helvetica. Don
Re: Character identities
At 13:36 -0700 2002-10-28, John Hudson wrote: Or are you working with some definition of 'Unicode font' other than 'font with a Unicode cmap'? It seemed to me that he was talking about fonts that had characters that weren't in Unicode at all. I don't mean precomposed vowels, but, say, fonts with moon phases in them. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Character identities
Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com Um, Michael, I think Anto'nio was talking about glyphs in a decorative font, which should -- clearly -- just be mapped to ordinary Unicode characters, via an ordinary Unicode cmap. Or do you think that the yellow, cursive, shadow-dropped, 3-D letters Getaway! at: http://www.trekking-in-nepal.com/ should also be represented by Private Use code positions? ;-) --Ken
Re: Character identities
On Mon, Oct 28, 2002 at 09:36:34PM +, Michael Everson wrote: At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote: On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. But think of the utility if Unicode added a COMBINING SNOWCAP and COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP? (-: -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
Re: Character identities
On Mon, Oct 28, 2002 at 01:36:08PM -0700, John Hudson wrote: On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hello? Who says decorative or handwriting fonts can't be Unicode fonts? [...] Or are you working with some definition of 'Unicode font' other than 'font with a Unicode cmap'? Right above where it was cut it said: Marco: A U+0308 (COMBINING DIAERESIS) should remain a U+0308, regardless that the corresponding glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font, and it looks like U+0304 (COMBINING MACRON) in another font, and it looks like two five-pointed stars side-by-side in a third font, and it looks like Mickey Mouse's ears in Disney.ttf... Kent: These are all unacceptable variations in a *Unicode font (in default mode)*. Earlier: Marco: there are fonts which don't have dots over i and j; Kent: You have a slight point there, but those are not intended for running text. And I'm hesitant to label them Unicode fonts. Given that definition of Unicode fonts, a number of decorative or handwriting fonts (though fewer than I expected) are arbitrarily excluded from being Unicode fonts. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
Re: Character identities
At 14:30 -0800 2002-10-28, Kenneth Whistler wrote: Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com Um, Michael, I think Anto'nio was talking about glyphs in a decorative font, which should -- clearly -- just be mapped to ordinary Unicode characters, via an ordinary Unicode cmap. If they correspond to Unicode characters, yes, certainly. Or do you think that the yellow, cursive, shadow-dropped, 3-D letters Getaway! at: http://www.trekking-in-nepal.com/ should also be represented by Private Use code positions? ;-) Not at all. Fonts with images of igloos and yurts would use it, though, I would think. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
At 14:31 -0800 2002-10-28, Figge, Donald wrote: At 20:59 + 2002-10-28, Anto'nio Martins-Tuva'lkin wrote: On 2002.10.28, 13:09, David Starner [EMAIL PROTECTED] wrote: Basically, any decorative or handwriting font can't be a Unicode font. ... Seems pointless to tell a lot of the fontmakers out there that they shouldn't worry about Unicode, because Unicode's only for standard book fonts Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com -- I don't think so. He seems to be talking about a specific typographic style. Code points don't care about style, whether it's Franklin Gothic or Snowcapped Helvetica. I must have misunderstood. I think I only saw the snow-capped and not the Devanagari. Sorry. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Character identities
My USD 0.02, as someone who is neither a professional typographer nor a font designer (more than one, but not quite two, different things)... Discussions about the character-glyph model often mention the essential characteristics of a given character. For example, a Latin capital A can be bold, italic, script, sans-serif, etc., but it must always have that essential A-ness such that readers of (e.g.) English can identify it as an A instead of, say, an O or a 4 or a picture of a duck. (Mark Davis has a chart showing dozens of different A's in his Unicode Myths presentation.) Somewhere in between the obvious relationships (A = A, B ≠ A), we have the case pair A and a. They are not identical, but they are certainly more similar to each other than are A and B. It seems to me, as a non-font guy, that calling a font a Unicode font implies two things: 1. It must be based on Unicode code points. For True- and OpenType fonts, this implies a Unicode cmap; for other font technologies it implies some more-or-less equivalent mechanism. The point is that glyphs must be associated with Unicode code points (not necessarily 1-to-1, of course), not merely with an internal 8-bit table that can be mapped to Unicode only through some other piece of software. 2. The glyphs must reflect the essential characteristics of the Unicode character to which they are mapped. That means a capital A can be bold, italic, script, sans-serif, etc. A small a can also be small-caps (or even full-size caps), but I think this is the only controversial point. In a Unicode font, U+0041 cannot be mapped to a capital A with macron, as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings. (But it can be mapped to a notdef glyph, if the font makes no claim to supporting U+0041.) 
U+0915 absolutely can have snow on it, or be bold or italic or whatever (or all of these), as long as a Devanagari reader would recognize its essential ka-ness. It cannot look like a Latin A, nor for that matter can U+0041 look like a Devanagari ka. Font guys, do you agree with this? Of course, the term Unicode font is also often used to mean a font that covers all, or nearly all, of Unicode. Font technologies generally don't even allow this, of course, and even by the standards of nearly we are still limiting ourselves to things like Bitstream Cyberbit, Arial Unicode MS, Code2000, Cardo, etc. Right or wrong, this is a commonly accepted meaning for Unicode font. -Doug Ewell Fullerton, California
Re: Character identities
I'm pretty much in agreement with what you say, except the following: Of course, the term Unicode font is also often used to mean a font that covers all, or nearly all, of Unicode. I would consider a Unicode font to be one that met your other conditions, aside from the repertoire. If I had a font that covered Latin, Greek and Cyrillic and worked with Unicode strings, for example, I would still consider that a Unicode font. I just wouldn't consider it a (pick your adjective) full / complete Unicode font. Mark __ http://www.macchiato.com ► “Eppur si muove” ◄
Re: Character identities
All this talk about the letter A reminded me of something from Hofstadter: The problem of intelligence, as I see it is to understand the fluid nature of mental categories, to understand the invariant cores of percepts such as your mother’s face, to understand the strangely flexible yet strong boundaries of concepts such as “chair” or the letter “a“ … The central problem of (artificial intelligence) is the question: What is the letter ‘a’ and ‘i’? ...By making these claims, I am suggesting that, for any program to handle letterforms with the flexibility that human beings do, it would have to possess full-scale general intelligence. -- Douglas R. Hofstadter, from one of his Metamagical Themas articles The notion that we could ever capture the essence of A-ness has already been discussed at length and dismissed as impossible without an AI breakthrough. :-) MichKa
Re: Character identities
Doug Ewell scripsit: 1. It must be based on Unicode code points. For True- and OpenType fonts, this implies a Unicode cmap; for other font technologies it implies some more-or-less equivalent mechanism. The point is that glyphs must be associated with Unicode code points (not necessarily 1-to-1, of course), not merely with an internal 8-bit table that can be mapped to Unicode only through some other piece of software. If it's a FIGlet font, of course, it's automatically Unicode, since FIGlet's table is 32 bits wide. In a Unicode font, U+0041 cannot be mapped to a capital A with macron, as it is in Bookshelf Symbol 1; nor to a six-pointed star, as in Monotype Sorts; nor to a hand holding up two fingers, as in Wingdings. (But it can be mapped to a notdef glyph, if the font makes no claim to supporting U+0041.) In fact, these fonts map these glyphs to U+F041. Only when seen as 8-bit fonts do they map to 0x41. -- With techies, I've generally found John Cowan If your arguments lose the first round http://www.reutershealth.com Make it rhyme, make it scan http://www.ccil.org/~cowan Then you generally can [EMAIL PROTECTED] Make the same stupid point seem profound! --Jonathan Robie
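The point about U+F041 can be checked with a short Python sketch. The 0xF000 offset is an assumption here: it reflects the common convention for symbol-encoded fonts that matches the U+F041 value mentioned above, and the helper name symbol_code_point is invented for this illustration.

```python
import unicodedata

# Assumption: symbol-encoded fonts expose 8-bit value 0xNN at code point 0xF0NN.
SYMBOL_OFFSET = 0xF000


def symbol_code_point(byte_value):
    """Map an 8-bit symbol-font slot to the code point it is exposed under."""
    return SYMBOL_OFFSET + byte_value


cp = symbol_code_point(0x41)
print(f"0x41 -> U+{cp:04X}")

# 'Co' is the general category of Private Use code points;
# 'Lu' is the category of LATIN CAPITAL LETTER A.
print(unicodedata.category(chr(cp)))
print(unicodedata.category("A"))
```

Seen this way, the star or hand glyph sits on a Private Use code point rather than on U+0041, so such a font does not claim that its symbol is a capital A.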
Re: Character identities
At 18:37 10/28/2002, Doug Ewell wrote: It seems to me, as a non-font guy, that calling a font a Unicode font implies two things: 1. It must be based on Unicode code points. For True- and OpenType fonts, this implies a Unicode cmap; for other font technologies it implies some more-or-less equivalent mechanism. The point is that glyphs must be associated with Unicode code points (not necessarily 1-to-1, of course), not merely with an internal 8-bit table that can be mapped to Unicode only through some other piece of software. My only amendment to that would be: 'The point is that those glyphs that are intended to represent the default form of the characters supported by that font must be associated with Unicode codepoints, whether directly or indirectly, not merely...' Not every glyph in a font needs to be encoded, and in general glyph variants and things like ligatures should not be, unless standard Unicode codepoints happen to be available for them (even then, it would be legitimate to leave them unencoded and access them only via glyph processing features). 2. The glyphs must reflect the essential characteristics of the Unicode character to which they are mapped. That means a capital A can be bold, italic, script, sans-serif, etc. A small a can also be small-caps (or even full-size caps), but I think this is the only controversial point. Yes, I would agree with that, with the caveat that the A-ness of an A isn't necessarily something that can be defined: it can only be recognised. Of course, the term Unicode font is also often used to mean a font that covers all, or nearly all, of Unicode. Font technologies generally don't even allow this, of course, and even by the standards of nearly we are still limiting ourselves to things like Bitstream Cyberbit, Arial Unicode MS, Code2000, Cardo, etc. Right or wrong, this is a commonly accepted meaning for Unicode font. I really think we should all do what we can to bury this use of the term. 
It is singularly unhelpful, and the idea in the minds of some customers that they *need* a font that covers all of Unicode has not done anyone any good. Sure some font developers made some money making these ridiculously huge grab-bag fonts, but their time could have been much better spent. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: Character identities
John Hudson commented. At 02:46 10/26/2002, William Overington wrote: I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. If anyone were interested, he could do this himself and use any codepoint in the Private Use Area. The meaning which I intended to convey was as follows. I don't know whether you might be interested in having a look at a particular example of the use of a small letter a with an e as an accent codified within the Private Use Area by an individual with an interest in applying Unicode, but in case you might be interested in having a look at that particular example, the web page is as follows. If, following from your response to the way that you read my sentence, someone were interested in defining a codepoint in the Private Use Area then certainly he or she could do that himself or herself and use any codepoint in the Private Use Area. However, exercising that freedom is something which could benefit from some thought. If someone wishes to encode an a with an e as an accent in the Private Use Area, he or she may wish to be able to apply that code point allocation in a document. If he or she looks at which Private Use Area codepoints are already in use within some existing fonts, then selecting a code point which is at present unused in those fonts might give a greater chance of his or her new character assignment being implemented than choosing a code point for which those fonts already have a glyph in use. Searching through such fonts takes time and requires some skill. 
If someone does wish to use a Private Use Area code point for an a with an e accent, then using U+E7B4 does give a possible slight advantage in that the code point is already part of a published set of code points available on the web, for, even though that set of code points is not a standard, it is a consistent set and other people might well use those codepoints as well. However, anyone may produce and publish such a set of code point allocations of his or her own if he or she so wishes, or indeed keep them to himself or herself. Yet I was not seeking to make any such point in my posting. I simply added to a thread on a specialised topic what I thought might be a short interesting note with a link to a web page at which some readers might like to look. The web page indeed provides two external links to interesting documents on the web. Maybe it is time to include a note in the Unicode Standard to suggest that 'Private' Use Area means that one should keep it to oneself Well, at the moment the Unicode Standard does include the word publish in the text about the Private Use Area. I have published details of various uses of the Private Use Area on the web yet not mentioned them in this forum. For example, readers might perhaps like to have a look at the following. http://www.users.globalnet.co.uk/~ngo/ast07101.htm Anyone who chooses to do so might like to have a look at the following file as well, which introduces the application area. http://www.users.glpbalnet.co.uk/~ngo/ast02100.htm This is an application of the Unicode Private Use Area so as to produce a set of soft buttons for a Java calculator so that the twenty hard button minimum configuration of a hand held infra-red control device for a DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) television can be used in a consistent manner to signal information from the end user to the computer in the television set. I am very pleased with the result. 
The encoding achieves a useful effect while being consistent for information handling purposes with the Unicode specification, so that an input stream of characters may be processed by a Java program without any ambiguity over whether a particular code point is a printing character or a calculator button (or indeed mouse event or simulated mouse event as mouse events are also encoded using the Private Use Area in my research). William Overington 29 October 2002
Re: Character identities
At 04:39 PM 10/28/2002 -0600, David Starner wrote: But think of the utility if Unicode added a COMBINING SNOWCAP and COMBINING FIRECAP! But should we combine the SNOWCAP with the ICECAP? (-: Unicode captures the ice-age during the global warming era! Do we have codepoints for images found on the walls of caves? :) Barry www.i18n.com
Re: Character identities
I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. http://www.users.globalnet.co.uk/~ngo William Overington 25 October 2002
Re: Character identities
At 02:46 10/26/2002, William Overington wrote: I don't know whether you might be interested in the use of a small letter a with an e as an accent codified within the Private Use Area, but in case you might be interested, the web page is as follows. http://www.users.globalnet.co.uk/~ngo/ligatur5.htm I have encoded the a with an e as an accent as U+E7B4 so that both variants may coexist in a document encoded in a plain text format and displayed with an ordinary TrueType font. If anyone were interested, he could do this himself and use any codepoint in the Private Use Area. Maybe it is time to include a note in the Unicode Standard to suggest that 'Private' Use Area means that one should keep it to oneself and not keep pestering other people about one's private use of it. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
RE: Character identities
Peter Constable wrote: then *any* font having a unicode cmap is a Unicode font. No, not if the glyphs (for the supported characters) are inappropriate for the characters given. Kent is quite right here. There are a *lot* of fonts out there with Unicode cmaps that do not at all conform to the Unicode standard --- custom-encoded (some call them hacked) fonts, usually abusing the characters that make up Windows cp1252. IMHO, you are confusing two very different things here: 1) Assigning arbitrary glyphs to some Unicode characters. E.g., assigning the $ character to long S; the ASCII letters to Greek letters; the whole Latin-1 range to Devanagari characters, etc. 2) Choosing strange or unorthodox glyph variants for some Unicode characters. The hacked fonts you mention are case (1); what is being discussed in this thread is case (2). Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German. The fact that umlaut and dieresis have been unified in Unicode makes such a variant glyph applicable only to a font targeted at German. You could not use that font to, e.g., typeset English or French, because the ¨ in coöperation or naïve is a dieresis, not an umlaut sign. There are other cases out there of Unicode fonts suitable for Chinese but not Japanese, Italian but not Polish, Arabic but not Urdu, etc. Why should a Unicode font suitable for German but not for English be any worse? _ Marco
Re: Character identities
- Original Message - From: Marco Cimarosti [EMAIL PROTECTED] To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, October 25, 2002 10:42 AM Subject: RE: Character identities Of course, this only applies to German. And Swedish. Stefan
RE: Character identities
... Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German. Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker. From: [EMAIL PROTECTED] ... We've implemented this successfully in OpenType fonts using the Historical Forms hist feature. If the umlaut to overscript e transformation is put under this feature for some fonts, I see no major reason to complain... (As others have noted, it does not really work for the long s, unless the language is labelled 'en'...) /Kent K
Re: Character identities
To all contributors to this thread: Please cease cc-ing [EMAIL PROTECTED]! The CC was meant for my remark on fuzzy search wrt. long-s and round-s. Google are certainly not interested in any and all other turns this thread has taken, or may take later. David J. Perry had written: An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution. To which I had replied: This will not work, cf. infra. John Hudson wrote: To be accurate, it works for display of English but not for German. David's remark was about German Fraktur orthography. My quote was too short, so this detail was lost. I apologize for any misunderstandings possibly caused by my omission. Best wishes, Otto Stolz
Superscript e (was: Character identities)
Marco Cimarosti (amongst others, using the same term) wrote: superscript e *is* the same diacritic that later became ¨ The term superscript e does not aptly describe the situation. Rather, the German a-Umlaut is derived from U+0061 U+0364 (LATIN SMALL LETTER A + COMBINING LATIN SMALL LETTER E), cf. http://www.unicode.org/charts/PDF/U0300.pdf. Best wishes, Otto Stolz
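The difference between the two sequences can be checked with Python's standard unicodedata module. The following is a small illustrative sketch, not part of the original message: it shows that a followed by U+0308 has a precomposed form (U+00E4) under NFC, whereas a followed by U+0364 stays a two-code-point sequence. The variable names are arbitrary.

```python
import unicodedata

umlaut = "a\u0308"     # a + COMBINING DIAERESIS
old_form = "a\u0364"   # a + COMBINING LATIN SMALL LETTER E

print(unicodedata.name("\u0364"))   # COMBINING LATIN SMALL LETTER E

# NFC composes a + U+0308 into the single code point U+00E4 ...
composed = unicodedata.normalize("NFC", umlaut)
print(len(composed), composed == "\u00e4")   # 1 True

# ... but leaves a + U+0364 alone, because no precomposed form exists.
kept = unicodedata.normalize("NFC", old_form)
print(len(kept), kept == old_form)           # 2 True
```

In other words, the two spellings remain distinct at the encoding level after normalization, whatever glyph a font chooses to show for them.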
RE: Character identities
At 14:04 25.10.2002 +0200, Kent Karlsson wrote: Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker. My wholehearted support! DIN asked for the combining letter small e as well as the other combining small letters specifically to cater for the requirements of scholars in a number of countries, notably Germany. In a large number of editions and scholarly dictionaries, both diacritics, the combining diaeresis and the combining letter e, are used on the very same page, even directly next to each other. The former is used for modern German words, the latter for medieval German words. The combining letter small e does not even necessarily stand for what today is the umlaut; it may have a number of different interpretations. For modern and medieval German words, the base font is in these cases the same -- editions are not normally printed in some sort of pseudo-archaic font. For this reason it is quite impermissible to render the combining letter small e as a diaeresis or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisements). As to the long s, it is not used for writing present-day German except in rare cases, notably in some scholarly editions and in the Fraktur script. Very few texts beyond the names of newspapers are nowadays produced in Fraktur. 
To put the long s on the German keyboard would be quite contrary to user requirements -- and if a requirement existed, it would be DIN's job to amend DIN 2137-2 and the upcoming DIN 2137-12 to cater for it. Best regards, Marc * Marc Wilhelm Küster Saphor GmbH Fronländer 22 D-72072 Tübingen Tel.: (+49) / (0)7472 / 949 100 Fax: (+49) / (0)7472 / 949 114
RE: Character identities
Marc Wilhelm Küster wrote: At 14:04 25.10.2002 +0200, Kent Karlsson wrote: Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker. My wholehearted support! [...] For this reason it is quite impermissible to render the combining letter small e as a diaeresis So far so good. There would be no reason for doing such a thing. If the author of a scholarly work used U+0364 (COMBINING LATIN SMALL LETTER E), this character should be displayed as either a letter e superscript to the base letter, or as an empty square (for fonts not caring about that character). or, for that matter, the diaeresis as a combining letter small e (however, you see the latter version sometimes, very infrequently, in advertisements). This is the case I thought we were discussing, and it is a very different case. Given Keld's opinion and Marc's wholehearted support, it follows that those infrequent advertisements should be encoded using U+0364... But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small collection of Medieval superscript letter diacritics, which is supposed to appear primarily in medieval Germanic manuscripts, or to reproduce some usage as late as the 19th century in some languages. Using such a character to encode 21st century advertisements is doomed to cause problems: 1) The glyph for U+0364 is more likely found in the font collection of the Faculty of Germanic Studies than on the PC of people wishing to read the advertisement for Ye Olde Küster Pub. So, most people will be unable to view the advertisement correctly. 
2) The designer of the advertisement will be unable to use his spell-checker and hyphenator on the advertisement's text. 3) Users will be unable to find the Küster Pub by searching for Küster in a search engine. What will actually happen is that everybody will see an empty square, so they'll think that the web designer is an idiot, apart from the professors at the Faculty of Germanic Studies, who'll think that the designer is an idiot because she doesn't know the difference between U+0308 and U+0364 in ancient German. The real error (IMHO) is the idea that font designers should stick to the *sample* glyphs printed in the Unicode book, because this would force graphic designers to change the *encoding* of their text in order to get the desired result. Another big error (IMHO, once again) is the idea that two different Unicode characters should look different. The difference must be preserved when it is useful -- e.g., U+0308 should not look like U+0364 in a font designed for publishing books on the history of German! What should really happen, IMHO, is that modern German should be encoded as modern German. A U+0308 (COMBINING DIAERESIS) should remain a U+0308, regardless of whether the corresponding glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font, like U+0304 (COMBINING MACRON) in another font, like two five-pointed stars side-by-side in a third font, or like Mickey Mouse's ears in Disney.ttf... _ Marco
RE: Character identities
Kent Karlsson wrote: ... Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German. Font makers, please do not meddle with the author's intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different derivations from o^e), it is just as inappropriate for font makers to use an o^e glyph for ö (by default in a Unicode font). Though in some sense the same, they are still different enough for authors to care, and it is up to the document author/editor to decide, not the font maker. It is certainly up to the author of the document to decide. But, as I explained at more length in my reply to Marc, there are two different approaches for deciding this: 1. When this decision is a matter of *content* (as may be the case when writing about linguistics, to differentiate spellings with o^e from spellings with ö), it is more appropriate to make the difference at the *encoding* level, by using the appropriate code point. 2. When this decision is only a matter of *presentation*, it is more appropriate to make the difference by using a font which uses the desired glyph for the normal ¨. If the umlaut to overscript e transformation is put under this feature for some fonts, I see no major reason to complain... (As others have noted, it does not really work for the long s, unless the language is labelled 'en'...) And, of course, in an ideal world option 2 will be done by switching a font feature, rather than switching to an ad-hoc font. This makes it possible for font designers to provide a single font which covers both needs. But this is just optimization, not compliance! _ Marco
Re: hacked fonts in MS-Windows: rev. solidus vs Yen/Won(was..RE: Character identities)
Jungshik Shin jshin at mailaps dot org wrote: ... MS-Windows has to provide distinct ways to enter 'reverse solidus' and 'Yen/Won' sign (both full-width and half-width) in Japanese and Korean IMEs. ... Good points, well stated. To make matters worse, the keyboard references at Microsoft's Global Development subsite [1] show: 1. for Korean, a won sign and the legend U+005C Reverse Solidus\nWon Sign 2. for Japanese, a yen sign and the legend U+005C Reverse Solidus\nYen Sign This helps perpetuate the idea that U+005C could be either a reverse solidus, a won sign, or a yen sign, depending on the font. This is exactly what Unicode is *not* about. Microsoft usually understands this. -Doug Ewell Fullerton, California [1] http://www.microsoft.com/globaldev/keyboards/keyboards.asp
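For reference, the standard character names of the code points involved can be listed with Python's unicodedata module. This is just an illustrative lookup added here, not something from the keyboard references themselves; the choice of code points to print is mine.

```python
import unicodedata

# Half-width and full-width forms of the three characters under discussion.
code_points = (0x005C, 0x00A5, 0x20A9, 0xFF3C, 0xFFE5, 0xFFE6)
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
```

Each of the yen and won signs has its own code point, so U+005C has exactly one identity in the standard: REVERSE SOLIDUS.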
Re: Character identities
David Starner starner at okstate dot edu wrote: Likewise, ä is printed as a with e above in old texts.* Would it be acceptable to make a font with a a^e glyph for ä? It's not even changing the meaning of the character in any way. Indeed, that is exactly what Sütterlin fonts do. (Then again, Sütterlin fonts assign the long-s glyph to U+0073 and make you type $ to get a round s, so they may not be the best example.) Stefan Persson alsjebegrijptwatikbedoel at yahoo dot se replied: Unicode defines a^e as U+0061 U+0364 (though it's exactly the same character as ä). Why? They're not exactly the same, except in this particular German example. Combining superscript e was encoded along with combining superscript a, i, o, u, c, d, h, m, r, t, v, and x, none of which evolved into a real diacritical mark the way e did. Combining e had non-German uses as well, as in early modern English Yͤ (which did not become Ÿ). As for the diaeresis, its use in French, English (coöperate), and other languages often has no relationship to the letter e. Indeed, in the sequence güe in Spanish, the diaeresis serves as a sort of anti-e, ensuring the separate pronunciation of the u when the e would otherwise prevent it! Historically speaking, I and J were once equivalent, and U and V were once equivalent, but they are all encoded today. -Doug Ewell Fullerton, California
RE: Character identities
First, is it compliant with Unicode for an Antiqua font to use an s glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur fonts possible, and it is arguably the glyph given to the middle s in modern Antiqua fonts. Likewise, ä is printed as a with e above in old texts.* Would it be acceptable to make a font with an a^e glyph for ä? Please don't. a^e is U+0061, U+0364. It's not even changing the meaning of the character in any way. And ä and æ are the same, likewise are ö, œ, and ø the same (in some sense, but not in general). Some (in Denmark and Norway, nowhere else) even consider aa and å (and a, small o above) to be the same (but not quite, especially when spelling names...). Still they are definitely different enough to be considered orthographic differences, not font differences. Likewise for your examples. As for collation, and searches that are advanced enough to make use of collation keys, the collation tables *can* be tailored so that these variants, within each equivalence (in some sense) group, have the same level 1 weights (which is appropriate for Scandinavian and German uses), but different level 2 weights (as is appropriate, since this difference is (usually) more significant than case distinctions). /Kent Karlsson
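The tailoring idea in the last paragraph can be sketched as a toy two-level sort key in Python. This is only an illustration under loud assumptions: the PRIMARY and SECONDARY tables are hypothetical, primary weights are compared by plain code point, and case is ignored entirely; a real implementation would tailor proper collation tables instead.

```python
# Hypothetical tailoring tables: letters in one group share a primary
# weight and are told apart only at the secondary level.
PRIMARY = {"æ": "ä", "ø": "ö", "œ": "ö"}
SECONDARY = {"ä": 0, "æ": 1, "ö": 0, "ø": 1, "œ": 2}

def sort_key(word: str):
    """Return a (level 1, level 2) key; level 2 only breaks level-1 ties."""
    lowered = word.lower()   # case is ignored in this toy example
    primary = tuple(PRIMARY.get(ch, ch) for ch in lowered)
    secondary = tuple(SECONDARY.get(ch, 0) for ch in lowered)
    return (primary, secondary)

words = ["møre", "mora", "möre"]
print(sorted(words, key=sort_key))   # ['mora', 'möre', 'møre']
```

A search that compares only the first element of the key would treat möre and møre as matches, while sorting still keeps them apart.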
Re: Character identities
On Thu, Oct 24, 2002 at 11:46:04AM +0200, Kent Karlsson wrote: Please don't. a^e is U+0061, U+0364. Which is great, if you're a scholar trying to accurately reproduce an old text; if you're Joe User, trying to print a document in an Olde German font, it's far more inconvenient than helpful. Still they are definitely different enough to be considered orthographic differences, not font differences. Changing a^e to ä is all that would need to be done to make the books that use a^e look like those of the same timeframe that use ä. I'm not sure where you draw the line between font and orthographic differences, but this does not require dictionary lookup, and for my purposes is most easily done by a font change. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
Re: Character identities
David J. Perry had written: An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution. This will not work, cf. infra. David Starner wrote: no matter what the convention, it requires a dictionary lookup for various cases; A dictionary lookup will not suffice, as there are pairs of words differing only in an ſ vs. s (long vs. round s), e. g. · Wachſtube ['vaxʃtu:bə] = guard room · Wachstube ['vakstu:bə] = wax tube [Pronunciation in brackets] To substitute a long s glyph at the right spots you must fully analyse the sentence -- grammatically, and in cases as in the previous example, even semantically -- to find the correct spelling. Hence, it is much easier to type the ſ, and s, characters in their proper places, and then replace ſ with s, if so desired. Fuzzy searches should equate ſ with s. Apparently, Google.de doesn't do it right: a search for Kinderſtube yields no hits, while a search for Kinderstube yields about 10700. Best wishes, Otto Stolz
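The easy direction described above, replacing ſ with s and equating the two in a fuzzy search, can be sketched in Python as follows. The function names fold_long_s and fuzzy_contains are hypothetical, and this is a minimal illustration rather than how any search engine actually works.

```python
def fold_long_s(text: str) -> str:
    """Replace every LATIN SMALL LETTER LONG S (U+017F) with a round s."""
    return text.replace("\u017f", "s")

def fuzzy_contains(query: str, document: str) -> bool:
    """Case-insensitive substring search that treats ſ and s as equal."""
    return fold_long_s(query).casefold() in fold_long_s(document).casefold()

print(fold_long_s("Kinderſtube"))                          # Kinderstube
print(fuzzy_contains("Kinderſtube", "Die Kinderstube"))    # True
# Folding is lossy: the two different words above become identical.
print(fold_long_s("Wachſtube") == fold_long_s("Wachstube"))  # True
```

The last line is the reason the reverse direction cannot be automated in the same way: once folded, the text no longer says which of the two words was meant.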
Re: Character identities
- Original Message - From: [EMAIL PROTECTED] To: John Hudson [EMAIL PROTECTED] Cc: Otto Stolz [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, October 24, 2002 8:44 PM Subject: Re: Character identities Looking at a Fraktur book published in 1917, which is neither English nor German, use of the long s appears almost whimsical. Words like historie and utgivelse use the long s, while words like oplysninger and ensformig use the final s medially. (The title of the book is En norsk bygds historie.) S is the firſt letter of a ſyllable in words ſuch as hiſtorie and utgivelſe, but the laſt in words ſuch as oplysninger and ensformig. Stefan
Re: Character identities
John Hudson wrote, At 06:47 AM 24-10-02, Otto Stolz wrote: David J. Perry had written: An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution. This will not work, cf. infra. To be accurate, it works for display of English but not for German. The British convention for using the long-s can be handled contextually, because it does not need to consider whether the letter is occurring at the beginning or end of a syllable. We've implemented this successfully in OpenType fonts using the Historical Forms hist feature. German presents a much more difficult problem. Looking at a copy of Of the Law-Terms: A Discourse Written by The Learned Antiquary. Sir Henry Spelman, Kt. (1684 edition) here. Use of initial/medial s versus final s is straightforward except in cases like Malmesbury and Sarisburiam, in which the final s is used medially. Looking at a Fraktur book published in 1917, which is neither English nor German, use of the long s appears almost whimsical. Words like historie and utgivelse use the long s, while words like oplysninger and ensformig use the final s medially. (The title of the book is En norsk bygds historie.) Best regards, James Kass.
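The purely contextual English rule (long s initially and medially, round s finally) can be approximated on plain text with a single regular expression. The sketch below is a deliberately simplified assumption of that rule, applied to characters rather than glyphs; it is not the OpenType implementation mentioned above, and the function name apply_long_s is hypothetical.

```python
import re

def apply_long_s(text: str) -> str:
    """Replace each lowercase s that is followed by a letter with ſ (U+017F).

    Simplified rule: an s at the end of a word, or before punctuation,
    stays round. Exceptions found in real books are not handled.
    """
    return re.sub(r"s(?=[^\W\d_])", "\u017f", text)

print(apply_long_s("success"))      # ſucceſs
print(apply_long_s("his laws"))     # his laws
print(apply_long_s("Malmesbury"))   # Malmeſbury
```

The last result differs from the 1684 edition cited above, which keeps the round s in Malmesbury; exceptions of that kind are where even the English rule stops being a matter of simple context.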
RE: Character identities
Kent Karlsson wrote: And it is easy for Joe User to make a simple (visual...) substitution cipher by just switching to a font with the glyphs for letters (etc.) permuted. Sure! I think it would be a bad idea to call it a Unicode font though... (That it technically may have a unicode cmap is beside my point.) The only meaning that I can attach to the expression Unicode font is a pan-Unicode font: a font which covers all the scripts in Unicode. If this is what you mean, then displaying ä as an a^e is clearly not a good idea. But choosing Fraktur glyphs would not be a good idea either! How can you have Fraktur IPA!? Fraktur Pinyin!? Fraktur Devanagari!? Fraktur Arabic!? In general, no noticeable difference from the glyphs used in the Unicode book would be a good idea for a pan-Unicode font. But if by Unicode font you just mean a font which is compliant with the Unicode standard, but only supports one or more of the scripts, then *any* font having a unicode cmap is a Unicode font. And also many fonts *not* having a Unicode cmap are, provided that something inside or outside the font knows how to pick up the right glyphs. In this sense, what is or is not appropriate depends on the font's style and targeted usages and languages: there are fonts which don't have dots over i and j; fonts where U+0059 and U+03A5 look different; fonts where U+0061, U+0251, U+03B1 and U+FF41 look identical; fonts where capital and small letters look identical... Why can't there be a Fraktur font where ä and a^e look identical, if this is appropriate for that typographical style and for the usages and languages intended for the font? Ciao. Marco
Re: Character identities
At 09:46 -0700 2002-10-24, John Hudson wrote: At 06:47 AM 24-10-02, Otto Stolz wrote: David J. Perry had written: An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution. This will not work, cf. infra. To be accurate, it works for display of English but not for German. The British convention for using the long-s can be handled contextually, because it does not need to consider whether the letter is occurring at the beginning or end of a syllable. Not even for compounds? -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Character identities
At 06:47 AM 24-10-02, Otto Stolz wrote: David J. Perry had written: An OpenType font that is smart enough to substitute a long s glyph at the right spots is the much superior long-term solution. This will not work, cf. infra. To be accurate, it works for display of English but not for German. The British convention for using the long-s can be handled contextually, because it does not need to consider whether the letter is occurring at the beginning or end of a syllable. We've implemented this successfully in OpenType fonts using the Historical Forms hist feature. German presents a much more difficult problem. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: Long S on keyboard (was: Character identities)
- Original Message - From: Otto Stolz [EMAIL PROTECTED] To: Doug Ewell [EMAIL PROTECTED] Cc: Unicode Mailing List [EMAIL PROTECTED]; Torsten Mohrin [EMAIL PROTECTED] Sent: 24 Oct. 2002 12:06 Subject: Long S on keyboard (was: Character identities) Doug Ewell wrote: I'm not aware of any keyboard layout, German or otherwise, that contains U+017F. Would it be reasonable to suggest that it be added to the standard German layout? AltGr+s seems to be available. To whom would you suggest such an addition? DIN? Are its standards as loosely followed as the Canadian standards as far as PC keyboards are concerned? It is nearly impossible to find the CSA/ACNOR keyboard (CAN/CSA Z243.200-92) in the main office and electronic equipment stores in Canada... P. A. - o - O - o - Unicode and ISO 10646, new articles: http://hapax.iquebec.com
RE: Character identities
And it is easy for Joe User to make a simple (visual...) substitution cipher by just switching to a font with the glyphs for letters (etc.) permuted. Sure! I think it would be a bad idea to call it a Unicode font though... (That it technically may have a unicode cmap is beside my point.) Likewise for your (less extreme) suggestions. They are very close to suggesting making a Swedish text use Danish writing style for åäö (aa or å, æ, ø) by just a font change. Which you easily could do by a special font. Would that font be a Unicode font? I think all of these changes would be unexpected (for the Latin script) from a mere font change (between Unicode fonts). If someone really wants such substitutions, it is easy enough to produce using character string substitution. No fonts like you suggest are needed. But you would need (a) font(s) that display U+0061, U+0364 and similar cases properly. The latter would be very welcome! /Kent K On Thu, Oct 24, 2002 at 11:46:04AM +0200, Kent Karlsson wrote: Please don't. a^e is U+0061, U+0364. Which is great, if you're a scholar trying to accurately reproduce an old text; if you're Joe User, trying to print a document in an Olde German font, it's far more inconvenient than helpful. Still they are definitely different enough to be considered orthographic differences, not font differences. Changing a^e to ä is all that would need to be done to make the books that use a^e look like those of the same timeframe that use ä. I'm not sure where you draw the line between font and orthographic differences, but this does not require dictionary lookup, and for my purposes is most easily done by a font change.
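The "character string substitution" mentioned above could look roughly like the following Python sketch. The function names and the choice to cover only lowercase ä, ö and ü are assumptions made for illustration; nothing here is prescribed by the thread or by Unicode.

```python
import unicodedata

E_ABOVE = "\u0364"  # U+0364 COMBINING LATIN SMALL LETTER E

# Hypothetical mapping, limited to the three lowercase German umlauts.
_UMLAUT_TO_E_ABOVE = str.maketrans({
    "\u00e4": "a" + E_ABOVE,  # ä
    "\u00f6": "o" + E_ABOVE,  # ö
    "\u00fc": "u" + E_ABOVE,  # ü
})


def umlaut_to_e_above(text: str) -> str:
    """Rewrite ä/ö/ü as base letter + U+0364 (old-style spelling)."""
    # Compose first so that "a" + U+0308 is caught as well as U+00E4.
    return unicodedata.normalize("NFC", text).translate(_UMLAUT_TO_E_ABOVE)


def e_above_to_umlaut(text: str) -> str:
    """Rewrite base letter + U+0364 as the precomposed umlaut letter."""
    for base, umlaut in (("a", "\u00e4"), ("o", "\u00f6"), ("u", "\u00fc")):
        text = text.replace(base + E_ABOVE, umlaut)
    return text


if __name__ == "__main__":
    old_style = umlaut_to_e_above("M\u00e4dchen")
    print([hex(ord(c)) for c in old_style])
    print(e_above_to_umlaut(old_style))
```

Either direction is a change to the text itself, made once by the author or editor, so the result stays the same whichever font is chosen later. Displaying the a + U+0364 form still depends on a font that positions the combining e properly, as noted above.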
Long S on keyboard (was: Character identities)
Doug Ewell wrote: I'm not aware of any keyboard layout, German or otherwise, that contains U+017F. Would it be reasonable to suggest that it be added to the standard German layout? AltGr+s seems to be available. It would certainly not hurt to have it there. Fraktur, and Long-s, are not much used, these days. So, there will not be much demand for a long-s key -- though it would come in handy for some kinds of usage, e. g. modern advertising, or reproducing texts from before ~1950 (cf. http://www.gutenberg2000.de/). Most German Fraktur fonts currently available seem to have particular, proprietary encodings. A standardized Long-s key would certainly help to promote Unicode amongst Fraktur font designers. Best wishes, Otto Stolz
RE: Character identities
Kent Karlsson wrote: And it is easy for Joe User to make a simple (visual...) substitution cipher by just switching to a font with the glyphs for letters (etc.) permuted. Sure! I think it would be a bad idea to call it a Unicode font though... (That it technically may have a unicode cmap is beside my point.) The only meaning that I can attach to the expression Unicode font is a pan-Unicode font: a font which covers all the scripts in Unicode. If this is what you mean, No. (No current font technology can handle that b.t.w., them having a limit of 64 Ki glyphs...; you'd need to one way or another coalesce several fonts. Or do something very neat for CJK...) But if by Unicode font you just mean a font which is compliant with the Unicode standard, but only supports one or more of the scripts, Yes, including that the glyphs are recognisably correct for the given characters. then *any* font having a unicode cmap is a Unicode font. No, not if the glyphs (for the supported characters) are inappropriate for the characters given. In this sense, what is or is not appropriate depends on the font's style and targeted usages and languages: there are fonts which don't have dots over i and j; You have a slight point there, but those are not intended for running text. And I'm hesitant to label them Unicode fonts. fonts where U+0059 and U+03A5 look different; Of course, those aren't even in the same script (though they are similar-looking). fonts where U+0061, U+0251, U+03B1 and U+FF41 look identical; So? fonts where capital and small letters look identical... If you want small caps, or capitals, via the font, yes. (But that should not be the default 'mode', should it?) Why can't there be a Fraktur font where ä and a^e look identical, if ä and a^e look different even in Fraktur... Maybe the use of ä in Fraktur is a beast, but that is beside my point. this is appropriate for that typographical style and for the usages and languages intended for the font? 
Of course you can have such a font. You can have any font you like. But I would not label it a Unicode font (regardless if there is a Unicode cmap, in a particular subset of font technologies, or not; bugs notwithstanding). Talking about this particular subset of font technologies, maybe interested parties (not me) should lobby for a new font feature for this. But do you really want a font feature for this? Is it worth the cost? (I'd just do some global substitutions; or put that in a little special-purpose utility somewhere.) /Kent K Ciao. Marco
Re: Long S on keyboard (was: Character identities)
At 12:47 -0400 2002-10-24, Patrick Andries wrote: - Original message - From: Otto Stolz [EMAIL PROTECTED] To: Doug Ewell [EMAIL PROTECTED] Cc: Unicode Mailing List [EMAIL PROTECTED]; Torsten Mohrin [EMAIL PROTECTED] Sent: 24 Oct. 2002 12:06 Subject: Long S on keyboard (was: Character identities) Doug Ewell wrote: I'm not aware of any keyboard layout, German or otherwise, that contains U+017F. Would it be reasonable to suggest that it be added to the standard German layout? AltGr+s seems to be available. I'm developing some drivers which take it into account. Of course GHA and WYNN and HWAIR are of greater concern to me -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Character identities
On 10/24/2002 01:02:39 PM Kent Karlsson wrote: then *any* font having a unicode cmap is a Unicode font. No, not if the glyphs (for the supported characters) are inappropriate for the characters given. Kent is quite right here. There are a *lot* of fonts out there with Unicode cmaps that do not at all conform to the Unicode standard --- custom-encoded (some call them hacked) fonts, usually abusing the characters that make up Windows cp1252. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Character identities
I have several questions about character identities. First, is it compliant with Unicode for an Antiqua font to use an s glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur fonts possible, and it is arguably the glyph given to the middle s in modern Antiqua fonts. Likewise, ä is printed as a with e above in old texts.* Would it be acceptable to make a font with a a^e glyph for ä? It's not even changing the meaning of the character in any way. (I suspect the answer is it's not technically compliant, but nobody cares.) (To my surprise, I came across a text from 1920 that used the e-above instead of a diaeresis. The only other texts I've seen with this date before 1810. It was Islands Kultur zur Wikingerzeit by Felix Niedner, in the series (?) Thule: Altnordische Dichtung und Prosa, which leads me to believe, based off my limited German, that it's a deliberate anachronism. Right?) As a third case, I looked briefly at information and advocacy of the duodecimal system. Chi and epsilon have been used as glyphs for 10 and 11, as well as an upside-down 2 and 3, a chi and reversed pound symbol (? I'd need to look at that one again . . .) and * and #. Unified, they might make a proposal here, if someone still cares enough to make it. Would it be unreasonable to unify them? There's quite a disparity in glyphs, but not much argument against them all being the same character, and I don't think there's anyone wanting to make the distinction. -- David Starner - [EMAIL PROTECTED] Great is the battle-god, great, and his kingdom-- A field where a thousand corpses lie. -- Stephen Crane, War is Kind
Re: Character identities
- Original Message - From: David Starner [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, October 23, 2002 7:00 PM Subject: Character identities Likewise, ä is printed as a with e above in old texts.* Would it be acceptable to make a font with a a^e glyph for ä? It's not even changing the meaning of the character in any way. Unicode defines a^e as U+0061 U+0364 (though it's exactly the same character as ä). Why? Stefan
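One way to see how Unicode itself treats the two spellings is to compare them under canonical normalization. The sketch below uses Python's standard unicodedata module; the helper name canonically_equivalent is hypothetical, and the check says nothing about whether the two were historically the same letter, only about how the standard encodes them.

```python
import unicodedata

A_UMLAUT = "\u00e4"         # ä, precomposed
A_DIAERESIS_SEQ = "a\u0308"  # a + COMBINING DIAERESIS
A_E_ABOVE_SEQ = "a\u0364"    # a + COMBINING LATIN SMALL LETTER E


def canonically_equivalent(first: str, second: str) -> bool:
    """True if both strings have the same canonical decomposition (NFD)."""
    return (unicodedata.normalize("NFD", first)
            == unicodedata.normalize("NFD", second))


if __name__ == "__main__":
    # ä and a + U+0308 are two spellings of the same text.
    print(canonically_equivalent(A_UMLAUT, A_DIAERESIS_SEQ))
    # a + U+0364 is a different sequence and does not normalize to ä.
    print(canonically_equivalent(A_UMLAUT, A_E_ABOVE_SEQ))
```

So, as far as the encoding is concerned, a with e above is a distinct sequence rather than another representation of ä, which is the distinction the orthography-versus-font argument in this thread rests on.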
Re: Character identities
David Starner wrote: First, is it compliant with Unicode for an Antiqua font to use an s glyph for ſ (U+017F)? It makes switching between Antiqua and Fraktur fonts possible, and it is arguably the glyph given to the middle s in modern Antiqua fonts. Likewise, ä is printed as a with e above in old texts.* Would it be acceptable to make a font with a a^e glyph for ä? It's not even changing the meaning of the character in any way. In my opinion, this is all reasonable and should be allowed. Viel Erfolg! As a third case, I looked briefly at information and advocacy of the duodecimal system. Chi and epsilon have been used as glyphs for 10 and ... I assume that the answer will be that these things are just alternate uses of existing characters. markus -- Opinions expressed here may not reflect my company's positions unless otherwise noted.