RE: Is long s a presentation form?
Michael Everson wrote:

I like to think of the long s as similar to the final sigma. Nobody thinks that final sigma should be a presentation form of sigma.

Never say nobody: I *do* think that Greek final sigma, final Hebrew letters, and Latin long s should all be presentation forms. I think that they are encoded as separate characters only because of compatibility with pre-existing standards such as ISO 8859.

Occasional exceptions to the general distributional rules of these presentation forms would not have been a valid reason to encode them as separate characters. Similar exceptions also occur in Indic and Arabic scripts (e.g., the Arabic abbreviation for plural is a jiim in initial form). These cases can be supported in plain text using ZWJ and ZWNJ:

Wachstube = German for guard room;
WachsZWNJtube = German for wax tube;
jiimZWJ = Arabic for plural.

Nobody really uses long s in modern Roman typography, and it's a lot more convenient to have this as a separate character for the nonce-uses that it has than to expect font designers round the world to add special shaping tables to all their fonts just for this critter.

Why all their fonts? Only a few fonts designed for special purposes need to have the long/short s distinction.

_ Marco
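The ZWNJ convention in this message can be sketched in a few lines. This is only an illustration under a deliberately simplified assumption (every s directly followed by a letter is shown as long s, and ZWNJ blocks that); real Fraktur rules are more involved, and the name `shape_long_s` is hypothetical.

```python
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def shape_long_s(text: str) -> str:
    """Display-time shaping: show 's' as long s when a letter follows it.

    Simplifying assumption: a following letter is the only trigger.
    ZWNJ is not a letter, so an 's' followed by ZWNJ stays round.
    """
    shaped = []
    for i, ch in enumerate(text):
        following = text[i + 1:i + 2]  # empty string at end of text
        if ch == "s" and following.isalpha():
            shaped.append(LONG_S)
        else:
            shaped.append(ch)
    return "".join(shaped)

print(shape_long_s("Wachstube"))           # guard room: long s before "tube"
print(shape_long_s("Wachs" + ZWNJ + "tube"))  # wax tube: round s kept
```

The stored text contains only plain s plus the occasional ZWNJ; the long s exists only in the shaped output, which is what treating it as a presentation form would mean.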
Re: Plane 1 maths fraktur in textual apparatus?
I've been pondering the very same issue as John, though with a little less focused attention.

On 11/09/2002 11:57:18 AM jameskass wrote:

In the case you have offered, since these Fraktur letters are used as variables (indicating sourcing in BHS), it shouldn't be considered abuse, IMHO.

The use of Fraktur in Greek and Hebrew apparatus is not as variables, which denote some particular attribute but have no specific value; they are symbols with specific meaning, more comparable to letters denoting units of measure. But, the Fraktur-ness is essential in their interpretation.

Options:

1. Use a symbol font / PUA for all apparatus and text-annotation symbols (e.g. some texts use angle brackets that look like |_ and _| but are positioned in the lower corners of the em square).
   Cons: involves PUA codepoints, and interchange requires prior agreement -- would really need to seek agreement throughout Biblical studies community.

2. Use regular Latin letters and a Fraktur face.
   Cons: need multiple fonts to work with Biblical texts (but may be true regardless), and plain-text interchange not possible.

3. Use regular Latin letters; provide a single font with Fraktur glyphs as alternates.
   Cons: usefulness limited to certain software only, and plain-text interchange not possible.

4. Use Fraktur math symbols.
   Cons: I can't think of any, though we'd still want to promote consensus among the Biblical studies community on using this.

I think I could readily go along with John's suggestion (i.e. option 4).

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
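For anyone trying option 4, a small sketch of how the Fraktur symbols can be looked up by character name. One wrinkle worth knowing: the Plane 1 Fraktur capitals have gaps at C, H, I, R and Z, because those letters already existed in the Letterlike Symbols block as BLACK-LETTER CAPITAL letters. The helper name `fraktur_capital` is hypothetical.

```python
import unicodedata

def fraktur_capital(letter: str) -> str:
    """Return the Fraktur symbol for an ASCII capital letter A-Z."""
    try:
        # Most capitals live in Plane 1 (Mathematical Alphanumeric Symbols).
        return unicodedata.lookup(f"MATHEMATICAL FRAKTUR CAPITAL {letter}")
    except KeyError:
        # C, H, I, R, Z are in the BMP Letterlike Symbols block instead.
        return unicodedata.lookup(f"BLACK-LETTER CAPITAL {letter}")

for letter in "GMC":
    symbol = fraktur_capital(letter)
    print(letter, f"U+{ord(symbol):04X}", unicodedata.name(symbol))
```

So a text using option 4 will mix BMP and Plane 1 characters, which matters for tools that handle surrogates poorly.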
Lunate, Terminal, and Medial Sigma
“φιλοσ.,” is necessarily the abbreviation of some word (like φιλοσοφία) while “φιλος.” is a single non-abbreviated word, followed by a sentence period.

This is the compelling argument, which Nick made in his note on sigma as well, and which I had forgotten. So while I have to admit that this argument is compelling, I do still think that lunate sigma (which is the same glyph for both word positions) is going to cause real problems with normalization. It is not those who follow the Unicode specifications who worry me, it's those who do not.

Thanks to Jim Allen and as often to Nick. For the details of the process which is described in a somewhat simplified way by Katerina Sarri, one can look to the examples in *Thompson*, An Introduction to Greek and Latin Palaeography.

PTR
RE: Is long s a presentation form?
On 11/11/2002 05:42:15 AM Marco Cimarosti wrote:

Michael Everson wrote:

I like to think of the long s as similar to the final sigma. Nobody thinks that final sigma should be a presentation form of sigma.

Never say nobody: I *do* think that Greek final sigma, final Hebrew letters, and Latin long s should all be presentation forms.

I agree that Michael's "nobody" is incorrect. I've no opinion on the long s, but for sigma and Hebrew gimel, etc. we have legacy encodings that assume the finals *are* presentation forms.

It means that, whereas we have a ton of custom encodings with presentation forms whose distinctions we neutralise when going to Unicode (and for which we need context-sensitive rules coming back), in the case of these Greek and Hebrew encodings it is the other way round: we need to neutralise distinctions going from Unicode to legacy, but need context-sensitive rules going from legacy to Unicode. It is what it is, though, and we're not suggesting any need to change.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
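The two directions described here can be sketched as follows. This is a toy model, not any real converter: the legacy text is represented as a string that uses σ for every sigma, and the contextual rule (a sigma after a letter and not before a letter becomes final) is an assumption made for the example. Both function names are hypothetical.

```python
MEDIAL_SIGMA = "\u03c3"  # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA

def unicode_to_legacy(text: str) -> str:
    """Unicode -> legacy: neutralise the distinction (one sigma only)."""
    return text.replace(FINAL_SIGMA, MEDIAL_SIGMA)

def legacy_to_unicode(text: str) -> str:
    """Legacy -> Unicode: context-sensitive choice of final sigma."""
    out = []
    for i, ch in enumerate(text):
        after_letter = i > 0 and text[i - 1].isalpha()
        before_letter = text[i + 1:i + 2].isalpha()
        if ch == MEDIAL_SIGMA and after_letter and not before_letter:
            out.append(FINAL_SIGMA)
        else:
            out.append(ch)
    return "".join(out)

print(legacy_to_unicode("σοφοσ εστιν"))
print(unicode_to_legacy("λογος"))
print(legacy_to_unicode("φιλοσ."))  # the abbreviation case cannot be told apart
```

The last call shows the limit of a purely contextual rule: an abbreviation written with medial sigma before a period comes back with a final sigma, which is exactly the kind of exception raised elsewhere in this discussion.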
Re: Plane 1 maths fraktur in textual apparatus?
At 07:49 -0600 2002-11-11, [EMAIL PROTECTED] wrote: 4. Use Fraktur math symbols. Cons: I can't think of any, though we'd still want to promote consensus among the Biblical studies community on using this. They are used as symbols, not as letters of words, in Biblical studies texts, so why not? -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Question: the german umlaut
I just wanted to know how much space in bytes the Latin-1 characters such as the German umlaut characters take up in UTF-8 encoding. Is it still just one byte or does it now require 2 bytes?

U+0000 up to U+007F take 1 byte (ASCII).
U+0080 up to U+07FF take 2 bytes (Latin-1, Latin Extended, combining diacritics, phonetics, Greek, Cyrillic, Hebrew, Arabic, Syriac, and some more scripts). This is very little expansion, especially for languages which use only a few non-ASCII characters, like Swedish or German, but expensive for Greek, Arabic and the like.
U+0800 up to U+FFFD take 3 bytes (Hangul, CJK...). Not too expensive, but significant.
U+10000 up to U+10FFFD take 4 bytes (this is all the rest). These take 4 bytes in almost every encoding, so this is no significant expansion.

If space is a concern, use SCSU: it is shorter and has the additional advantage of being much better compressible by zip or comparable algorithms.

-- Dominikus Scherkl [EMAIL PROTECTED]
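The ranges above are easy to check empirically. A minimal sketch, assuming nothing beyond Python's built-in UTF-8 codec; the helper name `utf8_length` is made up for the example.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# ASCII letter, German umlauts and sharp s, euro sign, a Plane 1 character
for ch in ("a", "\u00e4", "\u00f6", "\u00fc", "\u00df", "\u20ac", "\U00010312"):
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")
```

So the answer to the question is two bytes for each umlaut; only the ASCII range stays at one byte.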
Entering Plane 1 characters in XP
In Windows 2000 it was necessary to adjust a registry entry to enable support for surrogates, which were disabled by default. What's the situation with XP? I looked on the Microsoft developers web site but it seems to be the same information as I saw when I was dealing with Win2000 with no updates. (One of the pages references Unicode 2.0 . . .)

I did some tests and found that I can get characters outside the BMP in WordPad under XP and in Word XP by typing the Unicode scalar value followed by Alt-x; I don't recall ever changing any registry settings, but it has been a while since I upgraded from Win2000 to XP.

So am I correct in saying that, under XP, 1) there is no need to change the registry and 2) the Win2000 method of typing two surrogates has been replaced by typing the single scalar value plus Alt-x?

Thanks - David
Re: Entering Plane 1 characters in XP
David, XP requires the registry change as well.
http://www.i18nguy.com/surrogates.html

I haven't played with the alt-n for surrogates so can't help with that.

tex

David J. Perry wrote:

In Windows 2000 it was necessary to adjust a registry entry to enable support for surrogates, which were disabled by default. What's the situation with XP? I looked on the Microsoft developers web site but it seems to be the same information as I saw when I was dealing with Win2000 with no updates. (One of the pages references Unicode 2.0 . . .)

I did some tests and found that I can get characters outside the BMP in WordPad under XP and in Word XP by typing the Unicode scalar value followed by Alt-x; I don't recall ever changing any registry settings, but it has been a while since I upgraded from Win2000 to XP.

So am I correct in saying that, under XP, 1) there is no need to change the registry and 2) the Win2000 method of typing two surrogates has been replaced by typing the single scalar value plus Alt-x?

Thanks - David

--
Tex Texin cell: +1 781 789 1898 mailto:Tex;XenCraft.com
Xen Master http://www.i18nGuy.com
XenCraft http://www.XenCraft.com
Making e-Business Work Around the World
Re: Entering Plane 1 characters in XP
On Mon, 11 Nov 2002 08:55:37 -0800 (PST), Tex Texin wrote: XP requires the registry change as well. I think the whole Registry thing is a red herring. I've never had to set the registry to see surrogates under Windows 2K or XP. I've even deleted the specified registry keys, and surrogates are still shown OK in IE, Notepad, Word etc. BTW, any application that uses Uniscribe can display surrogates just fine under Windows 9x as well as 2K and XP. Andrew
Scientific typographic characters
From the NY Times
http://www.nytimes.com/2002/11/07/technology/circuits/07next.html?8cir

WHAT'S NEXT
The Noah's Ark of the Web, 7,000 Characters at a Time
By JEFFREY SELINGO

IT'S one of the most frustrating problems encountered when passing documents back and forth electronically: the little square boxes that mean a font someone else used to create the file cannot be rendered on your computer.

While Portable Document Format, or PDF, files, which essentially are copies of printed pages, have helped mitigate the problem for most computer users, that solution has not satisfied scientists and mathematicians, whose formulas and equations contain many symbols.

Using those symbols on the Web has been particularly inconvenient. Most publishers use the symbol-friendly PDF format, but then researchers cannot easily embed links to other files or background information within those documents as they can with HTML files. But HTML documents have their own drawbacks. For instance, they often display equations as separate graphic images that cannot be resized or searched and greatly increase the size of the file.

Now a new set of fonts being developed by six publishers of scientific, technical and medical journals promises to contain every character - more than 7,000 in all - that might be needed in a technical article published in any scientific discipline. When complete, sometime next fall, the fonts will be shared freely with publishers, software manufacturers and scholars, under the condition that they not be altered.

"This work is a breakthrough for publishers and scientists," said Tim Ingoldsby, director of business development at the American Institute of Physics, one of the publishers working on the project, called the Scientific and Technical Information Exchange, or STIX (www.stixfonts.com). "The display of math symbols in publishing has always been difficult, but those problems have only become worse with the Web."

The set of STIX fonts will work very much like the Symbol or Zapf Dingbats fonts in most applications, where users choose from a grid of dozens of characters. The STIX font will have the appearance of a Times font, but the characters will not look any different if a user switches to a different font, like Courier or Helvetica, Mr. Ingoldsby said. "The symbols will work with pretty much any font," he said.

Mr. Ingoldsby said most scientific characters lack flavor - they are quite plain to look at - so adding one of those symbols to a document composed using, for instance, a serif font, which has fine lines projecting from the main strokes of the letter, will not make the scientific character stand out.

Designers are also adding the alphabet, numbers and other common characters to the STIX font, so, Mr. Ingoldsby said, there will be no need to switch between fonts. "This is meant to replace the font which people use today called New Times Roman," he said.

About 200 characters of the STIX fonts are being finished each month, Mr. Ingoldsby said. So far, about half of the 7,000 characters have been completed.

With so many symbols, however, the STIX fonts could be cumbersome to use. The developers are working to come up with a method that will make it relatively easy for users to find the symbols they want. Symbols will probably be organized by type or subject, with the user selecting a category (and possibly a subcategory) from drop-down menus. A grid of symbols in that category will then appear, from which the user can choose the appropriate one.

Creating a new font set is a complicated process. First, developers must correctly copy the shape of each character. Then they must adjust its metrics, or how the character is positioned in the space in which it is supposed to fit. And finally, they must make another set of adjustments to be sure the character looks good on a computer screen.

William H. Mischo, head of the Grainger Engineering Library Information Center at the University of Illinois at Urbana-Champaign, said that the STIX project had the potential to solve a problem that dates back to the 1400's, when Gutenberg first conceived of movable type.

"The two biggest problems since then for properly rendering intellectual works have been tables and mathematics," Mr. Mischo said. "Here we are in the digital age and we're still having these problems."

Because math equations have been included in Web pages mostly as static images, as either a PDF or a graphics file, scholars have not been able to take advantage of many of the Web's distinctive research capabilities, Mr. Mischo said. For example, a mathematician cannot just plug a particular equation into Google and expect to find other scholars working on a similar problem, since the symbols in a graphic will probably not turn up in a search.

"For someone trying to read a scholarly publication, the current way of doing things presents difficulties," Mr. Mischo said. You can't enlarge, you can't pull it apart and you can't
RE: Is long s a presentation form?
At 08:00 -0600 2002-11-11, [EMAIL PROTECTED] wrote:

On 11/11/2002 05:42:15 AM Marco Cimarosti wrote:

Michael Everson wrote:

I like to think of the long s as similar to the final sigma. Nobody thinks that final sigma should be a presentation form of sigma.

Never say nobody: I *do* think that Greek final sigma, final Hebrew letters, and Latin long s should all be presentation forms.

I agree that Michael's "nobody" is incorrect. I've no opinion on the long s, but for sigma and Hebrew gimel, etc. we have legacy encodings that assume the finals *are* presentation forms.

Are there not minimal pairs in Hebrew where the final form would be expected but isn't used for some reason? There certainly are for final sigma, which is why it is a good thing it is encoded separately.

Equivalencing s and long-s for searching is not worse than equivalencing S and s for the same purpose, is it?

--
Michael Everson * * Everson Typography * * http://www.evertype.com
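On the searching point: Unicode case folding already treats long s and final sigma the way it treats capitals, so a case-insensitive search key equates them without any special code. A minimal sketch using Python's `str.casefold`; the function name `search_key` is only an illustration, and a real search system would likely add normalization as well.

```python
def search_key(text: str) -> str:
    """Key for loose matching: Unicode full case folding."""
    return text.casefold()

# Long s folds to s, just as S does.
print(search_key("Wach\u017ftube") == search_key("WACHSTUBE"))

# Final sigma and capital sigma both fold to medial sigma.
print(search_key("\u03bb\u03bf\u03b3\u03bf\u03c2") == search_key("\u039b\u039f\u0393\u039f\u03a3"))
```

The stored text keeps the distinct characters; only the comparison key neutralises them.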
RE: Entering Plane 1 characters in XP
Concerning display, there are two separate registry settings:

- In Windows 2000 and Windows XP, you can set a registry value to cause Uniscribe to load (Uniscribe is required to display supplementary characters). Alternatively, you could install any of the language packs that require Uniscribe. The only difference between Windows 2000 and Windows XP in this regard is that XP installs Uniscribe for East Asian languages, whereas 2000 installed it only for complex scripts.

- Windows XP added a feature to provide font-linking for supplementary characters if Uniscribe is loaded. There are 16 registry values, each of which designates a font for a plane. Although the mechanism exists, none of the registry values are set in Windows XP. Without this registry value set, you must explicitly select the font which contains the glyphs for the supplementary characters. The registry value for Plane 1 is:
  HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1

Windows 2000 and Windows XP will otherwise treat supplementary characters identically, e.g. sorting by code point order.

John
Global Infrastructure

-----Original Message-----
From: Andrew C. West [mailto:andrewcwest;alumni.princeton.edu]
Sent: Monday, November 11, 2002 9:03 AM
To: [EMAIL PROTECTED]
Subject: Re: Entering Plane 1 characters in XP

On Mon, 11 Nov 2002 08:55:37 -0800 (PST), Tex Texin wrote:

XP requires the registry change as well.

I think the whole Registry thing is a red herring. I've never had to set the registry to see surrogates under Windows 2K or XP. I've even deleted the specified registry keys, and surrogates are still shown OK in IE, Notepad, Word etc.

BTW, any application that uses Uniscribe can display surrogates just fine under Windows 9x as well as 2K and XP.

Andrew
Re: Entering Plane 1 characters in XP
Andrew, it is definitely a requirement for some applications. However, it would not be surprising if applications over time have made themselves independent of the registry entry.

I do know that to view my plane 1 example web page with IE, the registry needed to be set on both win 2k and win xp.
http://www.i18nguy.com/unicode-example-plane1.html

If I get some time later I'll play with unsetting it and see what happens now.

tex

Andrew C. West wrote:

On Mon, 11 Nov 2002 08:55:37 -0800 (PST), Tex Texin wrote:

XP requires the registry change as well.

I think the whole Registry thing is a red herring. I've never had to set the registry to see surrogates under Windows 2K or XP. I've even deleted the specified registry keys, and surrogates are still shown OK in IE, Notepad, Word etc.

BTW, any application that uses Uniscribe can display surrogates just fine under Windows 9x as well as 2K and XP.

Andrew

--
Tex Texin cell: +1 781 789 1898 mailto:Tex;XenCraft.com
Xen Master http://www.i18nGuy.com
XenCraft http://www.XenCraft.com
Making e-Business Work Around the World
Re: Entering Plane 1 characters in XP
John, thanks very much for this. I want to confirm my understanding, and with your permission I'll include your remarks below on my page for supporting surrogates.

1) The possible explanation then for the difference between Andrew and myself with respect to the need for a special registry setting, is that Andrew most likely installed something, perhaps a language pack, that caused Uniscribe to be loaded on his system. He therefore didn't need the setting. I probably didn't install anything that used Uniscribe.

2) The first paragraph describes a registry value that forces Uniscribe to load. I presume that you are referring to the first of these two entries recommended by the kbase. The second seems specific to IE. Is that presumption that this entry causes Uniscribe to be loaded correct?

[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
SURROGATE=(REG_DWORD)0x0002

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
IEFixedFontName=[Surrogate Font Face Name]
IEPropFontName=[Surrogate Font Face Name]

3) For XP only, we can set a font face name that supports surrogates into this registry entry. Doing so will make this font the default for plane 1 characters, if another font is not explicitly designated to be used:
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1
(and by extension for the other planes).

cool. thanks
tex

John McConnell wrote:

Concerning display, there are two separate registry settings:

- In Windows 2000 and Windows XP, you can set a registry value to cause Uniscribe to load (Uniscribe is required to display supplementary characters). Alternatively, you could install any of the language packs that require Uniscribe. The only difference between Windows 2000 and Windows XP in this regard is that XP installs Uniscribe for East Asian languages, whereas 2000 installed it only for complex scripts.

- Windows XP added a feature to provide font-linking for supplementary characters if Uniscribe is loaded. There are 16 registry values, each of which designates a font for a plane. Although the mechanism exists, none of the registry values are set in Windows XP. Without this registry value set, you must explicitly select the font which contains the glyphs for the supplementary characters. The registry value for Plane 1 is:
  HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1

Windows 2000 and Windows XP will otherwise treat supplementary characters identically, e.g. sorting by code point order.

John
Global Infrastructure

--
Tex Texin cell: +1 781 789 1898 mailto:Tex;XenCraft.com
Xen Master http://www.i18nGuy.com
XenCraft http://www.XenCraft.com
Making e-Business Work Around the World
RE: Is long s a presentation form?
On 11/11/2002 11:12:55 AM Michael Everson wrote:

Are there not minimal pairs in Hebrew where the final form would be expected but isn't used for some reason? There certainly is for final sigma, which is why it is a good thing it is encoded separately.

I agree that there are valid reasons for encoding these as distinct characters. When we did our implementations for Biblical Greek and Hebrew several years ago, we weren't aware of those reasons; for the texts we were concerned with it seemed quite appropriate to assume that there is only one sigma character. My objection wasn't to there being two sigmas but to the claim that nobody considers them to be representable by a single character.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Speaking of Plane 1 characters...
One of the tools I use for building fonts requires that codepoints for Plane 1 characters be expressed as surrogate pairs, rather than as scalar values. I'm hoping this will change on the next release, since the scalar values are a lot easier to work with, but in the meantime I need to figure out the easiest way to find the correct surrogate pair values for any given scalar value. Is there a comprehensive list somewhere, or an easy algorithm (easy for a non-programmer)? How about a web-based form, into which someone could enter scalar values and receive back surrogate pairs?

John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC [EMAIL PROTECTED]

It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
RE: Entering Plane 1 characters in XP
I'll have somebody a bit more familiar with IE registry usage review that part, but the rest looks good. Thanks.

John
Global Infrastructure

-----Original Message-----
From: Tex Texin [mailto:tex;i18nguy.com]
Sent: Monday, November 11, 2002 10:41 AM
To: John McConnell
Cc: Andrew C. West; [EMAIL PROTECTED]
Subject: Re: Entering Plane 1 characters in XP

John, thanks very much for this. I want to confirm my understanding, and with your permission I'll include your remarks below on my page for supporting surrogates.

1) The possible explanation then for the difference between Andrew and myself with respect to the need for a special registry setting, is that Andrew most likely installed something, perhaps a language pack, that caused Uniscribe to be loaded on his system. He therefore didn't need the setting. I probably didn't install anything that used Uniscribe.

2) The first paragraph describes a registry value that forces Uniscribe to load. I presume that you are referring to the first of these two entries recommended by the kbase. The second seems specific to IE. Is that presumption that this entry causes Uniscribe to be loaded correct?

[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
SURROGATE=(REG_DWORD)0x0002

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
IEFixedFontName=[Surrogate Font Face Name]
IEPropFontName=[Surrogate Font Face Name]

3) For XP only, we can set a font face name that supports surrogates into this registry entry. Doing so will make this font the default for plane 1 characters, if another font is not explicitly designated to be used:
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1
(and by extension for the other planes).

cool. thanks
tex
Re: Speaking of Plane 1 characters...
John Hudson scripsit:

One of the tools I use for building fonts requires that codepoints for Plane 1 characters be expressed as surrogate pairs, rather than as scalar values. I'm hoping this will change on the next release, since the scalar values are a lot easier to work with, but in the meantime I need to figure out the easiest way to find the correct surrogate pair values for any given scalar value.

If you have access to any Windows box, you can use the Windows Calculator (Start/Programs/Accessories/Calculator). Choose View/Scientific and click on the Hex radio button. Then enter your 5-digit Unicode scalar value. (You must type hex digits in lower case.)

To get the high surrogate, type:
  - 1 0 0 0 0 = / 4 0 0 + d 8 0 0 =
To get the low surrogate, enter the scalar value again and type:
  - 1 0 0 0 0 = % 4 0 0 + d c 0 0 =
You can also use the mouse, in which case % above represents the MOD key.

On *ix systems, use the bc command; type obase=16 and then ibase=16 (in that order). For this program, you must use capital letters for the hex digits. To get the high surrogate, type (x-10000)/400+D800 (x is the scalar value); to get the low surrogate, type (x-10000)%400+DC00.

On the Macintosh, I have no clue.

-- John Cowan [EMAIL PROTECTED]
You need a change: try Canada
You need a change: try China
--fortune cookies opened by a couple that I know
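The calculator recipe amounts to: subtract 0x10000, then take the quotient and the remainder on division by 0x400. A minimal sketch of that arithmetic in both directions, with hypothetical function names:

```python
def to_surrogates(scalar: int) -> tuple[int, int]:
    """Split a supplementary-plane scalar value into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogate pairs")
    offset = scalar - 0x10000
    high = 0xD800 + offset // 0x400  # top 10 bits of the offset
    low = 0xDC00 + offset % 0x400    # bottom 10 bits of the offset
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)

high, low = to_surrogates(0x10312)  # OLD ITALIC LETTER KU
print(f"{high:04X} {low:04X}")
```

For a font tool that wants pairs, it should be enough to run each scalar value through the first function and paste the two four-digit values.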
Re: Speaking of Plane 1 characters...
Many thanks to the various people who recommended Michael Kaplan's calculator at http://trigeminal.com/16to32AndBack.asp This is excellent and solves my problem. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: Speaking of Plane 1 characters...
On the Macintosh, I have no clue. On Mac OS X, the Character Palette or the add-on UnicodeChecker will give the surrogates for any given codepoint. For a web page that calculates both ways, see http://www.trigeminal.com/16to32AndBack.asp
Re: Speaking of Plane 1 characters...
At 13:55 -0700 2002-11-11, Tom Gewecke wrote: On the Macintosh, I have no clue. On Mac OS X, the Character Palette or the add-on UnicodeChecker will give the surrogates for any given codepoint. If you can get it to work. It still breaks for me so constantly I don't even try to use it. :-( -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Speaking of Plane 1 characters...
At 13:11 -0800 2002-11-11, Michael (michka) Kaplan wrote:

Perhaps it is just me, but terms like scalar value just don't mean anything to me. It rather reminds me of reptilian skin shedding.

Since I do not use that term on my site, I assume you are referring to someone else's resource? :-)

It was related to this thread but in a previous post. Nevertheless a little gentle user-friendliness on your page would help me to use it more easily. Just a teensy tutorialette and a weensy example at the top? A little hand-holding?

I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER KU) and it did convert to a surrogate pair. I wonder what would happen if I pasted it into an HTML document. Hmm but I couldn't do that until I converted them to UTF-8

Well, since the page advertises itself as a UTF-16/UTF-32 sort of converter, I would hope that the lack of UTF-8 byte conversion would be expected.

Gee, what I really need is a UTF-8/UTF-16/UTF-32 sort of converter that handles surrogates ;-) There isn't such a thing and there ought to be. :-)

By the way MichKa if you make the boxes a bit wider the whole string of numbers would display.

What numbers did not display for you? They all fit for me

The surrogate pair shows three digits and a tiny little popup triangle to tell you that there's a fourth digit. If you need to I can send you a screenshot.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
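A converter of the kind wished for here is short to sketch with a language's built-in codecs. The following is an illustration only, with a hypothetical function name; it prints UTF-8 as bytes, UTF-16 as 16-bit code units (so a surrogate pair for anything above U+FFFF), and UTF-32 as one 32-bit unit.

```python
def encoding_forms(code_point: int) -> dict[str, str]:
    """Hex representation of one code point in UTF-8, UTF-16 and UTF-32."""
    ch = chr(code_point)
    utf8 = " ".join(f"{byte:02X}" for byte in ch.encode("utf-8"))
    raw16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    utf16 = " ".join(
        f"{int.from_bytes(raw16[i:i + 2], 'big'):04X}"
        for i in range(0, len(raw16), 2)
    )
    utf32 = f"{code_point:08X}"
    return {"UTF-8": utf8, "UTF-16": utf16, "UTF-32": utf32}

for name, value in encoding_forms(0x10312).items():  # OLD ITALIC LETTER KU
    print(f"{name}: {value}")
```

Passing a lone surrogate such as 0xD800 raises an error in the UTF-8 step, which is the right outcome, since surrogates are not characters in their own right.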
Re: Speaking of Plane 1 characters...
From: Michael Everson [EMAIL PROTECTED]

At 12:10 -0700 2002-11-11, John Hudson wrote:

Many thanks to the various people who recommended Michael Kaplan's calculator at http://trigeminal.com/16to32AndBack.asp This is excellent and solves my problem.

Glad you like it, John -- I am sure James Kass remembers when I put it up, it was actually because of a complaint that there wasn't such a thing and there ought to be. grin

Perhaps it is just me, but terms like scalar value just don't mean anything to me. It rather reminds me of reptilian skin shedding.

Since I do not use that term on my site, I assume you are referring to someone else's resource? :-)

I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER KU) and it did convert to a surrogate pair. I wonder what would happen if I pasted it into an HTML document. Hmm but I couldn't do that until I converted them to UTF-8

Well, since the page advertises itself as a UTF-16/UTF-32 sort of converter, I would hope that the lack of UTF-8 byte conversion would be expected.

By the way MichKa if you make the boxes a bit wider the whole string of numbers would display.

What numbers did not display for you? They all fit for me

MichKa
Re: Speaking of Plane 1 characters...
At 13:50 11/11/2002, Michael Everson wrote: By the way MichKa if you make the boxes a bit wider the whole string of numbers would display. I noticed the same problem in Opera. It's okay in IE. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] It is necessary that by all means and cunning, the cursed owners of books should be persuaded to make them available to us, either by argument or by force. - Michael Apostolis, 1467
Re: Speaking of Plane 1 characters...
Michael Everson scripsit: Perhaps it is just me, but terms like "scalar value" just don't mean anything to me. It rather reminds me of reptilian skin shedding. The scale in question is analogous to a temperature scale, not a reptilian one. I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER KU) and it did convert to a surrogate pair. I wonder what would happen if I pasted it into an HTML document. Hmm, but I couldn't do that until I converted them to UTF-8. The Right Thing in HTML terms is to say &#x10312; and *not* use the surrogate pair representation. -- Deshil Holles eamus. Deshil Holles eamus. Deshil Holles eamus. Send us, bright one, light one, Horhorn, quickening, and wombfruit. (3x) Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! -- Joyce, _Ulysses_, Oxen of the Sun [EMAIL PROTECTED]
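[Editorial sketch, not part of the original message.] The point about HTML can be illustrated in a few lines of Python: a numeric character reference is built straight from the scalar value, with no surrogate arithmetic at all. The helper name ncr is made up for this example; html.unescape is the Python standard-library function used here only to check the round trip.

```python
import html


def ncr(code_point: int) -> str:
    """Return an HTML hexadecimal numeric character reference.

    Hypothetical helper for illustration; rejects anything that is
    not a Unicode scalar value (out of range, or a surrogate).
    """
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %X" % code_point)
    return f"&#x{code_point:X};"


ref = ncr(0x10312)                          # OLD ITALIC LETTER KU
print(ref)                                  # &#x10312;
print(html.unescape(ref) == chr(0x10312))   # True
```

The reference is pure ASCII, so it survives whatever encoding the HTML file itself is saved in.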
Re: Speaking of Plane 1 characters...
At 13:18 11/11/2002 -0700, John Hudson wrote: At 13:50 11/11/2002, Michael Everson wrote: By the way MichKa if you make the boxes a bit wider the whole string of numbers would display. I noticed the same problem in Opera. It's okay in IE. That's the default font size mismatch - IE do things differently (they would!). In Mozilla and Phoenix do they fit? John
Re: Speaking of Plane 1 characters...
At 13:20 -0800 2002-11-11, Mark Davis wrote: If you look at http://www.macchiato.com/ under Unicode Charts, you can type in the code point (scalar value) for a character, then Enter, and you will get a chart. The UTF-8, 16, and 32 numbers are given in the chart for each value. Why do you call it a scalar value if it is really a code point? I thought it was bad enough that Unicode calls it "code point" while 10646 calls it "code position". For the Terminology Police, -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Speaking of Plane 1 characters...
From: John Hudson [EMAIL PROTECTED] At 13:50 11/11/2002, Michael Everson wrote: By the way MichKa if you make the boxes a bit wider the whole string of numbers would display. I noticed the same problem in Opera. It's okay in IE. Ah, if I called *that* by design, someone might accuse me of global conspiracy. :-) Never mind, it wasn't that funny. I went ahead and updated the page; it should work well in Opera Compatibility mode. <g,d&r> Michael, in answer to your request for a UTF-8 converter, that will have to be another day (it's a bit more complicated, and I spend most of my time in UTF-16 and UTF-32 so I can't really pretend it's work related). If you wanted to provide the code in VBScript or JScript I will add it to the page (and give you credit, of course). MichKa
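[Editorial sketch, not part of the original message.] To give a feel for what "a bit more complicated" means, here is a minimal UTF-8 encoder for a single scalar value. It is written in Python rather than the VBScript or JScript the page would need, it is not MichKa's code, and the function name utf8_bytes is invented for the example. The bit layout follows the standard UTF-8 scheme of one to four bytes.

```python
def utf8_bytes(scalar: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative sketch)."""
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %X" % scalar)
    if scalar < 0x80:                      # 1 byte:  0xxxxxxx
        return bytes([scalar])
    if scalar < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6),
                      0x80 | (scalar & 0x3F)])
    if scalar < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    return bytes([0xF0 | (scalar >> 18),   # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((scalar >> 12) & 0x3F),
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])


# OLD ITALIC LETTER KU, the character discussed in this thread
print(utf8_bytes(0x10312).hex(" ").upper())   # F0 90 8C 92
```

Note that the encoder works from the scalar value, never from the surrogate pair; encoding each surrogate separately would produce invalid UTF-8.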
Re: Speaking of Plane 1 characters...
At 13:34 -0800 2002-11-11, Michael (michka) Kaplan wrote: Michael, in answer to your request for a UTF-8 converter, that will have to be another day (it's a bit more complicated, and I spend most of my time in UTF-16 and UTF-32 so I can't really pretend it's work related). If you wanted to provide the code in VBScript or JScript I will add it to the page (and give you credit, of course). Sir, you mistake me for a programmer! :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Speaking of Plane 1 characters...
Michael Everson scripsit: The scale in question is analogous to a temperature scale, not a reptilian one. Now I very *seriously* don't get it. A temperature scale enumerates the degrees -273, -272, -271, ..., 0, 1, 2, ... in order. When you ask "What is the temperature?", you are actually asking "What is the scalar value of the temperature?" The Unicode scale enumerates the characters 0, 1, 2, ... 10FFFF. Unicode scalar values are points on this scale, just as temperature scalar values are points on the (Celsius) temperature scale. -- Winter: MIT, John Cowan Keio, INRIA,[EMAIL PROTECTED] Issue lots of Drafts. http://www.ccil.org/~cowan So much more to understand! http://www.reutershealth.com Might simplicity return?(A tanka, or extended haiku)
Speaking plane1-ly
I have modified my Windows settings for surrogates page to include the new information. Consider it a draft for a day or two. I would be grateful for any constructive review comments and the usual comical abuse. The page is at: http://www.i18nguy.com/surrogates.html I don't have any more time today, but if I had recommendations for (lists of) IMEs and Fonts that support planes other than the BMP, it might be nice to have a collection point and web page for them. Much thanks to John McConnell for the clarifications and new info. Hmmm. I just reviewed Andrew's comment that he can get support for surrogates via Uniscribe on Windows 9x. I guess I have to think about extending this to include those systems. I guess if I get confirmation (or disconfirmation) from John or other Microsofties I will update the page accordingly. tex -- - Tex Texin cell: +1 781 789 1898 mailto:Tex;XenCraft.com Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: Speaking of Plane 1 characters...
According to the new 4.0 definitions: - code points go from 0..10FFFF, inclusive - scalar value == non-surrogate code point, so they are simply a restriction of code points to the ranges 0..D7FF, E000..10FFFF. Since surrogate code points can never represent characters, for a given character you can refer to its code point or to its scalar value; in that circumstance there is no effective difference in the terms. Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, November 11, 2002 13:37 Subject: Re: Speaking of Plane 1 characters... At 13:20 -0800 2002-11-11, Mark Davis wrote: If you look at http://www.macchiato.com/ under Unicode Charts, you can type in the code point (scalar value) for a character, then Enter, and you will get a chart. The UTF-8, 16, and 32 numbers are given in the chart for each value. Why do you call it a scalar value if it is really a code point? I thought it was bad enough that Unicode calls it "code point" while 10646 calls it "code position". For the Terminology Police, -- Michael Everson * * Everson Typography * * http://www.evertype.com
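[Editorial sketch, not part of the original message.] The two definitions above translate directly into range checks. The following Python is only an illustration of those ranges; the function names are invented here and do not come from any Unicode library.

```python
def is_code_point(n: int) -> bool:
    """Code points run from 0 to 10FFFF inclusive."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    """Surrogate code points occupy D800..DFFF."""
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """Scalar value == non-surrogate code point: 0..D7FF and E000..10FFFF."""
    return is_code_point(n) and not is_surrogate(n)


print(is_code_point(0xD800), is_scalar_value(0xD800))      # True False
print(is_code_point(0x10312), is_scalar_value(0x10312))    # True True
print(is_code_point(0x110000), is_scalar_value(0x110000))  # False False
```

For any value that actually denotes a character, both predicates agree, which is why the two terms are interchangeable in that context.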
Re: Speaking of Plane 1 characters...
At 05:47 PM 11/11/2002 -0500, John Cowan wrote: Michael Everson scripsit: The scale in question is analogous to a temperature scale, not a reptilian one. Now I very *seriously* don't get it. A temperature scale enumerates the degrees -273, -272, -271, ..., 0, 1, 2, ... in order. When you ask "What is the temperature?", you are actually asking "What is the scalar value of the temperature?" The Unicode scale enumerates the characters 0, 1, 2, ... 10FFFF. Unicode scalar values are points on this scale, just as temperature scalar values are points on the (Celsius) temperature scale. Well, not exactly... temperature is an arbitrary but standard measure of a continuous physical property. The multiple well-known scales attest to that. But code points are absolute points, not continuous. And the fact that one character has a greater encoding value does not make it greater than another in any useful sense. Basically, we are talking about continuous ordinal scales vs discrete cardinal scales. Hardly analogous at all IMM. Barry Caplan www.i18n.com
Re: Speaking of Plane 1 characters...
On Mon, 11 Nov 2002, John Cowan wrote: On *ix systems, use the bc command; type "obase=16" and "ibase=16". Thank you for this. I should have read the man page of bc more carefully. (or I used to know it but forgot...) For this program, you must use capital letters for the hex digits. To get the high surrogate, type (x-10000)/400+DC00 for the high s/DC00/D800/ surrogate (x is the scalar value); to get the low surrogate, type (x-10000)%400+DC00. And one can define a function On the Macintosh, I have no clue. As you know so well, MacOS X is a Unix and 'bc' should be available there, too. If not by default, one can certainly grab the source and compile it or get a precompiled binary somewhere. It seems to me a waste of bandwidth (however abundant it may have become recently. I heard several times on this list that it's not in a certain country in Europe ;-) ) to go all the way across the Atlantic or the continent to convert between UCVs and surrogate pairs. There are several ways to do it locally including two suggested above. On *nix including MacOS X (http://developer.apple.com/internet/macosx/perl.html), one can open up a small terminal window (yes, Mac OS X has a terminal window !) and run a script like the following (assuming Perl is installed. If GUI is desired, make one up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk?...). This should also work in a command prompt of Windows. Alternatively, I guess a local html file with ECMAscript should also work.
--- Cut here ---
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl
while ( 1 ) {
    print "** Enter Unicode code point in hexadecimal \n" .
          "   (to end, press [enter]) : ";
    $| = 1;                        # force a flush after our print
    $ucs = <STDIN>;
    last unless defined $ucs;      # stop cleanly at end of input
    chomp $ucs;
    last if $ucs eq "";
    if ( $ucs =~ /[^a-f0-9A-F]/ ) {
        printf "Error: %s is invalid. Try again\n", $ucs;
        next;
    }
    $usv = hex $ucs;
    if ( 0xffff < $usv && $usv < 0x110000 ) {
        # supplementary planes: split into high and low surrogates
        printf "UTF-16: %04x %04x\n",
               int(($usv - 0x10000) / 0x400) + 0xd800,
               ($usv - 0x10000) % 0x400 + 0xdc00;
    } elsif ( $usv < 0xd800 || 0xdfff < $usv && $usv < 0x10000 ) {
        # BMP, excluding the surrogate range: one 16-bit code unit
        printf "UTF-16: %04x\n", $usv;
    } else {
        printf "Your input %s is not valid. Try again\n", $ucs;
    }
}
print "Bye !!\n";
--- Cut here ---
Jungshik
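[Editorial sketch, not part of the original message.] The same arithmetic as the bc expressions and the Perl script above, in Python, together with the reverse direction (surrogate pair back to scalar value), which the script does not cover. The function names to_surrogates and from_surrogates are invented for this example.

```python
def to_surrogates(scalar: int) -> tuple[int, int]:
    """Split a supplementary-plane scalar value into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("needs a value in 10000..10FFFF, got %X" % scalar)
    offset = scalar - 0x10000              # 20-bit offset
    high = 0xD800 + (offset >> 10)         # top 10 bits, same as offset / 0x400
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits, same as offset % 0x400
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair: %X %X" % (high, low))
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10312)         # OLD ITALIC LETTER KU
print("%04X %04X" % (high, low))           # D800 DF12
print("%X" % from_surrogates(high, low))   # 10312
```

The result can be cross-checked against a built-in codec: chr(0x10312).encode("utf-16-be") gives the bytes D8 00 DF 12.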
Info: Apple OSX Font Tools Suite 1.0.0 Released
Cupertino 11/8/02: Today the Apple Font Group released its new suite of Unix command line font tools for OSX. These can be downloaded free from http://developer.apple.com/fonts/. The automatically installed 4.8 Mb package includes the tools, user documentation, and a 60-page tutorial. To use this package, you need to be running OSX 10.2. Everything is automatically configured by the installer. You just add fonts to taste. Working with text sources for many of the tables in an sfnt font structure is a powerful and efficient way to develop, debug and manage font sources. E.g. use ftxdumperfuser to solve cmap and postname glitches once and for all in .ttf, .otf and CFF format fonts. With this release, Apple has converted its text dump formats to XML and will be continuing to refine the XML formats in future releases. No previous experience of Unix is necessary as the 60-page tutorial takes you step-by-step through useful font editing processes with an accompanying set of ready-worked live demo files. Applications in The Font Tool Suite are: * ftxanalyzer * ftxdiff * ftxdumperfuser * ftxenhancer * ftxinstalledfonts * ftxruler * ftxvalidator Documents included: * The Apple Font Tool Suite Manual (51 pages) * Tool Quick Reference (8 pages) * Tutorial (62 pages) * Tutorial Command Summary (8 pages) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Speaking of Plane 1 characters...
Michael (michka) Kaplan wrote: Michael, in answer to your request for a UTF-8 converter, that will have to be another day (it's a bit more complicated, and I spend most of my time in UTF-16 and UTF-32 so I can't really pretend it's work related). If you wanted to provide the code in VBScript or JScript I will add it to the page (and give you credit, of course). Mark has it all in his UTF Converter and Charts at http://www.macchiato.com/unicode/convert.html markus