RE: Is long s a presentation form?

2002-11-11 Thread Marco Cimarosti
Michael Everson wrote:
 I like to think of the long s as similar to the final sigma. Nobody 
 thinks that final sigma should be a presentation form of sigma.

Never say nobody: I *do* think that Greek final sigma, final Hebrew
letters, and Latin long s should all be presentation forms. I think that
they are encoded as separate characters only because of compatibility with
pre-existing standards such as ISO 8859.

Occasional exceptions to the general distributional rules of these
presentation forms would not have been a valid reason to encode them as
separate characters. Similar exceptions also occur in Indic and Arabic
scripts (e.g., the Arabic abbreviation for plural is a jiim in initial
form). These cases can be supported in plain text using ZWJ and ZWNJ:

Wachstube = German for guard room;
WachsZWNJtube = German for wax tube.
jiimZWJ = Arabic for plural;
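
The ZWNJ idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `shape_long_s` is a hypothetical name, and the rule it applies (a lowercase s is shown as long s whenever another letter follows it, unless a ZWNJ intervenes) is a deliberate simplification of real long-s orthography, not a description of any existing font or renderer.

```python
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def shape_long_s(text: str) -> str:
    """Simulate a hypothetical display-time rule for long s.

    A lowercase 's' directly followed by a letter is shown as long s.
    An 's' at the end of a word, or followed by ZWNJ, stays round.
    ZWNJ itself is invisible, so it is dropped from the displayed string.
    """
    shown = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "s" and nxt.isalpha():
            shown.append(LONG_S)
        elif ch != ZWNJ:
            shown.append(ch)
    return "".join(shown)


print(shape_long_s("Wachstube"))          # guard room: long s before t
print(shape_long_s("Wachs" + ZWNJ + "tube"))  # wax tube: round s kept
```

The point of the sketch is that the stored text needs only one s character; the ZWNJ carries the exception.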

 Nobody really uses long s in modern Roman typography, and it's a lot 
 more convenient to have this as a separate character for the 
 nonce-uses that it has than to expect font designers round the world 
 to add special shaping tables to all their fonts just for this 
 critter.

Why all their fonts? Only a few fonts designed for special purposes need
to have the long/short s distinction.

_ Marco




Re: Plane 1 maths fraktur in textual apparatus?

2002-11-11 Thread Peter_Constable

I've been pondering the very same issue as John, though with a little less
focused attention.


On 11/09/2002 11:57:18 AM jameskass wrote:

In the case you have offered, since these Fraktur letters are
used as variables (indicating sourcing in BHS), it shouldn't be
considered abuse, IMHO.

The use of Fraktur in Greek and Hebrew apparatus is not as variables, which
denote some particular attribute but have no specific value; they are
symbols with specific meaning, more comparable to letters denoting units of
measure. But, the Fraktur-ness is essential in their interpretation.

Options:

1. Use a symbol font / PUA for all apparatus and text-annotation symbols
(e.g. some texts use angle brackets that look like |_ and  _| but are
positioned in the lower corners of the em square). Cons: involves PUA
codepoints, and interchange requires prior agreement -- would really need
to seek agreement throughout Biblical studies community.

2. Use regular Latin letters and a Fraktur face. Cons: need multiple fonts
to work with Biblical texts (but may be true regardless), and plain-text
interchange not possible.

3. Use regular Latin letters; provide a single font with Fraktur glyphs as
alternates. Cons: usefulness limited to certain software only, and
plain-text interchange not possible.

4. Use Fraktur math symbols. Cons: I can't think of any, though we'd still
want to promote consensus among the Biblical studies community on using
this.


I think I could readily go along with John's suggestion (i.e. option 4).



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]











Lunate, Terminal, and Medial Sigma

2002-11-11 Thread P. T. Rourke


“φιλοσ.,” is necessarily the abbreviation of some word (like
   φιλοσοφία) while “φιλος.” is a single non-abbreviated word, followed
   by a sentence period.


This is the compelling argument, which Nick made in his note on sigma as 
well, and which I had forgotten.  So while I have to admit that this 
argument is compelling, I do still think that lunate sigma (which is the 
same glyph for both word positions) is going to cause real problems with 
normalization.  It is not those who follow the Unicode specifications 
who worry me, it's those who do not.  Thanks to Jim Allen and as often 
to Nick.

For the details of the process which is described in a somewhat 
simplified way by Katerina Sarri, one can look to the examples in 
*Thompson*, An Introduction to Greek and Latin Palaeography.

PTR







RE: Is long s a presentation form?

2002-11-11 Thread Peter_Constable

On 11/11/2002 05:42:15 AM Marco Cimarosti wrote:

Michael Everson wrote:
 I like to think of the long s as similar to the final sigma. Nobody
 thinks that final sigma should be a presentation form of sigma.

Never say nobody: I *do* think that Greek final sigma, final Hebrew
letters, and Latin long s should all be presentation forms.

I agree that Michael's "nobody" is incorrect. I've no opinion on the long
s, but for sigma and Hebrew gimel, etc. we have legacy encodings that
assume the finals *are* presentation forms. It means that, whereas we have
a ton of custom encodings with presentation forms, for which we neutralise
distinctions when going to Unicode but need context-sensitive rules coming
back, in the case of these Greek and Hebrew encodings we need to neutralise
distinctions going from Unicode to legacy, but need context-sensitive rules
going from legacy to Unicode. It is what it is, though, and we're not
suggesting any need to change.
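
A minimal sketch of the two directions for Greek sigma, in Python. It assumes, purely for illustration, that the "legacy" text is modelled as a string containing only the non-final sigma; the function names are hypothetical and the end-of-word test is naive.

```python
import re

SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA


def unicode_to_legacy(text: str) -> str:
    """Neutralise the distinction: every final sigma becomes plain sigma."""
    return text.replace(FINAL_SIGMA, SIGMA)


def legacy_to_unicode(text: str) -> str:
    """Context-sensitive rule: a sigma not followed by a word character
    is taken to be word-final and becomes final sigma."""
    return re.sub(SIGMA + r"(?!\w)", FINAL_SIGMA, text)


legacy = unicode_to_legacy("σοφος λογος")
print(legacy)                     # only one kind of sigma remains
print(legacy_to_unicode(legacy))  # finals restored by context
```

Note that a rule this simple would also turn an abbreviation written with a non-final sigma before a period into a final sigma, which is the kind of exception raised elsewhere in this thread.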



- Peter









Re: Plane 1 maths fraktur in textual apparatus?

2002-11-11 Thread Michael Everson
At 07:49 -0600 2002-11-11, [EMAIL PROTECTED] wrote:


4. Use Fraktur math symbols. Cons: I can't think of any, though we'd still
want to promote consensus among the Biblical studies community on using
this.


They are used as symbols, not as letters of words, in Biblical 
studies texts, so why not?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Question: the german umlaut

2002-11-11 Thread Dominikus Scherkl
 I just wanted to know how much space in bytes the Latin-1 
 characters such as the german umlaut characters take up in 
 UTF-8 encoding. Is it still just one byte or does it now 
 require 2 bytes?
U+0000 up to U+007F take 1 byte (ASCII)
U+0080 up to U+07FF take 2 bytes (Latin-1, Latin extended,
combining diacritics, phonetics, Greek, Cyrillic, Hebrew,
Arabic, Syriac, and some more scripts - this is very little
expansion, especially for languages which use only a few non-ASCII
characters, like Swedish or German, but expensive for Greek or
Arabic)
U+0800 up to U+FFFD take 3 bytes (Hangul, CJK... not too
expensive but significant)
U+10000 up to U+10FFFD take 4 bytes (this is all the rest -
these take 4 bytes in almost every encoding, so this is no
significant expansion).
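
The ranges above can be checked with a short Python sketch. The function name `utf8_length` is made up for this example; the comparison uses Python's built-in UTF-8 codec.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2  # Latin-1 letters such as the German umlauts land here
    if code_point <= 0xFFFF:
        return 3  # rest of the BMP
    return 4      # supplementary planes


# Compare the table with what the codec actually produces.
for ch in "A\u00e4\u03c3\ud55c\U0001d504":
    print(f"U+{ord(ch):04X}", utf8_length(ord(ch)), len(ch.encode("utf-8")))
```

So a German umlaut such as U+00E4 takes two bytes in UTF-8, not one.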

If space is a concern, use SCSU - this is shorter and has the
additional advantage of being much more compressible
by zip or comparable algorithms.
-- 
Dominikus Scherkl
[EMAIL PROTECTED]




Entering Plane 1 characters in XP

2002-11-11 Thread David J. Perry
In Windows 2000 it was necessary to adjust a registry entry to enable
support for surrogates, which were disabled by default.  What's the
situation with XP?  I looked on the Microsoft developers web site but it
seems to be the same information as I saw when I was dealing with
Win2000 with no updates.  (One of the pages references Unicode 2.0 ...)

I did some tests and found that I can get characters outside the BMP in
WordPad under XP and in Word XP by typing the Unicode scalar value
followed by Alt-x; I don't recall ever changing any registry settings,
but it has been a while since I upgraded from Win2000 to XP.

So am I correct in saying that, under XP, 1) there is no need to change
the registry and 2) the Win2000 method of typing two surrogates has been
replaced by typing the single scalar value plus Alt-x?

Thanks - David






Re: Entering Plane 1 characters in XP

2002-11-11 Thread Tex Texin
David,

XP requires the registry change as well.
http://www.i18nguy.com/surrogates.html

I haven't played with the alt-n for surrogates so can't help with
that.

tex



-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:Tex;XenCraft.com
Xen Master  http://www.i18nGuy.com
 
XenCraft            http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Entering Plane 1 characters in XP

2002-11-11 Thread Andrew C. West
On Mon, 11 Nov 2002 08:55:37 -0800 (PST), Tex Texin wrote:

 
 XP requires the registry change as well.

I think the whole Registry thing is a red herring. I've never had to set the
registry to see surrogates under Windows 2K or XP. I've even deleted the
specified registry keys, and surrogates are still shown OK in IE, Notepad, Word
etc.

BTW, any application that uses Uniscribe can display surrogates just fine under
Windows 9x as well as 2K and XP.

Andrew




Scientific typographic characters

2002-11-11 Thread Michael Everson
From the NY Times
http://www.nytimes.com/2002/11/07/technology/circuits/07next.html?8cir
-

WHAT'S NEXT

The Noah's Ark of the Web, 7,000 Characters at a Time
By JEFFREY SELINGO

IT'S one of the most frustrating problems encountered when passing
documents back and forth electronically: the little square boxes that
mean a font someone else used to create the file cannot be rendered
on your computer. While Portable Document Format, or PDF, files,
which essentially are copies of printed pages, have helped mitigate
the problem for most computer users, that solution has not satisfied
scientists and mathematicians, whose formulas and equations contain
many symbols.


Using those symbols on the Web has been particularly inconvenient.
Most publishers use the symbol-friendly PDF format, but then
researchers cannot easily embed links to other files or background
information within those documents as they can with HTML files. But
HTML documents have their own drawbacks. For instance, they often
display equations as separate graphic images that cannot be resized
or searched and greatly increase the size of the file.

Now a new set of fonts being developed by six publishers of
scientific, technical and medical journals promises to contain every
character - more than 7,000 in all - that might be needed in a
technical article published in any scientific discipline. When
complete, sometime next fall, the fonts will be shared freely with
publishers, software manufacturers and scholars, under the condition
that they not be altered.

"This work is a breakthrough for publishers and scientists," said Tim
Ingoldsby, director of business development at the American Institute
of Physics, one of the publishers working on the project, called the
Scientific and Technical Information Exchange, or STIX
(www.stixfonts.com). "The display of math symbols in publishing has
always been difficult, but those problems have only become worse with
the Web."

The set of STIX fonts will work very much like the Symbol or Zapf
Dingbats fonts in most applications, where users choose from a grid
of dozens of characters. The STIX font will have the appearance of a
Times font, but the characters will not look any different if a user
switches to a different font, like Courier or Helvetica, Mr.
Ingoldsby said. "The symbols will work with pretty much any font," he
said.

Mr. Ingoldsby said most scientific characters lack flavor - they
are quite plain to look at - so adding one of those symbols to a
document composed using, for instance, a serif font, which has fine
lines projecting from the main strokes of the letter, will not make
the scientific character stand out. Designers are also adding the
alphabet, numbers and other common characters to the STIX font, so,
Mr. Ingoldsby said, there will be no need to switch between fonts.

"This is meant to replace the font which people use today called New
Times Roman," he said.

About 200 characters of the STIX fonts are being finished each month,
Mr. Ingoldsby said. So far, about half of the 7,000 characters have
been completed.

With so many symbols, however, the STIX fonts could be cumbersome to
use. The developers are working to come up with a method that will
make it relatively easy for users to find the symbols they want.
Symbols will probably be organized by type or subject, with the user
selecting a category (and possibly a subcategory) from drop-down
menus. A grid of symbols in that category will then appear, from
which the user can choose the appropriate one.

Creating a new font set is a complicated process. First, developers
must correctly copy the shape of each character. Then they must
adjust its metrics, or how the character is positioned in the space
in which it is supposed to fit. And finally, they must make another
set of adjustments to be sure the character looks good on a computer
screen.

William H. Mischo, head of the Grainger Engineering Library
Information Center at the University of Illinois at Urbana-Champaign,
said that the STIX project had the potential to solve a problem that
dates back to the 1400's, when Gutenberg first conceived of movable
type.

"The two biggest problems since then for properly rendering
intellectual works have been tables and mathematics," Mr. Mischo
said. "Here we are in the digital age and we're still having these
problems."

Because math equations have been included in Web pages mostly as
static images, as either a PDF or a graphics file, scholars have not
been able to take advantage of many of the Web's distinctive research
capabilities, Mr. Mischo said. For example, a mathematician cannot
just plug a particular equation into Google and expect to find other
scholars working on a similar problem, since the symbols in a graphic
will probably not turn up in a search.

For someone trying to read a scholarly publication, the current way
of doing things presents difficulties, Mr. Mischo said. You can't
enlarge, you can't pull it apart and you can't 

RE: Is long s a presentation form?

2002-11-11 Thread Michael Everson
At 08:00 -0600 2002-11-11, [EMAIL PROTECTED] wrote:

On 11/11/2002 05:42:15 AM Marco Cimarosti wrote:


Michael Everson wrote:

 I like to think of the long s as similar to the final sigma. Nobody
 thinks that final sigma should be a presentation form of sigma.


Never say nobody: I *do* think that Greek final sigma, final Hebrew
letters, and Latin long s should all be presentation forms.


I agree that Michael's nobody is incorrect. I've no opinion on the long
s, but for sigma and Hebrew gimel, etc. we have legacy encodings that
assume the finals *are* presentation forms.


Are there not minimal pairs in Hebrew where the final form would be 
expected but isn't used for some reason? There certainly is for final 
sigma, which is why it is a good thing it is encoded separately.

Equivalencing s and long-s for searching is not worse than 
equivalencing S and s for the same purpose, is it?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Entering Plane 1 characters in XP

2002-11-11 Thread John McConnell
Concerning display, there are two separate registry settings:
- in Windows 2000 and Windows XP, you can set a registry value to cause 
Uniscribe to load (Uniscribe is required to display supplementary characters). 
Alternatively, you could install any of the language packs that require Uniscribe. The 
only difference between Windows 2000 and Windows XP in this regard is that XP installs 
Uniscribe for East Asian languages, whereas 2000 installed it only for complex scripts.
- Windows XP added a feature to provide font-linking for supplementary 
characters if Uniscribe is loaded. There are 16 registry values, each of which 
designates a font for a plane. Although the mechanism exists, none of the registry 
values are set in Windows XP. Without this registry value set, you must explicitly 
select the font which contains the glyphs for the supplementary characters. The 
registry value for Plane 1 is:
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1

Windows 2000 and Windows XP will otherwise treat supplementary characters identically 
e.g. sorting by code point order.

John
Global Infrastructure







Re: Entering Plane 1 characters in XP

2002-11-11 Thread Tex Texin
Andrew, it is definitely a requirement for some applications.
However, it would not be surprising if applications over time have made
themselves independent of the registry entry.

I do know that to view my plane 1 example web page with IE, the registry
needed to be set on both win 2k and win xp.
http://www.i18nguy.com/unicode-example-plane1.html

If I get some time later I'll play with unsetting it and see what
happens now.
tex







Re: Entering Plane 1 characters in XP

2002-11-11 Thread Tex Texin
John,
thanks very much for this.

I want to confirm my understanding, and with your permission I'll
include your remarks below on my page for supporting surrogates.

1) The possible explanation then for the difference between Andrew and
myself with respect to the need for a special registry setting, is that
Andrew most likely installed something, perhaps a language pack, that
caused Uniscribe to be loaded on his system. He therefore didn't need
the setting. I probably didn't install anything that used Uniscribe.

2) The first paragraph describes a registry value that forces Uniscribe
to load.
I presume that you are referring to the first of these two entries
recommended by the kbase. The second seems specific to IE. Is my
presumption correct that this entry causes Uniscribe to be loaded?

[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
 SURROGATE=(REG_DWORD)0x0002

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
 IEFixedFontName=[Surrogate Font Face Name]
 IEPropFontName=[Surrogate Font Face Name]

3) For XP only, we can set a font face name that supports surrogates
into this registry entry. Doing so will make this font the default for
plane 1 characters, if another font is not explicitly designated to be
used:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack\SurrogateFallback\Plane1

(and by extension for the other planes).

cool. thanks
tex





RE: Is long s a presentation form?

2002-11-11 Thread Peter_Constable

On 11/11/2002 11:12:55 AM Michael Everson wrote:

Are there not minimal pairs in Hebrew where the final form would be
expected but isn't used for some reason? There certainly is for final
sigma, which is why it is a good thing it is encoded separately.

I agree that there are valid reasons for encoding these as distinct
characters. When we did our implementations for Biblical Greek and Hebrew
several years ago, we weren't aware of those reasons; for the texts we were
concerned with it seemed quite appropriate to assume that there is only one
sigma character. My objection wasn't to there being two sigmas but to
the claim that nobody considers them to be representable by a single
character.



- Peter









Speaking of Plane 1 characters...

2002-11-11 Thread John Hudson
One of the tools I use for building fonts requires that codepoints for 
Plane 1 characters be expressed as surrogate pairs, rather than as scalar 
values. I'm hoping this will change on the next release, since the scalar 
values are a lot easier to work with, but in the meantime I need to figure 
out the easiest way to find the correct surrogate pair values for any given 
scalar value. Is there a comprehensive list somewhere, or an easy 
algorithm (easy for a non-programmer)? How about a web-based form, into 
which someone could enter scalar values and receive back surrogate pairs?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




RE: Entering Plane 1 characters in XP

2002-11-11 Thread John McConnell
I'll have somebody a bit more familiar with IE registry usage review that part, but 
the rest looks good. Thanks.

John
Global Infrastructure






Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Cowan
John Hudson scripsit:
 
 One of the tools I use for building fonts requires that codepoints for 
 Plane 1 characters be expressed as surrogate pairs, rather than as scalar 
 values. I'm hoping this will change on the next release, since the scalar 
 I need to figure 
 out the easiest way to find the correct surrogate pair values for any given 
 scalar value. 

If you have access to any Windows box, you can use the Windows Calculator
(Start/Programs/Accessories/Calculator).  Choose View/Scientific and
click on the Hex radio button.  Then enter your 5-digit Unicode scalar value.
(You must type hex digits in lower case.)  To get the high surrogate, type:

- 1 0 0 0 0 = / 4 0 0 + d 8 0 0 =

To get the low surrogate, enter the scalar value again and type:

- 1 0 0 0 0 = % 4 0 0 + d c 0 0 =

You can also use the mouse, in which case % above represents the MOD key.

On *ix systems, use the bc command; type obase=16 and then ibase=16.
For this program, you must use capital letters for the hex digits.
To get the high surrogate, type (x-10000)/400+D800 (x is the scalar
value); to get the low surrogate, type (x-10000)%400+DC00.
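
For anyone with Python to hand, the same subtract, divide and remainder steps can be written as a small sketch. The function names are invented for this example; the constants 0x10000, 0x400, 0xD800 and 0xDC00 are the ones used in the calculator recipe above.

```python
def to_surrogates(scalar: int) -> tuple:
    """Split a supplementary-plane scalar value into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogate pairs")
    offset = scalar - 0x10000
    high = 0xD800 + offset // 0x400  # top 10 bits of the offset
    low = 0xDC00 + offset % 0x400    # bottom 10 bits of the offset
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the scalar value."""
    return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)


high, low = to_surrogates(0x10312)  # OLD ITALIC LETTER KU
print(f"{high:04X} {low:04X}")      # D800 DF12
print(f"{from_surrogates(high, low):X}")
```

Looping `to_surrogates` over a range of scalar values would give the comprehensive list asked for at the start of the thread.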

On the Macintosh, I have no clue.

-- 
John Cowan   [EMAIL PROTECTED]
You need a change: try Canada  You need a change: try China
--fortune cookies opened by a couple that I know




Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Hudson
Many thanks to the various people who recommended Michael Kaplan's 
calculator at http://trigeminal.com/16to32AndBack.asp

This is excellent and solves my problem.

John Hudson





Re: Speaking of Plane 1 characters...

2002-11-11 Thread Tom Gewecke

On the Macintosh, I have no clue.

On Mac OS X, the Character Palette or the add-on UnicodeChecker will give
the surrogates for any given codepoint.

For a web page that calculates both ways, see

http://www.trigeminal.com/16to32AndBack.asp






Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael Everson
At 13:55 -0700 2002-11-11, Tom Gewecke wrote:

 On the Macintosh, I have no clue.

On Mac OS X, the Character Palette or the add-on UnicodeChecker will give
the surrogates for any given codepoint.


If you can get it to work. It still breaks for me so constantly I 
don't even try to use it. :-(
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael Everson
At 13:11 -0800 2002-11-11, Michael \(michka\) Kaplan wrote:


  Perhaps it is just me, but terms like "scalar value" just don't mean
  anything to me. It rather reminds me of reptilian skin shedding.

Since I do not use that term on my site, I assume you are referring to
someone else's resource? :-)


It was related to this thread but in a previous post. Nevertheless a 
little gentle user-friendliness on your page would help me to use it 
more easily. Just a teensy tutorialette and a weensy example at the 
top? A little hand-holding?

  I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER

 KU) and it did convert to a surrogate pair. I wonder what would
 happen if I pasted it into an HTML document. Hmm but I couldn't do
 that until I converted them to UTF-8


Well, since the page advertises itself as a UTF-16/UTF-32 sort of converter,
I would hope that the lack of UTF-8 byte conversion would be expected.


Gee, what I really need is a UTF-8/UTF-16/UTF-32 sort of converter 
that handles surrogates ;-) There isn't such a thing and there 
ought to be. :-)
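
A rough sketch of such a converter in Python, leaning on the language's built-in codecs rather than hand-written arithmetic. The function name `encoding_forms` and the output layout are assumptions made for this example.

```python
def encoding_forms(scalar: int) -> dict:
    """Show one scalar value as UTF-8 bytes, UTF-16 code units and UTF-32."""
    ch = chr(scalar)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    utf32 = ch.encode("utf-32-be")
    return {
        "UTF-8": " ".join(f"{b:02X}" for b in utf8),
        # Group the UTF-16 bytes into 16-bit code units (surrogates show here).
        "UTF-16": " ".join(
            utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)
        ),
        "UTF-32": utf32.hex().upper(),
    }


for name, value in encoding_forms(0x10312).items():  # OLD ITALIC LETTER KU
    print(name, value)
```

For U+10312 this prints the four UTF-8 bytes F0 90 8C 92, the surrogate pair D800 DF12, and the UTF-32 value 00010312.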

  By the way MichKa if you make the boxes a bit wider the whole string

 of numbers would display.


What numbers did not display for you? They all fit for me


The surrogate pair shows three digits and a tiny little popup 
triangle to tell you that there's a fourth digit. If you need to I 
can send you a screenshot.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael \(michka\) Kaplan
From: Michael Everson [EMAIL PROTECTED]
 At 12:10 -0700 2002-11-11, John Hudson wrote:

 Many thanks to the various people who recommended Michael Kaplan's
 calculator at http://trigeminal.com/16to32AndBack.asp
 
 This is excellent and solves my problem.

Glad you like it, John -- I am sure James Kass remembers when I put it up,
it was actually because of a complaint that there wasn't such a thing and
there ought to be. <grin>

 Perhaps it is just me, but terms like scalar value just don't mean
 anything to me. It rather reminds me of reptilian skin shedding.

Since I do not use that term on my site, I assume you are referring to
someone else's resource? :-)

 I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER
 KU) and it did convert to a surrogate pair. I wonder what would
 happen if I pasted it into an HTML document. Hmm but I couldn't do
 that until I converted them to UTF-8

Well, since the page advertises itself as a UTF-16/UTF-32 sort of converter,
I would hope that the lack of UTF-8 byte conversion would be expected.

 By the way MichKa if you make the boxes a bit wider the whole string
 of numbers would display.

What numbers did not display for you? They all fit for me

MichKa





Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Hudson
At 13:50 11/11/2002, Michael Everson wrote:


By the way MichKa if you make the boxes a bit wider the whole string of 
numbers would display.

I noticed the same problem in Opera. It's okay in IE.

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467





Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Cowan
Michael Everson scripsit:

 Perhaps it is just me, but terms like scalar value just don't mean 
 anything to me. It rather reminds me of reptilian skin shedding.

The scale in question is analogous to a temperature scale, not a
reptilian one.

 I visited MichKa's page and tried typing in 10312 (OLD ITALIC LETTER 
 KU) and it did convert to a surrogate pair. I wonder what would 
 happen if I pasted it into an HTML document. Hmm but I couldn't do 
 that until I converted them to UTF-8

The Right Thing in HTML terms is to say &#x10312; and *not* use the
surrogate pair representation.
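
A minimal Python sketch of the point above: the numeric character reference is built from the code point itself, never from the two surrogate code units. The function name `html_char_ref` is hypothetical and not from the thread.

```python
def html_char_ref(code_point: int) -> str:
    """Return a hexadecimal numeric character reference for one code point."""
    # Surrogate code points (D800..DFFF) never stand for characters,
    # so they are rejected along with anything beyond 10FFFF.
    if not (0 <= code_point <= 0x10FFFF) or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %X" % code_point)
    return "&#x%X;" % code_point


# OLD ITALIC LETTER KU, the character discussed in this thread
print(html_char_ref(0x10312))   # &#x10312;
```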

-- 
Deshil Holles eamus.  Deshil Holles eamus.  Deshil Holles eamus.
Send us, bright one, light one, Horhorn, quickening, and wombfruit. (3x)
Hoopsa, boyaboy, hoopsa!  Hoopsa, boyaboy, hoopsa!  Hoopsa, boyaboy, hoopsa!
  -- Joyce, _Ulysses_, Oxen of the Sun   [EMAIL PROTECTED]




Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Colby
At 13:18 11/11/2002 -0700, John Hudson wrote:


At 13:50 11/11/2002, Michael Everson wrote:


By the way MichKa if you make the boxes a bit wider the whole string of 
numbers would display.

I noticed the same problem in Opera. It's okay in IE.


That's the default font size mismatch - IE do things differently (they 
would!). In Mozilla and Phoenix do they fit?

John


Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael Everson
At 13:20 -0800 2002-11-11, Mark Davis wrote:

If you look at http://www.macchiato.com/ under Unicode Charts, you can type
in the code point (scalar value) for a character, then Enter, and you will
get a chart. The UTF-8, 16, and 32 numbers are given in the chart for each
value.


Why do you call it a scalar value if it is really a code point? I 
thought it was bad enough Unicode calls it code point while 10646 
calls it code position

For the Terminology Police,
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael \(michka\) Kaplan
From: John Hudson [EMAIL PROTECTED]

 At 13:50 11/11/2002, Michael Everson wrote:

 By the way MichKa if you make the boxes a bit wider the whole string of
 numbers would display.

 I noticed the same problem in Opera. It's okay in IE.

Ah, if I called *that* by design, someone might accuse me of global
conspiracy. :-)

Never mind, it wasn't that funny. I went ahead and updated the page, it
should work well in Opera Compatibility mode. <g,d&r>

Michael, in answer to your request for a UTF-8 converter, that will have to
be another day (it's a bit more complicated, and I spend most of my time in
UTF-16 and UTF-32 so I can't really pretend it's work related). If you wanted
to provide the code in VBScript or JScript I will add it to the page (and
give you credit, of course).
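
For readers wondering what the "more complicated" UTF-8 step involves, here is a rough Python sketch of encoding a single code point by hand. It is an illustration only, not the code requested for the page; the function name `utf8_bytes` is made up for this example.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %X" % cp)
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (planes 1 to 16)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# OLD ITALIC LETTER KU
print(utf8_bytes(0x10312).hex(" ").upper())   # F0 90 8C 92
```

Note that the encoder works from the code point, so a surrogate pair has to be combined back into its code point before this step.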

MichKa





Re: Speaking of Plane 1 characters...

2002-11-11 Thread Michael Everson
At 13:34 -0800 2002-11-11, Michael \(michka\) Kaplan wrote:


Michael, in answer to your request for a UTF-8 converter, that will 
have to be another day (it's a bit more complicated, and I spend most 
of my time in UTF-16 and UTF-32 so I can't really pretend it's work 
related). If you wanted to provide the code in VBScript or JScript I 
will add it to the page (and give you credit, of course).

Sir, you mistake me for a programmer! :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: Speaking of Plane 1 characters...

2002-11-11 Thread John Cowan
Michael Everson scripsit:

 The scale in question is analogous to a temperature scale, not a
 reptilian one.
 
 Now I very *seriously* don't get it.

A temperature scale enumerates the degrees -273, -272, -271, ..., 0, 1, 2, ...
in order.  When you ask What is the temperature?, you are actually asking
What is the scalar value of the temperature?

The Unicode scale enumerates the characters 0, 1, 2, ... 10FFFF.  Unicode
scalar values are points on this scale, just as temperature scalar values
are points on the (Celsius) temperature scale.

-- 
Winter:  MIT,   John Cowan
Keio, INRIA,[EMAIL PROTECTED]
Issue lots of Drafts.   http://www.ccil.org/~cowan
So much more to understand! http://www.reutershealth.com
Might simplicity return?(A tanka, or extended haiku)




Speaking plane1-ly

2002-11-11 Thread Tex Texin
I have modified my windows settings for surrogates page to include the
new information.
Consider it a draft for a day or two.

I would be grateful for any constructive review comments and the usual
comical abuse.
The page is at:

http://www.i18nguy.com/surrogates.html

I don't have any more time today, but if I had recommendations for
(lists of) IMEs and Fonts that support planes other than the BMP,
it might be nice to have a collection point and web page for them.

Much thanks to John McConnell for the clarifications and new info.

Hmmm. I just reviewed Andrew's comment that he can get support for
surrogates via uniscribe on windows 9x.
I guess I have to think about extending this to include those systems. I
guess if I get confirmation (or disconfirmation) from John or other
Microsofties I will update the page accordingly.

tex

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:Tex;XenCraft.com
Xen Master  http://www.i18nGuy.com
 
XenCrafthttp://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Speaking of Plane 1 characters...

2002-11-11 Thread Mark Davis
According to the new 4.0 definitions:

- code points go from 0..10FFFF, inclusive
- scalar value == non-surrogate code point, so they are simply a
restriction of code points to the ranges 0..D7FF, E000..10FFFF

Since surrogate code points can never represent characters, for a given
character you can refer to its code point or to its scalar value; in
that circumstance there is no effective difference in the terms.
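
The two definitions above can be restated as a small Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
def is_code_point(n: int) -> bool:
    """Code points run from 0 to 10FFFF, inclusive."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    """Surrogate code points occupy D800..DFFF."""
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """A scalar value is any code point that is not a surrogate."""
    return is_code_point(n) and not is_surrogate(n)


print(is_scalar_value(0x10312))   # True
print(is_scalar_value(0xD800))    # False: a code point, but not a scalar value
```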

Mark
__
http://www.macchiato.com
►  “Eppur si muove” ◄

- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, November 11, 2002 13:37
Subject: Re: Speaking of Plane 1 characters...


 At 13:20 -0800 2002-11-11, Mark Davis wrote:
 If you look at http://www.macchiato.com/ under Unicode Charts, you can
 type in the code point (scalar value) for a character, then Enter, and
 you will get a chart. The UTF-8, 16, and 32 numbers are given in the
 chart for each value.

 Why do you call it a scalar value if it is really a code point? I
 thought it was bad enough Unicode calls it code point while 10646
 calls it code position

 For the Terminology Police,
 --
 Michael Everson * * Everson Typography *  * http://www.evertype.com







Re: Speaking of Plane 1 characters...

2002-11-11 Thread Barry Caplan
At 05:47 PM 11/11/2002 -0500, John Cowan wrote:
Michael Everson scripsit:

 The scale in question is analogous to a temperature scale, not a
 reptilian one.
 
 Now I very *seriously* don't get it.

A temperature scale enumerates the degrees -273, -272, -271, ..., 0, 1, 2, ...
in order.  When you ask What is the temperature?, you are actually asking
What is the scalar value of the temperature?

The Unicode scale enumerates the characters 0, 1, 2, ... 10FFFF.  Unicode
scalar values are points on this scale, just as temperature scalar values
are points on the (Celsius) temperature scale.

Well, not exactly... temperature is an arbitrary but standard measure of a
continuous physical property. The multiple well-known scales attest to that.
But code points are absolute points, not continuous. And the fact that one
character has a greater encoding value does not make it greater in any
useful sense.

Basically, we are talking about continuous ordinal scales vs discrete
cardinal scales. Hardly analogous at all, IMO.

Barry Caplan
www.i18n.com






Re: Speaking of Plane 1 characters...

2002-11-11 Thread Jungshik Shin


On Mon, 11 Nov 2002, John Cowan wrote:

 On *ix systems, use the bc command; type obase=16 and ibase=16.

  Thank you for this. I should have read the man page of bc more
carefully. (or I used to know it but forgot...)

 For this program, you must use capital letters for the hex digits.
 To get the high surrogate, type (x-10000)/400+DC00 for the high

  s/DC00/D800/

 surrogate (x is the scalar value); to get the low surrogate,
 type (x-10000)%400+DC00.

And one can define a function
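
As an illustration of the corrected arithmetic (subtract 10000 hex, then divide and take the remainder by 400 hex), here is a short Python sketch that converts in both directions. Python is used here only for illustration, and the names `to_surrogates` and `from_surrogates` are invented for this example.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("no surrogate pair for %X" % cp)
    offset = cp - 0x10000                     # 20-bit value
    return 0xD800 + offset // 0x400, 0xDC00 + offset % 0x400


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)


high, low = to_surrogates(0x10312)            # OLD ITALIC LETTER KU
print("%04X %04X" % (high, low))              # D800 DF12
print("%X" % from_surrogates(high, low))      # 10312
```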

 On the Macintosh, I have no clue.

  As you know so well,  MacOS X is a Unix and 'bc' should be available
there, too.  If not by default, one can certainly grab the source and
compile it or get a precompiled binary somewhere.

  It seems to me a waste of the bandwidth (however abundant it may have
become recently. I heard several times on this list that it's not in a
certain country in Europe ;-) ) to go all the way across the Atlantic or
the continent to convert between UCVs and surrogate pairs.  There are
several ways to do it locally including two suggested above. On *nix
including MacOS X (http://developer.apple.com/internet/macosx/perl.html),
one can open up a small terminal window (yes, Mac OS X has a
terminal window !) and run a script like the following (assuming Perl
is installed.  If GUI is desired, make one up in Perl/Tk, Tcl/Tk,
pdksh, Python+Tk?...) This should also work in a command prompt of
Windows. Alternatively, I guess a local html file with ECMAscript should
also work.

-----Cut-here-----
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl

while ( 1 ) {
  print "** Enter Unicode code point in hexadecimal \n" .
        "(to end, press [enter]) : ";
  $| = 1;   # force a flush after our print
  $ucs = <STDIN>;
  last unless defined $ucs;   # stop cleanly at end of input
  chomp $ucs;

  last if $ucs eq "";

  if ( $ucs =~ /[^a-f0-9A-F]/ ) {
    printf "  Error: %s is invalid. Try again\n", $ucs;
    next;
  }

  $usv = hex $ucs;
  if ( 0xffff < $usv && $usv < 0x110000 ) {
    # supplementary planes: high surrogate, then low surrogate
    printf "UTF-16: %04x %04x\n", int( ($usv - 0x10000) / 0x400 ) + 0xd800,
                                  ($usv - 0x10000) % 0x400 + 0xdc00;
  }
  elsif ( $usv < 0xd800 || ( 0xdfff < $usv && $usv < 0x10000 ) ) {
    # BMP scalar value: a single 16-bit code unit
    printf "UTF-16: %04x\n", $usv;
  }
  else {
    # surrogate code point or beyond 10FFFF
    printf "Your input %s is not valid. Try again\n", $ucs;
  }
}

print "Bye !!\n";
-----Cut-here-----

  Jungshik





Info: Apple OSX Font Tools Suite 1.0.0 Released

2002-11-11 Thread John H. Jenkins
Cupertino 11/8/02: Today the Apple Font Group released its new suite of 
Unix command line font tools for OSX.

These can be downloaded free from http://developer.apple.com/fonts/. 
The automatically installed 4.8 Mb package includes the tools, user 
documentation, and a 60-page tutorial.
To use this package, you need to be running OSX 10.2. Everything is 
automatically configured by the installer. You just add fonts to taste.

Working with text sources for many of the tables in an sfnt font 
structure is a powerful and efficient way to develop, debug and manage 
font sources. E.g. use ftxdumperfuser to solve cmap and postname 
glitches once and for all in .ttf, .otf and CFF format fonts.

With this release, Apple has converted its text dump formats to XML and 
will be continuing to refine the XML formats in future releases.

No previous experience of Unix is necessary as the 60-page tutorial 
takes you step-by-step through useful font editing processes with an 
accompanying set of ready-worked live demo files.

Applications in The Font Tool Suite are:

*	ftxanalyzer
*	ftxdiff
*	ftxdumperfuser
*	ftxenhancer
*	ftxinstalledfonts
*	ftxruler
*	ftxvalidator

Documents included:

*	The Apple Font Tool Suite Manual (51 pages)
*	Tool Quick Reference (8 pages)
*	Tutorial (62 pages)
*	Tutorial Command Summary (8 pages)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/




Re: Speaking of Plane 1 characters...

2002-11-11 Thread Markus Scherer
Michael (michka) Kaplan wrote:

Michael, in answer to your request for a UTF-8 converter, that will have to
be another day (it's a bit more complicated, and I spend most of my time in
UTF-16 and UTF-32 so I can't really pretend it's work related). If you wanted
to provide the code in VBScript or JScript I will add it to the page (and
give you credit, of course).


Mark has it all in his UTF Converter and Charts at http://www.macchiato.com/unicode/convert.html
markus