Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread John Hudson

At 20:41 3/12/2002, Doug Ewell wrote:

>There's no reason it has to be that way.  Proposed glyphs are posted on
>the Unicode Web site months in advance of their "go live" date, even
>before the beta period, largely for this reason.  I'm sure Unicode-aware
>type designers like John Hudson don't wait until a version of Unicode is
>formally released before they start designing glyphs.

If something has been approved, and any likely changes are only going to be 
small details, yes, I'm willing to proceed with pre-publication 
information. I have, however, begun to avoid projects involving scripts and 
languages not included or approved for inclusion in Unicode; not because I 
don't want to support these languages, but because of the problems 
associated with such projects. There are enough scripts and languages 
already encoded in Unicode that need improved font support, without taking 
on those that have yet to be encoded. Lately, I've noticed that the old 
question 'Can you make me a font for X script?' has been replaced by 'What 
do I need to do to get X script included in Unicode?' or 'Who do I need to 
talk to at Microsoft about getting Y language supported?'. I think this is 
a positive development: font development should be built on a solid text 
encoding foundation, not jerry-rigged.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]

... es ist ein unwiederbringliches Bild der Vergangenheit,
das mit jeder Gegenwart zu verschwinden droht, die sich
nicht in ihm gemeint erkannte.

... every image of the past that is not recognized by the
present as one of its own concerns threatens to disappear
irretrievably.
   Walter Benjamin





Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread Doug Ewell

Back to Patrick's original question.  Warning: this post contains
nothing about Klingon, or even Tengwar.

Patrick Rourke <[EMAIL PROTECTED]> wrote:

> One effect of Unicode Consortium's rigorous proposal/review policy is that
> while a particular script or group of characters may not be adopted into
> Unicode for a couple of years after it is proposed, font makers usually
> don't get around to creating the fonts for those scripts until after they
> have been officially approved for Unicode.

There's no reason it has to be that way.  Proposed glyphs are posted on
the Unicode Web site months in advance of their "go live" date, even
before the beta period, largely for this reason.  I'm sure Unicode-aware
type designers like John Hudson don't wait until a version of Unicode is
formally released before they start designing glyphs.

> Would it be a misuse of the PUA to come up with a private agreement within a
> community to assign certain codepoints in the PUA to characters that have
> been proposed to the Unicode Consortium, but not yet approved, so that font
> designers and others in that community could get to work on establishing
> support for these characters, and so that content providers can begin the
> process of incorporating these characters into their content?

As some have already said, this is exactly what the PUA is for.  But the
size and scope of the "community" may impose limits on the utility of
these PUA assignments.  Certainly not all font designers and content
providers for a given non-Unicode script, worldwide, can be expected to
comply -- and if they do, it may cause another set of problems, as we
will see.

One important point to remember is that any use or proposed use of the
PUA, such as ConScript, is strictly up to private organizations, not the
Unicode Consortium.  To be sure, ConScript is the domain of two guys who
are quite influential in Unicode, but they do not maintain ConScript in
any official capacity as representatives of Unicode.

> Would it be
> useful/practical for such an agreement to stipulate a versioning system
> whereby the font creators &c. and content providers in that community who
> wish to use the PUA mapping in question would have to release new versions
> of their products with the characters remapped to the approved codepoints
> upon the acceptance of the characters in Unicode (and with the PUA
> codepoints being obsolesced, and eventually removed, in subsequent versions
> of the agreement assignments, until all characters were assigned by the
> Unicode Consortium)?

I would think you could simply use the version number of the Unicode
Standard.  For example, the use of Tagalog would have been conformant to
this proposed PUA registry until Unicode version 3.2, at which time it
would have to be removed from the registry because of its introduction
into Unicode.
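
As a rough illustration of the versioning idea above, a registry entry could
record the Unicode version in which its script was encoded, so that checking
conformance becomes a plain version comparison. This is only a sketch: the
`Entry` type, the `active_entries` helper, and the PUA ranges shown are
hypothetical and are not taken from any actual registry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (3, 2)


@dataclass(frozen=True)
class Entry:
    script: str
    pua_start: int  # hypothetical PUA range, for illustration only
    pua_end: int
    encoded_in: Optional[Version] = None  # Unicode version that encoded it, if any


def active_entries(registry: List[Entry], unicode_version: Version) -> List[Entry]:
    """Return the entries still valid for text targeting unicode_version."""
    return [
        e for e in registry
        if e.encoded_in is None or unicode_version < e.encoded_in
    ]


REGISTRY = [
    Entry("Tagalog", 0xE000, 0xE01F, encoded_in=(3, 2)),
    Entry("Old Persian Cuneiform", 0xE020, 0xE05F),  # assumed not yet encoded
]

names_31 = [e.script for e in active_entries(REGISTRY, (3, 1))]
names_32 = [e.script for e in active_entries(REGISTRY, (3, 2))]
```

Under these assumptions, Tagalog is listed for Unicode 3.1 and drops out of
the registry at 3.2, matching the example in the paragraph above.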

> This would I think considerably shorten the amount of
> time it would take for characters to become usable to a community after they
> had been accepted into Unicode, and would also provide a mechanism for the
> gradual introduction of "new" characters, while the versioning system would
> (I'd hope) prevent PUA code points from being used long after perfectly good
> permanent code points have been assigned.

Conformance to this registry, especially over a period of time, is up to
the user community.  The presence of a standard is no guarantee that it
will be followed, or even noticed.

Here's an example of a potential pitfall of widespread PUA
quasi-standardization.  John Jenkins has probably done more than anyone
to get the Deseret Alphabet encoded in Unicode (although it is never
wise to overlook Michael Everson's influence).  John has a series of Web
pages describing Unicode and the DA.  To this day, the main page at
 still includes
the following quote, in large bold italics:

"It is strongly recommended that any implementations of the Deseret
Alphabet conform to the ConScript encoding, if possible."

Now, I don't bring this up to point out that John isn't keeping his Web
pages up to date, but to show that this is and will continue to be a
widespread problem, on the Web and elsewhere, even among the most
diligent supporters of a script and of Unicode.

Suppose Old Persian Cuneiform is encoded in Patrick's PUA registry next
week, and that encoding achieves some popularity.  Then suppose at some
later date it is encoded in Unicode, say version 4.1.  This will
necessarily cause the encoding in Patrick's registry to be withdrawn, or
at least deprecated.  How many people will switch immediately to the
sanctioned Unicode encoding?  How quickly will existing software and
data be converted?  Probably not right away, and the chances for a
timely conversion are less if the private-use encoding is particularly
successful, whether or not there are scripts available to help people
make the conversion.
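
To make the conversion problem concrete, here is a minimal sketch of how
existing data could at least be audited for leftover private-use characters.
It relies on Python's standard `unicodedata` module, which reports the
general category `Co` for private-use code points; the function name
`find_private_use` and the sample string are made up for illustration.

```python
import unicodedata


def find_private_use(text):
    """Return (index, code point) for every private-use character in text."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# One BMP private-use character and one from plane 15.
sample = "abc\uE000d\U000F0000"
hits = find_private_use(sample)
```

A scan like this only finds the data that still needs converting; it says
nothing about which registry, or which version of it, the text was written
against.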

I provided a "Format A" conversion table to map Desere

Re: More Unicode Myths?

2002-03-12 Thread Mark Davis

thanks

Mark
- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, March 12, 2002 16:45
Subject: Re: More Unicode Myths?


> Mark,
> 
> Check your date and time settings on your computer. A number of 
> recent postings haven't come in to me as recent, but several hours 
> astray. Maybe it's a weird internet thing, but check anyway.
> -- 
> Michael Everson *** Everson Typography *** http://www.evertype.com
> 





Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread John Hudson

At 16:01 3/12/2002, John Cowan wrote:

> > Would it even be *legal* to
> > include those characters (referring to U+00A9 COPYRIGHT SIGN)?
>
>Characters are abstractions, and glyphs are not subject to copyright
>protection.  I am not a lawyer; this is not legal advice.

I suppose the question is whether the Tengwar and Cirth scripts can be 
considered literary inventions within the copyright of the Tolkien estate. 
I suspect the estate could make a case, if they felt inclined to do so. I 
am also not a lawyer, and it is not any kind of advice.

By the way, alphanumeric outline format glyphs are not subject to copyright 
protection in the USA, but are in some other jurisdictions. 
Non-alphanumeric glyphs, particularly ornaments and other non-semantic 
glyphs, may be subject to copyright as works of art in the USA. Bitmap 
alphanumeric glyphs are not subject to copyright anywhere, to my knowledge, 
but bitmap icons are. The US Copyright Office doesn't know type design from 
a hole in the ground, insofar as they consider it the intellectual property 
equivalent of digging a ditch. The US Patent and Trademark Office grants 
design patent protection to glyph designs if they demonstrate significant 
distinctiveness and originality according to the review criteria.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]






Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread Kenneth Whistler

Stefan asked:

> > In general, no.  If there is a fair chance that something will become
> > part of Unicode, we usually don't register it.  There are exceptions,
> > like Tengwar and Cirth.
> 
> Is there any chance that Tengwar and Cirth might become parts of the UCS? 

Yes. I consider them perfectly valid instances of scripts, in reasonably
wide use for a number of purposes, and well enough defined that the
encoding can be decided. And there are enough members of the UTC who
think they are valid that they can still be considered clearly on the
table (though not actively under investigation for any imminent
encoding).

> I
> know that they have been proposed for inclusion, but all proposed characters
> don't have to be included in the standard... Would it even be *legal* to
> include those characters (referring to U+00A9 COPYRIGHT SIGN)?

Why not? Elvish poetry is published in academic publications using
Tengwar, without anybody paying some licensing fee to somebody for
use of the characters. I see nothing preventing standardization of
the scripts -- and many of the users of the scripts would be in favor
of such an action.

> 
> BTW, has *any* script, invented for *any* kind of fiction (or similar), ever
> been fully approved and included in the UCS?

Shavian has been approved by the UTC for inclusion in Unicode, and is
under ballot as an amendment for 10646 currently. It wasn't exactly
invented *for* fiction per se, but rather as a failed orthographic
reform, but it was used, of course, to publish Androcles and the Lion.

> And, has any such script ever
> been rejected?

The Klingon *script* has been rejected, since it is ill-defined,
and is not actually used by Klingon language fans to represent
Klingon. (Klingon is normally represented in a Latin-script-based
orthography.)

--Ken





Re: More Unicode Myths?

2002-03-12 Thread Michael Everson

Mark,

Check your date and time settings on your computer. A number of 
recent postings haven't come in to me as recent, but several hours 
astray. Maybe it's a weird internet thing, but check anyway.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread David Starner

On Wed, Mar 13, 2002 at 12:00:44AM +0100, Stefan Persson wrote:
> Is there any chance that Tengwar and Cirth might become parts of the UCS?

Michael Everson seems to think there is. As Michael Everson has been the
driving force behind many of the scripts in Unicode, he should know.

> Would it even be *legal* to
> include those characters (referring to U+00A9 COPYRIGHT SIGN)?

One journal written in Quenya in Tengwar asked a lawyer that question,
and was told that it was completely legal for them to use the language
and script. I would think Unicode's liability would be less. The heirs
of Tolkien's estate have not objected legally to any use of Tengwar that
I've heard of.

> BTW, has *any* script, invented for *any* kind of fiction (or similar), ever
> been fully approved and included in the UCS? And, has any such script ever
> been rejected?

There's Shavian, designed to publish "Androcles and the Lion." (Did
Read actually believe in Shaw's pipe dreams?) Klingon was rejected, but
precisely because it wasn't a script in use.

-- 
David Starner - [EMAIL PROTECTED]
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)




Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread John Cowan

Stefan Persson scripsit:

> Is there any chance that Tengwar and Cirth might become parts of the UCS? I
> know that they have been proposed for inclusion, but all proposed characters
> don't have to be included in the standard...

Of the insiders, some are strongly for it and have said so, some are
strongly against it and have said so.  Even the most highly trained
finger-lickers probably can't say for sure what will happen.

> Would it even be *legal* to
> include those characters (referring to U+00A9 COPYRIGHT SIGN)?

Characters are abstractions, and glyphs are not subject to copyright
protection.  I am not a lawyer; this is not legal advice.

> BTW, has *any* script, invented for *any* kind of fiction (or similar), ever
> been fully approved and included in the UCS? And, has any such script ever
> been rejected?

Klingon was rejected, partly on the ground that even Klingonists don't
use the Klingon script to write Klingon: all extant writings (as opposed
to background visuals in TV shows and movies) are in the Latin script.

Ogham, which is included, has historical value, but nowadays is used
more like a ConScript:  I believe Michael Everson conjectured that
the total amount of modern Ogham writing  (kids passing notes in school, e.g.)
exceeds that on all known historical inscriptions.

-- 
John Cowan <[EMAIL PROTECTED]> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_




More Unicode Myths?

2002-03-12 Thread Mark Davis



I am going to be updating my slides on "Unicode Myths" for a presentation
in April. If anyone has any additional myths (common misinformation about
Unicode), I'd appreciate suggestions.

[If you want to see the presentation, it's at www.macchiato.com, under
"Unicode Myths". Unfortunately, to see the slide notes in PowerPoint you
have to be in Full Screen mode: Right-click; Full Screen; Right-click;
Speaker notes.]
 
Mark


Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread Stefan Persson

- Original Message -
From: "John Cowan" <[EMAIL PROTECTED]>
To: "Patrick Rourke" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: 12 March 2002 21:45
Subject: Re: Private Use Agreements and Unapproved Characters


> In general, no.  If there is a fair chance that something will become
> part of Unicode, we usually don't register it.  There are exceptions,
> like Tengwar and Cirth.

Is there any chance that Tengwar and Cirth might become parts of the UCS? I
know that they have been proposed for inclusion, but all proposed characters
don't have to be included in the standard... Would it even be *legal* to
include those characters (referring to U+00A9 COPYRIGHT SIGN)?

BTW, has *any* script, invented for *any* kind of fiction (or similar), ever
been fully approved and included in the UCS? And, has any such script ever
been rejected?

Stefan







Re: Private Use Agreements and Unapproved Characters

2002-03-12 Thread John Cowan

Patrick Rourke scripsit:

> Would it be a misuse of the PUA to come up with a private agreement within a
> community to assign certain codepoints in the PUA to characters that have
> been proposed to the Unicode Consortium, but not yet approved, so that font
> designers and others in that community could get to work on establishing
> support for these characters, and so that content providers can begin the
> process of incorporating these characters into their content? 

It is not only *not* a misuse, it is one of the intended uses,
one might say the most important intended use.

> Second, is ConScript (I know ConScript isn't a Unicode Consortium resource,
> but since the two principals are on this list . . .) staying limited to
> "constructed scripts," or is it also accepting "natural" or "evolved"
> scripts that for one reason or another haven't been accepted into Unicode
> yet (one thinks of e.g. Old Persian, already mentioned)?

In general, no.  If there is a fair chance that something will become
part of Unicode, we usually don't register it.  There are exceptions,
like Tengwar and Cirth.

(In truth neither of us has had much time to process new registrations
lately.  Arse longa, vita brevis.)

> The main issue I can think of is the matter of rejected characters: what
> does one do if a character is rejected by the Unicode Consortium for valid
> reasons?

I suspect that it would be unusual to reject a random documented character from
a well-established set.  More likely (though not very likely) would be
to reject the whole coding model, in which case all bets would be off.

-- 
John Cowan <[EMAIL PROTECTED]> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_




Private Use Agreements and Unapproved Characters

2002-03-12 Thread Patrick Rourke

A couple of quick questions for folks:

One effect of Unicode Consortium's rigorous proposal/review policy is that
while a particular script or group of characters may not be adopted into
Unicode for a couple of years after it is proposed, font makers usually
don't get around to creating the fonts for those scripts until after they
have been officially approved for Unicode.

Would it be a misuse of the PUA to come up with a private agreement within a
community to assign certain codepoints in the PUA to characters that have
been proposed to the Unicode Consortium, but not yet approved, so that font
designers and others in that community could get to work on establishing
support for these characters, and so that content providers can begin the
process of incorporating these characters into their content? Would it be
useful/practical for such an agreement to stipulate a versioning system
whereby the font creators &c. and content providers in that community who
wish to use the PUA mapping in question would have to release new versions
of their products with the characters remapped to the approved codepoints
upon the acceptance of the characters in Unicode (and with the PUA
codepoints being obsolesced, and eventually removed, in subsequent versions
of the agreement assignments, until all characters were assigned by the
Unicode Consortium)?  This would I think considerably shorten the amount of
time it would take for characters to become usable to a community after they
had been accepted into Unicode, and would also provide a mechanism for the
gradual introduction of "new" characters, while the versioning system would
(I'd hope) prevent PUA code points from being used long after perfectly good
permanent code points have been assigned.

The idea, too, would be that the font creators and content providers using
the agreement would all cooperate in the creation of tools that would make
it easy to upgrade content from version to version (i.e., write scripts to
convert the PUA code points to the newly assigned permanent code points), and
that these tools would be distributed together in a package licensed
compatibly with the content providers' code.
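
A conversion script of the kind described here could be very small. The
following is a sketch only, assuming the agreement publishes a table from
each PUA code point to the code point Unicode eventually assigned; the
`PUA_TO_ASSIGNED` table and the `upgrade` function are hypothetical, and the
pairs in the table are illustrative rather than any registry's real
assignments.

```python
# Hypothetical table: PUA code point -> code point assigned by Unicode.
# These pairs only illustrate the mechanism.
PUA_TO_ASSIGNED = {
    0xE000: 0x10400,
    0xE001: 0x10401,
}


def upgrade(text, mapping=PUA_TO_ASSIGNED):
    """Replace each mapped PUA character; leave everything else untouched."""
    return text.translate(mapping)


converted = upgrade("x\uE000\uE001y")
```

Because `str.translate` leaves unmapped characters alone, private-use
characters that the agreement has not yet retired pass through unchanged, so
the same script can be rerun with each new version of the table.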

Yes, I know about ConScript.  I'm just checking to see that I'm making my
analogy from ConScript properly.

Second, is ConScript (I know ConScript isn't a Unicode Consortium resource,
but since the two principals are on this list . . .) staying limited to
"constructed scripts," or is it also accepting "natural" or "evolved"
scripts that for one reason or another haven't been accepted into Unicode
yet (one thinks of e.g. Old Persian, already mentioned)? If not, are there
analogous resources to ConScript for such "natural" scripts?  The ideal
would of course be to avoid conflicts between overlapping user communities,
and while the community I have in mind doesn't overlap much with that of
ConScript, there might be others that it does overlap with considerably.
While ConScript doesn't look like it was intended for the purpose I have in
mind, it is pretty analogous.

I'm particularly interested in any reasons why this would be a bad idea in a
scholarly community (in other words, I'm convinced that it would work, given
careful planning, but need to expose the idea to some hostile fire to see if
it can stand up).

The main issue I can think of is the matter of rejected characters: what
does one do if a character is rejected by the Unicode Consortium for valid
reasons?  Delete it from the agreement, and have to remove a distinction
from the character data of the content providers? Leave it there, and so
perpetuate some final version of the agreement for all time, as a kind of
extension to Unicode?

Thanks,
Patrick Rourke
[EMAIL PROTECTED]







Re: Keyboard Layouts for Office XP in Windows 98

2002-03-12 Thread Michael Everson

If anyone wants to experiment with Old Persian implementation they 
should use private use characters. Doing anything else creates bad 
habits.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Book

2002-03-12 Thread Michael Everson

At 10:36 -0800 2002-03-11, F. Avery Bishop wrote:
>I'd be happy to translate or summarize parts of it if someone would buy
>me a copy :)

I've started talking to some people in Japan about a translation. 
This could be a lot of fun.

Happily,
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Getting Private Use Area characters to display in Internet Explorer

2002-03-12 Thread Parslow Peter





We're developing a publishing system for our hydrographic notices. We need to include a number of industry / office specific symbols (wrecks, various buoys etc.), which we hold in the office within a "home made" font.

In the past, we have included these in word processor documents by specifying the font, but it seems more sensible with Unicode to place them in the Private Use Area, and refer to them by character code (or by our own named entities within the XML context in which we're now developing, which will translate to the numeric entities).
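
The mapping from named entities to numeric entities described above might
look like the following sketch. The symbol names and the PUA code points
assigned to them are invented for illustration, as are the helper functions;
the real font's assignments would take their place.

```python
# Hypothetical symbol names and PUA assignments, for illustration only.
SYMBOLS = {
    "wreck": 0xE000,
    "buoy": 0xE001,
}


def numeric_reference(name):
    """Return the XML numeric character reference for a named symbol."""
    return "&#x{:04X};".format(SYMBOLS[name])


def entity_declarations():
    """Build DTD-style ENTITY declarations mapping names to the references."""
    return "\n".join(
        '<!ENTITY {} "{}">'.format(name, numeric_reference(name))
        for name in sorted(SYMBOLS)
    )


wreck_ref = numeric_reference("wreck")
declarations = entity_declarations()
```

Keeping the table in one place means the XML documents only ever mention the
names, so the code points can be reassigned later without touching content.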

The applications will be running on our intranet, so we can control the client environments (fonts installed, versions of other software, etc.).

It all seemed quite simple.


But, when I actually view the document with Internet Explorer (5.5 or 6), I don't see the characters I want. I can select my font in the Tools/Internet Options/Fonts dialogue, for "Language script: User Defined", but all I get to see are empty squares.

Is there something I'm missing, or will we have to continue to wrap our special characters in HTML Font face= tags? That works fine, but seems to bypass the point of using Unicode.

I've tried posting this question on Microsoft's Internet Explorer forums, but haven't received any reply.


Help or advice would be appreciated.





Re: MS Command Prompt

2002-03-12 Thread Asmus Freytag

At 09:31 AM 3/8/02 -0500, Patrick Rourke wrote:
>Don't know if this will help any with NT.

I am using Lucida console on all my command prompt windows, so that's the 
reason I could never see the problem. You can set properties like font and 
color for the command prompt and have that information be associated with 
the shortcut you use to launch the window, or for windows that have the 
same title. That seems to work well for many kinds of command prompts.

A./




RE: Keyboard Layouts for Office XP in Windows 98

2002-03-12 Thread Martin Kochanski

At 17:34 11/03/02 -0600, [EMAIL PROTECTED] wrote:
>On 03/11/2002 12:58:16 AM "Chris Pratley" wrote:
>
>>While it is true that in terms of absolute numbers most apps do not yet
>>support UTF-16, it is worth noting that OfficeXP and anything based on
>>mshtml.dll ver.6 (e.g. IE 6) or Riched20.dll v.4 (e.g. Wordpad in WinXP)
>>do handle surrogate characters from UTF-16 correctly. So in terms of
>>usage, surrogate support is covered pretty well as adoption of these
>>newer versions increases.
>
>But I believe there is another problem: I'm pretty sure that the TrueType 
>rasterisation part of Win9x/Me does not support the newer cmap formats 
>that are required to display glyphs for non-BMP characters. So, the apps 
>may understand the characters, but unless they are reading the cmap tables 
>on their own and drawing text as glyph strings, you won't see the glyphs 
>on Win9x/Me.
>
>I expect Chris was assuming Win2K/XP, since it is very definitely a better 
>platform for script support. This issue of support for newer cmap formats 
>is but one reason why.

The trouble is that in the real world no-one uses Win2K/XP. No-one uses Win9x/Me 
either. They just use Windows. Ask a user any more than that, and he'll look blank; 
insist on an answer, and he'll go off and buy something else. So we need to be able to 
run equally well on both platforms without having to ask. 
That said, anyone who uses non-BMP characters will already know that they all look the 
same on his system (even if he doesn't know explicitly that it's 9x/Me), so *in this 
particular case* we should be able to get away with it.
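
For readers unfamiliar with the surrogate mechanism under discussion, here is
a small sketch of how a non-BMP code point becomes a UTF-16 surrogate pair,
and of how an application could check whether a string needs non-BMP support
at all. The function names are made up for this example; the arithmetic is
the standard UTF-16 encoding.

```python
def to_surrogates(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


def is_bmp_only(text):
    """True if every character in text lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


pair = to_surrogates(0x10400)
encoded = "\U00010400".encode("utf-16-be")
```

A check like `is_bmp_only` lets an application take the simple path for most
text without asking the user which version of Windows is running.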

Of course, we can't assume mshtml.dll ver.6 or Riched20.dll v.4 or even Uniscribe, 
since none of these are part of Windows either...