Re: Internal Representation of Unicode

2003-09-25 Thread Doug Ewell
Johann  wrote:

> That does not have to be a problem, as long as there are no more than
> 255 accents and combinations of them.  As for Vietnamese, I just don't
> know how many there are, or how many characters they use.

You'll need UTF-8 and a fairly comprehensive font to read the following.

For Vietnamese, you should count on supporting the following vowels:

a à ả ã á ạ ă ằ ẳ ẵ ắ ặ â ầ ẩ ẫ ấ ậ e è ẻ ẽ é ẹ 
ê ề ể ễ ế ệ i ì ỉ ĩ í ị
o ò ỏ õ ó ọ ô ồ ổ ỗ ố ộ ơ ờ ở ỡ ớ ợ u ù ủ ũ ú ụ ư 
ừ ử ữ ứ ự y ỳ ỷ ỹ ý ỵ

the following consonant (in addition to most other English consonants):

đ

and this currency sign:

₫

For purposes of your mechanism, you can think of each vowel as having up
to 2 accents: (upper, right-attached, or none) plus (upper, lower, or
none).  The way Vietnamese think of it is that the circumflex, breve,
and horn are part of the base letter (making a total of 12 base vowels),
whereas the grave, hook above, tilde, acute, and dot below are
considered diacritics (6 × 12 = 72 total vowels).  All combinations are
possible.
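That decomposition is easy to check mechanically with Unicode normalization (a Python sketch; the sample vowels are my own choice, not from the message above):

```python
import unicodedata

# NFD splits a precomposed Vietnamese vowel into base letter plus
# combining marks; marks below (ccc 220) sort before marks above (ccc 230).
for vowel in ["ậ", "ắ", "ờ", "ệ"]:
    decomposed = unicodedata.normalize("NFD", vowel)
    print(vowel, "->", " + ".join("U+%04X" % ord(c) for c in decomposed))
```

For example "ậ" comes apart as U+0061 + U+0323 (dot below) + U+0302 (circumflex), and NFC puts it back together.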

Of course, all of the letters (not the dong sign) come in both uppercase
and lowercase.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




Unicode 4.0 book (was: Re: About that alphabetician...)

2003-09-25 Thread Doug Ewell
That alphabetician  wrote:

> And on this very day, my copy of Unicode 4.0 has arrived. :-)

Shipping takes longer to IE than to US.  (Oops, I just used ISO's
intellectual property.)

I received my copy a few weeks ago, and just noticed the new section
5.19, "Unicode Security" (pp. 140-142), which includes, to my surprise
and glee, a subsection on "Spoofing" that borrows heavily from my e-mail
of 2002-02-15:

http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0304.html

right down to the word "albeit."  :-)

I am delighted that the book committee found these suggestions useful.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




Re: Internal Representation of Unicode

2003-09-25 Thread myrkraverk
Hi,

John Cowan writes:
 > The problem is that multiple accents above are quite common -- Vietnamese
 > depends on them heavily.  There may also be multiple accents below,
 > for all I know.

That does not have to be a problem, as long as there are no more than
255 accents and combinations of them.  As for Vietnamese, I just don't
know how many there are, or how many characters they use.


Johann

-- 
Emacs is not a text editor -- it's a way of life




Re: Internal Representation of Unicode

2003-09-25 Thread John Cowan
[EMAIL PROTECTED] scripsit:

> All of these fields are actually implementation defined, with just one
> rule for char: don't include characters that can be made with
> combinations, that's what the accent fields are for.  This allows for
> 255 upper and lower accents which should be enough -- for now.

The problem is that multiple accents above are quite common -- Vietnamese
depends on them heavily.  There may also be multiple accents below,
for all I know.

-- 
John Cowan  http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Be yourself.  Especially do not feign a working knowledge of RDF where
no such knowledge exists.  Neither be cynical about RELAX NG; for in
the face of all aridity and disenchantment in the world of markup,
James Clark is as perennial as the grass.  --DeXiderata, Sean McGrath



Re: About that alphabetician...

2003-09-25 Thread Curtis Clark
Of course, any Unicode character can be expressed as an XML character 
reference (e.g. &#2350; for म) in any web page encoding, even US-ASCII.
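For instance (a quick Python sketch; `html.unescape` performs the reverse mapping):

```python
import html

# U+092E DEVANAGARI LETTER MA as decimal and hex character references:
# both forms are pure ASCII, so they survive any page encoding.
ma = "\u092e"
print("&#%d;" % ord(ma))    # decimal reference
print("&#x%X;" % ord(ma))   # hex reference

# Round trip: the references resolve back to the same character.
assert html.unescape("&#2350;") == ma
assert html.unescape("&#x92E;") == ma
```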
--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/




Re: Unicode Normalisation Optimisation Experiments

2003-09-25 Thread Markus Scherer
Peter Kirk wrote:
On 25/09/2003 14:25, Markus Scherer wrote:
In other words, yes, Unicode's NFC does perform "discontiguous 
composition". Some things might be easier if only contiguous 
composition were used, but the current definition does give you the 
shortest strings.
And this current definition cannot be changed because of the stability 
policy, right?
Right.

markus




Internal Representation of Unicode

2003-09-25 Thread myrkraverk
Hi,

In a plain text environment, there is often a need to encode more than
just the plain character.  A console, or terminal emulator, is such an
environment.  Therefore I propose the following as a technical report
for internal encoding of Unicode characters, with one goal in mind:
character equivalence is binary equivalence.

Since I'm using 64 bits, I call it Excessive Memory Usage Encoding, or
EMUE.

I thought of dividing the 64-bit code space into 32 variably wide
planes: one for control characters, one for Latin characters, one for
Han characters, and so on; using 5 bits, with the next 3 fixed to zero
(for future expansion and alignment to an octet).

I call plane 0 control characters and won't discuss it further.

Plane 1 I had intended for Latin characters, with the following
encoding method in mind:

bits  63..59  58..56  55..40  39..32  31..24  23..16  15..8  7..0
     +-------+-------+-------+-------+-------+-------+------+------+
     | plane | zero  | attr  | res   | uacc  | lacc  | res  | char |
     +-------+-------+-------+-------+-------+-------+------+------+

* Plane  Plane                 (5 bits)
* Zero   Zero bits             (3 bits)
* Attr   Attributes           (16 bits)
* Res    Reserved              (8 bits)
* Uacc   Upper Accent          (8 bits)
* Lacc   Lower Accent          (8 bits)
* Res    Reserved              (8 bits)
* Char   Character             (8 bits)

All of these fields are actually implementation defined, with just one
rule for char: don't include characters that can be made with
combinations, that's what the accent fields are for.  This allows for
255 upper and lower accents which should be enough -- for now.
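For what it's worth, the layout above packs mechanically; this Python sketch (field positions taken from the table, helper names my own invention) shows one way to do it:

```python
def pack_emue_latin(char, uacc=0, lacc=0, attr=0, plane=1):
    """Pack one 64-bit EMUE code unit for the proposed Latin plane.

    Bit positions follow the table above: plane in 63..59, attr in
    55..40, uacc in 31..24, lacc in 23..16, char in 7..0; the zero
    and reserved fields stay 0.
    """
    assert 0 <= plane < 32 and 0 <= attr < 1 << 16
    assert all(0 <= f < 256 for f in (uacc, lacc, char))
    return (plane << 59) | (attr << 40) | (uacc << 24) | (lacc << 16) | char

def unpack_emue_latin(unit):
    # Invert the packing; reserved fields are ignored.
    return {
        "plane": unit >> 59,
        "attr": (unit >> 40) & 0xFFFF,
        "uacc": (unit >> 24) & 0xFF,
        "lacc": (unit >> 16) & 0xFF,
        "char": unit & 0xFF,
    }

# 'e' with one upper accent, no attributes:
print(hex(pack_emue_latin(ord("e"), uacc=1)))
```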

For Han characters I thought of the following encoding method (with no
particular plane in mind):

bits  63..59  58..56  55..40  39..32  31..0
     +-------+-------+-------+-------+------+
     | plane | zero  | attr  | style | char |
     +-------+-------+-------+-------+------+

* Plane  Plane                 (5 bits)
* Zero   Zero bits             (3 bits)
* Attr   Attributes           (16 bits)
* Style  Stylistic Variation   (8 bits)
* Char   Character            (32 bits)

Again, all fields are implementation defined.  Telling something like
a terminal emulator which stylistic variation to use is outside the
scope of this email, but for attributes there are standardized escape
sequences; I suspect language tags can also be used.

I was also thinking of a plane for punctuation and symbolic characters.

I will be pleased if anyone can come up with better encoding methods
than I did, and I call upon other people to come up with encodings for
scripts I know nothing about, such as Arabic and others.  Then let's
wrap it up in a technical report and be done with it ;)


Any comments?

Johann

-- 
Sometimes I do not think at all!  Does that mean I don't exist
in the mean time?




RE: Web Form: Other Question: Unicode characters in Form in MSAccess

2003-09-25 Thread Rick Cameron
Crystal Reports 9.0 does support Unicode, right from the database through to
printing or exporting.

When you say SQL2000, do you mean Microsoft SQL Server 2000?

What kind of database driver are you using to access the data - e.g. ODBC?
Ole DB? Via Access?

What is the data type of the column in the database?

Was the report created in CR 9.0, or in an earlier version of CR?

Cheers

- rick cameron, crystal decisions

-Original Message-
From: Magda Danish (Unicode) [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 25 September 2003 15:29
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: FW: Web Form: Other Question: Unicode characters in Form in
MSAccess 


Hi,

I am forwarding your question to the Unicode list for possible answer from
one of the list subscribers.

Regards,

Magda Danish
Administrative Director
The Unicode Consortium
650-693-3921
 

> -Original Message-
> Date/Time:Tue Sep 23 04:06:15 EDT 2003
> Contact:  [EMAIL PROTECTED]
> Report Type:  Other Question, Problem, or Feedback
> 
> Greetings,
> I have a problem using Unicode outside the web.
> While in IE 5.0 or higher, when I choose UTF-8 encoding I can
> see Unicode data easily. These data are stored in a SQL2000
> database. But when I want to print them in a Form in MSAccess
> or Crystal Reports 9.0 (i.e. outside IE 5.0) the characters are
> unreadable.
> What shall I do to overcome this problem?
> Best Regards
> M.Janbeglou
> 
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> (End of Report)
> 
> 
> 



Re: Unicode Normalisation Optimisation Experiments

2003-09-25 Thread Peter Kirk
On 25/09/2003 14:25, Markus Scherer wrote:

Peter Kirk wrote:

On 25/09/2003 12:27, [EMAIL PROTECTED] wrote:

It's not a reordering per se, as the first combining character is 
given the first "opportunity" to combine.
 
Thanks for the clarification.


In other words, yes, Unicode's NFC does perform "discontiguous 
composition". Some things might be easier if only contiguous 
composition were used, but the current definition does give you the 
shortest strings.
And this current definition cannot be changed because of the stability 
policy, right?

See also http://www.unicode.org/notes/tn5/#FCC (not a normative 
Unicode document).

markus


Thanks.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




FW: Web Form: Other Question: Unicode characters in Form in MSAccess

2003-09-25 Thread Magda Danish (Unicode)
Hi,

I am forwarding your question to the Unicode list for possible answer
from one of the list subscribers.

Regards,

Magda Danish
Administrative Director
The Unicode Consortium
650-693-3921
 

> -Original Message-
> Date/Time:Tue Sep 23 04:06:15 EDT 2003
> Contact:  [EMAIL PROTECTED]
> Report Type:  Other Question, Problem, or Feedback
> 
> Greetings,
> I have a problem using Unicode outside the web.
> While in IE 5.0 or higher, when I choose UTF-8 encoding I can
> see Unicode data easily. These data are stored in a SQL2000
> database. But when I want to print them in a Form in MSAccess
> or Crystal Reports 9.0 (i.e. outside IE 5.0) the characters are
> unreadable.
> What shall I do to overcome this problem?
> Best Regards
> M.Janbeglou
> 
> -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> (End of Report)
> 
> 
> 



Re: About that alphabetician...

2003-09-25 Thread Deborah Goldsmith
I already wrote this up internally as a bug.

Thanks,
Deborah
On 2003/09/25, at 14:05, Tom Gewecke wrote:

About the c-cedilla, it appears that OS X Safari does not pick up
the charset on this page.  If the default is set to UTF-8, the c
disappears altogether.  The correct character is displayed only if
the browser is set by default or manually to Latin 1.






c-cedilla problem at NYT

2003-09-25 Thread Tom Gewecke
PS The reason the Latin 1 charset is not picked up by a browser would
appear to be bad HTML.  The page has



instead of






RE: About that alphabetician...

2003-09-25 Thread Tom Gewecke
About the c-cedilla, it appears that OS X Safari does not pick up the
charset on this page.  If the default is set to UTF-8, the c disappears
altogether.  The correct character is displayed only if the browser is
set by default or manually to Latin 1.




RE: AddDefaultCharset considered harmful (was: Mojibake on my Web pages)

2003-09-25 Thread Paul Deuter
Here is a link which describes how some hackers use 
%XX and %u url encoding to mask a malicious request
or to get around an IDS product.

http://www.cgisecurity.com/contrib/hd_spring_2002.pdf

-Paul

-Original Message-
From: Martin Duerst [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 25, 2003 1:32 PM
To: Doug Ewell; Unicode Mailing List
Subject: AddDefaultCharset considered harmful (was: Mojibake on my Web
pages)


Hello Doug, others,

Here is my most probable explanation:
Adelphia recently upgraded to Apache 2.0. The core config file (httpd.conf)
as distributed contains an entry
 AddDefaultCharset iso-8859-1
which does what you have described. They probably adopted this
because the comment in the config file suggests that it's important.

I have just filed a bug with bugzilla, asking that this default
setting be removed or commented out, and the comment fixed, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421. You may
want to vote for that bug.

I have also commented on a related bug that I found, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513.

I suggest you tell your Internet provider:
1) that they change to AddDefaultCharset Off
(or simply comment this out)
2) that they make sure you get FileInfo permission in your directories,
so that you can do the settings you know are correct.

The comment in the config file contains mostly very strange statements:

 
#
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.
#
AddDefaultCharset ISO-8859-1
 >>>

If anybody knows something about these security issues, please
tell me (any mention of security issues usually has webmasters
in control, for good reasons).


Regards,   Martin.



At 22:40 03/09/22 -0700, Doug Ewell wrote:
>Apologies in advance to anyone who visits my Web site and sees garbage
>characters, a.k.a. "mojibake."  It isn't my fault.
>
>Adelphia is currently having a character-set problem with their HTTP
>servers.  Apparently they are serving all pages as ISO 8859-1 even if
>they are marked as being encoded in another character set, such as
>UTF-8.

>If you manually change the encoding in your browser to UTF-8, or
>download the page and display it as a local file, everything looks fine
>because Adelphia's server is no longer calling the shot.  Their tech
>support people acknowledge that the problem is at their end and said
>they would look into it.
>
>I understand that having the "Unicode Encoded" logo on my page next to
>these garbage characters may not reflect well on Unicode, especially to
>newbies.  I'm considering putting a disclaimer at the top of my pages,
>but I'm waiting to see how quickly they solve the problem.
>
>-Doug Ewell
>  Fullerton, California
>  http://users.adelphia.net/~dewell/





Re: Unicode Normalisation Optimisation Experiments

2003-09-25 Thread Markus Scherer
Peter Kirk wrote:
On 25/09/2003 12:27, [EMAIL PROTECTED] wrote:
It's not a reordering per se, as the first combining character is 
given the first "opportunity" to combine.
 
Thanks for the clarification.
In other words, yes, Unicode's NFC does perform "discontiguous composition". Some things might be 
easier if only contiguous composition were used, but the current definition does give you the 
shortest strings.

See also http://www.unicode.org/notes/tn5/#FCC (not a normative Unicode document).

markus




AddDefaultCharset considered harmful (was: Mojibake on my Web pages)

2003-09-25 Thread Martin Duerst
Hello Doug, others,

Here is my most probable explanation:
Adelphia recently upgraded to Apache 2.0. The core config file (httpd.conf)
as distributed contains an entry
AddDefaultCharset iso-8859-1
which does what you have described. They probably adopted this
because the comment in the config file suggests that it's important.
I have just filed a bug with bugzilla, asking that this default
setting be removed or commented out, and the comment fixed, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421. You may
want to vote for that bug.
I have also commented on a related bug that I found, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513.
I suggest you tell your Internet provider:
1) that they change to AddDefaultCharset Off
   (or simply comment this out)
2) that they make sure you get FileInfo permission in your directories,
   so that you can do the settings you know are correct.
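In httpd.conf terms, those two suggestions look roughly like this (a sketch; the directory path is a placeholder, not the provider's real layout):

```apache
# 1) Stop stamping ISO-8859-1 on every response:
AddDefaultCharset Off

# 2) Let users override charset-related settings from their own .htaccess:
<Directory "/home/*/public_html">
    AllowOverride FileInfo
</Directory>
```

With FileInfo allowed, a user can then put e.g. `AddDefaultCharset UTF-8` in a per-directory .htaccess.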
The comment in the config file contains mostly very strange statements:


#
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.
#
AddDefaultCharset ISO-8859-1
>>>
If anybody knows something about these security issues, please
tell me (any mention of security issues usually has webmasters
in control, for good reasons).
Regards,   Martin.



At 22:40 03/09/22 -0700, Doug Ewell wrote:
Apologies in advance to anyone who visits my Web site and sees garbage
characters, a.k.a. "mojibake."  It isn't my fault.
Adelphia is currently having a character-set problem with their HTTP
servers.  Apparently they are serving all pages as ISO 8859-1 even if
they are marked as being encoded in another character set, such as
UTF-8.

If you manually change the encoding in your browser to UTF-8, or
download the page and display it as a local file, everything looks fine
because Adelphia's server is no longer calling the shot.  Their tech
support people acknowledge that the problem is at their end and said
they would look into it.
I understand that having the "Unicode Encoded" logo on my page next to
these garbage characters may not reflect well on Unicode, especially to
newbies.  I'm considering putting a disclaimer at the top of my pages,
but I'm waiting to see how quickly they solve the problem.
-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




RE: a little more help understanding diacritical encoding

2003-09-25 Thread Paul Deuter
It would appear that your server side is in Java.
There is a well known issue in older versions of the
Java servlet spec that cause the request class to 
assume that %HH encoded octets are 8859-1 octets.
It seems that this is your problem.
The workaround is to get the parameters from the
request object and turn them back into bytes and
then re-interpret them as UTF-8 (because that is
what they are).

The code to do that looks like this:

String strFoo = new String(request.getParameter("whatever").getBytes("8859_1"), "UTF-8");
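The mechanism is easy to reproduce outside a servlet container (a Python sketch of the same bytes-level round trip; %C3%89 is the 'É' from the thread below):

```python
from urllib.parse import unquote

# The browser sends UTF-8 percent-encoded bytes; an old servlet
# container percent-decodes them as Latin-1, producing mojibake.
mangled = unquote("%C3%89", encoding="latin-1")   # what getParameter() yields
print(repr(mangled))

# The workaround, transliterated: re-encode as Latin-1 to recover the
# original bytes, then decode them as the UTF-8 they really are.
fixed = mangled.encode("latin-1").decode("utf-8")
print(fixed)   # É
```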

-Paul



-Original Message-
From: Steve Pruitt [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 25, 2003 9:03 AM
To: [EMAIL PROTECTED]
Subject: a little more help understanding diacritical encoding


Thanks for the excellent responses.  I now understand how C3 and 89 are derived.  I 
tried getting everything set the way I interpreted what the list responses said to do. 
 The scenario is:
I have a page with some diacritical characters displayed and a input text box and a 
submit button.  I copy and paste one of the displayed characters into the input box and 
then submit.  What is submitted gets echoed back.  The pages use style sheets so I cut 
and pasted the relevant tags, etc.

I thought I found the problem.  My response had a character encoding of null.  I read 
null defaults to 8859-1 which seemed consistent with my echoed page.  So, I explicitly 
set the response character encoding to UTF-8 via the setContentType method.

I used a TCP tunneler to see what my request and responses look like.  My browser is 
set to utf-8 also.

From the tunneler my request had the following posted data:  v904=%C3%89   this is
correct according to how the UTF-8 encoding algorithm was explained.

The http response had the following:

Content-Type: text/html; charset=UTF-8   this is correct.

  is a child in the 
 tag

É ê ë í î ï ð ñ ó 
ô õ ö  these are the listed characters on the previous page I 
cut and pasted from; they are listed on this page just for reference - (#201 = C9) is É.

Accented Characters from  previous 
form:  Ã‰ 
this is echoed back.  #195 = C3 and #137 = 89.  These, of course, are displayed as Ã?.

I checked the browser to be sure and its encoding is still set to utf-8 and it is.  
This is everything I know to check.  What am I missing?




Re: About that alphabetician...

2003-09-25 Thread Michael Everson
At 13:19 -0700 2003-09-25, James Caldwell wrote:

Congratulations!  You have given Unicode a tremendous boost with 
this interview, published in the New York Times!

I am sure it will bring many positive results for our work and for 
your career.
Thank you very much. Please give generously to the Script Encoding 
Initiative http://www.unicode.org/sei if you can.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: About that alphabetician...

2003-09-25 Thread Brian Doyle
Thanks for the tip. There must be something wrong with my machine. If
anyone has any suggestions for how to troubleshoot this, please email me
privately.

On 9/25/03 1:54 PM, "John Burger" <[EMAIL PROTECTED]> wrote:

> Brian Doyle wrote:
> 
>> The observation that I, the "Irish (American) colleague," made to
>> Michael
>> was that there is a sentence in the NYT article displayed in my
>> browser that
>> dropped the 00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., François).
> 
> The c-cedilla is really there, I see it in three browsers on my Mac
> (Camino, Safari, and IE).
> 
> - John Burger
>  MITRE
> 




Re: Michael Everson in the news

2003-09-25 Thread John Cowan
Eric Muller scripsit:
> See also , 
> which is apparently about SEI.

Interestingly, both this and the NYT article are written by the same
person: Michael Erard.

-- 
John Cowan  [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com
Thor Heyerdahl recounts his attempt to prove Rudyard Kipling's theory
that the mongoose first came to India on a raft from Polynesia.
--blurb for _Rikki-Kon-Tiki-Tavi_



RE: About that alphabetician...

2003-09-25 Thread Asmus Freytag
At 05:41 PM 9/25/03 +0100, Richard Ishida wrote:
Aha.  Maybe, next time I try to explain it on the plane, I'll say
something like:
"Unicode is a standard for enabling your computer to represent all the
letters of all the alphabets of the world."
Still not terribly accurate and deliberately vague (and could refer in
their mind to characters and/or fonts), but then the average layman
probably wouldn't know or need to know it was inaccurate or vague.
I usually like to say that

"Unicode is simply a list, you know, like a catalog, where you can find 
all the letters of all the alphabets of the world."

That allows me to segue to the tasks that people perform.

"If all the computers in the world use the same list, you can type in any 
language anywhere and people on the opposite end of the earth can read it."

Why is this good?

"If everybody uses their own list, as used to be the case, very often 
theres a mismatch and instead of text you get garbage, or random letters on 
your screen."

For the longer answer (still for newbies) see the first part of my Unicode 
tutorial.

A./



Re: About that alphabetician...

2003-09-25 Thread John Burger
Brian Doyle wrote:

The observation that I, the “Irish (American) colleague,” made to 
Michael
was that there is a sentence in the NYT article displayed in my 
browser that
dropped the 00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., François).
The c-cedilla is really there, I see it in three browsers on my Mac 
(Camino, Safari, and IE).

- John Burger
  MITRE



Re: About that alphabetician...

2003-09-25 Thread Michael Everson
At 12:49 -0500 2003-09-25, Brian Doyle wrote:

The observation that I, the "Irish (American) colleague," made to Michael
was that there is a sentence in the NYT article displayed in my browser that
dropped the 00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., François).
There's nothing in the paragraph in question to indicate that there is a
missing character--nor is there a numeric code displayed for a savvy user to
look up.
I see the ç when I view the page and I'm using Safari as you are.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: About that alphabetician...

2003-09-25 Thread Marco Cimarosti
Michael Everson wrote:
> At 08:33 -0700 2003-09-25, John Hudson wrote:
> 
> >Unicode is an encoding standard for text on computers that allows 
> >documents in any script and language to be entered, stored, edited 
> >and exchanged.
> 
> >>blank stare from layman<<

Unicode is a code in which every letter of every alphabet in the world
corresponds to a number. This numeric code is used to write text inside
computers, because only can be written numbers inside computers. When the
computer shows on the screen the text which it has inside, it draws the
letters corresponding to the Unicode numbers which it has inside.

My 4-year-old listened to this explanation and said everything was clear.

The only problem is that he now wants to disassemble my computer to see the
numbers it has inside. He thinks that the numbers are stored in the form of
talking ladybugs which would say the number out loud when you tap on them (he
gained this idea from one of his favorite books: "Learn the Numbers with the
Talking Ladybugs").

_ Marco



Re: About that alphabetician...

2003-09-25 Thread Brian Doyle
Eric,

Forgive my density. I'm not sure that I understand. Are you arguing that an
ASCII encoding scheme (ISO-8859-1) is not a limitation because,
semantically, all of the characters (a, b, c, etc.) also exist in the
Unicode scheme?

It makes sense to me that ASCII is not a limitation for those documents that
are limited to that character set. But, your own message, ³which contains
U+10DB ? GEORGIAN LETTER MAN and U+092E Ã DEVANAGARI LETTER MA² triggers an
error message in my own email client (Entourage X), namely:

"Some text in this message is in a language that your computer cannot
display."

I'm not certain if I'm seeing this because I don't possess a font to display
those characters or some other reason. I suspect that this is the reason
because, when I try to look up those characters in OS X's Character
Palette, the Georgian and Devanagari Unicode blocks show up blank.

The observation that I, the "Irish (American) colleague," made to Michael
was that there is a sentence in the NYT article displayed in my browser that
dropped the 00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., François).

There's nothing in the paragraph in question to indicate that there is a
missing character--nor is there a numeric code displayed for a savvy user to
look up.

Surely in this context, we would agree that the semantic content was
distorted, yes?

Sincerely,
Brian Doyle
Unicode newbie


On 9/25/03 11:54 AM, "Eric Muller" <[EMAIL PROTECTED]> wrote:

> 
> 
> Michael Everson wrote:
>> An Irish colleague here said he liked the article but noted that the Times'
>> web directors don't use Unicode
>> 
>> 
>>> ...  
>>> 
>>> ...  
>>> 
>>> 
> There is an alternative point of view, which says that charset declared in an
> HTML (or XML) document is no more than an encoding scheme, and that all
> characters in those documents are fundamentally Unicode characters (i.e. they
> start in life with the full semantic of Unicode, they don't inherit it on the
> occasion of character set conversion). That view is supported by the XML spec
> itself, and by the infoset definition. And because we have numeric character
> entities, using an iso-8859-1 encoding scheme is not really a limitation:
> witness this message, which contains U+10DB ? GEORGIAN LETTER MAN and U+092E Ã
> DEVANAGARI LETTER MA.
> 
> Eric.
> 
> 





Re: Questions on Myanmar encoding

2003-09-25 Thread Maung TunTunLwin
Hello Mr. Eric Muller,

> It is in Unicode 4.0, section 10.3, page 273, and you can see it at:
> 

Thanks.

> > 1021 1013 1031 101B 102D 1000 1014 1039 200C 1012 1031 102C 1039 200C 101C
> > 102C 0020 1042 1048 0020 1042 002C 1040 1040 1040 0020 1000 1030 100A 102E
> > => "US$28 2,00 ...?" I think help? 1000 1030 100A 102E 1015 102C
> >
> > Just one character wrong: 1031 on third place should be 1012.
> >
> my original: 1021 1013 1031...
> your correction: 1021 1013 1012 ...
>
> I am a bit confused, and looking more carefully, my new guess is: 1021
> 1019 1031... Apparently, that makes the first word sound like "american".

Sorry, my mistake. It should be second place: 1013 -> 1012. You may be right
with your sample, but currently $ is used with 1012.


> > 1010 102D 101B 1005 1039 1006 102C 1014 1039 200C 1025 101A 1039 101A 102C
> > 1025 1039 200C 1018 102F 102D 1037 0020 2018 1015 1004 1039 200C 1012 102C
> > 1014 102E 2019 0020 101B 1031 102C 1000 1039 200C => " 'PandaNi' for zoo..."

> I think I understand. Also, I corrected 1018, which should be 101E.

1018 102F 102D 1037 (for), 101E 102F 102D 1037 (to). Both are usable.

> Just to be clear, I am not proposing any modification to the encoding
> model. At best, I can think of clarifications that could help people
> like me, who have limited knowledge of the script.

I am also not trying to change the standard. I am currently trying to figure
out the current encoding's limitations and looking for ways to extend it.

> In another place in your message, you mention that the current model is
> not optimal for sorting. I am not a specialist of sorting, but this is
> not an entirely unusual situation. It is in general not possible to make
> the encoding model such that it is optimal for all processings
> (rendering, sorting, etc.) You may want to check carefully the UCA, to
> see if and how it can handle proper sorting.

Yes, I know, and thanks for your advice.
I'm finally accepting that the encoding model is not optimal for rendering and
sorting. But there are still two things I am afraid of:
One: the encoding model must have the ability to do quick word cutting for
sorting, wrapping, and searching.
- Currently I see a possibility with wrapping in Graphite.
Two: the encoding model must be usable with current rendering systems or it
will be a paper tiger (three years!).
- I see it can work with Graphite with an intelligent input method. But what
about other systems? An OpenType font doesn't handle the line wrapping that
Uniscribe did. But what about Vowel Sign E (1031) handling, moving it front
and back?

Sorry, I put too much feeling into this.

Maung TunTunLwin
[EMAIL PROTECTED]




RE: About that alphabetician...

2003-09-25 Thread Michael Everson
At 08:33 -0700 2003-09-25, John Hudson wrote:

Unicode is an encoding standard for text on computers that allows 
documents in any script and language to be entered, stored, edited 
and exchanged.

>>blank stare from layman<<

I think it is best to relate the description to what the layman 
does: he types things, and he edits them and he sends them to other 
laymen. The 'big font' thing is a really bad idea because it is 
completely inaccurate: that's not informing the layman in terms he 
understands, that's misleading him.
Only if you don't follow it up with a second sentence.

I also think it is a good idea to include the word 'encoding', 
because if the rest of one's description is simple it can be a 
useful way to plant new terminology in someone's head.
Honestly it depends what kind of layman you are talking to. Many's 
the time I was beavering away on some proposal or other down the pub, 
and have been accosted with a "what are you doing?"
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: a little more help understanding diacritical encoding

2003-09-25 Thread jon
This is likely an issue with whatever you are using to read and echo back the 
characters. If you just push the exact same bytes back then you will be okay, but 
anything more clever gives you an opportunity to go wrong - especially if you are 
using an API that thinks it knows better than you do.

What are you using to write and run this code?







Re: About that alphabetician...

2003-09-25 Thread Rick McGowan
Michael wrote:

> I was asked how I describe it briefly to laymen. And I usually say
> "Unicode is like a big, giant font that is supposed to contain all
> the letters of all the alphabets of all the languages in the world."

Now, why do you suppose he removed *that* "like" and, like, left in all  
the others!?

Rick




Re: About that alphabetician...

2003-09-25 Thread Eric Muller






Michael Everson wrote:
An Irish colleague
here said he liked the article but noted that the Times' web directors
don't use Unicode

There is an alternative point of view, which says that charset declared
in an HTML (or XML) document is no more than an encoding scheme, and
that all characters in those documents are fundamentally Unicode
characters (i.e. they start in life with the full semantic of Unicode,
they don't inherit it on the occasion of character set conversion).
That view is supported by the XML spec itself, and by the infoset
definition. And because we have numeric character entities, using an
iso-8859-1 encoding scheme is not really a limitation: witness this
message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म
DEVANAGARI LETTER MA.
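A quick sketch of that trick in Python (my own illustration, not from the XML spec): a codec can emit numeric character references automatically for anything that doesn't fit the target encoding, so an iso-8859-1 page loses nothing:

```python
# Sketch: serialising text for an iso-8859-1 page. Characters outside
# Latin-1 are written as numeric character references (&#NNNN;), so
# the full Unicode repertoire survives the narrow encoding scheme.
def to_latin1_html(text: str) -> bytes:
    # 'xmlcharrefreplace' substitutes &#NNNN; for unencodable characters
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(to_latin1_html("U+10DB is \u10db and U+092E is \u092e"))
```

The references decode back to the same Unicode characters on the receiving side, which is the point of the "fundamentally Unicode characters" view.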

Eric.







a little more help understanding diacritical encoding

2003-09-25 Thread Steve Pruitt
Thanks for the excellent responses.  I now understand how C3 and 89 are derived.  I 
tried getting everything set the way I interpreted what the list responses said to do. 
 The scenario is:
I have a page with some diacritical characters displayed, an input text box and a 
submit button.  I copy and paste one of the displayed characters into the input box and 
then submit.  What is submitted gets echoed back.  The pages use style sheets so I cut 
and pasted the relevant tags, etc.

I thought I found the problem.  My response had a character encoding of null.  I read 
null defaults to 8859-1 which seemed consistent with my echoed page.  So, I explicitly 
set the response character encoding to UTF-8 via the setContentType method.

I used a TCP tunneler to see what my request and responses look like.  My browser is 
set to utf-8 also.

From the tunneler my request had the following posted data:  v904=%C3%89 - this is 
correct according to how the utf encoding algo was explained.

The http response had the following:

Content-Type: text/html; charset=UTF-8   this is correct.

  is a child in the 
 tag

É ê ë í î ï ð ñ ó 
ô õ ö - these are the characters listed on the previous page I 
cut and paste from; they are listed on this page just for reference - (&#201; = 0xC9) is É.

Accented Characters from previous 
form:  Ã‰ - 
this is echoed back.  &#195; = 0xC3 and &#137; = 0x89.  These, of course, are displayed as Ã?.

I checked the browser to be sure, and its encoding is still set to utf-8.  
This is everything I know to check.  What am I missing?



RE: About that alphabetician...

2003-09-25 Thread John Hudson
At 07:11 AM 9/25/2003, Hart, Edwin F. wrote:

I like to say, "Unicode and ISO/IEC 10646 describe a single standard for
representing the world's characters in computers as a series of numbers
(zeros and ones)."
Unicode is an encoding standard for text on computers that allows documents 
in any script and language to be entered, stored, edited and exchanged.

I think it is best to relate the description to what the layman does: he 
types things, and he edits them and he sends them to other laymen. The 'big 
font' thing is a really bad idea because it is completely inaccurate: 
that's not informing the layman in terms he understands, that's misleading 
him. I also think it is a good idea to include the word 'encoding', because 
if the rest of one's description is simple it can be a useful way to plant 
new terminology in someone's head.

I have not seen the article yet -- too little time with ATypI kicking off 
this evening --, but I'm sure Michael did a grand job otherwise.

John Hudson

Tiro Typeworks  www.tiro.com
Vancouver, BC   [EMAIL PROTECTED]
You need a good operator to make type. If it were a
DIY affair the caster would only run for about five
minutes before the DIYer burned his butt off.
  - Jim Rimmer



RE: About that alphabetician...

2003-09-25 Thread Michael Everson
An Irish colleague here said he liked the article 
but noted that the Times' web directors don't use 
Unicode

Is maith liom an t-alt ach tá díomá orm feiceáil nach bhfuil Unicode in
úsáid ag stiúrthóirí gréasáin de chuid NYT. [I like the article, but I am
disappointed to see that Unicode is not in use by the NYT's web directors.]
Seo cód an leathanaigh: [Here is the page's code:]




...

...
For the World's A B C's, He Makes 1's and 0's

--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: About that alphabetician...

2003-09-25 Thread Michael Everson
At 10:11 -0400 2003-09-25, Hart, Edwin F. wrote:

It is always a challenge to describe technology in terms that the lay person
can understand.
I like to say, "Unicode and ISO/IEC 10646 describe a single standard for
representing the world's characters in computers as a series of numbers
(zeros and ones)."
Indeed. But the layman knows what a font is, and an alphabet. 
"Characters" has to come in the sentence after. ;-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: About that alphabetician...

2003-09-25 Thread Michael Everson
And on this very day, my copy of Unicode 4.0 has arrived. :-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: W3C Objects To Royalties On ISO Country Codes

2003-09-25 Thread Eric Muller
See also .

Eric.





Re: Michael Everson in the news

2003-09-25 Thread Eric Muller
See also , 
which is apparently about SEI.

Eric.





Re: Unicode Normalisaton Optimisation Experiments

2003-09-25 Thread Peter Kirk
On 25/09/2003 12:27, [EMAIL PROTECTED] wrote:

Is this actually correct? For example, if I have in my data the string 
<U+0041, U+0328, U+05B0> (which I know is garbage, but that is irrelevant), that
will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
higher combining class (202) than U+05B0 (10). What does this become in 
NFC? Is the reordering reversed and the combination reapplied?
   

First an attempt is made to compose U+0041 and U+05B0. There is no character allowing for this, so that attempt will fail. Then an attempt is made to compose U+0041 and U+0328, which will produce U+0104. U+0041 is replaced with U+0104 and U+0328 is removed, resulting in <U+0104, U+05B0>.

It's not a reordering per se, as the first combining character is given the first "opportunity" to combine.
 

Thanks for the clarification.

 

This is not only a theoretical issue as the same applies to some real 
combinations. There was discussion only last week on the bidi list of a 
form which might be encoded <U+064A, U+0654, U+0652> but which would be
messed up if composed into <U+0626, U+0652>.
   

Yes, NFC would perform that composition. Are you sure it would be an issue? 
Applying bidi rules doesn't seem to make this an issue.

bidi: Al, NSM, NSM
applying rule W1 from UAX #9:
Al, NSM, NSM -> Al, Al, NSM -> Al, Al, Al.

bidi: Al, NSM
applying rule W1:
Al, NSM -> Al, Al
Or is the issue with something else, but it came up on the bidi list?

 

The problem isn't with the bidi rules but with more general Arabic 
shaping etc. There are two issues, one the position of the hamza (in 
this case it should be to the left of the sukun) and the other that the 
medial form of U+064A has dots below, which are required in this 
combination, but the medial form of U+0626 does not. But I think we 
concluded that U+0654 alone is not suitable for encoding this particular 
hamza.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Michael Everson in the news

2003-09-25 Thread Martin Heijdra
In today's New York Times (Circuits section) there is an article (with
pictures) on Michael Everson and Unicode. Rick McGowan and Deborah Anderson
make guest appearances.

On the net:
http://www.nytimes.com/2003/09/25/technology/circuits/25code.html

Martin J. Heijdra
Chinese Studies/East Asian Studies Bibliographer
East Asian Library and the Gest Collection
Frist Campus Center, Room 314
Princeton University
Princeton, NJ 08544
United States




Re: About that alphabetician...

2003-09-25 Thread Michael Everson
One complaint:

Very interesting. I didn't realize Unicode was a "large font", 
though... I thought it was a character encoding system, distinct 
from fonts, due to the character/glyph model :)
Another complaint:

Some purist will try to kill you for calling Unicode "a big, giant font ..."
I was asked how I describe it briefly to laymen. And I usually say 
"Unicode is like a big, giant font that is supposed to contain all 
the letters of all the alphabets of all the languages in the world."
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-25 Thread jon
> >>Suppose you made a document and sent it to me via conventional post.
> >>
> >>The last agent handling the document would be the mail carrier.
> >>Does the mail carrier have the right to open the mailing and
> >>replace your document with garbage?
> >>
> >>
> >>
> >>No, however if I receive a letter in the post written in German I'm going
> to ask someone to translate it rather than try to cope with a language (c.f.
> encoding) I don't understand.
> >>  
> >>
> Yes, if that's what you ask for. But as I know some German I may prefer 
> to do my own translation. And if the recipient is a German who knows no 
> English, they certainly aren't going to be amused if their letters get 
> translated whether they want them to be or not. So the mail carrier 
> should do this only if specifically asked to do so.
> 

Indeed. Remember the problem here isn't a server performing translation, 
transliteration or re-encoding - but rather a server misidentifying an encoding (hence 
my analogy of the translator having a nervous break-down, that and the fact that the 
image struck me as funny).

However to enable a correctly functioning server to perform such re-encoding *when 
asked to do so* we have to have the rule that HTTP-headers over-ride embedded 
self-description for text-based formats. This causes problems in cases like those 
described, but not when the webserver has a rough idea of what the hell it is doing.

One could argue against the rule of headers having precedence on the basis that it is 
brittle, but it is no more brittle than trusting copy-and-pasted <meta> elements, which 
are also likely to be wrong (trust me, I've seen enough that my anecdotal experience is 
approaching statistical validity).

But one day it will all be Unicode... one day...







Re: Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-25 Thread Peter Kirk
On 25/09/2003 10:51, [EMAIL PROTECTED] wrote:

Suppose you made a document and sent it to me via conventional post.

The last agent handling the document would be the mail carrier.
Does the mail carrier have the right to open the mailing and
replace your document with garbage?
   

No, however if I receive a letter in the post written in German I'm going to ask someone to translate it rather than try to cope with a language (c.f. encoding) I don't understand.
 

Yes, if that's what you ask for. But as I know some German I may prefer 
to do my own translation. And if the recipient is a German who knows no 
English, they certainly aren't going to be amused if their letters get 
translated whether they want them to be or not. So the mail carrier 
should do this only if specifically asked to do so.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Unicode Normalisaton Optimisation Experiments

2003-09-25 Thread jon
> Is this actually correct? For example, if I have in my data the string 
> <U+0041, U+0328, U+05B0> (which I know is garbage, but that is irrelevant), that
> will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
> higher combining class (202) than U+05B0 (10). What does this become in 
> NFC? Is the reordering reversed and the combination reapplied?

First an attempt is made to compose U+0041 and U+05B0. There is no character allowing 
for this, so that attempt will fail. Then an attempt is made to compose U+0041 and 
U+0328 which will produce U+0104. U+0041 is replaced with U+0104 and U+0328 is removed 
resulting in <U+0104, U+05B0>.

It's not a reordering per se, as the first combining character is given the first 
"opportunity" to combine.
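That behaviour can be checked directly with Python's unicodedata module (a sketch using the same hypothetical garbage string from the question):

```python
import unicodedata

# A, combining ogonek (combining class 202), Hebrew sheva (class 10)
s = "\u0041\u0328\u05B0"

# NFD reorders the marks by combining class: sheva (10) before ogonek (202)
assert unicodedata.normalize("NFD", s) == "\u0041\u05B0\u0328"

# NFC gives each mark a chance to combine with the starter in turn:
# A + sheva has no composite; A + ogonek composes to U+0104
assert unicodedata.normalize("NFC", s) == "\u0104\u05B0"
```

So composition is not a literal reversal of the reordering; each unblocked mark simply gets its turn at the starter.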

> This is not only a theoretical issue as the same applies to some real 
> combinations. There was discussion only last week on the bidi list of a 
> form which might be encoded <U+064A, U+0654, U+0652> but which would be
> messed up if composed into <U+0626, U+0652>.

Yes, NFC would perform that composition. Are you sure it would be an issue? Applying 
bidi rules doesn't seem to make this an issue.

bidi: Al, NSM, NSM
applying rule W1 from UAX #9:
Al, NSM, NSM -> Al, Al, NSM -> Al, Al, Al.


bidi: Al, NSM
applying rule W1:
Al, NSM -> Al, Al

Or is the issue with something else, but it came up on the bidi list?







Re: need help understanding diacritical encoding

2003-09-25 Thread jon
> I have a form that posts diacritical characters. For example, when my browser
> has the encoding set to utf-8 and the form posts the character É
> the post data has these two bytes, C3 and 89, which when echoed back on a new
> page is displayed as Ã?.  Can someone explain, when the character is converted
> to two bytes, how I get C3 and 89?
> 
> 

UTF-8 is explained in section 3.9 of the Unicode standard and elsewhere (RFC 2279 is a 
heavily-referenced document, note that its description includes the encoding of 
codepoints outside of the Unicode range).

É is U+00C9 and in binary that is:

11001001

UTF-8 encoding results in different numbers of bytes depending on how many bits you 
have when you remove the leading zeros (8 bits in this case - resulting in two bytes).

It then puts those bits from the codepoint into bytes as so:


00000000 0xxxxxxx -> 0xxxxxxx
00000yyy yyxxxxxx -> 110yyyyy 10xxxxxx
zzzzyyyy yyxxxxxx -> 1110zzzz 10yyyyyy 10xxxxxx
000uuuuu zzzzyyyy yyxxxxxx -> 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx

In the case of U+00C9 the second of these is the shortest form possible, so it is 
used. The bits 00011 are placed in 110yyyyy to give you 11000011 (0xC3) and the bits 
001001 are placed in 10xxxxxx to give you 10001001 (0x89).
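The two-byte case can be written out as a few lines of Python (my own illustration, not sample code from the standard):

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    assert 0x80 <= cp <= 0x7FF
    lead = 0xC0 | (cp >> 6)     # 110yyyyy: the top five payload bits
    trail = 0x80 | (cp & 0x3F)  # 10xxxxxx: the bottom six payload bits
    return bytes([lead, trail])

assert utf8_two_byte(0x00C9) == b"\xC3\x89"  # É -> C3 89, as observed
```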

The problem is that this didn't happen when the bytes went back out again - rather the 
bytes were interpreted as being part of a string encoded in some other way (most 
likely ISO 8859-1, which certainly would produce Ã followed by a control character 
from those bytes). It may be that all you need to do is to correctly report the 
encoding, by sending an HTTP header with the mime-type and charset (some server-side APIs 
make this easy, e.g. in ASP you would use Response.Charset = "utf-8"). It may be that 
you need to do further work (depending on just what it is you are doing with the form).







Re: Unicode Normalisaton Optimisation Experiments

2003-09-25 Thread Peter Kirk
On 24/09/2003 14:58, Jon Hanna wrote:

... For example since following the decomposition <U+0104> -> <U+0041, U+0328> there can be no character that is unblocked from the U+0041 that will combine with it, hence there is no circumstance in which they will not be recombined to U+0104 and hence dropping that decomposition from the data will not affect NFC (the relevant data would still have to be in the composition table, as the sequence <U+0041, U+0328> might occur in the source code).

 

Is this actually correct? For example, if I have in my data the string 
<U+0041, U+0328, U+05B0> (which I know is garbage, but that is irrelevant), that 
will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a 
higher combining class (202) than U+05B0 (10). What does this become in 
NFC? Is the reordering reversed and the combination reapplied?

This is not only a theoretical issue as the same applies to some real 
combinations. There was discussion only last week on the bidi list of a 
form which might be encoded <U+064A, U+0654, U+0652> but which would be 
messed up if composed into <U+0626, U+0652>.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Mojibake on my Web pages

2003-09-25 Thread jon
> > Maybe including a BOM would help the browser realise something was
> > awry, but it's just as likely to think the author just wrote an
> > invalid document that began with 

I really have to stop using this web-2-mail app, it managed to mangle my 
representation of a mangled BOM!

> I've been told, hee hee hee, that the one thing I must NEVER NEVER do in
> a Web page is to begin it with a BOM.  But I admit I haven't tried that
> yet.  How funny would that be if it solved the problem?

Well with UTF-16 you really should use a BOM but with UTF-8 there was a bit of a 
debate which finally settled on the opinion that it was OK to do so. However browsers 
are not necessarily going to agree with that opinion. Anyway, it couldn't hurt to try.







Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-25 Thread jon
> Suppose you made a document and sent it to me via conventional post.
> 
> The last agent handling the document would be the mail carrier.
> Does the mail carrier have the right to open the mailing and
> replace your document with garbage?

No, however if I receive a letter in the post written in German I'm going to ask 
someone to translate it rather than try to cope with a language (c.f. encoding) I 
don't understand.

Besides, what is happening here isn't the server replacing the document with garbage, 
it's the server mis-identifying what the document is - analogous either to our 
hypothetical translator having a break-down, insisting that all of our mail was German 
and handing us non sequiturs as "translations", or to the postal service getting the 
delivery wrong (which is something that has certainly happened to my mail).

> 
> An analogy:
> 
> Author = Host
> Document = Wine
> Reader = Guest
> Server = Cup
> 
> If the host pours a cup of wine for the guest, would we allow a
> mere cup to adulterate our wine?

The argument only holds as much as the analogies hold (both the analogy with snail 
mail and the one you actually refer to as an analogy). These analogies do hold in 
certain cases, and the case that started the thread is an example, but it does not 
hold in the general case. In other scenarios better analogies would be:

Author = Scribe
Document = Draft
Reader = em, Reader
Server = Editor.

Or Author = scattered data sources of varying degrees of reliability - Server = 
researcher.

In general, from the browser's perspective the server is the author (which may or may 
not be an accurate view of what goes on "behind the scenes"). Re-encoding, if done 
right, can be very useful in making web documents more widely accessible.

Of course we'll soon be able to just rely on assuming that every step in the process 
can understand UTF-8 and UTF-16 :)







Re: Unicode Normalisaton Optimisation Experiments

2003-09-25 Thread jon
> > Hi,
> > I'm currently experimenting with various trade-offs for Unicode
> normalisation code. Any comments on these (particularly of the "that's
> insane, here's why, stop now!" variety) would be welcome.
> 
> You might want to look at, if not even use, the ICU open-source
> implementation:
> 
> http://oss.software.ibm.com/icu/
> http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/unorm.cpp

I did, but when I started this I was more interested in simply comparing various 
optimisations as a study into the related techniques. However I recently hit a 
practical need for such code for another task, and while it's nice that I've a bunch 
of "work" code already done as "fun" code maybe I should just use ICU...

> > The second is an optimisation of both speed and size, with the
> disadvantage that data cannot be shared between NFC and NFD operations (which
> is perhaps a reasonable trade in the case of web code which might only need NFC
> code to be linked). In this version decompositions of stable codepoints are
> omitted from the decomposition data. For example since following the
> decomposition <U+0104> -> <U+0041, U+0328> there can be no
> character that is unblocked from the U+0041 that will combine with it, hence
> there is no circumstance in which they will not be recombined to U+0104 and
> hence dropping that decomposition from the data will not affect NFC (the
> relevant data would still have to be in the composition table, as the sequence
> <U+0041, U+0328> might occur in the source code).
> 
> Sounds possible and clever. As far as I remember, ICU uses the normalization
> quick check flags 
> (Unicode properties) to determine much of this, and should achieve the same in
> most cases.

The above would supplement use of quick check - indeed it would be a way of 
implementing the concept of "stable codepoints" that the UTR suggests using with quick 
check.
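For what it's worth, the quick-check idea is exposed directly in some libraries; a small sketch using Python's unicodedata.is_normalized (available since Python 3.8, and itself built on the quick-check properties):

```python
import unicodedata

# U+0104 (A with ogonek) is already in NFC; the decomposed
# pair <U+0041, U+0328> is not, since NFC would recompose it
assert unicodedata.is_normalized("NFC", "\u0104")
assert not unicodedata.is_normalized("NFC", "\u0041\u0328")

# ...and the reverse holds for NFD
assert unicodedata.is_normalized("NFD", "\u0041\u0328")
assert not unicodedata.is_normalized("NFD", "\u0104")
```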







Re: need help understanding diacritical encoding

2003-09-25 Thread Stephane Bortzmeyer
On Wed, Sep 24, 2003 at 02:37:20PM -0400,
 Steve Pruitt <[EMAIL PROTECTED]> wrote 
 a message of 8 lines which said:

> I have a form that posts diacritical characters. For example, when
> my browser has the encoding set to utf-8
 ^
 OK

> and the form posts the character É the post data has these two bytes
> C3 and 89,

It seems reasonable, "0xC3 0x89" is UTF-8 for É.

> which when echoed back on a new page is displayed as Ã?.  

Your Web browser cannot properly display UTF-8 (it is probably
configured to display as Latin-1). The exact solution depends on it.

*or*

Your Web server sent back the reply as UTF-8 but tagged it as
Latin-1. Check the HTTP headers to be sure.
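Both failure modes are easy to reproduce; a Python sketch of the second one (UTF-8 bytes served but labelled, or displayed, as Latin-1):

```python
data = "\u00C9".encode("utf-8")   # É as UTF-8: the bytes C3 89
assert data == b"\xC3\x89"

# A browser that believes the page is Latin-1 maps each byte to one
# character: 0xC3 -> Ã, 0x89 -> an (often invisible) C1 control char
garbled = data.decode("latin-1")
assert garbled == "\u00C3\u0089"
```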



Re: Mojibake on my Web pages

2003-09-25 Thread Doug Ewell
 wrote:

> Maybe including a BOM would help the browser realise something was
> awry, but it's just as likely to think the author just wrote an
> invalid document that began with 

I've been told, hee hee hee, that the one thing I must NEVER NEVER do in
a Web page is to begin it with a BOM.  But I admit I haven't tried that
yet.  How funny would that be if it solved the problem?

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/