Re: Unicode on a website

2000-09-24 Thread Doug Ewell

Elaine Keown <[EMAIL PROTECTED]> wrote:

> Is there some automatic procedure that will happen soon, where a new
> UTF-8 will come out that has all the Hebrew symbols from Unicode 2.0
> and 3.0?   Does the increase in size of the Hebrew character set
> interact with UTF-8 in some negative way?

UTF-8 is just a way of expressing Unicode.  Any of the 1,114,112
possible code points in Unicode, whether assigned or not, can be
expressed in UTF-8.  What this means is that there is no such thing as
"a new UTF-8" that contains more characters than some previous UTF-8.

If you find Web pages that don't have these additional characters (and
you feel that they should), the problem is simply that the page was
written using an earlier version of Unicode, or that the author of the
page was unaware that the characters have been added.  This has nothing
to do with any limitation of UTF-8.
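
To make the point above concrete, here is a minimal Python sketch. It
assumes nothing beyond the standard library; the helper name utf8_hex and
the choice of sample characters are mine, purely for illustration. The
same fixed byte-level rule handles a Hebrew letter, a Hebrew cantillation
mark, and the very last code point, with no per-version table involved.

```python
# Sketch: UTF-8 is a fixed algorithm over code point values, so it needs
# no update when new characters are assigned. The helper name utf8_hex is
# hypothetical (not from any library).

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of one character as spaced hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# Sample characters chosen for illustration only.
samples = {
    "U+05D0 HEBREW LETTER ALEF": "\u05d0",
    "U+0591 HEBREW ACCENT ETNAHTA": "\u0591",
    "U+10FFFF (last code point, not an assigned character)": "\U0010ffff",
}

for name, ch in samples.items():
    print(f"{name}: {utf8_hex(ch)}")
```

Whether a browser can then display the decoded character is a separate
question of font and rendering support.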

-Doug Ewell
 Fullerton, California



Re: Unicode on a website

2000-09-24 Thread Michael (michka) Kaplan

From: "Elaine Keown" <[EMAIL PROTECTED]>

> I'm interested in using the more recent Unicode Hebrew versions on Web
sites.  These versions have about 30 more symbols for Hebrew Bible text than
the original Unicode from the early 90s.
>
> But the UTF-8 versions I found on the Web only seem to have the early 90s
version of Hebrew, and it doesn't have these 30 extra symbols.
>
> How does this work?  Is there some automatic procedure that will happen
soon, where a new UTF-8 will come out that has all the Hebrew symbols from
Unicode 2.0 and 3.0?   Does the increase in size of the Hebrew character set
interact with UTF-8 in some negative way?

Hi Elaine,

This is not a UTF-8 issue at all; it is an issue with font support. As soon
as a font supports the code points, you will see them displayed
appropriately.

If you consider the number of code points needed for some scripts, the ones
required for scripts such as Hebrew are really no big deal. :-)

michka

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/




Re: Unicode on a website

2000-09-24 Thread Elaine Keown

Hello,

I'm interested in using the more recent Unicode Hebrew versions on Web sites.  These 
versions have about 30 more symbols for Hebrew Bible text than the original Unicode 
from the early 90s.

But the UTF-8 versions I found on the Web only seem to have the early 90s version of 
Hebrew, and it doesn't have these 30 extra symbols.  

How does this work?  Is there some automatic procedure that will happen soon, where a 
new UTF-8 will come out that has all the Hebrew symbols from Unicode 2.0 and 3.0?   
Does the increase in size of the Hebrew character set interact with UTF-8 in some 
negative way?

Thanks, Elaine





RE: TATAP => TATAR

2000-09-24 Thread Carl W. Brown

Cathy,

I have found four references to support your contention: a reference to it
using a Latin script; another to "Azari-Arabic, Azari-Cyrilic &
Azari-Turkish"; a Mac font system, which I can't try because I don't have a
Mac; and a TrueType font that I installed, which seems to produce both a
dotted and an accented i.

BTW, I also found that there seems to be a movement to Latinize Uigur as
well, which started about 1960.

Carl


-Original Message-
From: Cathy Wissink [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 19, 2000 10:22 AM
To: 'Carl W. Brown'; Unicode List
Subject: RE: TATAP => TATAR


I believe Azeri also uses the dotless i/dotted i Turkish-style casing.

Cathy

-Original Message-
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 19, 2000 9:03 AM
To: Unicode List
Subject: RE: TATAP => TATAR


>-Original Message-
>From: Herman Ranes [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, September 19, 2000 6:30 AM
>To: Unicode List
>Cc: [EMAIL PROTECTED]
>Subject: Re: TATAP => TATAR


>Several Tatar language links here:
>http://members.tripod.com/~anttikoski/eng_tatar.html

>In particular, the Tatar-Bashkir latin alphabet is presented in RFE/RL's
>site at
>http://rferl.org/bd/tb/tatar/TATAR/abs.html

>Are all these characters supported in UNICODE?

I was unaware that they were moving back to the Latin alphabet.
What jumps out at me is that case conversion code like the code that I just
submitted for inclusion into ICU is wrong.  Turkish is not the only language
with dotted and dotless i.  I assume that Tatar and Bashkir should follow
the same rules as Turkish. Are there other languages?

So I guess that I should check for "ba", "tt" & "tr" for special case
shifting.  I presume that the alphabet is listed in proper sort order?
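
The check described above could be sketched roughly as follows in Python.
This is not the ICU code under discussion, only an illustration: the names
DOTTED_I_LANGS, to_upper and to_lower are hypothetical, and the language
set is an assumption built from this thread ("tr", "tt", "ba", plus "az"
for Azeri), not a verified list.

```python
# Sketch of language-sensitive casing for dotted/dotless i.
# Assumption: these language codes follow Turkish-style i casing.
DOTTED_I_LANGS = {"tr", "az", "tt", "ba"}

def to_upper(text: str, lang: str) -> str:
    """Uppercase text, mapping i -> U+0130 and U+0131 -> I when needed."""
    if lang in DOTTED_I_LANGS:
        # Handle the two special pairs first, then let the default
        # mapping take care of every other letter.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()

def to_lower(text: str, lang: str) -> str:
    """Lowercase text, mapping U+0130 -> i and I -> U+0131 when needed."""
    if lang in DOTTED_I_LANGS:
        # U+0130 must be handled before plain I so the two replacements
        # cannot interfere with each other.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()

print(to_upper("istanbul", "tr"))  # capital I with dot above, then STANBUL
print(to_upper("istanbul", "en"))  # ISTANBUL
```

Doing the special pairs before the default mapping keeps the rest of the
conversion unchanged for every other language.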

Carl





(no subject)

2000-09-24 Thread woodmailit



please remove me from this list


Re: Can anyone help me!!!

2000-09-24 Thread Michael (michka) Kaplan

From: "James Kass" <[EMAIL PROTECTED]>

> > IE 5.5 support all of the Unicode Indian scripts.
> > I just tried it on a couple of Devanagari sites
> > because the English Windows comes with mangal
> > true type font.
>
> May we see links to some of those pages?

Here are a few such pages:

http://www.trigeminal.com/index.asp?1081
http://www.trigeminal.com/frmrpt2dap.html?1081
http://www.trigeminal.com/frmrpt2dap_readme.htm?1081

They all use an explicit style for fonts in a CSS:

{ font-family:Mangal,Code2000,Arial Unicode MS;
 font-size:12pt; }

I put Mangal first since it is included in Windows 2000, and Arial Unicode
MS last because the feedback I have gotten is that Code2000 looks much
better than Arial Unicode MS does for several Indic scripts.

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/





FTP and UTF-8

2000-09-24 Thread Frank da Cruz

Does anybody know of a publicly accessible FTP server that supports
RFCs 2389 (negotiation of new features) and 2640 (internationalization)?
Preferably one that allows anonymous uploads (for testing purposes)?

In case you're not aware of these RFCs, they provide for UTF-8 based FTP.

Thanks!

- Frank




Re: Can anyone help me!!!

2000-09-24 Thread James Kass


Here's a page about Indic scripts in Unicode which
offers some pointers:

http://www.tamil.net/people/sivaraj/unicode.html

Carl W. Brown had written about testing Unicode
Devanagari support by visiting some web pages.

> IE 5.5 support all of the Unicode Indian scripts.  
> I just tried it on a couple of Devanagari sites 
> because the English Windows comes with mangal
> true type font.

May we see links to some of those pages? 

Best regards,

James Kass,



- Original Message - 
From: "sanatan mohanty" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Saturday, September 23, 2000 5:50 AM
Subject: Can anyone help me!!!


> 
> 
>   Dear Friends!.
> 
>   How are you!.
> 
>   i  have a project to make a webpage, which will be unicode enable. i can
> show indian language fonts. i can type those fonts on the webpage itself
> on text boxes!. and it should be atleast work on netscape and windows
> explorer!, and atleast LINUX and Windows OS supports it!.
> 
>   so, can u people give me some brief ideas abt keyboard mapping, unicode
> font setting, dispay setting
> 
> 
>   i will be grateful to you all for your help..
> 
>   waiting for you kind response..
> 
> Regards,
> 
>  Sanatan
> 
> 






Re: Unicode on a website: ? Devanagari

2000-09-24 Thread Steven R. Loomis

You will find examples of Devanagari on the ICU locale explorer pages:
http://oss.software.ibm.com/icu/demo/

Try Marathi, Konkani, and Hindi.

The encoding should be UTF-8 by default or you can change it at the
bottom of the page.

Hindi especially has an extensive but incomplete list of translated
language and country names.

-s



RE: Unicode on a website

2000-09-24 Thread Doug Ewell

"Carl W. Brown" <[EMAIL PROTECTED]> wrote:

> scsu makes sense for large blocks of data.  Send the frame work in
> utf-8 but use HTTP to request the bulk data in scsu.  If it is a
> small amount of data you don't want to pay the overhead of the
> compression.

SCSU was intentionally designed to be extremely low in overhead.  This
is one of the main differences between SCSU and most other compression
schemes.

> You don't need a BOM with UTF-8.

Not for byte-ordering purposes, but it is often handy as a signature.
Auto-detection of UTF-8 is not difficult, but not foolproof either --
there are legitimate sequences of Latin-1 characters that look like
UTF-8.  Using the signature EF BB BF at the beginning of a file is a
more reliable indication that the file is UTF-8.
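
A rough Python sketch of the two approaches, for illustration only: the
function name sniff_utf8 and its return strings are made up here, and real
detectors use more heuristics than a plain decode attempt.

```python
import codecs

def sniff_utf8(data: bytes) -> str:
    """Classify a byte string by signature first, then by validity."""
    # codecs.BOM_UTF8 is the three-byte signature EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (signature)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Well-formed UTF-8, but this is only a guess: some Latin-1 text
    # produces exactly the same bytes.
    return "utf-8 (guess)"

print(sniff_utf8(b"\xef\xbb\xbfhello"))  # signature present
print(sniff_utf8(b"caf\xe9"))            # Latin-1 bytes, invalid as UTF-8
print(sniff_utf8(b"\xc3\xa9"))           # ambiguous: valid UTF-8, but also
                                         # two valid Latin-1 characters
```

The last case is the ambiguity in question: the bytes C3 A9 decode cleanly
as UTF-8, yet they are also a legitimate two-character Latin-1 sequence.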

-Doug Ewell
 Fullerton, California



Re: Unicode on a website: ? Devanagari

2000-09-24 Thread James Kass


Perhaps because of poor support, there don't seem to
be any substantial works in Devanagari Unicode on 
the web.  Aside from test pages or charts, like the
ones found on Alan Wood's excellent site, the best
bet would seem to be to make your own pages.

Naidunia uses dynamic fonts, but with a non-Unicode
mapping.  Mark Leisher has a perl script to convert
Naidunia's pages to Unicode.

http://clr.nmsu.edu/~mleisher/nai.html

Best regards,

James Kass,


- Original Message - 
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Saturday, September 23, 2000 9:58 PM
Subject: RE: Unicode on a website: ? Devanagari


Chris,

Just came across an interesting site: http://www.hclrss.demon.co.uk/unicode/

Follow some of the links.

Carl

-Original Message-
From: Christopher J. Fynn [mailto:[EMAIL PROTECTED]]
Sent: Saturday, September 23, 2000 1:15 PM
To: Unicode List
Subject: Re: Unicode on a website: ? Devanagari



Anyone know of any Devanagari documents (Sanskrit, Hindi, Nepali) on the Web
using UTF-8 (other than the pages at
http://titus.uni-frankfurt.de/unicode/samples/rvbeispx.htm ) - especially any
using Dynamic fonts?

I am not interested in Devanagari sites using font based encodings.

- Chris