New Vietnamese character set standard

2002-04-11 Thread Markus Kuhn

I was just informed that Vietnam has recently published TCVN 6909:2001
"16-bit Coded Vietnamese Character Set", which is to be implemented for
data interchange with and within government agencies as of 2002-07-01:

  http://www.undp.org.vn/unicode/

It is a very small UCS subset:

# Plane 00
# Rows  Positions (Cells)

  0020-7E A0 C0-C3 C8-CA CC-CD D2-D5 D9-DA DD E0-E3 E8-EA EC-ED F2-F5
  00F9-FA FD
  0110-11 28-29 68-69 A0-A1 AF-B0
  0300-03 06 09 1B 23
  1EA0-F9
  201C-1D

# Number of characters in above table: 238
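
The compact row/cell notation above can be checked mechanically. The following is only a sketch in Python, not anything taken from the standard: the names TABLE, expand and outside_subset are made up for illustration, and it assumes that a four-digit token starts a new row (its first two hex digits) while two-digit tokens are cells in the current row.

```python
# Hypothetical helper names; the table text is copied from the message above.
TABLE = """
0020-7E A0 C0-C3 C8-CA CC-CD D2-D5 D9-DA DD E0-E3 E8-EA EC-ED F2-F5
00F9-FA FD
0110-11 28-29 68-69 A0-A1 AF-B0
0300-03 06 09 1B 23
1EA0-F9
201C-1D
"""


def expand(table):
    """Expand the row/cell notation into a list of code points."""
    codepoints = []
    row = None
    for token in table.split():
        start, _, end = token.partition("-")
        if len(start) == 4:
            # Assumption: four hex digits mean "row, then first cell".
            row = int(start[:2], 16)
            start = start[2:]
        if row is None:
            raise ValueError("cell given before any row: %r" % token)
        first = int(start, 16)
        last = int(end, 16) if end else first
        codepoints.extend((row << 8) | cell for cell in range(first, last + 1))
    return codepoints


SUBSET = frozenset(expand(TABLE))


def outside_subset(text):
    """Return the characters of text that are not in the subset."""
    return [ch for ch in text if ord(ch) not in SUBSET]


print(len(SUBSET))  # prints 238, matching the count given in the table
```

Something like outside_subset("Vi\u1ec7t Nam") returns an empty list, while a character such as U+00C4 is reported as not covered.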

So I guess all we have to do to conform to the new Vietnamese government
character encoding requirement is to add suitable keyboard definitions
and make sure there are ISO 10646-1 fonts with suitable coverage
available.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: 

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




asc2utf; utf2asc

2002-04-11 Thread H. Peter Anvin

I have posted a pair of Perl scripts which convert between UTF-8 files
and ASCII files with C-style escape sequences (\u, \U, and \x for
invalid bytes or sequences), in both directions, at:

ftp://ftp.zytor.com/pub/hpa/asc2utf
ftp://ftp.zytor.com/pub/hpa/utf2asc

... no guarantees, you break it you buy it... :)
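
For readers who just want to see the idea, here is a rough Python sketch of the ASCII-to-UTF-8 direction. It is not the Perl code from the scripts above: the function name ascii_to_utf8 is invented, and the exact escape syntax is an assumption (\uXXXX with four hex digits, \UXXXXXXXX with eight, \xXX for one raw byte, and \\ for a literal backslash).

```python
import re

# Assumed escape syntax; the real scripts may differ in detail.
_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|x([0-9A-Fa-f]{2})|(\\))"
)


def ascii_to_utf8(text):
    """Turn an ASCII string with \\u, \\U, \\x and \\\\ escapes into bytes."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Copy the plain ASCII run before this escape unchanged.
        out += text[pos:match.start()].encode("ascii")
        u4, u8, x2, backslash = match.groups()
        if x2 is not None:
            out.append(int(x2, 16))      # raw byte, possibly invalid UTF-8
        elif backslash is not None:
            out += b"\\"                 # escaped backslash
        else:
            out += chr(int(u4 or u8, 16)).encode("utf-8")
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)
```

A backslash that does not start one of the four recognised escapes is passed through as is, and a \U value beyond U+10FFFF or a surrogate raises an exception instead of being written out.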

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[EMAIL PROTECTED]>




Re: UTF-8 file to ASCII file converter

2002-04-11 Thread H. Peter Anvin

Followup to:  <[EMAIL PROTECTED]>
By author:    Pedro Ferreira <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> I already have a Perl script (thanks to Oyvind A.
> Holm) that converts an ASCII file with U+ Unicode
> codes to a UTF-8 file.
> Now I would like to do the opposite: convert a UTF-8
> file to an ASCII file, with each UTF-8 character
> encoded back to U+. Many thanks in advance for any
> help!
> 

You'd probably be better off using C-like escape codes \u and
\U with \ escaped as \\.
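
As a rough illustration of that scheme, a minimal Python sketch follows. The function names are invented, the digit counts (four for \u, eight for \U) are an assumption borrowed from C, and the input is assumed to be valid UTF-8.

```python
def escape_non_ascii(text):
    """Rewrite text as pure ASCII using \\u, \\U and \\\\ escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")            # keep the encoding reversible
        elif cp < 0x80:
            out.append(ch)                # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # BMP: four hex digits
        else:
            out.append("\\U%08X" % cp)    # beyond the BMP: eight hex digits
    return "".join(out)


def utf8_to_ascii(data):
    """Decode UTF-8 bytes strictly, then escape every non-ASCII character."""
    return escape_non_ascii(data.decode("utf-8"))
```

Escaping the backslash itself is what makes the output unambiguous: a literal "\u1EC7" typed in the input comes out as "\\u1EC7" and cannot be mistaken for an encoded character when converting back.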

-hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[EMAIL PROTECTED]>




Re: 3.2 MAPPINGS/EASTASIA

2002-04-11 Thread Glenn Fowler


On Mon, 08 Apr 2002 01:37:20 -0700 Edward Cherlin wrote:
> Does anybody know what the iconv developers have in mind for this? 
> The man page at 
> http://www.research.att.com/sw/tools/uwin/man/man1/iconv.html 
> lists shift-jis, shift_jis, euc-jp, 
> Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc-jp, x-sjis, 
> _iso-2022-jp, but does not say what JIS standards are supported in 
> these encodings. 

these names are taken from the registry key

/reg/local_machine/SOFTWARE/Classes/MIME/Database/Charset/

so check the m$ (Microsoft) docs for which JIS standards each one covers

-- Glenn Fowler <[EMAIL PROTECTED]> AT&T Labs Research, Florham Park NJ --
