RE: Introducing the idea of a ROMAN VARIANT SELECTOR (was: Re: Proposing Fraktur)

2002-02-01 Thread Oliver Christ

Hi, 

Ken wrote: 

 <fraktur>Das sinkende Schiff sandte</fraktur> SOS<fraktur>-Rufe.</fraktur>
 or conversely, perhaps better:
 Das sinkende Schiff sandte <antiqua>SOS</antiqua>-Rufe.

In the end, it may be more useful to mark up the semantics rather than
formatting properties, i.e.

This is not a question of <foreign origin="DE">Zeitgeist</foreign>.

It is the responsibility of the rendering engine (style sheet, ...) to map
that markup to whatever font/script/typeface should be used, according to
users' (or typesetters') preferences, current environment and purpose. 

- The author or some post-authoring process would (hopefully ;-) ) know
where the linguistic expression originates and could apply appropriate
(semantic) markup, but need not care about typesetting conventions (in
which the author may not be expert).

- The rendering engine/typesetter doesn't need to have any linguistic
information (such as a database of loan words), but only needs to know how
to map foreign content to formatting properties in a given context. 

- Third, depending on the environment and purpose, different stylistic
conventions may be necessary for the same linguistic expression (fraktur in
one document, no special formatting in another) so that any
formatting-oriented markup (or encoding, for that matter) will potentially
reduce the reusability of the document.
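A minimal Python sketch of that division of labour, assuming the marked-up text is well-formed XML. The style-sheet names and typeface labels are hypothetical; the point is only that the same document yields different formatting depending on which style sheet the renderer applies, while the markup itself carries no formatting at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheets: each maps a semantic element name to a typeface.
# The document never says which typeface to use; only these rules do.
STYLE_SHEETS = {
    "italicize-foreign": {"p": "Roman", "foreign": "Italic"},
    "plain": {"p": "Roman", "foreign": "Roman"},
}


def render_runs(xml_text: str, sheet_name: str) -> list[tuple[str, str]]:
    """Flatten semantic markup into (typeface, text) runs using one style sheet."""
    sheet = STYLE_SHEETS[sheet_name]
    runs: list[tuple[str, str]] = []

    def walk(elem: ET.Element, inherited: str) -> None:
        face = sheet.get(elem.tag, inherited)  # unknown elements inherit the parent's face
        if elem.text:
            runs.append((face, elem.text))
        for child in elem:
            walk(child, face)
            if child.tail:  # text after a child belongs to the enclosing element
                runs.append((face, child.tail))

    walk(ET.fromstring(xml_text), "Roman")
    return runs


doc = '<p>This is not a question of <foreign origin="DE">Zeitgeist</foreign>.</p>'
print(render_runs(doc, "italicize-foreign"))
print(render_runs(doc, "plain"))
```

A real style sheet could also key on the origin attribute, treating words of different origin differently, without any change to the document.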


Cheers, Oli

Oliver Christ
TRADOS GmbH
Stuttgart




[partly off-topic] A specialized kind of website, a teleutopia webspace.

2002-02-01 Thread William Overington

The recent sending of attachments in this unicode discussion group has led
me to think once again about my idea for a specialized type of website.  In
view of the fact that, although I can do some client side JavaScript, I have
no knowledge of server side scripting, I do not know whether my idea is
feasible or, if it is feasible, whether it is a relatively quick task for
someone who has the right skills or a major task.  Although the idea
originated as a suggested infrastructural tool for the construction of
distance education packages by an informal team of people located around the
world, it also potentially has applications for this unicode community, so I
wondered whether some of the participants of this discussion group might be
willing to comment.  If the Unicode Consortium would like to implement the
idea, then great.

Here is the idea in general terms.  It is called a teleutopia webspace: the
word teleutopia has five syllables, tel - eu - top - i - a and is formed by
joining the prefix tel- to the word eutopia.  A teleutopia webspace can be
used to produce a teleutopia of people individually working at a distance in
an informal manner to produce a combined result.

All users of the web would be able to access a website, say, for an example
here to explain the idea, www.somewhere.com and upon reaching that site an
automated system would generate and display a web page that includes two
lists of files, each file name in each list provided as a hyperlink, as the
home page of the website.  The first list is a list of all of the files that
have a .htm suffix that are in the home directory of the website at that
time.  The second list is a list of all files that have any suffix other
than .htm that are in the home directory of the website at that time.
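As a rough illustration only, the generated home page could be produced by something like the following Python sketch, assuming the home directory is a local folder the web server can read and that the page is rebuilt on every request, so newly stored files appear at once. The function names are hypothetical; files whose names do not end in .htm (including names with no suffix) go in the second list.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def list_home_directory(home: Path) -> tuple[list[str], list[str]]:
    """Split the files in `home` into .htm files and all other files, sorted by name."""
    names = sorted(p.name for p in home.iterdir() if p.is_file())
    htm = [n for n in names if n.endswith(".htm")]
    other = [n for n in names if not n.endswith(".htm")]
    return htm, other


def render_home_page(home: Path) -> str:
    """Build the home page: two lists, each file name given as a hyperlink."""
    def as_list(names: list[str]) -> str:
        items = "".join(
            f'<li><a href="{quote(n)}">{escape(n)}</a></li>' for n in names
        )
        return f"<ul>{items}</ul>"

    htm, other = list_home_directory(home)
    return (
        "<html><body>"
        "<h2>Web pages (.htm)</h2>" + as_list(htm) +
        "<h2>Other files</h2>" + as_list(other) +
        "</body></html>"
    )
```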

Registered users of the www.somewhere.com website are able to send emails,
each email being an email that has one and only one attachment, to
[EMAIL PROTECTED] which is an automated receiving system.  The automated
receiving system takes the attachment and stores it in the home directory of
the www.somewhere.com webspace, either under the name that the attachment
carried or, if that name is already taken, under the next sequentially
available name of a local standard naming system.
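The storing step might look roughly like this Python sketch. It assumes the mail side (checking the sender is registered, extracting the single attachment) happens elsewhere, and the sequential naming scheme file0001.htm, file0002.htm, ... is purely hypothetical. Opening with mode "xb" means two messages arriving together can never overwrite each other's files.

```python
from pathlib import Path


def _create_exclusively(path: Path, payload: bytes) -> bool:
    """Write payload to path only if no file of that name exists yet."""
    try:
        with open(path, "xb") as f:  # "x" mode fails instead of overwriting
            f.write(payload)
        return True
    except FileExistsError:
        return False


def store_attachment(home: Path, filename: str, payload: bytes) -> Path:
    """Store an attachment under its own name, else under the next free sequential name."""
    name = Path(filename).name  # drop any directory part supplied by the sender
    if name and _create_exclusively(home / name, payload):
        return home / name
    suffix = Path(name).suffix
    n = 1
    while True:
        # Hypothetical local naming scheme: file0001.htm, file0002.htm, ...
        target = home / f"file{n:04d}{suffix}"
        if _create_exclusively(target, payload):
            return target
        n += 1
```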

The idea is that a registered user of the facility can look through the
www.somewhere.com website to find graphics and web pages to which to link as
hypertext links, generate on his or her local computer a .htm file including
those graphics and links, then email the .htm file as an attachment to
[EMAIL PROTECTED] whereupon it will be received and placed in the home
directory of the www.somewhere.com website, and thus be shown on the home
page of www.somewhere.com for each subsequent web access, by anyone, of the
www.somewhere.com website.

That is only a simple example of use.  A person could add new graphics, Java
applets and so on to the www.somewhere.com website by this email method, and
then use them in his or her own page and thus also make them available for
other registered users of the www.somewhere.com website.

This example uses a simple scenario where the information in the email other
than the attachment is ignored.  If someone implements the idea of a
teleutopia webspace then he or she might possibly consider using the
information in the email to add other features, such as sending the original
sender of the email a deletion code so that he or she may later delete a
particular file or send an updated version of it.  If someone does implement
such features could he or she please note that one possible use of a
teleutopia webspace is so that people can submit a portfolio of work for
assessment for a distance education qualification, in which case the files
need to be sent as undeletable so that any review or assessment is based
on the files submitted at a particular time and that the provenance of such
a review or assessment cannot be retrospectively undermined by the files
which were reviewed or assessed being altered: so, if a deletion facility is
included, please also implement an option that a submitted file may be
stated to be undeletable and so marked in the list that appears on the
www.somewhere.com webspace.
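One possible shape for the optional deletion-code feature, sketched in Python as an in-memory model. The class and method names are hypothetical, there is no mail handling or persistence, and the undeletable flag is fixed at submission time so that no later request can remove or replace such a file.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    payload: bytes
    deletion_code: Optional[str]  # None means the file was submitted as undeletable


class Webspace:
    """Hypothetical in-memory model of deletion codes and undeletable submissions."""

    def __init__(self) -> None:
        self.files: dict[str, StoredFile] = {}

    def submit(self, name: str, payload: bytes, undeletable: bool = False) -> Optional[str]:
        """Store a new file; return the deletion code to mail back (None if undeletable)."""
        if name in self.files:
            raise FileExistsError(name)  # an update means delete, then resubmit
        code = None if undeletable else secrets.token_hex(8)
        self.files[name] = StoredFile(payload, code)
        return code

    def delete(self, name: str, code: str) -> None:
        """Remove a file, but only with the right code and never if undeletable."""
        stored = self.files[name]
        if stored.deletion_code is None:
            raise PermissionError(f"{name} was submitted as undeletable")
        # Constant-time comparison, so timing does not reveal how much of the code matched.
        if not secrets.compare_digest(stored.deletion_code.encode(), code.encode()):
            raise PermissionError("wrong deletion code")
        del self.files[name]

    def listing(self) -> list[str]:
        """File names as they would appear on the home page, undeletable ones marked."""
        return [
            f"{name} (undeletable)" if f.deletion_code is None else name
            for name, f in sorted(self.files.items())
        ]
```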

Such a website might perhaps be useful to the participants in this
discussion group, so that when several people each send in graphics of
glyphs for a discussion, a web page could be constructed that showed all of
the graphics displayed on one page, together with hyperlinks to relevant
documents that are either in the same webspace or are available at other
sites on the web.

William Overington

1 February 2002

www.users.globalnet.co.uk/~ngo











Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher


Dan FYI I have reported this brain-dead mapping problem to Unicode
Dan Consortium but never got an answer.  Well, they are not public
Dan society in a way they charge for the membership to say anything.  One
Dan of the reasons so many Japanese love to hate Unicode...

This kind of false information is why many Japanese continue to love to hate
Unicode.  If you were actually on the Unicode mailing list, you wouldn't be
repeating garbage like this.

Sign up and send a message about the mapping tables.  You will get an answer.
-
Mark Leisher                   Orthodoxy, of whatever color, seems to
Computing Research Lab         demand a lifeless, imitative style.
New Mexico State University
Box 30001, Dept. 3CRL          -- Politics and the English Language,
Las Cruces, NM  88003             George Orwell




Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Dan Kogai

On 2002.02.02, at 00:32, Jarkko Hietaniemi wrote:
So far as I see Linux iconv is ascii-preservative while ICU's is
 Unicode-strict.
From Perl's point of view ASCII preservative should be default.

 Why?

   I have already answered in the previous mail (Subject: More on Unicode 
Mappings, Message-Id: [EMAIL PROTECTED]), but this one is important, so
let me repeat.

  With a good reason.  The original mapping of Unicode renders any 
 (EUC|JIS|SHIFTJIS)-written perl scripts (or C code) unusable.  In 
 Japan '\' has been mapped to the Yen mark (because it happened to be in 
 a localizable area of ASCII.  I believe the localizable area in ASCII is 
 causing a lot of headaches for folks such as the Danish, who exploit this 
 feature to the fullest extent).  So source code in Japan comes with lots 
 of yen marks instead of backslashes.

   My very first implementation of Jcode did use the Unicode table as is, and 
this problem was pointed out within an hour of release, so I fixed it.
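Why a strict table breaks source code can be shown with a small Python sketch. It assumes one common strict reading of the single-byte range, JIS X 0201 Roman, where 0x5C is YEN SIGN (U+00A5) and 0x7E is OVERLINE (U+203E); the function and flag names are hypothetical and this is not Jcode's code. Under that reading a C escape such as \n or a path such as ~/foo no longer contains the characters a compiler or shell expects.

```python
# JIS X 0201 Roman differs from ASCII at exactly these two byte values.
JIS_X0201_ROMAN_DIFFS = {
    0x5C: "\u00a5",  # YEN SIGN where ASCII has REVERSE SOLIDUS
    0x7E: "\u203e",  # OVERLINE where ASCII has TILDE
}


def decode_low_half(data: bytes, ascii_preserving: bool = True) -> str:
    """Decode bytes 0x00-0x7F either as ASCII or as JIS X 0201 Roman."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the single-byte range")
        if not ascii_preserving and b in JIS_X0201_ROMAN_DIFFS:
            chars.append(JIS_X0201_ROMAN_DIFFS[b])
        else:
            chars.append(chr(b))
    return "".join(chars)


source = b'printf("Hello\\n"); /* ~/foo */'
print(decode_low_half(source))                          # backslash and tilde survive
print(decode_low_half(source, ascii_preserving=False))  # they become U+00A5 and U+203E
```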

Dan





Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Dan Kogai

On 2002.02.01, at 23:57, Mark Leisher wrote:
 Dan FYI I have reported this brain-dead mapping problem to Unicode
 Dan Consortium but never got an answer.  Well, they are not public
 Dan society in a way they charge for the membership to say anything.  One
 Dan of the reasons so many Japanese love to hate Unicode...

 This kind of false information is why many Japanese continue to love to hate
 Unicode.  If you were actually on the Unicode mailing list, you wouldn't be
 repeating garbage like this.

 Sign up and send a message about the mapping tables.  You will get an answer.

   I signed up to [EMAIL PROTECTED] long ago, or so I thought, since I am
still getting invitations to conferences and such.  But I checked
[EMAIL PROTECTED] and it subscribed my address again instead of giving
an error message saying I was already subscribed.  Hmm...
Anyway, I have resubscribed, so here I go.
   Okay, here it is.  Let me begin with the original message.  Sorry for the
repetition, folks in [EMAIL PROTECTED]

 On 2002.02.01, at 19:24, Nick Ing-Simmons wrote:
 As part of the mystery of CJK encodings I notice that IBM's ICU's uconv
 and SuSE6.4 linux iconv differ as to the UTF-8 representation of table.euc

 Both converters will round-trip with themselves and give byte exact
 copy of table.euc

 Weirdly they differ in how they map '\' and '~' in ASCII space as
 well as some spots in higher characters.

   Oh, yes.  This is the problem of the original Unicode 2.x map;  It is 
 not ASCII preservative.  I have posted this problem to perl-
 [EMAIL PROTECTED] when I first released Jcode.  Several discussions 
 later, I made Jcode so that it preserves ASCII by default and added 
 $Jcode::Unicode::PEDANTIC to change the behavior.
   Here is the excerpt from Jcode::Unicode:

 VARIABLES
$Jcode::Unicode::PEDANTIC
When set to non-zero, x-to-unicode conversion becomes
pedantic.  That is, '\' (chr(0x5c)) is converted to
zenkaku backslash and '~' (chr(0x7e)) to JIS-x0212
tilde.

By default, Jcode::Unicode leaves ascii ([0x00-0x7f])
as it is.

 Linux iconv will not take ICU's UTF-8.
 ICU's uconv will read the iconv output but does produce same as original
 table.euc.

   So far as I see Linux iconv is ascii-preservative while ICU's is 
 Unicode-strict.
   From Perl's point of view ASCII preservative should be the default.
   FYI I have reported this brain-dead mapping problem to the Unicode 
 Consortium but never got an answer.  Well, they are not a public society, 
 in that they charge for the membership to say anything.   One of the 
 reasons so many Japanese love to hate Unicode...

 Our current euc-jp.ucm is compatible with Linux iconv.

   Right choice.

 Dan the Man with So Many Charsets to Deal With

   Now let me repeat the question I asked long ago.  Why does the 
Unicode - JISX2xxx map remain such that it does not preserve the ASCII 
part, despite the fact that most converters ignore the original map and 
leave the ASCII part as is?
   One more question.  Where have the contents of 
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ gone?

Dan Kogai
CEO, DAN co. ltd.
2-8-14-418 Shiomi Koto-ku Tokyo 135-0052 Japan
mailto: [EMAIL PROTECTED] / http://www.dan.co.jp/
Tel:+81 3-5665-6131   Fax:+81 3-5665-6132
GPG Key: http://www.dan.co.jp/~dankogai/dankogai.gpg.asc





Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Dan Kogai
On 2002.02.02, at 00:37, Nick Ing-Simmons wrote:
   Oh, yes.  This is the problem of the original Unicode 2.x map;  It is
 not ASCII preservative.  I have posted this problem to perl-
 [EMAIL PROTECTED] when I first released Jcode.  Several discussions
 later, I made Jcode so that it preserves ASCII by default and added
 $Jcode::Unicode::PEDANTIC to change the behavior

 Ah. I take your point. If we used ICU's pedantic form
 Both UNIX ~/foo and MS C:\Foo get mangled.

EXACTLY!

 The other differences (having looked at diff in yudit) seem to be
 mapping the cent sign, pound sign, not sign and one of the longer dashes to
 different width variants (full width for ICU).

 I am going off ICU ...

   As I addressed to [EMAIL PROTECTED], yet another problem is that 
ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone, so I don't 
have a practical way to check the mapping.  I want the mapping back!

Dan


Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher


Dan As I addressed to [EMAIL PROTECTED], Yet another problems that
Dan ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone so I
Dan don't have a practical way to check the mapping.  I want the mapping
Dan back!

*Sigh*  Readme.txt, which *is* in the Public/MAPPINGS/EASTASIA/ directory
states:

The entire former contents of this directory are obsolete and have been
 moved to the OBSOLETE directory.  The latest information may be found
 in the Unihan.txt file in the latest Unicode Character Database.
 August 1, 2001.





Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Leisher


Nick ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE

Nick ***HOWEVER** if you use the NON-INTUITIVE URL:

Nick http://ftp.unicode.org/Public/MAPPINGS/

Nick one gets redirected to

Nick http://www.unicode.org/Public/MAPPINGS/

Nick which is as you state.

Quite right.  The change to web access happened a couple years ago and I
didn't pay attention to the URL, assuming it was web-based.

Nick A URL to the location of the Unihan.txt file would be more helpful.

Indeed.  It is easy to locate on http://www.unicode.org, but is directly
available from ftp://www.unicode.org/Public/3.1-Update1/Unihan-3.1.1.txt.gz.




RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Marco Cimarosti

Dan Kogai wrote:
As I addressed to [EMAIL PROTECTED],  Yet another problems that 
 ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone 
 so I don't 
 have a practical way to check the mapping.  I want the mapping back!

The Unicode site is a little bit labyrinthine, sometimes.

The web version of the data seems more up to date than the ftp site. But
don't bother going to http://www.unicode.org/Public/MAPPINGS/EASTASIA/,
because it only contains a note which reads:

 The entire former contents of this directory are obsolete and have been
moved to the OBSOLETE directory.  The latest information may be found
in the Unihan.txt file in the latest Unicode Character Database.
August 1, 2001. 

And don't bother to download the 23 MB
http://www.unicode.org/Public/UNIDATA/Unihan.txt file, because it contains
only mappings for kanji.

So, go directly to
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/, where you can
find the old data, along with a note about mapping errors:

 [...]
Below is some analysis by Asmus Freytag of specific problems raised by T.
Kubota in this document:
http://www.debian.or.jp/~kubota/unicode-symbols.html
[...]
The following are available as Full Width characters in the FFxx range.
Therefore, the mappings of these characters are incorrect. This appears to
be a *mapping file issue* as far as these characters are concerned
FILE JIS0208.TXT--
0x2140  U+005C  Na  # REVERSE SOLIDUS
0x215D  U+2212  N  # MINUS SIGN
0x2171  U+00A2  Na  # CENT SIGN
0x2172  U+00A3  Na  # POUND SIGN
0x224C  U+00AC  Na  # NOT SIGN
[...]
FILE JIS0212.TXT--
0x2243  U+00A6  Na  # BROKEN BAR
0x2234  U+00AF  Na  # MACRON
0x2237  U+007E  Na  # TILDE
[...] 

I don't know if this helps solve your issues.
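The quoted analysis says these characters have full-width counterparts in the FFxx range. One way to check that mechanically, sketched in Python under the assumption that "full-width counterpart" means a character whose Unicode decomposition is <wide> followed by the mapped code point, is to scan the Halfwidth and Fullwidth Forms block with the standard unicodedata module. The helper name is hypothetical.

```python
import unicodedata


def fullwidth_variant(ch: str):
    """Return the FFxx character whose <wide> decomposition is `ch`, or None."""
    target = f"<wide> {ord(ch):04X}"
    for cp in range(0xFF01, 0xFFEF):  # Halfwidth and Fullwidth Forms block
        if unicodedata.decomposition(chr(cp)) == target:
            return chr(cp)
    return None


# Targets taken from the JIS0208.TXT / JIS0212.TXT lines quoted above.
for ch in ["\\", "\u2212", "\u00a2", "\u00a3", "\u00ac", "\u00a6", "\u00af", "~"]:
    fw = fullwidth_variant(ch)
    if fw is None:
        print(f"U+{ord(ch):04X} -> no <wide> counterpart")
    else:
        print(f"U+{ord(ch):04X} -> U+{ord(fw):04X} {unicodedata.name(fw)}")
```

This finds U+FF3C, U+FFE0, U+FFE1, U+FFE2, U+FFE4, U+FFE3 and U+FF5E, but nothing for U+2212 MINUS SIGN, so for that entry any full-width choice is a convention rather than a decomposition.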

_ Marco




Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Davis
ICU's pedantic form

The goal for ICU is to be charset neutral, and support all of the
conversions that are in modern use. There are a large number of
variants of character sets; you can use the one you want. See:

http://oss.software.ibm.com/icu/charset/index.html

Mark

- Original Message -
From: "Dan Kogai" [EMAIL PROTECTED]
To: "Nick Ing-Simmons" [EMAIL PROTECTED]
Cc: "Nick Ing-Simmons" [EMAIL PROTECTED]; "SADAHIRO Tomoyuki"
[EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, February 01, 2002 07:46
Subject: Re: ICU's uconv vs Linux iconv and UTF-8


 On 2002.02.02, at 00:37, Nick Ing-Simmons wrote:
    Oh, yes.  This is the problem of the original Unicode 2.x map;  It is
   not ASCII preservative.  I have posted this problem to perl-
   [EMAIL PROTECTED] when I first released Jcode.  Several discussions
   later, I made Jcode so that it preserves ASCII by default and added
   $Jcode::Unicode::PEDANTIC to change the behavior
  
   Ah. I take your point. If we used ICU's pedantic form
   Both UNIX ~/foo and MS C:\Foo get mangled.

  EXACTLY!

   The other differences (having looked at diff in yudit) seem to be
   mapping the cent sign, pound sign, not sign and one of the longer dashes
   to different width variants (full width for ICU).
  
   I am going off ICU ...

    As I addressed to [EMAIL PROTECTED],  Yet another problems that
  ftp://ftp.unicode.org/Public/MAPPINGS/EASTASIA/ is now gone so I don't
  have a practical way to check the mapping.  I want the mapping back!

 Dan





Re: GB 18030 question

2002-02-01 Thread Michael Everson

At 11:23 -0800 2002-02-01, Deborah Goldsmith wrote:
There is an error on page 10 of the GB 18030-2000 standard, in that 
the character with code point A3FE maps to U+FFE3 (FULLWIDTH 
MACRON), but is shown with a glyph that corresponds to U+FF5E 
(FULLWIDTH TILDE). The position of the character in its code block 
would also seem to indicate that tilde was intended.

Does anyone have any idea of which should be considered correct, the 
glyph or the Unicode mapping value?

Glyphs are informative in JTC1. I can only assume that the GB 
standards would follow suit.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Rick McGowan

Marco wrote...

 The web version of the data seems more up to date than the ftp site.

They are the same files, available through different protocols!

Rick




GB 18030 question

2002-02-01 Thread Deborah Goldsmith

There is an error on page 10 of the GB 18030-2000 standard, in that the 
character with code point A3FE maps to U+FFE3 (FULLWIDTH MACRON), but is 
shown with a glyph that corresponds to U+FF5E (FULLWIDTH TILDE). The 
position of the character in its code block would also seem to indicate 
that tilde was intended.

Does anyone have any idea of which should be considered correct, the 
glyph or the Unicode mapping value?
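To see what a given converter actually does with A3FE, a couple of lines of Python are enough. This assumes Python's bundled gb18030 codec follows the published mapping table; it shows the table value only and cannot say whether the glyph or the mapping reflects the standard's intent.

```python
import unicodedata

decoded = b"\xa3\xfe".decode("gb18030")
print(f"U+{ord(decoded):04X} {unicodedata.name(decoded)}")  # what the codec's table says
print(unicodedata.name("\uff5e"))  # the character the printed glyph resembles
```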

Deborah Goldsmith
Manager, Fonts & Language Kits
Apple Computer, Inc.
[EMAIL PROTECTED]





RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Yves Arrouye

 As part of the mystery of CJK encodings I notice that IBM's ICU's 
 uconv and SuSE6.4 linux iconv differ as to the UTF-8 representation 
 of table.euc

 Both converters will round-trip with themselves and give byte exact 
 copy of table.euc

 Weirdly they differ in how they map '\' and '~' in ASCII space as 
 well as some spots in higher characters.

That is understandable if they use different tables. The question is which
one is the right EUC-JP, and which one do users want? ICU, as well as
iconv, could have two tables with the different mappings. The question then
is how to label them, and whether the labeling should be compatible between
the two.

 Linux iconv will not take ICU's UTF-8.
 ICU's uconv will read the iconv output but does produce same as
 original table.euc.

I find this statement confusing. Are you saying that uconv's UTF-8 is
ill-formed? Nick, would you mind emailing me (and just me, not the list)
your table.euc sample file?

Thanks,
YA







Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Dan Kogai

I'll answer this one.

On 2002.02.02, at 03:28, Yves Arrouye wrote:
 That is understandable if they use different tables. The question is which
 one is the right EUC-JP, and which one do users want? ICU, as well as
 iconv, could have two tables with the different mappings. The question then
 is how to label them, and whether the labeling should be compatible between
 the two.

   I don't know which one is 'right'.  But the most practical and widely 
used (euc-jp) is as follows:

\x00 - \x7f Maps to US-ASCII
\xa1a1   - \xfefe   Maps to JISX-0208 (aka Zenkaku)
\x8ea1   - \x8edf   Maps to JISX-0201 (aka Hankaku)

   In addition, the extended form of euc-jp also includes:

\x8fa1a1 - \x8ffefe Maps to JISX-0212

   That's what iconv, Tcl's *.enc, and my humble Jcode think euc-jp is.
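The byte ranges above are enough to tell which character set each EUC-JP sequence belongs to, before any Unicode table (where the converters disagree) is consulted. A minimal Python sketch, assuming exactly the ranges in Dan's table; the function name and set labels are hypothetical, and it classifies rather than decodes.

```python
def classify_euc_jp(data: bytes) -> list[tuple[str, bytes]]:
    """Split EUC-JP bytes into (character set, raw bytes) pairs using the ranges above."""
    def trail_ok(k: int) -> bool:
        return k < len(data) and 0xA1 <= data[k] <= 0xFE

    out: list[tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:  # \x00 - \x7f
            out.append(("US-ASCII", data[i:i + 1]))
            i += 1
        elif b == 0x8E and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xDF:
            out.append(("JIS X 0201 (Hankaku)", data[i:i + 2]))  # \x8ea1 - \x8edf
            i += 2
        elif b == 0x8F and trail_ok(i + 1) and trail_ok(i + 2):
            out.append(("JIS X 0212", data[i:i + 3]))  # \x8fa1a1 - \x8ffefe
            i += 3
        elif 0xA1 <= b <= 0xFE and trail_ok(i + 1):
            out.append(("JIS X 0208 (Zenkaku)", data[i:i + 2]))  # \xa1a1 - \xfefe
            i += 2
        else:
            raise ValueError(f"invalid EUC-JP byte 0x{b:02X} at offset {i}")
    return out


print(classify_euc_jp(b"C:\\\xa4\xa2\x8e\xb1\x8f\xa2\xb7"))
```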

 I find the same statement confusing. Are you saying that uconv's UTF-8 is
 ill-formed? Nick, Would you mind email me (and just me, not the list) your
 table.euc sample file?

   Go get Jcode.pm via http://search.cpan.org/search?dist=Jcode and check 
under the t/ directory.  You can find table.euc and x0212.euc there.

Dan





Re: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Mark Davis \(jtcsv\)

It is definitely a problem to try to interpret what any given label is
supposed to be. The problem is that MIME labels and others are
ambiguous, and are interpreted in different ways on different systems.

MIME/IANA is the best registry we have, but there are a number of
significant problems:

- because for most mappings there is no published mapping in the
registry to and from Unicode/10646, it is not clear, and certainly not
easy, to figure out exactly what the unambiguous decoding is.

- in practice, the industry does NOT interpret the same bytes the same
way; for example, you will get different decodings from SJIS on
different platforms.

One of the current projects under development for an upcoming release
of ICU is to have a more precise API, where you can pass in a label
AND a platform (AND version), and get what the platform interprets
that label to mean. That way you can ask for EUC-JP as interpreted
on, say, Solaris.
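The proposed API was still under development, so nothing below is ICU code. As a hedged Python sketch of the lookup idea only: a label resolves to a platform-specific table when one is registered, and otherwise to the label's default. Solaris, glibc and Java are three of the EUC-JP variants Nick mentions; the function name and every table name except ibm-33722 are hypothetical placeholders, and the version dimension is left out for brevity.

```python
from typing import Optional

# Hypothetical registry keyed by (label, platform).
# Only "ibm-33722" comes from this thread (ICU's default table for EUC-JP);
# the other table names are placeholders, not real ICU converter names.
TABLES = {
    ("EUC-JP", None): "ibm-33722",
    ("EUC-JP", "Solaris"): "placeholder-solaris-euc-jp",
    ("EUC-JP", "glibc"): "placeholder-glibc-euc-jp",
    ("EUC-JP", "Java"): "placeholder-java-euc-jp",
}


def resolve_table(label: str, platform: Optional[str] = None) -> str:
    """Return the table `platform` uses for `label`, else the label's default table."""
    key = label.upper()
    return TABLES.get((key, platform)) or TABLES[(key, None)]


print(resolve_table("EUC-JP", "Solaris"))  # platform-specific interpretation
print(resolve_table("euc-jp"))             # plain label falls back to the default
```

Keeping a fallback means a plain "EUC-JP" still resolves to exactly one table, so which interpretation that default should be remains a separate decision.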

Mark
—

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο 
πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

- Original Message -
From: Nick Ing-Simmons [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; SADAHIRO Tomoyuki [EMAIL PROTECTED]
Sent: Friday, February 01, 2002 10:21
Subject: Re: ICU's uconv vs Linux iconv and UTF-8


 Mark Davis [EMAIL PROTECTED] writes:
 ICU's pedantic form
 
 The goal for ICU is to be charset neutral, and support all of the
 conversions that are in modern use. There are a large number of
 variants of character sets;


 Fair enough - but as shipped (I downloaded it earlier this week)
 it comes with a convrtrs.txt which maps MIME's EUC-JP onto
 something it calls ibm-33722 which has the behaviour I reported at
 the start of this thread.

 you can use the one you want.

 It is not a question of which _I_ want - it is a question of which one(s)
 CJK perl users want/expect/need.

 In so far as _I_ want any particular one it is the one which is going
 to match the X11 font encoding so I can in my naive westerner's way
 see what it looks like - and I have not a clue which one that is ...

 See:
 
 http://oss.software.ibm.com/icu/charset/index.html

 A huge list, and I don't see how to grep it for the provenance of
 the table (not that many seem to have any).

 So can the experts - ideally native reading experts not theorists - tell
 me which ICU (or other open source) table(s) they want/expect/need,
 or failing that which ones have proven troublesome.

 There seem to be at least 4 EUC-JP mappings in that list
 AIX, Solaris, glibc and Java

 If we cannot get any answers quickly then I think Dan is correct -
 we should un-bundle the whole CJK encoding stuff from the core into
 a family of CPAN modules.

 Which gives me a design choice:

 A. Bundle a pragmatic set of CJK which are fast and cause least build
    pain for non CJK users (i.e. compact precompiled form)

 B. Make it as easy as possible for end-user to drop in a new encoding
    from (say) a .ucm file.

 I can obviously try for both - but they seem to be pulling in opposite
 directions at present.

 Meanwhile I will go fix the bugs in the core's :encoding logic ...

 --
 Nick Ing-Simmons
 http://www.ni-s.u-net.com/







RE: ICU's uconv vs Linux iconv and UTF-8

2002-02-01 Thread Yves Arrouye

 It is definitely a problem to try to interpret what any given label is
 supposed to be. The problem is that MIME labels and others are
 ambiguous, and are interpreted different ways on different systems.

Still, in the meantime it does make sense to have EUC-JP associated with the
most common interpretation of it, doesn't it? Just for the sake of user
satisfaction?

I am curious: is there a better name for the EUC-JP that ICU is using,
that would make everybody understand which one it is? If so, we could have
EUC-JP for the one that the rest of the world wants.

YA