Re: Switching to UTF-8

2002-05-05 Thread Pablo Saratxaga

Kaixo!

On Mon, May 06, 2002 at 10:11:34AM +0900, Tomohiro KUBOTA wrote:

> Note for xkb experts who don't know Hiragana/Katakana/Hangul:
> input methods of these scripts need backtracking.  For example,
> in Hangul, imagine I hit keys in the c-v-c-v (c: consonant,
> v: vowel) sequence.  When I hit c-v-c, it should represent one
> Hangul syllable "c-v-c".  However, when I hit the next v, it
> should be two Hangul syllables of "c-v c-v". 

That is only the case with a two-set keyboard; with a three-set keyboard
there is no ambiguity, as there are three groups of keys (V, C1, C2),
allowing for all the possible combinations: C1-V, C1-V-C2. That is, there
are two keys for each consonant: one for the syllable-leading consonant and
one for the syllable-final consonant. (I think the small round glyph that
fills an empty place in a syllable always goes in the C1 position; that is,
a syllable that starts with a vowel is written C1-V or C1-V-C2 with a
special C1 that is not written in Latin transliteration.)

> In Hiragana/Katakana, processing of "n" is complex (though
> it may be less complex than Hangul).

No. The "N" is just a kana like any other; no complexity at all is involved.
Complexity arises only when typing in Latin letters. That is why
transliteration typing will always require an input method anyway;
it cannot be handled with Xkb alone.
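
A minimal sketch of where the complexity comes from when typing in Latin
letters: after an "n" is typed, the converter cannot decide between the
kana "n" and the start of "na", "ni", etc. until it sees the next key. The
table below is a tiny illustrative subset, and the function name and the
"nn" / "n'" conventions are assumptions borrowed from common romaji input
methods, not anything specified in this thread.

```python
# Tiny illustrative table; a real romaji table is much larger.
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ta": "た", "te": "て", "to": "と",
}


def romaji_to_hiragana(text):
    """Convert a romaji string, resolving "n" by looking at the next key."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest match first (two letters, then one).
        for length in (2, 1):
            chunk = text[i:i + length]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += length
                break
        else:
            if text[i] == "n":
                # "n" not followed by a vowel is the standalone kana.
                # "nn" and "n'" are treated as explicit ways to type it.
                out.append("ん")
                i += 2 if text[i + 1:i + 2] in ("n", "'") else 1
            else:
                # Unknown letters pass through unchanged in this sketch.
                out.append(text[i])
                i += 1
    return "".join(out)


print(romaji_to_hiragana("kani"))   # ka + ni
print(romaji_to_hiragana("kan'i"))  # ka + n + i
```

This is a batch conversion with one character of lookahead; an interactive
input method has to keep the "n" pending instead, which is the kind of
state that a plain key-to-symbol mapping does not carry.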



-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/    PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Re: Switching to UTF-8

2002-05-05 Thread Tomohiro KUBOTA

Hi,

At Sun, 5 May 2002 19:12:31 -0400 (EDT),
Jungshik Shin wrote:

> > I believe that you are kidding to say about such a limitation.
> > Japanese language has much less vowels and consonants than Korean,
> > which results in much more homonyms than Korean.  Thus, I think
> 
>   Well, actually it's due to not so much the difference in
> the number of consonants and vowels as  the fact that Korean has
> both closed and open syllables while Japanese has only open syllables
> that makes Japanese have a lot more homonyms than Korean.

You may be right.  Anyway, the real reason is that the Japanese
language has a lot of words borrowed from old Chinese.  Words that
are not homonyms in Chinese become homonyms in Japanese.
(They may or may not be homonyms in Korean.  I believe that
Korean also has a lot of Chinese-origin words.)  Since new words
are coined using the Kanji system, the Japanese language
would lose vitality without Kanji.

>   I don't think Japanese will ever do, either.  However, I'm afraid
> having too many homonyms is a little too 'feeble' a 'rationale' for
> not being able to convert to all phonetic scripts like Hiragana and
> Katakana.
> ...

Since I don't represent Japanese people, I won't say whether it is
a good idea or not to have many homonyms.  You are right, there
are many other reasons for and against using Kanji, and I cannot
explain everything.

Spoken Japanese does have trouble with homonyms, though accent and
rhythm help a great deal.  However, in some cases neither accent
nor context can help.  For example, both science and
chemistry are "kagaku" in Japanese.  So we sometimes call
chemistry "bakegaku", where "bake" is another reading of the
kanji read "ka" in chemistry.  Another famous confusing pair of words
is "private (organization)" and "municipal (organization)",
both of which are "shiritu".  Thus, "private" is sometimes
called "watakushiritu" and "municipal" is called "ichiritu";
again, these aliases come from different readings of the kanji.
If you listen to Japanese news programs every day, you will
hear these examples some day.

These days more and more Japanese people want to learn more
Kanji to make use of their rich expressive power, though
I am not one of these Kanji learners.


>   I also like to know whether it's possible with Xkb.  BTW, if
> we use three-set keyboards (where leading consonants and trailing
> consonants are assigned separate keys) and use U+1100 Hangul Conjoining
> Jamos, Korean Hangul input is entirely possible with Xkb alone.

Note for xkb experts who don't know Hiragana/Katakana/Hangul:
input methods of these scripts need backtracking.  For example,
in Hangul, imagine I hit keys in the c-v-c-v (c: consonant,
v: vowel) sequence.  When I hit c-v-c, it should represent one
Hangul syllable "c-v-c".  However, when I hit the next v, it
should be two Hangul syllables of "c-v c-v". 
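
A rough sketch of that backtracking for a two-set layout, written as a
small state machine. It is deliberately simplified: it ignores compound
vowels and compound finals, accepts only c-v(-c) sequences, and takes
compatibility jamo characters as stand-ins for key presses. The function
names are made up for this illustration; the syllable arithmetic is the
standard Unicode Hangul composition formula.

```python
# Jamo in Unicode composition order (19 leads, 21 vowels, 27 tails + none).
LEAD = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWEL = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAIL = [None] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose(lead, vowel, tail=None):
    """Build one precomposed syllable from its jamo."""
    index = (LEAD.index(lead) * 21 + VOWEL.index(vowel)) * 28 + TAIL.index(tail)
    return chr(0xAC00 + index)


def preedit(lead, vowel, tail):
    """Text for the syllable still being typed."""
    if lead and vowel:
        return compose(lead, vowel, tail)
    return lead or ""


def type_two_set(keys):
    """Return the text that would be shown after each keystroke."""
    committed = ""
    lead = vowel = tail = None
    shown = []
    for key in keys:
        if key in VOWEL:
            if tail:
                # Backtrack: the consonant we had attached as a final
                # turns out to be the lead of a new syllable.
                committed += compose(lead, vowel)
                lead, vowel, tail = tail, key, None
            elif lead and not vowel:
                vowel = key
            else:
                raise ValueError("sketch only handles c-v(-c) sequences")
        elif key in LEAD:
            if lead and vowel and not tail and key in TAIL:
                tail = key  # tentatively a final consonant
            elif lead and not vowel:
                raise ValueError("sketch only handles c-v(-c) sequences")
            else:
                committed += preedit(lead, vowel, tail)
                lead, vowel, tail = key, None, None
        else:
            raise ValueError(f"not a jamo key: {key!r}")
        shown.append(committed + preedit(lead, vowel, tail))
    return shown


for text in type_two_set("ㄱㅏㄴㅏ"):  # c-v-c-v
    print(text)
```

After the third key the display holds one c-v-c syllable; the fourth key
rewrites it into two c-v syllables, which is the step a stateless key map
cannot express.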

In Hiragana/Katakana, processing of "n" is complex (though
it may be less complex than Hangul).

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/




Re: Switching to UTF-8

2002-05-05 Thread Jungshik Shin



On Sun, 5 May 2002, Tomohiro KUBOTA wrote:

> At 02 May 2002 23:54:37 +1000,
> Roger So wrote:

> > I _do_ think xkb is sufficient for Japanese though, if you limit
> > "Japanese" to only hiragana and katagana. ;)
>
> I believe that you are kidding to say about such a limitation.
> Japanese language has much less vowels and consonants than Korean,
> which results in much more homonyms than Korean.  Thus, I think

  Well, actually, what makes Japanese have a lot more homonyms than
Korean is not so much the difference in the number of consonants and
vowels as the fact that Korean has both closed and open syllables
while Japanese has only open syllables.

> native Japanese speakers won't decide to abolish Kanji.

  I don't think the Japanese ever will, either.  However, I'm afraid
that having too many homonyms is a little too 'feeble' a 'rationale' for
not being able to convert to all-phonetic scripts like Hiragana and
Katakana. The easiest counterargument is to ask how Japanese speakers
can tell which homonym is meant in oral communication if Kanji is so
important for disambiguating homonyms. They don't have any Kanji to
help them (well, sometimes you may have to write down Kanji to break
the ambiguity in the middle of a conversation, but I guess that is mostly
limited to proper nouns). I heard that they don't have much trouble
because the context helps a listener a lot in figuring out which
of many homonyms a speaker means. This is true in any language.
Arguably, the same thing could help readers in written communication.
Of course, using logographic/ideographic characters like Kanji certainly
helps readers very much, and that should be a very good reason for Japanese
to keep Kanji in their writing system.

  The English writing system is also 'logographic' in a sense (so is modern
Korean orthography in pure Hangul, as it departs from strict agreement
between pronunciation and spelling), and a spelling reform (to give
English a degree of agreement between spelling and pronunciation
similar to that of Spanish) would make written text harder to read
by depriving it of its 'logographic' nature. On the
other hand, it would help learners and writers. It has always been a struggle
between readers and writers, and between listeners and speakers.

> xkb can be used.  However, more than half of Japanese computer
> users use Romaji-kana conversion, two-keys-one-hiragana/katakana
> method.  The complexity of the algorithm is like two or three-key
> input method of Hangul, I think.  Do you think such an algorithm
> can be implemented as xkb?  If yes, I think Romaji-kana conversion
> (whose complexity is like Hangul input method) can be implemented
> as xkb.

  I would also like to know whether it's possible with Xkb.  BTW, if
we use three-set keyboards (where leading consonants and trailing
consonants are assigned separate keys) and use the U+1100 Hangul Conjoining
Jamo, Korean Hangul input is entirely possible with Xkb alone.
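
  To illustrate why the three-set case needs no state, here is a small
sketch in which every key maps to exactly one conjoining jamo code point.
The key names and the layout fragment are hypothetical; only the code
points are real. The NFC step is there just to show that the jamo
sequence is canonically equivalent to precomposed syllables; how an
unnormalized sequence is displayed depends on the font and renderer.

```python
import unicodedata

# Hypothetical three-set layout fragment: the key names are made up for
# illustration; the values are Unicode Hangul conjoining jamo.
KEYMAP = {
    "C1-h": "\u1112",   # leading consonant HIEUH
    "C1-g": "\u1100",   # leading consonant KIYEOK
    "V-a": "\u1161",    # vowel A
    "V-eu": "\u1173",   # vowel EU
    "C2-n": "\u11ab",   # trailing consonant NIEUN
    "C2-l": "\u11af",   # trailing consonant RIEUL
}


def type_three_set(keys):
    """Stateless mapping: one key press, one code point, no backtracking."""
    return "".join(KEYMAP[key] for key in keys)


typed = type_three_set(["C1-h", "V-a", "C2-n", "C1-g", "V-eu", "C2-l"])
composed = unicodedata.normalize("NFC", typed)

print(len(typed), len(composed))  # 6 2
print(composed)
```

  Because leading and trailing consonants have distinct code points, no
key press ever has to revise an earlier one.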

  Jungshik Shin





Re: Switching to UTF-8

2002-05-05 Thread Roger So

On Sun, 2002-05-05 at 21:00, Tomohiro KUBOTA wrote:
> At 02 May 2002 23:54:37 +1000,
> Roger So wrote:
> > Note that the source from Li18nux will try to use its own encoding
> > conversion mechanisms on Linux, which is broken.  You need to tell it to
> > use iconv instead.
> 
> I didn't know that because I am not a user of IIIMF nor other Li18nux
> products.  How it is broken?

The csconv library that IIIMF comes with doesn't work properly (at least
I didn't get it to work), possibly because of endianness issues.  csconv
is meant to be a cross-platform replacement for iconv.

> > Maybe I should attempt to package it for Debian again, now that woody is
> > almost out of the way.  (I have the full IIIMF stuff working well on my
> > development machine.)
> 
> I found that Debian has "iiimecf" package.  Do you know what it is?

It's the IIIM Emacs Client Framework.  As the name implies, it's an
implementation of an IIIM client in Emacs.  I've never tried it out, as
I don't use Emacs. :)

Is it used by anyone?  Last time I checked, popularity-contest said
nobody was using it...

> > I _do_ think xkb is sufficient for Japanese though, if you limit
> > "Japanese" to only hiragana and katagana. ;)
> 
> I believe that you are kidding to say about such a limitation.
> Japanese language has much less vowels and consonants than Korean,
> which results in much more homonyms than Korean.  Thus, I think
> native Japanese speakers won't decide to abolish Kanji.
> (Please don't be kidding in international mailing list, because
> people who don't know about Japanese may think you are talking
> about serious story.)

Sorry, it wasn't meant to be a serious comment. :)

Cheers

Roger
-- 
  Roger So                 Debian Developer
  Sun Wah Linux Limited    i18n/L10n Project Leader
  Tel: +852 2250 0230      [EMAIL PROTECTED]
  Fax: +852 2259 9112      http://www.sw-linux.com/




Re: Switching to UTF-8

2002-05-05 Thread Tomohiro KUBOTA

Hi,

At 02 May 2002 23:54:37 +1000,
Roger So wrote:

> Note that the source from Li18nux will try to use its own encoding
> conversion mechanisms on Linux, which is broken.  You need to tell it to
> use iconv instead.

I didn't know that because I am not a user of IIIMF or other Li18nux
products.  How is it broken?


> Maybe I should attempt to package it for Debian again, now that woody is
> almost out of the way.  (I have the full IIIMF stuff working well on my
> development machine.)

I found that Debian has an "iiimecf" package.  Do you know what it is?


> I don't think xkb is sufficient because (1) there's a large number of
> different Chinese input methods out there, and (2) most of the input
> methods require the user to choose from a list of candidates after
> preedit.
> 
> I _do_ think xkb is sufficient for Japanese though, if you limit
> "Japanese" to only hiragana and katagana. ;)

I believe you are joking about such a limitation.
The Japanese language has far fewer vowels and consonants than Korean,
which results in many more homonyms than in Korean.  Thus, I think
native Japanese speakers won't decide to abolish Kanji.
(Please don't joke like this on an international mailing list, because
people who don't know Japanese may think you are being
serious.)

Even if we limit ourselves to input of hiragana/katakana, xkb may not be
sufficient.  For the one-key-one-hiragana/katakana method, I think
xkb can be used.  However, more than half of Japanese computer
users use Romaji-kana conversion, a two-keys-one-hiragana/katakana
method.  The complexity of the algorithm is like that of the two- or
three-key input methods for Hangul, I think.  Do you think such an
algorithm can be implemented in xkb?  If so, I think Romaji-kana
conversion can be implemented in xkb as well.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/

