Re: [MacPerl] Re: problem with Japanese text

Dan Kogai Thu, 27 Mar 2003 21:13:30 -0800

On Friday, Mar 28, 2003, at 11:37 Asia/Tokyo, Joel Rees wrote:

Not sure if my comments are relevant, just feeling inclined to expose my ignorance --

And here is mine.

Japanese is one of those languages that has relatively few specifically
plural forms. To get the pluralizations right in Japanese, the program
would have to consult a dictionary.

More precisely, Japanese has no plural form in the sense that Indo-European languages do. Japanese totally lacks subject-verb agreement, so you don't have to delete the "es" in "does" when you change the subject from "s/he" to "they".

On the other hand, counting can be tricky even for natives. The very names of the numbers change depending on what you count. When you count people it goes hito-ri, futa-ri, san-nin, but when you count objects it goes hito-tsu, futa-tsu (or ik-ko, ni-ko), and the list goes on (I think this "number-object agreement" came from Chinese).
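The counting rule above can be sketched as a lookup keyed on both the number and the kind of thing counted. This is only a toy illustration: the table holds nothing beyond the romanized forms quoted in this message, and the names COUNTER_FORMS, count_form, and the counter-class labels are made up for the example.

```python
# Toy table: the spoken form of a number depends on the counter class.
# Only the forms quoted in this message are listed; class labels are hypothetical.
COUNTER_FORMS = {
    "people": {1: "hito-ri", 2: "futa-ri", 3: "san-nin"},
    "objects_tsu": {1: "hito-tsu", 2: "futa-tsu"},
    "objects_ko": {1: "ik-ko", 2: "ni-ko"},
}


def count_form(n, counter_class):
    """Return the romanized form for n in counter_class, or None if the toy table lacks it."""
    return COUNTER_FORMS.get(counter_class, {}).get(n)


# The same number comes out differently for people and for objects.
print(count_form(2, "people"))
print(count_form(2, "objects_tsu"))
```

A real localization layer would need a much larger table per counter, which is the point: the number alone does not determine the word.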

But when the number is not an issue, you can totally forget whether a subject is singular or plural.

Pluralization could probably be ignored for this purpose for Japanese, but, if the purpose is to produce text that the technically un-inclined can parse reasonably effortlessly, there are all sorts of other context related issues, most of which would require not just vocabulary dictionaries, but idiom dictionaries as well. And your locale machinery would have to include some sensitivity to dialect issues and social status issues, to make the generated text natural and non-offending.

I feel Japanese is a hard language to compose in because of that, but that also makes Japanese easier to read, because Japanese tends to include not only "what is said" but also "in what situation and by what kind of person it is said". In English the first-person singular nominative pronoun is nothing but "I", no matter how old or young you are or whether you are a boy or a girl (or a computer). But in Japanese it can be "Watashi" or "Boku" or "Ore" or "Maro" or "Warawa" or "Sessha" or "Jibun" or "Ware" ... even English "me" can be used.

Maybe to compensate for this complexity, Japanese grammar seems much simpler: no subject-verb agreement, very few irregular verbs... It is far easier to compose grammatically correct Japanese. It gets darn hard once you aim for social and political correctness.

Japanese is becoming more egalitarian, more homogenized, and less colorful, so those who work on such things are aiming at a moving target.

Less colorful, I am not sure, because while the newer, simpler, and more boring expressions are pervasive, the old and more complex expressions hardly die. So in total Japanese is getting richer. Well, the total richness of Japanese is, I believe, increasing, but ironically "per-capita" richness might not be. But I believe this phenomenon is not unique to Japanese; it could be even more prevalent in English. If you don't believe me, just compare the two Bushes in the White House :)

Thinking about the recognizer side, did anyone mention that Japanese
text does not use word delimiters? Space has a somewhat different
meaning for Japanese.

Japanese tokenization is anything but a trivial issue. In Japanese the very notion of "a word" is often moot. Nevertheless, we do have tokenizers good enough to implement input methods and search engines. Of course they are not perfect, but the Japanese are very frank about the lack of perfection. After all, we don't even have a "de jure standard Japanese" to compare against.
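To make the tokenization point concrete, here is a minimal sketch of dictionary-based greedy longest-match segmentation, one of the simplest ways to split text that has no word delimiters. It is not how any particular input method or search engine works; the function name tokenize and the six-entry TOY_DICTIONARY are assumptions made up for the example.

```python
def tokenize(text, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary entry that matches;
    a character with no match becomes a one-character token.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary; a real one would hold far more entries.
TOY_DICTIONARY = {"私", "は", "日本", "日本語", "を", "話す"}

print(tokenize("私は日本語を話す", TOY_DICTIONARY))
```

Because both "日本" and "日本語" are in the toy dictionary, the greedy rule picks the longer one. That choice is sometimes wrong in real text, which is one reason these tokenizers are good enough rather than perfect.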

Dan the Man with Too Many Languages to Deal with
