Not sure if my comments are relevant, just feeling inclined to expose my
ignorance --

> Character set difficulties are still a real problem, but so is dynamic
> text.  Damian Conway's paper
> 
>    "An Algorithmic Approach to English Pluralization"
>    http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html
> 
> contains some fairly complicated tools for generating dynamically-
> pluralized English.  Now generalize that tool set for multiple
> languages and/or more complex variations.  Right.

Japanese is one of those languages with relatively few distinct plural
forms. To get the pluralization right in Japanese, the program would
have to consult a dictionary.
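
A minimal sketch of what that dictionary lookup might look like, in
Python. The table and the function name are made up for illustration;
a real word list would be far larger, and whether an explicit plural
is appropriate at all depends on context.

```python
# Hypothetical mini-dictionary: nouns that have an explicit plural form.
# Most Japanese nouns have none and are written the same way either way.
PLURAL_FORMS = {
    "人": "人々",        # person -> people
    "私": "私たち",      # I -> we
    "子供": "子供たち",  # child -> children
}

def plural_form(noun):
    """Return the listed plural form, or the noun unchanged if none is listed."""
    return PLURAL_FORMS.get(noun, noun)

print(plural_form("人"))  # listed: returns the explicit plural
print(plural_form("本"))  # not listed: returned unchanged
```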

> In my current work, I am generating user-specific explanations for the
> permission and ownership information in (roughly) an "ls -al" listing.
> That is, the user gets three paragraphs, saying (a) what the effect of
> these permissions is on the user, (b) how this was derived, and (c)
> what the item's permissions are, as a whole. 

I see the reason for the interest in automatic pluralization there.

Pluralization could probably be ignored for this purpose in Japanese.
But if the goal is text that non-technical readers can parse with
little effort, there are all sorts of other context-related issues,
most of which would require not just vocabulary dictionaries but idiom
dictionaries as well. Your locale machinery would also have to be
sensitive to dialect and social status to keep the generated text
natural and inoffensive.

Japanese is becoming more egalitarian, more homogenized, and less
colorful, so those who work on such things are aiming at a moving target.

Thinking about the recognizer side, did anyone mention that Japanese
text does not use word delimiters? Space has a somewhat different
meaning for Japanese.
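
To illustrate, here is a rough Python sketch. Splitting on whitespace
returns a Japanese sentence as one unbroken token, so a recognizer
needs some other way to find words. The greedy longest-match approach
and the four-entry lexicon below are assumptions chosen for brevity,
not how any particular tool does it.

```python
# Toy lexicon, hypothetical and far too small for real use.
LEXICON = {"私", "は", "学生", "です"}

def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character and move on.
    """
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

sentence = "私は学生です"  # "I am a student"
print(sentence.split())   # ['私は学生です']
print(segment(sentence))  # ['私', 'は', '学生', 'です']
```

Longest-match is easy to fool on ambiguous input, which is one more
reason the recognizer side needs a real dictionary.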

-- 
Joel Rees <[EMAIL PROTECTED]>
