Not sure if my comments are relevant; I just feel inclined to expose my ignorance:
> Character set difficulties are still a real problem, but so is dynamic
> text. Damian Conway's paper
>
> "An Algorithmic Approach to English Pluralization"
> http://www.csse.monash.edu.au/~damian/papers/HTML/Plurals.html
>
> contains some fairly complicated tools for generating dynamically-
> pluralized English. Now generalize that tool set for multiple
> languages and/or more complex variations.

Right. Japanese is one of those languages that has relatively few specifically plural forms. To get the pluralizations right in Japanese, the program would have to consult a dictionary.

> In my current work, I am generating user-specific explanations for the
> permission and ownership information in (roughly) an "ls -al" listing.
> That is, the user gets three paragraphs, saying (a) what the effect of
> these permissions is on the user, (b) how this was derived, and (c)
> what the item's permissions are, as a whole.

I see why automatic pluralization interests you there. For this purpose, pluralization could probably be ignored in Japanese. But if the goal is to produce text that the technically un-inclined can parse reasonably effortlessly, there are all sorts of other context-related issues, most of which would require not just vocabulary dictionaries but idiom dictionaries as well. Your locale machinery would also have to be sensitive to dialect and social-status issues to make the generated text natural and inoffensive. Japanese is becoming more egalitarian, more homogenized, and less colorful, so those who work on such things are aiming at a moving target.

On the recognizer side, has anyone mentioned that Japanese text does not use word delimiters? Spaces play a somewhat different role in Japanese.

-- 
Joel Rees <[EMAIL PROTECTED]>