Make sure you're separating the issues here...
Uh, sorry, the issue doesn't really have much to do with your patch, except that it is in roughly the same region of code. I should have started a new thread.
Probably more than just Western; here's a Japanese description:
http://java.sun.com/j2se/1.3/ja/docs/ja/api/java/text/BreakIterator.html
Yeah, UTR-14 can handle Japanese. However, it states explicitly that it needs context in order to determine whether certain characters are treated as "alphabetic" (no breaks in between) or "ideographic" (break opportunity in between). I can't see how BreakIterator gets this context.
Yes. See http://java.sun.com/j2se/1.3/docs/api/index.html
Darn! I didn't think of the online docs!
Again, I don't know the code, but it may be a good idea.
Well, the problem is: BreakIterator returns break opportunities, not actual breaks. How would this fit into the LM framework?
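To make the "opportunities, not breaks" point concrete, here is a minimal sketch using the JDK's java.text.BreakIterator line instance. The class and method names (LineBreakDemo, breakOpportunities, segments) are hypothetical, and treating each segment as an unbreakable unit is only an assumption about how a layout manager might consume the result, not how the LM framework actually works:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakDemo {
    // Returns every offset at which BreakIterator reports a line-break
    // opportunity, including the start (0) and the end of the text.
    static List<Integer> breakOpportunities(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    // Splits the text into the pieces between consecutive opportunities.
    // Assumption (hypothetical): a layout manager would treat each piece as
    // an unbreakable unit and still decide itself where to actually break.
    static List<String> segments(String text, Locale locale) {
        List<Integer> offsets = breakOpportunities(text, locale);
        List<String> result = new ArrayList<>();
        for (int i = 1; i < offsets.size(); i++) {
            result.add(text.substring(offsets.get(i - 1), offsets.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Break opportunities, not breaks.";
        for (String s : segments(text, Locale.ENGLISH)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The iterator never says which opportunity to use; choosing among them (line width, penalties, hyphenation) would still be the layout code's job.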
They don't deal with hyphenation, unfortunately. Anything that is based on Sun Java code (and an official standard like UTR-14) probably makes our life much easier: anyone who has a complaint about the hyphenation decisions can go complain to Sun!
J.Pietschmann