I'm looking for Chinese and Arabic tokenizers. We've been using Stanford's for a while, but it has drawbacks: the Chinese mode loads its statistical models very slowly, the Arabic mode stems the resulting tokens, and the coup de grâce is that their latest jar update (9 days ago) was compiled to run only with Java 1.8.
So, with the exception of Stanford, what choices are available for Chinese and Arabic that you're finding worthwhile?

Thanks!
Tom

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support