I'm looking for Chinese and Arabic tokenizers. We've been using 
Stanford's for a while, but it has its drawbacks. The Chinese mode loads its 
statistical models very slowly. The Arabic mode stems the resulting 
tokens. The coup de grâce is that their latest jar update (9 days ago) 
was compiled to run only with Java 1.8.
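
To give a sense of what "slowly" means, a minimal Java sketch along these
lines times just the model deserialization. The class name and the model
path are placeholders of mine, not anything from the Stanford docs, so
point it at wherever your copy of the CTB model lives:

import edu.stanford.nlp.ie.crf.CRFClassifier;

public class SegmenterLoadTimer {
    public static void main(String[] args) throws Exception {
        // Placeholder path: the CTB model shipped with the Stanford
        // Word Segmenter distribution; adjust for your install.
        String modelPath = args.length > 0 ? args[0] : "data/ctb.gz";

        // Time only the CRF model load, since that's the step that
        // dominates startup for the Chinese mode.
        long start = System.currentTimeMillis();
        CRFClassifier<?> segmenter = CRFClassifier.getClassifier(modelPath);
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("Loaded " + modelPath + " in " + elapsed + " ms");
    }
}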

So, aside from Stanford, what Chinese and Arabic tokenizers are you 
finding worthwhile?

Thanks!
Tom
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
