Hi.

I'm a newbie at language processing, so I'm wondering what kind of data is suitable for a training corpus.
For English, what is the best training corpus?

On Tue, Apr 5, 2011 at 4:16 PM, Jörn Kottmann <[email protected]> wrote:
> On 4/5/11 8:25 AM, Toshiya TSURU wrote:
>>
>> Hi.
>>
>> I'm a software developer in Tokyo,Japan.
>> I found that RapidMiner uses OpenNLP for its tokenization process.
>>
>> But, the token given by RapidMiner is strange.
>> Because There is no Tokenizer model for Japanese.
>>
>> Although I've checked the page below,
>> The models For Japanese is not found.
>> http://opennlp.sourceforge.net/models-1.5/
>>
>> How can I get Japanese model?
>> Or Can I create one?
>>
> Currently we do not have support for Japanese, but
> we would be happy to add it.
>
> Do you know a training corpus we could use?
>
> Jörn
>

-- 
Toshiya TSURU <[email protected]>
http://twitter.com/turutosiya
