The tokenizer assumes it can always split on whitespace, so this will not work without modifying the code.
You could hack around it by replacing all whitespace with a special character in your training and test data.

For which language do you need that?

Jörn

On Sat, Feb 11, 2012 at 6:46 PM, Lee Hinman wrote:
> Hey Guys,
>
> I'm trying to train a tokenizer that ignores spaces and only uses <SPLIT>
> to determine where to split. I wasn't able to find anything in the
> javadocs, is this possible with OpenNLP? If so, could someone point me in
> the right direction regarding it?
>
> - Lee Hinman
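The whitespace-replacement hack suggested above could look roughly like the following preprocessing step. This is only a sketch, assuming that if every whitespace run becomes a placeholder character, each line turns into a single whitespace-delimited chunk and only the <SPLIT> tags mark token boundaries. The class name `WhitespaceHack`, the methods `protect`/`restore`, and the choice of U+2581 as the placeholder are all hypothetical; pick any character that never occurs in your real data. It does not call any OpenNLP API; you would run `protect` over the training file and over the test input before handing them to the tokenizer, and `restore` over the predicted tokens afterwards.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical preprocessing helper for the "replace whitespace with a
 * special character" workaround. Not part of OpenNLP.
 */
public class WhitespaceHack {

    // Hypothetical placeholder: U+2581 LOWER ONE EIGHTH BLOCK.
    // Must never appear in the real training or test data.
    static final char PLACEHOLDER = '\u2581';

    // Matches one or more consecutive whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /**
     * Trims the line and replaces every whitespace run with a single
     * placeholder, so the tokenizer no longer sees any whitespace and
     * only the <SPLIT> tags remain as boundaries.
     * Apply to both training lines and test input.
     */
    static String protect(String line) {
        return WHITESPACE.matcher(line.trim()).replaceAll(String.valueOf(PLACEHOLDER));
    }

    /** Turns placeholders inside a predicted token back into spaces. */
    static String restore(String token) {
        return token.replace(PLACEHOLDER, ' ');
    }

    public static void main(String[] args) {
        // A training line in the <SPLIT> format mentioned in the question.
        String training = "Hello world<SPLIT>, how are you<SPLIT>?";
        String protectedLine = protect(training);
        System.out.println(protectedLine.equals(
                "Hello\u2581world<SPLIT>,\u2581how\u2581are\u2581you<SPLIT>?"));
        System.out.println(restore("Hello\u2581world"));
    }
}
```

Collapsing whitespace runs into a single placeholder keeps the training and test data consistent; if the exact number of spaces matters for your language, replace each whitespace character individually instead.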
