Oh, yes, it does extend CharTokenizer. Thanks, Ahmet. I had searched the Lucene source code for 256 and found nothing suspicious, which was itself suspicious, because it clearly looked like an internal limit. Of course I should have searched for 255...
I'll see how to proceed, since I don't want to use a custom build.

On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> Is 256 some inner maximum too in some lucene internal that causes
>> this? What is happening is that the long word is split into smaller
>> words up to 256 and then the min and max limit applied. Is that
>> correct? I have removed LengthFilter and still see the splitting at
>> 256 happen. I would like not to have this, and instead remove
>> altogether any word longer than max, without decomposing it into
>> smaller ones. Is there a way to achieve this?
>>
>> Using lucene 3.0.1
>
> Assuming your Tokenizer extends CharTokenizer:
>
> CharTokenizer.java has this field:
> private static final int MAX_WORD_LEN = 255;
>
> you can modify CharTokenizer.java according to your needs.
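For the record, here is the rough, untested direction I have in mind instead of patching CharTokenizer: a small standalone Tokenizer written against the 3.0 attribute API that accumulates letter runs of any length and silently drops whole words longer than a configurable maximum, rather than splitting them at 255. This is only a sketch; the class name MaxLengthLetterTokenizer and the maxTokenLen parameter are my own placeholders, and the Character.isLetter() test stands in for whatever token-character rule your real tokenizer uses.

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// Sketch only: reads letter runs of arbitrary length and discards any
// word longer than maxTokenLen instead of breaking it into pieces.
public final class MaxLengthLetterTokenizer extends Tokenizer {

  private final int maxTokenLen; // whatever your "max" is, e.g. 50
  private final TermAttribute termAtt = addAttribute(TermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private int pos = 0; // absolute read position in the character stream

  public MaxLengthLetterTokenizer(Reader input, int maxTokenLen) {
    super(input);
    this.maxTokenLen = maxTokenLen;
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    StringBuilder buf = new StringBuilder();
    int start = -1;
    int c;
    while ((c = input.read()) != -1) {
      pos++;
      if (Character.isLetter((char) c)) { // stand-in for your token-char test
        if (start == -1) {
          start = pos - 1;
        }
        buf.append((char) c); // no 255-char cap here
      } else if (start != -1) {
        if (buf.length() <= maxTokenLen) {
          return emit(buf, start);
        }
        // too long: drop the whole word and keep scanning
        buf.setLength(0);
        start = -1;
      }
    }
    if (start != -1 && buf.length() <= maxTokenLen) {
      return emit(buf, start);
    }
    return false;
  }

  private boolean emit(StringBuilder buf, int start) {
    termAtt.setTermBuffer(buf.toString());
    offsetAtt.setOffset(correctOffset(start), correctOffset(start + buf.length()));
    return true;
  }

  @Override
  public void end() {
    int finalOffset = correctOffset(pos);
    offsetAtt.setOffset(finalOffset, finalOffset);
  }

  @Override
  public void reset(Reader input) throws IOException {
    super.reset(input);
    pos = 0;
  }
}

Used in place of the existing tokenizer inside an Analyzer, this would make a separate LengthFilter unnecessary for the over-long case, though a LengthFilter could still enforce the minimum length.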