Can you open a bug report to make this configurable, so we don't forget it? StandardTokenizer, for example, already lets you change its limit.
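To illustrate what a configurable limit could look like, here is a minimal sketch. It is not Lucene code and does not use any Lucene API: the class `LongWordSketch`, the method `tokenize`, and the `dropOverlong` flag are hypothetical names made up for this example. It only models the behaviour described in this thread (a token is emitted as soon as the buffer reaches the maximum length, so a long word is cut into pieces) next to the behaviour that was asked for (discard the overlong word entirely).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a length-capped tokenizer; NOT Lucene's CharTokenizer.
public class LongWordSketch {

    // Splits text on non-letter/digit characters.
    // dropOverlong == false: emit a token once maxWordLen chars are buffered (word gets split).
    // dropOverlong == true:  discard any word longer than maxWordLen.
    public static List<String> tokenize(String text, int maxWordLen, boolean dropOverlong) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean overlong = false; // true while skipping the rest of a discarded word
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                if (overlong) {
                    continue;
                }
                if (dropOverlong && buf.length() == maxWordLen) {
                    // One char too many: throw the whole word away.
                    buf.setLength(0);
                    overlong = true;
                    continue;
                }
                buf.append(c);
                if (!dropOverlong && buf.length() == maxWordLen) {
                    // Cap reached: emit what we have, the rest becomes a new token.
                    tokens.add(buf.toString());
                    buf.setLength(0);
                }
            } else {
                if (buf.length() > 0) {
                    tokens.add(buf.toString());
                }
                buf.setLength(0);
                overlong = false;
            }
        }
        if (buf.length() > 0) {
            tokens.add(buf.toString());
        }
        return tokens;
    }

    // Small helper so the example does not depend on String.repeat (Java 11+).
    public static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = repeat('a', 600) + " ok";
        // Splitting mode: token lengths 255, 255, 90, 2
        for (String t : tokenize(text, 255, false)) {
            System.out.println(t.length());
        }
        // Dropping mode: only "ok" survives
        System.out.println(tokenize(text, 255, true));
    }
}
```

With a cap of 255, the 600-character word comes out as pieces of 255, 255 and 90 characters in splitting mode, which is why a LengthFilter applied afterwards never sees the original word. In dropping mode only "ok" is returned.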
Thanks,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: jm [mailto:jmugur...@gmail.com]
> Sent: Wednesday, April 21, 2010 3:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: are long words split into up to 256 long tokens?
>
> Oh, yes, it does extend CharTokenizer. Thanks, Ahmet. I had searched the
> Lucene source code for 256 and found nothing suspicious, which was
> itself suspicious because it clearly looked like an internal limit. Of
> course I should have searched for 255...
>
> I'll see how I proceed, because I don't want to use a custom build.
>
> On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >> Is 256 some internal maximum in Lucene that causes this? What is
> >> happening is that the long word is split into smaller words of up
> >> to 256 characters, and then the min and max limits are applied. Is
> >> that correct? I have removed LengthFilter and still see the
> >> splitting at 256 happen. I would like to avoid this and instead
> >> remove any word longer than max altogether, without decomposing it
> >> into smaller ones. Is there a way to achieve this?
> >>
> >> Using Lucene 3.0.1
> >
> > Assuming your Tokenizer extends CharTokenizer:
> >
> > CharTokenizer.java has this field:
> > private static final int MAX_WORD_LEN = 255;
> >
> > You can modify CharTokenizer.java according to your needs.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org