ok https://issues.apache.org/jira/browse/LUCENE-2407
On Wed, Apr 21, 2010 at 4:18 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> Can you open a bug report to make this configurable, so we don't forget
> this? E.g. StandardTokenizer is able to change this.
>
> Thanks,
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -----Original Message-----
>> From: jm [mailto:jmugur...@gmail.com]
>> Sent: Wednesday, April 21, 2010 3:59 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: are long words split into up to 256 long tokens?
>>
>> Oh, yes, it does extend CharTokenizer. Thanks, Ahmet. I had searched
>> the Lucene source code for 256 and found nothing suspicious, and that was
>> itself suspicious because it looked clearly like an inner limit. Of
>> course I should have searched for 255...
>>
>> I'll see how I proceed, because I don't want to use a custom build.
>>
>> On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> >> Is 256 also some inner maximum in some Lucene internal that causes
>> >> this? What is happening is that the long word is split into smaller
>> >> words of up to 256 and then the min and max limits are applied. Is
>> >> that correct? I have removed LengthFilter and still see the
>> >> splitting at 256 happen. I would like not to have this, and instead
>> >> to remove altogether any word longer than max, without decomposing
>> >> it into smaller ones. Is there a way to achieve this?
>> >>
>> >> Using Lucene 3.0.1
>> >
>> > Assuming your Tokenizer extends CharTokenizer:
>> >
>> > CharTokenizer.java has this field:
>> > private static final int MAX_WORD_LEN = 255;
>> >
>> > You can modify CharTokenizer.java according to your needs.
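For anyone reading the archive later, here is a rough sketch of the behaviour discussed above. It is plain Java with no Lucene dependency and is only a simplified model, not the actual CharTokenizer code. The class and method names (LongWordDemo, splitAtLimit, dropLongWords, repeat) are made up for illustration; only the value 255 comes from the MAX_WORD_LEN field quoted in the thread. The first method emits a token whenever its buffer fills up, which is why a long word comes out in chunks. The second reads each word whole and discards it if it is too long, which is the behaviour jm asked for.

```java
import java.util.ArrayList;
import java.util.List;

public class LongWordDemo {
    // Same value as the MAX_WORD_LEN field quoted from CharTokenizer.java.
    static final int MAX_WORD_LEN = 255;

    // Simplified model of a buffer-capped letter tokenizer: a token is emitted
    // at a non-letter OR when the buffer is full, so long words get chunked.
    static List<String> splitAtLimit(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                buf.append(c);
                if (buf.length() == maxLen) { // buffer full: emit a chunk
                    tokens.add(buf.toString());
                    buf.setLength(0);
                }
            } else if (buf.length() > 0) {    // word boundary: emit the word
                tokens.add(buf.toString());
                buf.setLength(0);
            }
        }
        if (buf.length() > 0) {
            tokens.add(buf.toString());
        }
        return tokens;
    }

    // Alternative: find each whole word first, then keep it only if it fits.
    static List<String> dropLongWords(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 if none
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter && start < 0) {
                start = i;
            } else if (!letter && start >= 0) {
                if (i - start <= maxLen) {
                    tokens.add(text.substring(start, i));
                }
                start = -1;
            }
        }
        return tokens;
    }

    // Hypothetical helper: builds a string of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "short " + repeat('a', 600) + " word";
        List<Integer> lengths = new ArrayList<>();
        for (String t : splitAtLimit(text, MAX_WORD_LEN)) {
            lengths.add(t.length());
        }
        System.out.println("chunked token lengths: " + lengths);
        System.out.println("long words dropped:    " + dropLongWords(text, MAX_WORD_LEN));
    }
}
```

In this model the 600-letter word comes out as chunks of 255, 255 and 90 letters. That also shows why a length filter applied afterwards cannot remove the word as a whole: it only ever sees the chunks, and the last one may well fall inside the allowed range.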
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org