Can you open a bug report to make this configureable, so we don't forget this? 
E.g. StandardTokenizer is able to change this.

Thanks,
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: jm [mailto:jmugur...@gmail.com]
> Sent: Wednesday, April 21, 2010 3:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: are long words split into up to 256 long tokens?
> 
> oh, yes it does extend CharTokenizer..thanks Ahmet. I had searched
> lucene source code for 256 and found nothing suspicious, and that was
> itself suspicious cause it looked clearly like an inner limit. Of
> course I should have searched for 255...
> 
> I'll see how I proceed cause I don't want to use a custom build.
> 
> On Wed, Apr 21, 2010 at 3:50 PM, Ahmet Arslan <iori...@yahoo.com>
> wrote:
> >> Is 256 some inner maximum too
> >> in some
> >> lucene internal that causes this? What is happening is that
> >> the long
> >> word is split into smaller words up to 256 and then the min
> >> and max
> >> limit applied. Is that correct? I have removed LengthFilter
> >> and still
> >> see the splitting at 256 happen. I would like not to have
> >> this, and
> >> removed altogheter any word longer than max, wihtout
> >> decomposing into
> >> smaller ones. Is there a way to achieve this?
> >>
> >> Using lucene 3.0.1
> >
> >
> > Assuming your Tokenizer extends CharTokenizer:
> >
> > CharTokenizer.java has this field:
> > private static final int MAX_WORD_LEN = 255;
> >
> > you can modify CharTokenizer.java according to your needs.
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to