On Dec 31, 2007 7:54 PM, Michael McCandless <[EMAIL PROTECTED]>
wrote:

> I actually think indexing should try to be as robust as possible.  You
> could test like crazy and never hit a massive term, go into production
> (say, ship your app to lots of your customers' computers) only to
> suddenly see this exception.  In general it could be a long time before
> you or your users "accidentally" see this.
>
> So I'm thinking we should have the default behavior, in IndexWriter,
> be to skip immense terms?
>
> Then people can use TokenFilter to change this behavior if they want.
>

+1

At first I saw this as similar to IndexWriter.setMaxFieldLength(), but that
was the wrong comparison: the number of terms is a "real" indexing/search
characteristic that many applications can benefit from being able
to modify, whereas a huge token is in most cases a bug.

Just to make sure about the scenario: the only change is to skip tokens
that are too long, while any other exception is still thrown (not ignored).

Also, when a token is skipped, I think the position increment of the
following token should be increased, so that the gap left by the skipped
token is preserved.
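
To make the position increment idea concrete, here is a rough sketch of
what I have in mind. It does not use the Lucene classes: Tok,
SkipLongTokens and the maxLength parameter are hypothetical names made up
for illustration, and the real change would live inside the indexing
chain rather than in a standalone helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer token; not the Lucene Token class.
final class Tok {
    final String text;
    final int positionIncrement;

    Tok(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

final class SkipLongTokens {
    // Drops tokens longer than maxLength and carries their position
    // increments over to the next token that is kept.
    static List<Tok> filter(List<Tok> input, int maxLength) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept one
        for (Tok t : input) {
            if (t.text.length() > maxLength) {
                // Too long: skip it, but remember the position it occupied.
                skipped += t.positionIncrement;
            } else {
                kept.add(new Tok(t.text, t.positionIncrement + skipped));
                skipped = 0;
            }
        }
        return kept;
    }
}
```

With three tokens that each have increment 1, where the middle one is too
long, the last token ends up with increment 2, so a phrase query would not
treat the first and last tokens as adjacent.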
