On Dec 31, 2007 6:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> > Doron Cohen <[EMAIL PROTECTED]> wrote:
> > > I like the approach of configuration of this behavior in Analysis
> > > (and so IndexWriter can throw an exception on such errors).
> > >
> > > It seems that this should be a property of Analyzer vs.
> > > just StandardAnalyzer, right?
> > >
> > > It can probably be a "policy" property, with two parameters:
> > > 1) maxLength, 2) action: chop/split/ignore/raiseException when
> > > generating too long tokens.
> >
> > Agreed, this should be generic/shared to all analyzers.
> >
> > But maybe for 2.3, we just truncate any too-long term to the max
> > allowed size, and then after 2.3 we make this a settable "policy"?
>
> But we already have a nice component model for analyzers...
> why not just encapsulate truncation/discarding in a TokenFilter?
Makes sense, especially for the implementation aspect. I'm not sure
which API you have in mind:

(1) leave that to applications, which would append such a TokenFilter
    to their Analyzer (== no change on our side),
(2) have DocumentsWriter create such a TokenFilter under the covers,
    to force behavior that is defined (where?), or
(3) have an IndexingTokenFilter assigned to IndexWriter, make the
    default such filter trim/ignore/whatever as discussed, and then
    let applications set a different IndexingTokenFilter to change
    the default behavior.

I think I like the third option - is this what you meant? A rough
sketch of what such a filter might look like is below.

Doron
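
P.S. For illustration only, here is a minimal sketch of a
length-enforcing filter along the lines discussed, written against the
pre-attribute TokenStream.next() API of the 2.x line. The class name
MaxTermLengthFilter and the TRUNCATE/SKIP policy constants are
hypothetical, not existing Lucene API, and the sketch only covers two
of the possible actions (chop and ignore):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Hypothetical sketch: enforces a maximum term length on the tokens
 * produced by the wrapped stream, either truncating oversized terms
 * or dropping them entirely, depending on the configured policy.
 */
public class MaxTermLengthFilter extends TokenFilter {

  /** Chop too-long terms down to maxLength characters. */
  public static final int TRUNCATE = 0;
  /** Silently drop too-long terms. */
  public static final int SKIP = 1;

  private final int maxLength;
  private final int policy;

  public MaxTermLengthFilter(TokenStream input, int maxLength, int policy) {
    super(input);
    this.maxLength = maxLength;
    this.policy = policy;
  }

  public Token next() throws IOException {
    Token t;
    while ((t = input.next()) != null) {
      String text = t.termText();
      if (text.length() <= maxLength) {
        return t;                         // within the limit: pass through
      }
      if (policy == TRUNCATE) {
        // Chop the term to the maximum allowed size. Note the offsets
        // still span the original (untruncated) text in the source.
        return new Token(text.substring(0, maxLength),
                         t.startOffset(), t.endOffset(), t.type());
      }
      // SKIP policy: drop this token and keep reading the stream.
    }
    return null;                          // end of stream
  }
}

Under option (3) above, IndexWriter would wrap every analyzer's output
in a filter like this by default; under option (1), applications would
simply append it in their own Analyzer.tokenStream() implementations.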