On Dec 31, 2007 6:10 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Dec 31, 2007 5:53 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> > Doron Cohen <[EMAIL PROTECTED]> wrote:
> > > I like the approach of configuration of this behavior in Analysis
> > > (and so IndexWriter can throw an exception on such errors).
> > >
> > > It seems that this should be a property of Analyzer vs.
> > > just StandardAnalyzer, right?
> > >
> > > It can probably be a "policy" property, with two parameters:
> > > 1) maxLength, 2) action: chop/split/ignore/raiseException when
> > > generating too long tokens.
> >
> > Agreed, this should be generic/shared to all analyzers.
> >
> > But maybe for 2.3, we just truncate any too-long term to the max
> > allowed size, and then after 2.3 we make this a settable "policy"?
>
> But we already have a nice component model for analyzers...
> why not just encapsulate truncation/discarding in a TokenFilter?
Makes sense, especially for the implementation aspect. I'm not sure
which API you have in mind:

(1) leave that to applications, which would append such a TokenFilter
    to their Analyzer (== no change on our side),
(2) have DocumentsWriter create such a TokenFilter under the covers,
    to force behavior that is defined (where?), or
(3) have an IndexingTokenFilter assigned to IndexWriter, make the
    default such filter trim/ignore/whatever as discussed, and then
    let applications set a different IndexingTokenFilter to change
    the default behavior.

I think I like the third option - is this what you meant? A rough
sketch of what such a filter might look like is below.

Doron
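
P.S. For illustration only, here is a minimal sketch of a
length-enforcing filter along the lines discussed, written against the
pre-attribute TokenStream.next() API of the 2.x line. The class name
MaxTermLengthFilter and the TRUNCATE/SKIP policy constants are
hypothetical, not existing Lucene API, and the sketch only covers two
of the possible actions (chop and ignore):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Hypothetical sketch: enforces a maximum term length on the tokens
 * produced by the wrapped stream, either truncating oversized terms
 * or dropping them entirely, depending on the configured policy.
 */
public class MaxTermLengthFilter extends TokenFilter {

  /** Chop too-long terms down to maxLength characters. */
  public static final int TRUNCATE = 0;
  /** Silently drop too-long terms. */
  public static final int SKIP = 1;

  private final int maxLength;
  private final int policy;

  public MaxTermLengthFilter(TokenStream input, int maxLength, int policy) {
    super(input);
    this.maxLength = maxLength;
    this.policy = policy;
  }

  public Token next() throws IOException {
    Token t;
    while ((t = input.next()) != null) {
      String text = t.termText();
      if (text.length() <= maxLength) {
        return t;                         // within the limit: pass through
      }
      if (policy == TRUNCATE) {
        // Chop the term to the maximum allowed size. Note the offsets
        // still span the original (untruncated) text in the source.
        return new Token(text.substring(0, maxLength),
                         t.startOffset(), t.endOffset(), t.type());
      }
      // SKIP policy: drop this token and keep reading the stream.
    }
    return null;                          // end of stream
  }
}

Under option (3) above, IndexWriter would wrap every analyzer's output
in a filter like this by default; under option (1), applications would
simply append it in their own Analyzer.tokenStream() implementations.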