On Dec 31, 2007 12:25 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Sure, but I mean in the >16K (in other words, in the case where
> DocsWriter fails, which presumably only DocsWriter knows about) case.
> I want the option to ignore tokens larger than that instead of failing/
> throwing an exception.

I think the issue here is what the default behavior for IndexWriter
should be.

If configuration is required because something other than the default
is desired, then one could use a TokenFilter to change the behavior
rather than changing options on IndexWriter. Using a TokenFilter is
much more flexible.

> Imagine I am charged w/ indexing some data
> that I don't know anything about (i.e. computer forensics), my goal
> would be to index as much as possible in my first raw pass, so that I
> can then begin to explore the dataset. Having it completely discard
> the document is not a good thing, but throwing away some large binary
> tokens would be acceptable (especially if I get warnings about said
> tokens) and robust.

-Yonik
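
To make the TokenFilter suggestion above a little more concrete, here is a rough sketch of the idea in plain Java. It is only a stand-in: it wraps an Iterator<String> rather than extending Lucene's TokenFilter, and the names MaxLengthTokenFilter, maxLength and warnings() are made up for illustration. It assumes token length is measured in characters, which may not match how IndexWriter measures its limit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a token filter (NOT Lucene's TokenFilter API).
 * Passes through tokens up to maxLength characters and drops longer ones,
 * recording a warning for each dropped token instead of throwing.
 */
final class MaxLengthTokenFilter implements Iterator<String> {
    private final Iterator<String> input;   // upstream token source
    private final int maxLength;            // assumed to be counted in chars
    private final List<String> warnings = new ArrayList<>();
    private String pending;                 // next token that passed the check

    MaxLengthTokenFilter(Iterator<String> input, int maxLength) {
        this.input = input;
        this.maxLength = maxLength;
    }

    @Override
    public boolean hasNext() {
        // Pull from upstream until a short-enough token is found.
        while (pending == null && input.hasNext()) {
            String token = input.next();
            if (token.length() <= maxLength) {
                pending = token;
            } else {
                warnings.add("skipped token of length " + token.length());
            }
        }
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String token = pending;
        pending = null;
        return token;
    }

    /** Warnings collected so far, one per dropped token. */
    List<String> warnings() {
        return warnings;
    }
}
```

Wrapping the analyzer's output this way keeps the skip-or-fail decision with the application: someone doing a raw first pass could choose a limit based on the 16K figure mentioned in the thread and read the warnings afterwards, while IndexWriter's default behavior stays unchanged.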