Grant Ingersoll wrote:

I agree with Robert, as I have had similar wishes for more interface capabilities, but I also agree with Eric that Lucene works great in a lot of ways. I have found that the current design forces you to hard-code things that shouldn't need to be hard-coded, especially in the TokenStream area. Having to write a new Analyzer every time you want to change a Tokenizer or TokenFilter is very limiting. In my application I need the flexibility to re-index and evaluate fairly often. The current Analyzer implementation would require me to write a new Analyzer for every experiment, and that is not manageable. Do others have this issue?



I had this issue. I solved it by rewriting the API around TokenStream (mainly by introducing an interface that allows resetting the source stream) and creating a generalized analyzer class. This analyzer class holds a reference to the TokenStream pipeline to which it delegates. A PerField analyzer is populated with Analyzers configured from JNDI (essentially Tokenizer and TokenStreamDecorator compositions). When tokenStream(String fieldName, Reader reader) is called, the analyzer resets its TokenStream reference before returning it.
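
To make the shape of this concrete, here is a rough sketch of the idea, not the actual code. It does not use the Lucene classes at all: ResettableTokenStream, WhitespaceSource, LowerCaseDecorator and PipelineAnalyzer are names made up for illustration, and tokens are plain Strings to keep it short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: a token stream whose source can be replaced.
interface ResettableTokenStream {
    void reset(Reader source);  // point the whole pipeline at new input
    String next();              // next token, or null at end of input
}

// Hypothetical tokenizer at the bottom of the pipeline: splits on whitespace.
final class WhitespaceSource implements ResettableTokenStream {
    private Reader in;

    @Override public void reset(Reader source) { this.in = source; }

    @Override public String next() {
        StringBuilder token = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (!Character.isWhitespace(c)) {
                    token.append((char) c);
                } else if (token.length() > 0) {
                    break;  // whitespace after a token ends that token
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.length() > 0 ? token.toString() : null;
    }
}

// Hypothetical decorator: lower-cases tokens and forwards reset() downward.
final class LowerCaseDecorator implements ResettableTokenStream {
    private final ResettableTokenStream inner;

    LowerCaseDecorator(ResettableTokenStream inner) { this.inner = inner; }

    @Override public void reset(Reader source) { inner.reset(source); }

    @Override public String next() {
        String token = inner.next();
        return token == null ? null : token.toLowerCase(Locale.ROOT);
    }
}

// Generalized analyzer: holds one pipeline and delegates to it.
final class PipelineAnalyzer {
    private final ResettableTokenStream pipeline;

    PipelineAnalyzer(ResettableTokenStream pipeline) { this.pipeline = pipeline; }

    // Resets the shared pipeline before returning it. Because the same
    // instance is handed out on every call, this is NOT thread-safe.
    ResettableTokenStream tokenStream(String fieldName, Reader reader) {
        pipeline.reset(reader);
        return pipeline;
    }

    // Convenience helper for experiments: collect all remaining tokens.
    static List<String> drain(ResettableTokenStream stream) {
        List<String> tokens = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

With this arrangement, changing the analysis for an experiment means composing a different pipeline (for example new PipelineAnalyzer(new LowerCaseDecorator(new WhitespaceSource()))) rather than writing a new analyzer class. The fieldName argument is ignored in the sketch; a per-field analyzer would use it to pick one of several such pipelines.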

I submitted a "broken" patch that converts the analyzers and token streams to interfaces, but as Doug pointed out, it is not currently thread-safe (I have another version, which uses reflection, that is thread-safe). I intend to go back and make it thread-safe, but haven't had the time. Anyway, this patch contains an interface implementation of Analyzer and TokenStream that we may find useful in the future, and if someone else wants to pick up the ball and make it thread-safe, I don't think it would take too long.


What were the issues related to thread safety? Are invocations of an analyzer within an IndexWriter not single-threaded? I was unsure of this, but planned to object-pool my TokenStream compositions if needed.
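
If the calls do turn out to come from several threads, one simple alternative to a full object pool would be to keep one pipeline per thread. A minimal sketch, assuming the pipeline can be built by a factory; PerThreadPipeline is a hypothetical name and nothing here is Lucene API:

```java
import java.util.function.Supplier;

// Hypothetical holder: lazily builds one pipeline instance per thread,
// so no two threads ever share (and reset) the same instance.
final class PerThreadPipeline<T> {
    private final ThreadLocal<T> local;

    PerThreadPipeline(Supplier<T> factory) {
        // The factory runs once per thread, on that thread's first get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's own instance.
    T get() {
        return local.get();
    }
}
```

The analyzer would then call get() and reset the returned pipeline instead of holding a single shared reference. Unlike a pool, this never hands an instance back, so it only makes sense if the number of indexing threads is small and bounded.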
