Grant Ingersoll wrote:
I agree with Robert, as I have had similar wishes about more interface capabilities, but also agree with Eric in that Lucene works great in a lot of ways. I have found the current design causes you to have to hard code things that shouldn't need to be hard coded, especially in the TokenStream area. The idea of writing a new Analyzer every time you want to change a Tokenizer or TokenFilter is very limiting. In my application I need the flexibility to re-index and evaluate fairly often. The current Analyzer implementation would require me to write a new Analyzer for every experiment and that is not manageable. Do others have this issue?
I had this issue. I have solved this by rewriting the API around
TokenStream (mainly introducing an interface that allows resetting the
source stream) and creating a generalized analyzer class. This analyzer
class holds a reference to the TokenStream pipeline to which it
delegates. A PerField analyzer is populated with Analyzers configured
from JNDI (essentially Tokenizer and TokenStreamDecorator compositions).
When tokenStream(String fieldName, Reader reader) is called, the analyzer
resets its TokenStream reference before returning it.
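
A minimal sketch of that arrangement, in plain Java with no Lucene
dependency. Every name in it (ResettableTokenStream, WhitespaceSource,
LowerCaseDecorator, DelegatingAnalyzer, PipelineDemo) is a hypothetical
stand-in chosen for illustration, not a Lucene class, and tokens are
reduced to plain strings to keep it short. The JNDI configuration and
the per-field lookup are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical interface: a token stream whose source can be swapped.
interface ResettableTokenStream {
    String next();              // next token text, or null when exhausted
    void reset(Reader source);  // re-point the whole pipeline at a new source
}

// Head of the pipeline: splits the source on whitespace.
final class WhitespaceSource implements ResettableTokenStream {
    private Reader in;

    public void reset(Reader source) { this.in = source; }

    public String next() {
        if (in == null) return null;
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break;  // end of current token
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// A decorator stage: lower-cases tokens and forwards reset() upstream.
final class LowerCaseDecorator implements ResettableTokenStream {
    private final ResettableTokenStream inner;

    LowerCaseDecorator(ResettableTokenStream inner) { this.inner = inner; }

    public void reset(Reader source) { inner.reset(source); }

    public String next() {
        String t = inner.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Generalized analyzer: holds one pipeline and resets it on each call.
// Sharing a single pipeline instance like this is NOT thread safe.
final class DelegatingAnalyzer {
    private final ResettableTokenStream pipeline;

    DelegatingAnalyzer(ResettableTokenStream pipeline) { this.pipeline = pipeline; }

    ResettableTokenStream tokenStream(String fieldName, Reader reader) {
        pipeline.reset(reader);
        return pipeline;
    }
}

// Small helper that drains a stream into a list, for demonstration only.
final class PipelineDemo {
    static List<String> collect(DelegatingAnalyzer a, String field, String text) {
        ResettableTokenStream ts = a.tokenStream(field, new StringReader(text));
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
}
```

Changing the analysis for an experiment then means composing a different
chain, for example new DelegatingAnalyzer(new LowerCaseDecorator(new
WhitespaceSource())), rather than writing a new analyzer class. The same
analyzer instance can be reused for the next field or document because
each call resets the pipeline first.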
I submitted a "broken" patch that converts the analyzers and token streams to interfaces, but as Doug pointed out, it is not currently thread safe (I have another version that uses reflection that is thread safe). I intend to go back and make it thread-safe, but haven't had the time. Anyway, this patch contains an interface implementation of Analyzer and TokenStream that we may find useful in the future and if someone else wants to take up the ball and make it thread-safe, I don't think it would take too long.
What were the issues related to thread safety? Are invocations of an
analyzer within an IndexWriter not single threaded? I was unsure of
this, but planned to object pool my TokenStream compositions if needed.