Oh boy!

It seems I have found the problem in my case, which, as far as I can tell, has nothing to do with Lucene but rather with the library we use to tokenize HTML documents. We changed our HTML parser at the same time as our version of Lucene, and NekoHTML (CyberNeko) does not close its HTML reader even when we call parser.abort()/parser.close() (which is placed in the close() of the Lucene Tokenizer).
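
For what it's worth, the workaround on our side is simply to close the reader ourselves in the Tokenizer instead of trusting the parser. Roughly something like this (a sketch against the Lucene 2.x-era API; the parser hookup is elided and the class name is made up):

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.Tokenizer;

    public class HtmlTokenizer extends Tokenizer {

        public HtmlTokenizer(Reader input) {
            super(input);  // Tokenizer keeps the Reader in its protected "input" field
            // ... hand "input" to the NekoHTML parser here ...
        }

        public Token next() throws IOException {
            // the real implementation pulls tokens out of the HTML parse; omitted
            return null;
        }

        public void close() throws IOException {
            // parser.abort()/parser.close() would go here, but since NekoHTML
            // leaves the reader open, we close "input" explicitly as well.
            input.close();
        }
    }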

Before that, the old HTML parser would close the reader itself, so I wrongly assumed the Lucene version change was the cause.

The bad news is that I had you all worked up for nothing; the good news is that you don't have any bugs here.

However, there may be something to the fact that Lucene's Analyzers automatically close the reader when they are done analyzing. I think this encourages people not to close readers explicitly, which creates the potential for leaked file descriptors if an exception is thrown in the middle of analysis or before addDocument()/updateDocument() is called.
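
Until something changes there, the safe pattern on the application side is probably to close the reader yourself in a finally block; a second close() on most readers is a harmless no-op. A minimal sketch (assuming an already-open IndexWriter and the Field(String, Reader) constructor):

    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    class SafeIndexing {
        static void indexFile(IndexWriter writer, String path) throws IOException {
            Reader reader = new FileReader(path);
            try {
                Document doc = new Document();
                doc.add(new Field("body", reader));  // reader is consumed at addDocument()
                writer.addDocument(doc);             // the Analyzer closes it on success
            } finally {
                reader.close();                      // also closes it if an exception was thrown first
            }
        }
    }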

I don't think changing the API of Field to accept a "ReaderFactory" would solve anything, because there are cases where you must index a reader that is already open (like a network connection), and wrapping it in a dummy ReaderFactory does not look very good.
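
To make the objection concrete, here is roughly what that dummy wrapper would have to look like (ReaderFactory is the hypothetical interface under discussion, not an existing Lucene type):

    import java.io.Reader;

    interface ReaderFactory {
        Reader newReader();
    }

    // A factory in name only: it cannot produce a fresh reader, it just
    // hands back the single already-open one (e.g. a network stream).
    class DummyReaderFactory implements ReaderFactory {
        private final Reader alreadyOpen;

        DummyReaderFactory(Reader alreadyOpen) {
            this.alreadyOpen = alreadyOpen;
        }

        public Reader newReader() {
            return alreadyOpen;
        }
    }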

Daniel Shane
