magibney commented on pull request #380: URL: https://github.com/apache/lucene/pull/380#issuecomment-1050227228
This patch applies cleanly and all tests pass. I plan to commit this within the next few days, because i think it does improve things (targeting 9.1 release). But I want to go on the record mentioning that there are a number of things I've noticed in the process of looking at this code (completely orthogonal to this PR) that make me a bit uncomfortable: 1. All the objects retrieved through OpenNLPOpsFactory are cached in [static maps](https://github.com/apache/lucene/blob/main/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java#L41-L48) that are [never cleared](https://github.com/apache/lucene/blob/main/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/tools/OpenNLPOpsFactory.java#L191-L199). 2. These maps are populated in a way where there's a race condition (could end up with multiple copies of these objects being held external to this class). wrt this PR in particular, I'm nervous about the fact that everything _but_ the DictionaryLemmatizers have long been cached as objects, which makes me wonder if there was a reason that from this class's inception, the dictionaryLemmatizers have been cached as String versions of the raw dictionary. But I've tried to chase down this line of reasoning, and still can't find any obvious problem with this change. Analogous to not "letting the perfect be the enemy of the good", I'm going to not "let the 'not-so-good' be the enemy of the 'probably worse'". I hope this is the correct decision. @spyk thanks for your patience, and for keeping the PR up-to-date. If you can add a CHANGES.txt entry, that's probably warranted here (if only because of the change to the public API). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org