Hi All, This is my first question for this forum. I am fairly familiar with Lucene and using 2.9.4 in my project (not using Solr). I have a following question for the use of Synonym filter.
While indexing contents, I am using following analyzer setup [Analyzer1] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFilter --> PorterStemFilter And while searching using MoreLikeThis I am using analyzer similar to the previous one but with addition of synonym filter [Analyzer2] == StandardTokenizer --> StandardFilter --> LowerCaseFilter --> StopFilter --> SynonymFilter --> PorterStemFilter *Scenario 1: Analyzer 1 for indexing and searching* Now I index document A, B and C using Analyzer1 and then use MoreLikeThis on document D to find similar documents from the index using Analyzer1 (Not Analyzer2), I get following output A matched 40% B matched 20% C matched 5% *Scenario 2: Analyzer 2 for indexing and searching* My problem is, the moment I use Analyzer2 (with Synonym Filter) to index and search similar documents to document D, all my results gets boost, my results become: A matched 60% B matched 40% C matched 25% *Scenario 3: Analyzer 1 for indexing and Analyzer 2 for searching* But if I use Analyzer1 for indexing and Analyzer2 for searching, then my results go way down A matched 15% B matched 11% C matched 2% When I dig into the reason why the % matching went down, I understood that this is happening because when searching using Synonym analyzer, I tend to get much more interesting terms [moreLikeThis.retrieveInterestingTerms(reader)] and then most of these synonym words match with all the documents bringing down its tf and idf resulting into less matching percentages for the documents. *So my question is:* 1. Is it correct to use Analyzer without synonym filter for indexing and with synonym filter for searching? 2. Is there any other setting that I am missing causing all the matching percentages to go down? My search setting while using MoreLikeThis are MoreLikeThis mlt = new MoreLikeThis(index); SynonymEngine engine = new WordNetSynonymEngine(new File("PATH")); mlt.setMinWordLen(3); mlt.setBoost(true); mlt.setMinTermFreq(2); mlt.setMinDocFreq(0); mlt.setMaxQueryTerms(100); mlt.setAnalyzer(new PorterSynonymStandardAnalyzer(engine)); Thanks in advance Saurabh