I am trying to get multi-word synonyms working in Lucene using Solr's *SynonymFilter*.
I need to match synonyms at index time, since many of the synonym lists are huge. Strictly speaking they are not synonyms but words that belong to a concept. For example, I would like to map a set of city names such as {"New York", "Los Angeles", "New Orleans", "Salt Lake City", ...} to the concept "city". At search time, a user query for the concept "city" will be translated to a keyword such as "CONCEPTcity", which acts as the synonym for every city name.

Using the SynonymAnalyzer from Lucene in Action (p. 131), "CONCEPTcity" only matched single-word city names like "Chicago", "Seattle" and "Boston". It did not match multi-word city names like "New York" or "Los Angeles".

I then tried using Solr's *SynonymFilter* in the tokenStream method of a custom Analyzer (extending org.apache.lucene.analysis.Analyzer, Lucene 2.9.3):

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new SynonymFilter(new WhitespaceTokenizer(reader), synonymMap);
        return result;
    }

where *synonymMap* is loaded with synonyms using

    synonymMap.add(conceptTerms, listOfTokens, true, true);

Here *conceptTerms* is an *ArrayList<String>* of all the terms in a concept, and *listOfTokens* is a *List<Token>* containing only the generic synonym identifier, such as *CONCEPTcity*.

When I print the map with synonymMap.toString(), I get output like

    <{New York=<{Chicago=<{Seattle=<{New Orleans=....<[(CATEGORYcity,0,0,type=SYNONYM),ORIG],null>}>}>}>....}>

so it looks like all the synonyms are loaded. But if I search for "CATEGORYcity", no matches are found. I am not sure whether I have loaded the synonyms into synonymMap correctly. Any help will be deeply appreciated. Thanks!
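The nested shape of that toString output (each city nested inside the previous one) can be illustrated with a simplified, hypothetical word-level trie in plain Java. It is not Solr's SynonymMap and uses no Solr or Lucene classes. It assumes, as the nesting suggests, that each add() call registers ONE token sequence in which every list element is a single token. Under that assumption, passing the whole concept list to one call builds a single chain "New York" -> "Chicago" -> ..., and an element like "New York" can never match anyway, because WhitespaceTokenizer emits "New" and "York" as separate tokens. The names SynonymTrieDemo, addConcept and apply are made up for this sketch.

```java
import java.util.*;

// Hypothetical, simplified model of a word-level synonym trie. This is NOT
// Solr's SynonymMap; it only mimics the assumption that each add() call
// registers one token sequence, one list element per whitespace token.
public class SynonymTrieDemo {

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        String replacement; // non-null when a complete sequence ends here
    }

    private final Node root = new Node();

    /** Registers ONE multi-token sequence, e.g. ["New", "York"]. */
    void add(List<String> tokens, String replacement) {
        Node node = root;
        for (String t : tokens) {
            node = node.children.computeIfAbsent(t, k -> new Node());
        }
        node.replacement = replacement;
    }

    /** Registers every phrase of a concept as its own token sequence. */
    void addConcept(List<String> phrases, String conceptToken) {
        for (String phrase : phrases) {
            add(Arrays.asList(phrase.split("\\s+")), conceptToken);
        }
    }

    /**
     * Greedy longest-match pass over a token list. Matched words are kept
     * (similar in spirit to includeOrig=true) and the concept token is
     * appended after them; unmatched tokens pass through unchanged.
     */
    List<String> apply(List<String> input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.size()) {
            Node node = root;
            int matchEnd = -1;
            String matched = null;
            for (int j = i; j < input.size(); j++) {
                node = node.children.get(input.get(j));
                if (node == null) break;
                if (node.replacement != null) { // remember the longest match so far
                    matchEnd = j + 1;
                    matched = node.replacement;
                }
            }
            if (matched == null) {
                out.add(input.get(i));
                i++;
            } else {
                out.addAll(input.subList(i, matchEnd));
                out.add(matched);
                i = matchEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("New York", "Chicago", "Los Angeles");
        List<String> tokens = Arrays.asList("flights", "to", "New", "York", "and", "Chicago");

        // Whole concept list registered as ONE sequence: nothing matches.
        SynonymTrieDemo oneChain = new SynonymTrieDemo();
        oneChain.add(cities, "CONCEPTcity");
        System.out.println(oneChain.apply(tokens));
        // prints [flights, to, New, York, and, Chicago]

        // Each phrase registered separately, split into its word tokens.
        SynonymTrieDemo perPhrase = new SynonymTrieDemo();
        perPhrase.addConcept(cities, "CONCEPTcity");
        System.out.println(perPhrase.apply(tokens));
        // prints [flights, to, New, York, CONCEPTcity, and, Chicago, CONCEPTcity]
    }
}
```

In this toy model the concept token is simply appended after the matched words; a real token filter also has to manage positions and offsets, which the sketch ignores.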