I have a requirement to handle synonyms differently based on the first word (token) in the text field of a document. I have implemented a custom synonym filter factory that loads a synonym set per language when the Solr core is started.
Now, in the `MySynonymFilterFactory#create(TokenStream input)` method, I have to read the first token from the input TokenStream. Based on that token's value, the corresponding SynonymMap is used to create the SynonymFilter. Here are my documents:

doc1: `<text>lang_eng this is English language text</text>`
doc2: `<text>lang_fra this is French language text</text>`
doc3: `<text>lang_spa this is Spanish language text</text>`

MySynonymFilterFactory creates MySynonymFilter. The `create()` logic is below:

```java
@Override
public TokenStream create(TokenStream input) {
    // If the FST is null, there are actually no synonyms... just return the
    // original stream as there is nothing to do here.
    // return map.fst == null ? input : new MySynonymFilter(input, map, ignoreCase);
    System.out.println("input=" + input);
    // Somehow read the TokenStream here to capture the lang value
    SynonymMap synonyms = null;
    try {
        CharTermAttribute termAtt = input.addAttribute(CharTermAttribute.class);
        boolean first = false;
        input.reset();
        while (!first && input.incrementToken()) {
            String term = new String(termAtt.buffer(), 0, termAtt.length());
            System.out.println("termAtt=" + term);
            if (StringUtils.startsWith(term, "lang_")) {
                String[] split = StringUtils.split(term, "_");
                String lang = split[1];
                String key = langSynMap.containsKey(lang) ? lang : "generic";
                synonyms = langSynMap.get(key);
                System.out.println("synonyms=" + synonyms);
            }
            first = true;
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return synonyms == null ? input : new SynonymFilter(input, synonyms, ignoreCase);
}
```

This code compiles, and the new analysis works fine on the Solr admin analysis screen. But the same code fails with the exception below when I try to index a document:

```
30273 ERROR (qtp1689843956-18) [   x:gcom] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id id1 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:180)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:934)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1089)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:712)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Java docs of TokenStream class for more information about the correct consuming workflow.
    at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:109)
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:527)
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:738)
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:159)
    at com.synonyms.poc.synpoc.MySynonymFilterFactory.create(MySynonymFilterFactory.java:94)
    at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:91)
    at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
    at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:176)
    at org.apache.lucene.document.Field.tokenStream(Field.java:562)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:628)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:365)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
    ... 37 more
```

Any idea how I can read a token stream without violating the TokenStream contract? I see a similar discussion here: https://lucene.472066.n3.nabble.com/how-to-reuse-a-tokenStream-td850767.html, but it doesn't help solve my problem.
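My understanding of the contract is that each consumer must call `reset()` exactly once before iterating, so calling `reset()`/`incrementToken()` inside `create()` means the indexing chain's own `reset()` becomes a second one. Here is a minimal sketch of the pattern I believe is required: defer the first-token inspection into the filter's own `incrementToken()` instead of consuming the stream in the factory. The classes below (`MiniStream`, `WhitespaceStream`, `LangAwareFilter`, and the String-valued `langSynMap`) are simplified stand-ins I made up to model the contract, not Lucene's real API.

```java
import java.util.*;

// Simplified stand-in for Lucene's TokenStream contract: the consumer must
// call reset() exactly once before the first incrementToken().
interface MiniStream {
    void reset();
    boolean incrementToken();
    String term();
}

// Stand-in tokenizer that enforces the single-reset rule.
class WhitespaceStream implements MiniStream {
    private final String[] tokens;
    private int pos = -1;
    private boolean wasReset = false;
    private String term;

    WhitespaceStream(String text) { tokens = text.split("\\s+"); }

    public void reset() {
        if (wasReset) throw new IllegalStateException("reset() called multiple times");
        wasReset = true;
    }
    public boolean incrementToken() {
        if (!wasReset) throw new IllegalStateException("reset() call missing");
        if (pos + 1 >= tokens.length) return false;
        term = tokens[++pos];
        return true;
    }
    public String term() { return term; }
}

// The pattern: the factory would return this wrapper WITHOUT touching `input`.
// The synonym map is selected lazily, on the FIRST incrementToken() call, so
// the consumer's single reset() is the only one the wrapped stream ever sees.
class LangAwareFilter implements MiniStream {
    private final MiniStream input;
    private final Map<String, String> langSynMap; // lang -> map name (stand-in for SynonymMap)
    private String selectedMap;                    // chosen when the first token arrives

    LangAwareFilter(MiniStream input, Map<String, String> langSynMap) {
        this.input = input;
        this.langSynMap = langSynMap;
    }
    public void reset() { input.reset(); }         // just delegate; no extra reset()
    public boolean incrementToken() {
        if (!input.incrementToken()) return false;
        if (selectedMap == null) {                 // inspect only the first token
            String t = term();
            String lang = t.startsWith("lang_") ? t.substring("lang_".length()) : "generic";
            selectedMap = langSynMap.getOrDefault(lang, langSynMap.get("generic"));
        }
        return true;
    }
    public String term() { return input.term(); }
    public String selectedMap() { return selectedMap; }
}
```

Usage would look like the normal consuming workflow, e.g. `filter.reset(); while (filter.incrementToken()) { ... }`, with the language picked as a side effect of the first token. In real Lucene code this would be a `TokenFilter` subclass calling `super.reset()`, and the hard part I still face is that the actual `SynonymFilter` wants its map at construction time, so this sketch only shows the deferral pattern.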
Also, how come the same error is not reported when analyzing the field value on the Solr admin console analysis screen? Thanks.