Hello, I'm having trouble creating a custom analyzer that uses WordDelimiterFilter. I'm building an index of information gleaned from JAR manifest files. For example, if I have "spring-framework", I need the following tokens indexed: "spring", "springframework", "framework", and "spring-framework". My understanding is that WordDelimiterFilter is perfect for this. However, as soon as I add the filter to the analyzer, none of my documents seem to be indexed correctly.
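To make the target concrete, here is a plain-Java sketch (no Lucene dependency) of the token set described above. The helper `expectedTokens` is hypothetical: it only models the desired output for hyphen-delimited terms (the lowercased original, each word part, and the parts catenated). It is not how WordDelimiterFilter is implemented, and it ignores case-change splitting and possessives.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExpectedTokens {
    // Hypothetical helper: models the tokens wanted for a hyphenated term.
    // This is NOT Lucene's WordDelimiterFilter, just the desired result.
    static Set<String> expectedTokens(String term) {
        Set<String> tokens = new LinkedHashSet<>();
        String lower = term.toLowerCase();

        // Word parts: split on runs of hyphens, dropping empty pieces.
        List<String> parts = new ArrayList<>();
        for (String p : lower.split("-+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        tokens.addAll(parts);                    // "spring", "framework"

        // Catenated word parts, only meaningful when there was a split.
        if (parts.size() > 1) {
            tokens.add(String.join("", parts));  // "springframework"
        }

        // Preserve the original (lowercased) term.
        tokens.add(lower);                       // "spring-framework"
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [spring, framework, springframework, spring-framework]
        System.out.println(expectedTokens("spring-framework"));
    }
}
```

The set keeps insertion order only for readable output; token order does not matter for what ends up in the index.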
Here is the analyzer:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.StopAnalyzer;
    import org.apache.lucene.analysis.core.StopFilter;
    import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
    import org.apache.lucene.util.Version;

    public class FieldAnalyzer extends Analyzer {

        private final Version version;

        public FieldAnalyzer(Version version) {
            this.version = version;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Split on whitespace first
            Tokenizer source = new WhitespaceTokenizer(version, reader);
            TokenStream stream = source;
            // Split/catenate word parts, e.g. "spring-framework"
            stream = new WordDelimiterFilter(stream,
                    WordDelimiterFilter.CATENATE_WORDS
                    & WordDelimiterFilter.GENERATE_WORD_PARTS
                    & WordDelimiterFilter.PRESERVE_ORIGINAL
                    & WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
                    & WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE,
                    null);
            stream = new LowerCaseFilter(version, stream);
            stream = new StopFilter(version, stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
            return new TokenStreamComponents(source, stream);
        }
    }

A very simple test finds zero documents:

    Analyzer analyzer = new FieldAnalyzer(Version.LUCENE_40);
    Directory index = new RAMDirectory();
    String text = "spring-framework";
    String field = "field";

    // Index a single document containing "spring-framework"
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
    IndexWriter w = new IndexWriter(index, config);
    Document doc = new Document();
    doc.add(new TextField(field, text, Field.Store.YES));
    w.addDocument(doc);
    w.close();

    // Search for the same string using the same analyzer
    String querystr = "spring-framework";
    Query q = new AnalyzingQueryParser(Version.LUCENE_40, field, analyzer).parse(querystr);

    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i + 1) + ". " + d.get(field));
    }

Any idea what I've done wrong? If I comment out the line that adds WordDelimiterFilter, the search works.

Thanks in advance,
Jeremy
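A note on the flags argument: the analyzer above joins its five WordDelimiterFilter options with `&`. Those options are int bit-mask constants, and as I understand the API they are meant to be combined with bitwise OR, since AND-ing distinct single-bit values leaves no bits set at all. The plain-Java sketch below has no Lucene dependency; its constant values and the helpers `combineWithAnd` and `combineWithOr` are hypothetical stand-ins used only to show what each operator produces.

```java
public class FlagDemo {
    // Hypothetical single-bit flags standing in for bit-mask options such as
    // WordDelimiterFilter's constants. Values are illustrative only.
    static final int GENERATE_WORD_PARTS = 1;
    static final int CATENATE_WORDS = 4;
    static final int PRESERVE_ORIGINAL = 32;

    // AND keeps only bits set in EVERY flag; start from all bits set.
    static int combineWithAnd(int... flags) {
        int result = ~0;
        for (int f : flags) {
            result &= f;
        }
        return result;
    }

    // OR keeps every bit set in ANY flag; start from no bits set.
    static int combineWithOr(int... flags) {
        int result = 0;
        for (int f : flags) {
            result |= f;
        }
        return result;
    }

    public static void main(String[] args) {
        int anded = combineWithAnd(GENERATE_WORD_PARTS, CATENATE_WORDS, PRESERVE_ORIGINAL);
        int ored = combineWithOr(GENERATE_WORD_PARTS, CATENATE_WORDS, PRESERVE_ORIGINAL);
        System.out.println("AND: " + anded); // AND: 0 (no options enabled)
        System.out.println("OR: " + ored);   // OR: 37 (all three enabled)
        // Testing whether one option is enabled in a combined mask:
        System.out.println("CATENATE_WORDS set in OR mask: " + ((ored & CATENATE_WORDS) != 0));
    }
}
```

With `&`, the filter would effectively receive a configuration of 0, i.e. none of the requested behaviors enabled; with `|`, each requested option keeps its own bit.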