SimpleAnalyzer does two things: it splits tokens on non-letter characters and lowercases them.
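To see why the hyphen matters, here's a plain-Java sketch that mimics that split-on-non-letters-and-lowercase behavior (this is not Lucene's actual implementation, and `SimpleAnalyzerSketch`/`tokenize` are made-up names for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleAnalyzerSketch {

    // Mimics SimpleAnalyzer's behavior: split on any non-letter character
    // and lowercase each resulting token. The separator itself is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Non-letter ends the current token; the character is discarded.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The hyphen is a non-letter, so it acts as a separator and vanishes.
        System.out.println(SimpleAnalyzerSketch.tokenize("Jack-Bauer")); // [jack, bauer]
    }
}
```

So the single input "jack-bauer" never exists as one token in the index; only "jack" and "bauer" do.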
So in your test you've indexed the tokens "jack" and "bauer" for your second document; the hyphen is completely lost during tokenization, and you have two tokens for that document. A term query for "jack-bauer" looks for a *single* token that is exactly "jack-bauer", which is nowhere in your index. Term queries assume that you know exactly what term you want; they don't transform the input at all.

Had you run "jack-bauer" through the query parser with SimpleAnalyzer, you'd be searching for documents containing the terms "jack" and "bauer". With the parser's default OR operator that would match both of your documents (the first one contains "jack"); with AND, only the second.

I'd strongly recommend you get a copy of Luke; it's invaluable for questions like this because it lets you look at what's actually in your index. It will also show you how queries get broken down when pushed through various analyzers.

BTW, nice test case for demonstrating what you were seeing; it makes answering *vastly* easier.

HTH
Erick

On Sun, May 31, 2009 at 5:55 AM, legrand thomas <thomaslegran...@yahoo.fr> wrote:
> Hi,
>
> I have a problem using TermQuery and FuzzyQuery for terms containing the
> character "-". Considering I've indexed "jack" and "jack-bauer" as two
> tokenized captions, I get no result when searching for "jack-bauer".
> Moreover, "jack" with a TermQuery returns both captions.
>
> What should I do to get "jack-bauer" with new TermQuery("jack-bauer")?
>
> A full test case is given below.
>
> Thanks,
> Tom
>
> import junit.framework.Assert;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.store.FSDirectory;
> import org.junit.Test;
>
> public class IDebugIndexTest {
>
>     @Test
>     public void TermQueryTest() {
>
>         Analyzer analyser = new SimpleAnalyzer();
>
>         try {
>             // write docs to new index
>             IndexWriter writer = new IndexWriter(FSDirectory
>                     .getDirectory("/tmp/idx_test"), analyser, true);
>
>             Document jack = new Document();
>             jack.add(new Field("caption", "jack", Field.Store.YES,
>                     Field.Index.TOKENIZED));
>             writer.addDocument(jack);
>
>             Document jackBauer = new Document();
>             jackBauer.add(new Field("caption", "jack-bauer", Field.Store.YES,
>                     Field.Index.TOKENIZED));
>             writer.addDocument(jackBauer);
>
>             writer.close();
>
>             // try to search
>             IndexSearcher s = new IndexSearcher(IndexReader.open(FSDirectory
>                     .getDirectory("/tmp/idx_test")));
>
>             // The next assertion is ok
>             Hits jackHits = s.search(new TermQuery(new Term("caption", "jack")));
>             Assert.assertEquals(jackHits.length(), 2);
>
>             // The next assertion fails!
>             Hits jackBauerHits = s.search(new TermQuery(new Term("caption",
>                     "jack-bauer")));
>             Assert.assertEquals(jackBauerHits.length(), 1);
>
>         } catch (Exception e) {
>             Assert.fail();
>         }
>     }
> }
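The contrast Erick describes above can be sketched without Lucene at all: an exact term lookup against the indexed tokens fails, while splitting the query text the same way the analyzer split the document succeeds. The class name and the regex-based split below are illustrative assumptions, not Lucene API:

```java
import java.util.Arrays;
import java.util.List;

public class TermVsAnalyzedQuery {

    public static void main(String[] args) {
        // Tokens that a split-on-non-letters, lowercasing analyzer
        // would have produced for the field value "jack-bauer".
        List<String> indexedTokens = Arrays.asList("jack", "bauer");

        // A term query compares the raw query string against single
        // tokens in the index, with no transformation: no match.
        System.out.println(indexedTokens.contains("jack-bauer")); // false

        // Running the query text through the same analysis (lowercase,
        // split on anything that is not a letter) yields terms that do match.
        String[] analyzed = "jack-bauer".toLowerCase().split("[^a-z]+");
        System.out.println(indexedTokens.containsAll(Arrays.asList(analyzed))); // true
    }
}
```

The moral is the usual one: a query only matches if it is analyzed the same way the field was at index time.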