Just be aware that KeywordAnalyzer won't tokenize at all. That is,if you expect to index "jack-bauer" and hit on "jack" or "bauer" it won't.
Best Erick On Wed, Jun 3, 2009 at 2:25 AM, legrand thomas <thomaslegran...@yahoo.fr>wrote: > Hi, > > A KeywordAnalyzer solved my problem. > Luke allowed me to understand the queries and the content of the index. > > Thanks (Erick & Balasubramanian Sudaakeran) > Tom > > --- En date de : Dim 31.5.09, Erick Erickson <erickerick...@gmail.com> a > écrit : > > De: Erick Erickson <erickerick...@gmail.com> > Objet: Re: Index and search terms containing character "-" > À: java-user@lucene.apache.org > Date: Dimanche 31 Mai 2009, 15h33 > > Simple analyzer does two things: splits tokens on non-letter characters and > lowercases them. > > So, in your test you've indexed the tokens "jack" and "bauer" in your > second > document, the hyphen is completely lost during tokenization and you have > two tokens for that document. > > Using the term query "jack-bauer" is looking for a *single* token that is > exactly "jack-bauer", which is nowhere in your index. Term queries assume > that you know exactly what result you want, they don't try to transform the > input at all. Had you run "jack-bauer" through the query parser with > SimpleAnalyzer, you'd be searching for a document with the two terms > "jack" and "bauer", and you'd have hit your second document but not your > first. > > I'd strongly recommend you get a copy of Luke, it's invaluable for > questions > like this > because it lets you look at what's actually in your index. It'll also show > you how > queries get broken down when pushed through various analyzers... > > BTW, nice test case for demonstrating what you were seeing, it makes > answering > *vastly* easier..... > > HTH > Erick > > On Sun, May 31, 2009 at 5:55 AM, legrand thomas <thomaslegran...@yahoo.fr > >wrote: > > > Hi, > > > > I have a problem using TermQuery and FuzzyQuery for terms containing the > > character "-". Considering I've indexed "jack" and "jack-bauer" as 2 > > tokenized captions, I get no result when searching for "jack-bauer". > > Moreover, "jack" with a TermQuery returns the two captions. > > > > What should I do to get "jack-bauer" with new TermQuery("jack-bauer") ? > > > > A full test case is given below. > > > > Thanks, > > Tom > > > > > > import junit.framework.Assert; > > > > import org.apache.lucene.analysis.Analyzer; > > import org.apache.lucene.analysis.SimpleAnalyzer; > > import org.apache.lucene.document.Document; > > import org.apache.lucene.document.Field; > > import org.apache.lucene.index.IndexReader; > > import org.apache.lucene.index.IndexWriter; > > import org.apache.lucene.index.Term; > > import org.apache.lucene.search.Hits; > > import org.apache.lucene.search.IndexSearcher; > > import org.apache.lucene.search.TermQuery; > > import org.apache.lucene.store.FSDirectory; > > import org.junit.Test; > > > > public class IDebugIndexTest { > > > > @Test > > public void TermQueryTest() { > > > > Analyzer analyser = new SimpleAnalyzer(); > > > > try { > > // write docs to new index > > IndexWriter writer = new IndexWriter(FSDirectory > > .getDirectory("/tmp/idx_test"), analyser, true); > > > > Document jack = new Document(); > > jack.add(new Field("caption", "jack", Field.Store.YES, > > Field.Index.TOKENIZED)); > > writer.addDocument(jack); > > > > Document jackBauer = new Document(); > > jackBauer.add(new Field("caption", "jack-bauer", > > Field.Store.YES, > > Field.Index.TOKENIZED)); > > writer.addDocument(jackBauer); > > > > writer.close(); > > > > // try to search > > IndexSearcher s = new > > IndexSearcher(IndexReader.open(FSDirectory > > .getDirectory("/tmp/idx_test"))); > > > > // The next assertion is ok > > Hits jackHits = s > > .search(new TermQuery(new Term("caption", "jack"))); > > Assert.assertEquals(jackHits.length(), 2); > > > > // The next assertion fails !!! > > Hits jackBauerHits = s.search(new TermQuery(new > Term("caption", > > "jack-bauer"))); > > Assert.assertEquals(jackBauerHits.length(), 1); > > > > } catch (Exception e) { > > Assert.fail(); > > } > > > > } > > } > > > > > > > > > > > > > > > > >