Sorry for the confusion, and thanks for taking the time to educate me. So, if I am just indexing literal values, what is the best way to do that (which analyzer)? It sounds like this approach, even though it works, is not the preferred method.
    analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("keyword", new KeywordAnalyzer());

Thanks again.

Chris Hostetter wrote:
>
> 1) consider using JUnit tests .. it makes it a lot easier for other people
> to understand your expectations, and if it winds up demonstrating a genuine
> bug in Lucene, it's easy to add to the test tree.
>
> 2) as i said before, your fields must be TOKENIZED, or your analyzer is
> irrelevant at index time.
>
> 3) when i run the code you sent as is, i get lots of "Test passed" lines
> and no "TEST FAILED" lines ... which makes sense since you have everything
> UN_TOKENIZED, so the literal values are getting indexed, which just so
> happens to be what KeywordAnalyzer does as well -- hence if you change
> everything from UN_TOKENIZED to TOKENIZED it will still work.
>
> do you have an example of something that *isn't* working the way you want?
> ... if not i don't see what your problem is, all your tests are passing :)
>
> : Date: Tue, 5 Sep 2006 14:06:13 -0700 (PDT)
> : From: Philip Brown <[EMAIL PROTECTED]>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: Re: Phrase search using quotes -- special Tokenizer
> :
> : Here's a little sample program (borrowed some code from Erick Erickson :)).
> : Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
> : the output. Is this what you'd expect?
> :
> : - Philip
> :
> : package com.test;
> :
> : import java.io.IOException;
> : import java.util.HashSet;
> : import java.util.regex.Pattern;
> :
> : import org.apache.lucene.analysis.Analyzer;
> : import org.apache.lucene.analysis.KeywordAnalyzer;
> : import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
> : import org.apache.lucene.analysis.standard.StandardAnalyzer;
> : import org.apache.lucene.document.Document;
> : import org.apache.lucene.document.Field;
> : import org.apache.lucene.index.IndexWriter;
> : import org.apache.lucene.index.memory.PatternAnalyzer;
> : import org.apache.lucene.queryParser.QueryParser;
> : import org.apache.lucene.search.Hits;
> : import org.apache.lucene.search.IndexSearcher;
> : import org.apache.lucene.search.Query;
> : import org.apache.lucene.store.RAMDirectory;
> :
> : public class Test2 {
> :     private PerFieldAnalyzerWrapper analyzer = null;
> :     private RAMDirectory idx = null;
> :
> :     private Analyzer getAnalyzer() {
> :         if (analyzer == null) {
> :             analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
> :             analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
> :         }
> :         return analyzer;
> :     }
> :
> :     private void makeTestIndex() throws Exception {
> :         idx = new RAMDirectory();
> :         IndexWriter writer = new IndexWriter(idx, getAnalyzer(), true);
> :         Document doc = new Document();
> :         doc.add(new Field("keyword", "hello world", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         doc.add(new Field("booleanField", "false", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         writer.addDocument(doc);
> :         doc = new Document();
> :         doc.add(new Field("keyword", "hello world", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         doc.add(new Field("booleanField", "true", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         writer.addDocument(doc);
> :         System.out.println(writer.docCount());
> :         writer.optimize();
> :         writer.close();
> :     }
> :
> :     private void doSearch(String query, int expectedHits) throws Exception {
> :         try {
> :             QueryParser qp = new QueryParser("keyword", getAnalyzer());
> :             IndexSearcher srch = new IndexSearcher(idx);
> :             Query tmp = qp.parse(query);
> :             // Print the parsed form of the query
> :             System.out.println("Parsed form is '" + tmp.toString() + "'");
> :             Hits hits = srch.search(tmp);
> :
> :             String msg = "";
> :
> :             if (hits.length() == expectedHits) {
> :                 msg = "Test passed ";
> :             } else {
> :                 msg = "************TEST FAILED************ ";
> :             }
> :             System.out.println(msg + "Expected "
> :                     + Integer.toString(expectedHits) + " hits, got "
> :                     + Integer.toString(hits.length()) + " hits");
> :
> :         } catch (IOException e) {
> :             System.out.println("Caught IOException");
> :             e.printStackTrace();
> :         }
> :     }
> :
> :     public static void main(String[] args) {
> :         try {
> :             Test2 test = new Test2();
> :             test.makeTestIndex();
> :             test.doSearch("Hello World", 0);
> :             test.doSearch("hello world", 0);
> :             test.doSearch("hello", 0);
> :             test.doSearch("world", 0);
> :
> :             test.doSearch("\"Hello World\"", 0);
> :             test.doSearch("\"hello world\"", 2);
> :             test.doSearch("\"hello world\" +booleanField:false", 1);
> :             test.doSearch("\"hello world\" +booleanField:true", 1);
> :
> :         } catch (Exception e) {
> :             System.err.println(e.getMessage());
> :         }
> :     }
> : }
> :
> : Chris Hostetter wrote:
> : >
> : > : So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
> : > : StandardAnalyzer) then I still need to enclose in quotes the phrases
> : > : (keywords with spaces) when I issue the search, and they are only returned
> : >
> : > Yes, quotes will be necessary to tell the QueryParser "this
> : > is one chunk of text, pass it to the analyzer whole" - but that's so you
> : > can get the "complex" part of the problem you described... recognizing
> : > that "my brown-cow" and "red fox" should be matched as separate values
> : > instead of trying to find one big value containing "my brown-cow red fox"
> : >
> : > : in the results if the case is identical to how it was added? (This seems to
> : > : be what I observe anyway. And whether I add as TOKENIZED or UN_TOKENIZED
> : > : seems to have no effect.)
> : >
> : > 1) whether case matters is determined entirely by your analyzer, if it
> : > produces different tokens for "Blue" and "BLUE" then case matters
> : > 2) use TOKENIZED or your Analyzer will be completely irrelevant
> : > 3) if you observe something working differently than you expect, post the
> : > code -- we're way past the point of being able to offer you any
> : > meaningful help without seeing a self-contained example of what you want
> : > to see work.
> : >
> : > -Hoss
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: [EMAIL PROTECTED]
> : > For additional commands, e-mail: [EMAIL PROTECTED]
> : >
> :
> : --
> : View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6160316
> : Sent from the Lucene - Java Users forum at Nabble.com.
> :
>
> -Hoss
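Hoss's points 2 and 3 come down to whether the analyzer runs at all at index time. Below is a toy model in plain Java 8+, not Lucene code. `IndexTimeModel`, `keywordAnalyze`, `standardLikeAnalyze`, `indexedTerms` and the field name `body` are all hypothetical. `standardLikeAnalyze` only roughly imitates StandardAnalyzer: it lowercases and splits on non-alphanumerics, while the real analyzer also removes stop words and recognizes more token types. The boolean `tokenized` flag stands in for the choice between Field.Index.TOKENIZED and Field.Index.UN_TOKENIZED.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of index-time term generation (plain Java, NOT the Lucene API).
public class IndexTimeModel {

    // Stand-in for KeywordAnalyzer: the whole value becomes a single token, unchanged.
    static List<String> keywordAnalyze(String value) {
        return Collections.singletonList(value);
    }

    // Rough stand-in for StandardAnalyzer: lowercase, then split on runs of
    // non-letter/non-digit characters (the real analyzer does more, e.g. stop words).
    static List<String> standardLikeAnalyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Per-field lookup in the spirit of PerFieldAnalyzerWrapper: a field with a
    // registered analyzer uses it, every other field falls back to the default.
    static final Map<String, Function<String, List<String>>> PER_FIELD = new HashMap<>();
    static {
        PER_FIELD.put("keyword", IndexTimeModel::keywordAnalyze);
    }

    static Function<String, List<String>> analyzerFor(String field) {
        return PER_FIELD.getOrDefault(field, IndexTimeModel::standardLikeAnalyze);
    }

    // Terms stored in the index for one field value. tokenized == false plays the
    // role of Field.Index.UN_TOKENIZED: the analyzer is never consulted.
    static List<String> indexedTerms(String field, String value, boolean tokenized) {
        if (!tokenized) {
            return Collections.singletonList(value);
        }
        return analyzerFor(field).apply(value);
    }

    public static void main(String[] args) {
        System.out.println(indexedTerms("keyword", "Hello World", false)); // [Hello World]
        System.out.println(indexedTerms("keyword", "Hello World", true));  // [Hello World]
        System.out.println(indexedTerms("body", "Hello World", false));    // [Hello World]
        System.out.println(indexedTerms("body", "Hello World", true));     // [hello, world]
    }
}
```

In this model the "keyword" field gets the same indexed term whether or not it is tokenized, consistent with Hoss's point 3; the flag only makes a visible difference for a field that falls back to the default analyzer.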
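Hoss's explanation of the quotes can be checked against the hit counts Philip's test expects. Below is a second toy model in plain Java, not Lucene's QueryParser. `QueryTimeModel`, `queryTerms` and `hits` are hypothetical. In simplified form, they mimic how the classic QueryParser splits unquoted text on whitespace before analysis but hands a quoted string to the analyzer whole, here paired with a keyword-style analyzer that leaves each chunk unchanged. The model ignores field prefixes such as `booleanField:`, the `+` operator, and term positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy model of querying a keyword-analyzed field (plain Java, NOT Lucene's QueryParser).
public class QueryTimeModel {

    // Terms indexed in the "keyword" field of the two test documents:
    // each holds the single literal term "hello world".
    static final List<Set<String>> DOCS = Arrays.asList(
            Collections.singleton("hello world"),
            Collections.singleton("hello world"));

    // Simplified parsing: a fully quoted query is one chunk passed to the analyzer
    // whole; otherwise the text is split on whitespace first. The keyword-style
    // analyzer then returns each chunk unchanged (no lowercasing).
    static List<String> queryTerms(String query) {
        String q = query.trim();
        if (q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"")) {
            return Collections.singletonList(q.substring(1, q.length() - 1));
        }
        List<String> terms = new ArrayList<>();
        for (String word : q.split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return terms;
    }

    // OR semantics (QueryParser's default operator): a document is a hit
    // if it contains at least one of the query terms.
    static int hits(String query) {
        List<String> terms = queryTerms(query);
        int count = 0;
        for (Set<String> doc : DOCS) {
            for (String t : terms) {
                if (doc.contains(t)) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] queries = {"Hello World", "hello world", "hello", "world",
                            "\"Hello World\"", "\"hello world\""};
        for (String q : queries) {
            System.out.println(q + " -> " + hits(q) + " hits");
        }
    }
}
```

Running `main` gives 0 hits for every query except `"hello world"` (with the quotes), which matches both documents. Those are the counts the first six `doSearch` calls in Philip's program expect. Because nothing in this chain lowercases, `"Hello World"` finds nothing, which is Hoss's point that the analyzer alone decides whether case matters.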