Sorry for the confusion, and thanks for taking the time to educate me. So, if
I am just indexing literal values, what is the best way to do that (which
analyzer should I use)? It sounds like this approach, even though it works, is
not the preferred method:
analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
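
For what it's worth, the distinction discussed in this thread can be sketched as a toy model in plain Java. This is not Lucene code: LiteralIndexSketch, indexTerms, keywordAnalyze and simpleAnalyze are hypothetical names, and simpleAnalyze is only a crude stand-in for StandardAnalyzer (it splits on non-alphanumerics and lowercases, nothing more). The assumption it illustrates is the one stated in the reply quoted below: an UN_TOKENIZED field is indexed as its literal value without consulting the analyzer, which happens to give the same single term that KeywordAnalyzer would produce for a TOKENIZED field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Toy model only: none of these names exist in Lucene.
public class LiteralIndexSketch {

    // Stand-in for KeywordAnalyzer: the entire value becomes a single term.
    public static List<String> keywordAnalyze(String value) {
        return Collections.singletonList(value);
    }

    // Crude stand-in for StandardAnalyzer: split on non-alphanumerics, lowercase.
    public static List<String> simpleAnalyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // tokenized == false models Field.Index.UN_TOKENIZED: the analyzer is
    // skipped and the literal value is indexed as one term.
    public static List<String> indexTerms(String value, boolean tokenized,
                                          Function<String, List<String>> analyzer) {
        return tokenized ? analyzer.apply(value) : Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "Hello World";
        // Untokenized: the analyzer passed in is irrelevant.
        System.out.println(indexTerms(value, false, LiteralIndexSketch::simpleAnalyze));
        // Tokenized with the keyword stand-in: same single term.
        System.out.println(indexTerms(value, true, LiteralIndexSketch::keywordAnalyze));
        // Tokenized with the word-splitting stand-in: separate lowercased terms.
        System.out.println(indexTerms(value, true, LiteralIndexSketch::simpleAnalyze));
    }
}
```

If that model is right, then for a field holding only literal values, UN_TOKENIZED and TOKENIZED with KeywordAnalyzer on that field should index the same term, and the per-field analyzer would matter mainly on the query side, where the quoted text is handed to the analyzer for that field.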
Thanks again.
Chris Hostetter wrote:
>
>
> 1) consider using JUnit tests .. it makes it a lot easier for other people
> to understand your expectations, and if it winds up demonstrating a genuine
> bug in Lucene, it's easy to add to the test tree.
>
> 2) as i said before, your fields must be TOKENIZED, or your analyzer is
> irrelevant at index time.
>
> 3) when i run the code you sent as is, i get lots of "Test passed" lines
> and no "TEST FAILED" lines ... which makes sense since you have everything
> UN_TOKENIZED, so the literal values are getting indexed, which just so
> happens to be what KeywordAnalyzer does as well -- hence if you change
> everything from UN_TOKENIZED to TOKENIZED it will still work.
>
>
> do you have an example of something that *isn't* working the way you want?
> ... if not i don't see what your problem is, all your tests are passing :)
>
>
> : Date: Tue, 5 Sep 2006 14:06:13 -0700 (PDT)
> : From: Philip Brown <[EMAIL PROTECTED]>
> : Reply-To: [email protected]
> : To: [email protected]
> : Subject: Re: Phrase search using quotes -- special Tokenizer
> :
> :
> : Here's a little sample program (borrowed some code from Erick Erickson :)).
> : Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
> : the output. Is this what you'd expect?
> :
> : - Philip
> :
> : package com.test;
> :
> : import java.io.IOException;
> : import java.util.HashSet;
> : import java.util.regex.Pattern;
> :
> : import org.apache.lucene.analysis.Analyzer;
> : import org.apache.lucene.analysis.KeywordAnalyzer;
> : import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
> : import org.apache.lucene.analysis.standard.StandardAnalyzer;
> : import org.apache.lucene.document.Document;
> : import org.apache.lucene.document.Field;
> : import org.apache.lucene.index.IndexWriter;
> : import org.apache.lucene.index.memory.PatternAnalyzer;
> : import org.apache.lucene.queryParser.QueryParser;
> : import org.apache.lucene.search.Hits;
> : import org.apache.lucene.search.IndexSearcher;
> : import org.apache.lucene.search.Query;
> : import org.apache.lucene.store.RAMDirectory;
> :
> : public class Test2 {
> :     private PerFieldAnalyzerWrapper analyzer = null;
> :     private RAMDirectory idx = null;
> :
> :     private Analyzer getAnalyzer() {
> :         if (analyzer == null) {
> :             analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
> :             analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
> :         }
> :         return analyzer;
> :     }
> :
> :     private void makeTestIndex() throws Exception {
> :         idx = new RAMDirectory();
> :         IndexWriter writer = new IndexWriter(idx, getAnalyzer(), true);
> :         Document doc = new Document();
> :         doc.add(new Field("keyword", "hello world", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         doc.add(new Field("booleanField", "false", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         writer.addDocument(doc);
> :         doc = new Document();
> :         doc.add(new Field("keyword", "hello world", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         doc.add(new Field("booleanField", "true", Field.Store.YES,
> :                 Field.Index.UN_TOKENIZED));
> :         writer.addDocument(doc);
> :         System.out.println(writer.docCount());
> :         writer.optimize();
> :         writer.close();
> :     }
> :
> :     private void doSearch(String query, int expectedHits) throws Exception {
> :         try {
> :             QueryParser qp = new QueryParser("keyword", getAnalyzer());
> :             IndexSearcher srch = new IndexSearcher(idx);
> :             Query tmp = qp.parse(query);
> :             // Print the parsed form of the query
> :             System.out.println("Parsed form is '" + tmp.toString() + "'");
> :             Hits hits = srch.search(tmp);
> :
> :             String msg = "";
> :
> :             if (hits.length() == expectedHits) {
> :                 msg = "Test passed ";
> :             } else {
> :                 msg = "************TEST FAILED************ ";
> :             }
> :             System.out.println(msg + "Expected "
> :                     + Integer.toString(expectedHits) + " hits, got "
> :                     + Integer.toString(hits.length()) + " hits");
> :
> :         } catch (IOException e) {
> :             System.out.println("Caught IOException");
> :             e.printStackTrace();
> :         }
> :     }
> :
> :     public static void main(String[] args) {
> :         try {
> :             Test2 test = new Test2();
> :             test.makeTestIndex();
> :             test.doSearch("Hello World", 0);
> :             test.doSearch("hello world", 0);
> :             test.doSearch("hello", 0);
> :             test.doSearch("world", 0);
> :
> :             test.doSearch("\"Hello World\"", 0);
> :             test.doSearch("\"hello world\"", 2);
> :             test.doSearch("\"hello world\" +booleanField:false", 1);
> :             test.doSearch("\"hello world\" +booleanField:true", 1);
> :
> :         } catch (Exception e) {
> :             System.err.println(e.getMessage());
> :         }
> :     }
> : }
> :
> :
> : Chris Hostetter wrote:
> : >
> : >
> : > : So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
> : > : StandardAnalyzer) then I still need to enclose in quotes the phrases
> : > : (keywords with spaces) when I issue the search, and they are only
> : > : returned
> : >
> : > Yes, quotes will be necessary to tell the QueryParser "this
> : > is one chunk of text, pass it to the analyzer whole" - but that's so you
> : > can get the "complex" part of the problem you described... recognizing
> : > that "my brown-cow" and "red fox" should be matched as separate values
> : > instead of trying to find one big value containing "my brown-cow red fox"
> : >
> : > : in the results if the case is identical to how it was added? (This
> : > : seems to be what I observe anyway. And whether I add as TOKENIZED or
> : > : UN_TOKENIZED seems to have no effect.)
> : >
> : > 1) whether case matters is determined entirely by your analyzer, if it
> : > produces different tokens for "Blue" and "BLUE" then case matters
> : > 2) use TOKENIZED or your Analyzer will be completely irrelevant
> : > 3) if you observe something working differently than you expect, post the
> : > code -- we're way past the point of being able to offer you any
> : > meaningful help without seeing a self contained example of what you want
> : > to see work.
> : >
> : >
> : >
> : > -Hoss
> : >
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: [EMAIL PROTECTED]
> : > For additional commands, e-mail: [EMAIL PROTECTED]
> : >
> : >
> : >
> :
> : --
> : View this message in context:
> http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6160316
> : Sent from the Lucene - Java Users forum at Nabble.com.
> :
>
>
>
> -Hoss
>
--
View this message in context:
http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6163500
Sent from the Lucene - Java Users forum at Nabble.com.