Re: Phrase search using quotes -- special Tokenizer

Chris Hostetter Tue, 05 Sep 2006 14:56:43 -0700

1) consider using JUnit tests .. it makes it a lot easier for other people
to understand your expectations, and if it winds up demonstrating a genuine
bug in Lucene, it's easy to add to the test tree.


2) as i said before, your fields must be TOKENIZED, or your analyzer is
irrelevant at index time.

3) when i run the code you sent as is, i get lots of "Test passed" lines
and no "TEST FAILED" lines ... which makes sense since you have everything
UN_TOKENIZED, so the literal values are getting indexed, which just so
happens to be what KeywordAnalyzer does as well -- hence if you change
everything from UN_TOKENIZED to TOKENIZED it will still work.
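
To make point 3 concrete, here is a minimal sketch in plain Java. It does
not call Lucene at all; the class IndexTermsSketch, the ToyAnalyzer
interface and the indexTerms helper are hypothetical names invented for
illustration, and the lowercasing whitespace splitter is only a rough
stand-in for a real tokenizing analyzer. It assumes, as described above,
that an untokenized field is indexed as its literal value and that a
keyword-style analyzer emits the whole value as one term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: none of these types are part of Lucene.
class IndexTermsSketch {

    /** Hypothetical stand-in for an analyzer: field text in, index terms out. */
    interface ToyAnalyzer {
        List<String> terms(String text);
    }

    /** Keyword-style analysis: the entire value becomes a single term. */
    static final ToyAnalyzer KEYWORD = text -> Collections.singletonList(text);

    /** Rough tokenizing analysis: split on whitespace and lowercase each piece. */
    static final ToyAnalyzer LOWERCASE_WHITESPACE = text -> {
        List<String> out = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                out.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    };

    /**
     * Terms that end up in the toy index for one field value.
     * An untokenized field bypasses the analyzer entirely.
     */
    static List<String> indexTerms(String value, boolean tokenized, ToyAnalyzer analyzer) {
        return tokenized ? analyzer.terms(value) : Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "hello world";

        // Keyword-style analyzer: tokenized or not, the result is the same.
        System.out.println(indexTerms(value, false, KEYWORD)); // [hello world]
        System.out.println(indexTerms(value, true, KEYWORD));  // [hello world]

        // A tokenizing analyzer only matters when the field is tokenized.
        System.out.println(indexTerms(value, false, LOWERCASE_WHITESPACE)); // [hello world]
        System.out.println(indexTerms(value, true, LOWERCASE_WHITESPACE));  // [hello, world]
    }
}
```

In this toy model the first two lines print the same thing, which is why
switching a field between the two modes changes nothing as long as the
analyzer for that field keeps the value whole.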


do you have an example of something that *isn't* working the way you want?
... if not i don't see what your problem is, all your tests are passing :)


: Date: Tue, 5 Sep 2006 14:06:13 -0700 (PDT)
: From: Philip Brown <[EMAIL PROTECTED]>
: Reply-To: [email protected]
: To: [email protected]
: Subject: Re: Phrase search using quotes -- special Tokenizer
:
:
: Here's a little sample program (borrowed some code from Erick Erickson :)).
: Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
: the output.  Is this what you'd expect?
:
: - Philip
:
: package com.test;
:
: import java.io.IOException;
: import java.util.HashSet;
: import java.util.regex.Pattern;
:
: import org.apache.lucene.analysis.Analyzer;
: import org.apache.lucene.analysis.KeywordAnalyzer;
: import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
: import org.apache.lucene.analysis.standard.StandardAnalyzer;
: import org.apache.lucene.document.Document;
: import org.apache.lucene.document.Field;
: import org.apache.lucene.index.IndexWriter;
: import org.apache.lucene.index.memory.PatternAnalyzer;
: import org.apache.lucene.queryParser.QueryParser;
: import org.apache.lucene.search.Hits;
: import org.apache.lucene.search.IndexSearcher;
: import org.apache.lucene.search.Query;
: import org.apache.lucene.store.RAMDirectory;
:
: public class Test2 {
:           private PerFieldAnalyzerWrapper analyzer = null;
:           private RAMDirectory idx = null;
:
:           private Analyzer getAnalyzer() {
:               if (analyzer == null) {
:                       analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
:                       analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
:               }
:               return analyzer;
:
:           }
:
:           private void makeTestIndex() throws Exception {
:                       idx = new RAMDirectory();
:               IndexWriter writer = new IndexWriter(idx, getAnalyzer(), true);
:                       Document doc = new Document();
:                       doc.add(new Field("keyword", "hello world",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:                       doc.add(new Field("booleanField", "false",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:                       writer.addDocument(doc);
:                       doc = new Document();
:                       doc.add(new Field("keyword", "hello world",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:                       doc.add(new Field("booleanField", "true",
:                               Field.Store.YES, Field.Index.UN_TOKENIZED));
:                       writer.addDocument(doc);
:                       System.out.println(writer.docCount());
:                       writer.optimize();
:                       writer.close();
:           }
:
:           private void doSearch(String query, int expectedHits) throws Exception {
:               try {
:                   QueryParser qp = new QueryParser("keyword", getAnalyzer());
:                   IndexSearcher srch = new IndexSearcher(idx);
:                   Query tmp = qp.parse(query);
:                   // Print the parsed form of the query
:                   System.out.println("Parsed form is '" + tmp.toString() + "'");
:                   Hits hits = srch.search(tmp);
:
:                   String msg = "";
:
:                   if (hits.length() == expectedHits) {
:                       msg = "Test passed ";
:                   } else {
:                       msg = "************TEST FAILED************ ";
:                   }
:                   System.out.println(msg + "Expected "
:                           + Integer.toString(expectedHits) + " hits, got "
:                           + Integer.toString(hits.length()) + " hits");
:
:               } catch (IOException e) {
:                   System.out.println("Caught IOException");
:                   e.printStackTrace();
:               }
:           }
:
:
:           public static void main(String[] args) {
:               try {
:                   Test2 test = new Test2();
:                   test.makeTestIndex();
:                   test.doSearch("Hello World", 0);
:                   test.doSearch("hello world", 0);
:                   test.doSearch("hello", 0);
:                   test.doSearch("world", 0);
:
:                   test.doSearch("\"Hello World\"", 0);
:                   test.doSearch("\"hello world\"", 2);
:                   test.doSearch("\"hello world\" +booleanField:false", 1);
:                   test.doSearch("\"hello world\" +booleanField:true", 1);
:
:               } catch (Exception e) {
:                   System.err.println(e.getMessage());
:               }
:           }
: }
:
:
: Chris Hostetter wrote:
: >
: >
: > : So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
: > : StandardAnalyzer) then I still need to enclose in quotes the phrases
: > : (keywords with spaces) when I issue the search, and they are only
: > : returned
: >
: > Yes, quotes will be necessary to tell the QueryParser "this
: > is one chunk of text, pass it to the analyzer whole" - but that's so you
: > can get the "complex" part of the problem you described... recognizing
: > that "my brown-cow" and "red fox" should be matched as separate values
: > instead of trying to find one big value containing "my brown-cow red fox"
: >
: > : in the results if the case is identical to how it was added?  (This
: > : seems to be what I observe anyway.  And whether I add as TOKENIZED or
: > : UN_TOKENIZED seems to have no effect.)
: >
: > 1) whether case matters is determined entirely by your analyzer, if it
: >    produces different tokens for "Blue" and "BLUE" then case matters
: > 2) use TOKENIZED or your Analyzer will be completely irrelevant
: > 3) if you observe something working differently than you expect, post the
: >   code -- we're way past the point of being able to offer you any
: >   meaningful help without seeing a self contained example of what you want
: >   to see work.
: >
: >
: >
: > -Hoss
: >
: >
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: [EMAIL PROTECTED]
: > For additional commands, e-mail: [EMAIL PROTECTED]
: >
: >
: >
:
: --
: View this message in context:
: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6160316
: Sent from the Lucene - Java Users forum at Nabble.com.
:



-Hoss
