Here's a little sample program (borrowed some code from Erick Erickson :)).
Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
the output. Is this what you'd expect?
- Philip
package com.test;
import java.io.IOException;
import java.util.HashSet;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.memory.PatternAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;
public class Test2 {
private PerFieldAnalyzerWrapper analyzer = null;
private RAMDirectory idx = null;
private Analyzer getAnalyzer() {
if (analyzer == null) {
analyzer = new PerFieldAnalyzerWrapper(new
StandardAnalyzer());
analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
}
return analyzer;
}
private void makeTestIndex() throws Exception {
idx = new RAMDirectory();
IndexWriter writer = new IndexWriter(idx, getAnalyzer(), true);
Document doc = new Document();
doc.add(new Field("keyword", "hello world",
Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.add(new Field("booleanField", "false",
Field.Store.YES,
Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
doc = new Document();
doc.add(new Field("keyword", "hello world",
Field.Store.YES,
Field.Index.UN_TOKENIZED));
doc.add(new Field("booleanField", "true",
Field.Store.YES,
Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
System.out.println(writer.docCount());
writer.optimize();
writer.close();
}
private void doSearch(String query, int expectedHits) throws
Exception
{
try {
QueryParser qp = new QueryParser("keyword", getAnalyzer());
IndexSearcher srch = new IndexSearcher(idx);
Query tmp = qp.parse(query);
// Uncomment to see parsed form of query
System.out.println("Parsed form is '" + tmp.toString() +
"'");
Hits hits = srch.search(tmp);
String msg = "";
if (hits.length() == expectedHits) {
msg = "Test passed ";
} else {
msg = "************TEST FAILED************ ";
}
System.out.println(msg + "Expected "
+ Integer.toString(expectedHits) + " hits, got "
+ Integer.toString(hits.length()) + " hits");
} catch (IOException e) {
System.out.println("Caught IOException");
e.printStackTrace();
}
}
public static void main(String[] args) {
try {
Test2 test = new Test2();
test.makeTestIndex();
test.doSearch("Hello World", 0);
test.doSearch("hello world", 0);
test.doSearch("hello", 0);
test.doSearch("world", 0);
test.doSearch("\"Hello World\"", 0);
test.doSearch("\"hello world\"", 2);
test.doSearch("\"hello world\" +booleanField:false", 1);
test.doSearch("\"hello world\" +booleanField:true", 1);
} catch (Exception e) {
System.err.println(e.getMessage());
}
}
}
Chris Hostetter wrote:
>
>
> : So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
> : StandardAnalyzer) then I still need to enclose in quotes the phrases
> : (keywords with spaces) when I issue the search, and they are only
> returned
>
> Yes, quotes will be neccessary to tell the QueryParser "this
> is one chunk of text, passs it to the analyzer whole" - but that's so you
> can get the "compelx" part of the problem you described... recognizing
> that "my brown-cow" and "red fox" should be matched as seperate values
> intead of trying to find one big vlaue containing "my brown-cow red fox"
>
> : in the results if the case is identical to how it was added? (This
> seems to
> : be what I observe anyway. And whether I add as TOKENIZED or
> UN_TOKENIZED
> : seems to have no effect.)
>
> 1) wether case matters is determined enitrely by your analyzer, if it
> produces differnet tokens for "Blue" and "BLUE" then case matters
> 2) use TOKENIZED or your Analyzer will be completely irrelevant
> 3) if you observse something working differently then you expect, post the
> code -- we're way pastthe point of being able to offer you any
> meaningful help without seeing a self contained example of what you want
> to see work.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
--
View this message in context:
http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6160316
Sent from the Lucene - Java Users forum at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]