Lucene QueryParser and Analyzer

Wei Ho Thu, 29 Apr 2010 12:50:57 -0700

Hello,

I'm using Lucene to index and search through a collection of Chinese documents. However, I'm noticing some odd behavior in query parsing/searching.


Given the two queries below:

(Ci refers to Chinese character i)
Input1: C1C2,C3C4,C5C6,C7,C8C9C10
Input2: C1C2  C3C4  C5C6  C7  C8C9C10

Input1 returns absolutely nothing, while Input2 (replacing the commas with spaces) works as expected. I'm a bit confused about why this would be happening. It seems that QueryParser uses the Analyzer passed to it to tokenize the input query string, so if the Analyzer ignores punctuation, Input1 and Input2 should return identical results. Is there some pre-Analyzer filtering or other processing that QueryParser does? I've tried this with the StandardAnalyzer, the SmartChineseAnalyzer, and an analyzer that I implemented which explicitly skips over punctuation and whitespace when tokenizing the query string, but to no avail.
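
To make that expectation concrete, here is a minimal sketch in plain Java of the tokenization I would expect from an analyzer that skips punctuation and whitespace. It does not use any Lucene classes, the class and method names are made up for illustration, and it is not the code of any of the analyzers mentioned above. The ASCII strings stand in for the Chinese characters. Under that assumption, both inputs produce the same token list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer that ignores punctuation and
// whitespace. Illustration only; not a Lucene Analyzer.
public class PunctuationSkippingTokenizer {

    // Collects runs of letters/digits as tokens. Any other character
    // (comma, space, other punctuation) only ends the current token.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        // Flush the last token if the input did not end on a separator.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input1 = "C1C2,C3C4,C5C6,C7,C8C9C10";
        String input2 = "C1C2  C3C4  C5C6  C7  C8C9C10";
        System.out.println(tokenize(input1));
        System.out.println(tokenize(input2));
        // Compare the two token lists directly.
        System.out.println(tokenize(input1).equals(tokenize(input2)));
    }
}
```

If the analyzer were the only thing deciding how the query string is split, I would expect the two queries to behave the same way, which is not what I'm seeing.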


-------sample code-------------
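// Analyzer used at query time (also tried StandardAnalyzer and SmartChineseAnalyzer)
Analyzer analyzer = new LingPipeAnalyzer();
Searcher searcher = new IndexSearcher(directory);

// Build a parser over all search fields using that analyzer
QueryParser qParser = new MultiFieldQueryParser(Version.LUCENE_30, SEARCH_FIELDS, analyzer);

// queryLine[1] is the raw query string (e.g. Input1 or Input2 above)
Query query = qParser.parse(queryLine[1]);
ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs;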
-----------------------------------

I'm probably just doing something dumb, but any help would be greatly appreciated!


Thanks,
Wei Ho

