Kind of a shot in the dark here, but your WordDelimiterFilterFactory
parameters differ between the index and query analyzers; catenateWords is
especially suspicious.
You could test this by looking in your index with the SOLR admin page and/or
Luke to see what your actual terms are.
And don't forget that you'll have to re-index after restarting SOLR for any
index-time changes to take effect.
HTH
Erick
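Here is a toy way to see why the catenateWords mismatch could matter. The Python sketch below is a simplified, hypothetical model of the word-delimiter step, not Solr's WordDelimiterFilterFactory. The function name toy_word_delimiter and its splitting rule (split only on non-letter characters) are assumptions. It also ignores token positions, which the real filter assigns as well.

```python
import re


def toy_word_delimiter(token, generate_word_parts=True, catenate_words=False,
                       preserve_original=False):
    """Hypothetical, simplified stand-in for a word-delimiter filter.

    Splits only on non-letter characters; the real filter also handles
    digits, case changes, possessives and token positions.
    """
    parts = re.findall(r"[^\W\d_]+", token)  # runs of letters only
    terms = []
    if preserve_original and parts != [token]:
        terms.append(token)                  # keep e.g. "ain't" itself
    if generate_word_parts:
        terms.extend(parts)                  # e.g. "ain", "t"
    if catenate_words and len(parts) > 1:
        terms.append("".join(parts))         # e.g. "aint"
    return terms


# Index analyzer uses catenateWords=1, query analyzer catenateWords=0.
index_terms = toy_word_delimiter("ain't", catenate_words=True)
query_terms = toy_word_delimiter("ain't", catenate_words=False)
print("index:", index_terms)  # index: ['ain', 't', 'aint']
print("query:", query_terms)  # query: ['ain', 't']
print("only at index time:", sorted(set(index_terms) - set(query_terms)))
```

In this toy model the index side produces "aint", which the query side never produces. The admin analysis page or Luke will show what the real filter emits.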
On Thu, Mar 11, 2010 at 2:20 PM, Ya-Wen Hsu <y...@eline.com> wrote:
Yonik, thank you for your reply. When I don't use preserveOriginal="1" for
WordDelimiterFilterFactory, the query "ain't" is parsed as "ain t", and no
match is found in this case either. If I remove the apostrophe from the
query, I get results. I used the analysis tool and saw that the term "ain't"
is processed as "ain t" and matches when the title includes "ain't". But I
get no results when using the "ain't" query with dismax.
The debug output looks like:
(NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
+(long_description:"ain t"^2.0 | name:"ain t"^3.0 | search_keywords:"ain t")~0.1 (long_description:save^2.0 | name:save^3.0 | search_keywords:saved)~0.1) ()
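The debug line combines two pieces of query logic. Each "( ... | ... )~0.1" group is a disjunction-max over fields: the best-scoring field counts fully, and the other matching fields add 0.1 times their scores. The leading "+" marks a required clause, and a document that fails it is a NON-MATCH. The Python sketch below is a rough model under stated assumptions. The helper names are hypothetical, it ignores Lucene's coord factor, boosts and norms, and min_should_match stands in for dismax's mm parameter, whose value is not visible in the debug line.

```python
def dismax_score(field_scores, tie=0.1):
    """Disjunction-max over fields; None means that field did not match."""
    matched = [s for s in field_scores if s is not None]
    if not matched:
        return None  # no field matched, so the whole group fails
    best = max(matched)
    # Best field counts fully; the other matching fields are scaled by tie.
    return best + tie * (sum(matched) - best)


def required_clause(term_scores, min_should_match=1):
    """The "+(...)" clause: the document is excluded unless it matches.

    min_should_match is an assumed stand-in for dismax's mm parameter.
    """
    matched = [s for s in term_scores if s is not None]
    if len(matched) < max(min_should_match, 1):
        return None  # NON-MATCH: the required clause failed
    return sum(matched)


# Hypothetical per-field scores for one document.
ain_t = dismax_score([None, None, None])  # phrase "ain t" matched no field
save = dismax_score([1.2, 0.4, None])     # "save" matched two fields
print(required_clause([ain_t, save], min_should_match=2))  # None
print(required_clause([ain_t, save], min_should_match=1))  # roughly 1.24
```

When every word must match, the phrase "ain t" failing in all three fields is enough to exclude the document, however well "save" scores.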
Below is my configuration for the text field type.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--<filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I get results back when I use solr.LowerCaseTokenizerFactory instead of
solr.WhitespaceTokenizerFactory. However, my concern is that this might
reduce the relevance of the search results. Does anyone have a better idea
of what to try next? Thanks!
Wen
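The relevance concern about switching tokenizers can be made concrete with a rough comparison. The Python sketch below is a hypothetical approximation, not Lucene code. whitespace_tokenize mimics WhitespaceTokenizer by splitting only on whitespace. lowercase_tokenize mimics LowerCaseTokenizer, which keeps runs of letters and lowercases them. The sample title is made up for illustration.

```python
import re


def whitespace_tokenize(text):
    # Rough model of WhitespaceTokenizer: split on whitespace only,
    # leaving punctuation and digits attached to the tokens.
    return text.split()


def lowercase_tokenize(text):
    # Rough model of LowerCaseTokenizer: keep runs of letters, lowercase
    # them, and drop everything else (apostrophes, commas, digits).
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


title = "Ain't Nothing Saved, 2010 Edition"  # hypothetical title
print(whitespace_tokenize(title))  # ["Ain't", 'Nothing', 'Saved,', '2010', 'Edition']
print(lowercase_tokenize(title))   # ['ain', 't', 'nothing', 'saved', 'edition']
```

The letter-based tokenizer removes the apostrophe before WordDelimiterFilterFactory ever runs, which would explain why "ain't" starts matching. It also silently drops numbers such as "2010", which is one concrete way relevance could suffer.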
-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, March 11, 2010 10:51 AM
To: solr-user@lucene.apache.org
Subject: Re: dismax and WordDelimiterFilterFactory with PreserveOriginal = 1
On Thu, Mar 11, 2010 at 1:07 PM, Ya-Wen Hsu <y...@eline.com> wrote:
Hi all,
I'm facing the same issue as a previous post here:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg19511.html.
Since no one answered that post, I thought I'd ask again. In my case, I use
the following setting for the index analyzer:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
and
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
for the query analyzer.
When I query with the word "ain't", no results are returned. When I turned
on logging, I found the word is interpreted as (ain't ain) t.
The problem is preserving the original in the query analyzer - try
removing that. And if you aren't doing prefix or wildcard queries,
preserveOriginal doesn't buy you anything but wasted index space.
It's the same reason you can't generate and catenate at the same time
with the query parser.
-Yonik
http://www.lucidimagination.com
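Yonik's point about stacked tokens can be pictured with a toy positional phrase matcher. The Python sketch below is a hypothetical illustration of exact-phrase semantics when several alternatives share one position, which is roughly what "(ain't ain) t" denotes. It is not Lucene's MultiPhraseQuery implementation. The function name and the token positions in the example are assumptions, not Solr's actual analysis output.

```python
def phrase_matches(doc_positions, query_slots):
    """Toy exact-phrase match with stacked query terms.

    doc_positions: dict mapping term -> set of positions in the field.
    query_slots: list of sets; slot i holds the alternatives allowed at
    offset i from the phrase start, e.g. [{"ain't", "ain"}, {"t"}].
    """
    # Candidate start positions: wherever any first-slot alternative occurs.
    starts = set().union(*(doc_positions.get(t, set()) for t in query_slots[0]))
    for start in starts:
        # Every slot needs at least one alternative at start + offset.
        if all(
            any(start + i in doc_positions.get(t, set()) for t in slot)
            for i, slot in enumerate(query_slots)
        ):
            return True
    return False


query = [{"ain't", "ain"}, {"t"}]               # "(ain't ain) t"
full = {"ain't": {0}, "ain": {0}, "t": {1}}     # assumed: original plus parts
original_only = {"ain't": {0}}                  # assumed: only the original
print(phrase_matches(full, query))           # True
print(phrase_matches(original_only, query))  # False
```

Whichever alternative matches at the first position, "t" must still follow it. The preserved original therefore never matches on its own inside a phrase, which is one way to read why stacking generated and catenated (or original) tokens at query time doesn't help.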