Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-08-09 Thread Chris Hostetter

: during indexing).  However, due to the pre-analysis whitespace tokenization
: done by lucene query parser, the reverse is not handled well - document with
: string 'thunderbolt' being matched to query 'thunder bolt'.

it's not so much pre-analysis whitespace tokenization as it is query 
parser meta-characters ... whitespace has meaning to the query parser in 
the same way that + - and \ do.

if you want a query parser that doesn't treat whitespace specially, you can 
use the FieldQParser ... it supports no metacharacters and just runs the 
input through the analyzer for a specified field.
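
A minimal sketch of what such a request could look like, assuming the parser is selected with Solr's local-params syntax ({!field f=...}). The field name title_shingle and the wt parameter are placeholders for illustration, not taken from this thread.

```python
from urllib.parse import urlencode


def field_parser_query(field, user_input):
    """Prefix raw user input with {!field f=<field>} so that Solr hands the
    whole string, whitespace included, to that field's analyzer."""
    return "{!field f=%s}%s" % (field, user_input)


# 'title_shingle' is a hypothetical name for the shingle field.
params = {
    "q": field_parser_query("title_shingle", "thunder bolt"),
    "wt": "json",
}

# URL-encoded query string to append to the select handler's URL.
query_string = urlencode(params)
print(query_string)
```

Because the input is not split on whitespace before analysis, the shingle filter on that field sees both words together and can emit the combined token.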


-Hoss


Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-08-05 Thread Prasanna R
Asking the community for feedback one more time: does anyone have any
suggestions or comments regarding this?

Thanks in advance,

Prasanna




Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-07-30 Thread Prasanna R
We use a dismax handler with mm 1 in our Solr installation. I have a
fieldType defined that creates shingles to handle space variations in the
input strings and user queries. This fieldType can successfully handle cases
where the query is 'thunderbolt' and the document contains the string
'thunder bolt' (the shingle results in the token 'thunderbolt' created
during indexing).  However, due to the pre-analysis whitespace tokenization
done by the Lucene query parser, the reverse is not handled well: a document
with the string 'thunderbolt' is not matched by the query 'thunder bolt'.
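
The asymmetry can be reproduced with a toy model of the analysis chain. This is only a sketch: it assumes shingles of up to two tokens joined with an empty separator, and the function names are made up for illustration rather than being Solr APIs.

```python
def shingles(tokens, max_size=2, sep=""):
    """Return the original tokens plus every run of 2..max_size adjacent
    tokens joined by sep (a toy stand-in for a shingle filter)."""
    out = list(tokens)
    for size in range(2, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(sep.join(tokens[i:i + size]))
    return out


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, then add shingles."""
    return shingles(text.lower().split())


# Index time: the whole field value is analyzed at once.
index_terms = set(analyze("thunder bolt"))

# Reverse case: the document holds 'thunderbolt', the query is 'thunder bolt'.
doc_terms = set(analyze("thunderbolt"))

# The query parser splits on whitespace first, so each chunk is analyzed
# alone and no shingle spanning both words is ever produced.
per_chunk_terms = [analyze(chunk) for chunk in "thunder bolt".split()]

# Analyzing the whole query string in one pass would produce the shingle.
whole_query_terms = set(analyze("thunder bolt"))

print(sorted(index_terms))
print(per_chunk_terms)
print(sorted(whole_query_terms & doc_terms))
```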

I find that in our dismax handler the shingle field records a match and
scores on the 'pf' but the document is not returned as none of the fields in
'qf' record a match (mm is 1). I am looking for suggestions on how to handle
this scenario. Using a synonym will obviously work but it seems a rather
hackish solution. Is there a more elegant way of achieving a similar effect?
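
To make the qf/pf interaction concrete, here is a deliberately simplified model of the behaviour described above. It is not how dismax is implemented or scored: the scoring is invented, field names are placeholders, and the phrase match on 'pf' is approximated by joining the query words, on the assumption that the shingle field uses an empty separator.

```python
def toy_dismax(doc, query, qf, pf, mm=1):
    """Return a toy score, or None when the document is not returned.

    doc maps field name -> set of indexed terms. A document qualifies only
    if at least mm whitespace-separated query clauses match some qf field;
    pf is consulted afterwards and can only add to the score.
    """
    clauses = query.lower().split()
    matched = sum(
        1 for clause in clauses
        if any(clause in doc.get(field, set()) for field in qf)
    )
    if matched < mm:
        return None  # excluded before pf is ever considered

    # Assumption: the phrase analyzed against a shingle field yields the
    # joined token, e.g. 'thunder bolt' -> 'thunderbolt'.
    phrase_token = "".join(clauses)
    boost = sum(1 for field in pf if phrase_token in doc.get(field, set()))
    return matched + boost


# Hypothetical documents and field names.
doc_joined = {"title": {"thunderbolt"}, "title_shingle": {"thunderbolt"}}
doc_spaced = {
    "title": {"thunder", "bolt"},
    "title_shingle": {"thunder", "bolt", "thunderbolt"},
}

qf = ["title", "title_shingle"]
pf = ["title_shingle"]

joined_result = toy_dismax(doc_joined, "thunder bolt", qf, pf, mm=1)
spaced_result = toy_dismax(doc_spaced, "thunder bolt", qf, pf, mm=1)
print(joined_result, spaced_result)
```

In this model the 'thunderbolt' document would earn a 'pf' boost, but it never gets that far because no 'qf' clause matches, which mirrors the behaviour reported here.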


Alternatively, is there a way to get the 'mm' parameter to factor in matches
on 'pf' also?

Kindly help.

Regards,

Prasanna