Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'
: during indexing). However, due to the pre-analysis whitespace tokenization
: done by lucene query parser, the reverse is not handled well - document with
: string 'thunderbolt' being matched to query 'thunder bolt'.

It's not so much pre-analysis whitespace tokenization as it is query parser meta-characters: whitespace has meaning to the query parser in the same way that +, - and \ do. If you want a query parser that doesn't treat whitespace as special, you can use the FieldQParser. It supports no meta-characters and just runs the input through the analyzer for a specified field.

-Hoss
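To make the distinction concrete, here is a small Java sketch that needs no Lucene dependency. The `analyze` method is a hypothetical stand-in for a shingle field type (lowercased unigrams plus two-word shingles joined with an empty separator); it is not Lucene's ShingleFilter. The sketch compares splitting the query on whitespace before analysis with handing the whole string to the analyzer, which is roughly what the FieldQParser does. "Matching" here simply means any query term is among the document's indexed terms, which is a simplification: a real parser may build a phrase query when the analyzer emits several tokens.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ShingleQueryDemo {

    // Hypothetical shingle analyzer: lowercase, split on whitespace, emit each
    // word plus the concatenation of each adjacent pair ("thunder bolt" -> "thunderbolt").
    static List<String> analyze(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) continue; // empty input yields no tokens
            tokens.add(words[i]);
            if (i + 1 < words.length) {
                tokens.add(words[i] + words[i + 1]);
            }
        }
        return tokens;
    }

    // Whitespace as a parser meta-character: split first, then analyze each
    // chunk on its own, so no chunk ever sees its neighbour.
    static Set<String> splitThenAnalyze(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String chunk : query.trim().split("\\s+")) {
            terms.addAll(analyze(chunk));
        }
        return terms;
    }

    // Field-parser-style behaviour: the whole input goes through the analyzer.
    static Set<String> analyzeWhole(String query) {
        return new LinkedHashSet<>(analyze(query));
    }

    public static void main(String[] args) {
        Set<String> indexed = analyzeWhole("Thunderbolt"); // the document's text
        Set<String> split = splitThenAnalyze("thunder bolt");
        Set<String> whole = analyzeWhole("thunder bolt");
        System.out.println("split terms: " + split); // split terms: [thunder, bolt]
        System.out.println("whole terms: " + whole); // whole terms: [thunder, thunderbolt, bolt]
        System.out.println("split matches doc: " + split.stream().anyMatch(indexed::contains)); // false
        System.out.println("whole matches doc: " + whole.stream().anyMatch(indexed::contains)); // true
    }
}
```

In Solr the field parser is usually selected with local params, e.g. `q={!field f=title_shingle}thunder bolt`, where `title_shingle` is a hypothetical field name. Whether the resulting query actually matches depends on how the real shingle field emits unigrams and positions, so it is worth checking with Solr's analysis tools.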
Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'
Requesting the community for feedback one more time. Does anyone have any suggestions/comments regarding this?

Thanks in advance,
Prasanna

On Sat, Jul 30, 2011 at 12:04 AM, Prasanna R plistma...@gmail.com wrote:
Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'
We use a dismax handler with mm set to 1 in our Solr installation. I have a fieldType defined that creates shingles to handle space variations in the input strings and user queries. This fieldType successfully handles cases where the query is 'thunderbolt' and the document contains the string 'thunder bolt' (the shingle results in the token 'thunderbolt' being created during indexing). However, due to the pre-analysis whitespace tokenization done by the Lucene query parser, the reverse case, matching a document containing 'thunderbolt' to the query 'thunder bolt', is not handled well.

I find that in our dismax handler the shingle field records a match and scores on 'pf', but the document is not returned because none of the fields in 'qf' record a match (mm is 1).

I am looking for suggestions on how to handle this scenario. Using a synonym would obviously work, but it seems a rather hackish solution. Is there a more elegant way of achieving a similar effect? Alternatively, is there a way to get the 'mm' parameter to factor in matches on 'pf' as well?

Kindly help.

Regards,
Prasanna
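The behaviour described above can be modelled with a short Java sketch. This is a deliberately simplified model, not Solr's DisMax code: each whitespace-separated query clause counts toward mm only if some qf field contains it as a term, and pf is treated as a pure score boost that cannot rescue a document which fails mm. The field names `title` and `title_shingle` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DismaxMmDemo {

    // Simplified model: a clause counts toward mm if at least one qf field
    // contains it as an indexed term.
    static int matchedClauses(List<String> clauses, Map<String, Set<String>> qfFields) {
        int matched = 0;
        for (String clause : clauses) {
            for (Set<String> terms : qfFields.values()) {
                if (terms.contains(clause)) {
                    matched++;
                    break; // count each clause once, whichever field matched
                }
            }
        }
        return matched;
    }

    // Only the qf clauses decide whether a document is returned; a pf
    // phrase match would merely raise the score of documents already returned.
    static boolean isReturned(List<String> clauses, Map<String, Set<String>> qfFields, int mm) {
        return matchedClauses(clauses, qfFields) >= mm;
    }

    public static void main(String[] args) {
        // "thunder bolt" is split on whitespace before analysis.
        List<String> clauses = List.of("thunder", "bolt");
        // Hypothetical qf fields of a document whose text is "thunderbolt".
        Map<String, Set<String>> qf = Map.of(
                "title", Set.of("thunderbolt"),
                "title_shingle", Set.of("thunderbolt"));
        System.out.println(matchedClauses(clauses, qf)); // 0
        System.out.println(isReturned(clauses, qf, 1));  // false
    }
}
```

Under this model, the document comes back only if some qf clause matches on its own, which is why a synonym mapping works while a pf-only match does not.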