Changing similarity at query time
I am currently using document-level boosts, which under the covers translates to changing the norm for every field. As part of an experiment, I want to remove the boost, but that would require either re-indexing the content or changing the scoring algorithm (similarity). I could create my own similarity, identical to the default/TFIDF similarity but with the norm hardcoded to 1.0, but the Lucene source warns against such behavior. I assume this is because computeNorm appears to run only at index time, but shouldn't I be able to override score() and ignore the decodeNormValue calculation? Cheers, Ivan
Why PhraseQuery translate stopwords to "?"
Hi, My application uses an analyzer with a StopWordFilter. PhraseQuery translates queries containing stopwords by replacing each stopword with a "?" character. For example, "Java and Lucene" becomes "Java ? Lucene" and "to contribute" becomes "? contribute". Term sequences are indexed without stopwords. Searching works when the stopword starts the phrase, but returns no results when the "?" is not at the beginning. Searching for phrases without stopwords works well. Any explanation/FAQ/user-list message that explains why PhraseQuery translates stopwords to "?" would be appreciated. Thank you in advance, Jean-Claude Dauphin -- Jean-Claude Dauphin jc.daup...@gmail.com jc.daup...@afus.unesco.org http://kenai.com/projects/j-isis/ http://www.unesco.org/isis/ http://www.unesco.org/idams/ http://www.greenstone.org
Re: Why PhraseQuery translate stopwords to "?"
The analyzer is generating holes for the stop words - the position of the subsequent term is incremented an extra time for each stop word so that their positions are maintained. -- Jack Krupansky
(In reply to Jean-Claude Dauphin, sent Monday, December 09, 2013 4:15 PM to java-user@lucene.apache.org)
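To make the "holes" concrete, here is a minimal plain-Java sketch of the idea. It does not use Lucene's API: the class PositionHoles, its stop-word list, and the methods analyze, render and phraseMatches are all hypothetical names made up for illustration. It also shows one possible reason for the reported symptom, under the assumption (not confirmed in this thread) that the index was built without holes while the query keeps them: a hole in the middle of the phrase then demands a gap that the indexed positions do not have, whereas a leading hole changes nothing because only relative positions matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java simulation of stop-word "holes"; NOT Lucene's API.
class PositionHoles {
    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "to", "the", "a", "of"));

    // One slot per kept position. With keepHoles, a removed stop word
    // leaves a null slot; without it, the remaining terms close up.
    static List<String> analyze(String text, boolean keepHoles) {
        List<String> slots = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                slots.add(token);
            } else if (keepHoles) {
                slots.add(null);
            }
        }
        return slots;
    }

    // Prints holes as "?", similar to how the phrase was displayed above.
    static String render(List<String> slots) {
        StringBuilder sb = new StringBuilder();
        for (String slot : slots) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(slot == null ? "?" : slot);
        }
        return sb.toString();
    }

    // Exact phrase match on relative positions: leading and trailing holes
    // are ignored, inner holes match any slot but still consume a position.
    static boolean phraseMatches(List<String> doc, List<String> query) {
        int lo = 0, hi = query.size();
        while (lo < hi && query.get(lo) == null) lo++;
        while (hi > lo && query.get(hi - 1) == null) hi--;
        List<String> q = query.subList(lo, hi);
        if (q.isEmpty()) return false;
        for (int start = 0; start + q.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < q.size() && ok; i++) {
                String want = q.get(i);
                ok = (want == null) || want.equals(doc.get(start + i));
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(render(analyze("Java and Lucene", true))); // Java ? Lucene
        System.out.println(render(analyze("to contribute", true)));   // ? contribute
        // Index without holes, query with holes:
        System.out.println(phraseMatches(analyze("Java and Lucene", false),
                                         analyze("Java and Lucene", true))); // false
        System.out.println(phraseMatches(analyze("to contribute", false),
                                         analyze("to contribute", true)));   // true
    }
}
```

If that assumption holds, the fix would be to make indexing and querying agree on whether positions are preserved; when both sides keep the holes, the same phrase matches in this sketch.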
Re: Changing similarity at query time
To answer my own question: despite the warning, using a custom similarity only at search time appears to work. The score() method was the wrong code to override; I simply hardcoded the return value of decodeNormValue to 1.0. Since this value is used for normalization, it should be valid as long as I am consistent. Now on to the real testing. Cheers, Ivan
(In reply to Ivan Brusic, Mon, Dec 9, 2013 at 9:41 AM)
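For readers following along, the effect of hardcoding the decoded norm can be sketched in plain Java. This is a simplified simulation, not Lucene's Similarity API: the class NormFreeScoring, the NormDecoder interface and both methods are hypothetical names, and the formula is a reduced form of classic TF-IDF (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = boost / sqrt(fieldLength)) that leaves out queryNorm, coord and the lossy byte encoding of norms.

```java
// Simplified single-term TF-IDF simulation; NOT Lucene's Similarity API.
class NormFreeScoring {
    // Hypothetical stand-in for the search-time decodeNormValue hook.
    interface NormDecoder {
        float decode(float storedNorm);
    }

    // What is written at index time: the boost is folded into the length norm.
    static float indexTimeNorm(float boost, int fieldLength) {
        return boost / (float) Math.sqrt(fieldLength);
    }

    // Search-time score for one term; the decoder decides whether the
    // stored norm (and therefore the boost) influences the result.
    static float score(int freq, int docFreq, int numDocs,
                       float storedNorm, NormDecoder decoder) {
        float tf = (float) Math.sqrt(freq);
        float idf = (float) (1.0 + Math.log((double) numDocs / (docFreq + 1)));
        return tf * idf * idf * decoder.decode(storedNorm);
    }

    public static void main(String[] args) {
        float boosted = indexTimeNorm(2.0f, 4); // document indexed with boost 2.0
        float plain = indexTimeNorm(1.0f, 4);   // same length, no boost

        NormDecoder asStored = n -> n;   // default behaviour: use the stored norm
        NormDecoder ignored = n -> 1.0f; // norm hardcoded to 1.0 at search time

        System.out.println(score(3, 9, 100, boosted, asStored) > score(3, 9, 100, plain, asStored));
        System.out.println(score(3, 9, 100, boosted, ignored) == score(3, 9, 100, plain, ignored));
    }
}
```

Note that in this sketch a constant decoder removes length normalization together with the boost, since both are folded into the same stored value. Scores then remain comparable with each other only if every query in the experiment uses the same decoder.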