Changing similarity at query time

2013-12-09 Thread Ivan Brusic
I am currently using document-level boosts, which under the covers really
means changing the norm of every field. As part of an experiment, I want to
remove the boost, but that would require either re-indexing the content or
changing the scoring algorithm (similarity).

Suppose I create my own similarity, identical to the default/TFIDF
similarity but with the norm hardcoded to 1.0. The Lucene source warns
against such behavior. I assume this is because computeNorm appears to be
run only at index time, but shouldn't I be able to override score() and
ignore the decodeNormValue calculation?

Cheers,

Ivan


Why PhraseQuery translate stopwords to "?"

2013-12-09 Thread Jean-Claude Dauphin
Hi,

My application uses an analyzer with a StopWordFilter. PhraseQuery
translates queries containing stopwords by replacing each stopword with a
"?" character. For example, "Java and Lucene" becomes "Java ? Lucene" and
"to contribute" becomes "? contribute". Term sequences are indexed without
stopwords. Searching works when the stopword starts the phrase, but returns
no results when the "?" is not at the beginning.

Searching for phrases without stopwords works well.

Any explanation, FAQ entry, or user-list message that explains why
PhraseQuery translates stopwords to "?" would be appreciated.

Thank you in advance

Jean-Claude Dauphin

-- 
Jean-Claude Dauphin

jc.daup...@gmail.com
jc.daup...@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org


Re: Why PhraseQuery translate stopwords to "?"

2013-12-09 Thread Jack Krupansky
The analyzer is generating holes for the stop words: the position of the
subsequent term is incremented an extra time for each removed stop word, so
that the positions of the remaining terms are maintained.


-- Jack Krupansky
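
The position holes described above can be illustrated with a small toy model.
This is a sketch in plain Java, not Lucene code: the class StopwordHoles, its
methods, and the stop word list are all hypothetical names made up for the
illustration. It assumes that a phrase match compares positions relative to
the first query term, and that the index was built without holes while the
query keeps them, which is one possible reading of the reported symptom.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of stop word removal with position holes. Not Lucene code. */
class StopwordHoles {

    /** A term together with its absolute position in the token stream. */
    static final class PositionedTerm {
        final String text;
        final int position;

        PositionedTerm(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    // Hypothetical stop word list, chosen only for this example.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("and", "to", "the", "of"));

    /**
     * Splits on whitespace, lowercases, and drops stop words.
     * With keepHoles, a removed stop word still consumes one position.
     */
    static List<PositionedTerm> analyze(String text, boolean keepHoles) {
        List<PositionedTerm> out = new ArrayList<>();
        int position = 0;
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase();
            if (STOPWORDS.contains(token)) {
                if (keepHoles) {
                    position++; // leave a hole where the stop word was
                }
                continue;
            }
            out.add(new PositionedTerm(token, position++));
        }
        return out;
    }

    /** Prints the terms in order, writing "?" for every unfilled position. */
    static String render(List<PositionedTerm> terms) {
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (PositionedTerm t : terms) {
            for (; next < t.position; next++) {
                sb.append("? ");
            }
            sb.append(t.text).append(' ');
            next = t.position + 1;
        }
        return sb.toString().trim();
    }

    /** Exact phrase match on offsets relative to the first query term. */
    static boolean phraseMatches(List<PositionedTerm> doc, List<PositionedTerm> query) {
        if (query.isEmpty()) {
            return false;
        }
        PositionedTerm first = query.get(0);
        for (PositionedTerm start : doc) {
            if (!start.text.equals(first.text)) {
                continue;
            }
            boolean all = true;
            for (PositionedTerm q : query) {
                int wanted = start.position + (q.position - first.position);
                boolean found = false;
                for (PositionedTerm d : doc) {
                    if (d.position == wanted && d.text.equals(q.text)) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<PositionedTerm> query = analyze("Java and Lucene", true);
        System.out.println(render(query));
        System.out.println(phraseMatches(analyze("I like Java and Lucene", false), query));
        System.out.println(phraseMatches(analyze("I like Java and Lucene", true), query));
    }
}
```

In this model a leading hole is harmless, because offsets are measured from
the first remaining term, while a hole in the middle of the phrase only
matches documents that were analyzed with the same holes. If that is what is
happening in the real application, making index-time and query-time analysis
agree on position increments would be the thing to check.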


Re: Changing similarity at query time

2013-12-09 Thread Ivan Brusic
To answer my own question: despite the warning, using a custom similarity
only at search time appears to work. The score() method was the wrong thing
to override; I simply hardcoded the return value of decodeNormValue to 1.0.
Since this value is used for normalization, it should be valid as long as I
am consistent.

Now onto the real testing.

Cheers,

Ivan
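
The idea of neutralizing the norm at search time can be sketched with a
simplified scorer. This is plain Java, not Lucene's Similarity API: the
classes NormSketch, TfIdfScorer, and NoNormScorer are hypothetical, the
formula is a reduced tf * idf^2 * norm with no query normalization or
coordination factor, and the stored norm is modeled as a plain float rather
than Lucene's encoded value. Only the name decodeNormValue is taken from the
thread.

```java
/** Simplified TF-IDF scoring with a replaceable norm decoder. Not Lucene's API. */
class NormSketch {

    /** Baseline scorer: the stored norm (length norm times boost) scales the score. */
    static class TfIdfScorer {

        float tf(float freq) {
            return (float) Math.sqrt(freq);
        }

        float idf(long docFreq, long numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }

        /** In this sketch the stored norm is already a float, so decoding is the identity. */
        float decodeNormValue(float storedNorm) {
            return storedNorm;
        }

        float score(float freq, long docFreq, long numDocs, float storedNorm) {
            float idf = idf(docFreq, numDocs);
            return tf(freq) * idf * idf * decodeNormValue(storedNorm);
        }
    }

    /** Search-time variant: ignores whatever norm was written at index time. */
    static class NoNormScorer extends TfIdfScorer {
        @Override
        float decodeNormValue(float storedNorm) {
            return 1.0f;
        }
    }

    public static void main(String[] args) {
        TfIdfScorer withNorm = new TfIdfScorer();
        TfIdfScorer withoutNorm = new NoNormScorer();
        // Same term statistics, different index-time boost baked into the norm.
        System.out.println(withNorm.score(4f, 9, 100, 0.5f));
        System.out.println(withNorm.score(4f, 9, 100, 8f));
        System.out.println(withoutNorm.score(4f, 9, 100, 0.5f));
        System.out.println(withoutNorm.score(4f, 9, 100, 8f));
    }
}
```

With the override in place, two documents that differ only in their stored
norm receive the same score, which is the effect of removing the boost
without re-indexing. Note that in this model the override also discards any
length normalization folded into the same stored value.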

