Re: Question About Boosting.

2007-03-12 Thread shai deljo

Buckets it is :)
Thx

On 3/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: I thought about this option but it doesn't sound scalable. What
: happens if i have 100 words with 100 different boost factors?

then you've got a problem :)

typically it's not this severe ... i'll frequently have half a dozen
fields that i divide text up into to boost on different amounts, but i'm
having a hard time understanding why you would need 100 unique boost
factors for 100 unique words ... putting things into buckets tends to be
effective.



-Hoss




Re: Question About Boosting.

2007-03-12 Thread Chris Hostetter

: I thought about this option but it doesn't sound scalable. What
: happens if i have 100 words with 100 different boost factors?

then you've got a problem :)

typically it's not this severe ... i'll frequently have half a dozen
fields that i divide text up into to boost on different amounts, but i'm
having a hard time understanding why you would need 100 unique boost
factors for 100 unique words ... putting things into buckets tends to be
effective.
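
A minimal sketch of the bucket idea, assuming three hypothetical fields (`text_high`, `text_mid`, `text_low`) and hypothetical thresholds and boosts; none of these names come from the thread. Each word's individual weight is mapped to one of a few fields at index time, and the boost is then applied per field at query time:

```python
from collections import defaultdict

# Hypothetical bucket fields and query-time boosts; tune for your own data.
BUCKETS = [
    (5.0, "text_high", 4.0),   # (minimum weight, field name, field boost)
    (2.0, "text_mid", 2.0),
    (0.0, "text_low", 1.0),
]


def bucket_words(word_weights):
    """Group words into a few fields instead of one boost per word."""
    doc = defaultdict(list)
    for word, weight in word_weights.items():
        for min_weight, field, _boost in BUCKETS:
            if weight >= min_weight:
                doc[field].append(word)
                break
    return {field: " ".join(words) for field, words in doc.items()}


def query_fields():
    """Build a dismax-style qf value listing each bucket field and its boost."""
    return " ".join(f"{field}^{boost}" for _min, field, boost in BUCKETS)


doc = bucket_words({"solr": 7.5, "lucene": 3.0, "search": 1.2, "index": 0.4})
print(doc)
print(query_fields())
```

With 100 words and 100 different weights, this still produces only three fields and three boosts, which is the scalability point being made above.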



-Hoss



Re: Question About Boosting.

2007-03-12 Thread shai deljo

I thought about this option but it doesn't sound scalable. What
happens if i have 100 words with 100 different boost factors?

On 3/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: I have elements within a field that have different importance.
: I thought boosting would be an elegant way to take this into account.
: Please advise,

typically if you know when sending the doc to solr that certain
words/phrases of field A are extremely significant for that document, the
simple approach is to also put those words/phrases in some other field "B"
and at query time search both A and B .. since B tends to have fewer words
anyway, it makes more of an impact on the results, but if you want those
words to be *really* important boost your queries on B.

The dismax handler makes querying across these multiple fields very easy.



-Hoss




Re: Question About Boosting.

2007-03-12 Thread Chris Hostetter

: I have elements within a field that have different importance.
: I thought boosting would be an elegant way to take this into account.
: Please advise,

typically if you know when sending the doc to solr that certain
words/phrases of field A are extremely significant for that document, the
simple approach is to also put those words/phrases in some other field "B"
and at query time search both A and B .. since B tends to have fewer words
anyway, it makes more of an impact on the results, but if you want those
words to be *really* important boost your queries on B.

The dismax handler makes querying across these multiple fields very easy.
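
A rough sketch of the two-field approach, assuming fields literally named `A` and `B`, a handler registered under the name `dismax`, and a boost of 3.0 chosen only for illustration. It builds the document and the request parameters but does not contact a Solr server:

```python
from urllib.parse import urlencode


def build_doc(doc_id, text, important_phrases):
    """Index the full text in A and repeat the significant phrases in B."""
    return {"id": doc_id, "A": text, "B": " ".join(important_phrases)}


def dismax_params(user_query, b_boost=3.0):
    """Search both fields, weighting matches in B more heavily."""
    return {
        "q": user_query,
        "qt": "dismax",          # assumed handler name; depends on solrconfig.xml
        "qf": f"A B^{b_boost}",  # qf lists the fields to query, with optional boosts
    }


doc = build_doc("1", "a long body of text about search engines", ["search engines"])
params = dismax_params("search engines")
print(doc)
print(urlencode(params))
```

The boost lives in the request rather than in the index, so it can be changed without reindexing.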



-Hoss



Re: Question About Boosting.

2007-03-11 Thread Mike Klaas

On 3/11/07, shai deljo <[EMAIL PROTECTED]> wrote:

Thanks,
The only way i found to do this
(http://www.mail-archive.com/solr-user@lucene.apache.org/msg02456.html)
is to hack and repeat the word several times in the field, but
doesn't this screw up the norms?


Yes, it can influence the norms.
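
To illustrate why, here is a small sketch assuming the classic Lucene similarity formulas (term-frequency factor `sqrt(freq)`, length norm `1/sqrt(number of terms)`). It ignores idf, query normalization, and the lossy encoding of stored norms, and the field sizes are made up:

```python
import math


def tf(freq):
    # Classic Lucene similarity: term-frequency factor is sqrt(freq).
    return math.sqrt(freq)


def length_norm(num_terms):
    # Classic Lucene similarity: field length norm is 1 / sqrt(number of terms).
    return 1.0 / math.sqrt(num_terms)


def field_factor(freq, num_terms):
    # Combined tf * lengthNorm contribution of one term in one field.
    return tf(freq) * length_norm(num_terms)


# A 10-term field where "key" and "other" each appear once.
before_key = field_factor(1, 10)
before_other = field_factor(1, 10)

# Repeat "key" three more times: 13 terms, "key" now has frequency 4.
after_key = field_factor(4, 13)
after_other = field_factor(1, 13)

print(f"key:   {before_key:.3f} -> {after_key:.3f}")
print(f"other: {before_other:.3f} -> {after_other:.3f}")
```

The repeated word scores higher, but every other term in the same field scores lower because the field got longer.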


Also, how do i boost words in a query? e.g. q=key1 key2 and i know
key2 is twice as important as key1 ? (searching 1 field).


q=key1 key2^2

If the keywords that have more importance are the same for every
document, query-time boosting is by far the preferable route.
You have much more flexibility and it isn't any less performant.
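
As a small sketch of generating such a query, assuming the term weights are kept in an application-side mapping (the helper name is hypothetical, and it does not escape query-syntax special characters):

```python
def boosted_query(term_boosts):
    """Turn {'key1': 1, 'key2': 2} into 'key1 key2^2'."""
    parts = []
    for term, boost in term_boosts.items():
        # Omit the ^boost suffix when the boost is the default of 1.
        parts.append(term if boost == 1 else f"{term}^{boost}")
    return " ".join(parts)


print(boosted_query({"key1": 1, "key2": 2}))  # prints: key1 key2^2
```

Changing a weight then means changing the mapping, not reindexing any documents.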

There are some things which are elegantly solved using index-time
boosting, and so it is likely that Lucene will support it one day.

-Mike


Re: Question About Boosting.

2007-03-11 Thread shai deljo

Thanks,
The only way i found to do this
(http://www.mail-archive.com/solr-user@lucene.apache.org/msg02456.html)
is to hack and repeat the word several times in the field, but
doesn't this screw up the norms?
Also, how do i boost words in a query? e.g. q=key1 key2 and i know
key2 is twice as important as key1 ? (searching 1 field).
Thanks,
S.

On 3/11/07, Walter Underwood <[EMAIL PROTECTED]> wrote:

Back up another step. What are the documents and what do you
want to show to the users? Have you tried the default configuration
with real user queries?

After you've tested it with user queries, then look at the
results where the ranking isn't performing well.

Lucene and Solr already automatically boost rare terms over
common terms, using tf.idf weighting.

I posted more detail on this in my blog last summer:

http://wunderwood.org/most_casual_observer/2006/06/good_to_great_search.html

wunder

On 3/10/07 8:04 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> I have elements within a field that have different importance.
> I thought boosting would be an elegant way to take this into account.
> Please advise,
>
>
> On 3/10/07, Walter Underwood <[EMAIL PROTECTED]> wrote:
>> What are you trying to achieve? Let's start with the problem
>> instead of picking one solution which Solr doesn't support. --wunder
>>
>> On 3/10/07 5:08 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:
>>
>>> How can i boost some tokens over others in the same field (at Index
>>> time) ? If this is not supported directly, what's the best way around
>>> this problem (what's the hack to solve this :) ).
>>> Thanks,
>>> Shai
>>
>>




Re: Question About Boosting.

2007-03-11 Thread Walter Underwood

Back up another step. What are the documents and what do you
want to show to the users? Have you tried the default configuration
with real user queries?

After you've tested it with user queries, then look at the
results where the ranking isn't performing well.

Lucene and Solr already automatically boost rare terms over
common terms, using tf.idf weighting.
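
A quick sketch of the idf half of that weighting, assuming the classic Lucene similarity formula `1 + ln(numDocs / (docFreq + 1))`; the index size and document frequencies are invented for illustration:

```python
import math


def idf(doc_freq, num_docs):
    # Classic Lucene similarity: idf = 1 + ln(num_docs / (doc_freq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


num_docs = 10_000               # hypothetical index size
common = idf(5_000, num_docs)   # term found in half the documents
rare = idf(10, num_docs)        # term found in only 10 documents

print(f"common term idf: {common:.2f}")
print(f"rare term idf:   {rare:.2f}")
```

The rare term gets a much larger weight with no explicit boost at all.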

I posted more detail on this in my blog last summer:

http://wunderwood.org/most_casual_observer/2006/06/good_to_great_search.html

wunder

On 3/10/07 8:04 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> I have elements within a field that have different importance.
> I thought boosting would be an elegant way to take this into account.
> Please advise,
> 
> 
> On 3/10/07, Walter Underwood <[EMAIL PROTECTED]> wrote:
>> What are you trying to achieve? Let's start with the problem
>> instead of picking one solution which Solr doesn't support. --wunder
>> 
>> On 3/10/07 5:08 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:
>> 
>>> How can i boost some tokens over others in the same field (at Index
>>> time) ? If this is not supported directly, what's the best way around
>>> this problem (what's the hack to solve this :) ).
>>> Thanks,
>>> Shai
>> 
>> 



Re: Question About Boosting.

2007-03-10 Thread shai deljo

I have elements within a field that have different importance.
I thought boosting would be an elegant way to take this into account.
Please advise,


On 3/10/07, Walter Underwood <[EMAIL PROTECTED]> wrote:

What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support. --wunder

On 3/10/07 5:08 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> How can i boost some tokens over others in the same field (at Index
> time) ? If this is not supported directly, what's the best way around
> this problem (what's the hack to solve this :) ).
> Thanks,
> Shai




Re: Question About Boosting.

2007-03-10 Thread Walter Underwood

What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support. --wunder

On 3/10/07 5:08 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> How can i boost some tokens over others in the same field (at Index
> time) ? If this is not supported directly, what's the best way around
> this problem (what's the hack to solve this :) ).
> Thanks,
> Shai