Re: Boosting results

Michael McCandless Fri, 07 Nov 2008 09:46:56 -0800


This is a good point.

Sorting populates the field cache (internal to Lucene) for that field, meaning it loads all values for all docs and holds them in memory. This makes the first query slow and consumes RAM in proportion to how large your index is.

Whereas boosting should be able to achieve the use case without these limitations.
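
To put a rough number on the RAM cost, here is a back-of-envelope sketch. It assumes the cache for a string field keeps about one 4-byte slot per document plus one copy of each distinct value; the class name, method name, and byte sizes are illustrative assumptions, not measurements of any Lucene release.

```java
// Back-of-envelope model only; the per-document and per-value sizes are assumptions.
final class SortCacheEstimate {

    /** Rough bytes held for sorting on one string field. */
    static long estimateBytes(long docCount, long distinctValues, long avgBytesPerValue) {
        long perDocSlots = 4L * docCount;                       // assumed: one 4-byte slot per document
        long valueStorage = distinctValues * avgBytesPerValue;  // assumed: one copy of each distinct value
        return perDocSlots + valueStorage;
    }

    public static void main(String[] args) {
        // 10 million documents, 3 category values of about 1 byte each
        System.out.println(estimateBytes(10_000_000L, 3, 1)); // prints 40000003
    }
}
```

Under these assumptions a 10-million-document index would hold about 40 MB for the one sort field, and the figure grows linearly with the document count.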


Mike

Matthew DeLoria wrote:

This actually brings up an interesting question, and something I have been
curious about.
In this case, does it make more sense to do Boosting by Category, or to do
sorting? From what I understand, Lucene sorting involves putting the
relevant fields into memory, and then executing a sort.
Is this how sorting actually works in Lucene? If so, is it even a good idea
considering the large data sets in Lucene? What would really be the
difference between sorting and boosting?

M
On Fri, Nov 7, 2008 at 7:59 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
duuuuh, sorting. I absolutely love it when I overlook the obvious <G>.
[EMAIL PROTECTED]

On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Couldn't you just do a single Query that sorts first by category and
second by relevance?
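
To illustrate the ordering this would produce, here is a minimal plain-Java sketch. The Hit class is a hypothetical stand-in for a search hit, not a Lucene type; in Lucene itself the Sort and SortField classes (with SortField.FIELD_SCORE for relevance) are the place to look.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CategoryThenRelevance {

    /** Hypothetical stand-in for a search hit; not a Lucene class. */
    static final class Hit {
        final String id;
        final String category;
        final float score;

        Hit(String id, String category, float score) {
            this.id = id;
            this.category = category;
            this.score = score;
        }
    }

    /** Returns a copy ordered by category ascending, then by score descending. */
    static List<Hit> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        Comparator<Hit> byCategory = Comparator.comparing((Hit h) -> h.category);
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        sorted.sort(byCategory.thenComparing(byScoreDesc));
        return sorted;
    }

    /** Convenience helper: the ids of the hits, in order. */
    static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.id);
        }
        return out;
    }
}
```

With hits d1 (B, 0.9), d2 (A, 0.2), d3 (A, 0.7) and d4 (B, 0.4), the order comes out as d3, d2, d1, d4: all of category A by relevance, then all of category B by relevance.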

Mike


Erick Erickson wrote:
It seems to me that the easiest thing would be to fire two queries and
then just concatenate the results

category:A AND body:fred

category:B AND body:fred
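
The concatenation step might look roughly like the sketch below. It assumes each query's hits have already been collected into a list in relevance order; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ConcatenateResults {

    /** Appends the category B hits after the category A hits, keeping each list's own order. */
    static <T> List<T> concat(List<T> categoryAHits, List<T> categoryBHits) {
        List<T> merged = new ArrayList<>(categoryAHits.size() + categoryBHits.size());
        merged.addAll(categoryAHits);
        merged.addAll(categoryBHits);
        return merged;
    }
}
```

Because the categories are mutually exclusive, no document can show up in both lists, so there is nothing to de-duplicate.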
If you really, really didn't want to fire two queries, you could create
filters on category A and category B and make a couple of
passes through your results seeing if the returned documents were in
the filter, but you'd still concatenate the results. Actually in your
specific example you could make one filter on A.....
You could also consider a custom scorer that added 1,000,000 to every
category A document.
How much were you boosting by? What happens if you boost by a very large
factor?
As in ridiculously large?
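
Here is a small sketch of why the size of the bump matters. It models the final score as the relevance plus a fixed offset for category A documents; that is a simplification for illustration, not Lucene's actual scoring formula, and the names are hypothetical.

```java
final class TierOffset {

    /** Relevance plus a fixed offset when the document is in the preferred category. */
    static double tieredScore(double relevance, boolean inPreferredCategory, double offset) {
        return inPreferredCategory ? relevance + offset : relevance;
    }

    public static void main(String[] args) {
        double weakA = 0.2;    // category A document that matches "fred" poorly
        double strongB = 0.9;  // category B document that matches "fred" well

        // Small offset: the strong B document still outranks the weak A document.
        System.out.println(tieredScore(weakA, true, 0.5) > tieredScore(strongB, false, 0.5));

        // Offset larger than any possible relevance: every A document ranks first.
        System.out.println(tieredScore(weakA, true, 1_000_000.0) > tieredScore(strongB, false, 1_000_000.0));
    }
}
```

The offset only separates the categories cleanly when it is larger than any relevance score a category B document can reach. The sketch uses double on purpose: Lucene scores are floats, and adding something as large as 1,000,000 to a float leaves very little precision for telling apart the relevance of two category A documents.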

Best
Erick

On Thu, Nov 6, 2008 at 7:42 PM, Scott Smith <[EMAIL PROTECTED]> wrote:
I'm interested in comments on the following problem.
I have a set of documents. They fall into 3 categories. Call these
categories A, B, and C. Each document has an indexed, non-tokenized
field called "category" which contains A, B, or C (they are mutually
exclusive categories).
All of the documents contain a field called "body" which contains a
bunch of text. This field is indexed and tokenized.



So, I want to do a search which looks something like:



(category:A OR category:B) AND body:fred
I want all of the category A documents to come before the category B
documents. Effectively, I want to have the category A documents first
(sorted by relevancy) and then the category B documents after (sorted
by relevancy).



I thought I could do this by boosting the category portion of the
query, but that doesn't seem to work consistently. I was setting the
boost on the category A term to 1.0 and the boost on the category B term
to 0.0.
Any thoughts how to skin this?



Scott
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Matthew P. DeLoria
[EMAIL PROTECTED]


