This actually brings up an interesting question, and something I have been
curious about.

In this case, does it make more sense to do Boosting by Category, or to do
sorting? From what I understand, Lucene sorting involves putting the
relevant fields into memory, and then executing a sort.

Is this how sorting actually works in Lucene? If so, is it even a good idea
considering the large data sets in Lucene? What would really be the
difference between sorting and boosting?

M

On Fri, Nov 7, 2008 at 7:59 AM, Erick Erickson <[EMAIL PROTECTED]>wrote:

> duuuuh, sorting. I absolutely love it when I overlook the obvious <G>.
>
> [EMAIL PROTECTED]
>
> On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless <
> [EMAIL PROTECTED]> wrote:
>
> >
> > Couldn't you just do a single Query that sorts first by category and
> second
> > by relevance?
> >
> > Mike
> >
> >
> > Erick Erickson wrote:
> >
> >  It seems to me that the easiest thing would be to fire two queries and
> >> then just concatenate the results
> >>
> >> category:A AND body:fred
> >>
> >> category:B AND body:fred
> >>
> >>
> >> If you really, really didn't want to fire two queries, you could create
> >> filters on category A and category B and make a couple of
> >> passes through your results seeing if the returned documents were in
> >> the filter, but you'd still concatenate the results. Actually in your
> >> specific example you could make one filter on A.....
> >>
> >> You could also consider a custom scorer that, added 1,000,000 to every
> >> category A document.
> >>
> >> How much were you boosting by? What happens if you boost by a very large
> >> factor?
> >> As in ridiculously large?
> >>
> >> Best
> >> Erick
> >>
> >> On Thu, Nov 6, 2008 at 7:42 PM, Scott Smith <[EMAIL PROTECTED]
> >> >wrote:
> >>
> >>  I'm interested in comments on the following problem.
> >>>
> >>>
> >>>
> >>> I have a set of documents.  They fall into 3 categories.  Call these
> >>> categories A, B, and C.  Each document has an indexed, non-tokenized
> >>> field called "category" which contains A, B, or C (they are mutually
> >>> exclusive categories).
> >>>
> >>>
> >>>
> >>> All of the documents contain a field called "body" which contains a
> >>> bunch of text.  This field is indexed and tokenized.
> >>>
> >>>
> >>>
> >>> So, I want to do a search which looks something like:
> >>>
> >>>
> >>>
> >>> (category:A OR category:B) AND body:fred
> >>>
> >>>
> >>>
> >>> I want all of the category A documents to come before the category B
> >>> documents.  Effectively, I want to have the category A documents first
> >>> (sorted by relevancy) and then the category B documents after (sorted
> by
> >>> relevancy).
> >>>
> >>>
> >>>
> >>> I thought I could do this by boosting the category portion of the
> query,
> >>> but that doesn't seem to work consistently.  I was setting the boost on
> >>> the category A term to 1.0 and the boost on the category B term to 0.0.
> >>>
> >>>
> >>>
> >>> Any thoughts how to skin this?
> >>>
> >>>
> >>>
> >>> Scott
> >>>
> >>>
> >>>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>



-- 
Matthew P. DeLoria
[EMAIL PROTECTED]

Reply via email to