This actually brings up an interesting question, and something I have been curious about.
In this case, does it make more sense to do Boosting by Category, or to do sorting? From what I understand, Lucene sorting involves putting the relevant fields into memory, and then executing a sort. Is this how sorting actually works in Lucene? If so, is it even a good idea considering the large data sets in Lucene? What would really be the difference between sorting and boosting? M On Fri, Nov 7, 2008 at 7:59 AM, Erick Erickson <[EMAIL PROTECTED]>wrote: > duuuuh, sorting. I absolutely love it when I overlook the obvious <G>. > > [EMAIL PROTECTED] > > On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless < > [EMAIL PROTECTED]> wrote: > > > > > Couldn't you just do a single Query that sorts first by category and > second > > by relevance? > > > > Mike > > > > > > Erick Erickson wrote: > > > > It seems to me that the easiest thing would be to fire two queries and > >> then just concatenate the results > >> > >> category:A AND body:fred > >> > >> category:B AND body:fred > >> > >> > >> If you really, really didn't want to fire two queries, you could create > >> filters on category A and category B and make a couple of > >> passes through your results seeing if the returned documents were in > >> the filter, but you'd still concatenate the results. Actually in your > >> specific example you could make one filter on A..... > >> > >> You could also consider a custom scorer that, added 1,000,000 to every > >> category A document. > >> > >> How much were you boosting by? What happens if you boost by a very large > >> factor? > >> As in ridiculously large? > >> > >> Best > >> Erick > >> > >> On Thu, Nov 6, 2008 at 7:42 PM, Scott Smith <[EMAIL PROTECTED] > >> >wrote: > >> > >> I'm interested in comments on the following problem. > >>> > >>> > >>> > >>> I have a set of documents. They fall into 3 categories. Call these > >>> categories A, B, and C. Each document has an indexed, non-tokenized > >>> field called "category" which contains A, B, or C (they are mutually > >>> exclusive categories). > >>> > >>> > >>> > >>> All of the documents contain a field called "body" which contains a > >>> bunch of text. This field is indexed and tokenized. > >>> > >>> > >>> > >>> So, I want to do a search which looks something like: > >>> > >>> > >>> > >>> (category:A OR category:B) AND body:fred > >>> > >>> > >>> > >>> I want all of the category A documents to come before the category B > >>> documents. Effectively, I want to have the category A documents first > >>> (sorted by relevancy) and then the category B documents after (sorted > by > >>> relevancy). > >>> > >>> > >>> > >>> I thought I could do this by boosting the category portion of the > query, > >>> but that doesn't seem to work consistently. I was setting the boost on > >>> the category A term to 1.0 and the boost on the category B term to 0.0. > >>> > >>> > >>> > >>> Any thoughts how to skin this? > >>> > >>> > >>> > >>> Scott > >>> > >>> > >>> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > -- Matthew P. DeLoria [EMAIL PROTECTED]