> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED] 
> Sent: Monday, December 01, 2003 1:11 PM
> To: Lucene Users List
> Subject: Re: Dates and others
> 
> Dion Almaer wrote:
> > The only real item that I still want to tweak more is getting
> > recent results higher in the list.
> > 
> > I was wondering if something like this could work (or if there is a 
> > better solution)
> > 
> > At index time, I have the date of the content.  I could do some math
> > where the higher the date (based on the time_t version or whatever)
> > the more of a setBoost(metric). Or, for every month in the past,
> > create a larger negative number to setBoost()... or something like
> > that.
> > 
> > Would something like this make sense?
> 
> The problem with this approach is that eventually you'll 
> exhaust the range of the boost.  So this will only work if 
> you re-index things from scratch periodically, with a boost 
> of something like 1/days-ago.
> 
> If you're adding documents to the index in date order, then 
> you could use a HitCollector which adjusts scores according 
> to the document number, since document numbers increase as 
> you add to the index.
> 
> If you're not adding things in date order, then you can, when 
> you open the index, build an array mapping document numbers 
> to integer dates.  Then your hit collector can use this to 
> either boost or sort hits by date.
> 
> Or you could add a "month" or "week" field to documents, then 
> add it as a clause to your queries with a boost.  Then 
> documents matching the most recent week(s) and/or month(s) 
> would get the boost.
> 
> Doug

Interesting.  I implemented an approach which boosted based on the
number of months in the past, and after tweaking the boost amounts, it
seems to do the job. I do a fresh reindex every night (since the
indexing process takes no time at all... unlike our old search
solution!)
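
A minimal sketch of the months-in-the-past boost, for illustration only.
The class name, method name, and the 1 / (1 + months) decay are
assumptions and not the amounts I actually settled on; the returned value
is the kind of number that would be handed to setBoost() at index time.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: turns a content date into a boost factor.
class RecencyBoost {

    // Returns 1.0 for content dated in the current month and decays as
    // 1 / (1 + monthsAgo). The decay curve is an assumed example.
    static float boostFor(LocalDate contentDate, LocalDate indexDate) {
        long monthsAgo = ChronoUnit.MONTHS.between(contentDate, indexDate);
        // Content dated in the future is treated as brand new.
        monthsAgo = Math.max(0, monthsAgo);
        return 1.0f / (1.0f + monthsAgo);
    }

    public static void main(String[] args) {
        LocalDate indexDate = LocalDate.of(2003, 12, 1);
        System.out.println(boostFor(LocalDate.of(2003, 12, 1), indexDate));
        System.out.println(boostFor(LocalDate.of(2003, 9, 1), indexDate));
    }
}
```

Because the boost depends on the date the index is built, this only
stays meaningful with a periodic full reindex, as Doug points out.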

I read content for the index from different sources. Sometimes the
source gives me documents loosely in date order, but not all of them.
So, it seems that one of the other approaches should be taken (adding a
month/week field etc.).  I should look more into the HitCollector and
see how it can help me.
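
A rough sketch of the array idea Doug describes, written as plain Java
with no Lucene dependency so it can run on its own. The class, the
days-since-epoch encoding of the integer dates, and the weighting formula
are all assumptions; the collect(doc, score) method only mirrors the
general shape of a hit collector callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a date-aware hit collector.
class DateBoostingCollector {

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int[] docDates; // index = document number, value = days since epoch (assumed)
    private final int today;      // same unit as docDates
    private final float weight;   // how strongly recency influences the score
    private final List<Hit> hits = new ArrayList<>();

    DateBoostingCollector(int[] docDates, int today, float weight) {
        this.docDates = docDates;
        this.today = today;
        this.weight = weight;
    }

    // Called once per matching document with its raw score.
    void collect(int doc, float score) {
        int ageDays = Math.max(0, today - docDates[doc]);
        // Newer documents get a larger multiplier; old ones approach 1.0.
        float adjusted = score * (1.0f + weight / (1.0f + ageDays));
        hits.add(new Hit(doc, adjusted));
    }

    // Hits ordered by adjusted score, best first.
    List<Hit> sortedHits() {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort((a, b) -> Float.compare(b.score, a.score));
        return copy;
    }
}
```

The array would be built once when the index is opened, so the
per-hit cost is a single lookup.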

The other issue I have is that I would like to prioritize the title
field.  At the moment I am lazy and add the title to the body
(contents = title + body) which seems to be OK... however sometimes
something that mentions the search term in the title should appear
higher up in the pecking order.

I am using the QueryParser (subclassed to disallow wildcards etc.) to do
the dirty work for me.  Should I get away from this and manage the
queries myself (and run a Multi against the title field as well as the
contents)?
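
One way this could look while still going through the QueryParser is to
rewrite the user's input into a query string that hits both fields, with
a boost on the title clause. This is only a sketch: it assumes the title
is indexed as its own "title" field next to "contents", that the input is
plain whitespace-separated words with no query syntax in it, and the
helper name and boost value are made up.

```java
// Hypothetical helper that expands plain search words into a
// two-field query string using the field:term^boost syntax.
class TitleBoostQuery {

    static String build(String userInput, float titleBoost) {
        StringBuilder sb = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields an empty query string
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Each word may match either field; a title match scores higher.
            sb.append("(title:").append(term).append('^').append(titleBoost)
              .append(" contents:").append(term).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("lucene boost", 4.0f));
    }
}
```

The resulting string would then be passed to the parser in place of the
raw input, so the wildcard restrictions in the subclass still apply.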

Thanks for the great feedback,

Dion

