On Monday 01 December 2003 15:13, Dion Almaer wrote:
...
> Interesting.  I implemented an approach which boosted based on the number
> of months in the past, and after tweaking the boost amounts, it seems to do
> the job. I do a fresh reindex every night (since the indexing process takes
> no time at all... unlike our old search solution!)

This sounds interesting, as I have been thinking of what's the best way
to boost newer documents. Can you share some of your experience regarding 
boost values that seemed to make sense? In my case, CMS I'm working on stores 
support documentation for software/hardware, meaning that content is highly 
time-sensitive (ie. documents "decay" pretty quickly).

Since the system is already doing both incremental reindexing, and nightly 
full reindexing (latter to make sure that even if temporarily some changed 
content was not [fully] reindexed, it eventually gets indexed properly), I 
can fairly easily add boosting I think.

On a related note, it would also be nice if there was a way to start 
categorizing general "hot topics" for Lucene developers; it seems like there 
are about half a dozen areas where there's lots of interest for improvements 
(most of them related to ranking). If so, perhaps there could be more 
specific discussion groups, and also perhaps web pages summarizing some of 
discussions, consensus achieved, even if there's no code to show for it?

-+ Tatu +-

>
> I read content for the index from different sources. Sometimes the source
> gives me documents loosely in date order, but not all of them. So, it seems
> that one of the other approaches should be taken (adding a month/week field
> etc).  I should look more into the HitCollector and see how it can help me.
>
> The other issue I have is that I would like to prioritize the title field. 
> At the moment I am lazy and add the title to the body (contents = title +
> body) which seems to be OK... however sometimes something that mentions the
> search term in the title should appear higher up in the pecking order.
>
> I am using the QueryParser (subclassed to disallow wildcards etc) to do the
> dirty work for me. Should I get away from this and manage the queries
> myself (and run a Multi against the title field as well as the contents?
>
> Thanks for the great feedback,
>
> Dion
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to