Other indexing strategies:

- AFAIK, you could probably cheat by repeating the header tokens in the
  indexed text, which raises their term frequency and thus affects the
  scoring.
  For example:

    <h1>hello world</h1> <p> foo bar </p>
    content -> hello world hello world foo bar

  This is not very tweakable though.

- As Tate suggests, you can also use multiple fields and apply your
  search to all of them:

    <h1>hello world</h1> <p> foo bar </p>
    content -> hello world foo bar
    headers -> hello world

  or even

    <h1>hello world</h1> <h2> foo bar </h2>
    content -> hello world foo bar
    header1 -> hello world
    header2 -> foo bar

  The result is that you get fine-grained control over the different
  fields.

At this point, you can boost at indexing time or at search time. I
personally opt for search time because it is more open to tweaking, as
opposed to reindexing everything whenever you want to change a boost
factor.

As for the complexities that Tate mentions for query parsing, he's right
that it's a pain when using the built-in query parser, but you can
always use the API directly to build whatever queries you need.

HTH,
sv

On Fri, 13 Aug 2004, Tate Avery wrote:

> Well, as far as I know you can boost 3 different things:
>
> - Field
> - Document
> - Query
>
> So, I think you need to craft a solution using one of those.
>
> Here are some possibilities for each:
>
> 1) Field
>    - make a keyword field which is alongside your content field
>    - boost your keyword field during indexing
>    - expand user queries to search 'content' and 'keywords'
>
> 2) Document
>    - I don't really think this one helps you in any way
>
> 3) Query
>    - Scan a user query and selectively boost words that are known
>      keywords
>    - This requires a keyword list and is not really scalable
>
> That is all that comes to mind, at first glance. So, IMO, the winner
> IS #1.
>
> For example:
>
>     Field _headline = Field.Text("headline", "...");
>     _headline.setBoost(3);
>
>     Field _content = Field.Text("content", "...");
>
>     _document.add(_headline);
>     _document.add(_content);
>
> But, the tricky part is modifying queries to use both fields. If a user
> enters "virus", it is easy (i.e. "content:(virus) OR headline:(virus)").
> But, it quickly gets more complex with more complex queries (especially
> boolean queries with AND and such ... you probably would need something
> roughly like this: "a AND b" = "content:(a AND b) OR headline:(a AND b)
> OR (content:a AND headline:b) OR (headline:a AND content:b)" and so on).
>
> That's my 2 cents.
>
> T
>
> -----Original Message-----
> From: news [mailto:[EMAIL PROTECTED]] On Behalf Of Leos Literak
> Sent: Friday, August 13, 2004 8:52 AM
> To: [EMAIL PROTECTED]
> Subject: Re: boost keywords
>
> Gerard Sychay wrote:
> > Well, there is always the Lucene wiki. There's not a patterns page per
> > se, but you could start one.
>
> Of course I could. If I had something to add :-)
>
> But back to my issue. No reaction? So many people using
> Lucene and no one knows? I would be grateful for any
> advice. Thanks
>
> Leos
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]