Other indexing strategies:

- AFAIK, you could cheat by repeating the header tokens in the indexed
text, which raises their term frequency and so affects the scoring.

For example:
<h1>hello world</h1> <p> foo bar </p>
content -> hello world hello world foo bar

This is not very tweakable though.
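
A minimal sketch of that repetition trick, assuming the header and body
text have already been extracted from the HTML. HeaderRepeater and
buildContent are hypothetical names, not part of Lucene; the result is
the string you would index as the single content field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
final class HeaderRepeater {
    private HeaderRepeater() {}

    /**
     * Returns the header text repeated `weight` times, followed by the body,
     * all joined with single spaces.
     */
    static String buildContent(String header, String body, int weight) {
        if (weight < 1) {
            throw new IllegalArgumentException("weight must be >= 1");
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < weight; i++) {
            parts.add(header.trim());
        }
        parts.add(body.trim());
        return String.join(" ", parts);
    }
}
```

With a weight of 2 this reproduces the example above. The weight is
fixed when the document is indexed, so changing it means reindexing.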

- As Tate suggests, you can also use multiple fields and apply your search
on all of them:

<h1>hello world</h1> <p> foo bar </p>
content-> hello world foo bar
headers-> hello world

or even
<h1>hello world</h1> <h2> foo bar </h2>
content-> hello world foo bar
header1-> hello world
header2-> foo bar

The result of this is that you get fine-grained control over the
different fields. At this point, you can boost at indexing time or at
search time. I personally opt for search time because it is more open
to tweaking, as opposed to reindexing everything whenever you want to
change a boost factor.
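
As a rough illustration of search-time boosting, the sketch below
renders one user query against several fields, attaching a per-field
boost with the query syntax's ^ operator. BoostedQueryText and
acrossFields are hypothetical names, the field names and boost values
are only examples, and the user query is assumed to be already escaped.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not a Lucene class.
final class BoostedQueryText {
    private BoostedQueryText() {}

    /**
     * Applies the same user query to every field and ORs the clauses together.
     * A boost other than 1.0 is appended as "^boost"; fields are emitted in
     * the map's iteration order.
     */
    static String acrossFields(String userQuery, Map<String, Float> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, Float> entry : fieldBoosts.entrySet()) {
            String clause = entry.getKey() + ":(" + userQuery + ")";
            float boost = entry.getValue();
            if (boost != 1.0f) {
                clause += "^" + boost;
            }
            clauses.add(clause);
        }
        return clauses.toString();
    }
}
```

Changing a boost here only changes the generated query, so the index
itself stays untouched.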

As for the complexities that Tate mentions for query parsing, he's right
that it's a pain when using the built-in query parser, but you can always
use the API directly to build whatever queries you need.
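
One way to keep the multi-field expansion manageable is to expand each
term on its own and then AND the per-term groups. For plain matching
this covers the same combinations as the enumeration Tate gives for
"a AND b" below, although scores may differ. The sketch renders the
result as a string so that it stays self-contained; the same shape
could be assembled from BooleanQuery and TermQuery objects instead.
PerTermExpansion and allTerms are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
final class PerTermExpansion {
    private PerTermExpansion() {}

    /**
     * Requires every term to match in at least one of the given fields:
     * each term becomes an OR group over the fields, and the groups are ANDed.
     */
    static String allTerms(List<String> terms, List<String> fields) {
        List<String> groups = new ArrayList<>();
        for (String term : terms) {
            List<String> alternatives = new ArrayList<>();
            for (String field : fields) {
                alternatives.add(field + ":" + term);
            }
            groups.add("(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" AND ", groups);
    }
}
```

The query grows linearly with the number of terms and fields, rather
than with the number of term-to-field combinations.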

HTH,
sv

On Fri, 13 Aug 2004, Tate Avery wrote:

>
> Well, as far as I know you can boost 3 different things:
>
> - Field
> - Document
> - Query
>
> So, I think you need to craft a solution using one of those.
>
> Here are some possibilities for each:
>
> 1) Field
>       - make a keyword field which is alongside your content field
>       - boost your keyword field during indexing
>       - expand user queries to search 'content' and 'keywords'
>
> 2) Document
>       - I don't really think this one helps you in any way
>
> 3) Query
>       - Scan a user query and selectively boost words that are known keywords
>       - This requires a keyword list and is not really scalable
>
> That is all that comes to mind, at first glance.  So, IMO, the winner IS #1.
>
> For example:
>
>       Document _document = new Document();
>
>       // Boost matches in the headline field at index time.
>       Field _headline = Field.Text("headline", "...");
>       _headline.setBoost(3.0f);
>
>       Field _content = Field.Text("content", "...");
>
>       _document.add(_headline);
>       _document.add(_content);
>
>
> But, the tricky part is modifying queries to use both fields.  If a user
> enters "virus", it is easy (i.e. "content:(virus) OR headline:(virus)").
> But it quickly gets more complex with more complex queries, especially
> boolean queries with AND and such.  You would probably need something
> roughly like this:  "a AND b" = "content:(a AND b) OR headline:(a AND b)
> OR (content:a AND headline:b) OR (headline:a AND content:b)", and so on.
>
> That's my 2 cents.
>
> T
>
>
>
> -----Original Message-----
> From: news [mailto:[EMAIL PROTECTED] Behalf Of Leos Literak
> Sent: Friday, August 13, 2004 8:52 AM
> To: [EMAIL PROTECTED]
> Subject: Re: boost keywords
>
>
> Gerard Sychay wrote:
> > Well, there is always the Lucene wiki. There's not a patterns page per
> > se, but you could start one..
>
> of course I could. If I had something to add :-)
>
> but back to my issue. no reaction? So many people using
> Lucene and no one knows? I would be grateful for any
> advice. Thanks
>
> Leos
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


