Hello Stephen,

I'm trying to understand what you are suggesting.
Yes, I have a fixed set of keywords for each "good" site,
although they occasionally get edited.

Not sure what you mean by "index these keywords into
a field for each site". Did you mean index these keywords
for all pages (that contain those keywords) of each site?
So that I can boost the field weight in a dismax query in
order to boost these sites? I think this is one way of
doing it, and probably the right way. The downside is that
when I need to edit the keyword list, I'll need to
re-index all pages that contain those keywords. But there's
probably no easy way around it.
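
For illustration, here is a minimal sketch of the field-weight idea in Python. It only builds the request parameters for a dismax query; it does not talk to a server. The field names `content` and `site_keywords` and the boost of 5.0 are assumptions made up for the example, not taken from a real schema. `q` and `qf` are standard dismax parameters; selecting the parser with `defType=dismax` is assumed here, and some setups pick a dismax request handler with `qt` instead.

```python
from urllib.parse import urlencode


def build_dismax_params(user_query, keyword_field="site_keywords", keyword_boost=5.0):
    """Build dismax request parameters that weight a keyword field above page text.

    `content` and `site_keywords` are hypothetical field names; substitute
    whatever the schema actually defines.
    """
    return {
        "defType": "dismax",  # assumption: parser chosen via defType
        "q": user_query,
        # qf lists the fields to search, each with an optional ^boost.
        "qf": f"content^1.0 {keyword_field}^{keyword_boost}",
    }


params = build_dismax_params("python tutorial")
query_string = urlencode(params)  # ready to append to a /select URL
print(query_string)
```

With this layout, a page whose keyword field matches a query term scores higher than one that matches only in `content`. Changing the boost is a query-time change, but changing the keyword list itself still means re-indexing the affected pages.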

As far as I know, Nutch doesn't do site-specific boosting.

Thanks,
Jack


Wednesday, June 18, 2008, 10:52:33 PM, you wrote:

> Is there a fixed set of keywords?  If so, I suppose you could simply
> index these keywords into a field for each site (either through some
> kind of automatic parser or manually - from personal experience I  
> would recommend manually unless you have tens of thousands of these
> things), and then search that field with each word in the query (with
> or).  Any site that had one of these keywords would match it if it  
> were used in the query...

> If there is no list here and you're just indexing all the content of
> all these sites... isn't that what Nutch is designed for?

> --
> Steve

> On Jun 18, 2008, at 11:05 PM, JLIST wrote:

>> Hi all,
>>
>> This is what I'm trying to do: since some sources (say,
>> some web sites) are more authoritative than other sources
>> on certain subjects, I'd like to promote those sites when
>> the query contains certain keywords. I'm not sure what
>> is the best way to implement this. I suppose I can index
>> the keywords in a field for all pages from that site but
>> this isn't very efficient, and any changes in the keyword
>> list would require re-indexing all pages of that site.
>> I wonder if there is a more efficient way that can dynamically
>> promote sites from a domain that is considered more related
>> to the queries. Any suggestion is welcome.

