Hello Stephen, I'm trying to understand what you are suggesting. Yes, I have a fixed set of keywords for each "good" site, although they occasionally get edited.
Not sure what you mean by "index these keywords into a field for each site". Did you mean index these keywords for all pages (that contain those keywords) of each site, so that I can boost that field's weight in a dismax query in order to boost these sites? I think this is a way of doing this. Probably the right way.

The downside is that when I need to edit the keyword list, I'll need to re-index all pages that contain those keywords. But there's probably no easy way around it. As far as I know, Nutch doesn't do site-specific boosting.

Thanks,
Jack

Wednesday, June 18, 2008, 10:52:33 PM, you wrote:

> Is there a fixed set of keywords? If so, I suppose you could simply
> index these keywords into a field for each site (either through some
> kind of automatic parser or manually - from personal experience I
> would recommend manually unless you have tens of thousands of these
> things), and then search that field with each word in the query (with
> or). Any site that had one of these keywords would match it if it
> were used in the query...
> If there is no list here and you're just indexing all the content of
> all these sites... isn't that what Nutch is designed for?
> --
> Steve
> On Jun 18, 2008, at 11:05 PM, JLIST wrote:
>> Hi all,
>>
>> This is what I'm trying to do: since some sources (say,
>> some web sites) are more authoritative than other sources
>> on certain subjects, I'd like to promote those sites when
>> the query contains certain keywords. I'm not sure what
>> is the best way to implement this. I suppose I can index
>> the keywords in a field for all pages from that site but
>> this isn't very efficient, and any changes in the keyword
>> list would require re-indexing all pages of that site.
>> I wonder if there is a more efficient way that can dynamically
>> promote sites from a domain that is considered more related
>> to the queries. Any suggestion is welcome.
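
To make the field-boost idea from this thread concrete, here is a minimal sketch in Python of the request parameters such a dismax query might use. It is only an illustration under assumptions: the field names `content` and `site_keywords`, the boost value, and the helper `build_dismax_params` are hypothetical, and `defType=dismax` is how later Solr versions select the dismax parser (older setups may select it through a request handler instead).

```python
from urllib.parse import urlencode


def build_dismax_params(user_query, keyword_boost=5.0):
    """Build dismax parameters that weight a per-site keyword field more heavily.

    Assumed (hypothetical) schema: "content" holds the page text and
    "site_keywords" holds the site's keyword list, indexed on every page
    of that site.
    """
    # qf lists the fields to search, each with an optional ^boost.
    qf = f"content^1.0 site_keywords^{keyword_boost}"
    return {
        "defType": "dismax",  # assumption: parser selected via defType
        "q": user_query,
        "qf": qf,
    }


params = build_dismax_params("python tutorial")
query_string = urlencode(params)  # ready to append to a /select URL
```

Pages whose `site_keywords` field matches a query term would then score higher than pages that match only in `content`. The re-indexing cost mentioned above still applies, since the keywords live in the index.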