As far as I can see, Nutch does not index any HTML meta tags such as description or keywords. Does anybody know the reason for this?

I'm not sure why Nutch doesn't do it, but a lot of search engines
stopped using those for scoring because they were abused by
spam sites that would stuff them with keywords.

If you really want them indexed, it's not too difficult. Just copy the
index-basic plugin and add some code to index them:

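   // "metadata" is assumed to be the page's parse metadata and "doc"
   // the Lucene Document that the copied plugin is building.
   String description = metadata.getProperty("description");
   String keywords = metadata.getProperty("keywords");

   // Either fold the tags into the existing "content" field
   // (skipping any tag the page does not have):
   if (description != null) doc.add(Field.Text("content", description));
   if (keywords != null) doc.add(Field.Text("content", keywords));

   // Or add your own fields instead, but then you'll have to
   // change your query filters to pick them up:
   if (description != null) doc.add(Field.Text("description", description));
   if (keywords != null) doc.add(Field.Text("keywords", keywords));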
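
If it helps to see the two options side by side without a Nutch build,
here is a rough standalone sketch. It is not Nutch or Lucene code: the
class MetaTagFields and its collect() method are made up for
illustration, a plain map stands in for the Lucene Document, and the
metadata is assumed to behave like java.util.Properties (getProperty
returns null for a tag the page does not have).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Nutch: it only decides which field
// name each meta tag would be indexed under.
public class MetaTagFields {

    // Returns field name -> values. A list per name is used because a
    // Lucene Document may hold several values under one field name.
    static Map<String, List<String>> collect(Properties metadata, boolean ownFields) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String tag : new String[] {"description", "keywords"}) {
            String value = metadata.getProperty(tag);
            if (value == null || value.trim().isEmpty()) {
                continue; // page has no such meta tag: add nothing
            }
            // Option 1: fold into "content". Option 2: a field named after the tag.
            String fieldName = ownFields ? tag : "content";
            fields.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(value.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        Properties metadata = new Properties();
        metadata.setProperty("description", "Open source web search");
        // No "keywords" entry, to show the missing-tag case.
        System.out.println(collect(metadata, false));
        System.out.println(collect(metadata, true));
    }
}
```

With the first option the text is searchable through the default
"content" field with no further changes; with the second, nothing
matches until a query filter for the new field names is in place.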




_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
