Howie Wang wrote:
as far as I can see, nutch does not index any html meta-tags like
description or keywords. Does anybody know the reason for this?
I'm not sure why Nutch doesn't do it, but a lot of search engines
stopped using those for scoring because they were abused by
spam sites that would stuff them with keywords.
Same reason - keywords and description meta-tags are rarely useful these
days. But you may hope they are useful if you crawl .gov, .mil, and
sometimes .edu domains.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general