According to Howard Kim:
> I've noticed that my installation of HtDig was not producing "good"
> results. (I know 'good' is relative) As I was trying to find the
> problem, I ran htdig with various levels of -v and found that htdig was
> returning that "More than one <title> tag in document!" was found. I
> understand this is to prevent search engine spamming, but I have two
> questions:
>
> 1. Is this affecting the indexing of these documents?
> 2. If so, can I turn it off?
>
> I know HTML pages should not have multiple <head> and <title> areas,
> but I don't have any control over that. I just need to be able to
> search it.
The documents still get indexed even if they contain multiple <title> tags, but only the first title in a document is treated as title text and weighted using title_factor. The second and any subsequent titles are indexed as ordinary body text.

If you need to change this behaviour so that all titles are indexed as title text, you'll need to edit htdig/HTML.cc and remove the "break;" statement that follows the warning about search engine spamming (or remove the whole "if" clause to skip the test and the warning as well).

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
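The title handling described in the reply can be modelled roughly as below. This is a simplified sketch, not htdig's actual HTML.cc code: the token format, the index_tokens function, the index_all_titles flag and the weight values are all hypothetical. Only the title_factor name, the warning text and the idea of skipping repeated titles come from the reply. index_all_titles=False models the stock behaviour; True models the effect of deleting the "break;".

```python
import sys
from collections import defaultdict

# Illustrative weights only (hypothetical): a stand-in for htdig's
# title_factor and a plain-text weight. Not htdig's real defaults.
TITLE_FACTOR = 100.0
TEXT_FACTOR = 1.0


def index_tokens(tokens, index_all_titles=False,
                 title_factor=TITLE_FACTOR, text_factor=TEXT_FACTOR):
    """Score words in a pre-tokenized document (simplified model).

    tokens: list of ("tag", name) or ("word", text) tuples, a hypothetical
    format rather than htdig's internal representation.
    """
    scores = defaultdict(float)
    in_title = False
    done_title = False
    for kind, value in tokens:
        if kind == "tag":
            if value == "title":
                if done_title and not index_all_titles:
                    # Stock behaviour: warn and ignore the repeated <title>,
                    # so the words inside it are scored as ordinary text.
                    print("More than one <title> tag in document!",
                          file=sys.stderr)
                    continue
                in_title = True
                done_title = True
            elif value == "/title":
                in_title = False
            continue
        # Words inside an honoured <title> get the title weight.
        scores[value] += title_factor if in_title else text_factor
    return dict(scores)


doc = [
    ("tag", "title"), ("word", "first"), ("tag", "/title"),
    ("word", "body"),
    ("tag", "title"), ("word", "second"), ("tag", "/title"),
]

print(index_tokens(doc))                         # {'first': 100.0, 'body': 1.0, 'second': 1.0}
print(index_tokens(doc, index_all_titles=True))  # {'first': 100.0, 'body': 1.0, 'second': 100.0}
```

In the stock run only "first" gets the title weight. With the check disabled, "second" is weighted as title text too, which is the trade-off the spam warning is guarding against.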

