Re: [htdig] Index HTML pages with two &lt;head&gt; or &lt;title&gt; tags</span></a></span> </h1> <p class="darkgray font13"> <span class="sender pipe"><a href="/search?l=htdig-general@lists.sourceforge.net&q=from:%22Gilles+Detillieux%22" rel="nofollow"><span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name">Gilles Detillieux</span></span></a></span> <span class="date"><a href="/search?l=htdig-general@lists.sourceforge.net&q=date:20021119" rel="nofollow">Tue, 19 Nov 2002 13:26:48 -0800</a></span> </p> </div> <div itemprop="articleBody" class="msgBody"> <!--X-Body-of-Message-->
<pre>According to Howard Kim:
&gt; I've noticed that my installation of HtDig was not producing "good"
&gt; results. (I know 'good' is relative) As I was trying to find the
&gt; problem, I ran htdig with various levels of -v and found that htdig was
&gt; returning that "More than one &lt;title&gt; tag in document!" was found. I
&gt; understand this is to prevent search engine spamming, but I have two
&gt; questions:
&gt;
&gt; 1. Is this affecting the indexing of these documents?
&gt; 2. If so, can I turn it off?
&gt;
&gt; I know HTML pages should not have multiple &lt;head&gt; and &lt;title&gt; areas,
&gt; but I don't have any control over that. I just need to be able to
&gt; search it.</pre><pre>
The documents still get indexed, even if they contain multiple &lt;title&gt;
areas, but only the first one in the document will be treated as title
text and indexed using title_factor.  The second and subsequent titles
are treated as regular text.  If you need to change this behaviour so
that all titles are indexed as title text, you'll need to edit
htdig/HTML.cc and remove the "break;" statement after the warning about
search engine spamming (or remove the whole "if" clause to avoid the
test and warning too).

-- 
Gilles R. Detillieux              E-mail: &lt;[EMAIL PROTECTED]&gt;
Spinal Cord Research Centre       WWW:    <a href="http://www.scrc.umanitoba.ca/">http://www.scrc.umanitoba.ca/</a>
Dept. Physiology, U. 
of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list &lt;[EMAIL PROTECTED]&gt;
To unsubscribe, send a message to &lt;[EMAIL PROTECTED]&gt; with a subject of unsubscribe
FAQ: <a href="http://htdig.sourceforge.net/FAQ.html">http://htdig.sourceforge.net/FAQ.html</a>
</pre> </div> </div> </div> <div class="footer" role="contentinfo"> <ul> <li class="darkgray">200211192121.PAA13850@cliff.scrc.umanitoba.ca</li> </ul> </div> 
</body> </html>