Actually, to add to the problem I'm running into: it doesn't appear to be the text WITHIN the anchor tag. Rather, there is text between the opening <A> and the closing </A> of the anchor that contains the word "large".

The question still is, though, how do I completely wipe out any reference to text between these tags as relevant to the linked document?

Thanks,
-Jes

--- Jessica Biola <[EMAIL PROTECTED]> wrote:
> I have two files on my test site being indexed.
>
> fruit.html
> pineapple.html
>
> There's a word, "large", on fruit.html. "large" does NOT appear
> anywhere within pineapple.html; however, when I htsearch on the
> index, both documents show as a match. In fact, the base_score is
> very high for the query "large" on the document pineapple.html.
>
> If I index pineapple.html alone, the query "large" yields no
> results. So there is definitely some type of relationship between
> the two documents and the word "large". htdump revealed the
> relationship by outputting this:
>
> 3 u:http://jtest/pineapple.htm t:pineapples a:0
> m:1033563111 s:5824 H: Pineapple trees h:
> l:1033563111 L:12 b:5 c:1 g:0 e:
>
> n: S: d:large A:Stump^ATrunk
>
> The key part is "d:large". According to the htdump doc page
> online, the "d" element is defined as: "The text of links pointing
> to this document. (e.g. <a href="docURL">description</a>)"
>
> fruit.html, the other HTML file indexed, contains a hyperlink to
> pineapple.html that looks like this:
>
> <a href="pineapple.html"
> onMouseOver="MM_showHideLayers('Pine_apple','','show','large','','hide','apple','','hide','tomato','')"><img
> border="0" src="images/pineapple.jpg" width="80"
> height="60"></a>
>
> What configuration attribute must I set to zero to keep that extra
> anchor text from being indexed and factored into pineapple.html's
> word list? I've basically tried setting all "documented" factors
> to zero (including backlink).
>
> I used the latest htdig-3.2.0b4-092902 as well as a January 2002
> release -- both behave the same way.
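
One way to narrow down where "large" comes from is to check, for each link, whether the word sits in the text between <a> and </a> or inside one of the tag's attribute values. The following is only a rough sketch using Python's standard html.parser module. It does not reproduce how htdig parses links, and the names AnchorAudit and where_is_word are made up for this illustration. The snippet it checks is the link quoted above; the real fruit.html may contain other links.

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Collect, per <a> element, its attributes and the text between <a> and </a>."""

    def __init__(self):
        super().__init__()
        self.links = []       # one dict per completed anchor
        self._current = None  # the anchor currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the collected link text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None


def where_is_word(html, word):
    """For each anchor, return (href, word_in_link_text, attributes_containing_word)."""
    parser = AnchorAudit()
    parser.feed(html)
    parser.close()
    report = []
    for link in parser.links:
        in_text = word in link["text"].lower()
        in_attrs = sorted(
            name for name, value in link["attrs"].items()
            if value and word in value.lower()
        )
        report.append((link["attrs"].get("href"), in_text, in_attrs))
    return report


# The link from fruit.html as quoted in the message above.
SNIPPET = """<a href="pineapple.html" onMouseOver="MM_showHideLayers('Pine_apple','','show','large','','hide','apple','','hide','tomato','')"><img border="0" src="images/pineapple.jpg" width="80" height="60"></a>"""

for href, in_text, in_attrs in where_is_word(SNIPPET, "large"):
    print(href, "| in link text:", in_text, "| in attributes:", in_attrs)
```

For the quoted link, this reports that "large" is not in the link text (the anchor contains only an <img>) and appears only in the onmouseover attribute value; html.parser lowercases attribute names. Running the same check over the whole of fruit.html would show whether some other link to pineapple.html carries the word between its tags.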
_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
