Actually, to add to the problem I'm running into: it
doesn't appear to be the text WITHIN the anchor tag.
Rather, there is text between the opening <A> and the
closing </A> of the anchor tag that contains the word
"large".

The question remains, though: how do I completely
prevent any text between these tags from being treated
as relevant to the linked document?

Thanks,
-Jes

--- Jessica Biola <[EMAIL PROTECTED]> wrote:
> I have two files on my test site being indexed.
> 
> fruit.html
> pineapple.html
> 
> There's a word, "large", on fruit.html. "large" does
> NOT appear anywhere within pineapple.html; however,
> when I run htsearch on the index, both documents show
> up as a match. In fact, the base_score for the query
> "large" is very high on pineapple.html.
> 
> If I index pineapple.html alone, the query "large"
> yields no results. So there is definitely some kind
> of relationship between the two documents and the
> word "large". htdump revealed the relationship by
> outputting this:
> 
> 3  u:http://jtest/pineapple.htm  t:pineapples  a:0
> m:1033563111  s:5824  H: Pineapple trees  h:
> l:1033563111  L:12  b:5  c:1  g:0  e:
> n:  S:  d:large  A:Stump^ATrunk
> 
> The key part is "d:large". According to the htdump
> doc page online, the "d" element is defined as:
> "The text of links pointing to this document. (e.g.
> <a href="docURL">description</a>)"
> 
> fruit.html, the other HTML file indexed, contains a
> hyperlink to pineapple.html that looks like this:
> 
> <a href="pineapple.html"
> onMouseOver="MM_showHideLayers('Pine_apple','','show','large','','hide','apple','','hide','tomato','')"><img
> border="0" src="images/pineapple.jpg" width="80"
> height="60"></a>
> 
> What configuration attribute must I set to zero to
> keep that extra anchor text from being indexed and
> factored into pineapple.html's word list? I've tried
> setting essentially all "documented" factors to zero
> (including backlink).
> 
> I tried the latest htdig-3.2.0b4-092902 as well as a
> January 2002 release; both behave the same way.
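
For illustration only, here is a minimal Python sketch of what "the text of links pointing to this document" would contain for the fruit.html link quoted above. It is not ht://Dig code: it uses Python's standard html.parser, and the class name LinkTextCollector is hypothetical. It collects only the character data between <a href=...> and </a> and ignores attribute values. For that anchor the collected text is empty, because the link wraps only an <img>; the only "large" in the markup sits inside the onMouseOver attribute. That suggests, assuming this is the only link to pineapple.html, that the indexer is picking up attribute text as the link description.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Hypothetical helper: map each href to the visible text of its link.

    Only character data between <a href=...> and </a> is collected.
    Attribute values such as onMouseOver are never treated as link text.
    """

    def __init__(self):
        super().__init__()
        self.current_href = None  # href of the anchor we are inside, if any
        self.link_text = {}       # href -> accumulated link text

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.current_href = href
                self.link_text.setdefault(href, "")

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Text nodes only; tag attributes never reach this method.
        if self.current_href is not None:
            self.link_text[self.current_href] += data


# The anchor from fruit.html, as quoted in the message.
FRUIT_LINK = """<a href="pineapple.html"
onMouseOver="MM_showHideLayers('Pine_apple','','show','large','','hide','apple','','hide','tomato','')"><img
border="0" src="images/pineapple.jpg" width="80"
height="60"></a>"""

if __name__ == "__main__":
    collector = LinkTextCollector()
    collector.feed(FRUIT_LINK)
    collector.close()
    print(repr(collector.link_text["pineapple.html"]))
```

By contrast, feeding it a plain text link such as <a href="pineapple.html">large pineapples</a> yields "large pineapples", which is the kind of content the "d" field is documented to hold.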



_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
