At 04:52 AM 8/6/98 -0400, Geoff Hutchison wrote:
>
>> With setting the title_factor to 10 and the text_factor
>> as well as all heading_factors to 0 we still get things that
>> are between the body tags such as links to other pages
>
>Well the purpose of text_factor is:
>  This is a factor which will be used to multiply the
>  weight of words that are not in any special part of a
>  document. Setting a factor to 0 will cause normal words
>  to be ignored.

It occurs to me that the definition 'not in any special part of the
document' is a tad ambiguous.  Would the body be considered a 'special
part' of the document?  How about links?  One could argue that anything
between any specific pair of tags is in a special part of the document,
and is therefore not affected by 'text_factor: 0'.  According to the
documentation I've seen so far, the only tags that htdig treats specially
are <title>...</title> and <h1>...</h1> through <h6>...</h6>.  Does text
inside any other tag count as 'not in any special part of the document'?
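
To make the two readings concrete, here is a toy model in Python.  It is
only a sketch of the ambiguity, not htdig's actual parser: the function
names, the tag keys, and the default weight of 1 for unlisted tags are all
assumptions made up for illustration.  The factor values are the ones
described at the top of this thread.

```python
# Toy model only: NOT htdig's code, just the two readings spelled out.
# Factors as described in the thread: title 10, headings 0, plain text 0.
FACTORS = {"title": 10, "text": 0}
FACTORS.update({f"h{level}": 0 for level in range(1, 7)})

# The only tags the documentation names as special.
DOCUMENTED_SPECIAL = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}


def weight_narrow(tag, factors=FACTORS):
    """Reading A: only <title> and <h1>-<h6> are 'special'.

    Words inside any other tag (or no tag at all) are plain text and get
    text_factor.
    """
    if tag in DOCUMENTED_SPECIAL:
        return factors[tag]
    return factors["text"]


def weight_broad(tag, factors=FACTORS, assumed_default=1):
    """Reading B: anything inside any tag is 'special'.

    text_factor applies only to words outside every tag (tag is None).
    Tags with no factor of their own get assumed_default, which is a
    hypothetical value chosen for this sketch.
    """
    if tag is None:
        return factors["text"]
    return factors.get(tag, assumed_default)


if __name__ == "__main__":
    for tag in ("title", "h2", "a", None):
        print(tag, weight_narrow(tag), weight_broad(tag))
```

Under reading A, link text (tag "a") gets weight 0 and would be ignored.
Under reading B it keeps a nonzero weight, which would match what we are
seeing, though that is a guess about the behavior and not something I have
confirmed in the source.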

>An alternative solution is to use META description tags and the patch I
>produced. No body text will appear in the output.

Unfortunately, we're trying to adjust searches on a large, extensive web
for which the installation of META tags is just not feasible.  Thanks for
the idea, though.


________________________________________________________________________

Benjamin J. Pitzer
[EMAIL PROTECTED]

"I would rather be ashes than dust.  I would rather that my spark should
burn out in a brilliant blaze than be stifled by dry-rot.  I would rather
be a superb meteorite, every atom of me in magnificent glow, than a sleepy,
permanent planet.  The proper function of man is to live, not to exist.  I
will not waste my days in trying to prolong them.  I will use my time."


- Jack London
