On Wed, Apr 05, 2000 at 01:25:55PM -0500, Gilles Detillieux wrote:
> "htdig -ivvvvc newconfig.conf" to see what htdig is doing when it parses
> the title of this page. Take a look at the resulting db.wordlist as well,
> to see if "littérature" (or some mangled form of it) is getting into the
> database.
okay, here goes....
from log of htdig session:
Tag: HTML>, matched -1
Tag: HEAD>, matched -1
Tag: TITLE>, matched 0
word: Littérature@6
word: francophone@9
word: virtuelle@12
word: ClicNet@15
Tag: /TITLE>, matched 1
title: Littérature francophone virtuelle (ClicNet)
Tag: /HEAD>, matched -1
Tag: BODY BGCOLOR="#FFFFFF" LINK="#060433" ALINK="#060433" VLINK="#0E294B">, matched -1
Tag: center>, matched -1
Tag: IMG SRC="litterature.gif">, matched 18
image: http://clicnet.swarthmore.edu/litterature/litterature.gif
Tag: BR>, matched -1
--------
so, it matched the title from the header. then:
Tag: H2>, matched 5
word: ClicNet@52
word: Littérature@54
word: francophone@57
word: virtuelle@60
Tag: /H2>, matched 11
Tag: /center>, matched -1
it seems to match the title inside of the <H2></H2> tags
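To compare the words logged under each tag side by side, a small script can group the "word:" lines by the "Tag:" line that precedes them. This is only a rough sketch: it assumes the log looks exactly like the excerpts above ("Tag: NAME>, matched N" and "word: TEXT@POSITION"), and the function name words_by_tag is made up for illustration.

```python
def words_by_tag(log_lines):
    """Group 'word: text@pos' entries under the most recent 'Tag:' line.

    Assumes the line shapes seen in the excerpt above; anything else is skipped.
    """
    groups = []
    current = None
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("Tag: "):
            # Keep only the tag text before the closing '>'.
            tag = line[len("Tag: "):].split(">", 1)[0]
            current = (tag, [])
            groups.append(current)
        elif line.startswith("word: ") and current is not None:
            word, _, pos = line[len("word: "):].rpartition("@")
            if word and pos.isdigit():
                current[1].append((word, int(pos)))
    # Drop tags that had no words logged after them.
    return [(tag, words) for tag, words in groups if words]


log = [
    "Tag: TITLE>, matched 0",
    "word: Littérature@6",
    "word: francophone@9",
    "Tag: /TITLE>, matched 1",
    "Tag: H2>, matched 5",
    "word: ClicNet@52",
    "word: Littérature@54",
    "Tag: /H2>, matched 11",
]
for tag, words in words_by_tag(log):
    print(tag, words)
```

Run against the full log, this should show the same words appearing once after TITLE and once after H2, which is what the excerpts suggest.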
Littérature does appear in the wordlist database as well (only in lowercase):
littérature i:0 l:6 w:105469 c:5
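For checking other words the same way, here is a minimal sketch that splits a wordlist line like the one above into the word and its key:number fields. It assumes every record is whitespace-separated in the form "word key:number key:number ..."; the field letters are treated as opaque keys, and the helper names are hypothetical.

```python
import re

# Assumed record shape, based on the single line shown above:
#   <word> <key>:<int> <key>:<int> ...
FIELD = re.compile(r"^(\w+):(-?\d+)$")


def parse_wordlist_line(line):
    """Return (word, {key: int}) for one record, or None for a blank line."""
    parts = line.split()
    if not parts:
        return None
    fields = {}
    for part in parts[1:]:
        match = FIELD.match(part)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return parts[0], fields


def find_word(lines, target):
    """Return all parsed records whose word matches target, ignoring case."""
    wanted = target.lower()
    records = (parse_wordlist_line(line) for line in lines)
    return [rec for rec in records if rec and rec[0].lower() == wanted]


sample = ["littérature i:0 l:6 w:105469 c:5"]
print(find_word(sample, "Littérature"))
```

To run it over the real file, the lines could come from something like open("db.wordlist", encoding="latin-1"), but the encoding is a guess and should match whatever the pages being indexed actually use.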
Is any of this helpful, at all?
> Are your title_factor
> and/or heading_factor_1 non-zero?
I'm not sure what you mean by this last bit...
thanks again for all your help,
chris