Now, the wordlist is created.
> -----Original Message-----
> From: Jost Diederichs [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, 25 November 2000 23:40
> To: 'Geschke Steffen'; [EMAIL PROTECTED]
> Subject: RE: [htdig3-dev] Wordlist
>
>
> Yes, that may be exactly the phenomenon I ran into. See my post
> (Retriever.cc -...) from Thursday. The clue is the word "ignored" in
> your htdig output. It is generated by a function in Retriever.cc.
> There is a problem with the pointer dup: it is coded like a variable,
> and if you check your compiler output you will probably find the
> warning "the address of dup is always true". I suppose that if you make
> the edit I describe in my previous post, everything will work fine. I
> have been trying to figure out where dup is defined and what it means,
> but with no success so far and no answers from the list.
>
> - Jost
>
>
> -----Original Message-----
> From: Geschke Steffen [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, November 25, 2000 2:21 PM
> To: '[EMAIL PROTECTED]'
> Subject: [htdig3-dev] Wordlist
>
> Hello,
>
> I have just upgraded from 3.2.0b2 to 3.2.0b3 (Snapshot 11-19)
> and have run into the following (new) problem:
>
> The wordlist database is only created for documents of
> mime type "text/html". Other mime types are indexed too,
> but the words from these documents are not included
> in the wordlist.
>
> I didn't change the configuration file from 3.2.0b2 and AFAIK
> I did not exclude mime types explicitly.
>
> Here is a short excerpt of what htdig says in verbose mode when
> I index a pdf file:
>
> --
>
> Making HTTP request on http://intra1.erlf.siemens.de/test.pdf
> ...
> Header line: Content-Type: application/pdf
> Retrieving document /test.pdf on host: intra1.erlf.siemens.de:80
> Status Code : 200
> Reason : OK
> Content-type : application/pdf
> Persistent connection: not accepted
> Reading the body of the response
> 2 - Connection closed (No persistent connection)
> title: [... correct title of pdf document ...]
> head: [... correct head of pdf document ...]
> word: foo@0
> ...
> word: bar@998
> ( http://intra1.erlf.siemens.de/test.pdf ignored) size = 52650
> pick: intra1.erlf.siemens.de, # servers = 1
> > intra1.erlf.siemens.de supports HTTP persistent connections (infinite)
> htdig: Run complete
>
> --
>
> Only one pdf file, named test.pdf, has to be scanned. The content
> of the pdf file is parsed correctly and htdig also finds 998 words for
> the wordlist. However, at the end htdig ignores the link. Why?
>
> After scanning I get
> - docdb
> - docs.index
> - excerpts
>
> BUT NO words.db!
>
>
> Any help?!
>
> Steffen
>
>
> ------------------------------------
> To unsubscribe from the htdig3-dev mailing list, send a message to
> [EMAIL PROTECTED]
> You will receive a message to confirm this.
>