RE: [htdig3-dev] Wordlist

Geschke Steffen Sat, 25 Nov 2000 16:29:01 -0800
Now, the wordlist is created.
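
For readers who hit the same warning quoted below, here is a minimal
sketch of how an "address is always true" condition can arise. It is not
the actual Retriever.cc code. The names dup_like, buggy_should_ignore and
fixed_should_ignore are hypothetical, and the idea that the name resolves
to a function (POSIX declares int dup(int) in <unistd.h>) is an
assumption that this thread does not confirm.

```cpp
#include <cassert>

// Hypothetical stand-in: a free function whose name is mistaken for a flag.
// (Assumption: something similar happens with 'dup' in Retriever.cc.)
int dup_like(int fd) { return fd; }

// Buggy pattern: the function name decays to its address, which is never
// null, so the condition is true no matter what the caller intended.
bool buggy_should_ignore(bool /*really_duplicate*/) {
    if (dup_like) {          // compilers typically warn about this line
        return true;
    }
    return false;
}

// Intended pattern: test an actual boolean value.
bool fixed_should_ignore(bool really_duplicate) {
    return really_duplicate;
}

// Small self-check showing the difference between the two patterns.
void self_check() {
    assert(buggy_should_ignore(false));   // ignored although not a duplicate
    assert(!fixed_should_ignore(false));  // kept, as intended
    assert(fixed_should_ignore(true));    // real duplicates are still ignored
}
```

If the real code has this shape, every document would take the "ignored"
branch, which would match the output shown in the original report.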

> -----Original Message-----
> From: Jost Diederichs [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, 25 November 2000 23:40
> To: 'Geschke Steffen'; [EMAIL PROTECTED]
> Subject: RE: [htdig3-dev] Wordlist
> 
> 
> Yes, that may be exactly the phenomenon I ran into. See my post
> (Retriever.cc -...) from Thursday. The clue is the word "ignored" in your
> htdig output. It is generated by a function in Retriever.cc. There is a
> problem with the pointer dup: it is coded like a variable, and if you check
> your compiler output you will probably find a warning "the address of dup is
> always true". I suppose that if you make the edit I describe in my previous
> post, everything will work fine. I have been trying to figure out where dup
> is defined and what it means, but with no success so far and no answers from
> the list.
> 
>  - Jost
> 
> 
>  -----Original Message-----
> From:         Geschke Steffen [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, November 25, 2000 2:21 PM
> To:   '[EMAIL PROTECTED]'
> Subject:      [htdig3-dev] Wordlist
> 
> Hello,
> 
> I have just upgraded from 3.2.0b2 to 3.2.0b3 (Snapshot 11-19)
> and ran into the following (new) problem:
> 
> The wordlist database is only created for documents of
> MIME type "text/html". Other MIME types are indexed too,
> but the words from those documents are not included
> in the wordlist.
> 
> I didn't change the configuration file from 3.2.0b2 and AFAIK
> I did not exclude any MIME types explicitly.
> 
> Here is a short excerpt of what htdig says in verbose mode when
> I index a PDF file:
> 
> --
> 
> Making HTTP request on http://intra1.erlf.siemens.de/test.pdf
> ...
> Header line: Content-Type: application/pdf
> Retrieving document /test.pdf on host: intra1.erlf.siemens.de:80
> Status Code       : 200
> Reason            : OK
> Content-type      : application/pdf
> Persistent connection: not accepted
> Reading the body of the response
>     2 - Connection closed (No persistent connection)
> title: [... correct title of pdf document ...]
> head: [... correct head of pdf document ...]
> word: foo@0
> ...
> word: bar@998
>  ( http://intra1.erlf.siemens.de/test.pdf ignored) size = 52650
> pick: intra1.erlf.siemens.de, # servers = 1
> > intra1.erlf.siemens.de supports HTTP persistent connections (infinite)
> htdig: Run complete
> 
> --
> 
> htdig is only asked to scan a single PDF file named test.pdf. The content
> of the PDF file is parsed correctly and htdig also finds 998 words for
> the wordlist. However, at the end htdig ignores the link. Why?
> 
> After scanning I get
> - docdb
> - docs.index
> - excerpts
> 
> BUT NO words.db!
> 
> 
> Any help?!
> 
> Steffen
> 
> 
> ------------------------------------
> To unsubscribe from the htdig3-dev mailing list, send a message to
> [EMAIL PROTECTED]
> You will receive a message to confirm this.
> 
