Hello,

I have just upgraded from 3.2.0b2 to 3.2.0b3 (Snapshot 11-19)
and ran into the following (new) problem:

The wordlist database is only created for documents of
MIME type "text/html". Documents of other MIME types are indexed
too, but the words from these documents are not included
in the wordlist.

I didn't change the configuration file from 3.2.0b2, and AFAIK
I did not exclude any MIME types explicitly.

Here is a short excerpt of what htdig reports in verbose mode when
I index a PDF file:

--

Making HTTP request on http://intra1.erlf.siemens.de/test.pdf
...
Header line: Content-Type: application/pdf
Retrieving document /test.pdf on host: intra1.erlf.siemens.de:80
Status Code       : 200
Reason            : OK
Content-type      : application/pdf
Persistent connection: not accepted
Reading the body of the response
    2 - Connection closed (No persistent connection)
title: [... correct title of pdf document ...]
head: [... correct head of pdf document ...]
word: foo@0
...
word: bar@998
 ( http://intra1.erlf.siemens.de/test.pdf ignored) size = 52650
pick: intra1.erlf.siemens.de, # servers = 1
> intra1.erlf.siemens.de supports HTTP persistent connections (infinite)
htdig: Run complete

--

Only one PDF file, test.pdf, needs to be scanned. The content
of the PDF file is parsed correctly, and htdig also finds 998 words for
the wordlist. However, at the end htdig ignores the link. Why?

After scanning I get
- docdb
- docs.index
- excerpts

BUT NO words.db!


Any help?!

Steffen


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 

