Yes, that may be exactly the phenomenon I ran into. See my post
(Retriever.cc -...) from Thursday. The clue is the word "ignored" in your
htdig output, which is generated by a function in Retriever.cc. There is a
problem with the pointer dup: it is used as if it were a variable, and if you
check your compiler output you will probably find a warning "the address of
dup is always true". I suspect that if you make the edit I describe in my
previous post, everything will work fine. I have been trying to figure out
where dup is defined and what it means, but have had no success so far and
no answers from the list.
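A minimal sketch of how such a warning can arise, assuming (not confirmed against the htdig source) that no variable named dup is declared where Retriever.cc tests it. In that case the name resolves to the POSIX function dup() declared in <unistd.h>, and a test like "if (dup)" checks the function's address, which is never null, so every document takes the "ignored" branch. The functions ignored_buggy and ignored_fixed and the is_duplicate parameter are hypothetical illustrations, not htdig code.

```cpp
#include <unistd.h>   // declares the POSIX function int dup(int)
#include <iostream>
#include <string>

// Hypothetical reconstruction, NOT the actual Retriever.cc code.
// No local variable named 'dup' exists here, so the name resolves to the
// POSIX function ::dup. The condition converts the function's address to
// bool; that address is never null, so the branch is always taken.
// GCC and Clang warn about this (-Waddress).
bool ignored_buggy(const std::string& url) {
    if (dup) {
        std::cout << " ( " << url << " ignored)\n";
        return true;
    }
    return false;
}

// Hypothetical fix: the duplicate decision is an explicit boolean computed
// by the caller (how htdig really detects duplicates is not shown here).
bool ignored_fixed(const std::string& url, bool is_duplicate) {
    if (is_duplicate) {
        std::cout << " ( " << url << " ignored)\n";
        return true;
    }
    return false;
}
```

With this sketch, ignored_buggy() returns true for any URL, which would match the "ignored" line in Steffen's htdig output, while ignored_fixed(url, false) returns false and the document's words would be kept.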

 - Jost


 -----Original Message-----
From:   Geschke Steffen [mailto:[EMAIL PROTECTED]]
Sent:   Saturday, November 25, 2000 2:21 PM
To:     '[EMAIL PROTECTED]'
Subject:        [htdig3-dev] Wordlist

Hello,

I have just upgraded from 3.2.0b2 to 3.2.0b3 (snapshot 11-19)
and ran into the following (new) problem:

The wordlist database is only created for documents of
MIME type "text/html". Documents of other MIME types are indexed
too, but their words are not included in the wordlist.

I didn't change the configuration file from 3.2.0b2, and AFAIK
I did not exclude any MIME types explicitly.

Here is a short excerpt of what htdig prints in verbose mode when
I index a PDF file:

--

Making HTTP request on http://intra1.erlf.siemens.de/test.pdf
...
Header line: Content-Type: application/pdf
Retrieving document /test.pdf on host: intra1.erlf.siemens.de:80
Status Code       : 200
Reason            : OK
Content-type      : application/pdf
Persistent connection: not accepted
Reading the body of the response
    2 - Connection closed (No persistent connection)
title: [... correct title of pdf document ...]
head: [... correct head of pdf document ...]
word: foo@0
...
word: bar@998
 ( http://intra1.erlf.siemens.de/test.pdf ignored) size = 52650
pick: intra1.erlf.siemens.de, # servers = 1
> intra1.erlf.siemens.de supports HTTP persistent connections (infinite)
htdig: Run complete

--

Only one PDF file, test.pdf, needs to be scanned. The content
of the PDF file is parsed correctly, and htdig also finds 998 words for
the wordlist. However, at the end htdig ignores the link. Why?

After scanning I get
- docdb
- docs.index
- excerpts

BUT NO words.db!


Any help?!

Steffen


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.

