Use conv_doc.pl instead of parse_doc.pl.

Get it from http://www.htdig.org/files/contrib/parsers/conv_doc.pl.gz,
gunzip it, move it to /usr/local/bin and make sure it is executable.

get xpdf from ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.91.tgz

get ps2ascii from your ghostscript installation (it ships with ghostscript, not freetype)

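put this in your conf/htdig.conf (every line except the last must end in a
backslash, so the first entry goes on the same line as the attribute name):
external_parsers: application/msword->text/html /usr/local/bin/conv_doc.pl \
            application/postscript->text/html /usr/local/bin/conv_doc.pl \
            application/pdf->text/html /usr/local/bin/conv_doc.pl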
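
Before re-running rundig it may be worth checking that the pieces above are
actually in place. The following is only a rough sketch in Python, not part of
ht://Dig: it assumes conv_doc.pl relies on pdftotext (from xpdf) and ps2ascii
(from ghostscript) being findable on the PATH, and that the script lives at the
/usr/local/bin path used in the config. The function names are made up for
illustration. If your copy of conv_doc.pl points at the converters by absolute
path, check those paths inside the script instead.

```python
import os
import shutil

# Assumed helper names: pdftotext ships with xpdf, ps2ascii with ghostscript.
HELPERS = ["pdftotext", "ps2ascii"]
# Path used in the external_parsers setting above.
PARSER = "/usr/local/bin/conv_doc.pl"


def missing_helpers(names, path=None):
    """Return the helper names that cannot be found on the search path."""
    return [name for name in names if shutil.which(name, path=path) is None]


def parser_is_executable(script):
    """True if the parser script exists and has the execute bit set."""
    return os.path.isfile(script) and os.access(script, os.X_OK)


if __name__ == "__main__":
    for name in missing_helpers(HELPERS):
        print("missing helper:", name)
    if not parser_is_executable(PARSER):
        print("not executable:", PARSER)
```

If this prints nothing, the converters and the parser script are at least
where htdig can reach them; it says nothing about whether a particular PDF
converts cleanly.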


On Wed, 1 Nov 2000, Roy Stephane wrote:

> I have problems indexing PDF files. I have already checked FAQ 4.9
> and 5.2, so all my paths are OK and the MAX_DOC_SIZE parameter is greater
> than my biggest PDF file. I am working with the external parser
> "parse_doc.pl".
> 
> When I run rundig in verbose mode, I see that htdig recognises all my
> PDF files and shows their size. After that, when htmerge finds a PDF, it says
> that there is no excerpt, so the file (temporary file) is deleted.
> 
> I tried to find the parameters that are used to call htdig from rundig.
> Since printing each variable shows nothing on screen, I assume
> that all the parameters used have null values.
> 
> I am using RedHat 6.2 and Apache 1.3.
> 
> Thanks for your help!
> 
> Stéphane Roy
> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> 
> (450) 542-5906
> 
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> [EMAIL PROTECTED]
> You will receive a message to confirm this.
> List archives:  <http://www.htdig.org/mail/menu.html>
> FAQ:            <http://www.htdig.org/FAQ.html>
> 
> 


