[EMAIL PROTECTED] wrote:
>
> > Hello,
> >
> > I use htdig 3.1.5 on Red Hat Linux 6.1.
> >
> > I have configured the htdig.conf file as follows:
> >
> > valid_extensions: .html .htm .doc .pdf .txt
> > local_default_doc: new_index.html index.html index.htm main.htm main_frame.htm frame.htm content.htm title.htm main2.htm
> >
> > local_urls_only: true
> >
> > local_urls: http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/
> >
> > #
> > # Since ht://Dig does not (and cannot) parse every document type, this
> > # attribute is a list of strings (extensions) that will be ignored during
> > # indexing. These are *only* checked at the end of a URL, whereas
> > # exclude_url patterns are matched anywhere.
> > #
> > bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> > .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
> >
> > max_doc_size: 20000000
> >
> > external_parsers: application/msword->text/html /usr/local/bin/parse_doc.pl \
> > application/postscript->text/html /usr/local/bin/parse_doc.pl \
> > application/pdf->text/html /usr/local/bin/parse_doc.pl
> >
> > PDF file indexing works fine, whereas I get the following message when
> > indexing MS Word files:
> >
> > 30:30:2:http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc:
> > Trying local files
> > found existing file /var/opt/web/doc/tech/casc/details_casc.doc
> > not found
> >
> > The file /var/opt/web/doc/tech/casc/details_casc.doc actually exists...
> >
> > I don't understand what the problem can be. Running rundig with several
> > additional -v options does not help.
> >
> > Could somebody help me?
> >
> > Thanks,
> > Jean-Francois.
> > --
>
> I think the "not found" could refer to the utility that you are using
> within parse_doc.pl to handle Word documents.
>
> Try calling parse_doc.pl from the command line:
>
> parse_doc.pl /var/opt/web/doc/tech/casc/details_casc.doc arg2 arg3
>
> and see what happens.
>
> --
> David J Adams
> <[EMAIL PROTECTED]>
> Computing Services
> University of Southampton

Hello,

It works; I use the same script for Acrobat files and it works properly.
For Word documents it uses catdoc, located in /usr/local/bin:

ll /usr/local/bin/catdoc
-rwxr-xr-x   1 root     root        55235 May 19 14:12 /usr/local/bin/catdoc

I think the problem is somewhere else.

Thanks,
Jean-Francois.
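The checks discussed in this thread (that the URL maps to the expected local file, that the file is readable, and that parse_doc.pl and catdoc are executable) can be run one at a time with a small Python sketch. It is only an illustration under stated assumptions: local_urls is modelled as simple prefix replacement, which matches the "found existing file" log line quoted above; the helper names parse_local_urls, url_to_local_path, parser_command and diagnose are hypothetical; and the parser argument order (file, MIME type, URL) is an assumption based on the `parse_doc.pl FILE arg2 arg3` suggestion, not taken from the ht://Dig source.

```python
import os
import subprocess
import sys

# Values copied from the htdig.conf settings and log output quoted in this thread.
LOCAL_URLS = "http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/"
DOC_URL = "http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc"
PARSER = "/usr/local/bin/parse_doc.pl"
HELPER = "/usr/local/bin/catdoc"


def parse_local_urls(value):
    """Split a local_urls value ("URLprefix=path ...") into (prefix, path) pairs."""
    pairs = []
    for item in value.split():
        prefix, sep, path = item.partition("=")
        if sep:  # skip entries that have no "=" separator
            pairs.append((prefix, path))
    return pairs


def url_to_local_path(url, mappings):
    """Assumed behaviour: replace the first matching URL prefix with its directory."""
    for prefix, path in mappings:
        if url.startswith(prefix):
            return path + url[len(prefix):]
    return None


def parser_command(parser, local_path, content_type, url):
    """Hypothetical argument order for a manual run: FILE, MIME type, URL."""
    return [parser, local_path, content_type, url]


def diagnose(url, local_urls, parser, helper):
    """Check each link in the chain: URL -> local file -> parser -> helper."""
    path = url_to_local_path(url, parse_local_urls(local_urls))
    return {
        "local_path": path,
        "file_readable": path is not None and os.access(path, os.R_OK),
        "parser_executable": os.access(parser, os.X_OK),
        "helper_executable": os.access(helper, os.X_OK),
    }


if __name__ == "__main__":
    report = diagnose(DOC_URL, LOCAL_URLS, PARSER, HELPER)
    for key, value in report.items():
        print(f"{key}: {value}")
    if report["file_readable"] and report["parser_executable"]:
        # Run the parser once by hand so its output and errors can be inspected.
        cmd = parser_command(PARSER, report["local_path"], "application/msword", DOC_URL)
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout[:500])
        print(result.stderr, file=sys.stderr)
```

Run as a script, it prints one line per check and, only if the file and parser are both present, invokes parse_doc.pl once. The executable checks use full paths rather than a PATH lookup, so they test exactly the locations named in the configuration.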
Re: [htdig] Word documents indexing problem
Jean-Francois Le Carre Petit Wed, 07 Jun 2000 06:35:47 -0700