I took the doc2html.pl which you sent me, changed the line

my $CATDOC = '/usr/local/bin/catdoc ';

to the location of my catdoc and tried it from the command line. It works!
This is on RedHat Linux (either 7 or 8, I don't know) and Perl 5.8.0.

However, if I put in a space after catdoc as you have, then it fails with
the message

! ERROR Unable to execute /opt/local/bin/catdoc for Word (catdoc) document

I don't think I can help you any further.

--
David Adams
Information Systems Services
Southampton University

----- Original Message -----
From: "Zachary Jenks" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 11, 2002 3:40 PM
Subject: Re: doc2html --> catdoc

> Well that's good to know, if I can get it working that is. I tried it
> from the command line using that syntax and get the error message:
>
> UNABLE to convert!
>
> So I don't know what the deal is? I'm running it in Linux RedHat 8.
> I've tested catdoc independently and it works. I give it a file and it
> prints out the text inside the file on the screen.
>
> Thanks!
>
> Zack
>
> >>> "David Adams" <[EMAIL PROTECTED]> 12/11/02 05:48AM >>>
> I believe that Word 2000 is the same format as Word97, and that catdoc
> should produce some output with any type of Word file.
>
> Have you tried running doc2html from the command line? The format is
>
> doc2html.pl filename.doc application/msword
>
> --
> David Adams
> Information Systems Services
> Southampton University
>
> ----- Original Message -----
> From: "Zachary Jenks" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, December 10, 2002 6:14 PM
> Subject: Re: doc2html --> catdoc
>
> > Alright I did that and I'm getting the same results. It blows my mind
> > that it's not working, everything seems to be configured correctly.
> > Is it possible that something's wrong with the doc2html.pl program?
> > But then why is it working fine with pdf? My excel docs aren't getting
> > converted either (when I set it up right that is...I removed that code
> > from doc2html.pl in order to focus on catdoc). I've even saved my
> > documents as msword 97 docs.
> > Am I correct in assuming that catdoc doesn't work with 2000?
> >
> > Have a good one!
> >
> > Zack
> >
> > >>> "David Adams" <[EMAIL PROTECTED]> 12/10/02 08:13AM >>>
> > I'm clutching at straws, but you could try removing the space after
> > catdoc in your doc2html.pl:
> >
> > #version of catdoc for Word6, Word7 & Word97 files:
> > my $CATDOC = '/usr/local/bin/catdoc ';
> >
> > --
> > David Adams
> > Information Systems Services
> > Southampton University
> >
> > ----- Original Message -----
> > From: "Zachary Jenks" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Tuesday, December 10, 2002 3:49 PM
> > Subject: Re: doc2html --> catdoc
> >
> > > This is what I get:
> > >
> > > ----------------------------------------------------------------
> > > Header line: HTTP/1.1 200 OK
> > > Header line: Date: Tue, 10 Dec 2002 22:53:06 GMT
> > > Header line: Server: Apache/2.0.40 (Red Hat Linux)
> > > Header line: Last-Modified: Mon, 04 Dec 2000 22:09:26 GMT
> > > Converted Mon, 04 Dec 2000 22:09:26 GMT to Mon, 04 Dec 2000 22:09:26
> > > Header line: ETag: "80c1ac-4c00-34013180"
> > > Header line: Accept-Ranges: bytes
> > > Header line: Content-Length: 19456
> > > Header line: Connection: close
> > > Header line: Content-Type: application/msword
> > > not HTML
> > > pick: superman.umesd.k12.or.us, # servers = 1
> > > 16:16:2:http://superman.umesd.k12.or.us/public_html/documents/webdev.txt: Retrieval command for http://superman.umesd.k12.or.us/public_html/documents/webdev.txt: GET /public_html/documents/webdev.txt HTTP/1.0
> > > User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> > > Referer: http://superman.umesd.k12.or.us/public_html/documents/
> > > Host: superman.umesd.k12.or.us
> > > ----------------------------------------------------------------
> > >
> > > and the last few lines are:
> > >
> > > ----------------------------------------------------------------
> > > Deleted, no excerpt: 5/http://superman.umesd.k12.or.us/public_html/test.doc
> > > Deleted, no excerpt: 6/http://superman.umesd.k12.or.us/public_html/test.xls
> > > Deleted, no excerpt: 7/http://superman.umesd.k12.or.us/public_html/zack.doc
> > > ----------------------------------------------------------------
> > >
> > > Any ideas? Thanks Again!
> > >
> > > Zack
> > >
> > > >>> "David Adams" <[EMAIL PROTECTED]> 12/10/02 04:33AM >>>
> > > The doc2html.pl you sent me separately looks OK.
> > >
> > > The Magic word looks OK.
> > >
> > > That leaves the MIME-type. Is your web server configured to deliver
> > > *.doc files as "application/msword"?
> > > Run htdig with the -vvv option and see what Content-type you get for
> > > Word documents.
> > >
> > > --
> > > David Adams
> > > Information Systems Services
> > > Southampton University
> > >
> > > ----- Original Message -----
> > > From: "Zachary Jenks" <[EMAIL PROTECTED]>
> > > To: <[EMAIL PROTECTED]>
> > > Sent: Monday, December 09, 2002 9:28 PM
> > > Subject: Re: doc2html --> catdoc
> > >
> > > > Sorry to keep bothering you about this Mr. Adams but I checked the
> > > > magic numbers and they appear to match:
> > > >
> > > > From doc2html.pl:
> > > > $magic = '^\320\317\021\340';
> > > >
> > > > From first line of od -c filename | more:
> > > > 0000000 320 317 021 340 241 261 032 341 \0 \0 \0 \0 \0 \0 \0 \0
> > > >
> > > > Do you have any other suggestions?
> > > >
> > > > Thanks!
> > > >
> > > > Zack
> > > >
> > > > >>> "David Adams" <[EMAIL PROTECTED]> 12/05/02 08:55AM >>>
> > > > Doc2html.pl is failing to match the magic number and MIME-type of
> > > > the files you are trying to index.
> > > >
> > > > If you are certain that you have set up doc2html.pl correctly then
> > > > look at the first few characters of one of your Word documents and
> > > > see if it matches the magic number set in doc2html.pl for a Word
> > > > file.
> > > > You can use
> > > >
> > > > od -c filename | more
> > > >
> > > > to see the first few characters in the file.
> > > >
> > > > --
> > > > David Adams
> > > > Information Systems Services
> > > > Southampton University
> > > >
> > > > ----- Original Message -----
> > > > From: "Zachary Jenks" <[EMAIL PROTECTED]>
> > > > To: <[EMAIL PROTECTED]>
> > > > Sent: Thursday, December 05, 2002 4:20 PM
> > > > Subject: doc2html --> catdoc
> > > >
> > > > > Hello Mr. Adams! I am trying to get catdoc working with my
> > > > > doc2html program and am receiving the following error:
> > > > >
> > > > > "UNABLE to convert"
> > > > >
> > > > > I've tested catdoc independently and it works fine on the file
> > > > > I'm trying to convert. I've added the catdoc location
> > > > > (usr/local/bin/catdoc) to my doc2html program. My doc2html
> > > > > program works great at converting pdf files. I am using the
> > > > > appropriate syntax on the command line: "./doc2html.pl
> > > > > /location to word file application/msword". I've tried
> > > > > reinstalling doc2html and it still doesn't work with catdoc.
> > > > > Do you have any suggestions?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Zack

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
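
For anyone repeating the two checks discussed in this thread, here is a minimal shell sketch. The thread does not show how doc2html.pl validates $CATDOC, so the second half rests on an assumption drawn from the wording of the error ("Unable to execute"): if the configured string is tested as a file path, a trailing space makes it name a file that does not exist. The function names (first_four_octal, looks_like_word, check_converter) are hypothetical and are not part of doc2html.pl or catdoc, and /bin/sh only stands in for the converter path.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of doc2html.pl.

# Print the first four bytes of a file as octal values separated by
# single spaces, the same notation od -c uses for non-printable bytes.
first_four_octal() {
    # Unquoted on purpose: word splitting normalises od's padding.
    set -- $(od -A n -t o1 -N 4 "$1")
    echo "$*"
}

# Succeed only if the file starts with the bytes given in the thread's
# $magic pattern for Word files: \320 \317 \021 \340.
looks_like_word() {
    [ "$(first_four_octal "$1")" = "320 317 021 340" ]
}

# Report whether a configured path names an executable file.
# Assumption: doc2html.pl performs a comparable test on $CATDOC.
check_converter() {
    if [ -x "$1" ]; then
        echo "ok"
    else
        echo "not executable"
    fi
}
```

With these definitions loaded, check_converter '/bin/sh' prints ok while check_converter '/bin/sh ' (note the trailing space) prints not executable, and looks_like_word filename succeeds only for a file whose first four bytes match the magic number quoted above.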

