Have you tried running doc2html.pl from the command line?

The format is:

    doc2html.pl  filename.doc  application/msword
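
If it helps, here is a rough Python 3 sketch for doing that test. It is not part of ht://Dig, just a throwaway harness. It runs a converter the same way as the command above (script, then file, then content type) and reports the exit status, how much the script wrote to stdout, and anything it wrote to stderr. The path /opt/www/htdig/bin/doc2html.pl is an assumption based on where your htparsedoc lives, so change CONVERTER to suit. If doc2html.pl is working, I'd expect HTML on stdout. An empty stdout plus a message on stderr usually points straight at the problem.

```python
import subprocess
import sys

# Assumed install location, guessed from the htparsedoc path in the
# config quoted below; change it to wherever doc2html.pl actually lives.
CONVERTER = ["/opt/www/htdig/bin/doc2html.pl"]


def run_converter(command, doc_path, content_type="application/msword"):
    """Run `command doc_path content_type`, as on the command line above.

    Returns (exit_status, stdout, stderr). exit_status is None when the
    program could not be started at all (file missing, not executable,
    or a #! line naming an interpreter that does not exist).
    """
    try:
        result = subprocess.run(
            [*command, doc_path, content_type],
            capture_output=True,
            text=True,
            errors="replace",  # converter output may not be valid UTF-8
        )
    except OSError as exc:
        return None, "", str(exc)
    return result.returncode, result.stdout, result.stderr


def main():
    doc = sys.argv[1] if len(sys.argv) > 1 else "filename.doc"
    status, out, err = run_converter(CONVERTER, doc)
    if status is None:
        print(f"could not start {CONVERTER[0]}: {err}")
        return
    print(f"exit status: {status}")
    print(f"stdout: {len(out)} characters")
    # The first few hundred characters are usually enough to tell
    # converted HTML apart from an error message.
    print(out[:500])
    if err:
        print("stderr:")
        print(err)


if __name__ == "__main__":
    main()
```

Save it as, say, check_doc2html.py and run "python3 check_doc2html.py somefile.doc". A "could not start" message means the problem is the script itself (path, permissions, or the perl path on its #! line) rather than catdoc.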


--
David Adams
Information Systems Services
Southampton University


----- Original Message -----
From: "Keith Pettit" <[EMAIL PROTECTED]>
To: "David Adams" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, December 10, 2002 4:13 PM
Subject: Re: [htdig] Won't index word doc's


> Yes, catdoc is installed and works great.  I've put in the correct path
> for catdoc in all the parsers, and like I said, htparsedoc and
> parse_doc.pl work just fine.  I can point them at a Word doc and it
> works, but doc2html doesn't.
>
> That's why it's so confusing.  I've followed the docs, catdoc is
> installed and works, and 2 of the 3 parsers work from the command line.
> I've put what I've seen in the examples into my config.  Beyond that I
> don't know why it's not working.
>
> Keith
>
>
> On Tue, 2002-12-10 at 02:22, David Adams wrote:
> > Forgive me if I'm misjudging the case, but it sounds as though you have
> > NOT read the instructions.  If you had, you would know that doc2html does
> > not require the commercial converter wp2html.  It can be used with the
> > freeware catdoc, though wp2html should give better results.
> >
> > You should also know that parse_doc.pl requires catdoc and does not work
> > without it.  Have you installed catdoc?
> >
> > 'Fraid I'm not familiar with htparsedoc, but the same probably applies.
> >
> > --
> > David Adams
> > Information Systems Services
> > Southampton University
> >
> >
> > ----- Original Message -----
> > From: "Keith Pettit" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Monday, December 09, 2002 10:38 PM
> > Subject: [htdig] Won't index word doc's
> >
> >
> > > I am so boggled by this.  I've followed all the instructions, tried
> > > different versions of htdig and different parsers, and nothing seems
> > > to work, and I can't tell where the failure is.
> > >
> > > Basically I'm running htdig on an index page I created.  All this page
> > > has is a bunch of links to Word documents, but it doesn't search
> > > through any of the docs.
> > >
> > > This is what I get when I run it:
> > > htdig: Run complete
> > > htdig: 1 server seen:
> > > htdig:     www.drgutah.com:80 1 document
> > >
> > > I've tried using htparsedoc, parse_doc.pl, and doc2html.  htparsedoc
> > > and parse_doc.pl work if I just execute them by themselves and point
> > > them at a Word file, but I can't get doc2html to work, and I assume
> > > it's because I won't buy the commercial converter.  So I'm assuming
> > > there is some sort of issue in my config.  I've got it pointing to the
> > > right places; it just seems like it's ignoring the .doc files.  Maybe
> > > there is some way I can force it to go through them.
> > >
> > > Thanks for any help..
> > >
> > > Thanks.
> > >
> > > Keith
> > > [EMAIL PROTECTED]
> > >
> > > external_parsers: application/msword /opt/www/htdig/bin/htparsedoc \
> > >                   application/postscript /opt/www/htdig/bin/htparsedoc \
> > >                   application/pdf /opt/www/htdig/bin/htparsedoc
> > >
> > > database_dir: /opt/www/htdig/db
> > > start_url: http://myurl.com
> > > limit_urls_to: ${start_url}
> > > exclude_urls: /cgi-bin/ .cgi
> > > maintainer: [EMAIL PROTECTED]
> > > max_head_length: 10000
> > > max_doc_size: 2000000
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > htdig-general mailing list <[EMAIL PROTECTED]>
> > > To unsubscribe, send a message to
> > > <[EMAIL PROTECTED]> with a subject of unsubscribe
> > > FAQ: http://htdig.sourceforge.net/FAQ.html
> > >
> >
>
>
