First of all, take the latest version of catdoc. Something
like 0.90 or so.
Second, there is another script around; see:
http://www.st.hhs.nl/htdig/parse_word_doc.pl
Third, there is mswordview, which translates Word 97 files
into HTML, but I don't know if anyone uses that option.
Fourth, catdoc sometimes fails dramatically when a non-Word
file has a name ending in .doc and gets handed to it. That
crashed htdig at my place...
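
One way to guard against that fourth problem might be to check the
first bytes of the file before handing it to catdoc. Below is a minimal
sketch in Python, assuming the documents are OLE2 compound files (as
Word 97 files are); older or unusual Word formats may not carry this
signature. The function name is made up for illustration and is not
part of htparsedoc or catdoc.

```python
# Sketch only: detect files that merely *look* like Word documents by name.
# Assumption: real Word 97 documents start with the OLE2 compound file
# signature; anything else is treated as "not Word" and should be skipped.

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # 8-byte OLE2 signature


def looks_like_ole2(path):
    """Return True if the file at `path` begins with the OLE2 signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

A wrapper around the parser could call looks_like_ole2() on the input
file and exit without output when it returns False, so that catdoc
never sees a text or HTML file that just happens to be named .doc.
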
U.O. Telematica Municipale - Comune di Prato wrote:
>
> Hi people! I tried to use the external parser htparsedoc from the contrib
> dir: I compiled catdoc.c and all went OK. But when I try to run htdig,
> it dumps core. Is there another external parser available for MS Word
> documents? If not, can you tell me how to configure it?
>
> This is what I've done with my htdig configuration.
>
> I added this line to htdig.conf:
>
> external_parsers: application/msword /usr1/htdig/bin/htparsedoc
>
> When htdig finds a document with that MIME type, it launches htparsedoc.
> But at the end of the indexing process I found a core file in the bin directory.
>
> Ah, I run htdig on Linux Slackware 2.0.35 (Pentium Celeron 266 MHz, 64 MB RAM).
>
> Thanks a lot
> Ciao
> Gabriele
>
> ----------------------------------------------------------
>
> U.O. Rete Civica - Comune di Prato
> Via Ricasoli, 4 - 59100 Prato PO Italia
> Tel. +39 0574616342 Fax +39 0574616003
>
> http://www.comune.prato.it
> E-Mail: [EMAIL PROTECTED]
>
> ----------------------------------------------------------
> ------------------------------------
> To unsubscribe from the htdig3-dev mailing list, send a message to
> [EMAIL PROTECTED] containing the single word "unsubscribe" in
> the SUBJECT of the message.