Hi everybody,

I'm trying to run ht://Dig and have it parse PDF and RTF files.
Yes, trying...

I've read the whole FAQ, but I can't get past these two (related?) problems:
1) If I run rundig as root, I get this while parsing a PDF file:

PDF::parse(http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf)
PDF::parse: error running pdf_parser on
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf
 size = 78585

But if I run it as a normal user, I get a different error:


PDF::setContents(78585 bytes)
PDF::parse(http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf)
PDF::parse: cannot open
            ^^^^^^^^^^^
//usr/share/webapps/htdig/3.1.6-r7/hostroot/htdig/db/htdig14605.pdf
 size = 78585

WHAT? My htdig.conf says: database_dir:           /tmp/db
/usr/share/webapps/htdig/3.1.6-r7/hostroot/htdig/db was the *original*
configured database_dir, which I commented out.

If I run:

./doc2html.pl \
    /var/www/segramm/htdocs/common/docs/contratti/piano_utilizzo.pdf \
    "application/pdf" \
    http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf

the file is converted to HTML without problems.

So, can anybody tell me how to solve this?


2) I've installed rtf2html and set the full path to the executable in
doc2html.pl. In htdig.conf I've added:
external_parser:        application/pdf->text/html /usr/local/script/doc2html.pl \
                        application/rtf->text/html /usr/local/script/doc2html.pl \
                        text/rtf->text/html /usr/local/script/doc2html.pl \

But when I run rundig I get:
101:137:2:http://segramm.dsi.unimi.it/common/docs/contratti/piano_utilizzo.rtf:

Retrieval command for http://segramm.dsi.unimi.it/common/docs/contratti/piano_utilizzo.rtf: GET /common/docs/contratti/piano_utilizzo.rtf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Referer: http://segramm.dsi.unimi.it/index.php/sid=3;go=contratti
Host: segramm.dsi.unimi.it

Header line: HTTP/1.1 200 OK
Header line: Date: Wed, 29 Nov 2006 13:53:31 GMT
Header line: Server: Apache
Header line: Last-Modified: Mon, 20 Nov 2006 16:25:16 GMT
Converted Mon, 20 Nov 2006 16:25:16 GMT to Mon, 20 Nov 2006 16:25:16
Header line: ETag: "3d3e4-363d-29b29300"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 13885
Header line: Connection: close
Header line: Content-Type: text/rtf
Header line:
returnStatus = 0
Read 8192 from document
Read 5693 from document
Read a total of 13885 bytes
"text/rtf" not a recognized type.  Assuming text
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why?



Could anybody help me?

Cheers,
Arianna

