Hello Dan,


AFAIK if you don't want to use an external parser, there is only one 
possibility left:

Use the internal parser command as described in

http://www.htdig.org/attrs.html#pdf_parser

For this to work you need to install Acrobat Reader from Adobe. Regardless
of which OS you use, and leaving aside the latest security flaws in Acrobat
Reader (which are hopefully fixed for your OS as well), the internal
pdf_parser command is very limited.

htmldoc and doc2html just call xpdf for the PDF-to-text conversion.
An additional disadvantage of using Acrobat Reader is that you
will index PostScript files and you _have_ to raise max_doc_size in
your htdig.conf above the size of your biggest PDF/PostScript file.
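
To pick a value for max_doc_size, you could scan your document tree for
the largest PDF/PostScript file. The following is only a rough sketch:
the function names and the headroom factor are my own assumptions (not
part of ht://Dig), and the PostScript that Acrobat Reader produces may
differ in size from the PDF on disk, so treat the result as a starting
point.

```python
import os


def largest_document_size(root, extensions=(".pdf", ".ps")):
    """Return the size in bytes of the largest matching file under root.

    Returns 0 if no file with one of the given extensions is found.
    """
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so "REPORT.PDF" is counted too.
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                largest = max(largest, os.path.getsize(path))
    return largest


def suggested_max_doc_size(root, headroom=2.0):
    """Return a value strictly above the largest file size.

    The headroom factor is a hypothetical safety margin, not a value
    recommended by the ht://Dig documentation.
    """
    return int(largest_document_size(root) * headroom) + 1


def config_line(root, headroom=2.0):
    """Format the suggestion as a line for htdig.conf."""
    return "max_doc_size: %d" % suggested_max_doc_size(root, headroom)
```

Calling config_line() on your document root (the directory is whatever
you let htdig crawl) returns a line you can paste into htdig.conf after
checking that the number looks sane.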

BUT:
Installing some X11 library stuff does not mean you have to run the
service :)). If you use some kind of Linux with rpm, try installing only
the libraries with --nodeps. On Solaris, depending on which version
(7/8/9) you use, you don't need to satisfy all dependencies. Ask me again
for a list of packages :).

Yours,

Martin

-- 

--------------------------------------------------------
 arago AG, Institut fuer komplexes Datenmanagement
 Am Niddatal 3, 60488 Frankfurt/Main, [EMAIL PROTECTED]
 Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------

                
On Mon, Jun 23, 2003 at 02:06:03PM -0500, Dan Muey wrote:
> Hello list, 
> 
> I'd like to parse and index pdf files but when I try to install xpdf it wants/needs 
> to install a bunch of x windows stuff which I don't want to do but even if I try to 
> it keeps failing.
> 
> So what I'd like to ask is this:
> 
> Has anyone successfully used something else beside xpdf, like htmldoc for instance, 
> to be able to index pdf files?
> 
> If so any pointers/documentation would be very helpful.
> 
> TIA
> 
> Dan
