- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: John
Subject: pdf parsing not working???
I cannot seem to get the indexing of PDF files to work. As a test, I created a simple directory on localhost called test and placed in it a simple HTML file with a link to a PDF that is in the same directory. The indexer indexes the HTML file but not the linked PDF. I can't see what I am missing. I have included snippets of indexer.conf below, as well as the pdftotext version ...

Thanks,
John

linux:/srv/www/htdocs/test # ll
total 2114
drwxr-xr-x 2 root root     112 Dec 16 05:07 .
drwxr-xr-x 6 root root     376 Dec 16 04:44 ..
-rwxr-xr-x 1 root root 2155943 Dec 16 04:44 admin.pdf
-rw-r--r-- 1 root root     142 Dec 16 04:45 test4.html

linux:/usr/local/dpsearch/etc # /usr/local/bin/pdftotext -v
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC

***indexer.config***

DBAddr mysql://ctxroot:[EMAIL PROTECTED]/ctxdb/?dbmode=cache&cached=localhost:7000
...
###########################################################################
#DoStore yes/no
# Whether to store compressed document copies, if no stored specified.
# Default value is no
DoStore yes
..
AddType text/rtf *.rtf
AddType application/pdf *.pdf
AddType application/msword *.doc
...
Disallow *.rtf *.cdf *.ps    (I removed the *.pdf)
...
#Mime text/x-postscript text/plain "ps2ascii"
Mime application/pdf text/plain "/usr/local/bin/pdftotext -layout -htmlmeta $1 -"
#Mime application/vnd.ms-excel text/plain "xls2csv $1"
...
#
# To specify the only one page:
Server path http://localhost/test/
#

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
