- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: John
Subject: pdf parsing not working???

I cannot seem to get the indexing of PDF files to work.  As a test I created a
simple directory on localhost called test and placed in it a simple HTML file
with a link to a PDF that is in the same directory.  Indexer indexes the HTML
file but not the link to the PDF.
I can't see what I am missing.
I have included snippets of indexer.conf below, as well as the pdftotext version
...

Thanks, John

linux:/srv/www/htdocs/test # ll
total 2114
drwxr-xr-x    2 root     root          112 Dec 16 05:07 .
drwxr-xr-x    6 root     root          376 Dec 16 04:44 ..
-rwxr-xr-x    1 root     root      2155943 Dec 16 04:44 admin.pdf
-rw-r--r--    1 root     root          142 Dec 16 04:45 test4.html


linux:/usr/local/dpsearch/etc # /usr/local/bin/pdftotext -v
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC

***indexer.conf***

DBAddr  mysql://ctxroot:[EMAIL PROTECTED]/ctxdb/?dbmode=cache&cached=localhost:7000
...
###########################################################################
#DoStore yes/no
# Whether to store compressed document copies, if no stored specified.
# Default value is no
DoStore yes
..
AddType text/rtf                        *.rtf
AddType application/pdf                 *.pdf
AddType application/msword              *.doc
...
Disallow *.rtf  *.cdf  *.ps (I removed the *.pdf)
...
#Mime text/x-postscript        text/plain                    "ps2ascii"
Mime application/pdf          text/plain                    "/usr/local/bin/pdftotext -layout -htmlmeta $1 -"
#Mime application/vnd.ms-excel text/plain                    "xls2csv $1"
...
#
# To specify the only one page:
Server path http://localhost/test/
#

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
