I am having a problem indexing PDFs with htdig. I am using a very small set of PDFs (maximum size 70 KB).
When I run rundig -vvv, I get:
0/http://10.236.145.140/pdfs/
4/http://10.236.145.140/pdfs/?D=A
11/http://10.236.145.140/pdfs/?D=D
2/http://10.236.145.140/pdfs/?M=A
9/http://10.236.145.140/pdfs/?M=D
8/http://10.236.145.140/pdfs/?N=A
1/http://10.236.145.140/pdfs/?N=D
3/http://10.236.145.140/pdfs/?S=A
10/http://10.236.145.140/pdfs/?S=D
Deleted, no excerpt: 12/http://10.236.145.140/pdfs/dso.html
Deleted, no excerpt: 5/http://10.236.145.140/pdfs/insid_scoop_1.pdf
Deleted, no excerpt: 6/http://10.236.145.140/pdfs/inside_scoop_2.pdf
Deleted, no excerpt: 13/http://10.236.145.140/pdfs/mod/mod_so.html
7/http://10.236.145.140/pdfs/sourcereorg.html
htmerge: 10
---> At this point rundig hangs.
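To make it easier to see which documents were dropped, here is a minimal Python sketch that pulls the "Deleted, no excerpt" entries out of output like the above. It assumes the line format is exactly as shown in this run; the function name deleted_documents is only illustrative and is not part of htdig.

```python
import re

# Matches lines such as "Deleted, no excerpt: 5/http://host/file.pdf".
# The format is taken from the output above and may differ in other versions.
DELETED = re.compile(r"^Deleted, no excerpt:\s*(\d+)/(\S+)$")


def deleted_documents(log_text):
    """Return (doc_id, url) pairs for every 'Deleted, no excerpt' line."""
    found = []
    for line in log_text.splitlines():
        match = DELETED.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

With the rundig output saved to a file, calling deleted_documents on the file's contents would list both PDFs and the two HTML pages that were dropped.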
I am using Solaris 5.8 and htdig 3.1.6, with conv_doc.pl as the external converter. I have tested conv_doc.pl from the command line and it works fine.
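For reference, the command-line test can be repeated in the form htdig itself uses. As I understand the external_parsers documentation, the parser is called with four arguments: the input file, the content type, the URL, and the configuration file. The sketch below is a rough Python helper for that, with a timeout so that a converter that never returns shows up as an error instead of a hang. The helper names, the sample PDF path, and the htdig.conf location in the usage note are assumptions for illustration only.

```python
import subprocess


def parser_command(converter, doc_path, content_type, url, config_path):
    """Build the argument list in the order an external parser is called with."""
    return [converter, doc_path, content_type, url, config_path]


def converter_output(command, timeout=60):
    """Run the converter and return (exit_code, stdout_text).

    Raises subprocess.TimeoutExpired if the converter does not finish
    within `timeout` seconds.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode, result.stdout.decode("utf-8", errors="replace")
```

For example, parser_command("/opt/www/htdig/bin/conv_doc.pl", "/tmp/sample.pdf", "application/pdf", "http://10.236.145.140/pdfs/sample.pdf", "/opt/www/htdig/conf/htdig.conf") builds the command, and passing it to converter_output shows whether any HTML comes back for that file. The /tmp/sample.pdf and htdig.conf paths here are placeholders.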
Some attributes from htdig.conf:
start_url: http://10.236.145.140/pdfs/
limit_urls_to: ${start_url}
max_head_length: 20000000
max_doc_size: 20000000
external_parsers: application/pdf->text/html /opt/www/htdig/bin/conv_doc.pl
I am using the xpdf package as well.
Please help.
Thanks in advance.

