[DataparkSearch Forum] Re: pdf parsing not working???

DataparkSearchForum Tue, 03 Jan 2006 16:25:52 -0800

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: John
Subject: Re: pdf parsing not working???


I thought it was working....

If I index a page that contains a link to a PDF file, indexer seems to skip it.
But if I point indexer explicitly at the PDF, it fails as follows:

 ../sbin/indexer -aimv6 -u http://kb.company.net/servlet/KbServlet/download/6947-102-14232/Web_Interface_Guide.pdf
..
indexer[16291]: {01} Response.url: http://kb.company.net/servlet/KbServlet/download/6947-102-14232/Web_Interface_Guide.pdf
indexer[16291]: {01} Response.URL_ID: -464642772
indexer[16291]: {01} Response.Vary: Accept-Encoding
indexer[16291]: {01} Status: 200 OK
indexer[16291]: {01} Found external parser 'application/octet-stream' -> 'text/plain'
indexer[16291]: {01} Starting external parser: 'pdftotext -layout -htmlmeta /tmp/ind.1.16291.in -'
Error (0): PDF file is damaged - attempting to reconstruct xref table...
indexer[16291]: {01} Parser-Content-Type: text/plain
indexer[16291]: {01} Store by default


I know these PDF files are good because I can click on them in a browser and
they open fine.
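
A file that opens in a browser is not necessarily identical to the temporary copy that gets handed to pdftotext. Below is a minimal sketch of a sanity check, assuming a copy of the fetched file can be saved to disk by some other means. The function name inspect_pdf_bytes and the particular checks are illustrative only and are not part of DataparkSearch. It looks for the %PDF- header, the %%EOF trailer marker, and the gzip magic bytes. The gzip check is there only because the response carries "Vary: Accept-Encoding"; whether compression is actually involved here is a guess.

```python
import sys


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report a few cheap structural hints about a byte string.

    These are heuristics, not a full PDF validation.
    """
    head = data[:1024]   # the header is expected near the start of the file
    tail = data[-1024:]  # the %%EOF marker is expected near the end
    return {
        "has_pdf_header": b"%PDF-" in head,
        "has_eof_marker": b"%%EOF" in tail,
        # gzip streams start with the two magic bytes 0x1f 0x8b
        "looks_gzipped": data[:2] == b"\x1f\x8b",
        # an error or login page served in place of the file
        "looks_like_html": head.lstrip().lower().startswith(
            (b"<!doctype", b"<html")
        ),
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: pass the path of the saved copy as the only argument.
    with open(sys.argv[1], "rb") as fh:
        for key, value in inspect_pdf_bytes(fh.read()).items():
            print(f"{key}: {value}")
```

If has_pdf_header or has_eof_marker comes back False, or one of the other two flags is True, the saved copy is probably truncated, compressed, or not a PDF at all. That would be consistent with the "PDF file is damaged" message, although it would not by itself prove the cause.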

Two questions:
Do you have any idea why I get the "PDF file is damaged" error?
and
If indexer comes across a PDF link while indexing a page, will it parse the
PDF file?

thanks,
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1134762755
