- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: John
Subject: Re: pdf parsing not working???
I thought it was working...

If I index a page that contains a link to a PDF file, indexer seems to skip it. But if I point indexer explicitly at the PDF, it fails as follows:

../sbin/indexer -aimv6 -u http://kb.company.net/servlet/KbServlet/download/6947-102-14232/Web_Interface_Guide.pdf
..
indexer[16291]: {01} Response.url: http://kb.company.net/servlet/KbServlet/download/6947-102-14232/Web_Interface_Guide.pdf
indexer[16291]: {01} Response.URL_ID: -464642772
indexer[16291]: {01} Response.Vary: Accept-Encoding
indexer[16291]: {01} Status: 200 OK
indexer[16291]: {01} Found external parser 'application/octet-stream' -> 'text/plain'
indexer[16291]: {01} Starting external parser: 'pdftotext -layout -htmlmeta /tmp/ind.1.16291.in -'
Error (0): PDF file is damaged - attempting to reconstruct xref table...
indexer[16291]: {01} Parser-Content-Type: text/plain
indexer[16291]: {01} Store by default

I know these PDF files are good because I can click on them in a browser and they work fine.

Two questions:

Do you have any idea why I get "PDF file is damaged"?

If indexer runs across a PDF link while indexing a page, will indexer parse the PDF file?

thanks,

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1134762755
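
One way to narrow down the "PDF file is damaged" message is to look at the bytes the server actually returns, independent of indexer. The log shows the parser being matched on 'application/octet-stream' and a "Vary: Accept-Encoding" header, so it may be worth checking whether the body really starts with a PDF header or is something else (for example a compressed stream or an HTML page). The Python sketch below is only a diagnostic aid under that assumption: the names classify_body and fetch are made up for illustration, and it does not reproduce what indexer itself sends or receives.

```python
import gzip
import urllib.request
import zlib

PDF_MAGIC = b"%PDF-"       # a PDF file starts with this header
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of a gzip stream


def classify_body(data):
    """Return a rough label for what the downloaded bytes look like."""
    if data.startswith(GZIP_MAGIC):
        # Body is gzip-compressed; look at what is inside it.
        try:
            inner = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            return "gzip (unreadable)"
        return "gzip containing " + classify_body(inner)
    if PDF_MAGIC in data[:1024]:
        # Readers commonly tolerate a little junk before the header.
        return "pdf"
    if data.lstrip()[:1] == b"<":
        return "html or xml"
    return "unknown"


def fetch(url, accept_encoding="identity"):
    """Download url; return (Content-Type, Content-Encoding, body bytes).

    accept_encoding is an assumption: pass "gzip" to see how the server
    answers a client that advertises compression.
    """
    req = urllib.request.Request(url, headers={"Accept-Encoding": accept_encoding})
    with urllib.request.urlopen(req) as resp:
        return (resp.headers.get("Content-Type", ""),
                resp.headers.get("Content-Encoding", ""),
                resp.read())
```

To use it, call fetch() with the PDF URL from the command above, print the two headers, and pass the body to classify_body(). If the result is anything other than "pdf", the file handed to pdftotext is probably not the same bytes the browser ends up displaying.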
