According to [EMAIL PROTECTED]:
> I'm using htdig v3.1.6 on Mac OS X.  Indexing is fine for HTML
> documents, but I've configured EXTERNAL PARSING for PDF files and get
> the following error:
> ---------
>   URL: http://pdf.spiral.com/acfea/itineraries/BH011197-iti.pdf
> External parser error: unknown field in line <TITLE>Mt. Holyoke College Glee Club</TITLE>
> ---------
> htdig.conf has entry:
> external_parsers: application/pdf /usr/local/bin/doc2html.pl

You didn't follow the instructions for doc2html.pl very carefully.
doc2html is an external converter, not an external parser, so you
need to tell htdig what file type will be produced.  You should have...

external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
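
To make the converter/parser distinction concrete, here is a rough Python
sketch (not htdig's own code) that splits an external_parsers value into
content-type/command pairs and treats an entry as a converter only when it
carries a "->" target type. It assumes plain space-separated pairs and
ignores any quoting of commands with arguments; the function name
parse_external_parsers is made up for illustration.

```python
def parse_external_parsers(value: str) -> list:
    """Hypothetical helper: split an external_parsers value into entries."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected content-type/command pairs")
    entries = []
    # Tokens alternate: content-type (optionally "src->target"), then command.
    for mime, command in zip(tokens[0::2], tokens[1::2]):
        source, sep, target = mime.partition("->")
        entries.append({
            "source": source,
            # A target type is only present when "->" was given.
            "target": target if sep else None,
            "command": command,
            # With "->": converter output is re-parsed as the target type.
            # Without it: the program must emit preparsed records itself.
            "kind": "converter" if sep else "parser",
        })
    return entries
```

With the corrected line, parse_external_parsers reports a converter producing
text/html; with the original line it reports a parser with no target type,
which is why htdig then expected preparsed records from doc2html.pl.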

See http://www.htdig.org/FAQ.html#q4.9
and http://www.htdig.org/attrs.html#external_parsers
as well as the DETAILS file in contrib/doc2html.

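Without the "->" and target content type, htdig assumes the program is a
true external parser that outputs preparsed records in the format given in
the external_parsers specification.  doc2html.pl outputs HTML instead, so
htdig rejects HTML lines such as <TITLE>... as unknown fields.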
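
As a simplified stand-in for htdig's own check (not its actual code), the
sketch below treats the text before the first tab as the record type and
rejects anything outside a small set of single-letter types.  The letters in
RECORD_TYPES are assumed from the external_parsers documentation; check
attrs.html for the authoritative list.  The name check_record is
hypothetical.

```python
# Assumed record-type letters for preparsed external-parser output;
# illustrative only, see the external_parsers docs for the real list.
RECORD_TYPES = {"w", "u", "t", "h", "a", "i", "m"}


def check_record(line: str) -> str:
    """Return the record type of one preparsed line, or raise an error."""
    # Records are tab-separated; the first field names the record type.
    field = line.rstrip("\n").split("\t", 1)[0]
    if field not in RECORD_TYPES:
        # Mirrors the wording of the error in the original report.
        raise ValueError(f"unknown field in line {line.rstrip()}")
    return field
```

A title record such as "t<TAB>Mt. Holyoke College Glee Club" passes, while
the HTML line "<TITLE>Mt. Holyoke College Glee Club</TITLE>" has no
recognised type and raises the same kind of error htdig printed.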

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html