On Thu, 6 Jun 2002, Bley, Josef wrote:

> Hi,
>
> we work with Htdig 3.1.5 running on a Solaris 8 workstation.
> We want to integrate an external parser for indexing TIFF raster files with
> the OCR software OCRshop from Vividata.
> We have made the following entry in htdig.conf:
> external_parsers: image/tiff->text/plain /ya/cadim/htdig/bin/tif2text
>
> tif2text is a shell script which starts the OCR software. The output of the
> OCR software is a plain text file with the extension .txt in the htdig
> temporary directory.
>
> The following happens:
> When rundig starts indexing a TIFF file, it starts the OCR software and
> creates the text file, but after this, htdig tries to index only the
> TIFF file and not the text file.
> The result is that no words are in the htdig database.
> Our goal is that htdig should index the txt file instead of the TIFF file.
> How can we achieve this?
Did you read http://www.htdig.org/attrs.html#external_parsers ?

There are two ways: converting and parsing. Converting turns a content type
into one that htdig can parse itself; parsing means your parser outputs the
already-parsed file in the special format described in the documentation of
that attribute.

--jesse
--------------------------------------------------------------------
J. op den Brouw              Johanna Westerdijkplein 75
Haagse Hogeschool            2521 EN DEN HAAG
Faculty of Engineering       Netherlands
Electrical Engineering       +31 70 4458936
-------------------- [EMAIL PROTECTED] --------------------
Linux - because reboots are for hardware changes

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
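A minimal sketch of the converting approach applied to the tif2text script from the question. It assumes, as the external_parsers documentation describes for the `image/tiff->text/plain` form, that htdig calls the converter as `tif2text infile content-type URL config-file` and indexes whatever the converter prints on standard output. Under that assumption, writing the OCR result to a .txt file is not enough; the script has to print it. `OCR_CMD` is a hypothetical placeholder: the real OCRshop command line is not given in this thread and must be filled in.

```shell
#!/bin/sh
# tif2text - sketch of an htdig converter for image/tiff->text/plain.
# htdig is assumed to run it as:  tif2text infile content-type URL config-file
# and to index whatever it prints on standard output as text/plain.

# Hypothetical hook: a command that OCRs the TIFF "$1" into the text file "$2".
# Replace "false" with the real OCRshop invocation (its options are not
# known here). It is expanded unquoted below on purpose so it may carry options.
OCR_CMD=${OCR_CMD:-false}

tif2text() {
    infile=$1
    # Per-process scratch file ($$ = PID) so parallel htdig runs do not clash.
    txtfile="${TMPDIR:-/tmp}/tif2text.$$.txt"
    if $OCR_CMD "$infile" "$txtfile" && [ -f "$txtfile" ]; then
        # The step that matters: hand the OCR text to htdig via stdout.
        cat "$txtfile"
        status=0
    else
        echo "tif2text: OCR produced no text for $infile" >&2
        status=1
    fi
    rm -f "$txtfile"
    return $status
}

if [ $# -ge 1 ]; then
    tif2text "$@"
    exit $?
fi
echo "usage: tif2text infile [content-type URL config-file]" >&2
```

The temporary file is removed after it has been printed, so nothing accumulates in the htdig temporary directory; only the text on stdout is passed on to htdig.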

