Hi again. Here is a patch to add documentation for yesterday's patch
for external converter support. Again, it applies to 3.1.3.
--- htdig-3.1.3/htdoc/attrs.html.noconv Wed Sep 22 11:18:41 1999
+++ htdig-3.1.3/htdoc/attrs.html Wed Oct 20 11:37:52 1999
@@ -1625,9 +1625,29 @@
content-type that the parser can handle while the
second string of each pair is the path to the external
parsing program. If quoted, it may contain parameters,
- separated by spaces.<p>
+ separated by spaces.<br>
+ External parsing can also be done with external
+ converters, which convert one content-type to
+ another. To do this, instead of just specifying
+ a single content-type as the first string
+ of a pair, you specify two types, in the form
+ <em>type1</em><strong>-></strong><em>type2</em>,
+ as a single string with no spaces. The second
+ string will define an external converter
+ rather than an external parser, to convert
+ the first type to the second. If the second
+ type is <strong>user-defined</strong>, then
+ it's up to the converter script to put out a
+ "Content-Type: <em>type</em>" header followed
+ by a blank line, to indicate to htdig what type it
+ should expect for the output, much like what a CGI
+ script would do. The resulting content-type must
+ be one that htdig can parse, either internally,
+ or with another external parser or converter.<br>
+ Only one external parser or converter can be
+ specified for any given content-type.<p>
The parser program takes four command-line
- parameters, not counting parameters and parameters
+ parameters, not counting any parameters already
given in the command string:<br>
<em>infile content-type URL configuration-file</em><br>
<table border="1">
@@ -1688,7 +1708,10 @@
</tr>
</table><p>
The external parser is to write information for
- htdig on its standard output.<br>
+ htdig on its standard output. Unless it is an
+ external converter, which will output a document
+ of a different content-type, its output must
+ follow the format described here.<br>
The output consists of records, each record terminated
with a newline. Each record is a series of (unless
expressively allowed to be empty) non-empty tab-separated
@@ -1927,7 +1950,9 @@
</td>
<td nowrap>
text/html /usr/local/bin/htmlparser \<br>
- application/ms-word "/usr/local/bin/mswordparser -w"
+ application/pdf /usr/local/bin/parse_doc.pl \<br>
+ application/msword->text/plain "/usr/local/bin/mswordtotxt -w" \<br>
+ application/x-gunzip->user-defined /usr/local/bin/ungzipper
</td>
</tr>
</table>
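
For illustration only, here is a rough sketch of what a converter for the
user-defined case might look like. It is not part of the patch and is not
the actual /usr/local/bin/ungzipper; the function names and the rule for
guessing the resulting type from the URL are assumptions of mine. It also
assumes the four standard parameters (infile content-type URL
configuration-file) come last on the command line, after any parameters
given in the command string. The point it shows is the "Content-Type"
header followed by a blank line, then the converted document.

```python
#!/usr/bin/env python3
"""Hypothetical external converter sketch: gunzip a document for htdig."""
import gzip
import sys


def guess_type(url):
    """Guess the uncompressed content-type from the URL (illustrative rule only)."""
    name = url.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    if name.endswith((".html", ".htm")):
        return "text/html"
    return "text/plain"


def main(argv, out):
    """Write a Content-Type header, a blank line, then the uncompressed data."""
    # Assumption: the last four arguments are
    # infile content-type URL configuration-file
    infile, _content_type, url, _config = argv[-4:]
    with gzip.open(infile, "rb") as src:
        data = src.read()
    # Header plus blank line, much like a CGI script would emit.
    header = "Content-Type: %s\n\n" % guess_type(url)
    out.write(header.encode("ascii"))
    out.write(data)


# Only run as a converter when the four parameters are present.
if __name__ == "__main__" and len(sys.argv) >= 5:
    main(sys.argv, sys.stdout.buffer)
```

The type reported in the header must still be one that htdig can parse,
either internally or through another external parser or converter.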
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930