The current description of external parsers is wrong: they do
*not* take input on stdin (see htdig3/htdig/ExternalParser.cc).
Here's an update. I also changed one of the examples to show
how parameters can be passed.
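
For illustration, here is a rough sketch of how a parser might pick up its arguments. It assumes that the four parameters from the patch below (infile, content-type, URL, configuration-file) arrive as the last four arguments, after anything given in the quoted command string. That ordering is an assumption on my part, and `split_parser_args` and the `-w` option are made-up names.

```python
def split_parser_args(argv):
    """Split argv (program name already removed) into extra options
    and the four parameters supplied by htdig.

    Assumption: htdig appends infile, content-type, URL and
    configuration-file after any parameters from the command string.
    """
    if len(argv) < 4:
        raise ValueError("expected at least four parameters")
    extra = argv[:-4]
    infile, content_type, url, config_file = argv[-4:]
    return extra, infile, content_type, url, config_file


# Example values taken from the parameter table in the patch.
example = ["-w", "/var/tmp/htdext.14242", "text/html",
           "http://www.htdig.org/attrs.html", "/etc/htdig/htdig.conf"]
extra, infile, content_type, url, config_file = split_parser_args(example)
print(extra, infile, content_type, url, config_file)
```

In a real parser the list would come from `sys.argv[1:]` instead of the hard-coded example.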
It should also be noted that the "u" field should specify a
complete, non-relative URL. Maybe this is a bug, since the "i"
field can be relative. The safe way to go here IMHO is to
update the documentation, *then* perhaps fix the code; here we go.
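
A parser can stay on the safe side by resolving every link against the URL parameter before writing it out. A minimal sketch, assuming a "u" record consists of the URL followed by a description field (check attrs.html for the exact layout); `url_record` and `other.html` are hypothetical names used only for this example.

```python
from urllib.parse import urljoin


def url_record(base_url, href, description):
    """Build one tab-separated 'u' record with an absolute URL.

    base_url is the URL parameter passed to the parser; href may be
    relative or already absolute. The field layout is an assumption.
    """
    absolute = urljoin(base_url, href)
    return "\t".join(["u", absolute, description]) + "\n"


print(url_record("http://www.htdig.org/attrs.html", "other.html", "Other page"), end="")
```

An href that is already absolute passes through `urljoin` unchanged, so the same call works for both cases.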
No empty fields are allowed. Think strtok ("\t\t","\t") or try
it yourself; you'll get an "external parser error".
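
To see why an empty field breaks things, here is a small imitation of how C's strtok() tokenizes a record. It is only a model of strtok's behavior, not the code in ExternalParser.cc, and `strtok_fields` is a made-up name.

```python
def strtok_fields(record, delim="\t"):
    """Imitate strtok(): a run of delimiters counts as one separator,
    so empty fields are skipped instead of being returned."""
    return [field for field in record.split(delim) if field]


print(strtok_fields("\t\t"))                # no tokens at all
print(strtok_fields("u\t\tsome text"))      # two tokens, not three
print(strtok_fields("u\thttp://www.htdig.org/\tsome text"))
```

With the empty middle field, the text that was meant to be the third field ends up in the second position, or a later field goes missing altogether, which is presumably what triggers the error.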
There's also a random typo fix for "second string [of] each
pair" on the first line.
htdoc/ChangeLog:
Thu Jan 5 00:47:22 1998 Hans-Peter Nilsson <[EMAIL PROTECTED]>
* attrs.html: Correct and add more verbose description of external
parser program parameters and fields.
Index: attrs.html
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdoc/attrs.html,v
retrieving revision 1.9
diff -p -c -r1.9 attrs.html
*** attrs.html 1998/12/13 05:44:54 1.9
--- attrs.html 1999/01/05 00:25:01
***************
*** 1208,1220 ****
The external parsers are specified as pairs of
strings. The first string of each pair is the
content-type that the parser can handle while the
! second string each pair is the path to the external
! parsing program. The parsing program will get the
! document to be parsed on its standard input and it is
! to write information for htdig on its standard
! output.<br>
The output consists of records, each record terminated
! with a newline. Each record is a series of tab
separated fields. The first field is a single character
that specifies the record type. The rest of the fields
are determined by the record type.
--- 1208,1281 ----
The external parsers are specified as pairs of
strings. The first string of each pair is the
content-type that the parser can handle while the
! second string of each pair is the path to the external
! parsing program. If quoted, it may contain parameters,
! separated by spaces.<p>
! The parser program takes four command-line
! parameters, not counting any parameters already
! given in the command string:<br>
! <em>infile content-type URL configuration-file</em><br>
! <table border="1">
! <tr>
! <th>
! Parameter
! </th>
! <th>
! Description
! </th>
! <th>
! Example
! </th>
! </tr>
! <tr>
! <td valign="top">
! infile
! </td>
! <td>
! A temporary file with the contents to be parsed.
! </td>
! <td>
! /var/tmp/htdext.14242
! </td>
! </tr>
! <tr>
! <td valign="top">
! content-type
! </td>
! <td>
! The MIME-type of the contents.
! </td>
! <td>
! text/html
! </td>
! </tr>
! <tr>
! <td valign="top">
! URL
! </td>
! <td>
! The URL of the contents.
! </td>
! <td>
! http://www.htdig.org/attrs.html
! </td>
! </tr>
! <tr>
! <td valign="top">
! configuration-file
! </td>
! <td>
! The configuration-file in effect.
! </td>
! <td>
! /etc/htdig/htdig.conf
! </td>
! </tr>
! </table><p>
! The external parser is to write information for
! htdig on its standard output.<br>
The output consists of records, each record terminated
! with a newline. Each record is a series of non-empty tab
separated fields. The first field is a single character
that specifies the record type. The rest of the fields
are determined by the record type.
***************
*** 1340,1346 ****
</td>
<td>
A hyperlink to another document that is
! referenced by the current document.
</td>
</tr>
<tr>
--- 1401,1409 ----
</td>
<td>
A hyperlink to another document that is
! referenced by the current document. It must be
! complete and non-relative, using the URL parameter to
! resolve any relative references found in the document.
</td>
</tr>
<tr>
***************
*** 1409,1415 ****
</dt>
<dd>
external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word /usr/local/bin/mswordparser
</dd>
</dl>
</dd>
--- 1472,1478 ----
</dt>
<dd>
external_parsers: text/html /usr/local/bin/htmlparser
! application/ms-word "/usr/local/bin/mswordparser -w"
</dd>
</dl>
</dd>
brgds, H-P