The current description of external parsers is wrong: they do
*not* take their input on stdin.  The document is passed in a
temporary file named on the command line; see
htdig3/htdig/ExternalParser.cc

Here's an update.  I also changed one of the examples to show
how parameters can be passed.
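
For anyone writing a parser, here is a rough sketch in Python of how the four parameters described in the patch below might be picked up.  It assumes that any parameters from the quoted command string (such as "-w") come before the four standard ones; I'd check that order against ExternalParser.cc before relying on it.  The function names split_args and read_document are made up for illustration.

```python
import sys


def split_args(argv):
    """Return (extra, infile, content_type, url, config) from an argv list.

    Assumption: parameters given in the quoted command string come first,
    followed by infile, content-type, URL and configuration-file.
    """
    if len(argv) < 5:
        raise ValueError(
            "expected: [options] infile content-type URL configuration-file")
    extra = argv[1:-4]
    infile, content_type, url, config = argv[-4:]
    return extra, infile, content_type, url, config


def read_document(infile):
    """Read the document from the temporary file; nothing arrives on stdin."""
    with open(infile, "rb") as handle:
        return handle.read()


# Only run when called with at least the four standard parameters.
if __name__ == "__main__" and len(sys.argv) >= 5:
    extra, infile, content_type, url, config = split_args(sys.argv)
    document = read_document(infile)
    # A real parser would now write tab-separated records to stdout.
```

Taking the last four arguments rather than fixed positions keeps the sketch working whether or not extra parameters are configured.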

Note also that the "u" field must specify a complete,
non-relative URL.  This may be a bug, since the "i" field can be
relative.  IMHO the safe way to go is to update the
documentation first, *then* perhaps fix the code; here we go.

No empty fields are allowed.  Think strtok ("\t\t","\t"), which
treats adjacent tabs as a single separator, or try it yourself;
you'll get an "external parser error".
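
To see why an empty field goes wrong, here is a small Python sketch that mimics strtok-style splitting on a single delimiter character.  It only emulates the C behaviour and is not htdig's code; the helper name strtok_fields and the sample record are hypothetical.

```python
def strtok_fields(record, delim="\t"):
    """Split like repeated C strtok() calls with one delimiter character.

    Runs of the delimiter count as a single separator, so empty fields
    disappear and the fields after them shift one position to the left.
    """
    return [field for field in record.split(delim) if field]


record = "u\t\tdescription"      # hypothetical record, empty second field
print(record.split("\t"))        # ['u', '', 'description']
print(strtok_fields(record))     # ['u', 'description']
print(strtok_fields("\t\t"))     # []
```

A reader that expects three fields gets only two from the strtok-style split, which is consistent with the error mentioned above.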

There's also a random typo fix for "second string [of] each
pair" in the first hunk.

htdoc/ChangeLog:
Thu Jan  5 00:47:22 1998  Hans-Peter Nilsson  <[EMAIL PROTECTED]>

        * attrs.html: Correct and add more verbose description of external
        parser program parameters and fields.

Index: attrs.html
===================================================================
RCS file: /opt/htdig/cvs/htdig3/htdoc/attrs.html,v
retrieving revision 1.9
diff -p -c -r1.9 attrs.html
*** attrs.html  1998/12/13 05:44:54     1.9
--- attrs.html  1999/01/05 00:25:01
***************
*** 1208,1220 ****
               The external parsers are specified as pairs of
              strings. The first string of each pair is the
              content-type that the parser can handle while the
!             second string each pair is the path to the external
!             parsing program. The parsing program will get the
!             document to be parsed on its standard input and it is
!             to write information for htdig on its standard
!             output.<br>
               The output consists of records, each record terminated
!             with a newline. Each record is a series of tab
              separated fields. The first field is a single character
              that specifies the record type. The rest of the fields
              are determined by the record type. 
--- 1208,1281 ----
               The external parsers are specified as pairs of
              strings. The first string of each pair is the
              content-type that the parser can handle while the
!             second string of each pair is the path to the external
!             parsing program. If quoted, it may contain parameters,
!             separated by spaces.<p>
!              The parser program takes four command-line
!             parameters, not counting any parameters
!             given in the command string:<br>
!             <em>infile content-type URL configuration-file</em><br>
!             <table border="1">
!               <tr>
!                 <th>
!                   Parameter
!                 </th>
!                 <th>
!                   Description
!                 </th>
!                 <th>
!                   Example
!                 </th>
!               </tr>
!               <tr>
!                 <td valign="top">
!                   infile
!                 </td>
!                 <td>
!                   A temporary file with the contents to be parsed.
!                 </td>
!                 <td>
!                   /var/tmp/htdext.14242
!                 </td>
!               </tr>
!               <tr>
!                 <td valign="top">
!                   content-type
!                 </td>
!                 <td>
!                   The MIME-type of the contents.
!                 </td>
!                 <td>
!                   text/html
!                 </td>
!               </tr>
!               <tr>
!                 <td valign="top">
!                   URL
!                 </td>
!                 <td>
!                   The URL of the contents.
!                 </td>
!                 <td>
!                   http://www.htdig.org/attrs.html
!                 </td>
!               </tr>
!               <tr>
!                 <td valign="top">
!                   configuration-file
!                 </td>
!                 <td>
!                   The configuration-file in effect.
!                 </td>
!                 <td>
!                   /etc/htdig/htdig.conf
!                 </td>
!               </tr>
!             </table><p>
!             The external parser is to write information for
!             htdig on its standard output.<br>
               The output consists of records, each record terminated
!             with a newline. Each record is a series of non-empty tab
              separated fields. The first field is a single character
              that specifies the record type. The rest of the fields
              are determined by the record type. 
***************
*** 1340,1346 ****
                  </td>
                  <td>
                    A hyperlink to another document that is
!                   referenced by the current document.
                  </td>
                </tr>
                <tr>
--- 1401,1409 ----
                  </td>
                  <td>
                    A hyperlink to another document that is
!                   referenced by the current document.  It must be
!                   complete and non-relative, using the URL parameter to
!                   resolve any relative references found in the document.
                  </td>
                </tr>
                <tr>
***************
*** 1409,1415 ****
            </dt>
            <dd>
              external_parsers: text/html /usr/local/bin/htmlparser
!             application/ms-word /usr/local/bin/mswordparser
            </dd>
          </dl>
        </dd>
--- 1472,1478 ----
            </dt>
            <dd>
              external_parsers: text/html /usr/local/bin/htmlparser
!             application/ms-word "/usr/local/bin/mswordparser -w"
            </dd>
          </dl>
        </dd>

brgds, H-P