<SELECT NAME="search_algorithm">
contrib/examples
directory.
The default value of this attribute is determined at compile time.
build_select_lists
The default value of this attribute is determined at compile time.
create_image_list
sort -u to get a unique list.
The default value of this attribute is determined at compile time.
date_factor ...
The parser program takes four command-line
parameters, not counting any parameters already
given in the command string:
infile content-type URL configuration-file
| Parameter | Description | Example |
|---|---|---|
| infile | A temporary file with the contents to be parsed. | /var/tmp/htdext.14242 |
| content-type | The MIME-type of the contents. | text/html |
| URL | The URL of the contents. | http://www.htdig.org/attrs.html |
| configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
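As a rough illustration of the calling convention above, the following Python sketch picks the four parameters out of the argument list. It assumes the four parameters are appended after any arguments that were already part of the command string; the names `ParserArgs` and `parse_parser_args` are hypothetical and not part of htdig.

```python
from typing import NamedTuple, Sequence


class ParserArgs(NamedTuple):
    """The four parameters described in the table above."""
    infile: str
    content_type: str
    url: str
    config_file: str


def parse_parser_args(argv: Sequence[str]) -> ParserArgs:
    """Return the four trailing parameters from an argv-style list."""
    extra = list(argv[1:])  # drop the program name
    if len(extra) < 4:
        raise ValueError(
            "expected: infile content-type URL configuration-file"
        )
    # Assumption: the four parameters come last, after any arguments
    # that were already given in the command string.
    return ParserArgs(*extra[-4:])
```

A parser script would typically call `parse_parser_args(sys.argv)` once at startup and then read the file named by `infile`.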
The external parser must write information for
htdig to its standard output. Unless it is an
external converter, which outputs a document
of a different content-type, its output must
follow the format described here.
The output consists of records, each terminated
with a newline. Each record is a series of tab-separated
fields, which must be non-empty unless expressly
allowed to be empty. The first field is a single character
that specifies the record type. The remaining fields
are determined by the record type.
| Record type | Fields | Description |
|---|---|---|
| w | word | A word that was found in the document. |
| | location | A number indicating the normalized location of the word within the document. The number must fall in the range 0-1000, where 0 means the top of the document. |
| | heading level | A heading level that is used to compute the weight of the word depending on its context in the document itself. The level is in the range 0-10. |
| u | document URL | A hyperlink to another document that is referenced by the current document. It must be complete and non-relative, using the URL parameter to resolve any relative references found in the document. |
| | hyperlink description | For HTML documents, this would be the text between the <a href...> and </a> tags. |
| t | title | The title of the document. |
| h | head | The top of the document itself. This is used to build the excerpt. It should contain only normal ASCII text. |
| a | anchor | The label that identifies an anchor that can be used as a target in a URL. This really only makes sense for HTML documents. |
| i | image URL | A URL that points to an image that is part of the document. |
| m | http-equiv | The HTTP-EQUIV attribute of a META tag. May be empty. |
| | name | The NAME attribute of this META tag. May be empty. |
| | contents | The CONTENT attribute of this META tag. May be empty. |
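The record format above can be sketched in Python as follows. This is a minimal illustration for plain text, not htdig's own code: the helper names `record` and `text_to_records`, the 200-character excerpt length, and the use of character offsets to derive the 0-1000 location are all assumptions made for the example.

```python
import re
from typing import Iterator


def record(kind: str, *fields: object) -> str:
    """Join a record type and its fields with tabs and end with a newline."""
    return "\t".join([kind, *(str(f) for f in fields)]) + "\n"


def text_to_records(title: str, text: str, heading: int = 0) -> Iterator[str]:
    """Yield t, h, and w records for a plain-text document."""
    yield record("t", title)
    # Collapse whitespace so the excerpt stays on a single line.
    # The 200-character cut-off is an arbitrary choice for this sketch.
    yield record("h", " ".join(text.split())[:200])
    length = max(len(text), 1)
    for match in re.finditer(r"\w+", text):
        # Scale the character offset into the 0-1000 range.
        location = match.start() * 1000 // length
        yield record("w", match.group(), location, heading)
```

Writing the records is then a matter of `sys.stdout.writelines(text_to_records(title, text))`. The sketch does not escape tabs or newlines inside field values, which a real parser would need to handle.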
| Parameter | Description | Example |
|---|---|---|
| protocol | The URL scheme to be used. | https |
| URL | The URL to be retrieved. | https://www.htdig.org:8008/attrs.html |
| configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
The external protocol script must write information for htdig to its standard output, in the format described here. The output consists of a header, followed by a blank line, followed by the contents of the document.
Each record in the header is terminated with a newline. Each record is a series of tab-separated fields, which must be non-empty unless expressly allowed to be empty. The first field is a single character that specifies the record type. The remaining fields are determined by the record type.
| Record type | Fields | Description |
|---|---|---|
| s | status code | An HTTP-style status code, e.g. 200, 404. |
| r | reason | A text string describing the status code, e.g. "Redirect" or "Not Found". |
| m | modification time | The modification time of this document. While the code is fairly flexible about the time/date formats it accepts, it is recommended to use something standard, like RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT, or ISO 8601: 1994-11-06 08:49:37 GMT. |
| t | content-type | A valid MIME type for the document, like text/html or text/plain. |
| l | content-length | The length of the document on the server, which may not necessarily be the length of the buffer returned. |
| u | url | The URL of the document, or in the case of a redirect, the URL that should be indexed as a result of the redirect. |
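To make the header-plus-body layout concrete, here is a small Python sketch that assembles such a response. The function name `build_response` and the order of the header records are assumptions for illustration; the text above does not say whether record order matters. The modification time is formatted with the standard library's `email.utils.formatdate`, which produces an RFC 1123 style date when `usegmt=True`.

```python
from email.utils import formatdate
from typing import Optional


def build_response(status: int, reason: str, content_type: str, body: str,
                   url: str, modified: Optional[float] = None) -> str:
    """Return header records, a blank line, then the document contents."""
    header = [
        f"s\t{status}",
        f"r\t{reason}",
        f"t\t{content_type}",
        # Length in bytes, assuming the body is sent as UTF-8.
        f"l\t{len(body.encode('utf-8'))}",
        f"u\t{url}",
    ]
    if modified is not None:
        # modified is a POSIX timestamp; output looks like
        # "Sun, 06 Nov 1994 08:49:37 GMT".
        header.append("m\t" + formatdate(modified, usegmt=True))
    # Each header record ends with a newline; an empty line separates
    # the header from the contents.
    return "\n".join(header) + "\n\n" + body
```

A script would print the result of, for example, `build_response(200, "OK", "text/html", page, url)` to standard output, where `page` holds the retrieved document.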
sort -u on the file to
eliminate duplicates from the file.
The default value of this attribute is determined at compile time.
include
<META name="somename" content="somevalue">
http://.
HTML text to display when no matches were found.
The file should contain a complete HTML
document.
contrib/scriptname
directory for a small example. Note that this
attribute also affects the value of the CGI variable
used in htsearch templates.
... Andrew's the digger will see this as
Andrews.