This was extracted with a simple Perl script.
It may be useful to some of you.
-------------------------------
htlib/Configuration.h:
This class provides an object lookup table. Each object
in the Configuration is indexed with a string. The objects
can be returned by mentioning their string index.
htlib/Connection.h:
This class forms an easy-to-use interface to the Berkeley
tcp socket library. All the calls are basically the same,
but the parameters do not have any stray _addr or _in
mixed in...
htlib/DB2_db.h:
Implements the btree database instance of a Database object.
htlib/DB2_hash.h:
Implements the hash database instance of a Database object.
htlib/Database.h:
Class which defines the interface to a generic,
simple database.
htlib/Dictionary.h:
This class provides an object lookup table.
Each object in the dictionary is indexed with a string.
The objects can be returned by mentioning their
string index.
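As a rough illustration of a string-indexed object table like the one described above, here is a minimal sketch. It is not the actual Dictionary interface: Object here is a stand-in for the library's base class, and SketchDictionary, IntValue, Add, Find and Count are names assumed for this example only.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the library's Object base class (assumption).
struct Object {
    virtual ~Object() = default;
};

// Hypothetical value type used only for this illustration.
struct IntValue : Object {
    explicit IntValue(int v) : value(v) {}
    int value;
};

// Minimal string-indexed lookup table that owns its objects.
class SketchDictionary {
public:
    // Store obj under key, replacing any previous entry.
    void Add(const std::string& key, std::unique_ptr<Object> obj) {
        table_[key] = std::move(obj);
    }

    // Return the object stored under key, or nullptr if there is none.
    Object* Find(const std::string& key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : it->second.get();
    }

    std::size_t Count() const { return table_.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Object>> table_;
};
```

Returning nullptr for a missing key is a choice made for the sketch; the real class may behave differently.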
htlib/HtCodec.h:
Provide a generic means to take a String, code
it, and return the encoded string. And vice versa.
htlib/HtDateTime.h:
Parse, split, compare and format dates and times.
htlib/HtHeap.h:
A Heap class which holds objects of type Object.
(A heap is a semi-ordered tree-like structure.
It ensures that the first item is *always* the largest.
NOTE: To use a heap, you must implement the Compare() function for
your Object classes. The assumption used here is -1 means
less-than, 0 means equal, and +1 means greater-than. Thus
this is a "min heap" for that definition.)
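To make the comparison convention in the NOTE concrete, here is a small sketch of a binary heap driven by a three-way compare that returns -1, 0 or +1. It follows the min-heap reading given in the NOTE, stores plain ints instead of Object pointers, and uses names (CompareInts, SketchHeap, Add, Remove, IsEmpty) that are assumptions for this example, not the HtHeap interface.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Three-way comparison in the convention from the NOTE:
// -1 means less-than, 0 means equal, +1 means greater-than.
inline int CompareInts(int a, int b) {
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

// Binary heap stored in a vector; the item that compares lowest is on top.
class SketchHeap {
public:
    void Add(int v) {
        data_.push_back(v);
        std::size_t i = data_.size() - 1;
        // Sift up while the new item compares less than its parent.
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (CompareInts(data_[i], data_[parent]) >= 0) break;
            std::swap(data_[i], data_[parent]);
            i = parent;
        }
    }

    // Remove and return the top item. Precondition: the heap is not empty.
    int Remove() {
        int top = data_.front();
        data_.front() = data_.back();
        data_.pop_back();
        std::size_t i = 0;
        // Sift down: swap with the smaller child until order is restored.
        for (;;) {
            std::size_t left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < data_.size() &&
                CompareInts(data_[left], data_[smallest]) < 0) smallest = left;
            if (right < data_.size() &&
                CompareInts(data_[right], data_[smallest]) < 0) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
        return top;
    }

    bool IsEmpty() const { return data_.empty(); }

private:
    std::vector<int> data_;
};
```

Negating the result of the compare function would turn the same code into a max heap.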
htlib/HtPack.h:
Compress and uncompress data in e.g. simple structures.
htlib/HtRegex.h:
A simple C++ wrapper class for the system regex routines.
htlib/HtSGMLCodec.h:
A specialized HtWordCodec class to convert between SGML
ISO 8859-1 entities and high-bit characters.
htlib/HtURLCodec.h:
Specialized HtWordCodec which just caters to the
needs of "url_part_aliases" and "common_url_parts".
Used for coding URLs when they are on disk; the key and the
href field in db.docdb.
htlib/HtVector.h:
A Vector class which holds objects of type Object.
(A vector is an array that can expand as necessary)
This class is very similar in interface to the List class
htlib/HtWordCodec.h:
Given two lists of paired "words", 'from' and 'to', that define
simple one-to-one translations, use those lists to translate.
The only restrictions are that no null (0) characters may be
used in "words", and that there is a "joiner" character that
does not appear in any word. One-to-one consistency may be
checked at construction.
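The idea of translating with paired 'from' and 'to' lists can be sketched as below. This is only an illustration under simple assumptions: the function name Translate is made up, the first matching 'from' word at each position wins, and the "joiner" character and the consistency check of the real class are left out. Decoding is the same call with the two lists swapped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of from[i] in input with to[i], scanning left
// to right. At each position the first matching 'from' word is used.
std::string Translate(const std::string& input,
                      const std::vector<std::string>& from,
                      const std::vector<std::string>& to) {
    std::string out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        for (std::size_t i = 0; i < from.size() && i < to.size(); ++i) {
            // Skip empty words: they would match everywhere.
            if (!from[i].empty() &&
                input.compare(pos, from[i].size(), from[i]) == 0) {
                out += to[i];
                pos += from[i].size();
                matched = true;
                break;
            }
        }
        if (!matched) out += input[pos++];  // no word here, copy one char
    }
    return out;
}
```

The round trip is only lossless if the 'to' words cannot occur in the original text, which is why short codes made of otherwise unused characters are a natural choice.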
htlib/HtWordType.h:
Wrap some attributes to make is...() type
functions and other common functions without having to manage
the attributes or the exact attribute combination semantics.
htlib/HtZlibCodec.h:
Provide a generic access to the zlib compression routines.
If zlib is not present, encode and decode are simply
assignment functions.
htlib/IntObject.h:
An int variable encapsulated in an Object-derived class.
htlib/List.h:
A List class which holds objects of type Object.
htlib/Object.h:
This base class defines how an object should behave.
This includes the ability to be put into a list.
htlib/ParsedString.h:
Contains a string. The string may contain $var, ${var}, $(var)
or `filename`. The get method will expand those using the
dictionary given as an argument.
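A minimal sketch of this kind of expansion follows. It covers only the $var, ${var} and $(var) forms (the `filename` form is left out), uses a std::map in place of the library's dictionary, and makes two assumptions of its own: unknown names expand to nothing, and a '$' that does not start a name is kept literally. The function name Expand is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Expand $var, ${var} and $(var) using the given table of values.
std::string Expand(const std::string& text,
                   const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '$' || i + 1 >= text.size()) {
            out += text[i++];
            continue;
        }
        std::size_t start = i + 1;
        char close = '\0';
        if (text[start] == '{') close = '}';
        else if (text[start] == '(') close = ')';

        std::string name;
        std::size_t end;
        if (close != '\0') {
            end = text.find(close, start + 1);
            if (end == std::string::npos) {  // unterminated: keep the '$'
                out += text[i++];
                continue;
            }
            name = text.substr(start + 1, end - start - 1);
            ++end;  // step past the closing delimiter
        } else {
            end = start;
            while (end < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[end])) ||
                    text[end] == '_')) {
                ++end;
            }
            name = text.substr(start, end - start);
            if (name.empty()) {  // '$' not followed by a name
                out += text[i++];
                continue;
            }
        }
        std::map<std::string, std::string>::const_iterator it = vars.find(name);
        if (it != vars.end()) out += it->second;  // unknown names vanish
        i = end;
    }
    return out;
}
```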
htlib/Queue.h:
This class implements a linked list of objects. It itself is also an
object
htlib/QuotedStringList.h:
Fed a string, it extracts the separator-delimited
words and stores them in a list. The words may be
delimited by " or ', hence the name.
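The splitting rule can be sketched as follows: separators end a word unless they appear inside a pair of " or ' characters. The function name SplitQuoted and the default separator set (space and tab) are assumptions for the example, and escape characters are not handled.

```cpp
#include <string>
#include <vector>

// Split text on separator characters, treating "..." and '...' as
// grouping so that separators inside quotes stay part of the word.
std::vector<std::string> SplitQuoted(const std::string& text,
                                     const std::string& separators = " \t") {
    std::vector<std::string> words;
    std::string word;
    char quote = '\0';     // active quote character, '\0' when outside quotes
    bool in_word = false;  // true once the current word has started
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (quote != '\0') {
            if (c == quote) quote = '\0';  // closing quote
            else word += c;
        } else if (c == '"' || c == '\'') {
            quote = c;                     // opening quote starts a word
            in_word = true;
        } else if (separators.find(c) != std::string::npos) {
            if (in_word) {
                words.push_back(word);
                word.clear();
                in_word = false;
            }
        } else {
            word += c;
            in_word = true;
        }
    }
    if (in_word) words.push_back(word);  // flush the last word
    return words;
}
```

In this sketch an unterminated quote simply runs to the end of the string, and an empty pair of quotes yields an empty word.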
htlib/Stack.h:
This class implements a linked list of objects. It itself is also an
object
htlib/StringList.h:
Specialized List containing String objects.
htlib/StringMatch.h:
This class provides an interface to a fairly specialized string
lookup facility. It is intended to be used as a replacement for any
regular expression matching when the pattern string is in the form:
htlib/URL.h:
A URL parsing class, implementing as closely as possible the standard
laid out in RFC2396 (e.g. http://www.faqs.org/rfcs/rfc2396.html)
including support for multiple schemes.
htlib/cgi.h:
Parse CGI arguments and put them in a dictionary.
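As an illustration of what parsing CGI arguments into a dictionary involves, here is a small sketch that splits a query string on '&' and '=', then decodes '+' and %XX escapes. It is not the code in cgi.h: HexValue, UrlDecode and ParseQuery are hypothetical names, a std::map stands in for the dictionary, and a repeated name simply keeps its last value.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode '+' to a space and %XX to the byte it encodes.
std::string UrlDecode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size() + 0 &&
                   HexValue(s[i + 1]) >= 0 && HexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(HexValue(s[i + 1]) * 16 +
                                     HexValue(s[i + 2]));
            i += 2;
        } else {
            out += s[i];  // malformed escapes are copied unchanged
        }
    }
    return out;
}

// Split "a=1&b=2" into name/value pairs; the last value wins on repeats.
std::map<std::string, std::string> ParseQuery(const std::string& query) {
    std::map<std::string, std::string> args;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            std::string name = UrlDecode(pair.substr(0, eq));
            std::string value = (eq == std::string::npos)
                                    ? std::string()
                                    : UrlDecode(pair.substr(eq + 1));
            args[name] = value;
        }
        start = end + 1;
    }
    return args;
}
```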
htlib/good_strtok.h:
The good_strtok() function is very similar to the
standard strtok() library function, except that good_strtok()
htlib/htString.h:
(implementation in String.cc) Just Another String class.
htlib/io.h:
Perform low level I/O. The Connection class is derived from io.
htlib/langinfo.h:
compatibility for strptime implementation on architectures
that do not contain this header.
htlib/lib.h:
Contains typical declarations and header inclusions used by
most sources in this directory.
htlib/regex.h:
Replacement for the regex functions on architectures that do
not have them.
htcommon/DocumentDB.h:
This class is the interface to the database of document
references. This database is only used while digging.
An extract of this database is used for searching.
This is because digging requires a different index
than searching.
htcommon/DocumentRef.h:
Reference to an indexed document. Keeps track of all
information stored on the document, either by the dig
or temporary search information.
htcommon/WordList.h:
Interface to the word database. Previously, this wrote to
a temporary text file. Now it writes directly to the
word database.
NOTE: Some code previously attempted to directly read from
the word db. This will no longer work, so it's preferred to
use the access methods here.
htcommon/WordRecord.h:
Record for storing word information in the word database.
Each word is stored as a separate key/record pair.
htcommon/WordReference.h:
Reference to a word. Stores everything we need for internal use.
Defined as a class to allow the comparison
method (for sorting).
htcommon/defaults.h:
Default configuration values for the ht programs
htdig/Document.h:
This class holds everything there is to know about a document.
The actual contents of the document may or may not be present at
all times for memory conservation reasons.
The document can be told to retrieve its contents. This is done
with the Retrieve call. In case the retrieval causes a
redirect, the link is followed, but this process is done
only once (to prevent loops.) If the redirect didn't
work, Document_not_found is returned.
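The follow-a-redirect-only-once rule can be sketched like this. Everything here is an assumption made for illustration, not the Document class itself: the FetchStatus values, the FetchResult struct, the Fetcher callback and the function name are made up, with NotFound standing in for Document_not_found.

```cpp
#include <functional>
#include <string>

// Hypothetical outcome of one fetch attempt.
enum class FetchStatus { Ok, Redirect, NotFound };

struct FetchResult {
    FetchStatus status;
    std::string location;  // redirect target, used only for Redirect
};

// Callback that fetches one URL; a stand-in for the real transport.
typedef std::function<FetchResult(const std::string&)> Fetcher;

// Fetch url and follow at most one redirect. A second redirect, or any
// failure after the redirect, is reported as NotFound.
FetchStatus RetrieveWithOneRedirect(const std::string& url,
                                    const Fetcher& fetch) {
    FetchResult first = fetch(url);
    if (first.status != FetchStatus::Redirect) return first.status;

    FetchResult second = fetch(first.location);
    return second.status == FetchStatus::Ok ? FetchStatus::Ok
                                            : FetchStatus::NotFound;
}
```

Because the second fetch is never followed further, two servers that redirect to each other cost at most two requests.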
htdig/ExternalParser.h:
Allows external programs to parse unknown document formats.
The parser is expected to return the document in a
specific format. The format is documented
in http://www.htdig.org/attrs.html#external_parser
htdig/HTML.h:
Class to parse HTML documents and return useful information
to the Retriever
htdig/HtHTTP.h:
Class for HTTP messaging (derived from Transport)
htdig/Images.h:
Issue an HTTP request to retrieve the size of an image from
the content-length field.
htdig/PDF.h:
This class parses PDF (Acrobat) files.
Parsing is done on the PostScript translation of the PDF file
produced by Acrobat Reader (acroread). It is freely available for
most platforms at www.adobe.com
htdig/Parsable.h:
Base class for file parsers (HTML, PDF, ExternalParser ...)
htdig/Plaintext.h:
Parses plaintext files. Not much to do, really.
htdig/Retriever.h:
Crawls from a list of URLs and calls the appropriate parsers. The
parser notifies the Retriever object that it got something
(got_* functions) and the Retriever object feeds the databases
and statistics accordingly.
htdig/Server.h:
A class to keep track of server specific information.
htdig/Transport.h:
A virtual transport interface class for accessing
remote documents. Used to grab URLs based on the
scheme (e.g. http://, ftp://...)
htdig/URLRef.h:
A definition of a URL/Referer pair with associated hopcount
htdig/htdig.h:
Indexes the web sites specified in the config file,
generating several databases to be used by htmerge.
htmerge/htmerge.h:
The interface to the htmerge program
Defines the calling conventions for
mergeDB -> db.cc (merging two databases)
mergeWords -> words.cc (updating the word db)
convertDocs -> docs.cc (updating the doc db)
reportError -> htmerge.cc (reporting errors)
htsearch/Display.h:
Implementation of Display
Takes results of search and fills in the HTML templates
htsearch/DocMatch.h:
Data object only. Contains information related to a given
document that was matched by a search. For instance, the
score of the document for this search.
htsearch/ResultList.h:
A Dictionary indexed on the document id that holds
documents found for a search.
htsearch/ResultMatch.h:
Contains information related to a given
document that was matched by a search. For instance, the
score of the document for this search. Similar to the
DocMatch class but designed for result display purposes.
htsearch/Template.h:
Gives access to template files used to format the output
of htsearch.
htsearch/TemplateList.h:
Holds the templates available to format a list of
results. These can be compiled in or read from
files.
htsearch/WeightWord.h:
?
htsearch/htsearch.h:
Command-line and CGI interface to search the databases
Expects the databases to have been generated using htdig, htmerge,
and htfuzzy. Outputs HTML-ized results of the search based
on the templates specified.
htsearch/parser.h:
Parse the string containing a search request and find the
document that matches.
---------------------
Script used to generate it:
perl script htlib/*.h htcommon/*.h htdig/*.h htmerge/*.h htsearch/*.h
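# For each header, print the description that follows "// <Tag>:".
foreach $file (@ARGV) {
    my($comment) = '';
    my($head);
    open(FILE, "<$file") or die "cannot open $file: $!";
    # The tag is the file name without directory or .h suffix.
    my($tag);
    ($tag = $file) =~ s|.*/(.*)\.h$|$1|;
    while(<FILE>) {
        if(s|^(//\s+$tag:\s+)||) {
            # First line: remember the prefix with everything but
            # the slashes blanked, to match continuation lines.
            $head = $1;
            $head =~ s|[^/]| |g;
            $comment .= "\t$_";
        } elsif($head) {
            # Continuation lines start with the blanked prefix;
            # anything else ends the description.
            if(s/^$head//) {
                $comment .= "\t$_";
            } else {
                last;
            }
        }
    }
    close(FILE);
    print "$file:\n$comment\n";
}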