I'm very glad to announce the release of version 3.2.0b1. As the version
number denotes, this is a beta release. We're looking for feedback on
the 3.2 codebase: documentation, performance, features, suggestions,
and of course bugs.

The documentation for the 3.2.0bX series can be found at
<http://dev.htdig.org/htdig-3.2/>

The release notes for 3.2.0b1 are at
<http://dev.htdig.org/htdig-3.2/RELEASE.html>

To download the source, see
<http://www.htdig.org/files/htdig-3.2.0b1.tar.gz>

Feedback on the release should be primarily directed to
[EMAIL PROTECTED]

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

Release notes for htdig-3.2.0b1
4 Feb 2000

This marks the first beta version of the 3.2.0 codebase, over a year in
the works. Since it has not received as much testing as the 3.1.x
series, it is *not* recommended for production environments. A full
description of how to upgrade is provided at
<http://dev.htdig.org/htdig-3.2/upgrade.html>

NOTE: Read this document before upgrading. You have been warned.

* Fixed a bug in htdig where hopcounts could be calculated incorrectly
  between multiple servers.
* Fixed a bug that could cause problems with 8-bit characters on some
  systems.
* Fixed handling of unreachable servers. First, the new max_retries
  attribute allows htdig to attempt multiple connections. Second, if
  the server is not available, htdig will stop trying to connect.
* Fixed handling of SGML entities: htdig will still decode them to
  store as single characters in the database, but htsearch now encodes
  them back for compliant results.
* Rewrote the database formats, allowing room for more sophisticated
  searches and compression of the word database using the new
  attribute wordlist_compress. These changes include the removal of
  the word_list file (db.wordlist) and the addition of the new
  doc_excerpt database.
* Cleaned up many parts of the code, including the URL and HTML parsers.
  Additionally, on platforms that support it, much of the code will be
  built as shared libraries, which should help memory utilization,
  especially under high load.
* Removed the modification_time_is_now attribute, which is now on by
  default. This means the time at indexing is taken as the date of the
  document if the server does not return a date.
* Added the new attribute use_doc_date to use the date specified in a
  META date tag.
* Merged all heading_factor attributes into one new attribute,
  heading_factor.
* As a result of the new database format, all _factor attributes (like
  title_factor and keywords_factor) are now dynamic--you do not have to
  rebuild your database to change the scaling.
* Changed attributes bad_querystr, exclude_urls, limit_urls_to,
  limit_normalized, and http_proxy_exclude to allow full regular
  expressions when the regex is surrounded by [ and ].
* Changed htsearch fields restrict and exclude to allow regular
  expressions when the regex is surrounded by [ and ].
* Added phrase searching support to htsearch--queries enclosed in
  quotes will be checked to ensure the words occur in that exact order
  in the documents.
* Added the build_select_lists attribute to allow the config file to
  specify <select> form elements in htsearch output as a template
  variable, much like $(SORT) and $(METHOD).
* Added a regex fuzzy method. This will allow searches to include
  regular expressions that match words. The fuzzy method will return up
  to regex_max_words matches.
* Added a speling [sic] fuzzy method. This attempts several simple
  spelling mistakes (like transposed letters and extra letters) to find
  matches. This adds the new attribute minimum_speling_length to
  restrict whether small words should be checked. Transposing letters
  in smaller words can give unrelated correctly-spelled words.
* Added support for external transport methods, using the
  external_protocols attribute, an analogue of the external_parsers
  system.
* Added support for HTTP/1.1, including persistent connections.
  This can be configured using the new attributes
  persistent_connections, head_before_get, and max_connection_requests.
* Added support for file:// URLs and support for using the mime_types
  file to decide whether local files are parsable.
* Added two new formats for variables in htsearch templates, $%(var),
  which escapes the variable for a URL, and $&(var), which HTML-escapes
  the variable as necessary.
* Added support for reading the list of URLs to index with htdig by
  supplying the command-line option -.
* Added a flag -m to htdig to index only the files given in the
  filename.
* There are many more changes, especially to the internal code
  structure, so a huge thank you goes out to everyone who helped make
  this release!
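
To illustrate the speling fuzzy method mentioned in the release notes,
here is a minimal Python sketch of how candidates for two of the simple
mistakes (transposed letters and an extra letter) might be generated.
This is not ht://Dig's actual code: the function name speling_variants,
the minimum_length parameter (standing in for minimum_speling_length),
and its default value of 5 are all assumptions made for the example.

```python
def speling_variants(word, minimum_length=5):
    """Return candidate corrections for `word` (illustrative sketch only).

    minimum_length is a hypothetical stand-in for the
    minimum_speling_length attribute; the default is an assumption.
    """
    # Short words are skipped: transposing their letters tends to
    # produce unrelated, correctly spelled words.
    if len(word) < minimum_length:
        return set()

    variants = set()

    # Transposed letters: swap each adjacent pair back.
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])

    # Extra letter: drop each letter in turn.
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])

    # Swapping two identical letters reproduces the input; leave it out.
    variants.discard(word)
    return variants
```

In a real search the candidates would presumably be kept only if they
occur in the word database; that lookup is left out here.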
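
The difference between the $(var), $%(var), and $&(var) template
formats can be sketched in Python as follows. This only approximates
the idea using the Python standard library; the exact escaping rules
htsearch applies may differ, and the expand function and the EXAMPLE
variable name are hypothetical.

```python
import html
import re
from urllib.parse import quote

# Matches $(NAME), $%(NAME) and $&(NAME).
_VAR = re.compile(r"\$([%&]?)\((\w+)\)")


def expand(template, variables):
    """Expand template variables (hypothetical helper, not htsearch code)."""

    def substitute(match):
        flag, name = match.group(1), match.group(2)
        value = variables.get(name, "")
        if flag == "%":
            # $%(var): escape the value for use inside a URL.
            return quote(value, safe="")
        if flag == "&":
            # $&(var): escape the value for use inside HTML.
            return html.escape(value)
        # $(var): insert the value unchanged.
        return value

    return _VAR.sub(substitute, template)
```

For example, with EXAMPLE set to "a b&c", the template
"?q=$%(EXAMPLE)" expands to "?q=a%20b%26c", while "$&(EXAMPLE)"
expands to "a b&amp;c".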
