At 12:06 PM -0400 5/10/01, Tod Thomas wrote:
>We are looking into the possibility of replacing our current search
>capabilities with a purchased solution. Before going that route, we
>want to compare what other offerings can provide against what we
>already get from htdig. A number of solutions are
>available, each with a slightly different mix of benefits.
It's true there are a *lot* of search engines out there. Some scale
to millions of documents, provide nice administration interfaces,
offer alternate relevance ranking algorithms, include filters for
additional file formats, etc.
>The person leading this research has discovered a company called
>Autonomy that can provide - in addition to boolean/keyword searching
>- collaborative filtering, social agents, natural language analysis,
>and support for parsing large numbers of file formats. I believe
>there are other companies that market these capabilities as well.
Autonomy, Verity, Convera and other vendors provide something that is
better described as "knowledge management" or "enterprise
information portals". This is generally an investment designed to
make your white-collar workers more productive. However, it is an
enterprise-level process that requires top-level management
leadership, and it often fails. Autonomy is a fine search engine, but
you have to ask yourself if you're ready to jump into knowledge
management.
>Can htdig provide collaborative filtering, social agents, natural
>language analysis, support for parsing a large number of file
>formats, the way some of the purchased products claim to?
Collaborative filtering, no: that is a big, complex process that
requires personal profiles and automatic updating, and it works best
with vector search engines. Social agents could simply mean saved
searches, which ht://Dig can do. Natural language analysis is
currently important for document similarity functions, but very few
web searches ever use sentence queries. ht://Dig has access to a ton
of open-source file format converters, but there are always more file
formats...
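
To make the "vector search engine" point a bit more concrete, here is a
minimal sketch of matching documents against a personal profile by
cosine similarity over term counts. It is only an illustration of the
general idea, not how ht://Dig, Autonomy or any other product actually
works; the names term_vector, cosine and rank_for_profile and the
sample profile and documents are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_for_profile(profile_text, documents):
    """Rank documents (name -> text) by similarity to a user's profile."""
    profile = term_vector(profile_text)
    scored = [(cosine(profile, term_vector(text)), name)
              for name, text in documents.items()]
    return sorted(scored, reverse=True)


# Hypothetical profile, e.g. built from what one user has read before.
profile = "search engine indexing relevance ranking"
docs = {
    "a.html": "relevance ranking in a full text search engine",
    "b.html": "cafeteria menu for this week",
}
for score, name in rank_for_profile(profile, docs):
    print(f"{score:.2f} {name}")
```

A real collaborative filtering system would also have to keep those
profiles updated automatically for every user, which is where most of
the complexity comes from.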
>If not, what would its limitations be when compared to one of them?
>Has anyone else done a comparison like this and would care to share
>their opinions?
I wrote a review of high-end search engines for Network Computing
Magazine, covering many of these issues. While I didn't cover
ht://Dig as such, I did include it in a sidebar.
<http://www.nwc.com/1120/1120f1.html>
My feeling is that ht://Dig doesn't scale to millions of pages the
way some of the other search engines do, and it doesn't have a nice
admin interface. But it works well once you figure out the compile
and config issues.
Avi
PS I am available for consulting on these issues too...
--
_________________________________________________
Complete Guide to Search Engines for Web Sites, Intranets,
and Portals: <http://www.searchtools.com>