At 12:06 PM -0400 5/10/01, Tod Thomas wrote:
>We are looking into the possibility of replacing our current search 
>capabilities with a purchased solution. Before going that route it 
>was thought to compare what other offerings can provide against what 
>we are getting already from htdig.  A number of solutions are 
>available, each with a slightly different mix of benefits.

It's true there are a *lot* of search engines out there.  Some scale 
to millions of documents, provide nice administration interfaces, 
offer alternate relevance ranking algorithms, include filters for 
additional file formats, etc.

>The person leading this research has discovered a company called 
>Autonomy that can provide - in addition to boolean/keyword searching 
>- collaborative filtering, social agents, natural language analysis, 
>and support for parsing large numbers of file formats.  I believe 
>there are other companies that market these capabilities as well.

Autonomy, Verity, Convera and other vendors provide something that is 
better described as "knowledge management" and "enterprise 
information portals".  This is generally an investment designed to 
make your white-collar workers more productive.  However, it is an 
enterprise-level process that requires top-level management 
leadership, and it often fails.  Autonomy is a fine search engine, but 
you have to ask yourself if you're ready to jump into knowledge 
management.

>Can htdig provide collaborative filtering, social agents, natural 
>language analysis, support for parsing a large number of file 
>formats, the way some of the purchased products claim to?

Collaborative filtering: no.  That is a big, complex process that 
requires personal profiles and automatic updating, and it works best 
with vector search engines.  Social agents could simply mean saved 
searches, which ht://Dig can do.  Natural language analysis is 
currently important for document similarity functions, but very few 
web searches ever use sentence queries.  ht://Dig has access to a ton 
of open-source file format converters, but there are always more file 
formats...
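
To make the "personal profiles" and "vector" points concrete, here is 
a minimal sketch in Python of what collaborative filtering involves. 
It is not how ht://Dig or any commercial product works; the profile 
data, the names `cosine`, `recommend` and `PROFILES`, and the 
"nearest single user" rule are all assumptions made for illustration. 
Each user profile is a vector of document weights, and a user is shown 
documents taken from the most similar other profile.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profiles, user, top_n=3):
    """Suggest documents from the most similar other profile
    that `user` has not already seen."""
    others = [
        (cosine(profiles[user], vec), name)
        for name, vec in profiles.items()
        if name != user
    ]
    if not others:
        return []
    score, nearest = max(others)
    if score == 0:
        return []  # nobody shares any interest with this user
    seen = profiles[user]
    candidates = [
        (weight, doc)
        for doc, weight in profiles[nearest].items()
        if doc not in seen
    ]
    # Highest weight first; ties broken by document name.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in candidates[:top_n]]


# Hypothetical profiles: user -> {document: interest weight}.
PROFILES = {
    "alice": {"doc1": 1.0, "doc2": 1.0},
    "bob": {"doc1": 1.0, "doc2": 1.0, "doc3": 1.0},
    "carol": {"doc4": 1.0},
}

if __name__ == "__main__":
    print(recommend(PROFILES, "alice"))
```

Even in this toy form, the profiles have to be collected and kept up 
to date for every user, which is the part a plain keyword indexer has 
no machinery for.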

>If not, what would its limitations be when compared to one of them? 
>Has anyone else done a comparison like this and would care to share 
>their opinions?

I wrote a review of high-end search engines for Network Computing 
Magazine, covering many of these issues.  While I didn't cover 
ht://Dig as such, I did include it in a sidebar. 
<http://www.nwc.com/1120/1120f1.html>

My feeling is that ht://Dig doesn't scale to millions of pages the 
way some of the other search engines do, and it doesn't have a nice 
admin interface.  But it works well once you figure out the compile 
and config issues.

Avi

PS I am available for consulting on these issues too...
-- 
_________________________________________________
Complete Guide to Search Engines for Web Sites, Intranets, 
   and Portals: <http://www.searchtools.com>
