Honestly, after reading through a lot of the htdig 3.2.0 code to build and
debug libhtdig, I can say there is a TON of redundant and confusing code in it.

  We've implemented way too many of our own data structures where we
should be using the STL, and the document parsing code is very hard to
trace through.

  The searching code also needs help, and we have a prototype from Loic
for new search code.

> My vote is that we release 3.2.0b6 basically as the code stands now,
> and then start 3.3.0a1 by back-porting features to 3.1.6.  For each,
> we'll measure the impact on performance, and decide which ones are
> worth it.

  There is another alternative to either flushing out the inefficient
cruft in 3.2.0 or backporting to 3.1.6.

  We could look at integrating with CLucene, a C++ search engine.  It
is just the store/search portion: no spidering engine and no UI.  It's
an API (as far as I can tell).

  It's a C++ rewrite of Java Lucene.

  It's worth considering, but it would be a lot of work.  We would have to
carefully examine which htdig configs we could still support.

  The advantage is that CLucene is under active development by experienced
search-engine people; I believe one of the participants is an original
AltaVista developer.  It's a fairly small code base, and it's LGPL.

  The disadvantages are that at the moment there is no DB compression,
it's not an end-user application (whereas HtDig is), and it will be a lot
of work.

Would we all be satisfied if we used a different project's 'guts'?  For
that matter, we could also look at moving our spidering code onto a
different library.

HtDig would then concentrate on providing a flexible application
built upon a separate store/search engine, and possibly other libraries for
things like spidering and the user interface.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485



