Honestly, after looking at lots of htdig 3.2.0 code to build and debug libhtdig, I have to say there is a TON of redundant and confusing code in it.
We've implemented way too many of our own data structures (we should be using the STL), and the document parsing code is very wild to trace through. The searching code needs help too, and we have a prototype from Loic for new search code.

> My vote is that we release 3.2.0b6 basically as the code stands now,
> and then start 3.3.0a1 by back-porting features to 3.1.6. For each,
> we'll measure the impact on performance, and decide which ones are
> worth it.

There is another alternative to either flushing out the inefficient cruft in 3.2.0 or backporting to 3.1.6: we could look at integrating with CLucene, a C++ search engine. It is just the store-search portion, with no spidering engine and no UI. It's an API (as far as I can tell), and it's a C++ rewrite of Java Lucene.

It's worth considering, but it would be a lot of work. We would have to carefully examine which htdig configs we could still support.

The advantage is that CLucene is under active development by experienced search-engine people; I believe one of the participants is an original Altavista developer. It's a fairly small code base, and it's LGPL.

The disadvantages are that at the moment there is no DB compression, it's not an end-user application (which HtDig is), and it will be a lot of work.

Would we all be satisfied if we used a different project's 'guts'? For that matter, we could look at moving our spidering code to use a different library. HtDig would concentrate on providing a flexible application built upon a separate store/search engine and possibly other libraries for things like spidering and user interface.

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
