I love htdig, especially the 3.2.0b5 release.

As far as I know, htdig is the only open source, "industrial strength" indexing spider that supports a variety of file formats and that, once it is set up and configured, runs fine for months on end. It has support for basic authentication (which is adequate when used in conjunction with Integrated Windows authentication). And AFAIK, it is also the only option for indexing an intranet...

I am on the verge of releasing a new version of our intranet. I've decided to go from 3.1.6 to 3.2 because of phrase searching (and because 3.2.0b5 seems like the first beta that is really ready for it). It also seems to me that 3.2.0b5 is actually faster at digging than 3.1.6, although it may be that the new version of the intranet simply has fewer redundant pages. Regardless, I am confident that 3.2.0b5 is ok for our production environment.

I am not a C programmer and thus am not able to contribute all that much. I can tell you, however, that there is a rather significant learning curve for using htdig (when compared to other information systems I've deployed) and this may contribute to any perceived drop in "market share".

Sometimes I dream about learning the Mac OS X packager so that I can create an htdig package to aid in distribution, setup and maintenance, but even that is just beyond me (for now). Even rundig and rundig.sh, for as much as they offer, fall short (to me) of being a complete solution (but rundig.sh sure goes a long way toward helping... don't get me wrong!).

Personally, I'm willing to live with occasional segmentation faults if I can easily recover from them. I recently had a situation in which a troublesome Word file appeared to be corrupting the databases (htdig would hang when trying to index the document via catdoc, and subsequent searches against the databases would fail with segmentation faults). Recovering from such a situation required deleting the databases and recreating them from scratch: rundig.sh -i -s -c conf/myconf.conf... But I had to modify rundig to delete existing databases if -a was not used; otherwise, it didn't seem to be creating genuinely new databases (please correct me if I'm wrong).
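
For what it's worth, the recovery I describe above (delete the old databases, then dig again from scratch) could be sketched roughly like this in Python. This is only a sketch under assumptions: the function names are made up for illustration, the "db.*" file pattern is my guess at how the database files are named in a given setup, and the rebuild command is whatever you already run by hand. It is not how rundig itself works.

```python
import glob
import os
import subprocess


def remove_databases(db_dir, pattern="db.*"):
    """Delete regular files in db_dir matching pattern.

    The "db.*" default is an assumption about the database file names;
    adjust it to match your own configuration. Returns the sorted list
    of file names that were removed.
    """
    removed = []
    for path in glob.glob(os.path.join(db_dir, pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)


def clean_rebuild(db_dir, rebuild_cmd, pattern="db.*"):
    """Remove the old databases, then run rebuild_cmd (a list of arguments).

    Returns the exit code of the rebuild command so the caller can tell
    whether the dig finished cleanly.
    """
    remove_databases(db_dir, pattern)
    return subprocess.run(rebuild_cmd).returncode
```

With a hypothetical database directory, the call would look something like clean_rebuild("/path/to/db", ["rundig.sh", "-i", "-s", "-c", "conf/myconf.conf"]), using the same command line quoted above.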

What I'd really, really like to see is a rundig script that handles these sorts of situations a little more automatically. rundig.sh has been great to learn from (really, really great) but for people who don't have the time or the inclination, there needs to be a more complete, double-clickable solution. But then again, maybe the target audience is only adept system administrators... I don't know... Just my thoughts (having been an htdig user since about 1998).

BTW, I've been scouring the documentation for the answer and have been unable to find it. Can someone tell me what role the different databases play and which, if any, are "temporary" (can be deleted after a dig, for example)?

Thanks to all the developers, Lachlan, Jim, Gabriele, Joe, Gilles, and Geoff, and everyone else who has contributed. I look forward to testing out the most recent beta on my Mac OS X box.

Ted Stresen-Reuter


