> > 1. If I add '.php' to exclude_urls or bad_extensions, rundig doesn't
> > work (runs momentarily then stops a second later, no useful data in
> > database). Removing '.php' from the list solves problem.
>
> What does your start_url look like? Does it perhaps resolve or redirect to
> a PHP file? If so, then you will immediately exclude indexing of the first
> file and therefore never find any additional URLs to retrieve.

Yes, that is precisely the problem (index.php). Pretty obvious now that you
point it out! I assumed that htdig just indexed whatever docs it found
sitting in a directory, but it actually works by following links within docs?

> > 2. I have added the external parsers required for .doc, .pdf, .rtf. I
> > can run these successfully on the command line directly and via
> > doc2html. HtDig (rundig) still doesn't process any of these file
> > types though.
>
> Again your best bet is probably to start by trying a run with some '-v's.
> This will allow you to determine whether htdig is even seeing the files
> that you want to index. There are a lot of reasons for htdig not seeing
> files that you might expect it to find. If this appears to be the case the
> following is a good place to start looking for answers.
>
> http://www.htdig.org/FAQ.html#q5.27

Hmm, nothing in the FAQ seems to apply. The PDF and DOC files are sitting
in the top of DocumentRoot, are world readable, and aren't excluded in any
way that I can see. rundig -v -v tells me this about the PDF file:

Deleted, no excerpt: 5/http://192.168.0.1/SB04-091.pdf

whereas the DOC appears to be indexed OK, I just can't find it with any
search words at all. This "Word" doc (.doc) was created with OpenOffice 1.0,
I wonder if the MIME type is wrong?

> > 3. Editing common/long.html appears to have no effect whatsoever on
> > output, whereas common/header.html for example is readily editable.
>
> By default, htsearch uses templates that are compiled into the executable;
> this provides a slight performance advantage. In order to use the template
> files, you need to make some changes to your configuration file. Search
> htdig.conf (or whatever you named it) for template_map and template_name.
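
For reference, here is a rough sketch of how the two template attributes are usually enabled together in htdig.conf. It assumes the stock layout, where the template files live under common_dir; the paths and the choice of "long" as the default are assumptions, so adjust them to your installation. Uncommenting template_name alone is not enough, because template_map is what points htsearch at the files on disk.

```conf
# Assumed htdig.conf fragment; paths are illustrative, not taken from this thread.
# template_map entries come in threes: display-name internal-name template-file
template_map:  Long long ${common_dir}/long.html \
               Short short ${common_dir}/short.html

# template_name picks the default template by its internal name
template_name: long
```

With both attributes set, edits to common/long.html should show up in the search results, since htsearch reads that file instead of the copy compiled into the executable.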

Yep, I didn't play with that one long enough. I only uncommented the
template_name.

Thanks,

Sorry for pestering the list with a couple of obvious ones. I do try to pull
my weight on some other lists when I can!

Mick

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general

