> > 1. If I add '.php' to exclude_urls or bad_extensions, rundig doesn't
> > work (runs momentarily then stops a second later, no useful data in
> > database). Removing '.php' from the list solves the problem.

> What does your start_url look like? Does it perhaps resolve or redirect to
> a PHP file? If so, then you will immediately exclude indexing of the first
> file and therefore never find any additional URLs to retrieve. 

Yes, that is precisely the problem (index.php). Pretty obvious now that
you point it out! I assumed that htdig just indexed whatever docs it
found sitting in a directory, but it actually works by following links
within docs?
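
To make the point above concrete, here is a minimal sketch of a link-following
crawl. It is not htdig's code: the SITE dictionary, the example.com URLs and the
function names are all made up for illustration, and the exclusion test is a
plain substring match, which is only an approximation of how exclude_urls
behaves. It shows why excluding the extension of the start document leaves
nothing to index.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found in that document.
SITE = {
    "http://example.com/index.php": [
        "http://example.com/a.html",
        "http://example.com/report.pdf",
    ],
    "http://example.com/a.html": ["http://example.com/b.html"],
    "http://example.com/b.html": [],
    "http://example.com/report.pdf": [],
}


def is_excluded(url, exclude_patterns):
    """Return True if any pattern occurs in the URL (assumed substring match)."""
    return any(pattern in url for pattern in exclude_patterns)


def crawl(start_url, exclude_patterns=()):
    """Breadth-first crawl from start_url, returning the URLs that get indexed."""
    indexed = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if is_excluded(url, exclude_patterns):
            # A rejected document is never fetched, so its links are never seen.
            continue
        indexed.append(url)
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


if __name__ == "__main__":
    start = "http://example.com/index.php"
    print(crawl(start))            # all four documents are reached
    print(crawl(start, [".php"]))  # start document rejected, nothing is indexed
```

With '.php' excluded, the start document is rejected before any of its links
are read, so the queue empties at once. That matches the "runs momentarily then
stops" symptom.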

> 
> > 2. I have added the external parsers required for .doc, .pdf, .rtf. I
> > can run these successfully on the command line directly and via
> > doc2html. HtDig (rundig) still doesn't process any of these file
> > types though. 

> 
> Again your best bet is probably to start by trying a run with some '-v's.
> This will allow you to determine whether htdig is even seeing the files
> that you want to index. There are a lot of reasons for htdig not seeing
> files that you might expect it to find. If this appears to be the case, the
> following is a good place to start looking for answers.
> 
>   http://www.htdig.org/FAQ.html#q5.27
> 

Hmm, nothing in the FAQ seems to apply. The PDF and DOC files are
sitting in the top of DocumentRoot, are world readable, and aren't
excluded in any way that I can see.

rundig -v -v tells me this about the PDF file:

  Deleted, no excerpt: 5/http://192.168.0.1/SB04-091.pdf

whereas the DOC appears to be indexed OK; I just can't find it with any
search words at all. This "Word" doc (.doc) was created with OpenOffice
1.0. I wonder if the MIME type is wrong?
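
One thing that can be checked without htdig is whether the files really are
what their extensions claim. The sketch below is only a rough aid, and the
function names are hypothetical: it looks at the first bytes of a file for the
well-known PDF, OLE2 (binary Word) and RTF signatures. It says nothing about
the Content-Type header the web server actually sends, which would have to be
checked separately.

```python
# Well-known leading byte signatures for the three formats in question.
PDF_MAGIC = b"%PDF-"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file, e.g. binary .doc
RTF_MAGIC = b"{\\rtf"


def sniff(data):
    """Guess the container format from the leading bytes, or return None."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        # Also matches other OLE2 files (for example .xls), not only Word.
        return "ole2"
    if data.startswith(RTF_MAGIC):
        return "rtf"
    return None


def sniff_file(path):
    """Read the first few bytes of a file on disk and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py SB04-091.pdf some.doc
    for name in sys.argv[1:]:
        print(name, sniff_file(name))
```

If a .doc saved from OpenOffice comes back as None rather than "ole2", the file
is probably not in the binary Word format that the external parser expects.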


> > 3. Editing common/long.html appears to have no effect whatsoever on
> > output, whereas common/header.html for example is readily editable.
> 
> By default, htsearch uses templates that are compiled into the executable;
> this provides a slight performance advantage. In order to use the template
> files, you need to make some changes to your configuration file. Search
> htdig.conf (or whatever you named it) for template_map and template_name.

Yep, I didn't play with that one long enough. I only uncommented
template_name.

Thanks, and sorry for pestering the list with a couple of obvious ones. I do
try to pull my weight on some other lists when I can!

Mick


