On Sat, 3 Apr 2004, Ninti Systems wrote:

> Hmm, nothing in the FAQ seems to apply. The PDF and DOC files are
> sitting in the top of DocumentRoot, are world readable, and aren't
> excluded in any way that I can see.
> 
> rundig -v -v tells me this about the PDF file:
> 
>   Deleted, no excerpt: 5/http://192.168.0.1/SB04-091.pdf

Are you sure that the PDF contains content that can be indexed? In
some cases a PDF is just a wrapper around an image, in which case
there would be no text to index. I believe that the "Deleted, no
excerpt" message implies that nothing of interest was left from
the document after processing. Also, if your PDFs are large, make
sure that max_doc_size is set appropriately. If the value of this
attribute is too small, the PDF will be truncated and not parsed
correctly. See the following for more on this attribute.

http://www.htdig.org/attrs.html#max_doc_size
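
If it helps, both of the checks above can be scripted. What follows is
only a rough sketch, not anything that ships with ht://Dig. It assumes
the pdftotext utility (from xpdf or poppler) is installed, and it
assumes a limit of 100000 bytes, which I believe is the documented
default for max_doc_size; substitute whatever your htdig.conf really
sets. The function names are made up for illustration.

```python
import os
import shutil
import subprocess

# Assumption: 100000 bytes is believed to be the ht://Dig default for
# max_doc_size. Replace it with the value from your own htdig.conf.
ASSUMED_MAX_DOC_SIZE = 100_000


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def fits_in_limit(size_bytes: int, max_doc_size: int = ASSUMED_MAX_DOC_SIZE) -> bool:
    """Return True if a document of this size would not be truncated."""
    return size_bytes <= max_doc_size


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text ('-' means stdout)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def report(pdf_path: str, max_doc_size: int = ASSUMED_MAX_DOC_SIZE) -> str:
    """Summarize whether the PDF has extractable text and fits the size limit."""
    size = os.path.getsize(pdf_path)
    words = count_words(extract_text(pdf_path))
    lines = [f"{pdf_path}: {size} bytes, {words} extractable words"]
    if words == 0:
        lines.append("no text found; the PDF may only wrap a scanned image")
    if not fits_in_limit(size, max_doc_size):
        lines.append(f"larger than max_doc_size ({max_doc_size}); would be truncated")
    return "\n".join(lines)
```

Calling report("SB04-091.pdf") on a local copy of the file should show
whether there is any text for htdig to index and whether the file is
bigger than the assumed limit. If it is, raising the limit in
htdig.conf with a line such as "max_doc_size: 5000000" (the number is
only an example) and re-running rundig would be the next thing to try.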

> whereas the DOC appears to be indexed OK, I just can't find it with any
> search words at all. This "Word" doc (.doc) was created with OpenOffice
> 1.0, I wonder if the MIME type is wrong?

Not sure on this one. My guess is that you would see some sort of
message indicating a problem if the MIME type were unrecognized. If
you increase the number of '-v's you pass to rundig, you should get
to a point where you can see the actual words that htdig is indexing.
If you see the words you are interested in being indexed but still
can't find them in the database, then it might be necessary to look
elsewhere for the problem.

Jim


_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
