[htdig] "deleted no excerpts " with pdf files

Dominique Fourtune Thu, 18 Dec 2003 10:33:37 -0800

Hello everybody, I need help.

I'm using htdig 3.1.6 to parse HTML pages created by Apache mod_autoindex.

I can't merge PDF files; I always get the error message "Deleted, no excerpt".

I'm using doc2html.pl, which works for .doc files but not for PDF files.

Run on the command line, pdf2html.pl does parse PDF files and create HTML files.

I found this old post:

According to Paul COURBIS:
> When I run htmerge, I get a lot of messages :
> Deleted, no excerpt: xxx/http...
>
> What does it mean ? Why does htmerge suppress so many documents from the
> database ? As far as I understand english it seems that it means that
> there's no keyword for these pages, despite the fact that when I connect
> to it there's a lot of text...

The most common causes of this are:
- a noindex directive somewhere in the document
- the document was disallowed by robots.txt
- the server_max_docs limit was reached before this document could be parsed

You'd need to correlate the htmerge -v output back to the htdig -v (or -vv)
output to see which of these conditions occurred.

I think the first cause is the likely one (I have no robots.txt), but I need help to go further: what is a noindex directive?
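
For context: in HTML, a noindex directive usually takes the form of a robots meta tag in the page head, such as <meta name="robots" content="noindex">, which asks indexers to skip the page. As a rough way to check whether converted output carries one, here is a minimal Python sketch. It assumes you already have the converter's HTML output as a string; the helper name has_noindex is hypothetical and not part of htdig, and it only looks at the standard robots meta tag, not at anything htdig-specific.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Records whether a robots meta tag containing 'noindex' was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are of interest here.
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so normalise them.
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        content = attr.get("content", "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def has_noindex(html_text):
    """Hypothetical helper: True if html_text has a robots noindex meta tag."""
    finder = NoindexFinder()
    finder.feed(html_text)
    return finder.noindex
```

Feeding it the HTML that pdf2html.pl writes for one of the failing PDF files would show whether such a tag is present; if it is not, one of the other causes listed above is more likely.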

Thanks a lot

-- 
Dominique FOURTUNE - ADEME Département MDE
05 55 10 27 49 - [EMAIL PROTECTED]
Computers work very well without Microsoft, and for less money: switch to Linux!
