>Getting the dead pages is easy; in the log they're marked "not found".
>Getting their sources is a little harder, but with  V3.1.2 all you had to

Actually, it's a *lot* easier than this. Use the -s flag. At the end 
of the dig, it will print the broken URLs and their referers. There's 
even a contributed script in the archive that will help you do 
various things with the list.

>Also: is there any documentation for the format of the log file?  what are
>the three numbers at the beginning of the line, e.g.
>
>        14:2:0:<url>:  not found

Index #, DocID, Hopcount

where Index # is a counter incremented at every step of that indexing
run, DocID is the internal database ID #, and Hopcount is the number
of hops (links followed) from the start_url.
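If you want to pull the "not found" lines out of the log yourself, here
is a rough Python sketch. It assumes the layout shown above
(Index #:DocID:Hopcount:URL: status). The URL itself contains colons
("http://", ports), so the sketch treats the status as whatever follows
the last colon that is followed by whitespace. The function names, the
example URLs and the status-less second line are hypothetical. This is
not the contributed script from the archive.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, inferred from the example "14:2:0:<url>:  not found".
# The greedy (.*) keeps URL colons ("http://", ":8080") in the URL because
# those colons are not followed by whitespace.
_WITH_STATUS = re.compile(r"^(\d+):(\d+):(\d+):(.*):\s+(\S.*)$")
_NO_STATUS = re.compile(r"^(\d+):(\d+):(\d+):(.*)$")


class LogEntry(NamedTuple):
    index: int     # incremented at every step of the indexing run
    doc_id: int    # internal database ID
    hopcount: int  # hops from start_url
    url: str
    status: str    # e.g. "not found"; "" if the line carries no status


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one per-document log line into its fields (assumed format)."""
    line = line.strip()
    m = _WITH_STATUS.match(line)
    if m:
        index, doc_id, hops, url, status = m.groups()
    else:
        m = _NO_STATUS.match(line)
        if not m:
            return None  # not a per-document line
        index, doc_id, hops, url = m.groups()
        status = ""
    return LogEntry(int(index), int(doc_id), int(hops), url, status.strip())


def not_found(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only the entries marked "not found"."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and entry.status == "not found":
            yield entry


if __name__ == "__main__":
    log = [
        "14:2:0:http://www.example.com/missing.html:  not found",
        "15:3:1:http://www.example.com:8080/ok.html",  # hypothetical line
    ]
    for e in not_found(log):
        print(e.doc_id, e.hopcount, e.url)
```

This only gives you the dead URLs and their DocIDs. For the referring
pages, the -s report is still the easier route.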

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
