According to Joe R. Jah:
> Slightly off this topic;) Is it possible to have htdig list all instances
> of "Not found" URL rather than list only one instance per URL. For
> instance, if There are page1.html, page2.html, page3.html, .. in a site
> pointing to a "Not found" URL, missing.html, only one of those documents,
> (at random or fifo?,) is listed:
>
> Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page3.html
>
> I'd like to see htdig report all instances:
>
> Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page1.html
> Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page2.html
> Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page3.html

This isn't a trivial change. Right now, htdig attempts to fetch any given
URL at most once, so the referer it lists is the first document it parsed
that had a link to the missing file. To do what you ask would require
keeping track of all referers to every URL so that you could, in the end,
report all referers to missing URLs. That would likely chew up a lot of
RAM on a large site.

An optimization would be to record in the "visited" table whether the URL
was found or not, but you'd still need to keep track of all referers to
URLs currently in the queue, because you don't yet know whether they'll be
found. As you can see, the problem is a fair bit more complicated than it
may seem at first glance.

If you keep fixing the old links in the referers that do get reported,
though, then after a few reindexing cycles you should eventually sniff out
all the referers that need updating.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
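
To make the bookkeeping described in the reply above a bit more concrete, here is a minimal sketch in Python. It is not htdig code (htdig itself is written in C++), and the class and method names (RefererTracker, add_link, record_fetch, report) are made up for illustration. It assumes each URL is fetched at most once, records the found/not-found result in a "visited" table, and keeps referer lists only for URLs that are still queued or known to be missing.

```python
from collections import defaultdict


class RefererTracker:
    """Hypothetical bookkeeping for reporting every referer of a missing URL."""

    def __init__(self):
        # URL -> True (found) or False (not found), set once the URL is fetched.
        self.visited = {}
        # URL -> referers seen so far; kept for queued and missing URLs only.
        self.referers = defaultdict(list)

    def add_link(self, referer, url):
        """Record a link; return True if the URL should be queued for fetching."""
        if self.visited.get(url) is True:
            return False  # known to exist, so the referer need not be stored
        first_sighting = url not in self.visited and url not in self.referers
        if referer not in self.referers[url]:
            self.referers[url].append(referer)
        return first_sighting

    def record_fetch(self, url, found):
        """Store the result of the single fetch attempt for a URL."""
        self.visited[url] = found
        if found:
            self.referers.pop(url, None)  # free the list for URLs that exist

    def report(self):
        """Return one 'Not found' line per (missing URL, referer) pair."""
        return [
            f"Not found: {url} Ref: {ref}"
            for url, found in self.visited.items()
            if not found
            for ref in self.referers.get(url, [])
        ]


if __name__ == "__main__":
    base = "http://www.abc.com/path/2/"
    tracker = RefererTracker()
    tracker.add_link(base + "page1.html", base + "missing.html")
    tracker.add_link(base + "page2.html", base + "missing.html")
    tracker.record_fetch(base + "missing.html", found=False)
    # A referer discovered after the failed fetch is still recorded.
    tracker.add_link(base + "page3.html", base + "missing.html")
    for line in tracker.report():
        print(line)
```

Even in this sketch the memory concern is visible: a referer list has to be held for every URL that is still in the queue, and it can only be dropped once the fetch shows that the URL exists.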

