On Fri, 24 Jan 2003, Gilles Detillieux wrote:

> Date: Fri, 24 Jan 2003 17:29:28 -0600 (CST)
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> To: Joe R. Jah <[EMAIL PROTECTED]>
> Cc: "ht://Dig mailing list" <[EMAIL PROTECTED]>
> Subject: Re: [htdig] Errors to take note of
> 
> According to Joe R. Jah:
> > Slightly off this topic;)  Is it possible to have htdig list all instances
> > of "Not found" URL rather than list only one instance per URL.  For
> > instance, if there are page1.html, page2.html, page3.html, ... in a site
> > pointing to a "Not found" URL, missing.html, only one of those documents,
> > (at random or fifo?,) is listed:
> > 
> > Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page3.html
> > 
> > I'd like to see htdig report all instances:
> > 
> > Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page1.html
> > Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page2.html
> > Not found: http://www.abc.com/path/2/missing.html Ref: http://www.abc.com/path/2/page3.html
> 
> This isn't a trivial change.  Right now, htdig only attempts to fetch any
> URL once at most, so the referer it lists is the first document it parsed
> that had a link to the missing file.  To do what you ask would require
> keeping track of all referers to every URL so that you could, in the end,
> report all referers to missing URLs.  That would likely chew up a lot of
> RAM on a large site.  An optimization to this would be to record in the
> "visited" table whether the URL was found or not, but you'd still need
> to keep track of all referers to URLs currently in the queue because you
> don't know if they'll be found or not.  As you can see, the problem is
> a fair bit more complicated than it may seem at first glance.
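To make the bookkeeping described above concrete, here is a minimal Python sketch of the suggested optimization. It is not htdig's code (htdig is written in C++), and the class and method names (`LinkTracker`, `add_link`, `record_fetch`, `report`) are hypothetical. The idea: every referer is remembered while a URL is still queued; once the URL is fetched, the "visited" table records whether it was found, referers of found URLs are discarded, and referers of missing URLs (including ones discovered later) are kept for the report.

```python
from collections import defaultdict


class LinkTracker:
    """Hypothetical sketch, not htdig's implementation: collect every
    referer of a URL until we know whether it exists, then keep referers
    only for URLs that turned out to be missing."""

    def __init__(self):
        self.visited = {}                      # url -> True (found) / False (not found)
        self.pending = defaultdict(list)       # queued url -> referers seen so far
        self.missing_refs = defaultdict(list)  # missing url -> all known referers

    def add_link(self, url, referer):
        """Record a link from `referer` to `url`.

        Returns True only the first time `url` is seen, i.e. when the
        caller should put it in the fetch queue (each URL is fetched once).
        """
        if url in self.visited:
            # Already fetched: only a missing URL needs more referers.
            if not self.visited[url]:
                self.missing_refs[url].append(referer)
            return False
        first_time = url not in self.pending
        self.pending[url].append(referer)
        return first_time

    def record_fetch(self, url, found):
        """Store the fetch result; drop referers of found URLs to save RAM."""
        refs = self.pending.pop(url, [])
        self.visited[url] = found
        if not found:
            self.missing_refs[url].extend(refs)

    def report(self):
        """One 'Not found' line per (missing URL, referer) pair."""
        return [f"Not found: {url} Ref: {ref}"
                for url, refs in self.missing_refs.items()
                for ref in refs]


if __name__ == "__main__":
    t = LinkTracker()
    base = "http://www.abc.com/path/2/"
    missing = base + "missing.html"
    t.add_link(missing, base + "page1.html")  # first sighting: queue it
    t.add_link(missing, base + "page2.html")  # still queued: remember referer
    t.record_fetch(missing, found=False)      # fetch fails (404)
    t.add_link(missing, base + "page3.html")  # parsed after the failed fetch
    for line in t.report():
        print(line)
```

Run as a script, this prints one line for each of page1.html, page2.html and page3.html. The memory cost Gilles mentions is visible in `pending`: it holds every referer for every URL still in the queue, which on a large site can be most of the crawl frontier.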

Thanks; yes I can see the complexity.

> If you keep fixing the old links in the reported referers, though, then
> you should eventually, after a few reindexing cycles, sniff out all the
> referers that need updating.

That's what I have been doing; I also grep for the missing file in all
adjacent documents to find most culprits;)
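Joe's workaround is normally a one-line `grep -rl missing.html` over the document tree. For illustration, here is a rough Python equivalent. The function name `find_referers` and the restriction to `.html`/`.htm` files are assumptions, and, like grep, it only matches the literal file name, so links written with a different relative path or in another case will be missed.

```python
import os


def find_referers(root, missing_name, exts=(".html", ".htm")):
    """Hypothetical helper: list HTML files under `root` whose text
    contains `missing_name` (a plain substring match, like grep)."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if missing_name in fh.read():
                    hits.append(path)
    return hits
```

For example, `find_referers("/var/www/path/2", "missing.html")` would list the pages in that directory (and below it) that still mention the missing file; the document root shown is only a placeholder.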

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
