>
> On Tue, 2 Mar 1999, Frank Guangxin Liu wrote:
> >
> > Does that mean it won't discover/dig new URLs either?
> >
>
> It will dig new URLs (unless you are limiting the number of pages per
> server and have already hit that limit).
Consider this scenario: on the initial dig, some of my web servers were
down, so the statistics from the original htdig showed 0 documents for
those servers. Now, on the update dig, those servers are up. I would
expect htdig to dig them fully; unfortunately, that is not the case.
The statistics from the update dig don't show those servers at all,
not even with 0 documents.
Frank
>
> I'm testing some mods to htDig to add an ability to ignore URLs in the
> database and start only on the start_url.
>
> This was easy on the surface, but tricky in practice: I wanted to
> skip unchanged pages but still follow their links. Adding a list
> of HREFs for each document to the database allowed me to maintain a
> breadth-first search order during an update dig. This is nice for me
> because I want to frequently refresh an index of just the top 500 pages
> of a server without starting from scratch each time.
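The update-dig idea above can be sketched in a few lines. This is a toy model, not ht://Dig's actual code: the `SITE` and `DB` dictionaries, hashes, and function name are all illustrative. The point is that an unchanged page is never re-fetched, yet its stored HREFs still feed the breadth-first queue, so new or changed pages downstream are still discovered.

```python
from collections import deque

# Hypothetical "live server": url -> (content hash, outgoing links)
SITE = {
    "/": ("h1", ["/a", "/b"]),
    "/a": ("h2", ["/c"]),
    "/b": ("h3", []),
    "/c": ("h4", []),
}

# Hypothetical database from the previous dig: url -> (content hash, stored HREFs)
DB = {
    "/": ("h1", ["/a", "/b"]),   # unchanged: skip the fetch, reuse stored HREFs
    "/a": ("old", ["/c"]),       # changed: re-fetch
}

def update_dig(start_url):
    """Breadth-first update dig: unchanged pages are not re-fetched,
    but their stored links still go onto the queue."""
    queue = deque([start_url])
    seen = {start_url}
    fetched, skipped = [], []
    while queue:
        url = queue.popleft()
        cur_hash, live_links = SITE[url]
        if url in DB and DB[url][0] == cur_hash:
            skipped.append(url)
            links = DB[url][1]           # follow links without a fetch
        else:
            fetched.append(url)
            DB[url] = (cur_hash, live_links)
            links = live_links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, skipped
```

Because the queue is FIFO, the update keeps the same breadth-first order as an initial dig, which is what lets a "top 500 pages" refresh stay a top-500 refresh.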
>
> I'd like to add this option to the build if anyone else would be
> interested.
>
> (P.S. You might also consider doing an initial dig on your subset and
> then merging the subset data into the full database when it's done)
>
> Matthew Edwards ([EMAIL PROTECTED]) | The fuel of innovation and
> Go2Net Inc. 999 Third Ave Suite 4700 | progress is freedom.
> Seattle WA 98104 |
>
>
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.