>
> On Tue, 2 Mar 1999, Frank Guangxin Liu wrote:
> >
> > Does that mean it won't discover/dig new URLs either?
> >
>
> It will dig new URLs (unless you are limiting the number of pages per
> server and have already hit that limit).
Consider this scenario: on the initial dig, some of my web servers were
down, so the statistics from the original htdig showed 0 documents for
those servers. Now, on the update dig, those servers are up. I would
expect htdig to dig them fully; unfortunately, that is not the case.
The statistics from the update dig don't show those servers at all,
not even with 0 documents.
Frank
>
> I'm testing some mods to htDig to add an ability to ignore URLs in the
> database and start only on the start_url.
>
> This was easy on the surface, but tricky in practice: I wanted to
> skip unchanged pages but still follow their links. Adding a list
> of HREFs for each document to the database allowed me to maintain a
> breadth-first search order during an update dig. This is nice for me
> because I want to frequently refresh an index of just the top 500 pages
> of a server without starting from scratch each time.
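The update-dig idea above can be sketched in a few lines. This is a toy model, not ht://Dig's actual code: the `SITE` and `DB` dictionaries, hashes, and function name are all illustrative. The point is that an unchanged page is never re-fetched, yet its stored HREFs still feed the breadth-first queue, so new or changed pages downstream are still discovered.

```python
from collections import deque

# Hypothetical "live server": url -> (content hash, outgoing links)
SITE = {
    "/": ("h1", ["/a", "/b"]),
    "/a": ("h2", ["/c"]),
    "/b": ("h3", []),
    "/c": ("h4", []),
}

# Hypothetical database from the previous dig: url -> (content hash, stored HREFs)
DB = {
    "/": ("h1", ["/a", "/b"]),   # unchanged: skip the fetch, reuse stored HREFs
    "/a": ("old", ["/c"]),       # changed: re-fetch
}

def update_dig(start_url):
    """Breadth-first update dig: unchanged pages are not re-fetched,
    but their stored links still go onto the queue."""
    queue = deque([start_url])
    seen = {start_url}
    fetched, skipped = [], []
    while queue:
        url = queue.popleft()
        cur_hash, live_links = SITE[url]
        if url in DB and DB[url][0] == cur_hash:
            skipped.append(url)
            links = DB[url][1]           # follow links without a fetch
        else:
            fetched.append(url)
            DB[url] = (cur_hash, live_links)
            links = live_links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, skipped
```

Because the queue is FIFO, the update keeps the same breadth-first order as an initial dig, which is what lets a "top 500 pages" refresh stay a top-500 refresh.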
>
> I'd like to add this option to the build if anyone else would be
> interested.
>
> (P.S. You might also consider doing an initial dig on your subset and
> then merging the subset data into the full database when it's done)
>
> Matthew Edwards ([EMAIL PROTECTED]) | The fuel of innovation and
> Go2Net Inc. 999 Third Ave Suite 4700 | progress is freedom.
> Seattle WA 98104 |
>
>
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.