On 27 Apr, Peter L. Peres wrote:
>
> Hi,
>
> the machine finished! The loop was in the Java API docs. There were no
> other loops. There is no bug in htdig with respect to this looping problem.
>
> Here are some stats from the end:
>
> 27425.60user 10781.29system 43:11:27elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (18429778major+3453038minor)pagefaults 2501532swaps
>
> htdig ran with a niceness of 18 for the last 25% of the indexing. Load was
> about 0.8 during this time. The docdb is about 200MB. My input was about
> 220MB.
>
> The loop problem was in the tree:
>
> /usr/doc/packages/javadoc/docs/api/
>
> which has more than 500 entries.
>
> System: i486/100MHz/24MB RAM, 4.3+2.8 GB EIDE disks (not UDMA), headless
> (ethernet only), SuSE 6.2 Linux (with an HTML documentation system modified
> by me). As you can see, the machine was swapping like crazy. I think I'd need
> a machine with 256MB RAM to avoid serious swapping. Not likely anytime
> soon.
>
> thank you all for the ideas,
>
> Peter
I think I've come across this sort of problem when trying to index a
series of documents that have a lot of internal references
(<A HREF="#target">): htdig tries to follow each of these links and ends up
going in ever decreasing circles until....
My solution was to add something like html# to the exclude_urls list.
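
As a rough sketch, the relevant line in the htdig configuration file might
look like the one below. It assumes the existing list holds only
"/cgi-bin/" and ".cgi" (which I believe are the shipped defaults) and that
the pages end in ".html"; adjust both to match your own setup.

```
# Sketch only: keep whatever patterns your exclude_urls already has
# (/cgi-bin/ and .cgi are assumed here) and append html# to them.
exclude_urls:	/cgi-bin/ .cgi html#
```

Any URL containing one of these space-separated patterns is rejected, so a
link such as page.html#target is skipped while page.html is still indexed.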
Cheers
--
David Robley | WEBMASTER & Mail List Admin
RESEARCH CENTRE FOR INJURY STUDIES | http://www.nisu.flinders.edu.au/
AusEinet | http://auseinet.flinders.edu.au/
Flinders University, ADELAIDE, SOUTH AUSTRALIA