> > Digging with max_hop_count 8: htdig-3.0.8b2 - ca. 55,000 documents
> > htdig-3.1.0b2 - ca. 13,000 documents
> > max_hop_count 12: htdig-3.1.0b2 - 44,757 documents
>
> It is a known bug that 3.1.0b2 ignores CGIs. More precisely, it trims off
> the part of the URL after a ? in the CGI.
That's not the reason in my case; we don't have that many CGI URLs.
Example:
http://www.tu-chemnitz.de/index.html contains a link to
http://www.tu-chemnitz.de/misc/links.html which contains a link to
http://www.tu-chemnitz.de/docs/perl.html
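To make the expected levels for this chain explicit, here is a minimal sketch that computes shortest-path hop counts with a breadth-first walk. The `LINKS` table is a hypothetical three-page graph built only from the URLs above (the real pages contain many more links), and `hop_counts` is an illustrative helper, not part of htdig.

```python
from collections import deque

# Hypothetical link graph: only the three pages named in the example.
LINKS = {
    "http://www.tu-chemnitz.de/index.html": [
        "http://www.tu-chemnitz.de/misc/links.html",
    ],
    "http://www.tu-chemnitz.de/misc/links.html": [
        "http://www.tu-chemnitz.de/docs/perl.html",
    ],
    "http://www.tu-chemnitz.de/docs/perl.html": [],
}


def hop_counts(links, start):
    """Return the minimal number of link hops from start to each reachable URL."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in hops:  # first visit is the shortest path in BFS
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


EXPECTED_HOPS = hop_counts(LINKS, "http://www.tu-chemnitz.de/index.html")
```

Under these assumptions links.html sits at hop 1 and perl.html at hop 2, which is what one would expect a digger to report for this chain.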
htdig-3.0.8b2:
0:0:0:http://www.tu-chemnitz.de/: ++
...
345:40:1:http://www.tu-chemnitz.de/misc/links.html:
********-----+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-+-+---*+*+-----------------------------------------------------+*****
size = 24555
....
7538:1823:2:http://www.tu-chemnitz.de/docs/perl.html:
+--+--+-------------- size = 1943
htdig-3.1.0b2: (3 weeks later, so small changes in size etc.)
0:0:0:http://www.tu-chemnitz.de/: +++*
...
347:40:3:http://www.tu-chemnitz.de/misc/links.html:
********---+------*-----------------------------------------------------------------*--------------------**--------------------------------------------------------------------------------------------++-++-+---*+*+-----------------------------------------------------+*****
size = 24440
...
5479:2040:12:http://www.tu-chemnitz.de/docs/perl.html: size = 2579
          ^^??
Note the level here: 12 (?!). So none of the links in perl.html are dug.
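For comparing two such runs across more than a handful of URLs, a rough sketch like the following could list every URL whose level went up. It assumes the line shape seen in the excerpts (three numeric fields, then the URL, then a colon) and reads only the third field as the level; the meaning of the first two fields is not assumed. The names `parse_hops` and `hop_regressions` are made up for this illustration.

```python
import re

# Assumed line shape: "<n>:<n>:<level>:<url>:" optionally followed by
# progress marks or "size = ...". Only the third field is interpreted.
LINE_RE = re.compile(r"^(\d+):(\d+):(\d+):(\S+?):(?:\s|$)")


def parse_hops(log_text):
    """Map each URL in a verbose log excerpt to the level on its line."""
    hops = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            hops[match.group(4)] = int(match.group(3))
    return hops


def hop_regressions(old_log, new_log):
    """Return {url: (old_level, new_level)} for URLs whose level increased."""
    old, new = parse_hops(old_log), parse_hops(new_log)
    return {u: (old[u], new[u]) for u in old if u in new and new[u] > old[u]}
```

Feeding it the two excerpts above would flag both links.html (1 to 3) and perl.html (2 to 12), so the jump is already visible one link earlier in the chain.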
- Frank
--
Email: [EMAIL PROTECTED] http://www.tu-chemnitz.de/~fri/
Work: Computing Services, Technical University, 09107 Chemnitz, Germany