Hi Gilles,

If I set max_hop_count to 0, it only fetches the first page of each site. I want it 
to fetch one page further, so max_hop_count needs to be 1. What happens instead is 
that the fetch goes beyond the 1800 domains, when it is supposed to reject any domain 
that is not in start_url...

Any suggestions? By the way, it works fine when there are fewer domains, say 1500. 
Very strange...

Dann Cohen - Dir., Outsourcing and Information Systems
Toxik Technologies Inc. - Montreal, QC, Canada
www.toxik.com - Phone: (514) 528-6945 x 2 . Fax: (514) 221-3329


-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: January 4, 2001 12:04
To: Toxik - Dann Cohen
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig3-dev] Fetching outside of domain list (not supposed
to)


According to Toxik - Dann Cohen:
> I'm a newcomer to this list (a 6-month user of ht://dig), and before 
> saying anything I would like to say hi to everyone. Now to the good 
> stuff =)
> 
> I've encountered a problem with the fetching part. I have about 1800 sites 
> in my "start_url" to fetch with a "max_hop_count" of 1, and it seems to 
> go beyond the 1800.
> 
> HTTP statistics
> ===============
>  Persistent connections    : Yes
>  HEAD call before GET      : No
>  Connections opened        : 14973
>  Connections closed        : 14973
>  Changes of server         : 6030
>  HTTP Requests             : 35357
>  HTTP KBytes requested     : 209216
>  HTTP Average request time : 0.647679 secs
>  HTTP Average speed        : 9.13605 KBytes/secs
> 
> As you can see, the value of "Changes of server" is higher than 1800. I can 
> also see in the log that it goes beyond the domain (see below for an 
> example): the domain is www.singapore-inc.com, and you can see that a 
> "mailto:" link and "www.sedb.com.sg" are pushed in. The problem doesn't happen 
> when I fetch the sites alone. Any suggestions or hints are welcome.

If you haven't already figured it out, you should be setting max_hop_count
to 0, not 1.  One hop means it will attempt to follow all the valid links
in those initial 1800 documents.
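
To make the hop-count behaviour concrete, here is a small Python sketch of a
hop-limited crawl over an in-memory link table. It is only an illustration of
the idea, not ht://Dig's actual code: the function crawl(), the LINKS table,
the about.html page and the mailto address are all made up for the example,
and the host check is an assumption about how "reject domains not in
start_url" is meant to work.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_urls, links, max_hop_count, limit_to_start_hosts=True):
    """Return the URLs that would be fetched, in breadth-first order.

    `links` maps a URL to the list of links found in that document
    (a stand-in for real fetching and parsing).
    """
    # Hosts of the start URLs; links to any other host are rejected
    # when limit_to_start_hosts is True.
    allowed_hosts = {urlparse(u).hostname for u in start_urls}
    seen = set(start_urls)
    queue = deque((u, 0) for u in start_urls)
    fetched = []
    while queue:
        url, hops = queue.popleft()
        fetched.append(url)
        # A document already at the hop limit is fetched, but its links
        # are not followed. With a limit of 0, only start URLs are fetched.
        if hops >= max_hop_count:
            continue
        for target in links.get(url, []):
            parsed = urlparse(target)
            # Non-HTTP links such as mailto: are never fetched.
            if parsed.scheme not in ("http", "https"):
                continue
            if limit_to_start_hosts and parsed.hostname not in allowed_hosts:
                continue
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return fetched


# Hypothetical link table; only the two host names come from the report.
LINKS = {
    "http://www.singapore-inc.com/": [
        "http://www.singapore-inc.com/about.html",
        "http://www.sedb.com.sg/",
        "mailto:someone@example.com",
    ],
}
START = ["http://www.singapore-inc.com/"]

print(crawl(START, LINKS, 0))
print(crawl(START, LINKS, 1))
print(crawl(START, LINKS, 1, limit_to_start_hosts=False))
```

In this sketch a limit of 0 fetches only the start page, and a limit of 1 adds
the pages it links to. The off-site host only shows up when the host
restriction is switched off, which is the symptom described in the report.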

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



