Hi, I am using urllib2 to grab URLs from the web. Here is the workflow of my program:
1. Get the base URL and the maximum number of URLs from the user.
2. Call the filter to validate the base URL.
3. Read the source of the base URL and grab all the URLs from the "href" attribute of every "a" tag.
4. Call the filter to validate every URL grabbed.
5. Repeat steps 3-4 until the number of URLs grabbed reaches the limit.

In the filter there is a method like this (a stripped-down sketch of the surrounding crawl loop is at the end of this message):

--------------------------------------------------
import urllib2

# Check whether the URL can be connected to.
def filteredByConnection(self, url):
    assert url
    try:
        webPage = urllib2.urlopen(url)
    except urllib2.HTTPError:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        self.logGenerator.log("Error: " + url + " not found")
        return False
    except urllib2.URLError:
        self.logGenerator.log("Error: " + url + " <urlopen error timed out>")
        return False
    self.logGenerator.log("Connecting " + url + " succeeded")
    webPage.close()
    return True
----------------------------------------------------

But every time, once 70 to 75 URLs have been tested this way, the program breaks down: every remaining URL raises urllib2.URLError until the program exits. I have tried many ways to work around it, such as switching to urllib and putting a sleep(1) in the filter (I thought the sheer number of requests was crashing the program), but none of them works. By the way, if I set the URL at which the program crashed as the base URL, the program still crashes at around the 70th-75th URL. How can I solve this problem? Thanks for your help.

Regards,
Johnny
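P.S. In case it helps to see the shape of the program, here is a stripped-down sketch of the crawl loop from steps 1-5 above. It is simplified, not my exact code: the UrlFilter class and grab_links function stand in for my real helpers (which also do logging), and the HTML parsing is just a regex for brevity.

--------------------------------------------------
import re
import urllib2

# Illustrative stand-in for my filter class.
class UrlFilter(object):
    def filteredByConnection(self, url):
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError:
            return False
        except urllib2.URLError:
            return False
        page.close()
        return True

def grab_links(url):
    # Read the page source and pull every href out of the <a> tags.
    # (A regex is enough for this sketch.)
    page = urllib2.urlopen(url)
    source = page.read()
    page.close()
    return re.findall(r'<a[^>]+href="([^"]+)"', source)

def crawl(base_url, max_urls):
    url_filter = UrlFilter()
    if not url_filter.filteredByConnection(base_url):   # step 2
        return []
    grabbed = []
    queue = [base_url]
    while queue and len(grabbed) < max_urls:             # step 5
        current = queue.pop(0)
        for link in grab_links(current):                 # step 3
            if len(grabbed) >= max_urls:
                break
            if url_filter.filteredByConnection(link):    # step 4
                grabbed.append(link)
                queue.append(link)
    return grabbed
----------------------------------------------------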