Johnny Lee wrote:
> Steve Holden wrote:
>
>>Johnny Lee wrote:
>>
>>>Alex Martelli wrote:
>>>
>>>>Johnny Lee <[EMAIL PROTECTED]> wrote:
>>>>   ...
>>>>
>>>>>      try:
>>>>>         webPage = urllib2.urlopen(url)
>>>>>      except urllib2.URLError:
>>>>
>>>>   ...
>>>>
>>>>>      webPage.close()
>>>>>      return True
>>>>>----------------------------------------------------
>>>>>
>>>>> But every time the program reaches the 70th to 75th url (that is,
>>>>>after 70-75 urls have been tested this way), it crashes, and every
>>>>>url left raises urllib2.URLError until the program exits. I tried
>>>>>many ways to work around it: using urllib, putting a sleep(1) in
>>>>>the filter (I thought the sheer number of urls was crashing the
>>>>>program). None of them worked. BTW, if I set the url at which the
>>>>>program crashed as the base url, the program still crashes at the
>>>>>70th-75th url. How can I solve this problem? Thanks for your help.
>>>>
>>>>Sure looks like a resource leak somewhere (probably leaving a file
>>>>open until your program hits some wall of maximum simultaneously
>>>>open files), but I can't reproduce it here (MacOSX, tried both
>>>>Python 2.3.5 and 2.4.1). What version of Python are you using, and
>>>>on what platform? Maybe a simple Python upgrade might fix your
>>>>problem...
>>>>
>>>>Alex
>>>
>>>Thanks for the info you provided. I'm using 2.4.1 on Cygwin under
>>>WinXP. If you want to reproduce the problem, I can send you the
>>>source.
>>>
>>>This morning I found that the crash is caused by urllib2. When I use
>>>urllib instead of urllib2, it doesn't crash any more. The trouble is
>>>that I want to catch the HTTP 404 error, and urllib's FancyURLopener
>>>handles that internally in urllib.open(), so I can't catch it.
>>
>>I'm using exactly that configuration, so if you let me have that
>>source I could take a look at it for you.
>>[...]
>
> I've sent the source, thanks for your help.
> [...]
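For reference, on the 404 point above: urllib2 reports a 404 by raising
urllib2.HTTPError, a subclass of URLError that carries the status code
in its .code attribute, so a 404 can be caught without falling back to
urllib. A minimal sketch of such a filter (the function name checkURL
is invented for illustration, not taken from Johnny's source):

    import urllib2

    def checkURL(url):
        """Return True if url is fetchable, False on HTTP/network errors."""
        try:
            webPage = urllib2.urlopen(url)
        except urllib2.HTTPError, e:
            # HTTPError subclasses URLError and carries the HTTP status,
            # so a 404 can be trapped here explicitly.
            print 'HTTP %d for %s' % (e.code, url)
            return False
        except urllib2.URLError, e:
            # Any other network-level failure (DNS, refused connection, ...).
            print 'Error fetching %s: %s' % (url, e.reason)
            return False
        try:
            webPage.read()
        finally:
            # Close unconditionally, so an exception during read() cannot
            # leak the socket -- the kind of leak Alex suspects.
            webPage.close()
        return True

Closing in a finally block matters here: if read() raises, an unclosed
handle would leak exactly the sort of descriptor that eventually
exhausts a per-process limit.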
Preliminary result, in case this rings bells with people who use
urllib2 quite a lot. I modified the error case to report the actual
message returned with the exception, and I'm seeing things like:

http://www.holdenweb.com/./Python/webframeworks.html
    Message: <urlopen error (120, 'Operation already in progress')>
Start process http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
Error: IOError while parsing http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
    Message: <urlopen error (120, 'Operation already in progress')>
    .
    .
    .

So at least we know now what the error is, and it looks like some sort
of resource limit (though why only on Cygwin beats me) ... anyone,
before I start some serious debugging?

regards
 Steve
--
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
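One quick check on the resource-limit theory before the serious
debugging starts: compare the crash point (roughly 70-75 urls) with
the per-process file descriptor ceiling. A sketch, assuming Cygwin's
Python was built with the resource module (not guaranteed):

    import resource

    # If the crash count tracks the soft limit, a descriptor leak is
    # the likely culprit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print 'fd limit: soft=%d, hard=%d' % (soft, hard)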