On Thu, 08 Feb 2007 10:20:56 -0800, k0mp wrote:

> On Feb 8, 6:54 pm, Leif K-Brooks <[EMAIL PROTECTED]> wrote:
>> k0mp wrote:
>> > Is there a way to retrieve a web page and, before it is entirely
>> > downloaded, begin to test if a specific string is present and, if yes,
>> > stop the download?
>> > I believe that urllib.openurl(url) will retrieve the whole page before
>> > the program goes to the next statement.
>>
>> Use urllib.urlopen(), but call .read() with a smallish argument, e.g.:
>>
>> >>> foo = urllib.urlopen('http://google.com')
>> >>> foo.read(512)
>> '<html><head> ...
>>
>> foo.read(512) will return as soon as 512 bytes have been received. You
>> can keep calling it until it returns an empty string, indicating that
>> there's no more data to be read.
>
> Thanks for your answer :)
>
> I'm not sure that read() works as you say.
> Here is a test I've done:
>
> import urllib2
> import re
> import time
>
> CHUNKSIZE = 1024
>
> print 'f.read(CHUNK)'
> print time.clock()
>
> for i in range(30):
>     f = urllib2.urlopen('http://google.com')
>     while True:  # read the page using a loop
>         chunk = f.read(CHUNKSIZE)
>         if not chunk: break
>         m = re.search('<html>', chunk)
>         if m != None:
>             break
>
> print time.clock()
>
> print
>
> print 'f.read()'
> print time.clock()
> for i in range(30):
>     f = urllib2.urlopen('http://google.com')
>     m = re.search('<html>', f.read())
>     if m != None:
>         break
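One detail the chunk-by-chunk search above glosses over: if the target string straddles a chunk boundary (e.g. `<ht` at the end of one chunk, `ml>` at the start of the next), searching each chunk in isolation will miss it. A minimal Python 3 sketch of a boundary-safe incremental search -- `read_until` is a hypothetical helper, not anything from the thread; it works on any file-like object, so it is shown here with io.BytesIO rather than a live URL:

```python
import io

def read_until(stream, needle, chunk_size=1024):
    """Read `stream` in chunks; return True as soon as `needle` appears.

    Keeps the last len(needle) - 1 bytes of the previous window so a
    match spanning two chunks is not missed.
    """
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # end of stream, needle never seen
            return False
        window = tail + chunk
        if needle in window:
            return True        # stop reading: the rest is never fetched
        # carry over just enough bytes to catch a straddling match
        tail = window[-(len(needle) - 1):] if len(needle) > 1 else b""

# The needle here deliberately straddles the first 1024-byte boundary:
data = b"a" * 1021 + b"<html>" + b"b" * 100
print(read_until(io.BytesIO(data), b"<html>"))  # True
```

With a response object from urllib (which is also file-like), the same function would stop issuing reads once the string is found, which is exactly the early-exit behaviour the original poster was after.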
A fair comparison would use "pass" here (or a while loop, as in the first case). As written, that final `break` leaves the outer `for` loop after the first match, so the test compares 30 calls of read(CHUNKSIZE) against a single call of read().

Björn
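A Python 3 sketch of what the corrected harness could look like. `fetch()` is a hypothetical stub standing in for urllib2.urlopen('http://google.com'), so the timing logic is self-contained and runs without network access; only the loop structure matters here:

```python
import io
import re
import time

PAGE = b"<html><head><title>stub</title></head>" + b"x" * 100_000
fetch_count = 0

def fetch():
    # Hypothetical stand-in for urllib2.urlopen('http://google.com').
    global fetch_count
    fetch_count += 1
    return io.BytesIO(PAGE)

CHUNKSIZE = 1024

# Variant 1: chunked read; stop reading *this response* once found.
start = time.perf_counter()
for i in range(30):
    f = fetch()
    while True:
        chunk = f.read(CHUNKSIZE)
        if not chunk:
            break
        if re.search(b'<html>', chunk):
            break  # inner break: ends only this response's read loop
chunked = time.perf_counter() - start

# Variant 2: whole-page read.  Crucially there is no `break` in this
# for loop, so it also performs all 30 fetches -- the posted version
# broke out of the outer loop after the first iteration.
start = time.perf_counter()
for i in range(30):
    f = fetch()
    re.search(b'<html>', f.read())
whole = time.perf_counter() - start

print(fetch_count)  # 60: both variants fetched 30 times
```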