Anders Eriksson wrote:

> Hello,
>
> I have made a short program that, given a URL, will download all files
> referenced on that page.
>
> It works, but I'm thinking it could use some optimization since it's very
> slow.
>
> I create a list of tuples where each tuple consists of the URL of the file
> and the path where I want to save it, e.g.
> (http://somewhere.com/foo.mp3, c:\Music\foo.mp3)
>
> The downloading part (which is the part I need help with) looks like this:
>
> def GetFiles():
Consider passing 'hreflist' explicitly; global variables make your script
harder to manage in the long run.

>     """do the actual copying of files"""
>     for url, path in hreflist:
>         print(url, end=" ")

You can force Python to write out its internal buffer by calling
sys.stdout.flush(). You may also take a look at the logging package.

>         srcdata = urlopen(url).read()

For large files you would read the source in chunks:

    src = urlopen(url)
    with open(path, mode="wb") as dstfile:
        while True:
            chunk = src.read(2**20)
            if not chunk:
                break
            dstfile.write(chunk)

Instead of writing this loop yourself you can use

    shutil.copyfileobj(src, dstfile)

or even

    urllib.request.urlretrieve(url, path)

which also takes care of opening the file.

>         dstfile = open(path, mode='wb')
>         dstfile.write(srcdata)
>         dstfile.close()
>     print("Done!")
>
> hreflist is the list of tuples.
>
> At the moment the print(url, end=" ") will not be printed before the actual
> download; instead it will be printed at the same time as print("Done!").
> I would like this to work the way I intended.
>
> Is downloading a binary file using srcdata = urlopen(url).read()
> the best way? Is there some other way that would speed up the downloading?

The above method may not be faster (the operation is I/O-bound), but it can
handle large files gracefully.

Peter
--
http://mail.python.org/mailman/listinfo/python-list
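[Editor's note] The flushing advice above can be shown in a few lines. Since
Python 3.3, print() accepts flush=True directly, so an explicit
sys.stdout.flush() call is no longer needed. The report() helper and the
StringIO demonstration below are illustrative additions, not part of the
original thread:

```python
import io
import sys

def report(url, stream=None):
    """Print the URL immediately, before the download begins.

    flush=True forces the output buffer to be written right away;
    without it, the text may only appear once the (slow) download
    has finished and a newline is finally emitted.
    """
    stream = stream if stream is not None else sys.stdout
    print(url, end=" ", file=stream, flush=True)

# Demonstration against an in-memory stream instead of a real terminal:
buf = io.StringIO()
report("http://somewhere.com/foo.mp3", stream=buf)
```

Calling report() with no stream argument writes to sys.stdout, matching the
behaviour the original poster wanted.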
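[Editor's note] Putting the pieces of advice together — hreflist passed as a
parameter, immediate printing, and shutil.copyfileobj for the chunked copy — a
minimal sketch of the downloader might look like this. The opener parameter is
an illustrative addition (it lets the function be exercised without a network
connection) and is not part of the original thread:

```python
import shutil
from urllib.request import urlopen

def get_files(hreflist, opener=urlopen):
    """Download each (url, path) pair in hreflist.

    'opener' defaults to urllib.request.urlopen; it can be replaced
    with any callable returning a file-like object, e.g. for testing.
    """
    for url, path in hreflist:
        # flush=True makes the URL appear before the download starts.
        print(url, end=" ", flush=True)
        with opener(url) as src, open(path, mode="wb") as dstfile:
            # Copies in chunks, so large files never sit whole in memory.
            shutil.copyfileobj(src, dstfile)
        print("Done!")
```

A call such as get_files([("http://somewhere.com/foo.mp3",
r"c:\Music\foo.mp3")]) reproduces the original behaviour while keeping memory
use bounded.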