Shiva <shivaji...@yahoo.com.dmarc.invalid> Wrote in message:
> Hi,
>
> Here is a small code that I wrote that downloads images from a webpage url
> specified (you can limit to how many downloads you want). However, I am
> looking at adding functionality and searching external links from this page
> and downloading the same number of images from that page as well. (And
> limiting the depth it can go to)
>
> Any ideas? (I am using Python 3.4 & I am a beginner)
>
> import urllib.request
> import re
> url="http://www.abc.com"
>
> pagehtml = urllib.request.urlopen(url)
> myfile = pagehtml.read()
> matches=re.findall(r'http://\S+jpg|jpeg',str(myfile))
>
> for urltodownload in matches[0:50]:
>     imagename=urltodownload[-12:]
>     urllib.request.urlretrieve(urltodownload,imagename)
>
> print('Done!')
>
> Thanks,
> Shiva
I'm going to make the wild assumption that you can safely do both parses
using regex, and that finding the jpegs works well enough with your
present one.

First, make most of your present code a function fetch(), starting with
the pagehtml line and ending before the print. The function should take
two arguments, url and depthlimit. Then, at the end of the function, add
something like:

    if depthlimit > 0:
        matches = ...  # some regex that finds links
        for link in matches[:40]:
            fetch(link, depthlimit - 1)

Naturally, the rest of the top-level code needs to be moved after the
function definition, and is called by doing something like:

    fetch(url, 10)

to have a depth limit of 10.

-- 
DaveA
-- 
https://mail.python.org/mailman/listinfo/python-list
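To make the suggestion concrete, here is a rough sketch of the whole
restructured script. The link-finding regex, the maxdownloads parameter,
and the error handling are my own guesses, not from the original post.
I've also grouped the extensions in the jpeg pattern: as posted,
r'http://\S+jpg|jpeg' matches either "http://...jpg" or the bare string
"jpeg", because the | applies to the whole pattern.

```python
import re
import urllib.request


def fetch(url, depthlimit, maxdownloads=50):
    """Download up to maxdownloads images from url, then recurse
    into links found on the page until depthlimit reaches 0."""
    try:
        pagehtml = urllib.request.urlopen(url)
        myfile = str(pagehtml.read())
    except OSError:
        return  # skip pages that fail to load

    # (?:jpg|jpeg) groups the alternation so it applies only to
    # the extension, not to the whole pattern.
    matches = re.findall(r'http://\S+\.(?:jpg|jpeg)', myfile)

    for urltodownload in matches[:maxdownloads]:
        imagename = urltodownload[-12:]
        try:
            urllib.request.urlretrieve(urltodownload, imagename)
        except OSError:
            pass  # skip images that fail to download

    if depthlimit > 0:
        # Hypothetical link-finding regex -- crude, but in the same
        # spirit as the jpeg one.
        links = re.findall(r'href="(http://[^"]+)"', myfile)
        for link in links[:40]:
            fetch(link, depthlimit - 1)


# Example usage -- crawl the start page plus two levels of links:
# fetch("http://www.abc.com", 2)
# print('Done!')
```

A regex is a fragile way to find links on real pages; for anything
beyond a quick script, html.parser (in the standard library) is a
sturdier choice. Note also that the recursion as sketched will happily
revisit pages it has already seen, so a set of visited URLs would be a
sensible addition.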