webcomm wrote:
> Hi,
>
> Am I going to have problems if I use urlopen() in a loop to get data
> from 3000+ URLs? There will be about 2KB of data on average at each
> URL. I will probably run the script about twice per day. Data from
> each URL will be saved to my database.
>
> I'm asking because I've never opened that many URLs before in a loop.
> I'm just wondering if it will be particularly taxing for my server.
> Is it very uncommon to get data from so many URLs in a script? I
> guess search spiders do it, so I should be able to as well?
>
You shouldn't expect problems - though you might want to think about using a more advanced technique like threading to get your results back more quickly.
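Something along these lines would do as a rough, untested sketch - it assumes a Python with urllib.request and concurrent.futures available, and save_to_db() is just a placeholder for whatever database code you already have:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Each page is only ~2KB, so holding one response body in memory is cheap.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return url, resp.read()

def save_to_db(url, data):
    # Placeholder: replace with your own database insert/update.
    pass

def crawl(urls, workers=10):
    # A small pool of threads overlaps the network waits; results are
    # written out as they arrive instead of being retained in memory.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for url, data in pool.map(fetch, urls):
            save_to_db(url, data)

As written, one failed URL will abort the whole run; wrap the urlopen() call in try/except if you'd rather log the error and carry on.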
This is Python, though. It shouldn't take long to write a test program to verify that you can indeed spider 3,000 pages this way. At about 2KB per page you could probably build up an in-memory structure holding the whole content of every page without memory usage becoming excessive on a modern system. If you are writing the data out to a database as you go and not retaining page content, there should be no problems whatsoever.

Then look at a parallelized solution of some sort if you need it to work more quickly.

regards
 Steve
--
Steve Holden       +1 571 484 6266   +1 800 494 3119
Holden Web LLC     http://www.holdenweb.com/