On Jun 15, 2:29 pm, [EMAIL PROTECTED] wrote:
> I wrote a Python program (103 lines, below) to download developer data
> from SourceForge for research about social networks.
>
> Please critique the code and let me know how to improve it.
>
> An example use of the program:
>
> prompt> python download.py 1 240000
>
> The above command downloads data for the projects with IDs between 1
> and 240000, inclusive. As it runs, it prints status messages, with a
> plus sign meaning that the project ID exists. Else, it prints a minus
> sign.
>
> Questions:
>
> --- Are my setup and use of threads, the queue, and "while True" loop
> correct or conventional?
>
> --- Should the program sleep sometimes, to be nice to the SourceForge
> servers, and so they don't think this is a denial-of-service attack?
>
> --- Someone told me that popen is not thread-safe, and to use
> mechanize. I installed it and followed an example on the web site.
> There wasn't a good description of it on the web site, or I didn't
> find it. Could someone explain what mechanize does?
>
> --- How do I choose the number of threads? I am using a MacBook Pro
> 2.4GHz Intel Core 2 Duo with 4 GB 667 MHz DDR2 SDRAM, running OS
> 10.5.3.
>
> Thank you.
>
> Winston
>
[snip]

String methods are quicker than regular expressions, so don't use
regular expressions if string methods are perfectly adequate. For
example, you can replace:

    error_pattern = re.compile(".*\n<!--pageid login -->\n.*", re.DOTALL)
    ...
    valid_id = not error_pattern.match(text)

with:

    error_pattern = "\n<!--pageid login -->\n"
    ...
    valid_id = error_pattern not in text
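
If you want to convince yourself that the two checks agree, here is a minimal sketch. The marker string and the regular expression are the ones from the program; the two sample pages and the helper names valid_id_regex and valid_id_substring are made up for illustration, and the timing at the end is only a rough comparison whose result will vary by machine and Python version.

```python
import re
import timeit

# Marker and pattern as used in the original program.
MARKER = "\n<!--pageid login -->\n"
error_pattern = re.compile(".*\n<!--pageid login -->\n.*", re.DOTALL)


def valid_id_regex(text):
    # Original approach: the ID is valid if the login marker is NOT matched.
    return not error_pattern.match(text)


def valid_id_substring(text):
    # Suggested approach: a plain substring test on the same marker.
    return MARKER not in text


# Hypothetical page bodies, invented for this example only.
login_page = "<html>\n<!--pageid login -->\n<body>Please log in</body></html>"
project_page = "<html>\n<!--pageid project -->\n<body>Developers</body></html>"

# Both checks should give the same answer for every page.
for page in (login_page, project_page):
    assert valid_id_regex(page) == valid_id_substring(page)

if __name__ == "__main__":
    # Rough timing on a larger made-up page; absolute numbers are not meaningful.
    big_page = project_page * 1000
    for func in (valid_id_regex, valid_id_substring):
        seconds = timeit.timeit(lambda: func(big_page), number=100)
        print(func.__name__, seconds)
```

The two forms are interchangeable here because the pattern starts and ends with ".*" under re.DOTALL, so match() succeeds exactly when the marker occurs anywhere in the text.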