John Nagle wrote:
> Paul Rubin wrote:
>> John Nagle <na...@animats.com> writes:
>>> Analysis of each domain is performed in a separate process, but each
>>> process uses multiple threads to read and process several web pages
>>> simultaneously.
>>>
>>> Some of the threads go compute-bound for a second or two at a time as
>>> they parse web pages.
>>
>> You're probably better off using separate processes for the different
>> pages.  If I remember, you were using BeautifulSoup, which while very
>> cool, is pretty doggone slow for use on large volumes of pages.  I don't
>> know if there's much that can be done about that without going off on a
>> fairly messy C or C++ coding adventure.  Maybe someday someone will do
>> that.
>
>    I already use separate processes for different domains.  I could
> live with Python's GIL as long as moving to a multicore server
> doesn't make performance worse.  That's why I asked about CPU dedication
> for each process, to avoid thrashing at the GIL.
>
I believe it's already been said that the GIL thrashing is mostly
MacOS-specific. You might also find something in the affinity module
(http://pypi.python.org/pypi/affinity/0.1.0) to ensure that each process
in your pool runs on only one processor.
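
Something along these lines might work, though it's only a sketch: it
assumes the affinity package exposes set_process_affinity_mask(pid, mask)
taking a CPU bitmask, and the worker/domain names below are made up for
illustration, not taken from anyone's actual code:

    # Sketch: pin each pool worker to a single core before it starts its
    # page-fetching threads.  Assumes affinity.set_process_affinity_mask
    # exists with this signature -- check the package docs first.
    import os
    import multiprocessing

    import affinity  # http://pypi.python.org/pypi/affinity/0.1.0


    def pin_to_core(core):
        # Restrict the calling process (and all of its threads) to one
        # core by passing a bitmask with only that core's bit set.
        affinity.set_process_affinity_mask(os.getpid(), 1 << core)

    def analyze_domain(job):
        core, domain = job
        pin_to_core(core)
        # ... spawn the page-fetching/parsing threads for this domain ...
        return domain

    if __name__ == '__main__':
        ncores = multiprocessing.cpu_count()
        domains = ['example.com', 'example.net', 'example.org']  # placeholder
        jobs = [(i % ncores, d) for i, d in enumerate(domains)]
        pool = multiprocessing.Pool(processes=ncores)
        print(pool.map(analyze_domain, jobs))

That way each process still uses the GIL internally, but two processes
never fight over cores with each other's threads.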
regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/