Re: multiprocessing vs thread performance
En Wed, 07 Jan 2009 23:05:53 -0200, James Mills prolo...@shortcircuit.net.au wrote:

Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch of good examples for it? I am trying to break a loop to run it over multiple cores in a system. And I need to return an integer value as the result of each process and accumulate all of them. In the examples that I found there is no return value for the process.

You communicate with the process in one of several ways:
* Semaphores
* Locks
* Pipes

The Pool class provides a more abstract view that may be better suited in this case. Just create a pool, and use map_async to collect and summarize the results.

import string
import multiprocessing

def count(args):
    (lineno, line) = args
    print "This is %s, processing line %d\n" % (
        multiprocessing.current_process().name, lineno),
    result = dict(letters=0, digits=0, other=0)
    for c in line:
        if c in string.letters:
            result['letters'] += 1
        elif c in string.digits:
            result['digits'] += 1
        else:
            result['other'] += 1
    # just to make some random delay
    import time; time.sleep(len(line)/100.0)
    return result

if __name__ == '__main__':
    summary = dict(letters=0, digits=0, other=0)

    def summary_add(results):
        # this is called with a list of results
        for result in results:
            summary['letters'] += result['letters']
            summary['digits'] += result['digits']
            summary['other'] += result['other']

    # count letters on this same script
    f = open(__file__, 'r')
    pool = multiprocessing.Pool(processes=6)
    # invoke count((lineno, line)) for each line in the file
    pool.map_async(count, enumerate(f), 10, summary_add)
    pool.close()   # no more jobs
    pool.join()    # wait until done

    print summary

-- Gabriel Genellina
-- http://mail.python.org/mailman/listinfo/python-list
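Since the original question was specifically about getting an integer back from each worker and accumulating the results, here is a minimal sketch of the same Pool idea, written in Python 3 syntax for convenience (count_digits and total_digits are invented names for illustration, not code from this thread):

```python
import multiprocessing

def count_digits(line):
    # one unit of work; returns an integer result to the parent
    return sum(c.isdigit() for c in line)

def total_digits(lines):
    # Pool.map collects one return value per input item, in order
    pool = multiprocessing.Pool(processes=4)
    try:
        partials = pool.map(count_digits, lines)
    finally:
        pool.close()
        pool.join()
    return sum(partials)  # accumulate the per-worker integers

if __name__ == "__main__":
    print(total_digits(["abc123", "42", "no digits here"]))  # prints 5
```

The return values cross the process boundary by pickling, so this works for any picklable result, not just integers.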
Re: multiprocessing vs thread performance
Hi All,

Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch of good examples for it? I am trying to break a loop to run it over multiple cores in a system. And I need to return an integer value as the result of each process and accumulate all of them. In the examples that I found there is no return value for the process.

Thanks,
-Arash

On Mon, Jan 5, 2009 at 7:24 PM, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

En Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood n...@craig-wood.com wrote:

mk mrk...@gmail.com wrote: The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below?

Yes! The problem with your code is that you never start more than one process at once in the multiprocessing example. Just check ps when it is running and you will see.

Oh, very good analysis! Those results were worrying me a little.

-- Gabriel Genellina
-- http://mail.python.org/mailman/listinfo/python-list
-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Thu, Jan 8, 2009 at 10:55 AM, Arash Arfaee erex...@gmail.com wrote:

Hi All,

HI :)

Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch of good examples for it? I am trying to break a loop to run it over multiple cores in a system. And I need to return an integer value as the result of each process and accumulate all of them. In the examples that I found there is no return value for the process.

You communicate with the process in one of several ways:
* Semaphores
* Locks
* Pipes

I prefer to use Pipes, which act much like sockets (in fact they are). Read the docs and let us know how you go :) I'm actually implementing multiprocessing support into circuits (1) right now...

cheers
James

1. http://trac.softcircuit.com.au/circuits/
-- http://mail.python.org/mailman/listinfo/python-list
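A minimal sketch of the Pipe approach James mentions, in Python 3 syntax (worker and run are invented names for illustration): the child computes an integer and sends it back to the parent over the pipe.

```python
import multiprocessing

def worker(conn, n):
    # compute an integer result and send it back through the pipe
    conn.send(n * n)
    conn.close()

def run(n):
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn, n))
    p.start()
    result = parent_conn.recv()  # blocks until the child sends
    p.join()
    return result

if __name__ == "__main__":
    print(run(7))  # prints 49
```

For accumulating results from many workers, one pipe per worker (or a shared multiprocessing.Queue) would be the natural extension of this sketch.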
Re: multiprocessing vs thread performance
There doesn't seem to be any good examples on POSH, or it's not clear to me, for using it with a for loop like the one mk, who started this thread, is using. How would something like this be possible to do with POSH? The examples show how to share variables between processes/threads but nothing about how the thread starts, or a for loop.

-Alex Goretoy
http://www.alexgoretoy.com
somebodywhoca...@gmail.com

On Sat, Jan 3, 2009 at 1:31 PM, Nick Craig-Wood n...@craig-wood.com wrote:

mk mrk...@gmail.com wrote: After reading http://www.python.org/dev/peps/pep-0371/ I was under the impression that performance of the multiprocessing package is similar to that of thread / threading. However, to familiarize myself with both packages I wrote my own test of spawning and returning 100,000 empty threads or processes (while maintaining at most 100 processes / threads active at any one time), respectively. The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below?

Yes! The problem with your code is that you never start more than one process at once in the multiprocessing example. Just check ps when it is running and you will see.

My conjecture is that this is due to the way fork() works under unix. I think that when the parent forks it yields the CPU to the child. Because you are giving the child effectively no work to do, it returns immediately, re-awakening the parent, thus serialising your jobs. If you give the children some work to do you'll see a quite different result. I gave each child time.sleep(1) to do and cut down the total number to 10,000.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 101.382129192

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 100.659118176

So almost identical results and as expected - we ran 10,000 sleep(1)s in 100 seconds, so we must have been running 100 simultaneously.

If you replace the time.sleep(1) with

    for _ in xrange(100): pass

you get this much more interesting answer on my dual core linux laptop, showing nicely the effect of the contention on the python global interpreter lock and how multiprocessing avoids it.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 266.808327913

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 834.81882

-- Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
-- http://mail.python.org/mailman/listinfo/python-list
-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
En Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood n...@craig-wood.com wrote:

mk mrk...@gmail.com wrote: The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below?

Yes! The problem with your code is that you never start more than one process at once in the multiprocessing example. Just check ps when it is running and you will see.

Oh, very good analysis! Those results were worrying me a little.

-- Gabriel Genellina
-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
mk mrk...@gmail.com wrote:

After reading http://www.python.org/dev/peps/pep-0371/ I was under the impression that performance of the multiprocessing package is similar to that of thread / threading. However, to familiarize myself with both packages I wrote my own test of spawning and returning 100,000 empty threads or processes (while maintaining at most 100 processes / threads active at any one time), respectively. The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below?

Yes! The problem with your code is that you never start more than one process at once in the multiprocessing example. Just check ps when it is running and you will see.

My conjecture is that this is due to the way fork() works under unix. I think that when the parent forks it yields the CPU to the child. Because you are giving the child effectively no work to do, it returns immediately, re-awakening the parent, thus serialising your jobs. If you give the children some work to do you'll see a quite different result. I gave each child time.sleep(1) to do and cut down the total number to 10,000.
$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 101.382129192

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 100.659118176

So almost identical results and as expected - we ran 10,000 sleep(1)s in 100 seconds, so we must have been running 100 simultaneously.

If you replace the time.sleep(1) with

    for _ in xrange(100): pass

you get this much more interesting answer on my dual core linux laptop, showing nicely the effect of the contention on the python global interpreter lock and how multiprocessing avoids it.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 266.808327913

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time: 834.81882

-- Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
-- http://mail.python.org/mailman/listinfo/python-list
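Nick's experiment can be sketched in compact form (Python 3 syntax; the function names and iteration counts here are illustrative, not his actual scripts): time the same CPU-bound function run by k threads versus k processes. On a multi-core machine the process version can use all cores, while the thread version is serialized by the GIL.

```python
import multiprocessing
import threading
import time

def busy():
    # CPU-bound work; within one interpreter the GIL serializes this
    n = 0
    for _ in range(10**6):
        n += 1
    return n

def timed_threads(k):
    threads = [threading.Thread(target=busy) for _ in range(k)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0

def timed_processes(k):
    procs = [multiprocessing.Process(target=busy) for _ in range(k)]
    t0 = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - t0

if __name__ == "__main__":
    print("threads:   %.3fs" % timed_threads(4))
    print("processes: %.3fs" % timed_processes(4))
```

Note that process creation itself has a cost, so for such a short busy() the process version may not win; the effect Nick describes shows up as the per-task work grows.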
Re: multiprocessing vs thread performance
On Dec 29, 9:29 am, mk mrk...@gmail.com wrote:

Christian Heimes wrote: mk wrote: Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

You have missed an important point. A well designed application neither creates so many threads nor processes.

Except I was not developing a well designed application but writing a test the goal of which was measuring the thread / process creation cost.

The creation of a thread or forking of a process is an expensive operation.

Sure. The point is, how expensive? While still being relatively expensive, it turns out that in Python creating a thread is much, much cheaper than creating a process via multiprocessing on Linux, while this seems to be not necessarily true on Mac OS X.

You should use a pool of threads or processes.

Probably true, except, again, that was not quite the point of this exercise.

The limiting factor is not the creation time but the communication and synchronization overhead between multiple threads or processes.

Which I am probably going to test as well.

I had an idea. You could use 'multiprocessing' for synchronization, and just use an mmap for the actual communication. (I think it's a good idea.)

-- http://mail.python.org/mailman/listinfo/python-list
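Aaron's mmap-plus-multiprocessing idea could look roughly like this (a sketch only, in Python 3 syntax, assuming a file-backed mapping; worker and run are invented names): a multiprocessing.Lock serializes the writers, while the mmap'd file carries the shared bytes without going through pickling.

```python
import mmap
import multiprocessing
import os
import tempfile

SIZE = 1024

def worker(path, lock, index):
    # 'multiprocessing' handles synchronization; mmap carries the data
    with open(path, "r+b") as f:
        m = mmap.mmap(f.fileno(), SIZE)
        with lock:  # serialize access to the shared region
            m[index] = 65 + index  # write one byte: 'A', 'B', 'C', ...
        m.flush()
        m.close()

def run(nworkers=4):
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x00" * SIZE)  # the file backs the shared mapping
    os.close(fd)
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=worker, args=(path, lock, i))
             for i in range(nworkers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path, "rb") as f:
        data = f.read(nworkers)
    os.unlink(path)
    return data

if __name__ == "__main__":
    print(run())  # b'ABCD'
```

Here each worker writes to a distinct byte, so the lock is only strictly needed if workers touched overlapping regions; it is kept to show the synchronization half of the idea.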
Re: multiprocessing vs thread performance
Aaron Brady castiro...@gmail.com writes: I had an idea. You could use 'multiprocessing' for synchronization, and just use an mmap for the actual communication. (I think it's a good idea.) Are you reinventing POSH? (http://poshmodule.sourceforge.net) -- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Dec 31, 2008, at 7:19 PM, Paul Rubin wrote: Aaron Brady castiro...@gmail.com writes: I had an idea. You could use 'multiprocessing' for synchronization, and just use an mmap for the actual communication. (I think it's a good idea.) Are you reinventing POSH? (http://poshmodule.sourceforge.net) Or sysv_ipc? Or posix_ipc? http://semanchuk.com/philip/sysv_ipc/ http://semanchuk.com/philip/posix_ipc/ Bug fix version of the latter coming out in a day or two... -- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Dec 31, 6:19 pm, Paul Rubin http://phr...@nospam.invalid wrote: Aaron Brady castiro...@gmail.com writes: I had an idea. You could use 'multiprocessing' for synchronization, and just use an mmap for the actual communication. (I think it's a good idea.) Are you reinventing POSH? (http://poshmodule.sourceforge.net)

I thought the same thing! Paul Boddie introduced me to POSH some months ago. I don't recall the implementation of synchro. objects in POSH, but IIRC, 'multiprocessing' uses native OS handles opened in multiple processes. It could be that there would be substantial overlap, or the trade-offs could be significant. For one, the above combination only permits strings to be shared, not objects.

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Dec 29, 9:08 pm, James Mills prolo...@shortcircuit.net.au wrote: On Tue, Dec 30, 2008 at 12:52 PM, Aaron Brady castiro...@gmail.com wrote: On Dec 29, 7:40 pm, James Mills prolo...@shortcircuit.net.au wrote: On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote: The OP may be interested in Erlang, which Wikipedia (end-all, be-all) claims is a 'distribution oriented language'. snip

I'm presently looking at Virtual Synchrony and other distributed processing architectures - but circuits is meant to be general purpose enough to fit event-driven applications/systems.

I noticed a while ago that threads can be used to simulate generators. 'next', 'send', and 'yield' are merely replaced by synchronizor calls (coining the term). Not the other way around, though, unless the generator guarantees a yield frequently. 'settrace' anyone?

Aaron, circuits doesn't use generators :) What did your comment have to do with this? I have often seen generators used to facilitate coroutine and cooperative programming though.

cheers
James

James, Hi. I'm glad you asked; I never know how out there my comments are (but surmise that feedback is always a good thing). What I was thinking was: I didn't know Virtual Synchrony, and I've never used Erlang, but I'm interested in concurrency, especially as it pertains to units of work, division of labor, and division of context; and generators are another way to divide context. So I wanted to put more of my background and interests on the table. What I said wasn't directly relevant, I see. But it's not like I dissertated (discussed) the Tibetan-Austrian spice trade. I think I just want to say stuff about threading! Maybe I'm just excited to meet people who share my interests... not unheard of.

In Economics, you can divide a market into vertical and horizontal dimensions, vertical being the chain from raw resources to finished products, and horizontal being market coverage.

With a similar division in tasks, horizontal units would handle incoming events, prepare them, then pass them on to a next unit, which processes a little, then passes it on, like an assembly line (bucket brigade / alternating current); and vertical units would take one incoming event and see it through start to finish (direct current). You don't have to use that depiction. The terminology is transposed from that of a distributed matrix multiplication task depiction, where horizontal works start-to-finish, and vertical takes handoffs from its predecessor.

'Circuits' doesn't use generators. I think generators are an underexplored technique. Is 'circuits' assembly line or start-to-finish, if my analogy makes any sense? 'Circuits' is event-driven, but I don't see any difference between 'event-driven' and multithreaded in general. (I think contrast can create a good picture and a clear understanding.) What is special about an 'event-driven' architecture? Are you distinguishing blocking from polling?

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Wed, Dec 31, 2008 at 12:29 AM, Aaron Brady castiro...@gmail.com wrote: James, Hi. I'm glad you asked; I never know how out there my comments are (but surmise that feedback is always a good thing). What I was thinking was: I didn't know Virtual Synchrony, and I've never used Erlang, but I'm interested in concurrency, especially as it pertains to units of work, division of labor, and division of context; and generators are another way to divide context. So I wanted to put more of my background and interests on the table. What I said wasn't directly relevant, I see. But it's not like I dissertated (discussed) the Tibetan-Austrian spice trade. I think I just want to say stuff about threading! Maybe I'm just excited to meet people who share my interests... not unheard of.

Glad to see others also interested in these topics :)

(snip)

'Circuits' doesn't use generators. I think generators are an underexplored technique. Is 'circuits' assembly line or start-to-finish, if my analogy makes any sense? 'Circuits' is event-driven, but I don't see any difference between 'event-driven' and multithreaded in general. (I think contrast can create a good picture and a clear understanding.) What is special about an 'event-driven' architecture? Are you distinguishing blocking from polling?

I'll shortly be releasing circuits-1.0 - today, hopefully. To answer your question, circuits is inspired by a software architecture that my most favoured lecturer a few years back was teaching. That is:

* Behaviour Trees (design)

and consequently:

* The concept of everything is a Component
* Systems and Sub-Systems are built upon Components
* Everything is an event.
* An emergent property of such systems is Behaviour.

That being said, circuits employs both an event-driven approach as well as a Component architecture. In your analogy it is both horizontal and vertical.

As I continue to develop circuits and improve its core design, as well as building its ever-growing set of Components, I try to keep it as general as possible - my main aim though is distributed processing and architectures. (See the primes example).

Thanks for sharing your interest :)

cheers
James

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Wed, Dec 31, 2008 at 8:42 AM, James Mills prolo...@shortcircuit.net.au wrote: (snip) As I continue to develop circuits and improve its core design, as well as building its ever-growing set of Components, I try to keep it as general as possible - my main aim though is distributed processing and architectures. (See the primes example).

Aaron, just wanted to demonstrate to you the example (primes) that I mentioned above:

On Terminal A:

jmi...@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b 127.0.0.1:8000 -p 1000 -w
Total Primes: 1001 (23/s after 44.16s)
Total Events: 43373 (983/s after 44.16s)
Distribution:
c1096b40-7606-4ba7-9593-1385e14ef339: 348
8313b43f-d45d-4a0a-8d87-e6a93d3dfb0b: 653

On Terminal B:

jmi...@atomant:~/other/circuits/examples$ ./primes.py -b 127.0.0.1:8001 -s 127.0.0.1:8000

The example uses circuits to distribute work amongst any arbitrary number of nodes connected to the system. This is all manually implemented by simply taking advantage of the circuits framework with my basic understanding ( so far ) of distributed processing, virtual synchrony, and other techniques.

This is the same thing just run on a single node (no distribution):

jmi...@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b 127.0.0.1:8000 -p 1000
Total Primes: 1001 (17/s after 62.13s)
Total Events: 28579 (460/s after 62.13s)
Distribution:
701be749-9833-40a4-9181-5ee18047b1ad: 1001

As you can see, running 2 instances almost halves the time. If you do try this though, you'll note that the CPU isn't being used very heavily at all - this could be improved - but the example merely demonstrates distributed event processing and synchronization.

cheers
James

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
Christian Heimes li...@cheimes.de wrote: You have missed an important point. A well designed application neither creates so many threads nor processes. The creation of a thread or forking of a process is an expensive operation.

That actually depends on the operating system. As one example, thread creation on Windows is not all that expensive.

You should use a pool of threads or processes.

Even so, this is good advice.

-- Tim Roberts, t...@probo.com
Providenza Boekelheide, Inc.
-- http://mail.python.org/mailman/listinfo/python-list
multiprocessing vs thread performance
Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under the impression that performance of the multiprocessing package is similar to that of thread / threading. However, to familiarize myself with both packages I wrote my own test of spawning and returning 100,000 empty threads or processes (while maintaining at most 100 processes / threads active at any one time), respectively.

The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete!

Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

# multithreaded version
#!/usr/local/python2.6/bin/python
import thread
import time

class TCalc(object):

    def __init__(self):
        self.tactivnum = 0
        self.reslist = []
        self.tid = 0
        self.tlock = thread.allocate_lock()

    def testth(self, tid):
        if tid % 1000 == 0:
            print "== Thread %d working ==" % tid
        self.tlock.acquire()
        self.reslist.append(tid)
        self.tactivnum -= 1
        self.tlock.release()

    def calc_100thousand(self):
        tid = 1
        while tid <= 100000:
            while self.tactivnum > 99:
                time.sleep(0.01)
            self.tlock.acquire()
            self.tactivnum += 1
            self.tlock.release()
            t = thread.start_new_thread(self.testth, (tid,))
            tid += 1
        while self.tactivnum > 0:
            time.sleep(0.01)

if __name__ == "__main__":
    tc = TCalc()
    tstart = time.time()
    tc.calc_100thousand()
    tend = time.time()
    print "Total time: ", tend-tstart

# multiprocessing version
#!/usr/local/python2.6/bin/python
import multiprocessing
import time

def testp(pid):
    if pid % 1000 == 0:
        print "== Process %d working ==" % pid

def palivelistlen(plist):
    pll = 0
    for p in plist:
        if p.is_alive():
            pll += 1
        else:
            plist.remove(p)
            p.join()
    return pll

def testp_100thousand():
    pid = 1
    proclist = []
    while pid <= 100000:
        while palivelistlen(proclist) > 99:
            time.sleep(0.01)
        p = multiprocessing.Process(target=testp, args=(pid,))
        p.start()
        proclist.append(p)
        pid += 1
    print "=== Main thread waiting for all processes to finish ==="
    for p in proclist:
        p.join()

if __name__ == "__main__":
    tstart = time.time()
    testp_100thousand()
    tend = time.time()
    print "Total time:", tend - tstart

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On 29 Gru, 15:52, mk mrk...@gmail.com wrote: Hello everyone, After reading http://www.python.org/dev/peps/pep-0371/ I was under the impression that performance of the multiprocessing package is similar to that of thread / threading. However, to familiarize myself with both packages I wrote my own test of spawning and returning 100,000 empty threads or processes (while maintaining at most 100 processes / threads active at any one time), respectively. The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

Oooh, 100,000 processes! You're fortunate that your OS handled them in finite time. [quick browsing through the code] Ah, so there are 100 processes at a time. 200 secs still doesn't sound strange.

JW

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
janislaw wrote: Ah, so there are 100 processes at a time. 200 secs still doesn't sound strange.

I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)   0.05 seconds
threaded (1 threads)     0.000235 seconds
processes (1 procs)      0.002607 seconds
non_threaded (2 iters)   0.06 seconds
threaded (2 threads)     0.000461 seconds
processes (2 procs)      0.004514 seconds
non_threaded (4 iters)   0.08 seconds
threaded (4 threads)     0.000897 seconds
processes (4 procs)      0.008557 seconds
non_threaded (8 iters)   0.10 seconds
threaded (8 threads)     0.001821 seconds
processes (8 procs)      0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code was written on Mac OS X. The conclusion I get from comparing the above costs is that OS X must have a very low cost of creating processes, at least when compared to Linux, not that multiprocessing is a viable alternative to the thread / threading module. :-(

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
mk wrote: Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

You have missed an important point. A well designed application neither creates so many threads nor processes. The creation of a thread or forking of a process is an expensive operation. You should use a pool of threads or processes.

The limiting factor is not the creation time but the communication and synchronization overhead between multiple threads or processes.

Christian

-- http://mail.python.org/mailman/listinfo/python-list
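For illustration, a hand-rolled pool of the kind Christian suggests might look like this (Python 3 syntax; pool_map is an invented helper, and for CPU-bound work a process pool would be preferable because of the GIL): a fixed number of worker threads pull tasks from a shared queue, so only nworkers threads are ever created regardless of how many tasks there are.

```python
import queue
import threading

def pool_map(func, items, nworkers=4):
    # a minimal thread pool: workers pull (index, item) tasks from a queue
    tasks = queue.Queue()
    results = [None] * len(items)

    def worker():
        while True:
            try:
                i, item = tasks.get_nowait()
            except queue.Empty:
                return  # no tasks left; worker exits
            results[i] = func(item)

    # enqueue everything before starting, so Empty really means "done"
    for pair in enumerate(items):
        tasks.put(pair)
    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(pool_map(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The same shape exists ready-made as multiprocessing.Pool (and, in later Python versions, concurrent.futures), which also amortizes creation cost across many tasks.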
Re: multiprocessing vs thread performance
Christian Heimes wrote: mk wrote: Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

You have missed an important point. A well designed application neither creates so many threads nor processes.

Except I was not developing a well designed application but writing a test the goal of which was measuring the thread / process creation cost.

The creation of a thread or forking of a process is an expensive operation.

Sure. The point is, how expensive? While still being relatively expensive, it turns out that in Python creating a thread is much, much cheaper than creating a process via multiprocessing on Linux, while this seems to be not necessarily true on Mac OS X.

You should use a pool of threads or processes.

Probably true, except, again, that was not quite the point of this exercise.

The limiting factor is not the creation time but the communication and synchronization overhead between multiple threads or processes.

Which I am probably going to test as well.

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
In article mailman.6337.1230563873.3487.python-l...@python.org, Christian Heimes li...@cheimes.de wrote: You have missed an important point. A well designed application neither creates so many threads nor processes. The creation of a thread or forking of a process is an expensive operation. You should use a pool of threads or processes.

It's worth noting that forking a new process is usually a much more expensive operation than creating a thread. Not that I would want to create 100,000 of either! Not everybody realizes it, but threads eat up a fair chunk of memory (you get one stack per thread, which means you need to allocate a hunk of memory for each stack). I did a quick look around; 256k seems like a common default stack size. 1 meg wouldn't be unheard of.

-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
On Dec 29, 8:52 am, mk mrk...@gmail.com wrote: Hello everyone, After reading http://www.python.org/dev/peps/pep-0371/ I was under the impression that performance of the multiprocessing package is similar to that of thread / threading. However, to familiarize myself with both packages I wrote my own test of spawning and returning 100,000 empty threads or processes (while maintaining at most 100 processes / threads active at any one time), respectively. The results I got are very different from the benchmark quoted in PEP 371. On a twin Xeon machine the threaded version executed in 5.54 secs, while the multiprocessing version took over 222 secs to complete! Am I doing smth wrong in the code below? Or do I have to use multiprocessing.Pool to get any decent results?

I'm running a 1.6 GHz machine. I only ran 10,000 empty threads and 10,000 empty processes. The threads were the ones you wrote. The processes were empty executables written in a lower language, also run 100 at a time, started with 'subprocess', not 'multiprocessing'. The threads took 1.2 seconds. The processes took 24 seconds. The processes you wrote had only finished 3000 after several minutes.

-- http://mail.python.org/mailman/listinfo/python-list
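Aaron's comparison with subprocess-launched executables can be approximated like this (Python 3 syntax; time_empty_processes is an invented helper, and it uses the Python interpreter itself as the "empty" executable rather than a lower-level binary, so absolute times will differ from his):

```python
import subprocess
import sys
import time

def time_empty_processes(n):
    # launch n trivial child processes via subprocess and wait for them all;
    # returns the elapsed wall-clock time in seconds
    t0 = time.time()
    procs = [subprocess.Popen([sys.executable, "-c", "pass"])
             for _ in range(n)]
    for p in procs:
        p.wait()
    return time.time() - t0

if __name__ == "__main__":
    n = 20
    print("%d processes in %.2fs" % (n, time_empty_processes(n)))
```

A compiled empty executable would lower the per-process startup cost considerably, since most of the time here is interpreter startup rather than fork/exec itself.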
Re: multiprocessing vs thread performance
On 2008-12-29, mk mrk...@gmail.com wrote: janislaw wrote: Ah, so there are 100 processes at a time. 200 secs still don't sound strange.

I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)   0.05 seconds
threaded (1 threads)     0.000235 seconds
processes (1 procs)      0.002607 seconds
non_threaded (2 iters)   0.06 seconds
threaded (2 threads)     0.000461 seconds
processes (2 procs)      0.004514 seconds
non_threaded (4 iters)   0.08 seconds
threaded (4 threads)     0.000897 seconds
processes (4 procs)      0.008557 seconds
non_threaded (8 iters)   0.10 seconds
threaded (8 threads)     0.001821 seconds
processes (8 procs)      0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code was written on Mac OS X.

On PEP 371 it says: "All benchmarks were run using the following: Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

On my iMac 2.3Ghz dualcore, python 2.6:

iTaulu:src torppa$ python run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)   0.02 seconds
threaded (1 threads)     0.000227 seconds
processes (1 procs)      0.002367 seconds
non_threaded (2 iters)   0.03 seconds
threaded (2 threads)     0.000406 seconds
processes (2 procs)      0.003465 seconds
non_threaded (4 iters)   0.04 seconds
threaded (4 threads)     0.000786 seconds
processes (4 procs)      0.006430 seconds
non_threaded (8 iters)   0.06 seconds
threaded (8 threads)     0.001618 seconds
processes (8 procs)      0.012841 seconds

With python2.5 and pyProcessing-0.52:

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)   0.03 seconds
threaded (1 threads)     0.000143 seconds
processes (1 procs)      0.002794 seconds
non_threaded (2 iters)   0.04 seconds
threaded (2 threads)     0.000277 seconds
processes (2 procs)      0.004046 seconds
non_threaded (4 iters)   0.05 seconds
threaded (4 threads)     0.000598 seconds
processes (4 procs)      0.007816 seconds
non_threaded (8 iters)   0.08 seconds
threaded (8 threads)     0.001173 seconds
processes (8 procs)      0.015504 seconds

-- Jarkko Torppa, Elisa
-- http://mail.python.org/mailman/listinfo/python-list
Re: multiprocessing vs thread performance
Jarkko Torppa wrote:
> In PEP 371 it says:
>
>     All benchmarks were run using the following:
>     Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)

Right... I overlooked that. The tests I quoted above were done on SLES 10,
kernel 2.6.5.

> With Python 2.5 and pyProcessing-0.52:
>
> iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
> Importing empty_func
> Starting tests ...
> non_threaded (1 iters)  0.03 seconds
> threaded (1 threads)    0.000143 seconds
> processes (1 procs)     0.002794 seconds
> non_threaded (2 iters)  0.04 seconds
> threaded (2 threads)    0.000277 seconds
> processes (2 procs)     0.004046 seconds
> non_threaded (4 iters)  0.05 seconds
> threaded (4 threads)    0.000598 seconds
> processes (4 procs)     0.007816 seconds
> non_threaded (8 iters)  0.08 seconds
> threaded (8 threads)    0.001173 seconds
> processes (8 procs)     0.015504 seconds

There's something wrong with the numbers posted in the PEP. This is what I
got on a 4-socket Xeon (+ HT) with Python 2.6.1 on Debian (Etch), with the
kernel upgraded to 2.6.22.14:

non_threaded (1 iters)  0.04 seconds
threaded (1 threads)    0.000159 seconds
processes (1 procs)     0.001067 seconds
non_threaded (2 iters)  0.05 seconds
threaded (2 threads)    0.000301 seconds
processes (2 procs)     0.001754 seconds
non_threaded (4 iters)  0.06 seconds
threaded (4 threads)    0.000581 seconds
processes (4 procs)     0.003906 seconds
non_threaded (8 iters)  0.09 seconds
threaded (8 threads)    0.001148 seconds
processes (8 procs)     0.008178 seconds
Re: multiprocessing vs thread performance
Roy Smith r...@panix.com writes:
> In article mailman.6337.1230563873.3487.python-l...@python.org,
> Christian Heimes li...@cheimes.de wrote:
>> You have missed an important point. A well designed application creates
>> neither so many threads nor so many processes. The creation of a thread
>> or the forking of a process is an expensive operation. You should use a
>> pool of threads or processes.
>
> It's worth noting that forking a new process is usually a much more
> expensive operation than creating a thread.

If by forking you mean an actual fork() call, as opposed to invoking a
different executable, the difference is not necessarily that great. Modern
Unix systems tend to implement a 1:1 mapping between threads and kernel
processes, so creating a thread and forking a process require a similar
amount of work. On my system, as measured by timeit, spawning and joining
a thread takes 111 usecs, while forking and waiting for a process takes
260. Slower, but not catastrophically so. Not that I would want to create
100,000 of either!

> Agreed. Not everybody realizes it, but threads eat up a fair chunk of
> memory (you get one stack per thread, which means you need to allocate a
> hunk of memory for each stack). I did a quick look around; 256k seems
> like a common default stack size. 1 meg wouldn't be unheard of.

Note that this memory is virtual memory, so it doesn't use up physical RAM
until it is actually touched. I've seen systems running legacy Java
applications that create thousands of threads where *virtual* memory was
the bottleneck.
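The timeit measurement described above can be reproduced with a short
Unix-only sketch (Python 3; the helper names are mine, and the absolute
numbers will of course vary by machine):

```python
import os
import threading
import timeit

def spawn_thread():
    """Create, start, and join one do-nothing thread."""
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()

def fork_child():
    """fork() a child that exits immediately, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: exit without running cleanup
    os.waitpid(pid, 0)       # parent: wait for the child

if __name__ == '__main__':
    n = 1000
    per_thread = timeit.timeit(spawn_thread, number=n) / n
    per_fork = timeit.timeit(fork_child, number=n) / n
    print('thread spawn+join: %6.0f usec' % (per_thread * 1e6))
    print('fork+wait:         %6.0f usec' % (per_fork * 1e6))
```

On a typical Linux box the two figures are within a small factor of each
other, which supports the 1:1 threads-vs-kernel-processes point above.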
Re: multiprocessing vs thread performance
On Tue, Dec 30, 2008 at 12:52 AM, mk mrk...@gmail.com wrote:
> Hello everyone,
>
> After reading http://www.python.org/dev/peps/pep-0371/ I was under the
> impression that the performance of the multiprocessing package is similar
> to that of thread / threading. However, to familiarize myself with both
> packages I wrote my own test of spawning and returning 100,000 empty
> threads or processes (while maintaining at most 100 processes / threads
> active at any one time), respectively.
>
> The results I got are very different from the benchmark quoted in PEP 371.
> On a twin Xeon machine the threaded version executed in 5.54 secs, while
> the multiprocessing version took over 222 secs to complete!
>
> Am I doing something wrong in the code below? Or do I have to use
> multiprocessing.Pool to get any decent results?

The overhead of starting OS-level processes is quite high. This is why
event-driven, single-process servers can perform far better than ones that
fork (spawn multiple processes) per request. As others have mentioned,
it's not surprising that spawning even 100 processes took some time.

Bottom line: multiprocessing should not be used this way (nor should
threading).

cheers
James
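The pool approach hinted at here pays the process-creation cost once and
then reuses the workers. A minimal modern Python 3 sketch (the worker
count and task function are arbitrary illustrations):

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    # The four worker processes are created once; every task after that
    # only pays for pickling arguments and results, not for a fork.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, range(100))
    print(sum(results))  # 328350
```

Compare this with forking a fresh process per task: the per-task cost here
is dominated by IPC, not by process startup.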
Re: multiprocessing vs thread performance
On Dec 29, 6:05 pm, James Mills prolo...@shortcircuit.net.au wrote:
> On Tue, Dec 30, 2008 at 12:52 AM, mk mrk...@gmail.com wrote:
>> Hello everyone,
>> After reading http://www.python.org/dev/peps/pep-0371/ I was under the
>> impression that the performance of the multiprocessing package is
>> similar to that of thread / threading. However, to familiarize myself
>> with both packages I wrote my own test of spawning and returning
>> 100,000 empty threads or processes (while maintaining at most 100
>> processes / threads active at any one time), respectively.
> [snip]
> As others have mentioned, it's not surprising that spawning even 100
> processes took some time. Bottom line: multiprocessing should not be
> used this way (nor should threading).

The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
claims is a 'distribution oriented language'.

You might also find it interesting to examine a theoretical OS that is
optimized for process overhead. In other words, what is the minimum
overhead possible? Can processes be as small as threads? Can entire
threads be only a few bytes (words) big?

Also, could generators provide any of the things you need with your
multiple threads? You could, say, call 'next()' on many items in a list,
and just remove them on StopIteration.
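The idea in that last paragraph - calling next() on a list of generators
and dropping each one when it raises StopIteration - amounts to a tiny
round-robin scheduler. A sketch (Python 3; the names are illustrative):

```python
def round_robin(generators):
    """Yield one item from each live generator in turn, removing a
    generator from the rotation once it raises StopIteration."""
    active = list(generators)
    while active:
        for gen in list(active):   # iterate over a copy: we mutate below
            try:
                yield next(gen)
            except StopIteration:
                active.remove(gen)

tasks = [iter(range(3)), iter('ab'), iter(range(10, 12))]
print(list(round_robin(tasks)))   # [0, 'a', 10, 1, 'b', 11, 2]
```

This is cooperative multitasking in a single thread: each "task" runs
until its next yield, with no thread or process overhead at all.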
Re: multiprocessing vs thread performance
On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
> The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
> claims is a 'distribution oriented language'.

I would suggest to the OP that he take a look at circuits [1], an event
framework with a focus on component architectures and distributed
processing.

I'm presently looking at Virtual Synchrony and other distributed
processing architectures - but circuits is meant to be general-purpose
enough to fit event-driven applications/systems.

cheers
James

1. http://trac.softcircuit.com.au/circuits/
Re: multiprocessing vs thread performance
On Dec 29, 7:40 pm, James Mills prolo...@shortcircuit.net.au wrote:
> On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
>> The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
>> claims is a 'distribution oriented language'.
> [snip]
> I'm presently looking at Virtual Synchrony and other distributed
> processing architectures - but circuits is meant to be general-purpose
> enough to fit event-driven applications/systems.

I noticed a while ago that threads can be used to simulate generators:
'next', 'send', and 'yield' are merely replaced by synchronizor calls
(coining the term). It doesn't work the other way around, though, unless
the generator guarantees a yield frequently. 'settrace', anyone?
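The thread-simulating-a-generator idea can be sketched in modern Python 3
with a size-1 queue playing the role of the yield suspension point. The
names and the emit-callback convention here are my own illustration, not
anything from the thread:

```python
import queue
import threading

_DONE = object()   # sentinel marking the end of the stream

def threaded_generator(func, *args):
    """Run func(emit, *args) in a worker thread and iterate over the
    values it passes to emit(); the size-1 queue forces the worker to
    pause after each value, like a generator suspended at a yield."""
    q = queue.Queue(maxsize=1)

    def run():
        func(q.put, *args)
        q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

def producer(emit, n):
    for i in range(n):
        emit(i * i)

print(list(threaded_generator(producer, 4)))   # [0, 1, 4, 9]
```

As the post notes, the reverse direction is harder: a generator can only
stand in for a thread if it yields often enough to hand control back.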
Re: multiprocessing vs thread performance
On Tue, Dec 30, 2008 at 12:52 PM, Aaron Brady castiro...@gmail.com wrote:
> I noticed a while ago that threads can be used to simulate generators:
> 'next', 'send', and 'yield' are merely replaced by synchronizor calls
> (coining the term). It doesn't work the other way around, though, unless
> the generator guarantees a yield frequently. 'settrace', anyone?

Aaron, circuits doesn't use generators :) What did your comment have to do
with this? I have often seen generators used to facilitate coroutine and
cooperative programming, though.

cheers
James