Re: multiprocessing vs thread performance

2009-01-09 Thread Gabriel Genellina
On Wed, 07 Jan 2009 23:05:53 -0200, James Mills  
prolo...@shortcircuit.net.au wrote:


Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch
of good examples for it? I am trying to break up a loop to run it over
multiple cores in a system. And I need to return an integer value as the
result of each process and accumulate all of them. In the examples that I
found there is no return from the process.


You communicate with the process in one of several
ways:
 * Semaphores
 * Locks
 * Pipes


The Pool class provides a more abstract view that may be better suited in  
this case. Just create a pool, and use map_async to collect and summarize  
the results.


import string
import multiprocessing

def count(args):
    (lineno, line) = args
    print "This is %s, processing line %d\n" % (
        multiprocessing.current_process().name, lineno),
    result = dict(letters=0, digits=0, other=0)
    for c in line:
        if c in string.letters: result['letters'] += 1
        elif c in string.digits: result['digits'] += 1
        else: result['other'] += 1
    # just to make some random delay
    import time; time.sleep(len(line)/100.0)
    return result

if __name__ == '__main__':

    summary = dict(letters=0, digits=0, other=0)

    def summary_add(results):
        # this is called with a list of results
        for result in results:
            summary['letters'] += result['letters']
            summary['digits'] += result['digits']
            summary['other'] += result['other']

    # count letters on this same script
    f = open(__file__, 'r')

    pool = multiprocessing.Pool(processes=6)
    # invoke count((lineno, line)) for each line in the file
    pool.map_async(count, enumerate(f), 10, summary_add)
    pool.close() # no more jobs
    pool.join()  # wait until done
    print summary

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2009-01-07 Thread Arash Arfaee
Hi All ,

Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch of
good examples for it? I am trying to break up a loop to run it over multiple
cores in a system. And I need to return an integer value as the result of each
process and accumulate all of them. In the examples that I found there is no
return from the process.

Thanks,
-Arash

On Mon, Jan 5, 2009 at 7:24 PM, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

 On Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood n...@craig-wood.com
 wrote:

 mk mrk...@gmail.com wrote:


   The results I got are very different from the benchmark quoted in PEP
  371. On twin Xeon machine the threaded version executed in 5.54 secs,
  while multiprocessing version took over 222 secs to complete!

  Am I doing smth wrong in code below?


 Yes!

 The problem with your code is that you never start more than one
 process at once in the multiprocessing example.  Just check ps when it
 is running and you will see.


 Oh, very good analysis! Those results were worrying me a little.

 --
 Gabriel Genellina


 --
 http://mail.python.org/mailman/listinfo/python-list

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2009-01-07 Thread James Mills
On Thu, Jan 8, 2009 at 10:55 AM, Arash Arfaee erex...@gmail.com wrote:
 Hi All ,

Hi :)

 Does anybody know any tutorial for Python 2.6 multiprocessing? Or a bunch of
 good examples for it? I am trying to break up a loop to run it over multiple
 cores in a system. And I need to return an integer value as the result of each
 process and accumulate all of them. In the examples that I found there is no
 return from the process.

You communicate with the process in one of several
ways:
 * Semaphores
 * Locks
 * Pipes

I prefer to use Pipes, which act much like sockets
(in fact, they are).

Read the docs and let us know how you go :)
I'm actually implementing multiprocessing
support into circuits (1) right now...

cheers
James

1. http://trac.softcircuit.com.au/circuits/
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2009-01-05 Thread alex goretoy
There don't seem to be any good examples for POSH, or it's not clear to me
how to use it with a for loop like the one mk, who started this thread, is
using. How would something like this be done with POSH? The examples show how
to share variables between processes/threads, but nothing about how the
thread starts or about a for loop.

-Alex Goretoy
http://www.alexgoretoy.com
somebodywhoca...@gmail.com


On Sat, Jan 3, 2009 at 1:31 PM, Nick Craig-Wood n...@craig-wood.com wrote:

 mk mrk...@gmail.com wrote:
   After reading http://www.python.org/dev/peps/pep-0371/ I was under
   impression that performance of multiprocessing package is similar to
   that of thread / threading. However, to familiarize myself with both
   packages I wrote my own test of spawning and returning 100,000 empty
   threads or processes (while maintaining at most 100 processes / threads
   active at any one time), respectively.
 
   The results I got are very different from the benchmark quoted in PEP
   371. On twin Xeon machine the threaded version executed in 5.54 secs,
   while multiprocessing version took over 222 secs to complete!
 
   Am I doing smth wrong in code below?

 Yes!

 The problem with your code is that you never start more than one
 process at once in the multiprocessing example.  Just check ps when it
 is running and you will see.

 My conjecture is that this is due to the way fork() works under unix.
 I think that when the parent forks it yields the CPU to the child.
 Because you are giving the child effectively no work to do it returns
 immediately, re-awakening the parent, thus serialising your jobs.

 If you give the children some work to do you'll see a quite different
 result.  I gave each child time.sleep(1) to do and cut down the total
 number to 10,000.

 $ ./test_multiprocessing.py
 == Process 1000 working ==
 == Process 2000 working ==
 == Process 3000 working ==
 == Process 4000 working ==
 == Process 5000 working ==
 == Process 6000 working ==
 == Process 7000 working ==
 == Process 8000 working ==
 == Process 9000 working ==
 == Process 10000 working ==
 === Main thread waiting for all processes to finish ===
 Total time: 101.382129192

 $ ./test_threading.py
 == Thread 1000 working ==
 == Thread 2000 working ==
 == Thread 3000 working ==
 == Thread 4000 working ==
 == Thread 5000 working ==
 == Thread 6000 working ==
 == Thread 7000 working ==
 == Thread 8000 working ==
 == Thread 9000 working ==
 == Thread 10000 working ==
 Total time:  100.659118176

 So almost identical results and as expected - we ran 10,000 sleep(1)s
 in 100 seconds so we must have been running 100 simultaneously.

 If you replace the time.sleep(1) with "for _ in xrange(1000000): pass"
 you get this much more interesting answer on my dual core linux
 laptop, showing nicely the effect of the contention on the python
 global interpreter lock and how multiprocessing avoids it.

 $ ./test_multiprocessing.py
 == Process 1000 working ==
 == Process 2000 working ==
 == Process 3000 working ==
 == Process 4000 working ==
 == Process 5000 working ==
 == Process 6000 working ==
 == Process 7000 working ==
 == Process 8000 working ==
 == Process 9000 working ==
 == Process 10000 working ==
 === Main thread waiting for all processes to finish ===
 Total time: 266.808327913

 $ ./test_threading.py
 == Thread 1000 working ==
 == Thread 2000 working ==
 == Thread 3000 working ==
 == Thread 4000 working ==
 == Thread 5000 working ==
 == Thread 6000 working ==
 == Thread 7000 working ==
 == Thread 8000 working ==
 == Thread 9000 working ==
 == Thread 10000 working ==
 Total time:  834.81882

 --
 Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
 --
 http://mail.python.org/mailman/listinfo/python-list

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2009-01-05 Thread Gabriel Genellina
On Sat, 03 Jan 2009 11:31:12 -0200, Nick Craig-Wood n...@craig-wood.com  
wrote:

mk mrk...@gmail.com wrote:



 The results I got are very different from the benchmark quoted in PEP
 371. On twin Xeon machine the threaded version executed in 5.54 secs,
 while multiprocessing version took over 222 secs to complete!

 Am I doing smth wrong in code below?


Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example.  Just check ps when it
is running and you will see.


Oh, very good analysis! Those results were worrying me a little.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2009-01-03 Thread Nick Craig-Wood
mk mrk...@gmail.com wrote:
  After reading http://www.python.org/dev/peps/pep-0371/ I was under 
  impression that performance of multiprocessing package is similar to 
  that of thread / threading. However, to familiarize myself with both 
  packages I wrote my own test of spawning and returning 100,000 empty 
  threads or processes (while maintaining at most 100 processes / threads 
  active at any one time), respectively.
 
  The results I got are very different from the benchmark quoted in PEP 
  371. On twin Xeon machine the threaded version executed in 5.54 secs, 
  while multiprocessing version took over 222 secs to complete!
 
  Am I doing smth wrong in code below?

Yes!

The problem with your code is that you never start more than one
process at once in the multiprocessing example.  Just check ps when it
is running and you will see.

My conjecture is that this is due to the way fork() works under unix.
I think that when the parent forks it yields the CPU to the child.
Because you are giving the child effectively no work to do it returns
immediately, re-awakening the parent, thus serialising your jobs.

If you give the children some work to do you'll see a quite different
result.  I gave each child time.sleep(1) to do and cut down the total
number to 10,000.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 101.382129192

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time:  100.659118176

So almost identical results and as expected - we ran 10,000 sleep(1)s
in 100 seconds so we must have been running 100 simultaneously.

If you replace the time.sleep(1) with "for _ in xrange(1000000): pass"
you get this much more interesting answer on my dual core linux
laptop, showing nicely the effect of the contention on the python
global interpreter lock and how multiprocessing avoids it.

$ ./test_multiprocessing.py
== Process 1000 working ==
== Process 2000 working ==
== Process 3000 working ==
== Process 4000 working ==
== Process 5000 working ==
== Process 6000 working ==
== Process 7000 working ==
== Process 8000 working ==
== Process 9000 working ==
== Process 10000 working ==
=== Main thread waiting for all processes to finish ===
Total time: 266.808327913

$ ./test_threading.py
== Thread 1000 working ==
== Thread 2000 working ==
== Thread 3000 working ==
== Thread 4000 working ==
== Thread 5000 working ==
== Thread 6000 working ==
== Thread 7000 working ==
== Thread 8000 working ==
== Thread 9000 working ==
== Thread 10000 working ==
Total time:  834.81882

-- 
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-31 Thread Aaron Brady
On Dec 29, 9:29 am, mk mrk...@gmail.com wrote:
 Christian Heimes wrote:
  mk wrote:
  Am I doing smth wrong in code below? Or do I have to use
  multiprocessing.Pool to get any decent results?

  You have missed an important point. A well designed application does
  neither create so many threads nor processes.

 Except I was not developing well designed application but writing the
 test the goal of which was measuring the thread / process creation cost.

  The creation of a thread
  or forking of a process is an expensive operation.

 Sure. The point is, how expensive? While still being relatively
 expensive, it turns out that in Python creating a thread is much, much
 cheaper than creating a process via multiprocessing on Linux, while this
 seems to be not necessarily true on Mac OS X.

  You should use a pool
  of threads or processes.

 Probably true, except, again, that was not quite the point of this
 exercise..

  The limiting factor is not the creation time but the communication and
  synchronization overhead between multiple threads or processes.

 Which I am probably going to test as well.

I had an idea.  You could use 'multiprocessing' for synchronization,
and just use an mmap for the actual communication.  (I think it's a
good idea.)
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-31 Thread Paul Rubin
Aaron Brady castiro...@gmail.com writes:
 I had an idea.  You could use 'multiprocessing' for synchronization,
 and just use an mmap for the actual communication.  (I think it's a
 good idea.)

Are you reinventing POSH?  (http://poshmodule.sourceforge.net)
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-31 Thread Philip Semanchuk


On Dec 31, 2008, at 7:19 PM, Paul Rubin wrote:


Aaron Brady castiro...@gmail.com writes:

I had an idea.  You could use 'multiprocessing' for synchronization,
and just use an mmap for the actual communication.  (I think it's a
good idea.)


Are you reinventing POSH?  (http://poshmodule.sourceforge.net)


Or sysv_ipc? Or posix_ipc?

http://semanchuk.com/philip/sysv_ipc/
http://semanchuk.com/philip/posix_ipc/

Bug fix version of the latter coming out in a day or two...
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-31 Thread Aaron Brady
On Dec 31, 6:19 pm, Paul Rubin http://phr...@nospam.invalid wrote:
 Aaron Brady castiro...@gmail.com writes:
  I had an idea.  You could use 'multiprocessing' for synchronization,
  and just use an mmap for the actual communication.  (I think it's a
  good idea.)

 Are you reinventing POSH?  (http://poshmodule.sourceforge.net)

I thought the same thing!  Paul Boddie introduced me to POSH some
months ago.  I don't recall the implementation of synchronization objects
in POSH, but IIRC, 'multiprocessing' uses native OS handles opened in
multiple processes.  It could be that there would be substantial
overlap, or the trade-offs could be significant.  For one, the above
combination only permits strings to be shared, not objects.
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-30 Thread Aaron Brady
On Dec 29, 9:08 pm, James Mills prolo...@shortcircuit.net.au
wrote:
 On Tue, Dec 30, 2008 at 12:52 PM, Aaron Brady castiro...@gmail.com wrote:
  On Dec 29, 7:40 pm, James Mills prolo...@shortcircuit.net.au
  wrote:
  On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
   The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
   claims is a 'distribution oriented language'.
  snip
  I'm presently looking at Virtual Synchrony and
  other distributed processing architectures - but
  circuits is meant to be general purpose enough
  to fit event-driven applications/systems.

  I noticed a while ago that threads can be used to simulate
  generators.  'next', 'send', and 'yield' are merely replaced by
  synchronizor calls (coining the term).

  Not the other way around, though, unless the generator guarantees a
  yield frequently.  'settrace' anyone?

 Aaron, circuits doesn't use generators :)
 What did your comment have to do with this ?

 I have often seen generators used to
 facilitate coroutine and coooperative
 programming though.

 cheers
 James

James, Hi.  I'm glad you asked; I never know how out there my
comments are (but surmise that feedback is always a good thing).  What
I was thinking was, I didn't know Virtual Synchrony, and I've never
used Erlang, but I'm interested in concurrency especially as it
pertains to units of work, division of labor, and division of context;
and generators are another way to divide context.  So: I wanted to put
more of my background and interests on the table.  What I said wasn't
directly relevant, I see.  But it's not like I
dissertated (discussed) the Tibetan-Austrian spice trade.  I think
I just want to say stuff about threading!  Maybe I'm just excited to
meet people who share my interests... not unheard of.

In Economics, you can divide a market into vertical and horizontal
dimensions, vertical being the chain from raw resources to finished
products, and horizontal being market coverage.  With a similar
division in tasks, horizontal units would handle incoming events,
prepare them, then pass them on to a next unit, which processes a
little, then passes it on, like an assembly line (bucket brigade/
alternating current); and vertical units would take one incoming
event, and see it through start to finish (direct current).  You don't
have to use that depiction.

The terminology is transposed from that of a distributed matrix
multiplication task depiction, where horizontal works start-to-finish,
and vertical takes handoffs from its predecessor.

'Circuits' doesn't use generators.  I think generators are an
underexplored technique.  Is 'circuits' assembly line or start-to-
finish, if my analogy makes any sense?  'Circuits' is event-driven,
but I don't see any difference between 'event-driven' and
multithreaded in general.  (I think contrast can create a good picture
and a clear understanding.)  What is special about an 'event-driven'
architecture?  Are you distinguishing blocking from polling?
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-30 Thread James Mills
On Wed, Dec 31, 2008 at 12:29 AM, Aaron Brady castiro...@gmail.com wrote:
 James, Hi.  I'm glad you asked; I never know how out there my
 comments are (but surmise that feedback is always a good thing).  What
 I was thinking was, I didn't know Virtual Synchrony, and I've never
 used Erlang, but I'm interested in concurrency especially as it
 pertains to units of work, division of labor, and division of context;
 and generators are another way to divide context.  So: I wanted to put
 more of my background and interests on the table.  What I said wasn't
 directly relevant, I see.  But it's not like I
 dissertated (discussed) the Tibetan-Austrian spice trade.  I think
 I just want to say stuff about threading!  Maybe I'm just excited to
 meet people who share my interests... not unheard of.

Glad to see others also interested in these topics :)

(snip)

 'Circuits' doesn't use generators.  I think generators are an
 underexplored technique.  Is 'circuits' assembly line or start-to-
 finish, if my analogy makes any sense?  'Circuits' is event-driven,
 but I don't see any difference between 'event-driven' and
 multithreaded in general.  (I think contrast can create a good picture
 and a clear understanding.)  What is special about an 'event-driven'
 architecture?  Are you distinguishing blocking from polling?

I'll be releasing circuits-1.0 shortly, hopefully today.
To answer your question, circuits is inspired by a
software architecture that my most favoured
lecturer a few years back was teaching. That is:
 * Behaviour Trees (design)
and consequently:
 * The concept of everything is a Component
 * Systems and Sub-Systems are built upon Components
and
 * Everything is an event.
and
 * An emergent property of such systems is Behaviour.

That being said, circuits employs both an event-driven
approach as well as a Component architecture.
In your analogy it is both horizontal and vertical.

As I continue to develop circuits and improve its
core design, as well as building its ever-growing set
of Components, I try to keep it as general as
possible - my main aim though is distributed
processing and architectures. (See the primes example).

Thanks for sharing your interest :)

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-30 Thread James Mills
On Wed, Dec 31, 2008 at 8:42 AM, James Mills
prolo...@shortcircuit.net.au wrote:
(snip)
 As I continue to develop circuits and improve its
 core design, as well as building its ever-growing set
 of Components, I try to keep it as general as
 possible - my main aim though is distributed
 processing and architectures. (See the primes example).

Aaron, just wanted to demonstrate
to you the example (primes) that I mentioned
above:

On Terminal A:

jmi...@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b
127.0.0.1:8000 -p 1000 -w

Total Primes: 1001 (23/s after 44.16s)
Total Events: 43373 (983/s after 44.16s)
Distribution:
 c1096b40-7606-4ba7-9593-1385e14ef339: 348
 8313b43f-d45d-4a0a-8d87-e6a93d3dfb0b: 653

On Terminal B:

jmi...@atomant:~/other/circuits/examples$ ./primes.py -b
127.0.0.1:8001 -s 127.0.0.1:8000

The example uses circuits to distribute work
amongst any arbitrary number of nodes connected
to the system. This is all manually implemented
by simply taking advantage of the circuits framework
with my basic understanding ( so far ) of distributed
processing, virtual synchrony, and other techniques.

This is the same thing just run on a single node (no
distribution):

jmi...@atomant:~/circuits/examples$ ./primes.py -o primes.txt -b
127.0.0.1:8000 -p 1000

Total Primes: 1001 (17/s after 62.13s)
Total Events: 28579 (460/s after 62.13s)
Distribution:
 701be749-9833-40a4-9181-5ee18047b1ad: 1001

As you can see, running 2 instances almost halves
the time. If you do try this though, you'll note that
the CPU isn't being used very heavily at all - this
could be improved - but the example merely
demonstrates distributed event processing and
synchronization.

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-30 Thread Tim Roberts
Christian Heimes li...@cheimes.de wrote:

You have missed an important point. A well designed application does
neither create so many threads nor processes. The creation of a thread
or forking of a process is an expensive operation.

That actually depends on the operating system.  As one example, thread
creation on Windows is not all that expensive.

You should use a pool of threads or processes.

Even so, this is good advice.
-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.
--
http://mail.python.org/mailman/listinfo/python-list


multiprocessing vs thread performance

2008-12-29 Thread mk

Hello everyone,

After reading http://www.python.org/dev/peps/pep-0371/ I was under the 
impression that the performance of the multiprocessing package is similar to 
that of thread / threading. However, to familiarize myself with both 
packages I wrote my own test of spawning and returning 100,000 empty 
threads or processes (while maintaining at most 100 processes / threads 
active at any one time), respectively.


The results I got are very different from the benchmark quoted in PEP 
371. On a twin Xeon machine the threaded version executed in 5.54 secs, 
while the multiprocessing version took over 222 secs to complete!


Am I doing something wrong in the code below? Or do I have to use 
multiprocessing.Pool to get any decent results?


# multithreaded version


#!/usr/local/python2.6/bin/python

import thread
import time

class TCalc(object):

    def __init__(self):
        self.tactivnum = 0
        self.reslist = []
        self.tid = 0
        self.tlock = thread.allocate_lock()

    def testth(self, tid):
        if tid % 1000 == 0:
            print "== Thread %d working ==" % tid
        self.tlock.acquire()
        self.reslist.append(tid)
        self.tactivnum -= 1
        self.tlock.release()

    def calc_100thousand(self):
        tid = 1
        while tid <= 100000:
            while self.tactivnum > 99:
                time.sleep(0.01)
            self.tlock.acquire()
            self.tactivnum += 1
            self.tlock.release()
            t = thread.start_new_thread(self.testth, (tid,))
            tid += 1
        while self.tactivnum > 0:
            time.sleep(0.01)


if __name__ == "__main__":
    tc = TCalc()
    tstart = time.time()
    tc.calc_100thousand()
    tend = time.time()
    print "Total time: ", tend-tstart



# multiprocessing version

#!/usr/local/python2.6/bin/python

import multiprocessing
import time


def testp(pid):
    if pid % 1000 == 0:
        print "== Process %d working ==" % pid

def palivelistlen(plist):
    pll = 0
    for p in plist:
        if p.is_alive():
            pll += 1
        else:
            plist.remove(p)
            p.join()
    return pll

def testp_100thousand():
    pid = 1
    proclist = []
    while pid <= 100000:
        while palivelistlen(proclist) > 99:
            time.sleep(0.01)
        p = multiprocessing.Process(target=testp, args=(pid,))
        p.start()
        proclist.append(p)
        pid += 1
    print "=== Main thread waiting for all processes to finish ==="
    for p in proclist:
        p.join()

if __name__ == "__main__":
    tstart = time.time()
    testp_100thousand()
    tend = time.time()
    print "Total time:", tend - tstart


--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread janislaw
On 29 Gru, 15:52, mk mrk...@gmail.com wrote:
 Hello everyone,

 After reading http://www.python.org/dev/peps/pep-0371/ I was under
 impression that performance of multiprocessing package is similar to
 that of thread / threading. However, to familiarize myself with both
 packages I wrote my own test of spawning and returning 100,000 empty
 threads or processes (while maintaining at most 100 processes / threads
 active at any one time), respectively.

 The results I got are very different from the benchmark quoted in PEP
 371. On twin Xeon machine the threaded version executed in 5.54 secs,
 while multiprocessing version took over 222 secs to complete!

 Am I doing smth wrong in code below? Or do I have to use
 multiprocessing.Pool to get any decent results?

Oooh, 100000 processes! You're fortunate that your OS handled them in
finite time.

[quick browsing through the code]

Ah, so there are 100 processes at a time. 200 secs still doesn't sound
strange.

JW
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread mk

janislaw wrote:


Ah, so there are 100 processes at a time. 200 secs still doesn't sound
strange.


I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py 
empty_func.py


Importing empty_func
Starting tests ...
non_threaded (1 iters)  0.05 seconds
threaded (1 threads)0.000235 seconds
processes (1 procs) 0.002607 seconds

non_threaded (2 iters)  0.06 seconds
threaded (2 threads)0.000461 seconds
processes (2 procs) 0.004514 seconds

non_threaded (4 iters)  0.08 seconds
threaded (4 threads)0.000897 seconds
processes (4 procs) 0.008557 seconds

non_threaded (8 iters)  0.10 seconds
threaded (8 threads)0.001821 seconds
processes (8 procs) 0.016950 seconds

This is very different from PEP 371. It appears that the PEP 371 code 
was written on Mac OS X. The conclusion I get from comparing the above 
costs is that OS X must have a very low cost of creating a process, at 
least when compared to Linux, and not that multiprocessing is a viable 
alternative to the thread / threading module. :-(


--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Christian Heimes
mk wrote:
 Am I doing smth wrong in code below? Or do I have to use
 multiprocessing.Pool to get any decent results?

You have missed an important point. A well designed application creates
neither that many threads nor that many processes. The creation of a thread
or the forking of a process is an expensive operation. You should use a pool
of threads or processes.

The limiting factor is not the creation time but the communication and
synchronization overhead between multiple threads or processes.

Christian

--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread mk

Christian Heimes wrote:

mk wrote:

Am I doing smth wrong in code below? Or do I have to use
multiprocessing.Pool to get any decent results?


You have missed an important point. A well designed application does
neither create so many threads nor processes. 


Except I was not developing a well designed application but writing a 
test, the goal of which was measuring the thread / process creation cost.



The creation of a thread
or forking of a process is an expensive operation. 


Sure. The point is, how expensive? While still being relatively 
expensive, it turns out that in Python creating a thread is much, much 
cheaper than creating a process via multiprocessing on Linux, while this 
seems to be not necessarily true on Mac OS X.



You should use a pool
of threads or processes.


Probably true, except, again, that was not quite the point of this 
exercise..



The limiting factor is not the creation time but the communication and
synchronization overhead between multiple threads or processes.


Which I am probably going to test as well.


--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Roy Smith
In article mailman.6337.1230563873.3487.python-l...@python.org,
 Christian Heimes li...@cheimes.de wrote:

 You have missed an important point. A well designed application does
 neither create so many threads nor processes. The creation of a thread
 or forking of a process is an expensive operation. You should use a pool
 of threads or processes.

It's worth noting that forking a new process is usually a much more 
expensive operation than creating a thread.  Not that I would want to 
create 100,000 of either!

Not everybody realizes it, but threads eat up a fair chunk of memory (you 
get one stack per thread, which means you need to allocate a hunk of memory 
for each stack).  I did a quick look around; 256k seems like a common 
default stack size.  1 meg wouldn't be unheard of.
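[The per-thread reservation can be tuned before threads are created; a small sketch (256 KiB is just the figure mentioned above, not a recommendation, and the allowed minimum and granularity are platform-dependent):]

```python
import threading

# applies to threads created after this call; raises ValueError if the
# platform requires a larger minimum stack
threading.stack_size(256 * 1024)

results = []

def worker(n):
    # trivial payload so the threads finish immediately
    results.append(n * 2)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 2, 4, 6]
```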
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Aaron Brady
On Dec 29, 8:52 am, mk mrk...@gmail.com wrote:
 Hello everyone,

  After reading http://www.python.org/dev/peps/pep-0371/ I was under
 impression that performance of multiprocessing package is similar to
 that of thread / threading. However, to familiarize myself with both
 packages I wrote my own test of spawning and returning 100,000 empty
 threads or processes (while maintaining at most 100 processes / threads
 active at any one time), respectively.

 The results I got are very different from the benchmark quoted in PEP
 371. On twin Xeon machine the threaded version executed in 5.54 secs,
 while multiprocessing version took over 222 secs to complete!

 Am I doing smth wrong in code below? Or do I have to use
 multiprocessing.Pool to get any decent results?

I'm running a 1.6 GHz machine.  I only ran 10000 empty threads and 10000
empty processes.  The threads were the ones you wrote.  The processes were
empty executables written in a lower-level language, also run 100 at a time,
started with 'subprocess', not 'multiprocessing'.  The threads took
1.2 seconds.  The processes took 24 seconds.

The processes you wrote had only finished 3000 after several minutes.
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Jarkko Torppa
On 2008-12-29, mk mrk...@gmail.com wrote:
 janislaw wrote:

 Ah, so there are 100 processes at a time. 200 secs still doesn't sound
 strange.

 I ran the PEP 371 code on my system (Linux) on Python 2.6.1:

 Linux SLES (9.156.44.174) [15:18] root ~/tmp/src # ./run_benchmarks.py 
 empty_func.py

 Importing empty_func
 Starting tests ...
 non_threaded (1 iters)  0.05 seconds
 threaded (1 threads)    0.000235 seconds
 processes (1 procs)     0.002607 seconds

 non_threaded (2 iters)  0.06 seconds
 threaded (2 threads)    0.000461 seconds
 processes (2 procs)     0.004514 seconds

 non_threaded (4 iters)  0.08 seconds
 threaded (4 threads)    0.000897 seconds
 processes (4 procs)     0.008557 seconds

 non_threaded (8 iters)  0.10 seconds
 threaded (8 threads)    0.001821 seconds
 processes (8 procs)     0.016950 seconds

 This is very different from PEP 371. It appears that the PEP 371 code 
 was written on Mac OS X.

On PEP 371 it says: "All benchmarks were run using the following:
Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"

On my iMac 2.3Ghz dualcore. python 2.6

iTaulu:src torppa$ python run_benchmarks.py empty_func.py 
Importing empty_func
Starting tests ...
non_threaded (1 iters)  0.02 seconds
threaded (1 threads)    0.000227 seconds
processes (1 procs)     0.002367 seconds

non_threaded (2 iters)  0.03 seconds
threaded (2 threads)    0.000406 seconds
processes (2 procs)     0.003465 seconds

non_threaded (4 iters)  0.04 seconds
threaded (4 threads)    0.000786 seconds
processes (4 procs)     0.006430 seconds

non_threaded (8 iters)  0.06 seconds
threaded (8 threads)    0.001618 seconds
processes (8 procs)     0.012841 seconds

With python2.5 and pyProcessing-0.52

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)  0.03 seconds
threaded (1 threads)    0.000143 seconds
processes (1 procs)     0.002794 seconds

non_threaded (2 iters)  0.04 seconds
threaded (2 threads)    0.000277 seconds
processes (2 procs)     0.004046 seconds

non_threaded (4 iters)  0.05 seconds
threaded (4 threads)    0.000598 seconds
processes (4 procs)     0.007816 seconds

non_threaded (8 iters)  0.08 seconds
threaded (8 threads)    0.001173 seconds
processes (8 procs)     0.015504 seconds

-- 
Jarkko Torppa, Elisa
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread mk

Jarkko Torppa wrote:


On PEP 371 it says: "All benchmarks were run using the following:
Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)"


Right... I overlooked that. My tests I quoted above were done on SLES 
10, kernel 2.6.5.



With python2.5 and pyProcessing-0.52

iTaulu:src torppa$ python2.5 run_benchmarks.py empty_func.py
Importing empty_func
Starting tests ...
non_threaded (1 iters)  0.03 seconds
threaded (1 threads)    0.000143 seconds
processes (1 procs)     0.002794 seconds

non_threaded (2 iters)  0.04 seconds
threaded (2 threads)    0.000277 seconds
processes (2 procs)     0.004046 seconds

non_threaded (4 iters)  0.05 seconds
threaded (4 threads)    0.000598 seconds
processes (4 procs)     0.007816 seconds

non_threaded (8 iters)  0.08 seconds
threaded (8 threads)    0.001173 seconds
processes (8 procs)     0.015504 seconds


There's something wrong with the numbers posted in the PEP. This is what
I got on a 4-socket Xeon (+ HT) with Python 2.6.1 on Debian (Etch), with
the kernel upgraded to 2.6.22.14:



non_threaded (1 iters)  0.04 seconds
threaded (1 threads)    0.000159 seconds
processes (1 procs)     0.001067 seconds

non_threaded (2 iters)  0.05 seconds
threaded (2 threads)    0.000301 seconds
processes (2 procs)     0.001754 seconds

non_threaded (4 iters)  0.06 seconds
threaded (4 threads)    0.000581 seconds
processes (4 procs)     0.003906 seconds

non_threaded (8 iters)  0.09 seconds
threaded (8 threads)    0.001148 seconds
processes (8 procs)     0.008178 seconds


--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Hrvoje Niksic
Roy Smith r...@panix.com writes:

 In article mailman.6337.1230563873.3487.python-l...@python.org,
  Christian Heimes li...@cheimes.de wrote:

 You have missed an important point. A well designed application does
 neither create so many threads nor processes. The creation of a thread
 or forking of a process is an expensive operation. You should use a pool
 of threads or processes.

 It's worth noting that forking a new process is usually a much more 
 expensive operation than creating a thread.

If by forking you mean an actual fork() call, as opposed to invoking
a different executable, the difference is not necessarily that great.
Modern Unix systems tend to implement a 1:1 mapping between threads
and kernel processes, so creating a thread and forking a process
require a similar amount of work.

On my system, as measured by timeit, spawning and joining a thread
takes 111 usecs, while forking and waiting for a process takes 260.
Slower, but not catastrophically so.
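
A measurement of that kind can be reproduced roughly like this (a sketch, Unix-only for the `fork()` part; absolute numbers will of course vary by machine):

```python
import os
import threading
import timeit

def spawn_join_thread():
    # create, start, and immediately join an empty thread
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()

def fork_wait_process():
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child exits immediately
    os.waitpid(pid, 0)       # parent reaps it

if __name__ == "__main__":
    n = 200
    per = timeit.timeit(spawn_join_thread, number=n) / n
    print("thread spawn+join: %6.0f usec" % (per * 1e6))
    if hasattr(os, "fork"):  # fork() does not exist on Windows
        per = timeit.timeit(fork_wait_process, number=n) / n
        print("fork+waitpid:      %6.0f usec" % (per * 1e6))
```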

 Not that I would want to create 100,000 of either!

Agreed.

 Not everybody realizes it, but threads eat up a fair chunk of memory
 (you get one stack per thread, which means you need to allocate a
 hunk of memory for each stack).  I did a quick look around; 256k
 seems like a common default stack size.  1 meg wouldn't be unheard
 of.

Note that this memory is virtual memory, so it doesn't use up the
physical RAM until actually used.  I've seen systems running legacy
Java applications that create thousands of threads where *virtual*
memory was the bottleneck.
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread James Mills
On Tue, Dec 30, 2008 at 12:52 AM, mk mrk...@gmail.com wrote:
 Hello everyone,

 After reading http://www.python.org/dev/peps/pep-0371/ I was under
 impression that performance of multiprocessing package is similar to that of
 thread / threading. However, to familiarize myself with both packages I
 wrote my own test of spawning and returning 100,000 empty threads or
 processes (while maintaining at most 100 processes / threads active at any
 one time), respectively.

 The results I got are very different from the benchmark quoted in PEP 371.
 On twin Xeon machine the threaded version executed in 5.54 secs, while
 multiprocessing version took over 222 secs to complete!

 Am I doing smth wrong in code below? Or do I have to use
 multiprocessing.Pool to get any decent results?

The overhead in starting OS level processes
is quite high. This is why event-driven, single
process servers can perform far better than
ones that fork (spawn multiple processes)
per request.
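
To illustrate the event-driven model: a single process can multiplex many connections with the `selectors` module instead of forking per client. A toy sketch, using `socketpair()` pairs as stand-in clients:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def serve_pairs(n_clients=3):
    """One event loop echoes (uppercased) for several 'clients' at once,
    with no thread or process per connection."""
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for server_side, _ in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)

    for _, client_side in pairs:           # each "client" sends a request
        client_side.sendall(b"ping")

    replies = 0
    while replies < n_clients:
        for key, _ in sel.select():        # one loop handles all sockets
            data = key.fileobj.recv(16)
            key.fileobj.sendall(data.upper())
            replies += 1

    out = [c.recv(16) for _, c in pairs]
    for s, c in pairs:
        sel.unregister(s)
        s.close()
        c.close()
    return out

print(serve_pairs())
```

The loop blocks in one `select()`/`epoll()` call instead of paying process-creation cost per request, which is the performance argument being made here.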

As others have mentioned, it's not surprising
that spawning even 100 processes took some
time.

Bottom line: multiprocessing should not be used this way.
(nor should threading).

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Aaron Brady
On Dec 29, 6:05 pm, James Mills prolo...@shortcircuit.net.au
wrote:
 On Tue, Dec 30, 2008 at 12:52 AM, mk mrk...@gmail.com wrote:
  Hello everyone,

  After reading http://www.python.org/dev/peps/pep-0371/ I was under
  impression that performance of multiprocessing package is similar to that of
  thread / threading. However, to familiarize myself with both packages I
  wrote my own test of spawning and returning 100,000 empty threads or
  processes (while maintaining at most 100 processes / threads active at any
  one time), respectively.
snip
 As others have mentioned, it's not surprising
 that spawning even 100 processes took some
 time.

 Bottom line: multiprocessing should not be used this way.
 (nor should threading).

The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
claims is a 'distribution oriented language'.

You might also find it interesting to examine a theoretical OS that is
optimized for process overhead.  In other words, what is the minimum
overhead possible?  Can processes be as small as threads?  Can entire
threads be only a few bytes (words) big?

Also, could generators provide any of the things you need with your
multiple threads?  You could, say, call 'next()' on many items in a
list, and just remove them on StopIteration.
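
That idea, driving a list of generators and dropping each one on StopIteration, is only a few lines (a generic sketch, not tied to the OP's code):

```python
def worker(name, steps):
    # Each generator is a cooperative "task" that yields control voluntarily.
    for i in range(steps):
        yield "%s step %d" % (name, i)

def round_robin(tasks):
    """Drive many generators interleaved; drop each one when it finishes."""
    log = []
    while tasks:
        for task in list(tasks):          # copy: we mutate while iterating
            try:
                log.append(next(task))
            except StopIteration:
                tasks.remove(task)
    return log

log = round_robin([worker("a", 2), worker("b", 3)])
print(log)  # tasks interleave: a0, b0, a1, b1, b2
```

This gives concurrency without any threads or processes at all, as long as every task yields often enough.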
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread James Mills
On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
 The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
 claims is a 'distribution oriented language'.

I would suggest to the OP that he take a look
at circuits (1) an event framework with a focus
on component architectures and distributed
processing.

I'm presently looking at Virtual Synchrony and
other distributed processing architectures - but
circuits is meant to be general purpose enough
to fit event-driven applications/systems.

cheers
James

1. http://trac.softcircuit.com.au/circuits/
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread Aaron Brady
On Dec 29, 7:40 pm, James Mills prolo...@shortcircuit.net.au
wrote:
 On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
  The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
  claims is a 'distribution oriented language'.
snip
 I'm presently looking at Virtual Synchrony and
 other distributed processing architectures - but
 circuits is meant to be general purpose enough
 to fit event-driven applications/systems.

I noticed a while ago that threads can be used to simulate
generators.  'next', 'send', and 'yield' are merely replaced by
synchronizor calls (coining the term).

Not the other way around, though, unless the generator guarantees a
yield frequently.  'settrace' anyone?
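
One way to sketch the simulation described above, with a bounded queue playing the part of the "synchronizor" (Python 3 names; this covers next()/yield only, and supporting send() would need a second queue back to the worker):

```python
import queue
import threading

_DONE = object()   # sentinel marking producer exhaustion

def threaded_generator(func, *args):
    """Run `func` in a thread; its `emit` callback plays the role of yield."""
    q = queue.Queue(maxsize=1)   # maxsize=1 makes the producer block, like yield

    def run():
        func(q.put, *args)
        q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

def producer(emit, n):
    for i in range(n):
        emit(i * i)

print(list(threaded_generator(producer, 4)))  # [0, 1, 4, 9]
```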
--
http://mail.python.org/mailman/listinfo/python-list


Re: multiprocessing vs thread performance

2008-12-29 Thread James Mills
On Tue, Dec 30, 2008 at 12:52 PM, Aaron Brady castiro...@gmail.com wrote:
 On Dec 29, 7:40 pm, James Mills prolo...@shortcircuit.net.au
 wrote:
 On Tue, Dec 30, 2008 at 11:34 AM, Aaron Brady castiro...@gmail.com wrote:
  The OP may be interested in Erlang, which Wikipedia (end-all, be-all)
  claims is a 'distribution oriented language'.
 snip
 I'm presently looking at Virtual Synchrony and
 other distributed processing architectures - but
 circuits is meant to be general purpose enough
 to fit event-driven applications/systems.

 I noticed a while ago that threads can be used to simulate
 generators.  'next', 'send', and 'yield' are merely replaced by
 synchronizor calls (coining the term).

 Not the other way around, though, unless the generator guarantees a
 yield frequently.  'settrace' anyone?

Aaron, circuits doesn't use generators :)
What did your comment have to do with this ?

I have often seen generators used to
facilitate coroutine and cooperative
programming though.

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list