Threads vs. processes, what to consider in choosing?
Hi, may I have your recommendations on choosing threads or processes for the following? I have a wxPython application that builds an internal database from a list of files and then displays various aspects of that data, in response to users' requests. I want to add a module that finds events in a set of log files (LogManager). These log files are potentially huge, and the initial processing is lengthy (several minutes). Thus, when the user chooses LogManager, it would be unacceptable to block the other parts of the program, and so the initial LogManager processing would need to be done separately from the normal run of the program. Once the initial processing is done, the main program would be notified and could display the results of LogManager processing. I was thinking of either using threads, or using separate processes, for the main program and LogManager. What would you suggest I should consider in choosing between the two options? Are there other options besides threads and multi-processing? Thanks, Ron. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs. processes, what to consider in choosing?
On Feb 17, 2009, at 10:18 AM, Barak, Ron wrote: [snip - original question quoted in full] Hi Ron, The general rule is that it is a lot easier to share data between threads than between processes. The multiprocessing library makes the latter easier but is only part of the standard library in Python >= 2.6. The design of your application matters a lot. For instance, will the processing code write its results to a database, ping the GUI code and then exit, allowing the GUI to read the database? That sounds like an excellent setup for processes. In addition, there's the GIL to consider. Multi-process applications aren't affected by it while multi-threaded applications may be. In these days where multi-processor/multi-core machines are more common, this fact is ever more important. Torrents of words have been written about the GIL on this list and elsewhere and I have nothing useful to add to the torrents. I encourage you to read some of those conversations. FWIW, when I was faced with a similar setup, I went with multiple processes rather than threads. 
Last but not least, since you asked about alternatives to threads and multiprocessing, I'll point you to some low level libraries I wrote for doing interprocess communication: http://semanchuk.com/philip/posix_ipc/ http://semanchuk.com/philip/sysv_ipc/ Good luck Philip -- http://mail.python.org/mailman/listinfo/python-list
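The setup Philip sketches (a separate process does the heavy lifting, then notifies the main program, which reads the results) can be outlined with the multiprocessing module. This is a hypothetical stand-in, not the actual LogManager: `build_log_index` merely counts occurrences of "error" in each file *name*, as a placeholder for the real several-minute scan of file contents.

```python
import multiprocessing

def build_log_index(paths, result_queue):
    # Stand-in for LogManager's lengthy initial processing: count
    # "error" in each file name (a real version would scan contents),
    # then notify the parent by putting the result on the queue.
    index = {path: path.count("error") for path in paths}
    result_queue.put(index)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(
        target=build_log_index,
        args=(["app-error.log", "clean.log"], queue))
    worker.start()
    # A GUI would keep its event loop running here and poll the queue
    # (e.g. from a wx.Timer) instead of doing this blocking get().
    results = queue.get()
    worker.join()
    print(results)
```

The blocking `queue.get()` is only for the sake of a self-contained example; in the wxPython application the point is precisely *not* to block.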
Re: Threads vs. processes, what to consider in choosing?
Philip Semanchuk wrote: The general rule is that it is a lot easier to share data between threads than between processes. The multiprocessing library makes the latter easier but is only part of the standard library in Python >= 2.6. The design of your application matters a lot. For instance, will the processing code write its results to a database, ping the GUI code and then exit, allowing the GUI to read the database? That sounds like an excellent setup for processes. A backport for Python 2.4 and 2.5 is available on PyPI. Python 2.5.4 is recommended though. Christian -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Dennis Lee Bieber [EMAIL PROTECTED] Wrote: | On Thu, 27 Jul 2006 09:17:56 -0700, Carl J. Van Arsdall | [EMAIL PROTECTED] declaimed the following in comp.lang.python: | | Ah, alright, I think I understand, so threading works well for sharing | python objects. Would a scenario for this be something like a job | queue (say Queue.Queue) for example. This is a situation in which each | process/thread needs access to the Queue to get the next task it must | work on. Does that sound right? Would the same apply to multiple | threads needing access to a dictionary? list? | | Python's Queue module is only (to my knowledge) an internal | (thread-shared) communication channel; you'd need something else for | IPC -- VMS mailboxes, for example (more general than UNIX pipes with | their single reader/writer concept) | | shared memory mean something more low-level like some bits that don't | necessarily mean anything to python but might mean something to your | application? | | Most OSs support creation and allocation of memory blocks with an | attached name; this allows multiple processes to map that block of | memory into their address space. The contents of said memory block is | totally up to application agreements (won't work well with Python native | objects). | | mmap() | | is one such system. By rough description, it maps a disk file into a | block of memory, so the OS handles loading the data (instead of, say, | file.seek(somewhere_long) followed by file.read(some_data_type) you | treat the mapped memory as an array and use x = mapped[somewhere_long]; | if somewhere_long is not yet in memory, the OS will page swap that part | of the file into place). The file can be shared, so different | processes can map the same file, and thereby, the same memory contents. | | This can be useful, for example, with multiple identical processes | feeding status telemetry. 
Each process is started with some ID, and the | ID determines which section of mapped memory it is to store its status | into. The controller program can just do a loop over all the mapped | memory, updating a display with whatever is current -- doesn't matter if | process_N manages to update a field twice while the monitor is | scanning... The display always shows the data that was current at the | time of the scan. | | Carried further -- special memory cards can (at least they were | where I work) be obtained. These cards have fiber-optic connections. In | a closely distributed system, each computer has one of these cards, and | the fiber-optics link them in a cycle. Each process (on each computer) | maps the memory of the card -- the cards then have logic to relay all | memory changes, via fiber, to the next card in the link... Thus, all the | closely linked computers share this block of memory. This is nice to share inputs from the real world - but there are some hairy issues if it is to be used for general purpose consumption - unless there are hardware restrictions to stop machines stomping on each other's memories - i.e. the machines have to be *polite* and *well behaved* - or you can easily have a major smash... A structure has to be agreed on, and respected... - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
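The shared-memory idea described above can be tried in a few lines of modern Python on a POSIX system (os.fork() is not available on Windows). This is a minimal sketch with a single 8-byte status slot standing in for a real telemetry layout; the mapping is anonymous and shared, so the parent sees what the child writes.

```python
import mmap
import os
import struct

# Anonymous shared mapping (MAP_SHARED is the default on Unix):
# the same physical memory is visible to the child after fork().
buf = mmap.mmap(-1, 8)

pid = os.fork()
if pid == 0:
    # Child: store its "status telemetry" into the shared slot and exit.
    buf[0:8] = struct.pack("q", 42)
    os._exit(0)

os.waitpid(pid, 0)
# Parent reads the child's write through the shared mapping.
value, = struct.unpack("q", buf[0:8])
print(value)  # 42
```

As the posts note, the contents are raw bytes under an agreed-upon structure (here, one signed 64-bit integer), not Python objects.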
Re: Threads vs Processes
[mark] http://twistedmatrix.com/projects/core/documentation/howto/async.html . At my work, we started writing a web app using the twisted framework, but it was somehow too twisted for the developers, so actually they chose to do threading rather than using twisted's async methods. -- Tobias Brox, 69°42'N, 18°57'E -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
On Thu, 27 Jul 2006 20:53:54 -0700, Nick Vatamaniuc wrote: Debugging all those threads should be a project in and of itself. Ahh, debugging - I forgot to bring that one up in my argument! Thanks Nick ;) Certainly I agree of course that there are many applications which suit a threaded design. I just think there is a general over-emphasis on using threads and see it applied very often where an event based approach would be cleaner and more efficient. Thanks for your comments Bryan and Nick, an interesting debate. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Chance Ginger wrote: Not quite that simple. In most modern OS's today there is something called COW - copy on write. What happens is when you fork a process it will make an identical copy. Whenever the forked process does a write, it will make a copy of the memory. So it isn't quite as bad. A notable exception is a toy OS from a manufacturer in Redmond, Washington. It does not do COW fork. It does not even fork. To make a server system scale well on Windows you need to use threads, not processes. That is why the global interpreter lock sucks so badly on Windows. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
sturlamolden wrote: A notable exception is a toy OS from a manufacturer in Redmond, Washington. It does not do COW fork. It does not even fork. To make a server system scale well on Windows you need to use threads, not processes. Here's one to think about: if you have a bunch of threads running, and you fork, should the child process be born running all the threads? Neither answer is very attractive. It's a matter of which will probably do the least damage in most cases (and the answer the popular threading systems choose is 'no'; the child process runs only the thread that called fork). MS-Windows is more thread-oriented than *nix, and it avoids this particular problem by not using fork() to create new processes. That is why the global interpreter lock sucks so badly on Windows. It sucks about the same on Windows and *nix: hardly at all on single-processors, moderately on multi-processors. -- --Bryan -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
sturlamolden wrote: Chance Ginger wrote: Not quite that simple. In most modern OS's today there is something called COW - copy on write. What happens is when you fork a process it will make an identical copy. Whenever the forked process does a write, it will make a copy of the memory. So it isn't quite as bad. A notable exception is a toy OS from a manufacturer in Redmond, Washington. It does not do COW fork. It does not even fork. That's only true for Windows 98/95/Windows 3.x and other DOS-based Windows versions. NTCreateProcess with SectionHandle=NULL creates a new process with a COW version of the parent process's address space. It's not called fork, but it does the same thing. There's a new name for it in Win2K or XP (maybe CreateProcessEx?) but the functionality has been there since the NT 3.x days at least and is in all modern Windows versions. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
mark wrote: On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote: Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes. The debate should not be about threads vs processes, it should be about threads vs events. Events serve a separate problem space. Use event-driven state machine models for efficient multiplexing and fast network I/O (e.g. writing an efficient static HTTP server). Use multi-execution models for efficient multiprocessing. No matter how scalable your event-driven app is, it's not going to take advantage of multi-CPU systems, or modern multi-core processors. Event-driven state machines can be harder to program and maintain than multi-process solutions, but they are usually easier than multi-threaded solutions. On-topic: If your problem is one where event-driven state machines are a good solution, Python generators can be a _huge_ help. -- http://mail.python.org/mailman/listinfo/python-list
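As a small illustration of that last point, here is a hypothetical generator-based state machine of the sort the post alludes to: it assembles complete lines from arbitrarily chunked network input, with the partial-line state living inside the generator between events rather than in an explicit state object.

```python
def line_assembler():
    # A tiny event-driven state machine: each send() delivers one
    # chunk of input; the generator yields the complete lines found
    # so far and keeps any trailing partial line buffered.
    buffered = ""
    lines = []
    while True:
        chunk = yield lines
        lines = []
        buffered += chunk
        while "\n" in buffered:
            line, buffered = buffered.split("\n", 1)
            lines.append(line.rstrip("\r"))

assembler = line_assembler()
next(assembler)  # prime the generator to the first yield
print(assembler.send("GET /index"))     # no complete line yet: []
print(assembler.send(" HTTP/1.0\r\n"))  # completes the request line
```

In an event loop (select/poll-based, or a framework like Twisted), each readable-socket event would feed its bytes to a per-connection generator like this one.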
Re: Threads vs Processes
John Henry wrote: Carl, OS writers provide many more tools for debugging, tracing, changing the priority of, and sand-boxing processes than threads (in general). It *should* be easier to get a process based solution up and running and have it be more robust, when compared to a threaded solution. - Paddy (who shies away from threads in C and C++ too ;-) That mythical process is more robust than thread application paradigm again. No wonder there are so many boring software applications around. Granted. Threaded program forces you to think and design your application much more carefully (to avoid race conditions, dead-locks, ...) but there is nothing inherently *non-robust* about threaded applications. Indeed. Let's just get rid of all preemptive multitasking while we're at it; MacOS9's cooperative, non-memory-protected system wasn't inherently worse as long as every application was written properly. There was nothing inherently non-robust about it! The key difference between threads and processes is that threads share all their memory, while processes have memory protection except with particular segments of memory they choose to share. The next most important difference is that certain languages have different support for threads/procs. If you're writing a Python application, you need to be aware of the GIL and its implications on multithreaded performance. If you're writing a Java app, you're handicapped by the lack of support for multiprocess solutions. The third most important difference--and it's a very distant difference--is the performance difference. In practice, most well-designed systems will be pooling threads/procs and so startup time is not that critical. For some apps, it may be. Context switching time may differ, and likewise that is not usually a sticking point but for particular programs it can be. 
On some OSes, launching a copy-on-write process is difficult--that used to be a reason to choose threads over procs on Windows, but nowadays all modern Windows OSes offer a CreateProcessEx call that allows full-on COW processes. In general, though, if you want to share _all_ memory or if you have measured and context switching sucks on your OS and is a big factor in your application, use threads. In general, if you don't know exactly why you're choosing one or the other, or if you want memory protection, robustness in the face of programming errors, access to more 3rd-party libraries, etc, then you should choose a multiprocess solution. (OS designers spent years of hard work writing OSes with protected memory--why voluntarily throw that out?) -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Russell Warren wrote: This is something I have a streak of paranoia about (after discovering that the current xmlrpclib has some thread safety issues). Is there a list maintained anywhere of the modules that aren't thread safe? It's much safer to work the other way: assume that libraries are _not_ thread safe unless they're listed as such. Even things like the standard C library on mainstream Linux distributions are only about 7 years into being thread-safe by default; anything at all esoteric you should assume is not until you investigate and find documentation to the contrary. -- http://mail.python.org/mailman/listinfo/python-list
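One common way to act on that advice is to serialize every call into a library of unknown thread safety behind a single lock. The sketch below uses xml.sax.saxutils.escape purely as a stand-in for some library call; the point is the locking pattern, not that this particular function needs it.

```python
import threading
import xml.sax.saxutils  # stand-in for a library of unknown thread safety

_lock = threading.Lock()
results = []

def safe_escape(text):
    # Assume the library is NOT thread-safe until documented otherwise:
    # funnel every call through one lock so only one thread is ever
    # inside the library at a time.
    with _lock:
        return xml.sax.saxutils.escape(text)

threads = [threading.Thread(target=lambda s=s: results.append(safe_escape(s)))
           for s in ["a<b", "c&d"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['a&lt;b', 'c&amp;d']
```

The cost is that the wrapped calls no longer run in parallel, which is exactly the trade the "assume not thread safe" rule asks you to make until you learn otherwise.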
Re: Threads vs Processes
[EMAIL PROTECTED] wrote: John Henry wrote: Granted. Threaded program forces you to think and design your application much more carefully (to avoid race conditions, dead-locks, ...) but there is nothing inherently *non-robust* about threaded applications. Indeed. Let's just get rid of all preemptive multitasking while we're at it Also, race conditions and deadlocks are equally bad in multiprocess solutions as in multithreaded ones. Any time you're doing parallel processing you need to consider them. I'd actually submit that initially writing multiprocess programs requires more design and forethought, since you need to determine exactly what you want to share instead of just saying what the heck, everything's shared! The payoff in terms of getting _correct_ behavior more easily, having much easier maintenance down the line, and being more robust in the face of program failures (or unforeseen environment issues) is usually well worth it, though there are certainly some applications where threads are a better choice. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Yes, someone can, and that someone might as well be you. How long does it take to create and clean up 100 trivial processes on your system? How about 100 threads? What portion of your user waiting time is that? Here is test prog... The results are on my 2.6GHz P4 linux system:

Forking
1000 loops, best of 3: 546 usec per loop
Threading
1 loops, best of 3: 199 usec per loop

Indicating that starting up and tearing down new threads is 2.5 times quicker than starting new processes under python. This is probably irrelevant in the real world though!

"""Time threads vs fork"""
import os
import timeit
import threading

def do_child_stuff():
    """Trivial function for children to run"""
    # print "hello from child"
    pass

def fork_test():
    """Test forking"""
    pid = os.fork()
    if pid == 0:
        # child
        do_child_stuff()
        os._exit(0)
    # parent - wait for child to finish
    os.waitpid(pid, os.P_WAIT)

def thread_test():
    """Test threading"""
    t = threading.Thread(target=do_child_stuff)
    t.start()
    # wait for child to finish
    t.join()

def main():
    print "Forking"
    timeit.main(["-s", "from __main__ import fork_test", "fork_test()"])
    print "Threading"
    timeit.main(["-s", "from __main__ import thread_test", "thread_test()"])

if __name__ == "__main__":
    main()

-- Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Carl J. Van Arsdall wrote: Paul Rubin wrote: Carl J. Van Arsdall [EMAIL PROTECTED] writes: Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process. No, you get two processes whose address spaces get the data. It's done with the virtual memory hardware. The data isn't copied. The page tables of both processes are just set up to point to the same physical pages. Copying only happens if a process writes to one of the pages. The OS detects this using a hardware trap from the VM system. Ah, alright. So if that's the case, why would you use python threads versus spawning processes? If they both point to the same address space and python threads can't run concurrently due to the GIL what are they good for? Well, of course they can interleave essentially independent computations, which is why threads (formerly lightweight processes) were traditionally defined. Further, some thread-safe extension (compiled) libraries will release the GIL during their work, allowing other threads to execute simultaneously - and even in parallel on multi-processor hardware. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
On 2006-07-26 19:10:14, Carl J. Van Arsdall wrote: Ah, alright. So if that's the case, why would you use python threads versus spawning processes? If they both point to the same address space and python threads can't run concurrently due to the GIL what are they good for? Nothing runs concurrently on a single core processor (pipelining aside). Processes don't run any more concurrently than threads. The scheduling is different, but they still run sequentially. Gerhard -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
[EMAIL PROTECTED] wrote: Carl J. Van Arsdall wrote: Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes. In many cases, you don't have a choice. If your Python program is to run other programs, the others get their own processes. There's no threads option on that. If multiple lines of execution need to share Python objects, then the standard Python distribution supports threads, while processes would require some heroic extension. Don't confuse sharing memory, which is now easy, with sharing Python objects, which is hard. Ah, alright, I think I understand, so threading works well for sharing python objects. Would a scenario for this be something like a job queue (say Queue.Queue) for example. This is a situation in which each process/thread needs access to the Queue to get the next task it must work on. Does that sound right? Would the same apply to multiple threads needing access to a dictionary? list? Now if you are just passing ints and strings around, use processes with some type of IPC, does that sound right as well? Or does the term shared memory mean something more low-level like some bits that don't necessarily mean anything to python but might mean something to your application? Sorry if you guys think i'm beating this to death, just really trying to get a firm grasp on what you are telling me and again, thanks for taking the time to explain all of this to me! -carl -- Carl J. Van Arsdall [EMAIL PROTECTED] Build and Release MontaVista Software -- http://mail.python.org/mailman/listinfo/python-list
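For reference, the job-queue pattern Carl is asking about looks roughly like this (written in modern Python 3 syntax, where the module is named queue rather than Queue; the squaring "task" is just a placeholder):

```python
import queue
import threading

jobs = queue.Queue()
done = queue.Queue()

def worker():
    # Each thread pulls the next task from the shared Queue, which
    # handles its own locking, until it sees the None sentinel.
    while True:
        task = jobs.get()
        if task is None:
            break
        done.put(task * task)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(5):
    jobs.put(n)
for _ in threads:
    jobs.put(None)  # one shutdown sentinel per worker
for t in threads:
    t.join()
print(sorted(done.queue))  # [0, 1, 4, 9, 16]
```

The same shape works with multiprocessing.Queue and Process swapped in, which is part of why the pattern is so popular.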
Re: Threads vs Processes
[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: John Henry wrote: Granted. Threaded program forces you to think and design your application much more carefully (to avoid race conditions, dead-locks, ...) but there is nothing inherently *non-robust* about threaded applications. Indeed. Let's just get rid of all preemptive multitasking while we're at it Also, race conditions and deadlocks are equally bad in multiprocess solutions as in multithreaded ones. Any time you're doing parallel processing you need to consider them. Only in the sense that you are far more likely to be dealing with shared resources in a multi-threaded application. When I start a sub-process, I know I am doing that to *avoid* resource sharing. So, the chance of a dead-lock is less - only because I would do it far less. I'd actually submit that initially writing multiprocess programs requires more design and forethought, since you need to determine exactly what you want to share instead of just saying what the heck, everything's shared! The payoff in terms of getting _correct_ behavior more easily, having much easier maintenance down the line, and being more robust in the face of program failures (or unforeseen environment issues) is usually well worth it, though there are certainly some applications where threads are a better choice. If you're sharing things, I would thread. I would not want to pay the expense of a process. It's too bad that programmers are not threading more often. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
John Henry wrote: If you're sharing things, I would thread. I would not want to pay the expense of a process. This is generally a false cost. There are very few applications where thread/process startup time is at all a fast path, and there are likewise few where the difference in context switching time matters at all. Indeed, in a Python program on a multiprocessor system, processes are potentially faster than threads, not slower. Moreover, to get at best a small performance gain you pay a huge cost by sacrificing memory protection within the threaded process. You can share things between processes, but you can't memory protect things between threads. So if you need some of each (some things shared and others protected), processes are the clear choice. Now, for a few applications threads make sense. Usually that means applications that have to share a great number of complex data structures (and normally, making the choice for performance reasons means your design is flawed and you could help performance greatly by reworking it--though there may be some exceptions). But the general rule when choosing between them should be use processes when you can, and threads when you must. Sadly, too many programmers greatly overuse threading. That problem is exacerbated by the number of beginner-level programming books that talk about how to use threads without ever mentioning processes (and without going into the design of multi-execution apps). -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
How do I share memory between processes? "[EMAIL PROTECTED]" [EMAIL PROTECTED] wrote: [snip - previous message quoted in full] -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Nick Craig-Wood wrote: Here is test prog... snip Here's a more real-life like program done in both single threaded mode and multi-threaded mode. You'll need PythonCard to try this. Just to make the point, you will notice that the core code is identical between the two (method on_menuFileStart_exe). The only difference is in the setup code. I wanted to dismiss the myth that multi-threaded programs are inherently *evil*, or that it's difficult to code, or that it's unsafe (whatever dirty water people wish to throw at it). Don't ask me to try this in process! To have fun, first run it in single threaded mode (change the main program to invoke the MyBackground class, instead of the MyBackgroundThreaded class): Change: app = model.Application(MyBackgroundThreaded) to: app = model.Application(MyBackground) Start the process by selecting File-Start, and then try to stop the program by clicking File-Stop. Note the performance of the program. Now, run it in multi-threaded mode. Click File-Start several times (up to 4) and then try to stop the program by clicking File-Stop. If you want to show off, add several more StaticText items in the resource file, add them to the textAreas list in MyBackgroundThreaded class and let it rip! BTW: This app also demonstrates the weakness in Python threads - the threads don't get preempted equally (not even close). 
:-) Two files follow (test.py and test.rsrc.py):

#!/usr/bin/python

__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2004/10/24 19:21:46 $"

import wx
import threading
import thread
import time
from PythonCard import model

class MyBackground(model.Background):

    def on_initialize(self, event):
        # if you have any initialization
        # including sizer setup, do it here
        self.running(False)
        self.textAreas = (self.components.TextArea1,)
        return

    def on_menuFileStart_select(self, event):
        self.on_menuFileStart_exe(self.textAreas[0])
        return

    def on_menuFileStart_exe(self, textArea):
        textArea.visible = True
        self.running(True)
        for i in range(1000):
            textArea.text = "Got up to %d" % i
            ##print i
            for j in range(i):
                k = 0
                time.sleep(0)
                if not self.running():
                    break
            try:
                wx.SafeYield(self)
            except:
                pass
            if not self.running():
                break
        textArea.text = "Finished at %d" % i
        return

    def on_menuFileStop_select(self, event):
        self.running(False)

    def on_Stop_mouseClick(self, event):
        self.on_menuFileStop_select(event)
        return

    def running(self, flag=None):
        if flag != None:
            self.runningFlag = flag
        return self.runningFlag

class MyBackgroundThreaded(MyBackground):

    def on_initialize(self, event):
        # if you have any initialization
        # including sizer setup, do it here
        self.myLock = thread.allocate_lock()
        self.myThreadCount = 0
        self.running(False)
        self.textAreas = [self.components.TextArea1,
                          self.components.TextArea2,
                          self.components.TextArea3,
                          self.components.TextArea4]
        return

    def on_menuFileStart_select(self, event):
        res = MyBackgroundWorker(self).start()

    def on_menuFileStop_select(self, event):
        self.running(False)
        self.menuBar.setEnabled("menuFileStart", True)

    def on_Stop_mouseClick(self, event):
        self.on_menuFileStop_select(event)

    def running(self, flag=None):
        self.myLock.acquire()
        if flag != None:
            self.runningFlag = flag
        flag = self.runningFlag
        self.myLock.release()
        return flag

class MyBackgroundWorker(threading.Thread):

    def __init__(self, parent):
        threading.Thread.__init__(self)
        self.parent = parent
        self.parent.myLock.acquire()
        threadCount = self.parent.myThreadCount
        self.parent.myLock.release()
        self.textArea = self.parent.textAreas[threadCount]

    def run(self):
        self.parent.myLock.acquire()
        self.parent.myThreadCount += 1
        if self.parent.myThreadCount == len(self.parent.textAreas):
            self.parent.menuBar.setEnabled("menuFileStart", False)
        self.parent.myLock.release()
        self.parent.on_menuFileStart_exe(self.textArea)
Re: Threads vs Processes
Carl J. Van Arsdall wrote: Ah, alright, I think I understand, so threading works well for sharing python objects. Would a scenario for this be something like a job queue (say Queue.Queue) for example. This is a situation in which each process/thread needs access to the Queue to get the next task it must work on. Does that sound right? That's a reasonable and popular technique. I'm not sure what this refers to in your question, so I can't say if it solves the problem of which you are thinking. Would the same apply to multiple threads needing access to a dictionary? list? The Queue class is popular with threads because it already has locking around its basic methods. You'll need to serialize your operations when sharing most kinds of objects. Now if you are just passing ints and strings around, use processes with some type of IPC, does that sound right as well? Also reasonable and popular. You can even pass many Python objects by value using pickle, though you lose some safety. Or does the term shared memory mean something more low-level like some bits that don't necessarily mean anything to python but might mean something to your application? Shared memory means the same memory appears in multiple processes, possibly at different address ranges. What any of them writes to the memory, they can all read. The standard Python distribution now offers shared memory via the mmap module, but lacks cross-process locks. Python doesn't support allocating objects in shared memory, and doing so would be difficult. That's what the POSH project is about, but it looks stuck in alpha. -- --Bryan -- http://mail.python.org/mailman/listinfo/python-list
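Passing objects "by value using pickle", as Bryan mentions, can be done over an ordinary pipe between forked processes. This is a POSIX-only sketch (os.fork()); the safety caveat he alludes to is that unpickling data from an untrusted source can execute arbitrary code, so this is only appropriate between cooperating processes.

```python
import os
import pickle

# Pass a Python object between processes by value: pickle it in the
# child, send the bytes through a pipe, unpickle in the parent.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(r)
    payload = pickle.dumps({"count": 3, "names": ["a", "b"]})
    os.write(w, payload)
    os._exit(0)  # closing the child also closes its write end

os.close(w)
data = b""
while True:
    chunk = os.read(r, 4096)
    if not chunk:  # EOF once the child's write end is closed
        break
    data += chunk
os.waitpid(pid, 0)
obj = pickle.loads(data)
print(obj)  # {'count': 3, 'names': ['a', 'b']}
```

The parent gets its own copy of the object, not shared memory: mutating `obj` afterwards affects nothing in the (already exited) child.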
Re: Threads vs Processes
On 2006-07-27, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: If you're sharing things, I would thread. I would not want to pay the expense of a process. This is generally a false cost. There are very few applications where thread/process startup time is at all a fast path, Even if it were, on any sanely designed OS, there really isn't any extra expense for a process over a thread. Moreover, to get at best a small performance gain you pay a huge cost by sacrificing memory protection within the threaded process. Threading most certainly shouldn't be done in some attempt to improve performance over a multi-process model. It should be done because it fits the algorithm better. If the execution contexts don't need to share data and can communicate in a simple manner, then processes probably make more sense. If the contexts need to operate jointly on complex shared data, then threads are usually easier. -- Grant Edwards grante Yow! My life is a patio at of fun! visi.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
[EMAIL PROTECTED] wrote:
> Carl J. Van Arsdall wrote:
>> Ah, alright, I think I understand, so threading works well for sharing python objects. Would a scenario for this be something like a job queue (say Queue.Queue) for example. This is a situation in which each process/thread needs access to the Queue to get the next task it must work on. Does that sound right?
> That's a reasonable and popular technique. I'm not sure what "this" refers to in your question, so I can't say if it solves the problem of which you are thinking.
>> Would the same apply to multiple threads needing access to a dictionary? list?
> The Queue class is popular with threads because it already has locking around its basic methods. You'll need to serialize your operations when sharing most kinds of objects.

Yes yes, of course. I was just making sure we are on the same page, and I think I'm finally getting there.

>> Now if you are just passing ints and strings around, use processes with some type of IPC, does that sound right as well?
> Also reasonable and popular. You can even pass many Python objects by value using pickle, though you lose some safety.

I actually do use pickle (not for this, but for other things), could you elaborate on the safety issue?

>> Or does the term shared memory mean something more low-level like some bits that don't necessarily mean anything to python but might mean something to your application?
> Shared memory means the same memory appears in multiple processes, possibly at different address ranges. What any of them writes to the memory, they can all read. The standard Python distribution now offers shared memory via the mmap module, but lacks cross-process locks. Python doesn't support allocating objects in shared memory, and doing so would be difficult. That's what the POSH project is about, but it looks stuck in alpha.

--
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
--
http://mail.python.org/mailman/listinfo/python-list
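The "pass Python objects by value using pickle" idea can be sketched with a pipe between two processes. This is a minimal POSIX-only illustration (os.fork is unavailable on Windows); the payload dict is purely hypothetical:

```python
import os
import pickle

# Parent and child share a pipe; the child pickles a Python object
# into it and the parent unpickles it -- pass-by-value IPC.
r, w = os.pipe()
pid = os.fork()
if pid == 0:                          # child process
    os.close(r)
    payload = pickle.dumps({"status": "ok", "count": 3})
    os.write(w, payload)
    os.close(w)
    os._exit(0)                       # child is done
else:                                 # parent process
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:                 # EOF: child closed its end
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)                # reap the child
    message = pickle.loads(b"".join(chunks))
```

The receiving side gets a fresh copy of the object, not a shared reference, which is exactly the by-value semantics described above.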
Re: Threads vs Processes
Carl J. Van Arsdall wrote:
[...]
> I actually do use pickle (not for this, but for other things), could you elaborate on the safety issue?

From http://docs.python.org/lib/node63.html :

    Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

A corrupted pickle can crash Python. An evil pickle could probably hijack your process.

--
--Bryan
--
http://mail.python.org/mailman/listinfo/python-list
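One defensive measure, not mentioned in the post but documented for the pickle module, is to override `Unpickler.find_class` so a pickle stream can only name globals you explicitly allow. A sketch (the whitelist contents are an arbitrary example):

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    # Whitelist of globals a pickle stream may reference; anything
    # else (e.g. os.system) raises instead of being imported.
    ALLOWED = {("builtins", "complex")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "forbidden global %s.%s" % (module, name))

# Plain containers reference no globals, so they load fine:
safe = SafeUnpickler(io.BytesIO(pickle.dumps({"a": [1, 2, 3]}))).load()

# A pickle referencing a non-whitelisted class is rejected:
blocked = False
try:
    SafeUnpickler(io.BytesIO(pickle.dumps(io.BytesIO))).load()
except pickle.UnpicklingError:
    blocked = True
```

This limits which objects a pickle can construct, but it is damage limitation, not a substitute for the warning above: untrusted pickles are still best avoided entirely.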
Re: Threads vs Processes
[EMAIL PROTECTED] wrote:
> Carl J. Van Arsdall wrote:
> [...]
>> I actually do use pickle (not for this, but for other things), could you elaborate on the safety issue?
> From http://docs.python.org/lib/node63.html :
> Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
> A corrupted pickle can crash Python. An evil pickle could probably hijack your process.

Ah, if the data is coming from someone else. I understand. Thanks.

--
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
> Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes.

The debate should not be about threads vs processes, it should be about threads vs events. Dr. John Ousterhout (creator of Tcl, Professor of Comp Sci at UC Berkeley, etc.) started a famous debate about this 10 years ago with the following simple presentation: http://home.pacbell.net/ouster/threads.pdf

That sentiment has largely been ignored and thread usage dominates, but if you have been programming for as long as I have, and have used both thread-based architectures AND event/reactor/callback-based architectures, then that simple presentation above should ring very true. Problem is, young people merely equate newer == better.

On large systems and over time, thread-based architectures often tend towards chaos. I have seen a few thread-based systems where the programmers became so frustrated with subtle timing issues etc. that they eventually overlaid so many mutexes that the implementation became single-threaded in practice anyhow(!), and very inefficient.

BTW, I am fairly new to python but I have seen that the python Twisted framework is a good example of the event/reactor design alternative to threads. See http://twistedmatrix.com/projects/core/documentation/howto/async.html . Douglas Schmidt is a famous designer and author (ACE, CORBA TAO, etc.) who has written much about reactor design patterns; see Pattern-Oriented Software Architecture, Vol 2, Wiley 2000, amongst many other references of his.

--
http://mail.python.org/mailman/listinfo/python-list
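The event/reactor style being advocated here is easiest to see in a modern sketch. Twisted (and asyncio, which did not exist at the time of this thread) both follow the same single-threaded model: tasks yield control at I/O points instead of being preempted, so the shared list below needs no lock. A hedged illustration using asyncio:

```python
import asyncio

# One thread, one event loop: each task gives up control only at its
# await point, so appends to the shared list can never interleave
# mid-operation and no mutex is required.
events = []

async def producer(name, delay):
    await asyncio.sleep(delay)     # stands in for a non-blocking I/O wait
    events.append(name)

async def main():
    # Both producers "run" concurrently inside a single thread.
    await asyncio.gather(
        producer("slow", 0.02),
        producer("fast", 0.01),
    )

asyncio.run(main())
```

The cost, as the rest of the thread discusses, is that anything which might block must be rewritten around the loop (Twisted's 'deferred' objects serve the same purpose as the `await` points here).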
Re: Threads vs Processes
mark wrote:
> The debate should not be about threads vs processes, it should be about threads vs events.

We are so lucky as to have both debates.

> Dr. John Ousterhout (creator of Tcl, Professor of Comp Sci at UC Berkeley, etc), started a famous debate about this 10 years ago with the following simple presentation. http://home.pacbell.net/ouster/threads.pdf

The Ousterhout school finds multiple lines of execution unmanageable, while the Tannenbaum school finds asynchronous I/O unmanageable.

What's so hard about single-line-of-control (SLOC) event-driven programming? You can't call anything that might block. You have to initiate the operation, store all the state you'll need in order to pick up where you left off, then return all the way back to the event dispatcher.

> That sentiment has largely been ignored and thread usage dominates but, if you have been programming for as long as I have, and have used both thread based architectures AND event/reactor/callback based architectures, then that simple presentation above should ring very true. Problem is, young people merely equate newer == better.

Newer? They're both old as the trees. That can't be why the whiz kids like them. Threads and processes rule because of their success.

> On large systems and over time, thread based architectures often tend towards chaos.

While large SLOC event-driven systems surely tend to chaos. Why? Because they *must* be structured around where blocking operations can happen, and that is not the structure anyone would choose for clarity, maintainability and general chaos avoidance.

Even the simplest of modular structures, the procedure, gets broken. Whether you can encapsulate a sequence of operations in a procedure depends upon whether it might need to do an operation that could block. Going farther, consider writing a class supporting overriding of some method. Easy; we Pythoneers do it all the time; that's what O.O. inheritance is all about. Now what if the subclass's version of the method needs to look up external data, and thus might block? How does a method override arrange for the call chain to return all the way back to the event loop, and to pick up again with the same call chain when the I/O comes in?

> I have seen a few thread based systems where the programmers become so frustrated with subtle timing issues etc, and they eventually overlay so many mutexes etc, that the implementation becomes single threaded in practice anyhow(!), and very inefficient.

While we simply do not see systems as complex as modern DBMS's written in the SLOC event-driven style.

> BTW, I am fairly new to python but I have seen that the python Twisted framework is a good example of the event/reactor design alternative to threads. See http://twistedmatrix.com/projects/core/documentation/howto/async.html .

And consequently, to use Twisted you rewrite all your code as those 'deferred' things.

--
--Bryan
--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
It seems that both ways are here to stay. If one were so much inferior and problem-prone, we wouldn't be talking about it now; it would have been forgotten on the same shelf with a stack of punch cards. The rule of thumb is 'the right tool for the right job.'

The threading model is very useful for long CPU-bound processing, as it can potentially take advantage of multiple CPUs/cores (alas, not in Python now because of the GIL). The event model will not work as well here. But note, if there is not much sharing of resources between the threads, processes could be used instead! It turns out that there are very few cases where threads are simply indispensable.

The event model is usually well suited for I/O, or for any case where a large number of shared resources would require lots of synchronization if threads were used. DBMS's are not a good example of typical large systems, so saying 'see, DBMS's use threads -- therefore threads are better' doesn't make a good argument. DBMS's are highly optimized; only a few of them actually manage to successfully take advantage of multiple execution units. One could as well cite a hundred other projects and say 'see, it uses an event model -- therefore event models are better' and so on. Again, the right tool for the right job. A good programmer should know both...

> And consequently, to use Twisted you rewrite all your code as those 'deferred' things.

Then, try re-writing Twisted using threads in the same number of lines having the same or better performance. I bet you'll end up having a whole bunch of 'locks', 'waits' and 'notify's instead of a bunch of those 'deferred' things. Debugging all those threads should be a project in and of itself.

-Nick

--
http://mail.python.org/mailman/listinfo/python-list
Threads vs Processes
Alright, based on a discussion on this mailing list, I've started to wonder: why use threads vs processes? So, if I have a system that has a large area of shared memory, which would be better? I've been leaning towards threads, and I'm going to say why.

Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process. There's also a more expensive context switch between processes. So if I have a system that would fork 50+ child processes, my memory usage would be huge and I burn more cycles than I have to. I understand that there are ways of IPC, but aren't these also more expensive?

So threads seem faster and more efficient for this scenario. That alone makes me want to stay with threads, but I get the feeling from people on this list that processes are better and that threads are overused. I don't understand why, so can anyone shed any light on this?

Thanks,
-carl

--
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
On Wed, 26 Jul 2006 10:54:48 -0700, Carl J. Van Arsdall wrote:
> Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes. [...] Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process. There's also a more expensive context switch between processes. [...]

Not quite that simple. In most modern OS's today there is something called COW - copy on write. When you fork a process it will make an identical copy, but only when the forked process does a write will it make a copy of the affected memory. So it isn't quite as bad.

Secondly, with context switching, if the OS is smart it might not flush the entire TLB. Since most applications are pretty local as far as execution goes, it might very well be the case that the page (or pages) are already in memory.

As far as Python goes, what you need to determine is how much real parallelism you want. Since there is a global lock in Python, you will only execute a few (as in tens of) instructions before switching to another thread. In the case of true processes you have two independent Python virtual machines. That may make things go much faster.

Another issue is the libraries you use. A lot of them aren't thread safe. So you need to watch out.

Chance

--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Chance Ginger wrote:
> [...] As far as Python goes, what you need to determine is how much real parallelism you want. Since there is a global lock in Python, you will only execute a few (as in tens of) instructions before switching to another thread. [...] Another issue is the libraries you use. A lot of them aren't thread safe. So you need to watch out.

It's all about performance (and sometimes the perception of performance). Even though the thread support (and performance) in Python is fairly weak (as explained by Chance), it's nonetheless very useful. My applications thread a lot and it proves to be invaluable - particularly with GUI-type applications. I am the type of user that gets annoyed very quickly and easily if the program doesn't respond to me when I click something. So, as a rule of thumb, if the code has to do much of anything that takes, say, a tenth of a second or more, I thread.

I posted a simple demo program yesterday to the Pythoncard list to show why somebody would want to thread an app. You can probably see it in the archive.

--
http://mail.python.org/mailman/listinfo/python-list
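The thread-the-slow-part rule above is toolkit-agnostic. A hedged sketch of the usual shape: a worker thread does the slow job and posts its result to a queue that the GUI thread polls (the sleep and the result strings are stand-ins; in wxPython the poll would happen in a wx.Timer handler or via wx.CallAfter):

```python
import queue
import threading
import time

inbox = queue.Queue()   # the GUI thread polls this for worker results

def slow_job():
    # Stand-in for the multi-minute log processing from the original
    # question; it runs off the GUI thread so clicks stay responsive.
    time.sleep(0.05)
    inbox.put(("done", "processed 3 log files"))

threading.Thread(target=slow_job, daemon=True).start()

# A real GUI would poll inbox with get_nowait() from an event handler;
# here we simply block until the worker reports back.
status, detail = inbox.get(timeout=5)
```

Keeping all widget updates on the GUI thread and shipping only plain data through the queue sidesteps most of the thread-safety worries raised elsewhere in this thread.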
Re: Threads vs Processes
Carl J. Van Arsdall [EMAIL PROTECTED] writes:
> Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process.

No, you get two processes whose address spaces get the data. It's done with the virtual memory hardware. The data isn't copied. The page tables of both processes are just set up to point to the same physical pages. Copying only happens if a process writes to one of the pages. The OS detects this using a hardware trap from the VM system.

--
http://mail.python.org/mailman/listinfo/python-list
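The copy-on-write behavior described here is easy to observe from Python. A minimal POSIX-only sketch (os.fork is unavailable on Windows): the child's write triggers a private copy of the page, so the parent's value is untouched:

```python
import os

# After fork(), parent and child initially share pages copy-on-write.
value = [1]
r, w = os.pipe()
pid = os.fork()
if pid == 0:                       # child process
    value[0] = 999                 # write forces a private page copy
    os.write(w, b"%d" % value[0])  # report what the child sees
    os._exit(0)
os.close(w)
child_saw = int(os.read(r, 16))    # child saw its own modified copy
os.close(r)
os.waitpid(pid, 0)                 # reap the child
parent_sees = value[0]             # parent's copy is unchanged
```

This also illustrates why sharing *Python objects* across processes is hard: after the fork, each side has its own independent copy of `value`.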
Re: Threads vs Processes
> Another issue is the libraries you use. A lot of them aren't thread safe. So you need to watch out.

This is something I have a streak of paranoia about (after discovering that the current xmlrpclib has some thread safety issues). Is there a list maintained anywhere of the modules that aren't thread safe?

Russ

--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Paul Rubin wrote:
> Carl J. Van Arsdall [EMAIL PROTECTED] writes:
>> Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process.
> No, you get two processes whose address spaces get the data. It's done with the virtual memory hardware. The data isn't copied. The page tables of both processes are just set up to point to the same physical pages. Copying only happens if a process writes to one of the pages. The OS detects this using a hardware trap from the VM system.

Ah, alright. So if that's the case, why would you use python threads versus spawning processes? If they both point to the same address space and python threads can't run concurrently due to the GIL, what are they good for?

-c

--
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Oops - minor correction... xmlrpclib is fine (I think/hope). It is SimpleXMLRPCServer that currently has issues. It uses thread-unfriendly sys.exc_value and sys.exc_type... this is being corrected. -- http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
Carl J. Van Arsdall wrote:
> Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes. So, if I have a system that has a large area of shared memory, which would be better? [...]

Carl,
OS writers provide many more tools for debugging, tracing, changing the priority of, and sand-boxing processes than threads (in general). It *should* be easier to get a process-based solution up and running, and have it be more robust, when compared to a threaded solution.

- Paddy (who shies away from threads in C and C++ too ;-)

--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
> Carl,
> OS writers provide many more tools for debugging, tracing, changing the priority of, and sand-boxing processes than threads (in general). It *should* be easier to get a process-based solution up and running, and have it be more robust, when compared to a threaded solution.
> - Paddy (who shies away from threads in C and C++ too ;-)

That mythical "processes are more robust than threads" application paradigm again. No wonder there are so many boring software applications around.

Granted, a threaded program forces you to think about and design your application much more carefully (to avoid race conditions, dead-locks, ...), but there is nothing inherently *non-robust* about threaded applications.

--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
On 2006-07-26 21:02:59, John Henry wrote:
> Granted, a threaded program forces you to think about and design your application much more carefully (to avoid race conditions, dead-locks, ...), but there is nothing inherently *non-robust* about threaded applications.

You just need to make sure that every piece of code you're using is thread-safe. Whereas making sure they are all process-safe is the job of the OS, so to speak :)

Gerhard

--
http://mail.python.org/mailman/listinfo/python-list
Re: Threads vs Processes
John Henry wrote:
> [...]
> That mythical "processes are more robust than threads" application paradigm again. No wonder there are so many boring software applications around.
> Granted, a threaded program forces you to think about and design your application much more carefully (to avoid race conditions, dead-locks, ...), but there is nothing inherently *non-robust* about threaded applications.

In this particular case, the OP (in a different thread) mentioned that his application will be extended by random individuals who can't necessarily be trusted to design their extensions correctly. In that case, segregating the untrusted code, at least, into separate processes seems prudent.

The OP also mentioned that:
> If I have a system that has a large area of shared memory, which would be better?

IMO, if you're going to be sharing data structures with code that can't be trusted to clean up after itself, you're doomed. There's just no way to make that scenario work reliably. The best you can do is insulate that data behind an API (rather than giving untrusted code direct access to the data -- IOW, don't use threads, because if you do, they can go around your API and screw things up), and ensure that each API call leaves the data structures in a consistent state.

-- JK

--
http://mail.python.org/mailman/listinfo/python-list
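The "insulate the data behind an API" advice can be sketched as a class whose every public method takes the same lock, so callers can never observe a half-updated structure. A toy illustration (the class name and methods are hypothetical, not from the thread):

```python
import threading

class EventStore:
    """Shared data reachable only through locked methods."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def add(self, event):
        # Every mutation happens under the lock, so the list is
        # always left in a consistent state.
        with self._lock:
            self._events.append(event)

    def snapshot(self):
        with self._lock:
            return list(self._events)   # a copy: callers can't mutate ours

store = EventStore()
threads = [
    threading.Thread(target=lambda i=i: store.add(i)) for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = len(store.snapshot())
```

Of course, as JK notes, this only works if callers cannot reach `_events` directly; with threads, nothing physically prevents code from going around the API, which is the argument for process-level segregation of untrusted code.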
Re: Threads vs Processes
Carl J. Van Arsdall wrote:
> Alright, based on a discussion on this mailing list, I've started to wonder, why use threads vs processes.

In many cases, you don't have a choice. If your Python program is to run other programs, the others get their own processes. There's no threads option on that. If multiple lines of execution need to share Python objects, then the standard Python distribution supports threads, while processes would require some heroic extension. Don't confuse sharing memory, which is now easy, with sharing Python objects, which is hard.

> So, If I have a system that has a large area of shared memory, which would be better? I've been leaning towards threads, I'm going to say why. Processes seem fairly expensive from my research so far. Each fork copies the entire contents of memory into the new process.

As others have pointed out, not usually true with modern OS's.

> There's also a more expensive context switch between processes. So if I have a system that would fork 50+ child processes my memory usage would be huge and I burn more cycles that I don't have to.

Again, not usually true. Modern OS's share code across processes. There's no way to tell the size of 100 unspecified processes, but the number is nothing special.

> So threads seems faster and more efficient for this scenario. That alone makes me want to stay with threads, but I get the feeling from people on this list that processes are better and that threads are over used. I don't understand why, so can anyone shed any light on this?

Yes, someone can, and that someone might as well be you. How long does it take to create and clean up 100 trivial processes on your system? How about 100 threads? What portion of your user waiting time is that?

--
--Bryan
--
http://mail.python.org/mailman/listinfo/python-list
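The measure-it-yourself suggestion is easy to act on. A hedged sketch timing 100 trivial threads (a process version would use fork or the later multiprocessing module in the same shape; actual numbers are entirely machine-dependent, so none are claimed here):

```python
import threading
import time

def trivial():
    pass    # a do-nothing task: we only measure startup/teardown cost

start = time.perf_counter()
threads = [threading.Thread(target=trivial) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start   # creation + cleanup of 100 threads
```

Comparing `elapsed` here against the same loop built on processes, on your own machine, answers Bryan's question far better than any rule of thumb.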