Re: Please help with Threading
On Saturday, 18 May 2013 10:58:13 UTC+2, Jurgens de Bruin wrote: This is my first script where I want to use the python threading module. I have a large dataset which is a list of dicts; this can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page (4x4) - this already works. What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long. So I would like multiple inner lists to be processed simultaneously, creating the graphs in parallel. I am trying to use python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to get hung up. I monitor the resources on my mac and python starts off good, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%. My thread creation is based on the following: http://www.tutorialspoint.com/python/python_multithreading.htm Any help would be great!!!

Thanks to all for the discussion/comments on threading; although I have not been commenting I have been following. I have learnt a lot and I am still reading up on everything mentioned. Thanks again. Will see how I am going to solve my scenario.
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: Python threads work fine if the threads either rely on intelligent DLLs for number crunching (instead of doing nested Python loops to process a numeric array you pass it to something like NumPy which releases the GIL while crunching a copy of the array) or they do lots of I/O and have to wait for I/O devices (while one thread is waiting for the write/read operation to complete, another thread can do some number crunching). Has nobody thought of a context manager to allow a part of your code to free up the GIL? I think the GIL is not inherently bad, but if it poses a problem at times, there should be a way to get it out of your... Way. -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
| On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
| Python threads work fine if the threads either rely on intelligent
| DLLs for number crunching (instead of doing nested Python loops to
| process a numeric array you pass it to something like NumPy which
| releases the GIL while crunching a copy of the array) or they do lots of
| I/O and have to wait for I/O devices (while one thread is waiting for
| the write/read operation to complete, another thread can do some number
| crunching).
|
| Has nobody thought of a context manager to allow a part of your code to
| free up the GIL? I think the GIL is not inherently bad, but if it poses a
| problem at times, there should be a way to get it out of your... Way.

The GIL makes individual python operations thread safe by never running two at once. This makes the implementation of the operations simpler, faster and safer. It is probably totally infeasible to write meaningful python code inside your suggested context manager that didn't rely on the GIL; if the GIL were not held the code would be unsafe.

It is easy for a C extension to release the GIL, and then to do meaningful work until it needs to return to python land. Most C extensions will do that around non-trivial sections, and anything that may stall in the OS.

So your use case for the context manager doesn't fit well.
-- 
Cameron Simpson c...@zip.com.au

Gentle suggestions being those which are written on rocks of less than 5lbs. - Tracy Nelson in comp.lang.c
-- 
http://mail.python.org/mailman/listinfo/python-list
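[The behaviour described above can be observed from pure Python: time.sleep blocks in the OS with the GIL released, so several sleeping threads overlap just as threads inside a GIL-releasing C extension would. A small demonstration; timings are approximate:]

```python
import threading
import time

def blocking_call():
    # time.sleep waits in the OS with the GIL released, much as a C
    # extension can release the GIL around a long-running section
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start  # roughly 0.2s, not 0.8s: the waits overlap
```

Four CPU-bound pure-Python loops in threads would show no such overlap, because only the thread holding the GIL executes bytecode.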
RE: Please help with Threading
Date: Sun, 19 May 2013 13:10:36 +1000
From: c...@zip.com.au
To: carlosnepomuc...@outlook.com
CC: python-list@python.org
Subject: Re: Please help with Threading

On 19May2013 03:02, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
| Just been told that the GIL doesn't make things slower, but as I
| didn't know that such a thing even existed I went out looking for
| more info and found this document:
| http://www.dabeaz.com/python/UnderstandingGIL.pdf
|
| Is it current? I didn't know Python threads aren't preemptive.
| Seems to be something really old considering the state of the art
| on parallel execution on multi-cores.
| What's the catch in making Python threads preemptive? Are there any ongoing projects to make that?

Depends what you mean by preemptive. If you have multiple CPU-bound pure Python threads they will all get CPU time without any of them explicitly yielding control. But thread switching happens between python instructions, mediated by the interpreter.

I meant operating system preemptive. I've just checked and Python does not start Windows threads.

The standard answers for using multiple cores are to either run multiple processes (either explicitly spawning other executables, or spawning child python processes using the multiprocessing module), or to use (as suggested) libraries that can do the compute-intensive bits themselves, releasing the GIL while doing so, so that the Python interpreter can run other bits of your python code.

I've just discovered the multiprocessing module[1] and will make some tests with it later. Are there any other modules for that purpose?

I've found the following articles about Python threads. Any suggestions?
http://www.ibm.com/developerworks/aix/library/au-threadingpython/
http://pymotw.com/2/threading/index.html
http://www.laurentluce.com/posts/python-threads-synchronization-locks-rlocks-semaphores-conditions-events-and-queues/

[1] http://docs.python.org/2/library/multiprocessing.html

Plenty of OS system calls (and calls to other libraries from the interpreter) release the GIL during the call. Other python threads can run during that window. And there are other Python implementations other than CPython.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

Processes are like potatoes. - NCR device driver manual
-- 
http://mail.python.org/mailman/listinfo/python-list
RE: Please help with Threading
Date: Mon, 20 May 2013 17:45:14 +1000
From: c...@zip.com.au
To: fabiosantos...@gmail.com
Subject: Re: Please help with Threading
CC: python-list@python.org; wlfr...@ix.netcom.com

On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
| On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
| Python threads work fine if the threads either rely on intelligent
| DLLs for number crunching (instead of doing nested Python loops to
| process a numeric array you pass it to something like NumPy which
| releases the GIL while crunching a copy of the array) or they do lots of
| I/O and have to wait for I/O devices (while one thread is waiting for
| the write/read operation to complete, another thread can do some number
| crunching).
|
| Has nobody thought of a context manager to allow a part of your code to
| free up the GIL? I think the GIL is not inherently bad, but if it poses a
| problem at times, there should be a way to get it out of your... Way.

The GIL makes individual python operations thread safe by never running two at once. This makes the implementation of the operations simpler, faster and safer. It is probably totally infeasible to write meaningful python code inside your suggested context manager that didn't rely on the GIL; if the GIL were not held the code would be unsafe.

I just got my hands dirty trying to synchronize Python prints from many threads. Sometimes they mess up when printing the newlines. I tried several approaches using threading.Lock and Condition. None of them worked perfectly and all of them made the code sluggish.

Is there a 100% sure method to make print thread safe? Can it be fast???

It is easy for a C extension to release the GIL, and then to do meaningful work until it needs to return to python land. Most C extensions will do that around non-trivial sections, and anything that may stall in the OS.

So your use case for the context manager doesn't fit well.
-- Cameron Simpson c...@zip.com.au Gentle suggestions being those which are written on rocks of less than 5lbs. - Tracy Nelson in comp.lang.c -- http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
My use case was a tight loop processing an image pixel by pixel, or crunching a CSV file. If it only uses local variables (and probably holds a lock before releasing the GIL) it should be safe, no?

My idea is that it's a little bad to have to write C or use multiprocessing just to do simultaneous calculations. I think an application using a reactor loop such as twisted would actually benefit from this. Sure, it will be slower than a C implementation of the same loop, but isn't fast prototyping a very important feature of the Python language?

On 20 May 2013 08:45, Cameron Simpson c...@zip.com.au wrote: On 20May2013 07:25, Fábio Santos fabiosantos...@gmail.com wrote:
| On 18 May 2013 20:33, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
| Python threads work fine if the threads either rely on intelligent
| DLLs for number crunching (instead of doing nested Python loops to
| process a numeric array you pass it to something like NumPy which
| releases the GIL while crunching a copy of the array) or they do lots of
| I/O and have to wait for I/O devices (while one thread is waiting for
| the write/read operation to complete, another thread can do some number
| crunching).
|
| Has nobody thought of a context manager to allow a part of your code to
| free up the GIL? I think the GIL is not inherently bad, but if it poses a
| problem at times, there should be a way to get it out of your... Way.

The GIL makes individual python operations thread safe by never running two at once. This makes the implementation of the operations simpler, faster and safer. It is probably totally infeasible to write meaningful python code inside your suggested context manager that didn't rely on the GIL; if the GIL were not held the code would be unsafe.

It is easy for a C extension to release the GIL, and then to do meaningful work until it needs to return to python land. Most C extensions will do that around non-trivial sections, and anything that may stall in the OS.
So your use case for the context manager doesn't fit well. -- Cameron Simpson c...@zip.com.au Gentle suggestions being those which are written on rocks of less than 5lbs. - Tracy Nelson in comp.lang.c -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On 20May2013 10:53, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
| I just got my hands dirty trying to synchronize Python prints from many threads.
| Sometimes they mess up when printing the newlines.
| I tried several approaches using threading.Lock and Condition.
| None of them worked perfectly and all of them made the code sluggish.

Show us some code, with specific complaints. Did you try this?

    _lock = Lock()

    def lprint(*a, **kw):
        global _lock
        with _lock:
            print(*a, **kw)

and use lprint() everywhere?

For generality the lock should be per file: the above hack uses one lock for any file, so that's going to stall overlapping prints to different files; inefficient. There are other things than the above, but at least individual prints will never overlap. If you have interleaved prints, show us.

| Is there a 100% sure method to make print thread safe? Can it be fast???

Depends on what you mean by fast. It will be slower than code with no lock; how much would require measurement.

Cheers,
-- 
Cameron Simpson c...@zip.com.au

My own suspicion is that the universe is not only queerer than we suppose, but queerer than we *can* suppose. - J.B.S. Haldane, On Being the Right Size, in the (1928) book Possible Worlds
-- 
http://mail.python.org/mailman/listinfo/python-list
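[The per-file-lock refinement mentioned above can be sketched as follows. This keys the lock table on the file object's id(), which is an assumption of this sketch: id values can be reused once a file object is garbage-collected, so a real implementation would want a stronger key or a weak-reference mapping.]

```python
import sys
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)   # one lock per output file
_locks_guard = threading.Lock()        # protects the lock table itself

def lprint(*args, **kwargs):
    f = kwargs.get("file", sys.stdout)
    with _locks_guard:
        # fetch (or lazily create) the lock for this particular file
        lock = _locks[id(f)]
    with lock:
        # only prints to the *same* file serialize against each other
        print(*args, **kwargs)
```

With this, a thread writing a log file never stalls a thread printing to stdout, while prints to any one file still cannot interleave.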
Re: Please help with Threading
On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:

    _lock = Lock()

    def lprint(*a, **kw):
        global _lock
        with _lock:
            print(*a, **kw)

and use lprint() everywhere?

Fun little hack:

    def print(*args, print=print, lock=Lock(), **kwargs):
        with lock:
            print(*args, **kwargs)

Question: Is this a cool use or a horrible abuse of the scoping rules?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
It is pretty cool although it looks like a recursive function at first ;)

On 20 May 2013 10:13, Chris Angelico ros...@gmail.com wrote: On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:

    _lock = Lock()

    def lprint(*a, **kw):
        global _lock
        with _lock:
            print(*a, **kw)

and use lprint() everywhere?

Fun little hack:

    def print(*args, print=print, lock=Lock(), **kwargs):
        with lock:
            print(*args, **kwargs)

Question: Is this a cool use or a horrible abuse of the scoping rules?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On 20May2013 19:09, Chris Angelico ros...@gmail.com wrote:
| On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
| _lock = Lock()
|
| def lprint(*a, **kw):
|     global _lock
|     with _lock:
|         print(*a, **kw)
|
| and use lprint() everywhere?
|
| Fun little hack:
|
| def print(*args, print=print, lock=Lock(), **kwargs):
|     with lock:
|         print(*args, **kwargs)
|
| Question: Is this a cool use or a horrible abuse of the scoping rules?

I carefully avoided monkey patching print itself :-)

That's... mad! I can see what the end result is meant to be, but it looks like a debugging nightmare. Certainly my scoping-fu is too weak to see at a glance how it works.
-- 
Cameron Simpson c...@zip.com.au

I will not do it as a hack, I will not do it for my friends
I will not do it on a Mac, I will not write for Uncle Sam
I will not do it on weekends, I won't do ADA, Sam-I-Am
- Gregory Bond g...@bby.com.au
-- 
http://mail.python.org/mailman/listinfo/python-list
RE: Please help with Threading
Date: Mon, 20 May 2013 18:35:20 +1000
From: c...@zip.com.au
To: carlosnepomuc...@outlook.com
CC: python-list@python.org
Subject: Re: Please help with Threading

On 20May2013 10:53, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote:
| I just got my hands dirty trying to synchronize Python prints from many threads.
| Sometimes they mess up when printing the newlines.
| I tried several approaches using threading.Lock and Condition.
| None of them worked perfectly and all of them made the code sluggish.

Show us some code, with specific complaints. Did you try this?

    _lock = Lock()

    def lprint(*a, **kw):
        global _lock
        with _lock:
            print(*a, **kw)

and use lprint() everywhere?

It works! Think I was running the wrong script... Anyway, the suggestion you've made is the third and latest attempt that I've tried to synchronize the print outputs from the threads. I've also used:

    ### 1st approach ###
    lock = threading.Lock()
    [...]
    try:
        lock.acquire()
        [thread protected code]
    finally:
        lock.release()

    ### 2nd approach ###
    cond = threading.Condition()
    [...]
    try:
        [thread protected code]
        with cond:
            print '[...]'

    ### 3rd approach ###
    from __future__ import print_function

    def safe_print(*args, **kwargs):
        global print_lock
        with print_lock:
            print(*args, **kwargs)

    [...]
    try:
        [thread protected code]
        safe_print('[...]')

Except for the first one, all of them have roughly the same performance. The problem was that I placed the acquire/release around the whole code block instead of only the print statements. Thanks a lot! ;)

For generality the lock should be per file: the above hack uses one lock for any file, so that's going to stall overlapping prints to different files; inefficient. There are other things than the above, but at least individual prints will never overlap. If you have interleaved prints, show us.

| Is there a 100% sure method to make print thread safe? Can it be fast???

Depends on what you mean by fast. It will be slower than code with no lock; how much would require measurement.
Cheers, -- Cameron Simpson c...@zip.com.au My own suspicion is that the universe is not only queerer than we suppose, but queerer than we *can* suppose. - J.B.S. Haldane On Being the Right Size in the (1928) book Possible Worlds -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On Mon, May 20, 2013 at 7:54 PM, Cameron Simpson c...@zip.com.au wrote: On 20May2013 19:09, Chris Angelico ros...@gmail.com wrote:
| On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:
| _lock = Lock()
|
| def lprint(*a, **kw):
|     global _lock
|     with _lock:
|         print(*a, **kw)
|
| and use lprint() everywhere?
|
| Fun little hack:
|
| def print(*args, print=print, lock=Lock(), **kwargs):
|     with lock:
|         print(*args, **kwargs)
|
| Question: Is this a cool use or a horrible abuse of the scoping rules?

I carefully avoided monkey patching print itself :-)

That's... mad! I can see what the end result is meant to be, but it looks like a debugging nightmare. Certainly my scoping-fu is too weak to see at a glance how it works.

Hehe. Like I said, could easily be called abuse. Referencing a function's own name in a default has to have one of these interpretations:

1) It's a self-reference, which can be used to guarantee recursion even if the name is rebound
2) It references whatever previously held that name before this def statement.

Either would be useful. Python happens to follow #2; though I can't point to any piece of specification that mandates that, so all I can really say is that CPython 3.3 appears to follow #2. But both interpretations make sense, and both would be of use, and use of either could be called abusive of the rules. Figure that out. :)

The second defaulted argument (lock=Lock()), of course, is a common idiom. No abuse there, that's pretty Pythonic.

This same sort of code could be done as a decorator:

    def serialize(fn):
        lock = Lock()
        def locked(*args, **kw):
            with lock:
                fn(*args, **kw)
        return locked

    print = serialize(print)

Spelled like this, it's obvious that the argument to serialize has to be the previous 'print'. The other notation achieves the same thing, just in a quirkier way :)

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
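[A self-contained variant of the serialize decorator sketched above, extended to pass the wrapped function's return value through (the original sketch drops it) and bound to a new name instead of rebinding print:]

```python
import threading
from functools import wraps

def serialize(fn):
    """Wrap fn so that a private lock serializes every call to it."""
    lock = threading.Lock()

    @wraps(fn)  # keep fn's name and docstring on the wrapper
    def locked(*args, **kwargs):
        with lock:
            return fn(*args, **kwargs)   # pass the result through
    return locked

safe_print = serialize(print)   # a new name, rather than shadowing print
```

Because each serialize() call creates its own lock, two independently wrapped functions never block each other; only calls to the same wrapped function serialize.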
Re: Please help with Threading
On 5/20/2013 6:09 AM, Chris Angelico wrote: Referencing a function's own name in a default has to have one of these interpretations: 1) It's a self-reference, which can be used to guarantee recursion even if the name is rebound 2) It references whatever previously held that name before this def statement.

The meaning must be #2. A def statement is nothing more than a fancy assignment statement. This:

    def foo(a):
        return a + 1

is really just the same as:

    foo = lambda a: a + 1

(in fact, they compile to identical bytecode). More complex def's don't have equivalent lambdas, but are still assignments to the name of the function. So your apparently recursive print function is no more ambiguous than x = x + 1. The x on the right-hand side is the old value of x; the x on the left-hand side will be the new value of x.

    # Each of these updates a name
    x = x + 1

    def print(*args, print=print, lock=Lock(), **kwargs):
        with lock:
            print(*args, **kwargs)

Of course, if you're going to use that code, a comment might be in order to help the next reader through the trickiness...

--Ned.
-- 
http://mail.python.org/mailman/listinfo/python-list
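[Interpretation #2 is easy to demonstrate without touching print: a default argument is evaluated when the def statement executes, so it captures whatever the name was bound to beforehand. The name f here is purely illustrative.]

```python
def f():
    return "old"

# This def is an assignment to the name f. Its default argument is
# evaluated *now*, so old captures the previous binding of f -- exactly
# interpretation #2 above.
def f(old=f):
    return "new wrapping " + old()

result = f()
print(result)   # -> new wrapping old
```

The same mechanism is what lets `print=print` in the earlier hack capture the original print before the def rebinds the name.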
Re: Please help with Threading
On 05/20/2013 03:55 AM, Fábio Santos wrote: My use case was a tight loop processing an image pixel by pixel, or crunching a CSV file. If it only uses local variables (and probably hold a lock before releasing the GIL) it should be safe, no? Are you making function calls, using system libraries, or creating or deleting any objects? All of these use the GIL because they use common data structures shared among all threads. At the lowest level, creating an object requires locked access to the memory manager. Don't forget, the GIL gets used much more for Python internals than it does for the visible stuff. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On Mon, May 20, 2013 at 8:46 PM, Ned Batchelder n...@nedbatchelder.com wrote: On 5/20/2013 6:09 AM, Chris Angelico wrote: Referencing a function's own name in a default has to have one of these interpretations: 1) It's a self-reference, which can be used to guarantee recursion even if the name is rebound 2) It references whatever previously held that name before this def statement. The meaning must be #2. A def statement is nothing more than a fancy assignment statement.

Sure, but the language could have been specced up somewhat differently, with the same syntax. I was fairly confident that this would be universally true (well, can't do it with 'print' per se in older Pythons, but for others); my statement about CPython 3.3 was just because I hadn't actually hunted down specification proof.

So your apparently recursive print function is no more ambiguous than x = x + 1. The x on the right-hand side is the old value of x; the x on the left-hand side will be the new value of x.

    # Each of these updates a name
    x = x + 1

    def print(*args, print=print, lock=Lock(), **kwargs):
        with lock:
            print(*args, **kwargs)

Yeah. The decorator example makes that fairly clear.

Of course, if you're going to use that code, a comment might be in order to help the next reader through the trickiness...

Absolutely!!

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
I didn't know that. On 20 May 2013 12:10, Dave Angel da...@davea.name wrote: Are you making function calls, using system libraries, or creating or deleting any objects? All of these use the GIL because they use common data structures shared among all threads. At the lowest level, creating an object requires locked access to the memory manager. Don't forget, the GIL gets used much more for Python internals than it does for the visible stuff. I did not know that. It's both interesting and somehow obvious, although I didn't know it yet. -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On Monday, 20 May 2013 17:09:13 UTC+8, Chris Angelico wrote: On Mon, May 20, 2013 at 6:35 PM, Cameron Simpson c...@zip.com.au wrote:

    _lock = Lock()

    def lprint(*a, **kw):
        global _lock
        with _lock:
            print(*a, **kw)

and use lprint() everywhere?

Fun little hack:

    def print(*args, print=print, lock=Lock(), **kwargs):
        with lock:
            print(*args, **kwargs)

Question: Is this a cool use or a horrible abuse of the scoping rules?

ChrisA

OK, if the python interpreter has a global hidden print-out buffer of, say, 2 to 16 KB, and all string print functions just construct the output string from the format into this buffer in an efficient low-level way, then the next question would be whether users can use the functions of this low-level buffer for other string formatting jobs.
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On Tue, May 21, 2013 at 11:44 AM, 8 Dihedral dihedral88...@googlemail.com wrote: OK, if the python interpreter has a global hidden print-out buffer of, say, 2 to 16 KB, and all string print functions just construct the output string from the format into this buffer in an efficient low-level way, then the next question would be whether users can use the functions of this low-level buffer for other string formatting jobs.

You remind me of George. http://www.chroniclesofgeorge.com/ Both make great reading when I'm at work and poking around with random stuff in our .SQL file of carefully constructed mayhem.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
RE: Please help with Threading
sys.stdout.write() does not suffer from the newline mess-up when printing from many threads, like the print statement does. The only usage difference, AFAIK, is having to add '\n' at the end of the string. It's faster and thread safe (really?) by default.

BTW, why didn't I find the source code for the sys module in the 'Lib' directory?

Date: Tue, 21 May 2013 11:50:17 +1000
Subject: Re: Please help with Threading
From: ros...@gmail.com
To: python-list@python.org

On Tue, May 21, 2013 at 11:44 AM, 8 Dihedral dihedral88...@googlemail.com wrote: OK, if the python interpreter has a global hidden print-out buffer of, say, 2 to 16 KB, and all string print functions just construct the output string from the format into this buffer in an efficient low-level way, then the next question would be whether users can use the functions of this low-level buffer for other string formatting jobs.

You remind me of George. http://www.chroniclesofgeorge.com/ Both make great reading when I'm at work and poking around with random stuff in our .SQL file of carefully constructed mayhem.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
-- 
http://mail.python.org/mailman/listinfo/python-list
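[The observation above amounts to issuing one write() call per line at the Python level, instead of print's separate writes for the text and the trailing newline. A sketch; note that the language itself makes no hard atomicity guarantee for write(), so this is "safe in practice" rather than by specification:]

```python
import sys

def write_line(*parts):
    # Build the complete line first, then emit it with a single
    # Python-level write(); the interleaving seen with the Python 2
    # print statement comes from its separate write of the newline
    sys.stdout.write(" ".join(str(p) for p in parts) + "\n")
```

For a guarantee rather than a habit, the locking approaches earlier in the thread remain the robust option.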
RE: Please help with Threading
On Tue, May 21, 2013 at 11:44 AM, 8 Dihedral dihedral88...@googlemail.com wrote: OK, if the python interpreter has a global hidden print-out buffer of, say, 2 to 16 KB, and all string print functions just construct the output string from the format into this buffer in an efficient low-level way, then the next question would be whether users can use the functions of this low-level buffer for other string formatting jobs.

You remind me of George. http://www.chroniclesofgeorge.com/ Both make great reading when I'm at work and poking around with random stuff in our .SQL file of carefully constructed mayhem.

ChrisA

lol I need more cowbell!!! Please!!! lol
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On Tue, 21 May 2013 05:53:46 +0300, Carlos Nepomuceno wrote: BTW, why didn't I find the source code for the sys module in the 'Lib' directory?

Because sys is a built-in module. It is embedded in the Python interpreter.

-- Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
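[This is easy to confirm from the interpreter: built-in modules are listed in sys.builtin_module_names, and unlike modules loaded from Lib/ they carry no source file:]

```python
import sys

# sys is compiled into the interpreter itself, which is why there is
# no sys.py anywhere under Lib/
print('sys' in sys.builtin_module_names)   # -> True
print(getattr(sys, '__file__', None))      # -> None: no source file on disk
```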
Re: Please help with Threading
On Mon, May 20, 2013 at 7:46 AM, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico ros...@gmail.com declaimed the following in gmane.comp.python.general: With interpreted code eg in CPython, it's easy to implement preemption in the interpreter. I don't know how it's actually done, but one easy implementation would be every N bytecode instructions, context switch. It's still done at a lower level than user code (N bytecode Which IS how the common Python interpreter does it -- barring the thread making some system call that triggers a preemption ahead of time (even time.sleep(0.0) triggers scheduling). Forget if the default is 20 or 100 byte-code instructions -- as I recall, it DID change a few versions back. Incidentally, is the context-switch check the same as the check for interrupt signal raising KeyboardInterrupt? ISTR that was another every N instructions check. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
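[For reference, the tunable being discussed: Python 2 exposed the bytecode-count check interval via sys.getcheckinterval()/sys.setcheckinterval() (default 100 bytecodes), and the "new GIL" in Python 3.2 replaced it with a time-based slice:]

```python
import sys

# On Python 3.2+ threads are asked to yield the GIL on a time slice
# rather than after a bytecode count; the old check-interval API was
# deprecated and later removed
interval = sys.getswitchinterval()   # seconds; the default is 0.005
print(interval)
```

Raising the interval reduces switching overhead for CPU-bound threads at the cost of responsiveness; sys.setswitchinterval() adjusts it.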
Re: Please help with Threading
On 05/19/2013 05:46 PM, Dennis Lee Bieber wrote: On Sun, 19 May 2013 10:38:14 +1000, Chris Angelico ros...@gmail.com declaimed the following in gmane.comp.python.general: On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote: I didn't know Python threads aren't preemptive. Seems to be something really old considering the state of the art on parallel execution on multi-cores. What's the catch in making Python threads preemptive? Are there any ongoing projects to make that? snip With interpreted code eg in CPython, it's easy to implement preemption in the interpreter. I don't know how it's actually done, but one easy implementation would be every N bytecode instructions, context switch. It's still done at a lower level than user code (N bytecode Which IS how the common Python interpreter does it -- barring the thread making some system call that triggers a preemption ahead of time (even time.sleep(0.0) triggers scheduling). Forget if the default is 20 or 100 byte-code instructions -- as I recall, it DID change a few versions back.

Part of the context switch is to transfer the GIL from the preempted thread to the new thread. So, overall, on a SINGLE CORE processor running multiple CPU-bound threads takes a bit longer just due to the overhead of thread swapping. On a multi-core processor, the effect is the same, since -- even though one may have a thread running on each core -- the GIL is only assigned to one thread, and other threads get blocked when trying to access runtime data structures. And you may have even more overhead from processor cache misses if a thread gets assigned to a different core. (yes -- I'm restating the same thing as I had just trimmed below this point... but the target is really the OP, where repetition may be helpful in understanding)

So what's the mapping between real (OS) threads, and the fake ones Python uses?
The OS keeps track of a separate stack and context for each thread it knows about; are they one-to-one with the ones you're describing here? If so, then any OS thread that gets scheduled will almost always find it can't get the GIL, and spend time thrashing. But the change that CPython does intentionally would be equivalent to a sleep(0). On the other hand, if these threads are distinct from the OS threads, is it done with some sort of thread pool, where CPython has its own stack, and doesn't really use the one managed by the OS? Understand the only OS threading I really understand is the one in Windows (which I no longer use). So assuming Linux has some form of lightweight threading, the distinction above may not map very well. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
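[On the mapping question above: CPython threads are ordinary OS threads (pthreads on POSIX, native threads on Windows), one Python thread per OS thread with its own OS-managed stack; the GIL is simply a lock they all share, so a scheduled thread that lacks the GIL blocks on the lock rather than busy-thrashing. That each thread has a real OS-level identity can be seen with threading.get_ident():]

```python
import threading

ids = []

def record():
    # get_ident() returns the identifier the underlying OS thread
    # library assigned to the calling thread
    ids.append(threading.get_ident())

workers = [threading.Thread(target=record) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# list.append is a single bytecode-protected operation, so the
# unlocked shared list is safe here under the GIL
```

(Thread identifiers may be recycled after a thread exits, so distinctness of the collected values is not guaranteed; their existence is the point.)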
Re: Please help with Threading
Jurgens de Bruin wrote: This is my first script where I want to use the python threading module. I have a large dataset which is a list of dicts; this can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page (4x4) - this already works. What I currently do is create a nested list [ [ {} ], [ {} ] ]; each inner list contains 16 dictionaries, thus each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long. So I would like multiple inner lists to be processed simultaneously, creating the graphs in parallel. I am trying to use python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to get hung up. I monitor the resources on my mac and python starts off good, using 80% CPU, and when the 4th thread is created the CPU usage drops to 0%. My thread creation is based on the following: http://www.tutorialspoint.com/python/python_multithreading.htm Any help would be great!!!

Can you show us the code?
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
I will post code - the entire script is 1000 lines of code - can I post the threading functions only?
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
Jurgens de Bruin wrote: I will post code - the entire scripts is 1000 lines of code - can I post the threading functions only? Try to condense it to the relevant parts, but make sure that it can be run by us. As a general note, when you add new stuff to an existing longish script it is always a good idea to write it in such a way that you can test it standalone so that you can have some confidence that it will work as designed once you integrate it with your old code. -- http://mail.python.org/mailman/listinfo/python-list
Re: Please help with Threading
On 05/18/2013 04:58 AM, Jurgens de Bruin wrote: This is my first script where I want to use the Python threading module. I have a large dataset which is a list of dicts; there can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page (4x4) - this already works. What I currently do is create a nested list [ [ {} ], [ {} ] ] where each inner list contains 16 dictionaries, so each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long, so I would like multiple inner lists to be processed simultaneously, creating the graphs in parallel. I am trying to use Python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to hang. I monitor the resources on my Mac and Python starts off well, using 80%, but when the 4th thread is created the CPU usage drops to 0%. My thread creation is based on the following: http://www.tutorialspoint.com/python/python_multithreading.htm Any help would be great!!! CPython, and apparently (all of?) the other current Python implementations, use a GIL to prevent multi-threaded applications from shooting themselves in the foot. However, the practical effect of the GIL is that CPU-bound applications do not multi-thread efficiently; the single-threaded version usually runs faster. The place where CPython programs gain from multithreading is where each thread spends much of its time waiting for some external trigger. (More specifically, if such a wait happens inside well-written C code, it releases the GIL so other threads can get useful work done. An example is a thread waiting for network activity, blocking inside a system call.) -- DaveA
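A minimal sketch of the waiting-on-an-external-trigger case Dave describes, using only the standard library. Here time.sleep stands in for a blocking read or network wait; like other blocking calls it releases the GIL, so the four waits overlap:

```python
import threading
import time

results = []
lock = threading.Lock()

def worker(name):
    # time.sleep blocks outside the GIL, so all four threads can
    # wait concurrently -- the kind of workload where CPython
    # threads actually pay off.
    time.sleep(0.1)
    with lock:
        results.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# The four 0.1-second waits overlap, so elapsed is close to 0.1 s,
# not the 0.4 s it would take if the sleeps ran back to back.
```

If worker did pure-Python number crunching instead of sleeping, the same structure would run no faster than a single thread, for exactly the GIL reasons discussed above.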
RE: Please help with Threading
To: python-list@python.org From: wlfr...@ix.netcom.com Subject: Re: Please help with Threading Date: Sat, 18 May 2013 15:28:56 -0400 On Sat, 18 May 2013 01:58:13 -0700 (PDT), Jurgens de Bruin debrui...@gmail.com declaimed the following in gmane.comp.python.general: This is my first script where I want to use the Python threading module. I have a large dataset which is a list of dicts; there can be as many as 200 dictionaries in the list. The final goal is a histogram for each dict, 16 histograms on a page (4x4) - this already works. What I currently do is create a nested list [ [ {} ], [ {} ] ] where each inner list contains 16 dictionaries, so each inner list is a single page of 16 histograms. Iterating over the outer list and creating the graphs takes too long, so I would like multiple inner lists to be processed simultaneously, creating the graphs in parallel. I am trying to use Python threading for this. I create 4 threads, loop over the outer list and send an inner list to each thread. This seems to work if my nested list only contains 2 elements - thus fewer elements than threads. Currently the script runs and then seems to hang. I monitor the resources on my Mac and Python starts off well, using 80%, but when the 4th thread is created the CPU usage drops to 0%. The odds are good that this is just going to run slower... I've just been told that the GIL doesn't make things slower, but as I didn't know that such a thing even existed I went out looking for more info and found this document: http://www.dabeaz.com/python/UnderstandingGIL.pdf Is it current? I didn't know Python threads aren't preemptive. That seems really dated considering the state of the art in parallel execution on multi-cores. What's the catch in making Python threads preemptive? Are there any ongoing projects to do that? One: the common Python implementation uses a global interpreter lock to prevent interpreted code from interfering with itself across multiple threads.
So number-crunching applications don't gain any speed from being partitioned into threads -- even on a multicore processor, only one thread can hold the GIL at a time. On top of that, you have the overhead of the interpreter switching between threads (a GIL release on one thread, a GIL acquire on the next). Python threads work fine if the threads either rely on intelligent DLLs for number crunching (instead of doing nested Python loops to process a numeric array, you pass it to something like NumPy, which releases the GIL while crunching a copy of the array) or do lots of I/O and have to wait for I/O devices (while one thread is waiting for a write/read operation to complete, another thread can do some number crunching). If you really need to do this type of number crunching in Python-level code, you'll want to look into the multiprocessing library instead. That will create actual OS processes (each with a copy of the interpreter, not sharing memory), and each of those can run on a core without conflicting over the GIL. Which library do you suggest? -- Wulfraed Dennis Lee Bieber AF6VN wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/
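Applied to the original poster's scenario, the multiprocessing suggestion might look like the sketch below. render_page is a hypothetical stand-in for the real histogram-drawing code (here it just reduces each dict to a sum so the example is self-contained); each page of 16 dicts is handed to a pool of 4 worker processes:

```python
from multiprocessing import Pool

def render_page(page):
    # Hypothetical stand-in for drawing 16 histograms from one page
    # of dicts; the real version would do the matplotlib work here.
    return [sum(d.values()) for d in page]

if __name__ == "__main__":
    # 200 dicts split into pages of 16, mirroring the poster's layout.
    data = [{"a": i, "b": 2 * i} for i in range(200)]
    pages = [data[i:i + 16] for i in range(0, len(data), 16)]
    # Four real OS processes: no GIL contention, unlike threads.
    with Pool(processes=4) as pool:
        results = pool.map(render_page, pages)
    print(len(results))  # one result per page
```

The `if __name__ == "__main__"` guard matters: on platforms that spawn rather than fork, each worker re-imports the module, and the guard keeps the pool from being created recursively.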
Re: Please help with Threading
On Sun, May 19, 2013 at 10:02 AM, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote: I didn't know Python threads aren't preemptive. That seems really dated considering the state of the art in parallel execution on multi-cores. What's the catch in making Python threads preemptive? Are there any ongoing projects to do that? Preemption isn't really the issue here. At the C level, preemptive vs. cooperative usually means the difference between a stalled thread locking everyone else out and not doing so. Preemption is done at a lower level than user code (e.g. the operating system or the CPU), meaning that user code can't retain control of the CPU. With interpreted code, e.g. in CPython, it's easy to implement preemption in the interpreter. I don't know how it's actually done, but one easy implementation would be: every N bytecode instructions, context switch. It's still done at a lower level than user code (N bytecode instructions might all actually be a single tight loop that the programmer didn't realize was infinite), but it's not at the OS level. But none of that has anything to do with multiple-core usage. The problem there is that shared data structures need to be accessed simultaneously, and in CPython there's a Global Interpreter Lock to simplify that; the consequence of the GIL is that no two threads can simultaneously execute user-level code. There have been GIL-removal proposals at various times, but the fact remains that a global lock makes a huge amount of sense and gives pretty good performance across the board. There's always multiprocessing when you need multiple CPU-bound threads; it's an explicit way to separate shared data (what gets transferred) from local data (what doesn't). ChrisA
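For the record, CPython 2 did switch threads after a configurable number of bytecode instructions (sys.setcheckinterval), much as Chris guesses; CPython 3.2 and later instead switch on a time slice that you can inspect and tune. A small sketch of the Python 3 knob:

```python
import sys

# CPython 3.2+ asks threads to yield the GIL on a time interval
# rather than a bytecode count (Python 2's sys.setcheckinterval).
default = sys.getswitchinterval()   # typically 0.005 seconds
sys.setswitchinterval(0.05)         # coarser slices: fewer forced switches
coarse = sys.getswitchinterval()
sys.setswitchinterval(default)      # restore the interpreter default
```

A coarser interval reduces switching overhead at the cost of responsiveness between threads; it does nothing to let two threads run Python bytecode at once.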
Re: Please help with Threading
On 19May2013 03:02, Carlos Nepomuceno carlosnepomuc...@outlook.com wrote: | I've just been told that the GIL doesn't make things slower, but as I | didn't know that such a thing even existed I went out looking for | more info and found this document: | http://www.dabeaz.com/python/UnderstandingGIL.pdf | | Is it current? I didn't know Python threads aren't preemptive. | That seems really dated considering the state of the art | in parallel execution on multi-cores. | What's the catch in making Python threads preemptive? Are there any ongoing projects to do that? Depends what you mean by preemptive. If you have multiple CPU-bound pure-Python threads, they will all get CPU time without any of them explicitly yielding control. But thread switching happens between Python instructions, mediated by the interpreter. The standard answers for using multiple cores are to either run multiple processes (explicitly spawning other executables, or spawning child Python processes using the multiprocessing module), or to use (as suggested) libraries that can do the compute-intensive bits themselves, releasing the GIL while doing so, so that the Python interpreter can run other bits of your Python code. Plenty of OS system calls (and calls to other libraries from the interpreter) release the GIL during the call. Other Python threads can run during that window. And there are Python implementations other than CPython. Cheers, -- Cameron Simpson c...@zip.com.au Processes are like potatoes. - NCR device driver manual
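Cameron's point about library calls releasing the GIL can be seen without NumPy: in CPython, hashlib's C hashing loop releases the GIL for buffers larger than roughly 2 KiB, so threads hashing large blobs genuinely overlap. A minimal sketch:

```python
import hashlib
import threading

digests = {}

def hash_blob(name, blob):
    # CPython's hashlib releases the GIL while hashing large buffers,
    # so these threads can run their C-level work concurrently.
    digests[name] = hashlib.sha256(blob).hexdigest()

# Four 1 MB blobs, one per thread.
blobs = {i: bytes([i]) * 1_000_000 for i in range(4)}
threads = [threading.Thread(target=hash_blob, args=(i, b))
           for i, b in blobs.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same pattern with a pure-Python hash function would serialize on the GIL; it is the C extension dropping the lock that makes the threads worthwhile.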