Re: Thread scheduling
This may help:
http://linuxgazette.net/107/pai.html

Also be sure to google. Search strategy:
    Python threading
    Python threads
    Python thread tutorial
    threading.py example
    Python threading example
    Python thread safety

hth,
M.E.Farmer
--
http://mail.python.org/mailman/listinfo/python-list
Re: Thread scheduling
Jack Orenstein wrote:
> I am using Python 2.2.2 on RH9, and just starting to work with Python threads.

Is this also the first time you've worked with threads in general, or do you have much experience with them in other situations?

> This program seems to point to problems in Python thread scheduling.

While from time to time bugs in Python are found, it's generally more productive to suspect one's own code. In any case, you wouldn't have posted it here if you didn't suspect your own code at least a bit, so kudos to you for that. :-)

> done = 0
> def run(id):
>     global done
>     print 'thread %d: started' % id
>     global counter
>     for i in range(nCycles):
>         counter += 1
>         if i % 1 == 0:
>             print 'thread %d: i = %d, counter = %d' % (id, i, counter)
>     print 'thread %d: leaving' % id
>     done += 1
>
> for i in range(nThreads):
>     thread.start_new_thread(run, (i + 1,))
> while done < nThreads:
>     time.sleep(1)
>     print 'Still waiting, done = %d' % done
> print 'All threads have finished, counter = %d' % counter

Without having tried to run your code, and without having studied it for long, I am going to point out something that is at the very least an inherent defect in your code, though you might not have used Python (or threads?) for long enough to realize why. Note that I don't know if this is the cause of your particular problem, just that it _is_ a bug.

You've got two shared global variables, done and counter. Each of these is modified in a manner that is not thread-safe. I don't know if counter is causing trouble, but it seems likely that done is.

Basically, the statement

    done += 1

is equivalent to the statement

    done = done + 1

which, in Python or most other languages, is not thread-safe. The done + 1 part is evaluated separately from the assignment, so it's possible that two threads will be executing the done + 1 part at the same time and that the following assignment of one thread will be overwritten immediately by the assignment in the next thread, but with a value that is now one less than what you really wanted.

Look at the bytecode produced by the statement done += 1:

>>> import dis
>>> def f():
...     global done
...     done += 1
...
>>> dis.dis(f)
  3           0 LOAD_GLOBAL              0 (done)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_GLOBAL             0 (done)
(ignore the last two lines: they just return None)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Note here the STORE_GLOBAL that is separate from the add operation itself. If thread A loses the CPU (so to speak) just before that operation, and thread B executes the entire suite of operations, then later on when thread A executes that operation it will effectively result in only a single addition operation being performed, not two of them.

If you really want to increment globals from the thread, you should look into locks. Using the threading module (as is generally recommended, instead of using thread), you would use threading.Lock(). There are other thread synchronization primitives in the threading module as well, some of which might be more suitable for your purposes.

Note also the oft-repeated (in this forum) recommendation that if you simply use nothing but the Queue module for inter-thread communication, you will be very unlikely to stumble over such issues.

-Peter
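
For readers following along, here is a minimal sketch of the lock-based approach suggested above. It is not code from the thread: it assumes Python 3 and the threading module, and N_THREADS, N_CYCLES, counter_lock and main are hypothetical names chosen for illustration (the values used in the original test are not shown in the quoted post). It also waits with join() instead of polling a shared done counter.

```python
import threading

# Hypothetical settings; the original post's values are not shown.
N_THREADS = 2
N_CYCLES = 1000

counter = 0
counter_lock = threading.Lock()


def run(thread_id):
    """Increment the shared counter N_CYCLES times under the lock."""
    global counter
    for _ in range(N_CYCLES):
        # Holding the lock makes the load/add/store sequence behave as one
        # step with respect to every other thread that uses the same lock.
        with counter_lock:
            counter += 1


def main():
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=run, args=(i + 1,))
               for i in range(N_THREADS)]
    for t in threads:
        t.start()
    # join() blocks until each thread has finished, so no shared
    # "done" counter is needed to detect completion.
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print("All threads have finished, counter = %d" % main())
```

With the lock in place, the final count should equal N_THREADS * N_CYCLES on every run, which separates the lost-update question from the scheduling question raised in the thread.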
Re: Thread scheduling
Peter Hansen wrote:
> Jack Orenstein wrote:
>> I am using Python 2.2.2 on RH9, and just starting to work with Python threads.
>
> Is this also the first time you've worked with threads in general, or do you have much experience with them in other situations?

Yes, I've used threading in Java.

> You've got two shared global variables, done and counter. Each of these is modified in a manner that is not thread-safe. I don't know if counter is causing trouble, but it seems likely that done is.

I understand that. As I said in my posting,

    The counter is incremented without locking -- I expect to see a final count of less than THREADS * COUNT.

This is a test case, and I threw out more and more code, including synchronization around counter and done, until it got as simple as possible and still showed the problem.

> Basically, the statement done += 1 is equivalent to the statement done = done + 1 which, in Python or most other languages is not thread-safe. The done + 1 part is evaluated separately from the assignment, so it's possible that two threads will be executing the done + 1 part at the same time and that the following assignment of one thread will be overwritten immediately by the assignment in the next thread, but with a value that is now one less than what you really wanted.

Understood. I was counting on this being unlikely for my test case. I realize this isn't something to rely on in real software.

> If you really want to increment globals from the thread, you should look into locks. Using the threading module (as is generally recommended, instead of using thread), you would use threading.Lock().

As my note said, I did start with the threading module. And variables updated by different threads were protected by threading.Condition variables. As I analyzed my test cases, and threading.py, I started suspecting thread scheduling. I then wrote the test case in my email, which does not rely on the threading module at all.

The point of the test is not to maintain counter -- it's to show that sometimes even after one thread completes, the other thread never is scheduled again. This seems wrong. Try running the code, and let me see if you see this behavior.

If you'd like, replace this:

    counter += 1

by this:

    time.sleep(0.01 * id)

You should see the same problem. So that removes counter from the picture. And the two increments of done (one by each thread) are still almost certainly not going to coincide and cause a problem. Also, if you look at the output from the code on a hang, you will see that 'thread X: leaving' only prints once. This has nothing to do with what happens with the done variable.

Jack
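
As an illustration only, the sleep variant described above can be combined with the Queue recommendation made earlier in the thread, so that completion is reported through a queue rather than a shared done variable. This is a sketch under assumptions, not the poster's test: it uses Python 3 (where the module is named queue rather than Queue) and the threading module, and N_THREADS, run, main and the 5-second timeout are hypothetical choices.

```python
import queue
import threading
import time

N_THREADS = 2  # hypothetical; matches the "two threads" described in the thread


def run(thread_id, results):
    """Sleep in place of the counter increment, then report completion."""
    time.sleep(0.01 * thread_id)
    # Queue.put is thread-safe, so no explicit lock is needed here.
    results.put(thread_id)


def main():
    """Start the workers and collect one completion message per worker."""
    results = queue.Queue()
    for i in range(N_THREADS):
        threading.Thread(target=run, args=(i + 1, results)).start()

    finished = []
    for _ in range(N_THREADS):
        # The timeout is an arbitrary illustrative value. If a worker never
        # reports, get() raises queue.Empty instead of waiting forever.
        finished.append(results.get(timeout=5))
    return sorted(finished)


if __name__ == "__main__":
    print("Threads that reported completion:", main())
```

A stalled worker would show up here as a queue.Empty exception on the main thread, which makes a hang of the kind described above visible without relying on an unsynchronized counter.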
Re: Thread scheduling
Jack Orenstein wrote:
> Peter Hansen wrote:
>> You've got two shared global variables, done and counter. Each of these is modified in a manner that is not thread-safe. I don't know if counter is causing trouble, but it seems likely that done is.
>
> I understand that.
>
>> Basically, the statement done += 1 is equivalent to the statement done = done + 1 which, in Python or most other languages is not thread-safe.
>
> Understood. I was counting on this being unlikely for my test case. I realize this isn't something to rely on in real software.

Hmm... okay. I may have been distracted by the fact that your termination condition is based on done incrementing properly, and that it was possible this wouldn't happen because of the race condition.

So, if I understand you now, you're saying that the reason done doesn't increment is actually because one of the threads is never finishing properly, for some reason not related to the code itself. (?)

> The point of the test is not to maintain counter -- it's to show that sometimes even after one thread completes, the other thread never is scheduled again. This seems wrong. Try running the code, and let me see if you see this behavior.

On my machines (one Py2.4 on WinXP, one Py2.3.4 on RH9.0) I don't see this behaviour. Across about fifty runs each.

> And the two increments of done (one by each thread) are still almost certainly not going to coincide and cause a problem. Also, if you look at the output from the code on a hang, you will see that 'thread X: leaving' only prints once. This has nothing to do with what happens with the done variable.

Okay, I believe you. As I said, I hadn't taken the time to read through everything at first, jumping on an obvious bug related to the done variable not meeting your termination conditions. I can see that something else is likely to be causing this.

One thing you might try is experimenting with sys.setcheckinterval(), just to see what effect it might have, if any.

It's also possible there were some threading bugs in Py2.2 under Linux. Maybe you could repeat the test with a more recent version and see if you get different behaviour. (Not that that proves anything conclusively, but at least it might be a good solution for your immediate problem.)

-Peter
Re: Thread scheduling
> On my machines (one Py2.4 on WinXP, one Py2.3.4 on RH9.0) I don't see this behaviour. Across about fifty runs each.

Thanks for trying this.

> One thing you might try is experimenting with sys.setcheckinterval(), just to see what effect it might have, if any.

That does seem to have an impact. At 0, the problem was completely reproducible. At 100, I couldn't get it to occur.

> It's also possible there were some threading bugs in Py2.2 under Linux. Maybe you could repeat the test with a more recent version and see if you get different behaviour. (Not that that proves anything conclusively, but at least it might be a good solution for your immediate problem.)

2.3 (on the same machine) does seem better, even with setcheckinterval(0).

Thanks for your suggestions.

Can anyone with knowledge of Python internals comment on these results? (Look earlier in the thread for details. But basically, a very simple program with the thread module, running two threads, shows that on occasion, one thread finishes and the other never runs again. python2.3 seems better, as does python2.2 with sys.setcheckinterval(100).)

Jack
Re: Thread scheduling
Jack Orenstein wrote:
>> One thing you might try is experimenting with sys.setcheckinterval(), just to see what effect it might have, if any.
>
> That does seem to have an impact. At 0, the problem was completely reproducible. At 100, I couldn't get it to occur.

If you try other values in between, can you directly affect the frequency of the failure? That would appear to suggest a race condition.

>> It's also possible there were some threading bugs in Py2.2 under Linux. Maybe you could repeat the test with a more recent version and see if you get different behaviour. (Not that that proves anything conclusively, but at least it might be a good solution for your immediate problem.)
>
> 2.3 (on the same machine) does seem better, even with setcheckinterval(0).

The default check interval was changed from 10 in version 2.2 and earlier to 100 in version 2.3. (See http://www.python.org/2.3/highlights.html for details.) On the other hand, with version 2.3.4 under RH9, I tried values of 10 and 1 with no failures at any time.

This might still be an issue with your own particular system, so having others try it out might be helpful...

-Peter
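
The experiment suggested above (trying several interval values and watching how often the problem appears) could be sketched roughly as follows. This is not the code from the thread and will not reproduce the Python 2.2 hang. It assumes Python 3, where sys.setcheckinterval() no longer exists (it was removed in 3.9) and the nearest equivalent is sys.setswitchinterval(), which takes a time in seconds rather than a bytecode count. The names count_lost_updates and sweep, and the interval values, are hypothetical.

```python
import sys
import threading


def count_lost_updates(n_threads=2, n_cycles=20000):
    """Run unsynchronized increments and return how many updates were lost."""
    state = {"counter": 0}

    def run():
        for _ in range(n_cycles):
            # Deliberately unsynchronized read-modify-write.
            state["counter"] += 1

    threads = [threading.Thread(target=run) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n_threads * n_cycles - state["counter"]


def sweep(intervals=(1e-6, 1e-4, 5e-3)):
    """Measure lost updates at several switch intervals (in seconds)."""
    original = sys.getswitchinterval()
    results = {}
    try:
        for interval in intervals:
            sys.setswitchinterval(interval)
            results[interval] = count_lost_updates()
    finally:
        # Always restore the interpreter's previous setting.
        sys.setswitchinterval(original)
    return results


if __name__ == "__main__":
    for interval, lost in sweep().items():
        print("switch interval %g s: %d lost updates" % (interval, lost))
```

The numbers vary from run to run and between interpreter versions, and may well be zero on recent releases, so a sweep like this is only a way to look for a trend, not a proof either way.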