Re: Thread scheduling

2005-02-26 Thread M.E.Farmer
This may help.
http://linuxgazette.net/107/pai.html
Also be sure to google.
search strategy:
Python threading
Python threads
Python thread tutorial
threading.py example
Python threading example
Python thread safety
hth,
M.E.Farmer

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Thread scheduling

2005-02-26 Thread Peter Hansen
Jack Orenstein wrote:
> I am using Python 2.2.2 on RH9, and just starting to work with Python
> threads.
Is this also the first time you've worked with threads in general,
or do you have much experience with them in other situations?
> This program seems to point to problems in Python thread scheduling.
While from time to time bugs in Python are found, it's generally
more productive to suspect one's own code.  In any case, you
wouldn't have posted it here if you didn't suspect your own
code at least a bit, so kudos to you for that. :-)
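> import thread, time
>
> # Setup not included in the quoted excerpt; values assumed so it runs.
> nThreads = 2      # the test runs two threads
> nCycles = 1000    # assumed value; the original was not quoted
> counter = 0
> done = 0
> def run(id):
>     global done
>     print 'thread %d: started' % id
>     global counter
>     for i in range(nCycles):
>         counter += 1
>         if i % 1 == 0:
>             print 'thread %d: i = %d, counter = %d' % (id, i, counter)
>     print 'thread %d: leaving' % id
>     done += 1
> for i in range(nThreads):
>     thread.start_new_thread(run, (i + 1,))
> while done < nThreads:
>     time.sleep(1)
>     print 'Still waiting, done = %d' % done
> print 'All threads have finished, counter = %d' % counter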
Without having tried to run your code, and without having studied
it for long, I am going to point out something that is at the
very least an inherent defect in your code, though you might
not have used Python (or threads?) for long enough to realize
why.  Note that I don't know if this is the cause of your
particular problem, just that it _is_ a bug.
You've got two shared global variables, done and counter.
Each of these is modified in a manner that is not thread-safe.
I don't know if counter is causing trouble, but it seems
likely that done is.
Basically, the statement done += 1 is equivalent to the
statement done = done + 1 which, in Python or most other
languages, is not thread-safe.  The done + 1 part is
evaluated separately from the assignment, so it's possible
that two threads will be executing the done + 1 part
at the same time and that the following assignment of
one thread will be overwritten immediately by the assignment
in the next thread, but with a value that is now one less
than what you really wanted.
Look at the bytecode produced by the statement done += 1:
    >>> import dis
    >>> def f():
    ...   global done
    ...   done += 1
    ...
    >>> dis.dis(f)
      3           0 LOAD_GLOBAL              0 (done)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_GLOBAL             0 (done)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
(Ignore the last two lines: they just return None.)
Note here the STORE_GLOBAL that is separate from the add
operation itself.  If thread A loses the CPU (so to speak)
just before that operation, and thread B executes the entire
suite of operations, then later on when thread A executes
that operation it will effectively result in only a single
addition operation being performed, not two of them.
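The interleaving just described can be forced on purpose, which makes the lost update reproducible instead of occasional. The following is a minimal sketch in modern Python 3 (threading.Barrier does not exist in the Python versions discussed here). The helper name unsafe_increment is hypothetical, and the read and the write are split into separate statements so that a barrier can hold both threads between them.

```python
import threading

done = 0
barrier = threading.Barrier(2)   # neither thread continues until both arrive

def unsafe_increment():
    global done
    value = done          # the load, like LOAD_GLOBAL
    barrier.wait()        # guarantees the other thread has loaded the same old value
    done = value + 1      # the add and store, like INPLACE_ADD + STORE_GLOBAL

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(done)  # 1, not 2: one thread's store overwrote the other's
```

With an ordinary scheduler the window between the load and the store is only a few bytecodes wide, which is why a race like this tends to show up rarely rather than on every run.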
If you really want to increment globals from the thread, you
should look into locks.  Using the threading module (as is
generally recommended, instead of using thread), you would
use threading.Lock().  There are other thread synchronization
primitives in the threading module as well, some of which
might be more suitable for your purposes.  Note also the
oft-repeated (in this forum) recommendation that if you simply
use nothing but the Queue module for inter-thread communication,
you will be very unlikely to stumble over such issues.
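As an illustration of both suggestions, here is a sketch of the test program rewritten for Python 3 (where the Queue module is spelled queue): a threading.Lock guards the counter, and each worker reports completion through a queue.Queue, so the main thread blocks on the queue instead of polling a shared done variable. The constants N_THREADS and N_CYCLES are assumptions for illustration, not values from the original post.

```python
import queue
import threading

N_THREADS = 2      # assumed: two workers, as in the test program
N_CYCLES = 1000    # assumed: illustrative loop count

counter = 0
counter_lock = threading.Lock()
finished = queue.Queue()   # each worker puts its id here when it is done

def run(thread_id):
    global counter
    for _ in range(N_CYCLES):
        with counter_lock:         # load, add and store happen under the lock
            counter += 1
    finished.put(thread_id)        # Queue does its own locking

for i in range(N_THREADS):
    threading.Thread(target=run, args=(i + 1,)).start()

# Block until every worker has reported, instead of polling a shared 'done'.
finished_ids = sorted(finished.get() for _ in range(N_THREADS))
print('All threads have finished, counter = %d' % counter)
```

Because each worker only reports after its loop ends, collecting N_THREADS items from the queue is enough to know the final counter value is complete.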
-Peter


Re: Thread scheduling

2005-02-26 Thread Jack Orenstein
Peter Hansen wrote:
> Jack Orenstein wrote:
>> I am using Python 2.2.2 on RH9, and just starting to work with Python
>> threads.
> Is this also the first time you've worked with threads in general,
> or do you have much experience with them in other situations?
Yes, I've used threading in Java.
> You've got two shared global variables, done and counter.
> Each of these is modified in a manner that is not thread-safe.
> I don't know if counter is causing trouble, but it seems
> likely that done is.
I understand that. As I said in my posting, the counter is
incremented without locking -- I expect to see a final count of less
than THREADS * COUNT. This is a test case, and I threw out more and
more code, including synchronization around counter and done, until it
got as simple as possible and still showed the problem.
> Basically, the statement done += 1 is equivalent to the
> statement done = done + 1 which, in Python or most other
> languages, is not thread-safe.  The done + 1 part is
> evaluated separately from the assignment, so it's possible
> that two threads will be executing the done + 1 part
> at the same time and that the following assignment of
> one thread will be overwritten immediately by the assignment
> in the next thread, but with a value that is now one less
> than what you really wanted.
Understood. I was counting on this being unlikely for my test
case. I realize this isn't something to rely on in real software.
> If you really want to increment globals from the thread, you
> should look into locks.  Using the threading module (as is
> generally recommended, instead of using thread), you would
> use threading.Lock().
As my note said, I did start with the threading module. And variables
updated by different threads were protected by threading.Condition
variables. As I analyzed my test cases, and threading.py, I started
suspecting thread scheduling.  I then wrote the test case in my email,
which does not rely on the threading module at all. The point of the
test is not to maintain counter -- it's to show that sometimes even
after one thread completes, the other thread is never scheduled
again. This seems wrong. Try running the code, and let me know if you
see this behavior.
If you'd like, replace this:
    counter += 1
by this:
    time.sleep(0.01 * id)
You should see the same problem. So that removes counter from the
picture. And the two increments of done (one by each thread) are still
almost certainly not going to coincide and cause a problem. Also, if
you look at the output from the code on a hang, you will see that
'thread X: leaving' only prints once. This has nothing to do with what
happens with the done variable.
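For comparison, a Condition can replace the sleep-and-poll loop in the test (while done < nThreads: time.sleep(1)). This is a sketch in Python 3 under the assumption that only completion needs signalling; the names done_cond and worker are hypothetical, and it is not the Condition-based code described above, which was not posted.

```python
import threading

N_THREADS = 2
done = 0
done_cond = threading.Condition()   # guards 'done' and signals changes to it

def worker(thread_id):
    global done
    # ... the thread's real work would go here ...
    with done_cond:
        done += 1                   # only modified while holding the lock
        done_cond.notify_all()      # wake the main thread so it re-checks

for i in range(N_THREADS):
    threading.Thread(target=worker, args=(i + 1,)).start()

with done_cond:
    # wait_for() re-evaluates the predicate after every notification and
    # returns its final value (False only if the timeout expires first).
    finished = done_cond.wait_for(lambda: done >= N_THREADS, timeout=10)

print('finished=%s, done=%d' % (finished, done))
```

Because done is only read and written while the Condition's lock is held, the lost-update problem discussed earlier in the thread cannot occur here.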
Jack


Re: Thread scheduling

2005-02-26 Thread Peter Hansen
Jack Orenstein wrote:
> Peter Hansen wrote:
>> You've got two shared global variables, done and counter.
>> Each of these is modified in a manner that is not thread-safe.
>> I don't know if counter is causing trouble, but it seems
>> likely that done is.
> I understand that.
>> Basically, the statement done += 1 is equivalent to the
>> statement done = done + 1 which, in Python or most other
>> languages, is not thread-safe.
> Understood. I was counting on this being unlikely for my test
> case. I realize this isn't something to rely on in real software.
Hmm... okay.  I may have been distracted by the fact that your
termination condition is based on done incrementing properly,
and that it was possible this wouldn't happen because of the race
condition.  So, if I understand you now, you're saying that the
reason done doesn't increment is actually because one of the
threads is never finishing properly, for some reason not related
to the code itself. (?)
> The point of the
> test is not to maintain counter -- it's to show that sometimes even
> after one thread completes, the other thread is never scheduled
> again. This seems wrong. Try running the code, and let me know if you
> see this behavior.
On my machines (one Py2.4 on WinXP, one Py2.3.4 on RH9.0) I don't
see this behaviour.  Across about fifty runs each.
> And the two increments of done (one by each thread) are still
> almost certainly not going to coincide and cause a problem. Also, if
> you look at the output from the code on a hang, you will see that
> 'thread X: leaving' only prints once. This has nothing to do with what
> happens with the done variable.
Okay, I believe you.  As I said, I hadn't taken the time to read
through everything at first, jumping on an obvious bug related
to the done variable not meeting your termination conditions.
I can see that something else is likely to be causing this.
One thing you might try is experimenting with sys.setcheckinterval(),
just to see what effect it might have, if any.
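In the Python versions discussed here, sys.setcheckinterval(n) sets how many bytecode instructions run between checks for periodic work such as thread switches. It was deprecated in Python 3.2 and removed in 3.9; the nearest modern equivalent is sys.setswitchinterval(), which takes a duration in seconds and rejects values that are not strictly positive. A small sketch for experimenting with it and restoring the old value afterwards; the helper name switch_interval is hypothetical.

```python
import contextlib
import sys

@contextlib.contextmanager
def switch_interval(seconds):
    """Temporarily change the interpreter's thread switch interval (seconds)."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)   # must be > 0; 0 raises ValueError
    try:
        yield
    finally:
        sys.setswitchinterval(old)   # restore the previous setting

before = sys.getswitchinterval()
with switch_interval(0.001):
    during = sys.getswitchinterval()
    # ... start and join the test threads here ...
after = sys.getswitchinterval()
```

Unlike setcheckinterval(0), there is no "switch as often as possible" value of 0; a very small positive interval is the closest analogue.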
It's also possible there were some threading bugs in Py2.2 under
Linux.  Maybe you could repeat the test with a more recent
version and see if you get different behaviour.  (Not that that
proves anything conclusively, but at least it might be a good
solution for your immediate problem.)
-Peter


Re: Thread scheduling

2005-02-26 Thread Jack Orenstein
Peter Hansen wrote:
> On my machines (one Py2.4 on WinXP, one Py2.3.4 on RH9.0) I don't
> see this behaviour.  Across about fifty runs each.
Thanks for trying this.
> One thing you might try is experimenting with sys.setcheckinterval(),
> just to see what effect it might have, if any.
That does seem to have an impact. At 0, the problem was completely
reproducible. At 100, I couldn't get it to occur.
> It's also possible there were some threading bugs in Py2.2 under
> Linux.  Maybe you could repeat the test with a more recent
> version and see if you get different behaviour.  (Not that that
> proves anything conclusively, but at least it might be a good
> solution for your immediate problem.)
2.3 (on the same machine) does seem better, even with setcheckinterval(0).
Thanks for your suggestions.
Can anyone with knowledge of Python internals comment on these results?
(Look earlier in the thread for details. But basically, a very simple
program with the thread module, running two threads, shows that on
occasion, one thread finishes and the other never runs again. python2.3
seems better, as does python2.2 with sys.setcheckinterval(100).)
Jack


Re: Thread scheduling

2005-02-26 Thread Peter Hansen
Jack Orenstein wrote:
>> One thing you might try is experimenting with sys.setcheckinterval(),
>> just to see what effect it might have, if any.
> That does seem to have an impact. At 0, the problem was completely
> reproducible. At 100, I couldn't get it to occur.
If you try other values in between, can you directly affect
the frequency of the failure?  That would appear to suggest
a race condition.
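The question above can be answered mechanically: run the test many times at each setting and count how often it fails. Below is a sketch of such a sweep in Python 3, using sys.setswitchinterval() because sys.setcheckinterval() is no longer available there. run_trial is a hypothetical stand-in workload rather than the original test program, and a failure is assumed to mean a worker thread still alive after join() times out.

```python
import sys
import threading

def run_trial(n_threads=2, n_cycles=10000, timeout=5.0):
    """One run of a stand-in test: True if every worker thread finished."""
    def worker():
        total = 0
        for i in range(n_cycles):   # busy loop standing in for the real work
            total += i
    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)             # daemon threads let a hung run be abandoned
    return not any(t.is_alive() for t in threads)

def sweep(intervals, trials=20):
    """Return {switch interval in seconds: number of failed trials}."""
    old = sys.getswitchinterval()
    failures = {}
    try:
        for interval in intervals:
            sys.setswitchinterval(interval)
            failures[interval] = sum(1 for _ in range(trials) if not run_trial())
    finally:
        sys.setswitchinterval(old)  # always restore the original interval
    return failures

results = sweep([0.0001, 0.001, 0.005, 0.05], trials=5)
print(results)
```

On a Python 2.2 system the same loop would instead call sys.setcheckinterval() with instruction counts such as 0, 10 and 100, and run the original test program in place of run_trial. A failure count that rises steadily as the interval shrinks would point toward a timing-dependent race.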
>> It's also possible there were some threading bugs in Py2.2 under
>> Linux.  Maybe you could repeat the test with a more recent
>> version and see if you get different behaviour.  (Not that that
>> proves anything conclusively, but at least it might be a good
>> solution for your immediate problem.)
> 2.3 (on the same machine) does seem better, even with setcheckinterval(0).
The default check interval was changed from 10 in version 2.2
and earlier to 100 in version 2.3.  (See 
http://www.python.org/2.3/highlights.html for details.)

On the other hand, with version 2.3.4 under RH9, I tried values
of 10 and 1 with no failures at any time.  This might still
be an issue with your own particular system, so having others
try it out might be helpful...
-Peter