hyperthreading fix try #2
The latest snapshot has my newest attempt at fixing the dreaded
hyperthreading problem.

My previous fix was flawed in that once Corinna corrected a typo in my
change, the problem showed up again. So, I've reworked the
synchronization logic again and even ran cygwin through that "test
suite" thing that is all the rage with cygwin developers these days. In
fact, I ran the test suite while running the hyperthreading tests.

To test this, I ran two invocations of the standard shell script test
along with the Brian Ford variation of the same for 24 hours. For some
reason, Brian's shell script seemed to trip the error more quickly than
the other one, but the combination of running his script + the other
script seemed to produce the problem even more quickly. (I'd modified
both of the scripts so that they beeped if they exited, causing me to
jump out of my chair a couple of times as I struggled to get this
right.)

I'm not claiming that it is right now. I haven't tried a "make -j" test
yet. I just thought it was time to release another try on the world
again:

http://cygwin.com/snapshots/

To help preserve my tenuous grasp on sanity, please reply to *this
thread* when reporting problems. Please don't start a new thread. Just
reply here so that mailing list threading is preserved and I can easily
check for all success or error reports.

As before, any kind of report is welcome but it is unlikely that I'm
going to spend a lot of time debugging problems that I can't reproduce.

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
RE: hyperthreading fix try #2
On my machine my own test case, and the make -j2 test case, have been
running now for more than an hour, no problem so far.

You seem to be on the right track :)

Thanks for your efforts

With kind Regards

Volker Bandke
(BSP GmbH)

Lesser known machine instructions - SDLI: Shift Disk Left Immediate
(Another Wisdom from my fortune cookie jar)
Re: hyperthreading fix try #2
Christopher Faylor wrote:
> I'm not claiming that it is right now. I haven't tried a "make -j" test
> yet. I just thought it was time to release another try on the world
> again:
>
> http://cygwin.com/snapshots/
>
> To help preserve my tenuous grasp on sanity, please reply to *this
> thread* when reporting problems. Please don't start a new thread. Just
> reply here so that mailing list threading is preserved and I can easily
> check for all success or error reports.
>
> As before, any kind of report is welcome but it is unlikely that I'm
> going to spend a lot of time debugging problems that I can't reproduce.
>
> cgf

My "make -j" test has been running for a while with no failures. Beyond
that, this seems to fix a long-standing problem I have had with more
extreme parallelization ("make -j100").

-Rolf
Re: hyperthreading fix try #2
Christopher Faylor wrote:
> To help preserve my tenuous grasp on sanity, please reply to *this
> thread* when reporting problems. Please don't start a new thread. Just
> reply here so that mailing list threading is preserved and I can easily
> check for all success or error reports.
>
> As before, any kind of report is welcome but it is unlikely that I'm
> going to spend a lot of time debugging problems that I can't reproduce.

It looks somewhat promising here. My main use case is building Python
from CVS, and previously it tended to die somewhere in the autoconf
script.

With the latest snapshot, ./configure and make both worked, and it made
it at least part of the way through the Python regression tests.

It made it through Python's test_subprocess.py which should be giving
the pipe handling a decent workout, but appears to have died in
test_threadedtempfile.py (the shell stops using any CPU time, which is
rarely a good sign). The bash shell also won't respond to any of Ctrl-C,
Ctrl-Z or Ctrl-Break. The close button works, but I figure Windows is
taking care of that one.

This seems to happen even running that test from the bash shell with the
standard Cygwin python 2.4:

    $ /usr/lib/python2.4/test/regrtest.py test_threadedtempfile
    test_threadedtempfile

Using the windows Python 2.4, the test completes inside a couple of
seconds:

    C:\>\python24\Lib\test\regrtest.py test_threadedtempfile
    test_threadedtempfile
    1 test OK.

Of course, I don't actually know if this is a related problem or not.
I'm hoping Chris can check it easily, since it happens with the standard
Cygwin Python, not just with the version I built from Python's current
CVS.

Cheers,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
Re: hyperthreading fix try #2
Nick Coghlan wrote:
> Christopher Faylor wrote:
> Of course, I don't actually know if this is a related problem or not.
> I'm hoping Chris can check it easily, since it happens with the standard
> Cygwin Python, not just with the version I built from Python's current
> CVS.

I took the obvious step of running that test script directly, playing
with the number of threads spawned, and the number of files created by
each thread, as well as adding some more print statements to the script
to see where it was hanging.

Command lines looked like (with thread and file counts filled in):

    $ python /lib/python2.4/test/test_threadedtempfile.py -t -f

The results weren't particularly deterministic, beyond a general 'more
threads, more files' -> 'more likely to hang'. 10 & 10 seemed to do it
fairly effectively although even that would occasionally succeed (the
default is 20 & 20).

When it *did* hang, it was with a number of threads successfully opening
their files on an iteration, with the remainder of the threads locking
up attempting to open a new temporary file. The next time around, the
remaining threads would hang while attempting to open the temporary
file.

The main script hangs because it is waiting for the threads to
terminate.

Regards,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
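For anyone wanting to try the same kind of load without the regression
suite, here is a minimal sketch of the pattern described above: several
threads each create a series of temporary files, and the main thread
waits for them. It is not the actual test_threadedtempfile.py, and it is
written for a current Python 3 rather than 2.4. The names worker and
run_stress, the default counts, and the join timeout are all assumptions
made for illustration. The timeout is there so that a hang is reported
instead of blocking the caller.

```python
import tempfile
import threading
import time


def worker(start, num_files, errors):
    """Wait for the start signal, then create num_files temp files in turn."""
    start.wait()
    for _ in range(num_files):
        try:
            # Each file is created, written to, and closed before the next.
            with tempfile.TemporaryFile() as f:
                f.write(b"x" * 1024)
        except OSError as exc:
            # Record errors (e.g. a bad file descriptor) and keep going.
            errors.append(exc)


def run_stress(num_threads=10, num_files=10, timeout=30.0):
    """Run the workers; return (number of threads still alive, errors)."""
    errors = []
    start = threading.Event()
    # Daemon threads, so a stuck worker cannot keep the process alive.
    threads = [
        threading.Thread(target=worker, args=(start, num_files, errors),
                         daemon=True)
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    start.set()  # release all workers at once to maximise contention

    # Share one overall deadline across all joins.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    hung = sum(1 for t in threads if t.is_alive())
    return hung, errors


if __name__ == "__main__":
    hung, errors = run_stress()
    print("threads still running after timeout:", hung)
    print("errors recorded:", len(errors))
```

A nonzero count of threads still running would correspond to the hang
described above; raising num_threads and num_files should increase the
load in the same way as the thread and file counts on the command line.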
Re: hyperthreading fix try #2
On Mon, Feb 14, 2005 at 01:17:03AM +1000, Nick Coghlan wrote:
>Nick Coghlan wrote:
>>Christopher Faylor wrote:
>>Of course, I don't actually know if this is a related problem or not.
>>I'm hoping Chris can check it easily, since it happens with the standard
>>Cygwin Python, not just with the version I built from Python's current CVS.
>
>I took the obvious step of running that test script directly, playing with
>the number of threads spawned, and the number of files created by each
>thread, as well as adding some more print statements to the script to see
>where it was hanging.
>
>Command lines looked like (with thread and file counts filled in):
>$ python /lib/python2.4/test/test_threadedtempfile.py -t -f
>
>
>The results weren't particularly deterministic, beyond a general 'more
>threads, more files' -> 'more likely to hang'. 10 & 10 seemed to do it
>fairly effectively although even that would occasionally succeed (the
>default is 20 & 20).
>
>When it *did* hang, it was with a number of threads successfully opening
>their files on an iteration, with the remainder of the threads locking up
>attempting to open a new temporary file. The next time around, the
>remaining threads would hang while attempting to open the temporary file.
>
>The main script hangs because it is waiting for the threads to terminate.

Is this a regression? Was this also a problem with 1.5.12?

Unless there are pipes involved in this test, I don't see how it could be
related to the problem.

cgf
Re: hyperthreading fix try #2
On Sun, Feb 13, 2005 at 12:09:21PM -0500, Christopher Faylor wrote:
> On Mon, Feb 14, 2005 at 01:17:03AM +1000, Nick Coghlan wrote:
> >Nick Coghlan wrote:
> >Command lines looked like (with thread and file counts filled in):
> >$ python /lib/python2.4/test/test_threadedtempfile.py -t -f
> >
> >
> >The results weren't particularly deterministic, beyond a general
> >'more threads, more files' -> 'more likely to hang'. 10 & 10 seemed
> >to do it fairly effectively although even that would occasionally
> >succeed (the default is 20 & 20).
> >
> >When it *did* hang, it was with a number of threads successfully
> >opening their files on an iteration, with the remainder of the
> >threads locking up attempting to open a new temporary file. The next
> >time around, the remaining threads would hang while attempting to
> >open the temporary file.
> >
> >The main script hangs because it is waiting for the threads to
> >terminate.
>
> Is this a regression?

I don't know.

> Was this also a problem with 1.5.12?

AFAICT, no. However, another threaded regression test hung in 1.5.12 as
indicated in the README:

    Under XP Pro SP1, Cygwin 1.5.12-1, ntsec, and NTFS, Cygwin Python
    passes all tests except for following:
    ...
        test_threaded_import (occasionally hangs)

So, maybe I was just "lucky" that test_threadedtempfile did not hang
when I ran the regression test, especially since it hung under 1.5.10-3.

Jason

--
PGP/GPG Key: http://www.tishler.net/jason/pubkey.asc or key servers
Fingerprint: 7A73 1405 7F2B E669 C19D  8784 1AFD E4CC ECF4 8EF6
Python 2.4's test_threadedtempfile failing (was Re: hyperthreading fix try #2)
Christopher Faylor wrote:
> On Mon, Feb 14, 2005 at 01:17:03AM +1000, Nick Coghlan wrote:
>> Command lines looked like (with thread and file counts filled in):
>> $ python /lib/python2.4/test/test_threadedtempfile.py -t -f
>
> Is this a regression? Was this also a problem with 1.5.12?
>
> Unless there are pipes involved in this test, I don't see how it could
> be related to the problem.

I can't say for sure if it's a regression, since I never got 1.5.12 to
work properly at all (and hence didn't back up the bin directory before
dropping the snapshot binaries into it). At the moment, I'm curious if
the test passes on non-hyperthreaded machines.

I've changed the subject and cc'ed Jason though, to see if he has
anything to add.

Regards,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
Re: Python 2.4's test_threadedtempfile failing (was Re: hyperthreading fix try #2)
On Mon, Feb 14, 2005 at 01:12:53PM +1000, Nick Coghlan wrote:
>Christopher Faylor wrote:
>>On Mon, Feb 14, 2005 at 01:17:03AM +1000, Nick Coghlan wrote:
>>>Command lines looked like (with thread and file counts filled in):
>>>$ python /lib/python2.4/test/test_threadedtempfile.py -t -f
>>>
>>>
>>Is this a regression? Was this also a problem with 1.5.12?
>>
>>Unless there are pipes involved in this test, I don't see how it could be
>>related to the problem.
>
>I can't say for sure if it's a regression, since I never got 1.5.12 to work
>properly at all (and hence didn't back up the bin directory before dropping
>the snapshot binaries into it). At the moment, I'm curious if the test
>passes on non-hyperthreaded machines.

The test passes on this hyperthreaded machine, or at least it doesn't
hang. I have been running it repeatedly. I do occasionally get a 'bad
file descriptor' error but the test keeps on running.

cgf
Re: Python 2.4's test_threadedtempfile failing (was Re: hyperthreading fix try #2)
Christopher Faylor wrote:
> On Mon, Feb 14, 2005 at 01:12:53PM +1000, Nick Coghlan wrote:
>> I can't say for sure if it's a regression, since I never got 1.5.12 to
>> work properly at all (and hence didn't back up the bin directory before
>> dropping the snapshot binaries into it). At the moment, I'm curious if
>> the test passes on non-hyperthreaded machines.
>
> The test passes on this hyperthreaded machine or at least it doesn't
> hang. I have been running it repeatedly. I do occasionally get a 'bad
> file descriptor' error but the test keeps on running.

Most odd. Anyway, I'm far more comfortable hacking Python than I am
Cygwin, so I'll get back to the list once I've taken some more time to
dig into the guts of the failing test.

For the pipe-handling issue, I take the fact that Python's
test_subprocess passed as a very good sign, since I understand that it
uses pipes for its inter-process communication.

Regards,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
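As a rough illustration of the kind of pipe workout mentioned above,
the sketch below repeatedly sends a block of data to a child process
over a pipe and reads it back over another. It is not taken from
Python's test_subprocess and makes no claim to exercise the same code
paths; it assumes a current Python 3, and the names pipe_round_trip and
run_pipe_workout, the iteration count, and the payload size are
hypothetical choices for the example.

```python
import subprocess
import sys

# Child program: copy everything from stdin to stdout, byte for byte.
ECHO_CHILD = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def pipe_round_trip(payload, timeout=30.0):
    """Send payload to a child over a pipe and return what comes back."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_CHILD],
        input=payload,            # written to the child's stdin pipe
        stdout=subprocess.PIPE,   # read back from the child's stdout pipe
        timeout=timeout,          # raise TimeoutExpired instead of hanging
        check=True,               # raise if the child exits with an error
    )
    return result.stdout


def run_pipe_workout(iterations=20, size=65536):
    """Return how many round trips came back with damaged data."""
    payload = bytes(range(256)) * (size // 256)
    failures = 0
    for _ in range(iterations):
        if pipe_round_trip(payload) != payload:
            failures += 1
    return failures


if __name__ == "__main__":
    print("damaged round trips:", run_pipe_workout())
```

A timeout or a nonzero failure count would be the thing to report;
running several copies at once alongside a "make -j" build is one way
to add load.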