Hello all, Here is the scheduler assert patch, test cases, and integration with the test suite combined into a single patch! After a bit more testing with code that used a lot of threads, I ended up looking at how make-thread uses ##sys#make-thread and added some missing code to the patch's usage of ##sys#make-thread.
Best, Jonathan Chan > On Jul 9, 2015, at 4:29 PM, Jonathan Chan <j...@fastmail.fm> wrote: > > ... and here is hopefully the final version of the patch for review. > In the end I think hijacking a thread to run the signal handler is too > much prone to error. Although the previous patches try to cope with > this behavior, as soon as the signal handler tries to do something > like lock a mutex via srfi-18 or yield, very strange things will > happen depending on what thread was hijacked, which threads were > blocking, etc. Blocking > ##sys#schedule from running is a partial fix, but doesn't help when it > would have been expected for ##sys#schedule not to return immediately, > e.g. if a signal handler tries to block on a socket. For example, > seems to make the the tcp unit go into a constant loop while > trying to recv. > > In this newest patch, the ##sys#interrupt-handler in scheduler.scm > will instead save the old handler in ##sys#signal-vector, change the > handler to one that makes a thread with ##sys#make-thread to call the > old handler and then schedules that thread to run, wrap state so that > the original ##sys#interrupt-handler will reset the handler and then > finally call the old hook so everything starts going. > > In addition to the new patch, I have attached a revised version of the > old test case that should be more likely to trigger the assert failure > in unpatched versions of the scheduler as well as a new test case that > 1) shows problems with the previous patches 2) ensures that the signal > handler can properly call blocking code and 3) possibly > demonstrates a bug with multiple threads blocking on one fd that I > am not certain about. > > Instructions for scheduler-fdset.scm: > > 1. Run with csi or csc. > 2. Send SIGINT (e.g. with Ctrl-C) > 3. The test should write "locking... unlocking... unlocked" and then > exit without error. > > Instructions for scheduler-fdset-2.scm: You will need to compile > sink.c with gcc sink.c -o sink > > 1. Run the test > 2. Wait a short bit for the child processes to start > 3. Start netcat in a new terminal with `nc -l 8080` > 4. Send SIGINT (e.g. with ^C) > 5. Verify that the process is not using 100% CPU > 6. Type `hi` (or anything) in the terminal with netcat and press > Return > 7. The test should exit without error > > Sorry for the many versions and emails! This at least lets me get back > to the work that needed the scheduler in the first place. I hope that > this finally kills the bug... > > All the best, > -- > Jonathan Chan j...@fastmail.fm > > On Thu, Jul 9, 2015, at 03:18 AM, Jonathan Chan wrote: >> I realized that if I am able to set a flag about whether or not we >> are in the interrupt hook, I might as well save the previous block >> object of the thread and then restore it instead of possibly causing >> spurious wakeups. As an aside, it seems as if force-primordial >> doesn't currently do this, and relies on the code that asked the >> scheduler to block on i/o to retry on its own. As I stated in the >> previous email, this seems to be what the code that uses the >> scheduler's blocking facilities does anyway. >> >> In any case, this patch also passes my tests (as the previous one >> did) but now should more cleanly avoid "spurious wakeups" of threads >> blocking on I/O. >> >> Also, I realized that the test case lacked instructions; you should >> just be able to start it, wait a second, then send SIGINT (e.g. with >> Ctrl-C) and the unpatched scheduler should cause the program to exit >> with an assert error while with the patched scheduler you should just >> see a nice: >> >> [jyc@jc-avm-arch tests]$ csi -s scheduler-fdset.scm ^Clocking... >> unlocking... unlocked... done! >> >> Best regards, >> -- >> Jonathan Chan j...@fastmail.fm >> >> On Wed, Jul 8, 2015, at 11:34 PM, Jonathan Chan wrote: >>> Hi again, >>> >>> It turns out that changing the current thread's state to 'running >>> doesn't actually work in all cases because a) unblocking from >>> condition variables will skip a thread if its state is running b) >>> the call to change the state back was never called because the call >>> to the old hook would never return. I found this by doing some more >>> testing with the program that caused me to run into the original >>> assert bug. >>> >>> Here is a revised patch which instead sets a variable to indicate >>> whether or not we are in the interrupt hook and skips the scheduler >>> if so. I've also included an updated test case that catches the >>> problem with the previous patch. >>> >>> Best regards, >>> -- >>> Jonathan Chan j...@fastmail.fm >>> >>> On Wed, Jul 8, 2015, at 08:12 PM, Jonathan Chan wrote: >>>> Hi all, >>>> >>>> I ended up having some free time before the weekend and went back >>>> to the scheduler assert problem. I wrote a test case for the bug >>>> and revised the scheduler assert patch after some further testing. >>>> >>>> In the previous patch, the thread in which the interrupt handler >>>> ran would just be removed from ##sys#fd-list if the block object >>>> was changed from a file. This didn't seem to handle some cases >>>> where the interrupt handler ended up unblocking. In one of these >>>> cases, when mutex-lock! and then mutex-unlock! were called in >>>> succession, it also turned out that the "current thread"'s state >>>> needed to be saved, set to 'running during the interrupt handler, >>>> then restored in order to avoid mutex-unlock! from calling the >>>> scheduler and causing the program to hang/do strange things. >>>> >>>> I've attached the the test case for the bug and the revised patch. >>>> Again, if someone experienced with the scheduler could review it >>>> and make sure it doesn't cause strange things to happen, it would >>>> be very much appreciated. >>>> >>>> Best regards, >>>> -- >>>> Jonathan Chan j...@fastmail.fm >>>> >>>> On Sat, Jul 4, 2015, at 07:49 PM, Jonathan Chan wrote: >>>>> Hello all, >>>>> >>>>> I was in the IRC channel earlier trying to find out why my program >>>>> was failing due to a scheduler assertion about the fdset for poll. >>>>> This would happen consistently when I sent SIGINT. >>>>> >>>>> The error looks like: >>>>> >>>>> scheduler.c:51: C_fd_ready: Assertion `fd == C_fdset_set[pos].fd' >>>>> failed >>>>> >>>>> After looking at the scheduler code, it looks like this happened >>>>> because the SIGINT was causing the interrupt handler to be called, >>>>> which ended up unblocking some thread to run a signal handler, >>>>> which caused a condition variable broadcast. In the end, this made >>>>> a thread that was previously blocking on a file descriptor to now >>>>> block on a mutex. >>>>> >>>>> When create-fdset procedure iterated over the ##sys#fd-list, it >>>>> skipped threads in the list who were not blocking on fds while it >>>>> added those who were to the fdset using fdset-set. However, it did >>>>> not remove these threads from ##sys#fd-list, which resulted in the >>>>> scheduler throwing an error because it expected a fd to be in the >>>>> fdset when it was not there. >>>>> >>>>> I have attached a patch which modifies create-fdset to remove >>>>> threads from ##sys#fd-list who are not blocking on an fd, rather >>>>> than only skip them. This seems to fix my problem and has not >>>>> introduced any bizarre behvaior in the few tests I have run, but >>>>> hopefully someone who is more familiar with the scheduler could >>>>> look it over and see if it fixes the issue. >>>>> >>>>> Best regards, >>>>> -- >>>>> Jonathan Chan j...@fastmail.fm >>>>> _______________________________________________ >>>>> Chicken-hackers mailing list Chicken-hackers@nongnu.org >>>>> https://lists.nongnu.org/mailman/listinfo/chicken-hackers Email >>>>> had 1 attachment: >>>>> + scheduler-assert.diff 2k (text/plain) >>>> _______________________________________________ >>>> Chicken-hackers mailing list Chicken-hackers@nongnu.org >>>> https://lists.nongnu.org/mailman/listinfo/chicken-hackers Email had >>>> 2 attachments: >>>> + scheduler-assert-fix.diff 5k (application/octet-stream) >>>> + scheduler-fdset.scm 2k (application/octet-stream) >>> _______________________________________________ >>> Chicken-hackers mailing list Chicken-hackers@nongnu.org >>> https://lists.nongnu.org/mailman/listinfo/chicken-hackers Email had >>> 3 attachments: >>> + scheduler-fdset.scm 2k (application/octet-stream) >>> + sink.c 1k (text/plain) >>> + scheduler-assert-fix.diff 14k (application/octet-stream) >> _______________________________________________ >> Chicken-hackers mailing list Chicken-hackers@nongnu.org >> https://lists.nongnu.org/mailman/listinfo/chicken-hackers Email had 1 >> attachment: >> + scheduler-assert-fix.diff 14k (application/octet-stream) > <scheduler-assert-new.diff><scheduler-fdset-2.scm><scheduler- > fdset.scm><sink.c>_______________________________________________ Chicken- > hackers mailing list Chicken-hackers@nongnu.org > https://lists.nongnu.org/mailman/listinfo/chicken-hackers
scheduler-assert-final.diff
Description: Binary data
_______________________________________________ Chicken-hackers mailing list Chicken-hackers@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-hackers