Hi there,

Sorry about the long radio silence on this topic. I want to give an update on this issue:

- I wasn't able to put together a simple example that reproduces the above error. I tried several ways to fake the job, but didn't succeed.
- However, I changed the code as Ole suggested and my Cygwin jobs ran fine.

Since then, I have switched to Linux for processing large jobs.

Thanks again
Stephan

2016-07-23 18:55 GMT+02:00 Ole Tange <[email protected]>:

> On Fri, Jul 22, 2016 at 2:50 PM, <[email protected]> wrote:
> :
> > I use Cygwin (updated to latest packages) with the latest parallel version
> > (20160622).
> > My workflow looks like this:
> >
> >   cat input.txt | parallel --pipe -N64 --blocksize 63K --joblog joblog.txt --retries 3 --progress python myscript.py
> >
> > myscript.py does some CPU-bound processing with some network I/O and takes
> > about 3 seconds per input line. input.txt has about 360k lines.
> >
> > The above command works well for about 30-60 minutes, fully utilizing 16
> > cores. But then, it stops with
> >
> >   "Signal SIGCHLD received, but no signal handler set."
> >
> > on STDERR. I tried to simulate the command with
> >
> >   seq 360000 | parallel --pipe -N64 --blocksize 63K --joblog joblog.txt --retries 3 --progress sleep 3
> >
> > But I could not replicate the error yet.
> >
> > Does anyone have an idea how to debug/resolve this?
>
> First step is to reproduce it.
>
> My gut tells me this is a CygWin thing.
>
> Looking at the code $SIG{CHLD} is only messed with in:
>
>   # When a child dies, wake up from sleep (or select(,,,))
>   $SIG{CHLD} = sub { kill "ALRM", $$ };
>   usleep($ms);
>   # --compress needs $SIG{CHLD} undefined
>   delete $SIG{CHLD};
>   exit_if_disk_full();
>
> On GNU/Linux 'delete $SIG{CHLD};' has the same effect as
> '$SIG{CHLD}="IGNORE";' but maybe CygWin is different? When you find a
> way to reproduce the error, try changing:
>
>   delete $SIG{CHLD};
>
> into:
>
>   $SIG{CHLD}="IGNORE";
>
> And please post how you reproduced the error.
>
>
> /Ole
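
PS: For anyone who wants to try the same change, here is a minimal sketch of applying it to a copy of the script rather than editing the installed one. The helper name patch_sigchld and all file paths are hypothetical, and it assumes the line appears in the script exactly as quoted above (this may differ in other parallel versions). Note that the comment in the quoted code says --compress needs $SIG{CHLD} undefined, so this is probably best treated as a workaround and not as a general fix.

```shell
#!/bin/sh
# Hypothetical helper: write a patched copy of a script in which
#   delete $SIG{CHLD};
# is replaced by
#   $SIG{CHLD}="IGNORE";
# The original file is left untouched.
patch_sigchld() {
    src=$1   # path to the original script
    dst=$2   # path where the patched copy is written
    sed 's/delete \$SIG{CHLD};/$SIG{CHLD}="IGNORE";/' "$src" > "$dst"
}

# Demo on a stand-in file that contains only the quoted snippet.
demo_dir=$(mktemp -d)
cat > "$demo_dir/snippet.pl" <<'EOF'
$SIG{CHLD} = sub { kill "ALRM", $$ };
usleep($ms);
# --compress needs $SIG{CHLD} undefined
delete $SIG{CHLD};
exit_if_disk_full();
EOF

patch_sigchld "$demo_dir/snippet.pl" "$demo_dir/patched.pl"

# Show the lines that mention SIG{CHLD} in the patched copy.
grep -nF 'SIG{CHLD}' "$demo_dir/patched.pl"
```

Against a real installation, the call would look something like `patch_sigchld "$(command -v parallel)" ./parallel-patched`, followed by running the workflow with `perl ./parallel-patched` in place of `parallel`.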
