On Fri, May 4, 2018 at 8:42 AM, John <[email protected]> wrote:
> Dear all
>
> How can I catch if the program I have called with parallel gets killed by
> the kernel due to memory space.
> I know the option --memfree. I am not sure if this satisfies all my needs.
> For example what happens if one of my jobs was always put back into the
> queue and is the last one now. And even though he has now all the memory
> available it still gets killed.
> I would like to have an option that returns me all the jobs that were not
> able to be finished. Is this possible?
>
> Cheers
> John
>
You can use *parallel* *--joblog ~/my.log* to record several pieces of information about each job. One of those is "Exitval", which tells you not only that a job failed, but with what exit code; the "Signal" column next to it records a signal number when parallel saw the job die from a signal. For example, instead of having to check *dmesg* for an "Out of memory: Kill process ..." message, you can treat *137* as a strong hint that Linux's OOM killer sent your process a SIGKILL (128 + 9). It is only a hint, since anything else that sends SIGKILL produces the same code, so *dmesg* is still the place to confirm it.

I usually run an *ad hoc* script to pick up the "stragglers" after a larger run, by parsing that file for any non-zero Exitval and re-invoking the full command line associated with it. Of course, if the failure was due to something *deterministic*, you'll just get non-zeros again and again until you fix the problem with the data/args of those specific invocations.
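To make the "stragglers" idea concrete, here is a minimal sketch in Python of the parsing step. It assumes the joblog is tab-separated with a header line containing "Seq", "Exitval", "Signal" and "Command", with the command as the last column. The function name *failed_jobs* and the sample log (including the *./hog* command) are made up for illustration; this is not the script I actually use.

```python
# Hypothetical sample joblog: a header line, one clean job, one job that ended with 137.
SAMPLE = (
    "Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n"
    "1\t:\t1525437720.000\t1.000\t0\t0\t0\t0\techo ok\n"
    "2\t:\t1525437721.000\t2.000\t0\t0\t137\t0\t./hog input2\n"
)


def failed_jobs(joblog_text):
    """Return (seq, exitval, signal, command) for each job that did not finish cleanly."""
    lines = joblog_text.splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    failed = []
    for line in lines[1:]:
        if not line.strip():
            continue
        # Split on at most len(header) - 1 tabs so a command that itself
        # contains tabs stays in one piece (assumes Command is the last column).
        row = dict(zip(header, line.split("\t", len(header) - 1)))
        exitval = int(row["Exitval"])
        signal = int(row["Signal"])
        # A job counts as failed if it exited non-zero or was ended by a signal.
        if exitval != 0 or signal != 0:
            failed.append((int(row["Seq"]), exitval, signal, row["Command"]))
    return failed


if __name__ == "__main__":
    for seq, exitval, signal, command in failed_jobs(SAMPLE):
        print(f"job {seq}: exit {exitval}, signal {signal}: {command}")
```

To use it on a real run, read the file you passed to *--joblog* and hand its contents to *failed_jobs*; the returned command strings are what you would re-invoke. Checking both columns is deliberate: depending on how the command was started, a killed job may show up as Exitval 137 or as a non-zero Signal.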
