Bug#1020415: bcron-start terminated 111 status

Tomas Hodek Mon, 12 Dec 2022 00:18:17 -0800

Hi Georges,

The issue we are currently having with crond is that the main process sometimes hangs with high single-core CPU usage. When this happens, jobs are often delayed or not executed at all. We are talking about roughly 1300 user crontabs. Some are empty, some contain up to 10 jobs. However, this happens irregularly. The only solution I have found is to restart crond via the init script. I was able to catch what it is doing with strace:

rt_sigprocmask(SIG_BLOCK, [HUP USR1 USR2 PIPE ALRM CHLD TSTP URG VTALRM PROF WINCH IO], [], 8) = 0
openat(AT_FDCWD, "/run/systemd/userdb/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 5
fstat(5, {st_mode=S_IFDIR|0755, st_size=60, ...}) = 0
getdents64(5, 0x557caf6dd300 /* 3 entries */, 32768) = 96
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 6
connect(6, {sa_family=AF_UNIX, sun_path="/run/systemd/userdb/io.systemd.DynamicUser"}, 45) = 0
epoll_create1(EPOLL_CLOEXEC) = 7
timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC|TFD_NONBLOCK) = 8
epoll_ctl(7, EPOLL_CTL_ADD, 8, {EPOLLIN, {u32=2881333152, u64=93993945638816}}) = 0
epoll_ctl(7, EPOLL_CTL_ADD, 6, {0, {u32=2943150768, u64=93994007456432}}) = 0
getdents64(5, 0x557caf6dd300 /* 0 entries */, 32768) = 0
close(5) = 0
epoll_ctl(7, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLOUT, {u32=2943150768, u64=93994007456432}}) = 0
timerfd_settime(8, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=15696950, tv_nsec=229181000}}, NULL) = 0
epoll_wait(7, [{EPOLLOUT, {u32=2943150768, u64=93994007456432}}], 4, 0) = 1
sendto(6, "{\"method\":\"io.systemd.UserDataba"..., 133, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 133
epoll_ctl(7, EPOLL_CTL_MOD, 6, {EPOLLIN, {u32=2943150768, u64=93994007456432}}) = 0
epoll_wait(7, [{EPOLLIN, {u32=2943150768, u64=93994007456432}}], 4, 0) = 1
recvfrom(6, "{\"error\":\"io.systemd.UserDatabas"..., 131080, MSG_DONTWAIT, NULL, NULL) = 66
epoll_ctl(7, EPOLL_CTL_MOD, 6, {0, {u32=2943150768, u64=93994007456432}}) = 0
epoll_wait(7, [], 4, 0) = 0
epoll_wait(7, [], 4, 0) = 0
epoll_ctl(7, EPOLL_CTL_DEL, 6, NULL) = 0
close(6) = 0
close(7) = 0
close(8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 5
lseek(5, 0, SEEK_CUR) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=217418, ...}) = 0
read(5, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 4096

The last 3 lines are repeated for each cron user (as a matter of fact, it iterates for about 20 users, and then the whole top part of the strace is repeated). Can it be caused by the number of orphan crontabs ("(domainname.cz) ORPHAN (no passwd entry)")? To run user cron jobs there is a wrapper script for PHP, which configures some variables for the PHP interpreter such as open_basedir etc. It is a simple bash script, which actually launches the defined task. But the problem is within the main crond process:


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 437055 root      20   0   67828  63792   2748 R  90.9   0.0   1000:40 cron

On an older hosting server (Debian 7 and even 8) we had the same issue a few years ago, and installing bcron seemed to fix it.
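
The lookup pattern visible in the strace output (a query to the systemd userdb socket, then a fresh open and read of /etc/passwd) is what a name lookup through NSS can look like. Below is a minimal sketch of such a loop, assuming that cron resolves each crontab owner with getpwnam(); the function name count_orphans and the log format are illustrative only and are not taken from cron's source.

```c
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from cron's source: look up each crontab
 * owner by name and count the ones without a passwd entry ("orphans").
 * Each getpwnam() call goes through NSS, so with a large /etc/passwd and
 * many crontabs the total amount of work can grow quickly.
 */
size_t count_orphans(const char *const names[], size_t n)
{
    size_t orphans = 0;

    for (size_t i = 0; i < n; i++) {
        /* getpwnam() returns NULL when no matching entry is found. */
        if (getpwnam(names[i]) == NULL) {
            fprintf(stderr, "(%s) ORPHAN (no passwd entry)\n", names[i]);
            orphans++;
        }
    }
    return orphans;
}
```

Calling count_orphans() with the list of crontab file names would report how many of them have no matching account, which is one way to check whether orphan crontabs are numerous enough to matter here.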


Best regards Tomas



On 09. 12. 22 15:16, Georges Khaznadar wrote:

Dear Thomas,

Tomas Hodek wrote:

I have been using Debian on servers for many years, but this is the first time I
had to file a bug report, so I am quite a newbie at properly debugging a
package.

Please can you remind me why you could not use cron for your crontabs?
As I understand cron's internals a little better than bcron's, I might
be able to fix the issue which prevents you from using cron?

Here is an excerpt of your latest message about this issue:

        Our issue with cron was not yet reported, since I did not find any
        regularity and have no test environment where I was able to
        reproduce this issue. Only thing I know for sure is the fact that
        cron process sometimes eats 1 whole CPU core and then jobs are not
        being executed correctly. Normal CPU usage is mere percents. Restart
        of daemon fixes it.

I suppose that there is some significant difference between your set of
cron jobs and the cron jobs most other Debian users are running, as
nobody else complains about a similar issue.

I wonder whether you have a few commands in some cron job which can have
a long lifetime and require a lot of computing power? Do you know which
child commands were running when you saw the high CPU usage? cron puts
no limit (in duration or CPU and memory usage) on child processes.
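
To illustrate that last point: since cron itself sets no limit, a wrapper around a job could impose one. Here is a minimal sketch, assuming a POSIX system; the function name run_with_cpu_limit is hypothetical and nothing like it exists in cron or bcron.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: run argv[0] with a cap on consumed CPU seconds.
 * Returns the child's exit status, or -1 if it could not be started or
 * was terminated by a signal (for example once the CPU limit is reached).
 */
int run_with_cpu_limit(char *const argv[], rlim_t cpu_seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the soft and hard CPU limits, then exec the job. */
        struct rlimit lim = { .rlim_cur = cpu_seconds, .rlim_max = cpu_seconds };
        if (setrlimit(RLIMIT_CPU, &lim) != 0)
            _exit(127);
        execv(argv[0], argv);
        _exit(127); /* only reached if execv() failed */
    }

    /* Parent: wait for the job and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

RLIMIT_CPU counts CPU time, not wall-clock time, so a job that merely sleeps for a long time would not be stopped by this.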

But if the RFH does not help with finding the issue, this package is no
longer usable for users, so marking it obsolete seems to be the only reasonable
choice. But it is not up to me, and I think it would be better for someone
who has a better understanding of how systemd and C++ (I think bcron is
written in it) programming work to look at this issue.

Bcron is written in pure C, no C++; the problem is that its author,
Bruce Guenter, wrote bcron's source files without any
internal documentation. Here is an example, taken from line 27 of the
file bcron-spool.c:

---------------8<---------------------------
static void respond(const char* msg)
{
     ...
}
---------------8<---------------------------

Today's guidelines are to put a documentation comment before such function
definitions. From my reading of the code of `respond`, I would bet on a
header like this:

---------------8<---------------------------
/**
  * @function respond:
  *   outputs a message, and exits
  *
  * @param msg is a string whose first character decides the exit way
  *   and the remainder is the actual message to display. When the first
  *   character is 'K', it calls exit(0); when it is 'Z', it dies with
  *   code 111 logged to syslog; otherwise, it dies with code 100.
  **/
static void respond(const char* msg)
{
     ...
}
---------------8<---------------------------
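
The behaviour described in that proposed header can be sketched as follows. This is only an illustration of the description above, not bcron's actual code: the names exit_code_for and respond_sketch are hypothetical, and the sketch writes to stdout/stderr where the real function reportedly logs to syslog.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: map the first character of msg to an exit code,
 * as described in the proposed header: 'K' -> 0, 'Z' -> 111, else 100.
 */
int exit_code_for(const char *msg)
{
    switch (msg[0]) {
    case 'K': return 0;
    case 'Z': return 111;
    default:  return 100;
    }
}

/*
 * Hypothetical stand-in for respond(): print the message without its
 * first character, then exit with the code chosen by that character.
 */
void respond_sketch(const char *msg)
{
    int code = exit_code_for(msg);

    if (msg[0] != '\0') {
        fputs(msg + 1, code == 0 ? stdout : stderr);
        fputc('\n', code == 0 ? stdout : stderr);
    }
    exit(code);
}
```

Under this reading, the "terminated 111 status" in the bug title would correspond to a message starting with 'Z'.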

There are about 40 undocumented functions in bcron's source code. I do
not want to spend much time understanding all the implicit ideas wired into
the C code and verifying that they fit the program's goal. If
somebody wants to revive bcron, this is still the right way to go
forward.

Indeed, the situation is even worse: most of bcron's code is built upon
Bruce Guenter's own libraries (Debian packages: bglibs and libbg-dev),
whose internals are documented nowhere, so if one wants to audit
how the previous function exits, one has to thoroughly document the
functions and macros die_oom, die1sys, die1, die3sys, whose source code
is under /usr/include/bglibs/msg.h and in the C source (which you can
get with `apt-get source bglibs`).

By contrast, cron is built directly upon libc, whose primitives are
documented by up-to-date manpages.

Best regards,                   Georges.
