Package: rsyslog
Version: 8.2102.0-2
Severity: important

Dear Maintainer,
since upgrading from buster to bullseye, rsyslog on some of my machines
randomly exits.

My setup consists of:
- a centralized log server, receiving logs with imrelp
- two relays, that forward logs from other hosts using imrelp+omrelp
- multiple hosts that send their logs (to either one of the relays or
  directly to the central server) with omrelp

I tried running rsyslog with debugging messages under gdb. Right before
it exits, the output is:

1778.966318176:imrelp.c       : imrelp.c: librelp: done epoll_wait, nEvents:1
1778.966486859:imrelp.c       : imrelp.c: librelp: generic error: ecode 10014, emsg 'TLS record reception failed [gnutls error -54: Error in the pull function.]'
1778.966543315:imrelp.c       : errmsg.c: Called LogMsg, msg: imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended
1778.966585565:imrelp.c       : operatingstate.c: osf: MSG imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended: signaling new internal message via SIGTTOU: 'imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2353 ]'
rsyslogd: imrelp[10514]: error 'TLS record reception failed [gnutls error -54: Error in the pull function.]', object  'lstn 10514: conn to clt 2a00:c500:561:201:7910:b4d8:2065:6c6b/2a00:c500:561:201:7910:b4d8:2065:6c6b' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2353 ]
Cannot find user-level thread for LWP 25226: generic error
(gdb) [Thread 0x7ffff536c700 (LWP 25237) exited]
[Thread 0x7ffff5b6d700 (LWP 25236) exited]
[Thread 0x7ffff676f700 (LWP 25234) exited]
[Thread 0x7ffff6b70700 (LWP 25233) exited]
[Thread 0x7ffff6f71700 (LWP 25232) exited]
[Thread 0x7ffff7a6d240 (LWP 25226) exited]
[Inferior 1 (process 25226) exited with code 01]

I have observed the issue only on the central log server and both of the
relays, not on the hosts -- i.e. only those systems that use imrelp.

I can also reproduce it semi-reliably by SIGKILLing a RELP client (i.e.
an rsyslog instance using omrelp), so that it doesn't close the
connection cleanly. This almost (but not quite) always causes its
corresponding RELP server to exit in the above manner.

So if I SIGKILL rsyslogd on one of the hosts, it will cause its relay to
die, which in turn causes the central server to die, which in turn makes
me very unhappy. Since logging is critical for my infrastructure, I
would very much appreciate it if this was fixed promptly.
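
To illustrate what the server side sees in the SIGKILL reproduction
above, here is a minimal Python sketch. It is not rsyslog or librelp
code: the function name observe_killed_client and the tiny child
process standing in for a RELP client are made up for illustration,
and it uses plain TCP on Linux rather than TLS. It only shows that a
SIGKILLed client gets no chance to say goodbye at the application
layer, so the server just sees the connection end (or get reset).

```python
import os
import signal
import socket
import subprocess
import sys


def observe_killed_client():
    """SIGKILL a connected TCP client and report what the server side reads."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.settimeout(10)
    srv.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    # Hypothetical stand-in for a RELP client: connect, send, then idle.
    child_code = (
        "import socket, time\n"
        f"s = socket.create_connection(('127.0.0.1', {port}))\n"
        "s.sendall(b'hello')\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", child_code])
    try:
        conn, _ = srv.accept()
        conn.settimeout(10)

        # Read the 5 bytes the client sent before it is killed.
        first = b""
        while len(first) < 5:
            chunk = conn.recv(5 - len(first))
            if not chunk:
                break
            first += chunk

        # Kill the client; it cannot run any shutdown code of its own.
        os.kill(child.pid, signal.SIGKILL)
        child.wait()

        # The kernel tears the socket down on the client's behalf.
        try:
            after = conn.recv(1024)  # empty bytes means end of stream
        except ConnectionResetError:
            after = None  # the connection was reset instead
        conn.close()
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()
        srv.close()
    return first, after


if __name__ == "__main__":
    print(observe_killed_client())
```

With TLS on top, the same abrupt end would presumably reach the
receiving library as a failed record read rather than a clean
shutdown, which would be consistent with the gnutls error in the log
above, but I have not verified that this is the exact path taken.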

rsyslog 8.2110.0-1 from testing still exhibits the issue.

Cheers,
-- 
Anton Khirnov

rsyslog.conf on one of the relays:
--------------------------------------------------------------------------------
global(workDirectory="/var/spool/rsyslog" preserveFQDN="on")

module(load="imuxsock")
module(load="imklog")
module(load="imudp")
module(load="imrelp")
module(load="omrelp")
module(load="builtin:omfile" template="RSYSLOG_FileFormat"
       fileOwner="root" fileGroup="adm")


template(name="filename_per_program" type="string"
         string="/var/cache/log/%programname%.log")

template(name="fileformat_standard" type="list") {
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="pri-text")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    property(name="msg" spifno1stsp="on" )
    property(name="msg" droplastlf="on" )
    constant(value="\n")
}

template(name="fileformat_sshban" type="list") {
    property(name="timestamp" dateFormat="rfc3339")
    constant(value=" ")
    property(name="msg" spifno1stsp="on" )
    property(name="msg" droplastlf="on" )
    constant(value="\n")
}

input(type="imudp" device="bond0" port="514")
input(type="imudp" device="vlan9" port="514")
input(type="imrelp" port="10514"
      tls="on" tls.authMode="name" tls.cacert="/etc/rsyslog/ca.crt"
      tls.mycert="/etc/rsyslog/quelana.crt" tls.myprivkey="/etc/rsyslog/quelana.key"
      tls.permittedPeer=[ ... ])

action(type="omrelp" target="log.khirnov.net" port="10514"
       tls="on" tls.authMode="name" tls.cacert="/etc/rsyslog/ca.crt"
       tls.mycert="/etc/rsyslog/quelana.crt" tls.myprivkey="/etc/rsyslog/quelana.key"
       tls.permittedPeer="log.khirnov.net"
       queue.type="LinkedList" queue.filename="forwardbuf"
       queue.saveonshutdown="on" action.resumeRetryCount="-1")

# special processing for some local logs (not forwarded)
if ($fromhost contains "quelana") then {
    # local cache for logs
    # nftables logs go into nftables.log, everything else into per-program log
    if ($syslogfacility-text == "kern" and $msg contains "nftfw:") then {
        action(type="omfile" template="fileformat_standard"
               File="/var/cache/log/nftables.log")
    } else {
        action(type="omfile" template="fileformat_standard"
               dynaFile="filename_per_program")
    }

    # copy of sshd logs sent into FIFO for analysis/banning scanners
    if ($programname == "sshd") then {
        action(type="ompipe" pipe="/run/rsyslog_sshd.fifo"
               template="fileformat_sshban")
    }
}
--------------------------------------------------------------------------------
