Hi, over the last couple of months I have seen at least two non-C (rust and python) implementations of syslog() equivalent functionality causing applications written in those languages to become brittle.
The reason, I hear you ask? In C, the return type of syslog() is void, so it can't return any error. Our C implementation makes a reasonable attempt at re-trying in the face of OS-errors: /* * If the send() failed, there are two likely scenarios: * 1) syslogd was restarted * 2) /dev/log is out of socket buffer space * We attempt to reconnect to /dev/log to take care of * case #1 and keep send()ing data to cover case #2 * to give syslogd a chance to empty its socket buffer. */ for (tries = 0; tries < MAXTRIES; tries++) { if (send(data->log_file, tbuf, cnt, 0) != -1) break; if (errno != ENOBUFS) { disconnectlog_r(data); connectlog_r(data); } else (void)usleep(1); } and if the number of retries is exceeded, our C code tries to send the syslog message to the console instead. However, the rust and python implementations have the possibility of returning errors or raising exceptions, but the applications using those syslog-like functions are evidently unprepared to deal with any errors from that functionality, causing those applications to exit if an error occurred during syslog'ing. This has caused me to file https://github.com/Geal/rust-syslog/issues/79 which has not seen any activity or comments since I submitted it. This issue caused the net/routinator program (an RPKI validator written in rust) to exit if I had turned up the logging level "too high" (to trigger the ENOBUFS condition) when running it on NetBSD. I worked around this issue by dialing down the syslog level in my routinator configuration. The python issue I'm having is that similarly, the sysutils/py-borgmatic package is not prepared to handle errors from syslog'ing, causing it to exit with this error message: --- Logging error --- Traceback (most recent call last): File "/usr/pkg/lib/python3.10/logging/handlers.py", line 987, in emit self.socket.send(msg) OSError: [Errno 55] No buffer space available During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/pkg/lib/python3.10/logging/handlers.py", line 991, in emit self.socket.send(msg) OSError: [Errno 55] No buffer space available Call stack: File "/usr/pkg/bin/borgmatic", line 33, in <module> sys.exit(load_entry_point('borgmatic==1.8.5', 'console_scripts', 'borgmatic')()) File "/usr/pkg/lib/python3.10/site-packages/borgmatic/commands/borgmatic.py", line 894, in main logger.handle(log) This is operationally troublesome, to say the least, for potentially long-running programs. Since this points to the syslog code inside python itself (in this case python 3.10), I tried replicating parts of the C semantics and eliminate raising exceptions in the syslog code by applying the attached local patch. I've not yet tried to submit this one upstream, and I'm still testing it locally. IMHO, having long-running programs become brittle just because logging failed is just ... silly, and it appears that almost every programmer out there expects the C semantics that syslog()'ing never fails. Secondly: is it something particular we are doing on the NetBSD end of things which contributes to this problem? Doesn't other OSes return ENOBUFS if syslogd isn't able to keep up by consuming the messages at the receiving end? Other comments? Regards, - HÃ¥vard
$NetBSD$ Introduce code to re-try sending of log message up to 10 times, and drop messages if the retry count is exceeded, instead of raising an error. Calling code is seldom prepared to handle exceptions from syslog-like functions, and become needlessly brittle if syslog-ing can raise an exception. --- Lib/logging/handlers.py.orig 2024-03-05 08:27:17.479574742 +0000 +++ Lib/logging/handlers.py @@ -25,6 +25,7 @@ To use, simply 'import logging.handlers' import io, logging, socket, os, pickle, struct, time, re from stat import ST_DEV, ST_INO, ST_MTIME +import errno import queue import threading import copy @@ -983,16 +984,39 @@ class SysLogHandler(logging.Handler): msg = msg.encode('utf-8') msg = prio + msg if self.unixsocket: - try: - self.socket.send(msg) - except OSError: - self.socket.close() - self._connect_unixsocket(self.address) - self.socket.send(msg) + tries = 10 + while tries > 0: + try: + self.socket.send(msg) + except OSError as err: + tries -= 1 + if err.errno == errno.ENOBUFS: + time.sleep(0.001) + else: + self.socket.close() + self._connect_unixsocket(self.address) + # if we exhaust retries, just drop message + # possibly, we could count the number of dropped messages elif self.socktype == socket.SOCK_DGRAM: - self.socket.sendto(msg, self.address) + tries = 10 + while tries > 0: + tries -= 1 + try: + self.socket.sendto(msg, self.address) + except OSError: + time.sleep(0.001) + # if we exhaust retries, just drop message + # possibly, we could count the number of dropped messages else: - self.socket.sendall(msg) + tries = 10 + while tries > 0: + tries -= 1 + try: + self.socket.sendall(msg) + except OSError: + time.sleep(0.001) + # if we exhaust retries, just drop message + # possibly, we could count the number of dropped messages except Exception: self.handleError(record)