May 26, 2020 4:36 AM, "Kevin Zheng" <kevinz5...@gmail.com> wrote:
> Hi folks, Hellow, > [...] > > Chasing changes in syslog output is a part of maintaining software like > SSHGuard. Unfortunately, my parser (which recently learned how to > pledge!) is a bit dull and would require some re-education to remember > SMTP sessions and their associated IP addresses. So, my questions are: > > Why did OpenSMTPD stop reporting IP addresses on every line? > > Is there any chance that OpenSMTPD can put IP addresses back on every line? Sorry, I'll write a detailed answer for future reference: A long time ago, people started relying on logs to write all kinds of scripts and processors, which led to requests to add some info here and another there until the logs became so dense that they scared users. People writing scripts were happy, humans reading logs were overwhelmed with redundant informations. Around this time, Theo asked me to make logs more readable, ideally so they'd fit terminal lines for most logs. I did some compaction (i.e.: source -> src) but this didn't help much because, as you mention, scripts used parsers which didn't keep state and logs grew dense to keep state on their behalf. What did not help further is that slight changes in format broke all existing scripts. So it became obvious we were doing something wrong: We WANT to make it easy for tools to extend OpenSMTPD, but this can't come at the cost of freezing our maillog format or at the cost of users having harder time going through maillog. The consumers, humans and scripts, have different expectations so they need to consume different logs. We now provide a reporting API which is basically a stream of events that can be consumed by tools. It is a line-based format which is not meant to be read by humans but meant to be easily parsed by tools and that provides all of the information necessary to replicate the session states. Using this stream, one can write a tiny filter which aggregates info and outputs logs tailored for a specific third-party application with a guarantee that it won't break when we make a subtle change to the maillog format. If I were working on SSHGuard for example, I'd write an sshguard-exporter script that reads the stream and that outputs to syslog a format SSHguard recognizes. This way, an smtpd user would simply: filter sshguard proc-exec "sshguard-exporter" listen on all filter sshguard action "foobar" relay filter sshguard SSHguard itself would never need to be altered to follow changes in logs. We decided to turn back maillog format into a human-oriented format, one that guarantees you will find your info with two grep, but that does not guarantee it won't have subtle changes at each release if they make logs more readable. The maillog format logs events and their associated data, removing redundancy and not keeping context across logs. This allows me, as a human to easily grep for info and not be overflowed with a lot of unrelated log lines that happen to carry the same information: $ grep 'smtp connected' /var/log/maillog|tail -3 May 26 06:22:33 in smtpd[75500]: 0192b8a91f05254a smtp connected address=199.185.178.25 host=mail.openbsd.org May 26 06:24:49 in smtpd[75500]: 0192b8ac864dd3e1 smtp connected address=199.185.178.25 host=mail.openbsd.org May 26 06:24:56 in smtpd[75500]: 0192b8b1f1c2f60c smtp connected address=199.185.178.25 host=mail.openbsd.org $ grep 'address=199.185.178.25' /var/log/maillog|tail -3 May 26 06:13:38 in smtpd[75500]: 0192b895dd7ab75f smtp connected address=199.185.178.25 host=mail.openbsd.org May 26 06:22:12 in smtpd[75500]: 0192b8a43a226ada smtp connected address=199.185.178.25 host=mail.openbsd.org May 26 06:22:33 in smtpd[75500]: 0192b8a91f05254a smtp connected address=199.185.178.25 host=mail.openbsd.org And this also allows me to get a very compact format for an entire session: $ grep '0192b8b1f1c2f60c' /var/log/maillog May 26 06:24:56 in smtpd[75500]: 0192b8b1f1c2f60c smtp connected address=199.185.178.25 host=mail.openbsd.org May 26 06:24:57 in smtpd[75500]: 0192b8b1f1c2f60c smtp tls ciphers=TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256 May 26 06:25:04 in smtpd[75500]: 0192b8b1f1c2f60c smtp message msgid=4baeb9ac size=3110 nrcpt=1 proto=ESMTP May 26 06:25:04 in smtpd[75500]: 0192b8b1f1c2f60c smtp envelope evpid=4baeb9ac8c6799b0 from=<x...@openbsd.org> to=<x...@poolp.org> May 26 06:25:14 in smtpd[75500]: 0192b8b1f1c2f60c smtp disconnected reason=quit With this split, human log vs reporting api for tools, tomorrow we can decide in the sample above to rename "message" into "msg" knowing that human readers will not be affected and that scripts will not break as long as they use that reporting API. Hope it clarifies a bit, Gilles