Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 20 Jan 2023 21:30:22 +0100 Sven Joachim wrote: > Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash" > to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as > recommended there, and now "systemctl restart postfix.service" > completes in two seconds! A similar major improvement for me. root@white:~# time systemctl restart postfix.service real0m18.389s user0m0.081s sys 0m0.059s root@white:~# vs. root@white:~# time systemctl restart postfix.service real4m10.833s user0m0.081s sys 0m0.062s root@white:~# Thank you, Sven, for pursuing this. The patch has been committed, and should be in the pipeline. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
On 21/01/2023 01:55, Charles Curley wrote: root@white:~# ps aux | grep -i openssl root 4586 5.8 0.9 8256 2064 pts/3S+ 11:48 0:00 grep --colour=auto -i openssl root 4587 150 2.1 4720 ?R11:48 0:00 /usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem ...> I have no idea what that's about. Maybe someone with SSL experience can chime in here? Delays related to cryptography may be caused by lack of entropy. You may try to check cat /proc/sys/kernel/random/entropy_avail and ls -l /proc//fs when such delay happens. Some network activity may increase entropy enough to unblock reading random bytes. If my guess is correct then the issue may be alleviated by installing the haveged package. VMs are affected more than hosts.
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 20 Jan 2023 21:30:22 +0100 Sven Joachim wrote: > Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash" > to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as > recommended there, and now "systemctl restart postfix.service" > completes in two seconds! > > Will follow up on the bug ASAP, if nobody beats me to it. Fix and everything. Nice! Good work, thank you. I'll apply the fix later and see what it does on my sclerotic anemic 686 box. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
On 2023-01-20 21:11 +0100, Sven Joachim wrote: > On 2023-01-20 20:45 +0100, Sven Joachim wrote: > >> My hunch is that postfix recomputes all the hashes in >> /var/spool/postfix/etc/ssl/certs, rather than copying the files from the >> host system into the chroot which would be a lot faster. > > For those who want to dig deeper, /usr/lib/postfix/configure-instance.sh > is the (Debian-specific) script which sets up the chroot. Surely it > should not recompute all the hashes every time postfix is restarted, but > apparently it does so at least on Charles' and my system. Wow. Looking into the BTS, I found bug #895089[1], changed "c_rehash" to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as recommended there, and now "systemctl restart postfix.service" completes in two seconds! Will follow up on the bug ASAP, if nobody beats me to it. Cheers, Sven 1. https://bugs.debian.org/895089
Re: Postfix fails after upgrade from bullseye to bookworm
Am Fri, Jan 20, 2023 at 08:28:10PM +0100 schrieb Sven Joachim: > On 2023-01-20 13:39 -0500, Greg Wooledge wrote: > > > On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote: Hello Community, > >> It seems that postfix's startup time has greatly regressed, on my laptop > >> there are very long delays both at boot: > >> > >> , > >> | $ systemd-analyze blame | head -n1 > >> | 33.340s postfix@-.service > >> ` > > > > A delay that's a multiple of 30 seconds is very often a DNS lookup > > failure. I would imagine your postfix configuration is trying to > > perform a DNS lookup on some hostname or other, and that this is > > happening before you're actually "online", for whatever definition of > > "online" is relevant here. > > That should be NetworkManager-wait-online.service. In the logs I see > that systemd starts the postfix service directly after reaching > network-online.target, as it is supposed to do. The mystery is why it > takes another 30+ seconds before any messages from postfix itself appear > in the logs. > > > That's a total guess, though. Find your logfiles and read them to see > > what's actually going on. > > Here is what I see in the journal when I restart postfix.service: > > , > | Jan 20 20:16:06 myhost postfix/postfix-script[1470]: stopping the Postfix > mail system > | Jan 20 20:16:06 myhost postfix/master[1307]: terminating on signal 15 > | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Deactivated > successfully. > | Jan 20 20:16:06 myhost systemd[1]: Stopped Postfix Mail Transport Agent > (instance -). > | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Consumed 36.372s CPU > time. > | Jan 20 20:16:06 myhost systemd[1]: Starting Postfix Mail Transport Agent > (instance -)... > | Jan 20 20:16:41 myhost postfix[1800]: Postfix is using backwards-compatible > default settings > | Jan 20 20:16:41 myhost postfix[1800]: See > http://www.postfix.org/COMPATIBILITY_README.html for details > | Jan 20 20:16:41 myhost postfix[1800]: To disable backwards compatibility > use "postconf compatibility_level=3.6" and "pos\ > | tfix reload" > | Jan 20 20:16:42 myhost postfix/postfix-script[2026]: starting the Postfix > mail system > | Jan 20 20:16:42 myhost postfix/master[2028]: daemon started -- version > 3.7.3, configuration /etc/postfix > | Jan 20 20:16:42 myhost systemd[1]: Started Postfix Mail Transport Agent > (instance -). > | Jan 20 20:16:42 myhost systemd[1]: Starting Postfix Mail Transport Agent... > | Jan 20 20:16:42 myhost systemd[1]: Finished Postfix Mail Transport Agent. > ` I have observed something comparable, but with fetchmail for email and unbound as a dns resover. Both are started by systemd at almost the same time. But it takes some time for unbound to fetch keys and so on. Before unbound is really ready fetchmail has started its first poll, but without success because the mail servers ip could not be resolved. I am only a hobbyist. But may be something similar is happening in the discussed issue. May be it is worth to have a look how the dns lookup is set up and how it is starting. Please just ignore be if this proposal makes no sense. At work I am no admin :-). > I have left nothing out, so WTH is postfix waiting for in these 35 > seconds? > > Cheers, >Sven BTW: I have resolved my issue by starting fetchmail via systemd by a wrapper script which probes the dns resolution of the mail server before starting fetchmail. Kind regards, Christoph -- Ist die Katze gesund schmeckt sie dem Hund.
Re: Postfix fails after upgrade from bullseye to bookworm
On 2023-01-20 20:45 +0100, Sven Joachim wrote: > On 2023-01-20 11:55 -0700, Charles Curley wrote: > >> On Fri, 20 Jan 2023 19:17:37 +0100 >> Sven Joachim wrote: >> >>> Clearly something fishy is going on here. >> >> I concur. What I saw with htop was a slew of calls to SSL. Here's >> a sample of what it was doing. It is a processor hog. >> >> root@white:~# ps aux | grep -i openssl >> root 4586 5.8 0.9 8256 2064 pts/3S+ 11:48 0:00 grep >> --colour=auto -i openssl >> root 4587 150 2.1 4720 ? R 11:48 0:00 /usr/bin/openssl x509 >> -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem > > Indeed I see many calls to openssl in top, apparently they are children > of a single c_rehash process. CPU load is low here, though (2-3 %). > >> I have no idea what that's about. Maybe someone with SSL experience can >> chime in here? > > My hunch is that postfix recomputes all the hashes in > /var/spool/postfix/etc/ssl/certs, rather than copying the files from the > host system into the chroot which would be a lot faster. For those who want to dig deeper, /usr/lib/postfix/configure-instance.sh is the (Debian-specific) script which sets up the chroot. Surely it should not recompute all the hashes every time postfix is restarted, but apparently it does so at least on Charles' and my system. Cheers, Sven
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, Jan 13, 2023 at 11:37 AM Charles Curley wrote: > > I upgraded an i386 machine from bullseye to bookworm. Postfix now > refuses to run. > > root@white:/var/spool# systemctl start postfix@-.service > Job for postfix@-.service failed because a timeout was exceeded. > See "systemctl status postfix@-.service" and "journalctl -xeu > postfix@-.service" for details. > root@white:/var/spool# systemctl status postfix@-.service > × postfix@-.service - Postfix Mail Transport Agent (instance -) > Loaded: loaded (/lib/systemd/system/postfix@.service; enabled-runtime; > preset: enabled) > Active: failed (Result: timeout) since Fri 2023-01-13 07:55:51 MST; 17s > ago >Docs: man:postfix(1) > Process: 5170 ExecStartPre=/usr/lib/postfix/configure-instance.sh - > (code=killed, signal=TERM) > CPU: 1min 29.456s > > Jan 13 07:54:21 white systemd[1]: Starting Postfix Mail Transport Agent > (instance -)... > Jan 13 07:55:51 white systemd[1]: postfix@-.service: start-pre operation > timed out. Terminating. > Jan 13 07:55:51 white systemd[1]: postfix@-.service: Control process exited, > code=killed, status=15/TERM > Jan 13 07:55:51 white systemd[1]: postfix@-.service: Failed with result > 'timeout'. > Jan 13 07:55:51 white systemd[1]: Failed to start Postfix Mail Transport > Agent (instance -). > Jan 13 07:55:51 white systemd[1]: postfix@-.service: Consumed 1min 29.456s > CPU time. > root@white:/var/spool# postconf -n > alias_database = hash:/etc/aliases > alias_maps = hash:/etc/aliases > append_dot_mydomain = no > biff = no > compatibility_level = 3.6 > inet_interfaces = all > inet_protocols = ipv4 > mailbox_size_limit = 0 > mydestination = white.localdomain, white.localdomain, white, > localhost.localdomain, localhost > myhostname = white.localdomain > mynetworks = 127.0.0.0/8 [:::127.0.0.0]/104 [::1]/128 > myorigin = /etc/mailname > readme_directory = no > recipient_delimiter = + > relayhost = smtp.localdomain > smtp_tls_CApath = /etc/ssl/certs > smtp_tls_security_level = may > smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache > smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU) > smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated > defer_unauth_destination > smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem > smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key > smtpd_tls_security_level = may > root@white:/var/spool# You should provide the output of journalctl. That usually has the details that are useful in diagnosing a problem. > Does anybody read signatures any more? > > https://charlescurley.com > https://charlescurley.com/blog/ I try to skip them. There's no point in wasting time on them. Jeff
Re: Postfix fails after upgrade from bullseye to bookworm
On 2023-01-20 11:55 -0700, Charles Curley wrote: > On Fri, 20 Jan 2023 19:17:37 +0100 > Sven Joachim wrote: > >> Clearly something fishy is going on here. > > I concur. What I saw with htop was a slew of calls to SSL. Here's > a sample of what it was doing. It is a processor hog. > > root@white:~# ps aux | grep -i openssl > root 4586 5.8 0.9 8256 2064 pts/3S+ 11:48 0:00 grep > --colour=auto -i openssl > root 4587 150 2.1 4720 ? R 11:48 0:00 /usr/bin/openssl x509 > -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem Indeed I see many calls to openssl in top, apparently they are children of a single c_rehash process. CPU load is low here, though (2-3 %). > I have no idea what that's about. Maybe someone with SSL experience can > chime in here? My hunch is that postfix recomputes all the hashes in /var/spool/postfix/etc/ssl/certs, rather than copying the files from the host system into the chroot which would be a lot faster. It is probably time for me to revisit my postfix configuration. Cheers, Sven
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 20 Jan 2023 14:28:22 -0500 Greg Wooledge wrote: > More multiples of 30 seconds. I'm still thinking "DNS issue". In this case, laziness. The default timeout is 60 seconds. I added 30 to that. Then doubled it. Etc. That doesn't mean you are wrong. I'd like to know what that ssl command is up to. The problem is not for lack of network. I executed all the commands I showed in my recent emails via ssh over Ethernet, and that machine has full access to the Internet. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, Jan 20, 2023 at 11:55:35AM -0700, Charles Curley wrote: > My previous timeout vale was 180 seconds, which wasn't enough. So my > ancient anemic box needed between 180 and 360 seconds to start postfix. > (But see below.) More multiples of 30 seconds. I'm still thinking "DNS issue".
Re: Postfix fails after upgrade from bullseye to bookworm
On 2023-01-20 13:39 -0500, Greg Wooledge wrote: > On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote: >> It seems that postfix's startup time has greatly regressed, on my laptop >> there are very long delays both at boot: >> >> , >> | $ systemd-analyze blame | head -n1 >> | 33.340s postfix@-.service >> ` > > A delay that's a multiple of 30 seconds is very often a DNS lookup > failure. I would imagine your postfix configuration is trying to > perform a DNS lookup on some hostname or other, and that this is > happening before you're actually "online", for whatever definition of > "online" is relevant here. That should be NetworkManager-wait-online.service. In the logs I see that systemd starts the postfix service directly after reaching network-online.target, as it is supposed to do. The mystery is why it takes another 30+ seconds before any messages from postfix itself appear in the logs. > That's a total guess, though. Find your logfiles and read them to see > what's actually going on. Here is what I see in the journal when I restart postfix.service: , | Jan 20 20:16:06 myhost postfix/postfix-script[1470]: stopping the Postfix mail system | Jan 20 20:16:06 myhost postfix/master[1307]: terminating on signal 15 | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Deactivated successfully. | Jan 20 20:16:06 myhost systemd[1]: Stopped Postfix Mail Transport Agent (instance -). | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Consumed 36.372s CPU time. | Jan 20 20:16:06 myhost systemd[1]: Starting Postfix Mail Transport Agent (instance -)... | Jan 20 20:16:41 myhost postfix[1800]: Postfix is using backwards-compatible default settings | Jan 20 20:16:41 myhost postfix[1800]: See http://www.postfix.org/COMPATIBILITY_README.html for details | Jan 20 20:16:41 myhost postfix[1800]: To disable backwards compatibility use "postconf compatibility_level=3.6" and "pos\ | tfix reload" | Jan 20 20:16:42 myhost postfix/postfix-script[2026]: starting the Postfix mail system | Jan 20 20:16:42 myhost postfix/master[2028]: daemon started -- version 3.7.3, configuration /etc/postfix | Jan 20 20:16:42 myhost systemd[1]: Started Postfix Mail Transport Agent (instance -). | Jan 20 20:16:42 myhost systemd[1]: Starting Postfix Mail Transport Agent... | Jan 20 20:16:42 myhost systemd[1]: Finished Postfix Mail Transport Agent. ` I have left nothing out, so WTH is postfix waiting for in these 35 seconds? Cheers, Sven
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 20 Jan 2023 19:17:37 +0100 Sven Joachim wrote: > Clearly something fishy is going on here. I concur. What I saw with htop was a slew of calls to SSL. Here's a sample of what it was doing. It is a processor hog. root@white:~# ps aux | grep -i openssl root 4586 5.8 0.9 8256 2064 pts/3S+ 11:48 0:00 grep --colour=auto -i openssl root 4587 150 2.1 4720 ?R11:48 0:00 /usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem root@white:~# I have no idea what that's about. Maybe someone with SSL experience can chime in here? My previous timeout vale was 180 seconds, which wasn't enough. So my ancient anemic box needed between 180 and 360 seconds to start postfix. (But see below.) To repeat your commands: root@white:~# systemd-analyze blame | grep postfix 4min 24.559s postfix@-.service 35ms postfix.service root@white:~# time systemctl restart postfix.service real4m10.833s user0m0.081s sys 0m0.062s root@white:~# Meanwhile simply running "postfix start" took much less time. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote: > It seems that postfix's startup time has greatly regressed, on my laptop > there are very long delays both at boot: > > , > | $ systemd-analyze blame | head -n1 > | 33.340s postfix@-.service > ` A delay that's a multiple of 30 seconds is very often a DNS lookup failure. I would imagine your postfix configuration is trying to perform a DNS lookup on some hostname or other, and that this is happening before you're actually "online", for whatever definition of "online" is relevant here. That's a total guess, though. Find your logfiles and read them to see what's actually going on.
Re: Postfix fails after upgrade from bullseye to bookworm
On 2023-01-20 09:34 -0700, Charles Curley wrote: > On Fri, 13 Jan 2023 11:52:29 -0700 > Charles Curley wrote: > >> That suggests there's something wrong with >> the way systemd is starting postfix. I will look into that later >> today. > > Not quite "later today", but: > > A bit of thinking about it, and I realized that the computer in > question is an ancient 686 with very limited RAM, physical and swap. So > I experimented with watching the startup using htop. That got me > thinking that maybe the start timeout was too short. > > I edited postfix@-.service, like so: > > systemctl edit postfix@-.service > > and added a line to the end of the [service] stanza: > > TimeoutSec=360 It seems that postfix's startup time has greatly regressed, on my laptop there are very long delays both at boot: , | $ systemd-analyze blame | head -n1 | 33.340s postfix@-.service ` as well when restarting postfix: , | $ time sudo systemctl restart postfix.service | sudo systemctl restart postfix.service 0,06s user 0,03s system 0% cpu 38,611 total ` Clearly something fishy is going on here. Cheers, Sven
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 13 Jan 2023 11:52:29 -0700 Charles Curley wrote: > That suggests there's something wrong with > the way systemd is starting postfix. I will look into that later > today. Not quite "later today", but: A bit of thinking about it, and I realized that the computer in question is an ancient 686 with very limited RAM, physical and swap. So I experimented with watching the startup using htop. That got me thinking that maybe the start timeout was too short. I edited postfix@-.service, like so: systemctl edit postfix@-.service and added a line to the end of the [service] stanza: TimeoutSec=360 So: root@white:/etc# diff systemd/system/postfix@-.service /lib/systemd/system/postfix@.service 17d16 < TimeoutSec=360 20c19 < WantedBy=multi-user.target \ No newline at end of file --- > WantedBy=multi-user.target root@white:/etc# That seems to have fixed the problem. We'll see, said the zen master. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
On Fri, 13 Jan 2023 12:00:31 -0500 Dan Ritter wrote: > Run the postfix executable by hand as root, and look for error > messages and log entries in /var/log/mail.log among other locations. Well, that was interesting. Thanks. In mail.log I found the following from an earlier run: 2023-01-13T06:00:11.188254-07:00 white postfix/postfix-script[3068]: warning: /var/spool/postfix/etc/localtime and /etc/localtime differ 2023-01-13T06:00:11.303294-07:00 white postfix/postfix-script[3073]: warning: /var/spool/postfix/lib/i386-linux-gnu/libnss_systemd.so.2 and /lib/i386-linux-gnu/libnss_systemd.so.2 differ 2023-01-13T06:00:11.479941-07:00 white postfix/postfix-script[3080]: warning: /var/spool/postfix/lib/i386-linux-gnu/libgcc_s.so.1 and /lib/i386-linux-gnu/libgcc_s.so.1 differ (I've pasted these in unwrapped, but they may get mangled in transit or by your mail reader.) The message isn't quite correct: the three files under /var/spool/… were missing. I copied the originals into place. From a run just now: root@white:~# postfix start postfix: Postfix is using backwards-compatible default settings postfix: See http://www.postfix.org/COMPATIBILITY_README.html for details postfix: To disable backwards compatibility use "postconf compatibility_level=3.6" and "postfix reload" postfix/postfix-script: starting the Postfix mail system root@white:~# and ps indicates a slew of postfix jobs running, and a whole bunch of test emails were delivered. That suggests there's something wrong with the way systemd is starting postfix. I will look into that later today. -- Does anybody read signatures any more? https://charlescurley.com https://charlescurley.com/blog/
Re: Postfix fails after upgrade from bullseye to bookworm
Charles Curley wrote: > I upgraded an i386 machine from bullseye to bookworm. Postfix now > refuses to run. Run the postfix executable by hand as root, and look for error messages and log entries in /var/log/mail.log among other locations. -dsr-