Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Charles Curley
On Fri, 20 Jan 2023 21:30:22 +0100
Sven Joachim  wrote:

> Wow.  Looking into the BTS, I found bug #895089[1], changed "c_rehash"
> to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
> recommended there, and now "systemctl restart postfix.service"
> completes in two seconds!

A similar major improvement for me.

root@white:~# time systemctl restart postfix.service

real0m18.389s
user0m0.081s
sys 0m0.059s
root@white:~# 

vs.

root@white:~# time systemctl restart postfix.service

real4m10.833s
user0m0.081s
sys 0m0.062s
root@white:~# 

Thank you, Sven, for pursuing this.

The patch has been committed, and should be in the pipeline.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Max Nikulin

On 21/01/2023 01:55, Charles Curley wrote:


root@white:~# ps aux | grep -i openssl
root  4586  5.8  0.9   8256  2064 pts/3S+   11:48   0:00 grep 
--colour=auto -i openssl
root  4587  150  2.1     4720 ?R11:48   0:00 
/usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in 
QuoVadis_Root_CA_2.pem

...> I have no idea what that's about. Maybe someone with SSL experience can

chime in here?


Delays related to cryptography may be caused by lack of entropy. You may 
try to check


cat /proc/sys/kernel/random/entropy_avail

and

ls -l /proc//fs

when such delay happens. Some network activity may increase entropy 
enough to unblock reading random bytes. If my guess is correct then the 
issue may be alleviated by installing the haveged package. VMs are 
affected more than hosts.





Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Charles Curley
On Fri, 20 Jan 2023 21:30:22 +0100
Sven Joachim  wrote:

> Wow.  Looking into the BTS, I found bug #895089[1], changed "c_rehash"
> to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
> recommended there, and now "systemctl restart postfix.service"
> completes in two seconds!
> 
> Will follow up on the bug ASAP, if nobody beats me to it.

Fix and everything. Nice! Good work, thank you.

I'll apply the fix later and see what it does on my sclerotic anemic
686 box.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Sven Joachim
On 2023-01-20 21:11 +0100, Sven Joachim wrote:

> On 2023-01-20 20:45 +0100, Sven Joachim wrote:
>
>> My hunch is that postfix recomputes all the hashes in
>> /var/spool/postfix/etc/ssl/certs, rather than copying the files from the
>> host system into the chroot which would be a lot faster.
>
> For those who want to dig deeper, /usr/lib/postfix/configure-instance.sh
> is the (Debian-specific) script which sets up the chroot.  Surely it
> should not recompute all the hashes every time postfix is restarted, but
> apparently it does so at least on Charles' and my system.

Wow.  Looking into the BTS, I found bug #895089[1], changed "c_rehash"
to "openssl rehash" in /usr/lib/postfix/configure-instance.sh as
recommended there, and now "systemctl restart postfix.service" completes
in two seconds!

Will follow up on the bug ASAP, if nobody beats me to it.

Cheers,
   Sven


1. https://bugs.debian.org/895089



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Christoph Brinkhaus
Am Fri, Jan 20, 2023 at 08:28:10PM +0100 schrieb Sven Joachim:
> On 2023-01-20 13:39 -0500, Greg Wooledge wrote:
> 
> > On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:

Hello Community,

> >> It seems that postfix's startup time has greatly regressed, on my laptop
> >> there are very long delays both at boot:
> >>
> >> ,
> >> | $ systemd-analyze blame | head -n1
> >> | 33.340s postfix@-.service
> >> `
> >
> > A delay that's a multiple of 30 seconds is very often a DNS lookup
> > failure.  I would imagine your postfix configuration is trying to
> > perform a DNS lookup on some hostname or other, and that this is
> > happening before you're actually "online", for whatever definition of
> > "online" is relevant here.
> 
> That should be NetworkManager-wait-online.service.  In the logs I see
> that systemd starts the postfix service directly after reaching
> network-online.target, as it is supposed to do.  The mystery is why it
> takes another 30+ seconds before any messages from postfix itself appear
> in the logs.
> 
> > That's a total guess, though.  Find your logfiles and read them to see
> > what's actually going on.
> 
> Here is what I see in the journal when I restart postfix.service:
> 
> ,
> | Jan 20 20:16:06 myhost postfix/postfix-script[1470]: stopping the Postfix 
> mail system
> | Jan 20 20:16:06 myhost postfix/master[1307]: terminating on signal 15
> | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Deactivated 
> successfully.
> | Jan 20 20:16:06 myhost systemd[1]: Stopped Postfix Mail Transport Agent 
> (instance -).
> | Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Consumed 36.372s CPU 
> time.
> | Jan 20 20:16:06 myhost systemd[1]: Starting Postfix Mail Transport Agent 
> (instance -)...
> | Jan 20 20:16:41 myhost postfix[1800]: Postfix is using backwards-compatible 
> default settings
> | Jan 20 20:16:41 myhost postfix[1800]: See 
> http://www.postfix.org/COMPATIBILITY_README.html for details
> | Jan 20 20:16:41 myhost postfix[1800]: To disable backwards compatibility 
> use "postconf compatibility_level=3.6" and "pos\
> | tfix reload"
> | Jan 20 20:16:42 myhost postfix/postfix-script[2026]: starting the Postfix 
> mail system
> | Jan 20 20:16:42 myhost postfix/master[2028]: daemon started -- version 
> 3.7.3, configuration /etc/postfix
> | Jan 20 20:16:42 myhost systemd[1]: Started Postfix Mail Transport Agent 
> (instance -).
> | Jan 20 20:16:42 myhost systemd[1]: Starting Postfix Mail Transport Agent...
> | Jan 20 20:16:42 myhost systemd[1]: Finished Postfix Mail Transport Agent.
> `

I have observed something comparable, but with fetchmail for email 
and unbound as a dns resover. Both are started by systemd at almost
the same time. But it takes some time for unbound to fetch keys and so
on. Before unbound is really ready fetchmail has started its first
poll, but without success because the mail servers ip could not be
resolved.

I am only a hobbyist. But may be something similar is happening in the
discussed issue. May be it is worth to have a look how the dns lookup
is set up and how it is starting.

Please just ignore be if this proposal makes no sense.
At work I am no admin :-).

> I have left nothing out, so WTH is postfix waiting for in these 35
> seconds?
> 
> Cheers,
>Sven

BTW: I have resolved my issue by starting fetchmail via systemd by a
wrapper script which probes the dns resolution of the mail server
before starting fetchmail.

Kind regards,
Christoph
-- 
Ist die Katze gesund
schmeckt sie dem Hund.



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Sven Joachim
On 2023-01-20 20:45 +0100, Sven Joachim wrote:

> On 2023-01-20 11:55 -0700, Charles Curley wrote:
>
>> On Fri, 20 Jan 2023 19:17:37 +0100
>> Sven Joachim  wrote:
>>
>>> Clearly something fishy is going on here.
>>
>> I concur. What I saw with htop was a slew of calls to SSL. Here's
>> a sample of what it was doing. It is a processor hog.
>>
>> root@white:~# ps aux | grep -i openssl
>> root  4586  5.8  0.9   8256  2064 pts/3S+   11:48   0:00 grep 
>> --colour=auto -i openssl
>> root 4587 150 2.1  4720 ?  R 11:48 0:00 /usr/bin/openssl x509
>> -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem
>
> Indeed I see many calls to openssl in top, apparently they are children
> of a single c_rehash process.  CPU load is low here, though (2-3 %).
>
>> I have no idea what that's about. Maybe someone with SSL experience can
>> chime in here?
>
> My hunch is that postfix recomputes all the hashes in
> /var/spool/postfix/etc/ssl/certs, rather than copying the files from the
> host system into the chroot which would be a lot faster.

For those who want to dig deeper, /usr/lib/postfix/configure-instance.sh
is the (Debian-specific) script which sets up the chroot.  Surely it
should not recompute all the hashes every time postfix is restarted, but
apparently it does so at least on Charles' and my system.

Cheers,
   Sven



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Jeffrey Walton
On Fri, Jan 13, 2023 at 11:37 AM Charles Curley
 wrote:
>
> I upgraded an i386 machine from bullseye to bookworm. Postfix now
> refuses to run.
>
> root@white:/var/spool# systemctl start postfix@-.service
> Job for postfix@-.service failed because a timeout was exceeded.
> See "systemctl status postfix@-.service" and "journalctl -xeu 
> postfix@-.service" for details.
> root@white:/var/spool# systemctl status postfix@-.service
> × postfix@-.service - Postfix Mail Transport Agent (instance -)
>  Loaded: loaded (/lib/systemd/system/postfix@.service; enabled-runtime; 
> preset: enabled)
>  Active: failed (Result: timeout) since Fri 2023-01-13 07:55:51 MST; 17s 
> ago
>Docs: man:postfix(1)
> Process: 5170 ExecStartPre=/usr/lib/postfix/configure-instance.sh - 
> (code=killed, signal=TERM)
> CPU: 1min 29.456s
>
> Jan 13 07:54:21 white systemd[1]: Starting Postfix Mail Transport Agent 
> (instance -)...
> Jan 13 07:55:51 white systemd[1]: postfix@-.service: start-pre operation 
> timed out. Terminating.
> Jan 13 07:55:51 white systemd[1]: postfix@-.service: Control process exited, 
> code=killed, status=15/TERM
> Jan 13 07:55:51 white systemd[1]: postfix@-.service: Failed with result 
> 'timeout'.
> Jan 13 07:55:51 white systemd[1]: Failed to start Postfix Mail Transport 
> Agent (instance -).
> Jan 13 07:55:51 white systemd[1]: postfix@-.service: Consumed 1min 29.456s 
> CPU time.
> root@white:/var/spool# postconf -n
> alias_database = hash:/etc/aliases
> alias_maps = hash:/etc/aliases
> append_dot_mydomain = no
> biff = no
> compatibility_level = 3.6
> inet_interfaces = all
> inet_protocols = ipv4
> mailbox_size_limit = 0
> mydestination = white.localdomain, white.localdomain, white, 
> localhost.localdomain, localhost
> myhostname = white.localdomain
> mynetworks = 127.0.0.0/8 [:::127.0.0.0]/104 [::1]/128
> myorigin = /etc/mailname
> readme_directory = no
> recipient_delimiter = +
> relayhost = smtp.localdomain
> smtp_tls_CApath = /etc/ssl/certs
> smtp_tls_security_level = may
> smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache
> smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
> smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated 
> defer_unauth_destination
> smtpd_tls_cert_file = /etc/ssl/certs/ssl-cert-snakeoil.pem
> smtpd_tls_key_file = /etc/ssl/private/ssl-cert-snakeoil.key
> smtpd_tls_security_level = may
> root@white:/var/spool#

You should provide the output of journalctl. That usually has the
details that are useful in diagnosing a problem.

> Does anybody read signatures any more?
>
> https://charlescurley.com
> https://charlescurley.com/blog/

I try to skip them. There's no point in wasting time on them.

Jeff



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Sven Joachim
On 2023-01-20 11:55 -0700, Charles Curley wrote:

> On Fri, 20 Jan 2023 19:17:37 +0100
> Sven Joachim  wrote:
>
>> Clearly something fishy is going on here.
>
> I concur. What I saw with htop was a slew of calls to SSL. Here's
> a sample of what it was doing. It is a processor hog.
>
> root@white:~# ps aux | grep -i openssl
> root  4586  5.8  0.9   8256  2064 pts/3S+   11:48   0:00 grep 
> --colour=auto -i openssl
> root 4587 150 2.1  4720 ?  R 11:48 0:00 /usr/bin/openssl x509
> -subject_hash_old -fingerprint -noout -in QuoVadis_Root_CA_2.pem

Indeed I see many calls to openssl in top, apparently they are children
of a single c_rehash process.  CPU load is low here, though (2-3 %).

> I have no idea what that's about. Maybe someone with SSL experience can
> chime in here?

My hunch is that postfix recomputes all the hashes in
/var/spool/postfix/etc/ssl/certs, rather than copying the files from the
host system into the chroot which would be a lot faster.

It is probably time for me to revisit my postfix configuration.

Cheers,
   Sven



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Charles Curley
On Fri, 20 Jan 2023 14:28:22 -0500
Greg Wooledge  wrote:

> More multiples of 30 seconds.  I'm still thinking "DNS issue".

In this case, laziness. The default timeout is 60 seconds. I added 30
to that. Then doubled it. Etc. That doesn't mean you are wrong. I'd
like to know what that ssl command is up to.

The problem is not for lack of network. I executed all the commands I
showed in my recent emails via ssh over Ethernet, and that machine has
full access to the Internet.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Greg Wooledge
On Fri, Jan 20, 2023 at 11:55:35AM -0700, Charles Curley wrote:
> My previous timeout vale was 180 seconds, which wasn't enough. So my
> ancient anemic box needed between 180 and 360 seconds to start postfix.
> (But see below.)

More multiples of 30 seconds.  I'm still thinking "DNS issue".



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Sven Joachim
On 2023-01-20 13:39 -0500, Greg Wooledge wrote:

> On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:
>> It seems that postfix's startup time has greatly regressed, on my laptop
>> there are very long delays both at boot:
>>
>> ,
>> | $ systemd-analyze blame | head -n1
>> | 33.340s postfix@-.service
>> `
>
> A delay that's a multiple of 30 seconds is very often a DNS lookup
> failure.  I would imagine your postfix configuration is trying to
> perform a DNS lookup on some hostname or other, and that this is
> happening before you're actually "online", for whatever definition of
> "online" is relevant here.

That should be NetworkManager-wait-online.service.  In the logs I see
that systemd starts the postfix service directly after reaching
network-online.target, as it is supposed to do.  The mystery is why it
takes another 30+ seconds before any messages from postfix itself appear
in the logs.

> That's a total guess, though.  Find your logfiles and read them to see
> what's actually going on.

Here is what I see in the journal when I restart postfix.service:

,
| Jan 20 20:16:06 myhost postfix/postfix-script[1470]: stopping the Postfix 
mail system
| Jan 20 20:16:06 myhost postfix/master[1307]: terminating on signal 15
| Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Deactivated 
successfully.
| Jan 20 20:16:06 myhost systemd[1]: Stopped Postfix Mail Transport Agent 
(instance -).
| Jan 20 20:16:06 myhost systemd[1]: postfix@-.service: Consumed 36.372s CPU 
time.
| Jan 20 20:16:06 myhost systemd[1]: Starting Postfix Mail Transport Agent 
(instance -)...
| Jan 20 20:16:41 myhost postfix[1800]: Postfix is using backwards-compatible 
default settings
| Jan 20 20:16:41 myhost postfix[1800]: See 
http://www.postfix.org/COMPATIBILITY_README.html for details
| Jan 20 20:16:41 myhost postfix[1800]: To disable backwards compatibility use 
"postconf compatibility_level=3.6" and "pos\
| tfix reload"
| Jan 20 20:16:42 myhost postfix/postfix-script[2026]: starting the Postfix 
mail system
| Jan 20 20:16:42 myhost postfix/master[2028]: daemon started -- version 3.7.3, 
configuration /etc/postfix
| Jan 20 20:16:42 myhost systemd[1]: Started Postfix Mail Transport Agent 
(instance -).
| Jan 20 20:16:42 myhost systemd[1]: Starting Postfix Mail Transport Agent...
| Jan 20 20:16:42 myhost systemd[1]: Finished Postfix Mail Transport Agent.
`

I have left nothing out, so WTH is postfix waiting for in these 35
seconds?

Cheers,
   Sven



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Charles Curley
On Fri, 20 Jan 2023 19:17:37 +0100
Sven Joachim  wrote:

> Clearly something fishy is going on here.

I concur. What I saw with htop was a slew of calls to SSL. Here's
a sample of what it was doing. It is a processor hog.

root@white:~# ps aux | grep -i openssl
root  4586  5.8  0.9   8256  2064 pts/3S+   11:48   0:00 grep 
--colour=auto -i openssl
root  4587  150  2.1     4720 ?R11:48   0:00 
/usr/bin/openssl x509 -subject_hash_old -fingerprint -noout -in 
QuoVadis_Root_CA_2.pem
root@white:~#

I have no idea what that's about. Maybe someone with SSL experience can
chime in here?

My previous timeout vale was 180 seconds, which wasn't enough. So my
ancient anemic box needed between 180 and 360 seconds to start postfix.
(But see below.)

To repeat your commands:

root@white:~# systemd-analyze blame | grep postfix
4min 24.559s postfix@-.service
35ms postfix.service
root@white:~# time systemctl restart postfix.service

real4m10.833s
user0m0.081s
sys 0m0.062s
root@white:~# 

Meanwhile simply running "postfix start" took much less time.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Greg Wooledge
On Fri, Jan 20, 2023 at 07:17:37PM +0100, Sven Joachim wrote:
> It seems that postfix's startup time has greatly regressed, on my laptop
> there are very long delays both at boot:
> 
> ,
> | $ systemd-analyze blame | head -n1
> | 33.340s postfix@-.service
> `

A delay that's a multiple of 30 seconds is very often a DNS lookup
failure.  I would imagine your postfix configuration is trying to
perform a DNS lookup on some hostname or other, and that this is
happening before you're actually "online", for whatever definition of
"online" is relevant here.

That's a total guess, though.  Find your logfiles and read them to see
what's actually going on.



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Sven Joachim
On 2023-01-20 09:34 -0700, Charles Curley wrote:

> On Fri, 13 Jan 2023 11:52:29 -0700
> Charles Curley  wrote:
>
>> That suggests there's something wrong with
>> the way systemd is starting postfix. I will look into that later
>> today.
>
> Not quite "later today", but:
>
> A bit of thinking about it, and I realized that the computer in
> question is an ancient 686 with very limited RAM, physical and swap. So
> I experimented with watching the startup using htop. That got me
> thinking that maybe the start timeout was too short.
>
> I edited postfix@-.service, like so:
>
> systemctl edit postfix@-.service
>
> and added a line to the end of the [service] stanza:
>
> TimeoutSec=360

It seems that postfix's startup time has greatly regressed, on my laptop
there are very long delays both at boot:

,
| $ systemd-analyze blame | head -n1
| 33.340s postfix@-.service
`

as well when restarting postfix:

,
| $ time sudo systemctl restart postfix.service
| sudo systemctl restart postfix.service  0,06s user 0,03s system 0% cpu 38,611 
total
`

Clearly something fishy is going on here.

Cheers,
   Sven



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-20 Thread Charles Curley
On Fri, 13 Jan 2023 11:52:29 -0700
Charles Curley  wrote:

> That suggests there's something wrong with
> the way systemd is starting postfix. I will look into that later
> today.

Not quite "later today", but:

A bit of thinking about it, and I realized that the computer in
question is an ancient 686 with very limited RAM, physical and swap. So
I experimented with watching the startup using htop. That got me
thinking that maybe the start timeout was too short.

I edited postfix@-.service, like so:

systemctl edit postfix@-.service

and added a line to the end of the [service] stanza:

TimeoutSec=360

So:

root@white:/etc# diff systemd/system/postfix@-.service 
/lib/systemd/system/postfix@.service 
17d16
< TimeoutSec=360
20c19
< WantedBy=multi-user.target
\ No newline at end of file
---
> WantedBy=multi-user.target
root@white:/etc# 


That seems to have fixed the problem. We'll see, said the zen master.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-13 Thread Charles Curley
On Fri, 13 Jan 2023 12:00:31 -0500
Dan Ritter  wrote:

> Run the postfix executable by hand as root, and look for error
> messages and  log entries in /var/log/mail.log among other locations.

Well, that was interesting. Thanks.

In mail.log I found the following from an earlier run:

2023-01-13T06:00:11.188254-07:00 white postfix/postfix-script[3068]: warning: 
/var/spool/postfix/etc/localtime and /etc/localtime differ
2023-01-13T06:00:11.303294-07:00 white postfix/postfix-script[3073]: warning: 
/var/spool/postfix/lib/i386-linux-gnu/libnss_systemd.so.2 and 
/lib/i386-linux-gnu/libnss_systemd.so.2 differ
2023-01-13T06:00:11.479941-07:00 white postfix/postfix-script[3080]: warning: 
/var/spool/postfix/lib/i386-linux-gnu/libgcc_s.so.1 and 
/lib/i386-linux-gnu/libgcc_s.so.1 differ

(I've pasted these in unwrapped, but they may get mangled in transit or
by your mail reader.)

The message isn't quite correct: the three files under /var/spool/…
were missing. I copied the originals into place.

From a run just now:

root@white:~# postfix start
postfix: Postfix is using backwards-compatible default settings
postfix: See http://www.postfix.org/COMPATIBILITY_README.html for details
postfix: To disable backwards compatibility use "postconf 
compatibility_level=3.6" and "postfix reload"
postfix/postfix-script: starting the Postfix mail system
root@white:~#

and ps indicates a slew of postfix jobs running, and a whole bunch of
test emails were delivered. That suggests there's something wrong with
the way systemd is starting postfix. I will look into that later today.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/



Re: Postfix fails after upgrade from bullseye to bookworm

2023-01-13 Thread Dan Ritter
Charles Curley wrote: 
> I upgraded an i386 machine from bullseye to bookworm. Postfix now
> refuses to run.


Run the postfix executable by hand as root, and look for error messages
and  log entries in /var/log/mail.log among other locations.

-dsr-