Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Olaf Hopp (SCC) via Exim-users

On 3/12/21 9:34 AM, Olaf Hopp (SCC) via Exim-users wrote:

Hi, folks,
thanks for your answers so far
Just a few more input:
There is no rate limiting, tar pitting or so .
I created a special "accept" ACL just at the beginning of my acl_smtp_rcpt
I created a special router at the beginning of my routers.
I commented out most of my redirect routers.
So most of the ACL and routing stuff in my config is skipped.



Blame on me :(
I had a typo in the sender adress in my acl_smtp_rcpt
So if I use "accept senders = foo@example.com" correctly
as first line in acl_smtp_rcpt my test runs like hell under 2 Minutes.
So the delay must come "somewhere" from below within my receipient acls.

So fixed for that special sender.

Regards and a have nice weekend,
Olaf

--
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Dipl.-Geophys. Olaf Hopp

Zirkel 2
Gebäude 20.21, Raum 316
76131 Karlsruhe

Telefon: +49 721 608-48009
E-Mail: olaf.h...@kit.edu
Web: www.scc.kit.edu

Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe

KIT - Die Forschungsuniversität in der Helmholtz-Gemeinschaft



smime.p7s
Description: S/MIME Cryptographic Signature
-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Jeremy Harris via Exim-users

On 12/03/2021 09:45, Cyborg via Exim-users wrote:

Am 12.03.21 um 10:21 schrieb Jeremy Harris via Exim-users:



I have a static linear timeframe from one recipient to another appearing in the 
log file, once the actual delivery from thunderbird ended.
strace says, it's due to my SQL select i use. it would go faster if the SQL 
result would be cached, instead of repeating itself.



Could you show us the ACL line?



it was a simple one like this:

"select 1 from eximconfig where  '${quote_mysql:${domain:$item}}' REGEXP 
\Nconcat('^',domain,'$')\N and name='' and value='1' LIMIT 1


You didn't show the part where $item was derived (assuming that's what the ACL 
line
really says).  Your obfuscation, not giving the entire ACL line, hinders.

That said, I would expect that query to be cachable.  Your strace showed
repeated actual SQL lookups... could you run a test with debug enabled
and show the output?  Just a couple of recipients would be enough,
and you could mail me direct rather than the mailinglist if you prefer
not to expose info to the whole world.
--
Cheers,
  Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Cyborg via Exim-users

Am 12.03.21 um 10:21 schrieb Jeremy Harris via Exim-users:


I have a static linear timeframe from one recipient to another 
appearing in the log file, once the actual delivery from thunderbird 
ended.
strace says, it's due to my SQL select i use. it would go faster if 
the SQL result would be cached, instead of repeating itself.



Could you show us the ACL line?



it was a simple one like this:

"select 1 from eximconfig where  '${quote_mysql:${domain:$item}}' REGEXP 
\Nconcat('^',domain,'$')\N and name='' and value='1' LIMIT 1


As all those 3000 recipients where to the same domain ( 
test1...test3000@domainname ) it was defacto the same SQL query. I do 
not expect a sql cache in exim,
as it would be a very very complex issue and is not really the 
bottleneck here.


BTW: Test done on

Name    : exim
Version : 4.94
Release : 1.fc32
Architecture: x86_64

best regards,
Marius


--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Jeremy Harris via Exim-users

On 12/03/2021 08:44, Cyborg via Exim-users wrote:

Am 12.03.21 um 08:27 schrieb Andrew C Aitchison via Exim-users:



Any ideas what's happening here ?


The increasing delays remind me of the delays for failed logins.
Is there some sort of authentication, or similar pam or SMB access,
to check the existence of each recipient ?



i made a test mail with 3000 recipients in CC.

Exim is increasing it's CPU Usage the more recipients are used until 100% CPU 
Core usage is reached:


     PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
1378404 root  20   0   66084  55300  10632 R  98,7   1,1 8:59.99 
/usr/sbin/exim -Mc 1lKd2y-005m5p-3V


I have a static linear timeframe from one recipient to another appearing in the 
log file, once the actual delivery from thunderbird ended.
strace says, it's due to my SQL select i use. it would go faster if the SQL 
result would be cached, instead of repeating itself.



Could you show us the ACL line?
--
Cheers,
  Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Cyborg via Exim-users

Am 12.03.21 um 08:27 schrieb Andrew C Aitchison via Exim-users:



Any ideas what's happening here ?


The increasing delays remind me of the delays for failed logins.
Is there some sort of authentication, or similar pam or SMB access,
to check the existence of each recipient ?



i made a test mail with 3000 recipients in CC.

Exim is increasing it's CPU Usage the more recipients are used until 
100% CPU Core usage is reached:



    PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
1378404 root  20   0   66084  55300  10632 R  98,7   1,1 8:59.99 
/usr/sbin/exim -Mc 1lKd2y-005m5p-3V



I have a static linear timeframe from one recipient to another appearing 
in the log file, once the actual delivery from thunderbird ended.
strace says, it's due to my SQL select i use. it would go faster if the 
SQL result would be cached, instead of repeating itself.


Conclusion:

a) the ( recipient ) listfunction in exim needs a redo, to improve 
performance.


This is a very basic list optimization problem, you learn in data 
structures at the university.


b) the setup in question has an additional problem, increasing the time 
exponential .


because the core is reaching 100%, it's at it's max.. at that point, 
things slow down even more.


c) time to make exim multi-threaded

best regards,
Marius

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-12 Thread Olaf Hopp (SCC) via Exim-users

Hi, folks,
thanks for your answers so far
Just a few more input:
There is no rate limiting, tar pitting or so .
I created a special "accept" ACL just at the beginning of my acl_smtp_rcpt
I created a special router at the beginning of my routers.
I commented out most of my redirect routers.
So most of the ACL and routing stuff in my config is skipped.

In the strace log I see the handling of the remaining stuff of my exim config.
But this handling is done after the observed delay between the RCPT line and 
the socket line

RAM is fine, one of my four cores is at 100% during my tests, it's a Debian 10 
buster OS
with an exim 4.94 from the Debian 11-backport

I will try your proposals and keep you informed.

Regards, Olaf

On 3/11/21 9:59 PM, Olaf Hopp (SCC) via Exim-users wrote:

Dear collegues,

my administration writes one(!) email with more than 3000 receipients
in the envelope. (Please, no discussion about the sense of this. Period.)

Receiving that mail takes about 2 two hours.
Delivering that mail is done very quick, as usual.

I started debugging this and I recognized that
after every "RCPT TO" it took a sucessive increasing time
until the "250 Accepted" was answered.
At the beginning this was a 10th of a second.
After 1000 recipients there was a delay of ~one second
after the 2000th recipients this delay increased smoothly
to 2 secends ofter each rcpt-to

I threw away all my bells and whistles in the exim config,
but no success.

So I started "strace" to see whats happening:

At the beginning I see
    18:35:36.048265 read(7, "RCPT TO:\r\n", 8191) = 45
    18:35:36.063187 alarm(0)    = 300
    18:35:36.063233 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
    18:35:36.140936 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
see the nearly zero difference between the RCPT  line and the socket line

After ~1000 recipients I have:
    18:40:51.670874 read(7, "RCPT TO:\r\n", 8191) = 43
    18:40:51.685450 alarm(0)    = 300
    18:40:51.685538 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
    18:40:52.277971 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
so half a second between the rcpt line and the socket line

After ~2000 recipients:
    19:06:28.229652 read(7, "RCPT TO:\r\n", 8191) = 29
    19:06:28.244041 alarm(0)    = 300
    19:06:28.244253 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
    19:06:29.776217 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
about 1.5 seconds between rcpt and the socket line

There are no dropped lines between the RCPT line until the socket line in the 
strace output

Any ideas what's happening here ?

(testing was done with "swaks" talking directly to the exim)

Regards, Olaf







--
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Dipl.-Geophys. Olaf Hopp

Zirkel 2
Gebäude 20.21, Raum 316
76131 Karlsruhe

Telefon: +49 721 608-48009
E-Mail: olaf.h...@kit.edu
Web: www.scc.kit.edu

Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe

KIT - Die Forschungsuniversität in der Helmholtz-Gemeinschaft



smime.p7s
Description: S/MIME Cryptographic Signature
-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-11 Thread Andrew C Aitchison via Exim-users

On Thu, 11 Mar 2021, Olaf Hopp (SCC) via Exim-users wrote:

my administration writes one(!) email with more than 3000 
receipients
in the envelope. (Please, no discussion about the sense of this. 
Period.)


Receiving that mail takes about 2 two hours.
Delivering that mail is done very quick, as usual.

I started debugging this and I recognized that
after every "RCPT TO" it took a sucessive increasing time
until the "250 Accepted" was answered.
At the beginning this was a 10th of a second.
After 1000 recipients there was a delay of ~one second
after the 2000th recipients this delay increased smoothly
to 2 secends ofter each rcpt-to



Any ideas what's happening here ?


The increasing delays remind me of the delays for failed logins.
Is there some sort of authentication, or similar pam or SMB access,
to check the existence of each recipient ?

--
Andrew C. Aitchison Kendal, UK
and...@aitchison.me.uk

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-11 Thread Evgeniy Berdnikov via Exim-users
  Hello.
  
On Thu, Mar 11, 2021 at 09:59:43PM +0100, Olaf Hopp (SCC) via Exim-users wrote:
> After ~2000 recipients:
>19:06:28.229652 read(7, "RCPT TO:\r\n", 8191) = 29
>19:06:28.244041 alarm(0)= 300
>19:06:28.244253 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
> sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
>19:06:29.776217 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
> IPPROTO_IP) = 9
> about 1.5 seconds between rcpt and the socket line
> 
> There are no dropped lines between the RCPT line until the socket line in the 
> strace output
> 
> Any ideas what's happening here ?

 There may be some cpu-intensive processing inside RCPT ACL.
 Try "ltrace -S -T -tt ..." to display library calls.
 But running exim with debug options would be more informative.
-- 
 Eugene Berdnikov

-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-11 Thread Jeremy Harris via Exim-users

On 11/03/2021 20:59, Olaf Hopp (SCC) via Exim-users wrote:

I started debugging this and I recognized that
after every "RCPT TO" it took a sucessive increasing time
until the "250 Accepted" was answered.
At the beginning this was a 10th of a second.
After 1000 recipients there was a delay of ~one second
after the 2000th recipients this delay increased smoothly
to 2 secends ofter each rcpt-to



After ~2000 recipients:
    19:06:28.229652 read(7, "RCPT TO:\r\n", 8191) = 29
    19:06:28.244041 alarm(0)    = 300
    19:06:28.244253 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
    19:06:29.776217 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
about 1.5 seconds between rcpt and the socket line


In a test here I'm seeing under a mmillisecond between RCPT commands
processed by the server, even after 2000 of them:

21:51:59.681 12433 end of inline ACL: ACCEPT
21:51:59.681 12433 SMTP>> 250 Accepted
21:51:59.681 12433 SMTP<< rcpt to:
21:51:59.681 12433 processing "accept" (/home/jgh/git/exim/test/test-config 29)
21:51:59.681 12433 accept: condition test succeeded in inline ACL
21:51:59.681 12433 end of inline ACL: ACCEPT
21:51:59.681 12433 SMTP>> 250 Accepted
21:51:59.681 12433 SMTP<< data
21:51:59.681 12433 SMTP>> 354 Enter message, ending with "." on a line by itself
21:51:59.681 12433 search_tidyup called
21:51:59.681 12433 DMARC: no dmarc_tld_file
21:51:59.681 12433 >>Headers received:
21:51:59.682 12433

This was running the server with -d+all and with a log_selector = +millisec.
Are you able to do that for a test?

The other useful item you might be able to infer from your strace is: what
was that socket used for?

--
Cheers,
  Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Mail with thousands recipients takes exponential longer time

2021-03-11 Thread Bill Cole via Exim-users

On 11 Mar 2021, at 15:59, Olaf Hopp (SCC) via Exim-users wrote:


Dear collegues,

my administration writes one(!) email with more than 3000 receipients
in the envelope. (Please, no discussion about the sense of this. 
Period.)


Receiving that mail takes about 2 two hours.
Delivering that mail is done very quick, as usual.

I started debugging this and I recognized that
after every "RCPT TO" it took a sucessive increasing time
until the "250 Accepted" was answered.
At the beginning this was a 10th of a second.
After 1000 recipients there was a delay of ~one second
after the 2000th recipients this delay increased smoothly
to 2 secends ofter each rcpt-to

I threw away all my bells and whistles in the exim config,
but no success.

[...]

Any ideas what's happening here ?


Check your config for ACL lines setting "ratelimit" and/or "delay". See 
examples at 
https://www.exim.org/exim-html-current/doc/html/spec_html/ch-access_control_lists.html


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


[exim] Mail with thousands recipients takes exponential longer time

2021-03-11 Thread Olaf Hopp (SCC) via Exim-users

Dear collegues,

my administration writes one(!) email with more than 3000 receipients
in the envelope. (Please, no discussion about the sense of this. Period.)

Receiving that mail takes about 2 two hours.
Delivering that mail is done very quick, as usual.

I started debugging this and I recognized that
after every "RCPT TO" it took a sucessive increasing time
until the "250 Accepted" was answered.
At the beginning this was a 10th of a second.
After 1000 recipients there was a delay of ~one second
after the 2000th recipients this delay increased smoothly
to 2 secends ofter each rcpt-to

I threw away all my bells and whistles in the exim config,
but no success.

So I started "strace" to see whats happening:

At the beginning I see
   18:35:36.048265 read(7, "RCPT TO:\r\n", 8191) = 45
   18:35:36.063187 alarm(0)= 300
   18:35:36.063233 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
   18:35:36.140936 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
see the nearly zero difference between the RCPT  line and the socket line

After ~1000 recipients I have:
   18:40:51.670874 read(7, "RCPT TO:\r\n", 8191) = 43
   18:40:51.685450 alarm(0)= 300
   18:40:51.685538 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
   18:40:52.277971 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
so half a second between the rcpt line and the socket line

After ~2000 recipients:
   19:06:28.229652 read(7, "RCPT TO:\r\n", 8191) = 29
   19:06:28.244041 alarm(0)= 300
   19:06:28.244253 rt_sigaction(SIGALRM, {sa_handler=0x55bc6f392180, 
sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8eff4d9730}, NULL, 8) = 0
   19:06:29.776217 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 
IPPROTO_IP) = 9
about 1.5 seconds between rcpt and the socket line

There are no dropped lines between the RCPT line until the socket line in the 
strace output

Any ideas what's happening here ?

(testing was done with "swaks" talking directly to the exim)

Regards, Olaf



--
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Dipl.-Geophys. Olaf Hopp

Zirkel 2
Gebäude 20.21, Raum 316
76131 Karlsruhe

Telefon: +49 721 608-48009
E-Mail: olaf.h...@kit.edu
Web: www.scc.kit.edu

Sitz der Körperschaft:
Kaiserstraße 12, 76131 Karlsruhe

KIT - Die Forschungsuniversität in der Helmholtz-Gemeinschaft



smime.p7s
Description: S/MIME Cryptographic Signature
-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/