[Dovecot] pam_start() failed: system error with dovecot 1.1.2, cause unknown

2008-09-02 Thread Adam McDougall
I would guess this is unlikely to be dovecot's fault, but I'm wondering 
if anyone has any ideas of what might have happened based on the 
evidence. My best guess is some kind of resource limit was reached but I 
don't see any evidence in the logs, and the condition is now gone.


Suddenly this morning, one (and only one) of my dovecot servers decided 
to start failing all logins since 08:25:04 until we restarted dovecot, 
at which point they were working fine.  The number of imap-login 
processes was under the limit, but there were some obvious PAM errors at 
the time.  My account could still ssh to the system so I don't think it 
was a problem general authentication, and NIS on other systems was 
working fine.  No one was logged into that server at the time the 
problems occurred, and I don't think anything happened to the actual pam 
libraries to make them missing since dovecot worked after a restart.  I 
should have used other means to prevent people from using that dovecot 
instance rather than stopping it, and I'll do so if it happens again in 
hopes of further debugging.


/var/log/maillog:
Sep  2 08:25:01 hill dovecot: imap-login: Login: user=userA, 
method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Sep  2 08:25:01 hill dovecot: IMAP(userA): Disconnected: Logged out 
bytes=127/568
Sep  2 08:25:01 hill dovecot: imap-login: Login: user=userA, 
method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Sep  2 08:25:01 hill dovecot: IMAP(userA): Disconnected: Logged out 
bytes=282/9641
Sep  2 08:25:04 hill dovecot: imap-login: Login: user=userA, 
method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Sep  2 08:25:04 hill dovecot: IMAP(userA): Disconnected: Logged out 
bytes=46/543

***problem started here
Sep  2 08:25:04 hill dovecot: auth-worker(default): 
pam(userA,127.0.0.1): pam_start() failed: system error
Sep  2 08:25:04 hill dovecot: auth-worker(default): 
pam(userB,35.9.37.164): pam_start() failed: system error
Sep  2 08:25:05 hill dovecot: auth-worker(default): 
pam(userC,35.9.37.164): pam_start() failed: system error
Sep  2 08:25:06 hill dovecot: imap-login: Aborted login (auth failed, 1 
attempts): user=userB, method=PLAIN, rip=35.9.37.164, lip=35.9.37.190, TLS
Sep  2 08:25:06 hill dovecot: imap-login: Aborted login (auth failed, 1 
attempts): user=userA, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured
Sep  2 08:25:07 hill dovecot: imap-login: Aborted login (auth failed, 1 
attempts): user=userC, method=PLAIN, rip=35.9.37.164, lip=35.9.37.190, TLS

.

/var/log/messages:
Sep  2 08:25:04 hill dovecot-auth: in openpam_load_module(): no 
pam_permit.so found
Sep  2 08:25:04 hill dovecot-auth: in openpam_load_module(): no 
pam_login_access.so found
Sep  2 08:25:05 hill dovecot-auth: in openpam_load_module(): no 
pam_nologin.so found
Sep  2 08:25:10 hill dovecot-auth: in openpam_load_module(): no 
pam_unix.so found
Sep  2 08:25:11 hill dovecot-auth: in openpam_load_module(): no 
pam_unix.so found
Sep  2 08:25:20 hill dovecot-auth: in openpam_load_module(): no 
pam_opieaccess.so found
Sep  2 08:25:20 hill dovecot-auth: in openpam_load_module(): no 
pam_opie.so found
Sep  2 08:25:51 hill kernel: Sep  2 08:25:51 hill last message repeated 
12 times
Sep  2 08:27:52 hill kernel: Sep  2 08:27:52 hill last message repeated 
37 times
Sep  2 08:38:01 hill kernel: Sep  2 08:38:01 hill last message repeated 
144 times
Sep  2 08:48:06 hill kernel: Sep  2 08:48:06 hill last message repeated 
129 times
Sep  2 08:53:36 hill kernel: Sep  2 08:52:51 hill last message repeated 
51 times




Re: [Dovecot] pam_start() failed: system error with dovecot 1.1.2, cause unknown

2008-09-02 Thread Timo Sirainen

On Sep 2, 2008, at 4:48 PM, Adam McDougall wrote:

I would guess this is unlikely to be dovecot's fault, but I'm  
wondering if anyone has any ideas of what might have happened based  
on the evidence. My best guess is some kind of resource limit was  
reached but I don't see any evidence in the logs, and the condition  
is now gone.


Suddenly this morning, one (and only one) of my dovecot servers  
decided to start failing all logins since 08:25:04 until we  
restarted dovecot, at which point they were working fine.  The  
number of imap-login processes was under the limit, but there were  
some obvious PAM errors at the time.  My account could still ssh to  
the system so I don't think it was a problem general authentication,  
and NIS on other systems was working fine.  No one was logged into  
that server at the time the problems occurred, and I don't think  
anything happened to the actual pam libraries to make them missing  
since dovecot worked after a restart.  I should have used other  
means to prevent people from using that dovecot instance rather than  
stopping it, and I'll do so if it happens again in hopes of further  
debugging.


Maybe your PAM plugins are leaking memory/fds. Have you set  
auth_worker_max_request_count to non-zero? That could help.




PGP.sig
Description: This is a digitally signed message part