Re: [Dovecot] memory allocation issues

2012-11-26 Thread Timo Sirainen
On 27.11.2012, at 8.39, Timo Sirainen wrote:

>> userdb {
>> driver = passwd
>> }
> 
> Both of these lookups should be done by auth-worker processes. So why is it 
> the auth process that complains?

Because of a bug in v2.0.16+ :(

http://hg.dovecot.org/dovecot-2.1/rev/8e5d9d88e250

As a workaround you can use for existing versions:

userdb {
  driver = passwd
  args = blocking=yes
}



Re: [Dovecot] memory allocation issues

2012-11-26 Thread Timo Sirainen
On 26.11.2012, at 22.03, Kelsey Cummings wrote:

>>> auth: Error: malloc: 58012: Cannot allocate memory
>>> auth: Error: Unable to allocate memory for mutexes from the region
>>> auth: Error: PANIC: Cannot allocate memory
>>> auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
..
>> Memory leak maybe? service auth { vsz_limit } anyway was reached (default 
>> 256 MB).
> 
> It is currently set to 768M, I'll go ahead and raise it up to 1G.
> Anything I can do to help see if it is a memory leak?

Is it really the auth master process that fails? 

> passdb {
>  args = imap
>  driver = pam
> }
..
> userdb {
>  driver = passwd
> }

Both of these lookups should be done by auth-worker processes. So why is it the 
auth process that complains?

Anyway .. if the problem really is auth-worker and this is only a logging 
problem (I would have expected auth-worker: prefix in log lines), then the 
solution is simple. Just restart the auth-worker processes every 1000 lookups:

service auth-worker {
  service_count = 1000
}



Re: [Dovecot] memory allocation issues

2012-11-26 Thread Kelsey Cummings
On Fri, Nov 23, 2012 at 08:36:37AM +0200, Timo Sirainen wrote:
> On 9.11.2012, at 2.49, Kelsey Cummings wrote:
> > One of our dovecot backend servers ran into a problem with it's auth
> > process a few days ago.  This doesn't appear to be the error logged when
> > dovecot hits its internal limit so I'm not sure what is going on here.
> > 
> > auth: Error: malloc: 58012: Cannot allocate memory
> > auth: Error: Unable to allocate memory for mutexes from the region
> > auth: Error: PANIC: Cannot allocate memory
> > auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
> 
> It would have been nicer if libc would have just crashed the process instead 
> of silently converting it into "unknown user" error.. That's probably 
> actually a bug since the getpwuid_r() that Dovecot uses would have been able 
> to return an error message.

We saw two boxes do this over the weekend.

> > pop3: Error: Authenticated user not found from userdb, auth lookup 
> > +id=2509111297 (client-pid=4781 client-id=1)
> > pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 
> > +succesful auths): user=...
> > 
> > There was at least 10+GB free RAM on the server and no indication of a
> > system level issue at the same time.  The server is running 2.1.9.
> > There were about 3,200 active sessions, with something like 12 new
> > sessions/sec.  The other identical servers are/were handling virtually
> > identical load with the same service uptime and haven't had any issues 
> > so far.  (Crash happened 7 days ago.)
> 
> Memory leak maybe? service auth { vsz_limit } anyway was reached (default 256 
> MB).

It is currently set to 768M, I'll go ahead and raise it up to 1G.
Anything I can do to help see if it is a memory leak?

# dovecot -n
# 2.1.9: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-279.9.1.el6.x86_64 x86_64 Scientific Linux release 6.3 
(Carbon) 
auth_master_user_separator = *
auth_username_format = %Ln
auth_verbose = yes
auth_verbose_passwords = sha1
auth_worker_max_count = 64
login_log_format_elements = user=<%u> session=%{session} method=%m rip=%r 
lip=%l mpid=%e %c
mail_fsync = always
mail_log_prefix = "%s(%u): session=%{session} "
mail_plugins = stats zlib
maildir_very_dirty_syncs = yes
mmap_disable = yes
namespace {
  inbox = yes
  location = 
  prefix = INBOX.
  separator = .
  type = private
}
passdb {
  args = /etc/dovecot/master-users
  driver = passwd-file
  master = yes
}
passdb {
  args = imap
  driver = pam
}
plugin {
  lazy_expunge = DELETED_MESSAGES.
  mail_log_events = delete expunge flag_change
  mail_log_fields = uid box msgid from flags size
  quota = fs:User quota
  stats_refresh = 30 secs
  stats_track_cmds = yes
}
protocols = imap pop3
service anvil {
  client_limit = 1
}
service auth {
  client_limit = 1
  vsz_limit = 768 M
}
service doveadm {
  inet_listener {
port = 1842
  }
  unix_listener doveadm-server {
mode = 0666
  }
}
service imap-login {
  inet_listener imap {
port = 143
  }
  inet_listener imaps {
port = 993
ssl = yes
  }
  process_limit = 7000
  process_min_avail = 32
  vsz_limit = 256 M
}
service imap-postlogin {
  executable = script-login -d /etc/dovecot/bin/sonic-imap-postlogin
  user = $default_internal_user
}
service imap {
  executable = imap imap-postlogin
  process_limit = 4096
  vsz_limit = 512 M
}
service pop3-login {
  inet_listener pop3 {
port = 110
  }
  inet_listener pop3s {
port = 995
ssl = yes
  }
  process_limit = 2000
  process_min_avail = 32
  vsz_limit = 256 M
}
service pop3-postlogin {
  executable = script-login -d /etc/dovecot/bin/sonic-pop3-postlogin
  user = $default_internal_user
}
service pop3 {
  executable = pop3 pop3-postlogin
  process_limit = 4096
}
service stats {
  fifo_listener stats-mail {
mode = 0666
  }
}
shutdown_clients = no
ssl = required
ssl_parameters_regenerate = 1 days
syslog_facility = local0
userdb {
  driver = passwd
}
verbose_proctitle = yes
protocol imap {
  imap_id_send = support-url support-email
  mail_max_userip_connections = 20
  mail_plugins = stats zlib mwi_update mail_log notify imap_stats imap_zlib
}
protocol pop3 {
  mail_plugins = stats zlib lazy_expunge
  pop3_fast_size_lookups = yes
  pop3_uidl_format = %f
}


-- 
Kelsey Cummings - k...@corp.sonic.net  sonic.net, inc.
System Architect  2260 Apollo Way
707.522.1000  Santa Rosa, CA 95407


Re: [Dovecot] memory allocation issues

2012-11-22 Thread Timo Sirainen
On 9.11.2012, at 2.49, Kelsey Cummings wrote:

> One of our dovecot backend servers ran into a problem with it's auth
> process a few days ago.  This doesn't appear to be the error logged when
> dovecot hits its internal limit so I'm not sure what is going on here.
> 
> auth: Error: malloc: 58012: Cannot allocate memory
> auth: Error: Unable to allocate memory for mutexes from the region
> auth: Error: PANIC: Cannot allocate memory
> auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user

It would have been nicer if libc would have just crashed the process instead of 
silently converting it into "unknown user" error.. That's probably actually a 
bug since the getpwuid_r() that Dovecot uses would have been able to return an 
error message.

> pop3: Error: Authenticated user not found from userdb, auth lookup 
> +id=2509111297 (client-pid=4781 client-id=1)
> pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 
> +succesful auths): user=...
> 
> There was at least 10+GB free RAM on the server and no indication of a
> system level issue at the same time.  The server is running 2.1.9.
> There were about 3,200 active sessions, with something like 12 new
> sessions/sec.  The other identical servers are/were handling virtually
> identical load with the same service uptime and haven't had any issues 
> so far.  (Crash happened 7 days ago.)

Memory leak maybe? service auth { vsz_limit } anyway was reached (default 256 
MB).



[Dovecot] memory allocation issues

2012-11-08 Thread Kelsey Cummings
One of our dovecot backend servers ran into a problem with it's auth
process a few days ago.  This doesn't appear to be the error logged when
dovecot hits its internal limit so I'm not sure what is going on here.

auth: Error: malloc: 58012: Cannot allocate memory
auth: Error: Unable to allocate memory for mutexes from the region
auth: Error: PANIC: Cannot allocate memory
auth: passwd(test,1.1.1.1,<8HTlNHzNIQBAjhKC>): unknown user
pop3: Error: Authenticated user not found from userdb, auth lookup 
+id=2509111297 (client-pid=4781 client-id=1)
pop3-login: Internal login failure (pid=4781 id=1) (internal failure, 1 
+succesful auths): user=...

There was at least 10+GB free RAM on the server and no indication of a
system level issue at the same time.  The server is running 2.1.9.
There were about 3,200 active sessions, with something like 12 new
sessions/sec.  The other identical servers are/were handling virtually
identical load with the same service uptime and haven't had any issues 
so far.  (Crash happened 7 days ago.)


-- 
Kelsey Cummings - k...@corp.sonic.net  sonic.net, inc.
System Architect  2260 Apollo Way
707.522.1000  Santa Rosa, CA 95407