Re: LDAP: Connection appears to be hanging, reconnecting

2014-12-17 Thread Matthias Egger
Hello Simon

On 12/16/2014 05:38 PM, Simon Fraser wrote:
> This is speculation, but what has happened to us in the past is that the
> LDAP server stopped responding to queries, but the TCP socket was still
> open for connections. A new TCP connection would be established, but the
> daemon would not be notified of it.
> 
> So, depending on precisely how the first LDAP server crashed, it may not
> be the same test as killing the process, but closer to sending it
> 'kill -STOP' (and then 'kill -CONT' afterwards, obviously)

Thank you very much for that hint. You were right. When I send SIGSTOP
to slapd, Dovecot behaves the same way it did a few weeks ago.
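For anyone who wants to repeat this test without a real slapd, here is a
minimal Python sketch (POSIX only, Python 3.8+). The forked child is a
hypothetical stand-in for slapd that answers every request with "pong"; it
does not speak LDAP. While the child is stopped, the kernel still completes
the TCP handshake on its listening socket, so the client's connect succeeds
and only the wait for a reply times out, which matches Simon's description.

```python
import os
import signal
import socket
import time

def serve(listener):
    """Hypothetical stand-in for slapd: answer every request with pong."""
    while True:
        conn, _ = listener.accept()
        with conn:
            try:
                conn.recv(64)
                conn.sendall(b"pong")
            except OSError:
                pass  # the client may already have given up; keep serving

def ask(addr, timeout=1.0):
    """Connect, send a request and wait up to `timeout` seconds for a reply."""
    with socket.create_connection(addr, timeout=timeout) as s:
        s.sendall(b"ping")
        try:
            return s.recv(64).decode()
        except socket.timeout:
            return "timeout"

listener = socket.create_server(("127.0.0.1", 0))
addr = listener.getsockname()

pid = os.fork()
if pid == 0:  # child: the "LDAP server"
    try:
        serve(listener)
    finally:
        os._exit(0)

listener.close()  # parent acts as client only; the child keeps the socket listening
results = [ask(addr)]            # server running: it answers
os.kill(pid, signal.SIGSTOP)     # like 'kill -STOP <pid>'
time.sleep(0.2)
results.append(ask(addr))        # connect still succeeds, but no reply arrives
os.kill(pid, signal.SIGCONT)     # like 'kill -CONT <pid>'
results.append(ask(addr))        # resumed: it answers again
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
print(results)
```

A client that only guards connect() would sit in the second step forever;
the one-second read timeout in `ask` is what makes the hang visible.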

So do you (or anyone else) have a hint on how I could work around such
a situation?

I found a statement from Timo Sirainen from June 2011:

http://www.dovecot.org/pipermail/dovecot/2011-June/059905.html

"...Fallbacking to another LDAP server is done by OpenLDAP internally..."

So I thought there should be a way to "tweak" ldap.conf.

I then found a German post:

https://listen.jpberlin.de/pipermail/dovecot/2014-June/000506.html

where someone mentioned these ldap.conf settings:

BIND_POLICY soft
TIMELIMIT   5
NETWORK_TIMEOUT 5
TIMEOUT 8

and a link to:

http://www.linuxquestions.org/questions/linux-enterprise-47/ldap-failover-timeout-client-setting-847718/

which also uses these two settings:

BIND_TIMELIMIT 10
IDLE_TIMELIMIT 10

I gave them a try, but the result was still the same: Dovecot (or rather
OpenLDAP) does not switch to another LDAP server.
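A possible reason these settings made no difference, offered as an
assumption rather than a confirmed diagnosis: according to ldap.conf(5),
NETWORK_TIMEOUT only bounds connect(), TIMEOUT applies to synchronous libldap
calls (which may not be what Dovecot's auth process uses), and TIMELIMIT is a
search time limit sent to the server, which a stopped server cannot enforce.
BIND_POLICY, BIND_TIMELIMIT and IDLE_TIMELIMIT appear to be nss_ldap/pam_ldap
options rather than libldap ones. As far as we understand, libldap tries the
URIs in order and keeps the first one that connects, and against a stopped
slapd that connect succeeds. The sketch below is a generic Python model, not
OpenLDAP or Dovecot code; `query` and `query_with_failover` are hypothetical
names. It shows that moving on to the next server requires a timeout on the
reply, not just on the connect.

```python
import socket
import threading

def query(addr, connect_timeout, response_timeout):
    """Send one request. response_timeout=None would wait for the reply forever."""
    sock = socket.create_connection(addr, timeout=connect_timeout)  # bounds connect() only
    with sock:
        sock.settimeout(response_timeout)  # bounds the wait for the reply
        sock.sendall(b"ping")
        return sock.recv(64)

def query_with_failover(addrs, connect_timeout=5.0, response_timeout=5.0):
    """Try servers in list order and move on when one refuses or stays silent."""
    for addr in addrs:
        try:
            return addr, query(addr, connect_timeout, response_timeout)
        except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
            continue
    raise ConnectionError("no server answered")

# Demo: a "stopped" server that completes TCP handshakes but never calls
# accept(), listed before a healthy server that answers once.
silent = socket.create_server(("127.0.0.1", 0))
healthy = socket.create_server(("127.0.0.1", 0))

def answer_once():
    conn, _ = healthy.accept()
    with conn:
        conn.recv(64)
        conn.sendall(b"pong")

threading.Thread(target=answer_once, daemon=True).start()
result = query_with_failover([silent.getsockname(), healthy.getsockname()],
                             connect_timeout=1.0, response_timeout=0.5)
print(result)
```

With `response_timeout=None` the first `query` call would block forever on
the silent server, and the healthy one would never be tried.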

Best regards
Matthias
-- 
Matthias Egger
ETH Zurich
Department of Information Technology  maeg...@ee.ethz.ch
and Electrical Engineering
IT Support Group (ISG.EE), ETL/F/24.1 Phone +41 (0)44 632 03 90
Physikstrasse 3, CH-8092 Zurich   Fax   +41 (0)44 632 11 95






Re: LDAP: Connection appears to be hanging, reconnecting

2014-12-16 Thread Simon Fraser



On 16/12/14 16:30, Matthias Egger wrote:
> What happened:
> A few weeks ago one of the LDAPS Servers which is not maintained by us
> has crashed. From that moment on, users could still login to check their
> emails, but they were not able to send any email through postfix (which
> uses smtpd_sasl_type = dovecot)
>
> What i do not understand, is why did dovecot not switch to the second
> configured LDAPS Server? It looks like it retried for ever to reconnect
> to the crashed LDAP Server.


This is speculation, but what has happened to us in the past is that the
LDAP server stopped responding to queries, but the TCP socket was still
open for connections. A new TCP connection would be established, but the
daemon would not be notified of it.

So, depending on precisely how the first LDAP server crashed, it may not
be the same test as killing the process, but closer to sending it
'kill -STOP' (and then 'kill -CONT' afterwards, obviously)

Simon.



--
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


LDAP: Connection appears to be hanging, reconnecting

2014-12-16 Thread Matthias Egger
Hello List

I have a strange problem here that I am trying to analyse, but I'm stuck.
Maybe someone has a hint?

What happened:
A few weeks ago, one of the LDAPS servers (which is not maintained by us)
crashed. From that moment on, users could still log in to check their
email, but they could not send any email through Postfix (which uses
smtpd_sasl_type = dovecot).

What I do not understand is why Dovecot did not switch to the second
configured LDAPS server. It looks like it retried forever to reconnect
to the crashed LDAP server.

From the moment of the crash, we saw a lot of errors like these in our
log files:

Nov 30 16:51:53 servername dovecot: [ID 583609 mail.error] auth: Error: ldap(userone,USERS_IP1,): Connection appears to be hanging, reconnecting

AND

Nov 30 16:51:59 servername dovecot: [ID 583609 mail.error] auth: Error: plain(usertwo,USERS_IP2,): Request 1982.83548 timed out after 151 secs, state=1
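To see when these errors started and how long requests waited, a small
Python sketch can tally the two messages. The regular expressions are
written against the two sample lines above only (an assumption about the
format, not a documented Dovecot log grammar), and `summarize` is a
hypothetical helper.

```python
import re
from collections import Counter

# Patterns for the two messages quoted above; the layout inside the
# parentheses is inferred from these samples only.
HANGING = re.compile(
    r"auth: Error: ldap\(([^,]*),([^,]*),[^)]*\): "
    r"Connection appears to be hanging, reconnecting")
TIMED_OUT = re.compile(
    r"auth: Error: (\w+)\(([^,]*),([^,]*),[^)]*\): "
    r"Request \S+ timed out after (\d+) secs")

def summarize(lines):
    """Count hanging/timeout errors and bucket the hangs per syslog minute."""
    hanging_per_minute = Counter()
    waits = []
    for line in lines:
        if HANGING.search(line):
            # Classic syslog timestamps are fixed width, e.g. "Nov 30 16:51:53".
            hanging_per_minute[line[:12]] += 1
        match = TIMED_OUT.search(line)
        if match:
            waits.append(int(match.group(4)))
    return {
        "hanging": sum(hanging_per_minute.values()),
        "timed_out": len(waits),
        "max_wait_secs": max(waits, default=0),
        "hanging_per_minute": hanging_per_minute,
    }

sample = [
    "Nov 30 16:51:53 servername dovecot: [ID 583609 mail.error] auth: Error: "
    "ldap(userone,USERS_IP1,): Connection appears to be hanging, reconnecting",
    "Nov 30 16:51:59 servername dovecot: [ID 583609 mail.error] auth: Error: "
    "plain(usertwo,USERS_IP2,): Request 1982.83548 timed out after 151 secs, state=1",
]
report = summarize(sample)
print(report["hanging"], report["timed_out"], report["max_wait_secs"])
```

In practice the lines would come from the mail log file itself (for example
`summarize(open(path))`), and the per-minute counter shows when the hangs
began.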

We run Dovecot 2.2.13 on a Solaris 10 system, and the passdb and userdb
configuration is:

passdb {
  args = /etc/dovecot-ldap.conf
  default_fields =
  deny = no
  driver = ldap
  master = no
  name =
  override_fields =
  pass = no
  result_failure = continue
  result_internalfail = continue
  result_success = return-ok
  skip = never
}

userdb {
  args = /etc/dovecot-ldap.conf
  default_fields =
  driver = ldap
  name =
  override_fields =
  result_failure = continue
  result_internalfail = continue
  result_success = return-ok
  skip = never
}

And the dovecot-ldap.conf contains (obfuscated):

uris         = ldaps://server2.tld ldaps://server1.tld ldaps://server4.tld ldaps://server3.tld
dn           = ...
dnpass       = ...
ldap_version = 3
auth_bind    = yes
base         = ...
scope        = onelevel
user_attrs   = homeDirectory=home,uidNumber=uid,gidNumber=gid
user_filter  = ...
pass_attrs   = uid=user
pass_filter  = ...
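Because these URIs are ldaps://, a hung server can also be spotted from
outside Dovecot: the TCP connect succeeds, but the TLS handshake never
completes. Here is a minimal Python sketch, assuming Python 3.8+ and the
default ports 636 (ldaps) and 389 (ldap); `parse_uris` and `probe` are
hypothetical helpers, not part of Dovecot or OpenLDAP.

```python
import socket
import ssl
from urllib.parse import urlsplit

def parse_uris(value):
    """Turn a space-separated 'uris' value into (host, port) pairs, keeping order."""
    pairs = []
    for uri in value.split():
        parts = urlsplit(uri)
        default_port = 636 if parts.scheme == "ldaps" else 389
        pairs.append((parts.hostname, parts.port or default_port))
    return pairs

def probe(host, port, timeout=5.0, context=None):
    """Classify one server as 'ok', 'hanging', 'tls-error' or 'connect-failed'."""
    context = context or ssl.create_default_context()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # refused, unreachable, DNS failure or connect timeout
        return "connect-failed"
    with sock:
        try:
            # The handshake inherits the socket timeout, so a server whose
            # kernel accepts TCP but whose process never answers ends up here.
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
        except socket.timeout:
            return "hanging"
        except OSError:  # includes ssl.SSLError, e.g. certificate problems
            return "tls-error"
```

Run from the Dovecot host, it would loop over `parse_uris(...)` applied to
the real `uris` value and print each host, port and `probe(host, port)`
result; a server reported as `hanging` is one that accepts TCP connections
but does not answer.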

The strange thing is that with the very same binaries and configuration
(okay, with some minimal modifications to bind to the correct
interfaces...), a test on our test system works as it should.

When we shut down slapd, Dovecot recognizes it and connects to the
alternate LDAPS server. When we shut down slapd and start netcat instead
(just to have something listening that never responds)... you guessed it:
Dovecot recognizes that too and switches over to the alternate test server.

So on our test system, everything worked as it should, but the production
system did not. And since we do not maintain the LDAPS servers, it is hard
to reproduce anything there.

At least I got the log files from server2.tld and server1.tld, but they
only show what I already knew: our server was connected to server2.tld
until the crash happened, and server1.tld never received any connection.

Does anyone have an idea what I could try to find out why Dovecot did
not switch to server1.tld?

Best regards
Matthias Egger