Re: tcp reset errors
Willy Tarreau <w at 1wt.eu> writes:

> Hi Franky,
>
> On Thu, Sep 11, 2014 at 01:08:09PM +0200, Franky Van Liedekerke wrote:
> > On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke liedekef@... wrote:
> > > After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. These might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389), does that sound familiar?
> > >
> > > Franky
> >
> > Ok, after much trial and error, I pinned it down to the following: we have lots of servers doing ldap lookups for authentication, also when connecting via ssh. On EL5 servers this auth is done via a call to /usr/libexec/openssh/ssh-ldap-wrapper. Apparently this binary causes the resets that show up in the haproxy error logs. I switched to the sssd version for EL5 servers, but that version did not include ssh-keys support, so the resets persisted.
> >
> > Again the internet to the rescue: version 1.9.6 for el5 can be found at http://copr-be.cloud.fedoraproject.org/results/sgallagh/sssd-1.9-rhel5/epel-5-x86_64 , and that version does support ssh correctly. Installing it, changing the ssh config, et voila: no more resets.
> >
> > So the bug is in the ssh-ldap-wrapper, but I understand that doing a RST at the end is not bad, just not good either ... The side-effect of the new sssd is that far fewer ldap queries are made (as sudo and ssh use sssd too then), but I'll leave it up to the management to decide whether or not to go for that solution.
>
> Thanks for sending the details of your diagnostic. As you say, RSTs are not necessarily bad. When a client closes first, it has two options:
>   - either send RST
>   - or have the source port unusable for 2 minutes.
>
> Most of the time you choose the first option.
>
> In your case, since you were seeing SD flags, it means the reset came from the server; maybe the client was speaking inappropriately on the connection, causing the server to abort it. If so, it proves that the behaviour was properly chosen, because it allowed you to detect the anomaly in the logs and to fix it, which is quite good.
>
> Regards,
> Willy

I have been having a similar issue with RST on LDAP connections but have not been able to pin it down any further than haproxy. LDAP seems to be the only issue at the moment, although I had identical symptoms with SMTP that were resolved after I lowered the MTU on the interface.

I have a mail appliance on one side and AD LDAP on the other side of this haproxy:

spam_filter --- haproxy (service) --- AD LDAP

When I run an LDAP test, it doesn't seem to matter whether it is plain text or SSL: I will randomly get a RST sent by the server during transfer. I have run the LDAP test dozens of times and there is no pattern; about 50% fail mid-stream at random, and a packet capture shows a RST from the server.

To test further, I routed the same test through the haproxy machine using ip_forwarding, to the same destination as the haproxy backend, and 100% of the tests succeeded:

spam_filter --- haproxy (ip_forward) --- AD LDAP

Does anyone have any advice on what to check next? My haproxy.cfg only defines mode tcp; do I require other options for LDAP? I need LDAPS and have found that the option ldap-check does not work, but LDAP vs LDAPS does not affect my problem.

Thanks
Steve
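As a possible first check for a case like this, the "SD" flags Willy refers to are the termination state that haproxy writes in each log line. The sketch below decodes the first two characters of such a code. The tables cover only a subset of codes and are written from my reading of the HAProxy configuration manual, so treat the wording as an approximation; the function name `describe_termination` is made up for this example.

```python
# Partial tables; the wording paraphrases the HAProxy manual and is an assumption.
WHO = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "P": "aborted by the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}
WHEN = {
    "Q": "while waiting in the queue",
    "C": "while connecting to the server",
    "D": "during the data phase",
    "L": "while sending the last data to the client",
    "-": "after the end of the data transfer",
}


def describe_termination(flags: str) -> str:
    """Describe the first two characters of a haproxy termination state."""
    if len(flags) < 2:
        raise ValueError("expected at least two characters, e.g. 'SD'")
    who = WHO.get(flags[0], "unknown first flag " + repr(flags[0]))
    when = WHEN.get(flags[1], "unknown second flag " + repr(flags[1]))
    return who + " " + when


print(describe_termination("SD"))
```

Under these assumptions, "SD" reads as an abort coming from the server side during the data phase, which matches Willy's interpretation above.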
Re: tcp reset errors
After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. These might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389), does that sound familiar?

Franky

On Wed, Sep 10, 2014 at 8:49 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote:
> On 10/09/2014 03:31 PM, Franky Van Liedekerke wrote:
> > Hi,
> > [..snip..]
> > Any hints are very much appreciated. If more info is needed, let me know.
>
> Is it possible to run tcpdump on both servers and see who is sending RSTs? What about ldap logs? Do you know if you get this problem for all LDAP queries or for a subset? It could be that LDAP queries take too much time to be processed on LDAP due to a missing index, heavy IO, etc. I know ldap can provide quite a lot of information.
>
> Cheers,
> Pavlos
Re: tcp reset errors
On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke liede...@telenet.be wrote:
> After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. These might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389), does that sound familiar?
>
> Franky

(btw, sorry for top-posting before)

Franky
Re: tcp reset errors
On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke liede...@telenet.be wrote:
> After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. These might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389), does that sound familiar?
>
> Franky

Ok, after much trial and error, I pinned it down to the following: we have lots of servers doing ldap lookups for authentication, also when connecting via ssh. On EL5 servers this auth is done via a call to /usr/libexec/openssh/ssh-ldap-wrapper. Apparently this binary causes the resets that show up in the haproxy error logs. I switched to the sssd version for EL5 servers, but that version did not include ssh-keys support, so the resets persisted.

Again the internet to the rescue: version 1.9.6 for el5 can be found at http://copr-be.cloud.fedoraproject.org/results/sgallagh/sssd-1.9-rhel5/epel-5-x86_64 , and that version does support ssh correctly. Installing it, changing the ssh config, et voila: no more resets.

So the bug is in the ssh-ldap-wrapper, but I understand that doing a RST at the end is not bad, just not good either ... The side-effect of the new sssd is that far fewer ldap queries are made (as sudo and ssh use sssd too then), but I'll leave it up to the management to decide whether or not to go for that solution.

Franky
Re: tcp reset errors
Hi Franky,

On Thu, Sep 11, 2014 at 01:08:09PM +0200, Franky Van Liedekerke wrote:
> On Thu, Sep 11, 2014 at 11:40 AM, Franky Van Liedekerke liede...@telenet.be wrote:
> > After doing tcpdump on both servers (no ldap errors anywhere in the ldap logs), I see that the ldap server sends out resets, and so do the clients connecting to haproxy. These might be related to one another. Each client seems to send 2 RST packets at the end of an LDAP TLS session (over port 389), does that sound familiar?
> >
> > Franky
>
> Ok, after much trial and error, I pinned it down to the following: we have lots of servers doing ldap lookups for authentication, also when connecting via ssh. On EL5 servers this auth is done via a call to /usr/libexec/openssh/ssh-ldap-wrapper. Apparently this binary causes the resets that show up in the haproxy error logs. I switched to the sssd version for EL5 servers, but that version did not include ssh-keys support, so the resets persisted.
>
> Again the internet to the rescue: version 1.9.6 for el5 can be found at http://copr-be.cloud.fedoraproject.org/results/sgallagh/sssd-1.9-rhel5/epel-5-x86_64 , and that version does support ssh correctly. Installing it, changing the ssh config, et voila: no more resets.
>
> So the bug is in the ssh-ldap-wrapper, but I understand that doing a RST at the end is not bad, just not good either ... The side-effect of the new sssd is that far fewer ldap queries are made (as sudo and ssh use sssd too then), but I'll leave it up to the management to decide whether or not to go for that solution.

Thanks for sending the details of your diagnostic. As you say, RSTs are not necessarily bad. When a client closes first, it has two options:
  - either send RST
  - or have the source port unusable for 2 minutes.

Most of the time you choose the first option.

In your case, since you were seeing SD flags, it means the reset came from the server; maybe the client was speaking inappropriately on the connection, causing the server to abort it. If so, it proves that the behaviour was properly chosen, because it allowed you to detect the anomaly in the logs and to fix it, which is quite good.

Regards,
Willy
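To make the two close options concrete, here is a minimal Python sketch on a loopback connection. It assumes a Linux-like socket stack, where setting SO_LINGER with a zero timeout makes close() abort the connection with a RST instead of a FIN. The thread does not say that ssh-ldap-wrapper or haproxy use exactly this mechanism; the helper name `close_and_observe` and the "ii" struct layout are assumptions of the example.

```python
import socket
import struct


def close_and_observe(abortive: bool) -> str:
    """Close the client side of a loopback connection and report what the peer sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(2.0)
    try:
        if abortive:
            # l_onoff=1, l_linger=0: close() drops the connection with a RST
            # (struct layout assumed for Linux and similar platforms).
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
        client.close()
        try:
            data = server_side.recv(1)
        except ConnectionResetError:
            return "RST"
        # An empty read means the peer closed in an orderly way (FIN).
        return "FIN" if data == b"" else "data"
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print("abortive close:", close_and_observe(True))
    print("orderly close:", close_and_observe(False))
```

With the orderly close, the side that closes first normally keeps the connection in TIME_WAIT for a while, which is the "source port unusable" cost Willy mentions; the abortive close avoids that at the price of a RST showing up on the other end.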
Re: tcp reset errors
On 10/09/2014 03:31 PM, Franky Van Liedekerke wrote:
> Hi,
> [..snip..]
> Any hints are very much appreciated. If more info is needed, let me know.

Is it possible to run tcpdump on both servers and see who is sending RSTs? What about ldap logs? Do you know if you get this problem for all LDAP queries or for a subset? It could be that LDAP queries take too much time to be processed on LDAP due to a missing index, heavy IO, etc. I know ldap can provide quite a lot of information.

Cheers,
Pavlos
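One rough way to answer the "who is sending RSTs" question from a capture is to count reset packets per sender in tcpdump's text output. The sketch below assumes the default one-line format (`IP src.port > dst.port: Flags [R.], ...`); the function name `count_rst_senders` and the sample addresses (taken from the 192.0.2.0/24 documentation range) are invented for illustration and are not from the thread.

```python
import re
from collections import Counter

# Matches the "IP src > dst: Flags [..]" part of a default tcpdump text line.
LINE_RE = re.compile(r"\bIP6? (\S+) > (\S+): Flags \[([^\]]*)\]")


def count_rst_senders(lines):
    """Return a Counter mapping 'host.port' senders to the number of RSTs they sent."""
    senders = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "R" in match.group(3):
            senders[match.group(1)] += 1
    return senders


if __name__ == "__main__":
    # Hypothetical sample lines, not real capture data.
    sample = [
        "13:08:09.000001 IP 192.0.2.10.389 > 192.0.2.20.51514: Flags [R.], seq 1, ack 2, win 0, length 0",
        "13:08:09.000002 IP 192.0.2.20.51514 > 192.0.2.10.389: Flags [P.], seq 2:9, ack 1, win 229, length 7",
        "13:08:09.000003 IP 192.0.2.20.51514 > 192.0.2.10.389: Flags [R], seq 9, win 0, length 0",
    ]
    for sender, count in count_rst_senders(sample).items():
        print(sender, count)
```

Running the same count on captures from both sides of the proxy would show whether the resets originate from the clients, from the LDAP servers, or from both.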