Re: [strongSwan] Services unreachable after first connection
Hi Tobias, and thanks again for your time,

> Are new TCP connections created or is the same connection used for
> several searches? Are there constantly packets exchanged in these
> tests? If not, for how long is there no traffic?

All the TCP connections are brand new. During my test I used very simple ldapsearch queries to reduce the amount of data involved (and to make the dump more readable). Every successful ldapsearch involves around 12 packets and around 3.6 KB of data, and completes in about 0.1 seconds. The largest part of the data exchanged between the hosts is the search result, which is around 2 KB. There is no continuous flow of data during the test.

This is a good example --> https://sc.burrfoot.it/tcpdump.jpg

I made an ldapsearch request to host 10.128.4.16, which succeeded (packets 1-12). Right after that I repeated the same ldapsearch and it got stuck; after 60 seconds ldapsearch returned the error "ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)" (packets 13-37).

> Interesting. Maybe some 5 minute client-IP block after certain traffic
> patterns? Or perhaps some timeout.

That seems strange to me: if there were a block triggered by certain traffic, I don't understand why everything works perfectly as long as I keep opening TCP connections to port 389 every 5 seconds with my keepalive script. As you can see from the screenshot linked above, even though I wasn't able to complete the ldapsearch, there are still some packets flagged as "TCP Spurious Retransmission" coming from the host on the other side of the VPN.

The more I look at it, the more I feel there is something odd on the other side of the VPN, such as a security appliance that temporarily blocks traffic. It's strange that this doesn't happen with the nmap loops; maybe they don't trigger the same logic because each one is a basic SYN-ACK-RESET connection that carries no significant data.

> Depends on what exactly is going on. It definitely sounds like a
> firewall issue (either affecting the ESP packets or traffic after the
> tunnel). You'd have to debug where exactly packets get stuck (e.g.
> whether ESP packets are sent, if they reach the peer or where they are
> dropped, how far decrypted TCP packets get, if a response is sent, if
> that's encapsulated in ESP again, where those may get dropped and so
> on). Use packet counters or captures to do so.

Thank you for your suggestions. I'll dig deeper; right now I'm trying to find out whether some checks can also be done on our customer's side.

Thanks
Tas

---
"Arguing that you don't care about the right to privacy because you have nothing to hide is no different than saying you don't care about free speech because you have nothing to say."
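For anyone who wants to reproduce the keepalive test described in this message without nmap, here is a minimal Python sketch. It assumes that a plain TCP connect is a fair stand-in for `nmap -sT`; the function names (`probe_tcp`, `probe_loop`, `outage_windows`) are made up for this example and are not part of any tool mentioned in the thread.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Try a full TCP connect, roughly what `nmap -sT` does for one port.

    Returns "open" if the handshake completes, "closed" if the connection
    is refused, "filtered" if nothing answers before the timeout, and
    "error" for any other socket error (for example a resolver failure).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "filtered"
    except OSError:
        return "error"


def probe_loop(hosts, port=389, interval=5.0, rounds=None):
    """Probe every host each `interval` seconds; return (time, host, state).

    `rounds=None` runs until interrupted, a number limits the iterations.
    """
    results = []
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            sample = (time.time(), host, probe_tcp(host, port))
            print("%.0f %s:%d %s" % (sample[0], host, port, sample[2]))
            results.append(sample)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return results


def outage_windows(samples):
    """Collapse time-ordered (seconds, state) samples into outages.

    Returns (start, end, duration) tuples. An outage that is still running
    at the last sample is not reported.
    """
    windows = []
    start = None
    for stamp, state in samples:
        if state != "open" and start is None:
            start = stamp
        elif state == "open" and start is not None:
            windows.append((start, stamp, stamp - start))
            start = None
    return windows
```

Calling `probe_loop(["10.128.4.15", "10.128.4.16"], port=389, interval=5.0)` would mimic the 5-second loop, and feeding the per-host samples to `outage_windows` would show whether the unreachable periods really last a fixed time. Keep in mind that, as observed in this thread, frequent probing seems to change the behaviour, so a longer interval may be needed to see the outage at all.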
Re: [strongSwan] Services unreachable after first connection
Hi Tas,

> If I stop the nmap loop, after a few ldapsearch runs I run into
> problems: the LDAP connection gets stuck and the nmap test reports
> port 389 as filtered.

Are new TCP connections created or is the same connection used for several searches? Are there constantly packets exchanged in these tests? If not, for how long is there no traffic?

> I noticed that port 389 stays unreachable for exactly 300 seconds;
> after that nmap detects it as open again.

Interesting. Maybe some 5 minute client-IP block after certain traffic patterns? Or perhaps some timeout.

> I added some debug parameters to my ipsec.conf file (charondebug="ike 2,
> knl 2, cfg 2") but I didn't notice anything significant when the LDAP
> connection gets stuck or when it opens again after 5 minutes.

OK, so no MOBIKE update or DPD or rekeying.

> Could this be related to some DPD or keepalive feature?

Depends on what exactly is going on. It definitely sounds like a firewall issue (either affecting the ESP packets or traffic after the tunnel). You'd have to debug where exactly packets get stuck (e.g. whether ESP packets are sent, if they reach the peer or where they are dropped, how far decrypted TCP packets get, if a response is sent, if that's encapsulated in ESP again, where those may get dropped and so on). Use packet counters or captures to do so.

Regards,
Tobias
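One way to organise the "where do packets get stuck" check is to sample a packet counter at each point along the path just before and just after a single test connection, then look for the first point whose counter did not move. The sketch below only shows that comparison. The stage names are hypothetical, and where the numbers come from (firewall rule counters, SA statistics, capture packet counts) is left open, since the thread does not say which will be used.

```python
# Hypothetical checkpoints on the local VPN endpoint, in path order.
STAGES = ["tcp_from_app", "esp_sent", "esp_received", "tcp_to_app"]


def first_stalled_stage(before, after, stages=STAGES):
    """Return the first stage whose packet counter did not grow, or None.

    `before` and `after` map a stage name to a packet count sampled just
    before and just after one test connection attempt.
    """
    for stage in stages:
        if after.get(stage, 0) <= before.get(stage, 0):
            return stage
    return None


# Illustrative numbers only, not taken from the thread: requests leave as
# ESP, but no ESP comes back, so the first stalled stage is "esp_received".
example_before = {"tcp_from_app": 10, "esp_sent": 10, "esp_received": 7, "tcp_to_app": 7}
example_after = {"tcp_from_app": 13, "esp_sent": 13, "esp_received": 7, "tcp_to_app": 7}
print(first_stalled_stage(example_before, example_after))
```

A result of `tcp_from_app` would point at the application server or its route, `esp_sent` at the local tunnel policy, and `esp_received` at the peer or something in between; the same idea extends to the remote side if the customer can provide counters.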
Re: [strongSwan] Services unreachable after first connection
Thank you very much Tobias, I have another question.

During some tests I noticed that if I leave a simple script running (basically a loop with "nmap -sT -P0 -p 389 10.128.4.15 10.128.4.16" and a 5-second sleep) to test port 389 on the two destination AD domain controllers, every ldapsearch (or, in general, every action that involves a connection to port 389 on those two domain controllers) works perfectly fine and nmap always reports port 389 as open.

If I stop the nmap loop, after a few ldapsearch runs I run into problems: the LDAP connection gets stuck and the nmap test reports port 389 as filtered.

I noticed that port 389 stays unreachable for exactly 300 seconds; after that nmap detects it as open again.

I added some debug parameters to my ipsec.conf file (charondebug="ike 2, knl 2, cfg 2") but I didn't notice anything significant when the LDAP connection gets stuck or when it opens again after 5 minutes.

Could this be related to some DPD or keepalive feature?

Best regards
Tas

On Fri, Jun 5, 2020 at 10:12 AM Tobias Brunner wrote:

> Hi Tas,
>
> > Do you think this strange behaviour can be caused by our strongSwan
> > configuration?
>
> One thing that comes to mind in regards to TCP over IPsec are MTU/MSS
> issues [1]. But those would only have an effect on larger transmits,
> not on the initial TCP handshake. That is, you should be able to create
> a new TCP connection even after another stalled. If that's not the
> case, some firewall or routing issue could be the culprit (or a problem
> with the IPsec tunnel on the other end).
>
> By the way, you'll never see outbound plaintext traffic (e.g. a TCP SYN)
> in tcpdump [2].
>
> Regards,
> Tobias
>
> [1] https://wiki.strongswan.org/projects/strongswan/wiki/ForwardingAndSplitTunneling#MTUMSS-issues
> [2] https://wiki.strongswan.org/projects/strongswan/wiki/FAQ#Capturing-outbound-plaintext-packets-with-tcpdumpwireshark
Re: [strongSwan] Services unreachable after first connection
Hi Tas,

> Do you think this strange behaviour can be caused by our strongSwan
> configuration?

One thing that comes to mind in regards to TCP over IPsec are MTU/MSS issues [1]. But those would only have an effect on larger transmits, not on the initial TCP handshake. That is, you should be able to create a new TCP connection even after another stalled. If that's not the case, some firewall or routing issue could be the culprit (or a problem with the IPsec tunnel on the other end).

By the way, you'll never see outbound plaintext traffic (e.g. a TCP SYN) in tcpdump [2].

Regards,
Tobias

[1] https://wiki.strongswan.org/projects/strongswan/wiki/ForwardingAndSplitTunneling#MTUMSS-issues
[2] https://wiki.strongswan.org/projects/strongswan/wiki/FAQ#Capturing-outbound-plaintext-packets-with-tcpdumpwireshark
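To illustrate why MTU/MSS problems hit large segments but not the handshake, here is a rough Python calculation of ESP tunnel-mode overhead. It is only a sketch under several assumptions that the thread does not confirm: an IPv4 path MTU of 1500 bytes, AES-CBC (16-byte block and IV), HMAC-SHA1-96 (12-byte ICV), and UDP encapsulation because the local endpoint sits behind NAT. The function names are invented for this example.

```python
def esp_packet_size(inner_len, block=16, iv=16, icv=12, nat_t=True):
    """On-the-wire size of an ESP tunnel-mode packet for one inner packet.

    Defaults model AES-CBC with HMAC-SHA1-96 over an IPv4 outer header,
    with UDP encapsulation (NAT-T) switched on.
    """
    trailer = 2                                   # pad length + next header
    # Round payload plus trailer up to the next cipher block boundary.
    padded = -(-(inner_len + trailer) // block) * block
    outer_ip = 20
    udp = 8 if nat_t else 0
    esp_header = 8                                # SPI + sequence number
    return outer_ip + udp + esp_header + iv + padded + icv


def max_inner_packet(path_mtu=1500, **kw):
    """Largest inner IP packet whose ESP encapsulation fits in path_mtu."""
    size = path_mtu
    while size > 0 and esp_packet_size(size, **kw) > path_mtu:
        size -= 1
    return size


def max_tcp_mss(path_mtu=1500, **kw):
    """MSS for IPv4 TCP without options: inner packet minus 40 bytes."""
    return max_inner_packet(path_mtu, **kw) - 40


print(esp_packet_size(60))      # a 60-byte TCP SYN stays small
print(esp_packet_size(1500))    # a full-size inner packet exceeds 1500
print(max_tcp_mss(1500))        # largest MSS that avoids fragmentation
```

Under these assumptions a SYN grows to 128 bytes and passes easily, while a full 1500-byte inner packet becomes 1568 bytes and no longer fits, which matches the point that only larger transmits would be affected.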
[strongSwan] Services unreachable after first connection
Hi everyone,

I just joined the ML. First of all, thank you for your patience and help; I don't have much experience with VPNs in general, and this is the first time I have used strongSwan.

Recently I set up a test environment for a project whose objective is to implement Kerberos SSO between one of our application servers (10.1.0.137, an AWS EC2 instance that runs some J2EE applications) and one of our customers' Active Directory domains (10.128.4.15 and 10.128.4.16 are the two domain controllers). After that, the application has to look up some user attributes using AD as an LDAP directory.

To achieve this I set up a site-to-site IPsec VPN between our systems and our customer's datacenter. On our side I used another EC2 instance as the VPN endpoint (10.1.0.144, which is behind AWS NAT with the public IP 74.74.74.74), running CentOS 7 and strongSwan 5.7.2. On our customer's side I have no control or visibility; the only thing I know is that the VPN endpoint should be a Fortinet appliance with a public IP (217.217.217.217).

You can see the whole architecture in this image: https://sc.burrfoot.it/vpn.png

The VPN setup went pretty smoothly:

- tunnel established (https://sc.burrfoot.it/strongswan.png)
- I made my application server use our VPN endpoint as the gateway for the two domain controllers, with a static route
- I adjusted the EC2 security groups to allow Kerberos and LDAP communication (TCP and UDP 88 for Kerberos, TCP 389 for LDAP); on the other side our customer's sysadmin did the same on his firewall
- no masquerade rules on our VPN endpoint, because our customer allowed requests from our application server's internal IP

Everything seemed OK, and a quick test using nmap from the application server (10.1.0.137) worked well (https://sc.burrfoot.it/nmap.png). But after some tests (a few basic ldapsearch queries) I noticed that LDAP did not respond anymore, so I tried the second domain controller and it worked... and after that the second domain controller stopped responding as well. At this point I ran another nmap test, which reported the traffic as filtered (https://sc.burrfoot.it/nmap2.png).

After a couple of minutes I ran some more tests: LDAP seemed reachable again and the queries went through, but after a while TCP 389 became unreachable again.

To investigate this strange behaviour I set up a basic TCP check with Nagios, which was OK most of the time, except when we ran some LDAP queries; at that point port 389 seemed to close for a while and became available again after a few minutes.

At first I thought the problem was related to some strange firewall behaviour on our customer's side, because we don't have any security appliance or tool on our side (only a basic EC2 security group). But before asking our customer to run some checks I want to be sure that our strongSwan configuration is OK and cannot be the cause of this problem.

I also tried to capture some traffic on our strongSwan endpoint (10.1.0.144), filtering for TCP port 389 and the ESP protocol. When I ran an nmap test on port 389 while it was open, this was the result --> https://sc.burrfoot.it/tcpdump1.png

When the port appeared closed I saw not a single packet passing through, not even a SYN packet from our application server. Checking the strongSwan tunnel status, I never saw a single disconnection; everything seems very stable from a VPN point of view.

This is my strongSwan configuration. I know that some of the algorithms are not the best from a security point of view, but we had to follow our customer's specifications.

---
conn aws-customer
    ikelifetime=1440m
    keylife=60m
    rekeymargin=3m
    keyingtries=3
    keyexchange=ikev1
    aggressive=no
    mobike=no
    ike=aes128-sha1-modp1536
    ike=aes256-sha256-modp1536
    esp=aes128-sha1-modp1536
    esp=aes256-sha256-modp1536
    left=10.1.0.144
    leftid=74.74.74.74
    leftsubnet=10.1.0.137/32
    leftauth=psk
    right=217.217.217.217
    rightid=217.217.217.217
    rightsubnet=10.128.4.0/26
    rightauth=psk
    type=tunnel
    auto=start
    dpdaction=restart
---

Do you think this strange behaviour can be caused by our strongSwan configuration? Can you suggest some more in-depth tests to figure out why we have these strange interruptions? Do you have any other suggestions?

Thank you very much for any information.

Tas
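A side note on the posted conn section: it assigns `ike` and `esp` twice each. The ipsec.conf documentation describes both options as comma-separated lists of proposals, and how a repeated keyword is treated is not discussed in this thread, so it may be worth confirming that both proposals are actually loaded. The small Python sketch below only flags repeated keywords in a conn section; `repeated_keywords` is a name invented for this example, and the parsing is deliberately simplistic (one `key=value` per line, `#` comments).

```python
from collections import defaultdict


def repeated_keywords(conn_text):
    """Return {keyword: [values, ...]} for keywords assigned more than once."""
    seen = defaultdict(list)
    for raw in conn_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if "=" not in line:
            continue                          # skips the "conn <name>" header
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Excerpt of the configuration posted above.
POSTED = """
conn aws-customer
    keyexchange=ikev1
    mobike=no
    ike=aes128-sha1-modp1536
    ike=aes256-sha256-modp1536
    esp=aes128-sha1-modp1536
    esp=aes256-sha256-modp1536
"""

for key, values in repeated_keywords(POSTED).items():
    print(key, "is set", len(values), "times; as one list:", ",".join(values))
```

If both proposals are meant to be offered, writing each option once with the values joined by a comma is the form the documentation describes.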