Re: [strongSwan] Services unreachable after first connection

2020-06-10 Thread Tasslehoff Burrfoot
Hi Tobias and thanks again for your time,

> Are new TCP connections created or is the same connection used for
> several searches?  Are there constantly packets exchanged in these
> tests?  If not, for how long is there no traffic?

All the TCP connections are brand new. During my tests I used very
simple LDAP searches to reduce the amount of data involved (and to make
the dump more readable).
Every successful ldapsearch involves around 12 packets and around
3.6KB of data, transferred in around 0.1 seconds. Most of that is the
search result, which is around 2KB. There is no continuous flow of
data during the test.

This is a cool example --> https://sc.burrfoot.it/tcpdump.jpg
I made an ldapsearch request to host 10.128.4.16, which succeeded
(packets 1-12). I then immediately repeated the same ldapsearch and it
got stuck; after 60 seconds ldapsearch returned the error
"ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)" (packets
13-37).

> Interesting.  Maybe some 5 minute client-IP block after certain traffic
> patterns?  Or perhaps some timeout.

That seems strange: if there were a block triggered by certain
traffic, I don't understand why everything works perfectly as long as
I keep making TCP connections to port 389 every 5 seconds with my
keepalive script.
As you can see in the screenshot linked above, even when I wasn't
able to complete the ldapsearch there was still some odd traffic
flagged as "TCP Spurious Retransmission" coming from the host on the
other side of the VPN.

The more I look at it, the more I feel there's something strange on
the other side of the VPN, something like a security appliance that
temporarily blocks traffic. It's odd that this doesn't happen with
the nmap loops; maybe they don't trigger the same logic because each
probe is a basic SYN-ACK-RESET connection that carries no significant
data.

> Depends on what exactly is going on.  It definitely sounds like a
> firewall issue (either affecting the ESP packets or traffic after the
> tunnel).  You'd have to debug where exactly packets get stuck (e.g.
> whether ESP packets are sent, if they reach the peer or where they are
> dropped, how far decrypted TCP packets get, if a response is sent, if
> that's encapsulated in ESP again, where those may get dropped and so
> on).  Use packet counters or captures to do so.

Thank you for your suggestions, I'll dig deeper, now I'm trying to
understand if it's possible to do some checks also on our customer's
side.

Thanks

Tas

---
"Arguing that you don't care about the right to privacy because you
have nothing to hide is no different than saying you don't care about
free speech because you have nothing to say."


Re: [strongSwan] Services unreachable after first connection

2020-06-10 Thread Tobias Brunner
Hi Tas,

> If I stop the nmap loop, after a few ldapsearch runs I get problems: the
> connection to LDAP gets stuck and the nmap test reports port 389 as
> filtered.

Are new TCP connections created or is the same connection used for
several searches?  Are there constantly packets exchanged in these
tests?  If not, for how long is there no traffic?

> I noticed that port 389 stays unreachable for exactly 300 seconds; after
> that nmap detects it as open again.

Interesting.  Maybe some 5 minute client-IP block after certain traffic
patterns?  Or perhaps some timeout.

> I added some debug parameters to my ipsec.conf file (charondebug="ike 2,
> knl 2, cfg 2") but I didn't notice anything significant when the LDAP
> connection gets stuck or opens again after 5 minutes.

OK, so no MOBIKE update or DPD or rekeying.

> Could this be related to some DPD or keepalive feature?

Depends on what exactly is going on.  It definitely sounds like a
firewall issue (either affecting the ESP packets or traffic after the
tunnel).  You'd have to debug where exactly packets get stuck (e.g.
whether ESP packets are sent, if they reach the peer or where they are
dropped, how far decrypted TCP packets get, if a response is sent, if
that's encapsulated in ESP again, where those may get dropped and so
on).  Use packet counters or captures to do so.
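
One way to apply the "packet counters" suggestion on the Linux endpoint is
to compare the per-SA counters kept by the kernel before and after a
failing ldapsearch. The sketch below is an illustration under stated
assumptions, not something taken from this thread: the function names are
hypothetical, it relies on `ip -s xfrm state` (iproute2, needs root)
printing each SA with a leading `src ... dst ...` line and its usage as
`N(bytes), M(packets)` under `lifetime current:`, and it keeps only one SA
per address pair.

```python
import re
import subprocess

# Assumed output layout of `ip -s xfrm state`; adjust if your iproute2 differs.
SA_HEADER = re.compile(r"^src (\S+) dst (\S+)")
COUNTERS = re.compile(r"(\d+)\(bytes\), (\d+)\(packets\)")


def parse_sa_counters(text):
    """Return {(src, dst): (bytes, packets)} parsed from `ip -s xfrm state`."""
    counters = {}
    current = None          # (src, dst) of the SA being parsed
    in_current = False      # True right after a "lifetime current:" line
    for line in text.splitlines():
        header = SA_HEADER.match(line)
        if header:
            current = (header.group(1), header.group(2))
            in_current = False
        elif line.strip() == "lifetime current:":
            in_current = True
        elif in_current and current:
            match = COUNTERS.search(line)
            if match:
                counters[current] = (int(match.group(1)), int(match.group(2)))
            in_current = False
    return counters


def read_sa_counters():
    """Run `ip -s xfrm state` and parse its output (requires root)."""
    result = subprocess.run(
        ["ip", "-s", "xfrm", "state"],
        capture_output=True, text=True, check=True,
    )
    return parse_sa_counters(result.stdout)


def packet_deltas(before, after):
    """Per-SA increase in packet count between two snapshots."""
    return {sa: after[sa][1] - before[sa][1] for sa in after if sa in before}


# Hypothetical usage on the VPN endpoint:
#   before = read_sa_counters()
#   ... run the ldapsearch that stalls ...
#   print(packet_deltas(before, read_sa_counters()))
```

If the outbound SA's packet count grows during a stalled ldapsearch while
the inbound one does not, the requests are leaving encrypted and the
replies are not arriving, which would suggest looking beyond the local
endpoint.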

Regards,
Tobias


Re: [strongSwan] Services unreachable after first connection

2020-06-09 Thread Tasslehoff Burrfoot
Thank you very much Tobias, I have another question.
During some tests I noticed that if I leave a simple script running
(basically a loop with "nmap -sT -P0 -p 389 10.128.4.15 10.128.4.16" and a
5 second sleep) to test port 389 on the two destination AD domain
controllers, every ldapsearch (or in general every action that involves a
connection to port 389 on those two domain controllers) works perfectly
fine and nmap always reports port 389 as open.
If I stop the nmap loop, after a few ldapsearch runs I get problems: the
connection to LDAP gets stuck and the nmap test reports port 389 as
filtered.

I noticed that port 389 stays unreachable for exactly 300 seconds; after
that nmap detects it as open again.
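
The observation above can be reproduced and timed with a small script. The
following is only a sketch, not the script used in this thread: `probe`,
`outages` and `watch` are hypothetical names, and the classification (open
when the connect succeeds, closed when the connection is refused, filtered
when nothing answers before the timeout) only approximates what an nmap
connect scan (`-sT`) reports.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Classify a TCP port roughly the way a connect scan does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"      # the peer actively refused the connection
    except OSError:
        return "filtered"    # timeout or unreachable: no usable answer


def outages(samples):
    """Given (timestamp, state) samples in time order, return a list of
    (start, duration) for every run of non-"open" states that is ended
    by an "open" sample."""
    result = []
    start = None
    for timestamp, state in samples:
        if state != "open" and start is None:
            start = timestamp
        elif state == "open" and start is not None:
            result.append((start, timestamp - start))
            start = None
    return result


def watch(host, port, interval=5.0, rounds=120):
    """Probe the port `rounds` times and report the outages observed."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return outages(samples)
```

For example, `watch("10.128.4.16", 389)` would probe for roughly ten
minutes and return the outages it saw. Since continuous probing appears to
keep the port reachable in these tests, it would have to be started only
after an ldapsearch has already stalled.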

I added some debug parameters to my ipsec.conf file (charondebug="ike 2,
knl 2, cfg 2") but I didn't notice anything significant when the LDAP
connection gets stuck or opens again after 5 minutes.

Could this be related to some DPD or keepalive feature?

Best regards

Tas






Re: [strongSwan] Services unreachable after first connection

2020-06-05 Thread Tobias Brunner
Hi Tas,

> Do you think this strange behaviour could be caused by our strongSwan
> configuration?

One thing that comes to mind in regards to TCP over IPsec are MTU/MSS
issues [1].  But those would only have an effect on larger transmits,
not on the initial TCP handshake.  That is, you should be able to create
a new TCP connection even after another stalled.  If that's not the
case, some firewall or routing issue could be the culprit (or a problem
with the IPsec tunnel on the other end).

By the way, you'll never see outbound plaintext traffic (e.g. a TCP SYN)
in tcpdump [2].

Regards,
Tobias

[1]
https://wiki.strongswan.org/projects/strongswan/wiki/ForwardingAndSplitTunneling#MTUMSS-issues
[2]
https://wiki.strongswan.org/projects/strongswan/wiki/FAQ#Capturing-outbound-plaintext-packets-with-tcpdumpwireshark


[strongSwan] Services unreachable after first connection

2020-06-03 Thread Tasslehoff Burrfoot
Hi everyone, I just joined the ML. First of all, thank you for your
patience and help; I don't have much experience with VPNs in general and
this is the first time I've used strongSwan.

Recently I set up a test environment for a project whose objective was to
implement Kerberos SSO between one of our application servers (10.1.0.137,
an AWS EC2 instance which runs some J2EE applications) and the Active
Directory domain of one of our customers (10.128.4.15 and 10.128.4.16 are
the two domain controllers). After that, the application has to look up
some user attributes using AD as an LDAP directory.
To achieve this I set up a site-to-site IPsec VPN between our systems and
our customer's datacenter. On our side I used another EC2 instance as VPN
endpoint (10.1.0.144, which is behind AWS NAT with public IP 74.74.74.74)
running CentOS 7 and strongSwan 5.7.2. On our customer's side I have no
control or visibility; the only thing I know is that the VPN endpoint
should be a Fortinet appliance with a public IP (217.217.217.217).

You can see the whole architecture in this PNG:
https://sc.burrfoot.it/vpn.png

The VPN setup went pretty smoothly:
- tunnel established (https://sc.burrfoot.it/strongswan.png)
- I made my application server use our VPN endpoint as gateway for the
two domain controllers with a static route
- adjusted EC2 security groups to allow Kerberos and LDAP communication
(TCP and UDP 88 for Kerberos, TCP 389 for LDAP); on the other side our
customer's sysadmin did the same on his firewall
- no masquerade rules on our VPN endpoint, because our customer allowed
requests from our application server's internal IP

Everything seemed OK and a quick test using nmap from the application
server (10.1.0.137) worked well (https://sc.burrfoot.it/nmap.png),
but after some tests (some basic ldapsearch queries) I noticed that LDAP
did not respond anymore, so I tried the second domain controller and it
worked... after that the second domain controller stopped responding as
well.
At this point I ran another nmap test, which showed the port as filtered
(https://sc.burrfoot.it/nmap2.png).
After a couple of minutes I ran some more tests: LDAP was reachable again
and queries went through, but after a while TCP 389 became unreachable
again.

To shed light on this strange behaviour I set up a basic TCP check with
Nagios, which was OK most of the time, except when we ran some LDAP
queries; at that point port 389 seemed to close for a while and became
available again after a few minutes.
At first I thought the problem was related to some firewall behaviour on
our customer's side, because we don't have any security appliance or tool
on our side (only a basic EC2 security group), but before asking our
customer to do some checks I want to be sure that our strongSwan
configuration is OK and cannot be the cause of this problem.

I also tried to capture some traffic on our strongSwan endpoint
(10.1.0.144), filtering on TCP port 389 and the ESP protocol. When I ran
an nmap test on port 389 while it was open, this was the result
--> https://sc.burrfoot.it/tcpdump1.png
When the port appeared closed I did not see a single packet passing
through, not even a SYN packet from our application server.
Checking the strongSwan tunnel status, I never saw a single disconnection;
everything seems very stable from a VPN point of view.

This is my strongswan configuration, I know that some protocols are not the
best from a security point of view, but we had to follow our customer's
specifications.
---
conn aws-customer
ikelifetime=1440m
keylife=60m
rekeymargin=3m
keyingtries=3
keyexchange=ikev1
aggressive=no
mobike=no
ike=aes128-sha1-modp1536
ike=aes256-sha256-modp1536
esp=aes128-sha1-modp1536
esp=aes256-sha256-modp1536
left=10.1.0.144
leftid=74.74.74.74
leftsubnet=10.1.0.137/32
leftauth=psk
right=217.217.217.217
rightid=217.217.217.217
rightsubnet=10.128.4.0/26
rightauth=psk
type=tunnel
auto=start
dpdaction=restart
---

Do you think this strange behaviour could be caused by our strongSwan
configuration?
Can you suggest some more in-depth tests to figure out why we have these
strange interruptions?
Do you have any other suggestions?

Thank you very much for any information.

Tas
