Re: PF NAT and Oracle/Linux mystery
On Sat, Jan 18, 2003 at 01:57:17PM +, Steve Schmitz wrote:

> If you consider gigabit/copper a fast network and can suggest
> experiments/measurements, I'll be happy to conduct them.

TCP window scaling support has been committed to -current (pf.c 1.306). If you have a spare box to install -current on, you could give it a try. pfctl -vss now additionally prints 'wscale n' for TCP connections using window scaling.

Daniel
Re: PF NAT and Oracle/Linux mystery
> >We could add a "strip-wscale" option to scrub. It doesn't solve
> >the state pickup issue, but could prevent clients communicating
> >through the firewall from negotiating this option.
> Does the Linux NAT code already do this?

Linux's stock state code doesn't track sequence numbers.

.mike
Re: PF NAT and Oracle/Linux mystery
> Maybe someone with experience in fast networks can comment on how
> average packet loss and latency affect the maximum TCP window you
> might want to use. With an MTU of 1500 or lower, over the Internet, do
> your windows actually go beyond 65535 bytes, even if you enable
> scaling?

We have a (partly) gigabit/copper internal network at the institute, and the MTU can be set to 9000. The nodes which have to access the Oracle DB only have 100 Mbit links, and deactivating wscale on them had no measurable effect on the network throughput between nodes and internal file servers (which have Gbit links). One reason for this, I suppose, is that this is about _TCP_ window size, whereas NFS (the transfer method we mainly use) is _UDP_ only under Linux and thus unaffected.

If you consider gigabit/copper a fast network and can suggest experiments/measurements, I'll be happy to conduct them.

Cheers, Steve
Re: PF NAT and Oracle/Linux mystery
Another problem with very large window sizes is that it becomes easier for an attacker to guess valid sequence numbers. With window scaling, the maximum shift is 14 (as per RFC 1323), so the window might become as large as 65535*2^14 bytes, about 1 GB. Sequence numbers are 32-bit unsigned integers, so if both peers actually advertised 1 GB windows (1/4 of the sequence number space), an attacker would have a 1 in 16 chance of passing the sequence number tracker by using random values for seq and ack. He could send RSTs until he gets the firewall to close the state entry, tearing down your connection (assuming he already knows the addresses and ports used).

Also, if one peer advertises a 1 GB window, the other is invited to send 1 GB of data before waiting for an ack. Assuming an MTU of 1500 bytes, that's more than 700,000 packets. If just one of them gets lost, the entire bunch would have to be retransmitted (unless you also support SACK). So, on the networks I know, such large windows would never be reached, and a wscale of 14 would be less than helpful. :)

Maybe someone with experience in fast networks can comment on how average packet loss and latency affect the maximum TCP window you might want to use. With an MTU of 1500 or lower, over the Internet, do your windows actually go beyond 65535 bytes, even if you enable scaling?

Daniel
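The figures in the message above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, the MTU is taken as the full 1500 bytes with headers ignored, and the "blind hit" chance assumes that seq and ack each have to land independently in a window covering the same fraction of the sequence space.

```python
# Back-of-the-envelope check of the worst-case window scaling figures.
MAX_WINDOW_FIELD = 0xFFFF   # largest value of the 16-bit TCP window field
MAX_SHIFT = 14              # largest shift count allowed by RFC 1323
SEQ_SPACE = 2 ** 32         # sequence numbers are 32-bit unsigned integers
MTU = 1500                  # assumed MTU in bytes, headers ignored

# Largest window a peer can advertise with the maximum shift.
max_window = MAX_WINDOW_FIELD << MAX_SHIFT

# Fraction of the sequence space covered by one such window.
window_fraction = max_window / SEQ_SPACE

# Both seq and ack must fall into a valid window, so the fractions multiply.
blind_hit_chance = window_fraction ** 2

# Number of full-MTU packets that fit into one maximum window.
packets_per_window = max_window // MTU

print(max_window)                    # bytes, just under 1 GB
print(round(1 / blind_hit_chance))   # "1 in N" chance for random seq/ack
print(packets_per_window)
```

With these assumptions the window comes out just under 2^30 bytes, the chance is about 1 in 16, and one window holds a little over 700,000 full-size packets.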
Re: PF NAT and Oracle/Linux mystery
On Sat, Jan 18, 2003 at 08:42:04AM +, Steve Schmitz wrote:

> Does the Linux NAT code already do this?

Possibly, but I'll have to check the source code to verify. It could either strip the option or set any scale factors inside the option to zero. But doing that is not much simpler than actually supporting non-zero factors. All these approaches have the limitation that they only work if the code sees the TCP handshake of the connections.

> So I conclude that either the OpenBSD firewall code has no trouble with
> wscale but the NAT code has, or the Linux NAT clears out the wscale TCP
> options from the initial SYN packet - i.e. does exactly what you propose.

It's the OpenBSD TCP sequence number tracking code that stalls such connections, and that is used whenever you filter a TCP connection statefully (when using 'keep state'). pf always creates a state entry when any translation (like nat, rdr or binat) is applied to a connection. If you were filtering statelessly with pf and doing nat on the Linux box, that might explain why the connection didn't stall.

In the tcpdumped session you quoted, the client was using 'wscale 0' and the server 'wscale 9'. That means the client's window values didn't get shifted/multiplied at all, and the server's were shifted 9 bits (multiplied by a factor of 2^9=512). The server started sending window values of 12 (meaning 12*512=6144) and increased them to 52 (meaning 52*512=26624). As long as the client sent smaller segments, pf let them through. But the first larger packet gets dropped, and the client retransmits it until the connection times out.

So you might not always see a stall, depending on the kind of traffic the client sends. If it's all small packets (like an interactive SQL session, where the client sends only small commands), it could work. Also, the server might have used a lower scaling factor on other connections.
wscale 9 is quite large: it means the server wants to be able to advertise a maximum window of 65535*512 bytes, about 32 MB. Such a large window would mean the client is invited to send up to 32 MB of data before getting an acknowledgment. I don't know how Linux calculates the scaling factors, but I guess it might depend on the memory available for such buffers at run-time. It might have chosen a lower scaling factor during the second test. But that's just a guess :)

It's also interesting that your client chose wscale 0, indicating that it doesn't itself want to scale its own windows (because it has no large buffers?) but wants to support the peer doing so.

If you worry about performance impacts due to disabled window scaling, it might depend on the nature of your traffic. If only the server uses large windows (using scaling factors), only bulk traffic client -> server would benefit. If your client only sends small queries but gets large results back, using a factor only for the server's windows wouldn't improve performance.

Daniel
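The window arithmetic from this message can be reproduced with a small helper. This is a minimal sketch: `effective_window` is a hypothetical name chosen for illustration, not a function from pf or the Linux kernel, and it simply shifts the raw 16-bit th_win value left by the sender's scale factor.

```python
def effective_window(th_win: int, wscale: int) -> int:
    """Return the window in bytes that a raw 16-bit th_win value stands for."""
    if not 0 <= th_win <= 0xFFFF:
        raise ValueError("th_win must fit into 16 bits")
    if not 0 <= wscale <= 14:
        raise ValueError("RFC 1323 limits the shift count to 14")
    return th_win << wscale


SERVER_WSCALE = 9   # scale factor the server announced in its SYN+ACK
CLIENT_WSCALE = 0   # scale factor the client announced in its SYN

print(effective_window(12, SERVER_WSCALE))      # first window values seen
print(effective_window(52, SERVER_WSCALE))      # later window values
print(effective_window(0xFFFF, SERVER_WSCALE))  # largest possible window
print(effective_window(28480, CLIENT_WSCALE))   # client values are unscaled
```

A tracker that ignores the factor takes the server's 12 or 52 literally, which explains why only segments of a few dozen bytes get through.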
Re: PF NAT and Oracle/Linux mystery
> We could add a "strip-wscale" option to scrub. It doesn't solve
> the state pickup issue, but could prevent clients communicating
> through the firewall from negotiating this option.

Does the Linux NAT code already do this?

We temporarily split up our combined firewall/NAT machine into two: a firewall (the original combined machine with NAT commented out) and an extra NAT machine. When the NAT machine ran OpenBSD, contact with the wscale-ing Linux/Oracle server failed. When we installed Linux on the NAT machine, it worked, although in both cases the OpenBSD firewall was still between the NAT machine and the Oracle server.

So I conclude that either the OpenBSD firewall code has no trouble with wscale but the NAT code has, or the Linux NAT clears out the wscale TCP options from the initial SYN packet - i.e. does exactly what you propose.

I have not tried to flush the Linux NAT state (and thus, wscale size) to see if it crashes the connection. I only understood these issues after Daniel's explanation.

Cheers, Steve
Re: PF NAT and Oracle/Linux mystery
> If the client supports the extension, it will add a TCP option to its
> initial SYN packet, indicating its support (and supplying its own scale
> factor). If the peer also supports the extension, it will add its own
> TCP option to the SYN+ACK, supplying its scale factor (the two factors
> can be different). If only one of the peers understands the extension,
> the ignorant one will not add the TCP option, and the proposing one must
> not scale its window values.

We could add a "strip-wscale" option to scrub. It doesn't solve the state pickup issue, but could prevent clients communicating through the firewall from negotiating this option.

-kj
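To make the "strip-wscale" idea concrete, here is a rough sketch of what stripping the option from a SYN could look like. It is purely illustrative and not how pf or any other firewall implements it: `strip_wscale` is a hypothetical helper that works on the raw TCP options bytes and overwrites a window scale option (kind 3) with NOPs (kind 1), so the header length stays the same. A real implementation would also have to fix up the TCP checksum.

```python
TCPOPT_EOL = 0      # end of option list, single byte
TCPOPT_NOP = 1      # no-operation padding, single byte
TCPOPT_WSCALE = 3   # window scale option: kind, length 3, shift count


def strip_wscale(options: bytes) -> bytes:
    """Return a copy of the TCP options with any wscale option NOPed out."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break                       # nothing meaningful follows
        if kind == TCPOPT_NOP:
            i += 1                      # single-byte option
            continue
        if i + 1 >= len(out):
            break                       # truncated option, leave untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                       # malformed length, leave untouched
        if kind == TCPOPT_WSCALE:
            # Same-size replacement keeps header length and alignment.
            out[i:i + length] = bytes([TCPOPT_NOP]) * length
        i += length
    return bytes(out)


# Example: MSS 1460, NOP, wscale 9
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 9])
print(strip_wscale(syn_options).hex())
```

Because the option disappears from the SYN, the peer must not scale its windows either, which is the effect the proposal is after.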
Re: PF NAT and Oracle/Linux mystery
Daniel Hartmeier wrote/schrieb/scripsit:

> I guess we could add support for the case where pf does see the
> handshake, but this is the first time I see this problem reported, maybe
> RFC 1323 adoption isn't that broad.

Let's not do the ECN mistake again.

-Stefan
Resolved: PF NAT and Oracle/Linux mystery
Hi Daniel, hi Mike, and the others.

Thank you very very much for your help! Now I know what caused the problem (TCP Window Scaling) and how to fix it ("echo 0 > /proc/sys/net/ipv4/tcp_window_scaling" on the clients), all without requiring access to the Oracle server machine, and without measurable performance loss for the client in the private network!

In one word, perfect. What a way to end a week. Thanks again!

Cheers, Steve
Re: PF NAT and Oracle/Linux mystery
On Fri, Jan 17, 2003 at 02:01:39PM +, Steve Schmitz wrote:

> Any idea why they do this?

The TCP header only has space for a 16-bit unsigned number to hold the window value, so windows are traditionally limited to 65535 bytes, which can limit performance on fast networks.

RFC 1323 (http://www.faqs.org/rfcs/rfc1323.html) defines the Window Scale Option as an extension to TCP (RFC 793). If the client supports the extension, it will add a TCP option to its initial SYN packet, indicating its support (and supplying its own scale factor). If the peer also supports the extension, it will add its own TCP option to the SYN+ACK, supplying its scale factor (the two factors can be different). If only one of the peers understands the extension, the ignorant one will not add the TCP option, and the proposing one must not scale its window values.

So, you don't necessarily have to modify the external server; it would be sufficient to make your client not add the TCP option. Because it adds the option ('wscale 0' in your tcpdump), the server is free to use 'wscale 9'. If your client doesn't add the option, the server won't try to scale its windows, either.

The problem with adding support for this extension to pf is that the needed information is communicated only in the initial SYN and SYN+ACK packets of a connection. If pf sees those, it could note the two factors in the state entry and multiply each subsequent window value accordingly, without much difficulty. But if pf creates a state entry from packets after the TCP handshake (like when you flush your state entries, and don't limit state creation to 'flags S/SA', so pf 'picks up' existing connections), there's no (simple) way to deduce the factors from subsequent packets, so such state entries would still cause stalled connections.

I guess we could add support for the case where pf does see the handshake, but this is the first time I see this problem reported, maybe RFC 1323 adoption isn't that broad.

Daniel
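The negotiation rule described in this message can be sketched as a small function. This is an illustration only: `negotiate_wscale` is a hypothetical name, `None` stands for "option not present in that packet", and the clamp to 14 follows RFC 1323's instruction to use 14 when a larger shift count is received.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # RFC 1323: shift counts above 14 are treated as 14


def negotiate_wscale(syn_wscale: Optional[int],
                     synack_wscale: Optional[int]) -> Tuple[int, int]:
    """Return (client_shift, server_shift) in effect after the handshake.

    Scaling is only used if BOTH the SYN and the SYN+ACK carry the option.
    Each side's factor applies to the windows that side advertises.
    """
    if syn_wscale is None or synack_wscale is None:
        return (0, 0)  # one side did not offer it, so nobody may scale
    return (min(syn_wscale, MAX_SHIFT), min(synack_wscale, MAX_SHIFT))


# The stalled session: client offers 'wscale 0', server answers 'wscale 9'.
print(negotiate_wscale(0, 9))
# Client with tcp_window_scaling turned off: no option in the SYN.
print(negotiate_wscale(None, None))
# Server that does not understand the option.
print(negotiate_wscale(0, None))
```

The second case is why turning the option off on the client alone is enough: without it in the SYN, the server has nothing to answer.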
Re: PF NAT and Oracle/Linux mystery
> You mentioned the behavior depends on the OS (and application) of the
> server. When Oracle runs on Solaris, it works. And when you connect to
> the Linux Oracle to another service (ssh, etc.), it works, too?

I am not allowed to log into the Linux/Oracle server. I tried with netcat on a sister machine of the L/O server and this worked okay.

> Could you run a tcpdump -nvvvSpi to catch all packets of a new
> connection up to the point where it stalls? You can use a filter
> expression (like 'host 192.168.101.14') to only capture packets of a
> single connection, as the stall occurs after around 130 packets, the
> log shouldn't get too large.

Find the log attached. The client this time was 192.168.101.9.

Cheers, Steve

[Attachment: oracle-hang.log]
Re: PF NAT and Oracle/Linux mystery
> You mentioned the behavior depends on the OS (and application) of the
> server. When Oracle runs on Solaris, it works. And when you connect to
> the Linux Oracle to another service (ssh, etc.), it works, too? If
> that's the case, I wonder whether the Oracle on Linux is configured to
> use any TCP options that might affect window sizes (th_win).

In the tcpdump output, look for "wscale " on the first packet. Our state code doesn't handle window scaling, which I can see Oracle enabling. Run 'echo 0 > /proc/sys/net/ipv4/tcp_window_scaling' on the Linux box to turn it off.

> Mike, have you ever seen such a case before?

Nope.

.mike
Re: PF NAT and Oracle/Linux mystery
On Fri, Jan 17, 2003 at 07:51:29AM +, Steve Schmitz wrote:

> The firewall is running not quite the newest version of OpenBSD/PF (a 3.2
> beta). Is it advisable to upgrade, given the interruption in service?

I doubt it will make a difference, as that part of the code (the sequence number tracking) hasn't changed since then, so no.

> Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863
> 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777
> win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0]
> 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
> Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1

This error means that the client (192.168.101.14) tries to send a packet to the server (141.225.240.34) with a sequence number (3987556722) and length (121) that reach beyond the window the server expects (3987556722 to 3987556777). When the server acks a packet, the expected window is extended to the acked sequence number plus the advertised window. In this case, the expected window is extremely small (just 55 bytes), so basically the next packet is certain to fail the check.

The client isn't sending prematurely here, it just sends the next packet after it got an ack for the previous part (seq=3987556722, src.seqlo=3987556722). The question is why the window is too small. Possibly, the last ack from the server had a very small th_win.

You mentioned the behavior depends on the OS (and application) of the server. When Oracle runs on Solaris, it works. And when you connect to the Linux Oracle to another service (ssh, etc.), it works, too? If that's the case, I wonder whether the Oracle on Linux is configured to use any TCP options that might affect window sizes (th_win).

Could you run a tcpdump -nvvvSpi to catch all packets of a new connection up to the point where it stalls? You can use a filter expression (like 'host 192.168.101.14') to only capture packets of a single connection. As the stall occurs after around 130 packets, the log shouldn't get too large.

Mike, have you ever seen such a case before?

Daniel
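The failing check can be replayed with the numbers from the log entry. What follows is a deliberately simplified sketch, not pf's actual state code (which checks more than this, including the ack side): it only tests whether the end of the segment, seq + len, stays at or below the 'high' value recorded for the sender, using 32-bit wrap-around arithmetic. The function names are made up for illustration.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap around at 2^32


def seq_geq(a: int, b: int) -> bool:
    """True if a is at or after b in 32-bit sequence space."""
    return ((a - b) % SEQ_MOD) < (1 << 31)


def fits_window(seq: int, length: int, high: int) -> bool:
    """True if a segment [seq, seq+length) ends at or before 'high'."""
    end = (seq + length) % SEQ_MOD
    return seq_geq(high, end)


# Values taken from the 'BAD state' log entry for the client side.
LO = 3987556722
HIGH = 3987556777
SEQ = 3987556722
LEN = 121

print(HIGH - LO)                    # size of the window pf expects
print(fits_window(SEQ, LEN, HIGH))  # the 121-byte segment from the log
print(fits_window(SEQ, 55, HIGH))   # a segment that would just fit
```

The 121-byte segment overshoots the 55-byte window, so under this simplified model every retransmission of it fails in the same way, which matches the repeated log entries.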
Re: PF NAT and Oracle/Linux mystery
> Could be fragments. Can you try with
>
>   scrub in on $ext_if all no-df
>   scrub out on $ext_if all no-df
>
> If you run pfctl -si, do you see any of the 'Counters' at the bottom
> increase when you get a stalled connection? Also, can you enable debug
> logging (pfctl -x m) and check /var/log/messages for relevant entries,
> after reproducing the problem?

I included the two scrub lines in the ruleset and flushed and reloaded pf, but to no avail. Log attached.

The firewall is running not quite the newest version of OpenBSD/PF (a 3.2 beta). Is it advisable to upgrade, given the interruption in service?

Cheers, Steve

192.168.101.14 - the node which tries to connect to Oracle/Linux
141.225.240.34 - the Oracle/Linux server
139.33.102.140 - the OpenBSD/PF NAT (and FW) machine

Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1

Counters
  match          30808    0.0/s
  bad-offset         0    0.0/s
  fragment           0    0.0/s
  short              0    0.0/s
  normalize          0    0.0/s
  memory             0    0.0/s

[ shortly after ]

Counters
  match          32500    0.0/s
  bad-offset         0    0.0/s
  fragment           0    0.0/s
  short              0    0.0/s
  normalize          0    0.0/s
  memory             0    0.0/s
Re: PF NAT and Oracle/Linux mystery
On Thu, Jan 16, 2003 at 02:54:29PM +, Steve Schmitz wrote:

> Any ideas?

Could be fragments. Can you try with

  scrub in on $ext_if all no-df
  scrub out on $ext_if all no-df

If you run pfctl -si, do you see any of the 'Counters' at the bottom increase when you get a stalled connection? Also, can you enable debug logging (pfctl -x m) and check /var/log/messages for relevant entries, after reproducing the problem?

Daniel
PF NAT and Oracle/Linux mystery
Hi,

I have a problem with access to an Oracle database over an OpenBSD PF NAT setup. We (a particle physics institute) have a Linux cluster for our computations; the nodes have private IP addresses and contact the outside world via an OpenBSD/PF NAT machine.

The NAT machine works perfectly fine for SSH/SCP, DNS and everything else we tried. Everything except access to an Oracle database on a Linux machine, that is. A connection can be opened, and a query can be sent. However, after a few lines of results have been printed, the connection freezes. pfctl -s state reports the connection as ESTABLISHED:ESTABLISHED, even minutes after the connection went south.

It is interesting to note that two variations of this situation do work well: access via an OpenBSD/PF NAT to a Solaris Oracle database works, and access via a Linux/iptables NAT to Oracle on both Solaris and Linux works, too. The problem seems to be an interaction between the OpenBSD/PF NAT and Linux/Oracle.

Any ideas?

Cheers, Steve