Okay guys, I posted that long message about Firefox/etc on Windows Vista a couple of days ago.
After re-reading my post, looking at the tcpdump output, and chatting with a friend of mine who also runs several OBSD firewalls at his company (which exhibited the EXACT same problem when my Vista installs attempted to connect), I think we've figured it out.

Vista (and presumably Longhorn Server) has a TCP stack that Microsoft completely rewrote. They've put in all kinds of new stuff, one of which is Receive Window Auto-Tuning. I noticed the following lines in my tcpdump:
From Opera/Firefox:
20:40:45.824144 my.workstation.ip.49370 > remote.server.ip.80: S 1215871830:1215871830(0) win 8192 <mss 1380,nop,wscale 8,nop,nop,sackOK> (DF)
20:38:25.198320 remote.server.ip.80 > my.workstation.ip.49357: S 852828096:852828096(0) ack 643900712 win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>
From IE 7:
20:39:08.834465 my.workstation.ip.49358 > remote.server.ip.80: S 4155969795:4155969795(0) win 8192 <mss 1380,nop,wscale 2,nop,nop,sackOK> (DF)
20:39:08.835095 remote.server.ip.80 > my.workstation.ip.49358: S 3294485308:3294485308(0) ack 4155969796 win 64240 <mss 1460,nop,wscale 0,nop,nop,sackOK>

Notice the window scale is 2 for IE, and 8 for Firefox/Opera.
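For anyone comparing their own captures, here is a rough Python sketch that pulls the wscale option out of tcpdump SYN lines like the ones above and shows what the shift does to a window field. The function names and the sample window field of 256 are made up for illustration, and the regex only assumes the option is printed as `wscale N`.

```python
import re

def wscale_of(tcpdump_line):
    """Return the window scale shift from a tcpdump SYN line, or None if absent."""
    m = re.search(r"\bwscale (\d+)\b", tcpdump_line)
    return int(m.group(1)) if m else None

def effective_window(window_field, shift):
    """Bytes a scaled 16-bit window field stands for: field * 2**shift."""
    return window_field << shift

firefox_syn = ("20:40:45.824144 my.workstation.ip.49370 > remote.server.ip.80: "
               "S 1215871830:1215871830(0) win 8192 "
               "<mss 1380,nop,wscale 8,nop,nop,sackOK> (DF)")
ie7_syn = ("20:39:08.834465 my.workstation.ip.49358 > remote.server.ip.80: "
           "S 4155969795:4155969795(0) win 8192 "
           "<mss 1380,nop,wscale 2,nop,nop,sackOK> (DF)")

for name, line in (("Firefox/Opera", firefox_syn), ("IE 7", ie7_syn)):
    shift = wscale_of(line)
    # 256 is a hypothetical window field from a later segment, not from my capture
    print(name, "wscale", shift, "-> field 256 means",
          effective_window(256, shift), "bytes")
```

Keep in mind the window in the SYN itself is never scaled; the shift only applies to later segments, and only when both sides sent the option. A device that missed or ignored the shift would read that same field as 256 bytes, which is how the scale-8 connections could look broken while the scale-2 ones get by.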
From the following MS blog at
http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx :

Note: Some Internet gateway devices and firewalls block packet flows because they do not correctly interpret the scaling factor used in TCP connections. Because of this, Internet Explorer in Windows Vista uses an initial scaling factor of 2. Other applications use a default initial scaling factor of 8.

After doing the new Vista equivalent of sudo and elevating my command shell to Administrator mode, I was able to use the following command to disable window scaling completely:

C:\windows\system32> netsh interface tcp set global autotuninglevel=disabled

Once I ran this command, Firefox/Opera/Remote Desktop Connection all functioned as expected once more.

Now, since I am clearly looking at a window scale issue here, I found a thread at http://archive.openbsd.nu/?ml=openbsd-pf&a=2006-07&t=2147873 where Daniel Hartmeier comments that three things need to be done to have state created correctly:

a) there is a default block policy
b) all 'pass' rules that can match TCP have 'flags S/SA'
c) all 'pass' rules have 'keep state'

Here is what I am seeing inside PF with the connections in question:

Nov 26 23:09:36.970856 rule 80/(match) pass in on fxp1: my.workstation.ip.59970 > remote.server.ip.443: [|tcp] (DF)

Then if I pull the 80th rule out:

@80 pass in log quick on fxp1 inet proto tcp from any to remote.server.ip port = https flags S/SA keep state label "ExchangeIn"

Now, I can easily see that I am matching B and C of Daniel's list; however, A is a bit more in question from my point of view. The block rules I do have are:

@47 block drop in log on fxp1 all label "DefaultBlock"
@48 block return-rst in log on fxp1 proto tcp all label "DefaultBlock"
@49 block return-icmp(port-unr, port-unr) in log on fxp1 proto udp all label "DefaultBlock"

So, it appears I have condition A matched as well.
I do have this rule:

@95 pass in quick on fxp0 inet proto tcp from <lan:1> to any flags S/SA keep state label "lanOUT"

That should not come into play here at all, as again it is creating state on a Syn, not a Syn Ack. However, after testing on this system, I think I am filtering wrong here. Here is what I have found as the full story of what is going on.

The connection is opened:

Nov 27 12:09:07.978281 rule 80/(match) pass in on fxp1: my.workstation.ip.62658 > remote.server.ip.443: [|tcp] (DF)

Two state entries are created:

all tcp remote.server.ip:443 <- remote.server.ip:443 <- my.workstation.ip:62658 ESTABLISHED:ESTABLISHED
   [4265902579 + 65535] [1356591875 + 65535]
   age 00:01:08, expires in 119:59:09, 12:9 pkts, 2014:5401 bytes, rule 80
   id: 453e890500b00c1e creatorid: 19ad04b2
all tcp my.workstation.ip:62658 <- remote.server.ip:443 ESTABLISHED:ESTABLISHED
   [1356591875 + 65535] [4265902579 + 65535]
   age 00:01:08, expires in 119:59:09, 9:11 pkts, 5401:1966 bytes, rule 96
   id: 453e890500b00c1f creatorid: 19ad04b2

Rules that match the state entries:

@80 pass in log quick on fxp1 inet proto tcp from any to remote.server.ip port = https flags S/SA keep state label "ExchangeIn"
  [ Evaluations: 5000 Packets: 204 Bytes: 50906 States: 5 ]
  [ Inserted: uid 0 pid 806 ]
@95 pass in quick on fxp0 inet proto tcp from <lan:1> to any flags S/SA keep state label "lanOUT"
  [ Evaluations: 417330 Packets: 60770372 Bytes: 36986041704 States: 771 ]
  [ Inserted: uid 0 pid 6307 ]
@96 pass in quick on fxp0 from <lan:1> to any keep state label "lanOUT"
  [ Evaluations: 429312 Packets: 9560597 Bytes: 5957780712 States: 135 ]
  [ Inserted: uid 0 pid 6307 ]

Now, it looks to me like the issue is that a second state entry is created by rule 96.
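Checking Daniel's points (b) and (c) against rules 80, 95 and 96 by hand is easy enough, but as a rough sketch, here is how the check could be scripted in Python. This is naive string matching on rule lines in the form shown above, not a real pf parser, and `check_pass_rule` is a made-up helper. It treats any pass rule without a single named `proto` as able to match TCP.

```python
import re

def check_pass_rule(rule):
    """Return a list of problems for one pf rule line (naive text matching)."""
    words = rule.split()
    if "pass" not in words[:2]:  # tolerate a leading "@NN" rule number
        return []                # block rules etc. are not checked here
    problems = []
    proto = re.search(r"\bproto (\w+)", rule)
    # No proto keyword (or an unparsed list) is assumed to include TCP
    can_match_tcp = proto is None or proto.group(1) == "tcp"
    if can_match_tcp and "flags S/SA" not in rule:
        problems.append("can match TCP but has no 'flags S/SA'")
    if "keep state" not in rule:
        problems.append("no 'keep state'")
    return problems

rules = [
    '@80 pass in log quick on fxp1 inet proto tcp from any to remote.server.ip '
    'port = https flags S/SA keep state label "ExchangeIn"',
    '@95 pass in quick on fxp0 inet proto tcp from <lan:1> to any '
    'flags S/SA keep state label "lanOUT"',
    '@96 pass in quick on fxp0 from <lan:1> to any keep state label "lanOUT"',
]

for rule in rules:
    number = rule.split()[0]
    print(number, check_pass_rule(rule) or "ok")
```

Run against those three rules, only @96 gets flagged: it has no proto restriction, so it can match TCP, and it has no 'flags S/SA', so it will happily create state from a Syn Ack.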
When it is modified to match only protocol UDP, traffic through the FW stops due to the rules:

block in log from any to any label "DefaultBlock"
block in log on { $ext_if } all label "DefaultBlock"
block return-rst in log on { $ext_if } proto tcp all label "DefaultBlock"
block return-icmp in log on { $ext_if } proto udp all label "DefaultBlock"

I think the reason for this misunderstanding on my part is that I could have sworn state is checked before rules. However, the Syn Ack from the internal side is being blocked on the internal interface by the block in from any to any rule.

I have heard it said that it makes no sense to filter on two interfaces, and that it is best to pass on one and block on the other. In this case it is a multi-legged firewall, and filtering on only one interface is not an option.

So my question now is: what is the accepted fix for this? The Syn Ack is not hitting the existing state entry but is creating a new one instead, and if that rule is removed it just gets blocked. How do I let the Syn Ack hit the initial state entry?

Thanks!

-- Rev