Re: Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-28 Thread Stefan Bethke

On 28.07.2006 at 03:57, Garance A Drosihn wrote:


It occurred to me that it might be more informative to
see the transaction from the *freebsd* side of things,
since that's the machine running pf!   So, here is a
similar set of two lpq's, as seen from the print-server
side of the connection.  It seems to be telling the
same basic story, as far as I can tell.


It's just showing that no ACKs come back.  Can you see if anything
shows up on pflog0 with tcpdump?  That output should also tell you
which rule forced the rejection.
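For reference, the usual way to watch pflog0 is something like the
following (this assumes pflog is enabled and that the rules in question
carry the "log" keyword; see pflog(4) and tcpdump(8)):

```shell
# Live view of packets pf logged (blocked or passed by rules with "log"):
tcpdump -n -e -ttt -i pflog0

# Or read back the on-disk log written by pflogd:
tcpdump -n -e -ttt -r /var/log/pflog
```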


What I do find curious is that the client keeps using port 1023
consistently.  I was under the impression that reusing the same port
number (thus having the same src-ip/port+dst-ip/port tuple) shouldn't
work, because old packets could arrive after the original
connection was closed; that's what the TIME_WAIT state in netstat is for.



Stefan

--
Stefan Bethke [EMAIL PROTECTED]   Fon +49 170 346 0140


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-28 Thread Garance A Drosihn

At 9:30 PM +0200 7/28/06, Stefan Bethke wrote:

On 28.07.2006 at 03:57, Garance A Drosihn wrote:


It occurred to me that it might be more informative to
see the transaction from the *freebsd* side of things,
since that's the machine running pf!   So, here is a
similar set of two lpq's, as seen from the print-server
side of the connection.  It seems to be telling the
same basic story, as far as I can tell.


It's just showing that no ACKs come back.  Can you see
if anything shows up on pflog0 with tcpdump?


Thanks for the reply.  I'll check that when I get a chance
to turn the machine back on.  (The air-conditioning for
our offices just died -- AGAIN -- so I had to shut down
most of my machines today.)


That output should also tell you which rule forced the
rejection.


There is only one rule.  The config file I'm testing with
has three comment lines, and:

   pass out quick proto { tcp, udp } all keep state

That's it.  That's the entire /etc/pf.conf file.
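(For anyone trying to reproduce this, pfctl can confirm what actually
got loaded -- standard pfctl flags, nothing exotic:)

```shell
# Dry-run parse of the config, verbose, without loading it:
pfctl -nvf /etc/pf.conf
# Show the rules pf currently has loaded:
pfctl -sr
```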


What I do find curious is that the client keeps using
port 1023 consistently.  I was under the impression that
reusing the same port number (thus having the same
src-ip/port+dst-ip/port tuple) shouldn't work, because
old packets could arrive after the original connection
was closed; that's what the TIME_WAIT state in netstat is for.


Hmm.  Well, I did wait a few seconds between the two lpq's,
just so it would be easier to tell them apart in the packet dumps.

Perhaps Solaris is quicker to reuse ports, while 'pf'
remembers that src-ip/port+dst-ip/port tuple for a
longer stretch of time?

But if so, there were seven seconds between the end of the
first 'lpq' and the first attempt to start a connection for
the second one.  The 'pf' side didn't start returning ACK's
until 111 seconds after the first connection had closed.
That seems like a pretty long time to wait.

--
Garance Alistair Drosehn=   [EMAIL PROTECTED]
Senior Systems Programmer   or  [EMAIL PROTECTED]
Rensselaer Polytechnic Instituteor  [EMAIL PROTECTED]


Re: Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-28 Thread Stefan Bethke

On 28.07.2006 at 22:20, Garance A Drosihn wrote:


At 9:30 PM +0200 7/28/06, Stefan Bethke wrote:

What I do find curious is that the client keeps using
port 1023 consistently.  I was under the impression that
reusing the same port number (thus having the same
src-ip/port+dst-ip/port tuple) shouldn't work, because
old packets could arrive after the original connection
was closed; that's what the TIME_WAIT state in netstat is for.


Hmm.  Well, I did wait a few seconds between the two lpq's,
just so it would be easier to tell them apart in the packet dumps.

Perhaps Solaris is quicker to reuse ports, while 'pf'
remembers that src-ip/port+dst-ip/port tuple for a
longer stretch of time?


Thinking about it, it must be pf's notion of when to forget about a  
closed TCP connection.  lpq (in FreeBSD) is intent on using port  
1023, tells the kernel it's OK to reuse it, and will try until it  
gets it, with an exponential backoff and an upper limit on the number  
of tries.  I'd think the Solaris lpq does the same.  Since the client  
and server know it's OK, they can deal with the not-yet-expired  
TIME_WAIT (by ignoring it).  But pf obviously cannot know about it,  
and will drop packets that are received during TIME_WAIT, including a  
new SYN.
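If that theory is right, shortening pf's post-close state timeouts
should shrink the hang window.  A sketch of the pf.conf knobs involved
(the values here are illustrative only; see pf.conf(5) for the defaults):

```
# Keep state entries for closing/closed connections around for less time.
set timeout tcp.finwait 5
set timeout tcp.closed  5

pass out quick proto { tcp, udp } all keep state
```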


For this case in particular, you should be able to use a pair of  
static rules (instead of keep state), since both source and  
destination ports will always be the same. Something like

pass out quick proto tcp from $client 1023 to $server 515
pass in quick proto tcp from $server 515 to $client 1023

I'm not certain this is a bug in pf; maybe someone more knowledgeable
can explain how the TCP state machine in pf works.
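The state machine can at least be observed from outside; watching the
state entry across the two lpq runs should show what stage pf thinks
the connection is in (the stage names below are examples):

```shell
# List current states; the lpd connection shows up with its TCP stage,
# e.g. ESTABLISHED:ESTABLISHED or FIN_WAIT_2:FIN_WAIT_2.
pfctl -ss
# Show the timeout values pf is using for each stage:
pfctl -st
```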



Stefan

--
Stefan Bethke [EMAIL PROTECTED]   Fon +49 170 346 0140




Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-27 Thread Garance A Drosihn

It happens that I noticed two odd networking problems recently.
One of them is easily reproducible, and I have it tracked down
to one innocuous-looking line in my /etc/pf.conf.  The other
is a problem in a chat server that I run, with a few hundred
people on it, and is much more of a hassle to reproduce.  But
turning off 'pf' to solve the first problem seems to have
also solved the second problem, so I assume both problems
come from the same culprit.

Once I figured out how to reproduce the problem, it seems so
easy to trigger that I find it odd that no one else has
run into it.  But I also do not see any PRs that seem
to describe the problem.  I'd appreciate it if people would
try to duplicate the problem on some other machines.

This problem has been seen on:
 5.x-stable as built on Mon Jul 24
 6.x-stable as built on Mon Jul 17
(as well as several earlier snapshots of both 5.x and 6.x).

I have a freebsd box which is the server for a print queue
named 'bill', and is running pf.  I have other machines which
reference that queue.  It seems that machines on the same
subnet as the server-box do not exhibit the problem.  But
for other machines, if I do 'lpq -Pbill' twice in rapid
succession, then the second one will hang.

After some futzing around, I determined that if my pf.conf
has only the lines:

# Filtering: the implicit first two rules are
#pass in all
#pass out all

then I can do many many lpq's in a row, without any trouble.
But if I restart pf after adding these lines to pf.conf:

#   Allow all outgoing tcp and udp connections and keep state
pass out quick proto { tcp, udp } all keep state

then I have the problem where the second 'lpq' from a remote
host will hang, if it is done right after the first one.
That's right.  I add a rule which just does quick passing
for *outbound* connections, and somehow that screws up
(blocks?) *incoming* connections.  I have no rules which
should block any packets at all, so my guess is that some
packets are getting lost, delayed, or corrupted somewhere.
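The whole repro described above boils down to a pair of commands run
from an off-subnet client while the keep-state rule is loaded:

```shell
# First query succeeds; the second, issued a few seconds later from the
# same client (which reuses source port 1023), hangs in the handshake.
lpq -Pbill
sleep 5
lpq -Pbill
```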

--
Garance Alistair Drosehn=   [EMAIL PROTECTED]
Senior Systems Programmer   or  [EMAIL PROTECTED]
Rensselaer Polytechnic Instituteor  [EMAIL PROTECTED]


Re: Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-27 Thread Garance A Drosihn

At 9:07 PM -0400 7/27/06, Garance A Drosihn wrote:


But if I restart pf after adding these lines to pf.conf:

#   Allow all outgoing tcp and udp connections and keep state
pass out quick proto { tcp, udp } all keep state

then I have the problem where the second 'lpq' from a remote
host will hang, if it is done right after the first one.


The client-machine which is doing the lpq is a Solaris
machine, so here is the 'snoop' output from that side
of things.  Disclaimer: I'm not a networking expert,
so I'm hoping someone else will find this a lot more
obvious than I do.

Here are the packets from the first 'lpq', with various
names changed to protect the innocent (and to reduce
the wrapping a little bit...):


  1   0.0 lpq-client -> print-serv ETHER Type=0800 (IP), size = 62 bytes
  1   0.0 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=48, ID=13267
  1   0.0 lpq-client -> print-serv TCP D=515 S=1023 Syn Seq=1503722122 Len=0 Win=24820 Options=nop,nop,sackOK,mss 1460
  1   0.0 lpq-client -> print-serv PRINTER C port=1023

  2   0.00068 print-serv -> lpq-client ETHER Type=0800 (IP), size = 62 bytes
  2   0.00068 print-serv -> lpq-client IP  D=128.113.002.002 S=128.113.000.001 LEN=48, ID=4007
  2   0.00068 print-serv -> lpq-client TCP D=1023 S=515 Syn Ack=1503722123 Seq=1874442309 Len=0 Win=65535 Options=mss 1460,sackOK,eol
  2   0.00068 print-serv -> lpq-client PRINTER R port=1023

  3   0.00072 lpq-client -> print-serv ETHER Type=0800 (IP), size = 54 bytes
  3   0.00072 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=40, ID=13268
  3   0.00072 lpq-client -> print-serv TCP D=515 S=1023 Ack=1874442310 Seq=1503722123 Len=0 Win=24820
  3   0.00072 lpq-client -> print-serv PRINTER C port=1023

  4   0.00088 lpq-client -> print-serv ETHER Type=0800 (IP), size = 63 bytes
  4   0.00088 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=49, ID=13269
  4   0.00088 lpq-client -> print-serv TCP D=515 S=1023 Ack=1874442310 Seq=1503722123 Len=9 Win=24820
  4   0.00088 lpq-client -> print-serv PRINTER C port=1023 \3bill\n

  5   0.03003 print-serv -> lpq-client ETHER Type=0800 (IP), size = 132 bytes
  5   0.03003 print-serv -> lpq-client IP  D=128.113.002.002 S=128.113.000.001 LEN=118, ID=4045
  5   0.03003 print-serv -> lpq-client TCP D=1023 S=515 Ack=1503722132 Seq=1874442310 Len=78 Win=65535
  5   0.03003 print-serv -> lpq-client PRINTER R port=1023 Warning: bill is

  6   0.03014 print-serv -> lpq-client ETHER Type=0800 (IP), size = 60 bytes
  6   0.03014 print-serv -> lpq-client IP  D=128.113.002.002 S=128.113.000.001 LEN=40, ID=4046
  6   0.03014 print-serv -> lpq-client TCP D=1023 S=515 Fin Ack=1503722132 Seq=1874442388 Len=0 Win=65535
  6   0.03014 print-serv -> lpq-client PRINTER R port=1023

  7   0.03020 lpq-client -> print-serv ETHER Type=0800 (IP), size = 54 bytes
  7   0.03020 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=40, ID=13270
  7   0.03020 lpq-client -> print-serv TCP D=515 S=1023 Ack=1874442388 Seq=1503722132 Len=0 Win=24820
  7   0.03020 lpq-client -> print-serv PRINTER C port=1023

  8   0.03022 lpq-client -> print-serv ETHER Type=0800 (IP), size = 54 bytes
  8   0.03022 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=40, ID=13271
  8   0.03022 lpq-client -> print-serv TCP D=515 S=1023 Ack=1874442389 Seq=1503722132 Len=0 Win=24820
  8   0.03022 lpq-client -> print-serv PRINTER C port=1023

  9   0.03074 lpq-client -> print-serv ETHER Type=0800 (IP), size = 54 bytes
  9   0.03074 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=40, ID=13272
  9   0.03074 lpq-client -> print-serv TCP D=515 S=1023 Fin Ack=1874442389 Seq=1503722132 Len=0 Win=24820
  9   0.03074 lpq-client -> print-serv PRINTER C port=1023

 10   0.03132 print-serv -> lpq-client ETHER Type=0800 (IP), size = 60 bytes
 10   0.03132 print-serv -> lpq-client IP  D=128.113.002.002 S=128.113.000.001 LEN=40, ID=4047
 10   0.03132 print-serv -> lpq-client TCP D=1023 S=515 Ack=1503722133 Seq=1874442389 Len=0 Win=65534
 10   0.03132 print-serv -> lpq-client PRINTER R port=1023


And here are the packets from the second 'lpq', done
right after the first one.  It looks like the problem is
in the initial handshaking to get the connection started:


 11   7.19194 lpq-client -> print-serv ETHER Type=0800 (IP), size = 62 bytes
 11   7.19194 lpq-client -> print-serv IP  D=128.113.000.001 S=128.113.002.002 LEN=48, ID=13273
 11   7.19194 lpq-client -> print-serv TCP D=515 S=1023 Syn Seq=1505511645 Len=0 Win=24820 

Re: Weird problems with 'pf' (on both 5.x and 6.x)

2006-07-27 Thread Garance A Drosihn

At 9:18 PM -0400 7/27/06, Garance A Drosihn wrote:

At 9:07 PM -0400 7/27/06, Garance A Drosihn wrote:


But if I restart pf after adding these lines to pf.conf:

#   Allow all outgoing tcp and udp connections and keep state
pass out quick proto { tcp, udp } all keep state

then I have the problem where the second 'lpq' from a remote
host will hang, if it is done right after the first one.


The client-machine which is doing the lpq is a Solaris
machine, so here is the 'snoop' output from that side
of things.


It occurred to me that it might be more informative to
see the transaction from the *freebsd* side of things,
since that's the machine running pf!   So, here is a
similar set of two lpq's, as seen from the print-server
side of the connection.  It seems to be telling the
same basic story, as far as I can tell.

<aside>
But if there is a bug somewhere, then might it
be that the same bug which affects 'pf' would
also confuse what tcpdump reports, when
running tcpdump on the same machine?
</aside>

(316) santropez/root # tcpdump -X -r /tmp/gadchecks/all-060727.212311 host lpq-client

reading from file /tmp/gadchecks/all-060727.212311, link-type EN10MB (Ethernet)
21:23:32.175093 IP (tos 0x0, ttl  63, id 53775, offset 0, flags [DF], proto: TCP (6), length: 48) lpq-client.1023 > print-serv.printer: S, cksum 0x6b2c (correct), 2119630748:2119630748(0) win 24820 <nop,nop,sackOK,mss 1460>

0x0000:  4500 0030 d20f 4000 3f06 36af 8071 1985  [EMAIL PROTECTED]
0x0010:  8071 18a2 03ff 0203 7e56 ff9c    .q..~V..
0x0020:  7002 60f4 6b2c  0101 0402 0204 05b4  p.`.k,..
21:23:32.175205 IP (tos 0x0, ttl  64, id 4488, offset 0, flags [DF], proto: TCP (6), length: 48) print-serv.printer > lpq-client.1023: S, cksum 0x0bfa (correct), 2140553600:2140553600(0) ack 2119630749 win 65535 <mss 1460,sackOK,eol>

0x0000:  4500 0030 1188 4000 4006 f636 8071 18a2  [EMAIL PROTECTED]@..6.q..
0x0010:  8071 1985 0203 03ff 7f96 4180 7e56 ff9d  .qA.~V..
0x0020:  7012  0bfa  0204 05b4 0402   p...
21:23:32.175787 IP (tos 0x0, ttl  63, id 53776, offset 0, flags [DF], proto: TCP (6), length: 40) lpq-client.1023 > print-serv.printer: ., cksum 0xd6c8 (correct), 1:1(0) ack 1 win 24820

0x0000:  4500 0028 d210 4000 3f06 36b6 8071 1985  E..([EMAIL PROTECTED]
0x0010:  8071 18a2 03ff 0203 7e56 ff9d 7f96 4181  .q..~VA.
0x0020:  5010 60f4 d6c8       P.`.UU
21:23:32.175935 IP (tos 0x0, ttl  63, id 53777, offset 0, flags [DF], proto: TCP (6), length: 49) lpq-client.1023 > print-serv.printer: P, cksum 0xc80d (correct), 1:10(9) ack 1 win 24820

0x0000:  4500 0031 d211 4000 3f06 36ac 8071 1985  [EMAIL PROTECTED]
0x0010:  8071 18a2 03ff 0203 7e56 ff9d 7f96 4181  .q..~VA.
0x0020:  5018 60f4 c80d  0370 6269 6c6c 3264  P.`..bill
0x0030:  0a   .
21:23:32.204946 IP (tos 0x0, ttl  64, id 4526, offset 0, flags [DF], proto: TCP (6), length: 118) print-serv.printer > lpq-client.1023: P, cksum 0x5bcb (correct), 1:79(78) ack 10 win 65535

0x0000:  4500 0076 11ae 4000 4006 f5ca 8071 18a2  [EMAIL PROTECTED]@q..
0x0010:  8071 1985 0203 03ff 7f96 4181 7e56 ffa6  .qA.~V..
0x0020:  5018  5bcb  5761 726e 696e 673a  P...[...Warning:
0x0030:  2070 6269 6c6c 3264 2069 7320 646f 776e  .bill.is.down
0x0040:  3a20 5468 6973 2071 7565 7565 2069 7320  :.This.queue.is.
0x0050:  666f 7220 4761 7261 6e63 6520 7465 7374  for.Garance.test
0x0060:  696e 672e 2073 742f 3678 0a6e 6f20 656e  ing..st/6x.no.en
0x0070:  7472 6965 730a   tries.
21:23:32.204988 IP (tos 0x0, ttl  64, id 4527, offset 0, flags [DF], proto: TCP (6), length: 40) print-serv.printer > lpq-client.1023: F, cksum 0x3765 (correct), 79:79(0) ack 10 win 65535

0x0000:  4500 0028 11af 4000 4006 f617 8071 18a2  E..([EMAIL PROTECTED]@q..
0x0010:  8071 1985 0203 03ff 7f96 41cf 7e56 ffa6  .qA.~V..
0x0020:  5011  3765   P...7e..
21:23:32.205701 IP (tos 0x0, ttl  63, id 53778, offset 0, flags [DF], proto: TCP (6), length: 40) lpq-client.1023 > print-serv.printer: ., cksum 0xd671 (correct), 10:10(0) ack 79 win 24820

0x0000:  4500 0028 d212 4000 3f06 36b4 8071 1985  E..([EMAIL PROTECTED]
0x0010:  8071 18a2 03ff 0203 7e56 ffa6 7f96 41cf  .q..~VA.
0x0020:  5010 60f4 d671       P.`..q..UU
21:23:32.205755 IP (tos 0x0, ttl  63, id 53779, offset 0, flags [DF], proto: TCP (6), length: 40) lpq-client.1023 > print-serv.printer: ., cksum 0xd670 (correct), 10:10(0) ack 80 win 24820

0x0000:  4500 0028 d213 4000 3f06 36b3 8071 1985  E..([EMAIL PROTECTED]
0x0010:  8071 18a2 03ff 0203 7e56 ffa6 7f96 41d0  .q..~VA.