Re: bind error when using SO_REUSEPORT(implies SO_REUSEADDR)

2013-07-17 Thread Mikolaj Golub
On Tue, Jul 16, 2013 at 11:12:46AM -0400, John Baldwin wrote:
> On Thursday, March 15, 2012 8:07:46 pm Sean Bruno wrote:
> > On Thu, 2012-03-15 at 16:59 -0700, Sean Bruno wrote:
> > > Hey, I just found a bind bug ticket in my queue about bind.  I noted
> > > that on stable/6 stable/7 stable/9 & head the referenced code fails.
> > >
> > > It seems that this is a problem, but I have no idea if it's a real
> > > problem or not.  Our devs think it is.  Anyway, here is a code snippet
> > > to show the failure in bind.  On linux/solaris this does not fail.
> > >
> > > http://people.freebsd.org/~sbruno/bind_test.c
> > >
> > > simple compile with gcc -o test test.c and run as normal user.
> > >
> > > Sean
> > >
> >
> > this is bind() not bind ... :-)
>
> Did the recent commit to HEAD fix this btw?

As for me, bind_test.c does not expose any bug in freebsd, it only
shows different behavior for freebsd and linux.

On freebsd the test output is:

serversock addr is 127.0.0.1:27539
dup bind: Address already in use
This error was expected, tried to bind to used addr/port
BUG: binding duplicate socket to server port succeeded
dup2sock addr is 0.0.0.0:27539
overlapping explicit bind to same port number succeeded without SO_REUSEPORT
listen succeeded after explicitly overlapping port bind
autosock addr is 0.0.0.0:27539
bug triggered, port number conflict on sockets without SO_REUSEPORT
listen succeded after implicitly overlapping port bind

So, the first socket (serversock) is bound to the loopback address,
then it tries some combinations of binding the second socket to the
same port but to the wildcard address. When SO_REUSEADDR socket option
is set, binding to the wildcard address succeeds for freebsd (and
fails for linux).

They call this a bug in freebsd, but this is well known and expected
behavior (see e.g. Stevens' TCP/IP Illustrated Vol1). 

Or I missed the test's point?
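
For readers without access to bind_test.c, the scenario described above can be approximated with the following minimal sketch. It is not the original test: the function name try_overlapping_bind and the choice to set SO_REUSEADDR on both sockets are assumptions made for illustration, and the result depends on the operating system (and on details such as whether the first socket is listening).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind one TCP socket to 127.0.0.1 on a kernel-chosen port, then try to
 * bind a second socket to the wildcard address on the same port.
 *
 * Returns 0 if the second bind succeeded, the errno value of the failed
 * second bind (typically EADDRINUSE), or -1 if the setup itself failed.
 */
int
try_overlapping_bind(int use_reuseaddr)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int first, second, on = 1, result = -1;

	first = socket(AF_INET, SOCK_STREAM, 0);
	second = socket(AF_INET, SOCK_STREAM, 0);
	if (first == -1 || second == -1)
		goto out;
	if (use_reuseaddr &&
	    (setsockopt(first, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1 ||
	    setsockopt(second, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1))
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(first, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    getsockname(first, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	/* Same port, but the wildcard address this time. */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(second, (struct sockaddr *)&sin, sizeof(sin)) == 0)
		result = 0;
	else
		result = errno;
out:
	if (first != -1)
		close(first);
	if (second != -1)
		close(second);
	return (result);
}
```

Calling try_overlapping_bind(1) and try_overlapping_bind(0) on different systems and comparing the return values is enough to see the behavioral difference discussed in this thread.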

-- 
Mikolaj Golub
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled incorrectly

2013-06-30 Thread Mikolaj Golub
The following reply was made to PR kern/179901; it has been noted by GNATS.

From: Mikolaj Golub troc...@freebsd.org
To: bug-follo...@freebsd.org
Cc: Michael Gmelin free...@grem.de
Subject: Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled
 incorrectly
Date: Sun, 30 Jun 2013 10:17:05 +0300

 --EeQfGwPcQSOJBaQU
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Thu, Jun 27, 2013 at 11:00:16PM +0300, Mikolaj Golub wrote:
 
  I don't insist on maintaining the old behaviour. But as actually we
  have 2 issues here (regression introduced by me in FreeBSD9 and
  historical behavior that looks wrong), with different priority, I
  would like to fix the issues separately. This way it will be easier to
  track the changes, e.g. when after a year it turns out that the second
  change has broken some other case.
 
 Here is a patch for the second issue.
 
 -- 
 Mikolaj Golub
 
 --EeQfGwPcQSOJBaQU
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename=pr179901.2.1.patch
 
 commit 7cf3a6a95d74ae91c80350fc1ae8e96fe59c3c65
 Author: Mikolaj Golub troc...@freebsd.org
 Date:   Sun Jun 30 00:09:20 2013 +0300
 
 A complete duplication of binding should be allowed if on both new and
 duplicated sockets a multicast address is bound and either
 SO_REUSEPORT or SO_REUSEADDR is set.
 
 But actually it works for the following combinations:
 
  * SO_REUSEPORT is set for the first socket and SO_REUSEPORT for the new;
  * SO_REUSEADDR is set for the first socket and SO_REUSEADDR for the new;
  * SO_REUSEPORT is set for the first socket and SO_REUSEADDR for the new;
 
 and fails for this:
 
  * SO_REUSEADDR is set for the first socket and SO_REUSEPORT for the new.
 
 Fix the last case.
 
 PR:179901
 
 diff --git a/sys/netinet/in_pcb.c b/sys/netinet/in_pcb.c
 index 3506b74..eb15a38 100644
 --- a/sys/netinet/in_pcb.c
 +++ b/sys/netinet/in_pcb.c
 @@ -554,7 +554,7 @@ in_pcbbind_setup(struct inpcb *inp, struct sockaddr *nam, in_addr_t *laddrp,
  			 * and a multicast address is bound on both
  			 * new and duplicated sockets.
  			 */
 -			if (so->so_options & SO_REUSEADDR)
 +			if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) != 0)
  				reuseport = SO_REUSEADDR|SO_REUSEPORT;
  		} else if (sin->sin_addr.s_addr != INADDR_ANY) {
  			sin->sin_port = 0;		/* yech... */
 diff --git a/sys/netinet6/in6_pcb.c b/sys/netinet6/in6_pcb.c
 index a0a6874..fb84279 100644
 --- a/sys/netinet6/in6_pcb.c
 +++ b/sys/netinet6/in6_pcb.c
 @@ -156,7 +156,7 @@ in6_pcbbind(register struct inpcb *inp, struct sockaddr *nam,
  			 * and a multicast address is bound on both
  			 * new and duplicated sockets.
  			 */
 -			if (so->so_options & SO_REUSEADDR)
 +			if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) != 0)
  				reuseport = SO_REUSEADDR|SO_REUSEPORT;
  		} else if (!IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr)) {
  			struct ifaddr *ifa;
 
 --EeQfGwPcQSOJBaQU--
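
The four combinations listed in the commit message above can be reproduced with a small userspace model. This is only a sketch under stated assumptions: the M_* flag values and function names are hypothetical, the new socket's mask is assumed to start as its SO_REUSEPORT bit, and the existing socket's options are assumed to be compared with a bitwise AND, as in the earlier patch for this PR.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's SO_* values. */
#define	M_REUSEADDR	0x1
#define	M_REUSEPORT	0x2

/*
 * Options the new socket offers when binding a multicast address.
 * Unpatched: only SO_REUSEADDR is widened to "both options".
 * Patched: either option is widened.
 */
static int
new_socket_mask(int so_options, bool patched)
{
	int reuseport = so_options & M_REUSEPORT;

	if (patched) {
		if ((so_options & (M_REUSEADDR | M_REUSEPORT)) != 0)
			reuseport = M_REUSEADDR | M_REUSEPORT;
	} else {
		if ((so_options & M_REUSEADDR) != 0)
			reuseport = M_REUSEADDR | M_REUSEPORT;
	}
	return (reuseport);
}

/* A duplicate bind is allowed when the two option sets intersect. */
bool
dup_bind_allowed(int first_options, int new_options, bool patched)
{
	return ((new_socket_mask(new_options, patched) & first_options) != 0);
}
```

In this model dup_bind_allowed(M_REUSEADDR, M_REUSEPORT, false) is the only one of the four combinations that is rejected, and it is accepted once patched is true.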


Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled incorrectly

2013-06-27 Thread Mikolaj Golub
The following reply was made to PR kern/179901; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: Michael Gmelin free...@grem.de
Cc: Mikolaj Golub troc...@freebsd.org, bug-follo...@freebsd.org
Subject: Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled
 incorrectly
Date: Thu, 27 Jun 2013 23:00:16 +0300

 On Wed, Jun 26, 2013 at 03:03:40PM +0200, Michael Gmelin wrote:
  Hi,
  
  I adapted the test code, you can find it at
  
  http://blog.grem.de/multicast.c
  
  Test output is:
  
  IPv4 Port :
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv4 Port 5556:
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv4 Port 5557:
    Bind using SO_REUSEADDR x 2...OK   (expected)
    Bind using SO_REUSEADDR x 2...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv4 Port 5558:
    Bind without socketopts...OK   (expected)
    Bind using SO_REUSEADDR...FAIL (expected): Address already in use
    Bind using SO_REUSEPORT...FAIL (expected): Address already in use
  IPv4 Port 5559:
    Bind using SO_REUSEADDR...OK   (expected)
    Bind without socketopts...FAIL (expected): Address already in use
  IPv4 Port 5560:
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEPORT...OK   (expected)
    Bind without socketopts...FAIL (expected): Address already in use
  IPv6 Port :
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv6 Port 5556:
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv6 Port 5557:
    Bind using SO_REUSEADDR x 2...OK   (expected)
    Bind using SO_REUSEADDR x 2...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
    Bind using SO_REUSEADDR...OK   (expected)
    Bind using SO_REUSEPORT...FAIL (NOT expected): Address already in use
  IPv6 Port 5558:
    Bind without socketopts...OK   (expected)
    Bind using SO_REUSEADDR...FAIL (expected): Address already in use
    Bind using SO_REUSEPORT...FAIL (expected): Address already in use
  IPv6 Port 5559:
    Bind using SO_REUSEADDR...OK   (expected)
    Bind without socketopts...FAIL (expected): Address already in use
  IPv6 Port 5560:
    Bind using SO_REUSEPORT...OK   (expected)
    Bind using SO_REUSEPORT...OK   (expected)
    Bind without socketopts...FAIL (expected): Address already in use
  
 
 Thank you for testing!
 
  So you maintained the old PORT/ADDR behavior, which I think is not such
  a great idea. I would suggest to get another opinion on this, just
  because it's broken now doesn't mean we have to perpetuate it - maybe we
  should compare the behavior with other Unix(like) OSes like the other
  BSDs and Linux to see how their implementations work - usually ported
  software is not changed in that respect, so being compatible is
  valuable.
 
 It is difficult to talk about portability in the case of SO_REUSEPORT.
 AFAIK, there is no SO_REUSEPORT in Linux and it is recommended to
 always use SO_REUSEADDR for multicast in portable code. It looks like
 in this case we will always have expected behavior with the proposed
 patch.
 
  Besides my rant the code works as designed and seems to resemble the
  behavior before r227207 correctly (I manually applied the patches to
  9.1-RELEASE).
  
  Fun fact: The code in ip6_output.c could have never worked in the first
  place, since it used IN_MULTICAST instead of IN6_IS_ADDR_MULTICAST:
  
  if (IN_MULTICAST(ntohl(in6p->inp_laddr.s_addr)))
  ...
 
 I don't insist on maintaining the old behaviour. But as actually we
 have 2 issues here (regression introduced by me in FreeBSD9 and
 historical behavior that looks wrong), with different priority, I
 would like to fix the issues separately. This way it will be easier to
 track the changes, e.g. when after a year it turns out that the second
 change has broken some other case.
 
 For now I am more concerned about having SO_REUSEADDR regression fixed
 in CURRENT and STABLE9 before 9.2. The patch is under review and I
 plan to commit it next week if it is ok.
 
 The second issue might require more discussion before committing the
 change.
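 
 As a side note on the IN_MULTICAST versus IN6_IS_ADDR_MULTICAST mix-up quoted above, here is a minimal userspace sketch of the two checks. The helper names is_multicast_v4 and is_multicast_v6 are made up for illustration; the macros themselves come from <netinet/in.h>.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Returns 1 if the text is an IPv4 multicast address, 0 if it is some
 * other IPv4 address, -1 if it does not parse as IPv4.
 */
int
is_multicast_v4(const char *text)
{
	struct in_addr addr;

	if (inet_pton(AF_INET, text, &addr) != 1)
		return (-1);
	/* IN_MULTICAST() expects a 32-bit address in host byte order. */
	return (IN_MULTICAST(ntohl(addr.s_addr)) ? 1 : 0);
}

/* Same contract as above, for IPv6. */
int
is_multicast_v6(const char *text)
{
	struct in6_addr addr6;

	if (inet_pton(AF_INET6, text, &addr6) != 1)
		return (-1);
	/* IN6_IS_ADDR_MULTICAST() takes a pointer to a struct in6_addr. */
	return (IN6_IS_ADDR_MULTICAST(&addr6) ? 1 : 0);
}
```

 The two macros take different argument types, which is why applying the IPv4 macro to an IPv6 PCB cannot give a meaningful answer.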
 
 -- 
 Mikolaj Golub

Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled incorrectly

2013-06-25 Thread Mikolaj Golub
The following reply was made to PR kern/179901; it has been noted by GNATS.

From: Mikolaj Golub troc...@freebsd.org
To: Michael Gmelin free...@grem.de
Cc: bug-follo...@freebsd.org
Subject: Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled
 incorrectly
Date: Tue, 25 Jun 2013 18:24:55 +0300

 --tThc/1wpZn/ma/RB
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Tue, Jun 25, 2013 at 01:39:38PM +0200, Michael Gmelin wrote:
 
  Yes, but it seems like your patch is not fixing all places in
  in6_pcb.c, I think you should modify the code at line 246 as well:
  
  } else if (t && (reuseport == 0 ||
      (t->inp_flags2 & INP_REUSEPORT) == 0)) {
          return (EADDRINUSE);
  }
  
  so it says
  } else if (t &&
      (reuseport & inp_so_options(t)) == 0) {
   
 
 Good catch! I missed this because I was preparing the patch using
 r227207 as a reference, but this had been missed there too (fixed
 later in r233272 by glebius).
 
  Once 1) has been resolved I can test on a machine running 9.1-RELEASE
  later (the patch is small enough to apply it manually). I will run the
  unit test code from multicast.c I sent earlier and add IPv6 test
  cases to it as well.
 
 The updated patch is attached. Thanks.
 
 -- 
 Mikolaj Golub
 
 --tThc/1wpZn/ma/RB
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename=pr179901.2.patch
 
 Index: sys/netinet/in_pcb.c
 ===
 --- sys/netinet/in_pcb.c   (revision 251760)
 +++ sys/netinet/in_pcb.c   (working copy)
 @@ -467,6 +467,23 @@ in_pcb_lport(struct inpcb *inp, struct in_addr *la
  
return (0);
  }
 +
 +/*
 + * Return cached socket options.
 + */
 +int
 +inp_so_options(const struct inpcb *inp)
 +{
 +   int so_options;
 +
 +   so_options = 0;
 +
 +   if ((inp->inp_flags2 & INP_REUSEPORT) != 0)
 +           so_options |= SO_REUSEPORT;
 +   if ((inp->inp_flags2 & INP_REUSEADDR) != 0)
 +           so_options |= SO_REUSEADDR;
 +   return (so_options);
 +}
  #endif /* INET || INET6 */
  
  #ifdef INET
 @@ -595,8 +612,7 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
  				if (tw == NULL ||
  				    (reuseport & tw->tw_so_options) == 0)
  					return (EADDRINUSE);
 -			} else if (t && (reuseport == 0 ||
 -			    (t->inp_flags2 & INP_REUSEPORT) == 0)) {
 +			} else if (t && (reuseport & inp_so_options(t)) == 0) {
  #ifdef INET6
  				if (ntohl(sin->sin_addr.s_addr) !=
  				    INADDR_ANY ||
 Index: sys/netinet/in_pcb.h
 ===
 --- sys/netinet/in_pcb.h   (revision 251760)
 +++ sys/netinet/in_pcb.h   (working copy)
 @@ -442,6 +442,7 @@ struct tcpcb *
inp_inpcbtotcpcb(struct inpcb *inp);
  void  inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp,
uint32_t *faddr, uint16_t *fp);
 +int   inp_so_options(const struct inpcb *inp);
  
  #endif /* _KERNEL */
  
 @@ -543,6 +544,7 @@ void   inp_4tuple_get(struct inpcb *inp, uint32_t *
  #define   INP_PCBGROUPWILD 0x0004 /* in pcbgroup wildcard list */
  #define   INP_REUSEPORT   0x0008 /* SO_REUSEPORT option is set */
  #define   INP_FREED   0x0010 /* inp itself is not valid */
 +#define   INP_REUSEADDR   0x0020 /* SO_REUSEADDR option is set */
  
  /*
   * Flags passed to in_pcblookup*() functions.
 Index: sys/netinet/ip_output.c
 ===
 --- sys/netinet/ip_output.c(revision 251760)
 +++ sys/netinet/ip_output.c(working copy)
 @@ -900,13 +900,10 @@ ip_ctloutput(struct socket *so, struct sockopt *so
  		switch (sopt->sopt_name) {
  		case SO_REUSEADDR:
  			INP_WLOCK(inp);
 -			if (IN_MULTICAST(ntohl(inp->inp_laddr.s_addr))) {
 -				if ((so->so_options &
 -				    (SO_REUSEADDR | SO_REUSEPORT)) != 0)
 -					inp->inp_flags2 |= INP_REUSEPORT;
 -				else
 -					inp->inp_flags2 &= ~INP_REUSEPORT;
 -			}
 +			if ((so->so_options & SO_REUSEADDR) != 0)
 +				inp->inp_flags2 |= INP_REUSEADDR;
 +			else
 +				inp->inp_flags2 &= ~INP_REUSEADDR;
  			INP_WUNLOCK(inp

Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled incorrectly

2013-06-24 Thread Mikolaj Golub
The following reply was made to PR kern/179901; it has been noted by GNATS.

From: Mikolaj Golub troc...@freebsd.org
To: bug-follo...@freebsd.org, free...@grem.de
Cc:  
Subject: Re: kern/179901: [netinet] [patch] Multicast SO_REUSEADDR handled
 incorrectly
Date: Mon, 24 Jun 2013 23:29:42 +0300

 --9amGYk9869ThD9tj
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 Michael,
 
 Thank you for your analysis and the patch.
 
 I have the following notes on your patch, though:
 
 1) INET6 needs fixing too.
 
 2) It looks like after introducing INP_REUSEADDR there is no need to
 handle the IN_MULTICAST case in ip_ctloutput().
 
 3) Actually you don't have to use IN_MULTICAST() in in_pcbbind_setup():
 the information is already encoded in reuseport variable.
 
 4) The patch not only fixes the regression introduced by r227207, but
 also changes the historical behavior before r227207. Although the
 change might be correct it is better to separate these issues. Feeling
 guilty for the regression introduced in r227207 I am eager to fix it
 ASAP, before 9.2 release. But I don't have strong opinion about
 changing the historical behavior.
 
 So, could you please look at the attached patch, which is based on
 your idea of INP_REUSEADDR flag? Now the code more resembles the code
 before r227207 in looks and I am a little more confident that there is
 no regression.
 
 I would appreciate any testing. Note, it is against CURRENT; STABLE
 will require patching in_pcb.h manually due to newly introduced
 INP_FREED flag.
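 
 The idea behind the INP_REUSEADDR flag (cache each socket option in the PCB when it is set, then translate the cached bits back when checking for conflicts) can be sketched in userspace as follows. All names and constant values below (M_*, model_inpcb, model_cache_option, model_so_options) are hypothetical and chosen only for this illustration; they are not the kernel's definitions.

```c
/* Hypothetical values for this model; the kernel's constants may differ. */
#define	M_SO_REUSEADDR	0x0004
#define	M_SO_REUSEPORT	0x0200
#define	M_INP_REUSEPORT	0x0008
#define	M_INP_REUSEADDR	0x0020

struct model_inpcb {
	int	inp_flags2;	/* cached copies of selected socket options */
};

/* Mirror one socket option bit into its cached PCB flag. */
void
model_cache_option(struct model_inpcb *inp, int so_options, int so_bit,
    int inp_bit)
{
	if ((so_options & so_bit) != 0)
		inp->inp_flags2 |= inp_bit;
	else
		inp->inp_flags2 &= ~inp_bit;
}

/* Translate the cached PCB flags back into socket-option bits. */
int
model_so_options(const struct model_inpcb *inp)
{
	int so_options = 0;

	if ((inp->inp_flags2 & M_INP_REUSEPORT) != 0)
		so_options |= M_SO_REUSEPORT;
	if ((inp->inp_flags2 & M_INP_REUSEADDR) != 0)
		so_options |= M_SO_REUSEADDR;
	return (so_options);
}
```

 With both options cached independently, a bind-time check only needs to AND the new socket's mask with model_so_options() of the existing one.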
 
 -- 
 Mikolaj Golub
 
 --9amGYk9869ThD9tj
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename=pr179901.1.patch
 
 Index: sys/netinet/in_pcb.c
 ===
 --- sys/netinet/in_pcb.c   (revision 252162)
 +++ sys/netinet/in_pcb.c   (working copy)
 @@ -467,6 +467,23 @@ in_pcb_lport(struct inpcb *inp, struct in_addr *la
  
return (0);
  }
 +
 +/*
 + * Return cached socket options.
 + */
 +int
 +inp_so_options(const struct inpcb *inp)
 +{
 +   int so_options;
 +
 +   so_options = 0;
 +
 +   if ((inp->inp_flags2 & INP_REUSEPORT) != 0)
 +           so_options |= SO_REUSEPORT;
 +   if ((inp->inp_flags2 & INP_REUSEADDR) != 0)
 +           so_options |= SO_REUSEADDR;
 +   return (so_options);
 +}
  #endif /* INET || INET6 */
  
  #ifdef INET
 @@ -595,8 +612,8 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
  				if (tw == NULL ||
  				    (reuseport & tw->tw_so_options) == 0)
  					return (EADDRINUSE);
 -			} else if (t && (reuseport == 0 ||
 -			    (t->inp_flags2 & INP_REUSEPORT) == 0)) {
 +			} else if (t &&
 +			    (reuseport & inp_so_options(t)) == 0) {
  #ifdef INET6
  				if (ntohl(sin->sin_addr.s_addr) !=
  				    INADDR_ANY ||
 Index: sys/netinet/in_pcb.h
 ===
 --- sys/netinet/in_pcb.h   (revision 252162)
 +++ sys/netinet/in_pcb.h   (working copy)
 @@ -442,6 +442,7 @@ struct tcpcb *
inp_inpcbtotcpcb(struct inpcb *inp);
  void  inp_4tuple_get(struct inpcb *inp, uint32_t *laddr, uint16_t *lp,
uint32_t *faddr, uint16_t *fp);
 +int   inp_so_options(const struct inpcb *inp);
  
  #endif /* _KERNEL */
  
 @@ -543,6 +544,7 @@ void   inp_4tuple_get(struct inpcb *inp, uint32_t *
  #define   INP_PCBGROUPWILD 0x0004 /* in pcbgroup wildcard list */
  #define   INP_REUSEPORT   0x0008 /* SO_REUSEPORT option is set */
  #define   INP_FREED   0x0010 /* inp itself is not valid */
 +#define   INP_REUSEADDR   0x0020 /* SO_REUSEADDR option is set */
  
  /*
   * Flags passed to in_pcblookup*() functions.
 Index: sys/netinet/ip_output.c
 ===
 --- sys/netinet/ip_output.c(revision 252162)
 +++ sys/netinet/ip_output.c(working copy)
 @@ -900,13 +900,10 @@ ip_ctloutput(struct socket *so, struct sockopt *so
  		switch (sopt->sopt_name) {
  		case SO_REUSEADDR:
  			INP_WLOCK(inp);
 -			if (IN_MULTICAST(ntohl(inp->inp_laddr.s_addr))) {
 -				if ((so->so_options &
 -				    (SO_REUSEADDR | SO_REUSEPORT)) != 0)
 -					inp->inp_flags2 |= INP_REUSEPORT;
 -				else
 -					inp->inp_flags2 &= ~INP_REUSEPORT;
 -			}
 +			if ((so->so_options & SO_REUSEADDR) != 0)
 +				inp->inp_flags2 |= INP_REUSEADDR

Re: kern/167059: [tcp] [panic] System does panic in in_pcbbind() and hangs

2013-05-18 Thread Mikolaj Golub
The following reply was made to PR kern/167059; it has been noted by GNATS.

From: Mikolaj Golub troc...@freebsd.org
To: bug-follo...@freebsd.org, yeho...@gmail.com
Cc:  
Subject: Re: kern/167059: [tcp] [panic] System does panic in in_pcbbind() and
 hangs
Date: Sat, 18 May 2013 22:15:26 +0300

 This looks similar to the issue fixed in 9.0 (r227207 + r227449).
 
 There was a discussion on freebsd-net@ titled "Kernel panic on FreeBSD
 9.0-beta2":
 
   http://lists.freebsd.org/pipermail/freebsd-net/2011-September/029858.html
 
 Is there any chance you can check >=9.0?
 
 -- 
 Mikolaj Golub


lagg with wireless iface: ieee80211_waitfor_parent is called with a non-sleepable lock held

2012-12-02 Thread Mikolaj Golub
Hi,

On my laptop I have lagg set up in failover mode between wired and
wireless interfaces, as described in the Handbook:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-aggregation.html#networking-lagg-wired-and-wireless

On start I have been observing witness warnings like below:

taskqueue_drain with the following non-sleepable locks held:
exclusive rw if_lagg rwlock (if_lagg rwlock) r = 0 (0xfe000aa9d408) locked @ /home/golub/freebsd/base/head/sys/modules/if_lagg/../../net/if_lagg.c:1065
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b
kdb_backtrace() at kdb_backtrace+0x39
witness_warn() at witness_warn+0x4b2
taskqueue_drain() at taskqueue_drain+0x3a
ieee80211_waitfor_parent() at ieee80211_waitfor_parent+0x28
ieee80211_ioctl() at ieee80211_ioctl+0x3e9
if_setflag() at if_setflag+0xc0
ifpromisc() at ifpromisc+0x2c
lagg_ioctl() at lagg_ioctl+0x7d5
if_setflag() at if_setflag+0xc0
ifpromisc() at ifpromisc+0x2c
bridge_ioctl_add() at bridge_ioctl_add+0x454
bridge_ioctl() at bridge_ioctl+0x268
in_control() at in_control+0x219
ifioctl() at ifioctl+0x1896
kern_ioctl() at kern_ioctl+0x1b0
sys_ioctl() at sys_ioctl+0x11f
amd64_syscall() at amd64_syscall+0x282
Xfast_syscall() at Xfast_syscall+0xfb
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011815ca, rsp = 0x7fffd3f8, rbp = 0x7fffd4a0 ---

and eventually the panic "Sleeping thread owns a non-sleepable lock"
in lagg_input, when a packet arrives simultaneously with an ifconfig run.

lagg takes the if_lagg rwlock before going to if_setflag, which ends up
calling ieee80211_ioctl and ieee80211_waitfor_parent (wait for all
deferred parent interface tasks to complete). As the backtrace shows,
ieee80211_waitfor_parent calls taskqueue_drain, which may sleep.

Does anybody see a way to solve this?

-- 
Mikolaj Golub


Re: Proposal for changes to network device drivers and network stack (RFC)

2012-09-21 Thread Mikolaj Golub
On Fri, Sep 07, 2012 at 01:28:16AM -0700, Anuranjan Shukla wrote:
 Hi George,
 Thanks for taking a look. Some answers/comments below.
 
 
  Building FreeBSD without the network stack (network stack as a module)
  --
 
 This would be interesting for many reasons, and I think it would be a good
 contribution.  Does the work you've done in this area handle the VNET
 stuff that is in the stack as well?  That is, how well does the network
 stack
 as a module play with the vnet architecture?
 
 I'll follow up on this one separately.

FYI, there is at least this issue with virtualized global variables in modules:

http://lists.freebsd.org/pipermail/freebsd-virtualization/2011-July/000737.html

On archs that use link_elf.c (i.e. all except amd64, which uses
link_elf_obj.c), virtualized global variables in modules cannot be
accessed from other modules, because on module load link_elf does
relocation only for VNET variables defined in that module.

As Marko Zec pointed out, DPCPU has the same issue.

The latest patch I have (both for VNET and DPCPU):

http://people.freebsd.org/~trociny/link_elf.c.pcpu_vnet.patch

The fix is to make the linker, on module load, recognize external
VNET/DPCPU variables defined in previously loaded modules and
relocate them accordingly. For this, set_pcpu_list and set_vnet_list
are used, which store the addresses of the modules' 'set_pcpu' and
'set_vnet' linker sets.

-- 
Mikolaj Golub


Re: net.inet.tcp.hostcache.list: RTTVAR value

2012-07-03 Thread Mikolaj Golub

On Mon, 02 Jul 2012 15:04:10 +0200 Andre Oppermann wrote:

 AO> On 01.07.2012 18:30, Mikolaj Golub wrote:
 >> Hi,
 >>
 >> It looks to me that in the calculation of the RTTVAR value for the
 >> net.inet.tcp.hostcache.list sysctl the wrong scale is used: TCP_RTT_SCALE
 >> instead of TCP_RTTVAR_SCALE. See the attached patch. I am going to commit it
 >> if nobody tells me that I am wrong here.

 AO> Correct.

Thanks! Committed.

-- 
Mikolaj Golub


netstat(1): negative tcp timer counters

2012-07-01 Thread Mikolaj Golub
Hi,

I have noticed that `netstat -x' shows negative values for the keep timer. In
my case this happens for connections in the CLOSE state.

Reviewing the timer code, it looks like there is an issue in the tcp_timer_*
functions where inp is checked for INP_DROPPED: if the flag is set, the
function returns and callout_deactivate() is never called. By adding some
printfs I confirmed that the negative counters observed in my case were due to
this check.

The attached patch (check for INP_DROPPED after callout_deactivate) fixes the
issue for me. I would like to commit it if there are no objections.
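
To illustrate the ordering problem, here is a tiny userspace model. It is a sketch, not kernel code: the model_* names are hypothetical, and it assumes that the value shown for a timer is computed as "expiry time minus current time" whenever the timer is still marked active, which is what would produce a negative number for a timer that has fired but was never deactivated.

```c
struct model_timer {
	int	active;		/* set when armed, cleared on deactivate */
	int	expiry;		/* tick at which the timer fires */
};

/* Assumed reporting rule: time left for an active timer, 0 otherwise. */
int
model_remaining(const struct model_timer *t, int now)
{
	return (t->active ? t->expiry - now : 0);
}

/* Old ordering: the "dropped" check returns before deactivation. */
void
model_handler_old(struct model_timer *t, int dropped)
{
	if (dropped)
		return;
	t->active = 0;
}

/* Patched ordering: deactivate first, then bail out if dropped. */
void
model_handler_new(struct model_timer *t, int dropped)
{
	t->active = 0;
	if (dropped)
		return;
	/* Normal timer processing would follow here. */
}
```

With the old ordering a dropped connection keeps its timer marked active after the expiry time has passed, so the model reports a negative remaining time; with the patched ordering it reports 0.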

-- 
Mikolaj Golub

Index: sys/netinet/tcp_timer.c
===
--- sys/netinet/tcp_timer.c	(revision 237918)
+++ sys/netinet/tcp_timer.c	(working copy)
@@ -183,13 +183,18 @@ tcp_timer_delack(void *xtp)
 		return;
 	}
 	INP_WLOCK(inp);
-	if ((inp->inp_flags & INP_DROPPED) || callout_pending(&tp->t_timers->tt_delack)
-	    || !callout_active(&tp->t_timers->tt_delack)) {
+	if (callout_pending(&tp->t_timers->tt_delack) ||
+	    !callout_active(&tp->t_timers->tt_delack)) {
 		INP_WUNLOCK(inp);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_delack);
+	if ((inp->inp_flags & INP_DROPPED) != 0) {
+		INP_WUNLOCK(inp);
+		CURVNET_RESTORE();
+		return;
+	}
 
 	tp->t_flags |= TF_ACKNOW;
 	TCPSTAT_INC(tcps_delack);
@@ -229,7 +234,7 @@ tcp_timer_2msl(void *xtp)
 	}
 	INP_WLOCK(inp);
 	tcp_free_sackholes(tp);
-	if ((inp->inp_flags & INP_DROPPED) || callout_pending(&tp->t_timers->tt_2msl) ||
+	if (callout_pending(&tp->t_timers->tt_2msl) ||
 	    !callout_active(&tp->t_timers->tt_2msl)) {
 		INP_WUNLOCK(tp->t_inpcb);
 		INP_INFO_WUNLOCK(&V_tcbinfo);
@@ -237,6 +242,12 @@ tcp_timer_2msl(void *xtp)
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_2msl);
+	if ((inp->inp_flags & INP_DROPPED) != 0) {
+		INP_WUNLOCK(inp);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
+		CURVNET_RESTORE();
+		return;
+	}
 	/*
 	 * 2 MSL timeout in shutdown went off.  If we're closed but
 	 * still waiting for peer to close and connection has been idle
@@ -300,14 +311,20 @@ tcp_timer_keep(void *xtp)
 		return;
 	}
 	INP_WLOCK(inp);
-	if ((inp->inp_flags & INP_DROPPED) || callout_pending(&tp->t_timers->tt_keep)
-	    || !callout_active(&tp->t_timers->tt_keep)) {
+	if (callout_pending(&tp->t_timers->tt_keep) ||
+	    !callout_active(&tp->t_timers->tt_keep)) {
 		INP_WUNLOCK(inp);
 		INP_INFO_WUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_keep);
+	if ((inp->inp_flags & INP_DROPPED) != 0) {
+		INP_WUNLOCK(inp);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
+		CURVNET_RESTORE();
+		return;
+	}
 	/*
 	 * Keep-alive timer went off; send something
 	 * or drop connection if idle for too long.
@@ -397,14 +414,20 @@ tcp_timer_persist(void *xtp)
 		return;
 	}
 	INP_WLOCK(inp);
-	if ((inp->inp_flags & INP_DROPPED) || callout_pending(&tp->t_timers->tt_persist)
-	    || !callout_active(&tp->t_timers->tt_persist)) {
+	if (callout_pending(&tp->t_timers->tt_persist) ||
+	    !callout_active(&tp->t_timers->tt_persist)) {
 		INP_WUNLOCK(inp);
 		INP_INFO_WUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_persist);
+	if ((inp->inp_flags & INP_DROPPED) != 0) {
+		INP_WUNLOCK(inp);
+		INP_INFO_WUNLOCK(&V_tcbinfo);
+		CURVNET_RESTORE();
+		return;
+	}
 	/*
 	 * Persistance timer into zero window.
 	 * Force a byte to be output, if possible.
@@ -469,14 +492,20 @@ tcp_timer_rexmt(void * xtp)
 		return;
 	}
 	INP_WLOCK(inp);
-	if ((inp->inp_flags & INP_DROPPED) || callout_pending(&tp->t_timers->tt_rexmt)
-	    || !callout_active(&tp->t_timers->tt_rexmt)) {
+	if (callout_pending(&tp->t_timers->tt_rexmt) ||
+	    !callout_active(&tp->t_timers->tt_rexmt)) {
 		INP_WUNLOCK(inp);
 		INP_INFO_RUNLOCK(&V_tcbinfo);
 		CURVNET_RESTORE();
 		return;
 	}
 	callout_deactivate(&tp->t_timers->tt_rexmt);
+	if ((inp->inp_flags & INP_DROPPED) != 0) {
+		INP_WUNLOCK(inp);
+		INP_INFO_RUNLOCK(&V_tcbinfo);
+		CURVNET_RESTORE();
+		return;
+	}
 	tcp_free_sackholes(tp);
 	/*
 	 * Retransmission timer went off.  Message has not

Re: bin/151937: [patch] netstat(1) utility lack support of displaying rtt related counters of tcp sockets

2012-07-01 Thread Mikolaj Golub
Hi,

Mykola, thank you for the report and the provided patch. Displaying rtt
related counters per connection looks useful to me too.

I am attaching the modified version of the patch to discuss (and commit if
there are no objections or other suggestions).

The differences from your version:

1) The '-T' option is already used. Also, I am not keen on adding yet
another option, so I added the statistics to the '-x' output. Alternatively,
they could be added to the '-T' statistics.

2) For counter names I used names that are close to the field names in the
tcpcb structure.

3) To get hz I use the kern.hz sysctl instead of kern.clockrate (it
simplifies the code a little), and for the !live case I read it from the dump.

4) The trick with printing to buf is used to pad the counters on the right, as
is done for the other counters.
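
As a rough standalone illustration of points 3) and 4), the conversion boils down to dividing the scaled tick count by (scale * hz) and formatting the result into a buffer. The helper below is a sketch with a hypothetical name (format_rtt); unlike the patch, it takes hz and the buffer as parameters instead of reading kern.hz and using a static buffer.

```c
#include <stdio.h>

/*
 * Format a scaled tick count as seconds with three decimals.
 * val is the scaled counter, scale its fixed-point multiplier and
 * hz the number of ticks per second.  Returns buf for convenience.
 */
const char *
format_rtt(char *buf, size_t size, int val, int scale, int hz)
{
	snprintf(buf, size, "%.3f", (float)val / (scale * hz));
	return (buf);
}
```

The caller can then print the returned string with a fixed-width conversion such as " %7s" so that the column lines up with the other counters.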

Also, it might be enough to display only srtt and rttvar statistics?

-- 
Mikolaj Golub
Index: usr.bin/netstat/inet.c
===
--- usr.bin/netstat/inet.c	(revision 237835)
+++ usr.bin/netstat/inet.c	(working copy)
@@ -293,6 +293,28 @@ fail:
 #undef KREAD
 }
 
+static const char *
+humanize_rtt(int val, int scale)
+{
+	size_t len;
+	static int hz;
+	static char buf[16];
+
+	if (hz == 0) {
+		hz = 1;
+		if (live) {
+			len = sizeof(hz);
+			if (sysctlbyname("kern.hz", &hz, &len, NULL, 0) == -1)
+				warn("sysctl: kern.hz");
+		} else {
+			kread(hz_addr, &hz, sizeof(hz));
+		}
+	}
+	snprintf(buf, sizeof(buf), "%.3f", (float)val / (scale * hz));
+
+	return (buf);
+}
+
 /*
  * Print a summary of connections related to an Internet
  * protocol.  For TCP, also give state of connection.
@@ -441,6 +463,8 @@ protopr(u_long off, const char *name, int af1, int
 				printf(" %7.7s %7.7s %7.7s %7.7s %7.7s %7.7s",
 				    "rexmt", "persist", "keep",
 				    "2msl", "delack", "rcvtime");
+				printf(" %7.7s %7.7s %7.7s %9.9s",
+				    "srtt", "rttvar", "rttlow", "rttupdate");
 			}
 			putchar('\n');
 			first = 0;
@@ -548,6 +572,14 @@ protopr(u_long off, const char *name, int af1, int
 				    timer->tt_2msl / 1000, (timer->tt_2msl % 1000) / 10,
 				    timer->tt_delack / 1000, (timer->tt_delack % 1000) / 10,
 				    timer->t_rcvtime / 1000, (timer->t_rcvtime % 1000) / 10);
+			if (tp != NULL) {
+				printf(" %7s", humanize_rtt(tp->t_srtt,
+				    TCP_RTT_SCALE));
+				printf(" %7s", humanize_rtt(tp->t_rttvar,
+				    TCP_RTTVAR_SCALE));
+				printf(" %7s", humanize_rtt(tp->t_rttlow, 1));
+				printf(" %9lu ", tp->t_rttupdated);
+			}
 		}
 		if (istcp && !Lflag && !xflag && !Tflag) {
 			if (tp->t_state < 0 || tp->t_state >= TCP_NSTATES)
Index: usr.bin/netstat/main.c
===
--- usr.bin/netstat/main.c	(revision 237835)
+++ usr.bin/netstat/main.c	(working copy)
@@ -184,6 +184,8 @@ static struct nlist nl[] = {
 	{ .n_name = "_arpstat" },
 #define	N_UNP_SPHEAD	56
 	{ .n_name = "unp_sphead" },
+#define	N_HZ		57
+	{ .n_name = "_hz" },
 	{ .n_name = NULL },
 };
 
@@ -358,6 +360,8 @@ int	unit;		/* unit number for above */
 int	af;		/* address family */
 int	live;		/* true if we are examining a live system */
 
+u_long	hz_addr;	/* address of hz variable in kernel memory */
+
 int
 main(int argc, char *argv[])
 {
@@ -563,6 +567,7 @@ main(int argc, char *argv[])
 	 */
 #endif
 	kread(0, NULL, 0);
+	hz_addr = nl[N_HZ].n_value;
 	if (iflag && !sflag) {
 		intpr(interval, nl[N_IFNET].n_value, NULL);
 		exit(0);
Index: usr.bin/netstat/netstat.h
===
--- usr.bin/netstat/netstat.h	(revision 237835)
+++ usr.bin/netstat/netstat.h	(working copy)
@@ -59,6 +59,8 @@ extern int	unit;	/* unit number for above */
 extern int	af;	/* address family */
 extern int	live;	/* true if we are examining a live system */
 
+extern u_long	hz_addr;	/* address of hz variable in kernel memory */
+
 int	kread(u_long addr, void *buf, size_t size);
 const char *plural(uintmax_t);
 const char *plurales(uintmax_t);

net.inet.tcp.hostcache.list: RTTVAR value

2012-07-01 Thread Mikolaj Golub
Hi,

It looks to me that in the calculation of the RTTVAR value for the
net.inet.tcp.hostcache.list sysctl the wrong scale is used: TCP_RTT_SCALE
instead of TCP_RTTVAR_SCALE. See the attached patch. I am going to commit it
if nobody tells me that I am wrong here.
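
To show the size of the error, here is a small standalone sketch of the conversion. The constant values are the ones I understand FreeBSD's headers to define (RTM_RTTUNIT = 1000000, TCP_RTT_SCALE = 32, TCP_RTTVAR_SCALE = 16); treat them as assumptions, and note that hc_to_usec is a made-up name.

```c
/* Assumed values, copied here so the example is self-contained. */
#define	RTM_RTTUNIT		1000000	/* route metric units per second */
#define	TCP_RTT_SCALE		32	/* fixed-point multiplier for srtt */
#define	TCP_RTTVAR_SCALE	16	/* fixed-point multiplier for rttvar */

/*
 * Mirrors the expression used in the sysctl handler:
 * value * (RTM_RTTUNIT / (hz * scale)), with integer division.
 */
long
hc_to_usec(long value, int hz, int scale)
{
	return (value * (RTM_RTTUNIT / (hz * scale)));
}
```

With hz = 1000 and a stored value of 100, TCP_RTTVAR_SCALE gives 6200 while TCP_RTT_SCALE gives 3100, so using the wrong scale reports roughly half of the intended RTTVAR.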

-- 
Mikolaj Golub

Index: sys/netinet/tcp_hostcache.c
===
--- sys/netinet/tcp_hostcache.c	(revision 237918)
+++ sys/netinet/tcp_hostcache.c	(working copy)
@@ -624,7 +624,7 @@ sysctl_tcp_hc_list(SYSCTL_HANDLER_ARGS)
 			msec(hc_entry->rmx_rtt *
 			    (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))),
 			msec(hc_entry->rmx_rttvar *
-			    (RTM_RTTUNIT / (hz * TCP_RTT_SCALE))),
+			    (RTM_RTTUNIT / (hz * TCP_RTTVAR_SCALE))),
 			hc_entry->rmx_bandwidth * 8,
 			hc_entry->rmx_cwnd,
 			hc_entry->rmx_sendpipe,

Re: bsnmp and HOST-RESOURCES-MIB

2012-06-22 Thread Mikolaj Golub

On Thu, 21 Jun 2012 19:23:33 +0700 Eugene Grosbein wrote:

 EG Hi!

 EG bsnmpd(1) has /usr/lib/snmp_hostres.so module in base system
 EG for HOST-RESOURCES-MIB implementation. What should I do to make
 EG bsnmpwalk -v 2c -s comm@localhost 1.3.6.1.2.1.25.3.3.1.2

 EG work without complaining:

 EG bsnmpwalk: Invalid OID - 1.3.6.1.2.1.25.3.3.1.2
 EG OID parsing error - 1.3.6.1.2.1.25.3.3.1.2

 EG And without -n flag, please :-)
 EG I'd like it to resolve OIDs to their names.

I am not very familiar with bsnmptools. Experimenting, I have found that the
following combinations work:

in138:~% bsnmpwalk -v 1 -s public@localhost -i hostres_tree.def 'hrProcessorTable'
hrProcessorFrwID[5] = 0.0
hrProcessorFrwID[10] = 0.0
hrProcessorLoad[5] = 7
hrProcessorLoad[10] = 5
in138:~% bsnmpget -v 1 -s public@localhost -i hostres_tree.def 'hrProcessorLoad.5'
hrProcessorLoad[5] = 8

Note that you have to specify hostres_tree.def (from /usr/share/snmp/defs)
explicitly for bsnmptools to be able to resolve names (no idea why).

Unfortunately, bsnmpwalk does not work for hrProcessorLoad:

in138:~% bsnmpwalk -v 1 -s public@localhost -i hostres_tree.def 'hrProcessorLoad'
bsnmpwalk: Snmp dialog - Operation timed out

Although it works for the numerical format:

in138:~% bsnmpwalk -v 1 -s public@localhost '1.3.6.1.2.1.25.3.3.1.2'
1.3.6.1.2.1.25.3.3.1.2.5 = 10
1.3.6.1.2.1.25.3.3.1.2.10 = 10
in138:~% bsnmpwalk -v 1 -s public@localhost -i hostres_tree.def '1.3.6.1.2.1.25.3.3.1.2'
hrProcessorLoad[5] = 10
hrProcessorLoad[10] = 6

Also, no idea why.

-- 
Mikolaj Golub


Re: soreceive_stream: mbuf leak if called with mp0 and MSG_WAITALL

2012-03-15 Thread Mikolaj Golub

On Mon, 12 Mar 2012 22:01:49 +0100 Andre Oppermann wrote:

 AO Yes, doesn't compute this way.  I've put in your fix in this revision:

 AO http://svn.freebsd.org/changeset/base/232867

Running your branch, smbfs tests have passed and no issues have been detected
so far.

-- 
Mikolaj Golub


Re: soreceive_stream: mbuf leak if called with mp0 and MSG_WAITALL

2012-03-08 Thread Mikolaj Golub
Hi,

On Tue, 06 Mar 2012 20:50:34 +0100 Andre Oppermann wrote:

 AO On 05.09.2011 21:58, Mikolaj Golub wrote:
 
  On Sun, 04 Sep 2011 12:30:53 +0300 Mikolaj Golub wrote:
 
MG  Apparently soreceive_stream() has an issue if it is called to receive data as a
MG  mbuf chain (by supplying an non zero mbuf **mp0) and with MSG_WAITALL set.
 
MG  I ran into this issue with smbfs, which uses soreceive() exactly in this way
MG  (see netsmb/smb_trantcp.c:nbssn_recv()).
 
  Stressing smbfs a little I also observed the following soreceive_stream()
  related panic:

 AO Hi Mikolaj,

 AO thank you very much for testing, reporting and fixing bugs in soreceive_stream().

 AO I've altered your proposed patches a bit and committed them into my workqueue
 AO with the following revisions:

 AO http://svn.freebsd.org/changeset/base/232617
 AO http://svn.freebsd.org/changeset/base/232618

 AO Would you mind testing them again before they go into HEAD?

With this patch smb mount fails with the error:

smb_iod_recvall: tran return NULL without error

 AO Index: sys/kern/uipc_socket.c
 AO ===
 AO --- sys/kern/uipc_socket.c (revision 232616)
 AO +++ sys/kern/uipc_socket.c (revision 232617)
 AO @@ -2044,7 +2044,7 @@ deliver:
 AO  	if (mp0 != NULL) {
 AO  		/* Dequeue as many mbufs as possible. */
 AO  		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
 AO -			for (*mp0 = m = sb->sb_mb;
 AO +			for (m = sb->sb_mb;
 AO  			     m != NULL && m->m_len <= len;
 AO  			     m = m->m_next) {
 AO  				len -= m->m_len;
 AO @@ -2052,10 +2052,15 @@ deliver:
 AO  				sbfree(sb, m);
 AO  				n = m;
 AO  			}
 AO +			n->m_next = NULL;
 AO  			sb->sb_mb = m;
 AO +			sb->sb_lastrecord = sb->sb_mb;
 AO  			if (sb->sb_mb == NULL)
 AO  				SB_EMPTY_FIXUP(sb);
 AO -			n->m_next = NULL;
 AO +			if (*mp0 != NULL)
 AO +				m_cat(*mp0, m);
 AO +			else
 AO +				*mp0 = m;
 AO  		}

At that moment m points to the end of the chain. Shouldn't *mp0 be set to
sb->sb_mb before the for loop?
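
To illustrate the point, here is a small userland sketch with a toy stand-in
for struct mbuf. The names toy_mbuf and dequeue_prefix() are made up for the
illustration and it is not the kernel code; it only shows that after the loop
m refers to what is left in the queue, so the head of the dequeued part has to
be saved before the loop starts.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct mbuf: only the fields the loop touches. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
	int m_len;
};

/*
 * Detach from *queue every leading buffer that fits entirely into len
 * bytes and return the head of the detached chain (NULL if none fits).
 */
struct toy_mbuf *
dequeue_prefix(struct toy_mbuf **queue, int len)
{
	struct toy_mbuf *head, *m, *n;

	assert(queue != NULL);
	head = *queue;		/* saved before the loop advances m */
	n = NULL;
	for (m = *queue; m != NULL && m->m_len <= len; m = m->m_next) {
		len -= m->m_len;
		n = m;		/* last buffer taken so far */
	}
	if (n == NULL)
		return (NULL);	/* nothing fit */
	n->m_next = NULL;	/* cut the taken part off */
	*queue = m;		/* m is the remainder, not the taken part */
	return (head);
}
```

Returning m instead of head here would hand the caller the remainder of the
queue (or NULL), which matches the "tran return NULL without error" symptom
only by analogy; the sketch does not claim to reproduce it.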

 AO  		/* Copy the remainder. */
 AO  		if (len > 0) {
 AO @@ -2066,9 +2071,9 @@ deliver:
 AO  			if (m == NULL)
 AO  				len = 0;	/* Don't flush data from sockbuf. */
 AO  			else
 AO -				uio->uio_resid -= m->m_len;
 AO +				uio->uio_resid -= len;
 AO  			if (*mp0 != NULL)
 AO -				n->m_next = m;
 AO +				m_cat(*mp0, m);
 AO  			else
 AO  				*mp0 = m;
 AO  			if (*mp0 == NULL) {
 AO 

-- 
Mikolaj Golub


Re: Kernel panic on FreeBSD 9.0-beta2

2011-10-12 Thread Mikolaj Golub

On Wed, 12 Oct 2011 09:53:34 +0800 dave jones wrote:

 dj On Fri, Oct 7, 2011 at 9:12 AM, dave jones  wrote:
  2011/10/4 Mikolaj Golub :
 
  On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:
 
   dj On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
   
    On Wed, 28 Sep 2011, Mikolaj Golub wrote:
   
    On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
   
    KM Sorry, didn't look at the images (limited bw), I've seen 
  something KM
    like this before in timewait. This can't happen with UDP so will be 
  KM
    interested in learning more about the bug.
   
    The panic can be easily triggered by this:
   
    Hi:
   
    Just catching up on this thread.  I think the analysis here is 
  generally
    right: in 9.0, you're much more likely to see an inpcb with its 
  in_socket
    pointer cleared in the hash list than in prior releases, and
    in_pcbbind_setup() trips over this.
   
    However, at least on first glance (and from the perspective of 
  invariants
    here), I think the bug is actualy that in_pcbbind_setup() is asking
    in_pcblookup_local() for an inpcb and then access the returned inpcb's
    in_socket pointer without acquiring a lock on the inpcb.  
  Structurally, it
    can't acquire this lock for lock order reasons -- it already holds the 
  lock
    on its own inpcb.  Therefore, we should only access fields that are 
  safe to
    follow in an inpcb when you hold a reference via the hash lock and not 
  a
    lock on the inpcb itself, which appears generally OK (+/-) for all the
    fields in that clause but the t-inp_socket-so_options dereference.
   
    A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer 
  field,
    such as inp_flags2, giving us access to its value without having to 
  walk
    into the attached (or not) socket.
   
    This raises another structural question, which is whether we need a new
    inp_foo flags field that is protected explicitly by the hash lock, and 
  not
    by the inpcb lock, which could hold fields relevant to address 
  binding.  I
    don't think we need to solve that problem in this context, as a 
  slightly
    race on SO_REUSEPORT is likely acceptable.
   
    The suggested fix does perform the desired function of explicitly 
  detaching
    the inpcb from the hash list before the socket is disconnected from the
    inpcb. However, it's incomplete in that the invariant that's being 
  broken is
    also relied on for other protocols (such as raw sockets).  The correct
    invariant is that inp_socket is safe to follow unconditionally if an 
  inpcb
    is locked and INP_DROPPED isn't set -- the bug is in locked not in
    INP_DROPPED, which is why I think this is the wrong fix, even though 
  it
    prevents a panic :-).
 
   dj Hello Robert,
 
   dj Thank you for taking your valuable time to find out the problem.
   dj Since I don't have idea about network internals, would you have a 
  patch
   dj about this? I'd be glad to test it, thanks again.
 
  Here is the patch that implements what Robert suggests.
 
  Dave, could you test it?
 
  Sure. Thanks for cooking the patch.
  Machines have been running two days now without panic.

Thank you for testing it.

 dj Is there any plan to commit your fix? Thank you.
 dj I'd upgrade to 9.0-release from beta-2 once it's released.

I have an updated version of the patch, which is under review now. I was
waiting for the response before asking you to test it, but it would be
great if you could try it without waiting :-).

As Robert pointed out, the previous version introduced a regression:
SO_REUSEPORT was ignored if setsockopt() was called after bind() (the old
cached value was still used). The updated version fixes this and also contains
several other fixes, the most important of which is that it fixes the panic
for the IPv6 bind case too.
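
The regression can be pictured with a toy userland model. Everything here
(the toy_ structures, the TOY_ bit values, the sync_cache switch) is invented
for the illustration and is not the code from either version of the patch: it
only shows why a flag cached at hash-insertion time goes stale when the option
is changed afterwards, unless the option handler refreshes the cached copy.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_SO_REUSEPORT	0x0200	/* socket-layer option bit (illustrative) */
#define	TOY_INP_REUSEPORT	0x0008	/* pcb-layer cached copy (illustrative) */

struct toy_socket {
	int so_options;
};

struct toy_inpcb {
	struct toy_socket *inp_socket;
	int inp_flags2;
};

/* Snapshot the option when the pcb is inserted into the hash (at bind). */
void
toy_inshash(struct toy_inpcb *inp)
{
	assert(inp != NULL && inp->inp_socket != NULL);
	if (inp->inp_socket->so_options & TOY_SO_REUSEPORT)
		inp->inp_flags2 |= TOY_INP_REUSEPORT;
}

/* Change the option; refresh the cached bit only if sync_cache is set. */
void
toy_setsockopt_reuseport(struct toy_inpcb *inp, int on, int sync_cache)
{
	assert(inp != NULL && inp->inp_socket != NULL);
	if (on)
		inp->inp_socket->so_options |= TOY_SO_REUSEPORT;
	else
		inp->inp_socket->so_options &= ~TOY_SO_REUSEPORT;
	if (!sync_cache)
		return;		/* behaviour with the regression */
	if (on)
		inp->inp_flags2 |= TOY_INP_REUSEPORT;
	else
		inp->inp_flags2 &= ~TOY_INP_REUSEPORT;
}
```

Calling toy_inshash() first and toy_setsockopt_reuseport(inp, 1, 0) afterwards
leaves the cached bit clear even though the socket option is set.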

-- 
Mikolaj Golub

Index: sys/netinet/in_pcb.c
===
--- sys/netinet/in_pcb.c	(revision 226165)
+++ sys/netinet/in_pcb.c	(working copy)
@@ -575,8 +575,7 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				     ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
 				    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
 				     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
-				     (t->inp_socket->so_options &
-					 SO_REUSEPORT) == 0) &&
+				     (t->inp_flags2 & INP_REUSEPORT) == 0) &&
 				    (inp->inp_cred->cr_uid !=
 				     t->inp_cred->cr_uid))
 					return (EADDRINUSE);
@@ -590,19 +589,19 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				 * being in use (for now).  This is better
 				 * than a panic, but not desirable.
 				 */
-				tw = intotw(inp);
+				tw = intotw(t);
 				if (tw == NULL ||
 				    (reuseport & tw->tw_so_options) == 0)
 					return (EADDRINUSE);
-			} else if (t &&
-			    (reuseport & t->inp_socket->so_options) == 0) {
+			} else if (t && (reuseport == 0 ||
+			    (t->inp_flags2 & INP_REUSEPORT) == 0)) {
 #ifdef INET6
 				if (ntohl(sin->sin_addr.s_addr

Re: Kernel panic on FreeBSD 9.0-beta2

2011-10-04 Thread Mikolaj Golub

On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:

 dj On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
 
  On Wed, 28 Sep 2011, Mikolaj Golub wrote:
 
  On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
 
  KM Sorry, didn't look at the images (limited bw), I've seen something KM
  like this before in timewait. This can't happen with UDP so will be KM
  interested in learning more about the bug.
 
  The panic can be easily triggered by this:
 
  Hi:
 
  Just catching up on this thread.  I think the analysis here is generally
  right: in 9.0, you're much more likely to see an inpcb with its in_socket
  pointer cleared in the hash list than in prior releases, and
  in_pcbbind_setup() trips over this.
 
  However, at least on first glance (and from the perspective of invariants
  here), I think the bug is actualy that in_pcbbind_setup() is asking
  in_pcblookup_local() for an inpcb and then access the returned inpcb's
  in_socket pointer without acquiring a lock on the inpcb.  Structurally, it
  can't acquire this lock for lock order reasons -- it already holds the lock
  on its own inpcb.  Therefore, we should only access fields that are safe to
  follow in an inpcb when you hold a reference via the hash lock and not a
  lock on the inpcb itself, which appears generally OK (+/-) for all the
  fields in that clause but the t-inp_socket-so_options dereference.
 
  A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer field,
  such as inp_flags2, giving us access to its value without having to walk
  into the attached (or not) socket.
 
  This raises another structural question, which is whether we need a new
  inp_foo flags field that is protected explicitly by the hash lock, and not
  by the inpcb lock, which could hold fields relevant to address binding.  I
  don't think we need to solve that problem in this context, as a slightly
  race on SO_REUSEPORT is likely acceptable.
 
  The suggested fix does perform the desired function of explicitly detaching
  the inpcb from the hash list before the socket is disconnected from the
  inpcb. However, it's incomplete in that the invariant that's being broken is
  also relied on for other protocols (such as raw sockets).  The correct
  invariant is that inp_socket is safe to follow unconditionally if an inpcb
  is locked and INP_DROPPED isn't set -- the bug is in locked not in
  INP_DROPPED, which is why I think this is the wrong fix, even though it
  prevents a panic :-).

 dj Hello Robert,

 dj Thank you for taking your valuable time to find out the problem.
 dj Since I don't have idea about network internals, would you have a patch
 dj about this? I'd be glad to test it, thanks again.

Here is the patch that implements what Robert suggests.

Dave, could you test it?
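
As a rough picture of the idea, here is a toy userland sketch. The toy_
structures and the TOY_INP_REUSEPORT value are invented for the illustration
and are not the kernel definitions. The point is only that the conflict check
reads a bit stored in the pcb itself and never follows inp_socket, so it stays
safe when the socket has already been detached.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_INP_REUSEPORT	0x0008	/* illustrative value only */

struct toy_socket {
	int so_options;
};

struct toy_inpcb {
	struct toy_socket *inp_socket;	/* may be NULL once detached */
	int inp_flags2;			/* holds the cached reuseport bit */
};

/* Returns nonzero if the existing pcb t permits sharing its port. */
int
toy_allows_reuseport(const struct toy_inpcb *t)
{
	assert(t != NULL);
	/* No t->inp_socket dereference here, by design. */
	return ((t->inp_flags2 & TOY_INP_REUSEPORT) != 0);
}
```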

  Robert

 dj Best regards,
 dj Dave.

-- 
Mikolaj Golub

Index: sys/netinet/in_pcb.c
===
--- sys/netinet/in_pcb.c	(revision 225885)
+++ sys/netinet/in_pcb.c	(working copy)
@@ -575,8 +575,7 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				     ntohl(t->inp_faddr.s_addr) == INADDR_ANY) &&
 				    (ntohl(sin->sin_addr.s_addr) != INADDR_ANY ||
 				     ntohl(t->inp_laddr.s_addr) != INADDR_ANY ||
-				     (t->inp_socket->so_options &
-					 SO_REUSEPORT) == 0) &&
+				     (t->inp_flags2 & INP_REUSEPORT) == 0) &&
 				    (inp->inp_cred->cr_uid !=
 				     t->inp_cred->cr_uid))
 					return (EADDRINUSE);
@@ -595,14 +594,15 @@ in_pcbbind_setup(struct inpcb *inp, struct sockadd
 				    (reuseport & tw->tw_so_options) == 0)
 					return (EADDRINUSE);
 			} else if (t &&
-			    (reuseport & t->inp_socket->so_options) == 0) {
+			    (reuseport == 0 ||
+			    (t->inp_flags2 & INP_REUSEPORT) == 0)) {
 #ifdef INET6
 				if (ntohl(sin->sin_addr.s_addr) !=
 				    INADDR_ANY ||
 				    ntohl(t->inp_laddr.s_addr) !=
 				    INADDR_ANY ||
-				    INP_SOCKAF(so) ==
-				    INP_SOCKAF(t->inp_socket))
+				    (inp->inp_vflag & INP_IPV6PROTO) == 0 ||
+				    (t->inp_vflag & INP_IPV6PROTO) == 0)
 #endif
 				return (EADDRINUSE);
 			}
@@ -1867,6 +1867,11 @@ in_pcbinshash_internal(struct inpcb *inp, int do_p
 	KASSERT((inp->inp_flags & INP_INHASHLIST) == 0,
 	    ("in_pcbinshash: INP_INHASHLIST"));
 
+	if ((inp->inp_socket->so_options & SO_REUSEPORT) != 0 ||
+	    (IN_MULTICAST(ntohl(inp->inp_laddr.s_addr)) &&
+	    (inp->inp_socket->so_options & SO_REUSEADDR) != 0))
+		inp->inp_flags2 |= INP_REUSEPORT;
+
 #ifdef INET6
 	if (inp->inp_vflag & INP_IPV6)
 		hashkey_faddr = inp->in6p_faddr.s6_addr32[3] /* XXX */;
Index: sys/netinet/in_pcb.h
===
--- sys/netinet/in_pcb.h	(revision 225885)
+++ sys/netinet/in_pcb.h	(working copy)
@@ -540,6 +540,7 @@ void 	inp_4tuple_get(struct inpcb *inp, uint32_t *
 #define	INP_LLE_VALID		0x0001 /* cached lle is valid */	
 #define	INP_RT_VALID		0x0002 /* cached rtentry is valid */
 #define	INP_PCBGROUPWILD	0x0004 /* in pcbgroup wildcard list */
+#define

Re: Kernel panic on FreeBSD 9.0-beta2

2011-09-28 Thread Mikolaj Golub

On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:

 KM Sorry, didn't look at the images (limited bw), I've seen something
 KM like this before in timewait. This can't happen with UDP so will be
 KM interested in learning more about the bug.

The panic can be easily triggered by this:



test_udp.c
Description: Binary data

The other thread at that moment is in soclose->sofree->udp_detach->in_pcbfree.

It looks to me that we should call in_pcbdrop() in udp_close() to remove the
inpcb from the hashed lists, as is done for tcp_close().

With this patch I don't observe the panic.

Index: sys/netinet/udp_usrreq.c
===
--- sys/netinet/udp_usrreq.c	(revision 225816)
+++ sys/netinet/udp_usrreq.c	(working copy)
@@ -1486,6 +1486,7 @@ udp_close(struct socket *so)
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("udp_close: inp == NULL"));
 	INP_WLOCK(inp);
+	in_pcbdrop(inp);
 	if (inp->inp_faddr.s_addr != INADDR_ANY) {
 		INP_HASH_WLOCK(&V_udbinfo);
 		in_pcbdisconnect(inp);
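
The ordering argument can be sketched with a toy table in userland. The
toy_inpcb layout, the in_hash field and both helper functions are invented for
the illustration; this is not how the kernel hash lists are implemented. It
only shows that a pcb left on the lookup list after its socket pointer is
cleared can still be found by a concurrent bind, while dropping it first makes
it invisible to the lookup.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inpcb {
	int lport;
	int in_hash;		/* stands in for "is on the hashed lists" */
	void *inp_socket;	/* NULL after the socket is detached */
};

/* Find a pcb bound to lport among those still on the lookup list. */
struct toy_inpcb *
toy_lookup(struct toy_inpcb *tbl, size_t n, int lport)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].in_hash && tbl[i].lport == lport)
			return (&tbl[i]);
	return (NULL);
}

/* Detach the socket; optionally leave the lookup list first. */
void
toy_close(struct toy_inpcb *inp, int drop_first)
{
	assert(inp != NULL);
	if (drop_first)
		inp->in_hash = 0;	/* analogue of dropping the pcb */
	inp->inp_socket = NULL;
}
```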

 KM On Mon, Sep 26, 2011 at 4:02 PM, Arnaud Lacombe lacom...@gmail.com wrote:
  Hi,
 
  On Mon, Sep 26, 2011 at 5:12 AM, K. Macy km...@freebsd.org wrote:
 
 
  On Monday, September 26, 2011, Adrian Chadd adr...@freebsd.org wrote:
  On 26 September 2011 13:41, Arnaud Lacombe lacom...@gmail.com wrote:
   /*
    * XXX
    * This entire block sorely needs a rewrite.
    */
         if (t 
             ((t-inp_flags  INP_TIMEWAIT) == 0) 
             (so-so_type != SOCK_STREAM ||
              ntohl(t-inp_faddr.s_addr) == INADDR_ANY) 
             (ntohl(sin-sin_addr.s_addr) != INADDR_ANY ||
              ntohl(t-inp_laddr.s_addr) != INADDR_ANY ||
              (t-inp_socket-so_options 
            SO_REUSEPORT) == 0) 
             (inp-inp_cred-cr_uid !=
              t-inp_cred-cr_uid))
           return (EADDRINUSE);
       }
 
  more specifically, `t-inp_socket' is NULL. The top comment may not be
  relevant, as it's been here for the past 8 years.
 
  Why would t-inp_socket be NULL at this point?
 
  TIME_WAIT ...
 
  on UDP socket ?
 
   - Arnaud
 

-- 
Mikolaj Golub

Re: soreceive_stream: mbuf leak if called with mp0 and MSG_WAITALL

2011-09-05 Thread Mikolaj Golub

On Sun, 04 Sep 2011 12:30:53 +0300 Mikolaj Golub wrote:

 MG Apparently soreceive_stream() has an issue if it is called to receive data as a
 MG mbuf chain (by supplying an non zero mbuf **mp0) and with MSG_WAITALL set.

 MG I ran into this issue with smbfs, which uses soreceive() exactly in this way
 MG (see netsmb/smb_trantcp.c:nbssn_recv()).

Stressing smbfs a little I also observed the following soreceive_stream()
related panic:

#9  0x80a28c80 in panic (fmt=0x80f4b4a4 "sbappendstream 1")
    at /usr/src/sys/kern/kern_shutdown.c:606
#10 0x80a9746b in sbappendstream_locked (sb=0x8bff1874, m=0x8885a600)
    at /usr/src/sys/kern/uipc_sockbuf.c:527
#11 0x80bcef62 in tcp_do_segment (m=0x8885a600, th=0x8885a674, so=0x8bff1820, tp=0x8bb4f560,
    drop_hdrlen=52, tlen=51, iptos=0 '\0', ti_locked=1)
    at /usr/src/sys/netinet/tcp_input.c:2854
#12 0x80bd091d in tcp_input (m=0x8885a600, off0=20) at /usr/src/sys/netinet/tcp_input.c:1382
#13 0x80b5b4fe in ip_input (m=0x8885a600) at /usr/src/sys/netinet/ip_input.c:765
#14 0x80af504b in swi_net (arg=0x81825880) at /usr/src/sys/net/netisr.c:806
#15 0x809fd535 in intr_event_execute_handlers (p=0x86ddc588, ie=0x86d37200)
    at /usr/src/sys/kern/kern_intr.c:1257
#16 0x809fe419 in ithread_loop (arg=0x86d39bb0) at /usr/src/sys/kern/kern_intr.c:1270
#17 0x809fa7a8 in fork_exit (callout=0x809fe370 <ithread_loop>, arg=0x86d39bb0,
    frame=0x86926d28) at /usr/src/sys/kern/kern_fork.c:1029
#18 0x80d68914 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb) fr 10
#10 0x80a9746b in sbappendstream_locked (sb=0x8bff1874, m=0x8885a600)
    at /usr/src/sys/kern/uipc_sockbuf.c:527
527             KASSERT(sb->sb_mb == sb->sb_lastrecord,("sbappendstream 1"));
(kgdb) l
522     sbappendstream_locked(struct sockbuf *sb, struct mbuf *m)
523     {
524             SOCKBUF_LOCK_ASSERT(sb);
525
526             KASSERT(m->m_nextpkt == NULL,("sbappendstream 0"));
527             KASSERT(sb->sb_mb == sb->sb_lastrecord,("sbappendstream 1"));
528
529             SBLASTMBUFCHK(sb);
530
531             sbcompress(sb, m, sb->sb_mbtail);
(kgdb) p m
$1 = (struct mbuf *) 0x8885a600
(kgdb) p m->m_hdr.mh_next
$2 = (struct mbuf *) 0x0
(kgdb) p sb->sb_mb
$3 = (struct mbuf *) 0x93965e00
(kgdb) p sb->sb_lastrecord
$4 = (struct mbuf *) 0x88cb0200
(kgdb) p sb
$5 = (struct sockbuf *) 0x8bff1874

This sb belonged to smb_iod_thread which at that time was in
soreceive_stream(), notifying the protocol that buffer had been drained:

#1  0x80d74cb7 in ipi_nmi_handler () at /usr/src/sys/i386/i386/mp_machdep.c:1478
#2  0x80d7f383 in trap (frame=0xdc33ea58) at /usr/src/sys/i386/i386/trap.c:219
#3  0x80d6886c in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#4  0x80a26955 in _rw_wlock_hard (rw=0x8d18fac0, tid=2285360576,
    file=0x80f68ceb "/usr/src/sys/netinet/tcp_usrreq.c", line=732) at cpufunc.h:294
#5  0x80a274d6 in _rw_wlock (rw=0x8d18fac0,
    file=0x80f68ceb "/usr/src/sys/netinet/tcp_usrreq.c", line=732)
    at /usr/src/sys/kern/kern_rwlock.c:240
#6  0x80bdf585 in tcp_usr_rcvd (so=0x8bff1820, flags=64)
    at /usr/src/sys/netinet/tcp_usrreq.c:732
#7  0x80a9cf63 in soreceive_stream (so=0x8bff1820, psa=0x0, uio=0xdc33ec10, mp0=0xdc33ec44,
    controlp=0x0, flagsp=0xdc33ec40) at /usr/src/sys/kern/uipc_socket.c:2097
#8  0x80a9a6c9 in soreceive (so=0x8bff1820, psa=0x0, uio=0xdc33ec10, mp0=0xdc33ec44,
    controlp=0x0, flagsp=0xdc33ec40) at /usr/src/sys/kern/uipc_socket.c:2309
#9  0x91165e14 in nbssn_recv (nbp=0x874a49c0, mpp=0xdc33ec98, lenp=0xdc33ec64,
    rpcodep=0xdc33ec6b , td=0x8837d5c0)
    at /usr/src/sys/modules/smbfs/../../netsmb/smb_trantcp.c:378
#10 0x91165fee in smb_nbst_recv (vcp=0x8961ae00, mpp=0xdc33ec98, td=0x8837d5c0)
    at /usr/src/sys/modules/smbfs/../../netsmb/smb_trantcp.c:598
#11 0x9116bda1 in smb_iod_recvall (iod=0x88c64980)
    at /usr/src/sys/modules/smbfs/../../netsmb/smb_iod.c:305
#12 0x9116c82c in smb_iod_thread (arg=0x88c64980)
    at /usr/src/sys/modules/smbfs/../../netsmb/smb_iod.c:645
#13 0x809fa7a8 in fork_exit (callout=0x9116c600 <smb_iod_thread>, arg=0x88c64980,
    frame=0xdc33ed28) at /usr/src/sys/kern/kern_fork.c:1029
#14 0x80d68914 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb) fr 7
#7  0x80a9cf63 in soreceive_stream (so=0x8bff1820, psa=0x0, uio=0xdc33ec10, mp0=0xdc33ec44,
    controlp=0x0, flagsp=0xdc33ec40) at /usr/src/sys/kern/uipc_socket.c:2097
2097                            (*so->so_proto->pr_usrreqs->pru_rcvd)(so, flags);
(kgdb) l
2092                    if ((so->so_proto->pr_flags & PR_WANTRCVD) &&
2093                        (((flags & MSG_WAITALL) && uio->uio_resid > 0) ||
2094                         !(flags & MSG_SOCALLBCK))) {
2095                            SOCKBUF_UNLOCK(sb);
2096                            VNET_SO_ASSERT(so);
2097                            (*so->so_proto->pr_usrreqs->pru_rcvd)(so, flags);
2098                            SOCKBUF_LOCK(sb);
2099                    }
2100            }
2101
(kgdb) p

soreceive_stream: mbuf leak if called with mp0 and MSG_WAITALL

2011-09-04 Thread Mikolaj Golub
Hi,

Apparently soreceive_stream() has an issue if it is called to receive data as an
mbuf chain (by supplying a non-NULL mbuf **mp0) with MSG_WAITALL set.

I ran into this issue with smbfs, which uses soreceive() in exactly this way
(see netsmb/smb_trantcp.c:nbssn_recv()).

If MSG_WAITALL is set and not all data has been received, it loops again, but
on the next pass *mp0 is set to sb->sb_mb again, losing all previously received
mbufs. It looks like sb->sb_mb should be linked to the end of the *mp0 chain
instead. See the attached patch.
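
As a userland picture of the leak, consider the following sketch. The toy_mbuf
type and both functions are made up for the illustration; the attached patch
itself tracks the tail in n instead of walking the chain, so this is not the
patch, only the same idea in its simplest form: a second pass must append to
what the caller already holds rather than overwrite it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct mbuf. */
struct toy_mbuf {
	struct toy_mbuf *m_next;
	int m_len;
};

/* Append chain to *mp0, or start the chain if *mp0 is still empty. */
void
chain_append(struct toy_mbuf **mp0, struct toy_mbuf *chain)
{
	struct toy_mbuf *n;

	assert(mp0 != NULL);
	if (*mp0 == NULL) {
		*mp0 = chain;		/* first pass */
		return;
	}
	for (n = *mp0; n->m_next != NULL; n = n->m_next)
		;			/* find the current tail */
	n->m_next = chain;		/* later passes extend the chain */
}

/* Total number of bytes held in a chain. */
int
chain_length(const struct toy_mbuf *m)
{
	int total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return (total);
}
```

With a plain assignment in place of chain_append() the second pass would
replace the first buffer, and the bytes received earlier would no longer be
reachable from *mp0.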

Also, in the "Copy the remainder" block we reduce uio_resid by m->m_len (the
length of the last mbuf in the chain), but it looks like in the MSG_PEEK case
the remainder may have more than one mbuf in the chain, so we have to reduce
it by len (the length of the copied chain).

I don't have a test case to check the MSG_PEEK issue, but the patch fixes the
issue with smbfs for me.

-- 
Mikolaj Golub

Index: sys/kern/uipc_socket.c
===
--- sys/kern/uipc_socket.c	(revision 225368)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -2030,7 +2030,11 @@ deliver:
 	if (mp0 != NULL) {
 		/* Dequeue as many mbufs as possible. */
 		if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) {
-			for (*mp0 = m = sb->sb_mb;
+			if (*mp0 == NULL)
+				*mp0 = sb->sb_mb;
+			else
+				n->m_next = sb->sb_mb;
+			for (m = sb->sb_mb;
 			     m != NULL && m->m_len <= len;
 			     m = m->m_next) {
 				len -= m->m_len;
@@ -2052,7 +2056,7 @@ deliver:
 			if (m == NULL)
 				len = 0;	/* Don't flush data from sockbuf. */
 			else
-				uio->uio_resid -= m->m_len;
+				uio->uio_resid -= len;
 			if (*mp0 != NULL)
 				n->m_next = m;
 			else
@@ -2061,6 +2065,9 @@ deliver:
 				error = ENOBUFS;
 				goto out;
 			}
+			/* Update n to point to the last mbuf. */
+			for (; m != NULL; m = m->m_next)
+				n = m;
 		}
 	} else {
 		/* NB: Must unlock socket buffer as uiomove may sleep. */

Re: Problem using CARP + HAST ...

2011-08-09 Thread Mikolaj Golub

On Mon, 8 Aug 2011 16:54:10 +0200 Ferdinand Goldmann wrote:

 FG Hi!

 FG I am trying to create a common resource pool for a certain application using
 FG CARP/HAST as described in [1]. However while testing my setup I ran into a
 FG problem which I don't know how to fix or work around:

 FG If I shut down only the carp interface on the master (ifconfig carp0 down),
 FG the slave will take note of this, make his carp interface the master and
 FG mount the HAST storage using a script called by devd. Everything fine so far. BUT:

 FG If, however, I completely shut down the masters network connection (using shut on
 FG the switchport), the carp interface on the slave will still switch to master.
 FG But the script for making the HAST storage primary will just hang forever:

 FG root  46841  0.0  0.6  3628  1524  ??  S 4:21PM   0:00.08 /bin/sh /opt/bin/carp-hast-switch master
 FG root  47043  0.0  2.6 42228  6580  ??  S 4:22PM   0:00.03 hastd: hast0 (secondary) (hastd)

 FG Seemingly, this is because the hastd daemons on master and slave are unable to
 FG communicate. So the script waits forever for the secondary device to go away... :

 FG    # Wait for any hastd secondary processes to stop
 FG    for disk in ${resources}; do
 FG        while $( pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1 ); do
 FG            sleep 1
 FG        done

Which FreeBSD version are you running? I suppose it is a release, because on
STABLE this issue should be fixed -- the secondary terminates after a timeout.

 FG Im a bit puzzled. Is there a way for hastd to make himself the master in case of a timeout
 FG or such? Because in normal operation, whenever the carp interface fails, the underlying
 FG infrastructure will most likely be down as well.

On a release you can just modify the script not to wait forever for the hastd
secondary to stop -- it will be terminated when the role is switched to
primary.

But anyway my advice is to use STABLE :-).

-- 
Mikolaj Golub


Re: soreceive_stream: issues with O_NONBLOCK

2011-07-07 Thread Mikolaj Golub

On Thu, 07 Jul 2011 12:47:15 +0200 Andre Oppermann wrote:

 AO Please try this patch:
 AO  http://people.freebsd.org/~andre/soreceive_stream.diff-20110707

It works for me. No issues detected so far. Thanks.

-- 
Mikolaj Golub


soreceive_stream: issues with O_NONBLOCK

2011-07-03 Thread Mikolaj Golub
Hi,

Trying soreceive_stream I found that many applications (like firefox, pidgin,
gnus) might hang in soreceive_stream/sbwait.

It turned out that the issue is with O_NONBLOCK connections -- they block
in recv() when they should not.

This can be checked with this simple test:

http://people.freebsd.org/~trociny/test_nonblock.c

In soreceive_stream we have the following code that looks wrong:

	/* Socket buffer is empty and we shall not block. */
	if (sb->sb_cc == 0 &&
	    ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) {
		error = EAGAIN;
		goto out;
	}

It should check so->so_state against SS_NBIO, not sb->sb_flags. But just
changing this is not enough. This check is done too early -- before checking
that the socket state is not SBS_CANTRCVMORE. As a result, if the peer closes
the connection recv() returns EAGAIN instead of 0. See this example:

http://people.freebsd.org/~trociny/test_close.c

So I moved the nonblock check below SBS_CANTRCVMORE check and ended up with
this patch:

http://people.freebsd.org/~trociny/uipc_socket.c.soreceive_stream.patch

It works fine for me.
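
For reference, the userland behaviour that the reordered checks are meant to
preserve can be probed with a short sketch like the one below. It is not
test_nonblock.c or test_close.c from the links above; probe_nonblocking_recv()
is a made-up helper, and it uses an AF_UNIX socketpair instead of a TCP
connection so that it needs no network setup, which means it does not go
through soreceive_stream() at all and only documents the expected semantics.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Call recv() once on an empty nonblocking socket, optionally after the
 * peer has been closed.  Returns the recv() result; on -1, *err holds errno.
 */
ssize_t
probe_nonblocking_recv(int close_peer_first, int *err)
{
	int sv[2], fl;
	char buf[16];
	ssize_t n;

	assert(err != NULL);
	*err = 0;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
		*err = errno;
		return (-1);
	}
	fl = fcntl(sv[0], F_GETFL, 0);
	if (fl == -1 || fcntl(sv[0], F_SETFL, fl | O_NONBLOCK) == -1) {
		*err = errno;
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (close_peer_first)
		close(sv[1]);
	n = recv(sv[0], buf, sizeof(buf), 0);
	if (n == -1)
		*err = errno;	/* saved before close() can change errno */
	if (!close_peer_first)
		close(sv[1]);
	close(sv[0]);
	return (n);
}
```

With the peer still open the call is expected to fail with EAGAIN (or
EWOULDBLOCK); with the peer closed it is expected to return 0, not EAGAIN.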

Also, this part looks wrong:

   1958         /* We will never ever get anything unless we are connected. */
   1959         if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
   1960                 /* When disconnecting there may be still some data left. */
   1961                 if (sb->sb_cc > 0)
   1962                         goto deliver;
   1963                 if (!(so->so_state & SS_ISDISCONNECTED))
   1964                         error = ENOTCONN;
   1965                 goto out;
   1966         }

Why do we check at 1959 that the state is not SS_ISDISCONNECTED? If that is
valid then the check at 1963 is useless because it will always be true.
Shouldn't it be something like below?

	if (!(so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING))) {
		/* When disconnecting there may be still some data left. */
		if (sb->sb_cc > 0)
			goto deliver;
		error = ENOTCONN;
		goto out;
	}

(I don't see why we shouldn't set ENOTCONN if the state is SS_ISDISCONNECTED).
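
The redundancy argument can be checked mechanically with a few lines of
userland C. The TOY_ bit values are placeholders (the real definitions live in
the kernel headers), and the function names are invented for this sketch; it
simply enumerates the combinations of the two bits and asserts that whenever
the outer condition holds, the inner one holds as well.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values; only their distinctness matters here. */
#define	TOY_SS_ISCONNECTED	0x0002
#define	TOY_SS_ISDISCONNECTED	0x2000

/* Condition of the outer "if" at line 1959. */
int
enters_branch(int so_state)
{
	return (!(so_state & (TOY_SS_ISCONNECTED | TOY_SS_ISDISCONNECTED)));
}

/* Condition of the inner "if" at line 1963. */
int
inner_test(int so_state)
{
	return (!(so_state & TOY_SS_ISDISCONNECTED));
}

/* Inside the outer branch the inner test can never be false. */
void
check_inner_test_is_redundant(void)
{
	static const int states[] = {
		0,
		TOY_SS_ISCONNECTED,
		TOY_SS_ISDISCONNECTED,
		TOY_SS_ISCONNECTED | TOY_SS_ISDISCONNECTED
	};
	size_t i;

	for (i = 0; i < sizeof(states) / sizeof(states[0]); i++)
		if (enters_branch(states[i]))
			assert(inner_test(states[i]));
}
```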

And the last thing :-). Currently, to try soreceive_stream one needs to rebuild
the kernel with the TCP_SORECEIVE_STREAM option and then set the
net.inet.tcp.soreceive_stream tunable. Why do we need the TCP_SORECEIVE_STREAM
option? Wouldn't the tunable be enough? It would make it simpler for users to
try soreceive_stream, and we might get more testing/feedback.

-- 
Mikolaj Golub


Re: Scenario to make recv(MSG_WAITALL) stuck

2011-06-19 Thread Mikolaj Golub

On Sun, 19 Jun 2011 12:44:03 +0300 Kostik Belousov wrote:

 KB On Wed, Jun 15, 2011 at 09:44:33AM +0300, Mikolaj Golub wrote:
  
  On Tue, 14 Jun 2011 12:23:03 +0300 Kostik Belousov wrote:
  
   KB I do not understand what then happens for the recvfrom(2) call ?
   KB Would it get some error, or 0 as return and no data, or something else 
  ?
  
  It will wait for data below in another loop (Now continue to read any data
  mbufs off of the head...).
  
  Elaborating, I would split soreceive_generic on three logical parts.
  
  In the first (restart) part we block until some data are received and also
  (without the patch) in the case of MSG_WAITALL if the buffer is big enough 
  we
  block until all MSG_WAITALL request is received (actually it will spin in
  goto restart loop until some condition becomes invalid).
  
  The second part is some processing of received data and the third part is a
  while loop where data is copied to userspace and in the case of 
  MSG_WAITALL
  request if not all data is received to satisfy the request it also waits for
  this data.
  
  My patch removes the condition in the first part in the case of MSG_WAITALL 
  to
  wait for all data if buffer is big enough. We always will wait for the rest 
  of
  data in the third part. It might be not so effective, and this is my first
  concern about the patch (although not big :-).
 KB Now I think that this part of the patch is right.
 KB The loop in the soreceive_generic() would behave as I would expect
 KB it for MSG_WAITALL. It copyout the received data to userspace by
 KB received chunks.

 KB I do not understand your note about effectiveness there.

The old behaviour: if only a part of the request is received and the buffer is
large enough, wait for the rest and then go to processing.

The new behaviour: if a part of the data is received, (unconditionally) process
it and wait for the rest (and process that too).

The first one looks a little more efficient (but has the issue in the edge case
of a nearly full buffer).

-- 
Mikolaj Golub


Re: Scenario to make recv(MSG_WAITALL) stuck

2011-06-15 Thread Mikolaj Golub

On Tue, 14 Jun 2011 12:23:03 +0300 Kostik Belousov wrote:

 KB I do not understand what then happens for the recvfrom(2) call ?
 KB Would it get some error, or 0 as return and no data, or something else ?

It will wait for data below in another loop ("Now continue to read any data
mbufs off of the head...").

Elaborating, I would split soreceive_generic into three logical parts.

In the first (restart) part we block until some data is received and also
(without the patch), in the MSG_WAITALL case, if the buffer is big enough we
block until the whole MSG_WAITALL request is received (actually it will spin in
the goto restart loop until some condition becomes invalid).

The second part is some processing of the received data, and the third part is
a while loop where data is copied to userspace and, in the case of a
MSG_WAITALL request, if not enough data has been received to satisfy the
request, it also waits for this data.

My patch removes the condition in the first part that, in the MSG_WAITALL case,
waits for all data if the buffer is big enough. We will always wait for the
rest of the data in the third part. It might be less efficient, and this is my
first concern about the patch (although not a big one :-).

 KB Also, what is the MT_CONTROL chunk about ?

When I removed the condition to skip blocking in the first part I started to
observe a panic on KASSERT(m->m_type == MT_DATA) for the following scenario
(produced by HAST):

sender:

send(4 bytes); /* send protocol name */
sendmsg(); /* send descriptor (normal data is empty, descriptor in control data) */

receiver:

recv(127 bytes, MSG_WAITALL);   /* receive protocol name */
recvmsg();  /* receive descriptor */

Although the recv() has MSG_WAITALL, it exits after receiving 4 bytes because
the next received data is of a different (MT_CONTROL) type. And it panicked when
it got control data.

It is unclear to me why MT_CONTROL data is not expected in that part. We do
have processing of MT_CONTROL above (in the second part) in the code, but I
still have a feeling that it is possible to create some scenario that breaks
this assert without my patch too, although I have failed so far. And this is
my second concern about my patch, a big one, because for now I am not sure
that this is correct. Although I have not observed issues with it so far...

Also, I am not sure if it makes sense to bother with soreceive_generic() at
all. Maybe it is more promising to spend time on maturing
soreceive_stream(). As I see it, it is going to be a replacement for
soreceive_generic() for stream sockets.

-- 
Mikolaj Golub


Automatic receive buffer sizing works only for connections in ESTABLISHED state

2011-06-13 Thread Mikolaj Golub
Hi,

Automatic receive buffer sizing works only for connections in ESTABLISHED
state. In tcp_input() the auto resizing code is under the if (tp->t_state ==
TCPS_ESTABLISHED && ...) branch.

This is unfortunate for HAST, which uses connections in one direction only and
shuts down the other direction, so the receiving socket is in FIN_WAIT_2 and
auto resizing does not work here.

Is there some reason why it should be only for connections in ESTABLISHED
state, or should this be considered a bug?

-- 
Mikolaj Golub


Scenario to make recv(MSG_WAITALL) stuck

2011-06-13 Thread Mikolaj Golub
. But the
window was closed when the buffer was filled and to avoid silly window
syndrome it opens only when available space is larger than sb_hiwat/4 or
maxseg:

tcp_output():

/*
 * Calculate receive window.  Don't shrink window,
 * but avoid silly window syndrome.
 */
if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
    recwin < (long)tp->t_maxseg)
	recwin = 0;

so it is stuck and pending data is only sent via TCP window probes.

It looks like the fix could be to remove this condition to block if
MSG_WAITALL is set and it is possible to do the entire receive operation at
once, like in the patch:

http://people.freebsd.org/~trociny/uipc_socket.c.soreceive_generic.MSG_DONTWAIT.patch

This works for me but I am not sure this is a correct solution.

Note, the issue is not reproduced with soreceive_stream.

-- 
Mikolaj Golub


Re: Spurious ACKs, ICMP unreachable?

2011-05-14 Thread Mikolaj Golub

On Fri, 13 May 2011 14:38:34 -0700 Chuck Swiger wrote:

 CS On May 13, 2011, at 1:07 PM, Ivan Voras wrote:
  I'm seeing an an unusual problem at a remote machine; this machine is
  the FreeBSD server, and the client is a probably Windows machine (but I
  don't know the details yet). Something happens which causes FreeBSD to
  send ACKs to the client, and the client to send ICMP unreachable
  messages to the server. It is most likely a configuration error at the
  remote site but I have no idea how to verify this.


 CS Let's look at just one connection:

 CS 18:56:02.711942 IP server.http > client.4732: Flags [.], ack 2110905191, win 0, length 0
 CS 18:56:02.713155 IP server.http > client.4732: Flags [.], ack 1, win 65535, length 0

 CS The packet is FreeBSD webserver sending ACKs with zero window size;
 CS that's a sign of congestion that the client should not be sending more
 CS data and instead doing periodic window probes until the local box opens
 CS the window again.  The next packet on the same connection then ACK's
 CS something outside of the window with a 64K window size.  That's wrong;
 CS the other side probably sends an RST and the ICMP error.  If you have TSO
 CS enabled, try turning it off.

Might this be the thing that jhb@ was fixing in r221346?

-- 
Mikolaj Golub


Re: kern/154504: [libc] recv(2): PF_LOCAL stream connection is stuck in sbwait when recv(MSG_WAITALL) is used

2011-04-10 Thread Mikolaj Golub
Hi,

Does the attached patch fix the problem for you?

-- 
Mikolaj Golub
Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220485)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1845,10 +1845,16 @@ dontblock:
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could receive some data while was notifying the
+			 * the protocol. Skip blocking in this case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)

recv() with MSG_WAITALL might stuck when receiving more than rcvbuf

2011-04-09 Thread Mikolaj Golub
Hi,

When testing HAST synchronization running both primary and secondary HAST
instances on the same host I faced an issue that the synchronization may be
very slow:

Apr  9 14:04:04 kopusha hastd[3812]: [test] (primary) Synchronization complete. 512MB synchronized in 16m38s (525KB/sec).

hastd synchronizes data in MAXPHYS (131072 bytes) blocks. The sending side
splits them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the
receiving side reads the whole block by calling recv() with the MSG_WAITALL
option.

Sometimes recv() gets stuck: in tcpdump I see that the sending side has sent
all chunks and they were all acked, but the receiving thread is still waiting
in recv(). netstat reports a non-empty Recv-Q for the receiving side (with the
number of bytes usually equal to the size of the last sent chunk). It looked
like the receiving userspace was not informed by the kernel that all data had
arrived.

I can reproduce the issue with the attached test_MSG_WAITALL.c.

I think the issue is in soreceive_generic(). 

If MSG_WAITALL is set but the request is larger than the receive buffer, it
has to do the receive in sections. So after receiving some data it notifies
the protocol (calls pr_usrreqs->pru_rcvd) about the data, releasing the so_rcv
lock. On return it blocks in sbwait() waiting for the rest of the data. I
think there is a race: while it was in pr_usrreqs->pru_rcvd, not holding the
lock, the rest of the data could arrive. Thus it should check for this before
sbwait().

See the attached uipc_socket.c.soreceive.patch. The patch fixes the issue for
me.

Apr  9 14:16:40 kopusha hastd[2926]: [test] (primary) Synchronization complete. 512MB synchronized in 4s (128MB/sec).

I observed the problem on STABLE but believe the same is on CURRENT.

BTW, I also tried the optimized version of soreceive(), soreceive_stream(). It
does not have this problem. But with it I observed TCP connections getting
stuck in soreceive_stream() when starting firefox (with many tabs) or pidgin
(with many contacts). The processes were killable only with -9. I did not
investigate this much though.

-- 
Mikolaj Golub



test_MSG_WAITALL.c
Description: Binary data
Index: sys/kern/uipc_socket.c
===================================================================
--- sys/kern/uipc_socket.c	(revision 220472)
+++ sys/kern/uipc_socket.c	(working copy)
@@ -1836,28 +1836,34 @@ dontblock:
 			/*
 			 * Notify the protocol that some data has been
 			 * drained before blocking.
 			 */
 			if (pr->pr_flags & PR_WANTRCVD) {
 				SOCKBUF_UNLOCK(&so->so_rcv);
 				VNET_SO_ASSERT(so);
 				(*pr->pr_usrreqs->pru_rcvd)(so, flags);
 				SOCKBUF_LOCK(&so->so_rcv);
 			}
 			SBLASTRECORDCHK(&so->so_rcv);
 			SBLASTMBUFCHK(&so->so_rcv);
-			error = sbwait(&so->so_rcv);
-			if (error) {
-				SOCKBUF_UNLOCK(&so->so_rcv);
-				goto release;
+			/*
+			 * We could receive some data while was notifying the
+			 * the protocol. Skip blocking in this case.
+			 */
+			if (so->so_rcv.sb_mb == NULL) {
+				error = sbwait(&so->so_rcv);
+				if (error) {
+					SOCKBUF_UNLOCK(&so->so_rcv);
+					goto release;
+				}
 			}
 			m = so->so_rcv.sb_mb;
 			if (m != NULL)
 				nextrecord = m->m_nextpkt;
 		}
 	}
 
 	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
 	if (m != NULL && pr->pr_flags & PR_ATOMIC) {
 		flags |= MSG_TRUNC;
 		if ((flags & MSG_PEEK) == 0)
 			(void) sbdroprecord_locked(&so->so_rcv);

bsnmp/snmpmod.h: #include <sys/queue.h> is missed

2010-12-18 Thread Mikolaj Golub
Hi,

bsnmp/snmpmod.h uses SLIST but does not include <sys/queue.h>. This breaks
the net-mgmt/bsnmp-ucd port (ports/153153).

Could somebody look at the attached patch?

-- 
Mikolaj Golub

Index: contrib/bsnmp/snmpd/snmpmod.h
===================================================================
--- contrib/bsnmp/snmpd/snmpmod.h	(revision 216439)
+++ contrib/bsnmp/snmpd/snmpmod.h	(working copy)
@@ -33,6 +33,7 @@
 #ifndef snmpmod_h_
 #define snmpmod_h_
 
+#include <sys/queue.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <net/if.h>

Re: bsnmp/snmpmod.h: #include <sys/queue.h> is missed

2010-12-18 Thread Mikolaj Golub

On Sat, 18 Dec 2010 13:03:58 +0200 Kostik Belousov wrote:

 KB On Sat, Dec 18, 2010 at 12:48:38PM +0200, Mikolaj Golub wrote:
  Hi,
  
  bsnmp/snmpmod.h uses SLIST but does not include <sys/queue.h>. This breaks
  the net-mgmt/bsnmp-ucd port (ports/153153).
  
  Could somebody look at the attached patch?

 KB <sys/types.h>, as well as <sys/param.h> should be included before
 KB other headers.

Thanks. Overlooked this :-). 

-- 
Mikolaj Golub

Index: contrib/bsnmp/snmpd/snmpmod.h
===================================================================
--- contrib/bsnmp/snmpd/snmpmod.h	(revision 216439)
+++ contrib/bsnmp/snmpd/snmpmod.h	(working copy)
@@ -34,6 +34,7 @@
 #define snmpmod_h_
 
 #include <sys/types.h>
+#include <sys/queue.h>
 #include <sys/socket.h>
 #include <net/if.h>
 #include <netinet/in.h>

net/if_epair.c: semicolon missed

2010-11-21 Thread Mikolaj Golub
Hi,

In net/if_epair.c a semicolon is missing in epair_nh_drainedcpu() (see the
patch below). This shows up when compiling with EPAIR_DEBUG.

Also, what was the reason to declare the epair_debug MIB as XINT? Shouldn't it
be just INT?

-- 
Mikolaj Golub

Index: sys/net/if_epair.c
===================================================================
--- sys/net/if_epair.c	(revision 215576)
+++ sys/net/if_epair.c	(working copy)
@@ -305,7 +305,7 @@ epair_nh_drainedcpu(u_int cpuid)
 
 		if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) != 0) {
 			/* Our hwq overflew again. */
-			epair_dpcpu->epair_drv_flags |= IFF_DRV_OACTIVE
+			epair_dpcpu->epair_drv_flags |= IFF_DRV_OACTIVE;
 			DPRINTF("hw queue length overflow at %u\n",
 			    epair_nh.nh_qlimit);
 			break;

Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly

2010-05-30 Thread Mikolaj Golub
On Fri, 28 May 2010 04:40:03 GMT Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:

   IMHO, it is not, unfortunately, a solution:  it seems to clear ECONNRESET
   blindly and w/o distinguishing the situation when the remote end closes the
   connection prematurely (i.e. before acknowledging all data written from the
   local end) -- and that qualifies for the true connection reset by peer
   from close()...

I did some experiments and would like to share the results here. The idea is
the following: the client sends more data than the window in one write(),
while the server closes the connection without reading (sending RST on
close). I also played with the LINGER option. I have managed to get ECONNRESET
only on write(), if the server sends RST before the client calls write(). In
all other cases write()/close() returned without error. See the attachment for
details.

So I think that with the workaround (ignore ECONNRESET returned by
sodisconnect() in soclose()) we would not make the situation worse (while it
fixes the issue with applications unexpectedly getting ECONNRESET after a
shutdown()/close() sequence).

-- 
Mikolaj Golub



test_tcp_close.c
Description: Binary data

Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly

2010-05-30 Thread Mikolaj Golub
The following reply was made to PR kern/146845; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: freebsd-net@FreeBSD.org
Cc: Lavrentiev\, Anton \(NIH\/NLM\/NCBI\) \[C\] l...@ncbi.nlm.nih.gov,  
Robert N. M. Watson rwat...@freebsd.org, bug-follo...@freebsd.org
Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by 
peer) wrongly
Date: Sun, 30 May 2010 11:05:45 +0300

 
 On Fri, 28 May 2010 04:40:03 GMT Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
 
IMHO, it is not, unfortunately, a solution:  it seems to clear ECONNRESET
blindly and w/o distinguishing the situation when the remote end closes the
connection prematurely (i.e. before acknowledging all data written from the
local end) -- and that qualifies for the true connection reset by peer
from close()...
 
 I did some experiments and would like to share the results here. The idea is
 the following: the client sends more data than the window in one write(),
 while the server closes the connection without reading (sending RST on
 close). I also played with the LINGER option. I have managed to get ECONNRESET
 only on write(), if the server sends RST before the client calls write(). In
 all other cases write()/close() returned without error. See the attachment for
 details.
 
 So I think that with the workaround (ignore ECONNRESET returned by
 sodisconnect() in soclose()) we would not make the situation worse (while it
 fixes the issue with applications unexpectedly getting ECONNRESET after a
 shutdown()/close() sequence).
 
 -- 
 Mikolaj Golub
 
 
 test_tcp_close.c
 Description: Binary data

Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly

2010-05-28 Thread Mikolaj Golub
The following reply was made to PR kern/146845; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: Lavrentiev\, Anton \(NIH\/NLM\/NCBI\) \[C\] l...@ncbi.nlm.nih.gov
Cc: Robert N. M. Watson rwat...@freebsd.org, freebsd-net@FreeBSD.org, 
bug-follo...@freebsd.org
Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by 
peer) wrongly
Date: Fri, 28 May 2010 12:26:33 +0300

 On Fri, 28 May 2010 04:40:03 GMT Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
 
  LA  IMHO, it is not, unfortunately, a solution:  it seems to clear ECONNRESET
  LA  blindly and w/o distinguishing the situation when the remote end closes the
  LA  connection prematurely (i.e. before acknowledging all data written from the
  LA  local end) -- and that qualifies for the true connection reset by peer
  LA  from close()...
 
 I am not very familiar with the socket/tcp code but it looks to me that it
 might not make any difference.
 
 I can be wrong here but the situation you have described as a true connection
 reset by peer seems to have the following path in the code:
 
 soclose() -> sodisconnect() -> tcp_usr_disconnect() -> tcp_disconnect()
 
 But tcp_disconnect() does not return an error, so we will not have an
 ECONNRESET error in any case.
 
 Maybe you have a good test suite to reproduce this situation? :-) 
 
 -- 
 Mikolaj Golub


Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly

2010-05-27 Thread Mikolaj Golub
:

1) after shutdown() our output is closed;

2) then we call close(), soclose() checks that we are still in SS_ISCONNECTED
and calls sodisconnect();

3) at this time FIN arrives from the other end, which has called close() too,
and the kernel disconnects the socket (INP_DROPPED is set);

4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and returns
ECONNRESET.

I am attaching the patch, which may not be a solution but rather an
illustration of what is described above. Running the test with this patch I
observe the following messages in the error logs

May 27 23:55:41 zhuzha kernel: ECONNRESET: so-state: 0x2000; file /usr/src/sys/kern/uipc_socket.c; line 664

and the test does not fail.

-- 
Mikolaj Golub



tcp_close.c
Description: Binary data


uipc_socket.c.econnreset.patch
Description: Binary data

Re: kern/146845: [libc] close(2) returns error 54 (connection reset by peer) wrongly

2010-05-27 Thread Mikolaj Golub
The following reply was made to PR kern/146845; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: bug-follo...@freebsd.org
Cc: Anton Lavrentiev l...@ncbi.nlm.nih.gov, Robert Watson 
rwat...@freebsd.org, freebsd-net@FreeBSD.org
Subject: Re: kern/146845: [libc] close(2) returns error 54 (connection reset by 
peer) wrongly
Date: Fri, 28 May 2010 00:25:42 +0300

 Hi,
 
 We observed the same issue on our FreeBSD6 and 7 servers. I tried to reproduce
 the problem by writing a simple test case but failed -- I didn't come up with
 the idea of a shutdown()/close() sequence (as Anton did). Although looking now
 at the code we had the issue with, I see that a shutdown()/close() sequence
 was used there too.
 
 It looks like SO_LINGER is not important to reproduce ECONNRESET.
 shutdown()/close() on one end and close() on the other is enough. Also,
 slowing down one of the processes (done by Anton using select()) is not
 important either. Taking this into consideration I have written a simplified
 version of a test to reproduce the bug (maybe it is worth including in
 tools/regression/sockets?).
 
 I can easily reproduce the error with this test on FreeBSD7.1 and
 8-STABLE. Adding some prints to the kernel code I localized the place where
 the error appears and added panic() to get a backtrace.
 
 So, the backtrace:
 
 (kgdb) bt
 #0  doadump () at pcpu.h:246
 #1  0xc04ec829 in db_fncall (dummy1=-1064461270, dummy2=0, dummy3=-1, dummy4=0xe85e58b0 "ÄX^è")
     at /usr/src/sys/ddb/db_command.c:548
 #2  0xc04ecc5f in db_command (last_cmdp=0xc0e0af9c, cmd_table=0x0, dopager=0)
     at /usr/src/sys/ddb/db_command.c:445
 #3  0xc04ecd14 in db_command_script (command=0xc0e0bec4 "call doadump")
     at /usr/src/sys/ddb/db_command.c:516
 #4  0xc04f0e50 in db_script_exec (scriptname=0xe85e59bc "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available.
 )
     at /usr/src/sys/ddb/db_script.c:302
 #5  0xc04f0f37 in db_script_kdbenter (eventname=0xc0cc78ea "panic")
     at /usr/src/sys/ddb/db_script.c:324
 #6  0xc04eec18 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
 #7  0xc08d9aa6 in kdb_trap (type=3, code=0, tf=0xe85e5af8) at /usr/src/sys/kern/subr_kdb.c:535
 #8  0xc0befecb in trap (frame=0xe85e5af8) at /usr/src/sys/i386/i386/trap.c:690
 #9  0xc0bd15eb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
 #10 0xc08d9c2a in kdb_enter (why=0xc0cc78ea "panic", msg=0xc0cc78ea "panic") at cpufunc.h:71
 #11 0xc08a95b6 in panic (fmt=0xc0ce6585 "ECONNRESET") at /usr/src/sys/kern/kern_shutdown.c:562
 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552
 #13 0xc09111bd in sodisconnect (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:810
 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658
 #15 0xc08f6459 in soo_close (fp=0xc743e230, td=0xc7023000)
     at /usr/src/sys/kern/sys_socket.c:291
 #16 0xc086efc3 in _fdrop (fp=0xc743e230, td=0xc7023000) at file.h:293
 #17 0xc0870cf0 in closef (fp=0xc743e230, td=0xc7023000)
     at /usr/src/sys/kern/kern_descrip.c:2117
 #18 0xc0871097 in kern_close (td=0xc7023000, fd=4) at /usr/src/sys/kern/kern_descrip.c:1162
 #19 0xc087123a in close (td=0xc7023000, uap=0xe85e5cf8)
     at /usr/src/sys/kern/kern_descrip.c:1114
 #20 0xc0bef600 in syscall (frame=0xe85e5d38) at /usr/src/sys/i386/i386/trap.c:
 #21 0xc0bd1680 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
 #22 0x0033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) fr 12
 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552
 552                     panic("ECONNRESET");
 (kgdb) list
 547             inp = sotoinpcb(so);
 548             KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL"));
 549             INP_WLOCK(inp);
 550             if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 551                     error = ECONNRESET;
 552                     panic("ECONNRESET");
 553                     /* log(LOG_INFO, "ECONNRESET 3: file %s; line %d\n", __FILE__, __LINE__); */
 554                     goto out;
 555             }
 556             tp = intotcpcb(inp);
 (kgdb) p/x inp->inp_flags
 $1 = 0x480
 
 #define INP_DROPPED 0x0400 /* protocol drop flag */
 
 (kgdb) fr 14
 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658
 658                             error = sodisconnect(so);
 (kgdb) list
 653
 654             CURVNET_SET(so->so_vnet);
 655             funsetown(&so->so_sigio);
 656             if (so->so_state & SS_ISCONNECTED) {
 657                     if ((so->so_state & SS_ISDISCONNECTING) == 0) {
 658                             error = sodisconnect(so);
 659                             if (error) {
 660                                     if (error == ENOTCONN)
 661                                             error = 0

Re: sockstat / netstat output 8.x vs 7.x

2010-05-14 Thread Mikolaj Golub

On Tue, 11 May 2010 13:24:02 -0700 Julian Elischer wrote:

 JE On 5/11/10 12:20 PM, Wes Peters wrote:
  The output header is instructive:
 
  USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
  www      httpd      18423 3  tcp4 6 *:80                  *:*
  www      httpd      18423 4  tcp4   *:*                   *:*
  www      httpd      25184 3  tcp4 6 *:80                  *:*
  www      httpd      25184 4  tcp4   *:*                   *:*
 
  Same as 7, it's the foreign address.  This is normally only useful for
  connected sockets.
 
  On Tue, May 11, 2010 at 11:14 AM, Mike Tancsam...@sentex.net  wrote:
  [trying on freebsd-net since no response on stable]
 
  I noticed that apache on RELENG_8 and RELENG_7 shows up with output I cant
  seem to understand from sockstat -l and netstat -naW
 
  On RELENG_7, sockstat -l makes sense to me
  
  www      httpd      83005 4  tcp4   *:443                 *:*
  www      httpd      82217 3  tcp4   *:80                  *:*
  www      httpd      82217 4  tcp4   *:443                 *:*
  www      httpd      38942 3  tcp4   *:80                  *:*
  www      httpd      38942 4  tcp4   *:443                 *:*
  root     httpd      1169  3  tcp4   *:80                  *:*
  root     httpd      1169  4  tcp4   *:443                 *:*
 
 
  various processes listening on all bound IP addresses on ports 80 and 443.
 
  On  RELENG_8 however, it shows up with an extra entry (at the end)
 
  www      httpd      29005 4  tcp4   *:*                   *:*
  www      httpd      29004 3  tcp4 6 *:80                  *:*
  www      httpd      29004 4  tcp4   *:*                   *:*
  www      httpd      29003 3  tcp4 6 *:80                  *:*
  www      httpd      29003 4  tcp4   *:*                   *:*
  www      httpd      66731 3  tcp4 6 *:80                  *:*
  www      httpd      66731 4  tcp4   *:*                   *:*
  root     httpd      72197 3  tcp4 6 *:80                  *:*
  root     httpd      72197 4  tcp4   *:*                   *:*
 
 
  *:80 makes sense to me... process is listening on all IPs for port 80.
  What does *:* mean then ?

 JE I believe it has created a socket but not used it for anything
 JE it may be the 6 socket... otherwise I don't see what a tcp4 6 is
 JE meant to be.

Comparing the RELENG_8 and RELENG_7 outputs, it might be the socket for https,
which looks like it is not configured on the RELENG_8 host. I think socket()
was called but no other actions were performed on the socket.

 
  Netstat gives a slightly different version of it
 
  Active Internet connections (including servers)
  Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
  tcp4       0      0 *.1984                 *.*                    LISTEN
  tcp4       0      0 *.*                    *.*                    CLOSED
  tcp46      0      0 *.80                   *.*                    LISTEN
 
  state closed ?

You can reproduce this with this simple program:

zhuzha:~/src/test_socket% cat test.c 
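#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <unistd.h>
#include <err.h>

int
main(int argc, char **argv)
{
	int sockfd;

	/* Create a TCP socket and do nothing else with it. */
	if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		errx(1, "socket error");
	/* Keep the process alive long enough to inspect it. */
	sleep(60);
	return 0;
}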

zhuzha:~/src/test_socket% make
cc -g -O0 -Wall  test.c  -o test
zhuzha:~/src/test_socket% ./test &
[1] 56076
zhuzha:~/src/test_socket% sockstat|grep test
golub    test       56076 3  tcp4   *:*                   *:*
zhuzha:~/src/test_socket% netstat -na |grep CLOSED
tcp4       0      0 *.*                    *.*                    CLOSED

-- 
Mikolaj Golub


Re: kern/133902: [tun] Killing tun0 iface ssh tunnel causes Panic String: page fault

2010-05-03 Thread Mikolaj Golub
This PR is a duplicate of kern/116837 so I think we can close it. The problem
is fixed in CURRENT and 8-STABLE and there is a patch for 7-STABLE (see
kern/116837 for details).

-- 
Mikolaj Golub


Re: kern/133902: [tun] Killing tun0 iface ssh tunnel causes Panic String: page fault

2010-05-03 Thread Mikolaj Golub
The following reply was made to PR kern/133902; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: bug-follo...@freebsd.org
Cc: Leonardo Santagostini lsantagost...@gmail.com, Bjoern A. Zeeb 
b...@freebsd.org, freebsd-net@FreeBSD.org
Subject: Re: kern/133902: [tun] Killing tun0 iface ssh tunnel causes Panic 
String: page fault
Date: Mon, 03 May 2010 10:41:34 +0300

 This PR is a duplicate of kern/116837 so I think we can close it. The problem
 is fixed in CURRENT and 8-STABLE and there is a patch for 7-STABLE (see
 kern/116837 for details).
 
 -- 
 Mikolaj Golub


Re: FreeBSD 8.0-STABLE mpd - system freeze

2010-05-02 Thread Mikolaj Golub
On Sun, 2 May 2010 12:46:19 +0200 (CEST) Roar Pettersen wrote:

 Upgraded some servers from 7.2-stabel to 8.0-stable early april and
 since then I have seen stability problems with 8.0 servers which use
 mpd (vpn).
 I have tried several mpd version (5.5, 5.3 and 5.1), but the system freeze
 within 6 hours or 3-5 days. Early in april we got typical watchdog timeout
 error message just before the system freeze, but now we don't get any
 error message.

Could you try disabling flowtable to see if it helps?

sysctl -w net.inet.flowtable.enable=0

 Sometimes we also see that the mpd process goes into a none killeable
 stauts, and then when I execute a shutdown -r the system hang with
 this message :

 stopping mpd5
 Waiting for PIDS : 114830 second watchdog timeout expired. Shutdown terminated.
 Apr 29 21:04:52 init : some process would not die; ps axl advised
 Waiting (max 60 seconds) for system process 'vnlru' to stop...

Could you provide the output of

procstat -kk <mpdpid>

when this happens again or even better:

procstat -akk

-- 
Mikolaj Golub


Re: Races on alias deletion

2010-05-02 Thread Mikolaj Golub

I have filed a PR about this issue: kern/146250.

On Wed, 21 Apr 2010 08:28:48 +0300 Mikolaj Golub wrote:

 MG Hi,

 MG Accidentally, due to a misconfiguration of our tools, we ran two deletions
 MG of the same interface alias simultaneously and crashed the box (FreeBSD-7.1).

 MG So I did some experiments on my 8-STABLE (I have CURRENT only in VirtualBox)
 MG to investigate this, concurrently running two scripts that were adding and
 MG deleting the same address:

 MG while true; do
 MG ifconfig $IFACE  alias $IP
 MG ifconfig $IFACE -alias $IP
 MG done

 MG The box crashed just after I started the second script. The crash was in
 MG in_control() on removing &ia->ia_ifa from the ifp->if_addrhead list, because
 MG there was no check that the address was still in the list before removing it.

 MG panic: Bad link elm 0xcd2f3b00 prev->next != elm

 MG #0  doadump () at pcpu.h:246
 MG #1  0xc04ec829 in db_fncall (dummy1=-1064461270, dummy2=0, dummy3=-1, 
dummy4=0xe9a737fc \0208╖И)
 MG at /usr/src/sys/ddb/db_command.c:548
 MG #2  0xc04ecc5f in db_command (last_cmdp=0xc0e0ab9c, cmd_table=0x0, 
dopager=0)
 MG at /usr/src/sys/ddb/db_command.c:445
 MG #3  0xc04ecd14 in db_command_script (command=0xc0e0bac4 call doadump) at 
/usr/src/sys/ddb/db_command.c:516
 MG #4  0xc04f0e50 in db_script_exec (scriptname=0xe9a73908 kdb.enter.panic, 
warnifnotfound=Variable warnifnotfound is not available.
 MG )
 MG at /usr/src/sys/ddb/db_script.c:302
 MG #5  0xc04f0f37 in db_script_kdbenter (eventname=0xc0cc760a panic) at 
/usr/src/sys/ddb/db_script.c:324
 MG #6  0xc04eec18 in db_trap (type=3, code=0) at 
/usr/src/sys/ddb/db_main.c:228
 MG #7  0xc08d9aa6 in kdb_trap (type=3, code=0, tf=0xe9a73a44) at 
/usr/src/sys/kern/subr_kdb.c:535
 MG #8  0xc0befbeb in trap (frame=0xe9a73a44) at 
/usr/src/sys/i386/i386/trap.c:690
 MG #9  0xc0bd130b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
 MG #10 0xc08d9c2a in kdb_enter (why=0xc0cc760a panic, msg=0xc0cc760a 
panic) at cpufunc.h:71
 MG #11 0xc08a95b6 in panic (fmt=0xc0c61bc0 Bad link elm %p prev-next != 
elm)
 MG at /usr/src/sys/kern/kern_shutdown.c:562
 MG #12 0xc09ba87f in in_control (so=0xcdbd519c, cmd=2149607705, 
data=0xcd3db120 fxp0, ifp=0xc5b94c00, 
 MG td=0xc92ddb90) at /usr/src/sys/netinet/in.c:604
 MG #13 0xc095d400 in ifioctl (so=0xcdbd519c, cmd=2149607705, data=0xcd3db120 
fxp0, td=0xc92ddb90)
 MG at /usr/src/sys/net/if.c:2516
 MG #14 0xc08f69d5 in soo_ioctl (fp=0xcdc90af0, cmd=2149607705, 
data=0xcd3db120, active_cred=0xc9d78400, 
 MG td=0xc92ddb90) at /usr/src/sys/kern/sys_socket.c:212
 MG #15 0xc08f0a2d in kern_ioctl (td=0xc92ddb90, fd=3, com=2149607705, 
data=0xcd3db120 fxp0) at file.h:262
 MG #16 0xc08f0bb4 in ioctl (td=0xc92ddb90, uap=0xe9a73cf8) at 
/usr/src/sys/kern/sys_generic.c:678
 MG #17 0xc0bef320 in syscall (frame=0xe9a73d38) at 
/usr/src/sys/i386/i386/trap.c:
 MG #18 0xc0bd13a0 in Xint0x80_syscall () at 
/usr/src/sys/i386/i386/exception.s:261
 MG #19 0x0033 in ?? ()
 MG Previous frame inner to this frame (corrupt stack?)
 MG (kgdb) fr 12
 MG #12 0xc09ba87f in in_control (so=0xcdbd519c, cmd=2149607705, 
data=0xcd3db120 fxp0, ifp=0xc5b94c00, 
 MG td=0xc92ddb90) at /usr/src/sys/netinet/in.c:604
 MG 604 TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
 MG (kgdb) list
 MG 599 default:
 MG 600 panic("in_control: unsupported ioctl");
 MG 601 }
 MG 602
 MG 603 IF_ADDR_LOCK(ifp);
 MG 604 TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
 MG 605 IF_ADDR_UNLOCK(ifp);
 MG 606 ifa_free(&ia->ia_ifa);  /* if_addrhead */
 MG 607
 MG 608 IN_IFADDR_WLOCK();

 MG The first patch in the attachments fixed this type of crash for me, but the
 MG box started to crash in in_lltable_prefix_free (now the scripts had to run
 MG for a few seconds before it happened).

 MG (kgdb) bt
 MG #0  doadump () at pcpu.h:246
 MG #1  0xc04ec829 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056922880, 
dummy4=0xe8636760 )
 MG at /usr/src/sys/ddb/db_command.c:548
 MG #2  0xc04ecc21 in db_command (last_cmdp=0xc0e0ac1c, cmd_table=0x0, 
dopager=1)
 MG at /usr/src/sys/ddb/db_command.c:445
 MG #3  0xc04ecd7a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
 MG #4  0xc04eec1d in db_trap (type=12, code=0) at 
/usr/src/sys/ddb/db_main.c:229
 MG #5  0xc08d9aa6 in kdb_trap (type=12, code=0, tf=0xe863694c) at 
/usr/src/sys/kern/subr_kdb.c:535
 MG #6  0xc0beeedf in trap_fatal (frame=0xe863694c, eva=420) at 
/usr/src/sys/i386/i386/trap.c:929
 MG #7  0xc0bef800 in trap (frame=0xe863694c) at 
/usr/src/sys/i386/i386/trap.c:328
 MG #8  0xc0bd139b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
 MG #9  0xc08a6a8b in _rw_wlock_hard (rw=0xc79e1508, tid=3334964384, 
file=0xc0ce01e4 /usr/src/sys/netinet/in.c, 
 MG line=1370) at /usr/src/sys/kern/kern_rwlock.c:677
 MG #10 0xc08a75d6 in _rw_wlock (rw=0xc79e1508, file=0xc0ce01e4

Races on alias deletion

2010-04-20 Thread Mikolaj Golub
=2151704858, 
data=0xc7841bc0 fxp0) at file.h:262
#17 0xc08f0bb4 in ioctl (td=0xc818db90, uap=0xe880dcf8) at 
/usr/src/sys/kern/sys_generic.c:678
#18 0xc0bef430 in syscall (frame=0xe880dd38) at 
/usr/src/sys/i386/i386/trap.c:
#19 0xc0bd14b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 12
#12 0xc09b8efc in in_ifinit (ifp=0xc5b94c00, ia=0xc876ea00, sin=0xc185fcf6, 
scrub=0)
at /usr/src/sys/netinet/in.c:844
844 LIST_REMOVE(ia, ia_hash);
(kgdb) list in_ifinit
832  * and routing table entry.
833  */
834 static int
835 in_ifinit(struct ifnet *ifp, struct in_ifaddr *ia, struct sockaddr_in *sin,
836 int scrub)
837 {
838 register u_long i = ntohl(sin->sin_addr.s_addr);
839 struct sockaddr_in oldaddr;
840 int s = splimp(), flags = RTF_UP, error = 0;
841 
(kgdb) 
842 oldaddr = ia->ia_addr;
843 if (oldaddr.sin_family == AF_INET)
844 LIST_REMOVE(ia, ia_hash);
845 ia->ia_addr = *sin;
846 if (ia->ia_addr.sin_family == AF_INET) {
847 IN_IFADDR_WLOCK();
848 LIST_INSERT_HEAD(INADDR_HASH(ia->ia_addr.sin_addr.s_addr),
849 ia, ia_hash);
850 IN_IFADDR_WUNLOCK();
851 }

Applying the fourth patch fixed this. But it is still possible to crash the
box:

#0  doadump () at pcpu.h:246
#1  0xc04ec829 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056922624, 
dummy4=0xe847c890 )
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04ecc21 in db_command (last_cmdp=0xc0e0ad1c, cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04ecd7a in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04eec1d in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc08d9aa6 in kdb_trap (type=12, code=0, tf=0xe847ca7c) at 
/usr/src/sys/kern/subr_kdb.c:535
#6  0xc0beefbf in trap_fatal (frame=0xe847ca7c, eva=3735929146) at 
/usr/src/sys/i386/i386/trap.c:929
#7  0xc0bef8e0 in trap (frame=0xe847ca7c) at /usr/src/sys/i386/i386/trap.c:328
#8  0xc0bd147b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#9  0xc09b9c24 in in_control (so=0xc6e29670, cmd=2149607705, data=0xc6246ba0 
fxp0, ifp=0xc5b94c00, 
td=0xc6a59940) at /usr/src/sys/netinet/in.c:331
#10 0xc095d400 in ifioctl (so=0xc6e29670, cmd=2149607705, data=0xc6246ba0 
fxp0, td=0xc6a59940)
at /usr/src/sys/net/if.c:2516
#11 0xc08f69d5 in soo_ioctl (fp=0xc6374700, cmd=2149607705, data=0xc6246ba0, 
active_cred=0xc7131280, 
td=0xc6a59940) at /usr/src/sys/kern/sys_socket.c:212
#12 0xc08f0a2d in kern_ioctl (td=0xc6a59940, fd=3, com=2149607705, 
data=0xc6246ba0 fxp0) at file.h:262
#13 0xc08f0bb4 in ioctl (td=0xc6a59940, uap=0xe847ccf8) at 
/usr/src/sys/kern/sys_generic.c:678
#14 0xc0bef490 in syscall (frame=0xe847cd38) at 
/usr/src/sys/i386/i386/trap.c:
#15 0xc0bd1510 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#16 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) fr 9
#9  0xc09b9c24 in in_control (so=0xc6e29670, cmd=2149607705, data=0xc6246ba0 
fxp0, ifp=0xc5b94c00, 
td=0xc6a59940) at /usr/src/sys/netinet/in.c:331
331 if (iap->ia_ifp == ifp &&
(kgdb) list
326  * first one on the interface, if possible.
327  */
328 dst = ((struct sockaddr_in *)&ifr->ifr_addr)->sin_addr;
329 IN_IFADDR_RLOCK();
330 LIST_FOREACH(iap, INADDR_HASH(dst.s_addr), ia_hash) {
331 if (iap->ia_ifp == ifp &&
332 iap->ia_addr.sin_addr.s_addr == dst.s_addr) {
333 if (td == NULL || prison_check_ip4(td->td_ucred,
334 &dst) == 0)
335 ia = iap;
(kgdb) p iap
$1 = (struct in_ifaddr *) 0xdeadc0de

But I don't have a patch for this yet :-).

Also, I have noticed that after running my tests long enough (but not long
enough to crash the box), the following error message starts to appear on
every attempt to add the tested alias IP (although the alias is created):

ifconfig: ioctl (SIOCAIFADDR): File exists

This is because the route is not deleted on alias removal (some reference
leak?). After removing the route manually the error does not appear.

-- 
Mikolaj Golub

--- sys/netinet/in.c.orig	2010-04-16 15:15:07.0 +0300
+++ sys/netinet/in.c	2010-04-18 17:22:57.0 +0300
@@ -601,8 +601,17 @@ in_control(struct socket *so, u_long cmd
 	}
 
 	IF_ADDR_LOCK(ifp);
-	TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
+		if (&ia->ia_ifa == ifa) {
+			TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+			break;
+		}
+	}
 	IF_ADDR_UNLOCK(ifp);
+	if (ifa == NULL) {
+		error = EADDRNOTAVAIL;
+		goto out;
+	}
 	ifa_free(&ia->ia_ifa);/* if_addrhead
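
The idea behind the first hunk of the patch (unlink an element only if it is
still on the list) can be tried out in userland with the same sys/queue.h
macros. This is only a minimal sketch, not the kernel code: struct addr and
remove_if_present() are hypothetical names made up for illustration, and the
locking that in_control() does around the walk is left out.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct ifaddr; only the list linkage matters. */
struct addr {
	int id;
	TAILQ_ENTRY(addr) link;
};

TAILQ_HEAD(addr_head, addr);

/*
 * Remove 'target' only if it is still linked on 'head'.
 * Returns 0 on success, -1 if it was not found (for example because
 * a concurrent caller has already removed it).
 * In the kernel this walk would have to run under the list lock.
 */
static int
remove_if_present(struct addr_head *head, struct addr *target)
{
	struct addr *it;

	TAILQ_FOREACH(it, head, link) {
		if (it == target) {
			TAILQ_REMOVE(head, target, link);
			return (0);
		}
	}
	return (-1);
}
```

Calling remove_if_present() twice for the same element returns 0 the first
time and -1 the second time, instead of running TAILQ_REMOVE on an element
that is no longer linked.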

Re: kmem leakage on tun/tap device removal

2010-03-16 Thread Mikolaj Golub

On Feb 28, 1:30 pm, to.my.troc...@gmail.com (Mikolaj Golub) wrote:

 But I have run into another issue (not related to your patch, as it is
 observed with an unpatched kernel too). When I try to run two create/destroy
 scripts concurrently on the same interface, the system panics:
 
 Unread portion of the kernel message buffer:
 panic: Bad link elm 0xc5f1a800 next->prev != elm
 cpuid = 2
 KDB: enter: panic
 exclusive sleep mutex if_clone lock (if_clone lock) r = 0 (0xc0da1cf0) locked 
 @ /usr/src/sys/net/if_clone.c:248
 exclusive sleep mutex if_clone lock (if_clone lock) r = 0 (0xc0da1cf0) locked 
 @ /usr/src/sys/net/if_clone.c:248
 exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc6cd3560) locked @ 
 /usr/src/sys/kern/uipc_sockbuf.c:148
 exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc6b4dbd0) locked @ 
 /usr/src/sys/kern/uipc_sockbuf.c:148
 Physical memory: 2019 MB
 Dumping 160 MB: 145 129 113 97 81 65 49 33 17 1
 
 #0  doadump () at pcpu.h:246
 246 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
 (kgdb) bt
 #0  doadump () at pcpu.h:246
 #1  0xc04e8bb9 in db_fncall (dummy1=-1064515926, dummy2=0, dummy3=-1, 
 dummy4=0xe83f4834 HH?è)
 at /usr/src/sys/ddb/db_command.c:548
 #2  0xc04e8fef in db_command (last_cmdp=0xc0de14dc, cmd_table=0x0, dopager=0)
 at /usr/src/sys/ddb/db_command.c:445
 #3  0xc04e90a4 in db_command_script (command=0xc0de2404 call doadump)
 at /usr/src/sys/ddb/db_command.c:516
 #4  0xc04ed1d0 in db_script_exec (scriptname=0xe83f4940 kdb.enter.panic, 
 warnifnotfound=Variable warnifnotfound is not available.
 )
 at /usr/src/sys/ddb/db_script.c:302
 #5  0xc04ed2b7 in db_script_kdbenter (eventname=0xc0ca1948 panic) at 
 /usr/src/sys/ddb/db_script.c:324
 #6  0xc04eaf98 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
 #7  0xc08cc526 in kdb_trap (type=3, code=0, tf=0xe83f4a7c) at 
 /usr/src/sys/kern/subr_kdb.c:535
 #8  0xc0bdd38b in trap (frame=0xe83f4a7c) at /usr/src/sys/i386/i386/trap.c:690
 #9  0xc0bbef1b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
 #10 0xc08cc6aa in kdb_enter (why=0xc0ca1948 panic, msg=0xc0ca1948 panic) 
 at cpufunc.h:71
 #11 0xc089d716 in panic (fmt=0xc0c3c80c Bad link elm %p next-prev != elm)
 at /usr/src/sys/kern/kern_shutdown.c:562
 #12 0xc094e7fb in if_clone_destroyif (ifc=0xc0da1cc0, ifp=0xc5f1a800) at 
 /usr/src/sys/net/if_clone.c:249
 #13 0xc094eb52 in if_clone_destroy (name=0xc664ac20 tun0) at 
 /usr/src/sys/net/if_clone.c:227
 #14 0xc094c8a6 in ifioctl (so=0xc6e0a9a8, cmd=2149607801, data=0xc664ac20 
 tun0, td=0xc66c0d80)
 at /usr/src/sys/net/if.c:2412
 #15 0xc08e8b25 in soo_ioctl (fp=0xc6d46af0, cmd=2149607801, data=0xc664ac20, 
 active_cred=0xc5f62280,
 td=0xc66c0d80) at /usr/src/sys/kern/sys_socket.c:212
 #16 0xc08e31bd in kern_ioctl (td=0xc66c0d80, fd=3, com=2149607801, 
 data=0xc664ac20 tun0) at file.h:262
 #17 0xc08e3344 in ioctl (td=0xc66c0d80, uap=0xe83f4cf8) at 
 /usr/src/sys/kern/sys_generic.c:678
 #18 0xc0bdca33 in syscall (frame=0xe83f4d38) at 
 /usr/src/sys/i386/i386/trap.c:1078
 #19 0xc0bbefb0 in Xint0x80_syscall () at 
 /usr/src/sys/i386/i386/exception.s:261
 #20 0x0033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb) fr 12
 #12 0xc094e7fb in if_clone_destroyif (ifc=0xc0da1cc0, ifp=0xc5f1a800) at 
 /usr/src/sys/net/if_clone.c:249
 249 IFC_IFLIST_REMOVE(ifc, ifp);
 (kgdb) list
 244  * switch to the vnet context of the target vnet.
 245  */
 246 CURVNET_SET_QUIET(ifp->if_vnet);
 247
 248 IF_CLONE_LOCK(ifc);
 249 IFC_IFLIST_REMOVE(ifc, ifp);
 250 IF_CLONE_UNLOCK(ifc);
 251
 252 if_delgroup(ifp, ifc->ifc_name);
 253
 

Actually, this issue has already been reported (kern/116837, see the bottom of
the discussion), and Takahiro Kurosawa provided a patch there [check that ifp
is on ifc->ifc_iflist before calling IFC_IFLIST_REMOVE(ifc, ifp)], although he
mentioned that another race was still possible. I have tried the patch, and
yes, it makes the situation much better: the box did not crash when running
two ifconfig tun0 create/destroy scripts concurrently. But when I tried 8
concurrent processes :-) it crashed after a couple of minutes in another
place:

(kgdb) bt
#0  doadump () at pcpu.h:246
#1  0xc04ec379 in db_fncall (dummy1=1, dummy2=0, dummy3=-1056947200, 
dummy4=0xe86848e4 )
at /usr/src/sys/ddb/db_command.c:548
#2  0xc04ec771 in db_command (last_cmdp=0xc0e04d1c, cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:445
#3  0xc04ec8ca in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xc04ee76d in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc08d7d06 in kdb_trap (type=12, code=0, tf=0xe8684ad0) at 
/usr/src/sys/kern/subr_kdb.c:535
#6  0xc0bea66f in trap_fatal (frame=0xe8684ad0, eva=3735929054) at 
/usr/src/sys/i386/i386/trap.c:929
#7  0xc0beaf90 in trap (frame=0xe8684ad0) at /usr/src/sys/i386/i386/trap.c:328
#8  0xc0bccd7b in calltrap

Re: kmem leakage on tun/tap device removal

2010-02-28 Thread Mikolaj Golub
 enabled, resume, IOPL = 0
current process = 53523 (ifconfig)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 1m15s
Physical memory: 2019 MB
Dumping 109 MB: 94 78 62 46 30 14


  137 Thread 100216 (PID=53523: ifconfig)  doadump () at pcpu.h:246

#0  doadump () at pcpu.h:246
#1  0xc0881c97 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xc0881f89 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:579
#3  0xc0bb39ec in trap_fatal (frame=0xe84959ec, eva=416) at 
/usr/src/sys/i386/i386/trap.c:938
#4  0xc0bb3c70 in trap_pfault (frame=0xe84959ec, usermode=0, eva=416) at 
/usr/src/sys/i386/i386/trap.c:851
#5  0xc0bb4675 in trap (frame=0xe84959ec) at /usr/src/sys/i386/i386/trap.c:533
#6  0xc0b96e0b in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#7  0xc087233f in _mtx_lock_sleep (m=0xc5c4b22c, tid=3330476288, opts=0, 
file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:369
#8  0xc0926471 in if_detach (ifp=0xc5c4b000) at /usr/src/sys/net/if.c:1188
#9  0xc0930879 in tun_destroy (tp=0xc6860d80) at /usr/src/sys/net/if_tun.c:259
#10 0xc0931927 in tun_clone_destroy (ifp=0xc5c4b000) at 
/usr/src/sys/net/if_tun.c:277
#11 0xc092a407 in ifc_simple_destroy (ifc=0xc0d496e0, ifp=0xc5c4b000) at 
/usr/src/sys/net/if_clone.c:595
#12 0xc092a62c in if_clone_destroyif (ifc=0xc0d496e0, ifp=0xc5c4b000) at 
/usr/src/sys/net/if_clone.c:254
#13 0xc092a9e2 in if_clone_destroy (name=0xc5f201e0 tun0) at 
/usr/src/sys/net/if_clone.c:227
#14 0xc0928a26 in ifioctl (so=0xc6806000, cmd=2149607801, data=0xc5f201e0 
tun0, td=0xc6830900)
at /usr/src/sys/net/if.c:2412
#15 0xc08c4c32 in soo_ioctl (fp=0xc6743968, cmd=2149607801, data=0xc5f201e0, 
active_cred=0xc66fa680, 
td=0xc6830900) at /usr/src/sys/kern/sys_socket.c:212
#16 0xc08bdb00 in kern_ioctl (td=0xc6830900, fd=3, com=2149607801, 
data=0xc5f201e0 tun0) at file.h:262
#17 0xc08bdc74 in ioctl (td=0xc6830900, uap=0xe8495cf8) at 
/usr/src/sys/kern/sys_generic.c:678
#18 0xc0bb3fb5 in syscall (frame=0xe8495d38) at 
/usr/src/sys/i386/i386/trap.c:1078
#19 0xc0b96e70 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

  138 Thread 100211 (PID=53526: ifconfig)  sched_switch (td=0xc6832480, 
newtd=0xc5951b40, flags=259)
at /usr/src/sys/kern/sched_ule.c:1864

#0  sched_switch (td=0xc6832480, newtd=0xc5951b40, flags=259) at 
/usr/src/sys/kern/sched_ule.c:1864
#1  0xc088a15a in mi_switch (flags=259, newtd=0x0) at 
/usr/src/sys/kern/kern_synch.c:449
#2  0xc08bc02b in turnstile_wait (ts=0xc5ec1700, owner=0xc6830900, 
queue=Variable queue is not available.
)
at /usr/src/sys/kern/subr_turnstile.c:745
#3  0xc088073f in _rw_rlock (rw=0xc0db6024, file=0x0, line=0) at 
/usr/src/sys/kern/kern_rwlock.c:460
#4  0xc0924867 in ifunit_ref (name=0xc685e200 tun0) at 
/usr/src/sys/net/if.c:2017
#5  0xc0928d10 in ifioctl (so=0xc6820338, cmd=3223349536, data=0xc685e200 
tun0, td=0xc6832480)
at /usr/src/sys/net/if.c:2420
#6  0xc08c4c32 in soo_ioctl (fp=0xc6808c78, cmd=3223349536, data=0xc685e200, 
active_cred=0xc6802500, 
td=0xc6832480) at /usr/src/sys/kern/sys_socket.c:212
#7  0xc08bdb00 in kern_ioctl (td=0xc6832480, fd=3, com=3223349536, 
data=0xc685e200 tun0) at file.h:262
#8  0xc08bdc74 in ioctl (td=0xc6832480, uap=0xe846ccf8) at 
/usr/src/sys/kern/sys_generic.c:678
#9  0xc0bb3fb5 in syscall (frame=0xe846cd38) at 
/usr/src/sys/i386/i386/trap.c:1078
#10 0xc0b96e70 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#11 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

-- 
Mikolaj Golub


Re: kmem leakage on tun/tap device removal

2010-02-28 Thread Mikolaj Golub
On Sun, 28 Feb 2010 13:30:59 +0200 Mikolaj Golub wrote:

 I am running i386 8.0-STABLE (but rather old, from Dec 1; I can run tests on
 newer sources if this makes a difference).

On today's 8.0-STABLE I got the same panic.

-- 
Mikolaj Golub


kmem leakage on tun/tap device removal

2010-02-27 Thread Mikolaj Golub
)
 			dev->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 	tuncreate(ifc->ifc_name, dev);
 
@@ -239,10 +237,8 @@ tunclone(void *arg, struct ucred *cred, 
 		/* No preexisting struct cdev *, create one */
 		*dev = make_dev(&tun_cdevsw, u,
 		    UID_UUCP, GID_DIALER, 0600, "%s", name);
-		if (*dev != NULL) {
-			dev_ref(*dev);
+		if (*dev != NULL)
 			(*dev)->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 
 	if_clone_create(name, namelen, NULL);


-- 
Mikolaj Golub


Re: mpd has hung

2010-02-19 Thread Mikolaj Golub
', p_fibnum = 0, p_xstat = 0, p_klist = 
{kl_list = {
  slh_first = 0x0}, kl_lock = 0x802aaac0 knlist_mtx_lock, 
kl_unlock = 0x802aaa90 knlist_mtx_unlock, 
kl_assert_locked = 0x802a7de0 knlist_mtx_assert_locked, 
kl_assert_unlocked = 0x802a7df0 knlist_mtx_assert_unlocked, 
kl_lockarg = 0xff0012c070f8}, p_numthreads = 2, p_md = {md_ldt = 0x0, 
md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, 
  sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, 
sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe 
= {
tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, 
c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 17, p_peers = 0x0, 
  p_leader = 0xff0012c07000, p_emuldata = 0x0, p_label = 0x0, p_sched = 
0xff0012c07460, p_ktr = {stqh_first = 0x0, stqh_last = 0xff0012c07430}, 
  p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 
0x804c919f ppwait, cv_waiters = 0}}

Unfortunately, there is no stack trace for flowcleaner. I have asked Alexander
to make the kernel panic on the next reboot and provide a backtrace of the
flowcleaner thread from the crash dump, but I don't know if he has managed to
do this (this is a production host, which complicates things).


-- 
Mikolaj Golub


Re: NFS mount properties

2010-01-30 Thread Mikolaj Golub
On Thu, 28 Jan 2010 08:46:33 -0800 Alan Aldrich wrote:

 I am trying to determine how to examine the actual properties of an
 nfs mount in FreeBSD
 In CentOS one can 'cat /proc/mounts' to determine all of the mount
 properties.
 Specifically I am trying to confirm whether the mount is using TCP or
 UDP as I want it to
 be TCP . Is there some similar way to tell in FreeBSD?

 My fstab entry says this
 192.168.44.55:/mcp/home /net nfs rw,async,-d,-3,-s,-i,noatime,-T 0 0
 It mounts fine,
 but I want to confirm that it is actually mounting with TCP and not
 UDP and cannot figure out
 what tool will tell me this.

 'mount' tells me this
 192.168.44.55:/mcp/home on /net (nfs, asynchronous, noatime)

 but not whether it is mounted TCP or UDP

Whether it is mounted with TCP or UDP you can find out with netstat, by
checking for TCP connections to port 2049. If you see established connections
like the one below

tcp4       0      0 10.0.0.110.895         10.0.100.2.2049        ESTABLISHED

then TCP is used. Of course, if you have several mounts from the same IP, that
complicates the situation :-)

I don't know of any tool that would report this info and would be glad to hear
about one, but if I really needed this info I could use the universal tool --
kgdb :-)

zhuzha:~% sudo kgdb
(kgdb) set print pretty
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next
$26 = {
...
  mnt_list = {
tqe_next = 0xc5e9a78c, 
tqe_prev = 0xc5e9acac
  }, 
...
  mnt_opt = 0xc61f7760, 
...
  mnt_stat = {
f_version = 537068824, 
f_type = 4, 
f_flags = 0, 
f_bsize = 512, 
f_iosize = 32768, 
f_blocks = 284354052, 
f_bfree = 137997300, 
f_bavail = 115248976, 
f_files = 18394110, 
f_ffree = 16921938, 
f_syncwrites = 0, 
f_asyncwrites = 0, 
f_syncreads = 0, 
f_asyncreads = 0, 
f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 
f_namemax = 255, 
f_owner = 0, 
f_fsid = {
  val = {67174148, 4}
}, 
f_charspare = '\0' <repeats 79 times>, 
f_fstypename = "nfs", '\0' <repeats 12 times>, 
f_mntfromname = 
srv01.ua1:/var/public\000\004(\000\000\000\000\000\214\004\b\003\000\000\000\003\000\000\000P\000\000\000X\002\000\000\200╩\000\000\000пBф\000\000\000\000
 \a \aю\a7ф \a7фюx\037ф╟x\037ф\004\000\000\000\v\000\000, 
f_mntonname = "/mnt/0", '\0' <repeats 81 times>
  }, 
...
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first
$27 = {
  link = {
tqe_next = 0xc6370560, 
tqe_prev = 0xc61f7760
  }, 
  name = 0xc61f7770 "rw", 
  value = 0xc61f7780, 
  len = 1, 
  pos = 0, 
  seen = 0
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first.link.tqe_next
$28 = {
  link = {
tqe_next = 0xc6370580, 
tqe_prev = 0xc6370520
  }, 
  name = 0xc61f77a0 "soft", 
  value = 0xc61f77b0, 
  len = 1, 
  pos = 1, 
  seen = 1
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first.link.tqe_next.link.tqe_next
$29 = {
  link = {
tqe_next = 0xc63705a0, 
tqe_prev = 0xc6370560
  }, 
  name = 0xc61f77c0 "intr", 
  value = 0xc61f7810, 
  len = 1, 
  pos = 2, 
  seen = 1
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first.link.tqe_next.link.tqe_next.link.tqe_next
$30 = {
  link = {
tqe_next = 0xc63705e0, 
tqe_prev = 0xc6370580
  }, 
  name = 0xc61f77d0 "rsize", 
  value = 0xc61f77e0, 
  len = 6, 
  pos = 3, 
  seen = 1
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first.link.tqe_next.link.tqe_next.link.tqe_next.link.tqe_next
$31 = {
  link = {
tqe_next = 0xc6370620, 
tqe_prev = 0xc63705a0
  }, 
  name = 0xc61f77f0 "wsize", 
  value = 0xc61f7800, 
  len = 6, 
  pos = 4, 
  seen = 1
}
(kgdb) p 
*mountlist.tqh_first.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_list.tqe_next.mnt_opt.tqh_first.link.tqe_next.link.tqe_next.link.tqe_next.link.tqe_next.link.tqe_next
$32 = {
  link = {
tqe_next = 0xc6370640, 
tqe_prev = 0xc63705e0
  }, 
  name = 0xc61f7820 "tcp", 
  value = 0xc61f7830, 
  len = 1, 
  pos = 5, 
  seen = 1
}

If I needed to do this frequently, I would write a gdb script, taking the
nice scripts from jhb as an example :-)

http://people.freebsd.org/~jhb/gdb/
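
What the repeated kgdb "p *...tqe_next" commands do by hand is a linear walk
of the mount's option list until the entry named tcp turns up. A minimal
userland sketch of that walk in C follows. It is an illustration only:
struct opt and has_opt() are hypothetical names, and the struct keeps just
the link and name fields seen in the kgdb output, not the full kernel record.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical userland mirror of the option records printed above. */
struct opt {
	TAILQ_ENTRY(opt) link;
	const char *name;
};

TAILQ_HEAD(opt_list, opt);

/* Return 1 if an option called 'name' is on the list, 0 otherwise. */
static int
has_opt(struct opt_list *opts, const char *name)
{
	struct opt *o;

	TAILQ_FOREACH(o, opts, link) {
		if (strcmp(o->name, name) == 0)
			return (1);
	}
	return (0);
}
```

For a list holding rw, soft, intr, rsize, wsize and tcp, as in the output
above, has_opt(&opts, "tcp") returns 1 and has_opt(&opts, "udp") returns 0.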

-- 
Mikolaj Golub


close: Socket is not connected

2009-12-17 Thread Mikolaj Golub
)
		errx(1, "fork(): %d", errno);

	if (0 != pid) {
		/* parent */

		if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
			errx(1, "parent: socket error: %d", errno);

		unlink(UNIXSTR_PATH);
		bzero(&servaddr, sizeof(servaddr));
		servaddr.sun_family = AF_LOCAL;
		strcpy(servaddr.sun_path, UNIXSTR_PATH);

		if (bind(listenfd, (struct sockaddr *) &servaddr,
		    sizeof(servaddr)) < 0)
			errx(1, "parent: bind error: %d", errno);

		if (listen(listenfd, 1024) < 0)
			errx(1, "parent: listen error: %d", errno);

		for ( ; ; ) {
			if ((connfd = accept(listenfd, (struct sockaddr *)
			    NULL, NULL)) < 0)
				errx(1, "parent: accept error: %d", errno);

			if (fcntl(connfd, F_SETFL, O_NONBLOCK) == -1)
				errx(1, "parent: fcntl error: %d", errno);

			Read(connfd, buf, sizeof(buf));
			Write(connfd, buf, sizeof(buf));

			if (close(connfd) < 0)
				errx(1, "parent: close error: %d", errno);
		}

	} else {
		/* child */

		/* wait some time while parent has created socket */
		sleep(1);

		for ( ; ; ) {

			if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
				errx(1, "child: socket error: %d", errno);

			if (fcntl(connfd, F_SETFL, O_NONBLOCK) == -1)
				errx(1, "child: fcntl error: %d", errno);

			bzero(&servaddr, sizeof(servaddr));
			servaddr.sun_family = AF_LOCAL;
			strcpy(servaddr.sun_path, UNIXSTR_PATH);

			if (connect(connfd, (struct sockaddr *) &servaddr,
			    sizeof(servaddr)) < 0)
				errx(1, "child: connect error %d", errno);

			Write(connfd, buf, sizeof(buf));
			Read(connfd, buf, sizeof(buf));

			if (close(connfd) != 0)
				errx(1, "child: close error: %d", errno);

			usleep(USLEEP);
		}
	}

	return 0;
}

-- 
Mikolaj Golub


Re: host(1) coredumps

2009-10-11 Thread Mikolaj Golub

On Mon, 14 Sep 2009 01:16:43 +0800 Eugene Grosbein wrote:

 EG On Sun, Sep 13, 2009 at 05:41:50PM +0200, vol...@vwsoft.com wrote:

   % host -l grosbein.pp.ru. ns2.rucable.net.
   ; Transfer failed.
   /usr/local/src/lib/bind/isc/../../../contrib/bind9/lib/isc/unix/socket.c:2486:
   REQUIRE((((sock) != ((void *)0)) && (((const isc__magic_t *)(sock))->magic
   == ((('I') << 24 | ('O') << 16 | ('i') << 8 | ('o')))))) failed.
   zsh: abort (core dumped)  host -l grosbein.pp.ru. ns2.rucable.net.
   
   Should I send a PR?

  Eugene,
  
  the attached patch works around the error for me. As this is contributed
  code, it should be fixed upstream (no need to file a PR).
  
  Volker
  

  --- contrib/bind9/bin/dig/dighost.c.orig2009-09-13 
  14:24:13.0 +
  +++ contrib/bind9/bin/dig/dighost.c2009-09-13 14:31:52.0 
  +

 EG Indeed, the patch helps. Thank you.

BTW, we already have a PR about this problem:

http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/138061

IMO it would be nice to add the patch there.

-- 
Mikolaj Golub


Re: kern/134557: [netgraph] [hang] 7.2 with mpd5.3 hanging up - ng_pptp problem

2009-06-30 Thread Mikolaj Golub
Unfortunately, the problem was introduced by this commit :-)

--

Author: mav
Date:   Sat Jan 31 12:48:09 2009 UTC (4 months, 4 weeks ago)
Log Message:

MFC rev. 187495

Check for infinite recursion possible on some broken PPTP/L2TP/... VPN setups.
Mark packets with mbuf_tag on first interface passage and drop on second.

PR: ports/129625, ports/125303

--

If a packet goes through two or more ng interfaces, the while loop in the
tag-checking code can run forever. The attached patch should fix this.

-- 
Mikolaj Golub

--- netgraph/ng_iface.c.orig	2009-06-30 21:47:54.0 +0300
+++ netgraph/ng_iface.c	2009-06-30 21:49:29.0 +0300
@@ -365,7 +365,8 @@
 	}
 
 	/* Protect from deadly infinite recursion. */
-	while ((mtag = m_tag_locate(m, MTAG_NGIF, MTAG_NGIF_CALLED, NULL))) {
+	mtag = NULL;
+	while ((mtag = m_tag_locate(m, MTAG_NGIF, MTAG_NGIF_CALLED, mtag))) {
 		if (*(struct ifnet **)(mtag + 1) == ifp) {
 			log(LOG_NOTICE, "Loop detected on %s\n", ifp->if_xname);
 			m_freem(m);
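
Why the unpatched loop can spin: m_tag_locate() starts its search from the
beginning of the tag list when its last argument is NULL, so if the first
matching tag belongs to a different interface, every iteration gets that same
tag back. Passing the previous result makes the search resume after it. A
minimal userland sketch of that pattern, assuming a simplified singly linked
tag chain; struct tag, tag_locate() and seen_by() are hypothetical stand-ins
for illustration, not the kernel API:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one tag on a packet's tag chain. */
struct tag {
	int type;          /* tag type being searched for */
	const void *owner; /* interface that attached the tag */
	struct tag *next;
};

/*
 * Find the next tag of 'type', starting after 'prev', or from the head
 * of the chain when 'prev' is NULL.
 */
static struct tag *
tag_locate(struct tag *head, int type, struct tag *prev)
{
	struct tag *t = (prev == NULL) ? head : prev->next;

	for (; t != NULL; t = t->next) {
		if (t->type == type)
			return (t);
	}
	return (NULL);
}

/* Return 1 if 'ifp' has already tagged this packet, 0 otherwise. */
static int
seen_by(struct tag *head, int type, const void *ifp)
{
	struct tag *t = NULL;

	/* Feeding 't' back in is what lets the loop advance and end. */
	while ((t = tag_locate(head, type, t)) != NULL) {
		if (t->owner == ifp)
			return (1);
	}
	return (0);
}
```

With two tags attached by interfaces A and B, seen_by() finds B on the second
iteration and terminates with 0 for a third interface C. If NULL were passed
instead of t on every call, the search would keep returning A's tag.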

Re: kern/134557: [netgraph] [hang] 7.2 with mpd5.3 hanging up - ng_pptp problem

2009-06-30 Thread Mikolaj Golub
The following reply was made to PR kern/134557; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: bug-follo...@freebsd.org
Cc: freebsd-net@FreeBSD.org, Sergei Cherveni sergei.cherv...@gmail.com, 
Alexander Motin m...@freebsd.org
Subject: Re: kern/134557: [netgraph] [hang] 7.2 with mpd5.3 hanging up - 
ng_pptp problem
Date: Tue, 30 Jun 2009 22:33:12 +0300

 --=-=-=
 
 Unfortunately, the problem was introduced by this commit :-)
 
 --
 
 Author:mav
 Date:  Sat Jan 31 12:48:09 2009 UTC (4 months, 4 weeks ago)
 Log Message:   
 
 MFC rev. 187495
 
 Check for infinite recursion possible on some broken PPTP/L2TP/... VPN setups.
 Mark packets with mbuf_tag on first interface passage and drop on second.
 
 PR:ports/129625, ports/125303
 
 --
 
 If a packet goes through two or more ng interfaces, the while loop in the
 tag-checking code can run forever. The attached patch should fix this.
 
 -- 
 Mikolaj Golub
 
 
 --=-=-=
 Content-Type: text/x-diff
 Content-Disposition: attachment; filename=ng_iface.c.patch
 
 --- netgraph/ng_iface.c.orig   2009-06-30 21:47:54.0 +0300
 +++ netgraph/ng_iface.c2009-06-30 21:49:29.0 +0300
 @@ -365,7 +365,8 @@
}
  
/* Protect from deadly infinite recursion. */
 -  while ((mtag = m_tag_locate(m, MTAG_NGIF, MTAG_NGIF_CALLED, NULL))) {
 +  mtag = NULL;
 +  while ((mtag = m_tag_locate(m, MTAG_NGIF, MTAG_NGIF_CALLED, mtag))) {
if (*(struct ifnet **)(mtag + 1) == ifp) {
log(LOG_NOTICE, "Loop detected on %s\n", ifp->if_xname);
m_freem(m);
 
 --=-=-=--




Re: kern/133572: [ppp] [hang] incoming PPTP connection hangs the system

2009-06-30 Thread Mikolaj Golub
The following reply was made to PR kern/133572; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: Dennis Melentyev dennis.melent...@gmail.com
Cc: bug-follo...@freebsd.org, freebsd-net@FreeBSD.org
Subject: Re: kern/133572: [ppp] [hang] incoming PPTP connection hangs the  
system
Date: Tue, 30 Jun 2009 23:00:00 +0300

 Could you try the patch from kern/134557?
 
 -- 
 Mikolaj Golub


Re: panic with ng_ipfw+ng_car and net.inet.ip.fw.one_pass=0

2009-06-11 Thread Mikolaj Golub
On Fri, 5 Jun 2009 22:56:47 +0400 Oleg Bulyzhin wrote:

 On Fri, Jun 05, 2009 at 04:57:52PM +0300, Mikolaj Golub wrote:

 It works for me. With the patch I have not managed to crash the system using
 my test. Some notes:
 
 - only the ng_ipfw/ng_car subsystem has been tested (not dummynet).
 - my -current box runs under qemu (I don't have a real server running
 -current to test this).
 
 If you are interested in some testing of dummynet before committing this to
 current, let me know. I could try some tests, but only next week.

 I did some testing of dummynet though extra testing would not hurt. 

I see the patch has been committed to 8-CURRENT :-). Thanks.

I did some dummy tests on the fixed -current (a simple dummynet configuration +
traffic + ipfw reloaded every second) and did not see any issues. At present
I don't have an old -current without the fix to reproduce the crash, but on
7-STABLE running this test I saw many

ipfw: ouch!, skip past end of rules, denying packet

messages in dmesg, and the system crashed once. So it looks like my test setup
is rather good and would have found problems in the fixed -current if any
remained.

-- 
Mikolaj Golub


Re: panic with ng_ipfw+ng_car and net.inet.ip.fw.one_pass=0

2009-06-05 Thread Mikolaj Golub
On Fri, 5 Jun 2009 00:47:20 +0400 Oleg Bulyzhin wrote:

 On Wed, Jun 03, 2009 at 09:03:11PM +0400, Oleg Bulyzhin wrote:
 On Mon, Jun 01, 2009 at 11:12:45AM +0300, Mikolaj Golub wrote:
 
  It looks like the problem has not drawn much attention :-).
 
 I was on vacation so did not reply in time.
 A dummynet-like solution is not enough; dummynet is affected by this problem
 too.
 I'll send you a patch for testing tomorrow.

 Please test the attached patch and let me know the results.
 The patch is made for -current and it changes the ABI, so rebuilding ipfw
 with the new headers is required.

It works for me. With the patch I have not managed to crash the system using my
test. Some notes:

- only the ng_ipfw/ng_car subsystem has been tested (not dummynet).
- my -current box runs under qemu (I don't have a real server running -current
to test this).

If you are interested in some testing of dummynet before committing this to
current, let me know. I could try some tests, but only next week.

If you are going to commit this to -current, could you please fix the
ng_ipfw(4) man page too?

Index: share/man/man4/ng_ipfw.4
===
--- share/man/man4/ng_ipfw.4    (revision 193478)
+++ share/man/man4/ng_ipfw.4    (working copy)
@@ -84,11 +84,12 @@
 struct ng_ipfw_tag {
        struct m_tag    mt;             /* tag header */
        struct ip_fw    *rule;          /* matching rule */
+       uint32_t        rule_id;        /* matching rule id */
+       uint32_t        chain_id;       /* ruleset id */
        struct ifnet    *ifp;           /* interface, for ip_output */
        int             dir;            /* packet direction */
 #define        NG_IPFW_OUT     0
 #define        NG_IPFW_IN      1
-       int             flags;          /* flags, for ip_output() */
 };
 .Ed
 .Pp
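
The two new fields suggest that a packet coming back from netgraph can be
matched to a rule by id instead of by a possibly stale pointer. The sketch
below shows one way that could work. It is an assumption made for
illustration and is not taken from the patch: struct rule, struct ruleset,
struct pkt_tag and resume_rule() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures; not the kernel definitions. */
struct rule {
	struct rule *next;
	uint32_t     id;	/* unique id of this rule */
};

struct ruleset {
	struct rule *head;
	uint32_t     id;	/* assumed to change whenever the ruleset changes */
};

struct pkt_tag {
	struct rule *rule;	/* pointer saved when the packet left */
	uint32_t     rule_id;	/* id of that rule */
	uint32_t     chain_id;	/* ruleset id at that moment */
};

/*
 * Pick the rule to resume from. The saved pointer is trusted only while
 * the ruleset id is unchanged; otherwise the rule is looked up by id.
 * NULL means the rule no longer exists and the caller has to decide what
 * to do with the packet.
 */
static struct rule *
resume_rule(const struct ruleset *rs, const struct pkt_tag *tag)
{
	struct rule *r;

	assert(rs != NULL && tag != NULL);
	if (tag->chain_id == rs->id)
		return (tag->rule);
	for (r = rs->head; r != NULL; r = r->next)
		if (r->id == tag->rule_id)
			return (r);
	return (NULL);
}
```

The point of the design is that the saved pointer is never dereferenced
after the ruleset has changed.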

-- 
Mikolaj Golub


Re: panic with ng_ipfw+ng_car and net.inet.ip.fw.one_pass=0

2009-06-01 Thread Mikolaj Golub
On Mon, 25 May 2009 22:29:25 +0300 Mikolaj Golub wrote:

 Hi,

 Some time ago panics when using ipfw + ng_ipfw + ng_car were reported on
 fido7.ru.unix.bsd.

 http://groups.google.com/group/fido7.ru.unix.bsd/browse_thread/thread/5907d1ba4e76675d

 For those who haven't learnt Russian yet ;-) here are some details. Max
 Irgiznov reported that when an ng_ipfw+ng_car construction was used and
 net.inet.ip.fw.one_pass=0 was set, the system reliably panicked on ipfw rules
 reload if there was some traffic through ng_car.

 The problem is the following. When a packet returns from the ng_car queue to
 ipfw_chk and one_pass is turned off, the next rule is tried. But if the rules
 were reloaded while the packet was sitting in ng_car, the next-rule pointer
 might be dangling and the kernel will panic.

 (kgdb) bt
 #0  doadump () at pcpu.h:196
 #1  0xc07e1f7e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 #2  0xc07e2252 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:574
 #3  0xc0495eb7 in db_panic (addr=Could not find the frame base for db_panic.
 ) at /usr/src/sys/ddb/db_command.c:446
 #4  0xc04968bc in db_command (last_cmdp=0xc0c97514, cmd_table=0x0, dopager=1)
 at /usr/src/sys/ddb/db_command.c:413
 #5  0xc04969ca in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
 #6  0xc04981bd in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:228
 #7  0xc080ec76 in kdb_trap (type=12, code=0, tf=0xe6945774) at 
 /usr/src/sys/kern/subr_kdb.c:524
 #8  0xc0ad9e4f in trap_fatal (frame=0xe6945774, eva=3735929068) at 
 /usr/src/sys/i386/i386/trap.c:930
 #9  0xc0ada790 in trap (frame=0xe6945774) at /usr/src/sys/i386/i386/trap.c:320
 #10 0xc0abeaab in calltrap () at /usr/src/sys/i386/i386/exception.s:159
 #11 0xc903328c in ipfw_chk (args=0xe6945acc) at 
 /usr/src/sys/modules/ipfw/../../netinet/ip_fw2.c:2516
 #12 0xc90373f7 in ipfw_check_in (arg=0x0, m0=0xe6945bd0, ifp=0xc41f9000, 
 dir=1, inp=0x0)
 at /usr/src/sys/modules/ipfw/../../netinet/ip_fw_pfil.c:125
 #13 0xc088d6e8 in pfil_run_hooks (ph=0xc0d1f620, mp=0xe6945c24, 
 ifp=0xc41f9000, dir=1, inp=0x0)
 at /usr/src/sys/net/pfil.c:78
 #14 0xc08c766d in ip_input (m=0xc409ad00) at 
 /usr/src/sys/netinet/ip_input.c:416
 #15 0xc9011c39 in ng_ipfw_rcvdata (hook=0xc61a1780, item=0xc8fe5090)
 at /usr/src/sys/modules/netgraph/ipfw/../../../netgraph/ng_ipfw.c:250
 #16 0xc68b80af in ng_apply_item (node=0xc7054c00, item=0xc8fe5090, rw=0)
 at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
 #17 0xc68b939f in ngthread (arg=0x0) at 
 /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3304
 #18 0xc07be4c8 in fork_exit (callout=0xc68b91f0 <ngthread>, arg=0x0, 
 frame=0xe6945d38)
 at /usr/src/sys/kern/kern_fork.c:810
 #19 0xc0abeb20 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
 (kgdb) frame 11
 #11 0xc903328c in ipfw_chk (args=0xe6945acc) at 
 /usr/src/sys/modules/ipfw/../../netinet/ip_fw2.c:2516
 warning: Source file is more recent than executable.

 2516            if (set_disable & (1 << f->set) )
 (kgdb) list
 2511            ipfw_insn *cmd;
 2512            uint32_t tablearg = 0;
 2513            int l, cmdlen, skip_or; /* skip rest of OR block */
 2514
 2515    again:
 2516            if (set_disable & (1 << f->set) )
 2517                    continue;
 2518
 2519            skip_or = 0;
 2520            for (l = f->cmd_len, cmd = f->cmd ; l > 0 ;
 (kgdb) p f
 $1 = (struct ip_fw *) 0xdeadc0de
 (kgdb) 

 DUMMYNET does not have this problem, as ip_dn_ruledel_ptr(rule) is called when
 the rule is removed in reap_rules(). My first thought was to do the same here,
 i.e. to broadcast a "rule removed" message to netgraph nodes, but glancing
 through the netgraph man page I haven't figured out how it could be done, if
 it is possible at all.

 So the other solution is to have a counter that is incremented every time
 any rules are removed. When a packet is directed by ipfw to the netgraph
 subsystem, the current value of the counter is stored in an mtag. When the
 packet comes back, the current value of the counter is compared with the one
 from the mtag, and if they differ the packet is dropped.

 Just to prove the concept I have modified ip_fw2.c for 7.2-STABLE accordingly
 and it works for me. The patch is attached.

It looks like the problem has not drawn much attention :-).

Anyway, another version of the patch is attached. This time almost all of the
necessary modifications are done in the ng_ipfw module. Only small changes
have been made in the ip_fw module, and I tried to make them in the same manner
as is done for dummynet.

The main logic is the same as in the previous patch: have an internal counter
ng_ipfw_rdcnt that is incremented every time a rule is deleted from the
chain, and compare it with the one stored in ng_ipfw_tag when a packet passes
ng_ipfw_rcvdata().

The patch

panic with ng_ipfw+ng_car and net.inet.ip.fw.one_pass=0

2009-05-25 Thread Mikolaj Golub
Hi,

Some time ago panics when using ipfw + ng_ipfw + ng_car were reported on
fido7.ru.unix.bsd.

http://groups.google.com/group/fido7.ru.unix.bsd/browse_thread/thread/5907d1ba4e76675d

For those who haven't learnt Russian yet ;-) here are some details. Max
Irgiznov reported that when an ng_ipfw+ng_car construction was used and
net.inet.ip.fw.one_pass=0 was set, the system reliably panicked on ipfw rules
reload if there was some traffic through ng_car.

The problem is the following. When a packet returns from the ng_car queue to
ipfw_chk and one_pass is turned off, the next rule is tried. But if the rules
were reloaded while the packet was sitting in ng_car, the next-rule pointer
might be dangling and the kernel will panic.

(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc07e1f7e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07e2252 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0495eb7 in db_panic (addr=Could not find the frame base for db_panic.
) at /usr/src/sys/ddb/db_command.c:446
#4  0xc04968bc in db_command (last_cmdp=0xc0c97514, cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:413
#5  0xc04969ca in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
#6  0xc04981bd in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc080ec76 in kdb_trap (type=12, code=0, tf=0xe6945774) at 
/usr/src/sys/kern/subr_kdb.c:524
#8  0xc0ad9e4f in trap_fatal (frame=0xe6945774, eva=3735929068) at 
/usr/src/sys/i386/i386/trap.c:930
#9  0xc0ada790 in trap (frame=0xe6945774) at /usr/src/sys/i386/i386/trap.c:320
#10 0xc0abeaab in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#11 0xc903328c in ipfw_chk (args=0xe6945acc) at 
/usr/src/sys/modules/ipfw/../../netinet/ip_fw2.c:2516
#12 0xc90373f7 in ipfw_check_in (arg=0x0, m0=0xe6945bd0, ifp=0xc41f9000, dir=1, 
inp=0x0)
at /usr/src/sys/modules/ipfw/../../netinet/ip_fw_pfil.c:125
#13 0xc088d6e8 in pfil_run_hooks (ph=0xc0d1f620, mp=0xe6945c24, ifp=0xc41f9000, 
dir=1, inp=0x0)
at /usr/src/sys/net/pfil.c:78
#14 0xc08c766d in ip_input (m=0xc409ad00) at /usr/src/sys/netinet/ip_input.c:416
#15 0xc9011c39 in ng_ipfw_rcvdata (hook=0xc61a1780, item=0xc8fe5090)
at /usr/src/sys/modules/netgraph/ipfw/../../../netgraph/ng_ipfw.c:250
#16 0xc68b80af in ng_apply_item (node=0xc7054c00, item=0xc8fe5090, rw=0)
at /usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:2336
#17 0xc68b939f in ngthread (arg=0x0) at 
/usr/src/sys/modules/netgraph/netgraph/../../../netgraph/ng_base.c:3304
#18 0xc07be4c8 in fork_exit (callout=0xc68b91f0 <ngthread>, arg=0x0, 
frame=0xe6945d38)
at /usr/src/sys/kern/kern_fork.c:810
#19 0xc0abeb20 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
(kgdb) frame 11
#11 0xc903328c in ipfw_chk (args=0xe6945acc) at 
/usr/src/sys/modules/ipfw/../../netinet/ip_fw2.c:2516
warning: Source file is more recent than executable.

2516            if (set_disable & (1 << f->set) )
(kgdb) list
2511            ipfw_insn *cmd;
2512            uint32_t tablearg = 0;
2513            int l, cmdlen, skip_or; /* skip rest of OR block */
2514
2515    again:
2516            if (set_disable & (1 << f->set) )
2517                    continue;
2518
2519            skip_or = 0;
2520            for (l = f->cmd_len, cmd = f->cmd ; l > 0 ;
(kgdb) p f
$1 = (struct ip_fw *) 0xdeadc0de
(kgdb) 
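
A quick check of the numbers in this session: 0xdeadc0de is the pattern that
FreeBSD debugging kernels write into freed memory, and the fault address
reported in frame #8 (eva=3735929068) lies just past it. That is consistent
with reading a field of an already freed rule, such as f->set on line 2516.
The helper below only does the arithmetic; FREED_FILL and fault_offset() are
names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value gdb printed for f in the session above. */
#define FREED_FILL 0xdeadc0deu

/*
 * Distance between the faulting address and the value of the pointer,
 * i.e. the offset of the field that was being read through it.
 */
static uint32_t
fault_offset(uint32_t eva)
{
	assert(eva >= FREED_FILL);
	return (eva - FREED_FILL);
}
```

For eva=3735929068 (0xdeadc0ec) this gives 14, so the faulting access was
14 bytes into whatever f pointed to.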

DUMMYNET does not have this problem, as ip_dn_ruledel_ptr(rule) is called when
the rule is removed in reap_rules(). My first thought was to do the same here,
i.e. to broadcast a "rule removed" message to netgraph nodes, but glancing
through the netgraph man page I haven't figured out how it could be done, if
it is possible at all.

So the other solution is to have a counter that is incremented every time
any rules are removed. When a packet is directed by ipfw to the netgraph
subsystem, the current value of the counter is stored in an mtag. When the
packet comes back, the current value of the counter is compared with the one
from the mtag, and if they differ the packet is dropped.

Just to prove the concept I have modified ip_fw2.c for 7.2-STABLE accordingly
and it works for me. The patch is attached.

I would like to hear other people's opinions: first of all, whether the
proposed idea is good enough or whether there are better solutions to the
problem (e.g. if broadcasting the rule removal is possible). Also, if somebody
has any remarks about the patch itself I would be happy to see them. E.g. I
have added the counter as a static variable, but I think struct ip_fw_chain
could be a better place for it. Also, is there any need to mark the tag with
the MTAG_PERSISTENT bit?

-- 
Mikolaj Golub

--- sys/netinet/ip_fw2.c.orig	2009-05-24 14:25:30.0 +0300
+++ sys/netinet/ip_fw2.c	2009-05-25 19:30:33.0 +0300
@@ -111,6 +111,15 @@ static int fw_verbose;
 static struct callout ipfw_timeout;
 static int

Re: kern/133902: [tun] Killing tun0 iface ssh tunnel causes Panic String: page fault

2009-04-23 Thread Mikolaj Golub
The following reply was made to PR kern/133902; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: bug-follo...@freebsd.org
Cc: freebsd-b...@freebsd.org,  freebsd-net@FreeBSD.org, lsantagost...@gmail.com
Subject: Re: kern/133902: [tun] Killing tun0 iface ssh tunnel causes Panic 
String: page fault
Date: Thu, 23 Apr 2009 17:14:02 +0300

 I have asked Leonardo to provide more info and a backtrace.
 
 So here is the backtrace:
 
 cobra4# kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.0
 [GDB will not be able to debug user-mode threads:
 /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as i386-marcel-freebsd.
 
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x65656c7b
 fault code              = supervisor write, page not present
 instruction pointer     = 0x20:0xc0786e00
 stack pointer           = 0x28:0xe958fac4
 frame pointer           = 0x28:0xe958fac4
 code segment            = base 0x0, limit 0xf, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 66873 (ssh)
 trap number             = 12
 panic: page fault
 cpuid = 1
 Uptime: 54d11h21m54s
 Physical memory: 2023 MB
 Dumping 277 MB: 262 246 230 214 198 182 166 150 134 118 102 86 70 54 38 22 6
 
 #0  doadump () at pcpu.h:195
 195 pcpu.h: No such file or directory.
 in pcpu.h
 (kgdb) backtrace
 #0  doadump () at pcpu.h:195
 #1  0xc0754457 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 #2  0xc0754719 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:563
 #3  0xc0a4905c in trap_fatal (frame=0xe958fa84, eva=1701145723) at
 /usr/src/sys/i386/i386/trap.c:899
 #4  0xc0a492e0 in trap_pfault (frame=0xe958fa84, usermode=0,
 eva=1701145723) at /usr/src/sys/i386/i386/trap.c:812
 #5  0xc0a49c8c in trap (frame=0xe958fa84) at /usr/src/sys/i386/i386/trap.c:490
 #6  0xc0a2fc0b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
 #7  0xc0786e00 in clear_selinfo_list (td=0xca3fc840) at
 /usr/src/sys/kern/sys_generic.c:1065
 #8  0xc0788efc in kern_select (td=0xca3fc840, nd=8, fd_in=0x284010b8,
 fd_ou=0x284010bc, fd_ex=0x0, tvp=0x0) at
 /usr/src/sys/kern/sys_generic.c:794
 #9  0xc07890de in select (td=0xca3fc840, uap=0xe958fcfc) at
 /usr/src/sys/kern/sys_generic.c:663
 #10 0xc0a49635 in syscall (frame=0xe958fd38) at
 /usr/src/sys/i386/i386/trap.c:1035
 #11 0xc0a2fc70 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
 #12 0x0033 in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 (kgdb)
 
 The system panics on
 
 ifconfig tun0 destroy
 
 This issue is related to kern/116837.
 
 Leonardo, you can try the patch attached to that PR.
 
 -- 
 Mikolaj Golub




Re: kern/132734: panic in net/if_mib.c

2009-04-23 Thread Mikolaj Golub
The following reply was made to PR kern/132734; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: Alexey Illarionov littlesav...@orionet.ru
Cc: bug-follo...@freebsd.org, Robert Watson rwat...@freebsd.org
Subject: Re: kern/132734: panic in net/if_mib.c
Date: Thu, 23 Apr 2009 22:29:36 +0300

 SVN rev 191435 on 2009-04-23 18:23:08Z by rwatson
 
 Merge r191434 from stable/7 to releng/7.2:
 
   In sysctl_ifdata(), query the ifnet pointer using the index only
   once, rather than querying it, validating it, and then re-querying
   it without validating it.  This may avoid a NULL pointer
   dereference and resulting kernel page fault if an interface is
   being deleted while bsnmp or other tools are querying data on the
   interface.
 
   The full fix, to properly refcount the interface for the duration
   of the sysctl, is in 8.x, but is considered too high-risk for
   7.2, so instead will appear in 7.3 (if all goes well).
 
 So, Alexey, can you try upgrading to the latest stable/7 or releng/7.2, or
 applying the attached patch, to see if this tweak at least eliminates the
 instant panic?
 
 --- if_mib.c   (revision 191424)
 +++ if_mib.c   (working copy)
 @@ -82,11 +82,9 @@
                return EINVAL;
  
        if (name[0] <= 0 || name[0] > if_index ||
 -          ifnet_byindex(name[0]) == NULL)
 +          (ifp = ifnet_byindex(name[0])) == NULL)
                return ENOENT;
  
 -      ifp = ifnet_byindex(name[0]);
 -
        switch(name[1]) {
        default:
                return ENOENT;
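
 The pattern in this diff, look the pointer up once and keep using the value
 that was validated, can be shown in a small userland sketch. The table, NIF,
 byindex() and get_if() below are hypothetical stand-ins, not the kernel
 interfaces. As the commit message says, this only avoids the second
 unchecked lookup and is not the full fix: without a reference count the
 interface can still go away while the pointer is in use.

```c
#include <assert.h>
#include <stddef.h>

#define NIF 4				/* hypothetical highest valid index */

struct ifnet {				/* hypothetical minimal stand-in */
	int index;
};

static struct ifnet *if_table[NIF + 1];	/* slot 0 is never used */

/* Stand-in for an index lookup; may return NULL for a removed interface. */
static struct ifnet *
byindex(int idx)
{
	return (if_table[idx]);
}

/*
 * Validate the index and the lookup result in one step and hand back
 * exactly the pointer that was checked. Returns 0 on success, -1 if the
 * index is out of range or no interface is registered there.
 */
static int
get_if(int idx, struct ifnet **out)
{
	struct ifnet *ifp;

	assert(out != NULL);
	if (idx <= 0 || idx > NIF || (ifp = byindex(idx)) == NULL)
		return (-1);
	*out = ifp;
	return (0);
}
```

 Looking the interface up a second time after the check, as the old code did,
 would reopen the window in which the slot can become NULL.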


Re: kern/131310: [netgraph] [panic] 7.1 panics with mpd netgraph interface changes

2009-04-10 Thread Mikolaj Golub
The following reply was made to PR kern/131310; it has been noted by GNATS.

From: Mikolaj Golub to.my.troc...@gmail.com
To: bug-follo...@freebsd.org,Vitaly Dodonov dreamer@gmail.com
Cc: Semenchuk Oleg darki...@gmail.com
Subject: Re: kern/131310: [netgraph] [panic] 7.1 panics with mpd netgraph 
interface changes
Date: Fri, 10 Apr 2009 15:09:38 +0300

 This PR is closely related to kern/130977. You can try the patch from it, which
 adds if_delgroup(ifp, IFG_ALL) to if_detach().
 
 -- 
 Mikolaj Golub