Re: Strange IPsec freeze/partial fix

2006-02-13 Thread David S. Miller
From: Olaf Kirch [EMAIL PROTECTED]
Date: Wed, 8 Feb 2006 12:59:37 +0100

> On Wed, Feb 08, 2006 at 07:46:48AM +1100, Herbert Xu wrote:
> > I suggest that we simply bail out always.  If the dst decides to die
> > on us later on, the packet will be dropped anyway.  So there is no
> > great urgency to retry here.  Once we have the proper resolution
> > queueing, we can then do the retry again.
> 
> Yes, that is simpler, and should work as well.
> 
> > Signed-off-by: Herbert Xu [EMAIL PROTECTED]
> 
> Acked-by: Olaf Kirch [EMAIL PROTECTED]

Applied, thanks everyone.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Strange IPsec freeze/partial fix

2006-02-08 Thread Olaf Kirch
On Wed, Feb 08, 2006 at 07:46:48AM +1100, Herbert Xu wrote:
> I suggest that we simply bail out always.  If the dst decides to die
> on us later on, the packet will be dropped anyway.  So there is no
> great urgency to retry here.  Once we have the proper resolution
> queueing, we can then do the retry again.

Yes, that is simpler, and should work as well.

> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Acked-by: Olaf Kirch [EMAIL PROTECTED]

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Strange IPsec freeze/partial fix

2006-02-07 Thread Olaf Kirch
Hi,

there's a problem with IPsec that has been bugging some of our users
for the last couple of kernel revs. Every now and then, IPsec will
freeze the machine completely. This is with openswan user land,
and with kernels up to and including 2.6.16-rc2.

I managed to debug this a little, and what happens is that we end
up looping in xfrm_lookup, and never get out. With a bit of debug
printks added, I can see this happening:

 - ip_route_output_flow calls xfrm_lookup.

 - xfrm_find_bundle returns NULL (apparently we're in the
   middle of negotiating a new SA or something).

 - We therefore call xfrm_tmpl_resolve. This returns EAGAIN.
   We go to sleep, waiting for a policy update.
   Then we loop back to the top.

 - Apparently, the dst_orig that was passed into xfrm_lookup
   has been dropped from the routing table (obsolete=2).
   This leads to the endless loop, because we now create
   a new bundle, check the new bundle and find it's stale
   (stale_bundle -> xfrm_bundle_ok -> dst_check() returns 0).
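
The loop described above can be modelled in user space. What follows is
only a rough sketch, not kernel code: toy_dst, toy_bundle_ok and
toy_lookup are made-up names, and the only thing carried over from the
report is the idea that a route with obsolete > 0 never passes the
staleness check, so only a loop cap gets us out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route entry; only the field named in the report. */
struct toy_dst {
	int obsolete;	/* > 0 once the route has been dropped from the table */
};

/* Models the staleness check: a bundle built on an obsolete route never validates. */
bool toy_bundle_ok(const struct toy_dst *orig)
{
	return orig->obsolete <= 0;
}

/*
 * Models the restart loop with a cap on the number of passes.
 * Returns 0 when a usable bundle is found, -EHOSTUNREACH when
 * max_loops passes were used up. *loops_out gets the passes made.
 */
int toy_lookup(const struct toy_dst *orig, int max_loops, int *loops_out)
{
	int loops = 0;

	assert(max_loops > 0);
	for (;;) {
		if (++loops > max_loops) {
			*loops_out = loops - 1;
			return -EHOSTUNREACH;
		}
		if (toy_bundle_ok(orig)) {
			*loops_out = loops;
			return 0;
		}
		/* Stale: the real code frees the bundle and restarts here. */
	}
}
```

With obsolete=0 the model returns on the first pass. With obsolete=2
nothing inside the loop ever changes the outcome, so without the cap it
would spin forever, which matches the freeze we see.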

People have been testing with the patch below, which seems to fix the
problem partially. They still see connection hangs however (things
only clear up when they start a new ping or new ssh). So the patch
is obviously not sufficient, and something else seems to go wrong.

I'm grateful for any hints you may have...

Olaf
-- 
Subject: [XFRM] Fix infinite loop in xfrm_lookup

It seems that the route xfrm_lookup is given on input can go
away when we sleep.

Signed-off-by: Olaf Kirch [EMAIL PROTECTED]

 net/ipv4/route.c   |   25 -
 net/xfrm/xfrm_policy.c |   16 
 2 files changed, 32 insertions(+), 9 deletions(-)

diff -r df2df438c970 net/ipv4/route.c
--- a/net/ipv4/route.c  Mon Feb  6 14:08:26 2006 -0500
+++ b/net/ipv4/route.c  Mon Feb  6 15:52:09 2006 -0500
@@ -2609,18 +2609,25 @@ int ip_route_output_flow(struct rtable *
 {
 	int err;
 
-	if ((err = __ip_route_output_key(rp, flp)) != 0)
-		return err;
-
-	if (flp->proto) {
-		if (!flp->fl4_src)
-			flp->fl4_src = (*rp)->rt_src;
-		if (!flp->fl4_dst)
-			flp->fl4_dst = (*rp)->rt_dst;
-		return xfrm_lookup((struct dst_entry **)rp, flp, sk, flags);
-	}
-
-	return 0;
+	if (flp->proto == 0) {
+		err = __ip_route_output_key(rp, flp);
+	} else {
+		u32 fl_src = flp->fl4_src, fl_dst = flp->fl4_dst;
+		int repeat = 1;
+
+		do {
+			if ((err = __ip_route_output_key(rp, flp)) != 0)
+				break;
+
+			if (!fl_src)
+				flp->fl4_src = (*rp)->rt_src;
+			if (!fl_dst)
+				flp->fl4_dst = (*rp)->rt_dst;
+			err = xfrm_lookup((struct dst_entry **)rp, flp, sk, flags);
+		} while (err == -EAGAIN && repeat--);
+	}
+
+	return err;
 }
 
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
diff -r df2df438c970 net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c	Mon Feb  6 14:08:26 2006 -0500
+++ b/net/xfrm/xfrm_policy.c	Mon Feb  6 15:52:09 2006 -0500
@@ -786,7 +786,22 @@ int xfrm_lookup(struct dst_entry **dst_p
 	u16 family = dst_orig->ops->family;
 	u8 dir = policy_to_flow_dir(XFRM_POLICY_OUT);
 	u32 sk_sid = security_sk_sid(sk, fl, dir);
+	int loops = 0;
+
 restart:
+	if (loops && dst_orig && dst_orig->obsolete > 0) {
+		printk(KERN_NOTICE "xfrm_lookup: route is stale (obsolete=%d, loops=%d)\n",
+			dst_orig->obsolete, loops);
+		err = -EAGAIN;
+		goto error_nopol;
+	}
+	if (unlikely(++loops > 10)) {
+		printk(KERN_NOTICE "xfrm_lookup bailing out after %d loops\n", loops);
+		dump_stack();
+		err = -EHOSTUNREACH;
+		goto error_nopol;
+	}
+
 	genid = atomic_read(&flow_cache_genid);
 	policy = NULL;
 	if (sk && sk->sk_policy[1])
@@ -854,6 +869,7 @@ restart:
 			}
 			if (nx == -EAGAIN ||
 			    genid != atomic_read(&flow_cache_genid)) {
+				printk(KERN_NOTICE "xfrm_tmpl_resolve says EAGAIN, try again\n");
 				xfrm_pol_put(policy);
 				goto restart;
 			}
@@ -903,8 +919,9 @@ restart:
 	return 0;
 
 error:
+	xfrm_pol_put(policy);
+error_nopol:
 	dst_release(dst_orig);
-	xfrm_pol_put(policy);
 	*dst_p = NULL;
 	return err;
 }
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

Re: Strange IPsec freeze/partial fix

2006-02-07 Thread Herbert Xu
Olaf Kirch [EMAIL PROTECTED] wrote:
 
> People have been testing with the patch below, which seems to fix the
> problem partially. They still see connection hangs however (things
> only clear up when they start a new ping or new ssh). So the patch
> is obviously not sufficient, and something else seems to go wrong.

I suggest that we simply bail out always.  If the dst decides to die
on us later on, the packet will be dropped anyway.  So there is no
great urgency to retry here.  Once we have the proper resolution
queueing, we can then do the retry again.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -890,7 +890,9 @@ restart:
 			xfrm_pol_put(policy);
 			if (dst)
 				dst_free(dst);
-			goto restart;
+
+			err = -EHOSTUNREACH;
+			goto error;
 		}
 		dst->next = policy->bundles;
 		policy->bundles = dst;
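
For illustration only, the bail-out behaviour suggested above can be
sketched in user space roughly as follows. None of these names exist in
the kernel: toy_route, toy_stats, toy_lookup_once and toy_send are
hypothetical, and the sketch assumes that a stale route is the only
failure case. The point is just that the lookup makes a single pass and
the caller drops the packet on error instead of retrying.

```c
#include <errno.h>

/* Hypothetical route entry; obsolete > 0 means it left the routing table. */
struct toy_route {
	int obsolete;
};

/* Hypothetical counters so the effect on the caller is visible. */
struct toy_stats {
	int sent;
	int dropped;
};

/* Single pass: report a stale route as an error rather than restarting. */
int toy_lookup_once(const struct toy_route *rt)
{
	if (rt->obsolete > 0)
		return -EHOSTUNREACH;
	return 0;
}

/* The caller drops the packet on error; a later packet triggers a fresh lookup. */
int toy_send(const struct toy_route *rt, struct toy_stats *st)
{
	int err = toy_lookup_once(rt);

	if (err) {
		st->dropped++;
		return err;
	}
	st->sent++;
	return 0;
}
```

A dropped packet is the same outcome we would get if the dst died right
after a successful lookup, so nothing is lost by not retrying in place.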