On Apr 23, 2009, at 11:06 PM, Liang Zhen wrote:
Hi there,
I've posted this in rhel5-list, but I'm not sure whether it's the right place, so I'm posting it here again...
I'm not sure which rhel5-list you are referring to, but I'm certain I'm not on it, and I'm certain that it's not one of our SLA assured support mechanisms.
We got this assertion while running the in-kernel OFED of RHEL5.3:

Apr 15 08:06:24 cl8-0 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: Call Trace:
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff802288ee>] fib_rules_event+0x3d/0xff
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80066f1f>] notifier_call_chain+0x20/0x32
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8021ba64>] dev_set_mtu+0x5a/0x60
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff88446bb5>] :ib_ipoib:set_mode+0x94/0x134
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80106b0a>] sysfs_write_file+0xb9/0xe8
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8001659e>] vfs_write+0xce/0x174
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80016e6b>] sys_write+0x45/0x6e
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: RTNL: assertion failed at net/ipv4/devinet.c (986)
Apr 15 08:06:24 cl8-0 kernel:
Apr 15 08:06:24 cl8-0 kernel: Call Trace:
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8024e80c>] inetdev_event+0x48/0x282
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80066f1f>] notifier_call_chain+0x20/0x32
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8021ba64>] dev_set_mtu+0x5a/0x60
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff88446bb5>] :ib_ipoib:set_mode+0x94/0x134
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80106b0a>] sysfs_write_file+0xb9/0xe8
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8001659e>] vfs_write+0xce/0x174
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff80016e6b>] sys_write+0x45/0x6e
Apr 15 08:06:24 cl8-0 kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
Apr 15 08:06:24 cl8-0 kernel:

When looking into the code I found this call path:

sysfs_write_file()->flush_write_buffer()->store()->ipoib_cm.c::set_mode()->dev_set_mtu()->raw_notifier_call_chain->notifier_call_chain()->fib_rules_event()->ASSERT_RTNL().
So ipoib_cm calls dev_set_mtu without taking rtnl_lock, but dev_set_mtu asserts that the caller already holds rtnl_lock. I think we may need this patch; could somebody confirm this?

Thanks
Liang

--- drivers/infiniband/ulp/ipoib/ipoib_cm.c 2009-04-16 12:49:04.000000000 -0400
+++ drivers/infiniband/ulp/ipoib/ipoib_cm.c 2009-04-16 12:48:52.000000000 -0400
@@ -1481,7 +1481,9 @@ static ssize_t set_mode(struct class_dev
 		if (ipoib_cm_max_mtu(dev) > priv->mcast_mtu)
 			ipoib_warn(priv, "mtu > %d will cause multicast packet drops.\n", priv->mcast_mtu);
+		rtnl_lock();
 		dev_set_mtu(dev, ipoib_cm_max_mtu(dev));
+		rtnl_unlock();
 		ipoib_flush_paths(dev);
 		return count;
No, you don't want this patch. The InfiniBand core in OFED 1.3.2 (used in rhel5.3) is not ready for this patch. There are additional changes needed to the core code to deal with handling work queue flushes and deciding whether or not to process events during those work queue flushes, depending on the code path we took to get to that point. Without those additional infrastructure changes, the change to use dev_set_mtu and take the rtnl_lock resulted in lockups during attempts to ifdown interfaces (either ones in connected mode or unconnected mode, can't remember which, but one way worked and the other was lockup city). We reverted this patch due to those lockups.

Instead, we only support setting connected mode and setting the device mtu as part of the bringup of the interface (i.e., ifup ib0 when you've added CONNECTED_MODE=yes and MTU=65520 to /etc/sysconfig/network-scripts/ifcfg-ib0). Under those conditions, the kernel works fine and does not present a risk.
--
Doug Ledford <[email protected]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband
