[ewg] [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock
From: Michael Ellerman [EMAIL PROTECTED]

shca_list_lock is taken from softirq context in ehca_poll_eqs, so we need to
lock IRQ safe elsewhere.

Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
Acked-by: Joachim Fenkes [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_main.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index bb02a86..021c454 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -717,6 +717,7 @@ static int __devinit ehca_probe(struct of_device *dev,
 	const u64 *handle;
 	struct ib_pd *ibpd;
 	int ret, i, eq_size;
+	u64 flags;
 
 	handle = of_get_property(dev->node, "ibm,hca-handle", NULL);
 	if (!handle) {
@@ -830,9 +831,9 @@ static int __devinit ehca_probe(struct of_device *dev,
 		ehca_err(&shca->ib_device,
 			 "Cannot create device attributes ret=%d", ret);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_add(&shca->shca_list, &shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return 0;
 
@@ -878,6 +879,7 @@ probe1:
 static int __devexit ehca_remove(struct of_device *dev)
 {
 	struct ehca_shca *shca = dev->dev.driver_data;
+	u64 flags;
 	int ret;
 
 	sysfs_remove_group(&dev->dev.kobj, &ehca_dev_attr_grp);
@@ -915,9 +917,9 @@ static int __devexit ehca_remove(struct of_device *dev)
 
 	ib_dealloc_device(&shca->ib_device);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_del(&shca->shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return ret;
 }
@@ -975,6 +977,7 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 			     unsigned long action, void *data)
 {
 	static unsigned long ehca_dmem_warn_time;
+	unsigned long flags;
 
 	switch (action) {
 	case MEM_CANCEL_OFFLINE:
@@ -985,12 +988,12 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
 		/* only ok if no hca is attached to the lpar */
-		spin_lock(&shca_list_lock);
+		spin_lock_irqsave(&shca_list_lock, flags);
 		if (list_empty(&shca_list)) {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			return NOTIFY_OK;
 		} else {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			if (printk_timed_ratelimit(&ehca_dmem_warn_time,
 						   30 * 1000))
 				ehca_gen_err("DMEM operations are not allowed"
-- 
1.5.5

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] Re: [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock
On Fri, 2008-11-21 at 16:37 +0100, Joachim Fenkes wrote:

> +	u64 flags;

> -	spin_lock(&shca_list_lock);
> +	spin_lock_irqsave(&shca_list_lock, flags);

That's wrong and I think will give a warning on all machines where u64 != unsigned long. Might not particularly matter in this case.

Also, generally it seems wrong to say "fix lockdep failure" when the patch really fixes a bug that lockdep happened to find.

johannes
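To illustrate the review comment above, here is a minimal userspace sketch of why the flags variable handed to spin_lock_irqsave() should be declared unsigned long. It assumes GCC or Clang, since statement expressions and __typeof__ are GNU extensions. The typecheck() macro follows the same idea as the kernel helper of that name; mock_irqsave(), mock_irqrestore(), mock_irq_state and add_shca() are hypothetical stand-ins invented for this example, not kernel or ehca code.

```c
/*
 * Userspace sketch only. Nothing here is real kernel or ehca code:
 * mock_irqsave()/mock_irqrestore() are hypothetical stand-ins for
 * spin_lock_irqsave()/spin_unlock_irqrestore().
 */

/*
 * Same idea as the kernel's typecheck(): comparing the addresses of two
 * variables makes the compiler warn when their types differ.
 */
#define typecheck(type, x) \
	({ type tc_expected; __typeof__(x) tc_actual; \
	   (void)(&tc_expected == &tc_actual); 1; })

/* 1 = interrupts "enabled", 0 = "disabled" (assumed encoding) */
static unsigned long mock_irq_state = 1;

/* Save the current state into flags, then "disable interrupts". */
#define mock_irqsave(flags) \
	do { \
		(void)typecheck(unsigned long, flags); \
		(flags) = mock_irq_state; \
		mock_irq_state = 0; \
	} while (0)

/* Put back whatever state was saved, rather than blindly enabling. */
#define mock_irqrestore(flags) \
	do { \
		(void)typecheck(unsigned long, flags); \
		mock_irq_state = (flags); \
	} while (0)

static int shca_count;

/* Returns 1 if "interrupts" were off while the list was modified. */
static int add_shca(void)
{
	/* Declaring this as a 64-bit type that is not unsigned long
	 * would trigger a "distinct pointer types" warning here. */
	unsigned long flags;
	int irqs_were_off;

	mock_irqsave(flags);
	irqs_were_off = (mock_irq_state == 0);
	shca_count++;
	mock_irqrestore(flags);

	return irqs_were_off;
}
```

Calling add_shca() with mock_irq_state set to 1 leaves it at 1 afterwards, and calling it with mock_irq_state already 0 leaves it at 0. That is the property the irqsave/irqrestore pair provides over an unconditional disable/enable.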
[ewg] Re: [PATCH] IB/ehca: Fix locking for shca_list_lock
Looks good... I'll add this for 2.6.29, since as far as I can tell this bug has been there approximately forever already.
[ewg] trouble getting NFS/RDMA modules to load
I've been trying to get NFS/RDMA set up and working on a Linux client, but have been hitting some roadblocks. I'm using a 2.6.26 kernel, and have gotten the NFS client working over TCP/Ethernet and over TCP/IB, but can't get the RDMA modules to load. The errors are the same as those listed in an old posting (shown below), except that svcrdma is the module that fails to load:

svcrdma: disagrees about version of symbol ib_create_cq
svcrdma: Unknown symbol ib_create_cq
...

I get the same errors if I try to load xprtrdma. I searched around for a solution to the problem below but struck out. The posting below is old (OFED 1.2); I'm using OFED 1.4, and see the same thing with rc1 and rc5. Any advice?

Thanks,
John.
[EMAIL PROTECTED]

Old posting shown here:

[openfabrics-ewg] bug 355 - problems building modules that depend on the ofed 1.2 modules
Steve Wise swise at opengridcomputing.com
Thu Feb 15 09:12:06 PST 2007

All,

I've run into the following problem. Bug 335 opened to track this...

I install the alpha1 ofed 1.2 rpms on a RHEL5b2 system with its 2.6.18-1.2747.el5 kernel. Then I build a module outside of the kernel that uses the IB verbs and RDMA CM kernel interface (krping). This module builds and loads ok on a stock 2.6.20 system with ofed1.2 installed, but it fails to load on the rhel5b2 system with a version symbol problem.

Here is a snippet of the errors:

rdma_krping: disagrees about version of symbol ib_create_cq
rdma_krping: Unknown symbol ib_create_cq
rdma_krping: disagrees about version of symbol rdma_resolve_addr
rdma_krping: Unknown symbol rdma_resolve_addr
rdma_krping: disagrees about version of symbol ib_dereg_mr
rdma_krping: Unknown symbol ib_dereg_mr

I'm wondering if maybe the ofed modules are _not_ being built with src versioning even if the kernel has it turned on?

We see similar problems with NFS-RDMA trying to use OFED 1.2 modules. And the NFS-RDMA works with OFED 1.1 modules, so I _think_ something is whacked with the OFED 1.2 build process.
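As background for the messages quoted above: with symbol versioning enabled, each exported symbol carries a checksum, and a module is refused when the checksum it was built against differs from the one the providing code exports. The following is a toy userspace model of that comparison, not the kernel's loader. The struct layout, the check_version() helper and all CRC values are hypothetical and chosen purely for illustration.

```c
/*
 * Toy model of a symbol-version check. This is NOT kernel code; the
 * table layout, function name and CRC values are made up.
 */
#include <stdio.h>
#include <string.h>

struct sym_version {
	const char *name;
	unsigned long crc;	/* checksum of the symbol's interface */
};

/* What the running "kernel" exports (hypothetical CRCs). */
static const struct sym_version kernel_exports[] = {
	{ "ib_create_cq", 0x1111aaaaUL },
	{ "ib_dereg_mr",  0x2222bbbbUL },
};

/*
 * Returns 1 if the module's expected CRC matches the export,
 * 0 on a CRC mismatch, and -1 if the symbol is not exported at all.
 */
static int check_version(const char *mod, const struct sym_version *want)
{
	size_t i;
	size_t n = sizeof(kernel_exports) / sizeof(kernel_exports[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(kernel_exports[i].name, want->name) != 0)
			continue;
		if (kernel_exports[i].crc == want->crc)
			return 1;
		printf("%s: disagrees about version of symbol %s\n",
		       mod, want->name);
		return 0;
	}
	printf("%s: Unknown symbol %s\n", mod, want->name);
	return -1;
}
```

In this model, a module built against headers that differ from those used for the installed IB stack would carry a different CRC for ib_create_cq and fail the check, even though a symbol of that name exists. That would be consistent with the errors reported here if the modules were built against a different set of headers, though the thread does not confirm the cause.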
Re: [ewg] trouble getting NFS/RDMA modules to load
Hi John. I'm the NFS/RDMA maintainer. You are correct that 2.6.26 has some issues that I am working on. Would it be possible for you to use a 2.6.27 kernel? Thanks.

Jeff Becker

John Fitzgerald wrote:

> I've been trying to get NFS/RDMA setup and working on a Linux client, but
> have been hitting some roadblocks. I'm using a 2.6.26 kernel, and have
> gotten NFS on setup as a client working over TCP/Ethernet, and working
> TCP/IB, but can't get RDMA modules to load. The errors are the same as
> listed in an old posting (shown below) except with svcrdma the module not
> loading:
>
> svcrdma: disagrees about version of symbol ib_create_cq
> svcrdma: Unknown symbol ib_create_cq
> ...
>
> I get the same errors if I try to load xprtrdma. I searched around for a
> solution to the problem below but struck out. The posting below is old
> (OFED 1.2), I'm using OFED 1.4, and see the same thing with rc1 and rc5.
> Any advice?
[ewg] Panel at SC08/Austin
At the session I raised points about missing documentation and was asked to summarize my ideas and write them to this list. Specifically, I would like to see:

a) a PDF binder of all man pages and docs already available in the distribution, on the web site
b) a howto for getting started with OFED (example: a colleague of mine had no idea that he needs a running opensm ...)
c) for each special feature like ipoib, sdp, opensm...: one or two pages describing WHAT the technology wants to achieve, plus some examples of how it is used and how to enable/configure it
d) for technologies like VERB/DAPL/...: one or two pages describing WHAT the technology wants to achieve, plus some examples of how it is used and a few simple examples of how to program with the libraries (at the level of an MPI introduction)

best regards
Michael

Michael Hebenstreit
Senior Cluster Architect
Intel Corporation
Software and Services Group/DRD
2800 N Center Dr, DP3-307
WA 98327, DuPont
UNITED STATES
Tel.: +1 253 371 3144
E-mail: [EMAIL PROTECTED]