[ewg] [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock

2008-11-21 Thread Joachim Fenkes
From: Michael Ellerman [EMAIL PROTECTED]

shca_list_lock is taken from softirq context in ehca_poll_eqs, so every
other acquisition must use the IRQ-safe lock variants.

Signed-off-by: Michael Ellerman [EMAIL PROTECTED]
Acked-by: Joachim Fenkes [EMAIL PROTECTED]
---
 drivers/infiniband/hw/ehca/ehca_main.c |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index bb02a86..021c454 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -717,6 +717,7 @@ static int __devinit ehca_probe(struct of_device *dev,
 	const u64 *handle;
 	struct ib_pd *ibpd;
 	int ret, i, eq_size;
+	u64 flags;
 
 	handle = of_get_property(dev->node, "ibm,hca-handle", NULL);
 	if (!handle) {
@@ -830,9 +831,9 @@ static int __devinit ehca_probe(struct of_device *dev,
 		ehca_err(&shca->ib_device,
 			 "Cannot create device attributes  ret=%d", ret);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_add(&shca->shca_list, &shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return 0;
 
@@ -878,6 +879,7 @@ probe1:
 static int __devexit ehca_remove(struct of_device *dev)
 {
 	struct ehca_shca *shca = dev->dev.driver_data;
+	u64 flags;
 	int ret;
 
 	sysfs_remove_group(&dev->dev.kobj, &ehca_dev_attr_grp);
@@ -915,9 +917,9 @@ static int __devexit ehca_remove(struct of_device *dev)
 
 	ib_dealloc_device(&shca->ib_device);
 
-	spin_lock(&shca_list_lock);
+	spin_lock_irqsave(&shca_list_lock, flags);
 	list_del(&shca->shca_list);
-	spin_unlock(&shca_list_lock);
+	spin_unlock_irqrestore(&shca_list_lock, flags);
 
 	return ret;
 }
@@ -975,6 +977,7 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 			     unsigned long action, void *data)
 {
 	static unsigned long ehca_dmem_warn_time;
+	unsigned long flags;
 
 	switch (action) {
 	case MEM_CANCEL_OFFLINE:
@@ -985,12 +988,12 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 	case MEM_GOING_ONLINE:
 	case MEM_GOING_OFFLINE:
 		/* only ok if no hca is attached to the lpar */
-		spin_lock(&shca_list_lock);
+		spin_lock_irqsave(&shca_list_lock, flags);
 		if (list_empty(&shca_list)) {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			return NOTIFY_OK;
 		} else {
-			spin_unlock(&shca_list_lock);
+			spin_unlock_irqrestore(&shca_list_lock, flags);
 			if (printk_timed_ratelimit(&ehca_dmem_warn_time,
 						   30 * 1000))
 				ehca_gen_err("DMEM operations are not allowed"
-- 
1.5.5



___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] Re: [PATCH] IB/ehca: Fix lockdep failures for shca_list_lock

2008-11-21 Thread Johannes Berg
On Fri, 2008-11-21 at 16:37 +0100, Joachim Fenkes wrote:

> +	u64 flags;

> -	spin_lock(&shca_list_lock);
> +	spin_lock_irqsave(&shca_list_lock, flags);

That's wrong and I think will give a warning on all machines where
u64 != unsigned long. Might not particularly matter in this case.

Also, generally it seems wrong to say "fix lockdep failure" when the
patch really fixes a bug that lockdep happened to find.

johannes
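
A hedged illustration of the type remark above: spin_lock_irqsave expects
its flags argument to be an unsigned long, and the warning mentioned
presumably comes from a compile-time check of that type. The sketch below
is a userspace stand-in for such a check, not kernel code. FLAGS_TYPE_OK
and the helper functions are hypothetical names, and C11 _Generic is
swapped in for whatever mechanism the kernel itself uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for a compile-time check on the flags
 * argument: evaluates to 1 only when x has exactly the type unsigned long.
 */
#define FLAGS_TYPE_OK(x) _Generic((x), unsigned long: 1, default: 0)

/* A variable declared as "unsigned long flags" always passes. */
static int unsigned_long_flags_ok(void)
{
	unsigned long flags = 0;
	return FLAGS_TYPE_OK(flags);
}

/*
 * unsigned long long is a distinct type from unsigned long, even on
 * machines where both happen to be 64 bits wide, so it never passes.
 */
static int unsigned_long_long_flags_ok(void)
{
	unsigned long long flags = 0;
	return FLAGS_TYPE_OK(flags);
}

/*
 * A fixed-width 64-bit type passes only on ABIs that define it as
 * unsigned long, so the result is platform dependent.
 */
static int uint64_flags_ok(void)
{
	uint64_t flags = 0;
	return FLAGS_TYPE_OK(flags);
}

/* Checks the two platform-independent cases. */
static void flags_type_selfcheck(void)
{
	assert(unsigned_long_flags_ok() == 1);
	assert(unsigned_long_long_flags_ok() == 0);
}
```

Calling flags_type_selfcheck() exercises the two cases that do not depend
on the platform. uint64_flags_ok() is left unchecked on purpose, because
its result is exactly the "u64 != unsigned long" case: it differs from
one machine to the next.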



[ewg] Re: [PATCH] IB/ehca: Fix locking for shca_list_lock

2008-11-21 Thread Roland Dreier
Looks good... I'll add this for 2.6.29, since as far as I can tell this
bug has been there approximately forever already.
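
The bug discussed in this thread can be modeled with a toy single-CPU
state machine. This is only a sketch under simplifying assumptions: every
name in it is hypothetical, nothing is real kernel API, and softirq and
hardirq context are collapsed into a single "handler". It shows why a lock
that a handler also takes must be acquired with interrupts disabled
everywhere else.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names are hypothetical. */
struct toy_cpu {
	bool irqs_enabled;	/* can a handler preempt the running code? */
	bool lock_held;		/* is the toy spinlock held on this CPU? */
};

/* Models a plain lock: takes the lock, leaves interrupts untouched. */
static void toy_lock(struct toy_cpu *cpu)
{
	cpu->lock_held = true;
}

static void toy_unlock(struct toy_cpu *cpu)
{
	assert(cpu->lock_held);	/* unlocking an unheld lock is a bug */
	cpu->lock_held = false;
}

/*
 * Models the IRQ-safe variant: remembers the interrupt state, disables
 * interrupts, then takes the lock.
 */
static unsigned long toy_lock_irqsave(struct toy_cpu *cpu)
{
	unsigned long flags = cpu->irqs_enabled ? 1UL : 0UL;

	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	return flags;
}

/* Releases the lock and restores the remembered interrupt state. */
static void toy_unlock_irqrestore(struct toy_cpu *cpu, unsigned long flags)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
	cpu->irqs_enabled = (flags != 0);
}

/*
 * Would a handler that needs the same lock deadlock if it fired now?
 * It can only run while interrupts are enabled, and it would spin
 * forever if the code it interrupted already holds the lock.
 */
static bool toy_handler_would_deadlock(const struct toy_cpu *cpu)
{
	return cpu->irqs_enabled && cpu->lock_held;
}
```

With the plain lock, toy_handler_would_deadlock() is true for as long as
the lock is held. With the irqsave variant it is false, because the
handler cannot run until the lock has been dropped and interrupts are
restored.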


[ewg] trouble getting NFS/RDMA modules to load

2008-11-21 Thread John Fitzgerald
I've been trying to get NFS/RDMA set up and working on a Linux client, 
but have been hitting some roadblocks.
I'm using a 2.6.26 kernel and have NFS set up as a client, working over 
TCP/Ethernet and over TCP/IB, but I can't get the RDMA modules to load.


The errors are the same as those listed in an old posting (shown below), 
except that the module failing to load is svcrdma:

svcrdma: disagrees about version of symbol ib_create_cq
svcrdma: Unknown symbol ib_create_cq
...

I get the same errors if I try to load xprtrdma.  I searched around for 
a solution to the problem below but struck out.  The posting below is 
old (OFED 1.2), I'm using OFED 1.4, and see the same thing with rc1 and rc5.


Any advice?

Thanks,
John.
[EMAIL PROTECTED]


Old posting shown here:

[openfabrics-ewg] bug 355 - problems building modules that depend on the 
ofed 1.2 modules

Steve Wise swise at opengridcomputing.com
Thu Feb 15 09:12:06 PST 2007


All,

I've run into the following problem.  Bug 335 opened to track this...

I install the alpha1 ofed 1.2 rpms on a RHEL5b2 system with its
2.6.18-1.2747.el5 kernel.

Then I build a module outside of the kernel that uses the IB verbs and
RDMA CM kernel interface.  (krping).  This module builds and loads ok on
a stock 2.6.20 system with ofed1.2 installed, but it fails to load on
the rhel5b2 system with a version symbol problem.  Here is a snippet of
the errors:

rdma_krping: disagrees about version of symbol ib_create_cq
rdma_krping: Unknown symbol ib_create_cq
rdma_krping: disagrees about version of symbol rdma_resolve_addr
rdma_krping: Unknown symbol rdma_resolve_addr
rdma_krping: disagrees about version of symbol ib_dereg_mr
rdma_krping: Unknown symbol ib_dereg_mr

I'm wondering if maybe the ofed modules are _not_ being built with src
versioning even if the kernel has it turned on?

We see similar problems with NFS-RDMA trying to use OFED 1.2 modules.
And the NFS-RDMA works with OFED 1.1 modules, so I _think_ something is
whacked with the OFED 1.2 build process.
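
For readers unfamiliar with these messages: on a kernel built with symbol
versioning, each exported symbol carries a checksum derived from its
prototype, and a module records the checksum it was compiled against.
"disagrees about version of symbol" means the two checksums differ. The
sketch below is a simplified userspace model of that comparison, not the
kernel's loader code. All names are hypothetical and the checksum values
used with it are made up.

```c
#include <assert.h>
#include <string.h>

/* Simplified record: a symbol name plus the checksum of its prototype. */
struct toy_symver {
	const char *name;
	unsigned long crc;
};

enum toy_result {
	TOY_OK,			/* symbol found and checksums agree */
	TOY_VERSION_MISMATCH,	/* found, but checksums differ */
	TOY_UNKNOWN_SYMBOL	/* nobody exports the symbol */
};

/*
 * Looks up the symbol a module wants in the table of exported symbols
 * and compares the checksum the module was built against with the
 * checksum of the export. In this model a mismatch is reported on its
 * own; the logs quoted above show an "Unknown symbol" line following
 * each mismatch as well.
 */
static enum toy_result toy_check_symbol(const struct toy_symver *exported,
					size_t n_exported,
					const struct toy_symver *wanted)
{
	size_t i;

	assert(wanted != NULL && wanted->name != NULL);

	for (i = 0; i < n_exported; i++) {
		if (strcmp(exported[i].name, wanted->name) != 0)
			continue;
		return exported[i].crc == wanted->crc ?
			TOY_OK : TOY_VERSION_MISMATCH;
	}
	return TOY_UNKNOWN_SYMBOL;
}
```

In this model, a module built against headers that differ from the ones
used to build the loaded IB modules ends up with a different checksum for
ib_create_cq and gets TOY_VERSION_MISMATCH, even though the symbol exists
under the same name.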




Re: [ewg] trouble getting NFS/RDMA modules to load

2008-11-21 Thread Jeff Becker
Hi John. I'm the NFS/RDMA maintainer. You are correct that 2.6.26 has
some issues that I am working on. Would it be possible for you to use a
2.6.27 kernel? Thanks.

Jeff Becker



[ewg] Panel at SC08/Austin

2008-11-21 Thread Hebenstreit, Michael
At the session I raised points about missing documentation and was asked to 
summarize my ideas and send them to this list. Specifically, I would like to 
see:

a) a PDF binder of all man pages/docs already available in the distribution, 
posted on the web site
b) a how-to for getting started with OFED (example: a colleague of mine had no 
idea that he needs a running opensm ...)
c) for each special feature like ipoib, sdp, opensm...: one or two pages 
describing WHAT the technology wants to achieve, plus some examples of how it 
is used and how to enable/configure it
d) on technologies like VERBS/DAPL/...: one or two pages describing WHAT the 
technology wants to achieve, plus some examples of how it is used and a few 
simple examples of how to program with the libraries (at the level of an MPI 
introduction)

best regards
Michael



Michael Hebenstreit
Senior Cluster Architect
Intel Corporation, Software and Services Group/DRD
2800 N Center Dr, DP3-307
DuPont, WA 98327, UNITED STATES
Tel.: +1 253 371 3144
E-mail: [EMAIL PROTECTED]


