Issue with RDMA_CM on systems with multiple IB HCAs.

2010-07-21 Thread Hari Subramoni
Hi All,

I'm trying to run the 'ib_rdma_bw' test (part of the perftest suite) on a
cluster with two IB ConnectX DDR HCAs. The OFED version I'm using is
1.5.1. OpenSM is running on the network and the ports are up and active.
I see that whenever I use RDMA_CM to establish connections, the program
quits with the error given below (the test runs fine if we don't use
RDMA_CM).

I recall seeing a post mentioning some issues with RDMA_CM on systems
with multiple HCAs. I was wondering whether this has been resolved with
the latest OFED.

Any input on this issue would be greatly appreciated. Also, it would be
great if anyone could point me to any open bugs on this issue so that I
can track its status.

[subra...@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 |
duplex=0 | cma=1 |
11928: Local address:  LID , QPN 00, PSN 0x5bfbba RKey 0x90042602
VAddr 0x002b27feabe000
11928: Remote address: LID , QPN 00, PSN 0x392fe6, RKey 0xf8042605
VAddr 0x002b9d5c93b000

11928:pp_send_start: bad wc status 12
11928:main: Completion with error at client:
11928:main: Failed status 5: wr_id 3
11928:main: scnt=100, ccnt=0

Thanks in advance,
Hari.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

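For readers decoding the numeric statuses in the output above, here is a
minimal sketch (not taken from perftest) that maps work-completion status
numbers to their libibverbs names. The table copies the first fourteen
values of enum ibv_wc_status so that it builds without the verbs headers;
wc_status_name() is a hypothetical helper name. Real code would more
likely call ibv_wc_status_str() from libibverbs.

```c
#include <stddef.h>

/* Values 0..13 in the order libibverbs lists them in enum ibv_wc_status. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",           /* 0 */
    "IBV_WC_LOC_LEN_ERR",       /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",     /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",    /* 3 */
    "IBV_WC_LOC_PROT_ERR",      /* 4 */
    "IBV_WC_WR_FLUSH_ERR",      /* 5 */
    "IBV_WC_MW_BIND_ERR",       /* 6 */
    "IBV_WC_BAD_RESP_ERR",      /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",    /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",   /* 9 */
    "IBV_WC_REM_ACCESS_ERR",    /* 10 */
    "IBV_WC_REM_OP_ERR",        /* 11 */
    "IBV_WC_RETRY_EXC_ERR",     /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR", /* 13 */
};

/* Hypothetical helper: return the symbolic name, or "unknown" if out of range. */
const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return wc_status_names[status];
}
```

Read this way, "bad wc status 12" is a transport retry-exceeded error and
"Failed status 5" is a flushed work request, which would be consistent with
the peer never answering on the path that was chosen.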

[PATCH] rdma/ib_cm: check LAP state before sending an MRA

2010-07-21 Thread Hefty, Sean
This problem was originally reported by Arthur Kepner:

We have a customer who has repeatedly had system panics with 
the following signature:

Unable to handle kernel NULL pointer dereference at 0010 RIP:
{:ib_cm:ib_cm_init_qp_attr+580}
PGD 3a2db6067 PUD 0
Oops:  [1] SMP
last sysfs file: /class/infiniband/mlx4_0/node_guid
CPU 4
Modules linked in: i2c_dev sg sd_mod crc32c libcrc32c iscsi_tcp libiscsi
scsi_transport_iscsi rdma_ucm rdma_cm
iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_cxgb3 cxgb3
firmware_class mlx4_ib ib_mthca ib_mad
 ib_core loop numatools xpmem worm mlx4_core libata i2c_i801 scsi_mod i2c_core
shpchp pci_hotplug nfs lockd nfs
_acl af_packet sunrpc e1000
Pid: 3256, comm: star Tainted: G U 2.6.16.60-0.34-smp #1
RIP: 0010:[]
{:ib_cm:ib_cm_init_qp_attr+580}
RSP: 0018:810369d09d38  EFLAGS: 00010046
RAX:  RBX: 810419678c00 RCX: 0008
RDX: 0246 RSI: 810419678d18 RDI: 810369d09e70
RBP: 810369d09e18 R08: 0003003d R09: 
R10: 810369d09e18 R11: 0088 R12: 810369d09d88
R13:  R14: 810419678c80 R15: 403500b0
FS:  40354940(0063) GS:810420ffbbc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0010 CR3: 00039f0c4000 CR4: 06e0
Process star (pid: 3256, threadinfo 810369d08000, task 8103b81b5830)
Stack: 810419678a00 810369d09d88 810369d09e18 810369d09e18
   40143430 882fb6d5 810376261540 81040bea4740
   810376261540 88309285
Call Trace: {:rdma_cm:rdma_init_qp_attr+209}
   {:rdma_ucm:ucma_init_qp_attr+160}
   {thread_return+0}
{:rdma_ucm:ucma_write+115}
   {vfs_write+215} {sys_write+69}
  {system_call+126}

Code: 8a 40 10 88 85 85 00 00 00 8b 83 38 01 00 00 66 89 45 7a 8a
RIP {:ib_cm:ib_cm_init_qp_attr+580} RSP 


From a crash dump, I determined that we died in cm_init_qp_rts_attr() 
(it's inline, so it doesn't show up in the traceback) on the line 
labeled below:

static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
                               struct ib_qp_attr *qp_attr,
                               int *qp_attr_mask)
{

        if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
                .
        } else {
                *qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
                qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; <-die


A similar problem was reported by Josh England.

The problem is that the rdma_cm can call ib_send_cm_mra() after a
connection has been established.  The ib_cm incorrectly assumes that the
MRA is in response to a LAP (load alternate path) message, even though no
LAP message has been received.  The ib_cm needs to check the lap_state
before sending an MRA if the cm_id state is established.

Signed-off-by: Sean Hefty 
---
Josh or Arthur, can either of you confirm if this patch fixes the crashes that
you've seen?

 drivers/infiniband/core/cm.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index ad63b79..64e0903 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2409,10 +2409,12 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
msg_response = CM_MSG_RESPONSE_REP;
break;
case IB_CM_ESTABLISHED:
-   cm_state = cm_id->state;
-   lap_state = IB_CM_MRA_LAP_SENT;
-   msg_response = CM_MSG_RESPONSE_OTHER;
-   break;
+   if (cm_id->lap_state == IB_CM_LAP_RCVD) {
+   cm_state = cm_id->state;
+   lap_state = IB_CM_MRA_LAP_SENT;
+   msg_response = CM_MSG_RESPONSE_OTHER;
+   break;
+   }
default:
ret = -EINVAL;
goto error1;



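The effect of the patch above can be modelled outside the kernel. This is
only a sketch under simplifying assumptions: the enums, struct cm_id_model
and model_send_mra() are hypothetical stand-ins, not the ib_cm definitions,
and the REQ/REP cases are reduced to "accepted".

```c
#include <errno.h>

/* Hypothetical, reduced stand-ins for the ib_cm connection and LAP states. */
enum cm_state  { CM_REQ_RCVD, CM_REP_RCVD, CM_ESTABLISHED, CM_IDLE };
enum lap_state { LAP_UNINIT, LAP_RCVD, MRA_LAP_SENT };

struct cm_id_model {
    enum cm_state  state;
    enum lap_state lap_state;
};

/* Model of the patched switch: an MRA on an established connection is only
 * accepted when a LAP has actually been received. */
int model_send_mra(struct cm_id_model *id)
{
    switch (id->state) {
    case CM_REQ_RCVD:
    case CM_REP_RCVD:
        return 0;                       /* MRA answers a REQ or REP */
    case CM_ESTABLISHED:
        if (id->lap_state == LAP_RCVD) {
            id->lap_state = MRA_LAP_SENT;
            return 0;
        }
        /* fall through */
    default:
        return -EINVAL;                 /* lap_state is left untouched */
    }
}
```

With this shape, a late MRA on an established connection that never saw a
LAP fails with -EINVAL instead of silently changing lap_state.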

[PATCH] IB/qib: set cfgctxts to number of CPUs by default

2010-07-21 Thread Ralph Campbell
Up to now, we have set the number of available user contexts based on
the number of hardware contexts which is set according to the number
of available CPUs. This was fine since most CPUs had a power of two
number of cores and the chip supported 4, 8, or 16 user contexts.
Now that some systems have 12 cores, the default isn't optimal and
should be set to 12 even though 16 hardware contexts need to be enabled.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |2 +-
 drivers/infiniband/hw/qib/qib_init.c|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c 
b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..9031cd8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5864,7 +5864,7 @@ static void write_7322_initregs(struct qib_devdata *dd)
 * Doesn't clear any of the error bits that might be set.
 */
val = TIDFLOW_ERRBITS; /* these are W1C */
-   for (i = 0; i < dd->ctxtcnt; i++) {
+   for (i = 0; i < dd->cfgctxts; i++) {
int flow;
for (flow = 0; flow < NUM_TIDFLOWS_CTXT; flow++)
qib_write_ureg(dd, ur_rcvflowtable+flow, val, i);
diff --git a/drivers/infiniband/hw/qib/qib_init.c 
b/drivers/infiniband/hw/qib/qib_init.c
index a873dd5..f1d16d3 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -93,7 +93,7 @@ unsigned long *qib_cpulist;
 void qib_set_ctxtcnt(struct qib_devdata *dd)
 {
if (!qib_cfgctxts)
-   dd->cfgctxts = dd->ctxtcnt;
+   dd->cfgctxts = dd->first_user_ctxt + num_online_cpus();
else if (qib_cfgctxts < dd->num_pports)
dd->cfgctxts = dd->ctxtcnt;
else if (qib_cfgctxts <= dd->ctxtcnt)


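The arithmetic behind the new default can be sketched as below.
default_cfgctxts() is a hypothetical helper, and the clamp to the number of
enabled hardware contexts is an assumption added for this sketch; the hunk
shown above does not contain it.

```c
/* Hypothetical helper: kernel contexts come first, then one user context
 * per online CPU. */
unsigned int default_cfgctxts(unsigned int first_user_ctxt,
                              unsigned int online_cpus,
                              unsigned int ctxtcnt)
{
    unsigned int want = first_user_ctxt + online_cpus;

    /* Assumption: never configure more contexts than the hardware enabled. */
    return want > ctxtcnt ? ctxtcnt : want;
}
```

As an illustration with made-up numbers: two kernel contexts, 12 online
CPUs and 18 enabled hardware contexts (2 kernel + 16 user) give 14
configured contexts, that is 12 for users, rather than all 18.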

Re: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Josh England
Upgrading to 1.5.1 is the way to go for me.  I have other dependencies
tying me down to the CentOS kernel for the time being.  Hopefully any
patch to mainstream should apply fairly cleanly to 1.5.1.

-JE

On Wed, Jul 21, 2010 at 1:51 PM, Hefty, Sean  wrote:
>> Timed out connections might be something that can be compensated for
>> in the app.  It is definitely preferable to a kernel panic.  Still,
>> I'll work on making OFED-1.5.1 happy before playing around with
>> removing the ib_send_cm_mra() call.
>
> FYI - the possible issue I'm describing is in the upstream kernel, so it 
> would also be in 1.5.1.
>


Re: [PATCH V3] ib_qib: Allow writes to the diag_counters to be able to clear them

2010-07-21 Thread Ralph Campbell
Acked-by: Ralph Campbell 

On Tue, 2010-07-13 at 18:53 -0700, Ira Weiny wrote:
> From: Ira Weiny 
> Date: Wed, 7 Jul 2010 17:35:34 -0700
> Subject: [PATCH] ib_qib: Allow writes to the diag_counters to be able to 
> clear them
> 
> Changes in V3:
>   Add non-number error check
>   Return "proper" proper length
> 
> Changes in V2:
>   Add check for negative values
>   Return proper length
> 
> Signed-off-by: Ira Weiny 
> ---
>  drivers/infiniband/hw/qib/qib_sysfs.c |   21 -
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c 
> b/drivers/infiniband/hw/qib/qib_sysfs.c
> index dab4d9f..b214eff 100644
> --- a/drivers/infiniband/hw/qib/qib_sysfs.c
> +++ b/drivers/infiniband/hw/qib/qib_sysfs.c
> @@ -347,7 +347,7 @@ static struct kobj_type qib_sl2vl_ktype = {
>  
>  #define QIB_DIAGC_ATTR(N) \
>   static struct qib_diagc_attr qib_diagc_attr_##N = { \
> - .attr = { .name = __stringify(N), .mode = 0444 }, \
> + .attr = { .name = __stringify(N), .mode = 0664 }, \
>   .counter = offsetof(struct qib_ibport, n_##N) \
>   }
>  
> @@ -403,8 +403,27 @@ static ssize_t diagc_attr_show(struct kobject *kobj, 
> struct attribute *attr,
>   return sprintf(buf, "%u\n", *(u32 *)((char *)qibp + dattr->counter));
>  }
>  
> +static ssize_t diagc_attr_store(struct kobject *kobj, struct attribute *attr,
> + const char *buf, size_t size)
> +{
> + struct qib_diagc_attr *dattr =
> + container_of(attr, struct qib_diagc_attr, attr);
> + struct qib_pportdata *ppd =
> + container_of(kobj, struct qib_pportdata, diagc_kobj);
> + struct qib_ibport *qibp = &ppd->ibport_data;
> + char *endp;
> + long val = simple_strtol(buf, &endp, 0);
> +
> + if (val < 0 || endp == buf)
> + return -EINVAL;
> +
> + *(u32 *)((char *)qibp + dattr->counter) = (u32)val;
> + return size;
> +}
> +
>  static const struct sysfs_ops qib_diagc_ops = {
>   .show = diagc_attr_show,
> + .store = diagc_attr_store,
>  };
>  
>  static struct kobj_type qib_diagc_ktype = {



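The validation done by diagc_attr_store() in the quoted patch can be tried
in user space. This is a sketch only: it uses strtol() in place of the
kernel's simple_strtol(), and parse_counter_value() is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of the checks in diagc_attr_store():
 * reject negative values and input that does not start with a number. */
int parse_counter_value(const char *buf, uint32_t *out)
{
    char *endp;
    long val = strtol(buf, &endp, 0);   /* base 0: decimal, 0x hex, 0 octal */

    if (val < 0 || endp == buf)
        return -EINVAL;

    *out = (uint32_t)val;               /* truncates like the (u32) cast */
    return 0;
}
```

Note that, as in the patch, trailing characters after a valid number are
accepted, so "12abc" stores 12; only input with no leading digits at all
is rejected as a non-number.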

RE: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Hefty, Sean
> Timed out connections might be something that can be compensated for
> in the app.  It is definitely preferable to a kernel panic.  Still,
> I'll work on making OFED-1.5.1 happy before playing around with
> removing the ib_send_cm_mra() call.

FYI - the possible issue I'm describing is in the upstream kernel, so it would 
also be in 1.5.1.


Re: more partition questions

2010-07-21 Thread Hal Rosenstock
Hi Tom,

On 7/19/10, Tom Ammon  wrote:
> I'm trying to set up partitions in a little test environment, and I'm
> having trouble.
>
> I have opensm running on a machine attached to the fabric, and sminfo on
> the other machines confirm that this is indeed the master SM. Here's my
> /etc/opensm/partitions.conf:
>
> Default=0x , ipoib : ALL, SELF=full ;
> PartitionBlue=0x8004, ipoib : 0x0002c9030009cb3f=full,
> 0x0002c90200252841=full, 0x0002c90200243471=full ;
> PartitionRed=0x8005, ipoib : 0x0002c90200252841=full,
> 0x0002c90200243591=full, 0x0002c9030009cb2b=full ;

You don't really need the 0x8000 bit on in the pkeys but I don't think
it does any harm.

> But when I go to the machine with port GUID 0x0002c90200243471, it
> doesn't appear that it's getting the pkey I wanted:
>
> [r...@stagnate ~]# ibstat
> CA 'mthca0'
>  CA type: MT23108
>  Number of ports: 2
>  Firmware version: 3.3.5
>  Hardware version: a1
>  Node GUID: 0x0002c90200243470
>  System image GUID: 0x0002c90200243473
>  Port 1:
>  State: Active
>  Physical state: LinkUp
>  Rate: 10
>  Base lid: 10
>  LMC: 0
>  SM lid: 4
>  Capability mask: 0x02510a68
>  Port GUID: 0x0002c90200243471
>  Port 2:
>  State: Down
>  Physical state: Polling
>  Rate: 2
>  Base lid: 0
>  LMC: 0
>  SM lid: 0
>  Capability mask: 0x02510a68
>  Port GUID: 0x0002c90200243472
>
> [r...@stagnate ~]# cat /sys/class/net/ib0/pkey
> 0x

What does:

smpquery pkeys 10 1

say? Do you see the other pkey(s) on that port?

The pkey you are seeing is the only one for the ib0 interface.

If you want to have IPoIB interfaces on the other partitions too, you
need to set this up by creating a child interface on those nodes; you
had asked about that in a previous email
(http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04728.html).

-- Hal

>
> I'm trying to run one ipoib subnet in each partition, and then
> eventually the goal is to have a different server that has 2 child
> interfaces, one on each subnet. But it doesn't appear that my partition
> configuration is even correct. Is there a syntax error, or something
> else I am missing?
>
> Thanks,
>
> Tom
>
>
>
> --
> Tom Ammon
> Network Engineer
> Office: 801.587.0976
> Mobile: 801.674.9273
>
> Center for High Performance Computing
> University of Utah
> http://www.chpc.utah.edu
>

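Hal's remark about the 0x8000 bit in the message above refers to the layout
of a 16-bit P_Key: the top bit is the membership bit (set for full
members) and the low 15 bits identify the partition. A small sketch of
that split follows; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define PKEY_MEMBERSHIP_BIT 0x8000u  /* set: full member, clear: limited */
#define PKEY_BASE_MASK      0x7fffu  /* low 15 bits name the partition */

/* Partition number without the membership bit. */
uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

/* Non-zero if the membership bit marks a full member. */
int pkey_is_full_member(uint16_t pkey)
{
    return (pkey & PKEY_MEMBERSHIP_BIT) != 0;
}

/* Same partition, with the full-membership bit set. */
uint16_t pkey_with_full_membership(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_MEMBERSHIP_BIT);
}
```

Under this reading, 0x8004 and 0x0004 in partitions.conf name the same
partition, which matches the comment that the extra bit does no harm.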

Re: [PATCH] ib/ehca: init irq tasklet before irq can happen

2010-07-21 Thread Roland Dreier
thanks, applied.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH] ib/ehca: Catch failing ioremap()

2010-07-21 Thread Roland Dreier
thanks, applied.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH 2/7] IB/qib: allow PSM to select from multiple port assignment algorithms

2010-07-21 Thread Roland Dreier
thanks, applied
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH 04/18] ib/qib: use generic_file_llseek

2010-07-21 Thread Roland Dreier
thanks, applied
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH V2] ib_qib: Allow writes to the diag_counters to be able to clear them

2010-07-21 Thread Roland Dreier
So Ralph, I'm relying on you to decide if this makes sense...
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Josh England
On Wed, Jul 21, 2010 at 11:13 AM, Hefty, Sean  wrote:
> If this is what is happening, then removing the call to ib_send_cm_mra() from 
> cma_req_handler() should eliminate the crash.  The drawback is that 
> connections may begin timing out under load, but it may be worth trying.  
> I'll see if I can come up with a better fix.

Timed out connections might be something that can be compensated for
in the app.  It is definitely preferable to a kernel panic.  Still,
I'll work on making OFED-1.5.1 happy before playing around with
removing the ib_send_cm_mra() call.

-JE


Re: [PATCH] change thread-unsafe readdir to thread-safe readdir_r calls

2010-07-21 Thread Steven Dake

On 07/21/2010 11:06 AM, Roland Dreier wrote:
>  > +    buf = alloca(offsetof(struct dirent, d_name) + NAME_MAX + 1);
>  > +    while (readdir_r(class_dir, buf, &dent) == 0 && dent) {
>
> So after thinking this over, I don't think I'm going to apply this
> patch.  I think the right fix is for corosync to allow readdir() -- in
> general pushing people to safer APIs is probably a good thing, but I
> think in this particular case it doesn't make sense.  In fact it seems
> that readdir_r() is actually worse since the "buf" parameter does not
> have a well-defined required size, while readdir() is actually safe in
> most uses.
>
>  - R.


thanks for considering

regards
-steve


RE: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Hefty, Sean
>  [] :rdma_cm:rdma_init_qp_attr+0xed/0x13f
>  [] :rdma_ucm:ucma_init_qp_attr+0x97/0xe4
>  [] default_wake_function+0x0/0xe
>  [] default_wake_function+0x0/0xe
>  [] shmem_file_write+0x23f/0x251
>  [] :rdma_ucm:ucma_write+0x73/0x91
>  [] vfs_write+0xce/0x174
>  [] sys_write+0x45/0x6e
>  [] system_call+0x7e/0x83

Here's a guess at what may be happening:

The rdma_cm receives a REQ, in cma_req_handler() we have:

ret = conn_id->id.event_handler(&conn_id->id, &event);
if (!ret) {
/*
 * Acquire mutex to prevent user executing rdma_destroy_id()
 * while we're accessing the cm_id.
 */
mutex_lock(&lock);
if (cma_comp(conn_id, CMA_CONNECT) &&
!cma_is_ud_ps(conn_id->id.ps))
ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);

Note that the call to ib_send_cm_mra() is after the event has been reported to 
the user.  I'm guessing that the connection is getting established before the 
call to ib_send_cm_mra() is invoked.  ib_send_cm_mra() does this:

case IB_CM_ESTABLISHED:
cm_state = cm_id->state;
lap_state = IB_CM_MRA_LAP_SENT;
msg_response = CM_MSG_RESPONSE_OTHER;
break;
...

cm_id->state = cm_state;
cm_id->lap_state = lap_state;
cm_id_priv->service_timeout = service_timeout;

If the cm_id state is established when ib_send_cm_mra() is called, the 
lap_state is changed.  This would result in cm_init_qp_rts_attr() incorrectly 
falling into:

if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
...
} else {
---> here

where the alt_av.port would not be set.

If this is what is happening, then removing the call to ib_send_cm_mra() from 
cma_req_handler() should eliminate the crash.  The drawback is that connections 
may begin timing out under load, but it may be worth trying.  I'll see if I can 
come up with a better fix.

- Sean

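The sequence guessed at in the message above can be replayed with a toy
model. Everything here is a hypothetical simplification (struct conn_model
and both functions are invented for the sketch); it only shows how an MRA
sent after establishment steers the RTS attribute code into the branch
that reads an alternate port which was never set.

```c
#include <stddef.h>

enum lap { LAP_UNINIT, LAP_RCVD, MRA_LAP_SENT };

struct port { int port_num; };

struct conn_model {
    int established;
    enum lap lap_state;
    const struct port *alt_port;  /* NULL unless a LAP supplied an alt path */
};

/* Unpatched behaviour as described: an MRA on an established connection
 * marks the LAP state without checking that a LAP was received. */
void unpatched_send_mra(struct conn_model *c)
{
    if (c->established)
        c->lap_state = MRA_LAP_SENT;
}

/* Returns 1 if building the RTS attributes would read alt_port while NULL. */
int rts_attr_would_deref_null(const struct conn_model *c)
{
    if (c->lap_state == LAP_UNINIT)
        return 0;                   /* primary path only, alt_port unused */
    return c->alt_port == NULL;     /* else branch reads alt_port->port_num */
}
```

A connection that starts with lap_state LAP_UNINIT is safe until
unpatched_send_mra() runs; afterwards the model reports the NULL read that
the oops shows.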

Re: [PATCH] change thread-unsafe readdir to thread-safe readdir_r calls

2010-07-21 Thread Roland Dreier
 > +buf = alloca(offsetof(struct dirent, d_name) + NAME_MAX + 1);
 > +while (readdir_r(class_dir, buf, &dent) == 0 && dent) {

So after thinking this over, I don't think I'm going to apply this
patch.  I think the right fix is for corosync to allow readdir() -- in
general pushing people to safer APIs is probably a good thing, but I
think in this particular case it doesn't make sense.  In fact it seems
that readdir_r() is actually worse since the "buf" parameter does not
have a well-defined required size, while readdir() is actually safe in
most uses.

 - R.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

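For comparison with the readdir_r() hunk quoted above, here is a minimal
sketch of a plain readdir() loop. It is not the libibverbs code;
count_dir_entries() is a hypothetical helper. The DIR handle is local to
the function, so no other thread shares the stream, which is presumably
the common case meant by "safe in most uses".

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: count entries other than "." and "..".
 * Returns -1 if the directory cannot be opened or read. */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *dent;
    int count = 0;
    int saved_errno;

    if (!dir)
        return -1;

    for (;;) {
        errno = 0;              /* lets us tell end-of-directory from error */
        dent = readdir(dir);
        if (!dent)
            break;
        if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0)
            continue;
        count++;
    }

    saved_errno = errno;        /* closedir() may overwrite errno */
    closedir(dir);
    return saved_errno ? -1 : count;
}
```

No caller-sized buffer is involved, which avoids the sizing question
raised about the "buf" argument of readdir_r().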

Re: [PATCH 2.6.35 2/3] RDMA/cxgb4: Support variable sized work requests.

2010-07-21 Thread Roland Dreier
thanks, applied.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [patch v2] infiniband: cxgb3: clean up signed check

2010-07-21 Thread Roland Dreier
thanks, applied
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH v2 3/3] RDMA/cxgb4: Obtain RDMA QID ranges from LLD/FW.

2010-07-21 Thread Roland Dreier
 > This one is dependent on a cxgb3 change merged into net-next.

I'll hold on to this until that change gets merged upstream.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [PATCH v2 2/3] RDMA/cxgb4: Add module option to tweak delayed ack.

2010-07-21 Thread Roland Dreier
thanks, applied
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [patch 2/6] infiniband: remove dependency on __GFP_NOFAIL

2010-07-21 Thread Roland Dreier
thanks guys, applied
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


RE: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Hefty, Sean
> I do have the sysfs counters in
> /sys/class/infiniband_cm///cm_tx_msgs/.  Could you
> point me to a reference for what they all mean?

These are counting the number of CM messages sent for each type.  You would 
need to refer to the InfiniBand specification to understand the CM protocol and 
messages.  In short, IB uses a three-way handshake to connect:

REQ (request) ->
<- REP (reply)
RTU (ready to use) ->

MRA (message received) can also appear to indicate that a message has been 
received, but its processing will be delayed.

> Now, I'm not sure how relevant it is, but under heavy load I also see
> a lot of these:
> kernel: ib_cm: calculated mra timeout 67584 > 8192, decreasing use
> timeout_ms

This is likely not an issue.  The CM has received an MRA, but is using a 
shorter timeout than what was specified in the MRA message.

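The handshake described above, including the optional MRA, can be written
as a small state machine for the connecting side. This is a simplified
illustration: the state and message names are hypothetical and do not
match the ib_cm enums, and rejects are collapsed into a single failed
state.

```c
/* Messages the active side may receive after sending its REQ. */
enum cm_msg { MSG_REP, MSG_MRA, MSG_OTHER };

/* Hypothetical, reduced set of active-side states. */
enum active_state {
    ACTIVE_REQ_SENT,
    ACTIVE_MRA_RCVD,
    ACTIVE_ESTABLISHED,
    ACTIVE_FAILED
};

/* Advance the active side. *send_rtu is set to 1 when the caller should
 * answer with an RTU, the third leg of the handshake. */
enum active_state active_receive(enum active_state s, enum cm_msg msg,
                                 int *send_rtu)
{
    *send_rtu = 0;
    if (s != ACTIVE_REQ_SENT && s != ACTIVE_MRA_RCVD)
        return s;                   /* already connected or failed */

    switch (msg) {
    case MSG_MRA:
        return ACTIVE_MRA_RCVD;     /* peer has the REQ; wait longer */
    case MSG_REP:
        *send_rtu = 1;
        return ACTIVE_ESTABLISHED;
    default:
        return ACTIVE_FAILED;       /* anything else is not modelled */
    }
}
```

The MRA does not move the connection forward; it only tells the sender
that the REQ arrived and that the reply may take longer than usual.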

Re: [PATCH 1/2] ib_ipath: Fix probe failure path

2010-07-21 Thread Roland Dreier
thanks, applied.
-- 
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: knockdown voltaire switch with ARP multicast

2010-07-21 Thread Bob Ciotti

Yes. Good progress; we will most likely identify a workaround.

On Wed, Jul 21, 2010 at 06:38:10AM -0500, Or Gerlitz wrote:
> Bob Ciotti wrote:
> > Maybe someone on the voltaire side can help.
> > I'm working the issue now Wed Jul 21 00:34:14 PDT 2010
> Hi Bob,
> 
> I understand that some folks from Voltaire are working with you directly.
> 
> Or.


Re: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Josh England
I do have the sysfs counters in
/sys/class/infiniband_cm///cm_tx_msgs/.  Could you
point me to a reference for what they all mean?  There are a few
patches I've had to throw into 1.4.2 so I'll need to check whether
they are still needed in 1.5.1, but I'll work on that today.

Now, I'm not sure how relevant it is, but under heavy load I also see
a lot of these:
kernel: ib_cm: calculated mra timeout 67584 > 8192, decreasing use timeout_ms

-JE

On Wed, Jul 21, 2010 at 6:09 AM, Or Gerlitz  wrote:
> Josh England wrote:
>> Do you think upgrading to OFED-1.5.1 would help at all?
>
> It might help you to diagnose the problem better. If you read through the
> thread I pointed to (it's very short, four messages, less than two minutes),
> you would see that Arthur is reporting on the lap_state and Sean is
> suggesting to use the IB CM sysfs counters to further debug this. I don't
> know if these counters exist on the IB stack used for the OFED drop you're
> using, but they should be in 1.5.x.
>
> Or.
>

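The two figures in the "calculated mra timeout" log line above are in
milliseconds. IB timeout fields are 5-bit exponents n that stand for
4.096 us * 2^n, and the sketch below converts such an exponent. Both
function names are hypothetical, and the shift-by-8 millisecond shortcut
is an approximation chosen for this sketch (2^8 * 4.096 us is about
1.05 ms).

```c
/* Exact value of 4.096 us * 2^n in nanoseconds; n is a 5-bit field. */
unsigned long long ib_timeout_ns(unsigned int n)
{
    return 4096ULL << (n & 31u);
}

/* Rough millisecond value: treat 2^8 units of 4.096 us as one millisecond. */
unsigned long ib_timeout_approx_ms(unsigned int n)
{
    n &= 31u;
    return 1UL << (n > 8 ? n - 8 : 0);
}
```

In this approximation 8192 corresponds to n = 21, and 67584 equals
65536 + 2048, that is n = 24 plus n = 19. That decomposition is arithmetic
only; which timeout fields actually contributed is a guess that the
thread does not confirm.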

Re: sense remote hardware address change by rdma-cm applications

2010-07-21 Thread Steve Wise

Or Gerlitz wrote:
> Steve Wise wrote:
>> The cxgb3/4 drivers do not set IFF_NOARP and rely on ND being done as
>> part of connection setup.  The driver will initiate ND if there isn't a
>> neigh entry available at the time the iwarp driver tries to send a SYN
>> or SYN/ACK.
>
> okay, understood, thanks for clarifying this.
>
>> The cxgb* drivers actually reference the neigh and dst structs until the
>> offload connection is gone.  Also if the offloaded connection has
>> problems transmitting (due to a L2 address change, for example), then
>> the driver will initiate ND again by calling neigh_event_send().  See
>> t4_l2t_send_event() in l2t.c which is called by the iwarp driver in
>> peer_abort() from iwch_cm.c when the HW tells us it's retransmitting
>> too much.
>
> In the general case of an rdma-cm consumer, e.g. IB RC based and/or UD
> unicast based, we don't have such a feedback mechanism from the HW. As
> such, I would draw the line here around adopting into the rdma-cm the
> behavior of referencing the neigh and dst structures until the connection
> is gone (could you point to the func/path in drivers/net/cxgb3/l2t.c
> which does this? I wasn't sure).

Actually the dst entry ref/deref is really done in iw_cxgb3.  The
dst/neigh entries are referenced in iwch_connect() and pass_accept_req()
by calling ip_route_output() via find_route().  They are released in
__free_ep() when the endpoint is finally freed after connection shutdown.

The L2T code deals with maintaining the HW L2 entries and dealing with
neighbour change events from the kernel.

Steve.


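The lifetime rule Steve describes, take the route reference at connect
time and drop it only when the endpoint is freed, can be illustrated with
a bare reference count. This is not the iw_cxgb3 code; struct route_ref,
struct endpoint and both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted dst/neigh entry. */
struct route_ref {
    int refcnt;
};

struct endpoint {
    struct route_ref *dst;
};

/* Connection setup: hold the route for the life of the endpoint. */
void ep_connect_hold(struct endpoint *ep, struct route_ref *dst)
{
    dst->refcnt++;
    ep->dst = dst;
}

/* Endpoint teardown: the only place the reference is released. */
void ep_free_release(struct endpoint *ep)
{
    if (ep->dst) {
        ep->dst->refcnt--;
        ep->dst = NULL;
    }
}
```

While the endpoint exists the count cannot reach zero, so the entry stays
around even if the stack itself no longer needs it.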

Re: sense remote hardware address change by rdma-cm applications

2010-07-21 Thread Or Gerlitz
Jason Gunthorpe wrote:
> I'm thinking something like this..
> - The RDMA CM gets the dst from its route lookup locks it and stores it.
> - Instead of doing a route lookup cxgb gets the dst from RDMA CM,
>   locks it and stores it
> - RDMA CM traps all notifications/etc and generates callback to cxgb
>   to say the dst has changed.
> - cxgb releases the old dst and grabs the new one, updates the HW, etc.


Jason,

I'm up for extending the rdma-cm address change event, on which an app can
decide whether to react or not. For example, the in-tree iser and rds code
treat this event the same as a disconnection request arriving, which means
the higher layer (e.g. the user space iscsi daemon in the iser case) would
try to reconnect. This has the advantage of simplifying the ULP state
machine, so there's no need for special handling of an address change; it
is just treated as a hint that reconnection is needed.

The cxgb* code takes this deeper, as it handles L2 changes at the driver
level and not as an event delivered to the ULP, which can optionally address
or ignore it.
Or.

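The ULP-side policy Or describes for iser and rds, treating an address
change like a disconnect, amounts to folding two events into one action.
A sketch with hypothetical local enums follows; a real consumer would
switch on the rdma_cm event types (RDMA_CM_EVENT_ADDR_CHANGE and
RDMA_CM_EVENT_DISCONNECTED) instead.

```c
/* Hypothetical local mirror of the events a ULP might see. */
enum ulp_event { EV_ESTABLISHED, EV_DISCONNECTED, EV_ADDR_CHANGE };

enum ulp_action { ULP_NONE, ULP_RECONNECT };

/* An address change gets no state of its own; it is only a hint that the
 * connection should be torn down and set up again. */
enum ulp_action ulp_handle_event(enum ulp_event ev)
{
    switch (ev) {
    case EV_ADDR_CHANGE:
    case EV_DISCONNECTED:
        return ULP_RECONNECT;
    default:
        return ULP_NONE;
    }
}
```

The design choice is that the state machine needs no extra states: the
reconnect path that already exists for disconnects covers both cases.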

Re: sense remote hardware address change by rdma-cm applications

2010-07-21 Thread Or Gerlitz
Steve Wise wrote:
> The cxgb3/4 drivers do not set IFF_NOARP and rely on ND being done as
> part of connection setup.  The driver will initiate ND if there isn't a
> neigh entry available at the time the iwarp driver tries to send a SYN or 
> SYN/ACK.  

okay, understood, thanks for clarifying this.

> The cxgb* drivers actually reference the neigh and dst structs until the
> offload connection is gone.  Also if the offloaded connection has
> problems transmitting (due to a L2 address change, for example), then
> the driver will initiate ND again by calling neigh_event_send().  See
> t4_l2t_send_event() in l2t.c which is called by the iwarp driver in
> peer_abort() from iwch_cm.c when the HW tells us it's retransmitting too much.

In the general case of an rdma-cm consumer, e.g. IB RC based and/or UD
unicast based, we don't have such a feedback mechanism from the HW. As such,
I would draw the line here around adopting into the rdma-cm the behavior of
referencing the neigh and dst structures until the connection is gone (could
you point to the func/path in drivers/net/cxgb3/l2t.c which does this? I
wasn't sure).

> What doesn't happen is active positive feedback during the connection to
> avoid NUD.  IE once the connection is setup, nobody calls dst_confirm()
> It is only called during connection setup/teardown.

I think we can live with that. This is similar to the case of an app using
UDP in a uni-directional manner between host A --> B, where the NUD part of
the network stack at host A has to issue timely probes to validate the L2
address of host B. The only difference is that we have the A --> B
communication offloaded, and without keeping the reference the neighbour and
dst are eventually deleted; the proposed patch eliminates this deletion.

Or.


Re: NULL pointer dereference in rdma_ucm

2010-07-21 Thread Or Gerlitz
Josh England wrote:
> Do you think upgrading to OFED-1.5.1 would help at all?

It might help you to diagnose the problem better. If you read through the
thread I pointed to (it's very short, four messages, less than two minutes),
you would see that Arthur is reporting on the lap_state and Sean is
suggesting to use the IB CM sysfs counters to further debug this. I don't
know if these counters exist on the IB stack used for the OFED drop you're
using, but they should be in 1.5.x.

Or.


Re: knockdown voltaire switch with ARP multicast

2010-07-21 Thread Or Gerlitz

Bob Ciotti wrote:
> Maybe someone on the voltaire side can help.
> I'm working the issue now Wed Jul 21 00:34:14 PDT 2010

Hi Bob,

I understand that some folks from Voltaire are working with you directly.

Or.


knockdown voltaire switch with ARP multicast

2010-07-21 Thread Bob Ciotti

I've been able to duplicate a nasty problem that took
down our system last week. I believe it's caused by
IPoIB ARP multicast traffic. The 4036 switch subscribes
to the IPoIB multicast group whether the SM is running
or disabled. The switch then goes belly up, and it can't be
programmed by the SM.

ERR 3113: MAD completed in error (IB_TIMEOUT): SubnSet(SwitchInfo),
attr_mod 0x0, TID 0x22322

smpquery works (nodeinfo/switchinfo) but perfquery fails with
ibwarn: [26748] _do_madrpc: recv failed: Connection timed out

I can log into the switch, but wouldn't know what to look for there.

Voltaire support case number 
 US Case 00018085: Re: NASA Issue [[ ref:00D38IO.5008BPc6T:ref]]

Maybe someone on the voltaire side can help.

I'm working the issue now
Wed Jul 21 00:34:14 PDT 2010


