Re: [PATCH] IB/srp: fix race condition on srp_target_port.req_lim

2010-07-26 Thread Bart Van Assche
On Sun, Jul 25, 2010 at 8:36 PM, David Dillow d...@thedillows.org wrote:

 On Sun, 2010-07-25 at 18:12 +0200, Bart Van Assche wrote:
  In the current implementation of ib_srp the req_lim field of
  struct srp_target_port can be manipulated in a non-atomic way by
  more than one CPU at a time: one CPU can be modifying req_lim in
  function srp_process_rsp() while another CPU can concurrently be
  decrementing req_lim in function __srp_get_tx_iu(). This is a
  race condition which can result in incorrect manipulation of the
  req_lim field. The patch below fixes this race condition by
  converting all manipulations of req_lim into atomic operations.

 This is not needed -- all modifications of target-req_lim are protected
 by scsi_dev-host_lock. It is held implicitly in srp_queuecommand() by
 the SCSI mid-layer, and we take that lock before modifying req_lim
 elsewhere -- or we're initializing and there won't be concurrent access.

It would be helpful for anyone who is reviewing or modifying the SRP
initiator source code if the locking policy was documented. As an
example, in kernels 2.6.33 and before there was a single notification
processing function srp_completion() that incremented target-tx_tail
non-atomically and without protection of scsi_host-host_lock. It was
not possible to tell from the source code comments whether or not this
access pattern was intentional.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: error when compiling ofed ulp\libibnetdisc

2010-07-26 Thread Hefty, Sean
 When compiling the ulp\libibnetdisc  on OFW I got the following casting 
 error. When looking the code
 it really looks problematic. I wonder how you don't see it in your 
 environment.

I just checked, and this error doesn't show up in my build (7600.16385).

 static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num)
 
 {
 
 (*num) = (uint64_t) inbuf[0];
 
 (*num) |= ((uint16_t) inbuf[1]  8);

The cast to uint64_t is wrong.  We can replace the above with:

(*num) = ((uint16_t) inbuf[1]  8) | inbuf[0];

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: more partition questions

2010-07-26 Thread Tom Ammon

Hal,

On 7/22/2010 6:08 PM, Hal Rosenstock wrote:

Tom,

On Thu, Jul 22, 2010 at 4:49 PM, Tom Ammontom.am...@utah.edu  wrote:

Hal,

Thanks for looking at all of this with me. ifconfig output is below.

On 7/22/2010 12:08 PM, Hal Rosenstock wrote:


Tom,

On Thu, Jul 22, 2010 at 1:19 PM, Tom Ammontom.am...@utah.eduwrote:


Hal,

On 7/21/2010 2:45 PM, Hal Rosenstock wrote:


Hi Tom,

On 7/19/10, Tom Ammontom.am...@utah.edu  wrote:


I'm trying to set up partitions in a little test environment, and I'm
having trouble.

I have opensm running on a machine attached to the fabric, and sminfo
on
the other machines confirm that this is indeed the master SM. Here's my
/etc/opensm/partitions.conf:

Default=0x , ipoib : ALL, SELF=full ;
PartitionBlue=0x8004, ipoib : 0x0002c9030009cb3f=full,
0x0002c90200252841=full, 0x0002c90200243471=full ;
PartitionRed=0x8005, ipoib : 0x0002c90200252841=full,
0x0002c90200243591=full, 0x0002c9030009cb2b=full ;


You don't really need the 0x8000 bit on in the pkeys but I don't think
it does any harm.


But when I go to the machine with port GUID 0x0002c90200243471, it
doesn't appear that it's getting the pkey I wanted:

[r...@stagnate ~]# ibstat
CA 'mthca0'
  CA type: MT23108
  Number of ports: 2
  Firmware version: 3.3.5
  Hardware version: a1
  Node GUID: 0x0002c90200243470
  System image GUID: 0x0002c90200243473
  Port 1:
  State: Active
  Physical state: LinkUp
  Rate: 10
  Base lid: 10
  LMC: 0
  SM lid: 4
  Capability mask: 0x02510a68
  Port GUID: 0x0002c90200243471
  Port 2:
  State: Down
  Physical state: Polling
  Rate: 2
  Base lid: 0
  LMC: 0
  SM lid: 0
  Capability mask: 0x02510a68
  Port GUID: 0x0002c90200243472

[r...@stagnate ~]# cat /sys/class/net/ib0/pkey
0x


What does:

smpquery pkeys 10 1

say ? Do you see the other pkey(s) on that port ?


[r...@stagnate ~]# smpquery pkeys 10 1
   0: 0x7fff 0x8004 0x 0x 0x 0x 0x 0x
   8: 0x 0x 0x 0x 0x 0x 0x 0x
  16: 0x 0x 0x 0x 0x 0x 0x 0x
  24: 0x 0x 0x 0x 0x 0x 0x 0x
  32: 0x 0x 0x 0x 0x 0x 0x 0x
  40: 0x 0x 0x 0x 0x 0x 0x 0x
  48: 0x 0x 0x 0x 0x 0x 0x 0x
  56: 0x 0x 0x 0x 0x 0x 0x 0x
64 pkeys capacity for this port

So I see that both 7fff and 8004 are being assigned to this port. Is that
okay?


Yes.


  Is there any problem with the machine also being in the default
partition?


No.


As I look around at all of the machines with smpquery, it appears that
they
are all being assigned 7fff and the pkey that I assigned in
partitions.conf.


Good.


But the machine that I want to run 2 child interfaces on is having
issues.
It's at LID 7 and here's what smpquery says:

[r...@stagnate ~]# smpquery pkeys 7 1
   0: 0x7fff 0x8004 0x8005 0x 0x 0x 0x 0x
   8: 0x 0x 0x 0x 0x 0x 0x 0x
  16: 0x 0x 0x 0x 0x 0x 0x 0x
  24: 0x 0x 0x 0x 0x 0x 0x 0x
  32: 0x 0x 0x 0x 0x 0x 0x 0x
  40: 0x 0x 0x 0x 0x 0x 0x 0x
  48: 0x 0x 0x 0x 0x 0x 0x 0x
  56: 0x 0x 0x 0x 0x 0x 0x 0x
64 pkeys capacity for this port

So that's fine, but when I try to create a child interface I get this:

[r...@labdisk01 ~]# echo 0x8004/sys/class/net/ib0/create_child
-bash: echo: write error: Name not unique on network


I don't know what cause that error. Maybe someone else can help here.

Are you sure the ib0 interface is OK ? What does ifconfig ib0 say ?


Here's ifconfig ib0:

ib0   Link encap:InfiniBand  HWaddr
80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet6 addr: fe80::202:c902:25:2841/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
  RX packets:1 errors:0 dropped:0 overruns:0 frame:0
  TX packets:17 errors:0 dropped:7 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:56 (56.0 b)  TX bytes:3529 (3.4 KiB)


Then I brought up the subinterfaces with ifup ib0.8004 ifup ib0.8005 .
Still get the Name not unique on network message if I switch the order and
do ifup followed by echo 0x8004etc.

ib0.8004  Link encap:InfiniBand  HWaddr
80:00:04:06:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
  inet addr:10.0.0.2  Bcast:10.0.0.255  Mask:255.255.255.0
  inet6 addr: fe80::202:c902:25:2841/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  

Re: About a shortcoming of the verbs API

2010-07-26 Thread Bart Van Assche
On Mon, Jul 26, 2010 at 4:21 PM, Steve Wise sw...@opengridcomputing.com wrote:
 On 07/25/2010 01:54 PM, Bart Van Assche wrote:

 [ ... ]

 The only way I know of to prevent out-of-order completion processing
 with the current OFED verbs API is to protect the whole completion
 processing loop against concurrent execution with a spinlock. Maybe it
 should be considered to extend the verbs API such that it is possible
 to process completions in order without additional locking. Apparently
 API functions that allow this in a similar context have already been
 invented in the past -- see e.g. VipCQNotify() in the Virtual
 Interface Architecture Specification.

 Is this the API to which you refer?

 http://docsrv.sco.com/cgi-bin/man/man?VipCQNotify+3VI

 I don't see how it provides the semantics you desire?

The web page you refer to is owned by a company that is controversial
in the Linux world. A more neutral source is the book The Virtual
Interface Architecture (Intel Press, 2002) or the document Virtual
Interface Architecture Specification (1997, available online at
http://pllab.cs.nthu.edu.tw/cs5403/Readings/EJB/san_10.pdf). In both
documents it is described that VipCQNotify atomically either dequeues
a work completion or enables notifications. As far as I know none of
the verb extensions defined by the IBTA allows to perform both
operations in an atomic way.

Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: error when compiling ofed ulp\libibnetdisc

2010-07-26 Thread Uri Habusha
The fix is okay. However I wonder how you don't get the error in your 
environment and we got it. Any idea?

-Original Message-
From: Hefty, Sean [mailto:sean.he...@intel.com] 
Sent: Monday, July 26, 2010 7:23 PM
To: Smith, Stan; Uri Habusha; o...@lists.openfabrics.org
Cc: Leonid Keller; Tzachi Dar; Irena Kruchkovsky; Alex Naslednikov; Sasha 
Khapyorsky; Ira Weiny; linux-rdma@vger.kernel.org
Subject: RE: error when compiling ofed ulp\libibnetdisc

 When compiling the ulp\libibnetdisc  on OFW I got the following casting 
 error. When looking the code
 it really looks problematic. I wonder how you don't see it in your 
 environment.

I just checked, and this error doesn't show up in my build (7600.16385).

 static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num)
 
 {
 
 (*num) = (uint64_t) inbuf[0];
 
 (*num) |= ((uint16_t) inbuf[1]  8);

The cast to uint64_t is wrong.  We can replace the above with:

(*num) = ((uint16_t) inbuf[1]  8) | inbuf[0];

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: error when compiling ofed ulp\libibnetdisc

2010-07-26 Thread Hefty, Sean
 The fix is okay. However I wonder how you don't get the error in your 
 environment and we got it. Any
 idea?

The cast is from an 8-bit value to 64-bit, then back to 16-bit, so there's no 
chance of losing data with the cast.  Maybe the compiler got smarter?
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: error when compiling ofed ulp\libibnetdisc

2010-07-26 Thread Uri Habusha
I guess we use same WDK version, 6001.10081. That means same compiler.

In any case can you check-in the fix?

-Original Message-
From: Hefty, Sean [mailto:sean.he...@intel.com] 
Sent: Monday, July 26, 2010 9:29 PM
To: Uri Habusha; Smith, Stan; o...@lists.openfabrics.org
Cc: Leonid Keller; Tzachi Dar; Irena Kruchkovsky; Alex Naslednikov; Sasha 
Khapyorsky; Ira Weiny; linux-rdma@vger.kernel.org
Subject: RE: error when compiling ofed ulp\libibnetdisc

 The fix is okay. However I wonder how you don't get the error in your 
 environment and we got it. Any
 idea?

The cast is from an 8-bit value to 64-bit, then back to 16-bit, so there's no 
chance of losing data with the cast.  Maybe the compiler got smarter?
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


libibnetdisc: fix cast in unmarshall16

2010-07-26 Thread Hefty, Sean
Uri Habusha reported a build error on windows as a result of an
incorrect cast to uint64_t.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 .../libibnetdisc/src/ibnetdisc_cache.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c 
b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
index 1de42eb..199bf33 100644
--- a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
+++ b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
@@ -183,8 +183,7 @@ static size_t _unmarshall8(uint8_t * inbuf, uint8_t * num)
 
 static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num)
 {
-   (*num) = (uint64_t) inbuf[0];
-   (*num) |= ((uint16_t) inbuf[1]  8);
+   (*num) = ((uint16_t) inbuf[1]  8) | inbuf[0];
 
return (sizeof(*num));
 }


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with RDMA_CM on systems with multiple IB HCA's.

2010-07-26 Thread Hari Subramoni
Hi,

Yes, both cards are on the same IB subnet.

The machines are down for maintanence now. We will send out the
information you requested as soon as they are up.

Thanks a lot,
Hari.

On Fri, 23 Jul 2010, Or Gerlitz wrote:

 Hari Subramoni subra...@cse.ohio-state.edu wrote:

  The nodes have LID's assigned to them and OpenSM is running fine.
  I've attached the configurations of the two hosts along with this e-mail.
   As Jonathan mentioned, we are able to ping between them.

 are the two HCAs on each of the nodes connected to the same IB subnet?

  The issue is intermittent. It happens at times and at other times, things
  work fine. Please let us know if you need any more information.

 lets focus on rping, please use both -v -d  flags with rping, also
 when  rping fails, please send the neighbours info (#ip neigh show)
 from host .5

 Or.


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: error when compiling ofed ulp\libibnetdisc

2010-07-26 Thread Hefty, Sean
 In any case can you check-in the fix?

I've committed the patch I sent to Sasha into svn.  We can adjust for any 
differences in what gets accepted upstream later when I update the windows 
management trees.

- Sean
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libibnetdisc: fix cast in unmarshall16

2010-07-26 Thread Sasha Khapyorsky
On 11:40 Mon 26 Jul , Hefty, Sean wrote:
 Uri Habusha reported a build error on windows as a result of an
 incorrect cast to uint64_t.
 
 Signed-off-by: Sean Hefty sean.he...@intel.com

Applied. Thanks.

Sasha

 ---
  .../libibnetdisc/src/ibnetdisc_cache.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c 
 b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
 index 1de42eb..199bf33 100644
 --- a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
 +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c
 @@ -183,8 +183,7 @@ static size_t _unmarshall8(uint8_t * inbuf, uint8_t * num)
  
  static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num)
  {
 - (*num) = (uint64_t) inbuf[0];
 - (*num) |= ((uint16_t) inbuf[1]  8);
 + (*num) = ((uint16_t) inbuf[1]  8) | inbuf[0];
  
   return (sizeof(*num));
  }
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About a shortcoming of the verbs API

2010-07-26 Thread Roland Dreier
  2. Double completion processing loop
  
  * Initialization:
  ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
  
  * Notification handler:
  
  struct ib_wc wc;
  do {
      while (ib_poll_cq(cq, 1, wc)  0)
          /* process wc */
  } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
  IB_CQ_REPORT_MISSED_EVENTS)  0);

This approach can be used to have race-free in-order processing of
completions using a scheme such as the NAPI processing loop used by the
IPoIB driver (with help from the core networking stack).  Essentially a
completion notification just marks the completion processing routine as
runnable, and the networking core schedules that processing routine in a
single-threaded way until the CQ is drained.

Another approach is to just always run the completion processing for a
given CQ on a single CPU and avoid locking entirely.  If you want more
CPUs to spread the work, just use multiple CQs and multiple event vectors.

  see e.g. VipCQNotify() in the Virtual Interface Architecture
  Specification.

I don't know of an efficient way to implement this type of atomic
dequeue completion or enable completions with any existing hardware.
Do you have an idea how this could be done?

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


CMA handler status code

2010-07-26 Thread Eldad Zinger
Hi everyone,
while developing SDP, I found a small problem with cma handlers:

drivers/infiniband/core/cma.c : 2204
drivers/infiniband/core/cma.c : 976

event.status = ib_event-param.sidr_rep_rcvd.status
event.status = ib_event-param.rej_rcvd.reason

event.status should be 0 for success, or negative value of generic error code.
In that code, the error code is positive and do not comply with
generic error code.

In order to make the status field available for other modules (like
SDP), that field should be format-consistent.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html