Re: [PATCH] IB/srp: fix race condition on srp_target_port.req_lim
On Sun, Jul 25, 2010 at 8:36 PM, David Dillow d...@thedillows.org wrote: On Sun, 2010-07-25 at 18:12 +0200, Bart Van Assche wrote: In the current implementation of ib_srp the req_lim field of struct srp_target_port can be manipulated in a non-atomic way by more than one CPU at a time: one CPU can be modifying req_lim in function srp_process_rsp() while another CPU can concurrently be decrementing req_lim in function __srp_get_tx_iu(). This is a race condition which can result in incorrect manipulation of the req_lim field. The patch below fixes this race condition by converting all manipulations of req_lim into atomic operations. This is not needed -- all modifications of target-req_lim are protected by scsi_dev-host_lock. It is held implicitly in srp_queuecommand() by the SCSI mid-layer, and we take that lock before modifying req_lim elsewhere -- or we're initializing and there won't be concurrent access. It would be helpful for anyone who is reviewing or modifying the SRP initiator source code if the locking policy was documented. As an example, in kernels 2.6.33 and before there was a single notification processing function srp_completion() that incremented target-tx_tail non-atomically and without protection of scsi_host-host_lock. It was not possible to tell from the source code comments whether or not this access pattern was intentional. Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: error when compiling ofed ulp\libibnetdisc
When compiling the ulp\libibnetdisc on OFW I got the following casting error. When looking the code it really looks problematic. I wonder how you don't see it in your environment. I just checked, and this error doesn't show up in my build (7600.16385). static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num) { (*num) = (uint64_t) inbuf[0]; (*num) |= ((uint16_t) inbuf[1] 8); The cast to uint64_t is wrong. We can replace the above with: (*num) = ((uint16_t) inbuf[1] 8) | inbuf[0]; - Sean -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: more partition questions
Hal, On 7/22/2010 6:08 PM, Hal Rosenstock wrote: Tom, On Thu, Jul 22, 2010 at 4:49 PM, Tom Ammontom.am...@utah.edu wrote: Hal, Thanks for looking at all of this with me. ifconfig output is below. On 7/22/2010 12:08 PM, Hal Rosenstock wrote: Tom, On Thu, Jul 22, 2010 at 1:19 PM, Tom Ammontom.am...@utah.eduwrote: Hal, On 7/21/2010 2:45 PM, Hal Rosenstock wrote: Hi Tom, On 7/19/10, Tom Ammontom.am...@utah.edu wrote: I'm trying to set up partitions in a little test environment, and I'm having trouble. I have opensm running on a machine attached to the fabric, and sminfo on the other machines confirm that this is indeed the master SM. Here's my /etc/opensm/partitions.conf: Default=0x , ipoib : ALL, SELF=full ; PartitionBlue=0x8004, ipoib : 0x0002c9030009cb3f=full, 0x0002c90200252841=full, 0x0002c90200243471=full ; PartitionRed=0x8005, ipoib : 0x0002c90200252841=full, 0x0002c90200243591=full, 0x0002c9030009cb2b=full ; You don't really need the 0x8000 bit on in the pkeys but I don't think it does any harm. But when I go to the machine with port GUID 0x0002c90200243471, it doesn't appear that it's getting the pkey I wanted: [r...@stagnate ~]# ibstat CA 'mthca0' CA type: MT23108 Number of ports: 2 Firmware version: 3.3.5 Hardware version: a1 Node GUID: 0x0002c90200243470 System image GUID: 0x0002c90200243473 Port 1: State: Active Physical state: LinkUp Rate: 10 Base lid: 10 LMC: 0 SM lid: 4 Capability mask: 0x02510a68 Port GUID: 0x0002c90200243471 Port 2: State: Down Physical state: Polling Rate: 2 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0002c90200243472 [r...@stagnate ~]# cat /sys/class/net/ib0/pkey 0x What does: smpquery pkeys 10 1 say ? Do you see the other pkey(s) on that port ? [r...@stagnate ~]# smpquery pkeys 10 1 0: 0x7fff 0x8004 0x 0x 0x 0x 0x 0x 8: 0x 0x 0x 0x 0x 0x 0x 0x 16: 0x 0x 0x 0x 0x 0x 0x 0x 24: 0x 0x 0x 0x 0x 0x 0x 0x 32: 0x 0x 0x 0x 0x 0x 0x 0x 40: 0x 0x 0x 0x 0x 0x 0x 0x 48: 0x 0x 0x 0x 0x 0x 0x 0x 56: 0x 0x 0x 0x 0x 0x 0x 0x 64 pkeys capacity for this port So I see that both 7fff and 8004 are being assigned to this port. Is that okay? Yes. Is there any problem with the machine also being in the default partition? No. As I look around at all of the machines with smpquery, it appears that they are all being assigned 7fff and the pkey that I assigned in partitions.conf. Good. But the machine that I want to run 2 child interfaces on is having issues. It's at LID 7 and here's what smpquery says: [r...@stagnate ~]# smpquery pkeys 7 1 0: 0x7fff 0x8004 0x8005 0x 0x 0x 0x 0x 8: 0x 0x 0x 0x 0x 0x 0x 0x 16: 0x 0x 0x 0x 0x 0x 0x 0x 24: 0x 0x 0x 0x 0x 0x 0x 0x 32: 0x 0x 0x 0x 0x 0x 0x 0x 40: 0x 0x 0x 0x 0x 0x 0x 0x 48: 0x 0x 0x 0x 0x 0x 0x 0x 56: 0x 0x 0x 0x 0x 0x 0x 0x 64 pkeys capacity for this port So that's fine, but when I try to create a child interface I get this: [r...@labdisk01 ~]# echo 0x8004/sys/class/net/ib0/create_child -bash: echo: write error: Name not unique on network I don't know what cause that error. Maybe someone else can help here. Are you sure the ib0 interface is OK ? What does ifconfig ib0 say ? Here's ifconfig ib0: ib0 Link encap:InfiniBand HWaddr 80:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet6 addr: fe80::202:c902:25:2841/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:1 errors:0 dropped:0 overruns:0 frame:0 TX packets:17 errors:0 dropped:7 overruns:0 carrier:0 collisions:0 txqueuelen:256 RX bytes:56 (56.0 b) TX bytes:3529 (3.4 KiB) Then I brought up the subinterfaces with ifup ib0.8004 ifup ib0.8005 . Still get the Name not unique on network message if I switch the order and do ifup followed by echo 0x8004etc. ib0.8004 Link encap:InfiniBand HWaddr 80:00:04:06:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:10.0.0.2 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:25:2841/64 Scope:Link UP BROADCAST RUNNING MULTICAST
Re: About a shortcoming of the verbs API
On Mon, Jul 26, 2010 at 4:21 PM, Steve Wise sw...@opengridcomputing.com wrote: On 07/25/2010 01:54 PM, Bart Van Assche wrote: [ ... ] The only way I know of to prevent out-of-order completion processing with the current OFED verbs API is to protect the whole completion processing loop against concurrent execution with a spinlock. Maybe it should be considered to extend the verbs API such that it is possible to process completions in order without additional locking. Apparently API functions that allow this in a similar context have already been invented in the past -- see e.g. VipCQNotify() in the Virtual Interface Architecture Specification. Is this the API to which you refer? http://docsrv.sco.com/cgi-bin/man/man?VipCQNotify+3VI I don't see how it provides the semantics you desire? The web page you refer to is owned by a company that is controversial in the Linux world. A more neutral source is the book The Virtual Interface Architecture (Intel Press, 2002) or the document Virtual Interface Architecture Specification (1997, available online at http://pllab.cs.nthu.edu.tw/cs5403/Readings/EJB/san_10.pdf). In both documents it is described that VipCQNotify atomically either dequeues a work completion or enables notifications. As far as I know none of the verb extensions defined by the IBTA allows to perform both operations in an atomic way. Bart. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: error when compiling ofed ulp\libibnetdisc
The fix is okay. However I wonder how you don't get the error in your environment and we got it. Any idea? -Original Message- From: Hefty, Sean [mailto:sean.he...@intel.com] Sent: Monday, July 26, 2010 7:23 PM To: Smith, Stan; Uri Habusha; o...@lists.openfabrics.org Cc: Leonid Keller; Tzachi Dar; Irena Kruchkovsky; Alex Naslednikov; Sasha Khapyorsky; Ira Weiny; linux-rdma@vger.kernel.org Subject: RE: error when compiling ofed ulp\libibnetdisc When compiling the ulp\libibnetdisc on OFW I got the following casting error. When looking the code it really looks problematic. I wonder how you don't see it in your environment. I just checked, and this error doesn't show up in my build (7600.16385). static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num) { (*num) = (uint64_t) inbuf[0]; (*num) |= ((uint16_t) inbuf[1] 8); The cast to uint64_t is wrong. We can replace the above with: (*num) = ((uint16_t) inbuf[1] 8) | inbuf[0]; - Sean -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: error when compiling ofed ulp\libibnetdisc
The fix is okay. However I wonder how you don't get the error in your environment and we got it. Any idea? The cast is from an 8-bit value to 64-bit, then back to 16-bit, so there's no chance of losing data with the cast. Maybe the compiler got smarter? -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: error when compiling ofed ulp\libibnetdisc
I guess we use same WDK version, 6001.10081. That means same compiler. In any case can you check-in the fix? -Original Message- From: Hefty, Sean [mailto:sean.he...@intel.com] Sent: Monday, July 26, 2010 9:29 PM To: Uri Habusha; Smith, Stan; o...@lists.openfabrics.org Cc: Leonid Keller; Tzachi Dar; Irena Kruchkovsky; Alex Naslednikov; Sasha Khapyorsky; Ira Weiny; linux-rdma@vger.kernel.org Subject: RE: error when compiling ofed ulp\libibnetdisc The fix is okay. However I wonder how you don't get the error in your environment and we got it. Any idea? The cast is from an 8-bit value to 64-bit, then back to 16-bit, so there's no chance of losing data with the cast. Maybe the compiler got smarter? -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
libibnetdisc: fix cast in unmarshall16
Uri Habusha reported a build error on windows as a result of an incorrect cast to uint64_t. Signed-off-by: Sean Hefty sean.he...@intel.com --- .../libibnetdisc/src/ibnetdisc_cache.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c index 1de42eb..199bf33 100644 --- a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c @@ -183,8 +183,7 @@ static size_t _unmarshall8(uint8_t * inbuf, uint8_t * num) static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num) { - (*num) = (uint64_t) inbuf[0]; - (*num) |= ((uint16_t) inbuf[1] 8); + (*num) = ((uint16_t) inbuf[1] 8) | inbuf[0]; return (sizeof(*num)); } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Issue with RDMA_CM on systems with multiple IB HCA's.
Hi, Yes, both cards are on the same IB subnet. The machines are down for maintanence now. We will send out the information you requested as soon as they are up. Thanks a lot, Hari. On Fri, 23 Jul 2010, Or Gerlitz wrote: Hari Subramoni subra...@cse.ohio-state.edu wrote: The nodes have LID's assigned to them and OpenSM is running fine. I've attached the configurations of the two hosts along with this e-mail. As Jonathan mentioned, we are able to ping between them. are the two HCAs on each of the nodes connected to the same IB subnet? The issue is intermittent. It happens at times and at other times, things work fine. Please let us know if you need any more information. lets focus on rping, please use both -v -d flags with rping, also when rping fails, please send the neighbours info (#ip neigh show) from host .5 Or. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: error when compiling ofed ulp\libibnetdisc
In any case can you check-in the fix? I've committed the patch I sent to Sasha into svn. We can adjust for any differences in what gets accepted upstream later when I update the windows management trees. - Sean -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: libibnetdisc: fix cast in unmarshall16
On 11:40 Mon 26 Jul , Hefty, Sean wrote: Uri Habusha reported a build error on windows as a result of an incorrect cast to uint64_t. Signed-off-by: Sean Hefty sean.he...@intel.com Applied. Thanks. Sasha --- .../libibnetdisc/src/ibnetdisc_cache.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c index 1de42eb..199bf33 100644 --- a/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc_cache.c @@ -183,8 +183,7 @@ static size_t _unmarshall8(uint8_t * inbuf, uint8_t * num) static size_t _unmarshall16(uint8_t * inbuf, uint16_t * num) { - (*num) = (uint64_t) inbuf[0]; - (*num) |= ((uint16_t) inbuf[1] 8); + (*num) = ((uint16_t) inbuf[1] 8) | inbuf[0]; return (sizeof(*num)); } -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About a shortcoming of the verbs API
2. Double completion processing loop * Initialization: ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); * Notification handler: struct ib_wc wc; do { while (ib_poll_cq(cq, 1, wc) 0) /* process wc */ } while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS) 0); This approach can be used to have race-free in-order processing of completions using a scheme such as the NAPI processing loop used by the IPoIB driver (with help from the core networking stack). Essentially a completion notification just marks the completion processing routine as runnable, and the networking core schedules that processing routine in a single-threaded way until the CQ is drained. Another approach is to just always run the completion processing for a given CQ on a single CPU and avoid locking entirely. If you want more CPUs to spread the work, just use multiple CQs and multiple event vectors. see e.g. VipCQNotify() in the Virtual Interface Architecture Specification. I don't know of an efficient way to implement this type of atomic dequeue completion or enable completions with any existing hardware. Do you have an idea how this could be done? - R. -- Roland Dreier rola...@cisco.com || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
CMA handler status code
Hi everyone, while developing SDP, I found a small problem with cma handlers: drivers/infiniband/core/cma.c : 2204 drivers/infiniband/core/cma.c : 976 event.status = ib_event-param.sidr_rep_rcvd.status event.status = ib_event-param.rej_rcvd.reason event.status should be 0 for success, or negative value of generic error code. In that code, the error code is positive and do not comply with generic error code. In order to make the status field available for other modules (like SDP), that field should be format-consistent. -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html