RE: [ewg] RDMA in SDP
There is no RDMA in SDP 1.2 because it did not get coded. There is a send-side optimization in SDP 1.3 that avoids copies in some cases, but it is not RDMA based. Eventually (planned for 1.4) SDP will include at least some of the defined RDMA modes. Please feel free to contribute code to enable the features you need more quickly.

Jim Mott

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of zhang Jackie
Sent: Tuesday, March 04, 2008 9:05 PM
To: [EMAIL PROTECTED]
Subject: [ewg] RDMA in SDP

hi,
I am reading the source code of SDP in OFED 1.2.5.1 and found that only Send/Recv is used. I want to know why RDMA is not supported in SDP. Thanks.

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Hi,
64K is borderline for seeing the bzcopy effect. Using an AMD 6000+ (3 GHz dual core) in an Asus M2A-VM motherboard with ConnectX running 2.3 firmware and the OFED 1.3-rc3 stack on a 2.6.23.8 kernel.org kernel, I ran the test for 128K:

5546 Mbit/s with sdp_zcopy_thresh=0 (off)
8709 Mbit/s with sdp_zcopy_thresh=65536

For these tests, I just have LD_PRELOAD set in my environment.

===

I see that TCP_MAXSEG is not being handled by libsdp and will look into it.

[EMAIL PROTECTED] ~]# modprobe ib_sdp
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  131072   30.01      5545.69   51.47    14.43    1.521   1.706

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8      8      0      0  2.08e+10  131072.00   158690  33135.60   627718

Maximum
Segment
Size (bytes)
-1

[EMAIL PROTECTED] ~]# echo 65536 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  131072   30.01      8708.58   50.63    14.55    0.953   1.095

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8      8      0      0  3.267e+10 131072.00   249228  26348.30  1239807

Maximum
Segment
Size (bytes)
-1

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

-----Original Message-----
From: Weikuan Yu [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 24, 2008 9:09 AM
To: Scott Weitzenkamp (sweitzen)
Cc: Jim Mott; ewg@lists.openfabrics.org; [EMAIL PROTECTED]
Subject: Re: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

Hi, Scott,

I have been running SDP tests across two Woodcrest nodes with 4x DDR cards using OFED-1.2.5.4. The card/firmware info is below.

CA 'mthca0'
        CA type: MT25208
        Number of ports: 2
        Firmware version: 5.1.400
        Hardware version: a0
        Node GUID: 0x0002c90200228e0c
        System image GUID: 0x0002c90200228e0f

I could not get a bandwidth of more than 5 Gbps like you have shown here. I wonder if I need to upgrade to the latest software or firmware? Any suggestions?

Thanks,
--Weikuan

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.225.77 (192.168.225.77) port 0 AF_INET

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072  131072   10.00      4918.95   21.29    24.99    1.418   1.665

Scott Weitzenkamp (sweitzen) wrote:
> Jim, I am trying OFED-1.3-20071231-0600 and RHEL4 x86_64 on a dual CPU (single core each CPU) Xeon system. I do not see any performance improvement (either throughput or CPU utilization) using netperf when I set /sys/module/ib_sdp/sdp_zcopy_thresh to 16384. Can you elaborate on your HCA type, and the performance improvement you see?
>
> Here's an example netperf command line when using a Cheetah DDR HCA and 1.2.917 firmware (I have also tried ConnectX and 2.3.000 firmware too):
>
> [EMAIL PROTECTED] ~]$ LD_PRELOAD=libsdp.so netperf241 -v2 -4 -H 192.168.1.201 -l 30 -t TCP_STREAM -c -C -- -m 65536
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.201 (192.168.1.201) port 0 AF_INET : histogram : demo
>
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
[ewg] RE: OFED-1.3-beta sdp issue
Hi,
This looks very much like bug 793 (https://bugs.openfabrics.org/show_bug.cgi?id=793). There was a change in the sk_buff definition in 2.6.22+ kernels. Could you verify that the fix posted in the bug is in your sdp_bcopy.c (or just send me your drivers/infiniband/ulp/sdp/sdp_bcopy.c file)? This bug fix got picked up as a patch that is applied by the build process instead of a change to the base code. Perhaps it is not being picked up for PPC. I'll check it out.

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

From: Stefan Roscher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 05, 2007 4:32 AM
To: Jim Mott
Cc: Hoang-Nam Nguyen; Christoph Raisch; ewg@lists.openfabrics.org
Subject: OFED-1.3-beta sdp issue

Hi Jim,
during the OFED-1.3-beta2 test on ppc64 systems with SLES10-SP1 I saw the following issue. I booted Linux kernels 2.6.22 and 2.6.23 on SLES10-SP1, and NetPIPE over SDP fails with the following oops:

REGS: c8ccf930 TRAP: 0700 Not tainted (2.6.23-ppc64)
MSR: 80029032 EE,ME,IR,DR CR: 2444 XER: 0005
TASK = c8ccb6a0[25] 'events/6' THREAD: c8ccc000 CPU: 6
GPR00: c0322b98 c8ccfbb0 c0680048 0087
GPR04: 0024a7d8
GPR08: 001bac9151b0 c05c8108 c001daa87b58 c05c8110
GPR12: c059a300
GPR16: 4210
GPR20: c054de98 c001a0bc4b00 0001
GPR24: c001beb7d000 beb7d014 0006
GPR28: c001aae86100 c001d433c080 c062ef28 c001db841880
NIP [c0322b9c] .skb_over_panic+0x50/0x58
LR [c0322b98] .skb_over_panic+0x4c/0x58
Call Trace:
[c8ccfbb0] [c0322b98] .skb_over_panic+0x4c/0x58 (unreliable)
[c8ccfc40] [d0559df0] .sdp_poll_cq+0x380/0xa68 [ib_sdp]
[c8ccfd10] [d055a8fc] .sdp_work+0xe8/0x10c [ib_sdp]
[c8ccfda0] [c0076fac] .run_workqueue+0x118/0x208
[c8ccfe40] [c0077f70] .worker_thread+0xcc/0xf0
[c8ccff00] [c007caa4] .kthread+0x78/0xc4
[c8ccff90] [c0026be4] .kernel_thread+0x4c/0x68
Instruction dump:
80a30068 e8e300b8 e90300c0 812300ac 814300b0 2fa0 409e0008 e81e8028
e87e8038 f8010070 4bd3e4d1 6000 0fe0 4800 7c0802a6 faa1ffa8

This issue occurs only on the two kernels mentioned above. My question is: is this the bug you described here: https://bugs.openfabrics.org/show_bug.cgi?id=807, or should I open a new one?

Kind regards,
Stefan Roscher
[ewg] RE: [PATCH] IB/sdp Fix a kernel panic in put_page() that was passing NULL
The question is, how did that page get unset? My understanding is that the get_user_pages() call in sdp_bz_setup() should have incremented the count for each page in the range. Since there is supposed to be only one thread doing bzcopy at a time (to preserve ordering), sdp_bz_cleanup() ought to be able to call put_page() without checking. Simple netperf testing uses a single send thread, so it should just work.

The problem that I am chasing now is that while a thread is sleeping waiting for credits, the socket is not locked. Another thread hops in and tries to do a zero copy. Since there is only one active context, and it is associated with the socket structure, it will be trampled by the second thread. Thread 2 blocks waiting for credits, thread 1 wakes up and decrements thread 2's page counts, and bad things follow. I've got that fixed, but there are some cleanup issues that are not quite working.

Thoughts?

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

-----Original Message-----
From: Ralph Campbell [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 30, 2007 5:07 PM
To: Jim Mott
Cc: EWG
Subject: [PATCH] IB/sdp Fix a kernel panic in put_page() that was passing NULL

The new bzcopy_state() was trying to free unset bz->pages[i] entries.

Signed-off-by: Dave Olson [EMAIL PROTECTED]

diff --git a/drivers/infiniband/ulp/sdp/sdp_main.c b/drivers/infiniband/ulp/sdp/sdp_main.c
index 809f7b8..35c4dd3 100644
--- a/drivers/infiniband/ulp/sdp/sdp_main.c
+++ b/drivers/infiniband/ulp/sdp/sdp_main.c
@@ -1212,7 +1212,8 @@ static inline struct bzcopy_state *sdp_bz_cleanup(struct bzcopy_state *bz)
 	if (bz->pages) {
 		for (i = bz->cur_page; i < bz->page_cnt; i++)
-			put_page(bz->pages[i]);
+			if (bz->pages[i])
+				put_page(bz->pages[i]);
 		kfree(bz->pages);
 	}
[ewg] RE: SDP / TCP_NODELAY
Hi,
SDP includes support for TCP_NODELAY and TCP_CORK. At a high level, these options control buffering, and they seem to be working. If you turn off the Nagle buffering algorithm using these options, then sends go out right away. This is how TCP_RR testing with netperf is run to measure latency. That is my understanding, anyway. If you have some code that does not seem to be running right, send it along and I will take a look.

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

-----Original Message-----
From: Or Gerlitz [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 21, 2007 3:07 AM
To: Jim Mott
Cc: EWG
Subject: SDP / TCP_NODELAY

Hi Jim,
I was handed benchmark results from which I suspected that the SDP code that was used might not respect the TCP_NODELAY option, so I wanted to clarify this with you. I think it was OFED 1.2 based. So is it supported in OFED 1.2? In OFED 1.3?

thanks,
Or.
[ewg] RE: SDP / TCP_NODELAY
Hi,
SDP does not keep packet size statistics, but it could if people want them. Active connection info can be found in /proc/net/sdp, and the counters (and parameters) we do keep are under /sys/module/ib_sdp.

There are a couple of debug options that can be enabled. If you want to see per-frame detail, then you need to build with CONFIG_INFINIBAND_SDP_DEBUG_DATA defined and enable it with /sys/module/ib_sdp/parameters/data_debug_level=1. Then run something small and use dmesg to get the details. You can also enable some debug info on the normal build by setting debug_level=1 in the /sys/module/ib_sdp tree.

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

-----Original Message-----
From: Or Gerlitz [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 21, 2007 7:05 AM
To: Jim Mott
Cc: EWG
Subject: Re: SDP / TCP_NODELAY

Jim Mott wrote:
> SDP includes support for TCP_NODELAY and TCP_CORK. At a high level, these options control buffering, and they seem to be working. If you turn off the Nagle buffering algorithm using these options, then sends go out right away. This is how TCP_RR testing with netperf is run to measure latency. That is my understanding, anyway. If you have some code that does not seem to be running right, send it along and I will take a look.

OK, thanks for the info. I will check to see if they still suspect that TCP_NODELAY does not function as it should and let you know. Is there a way to see packet size statistics for SDP without an IB analyzer? Do you expose some sysfs-based statistics, etc.?

Or.