Re: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Hi, Scott,

I have been running SDP tests across two woodcrest nodes with 4x DDR cards using OFED-1.2.5.4. The card/firmware info is below.

CA 'mthca0'
    CA type: MT25208
    Number of ports: 2
    Firmware version: 5.1.400
    Hardware version: a0
    Node GUID: 0x0002c90200228e0c
    System image GUID: 0x0002c90200228e0f

I could not get a bandwidth more than 5Gbps like you have shown here. Wonder if I need to upgrade to the latest software or firmware? Any suggestions?

Thanks,
--Weikuan

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.225.77 (192.168.225.77) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072 131072    10.00      4918.95   21.29    24.99    1.418   1.665

Scott Weitzenkamp (sweitzen) wrote:
> Jim,
>
> I am trying OFED-1.3-20071231-0600 and RHEL4 x86_64 on a dual CPU (single core each CPU) Xeon system. I do not see any performance improvement (either throughput or CPU utilization) using netperf when I set /sys/module/ib_sdp/sdp_zcopy_thresh to 16384. Can you elaborate on your HCA type, and performance improvement you see?
>
> Here's an example netperf command line when using a Cheetah DDR HCA and 1.2.917 firmware (I have also tried ConnectX and 2.3.000 firmware too):
>
> [EMAIL PROTECTED] ~]$ LD_PRELOAD=libsdp.so netperf241 -v2 -4 -H 192.168.1.201 -l 30 -t TCP_STREAM -c -C -- -m 65536
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.201 (192.168.1.201) port 0 AF_INET : histogram : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384  65536    30.01      7267.70   55.06    61.27    1.241   1.381
>
> Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
> Local  Remote  Local  Remote  Xfered   Per                 Per
> Send   Recv    Send   Recv             Send (avg)          Recv (avg)
>     8       8      0       0 2.726e+10  65536.00    415942   48106.01 566648

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
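For anyone trying to reproduce Scott's run from a script, here is a minimal Python sketch of the same invocation. The flags and the `LD_PRELOAD=libsdp.so` setting are taken from the command line quoted above; the function names (`netperf_command`, `sdp_environment`, `run_over_sdp`) are made up for illustration, and the sketch assumes a `netperf` binary on the PATH and a `netserver` already listening on the target host.

```python
import os
import subprocess


def netperf_command(host, msg_size, duration=30, binary="netperf"):
    """Build the argv for a TCP_STREAM run like the one quoted in the thread.

    -c / -C request local / remote CPU utilization, -l is the test length
    in seconds, and -m (after the "--") is the send message size in bytes.
    """
    return [
        binary, "-v2", "-4",
        "-H", host,
        "-l", str(duration),
        "-t", "TCP_STREAM",
        "-c", "-C",
        "--", "-m", str(msg_size),
    ]


def sdp_environment(base_env=None, preload="libsdp.so"):
    """Return a copy of the environment with LD_PRELOAD pointing at libsdp."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = preload
    return env


def run_over_sdp(host, msg_size, **kwargs):
    """Run netperf with libsdp preloaded and return its stdout as text.

    Hypothetical helper: it needs netperf and libsdp installed locally.
    """
    result = subprocess.run(
        netperf_command(host, msg_size, **kwargs),
        env=sdp_environment(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_over_sdp("192.168.1.201", 65536, binary="netperf241")` would correspond to the command shown in Scott's message; building the argument list separately keeps the command easy to inspect without running anything.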
RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Hi,

64K is borderline for seeing bzcopy effect. Using an AMD 6000+ (3 GHz dual core) in an Asus M2A-VM motherboard with ConnectX running 2.3 firmware and the OFED 1.3-rc3 stack on a 2.6.23.8 kernel.org kernel, I ran the test for 128K (throughput in 10^6 bits/s):

    5546  sdp_zcopy_thresh=0 (off)
    8709  sdp_zcopy_thresh=65536

For these tests, I just have LD_PRELOAD set in my environment.

===

I see that TCP_MAXSEG is not being handled by libsdp and will look into it.

[EMAIL PROTECTED] ~]# modprobe ib_sdp
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384 131072    30.01      5545.69   51.47    14.43    1.521   1.706

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8       8      0       0 2.08e+10  131072.00   158690   33135.60 627718

Maximum
Segment
Size (bytes)
    -1

[EMAIL PROTECTED] ~]# echo 65536 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384 131072    30.01      8708.58   50.63    14.55    0.953   1.095

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8       8      0       0 3.267e+10 131072.00   249228   26348.30 1239807

Maximum
Segment
Size (bytes)
    -1

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481

-----Original Message-----
From: Weikuan Yu [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 24, 2008 9:09 AM
To: Scott Weitzenkamp (sweitzen)
Cc: Jim Mott; ewg@lists.openfabrics.org; [EMAIL PROTECTED]
Subject: Re: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

[snip]
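To put a number on the difference between Jim's two 128K runs, the following Python sketch parses a netperf TCP_STREAM result row and computes the relative change. The two rows are the result lines from Jim's message with the column values separated by whitespace; the `StreamResult` type and the helper names are illustrative assumptions, not part of netperf.

```python
from typing import NamedTuple


class StreamResult(NamedTuple):
    """One TCP_STREAM result row, in netperf's column order with -c -C."""
    recv_socket_bytes: int
    send_socket_bytes: int
    message_bytes: int
    elapsed_secs: float
    throughput_mbps: float          # 10^6 bits/s
    local_cpu_pct: float
    remote_cpu_pct: float
    local_demand_us_per_kb: float
    remote_demand_us_per_kb: float


def parse_result_row(row: str) -> StreamResult:
    """Split a whitespace-separated result row into named fields."""
    fields = row.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {row!r}")
    ints = [int(f) for f in fields[:3]]
    floats = [float(f) for f in fields[3:]]
    return StreamResult(*ints, *floats)


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline."""
    return (candidate - baseline) / baseline


# Result rows from the two runs in Jim's message.
ZCOPY_OFF = parse_result_row(
    "87380 16384 131072 30.01 5545.69 51.47 14.43 1.521 1.706")
ZCOPY_ON = parse_result_row(
    "87380 16384 131072 30.01 8708.58 50.63 14.55 0.953 1.095")

throughput_gain = relative_change(
    ZCOPY_OFF.throughput_mbps, ZCOPY_ON.throughput_mbps)
local_demand_change = relative_change(
    ZCOPY_OFF.local_demand_us_per_kb, ZCOPY_ON.local_demand_us_per_kb)

print(f"throughput change:           {throughput_gain:+.1%}")
print(f"local service demand change: {local_demand_change:+.1%}")
```

On these two rows the throughput rises by roughly 57% while local CPU utilization stays near 51%, which is why the local service demand (us/KB) drops.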
RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
I've tested on RHEL4 and RHEL5, and see no sdp_zcopy_thresh improvement for any message size, as measured with netperf, for any Arbel or ConnectX HCA.

Scott

-----Original Message-----
From: Jim Mott [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 24, 2008 7:57 AM
To: Weikuan Yu; Scott Weitzenkamp (sweitzen)
Cc: ewg@lists.openfabrics.org; [EMAIL PROTECTED]
Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

[snip]
RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Jim, when do you plan to enable bzcopy by default?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Mott
Sent: Friday, November 30, 2007 12:04 PM
To: ewg@lists.openfabrics.org
Cc: [EMAIL PROTECTED]
Subject: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

Hi,

This kernel Oops is new and I will look at it. Dotan and the Mellanox regression tests have been keeping me busy recently. There was a problem like this, but only in multi-threaded apps using a single socket or when doing cleanup after ^C.

I will re-enable default bzcopy behavior once all the important Mellanox regression tests are passing. Until then, setting the sdp_zcopy_thresh variable by hand (8192 and up should give better performance) and running simple tests like netperf should be working fine. You should not be seeing any problem here.

[I have only tested locally with x86_64 rhat4u4, rhat5, 2.6.23.8, and 2.6.24-rc2. Mellanox regression tests everything and they have not submitted this Oops yet.]

I have opened bugs in the openfabrics bugzilla for everything I am currently working on. It is down right now or I would add pointers. Here is my work list; additions or priority changes welcome:

SDP OPEN ISSUES LIST (Priority order)
=====================================
1) DONE: BUG: Unload of mlx4 and ib_sdp fails while SDP active
   11/6 [PATCH 1/1 V2] SDP - Fix reference count bug ...
2) DONE: BUG: Many data corruption failures
   11/11 [PATCH 1/1] SDP - Fix bug where zcopy bcopy returns ...
3) DONE: Bug 793 - kernel BUG at net/core/skbuff.c:95!
   11/26 [PATCH 1/1] SDP - bug793; skbuff changes ...
4) TODO: BUG: kernel oops in SDP regression
   Replicated problem by hitting ^C during a transfer. I have created a patch that fixes the problem, but it needs more work to move into production. There are some side effects I do not yet understand. This is the one I am working on now. I hope to drop it soon. There is a bug open tracking it.
5) TODO: BUG: libsdp returns good RC when it should fail
6) TODO: BUG: aio_test fails in SDP regression
7) TODO: Bug 779 - Lock ordering problem during accept on 1.2.5
   After building a 2.6.23.8 kernel with lock checking enabled, I cannot reproduce this problem. Looks like I'll need more input from the reporter. (Bug updated to say this.) I will continue to code review though.
8) DONE: Bug 294 - connect does not allow AF_INET_SDP [fix in bugzilla dropped]
9) DONE: Backport work needed to support 2.6.24
10) TODO: Package user space libsdp for Redhat
   This is supposed to be easy to do, but it will take me some time to figure out the details.
11) DONE: BUG: Memory leak
   11/20 [PATCH 1/1 v2] SDP - Fix a memory leak in bzcopy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen)
Sent: Friday, November 30, 2007 12:37 PM
To: Jim Mott; Scott Weitzenkamp (sweitzen); ewg@lists.openfabrics.org
Cc: [EMAIL PROTECTED]
Subject: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

Jim,

Using netperf with TCP_STREAM and TCP_RR, I'm not seeing any changes in SDP throughput or CPU utilization comparing OFED 1.3 beta and OFED 1.2.5. Looks like I need to set a non-zero value in /sys/module/ib_sdp/sdp_zcopy_thresh? Do you plan to enable this by default soon?

I tried

    echo 4096 > /sys/module/ib_sdp/sdp_zcopy_thresh

on RHEL4 and then tried netperf, and got an Oops.
Unable to handle kernel NULL pointer dereference at RIP:
Nov/30 10:33 am 80163ff0{put_page+0}
Nov/30 10:33 am PML4 1a3047067 PGD 1a7a6d067 PMD 0
Nov/30 10:33 am Oops: [1] SMP
Nov/30 10:33 am CPU 0
Nov/30 10:33 am Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) rds(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) mlx4_ib(U) mlx4_core(U) ds yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod joydev button battery ac uhci_hcd ehci_hcd shpchp ib_mthca(U) ib_ipoib(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 e1000 floppy ata_piix libata sg ext3 jbd mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Nov/30 10:33 am Pid: 6802, comm: netperf241 Not tainted 2.6.9-55.ELlargesmp
Nov/30 10:33 am RIP: 0010:[80163ff0] 80163ff0{put_page+0}
Nov/30 10:33 am RSP: 0018:0101a7bcbbc0 EFLAGS: 00010203
Nov/30 10:33 am RAX: RBX: 0001 RCX: 02 02
Nov/30 10:33 am RDX: 0101b0b43e80 RSI: 0202 RDI: 00 00
Nov/30 10:33 am RBP: 0101b85761c0 R08: R09
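Two different sysfs locations for the threshold appear in this thread: `/sys/module/ib_sdp/sdp_zcopy_thresh` (Scott, RHEL4) and `/sys/module/ib_sdp/parameters/sdp_zcopy_thresh` (Jim, 2.6.23.8). The Python sketch below is one way a test script might cope with that, assuming the only difference between systems is which of the two paths exists. The helper names are hypothetical, writing the real file needs root and a loaded `ib_sdp` module, and given the Oops reported above it is best tried on a test machine.

```python
import os

# Both locations are quoted in the thread; which one exists is assumed
# to depend on the kernel in use.
CANDIDATE_PATHS = (
    "/sys/module/ib_sdp/parameters/sdp_zcopy_thresh",
    "/sys/module/ib_sdp/sdp_zcopy_thresh",
)


def find_thresh_path(candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


def set_zcopy_thresh(value, candidates=CANDIDATE_PATHS):
    """Write the threshold in bytes (0 turns bzcopy off) and return the path used."""
    if value < 0:
        raise ValueError("threshold must be non-negative")
    path = find_thresh_path(candidates)
    if path is None:
        raise FileNotFoundError("sdp_zcopy_thresh not found; is ib_sdp loaded?")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


def get_zcopy_thresh(candidates=CANDIDATE_PATHS):
    """Read the current threshold back as an integer."""
    path = find_thresh_path(candidates)
    if path is None:
        raise FileNotFoundError("sdp_zcopy_thresh not found; is ib_sdp loaded?")
    with open(path) as f:
        return int(f.read().strip())
```

Passing `candidates` explicitly makes it possible to exercise the helpers against an ordinary temporary file before pointing them at the real sysfs entry.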