Re: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

2008-01-24 Thread Weikuan Yu

Hi, Scott,

I have been running SDP tests across two Woodcrest nodes with 4x DDR 
cards using OFED-1.2.5.4. The card/firmware info is below.


CA 'mthca0'
CA type: MT25208
Number of ports: 2
Firmware version: 5.1.400
Hardware version: a0
Node GUID: 0x0002c90200228e0c
System image GUID: 0x0002c90200228e0f

I could not get more than 5 Gbps of bandwidth, whereas you have shown
higher numbers here. I wonder if I need to upgrade to the latest software
or firmware. Any suggestions?


Thanks,
--Weikuan


TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.225.77 (192.168.225.77) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

131072 131072  131072   10.00    4918.95     21.29    24.99    1.418   1.665



Scott Weitzenkamp (sweitzen) wrote:

Jim,

I am trying OFED-1.3-20071231-0600 and RHEL4 x86_64 on a dual CPU
(single core each CPU) Xeon system.  I do not see any performance
improvement (either throughput or CPU utilization) using netperf when I
set /sys/module/ib_sdp/sdp_zcopy_thresh to 16384.  Can you elaborate on
your HCA type, and performance improvement you see?
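Setting the threshold is just a write of a number to the ib_sdp module parameter, which is what the echo commands in this thread do. A minimal Python sketch follows. The helper name set_zcopy_thresh is hypothetical, writing the real file needs root, and the default path is the /sys/module/ib_sdp/parameters/sdp_zcopy_thresh location from Jim's test (Scott's RHEL4 path omits parameters/, so the location may vary by kernel).

```python
from pathlib import Path

# Path from Jim's test in this thread; Scott's RHEL4 commands use
# /sys/module/ib_sdp/sdp_zcopy_thresh, so adjust for your kernel.
ZCOPY_PARAM = Path("/sys/module/ib_sdp/parameters/sdp_zcopy_thresh")


def set_zcopy_thresh(value: int, param: Path = ZCOPY_PARAM) -> int:
    """Write a bzcopy threshold in bytes (0 = off) and return the value read back.

    Hypothetical helper, roughly equivalent to `echo VALUE > param` run as root.
    """
    if value < 0:
        raise ValueError("threshold must be a non-negative byte count")
    param.write_text(f"{value}\n")
    # Read back so the caller can confirm what the parameter now holds.
    return int(param.read_text().strip())
```

Run as root, set_zcopy_thresh(16384) would mirror Scott's setting, and set_zcopy_thresh(0) turns bzcopy back off.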

Here's an example netperf command line using a Cheetah DDR HCA and
1.2.917 firmware (I have also tried ConnectX with 2.3.000 firmware):

[EMAIL PROTECTED] ~]$ LD_PRELOAD=libsdp.so netperf241 -v2 -4 -H 192.168.1.201 -l 30 -t TCP_STREAM -c -C -- -m 65536
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.201 (192.168.1.201) port 0 AF_INET : histogram : demo

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  16384   65536    30.01    7267.70     55.06    61.27    1.241   1.381


Alignment      Offset         Bytes      Bytes       Sends   Bytes       Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
8      8       0      0       2.726e+10  65536.00    415942  48106.01    566648
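The Service Demand columns can be roughly reproduced from the other numbers. Assuming netperf defines service demand as CPU time per KB moved (utilization times CPU count times elapsed time, divided by KB transferred, with 1 KB = 1024 bytes), and taking the two CPUs Scott describes, the sketch below recovers the local 1.241 us/KB from this run. This is our reading of the output, not netperf's code.

```python
def service_demand_us_per_kb(util_pct: float, ncpus: int,
                             elapsed_s: float, bytes_xfered: float) -> float:
    """Approximate netperf-style service demand in CPU microseconds per KB.

    Assumption: (utilization fraction * CPU count * elapsed microseconds)
    divided by (bytes transferred / 1024).
    """
    cpu_us = (util_pct / 100.0) * ncpus * elapsed_s * 1e6  # total CPU time used
    kbytes = bytes_xfered / 1024.0                         # data moved, in KB
    return cpu_us / kbytes


# Scott's run: 55.06% local utilization, 2 CPUs, 30.01 s, 2.726e+10 bytes.
local_sd = service_demand_us_per_kb(55.06, 2, 30.01, 2.726e10)
print(f"local service demand ~ {local_sd:.3f} us/KB")  # netperf printed 1.241
```

The same formula with Jim's dual-core box (51.47%, 30.01 s, 2.08e+10 bytes) gives his local 1.521 us/KB, which supports the assumed definition.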


___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

2008-01-24 Thread Jim Mott
Hi,
  64K is borderline for seeing the bzcopy effect.  Using an AMD 6000+ (3 GHz
dual core) in an Asus M2A-VM motherboard with ConnectX running 2.3 firmware
and the OFED 1.3-rc3 stack on a 2.6.23.8 kernel.org kernel, I ran the
test with 128K messages (throughput in 10^6 bits/s):
  5546  sdp_zcopy_thresh=0 (off)
  8709  sdp_zcopy_thresh=65536

For these tests, I just have LD_PRELOAD set in my environment.
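As we understand the parameter from this thread, 0 turns bzcopy off and a non-zero value sets the send size at which the zero-copy path can be used. The predicate below is a hypothetical model of that rule, not the ib_sdp source; in particular, treating a send exactly equal to the threshold as qualifying is an assumption.

```python
def uses_bzcopy(send_len: int, zcopy_thresh: int) -> bool:
    """Hypothetical model of whether a send of send_len bytes uses bzcopy.

    Assumptions (not taken from the ib_sdp source):
      * zcopy_thresh == 0 means bzcopy is off, as in Jim's "0 (off)" run;
      * otherwise a send qualifies once it is at least zcopy_thresh bytes.
    """
    if zcopy_thresh <= 0:
        return False
    return send_len >= zcopy_thresh


# Jim's 128K sends, with the threshold off and then at 65536.
for thresh in (0, 65536):
    print(f"thresh={thresh}: bzcopy={uses_bzcopy(128 * 1024, thresh)}")
```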

===

I see that TCP_MAXSEG is not being handled by libsdp and will look into
it.
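For reference, errno 92 on Linux is ENOPROTOOPT ("Protocol not available"), the error getsockopt() gives for an option the socket does not support. In the runs below, netperf appears to fall back to printing -1 for the Maximum Segment Size when that query fails. A minimal sketch of that probe-and-fall-back pattern; query_mss is a hypothetical helper, not netperf's code.

```python
import errno
import socket


def query_mss(sock) -> int:
    """Return TCP_MAXSEG for sock, or -1 if the option is not supported.

    Hypothetical helper: it mimics the -1 netperf prints when
    getsockopt(TCP_MAXSEG) fails, but it is not netperf's implementation.
    """
    try:
        return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    except OSError as exc:
        if exc.errno == errno.ENOPROTOOPT:  # errno 92 on Linux
            return -1
        raise  # any other failure is a real error
```

On an ordinary Linux TCP socket this returns the kernel's MSS value; under a preload library that rejects TCP_MAXSEG it would return -1.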


[EMAIL PROTECTED] ~]# modprobe ib_sdp
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  16384   131072   30.01    5545.69     51.47    14.43    1.521   1.706

Alignment      Offset         Bytes      Bytes       Sends   Bytes       Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
8      8       0      0       2.08e+10   131072.00   158690  33135.60    627718

Maximum
Segment
Size (bytes)
-1
[EMAIL PROTECTED] ~]# echo 65536 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
[EMAIL PROTECTED] ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

87380  16384   131072   30.01    8708.58     50.63    14.55    0.953   1.095

Alignment      Offset         Bytes      Bytes       Sends   Bytes       Recvs
Local  Remote  Local  Remote  Xfered     Per                 Per
Send   Recv    Send   Recv               Send (avg)          Recv (avg)
8      8       0      0       3.267e+10  131072.00   249228  26348.30    1239807

Maximum
Segment
Size (bytes)
-1
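The throughput column follows from the Bytes Xfered totals: bytes times 8, divided by elapsed seconds, in netperf's 10^6 bits/s. The sketch below cross-checks both 30.01 s runs and computes the speedup from turning bzcopy on. The byte totals are printed to only three or four significant figures, so expect agreement within a fraction of a percent rather than an exact match.

```python
def throughput_mbps(bytes_xfered: float, elapsed_s: float) -> float:
    """Throughput in 10^6 bits/s, the unit netperf prints."""
    return bytes_xfered * 8 / elapsed_s / 1e6


# (reported throughput, bytes transferred) for the two 30.01 s runs above.
runs = {
    "sdp_zcopy_thresh=0": (5545.69, 2.08e10),
    "sdp_zcopy_thresh=65536": (8708.58, 3.267e10),
}
for name, (reported, nbytes) in runs.items():
    derived = throughput_mbps(nbytes, 30.01)
    print(f"{name}: derived {derived:.0f}, reported {reported}")

# Ratio of reported throughput with bzcopy on versus off.
speedup = runs["sdp_zcopy_thresh=65536"][0] / runs["sdp_zcopy_thresh=0"][0]
print(f"speedup ~ {speedup:.2f}x")
```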

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: [EMAIL PROTECTED]
Phone: 512-294-5481


-Original Message-
From: Weikuan Yu [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 24, 2008 9:09 AM
To: Scott Weitzenkamp (sweitzen)
Cc: Jim Mott; ewg@lists.openfabrics.org; [EMAIL PROTECTED]
Subject: Re: [ofa-general] RE: [ewg] Not seeing any SDP performance
changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh


RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

2008-01-24 Thread Scott Weitzenkamp (sweitzen)
I've tested on RHEL4 and RHEL5, and see no sdp_zcopy_thresh improvement
for any message size, as measured with netperf, for any Arbel or
ConnectX HCA.

Scott

 
 -Original Message-
 From: Jim Mott [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 24, 2008 7:57 AM
 To: Weikuan Yu; Scott Weitzenkamp (sweitzen)
 Cc: ewg@lists.openfabrics.org; [EMAIL PROTECTED]
 Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP 
 performance changes in OFED 1.3 beta, and I get Oops when 
 enabling sdp_zcopy_thresh
 

RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

2007-12-12 Thread Scott Weitzenkamp (sweitzen)
Jim, when do you plan to enable bzcopy by default?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems


 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Jim Mott
 Sent: Friday, November 30, 2007 12:04 PM
 To: ewg@lists.openfabrics.org
 Cc: [EMAIL PROTECTED]
 Subject: [ofa-general] RE: [ewg] Not seeing any SDP 
 performance changes in OFED 1.3 beta, and I get Oops when 
 enabling sdp_zcopy_thresh
 
 Hi,
   This kernel Oops is new and I will look at it.  Dotan and the
 Mellanox regression tests have been keeping me busy recently.  There
 was a problem like this, but only in multi-threaded apps using a
 single socket or when doing cleanup after ^C.
 
   I will re-enable the default bzcopy behavior once all the important
 Mellanox regression tests are passing.  Until then, setting the
 sdp_zcopy_thresh variable by hand (8192 and up should give better
 performance) and running simple tests like netperf should work fine.
 You should not be seeing any problem here.  [I have only tested
 locally with x86_64 rhat4u4, rhat5, 2.6.23.8, and 2.6.24-rc2.
 Mellanox runs regression tests on everything, and they have not
 reported this Oops yet.]
 
   I have opened bugs in the OpenFabrics Bugzilla for everything I am
 currently working on.  It is down right now, or I would add pointers.
 
 
 Here is my work list; additions or priority changes welcome:
 
 SDP OPEN ISSUES LIST (Priority order)
 =
 1) DONE: BUG: Unload of mlx4 and ib_sdp fails while SDP active
   11/6 [PATCH 1/1 V2] SDP - Fix reference count bug ...
 
 2) DONE: BUG: Many data corruption failures
   11/11 [PATCH 1/1] SDP - Fix bug where zcopy bcopy returns ...
 
 3) DONE: Bug 793 - kernel BUG at net/core/skbuff.c:95!
   11/26 [PATCH 1/1] SDP - bug793; skbuff changes ...
 
 4) TODO: BUG: kernel oops in SDP regression 
   I replicated the problem by hitting ^C during a transfer.  I have
 created a patch that fixes the problem, but it needs more work to move
 into production.  There are some side effects I do not yet understand.
   This is the one I am working on now.  I hope to drop it soon.
 There is a bug open tracking it.
 
 5) TODO: BUG: libsdp returns good RC when it should fail
 
 6) TODO: BUG: aio_test fails in SDP regression
 
 7) TODO: Bug 779 - Lock ordering problem during accept on 1.2.5
   After building a 2.6.23.8 kernel with lock checking enabled, I
 cannot reproduce this problem.  Looks like I'll need more input
 from the reporter.  (Bug updated to say this.)  I will continue to
 review the code, though.
 
 8) DONE: Bug 294 - connect does not allow AF_INET_SDP
   [fix in bugzilla dropped] 
 
 9) DONE: Backport work needed to support 2.6.24
 
 10) TODO: Package user space libsdp for Red Hat
   This is supposed to be easy to do, but it will take me some time
 to figure out the details.
 
 11) DONE: BUG: Memory leak
   11/20 [PATCH 1/1 v2] SDP - Fix a memory leak in bzcopy
 
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Scott 
 Weitzenkamp (sweitzen)
 Sent: Friday, November 30, 2007 12:37 PM
 To: Jim Mott; Scott Weitzenkamp (sweitzen); ewg@lists.openfabrics.org
 Cc: [EMAIL PROTECTED]
 Subject: [ewg] Not seeing any SDP performance changes in OFED 
 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
 
 Jim,
 
 Using netperf with TCP_STREAM and TCP_RR, I'm not seeing any changes in
 SDP throughput or CPU utilization when comparing OFED 1.3 beta and OFED
 1.2.5.  Looks like I need to set a non-zero value in
 /sys/module/ib_sdp/sdp_zcopy_thresh?  Do you plan to enable this by
 default soon?
 
 I tried "echo 4096 > /sys/module/ib_sdp/sdp_zcopy_thresh" on RHEL4, then
 ran netperf, and got an Oops.
 
 Unable to handle kernel NULL pointer dereference at  RIP:
 Nov/30 10:33 am 80163ff0{put_page+0}
 Nov/30 10:33 am PML4 1a3047067 PGD 1a7a6d067 PMD 0
 Nov/30 10:33 am Oops:  [1] SMP
 Nov/30 10:33 am CPU 0
 Nov/30 10:33 am Modules linked in: parport_pc lp parport autofs4 i2c_dev
 i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) rds(U) ib_sdp(U) rdma_cm(U)
 iw_cm(U) ib_addr(U) mlx4_ib(U) mlx4_core(U) ds yenta_socket pcmcia_core
 dm_mirror dm_multipath dm_mod joydev button battery ac uhci_hcd ehci_hcd
 shpchp ib_mthca(U) ib_ipoib(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U)
 ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 e1000 floppy ata_piix libata sg
 ext3 jbd mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
 Nov/30 10:33 am Pid: 6802, comm: netperf241 Not tainted 2.6.9-55.ELlargesmp
 Nov/30 10:33 am RIP: 0010:[80163ff0] 80163ff0{put_page+0}
 Nov/30 10:33 amRSP: 0018:0101a7bcbbc0  EFLAGS: 00010203
 Nov/30 10:33 amRAX:  RBX: 0001 RCX:
 02
 02
 Nov/30 10:33 amRDX: 0101b0b43e80 RSI: 0202 RDI:
 00
 00
 Nov/30 10:33 amRBP: 0101b85761c0 R08:  R09