Jim, when do you plan to enable bzcopy by default?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Jim Mott
> Sent: Friday, November 30, 2007 12:04 PM
> To: ewg@lists.openfabrics.org
> Cc: [EMAIL PROTECTED]
> Subject: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
>
> Hi,
> This kernel Oops is new and I will look at it. Dotan and the Mellanox
> regression tests have been keeping me busy recently. There was a problem
> like this, but only in multi-threaded apps using a single socket or when
> doing cleanup after ^C.
>
> I will re-enable default bzcopy behavior once all the important Mellanox
> regression tests are passing. Until then, setting the sdp_zcopy_thresh
> variable by hand (8192 and up should give better performance) and running
> simple tests like netperf should be working fine. You should not be seeing
> any problem here. [I have only tested locally with x86_64 rhat4u4, rhat5,
> 2.6.23.8, and 2.6.24-rc2. Mellanox regression tests everything and they
> have not submitted this Oops yet.]
>
> I have opened bugs in the openfabrics bugzilla for everything I am
> currently working on. It is down right now or I would add pointers.
>
> Here is my work list; additions or priority changes welcome:
>
> SDP OPEN ISSUES LIST (Priority order)
> =====================================
> 1) DONE: BUG: Unload of mlx4 and ib_sdp fails while SDP active
>    11/6 [PATCH 1/1 V2] SDP - Fix reference count bug ...
>
> 2) DONE: BUG: Many data corruption failures
>    11/11 [PATCH 1/1] SDP - Fix bug where zcopy bcopy returns ...
>
> 3) DONE: Bug 793 - kernel BUG at net/core/skbuff.c:95!
>    11/26 [PATCH 1/1] SDP - bug793; skbuff changes ...
>
> 4) TODO: BUG: kernel oops in SDP regression
>    Replicated problem by hitting ^C during a transfer. I have
>    created a patch that fixes the problem, but it needs more work
>    to move into production. There are some side effects I do not
>    yet understand.
>    This is the one I am working on now. I hope to drop it soon.
>    There is a bug open tracking it.
>
> 5) TODO: BUG: libsdp returns good RC when it should fail
>
> 6) TODO: BUG: aio_test fails in SDP regression
>
> 7) TODO: Bug 779 - Lock ordering problem during accept on 1.2.5
>    After building a 2.6.23.8 kernel with lock checking enabled, I
>    can not reproduce this problem. Looks like I'll need more input
>    from the reporter. (Bug updated to say this). I will continue to
>    code review though.
>
> 8) DONE: Bug 294 - connect does not allow AF_INET_SDP
>    [fix in bugzilla dropped]
>
> 9) DONE: Backport work needed to support 2.6.24
>
> 10) TODO: Package user space libsdp for Redhat
>    This is supposed to be easy to do, but it will take me some time
>    to figure out the detail.
>
> 11) DONE: BUG: Memory leak
>    11/20 [PATCH 1/1 v2] SDP - Fix a memory leak in bzcopy
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen)
> Sent: Friday, November 30, 2007 12:37 PM
> To: Jim Mott; Scott Weitzenkamp (sweitzen); ewg@lists.openfabrics.org
> Cc: [EMAIL PROTECTED]
> Subject: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
>
> Jim,
>
> Using netperf with TCP_STREAM and TCP_RR, I'm not seeing any changes in
> SDP throughput or CPU utilization comparing OFED 1.3 beta and OFED
> 1.2.5. Looks like I need to set a non-zero value in
> /sys/module/ib_sdp/sdp_zcopy_thresh? Do you plan to enable this by
> default soon?
>
> I tried "echo 4096 > /sys/module/ib_sdp/sdp_zcopy_thresh" on RHEL4 and
> then tried netperf, and got an Oops.
>
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> <Nov/30 10:33 am><ffffffff80163ff0>{put_page+0}
> <Nov/30 10:33 am>PML4 1a3047067 PGD 1a7a6d067 PMD 0
> <Nov/30 10:33 am>Oops: 0000 [1] SMP
> <Nov/30 10:33 am>CPU 0
> <Nov/30 10:33 am>Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc rdma_ucm(U) rds(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) mlx4_ib(U) mlx4_core(U) ds yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod joydev button battery ac uhci_hcd ehci_hcd shpchp ib_mthca(U) ib_ipoib(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5 ipv6 e1000 floppy ata_piix libata sg ext3 jbd mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
> <Nov/30 10:33 am>Pid: 6802, comm: netperf241 Not tainted 2.6.9-55.ELlargesmp
> <Nov/30 10:33 am>RIP: 0010:[<ffffffff80163ff0>] <ffffffff80163ff0>{put_page+0}
> <Nov/30 10:33 am>RSP: 0018:00000101a7bcbbc0 EFLAGS: 00010203
> <Nov/30 10:33 am>RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000202
> <Nov/30 10:33 am>RDX: 00000101b0b43e80 RSI: 0000000000000202 RDI: 0000000000000000
> <Nov/30 10:33 am>RBP: 00000101b85761c0 R08: 0000000000000000 R09: 0000000000000000
> <Nov/30 10:33 am>R10: 0000000000000246 R11: ffffffffa02e0e36 R12: 00000101a4b33080
> <Nov/30 10:33 am>R13: 00000101a7bcbd58 R14: 0000000000000000 R15: 0000000000010000
> <Nov/30 10:33 am>FS: 0000002a95696940(0000) GS:ffffffff80500380(0000) knlGS:0000000000000000
> <Nov/30 10:33 am>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> <Nov/30 10:33 am>CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
> <Nov/30 10:33 am>Process netperf241 (pid: 6802, threadinfo 00000101a7bca000, task 00000101a70df030)
> <Nov/30 10:33 am>Stack: ffffffffa02e110a 0000000000000100 0000000000000000 0000000000529780
> <Nov/30 10:33 am>       0001000000000246 0000000000000246 000000008013feac 00000800ffffffe0
> <Nov/30 10:33 am>       0000000000000000 00000101a7bcbe88
> <Nov/30 10:33 am>Call Trace:<ffffffffa02e110a>{:ib_sdp:sdp_sendmsg+724} <ffffffff801478b2>{queue_delayed_work+101}
> <Nov/30 10:33 am>       <ffffffffa02c6200>{:ib_addr:queue_req+122} <ffffffff802a7ecb>{sock_sendmsg+271}
> <Nov/30 10:33 am>       <ffffffff80169a61>{do_no_page+916} <ffffffff801359a8>{autoremove_wake_function+0}
> <Nov/30 10:33 am>       <ffffffff802a7c53>{sockfd_lookup+16} <ffffffff802a939a>{sys_sendto+195}
> <Nov/30 10:33 am>       <ffffffff801242b9>{do_page_fault+577} <ffffffff801934c8>{dnotify_parent+34}
> <Nov/30 10:33 am>       <ffffffff80179335>{vfs_read+248} <ffffffff8011026a>{system_call+126}
> <Nov/30 10:33 am>
>
> <Nov/30 10:33 am>Code: 8b 07 48 89 fa f6 c4 80 74 3b 48 8b 57 10 8b 02 48 89 d1 f6
> <Nov/30 10:33 am>RIP <ffffffff80163ff0>{put_page+0} RSP <00000101a7bcbbc0>
> <Nov/30 10:33 am>CR2: 0000000000000000
> <Nov/30 10:33 am> <0>Kernel panic - not syncing: Oops
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
> _______________________________________________
> ewg mailing list
> ewg@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>
> _______________________________________________
> general mailing list
> [EMAIL PROTECTED]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
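
The thread turns on writing a threshold value to /sys/module/ib_sdp/sdp_zcopy_thresh by hand. As a minimal sketch of that step (not something posted by either author), the Python helper below writes a value to the file and returns the previous setting so it can be restored. The function name set_zcopy_thresh, the assumption that the file reads back as a plain integer, and the use of the path from the message above as a default are all illustrative. On a real system this needs root, and per the report above a non-zero value may trigger the Oops on affected builds.

```python
from pathlib import Path

# Path as quoted in the thread; it may differ on other kernels or OFED builds.
SDP_ZCOPY_THRESH = Path("/sys/module/ib_sdp/sdp_zcopy_thresh")


def set_zcopy_thresh(value: int, path: Path = SDP_ZCOPY_THRESH) -> int:
    """Write `value` to the threshold file and return the previous setting.

    Hypothetical helper: assumes the file holds a single decimal integer
    (an empty file is treated as 0).
    """
    if value < 0:
        raise ValueError("threshold must be non-negative")
    previous = int(path.read_text().strip() or 0)
    path.write_text(f"{value}\n")
    return previous
```

A cautious test run might call set_zcopy_thresh(8192), run netperf, and then pass the returned value back in to restore the earlier setting.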