[openib-general] Re: ib_mthca fails to load with old firmware
Quoting r. Ken L Johnson <[EMAIL PROTECTED]>:
> Subject: Re: ib_mthca fails to load with old firmware
>
> On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote:
> > You could try passing the module option "fw_cmd_doorbell=0" to
> > ib_mthca. That may work around things.
>
> Thanks Roland and Michael, that did it. Just added the following to
> the /etc/modprobe.conf.local:
>
> options ib_mthca fw_cmd_doorbell=0

Hmm. There have been recent reports of configurations that have trouble working with fw_cmd_doorbell=1, and not all of them involve old FW. I never saw this in the lab. Roland, should we change fw_cmd_doorbell to 0 by default until we figure out what is going on?

--
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] opensm segfault?
On Wed, May 17, 2006 at 09:10:11AM +0300, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy ...
> cl memory management provide means to track all memory allocations, etc.

There are a huge number of canned solutions that provide a way to debug memory problems without polluting the code with wrapper functions... You can even fairly easily take your particular tracking functions and build them into a canned linkable solution.

Wrapping ISO C (and IMHO, SUSv3) functions is almost always a bad idea. It creates a maintenance pain because people will inevitably add new code that doesn't use the wrappers. Debugging hooks can always be integrated with linker tricks, and portability is _always_ better served by just providing missing ISO and SUSv3 functions on deficient platforms (using autoconf, libraries and #include_next this can be made totally seamless).

Jason
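A minimal sketch of the "provide the missing function" approach, assuming an autoconf build in which AC_CHECK_FUNCS([strndup]) defines HAVE_STRNDUP in config.h whenever the C library already has strndup(). The fallback name compat_strndup is hypothetical. It is compiled only where the function is missing, and callers keep writing plain strndup() with no project-specific wrapper.

```c
/* In a real autoconf project, config.h would be included first. */
#include <stdlib.h>
#include <string.h>

#ifndef HAVE_STRNDUP
/* Fallback used only on platforms whose libc lacks strndup().
 * Copies at most n bytes of s and always NUL-terminates the result. */
static char *compat_strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    while (len < n && s[len] != '\0')
        len++;
    copy = malloc(len + 1);
    if (!copy)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
/* Route ordinary strndup() calls to the fallback. */
#undef strndup
#define strndup compat_strndup
#endif
```

On a platform whose libc provides strndup(), HAVE_STRNDUP is defined and the whole block compiles away, so nothing in the calling code changes.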
RE: [openib-general] opensm segfault?
cl_memcpy should have some debug capabilities on top of memcpy ... cl memory management provide means to track all memory allocations, etc. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky > Sent: Wednesday, May 17, 2006 2:11 AM > To: Troy Benjegerdes > Cc: openib-general@openib.org > Subject: Re: [openib-general] opensm segfault? > > Hi Troy, > > On 14:41 Tue 16 May , Troy Benjegerdes wrote: > > I got this after an indeterminate amount of time running opensm.. > > May this be reproducible? Or it is completely random failure? > > > (gdb) bt > > #0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0, > > count=64) at cl_memory_osd.c:87 > > #1 0x00415053 in osm_pkey_tbl_sync_new_blocks ( > > p_pkey_tbl=0x2ad99228) at osm_pkey.c:127 > > #2 0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40) > > at osm_pkey_mgr.c:407 > > #3 0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, > > signal=3) > > at osm_state_mgr.c:2243 > > #4 0x0043c88f in __osm_state_mgr_ctrl_disp_callback ( > > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 > > #5 0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0) > > at cl_dispatcher.c:108 > > #6 0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) > > at cl_threadpool.c:78 > > #7 0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at > > cl_thread.c:61 > > #8 0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 > > #9 0x2b90b12c8273 in clone () from /lib/libc.so.6 > > > > > > > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This > > just seems like excessive uneeded abstraction. > > Absolutely agree with you. > > Sasha. > > > I'm running opensm from subversion rev 7091.. 
> > > > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn > > 6251:7091M > > > > the only local changes are as follows: > > > > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff > > Index: osm/opensm/osm_port_info_rcv.c > > === > > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > > +++ osm/opensm/osm_port_info_rcv.c (working copy) > > @@ -469,9 +469,14 @@ > >goto Exit; > > } > > > > +#if 0 > > /* Check for IBM eHCA firmware defect in reporting partition > > * enforcement cap */ > > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > > IBM_VENDOR_ID) > >p_switch->switch_info.enforce_cap = 0; > > +#endif > > +/* Check for busted divergenet switch on ameslab network */ > > +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152) > > + p_switch->switch_info.enforce_cap = 0; > > > > /* Bail out if this is a switch with no partition enforcement > > * capability */ > > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
Re: [openib-general] ib_mthca fails to load with old firmware
On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote:
> You could try passing the module option "fw_cmd_doorbell=0" to
> ib_mthca. That may work around things.

Thanks Roland and Michael, that did it. I just added the following to /etc/modprobe.conf.local:

options ib_mthca fw_cmd_doorbell=0

Regards,
--
Ken L Johnson <[EMAIL PROTECTED]>
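One way to confirm that an option from /etc/modprobe.conf.local actually took effect is to read the parameter back from sysfs after the module loads. The sketch below assumes that ib_mthca exposes the parameter as a readable file at /sys/module/ib_mthca/parameters/fw_cmd_doorbell. Whether it does depends on the permissions the driver assigns to the parameter, so treat that path as an assumption. The helper name read_int_param is hypothetical.

```c
#include <stdio.h>

/* Read a single integer from a text file such as a sysfs module
 * parameter (e.g. the assumed path
 * /sys/module/ib_mthca/parameters/fw_cmd_doorbell).
 * Returns 0 on success, or -1 if the file cannot be opened or does
 * not start with an integer. */
static int read_int_param(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass the sysfs path and print the value. A result of 0 means the doorbell workaround is in effect. A return of -1 means the module is not loaded or does not export the parameter.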
[openib-general] Re: ib_mthca fails to load with old firmware
Quoting r. Ken L Johnson <[EMAIL PROTECTED]>: > Subject: ib_mthca fails to load with old firmware > > I'm running into a problem when I try to use the OFED RC4 release on some > blade systems that have TopSpin HCA daughter cards installed (actually > Mellanox). I'm trying to figure out how to update the firmware to the latest > [ http://mellanox.com/support/firmware_table.php ] but it seems I must know > the PSID so I can grab the right firmware image. Can anyone point me in the > right direction here? > > ---8<--- [query device using flint] > > blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q > Image type: Failsafe > I.S. Version:1 > Chip Revision: A0 > GUID Des:Node Port1Port2Sys image > GUIDs: 0005ad02ad1d 0005ad02ad1e 0005ad02ad1f > 0005ad000100d050 > Board ID:1 > VSD: 1 > PSID: > > --->8--- > > ---8<--- [dmesg output showing ib_mthca load failure] > > <6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) > <6>ib_mthca: Initializing :02:00.0 > <6>ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 169 > <7>PCI: Setting latency timer of device :02:00.0 to 64 > <6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex > <4>ib_mthca :02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current). > <4>ib_mthca :02:00.0: If you have problems, try updating your HCA FW. > <3>ib_mthca :02:00.0: NOP command failed to generate interrupt (IRQ > 169), aborting. > <3>ib_mthca :02:00.0: BIOS or ACPI interrupt routing problem? > <6>ACPI: PCI interrupt for device :02:00.0 disabled > <4>ib_mthca: probe of :02:00.0 failed with error -16 > > --->8--- > > > ---8<--- [hwinfo & lspci output for HCA] > > blade9:~ # hwinfo > [...] 
> 24: PCI 200.0: 0c06 InfiniBand > [Created at pci.277] > Unique ID: B35A.guWNc33i6_3 > Parent ID: 8otl.l6V0RupyGX6 > SysFS ID: /devices/pci:00/:00:04.0/:02:00.0 > SysFS BusID: :02:00.0 > Hardware Class: unknown > Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)" > Vendor: pci 0x15b3 "Mellanox Technologies" > Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility > mode)" > SubVendor: pci 0x15b3 "Mellanox Technologies" > SubDevice: pci 0x6278 > Revision: 0xa0 > Memory Range: 0xfe90-0xfe9f (rw,non-prefetchable) > Memory Range: 0xdf80-0xdfff (rw,prefetchable) > Memory Range: 0xd000-0xd7ff (rw,prefetchable) > IRQ: 169 (no events) > Module Alias: "pci:v15B3d6278sv15B3sd6278bc0Csc06i00" > Driver Info #0: > Driver Status: ib_mthca is active > Driver Activation Cmd: "modprobe ib_mthca" > Config Status: cfg=new, avail=yes, need=no, active=unknown > Attached to: #17 (PCI bridge) > > blade9:~ # lspci -vv > [...] > 02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA > (Tavor > compatibility mode) (rev a0) > Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor > compatibility mode) > Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Interrupt: pin A routed to IRQ 169 > Region 0: Memory at fe90 (64-bit, non-prefetchable) > [size=1M] > Region 2: Memory at df80 (64-bit, prefetchable) [size=8M] > Region 4: Memory at d000 (64-bit, prefetchable) > [size=128M] > Capabilities: [40] Power Management version 2 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [48] Vital Product Data > Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 > Enable- > Address: Data: > Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 > Vector table: BAR=0 offset=00082000 > PBA: BAR=0 offset=00082200 > 
Capabilities: [60] Express Endpoint IRQ 0 > Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag- > Device: Latency L0s <64ns, L1 unlimited > Device: AtnBtn- AtnInd- PwrInd- > Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported- > Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > Device: MaxPayload 128 bytes, MaxReadReq 512 bytes > Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8 > Link: Latency L0s unlimited, L1 unlimited > Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- > Link: Speed 2.5Gb/s, Width x8 > --->8---

Can you try with fw_cmd_doorbell?

--
MST
Re: [openib-general] ib_mthca fails to load with old firmware
Ken> I'm running into a problem when I try to use the OFED RC4
Ken> release on some blade systems that have TopSpin HCA daughter
Ken> cards installed (actually Mellanox). I'm trying to figure out
Ken> how to update the firmware to the latest [
Ken> http://mellanox.com/support/firmware_table.php ] but it seems
Ken> I must know the PSID so I can grab the right firmware
Ken> image. Can anyone point me in the right direction here?

For blade HCAs you should contact the HCA vendor for firmware updates.

You could try passing the module option "fw_cmd_doorbell=0" to ib_mthca. That may work around things.

 - R.
[openib-general] mpirun_mpd crashing
Hi there,

Not sure whether this is the proper place to post, but we have encountered mpirun_mpd crashes while testing Voltaire MPI (based on MVAPICH) built with the Sun Studio 11 compilers on SuSE Linux 9 SP3 (Opteron). I hope someone can provide some hints:

MVAPICH version: 0.9.4 with Voltaire's modifications
Compiler used: Sun Studio 11
Problem: When using the mpd version of MVAPICH, mpirun crashes with the following:

> mpirun_mpd -np 2 /usr/voltaire/mpi.cc.mpd/bin/cpi
[man_0]: [cli_0]: client_bnr_get failed [cli_1]: MPD_Man_msg_handler received unexpected msg :cmd=client_bnr_get_output val=apstc-g4:00024400: : handle_lhs_msgs_input: failed for bnr_get: buf=:cmd=bnr_get src=man_0 dest=man_0 bcast=true attr=MVAPICH_0001\^ gid=0 :
[man_0]: application program exited abnormally with status 0
[man_0]: application program signaled with signal 11 (: Segmentation fault)

The rsh version works properly, and a gcc-compiled version of mpd works on the same machine.

Thanks!

Regards,
Liang Peng
--
Research Scientist
Large Scale Computing
Asia Pacific Science & Technology Center
Sun Microsystems, Inc. and Nanyang Technological University, Singapore
[openib-general] ib_mthca fails to load with old firmware
I'm running into a problem when I try to use the OFED RC4 release on some blade systems that have TopSpin HCA daughter cards installed (actually Mellanox). I'm trying to figure out how to update the firmware to the latest [ http://mellanox.com/support/firmware_table.php ] but it seems I must know the PSID so I can grab the right firmware image. Can anyone point me in the right direction here?

---8<--- [query device using flint]

blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
GUID Des: Node Port1 Port2 Sys image
GUIDs: 0005ad02ad1d 0005ad02ad1e 0005ad02ad1f 0005ad000100d050
Board ID: 1
VSD: 1
PSID:

--->8---

---8<--- [dmesg output showing ib_mthca load failure]

<6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
<6>ib_mthca: Initializing :02:00.0
<6>ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
<7>PCI: Setting latency timer of device :02:00.0 to 64
<6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
<4>ib_mthca :02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current).
<4>ib_mthca :02:00.0: If you have problems, try updating your HCA FW.
<3>ib_mthca :02:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
<3>ib_mthca :02:00.0: BIOS or ACPI interrupt routing problem?
<6>ACPI: PCI interrupt for device :02:00.0 disabled
<4>ib_mthca: probe of :02:00.0 failed with error -16

--->8---

---8<--- [hwinfo & lspci output for HCA]

blade9:~ # hwinfo
[...]
24: PCI 200.0: 0c06 InfiniBand
  [Created at pci.277]
  Unique ID: B35A.guWNc33i6_3
  Parent ID: 8otl.l6V0RupyGX6
  SysFS ID: /devices/pci:00/:00:04.0/:02:00.0
  SysFS BusID: :02:00.0
  Hardware Class: unknown
  Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
  Vendor: pci 0x15b3 "Mellanox Technologies"
  Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
  SubVendor: pci 0x15b3 "Mellanox Technologies"
  SubDevice: pci 0x6278
  Revision: 0xa0
  Memory Range: 0xfe90-0xfe9f (rw,non-prefetchable)
  Memory Range: 0xdf80-0xdfff (rw,prefetchable)
  Memory Range: 0xd000-0xd7ff (rw,prefetchable)
  IRQ: 169 (no events)
  Module Alias: "pci:v15B3d6278sv15B3sd6278bc0Csc06i00"
  Driver Info #0:
    Driver Status: ib_mthca is active
    Driver Activation Cmd: "modprobe ib_mthca"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #17 (PCI bridge)

blade9:~ # lspci -vv
[...]
02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
        Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 8---

Regards,
--
Ken L Johnson <[EMAIL PROTECTED]>
[openib-general] AMSO1100 + uverbs: ping and opensm errors after installation
Hello,

I am having trouble getting the AMSO1100 Ethernet card to work with uverbs. I installed uverbs following the Installation Cheat Sheet (https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet), substituting amso where mthca is listed (for the most part). I have also updated the AMMASSO firmware using the file from http://www.opengridcomputing.com/downloads/ogc_amso_kit_20060308.tgz. My Linux kernel is 2.6.16.15, and the ib… & iw_c2 modules load successfully at boot.

When I try to ping another AMMASSO machine I get the following output:

ping: sndmsg: Network is down

accompanied by this dmesg report:

Virtual device iw1 asks to queue packet!

The following is printed when running ibv_devinfo:

hca_id: amso0
    fw_ver: 1.1.1
    node_guid: 000d:b200:0845:
    sys_image_guid: 000d:b200:0844:
    vendor_id: 0x
    vendor_part_id: 0
    hw_ver: 0x0
    board_id: AMSO1100 Board ID
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 512 (2)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00

It looks like some values are not being initialized.

Lastly, running opensm prints:

-
OpenSM Rev:openib-1.2.0
Command Line Arguments:
 Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.2.0
Using default guid 0x0
Error: Could not get port guid
Exiting SM

Does anyone know which step(s) I’ve missed in setting up my network correctly?

Thanks for the help,
Chris
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
BTW, I think the patch below is correct as well. It avoids problems where the SRP driver waits forever for a completion, for example if sending the DREQ fails because the connection has already been disconnected by the target. Does this scenario seem like the deadlock you thought you saw?

--- linux-kernel/infiniband/ulp/srp/ib_srp.c	(revision 7245)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -342,7 +342,10 @@ static void srp_disconnect_target(struct
 	/* XXX should send SRP_I_LOGOUT request */
 	init_completion(&target->done);
-	ib_send_cm_dreq(target->cm_id, NULL, 0);
+	if (ib_send_cm_dreq(target->cm_id, NULL, 0)) {
+		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
+		return;
+	}
 	wait_for_completion(&target->done);
 }
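The hazard this patch removes can be modelled in userspace. The sketch below is a hypothetical analogue, not kernel code: a pthread-based struct completion stands in for the kernel's completion, and the send_dreq function pointer stands in for ib_send_cm_dreq(). The rule is the same as in the patch. Only wait for the completion if the request that is supposed to trigger it was actually sent.

```c
#include <pthread.h>
#include <stdio.h>

/* Minimal userspace stand-in for a kernel completion (hypothetical). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* send_dreq stands in for ib_send_cm_dreq(): on success, some other
 * context is expected to call complete() later.  If the send fails,
 * nothing will ever complete the wait, so bail out instead of blocking. */
static int disconnect_target(struct completion *done,
                             int (*send_dreq)(struct completion *))
{
    init_completion(done);
    if (send_dreq(done)) {
        fprintf(stderr, "Sending CM DREQ failed\n");
        return -1;
    }
    wait_for_completion(done);
    return 0;
}

/* Two fake senders for exercising the logic above. */
static int send_dreq_fails(struct completion *done)
{
    (void)done;
    return -1;              /* e.g. the target already disconnected */
}

static int send_dreq_completes(struct completion *done)
{
    complete(done);         /* the "reply" arrives immediately */
    return 0;
}
```

Without the error check, calling disconnect_target with send_dreq_fails would block forever in wait_for_completion(), which is the deadlock the patch is meant to prevent.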
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
> +/*
> + * We need 2 scsi_host_put becuase there are two get:
> + * in scsi_host_alloc and in scsi_add_host
> + */
> +scsi_host_put(target->scsi_host);
> scsi_host_put(target->scsi_host);

Hmm, this doesn't seem right to me. If I try this, then I get a crash because the scsi_host is already gone after the first put. I verified that the reference count is 1 before these puts, and with the unmodified module I don't see anything left in /sys/class/scsi_host after unloading the module.

What kernel are you seeing problems with? I'm testing with an up-to-date git kernel, although I doubt it makes a difference (did SCSI reference counting change recently??).

I do think there are some extra scsi_host_put() calls in srp_remove_work() -- I think the double scsi_host_put() dates back to a version (which I may never even have checked in) where there was a scsi_host_get() to avoid the scsi_host going away between the schedule_work() and srp_remove_work() actually running. So the patch below seems correct to me. What do you think?

--- linux-kernel/infiniband/ulp/srp/ib_srp.c	(revision 7245)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -353,7 +356,6 @@ static void srp_remove_work(void *target
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_DEAD) {
 		spin_unlock_irq(target->scsi_host->host_lock);
-		scsi_host_put(target->scsi_host);
 		return;
 	}
 	target->state = SRP_TARGET_REMOVED;
@@ -367,8 +369,6 @@ static void srp_remove_work(void *target
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
 	scsi_host_put(target->scsi_host);
-	/* And another put to really free the target port... */
-	scsi_host_put(target->scsi_host);
 }
 
 static int srp_connect_target(struct srp_target_port *target)
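The rule under discussion is that every reference taken, including the implicit one held from allocation, must be dropped by exactly one put. Here is a toy model with hypothetical names that only loosely mirror scsi_host_alloc(), scsi_host_get() and scsi_host_put(). It is not the SCSI midlayer API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical). */
struct host {
    int refcount;
};

static struct host *host_alloc(void)
{
    struct host *h = malloc(sizeof(*h));

    if (h)
        h->refcount = 1;    /* the allocation itself holds one reference */
    return h;
}

static void host_get(struct host *h)
{
    h->refcount++;
}

/* Drop one reference; returns 1 if this put freed the object. */
static int host_put(struct host *h)
{
    assert(h->refcount > 0);    /* a put without a matching get is a bug */
    if (--h->refcount == 0) {
        free(h);
        return 1;
    }
    return 0;
}
```

With a count of 1 before the puts, as Roland observed, the first put already frees the object, and any further put touches freed memory. A deferred-work scheme that takes an extra get must pair it with exactly one extra put, and removing the get means removing that put too.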
Re: [openib-general] opensm segfault?
Hi Troy,

On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..

Is this reproducible? Or is it a completely random failure?

> (gdb) bt
> #0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> count=64) at cl_memory_osd.c:87
> #1 0x00415053 in osm_pkey_tbl_sync_new_blocks (
> p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> #2 0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> at osm_pkey_mgr.c:407
> #3 0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> signal=3)
> at osm_state_mgr.c:2243
> #4 0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> #5 0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> at cl_dispatcher.c:108
> #6 0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> at cl_threadpool.c:78
> #7 0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> cl_thread.c:61
> #8 0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> #9 0x2b90b12c8273 in clone () from /lib/libc.so.6
>
>
>
> And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> just seems like excessive uneeded abstraction.

I absolutely agree with you.

Sasha.

> I'm running opensm from subversion rev 7091..
> > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn > 6251:7091M > > the only local changes are as follows: > > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff > Index: osm/opensm/osm_port_info_rcv.c > === > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > +++ osm/opensm/osm_port_info_rcv.c (working copy) > @@ -469,9 +469,14 @@ >goto Exit; > } > > +#if 0 > /* Check for IBM eHCA firmware defect in reporting partition > * enforcement cap */ > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > IBM_VENDOR_ID) >p_switch->switch_info.enforce_cap = 0; > +#endif > +/* Check for busted divergenet switch on ameslab network */ > +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152) > + p_switch->switch_info.enforce_cap = 0; > > /* Bail out if this is a switch with no partition enforcement > * capability */ > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
Re: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane
Bryan O'Sullivan wrote:
> Increase the limits on some maximum values.

I noticed an rdma/message max size limit of 4096 the last time I ran some dapl tests. Are there plans to increase it, or did I miss it somewhere in all the patches?

Here are the max values returned from the ipath ibv_query_device:

query_hca: (ver=20401) ep 65535 ep_q 65535 evd 65535 evd_q 65535
query_hca: msg 4096 rdma 4096 iov 255 lmr 65535 rmr 0
query_hca: dto 65535 iov 255 rdma i1,o1

Thanks,

-arlin
[openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel
Hi,

I have been trying to build OFED-1.0-rc4 on FC5 as well. MVAPICH builds if you fix the error: the strndup call should probably be strdup. Simple fix.

We have found that only iser, open-iscsi, mpitests and ibutils do not build right now for us. We do not need iser or open-iscsi, so we are not going to spend time on those; mpitests and ibutils would be nice to have.

Scott
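The build error reported in this thread ("too few arguments to function 'strndup'") is what a one-argument call produces: strdup(s) copies a whole string, while strndup(s, n) copies at most n bytes and needs the length. The "implicit declaration" warnings mean no prototype was visible at the call site. Here is a small illustration. The function hostname_from_line and the "host:count" line format are hypothetical, not taken from mpirun_rsh.c.

```c
#define _POSIX_C_SOURCE 200809L /* makes <string.h> declare strndup() */
#include <stdlib.h>
#include <string.h>

/* Copy the host name out of a hostfile-style line such as "node7:2".
 * strdup() copies the whole string; strndup() copies only the first
 * n bytes, so it must be given a length. */
static char *hostname_from_line(const char *line)
{
    const char *colon = strchr(line, ':');

    if (!colon)
        return strdup(line);                      /* whole line is the host */
    return strndup(line, (size_t)(colon - line)); /* prefix before ':' */
}
```

Either call returns a malloc'ed copy that the caller must free(). If the code only ever needs the whole string, strdup() is the simpler and correct choice.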
Re: [openib-general] opensm segfault?
On Tue, 2006-05-16 at 16:10, Roland Dreier wrote:
>     Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
>     Troy> anyway? This just seems like excessive uneeded abstraction.
>
>     Hal> It's part of the component library, which is an OS
>     Hal> abstraction layer.
>
> memcpy() is specified by the ISO C standard, so it seems pretty silly
> to abstract this. Is there any platform that opensm could conceivably
> run on that doesn't supply memcpy()?

OK. I'll work up a patch to eliminate this if there are no objections.

-- Hal
Re: [openib-general] opensm segfault?
    Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
    Troy> anyway? This just seems like excessive uneeded abstraction.

    Hal> It's part of the component library, which is an OS
    Hal> abstraction layer.

memcpy() is specified by the ISO C standard, so it seems pretty silly to abstract this. Is there any platform that opensm could conceivably run on that doesn't supply memcpy()?

 - R.
Re: [openib-general] opensm segfault?
Hi Troy,

On Tue, 2006-05-16 at 15:41, Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..
>
>
> (gdb) bt
> #0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
^ This is the problem (p_src=0x0). Not sure why yet.
> count=64) at cl_memory_osd.c:87
> #1 0x00415053 in osm_pkey_tbl_sync_new_blocks (
> p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> #2 0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> at osm_pkey_mgr.c:407
> #3 0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> signal=3)
> at osm_state_mgr.c:2243
> #4 0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> #5 0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> at cl_dispatcher.c:108
> #6 0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> at cl_threadpool.c:78
> #7 0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> cl_thread.c:61
> #8 0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> #9 0x2b90b12c8273 in clone () from /lib/libc.so.6
>
>
>
> And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> just seems like excessive uneeded abstraction.

It's part of the component library, which is an OS abstraction layer.

> I'm running opensm from subversion rev 7091..
> > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn > 6251:7091M > > the only local changes are as follows: > > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff > Index: osm/opensm/osm_port_info_rcv.c > === > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > +++ osm/opensm/osm_port_info_rcv.c (working copy) > @@ -469,9 +469,14 @@ >goto Exit; > } > > +#if 0 > /* Check for IBM eHCA firmware defect in reporting partition > * enforcement cap */ > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > IBM_VENDOR_ID) >p_switch->switch_info.enforce_cap = 0; > +#endif > +/* Check for busted divergenet switch on ameslab network */ > +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152) > + p_switch->switch_info.enforce_cap = 0; > > /* Bail out if this is a switch with no partition enforcement > * capability */ > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

Yes, that's fine.

-- Hal
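Calling memcpy() with a NULL source pointer is undefined behaviour, which fits the crash in frame #0 (p_src=0x0, count=64). A P_Key table block holds 32 16-bit P_Keys, which is 64 bytes. Below is a minimal defensive sketch using a hypothetical helper named copy_pkey_block rather than the actual osm_pkey.c code. It only turns the crash into an error the caller can report. It does not explain why the source block was NULL in the first place.

```c
#include <stdint.h>
#include <string.h>

#define PKEYS_PER_BLOCK 32   /* 32 x 16-bit P_Keys = 64 bytes per block */

/* Hypothetical helper: copy one P_Key block, refusing NULL pointers
 * because memcpy() with a NULL argument is undefined behaviour. */
static int copy_pkey_block(uint16_t *dest, const uint16_t *src)
{
    if (dest == NULL || src == NULL)
        return -1;
    memcpy(dest, src, PKEYS_PER_BLOCK * sizeof(*dest));
    return 0;
}
```

The caller still has to decide what a missing source block means, for example skipping that block or logging the port, which is the open question in this thread.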
[openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
On Mon, May 15, 2006 at 02:21:21PM -0700, Bryan O'Sullivan wrote:
> On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote:
>
> > Actually I NAK'ed this patch. It compiles the same thing on x86_64
> > but makes the source code wrong -- dma_map_single() returns a bus
> > address, not a physical address.
>
> As Segher mentioned, bus_to_virt is unportable, so it's definitely the
> wrong thing to use.

phys_to_virt is just as bad. Please fix your code to do the right thing, that is, stop pretending to be able to map back from a bus address to a virtual address. The only way to get at the virtual address from a bus address is to store it away at the time you call the DMA mapping function.
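Here is a userspace model of that advice: keep the CPU pointer next to the bus address when the buffer is mapped, and find it again by lookup instead of converting the bus address. This is illustrative only. The structure name mapped_buf is hypothetical, and fake_map merely stands in for a real mapping call such as dma_map_single(). In a driver the lookup is often unnecessary, because whatever descriptor stores the DMA address can store the virtual address right beside it.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: remember the CPU pointer alongside the bus
 * address produced by the mapping call, instead of converting the bus
 * address back with bus_to_virt() or phys_to_virt(). */
struct mapped_buf {
    void     *vaddr;     /* address the CPU uses */
    uint64_t  dma_addr;  /* address handed to the device */
};

/* Stand-in for a DMA mapping call; it hands out made-up bus
 * addresses that bear no arithmetic relation to vaddr. */
static uint64_t fake_map(void *vaddr)
{
    static uint64_t next = 0x1000;
    uint64_t addr = next;

    (void)vaddr;
    next += 0x1000;
    return addr;
}

/* Record both addresses at map time. */
static void map_buf(struct mapped_buf *mb, void *vaddr)
{
    mb->vaddr = vaddr;
    mb->dma_addr = fake_map(vaddr);
}

/* Later, recover the CPU pointer for a bus address by lookup. */
static void *lookup_vaddr(const struct mapped_buf *bufs, size_t n,
                          uint64_t dma_addr)
{
    for (size_t i = 0; i < n; i++)
        if (bufs[i].dma_addr == dma_addr)
            return bufs[i].vaddr;
    return NULL;
}
```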
[openib-general] Re: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel
This issue is already fixed in rc5.

Regards,
Pasha.

Scott Weitzenkamp (sweitzen) wrote: Actually, I spoke too soon. Kernel components compiled, but MVAPICH did not: Compiling MVAPICH ... 2 mpirun_rsh.c: In function 'read_hostfile': mpirun_rsh.c:1197: warning: incompatible implicit declaration of built-in function 'strndup' mpirun_rsh.c:1205: warning: incompatible implicit declaration of built-in function 'strndup' mpirun_rsh.c:1220: warning: incompatible implicit declaration of built-in function 'strndup' mpirun_rsh.c:1220: error: too few arguments to function 'strndup' make[3]: *** [mpirun_rsh] Error 1 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 Error in compiling MVAPICH. Check the log file: make.mvapich.log Exiting Mvapich installation failed Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Tuesday, May 16, 2006 10:48 AM To: Michael S. Tsirkin Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel After running "yum update", I was able to compile OFED 1.0 rc4 on 2.6.16-1.2111_FC5 kernel. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -Original Message- From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] Sent: Thursday, May 11, 2006 9:37 AM To: Scott Weitzenkamp (sweitzen) Cc: [EMAIL PROTECTED]; openib-general@openib.org Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>: Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel Is this a useful kernel to try, or should get latest FC5 kernel or 2.6.16 from kernel.org? I think you should go to latest update.
-- MST ___ openfabrics-ewg mailing list [EMAIL PROTECTED] http://openib.org/mailman/listinfo/openfabrics-ewg
[openib-general] opensm segfault?
I got this after an indeterminate amount of time running opensm..

(gdb) bt
#0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
    count=64) at cl_memory_osd.c:87
#1 0x00415053 in osm_pkey_tbl_sync_new_blocks (
    p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
#2 0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
    at osm_pkey_mgr.c:407
#3 0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3)
    at osm_state_mgr.c:2243
#4 0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
    context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
#5 0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
    at cl_dispatcher.c:108
#6 0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
    at cl_threadpool.c:78
#7 0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
    cl_thread.c:61
#8 0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
#9 0x2b90b12c8273 in clone () from /lib/libc.so.6

And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This just seems like excessive, unneeded abstraction.

I'm running opensm from subversion rev 7091..
May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M

the only local changes are as follows:

[EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
Index: osm/opensm/osm_port_info_rcv.c
===
--- osm/opensm/osm_port_info_rcv.c (revision 7091)
+++ osm/opensm/osm_port_info_rcv.c (working copy)
@@ -469,9 +469,14 @@
 goto Exit;
 }
+#if 0
 /* Check for IBM eHCA firmware defect in reporting partition
 * enforcement cap */
 if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID)
 p_switch->switch_info.enforce_cap = 0;
+#endif
+/* Check for busted divergenet switch on ameslab network */
+if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
+ p_switch->switch_info.enforce_cap = 0;
 /* Bail out if this is a switch with no partition enforcement
 * capability */
 if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
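Frame #0 of the backtrace shows cl_memcpy() being handed p_src=0x0 with count=64, i.e. a NULL source pointer coming from osm_pkey_tbl_sync_new_blocks() reaches memcpy(). As a hedged illustration of the kind of debug check a wrapper could add (checked_memcpy() is a hypothetical name, not a complib function), a debug build can assert on NULL arguments so the process aborts with an assertion message instead of segfaulting inside memcpy(); a core dump then shows the offending caller one frame up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical debug wrapper with the same contract as memcpy(). When
 * NDEBUG is not defined it aborts on a NULL pointer, such as the
 * p_src=0x0 seen in frame #0, before memcpy() dereferences it. */
static void *checked_memcpy(void *dest, const void *src, size_t count)
{
    /* memcpy() requires valid pointers even when count is 0. */
    assert(dest != NULL);
    assert(src != NULL);
    return memcpy(dest, src, count);
}
```

Note that such checks vanish when NDEBUG is defined, so they only help in debug builds.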
[openib-general] Re: [PATCH] RDMA CM: updates to 2.6.18 branch
OK, the for-2.6.18 branch is updated with all of this.
[openib-general] RE: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel
Actually, I spoke too soon. Kernel components compiled, but MVAPICH did not:

Compiling MVAPICH ... 2
mpirun_rsh.c: In function 'read_hostfile':
mpirun_rsh.c:1197: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1205: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: error: too few arguments to function 'strndup'
make[3]: *** [mpirun_rsh] Error 1
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2
Error in compiling MVAPICH. Check the log file: make.mvapich.log
Exiting
Mvapich installation failed

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Scott Weitzenkamp (sweitzen)
> Sent: Tuesday, May 16, 2006 10:48 AM
> To: Michael S. Tsirkin
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on
> orig FC5 kernel
>
> After running "yum update", I was able to compile OFED 1.0 rc4 on
> 2.6.16-1.2111_FC5 kernel.
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
> > -Original Message-
> > From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, May 11, 2006 9:37 AM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: [EMAIL PROTECTED]; openib-general@openib.org
> > Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel
> >
> > Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>:
> > > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel
> > >
> > > Is this a useful kernel to try, or should get latest FC5
> > kernel or 2.6.16 from kernel.org?
> >
> > I think you should go to latest update.
> >
> > --
> > MST
[openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel
After running "yum update", I was able to compile OFED 1.0 rc4 on 2.6.16-1.2111_FC5 kernel.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -Original Message-
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 11, 2006 9:37 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel
>
> Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>:
> > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel
> >
> > Is this a useful kernel to try, or should get latest FC5
> kernel or 2.6.16 from kernel.org?
>
> I think you should go to latest update.
>
> --
> MST
[openib-general] Re: Re: [PATCH] RE: compliancy issue?
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: Re: [PATCH] RE: compliancy issue?
>
> >OK, I just tested and this works for me. Here's the SDP patch to do what
> >you described. The code actually got cleaner now: it's convenient to get
> >different events on active versus passive side - previously I had
> >to check a flag to figure out what ESTABLISHED means.
>
> I committed the CMA patch.
>

Ditto for the SDP update.

--
MST
[openib-general] Heads-up for anyone using certain thunderbird message filter features
Off topic, but probably very important to people using Thunderbird as an email client.

I am using certain Thunderbird message filter features, mainly "move to folder" and then "delete from POP server" (done as a single step). I am on two mailing lists that received the same patch set. Of the 54 patch emails, 15 went to one folder (kernel), 41 went to the other folder (openib), and 3 went to both.

So anyone using this should watch out, as the filter does act oddly. I suspect that, since the message ID is the same for both copies, the filter may be using it to delete the message from the server and so removes both copies at once. The emails that I did get both copies of were delayed by quite a bit and very likely came in on different email downloads, so the other copy was not there to delete.

Roger
Re: [openib-general] FTP over SDP is not working fine-[newbie]
On Tue, May 16, 2006 at 08:15:31AM +0100, keshetti mahesh wrote:
> my lisdp.conf file is:
>
> match listen *:*
> match destination *:*
> match program *
>
> so that all services can be allowed both on the client and server sides

Both client and server have a libsdp.conf file. Do they both have the above content? (They should, but your comment above suggests only one libsdp.conf file is being used.)

>
> after exporting that file and LD_PRELOAD=/usr/lib/lib64/libsdp.so, i have
> restarted all services (vsftpd, xinetd etc)
>
> again the same problem with FTP .network unreachable

I'll assume "ping" does work. My next suggestion is to stop the FTP server and manually invoke it to listen on a different port (proftpd takes a -p parameter):

  /etc/init.d/proftpd stop
  LD_PRELOAD=/usr/lib/lib64/libsdp.so proftpd -p 20022

Using another login, confirm the FTP server is listening on port 20022 (netstat -a) and is using SDP (cat /proc//maps or something like that). Then, from the client, try to talk to that server with:

  LD_PRELOAD=/usr/lib/lib64/libsdp.so ftp 192.168.2.99 20022

> but the i can't understand y only this is giving problem (the other
> applications are not giving any problem)

Sorry, I don't understand that either. If my above suggestion doesn't work, perhaps try a different FTP server or a different FTP client?

grant
[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
> Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like
> not a good idea. Need to think back to why do we need this at all.
>

You need it to keep the connection alive until both client and server have finished running the test, in the case of full duplex tests...
[openib-general] RE: need help regarding IB core software
Please post generic questions to the openib mailing list.

> i have started working over infiniband recently i want to develop a sample
> utility that would perform simple RDMA (read/write) operations

There are some test applications that can be used as a base. Are you wanting a userspace or kernel application?

> i will be thankful to you if u can refer me some documents,atticles or books
> where i can get this information

The best documentation is the IB architecture specification. However, I think that the test apps are simple enough to help you here.

- Sean
Re: [openib-general] Re: [PATCH] RE: compliancy issue?
> OK, I just tested and this works for me. Here's the SDP patch to do what you
> described. The code actually got cleaner now: it's convenient to get
> different events on active versus passive side - previously I had to check
> a flag to figure out what ESTABLISHED means.

I committed the CMA patch.

- Sean
[openib-general] Re: sdp with kernel 2.6.16.14
Quoting r. amit byron <[EMAIL PROTECTED]>: > Subject: Re: sdp with kernel 2.6.16.14 > > > Michael, > > netperf works with sdp. > > slabinfo output: > slabinfo - version: 2.1 (statistics) > # name > : tunables: slabdata >: globalstat > : cpustat > > SDP0 0 117262 : tunables 24 128 : > slabdata 0 0 0 : globalstat 104 1811 11 > 0000 : cpustat 3 17 20 0 > fib6_nodes 7 92 40 921 : tunables 32 168 : > slabdata 1 1 0 : globalstat 32 21 10 > 0000 : cpustat 5 2 0 0 > ip6_dst_cache 9 17228 171 : tunables 32 168 : > slabdata 1 1 0 : globalstat 36 17 10 > 0000 : cpustat 14 3 8 0 > ndisc_cache2 22180 221 : tunables 32 168 : > slabdata 1 1 0 : globalstat 32 17 10 > 0000 : cpustat 3 2 3 0 > RAWv6 7 11712 112 : tunables 32 168 : > slabdata 1 1 0 : globalstat 11 11 10 > 0000 : cpustat 6 1 0 0 > UDPv6 1 11684 112 : tunables 32 168 : > slabdata 1 1 0 : globalstat 52 22 21 > 0000 : cpustat 10 5 14 0 > > Amit > > "Michael S. Tsirkin" <[EMAIL PROTECTED]> wrote: > > Quoting r. amit byron : > > Subject: sdp with kernel 2.6.16.14 > > > > > > hi, > > > > i'm trying to get sdp work between point-to-point connected > > machines running kernel 2.6.16.24. i have configured ipoib > > and trying to run iperf using sdp. 
> >
> > the client machine has an entry in its libsdp.conf:
> > match destination 192.168.1.2
> >
> > the server machine has an entry in its libsdp.conf:
> > match listen *:5001
> >
> > iperf is started on the server machine using command:
> > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s
> >
> > iperf client is started on the client machine using command:
> > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2
> >
> > the server machine panics with following messages:
> >
> > oom-killer: gfp_mask=0xd0, order=0
> > [] oom-killer: gfp_mask=0xd0, order=0
> > [] out_of_memory+0x155/0x180
> > [] __alloc_pages+0x2a5/0x320
> > [] __get_free_pages+0x1e/0x40
> > [] __pollwait+0x80/0xd0
> > [] pipe_poll+0xcd/0xe0
> > [] do_select+0x212/0x480
> > [] cache_free_debugcheck+0x135/0x230
> > [] __pollwait+0x0/0xd0
> > [] core_sys_select+0x1ce/0x2e0
> > [] sys_select+0x51/0x1c0
> > [] sysenter_past_esp+0x54/0x75
> > DMA per-cpu:
> > cpu 0 hot: high 0, batch 1 used:0
> > cpu 0 cold: high 0, batch 1 used:0
> > cpu 1 hot: high 0, batch 1 used:0
> > cpu 1 cold: high 0, batch 1 used:0
> > cpu 2 hot: high 0, batch 1 used:0
> > cpu 2 cold: high 0, batch 1 used:0
> > cpu 3 hot: high 0, batch 1 used:0
> > cpu 3 cold: high 0, batch 1 used:0
> > DMA32 per-cpu: empty
> > Normal per-cpu:
> > cpu 0 hot: high 186, batch 31 used:103
> > cpu 0 cold: high 62, batch 15 used:61
> > cpu 1 hot: high 186, batch 31 used:183
> > cpu 1 cold: high 62, batch 15 used:53
> > cpu 2 hot: high 186, batch 31 used:28
> > cpu 2 cold: high 62, batch 15 used:54
> > cpu 3 hot: high 186, batch 31 used:63
> > cpu 3 cold: high 62, batch 15 used:60
> > HighMem per-cpu:
> > cpu 0 hot: high 186, batch 31 used:176
> > cpu 0 cold: high 62, batch 15 used:13
> > cpu 1 hot: high 186, batch 31 used:169
> > cpu 1 cold: high 62, batch 15 used:1
> > cpu 2 hot: high 186, batch 31 used:157
> > cpu 2 cold: high 62, batch 15 used:0
> > cpu 3 hot: high 186, batch 31 used:174
> > cpu 3 cold: high 62, batch 15 used:6
> > Free pages: 7366104kB (7358760kB HighMem)
> > Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 slab:8970 mapped:4565 pagetables:238
> > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 880 8623
> > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 880 8623
> > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 61951
> > HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB inactive:19540kB present:7929
RE: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
If we have a patch that changes just the pp routines, as MST suggested, that would be nice; I could then apply it to all the other performance tests. Sagi -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin Sent: Tuesday, May 16, 2006 5:58 PM To: Steve Wise Cc: openib-general Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM Quoting r. Steve Wise <[EMAIL PROTECTED]>: > Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the > RDMA CM > > On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise <[EMAIL PROTECTED]>: > > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the > > > RDMA CM > > > > > > I don't know who maintains src/userspace/perftest, but here is a > > > patch set that enables rdma_bw and rdma_lat to use the RDMA_CM > > > with the addition of the -c or --cma flag. > > > > > > > I'm worried that this makes the program too big. Maybe this should > > be another test rather than an option? > > > > ok. You want it as a separate pair of programs? I guess we'll see once there's the minimum patch that only affects the connection setup. If the changes can be localised to just the pp routines, then I think it still fits as part of the same test. > > > The rkey/addr info is exchanged in the private data, and > > > SEND/RECV's are used to sync the client/server before and after execution. > > > > Do we really need SEND/RECV messages for this? > > I think I get completion with error once the remote side has disconnected. No? > > > > perhaps. I just thought it was cleaner to synch up at the end. Just > like the non-cma version does over the TCP socket (see > pp_client_exch_dest() / pp_server_exch_dest() at the end of the test). Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like not a good idea. Need to think back to why do we need this at all.
-- MST
[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
Quoting r. Steve Wise <[EMAIL PROTECTED]>: > Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise <[EMAIL PROTECTED]>: > > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > > > > > I don't know who maintains src/userspace/perftest, but here is a patch > > > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > > > addition of the -c or --cma flag. > > > > > > > I'm worried that this makes the program too big. Maybe this should be > > another test rather than an option? > > > > ok. You want it as a separate pair of programs? I guess we'll see once there's the minimum patch that only affects the connection setup. If the changes can be localised to just the pp routines, then I think it still fits as part of the same test. > > > The rkey/addr info is exchanged in the private data, and SEND/RECV's are > > > used > > > to sync the client/server before and after execution. > > > > Do we really need SEND/RECV messages for this? > > I think I get completion with error once the remote side has disconnected. > > No? > > > > perhaps. I just thought it was cleaner to synch up at the end. Just > like the non-cma version does over the TCP socket (see > pp_client_exch_dest() / pp_server_exch_dest() at the end of the test). Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like not a good idea. Need to think back to why do we need this at all. -- MST
[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > Quoting r. Steve Wise <[EMAIL PROTECTED]>: > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > > > I don't know who maintains src/userspace/perftest, but here is a patch > > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > > addition of the -c or --cma flag. > > > > I'm worried that this makes the program too big. Maybe this should be > another test rather than an option? > ok. You want it as a separate pair of programs? > > The rkey/addr info is exchanged in the private data, and SEND/RECV's are > > used > > to sync the client/server before and after execution. > > Do we really need SEND/RECV messages for this? > I think I get completion with error once the remote side has disconnected. No? > perhaps. I just thought it was cleaner to synch up at the end. Just like the non-cma version does over the TCP socket (see pp_client_exch_dest() / pp_server_exch_dest() at the end of the test). > > Also, I added -P or --poll to rdma_bw to allow blocking for completion > > events when none are ready (if you omit -P, it will block when no > > completion is available, otherwise it will spin). > > Needs to be a separate patch. ok. > > > Signed-off-by: Steve Wise <[EMAIL PROTECTED]> > > > > Index: rdma_lat.c > > === > > --- rdma_lat.c (revision 7050) > > +++ rdma_lat.c (working copy) > > @@ -53,6 +53,7 @@ > > #include > > > > #include > > +#include > > > > #include "get_clock.h" > > > > @@ -71,7 +72,8 @@ > > struct ibv_context *context; > > struct ibv_pd *pd; > > struct ibv_mr *mr; > > - struct ibv_cq *cq; > > + struct ibv_cq *scq; > > + struct ibv_cq *rcq; > > Why are you adding another CQ? > It makes waiting for a recv completion easier since you won't get a send completion when the CQ is only for receives... 
> > struct ibv_qp *qp; > > void *buf; > > volatile char *post_buf; > > @@ -80,6 +82,7 @@ > > int tx_depth; > > struct ibv_sge list; > > struct ibv_send_wr wr; > > + struct rdma_cm_id *cm_id; > > }; > > > > struct pingpong_dest { > > @@ -323,16 +326,22 @@ > > return NULL; > > } > > > > - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); > > - if (!ctx->cq) { > > + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); > > + if (!ctx->rcq) { > > fprintf(stderr, "Couldn't create CQ\n"); > > return NULL; > > } > > CQ of depth 1? > Yes, there is only ever one outstanding send/recv exchange...
[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
Quoting r. Steve Wise <[EMAIL PROTECTED]>: > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > I don't know who maintains src/userspace/perftest, but here is a patch > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > addition of the -c or --cma flag. > I'm worried that this makes the program too big. Maybe this should be another test rather than an option? > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used > to sync the client/server before and after execution. Do we really need SEND/RECV messages for this? I think I get completion with error once the remote side has disconnected. No? > Also, I added -P or --poll to rdma_bw to allow blocking for completion > events when none are ready (if you omit -P, it will block when no > completion is available, otherwise it will spin). Needs to be a separate patch. > Signed-off-by: Steve Wise <[EMAIL PROTECTED]> > Index: rdma_lat.c > === > --- rdma_lat.c(revision 7050) > +++ rdma_lat.c(working copy) > @@ -53,6 +53,7 @@ > #include > > #include > +#include > > #include "get_clock.h" > > @@ -71,7 +72,8 @@ > struct ibv_context *context; > struct ibv_pd *pd; > struct ibv_mr *mr; > - struct ibv_cq *cq; > + struct ibv_cq *scq; > + struct ibv_cq *rcq; Why are you adding another CQ? > struct ibv_qp *qp; > void *buf; > volatile char *post_buf; > @@ -80,6 +82,7 @@ > int tx_depth; > struct ibv_sge list; > struct ibv_send_wr wr; > + struct rdma_cm_id *cm_id; > }; > > struct pingpong_dest { > @@ -323,16 +326,22 @@ > return NULL; > } > > - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); > - if (!ctx->cq) { > + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); > + if (!ctx->rcq) { > fprintf(stderr, "Couldn't create CQ\n"); > return NULL; > } CQ of depth 1? 
-- MST
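The patch exchanges the rkey and buffer address through RDMA CM private data instead of the TCP socket used by pp_client_exch_dest()/pp_server_exch_dest(). Private data is an opaque byte buffer, so both sides must agree on a layout and byte order. Here is a minimal sketch, assuming a hypothetical 16-byte big-endian layout; the real structure used by the patch is in the truncated part of the mail, and struct pp_cm_dest and its helpers are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection info carried in the private data. */
struct pp_cm_dest {
    uint64_t vaddr;  /* remote buffer address for RDMA WRITE */
    uint32_t rkey;   /* remote key of the registered MR */
    uint32_t size;   /* buffer size in bytes */
};

#define PP_CM_DEST_WIRE_LEN 16u

/* Store the low n bytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read n big-endian bytes from p. */
static uint64_t get_be(const uint8_t *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Encode *d into buf (at least PP_CM_DEST_WIRE_LEN bytes) in big-endian
 * order, so hosts with different native byte orders agree. */
static size_t pp_cm_dest_pack(const struct pp_cm_dest *d, uint8_t *buf)
{
    assert(d != NULL && buf != NULL);
    put_be(buf, d->vaddr, 8);
    put_be(buf + 8, d->rkey, 4);
    put_be(buf + 12, d->size, 4);
    return PP_CM_DEST_WIRE_LEN;
}

/* Decode a buffer produced by pp_cm_dest_pack() into *d. */
static void pp_cm_dest_unpack(const uint8_t *buf, struct pp_cm_dest *d)
{
    assert(d != NULL && buf != NULL);
    d->vaddr = get_be(buf, 8);
    d->rkey = (uint32_t)get_be(buf + 8, 4);
    d->size = (uint32_t)get_be(buf + 12, 4);
}
```

On the active side, the packed buffer would be passed through the private_data and private_data_len fields of librdmacm's struct rdma_conn_param. The passive side would unpack the copy delivered with the connect request.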
[openib-general] PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
I don't know who maintains src/userspace/perftest, but here is a patch set that enables rdma_bw and rdma_lat to use the RDMA_CM with the addition of the -c or --cma flag. The rkey/addr info is exchanged in the private data, and SEND/RECV's are used to sync the client/server before and after execution. Also, I added -P or --poll to rdma_bw to allow blocking for completion events when none are ready (if you omit -P, it will block when no completion is available, otherwise it will spin). Signed-off-by: Steve Wise <[EMAIL PROTECTED]> Index: rdma_lat.c === --- rdma_lat.c (revision 7050) +++ rdma_lat.c (working copy) @@ -53,6 +53,7 @@ #include #include +#include #include "get_clock.h" @@ -71,7 +72,8 @@ struct ibv_context *context; struct ibv_pd *pd; struct ibv_mr *mr; - struct ibv_cq *cq; + struct ibv_cq *scq; + struct ibv_cq *rcq; struct ibv_qp *qp; void *buf; volatile char *post_buf; @@ -80,6 +82,7 @@ int tx_depth; struct ibv_sge list; struct ibv_send_wr wr; + struct rdma_cm_id *cm_id; }; struct pingpong_dest { @@ -323,16 +326,22 @@ return NULL; } - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); - if (!ctx->cq) { + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { fprintf(stderr, "Couldn't create CQ\n"); return NULL; } + ctx->scq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); + if (!ctx->scq) { + fprintf(stderr, "Couldn't create CQ\n"); + return NULL; + } + { struct ibv_qp_init_attr attr = { - .send_cq = ctx->cq, - .recv_cq = ctx->cq, + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, .cap = { .max_send_wr = tx_depth, /* Work around: driver doesnt support @@ -370,13 +379,6 @@ } } - ctx->wr.wr_id = PINGPONG_RDMA_WRID; - ctx->wr.sg_list= &ctx->list; - ctx->wr.num_sge= 1; - ctx->wr.opcode = IBV_WR_RDMA_WRITE; - ctx->wr.send_flags = IBV_SEND_SIGNALED | IBV_SEND_INLINE; - ctx->wr.next = NULL; - return ctx; } @@ -489,6 +491,467 @@ return 0; } +/* CMA STUFF */ + +static void pp_post_recv(struct pingpong_context *ctx) +{ + struct 
ibv_sge list; + struct ibv_recv_wr wr, *bad_wr; + int rc; + + list.addr = (uintptr_t) ctx->buf; + list.length = 1; + list.lkey = ctx->mr->lkey; + wr.next = NULL; + wr.wr_id = 0xdeadbeef; + wr.sg_list = &list; + wr.num_sge = 1; + + rc = ibv_post_recv(ctx->qp, &wr, &bad_wr); + if (rc) { + perror("ibv_post_recv"); + fprintf(stderr, "%s ibv_post_recv failed %d\n", __FUNCTION__, rc); + } +} + +static struct pingpong_context *pp_init_cma_ctx(struct rdma_cm_id *cm_id, + unsigned size, + int tx_depth, int port) +{ + struct pingpong_context *ctx; + + ctx = malloc(sizeof *ctx); + if (!ctx) + return NULL; + + ctx->size = size; + ctx->tx_depth = tx_depth; + + ctx->buf = memalign(page_size, size * 2); + if (!ctx->buf) { + fprintf(stderr, "Couldn't allocate work buf.\n"); + return NULL; + } + + memset(ctx->buf, 0, size * 2); + + ctx->post_buf = (char*)ctx->buf + (size - 1); + ctx->poll_buf = (char*)ctx->buf + (2 * size - 1); + + ctx->cm_id = cm_id; + ctx->context = cm_id->verbs; + if (!ctx->context) { + fprintf(stderr, "%s Unbound cm_id!!\n", __FUNCTION__); + return NULL; + } + + ctx->pd = ibv_alloc_pd(ctx->context); + if (!ctx->pd) { + fprintf(stderr, "Couldn't allocate PD\n"); + return NULL; + } + +/* We dont really want IBV_ACCESS_LOCAL_WRITE, but IB spec says: + * The Consumer is not allowed to assign Remote Write or Remote Atomic to + * a Memory Region that has not been assigned Local Write. */ + ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2, +IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE); + if (!ctx->mr) { + fprintf(stderr, "Couldn't allocate MR\n"); + return NULL; + } + + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { + fprintf(stderr, "Couldn't create RCQ\n"); + return NULL; + } + + ctx
Re: [openib-general] RDMA enabled NICs- newbie
As has been said already, there are two rnics running with the Open Fabrics stack: the Ammasso 1100 and the Chelsio CXGB3 rnics. Tom Tucker and I are the maintainers of this code base (the iwarp branch).

I hear NetEffect also has a 10Gb iWARP NIC. As far as I know, they don't have any support for the Open Fabrics iwarp branch yet.

Steve.

On Mon, 2006-05-15 at 12:14 -0500, Roger Heflin wrote:
> Ian Brown wrote:
> > Thanks all.
> > I indeed found that
> > http://www.ammasso.com/ responds with
> > "There is no website configured at this address."
> > while
> > http://www.chelsio.com/
> > does exist.
> >
> > Is there a reason why manufacturers will refrain from
> > producing RDMA ? (I mean , are there better technologies
> > which are a substitute for RDMA for ethernet ?)
> > Regards,
> > IB
> >
> I kind of think that the market is too small to support
> a company making a card that is at best just slightly cheaper
> than things like Infiniband, and Myrinet, and is actually
> slower than the Infiniband and Myrinet.
>
> Consider how many cards one has to sell to pay a single
> engineer's salary when you are at best making $100-$150 a
> card over production costs. The numbers don't look that
> good to me, and consider that previous to Ammasso and Chelsio
> there have been a long string of companies producing accelerated
> niche network cards of various types (going back as far as the
> early 90's), and all of them have failed to get enough
> market share to stay in business. About the only thing
> that makes one of these companies viable is being bought
> out by someone large enough to support the needed funding.
>
> Level 5 is making accelerated ethernet cards, I believe most
> of the acceleration is in software in some manner (kernel bypass),
> and I don't know if their card could be made to do rdma.
>
> Roger
[openib-general] Re: [PATCH] RE: compliancy issue?
Quoting r. Sean Hefty <[EMAIL PROTECTED]>: > Subject: [PATCH] RE: compliancy issue? > > >CA4-24.2.3: The connecting peer shall terminate the connection attempt > >if ExtMaxAdverts of the HAH is set to zero. > > > >This means that SDP must examine the HAH before RTU is sent. > >But, CMA currently sends RTU from cma_rep_recv, before notifying > >the user. > > Can you try this simple patch and see if it fixes your problem? You will > need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE > event. The conn_param to rdma_accept() should be NULL. OK, I just tested and this works for me. Here's the SDP patch to do what you described. The code actually got cleaner now: it's convenient to get different events on active versus passive side - previously I had to check a flag to figure out what ESTABLISHED means. I still think it makes sense to do this for all ULPs and not just SDP, but oh well. Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]> Index: linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c === --- linux-2.6.16.orig/drivers/infiniband/ulp/sdp/sdp_cma.c 2006-05-16 15:22:00.0 +0300 +++ linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c 2006-05-16 15:25:03.0 +0300 @@ -237,9 +237,9 @@ int sdp_connect_handler(struct sock *sk, return 0; } -int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event) +static int sdp_response_handler(struct sock *sk, struct rdma_cm_event *event) { - struct sock *parent; + struct sdp_hah *h; sdp_dbg(sk, "%s\n", __func__); sk->sk_state = TCP_ESTABLISHED; @@ -250,23 +250,37 @@ int sdp_connected_handler(struct sock *s if (sock_flag(sk, SOCK_DEAD)) return 0; + h = event->private_data; + sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs); + sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) - + sizeof(struct sdp_bsdh); + + sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__, + sdp_sk(sk)->bufs, + sdp_sk(sk)->xmit_size_goal); + + ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP); + + sk->sk_state_change(sk); +
sk_wake_async(sk, 0, POLL_OUT); + return 0; +} + +int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event) +{ + struct sock *parent; + sdp_dbg(sk, "%s\n", __func__); + parent = sdp_sk(sk)->parent; - if (!parent) { - struct sdp_hah *h = event->private_data; - sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs); - sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) - - sizeof(struct sdp_bsdh); - - sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__, - sdp_sk(sk)->bufs, - sdp_sk(sk)->xmit_size_goal); + BUG_ON(!parent); - ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP); + sk->sk_state = TCP_ESTABLISHED; + + /* TODO: If SOCK_KEEPOPEN set, need to reset and start + keepalive timer here */ - sk->sk_state_change(sk); - sk_wake_async(sk, 0, POLL_OUT); + if (sock_flag(sk, SOCK_DEAD)) return 0; - } lock_sock(parent); if (sk_acceptq_is_full(parent)) { @@ -292,11 +306,6 @@ void sdp_disconnected_handler(struct soc sdp_dbg(sk, "%s\n", __func__); } -void sdp_response_handler(struct sock *sk) -{ - sdp_dbg(sk, "%s\n", __func__); -} - int sdp_cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) { struct rdma_conn_param conn_param; @@ -388,7 +397,11 @@ int sdp_cma_handler(struct rdma_cm_id *i break; case RDMA_CM_EVENT_CONNECT_RESPONSE: sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_RESPONSE\n"); - sdp_response_handler(sk); + rc = sdp_response_handler(sk, event); + if (rc) + rdma_reject(id, NULL, 0); + else + rc = rdma_accept(id, NULL); break; case RDMA_CM_EVENT_CONNECT_ERROR: sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_ERROR\n"); -- MST
[openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops
http://openib.org/bugzilla/show_bug.cgi?id=31

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
[openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops
http://openib.org/bugzilla/show_bug.cgi?id=31

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2006-05-16 05:30 ---
Resolved in RC4
[openib-general] [Bug 65] ib_ipoib refuses to unload when alias exists in modprobe.conf
http://openib.org/bugzilla/show_bug.cgi?id=65

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX
[openib-general] [Bug 28] ipoib_mcast_sendonly_join_complete oops
http://openib.org/bugzilla/show_bug.cgi?id=28

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2006-05-16 05:14 ---
Fixed by Eli
[openib-general] Re: [PATCH] slab: Fix kmem_cache_destroy() on NUMA
Roland Dreier wrote:
> With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't free
> all objects." The problem is caused by sequences such as the following
> (suppose we are on a NUMA machine with two nodes, 0 and 1):
> 
>  * Allocate an object from cache on node 0.
>  * Free the object on node 1. The object is put into node 1's alien
>    array_cache for node 0.
>  * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink().
>  * __cache_shrink() does drain_cpu_caches(), which loops through all nodes.
>    For each node it drains the shared array_cache and then handles the
>    alien array_cache for the other node.
> 
> However, this means that node 0's shared array_cache will be drained, and
> then node 1 will move the contents of its alien[0] array_cache into that
> same shared array_cache. Node 0's shared array_cache is never looked at
> again, so the objects left there will appear to be in use when
> __cache_shrink() calls __node_shrink() for node 0. So __node_shrink() will
> return 1 and kmem_cache_destroy() will fail.
> 
> This patch fixes this by having drain_cpu_caches() do drain_alien_cache()
> on every node before it does drain_array() on the nodes' shared
> array_caches.
> 
> The problem was originally reported by Or Gerlitz <[EMAIL PROTECTED]>.
> 
> Cc: Christoph Lameter <[EMAIL PROTECTED]>
> Cc: Pekka Enberg <[EMAIL PROTECTED]>

OK, indeed I have CONFIG_NUMA, and yes, the patch fixes my problem.
Thanks a lot for working on that!

Or.
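The ordering problem Roland describes can be reproduced with a toy model that only counts objects. This is a hypothetical sketch, not slab code. drain_shared() and drain_alien() stand in for drain_array() and drain_alien_cache(). Per-CPU caches, locking, and real slab bookkeeping are all omitted. As in the description, draining a node's alien cache moves those objects into the owning node's shared cache.

```c
#include <assert.h>

#define NR_NODES 2

/* Toy per-node state: object counts only, no real memory. */
struct node_model {
	int shared;            /* objects in this node's shared array_cache */
	int alien[NR_NODES];   /* objects freed here but owned by node i */
};

struct cache_model {
	struct node_model node[NR_NODES];
	int returned;          /* objects handed back to their slabs */
};

/* Stand-in for drain_array() on node n's shared cache. */
static void drain_shared(struct cache_model *c, int n)
{
	c->returned += c->node[n].shared;
	c->node[n].shared = 0;
}

/* Stand-in for drain_alien_cache(): move node n's alien objects into
 * the owning node's shared cache. */
static void drain_alien(struct cache_model *c, int n)
{
	assert(c->node[n].alien[n] == 0); /* no alien cache for own node */
	for (int t = 0; t < NR_NODES; t++) {
		c->node[t].shared += c->node[n].alien[t];
		c->node[n].alien[t] = 0;
	}
}

/* Objects still sitting in any shared or alien cache. */
static int objects_left(const struct cache_model *c)
{
	int left = 0;
	for (int n = 0; n < NR_NODES; n++) {
		left += c->node[n].shared;
		for (int t = 0; t < NR_NODES; t++)
			left += c->node[n].alien[t];
	}
	return left;
}

/* Old order: per node, drain shared first, then alien. */
static int drain_old_order(struct cache_model *c)
{
	for (int n = 0; n < NR_NODES; n++) {
		drain_shared(c, n);
		drain_alien(c, n);
	}
	return objects_left(c);
}

/* Patched order: every alien cache first, then every shared cache. */
static int drain_new_order(struct cache_model *c)
{
	for (int n = 0; n < NR_NODES; n++)
		drain_alien(c, n);
	for (int n = 0; n < NR_NODES; n++)
		drain_shared(c, n);
	return objects_left(c);
}
```

In this model, setting node[1].alien[0] = 1 (allocate on node 0, free on node 1) leaves one object stranded in node 0's shared cache under the old order. The patched order returns everything, because no shared cache is drained until all alien caches have been emptied into them.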
[openib-general] [Bug 33] OFED: Ping fails on ib1 interface - IBED - RC3
http://openib.org/bugzilla/show_bug.cgi?id=33

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|[EMAIL PROTECTED]           |[EMAIL PROTECTED]
            Summary|Ping fails on ib1 interface |OFED: Ping fails on ib1
                   |- IBED - RC3                |interface - IBED - RC3