date:20060516

[openib-general] Re: ib_mthca fails to load with old firmware

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Ken L Johnson <[EMAIL PROTECTED]>:
> Subject: Re: ib_mthca fails to load with old firmware
> 
> On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote:
> 
> > You could try passing the module option "fw_cmd_doorbell=0" to
> > ib_mthca.  That may work around things.
> 
> Thanks Roland and Michael, that did it. Just added the following to 
> the /etc/modprobe.conf.local:
> 
> options ib_mthca fw_cmd_doorbell=0

Hmm. There have been recent reports of configurations which have trouble working
with fw_cmd_doorbell=1, and not all of them with old FW. I never saw this in the lab.
Roland, should we change fw_cmd_doorbell to 0 by default, until we figure out
what is going on?


-- 
MST
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] opensm segfault?

2006-05-16 Thread Jason Gunthorpe

On Wed, May 17, 2006 at 09:10:11AM +0300, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy ...
> cl memory management provides means to track all memory allocations, etc.

There are a huge number of canned solutions that provide a way to debug
memory problems without polluting the code with wrapper functions...
You can even fairly easily take your particular tracking functions
and build them into a canned linkable solution.

Wrapping ISO C (and IMHO, SUSv3) functions is almost always a bad
idea. It creates a maintenance pain because people will inevitably
add new code that doesn't use the wrappers.

Debugging hooks can always be integrated in with linker tricks and
portability is _always_ better served by just providing missing ISO
and SUSv3 functions on deficient platforms (using autoconf, libraries
and #include_next this can be made totally seamless).

Jason
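
A minimal sketch of the approach described above, using strndup() as the example of a function a deficient platform might lack: the fallback is compiled only when the C library does not provide it, and callers keep using the standard name with no wrapper. It assumes an autoconf check such as AC_CHECK_FUNCS([strndup]), which defines HAVE_STRNDUP when the function exists; compat_strndup is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* HAVE_STRNDUP is assumed to be defined by an autoconf check such as
 * AC_CHECK_FUNCS([strndup]) when the C library already provides it. */
#ifndef HAVE_STRNDUP
/* Hypothetical fallback with the same behavior as strndup(): copy at
 * most n bytes of s into a new, NUL-terminated heap buffer. */
static char *compat_strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    while (len < n && s[len] != '\0')
        len++;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}

/* Existing callers keep writing strndup(); no wrapper name leaks into
 * the rest of the code base. */
#undef strndup
#define strndup compat_strndup
#endif /* !HAVE_STRNDUP */
```

Only platforms that fail the configure check ever compile the fallback, so there is no wrapper for new code to forget to use.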

RE: [openib-general] opensm segfault?

2006-05-16 Thread Eitan Zahavi

cl_memcpy should have some debug capabilities on top of memcpy ...
cl memory management provides means to track all memory allocations, etc.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky
> Sent: Wednesday, May 17, 2006 2:11 AM
> To: Troy Benjegerdes
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] opensm segfault?
> 
> Hi Troy,
> 
> On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> > I got this after an indeterminate amount of time running opensm..
> 
> Is this reproducible, or is it a completely random failure?
> 
> > (gdb) bt
> > #0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> > count=64) at cl_memory_osd.c:87
> > #1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
> > p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> > #2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> > at osm_pkey_mgr.c:407
> > #3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> > signal=3)
> > at osm_state_mgr.c:2243
> > #4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> > #5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> > at cl_dispatcher.c:108
> > #6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> > at cl_threadpool.c:78
> > #7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> > cl_thread.c:61
> > #8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> > #9  0x2b90b12c8273 in clone () from /lib/libc.so.6
> >
> >
> >
> > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> > just seems like excessive unneeded abstraction.
> 
> Absolutely agree with you.
> 
> Sasha.
> 
> > I'm running opensm from subversion rev 7091..
> >
> > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> > 6251:7091M
> >
> > the only local changes are as follows:
> >
> > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
> > Index: osm/opensm/osm_port_info_rcv.c
> > ===
> > --- osm/opensm/osm_port_info_rcv.c  (revision 7091)
> > +++ osm/opensm/osm_port_info_rcv.c  (working copy)
> > @@ -469,9 +469,14 @@
> >goto Exit;
> >  }
> >
> > +#if 0
> >  /* Check for IBM eHCA firmware defect in reporting partition
> >  * enforcement cap */
> >  if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
> > IBM_VENDOR_ID)
> >p_switch->switch_info.enforce_cap = 0;
> > +#endif
> > +/* Check for busted divergenet switch on ameslab network */
> > +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
> > +   p_switch->switch_info.enforce_cap = 0;
> >
> >  /* Bail out if this is a switch with no partition enforcement
> >  * capability */
> >  if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

Re: [openib-general] ib_mthca fails to load with old firmware

2006-05-16 Thread Ken L Johnson

On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote:

> You could try passing the module option "fw_cmd_doorbell=0" to
> ib_mthca.  That may work around things.

Thanks Roland and Michael, that did it. Just added the following to 
the /etc/modprobe.conf.local:

options ib_mthca fw_cmd_doorbell=0

Regards,
-- 
Ken L Johnson  <[EMAIL PROTECTED]>

[openib-general] Re: ib_mthca fails to load with old firmware

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Ken L Johnson <[EMAIL PROTECTED]>:
> Subject: ib_mthca fails to load with old firmware
> 
> I'm running into a problem when I try to use the OFED RC4 release on some 
> blade systems that have TopSpin HCA daughter cards installed (actually 
> Mellanox). I'm trying to figure out how to update the firmware to the latest 
> [ http://mellanox.com/support/firmware_table.php ] but it seems I must know 
> the PSID so I can grab the right firmware image. Can anyone point me in the 
> right direction here?
> 
> ---8<--- [query device using flint]
> 
> blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q
> Image type:  Failsafe
> I.S. Version:1
> Chip Revision:   A0
> GUID Des:Node Port1Port2Sys image
> GUIDs:   0005ad02ad1d 0005ad02ad1e 0005ad02ad1f 0005ad000100d050
> Board ID:1
> VSD: 1
> PSID:
> 
> --->8---
> 
> ---8<--- [dmesg output showing ib_mthca load failure]
> 
>   <6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
>   <6>ib_mthca: Initializing :02:00.0
>   <6>ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
>   <7>PCI: Setting latency timer of device :02:00.0 to 64
>   <6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
>   <4>ib_mthca :02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current).
>   <4>ib_mthca :02:00.0: If you have problems, try updating your HCA FW.
>   <3>ib_mthca :02:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
>   <3>ib_mthca :02:00.0: BIOS or ACPI interrupt routing problem?
>   <6>ACPI: PCI interrupt for device :02:00.0 disabled 
>   <4>ib_mthca: probe of :02:00.0 failed with error -16
> 
> --->8---
> 
> 
> ---8<--- [hwinfo & lspci output for HCA]
> 
> blade9:~ # hwinfo
> [...]
> 24: PCI 200.0: 0c06 InfiniBand
>   [Created at pci.277]
>   Unique ID: B35A.guWNc33i6_3
>   Parent ID: 8otl.l6V0RupyGX6
>   SysFS ID: /devices/pci:00/:00:04.0/:02:00.0
>   SysFS BusID: :02:00.0
>   Hardware Class: unknown
>   Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
>   Vendor: pci 0x15b3 "Mellanox Technologies"
>   Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
>   SubVendor: pci 0x15b3 "Mellanox Technologies"
>   SubDevice: pci 0x6278
>   Revision: 0xa0
>   Memory Range: 0xfe90-0xfe9f (rw,non-prefetchable)
>   Memory Range: 0xdf80-0xdfff (rw,prefetchable)
>   Memory Range: 0xd000-0xd7ff (rw,prefetchable)
>   IRQ: 169 (no events)
>   Module Alias: "pci:v15B3d6278sv15B3sd6278bc0Csc06i00"
>   Driver Info #0:
> Driver Status: ib_mthca is active
> Driver Activation Cmd: "modprobe ib_mthca"
>   Config Status: cfg=new, avail=yes, need=no, active=unknown
>   Attached to: #17 (PCI bridge)
> 
> blade9:~ # lspci -vv
> [...]
> 02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
> 	Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)
> 	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
> 	Interrupt: pin A routed to IRQ 169
> 	Region 0: Memory at fe90 (64-bit, non-prefetchable) [size=1M]
> 	Region 2: Memory at df80 (64-bit, prefetchable) [size=8M]
> 	Region 4: Memory at d000 (64-bit, prefetchable) [size=128M]
> 	Capabilities: [40] Power Management version 2
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> 		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [48] Vital Product Data
> 	Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
> 		Address:   Data:
> 	Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
> 		Vector table: BAR=0 offset=00082000
> 		PBA: BAR=0 offset=00082200
> 	Capabilities: [60] Express Endpoint IRQ 0
> 		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
> 		Device: Latency L0s <64ns, L1 unlimited
> 		Device: AtnBtn- AtnInd- PwrInd-
> 		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> 		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
> 		Link: Latency L0s unlimited, L1 unlimited
> 		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
> 		Link: Speed 2.5Gb/s, Width x8
> --->8---

Can you try with fw_cmd_doorbell?

-- 
MST

Re: [openib-general] ib_mthca fails to load with old firmware

2006-05-16 Thread Roland Dreier

Ken> I'm running into a problem when I try to use the OFED RC4
Ken> release on some blade systems that have TopSpin HCA daughter
Ken> cards installed (actually Mellanox). I'm trying to figure out
Ken> how to update the firmware to the latest [
Ken> http://mellanox.com/support/firmware_table.php ] but it seems
Ken> I must know the PSID so I can grab the right firmware
Ken> image. Can anyone point me in the right direction here?

For blade HCAs you should contact the HCA vendor for firmware updates.

You could try passing the module option "fw_cmd_doorbell=0" to
ib_mthca.  That may work around things.

 - R.

[openib-general] mpirun_mpd crashing

2006-05-16 Thread Liang Peng


Hi there,

Not sure whether this is the proper place to post, but we are encountering
mpirun_mpd crashes while testing Voltaire MPI (based on MVAPICH) with the
Sun Studio 11 compilers on SuSE Linux 9 SP3 (Opteron). I hope someone can
provide some hints:

MVAPICH version: 0.9.4 with Voltaire's modifications
Compiler used: Sun Studio 11
Problem:

When using the mpd version of MVAPICH, mpirun crashes with the following:

> mpirun_mpd -np 2 /usr/voltaire/mpi.cc.mpd/bin/cpi
[man_0]: [cli_0]: client_bnr_get failed
[cli_1]: MPD_Man_msg_handler received unexpected msg 
:cmd=client_bnr_get_output val=apstc-g4:00024400:

:
handle_lhs_msgs_input: failed for bnr_get: buf=:cmd=bnr_get src=man_0 
dest=man_0 bcast=true attr=MVAPICH_0001\^ gid=0

:
[man_0]: application program exited abnormally with status 0
[man_0]: application program signaled with signal 11 (: Segmentation fault)

The "rsh" version works properly, and the gcc-compiled version of
mpd works on the same machine.


Thanks!


Regards, 
Liang Peng


--
Research Scientist
Large Scale Computing
Asia Pacific Science & Technology Center
Sun Microsystems, Inc. 
and

Nanyang Technological University, Singapore



[openib-general] ib_mthca fails to load with old firmware

2006-05-16 Thread Ken L Johnson

I'm running into a problem when I try to use the OFED RC4 release on some 
blade systems that have TopSpin HCA daughter cards installed (actually 
Mellanox). I'm trying to figure out how to update the firmware to the latest 
[ http://mellanox.com/support/firmware_table.php ] but it seems I must know 
the PSID so I can grab the right firmware image. Can anyone point me in the 
right direction here?

---8<--- [query device using flint]

blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q
Image type:  Failsafe
I.S. Version:1
Chip Revision:   A0
GUID Des:Node Port1Port2Sys image
GUIDs:   0005ad02ad1d 0005ad02ad1e 0005ad02ad1f 0005ad000100d050
Board ID:1
VSD: 1
PSID:

--->8---

---8<--- [dmesg output showing ib_mthca load failure]

  <6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
  <6>ib_mthca: Initializing :02:00.0
  <6>ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
  <7>PCI: Setting latency timer of device :02:00.0 to 64
  <6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
  <4>ib_mthca :02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current).
  <4>ib_mthca :02:00.0: If you have problems, try updating your HCA FW.
  <3>ib_mthca :02:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
  <3>ib_mthca :02:00.0: BIOS or ACPI interrupt routing problem?
  <6>ACPI: PCI interrupt for device :02:00.0 disabled   
  <4>ib_mthca: probe of :02:00.0 failed with error -16

--->8---


---8<--- [hwinfo & lspci output for HCA]

blade9:~ # hwinfo
[...]
24: PCI 200.0: 0c06 InfiniBand
  [Created at pci.277]
  Unique ID: B35A.guWNc33i6_3
  Parent ID: 8otl.l6V0RupyGX6
  SysFS ID: /devices/pci:00/:00:04.0/:02:00.0
  SysFS BusID: :02:00.0
  Hardware Class: unknown
  Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
  Vendor: pci 0x15b3 "Mellanox Technologies"
  Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)"
  SubVendor: pci 0x15b3 "Mellanox Technologies"
  SubDevice: pci 0x6278
  Revision: 0xa0
  Memory Range: 0xfe90-0xfe9f (rw,non-prefetchable)
  Memory Range: 0xdf80-0xdfff (rw,prefetchable)
  Memory Range: 0xd000-0xd7ff (rw,prefetchable)
  IRQ: 169 (no events)
  Module Alias: "pci:v15B3d6278sv15B3sd6278bc0Csc06i00"
  Driver Info #0:
Driver Status: ib_mthca is active
Driver Activation Cmd: "modprobe ib_mthca"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #17 (PCI bridge)

blade9:~ # lspci -vv
[...]
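02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
	Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 169
	Region 0: Memory at fe90 (64-bit, non-prefetchable) [size=1M]
	Region 2: Memory at df80 (64-bit, prefetchable) [size=8M]
	Region 4: Memory at d000 (64-bit, prefetchable) [size=128M]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
	Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
		Address:   Data:
	Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
		Vector table: BAR=0 offset=00082000
		PBA: BAR=0 offset=00082200
	Capabilities: [60] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
		Device: Latency L0s <64ns, L1 unlimited
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
		Link: Latency L0s unlimited, L1 unlimited
		Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
		Link: Speed 2.5Gb/s, Width x8
--->8---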

Regards,
-- 
Ken L Johnson  <[EMAIL PROTECTED]>

[openib-general] AMSO1100 + uverbs: ping and opensm errors after installation

2006-05-16 Thread Kasten, ChristopherX B

Hello,

I am having trouble getting the AMSO1100 Ethernet card to work with uverbs.

I have installed uverbs following the Installation Cheat Sheet
(https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet),
substituting amso where mthca is listed (for the most part). I have also
updated the AMMASSO firmware using the file from
http://www.opengridcomputing.com/downloads/ogc_amso_kit_20060308.tgz.

My Linux kernel is 2.6.16.15, and the ib… & iw_c2 modules are loading
successfully at boot.

When I try to ping another AMMASSO machine I get the following output:

ping: sndmsg: Network is down

accompanied by a dmesg report:

  Virtual device iw1 asks to queue packet!

The following is printed when running ibv_devinfo:

hca_id: amso0
  fw_ver:   1.1.1
  node_guid:    000d:b200:0845:
  sys_image_guid:   000d:b200:0844:
  vendor_id:    0x
  vendor_part_id:   0
  hw_ver:   0x0
  board_id:     AMSO1100 Board ID
  phys_port_cnt:    1
    port: 1
  state:    PORT_ACTIVE (4)
  max_mtu:  4096 (5)
  active_mtu:   512 (2)
  sm_lid:   0
  port_lid: 0
  port_lmc: 0x00

It looks like some values are not being initialized.

Lastly, running opensm prints:

-
OpenSM Rev:openib-1.2.0
Command Line Arguments:
 Log File: /var/log/osm.log
-
OpenSM Rev:openib-1.2.0

Using default guid 0x0
Error: Could not get port guid
Exiting SM

Does anyone know which step(s) I’ve missed in correctly setting up my network?

Thanks for the help,

Chris


Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading

2006-05-16 Thread Roland Dreier

BTW, I think the patch below is correct as well.  This avoids problems
where the SRP driver waits forever for a completion, for example if
sending the DREQ fails because the connection has already been
disconnected by the target.

Does this scenario seem like the deadlock you thought you saw?

--- linux-kernel/infiniband/ulp/srp/ib_srp.c	(revision 7245)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -342,7 +342,10 @@ static void srp_disconnect_target(struct
 	/* XXX should send SRP_I_LOGOUT request */
 
 	init_completion(&target->done);
-	ib_send_cm_dreq(target->cm_id, NULL, 0);
+	if (ib_send_cm_dreq(target->cm_id, NULL, 0)) {
+		printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
+		return;
+	}
 	wait_for_completion(&target->done);
 }
 
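
The reasoning behind this patch is that wait_for_completion() returns only after the CM callback signals target->done, so if ib_send_cm_dreq() fails and no DREQ goes out, nothing will ever signal it. A minimal userspace sketch of that control flow, assuming POSIX threads; struct completion_sketch, send_dreq_fn and disconnect_target() are hypothetical stand-ins for the kernel completion API, ib_send_cm_dreq() and srp_disconnect_target(), not the SRP code itself:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Called by whoever receives the reply (the CM callback in SRP). */
static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Blocks until complete_sketch() has been called: forever, if never. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request sender: returns 0 if the request went out (the
 * reply will eventually call complete_sketch()), nonzero otherwise. */
typedef int (*send_dreq_fn)(struct completion_sketch *done);

/* Mirrors the patched control flow: wait only if the send succeeded. */
static int disconnect_target(send_dreq_fn send_dreq,
                             struct completion_sketch *done)
{
    init_completion_sketch(done);
    if (send_dreq(done)) {
        fprintf(stderr, "Sending DREQ failed\n");
        return -1;
    }
    wait_for_completion_sketch(done);
    return 0;
}

/* Example senders for illustration only. */
static int send_dreq_ok(struct completion_sketch *done)
{
    complete_sketch(done);      /* pretend the reply arrived at once */
    return 0;
}

static int send_dreq_fails(struct completion_sketch *done)
{
    (void)done;
    return -1;                  /* e.g. the target already disconnected */
}
```

Without the early return, disconnect_target(send_dreq_fails, ...) would block in wait_for_completion_sketch() indefinitely, which is the hang the patch avoids.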

Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading

2006-05-16 Thread Roland Dreier

 > +/*
 > + * We need 2 scsi_host_put becuase there are two get:
 > + *  in scsi_host_alloc and in scsi_add_host
 > + */
 > +scsi_host_put(target->scsi_host);
 >  scsi_host_put(target->scsi_host);

Hmm, this doesn't seem right to me.  If I try this, then I get a crash
because the scsi_host is already gone after the first put.  I verified
that the reference count is 1 before these puts, and with the
unmodified module I don't see anything left in /sys/class/scsi_host
after unloading the module.

What kernel are you seeing problems with?  I'm testing with an
up-to-date git kernel, although I doubt it makes a difference (did
SCSI reference counting change recently??).

I do think there are some extra scsi_host_put() calls in
srp_remove_work() -- I think the double scsi_host_put() dates back to
a version (which I may never even have checked in) where there was a
scsi_host_get() to avoid the scsi_host going away between the
schedule_work() and srp_remove_work() actually running.

So the patch below seems correct to me.

What do you think?

--- linux-kernel/infiniband/ulp/srp/ib_srp.c	(revision 7245)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -353,7 +356,6 @@ static void srp_remove_work(void *target
 	spin_lock_irq(target->scsi_host->host_lock);
 	if (target->state != SRP_TARGET_DEAD) {
 		spin_unlock_irq(target->scsi_host->host_lock);
-		scsi_host_put(target->scsi_host);
 		return;
 	}
 	target->state = SRP_TARGET_REMOVED;
@@ -367,8 +369,6 @@ static void srp_remove_work(void *target
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
 	scsi_host_put(target->scsi_host);
-	/* And another put to really free the target port... */
-	scsi_host_put(target->scsi_host);
 }
 
 static int srp_connect_target(struct srp_target_port *target)
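
Roland's argument comes down to pairing every put with an earlier get: if only the allocation holds a reference (he saw a count of 1 before the two puts), the second put releases a reference nobody owns. The same rule explains the early-return hunk: with no get taken before schedule_work(), that path has nothing to put. A minimal sketch of the bookkeeping with a hypothetical struct host_sketch; this is not the SCSI midlayer's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a Scsi_Host. */
struct host_sketch {
    int refcount;
};

/* Allocation hands the caller one reference. */
static struct host_sketch *host_alloc(void)
{
    struct host_sketch *h = malloc(sizeof(*h));
    if (h != NULL)
        h->refcount = 1;
    return h;
}

/* Take an extra reference, e.g. before deferring work that uses h. */
static void host_get(struct host_sketch *h)
{
    h->refcount++;
}

/* Drop one reference; frees h and returns true when it was the last.
 * The assertion only catches an extra put while h is still alive.
 * Once h has been freed, another put is a use-after-free, which is
 * the kind of crash an unbalanced scsi_host_put() produces. */
static bool host_put(struct host_sketch *h)
{
    assert(h->refcount > 0);
    if (--h->refcount == 0) {
        free(h);
        return true;
    }
    return false;
}
```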

Re: [openib-general] opensm segfault?

2006-05-16 Thread Sasha Khapyorsky

Hi Troy,

On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..

Is this reproducible, or is it a completely random failure?

> (gdb) bt
> #0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> count=64) at cl_memory_osd.c:87
> #1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
> p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> #2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> at osm_pkey_mgr.c:407
> #3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> signal=3)
> at osm_state_mgr.c:2243
> #4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> #5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> at cl_dispatcher.c:108
> #6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> at cl_threadpool.c:78
> #7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> cl_thread.c:61
> #8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> #9  0x2b90b12c8273 in clone () from /lib/libc.so.6
> 
> 
> 
> And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> just seems like excessive unneeded abstraction.

Absolutely agree with you.

Sasha.

> I'm running opensm from subversion rev 7091.. 
> 
> May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> 6251:7091M
> 
> the only local changes are as follows:
> 
> [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
> Index: osm/opensm/osm_port_info_rcv.c
> ===
> --- osm/opensm/osm_port_info_rcv.c  (revision 7091)
> +++ osm/opensm/osm_port_info_rcv.c  (working copy)
> @@ -469,9 +469,14 @@
>goto Exit;
>  }
> 
> +#if 0
>  /* Check for IBM eHCA firmware defect in reporting partition
>  * enforcement cap */
>  if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
> IBM_VENDOR_ID)
>p_switch->switch_info.enforce_cap = 0;
> +#endif
> +/* Check for busted divergenet switch on ameslab network */
> +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
> +   p_switch->switch_info.enforce_cap = 0;
> 
>  /* Bail out if this is a switch with no partition enforcement
>  * capability */
>  if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

Re: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane

2006-05-16 Thread Arlin Davis


Bryan O'Sullivan wrote:

> Increase the limits on some maximum values.

I noticed an rdma/message max size limitation of 4096 the last time I ran
some dapl tests. Are there plans to increase it, or did I miss it somewhere
in all the patches?


Here are the max values returned from the ipath ibv_query_device:

query_hca: (ver=20401) ep 65535 ep_q 65535 evd 65535 evd_q 65535
query_hca: msg 4096 rdma 4096 iov 255 lmr 65535 rmr 0
query_hca:  dto 65535 iov 255 rdma i1,o1

Thanks,

-arlin



[openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel

2006-05-16 Thread Scott A. Friedman


Hi

I have been trying to build OFED-1.0-rc4 on FC5 as well. MVAPICH builds 
if you fix the error - strndup should probably be strdup. Simple fix.


We have found that only iser, open-iscsi, mpitests and ibutils do not 
build right now for us. We do not need iser or open-iscsi so are not 
going to spend time on those - mpitests and ibutils would be nice.


Scott
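
For reference, strdup(s) copies a whole string while strndup(s, n) copies at most n bytes, so calling strndup with a single argument produces the "too few arguments to function 'strndup'" error seen in the MVAPICH build on FC5. The accompanying "implicit declaration" warnings suggest strndup was not declared at all; it was a GNU extension until POSIX.1-2008, so glibc of that era declared it only when _GNU_SOURCE was defined. A minimal sketch of both calls, using a made-up "host:procs" hostfile line; split_host_entry is a hypothetical helper, not MVAPICH's read_hostfile:

```c
/* Must precede every #include so that <string.h> declares strndup(). */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "host:procs" at the first ':'.
 * The host part is copied with the two-argument strndup(); the rest
 * is copied whole with the one-argument strdup().  Returns 0 on
 * success, -1 if either allocation fails. */
static int split_host_entry(const char *line, char **host, char **rest)
{
    const char *colon = strchr(line, ':');
    size_t host_len = colon ? (size_t)(colon - line) : strlen(line);

    *host = strndup(line, host_len);
    *rest = strdup(colon ? colon + 1 : "");
    if (*host == NULL || *rest == NULL) {
        free(*host);
        free(*rest);
        return -1;
    }
    return 0;
}
```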

Re: [openib-general] opensm segfault?

2006-05-16 Thread Hal Rosenstock

On Tue, 2006-05-16 at 16:10, Roland Dreier wrote:
> Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
> Troy> anyway?  This just seems like excessive unneeded abstraction.
> 
> Hal> It's part of the component library, which is an OS
> Hal> abstraction layer.
> 
> memcpy() is specified by the ISO C standard, so it seems pretty silly
> to abstract this.  Is there any platform that opensm could conceivably
> run on that doesn't supply memcpy()?

OK. I'll work up a patch to eliminate this if there are no objections.

-- Hal


Re: [openib-general] opensm segfault?

2006-05-16 Thread Roland Dreier

Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
Troy> anyway?  This just seems like excessive unneeded abstraction.

Hal> It's part of the component library, which is an OS
Hal> abstraction layer.

memcpy() is specified by the ISO C standard, so it seems pretty silly
to abstract this.  Is there any platform that opensm could conceivably
run on that doesn't supply memcpy()?

 - R.

Re: [openib-general] opensm segfault?

2006-05-16 Thread Hal Rosenstock

Hi Troy,

On Tue, 2006-05-16 at 15:41, Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..
> 
> 
> (gdb) bt
> #0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
  ^
This is the problem. Not sure why yet.

> count=64) at cl_memory_osd.c:87
> #1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
> p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> #2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> at osm_pkey_mgr.c:407
> #3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> signal=3)
> at osm_state_mgr.c:2243
> #4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> #5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> at cl_dispatcher.c:108
> #6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> at cl_threadpool.c:78
> #7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> cl_thread.c:61
> #8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> #9  0x2b90b12c8273 in clone () from /lib/libc.so.6
> 
> 
> 
> And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> just seems like excessive unneeded abstraction.

It's part of the component library, which is an OS abstraction layer.

> I'm running opensm from subversion rev 7091.. 
> 
> May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> 6251:7091M
> 
> the only local changes are as follows:
> 
> [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
> Index: osm/opensm/osm_port_info_rcv.c
> ===
> --- osm/opensm/osm_port_info_rcv.c  (revision 7091)
> +++ osm/opensm/osm_port_info_rcv.c  (working copy)
> @@ -469,9 +469,14 @@
>goto Exit;
>  }
> 
> +#if 0
>  /* Check for IBM eHCA firmware defect in reporting partition
>  * enforcement cap */
>  if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
> IBM_VENDOR_ID)
>p_switch->switch_info.enforce_cap = 0;
> +#endif
> +/* Check for busted divergenet switch on ameslab network */
> +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
> +   p_switch->switch_info.enforce_cap = 0;
> 
>  /* Bail out if this is a switch with no partition enforcement
>  * capability */
>  if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

Yes, that's fine.

-- Hal
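
The local patch identifies one switch by its node GUID, which the SM keeps in network byte order, so it converts with cl_ntoh64() before comparing against a host-order constant. A minimal sketch of the same comparison, using glibc's be64toh() from <endian.h> in place of the complib macro; is_quirky_switch and the GUID value are made up for illustration:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes be64toh() in <endian.h> on glibc */
#endif
#include <endian.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical GUID of the misbehaving switch, in host byte order. */
#define QUIRKY_SWITCH_GUID UINT64_C(0x0011223344556677)

/* wire_guid holds the 8 GUID bytes exactly as received from the
 * fabric (big-endian).  Convert to host order before comparing with
 * a host-order constant, as the patch does with cl_ntoh64(). */
static bool is_quirky_switch(const uint8_t wire_guid[8])
{
    uint64_t net;

    memcpy(&net, wire_guid, sizeof(net));
    return be64toh(net) == QUIRKY_SWITCH_GUID;
}
```

Comparing the raw network-order value against the constant would only match on big-endian hosts, which is why the conversion matters on x86.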


[openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

2006-05-16 Thread Christoph Hellwig

On Mon, May 15, 2006 at 02:21:21PM -0700, Bryan O'Sullivan wrote:
> On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote:
> 
> > Actually I NAK'ed this patch.  It compiles the same thing on x86_64
> > but makes the source code wrong -- dma_map_single() returns a bus
> > address, not a physical address.
> 
> As Segher mentioned, bus_to_virt is unportable, so it's definitely the
> wrong thing to use.

phys_to_virt is as bad.  Please fix your code to do the right thing, that
is, to stop pretending to be able to map back from a bus address to a virtual address.
The only way to get at the virtual address from a bus one is to store it
away at the time you call the dma mapping function.
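
A minimal sketch of what Christoph describes: keep the CPU address next to the bus address from the moment the buffer is mapped, and look it up later instead of converting back. This is userspace C for illustration only; fake_dma_map() is a hypothetical stand-in for dma_map_single(), and the fixed-size table is an assumption, not how the ipath driver tracks its buffers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;   /* hypothetical stand-in for dma_addr_t */

/* One record per mapped buffer: both addresses, saved at map time. */
struct mapped_buf {
    void *cpu_addr;
    bus_addr_t bus_addr;
    size_t len;
};

#define MAX_MAPPED 16

struct mapping_table {
    struct mapped_buf bufs[MAX_MAPPED];
    size_t count;
};

/* Hypothetical mapper.  A real driver would call dma_map_single(),
 * whose result need not have any fixed arithmetic relation to the
 * CPU address (for example behind an IOMMU). */
static bus_addr_t fake_dma_map(void *cpu_addr)
{
    return ((bus_addr_t)(uintptr_t)cpu_addr) ^ UINT64_C(0x5a5a000000000000);
}

/* Map a buffer and remember both addresses.  Returns 0, or -1 if full. */
static int map_and_record(struct mapping_table *t, void *cpu_addr, size_t len)
{
    if (t->count == MAX_MAPPED)
        return -1;
    t->bufs[t->count].cpu_addr = cpu_addr;
    t->bufs[t->count].bus_addr = fake_dma_map(cpu_addr);
    t->bufs[t->count].len = len;
    t->count++;
    return 0;
}

/* Recover the CPU address for a bus address the device handed back. */
static void *lookup_cpu_addr(const struct mapping_table *t, bus_addr_t bus)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->bufs[i].bus_addr == bus)
            return t->bufs[i].cpu_addr;
    return NULL;
}
```

A real driver would usually store the pair inside its own descriptor structure rather than searching a table; the point is only that the CPU address is recorded, never derived.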


[openib-general] Re: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel

2006-05-16 Thread Pavel Shamis (Pasha)


This issue is already fixed in rc5.

Regards,
Pasha.

Scott Weitzenkamp (sweitzen) wrote:

Actually, I spoke too soon.   Kernel components compiled, but MVAPICH
did not:

Compiling MVAPICH ...
2
mpirun_rsh.c: In function 'read_hostfile':
mpirun_rsh.c:1197: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1205: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: error: too few arguments to function 'strndup'
make[3]: *** [mpirun_rsh] Error 1
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2
Error in compiling MVAPICH. Check the log file: make.mvapich.log
Exiting 
Mvapich installation failed

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Weitzenkamp (sweitzen)
Sent: Tuesday, May 16, 2006 10:48 AM
To: Michael S. Tsirkin
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel


After running "yum update", I was able to compile OFED 1.0 rc4 on
2.6.16-1.2111_FC5 kernel.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


-Original Message-
From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 11, 2006 9:37 AM
To: Scott Weitzenkamp (sweitzen)
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel

Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>:

Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel

Is this a useful kernel to try, or should get latest FC5
kernel or 2.6.16 from kernel.org?

I think you should go to latest update.

--
MST


___
openfabrics-ewg mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openfabrics-ewg



[openib-general] opensm segfault?

2006-05-16 Thread Troy Benjegerdes

I got this after an indeterminate amount of time running opensm..


(gdb) bt
#0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
count=64) at cl_memory_osd.c:87
#1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
#2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
at osm_pkey_mgr.c:407
#3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
signal=3)
at osm_state_mgr.c:2243
#4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
#5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
at cl_dispatcher.c:108
#6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
at cl_threadpool.c:78
#7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
cl_thread.c:61
#8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
#9  0x2b90b12c8273 in clone () from /lib/libc.so.6



And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
just seems like excessive, unneeded abstraction.
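In frame #0 of the backtrace, cl_memcpy() was called with p_src=0x0 and count=64. Troy notes that cl_memcpy() just calls memcpy(), so this is a memcpy() from a NULL pointer. That is undefined behaviour in ISO C. The 64 bytes match one 32-entry P_Key table block. Below is a minimal sketch of a caller-side guard. `copy_pkey_block()` and `PKEY_BLOCK_BYTES` are hypothetical names, and this is not the actual osm_pkey.c code or a confirmed fix for the crash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One P_Key table block: 32 16-bit entries = 64 bytes. */
#define PKEY_BLOCK_BYTES 64

/* Hypothetical helper: copy one block only when the source exists.
 * Returns 0 on success, -1 if the source block was never allocated. */
static int copy_pkey_block(void *dst, const void *src, unsigned block_num)
{
    assert(dst != NULL);
    if (src == NULL) {
        /* memcpy(dst, NULL, n) is undefined behaviour no matter how
         * thin the wrapper around it is; report the gap instead. */
        fprintf(stderr, "pkey block %u has no source data\n", block_num);
        return -1;
    }
    memcpy(dst, src, PKEY_BLOCK_BYTES);
    return 0;
}
```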

I'm running opensm from subversion rev 7091.. 

May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn
6251:7091M

the only local changes are as follows:

[EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
Index: osm/opensm/osm_port_info_rcv.c
===
--- osm/opensm/osm_port_info_rcv.c  (revision 7091)
+++ osm/opensm/osm_port_info_rcv.c  (working copy)
@@ -469,9 +469,14 @@
   goto Exit;
 }

+#if 0
 /* Check for IBM eHCA firmware defect in reporting partition
 * enforcement cap */
 if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) ==
IBM_VENDOR_ID)
   p_switch->switch_info.enforce_cap = 0;
+#endif
+/* Check for busted divergenet switch on ameslab network */
+if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
+   p_switch->switch_info.enforce_cap = 0;

 /* Bail out if this is a switch with no partition enforcement
 * capability */
 if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

[openib-general] Re: [PATCH] RDMA CM: updates to 2.6.18 branch

2006-05-16 Thread Roland Dreier

OK, the for-2.6.18 branch is updated with all of this.

[openib-general] RE: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel

2006-05-16 Thread Scott Weitzenkamp (sweitzen)

Actually, I spoke too soon.   Kernel components compiled, but MVAPICH
did not:

Compiling MVAPICH ...
2
mpirun_rsh.c: In function 'read_hostfile':
mpirun_rsh.c:1197: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1205: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: warning: incompatible implicit declaration of built-in function 'strndup'
mpirun_rsh.c:1220: error: too few arguments to function 'strndup'
make[3]: *** [mpirun_rsh] Error 1
Exit status from make was 2
make[2]: *** [mpilib] Error 1
make[1]: *** [mpi-modules] Error 2
make: *** [mpi] Error 2
Error in compiling MVAPICH. Check the log file: make.mvapich.log
Exiting 
Mvapich installation failed
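The three warnings mean no prototype for strndup() was in scope. The error at line 1220 means that call passes fewer than the two arguments strndup() takes. On the glibc of this era, strndup() was a GNU extension that <string.h> declared only when _GNU_SOURCE was defined before the first system header. Newer glibc also exposes it for POSIX.1-2008. Below is a minimal sketch of correct usage. It is not the change that went into rc5, and the helper name and the "host:count" line format are hypothetical.

```c
/* Must come before any system header so that glibc declares strndup(). */
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of the host name from a
 * "host:count" line, stopping at the first ':' or newline. The caller
 * frees the result. */
static char *host_from_line(const char *line)
{
    assert(line != NULL);
    size_t len = strcspn(line, ":\n");
    /* strndup() takes two arguments: the source and a maximum length. */
    return strndup(line, len);
}
```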

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Scott Weitzenkamp (sweitzen)
> Sent: Tuesday, May 16, 2006 10:48 AM
> To: Michael S. Tsirkin
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on 
> orig FC5 kernel
> 
> After running "yum update", I was able to compile OFED 1.0 rc4 on
> 2.6.16-1.2111_FC5 kernel.
> 
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> > -Original Message-
> > From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, May 11, 2006 9:37 AM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: [EMAIL PROTECTED]; openib-general@openib.org
> > Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel
> > 
> > Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>:
> > > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel
> > > 
> > > Is this a useful kernel to try, or should get latest FC5
> > > kernel or 2.6.16 from kernel.org?
> > 
> > I think you should go to latest update.
> > 
> > -- 
> > MST
> > 

[openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel

2006-05-16 Thread Scott Weitzenkamp (sweitzen)

After running "yum update", I was able to compile OFED 1.0 rc4 on
2.6.16-1.2111_FC5 kernel.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -Original Message-
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, May 11, 2006 9:37 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: [EMAIL PROTECTED]; openib-general@openib.org
> Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel
> 
> Quoting r. Scott Weitzenkamp (sweitzen) <[EMAIL PROTECTED]>:
> > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel
> > 
> > Is this a useful kernel to try, or should get latest FC5
> > kernel or 2.6.16 from kernel.org?
> 
> I think you should go to latest update.
> 
> -- 
> MST
> 

[openib-general] Re: Re: [PATCH] RE: compliancy issue?

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: Re: [PATCH] RE: compliancy issue?
> 
> >OK, I just tested and this works for me. Here's the SDP patch to do what 
> >you
> >described. The code actually got cleaner now: its convenient to get
> >different events on active versus passive side - previously I had
> >to check a flag to figure out what does ESTABLISHED mean.
> 
> I committed the CMA patch.
> 

Ditto for the SDP update.

-- 
MST

[openib-general] Heads-up for anyone using certain thunderbird message filter features

2006-05-16 Thread Roger Heflin


Off topic, but probably very important to people using
Thunderbird as an email client.

I am using certain Thunderbird message filter features, mainly
move to folder and then delete from the POP server (this is done
as a single step).

I am on 2 mailing lists that received the same patch set. Of the
54 patch emails, 15 went to one folder (kernel), 41
went to the other folder (openib), and 3 went to both.

So anyone using this should watch out, as the filter does act oddly. I suspect
that since the message ID is the same, the filter may be using it to
delete the message, and that may cause it to get both messages. The emails
that I got both copies of were delayed by quite a bit and very likely came in
on different email downloads, so the other copy was not there to delete.

  Roger

Re: [openib-general] FTP over SDP is not working fine-[newbie]

2006-05-16 Thread Grant Grundler

On Tue, May 16, 2006 at 08:15:31AM +0100, keshetti mahesh wrote:
> my lisdp.conf file is:
> 
> match listen *:*
> match destination *:*
> match program *
> 
> so that all services can be allowed both on the client and server sides

Both client and server have a libsdp.conf file.
Do they both have the above content?

(They should, but your comment above suggests only one libsdp.conf
file is being used.)

> 
> after exporting that file and LD_PRELOAD=/usr/lib/lib64/libsdp.so, i   have 
> restarted all services (vsftpd, xinetd etc) 
> 
> again the same problem with FTP .network unreachable

I'll assume "ping" does work.

My next suggestion is to stop the FTP server and manually invoke
it to listen on a different port (proftpd takes -p  parameter):
/etc/init.d/proftpd stop
LD_PRELOAD=/usr/lib/lib64/libsdp.so proftpd -p 20022

Using another login, confirm the ftp server is listening on 
port 20022 (netstat -a) and is using SDP (cat /proc//maps 
or something like that).

Then from the client, try to talk to that server with
LD_PRELOAD=/usr/lib/lib64/libsdp.so ftp 192.168.2.99 20022
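If the FTP daemon still misbehaves, a trivial listener run under the same LD_PRELOAD can show whether libsdp handles the server-side socket(), bind() and listen() calls at all. This takes the FTP code out of the picture. Below is a minimal sketch, assuming IPv4. The function names are hypothetical, and port 20022 from the commands above is only an example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listening socket on the given port (0 = let the kernel
 * choose). Run under LD_PRELOAD=.../libsdp.so, these are the same
 * calls libsdp sees from an FTP server. Returns the fd, or -1. */
static int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1; /* best effort: allow quick restarts on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the port actually bound, to compare with netstat -a output. */
static int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Build this into a small program whose main() calls open_listener(20022) and then pause(). Start it with the same LD_PRELOAD=/usr/lib/lib64/libsdp.so prefix used for proftpd, and check it as described above.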

> but the i can't understand y only this is giving problem (the other 
> applications are not giving any problem)

Sorry - I don't understand that either.
If my above suggestion doesn't work, perhaps try a different
ftp server or different ftp client?


grant

[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Steve Wise


> Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like
> not a good idea. Need to think back to why do we need this at all.
> 

You need it to keep the connection alive until both client and server
have finished running the test, in the case of full duplex tests...
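Steve's point is that in full-duplex runs, neither side may disconnect until both have finished. The non-CMA version does this end-of-test synchronisation over the TCP socket, via pp_client_exch_dest()/pp_server_exch_dest(). Below is a minimal sketch of the same idea as a one-byte exchange over any connected stream socket. The function name and the 'D' marker byte are hypothetical, and this is not the perftest code.

```c
#include <assert.h>
#include <sys/socket.h> /* socketpair()/AF_UNIX for trying it locally */
#include <unistd.h>

/* Hypothetical end-of-test barrier: tell the peer we are done, then wait
 * until the peer says the same, so neither side tears down the
 * connection while the other is still transferring. Returns 0 on
 * success, -1 on an I/O error or if the peer closed early. */
static int finish_barrier(int fd)
{
    char done = 'D', peer = 0;

    if (write(fd, &done, 1) != 1)
        return -1;
    if (read(fd, &peer, 1) != 1 || peer != 'D')
        return -1;
    return 0;
}
```

Both sides call finish_barrier() after their last transfer and only then disconnect. The side that finishes first simply blocks in read() until the other catches up.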




[openib-general] RE: need help regarding IB core software

2006-05-16 Thread Sean Hefty

Please post generic questions to the openib
mailing list.

> i have started working over infiniband recently
> i want to develop a sample utility that would perform simple RDMA (read/write)
> operations

There are some test applications that can be
used as a base.  Are you wanting a userspace or kernel application?

> i will be thankful to you if u can refer me some documents,atticles or books
> where i can get this information

The best documentation is the IB
architecture specification.  However, I think that the test apps are
simple enough to help you here.

- Sean

Re: [openib-general] Re: [PATCH] RE: compliancy issue?

2006-05-16 Thread Sean Hefty


OK, I just tested and this works for me. Here's the SDP patch to do what you
described. The code actually got cleaner now: its convenient to get
different events on active versus passive side - previously I had
to check a flag to figure out what does ESTABLISHED mean.


I committed the CMA patch.

- Sean

[openib-general] Re: sdp with kernel 2.6.16.14

2006-05-16 Thread Michael S. Tsirkin

Quoting r. amit byron <[EMAIL PROTECTED]>:
> Subject: Re: sdp with kernel 2.6.16.14
> 
> 
> Michael,
> 
> netperf works with sdp.
> 
> slabinfo output:
> slabinfo - version: 2.1 (statistics)
> # name
>  : tunables: slabdata 
>: globalstat   
>   : cpustat 
>
> SDP0  0   117262 : tunables   24   128 : 
> slabdata  0  0  0 : globalstat 104 1811   11  
>  0000 : cpustat  3 17 20  0
> fib6_nodes 7 92 40   921 : tunables   32   168 : 
> slabdata  1  1  0 : globalstat  32 21 10  
>  0000 : cpustat  5  2  0  0
> ip6_dst_cache  9 17228   171 : tunables   32   168 : 
> slabdata  1  1  0 : globalstat  36 17 10  
>  0000 : cpustat 14  3  8  0
> ndisc_cache2 22180   221 : tunables   32   168 : 
> slabdata  1  1  0 : globalstat  32 17 10  
>  0000 : cpustat  3  2  3  0
> RAWv6  7 11712   112 : tunables   32   168 : 
> slabdata  1  1  0 : globalstat  11 11 10  
>  0000 : cpustat  6  1  0  0
> UDPv6  1 11684   112 : tunables   32   168 : 
> slabdata  1  1  0 : globalstat  52 22 21  
>  0000 : cpustat 10  5 14  0
> 
> Amit
> 
> "Michael S. Tsirkin" <[EMAIL PROTECTED]> wrote:
> 
> Quoting r. amit byron :
> > Subject: sdp with kernel 2.6.16.14
> >
> >
> > hi,
> >
> > i'm trying to get sdp work between point-to-point connected
> > machines running kernel 2.6.16.24. i have configured ipoib
> > and trying to run iperf using sdp.
> >
> > the client machine has an entry in its libsdp.conf:
> > match destination 192.168.1.2
> >
> > the server machine has na entry in its libsdp.conf:
> > match listen *:5001
> >
> > iperf is started on the server machine using command:
> > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s
> >
> > iperf client is started on the client machine using command:
> > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2
> >
> > the server machine panics with following messages:
> >
> > oom-killer: gfp_mask=0xd0, order=0
> > [] oom-killer: gfp_mask=0xd0, order=0
> > [] out_of_memory+0x155/0x180
> > [] __alloc_pages+0x2a5/0x320
> > [] __get_free_pages+0x1e/0x40
> > [] __pollwait+0x80/0xd0
> > [] pipe_poll+0xcd/0xe0
> > [] do_select+0x212/0x480
> > [] cache_free_debugcheck+0x135/0x230
> > [] __pollwait+0x0/0xd0
> > [] core_sys_select+0x1ce/0x2e0
> > [] sys_select+0x51/0x1c0
> > [] sysenter_past_esp+0x54/0x75
> > DMA per-cpu:
> > cpu 0 hot: high 0, batch 1 used:0
> > cpu 0 cold: high 0, batch 1 used:0
> > cpu 1 hot: high 0, batch 1 used:0
> > cpu 1 cold: high 0, batch 1 used:0
> > cpu 2 hot: high 0, batch 1 used:0
> > cpu 2 cold: high 0, batch 1 used:0
> > cpu 3 hot: high 0, batch 1 used:0
> > cpu 3 cold: high 0, batch 1 used:0
> > DMA32 per-cpu: empty
> > Normal per-cpu:
> > cpu 0 hot: high 186, batch 31 used:103
> > cpu 0 cold: high 62, batch 15 used:61
> > cpu 1 hot: high 186, batch 31 used:183
> > cpu 1 cold: high 62, batch 15 used:53
> > cpu 2 hot: high 186, batch 31 used:28
> > cpu 2 cold: high 62, batch 15 used:54
> > cpu 3 hot: high 186, batch 31 used:63
> > cpu 3 cold: high 62, batch 15 used:60
> > HighMem per-cpu:
> > cpu 0 hot: high 186, batch 31 used:176
> > cpu 0 cold: high 62, batch 15 used:13
> > cpu 1 hot: high 186, batch 31 used:169
> > cpu 1 cold: high 62, batch 15 used:1
> > cpu 2 hot: high 186, batch 31 used:157
> > cpu 2 cold: high 62, batch 15 used:0
> > cpu 3 hot: high 186, batch 31 used:174
> > cpu 3 cold: high 62, batch 15 used:6
> > Free pages: 7366104kB (7358760kB HighMem)
> > Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 
> slab:8970 mapped:4565 pagetables:238
> > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB 
> present:16384kB pages_scanned:8 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 880 8623
> > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB 
> present:0kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 880 8623
> > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB 
> inactive:0kB present:901
> > 120kB pages_scanned:314 all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 61951
> > HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB 
> inactive:19540kB present:7929

RE: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Sagi Rotem

If we have a patch touching just the pp routines, as MST suggested, that would
be nice; I could apply it to all the other performance tests.
Sagi 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin
Sent: Tuesday, May 16, 2006 5:58 PM
To: Steve Wise
Cc: openib-general
Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the
> RDMA CM
> 
> On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote:
> > Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the
> > > RDMA CM
> > > 
> > > I don't know who maintains src/userspace/perftest, but here is a 
> > > patch set that enables rdma_bw and rdma_lat to use the RDMA_CM 
> > > with the addition of the -c or --cma flag.
> > > 
> > 
> > I'm worried that this makes the program too big. Maybe this should 
> > be another test rather than an option?
> > 
> 
> ok.  You want it as a separate pair of programs?

I guess we'll see once there's the minimum patch that only affects the
connection setup.  If the changes can be localised to just the pp
routines, then I think it still fits as part of the same test.

> > > The rkey/addr info is exchanged in the private data, and 
> > > SEND/RECV's are used to sync the client/server before and after
> > > execution.
> > 
> > Do we really need SEND/RECV messages for this?
> > I think I get completion with error once the remote side has
> > disconnected. No?
> > 
> 
> perhaps.  I just thought it was cleaner to synch up at the end.  Just 
> like the non-cma version does over the TCP socket (see
> pp_client_exch_dest() / pp_server_exch_dest() at the end of the test).

Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like not a
good idea. Need to think back to why do we need this at all.

--
MST

[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
> 
> On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote:
> > Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
> > > 
> > > I don't know who maintains src/userspace/perftest, but here is a patch
> > > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the
> > > addition of the -c or --cma flag.
> > > 
> > 
> > I'm worried that this makes the program too big. Maybe this should be
> > another test rather than an option?
> > 
> 
> ok.  You want it as a separate pair of programs?

I guess we'll see once there's the minimum patch that only affects the
connection setup.  If the changes can be localised to just the pp routines, then
I think it still fits as part of the same test.

> > > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used
> > > to sync the client/server before and after execution.
> > 
> > Do we really need SEND/RECV messages for this?
> > I think I get completion with error once the remote side has disconnected. No?
> > 
> 
> perhaps.  I just thought it was cleaner to synch up at the end.  Just
> like the non-cma version does over the TCP socket (see
> pp_client_exch_dest() / pp_server_exch_dest() at the end of the test).

Yes, using pp_client_exch_dest/pp_server_exch_dest now does not look like
a good idea. Need to think back to why we need this at all.

-- 
MST

[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Steve Wise

On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote:
> Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
> > 
> > I don't know who maintains src/userspace/perftest, but here is a patch
> > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the
> > addition of the -c or --cma flag.
> > 
> 
> I'm worried that this makes the program too big. Maybe this should be
> another test rather than an option?
> 

ok.  You want it as a separate pair of programs?

> > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used
> > to sync the client/server before and after execution.
> 
> Do we really need SEND/RECV messages for this?
> I think I get completion with error once the remote side has disconnected. No?
> 

Perhaps.  I just thought it was cleaner to sync up at the end, just
like the non-cma version does over the TCP socket (see
pp_client_exch_dest() / pp_server_exch_dest() at the end of the test).

> > Also, I added -P or --poll to rdma_bw to allow blocking for completion
> > events when none are ready (if you omit -P, it will block when no
> > completion is available, otherwise it will spin).
> 
> Needs to be a separate patch.

ok.


> 
> > Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
> 
> 
> > Index: rdma_lat.c
> > ===
> > --- rdma_lat.c  (revision 7050)
> > +++ rdma_lat.c  (working copy)
> > @@ -53,6 +53,7 @@
> >  #include 
> >  
> >  #include 
> > +#include 
> >  
> >  #include "get_clock.h"
> >  
> > @@ -71,7 +72,8 @@
> > struct ibv_context *context;
> > struct ibv_pd  *pd;
> > struct ibv_mr  *mr;
> > -   struct ibv_cq  *cq;
> > +   struct ibv_cq  *scq;
> > +   struct ibv_cq  *rcq;
> 
> Why are you adding another CQ?
> 

It makes waiting for a recv completion easier since you won't get a send
completion when the CQ is only for receives...


> > struct ibv_qp  *qp;
> > void   *buf;
> > volatile char  *post_buf;
> > @@ -80,6 +82,7 @@
> > int tx_depth;
> > struct ibv_sge list;
> > struct ibv_send_wr wr;
> > +   struct rdma_cm_id  *cm_id;
> >  };
> >  
> >  struct pingpong_dest {
> > @@ -323,16 +326,22 @@
> > return NULL;
> > }
> >  
> > -   ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0);
> > -   if (!ctx->cq) {
> > +   ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0);
> > +   if (!ctx->rcq) {
> > fprintf(stderr, "Couldn't create CQ\n");
> > return NULL;
> > }
> 
> CQ of depth 1?
> 

Yes, there is only ever one outstanding send/recv exchange...




[openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Steve Wise <[EMAIL PROTECTED]>:
> Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
> 
> I don't know who maintains src/userspace/perftest, but here is a patch
> set that enables rdma_bw and rdma_lat to use the RDMA_CM with the
> addition of the -c or --cma flag.
> 

I'm worried that this makes the program too big. Maybe this should be
another test rather than an option?

> The rkey/addr info is exchanged in the private data, and SEND/RECV's are used
> to sync the client/server before and after execution.

Do we really need SEND/RECV messages for this?
I think I get completion with error once the remote side has disconnected. No?

> Also, I added -P or --poll to rdma_bw to allow blocking for completion
> events when none are ready (if you omit -P, it will block when no
> completion is available, otherwise it will spin).

Needs to be a separate patch.

> Signed-off-by: Steve Wise <[EMAIL PROTECTED]>


> Index: rdma_lat.c
> ===
> --- rdma_lat.c(revision 7050)
> +++ rdma_lat.c(working copy)
> @@ -53,6 +53,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #include "get_clock.h"
>  
> @@ -71,7 +72,8 @@
>   struct ibv_context *context;
>   struct ibv_pd  *pd;
>   struct ibv_mr  *mr;
> - struct ibv_cq  *cq;
> + struct ibv_cq  *scq;
> + struct ibv_cq  *rcq;

Why are you adding another CQ?

>   struct ibv_qp  *qp;
>   void   *buf;
>   volatile char  *post_buf;
> @@ -80,6 +82,7 @@
>   int tx_depth;
>   struct ibv_sge list;
>   struct ibv_send_wr wr;
> + struct rdma_cm_id  *cm_id;
>  };
>  
>  struct pingpong_dest {
> @@ -323,16 +326,22 @@
>   return NULL;
>   }
>  
> - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0);
> - if (!ctx->cq) {
> + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0);
> + if (!ctx->rcq) {
>   fprintf(stderr, "Couldn't create CQ\n");
>   return NULL;
>   }

CQ of depth 1?

-- 
MST

[openib-general] PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM

2006-05-16 Thread Steve Wise

I don't know who maintains src/userspace/perftest, but here is a patch
set that enables rdma_bw and rdma_lat to use the RDMA_CM with the
addition of the -c or --cma flag.

The rkey/addr info is exchanged in the private data, and SEND/RECV's are
used to sync the client/server before and after execution.

Also, I added -P or --poll to rdma_bw to allow blocking for completion
events when none are ready (if you omit -P, it will block when no
completion is available, otherwise it will spin).

Signed-off-by: Steve Wise <[EMAIL PROTECTED]>



Index: rdma_lat.c
===
--- rdma_lat.c  (revision 7050)
+++ rdma_lat.c  (working copy)
@@ -53,6 +53,7 @@
 #include 
 
 #include 
+#include 
 
 #include "get_clock.h"
 
@@ -71,7 +72,8 @@
struct ibv_context *context;
struct ibv_pd  *pd;
struct ibv_mr  *mr;
-   struct ibv_cq  *cq;
+   struct ibv_cq  *scq;
+   struct ibv_cq  *rcq;
struct ibv_qp  *qp;
void   *buf;
volatile char  *post_buf;
@@ -80,6 +82,7 @@
int tx_depth;
struct ibv_sge list;
struct ibv_send_wr wr;
+   struct rdma_cm_id  *cm_id;
 };
 
 struct pingpong_dest {
@@ -323,16 +326,22 @@
return NULL;
}
 
-   ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0);
-   if (!ctx->cq) {
+   ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0);
+   if (!ctx->rcq) {
fprintf(stderr, "Couldn't create CQ\n");
return NULL;
}
 
+   ctx->scq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0);
+   if (!ctx->scq) {
+   fprintf(stderr, "Couldn't create CQ\n");
+   return NULL;
+   }
+
{
struct ibv_qp_init_attr attr = {
-   .send_cq = ctx->cq,
-   .recv_cq = ctx->cq,
+   .send_cq = ctx->scq,
+   .recv_cq = ctx->rcq,
.cap = {
.max_send_wr  = tx_depth,
/* Work around:  driver doesnt support
@@ -370,13 +379,6 @@
}
}
 
-   ctx->wr.wr_id  = PINGPONG_RDMA_WRID;
-   ctx->wr.sg_list= &ctx->list;
-   ctx->wr.num_sge= 1;
-   ctx->wr.opcode = IBV_WR_RDMA_WRITE;
-   ctx->wr.send_flags = IBV_SEND_SIGNALED | IBV_SEND_INLINE;
-   ctx->wr.next   = NULL;
-
return ctx;
 }
 
@@ -489,6 +491,467 @@
return 0;
 }
 
+/* CMA STUFF */
+
+static void pp_post_recv(struct pingpong_context *ctx)
+{
+   struct ibv_sge list;
+   struct ibv_recv_wr wr, *bad_wr;
+   int rc;
+
+   list.addr = (uintptr_t) ctx->buf;
+   list.length = 1;
+   list.lkey = ctx->mr->lkey;
+   wr.next = NULL;
+   wr.wr_id = 0xdeadbeef;
+   wr.sg_list = &list;
+   wr.num_sge = 1;
+
+   rc = ibv_post_recv(ctx->qp, &wr, &bad_wr);
+   if (rc) {
+   perror("ibv_post_recv");
+   fprintf(stderr, "%s ibv_post_recv failed %d\n", __FUNCTION__, rc);
+   }
+}
+
+static struct pingpong_context *pp_init_cma_ctx(struct rdma_cm_id *cm_id,
+   unsigned size,
+   int tx_depth, int port)
+{
+   struct pingpong_context *ctx;
+
+   ctx = malloc(sizeof *ctx);
+   if (!ctx)
+   return NULL;
+
+   ctx->size = size;
+   ctx->tx_depth = tx_depth;
+
+   ctx->buf = memalign(page_size, size * 2);
+   if (!ctx->buf) {
+   fprintf(stderr, "Couldn't allocate work buf.\n");
+   return NULL;
+   }
+
+   memset(ctx->buf, 0, size * 2);
+
+   ctx->post_buf = (char*)ctx->buf + (size - 1);
+   ctx->poll_buf = (char*)ctx->buf + (2 * size - 1);
+
+   ctx->cm_id = cm_id;
+   ctx->context = cm_id->verbs;
+   if (!ctx->context) {
+   fprintf(stderr, "%s Unbound cm_id!!\n", __FUNCTION__);
+   return NULL;
+   }
+
+   ctx->pd = ibv_alloc_pd(ctx->context);
+   if (!ctx->pd) {
+   fprintf(stderr, "Couldn't allocate PD\n");
+   return NULL;
+   }
+
+/* We dont really want IBV_ACCESS_LOCAL_WRITE, but IB spec says:
+ * The Consumer is not allowed to assign Remote Write or Remote Atomic to
+ * a Memory Region that has not been assigned Local Write. */
+   ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2,
+IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE);
+   if (!ctx->mr) {
+   fprintf(stderr, "Couldn't allocate MR\n");
+   return NULL;
+   }
+
+   ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0);
+   if (!ctx->rcq) {
+   fprintf(stderr, "Couldn't create RCQ\n");
+   return NULL;
+   }
+
+   ctx

Re: [openib-general] RDMA enabled NICs- newbie

2006-05-16 Thread Steve Wise

As has been said already, there are two rnics running with the Open
Fabrics stack, the Ammasso 1100 and the Chelsio CXGB3 rnics.  Tom Tucker
and I are the maintainers of this code base (the iwarp branch).  

I hear NetEffect also has a 10Gb iWARP NIC.  As far as I know, they
don't have any support for the Open Fabrics iwarp branch yet.

Steve.




On Mon, 2006-05-15 at 12:14 -0500, Roger Heflin wrote:
> Ian Brown wrote:
> > Thanks all.
> > I indeed fround that
> > http://www.ammasso.com/ responds with
> > "There is no website configured at this address."
> > while
> > http://www.chelsio.com/
> > does exist.
> > 
> > Is there a reason why manufacturers will refrain from
> > producing RDMA ? (I mean , are there better technologies
> > which are a substitute for RDMA for ethernet ?)
> > Regards,
> > IB
> 
> 
> I kind of think that the market is too small to support
> a company making a card that is at best just slightly cheaper
> than things like Infiniband, and Myrinet, and is actually
> slower than the Infiniband and Myrinet.
> 
> Consider how many cards one has to sell to pay a single
> engineers salary when you are at best making $100-$150 a
> card over production costs.The numbers don't look that
> good to me, and consider that previous to Ammasso and Chelsio
> there have been a long string of companies producing accelerated
> nitch network cards of various types (going back as far as the
> early 90's), and all of them have failed to get enough
> market share to stay in business.   About the only thing
> that makes one of these companies viable is being bought
> out by someone large enough to support the needed funding.
> 
> Level 5 is making accelerated ethernet cards, I believe most
> of the acceleration is in software in some manner (kernel bypass),
> and I don't know if their card could be made to do rdma.
> 
>   Roger


[openib-general] Re: [PATCH] RE: compliancy issue?

2006-05-16 Thread Michael S. Tsirkin

Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: [PATCH] RE: compliancy issue?
> 
> >CA4-24.2.3: The connecting peer shall terminate the connection attempt
> >if ExtMaxAdverts of the HAH is set to zero.
> >
> >This means that SDP must examine the HAH before RTU is sent.
> >But, CMA currently sends RTU from cma_rep_recv, before notifying
> >the user.
> 
> Can you try this simple patch and see if it fixes your problem?  You will
> need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE
> event.  The conn_param to rdma_accept() should be NULL.

OK, I just tested this and it works for me. Here's the SDP patch to do what you
described. The code actually got cleaner: it's convenient to get
different events on the active versus the passive side; previously I had
to check a flag to figure out what ESTABLISHED meant.

I still think it makes sense to do this for all ULPs and not just SDP, but oh
well.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c
===
--- linux-2.6.16.orig/drivers/infiniband/ulp/sdp/sdp_cma.c  2006-05-16 15:22:00.0 +0300
+++ linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c   2006-05-16 15:25:03.0 +0300
@@ -237,9 +237,9 @@ int sdp_connect_handler(struct sock *sk,
 	return 0;
 }
 
-int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event)
+static int sdp_response_handler(struct sock *sk, struct rdma_cm_event *event)
 {
-	struct sock *parent;
+	struct sdp_hah *h;
 	sdp_dbg(sk, "%s\n", __func__);
 
 	sk->sk_state = TCP_ESTABLISHED;
@@ -250,23 +250,37 @@ int sdp_connected_handler(struct sock *s
 	if (sock_flag(sk, SOCK_DEAD))
 		return 0;
 
+	h = event->private_data;
+	sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs);
+	sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) -
+		sizeof(struct sdp_bsdh);
+
+	sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__,
+		sdp_sk(sk)->bufs,
+		sdp_sk(sk)->xmit_size_goal);
+
+	ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP);
+
+	sk->sk_state_change(sk);
+	sk_wake_async(sk, 0, POLL_OUT);
+	return 0;
+}
+
+int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event)
+{
+	struct sock *parent;
+	sdp_dbg(sk, "%s\n", __func__);
+
 	parent = sdp_sk(sk)->parent;
-	if (!parent) {
-		struct sdp_hah *h = event->private_data;
-		sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs);
-		sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) -
-			sizeof(struct sdp_bsdh);
-
-		sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__,
-			sdp_sk(sk)->bufs,
-			sdp_sk(sk)->xmit_size_goal);
+	BUG_ON(!parent);
 
-		ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP);
+	sk->sk_state = TCP_ESTABLISHED;
+
+	/* TODO: If SOCK_KEEPOPEN set, need to reset and start
+	   keepalive timer here */
 
-		sk->sk_state_change(sk);
-		sk_wake_async(sk, 0, POLL_OUT);
+	if (sock_flag(sk, SOCK_DEAD))
 		return 0;
-	}
 
 	lock_sock(parent);
 	if (sk_acceptq_is_full(parent)) {
@@ -292,11 +306,6 @@ void sdp_disconnected_handler(struct soc
 	sdp_dbg(sk, "%s\n", __func__);
 }
 
-void sdp_response_handler(struct sock *sk)
-{
-	sdp_dbg(sk, "%s\n", __func__);
-}
-
 int sdp_cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 {
 	struct rdma_conn_param conn_param;
@@ -388,7 +397,11 @@ int sdp_cma_handler(struct rdma_cm_id *i
 		break;
 	case RDMA_CM_EVENT_CONNECT_RESPONSE:
 		sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_RESPONSE\n");
-		sdp_response_handler(sk);
+		rc = sdp_response_handler(sk, event);
+		if (rc)
+			rdma_reject(id, NULL, 0);
+		else
+			rc = rdma_accept(id, NULL);
 		break;
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 		sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_ERROR\n");

-- 
MST

[openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops

2006-05-16 Thread bugzilla-daemon

http://openib.org/bugzilla/show_bug.cgi?id=31

[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED





--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.

[openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops

2006-05-16 Thread bugzilla-daemon

http://openib.org/bugzilla/show_bug.cgi?id=31

[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED



--- Additional Comments From [EMAIL PROTECTED]  2006-05-16 05:30 ---
Resolved in RC4




[openib-general] [Bug 65] ib_ipoib refuses to unload when alias exists in modprobe.conf

2006-05-16 Thread bugzilla-daemon

http://openib.org/bugzilla/show_bug.cgi?id=65

[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||WONTFIX






[openib-general] [Bug 28] ipoib_mcast_sendonly_join_complete oops

2006-05-16 Thread bugzilla-daemon

http://openib.org/bugzilla/show_bug.cgi?id=28

[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED



--- Additional Comments From [EMAIL PROTECTED]  2006-05-16 05:14 ---
Fixed by Eli




[openib-general] Re: [PATCH] slab: Fix kmem_cache_destroy() on NUMA

2006-05-16 Thread Or Gerlitz


Roland Dreier wrote:

> With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't
> free all objects."  The problem is caused by sequences such as the
> following (suppose we are on a NUMA machine with two nodes, 0 and 1):
> 
>  * Allocate an object from cache on node 0.
>  * Free the object on node 1.  The object is put into node 1's alien
>    array_cache for node 0.
>  * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink().
>  * __cache_shrink() does drain_cpu_caches(), which loops through all nodes.
>    For each node it drains the shared array_cache and then handles the
>    alien array_cache for the other node.
> 
> However this means that node 0's shared array_cache will be drained,
> and then node 1 will move the contents of its alien[0] array_cache
> into that same shared array_cache.  node 0's shared array_cache is
> never looked at again, so the objects left there will appear to be in
> use when __cache_shrink() calls __node_shrink() for node 0.  So
> __node_shrink() will return 1 and kmem_cache_destroy() will fail.
> 
> This patch fixes this by having drain_cpu_caches() do
> drain_alien_cache() on every node before it does drain_array() on the
> nodes' shared array_caches.
> 
> The problem was originally reported by Or Gerlitz <[EMAIL PROTECTED]>.
> 
> Cc: Christoph Lameter <[EMAIL PROTECTED]>
> Cc: Pekka Enberg <[EMAIL PROTECTED]>


OK, indeed I have CONFIG_NUMA, and yes, the patch fixes my problem.
Thanks a lot for working on that!


Or.


[openib-general] [Bug 33] OFED: Ping fails on ib1 interface - IBED - RC3

2006-05-16 Thread bugzilla-daemon

http://openib.org/bugzilla/show_bug.cgi?id=33

[EMAIL PROTECTED] changed:

       What|Removed                                  |Added

 AssignedTo|[EMAIL PROTECTED]                        |[EMAIL PROTECTED]
    Summary|Ping fails on ib1 interface - IBED - RC3 |OFED: Ping fails on ib1 interface - IBED - RC3





