[openib-general] opensm crash with topspin HCA

2006-11-02 Thread Viswanath Krishnamurthy
When we run the opensm (OFED) release and a Topspin HCA is in the IB network, opensm crashes in umad_receiver with a NULL pointer exception. The transaction ID is zero in the MADs from the Topspin HCA on Windows. The crashes seem to be random in umad_receiver.

HCA found:
    hca_id=InfiniHost0
    vendor_id=0x02C9
    vendor_part_id=0x5A44
    hw_ver=0xA0
    fw_ver=0x40006

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[openib-general] CM and REP handling

2006-06-30 Thread Viswanath Krishnamurthy
In the current communication manager (CM) implementation, how is a lost REP MAD handled? When the REP gets lost, cm_dup_req_handler gets called, which currently falls into the default case and does nothing. The client retries the number of times it is configured to and then fails. If the first REP gets lost, the connection never gets established. So what should the behavior be?

-Viswa

[openib-general] Disabling end-to-end flow control

2006-06-22 Thread Viswanath Krishnamurthy
Is there a way to disable end-to-end flow control using any of the APIs?

Thanks,
-Viswa

Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
I am using the trunk. Should I be using 1.0?

-Viswa
 On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> Yes.. I want to test waters again and see if the issues went away.

Are you using the trunk or 1.0 ?

-- Hal

> -Viswa
>
> On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Viswa,
>
> On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> > There were some issues with opensm running with NPTL  (thread
> > library). Has the issues been resolved ?
>
> There were some fixes to the signal handling which went in back in the
> Feb/early March time frame. OpenSM should be better with NPTL now. Is it
> working for you or are you asking before stepping into these waters
> again ?
>
> -- Hal
>
> > Regards,
> > Viswa

Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
Yes. I want to test the waters again and see if the issues went away.

-Viswa
On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> There were some issues with opensm running with NPTL  (thread
> library). Has the issues been resolved ?

There were some fixes to the signal handling which went in back in the
Feb/early March time frame. OpenSM should be better with NPTL now. Is it
working for you or are you asking before stepping into these waters
again ?

-- Hal

> Regards,
> Viswa

[openib-general] opensm and NPTL

2006-06-12 Thread Viswanath Krishnamurthy
There were some issues with opensm running with NPTL (thread library). Have the issues been
resolved?

Regards,
Viswa


Re: [openib-general] Fix for ibping

2006-04-13 Thread Viswanath Krishnamurthy
Works like a charm...

-Viswa
On 12 Apr 2006 21:32:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Wed, 2006-04-12 at 20:46, Hal Rosenstock wrote:
> On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
> > The RMPP version needs to be 1.
>
> Thanks. I'm not sure what changed here to require this. I need to do
> some more digging.

I figured it out. The fix is in r6448. Can you update and try it ?

Thanks.

-- Hal

> -- Hal
>
> > [EMAIL PROTECTED] src]# svn diff ibping.c
> > Index: ibping.c
> > ===
> > --- ibping.c    (revision 6446)
> > +++ ibping.c    (working copy)
> > @@ -336,7 +336,7 @@
> >                 exit(0);
> >         }
> >
> > -       if (mad_register_client(ping_class, 0) < 0)
> > +       if (mad_register_client(ping_class, 1) < 0)
> >                 IBERROR("can't register to ping class %d", ping_class);
> >
> >         if (ib_resolve_portid_str(&portid, argv[0], dest_type, sm_id) < 0)

Re: [openib-general] Fix for ibping

2006-04-12 Thread Viswanath Krishnamurthy
The mad_register_agent function in the kernel file mad.c was checking rmpp_version.
This check was failing, and the failure was propagated to umad (through the ioctl).

On 12 Apr 2006 20:46:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
> The RMPP version needs to be 1.

Thanks. I'm not sure what changed here to require this. I need to do
some more digging.

-- Hal

> [EMAIL PROTECTED] src]# svn diff ibping.c
> Index: ibping.c
> ===
> --- ibping.c    (revision 6446)
> +++ ibping.c    (working copy)
> @@ -336,7 +336,7 @@
>                 exit(0);
>         }
>
> -       if (mad_register_client(ping_class, 0) < 0)
> +       if (mad_register_client(ping_class, 1) < 0)
>                 IBERROR("can't register to ping class %d", ping_class);
>
>         if (ib_resolve_portid_str(&portid, argv[0], dest_type, sm_id) < 0)


[openib-general] Fix for ibping

2006-04-12 Thread Viswanath Krishnamurthy
The RMPP version needs to be 1.

[EMAIL PROTECTED] src]# svn diff ibping.c
Index: ibping.c
===
--- ibping.c    (revision 6446)
+++ ibping.c    (working copy)
@@ -336,7 +336,7 @@
                exit(0);
        }

-       if (mad_register_client(ping_class, 0) < 0)
+       if (mad_register_client(ping_class, 1) < 0)
                IBERROR("can't register to ping class %d", ping_class);

        if (ib_resolve_portid_str(&portid, argv[0], dest_type, sm_id) < 0)


[openib-general] ibping broken in SVN 6446 ?

2006-04-12 Thread Viswanath Krishnamurthy
When I run ibping I get an error (on a 32-bit machine).

Linux kernel: 2.6.16
infiniband directory replaced with SVN 6446

With debug enabled in umad.c, I get the following error. The ioctl call to the umad driver (umad device)
is failing.

return value for ioctl is -1, errno is -22 (EINVAL)
portid 0 registering qp 1 class 50 version 1 failed: ibping: iberror: can't register to ping class 50

-Viswa



Re: [openib-general] Mainline 2.6.16 kernel with openib userland libraries

2006-03-27 Thread Viswanath Krishnamurthy
My guess is that the bug is in a userspace library, since a kernel module
that uses the same APIs in kernel mode works fine. I will work on the
sample code and send it.

-Viswa
On 3/27/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
Roland> Did this code work with mainline kernel 2.6.15?  If so you
Roland> could do a bisection on the changes between 2.6.15 and
Roland> 2.6.16 to pin down which patch broke things.

Just to be clear: the thing to check would be the same userspace code
on kernel 2.6.15 and 2.6.16.  Because it's entirely possible that the
bug is in a userspace library.

 - R.

[openib-general] Mainline 2.6.16 kernel with openib userland libraries

2006-03-27 Thread Viswanath Krishnamurthy
I tried using the openib userland libraries with the mainline 2.6.16 kernel but ran into
a strange problem. A userland application that uses the CM and verbs libraries, and works
fine with earlier releases, stopped working without any error being returned from the APIs. With
the analyzer on, I see that the CM connect sequence is fine, but on ibv_post_send (RC send)
the DLID field in the LRH is zero, causing the packet to be dropped.

I tried the mainline 2.6.16 kernel (with the IB stack from the kernel tree).

openib userland libraries:
[EMAIL PROTECTED] 216GEN2]# svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 5989
Node Kind: directory
Schedule: normal
Last Changed Author: halr
Last Changed Rev: 5989
Last Changed Date: 2006-03-23 10:17:02 -0800 (Thu, 23 Mar 2006)


Any idea about the compatibility issues?

Thanks,
Viswa




[openib-general] mthca and coalesced ACK

2006-02-21 Thread Viswanath Krishnamurthy
When the HCA receives back-to-back RDMA write and RDMA read requests, it generates
a coalesced ACK (an implicit ACK for the RDMA write). Is there a configuration in the mthca driver that will
make the HCA firmware generate individual ACKs? I am trying to debug another issue and this would be helpful.

Thanks,
Viswa


[openib-general] Getting the right userspace libraries

2006-02-16 Thread Viswanath Krishnamurthy
How does one pull the correct userland libraries for the 2.6.16 kernel IB stack? Is it by
looking at the SVN revision number in the driver code and pulling that version?

-Viswa


[openib-general] mthca and non-MSI system

2005-11-18 Thread Viswanath Krishnamurthy
Has the mthca driver been tested on a non-MSI (interrupt) system? I seem to have a problem where
interrupts are not generated on a non-MSI system, with the following message:

"NOP command failed to generate interrupt (IRQ 9), aborting."
BIOS or ACPI interrupt routing problem?

-Viswa


[openib-general] Vendor specific MAD support

2005-10-04 Thread Viswanath Krishnamurthy
Does the openIB Gen2 stack's umad/mad library support vendor-specific MAD extensions?

-Viswa


[openib-general] mthca error ?

2005-09-28 Thread Viswanath Krishnamurthy
Roland,

I see the following when I use the latest mthca driver on a different HCA card:

[  193.882759] ib_mthca: Initializing :03:00.0
[  193.887546] ib_mthca :03:00.0: Found bridge: :02:0c.0
[  194.894937] ib_mthca :03:00.0: SYS_EN DDR error: syn=4, sock=0, sladdr=0, SPD source=DIMM
[  194.903781] ib_mthca :03:00.0: SYS_EN returned status 0x07, aborting.
[  194.910823] ib_mthca: probe of :03:00.0 failed with error -22

lspci output

:03:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)

Any idea what the error is?

Thanks,
Viswa




Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
Hal,

Thanks.. works like a charm...

-Viswa
On 27 Sep 2005 16:13:01 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2005-09-27 at 16:00, Viswanath Krishnamurthy wrote:
> Hal,
>
> I added a hack now to get around the problem. There needs to be a
> proper fix later..

Can you try this instead ? Thanks.

-- Hal

Index: include/opensm/osm_port.h
===
--- include/opensm/osm_port.h   (revision 3567)
+++ include/opensm/osm_port.h   (working copy)
@@ -346,7 +346,7 @@ osm_physp_is_healthy(
 *  Returns TRUE if the Physical Port has been maked as healthy
 *  FALSE otherwise.
 *  All physical ports are initialized as "healthy" but may be marked
-*  otherwise if a  received trap claims otherwise.
+*  otherwise if a received trap claims otherwise.
 *
 * NOTES
 *
@@ -456,6 +456,42 @@ osm_physp_set_port_info(
 *  Port, Physical Port
 */
+/f* OpenSM: Physical Port/osm_physp_validate_base_lid
+* NAME
+*  osm_physp_validate_base_lid
+*
+* DESCRIPTION
+*  Validates the base LID in the Physical Port object.
+*
+* SYNOPSIS
+*/
+static inline boolean_t
+osm_physp_validate_base_lid(
+   IN osm_physp_t* const p_physp )
+{
+   CL_ASSERT( osm_physp_is_valid( p_physp ) );
+   if ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO )
+   {
+      p_physp->port_info.base_lid = 0;
+      return FALSE;
+   }
+   return TRUE;
+}
+/*
+* PARAMETERS
+*  p_physp
+*     [in] Pointer to an osm_physp_t object.
+*
+* RETURN VALUES
+*  Returns TRUE if the base LID in the Physical port object is valid.
+*  FALSE otherwise.
+*
+* NOTES
+*
+* SEE ALSO
+*  Port, Physical Port
+*/
+
 /f* OpenSM: Physical Port/osm_physp_set_pkey_tbl
 * NAME
 *  osm_physp_set_pkey_tbl
Index: opensm/osm_port_info_rcv.c
===
--- opensm/osm_port_info_rcv.c  (revision 3579)
+++ opensm/osm_port_info_rcv.c  (working copy)
@@ -346,8 +346,12 @@ __osm_pi_rcv_process_switch_port(
   if (port_num == 0)
   {
-    /* This is a management port 0 */
-    __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi);
+    /* This is switch management port 0 */
+    if ( !osm_physp_validate_base_lid( p_physp ) )
+      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+               "__osm_pi_rcv_process_switch_port: ERR 0F04: "
+               "Invalid base LID corrected.\n" );
+    __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi);
   }
   OSM_LOG_EXIT( p_rcv->p_log );
@@ -367,6 +371,10 @@ __osm_pi_rcv_process_ca_port(
   UNUSED_PARAM( p_node );
   osm_physp_set_port_info( p_physp, p_pi );
+  if ( !osm_physp_validate_base_lid( p_physp ) )
+    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+             "__osm_pi_rcv_process_ca_port: ERR 0F08: "
+             "Invalid base LID corrected.\n" );
   __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi);
@@ -390,6 +398,10 @@ __osm_pi_rcv_process_router_port(
     Update the PortInfo attribute.
   */
   osm_physp_set_port_info( p_physp, p_pi );
+  if ( !osm_physp_validate_base_lid( p_physp ) )
+    osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+             "__osm_pi_rcv_process_router_port: ERR 0F09: "
+             "Invalid base LID corrected.\n" );
   OSM_LOG_EXIT( p_rcv->p_log );
 }

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
Hal,

I added a hack for now to get around the problem. A proper fix is needed later.

[EMAIL PROTECTED] opensm]# svn diff osm_port.h
Index: osm_port.h
===
--- osm_port.h  (revision 3549)
+++ osm_port.h  (working copy)
@@ -1049,6 +1049,8 @@
 {
    CL_ASSERT( p_physp );
    CL_ASSERT( osm_physp_is_valid( p_physp ) );
+   if (p_physp->port_info.base_lid == 0x)
+   return (0);
    return( p_physp->port_info.base_lid );
 }
 /*
On 27 Sep 2005 15:11:05 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote:
> I tracked down the issue to a bug in osm_lid_mgr.c
>
> function:  __osm_lid_mgr_init_sweep(...)
>
> The bad hardware was retutning an assigned LID of 0x. In this
> function there is a loop
> as follows where opensm is getting stuck.. (with line number)
>
> 392   p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
> 393
> 394   for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl );
> 395        p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl );
> 396        p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )
> 397   {
> 398     osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
> 399     for (lid = disc_min_lid; lid <= disc_max_lid; lid++)   <= Bug here
> 400       cl_ptr_vector_set(p_discovered_vec, lid, p_port );
> 401   }
>
> Since the disc_max_lid and disc_min_lid are 0x, and these are
> unsigned 16 bit numbers, the condition
> in the for loop never becomes false, and opensm is stuck in the loop.
> There are couple of other places in that
> function that needs fixing too.

Sep 26 15:26:03 424135 [B66CFBB0] -> SMP dump:
    base_ver     0x1
    mgmt_class   0x81
    class_ver    0x1
    method       0x1 (SubnGet)
    D bit        0x0
    status       0x0
    hop_ptr      0x0
    hop_count    0x2
    trans_id     0x1274
    attr_id      0x15 (PortInfo)
    resv         0x0
    attr_mod     0x1
    m_key        0x
    dr_slid      0x
    dr_dlid      0x
Sep 26 15:26:03 424407 [B6ED0BB0] -> __osm_nd_rcv_process_nd: Node 0x30d32c7234
    Description = Agilent E2954A 4x Generator for InfiniBand.
Sep 26 15:26:03 424426 [B6ED0BB0] -> __osm_nd_rcv_process_nd: ]
Sep 26 15:26:03 679882 [B56CDBB0] -> SMP dump:
    base_ver     0x1
    mgmt_class   0x81
    class_ver    0x1
    method       0x81 (SubnGetResp)
    D bit        0x1
    status       0x0
    hop_ptr      0x0
    hop_count    0x2
    trans_id     0x1274
    attr_id      0x15 (PortInfo)
    resv         0x0
    attr_mod     0x1
    m_key        0x
    dr_slid      0x
    dr_dlid      0x
    Initial path: [0][1][12]
    Return path:  [0][E][0]
Sep 26 15:26:03 680291 [B76D1BB0] -> osm_pi_rcv_process: [
Sep 26 15:26:03 680323 [B56CDBB0] -> __osm_sm_mad_ctrl_rcv_callback: ]
Sep 26 15:26:03 680343 [B76D1BB0] -> PortInfo dump:
    port number    0x1
    node_guid      0x0030d32c7234
    port_guid      0x0030d32c7234
    m_key          0x
    subnet_prefix  0xfe80
    base_lid       0x

Yes, it appears the Agilent exerciser returned good status to a SM Get
PortInfo with a base_lid of 0x. The base_lid should be validated by
OpenSM.

-- Hal

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
I tracked down the issue to a bug in osm_lid_mgr.c 

function:  __osm_lid_mgr_init_sweep(...)

The bad hardware was returning an assigned LID of 0x. In this function there is a loop,
shown below with line numbers, where opensm gets stuck:

    392   p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl;
    393
    394   for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl );
    395        p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl );
    396        p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) )
    397   {
    398     osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid);
    399     for (lid = disc_min_lid; lid <= disc_max_lid; lid++)   <= Bug here
    400       cl_ptr_vector_set(p_discovered_vec, lid, p_port );
    401   }

Since disc_max_lid and disc_min_lid are 0x, and these are unsigned 16-bit numbers, the condition
in the for loop never becomes false and opensm is stuck in the loop. There are a couple of other places in that
function that need fixing too.

-Viswa
On 9/27/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote:
Log sent off-list...

-Viswa
On 9/27/05, Eitan Zahavi <[EMAIL PROTECTED]> wrote:

Hi Viswa,

Please send a full /var/log/osm.log file of opensm -V .
You can send us a copy off the list if it is too big: yael and eitan in @mellanox.co.il

EZ

Hal Rosenstock wrote:
> On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote:
>
>> I have an exerciser in the IB network. The exerciser seems to be
>> faulty/buggy. When opensm starts I do not
>> see 'SUBNET UP" message. It says "Entering MASTER"  and waits there.
>> Any new node inserted in this state is not assigned any LID.   Anybody
>> seen such behavior ?
>
> Any idea on how the IB exerciser misbehaves on the network ? Do you have
> an analyzer too ?
>
> What does the OSM log show ?
>
> -- Hal



Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
Log sent off-list...

-Viswa
On 9/27/05, Eitan Zahavi <[EMAIL PROTECTED]> wrote:
Hi Viswa,

Please send a full /var/log/osm.log file of opensm -V .
You can send us a copy off the list if it is too big: yael and eitan in @mellanox.co.il

EZ

Hal Rosenstock wrote:
> On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote:
>
>> I have an exerciser in the IB network. The exerciser seems to be
>> faulty/buggy. When opensm starts I do not
>> see 'SUBNET UP" message. It says "Entering MASTER"  and waits there.
>> Any new node inserted in this state is not assigned any LID.   Anybody
>> seen such behavior ?
>
> Any idea on how the IB exerciser misbehaves on the network ? Do you have
> an analyzer too ?
>
> What does the OSM log show ?
>
> -- Hal

[openib-general] opensm and faulty hardware

2005-09-26 Thread Viswanath Krishnamurthy
I have an exerciser in the IB network. The exerciser seems to be faulty/buggy. When opensm starts, I do not
see the "SUBNET UP" message. It says "Entering MASTER" and waits there.
Any new node inserted in this state is not assigned a LID. Has anybody seen such behavior?

-Viswa


[openib-general] Another opensm bug ?

2005-09-26 Thread Viswanath Krishnamurthy
I ran into another opensm bug, which caused opensm to stop functioning. This happened only once.

Here is the test case:

1. Run opensm on machine A.
2. Run the following script on machine B:
    a. Check ibstatus
    b. Ping machine A
    c. Run osmtest
    d. Reboot

The test case is meant to make sure opensm configures the machine correctly.
Out of 850 iterations, I saw this error once. opensm started receiving a
subnet trap continuously (I did not see any message in the log about DoS-attack prevention).
The trap has the same transaction ID (0x224 in this case). The opensm MAD receive thread
was getting called continuously.

Initially I suspected the situation Eitan described (bad hardware causing traps, etc.).
But when I stopped and restarted opensm, the problem went away. Log attached off-list.

-Viswa




   

Re: [openib-general] Re: Another opensm problem ?

2005-09-26 Thread Viswanath Krishnamurthy
Hi Eitan,

I see that message in the log. 

-Viswa
On 9/24/05, Eitan Zahavi <[EMAIL PROTECTED]> wrote:
Hi Viswa and Hal,

I have read through the thread and have a few comments.
But first let me see if I understand the test run correctly. The test is as follows:
1. OpenSM starts up, configuring the subnet.
2. Then the user tears up a cable and connects it to the other side port of a switch.
3. The SM is supposed to bring up the new connection.
4. Step 2 is repeated until the SM stops responding.

Well, if this is the case then OpenSM might stop responding due to the following features:
1. We had cases in the past where bad hardware continuously flooded the SM with traps.
   To protect against this kind of DoS attack we have implemented an adaptive filter in
   the SM trap receiver: if the exact same trap is received continuously from the same source
   more than 10 times (with no more than 5 sec between the traps), the traps are considered
   DoS and are ignored. Please see osm_trap_rcv.c for details.
2. The way IB switches work is that each time one of their ports changes state they:
   a. Set the "change bit" in the SwitchInfo.
   b. Send a trap 128 to the SM. But trap 128 does not carry the changed port number.

So under a test case like the one you describe, what can happen is:
1. The SM decides to ignore trap 128 from the switch because more than 5 connect/reconnect
   sequences happen without enough "quiet" time to recover.
2. The SwitchInfo ChangeBit is sampled during the OSM light sweep. There is a race between the
   reading of the change bit and the clearing of it. If the connect/disconnect happen very fast,
   the change bit set by the reconnect can be cleared by the clear started by the disconnect.

It is easy to see in the log file if the SM did ignore traps. Run with -V and look for:
grep "Continuously received this trap" /var/log/osm.log
(For some reason I did not get any log attachments with this thread; otherwise I would
do some analysis on them too.)

Anyway, if the SM does not do a heavy sweep (due to the above), it is very likely to continue to
poll the nonexistent node that was previously attached to a switch port, with no success.
So testing of cable tear-off and reconnect should be done with at least 10 seconds of recovery time.
Also you could try sending kill -HUP to the OpenSM process and see if the full sweep you start
is able to bring all ports up.

Viswa, with all that said, it is very possible you are experiencing a bug in OpenSM and we
want to encourage your effort in finding those. With your help, and others', we will be able to
flush them out.

Thanks
Eitan

Hal Rosenstock wrote:
> On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote:
>> On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote:
>>> - After 7-8 iterations, I ran into a weird problem, where opensm was
>>> showing the HCA as UNKNOWN. The port
>>> never came up to ACTIVE state.  The unplugged and replugged into
>>> different slots, the port remained in INIT
>>> state.
>>>
>>> Mellanox: SW : 12 : INI :  : : 2048 : 1x  : 2.5 : 0002c9010d26e780 : UNKNOWN
>>
>> OpenSM thinks that either there is no physical port on the other end of
>> the link or it is not "valid" (GUID non 0). Obviously it is there as the
>> port state is INIT so the physical link came up which requires the
>> remote end to be there.
>
> From the log you sent, this is exactly what is happening.
>
> Sep 23 10:07:23 451191 [B7751BB0] -> osm_drop_mgr_process: Checking port 0x0002c9010d26e780.
> Sep 23 10:07:23 451209 [B7751BB0] -> osm_drop_mgr_process: Checking port 0x0002c90200400cfd.
> Sep 23 10:07:23 451226 [B7751BB0] -> osm_drop_mgr_process: ERR 0108:
> Unknown remote side for node 0x0002c9010d26e780 port 20. Adding to light
> sweep sampling list.
> Sep 23 10:07:23 451251 [B7751BB0] -> Directed Path Dump of 1 hop path:
> Path = [0][1]
> Sep 23 10:07:23 451267 [B7751BB0] -> osm_drop_mgr_process: ]
>
> So look in osm_drop_mgr.c line 707:
> Can you enhance the log display to see which is failing:
> osm_physp_is_valid(p_physp) or osm_physp_get_remote(p_physp) ?
>
> Also, it appears to keep light sweeping this port but whichever switch
> port it is on, it does not respond. Not sure where the problem is. It
> could be on the outgoing side of the switch (we could run diags against
> the switch and various ports; I would be curious what they return when
> the subnet is in this broken state) or on the HCA. However, the fact
> that restarting opensm made it go away without touching anything else
> makes this appear otherwise.
>
>> One other note is that it appears to have come up as 1x. Is that what
>> should happen ?
>
> -- Hal

Re: [openib-general] Re: opensm and SIGINT

2005-09-23 Thread Viswanath Krishnamurthy
On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:
> More information,
>
> The test case is as follows
>
> 1. Start opensm in verbose mode (-V)
> 2. Ping remote node
> 3. osmtest -f c
> 4. osmtest -f a
> 5. pkill -9 opensm
> 6. Repeat over
>
> Out of about 2500 iterations, 143  osmtest  failed. Keep in mind,
> only Step 4 failed.

Yes.

Do you see any port LEDs on the switch blink indicating the port went
down from active and back while running this ?

No, I ran this test overnight and logged the results.  I will try it next week and let you know.
> Step 3 which is inventory file creation *never* failed. (I think
> inventory file creation also talks to SA right ?)

Right.

-- Hal

Re: [openib-general] Forcing IB link state down

2005-09-23 Thread Viswanath Krishnamurthy
On 23 Sep 2005 13:59:28 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Fri, 2005-09-23 at 13:55, Viswanath Krishnamurthy wrote:
> Is there an API or command to force an IB link to go down.

Not currently.

> This will be helpful in running tests on opensm.

Yes, I can understand that. Technically (per the IBA spec), the SM is
the only one allowed to do Sets. I think it would be possible to have a
diag command do this as long as the MKey protection is weak (which it is
now). A better way might be to have a CLI on the OpenSM and be able to
issue a down command to a port.

I was looking to see whether the mthca driver has any API/ioctl to disable/enable the link.

-Viswa
 
-- Hal

[openib-general] Re: Another opensm problem ?

2005-09-23 Thread Viswanath Krishnamurthy
Hal,

On 23 Sep 2005 14:04:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi again Viswa,

On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote:

Good test. Hadn't tried this. I will try it and will recreate this.

> - 2 machines with a switch in bertween. One m/c running opensm.

How was opensm started ?

Manually: # opensm -V

> Attached is the log

The default log is in /var/log/osm.log


I captured what appeared on the screen. I will send the osm.log file too.
It is a big one and has accumulated over a period of time.
-- Hal

[openib-general] Forcing IB link state down

2005-09-23 Thread Viswanath Krishnamurthy
Is there an API or command to force an IB link to go down? This would be helpful for running tests on opensm.

-Viswa


Re: [openib-general] Re: opensm and SIGINT

2005-09-23 Thread Viswanath Krishnamurthy
More information:

The test case is as follows:

1. Start opensm in verbose mode (-V)
2. Ping the remote node
3. osmtest -f c
4. osmtest -f a
5. pkill -9 opensm
6. Repeat

Out of about 2500 iterations, osmtest failed 143 times. Keep in
mind that only Step 4 failed. Step 3, which is inventory file
creation, *never* failed. (I think inventory file creation also talks
to the SA, right?)
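The test case above is easy to drive from a small harness that records failures per step, which is what makes an observation like "only Step 4 failed" possible. A minimal sketch, assuming each step is a shell command run through system(); the function names and the example command list in the comment are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical step list mirroring the test case above. The host name is
 * a placeholder and the sleep is a guess at how long the subnet needs to
 * come up before osmtest runs:
 *
 *   "opensm -V > /dev/null 2>&1 & sleep 30",
 *   "ping -c 1 remote-ib-host",
 *   "osmtest -f c",
 *   "osmtest -f a",
 *   "pkill -9 opensm",
 */

/* Run every step in order, `iterations` times, and count non-zero exit
 * statuses per step in failures[] (the caller zero-fills it). A failing
 * step does not stop the iteration, so later steps are still measured. */
static void run_iterations(const char *const steps[], int nsteps,
                           int iterations, int failures[])
{
    assert(nsteps > 0 && iterations >= 0);
    for (int it = 0; it < iterations; ++it)
        for (int s = 0; s < nsteps; ++s)
            if (system(steps[s]) != 0)
                failures[s]++;
}

/* Print one line per step: how many of the iterations failed. */
static void print_summary(const char *const steps[], int nsteps,
                          int iterations, const int failures[])
{
    for (int s = 0; s < nsteps; ++s)
        printf("step %d (%s): %d of %d failed\n",
               s + 1, steps[s], failures[s], iterations);
}
```

Dividing failures[s] by the iteration count gives the per-step rate; 143 failures in about 2500 runs is roughly 5.7%.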

-Viswa

On 23 Sep 2005 12:54:56 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Eitan,

On Fri, 2005-09-23 at 12:19, Eitan Zahavi wrote:
> Hi Hal, Viswa,
>
> Sorry I'm joining late on this thread due to the weekend (which starts
> here on Friday ending Saturday night).
> Is there any conclusion on this one?

No.

> The only log I have seen was from osmtest failing to send a MAD.

True.

> Looks like a umad issue?

Not sure why you say that. There are other possibilities I'm aware of
here:

Note that that failed sent MAD is one which has a response expected so
this means that the response was not received. It also goes through the
transmit retry strategy (I could see this on the SA side). So the only
thing I can say at this point is that for some reason, the response does
not make it back from the SA to the SA client (osmtest). That's where
this one is right now.

-- Hal

> Eitan
>
> Hal Rosenstock wrote:
> > Hi again Viswa,
> >
> > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:
> >
> >>Hi Viswa,
> >>
> >>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:
> >>
> >>>Currently opensm traps SIGINT. There was some discussion to remove it.
> >>>I have currently running some tests on opensm
> >>>by killing (SIGKILL) and restarting opensm. So far I have not found
> >>>any resource leak issues. Is there a plan to remove that
> >>>signal handler. Ideally it should not exist.
> >>
> >>Eitan stated that this was historical in nature for gen1 drivers which
> >>had resource tracking problems: "if OpenSM left without cleaning up all
> >>used resources (like MAD buffers and UD-AVs), the driver oops'ed."
> >>
> >>I think that (eliminating the handler for SIGINT) can at least be done
> >>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor
> >>layers for starters. I will experiment with gen2 and let you know.
> >
> > Does the patch below do what you want ? Can you try it ?
> >
> > -- Hal
> >
> > Index: opensm/osm_opensm.c
> > ===
> > --- opensm/osm_opensm.c (revision 3513)
> > +++ opensm/osm_opensm.c (working copy)
> > @@ -182,7 +182,9 @@ osm_reg_sig_handler(
> >   IN osm_opensm_t * const p_osm )
> >  {
> >   __p_osm_to_signal = p_osm;
> > +#ifndef OSM_VENDOR_INTF_OPENIB
> >   cl_reg_sig_hdl( SIGINT, __sig_handler );
> > +#endif
> >   cl_reg_sig_hdl( SIGTERM, __sig_handler );
> >   cl_reg_sig_hdl( SIGHUP, __sig_handler );
> >   osm_exit_flag = 0;
> >
> >
> > ___
> > openib-general mailing list
> > openib-general@openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
>
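Hal's reading of the log is that the failed MAD expects a response, is retransmitted under the retry strategy, and still ends in IB_TIMEOUT because no response reaches osmtest. A minimal sketch of that send/wait/retry pattern, assuming a hypothetical transport with a send hook and a wait hook that matches responses by transaction ID; it is not the libibumad or OpenSM vendor-layer API, and the fake link exists only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport hooks (not a real libibumad interface). */
typedef void (*send_fn)(void *ctx, uint64_t tid);
typedef bool (*wait_fn)(void *ctx, uint64_t tid, unsigned timeout_ms);

enum query_status { QUERY_SUCCESS, QUERY_TIMEOUT };

/* Send a request that expects a response; retransmit with the same
 * transaction ID until a matching response arrives or retries run out. */
static enum query_status query_with_retries(void *ctx, send_fn send, wait_fn wait,
                                            uint64_t tid, unsigned timeout_ms,
                                            unsigned retries)
{
    assert(send != NULL && wait != NULL);
    for (unsigned attempt = 0; attempt <= retries; ++attempt) {
        send(ctx, tid);
        if (wait(ctx, tid, timeout_ms))
            return QUERY_SUCCESS;
    }
    /* Every transmission went out, but no response ever came back. */
    return QUERY_TIMEOUT;
}

/* Fake link for illustration: the peer answers only the `answer_on`-th
 * transmission (0 means it never answers). */
struct fake_link {
    unsigned sends;
    unsigned answer_on;
};

static void fake_send(void *ctx, uint64_t tid)
{
    (void)tid;
    ((struct fake_link *)ctx)->sends++;
}

static bool fake_wait(void *ctx, uint64_t tid, unsigned timeout_ms)
{
    const struct fake_link *l = ctx;

    (void)tid;
    (void)timeout_ms;
    return l->answer_on != 0 && l->sends == l->answer_on;
}
```

In this model a lossy path still succeeds on a later retransmission, while a path that drops every response ends in QUERY_TIMEOUT after retries + 1 sends, which is the pattern the osmtest log shows.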

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote:
> Here is the log of osmtest failure. This was seen 150 times out of
> 2500 iterations. The opensm SUBNET UP failure is tough to reproduce.
> Saw it once in 2500 iterations. Unfortunately I did not collect the
> log on that error.

I understand but it is hard to know whether this is a known issue or
something else without a log of the failure.

> The patch worked as expected and did not see any issues with ctrl-C.
> When I tried to apply the patch, I got a failure.  (I used the patch
> command). I manually added those 2 lines.

Not sure why the patch wouldn't apply.

> Command Line Arguments
> Done with args
> Flow = All Validations
> [...]
> OSMTEST: TEST "All Validations" FAIL

The final FAIL/PASS is definitive so there are real failures here. Is
this consistent or intermittent ? Does this work sometimes or always
fail ?

Intermittent. As I said, 150 out of 2500 iterations failed. Is there any log you want me to collect?

-- Hal

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
Hal,

Here is the log of the osmtest failure. It was seen 150 times out of 2500
iterations. The opensm SUBNET UP failure is hard to reproduce: I saw it
once in 2500 iterations. Unfortunately, I did not collect the log for
that error.

The patch worked as expected, and I did not see any issues with
Ctrl-C. When I tried to apply the patch with the patch command, it
failed, so I added those 2 lines manually.

Command Line Arguments
Done with args
    Flow = All Validations
Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
using default guid 0x2c90200400cfd
Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port 0x2c90200400cfd.
Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 693187 [B7F026C0] -> osmtest_validate_sa_class_port_info:
-
SA Class Port Info:
 base_ver:1
 class_ver:2
 cap_mask:0x202
 resp_time_val:0x64
-
Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try PortRecord for port with LID 0x0 Num:0x1.
Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12 trans_id=0x34) --dropping.
Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS).
Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.1804289383.7793 id:0x6b8b26f6.
Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554
Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554.
Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=2 attr=31 trans_id=0x36) --dropping.
Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: ib_query failed (IB_TIMEOUT).
Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT)
OSMTEST: TEST "All Validations" FAIL


-Viswa

On 22 Sep 2005 15:08:02 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Thu, 2005-09-22 at 15:06, Viswanath Krishnamurthy wrote:> I do not think this would help.  The system is never rebooted. Just> opensm is started  and stopped. On the mext opensm start/stop the> subnet came up. I think it is more of an opensm issue than any kernel
> module issue.Can you run opensm in -V mode and send the log. It might be related tothe SM Set PortInfo armed->active issue which has been documented butnot resolved.-- Hal


Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
Hal,

On 22 Sep 2005 14:41:04 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Thu, 2005-09-22 at 14:37, Viswanath Krishnamurthy wrote:
> Hi Hal,
>
> Sure will test it out. I see no issue in this fix. I have run the
> following test overnight
> in a script with yesterday's code
>
> 1. Start opensm
> 2. Ping another  node over IB
> 3. Run osmtest (osmtest -f c, osmtest -f a)
> 4. Kill opensm with -9 signal and repeat over
>
> The failures are  captured in a log.
>
> This has run more than 2500 times without resource leak issues. I saw
> about 150 osmtest
> failures which I will followup with another mail.

Some failures are intentional (bad flow tests). They are all not marked
obviously. Some of this has been documented on the list but not fixed
yet but I am interested in seeing what you are referring to.

I will attach the log later.

> Once opensm failed to start correctly with SUBNET UP message in the
> log.

So the subnet didn't come up and the ports didn't become active ? Just
out of curiosity, could you unload and reload ib_umad and then start
opensm when that occurs to see if that fixes things ? I'm not sure it
would.

I do not think this would help. The system is never rebooted.
Just opensm is started and stopped. On the next opensm start/stop,
the subnet came up. I think it is more of an opensm issue than any
kernel module issue.

Thanks.

-- Hal

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
Hi Hal,

Sure, I will test it out. I see no issue with this fix. I ran the following test overnight
in a script with yesterday's code:

1. Start opensm
2. Ping another node over IB
3. Run osmtest (osmtest -f c, osmtest -f a)
4. Kill opensm with signal 9 (SIGKILL) and repeat

The failures are captured in a log.

This has run more than 2500 times without resource leak issues. I saw about 150 osmtest
failures, which I will follow up on in another mail. Once, opensm failed to start correctly with the SUBNET UP message in the log.

-Viswa
On 22 Sep 2005 11:17:46 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi again Viswa,

On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:
> Hi Viswa,
>
> On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:
> > Currently opensm traps SIGINT. There was some discussion to remove it.
> > I have currently running some tests on opensm
> > by killing (SIGKILL) and restarting opensm. So far I have not found
> > any resource leak issues. Is there a plan to remove that
> > signal handler. Ideally it should not exist.
>
> Eitan stated that this was historical in nature for gen1 drivers which
> had resource tracking problems: "if OpenSM left without cleaning up all
> used resources (like MAD buffers and UD-AVs), the driver oops'ed."
>
> I think that (eliminating the handler for SIGINT) can at least be done
> for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor
> layers for starters. I will experiment with gen2 and let you know.

Does the patch below do what you want ? Can you try it ?

-- Hal

Index: opensm/osm_opensm.c
===
--- opensm/osm_opensm.c (revision 3513)
+++ opensm/osm_opensm.c (working copy)
@@ -182,7 +182,9 @@ osm_reg_sig_handler(
   IN osm_opensm_t * const p_osm )
 {
   __p_osm_to_signal = p_osm;
+#ifndef OSM_VENDOR_INTF_OPENIB
   cl_reg_sig_hdl( SIGINT, __sig_handler );
+#endif
   cl_reg_sig_hdl( SIGTERM, __sig_handler );
   cl_reg_sig_hdl( SIGHUP, __sig_handler );
   osm_exit_flag = 0;
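With the patch, the OpenIB vendor layer no longer traps SIGINT, so Ctrl-C keeps its default action and terminates opensm, while SIGTERM and SIGHUP still reach the handler. A minimal POSIX sketch of the same registration logic, using sigaction() directly instead of the cl_reg_sig_hdl() call in the patch; the handler, the flags and register_handlers() are hypothetical, and only the OSM_VENDOR_INTF_OPENIB guard comes from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t exit_requested;
static volatile sig_atomic_t last_signal;

static void on_signal(int sig)
{
    last_signal = sig;
    exit_requested = 1;
}

static int install_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Mirrors the structure of the patched osm_reg_sig_handler(): SIGINT is
 * trapped only when OSM_VENDOR_INTF_OPENIB is not defined; otherwise it
 * keeps its default action (terminate the process).
 * Returns 0 if every sigaction() call succeeded. */
static int register_handlers(void)
{
    int rc = 0;

#ifndef OSM_VENDOR_INTF_OPENIB
    rc |= install_handler(SIGINT);
#endif
    rc |= install_handler(SIGTERM);
    rc |= install_handler(SIGHUP);
    exit_requested = 0;
    return rc;
}
```

Compiling with -DOSM_VENDOR_INTF_OPENIB leaves SIGINT untouched, so Ctrl-C ends the process immediately instead of setting the flag; since SIGKILL cannot be caught at all, the kill -9 test exercises the same no-cleanup path.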

Re: [openib-general] ib_create_cq memory leak?

2005-09-22 Thread Viswanath Krishnamurthy
Roland,

Thanks. I tested this out, and it works like a charm.

-Viswa
On 9/21/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Thanks very much for the excellent test case. The following patch
(already checked into svn and queued in git for merging into 2.6.14)
should fix things -- on my system, your test case ran successfully for
many hundreds of iterations.

--- linux-kernel/infiniband/hw/mthca/mthca_memfree.c (revision 3500)
+++ linux-kernel/infiniband/hw/mthca/mthca_memfree.c (working copy)
@@ -529,12 +529,25 @@ int mthca_alloc_db(struct mthca_dev *dev
                         goto found;
                 }

+        for (i = start; i != end; i += dir)
+                if (!dev->db_tab->page[i].db_rec) {
+                        page = dev->db_tab->page + i;
+                        goto alloc;
+                }
+
         if (dev->db_tab->max_group1 >= dev->db_tab->min_group2 - 1) {
                 ret = -ENOMEM;
                 goto out;
         }

+        if (group == 0)
+                ++dev->db_tab->max_group1;
+        else
+                --dev->db_tab->min_group2;
+
         page = dev->db_tab->page + end;
+
+alloc:
         page->db_rec = dma_alloc_coherent(&dev->pdev->dev, 4096,
                                           &page->mapping, GFP_KERNEL);
         if (!page->db_rec) {
@@ -554,10 +567,6 @@ int mthca_alloc_db(struct mthca_dev *dev
         }

         bitmap_zero(page->used, MTHCA_DB_REC_PER_PAGE);
-        if (group == 0)
-                ++dev->db_tab->max_group1;
-        else
-                --dev->db_tab->min_group2;

 found:
         j = find_first_zero_bit(page->used, MTHCA_DB_REC_PER_PAGE);
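Roland's patch makes mthca_alloc_db() reuse a doorbell page whose db_rec was freed before it grows max_group1 or shrinks min_group2. A simplified model of why that matters, assuming a single upward-growing group and tiny page sizes; the types and function names are hypothetical, and the free rule only mirrors the idea that a group shrinks from its edge. Without step 2 below, repeated create-all/destroy-all rounds leave holes behind and keep growing the group until allocation fails, which is consistent with the ENOMEM after 8-10 runs of the 4K-CQ test.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES        8   /* pages in the hypothetical doorbell table */
#define RECS_PER_PAGE 2   /* records per page (tiny, to make holes easy) */

/* Simplified model; only the upward-growing group bounded by max_group1. */
struct db_page {
    bool allocated;   /* stands in for page->db_rec != NULL */
    int used;         /* records handed out from this page */
};

struct db_table {
    struct db_page page[NPAGES];
    int max_group1;   /* pages [0, max_group1) belong to the group */
    int limit;        /* stands in for min_group2 */
};

/* Returns the page index that received a record, or -1 (ENOMEM). */
static int db_alloc(struct db_table *t)
{
    int i;

    /* 1. A live page that still has a free record. */
    for (i = 0; i != t->max_group1; ++i)
        if (t->page[i].allocated && t->page[i].used < RECS_PER_PAGE)
            goto found;

    /* 2. The fix: reuse a page that was freed inside the group. */
    for (i = 0; i != t->max_group1; ++i)
        if (!t->page[i].allocated)
            goto alloc;

    /* 3. Only now grow the group, if it would not collide. */
    if (t->max_group1 >= t->limit - 1)
        return -1;
    i = t->max_group1++;

alloc:
    t->page[i].allocated = true;
    t->page[i].used = 0;
found:
    t->page[i].used++;
    return i;
}

/* Release one record. An emptied page is freed, but the group only
 * shrinks when that page is its last one, so interior holes remain. */
static void db_free(struct db_table *t, int i)
{
    assert(i >= 0 && i < t->max_group1 && t->page[i].used > 0);
    if (--t->page[i].used == 0) {
        t->page[i].allocated = false;
        if (i == t->max_group1 - 1)
            --t->max_group1;
    }
}
```

Deleting the second loop from db_alloc() makes the same 100-round workload fail within a few rounds, because every round then starts growing from the edge instead of filling the holes.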


[openib-general] opensm and SIGINT

2005-09-21 Thread Viswanath Krishnamurthy
Hal,

Currently opensm traps SIGINT. There was some discussion about removing that. I am currently running some tests on opensm
by killing it (SIGKILL) and restarting it. So far I have not found any resource leak issues. Is there a plan to remove that
signal handler? Ideally it should not exist.

-Viswa




Re: [openib-general] Modifying QP state error

2005-09-21 Thread Viswanath Krishnamurthy
The mthca state transition code allows this transition (RTS
-> RESET), but the mthca hardware/firmware does not. It does
allow RTS -> ERR -> RESET. I will post code to reproduce this later.
I was trying to work around the CQ destroy memory leak by caching
QP entries and reusing them, but ran into other issues.

-Viswa

On 9/21/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Hal> You can only get to RESET from ERROR. See Figure 124 QP
Hal> Context State Diagram IBA 1.2 p. 452.

I think the figure is drawn in a slightly misleading way. The text at
the lower left says:

It is possible to transition from any state to either the Error or
the Reset state with the Modify QP/EE Verb.

 - R.
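Roland's quote says any state may move to Error or Reset, while Viswa found that the firmware only accepted RTS -> ERR -> RESET. A minimal sketch of a conservative reset helper that always goes through ERR first; the enum and function names are hypothetical, the state names follow the IBA QP state diagram rather than the ib_verbs enum, and the transition table is simplified (it ignores attribute masks).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Named after the IBA QP state diagram, not the IB_QPS_* values. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_SQD, QPS_SQE, QPS_ERR };

/* Simplified table of the common transitions. */
static bool spec_allows(enum qp_state from, enum qp_state to)
{
    /* "It is possible to transition from any state to either the Error
     * or the Reset state with the Modify QP/EE Verb." */
    if (to == QPS_ERR || to == QPS_RESET)
        return true;

    switch (from) {
    case QPS_RESET: return to == QPS_INIT;
    case QPS_INIT:  return to == QPS_INIT || to == QPS_RTR;
    case QPS_RTR:   return to == QPS_RTS;
    case QPS_RTS:   return to == QPS_RTS || to == QPS_SQD;
    case QPS_SQD:   return to == QPS_SQD || to == QPS_RTS;
    case QPS_SQE:   return to == QPS_RTS;
    default:        return false;   /* ERR only leaves via RESET */
    }
}

/* Conservative reset: always pass through ERR first, matching the
 * RTS -> ERR -> RESET sequence the firmware accepted. Writes the states
 * to request into steps[] and returns how many there are. */
static size_t reset_path(enum qp_state from, enum qp_state steps[2])
{
    size_t n = 0;
    enum qp_state cur = from;

    if (from == QPS_RESET)
        return 0;                   /* already there */
    if (from != QPS_ERR)
        steps[n++] = QPS_ERR;
    steps[n++] = QPS_RESET;

    /* Each hop must still be a transition the spec permits. */
    for (size_t k = 0; k < n; ++k) {
        assert(spec_allows(cur, steps[k]));
        cur = steps[k];
    }
    return n;
}
```

Each entry in steps[] would correspond to one modify-QP call that changes only the state attribute; the extra hop costs one command but avoids depending on the firmware accepting a direct RTS -> RESET.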

[openib-general] Modifying QP state error

2005-09-21 Thread Viswanath Krishnamurthy
When I try to modify the QP state from RTS to RESET, I get the following errors:

ib_mthca 0000:05:00.0: Command 1e completed with status 0a
ib_mthca 0000:05:00.0: modify QP 7 returned status 0a.

Is modifying the QP state from RTS to RESET a valid state transition? (I guess so.)
Is there anything else that needs to be taken care of?


-Viswa


[openib-general] ib_create_cq memory leak? (Resend)

2005-09-21 Thread Viswanath Krishnamurthy
I ran into this issue when using the kernel API to create CQs. To reproduce the problem, I wrote
a small kernel module that creates 4K CQs and then destroys them. After running the test 8-10 times, I saw
create_cq fail with error -12 (ENOMEM).

I am attaching the test module source code with Makefiles.

[root src]# svn info   (Latest code)
Path: .
URL: https://openib.org/svn/gen2/trunk/src
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3512
Node Kind: directory
Schedule: normal
Last Changed Author: halr
Last Changed Rev: 3511
Last Changed Date: 2005-09-21 08:57:38 -0700 (Wed, 21 Sep 2005)


To compile the code, change the KERNELSRC variable in mysock.mak to point to your kernel source tree:

#make -f mysock.mak

#insmod mysock.ko

To run the test:

#echo 1 > /dev/mysock

After running the above 8-10 times, you will see a -12 error on the console.

This problem does not occur when you create a single CQ and destroy it immediately in a loop (I tried 10 times).
It occurs when you create 4K CQs and then destroy them.


ib_mthca 0000:05:00.0: Mapped page at 362f9000 to 7e000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 362fa000 to 41000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d10000 to 7d000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d11000 to 42000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f27000 to 7c000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f28000 to 43000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3593f000 to 7b000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35940000 to 44000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b56000 to 7a000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b57000 to 45000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556d000 to 79000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556e000 to 46000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35785000 to 78000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35786000 to 47000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3519b000 to 77000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3519c000 to 48000 for ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7e000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7d000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7c000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7b000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7a000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 79000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 78000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 48000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 77000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b03000 to 76000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 362ba000 to 75000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d2c000 to 74000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35c83000 to 73000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b99000 to 72000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35db0000 to 71000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 356c5000 to 70000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35adc000 to 6f000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35add000 to 49000 for ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 76000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 75000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 74000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 73000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 72000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 71000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 70000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 49000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 6f000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM.
ib_mthca 0000:05:00.0: Mapped page at 362cf000 to 6e000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35a0f000 to 6d000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35070000 to 6c000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35e83000 to 6b000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35bd8000 to 6a000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 351ef000 to 69000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35c440

[openib-general] ib_create_cq memory leak?

2005-09-21 Thread Viswanath Krishnamurthy
I ran into this issue when using the kernel API to create CQs. To reproduce the problem, I wrote
a small kernel module that creates 4K CQs and then destroys them. After running the test 8-10 times, I saw
create_cq fail with error -12 (ENOMEM).

I am attaching the test module source code with Makefiles.

[root src]# svn info   (Latest code)
Path: .
URL: https://openib.org/svn/gen2/trunk/src
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3512
Node Kind: directory
Schedule: normal
Last Changed Author: halr
Last Changed Rev: 3511
Last Changed Date: 2005-09-21 08:57:38 -0700 (Wed, 21 Sep 2005)


To compile the code, change the KERNELSRC variable in mysock.mak to point to your kernel source tree:

#make -f mysock.mak

#insmod mysock.ko

To run the test:

#echo 1 > /dev/mysock

After running the above 8-10 times, you will see a -12 error on the console.

This problem does not occur when you create a single CQ and destroy it immediately in a loop (I tried 10 times).
It occurs when you create 4K CQs and then destroy them.


ib_mthca 0000:05:00.0: Mapped page at 362f9000 to 7e000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 362fa000 to 41000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d10000 to 7d000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d11000 to 42000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f27000 to 7c000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f28000 to 43000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3593f000 to 7b000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35940000 to 44000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b56000 to 7a000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b57000 to 45000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556d000 to 79000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556e000 to 46000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35785000 to 78000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35786000 to 47000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3519b000 to 77000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3519c000 to 48000 for ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7e000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7d000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7c000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7b000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7a000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 79000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 78000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 48000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 77000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b03000 to 76000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 362ba000 to 75000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d2c000 to 74000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35c83000 to 73000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b99000 to 72000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35db0000 to 71000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 356c5000 to 70000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35adc000 to 6f000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35add000 to 49000 for ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 76000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 75000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 74000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 73000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 72000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 71000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 70000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 49000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 6f000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM.
ib_mthca 0000:05:00.0: Mapped page at 362cf000 to 6e000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35a0f000 to 6d000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35070000 to 6c000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35e83000 to 6b000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35bd8000 to 6a000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 351ef000 to 69000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35c44000 to 6800

[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Just wanted to confirm that kernel mthca also works fine.

Thanks, Roland & Michael.

-Viswa
On 9/13/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote:
Thanks, yes, that was the problem.

The panic was happening when I was getting these errors and pressed Ctrl-C on
the server. This may be an error path issue.

I am not seeing it now.

-Viswa
On 9/13/05, Roland Dreier <[EMAIL PROTECTED]> wrote:

Viswanath> When I ran the cmpost program which I sent you, I
Viswanath> started getting errors from the mthca library even for
Viswanath> smaller number of connections (Earlier it was
Viswanath> working).

Yeah, I found another problem with your cmpost program. I think
you're setting the packet lifetime far too low. You have:

    sa.packet_life_time = 2;

This ends up having the CM set an ACK timeout of something like 32
microseconds, which is way too low. If you poll the send CQ, you'll
probably see some "retries exceeded" errors. Setting the
packet_life_time to something like 14 or 15 should work better.

Viswanath> Also it is now easier to create the panic when you kill
Viswanath> the cmpost server program. The panic may be happening
Viswanath> on an error path.

I still have never been able to reproduce this panic (and believe me,
I've killed the cmpost program many times). Anyway, I'll take a look
at the traceback and see if anything jumps out at me.

 - R.
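Roland's numbers follow from the IBA timeout encoding, in which a 5-bit value v means 4.096 us x 2^v. A minimal sketch of that arithmetic, assuming the ACK timeout is roughly twice the packet lifetime (one more step in the exponent) and ignoring the HCA's local ACK delay; this is an approximation, not the CM's exact formula, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* IBA timeout encoding: value v means 4.096 us * 2^v.
 * Kept in integer nanoseconds: 4096 ns * 2^v. */
static uint64_t iba_timeout_ns(unsigned value)
{
    assert(value <= 31);                /* the ACK timeout field is 5 bits */
    return UINT64_C(4096) << value;
}

/* Rough model (assumption): ACK timeout ~ 2 x packet lifetime, i.e. one
 * more step in the log2 encoding, capped at the 5-bit maximum. */
static unsigned ack_timeout_from_plt(unsigned packet_life_time)
{
    unsigned v = packet_life_time + 1;

    return v > 31 ? 31 : v;
}
```

With packet_life_time = 2 this gives exponent 3, about 32.8 us, consistent with Roland's estimate; 14 or 15 gives roughly 134 ms or 268 ms, which leaves ample room for retransmissions.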



[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Thanks, yes, that was the problem.

The panic was happening when I was getting these errors and pressed Ctrl-C on
the server. This may be an error path issue.

I am not seeing it now.

-Viswa
On 9/13/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Viswanath> When I ran the cmpost program which I sent you, I
Viswanath> started getting errors from the mthca library even for
Viswanath> smaller number of connections (Earlier it was
Viswanath> working).

Yeah, I found another problem with your cmpost program. I think
you're setting the packet lifetime far too low. You have:

    sa.packet_life_time = 2;

This ends up having the CM set an ACK timeout of something like 32
microseconds, which is way too low. If you poll the send CQ, you'll
probably see some "retries exceeded" errors. Setting the
packet_life_time to something like 14 or 15 should work better.

Viswanath> Also it is now easier to create the panic when you kill
Viswanath> the cmpost server program. The panic may be happening
Viswanath> on an error path.

I still have never been able to reproduce this panic (and believe me,
I've killed the cmpost program many times). Anyway, I'll take a look
at the traceback and see if anything jumps out at me.

 - R.

[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Roland,

I got the latest sources and built them along with the drivers.

Userland mthca:

Your test application (rctest) ran fine without any issues.
When I ran the cmpost program that I sent you, I started getting errors from
the mthca library even for a smaller number of connections (earlier it was working). This looks
like an error dump in the mthca library:

..  [ 0] 0493
  [ 4] 
  [ 8] 
  [ c] 
  [10] 05f4
  [14]    
  [18] 0042
  [1c] fe10
failed polling CQ: 142: err 1  <=== This is from cmpost program
  [ 0] 0493
  [ 4] 
  [ 8] 
  [ c] 
  [10] 05f9
  [14] 
  [18] 0082
  [1c] fe10
failed polling CQ: 142: err 1
  [ 0] 0493

It is also now easier to trigger the panic when you kill the cmpost server program. The panic
may be happening on an error path:

printing eip:
c029197d
*pde = 35d56001
Oops: 0000 [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    0
EIP:    0060:[<c029197d>]    Not tainted VLI
EFLAGS: 00010002   (2.6.13)
EIP is at mthca_poll_cq+0x158/0x534
eax: 00000000   ebx: f5e90280   ecx: 0006   edx: 1250
esi: 023a   edi: f5e90304   ebp: f7941f0c   esp: f7941ea4
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 308, threadinfo=f7940000 task=f7cb7540)
Stack: f7941ed0 c0118c7d f7def41c c0355dc0 f7cb7540 f7dea41c c1a01bc0 
   0080   0286 f7ce1000 f7941f0c 0001 f7dea400
   f8806000 0292 0001  f5e90280 f7ce1000 f7def400 f7941f0c
Call Trace:
 [] load_balance_newidle+0x23/0xa2
 [] ib_mad_completion_handler+0x2c/0x8d
 [] remove_wait_queue+0xf/0x34
 [] worker_thread+0x1b0/0x23a
 [] schedule+0x5d3/0xbdf
 [] ib_mad_completion_handler+0x0/0x8d
 [] default_wake_function+0x0/0xc
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x23a
 [] kthread+0x8a/0xb2
 [] kthread+0x0/0xb2
 [] kernel_thread_helper+0x5/0xb
Code: 01 00 00 8b 44 24 18 8d bb 84 00 00 00 8b 53 5c 8b 70 18 8b 4f 24
0f ce 2b b3 b8 00 00 00 8b 83 bc 00 00 00 d3 ee 01 f2 8d 14 d0
<8b> 02 8b 52 04 85 ff 89 45 00 89 55 04 74 16 8b 57 10 89 f0 39

-Viswa
On 9/13/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Viswanath> Once you generate a kernel patch, I can test out both
Viswanath> user and kernel mthca since I have the tests ready..

Excellent.  I merged MST's patch, and applied the patch below to the
kernel.  (So you can either update from svn or apply the patches)

Thanks for testing -- let me know if you still see problems.

Index: infiniband/hw/mthca/mthca_srq.c
===
--- infiniband/hw/mthca/mthca_srq.c (revision 3404)
+++ infiniband/hw/mthca/mthca_srq.c (working copy)
@@ -189,7 +189,6 @@ int mthca_alloc_srq(struct mthca_dev *de
        srq->max  = attr->max_wr;
        srq->max_gs   = attr->max_sge;
-       srq->last = NULL;
        srq->counter  = 0;
        if (mthca_is_memfree(dev))
@@ -264,6 +263,7 @@ int mthca_alloc_srq(struct mthca_dev *de
        srq->first_free = 0;
        srq->last_free  = srq->max - 1;
+       srq->last   = get_wqe(srq, srq->max - 1);
        return 0;
@@ -446,13 +446,11 @@ int mthca_tavor_post_srq_recv(struct ib_
                        ((struct mthca_data_seg *) wqe)->addr = 0;
                }
-               if (likely(prev_wqe)) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               cpu_to_be32((ind << srq->wqe_shift) | 1);
-                       wmb();
-                       ((struct mthca_next_seg *) prev_wqe)->ee_nds =
-                               cpu_to_be32(MTHCA_NEXT_DBD);
-               }
+               ((struct mthca_next_seg *) prev_wqe)->nda_op =
+                       cpu_to_be32((ind << srq->wqe_shift) | 1);
+               wmb();
+               ((struct mthca_next_seg *) prev_wqe)->ee_nds =
+                       cpu_to_be32(MTHCA_NEXT_DBD);
                srq->wrid[ind]  = wr->wr_id;
                srq->first_free = next_ind;
Index: infiniband/hw/mthca/mthca_qp.c
===
--- infiniband/hw/mthca/mthca_qp.c  (revision 3404)
+++ infiniband/hw/mthca/mthca_qp.c  (working copy)
@@ -227,7 +227,6 @@ static void mthca_wq_init(struct mthca_w
        wq->last_comp = wq->max - 1;
        wq->head  = 0;
        wq->tail  = 0;
-       wq->last  = NULL;
 }
 void mthca_qp_event(struct mthca_dev *dev, u32 qpn,
@@ -1103,6 +1102,9 @@ static int mthca_alloc_qp_common(struct
                }
        }
+       qp->sq.last = get_send_wqe(qp, qp->sq.max - 1);
+       qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1);
+
        return 0;
 }
@@ -1583,15 +1585,13 @@ int mthca_tavor_post_send(struct ib_qp *
                                goto out;
                }
-               if (prev_wqe) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               cpu_to_be32((

[openib-general] Strange configure error in libibcm

2005-09-13 Thread Viswanath Krishnamurthy
I got the latest code from the repository to verify the mthca fixes, and I ran into this
strange configure error in libibcm:

checking infiniband/at.h usability... yes
checking infiniband/at.h presence... yes
checking for infiniband/at.h... yes
checking for ANSI C header files... (cached) yes
checking for an ANSI C-conforming const... yes
checking for long... yes
checking size of long... configure: error: cannot compute sizeof (long), 77
See `config.log' for more details.

gcc version is 3.4
Linux 2.6.13

I was able to build earlier versions on the same machine. This happens only with libibcm. Any clues?

-Viswa


[openib-general] Re: [PATCH] libmthca: fix wqe post (was Re: strange mem-free bug)

2005-09-13 Thread Viswanath Krishnamurthy
Michael,

Thanks..

Roland,

Once you generate a kernel patch, I can test both the user and kernel mthca since I have the tests
ready.

-Viswa
On 9/13/05, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: strange mem-free bug (was: [openib-general] completion Q overflow error/panic)
>
> While looking at Viswa's example, I've found what seems to be a
> problem using lots of QPs on mem-free HCAs.

Hi, Roland!
This seems to be a bug in libmthca. Patch below.
We probably need a similar fix for kernel mthca - let me know if
you plan to work on that, otherwise I'll look into it tomorrow.
And it's probably something we want fixed for 2.6.14, right?
Let me know.

With regard to the test code that you posted - I also have some small
comments. If you plan to use it in the future, you can stick it
in svn somewhere and I'll send patches.

---

Fix posting of the first work request for memfree hardware.
Simplify code for tavor mode hardware.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: userspace/libmthca/src/qp.c
===
--- userspace.orig/libmthca/src/qp.c	2005-09-13 17:17:58.0 +0300
+++ userspace/libmthca/src/qp.c	2005-09-13 17:26:23.0 +0300
@@ -259,15 +259,13 @@ int mthca_tavor_post_send(struct ibv_qp
                                goto out;
                }
-               if (prev_wqe) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               htonl(((ind << qp->sq.wqe_shift) +
-                                      qp->send_wqe_offset) |
-                                     mthca_opcode[wr->opcode]);
+               ((struct mthca_next_seg *) prev_wqe)->nda_op =
+                       htonl(((ind << qp->sq.wqe_shift) +
+                              qp->send_wqe_offset) |
+                             mthca_opcode[wr->opcode]);
-                       ((struct mthca_next_seg *) prev_wqe)->ee_nds =
-                               htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size);
-               }
+               ((struct mthca_next_seg *) prev_wqe)->ee_nds =
+                       htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size);
                if (!size0) {
                        size0 = size;
@@ -353,12 +351,10 @@ int mthca_tavor_post_recv(struct ibv_qp
                qp->wrid[ind] = wr->wr_id;
-               if (prev_wqe) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               htonl((ind << qp->rq.wqe_shift) | 1);
-                       ((struct mthca_next_seg *) prev_wqe)->ee_nds =
-                               htonl(MTHCA_NEXT_DBD | size);
-               }
+               ((struct mthca_next_seg *) prev_wqe)->nda_op =
+                       htonl((ind << qp->rq.wqe_shift) | 1);
+               ((struct mthca_next_seg *) prev_wqe)->ee_nds =
+                       htonl(MTHCA_NEXT_DBD | size);
                if (!size0)
                        size0 = size;
@@ -562,15 +558,13 @@ int mthca_arbel_post_send(struct ibv_qp
                                goto out;
                }
-               if (prev_wqe) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               htonl(((ind << qp->sq.wqe_shift) +
-                                      qp->send_wqe_offset) |
-                                     mthca_opcode[wr->opcode]);
-                       mb();
-                       ((struct mthca_next_seg *) prev_wqe)->ee_nds =
-                               htonl(MTHCA_NEXT_DBD | size);
-               }
+               ((struct mthca_next_seg *) prev_wqe)->nda_op =
+                       htonl(((ind << qp->sq.wqe_shift) +
+                              qp->send_wqe_offset) |
+                             mthca_opcode[wr->opcode]);
+               mb();
+               ((struct mthca_next_seg *) prev_wqe)->ee_nds =
+                       htonl(MTHCA_NEXT_DBD | size);
                if (!size0) {
                        size0 = size;
@@ -767,6 +761,8 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd
                }
        }
+       qp->sq.last = get_send_wqe(qp, qp->sq.max - 1);
+       qp->rq.last = get_recv_wqe(qp, qp->sq.max - 1);
        return 0;
 }
Index: userspace/libmthca/src/srq.c
===
--- userspace.orig/libmthca/src/srq.c	2005-09-13 17:25:41.0 +0300
+++ userspace/libmthca/src/srq.c	2005-09-13 17:25:51.0 +0300
@@ -142,13 +142,11 @@ int mthca_tavor_post_srq_recv(struct ibv
                        ((struct mthca_data_seg *) wqe)->addr = 0;
                }
-               if (prev_wqe) {
-                       ((struct mthca_next_seg *) prev_wqe)->nda_op =
-                               htonl((ind << srq->wqe_shift) | 1);
-                       mb();
-                       ((struct mthca_next_seg *) prev_wqe)->ee_nds =
-                               htonl(MTHCA_NEXT_DBD);
-               }
+               ((struct mthca_next_seg *) prev
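The idea behind the quoted patch, per its description ("Fix posting of the first work request for memfree hardware"), is that each queue's `last` pointer is initialised to the final WQE instead of NULL, so the very first post is chained from the previous slot exactly like every later one and the `if (prev_wqe)` special case goes away. The toy ring below only illustrates that idea: `toy_wq`, `toy_wq_post` and the `next` field are hypothetical stand-ins for the real mthca structures and the `nda_op`/`ee_nds` fields, and it ignores doorbells and memory barriers.

```c
#include <assert.h>
#include <stddef.h>

#define RING_MAX 8

/* Toy model of a work queue: each slot records which slot it chains to. */
struct toy_wqe {
	int next;          /* index of the next WQE, or -1 if unlinked (cf. nda_op) */
	unsigned long id;  /* caller's wr_id */
};

struct toy_wq {
	struct toy_wqe slot[RING_MAX];
	unsigned head;           /* number of WQEs posted so far */
	struct toy_wqe *last;    /* most recently posted WQE (cf. qp->sq.last) */
};

/* As in the patch: 'last' starts at the final slot instead of NULL. */
static void toy_wq_init(struct toy_wq *wq)
{
	for (int i = 0; i < RING_MAX; i++) {
		wq->slot[i].next = -1;
		wq->slot[i].id = 0;
	}
	wq->head = 0;
	wq->last = &wq->slot[RING_MAX - 1];
}

/* Post one WQE and unconditionally link it from the previous one. */
static void toy_wq_post(struct toy_wq *wq, unsigned long wr_id)
{
	unsigned ind = wq->head % RING_MAX;
	struct toy_wqe *wqe = &wq->slot[ind];

	wqe->id = wr_id;
	wqe->next = -1;
	assert(wq->last != NULL);    /* never NULL with the new initialisation */
	wq->last->next = (int)ind;   /* no "if (prev_wqe)" special case needed */
	wq->last = wqe;
	wq->head++;
}
```

With the old NULL initialisation the first post would have had no predecessor to update; here the first WQE (slot 0) is linked from slot RING_MAX - 1, just as it is again after the ring wraps.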

[openib-general] Status of opensm 1.8 merge

2005-09-12 Thread Viswanath Krishnamurthy
Can I start testing the opensm 1.8 merge on gen2? What is the current status?

-Viswa


Re: [openib-general] completion Q overflow error/panic

2005-09-10 Thread Viswanath Krishnamurthy
Here is the ibv_devinfo output. It is an InfiniHost_III_Lx0.

]# ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.0.1
        node_guid:              0002:c902:0040:0cfc
        sys_image_guid:         0002:c902:0040:0cff
        max_mr_size:            0x
        page_size_cap:          0x0
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0x0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        invalid MTU (0)
                        active_mtu:     invalid MTU (0)
                        sm_lid:         1
                        port_lid:       1
                        port_lmc:       0x00


Yes, the CQE is a bug. But in this case there should be only one
outstanding packet in the pipe at any time. The client sends 1 packet, waits for the response with a
pause (delay), then sends the next packet. If everything works, we should be
using at most 1 CQ entry. Initially I had a larger number of CQ entries, but the problem
appeared later.

It looks like the packet is getting stuck somewhere, with no notification
of any error coming back. Do we need to tweak any of the QP parameters
(packet lifetime, retries, etc.)?

-Viswa


On 9/9/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
I found one bug in your cmpost.c program that could cause CQ
overruns.  When you create your receive and send CQs, you create them
with a cqe value of 5, so they can hold at most 5 entries.  However,
you create the send and receive work queues so they can hold up to 10
entries, and in fact the code will post up to 8 entries at a time.  So
it's possible to overflow the CQ.

The fix is to create the CQs to have at least as many entries as the
work queues -- in other words, change cqe to 10.

However, even with this fixed I do see some strange behavior that I'm
still debugging.  More details on Monday.

What HCA firmware version do your systems have?

 - R.
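Roland's sizing rule can be illustrated with a toy model in which every posted WR completes before the application gets around to polling. With cqe = 5 and up to 8 WRs in flight the CQ overruns; with cqe = 10 it does not. All names below are hypothetical; in real code the knob is the cqe argument passed to ibv_create_cq(). The shared-CQ case (one CQ serving both queues, which then needs room for both) is an extrapolation of the same rule, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not the mthca implementation): a CQ that can hold 'cqe'
 * entries, fed by a work queue whose WRs all complete before polling.
 */
struct toy_cq {
	int cqe;      /* capacity requested at creation time */
	int count;    /* completions queued but not yet polled */
	bool overrun; /* set if a completion arrived while the CQ was full */
};

static void toy_cq_complete(struct toy_cq *cq)
{
	if (cq->count == cq->cqe)
		cq->overrun = true;   /* the driver would report a CQ overrun */
	else
		cq->count++;
}

/* Post 'batch' WRs, let them all complete, then poll everything. */
static bool toy_run_batch(struct toy_cq *cq, int batch)
{
	for (int i = 0; i < batch; i++)
		toy_cq_complete(cq);
	cq->count = 0;            /* polling drains the CQ */
	return !cq->overrun;
}

/*
 * Minimum cqe so that no completion can be lost: the sum of both queue
 * depths for a shared CQ; for separate CQs each needs its own queue's
 * depth, so the larger of the two is a safe value for both.
 */
static int min_cqe_for(int max_send_wr, int max_recv_wr, bool shared_cq)
{
	assert(max_send_wr >= 0 && max_recv_wr >= 0);
	if (shared_cq)
		return max_send_wr + max_recv_wr;
	return max_send_wr > max_recv_wr ? max_send_wr : max_recv_wr;
}
```

For the cmpost setup (separate send and receive CQs, work queues of depth 10), min_cqe_for(10, 10, false) gives 10, matching the suggested fix.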

Re: [openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy

Some more info..

This also happens at the kernel level. I have a small kernel module which does the echo
reply. After about 100-200 connections, I start to see the following messages:

ib_mthca :05:00.0: SQ 590473 full (8 head, 0 tail, 8 max, 0 nreq)
ib_mthca :05:00.0: SQ 590477 full (8 head, 0 tail, 8 max, 0 nreq)
ib_mthca :05:00.0: SQ 59040c full (8 head, 0 tail, 8 max, 0 nreq)

Below 100 connections I do not see any such messages.

It looks like if there is a problem, it exists in both the kernel and userland APIs.
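These counters can be read against a head/tail overflow test. Assuming head counts posted WRs and tail advances only as their completions are polled (which is how the mthca drivers appear to track a work queue), "8 head, 0 tail, 8 max" means all eight posted sends are still waiting for completions that were never reaped. The sketch below is modelled on that style of check rather than copied from the driver; `struct toy_wq_counters` and `toy_wq_overflow` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the per-queue counters printed in the log. */
struct toy_wq_counters {
	unsigned head;  /* WRs posted so far (free-running counter) */
	unsigned tail;  /* WRs whose completions were polled (free-running) */
	unsigned max;   /* queue depth */
};

/*
 * Outstanding WRs are head - tail; unsigned subtraction keeps this correct
 * when the counters wrap. nreq is the number of WRs already queued in the
 * current post call; the queue is "full" if one more would not fit.
 */
static bool toy_wq_overflow(const struct toy_wq_counters *wq, unsigned nreq)
{
	unsigned cur = wq->head - wq->tail;

	assert(cur <= wq->max);
	return cur + nreq >= wq->max;
}
```

Plugging in the logged values (head 8, tail 0, max 8, nreq 0) gives 8 + 0 >= 8, so the post is refused even though no error completion was ever seen.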

-Viswa


On 9/9/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Thanks for the excellent bug report.  I'll try your code and see if I
can reproduce the problem.  If I can, then I should be able to fix the
bugs.

 - R.

[openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy
Somehow Gmail ate the main content of my mail..

Here it is..

I modified the cmpost program to have separate send and receive completion queues. The cmpost
server acts like an echo server, echoing back anything it receives. The client program keeps sending
packets.

The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post_send
errors. I added some debug messages in libmthca/src/qp.c where the check for wq_overflow is made, and
the queue is in fact overflowing. I checked the code to make sure all the send descriptors are recovered
by polling the CQ. The wc.status field is also checked for errors.

I am attaching the modified code.

bash-3.00$ svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3344
Node Kind: directory
Schedule: normal
Last Changed Author: jlentini
Last Changed Rev: 3344
Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005)


To run the test, compile the code:

cc -o cmpost cmpost.c -libcm -libverbs -libat

$ cmpost -n 1024    <=== as server

$ cmpost -c  -n 1024 -l  -g 

After some time you start seeing post_send errors. On my system up to 600 connections work fine.


When running the test I saw panics a couple of times, but it is difficult to reproduce:

kernel BUG at include/asm/spinlock.h:149!
invalid operand:  [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    1
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010086   (2.6.13)
EIP is at _spin_lock_irqsave+0x47/0x51
eax: 0011   ebx: 0282   ecx: c035950c   edx: 0082
esi: f7d82010   edi:    ebp: f6792c80   esp: c1a33ed0
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540)
Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 0001
   c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 0292 c0276b96 f6792c80
      b93e2c00 0128 0296 0402 0001
Call Trace:
 [] ib_mad_send_done_handler+0x72/0x11e
 [] ib_mad_send_done_handler+0x72/0x11e
 [] ib_mad_completion_handler+0x80/0x8d
 [] wait_noreap_copyout+0x55/0xbe
 [] worker_thread+0x1b0/0x23a
 [] schedule+0x5d3/0xbdf
 [] ib_mad_completion_handler+0x0/0x8d
 [] default_wake_function+0x0/0xc
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x23a
 [] kthread+0x8a/0xb2
 [] kthread+0x0/0xb2
 [] kernel_thread_helper+0x5/0xb
Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e
c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff
<0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad



-Viswa

[openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy
I modified the cmpost program to have separate send and receive completion queues. The cmpost
server acts like an echo server, echoing back anything it receives. The client program keeps sending
packets.

The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post_send
errors. I added some debug messages in libmthca/src/qp.c where the check for wq_overflow is made, and
the queue is in fact overflowing. I checked the code to make sure all the send descriptors are recovered
by polling the CQ. The wc.status field is also checked for errors.
I am attaching the modified code.

bash-3.00$ svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3344
Node Kind: directory
Schedule: normal
Last Changed Author: jlentini
Last Changed Rev: 3344
Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005)


To run the test, compile the code:

cc -o cmpost cmpost.c -libcm -libverbs -libat

$ cmpost -n 1024    <=== as server

$ cmpost -c  -n 1024 -l  -g 

After some time you start seeing post_send errors. On my system up to 600 connections work fine.


When running the test I saw panics a couple of times, but it is difficult to reproduce:

kernel BUG at include/asm/spinlock.h:149!
invalid operand:  [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    1
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010086   (2.6.13)
EIP is at _spin_lock_irqsave+0x47/0x51
eax: 0011   ebx: 0282   ecx: c035950c   edx: 0082
esi: f7d82010   edi:    ebp: f6792c80   esp: c1a33ed0
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540)
Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 0001
   c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 0292 c0276b96 f6792c80
      b93e2c00 0128 0296 0402 0001
Call Trace:
 [] ib_mad_send_done_handler+0x72/0x11e
 [] ib_mad_send_done_handler+0x72/0x11e
 [] ib_mad_completion_handler+0x80/0x8d
 [] wait_noreap_copyout+0x55/0xbe
 [] worker_thread+0x1b0/0x23a
 [] schedule+0x5d3/0xbdf
 [] ib_mad_completion_handler+0x0/0x8d
 [] default_wake_function+0x0/0xc
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x23a
 [] kthread+0x8a/0xb2
 [] kthread+0x0/0xb2
 [] kernel_thread_helper+0x5/0xb
Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e
c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff
<0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad



-Viswa





cmpost.c
Description: Binary data

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
See inline..

On 02 Sep 2005 17:04:42 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote:
> Here is the setup..

Thanks. A couple more questions:

> #svn info
> Path: .
>
> URL: https://openib.org/svn/gen2/trunk
> Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
> Revision: 3295
> Node Kind: directory
> Schedule: normal
> Last Changed Author: halr
> Last Changed Rev: 3295
> Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005)
>
>
> Patch applied to core/at.c and kernel 2.6.13 recompiled.
>
>
> Machine  A
> =
> Running opensm
>
> Run ucmpost
>
> machine B
> =
> ./ucmpost

Are these back to back HCAs or is there a switch in between ?

There is a switch in between: a simple setup with 2 machines and a switch. The machines are running
2.6.13. One of them is running opensm.
> The problem is reproducible when you *cannot* ping each other

over IPoIB ?


Yes.. 
> [EMAIL PROTECTED] ~]# ibv_devinfo
> hca_id: mthca0
>         fw_ver:                 1.0.1
>         node_guid:              0002:c902:0040:0d00
>         sys_image_guid:         0002:c902:0040:0d03
>         max_mr_size:            0x
>         page_size_cap:          0x0
>         vendor_id:              0x02c9
>         vendor_part_id:         25204
>         hw_ver:                 0x0
>         phys_port_cnt:          1
>                 port:   1
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        invalid MTU (0)  <
> What is this ??
>
>                         active_mtu:     invalid MTU (0)

If the program is right and those are the real values, somehow max_mtu
is trashed which causes active_mtu to be invalid which could break all
sorts of things...

Is there some issue with the HCA ?

>                         sm_lid:         1
>                         port_lid:       3
>                         port_lmc:       0x00

That's on the remote (from the SM) machine.

-- Hal
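For reading the MTU fields discussed above: ibv_devinfo prints the port MTU as the 4-bit code from the port attributes, which libibverbs defines as `enum ibv_mtu` with IBV_MTU_256 = 1 through IBV_MTU_4096 = 5. A 0 falls outside that range, which is why it shows up as "invalid MTU (0)". The decoder below is a hypothetical, self-contained helper (not a libibverbs function) that just makes the mapping explicit.

```c
#include <assert.h>

/*
 * Decode the IB MTU code (1 = 256, 2 = 512, 3 = 1024, 4 = 2048,
 * 5 = 4096 bytes). Returns -1 for anything outside that range,
 * such as the 0 reported in the ibv_devinfo output above.
 */
static int ib_mtu_code_to_bytes(int code)
{
	if (code < 1 || code > 5)
		return -1;
	return 128 << code;   /* 1 -> 256, ..., 5 -> 4096 */
}
```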

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
Here is the setup..

#svn info
Path: .

URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3295
Node Kind: directory
Schedule: normal
Last Changed Author: halr
Last Changed Rev: 3295
Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005)


Patch applied to core/at.c and kernel 2.6.13 recompiled.


Machine  A
=
Running opensm

Run ucmpost

machine B
=
./ucmpost 

The problem is reproducible when you *cannot* ping each other

[EMAIL PROTECTED] ~]# ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.0.1
        node_guid:              0002:c902:0040:0d00
        sys_image_guid:         0002:c902:0040:0d03
        max_mr_size:            0x
        page_size_cap:          0x0
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0x0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        invalid MTU (0)  < What is this ??>
                        active_mtu:     invalid MTU (0)
                        sm_lid:         1
                        port_lid:       3
                        port_lmc:       0x00


-Viswa



On 02 Sep 2005 16:02:44 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Fri, 2005-09-02 at 15:39, Viswanath Krishnamurthy wrote:
> The patch failed to fix the panic..

Can you describe your setup ? Did you just run ucmpost without an SM/SA
running or is it a different scenario ?

Thanks.

-- Hal

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
I am working on it. With the updated version of the code, it is slightly more difficult to reproduce.

-Viswa

On 9/2/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Not really related to the ib_at oops, since I don't know that code.
But have you made any progress in being able to post the code to
reproduce the other oops (at mthca_poll_cq)?

Thanks,
  Roland


Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
The patch failed to fix the panic..

subnetmgr5 login: ib_at: ib_dev_ats_op: dev (c0449800) ib0 already has pending op 2
Unable to handle kernel NULL pointer dereference at virtual address 0068
 printing eip:
c02fee65
*pde = 365a7001
Oops:  [#1]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    0
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010086   (2.6.13)
EIP is at _spin_lock_irqsave+0xa/0x51
eax: 0064   ebx: 0286   ecx: f665de6c   edx: c037bcd0
esi: 0064   edi: 0064   ebp:    esp: f665de00
ds: 007b   es: 007b   ss: 0068
Process lt-ucmpost (pid: 3749, threadinfo=f665c000 task=f6478020)
Stack: c01410ed 0001  c037bcd0 c0272f87  00d0 f665deac
   f67abe80 c027f14c c035ef80 c17f8ec0 f665de6c 0c30 0064 f665deac
   f67abe80 c0284cfa  0c30 0064 00d0 c02847b8 f67abe80
Call Trace:
 [] __alloc_pages+0x324/0x3f1
 [] ib_get_client_data+0x14/0x54
 [] ib_sa_path_rec_get+0x1b/0x138
 [] resolve_path+0x8c/0x15b
 [] path_req_complete+0x0/0xf7
 [] rtnetlink_dump_all+0x0/0x9e
 [] rtnetlink_done+0x0/0x3
 [] ib_at_paths_by_route+0xf5/0x10f
 [] same_path_req+0x0/0x95
 [] ib_uat_paths_by_route+0xef/0x1c4
 [] rtnetlink_dump_all+0x0/0x9e
 




-- Forwarded message --
From: Sean Hefty <[EMAIL PROTECTED]>
To: Hal Rosenstock <[EMAIL PROTECTED]>
Date: Thu, 01 Sep 2005 09:04:37 -0700
Subject: Re: [openib-general] kernel oops

Hal Rosenstock wrote:
> Here's a patch for this. Let me know if it works. [I tried it out and it
> works for me.] If it does, the next question is how does the pointer get
> trashed.

I don't think that the pointer is getting trashed.  The SA was not running, so I
don't think that any route was returned.

- Sean

[openib-general] Re: List of issues in uverbs

2005-09-01 Thread viswanath krishnamurthy


--- Roland Dreier <[EMAIL PROTECTED]> wrote:

> viswanath> Here is new list of issues with uverbs
>
> Thanks for the reports.
>
> viswanath> I have attached the firmware version/svn info in the
> viswanath> attachment.
>
> In the future can you attach things as text/plain (or just include
> them in your email)?  If you attach it as application/octet-stream
> then I have to save the attachment and open it manually, rather than
> just reading it as part of your email.

   OK..

>
> viswanath> 2. libmthca library crashes when a server accepts lots
> viswanath> of new incoming sessions. See log (gdb) in the
> viswanath> attachment. (It accepts about 170 connections) Looks
> viswanath> like a memory allocation issue.
>
> I found a few bugs in libmthca relating to allocating doorbell records
> for memfree HCAs.  I've checked in fixes.  Please try the latest
> subversion libmthca and let me know if it helps.

  This definitely helped. No more crashes in the library. Thanks.

>
> viswanath> 3. Kernel oops when lots of traffic between multiple
> viswanath> clients and server. Very consistently reproducible.
> viswanath> See attachment for details
>
> Can you post the application you use to reproduce this?

  I still see the crash consistently at the same place with yesterday's
  checkout. I will send the application to reproduce it today. If some
  debug log needs to be collected, let me know.

>
> Thanks,
>   Roland
> 

Thanks,
Viswa






Re: [openib-general] kernel oops

2005-09-01 Thread Viswanath Krishnamurthy
I will try out this patch and let you know..


Hal Rosenstock wrote:
> Here's a patch for this. Let me know if it works. [I tried it out and it
> works for me.] If it does, the next question is how does the pointer get
> trashed.

I don't think that the pointer is getting trashed.  The SA was not running, so I
don't think that any route was returned.

- Sean

Re: [openib-general] List of issues in uverbs

2005-08-31 Thread viswanath krishnamurthy


--- Sean Hefty <[EMAIL PROTECTED]> wrote:

> viswanath krishnamurthy wrote:
> > 1. ib_cm_destroy_id(cm_id)
> > hangs (does not return to the caller)
> > Is there a particular shutdown sequence
> > that needs to be followed ? Is there a trace/debug
> > I can enable ?
>
> There's no significant debug to enable.  What app are you running that's
> calling ib_cm_destroy_id()?  I didn't think that the ping pong tests used
> it.  Are you trying to call this function from within a CM callback?

   It is probably called from a callback. The application is a small
   application which accepts incoming connections (like a socket server).
   When is a good time to call destroy?

>
> The call will hang while there is a CM callback outstanding or if a CM
> event has not been completed by calling put_event.
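Sean's explanation can be pictured as a reference count: each event delivered to the application pins the ID, put_event releases it, and destroy waits for the count to drop to zero. Under that model, destroying the ID from inside the callback that still holds its event can never return. The pthreads sketch below only illustrates that rule; `toy_cm_id`, `toy_get_event`, `toy_put_event` and `toy_destroy_id` are hypothetical names and do not describe libibcm's actual internals.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Every delivered event holds a reference until it is put. */
struct toy_cm_id {
	pthread_mutex_t lock;
	pthread_cond_t  idle;
	int             events_out;   /* events delivered but not yet put */
};

static void toy_cm_id_init(struct toy_cm_id *id)
{
	pthread_mutex_init(&id->lock, NULL);
	pthread_cond_init(&id->idle, NULL);
	id->events_out = 0;
}

static void toy_get_event(struct toy_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->events_out++;
	pthread_mutex_unlock(&id->lock);
}

static void toy_put_event(struct toy_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	assert(id->events_out > 0);
	if (--id->events_out == 0)
		pthread_cond_broadcast(&id->idle);
	pthread_mutex_unlock(&id->lock);
}

/*
 * Blocks until every delivered event has been put. Called from the thread
 * that still holds an un-put event (e.g. inside the callback that received
 * it), this waits forever: the hang described in question 1.
 */
static void toy_destroy_id(struct toy_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	while (id->events_out > 0)
		pthread_cond_wait(&id->idle, &id->lock);
	pthread_mutex_unlock(&id->lock);
	pthread_cond_destroy(&id->idle);
	pthread_mutex_destroy(&id->lock);
}
```

In this model the safe order is to put the event first (or hand the destroy off to another thread) and only then destroy the ID.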
> 
> > 2. libmthca library crashes when a server accepts
> > lots of new incoming sessions. See log (gdb)
> > in the attachment. (It accepts about 170
> > connections) Looks like a memory allocation issue.
> 
> The log file borders on unreadable.

Hopefully the attachment is better this time.

  See the information here:
==
A server program that accepts multiple incoming connections. After about
170 connections the library dies, as seen in the gdb output:
==

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208648784 (LWP 21309)]
0xb7f79de8 in mthca_free_db (db_tab=0x805c688, type=MTHCA_DB_TYPE_CQ_SET_CI, db_index=494) at src/memfree.c:150
150 db_tab->page[db_index / MTHCA_DB_REC_PER_PAGE].
(gdb) bt
#0  0xb7f79de8 in mthca_free_db (db_tab=0x805c688, type=MTHCA_DB_TYPE_CQ_SET_CI, db_index=494)
    at src/memfree.c:150
#1  0xb7f7c699 in mthca_create_cq (context=0x805a0b4, cqe=10) at mthca.h:243
#2  0xb7f81eb5 in ibv_create_cq (context=0x805a0b4, cqe=10, cq_context=0x0) at src/verbs.c:107
#3  0xb7f5d6c0 in xib_qp_alloc_init (hp=0x865c958, port=1) at xsocket_trans2.c:157
#4  0xb7f5e19f in xib_conn_init (xcbp=0x865c958) at xsocket_trans2.c:496
#5  0xb7f5bd06 in handle_cm_req (hp=0x805da08, comm_id=0x865cab0, rguid=0x805db64 "",
    rn_guid=0x805db64 "", data=0x805d7b0, len=90) at xsocket.c:230
#6  0xb7f5ec73 in cm_handler () at xsocket_trans2.c:799
#7  0x007993ae in start_thread () from /lib/tls/libpthread.so.0
#8  0x00619aee in clone () from /lib/tls/libc.so.6



> 
> > 3. Kernel oops when lots of traffic between multiple
> >    clients and server. Very consistently
> >    reproducible.  See attachment for details
>
> Can you clarify what application you're running?  I can't understand
> your configuration from the log file.

The application is a simple one, which accepts
incoming requests and spawns a thread to handle each one.
The application does a simple "ping-pong" of data.

 printing eip:
c0285f7d
*pde = 3649a001
Oops:  [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    0
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010002   (2.6.12.5)
EIP is at mthca_poll_cq+0x158/0x534
eax:    ebx: c2027080   ecx: 0007   edx: 0a60
esi: 013c   edi: c2027104   ebp: c1a33f0c   esp: c1a33ea4
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 312, threadinfo=c1a32000 task=f7f16540)
Stack: c1800560 c17f8560 c17f8ec0 c1a33edc c0116819 f7d9489c f78a31e0
   0080   0286 f7d83000 c1a33f0c 0001 f7d94880
   f8806000 0292 0001  c2027080 f7d83000 f789bc00 c1a33f0c
Call Trace:
 [] load_balance_newidle+0x76/0x81
 [] ib_mad_completion_handler+0x2c/0x8d
 [] remove_wait_queue+0xf/0x34
 [] worker_thread+0x1b0/0x23a
 [] ib_mad_completion_handler+0x0/0x8d
 [] default_wake_function+0x0/0xc
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x23a
 [] kthread+0x8a/0xb2
 [] kthread+0x0/0xb2
 [] kernel_thread_helper+0x5/0xb
Code: 01 00 00 8b 44 24 18 8d bb 84 00 00 00 8b 53 5c 8b 70 18 8b 4f 24
0f ce 2b b3 b8 00 00 00 8b 83 bc 00 00 00 d3 ee 01 f2 8d 14 d0
<8b> 02 8b 52 04 85 ff 89 45 00 89 55 04 74 16 8b 57 10 89 f0 39


After about 170 incoming connections the library (hence the application) dies..

> 
> > 4. Is there a way to get the Port GUID from
> >    incoming connection. I can only get the remote
> >    node guid, but not the port GUID from the CM REQ
> >    data. This was possible in gen1 stack.
>
> You can use the returned path record to obtain port information.  What
> do you need the port GUID for?

If an HCA has multiple ports, the node GUID is the same for all of
them. It would be good to get the port GUID to uniquely identify the port.
> 
> - Sean
> 
Here is the code version used..

[EMAIL PROTECTED] svn in

[openib-general] List of issues in uverbs

2005-08-31 Thread viswanath krishnamurthy
I have attached the firmware version and svn info.

Here is a new list of issues with uverbs:

1. ib_cm_destroy_id(cm_id)
   hangs (does not return to the caller).
   Is there a particular shutdown sequence
   that needs to be followed? Is there a trace/debug
   I can enable?

2. The libmthca library crashes when a server accepts
   lots of new incoming sessions. See the log (gdb)
   in the attachment. (It accepts about 170
   connections.) It looks like a memory allocation issue.

3. Kernel oops when there is lots of traffic between multiple
   clients and the server. Very consistently
   reproducible. See the attachment for details.

4. Is there a way to get the port GUID from an
   incoming connection? I can only get the remote
   node GUID, but not the port GUID, from the CM REQ
   data. This was possible in the gen1 stack.


I will look into the rc_pingpong issue and try to
reproduce it.


--- Roland Dreier <[EMAIL PROTECTED]> wrote:

> viswanath> I have the latest openib code on 2.16 machine, when I
> viswanath> run the rc pingpong program I get the following error
> viswanath> (The first time it passed, but subsequent ones got an
> viswanath> error, I tried changing the iteration count to a large
> viswanath> number, 10 after the first time)
>
> I left "ibv_rc_pingpong -n 10" running in a loop between two of my
> machines with no problems, so there's something specific to your setup.
>
> When you say "latest openib code," what does this mean?  Are you
> running something from subversion or a standard Linux kernel?  Do you
> have 1-port or 2-port HCAs?  What HCA firmware version are you
> running?
>
>  - R.
> 





ib.log
Description: 2164448128-ib.log

[openib-general] rc ping pong error

2005-08-29 Thread viswanath krishnamurthy
I have the latest openib code on a 2.16 machine. When I run the rc
pingpong program, I get the following error. (The first run passed,
but subsequent runs got an error. After the first run I also tried
changing the iteration count to 10.)

#dmesg

ib_mthca :05:00.0: Mapped page at 395aa000 to 8 for ICM.
ib_mthca :05:00.0: CQ overrun on CQN 5b0083   <=
ib_mthca :05:00.0: Unmapping 1 pages at 8 from ICM.
[EMAIL PROTECTED] ./ibv_rc_pingpong 192.169.8.117
  local address:  LID 0x0003, QPN 0x440405, PSN 0xd6ae4e
  remote address: LID 0x0001, QPN 0x3a0405, PSN 0x9317a4
  [ 0] 00440405
  [ 4] 
  [ 8] 
  [ c] 
  [10] 1581
  [14] 
  [18] 8002
  [1c] ff10
Failed status 12 for wr_id 2
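The "Failed status 12" line prints a raw work-completion status. In libibverbs' enum ibv_wc_status, 12 is IBV_WC_RETRY_EXC_ERR: the sender's transport retry count ran out without an acknowledgement from the remote QP, which commonly points at the remote QP not being ready or at a path/LID problem. Later libibverbs releases provide ibv_wc_status_str() for this lookup; the standalone sketch below mirrors the enum order so it compiles without the library, and the table should be checked against the headers actually used.

```c
#include <stddef.h>

/* Names indexed by status value, mirroring enum ibv_wc_status. */
static const char *const wc_status_names[] = {
    [0]  = "IBV_WC_SUCCESS",
    [1]  = "IBV_WC_LOC_LEN_ERR",
    [2]  = "IBV_WC_LOC_QP_OP_ERR",
    [3]  = "IBV_WC_LOC_EEC_OP_ERR",
    [4]  = "IBV_WC_LOC_PROT_ERR",
    [5]  = "IBV_WC_WR_FLUSH_ERR",
    [6]  = "IBV_WC_MW_BIND_ERR",
    [7]  = "IBV_WC_BAD_RESP_ERR",
    [8]  = "IBV_WC_LOC_ACCESS_ERR",
    [9]  = "IBV_WC_REM_INV_REQ_ERR",
    [10] = "IBV_WC_REM_ACCESS_ERR",
    [11] = "IBV_WC_REM_OP_ERR",
    [12] = "IBV_WC_RETRY_EXC_ERR",
    [13] = "IBV_WC_RNR_RETRY_EXC_ERR",
    [14] = "IBV_WC_LOC_RDD_VIOL_ERR",
    [15] = "IBV_WC_REM_INV_RD_REQ_ERR",
    [16] = "IBV_WC_REM_ABORT_ERR",
    [17] = "IBV_WC_INV_EECN_ERR",
    [18] = "IBV_WC_INV_EEC_STATE_ERR",
    [19] = "IBV_WC_FATAL_ERR",
    [20] = "IBV_WC_RESP_TIMEOUT_ERR",
    [21] = "IBV_WC_GENERAL_ERR",
};

/* Map a raw status (as printed by "Failed status %d") to its name. */
static const char *wc_status_name(int status)
{
    int n = (int)(sizeof wc_status_names / sizeof wc_status_names[0]);
    if (status < 0 || status >= n || wc_status_names[status] == NULL)
        return "unknown";
    return wc_status_names[status];
}
```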











Re: [openib-general] kernel oops

2005-08-26 Thread Viswanath Krishnamurthy

I still see the issue.

1. I rebooted both machines, started opensm, and killed opensm after
   LID assignment.

2. Next, I started the ucmpost client/server; killing it panics the system.

-Viswa


Unable to handle kernel NULL pointer dereference at virtual address 0068
printing eip:
c02f2635
*pde = 3661e001
Oops:  [#1]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod

CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010086   (2.6.12.5)
EIP is at _spin_lock_irqsave+0xa/0x51
eax: 0064   ebx: 0286   ecx: f689be6c   edx: c036cbcc
esi: 0064   edi: 0064   ebp:    esp: f689be00
ds: 007b   es: 007b   ss: 0068
Process lt-ucmpost (pid: 3993, threadinfo=f689a000 task=f6ef9540)
Stack:  c013e3f0  c036cbcc c0267667  00d0 f689beac
  f66a9e80 c027393f c0350d00  f689be6c 0c30 0064 f689beac
  f66a9e80 c027955f  0c30 0064 00d0 c0279022 f66a9e80

Call Trace:
[] __alloc_pages+0x166/0x3b6
[] ib_get_client_data+0x14/0x54
[] ib_sa_path_rec_get+0x1b/0x13e
[] resolve_path+0x8c/0x15b
[] path_req_complete+0x0/0xf7
[] rtnetlink_dump_all+0x0/0x9e
[] rtnetlink_done+0x0/0x3
[] ib_at_paths_by_route+0xc4/0xd9
[] same_path_req+0x0/0x95   


Sean Hefty wrote:


I downloaded the latest openib gen2 stack and ran into kernel panic when
I run the cmpost/ucmpost example. I modified the program to continuously
send and receive data in an infinite loop and killed the application
with ctrl-c.
The kernel panics pretty consistently.

I am currently running 2.6.12 version of the kernel .  Log attached.  I
will try upgrading to newer kernel and see if I can reproduce it.
   



I have gotten something similar to this in my own testing, but haven't had the
time to track it down.  It seems to be related to how the IB AT code interacts
with the SM, and if the SM has been restarted.  Can you try resetting the SM
node, then rebooting your other systems?

- Sean

 





[openib-general] kernel oops

2005-08-26 Thread Viswanath Krishnamurthy

I downloaded the latest openib gen2 stack and ran into a kernel panic when
running the cmpost/ucmpost example. I modified the program to continuously
send and receive data in an infinite loop and killed the application
with Ctrl-C.

The kernel panics pretty consistently.
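Killing a test program with Ctrl-C under the default SIGINT action stops it mid-operation and leaves all cleanup to the kernel's process-exit path. An oops there is a kernel bug either way, but a SIGINT handler that only sets a flag, letting the loop fall through to the program's normal teardown, separates "killed mid-transfer" from "orderly shutdown" and can help narrow down the repro. A minimal POSIX sketch, where run_until_interrupted and the iteration callback are hypothetical stand-ins for the modified cmpost/ucmpost loop:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1; /* only async-signal-safe work in a handler */
}

/* Install the SIGINT handler. sa_flags is 0 (no SA_RESTART), so a
 * blocking call interrupted by Ctrl-C fails with EINTR instead of
 * restarting, giving the loop a chance to see the flag. */
static int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical loop shape: run iterations until Ctrl-C or an error,
 * then return so the caller can run its normal teardown (QP, CQ, CM
 * ID, ...) in order. do_one_iteration stands in for one send/receive
 * round trip and returns nonzero on error. */
static long run_until_interrupted(int (*do_one_iteration)(void))
{
    long iterations = 0;
    while (!stop_requested) {
        if (do_one_iteration() != 0)
            break;
        iterations++;
    }
    return iterations;
}
```

If the panic still occurs with an orderly teardown, the problem is in the normal destroy path; if it only occurs on a hard kill, it points at the kernel's cleanup of resources left open by a dying process.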

I am currently running the 2.6.12 kernel. Log attached. I will try
upgrading to a newer kernel to see if I can reproduce it there.

-Viswa


[EMAIL PROTECTED] examples]# uname -a
Linux subnetmgr4 2.6.12 #7 SMP Thu Aug 25 22:33:36 PDT 2005 i686 i686 i386 GNU/Linux

# svn info
Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3197
Node Kind: directory
Schedule: normal
Last Changed Author: roland
Last Changed Rev: 3197
Last Changed Date: 2005-08-25 18:07:09 -0700 (Thu, 25 Aug 2005)


mgr4 login: Unable to handle kernel NULL pointer dereference at virtual address 0068
 printing eip:
c02f35c5
*pde = 365b6001
Oops:  [#1]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 00010086   (2.6.12)
EIP is at _spin_lock_irqsave+0xa/0x51
eax: 0064   ebx: 0286   ecx: f6607e6c   edx: c036dbcc
esi: 0064   edi: 0064   ebp:    esp: f6607e00
ds: 007b   es: 007b   ss: 0068
Process lt-ucmpost (pid: 3837, threadinfo=f6606000 task=f6feb020)
Stack:  c013e410  c036dbcc c0267637 0073 00d0 f6607eac
   f6504e80 c027390f c0351d00  f6607e6c 0c30 0064 f6607eac
   f6504e80 c027952f  0c30 0064 00d0 c0278ff2 f6504e80
Call Trace:
 [] __alloc_pages+0x166/0x3b6
 [] ib_get_client_data+0x14/0x54
 [] ib_sa_path_rec_get+0x1b/0x13e
 [] resolve_path+0x8c/0x15b
 [] path_req_complete+0x0/0xf7
 [] rtnetlink_dump_all+0x0/0x9e
 [] rtnetlink_done+0x0/0x3
 [] ib_at_paths_by_route+0xc4/0xd9
 [] same_path_req+0x0/0x95
 [] ib_uat_paths_by_route+0xef/0x1c4
 [] rtnetlink_dump_all+0x0/0x9e
 [] rtnetlink_done+0x0/0x3
 [] ib_uat_write+0x96/0xa2
 [] vfs_write+0x108/0x10a
 [] sys_write+0x41/0x6a
 [] sysenter_past_esp+0x54/0x75
Code: c8 c3 81 78 04 ed 1e af de 75 0c f0 83 28 01 79 05 e8 94 e5 ff ff c3 0f 0b d7 00 56 60 30 c0 eb ea 56 89 c6 53 83 ec 08 9c 5b fa <81> 78 04 ad 4e ad de 75 20 f0 fe 0e 79 13 f7 c3 00 02 00 00 74
 <7>ib_mthca :05:00.0: Unmapping 1 pages at 8 from ICM.
ib_mthca :05:00.0: Unmapping 1 pages at bf000 from ICM.
Unable to handle kernel NULL pointer dereference at virtual address 0005
 printing eip:
c027a160
*pde = 37e01001
Oops: 0002 [#2]
SMP
Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010286   (2.6.12)
EIP is at ib_uat_create_event+0x4e/0xb4
eax: 0246   ebx: 0005   ecx:    edx: 0066
esi: c19fa180   edi: c19fa1b4   ebp: f72f5f00   esp: f7f4def4
ds: 007b   es: 007b   ss: 0068
Process ib_at_wq/0 (pid: 309, threadinfo=f7f4c000 task=f7cfca60)
Stack: 0096  f72f5f00 0001 f6504ed4  c027a219 0002
    ff92 f6504eb0 f6504ed4 0292 c027a291 f72f5f00 ff92
   ff92 c0278913 ff92 f6504eb0 c1a12000 c0129aad  000f41fd
Call Trace:
 [] ib_uat_callback+0x53/0x6d
 [] ib_uat_path_callback+0x1a/0x1f
 [] req_comp_work+0x19/0x25
 [] worker_thread+0x1b0/0x23a
 [] req_comp_work+0x0/0x25
 [] default_wake_function+0x0/0xc
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x23a
 [] kthread+0x8a/0xb2
 [] kthread+0x0/0xb2
 [] kernel_thread_helper+0x5/0xb
Code: 84 82 00 00 00 89 c7 b9 0d 00 00 00 89 d8 f3 ab 89 6e 04 ba 66 00 00 00 8b 44 24 04 89 06 b8 fa 44 30 c0 8b 5d 08 e8 84 ea e9 ff  ff 0b 0f 88 27 0d 00 00 8b 45 08 8d 56 08 83 c0 24 8b 48 04

  

RE: [openib-general] Re: useraccess_cm sample client/server (gen1 )

2005-07-07 Thread viswanath krishnamurthy
Itamar,

Thanks, I was able to use it.

One more question: once the connection is established, which APIs need
to be used from userland to send and receive data? Any sample code or
pointers would be appreciated.

Thanks,
-Vish
--- Itamar Rabenstein <[EMAIL PROTECTED]> wrote:

> openib is working now on gen2.
> but if you want you can look at mellanox IBGD 1.7.0 from
> www.mellnaox.com follow the link "Download IB GOLD - 1.7.0"
> look for udapl code .
> The code is useing the user_cm IF
> 
> Itamar
> 
> > -----Original Message-----
> > From: viswanath krishnamurthy [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 06, 2005 8:28 PM
> > To: viswanath krishnamurthy; openib-general@openib.org
> > Subject: [openib-general] Re: useraccess_cm sample client/server (gen1)
> > 
> > I looked further into the whole gen1 source tree.
> > There is no consumer of this useraccess_cm API
> > (ioctl). Are there any consumers of this API's. Is it
> > supported ?
> > 
> > Thanks,
> > Vish
> > 
> > --- viswanath krishnamurthy <[EMAIL PROTECTED]> wrote:
> > 
> > > Is there a sample code (examples) to use the gen1
> > > stack user level CM API (ioctls) ? Any pointers is
> > > appreciated.
> > > 
> > > Thanks,
> > > Vish
> 





[openib-general] Re: useraccess_cm sample client/server (gen1)

2005-07-06 Thread viswanath krishnamurthy
I looked further into the whole gen1 source tree. There is no consumer
of this useraccess_cm API (ioctl). Are there any consumers of these
APIs? Is it supported?

Thanks,
Vish

--- viswanath krishnamurthy <[EMAIL PROTECTED]> wrote:

> Is there a sample code (examples) to use the gen1
> stack user level CM API (ioctls) ? Any pointers is
> appreciated.
> 
> Thanks,
> Vish
> 
> 







[openib-general] useraccess_cm sample client/server (gen1)

2005-07-05 Thread viswanath krishnamurthy
Is there any sample code (examples) for using the gen1 stack user-level
CM API (ioctls)? Any pointers are appreciated.

Thanks,
Vish



 