Re: [openib-general] [openfabrics-ewg] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Tziporet Koren
thanks 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Sean Hefty
Sent: Tuesday, January 30, 2007 8:41 PM
To: Michael S. Tsirkin; Sasha Khapyorsky; [EMAIL PROTECTED]
Cc: EWG; OPENIB
Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.2 release - to be
reviewed in the meeting today

 *Sources developed in OFA:*
 1. Each git owner will open a branch with the name ofed_1_2. This branch
 should be opened on 31-Jan (based on code readiness we will review today).

I've added ofed_1_2 branches to my libibcm.git, librdmacm.git, and
rdma-dev.git trees.

- Sean

___
openfabrics-ewg mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openfabrics-ewg

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-01-31 Thread Dotan Barak
Roland Dreier wrote:
   anyway, the solution that comes into my mind is to disable creating a
   QP/SRQ for which > 128KB allocations are needed. So
   mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
   to values whose derived size still allows to use kmalloc.

 But that will limit the size of the queues that userspace can create
 too.  I guess we could allocate kernel wrid arrays with vmalloc(), but
 I wonder if anyone actually cares about this limit...
   
I think that now, when implementation of IPoIB CM is available and SRQ 
is being used, one may
need to use a SRQ with more than 16K WRs.
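
The 16K and 128KB figures in this thread line up if each work request needs one 64-bit wrid slot. A minimal sketch of that arithmetic follows; the 8-byte entry size and the 128KB single-allocation limit are assumptions taken from the numbers quoted above, not values read out of the mthca source, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 64-bit work-request ID is stored per queue entry. */
#define WRID_ENTRY_SIZE sizeof(uint64_t)
/* Assumption: largest single kmalloc() allocation, per the 128KB figure above. */
#define KMALLOC_LIMIT ((size_t)128 * 1024)

/* Bytes needed for the wrid array of a queue with max_wr entries. */
size_t wrid_array_bytes(size_t max_wr)
{
    return max_wr * WRID_ENTRY_SIZE;
}

/* Largest max_wr whose wrid array still fits in one allocation. */
size_t max_wr_for_kmalloc(void)
{
    return KMALLOC_LIMIT / WRID_ENTRY_SIZE;
}
```

Under these assumptions, 16384 WRs need exactly 128KB, so any larger SRQ would need either vmalloc() or a lower advertised max_srq_wr.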

thanks
Dotan




Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-01-31 Thread Michael S. Tsirkin
 Quoting Dotan Barak [EMAIL PROTECTED]:
 Subject: Re: [mthca] Creation of a SRQ with many WR (> 16K) in kernel level 
 fails
 
 Roland Dreier wrote:
anyway, the solution that comes into my mind is to disable creating a
QP/SRQ for which > 128KB allocations are needed. So
mthca_query_device() will set the max_qp_wr and max_srq_wr attributes
to values whose derived size still allows to use kmalloc.
 
  But that will limit the size of the queues that userspace can create
  too.  I guess we could allocate kernel wrid arrays with vmalloc(), but
  I wonder if anyone actually cares about this limit...

 I think that now, when implementation of IPoIB CM is available and SRQ 
 is being used, one may need to use a SRQ with more than 16K WRs.

Not really: IPoIB CM uses a common CQ for all recv completions, so
it does not make sense for IPoIB CM to create a SRQ bigger than
the max CQ size.

-- 
MST




[openib-general] [libibverbs] destroying an AH causes a seg fault (this failure appeared during the last night)

2007-01-31 Thread Dotan Barak
Hi Roland.

During the last night many tests failed in our regression (a new failure 
that appeared only during the last night).

It seems that destroying an AH causes a seg fault; I reproduced it using 
ibv_ud_pingpong.

Here are the machine props:
*
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version: 2.6.16.21-0.8-smp
GCC Version   : gcc (GCC) 4.1.0 (SUSE Linux)
Memory size   : 5081168 kB
Driver Version: gen2_devel-20070130-1817
HCA ID(s) : mthca0
HCA model(s)  : 25218
FW version(s) : 5.2.0
Board(s)  : MT_015002
*

Driver Checksums:
gen2_devel-20070130-1817
Kernel:
Git:
git://git.openfabrics.org/~vlad/ofed_1_2/.git
commit ab8b772956b6178ef14c983fd215d0dda3fb6842
Kernel:
Git:
git://git.openfabrics.org/~vlad/ofed_1_2/.git
commit ab8b772956b6178ef14c983fd215d0dda3fb6842



Here is the backtrace of the core dump:

# gdb ibv_ud_pingpong core
GNU gdb 6.4
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db 
library "/lib64/libthread_db.so.1".

Core was generated by `ibv_ud_pingpong sw031'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib64/libibverbs.so.1...done.
Loaded symbols for /usr/local//lib64/libibverbs.so.1
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib64/libcxgb3-rdmav2.so...done.
Loaded symbols for /usr/local/lib64/libcxgb3-rdmav2.so
Reading symbols from /usr/local/lib64/libmthca-rdmav2.so...done.
Loaded symbols for /usr/local/lib64/libmthca-rdmav2.so
#0  0x2b94b6612263 in __ibv_destroy_ah (ah=0x504e60) at src/verbs.c:475
475 return ah->context->ops.destroy_ah(ah);
(gdb) bt
#0  0x2b94b6612263 in __ibv_destroy_ah (ah=0x504e60) at src/verbs.c:475
#1  0x00401cb8 in pp_close_ctx (ctx=0x505340) at examples/ud_pingpong.c:387
#2  0x00402a2b in main (argc=<value optimized out>, argv=<value optimized out>) at examples/ud_pingpong.c:749



thanks
Dotan






Re: [openib-general] [mthca] Creation of a SRQ with many WR (> 16K) in kernel level fails

2007-01-31 Thread Dotan Barak
Michael S. Tsirkin wrote:
 I think that now, when implementation of IPoIB CM is available and SRQ 
 is being used, one may need to use a SRQ with more than 16K WRs.
 

 Not really: IPoIB CM uses a common CQ for all recv completions, so
 it does not make sense for IPoIB CM to create a SRQ bigger than
 the max CQ size.

   

In many HCAs, the maximum CQ size is 128K entries.
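
Putting the two arguments together: if every receive completion lands in one shared CQ, the useful SRQ depth is bounded by the CQ size, and with a 128K-entry CQ that bound sits well above 16K. A small sketch of that bound, with a hypothetical helper name that is not taken from IPoIB CM:

```c
#include <stddef.h>

/*
 * Useful SRQ depth when all receive completions go to a single CQ:
 * posting more receives than the CQ can hold completions for buys nothing.
 */
size_t effective_srq_depth(size_t requested_wr, size_t max_cqe)
{
    return requested_wr < max_cqe ? requested_wr : max_cqe;
}
```

With max_cqe = 131072, a request for 65536 WRs is not clamped at all, which is the case where the 16K kernel limit would start to matter.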


Dotan




[openib-general] Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop functioning

2007-01-31 Thread Yevgeny Kliteynik
Hi Hal.

I noticed the following bug in Bugzilla:

Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
  https://bugs.openfabrics.org/show_bug.cgi?id=329

When there is an HCA fatal event on the host that opensm is running on,
opensm stops functioning (after the event, the driver restarts the 
device, and the port does not return to the active state).

If opensm runs in sweep mode, after the event you can see that 
opensm stops sweeping.

I remember that a couple of months ago I sent a patch that takes care of this 
problem:
 - in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
 - in case of IBV_EVENT_PORT_ERROR, osm initiated heavy sweep

The problem with my patch was that it made osm depend on the uverbs module.
To resolve this problem, support should be added in umad, and then osm could
use this support.
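
The policy described above can be sketched as a pure mapping from event to action, independent of which layer (uverbs or umad) delivers the event. The enum and function names below are hypothetical stand-ins so that the sketch stays self-contained; they are not the libibverbs or OpenSM identifiers.

```c
/* Hypothetical local event codes; a real implementation would receive
 * these from whichever layer reports asynchronous HCA events. */
enum hca_event {
    HCA_EVENT_DEVICE_FATAL,
    HCA_EVENT_PORT_ERROR,
    HCA_EVENT_OTHER
};

/* Hypothetical actions the subnet manager can take in response. */
enum sm_action {
    SM_ACTION_EXIT,
    SM_ACTION_HEAVY_SWEEP,
    SM_ACTION_NONE
};

/* Map an HCA event to the reaction described in the mail above. */
enum sm_action sm_action_for_event(enum hca_event ev)
{
    switch (ev) {
    case HCA_EVENT_DEVICE_FATAL:
        return SM_ACTION_EXIT;        /* device is gone: force osm to exit */
    case HCA_EVENT_PORT_ERROR:
        return SM_ACTION_HEAVY_SWEEP; /* port went down: rediscover fabric */
    default:
        return SM_ACTION_NONE;
    }
}
```

Keeping the decision separate from the event source is one way the same logic could later be fed from umad instead of uverbs.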

Do you know if some work in this area was done in umad?

-- Yevgeny




[openib-general] ofa_1_2_kernel 20070131-0200 daily build status

2007-01-31 Thread vlad
This email was generated automatically, please do not reply


Common build parameters:  --with-ipoib-mod --with-sdp-mod --with-srp-mod 
--with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-core-mod 
--with-addr_trans-mod --with-cxgb3-mod 

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.16
Passed on powerpc with linux-2.6.18
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.15
Passed on powerpc with linux-2.6.19
Passed on powerpc with linux-2.6.17
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.13
Passed on powerpc with linux-2.6.16
Passed on powerpc with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on ppc64 with linux-2.6.19
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on powerpc with linux-2.6.15
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.17
Passed on ia64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on powerpc with linux-2.6.14
Passed on ppc64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.15

Failed:




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH] The ibv_cmd_* create functions need to set the context.
 
 Thanks, applied to master and stable branches.

Did you test it?
This patch (8b3d225476c99ea29a68109a7d40e5ef353d4388) causes ibv_ud_pingpong
to segfault on libmthca: libmthca never calls ibv_cmd_create_ah, so context is
now never set.

Starting program: /usr/local/ofed/bin/ibv_ud_pingpong sw069
[Thread debugging using libthread_db enabled]
[New Thread 47299578320592 (LWP 5085)]
  local address:  LID 0x0002, QPN 0x090406, PSN 0x71bffb
  remote address: LID 0x0001, QPN 0x040406, PSN 0x92316a
4096000 bytes in 0.02 seconds = 1893.99 Mbit/sec
1000 iters in 0.02 seconds = 17.30 usec/iter

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47299578320592 (LWP 5085)]
0x2b04ca3b7263 in __ibv_destroy_ah (ah=0x5050b0) at src/verbs.c:475
475 return ah->context->ops.destroy_ah(ah);
(gdb) p ah->context
$1 = (struct ibv_context *) 0x0

I actually think this approach is a wrong one: context should be
set in common code like ibv_create_ah, not in ibv_cmd_ which is
a library function low level driver might or might not call.
And certainly this kind of change does not seem appropriate for stable branch.

I think the proper thing is for low level driver not to assume that
fields such as context are initialized until create functions have returned.
Steve, pls fix your low level driver not to rely on this.
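
To illustrate the "set it in common code" argument, here is a minimal sketch using mock types. All names (mock_ah, common_create_ah, and so on) are hypothetical stand-ins for the libibverbs structures, not the real API; the provider here deliberately never touches ah->context, like the libmthca case above.

```c
#include <stddef.h>
#include <stdlib.h>

struct mock_context;

struct mock_ah {
    struct mock_context *context;
};

struct mock_ops {
    struct mock_ah *(*create_ah)(struct mock_context *ctx);
    int (*destroy_ah)(struct mock_ah *ah);
};

struct mock_context {
    struct mock_ops ops;
};

/* Provider that allocates the AH itself and never sets ah->context. */
struct mock_ah *provider_create_ah(struct mock_context *ctx)
{
    (void)ctx;
    return calloc(1, sizeof(struct mock_ah));
}

int provider_destroy_ah(struct mock_ah *ah)
{
    free(ah);
    return 0;
}

/* Common-code wrapper: records the context once the provider returns,
 * so it is set no matter which helpers the provider used internally. */
struct mock_ah *common_create_ah(struct mock_context *ctx)
{
    struct mock_ah *ah = ctx->ops.create_ah(ctx);

    if (ah)
        ah->context = ctx;
    return ah;
}

/* Same shape as the crashing line: dereferences ah->context. */
int common_destroy_ah(struct mock_ah *ah)
{
    return ah->context->ops.destroy_ah(ah);
}
```

With the assignment in common_create_ah removed, common_destroy_ah would dereference a NULL context, which matches the backtrace shown above.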

Roland, I have reverted this in OFED, please revert on master and stable.

-- 
MST




[openib-general] [Bug 334] New: Problems with build OFED-1.1.1-ib_local_sa

2007-01-31 Thread bugzilla-daemon
https://bugs.openfabrics.org/show_bug.cgi?id=334

   Summary: Problems with build OFED-1.1.1-ib_local_sa
   Product: OpenFabrics Linux
   Version: gen2
  Platform: X86-64
OS/Version: SLES 10
Status: NEW
  Severity: critical
  Priority: P1
 Component: IB Core
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


I have a problem building RPM packages on SLES10. The output of the failing
build is as follows:
  gcc
-Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/.cma.o.d 
-nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include  -Iinclude 
-Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include  -include
include/linux/autoconf.h  -include
/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h   
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core  -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration
-fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer
-mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks
-Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse
-mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Wno-pointer-sign
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug  -DMODULE
-DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(cma) 
-DKBUILD_MODNAME=KBUILD_STR(rdma_cm) -c -o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/.tmp_cma.o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/cma.c
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/cma.c: In function
'cma_resolve_ib_route':
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/cma.c:1205: error:
implicit declaration of function 'ib_get_path_rec'
make[5]: *** [/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/cma.o]
Error 1
make[4]: *** [/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core] Error
2
make[3]: *** [_module_/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband]
Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2
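
The build stops because -Werror-implicit-function-declaration turns a call to a function with no visible prototype into a hard error. One common way to keep callers compiling when an optional feature is absent is a header that provides either the real prototype or an inline stub. The sketch below shows that pattern only; HAVE_LOCAL_SA, get_path_rec and resolve_route are hypothetical names, and this is not claimed to be the fix for bug 334.

```c
#include <errno.h>

/* Hypothetical build switch standing in for "local SA support is built". */
#ifdef HAVE_LOCAL_SA
int get_path_rec(int dest);   /* real implementation lives elsewhere */
#else
/* Stub keeps callers compiling and fails cleanly at run time. */
static inline int get_path_rec(int dest)
{
    (void)dest;
    return -ENOSYS;
}
#endif

/* Caller sees a declaration in both configurations, so the compiler
 * never has to guess at an implicit one. */
int resolve_route(int dest)
{
    return get_path_rec(dest);
}
```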

The machine configuration:
Kernel: Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64
x86_64 x86_64 GNU/Linux
OS: SUSE Linux Enterprise Server 10 (x86_64)
gcc version: gcc (GCC) 4.1.0 (SUSE Linux)


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.




[openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa

2007-01-31 Thread bugzilla-daemon
https://bugs.openfabrics.org/show_bug.cgi?id=334


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[EMAIL PROTECTED]








Re: [openib-general] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Jeff Squyres
On Jan 31, 2007, at 1:58 AM, Michael S. Tsirkin wrote:

4. Vlad to have a daily build of the full OFED package

 Where is this build available from?

 http://www.openfabrics.org/builds/

All I see at that URL is nightly tarballs of the OFA kernel sources  
and the OFA user sources.

I was under the impression from the above text that there would be an  
**OFED** nightly tarball generated.

Is this incorrect?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems





Re: [openib-general] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Michael S. Tsirkin
 Quoting Jeff Squyres [EMAIL PROTECTED]:
 Subject: Re: Minutes for January 29, 2007 teleconference about OFED 1.2 
 release integration and build procedures
 
 On Jan 31, 2007, at 1:58 AM, Michael S. Tsirkin wrote:
 
 4. Vlad to have a daily build of the full OFED package
 
  Where is this build available from?
 
  http://www.openfabrics.org/builds/
 
 All I see at that URL is nightly tarballs of the OFA kernel sources  
 and the OFA user sources.
 
 I was under the impression from the above text that there would be an  
 **OFED** nightly tarball generated.
 
 Is this incorrect?

OFED didn't branch yet so there's no difference.

-- 
MST




Re: [openib-general] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Jeff Squyres
On Jan 31, 2007, at 6:51 AM, Michael S. Tsirkin wrote:

 I was under the impression from the above text that there would be an
 **OFED** nightly tarball generated.

 OFED didn't branch yet so there's no difference.

So are you saying that starting tomorrow (or shortly after tomorrow  
-- whatever), there will be a nightly OFED tarball (with all the OFED  
build scripts and sources and whatnot -- quite different than just  
bundling the OFA sources together) available at that URL as well?

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems





Re: [openib-general] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Tziporet Koren
Jeff Squyres wrote:
 On Jan 31, 2007, at 6:51 AM, Michael S. Tsirkin wrote:

   
 I was under the impression from the above text that there would be an
 **OFED** nightly tarball generated.
   
 OFED didn't branch yet so there's no difference.
 

 So are you saying that starting tomorrow (or shortly after tomorrow  
 -- whatever), there will be a nightly OFED tarball (with all the OFED  
 build scripts and sources and whatnot -- quite different than just  
 bundling the OFA sources together) available at that URL as well?

   
There is a misunderstanding here:
Michael pointed you to the current daily build of OFA SW.
The build of the full OFED tarball will be available early next week
(hopefully on Monday).
When this happens, Vlad will send a mail to all with the packages' 
location.

Tziporet




Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Tziporet Koren
Jeff Squyres wrote:
 It would be helpful to see the MVAPICH1 distribution for OFED 1.2 
 somewhere on the OFA server (under ~vlad/ofed_1_2 or 
 ~vlad/public_html/ofed_1_2...?) for comparison / example purposes.
Pasha will place his SRPM on ~pasha/ofed_1_2 today

Tziporet




Re: [openib-general] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Jeff Squyres
On Jan 31, 2007, at 7:29 AM, Tziporet Koren wrote:

 There is a misunderstanding here:
 Michael pointed you to the current daily build of OFA SW.
 The build of the full OFED tarball will be available early next week  
 (hopefully on Monday).
 When this happens, Vlad will send a mail to all with the  
 packages' location.

Great -- thanks!

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems





Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Tziporet Koren
Shaun Rowland wrote:

 Hi. I am not exactly sure where the ofed_1_2 directory for MPI SRPMs is
 supposed to go. I assume from previous meetings this is just a
 filesystem directory. Should it be a directory in my home directory on
 staging.openfabrics.org, in ~/public_html, or is there something else I
 need to do to put this into place? I think from the previous MPI
 specific meeting, this was supposed to be done in a web directory. Since
 I am unclear, I wanted to ask here.

Please place your SRPM in an ofed_1_2 directory under your home directory.
Then you can make this directory accessible to the web in this way:
1. mkdir public_html
2. chmod 755 public_html
 
Now you can put anything under public_html (including symbolic links) and it 
will be available via the web at
www.openfabrics.org/~<user name>/

Tziporet






Re: [openib-general] [RFC][PATCH] rdma_cm: allow joins to return a unique address

2007-01-31 Thread Or Gerlitz
On 1/30/07, Sean Hefty [EMAIL PROTECTED] wrote:
 I believe that this patch lets you do what you're trying to do.  The group
 handle would be the returned mgid from the initial join that created the 
 group.
   The mgid would need to be passed to other processes as an IPv6 address, who
 issue a join request on that group.  (The mgid is available from the
 rdma_cm_event.param.ud.ah_attr.grh.dgid.)
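
Since a GID and an IPv6 address are both 16 bytes, handing an mgid to another process "as an IPv6 address" can be done with the standard POSIX conversion calls. A minimal sketch, with hypothetical helper names; it shows only the text round trip, not the join itself:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Render a raw 16-byte GID as IPv6 text. Returns 0 on success, -1 on error.
 * buf should hold at least INET6_ADDRSTRLEN bytes. */
int gid_to_ipv6_text(const uint8_t gid[16], char *buf, size_t len)
{
    return inet_ntop(AF_INET6, gid, buf, (socklen_t)len) ? 0 : -1;
}

/* Parse IPv6 text back into a raw 16-byte GID. Returns 0 on success. */
int ipv6_text_to_gid(const char *text, uint8_t gid[16])
{
    return inet_pton(AF_INET6, text, gid) == 1 ? 0 : -1;
}
```

The text form is convenient for passing the group handle through whatever out-of-band channel the job already uses to exchange addresses.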

Sean,

I understand that your approach relies on the uniqueness of the MGID
being generated. This means that to have different MPI jobs use
different MGIDs , the MGIDs must be generated --always-- on the same
NODE and be propagated to other nodes/ranks participating in that MPI
job - correct?

Andrew - can you fulfil this demand? that is having the rank which
generated MGIDs always run on the same node of the cluster???

Or.




Re: [openib-general] [RFC][PATCH] rdma_cm: allow joins to return a unique address

2007-01-31 Thread Andrew Friedley
Or Gerlitz wrote:
 Sean,
 
 I understand that your approach relies on the uniqueness of the MGID
 being generated. This means that to have different MPI jobs use
 different MGIDs , the MGIDs must be generated --always-- on the same
 NODE and be propagated to other nodes/ranks participating in that MPI
 job - correct?
 
 Andrew - can you fulfil this demand? that is having the rank which
 generated MGIDs always run on the same node of the cluster???

Not across multiple MPI jobs, no -- MPI jobs have no awareness of each 
other whatsoever.

Andrew




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Steve Wise
On Wed, 2007-01-31 at 12:24 +0200, Michael S. Tsirkin wrote:
  Quoting Roland Dreier [EMAIL PROTECTED]:
  Subject: Re: [PATCH] The ibv_cmd_* create functions need to set the context.
  
  Thanks, applied to master and stable branches.
 
 Did you test it?
 This patch (8b3d225476c99ea29a68109a7d40e5ef353d4388) causes ibv_ud_pingpong
 to segfault on libmthca: libmthca never calls ibv_cmd_create_ah, so context is
 now never set.
 
 

I didn't test UD.  

 Starting program: /usr/local/ofed/bin/ibv_ud_pingpong sw069
 [Thread debugging using libthread_db enabled]
 [New Thread 47299578320592 (LWP 5085)]
   local address:  LID 0x0002, QPN 0x090406, PSN 0x71bffb
   remote address: LID 0x0001, QPN 0x040406, PSN 0x92316a
 4096000 bytes in 0.02 seconds = 1893.99 Mbit/sec
 1000 iters in 0.02 seconds = 17.30 usec/iter
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 47299578320592 (LWP 5085)]
 0x2b04ca3b7263 in __ibv_destroy_ah (ah=0x5050b0) at src/verbs.c:475
 475 return ah->context->ops.destroy_ah(ah);
 (gdb) p ah->context
 $1 = (struct ibv_context *) 0x0
 
 I actually think this approach is a wrong one: context should be
 set in common code like ibv_create_ah, not in ibv_cmd_ which is
 a library function low level driver might or might not call.
 And certainly this kind of change does not seem appropriate for stable branch.
 
 I think the proper thing is for low level driver not to assume that
 fields such as context are initialized until create functions have returned.
 Steve, pls fix your low level driver not to rely on this.
 

The issue is that the provider lib calls ibv_cmd_create_blah to create
the object, then some failure happens (like a failure mmap()ing the
object's DMA area to the process).  At this point the provider lib must
destroy this object that is created from the perspective of the ibv_cmd*
interface.  The only way to do that is to call the ibv_cmd_destroy_blah
call, which needs the context field.

So I don't think solving this in the provider lib is the right thing to
do.

 Roland, I have reverted this in OFED, please revert on master and stable.
 

I think we should fix the bug introduced:  set the context field in the
ibv_create_blah service if it's not set after calling the provider
method.
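
A sketch of the scenario and the proposed fix, using mock types only. Every name here is a hypothetical stand-in (cmd_create for an ibv_cmd_create_* helper, and so on), and the map_ok flag simulates whether the provider's mmap() step succeeds.

```c
#include <stddef.h>
#include <stdlib.h>

struct mock_context {
    int destroyed;   /* counts kernel-side objects torn down */
};

struct mock_obj {
    struct mock_context *context;
};

/* Stand-in for an ibv_cmd_create_* helper: the object now exists from
 * the command layer's point of view, and its context is recorded. */
void cmd_create(struct mock_context *ctx, struct mock_obj *obj)
{
    obj->context = ctx;
}

/* Stand-in for an ibv_cmd_destroy_* helper: it needs obj->context. */
int cmd_destroy(struct mock_obj *obj)
{
    if (!obj->context)
        return -1;
    obj->context->destroyed++;
    return 0;
}

/* Hypothetical provider create. When use_cmd is set it goes through the
 * command helper, and must undo that if the later mapping step fails. */
struct mock_obj *provider_create(struct mock_context *ctx, int use_cmd, int map_ok)
{
    struct mock_obj *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    if (use_cmd)
        cmd_create(ctx, obj);
    if (!map_ok) {
        if (use_cmd)
            cmd_destroy(obj);   /* only works because context is set */
        free(obj);
        return NULL;
    }
    return obj;
}

/* Common-code create: fill in context only if the provider left it unset. */
struct mock_obj *common_create(struct mock_context *ctx, int use_cmd, int map_ok)
{
    struct mock_obj *obj = provider_create(ctx, use_cmd, map_ok);

    if (obj && !obj->context)
        obj->context = ctx;
    return obj;
}
```

This keeps the error path working for providers that use the command helpers, while providers that bypass them still end up with a valid context.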

Steve.








Re: [openib-general] [PATCH ofed-1.2 alpha rel] ehca: reworked irq handler to support NAPI consistently

2007-01-31 Thread Vladimir Sokolovsky
On Tue, 2007-01-30 at 20:52 +0100, Hoang-Nam Nguyen wrote:
 Hi Vladimir,
 here is a patch for ehca with reworked irq handler. With those changes
 the performance result without/with scaling code and with NAPI (scaling
 code turned off) is consistent. They also reduce the rate of dropped 
 packets (when scaling code is turned off) significantly.
 Thanks
 Nam
 PS: Roland, this patch is aligned with ofed-1.2 only. I'll send this
 patch for 2.6.21 separately next week.
 
 
 Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
 ---
 

Added to kernel_patches/fixes in ~vlad/ofed_1_2/.git



-- 
Vladimir Sokolovsky [EMAIL PROTECTED]
Mellanox Technologies Ltd.




Re: [openib-general] Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop functioning

2007-01-31 Thread Hal Rosenstock
Hi Yevgeny,

On Wed, 2007-01-31 at 05:16, Yevgeny Kliteynik wrote:
 Hi Hal.
 
 I noticed the following bug in Bugzilla:
 
   Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
 https://bugs.openfabrics.org/show_bug.cgi?id=329
 
   When there is an HCA fatal event on the host that opensm is running on,
   opensm stops functioning (after the event, the driver restarts the
   device, and the port does not return to the active state).
 
   If opensm runs in sweep mode, after the event you can see that
   opensm stops sweeping.
 
 I remember that a couple of months ago I sent a patch that takes care of this 
 problem:
  - in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
  - in case of IBV_EVENT_PORT_ERROR, osm initiated heavy sweep
 
 The problem with my patch was that it made osm depend on the uverbs module.
 To resolve this problem, support should be added in umad, and then osm could
 use this support.
 
 Do you know if some work in this area was done in umad?

This has been on the list but unfortunately there has been no time yet
to work on the local events support in libibumad.

-- Hal

 -- Yevgeny





Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Pavel Shamis (Pasha)
Tziporet Koren wrote:
 Jeff Squyres wrote:
 It would be helpful to see the MVAPICH1 distribution for OFED 1.2 
 somewhere on the OFA server (under ~vlad/ofed_1_2 or 
 ~vlad/public_html/ofed_1_2...?) for comparison / example purposes.
 Pasha will place his SRPM on ~pasha/ofed_1_2 today
I just finished preparing the SRPM stuff.
You can find it here:
mvapich - http://www.openfabrics.org/~pasha/ofed_1_2/mvapich/
mpitests - http://www.openfabrics.org/~pasha/ofed_1_2/mpitests/

Pasha
 
 Tziporet
 
 





Re: [openib-general] [openfabrics-ewg] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Hoang-Nam Nguyen
Hi,
3. Each git maintainer: open ofed_1_2 branch till Feb 1.
created branch ofed_1_2 for libehca.
Regards
Nam





Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Steve Wise
On Mon, 2007-01-29 at 16:05 +0200, Tziporet Koren wrote:
 Hi,
 
 This is the proposal for OFED 1.2 branching and tagging:
 
 Sources developed in OFA:
 1. Each git owner will open a branch with the name ofed_1_2. This
 branch should be opened on 31-Jan (based on code readiness we will
 review today).

ofed_1_2 branch created for libcxgb3.git.


Steve.







Re: [openib-general] [openfabrics-ewg] Minutes for January 29, 2007 teleconference about OFED 1.2 release integration and build procedures

2007-01-31 Thread Tziporet Koren
thanks 

-----Original Message-----
From: Hoang-Nam Nguyen [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 31, 2007 5:14 PM
To: Tziporet Koren
Cc: EWG; [EMAIL PROTECTED]; openib
Subject: Re: [openfabrics-ewg] Minutes for January 29, 2007
teleconference about OFED 1.2 release integration and build procedures

Hi,
3. Each git maintainer: open ofed_1_2 branch till Feb 1.
created branch ofed_1_2 for libehca.
Regards
Nam






[openib-general] regression in ofed 1.2

2007-01-31 Thread Steve Wise
Sean,

I think librdmacm commit 1fd83b0bbbfc7fadba45390b98d5f9c944b42bdc broke
iwarp usermode.  I'm debugging now, but basically the change in
rdma_create_qp() to call into the kernel to setup the qp init attributes
doesn't work for iwarp because the iwcm hasn't been created at this
point.  So we fall off a NULL ptr in iw_cm_init_qp_attr().  I'm working
up a fix for this because I think the iw_cm_id _should_ be created at
the time the addr and/or route is resolved.  But it isn't created until
rdma_connect() is issued.  Stay tuned.

Bug 335 opened to track this.
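
As an illustration only, the failure described above (dereferencing an iw_cm_id that has not been created yet) could be guarded against roughly as follows. Every sketch_* name is a hypothetical stand-in rather than a kernel definition, and returning -EINVAL is an assumption. The fix proposed in this message is to create the iw_cm_id earlier, which this sketch does not attempt.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct sketch_iw_cm_id {
	int qp_access_flags;
};

struct sketch_qp_attr {
	int qp_access_flags;
};

struct sketch_rdma_id {
	struct sketch_iw_cm_id *iw;	/* stays NULL until the connect step creates it */
};

/* Fill the initial QP attributes, refusing to dereference a missing iw_cm_id. */
static int sketch_init_qp_attr(const struct sketch_rdma_id *id,
			       struct sketch_qp_attr *attr)
{
	if (id == NULL || attr == NULL)
		return -EINVAL;
	if (id->iw == NULL)
		return -EINVAL;	/* assumption: report an error instead of crashing */

	attr->qp_access_flags = id->iw->qp_access_flags;
	return 0;
}
```

A guard like this only turns the crash into an error return; usermode iWARP would still fail until the iw_cm_id exists when rdma_create_qp() is called.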




commit 1fd83b0bbbfc7fadba45390b98d5f9c944b42bdc
Author: Sean Hefty [EMAIL PROTECTED]
Date:   Fri Jan 26 10:21:17 2007 -0800

Allow unicast traffic over IPOIB port space.

Adjust the RDMA_PS_IPOIB to allow unicast traffic.  This requires
changing how QPs are initialized in order to get the correct qkey
to use.  We need to call into the kernel to get the initial QP
attributes.

Update the udaddy unicast test program to test this capability.






Re: [openib-general] OFED 1.2 release - to be reviewed in the meeting today

2007-01-31 Thread Arlin Davis
*Sources developed in OFA:*

1. Each git owner will open a branch with the name ofed_1_2. This branch
should be opened on 31-Jan (based on code readiness we will review today).



  

ofed_1_2 branch created for dapl.git

-arlin




Re: [openib-general] [PATCH 10/10] osm: QoS in OpenSM

2007-01-31 Thread Hal Rosenstock
Hi Yevgeny,

On Tue, 2007-01-30 at 10:33, Yevgeny Kliteynik wrote:
 Checking PathRecord query for QoS constraints
 
 The QoS-aware path selection logic is implemented in a
 separate function that is called only when QoS in OpenSM
 is on. It causes some code duplication, but the idea is
 to minimize the changes in the existing logic in OSM.
 Eventually, these two functions (the old path selection
 and the new QoS-aware path selection) will be merged
 into a single function.

Yes, this would be nice to do in the future as there is much overlap.
Whether qos is carried in the request could be handled internal to this
combined routine rather than outside to determine which routine to call.
This will make for a lot less code.

Some comments embedded below.

 Signed-off-by: Yevgeny Kliteynik [EMAIL PROTECTED]
 ---
  osm/opensm/osm_sa_path_record.c |  822 
 ++-
  1 files changed, 816 insertions(+), 6 deletions(-)
 
 diff --git a/osm/opensm/osm_sa_path_record.c b/osm/opensm/osm_sa_path_record.c
 index a0dbb07..2ff7a42 100644
 --- a/osm/opensm/osm_sa_path_record.c
 +++ b/osm/opensm/osm_sa_path_record.c
 @@ -70,6 +70,7 @@
  #include <opensm/osm_router.h>
  #include <opensm/osm_sa_mcmember_record.h>
  #endif
 +#include <opensm/osm_qos_parser.h>
  
  #define OSM_PR_RCV_POOL_MIN_SIZE    64
  #define OSM_PR_RCV_POOL_GROW_SIZE   64
 @@ -87,6 +88,7 @@ typedef struct _osm_path_parms
    uint8_t    rate;
    uint8_t    sl;
    uint8_t    pkt_life;
 +  uint16_t   class;
    boolean_t  reversible;
  } osm_path_parms_t;
  
 @@ -716,6 +718,799 @@ __osm_pr_rcv_get_path_parms(
  
  /**
   **/
 +
 +static ib_api_status_t
 +__osm_pr_rcv_get_path_parms_qos(

This is similar to the non-QoS function:
__osm_pr_rcv_get_path_parms

 +  IN osm_pr_rcv_t* const p_rcv,
 +  IN const ib_path_rec_t*  const p_pr,
 +  IN const osm_port_t* const p_src_port,
 +  IN const osm_port_t* const p_dest_port,
 +  IN const uint16_t        dest_lid_ho,
 +  IN const ib_net64_t  comp_mask,
 +  OUT osm_path_parms_t*const p_parms )
 +{
 +   const osm_node_t*p_node;
 +   const osm_physp_t*   p_physp;
 +   const osm_physp_t*   p_src_physp;
 +   const osm_physp_t*   p_dest_physp;
 +   const osm_prtn_t*p_prtn;
 +   const ib_port_info_t*p_pi;
 +   ib_api_status_t  status = IB_SUCCESS;
 +   ib_net16_t   pkey = 0;
 +   ib_net16_t   shared_pkey = 0;
 +   uint8_t  mtu = 0;
 +   uint8_t  rate = 0;
 +   uint8_t  pkt_life = 0;
 +   uint8_t  sl = 0;
 +   uint16_t class = 0;
 +   uint8_t  required_mtu;
 +   uint8_t  required_rate;
 +   uint8_t  required_pkt_life;
 +   uint8_t  in_port_num;
 +   uint8_t  out_port_num;
 +   ib_net16_t   dest_lid;
 +   uint8_t  i;
 +   uint8_t  vl;
 +   ib_slvl_table_t *p_slvl_tbl = NULL;
 +   boolean_t        valid_sls[IB_MAX_NUM_VLS];
 +   boolean_t        sl2vl_valid_path = FALSE;
 +   uint8_t  first_valid_sl;
 +   osm_qos_level_t *p_qos_level = NULL;
 +
 +   OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_get_path_parms_qos );
 +
 +   memset(valid_sls,TRUE,sizeof(valid_sls));
 +   dest_lid = cl_hton16( dest_lid_ho );
 +
 +   p_dest_physp = osm_port_get_default_phys_ptr( p_dest_port );
 +   p_physp = osm_port_get_default_phys_ptr( p_src_port );
 +   p_src_physp = p_physp;
 +   p_pi = p_physp-port_info;
 +
 +   mtu = ib_port_info_get_mtu_cap( p_pi );
 +   rate = ib_port_info_compute_rate( p_pi );
 +
 +   /*
 +* Mellanox Tavor device performance is better using 1K MTU.
 +* If required MTU and MTU selector are such that 1K is OK 
 +* and at least one end of the path is Tavor we override the
 +* port MTU with 1K.
 +*/
 +   if ( p_rcv->p_subn->opt.enable_quirks &&
 +        __osm_sa_path_rec_apply_tavor_mtu_limit(
 +           p_pr, p_src_port, p_dest_port, comp_mask) )
 +   {
 +      if (mtu > IB_MTU_LEN_1024)
 +      {
 +         mtu = IB_MTU_LEN_1024;
 +         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
 +                  "__osm_pr_rcv_get_path_parms_qos: "
 +                  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
 +      }
 +   }
 +
 +   /*
 +* Walk the subnet object from source to destination,
 +* tracking the most restrictive rate and mtu values along the way...
 +*
 +* If source port node is a switch, then p_physp should
 +* point to the port that routes the destination lid
 +*/
 +
 +   p_node = osm_physp_get_node_ptr( p_physp );
 +
 +   if( p_node->sw )
 +   {
 +  /* source node is a switch */
 +  in_port_num = osm_physp_get_port_num(p_physp);

Re: [openib-general] [RFC][PATCH] rdma_cm: allow joins to return a unique address

2007-01-31 Thread Sean Hefty
 I understand that your approach relies on the uniqueness of the MGID
 being generated. This means that to have different MPI jobs use
 different MGIDs , the MGIDs must be generated --always-- on the same
 NODE and be propagated to other nodes/ranks participating in that MPI
 job - correct?

MGID uniqueness is provided by the SA when the join request contains an MGID of 
0.  There is no requirement that the MGIDs be generated on the same node.

- Sean
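
A minimal sketch of the rule stated above, assuming the MGID is held as a raw 16-byte array: a join request whose MGID is all zeros is the case in which the SA supplies a unique MGID. The helper name and constant are hypothetical and do not come from librdmacm or the SA code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GID_LEN 16	/* a GID is 128 bits */

/* Hypothetical helper: true when the join request leaves MGID selection to the SA. */
static bool sketch_mgid_is_zero(const uint8_t mgid[SKETCH_GID_LEN])
{
	static const uint8_t zero[SKETCH_GID_LEN] = { 0 };

	return memcmp(mgid, zero, SKETCH_GID_LEN) == 0;
}
```

Because the SA assigns the value, any node may issue such a join; nothing in this check depends on which node runs it.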




[openib-general] Diags/ibtracert: Add switch-map option to ibtracert

2007-01-31 Thread Hal Rosenstock
Diags/ibtracert: Add switch-map option to ibtracert

Signed-off-by: Ira K. Weiny [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/diags/man/ibtracert.8 b/diags/man/ibtracert.8
index c1632ac..28f18b6 100644
--- a/diags/man/ibtracert.8
+++ b/diags/man/ibtracert.8
@@ -1,11 +1,11 @@
-.TH IBTRACERT 8 July 25, 2006 OpenIB OpenIB Diagnostics
+.TH IBTRACERT 8 January 31, 2007 OpenIB OpenIB Diagnostics
 
 .SH NAME
 ibtracert\- trace InfiniBand path
 
 .SH SYNOPSIS
 .B ibtracert
-[\-d(ebug)] [-v(erbose)] [\-D(irect)] [\-G(uids)] [-n(o_info)] [-m mlid] [-s 
smlid] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] 
[\-h(elp)] [dest dr_path|lid|guid [startlid [endlid]]]
+[\-d(ebug)] [-v(erbose)] [\-D(irect)] [\-G(uids)] [-n(o_info)] [-m mlid] [-s 
smlid] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] 
[\-\-switch\-map switch-map] [\-h(elp)] [dest dr_path|lid|guid [startlid 
[endlid]]]
 
 .SH DESCRIPTION
 .PP
@@ -23,6 +23,10 @@ simple format; don't show additional inf
 .TP
 \fB\-m\fR
 show the multicast trace of the specified mlid
+.TP
+\fB\-\-switch\-map\fR switch-map
+Specify a switch map.  The switch map file maps GUIDs to more user friendly
+names.  See ibnetdiscover for switch map file format.
 
 .SH COMMON OPTIONS
 
@@ -101,3 +105,6 @@ ibtracert -m 0xc000 4 16# show multi
 .TP
 Hal Rosenstock
 .RI  [EMAIL PROTECTED] 
+.TP
+Ira Weiny
+.RI  [EMAIL PROTECTED] 
diff --git a/diags/src/ibtracert.c b/diags/src/ibtracert.c
index c69ff4e..34da658 100644
--- a/diags/src/ibtracert.c
+++ b/diags/src/ibtracert.c
@@ -35,6 +35,7 @@
 #  include <config.h>
 #endif /* HAVE_CONFIG_H */
 
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -43,6 +44,7 @@
 #include <getopt.h>
 #include <netinet/in.h>
 #include <inttypes.h>
+#include <errno.h>
 
 #define __BUILD_VERSION_TAG__ 1.2
 #include common.h
@@ -65,6 +67,8 @@ static int force;
 static FILE *f;
 
 static char *argv0 = "ibtracert";
+static char *switch_map = NULL;
+static FILE *switch_map_fp = NULL;
 
 #undef DEBUG
 #defineDEBUG   if (ibdebug || verbose) IBWARN
@@ -146,6 +150,68 @@ clean_nodedesc(char *nodedesc)
return (nodedesc);
 }
 
+/** =
+ */
+static void
+open_switch_map(void)
+{
+	if (switch_map) {
+		switch_map_fp = fopen(switch_map, "r");
+		if (switch_map_fp == NULL) {
+			fprintf(stderr,
+				"WARNING failed to open switch map \"%s\" (%s)\n"
+				"Switch names will default to node descriptions\n",
+				switch_map, strerror(errno));
+		}
+	}
+}
+
+static void
+close_switch_map(void)
+{
+   if (switch_map_fp)
+   fclose(switch_map_fp);
+}
+
+static char *
+lookup_switch_name(Node *node)
+{
+#define NAME_LEN (256)
+	char     *line = NULL;
+	size_t    len = 0;
+	uint64_t  guid = 0;
+	char     *rc = NULL;
+	int       line_count = 0;
+	uint64_t  target_guid = node->nodeguid;
+
+	if (switch_map_fp == NULL)
+		goto done;
+
+	rewind(switch_map_fp);
+	for (line_count = 1;
+	     getline(&line, &len, switch_map_fp) != -1;
+	     line_count++) {
+		line[len-1] = '\0';
+		if (line[0] == '#') { goto next_one; }
+		char *guid_str = strtok(line, "\"#");
+		char *name = strtok(NULL, "\"#");
+		if (!guid_str || !name) { goto next_one; }
+		guid = strtoull(guid_str, NULL, 0);
+		if (target_guid == guid) {
+			rc = strdup(name);
+			free(line);
+			goto done;
+		}
+next_one:
+		free (line);
+		line = NULL;
+	}
+done:
+	if (rc == NULL)
+		rc = strdup(clean_nodedesc(node->nodedesc));
+   return (rc);
+}
+
 static int
 get_node(Node *node, Port *port, ib_portid_t *portid)
 {
@@ -234,13 +300,20 @@ dump_endnode(int dump, char *prompt, Nod
return;
}
 
-	nodename = clean_nodedesc(node->nodedesc);
+	if (node->type == IB_NODE_SWITCH)
+		nodename = lookup_switch_name(node);
+	else
+		nodename = clean_nodedesc(node->nodedesc);
+
 	fprintf(f, "%s %s {0x%016" PRIx64 "} portnum %d lid 0x%x-0x%x \"%s\"\n",
 		prompt,
 		(node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"),
 		node->nodeguid, node->type == IB_NODE_SWITCH ? 0 : port->portnum,
 		port->lid, port->lid + (1 << port->lmc) - 1,
 		nodename);
+
+	if (nodename && (node->type == IB_NODE_SWITCH))
+		free(nodename);
 }
 
 static void
@@ -251,7 +324,11 @@ dump_route(int dump, Node *node, int out
 	if (!dump && !verbose)
return;
 
-   nodename = 
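
For readers following the switch-map lookup in the patch above, here is a standalone sketch of parsing a single map line. It assumes the name is enclosed in double quotes, which is what the patch's strtok() delimiter set suggests; the function name and signature are hypothetical. One point worth checking in review: getline() returns the number of characters read, while its size argument reports the buffer capacity, so this sketch trims the newline using the string length rather than that size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one switch-map line of the assumed form: 0x<guid> "<name>"
 * Returns 0 and fills guid/name on success, -1 for comment, empty or
 * malformed lines. The line buffer is modified in place by strtok().
 */
static int sketch_parse_map_line(char *line, uint64_t *guid,
				 char *name, size_t name_len)
{
	char *guid_str, *name_str;
	size_t n = strlen(line);

	if (name_len == 0)
		return -1;

	/* Trim the trailing newline using the actual string length. */
	if (n > 0 && line[n - 1] == '\n')
		line[n - 1] = '\0';

	if (line[0] == '#' || line[0] == '\0')
		return -1;

	/* Same delimiter set as the patch: double quote and '#'. */
	guid_str = strtok(line, "\"#");
	name_str = strtok(NULL, "\"#");
	if (guid_str == NULL || name_str == NULL)
		return -1;

	/* Base 0 lets strtoull() accept the 0x prefix; it stops at the space. */
	*guid = strtoull(guid_str, NULL, 0);
	strncpy(name, name_str, name_len - 1);
	name[name_len - 1] = '\0';
	return 0;
}
```

A caller would compare the returned GUID with the node GUID and fall back to the node description when no line matches, as lookup_switch_name() does.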

[openib-general] Diags/ibnetdiscover: Add switch-map option to ibnetdiscover

2007-01-31 Thread Hal Rosenstock
Diags/ibnetdiscover: Add switch-map option to ibnetdiscover

Signed-off-by: Ira K. Weiny [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/diags/man/ibnetdiscover.8 b/diags/man/ibnetdiscover.8
index 4960a8f..000edb5 100644
--- a/diags/man/ibnetdiscover.8
+++ b/diags/man/ibnetdiscover.8
@@ -1,11 +1,11 @@
-.TH IBNETDISCOVER 8 October 16, 2006 OpenIB OpenIB Diagnostics
+.TH IBNETDISCOVER 8 January 31, 2007 OpenIB OpenIB Diagnostics
 
 .SH NAME
 ibnetdiscover \- discover InfiniBand topology
 
 .SH SYNOPSIS
 .B ibnetdiscover
-[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-s(how)] [\-l(ist)] [\-g(rouping)] 
[\-H(ca_list)] [\-S(witch_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) 
timeout_ms] [\-V(ersion)] [\-h(elp)] [topology-file]
+[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-s(how)] [\-l(ist)] [\-g(rouping)] 
[\-H(ca_list)] [\-S(witch_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) 
timeout_ms] [\-V(ersion)] [\--switch-map switch-map] [\-h(elp)] 
[topology-file]
 
 .SH DESCRIPTION
 .PP
@@ -34,6 +34,10 @@ List of connected switches
 .TP
 \fB\-s\fR, \fB\-\-show\fR
 Show more information
+.TP
+\fB\-\-switch\-map\fR switch-map
+Specify a switch map.  The switch map file maps GUIDs to more user friendly
+names.  See file format below.
 
 .SH COMMON OPTIONS
 
@@ -89,7 +93,63 @@ by the following criteria:
 If a port and/or CA name is specified, the user request is  
 attempted to be fulfilled, and will fail if it is not possible.
 
+.SH SWITCH MAP FILE FORMAT
+The switch map is used to specify a user friendly name for switches in the
+output.  GUIDs are used to perform the lookup.
+
+.TP
+\fBGenerically:\fR
+
+# comment
+.br
+guid name
+
+.TP
+\fBExample:\fR
+
+# IB1
+.br
+# Line cards
+.br
+0x0008f104003f125c IB1 (Rack 11 slot 1   ) ISR9288/ISR9096 Voltaire sLB-24D
+.br
+0x0008f104003f125d IB1 (Rack 11 slot 1   ) ISR9288/ISR9096 Voltaire sLB-24D
+.br
+0x0008f104003f10d2 IB1 (Rack 11 slot 2   ) ISR9288/ISR9096 Voltaire sLB-24D
+.br
+0x0008f104003f10d3 IB1 (Rack 11 slot 2   ) ISR9288/ISR9096 Voltaire sLB-24D
+.br
+0x0008f104003f10bf IB1 (Rack 11 slot 12  ) ISR9288/ISR9096 Voltaire sLB-24D
+.br
+.br
+# Spines
+.br
+0x0008f10400400e2d IB1 (Rack 11 spine 1   ) ISR9288 Voltaire sFB-12D
+.br
+0x0008f10400400e2e IB1 (Rack 11 spine 1   ) ISR9288 Voltaire sFB-12D
+.br
+0x0008f10400400e2f IB1 (Rack 11 spine 1   ) ISR9288 Voltaire sFB-12D
+.br
+0x0008f10400400e31 IB1 (Rack 11 spine 2   ) ISR9288 Voltaire sFB-12D
+.br
+0x0008f10400400e32 IB1 (Rack 11 spine 2   ) ISR9288 Voltaire sFB-12D
+.br
+.br
+# GUID   Switch Name
+.br
+0x0008f10400411a08 SW1  (Rack  3) ISR9024 Voltaire 9024D
+.br
+0x0008f10400411a28 SW2  (Rack  3) ISR9024 Voltaire 9024D
+.br
+0x0008f10400411a34 SW3  (Rack  3) ISR9024 Voltaire 9024D
+.br
+0x0008f104004119d0 SW4  (Rack  3) ISR9024 Voltaire 9024D
+.br
+
 .SH AUTHOR
 .TP
 Hal Rosenstock
 .RI  [EMAIL PROTECTED] 
+.TP
+Ira Weiny
+.RI  [EMAIL PROTECTED] 
diff --git a/diags/src/ibnetdiscover.c b/diags/src/ibnetdiscover.c
index ec47961..c0ed563 100644
--- a/diags/src/ibnetdiscover.c
+++ b/diags/src/ibnetdiscover.c
@@ -74,6 +74,9 @@ static FILE *f;
 
 static char *argv0 = "ibnetdiscover";
 
+static char *switch_map = NULL;
+static FILE *switch_map_fp = NULL;
+
 Node *nodesdist[MAXHOPS+1]; /* last is Ca list */
 Node *mynode;
 int maxhops_discovered = 0;
@@ -201,6 +204,68 @@ clean_nodedesc(char *nodedesc)
return (nodedesc);
 }
 
+/** =
+ */
+static void
+open_switch_map(void)
+{
+	if (switch_map != NULL) {
+		switch_map_fp = fopen(switch_map, "r");
+		if (switch_map_fp == NULL) {
+			fprintf(stderr,
+				"WARNING failed to open switch map \"%s\" (%s)\n",
+				switch_map, strerror(errno));
+		}
+	}
+}
+
+static void
+close_switch_map(void)
+{
+   if (switch_map_fp)
+   fclose(switch_map_fp);
+}
+
+static char *
+lookup_switch_name(Node *node)
+{
+#define NAME_LEN (256)
+	char     *line = NULL;
+	size_t    len = 0;
+	uint64_t  guid = 0;
+	char     *rc = NULL;
+	int       line_count = 0;
+	uint64_t  target_guid = node->nodeguid;
+
+	if (switch_map_fp == NULL)
+		goto done;
+
+	rewind(switch_map_fp);
+	for (line_count = 1;
+	     getline(&line, &len, switch_map_fp) != -1;
+	     line_count++) {
+		line[len-1] = '\0';
+		if (line[0] == '#') { goto next_one; }
+		char *guid_str = strtok(line, "\"#");
+		char *name = strtok(NULL, "\"#");
+		if (!guid_str || !name) { goto next_one; }
+		guid = strtoull(guid_str, NULL, 0);
+		if (target_guid == guid)
+		{
+			rc = strdup(name);
+			free (line);
+			goto done;
+		}
+next_one:
+

Re: [openib-general] [PATCH 0/10] osm: QoS in OpenSM

2007-01-31 Thread Hal Rosenstock
Hi Yevgeny,

On Tue, 2007-01-30 at 09:51, Yevgeny Kliteynik wrote:
 Hi Hal.
 
 The following is a series of 10 patches:
 1. QoS policy file parser Yacc file
 2. QoS policy file parser Lex file
 3. QoS policy file parser Yacc  Lex generated files
 4. QoS policy file parser header file
 5. QoS policy file parser C file with auxiliary functions
 6. Compilation changes for QoS policy file parser:
Added new files to makefiles.
Introduced new configuration switch '--enable-maintainer-mode',
which will run Lex  Yacc instead of just using the generated
files.
 7. Renamed static function find_prtn_by_name() to non-static 
 osm_prtn_find_by_name()
This function will be used later by the PathRecord logic.
 8. Added QoS class and service id fields to the path record.
 9. Added new command line option for OSM: '-Y' or '--qos_policy_file'
 10.Checking PathRecord query for QoS constraints.

Is everyone on the list satisfied with an XML format or should there be
a text version ? Is anyone concerned about the ease of configuring XML
for QoS ?

IMO, the XML syntax needs to be explained, discussed, and vetted on the
list. I am hoping this can occur reasonably quickly. If we are doing
XML for this, we need to get to a stable agreed syntax.

A couple of missing minor things:
SA ClassPortInfo and SA MultiPathRecord similar to PathRecord

A major missing component is a QoS manager which supports the granular
configuration of the SL2VL and VLArb tables. Based on our experience
with the existing QoS manager, this effort is not to be minimized. If
this is not part of this package, a fair portion of the QoS syntax is
dormant. I know this can be run on top of the existing QoS manager to
get a more complete QoS solution than what already exists so this could
be considered a stepping stone towards that.

-- Hal

 --
 Yevgeny
 
 Signed-off-by:  Yevgeny Kliteynik [EMAIL PROTECTED]
  
 
 





Re: [openib-general] [PATCH 0/10] osm: QoS in OpenSM

2007-01-31 Thread Jason Gunthorpe
On Wed, Jan 31, 2007 at 12:41:43PM -0500, Hal Rosenstock wrote:

 IMO, the XML syntax needs to be explained, discussed, and vetted on the
 list. I am hoping this can occur reasonably quickly. If we are doing
 XML for this, we need to get to a stable agreed syntax.

I didn't see a DTD or schema float by for the XML.. IMHO a DTD is
essential for a complex XML like this.

Jason




Re: [openib-general] [libibverbs] destroying an AH causes a seg fault (this failure appeared during the last night)

2007-01-31 Thread Roland Dreier
ugh -- OK, see my reply in the thread with mst's diagnosis...




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Michael S. Tsirkin
 Quoting Steve Wise [EMAIL PROTECTED]:
 Subject: Re: [PATCH] The ibv_cmd_* create functions need to set the context.
 
 On Wed, 2007-01-31 at 12:24 +0200, Michael S. Tsirkin wrote:
   Quoting Roland Dreier [EMAIL PROTECTED]:
   Subject: Re: [PATCH] The ibv_cmd_* create functions need to set the 
   context.
   
   Thanks, applied to master and stable branches.
  
  Did you test it?
  This patch (8b3d225476c99ea29a68109a7d40e5ef353d4388) causes ibv_ud_pingpong
  to segfault on libmthca: libmthca never calls ibv_cmd_create_ah, so context
  is now never set.
  
  
 
 I didn't test UD.  

Well, when you touch the AH functions, UD is really the only way to test them.

 
  Starting program: /usr/local/ofed/bin/ibv_ud_pingpong sw069
  [Thread debugging using libthread_db enabled]
  [New Thread 47299578320592 (LWP 5085)]
local address:  LID 0x0002, QPN 0x090406, PSN 0x71bffb
remote address: LID 0x0001, QPN 0x040406, PSN 0x92316a
  4096000 bytes in 0.02 seconds = 1893.99 Mbit/sec
  1000 iters in 0.02 seconds = 17.30 usec/iter
  
  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 47299578320592 (LWP 5085)]
  0x2b04ca3b7263 in __ibv_destroy_ah (ah=0x5050b0) at src/verbs.c:475
  475 return ah->context->ops.destroy_ah(ah);
  (gdb) p ah->context
  $1 = (struct ibv_context *) 0x0
  
  I actually think this approach is a wrong one: context should be
  set in common code like ibv_create_ah, not in ibv_cmd_ which is
  a library function low level driver might or might not call.
  And certainly this kind of change does not seem appropriate for stable 
  branch.
  
  I think the proper thing is for low level driver not to assume that
  fields such as context are initialized until create functions have returned.
  Steve, pls fix your low level driver not to rely on this.
  
 
 The issue is that the provider lib calls ibv_cmd_create_blah to create
 the object, then some failure happens (like a failure mmap()ing the
 object's DMA area to the process).  At this point the provider lib must
 destroy this object that is created from the perspective of the ibv_cmd*
 interface.  The only way to do that is to call the ibv_cmd_destroy_blah
 call, which needs the context field.

For stable, in case of error, set the context in the provider lib then?

 So I don't think solving this in the provider lib is the right thing to
 do.

At least for stable branch, this seems more sensible than the disruptive
patch that was applied. Roland, what do you think?

For master, maybe ibv_cmd destructors should get the context as a parameter?

  Roland, I have reverted this in OFED, please revert on master and stable.
  
 
 I think we should fix the bug introduced:  set the context field in the
 ibv_create_blah service if it's not set after calling the provider
 method.

This is ugly as well, but at least it would work.

-- 
MST




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Roland Dreier
  I actually think this approach is a wrong one: context should be
  set in common code like ibv_create_ah, not in ibv_cmd_ which is
  a library function low level driver might or might not call.
  And certainly this kind of change does not seem appropriate for stable 
  branch.
  
  I think the proper thing is for low level driver not to assume that
  fields such as context are initialized until create functions have returned.
  Steve, pls fix your low level driver not to rely on this.

Hmm, there's not really any good solution to this.  Really the problem
is that the ibv_cmd_destroy_xxx functions assume the context is set in
the object they are destroying.  But I don't want to change the
signature of those functions at this point in the release cycle.

It's not really very pleasing for low-level drivers to have to know
about the internal assumptions of ibv_cmd_destroy_xxx either.

I think what I'll do is the following:
 - add the assignments to context back into ibv_create_ah() and so
   on.  context will get assigned in two places but oh well.
 - early in the libibverbs 1.2 cycle, change the signature of
   ibv_cmd_destroy_xxx so that low-level drivers need to explicitly
   pass in the context to use.

 - R.
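
The first step of the plan above can be pictured with the following self-contained sketch. It uses hypothetical miniature structures (sketch_context, sketch_ah), not the libibverbs definitions, and a made-up provider that never assigns the context, which is the situation reported for libmthca in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the verbs objects; not the libibverbs definitions. */
struct sketch_ah;

struct sketch_context {
	struct sketch_ah *(*create_ah)(struct sketch_context *ctx);
	int (*destroy_ah)(struct sketch_ah *ah);
};

struct sketch_ah {
	struct sketch_context *context;
};

/* Provider hook that never sets ah->context on its own. */
static struct sketch_ah *provider_create_ah(struct sketch_context *ctx)
{
	(void)ctx;
	return calloc(1, sizeof(struct sketch_ah));
}

static int provider_destroy_ah(struct sketch_ah *ah)
{
	free(ah);
	return 0;
}

/* Common-code wrapper: assign the context here so destroy can rely on it. */
static struct sketch_ah *sketch_create_ah(struct sketch_context *ctx)
{
	struct sketch_ah *ah = ctx->create_ah(ctx);

	if (ah != NULL)
		ah->context = ctx;
	return ah;
}

/* Mirrors the dispatch through ah->context that crashed in the gdb trace. */
static int sketch_destroy_ah(struct sketch_ah *ah)
{
	return ah->context->destroy_ah(ah);
}
```

With the assignment in the wrapper, the destroy path works whether or not the provider goes through an ibv_cmd_* style helper. The second step (passing the context explicitly to the destroy commands) would remove the dependency on that field altogether.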




Re: [openib-general] Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop functioning

2007-01-31 Thread Michael S. Tsirkin
 Quoting Hal Rosenstock [EMAIL PROTECTED]:
 Subject: Re: Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop 
 functioning
 
 Hi Yevgeny,
 
 On Wed, 2007-01-31 at 05:16, Yevgeny Kliteynik wrote:
  Hi Hal.
  
  I noticed the following bug in Bugzilla:
  
  Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
https://bugs.openfabrics.org/show_bug.cgi?id=329
  
  When an HCA fatal event occurs on the host where opensm is running,
  opensm stops functioning (after the event, the driver restarts the
  device, and the port does not return to the active state).
  
  If opensm is running in sweep mode, you can see after the event that
  it stops sweeping.
  
  I remember that a couple of months ago I sent a patch that takes care of 
  this problem:
   - in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
   - in case of IBV_EVENT_PORT_ERROR, osm initiated heavy sweep
  
  The problem with my patch was that it made osm depend on the uverbs module.
  To resolve this problem, support should be added in umad, and then osm could
  use this support.
  
  Do you know if some work in this area was done in umad?
 
 This has been on the list but unfortunately there has been no time yet
 to work on the local events support in libibumad.

I do not think making libibmad depend on ib_uverbs module is a good idea either.
More properly, the problem is in ib_umad which does not report hotplug events.
If we just make ib_umad return an error code to user on hotplug,
the problem will go away without userspace changes.

-- 
MST
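
The event policy quoted above (exit on a fatal device event, heavy sweep on a port error) amounts to a small dispatch table. The sketch below uses hypothetical local enums so that it stands alone; it is not taken from OpenSM or from the earlier patch, and it says nothing about how the events would be delivered (uverbs, umad, or an error return from ib_umad).

```c
/* Hypothetical event and action codes, defined locally for illustration. */
enum sketch_event {
	SKETCH_EVENT_DEVICE_FATAL,
	SKETCH_EVENT_PORT_ERROR,
	SKETCH_EVENT_OTHER
};

enum sketch_action {
	SKETCH_ACTION_EXIT,
	SKETCH_ACTION_HEAVY_SWEEP,
	SKETCH_ACTION_NONE
};

/* Map a local device event to the reaction described in the quoted patch. */
static enum sketch_action sketch_handle_event(enum sketch_event ev)
{
	switch (ev) {
	case SKETCH_EVENT_DEVICE_FATAL:
		return SKETCH_ACTION_EXIT;		/* SM cannot continue on a dead HCA */
	case SKETCH_EVENT_PORT_ERROR:
		return SKETCH_ACTION_HEAVY_SWEEP;	/* rediscover the fabric */
	default:
		return SKETCH_ACTION_NONE;
	}
}
```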




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH] The ibv_cmd_* create functions need to set the context.
 
   I actually think this approach is a wrong one: context should be
   set in common code like ibv_create_ah, not in ibv_cmd_ which is
   a library function low level driver might or might not call.
   And certainly this kind of change does not seem appropriate for stable 
 branch.
   
   I think the proper thing is for low level driver not to assume that
   fields such as context are initialized until create functions have returned.
   Steve, pls fix your low level driver not to rely on this.
 
 Hmm, there's not really any good solution to this.  Really the problem
 is that the ibv_cmd_destroy_xxx functions assume the context is set in
 the object they are destroying.  But I don't want to change the
 signature of those functions at this point in the release cycle.
 
 It's not really very pleasing for low-level drivers to have to know
 about the internal assumptions of ibv_cmd_destroy_xxx either.
 
 I think what I'll do is the following:
  - add the assignments to context back into ibv_create_ah() and so
on.  context will get assigned in two places but oh well.
  - early in the libibverbs 1.2 cycle, change the signature of
ibv_cmd_destroy_xxx so that low-level drivers need to explicitly
pass in the context to use.

This might work.
However, I wonder about stable branch - is it wise for a provider
to depend on a specific libibverbs 1.0.x version?
Surely just working around this by setting up context field
before destroy cmd makes more sense?

And if the providers implement the work-around anyway,
should we implement hacks to work-around this in libibverbs as well?

What I am trying to propose is delaying the whole change till 1.2,
and doing the work-around in provider lib for now.

-- 
MST




Re: [openib-general] [PATCH] The ibv_cmd_* create functions need to set the context.

2007-01-31 Thread Roland Dreier
  However, I wonder about stable branch - is it wise for a provider
  to depend on a specific libibverbs 1.0.x version?
  Surely just working around this by setting up context field
  before destroy cmd makes more sense?

I think I'll just revert the change from libibverbs 1.0.x.  libcxgb3
(the impetus for this change) will never work with libibverbs 1.0 anyway.




Re: [openib-general] [PATCH RFC 4 of 5] IB/mthca: QoS support

2007-01-31 Thread Hal Rosenstock
On Mon, 2007-01-22 at 09:50, Michael S. Tsirkin wrote:
 encode SL in sched_queue field to improve hardware QoS guarantees
 for connected QPs.

Is UD already handled properly in terms of mthca ?

-- Hal

 Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
 
 ---
 
 Index: linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
 ===
 --- linux-2.6.orig/drivers/infiniband/hw/mthca/mthca_qp.c
 +++ linux-2.6/drivers/infiniband/hw/mthca/mthca_qp.c
 @@ -49,6 +49,10 @@
  #include "mthca_memfree.h"
  #include "mthca_wqe.h"
  
 +static int mthca_qos_support = 0;
 +module_param_named(qos_support, mthca_qos_support, int, 0644);
 +MODULE_PARM_DESC(qos_support, "Enable QoS support if > 0");
 +
  enum {
   MTHCA_MAX_DIRECT_QP_SIZE = 4 * PAGE_SIZE,
   MTHCA_ACK_REQ_FREQ   = 10,
 @@ -694,6 +698,19 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
   goto out_mailbox;
  
   qp_param-opt_param_mask |= 
 cpu_to_be32(MTHCA_QP_OPTPAR_PRIMARY_ADDR_PATH);
 + if (mthca_qos_support) {
 + u8 sl = attr-ah_attr.sl;
 + u8 sched_queue = (sl  0x8) | (sl  (~(sl  1))  0x4) 
 |
 + ((sl  1)  (sl  2)  0x2) | ((sl  1)  
 0x1);
 +
 + if (mthca_is_memfree(dev)) {
 + qp_context-rlkey_arbel_sched_queue |= 
 sched_queue;
 + } else {
 + qp_context-tavor_sched_queue |= sched_queue;
 + }
 + qp_param-opt_param_mask |=
 + cpu_to_be32(MTHCA_QP_OPTPAR_SCHED_QUEUE);
 + }
   }
  
   if (attr_mask  IB_QP_TIMEOUT) {
 





Re: [openib-general] [PATCH RFC 4 of 5] IB/mthca: QoS support

2007-01-31 Thread Michael S. Tsirkin
 Quoting Hal Rosenstock [EMAIL PROTECTED]:
 Subject: Re: [PATCH RFC 4 of 5] IB/mthca: QoS support
 
 On Mon, 2007-01-22 at 09:50, Michael S. Tsirkin wrote:
  encode SL in sched_queue field to improve hardware QoS guarantees
  for connected QPs.
 
 Is UD already handled properly in terms of mthca ?

It's not the question of proper handling - this patch is an enhancement,
not really a bug fix. I think mthca already does the best it can with UD AVs.

-- 
MST




Re: [openib-general] [PATCH RFC 4 of 5] IB/mthca: QoS support

2007-01-31 Thread Hal Rosenstock
On Wed, 2007-01-31 at 14:09, Michael S. Tsirkin wrote:
  Quoting Hal Rosenstock [EMAIL PROTECTED]:
  Subject: Re: [PATCH RFC 4 of 5] IB/mthca: QoS support
  
  On Mon, 2007-01-22 at 09:50, Michael S. Tsirkin wrote:
   encode SL in sched_queue field to improve hardware QoS guarantees
   for connected QPs.
  
  Is UD already handled properly in terms of mthca ?
 
 It's not the question of proper handling - this patch is an enhancement,
 not really a bug fix. I think mthca already does the best it can with UD AVs.

So there are no scheduling parameters or anything else that needs
tweaking in mthca in terms of the SL for UD AVs ? Just want to be sure.

-- Hal





Re: [openib-general] [PATCH RFC 4 of 5] IB/mthca: QoS support

2007-01-31 Thread Michael S. Tsirkin
> Quoting Hal Rosenstock [EMAIL PROTECTED]:
> Subject: Re: [PATCH RFC 4 of 5] IB/mthca: QoS support
> 
> On Wed, 2007-01-31 at 14:09, Michael S. Tsirkin wrote:
> > > Quoting Hal Rosenstock [EMAIL PROTECTED]:
> > > Subject: Re: [PATCH RFC 4 of 5] IB/mthca: QoS support
> > > 
> > > On Mon, 2007-01-22 at 09:50, Michael S. Tsirkin wrote:
> > > > encode SL in sched_queue field to improve hardware QoS guarantees
> > > > for connected QPs.
> > > 
> > > Is UD already handled properly in terms of mthca ?
> > 
> > It's not the question of proper handling - this patch is an enhancement,
> > not really a bug fix. I think mthca already does the best it can with UD
> > AVs.
> 
> So there are no scheduling parameters or anything else that needs
> tweaking in mthca in terms of the SL for UD AVs ? Just want to be sure.

Not that I know.

-- 
MST




[openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Sean Hefty
Here's a first attempt at a patch to allow the latest librdmacm to work
with kernel ABI version 3 without crashing the kernel.  If you're trying
to use a developmental kernel that has ABI 4, you'll have to update the
kernel cma.

Note that I didn't actually run this against an older kernel (I need to
reload that on my system), but did test this fix by forcing the abi to
version 3 with a newer kernel loaded.

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
diff --git a/src/cma.c b/src/cma.c
index 2d2a587..c5f8cd9 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -653,11 +653,49 @@ static int ucma_modify_qp_err(struct rdma_cm_id *id)
 	return ibv_modify_qp(id->qp, &qp_attr, IBV_QP_STATE);
 }
 
+static int ucma_find_pkey(struct cma_device *cma_dev, uint8_t port_num,
+			  uint16_t pkey, uint16_t *pkey_index)
+{
+	int ret, i;
+	uint16_t chk_pkey;
+
+	for (i = 0, ret = 0; !ret; i++) {
+		ret = ibv_query_pkey(cma_dev->verbs, port_num, i, &chk_pkey);
+		if (!ret && pkey == chk_pkey) {
+			*pkey_index = (uint16_t) i;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+static int ucma_init_conn_qp3(struct cma_id_private *id_priv, struct ibv_qp *qp)
+{
+	struct ibv_qp_attr qp_attr;
+	int ret;
+
+	ret = ucma_find_pkey(id_priv->cma_dev, id_priv->id.port_num,
+			     id_priv->id.route.addr.addr.ibaddr.pkey,
+			     &qp_attr.pkey_index);
+	if (ret)
+		return ret;
+
+	qp_attr.port_num = id_priv->id.port_num;
+	qp_attr.qp_state = IBV_QPS_INIT;
+	qp_attr.qp_access_flags = 0;
+
+	return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_ACCESS_FLAGS |
+					   IBV_QP_PKEY_INDEX | IBV_QP_PORT);
+}
+
 static int ucma_init_conn_qp(struct cma_id_private *id_priv, struct ibv_qp *qp)
 {
 	struct ibv_qp_attr qp_attr;
 	int qp_attr_mask, ret;
 
+	if (abi_ver == 3)
+		return ucma_init_conn_qp3(id_priv, qp);
+
 	qp_attr.qp_state = IBV_QPS_INIT;
 	ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask);
 	if (ret)
@@ -666,11 +704,44 @@ static int ucma_init_conn_qp(struct cma_id_private *id_priv, struct ibv_qp *qp)
 	return ibv_modify_qp(qp, &qp_attr, qp_attr_mask);
 }
 
+static int ucma_init_ud_qp3(struct cma_id_private *id_priv, struct ibv_qp *qp)
+{
+	struct ibv_qp_attr qp_attr;
+	int ret;
+
+	ret = ucma_find_pkey(id_priv->cma_dev, id_priv->id.port_num,
+			     id_priv->id.route.addr.addr.ibaddr.pkey,
+			     &qp_attr.pkey_index);
+	if (ret)
+		return ret;
+
+	qp_attr.port_num = id_priv->id.port_num;
+	qp_attr.qp_state = IBV_QPS_INIT;
+	qp_attr.qkey = RDMA_UDP_QKEY;
+
+	ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_QKEY |
+					  IBV_QP_PKEY_INDEX | IBV_QP_PORT);
+	if (ret)
+		return ret;
+
+	qp_attr.qp_state = IBV_QPS_RTR;
+	ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE);
+	if (ret)
+		return ret;
+
+	qp_attr.qp_state = IBV_QPS_RTS;
+	qp_attr.sq_psn = 0;
+	return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_SQ_PSN);
+}
+
 static int ucma_init_ud_qp(struct cma_id_private *id_priv, struct ibv_qp *qp)
 {
 	struct ibv_qp_attr qp_attr;
 	int qp_attr_mask, ret;
 
+	if (abi_ver == 3)
+		return ucma_init_ud_qp3(id_priv, qp);
+
 	qp_attr.qp_state = IBV_QPS_INIT;
 	ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask);
 	if (ret)





[openib-general] ip_ib_mc_map?

2007-01-31 Thread Steve Wise
where can I find this symbol?  I can't load rdma_cm on rhel4u4...

rdma_cm: Unknown symbol ip_ib_mc_map






Re: [openib-general] ip_ib_mc_map?

2007-01-31 Thread Steve Wise
Perhaps there's no backport for this to rhel4u4?


On Wed, 2007-01-31 at 12:32 -0800, Sean Hefty wrote:
> > where can I find this symbol?  I can't load rdma_cm on rhel4u4...
> > 
> > rdma_cm: Unknown symbol ip_ib_mc_map
> 
> This is in include/net/ip.h for current systems.  It is part of ipoib support.
> 
> - Sean





Re: [openib-general] ip_ib_mc_map?

2007-01-31 Thread Sean Hefty
> where can I find this symbol?  I can't load rdma_cm on rhel4u4...
> 
> rdma_cm: Unknown symbol ip_ib_mc_map

This is in include/net/ip.h for current systems.  It is part of ipoib support.

- Sean




Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Steve Wise
Should this be a problem for OFED 1.2?  I would think the ABI for all
backports should be the same, so it wouldn't be a problem.  Is this
true?  I'm assuming all backported UCMA modules would have the same
ABI.  



On Wed, 2007-01-31 at 11:19 -0800, Sean Hefty wrote:
> Here's a first attempt at a patch to allow the latest librdmacm to work
> with kernel ABI version 3 without crashing the kernel.  If you're trying
> to use a developmental kernel that has ABI 4, you'll have to update the
> kernel cma.
> 
> Note that I didn't actually run this against an older kernel (I need to
> reload that on my system), but did test this fix by forcing the abi to
> version 3 with a newer kernel loaded.
 





Re: [openib-general] [PATCH 10/10] osm: QoS in OpenSM

2007-01-31 Thread Sasha Khapyorsky
On 17:33 Tue 30 Jan , Yevgeny Kliteynik wrote:
> Checking PathRecord query for QoS constraints
> 
> The QoS-aware path selection logic is implemented in a
> separate function that is called only when QoS in OpenSM
> is on. It causes some code duplication, but the idea is
> to minimize the changes in the existing logic in OSM.
> Eventually, these two function (the old path selection
> and the new QoS-aware path selection) will be merged
> into a single function.

Please merge __osm_pr_rcv_get_path_parms() and
__osm_pr_rcv_get_path_parms_qos() into a single function - as you
stated, most of the code is duplicated there.

In fact __osm_pr_rcv_get_path_parms() is the most frequently changed
function in the SA PR processor, and it is not a good idea to have to make
every change twice. IMHO the duplication creates more ground for future
bugs than merging risks for the existing functionality.

This will also make your patch much more review friendly.

Thanks,
Sasha

 
> Signed-off-by: Yevgeny Kliteynik [EMAIL PROTECTED]
> ---
>  osm/opensm/osm_sa_path_record.c |  822 ++-
>  1 files changed, 816 insertions(+), 6 deletions(-)
> 
> diff --git a/osm/opensm/osm_sa_path_record.c b/osm/opensm/osm_sa_path_record.c
> index a0dbb07..2ff7a42 100644
> --- a/osm/opensm/osm_sa_path_record.c
> +++ b/osm/opensm/osm_sa_path_record.c
> @@ -70,6 +70,7 @@
>  #include <opensm/osm_router.h>
>  #include <opensm/osm_sa_mcmember_record.h>
>  #endif
> +#include <opensm/osm_qos_parser.h>
>  
>  #define OSM_PR_RCV_POOL_MIN_SIZE    64
>  #define OSM_PR_RCV_POOL_GROW_SIZE   64
> @@ -87,6 +88,7 @@ typedef struct _osm_path_parms
>     uint8_t    rate;
>     uint8_t    sl;
>     uint8_t    pkt_life;
> +   uint16_t   class;
>     boolean_t  reversible;
>  } osm_path_parms_t;
>  
> @@ -716,6 +718,799 @@ __osm_pr_rcv_get_path_parms(
>  
>  /**
>   **/
> +
> +static ib_api_status_t
> +__osm_pr_rcv_get_path_parms_qos(
> +  IN osm_pr_rcv_t*         const p_rcv,
> +  IN const ib_path_rec_t*  const p_pr,
> +  IN const osm_port_t*     const p_src_port,
> +  IN const osm_port_t*     const p_dest_port,
> +  IN const uint16_t        dest_lid_ho,
> +  IN const ib_net64_t      comp_mask,
> +  OUT osm_path_parms_t*    const p_parms )
> +{
> +   const osm_node_t*        p_node;
> +   const osm_physp_t*       p_physp;
> +   const osm_physp_t*       p_src_physp;
> +   const osm_physp_t*       p_dest_physp;
> +   const osm_prtn_t*        p_prtn;
> +   const ib_port_info_t*    p_pi;
> +   ib_api_status_t          status = IB_SUCCESS;
> +   ib_net16_t               pkey = 0;
> +   ib_net16_t               shared_pkey = 0;
> +   uint8_t                  mtu = 0;
> +   uint8_t                  rate = 0;
> +   uint8_t                  pkt_life = 0;
> +   uint8_t                  sl = 0;
> +   uint16_t                 class = 0;
> +   uint8_t                  required_mtu;
> +   uint8_t                  required_rate;
> +   uint8_t                  required_pkt_life;
> +   uint8_t                  in_port_num;
> +   uint8_t                  out_port_num;
> +   ib_net16_t               dest_lid;
> +   uint8_t                  i;
> +   uint8_t                  vl;
> +   ib_slvl_table_t         *p_slvl_tbl = NULL;
> +   boolean_t                valid_sls[IB_MAX_NUM_VLS];
> +   boolean_t                sl2vl_valid_path = FALSE;
> +   uint8_t                  first_valid_sl;
> +   osm_qos_level_t         *p_qos_level = NULL;
> +
> +   OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_get_path_parms_qos );
> +
> +   memset(valid_sls,TRUE,sizeof(valid_sls));
> +   dest_lid = cl_hton16( dest_lid_ho );
> +
> +   p_dest_physp = osm_port_get_default_phys_ptr( p_dest_port );
> +   p_physp = osm_port_get_default_phys_ptr( p_src_port );
> +   p_src_physp = p_physp;
> +   p_pi = &p_physp->port_info;
> +
> +   mtu = ib_port_info_get_mtu_cap( p_pi );
> +   rate = ib_port_info_compute_rate( p_pi );
> +
> +   /*
> +    * Mellanox Tavor device performance is better using 1K MTU.
> +    * If required MTU and MTU selector are such that 1K is OK
> +    * and at least one end of the path is Tavor we override the
> +    * port MTU with 1K.
> +    */
> +   if ( p_rcv->p_subn->opt.enable_quirks &&
> +        __osm_sa_path_rec_apply_tavor_mtu_limit(
> +           p_pr, p_src_port, p_dest_port, comp_mask) )
> +   {
> +      if (mtu > IB_MTU_LEN_1024)
> +      {
> +         mtu = IB_MTU_LEN_1024;
> +         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> +                  "__osm_pr_rcv_get_path_parms_qos: "
> +                  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> +      }
> +   }
> +
> +   /*
> +    * Walk the subnet object from source to destination,
> +    * tracking the most restrictive rate and mtu values along the way...
> +    *
> +    * If source port node is a switch, then p_physp should
> +    * point to the port that routes the destination lid
> +    */
> +
> +   p_node = osm_physp_get_node_ptr( p_physp );
> +
> +   if( p_node->sw )
> +

Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Sean Hefty
Steve Wise wrote:
> Should this be a problem for OFED 1.2?  I would think the ABI for all
> backports should be the same, so it wouldn't be a problem.  Is this
> true?  I'm assuming all backported UCMA modules would have the same
> ABI.

This is a problem for anyone that tries to use a newer version of the librdmacm
(like an OFED 1.2 version) with an older kernel (e.g. 2.6.20).  As you pointed 
out, the issue is that the kernel rdma_cm crashes if rdma_init_qp_attr() is 
called before the user calls rdma_connect().  The problem affects both IB and 
iWarp.  The latest changes to the librdmacm exposed this bug, but the latest 
kernel multicast code also fixed it.

As far as I know, only ABI 3 has been released anywhere.  ABI 4 is only 
available from my git tree.  This problem will occur on any code based on ABI 3 
or older code snapshots of ABI 4.
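
The version gating that the posted patch relies on can be sketched as below. This is a minimal illustration rather than librdmacm code: the enum and function names are hypothetical, and the only rule taken from the thread is that ABI 3 must not ask the kernel for QP attributes before rdma_connect(), while ABI 4 can.

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of librdmacm. */
enum qp_init_path {
	QP_INIT_IN_LIBRARY,	/* ABI 3: the library fills in the INIT attributes itself */
	QP_INIT_VIA_KERNEL	/* ABI 4: attributes come from rdma_init_qp_attr() */
};

/* Pick the QP initialization path from the kernel ABI version in use. */
enum qp_init_path choose_qp_init_path(int abi_ver)
{
	assert(abi_ver > 0);
	return abi_ver == 3 ? QP_INIT_IN_LIBRARY : QP_INIT_VIA_KERNEL;
}

/* Short label for logging which path was taken. */
const char *qp_init_path_name(enum qp_init_path path)
{
	return path == QP_INIT_IN_LIBRARY ? "library" : "kernel";
}
```

With this shape, choose_qp_init_path(3) selects the library-side path and any newer ABI selects the kernel-side one, which mirrors the `abi_ver == 3` checks in the patch.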

Hopefully this makes sense.

- Sean




Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Steve Wise
On Wed, 2007-01-31 at 13:55 -0800, Sean Hefty wrote:
> Steve Wise wrote:
> > Should this be a problem for OFED 1.2?  I would think the ABI for all
> > backports should be the same, so it wouldn't be a problem.  Is this
> > true?  I'm assuming all backported UCMA modules would have the same
> > ABI.
> 
> This is a problem for anyone that tries to use a newer version of the
> librdmacm (like an OFED 1.2 version) with an older kernel (e.g. 2.6.20).
> As you pointed out, the issue is that the kernel rdma_cm crashes if
> rdma_init_qp_attr() is called before the user calls rdma_connect().  The
> problem affects both IB and iWarp.  The latest changes to the librdmacm
> exposed this bug, but the latest kernel multicast code also fixed it.
 

Fixed it for IB maybe, but not for iWarp, right?

> As far as I know, only ABI 3 has been released anywhere.  ABI 4 is only
> available from my git tree.  This problem will occur on any code based on
> ABI 3 or older code snapshots of ABI 4.
> 
> Hopefully this makes sense.

So OFED 1.2 will be ABI 3, right?

Sorry if I'm being dense...





Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Sean Hefty
> Fixed it for IB maybe, but not for iWarp, right?

It should be fixed for both.

> So OFED 1.2 will be ABI 3, right?

OFED will be ABI 4, since it will include multicast support (which is what 
causes the ABI to bump from 3 to 4).

- Sean




[openib-general] new IB CM reject reason

2007-01-31 Thread Sean Hefty
We've run into an issue with the IB CM reject reason codes.  When a remote
application crashes during connection establishment, the connection will be
rejected by the kernel CM.  Unfortunately, there's not a decent reject reason
that maps to this event.  Currently, the ib_cm issues the reject as consumer
defined (code 28).

I'd like to propose adding reject reason 0, which would mean other/unknown/or
none given.  This is a deviation from the spec, but does anyone know of any
issues with such an approach?
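
A minimal sketch of what the proposal changes on the wire, assuming a reduced REJ message with only a reason field. The value 28 (consumer defined) comes from the description above; the type names, the function names, and the name given to code 0 are hypothetical and are not ib_cm definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the numeric values come from the thread. */
enum sketch_rej_reason {
	SKETCH_REJ_UNSPECIFIED = 0,	/* proposed: other, unknown, or none given */
	SKETCH_REJ_CONSUMER_DEFINED = 28	/* what ib_cm issues today */
};

/* Reduced stand-in for a REJ message. */
struct sketch_rej {
	uint16_t reason;
};

/* Fill in the REJ the kernel CM sends when the application dies mid-handshake. */
void fill_rej_for_dead_consumer(struct sketch_rej *rej, int use_proposed_code)
{
	assert(rej != NULL);
	rej->reason = use_proposed_code ? SKETCH_REJ_UNSPECIFIED
					: SKETCH_REJ_CONSUMER_DEFINED;
}

/* Receiver side: does the reason claim that the remote application rejected? */
int rej_claims_application_reject(const struct sketch_rej *rej)
{
	assert(rej != NULL);
	return rej->reason == SKETCH_REJ_CONSUMER_DEFINED;
}
```

With code 28 the receiver cannot tell a crashed peer from a deliberate application-level reject; with code 0 the two cases become distinguishable.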

- Sean




Re: [openib-general] [PATCH 10/10] osm: QoS in OpenSM

2007-01-31 Thread Yevgeny Kliteynik
Hi Hal,

Hal Rosenstock wrote:
> Hi Yevgeny,
> 
> On Tue, 2007-01-30 at 10:33, Yevgeny Kliteynik wrote:
> > Checking PathRecord query for QoS constraints
> > 
> > The QoS-aware path selection logic is implemented in a
> > separate function that is called only when QoS in OpenSM
> > is on. It causes some code duplication, but the idea is
> > to minimize the changes in the existing logic in OSM.
> > Eventually, these two function (the old path selection
> > and the new QoS-aware path selection) will be merged
> > into a single function.
> 
> Yes, this would be nice to do in the future as there is much overlap.
> Whether qos is carried in the request could be handled internal to this
> combined routine rather than outside to determine which routine to call.
> This will make for a lot less code.

Sure, that's the plan.
The current implementation looks the way it does only to separate the new
code completely from the usual flow, so that the old functionality won't
be broken for sure.
 
> Some comments embedded below.
> 
> > Signed-off-by: Yevgeny Kliteynik [EMAIL PROTECTED]
> > ---
> >  osm/opensm/osm_sa_path_record.c |  822 ++-
> >  1 files changed, 816 insertions(+), 6 deletions(-)
> > 
> > diff --git a/osm/opensm/osm_sa_path_record.c b/osm/opensm/osm_sa_path_record.c
> > index a0dbb07..2ff7a42 100644
> > --- a/osm/opensm/osm_sa_path_record.c
> > +++ b/osm/opensm/osm_sa_path_record.c
> > @@ -70,6 +70,7 @@
> >  #include <opensm/osm_router.h>
> >  #include <opensm/osm_sa_mcmember_record.h>
> >  #endif
> > +#include <opensm/osm_qos_parser.h>
> >  
> >  #define OSM_PR_RCV_POOL_MIN_SIZE    64
> >  #define OSM_PR_RCV_POOL_GROW_SIZE   64
> > @@ -87,6 +88,7 @@ typedef struct _osm_path_parms
> >     uint8_t    rate;
> >     uint8_t    sl;
> >     uint8_t    pkt_life;
> > +   uint16_t   class;
> >     boolean_t  reversible;
> >  } osm_path_parms_t;
> >  
> > @@ -716,6 +718,799 @@ __osm_pr_rcv_get_path_parms(
> >  
> >  /**
> >   **/
> > +
> > +static ib_api_status_t
> > +__osm_pr_rcv_get_path_parms_qos(
> 
> This is the similar function to the non QoS one:
> __osm_pr_rcv_get_path_parms

Yes, the function with QoS has everything the function 
w/o QoS has, plus QoS constraints.
Eventually, the function w/o QoS should be removed,
and the function with QoS should ignore QoS constraints
if QoS in osm is off.
 

Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Steve Wise
On Wed, 2007-01-31 at 14:04 -0800, Sean Hefty wrote:
> > Fixed it for IB maybe, but not for iWarp, right?
> 
> It should be fixed for both.
 

Ok. 

But there still exists an iwarp issue that I need to fix because
librdmacm (the one shipped in OFED) now calls the kernel
rdma_init_qp_attr() function via ucma before the library calls kernel
rdma_connect() via ucma...

 






Re: [openib-general] [PATCH 0/10] osm: QoS in OpenSM

2007-01-31 Thread Yevgeny Kliteynik
Hi Hal,

Hal Rosenstock wrote:
> Hi Yevgeny,
> 
> On Tue, 2007-01-30 at 09:51, Yevgeny Kliteynik wrote:
> > Hi Hal.
> > 
> > The following is a series of 10 patches:
> > 1. QoS policy file parser Yacc file
> > 2. QoS policy file parser Lex file
> > 3. QoS policy file parser Yacc & Lex generated files
> > 4. QoS policy file parser header file
> > 5. QoS policy file parser C file with auxiliary functions
> > 6. Compilation changes for QoS policy file parser:
> >    Added new files to makefiles.
> >    Introduced new configuration switch '--enable-maintainer-mode',
> >    which will run Lex & Yacc instead of just using the generated
> >    files.
> > 7. Renamed static function find_prtn_by_name() to non-static
> >    osm_prtn_find_by_name()
> >    This function will be used later by the PathRecord logic.
> > 8. Added QoS class and service id fields to the path record.
> > 9. Added new command line option for OSM: '-Y' or '--qos_policy_file'
> > 10. Checking PathRecord query for QoS constraints.
> 
> Is everyone on the list satisfied with an XML format or should there be
> a text version ? Is anyone concerned about the ease of configuring XML
> for QoS ?
> 
> IMO, the XML syntax needs to be explained, discussed, and vetted on the
> list. I am hoping this can occur reasonably quickly. If we are doing
> XML for this, we need to get to a stable agreed syntax.
> 
> A couple of missing minor things:
> SA ClassPortInfo and SA MultiPathRecord similar to PathRecord
> 
> A major missing component is a QoS manager which supports the granular
> configuration of the SL2VL and VLArb tables. Based on our experience
> with the existing QoS manager, this effort is not to be minimized. If
> this is not part of this package, a fair portion of the QoS syntax is
> dormant. I know this can be run on top of the existing QoS manager to
> get a more complete QoS solution than what already exists so this could
> be considered a stepping stone towards that.
 

I already started working on multipath, and the next item on my list is the
QoS manager (or QoS setup), but I seriously doubt that I will manage to
finish it by the feature freeze, which is today :)
Anyway, the QoS policy file parser (whatever the format is) and the PathRecord
changes are definitely a stepping stone towards full QoS support in OpenSM.

-- Yevgeny

> -- Hal
> 
> > --
> > Yevgeny
> > 
> > Signed-off-by:  Yevgeny Kliteynik [EMAIL PROTECTED]
  


 




Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Sean Hefty
> But there still exists an iwarp issue that I need to fix because
> librdmacm (the one shipped in OFED) now calls the kernel
> rdma_init_qp_attr() function via ucma before the library calls kernel
> rdma_connect() via ucma...

Can you clarify which versions of the librdmacm and kernel you are using?

The librdmacm shipped with OFED 1.1 shouldn't hit this issue.  And neither
should the upcoming OFED 1.2 version of the librdmacm (with the previously
posted patch applied), when paired with either the OFED 1.2 kernel code, what
was requested to go into 2.6.21, or older kernels.

I just think that this problem is only exposed by developmental librdmacm code
paired with older developmental rdma_cm multicast code.

- Sean




Re: [openib-general] [PATCH 10/10] osm: QoS in OpenSM

2007-01-31 Thread Yevgeny Kliteynik
Hi Sasha,

Sasha Khapyorsky wrote:
> On 17:33 Tue 30 Jan , Yevgeny Kliteynik wrote:
> > Checking PathRecord query for QoS constraints
> > 
> > The QoS-aware path selection logic is implemented in a
> > separate function that is called only when QoS in OpenSM
> > is on. It causes some code duplication, but the idea is
> > to minimize the changes in the existing logic in OSM.
> > Eventually, these two function (the old path selection
> > and the new QoS-aware path selection) will be merged
> > into a single function.
> 
> Please merge __osm_pr_rcv_get_path_parms() and
> __osm_pr_rcv_get_path_parms_qos() into a single function - as you
> stated, most of the code is duplicated there.
> 
> In fact __osm_pr_rcv_get_path_parms() is the most frequently changed
> function in the SA PR processor, and it is not a good idea to have to make
> every change twice. IMHO the duplication creates more ground for future
> bugs than merging risks for the existing functionality.

As I said, this actually won't be a merge - the original function
will be removed, and the new function will have a few if() statements
for the case when QoS in osm is off.
However, this would put a bunch of new code into the usual flow. Since
this code didn't get enough testing before the feature freeze, and since
we discussed implementing the QoS-aware PathRecord in a way that doesn't
change the usual path (again, until it is tested), I think it is better
for now to leave it in two separate functions.
Trust me, I want to get rid of this code duplication much more than you do :)
And I'll do it as soon as I get to test the new code properly.
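
The single-function shape described above could look roughly like this. It is a heavily reduced sketch, not OpenSM code: the structs, field names, and function name are hypothetical stand-ins, and a NULL QoS level is assumed to mean that QoS is off or that no level matched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the OpenSM path and QoS types. */
struct sketch_path_parms {
	uint8_t mtu;
	uint8_t rate;
	uint8_t sl;
};

struct sketch_qos_level {
	int     mtu_limit_set;
	uint8_t mtu_limit;
	int     sl_set;
	uint8_t sl;
};

/*
 * One routine for both cases: the common path logic always runs, and the
 * QoS constraints are applied only inside a few if() branches.
 */
void sketch_get_path_parms(uint8_t port_mtu, uint8_t port_rate,
			   const struct sketch_qos_level *qos,
			   struct sketch_path_parms *out)
{
	assert(out != NULL);

	/* Common logic, identical to the non-QoS flow. */
	out->mtu = port_mtu;
	out->rate = port_rate;
	out->sl = 0;

	/* QoS off (or no matching level): nothing else to do. */
	if (qos == NULL)
		return;

	if (qos->mtu_limit_set && qos->mtu_limit < out->mtu)
		out->mtu = qos->mtu_limit;
	if (qos->sl_set)
		out->sl = qos->sl;
}
```

Passing NULL for the QoS level leaves the result exactly as the common logic computed it, which is the property that keeps the existing flow unchanged when QoS is off.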

-- Yevgeny

> This will also make your patch much more review friendly.
> 
> Thanks,
> Sasha
 

Re: [openib-general] new IB CM reject reason

2007-01-31 Thread Sean Hefty
> Is there a reason to distinguish between a connection that
> is being rejected because the listener crashed and a connection
> that is being rejected because the listener does not exist?

This only covers the case for the REQ received state, and could work for that
state.  But the problem can also occur after sending/receiving an MRA, REQ, or
REP.

- Sean




Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Steve Wise
On Wed, 2007-01-31 at 14:04 -0800, Sean Hefty wrote:
> > Fixed it for IB maybe, but not for iWarp, right?
> 
> It should be fixed for both.
> 
> > So OFED 1.2 will be ABI 3, right?
> 
> OFED will be ABI 4, since it will include multicast support (which is what
> causes the ABI to bump from 3 to 4).

Has the ofed tree been updated to ABI 4 yet?







Re: [openib-general] new IB CM reject reason

2007-01-31 Thread Caitlin Bestler
Sean Hefty wrote:
> > Is there a reason to distinguish between a connection that is being
> > rejected because the listener crashed and a connection that is being
> > rejected because the listener does not exist?
> 
> This only covers the case for the REQ received state, and
> could work for that state.  But the problem can also occur
> after sending/receiving an MRA, REQ, or REP.
> 
> - Sean

So would that mean that only an InfiniBand specific wire-protocol
code was needed, and that no API enhancement was required?

Trying to describe failures in a transport neutral fashion
is a real pain.





Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Sean Hefty
>> OFED will be ABI 4, since it will include multicast support (which is what
>> causes the ABI to bump from 3 to 4).
>
>Has the ofed tree been updated to ABI 4 yet?

I just looked in vlad's git tree a while ago, and his ofed_1_2 branch had ABI 3.
His ofed_1_2_multicast didn't have an rdma_user_cm.h file, so I'm not sure about
that branch.

- Sean
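
To make the concern concrete: if user space is built for ABI 4 while the kernel tree in the ofed_1_2 branch still reports ABI 3, a version check at load time would refuse to proceed. The sketch below shows one generic way such a check can be written. The function name and the idea of a supported range are assumptions for illustration only; nothing in this thread says how the real library performs its check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: a user-space library built to understand kernel
 * ABI versions lib_min..lib_max accepts a kernel that reports kernel_abi.
 */
bool abi_supported(int lib_min, int lib_max, int kernel_abi)
{
    assert(lib_min <= lib_max);
    return kernel_abi >= lib_min && kernel_abi <= lib_max;
}
```

Under this sketch, a library that only understands ABI 4 rejects an ABI 3 kernel, whereas one that accepts the range 3 to 4 would keep working against either tree.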




Re: [openib-general] new IB CM reject reason

2007-01-31 Thread Sean Hefty
>So would that mean that only an InfiniBand specific wire-protocol
>code was needed, and that no API enhancement was required?

Yes - I'm talking about the IB CM wire-protocol specifically.  Actual
implementation changes would likewise be limited to the ib_cm.

>Trying to describe failures in a transport neutral fashion
>is a real pain.

The rdma_cm exports the underlying transport reject code as a status value that
is left up to the user to interpret.  The event code is transport neutral, and
likely all that most users care about, but the transport specific code is useful
for debugging.

- Sean
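
A minimal sketch of the split described above: the event type alone drives the transport-neutral decision, and the status value is only reported for debugging. The enum, struct, and function names here are stand-ins invented for the example (loosely modeled on an event with a type and a status field); they are not the rdma_cm definitions. The value 28 is the IB "consumer defined" reject code mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a CM event; not the real rdma_cm types. */
enum cm_event_type { EV_ESTABLISHED, EV_REJECTED, EV_UNREACHABLE };

struct cm_event {
    enum cm_event_type type;  /* transport-neutral event code */
    int status;               /* transport-specific detail, e.g. IB reject reason */
};

#define IB_REJ_CONSUMER_DEFINED 28

/* Transport-neutral decision: looks only at the event type. */
int connection_failed(const struct cm_event *ev)
{
    assert(ev != NULL);
    return ev->type != EV_ESTABLISHED;
}

/* Debug aid: reports the transport-specific status without acting on it. */
void log_cm_failure(const struct cm_event *ev, FILE *out)
{
    assert(ev != NULL && out != NULL);
    if (ev->type == EV_REJECTED)
        fprintf(out, "rejected, transport status %d%s\n", ev->status,
                ev->status == IB_REJ_CONSUMER_DEFINED ? " (IB: consumer defined)" : "");
    else if (ev->type == EV_UNREACHABLE)
        fprintf(out, "unreachable, transport status %d\n", ev->status);
}
```

An application written this way behaves the same on any transport; only the log line changes when the reject reason differs.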




[openib-general] ipath and current git woes

2007-01-31 Thread Jason Gunthorpe
Has anyone been able to use ipath with the current latest git
everything?

The libipathverbs git repository seems to be missing a patch from
Roland to make it work with libibverbs.2 in an email titled:

[openib-general] [PATCH 3/7] libipathverbs: Update libipathverbs for
new libibverbs driver handling

After applying that patch the user space consumers load, but we got a
kernel oops when we tried to run a test here:

Unable to handle kernel NULL pointer dereference at 0918 RIP: 
 [88074c76] :ib_ipath:ipath_mmap+0x37/0x95
PGD 3ad46067 PUD 3ad4f067 PMD 0 
Oops:  [1] 
CPU 0 
Modules linked in: usb_storage skge bitrev crc32 ib_ipath k8temp hwmon 
forcedeth ehci_hcd ohci_hcd usbcore i2c_nforce2 i2c_core ib_uverbs ib_umad 
ib_mad ib_core
Pid: 4009, comm: ib_rdma_lat Not tainted 2.6.20-rc4-gf3a2c3ee-dirty #6
RIP: 0010:[88074c76]  [88074c76] 
:ib_ipath:ipath_mmap+0x37/0x95
RSP: :81003aaa3e88  EFLAGS: 00010002
RAX: 81003c434000 RBX: 0910 RCX: 1000
RDX: 002b3000 RSI: 81003bcab440 RDI: 81003bc2d840
RBP: 81003b7af918 R08: 81003aaa3f08 R09: 81003aa38c98
R10: 81003aa38c90 R11: 88074c3f R12: ffea
R13: 81003ac496c0 R14: f7ee3000 R15: 1000
FS:  () GS:8053(0063) knlGS:f7d8c6c0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: 0918 CR3: 3aa2f000 CR4: 06e0
Process ib_rdma_lat (pid: 4009, threadinfo 81003aaa2000, task 
81003c0495e0)
Stack:  81003b7af918 001000fb ffea 80250151
 81003be9f440 81003bc2d140 0028 ff99df20
   81003b87d818 81003dc04840
Call Trace:
 [80250151] do_mmap_pgoff+0x4d5/0x739
 [8021b54b] sys32_mmap2+0x76/0x9e
 [8021a432] ia32_sysret+0x0/0xa


Code: 48 3b 7b 08 75 46 48 3b 53 10 75 40 8b 43 1c 48 39 c1 77 40 
RIP  [88074c76] :ib_ipath:ipath_mmap+0x37/0x95
 RSP 81003aaa3e88
CR2: 0918

This is with a PCI-E qlogic card: 
:03:00.0 InfiniBand: Unknown device 1fc1:0010 (rev 01)

Anyone have any clues?

One notable thing is that I have a 32 bit user space and a 64 bit
kernel. I'll try a 64 bit user space tomorrow in case there is
something wrong with 32 bit compatibility...

The last time we had these cards working was with OFED 1.1 on 64 bit
FC4 using a Linus kernel (2.6.18 I think).

Thanks,
Jason
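
One observation on the trace, offered tentatively: if the first bytes on the Code line (48 3b 7b 08) are the faulting instruction, they decode as a compare of RDI against the 8-byte value at offset 8 from RBX. With RBX printed as 0910, that access lands on 0918, which is the address reported in CR2. In other words, the pointer in RBX looks like a small bogus value rather than a true NULL. The sketch below just spells out that arithmetic; the struct layout is hypothetical and only its second field offset (8) matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the offset of the second field (8) matters. */
struct probed {
    uint64_t first;
    uint64_t second;
};

/* Address touched when reading a field through a (possibly bogus) base pointer. */
uint64_t fault_address(uint64_t base, size_t field_offset)
{
    assert(field_offset <= UINT64_MAX - base);  /* no wrap-around in this sketch */
    return base + field_offset;
}
```

Calling fault_address(0x910, offsetof(struct probed, second)) gives 0x918, matching the CR2 value in the oops above.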




Re: [openib-general] ipath and current git woes

2007-01-31 Thread Robert Walsh
Jason Gunthorpe wrote:
> Has anyone been able to use ipath with the current latest git
> everything?

We're working on getting this up to date right now.  Give us a couple of 
days and we'll have some new patches ready.

Regards,
  Robert.




[openib-general] MVAPICH2 SRPM and install file patches

2007-01-31 Thread Shaun Rowland

I've placed the MVAPICH2 SRPM on the OFA server in ~rowland/ofed_1_2,
and it is linked to here:

http://www.openfabrics.org/~rowland/ofed_1_2/

Additionally, I am including a patch in this email that updates the
ofed_1_2_scripts files from the GIT repository we were given to
handle the MVAPICH2 SRPM file. Basically, installing MVAPICH2 is similar
to the other MPI packages, except that I have added a choice option to
build with iWARP support or not. The default is IB only. If the user has
selected the librdmacm packages and the mvapich2 package, this choice is
presented. This is also saved in the ofed.conf file using an
MVAPICH2_IMPL variable, and the librdmacm packages are added as
dependencies if the iWARP version of MVAPICH2 is desired and they are
not already in the ofed.conf file, which seems like standard behavior in
the scripts. The resulting binary RPM uses the name convention
mvapich2_compiler as normal in either case. There are various ways
this could be implemented, perhaps in a better manner. This is what I
was able to come up with by today. Since the installation scripts given
were very similar to the original OFED 1.1 scripts, I was able to test
the installation procedure using OFED 1.1 files. Everything worked for
me, including building the mpitests package against the mvapich2
package. There are some comments about this in what I have done. I hope
that it is helpful in getting our SRPM integrated into the installation
scripts.

Additionally, I put a README file in my ofed_1_2 directory that contains
information about the macros that can be used with our SRPM file. The
SRPM can be used to install against an existing OFED installation, and
those macros control various aspects of the result. There is one special
macro I use for when the SRPM is being built along with the OFED source,
and its use should be clear in the patched build.sh script and
associated comment.
--
Shaun Rowland   [EMAIL PROTECTED]
http://www.cse.ohio-state.edu/~rowland/
diff --git a/build.sh b/build.sh
index c4fe469..380bd98 100755
--- a/build.sh
+++ b/build.sh
@@ -428,6 +428,130 @@ mpi_osu()
     return 0
 }
 
+mvapich2()
+{
+    local iwarp=0
+
+    if [ "$MVAPICH2_IMPL" = "iwarp" ]; then
+        iwarp=1
+    fi
+
+    echo
+
+    if [ $iwarp -eq 0 ]; then
+        echo "Building the MVAPICH2 RPM with IB support. Please wait..."
+    else
+        echo "Building the MVAPICH2 RPM with iWARP support. Please wait..."
+    fi
+
+    echo
+
+    for mpi_comp in ${MPI_COMPILER_mvapich2}
+    do
+        MVAPICH2_RPM=${MVAPICH2_NAME}_${mpi_comp}-${MVAPICH2_VER}-${MVAPICH2_REL}.${build_arch}.rpm
+        MVAPICH2_PREFIX=${STACK_PREFIX}/mpi/${mpi_comp}/${MVAPICH2_NAME}-${MVAPICH2_VERSION}
+
+        case ${mpi_comp} in
+            gcc)
+                MVAPICH2_COMP_ENV="CC=gcc CXX=g++"
+
+                if [ $is_gfortran -eq 1 ]; then
+                    MVAPICH2_COMP_ENV="$MVAPICH2_COMP_ENV F77=gfortran"
+                elif [ $is_gcc_g77 -eq 1 ]; then
+                    MVAPICH2_COMP_ENV="$MVAPICH2_COMP_ENV F77=g77"
+                fi
+                ;;
+            pathscale)
+                MVAPICH2_COMP_ENV="CC=pathcc CXX=pathCC F77=pathf90 F90=pathf90"
+
+                # On i686 the PathScale compiler requires -g optimization
+                # for MVAPICH2 in the shared library configuration.
+                if [ "$ARCH" = "i686" ]; then
+                    MVAPICH2_COMP_ENV="$MVAPICH2_COMP_ENV OPT_FLAG=-g"
+                fi
+                ;;
+            pgi)
+                MVAPICH2_COMP_ENV="CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90"
+                ;;
+            intel)
+                # The -i-dynamic flag is required for MVAPICH2 in the shared
+                # library configuration.
+                MVAPICH2_COMP_ENV='CC=icc -i-dynamic CXX=icpc -i-dynamic F77=ifort -i-dynamic F90=ifort -i-dynamic'
+                ;;
+        esac
+
+        ex rpmbuild --rebuild \
+           --define \'_topdir ${RPM_DIR}\' \
+           --define \'_name ${MVAPICH2_NAME}_${mpi_comp}\' \
+           --define \'_prefix ${MVAPICH2_PREFIX}\' \
+           --define \'build_root ${BUILD_ROOT}\' \
+           --define \'open_ib_home ${STACK_PREFIX}\' \
+           --define \'ofed_build_root ${BUILD_ROOT}\' \
+           --define \'comp_env ${MVAPICH2_COMP_ENV}\' \
+           --define \'iwarp ${iwarp}\' \
+           --define \'romio 1\' \
+           --define \'shared_libs 1\' \
+           --define \'auto_req 1\' \
+           $MVAPICH2_SRC_RPM
+        ex $MV -f ${RPM_DIR}/RPMS/$build_arch/${MVAPICH2_RPM} $RPMS
+        let BUILD_COUNTER++
+
+        if [ "$mpitests" == "y" ]; then
+            echo
+            echo "Building the mpitests RPM over MVAPICH2 using the ${mpi_comp} compiler. Please wait..."
+            echo
+            MPITESTS_RPM=${MPITESTS_NAME}_${MVAPICH2_NAME}_${mpi_comp}-${MPITESTS_VERSION}.${build_arch}.rpm
+
+            # rowland: The MVAPICH2 SRPM was built above by specifying
+            # ofed_build_root (set to the same 

Re: [openib-general] ip_ib_mc_map?

2007-01-31 Thread Sean Hefty
Steve Wise wrote:
> Perhaps there's no backport for this to rhel4u4?

I would have thought so, but I really don't know.  The function is called from 
net/ipv4/arp.c, and not directly by ipoib.  So, I don't know how the backport 
patches typically handle this.

- Sean




[openib-general] nightly osm_sim report 2007-02-01:normal completion

2007-01-31 Thread Eitan Zahavi
OSM Simulation Regression Summary
OpenSM rev = Wed_Jan_31_12:00:12_2007 2095ee 
ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
Total=410 Pass=410 Fail=0

Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 OsmStress IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo

Failures:




Re: [openib-general] new IB CM reject reason

2007-01-31 Thread Michael S. Tsirkin
> Quoting Sean Hefty [EMAIL PROTECTED]:
> Subject: new IB CM reject reason
> 
> We've run into an issue with the IB CM reject reason codes.  When a remote
> application crashes during connection establishment, the connection will be
> rejected by the kernel CM.  Unfortunately, there's not a decent reject reason
> that maps to this event.  Currently, the ib_cm issues the reject as consumer
> defined (code 28).
> 
> I'd like to propose adding reject reason 0, which would mean other/unknown/or
> none given.  This is a deviation from the spec, but does anyone know of any
> issues with such an approach?

No, I don't think "application crashed" makes sense as an element of the wire
protocol.
I think optional logging of errors in the kernel CM would be a much better
solution. I know I had to add some printks each time I was debugging SDP.

A couple of reasons that come to mind:

1. Should the remote side care whether the remote implementation is kernel or
userspace? Userspace consumers are not the only ones of interest. What about the
various error codes and failure values returned from callbacks on the passive
side? If you are trying to develop a debug aid, these need to be covered as well.

2. Another objection is that this feature seems to invite misuse, where
applications will use the REJ reason as a hint on whether the remote side
crashed. But the REJ could be lost. Wouldn't this confuse the remote side?

-- 
MST




Re: [openib-general] [PATCH] RE: regression in ofed 1.2

2007-01-31 Thread Michael S. Tsirkin
> His ofed_1_2_multicast didn't have an rdma_user_cm.h file, so I'm not sure
> about that branch.

That one should be removed. It was created as a debugging aid to help people
debug crashes observed by Dotan in the multicast module.

-- 
MST




[openib-general] [Bug 334] Problems with build OFED-1.1.1-ib_local_sa

2007-01-31 Thread bugzilla-daemon
https://bugs.openfabrics.org/show_bug.cgi?id=334


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2




--- Comment #1 from [EMAIL PROTECTED]  2007-01-31 23:19 ---
I resolved the problem with building the cma.c file (I added the
rdma/ib_local_sa.h file to the includes), but I found a new problem:

 gcc
-Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.iser_verbs.o.d
 -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include  -Iinclude 
-Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include  -include
include/linux/autoconf.h  -include
/var/tmp/OFEDRPM/BUILD/openib-1.1/include/linux/autoconf.h   
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser  -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration
-fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer
-mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks
-Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse
-mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Wno-pointer-sign
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib 
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug  -DMODULE
-DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(iser_verbs) 
-DKBUILD_MODNAME=KBUILD_STR(ib_iser) -c -o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.tmp_iser_verbs.o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c
In file included from
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:42:
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser.h:47:27:
error: scsi/libiscsi.h: No such file or directory
In file included from
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser.h:48,
 from
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:42:
/usr/src/linux-2.6.16.21-0.8/include/scsi/scsi_transport_iscsi.h:213: error:
field 'mutex' has incomplete type
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c: In
function 'iser_create_device_ib_res':
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:80:
error: 'ISCSI_XMIT_CMDS_MAX' undeclared (first use in this function)
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:80:
error: (Each undeclared identifier is reported only once
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:80:
error: for each function it appears in.)
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c: In
function 'iser_create_ib_conn_res':
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:159:
error: 'ISCSI_XMIT_CMDS_MAX' undeclared (first use in this function)
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c: In
function 'iser_disconnected_handler':
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.c:418:
error: implicit declaration of function 'iscsi_conn_failure'
make[5]: ***
[/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iser_verbs.o]
Error 1
make[4]: *** [/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser]
Error 2
make[3]: *** [_module_/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband]
Error 2
make[2]: *** [modules] Error 2
make[1]: *** [modules] Error 2

Could you please help me resolve this problem?


-- 
Configure bugmail: https://bugs.openfabrics.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.
