Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.

2007-01-08 Thread Mirko Benz
Hi,

What could be the reasons for these timeouts to occur?
How should an application handle this?

Thanks,
Mirko

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] OFED release testing Task force meeting minutes

2007-01-08 Thread Nimrod Gindi
Meeting took place on Thursday - Jan. 4th, 2007

Agenda:
1. Introduction to targets as presented at the last OFA meeting
2. Determine priorities.
3. Determine schedule
4. Open discussion

Attending companies: Mellanox, NetEffect, ORNL, QLogic, Voltaire

Discussion Items and Action Items:
1)  It was noted that the OFA interoperability event and the IBTA Plugfest
are dated after the scheduled OFED 1.2 release.
2)  Agreed initial targets (in priority order):
a.  Unified reporting of test results
b.  Unified/increased reporting of bugs
c.  Ownership of testing for ULPs/driver parts
3)  Agreed Action Items:
a.  AI 1: Amit K (Mellanox) to take this up with OFA to revisit the
date and decide whether it would be better to have the testing prior to
the release.
b.  AI 2: Amit K (Mellanox) to send out test-report format for group
review.
c.  AI 3: Moni L (Voltaire) to send out test-report format for group
review.

It was agreed that reviews and additional ideas will be handled by e-mail,
with the group included in the To field.
A follow-up meeting will be scheduled for either the 17th or 18th of January
2007, 8:30am PDT=11am EDT=6pm Israel (please respond with whichever suits you
better).

Nimrod  Gindi
Mellanox Technologies Ltd.
mail  :  [EMAIL PROTECTED]
Cell  :  +1-408-750-4801
Office:  +1-347-342-0011
Fax   :  +1-212-987-0275


Re: [openib-general] ioctl and send_agents

2007-01-08 Thread Michael Arndt
Hi,

Thanks for the fast answer.

> OpenSM registers agents in opensm/osm_sm_mad_ctrl.c:osm_sm_mad_ctrl_bind
> and opensm/osm_sa_mad_ctrl.c:osm_sa_mad_ctrl_bind. osm_sm_mad_ctrl_bind
> is called from osm_sm.c:osm_sm_bind and osm_sa_mad_ctrl_bind is called
> from osm_sa.c:osm_sa_bind. Both osm_sm_bind and osm_sa_bind are called
> from opensm/osm_opensm.c:osm_opensm_bind which is in turn called from
> main.c during OpenSM startup. That is the vendor independent part.
>
> The vendor dependent part is done in the vendor layer. For OpenIB, it is
> done in osm_vendor_ibumad.c:osm_vendor_bind.

I looked at osm_vendor_bind and saw the umad_register call. But when I
checked the umad_register function (libibumad/src/umad.c), I just see
another ioctl call. If it is right that the user_mad module is used in
kernel space, shouldn't there be a call to unlocked_ioctl or compat_ioctl,
as defined in that module?

These agents are all receive agents, and you said nothing about send agents
for the SM?

Thanks Michael





Re: [openib-general] nightly osm_sim report 2007-01-07:normal completion

2007-01-08 Thread Hal Rosenstock
On Sun, 2007-01-07 at 00:15, Eitan Zahavi wrote: 
 OSM Simulation Regression Summary
 OpenSM rev = Sat_Jan_6_06:44:34_2007 6c8647 
 ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
 Total=369 Pass=366 Fail=3
 
 Pass:
 27 Stability IS1-16.topo
 27 Pkey IS1-16.topo
 27 OsmTest IS1-16.topo
 27 OsmStress IS1-16.topo
 27 Multicast IS1-16.topo
 27 LidMgr IS1-16.topo
 9 Stability IS3-loop.topo
 9 Stability IS3-128.topo
 9 Pkey IS3-128.topo
 9 OsmTest IS3-loop.topo
 9 OsmTest IS3-128.topo
 9 OsmStress IS3-128.topo
 9 Multicast IS3-loop.topo
 9 Multicast IS3-128.topo
 9 FatTree part-4-ary-3-tree.topo
 9 FatTree merge-roots-reorder-4-ary-2-tree.topo
 9 FatTree merge-roots-4-ary-2-tree.topo
 9 FatTree merge-root-4-ary-3-tree.topo
 9 FatTree merge-root-12-ary-2-tree.topo
 9 FatTree half-4-ary-3-tree.topo
 9 FatTree blend-4-ary-2-tree.topo
 9 FatTree 4-ary-4-tree.topo
 9 FatTree 4-ary-3-tree.topo
 9 FatTree 32nodes-3lvl-is1.topo
 9 FatTree 2-ary-4-tree.topo
 9 FatTree 12-ary-2-tree.topo
 8 LidMgr IS3-128.topo
 8 FatTree merge-2-ary-4-tree.topo
 8 FatTree 12-node-spaced.topo
 
 Failures:
 1 LidMgr IS3-128.topo

Is this LidMgr failure a DNS issue like the others?

Also, there was a pkey failure late last week.

-- Hal

 1 FatTree merge-2-ary-4-tree.topo
 1 FatTree 12-node-spaced.topo





Re: [openib-general] nightly osm_sim report 2007-01-08:normal completion

2007-01-08 Thread Hal Rosenstock
On Mon, 2007-01-08 at 00:26, Eitan Zahavi wrote:
 OSM Simulation Regression Summary
 OpenSM rev = Sat_Jan_6_06:44:34_2007 6c8647 
 ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
 Total=410 Pass=409 Fail=1
 
 Pass:
 30 Stability IS1-16.topo
 30 Pkey IS1-16.topo
 30 OsmTest IS1-16.topo
 30 OsmStress IS1-16.topo
 30 Multicast IS1-16.topo
 30 LidMgr IS1-16.topo
 10 Stability IS3-loop.topo
 10 Stability IS3-128.topo
 10 Pkey IS3-128.topo
 10 OsmTest IS3-loop.topo
 10 OsmTest IS3-128.topo
 10 OsmStress IS3-128.topo
 10 Multicast IS3-loop.topo
 10 LidMgr IS3-128.topo
 10 FatTree part-4-ary-3-tree.topo
 10 FatTree merge-roots-reorder-4-ary-2-tree.topo
 10 FatTree merge-roots-4-ary-2-tree.topo
 10 FatTree merge-root-4-ary-3-tree.topo
 10 FatTree merge-root-12-ary-2-tree.topo
 10 FatTree merge-2-ary-4-tree.topo
 10 FatTree half-4-ary-3-tree.topo
 10 FatTree blend-4-ary-2-tree.topo
 10 FatTree 4-ary-4-tree.topo
 10 FatTree 4-ary-3-tree.topo
 10 FatTree 32nodes-3lvl-is1.topo
 10 FatTree 2-ary-4-tree.topo
 10 FatTree 12-node-spaced.topo
 10 FatTree 12-ary-2-tree.topo
 9 Multicast IS3-128.topo
 
 Failures:
 1 Multicast IS3-128.topo

What about this failure? Is it also DNS related, or something else?

-- Hal





[openib-general] [PATCH 2/2]: OpenSM/osm_console.c: Handle telnet disconnects better

2007-01-08 Thread Hal Rosenstock
OpenSM/osm_console.c: Handle telnet disconnects better

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/opensm/osm_console.c b/osm/opensm/osm_console.c
index 420acc2..8d770aa 100644
--- a/osm/opensm/osm_console.c
+++ b/osm/opensm/osm_console.c
@@ -336,7 +336,7 @@ void osm_console(osm_opensm_t *p_osm)
pollfd[1].events = POLLIN|POLLOUT;
pollfd[1].revents = 0;
 
-   if (poll(pollfd, 2, 1) <= 0)
+   if (poll(pollfd, pollfd[1].fd >= 0 ? 2 : 1, 1) <= 0)
return;
 
 #ifdef ENABLE_OSM_CONSOLE_SOCKET
@@ -382,11 +382,10 @@ void osm_console(osm_opensm_t *p_osm)
if (n > 0) {
/* Parse and act on input */
parse_cmd_line(p_line, p_osm);
+   osm_console_prompt(p_osm->console.out);
+   } else
+   osm_console_close_socket(p_osm);
+   if (p_line)
free(p_line);
-   } else {
-   fprintf(p_osm->console.out, "Input error\n");
-   fflush(p_osm->console.out);
-   }
-   osm_console_prompt(p_osm->console.out);
}
 }







[openib-general] [PATCH 1/2] OpenSM: Add socket support to OpenSM console

2007-01-08 Thread Hal Rosenstock
OpenSM: Add socket support to OpenSM console

Signed-off-by: Ira Weiny [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/include/opensm/osm_console.h b/osm/include/opensm/osm_console.h
index 705f918..2d212f2 100644
--- a/osm/include/opensm/osm_console.h
+++ b/osm/include/opensm/osm_console.h
@@ -38,6 +38,11 @@
 #include <opensm/osm_subnet.h>
 #include <opensm/osm_opensm.h>
 
+#define OSM_COMMAND_LINE_LEN     120
+#define OSM_COMMAND_PROMPT       "$ "
+#define OSM_DEFAULT_CONSOLE_PORT 1
+#define OSM_DAEMON_NAME          "opensm"
+
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
 #  define END_C_DECLS   }
@@ -48,8 +53,10 @@
 
 BEGIN_C_DECLS
 
+void osm_console_init(osm_subn_opt_t *opt, osm_opensm_t *p_osm);
 void osm_console(osm_opensm_t *p_osm);
-void osm_console_prompt(void);
+void osm_console_prompt(FILE *out);
+void osm_console_close_socket(osm_opensm_t *p_osm);
 
 END_C_DECLS
 
diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h
index 16fef37..482de28 100644
--- a/osm/include/opensm/osm_opensm.h
+++ b/osm/include/opensm/osm_opensm.h
@@ -48,6 +48,7 @@
 #ifndef _OSM_OPENSM_H_
 #define _OSM_OPENSM_H_
 
+#include <stdio.h>
 #include <signal.h>
 #include <complib/cl_dispatcher.h>
 #include <complib/cl_passivelock.h>
@@ -130,6 +131,15 @@ struct osm_routing_engine {
 *  internals cleanup.
 */
 
+typedef struct _osm_console_t
+{
+  int   socket;
+  int   in_fd;
+  int   out_fd;
+  FILE *in;
+  FILE *out;
+} osm_console_t;
+
 /****s* OpenSM: OpenSM/osm_opensm_t
 * NAME
 *  osm_opensm_t
@@ -156,6 +166,7 @@ typedef struct _osm_opensm_t
   cl_plock_t   lock;
   struct osm_routing_engine routing_engine;
   osm_stats_t  stats;
+  osm_console_t console;
 } osm_opensm_t;
 /*
 * FIELDS
diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
index 79796e5..c9b04eb 100644
--- a/osm/include/opensm/osm_subnet.h
+++ b/osm/include/opensm/osm_subnet.h
@@ -266,6 +266,7 @@ typedef struct _osm_subn_opt
   boolean_t no_qos;
   boolean_t accum_log_file;
   boolean_t console;
+  uint16_t console_port;
   cl_map_t port_prof_ignore_guids;
   boolean_t port_profile_switch_nodes;
   osm_pfn_ui_extension_t   pfn_ui_pre_lid_assign;
diff --git a/osm/opensm/configure.in b/osm/opensm/configure.in
index 1ccf5c6..2d52675 100644
--- a/osm/opensm/configure.in
+++ b/osm/opensm/configure.in
@@ -62,6 +62,22 @@ AC_ARG_ENABLE(debug,
 esac],[debug=false])
 AM_CONDITIONAL(DEBUG, test x$debug = xtrue)
 
+dnl Console over a socket connection
+AC_ARG_ENABLE(console-socket,
+[  --enable-console-socket Enable a console socket, requires tcp_wrappers 
(default yes)],
+[case $enableval in
+ yes) console_socket=yes ;;
+ no)  console_socket=no ;;
+   esac],
+   console_socket=yes)
+if test $console_socket = yes; then
+  AC_CHECK_LIB(wrap, request_init, [],
+   AC_MSG_ERROR([request_init() not found. console-socket requires 
libwrap.]))
+  AC_DEFINE(ENABLE_OSM_CONSOLE_SOCKET,
+   1,
+   [Define as 1 if you want to enable a console on a socket 
connection])
+fi
+
 dnl Provide user option to select vendor
 OPENIB_APP_OSMV_SEL
 
diff --git a/osm/opensm/main.c b/osm/opensm/main.c
index 374d323..90432be 100644
--- a/osm/opensm/main.c
+++ b/osm/opensm/main.c
@@ -217,6 +217,11 @@ show_usage(void)
 "4 outstanding SMPs.\n\n" );
   printf( "-console\n"
 "This option brings up the OpenSM console.\n\n" );
+#ifdef ENABLE_OSM_CONSOLE_SOCKET
+  printf( "--console_port <port>\n"
+"Specify an alternate telnet port for the console (default %d).\n\n",
+ OSM_DEFAULT_CONSOLE_PORT);
+#endif
   printf( "-i <equalize-ignore-guids-file>\n"
   "-ignore-guids <equalize-ignore-guids-file>\n"
 "This option provides the means to define a set of ports\n"
@@ -578,6 +583,9 @@ main(
   {  "cache-options", 0, NULL, 'c'},
   {  "stay_on_fatal", 0, NULL, 'y'},
   {  "honor_guid2lid", 0, NULL, 'x'},
+#ifdef ENABLE_OSM_CONSOLE_SOCKET
+  {  "console_port",  1, NULL, 'C'},
+#endif
   {  NULL,0, NULL,  0 }  /* Required at the end of the array */
 };
 
@@ -679,6 +687,12 @@ main(
   printf( "Enabling OpenSM interactive console\n");
   break;
 
+#ifdef ENABLE_OSM_CONSOLE_SOCKET
+case 'C':
+  opt.console_port = strtol(optarg, NULL, 0);
+  break;
+#endif
+
 case 'd':
   dbg_lvl = strtol(optarg, NULL, 0);
   printf( d level = 0x%x\n, dbg_lvl);
@@ -931,15 +945,11 @@ main(
   }
   else
   {
+osm_console_init(&opt, &osm);
+
 /*
   Sit here forever
-  In the future, some sort of console interactivity could
-  be implemented in this loop.
 */
-if (opt.console) {
-  printf("\nOpenSM Console\n\n");
-  osm_console_prompt();
-}
 while( !osm_exit_flag ) {
   if (opt.console)
 osm_console(&osm);
@@ 

Re: [openib-general] best way to get ibv_get_cq_event to return

2007-01-08 Thread Greenwood, Steve
Guys,
Thanks for the information - I'll give it a try.
SRG

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 1:30 PM
To: Dotan Barak
Cc: Or Gerlitz; Greenwood, Steve; openib-general@openib.org
Subject: Re: [openib-general] best way to get ibv_get_cq_event to return

>  This is true (and I guess that it will work), but if in the future the
>  implementation of the ibv_comp_channel will be changed,
>  this code will not work

The use of a file descriptor is pretty fundamental, and it was done
exactly to permit this sort of stuff (poll(), epoll, SIGIO, etc).  So
I think it is extremely unlikely to change in a way that would break
an app using the file descriptor.

 - R.
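
A minimal sketch of the approach discussed above, assuming the goal is to wait on the completion channel with a timeout instead of blocking forever in ibv_get_cq_event(). With libibverbs the descriptor would be the `fd` member of struct ibv_comp_channel; to keep the sketch self-contained, a pipe stands in for the channel here, and the names wait_readable and demo_wait_readable are illustrative only.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * With libibverbs, fd would be channel->fd, and a return of 1
 * means ibv_get_cq_event() can be called without blocking.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                   /* timed out, nothing pending */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Stand-in demo: a pipe plays the role of the completion channel.
 * Returns 0 if both the timeout path and the wake-up path behave.
 */
static int demo_wait_readable(void)
{
    int p[2];
    char byte = 'x';
    int ok;

    if (pipe(p) != 0)
        return -1;
    ok = (wait_readable(p[0], 10) == 0);        /* nothing written yet */
    if (write(p[1], &byte, 1) != 1)
        ok = 0;
    ok = ok && (wait_readable(p[0], 10) == 1);  /* data is pending now */
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

The same helper works with epoll or a second "wake-up" descriptor if the application needs to interrupt the wait from another thread.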




[openib-general] [PATCH] OpenSM/osm_console.c: Add resweep and status commands

2007-01-08 Thread Hal Rosenstock
OpenSM/osm_console.c: Add resweep and status commands

Signed-off-by: Ira Weiny [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/opensm/osm_console.c b/osm/opensm/osm_console.c
index 8d770aa..8157f90 100644
--- a/osm/opensm/osm_console.c
+++ b/osm/opensm/osm_console.c
@@ -84,6 +84,20 @@ static void help_quit(FILE *out, int det
 static void help_loglevel(FILE *out, int detail)
 {
fprintf(out, "loglevel [<log-level>]\n");
+   if (detail) {
+   fprintf(out, "log-level is OR'ed from the following\n");
+   fprintf(out, "OSM_LOG_NONE 0x%02X\n", OSM_LOG_NONE);
+   fprintf(out, "OSM_LOG_ERROR 0x%02X\n", OSM_LOG_ERROR);
+   fprintf(out, "OSM_LOG_INFO 0x%02X\n", OSM_LOG_INFO);
+   fprintf(out, "OSM_LOG_VERBOSE 0x%02X\n", OSM_LOG_VERBOSE);
+   fprintf(out, "OSM_LOG_DEBUG 0x%02X\n", OSM_LOG_DEBUG);
+   fprintf(out, "OSM_LOG_FUNCS 0x%02X\n", OSM_LOG_FUNCS);
+   fprintf(out, "OSM_LOG_FRAMES 0x%02X\n", OSM_LOG_FRAMES);
+   fprintf(out, "OSM_LOG_ROUTING 0x%02X\n", OSM_LOG_ROUTING);
+   fprintf(out, "OSM_LOG_SYS 0x%02X\n", OSM_LOG_SYS);
+   fprintf(out, "\n");
+   fprintf(out, "OSM_LOG_DEFAULT_LEVEL 0x%02X\n", OSM_LOG_DEFAULT_LEVEL);
+   }
 }
 
 static void help_priority(FILE *out, int detail)
@@ -91,6 +105,16 @@ static void help_priority(FILE *out, int
fprintf(out, "priority [<sm-priority>]\n");
 }
 
+static void help_resweep(FILE *out, int detail)
+{
+   fprintf(out, "resweep [heavy|light]\n");
+}
+
+static void help_status(FILE *out, int detail)
+{
+   fprintf(out, "status\n");
+}
+
 /* more help routines go here */
 
 static void help_parse(char **p_last, osm_opensm_t *p_osm, FILE *out)
@@ -164,6 +188,99 @@ static void priority_parse(char **p_last
}
 }
 
+static char *sm_state_str(int state)
+{
+   switch (state)
+   {
+   case IB_SMINFO_STATE_INIT:
+   return ("Init");
+   case IB_SMINFO_STATE_DISCOVERING:
+   return ("Discovering");
+   case IB_SMINFO_STATE_STANDBY:
+   return ("Standby");
+   case IB_SMINFO_STATE_NOTACTIVE:
+   return ("Not Active");
+   case IB_SMINFO_STATE_MASTER:
+   return ("Master");
+   }
+   return ("UNKNOWN");
+}
+
+static char *sa_state_str(osm_sa_state_t state)
+{
+   switch (state)
+   {
+   case OSM_SA_STATE_INIT:
+   return ("Init");
+   case OSM_SA_STATE_READY:
+   return ("Ready");
+   }
+   return ("UNKNOWN");
+}
+
+static void status_parse(char **p_last, osm_opensm_t *p_osm, FILE *out)
+{
+   fprintf(out, "SM State : %s\n",
+   sm_state_str(p_osm->subn.sm_state));
+   fprintf(out, "SA State : %s\n",
+   sa_state_str(p_osm->sa.state));
+   fprintf(out, "MAD stats\n"
+   "-\n"
+   "QP0 MADS outstanding   : %d\n"
+   "QP0 MADS outstanding (on wire) : %d\n"
+   "QP0 MADS rcvd  : %d\n"
+   "QP0 MADS sent  : %d\n"
+   "QP0 unicasts sent  : %d\n"
+   "QP1 MADS outstanding   : %d\n"
+   "QP1 MADS rcvd  : %d\n"
+   "QP1 MADS sent  : %d\n"
+,
+   p_osm->stats.qp0_mads_outstanding,
+   p_osm->stats.qp0_mads_outstanding_on_wire,
+   p_osm->stats.qp0_mads_rcvd,
+   p_osm->stats.qp0_mads_sent,
+   p_osm->stats.qp0_unicasts_sent,
+   p_osm->stats.qp1_mads_outstanding,
+   p_osm->stats.qp1_mads_rcvd,
+   p_osm->stats.qp1_mads_sent
+   );
+   fprintf(out, "Subnet flags\n"
+   "\n"
+   "Ignore existing lfts   : %d\n"
+   "Subnet Init errors : %d\n"
+   "In sweep hop 0 : %d\n"
+   "Moved to master state  : %d\n"
+   "First time master sweep: %d\n"
+   "Coming out of standby  : %d\n"
+,
+   p_osm->subn.ignore_existing_lfts,
+   p_osm->subn.subnet_initialization_error,
+   p_osm->subn.in_sweep_hop_0,
+   p_osm->subn.moved_to_master_state,
+   p_osm->subn.first_time_master_sweep,
+   p_osm->subn.coming_out_of_standby
+   );
+   fprintf(out, "\n");
+}
+
+static void resweep_parse(char **p_last, osm_opensm_t *p_osm, FILE *out)
+{
+   char *p_cmd;
+
+   p_cmd = next_token(p_last);
+   if (!p_cmd ||
+   (strcmp(p_cmd, heavy) != 0  
+   

Re: [openib-general] [openfabrics-ewg] OFED 1.2 Questions

2007-01-08 Thread Tziporet Koren
Michael S. Tsirkin wrote:
>> Tziporet,
>>
>> I'm in the process of adding the Chelsio T3 drivers to the OFED
>> repository and I have a question:
>>
>> The HowTo kernel section you posted on the wiki says to add the new files
>> to the repos directly via a git commit, but create patches for
>> modifications to existing files and put the patches in the
>> kernel_patches/fixes directory.  However, I don't see patches in that
>> directory to modify the core Makefile/Kconfig for SDP or other new
>> modules added for ofed.  So should I just modify infiniband/Makefile and
>> Kconfig via the git commit that adds the new Chelsio files, or create a
>> patch file and put it in kernel_patches/fixes?
>
> Yes, you can modify the Makefile/Kconfig directly.
> The reason is that it's always trivial to resolve conflicts there
> when merging from upstream.

After you check it's working, if you changed the general Makefiles/Kconfig,
please send the patches to Vlad.

>> Also, are there machines available with the various ofed supported
>> distros installed that I can do compile testing for the Chelsio user
>> lib?

You can compile on the OFA server, but this has only Ubuntu OS.
For testing on other OSes you should set up systems in your company.

Tziporet




Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Tang, Changqing

Or:
Thank you for the information. I may change my mind and require
IPoIB to run the newer version of HP-MPI on OFED 1.2 if I don't find another
way to easily establish IB connections dynamically between two process
groups of dynamic size.

--CQ

 

 -----Original Message-----
 From: Or Gerlitz [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 08, 2007 1:18 AM
 To: Tang, Changqing
 Cc: openib-general@openib.org
 Subject: using IB on a port without IPoIB running NIC
 
 Tang, Changqing wrote:
  We understand that, but we hope to have a connect/accept style IB 
  connection setup, without IPoIB involved,
 
  like HP-UX IT-API(similar to uDAPL without underlying IP 
 support), it 
  works with multiple cards.
 
  Configure 4-5 IP addresses on a single node is kind of silly.
 
 CQ,
 
 A few more thoughts on your requirement to run MPI on an IB port 
 without a working IPoIB NIC...
 
 Basically, people use IB for both IPC and I/O, where except 
 for SRP, all the IB I/O ULPs (both block based: iSER and file 
 based: Lustre, GPFS,
 rNFS) use IP addressing and hence are either coded to the 
 RDMA CM or work on top of TCP/IP (iSCSI-TCP, NFS, pFS, etc).
 
 So if the user will not configure IPoIB on this IB port, it 
 will not be utilized for I/O.
 
 Now, you mention a use case of 4 cards on a node, I believe 
 that typically this would happen on big SMP machines where 
 you **must** use all the active IB links for I/O: eg when 
 most of your MPI work is within the SMP (128 to 512 ranks) 
 and most of the IB work is for I/O .
 
 I understand (please check and let me know, e.g. about HP's 1U 
 offering) that all or most of today's 1U PCI-EX nodes can have at 
 most **one** PCI-EX card.
 
 Combining the above limitation with the fact that these nodes 
 would run at most 16 ranks (e.g. 8 dual-core CPUs) and that 8 
 ranks per IB link is a ratio that makes sense, we are left 
 with **2** and not 4-5 NICs to configure.
 
 Oh, and one more thing: 4 IB links per node would turn an N-node 
 cluster into a 4N IB end-port cluster, for which you need 
 f(4N) IB switch ports, and the specific f(.) might make 
 the IB deployment over this cluster very expensive...
 
 Or.
 
 




Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Michael S. Tsirkin
> Thank you for the information, I may change my mind to require
> IPoIB to run newer version of HP-MPI on OFED 1.2, if I don't find other
> way to easily establish IB connection dynamically between two process
> groups with dynamic size.

I'm not really sure what your needs are, but it's not like this is
completely impossible.

Some people use ad-hoc socket-based tricks to establish IB connections,
and this will work for some topologies. You can look at libibverbs/examples
for an example of such an implementation.

-- 
MST




Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.

2007-01-08 Thread Steve Wise
On Mon, 2007-01-08 at 12:13 +0100, Mirko Benz wrote:
> Hi,
>
> What could be the reasons for these timeouts to occur?

One way: if the host is not reachable but the next-hop neighbour is,
then the connection attempt will time out.

Another way is if, for some reason, the MPA negotiation doesn't complete
in a timely manner.  For instance, if the passive side never
rdma_accept()s the connection, then the active side should eventually
time out the attempt and return a timeout error to the consumer.

> How should an application handle this?

Applications should handle connection timeouts however they want.
Usually they just report it to the user.


Steve.
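
As a rough illustration of "however they want", here is a small sketch of one possible policy: retry a bounded number of times on a timeout, and give up (reporting to the user) on anything else, such as a reject. The function name, the enum, and the convention that the status is 0 or a negative errno value are all assumptions made for this sketch; they are not part of the rdma_cm API.

```c
#include <assert.h>
#include <errno.h>

enum conn_action { CONN_OK, CONN_RETRY, CONN_GIVE_UP };

/*
 * Decide what to do after a connection attempt.
 *   status       0 on success, or a negative errno value (assumed convention)
 *   attempt      number of attempts made so far, starting at 1
 *   max_attempts hypothetical application policy
 */
static enum conn_action next_action(int status, int attempt, int max_attempts)
{
    assert(attempt >= 1 && max_attempts >= 1);

    if (status == 0)
        return CONN_OK;
    if (status == -ETIMEDOUT && attempt < max_attempts)
        return CONN_RETRY;      /* peer may be slow or briefly unreachable */
    return CONN_GIVE_UP;        /* rejected, or out of retries: report it */
}
```

Distinguishing the two cases only works if timeouts are not reported as rejects, which is the point of the patch under discussion.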





[openib-general] [PATCH] ofed_1_2 configure script typo

2007-01-08 Thread Steve Wise
Typo in OFED 1.2 configure script.

From: Steve Wise [EMAIL PROTECTED]

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 ofed_scripts/configure |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ofed_scripts/configure b/ofed_scripts/configure
index 5a1694d..a0557e2 100755
--- a/ofed_scripts/configure
+++ b/ofed_scripts/configure
@@ -598,7 +598,7 @@ main()
                 --with-vnic_debug-mod)
                     CONFIG_INFINIBAND_VNIC_DEBUG=y
                 ;;
-                --without-vnic-mod)
+                --without-vnic_debug-mod)
                     CONFIG_INFINIBAND_VNIC_DEBUG=
                 ;;
                 --with-vnic_stats-mod)






Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Tang, Changqing

What I need to know is how to wire up IB connections without IPoIB.
Currently, with the Verbs API, it is an all-to-all QP number exchange. I want
to remove the all-to-all QP number exchange for MPI dynamic processes.

--CQ

 -----Original Message-----
 From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 08, 2007 8:18 AM
 To: Tang, Changqing
 Cc: Or Gerlitz; openib-general@openib.org
 Subject: Re: using IB on a port without IPoIB running NIC
 
  Thank you for the information, I may change my mind to 
 require IPoIB 
  to run newer version of HP-MPI on OFED 1.2, if I don't find 
 other way 
  to easily establish IB connection dynamically between two process 
  groups with dynamic size.
 
 I'm not really sure what your needs are, but it's not like 
 this is completely impossible.
 
 Some people use ad-hoc socket-based tricks establish IB 
 connections, and this will work for some topologies. You can 
 look at libibverbs/examples for an example of such implementation.
 
 --
 MST
 




Re: [openib-general] [PATCH TRIVIAL] opensm: eliminate some local variable

2007-01-08 Thread Hal Rosenstock
On Sun, 2007-01-07 at 15:38, Sasha Khapyorsky wrote:
> This trivially eliminates some local variable.
>
> Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

Thanks. Applied.

-- Hal





Re: [openib-general] [PATCH] ofed_1_2 configure script typo

2007-01-08 Thread Vladimir Sokolovsky
Applied,

Regards,
Vladimir

On Mon, 2007-01-08 at 08:59 -0600, Steve Wise wrote:
 Typo in OFED 1.2 configure script.
 
 From: Steve Wise [EMAIL PROTECTED]
 
 Signed-off-by: Steve Wise [EMAIL PROTECTED]
 ---
 
  ofed_scripts/configure |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/ofed_scripts/configure b/ofed_scripts/configure
 index 5a1694d..a0557e2 100755
 --- a/ofed_scripts/configure
 +++ b/ofed_scripts/configure
 @@ -598,7 +598,7 @@ main()
  --with-vnic_debug-mod)
  CONFIG_INFINIBAND_VNIC_DEBUG=y
  ;;
 ---without-vnic-mod)
 +--without-vnic_debug-mod)
  CONFIG_INFINIBAND_VNIC_DEBUG=
 ;;
  --with-vnic_stats-mod)
 
 
 



Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Michael S. Tsirkin
> What I need is that, without IPoIB, how do I wire IB connection ?
> Currently with Verbs API, it is an alltoall QP number exchange. I want
> to remove the alltoall
> QP number exchange in MPI dynamic process.

Well, does your MPI implementation currently use librdmacm?
If not, you don't currently have a dependency on IPoIB and
probably have no reason to introduce one.

-- 
MST




Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Tang, Changqing
> > What I need is that, without IPoIB, how do I wire IB connection ?
> > Currently with Verbs API, it is an alltoall QP number exchange. I want
> > to remove the alltoall QP number exchange in MPI dynamic process.
>
> Well, does your MPI implementation currently use librdmacm?

No, we use neither librdmacm nor libibcm.

> If not, you don't currently have a dependency on IPoIB and
> probably have no reason to introduce one.

As I said, the problem is the all-to-all QP number exchange. I would like
each process to publish just one piece of information (such as an IP/port
pair in TCP/IP), so that every other process holding that same information
can connect to it.


--CQ


 
 --
 MST
 




Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Michael S. Tsirkin
> > If not, you don't currently have a dependency on IPoIB and
> > probably have no reason to introduce one.
>
> As I said, the problem is the alltoall QP number exchange. I hope that a
> process can only provide one piece of information(such as ip/port in
> TCP/IP) so that all other processes have the same piece of info and can
> make connection to it.

Well, start with a socket: each time a process connects,
create a QP on both sides and exchange the two QP numbers?

-- 
MST
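
The socket-based exchange suggested above could look roughly like the sketch below. It only covers the out-of-band part: each side sends its QP number over an already-connected stream socket and reads the peer's. Creating the QPs and moving them through their states with the verbs API is left out, the helper names are illustrative, and a socketpair stands in for the TCP connection so that the sketch is self-contained. A real exchange would normally also carry the LID and starting PSN.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one QP number in network byte order. Returns 0 on success. */
static int send_qpn(int sock, uint32_t qpn)
{
    uint32_t wire = htonl(qpn);

    return write(sock, &wire, sizeof(wire)) == (ssize_t)sizeof(wire) ? 0 : -1;
}

/* Receive one QP number, looping over short reads. Returns 0 on success. */
static int recv_qpn(int sock, uint32_t *qpn)
{
    uint32_t wire = 0;
    size_t got = 0;

    assert(qpn != NULL);
    while (got < sizeof(wire)) {
        ssize_t n = read(sock, (char *)&wire + got, sizeof(wire) - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    *qpn = ntohl(wire);
    return 0;
}

/*
 * Demo with both ends in one process. Both sides send before either
 * reads, so the 4-byte messages sit in the socket buffers and nothing
 * blocks. Returns 0 on success.
 */
static int demo_exchange(uint32_t qpn_a, uint32_t qpn_b,
                         uint32_t *a_sees, uint32_t *b_sees)
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send_qpn(sv[0], qpn_a) == 0 && send_qpn(sv[1], qpn_b) == 0 &&
        recv_qpn(sv[0], a_sees) == 0 && recv_qpn(sv[1], b_sees) == 0)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With this scheme each process only needs to publish one listening address, and QP numbers are exchanged pairwise at connect time rather than all-to-all up front.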




[openib-general] OFED 1.2, iw_cxgb3, and genalloc()

2007-01-08 Thread Steve Wise
I've packaged the Chelsio T3 Drivers (modules iw_cxgb3 and cxgb3) into
Vlad's ofed_1_2 repos and I'm testing now.  I've run into an issue with
the Chelsio driver.  It requires the kernel genalloc() allocator, which
is only built into the kernel if any code requires it at config time of
the kernel.  Also, it was new to 2.6.17 or 2.6.18 so it won't exist for
older OFED distros like SLES.  

So there are two related issues:  

1) the genalloc services don't exist in older kernels. 
2) Even if it does exist in the kernel src tree on a distro, it might
not have been built in if nothing required that service when the kernel
was configured.  

I need to handle both cases.

I'm seeking advice on how to pull this functionality in for OFED 1.2.
My initial thought is to add a patch similar to the memtrack patch and
add it either as a module or as part of the iw_cxgb3 module.  I could
even rename the services and always add them so I can avoid having to
detect whether it's in the running kernel.

Any ideas/comments?

Thanks,

Steve.






Re: [openib-general] using IB on a port without IPoIB running NIC

2007-01-08 Thread Tang, Changqing
 
> > As I said, the problem is the alltoall QP number exchange. I hope that
> > a process can only provide one piece of information(such as ip/port in
> > TCP/IP) so that all other processes have the same piece of info and
> > can make connection to it.
>
> Well, start with a socket, each time a process connects
> create a QP on both sides and exchange the 2 QP numbers?

Then the speed would be a big concern.

--CQ


 
 --
 MST
 




Re: [openib-general] [PATCH] opensm: eliminate port/switch_info access methods

2007-01-08 Thread Hal Rosenstock
On Sun, 2007-01-07 at 18:01, Sasha Khapyorsky wrote:
> Following previous patch (remove osm_physp_get_port_info_ptr() checks)
> this removes confused functions osm_physp_get_port_info_ptr() and
> osm_switch_get_si_ptr().
>
> Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

Thanks. Applied.

-- Hal





[openib-general] [PATCH] [MINOR] perftest: send_bw: fix dangling else

2007-01-08 Thread Yosef Etigin
Symptom: ib_send_bw reports 'inf' bandwidth
Cause: dangling else

Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
---
diff -rup a/src/userspace/perftest/send_bw.c b/src/userspace/perftest/send_bw.c
--- a/src/userspace/perftest/send_bw.c  2007-01-08 18:20:08.0 +0200
+++ b/src/userspace/perftest/send_bw.c  2007-01-08 18:21:06.0 +0200
@@ -1156,12 +1156,14 @@ int main(int argc, char *argv[])
rem_dest = pp_server_exch_dest(sockfd, 
my_dest);
}
} else {
-   if (user_param.duplex)
+   if (user_param.duplex) {
if (run_iter_bi(ctx, user_param, rem_dest, size))
return 18;
-   else
+   }
+   else {
if(run_iter_uni(ctx, user_param, rem_dest, size))
return 18;
+   }
 
if (user_param.servername)
print_report(user_param.iters, size, user_param.duplex, 
tposted, tcompleted);
--
Yosef Etigin
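
A minimal sketch of the dangling-else problem this patch addresses. The `run_iter_bi`/`run_iter_uni` functions here are stand-ins that only count calls, and `struct counters` is hypothetical; `run_as_parsed` spells out with braces how a C compiler pairs the `else` in the unbraced original, while `run_fixed` follows the braces added by the patch.

```c
#include <assert.h>

/* Hypothetical call counters standing in for the real test iterations. */
struct counters {
	int bi;
	int uni;
};

static int run_iter_bi(struct counters *c)
{
	assert(c != 0);
	c->bi++;
	return 0;	/* 0 means success, as in the patch context */
}

static int run_iter_uni(struct counters *c)
{
	assert(c != 0);
	c->uni++;
	return 0;
}

/*
 * The unbraced original, written the way the compiler parses it:
 * the else pairs with the nearest if, not with the duplex test.
 */
int run_as_parsed(int duplex, struct counters *c)
{
	if (duplex) {
		if (run_iter_bi(c))
			return 18;
		else if (run_iter_uni(c))
			return 18;
	}
	return 0;
}

/* The intended logic, as braced by the patch. */
int run_fixed(int duplex, struct counters *c)
{
	if (duplex) {
		if (run_iter_bi(c))
			return 18;
	} else {
		if (run_iter_uni(c))
			return 18;
	}
	return 0;
}
```

With `duplex == 0` the parsed-as-written version runs no iterations at all, which would be consistent with the 'inf' bandwidth in the report; with `duplex != 0` it runs both tests back to back.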




[openib-general] Bad URL on OFED Development Wiki Site

2007-01-08 Thread Steve Wise

This URL is bad on the OFED Development Wiki page:

https://wiki.openfabrics.org/tiki/tiki-download_file.php?fileId=23

It is supposed to be an OFED Release Process presentation.

Thanks,

Steve.







Re: [openib-general] 2.6.20: outstanding patches and issues

2007-01-08 Thread Sean Hefty
>   sean_cm_limit_mra_timeout.patch

I don't believe that I ever sent Roland a patch for merging upstream.  The last
patch I remember sending was untested and waiting for some feedback.  I can
resubmit this patch if it is working for you.  (Was this in OFED 1.1?)

> There are 3 Sean's patches I think we need
>   rdma_ucm: fix reporting events with invalid user context
>   rdma_ucm: fix struct ucma_event
>   rdma_cm: avoid port reuse after close

The first two were pulled upstream.  I have not published the port reuse patch
in any git branch yet, but can add it to my multicast-sa_cache branch if needed.

>   Dotan reported oops with ucma at openib restart.
>   Sean - any luck in reproducing this?

I have not, but maybe there's a difference in our configuration.

- Sean




Re: [openib-general] 2.6.20: outstanding patches and issues

2007-01-08 Thread Michael S. Tsirkin
> >   sean_cm_limit_mra_timeout.patch
>
> I don't believe that I ever sent Roland a patch for merging upstream.  The last
> patch I remember sending was untested and waiting for some feedback.  I can
> resubmit this patch if it is working for you.  (Was this in OFED 1.1?)

Yes, it was in OFED, and it solves a real problem with a
misbehaving remote. I did say this works for us, did I not?
Let's have this in 2.6.20 - is there a need to resend?

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

> > There are 3 Sean's patches I think we need
> >   rdma_ucm: fix reporting events with invalid user context
> >   rdma_ucm: fix struct ucma_event
> >   rdma_cm: avoid port reuse after close
>
> The first two were pulled upstream.  I have not published the port reuse patch
> in any git branch yet, but can add it to my multicast-sa_cache branch if needed.

OK.
The patch is small enough though - I hope it just lands upstream and we don't
have to maintain it in a side branch.

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

> >   Dotan reported oops with ucma at openib restart.
> >   Sean - any luck in reproducing this?
>
> I have not, but maybe there's a difference in our configuration.

Hmm. One of these then. So where do we go from here?

-- 
MST




[openib-general] Fwd: [ANNOUNCE] GIT 1.4.4.4

2007-01-08 Thread Michael S. Tsirkin
FYI.
The infinite loop fix looks potentially relevant, so I guess
we should update staging. Sasha?

- Forwarded message from Junio C Hamano [EMAIL PROTECTED] -

Subject: [ANNOUNCE] GIT 1.4.4.4
Date: Mon, 8 Jan 2007 05:30:50 +0200
From: Junio C Hamano [EMAIL PROTECTED]

The latest maintenance release GIT 1.4.4.4 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.4.4.4.tar.{gz,bz2}  (tarball)
  git-htmldocs-1.4.4.4.tar.{gz,bz2} (preformatted docs)
  git-manpages-1.4.4.4.tar.{gz,bz2} (preformatted docs)
  RPMS/$arch/git-*-1.4.4.4-1.$arch.rpm  (RPM)

This is to push out a handful of bugfixes since 1.4.4.3.

On the 'master' development front, the stabilization for v1.5.0
will start soonish.



Changes since v1.4.4.3 are as follows:

Johannes Schindelin (1):
  diff --check: fix off by one error

Junio C Hamano (3):
  spurious .sp in manpages
  Fix infinite loop when deleting multiple packed refs.
  pack-check.c::verify_packfile(): don't run SHA-1 update on huge data




- End forwarded message -

-- 
MST




[openib-general] Infiniband Network Library

2007-01-08 Thread Sean Hubbell
Hello,

  I have a question that is slightly off topic, but I would think that
this would be the place to ask it. So, here goes ... I have been using
InfiniBand here for about 2 years now. I have had to make significant
workarounds for our current third-party network API that we purchased,
and I continue to watch it fall down and still not take advantage of the
bandwidth that I need. With that said, does anyone on this list have a
recommendation for an InfiniBand capable network library?

Thanks in advance,

Sean




[openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts

2007-01-08 Thread Sean Hefty
Limit the timeout that the ib_cm will wait to receive a response to
a message, to avoid excessively large (on the order of hours) timeout
values.  This prevents consuming resources tracking requests for
extended periods of time.

This helps correct for a bug in the SRP Engenio target sending a large
value (> 1 hour) as a service timeout.

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
 drivers/infiniband/core/cm.c |   30 +++---
 1 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..147b41e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -54,6 +54,12 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
+/*
+ * Limit CM msg timeouts to something reasonable.
+ * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
+ */
+#define IB_CM_MAX_TIMEOUT 21
+
 static void cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device);
 
@@ -888,12 +894,12 @@ static void cm_format_req(struct cm_req_msg *req_msg,
    cm_req_set_resp_res(req_msg, param->responder_resources);
    cm_req_set_init_depth(req_msg, param->initiator_depth);
    cm_req_set_remote_resp_timeout(req_msg,
-  param->remote_cm_response_timeout);
+   min((u8) IB_CM_MAX_TIMEOUT, param->remote_cm_response_timeout));
    cm_req_set_qp_type(req_msg, param->qp_type);
    cm_req_set_flow_ctrl(req_msg, param->flow_control);
    cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
    cm_req_set_local_resp_timeout(req_msg,
- param->local_cm_response_timeout);
+   min((u8) IB_CM_MAX_TIMEOUT, param->local_cm_response_timeout));
    cm_req_set_retry_count(req_msg, param->retry_count);
    req_msg->pkey = param->primary_path->pkey;
    cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
@@ -999,10 +1005,10 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
    }
    cm_id->service_id = param->service_id;
    cm_id->service_mask = __constant_cpu_to_be64(~0ULL);
-   cm_id_priv->timeout_ms = cm_convert_to_ms(
-   param->primary_path->packet_life_time) * 2 +
-cm_convert_to_ms(
-   param->remote_cm_response_timeout);
+   cm_id_priv->timeout_ms =
+   min(IB_CM_MAX_TIMEOUT,
+   cm_convert_to_ms(param->primary_path->packet_life_time) * 2 +
+   cm_convert_to_ms(param->remote_cm_response_timeout));
    cm_id_priv->max_cm_retries = param->max_cm_retries;
    cm_id_priv->initiator_depth = param->initiator_depth;
    cm_id_priv->responder_resources = param->responder_resources;
@@ -1400,8 +1406,9 @@ static int cm_req_handler(struct cm_work *work)
        }
    }
    cm_id_priv->tid = req_msg->hdr.tid;
-   cm_id_priv->timeout_ms = cm_convert_to_ms(
-   cm_req_get_local_resp_timeout(req_msg));
+   cm_id_priv->timeout_ms =
+   min(IB_CM_MAX_TIMEOUT,
+   cm_convert_to_ms(cm_req_get_local_resp_timeout(req_msg)));
    cm_id_priv->max_cm_retries = cm_req_get_max_cm_retries(req_msg);
    cm_id_priv->remote_qpn = cm_req_get_local_qpn(req_msg);
    cm_id_priv->initiator_depth = cm_req_get_resp_res(req_msg);
@@ -2303,8 +2310,9 @@ static int cm_mra_handler(struct cm_work *work)
    work->cm_event.private_data = &mra_msg->private_data;
    work->cm_event.param.mra_rcvd.service_timeout =
            cm_mra_get_service_timeout(mra_msg);
-   timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
- cm_convert_to_ms(cm_id_priv->av.packet_life_time);
+   timeout = min(IB_CM_MAX_TIMEOUT,
+ cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
+ cm_convert_to_ms(cm_id_priv->av.packet_life_time));
 
    spin_lock_irqsave(&cm_id_priv->lock, flags);
    switch (cm_id_priv->id.state) {
@@ -2707,7 +2715,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
 
    cm_id->service_id = param->service_id;
    cm_id->service_mask = __constant_cpu_to_be64(~0ULL);
-   cm_id_priv->timeout_ms = param->timeout_ms;
+   cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms);
    cm_id_priv->max_cm_retries = param->max_cm_retries;
    ret = cm_alloc_msg(cm_id_priv, &msg);
    if (ret)
-- 
1.4.4.3
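
A minimal sketch of the arithmetic behind the "8 seconds, with up to 15 retries" comment in the patch, assuming the usual IB encoding in which a timeout field `t` means 4.096 us * 2^t. The helper names are hypothetical; `ib_timeout_to_ms` approximates 4.096 us * 2^8 as 1 ms, and the sketch clamps the encoded exponent rather than a millisecond value.

```c
#include <assert.h>
#include <stdint.h>

#define IB_CM_MAX_TIMEOUT 21	/* cap proposed in the patch (an exponent) */

/* Approximate milliseconds for an encoded timeout of 4.096 us * 2^t. */
static unsigned long ib_timeout_to_ms(unsigned int t)
{
	assert(t <= 31);	/* the encoded field is 5 bits wide */
	return 1UL << (t > 8 ? t - 8 : 0);
}

/* Clamp the encoded exponent to the proposed maximum. */
static uint8_t clamp_ib_timeout(uint8_t t)
{
	return t > IB_CM_MAX_TIMEOUT ? IB_CM_MAX_TIMEOUT : t;
}

/* Worst case for one message: the first send plus max_retries resends. */
static unsigned long worst_case_ms(uint8_t t, unsigned int max_retries)
{
	return ib_timeout_to_ms(clamp_ib_timeout(t)) * (max_retries + 1);
}
```

An exponent of 21 comes to roughly 8.2 seconds, so 15 retries give about 131 seconds, in line with the 2 minutes mentioned in the patch comment. An unclamped exponent of 31 would be about 2.3 hours per attempt. The exponent and the millisecond value are different units, so a cap expressed as an exponent is only meaningful when compared against another exponent.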






Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Sean Hefty
> I just noticed that once i apply the patch, the last + lines (that is
> pthread_mutex_lock, while loop doing pthread_cond_wait and then
> pthread_mutex_unlock) become part of rdma_leave_multicast which seems to
> me strictly buggy as no one is going to wake up this code.

The leave must wait until all events have been reported on the multicast group.
There can be more than one event on a group if an error occurs.  See
ucma_complete_mc_event() for where the condition is signaled.

- Sean
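
A minimal sketch of the wait/signal pattern described above, assuming a per-group mutex and condition variable. The `mc_group` structure and the `mc_*` function names are hypothetical and only illustrate the idea; they are not the librdmacm implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-group bookkeeping; field names are illustrative. */
struct mc_group {
	pthread_mutex_t mut;
	pthread_cond_t cond;
	unsigned int events_completed;	/* events the application is done with */
};

void mc_init(struct mc_group *mc)
{
	assert(mc != NULL);
	pthread_mutex_init(&mc->mut, NULL);
	pthread_cond_init(&mc->cond, NULL);
	mc->events_completed = 0;
}

/* Called once per event when the application has finished with it. */
void mc_complete_event(struct mc_group *mc)
{
	pthread_mutex_lock(&mc->mut);
	mc->events_completed++;
	pthread_cond_signal(&mc->cond);	/* wake a waiting leave, if any */
	pthread_mutex_unlock(&mc->mut);
}

/*
 * Called from the leave path: block until every event that was
 * reported for the group has also been completed.
 */
void mc_wait_for_events(struct mc_group *mc, unsigned int events_reported)
{
	pthread_mutex_lock(&mc->mut);
	while (mc->events_completed < events_reported)
		pthread_cond_wait(&mc->cond, &mc->mut);
	pthread_mutex_unlock(&mc->mut);
}
```

The loop only terminates if `events_reported` is a trustworthy count, so the value handed to the wait has to be filled in by whoever reports the events.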




[openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support

2007-01-08 Thread Steve Wise

This series adds the Chelsio T3 drivers to the ofed_1_2 tree. For
this review, I've omitted the patch that actually adds the two drivers
themselves, and just included the changes to the ofed_1_2 configuration
scripts and the new kernel_patches/ files needed.  The driver code itself
is on track to go into either 2.6.20 or 2.6.21.

I would appreciate any feedback/comments on what I've done.  This is
just for review. I'm still testing it.


Here are the key changes:

The package now needs to visit drivers/net to build the T3 Ethernet
driver which is required for the T3 RDMA driver.

Added a patch to backport the Linux 2.6.20 genalloc() services.  I added
the allocator as local services to the T3 RDMA module.

Core changes are required for the T3 driver.  This includes the addition
of a udata pointer parameter to the ib_req_notify_cq() provider method.
This is still being discussed on the openib-general list and I'll update
it accordingly once we finalize the solution.


Steve.




[openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.

2007-01-08 Thread Steve Wise

- rdma core changes needed for T3 Support.
- genalloc backport.
- modified the qp_num -> qp ptr patch to include cxgb3.

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 kernel_patches/fixes/genalloc.patch|  392 
 kernel_patches/fixes/ib_wc_qpn_to_qp.patch |   13 +
 kernel_patches/fixes/t3_core_changes.patch |  202 ++
 3 files changed, 607 insertions(+), 0 deletions(-)

diff --git a/kernel_patches/fixes/genalloc.patch b/kernel_patches/fixes/genalloc.patch
new file mode 100644
index 000..c44a98f
--- /dev/null
+++ b/kernel_patches/fixes/genalloc.patch
@@ -0,0 +1,392 @@
+Backport of the Linux 2.6.20 generic allocator.
+
+From: Steve Wise [EMAIL PROTECTED]
+
+Signed-off-by: Steve Wise [EMAIL PROTECTED]
+---
+
+ drivers/infiniband/hw/cxgb3/Kconfig  |1 
+ drivers/infiniband/hw/cxgb3/Makefile |3 
+ drivers/infiniband/hw/cxgb3/core/cxio_hal.h  |4 
+ drivers/infiniband/hw/cxgb3/core/cxio_resource.c |   20 +-
+ drivers/infiniband/hw/cxgb3/core/cxio_resource.h |2 
+ drivers/infiniband/hw/cxgb3/core/genalloc.c  |  196 ++
+ drivers/infiniband/hw/cxgb3/core/genalloc.h  |   36 
+ 7 files changed, 247 insertions(+), 15 deletions(-)
+
+diff --git a/drivers/infiniband/hw/cxgb3/Kconfig b/drivers/infiniband/hw/cxgb3/Kconfig
+index d3db264..0361a72 100644
+--- a/drivers/infiniband/hw/cxgb3/Kconfig
++++ b/drivers/infiniband/hw/cxgb3/Kconfig
+@@ -1,7 +1,6 @@
+ config INFINIBAND_CXGB3
+   tristate "Chelsio RDMA Driver"
+   depends on CHELSIO_T3 && INFINIBAND
+-  select GENERIC_ALLOCATOR
+   ---help---
+ This is an iWARP/RDMA driver for the Chelsio T3 1GbE and
+ 10GbE adapters.
+diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile
+index 7a89f6d..12e7a94 100644
+--- a/drivers/infiniband/hw/cxgb3/Makefile
++++ b/drivers/infiniband/hw/cxgb3/Makefile
+@@ -4,7 +4,8 @@ EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/
+ obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
+ 
+ iw_cxgb3-y :=  iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
+- iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o
++ iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o \
++  core/genalloc.o
+ 
+ ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
+ EXTRA_CFLAGS += -DDEBUG -g 
+diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+index e5e702d..a9e8452 100644
+--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
++++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
+@@ -104,8 +104,8 @@ struct cxio_rdev {
+   u32 qpnr;
+   u32 qpmask;
+   struct cxio_ucontext uctx;
+-  struct gen_pool *pbl_pool;
+-  struct gen_pool *rqt_pool;
++  struct iwch_gen_pool *pbl_pool;
++  struct iwch_gen_pool *rqt_pool;
+ };
+ 
+ static inline int cxio_num_stags(struct cxio_rdev *rdev_p)
+diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
+index d1d8722..cecb27b 100644
+--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
++++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
+@@ -265,7 +265,7 @@ #define PBL_CHUNK 2*1024*1024
+ 
+ u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size)
+ {
+-  unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size);
++  unsigned long addr = iwch_gen_pool_alloc(rdev_p->pbl_pool, size);
+   PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size);
+   return (u32)addr;
+ }
+@@ -273,24 +273,24 @@ u32 cxio_hal_pblpool_alloc(struct cxio_r
+ void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
+ {
+   PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size);
+-  gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size);
++  iwch_gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size);
+ }
+ 
+ int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p)
+ {
+   unsigned long i;
+-  rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1);
++  rdev_p->pbl_pool = iwch_gen_pool_create(MIN_PBL_SHIFT, -1);
+   if (rdev_p->pbl_pool)
+   for (i = rdev_p->rnic_info.pbl_base;
+        i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1;
+        i += PBL_CHUNK)
+-  gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1);
++  iwch_gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1);
+   return rdev_p->pbl_pool ? 0 : -ENOMEM;
+ }
+ 
+ void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p)
+ {
+-  gen_pool_destroy(rdev_p->pbl_pool);
++  iwch_gen_pool_destroy(rdev_p->pbl_pool);
+ }
+ 
+ /*
+@@ -302,7 +302,7 @@ #define RQT_CHUNK 2*1024*1024  
+ 
+ u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size)
+ {
+-  unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6);
++  unsigned long addr = iwch_gen_pool_alloc(rdev_p->rqt_pool, size << 6);
+  

[openib-general] [PATCH 2/2] ofed_1_2 Changes to ofed scripts for Chelsio T3 Support.

2007-01-08 Thread Steve Wise

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 ofed_scripts/Makefile  |9 +++--
 ofed_scripts/configure |   47 +++
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/ofed_scripts/Makefile b/ofed_scripts/Makefile
index d63b1d2..049e533 100644
--- a/ofed_scripts/Makefile
+++ b/ofed_scripts/Makefile
@@ -46,8 +46,10 @@ kernel:
@echo Kernel sources: $(KSRC)
env EXTRA_CFLAGS=$(OPENIB_KERNEL_EXTRA_CFLAGS) 
$(KERNEL_MEMTRACK_CFLAGS) -I$(CWD)/include -I$(CWD)/drivers/infiniband/include \
-I$(CWD)/drivers/infiniband/ulp/ipoib \
-   -I$(CWD)/drivers/infiniband/debug \
-   $(MAKE) -C $(KSRC) SUBDIRS=$(CWD)/drivers/infiniband 
KERNELRELEASE=$(KVERSION) \
+   -I$(CWD)/drivers/infiniband/debug \
+   -I$(CWD)/drivers/infiniband/hw/cxgb3/core \
+   -I$(CWD)/drivers/net/cxgb3  \
+   $(MAKE) -C $(KSRC) SUBDIRS=$(CWD)/drivers/infiniband 
$(CWD)/drivers/net KERNELRELEASE=$(KVERSION) \
EXTRAVERSION=$(EXTRAVERSION) V=1 $(WITH_MAKE_PARAMS) \
CONFIG_INFINIBAND=$(CONFIG_INFINIBAND) \
CONFIG_INFINIBAND_IPOIB=$(CONFIG_INFINIBAND_IPOIB) \
@@ -74,6 +76,9 @@ kernel:
CONFIG_INFINIBAND_VNIC=$(CONFIG_INFINIBAND_VNIC) \
CONFIG_INFINIBAND_VNIC_DEBUG=$(CONFIG_INFINIBAND_VNIC_DEBUG) \
CONFIG_INFINIBAND_VNIC_STATS=$(CONFIG_INFINIBAND_VNIC_STATS) \
+   CONFIG_INFINIBAND_CXGB3=$(CONFIG_INFINIBAND_CXGB3) \
+   CONFIG_INFINIBAND_CXGB3_DEBUG=$(CONFIG_INFINIBAND_CXGB3_DEBUG) \
+   CONFIG_CHELSIO_T3=$(CONFIG_CHELSIO_T3) \
LINUXINCLUDE=' \
$(BACKPORT_INCLUDES) \
-I$(CWD)/include \
diff --git a/ofed_scripts/configure b/ofed_scripts/configure
index a0557e2..08f15f5 100755
--- a/ofed_scripts/configure
+++ b/ofed_scripts/configure
@@ -126,6 +126,12 @@ Usage:  `basename $0` [options]
 --with-vnic_stats-modmake CONFIG_INFINIBAND_VNIC_STATS=y [no]
 --without-vnic_stats-mod[yes]
 
+--with-cxgb3-modmake CONFIG_INFINIBAND_CXGB3=m [no]
+--without-cxgb3-mod[yes]
+
+--with-cxgb3_debug-modmake CONFIG_INFINIBAND_CXGB3_DEBUG=y [no]
+--without-cxgb3_debug-mod[yes]
+
 --help - print out options
 
 
@@ -607,6 +613,20 @@ main()
 --without-vnic_stats-mod)
 CONFIG_INFINIBAND_VNIC_STATS=
 ;;
+--with-cxgb3-mod)
+CONFIG_INFINIBAND_CXGB3=m
+   CONFIG_CHELSIO_T3=m
+;;
+--without-cxgb3-mod)
+CONFIG_INFINIBAND_CXGB3=
+   CONFIG_CHELSIO_T3=
+;;
+--with-cxgb3_debug-mod)
+CONFIG_INFINIBAND_CXGB3_DEBUG=y
+;;
+--without-cxgb3_debug-mod)
+CONFIG_INFINIBAND_CXGB3_DEBUG=
+;;
 --with-modprobe|--without-modprobe)
 ;;
 -h | --help)
@@ -679,6 +699,8 @@ CONFIG_INFINIBAND_RDS=${CONFIG_INFINIBAN
 CONFIG_INFINIBAND_RDS_DEBUG=${CONFIG_INFINIBAND_RDS_DEBUG:-''}
 CONFIG_INFINIBAND_MADEYE=${CONFIG_INFINIBAND_MADEYE:-''}
 CONFIG_INFINIBAND_VNIC=${CONFIG_INFINIBAND_VNIC:-''}
+CONFIG_INFINIBAND_CXGB3=${CONFIG_INFINIBAND_CXGB3:-''}
+CONFIG_CHELSIO_T3=${CONFIG_CHELSIO_T3:-''}
 
 CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=${CONFIG_INFINIBAND_IPOIB_DEBUG_DATA:-''}
 CONFIG_INFINIBAND_SDP_SEND_ZCOPY=${CONFIG_INFINIBAND_SDP_SEND_ZCOPY:-''}
@@ -689,6 +711,7 @@ CONFIG_INFINIBAND_IPATH=${CONFIG_INFINIB
 CONFIG_INFINIBAND_MTHCA_DEBUG=${CONFIG_INFINIBAND_MTHCA_DEBUG:-''}
 CONFIG_INFINIBAND_VNIC_DEBUG=${CONFIG_INFINIBAND_VNIC_DEBUG:-''}
 CONFIG_INFINIBAND_VNIC_STATS=${CONFIG_INFINIBAND_VNIC_STATS:-''}
+CONFIG_INFINIBAND_CXGB3_DEBUG=${CONFIG_INFINIBAND_CXGB3_DEBUG:-''}
 
 # Check for minimal supported kernel version
 if ! check_kerver ${KVERSION} ${MIN_KVERSION}; then
@@ -742,6 +765,8 @@ CONFIG_INFINIBAND_RDS=${CONFIG_INFINIBAN
 CONFIG_INFINIBAND_RDS_DEBUG=${CONFIG_INFINIBAND_RDS_DEBUG}
 CONFIG_INFINIBAND_MADEYE=${CONFIG_INFINIBAND_MADEYE}
 CONFIG_INFINIBAND_VNIC=${CONFIG_INFINIBAND_VNIC}
+CONFIG_INFINIBAND_CXGB3=${CONFIG_INFINIBAND_CXGB3}
+CONFIG_CHELSIO_T3=${CONFIG_CHELSIO_T3}
 
 CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=${CONFIG_INFINIBAND_IPOIB_DEBUG_DATA}
 CONFIG_INFINIBAND_SDP_SEND_ZCOPY=${CONFIG_INFINIBAND_SDP_SEND_ZCOPY}
@@ -752,6 +777,7 @@ CONFIG_INFINIBAND_IPATH=${CONFIG_INFINIB
 CONFIG_INFINIBAND_MTHCA_DEBUG=${CONFIG_INFINIBAND_MTHCA_DEBUG}
 CONFIG_INFINIBAND_VNIC_DEBUG=${CONFIG_INFINIBAND_VNIC_DEBUG}
 CONFIG_INFINIBAND_VNIC_STATS=${CONFIG_INFINIBAND_VNIC_STATS}
+CONFIG_INFINIBAND_CXGB3_DEBUG=${CONFIG_INFINIBAND_CXGB3_DEBUG}
 
 

[openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree

2007-01-08 Thread Michael S. Tsirkin
I looked at what would be the clean fix for the MTT SEG handling in mthca,
and I came up with the following (applies on top of the series I posted
earlier). I think this gives us an important optimization.
Roland, could you please give me a hint whether something
like this is too big a change to get into 2.6.20?


Arbel does not actually have a concept of MTT segment.
So we should set MTT segment size to 64 bit (1 entry) for memfree,
otherwise we might be wasting as much as 87% of MTT entries.

Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]

---

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 7131446..968d151 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1051,11 +1051,7 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev *dev,
    MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_EQ_OFFSET);
    dev_lim->max_eqs = 1 << (field & 0x7);
    MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MTT_OFFSET);
-   if (mthca_is_memfree(dev))
-   dev_lim->reserved_mtts = ALIGN((1 << (field >> 4)) * sizeof(u64),
-  MTHCA_MTT_SEG_SIZE) / MTHCA_MTT_SEG_SIZE;
-   else
-   dev_lim->reserved_mtts = 1 << (field >> 4);
+   dev_lim->reserved_mtts = 1 << (field >> 4);
    MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_MRW_SZ_OFFSET);
    dev_lim->max_mrw_sz = 1 << field;
    MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MRW_OFFSET);
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index b7e42ef..0973359 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -78,16 +78,17 @@ enum {
 };
 
 enum {
-   MTHCA_EQ_CONTEXT_SIZE =  0x40,
-   MTHCA_CQ_CONTEXT_SIZE =  0x40,
-   MTHCA_QP_CONTEXT_SIZE = 0x200,
-   MTHCA_RDB_ENTRY_SIZE  =  0x20,
-   MTHCA_AV_SIZE =  0x20,
-   MTHCA_MGM_ENTRY_SIZE  =  0x40,
+   MTHCA_EQ_CONTEXT_SIZE=  0x40,
+   MTHCA_CQ_CONTEXT_SIZE=  0x40,
+   MTHCA_QP_CONTEXT_SIZE= 0x200,
+   MTHCA_RDB_ENTRY_SIZE =  0x20,
+   MTHCA_AV_SIZE=  0x20,
+   MTHCA_MGM_ENTRY_SIZE =  0x40,
+
+   MTHCA_TAVOR_MTT_SEG_SIZE =  0x40,
 
/* Arbel FW gives us these, but we need them for Tavor */
MTHCA_MPT_ENTRY_SIZE  =  0x40,
-   MTHCA_MTT_SEG_SIZE=  0x40,
 
MTHCA_QP_PER_MGM  = 4 * (MTHCA_MGM_ENTRY_SIZE / 16 - 2)
 };
@@ -595,4 +596,8 @@ static inline int mthca_is_memfree(struct mthca_dev *dev)
    return dev->mthca_flags & MTHCA_FLAG_MEMFREE;
 }
 
+static inline unsigned mthca_mtt_seg_size(struct mthca_dev *dev)
+{
+   return mthca_is_memfree(dev) ? sizeof(u64) : MTHCA_TAVOR_MTT_SEG_SIZE;
+}
 #endif /* MTHCA_DEV_H */
diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c
index bbe9143..d9d5b89 100644
--- a/drivers/infiniband/hw/mthca/mthca_main.c
+++ b/drivers/infiniband/hw/mthca/mthca_main.c
@@ -465,11 +465,11 @@ static int mthca_init_icm(struct mthca_dev *mdev,
    }
 
    /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */
-   mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * MTHCA_MTT_SEG_SIZE,
-  dma_get_cache_alignment()) / MTHCA_MTT_SEG_SIZE;
+   mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * sizeof(u64),
+  dma_get_cache_alignment()) / sizeof(u64);
 
    mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base,
-MTHCA_MTT_SEG_SIZE,
+sizeof(u64),
     mdev->limits.num_mtt_segs,
     mdev->limits.reserved_mtts,
     1, 0);
diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c
index 88f9dc2..0357dbe 100644
--- a/drivers/infiniband/hw/mthca/mthca_mr.c
+++ b/drivers/infiniband/hw/mthca/mthca_mr.c
@@ -212,7 +212,7 @@ static struct mthca_mtt *__mthca_alloc_mtt(struct mthca_dev *dev, int size,
 
    mtt->buddy = buddy;
    mtt->order = 0;
-   for (i = MTHCA_MTT_SEG_SIZE / 8; i < size; i <<= 1)
+   for (i = mthca_mtt_seg_size(dev) / sizeof(u64); i < size; i <<= 1)
        ++mtt->order;
 
    mtt->first_seg = mthca_alloc_mtt_range(dev, mtt->order, buddy);
@@ -259,7 +259,7 @@ static int __mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt,
 
    while (list_len > 0) {
    mtt_entry[0] = cpu_to_be64(dev->mr_table.mtt_base +
-  mtt->first_seg * MTHCA_MTT_SEG_SIZE +
+  mtt->first_seg * mthca_mtt_seg_size(dev) +
   

Re: [openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.

2007-01-08 Thread Michael S. Tsirkin
> - modified the qp_num -> qp ptr patch to include cxgb3.

If you don't mind, this might be better as a separate patch - it's just easier
for me to continue pushing this upstream if I can just copy it from OFED
sources.

-- 
MST




Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts

2007-01-08 Thread Michael S. Tsirkin
> Limit the timeout that the ib_cm will wait to receive a response to
> a message, to avoid excessively large (on the order of hours) timeout
> values.  This prevents consuming resources tracking requests for
> extended periods of time.
>
> This helps correct for a bug in the SRP Engenio target sending a large
> value (> 1 hour) as a service timeout.
>
> Signed-off-by: Sean Hefty [EMAIL PROTECTED]

Very similar code is in OFED 1.1 (we chickened out and had a module parameter
to disable this just in case, but I don't think it's really needed upstream).

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

-- 
MST




Re: [openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.

2007-01-08 Thread Steve Wise
On Mon, 2007-01-08 at 21:29 +0200, Michael S. Tsirkin wrote:
> > - modified the qp_num -> qp ptr patch to include cxgb3.
>
> If you don't mind, this might be better as a separate patch - it's just easier
> for me to continue pushing this upstream if I can just copy it from OFED
> sources.
>

Ok...that makes sense.







Re: [openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support

2007-01-08 Thread Michael S. Tsirkin
> Core changes are required for the T3 driver.  This includes the addition
> of a udata pointer parameter to the ib_req_notify_cq() provider method.
> This is still being discussed on the openib-general list and I'll update
> it accordingly once we finalize the solution.

So what I plan to do is review that the patches are in proper format,
but delay applying until this API issue is closed. OK?

-- 
MST




Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Or Gerlitz
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
> > I just noticed that once i apply the patch, the last + lines (that is
> > pthread_mutex_lock, while loop doing pthread_cond_wait and then
> > pthread_mutex_unlock) become part of rdma_leave_multicast which seems to
> > me strictly buggy as no one is going to wake up this code.
>
> The leave must wait until all events have been reported on the multicast group.
> There can be more than one event on a group if an error occurs.  See
> ucma_complete_mc_event() for where the condition is signaled.

OK, got you. However, printing resp->events_reported after the write
call returns shows complete junk most of the time, whereas as you
explain here it should be 1 unless some error occurs. Looking at the
ucma kernel code under

http://www2.openfabrics.org/git/?p=~shefty/rdma-dev.git;a=blob;f=drivers/infiniband/core/ucma.c

I think I see the bug: there is no copy_to_user() before
ucma_leave_multicast() returns and hence the response structure at
rdma_leave_multicast of librdmacm is not set to anything. What do you
say?

Or.




Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Roland Dreier
what is a network library?




Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Or Gerlitz
On 1/8/07, Or Gerlitz [EMAIL PROTECTED] wrote:
> explain here it should be 1 unless some error occurs. Looking on the
> ucma kernel code under
> http://www2.openfabrics.org/git/?p=~shefty/rdma-dev.git;a=blob;f=drivers/infiniband/core/ucma.c

I have looked in the multicast-sa_cache branch.

Or.




Re: [openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree

2007-01-08 Thread Roland Dreier
Have you tested this?  I think it increases the amount of memory
needed for the buddy allocator bitmaps by a factor of 8, and right now
those bitmaps are kmalloc()ed.  So I'd be afraid that it would make
it impossible to load the module.

Anyway this is definitely 2.6.21 material given that we're already at
2.6.20-rc4, and this change has a decent chance of introducing regressions.

 - R.
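
A minimal sketch of the arithmetic behind both the "87%" figure in the patch description and the factor of 8 mentioned above, assuming 64-bit MTT entries and the 0x40-byte Tavor segment size from the patch. The helper names are hypothetical, and `tracked_units` is only a rough stand-in for the number of items an allocator bitmap has to cover.

```c
#include <assert.h>
#include <stdint.h>

#define MTHCA_TAVOR_MTT_SEG_SIZE 0x40	/* bytes, value taken from the patch */

/* MTT entries per segment; each entry is one 64-bit word. */
static unsigned int entries_per_seg(unsigned int seg_size)
{
	assert(seg_size >= sizeof(uint64_t));
	assert(seg_size % sizeof(uint64_t) == 0);
	return (unsigned int)(seg_size / sizeof(uint64_t));
}

/* Entries left unused when `needed` entries are rounded up to whole segments. */
static unsigned int wasted_entries(unsigned int needed, unsigned int seg_size)
{
	unsigned int per_seg = entries_per_seg(seg_size);
	unsigned int segs = (needed + per_seg - 1) / per_seg;

	return segs * per_seg - needed;
}

/* Allocation units to track for a table of total_entries MTT entries. */
static unsigned long tracked_units(unsigned long total_entries,
				   unsigned int seg_size)
{
	return total_entries / entries_per_seg(seg_size);
}
```

A one-entry region in a 0x40-byte segment leaves 7 of 8 entries unused (87.5%), while a segment of `sizeof(u64)` leaves none; the price is 8 times as many units for the allocator to track.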




Re: [openib-general] [PATCH] 2.6.20 rdma_ucm: fix reporting events with invalid user context

2007-01-08 Thread Or Gerlitz
On 1/5/07, Sean Hefty [EMAIL PROTECTED] wrote:
> There's a problem with how rdma cm events are reported to userspace that can
> lead to application crashes.
>
> When a new connection request arrives, a context for the connection is allocated
> in the kernel.  The connection event is then reported to userspace.  The
> userspace library retrieves the event and allocates its own context for the
> connection.  The userspace context is associated with the kernel's context when
> accepting.  This allows the kernel to give userspace context with other events.
>
> A problem occurs if a second event for the same connection occurs before the
> user has had a chance to call accept.  The userspace context has not yet been
> set, which causes the librdmacm to crash.   (This has been seen when the app
> takes too long to call accept, resulting in the remote side timing out and
> rejecting the connection.)

Assuming that events are reported in order (correct?)  then the user
space consumer was calling rdma_get_cm_event, got a connection request
and before calling rdma_accept they have called rdma_get_cm_event
again and got connection reject ?

Or the thing is that there are two threads in user space, one calling
rdma_get_cm_event and on some events acting by itself where on other
events causing another thread to act, so it got the conn request and
moved it to the other thread and then got the conn reject and tried to
act on it before the other thread called rdma_accept ?

Or.




Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Sean Hefty
 I think to see the bug: there is no copy_to_user() before 
 ucma_leave_multicast() returns and hence the response structure at 
 rdma_leave_multicast of librdmacm is not set to anything, what do you say?

This looks like a problem.  I wonder how this is working for me at all...  maybe
the response structure is being initialized to 0, but this doesn't match up
with your debug output...  I will look into this more, but the copy_to_user
definitely seems to be missing.

- Sean




Re: [openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support

2007-01-08 Thread Steve Wise
On Mon, 2007-01-08 at 21:57 +0200, Michael S. Tsirkin wrote:
  Core changes are required for the T3 driver.  This includes the addition
  of a udata pointer parameter to the ib_req_notify_cq() provider method.
  This is still being discussed on the openib-general list and I'll update
  it accordingly once we finalize the solution.
 
 So what I plan to do is, review the patches are in proper format,
 but delay applying until this API issue is closed. OK?
 

Right. Don't apply these at all. I just wanted folks to look at what I
did and make sure it looks ok.  I'll repost a final patch set after we
resolve this issue.

Thanks,

Steve.






Re: [openib-general] [PATCH] 2.6.20 rdma_ucm: fix reporting events with invalid user context

2007-01-08 Thread Sean Hefty
 Assuming that events are reported in order (correct?)  then the user
 space consumer was calling rdma_get_cm_event, got a connection request
 and before calling rdma_accept they have called rdma_get_cm_event
 again and got connection reject ?

The events are reported in order in the kernel, but the same guarantee cannot
be made for userspace if an application is processing events using multiple
threads.  However, in the case where the bug occurred, a single thread was
polling for events.

 Or the thing is that there are two threads in user space, one calling
 rdma_get_cm_event and on some events acting by itself where on other
 events causing another thread to act, so it got the conn request and
 moved it to the other thread and then got the conn reject and tried to
 act on it before the other thread called rdma_accept ?

This was what was happening.

- Sean




Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Greg Lindahl
On Mon, Jan 08, 2007 at 12:26:00PM -0600, Sean Hubbell wrote:

 I have had to make significant
 workarounds for our current, third party network API that we purchased
 and continue to watch it fall down and still not take advantage of the
 bandwidth that I need. With that said, does anyone on this list have a
 recommendation for an InfiniBand capable network library?

To amplify Roland's question: What does this library do that the
existing ways of using InfiniBand don't? Sockets, verbs, MPI...

-- greg





Re: [openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree

2007-01-08 Thread Michael S. Tsirkin

Subject: Re: [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree

 Have you tested this?

No, didn't I make this clear? Sorry. I'm not in the lab at the moment,
and my laptop does not have infiniband.
That's why it says untested in the subject :).

 I think it increases the amount of memory
 needed for the buddy allocator bitmaps by a factor of 8, and right now
 those bitmaps are kmalloc()ed.  So I'd be afraid that it would make
 it impossible to load the module.

Hmph. We'll need to make these 2-level then?

 Anyway this is definitely 2.6.21 material given that we're already at
 2.6.20-rc4, and this change has a decent chance of introducing regressions.

OK.

-- 
MST




Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Sean Hubbell



Roland Dreier wrote:

what is a network library?


  
openpgm, openib are some, but I am looking for one that is a few
levels higher or more abstracted. I am looking for around 3 or 4 calls to
send a message, something like connect, disconnect, send and receive.


Sean


Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Sean Hubbell
This would just be a higher level of abstraction... For example code to 
send 1 msg would look like Connect, Send and Disconnect...
Sean

Greg Lindahl wrote:
 On Mon, Jan 08, 2007 at 12:26:00PM -0600, Sean Hubbell wrote:

   
 I have had to make significant
 workarounds for our current, third party network API that we purchased
 and continue to watch it fall down and still not take advantage of the
 bandwidth that I need. With that said, does anyone on this list have a
 recommendation for an InfiniBand capable network library?
 

 To amplify Roland's question: What does this library do that the
 existing ways of using Infiniband doesn't? Sockets, verbs, MPI...

 -- greg




   





Re: [openib-general] [PATCHv4] IPoIB CM Experimental support

2007-01-08 Thread Bernard King-Smith
- Message from Michael S. Tsirkin [EMAIL PROTECTED] on Mon,
 8 Jan 2007 18:57:14 +0200 -
 
 To:
 
 openib-general@openib.org, Roland Dreier [EMAIL PROTECTED]
 
 Subject:
 
 [openib-general] [PATCHv4] IPoIB CM Experimental support
 
 The following patch adds experimental support for IPoIB connected mode.
 The idea is to increase performance by increasing the MTU
 from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD.
 With this code, I'm able to get 800MByte/sec or more with netperf
 without options on a Mellanox 4x back-to-back DDR system.
 
 Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
 
 ---
 
 Sorry about the churn, just fixed a bug in this code.

[SNIP] 
 e. Some notes on code
 1. SRQ is used for scalability to large cluster sizes

I still want to support non-SRQ adapters with this code. Not all systems
have 100's or 1000's of endpoints, and those smaller systems will benefit
from IPoIB-CM. The larger systems tend to have more memory per node, so
they can support the additional memory requirements.

At the November meeting one of the main themes from application developers
and customers was that we must have a well performing TCP/IP story across as
much of the IB space as possible. If only one or two of the IB adapters
perform well, then we haven't addressed the customer needs. Adapters
that can't support RC are one issue, but for those that support RC without
SRQ, smaller configurations should be able to use IPoIB-CM.

 2. Only RC connections are used (UC does not support SRQ now)
 3. Retry count is set to 0 since spec draft warns against retries
 4. Each connection is used for data transfers in only 1 direction,
so each connection is either active(TX) or passive (RX).
2 sides that want to communicate create 2 connections.
 5. Each active (TX) connection has a separate CQ for send completions -
this keeps the code simple without CQ resize and other tricks
 

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
[EMAIL PROTECTED](845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future. William Shatner

Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Or Gerlitz
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
  I think to see the bug: there is no copy_to_user() before
  ucma_leave_multicast() returns and hence the response structure at
  rdma_leave_multicast of librdmacm is not set to anything, what do you say?

 This looks like problem.  I wonder how this is working for me at all...

I don't think mckey calls rdma_leave_multicast so maybe this is why
you did not notice the problem?

Or.




Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Michael S. Tsirkin
 what is a network library?
 
 

 openpgm, openib are some, but I am looking for one that is a few
 levels higher or more abstracted. I am looking for around 3 or 4 calls to
 send a message, something like connect, disconnect, send and receive.

PGM is transport level, isn't it?
So a few levels higher would be the Application layer in the OSI model ...
Are you looking for something that works with e.g. SQL queries?

-- 
MST




Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Or Gerlitz
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
  I just noticed that once i apply the patch, the last + lines (that is
  pthread_mutex_lock, while loop doing pthread_cond_wait and then
  pthread_mutex_unlock) become part of rdma_leave_multicast which seems to
  me strictly buggy as no one is going to wake up this code.

 The leave must wait until all events have been reported on the multicast 
 group.
   There can be more than one event on a group if an error occurs.  See
 ucma_complete_mc_event() for where the condition is signaled.

Let me see if I follow your design: mc->events_completed is incremented
in the library when the consumer calls rdma_ack_cm_event(), and
resp->events_reported is incremented in the kernel when the user
calls rdma_get_cm_event()?

If this is indeed the case, the design seems fine to me. Otherwise it might
be problematic, e.g. if it does not support the case where there was a
multicast error but the user did not consume the associated event and
now wants to call rdma_leave_multicast().
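
The wait being discussed can be sketched as a plain mutex and condition
variable pair. This is only a model of the pattern under stated assumptions:
struct mc_state and the three functions are hypothetical names, not the
librdmacm implementation, and events_reported is passed in as a plain integer
rather than read from a kernel response.

```c
#include <pthread.h>

/* Hypothetical stand-in for the library's per-group state. */
struct mc_state {
    pthread_mutex_t mut;
    pthread_cond_t  cond;
    int             events_completed;  /* bumped each time an event is acked */
};

void mc_state_init(struct mc_state *mc)
{
    pthread_mutex_init(&mc->mut, NULL);
    pthread_cond_init(&mc->cond, NULL);
    mc->events_completed = 0;
}

/* Consumer side: record one acknowledged event and wake any waiter. */
void mc_ack_event(struct mc_state *mc)
{
    pthread_mutex_lock(&mc->mut);
    mc->events_completed++;
    pthread_cond_signal(&mc->cond);
    pthread_mutex_unlock(&mc->mut);
}

/* Leave side: block until every reported event has been acknowledged. */
void mc_wait_for_acks(struct mc_state *mc, int events_reported)
{
    pthread_mutex_lock(&mc->mut);
    while (mc->events_completed < events_reported)
        pthread_cond_wait(&mc->cond, &mc->mut);
    pthread_mutex_unlock(&mc->mut);
}
```

In this model a caller that never acknowledges a reported event would block
in mc_wait_for_acks() forever, which is the case the question above is
probing.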

Or.




Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.

2007-01-08 Thread Caitlin Bestler
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Steve Wise
 Sent: Monday, January 08, 2007 6:55 AM
 To: Mirko Benz
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] [PATCH] rdma_cm iWARP 
 connection setup timeouts reported as rejects.
 
 On Mon, 2007-01-08 at 12:13 +0100, Mirko Benz wrote:
  Hi,
  
  What could be the reasons for these timeouts to occur?
 
 One way: If the host is not reachable but the next hop 
 neighbour is, then the connection attempt will timeout.
 
 Another way is if, for some reason, the MPA negotiation 
 doesn't complete in a timely manner.  For instance, if the 
 passive side never rdma_accept()s the connection, then the 
 active side should eventually timeout the attempt and return 
 a timeout error to the consumer.
 
 

One very important additional example of MPA negotiation failure
is the case where only one end of the TCP connection was anticipating
the usage of MPA.

For example, if an ssh client mistakenly tried to connect to
an iWARP port, both sides would just sit there waiting for
the other one to say something. An eventual timeout is the
only way out of this.

  How should an application handle this?
  
 
 Applications should handle connection timeouts however they want.
 Usually they just report it to the user.
 
 

One way to look at it is that host unreachable is an *optimized*
error report that deals with certain conditions where the unreachability
can be quickly determined. In the more general case, the fact that a
given host/service is currently unavailable is only known by its failure
to answer.

In most cases corrective action (either get the remote service restarted,
make the path to it work, or select another service) is up to the user.
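
As a rough illustration of that split between fast failures and silent ones,
an application could map the error it receives onto a hint for the user. The
enum and function below are hypothetical, and the sketch uses plain errno
values only; it makes no assumption about which rdma_cm event carries which
status.

```c
#include <errno.h>

/* Hypothetical categories an application might map connect failures onto. */
enum connect_advice {
    ADVICE_CHECK_PATH,      /* fast failure: no route to the host or network */
    ADVICE_CHECK_SERVICE,   /* the peer answered but refused the connection */
    ADVICE_SERVICE_SILENT,  /* nothing answered before the timeout expired */
    ADVICE_REPORT_ERROR     /* anything else: just report it */
};

enum connect_advice classify_connect_error(int err)
{
    switch (err) {
    case EHOSTUNREACH:
    case ENETUNREACH:
        return ADVICE_CHECK_PATH;
    case ECONNREFUSED:
        return ADVICE_CHECK_SERVICE;
    case ETIMEDOUT:
        return ADVICE_SERVICE_SILENT;
    default:
        return ADVICE_REPORT_ERROR;
    }
}
```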





Re: [openib-general] Infiniband Network Library

2007-01-08 Thread Or Gerlitz
On 1/8/07, Sean Hubbell [EMAIL PROTECTED] wrote:
 This would just be a higher level of abstraction... For example code to
 send 1 msg would look like Connect, Send and Disconnect...

From your email I understand that using BSD sockets over IB ULPs
such as IPoIB UD, IPoIB CM or SDP is not enough for the performance
enhancement you want to get with IB.

Can you share what you are hunting for, i.e. which of the following measures:
BW / LAT / PPS / CPU % and for which msg size huge/big/med/small

Or.




Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA

2007-01-08 Thread Sean Hefty
 I don't think mckey calls rdma_leave_multicast so maybe this is why
 you did not notice the problem?

Yep - this was the case.  I've updated mckey and created a patch for the
kernel, which I'll push out through my rdma-dev tree shortly.  Thanks for
the report.

- Sean




Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc

2007-01-08 Thread Roland Dreier
This change makes sense to me.  Does anyone object to queueing this
for 2.6.21?

 - R.




Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc

2007-01-08 Thread Steve Wise
Ok with me.

On Mon, 2007-01-08 at 13:40 -0800, Roland Dreier wrote:
 This change makes sense to me.  Does anyone object to queueing this
 for 2.6.21?
 
  - R.
 
 





Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc

2007-01-08 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: [PATCH RFC] return qp pointer as part of ib_wc
 
 This change makes sense to me.  Does anyone object to queueing this
 for 2.6.21?

And for-mm, pls: last version of IPoIB CM patch needs this.


-- 
MST




[openib-general] [RFC] userspace IB SA support

2007-01-08 Thread Sean Hefty
Today, userspace support for SA related operations is limited to the libibmad
interface, which supports sending and receiving MADs only.  I've been assigned
with the task of exposing multicast and informinfo support to userspace.
Specifically, the following functionality is needed:

1. Join a multicast group - needs to use the ib_sa multicast capability.
2. Receive notification of multicast errors.
3. Leave a multicast group.
4. Register to receive SA events - needs to use the ib_sa notice capability.
5. Receive notification of events.
6. Deregister from SA events.

Are there any preferences for how this is added?

- Sean




Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts

2007-01-08 Thread Roland Dreier
This all looks rather fishy:

  +/*
  + * Limit CM msg timeouts to something reasonable.
  + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
  + */
  +#define IB_CM_MAX_TIMEOUT 21

OK... (although 8 seconds seems a little short -- it seems a somewhat
longer timeout could be legitimate on a very busy fabric across a WAN
or something like that)

but then...

  +timeout = min(IB_CM_MAX_TIMEOUT,
  +  cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
  +  cm_convert_to_ms(cm_id_priv->av.packet_life_time));

should the IB_CM_MAX_TIMEOUT be inside a cm_convert_to_ms() too?
and similarly...

  -cm_id_priv->timeout_ms = param->timeout_ms;
  +cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms);

is timeout_ms misnamed, or did we just limit all timeouts to 21 msecs?

...and other places in the patch seem to have similar problems.
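
To spell out the unit mismatch: IB timeouts are carried as an exponent t
meaning 4.096 usec * 2^t, so a cap of 21 only makes sense after conversion.
The sketch below assumes the usual approximation of that value in
milliseconds as 2^(t - 8); the function names are hypothetical, and treating
the field as 5 bits wide is an assumption as well.

```c
#include <stdint.h>

/*
 * Approximate an encoded IB timeout (4.096 usec * 2^t) in milliseconds
 * as 2^(t - 8), rounding anything below 1 ms up to 1 ms.
 */
uint32_t ib_timeout_to_ms(uint8_t iba_time)
{
    if (iba_time > 31)          /* assumed 5-bit field; also keeps the shift safe */
        iba_time = 31;
    int shift = iba_time > 8 ? iba_time - 8 : 0;
    return (uint32_t)1 << shift;
}

/* Clamp a millisecond timeout against a cap given in the encoded form. */
uint32_t clamp_timeout_ms(uint32_t timeout_ms, uint8_t max_encoded)
{
    uint32_t cap_ms = ib_timeout_to_ms(max_encoded);
    return timeout_ms < cap_ms ? timeout_ms : cap_ms;
}
```

With this reading ib_timeout_to_ms(21) is 8192 msecs, which matches the
"8 seconds" in the patch comment, and 16 transmissions at that interval come
to about 131 seconds. A bare min(21, timeout_ms) instead compares an exponent
against milliseconds.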

Also, I would like to see warning messages like

ib_cm: Possibly bogus timeout of xx (yy msecs) in REP from GID 

printed in the kernel log so people realize they have broken SRP
targets or whatever.

 - R.




Re: [openib-general] 2.6.20: outstanding patches and issues

2007-01-08 Thread Roland Dreier
fix_query_qp_in_reset.patch

will merge

ib_verbs_h_missing_kref.patch

does this actually fix any compilation problems?  if not I think it's
better for 2.6.21.

mthca_0_fmr_page_fix.patch

already merged in my tree pending a pull, right?

Patch 5 of 5 is at v3, hope it's all good now.

you only listed 4...

mthca_1_merge_mr_fmr_on_64bit.patch
mthca_2_fast_registration.patch
mthca_3_alloc_consistent.patch
mthca_4_dma_align_reserved_mtts.patch

still need review but I don't think they're appropriate for 2.6.20
given how much they change some pretty key memory registration stuff.

mthca_wrid_swap.patch - very small benefit, but very small patch either

Will merge for 2.6.21




Re: [openib-general] [PATCH/RFC] libibverbs: Improve driver loading

2007-01-08 Thread Krishna Kumar2
Hi Roland/others,

Sorry to be a bit off-topic, but ...

Is this a good time to submit the Transport Neutral Verbs code? Roland had
earlier suggested doing this after all major changes were finished and before
libibverbs 1.1 is released.

The code is designed to keep the existing exported ibv_*() routines, but to
change them to simply call similarly named rdmav_*() routines (also exported)
which implement the original code. The intention is to remove or deprecate
the ibv_*() routines by the next release (1.2?).

Thanks,

- KK

[EMAIL PROTECTED] wrote on 01/05/2007 07:49:39 PM:

   BTW, the question still stands. If I start trying to play with
   static linking issues, I'd like to do this based on this patch,
   not what's in master currently.

 Yes, I had hoped to push it out sooner but I wanted to fix all the
 driver libraries first.  I didn't get a chance to finish that up
 before my vacation, but I will do that soon and post patches for
 driver libraries when I change libibverbs.







Re: [openib-general] ib_gid_is_link_local

2007-01-08 Thread Jason Gunthorpe
On Fri, Jan 05, 2007 at 11:44:49AM -0500, Hal Rosenstock wrote:

  However, it might be smart to have opensm consider the routers to be a
  send-only member for every MLID..
 
 Do you mean non-member rather than send-only member ? Routers need to
 receive as well as send, right ? Or are you worried about some other
 issue here ?

I would like it if routers did not have to worry about joins in order
to send a multicast packet. There really isn't a good way to know how
long to keep a join active for.. Having them be send-only members of
every relevant group skips that problem and decreases the latency for
first-packet multicast forwarding.
 
 Also, I'm still not sure about a couple of aspects of every MLID:
 1. Wouldn't the router only want to be full member of link local scoped
 MGIDs (that it was interested in locally) ? Are you saying any local
 scoped MGIDs not of interest would just get dropped anyhow ? If that is
 the point here, that would work but isn't there a performance impact of
 doing so ? 
 2. Similarly for any other (non local scope) MGRPs which do not match
 across any router ports, isn't there a performance impact of receiving
 and then having to drop/filter these packets ?

I think there is a balance to be had here: on one side, if you have
a lot of multicast groups (i.e. IPv6 SNMs) then requiring a lot of
extra work to keep the SM informed about what is going on is more
harmful than having the router get more multicast traffic than it
optimally could.

A router must already keep track of what multicast groups are
forwarded to what ports, so it is virtually free for it to also do
filtering.

[Aside: The more I think about scaling a router up the more it seems
 to me that the router and SM need a lot of intercommunication. The
 most efficient thing would be if the router could maintain a replica
 of the entire SM database for paths and multicast.
 The router would then always be ready to handle any incoming
 packet, just like an IB switch.]

 Right, the router is some sort of member on the MGRPs of interest. I
 think you are trying to make that list of MGRPs of interest simpler
 and utilize filtering where not needed (as I mentioned above), but I may

Yes.

Simpler, I hope :

  A onlink line routing table just terminates the routing
  lookup. 'unreachable' is another termination. A via line changes the
  next hop GID and creates more lookups until an onlink is reached.
 
 So is the specification of all multicast as onlink a short term thing
 then ?
 
 Also, with using onlink for all multicast, is there some forwarding
 determination made somewhere in the router stack ?

I think it is useful to keep the router stack and the SM stack
separate, especially when it comes to multicast. The router will have
a multicast routing table that works somewhat differently than the
unicast table. This table would indicate which ports in the router are
part of each group. The SM should only need a MGID to MLID path
translation. This is similar to the distinction between a host routing
table and a router routing table in IP land - where hosts generally do
not have multicast routing information.
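
A minimal sketch of such a router-side multicast table, assuming one entry
per MGID with a bitmask of member ports, fewer than 32 ports, and a linear
lookup. The struct and function names are hypothetical. Because the table
already says which ports belong to a group, filtering falls out of the same
lookup: an unknown MGID simply yields an empty port set.

```c
#include <stdint.h>
#include <string.h>

#define MGID_LEN 16

/* One multicast route: which router ports are part of the group. */
struct mcast_route {
    uint8_t  mgid[MGID_LEN];
    uint32_t port_mask;     /* bit n set: router port n is in the group */
};

/*
 * Return the set of egress ports for a packet arriving on in_port,
 * or 0 if the group is unknown and the packet should be dropped.
 * Assumes in_port < 32.
 */
uint32_t mcast_egress_ports(const struct mcast_route *table, size_t n,
                            const uint8_t *mgid, unsigned in_port)
{
    for (size_t i = 0; i < n; ++i) {
        if (memcmp(table[i].mgid, mgid, MGID_LEN) == 0)
            return table[i].port_mask & ~((uint32_t)1 << in_port);
    }
    return 0;   /* no entry: filter the packet */
}
```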

 Yes, I have no idea how IPv6 will work with large inter subnet clusters
 either. We had a thread on this a while ago and I think it died out at
 that point. To state the obvious, I think some changes need to be made
 for IPv6 to work well with current IB hardware or perhaps some
 configuration restrictions ?

Yes, I agree. I wonder if the IPoIB RFC authors considered the negative
impact of IPv6 SNM when they designed the specification? It would be
much better if 1 IP subnet = 1 IB subnet.

  Yes, but in this case I don't think multicast routing can be pushed to
  the host. It is either the router or some combination of the router
  and the SM.
 
 I'm not quite following you on this yet. Why/how is host multicast
 routing any different (than unicast) ?

Well, I can't see how to make this situation sane if the host is in
control and the routers/sm are fairly passive:
- Two subnets, each with nodes joined to multicast group M
- Two routers connecting the two subnets (multipath)
- Each host has an inter-subnet multicast spanning tree and knows
  which router to use for M
- Host sends a packet for M, what LRH does it use?

Jason




Re: [openib-general] 2.6.20: outstanding patches and issues

2007-01-08 Thread Michael S. Tsirkin
 Quoting Roland Dreier [EMAIL PROTECTED]:
 Subject: Re: 2.6.20: outstanding patches and issues
 
   fix_query_qp_in_reset.patch
 
 will merge
 
   ib_verbs_h_missing_kref.patch
 
 does this actually fix any compilation problems?  if not I think it's
 better for 2.6.21.

2.6.21 then.

   mthca_0_fmr_page_fix.patch
 
 already merged in my tree pending a pull, right?

Yes.

   Patch 5 of 5 is at v3, hope it's all good now.
 
 you only listed 4...
 
   mthca_1_merge_mr_fmr_on_64bit.patch
   mthca_2_fast_registration.patch
   mthca_3_alloc_consistent.patch
   mthca_4_dma_align_reserved_mtts.patch

Because I counted mthca_0_fmr_page_fix.patch

 still need review but I don't think they're appropriate for 2.6.20
 given how much they change some pretty key memory registration stuff.

Hmph. I was afraid you'd say this. The only reason I'm surprised is that
these do fix FMR on non-cache-coherent architectures - it's
a bug fix, not just a feature series.
And you did say (patches 1-2 are what was posted then):
http://article.gmane.org/gmane.linux.drivers.openib/34184/match=patchv2+mthca+speed+memory+registration+filling+mtts+directly
I think this still can go into 2.6.20 after -rc1 if we can get this fixed up.

-- 
MST




[openib-general] [PATCH][MINOR] OpenSM/osm_ucast_updn.c: Handle failed memory allocation

2007-01-08 Thread Hal Rosenstock
OpenSM/osm_ucast_updn.c: Handle failed memory allocation

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 7fa119e..3d96478 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -628,6 +628,11 @@ updn_init(
 if (strlen(line) > 1)
 {
   p_tmp = malloc(sizeof(uint64_t));
+  if (!p_tmp)
+  {
+status = IB_ERROR;
+goto Exit;
+  }
   *p_tmp = strtoull(line, NULL, 16);
   cl_list_insert_tail(p_updn->p_root_nodes, p_tmp);
 }
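
The same check can be shown in isolation. This is a standalone sketch, not
OpenSM code: parse_guid_line is a hypothetical helper that allocates storage
for one hex GUID and returns NULL on allocation failure, leaving the caller
to turn that into an error status and bail out.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one line holding a hex GUID into a freshly allocated uint64_t.
 * Returns NULL if the allocation fails; the caller owns the result.
 */
uint64_t *parse_guid_line(const char *line)
{
    uint64_t *p_guid = malloc(sizeof(*p_guid));

    if (!p_guid)
        return NULL;    /* report failure instead of dereferencing NULL */

    *p_guid = strtoull(line, NULL, 16);
    return p_guid;
}
```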







[openib-general] nightly osm_sim report 2007-01-09:normal completion

2007-01-08 Thread Eitan Zahavi
OSM Simulation Regression Summary
OpenSM rev = Mon_Jan_8_12:41:44_2007 064f5e 
ibutils rev = Wed_Jan_3_11:42:12_2007 913448 
Total=410 Pass=410 Fail=0

Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 OsmStress IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo

Failures:




Re: [openib-general] ib_gid_is_link_local

2007-01-08 Thread Hal Rosenstock
On Tue, 2007-01-09 at 00:22, Jason Gunthorpe wrote:
 On Fri, Jan 05, 2007 at 11:44:49AM -0500, Hal Rosenstock wrote:
 
   However, it might be smart to have opensm consider the routers to be a
   send-only member for every MLID..
  
  Do you mean non-member rather than send-only member ? Routers need to
  receive as well as send, right ? Or are you worried about some other
  issue here ?
 
 I would like it if routers did not have to worry about joins in order
 to send a multicast packet.

Send-only members are not supposed to receive, so how would the routers
receive then? Don't routers need to receive multicast (as well as send)?

 There really isn't a good way to know how long to keep a join active for..

What about MGID creation/deletion events ?

 Having them be send-only members of every relevant group

How does the router know every relevant group ?

 skips that problem and decreases the latency for
 first-packet multicast forwarding.

It could be a benefit on first packet MC forwarding on a new group but
it depends on when the first packet is received relative to the group
detected and joined.

  Also, I'm still not sure about a couple of aspects of every MLID:
  1. Wouldn't the router only want to be full member of link local scoped
  MGIDs (that it was interested in locally) ? Are you saying any local
  scoped MGIDs not of interest would just get dropped anyhow ? If that is
  the point here, that would work but isn't there a performance impact of
  doing so ? 
  2. Similarly for any other (non local scope) MGRPs which do not match
  across any router ports, isn't there a performance impact of receiving
  and then having to drop/filter these packets ?
 
 I think there is a balance to be had here, on one side if you have
 a lot of multicast groups (i.e. IPv6 SNMs) then requiring a lot of
 extra work to keep the SM informed about what is going on is more
 harmful than having the router get more multicast traffic than it
 optimally could.
 
 A router must already keep track of what multicast groups are
 forwarded to what ports, so it is virtually free for it to also do
 filtering.
 
 [Aside: The more I think about scaling a router up the more it seems
  to me that the router and SM need a lot of intercommunication. The
  most efficient thing would be if the router could maintain a replica
  of the entire SM database for paths and multicast.
  The router would then always be ready to handle any incoming
  packet, just like an IB switch.]

There is no IBA standard for replicating the SM or SA database. This is
a similar issue which multiple SMs in the same subnet might have
depending on the approach taken for this.

  Right, the router is some sort of member on the MGRPs of interest. I
  think you are trying to make that list of MGRPs of interest simpler
  and utilize filtering where not needed (as I mentioned above), but I may
 
 Yes.
 
 Simpler, I hope :
 
   An onlink line in the routing table just terminates the routing
   lookup. 'unreachable' is another termination. A via line changes the
   next hop GID and creates more lookups until an onlink is reached.
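
A minimal sketch of the lookup described above, assuming a hypothetical
routing table keyed by GID prefix. The table layout, the example prefixes,
and the names `match` and `lookup` are illustrative assumptions, not part of
any existing router stack. GIDs are treated as 128-bit values in IPv6
notation.

```python
import ipaddress

# Hypothetical routing table: GID prefix -> action.
# ("onlink",) and ("unreachable",) terminate the lookup;
# ("via", gid) replaces the next hop GID and the lookup continues.
ROUTES = {
    "fe80::/64": ("onlink",),                # the local subnet
    "fec0:0:0:1::/64": ("via", "fe80::2"),   # remote subnet behind a router port
    "fec0:0:0:2::/64": ("unreachable",),
}


def match(table, gid):
    """Longest-prefix match; a GID with no matching prefix is unreachable."""
    addr = ipaddress.IPv6Address(gid)
    best = None
    for prefix, action in table.items():
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, action)
    return best[1] if best else ("unreachable",)


def lookup(table, dest, max_hops=16):
    """Return the GID to address on the local link, or None if unreachable."""
    current = dest
    for _ in range(max_hops):        # guard against 'via' lines that loop
        kind, *arg = match(table, current)
        if kind == "onlink":
            return current           # termination: address this GID directly
        if kind == "unreachable":
            return None              # termination: no route
        current = arg[0]             # 'via': look up the next hop GID instead
    raise RuntimeError("routing loop in table")
```

With this table, `lookup(ROUTES, "fec0:0:0:1::5")` follows the via line to
`fe80::2`, which matches the onlink prefix and ends the lookup there.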
  
  So is the specification of all multicast as onlink a short term thing
  then ?
  
  Also, with using onlink for all multicast, is there some forwarding
  determination made somewhere in the router stack ?
 
 I think it is useful to keep the router stack and the SM stack
 separate.

Yes, but on the other hand, you just said you wanted a partial copy of
the SM database for the router...

 Especially when it comes to multicast. The router will have
 a multicast routing table that works somewhat differently than the
 unicast table. This table would indicate which ports in the router are
 part of each group. The SM should only need a MGID to MLID path
 translation.

Doesn't it already have this ?

  This is similar to the distinction between a host routing
 table and a router routing table in IP land - where hosts generally do
 not have multicast routing information.

I don't think the SM needs the multicast routing (intersubnet)
information either.

  Yes, I have no idea how IPv6 will work with large inter subnet clusters
  either. We had a thread on this a while ago and I think it died out at
  that point. To state the obvious, I think some changes need to be made
  for IPv6 to work well with current IB hardware or perhaps some
  configuration restrictions ?
 
 Yes, I agree. I wonder if the IPoIB RFC authors considered the negative
 impact of IPv6 SNM when they designed the specification?

Not sure.

 It would be much better if 1 IP subnet = 1 IB subnet.

Yes, it would be better in terms of this but it was an architectural
goal to allow flexibility in the IP - IB subnet mappings. There was a
lot of discussion about this and there were earlier schemes which
restricted to that mapping.

   Yes, but in this case I don't think multicast routing can be pushed to
   the host. It is either the router or some combination of the router
   

Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc

2007-01-08 Thread Or Gerlitz
Roland Dreier wrote:
 This change makes sense to me.  Does anyone object to queueing this
 for 2.6.21?

Indeed, it makes a lot of sense. Do you have any idea what it would take to
expose this capability through libibverbs as well?

Or.





Re: [openib-general] [RFC] userspace IB SA support

2007-01-08 Thread Dotan Barak
Sean Hefty wrote:
 Today, userspace support for SA related operations is limited to the libibmad
 interface, which supports sending and receiving MADs only.  I've been assigned
 with the task of exposing multicast and informinfo support to userspace.
 Specifically, the following functionality is needed:

 1. Join a multicast group - needs to use the ib_sa multicast capability.
 2. Receive notification of multicast errors.
 3. Leave a multicast group.
 4. Register to receive SA events - needs to use the ib_sa notice capability.
 5. Receive notification of events.
 6. Deregister from SA events.

 Are there any preferences for how this is added?
   
What about path query or any SA query from the user level ?

Thanks
Dotan




Re: [openib-general] second version of the libibverbs man pages

2007-01-08 Thread Dotan Barak
[EMAIL PROTECTED] wrote:
 Hi all and Happy new year.

 * I rewrote the man pages and removed all of the extra characters of the
 POD module (according to Roland's request).
 * I tried to stick with the 80-character limit (according to James's
 request), without 100% success (when I described the attributes of the
 structures, I needed more than 80 characters in a line).
 * Several spelling mistakes were fixed.


 Roland, what do you think? can you use this version and check in those 
 files?
   
Roland, do you plan to check in these man pages?

thanks
Dotan




Re: [openib-general] ib_gid_is_link_local

2007-01-08 Thread Jason Gunthorpe
On Tue, Jan 09, 2007 at 01:17:47AM -0500, Hal Rosenstock wrote:

  I would like it if routers did not have to worry about joins in order
  to send a multicast packet.
 
 Send-only members are not supposed to receive, so how would they receive
 then? Don't routers need to receive multicast (as well as send)?

How about more exactly: The SM could implicitly consider the router as
a send-only member of all forwardable multicast groups until a reason
arises for it to be a full member.

Anyhow, I think this discussion has lost context and we are not
thinking about the same things. Let me describe to you how I think
that a router today can implement multicast without special SM
support using IBA defined protocols:

Let me try to do that:
- The router maintains a table of all forwardable multicast groups on
  each IB subnet that it is connected to.
- It also tracks for each router port which groups have receivers
  on the local subnet. If so the group on that port is flagged
  'rxer' otherwise 'txonly'
- This table is kept in sync with the SM by using SM traps and SM queries.
- Each router port then computes a set of joins to perform on the local
  subnet based on this table:
   Join Type   Local_MGID   Remote_MGID
   none        none         txonly
   none        none         rxer
   none        txonly       none
   none        txonly       txonly       [No receiver]
   full        txonly       rxer         [Only remote receiver]
   none        rxer         none
   send-only   rxer         txonly       [Only local receiver]
   full        rxer         rxer         [Both receiver]

  Remote_MGID would be rxer if any other participating port
  has a rxer flag for this MGID. [participating port being
  derived from a multicast routing protocol].

  (How exactly to determine the rxer/txonly flag and if this
   optimization is even really necessary is not something I have spent
   a lot of time on just yet - but this conceptually describes
   the optimal, minimum spanning methodology.)
- The router connects to other routers on the local subnet and
  performs a multicast routing protocol to produce an inter-subnet
  multicast spanning tree for each MGID. The results from this control
  which ports participate in each MGID.
- Finally, the router programs its internal forwarding path.
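
The join table above can be sketched as a plain lookup. This is only an
illustration: the names `JOIN_TABLE` and `join_type` are made up here, and
treating the unlisted (none, none) combination as "no join" is an
assumption.

```python
# (Local_MGID flag, Remote_MGID flag) -> join the router port performs
# on its local subnet, transcribed from the table in this message.
JOIN_TABLE = {
    ("none", "txonly"): "none",
    ("none", "rxer"): "none",
    ("txonly", "none"): "none",
    ("txonly", "txonly"): "none",       # no receiver anywhere
    ("txonly", "rxer"): "full",         # only remote receiver
    ("rxer", "none"): "none",
    ("rxer", "txonly"): "send-only",    # only local receiver
    ("rxer", "rxer"): "full",           # receivers on both sides
}


def join_type(local_flag, remote_flag):
    """Return 'none', 'send-only' or 'full' for one MGID on one router port.

    Combinations missing from the table, such as ("none", "none"),
    fall back to "none" (an assumption, the table does not list them).
    """
    return JOIN_TABLE.get((local_flag, remote_flag), "none")
```

For example, `join_type("txonly", "rxer")` gives "full": the port must
receive from local senders so the packets can be forwarded to the remote
receiver.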

As an example using IPv6 SNM:
1) A new node comes up on subnet alpha and registers SNM MGID A as
   full membership.
2) The subnet alpha local router port learns of #1 from the local SM.
3) The router forwards the new mgid to other routers it is connected
   to via the multicast routing protocol.
4) On the beta subnet, another node registers as send-only for SNM MGID
   A.
5) The beta local router port learns of #4 from the local SM.
6) The router inspects its MGID table and finds one of its ports
   has a path to the rxer in #1. It joins MGID A on subnet beta as
   a full member, and the other port joins MGID A as a send-only
   member.
7) The above repeats through the chain of subnets until subnet alpha
   is reached.
8) A port on the router connected to subnet alpha sees the MGID A
   creation on one of its other ports and registers as send-only
   for MGID A on subnet alpha. (similar to step #6)
9) The host sends the SNM, unsubscribes from MGID A and the process
   reverses itself.
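
To make the sequence concrete, here is a rough trace of it for a single
hypothetical router with one port on alpha and one on beta (the chain of
intermediate subnets in step 7 is left out). The function names are
invented for this sketch, `join_for` is my compact reading of the join
table earlier in this message, and reporting txonly for the remote side
when other ports only have senders is an assumption.

```python
def join_for(local, remote):
    """Compact form of the join table: which join a port performs locally."""
    if local == "none" or remote == "none":
        return "none"
    if remote == "rxer":
        return "full"                      # must receive for a remote rxer
    return "send-only" if local == "rxer" else "none"


def remote_flag(flags, port):
    """Remote_MGID flag for a port: rxer if any other port has a rxer."""
    others = [f for p, f in flags.items() if p != port and f != "none"]
    if "rxer" in others:
        return "rxer"
    return "txonly" if others else "none"  # assumption for the non-rxer cases


def router_joins(flags):
    """Joins each router port performs for MGID A, given per-port local flags."""
    return {p: join_for(f, remote_flag(flags, p)) for p, f in flags.items()}


# Local flags for MGID A as learned from each subnet's SM.
flags = {"alpha": "none", "beta": "none"}

flags["alpha"] = "rxer"                    # steps 1-2: full member on alpha
after_step_2 = router_joins(flags)

flags["beta"] = "txonly"                   # steps 4-5: send-only member on beta
after_step_5 = router_joins(flags)         # steps 6 and 8

flags["alpha"] = flags["beta"] = "none"    # step 9: the process reverses
after_step_9 = router_joins(flags)
```

After step 5 the beta port joins as a full member and the alpha port as
send-only, matching steps 6 and 8. Before step 4 and after step 9 neither
port holds a join.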

I think this is within what IBA already defines and is pretty much
what has to be done today to have a chance of working with existing
subnet managers. I don't think it needs changes to the SM. I
don't think it scales very well since it requires a lot of exchanges
between the SM and the routers. This is more or less what I had in
mind during the concall we had last year...

Also, I expect the first SNM message will be lost since the SM will
ack the host before the router has received the trap, found the new
MGID and joined it. (I don't think that is very good :)

==

The above describes the router as being autonomous of the SM. The
routers learn of data the SM has through queries (a pull
model). Another approach is to have the SM program the routers
explicitly (a push model). In this view the router is more like an IB
switch from a SM programming perspective. It has a more complex
LinearFDB that uses GIDs rather than LIDs and a more complex multicast
table that works on MGIDs rather than MLIDs. Like a switch the SM
would program the router as needed.

I view this as being more in line with the IB treatment of the network
as a completely managed resource. It should be more efficient since
the SM only sends what changes to the routers rather than the routers
responding to traps/etc. I'd ultimately like to find other interested
people to work on this idea since I think it has merit..
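
As a rough illustration of "the SM only sends what changes", the push model
could compute a delta between what a router currently has programmed and
what the SM wants it to have. Everything here is hypothetical: IBA defines
no such router table or management method today, and the MGID labels and
port numbers are placeholders.

```python
def table_delta(programmed, desired):
    """Return (updates, removals) that turn `programmed` into `desired`.

    Both arguments map an MGID to the frozenset of router ports that
    take part in that group. Unchanged entries are not sent at all.
    """
    updates = {m: ports for m, ports in desired.items()
               if programmed.get(m) != ports}
    removals = sorted(m for m in programmed if m not in desired)
    return updates, removals


# What the router holds now versus what the SM has just computed.
programmed = {"MGID-A": frozenset({1, 2}), "MGID-B": frozenset({3})}
desired = {"MGID-A": frozenset({1, 2}), "MGID-C": frozenset({2, 3})}

updates, removals = table_delta(programmed, desired)
```

Here only MGID-C is pushed and MGID-B is withdrawn. MGID-A generates no
traffic between the SM and the router because it did not change.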

It is with this second case where my prior thoughts about optimization
strategies make more sense. (I.e., pre-arranging send-only status for the
router is an optimization that lets the SM do less work on group
creation, the SM