Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.
Hi, What could be the reasons for these timeouts to occur? How should an application handle this? Thanks, Mirko ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] OFED release testing Task force meeting minutes
Meeting took place on Thursday, Jan. 4th, 2007.

Agenda:
1. Introduction to targets as presented at the last OFA meeting
2. Determine priorities
3. Determine schedule
4. Open discussion

Attending companies: Mellanox, NetEffect, ORNL, Qlogic, Voltaire

Discussion Items and Action Items:
1) It was noted that the OFA interoperability event and the IBTA Plugfest dates fall after the scheduled OFED 1.2 release.
2) Agreed initial targets (in priority order):
   a. Unified reporting of test results
   b. Unified/increased reporting of bugs
   c. ULPs/driver parts testing ownership
3) Agreed Action Items:
   a. AI 1: Amit K (Mellanox) to take this up with OFA to revisit the date and decide whether it would be better to have the testing prior to the release.
   b. AI 2: Amit K (Mellanox) to send out test-report format for group review.
   c. AI 3: Moni L (Voltaire) to send out test-report format for group review.

Reviews and additional ideas are to be handled by e-mail, with the group involved in the To field.

Follow-up meeting will be scheduled for either the 17th or 18th of January 2007, 8:30am PDT=11am EDT=6pm Israel (please respond with which fits you better).

Nimrod Gindi
Mellanox Technologies Ltd.
mail  : [EMAIL PROTECTED]
Cell  : +1-408-750-4801
Office: +1-347-342-0011
Fax   : +1-212-987-0275
Re: [openib-general] ioctl and send_agents
Hi,

Thanks for the fast answer.

> OpenSM registers agents in opensm/osm_sm_mad_ctrl.c:osm_sm_mad_ctrl_bind
> and opensm/osm_sa_mad_ctrl.c:osm_sa_mad_ctrl_bind.
> osm_sm_mad_ctrl_bind is called from osm_sm.c:osm_sm_bind and
> osm_sa_mad_ctrl_bind is called from osm_sa.c:osm_sa_bind. Both
> osm_sm_bind and osm_sa_bind are called from
> opensm/osm_opensm.c:osm_opensm_bind which is in turn called from main.c
> during OpenSM startup. That is the vendor independent part. The vendor
> dependent part is done in the vendor layer. For OpenIB, it is done in
> osm_vendor_ibumad.c:osm_vendor_bind.

I looked at osm_vendor_bind and saw the umad_register call. But when I checked the umad_register function (libibumad/src/umad.c), I just see an ioctl call again. And if it is right that the user_mad module is used in kernel space, shouldn't there be a call like unlocked_ioctl or compat_ioctl, as defined in this module?

These agents are all receiver agents, and you say nothing about send agents for SM?

Thanks
Michael
Re: [openib-general] nightly osm_sim report 2007-01-07:normal completion
On Sun, 2007-01-07 at 00:15, Eitan Zahavi wrote:
> OSM Simulation Regression Summary
>
> OpenSM rev = Sat_Jan_6_06:44:34_2007 6c8647
> ibutils rev = Wed_Jan_3_11:42:12_2007 913448
> Total=369 Pass=366 Fail=3
>
> Pass:
> 27 Stability IS1-16.topo
> 27 Pkey IS1-16.topo
> 27 OsmTest IS1-16.topo
> 27 OsmStress IS1-16.topo
> 27 Multicast IS1-16.topo
> 27 LidMgr IS1-16.topo
> 9 Stability IS3-loop.topo
> 9 Stability IS3-128.topo
> 9 Pkey IS3-128.topo
> 9 OsmTest IS3-loop.topo
> 9 OsmTest IS3-128.topo
> 9 OsmStress IS3-128.topo
> 9 Multicast IS3-loop.topo
> 9 Multicast IS3-128.topo
> 9 FatTree part-4-ary-3-tree.topo
> 9 FatTree merge-roots-reorder-4-ary-2-tree.topo
> 9 FatTree merge-roots-4-ary-2-tree.topo
> 9 FatTree merge-root-4-ary-3-tree.topo
> 9 FatTree merge-root-12-ary-2-tree.topo
> 9 FatTree half-4-ary-3-tree.topo
> 9 FatTree blend-4-ary-2-tree.topo
> 9 FatTree 4-ary-4-tree.topo
> 9 FatTree 4-ary-3-tree.topo
> 9 FatTree 32nodes-3lvl-is1.topo
> 9 FatTree 2-ary-4-tree.topo
> 9 FatTree 12-ary-2-tree.topo
> 8 LidMgr IS3-128.topo
> 8 FatTree merge-2-ary-4-tree.topo
> 8 FatTree 12-node-spaced.topo
>
> Failures:
> 1 LidMgr IS3-128.topo

Is this LidMgr failure a DNS issue like the others? Also, there was also pkey failure from late last week.

-- Hal

> 1 FatTree merge-2-ary-4-tree.topo
> 1 FatTree 12-node-spaced.topo
Re: [openib-general] nightly osm_sim report 2007-01-08:normal completion
On Mon, 2007-01-08 at 00:26, Eitan Zahavi wrote:
> OSM Simulation Regression Summary
>
> OpenSM rev = Sat_Jan_6_06:44:34_2007 6c8647
> ibutils rev = Wed_Jan_3_11:42:12_2007 913448
> Total=410 Pass=409 Fail=1
>
> Pass:
> 30 Stability IS1-16.topo
> 30 Pkey IS1-16.topo
> 30 OsmTest IS1-16.topo
> 30 OsmStress IS1-16.topo
> 30 Multicast IS1-16.topo
> 30 LidMgr IS1-16.topo
> 10 Stability IS3-loop.topo
> 10 Stability IS3-128.topo
> 10 Pkey IS3-128.topo
> 10 OsmTest IS3-loop.topo
> 10 OsmTest IS3-128.topo
> 10 OsmStress IS3-128.topo
> 10 Multicast IS3-loop.topo
> 10 LidMgr IS3-128.topo
> 10 FatTree part-4-ary-3-tree.topo
> 10 FatTree merge-roots-reorder-4-ary-2-tree.topo
> 10 FatTree merge-roots-4-ary-2-tree.topo
> 10 FatTree merge-root-4-ary-3-tree.topo
> 10 FatTree merge-root-12-ary-2-tree.topo
> 10 FatTree merge-2-ary-4-tree.topo
> 10 FatTree half-4-ary-3-tree.topo
> 10 FatTree blend-4-ary-2-tree.topo
> 10 FatTree 4-ary-4-tree.topo
> 10 FatTree 4-ary-3-tree.topo
> 10 FatTree 32nodes-3lvl-is1.topo
> 10 FatTree 2-ary-4-tree.topo
> 10 FatTree 12-node-spaced.topo
> 10 FatTree 12-ary-2-tree.topo
> 9 Multicast IS3-128.topo
>
> Failures:
> 1 Multicast IS3-128.topo

What about this failure too? Is it also DNS related or something else?

-- Hal
[openib-general] [PATCH 2/2]: OpenSM/osm_console.c: Handle telnet disconnects better
OpenSM/osm_console.c: Handle telnet disconnects better

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/opensm/osm_console.c b/osm/opensm/osm_console.c
index 420acc2..8d770aa 100644
--- a/osm/opensm/osm_console.c
+++ b/osm/opensm/osm_console.c
@@ -336,7 +336,7 @@ void osm_console(osm_opensm_t *p_osm)
 	pollfd[1].events = POLLIN|POLLOUT;
 	pollfd[1].revents = 0;
 
-	if (poll(pollfd, 2, 1) <= 0)
+	if (poll(pollfd, pollfd[1].fd >= 0 ? 2 : 1, 1) <= 0)
 		return;
 
 #ifdef ENABLE_OSM_CONSOLE_SOCKET
@@ -382,11 +382,10 @@ void osm_console(osm_opensm_t *p_osm)
 		if (n > 0) {
 			/* Parse and act on input */
 			parse_cmd_line(p_line, p_osm);
+			osm_console_prompt(p_osm->console.out);
+		} else
+			osm_console_close_socket(p_osm);
+		if (p_line)
 			free(p_line);
-		} else {
-			fprintf(p_osm->console.out, "Input error\n");
-			fflush(p_osm->console.out);
-		}
-		osm_console_prompt(p_osm->console.out);
 	}
 }
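The first hunk of the patch above can be looked at in isolation. The following is only a minimal sketch, assuming a two-entry pollfd array in which slot 0 is the local console and slot 1 is an optional client socket whose fd is negative when nobody is connected. The helper names console_nfds() and console_poll() are hypothetical and are not part of OpenSM.

```c
#include <poll.h>

/* Hypothetical helper: number of pollfd entries to hand to poll().
 * Slot 0 is always counted; slot 1 only when a client socket is open. */
nfds_t console_nfds(const struct pollfd fds[2])
{
    return fds[1].fd >= 0 ? 2 : 1;
}

/* Hypothetical helper: poll the console descriptors for up to timeout_ms
 * milliseconds and return poll()'s result unchanged. */
int console_poll(struct pollfd fds[2], int timeout_ms)
{
    fds[0].revents = 0;
    fds[1].revents = 0;
    return poll(fds, console_nfds(fds), timeout_ms);
}
```

With the socket slot set to -1, only the first entry is handed to poll(), which mirrors the conditional nfds argument in the patch.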
[openib-general] [PATCH 1/2] OpenSM: Add socket support to OpenSM console
OpenSM: Add socket support to OpenSM console Signed-off-by: Ira Weiny [EMAIL PROTECTED] Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] diff --git a/osm/include/opensm/osm_console.h b/osm/include/opensm/osm_console.h index 705f918..2d212f2 100644 --- a/osm/include/opensm/osm_console.h +++ b/osm/include/opensm/osm_console.h @@ -38,6 +38,11 @@ #include opensm/osm_subnet.h #include opensm/osm_opensm.h +#define OSM_COMMAND_LINE_LEN120 +#define OSM_COMMAND_PROMPT $ +#define OSM_DEFAULT_CONSOLE_PORT 1 +#define OSM_DAEMON_NAME opensm + #ifdef __cplusplus # define BEGIN_C_DECLS extern C { # define END_C_DECLS } @@ -48,8 +53,10 @@ BEGIN_C_DECLS +void osm_console_init(osm_subn_opt_t *opt, osm_opensm_t *p_osm); void osm_console(osm_opensm_t *p_osm); -void osm_console_prompt(void); +void osm_console_prompt(FILE *out); +void osm_console_close_socket(osm_opensm_t *p_osm); END_C_DECLS diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h index 16fef37..482de28 100644 --- a/osm/include/opensm/osm_opensm.h +++ b/osm/include/opensm/osm_opensm.h @@ -48,6 +48,7 @@ #ifndef _OSM_OPENSM_H_ #define _OSM_OPENSM_H_ +#include stdio.h #include signal.h #include complib/cl_dispatcher.h #include complib/cl_passivelock.h @@ -130,6 +131,15 @@ struct osm_routing_engine { * internals cleanup. 
*/ +typedef struct _osm_console_t +{ + int socket; + int in_fd; + int out_fd; + FILE *in; + FILE *out; +} osm_console_t; + /s* OpenSM: OpenSM/osm_opensm_t * NAME * osm_opensm_t @@ -156,6 +166,7 @@ typedef struct _osm_opensm_t cl_plock_t lock; struct osm_routing_engine routing_engine; osm_stats_t stats; + osm_console_tconsole; } osm_opensm_t; /* * FIELDS diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 79796e5..c9b04eb 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -266,6 +266,7 @@ typedef struct _osm_subn_opt boolean_tno_qos; boolean_taccum_log_file; boolean_tconsole; + uint16_t console_port; cl_map_t port_prof_ignore_guids; boolean_tport_profile_switch_nodes; osm_pfn_ui_extension_t pfn_ui_pre_lid_assign; diff --git a/osm/opensm/configure.in b/osm/opensm/configure.in index 1ccf5c6..2d52675 100644 --- a/osm/opensm/configure.in +++ b/osm/opensm/configure.in @@ -62,6 +62,22 @@ AC_ARG_ENABLE(debug, esac],[debug=false]) AM_CONDITIONAL(DEBUG, test x$debug = xtrue) +dnl Console over a socket connection +AC_ARG_ENABLE(console-socket, +[ --enable-console-socket Enable a console socket, requires tcp_wrappers (default yes)], +[case $enableval in + yes) console_socket=yes ;; + no) console_socket=no ;; + esac], + console_socket=yes) +if test $console_socket = yes; then + AC_CHECK_LIB(wrap, request_init, [], + AC_MSG_ERROR([request_init() not found. 
console-socket requires libwrap.])) + AC_DEFINE(ENABLE_OSM_CONSOLE_SOCKET, + 1, + [Define as 1 if you want to enable a console on a socket connection]) +fi + dnl Provide user option to select vendor OPENIB_APP_OSMV_SEL diff --git a/osm/opensm/main.c b/osm/opensm/main.c index 374d323..90432be 100644 --- a/osm/opensm/main.c +++ b/osm/opensm/main.c @@ -217,6 +217,11 @@ show_usage(void) 4 outstanding SMPs.\n\n ); printf( -console\n This option brings up the OpenSM console.\n\n ); +#ifdef ENABLE_OSM_CONSOLE_SOCKET + printf( --console_port port\n +Specify an alternate telnet port for the console (default %d).\n\n, + OSM_DEFAULT_CONSOLE_PORT); +#endif printf( -i equalize-ignore-guids-file\n -ignore-guids equalize-ignore-guids-file\n This option provides the means to define a set of ports\n @@ -578,6 +583,9 @@ main( { cache-options, 0, NULL, 'c'}, { stay_on_fatal, 0, NULL, 'y'}, { honor_guid2lid, 0, NULL, 'x'}, +#ifdef ENABLE_OSM_CONSOLE_SOCKET + { console_port, 1, NULL, 'C'}, +#endif { NULL,0, NULL, 0 } /* Required at the end of the array */ }; @@ -679,6 +687,12 @@ main( printf( Enabling OpenSM interactive console\n); break; +#ifdef ENABLE_OSM_CONSOLE_SOCKET +case 'C': + opt.console_port = strtol(optarg, NULL, 0); + break; +#endif + case 'd': dbg_lvl = strtol(optarg, NULL, 0); printf( d level = 0x%x\n, dbg_lvl); @@ -931,15 +945,11 @@ main( } else { +osm_console_init(opt, osm); + /* Sit here forever - In the future, some sort of console interactivity could - be implemented in this loop. */ -if (opt.console) { - printf(\nOpenSM Console\n\n); - osm_console_prompt(); -} while( !osm_exit_flag ) { if (opt.console) osm_console(osm); @@
Re: [openib-general] best way to get ibv_get_cq_event to return
Guys,

Thanks for the information - I'll give it a try.

SRG

-----Original Message-----
From: Roland Dreier [mailto:[EMAIL PROTECTED]
Sent: Sunday, January 07, 2007 1:30 PM
To: Dotan Barak
Cc: Or Gerlitz; Greenwood, Steve; openib-general@openib.org
Subject: Re: [openib-general] best way to get ibv_get_cq_event to return

> This is true (and i guess that it will work), but if in the future the
> implementation of the ibv_comp_channel will be changed, this code will
> not work

The use of a file descriptor is pretty fundamental, and it was done exactly to permit this sort of stuff (poll(), epoll, SIGIO, etc). So I think it is extremely unlikely to change in a way that would break an app using the file descriptor.

- R.
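The file-descriptor approach discussed above can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not code from the thread: in a verbs program event_fd would be the fd member of struct ibv_comp_channel, and ibv_get_cq_event() would be called only after this function reports a pending event. The function name, the result enum, and the idea of a separate wake-up pipe are all assumptions made for the example.

```c
#include <poll.h>

enum wait_result {
    WAIT_ERROR   = -1,
    WAIT_TIMEOUT = 0,
    WAIT_EVENT   = 1,  /* event_fd has activity: safe to fetch the CQ event */
    WAIT_WAKEUP  = 2   /* wake_fd has activity: another thread wants us to stop */
};

/* Hypothetical helper: wait until event_fd or wake_fd becomes readable,
 * or until timeout_ms elapses (a negative timeout waits indefinitely).
 * A wake-up request takes priority over a pending event. */
enum wait_result wait_for_cq_event(int event_fd, int wake_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = event_fd, .events = POLLIN },
        { .fd = wake_fd,  .events = POLLIN },
    };
    int n = poll(fds, 2, timeout_ms);

    if (n < 0)
        return WAIT_ERROR;
    if (n == 0)
        return WAIT_TIMEOUT;
    if (fds[1].revents & (POLLIN | POLLHUP))
        return WAIT_WAKEUP;
    return WAIT_EVENT;
}
```

A thread that would otherwise block in ibv_get_cq_event() can loop on this helper instead, and a shutdown path only has to write one byte to the write end of the wake-up pipe.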
[openib-general] [PATCH] OpenSM/osm_console.c: Add resweep and status commands
OpenSM/osm_console.c: Add resweep and status commands Signed-off-by: Ira Weiny [EMAIL PROTECTED] Signed-off-by: Hal Rosenstock [EMAIL PROTECTED] diff --git a/osm/opensm/osm_console.c b/osm/opensm/osm_console.c index 8d770aa..8157f90 100644 --- a/osm/opensm/osm_console.c +++ b/osm/opensm/osm_console.c @@ -84,6 +84,20 @@ static void help_quit(FILE *out, int det static void help_loglevel(FILE *out, int detail) { fprintf(out, loglevel [log-level]\n); + if (detail) { + fprintf(out,log-level is OR'ed from the following\n); + fprintf(out,OSM_LOG_NONE 0x%02X\n, OSM_LOG_NONE); + fprintf(out,OSM_LOG_ERROR0x%02X\n, OSM_LOG_ERROR); + fprintf(out,OSM_LOG_INFO 0x%02X\n, OSM_LOG_INFO); + fprintf(out,OSM_LOG_VERBOSE 0x%02X\n, OSM_LOG_VERBOSE); + fprintf(out,OSM_LOG_DEBUG0x%02X\n, OSM_LOG_DEBUG); + fprintf(out,OSM_LOG_FUNCS0x%02X\n, OSM_LOG_FUNCS); + fprintf(out,OSM_LOG_FRAMES 0x%02X\n, OSM_LOG_FRAMES); + fprintf(out,OSM_LOG_ROUTING 0x%02X\n, OSM_LOG_ROUTING); + fprintf(out,OSM_LOG_SYS 0x%02X\n, OSM_LOG_SYS); + fprintf(out, \n); + fprintf(out,OSM_LOG_DEFAULT_LEVEL0x%02X\n, OSM_LOG_DEFAULT_LEVEL); + } } static void help_priority(FILE *out, int detail) @@ -91,6 +105,16 @@ static void help_priority(FILE *out, int fprintf(out, priority [sm-priority]\n); } +static void help_resweep(FILE *out, int detail) +{ + fprintf(out, resweep [heavy|light]\n); +} + +static void help_status(FILE *out, int detail) +{ + fprintf(out, status\n); +} + /* more help routines go here */ static void help_parse(char **p_last, osm_opensm_t *p_osm, FILE *out) @@ -164,6 +188,99 @@ static void priority_parse(char **p_last } } +static char *sm_state_str(int state) +{ + switch (state) + { + case IB_SMINFO_STATE_INIT: + return (Init); + case IB_SMINFO_STATE_DISCOVERING: + return (Discovering); + case IB_SMINFO_STATE_STANDBY: + return (Standby); + case IB_SMINFO_STATE_NOTACTIVE: + return (Not Active); + case IB_SMINFO_STATE_MASTER: + return (Master); + } + return (UNKNOWN); +} + +static char 
*sa_state_str(osm_sa_state_t state) +{ + switch (state) + { + case OSM_SA_STATE_INIT: + return (Init); + case OSM_SA_STATE_READY: + return (Ready); + } + return (UNKNOWN); +} + +static void status_parse(char **p_last, osm_opensm_t *p_osm, FILE *out) +{ + fprintf(out,SM State : %s\n, + sm_state_str(p_osm-subn.sm_state)); + fprintf(out,SA State : %s\n, + sa_state_str(p_osm-sa.state)); + fprintf(out,MAD stats\n + -\n + QP0 MADS outstanding : %d\n + QP0 MADS outstanding (on wire) : %d\n + QP0 MADS rcvd : %d\n + QP0 MADS sent : %d\n + QP0 unicasts sent : %d\n + QP1 MADS outstanding : %d\n + QP1 MADS rcvd : %d\n + QP1 MADS sent : %d\n +, + p_osm-stats.qp0_mads_outstanding, + p_osm-stats.qp0_mads_outstanding_on_wire, + p_osm-stats.qp0_mads_rcvd, + p_osm-stats.qp0_mads_sent, + p_osm-stats.qp0_unicasts_sent, + p_osm-stats.qp1_mads_outstanding, + p_osm-stats.qp1_mads_rcvd, + p_osm-stats.qp1_mads_sent + ); + fprintf(out,Subnet flags\n + \n + Ignore existing lfts : %d\n + Subnet Init errors : %d\n + In sweep hop 0 : %d\n + Moved to master state : %d\n + First time master sweep: %d\n + Coming out of standby : %d\n +, + p_osm-subn.ignore_existing_lfts, + p_osm-subn.subnet_initialization_error, + p_osm-subn.in_sweep_hop_0, + p_osm-subn.moved_to_master_state, + p_osm-subn.first_time_master_sweep, + p_osm-subn.coming_out_of_standby + ); + fprintf(out, \n); +} + +static void resweep_parse(char **p_last, osm_opensm_t *p_osm, FILE *out) +{ + char *p_cmd; + + p_cmd = next_token(p_last); + if (!p_cmd || + (strcmp(p_cmd, heavy) != 0 +
Re: [openib-general] [openfabrics-ewg] OFED 1.2 Questions
Michael S. Tsirkin wrote:

Tziporet, I'm in the process of adding the Chelsio T3 drivers to the OFED repository and I have a question: The HowTo kernel section you posted on the wiki sez to add the new files to the repos directly via a git commit, but create patches for modifications to existing files and put the patches in the kernel_patches/fixes directory. However, I don't see patches in that directory to modify the core Makefile/Kconfig for SDP or other new modules added for ofed. So should I just modify infiniband/Makefile and Kconfig via the git commit that adds the new Chelsio files, or create a patch file and put it in kernel_patches/fixes?

Yes you can modify the Makefile/Kconfig directly. Reason being, it's always trivial to resolve conflicts there when merging from upstream.

After you check it's working, if you changed the general Makefiles/Kconfig please send the patches to Vlad.

Also, are there machines available with the various ofed supported distros installed that I can do compile testing for the Chelsio user lib?

You can compile on the OFA server - but this has only Ubuntu OS. For testing on other OSes you should set up systems in your company.

Tziporet
Re: [openib-general] using IB on a port without IPoIB running NIC
Or:

Thank you for the information, I may change my mind to require IPoIB to run newer version of HP-MPI on OFED 1.2, if I don't find other way to easily establish IB connection dynamically between two process groups with dynamic size.

--CQ

-----Original Message-----
From: Or Gerlitz [mailto:[EMAIL PROTECTED]
Sent: Monday, January 08, 2007 1:18 AM
To: Tang, Changqing
Cc: openib-general@openib.org
Subject: using IB on a port without IPoIB running NIC

Tang, Changqing wrote:
> We understand that, but we hope to have a connect/accept style IB
> connection setup, without IPoIB involved, like HP-UX IT-API (similar to
> uDAPL without underlying IP support), it works with multiple cards.
> Configure 4-5 IP addresses on a single node is kind of silly.

CQ,

A few more thoughts on your requirement to run MPI on an IB port without a working IPoIB NIC...

Basically, people use IB for both IPC and I/O, where except for SRP, all the IB I/O ULPs (both block based: iSER, and file based: Lustre, GPFS, rNFS) use IP addressing and hence are either coded to the RDMA CM or work on top of TCP/IP (iSCSI-TCP, NFS, pFS, etc). So if the user does not configure IPoIB on this IB port, it will not be utilized for I/O.

Now, you mention a use case of 4 cards on a node. I believe that typically this would happen on big SMP machines where you **must** use all the active IB links for I/O: eg when most of your MPI work is within the SMP (128 to 512 ranks) and most of the IB work is for I/O.

I understand (please check and let me know, eg about the HP 1U offering) that all/most nowadays 1U PCI-EX nodes can have at most **one** PCI-EX card. Combining the above limitation with the fact that these nodes would run at most 16 ranks (eg 8 dual-core CPUs) and that 8 ranks/IB link is a ratio that makes sense, we are left with **2** and not 4-5 NICs to configure.

Oh, and one more thing: 4 IB links per node would turn an N node cluster into a 4N IB end-ports cluster, for which you need f(4N) switching IB ports, and the specific f(.) might turn the IB deployment over this cluster into a very expensive one...

Or.
Re: [openib-general] using IB on a port without IPoIB running NIC
> Thank you for the information, I may change my mind to require IPoIB
> to run newer version of HP-MPI on OFED 1.2, if I don't find other way
> to easily establish IB connection dynamically between two process
> groups with dynamic size.

I'm not really sure what your needs are, but it's not like this is completely impossible. Some people use ad-hoc socket-based tricks to establish IB connections, and this will work for some topologies. You can look at libibverbs/examples for an example of such an implementation.

-- MST
Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.
On Mon, 2007-01-08 at 12:13 +0100, Mirko Benz wrote:
> Hi,
>
> What could be the reasons for these timeouts to occur?

One way: If the host is not reachable but the next hop neighbour is, then the connection attempt will timeout.

Another way is if, for some reason, the MPA negotiation doesn't complete in a timely manner. For instance, if the passive side never rdma_accept()s the connection, then the active side should eventually timeout the attempt and return a timeout error to the consumer.

> How should an application handle this?

Applications should handle connection timeouts however they want. Usually they just report it to the user.

Steve.
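One possible application policy for the timeouts described above is sketched below. It is a hypothetical example and not something the thread prescribes: it assumes the application receives the failure as a negative errno value (for instance -ETIMEDOUT) and chooses to retry a bounded number of times before reporting to the user. The enum and function names are made up for illustration.

```c
#include <errno.h>

enum connect_action {
    CONNECT_RETRY,   /* try the connection attempt again */
    CONNECT_REPORT   /* give up and report the failure to the user */
};

/* Hypothetical policy: retry only on timeout, and only while fewer than
 * max_attempts attempts have been made. Any other status, including a
 * genuine reject by the peer, is reported immediately. */
enum connect_action on_connect_failure(int status, int attempts_so_far,
                                       int max_attempts)
{
    if (status == -ETIMEDOUT && attempts_so_far < max_attempts)
        return CONNECT_RETRY;
    return CONNECT_REPORT;
}
```

Keeping timeouts distinct from rejects is what makes a policy like this possible: a reject means the peer answered and said no, so retrying is rarely useful, while a timeout may be transient.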
[openib-general] [PATCH] ofed_1_2 configure script typo
Typo in OFED 1.2 configure script.

From: Steve Wise [EMAIL PROTECTED]

Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 ofed_scripts/configure |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ofed_scripts/configure b/ofed_scripts/configure
index 5a1694d..a0557e2 100755
--- a/ofed_scripts/configure
+++ b/ofed_scripts/configure
@@ -598,7 +598,7 @@ main()
 		--with-vnic_debug-mod)
 		    CONFIG_INFINIBAND_VNIC_DEBUG=y
 		;;
-		--without-vnic-mod)
+		--without-vnic_debug-mod)
 		    CONFIG_INFINIBAND_VNIC_DEBUG=
 		;;
 		--with-vnic_stats-mod)
Re: [openib-general] using IB on a port without IPoIB running NIC
What I need to know is, without IPoIB, how do I wire up an IB connection? Currently with the Verbs API, it is an alltoall QP number exchange. I want to remove the alltoall QP number exchange in MPI dynamic process.

--CQ

-----Original Message-----
From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
Sent: Monday, January 08, 2007 8:18 AM
To: Tang, Changqing
Cc: Or Gerlitz; openib-general@openib.org
Subject: Re: using IB on a port without IPoIB running NIC

> Thank you for the information, I may change my mind to require IPoIB
> to run newer version of HP-MPI on OFED 1.2, if I don't find other way
> to easily establish IB connection dynamically between two process
> groups with dynamic size.

I'm not really sure what your needs are, but it's not like this is completely impossible. Some people use ad-hoc socket-based tricks to establish IB connections, and this will work for some topologies. You can look at libibverbs/examples for an example of such an implementation.

-- MST
Re: [openib-general] [PATCH TRIVIAL] opensm: eliminate some local variable
On Sun, 2007-01-07 at 15:38, Sasha Khapyorsky wrote:
> This trivially eliminates some local variable.
>
> Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

Thanks. Applied.

-- Hal
Re: [openib-general] [PATCH] ofed_1_2 configure script typo
Applied,

Regards,
Vladimir

On Mon, 2007-01-08 at 08:59 -0600, Steve Wise wrote:
> Typo in OFED 1.2 configure script.
>
> From: Steve Wise [EMAIL PROTECTED]
>
> Signed-off-by: Steve Wise [EMAIL PROTECTED]
> ---
>
>  ofed_scripts/configure |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/ofed_scripts/configure b/ofed_scripts/configure
> index 5a1694d..a0557e2 100755
> --- a/ofed_scripts/configure
> +++ b/ofed_scripts/configure
> @@ -598,7 +598,7 @@ main()
>  		--with-vnic_debug-mod)
>  		    CONFIG_INFINIBAND_VNIC_DEBUG=y
>  		;;
> -		--without-vnic-mod)
> +		--without-vnic_debug-mod)
>  		    CONFIG_INFINIBAND_VNIC_DEBUG=
>  		;;
>  		--with-vnic_stats-mod)
Re: [openib-general] using IB on a port without IPoIB running NIC
> What I need is that, without IPoIB, how do I wire IB connection ?
> Currently with Verbs API, it is an alltoall QP number exchange. I want
> to remove the alltoall QP number exchange in MPI dynamic process.

Well, does your MPI implementation currently use librdmacm? If not, you don't currently have a dependency on IPoIB and probably have no reason to introduce one.

-- MST
Re: [openib-general] using IB on a port without IPoIB running NIC
>> What I need is that, without IPoIB, how do I wire IB connection ?
>> Currently with Verbs API, it is an alltoall QP number exchange. I want
>> to remove the alltoall QP number exchange in MPI dynamic process.
>
> Well, does your MPI implementation currently use librdmacm?

No, we use neither librdmacm nor libibcm.

> If not, you don't currently have a dependency on IPoIB and probably
> have no reason to introduce one.

As I said, the problem is the alltoall QP number exchange. I hope that a process need only provide one piece of information (such as ip/port in TCP/IP), so that all other processes have the same piece of info and can make a connection to it.

--CQ

> -- MST
Re: [openib-general] using IB on a port without IPoIB running NIC
>> If not, you don't currently have a dependency on IPoIB and probably
>> have no reason to introduce one.
>
> As I said, the problem is the alltoall QP number exchange. I hope that
> a process can only provide one piece of information(such as ip/port in
> TCP/IP) so that all other processes have the same piece of info and can
> make connection to it.

Well, start with a socket, each time a process connects create a QP on both sides and exchange the 2 QP numbers?

-- MST
[openib-general] OFED 1.2, iw_cxgb3, and genalloc()
I've packaged the Chelsio T3 Drivers (modules iw_cxgb3 and cxgb3) into Vlad's ofed_1_2 repos and I'm testing now.

I've run into an issue with the Chelsio driver. It requires the kernel genalloc() allocator, which is only built into the kernel if some code requires it at kernel config time. Also, it was new in 2.6.17 or 2.6.18, so it won't exist in the older kernels of distros OFED supports, such as SLES.

So there are two related issues:

1) The genalloc services don't exist in older kernels.

2) Even if they do exist in the kernel src tree on a distro, they might not have been built in if nothing required that service when the kernel was configured.

I need to handle both cases. I'm seeking advice on how to pull this functionality in for ofed 1.2. My initial thought is to add a patch similar to the memtrack patch and add it either as a module or as part of the iw_cxgb3 module. I could even rename the services and always add them so I can avoid having to detect if it's in the running kernel.

Any ideas/comments?

Thanks,

Steve.
Re: [openib-general] using IB on a port without IPoIB running NIC
>> As I said, the problem is the alltoall QP number exchange. I hope
>> that a process can only provide one piece of information(such as
>> ip/port in TCP/IP) so that all other processes have the same piece of
>> info and can make connection to it.
>
> Well, start with a socket, each time a process connects create a QP on
> both sides and exchange the 2 QP numbers?

Then the speed would be a big concern.

--CQ

> -- MST
Re: [openib-general] [PATCH] opensm: eliminate port/switch_info access methods
On Sun, 2007-01-07 at 18:01, Sasha Khapyorsky wrote:
> Following previous patch (remove osm_physp_get_port_info_ptr() checks)
> this removes confused functions osm_physp_get_port_info_ptr() and
> osm_switch_get_si_ptr().
>
> Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

Thanks. Applied.

-- Hal
[openib-general] [PATCH] [MINOR] perftest: send_bw: fix dangling else
Symptom: ib_send_bw reports 'inf' bandwidth
Cause: dangling else

Signed-off-by: Yosef Etigin [EMAIL PROTECTED]
---
diff -rup a/src/userspace/perftest/send_bw.c b/src/userspace/perftest/send_bw.c
--- a/src/userspace/perftest/send_bw.c 2007-01-08 18:20:08.0 +0200
+++ b/src/userspace/perftest/send_bw.c 2007-01-08 18:21:06.0 +0200
@@ -1156,12 +1156,14 @@ int main(int argc, char *argv[])
 				rem_dest = pp_server_exch_dest(sockfd, my_dest);
 			}
 		} else {
-			if (user_param.duplex)
+			if (user_param.duplex) {
 				if (run_iter_bi(ctx, user_param, rem_dest, size))
 					return 18;
-			else
+			}
+			else {
 				if(run_iter_uni(ctx, user_param, rem_dest, size))
 					return 18;
+			}
 
 			if (user_param.servername)
 				print_report(user_param.iters, size, user_param.duplex, tposted, tcompleted);

-- 
Yosef Etigin
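The dangling-else problem fixed by this patch can be reproduced in a few lines. The sketch below is illustrative only: run_bi() and run_uni() are hypothetical stand-ins for run_iter_bi() and run_iter_uni() that always succeed and merely record that they ran. The first function spells out, with explicit braces, how a C compiler pairs the else in the unbraced original; the second has the structure the patch introduces.

```c
#define RAN_BI  1
#define RAN_UNI 2

/* Hypothetical stand-ins for the real test loops: record the call and
 * return 0 for success. */
static int run_bi(int *ran)  { *ran |= RAN_BI;  return 0; }
static int run_uni(int *ran) { *ran |= RAN_UNI; return 0; }

/* The unbraced original as the compiler parses it: the else belongs to
 * the inner if, so nothing at all runs when duplex is 0. */
int run_as_parsed(int duplex)
{
    int ran = 0;

    if (duplex) {
        if (run_bi(&ran))
            return -1;
        else if (run_uni(&ran))
            return -1;
    }
    return ran;
}

/* The patched structure: the else belongs to the duplex test. */
int run_braced(int duplex)
{
    int ran = 0;

    if (duplex) {
        if (run_bi(&ran))
            return -1;
    } else {
        if (run_uni(&ran))
            return -1;
    }
    return ran;
}
```

In the unidirectional case the first version runs no test at all, which is consistent with the reported symptom of an 'inf' bandwidth figure computed from timestamps that were never filled in.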
[openib-general] Bad URL on OFED Development Wiki Site
This URL is bad on the OFED Development Wiki page:

https://wiki.openfabrics.org/tiki/tiki-download_file.php?fileId=23

It is supposed to be an OFED Release Process presentation.

Thanks,

Steve.
Re: [openib-general] 2.6.20: outstanding patches and issues
> sean_cm_limit_mra_timeout.patch

I don't believe that I ever sent Roland a patch for merging upstream. The last patch I remember sending was untested and waiting for some feedback. I can resubmit this patch if it is working for you. (Was this in OFED 1.1?)

> There are 3 Sean's patches I think we need
> rdma_ucm: fix reporting events with invalid user context
> rdma_ucm: fix struct ucma_event
> rdma_cm: avoid port reuse after close

The first two were pulled upstream. I have not published the port reuse patch in any git branch yet, but can add it to my multicast-sa_cache branch if needed.

> Dotan reported oops with ucma at openib restart. Sean - any luck in
> reproducing this?

I have not, but maybe there's a difference in our configuration.

- Sean
Re: [openib-general] 2.6.20: outstanding patches and issues
> > sean_cm_limit_mra_timeout.patch
>
> I don't believe that I ever sent Roland a patch for merging upstream. The last patch I remember sending was untested and waiting for some feedback. I can resubmit this patch if it is working for you. (Was this in OFED 1.1?)

Yes, it was in OFED, and it solves a real problem with a misbehaving remote. I did say this works for us, did I not? Let's have this in 2.6.20 - is there a need to resend?

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

> > There are 3 Sean's patches I think we need
> > rdma_ucm: fix reporting events with invalid user context
> > rdma_ucm: fix struct ucma_event
> > rdma_cm: avoid port reuse after close
>
> The first two were pulled upstream. I have not published the port reuse patch in any git branch yet, but can add it to my multicast-sa_cache branch if needed.

OK. The patch is small enough though - I hope it just lands upstream and we don't have to maintain it in a side branch.

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

> > Dotan reported oops with ucma at openib restart. Sean - any luck in reproducing this?
>
> I have not, but maybe there's a difference in our configuration.

Hmm. One of these then. So where do we go from here?

-- 
MST
[openib-general] Fwd: [ANNOUNCE] GIT 1.4.4.4
FYI. The infinite loop fix looks potentially relevant, so I guess we should update staging. Sasha?

----- Forwarded message from Junio C Hamano [EMAIL PROTECTED] -----

Subject: [ANNOUNCE] GIT 1.4.4.4
Date: Mon, 8 Jan 2007 05:30:50 +0200
From: Junio C Hamano [EMAIL PROTECTED]

The latest maintenance release GIT 1.4.4.4 is available at the usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.4.4.4.tar.{gz,bz2} (tarball)
  git-htmldocs-1.4.4.4.tar.{gz,bz2} (preformatted docs)
  git-manpages-1.4.4.4.tar.{gz,bz2} (preformatted docs)
  RPMS/$arch/git-*-1.4.4.4-1.$arch.rpm (RPM)

This is to push out a handful bugfixes since 1.4.4.3. On the 'master' development front, the stabilization for v1.5.0 will start soonish.

Changes since v1.4.4.3 are as follows:

Johannes Schindelin (1):
  diff --check: fix off by one error

Junio C Hamano (3):
  spurious .sp in manpages
  Fix infinite loop when deleting multiple packed refs.
  pack-check.c::verify_packfile(): don't run SHA-1 update on huge data

----- End forwarded message -----

-- 
MST
[openib-general] Infiniband Network Library
Hello,

I have a question that is slightly off topic, but I would think that this would be the place to ask it. So, here goes ...

I have been using InfiniBand here for about 2 years now. I have had to make significant workarounds for our current third-party network API that we purchased, and I continue to watch it fall down and still not take advantage of the bandwidth that I need.

With that said, does anyone on this list have a recommendation for an InfiniBand capable network library?

Thanks in advance,

Sean
[openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts
Limit the timeout that the ib_cm will wait to receive a response to a message, to avoid excessively large (on the order of hours) timeout values. This prevents consuming resources tracking requests for extended periods of time.

This helps correct for a bug in the SRP Engenio target sending a large value (> 1 hour) as a service timeout.

Signed-off-by: Sean Hefty [EMAIL PROTECTED]
---
 drivers/infiniband/core/cm.c | 30 +++---
 1 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d446998..147b41e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -54,6 +54,12 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
+/*
+ * Limit CM msg timeouts to something reasonable.
+ * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
+ */
+#define IB_CM_MAX_TIMEOUT 21
+
 static void cm_add_one(struct ib_device *device);
 static void cm_remove_one(struct ib_device *device);
 
@@ -888,12 +894,12 @@ static void cm_format_req(struct cm_req_msg *req_msg,
 	cm_req_set_resp_res(req_msg, param->responder_resources);
 	cm_req_set_init_depth(req_msg, param->initiator_depth);
 	cm_req_set_remote_resp_timeout(req_msg,
-				       param->remote_cm_response_timeout);
+		min((u8) IB_CM_MAX_TIMEOUT, param->remote_cm_response_timeout));
 	cm_req_set_qp_type(req_msg, param->qp_type);
 	cm_req_set_flow_ctrl(req_msg, param->flow_control);
 	cm_req_set_starting_psn(req_msg, cpu_to_be32(param->starting_psn));
 	cm_req_set_local_resp_timeout(req_msg,
-				      param->local_cm_response_timeout);
+		min((u8) IB_CM_MAX_TIMEOUT, param->local_cm_response_timeout));
 	cm_req_set_retry_count(req_msg, param->retry_count);
 	req_msg->pkey = param->primary_path->pkey;
 	cm_req_set_path_mtu(req_msg, param->primary_path->mtu);
@@ -999,10 +1005,10 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
 	}
 	cm_id->service_id = param->service_id;
 	cm_id->service_mask = __constant_cpu_to_be64(~0ULL);
-	cm_id_priv->timeout_ms = cm_convert_to_ms(
-				    param->primary_path->packet_life_time) * 2 +
-				 cm_convert_to_ms(
-				    param->remote_cm_response_timeout);
+	cm_id_priv->timeout_ms =
+		min(IB_CM_MAX_TIMEOUT,
+		    cm_convert_to_ms(param->primary_path->packet_life_time) * 2 +
+		    cm_convert_to_ms(param->remote_cm_response_timeout));
 	cm_id_priv->max_cm_retries = param->max_cm_retries;
 	cm_id_priv->initiator_depth = param->initiator_depth;
 	cm_id_priv->responder_resources = param->responder_resources;
@@ -1400,8 +1406,9 @@ static int cm_req_handler(struct cm_work *work)
 		}
 	}
 	cm_id_priv->tid = req_msg->hdr.tid;
-	cm_id_priv->timeout_ms = cm_convert_to_ms(
-					cm_req_get_local_resp_timeout(req_msg));
+	cm_id_priv->timeout_ms =
+		min(IB_CM_MAX_TIMEOUT,
+		    cm_convert_to_ms(cm_req_get_local_resp_timeout(req_msg)));
 	cm_id_priv->max_cm_retries = cm_req_get_max_cm_retries(req_msg);
 	cm_id_priv->remote_qpn = cm_req_get_local_qpn(req_msg);
 	cm_id_priv->initiator_depth = cm_req_get_resp_res(req_msg);
@@ -2303,8 +2310,9 @@ static int cm_mra_handler(struct cm_work *work)
 	work->cm_event.private_data = &mra_msg->private_data;
 	work->cm_event.param.mra_rcvd.service_timeout =
 					cm_mra_get_service_timeout(mra_msg);
-	timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
-		  cm_convert_to_ms(cm_id_priv->av.packet_life_time);
+	timeout = min(IB_CM_MAX_TIMEOUT,
+		      cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
+		      cm_convert_to_ms(cm_id_priv->av.packet_life_time));
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	switch (cm_id_priv->id.state) {
@@ -2707,7 +2715,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
 
 	cm_id->service_id = param->service_id;
 	cm_id->service_mask = __constant_cpu_to_be64(~0ULL);
-	cm_id_priv->timeout_ms = param->timeout_ms;
+	cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms);
 	cm_id_priv->max_cm_retries = param->max_cm_retries;
 	ret = cm_alloc_msg(cm_id_priv, &msg);
 	if (ret)
-- 
1.4.4.3
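For readers unfamiliar with the encoding: IB CM timeout fields carry an exponent, and the time they represent is 4.096 us * 2^exponent. The sketch below is a userspace illustration of that arithmetic, not the kernel code; timeout_exp_to_ns and clamp_timeout_exp are hypothetical names. It assumes the limit is applied to the exponent before any conversion to time units, so that both sides of the comparison are in the same unit.

```c
#include <stdint.h>

#define IB_CM_MAX_TIMEOUT 21 /* exponent cap, same value as in the patch */

/* Hypothetical helper: convert an IB timeout exponent to nanoseconds.
 * 4.096 us is 4096 ns, so the result is 4096 * 2^exp. */
static uint64_t timeout_exp_to_ns(uint8_t exp)
{
    return (uint64_t)4096 << exp;
}

/* Hypothetical helper: cap the exponent itself, before conversion. */
static uint8_t clamp_timeout_exp(uint8_t exp)
{
    return exp > IB_CM_MAX_TIMEOUT ? IB_CM_MAX_TIMEOUT : exp;
}
```

With the cap of 21 this gives about 8.6 seconds per attempt, and 15 retries come to roughly 129 seconds, which matches the "2 min" in the patch comment.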
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
> I just noticed that once I apply the patch, the last + lines (that is, pthread_mutex_lock, a while loop doing pthread_cond_wait, and then pthread_mutex_unlock) become part of rdma_leave_multicast, which seems to me strictly buggy as no one is going to wake up this code.

The leave must wait until all events have been reported on the multicast group. There can be more than one event on a group if an error occurs. See ucma_complete_mc_event() for where the condition is signaled.

- Sean
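The wait-and-signal pattern described here can be sketched as follows. This is a simplified illustration, assuming a per-group counter protected by a mutex; struct mc_group, mc_event_completed and mc_wait_for_events are hypothetical names and do not reproduce the librdmacm source.

```c
#include <pthread.h>

/* Hypothetical per-multicast-group state. */
struct mc_group {
    pthread_mutex_t mut;
    pthread_cond_t  cond;
    int             events_completed;
};

/* Called once per processed event; wakes a waiter in the leave path. */
static void mc_event_completed(struct mc_group *mc)
{
    pthread_mutex_lock(&mc->mut);
    mc->events_completed++;
    pthread_cond_signal(&mc->cond);
    pthread_mutex_unlock(&mc->mut);
}

/* Leave path: block until every event the kernel reported has been
 * completed. The loop re-checks the predicate after each wakeup. */
static void mc_wait_for_events(struct mc_group *mc, int events_reported)
{
    pthread_mutex_lock(&mc->mut);
    while (mc->events_completed < events_reported)
        pthread_cond_wait(&mc->cond, &mc->mut);
    pthread_mutex_unlock(&mc->mut);
}
```

The wait only terminates if events_reported is a meaningful count, so the value the kernel returns from the leave call has to be filled in correctly.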
[openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support
This series adds the Chelsio T3 drivers to the ofed_1_2 tree. For this review, I've omitted the patch that actually adds the two drivers themselves, and just included the changes to the ofed_1_2 configuration scripts and the new kernel_patches/ files needed. The driver code itself is on track to go into either 2.6.20 or 2.6.21.

I would appreciate any feedback/comments on what I've done. This is just for review. I'm still testing it.

Here are the key changes:

- The package now needs to visit drivers/net to build the T3 Ethernet driver which is required for the T3 RDMA driver.
- Added a patch to backport the Linux 2.6.20 genalloc() services. I added the allocator as local services to the T3 RDMA module.
- Core changes are required for the T3 driver. This includes the addition of a udata pointer parameter to the ib_req_notify_cq() provider method. This is still being discussed on the openib-general list and I'll update it accordingly once we finalize the solution.

Steve.
[openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.
- rdma core changes needed for T3 Support. - genalloc backport. - modified the qp_num - qp ptr patch to include cxgb3. Signed-off-by: Steve Wise [EMAIL PROTECTED] --- kernel_patches/fixes/genalloc.patch| 392 kernel_patches/fixes/ib_wc_qpn_to_qp.patch | 13 + kernel_patches/fixes/t3_core_changes.patch | 202 ++ 3 files changed, 607 insertions(+), 0 deletions(-) diff --git a/kernel_patches/fixes/genalloc.patch b/kernel_patches/fixes/genalloc.patch new file mode 100644 index 000..c44a98f --- /dev/null +++ b/kernel_patches/fixes/genalloc.patch @@ -0,0 +1,392 @@ +Backport of the Linux 2.6.20 generic allocator. + +From: Steve Wise [EMAIL PROTECTED] + +Signed-off-by: Steve Wise [EMAIL PROTECTED] +--- + + drivers/infiniband/hw/cxgb3/Kconfig |1 + drivers/infiniband/hw/cxgb3/Makefile |3 + drivers/infiniband/hw/cxgb3/core/cxio_hal.h |4 + drivers/infiniband/hw/cxgb3/core/cxio_resource.c | 20 +- + drivers/infiniband/hw/cxgb3/core/cxio_resource.h |2 + drivers/infiniband/hw/cxgb3/core/genalloc.c | 196 ++ + drivers/infiniband/hw/cxgb3/core/genalloc.h | 36 + 7 files changed, 247 insertions(+), 15 deletions(-) + +diff --git a/drivers/infiniband/hw/cxgb3/Kconfig b/drivers/infiniband/hw/cxgb3/Kconfig +index d3db264..0361a72 100644 +--- a/drivers/infiniband/hw/cxgb3/Kconfig b/drivers/infiniband/hw/cxgb3/Kconfig +@@ -1,7 +1,6 @@ + config INFINIBAND_CXGB3 + tristate Chelsio RDMA Driver + depends on CHELSIO_T3 INFINIBAND +- select GENERIC_ALLOCATOR + ---help--- + This is an iWARP/RDMA driver for the Chelsio T3 1GbE and + 10GbE adapters. 
+diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile +index 7a89f6d..12e7a94 100644 +--- a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile +@@ -4,7 +4,8 @@ EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/ + obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o + + iw_cxgb3-y := iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \ +- iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o ++ iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o \ ++ core/genalloc.o + + ifdef CONFIG_INFINIBAND_CXGB3_DEBUG + EXTRA_CFLAGS += -DDEBUG -g +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h +index e5e702d..a9e8452 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h +@@ -104,8 +104,8 @@ struct cxio_rdev { + u32 qpnr; + u32 qpmask; + struct cxio_ucontext uctx; +- struct gen_pool *pbl_pool; +- struct gen_pool *rqt_pool; ++ struct iwch_gen_pool *pbl_pool; ++ struct iwch_gen_pool *rqt_pool; + }; + + static inline int cxio_num_stags(struct cxio_rdev *rdev_p) +diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c +index d1d8722..cecb27b 100644 +--- a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c +@@ -265,7 +265,7 @@ #define PBL_CHUNK 2*1024*1024 + + u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size) + { +- unsigned long addr = gen_pool_alloc(rdev_p-pbl_pool, size); ++ unsigned long addr = iwch_gen_pool_alloc(rdev_p-pbl_pool, size); + PDBG(%s addr 0x%x size %d\n, __FUNCTION__, (u32)addr, size); + return (u32)addr; + } +@@ -273,24 +273,24 @@ u32 cxio_hal_pblpool_alloc(struct cxio_r + void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size) + { + PDBG(%s addr 0x%x size %d\n, __FUNCTION__, addr, size); +- gen_pool_free(rdev_p-pbl_pool, (unsigned long)addr, size); ++ 
iwch_gen_pool_free(rdev_p-pbl_pool, (unsigned long)addr, size); + } + + int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p) + { + unsigned long i; +- rdev_p-pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1); ++ rdev_p-pbl_pool = iwch_gen_pool_create(MIN_PBL_SHIFT, -1); + if (rdev_p-pbl_pool) + for (i = rdev_p-rnic_info.pbl_base; +i = rdev_p-rnic_info.pbl_top - PBL_CHUNK + 1; +i += PBL_CHUNK) +- gen_pool_add(rdev_p-pbl_pool, i, PBL_CHUNK, -1); ++ iwch_gen_pool_add(rdev_p-pbl_pool, i, PBL_CHUNK, -1); + return rdev_p-pbl_pool ? 0 : -ENOMEM; + } + + void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p) + { +- gen_pool_destroy(rdev_p-pbl_pool); ++ iwch_gen_pool_destroy(rdev_p-pbl_pool); + } + + /* +@@ -302,7 +302,7 @@ #define RQT_CHUNK 2*1024*1024 + + u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size) + { +- unsigned long addr = gen_pool_alloc(rdev_p-rqt_pool, size 6); ++ unsigned long addr = iwch_gen_pool_alloc(rdev_p-rqt_pool, size 6); +
[openib-general] [PATCH 2/2] ofed_1_2 Changes to ofed scripts for Chelsio T3 Support.
Signed-off-by: Steve Wise [EMAIL PROTECTED] --- ofed_scripts/Makefile |9 +++-- ofed_scripts/configure | 47 +++ 2 files changed, 54 insertions(+), 2 deletions(-) diff --git a/ofed_scripts/Makefile b/ofed_scripts/Makefile index d63b1d2..049e533 100644 --- a/ofed_scripts/Makefile +++ b/ofed_scripts/Makefile @@ -46,8 +46,10 @@ kernel: @echo Kernel sources: $(KSRC) env EXTRA_CFLAGS=$(OPENIB_KERNEL_EXTRA_CFLAGS) $(KERNEL_MEMTRACK_CFLAGS) -I$(CWD)/include -I$(CWD)/drivers/infiniband/include \ -I$(CWD)/drivers/infiniband/ulp/ipoib \ - -I$(CWD)/drivers/infiniband/debug \ - $(MAKE) -C $(KSRC) SUBDIRS=$(CWD)/drivers/infiniband KERNELRELEASE=$(KVERSION) \ + -I$(CWD)/drivers/infiniband/debug \ + -I$(CWD)/drivers/infiniband/hw/cxgb3/core \ + -I$(CWD)/drivers/net/cxgb3 \ + $(MAKE) -C $(KSRC) SUBDIRS=$(CWD)/drivers/infiniband $(CWD)/drivers/net KERNELRELEASE=$(KVERSION) \ EXTRAVERSION=$(EXTRAVERSION) V=1 $(WITH_MAKE_PARAMS) \ CONFIG_INFINIBAND=$(CONFIG_INFINIBAND) \ CONFIG_INFINIBAND_IPOIB=$(CONFIG_INFINIBAND_IPOIB) \ @@ -74,6 +76,9 @@ kernel: CONFIG_INFINIBAND_VNIC=$(CONFIG_INFINIBAND_VNIC) \ CONFIG_INFINIBAND_VNIC_DEBUG=$(CONFIG_INFINIBAND_VNIC_DEBUG) \ CONFIG_INFINIBAND_VNIC_STATS=$(CONFIG_INFINIBAND_VNIC_STATS) \ + CONFIG_INFINIBAND_CXGB3=$(CONFIG_INFINIBAND_CXGB3) \ + CONFIG_INFINIBAND_CXGB3_DEBUG=$(CONFIG_INFINIBAND_CXGB3_DEBUG) \ + CONFIG_CHELSIO_T3=$(CONFIG_CHELSIO_T3) \ LINUXINCLUDE=' \ $(BACKPORT_INCLUDES) \ -I$(CWD)/include \ diff --git a/ofed_scripts/configure b/ofed_scripts/configure index a0557e2..08f15f5 100755 --- a/ofed_scripts/configure +++ b/ofed_scripts/configure @@ -126,6 +126,12 @@ Usage: `basename $0` [options] --with-vnic_stats-modmake CONFIG_INFINIBAND_VNIC_STATS=y [no] --without-vnic_stats-mod[yes] +--with-cxgb3-modmake CONFIG_INFINIBAND_CXGB3=m [no] +--without-cxgb3-mod[yes] + +--with-cxgb3_debug-modmake CONFIG_INFINIBAND_CXGB3_DEBUG=y [no] +--without-cxgb3_debug-mod[yes] + --help - print out options @@ -607,6 +613,20 @@ main() --without-vnic_stats-mod) 
CONFIG_INFINIBAND_VNIC_STATS= ;; +--with-cxgb3-mod) +CONFIG_INFINIBAND_CXGB3=m + CONFIG_CHELSIO_T3=m +;; +--without-cxgb3-mod) +CONFIG_INFINIBAND_CXGB3= + CONFIG_CHELSIO_T3= +;; +--with-cxgb3_debug-mod) +CONFIG_INFINIBAND_CXGB3_DEBUG=y +;; +--without-cxgb3_debug-mod) +CONFIG_INFINIBAND_CXGB3_DEBUG= +;; --with-modprobe|--without-modprobe) ;; -h | --help) @@ -679,6 +699,8 @@ CONFIG_INFINIBAND_RDS=${CONFIG_INFINIBAN CONFIG_INFINIBAND_RDS_DEBUG=${CONFIG_INFINIBAND_RDS_DEBUG:-''} CONFIG_INFINIBAND_MADEYE=${CONFIG_INFINIBAND_MADEYE:-''} CONFIG_INFINIBAND_VNIC=${CONFIG_INFINIBAND_VNIC:-''} +CONFIG_INFINIBAND_CXGB3=${CONFIG_INFINIBAND_CXGB3:-''} +CONFIG_CHELSIO_T3=${CONFIG_CHELSIO_T3:-''} CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=${CONFIG_INFINIBAND_IPOIB_DEBUG_DATA:-''} CONFIG_INFINIBAND_SDP_SEND_ZCOPY=${CONFIG_INFINIBAND_SDP_SEND_ZCOPY:-''} @@ -689,6 +711,7 @@ CONFIG_INFINIBAND_IPATH=${CONFIG_INFINIB CONFIG_INFINIBAND_MTHCA_DEBUG=${CONFIG_INFINIBAND_MTHCA_DEBUG:-''} CONFIG_INFINIBAND_VNIC_DEBUG=${CONFIG_INFINIBAND_VNIC_DEBUG:-''} CONFIG_INFINIBAND_VNIC_STATS=${CONFIG_INFINIBAND_VNIC_STATS:-''} +CONFIG_INFINIBAND_CXGB3_DEBUG=${CONFIG_INFINIBAND_CXGB3_DEBUG:-''} # Check for minimal supported kernel version if ! 
check_kerver ${KVERSION} ${MIN_KVERSION}; then @@ -742,6 +765,8 @@ CONFIG_INFINIBAND_RDS=${CONFIG_INFINIBAN CONFIG_INFINIBAND_RDS_DEBUG=${CONFIG_INFINIBAND_RDS_DEBUG} CONFIG_INFINIBAND_MADEYE=${CONFIG_INFINIBAND_MADEYE} CONFIG_INFINIBAND_VNIC=${CONFIG_INFINIBAND_VNIC} +CONFIG_INFINIBAND_CXGB3=${CONFIG_INFINIBAND_CXGB3} +CONFIG_CHELSIO_T3=${CONFIG_CHELSIO_T3} CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=${CONFIG_INFINIBAND_IPOIB_DEBUG_DATA} CONFIG_INFINIBAND_SDP_SEND_ZCOPY=${CONFIG_INFINIBAND_SDP_SEND_ZCOPY} @@ -752,6 +777,7 @@ CONFIG_INFINIBAND_IPATH=${CONFIG_INFINIB CONFIG_INFINIBAND_MTHCA_DEBUG=${CONFIG_INFINIBAND_MTHCA_DEBUG} CONFIG_INFINIBAND_VNIC_DEBUG=${CONFIG_INFINIBAND_VNIC_DEBUG} CONFIG_INFINIBAND_VNIC_STATS=${CONFIG_INFINIBAND_VNIC_STATS} +CONFIG_INFINIBAND_CXGB3_DEBUG=${CONFIG_INFINIBAND_CXGB3_DEBUG}
[openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree
I looked at what be the clean fix for the MTT SEG handling in mthca, and I came up with the following (applies on top of the series I posted earlier). I think this gives us an important optimization. Roland, could you please give me a hint whether something like this is too big a change to get into 2.6.20? Arbel does not actually have a concept of MTT segment. So we should set MTT segment size to 64 bit (1 entry) for memfree, otherwise we might be wasting as much as 87% of MTT entries. Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED] --- diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 7131446..968d151 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -1051,11 +1051,7 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev *dev, MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_EQ_OFFSET); dev_lim-max_eqs = 1 (field 0x7); MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MTT_OFFSET); - if (mthca_is_memfree(dev)) - dev_lim-reserved_mtts = ALIGN((1 (field 4)) * sizeof(u64), - MTHCA_MTT_SEG_SIZE) / MTHCA_MTT_SEG_SIZE; - else - dev_lim-reserved_mtts = 1 (field 4); + dev_lim-reserved_mtts = 1 (field 4); MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_MRW_SZ_OFFSET); dev_lim-max_mrw_sz = 1 field; MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_MRW_OFFSET); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index b7e42ef..0973359 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -78,16 +78,17 @@ enum { }; enum { - MTHCA_EQ_CONTEXT_SIZE = 0x40, - MTHCA_CQ_CONTEXT_SIZE = 0x40, - MTHCA_QP_CONTEXT_SIZE = 0x200, - MTHCA_RDB_ENTRY_SIZE = 0x20, - MTHCA_AV_SIZE = 0x20, - MTHCA_MGM_ENTRY_SIZE = 0x40, + MTHCA_EQ_CONTEXT_SIZE= 0x40, + MTHCA_CQ_CONTEXT_SIZE= 0x40, + MTHCA_QP_CONTEXT_SIZE= 0x200, + MTHCA_RDB_ENTRY_SIZE = 0x20, + MTHCA_AV_SIZE= 0x20, + MTHCA_MGM_ENTRY_SIZE = 0x40, + + MTHCA_TAVOR_MTT_SEG_SIZE = 0x40, /* 
Arbel FW gives us these, but we need them for Tavor */ MTHCA_MPT_ENTRY_SIZE = 0x40, - MTHCA_MTT_SEG_SIZE= 0x40, MTHCA_QP_PER_MGM = 4 * (MTHCA_MGM_ENTRY_SIZE / 16 - 2) }; @@ -595,4 +596,8 @@ static inline int mthca_is_memfree(struct mthca_dev *dev) return dev-mthca_flags MTHCA_FLAG_MEMFREE; } +static inline unsigned mthca_mtt_seg_size(struct mthca_dev *dev) +{ + return mthca_is_memfree(dev) ? sizeof(u64) : MTHCA_TAVOR_MTT_SEG_SIZE; +} #endif /* MTHCA_DEV_H */ diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index bbe9143..d9d5b89 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -465,11 +465,11 @@ static int mthca_init_icm(struct mthca_dev *mdev, } /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */ - mdev-limits.reserved_mtts = ALIGN(mdev-limits.reserved_mtts * MTHCA_MTT_SEG_SIZE, - dma_get_cache_alignment()) / MTHCA_MTT_SEG_SIZE; + mdev-limits.reserved_mtts = ALIGN(mdev-limits.reserved_mtts * sizeof(u64), + dma_get_cache_alignment()) / sizeof(u64); mdev-mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca-mtt_base, -MTHCA_MTT_SEG_SIZE, +sizeof(u64), mdev-limits.num_mtt_segs, mdev-limits.reserved_mtts, 1, 0); diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index 88f9dc2..0357dbe 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -212,7 +212,7 @@ static struct mthca_mtt *__mthca_alloc_mtt(struct mthca_dev *dev, int size, mtt-buddy = buddy; mtt-order = 0; - for (i = MTHCA_MTT_SEG_SIZE / 8; i size; i = 1) + for (i = mthca_mtt_seg_size(dev) / sizeof(u64); i size; i = 1) ++mtt-order; mtt-first_seg = mthca_alloc_mtt_range(dev, mtt-order, buddy); @@ -259,7 +259,7 @@ static int __mthca_write_mtt(struct mthca_dev *dev, struct mthca_mtt *mtt, while (list_len 0) { mtt_entry[0] = cpu_to_be64(dev-mr_table.mtt_base + - mtt-first_seg * MTHCA_MTT_SEG_SIZE + + 
mtt-first_seg * mthca_mtt_seg_size(dev) +
Re: [openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.
> - modified the qp_num -> qp ptr patch to include cxgb3.

If you don't mind, this might be better as a separate patch - it's just easier for me to continue pushing this upstream if I can just copy it from OFED sources.

-- 
MST
Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts
> Limit the timeout that the ib_cm will wait to receive a response to a message, to avoid excessively large (on the order of hours) timeout values. This prevents consuming resources tracking requests for extended periods of time.
>
> This helps correct for a bug in the SRP Engenio target sending a large value (> 1 hour) as a service timeout.
>
> Signed-off-by: Sean Hefty [EMAIL PROTECTED]

Very similar code is in OFED 1.1 (we chickened out and had a module parameter to disable this just in case, but I don't think it's really needed upstream).

Acked-by: Michael S. Tsirkin [EMAIL PROTECTED]

-- 
MST
Re: [openib-general] [PATCH 1/2] ofed_1_2 Changes to kernel_patches/ for Chelsio T3 Support.
On Mon, 2007-01-08 at 21:29 +0200, Michael S. Tsirkin wrote:
> > - modified the qp_num -> qp ptr patch to include cxgb3.
>
> If you don't mind, this might be better as a separate patch - it's just easier for me to continue pushing this upstream if I can just copy it from OFED sources.

Ok...that makes sense.
Re: [openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support
> Core changes are required for the T3 driver. This includes the addition of a udata pointer parameter to the ib_req_notify_cq() provider method. This is still being discussed on the openib-general list and I'll update it accordingly once we finalize the solution.

So what I plan to do is review that the patches are in proper format, but delay applying them until this API issue is closed. OK?

-- 
MST
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
> > I just noticed that once I apply the patch, the last + lines (that is, pthread_mutex_lock, a while loop doing pthread_cond_wait, and then pthread_mutex_unlock) become part of rdma_leave_multicast, which seems to me strictly buggy as no one is going to wake up this code.
>
> The leave must wait until all events have been reported on the multicast group. There can be more than one event on a group if an error occurs. See ucma_complete_mc_event() for where the condition is signaled.

OK, got you. However, printing resp->events_reported after the write call returns shows complete junk most of the time, whereas from what you explain here it should be 1 unless some error occurs.

Looking at the ucma kernel code under
http://www2.openfabrics.org/git/?p=~shefty/rdma-dev.git;a=blob;f=drivers/infiniband/core/ucma.c
I think I see the bug: there is no copy_to_user() before ucma_leave_multicast() returns, and hence the response structure in rdma_leave_multicast of librdmacm is not set to anything. What do you say?

Or.
Re: [openib-general] Infiniband Network Library
what is a network library?
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
On 1/8/07, Or Gerlitz [EMAIL PROTECTED] wrote:
> explain here it should be 1 unless some error occurs. Looking on the ucma kernel code under
> http://www2.openfabrics.org/git/?p=~shefty/rdma-dev.git;a=blob;f=drivers/infiniband/core/ucma.c

I have looked in the multicast-sa_cache branch.

Or.
Re: [openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree
Have you tested this? I think it increases the amount of memory needed for the buddy allocator bitmaps by a factor of 8, and right now those bitmaps are kmalloc()ed. So I'd be afraid that it would make it impossible to load the module.

Anyway this is definitely 2.6.21 material given that we're already at 2.6.20-rc4, and this change has a decent chance of introducing regressions.

- R.
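Both the 87% figure in the patch description and the factor of 8 mentioned here follow from the two sizes in the patch: a Tavor MTT segment is 0x40 bytes and one MTT entry is a u64. A small sketch of the arithmetic, with hypothetical helper names (wasted_percent and alloc_units are not mthca functions):

```c
#include <stddef.h>

#define TAVOR_MTT_SEG_SIZE 0x40u /* MTHCA_TAVOR_MTT_SEG_SIZE in the patch */
#define MTT_ENTRY_SIZE     8u    /* sizeof(u64) */

/* Percentage of a segment left unused when only `used` entries are needed
 * (integer division, so 87.5 is reported as 87). */
static unsigned wasted_percent(unsigned seg_size, unsigned used)
{
    unsigned per_seg = seg_size / MTT_ENTRY_SIZE;

    return 100u * (per_seg - used) / per_seg;
}

/* Number of allocation units the allocator must track for a table of
 * table_bytes when the unit is seg_size bytes. */
static size_t alloc_units(size_t table_bytes, unsigned seg_size)
{
    return table_bytes / seg_size;
}
```

A one-entry allocation in a 64-byte segment leaves 7 of 8 slots unused, which is the worst case quoted in the patch. Shrinking the unit from 64 to 8 bytes multiplies the number of units to track by 8 for the same table size.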
Re: [openib-general] [PATCH] 2.6.20 rdma_ucm: fix reporting events with invalid user context
On 1/5/07, Sean Hefty [EMAIL PROTECTED] wrote:
> There's a problem with how rdma cm events are reported to userspace that can lead to application crashes. When a new connection request arrives, a context for the connection is allocated in the kernel. The connection event is then reported to userspace. The userspace library retrieves the event and allocates its own context for the connection. The userspace context is associated with the kernel's context when accepting. This allows the kernel to give userspace context with other events.
>
> A problem occurs if a second event for the same connection occurs before the user has had a chance to call accept. The userspace context has not yet been set, which causes the librdmacm to crash. (This has been seen when the app takes too long to call accept, resulting in the remote side timing out and rejecting the connection.)

Assuming that events are reported in order (correct?), then the user space consumer called rdma_get_cm_event, got a connection request, and before calling rdma_accept called rdma_get_cm_event again and got a connection reject?

Or is it that there are two threads in user space, one calling rdma_get_cm_event and acting on some events by itself while handing other events to another thread? In that case it got the conn request and moved it to the other thread, then got the conn reject and tried to act on it before the other thread called rdma_accept?

Or.
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
> I think to see the bug: there is no copy_to_user() before ucma_leave_multicast() returns and hence the response structure at rdma_leave_multicast of librdmacm is not set to anything, what do you say?

This looks like a problem. I wonder how this is working for me at all... maybe the response structure is being initialized to 0, but this doesn't match up with your debug output... I will look into this more, but the copy_to_user definitely seems to be missing.

- Sean
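A userspace simulation of the failure mode under discussion, assuming a handler that may or may not copy its response back to the caller. Everything here is hypothetical (struct leave_resp, fake_leave and leave_events are illustration names, and memcpy stands in for copy_to_user()); it only shows why the caller sees whatever was in its buffer when the copy-out step is skipped.

```c
#include <string.h>

/* Hypothetical response layout with the one field discussed in the thread. */
struct leave_resp {
    unsigned events_reported;
};

/* Simulated handler. With copy_out == 0 it "forgets" the copy-out step,
 * so the caller's buffer is left untouched. */
static int fake_leave(struct leave_resp *user_resp, int copy_out,
                      unsigned reported)
{
    struct leave_resp resp;

    resp.events_reported = reported;
    if (copy_out)
        memcpy(user_resp, &resp, sizeof resp); /* stands in for copy_to_user() */
    return 0;
}

/* Caller that zeroes its buffer first, so a missing copy-out shows up
 * as 0 rather than as stack junk. */
static unsigned leave_events(int copy_out)
{
    struct leave_resp resp;

    memset(&resp, 0, sizeof resp);
    fake_leave(&resp, copy_out, 1);
    return resp.events_reported;
}
```

Zeroing the buffer does not fix a missing copy-out; it only makes the symptom deterministic, which may explain why the same code can appear to work in one setup and print junk in another.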
Re: [openib-general] [PATCH RFC 0/2] ofed_1_2 - Chelsio T3 RDMA Support
On Mon, 2007-01-08 at 21:57 +0200, Michael S. Tsirkin wrote:
> > Core changes are required for the T3 driver. This includes the addition of a udata pointer parameter to the ib_req_notify_cq() provider method. This is still being discussed on the openib-general list and I'll update it accordingly once we finalize the solution.
>
> So what I plan to do is, review the patches are in proper format, but delay applying until this API issue is closed. OK?

Right. Don't apply these at all. I just wanted folks to look at what I did and make sure it looks ok. I'll repost a final patch set after we resolve this issue.

Thanks,

Steve.
Re: [openib-general] [PATCH] 2.6.20 rdma_ucm: fix reporting events with invalid user context
> Assuming that events are reported in order (correct?) then the user space consumer was calling rdma_get_cm_event, got a connection request and before calling rdma_accept they have called rdma_get_cm_event again and got connection reject ?

The events are reported in order in the kernel, but the same guarantee cannot be made for userspace if an application is processing events using multiple threads. However, in the case where the bug occurred, a single thread was polling for events.

> Or the thing is that there are two threads in user space, one calling rdma_get_cm_event and on some events acting by itself where on other events causing another thread to act, so it got the conn request and moved it to the other thread and then got the conn reject and tried to act on it before the other thread called rdma_accept ?

This was what was happening.

- Sean
Re: [openib-general] Infiniband Network Library
On Mon, Jan 08, 2007 at 12:26:00PM -0600, Sean Hubbell wrote:
> I have had to make significant workarounds for our current third-party network API that we purchased, and I continue to watch it fall down and still not take advantage of the bandwidth that I need.
>
> With that said, does anyone on this list have a recommendation for an InfiniBand capable network library?

To amplify Roland's question: What does this library do that the existing ways of using Infiniband don't? Sockets, verbs, MPI...

-- greg
Re: [openib-general] [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree
Subject: Re: [PATCH untested] IB/mthca: avoid wasting MTT enties on memfree

> Have you tested this?

No, didn't I make this clear? Sorry. I'm not in the lab at the moment, and my laptop does not have infiniband. That's why it says untested in the subject :).

> I think it increases the amount of memory needed for the buddy allocator bitmaps by a factor of 8, and right now those bitmaps are kmalloc()ed. So I'd be afraid that it would make it impossible to load the module.

Hmph. We'll need to make these 2-level then?

> Anyway this is definitely 2.6.21 material given that we're already at 2.6.20-rc4, and this change has a decent chance of introducing regressions.

OK.

-- MST
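To put a rough number on the bitmap concern raised in the message above, here is a minimal sketch of how bitmap memory grows in a generic buddy allocator. It assumes one bit per block at every order and 64-bit bitmap words; `buddy_bitmap_bytes` is a hypothetical helper, not mthca code, and the link between "factor of 8" and three extra orders is an assumption about what the patch changes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of bitmap needed by a buddy allocator that
 * manages 2^max_order units, keeping one bit per block at each order
 * (order 0 has 2^max_order blocks, the top order has a single block).
 * Each per-order bitmap is rounded up to whole 64-bit words.
 */
size_t buddy_bitmap_bytes(unsigned max_order)
{
    size_t words = 0;

    for (unsigned order = 0; order <= max_order; order++) {
        size_t bits = (size_t)1 << (max_order - order);
        words += (bits + 63) / 64;   /* round up to whole words */
    }
    return words * sizeof(uint64_t);
}
```

Tracking 8 times as many units means raising max_order by 3, and for realistic sizes the total approaches 8 times the original, which is the growth the reviewer is worried about when the bitmaps come from kmalloc().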
Re: [openib-general] Infiniband Network Library
Roland Dreier wrote:
> what is a network library?

openpgm and openib are some, but I am looking for one that is a few levels higher or abstracted. I am looking for around 3 or 4 calls to send a message, something like connect, disconnect, send and receive.

Sean
Re: [openib-general] Infiniband Network Library
This would just be a higher level of abstraction... For example code to send 1 msg would look like Connect, Send and Disconnect...

Sean

Greg Lindahl wrote:
> On Mon, Jan 08, 2007 at 12:26:00PM -0600, Sean Hubbell wrote:
> > I have had to make significant workarounds for our current third-party network API that we purchased, and I continue to watch it fall down and still not take advantage of the bandwidth that I need.
> >
> > With that said, does anyone on this list have a recommendation for an InfiniBand capable network library?
>
> To amplify Roland's question: What does this library do that the existing ways of using Infiniband don't? Sockets, verbs, MPI...
>
> -- greg
Re: [openib-general] [PATCHv4] IPoIB CM Experimental support
- Message from Michael S. Tsirkin [EMAIL PROTECTED] on Mon, 8 Jan 2007 18:57:14 +0200 -
To: openib-general@openib.org, Roland Dreier [EMAIL PROTECTED]
Subject: [openib-general] [PATCHv4] IPoIB CM Experimental support

> The following patch adds experimental support for IPoIB connected mode. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system.
>
> Signed-off-by: Michael S. Tsirkin [EMAIL PROTECTED]
> ---
> Sorry about the churn, just fixed a bug in this code.

[SNIP]

> e. Some notes on code
> 1. SRQ is used for scalability to large cluster sizes

I still want to support non-SRQ adapters with this code. Not all systems have 100's or 1000's of endpoints, and those smaller systems will benefit from IPoIB-CM. The larger systems tend to have larger memory per node, so they can support the additional memory requirements.

At the November meeting, one of the main themes from application developers and customers was that we must have a well performing TCP/IP story across as much of the IB space as possible. If only one or two of the IB adapters perform well, then we haven't addressed the customer needs. Adapters that can't support RC are one issue, but for those that do support RC without SRQ, smaller configurations should be able to use IPoIB-CM.

> 2. Only RC connections are used (UC does not support SRQ now)
> 3. Retry count is set to 0 since spec draft warns against retries
> 4. Each connection is used for data transfers in only 1 direction, so each connection is either active (TX) or passive (RX). 2 sides that want to communicate create 2 connections.
> 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
[EMAIL PROTECTED] (845)433-8483 Tie. 293-8483 or wombat2 on NOTES

We are not responsible for the world we are born into, only for the world we leave when we die. So we have to accept what has gone before us and work to change the only thing we can, -- The Future. William Shatner
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
> > I think I see the bug: there is no copy_to_user() before ucma_leave_multicast() returns and hence the response structure at rdma_leave_multicast of librdmacm is not set to anything, what do you say?
>
> This looks like a problem. I wonder how this is working for me at all...

I don't think mckey calls rdma_leave_multicast so maybe this is why you did not notice the problem?

Or.
Re: [openib-general] Infiniband Network Library
> > what is a network library?
>
> openpgm and openib are some, but I am looking for one that is a few levels higher or abstracted. I am looking for around 3 or 4 calls to send a message, something like connect, disconnect, send and receive.

PGM is transport level, isn't it? So a few levels higher would be the Application layer in the OSI model ... Are you looking for something that works with e.g. SQL queries?

-- MST
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
On 1/8/07, Sean Hefty [EMAIL PROTECTED] wrote:
> > I just noticed that once i apply the patch, the last + lines (that is pthread_mutex_lock, while loop doing pthread_cond_wait and then pthread_mutex_unlock) become part of rdma_leave_multicast which seems to me strictly buggy as no one is going to wake up this code.
>
> The leave must wait until all events have been reported on the multicast group. There can be more than one event on a group if an error occurs. See ucma_complete_mc_event() for where the condition is signaled.

Let me see if I follow your design: mc->events_completed is incremented in the library when the consumer calls rdma_ack_cm_event(), and resp->events_reported is incremented in the kernel when the user calls rdma_get_cm_event() ?

If this is indeed the case, the design seems fine to me. Otherwise it might be problematic, e.g. if it does not support the case where there was a multicast error but the user did not consume the associated event and now wants to call rdma_leave_multicast().

Or.
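The wait-until-acknowledged pattern discussed in the message above can be sketched with plain pthreads. This is only an illustration under the assumption that Or's reading of the counters is right; `struct mc_state`, `mc_ack_event` and `mc_wait_for_acks` are hypothetical names, not librdmacm code.

```c
#include <pthread.h>

/* Hypothetical per-group state; not the librdmacm structure. */
struct mc_state {
    pthread_mutex_t mut;
    pthread_cond_t  cond;
    unsigned        events_completed;  /* events the consumer has acked */
};

#define MC_STATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Called when the consumer acknowledges one event on the group. */
void mc_ack_event(struct mc_state *mc)
{
    pthread_mutex_lock(&mc->mut);
    mc->events_completed++;
    pthread_cond_signal(&mc->cond);    /* wake a waiter blocked in leave */
    pthread_mutex_unlock(&mc->mut);
}

/*
 * Called on leave: block until every event that was reported on the
 * group has been acknowledged. The while loop guards against spurious
 * wakeups and against acks that are still outstanding.
 */
void mc_wait_for_acks(struct mc_state *mc, unsigned events_reported)
{
    pthread_mutex_lock(&mc->mut);
    while (mc->events_completed < events_reported)
        pthread_cond_wait(&mc->cond, &mc->mut);
    pthread_mutex_unlock(&mc->mut);
}
```

If an event was reported but never fetched and acknowledged by the consumer, a wait like this would block, which is the corner case Or asks about.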
Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Wise
> Sent: Monday, January 08, 2007 6:55 AM
> To: Mirko Benz
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] [PATCH] rdma_cm iWARP connection setup timeouts reported as rejects.
>
> On Mon, 2007-01-08 at 12:13 +0100, Mirko Benz wrote:
> > Hi,
> >
> > What could be the reasons for these timeouts to occur?
>
> One way: If the host is not reachable but the next hop neighbour is, then the connection attempt will time out. Another way is if, for some reason, the MPA negotiation doesn't complete in a timely manner. For instance, if the passive side never rdma_accept()s the connection, then the active side should eventually time out the attempt and return a timeout error to the consumer.

One very important additional example of MPA negotiation failure is the case where only one end of the TCP connection was anticipating the usage of MPA. For example, if an ssh client mistakenly tried to connect to an iWARP port, both sides would just sit there waiting for the other one to say something. An eventual timeout is the only way out of this.

> > How should an application handle this?
>
> Applications should handle connection timeouts however they want. Usually they just report it to the user.

One way to look at it is that host unreachable is an *optimized* error report that deals with certain conditions where the unreachability can be quickly determined. In the more general case, the fact that a given host/service is currently unavailable is only known by its failure to answer.

In most cases corrective action (either get the remote service restarted, make the path to it work, or select another service) is up to the user.
Re: [openib-general] Infiniband Network Library
On 1/8/07, Sean Hubbell [EMAIL PROTECTED] wrote:
> This would just be a higher level of abstraction... For example code to send 1 msg would look like Connect, Send and Disconnect...

From your email I understand that using BSD sockets over IB ULPs such as IPoIB UD, IPoIB CM or SDP is not enough for the performance enhancement you want to get with IB.

Can you share what you are hunting for, i.e. which of the following measures: BW / LAT / PPS / CPU %, and for which msg size: huge/big/med/small

Or.
Re: [openib-general] [PATCH] librdmacm: updated librdmacm to work with proposed 2.6.20 kernel CMA
> I don't think mckey calls rdma_leave_multicast so maybe this is why you did not notice the problem?

Yep - this was the case. I've updated mckey and created a patch for the kernel, which I'll push out through my rdma-dev tree shortly. Thanks for the report.

- Sean
Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc
This change makes sense to me. Does anyone object to queueing this for 2.6.21?

- R.
Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc
Ok with me.

On Mon, 2007-01-08 at 13:40 -0800, Roland Dreier wrote:
> This change makes sense to me. Does anyone object to queueing this for 2.6.21?
>
> - R.
Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc
Quoting Roland Dreier [EMAIL PROTECTED]:
Subject: Re: [PATCH RFC] return qp pointer as part of ib_wc

> This change makes sense to me. Does anyone object to queueing this for 2.6.21?

And for-mm, pls: last version of IPoIB CM patch needs this.

-- MST
[openib-general] [RFC] userspace IB SA support
Today, userspace support for SA related operations is limited to the libibmad interface, which supports sending and receiving MADs only. I've been assigned the task of exposing multicast and informinfo support to userspace. Specifically, the following functionality is needed:

1. Join a multicast group - needs to use the ib_sa multicast capability.
2. Receive notification of multicast errors.
3. Leave a multicast group.
4. Register to receive SA events - needs to use the ib_sa notice capability.
5. Receive notification of events.
6. Deregister from SA events.

Are there any preferences for how this is added?

- Sean
Re: [openib-general] [PATCH] 2.6.20 ib_cm: limit cm message timeouts
This all looks rather fishy:

> +/*
> + * Limit CM msg timeouts to something reasonable.
> + * 8 seconds, with up to 15 retries, gives per msg timeout of 2 min.
> + */
> +#define IB_CM_MAX_TIMEOUT 21

OK... (although 8 seconds seems a little short -- it seems a somewhat longer timeout could be legitimate on a very busy fabric across a WAN or something like that) but then...

> +timeout = min(IB_CM_MAX_TIMEOUT,
> +              cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) +
> +              cm_convert_to_ms(cm_id_priv->av.packet_life_time));

should the IB_CM_MAX_TIMEOUT be inside a cm_convert_to_ms() too? and similarly...

> -cm_id_priv->timeout_ms = param->timeout_ms;
> +cm_id_priv->timeout_ms = min(IB_CM_MAX_TIMEOUT, param->timeout_ms);

is timeout_ms misnamed, or did we just limit all timeouts to 21 msecs?

...and other places in the patch seem to have similar problems.

Also, I would like to see warning messages like

    ib_cm: Possibly bogus timeout of xx (yy msecs) in REP from GID

printed in the kernel log so people realize they have broken SRP targets or whatever.

- R.
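The unit mismatch questioned in the review above can be illustrated with a small standalone sketch. It assumes the usual IB encoding where a timeout value n means 4.096 us * 2^n, and approximates 4.096 us * 2^8 as 1 ms; `exp_to_ms`, `clamp_mixed_units` and `clamp_same_units` are hypothetical names for illustration, not the kernel's functions.

```c
/* Same numeric value as IB_CM_MAX_TIMEOUT in the quoted patch. */
#define MAX_TIMEOUT_EXP 21

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Approximate conversion from an IB timeout exponent to milliseconds:
 * 4.096 us * 2^8 is about 1 ms, so the result is 2^(exp - 8) ms.
 * Assumes exp is a 5-bit field (0..31), so the shift cannot overflow.
 */
int exp_to_ms(int exp)
{
    int shift = exp > 8 ? exp - 8 : 0;
    return 1 << shift;
}

/* Compares an exponent with milliseconds: caps everything at 21 ms. */
int clamp_mixed_units(int timeout_ms)
{
    return min_int(MAX_TIMEOUT_EXP, timeout_ms);
}

/* Converts the limit first, so both sides of min() are milliseconds. */
int clamp_same_units(int timeout_ms)
{
    return min_int(exp_to_ms(MAX_TIMEOUT_EXP), timeout_ms);
}
```

Under this approximation an exponent of 21 is 8192 ms, and 15 retries of that is roughly two minutes, which matches the comment in the patch; the mixed-unit clamp instead turns a 5000 ms request into 21 ms.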
Re: [openib-general] 2.6.20: outstanding patches and issues
> fix_query_qp_in_reset.patch

will merge

> ib_verbs_h_missing_kref.patch

does this actually fix any compilation problems? if not I think it's better for 2.6.21.

> mthca_0_fmr_page_fix.patch

already merged in my tree pending a pull, right?

> Patch 5 of 5 is at v3, hope it's all good now.

you only listed 4...

> mthca_1_merge_mr_fmr_on_64bit.patch
> mthca_2_fast_registration.patch
> mthca_3_alloc_consistent.patch
> mthca_4_dma_align_reserved_mtts.patch

still need review but I don't think they're appropriate for 2.6.20 given how much they change some pretty key memory registration stuff.

> mthca_wrid_swap.patch - very small benefit, but very small patch either

Will merge for 2.6.21
Re: [openib-general] [PATCH/RFC] libibverbs: Improve driver loading
Hi Roland/others,

Sorry to be a bit off-topic, but ...

Is this a good time to submit the Transport Neutral Verbs code ? Roland had earlier suggested doing this after all major changes were finished and before libibverbs 1.1 is released. The code is designed to keep the existing exported ibv_*() routines, but these are changed to simply call similarly named rdmav_*() routines (also exported) which implement the original code. The intention is to remove or deprecate the use of ibv_*() routines by the next release (1.2?).

Thanks,

- KK

[EMAIL PROTECTED] wrote on 01/05/2007 07:49:39 PM:
> > BTW, the question still stands. If I start trying to play with static linking issues, I'd like to do this based on this patch, not what's in master currently.
>
> Yes, I had hoped to push it out sooner but I wanted to fix all the driver libraries first. I didn't get a chance to finish that up before my vacation, but I will do that soon and post patches for driver libraries when I change libibverbs.
Re: [openib-general] ib_gid_is_link_local
On Fri, Jan 05, 2007 at 11:44:49AM -0500, Hal Rosenstock wrote:
> > However, it might be smart to have opensm consider the routers to be a send-only member for every MLID..
>
> Do you mean non-member rather than send-only member ? Routers need to receive as well as send, right ? Or are you worried about some other issue here ?

I would like it if routers did not have to worry about joins in order to send a multicast packet. There really isn't a good way to know how long to keep a join active for.. Having them be send-only members of every relevant group skips that problem and decreases the latency for first-packet multicast forwarding.

> Also, I'm still not sure about a couple of aspects of every MLID:
> 1. Wouldn't the router only want to be full member of link local scoped MGIDs (that it was interested in locally) ? Are you saying any local scoped MGIDs not of interest would just get dropped anyhow ? If that is the point here, that would work but isn't there a performance impact of doing so ?
> 2. Similarly for any other (non local scope) MGRPs which do not match across any router ports, isn't there a performance impact of receiving and then having to drop/filter these packets ?

I think there is a balance to be had here. On one side, if you have a lot of multicast groups (i.e. IPv6 SNMs) then requiring a lot of extra work to keep the SM informed about what is going on is more harmful than having the router get more multicast traffic than it optimally could. A router must already keep track of what multicast groups are forwarded to what ports, so it is virtually free for it to also do filtering.

[Aside: The more I think about scaling a router up the more it seems to me that the router and SM need a lot of intercommunication. The most efficient thing would be if the router could maintain a replica of the entire SM database for paths and multicast. The router would then always be ready to handle any incoming packet, just like an IB switch.]

> Right, the router is some sort of member on the MGRPs of interest. I think you are trying to make that list of MGRPs of interest simpler and utilize filtering where not needed (as I mentioned above), but I may

Yes. Simpler, I hope :

> > An onlink line in the routing table just terminates the routing lookup. 'unreachable' is another termination. A via line changes the next hop GID and creates more lookups until an onlink is reached.
>
> So is the specification of all multicast as onlink a short term thing then ? Also, with using onlink for all multicast, is there some forwarding determination made somewhere in the router stack ?

I think it is useful to keep the router stack and the SM stack separate, especially when it comes to multicast. The router will have a multicast routing table that works somewhat differently than the unicast table. This table would indicate which ports in the router are part of each group. The SM should only need a MGID to MLID path translation. This is similar to the distinction between a host routing table and a router routing table in IP land - where hosts generally do not have multicast routing information.

> > Yes, I have no idea how IPv6 will work with large inter subnet clusters either.
>
> We had a thread on this a while ago and I think it died out at that point. To state the obvious, I think some changes need to be made for IPv6 to work well with current IB hardware or perhaps some configuration restrictions ?

Yes, I agree. I wonder if the IPoIB RFC authors considered the negative impact of IPv6 SNM when they designed the specification? It would be much better if 1 IP subnet = 1 IB subnet.

> > Yes, but in this case I don't think multicast routing can be pushed to the host. It is either the router or some combination of the router and the SM.
>
> I'm not quite following you on this yet. Why/how is host multicast routing any different (than unicast) ?

Well, I can't see how to make this situation sane if the host is in control and the routers/sm are fairly passive:
- Two subnets, each with nodes joined to multicast group M
- Two routers connecting the two subnets (multipath)
- Each host has an inter-subnet multicast spanning tree and knows which router to use for M
- Host sends a packet for M, what LRH does it use?

Jason
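The onlink / via / unreachable lookup quoted in the message above can be sketched as a small resolver. This is an illustration only: destinations are matched by exact string rather than by GID prefix, and `struct route`, `find_route` and `resolve_next_hop` are hypothetical names, not part of any IB stack.

```c
#include <stddef.h>
#include <string.h>

enum route_kind { ROUTE_ONLINK, ROUTE_VIA, ROUTE_UNREACHABLE };

/* Hypothetical routing table entry. */
struct route {
    const char     *dest;  /* destination this line applies to */
    enum route_kind kind;
    const char     *via;   /* next hop, used only for ROUTE_VIA */
};

static const struct route *find_route(const struct route *table, size_t n,
                                      const char *dest)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].dest, dest) == 0)
            return &table[i];
    return NULL;
}

/*
 * Follow "via" lines until an "onlink" line terminates the lookup.
 * Returns the on-link next hop, or NULL when the destination is
 * unreachable, has no matching line, or the via chain loops.
 */
const char *resolve_next_hop(const struct route *table, size_t n,
                             const char *dest)
{
    const char *hop = dest;

    for (size_t depth = 0; depth <= n; depth++) {
        const struct route *r = find_route(table, n, hop);

        if (r == NULL || r->kind == ROUTE_UNREACHABLE)
            return NULL;
        if (r->kind == ROUTE_ONLINK)
            return hop;
        hop = r->via;            /* a via line triggers another lookup */
    }
    return NULL;                 /* more hops than lines: a loop */
}
```

Bounding the number of lookups by the table size is a design choice made here so that a misconfigured via cycle fails instead of spinning.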
Re: [openib-general] 2.6.20: outstanding patches and issues
Quoting Roland Dreier [EMAIL PROTECTED]:
Subject: Re: 2.6.20: outstanding patches and issues

> > fix_query_qp_in_reset.patch
>
> will merge
>
> > ib_verbs_h_missing_kref.patch
>
> does this actually fix any compilation problems? if not I think it's better for 2.6.21.

2.6.21 then.

> > mthca_0_fmr_page_fix.patch
>
> already merged in my tree pending a pull, right?

Yes.

> > Patch 5 of 5 is at v3, hope it's all good now.
>
> you only listed 4...
>
> > mthca_1_merge_mr_fmr_on_64bit.patch
> > mthca_2_fast_registration.patch
> > mthca_3_alloc_consistent.patch
> > mthca_4_dma_align_reserved_mtts.patch

Because I counted mthca_0_fmr_page_fix.patch

> still need review but I don't think they're appropriate for 2.6.20 given how much they change some pretty key memory registration stuff.

Hmph. I was afraid you'd say this. The only reason I'm surprised is that these do fix FMR on non-cache-coherent architectures - it's a bug fix, not just a feature series. And you did say (patches 1-2 are what was posted then):

http://article.gmane.org/gmane.linux.drivers.openib/34184/match=patchv2+mthca+speed+memory+registration+filling+mtts+directly

> I think this still can go into 2.6.20 after -rc1 if we can get this fixed up.

-- MST
[openib-general] [PATCH][MINOR] OpenSM/osm_ucast_updn.c: Handle failed memory allocation
OpenSM/osm_ucast_updn.c: Handle failed memory allocation

Signed-off-by: Hal Rosenstock [EMAIL PROTECTED]

diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c
index 7fa119e..3d96478 100644
--- a/osm/opensm/osm_ucast_updn.c
+++ b/osm/opensm/osm_ucast_updn.c
@@ -628,6 +628,11 @@ updn_init(
       if (strlen(line) > 1)
       {
         p_tmp = malloc(sizeof(uint64_t));
+        if (!p_tmp)
+        {
+          status = IB_ERROR;
+          goto Exit;
+        }
         *p_tmp = strtoull(line, NULL, 16);
         cl_list_insert_tail(p_updn->p_root_nodes, p_tmp);
       }
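The check added by the patch above follows a common pattern: test the result of malloc() before dereferencing it. A minimal standalone sketch of the same idea, outside OpenSM, might look like this; `parse_guid_line` is a hypothetical helper and not a function in osm_ucast_updn.c.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one line holding a hexadecimal GUID into a
 * heap-allocated uint64_t. Returns NULL if the allocation fails, so the
 * caller can report an error instead of dereferencing a null pointer.
 * The caller owns the returned memory and must free() it.
 */
uint64_t *parse_guid_line(const char *line)
{
    uint64_t *p_tmp = malloc(sizeof(*p_tmp));

    if (!p_tmp)
        return NULL;             /* allocation failed: nothing to fill in */

    *p_tmp = strtoull(line, NULL, 16);
    return p_tmp;
}
```

In the patch the failure path sets status to IB_ERROR and jumps to the Exit label; returning NULL here plays the same role for a standalone function.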
[openib-general] nightly osm_sim report 2007-01-09:normal completion
OSM Simulation Regression Summary

OpenSM rev = Mon_Jan_8_12:41:44_2007 064f5e
ibutils rev = Wed_Jan_3_11:42:12_2007 913448

Total=410 Pass=410 Fail=0

Pass:
30 Stability IS1-16.topo
30 Pkey IS1-16.topo
30 OsmTest IS1-16.topo
30 OsmStress IS1-16.topo
30 Multicast IS1-16.topo
30 LidMgr IS1-16.topo
10 Stability IS3-loop.topo
10 Stability IS3-128.topo
10 Pkey IS3-128.topo
10 OsmTest IS3-loop.topo
10 OsmTest IS3-128.topo
10 OsmStress IS3-128.topo
10 Multicast IS3-loop.topo
10 Multicast IS3-128.topo
10 LidMgr IS3-128.topo
10 FatTree part-4-ary-3-tree.topo
10 FatTree merge-roots-reorder-4-ary-2-tree.topo
10 FatTree merge-roots-4-ary-2-tree.topo
10 FatTree merge-root-4-ary-3-tree.topo
10 FatTree merge-root-12-ary-2-tree.topo
10 FatTree merge-2-ary-4-tree.topo
10 FatTree half-4-ary-3-tree.topo
10 FatTree blend-4-ary-2-tree.topo
10 FatTree 4-ary-4-tree.topo
10 FatTree 4-ary-3-tree.topo
10 FatTree 32nodes-3lvl-is1.topo
10 FatTree 2-ary-4-tree.topo
10 FatTree 12-node-spaced.topo
10 FatTree 12-ary-2-tree.topo

Failures:
Re: [openib-general] ib_gid_is_link_local
On Tue, 2007-01-09 at 00:22, Jason Gunthorpe wrote:
> On Fri, Jan 05, 2007 at 11:44:49AM -0500, Hal Rosenstock wrote:
> > > However, it might be smart to have opensm consider the routers to be a send-only member for every MLID..
> >
> > Do you mean non-member rather than send-only member ? Routers need to receive as well as send, right ? Or are you worried about some other issue here ?
>
> I would like it if routers did not have to worry about joins in order to send a multicast packet.

Send-only members are not supposed to receive. How do they receive then ? Send-only members do not receive. Don't routers need to receive multicast (as well as send) ?

> There really isn't a good way to know how long to keep a join active for..

What about MGID creation/deletion events ?

> Having them be send-only members of every relevant group

How does the router know every relevant group ?

> skips that problem and decreases the latency for first-packet multicast forwarding.

It could be a benefit on first packet MC forwarding on a new group but it depends on when the first packet is received relative to the group detected and joined.

> > Also, I'm still not sure about a couple of aspects of every MLID:
> > 1. Wouldn't the router only want to be full member of link local scoped MGIDs (that it was interested in locally) ? Are you saying any local scoped MGIDs not of interest would just get dropped anyhow ? If that is the point here, that would work but isn't there a performance impact of doing so ?
> > 2. Similarly for any other (non local scope) MGRPs which do not match across any router ports, isn't there a performance impact of receiving and then having to drop/filter these packets ?
>
> I think there is a balance to be had here. On one side, if you have a lot of multicast groups (i.e. IPv6 SNMs) then requiring a lot of extra work to keep the SM informed about what is going on is more harmful than having the router get more multicast traffic than it optimally could. A router must already keep track of what multicast groups are forwarded to what ports, so it is virtually free for it to also do filtering.
>
> [Aside: The more I think about scaling a router up the more it seems to me that the router and SM need a lot of intercommunication. The most efficient thing would be if the router could maintain a replica of the entire SM database for paths and multicast. The router would then always be ready to handle any incoming packet, just like an IB switch.]

There is no IBA standard for replicating the SM or SA database. This is a similar issue which multiple SMs in the same subnet might have depending on the approach taken for this.

> > Right, the router is some sort of member on the MGRPs of interest. I think you are trying to make that list of MGRPs of interest simpler and utilize filtering where not needed (as I mentioned above), but I may
>
> Yes. Simpler, I hope :
>
> > > An onlink line in the routing table just terminates the routing lookup. 'unreachable' is another termination. A via line changes the next hop GID and creates more lookups until an onlink is reached.
> >
> > So is the specification of all multicast as onlink a short term thing then ? Also, with using onlink for all multicast, is there some forwarding determination made somewhere in the router stack ?
>
> I think it is useful to keep the router stack and the SM stack separate.

Yes, but on the other hand, you just said you wanted a partial copy of the SM database for the router...

> Especially when it comes to multicast. The router will have a multicast routing table that works somewhat differently than the unicast table. This table would indicate which ports in the router are part of each group. The SM should only need a MGID to MLID path translation.

Doesn't it already have this ?

> This is similar to the distinction between a host routing table and a router routing table in IP land - where hosts generally do not have multicast routing information.

I don't think the SM needs the multicast routing (intersubnet) information either.

> > > Yes, I have no idea how IPv6 will work with large inter subnet clusters either.
> >
> > We had a thread on this a while ago and I think it died out at that point. To state the obvious, I think some changes need to be made for IPv6 to work well with current IB hardware or perhaps some configuration restrictions ?
>
> Yes, I agree. I wonder if the IPoIB RFC authors considered the negative impact of IPv6 SNM when they designed the specification?

Not sure.

> It would be much better if 1 IP subnet = 1 IB subnet.

Yes, it would be better in terms of this but it was an architectural goal to allow flexibility in the IP - IB subnet mappings. There was a lot of discussion about this and there were earlier schemes which restricted to that mapping.

> > > Yes, but in this case I don't think multicast routing can be pushed to the host. It is either the router or some combination of the router
Re: [openib-general] [PATCH RFC] return qp pointer as part of ib_wc
Roland Dreier wrote:
> This change makes sense to me. Does anyone object to queueing this for 2.6.21?

Indeed, it makes much sense. Do you have any idea what it would take to expose this capability through libibverbs as well?

Or.
Re: [openib-general] [RFC] userspace IB SA support
Sean Hefty wrote:
> Today, userspace support for SA related operations is limited to the libibmad interface, which supports sending and receiving MADs only. I've been assigned the task of exposing multicast and informinfo support to userspace. Specifically, the following functionality is needed:
>
> 1. Join a multicast group - needs to use the ib_sa multicast capability.
> 2. Receive notification of multicast errors.
> 3. Leave a multicast group.
> 4. Register to receive SA events - needs to use the ib_sa notice capability.
> 5. Receive notification of events.
> 6. Deregister from SA events.
>
> Are there any preferences for how this is added?

What about path query or any SA query from the user level ?

Thanks
Dotan
Re: [openib-general] second version of the libibverbs man pages
[EMAIL PROTECTED] wrote:
> Hi all and Happy new year.
>
> * I rewrote the man pages and removed all of the extra characters of the POD module (according to Roland's request).
> * I tried to stick with the 80 characters limit (according to James' request), without 100% success (when I described the attributes of the structures, I needed more than 80 characters in a line..)
> * Several spelling mistakes were fixed
>
> Roland, what do you think? Can you use this version and check in those files?

Roland, do you plan to check in these man pages?

thanks
Dotan
Re: [openib-general] ib_gid_is_link_local
On Tue, Jan 09, 2007 at 01:17:47AM -0500, Hal Rosenstock wrote: I would like it if routers did not have to worry about joins in order to send a multicast packet. Send-only members are not supposed to receive. How do they receive then ? Send-only members do not receive. Don't routers need to receive multicast (as well as send) ? How about more exactly: The SM could implicitly consider the router as a send-only member of all forwardable multicast groups until a reason arises for it to be a full member.

Anyhow, I think this discussion has lost context and we are not thinking about the same things. Let me describe to you how I think that a router today can implement multicast without special SM support using IBA defined protocols: Let me try to do that:

- The router maintains a table of all forwardable multicast groups on each IB subnet that it is connected to.
- It also tracks, for each router port, which groups have receivers on the local subnet. If a group has receivers, it is flagged 'rxer' on that port, otherwise 'txonly'.
- This table is kept in sync with the SM by using SM traps and SM queries.
- Each router port then computes a set of joins to perform on the local subnet based on this table:

  Join Type   Local_MGID   Remote_MGID
  none        none         txonly
  none        none         rxer
  none        txonly       none
  none        txonly       txonly       [No receiver]
  full        txonly       rxer         [Only remote receiver]
  none        rxer         none
  send-only   rxer         txonly       [Only local receiver]
  full        rxer         rxer         [Both receiver]

  Remote_MGID would be rxer if any other participating port has a rxer flag for this MGID. [participating port being derived from a multicast routing protocol]. (How exactly to determine the rxer/txonly flag and if this optimization is even really necessary is not something I have spent a lot of time on just yet - but this conceptually describes the optimal, minimum spanning methodology.)
- The router connects to other routers on the local subnet and performs a multicast routing protocol to produce an inter-subnet multicast spanning tree for each MGID.
The results of that protocol determine which ports participate in each MGID.
- Finally, the router programs its internal forwarding path.

As an example using IPv6 SNM:
1) A new node comes up on subnet alpha and registers SNM MGID A with full membership.
2) The subnet alpha local router port learns of #1 from the local SM.
3) The router forwards the new MGID to the other routers it is connected to via the multicast routing protocol.
4) On the beta subnet, another node registers as send-only for SNM MGID A.
5) The beta local router port learns of #4 from the local SM.
6) The router inspects its MGID table and finds one of its ports has a path to the rxer in #1. It joins MGID A on subnet beta as a full member, and the other port joins MGID A as a send-only member.
7) The above repeats through the chain of subnets until subnet alpha is reached.
8) A port on the router connected to subnet alpha sees the MGID A creation on one of its other ports and registers as send-only for MGID A on subnet alpha. (similar to step #6)
9) The host sends the SNM, unsubscribes from MGID A and the process reverses itself.

I think this is within what IBA already defines and is pretty much what has to be done today to have a chance of working with existing subnet managers. I don't think it needs changes to the SM. I don't think it scales very well since it requires a lot of exchanges between the SM and the routers. This is more or less what I had in mind during the concall we had last year...

Also, I expect the first SNM message will be lost since the SM will ack the host before the router has received the trap, found the new MGID and joined it. (I don't think that is very good :)

==

The above describes the router as being autonomous of the SM. The routers learn of data the SM has through queries (a pull model). Another approach is to have the SM program the routers explicitly (a push model). In this view the router is more like an IB switch from an SM programming perspective.
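
(To make the join table from the pull model earlier in this message concrete: it reduces to a small decision function. The following is only a rough Python sketch of how I read that table, not anything IBA or an existing SM defines. The names GroupState and join_type are made up for illustration, and I am assuming 'none' means the MGID does not exist on that side. The table has no row for none/none; returning "none" there is my assumption.)

```python
from enum import Enum


class GroupState(Enum):
    """Per-port view of one MGID (hypothetical names, taken from the table)."""
    NONE = "none"      # assumption: the group does not exist on that side
    TXONLY = "txonly"  # group exists, but only send-only members
    RXER = "rxer"      # group has at least one receiver


def join_type(local: GroupState, remote: GroupState) -> str:
    """Return the join a router port performs on its local subnet.

    local  -- state of the MGID on this port's own subnet
    remote -- state of the MGID as seen through the other participating ports
    """
    # If the group is missing on either side there is nothing to forward.
    if local is GroupState.NONE or remote is GroupState.NONE:
        return "none"
    # A remote receiver exists: the port must receive locally and forward.
    if remote is GroupState.RXER:
        return "full"
    # Only a local receiver: the port just injects remote traffic locally.
    if local is GroupState.RXER:
        return "send-only"
    # Both sides are txonly: nobody is listening, so do not join.
    return "none"


if __name__ == "__main__":
    for local in GroupState:
        for remote in GroupState:
            print(f"{local.value:7} {remote.value:7} -> {join_type(local, remote)}")
```

(In the SNM example, step 6 is the txonly/rxer row, giving a full join on subnet beta, and step 8 is the rxer/txonly row, giving a send-only join on subnet alpha.)
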
It has a more complex LinearFDB that uses GIDs rather than LIDs and a more complex multicast table that works on MGIDs rather than MLIDs. As with a switch, the SM would program the router as needed. I view this as being more in line with the IB treatment of the network as a completely managed resource. It should be more efficient since the SM only sends what changes to the routers rather than the routers responding to traps/etc. I'd ultimately like to find other interested people to work on this idea since I think it has merit. It is with this second case where my prior thoughts about optimization strategies make more sense. (I.e., pre-arranging send-only status for the router is an optimization that lets the SM do less work on group creation, the SM