[PATCH] opensm: accept looonnng partition.conf lines
Sometimes we use partition.conf files with so many entries that the 1K
buffer currently used by osm_prtn_config_parse_file() is too small. Use
getline() to avoid that.

Signed-off-by: akep...@sgi.com
---
 opensm/opensm/osm_prtn_config.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
---
diff --git a/opensm/opensm/osm_prtn_config.c b/opensm/opensm/osm_prtn_config.c
index 0d02597..34676ef 100644
--- a/opensm/opensm/osm_prtn_config.c
+++ b/opensm/opensm/osm_prtn_config.c
@@ -400,7 +400,9 @@ skip_header:
 int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,
 			       const char *file_name)
 {
-	char line[1024];
+	char *line = NULL;
+	ssize_t llen;
+	size_t n;
 	struct part_conf *conf = NULL;
 	FILE *file;
 	int lineno;
@@ -415,7 +417,7 @@ int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,

 	lineno = 0;

-	while (fgets(line, sizeof(line) - 1, file) != NULL) {
+	while ((llen = getline(&line, &n, file)) != -1) {
 		char *q, *p = line;

 		lineno++;
@@ -463,6 +465,8 @@ int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,
 		} while (q);
 	}

+	free(line);
+
 	fclose(file);

 	return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
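For readers unfamiliar with getline(), the following is a minimal stand-alone sketch of the same reading pattern, not code from opensm. The helper name longest_line() is hypothetical, and the sketch zero-initializes the capacity variable, which the patch itself does not do (POSIX allows either when the buffer pointer starts out NULL).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of opensm): return the length of the
 * longest line in an open stream, not counting the trailing newline.
 * getline() allocates and grows the buffer, so no fixed limit applies.
 */
static long longest_line(FILE *f)
{
	char *line = NULL;	/* allocated by getline() on first call */
	size_t cap = 0;		/* current capacity of 'line' */
	ssize_t len;
	long longest = 0;

	while ((len = getline(&line, &cap, f)) != -1) {
		if (len > 0 && line[len - 1] == '\n')
			len--;
		if (len > longest)
			longest = len;
	}
	free(line);		/* one buffer is reused for every line */
	return longest;
}
```

Note that getline() takes the addresses of both the buffer pointer and the capacity, and that the caller frees the buffer once, after the loop.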
Re: [PATCH] rdma/ib_cm: check LAP state before sending an MRA
On Wed, Jul 21, 2010 at 04:36:52PM -0700, Hefty, Sean wrote:
...
> Josh or Arthur, can either of you confirm if this patch fixes the
> crashes that you've seen?

I can't. It's been practically impossible for us to reproduce. (Only
our customer seems to have the magic recipe.)

--
Arthur
[PATCH/RESEND] mlx4_core: module param to limit msix vec allocation
The mlx4_core driver allocates 'nreq' msix vectors (and irqs), where:

	nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
		     num_possible_cpus() + 1);

ConnectX HCAs support 512 event queues (4 reserved). On a system with
enough processors, we get:

mlx4_core 0006:01:00.0: Requested 508 vectors, but only 256 MSI-X vectors available, trying again

Further attempts (by other drivers) to allocate interrupts fail, because
mlx4_core got 'em all.

How about this?

Signed-off-by: Arthur Kepner <akep...@sgi.com>
---
 main.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0a316d0 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -68,6 +68,10 @@ static int msi_x = 1;
 module_param(msi_x, int, 0444);
 MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");

+static int max_msi_x_vec = 64;
+module_param(max_msi_x_vec, int, 0444);
+MODULE_PARM_DESC(max_msi_x_vec, "max MSI-X vectors we'll attempt to allocate");
+
 #else /* CONFIG_PCI_MSI */

 #define msi_x (0)
@@ -968,8 +972,10 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 	int i;

 	if (msi_x) {
+		nreq = min_t(int, num_possible_cpus() + 1, max_msi_x_vec);
 		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-			     num_possible_cpus() + 1);
+			     nreq);
+
 		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 		if (!entries)
 			goto no_msi;
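The arithmetic in the patch can be checked outside the kernel. The sketch below is a user-space model only: min_int() stands in for the kernel's min_t(), and the function and parameter names are hypothetical, chosen to mirror the fields the patch reads.

```c
#include <assert.h>

/* Stand-in for the kernel's min_t(int, a, b). */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical user-space model of the proposed nreq calculation:
 * first cap the per-CPU request at max_msi_x_vec, then cap the result
 * at the number of usable (non-reserved) event queues.
 */
static int msix_vectors_to_request(int num_eqs, int reserved_eqs,
				   int possible_cpus, int max_msi_x_vec)
{
	int nreq = min_int(possible_cpus + 1, max_msi_x_vec);

	return min_int(num_eqs - reserved_eqs, nreq);
}
```

With the numbers from the message (512 event queues, 4 reserved) and a machine with 2048 possible CPUs, the default cap of 64 yields a request for 64 vectors; without an effective cap the same machine asks for 508, which matches the log line quoted above.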
Re: [PATCH/RFC] mlx4_core: module param to limit msix vec allocation
On Thu, Jun 17, 2010 at 05:53:58PM +0300, Yevgeny Petrilin wrote:
> I think that this patch would do the job,

(Is that an ack?)

> Anyway we are thinking of ways to change our interrupt allocation scheme.

Would be interested to know what you've got in mind.

--
Arthur
Re: [PATCH/RFC] mlx4_core: module param to limit msix vec allocation
On Sun, Jun 13, 2010 at 09:53:24AM +0300, Eli Cohen wrote:
> how many CPU cores are in your system? What kernel version did you use?

I'm almost certain that it was a 2048 core system (it's not available
right now for me to verify). We used 2.6.32.12 (sles11 sp1).

--
Arthur
[PATCH/RFC] mlx4_core: module param to limit msix vec allocation
The mlx4_core driver allocates 'nreq' msix vectors (and irqs), where:

	nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
		     num_possible_cpus() + 1);

ConnectX HCAs support 512 event queues (4 reserved). On a system with
enough processors, we get:

mlx4_core 0006:01:00.0: Requested 508 vectors, but only 256 MSI-X vectors available, trying again

Further attempts (by other drivers) to allocate interrupts fail, because
mlx4_core got 'em all.

How about this?

Signed-off-by: Arthur Kepner <akep...@sgi.com>
---
 main.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0a316d0 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -68,6 +68,10 @@ static int msi_x = 1;
 module_param(msi_x, int, 0444);
 MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");

+static int max_msi_x_vec = 64;
+module_param(max_msi_x_vec, int, 0444);
+MODULE_PARM_DESC(max_msi_x_vec, "max MSI-X vectors we'll attempt to allocate");
+
 #else /* CONFIG_PCI_MSI */

 #define msi_x (0)
@@ -968,8 +972,10 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
 	int i;

 	if (msi_x) {
+		nreq = min_t(int, num_possible_cpus() + 1, max_msi_x_vec);
 		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-			     num_possible_cpus() + 1);
+			     nreq);
+
 		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 		if (!entries)
 			goto no_msi;
[PATCH/RFC] opensm: toggle sweeping V3
Add option to toggle sweeping from opensm console.

Signed-off-by: Arthur Kepner <akep...@sgi.com>
---
 include/opensm/osm_subnet.h |    6 ++++++
 opensm/osm_console.c        |   32 ++++++++++++++++++++++++++++++++
 opensm/osm_state_mgr.c      |    8 +++++++-
 opensm/osm_subnet.c         |    1 +
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d79ed8f..2a1db99 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -532,6 +532,7 @@ typedef struct osm_subn {
 	boolean_t in_sweep_hop_0;
 	boolean_t first_time_master_sweep;
 	boolean_t coming_out_of_standby;
+	boolean_t sweeping_enabled;
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
@@ -651,6 +652,11 @@ typedef struct osm_subn {
 *		The flag is set true if the SM state was standby and now
 *		changed to MASTER it is reset at the end of the sweep.
 *
+*	sweeping_enabled
+*		FALSE - sweeping is administratively disabled, all
+*		sweeping is inhibited, TRUE - sweeping is done
+*		normally
+*
 *	need_update
 *		This flag should be on during first non-master heavy
 *		(including pre-master discovery stage)
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 968486e..bc7bea3 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -150,6 +150,16 @@ static void help_reroute(FILE * out, int detail)
 	}
 }

+static void help_sweep(FILE * out, int detail)
+{
+	fprintf(out, "sweep [on|off]\n");
+	if (detail) {
+		fprintf(out, "enable or disable sweeping\n");
+		fprintf(out, "   [on] sweep normally\n");
+		fprintf(out, "   [off] inhibit all sweeping\n");
+	}
+}
+
 static void help_status(FILE * out, int detail)
 {
 	fprintf(out, "status [loop]\n");
@@ -427,11 +437,15 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 		p_osm->stats.sa_mads_ignored);
 	fprintf(out, "\n   Subnet flags\n"
 		"   ------------\n"
+		"   Sweeping enabled       : %d\n"
+		"   Sweep interval (seconds) : %d\n"
 		"   Ignore existing lfts   : %d\n"
 		"   Subnet Init errors     : %d\n"
 		"   In sweep hop 0         : %d\n"
 		"   First time master sweep: %d\n"
 		"   Coming out of standby  : %d\n",
+		p_osm->subn.sweeping_enabled,
+		p_osm->subn.opt.sweep_interval,
 		p_osm->subn.ignore_existing_lfts,
 		p_osm->subn.subnet_initialization_error,
 		p_osm->subn.in_sweep_hop_0,
@@ -495,6 +509,23 @@ static void reroute_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 	osm_opensm_sweep(p_osm);
 }

+static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
+{
+	char *p_cmd;
+
+	p_cmd = next_token(p_last);
+	if (!p_cmd ||
+	    (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+		fprintf(out, "Invalid sweep command\n");
+		help_sweep(out, 1);
+	} else {
+		if (strcmp(p_cmd, "on") == 0)
+			p_osm->subn.sweeping_enabled = TRUE;
+		else
+			p_osm->subn.sweeping_enabled = FALSE;
+	}
+}
+
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
 	fflush(p_osm->log.out_port);
@@ -1332,6 +1363,7 @@ static const struct command console_cmds[] = {
 	{"priority", help_priority, priority_parse},
 	{"resweep", help_resweep, resweep_parse},
 	{"reroute", help_reroute, reroute_parse},
+	{"sweep", help_sweep, sweep_parse},
 	{"status", help_status, status_parse},
 	{"logflush", help_logflush, logflush_parse},
 	{"querylid", help_querylid, querylid_parse},
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..81c8f54 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1415,7 +1415,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)

 	switch (signal) {
 	case OSM_SIGNAL_SWEEP:
-		do_sweep(sm);
+		if (!sm->p_subn->sweeping_enabled) {
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "sweeping disabled - "
+				"ignoring signal %s in state %s\n",
+				osm_get_sm_signal_str(signal),
+				osm_get_sm_mgr_state_str(sm->p_subn->sm_state));
+		} else
+			do_sweep(sm);
 		break;
 	case
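The control flow this patch adds (a console command that sets a flag, and a sweep signal that is ignored while the flag is off) can be modelled in a few lines of stand-alone C. This is only a sketch under stated assumptions: struct subnet_model, sweep_command() and handle_sweep_signal() are hypothetical names, not opensm APIs, and the sweep itself is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant part of osm_subn_t. */
struct subnet_model {
	bool sweeping_enabled;
	unsigned sweeps_done;
};

/*
 * Models the console command "sweep [on|off]".
 * Returns 0 on success, -1 for a missing or unrecognized argument,
 * in which case the flag is left unchanged.
 */
static int sweep_command(struct subnet_model *s, const char *arg)
{
	if (!arg || (strcmp(arg, "on") != 0 && strcmp(arg, "off") != 0))
		return -1;
	s->sweeping_enabled = (strcmp(arg, "on") == 0);
	return 0;
}

/*
 * Models the OSM_SIGNAL_SWEEP case: the sweep runs only while
 * sweeping is enabled. Returns true if a sweep was performed.
 */
static bool handle_sweep_signal(struct subnet_model *s)
{
	if (!s->sweeping_enabled)
		return false;	/* signal ignored, as in the patch */
	s->sweeps_done++;
	return true;
}
```

Because the gate sits at the point where the sweep signal is consumed, every source of sweep requests (timer, traps, console) is covered by a single check.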
Re: [PATCH/RFC] opensm: toggle sweeping V2
On Sat, May 22, 2010 at 08:04:31PM +0300, Sasha Khapyorsky wrote:
.
> I still not understand what is wrong with running OpenSM with sweep
> disabled and restarting when a fabric is ready. But anyway a new console
> command looks less aggressive for me than signaling... :)

I think that they found that restarting opensm disrupted running jobs
much more than just pausing/resuming normal sweeping. By pausing/resuming,
they were able to grow the cluster without interrupting the jobs which
were running on the old portion of the cluster.

.
> The questions about patch is below.
.
> > 	/* do a sweep if we received a trap */
> > 	if (sm->p_subn->opt.sweep_on_trap) {
> > -		/* if this is trap number 128 or run_heavy_sweep is TRUE -
> > -		   update the force_heavy_sweep flag of the subnet.
> > -		   Sweep also on traps 144 - these traps signal a change of
> > -		   certain port capabilities.
> > -		   TODO: In the future this can be changed to just getting
> > -		   PortInfo on this port instead of sweeping the entire subnet. */
> > -		if (ib_notice_is_generic(p_ntci) &&
> > -		    (cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 128 ||
> > -		     cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 144 ||
> > -		     run_heavy_sweep)) {
> > -			OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
> > -				"Forcing heavy sweep. Received trap:%u\n",
> > +		if (!sm->p_subn->sweeping_enabled) {
> > +			OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
> > +				"sweeping disabled - ignoring trap %u\n",
> > 				cl_ntoh16(p_ntci->g_or_v.generic.trap_num));
>
> Isn't this case already handled in osm_state_mgr_process() and this code
> addition in osm_trap_rcv.c redundant?

It is redundant. The only reason for it is to log the additional message
about the ignored trap, instead of the less specific "sweeping disabled -
ignoring signal" message.

--
Arthur
[PATCH/RFC] opensm: toggle sweeping V2
One of our customers recently merged some new systems into a large,
existing cluster. They requested a mechanism to prevent opensm from
sweeping while the new equipment was being added to the IB fabric, and
then resume sweeping once they felt confident that the newly added
(sub)fabric was correctly cabled, and fully functional.

They used something similar to the following patch. Comments?

Signed-off-by: Arthur Kepner <akep...@sgi.com>
---
 include/opensm/osm_subnet.h |    6 ++++++
 opensm/osm_console.c        |   32 ++++++++++++++++++++++++++++++++
 opensm/osm_state_mgr.c      |    8 +++++++-
 opensm/osm_subnet.c         |    1 +
 opensm/osm_trap_rcv.c       |   35 +++++++++++++++++++++--------------
 5 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d79ed8f..2a1db99 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -532,6 +532,7 @@ typedef struct osm_subn {
 	boolean_t in_sweep_hop_0;
 	boolean_t first_time_master_sweep;
 	boolean_t coming_out_of_standby;
+	boolean_t sweeping_enabled;
 	unsigned need_update;
 	cl_fmap_t mgrp_mgid_tbl;
 	void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
@@ -651,6 +652,11 @@ typedef struct osm_subn {
 *		The flag is set true if the SM state was standby and now
 *		changed to MASTER it is reset at the end of the sweep.
 *
+*	sweeping_enabled
+*		FALSE - sweeping is administratively disabled, all
+*		sweeping is inhibited, TRUE - sweeping is done
+*		normally
+*
 *	need_update
 *		This flag should be on during first non-master heavy
 *		(including pre-master discovery stage)
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 968486e..bc7bea3 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -150,6 +150,16 @@ static void help_reroute(FILE * out, int detail)
 	}
 }

+static void help_sweep(FILE * out, int detail)
+{
+	fprintf(out, "sweep [on|off]\n");
+	if (detail) {
+		fprintf(out, "enable or disable sweeping\n");
+		fprintf(out, "   [on] sweep normally\n");
+		fprintf(out, "   [off] inhibit all sweeping\n");
+	}
+}
+
 static void help_status(FILE * out, int detail)
 {
 	fprintf(out, "status [loop]\n");
@@ -427,11 +437,15 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
 		p_osm->stats.sa_mads_ignored);
 	fprintf(out, "\n   Subnet flags\n"
 		"   ------------\n"
+		"   Sweeping enabled       : %d\n"
+		"   Sweep interval (seconds) : %d\n"
 		"   Ignore existing lfts   : %d\n"
 		"   Subnet Init errors     : %d\n"
 		"   In sweep hop 0         : %d\n"
 		"   First time master sweep: %d\n"
 		"   Coming out of standby  : %d\n",
+		p_osm->subn.sweeping_enabled,
+		p_osm->subn.opt.sweep_interval,
 		p_osm->subn.ignore_existing_lfts,
 		p_osm->subn.subnet_initialization_error,
 		p_osm->subn.in_sweep_hop_0,
@@ -495,6 +509,23 @@ static void reroute_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 	osm_opensm_sweep(p_osm);
 }

+static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
+{
+	char *p_cmd;
+
+	p_cmd = next_token(p_last);
+	if (!p_cmd ||
+	    (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+		fprintf(out, "Invalid sweep command\n");
+		help_sweep(out, 1);
+	} else {
+		if (strcmp(p_cmd, "on") == 0)
+			p_osm->subn.sweeping_enabled = TRUE;
+		else
+			p_osm->subn.sweeping_enabled = FALSE;
+	}
+}
+
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
 	fflush(p_osm->log.out_port);
@@ -1332,6 +1363,7 @@ static const struct command console_cmds[] = {
 	{"priority", help_priority, priority_parse},
 	{"resweep", help_resweep, resweep_parse},
 	{"reroute", help_reroute, reroute_parse},
+	{"sweep", help_sweep, sweep_parse},
 	{"status", help_status, status_parse},
 	{"logflush", help_logflush, logflush_parse},
 	{"querylid", help_querylid, querylid_parse},
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..81c8f54 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1415,7 +1415,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)

 	switch (signal) {
 	case OSM_SIGNAL_SWEEP:
-		do_sweep(sm
[RESEND] [PATCH/RFC] opensm: toggle sweeping
One of our customers recently merged some new systems into a large,
existing cluster. They requested a mechanism to prevent opensm from
sweeping while the new equipment was being added to the IB fabric, and
then resume sweeping once they felt confident that the newly added
(sub)fabric was correctly cabled, and fully functional.

They used the following patch. Would it be worth adding this (or
something with similar functionality) to opensm?

Signed-off-by: Dale Talcott <dale.r.talc...@nasa.gov>
Signed-off-by: Arthur Kepner <akep...@sgi.com>
---
 main.c          |   16 ++++++++++++++++
 osm_state_mgr.c |    9 ++++++++-
 osm_trap_rcv.c  |   40 ++++++++++++++++++++++++----------------
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0093aa7..c3d71bc 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -86,6 +86,12 @@ static void mark_usr1_flag(int signum)
 	osm_usr1_flag = 1;
 }

+int sweeping = 1;
+static void toggle_sweeping(int signum)
+{
+	sweeping = !sweeping;
+}
+
 static sigset_t saved_sigset;

 static void block_signals()
@@ -99,6 +105,7 @@ static void block_signals()
 #ifndef HAVE_OLD_LINUX_THREADS
 	sigaddset(&set, SIGUSR1);
 #endif
+	sigaddset(&set, SIGUSR2);
 	pthread_sigmask(SIG_SETMASK, &set, &saved_sigset);
 }

@@ -118,6 +125,8 @@ static void setup_signals()
 	act.sa_handler = mark_usr1_flag;
 	sigaction(SIGUSR1, &act, NULL);
 #endif
+	act.sa_handler = toggle_sweeping;
+	sigaction(SIGUSR2, &act, NULL);
 	pthread_sigmask(SIG_SETMASK, &saved_sigset, NULL);
 }

@@ -498,6 +507,7 @@ static int daemonize(osm_opensm_t * osm)
 int osm_manager_loop(osm_subn_opt_t * p_opt, osm_opensm_t * p_osm)
 {
 	int console_init_flag = 0;
+	int prev_sweeping = sweeping;

 	if (is_console_enabled(p_opt)) {
 		if (!osm_console_init(p_opt, &p_osm->console, &p_osm->log))
@@ -524,6 +534,12 @@ int osm_manager_loop(osm_subn_opt_t * p_opt, osm_opensm_t * p_osm)
 			p_osm->subn.force_heavy_sweep = TRUE;
 			osm_opensm_sweep(p_osm);
 		}
+		if (prev_sweeping != sweeping) {
+			prev_sweeping = sweeping;
+			OSM_LOG(&p_osm->log, OSM_LOG_INFO,
+				"Sweeping is now %s\n",
+				(sweeping ? "enabled" : "disabled") );
+		}
 	}
 	if (is_console_enabled(p_opt))
 		osm_console_exit(&p_osm->console, &p_osm->log);
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..e8eb47b 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1405,6 +1405,7 @@ static void do_process_mgrp_queue(osm_sm_t * sm)

 void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
 {
+	extern int sweeping;
 	CL_ASSERT(sm);

 	OSM_LOG_ENTER(sm->p_log);
@@ -1415,7 +1416,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)

 	switch (signal) {
 	case OSM_SIGNAL_SWEEP:
-		do_sweep(sm);
+		if (!sweeping)
+			OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "sweeping disabled - "
+				"ignoring signal %s in state %s\n",
+				osm_get_sm_signal_str(signal),
+				osm_get_sm_mgr_state_str(sm->p_subn->sm_state));
+		else
+			do_sweep(sm);
 		break;
 	case OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST:
 		do_process_mgrp_queue(sm);
diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c
index bf13239..42e9b32 100644
--- a/opensm/opensm/osm_trap_rcv.c
+++ b/opensm/opensm/osm_trap_rcv.c
@@ -332,6 +332,7 @@ static void trap_rcv_process_request(IN osm_sm_t * sm,
 	boolean_t physp_change_trap = FALSE;
 	uint64_t event_wheel_timeout = OSM_DEFAULT_TRAP_SUPRESSION_TIMEOUT;
 	boolean_t run_heavy_sweep = FALSE;
+	extern int sweeping;

 	OSM_LOG_ENTER(sm->p_log);

@@ -515,23 +516,30 @@ static void trap_rcv_process_request(IN osm_sm_t * sm,
 check_sweep:
 	/* do a sweep if we received a trap */
 	if (sm->p_subn->opt.sweep_on_trap) {
-		/* if this is trap number 128 or run_heavy_sweep is TRUE -
-		   update the force_heavy_sweep flag of the subnet.
-		   Sweep also on traps 144 - these traps signal a change of
-		   certain port capabilities.
-		   TODO: In the future this can be changed to just getting
-		   PortInfo on this port instead of sweeping the entire subnet. */
-		if (ib_notice_is_generic(p_ntci) &&
-		    (cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 128 ||
-		     cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 144 ||
-		     run_heavy_sweep
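The signal-driven toggle in this version of the patch can be tried in isolation. The sketch below is a stand-alone illustration, not the patch itself: it declares the flag as volatile sig_atomic_t, which is a substitution made here because that is the type C guarantees can be safely written from a signal handler (the patch uses a plain int), and install_toggle_handler() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assumption of this sketch: sig_atomic_t instead of the patch's int. */
static volatile sig_atomic_t sweeping = 1;

/* Handler only flips the flag; logging is left to the main loop. */
static void toggle_sweeping(int signum)
{
	(void)signum;
	sweeping = !sweeping;
}

/* Hypothetical helper: install the SIGUSR2 handler, 0 on success. */
static int install_toggle_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = toggle_sweeping;
	sigemptyset(&act.sa_mask);
	return sigaction(SIGUSR2, &act, NULL);
}
```

After install_toggle_handler() succeeds, each SIGUSR2 delivered to the process (for example via raise(SIGUSR2) or kill -USR2) flips the flag, and the main loop can compare it against a previously saved value to log the change, as osm_manager_loop() does in the patch.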
Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path
On Thu, Feb 25, 2010 at 11:29:02AM -0800, Ralph Campbell wrote:

I haven't looked carefully at the whole patch, but this bit looks wrong:

> @@ -848,61 +823,112 @@ static void ipoib_neigh_cleanup(struct neighbour *n)
>  	struct ipoib_neigh *neigh;
>  	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
>  	unsigned long flags;
> -	struct ipoib_ah *ah = NULL;
> +
> +	spin_lock_irqsave(&priv->lock, flags);
>
>  	neigh = *to_ipoib_neigh(n);
> -	if (neigh)
> -		priv = netdev_priv(neigh->dev);
> -	else
> +	if (neigh) {

Should this be if (!neigh) ?

> +		spin_unlock_irqrestore(&priv->lock, flags);
>  		return;
> +	}
> +	*to_ipoib_neigh(n) = NULL;
> +	neigh->neighbour = NULL;
> +

--
Arthur
Re: IPoIB memory use after free
On Wed, Feb 17, 2010 at 12:02:36PM -0800, Ralph Campbell wrote:
> I have been tracking down a kernel panic while running qperf udp_bw
> tests and it looks like ib_ipoib is using memory after freeing it.
> The problem is with connected mode. I don't see the panic with
> datagram mode. Looking at the source code, I see that the process of
> creating the QP with the connection manager, ipoib_cm_create_tx(),
> has pointers to struct ipoib_neigh and struct ipoib_path but there
> doesn't seem to be a reference count or struct completion similar to
> the way the SA path record look up process has to prevent this.
>
> I'm working on a patch to test this theory but wanted to post this
> before going too far in case others are already aware of the problem
> and working on it.

Could what you're seeing be related to what's reported here:

http://lists.openfabrics.org/pipermail/general/2008-April/049629.html

?

--
Arthur