[PATCH] opensm: accept looonnng partition.conf lines

2011-01-24 Thread Arthur Kepner

Sometimes we use partition.conf files with so many entries that 
the 1K buffer currently used by osm_prtn_config_parse_file() is 
too small. Use getline() to avoid that.

Signed-off-by: akep...@sgi.com
---
 opensm/opensm/osm_prtn_config.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

---

diff --git a/opensm/opensm/osm_prtn_config.c b/opensm/opensm/osm_prtn_config.c
index 0d02597..34676ef 100644
--- a/opensm/opensm/osm_prtn_config.c
+++ b/opensm/opensm/osm_prtn_config.c
@@ -400,7 +400,9 @@ skip_header:
 int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,
   const char *file_name)
 {
-   char line[1024];
+   char *line = NULL;
+   ssize_t llen;
+   size_t n;
struct part_conf *conf = NULL;
FILE *file;
int lineno;
@@ -415,7 +417,7 @@ int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,
 
lineno = 0;
 
-   while (fgets(line, sizeof(line) - 1, file) != NULL) {
+   while ((llen = getline(&line, &n, file)) != -1) {
char *q, *p = line;
 
lineno++;
@@ -463,6 +465,8 @@ int osm_prtn_config_parse_file(osm_log_t * p_log, osm_subn_t * p_subn,
} while (q);
}
 
+   free(line);
+
fclose(file);
 
return 0;
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
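
For reference, here is a minimal sketch of the getline() idiom the patch above relies on. It is not taken from opensm: longest_line() is a hypothetical helper that only shows that the buffer grows to fit any line length and is freed once after the loop. Note that getline() takes the addresses of both the buffer pointer and the size variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not part of opensm): returns the length of the
 * longest line in fp, newline included, or -1 on a read error. */
static long longest_line(FILE *fp)
{
	char *line = NULL;	/* getline() allocates and grows this buffer */
	size_t cap = 0;		/* current capacity of 'line' */
	ssize_t len;
	long longest = 0;

	assert(fp != NULL);
	while ((len = getline(&line, &cap, fp)) != -1) {
		if (len > longest)
			longest = len;
	}
	free(line);		/* one free() covers every reallocation */
	return ferror(fp) ? -1 : longest;
}
```

Starting with line == NULL and cap == 0 lets getline() do the first allocation itself, so no fixed-size buffer limit applies.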


Re: [PATCH] rdma/ib_cm: check LAP state before sending an MRA

2010-07-22 Thread Arthur Kepner
On Wed, Jul 21, 2010 at 04:36:52PM -0700, Hefty, Sean wrote:
> ...
> Josh or Arthur, can either of you confirm if this patch fixes the
> crashes that you've seen?
>

I can't. It's been practically impossible for us to reproduce.
(Only our customer seems to have the magic recipe.) 

-- 
Arthur


[PATCH/RESEND] mlx4_core: module param to limit msix vec allocation

2010-07-14 Thread Arthur Kepner

The mlx4_core driver allocates 'nreq' msix vectors (and irqs), 
where:

  nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
   num_possible_cpus() + 1);

ConnectX HCAs support 512 event queues (4 reserved). On a system 
with enough processors, we get:

  mlx4_core 0006:01:00.0: Requested 508 vectors, but only 256 MSI-X vectors available, trying again

Further attempts (by other drivers) to allocate interrupts fail, 
because mlx4_core got 'em all.

How about this?

Signed-off-by: Arthur Kepner <akep...@sgi.com>

---

 main.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0a316d0 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -68,6 +68,10 @@ static int msi_x = 1;
 module_param(msi_x, int, 0444);
 MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
 
+static int max_msi_x_vec = 64;
+module_param(max_msi_x_vec, int, 0444);
+MODULE_PARM_DESC(max_msi_x_vec, "max MSI-X vectors we'll attempt to allocate");
+
 #else /* CONFIG_PCI_MSI */
 
 #define msi_x (0)
@@ -968,8 +972,10 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
int i;
 
if (msi_x) {
+   nreq = min_t(int, num_possible_cpus() + 1, max_msi_x_vec);
nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-num_possible_cpus() + 1);
+nreq);
+
entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
if (!entries)
goto no_msi;
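
As a worked check of the clamping in the hunk above, here is a small userspace sketch. It is not kernel code: min_int() stands in for min_t(int, ...), msix_vectors_to_request() is a hypothetical name, and the CPU count of 2048 is an assumed example of a large system. With 512 event queues, 4 reserved and the default max_msi_x_vec of 64, the request drops from 508 to 64.

```c
#include <assert.h>

/* Stand-in for the kernel's min_t(int, a, b). */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Hypothetical userspace model of the nreq computation in the patch:
 * first clamp the per-CPU request to the module parameter, then to the
 * number of non-reserved event queues. */
static int msix_vectors_to_request(int num_eqs, int reserved_eqs,
				   int possible_cpus, int max_msi_x_vec)
{
	int nreq;

	assert(num_eqs >= reserved_eqs);
	nreq = min_int(possible_cpus + 1, max_msi_x_vec);
	return min_int(num_eqs - reserved_eqs, nreq);
}
```

On a small machine the parameter has no effect: with 8 possible CPUs the result is 9, the same as before the patch.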



Re: [PATCH/RFC] mlx4_core: module param to limit msix vec allocation

2010-06-17 Thread Arthur Kepner
On Thu, Jun 17, 2010 at 05:53:58PM +0300, Yevgeny Petrilin wrote:
> I think that this patch would do the job,

(Is that an ack?)

> Anyway we are thinking of ways to change our interrupt allocation scheme.
>

Would be interested to know what you've got in mind.

-- 
Arthur


Re: [PATCH/RFC] mlx4_core: module param to limit msix vec allocation

2010-06-14 Thread Arthur Kepner
On Sun, Jun 13, 2010 at 09:53:24AM +0300, Eli Cohen wrote:
 
> how many CPU cores are in your system? What kernel version did you
> use?

I'm almost certain that it was a 2048 core system (it's not available 
right now for me to verify).

We used 2.6.32.12 (sles11 sp1).

-- 
Arthur


[PATCH/RFC] mlx4_core: module param to limit msix vec allocation

2010-06-10 Thread Arthur Kepner

The mlx4_core driver allocates 'nreq' msix vectors (and irqs), 
where:

  nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
   num_possible_cpus() + 1);

ConnectX HCAs support 512 event queues (4 reserved). On a system 
with enough processors, we get:

  mlx4_core 0006:01:00.0: Requested 508 vectors, but only 256 MSI-X vectors available, trying again

Further attempts (by other drivers) to allocate interrupts fail, 
because mlx4_core got 'em all.

How about this?

Signed-off-by: Arthur Kepner <akep...@sgi.com>

---

 main.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index e3e0d54..0a316d0 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -68,6 +68,10 @@ static int msi_x = 1;
 module_param(msi_x, int, 0444);
 MODULE_PARM_DESC(msi_x, "attempt to use MSI-X if nonzero");
 
+static int max_msi_x_vec = 64;
+module_param(max_msi_x_vec, int, 0444);
+MODULE_PARM_DESC(max_msi_x_vec, "max MSI-X vectors we'll attempt to allocate");
+
 #else /* CONFIG_PCI_MSI */
 
 #define msi_x (0)
@@ -968,8 +972,10 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
int i;
 
if (msi_x) {
+   nreq = min_t(int, num_possible_cpus() + 1, max_msi_x_vec);
nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
-num_possible_cpus() + 1);
+nreq);
+
entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
if (!entries)
goto no_msi;


[PATCH/RFC] opensm: toggle sweeping V3

2010-05-25 Thread Arthur Kepner

Add option to toggle sweeping from opensm console.

Signed-off-by: Arthur Kepner <akep...@sgi.com>

--- 

 include/opensm/osm_subnet.h |    6 ++
 opensm/osm_console.c        |   32 
 opensm/osm_state_mgr.c      |    8 +++-
 opensm/osm_subnet.c         |    1 +
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d79ed8f..2a1db99 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -532,6 +532,7 @@ typedef struct osm_subn {
boolean_t in_sweep_hop_0;
boolean_t first_time_master_sweep;
boolean_t coming_out_of_standby;
+   boolean_t sweeping_enabled;
unsigned need_update;
cl_fmap_t mgrp_mgid_tbl;
void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
@@ -651,6 +652,11 @@ typedef struct osm_subn {
 *  The flag is set true if the SM state was standby and now
 *  changed to MASTER it is reset at the end of the sweep.
 *
+*  sweeping_enabled
+*  FALSE - sweeping is administratively disabled, all
+*  sweeping is inhibited, TRUE - sweeping is done
+*  normally
+*
 *  need_update
 *  This flag should be on during first non-master heavy
 *  (including pre-master discovery stage)
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 968486e..bc7bea3 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -150,6 +150,16 @@ static void help_reroute(FILE * out, int detail)
}
 }
 
+static void help_sweep(FILE * out, int detail)
+{
+   fprintf(out, "sweep [on|off]\n");
+   if (detail) {
+   fprintf(out, "enable or disable sweeping\n");
+   fprintf(out, "[on] sweep normally\n");
+   fprintf(out, "[off] inhibit all sweeping\n");
+   }
+}
+
 static void help_status(FILE * out, int detail)
 {
fprintf(out, "status [loop]\n");
@@ -427,11 +437,15 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
p_osm->stats.sa_mads_ignored);
fprintf(out, "\n   Subnet flags\n"
   "   \n"
+  "Sweeping enabled   : %d\n"
+  "Sweep interval (seconds)   : %d\n"
   "Ignore existing lfts   : %d\n"
   "Subnet Init errors : %d\n"
   "In sweep hop 0 : %d\n"
   "First time master sweep: %d\n"
   "Coming out of standby  : %d\n",
+   p_osm->subn.sweeping_enabled,
+   p_osm->subn.opt.sweep_interval,
p_osm->subn.ignore_existing_lfts,
p_osm->subn.subnet_initialization_error,
p_osm->subn.in_sweep_hop_0,
@@ -495,6 +509,23 @@ static void reroute_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
osm_opensm_sweep(p_osm);
 }
 
+static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
+{
+   char *p_cmd;
+
+   p_cmd = next_token(p_last);
+   if (!p_cmd ||
+   (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+   fprintf(out, "Invalid sweep command\n");
+   help_sweep(out, 1);
+   } else {
+   if (strcmp(p_cmd, "on") == 0)
+   p_osm->subn.sweeping_enabled = TRUE;
+   else
+   p_osm->subn.sweeping_enabled = FALSE;
+   }
+}
+
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
fflush(p_osm->log.out_port);
@@ -1332,6 +1363,7 @@ static const struct command console_cmds[] = {
{"priority", help_priority, priority_parse},
{"resweep", help_resweep, resweep_parse},
{"reroute", help_reroute, reroute_parse},
+   {"sweep", help_sweep, sweep_parse},
{"status", help_status, status_parse},
{"logflush", help_logflush, logflush_parse},
{"querylid", help_querylid, querylid_parse},
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..81c8f54 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1415,7 +1415,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
 
switch (signal) {
case OSM_SIGNAL_SWEEP:
-   do_sweep(sm);
+   if (!sm->p_subn->sweeping_enabled) {
+   OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "sweeping disabled - "
+   "ignoring signal %s in state %s\n",
+   osm_get_sm_signal_str(signal),
+   osm_get_sm_mgr_state_str(sm->p_subn->sm_state));
+   } else
+   do_sweep(sm);
break;
case
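
The on/off handling in sweep_parse() above can be sketched in isolation as follows. This is a standalone illustration, not opensm code: parse_sweep_arg() and apply_sweep_arg() are hypothetical names, and a plain int stands in for the boolean_t sweeping_enabled field. The point is that a missing or unrecognized argument leaves the flag unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 for "on", 0 for "off", and -1 for a
 * missing or unrecognized argument. */
static int parse_sweep_arg(const char *arg)
{
	if (!arg)
		return -1;
	if (strcmp(arg, "on") == 0)
		return 1;
	if (strcmp(arg, "off") == 0)
		return 0;
	return -1;
}

/* Applies the argument to the flag. On bad input the flag is left
 * untouched and -1 is returned so the caller can print the help text. */
static int apply_sweep_arg(const char *arg, int *sweeping_enabled)
{
	int v = parse_sweep_arg(arg);

	assert(sweeping_enabled != NULL);
	if (v < 0)
		return -1;
	*sweeping_enabled = v;
	return 0;
}
```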

Re: [PATCH/RFC] opensm: toggle sweeping V2

2010-05-24 Thread Arthur Kepner
On Sat, May 22, 2010 at 08:04:31PM +0300, Sasha Khapyorsky wrote:
> .
> I still not understand what is wrong with running OpenSM with sweep
> disabled and restarting when a fabric is ready. But anyway a new
> console command looks less aggressive for me than signaling... :)

I think that they found that restarting opensm disrupted running jobs 
much more than just pausing/resuming normal sweeping. By pausing/resuming, 
they were able to grow the cluster without interrupting the jobs which 
were running on the old portion of the cluster. 

> .
> The questions about patch is below.
>
> > .
> > /* do a sweep if we received a trap */
> > if (sm->p_subn->opt.sweep_on_trap) {
> >
> > -   /* if this is trap number 128 or run_heavy_sweep is TRUE -
> > -  update the force_heavy_sweep flag of the subnet.
> > -  Sweep also on traps 144 - these traps signal a change of
> > -  certain port capabilities.
> > -  TODO: In the future this can be changed to just getting
> > -  PortInfo on this port instead of sweeping the entire subnet. */
> > -   if (ib_notice_is_generic(p_ntci) &&
> > -   (cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 128 ||
> > -cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 144 ||
> > -run_heavy_sweep)) {
> > -   OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
> > -   "Forcing heavy sweep. Received trap:%u\n",
> > +   if (!sm->p_subn->sweeping_enabled) {
> > +   OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
> > +   "sweeping disabled - ignoring trap %u\n",
> > cl_ntoh16(p_ntci->g_or_v.generic.trap_num));
>
> Isn't this case already handled in osm_state_mgr_process() and this code
> addition in osm_trap_rcv.c redundant?

It is redundant. The only reason for it is to log the additional message 
about the ignored trap, instead of the less specific "sweeping disabled - 
ignoring signal" message. 

-- 
Arthur


[PATCH/RFC] opensm: toggle sweeping V2

2010-05-19 Thread Arthur Kepner

One of our customers recently merged some new systems into a 
large, existing cluster. They requested a mechanism to prevent 
opensm from sweeping while the new equipment was being added to 
the IB fabric, and then resume sweeping once they felt confident 
that the newly added (sub)fabric was correctly cabled, and fully 
functional. They used something similar to the following patch. 

Comments?

Signed-off-by: Arthur Kepner <akep...@sgi.com>

--- 

 include/opensm/osm_subnet.h |    6 ++
 opensm/osm_console.c        |   32 
 opensm/osm_state_mgr.c      |    8 +++-
 opensm/osm_subnet.c         |    1 +
 opensm/osm_trap_rcv.c       |   35 +--
 5 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
index d79ed8f..2a1db99 100644
--- a/opensm/include/opensm/osm_subnet.h
+++ b/opensm/include/opensm/osm_subnet.h
@@ -532,6 +532,7 @@ typedef struct osm_subn {
boolean_t in_sweep_hop_0;
boolean_t first_time_master_sweep;
boolean_t coming_out_of_standby;
+   boolean_t sweeping_enabled;
unsigned need_update;
cl_fmap_t mgrp_mgid_tbl;
void *mboxes[IB_LID_MCAST_END_HO - IB_LID_MCAST_START_HO + 1];
@@ -651,6 +652,11 @@ typedef struct osm_subn {
 *  The flag is set true if the SM state was standby and now
 *  changed to MASTER it is reset at the end of the sweep.
 *
+*  sweeping_enabled
+*  FALSE - sweeping is administratively disabled, all
+*  sweeping is inhibited, TRUE - sweeping is done
+*  normally
+*
 *  need_update
 *  This flag should be on during first non-master heavy
 *  (including pre-master discovery stage)
diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c
index 968486e..bc7bea3 100644
--- a/opensm/opensm/osm_console.c
+++ b/opensm/opensm/osm_console.c
@@ -150,6 +150,16 @@ static void help_reroute(FILE * out, int detail)
}
 }
 
+static void help_sweep(FILE * out, int detail)
+{
+   fprintf(out, "sweep [on|off]\n");
+   if (detail) {
+   fprintf(out, "enable or disable sweeping\n");
+   fprintf(out, "[on] sweep normally\n");
+   fprintf(out, "[off] inhibit all sweeping\n");
+   }
+}
+
 static void help_status(FILE * out, int detail)
 {
fprintf(out, "status [loop]\n");
@@ -427,11 +437,15 @@ static void print_status(osm_opensm_t * p_osm, FILE * out)
p_osm->stats.sa_mads_ignored);
fprintf(out, "\n   Subnet flags\n"
   "   \n"
+  "Sweeping enabled   : %d\n"
+  "Sweep interval (seconds)   : %d\n"
   "Ignore existing lfts   : %d\n"
   "Subnet Init errors : %d\n"
   "In sweep hop 0 : %d\n"
   "First time master sweep: %d\n"
   "Coming out of standby  : %d\n",
+   p_osm->subn.sweeping_enabled,
+   p_osm->subn.opt.sweep_interval,
p_osm->subn.ignore_existing_lfts,
p_osm->subn.subnet_initialization_error,
p_osm->subn.in_sweep_hop_0,
@@ -495,6 +509,23 @@ static void reroute_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
osm_opensm_sweep(p_osm);
 }
 
+static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
+{
+   char *p_cmd;
+
+   p_cmd = next_token(p_last);
+   if (!p_cmd ||
+   (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+   fprintf(out, "Invalid sweep command\n");
+   help_sweep(out, 1);
+   } else {
+   if (strcmp(p_cmd, "on") == 0)
+   p_osm->subn.sweeping_enabled = TRUE;
+   else
+   p_osm->subn.sweeping_enabled = FALSE;
+   }
+}
+
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
fflush(p_osm->log.out_port);
@@ -1332,6 +1363,7 @@ static const struct command console_cmds[] = {
{"priority", help_priority, priority_parse},
{"resweep", help_resweep, resweep_parse},
{"reroute", help_reroute, reroute_parse},
+   {"sweep", help_sweep, sweep_parse},
{"status", help_status, status_parse},
{"logflush", help_logflush, logflush_parse},
{"querylid", help_querylid, querylid_parse},
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..81c8f54 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1415,7 +1415,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
 
switch (signal) {
case OSM_SIGNAL_SWEEP:
-   do_sweep(sm

[RESEND] [PATCH/RFC] opensm: toggle sweeping

2010-04-28 Thread Arthur Kepner

One of our customers recently merged some new systems into a
large, existing cluster. They requested a mechanism to prevent
opensm from sweeping while the new equipment was being added to
the IB fabric, and then resume sweeping once they felt confident
that the newly added (sub)fabric was correctly cabled, and fully
functional. They used the following patch.

Would it be worth adding this (or something with similar functionality)
to opensm?

Signed-off-by: Dale Talcott <dale.r.talc...@nasa.gov>
Signed-off-by: Arthur Kepner <akep...@sgi.com>

---

 main.c          |   16 
 osm_state_mgr.c |    9 -
 osm_trap_rcv.c  |   40 
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
index 0093aa7..c3d71bc 100644
--- a/opensm/opensm/main.c
+++ b/opensm/opensm/main.c
@@ -86,6 +86,12 @@ static void mark_usr1_flag(int signum)
osm_usr1_flag = 1;
 }
 
+int sweeping = 1;
+static void toggle_sweeping(int signum)
+{
+   sweeping = !sweeping;
+}
+
 static sigset_t saved_sigset;
 
 static void block_signals()
@@ -99,6 +105,7 @@ static void block_signals()
 #ifndef HAVE_OLD_LINUX_THREADS
sigaddset(&set, SIGUSR1);
 #endif
+   sigaddset(&set, SIGUSR2);
pthread_sigmask(SIG_SETMASK, &set, &saved_sigset);
 }
 
@@ -118,6 +125,8 @@ static void setup_signals()
act.sa_handler = mark_usr1_flag;
sigaction(SIGUSR1, &act, NULL);
 #endif
+   act.sa_handler = toggle_sweeping;
+   sigaction(SIGUSR2, &act, NULL);
pthread_sigmask(SIG_SETMASK, &saved_sigset, NULL);
 }
 
@@ -498,6 +507,7 @@ static int daemonize(osm_opensm_t * osm)
 int osm_manager_loop(osm_subn_opt_t * p_opt, osm_opensm_t * p_osm)
 {
int console_init_flag = 0;
+   int prev_sweeping = sweeping;
 
if (is_console_enabled(p_opt)) {
if (!osm_console_init(p_opt, &p_osm->console, &p_osm->log))
@@ -524,6 +534,12 @@ int osm_manager_loop(osm_subn_opt_t * p_opt, osm_opensm_t * p_osm)
p_osm->subn.force_heavy_sweep = TRUE;
osm_opensm_sweep(p_osm);
}
+   if (prev_sweeping != sweeping) {
+   prev_sweeping = sweeping;
+   OSM_LOG(&p_osm->log, OSM_LOG_INFO,
+   "Sweeping is now %s\n",
+   (sweeping ? "enabled" : "disabled") );
+   }
}
if (is_console_enabled(p_opt))
osm_console_exit(&p_osm->console, &p_osm->log);
diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
index e43463f..e8eb47b 100644
--- a/opensm/opensm/osm_state_mgr.c
+++ b/opensm/opensm/osm_state_mgr.c
@@ -1405,6 +1405,7 @@ static void do_process_mgrp_queue(osm_sm_t * sm)
 
 void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
 {
+   extern int sweeping;
CL_ASSERT(sm);
 
OSM_LOG_ENTER(sm->p_log);
@@ -1415,7 +1416,13 @@ void osm_state_mgr_process(IN osm_sm_t * sm, IN osm_signal_t signal)
 
switch (signal) {
case OSM_SIGNAL_SWEEP:
-   do_sweep(sm);
+   if (!sweeping)
+   OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "sweeping disabled - "
+   "ignoring signal %s in state %s\n",
+   osm_get_sm_signal_str(signal),
+   osm_get_sm_mgr_state_str(sm->p_subn->sm_state));
+   else
+   do_sweep(sm);
break;
case OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST:
do_process_mgrp_queue(sm);
diff --git a/opensm/opensm/osm_trap_rcv.c b/opensm/opensm/osm_trap_rcv.c
index bf13239..42e9b32 100644
--- a/opensm/opensm/osm_trap_rcv.c
+++ b/opensm/opensm/osm_trap_rcv.c
@@ -332,6 +332,7 @@ static void trap_rcv_process_request(IN osm_sm_t * sm,
boolean_t physp_change_trap = FALSE;
uint64_t event_wheel_timeout = OSM_DEFAULT_TRAP_SUPRESSION_TIMEOUT;
boolean_t run_heavy_sweep = FALSE;
+   extern int sweeping;
 
OSM_LOG_ENTER(sm->p_log);
 
@@ -515,23 +516,30 @@ static void trap_rcv_process_request(IN osm_sm_t * sm,
 check_sweep:
/* do a sweep if we received a trap */
if (sm->p_subn->opt.sweep_on_trap) {
-   /* if this is trap number 128 or run_heavy_sweep is TRUE -
-  update the force_heavy_sweep flag of the subnet.
-  Sweep also on traps 144 - these traps signal a change of
-  certain port capabilities.
-  TODO: In the future this can be changed to just getting
-  PortInfo on this port instead of sweeping the entire subnet. */
-   if (ib_notice_is_generic(p_ntci) &&
-   (cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 128 ||
-cl_ntoh16(p_ntci->g_or_v.generic.trap_num) == 144 ||
-run_heavy_sweep
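
The signal-driven toggle in main.c above can be tried outside opensm with a sketch like the one below. It is an illustration under assumptions, not the patch itself: install_toggle_handler() and demo_toggle() are hypothetical names, and the flag is declared volatile sig_atomic_t (the patch uses a plain int) because that is the type the C standard guarantees can be written safely from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* 1 = sweeping enabled, 0 = disabled; written only by the handler. */
static volatile sig_atomic_t sweeping = 1;

static void toggle_sweeping(int signum)
{
	(void)signum;
	sweeping = !sweeping;
}

/* Hypothetical helper: installs the SIGUSR2 handler, returns 0 on success. */
static int install_toggle_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	sigemptyset(&act.sa_mask);
	act.sa_handler = toggle_sweeping;
	return sigaction(SIGUSR2, &act, NULL);
}

/* Usage check: one delivery of SIGUSR2 flips the flag from 1 to 0. */
static void demo_toggle(void)
{
	int rc = install_toggle_handler();

	assert(rc == 0);
	raise(SIGUSR2);		/* handler runs before raise() returns */
	assert(sweeping == 0);
}
```

From a shell the equivalent would be sending SIGUSR2 to the running process, once to pause and once more to resume.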

Re: [PATCH] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-02-25 Thread Arthur Kepner
On Thu, Feb 25, 2010 at 11:29:02AM -0800, Ralph Campbell wrote:
  

I haven't looked carefully at the whole patch, but this bit 
looks wrong:

> @@ -848,61 +823,112 @@ static void ipoib_neigh_cleanup(struct neighbour *n)
>   struct ipoib_neigh *neigh;
>   struct ipoib_dev_priv *priv = netdev_priv(n->dev);
>   unsigned long flags;
> - struct ipoib_ah *ah = NULL;
> +
> + spin_lock_irqsave(&priv->lock, flags);
>
>   neigh = *to_ipoib_neigh(n);
> - if (neigh)
> - priv = netdev_priv(neigh->dev);
> - else
> + if (neigh) {

Should this be if (!neigh) ?

> + spin_unlock_irqrestore(&priv->lock, flags);
>   return;
> + }
> + *to_ipoib_neigh(n) = NULL;
> + neigh->neighbour = NULL;
> +

-- 
Arthur
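
To make the question above concrete, here is a userspace sketch of the pattern with the test written as !entry. It is only an analogy under stated assumptions: a pthread mutex stands in for the spinlock, and struct neigh_slot, struct neigh_entry and detach_entry() are hypothetical names, not IPoIB code. The early return is taken when there is nothing to clean up, so the dereference that follows only runs with a non-NULL pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the neighbour bookkeeping. */
struct neigh_entry {
	void *owner;			/* back-pointer, cleared on detach */
};

struct neigh_slot {
	pthread_mutex_t lock;
	struct neigh_entry *entry;	/* may be NULL */
};

/* Detaches and returns the entry, or returns NULL if there was none. */
static struct neigh_entry *detach_entry(struct neigh_slot *slot)
{
	struct neigh_entry *e;

	assert(slot != NULL);
	pthread_mutex_lock(&slot->lock);
	e = slot->entry;
	if (!e) {			/* nothing to clean up: bail out */
		pthread_mutex_unlock(&slot->lock);
		return NULL;
	}
	slot->entry = NULL;
	e->owner = NULL;		/* safe: e is known to be non-NULL */
	pthread_mutex_unlock(&slot->lock);
	return e;
}
```

With the test inverted, the early return would fire for every valid entry and the code after it would dereference NULL whenever no entry exists.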



Re: IPoIB memory use after free

2010-02-17 Thread Arthur Kepner
On Wed, Feb 17, 2010 at 12:02:36PM -0800, Ralph Campbell wrote:
> I have been tracking down a kernel panic while running qperf udp_bw
> tests and it looks like ib_ipoib is using memory after freeing it.
>
> The problem is with connected mode. I don't see the panic with
> datagram mode. Looking at the source code, I see that the process
> of creating the QP with the connection manager, ipoib_cm_create_tx(),
> has pointers to struct ipoib_neigh and struct ipoib_path but there
> doesn't seem to be a reference count or struct completion similar to
> the way the SA path record look up process has to prevent this.
>
> I'm working on a patch to test this theory but wanted to post
> this before going too far in case others are already aware
> of the problem and working on it.
>

Could what you're seeing be related to what's reported here:

http://lists.openfabrics.org/pipermail/general/2008-April/049629.html

?

-- 
Arthur