Re: [openib-general] [PATCH 3/4] New routing module which loads LFT tables from dump file.

2006-06-13 Thread Eitan Zahavi
Hi Hal, Sasha,

Regarding OpenSM coding style:

Sasha wrote:
> 
> Really? Don't want to bother with examples, but I may see almost any
> "combination" in OpenSM and it is not clear to me which one is common
> (the coding style and indentation are different even from file to file).
[EZ] This bothers me, as I think we should use a consistent coding style.
You might also remember that we put in place scripts for both automatic
indentation and coding style rule fixes (osm_indent and osm_check_n_fix).

I did check for all "else" statements:
osm/opensm>grep else *.c | wc -l
397
osm/opensm>grep else *.c | grep -v "{" | grep -v "}" | wc -l
361

So you can see that fewer than 10% (36 out of 397) of the "else" statements
are inconsistent with the coding style.
Checking which files contain the "non-standard" code:
osm/opensm>grep else *.c | grep "{" | awk '{print $1}' | sort | uniq -c | sort -rn
  7 osm_console.c:
  6 osm_prtn_config.c:
  3 st.c:
  3 osm_sa_multipath_record.c:
  2 osm_ucast_mgr.c:
  2 osm_sa_path_record.c:
  1 osm_sa_mcmember_record.c:
  1 osm_sa_informinfo.c:
  1 osm_sa_class_port_info.c:
  1 osm_multicast.c:

You can see the majority of these mismatches are in code introduced by
Hal and yourself.

I think OpenSM should use a single coding style. My proposal is that we
update our osm_indent script with a set of rules we agree on and apply
it to the entire tree.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] MSI enabled (was OFED 1.0 release schedule)

2006-06-13 Thread Tziporet Koren
Since this is the case in the git tree too we have not changed it.
Most of our QA so far has run this way, so I don't want to change the default
now.
I will add this option to the mthca release notes.

Tziporet

-----Original Message-----
From: Woodruff, Robert J [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 14, 2006 3:09 AM
To: Woodruff, Robert J; Betsy Zeller; Tziporet Koren; Davis, Arlin R
Cc: Matt L. Leininger; OpenFabricsEWG; openib; Matters, Todd
Subject: RE: [openib-general] OFED 1.0 release schedule

 Tziporet wrote,
>We upload OFED-1.0-pre1.tgz to
> https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> 

One other thing I noticed is that you do not enable MSI interrupt
mode by default. You will get lower performance if you do not 
enable MSI. I think you can set it when you load the driver with a 
modprobe parameter. 

woody






Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Betsy Zeller
Woody - you are absolutely correct for ipath - you definitely want MSI
interrupts enabled. We (QLogic) need to submit this information for
inclusion in the OFED 1.0 release notes. 

Thanks, Betsy

On Tue, 2006-06-13 at 17:09 -0700, Woodruff, Robert J wrote:
>  Tziporet wrote,
> >We upload OFED-1.0-pre1.tgz to
> > https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> > 
> 
> One other thing I noticed is that you do not enable MSI interrupt
> mode by default. You will get lower performance if you do not 
> enable MSI. I think you can set it when you load the driver with a 
> modprobe parameter. 
> 
> woody





Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Sean Hefty
>Is the only downside of a larger timeout that potentially more memory
>accumulates (until the timeout occurs) before it is freed ?

This is the only one that I can think of.  Can anyone think of others?

- Sean




[openib-general] MPI error when using a "system" call in mpi job.

2006-06-13 Thread Ira Weiny
A co-worker here was seeing the following MPI error from his job:

[1] Abort: [ldev2:1] Got completion with error, code=1
 at line 2148 in file viacheck.c

After some tracking down he found that apparently if he used a "system" call
[int system(const char *string)] the next MPI command will fail.

I have been able to reproduce this with the attached simple "hello" program.

Perhaps someone has seen this type of error?  Here is the output from 2 runs:

[EMAIL PROTECTED]:~/ior-test
17:04:04 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello x
ldev1
[0] Abort: [ldev1:0] Got completion with error, code=1
 at line 2148 in file viacheck.c
ldev2
mpirun_rsh: Abort signaled from [0]
done.
[EMAIL PROTECTED]:~/ior-test
17:05:23 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello
now = 0.00
now = 0.52
now = 0.94
now = 0.000121
now = 0.000151
now = 0.001072
now = 0.001102
now = 0.001118
now = 0.001141
now = 0.001160
done.

We are running mvapich 0.9.7 and the openib trunk rev 6829.

Thanks,
Ira



hello.c
Description: Binary data

Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Woodruff, Robert J
 Tziporet wrote,
>We upload OFED-1.0-pre1.tgz to
> https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> 

One other thing I noticed is that you do not enable MSI interrupt
mode by default. You will get lower performance if you do not 
enable MSI. I think you can set it when you load the driver with a 
modprobe parameter. 

woody




Re: [openib-general] [PATCH 0/4] opensm: Loading unicast routes from the file

2006-06-13 Thread Greg Johnson
On Tue, Jun 13, 2006 at 11:00:35PM +0300, Sasha Khapyorsky wrote:
> Hi Greg,
> 
> On 11:02 Tue 13 Jun , Greg Johnson wrote:
> > It seems to load the routes generated by the dump
> > script, but afterward it is not possible to dump the routes again.
> 
> This means you have broken LFTs now. Probably I know what is going on
> here - new LFTs don't have " 0" entries, and switches are
> not accessible by LIDs anymore.
> 
> Please update 'ibroute' utility (diags/) from the trunk and recreate the
> dump file - this should fix the problem.
> 
> (Sorry, I forgot to mention 'ibroute' upgrade issue in patch announcement).

Ok, that fixed it.  It works fine now.

Any chance of making our own lid -> guid assignments while we are at it?

Thanks,

Greg




[openib-general] [PATCH] uDAPL cma provider - add missing ia_attributes for the ia_query

2006-06-13 Thread Arlin Davis

James,

Here are some changes to include some missing IA attributes during a query.

-arlin

Signed-off by: Arlin Davis [EMAIL PROTECTED]


Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c  (revision 7935)
+++ dapl/openib_cma/dapl_ib_util.c  (working copy)
@@ -444,7 +444,10 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HC
ia_attr->hardware_version_major = dev_attr.hw_ver;
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen = dev_attr.max_cqe;
ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -468,10 +471,11 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HC
ia_attr->max_eps, ia_attr->max_dto_per_ep,
ia_attr->max_evds, ia_attr->max_evd_qlen );
dapl_dbg_log(DAPL_DBG_TYPE_UTIL, 
-   " query_hca: msg %llu rdma %llu iov %d lmr %d rmr %d\n", 
+   " query_hca: msg %llu rdma %llu iov %d lmr %d rmr %d"
+   " rd_io %d\n", 
ia_attr->max_mtu_size, ia_attr->max_rdma_size,
ia_attr->max_iov_segments_per_dto, ia_attr->max_lmrs, 
-   ia_attr->max_rmrs );
+   ia_attr->max_rmrs, ia_attr->max_rdma_read_per_ep_in );
}

if (ep_attr != NULL) {





Re: [openib-general] [PATCH 1/4] Simplification of the ucast fdb dumps.

2006-06-13 Thread Sasha Khapyorsky
Hi Eitan,

On 15:03 Tue 13 Jun , Eitan Zahavi wrote:
> Hi Sasha,
> 
> I still need to see if there are no real problematic changes in the osm.fdbs
> file syntax (need to update ibdm to support those) but I like the patch and
> the clean way you resolved the multiple opens of the dump file.

Thanks.

Sasha




[openib-general] [PATCH 2/4 v2] Modular routing engine (unicast only yet).

2006-06-13 Thread Sasha Khapyorsky
Hi,

The same patch, but with an added comment about the osm_routing_engine
structure.

Sasha.


This patch introduces the routing_engine structure, which may be used for
"plugging in" a new routing module. Currently only unicast callbacks are
supported (multicast can be added later). The existing routing module
is up-down ('updn'), which may be activated with the '-R updn' option
(instead of the old '-u'). General usage is:

 $ opensm -R 'module-name'

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>
---

 osm/include/opensm/osm_opensm.h |   45 ++-
 osm/include/opensm/osm_subnet.h |   16 ++--
 osm/include/opensm/osm_ucast_updn.h |   26 -
 osm/opensm/main.c   |   26 +
 osm/opensm/osm_opensm.c |   41 ++---
 osm/opensm/osm_subnet.c |   23 ++--
 osm/opensm/osm_ucast_mgr.c  |   69 ---
 osm/opensm/osm_ucast_updn.c |   69 ++-
 8 files changed, 184 insertions(+), 131 deletions(-)

diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h
index 3235ad4..77d2a86 100644
--- a/osm/include/opensm/osm_opensm.h
+++ b/osm/include/opensm/osm_opensm.h
@@ -92,6 +92,46 @@ BEGIN_C_DECLS
 *
 */
 
+/s* OpenSM: OpenSM/osm_routing_engine
+* NAME
+*  struct osm_routing_engine
+*
+* DESCRIPTION
+*  OpenSM routing engine module definition.
+* NOTES
+*  routing engine structure - yet limited by ucast_fdb_assign and
+*  ucast_build_fwd_tables (multicast callbacks may be added later)
+*/
+struct osm_routing_engine {
+   const char *name;
+   void *context;
+   int (*ucast_build_fwd_tables)(void *context);
+   int (*ucast_fdb_assign)(void *context);
+   void (*delete)(void *context);
+};
+/*
+* FIELDS
+*  name
+*  The routing engine name (will be used in logs).
+*
+*  context 
+*  The routing engine context. Will be passed as parameter
+*  to the callback functions.
+*
+*  ucast_build_fwd_tables
+*  The callback for unicast forwarding table generation.
+*
+*  ucast_fdb_assign
+*  The same as above, but pretty integrated with default
+*  routing flow. Look at osm_ucast_mgr_process() and
+*  osm_ucast_updn.c for details. In future may be merged
+*  with ucast_build_fwd_tables() callback.
+*
+*  delete
+*  The delete method, may be used for routing engine
+*  internals cleanup.
+*/
+
 /s* OpenSM: OpenSM/osm_opensm_t
 * NAME
 *  osm_opensm_t
@@ -116,7 +156,7 @@ typedef struct _osm_opensm_t
   osm_log_t        log;
   cl_dispatcher_t  disp;
   cl_plock_t   lock;
-  updn_t *p_updn_ucast_routing;
+  struct osm_routing_engine routing_engine;
   osm_stats_t  stats;
 } osm_opensm_t;
 /*
@@ -153,6 +193,9 @@ typedef struct _osm_opensm_t
 *  lock
 *  Shared lock guarding most OpenSM structures.
 *
+*  routing_engine
+*  Routing engine, will be initialized then used
+*
 *  stats
 *  Open SM statistics block
 *
diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
index 4db449d..a637367 100644
--- a/osm/include/opensm/osm_subnet.h
+++ b/osm/include/opensm/osm_subnet.h
@@ -272,13 +272,11 @@ typedef struct _osm_subn_opt
   uint32_t max_port_profile;
   osm_pfn_ui_extension_t   pfn_ui_pre_lid_assign;
   void *   ui_pre_lid_assign_ctx;
-  osm_pfn_ui_extension_t   pfn_ui_ucast_fdb_assign;
-  void *   ui_ucast_fdb_assign_ctx;
   osm_pfn_ui_mcast_extension_t pfn_ui_mcast_fdb_assign;
   void *   ui_mcast_fdb_assign_ctx;
   boolean_t                sweep_on_trap;
   osm_testability_modes_t  testability_mode;
-  boolean_t                updn_activate;
+  char *   routing_engine_name;
   char *   updn_guid_file;
   boolean_t                exit_on_fatal;
   boolean_t                honor_guid2lid_file;
@@ -407,13 +405,6 @@ typedef struct _osm_subn_opt
 *  ui_pre_lid_assign_ctx
 * A UI context (void *) to be provided to the pfn_ui_pre_lid_assign
 *
-*  pfn_ui_ucast_fdb_assign
-* A UI function to be called instead of the ucast manager FDB
-* configuration.
-*
-*  ui_ucast_fdb_assign_ctx
-* A UI context (void *) to be provided to the pfn_ui_ucast_fdb_assign
-*
 *  pfn_ui_mcast_fdb_assign
 * A UI function to be called inside the mcast manager instead of the
 * call for the build spanning tree. This will be called on every
@@ -429,9 +420,8 @@ typedef struct _osm_subn_opt
 *  testability_mode
 * Object that indicates if we are running in a special testability mode.
 *
-*  updn_activate
-* Object that indicates if we are running the UPDN algorithm (TRUE) or 
-* Min Hop Algorithm (FALSE)
+*  routing_engine_name
+* Name of used routing engine (

Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Arlin Davis
Woodruff, Robert J wrote:

>Tziporet wrote,
>  
>
>>We upload OFED-1.0-pre1.tgz to
>>https://openib.org/svn/gen2/branches/1.0/ofed/releases/
>>
>>
>>
>
>I tried the new tar ball and the pathscale driver now
>compiles (on Redhat EL4 - U3) and IPoIB and OpenSM appear to work OK,
>but Intel MPI/uDAPL and NetPipe/uDAPL are broken. It appears to 
>be a problem with rdma operations. I also tried SDP/pathscale and 
>it does not work either.
>Finally, the rdma_cm is missing the changes that match the uDAPL fix
>that
>was put in for the new setops for the CM timeouts.
>Arlin will provide specifics. We'd really like the rdma_cm fix in the
>release. 
>
>  
>
Here is a pointer to Sean's email/patches with the details:

http://openib.org/pipermail/openib-general/2006-June/022654.html
http://openib.org/pipermail/openib-general/2006-June/022655.html

-arlin

>woody
>
>
>-----Original Message-----
>From: Betsy Zeller [mailto:[EMAIL PROTECTED] 
>Sent: Tuesday, June 13, 2006 1:44 PM
>To: Tziporet Koren
>Cc: Matt L. Leininger; Scott Weitzenkamp (sweitzen); Matters, Todd; Moni
>Levy; Woodruff, Robert J; openib; OpenFabricsEWG
>Subject: Re: OFED 1.0 release schedule
>
>Tziporet - this plan makes sense. We'll let you know how the testing
>goes. BTW, for some reason, if you click on the URL you sent out, it
>just hangs but if you type it in, it works. Not sure why.
>
>Thanks, Betsy
>
>On Tue, 2006-06-13 at 16:07 +0300, Tziporet Koren wrote:
>  
>
>>Hi All,
>>
>> 
>>
>>After reading the mail thread regarding OFED release I have decided
>>this:
>>
>> 
>>
>>We upload OFED-1.0-pre1.tgz to
>>https://openib.org/svn/gen2/branches/1.0/ofed/releases/
>>
>> 
>>
>>We checked that all modules compile and loaded on this build
>>(including ipath and uDAPL)
>>
>>The only missing parts of this release from the final release are the
>>documents, and the scripts rpm that Scott requested.
>>
>> 
>>
>>I think testing this version 3 days (Tuesday, Wednesday and Thursday)
>>should be enough as Scott wrote.
>>
>>So - we can do the official OFED 1.0 release on Friday 16-June.
>>
>> 
>>
>>Matt - please check with Novell if this date is acceptable to them.
>>
>> 
>>
>>If not then the earliest we can do the release is Thursday 15-June.
>>
>> 
>>
>> 
>>
>>Tziporet Koren
>>
>>Software Director
>>
>>Mellanox Technologies
>>
>>mailto: [EMAIL PROTECTED]
>>Tel +972-4-9097200, ext 380
>>
>> 
>>
>>
>>
>>
>
>
>  
>





Re: [openib-general] [PATCH 2/4] Modular routing engine (unicast only yet).

2006-06-13 Thread Sasha Khapyorsky
Hi Eitan,

On 14:55 Tue 13 Jun , Eitan Zahavi wrote:
> 
> As provided in my previous patch 1/4 comments
> I think the callbacks should also have an entry for the MinHop stage (maybe
> this is the ucast_build_fwd_tables?) I have some algorithms in mind that
> will skip that stage altogether.

We may add a new callback when it becomes useful.

> Also it might make sense for each routing engine to provide its own "dump"
> routine such that each could support a different file format if needed.

Why would we want a dump format per routing engine? Even if we do, you may
put it into the routing-engine-specific code.

> 
> Rest of the comments are inline
> 
> EZ
> 
> Sasha Khapyorsky wrote:
> >
> >diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h
> >index 3235ad4..3e6e120 100644
> >--- a/osm/include/opensm/osm_opensm.h
> >+++ b/osm/include/opensm/osm_opensm.h
> >@@ -92,6 +92,18 @@ BEGIN_C_DECLS
> > *
> > */
> > 
> >+/*
> >+ * routing engine structure - yet limited by ucast_fdb_assign and
> >+ *  ucast_build_fwd_tables (multicast callbacks may be added later)
> >+ */
> >+struct osm_routing_engine {
> >+const char *name;
> >+void *context;
> >+int (*ucast_build_fwd_tables)(void *context);
> >+int (*ucast_fdb_assign)(void *context);
> >+void (*delete)(void *context);
> >+};
> It would be nice if you added a standard header to this struct.
> It is not clear to me what ucast_build_fwd_tables and
> ucast_fdb_assign are mapping to.

Ok, will add.

BTW, it seems OpenSM declarations were used for generating manuals or other
docs. Do you know whether those

 /h*
 /s*
 /f*

markers are still in use? And which tool processes them?

> Please see the next section as an example for a struct header.
> >+
> > /s* OpenSM: OpenSM/osm_opensm_t
> > * NAME
> > *   osm_opensm_t

> >@@ -1129,6 +1144,14 @@ osm_ucast_mgr_process(
> >  i
> >  );
> > 
> >+if (p_routing_eng->ucast_build_fwd_tables &&
> >+    p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context) == 0)
> >+{
> >+  cl_qmap_apply_func( p_sw_guid_tbl,
> >+  __osm_ucast_mgr_set_table_cb, p_mgr );
> >+} /* fallback on the regular path in case of failures */
> >+else
> >+{
> Please explain why this step is needed and why, if the routing engine
> function is returning 0, you still invoke the standard
> __osm_ucast_mgr_set_table_cb.

->ucast_build_fwd_tables() creates the fwd tables and
__osm_ucast_mgr_set_table_cb() uploads them to the switches. In case of a
fatal ->ucast_build_fwd_tables() failure (when the return status is != 0),
the table upload is skipped and the flow continues with the default
routing code.
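
The decision described above can be sketched in isolation. This is only an
illustration, not the code from the patch: the struct is a trimmed stand-in
for osm_routing_engine, and choose_route_path(), the enum and the two stub
callbacks are hypothetical names introduced here.

```c
#include <stddef.h>

/* Trimmed, hypothetical stand-in for struct osm_routing_engine. */
struct routing_engine_sketch {
	const char *name;
	void *context;
	int (*ucast_build_fwd_tables)(void *context);
};

/* Which path the unicast manager would take (names are assumptions). */
enum route_path {
	ROUTE_ENGINE_TABLES,	/* upload the tables built by the engine */
	ROUTE_DEFAULT		/* continue with the default routing code */
};

/*
 * Use the engine's tables only when the callback exists and returns 0.
 * A missing callback or a non-zero status selects the default path.
 */
static enum route_path
choose_route_path(const struct routing_engine_sketch *eng)
{
	if (eng->ucast_build_fwd_tables &&
	    eng->ucast_build_fwd_tables(eng->context) == 0)
		return ROUTE_ENGINE_TABLES;
	return ROUTE_DEFAULT;
}

/* Stub callbacks used only to exercise the two outcomes. */
static int build_ok(void *context)
{
	(void)context;
	return 0;
}

static int build_fail(void *context)
{
	(void)context;
	return -1;
}
```

With build_ok installed the sketch selects ROUTE_ENGINE_TABLES; with
build_fail, or with no callback at all, it selects ROUTE_DEFAULT.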

Thanks for the comments.
Sasha




Re: [openib-general] [PATCH] uDAPL openib-cma provider - add support for IB_CM_REQ_OPTIONS

2006-06-13 Thread Arlin Davis
Tziporet Koren wrote:

>Jack put the bug fix to OFED 1.0.
>
>Tziporet
>  
>

Great.

Did the CMA module (SVN 7742) changes also get in? If not, uDAPL is out 
of sync with CMA and will not work.

-arlin





Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Woodruff, Robert J
Tziporet wrote,
>We upload OFED-1.0-pre1.tgz to
> https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> 

I tried the new tar ball and the pathscale driver now
compiles (on Redhat EL4 - U3) and IPoIB and OpenSM appear to work OK,
but Intel MPI/uDAPL and NetPipe/uDAPL are broken. It appears to 
be a problem with rdma operations. I also tried SDP/pathscale and 
it does not work either.
Finally, the rdma_cm is missing the changes that match the uDAPL fix
that
was put in for the new setops for the CM timeouts.
Arlin will provide specifics. We'd really like the rdma_cm fix in the
release. 

woody


-----Original Message-----
From: Betsy Zeller [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 13, 2006 1:44 PM
To: Tziporet Koren
Cc: Matt L. Leininger; Scott Weitzenkamp (sweitzen); Matters, Todd; Moni
Levy; Woodruff, Robert J; openib; OpenFabricsEWG
Subject: Re: OFED 1.0 release schedule

Tziporet - this plan makes sense. We'll let you know how the testing
goes. BTW, for some reason, if you click on the URL you sent out, it
just hangs but if you type it in, it works. Not sure why.

Thanks, Betsy

On Tue, 2006-06-13 at 16:07 +0300, Tziporet Koren wrote:
> Hi All,
> 
>  
> 
> After reading the mail thread regarding OFED release I have decided
> this:
> 
>  
> 
> We upload OFED-1.0-pre1.tgz to
> https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> 
>  
> 
> We checked that all modules compile and loaded on this build
> (including ipath and uDAPL)
> 
> The only missing parts of this release from the final release are the
> documents, and the scripts rpm that Scott requested.
> 
>  
> 
> I think testing this version 3 days (Tuesday, Wednesday and Thursday)
> should be enough as Scott wrote.
> 
> So - we can do the official OFED 1.0 release on Friday 16-June.
> 
>  
> 
> Matt - please check with Novell if this date is acceptable to them.
> 
>  
> 
> If not then the earliest we can do the release is Thursday 15-June.
> 
>  
> 
>  
> 
> Tziporet Koren
> 
> Software Director
> 
> Mellanox Technologies
> 
> mailto: [EMAIL PROTECTED]
> Tel +972-4-9097200, ext 380
> 
>  
> 
> 




Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Hal Rosenstock
On Tue, 2006-06-13 at 17:58, Sean Hefty wrote:
> >> Assuming minimal hard-coding of which methods are requests, a client would
> >> drop only about 1 MAD per method during start-up.
> >
> >Is this only the new methods which are not hard coded ? Would this
> >invoke a timeout (and hopefully retry) ?
> 
> We can hard-code existing methods to avoid this problem.  So only unknown
> methods would be affected, which would affect user-defined classes more than
> the existing classes.

I would expect vendor classes to follow the standard methods unless they
need something different.

> In most cases, I would expect the sender to timeout and retry the request,
> which hopefully comes after the request table has been updated.
> 
> >> And I
> >> would argue that even if a request has been acknowledged, the sender of the
> >> request would still need to deal with the case that no response is ever
> >> generated.
> >
> >Are you referring to a request being acknowledged but the response is
> >not sent (yet) ?
> 
> Yes.
> 
> >> My current thoughts on how to handle requests are to time when each request
> >> MAD is received, and queue it.  Once the queue is full, if another request is
> >> received, it would check the MAD at the head of the queue.  If the MAD at
> >> the head was older than some selected value (say 20 seconds), it would be
> >> bumped from the queue, and the new request would be added to the tail.
> >
> >For RMPP, this time should start when the last segment is received. Is
> >that how you would envision it working ?
> 
> Correct.  Part of the motivation here is if a client cannot or will not
> generate a response for some reason, we don't want to keep the MAD hanging
> around forever.
> 
> >I'm also not sure what the right timeout value would be for this. Where
> >did 20 seconds come from ?
> 
> I just made that up.  Something like this would probably have to be adaptable,
> and would likely depend on the size of the fabric.  In most cases, I would
> guess that a timeout indicates some sort of error in the client, so I would
> tend towards a larger timeout.

Is the only downside of a larger timeout that potentially more memory
accumulates (until the timeout occurs) before it is freed ?

-- Hal

> - Sean





Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Sean Hefty
>> Assuming minimal hard-coding of which methods are requests, a client would
>> drop only about 1 MAD per method during start-up.
>
>Is this only the new methods which are not hard coded ? Would this
>invoke a timeout (and hopefully retry) ?

We can hard-code existing methods to avoid this problem.  So only unknown
methods would be affected, which would affect user-defined classes more than the
existing classes.

In most cases, I would expect the sender to timeout and retry the request, which
hopefully comes after the request table has been updated.

>> And I
>> would argue that even if a request has been acknowledged, the sender of the
>> request would still need to deal with the case that no response is ever
>> generated.
>
>Are you referring to a request being acknowledged but the response is
>not sent (yet) ?

Yes.

>> My current thoughts on how to handle requests are to time when each request
>> MAD is received, and queue it.  Once the queue is full, if another request is
>> received, it would check the MAD at the head of the queue.  If the MAD at the
>> head was older than some selected value (say 20 seconds), it would be bumped
>> from the queue, and the new request would be added to the tail.
>
>For RMPP, this time should start when the last segment is received. Is
>that how you would envision it working ?

Correct.  Part of the motivation here is if a client cannot or will not generate
a response for some reason, we don't want to keep the MAD hanging around
forever.

>I'm also not sure what the right timeout value would be for this. Where
>did 20 seconds come from ?

I just made that up.  Something like this would probably have to be adaptable,
and would likely depend on the size of the fabric.  In most cases, I would guess
that a timeout indicates some sort of error in the client, so I would tend
towards a larger timeout.
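
The queueing scheme quoted above could look roughly like the following. This
is a minimal sketch under stated assumptions, not code from the MAD layer:
the names, the fixed capacity and the use of plain seconds are all
hypothetical, and the thread does not say what happens to a new request when
the queue is full and the head is not yet stale, so the sketch simply drops
it and relies on the sender's retry.

```c
#include <stddef.h>

#define REQ_QUEUE_LEN 4			/* hypothetical capacity */

struct req_entry {
	unsigned long long tid;		/* transaction ID of the request MAD */
	long recv_time;			/* arrival time, in seconds */
};

struct req_queue {
	struct req_entry slot[REQ_QUEUE_LEN];
	size_t head;			/* index of the oldest entry */
	size_t count;			/* number of queued requests */
	long timeout;			/* e.g. 20 seconds; would need tuning */
};

/* Returns 1 if the request was queued, 0 if it had to be dropped. */
static int req_queue_add(struct req_queue *q, unsigned long long tid, long now)
{
	size_t tail;

	if (q->count == REQ_QUEUE_LEN) {
		/* Full: the head may be bumped only once it is older
		 * than the timeout. */
		if (now - q->slot[q->head].recv_time <= q->timeout)
			return 0;	/* assumption: drop, sender retries */
		q->head = (q->head + 1) % REQ_QUEUE_LEN;
		q->count--;
	}

	/* Append the new request at the tail of the ring. */
	tail = (q->head + q->count) % REQ_QUEUE_LEN;
	q->slot[tail].tid = tid;
	q->slot[tail].recv_time = now;
	q->count++;
	return 1;
}
```

A larger timeout keeps stale entries (and their memory) around longer, but
makes it less likely that a slow client's request is bumped before it
responds.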

- Sean




Re: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.

2006-06-13 Thread Steve Wise
On Tue, 2006-06-13 at 14:36 -0700, Sean Hefty wrote:
> >> Er...no. It will lose this event. Depending on the event...the carnage
> >> varies. We'll take a look at this.
> >>
> >
> >This behavior is consistent with the Infiniband CM (see
> >drivers/infiniband/core/cm.c function cm_recv_handler()).  But I think
> >we should at least log an error because a lost event will usually stall
> >the rdma connection.
> 
> I believe that there's a difference here.  For the Infiniband CM, an
> allocation error behaves the same as if the received MAD were lost or
> dropped.  Since MADs are unreliable anyway, it's not so much that an IB CM
> event gets lost, as it doesn't ever occur.  A remote CM should retry the
> send, which hopefully allows the connection to make forward progress.
> 

hmm.  Ok.  I see.  I misunderstood the code in cm_recv_handler().

Tom and I have been talking about what we can do to not drop the event.
Stay tuned.

Steve.





Re: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.

2006-06-13 Thread Sean Hefty
>> Er...no. It will lose this event. Depending on the event...the carnage
>> varies. We'll take a look at this.
>>
>
>This behavior is consistent with the Infiniband CM (see
>drivers/infiniband/core/cm.c function cm_recv_handler()).  But I think
>we should at least log an error because a lost event will usually stall
>the rdma connection.

I believe that there's a difference here.  For the Infiniband CM, an allocation
error behaves the same as if the received MAD were lost or dropped.  Since MADs
are unreliable anyway, it's not so much that an IB CM event gets lost, as it
doesn't ever occur.  A remote CM should retry the send, which hopefully allows
the connection to make forward progress.

- Sean




Re: [openib-general] [PATCH 3/4] New routing module which loads LFT tables from dump file.

2006-06-13 Thread Sasha Khapyorsky
Hi Eitan,

On 14:30 Tue 13 Jun , Eitan Zahavi wrote:
> Hi Sasha,
> 
> Please see my comments inside
> 
> Sasha Khapyorsky wrote:
> >This patch implements a trivial routing module which is able to load LFT
> >tables from a dump file. Main features:
> >- support for unicast LFTs only, support for multicast can be added later
> >- this will run after min hop matrix calculation
> >- this will load switch LFTs according to the path entries introduced in
> >  the dump file
> >- no additional checks will be performed (like is port connected, etc)
> >- in case when fabric LIDs were changed this will try to reconstruct LFTs
> >  correctly if endport GUIDs are represented in the dump file (in order
> >  to disable this GUIDs may be removed from the dump file or zeroed)
> I think you could use the concept of directed routes for storing the LIDs
> too.

Maybe. But there is one disadvantage: such a dump file will be node
dependent, so we will not be able to generate it on one node and load it on
another. Anyway, the goal of LID/GUID checking is to provide minimal
fixing for the trivial case and not to limit the subnet administrator in
what he or she wants to do.

> So in case of new LID assignments you can extract the old -> new mapping by
> scanning the LIDs of end ports by their DR path.

I do it with GUID.
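
A rough sketch of remapping by GUID, for illustration only. It is not the
code from the patch: struct endport, lid_by_guid() and resolve_lid() are
hypothetical names, and keeping the dumped LID when the GUID is unknown is
an assumption. It relies on 0 not being a valid unicast LID, so 0 can serve
as "not found".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an end port currently known on the subnet. */
struct endport {
	uint64_t guid;
	uint16_t lid;
};

/* Current LID of the end port with this GUID, or 0 if it is unknown. */
static uint16_t lid_by_guid(const struct endport *ports, size_t n,
			    uint64_t guid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ports[i].guid == guid)
			return ports[i].lid;
	return 0;
}

/*
 * LID to use for one dump-file entry. A zeroed GUID disables the
 * remapping, matching the patch description; an unknown GUID keeps
 * the dumped LID (assumption).
 */
static uint16_t resolve_lid(const struct endport *ports, size_t n,
			    uint16_t dumped_lid, uint64_t dumped_guid)
{
	uint16_t lid;

	if (dumped_guid == 0)
		return dumped_lid;
	lid = lid_by_guid(ports, n, dumped_guid);
	return lid ? lid : dumped_lid;
}
```

Because the lookup key is the GUID rather than a directed route, the same
dump file gives the same result no matter which node loads it.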

> Anyway, I think it is required that you also perform topology matching such
> that if someone changed the topology you are able to figure it out and stop.
> THIS IS A SERIOUS LIMITATION OF YOUR PROPOSAL.

I think this is a limitation on the subnet administrator's choice - one may
want to create LFTs with entries for nodes that are not connected yet.

If you mean a "safer" dump loader, this may be done (and the code
may be reused), but I think this should be a different routing method.

> >The dump file format is compatible with output of 'ibroute' util and for
> >whole fabric may be generated with script like this:
> >
> >  for sw_lid in `ibswitches | awk '{print $NF}'` ; do
> > ibroute $sw_lid
> >  done > /path/to/dump_file
> >
> >, or using DR paths:
> >
> >
> >  for sw_dr in `ibnetdiscover -v \
> > | sed -ne '/^DR path .* switch /s/^DR path \[\(.*\)\].*$/\1/p' \
> > | sed -e 's/\]\[/,/g' \
> > | sort -u` ; do
> > ibroute -D ${sw_dr}
> >  done > /path/to/dump_file
> WE SHOULD ALSO PROVIDE A DUMP FILE VIA:
> 1. OpenSM should dump its routes using this format (like it does today
>    using osm.fdbs)

This way you can generate a dump only with LFTs created by OpenSM (and
not by other SMs). This is an unnecessary limitation for the primary method.

However, I agree that as an additional method this may be good and useful.
Please feel free to provide a patch for this.

> 2. ibdiagnet

Ditto

> >
> >
> >
> >diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
> >index a637367..ec1d056 100644
> >--- a/osm/include/opensm/osm_subnet.h
> >+++ b/osm/include/opensm/osm_subnet.h
> >@@ -423,6 +424,10 @@ typedef struct _osm_subn_opt
> > *  routing_engine_name
> > * Name of used routing engine (other than default Min Hop Algorithm)
> > *
> >+*  ucast_dump_file
> >+* Name of the unicast routing dump file from where switch
> >+* forwearding tables will be loaded
>  ^^^
>  forwarding

Thanks. Will fix.

> >+  "cannot parse port guid "
> >+  "(maybe broken dump): "
> >+  "\'%s\'\n", p);
> >+port_guid = 0;
> >+}
> >+}
> >+port_guid = cl_hton64(port_guid);
> >+add_path(p_osm, p_sw, lid, port_num, port_guid);
> >+}
> >+}
> >+
> >+fclose(file);
> >+return 0;
> >+}
> In OpenSM we write with style:
> if () {
> }
> else if ()
> {
> }
> else
> {
> }
> 
> Not any other combination

Really? I don't want to bother with examples, but I can see almost any
"combination" in OpenSM, and it is not clear to me which one is common
(the coding style and indentation differ even from file to file).

Thanks for comments.
Sasha




Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Hal Rosenstock
On Tue, 2006-06-13 at 14:05, Sean Hefty wrote:
> >There are architected ways to do that. There's busy for MADs which could
> >be used for some MADs. For RMPP, would the transfer be ABORTed ? I don't
> >think you can switch to BUSY in the middle (but I'm not 100% sure). I
> >don't know how this limit is being used exactly, but it might be best if
> >the RMPP receive were treated as 1 MAD regardless of how many
> >segments it was.
> 
> Maybe I should back-up some here.  There are a couple problems that I'm
> trying to solve, but the main goal is to prevent sending duplicate
> responses.  I'd like to do this by detecting and dropping duplicate
> requests.
> 
> To detect a duplicate request, my proposal is to move completed MADs to a
> "done_list".  Newly received MADs would also check the done_list to
> determine if the MAD is a duplicate.  When a user sends a response MAD, a
> check would be made against the done_list for a matching request that has
> not generated a response yet.  If one is not found, then the send would be
> failed.
> 
> Received MADs would be removed from the done_list when they are freed.  My
> guess is that for kernel clients, the changes would probably be minimal.
> For usermode clients, the problem is more difficult, since we cannot trust
> usermode clients to generate responses correctly, and there's no free_mad
> call that maps to the kernel.
> 
> One of the ideas then, is for the kernel umad module to learn which MADs
> generate responses.  It would do this by updating an entry to a table
> whenever a response MAD is generated.  A received MAD would check against
> the table to see if a response is supposed to be generated.  If not, then
> the MAD would be freed after userspace claims it.  If a response is
> expected, then the MAD would not be freed until the response was generated.
> 
> Assuming minimal hard-coding of which methods are requests, a client would
> drop only about 1 MAD per method during start-up.

Is this only the new methods which are not hard coded ? Would this
invoke a timeout (and hopefully retry) ?

> Considering most requests are not
> sent reliably, this shouldn't be a big issue.  (In fact, outside of a
> MultiPathRecord query, I don't believe any requests are sent reliably.)

If you mean sent via RMPP, then yes, only GetMulti is sent this way.

> And I
> would argue that even if a request has been acknowledged, the sender of the
> request would still need to deal with the case that no response is ever
> generated.

Are you referring to a request being acknowledged but the response is
not sent (yet) ?

> If this approach were taken, then, it brings up the issue that MADs are
> being stored in the kernel waiting for a response.  But what if a response
> is never generated?  This problem is somewhat related to MADs being queued
> in the kernel, but the userspace app doesn't call down to receive them.
> Ideally, we could come up with a single solution to both problems, but that
> may not be possible.
> 
> My current thoughts on how to handle requests are to time when each request
> MAD is received, and queue it.  Once the queue is full, if another request
> is received, it would check the MAD at the head of the queue.  If the MAD
> at the head was older than some selected value (say 20 seconds), it would
> be bumped from the queue, and the new request would be added to the tail.

For RMPP, this time should start when the last segment is received. Is
that how you would envision it working ?

I'm also not sure what the right timeout value would be for this. Where
did 20 seconds come from ?

-- Hal

> - Sean





[openib-general] [PATCH] OpenSM/modular-routing.txt: Add description of modular routing

2006-06-13 Thread Hal Rosenstock
OpenSM/doc/modular-routing.txt: Add description of modular routing

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: osm/doc/modular-routing.txt
===
--- osm/doc/modular-routing.txt (revision 0)
+++ osm/doc/modular-routing.txt (revision 0)
@@ -0,0 +1,53 @@
+Modular routing engine structure has been added to allow
+for ease of "plugging" new routing module.
+
+Currently, only unicast callbacks are supported. Multicast
+can be added later.
+
+An existing routing module is up-down "updn", which may be
+activated with '-R updn' option (instead of old '-u').
+
+General usage is:
+$ opensm -R 'module-name'
+
+There is also a trivial routing module which is able
+to load LFT tables from a dump file.
+
+Main features:
+
+- support for unicast LFTs only, support for multicast can be added later
+- this will run after min hop matrix calculation
+- this will load switch LFTs according to the path entries introduced in
+  the dump file
+- no additional checks will be performed (like is port connected, etc)
+- in case when fabric LIDs were changed this will try to reconstruct LFTs
+  correctly if endport GUIDs are represented in the dump file (in order
+  to disable this GUIDs may be removed from the dump file or zeroed)
+
+The dump file format is compatible with output of 'ibroute' util and for
+whole fabric may be generated with script like this:
+
+  for sw_lid in `ibswitches | awk '{print $NF}'` ; do
+ibroute $sw_lid
+  done > /path/to/dump_file
+
+, or using DR paths:
+
+
+  for sw_dr in `ibnetdiscover -v \
+| sed -ne '/^DR path .* switch /s/^DR path \[\(.*\)\].*$/\1/p' \
+| sed -e 's/\]\[/,/g' \
+| sort -u` ; do
+ibroute -D ${sw_dr}
+  done > /path/to/dump_file
+
+
+In order to activate new module use:
+
+  opensm -R file -U /path/to/dump_file
+
+NOTE: ibroute has been updated to support this (for switch management ports).
+Also, lmc was added to switch management ports. ibroute needs to be 7855 or
+later from the trunk.
+
+







Re: [openib-general] OFED 1.0 release schedule

2006-06-13 Thread Betsy Zeller
Tziporet - this plan makes sense. We'll let you know how the testing
goes. BTW, for some reason, if you click on the URL you sent out, it
just hangs but if you type it in, it works. Not sure why.

Thanks, Betsy

On Tue, 2006-06-13 at 16:07 +0300, Tziporet Koren wrote:
> Hi All,
> 
> After reading the mail thread regarding OFED release I have decided
> this:
> 
> We upload OFED-1.0-pre1.tgz to
> https://openib.org/svn/gen2/branches/1.0/ofed/releases/
> 
> We checked that all modules compile and load on this build
> (including ipath and uDAPL)
> 
> The only missing parts of this release from the final release are the
> documents, and the scripts rpm that Scott requested.
> 
> I think testing this version 3 days (Tuesday, Wednesday and Thursday)
> should be enough as Scott wrote.
> 
> So – we can do the official OFED 1.0 release on Friday 16-June.
> 
> Matt – please check with Novell if this date is acceptable by them.
> 
> If not then the earliest we can do the release is Thursday 15-June.
> 
> Tziporet Koren
> Software Director
> Mellanox Technologies
> mailto: [EMAIL PROTECTED]
> Tel +972-4-9097200, ext 380



Re: [openib-general] [PATCH 0/4] opensm: Loading unicast routes from the file

2006-06-13 Thread Sasha Khapyorsky
Hi Eitan,

On 09:36 Sun 11 Jun , Eitan Zahavi wrote:
> Hi Sasha,
> 
> General comments:
> 1. I hope the change in osm.fdbs is not going to break the parser in
> ibdm:Fabric.cpp -

The file format was not changed, I don't expect brokenness.

> was it really necessary change?

Yes, in order to create unified osm.fdbs with any routing engine.

> or just nice to have ?

This is a nice side effect.

> 2. The modular routing is a great idea. From my first glance it seems
> that it assumes calculation of min-hop-tables is common to all routing
> engines.

Yes and no. Currently the min-hop-tables are used with multicast, so it is
common code. But I expect this will be different in the future (for
instance extend this loader to handle multicast tables too).

> I think it should be a callback provided by the engine too.

Yes, when it becomes useful.

> Please note that the Min-Hop engine takes most of the routing time so in
> the future if we could avoid that stage it would be even better. 

Agree.

Thanks for the comments.
Sasha

> [EZ] We should start thinking about testing of this new feature too.
> 
> Further comment on the patches themselves.
> 
> > There are couple of unicast routing related patches for OpenSM.
> > 
> > Basically it implements routing module which provides possibility to
> > load switch forwarding tables from pre-created dump file. Currently
> > unicast tables loading is only supported, multicast may be added in a
> > future.
> > 
> > Short patch descriptions (more details may be found in emails with
> > patches):
> > 
> > 1. Ucast dump file simplification.
> > 2. Modular routing - preliminary implements generic model to plug new
> > routing engine to OpenSM.
> > 3. New simple unicast routing engine which allows to load LFTs from
> > pre-created dump file.
> > 4. Example of ucast dump generation script.
> > 
> > Please comment and test. Thanks.
> > 
> > Sasha




Re: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.

2006-06-13 Thread Steve Wise


> > > +static void cm_event_handler(struct iw_cm_id *cm_id,
> > > +  struct iw_cm_event *iw_event) 
> > > +{
> > > + struct iwcm_work *work;
> > > + struct iwcm_id_private *cm_id_priv;
> > > + unsigned long flags;
> > > +
> > > + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> > > + if (!work)
> > > + return;
> > 
> > > This allocation _will_ fail sometimes.  The driver must recover from
> > > it.  Will it do so?
> 
> Er...no. It will lose this event. Depending on the event...the carnage
> varies. We'll take a look at this.
> 

This behavior is consistent with the InfiniBand CM (see
drivers/infiniband/core/cm.c function cm_recv_handler()).  But I think
we should at least log an error because a lost event will usually stall
the RDMA connection.
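
As a rough userspace sketch of what "at least log an error" could look like
(this is not the iwcm code; queue_event(), failing_alloc() and the
lost_events counter are hypothetical names, and fprintf() stands in for
whatever kernel logging call would really be used):

```c
#include <stdio.h>
#include <stdlib.h>

struct work {
    int event; /* stand-in for the real event payload */
};

/* Number of events dropped because no memory was available. */
unsigned long lost_events;

/* Allocator that always fails, only here to exercise the error path. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Allocate a work item for an event. On allocation failure the event is
 * still lost, but the loss is counted and reported instead of silent.
 * The allocator is passed in so that the failure path can be tested.
 */
struct work *queue_event(int event, void *(*alloc)(size_t))
{
    struct work *w = alloc(sizeof(*w));

    if (!w) {
        lost_events++;
        fprintf(stderr,
                "event %d dropped: allocation failed (%lu lost so far)\n",
                event, lost_events);
        return NULL;
    }
    w->event = event;
    return w;
}
```

Calling queue_event(8, failing_alloc) returns NULL, bumps lost_events and
prints one line to stderr, which is the minimum needed to explain a stalled
connection after the fact.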

> > 
> > > +EXPORT_SYMBOL(iw_cm_init_qp_attr);
> > 
> > This file exports a ton of symbols.  It's usual to provide some justifying
> > commentary in the changelog when this happens.
> 
> This module is a logical instance of the xx_cm where xx is the transport
> type. I think there is some discussion warranted on whether or not these
> should all be built into and exported by rdma_cm. One rationale would be
> that the rdma_cm is the only client for many of these functions (this
> being a particularly good example) and doing so would reduce the export
> count. Others would be reasonably needed for any application (connect,
> etc...)
> 

Transport-dependent ULPs, in theory, are able to use the
transport-specific CM directly if they don't wish to use the RDMA CM.  I
think that's the rationale for having the xx_cm modules separate from the
rdma_cm module and exporting the various functions.

> All that said, we'll be sure to document the exported symbols in a
> follow-up patch.
> 

I'll add commentary explaining this.

Steve.





Re: [openib-general] [PATCH 0/4] opensm: Loading unicast routes from the file

2006-06-13 Thread Sasha Khapyorsky
Hi Greg,

On 11:02 Tue 13 Jun , Greg Johnson wrote:
> On Sun, Jun 11, 2006 at 03:27:58AM +0300, Sasha Khapyorsky wrote:
> > Hi,
> > 
> > There are couple of unicast routing related patches for OpenSM.
> > 
> > Basically it implements routing module which provides possibility to load
> > switch forwarding tables from pre-created dump file. Currently unicast
> > tables loading is only supported, multicast may be added in a future.
> > 
> > Short patch descriptions (more details may be found in emails with
> > patches):
> > 
> > 1. Ucast dump file simplification.
> > 2. Modular routing - preliminary implements generic model to plug new
> > routing engine to OpenSM.
> > 3. New simple unicast routing engine which allows to load LFTs from
> > pre-created dump file.
> > 4. Example of ucast dump generation script.
> > 
> > Please comment and test. Thanks.
> 
> We tried this on our 256-node cluster with a single chassis Voltaire
> 288-port switch.

Thanks.

> It seems to load the routes generated by the dump
> script, but afterward it is not possible to dump the routes again.

This means you have broken LFTs now. Probably I know what is going on
here - new LFTs don't have " 0" entries, and switches are
not accessible by LIDs anymore.

Please update 'ibroute' utility (diags/) from the trunk and recreate the
dump file - this should fix the problem.

(Sorry, I forgot to mention 'ibroute' upgrade issue in patch announcement).

> I
> would like to re-dump the routes after loading to ensure that they were
> loaded correctly.
> 
> After loading routes with "opensm -R file -U dump_file", dump_lfts.sh
> gives:
> 
> nodeinfo
>        
>        
>        
>        
> ibroute: iberror: dump tables failed: node info failed: valid addr?
> 
> for each switch.
> 
> Also, I had to delete a space in the sed script on line 17 of
> dump_lfts.sh:
> 
> sed -ne 's/^.* lid \([1-9a-f]*\) .*$/\1/p'
> 
> became
> 
> sed -ne 's/^.* lid \([1-9a-f]*\).*$/\1/p'

I see. I've used ibswitches/ibnetdiscover from the trunk, there is some
minor difference in the output (' lmc N' was added). I think with your
change the script will work with both old and new outputs. Thanks for
the fix.

> Thanks for the work!

Thanks for trying this.

Sasha




Re: [openib-general] [PATCH 1/4] Simplification of the ucast fdb dumps.

2006-06-13 Thread Hal Rosenstock
On Sat, 2006-06-10 at 20:32, Sasha Khapyorsky wrote:
> This separates the dump procedure from rest of the flow and prevents
> multiple fopen()/fclose() (one pair per switch) - one fopen() and one
> fclose() instead.
> 
> Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>

Thanks. Applied (with some cosmetic changes).

-- Hal





Re: [openib-general] [RFC] [PATCH] IB/uverbs: Don't serialize with ib_uverbs_idr_mutex

2006-06-13 Thread Robert Walsh
On Tue, 2006-06-13 at 10:55 -0700, Roland Dreier wrote:
> Michael> Won't this let the user issue multiple modify QP commands
> Michael> in parallel on the same QP? mthca at least does not
> Michael> protect against such attempts, and doing this will
> Michael> confuse the hardware.
> 
> Hmm, that's a good point.  But I did write the following in
> Documentation/infiniband/core_locking.txt:
> 
>   All of the methods in struct ib_device exported by a low-level
>   driver must be fully reentrant.  The low-level driver is required to
>   perform all synchronization necessary to maintain consistency, even
>   if multiple function calls using the same object are run
>   simultaneously.
> 
>   The IB midlayer does not perform any serialization of function calls.
> 
> So I guess this is a bug in mthca.

We have a similar problem in resource checking - we were relying on the
idr lock to keep us safe.  I'll fix that up, too.

Regards,
 Robert.

-- 
Robert Walsh Email: [EMAIL PROTECTED]
PathScale, Inc.  Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.



[openib-general] [git pull] please pull infiniband.git

2006-06-13 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This has a couple of mthca driver bug fixes:

Michael S. Tsirkin:
  IB/mthca: restore missing PCI registers after reset
  IB/mthca: memfree completion with error FW bug workaround

 drivers/infiniband/hw/mthca/mthca_cq.c|   11 +
 drivers/infiniband/hw/mthca/mthca_reset.c |   59 +
 2 files changed, 69 insertions(+), 1 deletions(-)


diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index 205854e..87a8f11 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -540,8 +540,17 @@ static inline int mthca_poll_one(struct 
entry->wr_id = srq->wrid[wqe_index];
mthca_free_srq_wqe(srq, wqe);
} else {
+   s32 wqe;
wq = &(*cur_qp)->rq;
-   wqe_index = be32_to_cpu(cqe->wqe) >> wq->wqe_shift;
+   wqe = be32_to_cpu(cqe->wqe);
+   wqe_index = wqe >> wq->wqe_shift;
+   /*
+   * WQE addr == base - 1 might be reported in receive completion
+   * with error instead of (rq size - 1) by Sinai FW 1.0.800 and
+   * Arbel FW 5.1.400.  This bug should be fixed in later FW revs.
+   */
+   if (unlikely(wqe_index < 0))
+   wqe_index = wq->max - 1;
entry->wr_id = (*cur_qp)->wrid[wqe_index];
}
 
diff --git a/drivers/infiniband/hw/mthca/mthca_reset.c b/drivers/infiniband/hw/mthca/mthca_reset.c
index df5e494..f4fddd5 100644
--- a/drivers/infiniband/hw/mthca/mthca_reset.c
+++ b/drivers/infiniband/hw/mthca/mthca_reset.c
@@ -49,6 +49,12 @@ int mthca_reset(struct mthca_dev *mdev)
u32 *hca_header= NULL;
u32 *bridge_header = NULL;
struct pci_dev *bridge = NULL;
+   int bridge_pcix_cap = 0;
+   int hca_pcie_cap = 0;
+   int hca_pcix_cap = 0;
+
+   u16 devctl;
+   u16 linkctl;
 
 #define MTHCA_RESET_OFFSET 0xf0010
 #define MTHCA_RESET_VALUE  swab32(1)
@@ -110,6 +116,9 @@ #define MTHCA_RESET_VALUE  swab32(1)
}
}
 
+   hca_pcix_cap = pci_find_capability(mdev->pdev, PCI_CAP_ID_PCIX);
+   hca_pcie_cap = pci_find_capability(mdev->pdev, PCI_CAP_ID_EXP);
+
if (bridge) {
bridge_header = kmalloc(256, GFP_KERNEL);
if (!bridge_header) {
@@ -129,6 +138,13 @@ #define MTHCA_RESET_VALUE  swab32(1)
goto out;
}
}
+   bridge_pcix_cap = pci_find_capability(bridge, PCI_CAP_ID_PCIX);
+   if (!bridge_pcix_cap) {
+   err = -ENODEV;
+   mthca_err(mdev, "Couldn't locate HCA bridge "
+ "PCI-X capability, aborting.\n");
+   goto out;
+   }
}
 
/* actually hit reset */
@@ -178,6 +194,20 @@ #define MTHCA_RESET_VALUE  swab32(1)
 good:
/* Now restore the PCI headers */
if (bridge) {
+   if (pci_write_config_dword(bridge, bridge_pcix_cap + 0x8,
+bridge_header[(bridge_pcix_cap + 0x8) / 4])) {
+   err = -ENODEV;
+   mthca_err(mdev, "Couldn't restore HCA bridge Upstream "
+ "split transaction control, aborting.\n");
+   goto out;
+   }
+   if (pci_write_config_dword(bridge, bridge_pcix_cap + 0xc,
+bridge_header[(bridge_pcix_cap + 0xc) / 4])) {
+   err = -ENODEV;
+   mthca_err(mdev, "Couldn't restore HCA bridge Downstream "
+ "split transaction control, aborting.\n");
+   goto out;
+   }
/*
 * Bridge control register is at 0x3e, so we'll
 * naturally restore it last in this loop.
@@ -203,6 +233,35 @@ good:
}
}
 
+   if (hca_pcix_cap) {
+   if (pci_write_config_dword(mdev->pdev, hca_pcix_cap,
+hca_header[hca_pcix_cap / 4])) {
+   err = -ENODEV;
+   mthca_err(mdev, "Couldn't restore HCA PCI-X "
+ "command register, aborting.\n");
+   goto out;
+   }
+   }
+
+   if (hca_pcie_cap) {
+   devctl = hca_header[(hca_pcie_cap + PCI_EXP_DEVCTL) / 4];
+   if (pci_write_config_word(mdev->pdev, hca_pcie_cap + PCI_EXP_DEVCTL,
+  devctl)) {
+   

Re: [openib-general] [PATCH] OpenSM/SA: Properly handle non base LID requests to some SA records

2006-06-13 Thread Eitan Zahavi
Sure. Looks good to me

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -Original Message-
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 13, 2006 7:42 PM
> To: openib-general@openib.org
> Cc: Eitan Zahavi
> Subject: [PATCH] OpenSM/SA: Properly handle non base LID requests to some SA records
> 
> OpenSM/SA: Properly handle non base LID requests to some SA records
> 
> In osm_sa_node_record.c and osm_sa_portinfo_record.c, properly handle
> non base LID requests per C15-0.1.11: Query responses shall contain a
> port's base LID in any LID component of a RID. So when LMC is non 0,
> the only records that appear are those with the base LID and not with
> any masked LIDs. Furthermore, if a query comes in on a non base LID,
> the LID in the RID returned is only with the base LID.
> 
> Also, fixed some endian issues in osm_log messages.
> 
> Note: Similar patch for other affected SA records will follow.
> 
> Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> 
> Index: opensm/osm_sa_node_record.c
> ===
> --- opensm/osm_sa_node_record.c   (revision 7961)
> +++ opensm/osm_sa_node_record.c   (working copy)
> @@ -200,12 +200,11 @@ __osm_nr_rcv_create_nr(
>uint8_t  port_num;
>uint8_t  num_ports;
>uint16_t match_lid_ho;
> -  uint16_t lid_ho;
> +  ib_net16_t   base_lid;
>ib_net16_t   base_lid_ho;
>ib_net16_t   max_lid_ho;
>uint8_t  lmc;
>ib_net64_t   port_guid;
> -  ib_api_status_t  status;
> 
>OSM_LOG_ENTER( p_rcv->p_log, __osm_nr_rcv_create_nr );
> 
> @@ -245,7 +244,8 @@ __osm_nr_rcv_create_nr(
>  if( match_port_guid && ( port_guid != match_port_guid ) )
>continue;
> 
> -base_lid_ho = cl_ntoh16( osm_physp_get_base_lid( p_physp ) );
> +base_lid = osm_physp_get_base_lid( p_physp );
> +base_lid_ho = cl_ntoh16( base_lid );
>  lmc = osm_physp_get_lmc( p_physp );
>  max_lid_ho = (uint16_t)( base_lid_ho + (1 << lmc) - 1 );
>  match_lid_ho = cl_ntoh16( match_lid );
> @@ -260,29 +260,18 @@ __osm_nr_rcv_create_nr(
>  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
>   "__osm_nr_rcv_create_nr: "
>   "Comparing LID: 0x%X <= 0x%X <= 0x%X\n",
> - cl_ntoh16( base_lid_ho ),
> - cl_ntoh16( match_lid_ho ),
> - cl_ntoh16( max_lid_ho )
> + base_lid_ho, match_lid_ho, max_lid_ho
>   );
>}
> 
>if( (match_lid_ho <= max_lid_ho) && (match_lid_ho >= base_lid_ho) )
>{
> -__osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, match_lid );
> +__osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, base_lid );
>}
>  }
>  else
>  {
> -  /*
> -For every lid value create a Node Record.
> -  */
> -  for( lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++ )
> -  {
> -status = __osm_nr_rcv_new_nr( p_rcv, p_node, p_list,
> -  port_guid, cl_hton16( lid_ho ) );
> -if( status != IB_SUCCESS )
> -  break;
> -  }
> +  __osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, base_lid );
>  }
>}
> 
> Index: opensm/osm_sa_portinfo_record.c
> ===
> --- opensm/osm_sa_portinfo_record.c   (revision 7961)
> +++ opensm/osm_sa_portinfo_record.c   (working copy)
> @@ -194,9 +194,9 @@ __osm_sa_pir_create(
>IN osm_pir_search_ctxt_t*   const p_ctxt )
>  {
>uint8_t   lmc;
> -  uint16_t  lid_ho;
>uint16_t  max_lid_ho;
>uint16_t  base_lid_ho;
> +  uint16_t  match_lid_ho;
> 
>OSM_LOG_ENTER( p_rcv->p_log, __osm_sa_pir_create );
> 
> @@ -218,17 +218,28 @@ __osm_sa_pir_create(
> 
>if( p_ctxt->comp_mask & IB_PIR_COMPMASK_LID )
>{
> -__osm_pir_rcv_new_pir( p_rcv, p_physp, p_ctxt->p_list,
> -   p_ctxt->p_rcvd_rec->lid );
> -  }
> -  else
> -  {
> -for( lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++ )
> +match_lid_ho = cl_ntoh16( p_ctxt->p_rcvd_rec->lid );
> +
> +/*
> +  We validate that the lid belongs to this node.
> +*/
> +if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) )
>  {
> -  __osm_pir_rcv_new_pir( p_rcv, p_physp, p_ctxt->p_list,
> - cl_hton16( lid_ho ) );
> +  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> +   "__osm_sa_pir_create: "
> +   "Comparing LID: 0x%X <= 0x%X <= 0x%X\n",
> +   base_lid_ho, match_lid_ho, max_lid_ho
> +   );
>  }
> +
> +if ( match_lid_ho < base_lid_ho || match_lid_ho > max_l

Re: [openib-general] [PATCH updated] mthca: memfree completion with error workaround

2006-06-13 Thread Roland Dreier
Yeah, I like this much more.  It doesn't seem that likely that there
will be another firmware bug with the same symptoms, and we have to
trust some of what the hardware tells us...

 - R.




Re: [openib-general] [PATCH] mthca: restore missing registers

2006-06-13 Thread Roland Dreier
Thanks, applied for 2.6.17




Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Sean Hefty
>There are architected ways to do that. There's busy for MADs which could
>be used for some MADs. For RMPP, would the transfer be ABORTed ? I don't
>think you can switch to BUSY in the middle (but I'm not 100% sure). I
>don't know how this limit is being used exactly, but it might be best if
>the RMPP receive were treated as 1 MAD regardless of how many
>segments it was.

Maybe I should back-up some here.  There are a couple problems that I'm trying
to solve, but the main goal is to prevent sending duplicate responses.  I'd like
to do this by detecting and dropping duplicate requests.

To detect a duplicate request, my proposal is to move completed MADs to a
"done_list".  Newly received MADs would also check the done_list to determine if
the MAD is a duplicate.  When a user sends a response MAD, a check would be made
against the done_list for a matching request that has not generated a response
yet.  If one is not found, then the send would be failed.
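
To make the done_list idea a bit more concrete, here is a minimal userspace
sketch. It is not the MAD layer code: the fixed-size table, the choice of
(TID, source LID) as the matching key, and all function names are
assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DONE_LIST_MAX 8 /* hypothetical capacity */

/* Hypothetical key: transaction ID plus source LID of the requester. */
struct done_entry {
    uint64_t tid;
    uint16_t src_lid;
    int responded; /* nonzero once a response has been sent */
    int in_use;
};

struct done_list {
    struct done_entry e[DONE_LIST_MAX];
};

void done_list_init(struct done_list *dl)
{
    memset(dl, 0, sizeof(*dl));
}

struct done_entry *done_list_find(struct done_list *dl, uint64_t tid,
                                  uint16_t src_lid)
{
    size_t i;

    for (i = 0; i < DONE_LIST_MAX; i++)
        if (dl->e[i].in_use && dl->e[i].tid == tid &&
            dl->e[i].src_lid == src_lid)
            return &dl->e[i];
    return NULL;
}

/* Receive path: 1 = new request recorded, 0 = duplicate (drop it),
 * -1 = no room left in the list. */
int done_list_recv(struct done_list *dl, uint64_t tid, uint16_t src_lid)
{
    size_t i;

    if (done_list_find(dl, tid, src_lid))
        return 0;
    for (i = 0; i < DONE_LIST_MAX; i++) {
        if (!dl->e[i].in_use) {
            dl->e[i].tid = tid;
            dl->e[i].src_lid = src_lid;
            dl->e[i].responded = 0;
            dl->e[i].in_use = 1;
            return 1;
        }
    }
    return -1;
}

/* Send path: 0 = a matching unanswered request exists (now marked as
 * answered), -1 = no such request, so the send should be failed. */
int done_list_send_response(struct done_list *dl, uint64_t tid,
                            uint16_t src_lid)
{
    struct done_entry *d = done_list_find(dl, tid, src_lid);

    if (!d || d->responded)
        return -1;
    d->responded = 1;
    return 0;
}

/* Called when the received MAD is freed. */
void done_list_free(struct done_list *dl, uint64_t tid, uint16_t src_lid)
{
    struct done_entry *d = done_list_find(dl, tid, src_lid);

    if (d)
        d->in_use = 0;
}
```

With this shape a second receive of the same (TID, LID) pair is dropped, and
a second response to the same request is refused, until the entry is freed.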

Received MADs would be removed from the done_list when they are freed.  My guess
is that for kernel clients, the changes would probably be minimal.  For usermode
clients, the problem is more difficult, since we cannot trust usermode clients
to generate responses correctly, and there's no free_mad call that maps to the
kernel.

One of the ideas then, is for the kernel umad module to learn which MADs
generate responses.  It would do this by updating an entry to a table whenever a
response MAD is generated.  A received MAD would check against the table to see
if a response is supposed to be generated.  If not, then the MAD would be freed
after userspace claims it.  If a response is expected, then the MAD would not be
freed until the response was generated.

Assuming minimal hard-coding of which methods are requests, a client would drop
only about 1 MAD per method during start-up.  Considering most requests are not
sent reliably, this shouldn't be a big issue.  (In fact, outside of a
MultiPathRecord query, I don't believe any requests are sent reliably.)  And I
would argue that even if a request has been acknowledged, the sender of the
request would still need to deal with the case that no response is ever
generated.

If this approach were taken, then, it brings up the issue that MADs are being
stored in the kernel waiting for a response.  But what if a response is never
generated?  This problem is somewhat related to MADs being queued in the kernel,
but the userspace app doesn't call down to receive them.  Ideally, we could come
up with a single solution to both problems, but that may not be possible.

My current thoughts on how to handle requests are to time when each request MAD
is received, and queue it.  Once the queue is full, if another request is
received, it would check the MAD at the head of the queue.  If the MAD at the
head was older than some selected value (say 20 seconds), it would be bumped
from the queue, and the new request would be added to the tail.
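
Roughly, the queue handling could look like the sketch below. The ring
buffer, the queue depth, and the function name are made up for the example,
and it assumes that a new request is simply dropped when the queue is full
and the head is not yet old enough, which is not spelled out above.

```c
#include <stddef.h>

#define REQ_QUEUE_LEN 4  /* hypothetical queue depth */
#define REQ_MAX_AGE   20 /* seconds, the example value mentioned above */

struct req_queue {
    long recv_time[REQ_QUEUE_LEN]; /* receive time of each queued request */
    size_t head;                   /* index of the oldest entry */
    size_t count;                  /* number of queued requests */
};

/*
 * Try to queue a request received at time `now` (in seconds).
 * Returns 1 if the request was queued, 0 if it was dropped.
 */
int req_queue_add(struct req_queue *q, long now)
{
    if (q->count == REQ_QUEUE_LEN) {
        /* Queue is full: only make room if the head is older than the
         * limit; otherwise the new request is dropped (assumed policy). */
        if (now - q->recv_time[q->head] <= REQ_MAX_AGE)
            return 0;
        q->head = (q->head + 1) % REQ_QUEUE_LEN; /* bump the old head */
        q->count--;
    }
    /* Add the new request at the tail. */
    q->recv_time[(q->head + q->count) % REQ_QUEUE_LEN] = now;
    q->count++;
    return 1;
}
```

Only receive times are stored here; a real queue would carry the MAD itself
and release it when it gets bumped.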

- Sean




Re: [openib-general] [PATCH resend] libamso: fix erroneous return and memory leak in verbs.c

2006-06-13 Thread Steve Wise
On Tue, 2006-06-13 at 23:25 +0530, Pradipta Kumar Banerjee wrote:
> Forgot to add the 'Signed-off-by'
> 
> This patch fixes an erroneous return in func amso_create_cq() and a memory
> leak in amso_create_qp().
> 
> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>
> 

Committed revision 7971.

Thanks,

Steve.







[openib-general] [PATCH resend] libamso: fix erroneous return and memory leak in verbs.c

2006-06-13 Thread Pradipta Kumar Banerjee
Forgot to add the 'Signed-off-by'

This patch fixes an erroneous return in func amso_create_cq() and a memory
leak in amso_create_qp().

Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>

---

Index = libamso/verbs.c

--- verbs.org   2006-06-13 18:56:50.0 +0530
+++ verbs.c 2006-06-13 19:02:03.0 +0530
@@ -154,9 +154,8 @@ struct ibv_cq *amso_create_cq(struct ibv
int ret;
 
cq = malloc(sizeof *cq);
-   if (!cq) {
-   goto err;
-   }
+   if (!cq) 
+   return NULL;
 
ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
&cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
@@ -248,14 +247,15 @@ struct ibv_qp *amso_create_qp(struct ibv
ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
&resp.ibv_resp, sizeof resp);
if (ret)
-   return NULL;
+   goto err;
 
 #if 0 /* A reminder for bypass functionality */
qp->physaddr = resp.physaddr;
 #endif
 
return &qp->ibv_qp;
-
+err:
+   free(qp);
 
return NULL;
 }




Re: [openib-general] [RFC] [PATCH] IB/uverbs: Don't serialize with ib_uverbs_idr_mutex

2006-06-13 Thread Roland Dreier
Michael> Won't this let the user issue multiple modify QP commands
Michael> in parallel on the same QP? mthca at least does not
Michael> protect against such attempts, and doing this will
Michael> confuse the hardware.

Hmm, that's a good point.  But I did write the following in
Documentation/infiniband/core_locking.txt:

  All of the methods in struct ib_device exported by a low-level
  driver must be fully reentrant.  The low-level driver is required to
  perform all synchronization necessary to maintain consistency, even
  if multiple function calls using the same object are run
  simultaneously.

  The IB midlayer does not perform any serialization of function calls.

So I guess this is a bug in mthca.

I think modify_srq at least has the same problem.  I'll audit this and
fix it up in mthca.
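
A userspace sketch of the kind of per-object serialization the locking rules
quoted above ask of a low-level driver. This is not mthca code: struct demo_qp,
demo_modify_qp() and the "one state forward" transition rule are hypothetical,
and a pthread mutex stands in for whatever lock the driver would really use.

```c
#include <assert.h>
#include <pthread.h>

enum demo_qp_state { DEMO_QPS_RESET, DEMO_QPS_INIT, DEMO_QPS_RTR, DEMO_QPS_RTS };

struct demo_qp {
    pthread_mutex_t    mutex;        /* serializes modify calls on this QP */
    enum demo_qp_state state;
    unsigned int       hw_commands;  /* stand-in for commands sent to hardware */
};

static void demo_qp_init(struct demo_qp *qp)
{
    assert(qp != NULL);
    pthread_mutex_init(&qp->mutex, NULL);
    qp->state = DEMO_QPS_RESET;
    qp->hw_commands = 0;
}

/*
 * Returns 0 on success, -1 if the transition is not allowed.
 * The state check and the (simulated) hardware command happen under the
 * same lock, so two callers can never issue overlapping commands for one QP.
 */
static int demo_modify_qp(struct demo_qp *qp, enum demo_qp_state next)
{
    int ret = -1;

    pthread_mutex_lock(&qp->mutex);
    if ((int)next == (int)qp->state + 1) {   /* hypothetical transition rule */
        qp->hw_commands++;
        qp->state = next;
        ret = 0;
    }
    pthread_mutex_unlock(&qp->mutex);
    return ret;
}
```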

 - R.




Re: [openib-general] [PATCH 4/4] diags: ucast routing dump file generator example - dump_lfts.sh

2006-06-13 Thread Hal Rosenstock
On Sat, 2006-06-10 at 20:32, Sasha Khapyorsky wrote:
> New simple script - dump_lfts.sh, may be used for ucast dump file
> generation.
> 
> Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>

Thanks. Applied.

-- Hal





Re: [openib-general] [PATCH] rping: Erroneous check for minumum ping buffer size

2006-06-13 Thread Steve Wise
Thanks.  Committed under revisions:

trunk: r7968
iwarp branch: r7969


Steve.


On Sat, 2006-06-10 at 23:04 +0530, Pradipta Kumar Banerjee wrote:
> This includes the changes suggested by Tom.
> 
> Signed-off-by: Pradipta Kumar Banerjee <[EMAIL PROTECTED]>
> ---
> 
> Index: rping.c
> =
> --- rping.org 2006-06-09 10:57:43.0 +0530
> +++ rping.c.new   2006-06-10 22:48:53.0 +0530
> @@ -96,6 +96,15 @@ struct rping_rdma_info {
>  #define RPING_BUFSIZE 64*1024
>  #define RPING_SQ_DEPTH 16
>  
> +/* Default string for print data and
> + * minimum buffer size
> + */
> +#define _stringify( _x ) # _x
> +#define stringify( _x ) _stringify( _x )
> +
> +#define RPING_MSG_FMT   "rdma-ping-%d: "
> +#define RPING_MIN_BUFSIZE   sizeof(stringify(INT_MAX)) + sizeof(RPING_MSG_FMT)
> +
>  /*
>   * Control block struct.
>   */
> @@ -774,7 +783,7 @@ static void rping_test_client(struct rpi
>   cb->state = RDMA_READ_ADV;
>  
>   /* Put some ascii text in the buffer. */
> - cc = sprintf(cb->start_buf, "rdma-ping-%d: ", ping);
> + cc = sprintf(cb->start_buf, RPING_MSG_FMT, ping);
>   for (i = cc, c = start; i < cb->size; i++) {
>   cb->start_buf[i] = c;
>   c++;
> @@ -977,11 +986,11 @@ int main(int argc, char *argv[])
>   break;
>   case 'S':
>   cb->size = atoi(optarg);
> - if ((cb->size < 1) ||
> + if ((cb->size < RPING_MIN_BUFSIZE) ||
>   (cb->size > (RPING_BUFSIZE - 1))) {
>   fprintf(stderr, "Invalid size %d "
> -"(valid range is 1 to %d)\n",
> -cb->size, RPING_BUFSIZE);
> +"(valid range is %d to %d)\n",
> +cb->size, RPING_MIN_BUFSIZE, RPING_BUFSIZE);
>   ret = EINVAL;
>   } else
>   DEBUG_LOG("size %d\n", (int) atoi(optarg));
> 
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general





Re: [openib-general] [PATCH] librdmacm/examples/rping.c

2006-06-13 Thread Steve Wise
Thanks, applied.

iwarp branch: r7964
trunk: r7966


On Tue, 2006-06-13 at 11:24 -0500, Boyd R. Faulkner wrote:
> This patch resolves a race condition between the receipt of
> a connection established event and a receive completion from 
> the client.  The server no longer goes to connected state but
> merely waits for the READ_ADV state to begin its looping.  This
> keeps the server from going back to CONNECTED from the later
> states if the connection established event comes in after the
> receive completion (i.e. the loop starts).
> 
> Signed-off-by: Boyd Faulkner <[EMAIL PROTECTED]> 





Re: [openib-general] opensm and NPTL

2006-06-13 Thread Hal Rosenstock
On Tue, 2006-06-13 at 12:56, Viswanath Krishnamurthy wrote:
> I am using the trunk.   Should I be using 1.0 ?

No. I didn't check, but if memory serves, the trunk may have some fixes
for this that 1.0 doesn't. I'm not 100% sure right now, and since you are
already using the trunk I'm not going to do my homework on whether that is
really the case or my memory is just fuzzy.

-- Hal

> 
> -Viswa
> 
> On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> > Yes.. I want to test waters again and see if the issues went away.
> 
> Are you using the trunk or 1.0 ?
> 
> -- Hal
> 
> > -Viswa
> >
> > On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Viswa,
> >
> > On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> > > There were some issues with opensm running with NPTL (thread
> > > library). Has the issues been resolved ?
> >
> > There were some fixes to the signal handling which went in back in the
> > Feb/early March time frame. OpenSM should be better with NPTL now. Is it
> > working for you or are you asking before stepping into these waters
> > again ?
> >
> > -- Hal
> >
> > > Regards,
> > > Viswa





Re: [openib-general] [PATCH 0/4] opensm: Loading unicast routes from the file

2006-06-13 Thread Greg Johnson
On Sun, Jun 11, 2006 at 03:27:58AM +0300, Sasha Khapyorsky wrote:
> Hi,
> 
> There are couple of unicast routing related patches for OpenSM.
> 
> Basically it implements routing module which provides possibility to load
> switch forwarding tables from pre-created dump file. Currently unicast
> tables loading is only supported, multicast may be added in a future.
> 
> Short patch descriptions (more details may be found in emails with
> patches):
> 
> 1. Ucast dump file simplification.
> 2. Modular routing - preliminary implements generic model to plug new
> routing engine to OpenSM.
> 3. New simple unicast routing engine which allows to load LFTs from
> pre-created dump file.
> 4. Example of ucast dump generation script.
> 
> Please comment and test. Thanks.

We tried this on our 256-node cluster with a single chassis Voltaire
288-port switch.  It seems to load the routes generated by the dump
script, but afterward it is not possible to dump the routes again.  I
would like to re-dump the routes after loading to ensure that they were
loaded correctly.

After loading routes with "opensm -R file -U dump_file", dump_lfts.sh
gives:

nodeinfo
       
       
       
       
ibroute: iberror: dump tables failed: node info failed: valid addr?

for each switch.

Also, I had to delete a space in the sed script on line 17 of
dump_lfts.sh:

sed -ne 's/^.* lid \([1-9a-f]*\) .*$/\1/p'

became

sed -ne 's/^.* lid \([1-9a-f]*\).*$/\1/p'

Thanks for the work!

Greg




Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
I am using the trunk.   Should I be using 1.0 ?

-Viswa
On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> Yes.. I want to test waters again and see if the issues went away.

Are you using the trunk or 1.0 ?

-- Hal

> -Viswa
>
> On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Viswa,
>
> On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> > There were some issues with opensm running with NPTL (thread
> > library). Has the issues been resolved ?
>
> There were some fixes to the signal handling which went in back in the
> Feb/early March time frame. OpenSM should be better with NPTL now. Is it
> working for you or are you asking before stepping into these waters
> again ?
>
> -- Hal
>
> > Regards,
> > Viswa


[openib-general] [PATCH] OpenSM/SA: Properly handle non base LID requests to some SA records

2006-06-13 Thread Hal Rosenstock
OpenSM/SA: Properly handle non base LID requests to some SA records

In osm_sa_node_record.c and osm_sa_portinfo_record.c, properly handle
non base LID requests per C15-0.1.11: Query responses shall contain a
port's base LID in any LID component of a RID. So when LMC is non 0,
the only records that appear are those with the base LID and not with
any masked LIDs. Furthermore, if a query comes in on a non base LID, the
RID returned contains only the base LID.
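
The rule above can be sketched as a small helper. This is only an illustration
in host byte order; sa_lid_to_base_lid() is a hypothetical name, not a function
from the patch, and returning 0 (a reserved LID) for "no match" is an
assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the LID to place in the RID for a port, or 0 if 'match_lid' is
 * outside the port's LID range [base_lid, base_lid + 2^lmc - 1].
 * Any LID inside the range maps to the base LID.
 */
static uint16_t sa_lid_to_base_lid(uint16_t base_lid, uint8_t lmc,
                                   uint16_t match_lid)
{
    uint16_t max_lid;

    assert(lmc <= 7);   /* LMC is a 3-bit field */
    max_lid = (uint16_t)(base_lid + (1u << lmc) - 1);

    if (match_lid < base_lid || match_lid > max_lid)
        return 0;
    return base_lid;
}
```

With base LID 0x10 and LMC 2, queries for 0x10 through 0x13 all yield a record
carrying 0x10, while 0x14 belongs to another port.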

Also, fixed some endian issues in osm_log messages.

Note: Similar patch for other affected SA records will follow.

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: opensm/osm_sa_node_record.c
===
--- opensm/osm_sa_node_record.c (revision 7961)
+++ opensm/osm_sa_node_record.c (working copy)
@@ -200,12 +200,11 @@ __osm_nr_rcv_create_nr(
   uint8_t  port_num;
   uint8_t  num_ports;
   uint16_t match_lid_ho;
-  uint16_t lid_ho;
+  ib_net16_t   base_lid;
   ib_net16_t   base_lid_ho;
   ib_net16_t   max_lid_ho;
   uint8_t  lmc;
   ib_net64_t   port_guid;
-  ib_api_status_t  status;
 
   OSM_LOG_ENTER( p_rcv->p_log, __osm_nr_rcv_create_nr );
 
@@ -245,7 +244,8 @@ __osm_nr_rcv_create_nr(
 if( match_port_guid && ( port_guid != match_port_guid ) )
   continue;
 
-base_lid_ho = cl_ntoh16( osm_physp_get_base_lid( p_physp ) );
+base_lid = osm_physp_get_base_lid( p_physp );
+base_lid_ho = cl_ntoh16( base_lid );
 lmc = osm_physp_get_lmc( p_physp );
 max_lid_ho = (uint16_t)( base_lid_ho + (1 << lmc) - 1 );
 match_lid_ho = cl_ntoh16( match_lid );
@@ -260,29 +260,18 @@ __osm_nr_rcv_create_nr(
 osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
  "__osm_nr_rcv_create_nr: "
  "Comparing LID: 0x%X <= 0x%X <= 0x%X\n",
- cl_ntoh16( base_lid_ho ),
- cl_ntoh16( match_lid_ho ),
- cl_ntoh16( max_lid_ho )
+ base_lid_ho, match_lid_ho, max_lid_ho
  );
   }
 
   if( (match_lid_ho <= max_lid_ho) && (match_lid_ho >= base_lid_ho) )
   {
-__osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, match_lid );
+__osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, base_lid );
   }
 }
 else
 {
-  /*
-For every lid value create a Node Record.
-  */
-  for( lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++ )
-  {
-status = __osm_nr_rcv_new_nr( p_rcv, p_node, p_list,
-  port_guid, cl_hton16( lid_ho ) );
-if( status != IB_SUCCESS )
-  break;
-  }
+  __osm_nr_rcv_new_nr( p_rcv, p_node, p_list, port_guid, base_lid );
 }
   }
 
Index: opensm/osm_sa_portinfo_record.c
===
--- opensm/osm_sa_portinfo_record.c (revision 7961)
+++ opensm/osm_sa_portinfo_record.c (working copy)
@@ -194,9 +194,9 @@ __osm_sa_pir_create(
   IN osm_pir_search_ctxt_t*   const p_ctxt )
 {
   uint8_t   lmc;
-  uint16_t  lid_ho;
   uint16_t  max_lid_ho;
   uint16_t  base_lid_ho;
+  uint16_t  match_lid_ho;
 
   OSM_LOG_ENTER( p_rcv->p_log, __osm_sa_pir_create );
 
@@ -218,17 +218,28 @@ __osm_sa_pir_create(
 
   if( p_ctxt->comp_mask & IB_PIR_COMPMASK_LID )
   {
-__osm_pir_rcv_new_pir( p_rcv, p_physp, p_ctxt->p_list,
-   p_ctxt->p_rcvd_rec->lid );
-  }
-  else
-  {
-for( lid_ho = base_lid_ho; lid_ho <= max_lid_ho; lid_ho++ )
+match_lid_ho = cl_ntoh16( p_ctxt->p_rcvd_rec->lid );
+
+/*
+  We validate that the lid belongs to this node.
+*/
+if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) )
 {
-  __osm_pir_rcv_new_pir( p_rcv, p_physp, p_ctxt->p_list,
- cl_hton16( lid_ho ) );
+  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
+   "__osm_sa_pir_create: "
+   "Comparing LID: 0x%X <= 0x%X <= 0x%X\n",
+   base_lid_ho, match_lid_ho, max_lid_ho
+   );
 }
+
+if ( match_lid_ho < base_lid_ho || match_lid_ho > max_lid_ho )
+  goto Exit;
   }
+
+  __osm_pir_rcv_new_pir( p_rcv, p_physp, p_ctxt->p_list,
+ cl_hton16( base_lid_ho ) );
+
+ Exit:
   OSM_LOG_EXIT( p_rcv->p_log );
 }
 







[openib-general] [PATCH] osmtest: Add test for non base LID SA PortInfoRecord request when LMC > 0

2006-06-13 Thread Hal Rosenstock
osmtest: Add test for non base LID SA PortInfoRecord request when LMC > 0

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: osmtest/osmtest.c
===
--- osmtest/osmtest.c   (revision 7961)
+++ osmtest/osmtest.c   (working copy)
@@ -1613,6 +1613,7 @@ osmtest_stress_port_recs_small( IN osmte
  **/
 ib_api_status_t
 osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt,
+IN ib_net16_t  lid,
 OUT uint8_t *  const p_lmc )
 {
   osmtest_req_context_t context;
@@ -1629,7 +1630,7 @@ osmtest_get_local_port_lmc( IN osmtest_t
* Do a blocking query for our own PortRecord in the subnet.
*/
   status = osmtest_get_port_rec( p_osmt,
- cl_ntoh16(p_osmt->local_port.lid),
+ cl_ntoh16( lid ),
  &context );
 
   if( status != IB_SUCCESS )
@@ -3181,7 +3182,7 @@ osmtest_validate_path_data( IN osmtest_t
  cl_ntoh16( p_rec->slid ), cl_ntoh16( p_rec->dlid ) );
   }
 
-  status = osmtest_get_local_port_lmc( p_osmt, &lmc );
+  status = osmtest_get_local_port_lmc( p_osmt, p_osmt->local_port.lid, &lmc );
 
   /* HACK: Assume uniform LMC across endports in the subnet */ 
   /* In absence of this assumption, validation of this is much more complicated */
@@ -4885,10 +4886,13 @@ static ib_api_status_t
 osmtest_validate_against_db( IN osmtest_t * const p_osmt )
 {
   ib_api_status_t status = IB_SUCCESS;
-#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP)
+#ifdef VENDOR_RMPP_SUPPORT
+  uint8_t lmc;
+#ifdef DUAL_SIDED_RMPP
   osmtest_req_context_t context;
   osmv_multipath_req_t request;
 #endif
+#endif
 
   OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_against_db );
 
@@ -4999,6 +5003,18 @@ osmtest_validate_against_db( IN osmtest_
   if( status != IB_SUCCESS )
 goto Exit;
 
+  /* If LMC > 0, test non base LID SA PortInfoRecord request */
+  status = osmtest_get_local_port_lmc( p_osmt, p_osmt->local_port.lid, &lmc );
+  if ( status != IB_SUCCESS )
+goto Exit;
+
+  if (lmc != 0)
+  {
+status = osmtest_get_local_port_lmc( p_osmt, p_osmt->local_port.lid + 1, NULL);
+if ( status != IB_SUCCESS )
+  goto Exit;
+  }
+
   if (! p_osmt->opt.ignore_path_records)
   {
 status = osmtest_validate_all_path_recs( p_osmt );







Re: [openib-general] opensm and NPTL

2006-06-13 Thread Hal Rosenstock
On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> Yes.. I want to test waters again and see if the issues went away.

Are you using the trunk or 1.0 ?

-- Hal

> -Viswa
> 
> On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Viswa,
> 
> On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> > There were some issues with opensm running with NPTL (thread
> > library). Has the issues been resolved ?
> 
> There were some fixes to the signal handling which went in back in the
> Feb/early March time frame. OpenSM should be better with NPTL now. Is it
> working for you or are you asking before stepping into these waters
> again ?
> 
> -- Hal
> 
> > Regards,
> > Viswa





[openib-general] [PATCH] librdmacm/examples/rping.c

2006-06-13 Thread Boyd R. Faulkner
This patch resolves a race condition between the receipt of
a connection established event and a receive completion from 
the client.  The server no longer goes to connected state but
merely waits for the READ_ADV state to begin its looping.  This
keeps the server from going back to CONNECTED from the later
states if the connection established event comes in after the
receive completion (i.e. the loop starts).

Signed-off-by: Boyd Faulkner <[EMAIL PROTECTED]>

Index: rping.c
===
--- rping.c (revision 7960)
+++ rping.c (working copy)
@@ -182,7 +182,13 @@
 
case RDMA_CM_EVENT_ESTABLISHED:
DEBUG_LOG("ESTABLISHED\n");
-   cb->state = CONNECTED;
+
+   /*
+* Server will wake up when first RECV completes.
+*/
+   if (!cb->server) {
+   cb->state = CONNECTED;
+   }
sem_post(&cb->sem);
break;
 
@@ -197,7 +203,7 @@
break;
 
case RDMA_CM_EVENT_DISCONNECTED:
-   fprintf(stderr, "DISCONNECT EVENT...\n");
+   fprintf(stderr, "%s DISCONNECT EVENT...\n", cb->server ? "server" : "client");
sem_post(&cb->sem);
break;
 
@@ -225,7 +231,7 @@
DEBUG_LOG("Received rkey %x addr %" PRIx64 "len %d from peer\n",
  cb->remote_rkey, cb->remote_addr, cb->remote_len);
 
-   if (cb->state == CONNECTED || cb->state == RDMA_WRITE_COMPLETE)
+   if (cb->state <= CONNECTED || cb->state == RDMA_WRITE_COMPLETE)
cb->state = RDMA_READ_ADV;
else
cb->state = RDMA_WRITE_ADV;


-- 
Boyd R. Faulkner
Open Grid Computing, Inc.
Phone:  512-343-9196 x109
Fax:512-343-5450




Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
Yes.. I want to test waters again and see if the issues went away.

-Viswa
On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,

On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> There were some issues with opensm running with NPTL (thread
> library). Has the issues been resolved ?

There were some fixes to the signal handling which went in back in the
Feb/early March time frame. OpenSM should be better with NPTL now. Is it
working for you or are you asking before stepping into these waters
again ?

-- Hal

> Regards,
> Viswa


[openib-general] [PATCH] SRP: Avoid a potential race on target->req_queue

2006-06-13 Thread Ishai Rabinovitz
Hi Roland,

There is a potential race between srp_reconnect_target and srp_reset_device
when they access target->req_queue.
These functions can execute at the same time because srp_reconnect_target is
called from srp_reconnect_work, which is scheduled by srp_completion, while
srp_reset_device is called from the SCSI layer.

The race occurs because srp_reconnect_target does not hold host_lock while
accessing target->req_queue. It assumes that since the state is CONNECTING no
other function will access target->req_queue (and this is the case with
srp_reset_host, for example).

There are two possible solutions:
1) Change srp_reset_device: after locking host_lock, it checks the
   state. Only if the state is LIVE does it execute the loop that accesses
   target->req_queue.
2) Change srp_reconnect_target: before executing the loop that accesses
   target->req_queue it takes host_lock, and it releases the lock after
   the loop.

I'm sending a patch for the second solution. If you prefer the first, I have
another patch for it (it is a bit longer).
Which solution do you like better?
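
For comparison, the first solution could look roughly like the userspace sketch
below. It is not the alternative patch mentioned above: struct demo_target and
demo_reset_device() are hypothetical, a pthread mutex stands in for host_lock,
and an array of flags stands in for target->req_queue.

```c
#include <assert.h>
#include <pthread.h>

enum demo_target_state { DEMO_LIVE, DEMO_CONNECTING, DEMO_DEAD };

#define DEMO_MAX_REQ 4

struct demo_target {
    pthread_mutex_t        host_lock;
    enum demo_target_state state;
    int                    req_pending[DEMO_MAX_REQ]; /* stand-in for req_queue */
};

static void demo_target_init(struct demo_target *t)
{
    int i;

    assert(t != NULL);
    pthread_mutex_init(&t->host_lock, NULL);
    t->state = DEMO_LIVE;
    for (i = 0; i < DEMO_MAX_REQ; i++)
        t->req_pending[i] = 0;
}

/*
 * Reset all pending requests, but only while the target is LIVE.
 * Returns the number of requests reset, or -1 if the queue was left
 * alone because a reconnect (or teardown) owns it.
 */
static int demo_reset_device(struct demo_target *t)
{
    int i, n = 0;

    pthread_mutex_lock(&t->host_lock);
    if (t->state != DEMO_LIVE) {
        pthread_mutex_unlock(&t->host_lock);
        return -1;
    }
    for (i = 0; i < DEMO_MAX_REQ; i++) {
        if (t->req_pending[i]) {
            t->req_pending[i] = 0;
            n++;
        }
    }
    pthread_mutex_unlock(&t->host_lock);
    return n;
}
```

The state test has to happen after the lock is taken; checking it first and
locking afterwards would reopen the same window.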

Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-13 02:24:22.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-13 02:26:07.0 +0300
@@ -641,8 +641,10 @@ static int srp_reconnect_target(struct s
while (ib_poll_cq(target->cq, 1, &wc) > 0)
; /* nothing */
 
+   spin_lock_irq(target->scsi_host->host_lock);
list_for_each_entry_safe(req, tmp, &target->req_queue, list)
srp_reset_req(target, req);
+   spin_unlock_irq(target->scsi_host->host_lock);
 
target->rx_head  = 0;
target->tx_head  = 0;
-- 
Ishai Rabinovitz




Re: [openib-general] [PATCH] osm: partition manager force policy

2006-06-13 Thread Eitan Zahavi
Hi Hal,

Hal Rosenstock wrote:
> Hi Eitan,
> 
> On Tue, 2006-06-13 at 08:54, Eitan Zahavi wrote:
> 
>>--text follows this line--
>>Hi Hal
>>
>>This is a second take after debug and cleanup of the partition manager
>>patch I have previously provided.
> 
> 
> Thanks.
> 
> So this patch supersedes the previous version ? If so, in the future,
> just indicate [PATCHv2] for this.
> 
> 
>> The functionality is the same but
>>this one is after 2 days of testing on the simulator.
> 
> 
> Are you still working on this (more testing) ?
> 
> 
>>I also did some code restructuring for clarity. 
> 
> 
>>Tests passed were both dedicated pkey enforcements (pkey.*) and
>>stress test (osmStress.*)
>>
>>As I started to test the partition manager code (using ibmgtsim pkey test),
>>I realized the implementation does not really enforces the partition policy
>>on the given fabric. This patch fixes that. It was verified using the 
>>simulation test. Several other corner cases were fixed too.
> 
> 
> Can you elaborate on these cases ?
If you ask about the corner cases:
1. A bug in avoiding switch enforcement when the HCA had more blocks than the
switch.
2. Similar, but when the HCA blocks are unused, so the switch does not
actually need so many blocks.
3. Segfaults due to fabric instability.

If you ask about the test code, it is checked in at
https://openib.org/svn/gen2/utils/src/linux-user/ibmgtsim/tests
The file names start with pkey.* and osmStress.*.

In general the pkey test does:
* Randomize 3 pkeys p1 p2 p3 (the first 2 are full, 1 is partial)
* Assign ports into 3 groups: G1 which uses p1, G2 which
   uses p2 and G3 which uses p1, p2 and p3
* For each HCA port randomize pkey tables with a random number of entries
   (including the ones above at random locations)
* For some ports override the tables with an incorrect set
* Write a partition policy file
* Start the SM, wait for subnet up
* Randomly select HCA ports and verify (using osmtest -f c) that the
   all-to-all path records they see are limited by the partitions they
   belong to
* Forcefully null all default pkey entries on the fabric ports
* Set a change bit on a switch to force a sweep
* Wait for subnet up and check all ports do have the correct default pkey set
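
The "limited by the partitions they belong to" check comes down to comparing
pkey tables. Below is a minimal sketch, assuming the usual P_Key layout (top
bit is the membership bit, low 15 bits name the partition). It is not the
ibmgtsim test code; pkey_match() and tables_share_partition() are hypothetical
names, and treating a zero base value as an unused entry is an assumption made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_BIT  0x8000u   /* membership bit: set = full member */
#define PKEY_BASE_MASK 0x7fffu   /* low 15 bits identify the partition */

/* Return 1 if two P_Key values allow communication, 0 otherwise. */
static int pkey_match(uint16_t a, uint16_t b)
{
    if ((a & PKEY_BASE_MASK) == 0 ||
        (a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
        return 0;
    /* two partial (limited) members of one partition may not talk */
    return ((a | b) & PKEY_FULL_BIT) != 0;
}

/* Return 1 if any entry of table 'ta' matches any entry of table 'tb'. */
static int tables_share_partition(const uint16_t *ta, size_t na,
                                  const uint16_t *tb, size_t nb)
{
    size_t i, j;

    assert(ta != NULL && tb != NULL);
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            if (pkey_match(ta[i], tb[j]))
                return 1;
    return 0;
}
```

With tables shaped like the groups above, a G1 port and a G2 port share no
partition, while a G3 port matches both.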

The stress test does:
* Setup LIDs
* Force some random LID violations (duplicated, misaligned, zero)
* Write guid2lid file with some random change
* Disconnect some random nodes
* Run OpenSM, wait for subnet up
* Repeat 10 times: reconnect all nodes, disconnect some random nodes
* Wait for subnet up
* Check all LID values are correct (according to guid2lid)
* Start 240 iterations of selecting one of the following:
   connect random port
   disconnect random port
   register random service
   query random paths from random nodes
   join random port to 0xC000
   leave random port from 0xC000
* Eventually:
   connect all nodes
   join 0xC000 from all HCA ports
   wait for subnet up
   check connectivity and FDB validity etc. using ibdiagnet

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH updated] mthca: memfree completion with error workaround

2006-06-13 Thread Michael S. Tsirkin
OK, here's an optimized version of the fix. With this, I see:

before
   5994       0       0    5994    176a drivers/infiniband/hw/mthca/mthca_cq.o
after
   5995       0       0    5995    176b drivers/infiniband/hw/mthca/mthca_cq.o

So the cost is minimal. Please consider for 2.6.17.

---

Memfree firmware is in rare cases reporting WQE index == base - 1
in receive completion with error instead of (rq size - 1); base is 0 in mthca.
Here is a patch to avoid kernel crash and report a correct WR id in this case.
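
The index fix-up can be illustrated outside the driver as below. This is a
sketch only: demo_rq_wqe_index() is a hypothetical helper, the value is assumed
to be already converted to host byte order, and the sign test is written as an
explicit check of the top bit (equivalent to testing for a negative result
after an arithmetic shift) so that the example stays portable C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the WQE field of a receive completion into a receive queue index.
 * 'cqe_wqe' is the raw 32-bit value in host byte order, 'wqe_shift' is
 * log2 of the WQE size, and 'rq_max' is the number of WQEs in the queue.
 * A value of "base - 1" (0xffffffff when base is 0) has its top bit set
 * and is mapped to the last entry, rq_max - 1.
 */
static int demo_rq_wqe_index(uint32_t cqe_wqe, int wqe_shift, int rq_max)
{
    assert(wqe_shift >= 0 && wqe_shift < 32);
    assert(rq_max > 0);

    if (cqe_wqe & 0x80000000u)
        return rq_max - 1;
    return (int)(cqe_wqe >> wqe_shift);
}
```

For a 128-entry queue with 64-byte WQEs, a reported value of 0xffffffff yields
index 127, and a normal offset of 3 * 64 yields index 3.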

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: linux-2.6.16/drivers/infiniband/hw/mthca/mthca_cq.c
===
--- linux-2.6.16.orig/drivers/infiniband/hw/mthca/mthca_cq.c 2006-05-16 12:33:05.0 +0300
+++ linux-2.6.16/drivers/infiniband/hw/mthca/mthca_cq.c 2006-06-13 12:14:13.0 +0300
@@ -540,8 +540,17 @@ static inline int mthca_poll_one(struct 
entry->wr_id = srq->wrid[wqe_index];
mthca_free_srq_wqe(srq, wqe);
} else {
+   s32 wqe;
wq = &(*cur_qp)->rq;
-   wqe_index = be32_to_cpu(cqe->wqe) >> wq->wqe_shift;
+   wqe = be32_to_cpu(cqe->wqe);
+   wqe_index = wqe >> wq->wqe_shift;
+   /*
+   * WQE addr == base - 1 might be reported in receive completion
+   * with error instead of (rq size - 1) by Sinai FW 1.0.800,
+   * Arbel FW 5.1.400 and should be fixed in later revisions.
+   */
+   if (unlikely(wqe_index < 0))
+   wqe_index = wq->max - 1;
entry->wr_id = (*cur_qp)->wrid[wqe_index];
}
 
-- 
MST




[openib-general] [PATCH updated] libmthca: memfree completion with error

2006-06-13 Thread Michael S. Tsirkin
Same thing for userspace.

---

Fix up completion with error for memfree.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: openib/src/userspace/libmthca/src/cq.c
===
--- openib/src/userspace/libmthca/src/cq.c  (revision 7890)
+++ openib/src/userspace/libmthca/src/cq.c  (working copy)
@@ -347,8 +347,17 @@
wc->wr_id = srq->wrid[wqe_index];
mthca_free_srq_wqe(srq, wqe_index);
} else {
+   int32_t wqe;
wq = &(*cur_qp)->rq;
-   wqe_index = ntohl(cqe->wqe) >> wq->wqe_shift;
+   wqe = ntohl(cqe->wqe);
+   wqe_index = wqe >> wq->wqe_shift;
+   /*
+* WQE addr == base - 1 might be reported in receive completion
+* with error instead of (rq size - 1) by Sinai FW 1.0.800,
+* Arbel FW 5.1.400 and should be fixed in later revisions.
+*/
+   if (wqe_index < 0)
+   wqe_index = wq->max - 1;
wc->wr_id = (*cur_qp)->wrid[wqe_index];
}
 

-- 
MST




[openib-general] [PATCH] libamso: fix erroneous return and memory leak in verbs.c

2006-06-13 Thread Pradipta Kumar Banerjee
Hi,
 This patch fixes an erroneous return in func amso_create_cq() and a memory
leak in amso_create_qp()

---
Index = libamso/verbs.c

--- verbs.org   2006-06-13 18:56:50.0 +0530
+++ verbs.c 2006-06-13 19:02:03.0 +0530
@@ -154,9 +154,8 @@ struct ibv_cq *amso_create_cq(struct ibv
int ret;
 
cq = malloc(sizeof *cq);
-   if (!cq) {
-   goto err;
-   }
+   if (!cq) 
+   return NULL;
 
ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
&cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
@@ -248,14 +247,15 @@ struct ibv_qp *amso_create_qp(struct ibv
ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
&resp.ibv_resp, sizeof resp);
if (ret)
-   return NULL;
+   goto err;
 
 #if 0 /* A reminder for bypass functionality */
qp->physaddr = resp.physaddr;
 #endif
 
return &qp->ibv_qp;
-
+err:
+   free(qp);
 
return NULL;
 }





Re: [openib-general] [PATCH] osm: partition manager force policy

2006-06-13 Thread Hal Rosenstock
Hi Eitan,

On Tue, 2006-06-13 at 08:54, Eitan Zahavi wrote:
> --text follows this line--
> Hi Hal
> 
> This is a second take after debug and cleanup of the partition manager
> patch I have previously provided.

Thanks.

So this patch supersedes the previous version ? If so, in the future,
just indicate [PATCHv2] for this.

>  The functionality is the same but
> this one is after 2 days of testing on the simulator.

Are you still working on this (more testing) ?

> I also did some code restructuring for clarity. 

> Tests passed were both dedicated pkey enforcements (pkey.*) and
> stress test (osmStress.*)
> 
> As I started to test the partition manager code (using ibmgtsim pkey test),
> I realized the implementation does not really enforces the partition policy
> on the given fabric. This patch fixes that. It was verified using the 
> simulation test. Several other corner cases were fixed too.

Can you elaborate on these cases ?

-- Hal






Re: [openib-general] [PATCH] osm: Provide SUBNET UP message every heavy sweep - resend

2006-06-13 Thread Hal Rosenstock
Hi Eitan,

On Tue, 2006-06-13 at 08:39, Eitan Zahavi wrote:
> Hi Hal
> 
> Sorry about the previous patch - I got the } else { in it.
> 
> This trivial patch provides a "SUBNET UP" message (with level INFO)
> every time the SM completes a full heavy sweep. It is most useful for
> cases where you want to make sure the SM responded to some change in
> the fabric. Also used to sync the various test flows to the end of sweeps.

I already had fixed this prior to committing it. I thought that was
easier than "going 'round the block" on it.

> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>
> 
> Index: opensm/osm_state_mgr.c
> ===
> --- opensm/osm_state_mgr.c(revision 7904)
> +++ opensm/osm_state_mgr.c(working copy)
> @@ -200,6 +200,10 @@ __osm_state_mgr_up_msg(
>/* clear the signal */
>p_mgr->p_subn->moved_to_master_state = FALSE;
> }
> + else
> + {
> +  osm_log( p_mgr->p_log, OSM_LOG_INFO, "SUBNET UP\n" ); /* Format Waived */
> + }

If tabs are supposed to be the convention, note that spaces are used in
most OpenSM modules; I have been trying to keep to the convention used in
the particular module.

-- Hal

> if( p_mgr->p_subn->opt.sweep_interval )
> {
> 





[openib-general] OFED 1.0 release schedule

2006-06-13 Thread Tziporet Koren
Hi All,

After reading the mail thread regarding the OFED release, I have decided
the following:

We uploaded OFED-1.0-pre1.tgz to
https://openib.org/svn/gen2/branches/1.0/ofed/releases/

We checked that all modules compile and load on this build
(including ipath and uDAPL).

The only parts of the final release missing from this one are the
documents and the scripts rpm that Scott requested.

I think testing this version for 3 days (Tuesday, Wednesday and Thursday)
should be enough, as Scott wrote.

So we can do the official OFED 1.0 release on Friday 16-June.

Matt, please check with Novell whether this date is acceptable to them.

If not, then the earliest we can do the release is Thursday 15-June.

Tziporet Koren
Software Director
Mellanox Technologies
mailto: [EMAIL PROTECTED]
Tel +972-4-9097200, ext 380


[openib-general] [PATCH] osm: partition manager force policy

2006-06-13 Thread Eitan Zahavi
Hi Hal

This is a second take after debug and cleanup of the partition manager
patch I have previously provided. The functionality is the same but
this one is after 2 days of testing on the simulator.
I also did some code restructuring for clarity. 

Tests passed were both dedicated pkey enforcements (pkey.*) and
stress test (osmStress.*)

As I started to test the partition manager code (using ibmgtsim pkey test),
I realized the implementation does not really enforce the partition policy
on the given fabric. This patch fixes that. It was verified using the
simulation test. Several other corner cases were fixed too.

Eitan

Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Index: include/opensm/osm_port.h
===
--- include/opensm/osm_port.h   (revision 7867)
+++ include/opensm/osm_port.h   (working copy)
@@ -586,6 +586,39 @@ osm_physp_get_pkey_tbl( IN const osm_phy
 *  Port, Physical Port
 */
 
+/f* OpenSM: Physical Port/osm_physp_get_mod_pkey_tbl
+* NAME
+*  osm_physp_get_mod_pkey_tbl
+*
+* DESCRIPTION
+*  Returns a NON CONST pointer to the P_Key table object of the Physical Port object.
+*
+* SYNOPSIS
+*/
+static inline osm_pkey_tbl_t *
+osm_physp_get_mod_pkey_tbl( IN osm_physp_t* const p_physp )
+{
+  CL_ASSERT( osm_physp_is_valid( p_physp ) );
+  /*
+(14.2.5.7) - the block number valid values are 0-2047, and are further
+limited by the size of the P_Key table specified by the PartitionCap on the node.
+  */
+  return( &p_physp->pkeys );
+};
+/*
+* PARAMETERS
+*  p_physp
+* [in] Pointer to an osm_physp_t object.
+*
+* RETURN VALUES
+*  The pointer to the P_Key table object.
+*
+* NOTES
+*
+* SEE ALSO
+*  Port, Physical Port
+*/
+
 /f* OpenSM: Physical Port/osm_physp_set_slvl_tbl
 * NAME
 *  osm_physp_set_slvl_tbl
Index: include/opensm/osm_pkey.h
===
--- include/opensm/osm_pkey.h   (revision 7867)
+++ include/opensm/osm_pkey.h   (working copy)
@@ -92,6 +92,9 @@ typedef struct _osm_pkey_tbl
   cl_ptr_vector_t blocks;
   cl_ptr_vector_t new_blocks;
   cl_map_t        keys;
+  cl_qlist_t      pending;
+  uint16_t        used_blocks;
+  uint16_t        max_blocks;
 } osm_pkey_tbl_t;
 /*
 * FIELDS
@@ -104,6 +107,18 @@ typedef struct _osm_pkey_tbl
 *  keys
 *  A set holding all keys
 *
+*  pending
+* A list of osm_pending_pkey structs that is temporarily set by the
+* pkey mgr and used during the pkey mgr algorithm only
+*
+*  used_blocks
+* Tracks the number of blocks having non-zero pkeys
+*
+*  max_blocks
+* The maximal number of blocks this partition table might hold
+* this value is based on node_info (for port 0 or CA) or switch_info
+* updated on receiving the node_info or switch_info GetResp
+*
 * NOTES
 * 'blocks' vector should be used to store pkey values obtained from
 * the port and SM pkey manager should not change it directly, for this
@@ -114,6 +129,39 @@ typedef struct _osm_pkey_tbl
 *
 */
 
+/s* OpenSM: osm_pending_pkey_t
+* NAME
+*  osm_pending_pkey_t
+*
+* DESCRIPTION
+*  This object stores temporary information on pkeys, their target block and index,
+*  during the pkey manager operation
+*
+* SYNOPSIS
+*/
+typedef struct _osm_pending_pkey {
+  cl_list_item_t list_item;
+  uint16_t       pkey;
+  uint32_t       block;
+  uint8_t        index;
+  boolean_t      is_new;
+} osm_pending_pkey_t;
+/*
+* FIELDS
+*  pkey
+*  The actual P_Key
+*
+*  block
+*  The block index based on the previous table extracted from the device
+*
+*  index
+*  The index of the pkey within the block
+*
+*  is_new
+* TRUE for new P_Keys such that the block and index are invalid in that case
+*
+*/
+
 /f* OpenSM: osm_pkey_tbl_construct
 * NAME
 *  osm_pkey_tbl_construct
@@ -209,8 +257,8 @@ osm_pkey_tbl_get_num_blocks( 
 static inline ib_pkey_table_t *osm_pkey_tbl_block_get( 
   const osm_pkey_tbl_t *p_pkey_tbl, uint16_t block)
 {
-  CL_ASSERT(block < cl_ptr_vector_get_size(&p_pkey_tbl->blocks));
-  return(cl_ptr_vector_get(&p_pkey_tbl->blocks, block));
+   return( (block < cl_ptr_vector_get_size(&p_pkey_tbl->blocks)) ?
+ cl_ptr_vector_get(&p_pkey_tbl->blocks, block) : NULL);
 };
 /*
 *  p_pkey_tbl
@@ -244,6 +292,106 @@ static inline ib_pkey_table_t *osm_pkey_
 /*
  */
 
+
+/f* OpenSM: osm_pkey_tbl_make_block_pair
+* NAME
+*  osm_pkey_tbl_make_block_pair
+*
+* DESCRIPTION
+*  Find or create a pair of "old" and "new" blocks for the
+*  given block index
+*
+* SYNOPSIS
+*/
+int osm_pkey_tbl_make_block_pair( 
+   osm_pkey_tbl_t   *p_pkey_tbl, 
+   uint16_t  block_idx,
+   ib_pkey_table_t **pp_old_block,
+   ib_pkey_table_t **pp_new_block);
+/*
+* p_pkey_tbl
+*   [in] Pointer to the PKey table 
+*
+* block_idx
+*   [in
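
Editorial sketch of the osm_pkey_tbl_block_get change in the hunk above, where the CL_ASSERT on the block index is replaced by a lookup that returns NULL for an out-of-range block. This is a minimal illustration, not the OpenSM code: pkey_block_t, pkey_tbl_t and pkey_tbl_block_get_sketch are hypothetical stand-ins for ib_pkey_table_t, osm_pkey_tbl_t and the real accessor, and a plain array replaces the cl_ptr_vector_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ib_pkey_table_t: a P_Key table block
 * holds 32 P_Key entries. */
#define PKEYS_PER_BLOCK 32

typedef struct {
    uint16_t pkey_entry[PKEYS_PER_BLOCK];
} pkey_block_t;

/* Hypothetical stand-in for the 'blocks' vector of osm_pkey_tbl_t. */
typedef struct {
    pkey_block_t **blocks;     /* array of block pointers */
    size_t         num_blocks; /* number of valid entries in 'blocks' */
} pkey_tbl_t;

/* Return the requested block, or NULL when the index is out of range.
 * The caller must be prepared to handle NULL instead of relying on an
 * assertion firing in debug builds. */
pkey_block_t *pkey_tbl_block_get_sketch(const pkey_tbl_t *tbl, uint16_t block)
{
    assert(tbl != NULL);
    return (block < tbl->num_blocks) ? tbl->blocks[block] : NULL;
}
```

With this shape, a caller that walks up to max_blocks can treat a NULL result as "no block received from the device yet" rather than as a programming error.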

[openib-general] [PATCH] osm: Provide SUBNET UP message every heavy sweep - resend

2006-06-13 Thread Eitan Zahavi
Hi Hal

Sorry about the previous patch - I got the } else { in it.

This trivial patch provides a "SUBNET UP" message (with level INFO)
every time the SM completes a full heavy sweep. It is most useful for
cases where you want to make sure the SM responded to some change in
the fabric. Also used to sync the various test flows to the end of sweeps.

Eitan

Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Index: opensm/osm_state_mgr.c
===
--- opensm/osm_state_mgr.c  (revision 7904)
+++ opensm/osm_state_mgr.c  (working copy)
@@ -200,6 +200,10 @@ __osm_state_mgr_up_msg(
   /* clear the signal */
   p_mgr->p_subn->moved_to_master_state = FALSE;
}
+   else
+   {
+  osm_log( p_mgr->p_log, OSM_LOG_INFO, "SUBNET UP\n" ); /* Format Waived */
+   }
 
if( p_mgr->p_subn->opt.sweep_interval )
{
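
Editorial sketch of how a test flow might use the message this patch adds to detect the end of a sweep. It assumes only that each completed heavy sweep leaves the text "SUBNET UP" somewhere in the log; the helper name count_subnet_up is hypothetical and is not part of OpenSM or its test suite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many times the "SUBNET UP" marker appears in a log buffer.
 * A test can take the count before triggering a fabric change and poll
 * until the count grows, which indicates another sweep has completed. */
size_t count_subnet_up(const char *log_text)
{
    static const char marker[] = "SUBNET UP";
    size_t count = 0;
    const char *p = log_text;

    assert(log_text != NULL);
    while ((p = strstr(p, marker)) != NULL) {
        count++;
        p += sizeof(marker) - 1; /* continue after this match */
    }
    return count;
}
```

Counting occurrences rather than testing for presence matters here because, with this patch, the message is logged on every heavy sweep and not only on the first transition to master.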





Re: [openib-general] [PATCH 1/4] Simplification of the ucast fdb dumps.

2006-06-13 Thread Eitan Zahavi
Hi Sasha,

I still need to check that there are no really problematic changes in the
osm.fdbs file syntax (I would need to update ibdm to support those), but I
like the patch and the clean way you resolved the multiple opens of the
dump file.

EZ




Re: [openib-general] [PATCH 2/4] Modular routing engine (unicast only yet).

2006-06-13 Thread Eitan Zahavi
Hi Sasha,

As provided in my previous patch 1/4 comments,
I think the callbacks should also have an entry for the MinHop stage (maybe
this is the ucast_build_fwd_tables?). I have some algorithms in mind that will
skip that stage altogether.

Also, it might make sense for each routing engine to provide its own "dump"
routine, such that each could support a different file format if needed.
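
Editorial sketch of the two suggestions above: an optional MinHop-stage callback and an optional per-engine dump routine. The members ucast_build_min_hop and dump, the struct name routing_engine_sketch, and every function below are hypothetical; only ucast_build_fwd_tables and ucast_fdb_assign come from the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch only: extends the callback set from the patch with two
 * hypothetical entries (marked below). Other members are omitted. */
struct routing_engine_sketch {
    const char *name;
    void *context;
    int  (*ucast_build_min_hop)(void *context);    /* hypothetical */
    int  (*ucast_build_fwd_tables)(void *context); /* from the patch */
    int  (*ucast_fdb_assign)(void *context);       /* from the patch */
    void (*dump)(void *context, FILE *out);        /* hypothetical */
};

/* Run the engine's own MinHop stage when it provides one, otherwise the
 * default stage. An engine can skip the stage by supplying a no-op. */
int run_min_hop_stage(const struct routing_engine_sketch *eng,
                      int (*default_stage)(void *), void *default_ctx)
{
    assert(eng != NULL && default_stage != NULL);
    if (eng->ucast_build_min_hop)
        return eng->ucast_build_min_hop(eng->context);
    return default_stage(default_ctx);
}

/* Use the engine's own dump routine when present. Returns 1 if it ran,
 * 0 if the caller should fall back to the common dump format. */
int dump_routes(const struct routing_engine_sketch *eng, FILE *out)
{
    assert(eng != NULL);
    if (!eng->dump)
        return 0;
    eng->dump(eng->context, out);
    return 1;
}

/* Example default stage: counts its invocations through the context. */
int example_default_min_hop(void *context)
{
    ++*(int *)context;
    return 0;
}

/* Example engine stage that does nothing, i.e. skips MinHop. */
int example_skip_min_hop(void *context)
{
    (void)context;
    return 0;
}
```

Leaving both new entries NULL keeps today's behaviour, so existing engines would not need to change.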

Rest of the comments are inline

EZ

Sasha Khapyorsky wrote:
> 
> diff --git a/osm/include/opensm/osm_opensm.h b/osm/include/opensm/osm_opensm.h
> index 3235ad4..3e6e120 100644
> --- a/osm/include/opensm/osm_opensm.h
> +++ b/osm/include/opensm/osm_opensm.h
> @@ -92,6 +92,18 @@ BEGIN_C_DECLS
>  *
>  */
>  
> +/*
> + * routing engine structure - yet limited by ucast_fdb_assign and
> + *  ucast_build_fwd_tables (multicast callbacks may be added later)
> + */
> +struct osm_routing_engine {
> + const char *name;
> + void *context;
> + int (*ucast_build_fwd_tables)(void *context);
> + int (*ucast_fdb_assign)(void *context);
> + void (*delete)(void *context);
> +};
It would be nice if you added a standard header to this struct.
It is not clear to me what ucast_build_fwd_tables and
ucast_fdb_assign map to.

Please see the next section as an example for a struct header.
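
Editorial sketch of what such a header could look like, following the NAME / DESCRIPTION / SYNOPSIS / FIELDS layout used by the other OpenSM headers in this thread. The comment opener with four asterisks is assumed to be the form that was shortened to "/s*" in this archive. The field descriptions are guesses drawn from how the patch uses each callback and would need to be confirmed by the patch author; routing_engine_has_fdb_assign is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/****s* OpenSM: OpenSM/osm_routing_engine
* NAME
*	osm_routing_engine
*
* DESCRIPTION
*	Callbacks through which a routing engine plugs into unicast routing.
*
* SYNOPSIS
*/
struct osm_routing_engine {
	const char *name;
	void *context;
	int (*ucast_build_fwd_tables)(void *context);
	int (*ucast_fdb_assign)(void *context);
	void (*delete)(void *context);
};
/*
* FIELDS
*	name
*		Name of the routing engine.
*
*	context
*		Opaque engine state passed back to every callback.
*
*	ucast_build_fwd_tables
*		(assumed) Called from osm_ucast_mgr_process; when it returns 0
*		the manager sets the switch tables.
*
*	ucast_fdb_assign
*		(assumed) Takes the place of the former
*		opt.pfn_ui_ucast_fdb_assign hook.
*
*	delete
*		(assumed) Releases the engine context.
*********/

/* Hypothetical helper mirroring the check the patch makes before it
 * flags that a custom assign function is defined. */
int routing_engine_has_fdb_assign(const struct osm_routing_engine *p_eng)
{
	assert(p_eng != NULL);
	return p_eng->ucast_fdb_assign != NULL;
}
```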
> +
>  /s* OpenSM: OpenSM/osm_opensm_t
>  * NAME
>  *osm_opensm_t
> @@ -116,7 +128,7 @@ typedef struct _osm_opensm_t
>osm_log_t  log;
>cl_dispatcher_t disp;
>cl_plock_t lock;
> -  updn_t *p_updn_ucast_routing;
> +  struct osm_routing_engine routing_engine;
>osm_stats_t stats;
>  } osm_opensm_t;
>  /*
> @@ -153,6 +165,9 @@ typedef struct _osm_opensm_t
>  *lock
>  *Shared lock guarding most OpenSM structures.
>  *
> +*routing_engine
> +*Routing engine, will be initialized then used
> +*
>  *stats
>  *Open SM statistics block
>  *
> diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c
> index cac7f9b..0c0d635 100644
> --- a/osm/opensm/osm_ucast_mgr.c
> +++ b/osm/opensm/osm_ucast_mgr.c
> @@ -62,6 +62,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define LINE_LENGTH 256
>  
> @@ -269,7 +270,7 @@ osm_ucast_mgr_dump_ucast_routes(
>strcat( p_mgr->p_report_buf, "yes" );
>  else
>  {
> -  if (p_mgr->p_subn->opt.pfn_ui_ucast_fdb_assign) {
> +  if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign) {
>  ui_ucast_fdb_assign_func_defined = TRUE;
>} else {
>  ui_ucast_fdb_assign_func_defined = FALSE;
> @@ -708,7 +709,7 @@ __osm_ucast_mgr_process_port(
>node_guid = osm_node_get_node_guid(osm_switch_get_node_ptr( p_sw ) );
>  
>/* Flag to mark whether or not a ui ucast fdb assign function was given */
> -  if (p_mgr->p_subn->opt.pfn_ui_ucast_fdb_assign)
> +  if (p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign)
>  ui_ucast_fdb_assign_func_defined = TRUE;
>else
>  ui_ucast_fdb_assign_func_defined = FALSE;
> @@ -753,7 +754,7 @@ __osm_ucast_mgr_process_port(
>  
>/* Up/Down routing can cause unreachable routes between some 
>   switches so we do not report that as an error in that case */
> -  if (!p_mgr->p_subn->opt.updn_activate)
> +  if (!p_mgr->p_subn->p_osm->routing_engine.ucast_fdb_assign)
>{
>  osm_log( p_mgr->p_log, OSM_LOG_ERROR,
>   "__osm_ucast_mgr_process_port: ERR 3A08: "
> @@ -973,6 +974,18 @@ __osm_ucast_mgr_process_tbl(
>  /**
>   **/
>  static void
> +__osm_ucast_mgr_set_table_cb(
> +  IN cl_map_item_t* const  p_map_item,
> +  IN void* context )
> +{
> +  osm_switch_t* const p_sw = (osm_switch_t*)p_map_item;
> +  osm_ucast_mgr_t* const p_mgr = (osm_ucast_mgr_t*)context;
> +  __osm_ucast_mgr_set_table( p_mgr, p_sw );
> +}
> +
> +/**
> + **/
> +static void
>  __osm_ucast_mgr_process_neighbors(
>IN cl_map_item_t* const  p_map_item,
>IN void* context )
> @@ -1058,12 +1071,14 @@ osm_ucast_mgr_process(
>  {
>uint32_t i;
>uint32_t iteration_max;
> +  struct osm_routing_engine *p_routing_eng;
>osm_signal_t signal;
>cl_qmap_t *p_sw_guid_tbl;
>  
>OSM_LOG_ENTER( p_mgr->p_log, osm_ucast_mgr_process );
>  
>p_sw_guid_tbl = &p_mgr->p_subn->sw_guid_tbl;
> +  p_routing_eng = &p_mgr->p_subn->p_osm->routing_engine;
>  
>CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock );
>  
> @@ -1129,6 +1144,14 @@ osm_ucast_mgr_process(
>   i
>   );
>  
> +if (p_routing_eng->ucast_build_fwd_tables &&
> +p_routing_eng->ucast_build_fwd_tables(p_routing_eng->context) == 0)
> +{
> +  cl_qmap_apply_func( p_sw_guid_tbl,
> +   

Re: [openib-general] [PATCH 3/4] New routing module which loads LFT tables from dump file.

2006-06-13 Thread Eitan Zahavi
Hi Sasha,

Please see my comments inside

Sasha Khapyorsky wrote:
> This patch implements trivial routing module which able to load LFT
> tables from dump file. Main features:
> - support for unicast LFTs only, support for multicast can be added later
> - this will run after min hop matrix calculation
> - this will load switch LFTs according to the path entries introduced in
>   the dump file
> - no additional checks will be performed (like is port connected, etc)
> - in case when fabric LIDs were changed this will try to reconstruct LFTs
>   correctly if endport GUIDs are represented in the dump file (in order
>   to disable this GUIDs may be removed from the dump file or zeroed)
I think you could use the concept of directed routes for storing the LIDs too.
So in case of new LID assignments you can extract the old -> new mapping by
scanning the LIDs of end ports by their DR path.
Anyway, I think it is required that you also perform topology matching, such
that if someone changed the topology you are able to figure it out and stop.
THIS IS A SERIOUS LIMITATION OF YOUR PROPOSAL.
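
Editorial sketch of the old -> new LID mapping idea described above, assuming each end port is recorded together with its directed-route path in both the dump file and the current fabric. The struct port_lid, the string form of the DR path and the function name are all hypothetical; this is not the remap_lid function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one end port, identified by its directed-route
 * path (kept as a string such as "0,1,3") and the LID it holds. */
struct port_lid {
    const char *dr_path;
    uint16_t    lid;
};

/* Find the LID currently assigned to the port that the dump file
 * recorded under 'old_lid', matching ports by DR path.
 * Returns 0 (a reserved LID value) when the old LID is unknown or its
 * DR path is absent from the current fabric; the latter suggests the
 * topology changed and loading should stop. */
uint16_t remap_lid_by_dr_path(const struct port_lid *old_ports, size_t n_old,
                              const struct port_lid *new_ports, size_t n_new,
                              uint16_t old_lid)
{
    size_t i, j;

    assert(old_ports != NULL && new_ports != NULL);
    for (i = 0; i < n_old; i++) {
        if (old_ports[i].lid != old_lid)
            continue;
        for (j = 0; j < n_new; j++) {
            if (strcmp(old_ports[i].dr_path, new_ports[j].dr_path) == 0)
                return new_ports[j].lid;
        }
        return 0; /* path no longer present */
    }
    return 0; /* old LID not in the dump */
}
```

A DR path is only stable while the cabling along it is unchanged, so a lookup like this complements the topology matching requested above rather than replacing it.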
> 
> The dump file format is compatible with output of 'ibroute' util and for
> whole fabric may be generated with script like this:
> 
>   for sw_lid in `ibswitches | awk '{print $NF}'` ; do
>   ibroute $sw_lid
>   done > /path/to/dump_file
> 
> , or using DR paths:
> 
> 
>   for sw_dr in `ibnetdiscover -v \
>   | sed -ne '/^DR path .* switch /s/^DR path \[\(.*\)\].*$/\1/p' \
>   | sed -e 's/\]\[/,/g' \
>   | sort -u` ; do
>   ibroute -D ${sw_dr}
>   done > /path/to/dump_file
WE SHOULD ALSO PROVIDE A DUMP FILE VIA:
1. OpenSM should dump its routes using this format (like it does today using osm.fdbs)
2. ibdiagnet
> 
> 
> 
> diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h
> index a637367..ec1d056 100644
> --- a/osm/include/opensm/osm_subnet.h
> +++ b/osm/include/opensm/osm_subnet.h
> @@ -423,6 +424,10 @@ typedef struct _osm_subn_opt
>  *  routing_engine_name
>  * Name of used routing engine (other than default Min Hop Algorithm)
>  *
> +*  ucast_dump_file
> +* Name of the unicast routing dump file from where switch
> +* forwearding tables will be loaded
  ^^^
  forwarding
> +*
>  *  updn_guid_file
>  * Pointer to name of the UPDN guid file given by User
>  *
>  
> diff --git a/osm/opensm/osm_ucast_file.c b/osm/opensm/osm_ucast_file.c
> new file mode 100644
> index 000..a68d9ec
> --- /dev/null
> +++ b/osm/opensm/osm_ucast_file.c
> @@ -0,0 +1,258 @@
> +/*
> + * Copyright (c) 2006 Voltaire, Inc. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + *  - Redistributions of source code must retain the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer.
> + *
> + *  - Redistributions in binary form must reproduce the above
> + *copyright notice, this list of conditions and the following
> + *disclaimer in the documentation and/or other materials
> + *provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * $Id$
> + */
> +
> +/*
> + * Abstract:
> + *Implementation of OpenSM unicast routing module which loads
> + *routes from the dump file
> + *
> + * Environment:
> + *Linux User Mode
> + *
> + */
> +
> +#if HAVE_CONFIG_H
> +#  include 
> +#endif   /* HAVE_CONFIG_H */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define PARSEERR(log, file_name, lineno, fmt, arg...) \
> + osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \
> + file_name, lineno, ##arg )
> +
> +#define PARSEWARN(log, file_name, lineno, fmt, arg...) \
> + osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \
> + file_name, lineno, ##arg )
> +
> +static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid)
> +{
> +   

Re: [openib-general] OFED-RC4 backport to sles9 sp3 kernel 2.6.5-7.244

2006-06-13 Thread Tziporet Koren
Moshe Kazir wrote:
>The enclosed diff file includes sles9 sp3 backport changes.
>
>
Great that you did it.

You understand we cannot include it in OFED 1.0, since it should be out
this week, but we can add it to OFED 1.1, which will be in July.

 
Tziporet





Re: [openib-general] [PATCH] osm: Provide SUBNET UP message every heavy sweep

2006-06-13 Thread Hal Rosenstock
Hi Eitan,

On Tue, 2006-06-13 at 06:44, Eitan Zahavi wrote:
> Hi Hal
> 
> This trivial patch provides a "SUBNET UP" message (with level INFO)
> every time the SM completes a full heavy sweep. It is most useful for
> cases where you want to make sure the SM responded to some change in
> the fabric. Also used to sync the various test flows to the end of sweeps.
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only.

-- Hal





Re: [openib-general] [Bug 126] RDMA_CM and UCM not loaded on boot

2006-06-13 Thread Tziporet Koren
It's not in the default, since CM and CMA are not defined as basic HPC
components (basic components are only mthca, ipath, core and ipoib).
Thus anyone who wants these modules should change the file
/etc/infiniband/openib.conf

Tziporet


-Original Message-
From: Arlin Davis [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 12, 2006 8:30 PM
To: Tziporet Koren
Cc: openib; Woodruff, Robert J
Subject: Re: [openib-general] [Bug 126] RDMA_CM and UCM not loaded on
boot

[EMAIL PROTECTED] wrote:


Did the default openib.conf script get updated with:

RDMA_CM_LOAD=yes
RDMA_UCM_LOAD=yes

-arlin








[openib-general] [PATCH] osm: Provide SUBNET UP message every heavy sweep

2006-06-13 Thread Eitan Zahavi
Hi Hal

This trivial patch provides a "SUBNET UP" message (with level INFO)
every time the SM completes a full heavy sweep. It is most useful for
cases where you want to make sure the SM responded to some change in
the fabric. Also used to sync the various test flows to the end of sweeps.

Eitan

Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Index: opensm/osm_state_mgr.c
===
--- opensm/osm_state_mgr.c  (revision 7904)
+++ opensm/osm_state_mgr.c  (working copy)
@@ -199,6 +199,8 @@ __osm_state_mgr_up_msg(
   osm_log( p_mgr->p_log, OSM_LOG_SYS, "SUBNET UP\n" ); /* Format Waived */
   /* clear the signal */
   p_mgr->p_subn->moved_to_master_state = FALSE;
+   } else {
+  osm_log( p_mgr->p_log, OSM_LOG_INFO, "SUBNET UP\n" ); /* Format Waived */
}
 
if( p_mgr->p_subn->opt.sweep_interval )





Re: [openib-general] opensm and NPTL

2006-06-13 Thread Hal Rosenstock
Hi Viswa,

On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> There were some issues with opensm running with NPTL (thread
> library). Have the issues been resolved?

There were some fixes to the signal handling which went in back in the
Feb/early March time frame. OpenSM should be better with NPTL now. Is it
working for you or are you asking before stepping into these waters
again ?

-- Hal

> Regards,
> Viswa
> 
> 
> 





Re: [openib-general] RFC: detecting duplicate MAD requests

2006-06-13 Thread Hal Rosenstock
On Tue, 2006-06-13 at 00:40, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Without some sort of restriction, a userspace app that's slow to pull
> > > receive MADs from the kernel would result in consuming a large amount
> > > of kernel memory.
> > 
> > Understood but dropping a MAD after acknowledging also seems like a bad
> > thing to me.
> 
> True. Maybe we can find a way to avoid acknowledging the MAD?

There are architected ways to do that. There's a BUSY status for MADs, which
could be used for some MADs. For RMPP, would the transfer be ABORTed? I don't
think you can switch to BUSY in the middle (but I'm not 100% sure). I
don't know how this limit is being used exactly, but it might be best if
the RMPP receive were treated as 1 MAD regardless of how many
segments it was.
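
Editorial sketch of the accounting suggested above, where a received message is charged once against the limit no matter how many RMPP segments it arrived in. The struct recv_limit and recv_limit_charge are hypothetical and do not reflect the kernel MAD layer's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-agent receive accounting. */
struct recv_limit {
    size_t queued;     /* messages currently charged against the limit */
    size_t max_queued; /* cap on queued messages */
};

/* Charge one received message against the limit.
 * 'segments' is the number of RMPP segments it arrived in (1 for a
 * plain MAD). It is deliberately ignored, so a multi-segment transfer
 * costs a single slot. Returns false when the limit is already reached. */
bool recv_limit_charge(struct recv_limit *lim, size_t segments)
{
    assert(lim != NULL);
    (void)segments;
    if (lim->queued >= lim->max_queued)
        return false;
    lim->queued++;
    return true;
}
```

The trade-off is that the limit then bounds the number of messages, not the memory they occupy, so a large RMPP transfer can still hold many segments' worth of kernel memory while counting as one.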

-- Hal

> > Couldn't this be controlled on the request side (assuming
> > the request has a response as opposed to unsolicited sends/receives) ?
> 
> Sounds like the wrong thing to do.

