date:20060919

[openib-general] [PATCH] osm: fixing bugs in osmtest

2006-09-19 Thread Yevgeny Kliteynik

Hi Hal

I'm doing a major review of osmtest.
This patch fixes a few bugs in osmtest where failures
were ignored. More precisely, osmtest was expecting an error,
but got IB_SUCCESS and ignored the fact that it should have
gotten an error.
There are also a few changes to improve the code and the
readability of the osmtest log.
More patches are expected.
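
The class of bug described above can be sketched in isolation. This is a minimal illustration, not the osmtest code itself: `status_t`, `ST_SUCCESS`, `ST_ERROR`, `ST_TIMEOUT` and `expect_failure()` are hypothetical stand-ins for `ib_api_status_t`, `IB_SUCCESS` and `IB_ERROR`. It shows a negative test turning an unexpected success into a failure code, so the caller cannot silently ignore it.

```c
/* Hypothetical stand-ins for ib_api_status_t, IB_SUCCESS and IB_ERROR. */
typedef enum { ST_SUCCESS = 0, ST_ERROR = 1, ST_TIMEOUT = 2 } status_t;

/*
 * Convert the status of a call that was expected to fail into the
 * status of the test itself: a failing call means the test passed,
 * a succeeding call means the test failed.
 */
status_t expect_failure(status_t call_status)
{
    if (call_status == ST_SUCCESS)
        return ST_ERROR;   /* the call should not have succeeded */
    return ST_SUCCESS;     /* got the expected error */
}
```

With a helper of this kind, a negative test would propagate `expect_failure(status)` rather than the raw status of the call.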

This patch is for trunk only.

I tested applying this patch before sending it. If you get the
patch rejected again - let me know.

Thanks.

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <[EMAIL PROTECTED]>

Index: osmtest/include/osmtest.h
===
--- osmtest/include/osmtest.h   (revision 9552)
+++ osmtest/include/osmtest.h   (working copy)
@@ -506,4 +506,13 @@ ib_api_status_t
  osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt,
  IN ib_net16_t  lid,
  OUT uint8_t *  const p_lmc );
+
+
+/*
+ * A few auxiliary macros for logging
+ */
+
+#define EXPECTING_ERRORS_START "[[ = Expecting Errors - START = "
+#define EXPECTING_ERRORS_END   "   = Expecting Errors  -  END = ]]"
+
  #endif /* _OSMTEST_H_ */
Index: osmtest/osmtest.c
===
--- osmtest/osmtest.c   (revision 9552)
+++ osmtest/osmtest.c   (working copy)
@@ -552,6 +552,7 @@ osmtest_init( IN osmtest_t * const p_osm
  osm_log( &p_osmt->log, OSM_LOG_ERROR,
   "osmtest_init: ERR 0001: "
   "Unable to allocate vendor object" );
+status = IB_ERROR;
  goto Exit;
}

@@ -1817,6 +1818,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest
  osm_log( &p_osmt->log, OSM_LOG_ERROR,
   "osmtest_wrong_sm_key_ignored: ERR 0011: "
   "Did not get a timeout but got (%s)\n", ib_get_err_str( status ) 
);
+if ( status == IB_SUCCESS )
+{
+  /* assign some error value to status, since IB_SUCCESS is a bad rc */
+  status = IB_ERROR;
+}
  goto Exit;
}
else
@@ -5448,14 +5454,23 @@ osmtest_validate_against_db( IN osmtest_

memset( &context, 0, sizeof( context ) );
memset( &request, 0, sizeof( request ) );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+ osm_log( &p_osmt->log, OSM_LOG_ERROR,
+  "osmtest_get_multipath_rec: "
+  "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
if( status == IB_SUCCESS )
-goto Exit;
-  else
{
-osm_log( &p_osmt->log, OSM_LOG_ERROR,
- "osmtest_get_multipath_rec: "
- "IS EXPECTED ERROR \n");
+status = IB_ERROR;
+goto Exit;
}

memset( &context, 0, sizeof( context ) );
@@ -5463,14 +5478,23 @@ osmtest_validate_against_db( IN osmtest_
request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT;
request.sgid_count = 1;
ib_gid_set_default( &request.gids[0], portguid );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
status = osmtest_get_multipath_rec( p_osmt, &request, &context );
-  if( status == IB_SUCCESS )
-goto Exit;
-  else
+  if( status != IB_SUCCESS )
{
-osm_log( &p_osmt->log, OSM_LOG_ERROR,
- "osmtest_get_multipath_rec: "
- "IS EXPECTED ERROR \n");
+ osm_log( &p_osmt->log, OSM_LOG_ERROR,
+  "osmtest_get_multipath_rec: "
+  "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
+  if( status == IB_SUCCESS )
+  {
+status = IB_ERROR;
+goto Exit;
}

memset( &context, 0, sizeof( context ) );
@@ -5482,14 +5506,23 @@ osmtest_validate_against_db( IN osmtest_
/* Set IPoIB broadcast MGID */
request.gids[1].unicast.prefix = CL_HTON64(0xff12401bULL);
request.gids[1].unicast.interface_id = CL_HTON64(0xULL);
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+ osm_log( &p_osmt->log, OSM_LOG_ERROR,
+  "osmtest_get_multipath_rec: "
+  "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+   "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
if( status == IB_SUCCESS )
-goto Exit;
-  else
{
-osm_log( &p_osmt->log, OSM_LOG_ERROR,
- "osmtest_get_multipath_rec: "
- "IS EXPECTED ERROR \n");
+status = IB_ERROR;
+goto Exit;
}

memset( &context, 0, sizeof( context ) );
@@ -5500,14

Re: [openib-general] [PATCH] mthca: fix lid used for sending traps

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] mthca: fix lid used for sending traps
> 
>  > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
>  > 2.6.18.1.
> 
> Makes sense -- I'll try to get this into 2.6.18, since it's a
> one-liner and fixes a regression from 2.6.17.

Arrr!
http://lkml.org/lkml/2006/9/20/2

Missed 2.6.18 by a small margin. Gar! Acked for 2.6.18.1?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Robert Walsh

> It's easy to get Linux running on a switch, so why not? You just
> need to write a low-level driver that can send/receive MADs.
> We did run a gen1 port on a switch at some point, and someone might want to
> do it again.

OK - that's a fine project idea, but I'm not about to start coding it up 
any time soon :-)

In any case, if we're going to insist that this test run on a 
hypothetical switch gen2 distribution, then the "choose a random port" 
code needs to check if it's running on a CA or router versus a switch 
and choose the port range appropriately.
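
A rough sketch of what such a check could look like follows. It is only an illustration: `enum node_type`, `first_valid_port()` and `pick_port()` are hypothetical names, not part of gen2_basic or libibverbs, and the caller is assumed to supply the random value and a port count of at least 1.

```c
/* Hypothetical node types; not taken from gen2_basic or libibverbs. */
enum node_type { NODE_CA, NODE_ROUTER, NODE_SWITCH };

/* Lowest valid port number for the given node type. */
int first_valid_port(enum node_type type)
{
    /* Switches also have a port 0; CAs and routers start at port 1. */
    return type == NODE_SWITCH ? 0 : 1;
}

/* Map a raw random value onto [first_valid_port, phys_port_cnt]. */
int pick_port(enum node_type type, int phys_port_cnt, unsigned rnd)
{
    int lo = first_valid_port(type);

    return lo + (int)(rnd % (unsigned)(phys_port_cnt - lo + 1));
}
```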

Regards,
  Robert.


Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Fabian Tillier <[EMAIL PROTECTED]>:
> > There are some differences in HCA behaviour with regard to
> > ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
> > the CQ is not empty at this point (in which case the poll_cq's after the
> > notify are optional).
> >
> > However the behaviour defined in the IBTA spec indicates that
> > ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> > which arrives, hence to be portable the poll_cq loop after
> > ib_req_notify_cq is necessary to cover any CQEs which arrived between
> > the prior poll and the ib_req_notify_cq.
> 
> I remember a while ago a mention that the behavior of the Mellanox
> HCAs could be controlled in the firmware, so that they would follow
> the IBTA spec defined behavior.

There's a mistake here. Mellanox HCAs will generate an event upon
ib_req_notify_cq only if new completions have arrived after the previous event
has been reported.

AFAIK this is IBTA spec compliant.

-- 
MST


Re: [openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: gen2_basic patch 5/10: select a valid port number
> 
> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> > >> gen2_basic - select a valid port number
> > >>
> > >> Port numbers start at 1, not 0.
> > > 
> > > True for CA and routers but not switches.
> > 
> > Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> > HCA-centric.
> 
> Yes, that appears to be the scope but I'm not 100% sure.

It's easy to get Linux running on a switch, so why not? You just
need to write a low-level driver that can send/receive MADs.
We did run a gen1 port on a switch at some point, and someone might want to
do it again.

-- 
MST


Re: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path 
> > queries
> > 
> > I didn't really read the new patch before... anyway:
> > 
> > Why have you changed from the approach of just using the broadcast
> > group's MTU?  As far as I can see, the issue being addressed here is
> > purely theoretical anyway, but with the approach of taking the current
> > device MTU, you now have to flush all the paths if the configured MTU
> > changes, and you have to have a big switch in path_rec_start().
> > 
> >  - R.
> > 
> 
> I'm not sure priv->broadcast is always initialized when we start
> a path record query. Is there a reason why it is?

It also seemed kind of nice to be able to control the path MTU
from dev->mtu - and I don't think path flush on mtu change is an issue
from the performance POV.

What do you think?

-- 
MST


Re: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> Seems OK from an anal spec compliance point of view, but I don't
> understand this:
> 
>  > This breaks IPoIB on networks with SM Tavor quirk activates.
> 
> Even if opensm returns a path record with a lower MTU, the underlying
> links still have a 2K mtu really, so nothing breaks.  IPoIB is just
> doing something naughty by ignoring the MTU in the path record.  So
> what breaks really?

Maybe "breaks" was too strong a word. Let's change that to
"This makes IPoIB behave in a naughty way on networks with SM Tavor quirk
active" :)

> (not to mention the fact that the "Tavor quirk" hasn't been accepted
> into OpenSM yet anyway)

AFAIK it has been accepted.

-- 
MST


Re: [openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Hal Rosenstock

On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> Hal Rosenstock wrote:
> > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> >> gen2_basic - select a valid port number
> >>
> >> Port numbers start at 1, not 0.
> > 
> > True for CA and routers but not switches.
> 
> Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> HCA-centric.

Yes, that appears to be the scope but I'm not 100% sure.

-- Hal

> Regards,
>  Robert.



Re: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> I didn't really read the new patch before... anyway:
> 
> Why have you changed from the approach of just using the broadcast
> group's MTU?  As far as I can see, the issue being addressed here is
> purely theoretical anyway, but with the approach of taking the current
> device MTU, you now have to flush all the paths if the configured MTU
> changes, and you have to have a big switch in path_rec_start().
> 
>  - R.
> 

I'm not sure priv->broadcast is always initialized when we start
a path record query. Is there a reason why it is?

-- 
MST


Re: [openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Robert Walsh

Hal Rosenstock wrote:
> On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
>> gen2_basic - select a valid port number
>>
>> Port numbers start at 1, not 0.
> 
> True for CA and routers but not switches.

Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
HCA-centric.

Regards,
 Robert.


Re: [openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Hal Rosenstock

On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> gen2_basic - select a valid port number
> 
> Port numbers start at 1, not 0.

True for CA and routers but not switches.

> Signed-off by: Robert Walsh <[EMAIL PROTECTED]>
> 
> diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
> --- a/gen2_basic/test_poll_post.c 2006-09-13 19:09:47.410808000 -0700
> +++ b/gen2_basic/test_poll_post.c 2006-08-14 14:17:03.705821000 -0700
> @@ -283,7 +283,7 @@
>   .dlid  = VL_range(rand_gen, 1, 0x),
>   .sl= VL_range(rand_gen, 0, 15),
>   .src_path_bits = VL_range(rand_gen, 0, 0x8f),
> - .port_num  = VL_random(rand_gen, 
> device_attr.phys_port_cnt),
> + .port_num  = VL_range(rand_gen, 1, 
> device_attr.phys_port_cnt),
>   .static_rate   = get_static_rate(1, rand_gen),
>   .grh   = {
>   .traffic_class = VL_range(rand_gen, 1, 0xff),
> 

[openib-general] gen2_basic patch 6/10: handle case where max_sge > 100

2006-09-19 Thread Robert Walsh

gen2_basic - handle case where max_sge > 100

When choosing an illegal number of maximum SGEs, handle the case where
the device allows more than 100 SGEs.
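
The arithmetic behind this fix can be checked with a small sketch. It assumes, without confirmation from the patch, that `VL_range()` expects its lower bound not to exceed its upper bound; `range_is_valid()` is a hypothetical helper used only for illustration.

```c
/* Returns 1 if [lo, hi] is a non-empty range, 0 otherwise. */
int range_is_valid(unsigned lo, unsigned hi)
{
    return lo <= hi;
}

/* Old bounds: [max_sge + 1, 100]. Empty once max_sge reaches 100. */
int old_bounds_valid(unsigned max_sge)
{
    return range_is_valid(max_sge + 1, 100);
}

/* New bounds: [max_sge + 1, max_sge + 100]. Never empty. */
int new_bounds_valid(unsigned max_sge)
{
    return range_is_valid(max_sge + 1, max_sge + 100);
}
```

For a device reporting, say, `max_sge = 128`, the old bounds would be `[129, 100]`, while the new bounds are `[129, 228]`.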

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
--- a/gen2_basic/test_poll_post.c   2006-09-13 19:11:50.911178000 -0700
+++ b/gen2_basic/test_poll_post.c   2006-08-14 14:17:03.705821000 -0700
@@ -751,7 +751,7 @@
CHECK_VALUE("my_modify_qp", rc, 0, goto 
cleanup);
}
 
-   num_sge = VL_range(rand_gen, 
srq_init_attr.attr.max_sge + 1, 100);
+   num_sge = VL_range(rand_gen, 
srq_init_attr.attr.max_sge + 1, srq_init_attr.attr.max_sge + 100);
wr = my_create_rr_desc(rand_gen, num_sge, 
num_wr);
CHECK_PTR("my_create_rr_desc", wr, goto 
cleanup);


[openib-general] gen2_basic patch 5/10: select a valid port number

2006-09-19 Thread Robert Walsh

gen2_basic - select a valid port number

Port numbers start at 1, not 0.

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
--- a/gen2_basic/test_poll_post.c   2006-09-13 19:09:47.410808000 -0700
+++ b/gen2_basic/test_poll_post.c   2006-08-14 14:17:03.705821000 -0700
@@ -283,7 +283,7 @@
.dlid  = VL_range(rand_gen, 1, 0x),
.sl= VL_range(rand_gen, 0, 15),
.src_path_bits = VL_range(rand_gen, 0, 0x8f),
-   .port_num  = VL_random(rand_gen, 
device_attr.phys_port_cnt),
+   .port_num  = VL_range(rand_gen, 1, 
device_attr.phys_port_cnt),
.static_rate   = get_static_rate(1, rand_gen),
.grh   = {
.traffic_class = VL_range(rand_gen, 1, 0xff),

[openib-general] gen2_basic patch 4/10: make sure the DLID is valid

2006-09-19 Thread Robert Walsh

gen2_basic - make sure the DLID is valid

For valid address handles, make sure the DLID is not a multicast LID.
You can't modify a QP to use a multicast DLID using ib_modify_qp().
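
For reference, the LID ranges behind the `0xBFFF` bound can be written down as a small sketch. The macro and function names are hypothetical and not part of gen2_basic; the ranges are the InfiniBand ones (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE, with 0x0000 reserved and 0xFFFF the permissive LID).

```c
#include <stdint.h>

/* Hypothetical names for the InfiniBand LID ranges. */
#define LID_UCAST_MIN 0x0001
#define LID_UCAST_MAX 0xBFFF
#define LID_MCAST_MIN 0xC000
#define LID_MCAST_MAX 0xFFFE

/* True for LIDs in the unicast range. */
int lid_is_unicast(uint16_t lid)
{
    return lid >= LID_UCAST_MIN && lid <= LID_UCAST_MAX;
}

/* True for LIDs in the multicast range (excludes the permissive LID). */
int lid_is_multicast(uint16_t lid)
{
    return lid >= LID_MCAST_MIN && lid <= LID_MCAST_MAX;
}
```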

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/test_cq.c b/gen2_basic/test_cq.c
--- a/gen2_basic/test_cq.c  2006-09-13 19:07:11.102264000 -0700
+++ b/gen2_basic/test_cq.c  2006-08-14 14:17:17.352167000 -0700
@@ -183,7 +183,7 @@
.rq_psn = 0,
.ah_attr= {
.is_global  = 0,
-   .dlid   = VL_random(rand_gen, 0x),
+   .dlid   = VL_range(rand_gen, 1, 0xBFFF),
.sl = 0,
.src_path_bits  = 0,
.port_num   = VL_range(rand_gen, 1, 
device_attr.phys_port_cnt)
diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
--- a/gen2_basic/test_poll_post.c   2006-09-13 19:07:12.325046000 -0700
+++ b/gen2_basic/test_poll_post.c   2006-08-14 14:17:03.705821000 -0700
@@ -196,7 +196,7 @@
.min_rnr_timer  = 12,
.ah_attr= {
.is_global  = 0,
-   .dlid   = VL_random(rand_gen, 0x),
+   .dlid   = VL_range(rand_gen, 1, 0xBFFF),
.sl = 0,
.src_path_bits  = 0,
.port_num   = port
diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
--- a/gen2_basic/test_qp.c  2006-09-13 19:07:11.118256000 -0700
+++ b/gen2_basic/test_qp.c  2006-08-14 14:16:57.911621000 -0700
@@ -1185,7 +1188,7 @@
.rq_psn = 0,
.ah_attr= {
.is_global  = 0,
-   .dlid   = VL_random(rand_gen, 0x),
+   .dlid   = VL_range(rand_gen, 1, 0xBFFF),
.sl = 0,
.src_path_bits  = 0,
.port_num   = VL_range(rand_gen, 1, 
device_attr.phys_port_cnt)

[openib-general] gen2_basic patch 3/10: fix is_global settings for AH attributes

2006-09-19 Thread Robert Walsh

gen2_basic - fix is_global settings for AH attributes

For valid address handles, the is_global field of the AH attribute should
not be a random number if the dlid is a multicast LID.

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/main.h b/gen2_basic/main.h
--- a/gen2_basic/main.h 2006-01-08 10:59:26.320271000 -0800
+++ b/gen2_basic/main.h 2006-09-13 18:41:24.169155000 -0700
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -102,6 +103,10 @@
IN  int valid_value,
IN  struct VL_random_t *rand_gen);
 
+uint8_t get_is_global(
+   IN  struct VL_random_t *rand_gen,
+   IN  uint16_t dlid);
+
 int my_modify_qp(
IN  struct VL_random_t *rand_gen,
IN  struct ibv_qp *qp,
diff -rNu a/gen2_basic/test_av.c b/gen2_basic/test_av.c
--- a/gen2_basic/test_av.c  2006-07-26 17:46:51.707754000 -0700
+++ b/gen2_basic/test_av.c  2006-08-14 14:16:43.790758000 -0700
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -71,6 +72,16 @@
 return static_rate;
 }
 
+uint8_t get_is_global(
+   IN  struct VL_random_t *rand_gen,
+   IN  uint16_t dlid)
+{
+   if (dlid >= 0xC000 && dlid < 0x)
+   return 1;
+
+   return VL_random(rand_gen, 2);
+}
+
 /* ibv_create_ah ibv_destroy_ah */
 int av_1(
IN  struct config_t *config,
@@ -105,7 +116,6 @@
{
struct ibv_ah   *av = NULL;
struct ibv_ah_attr av_attr  = {
-   .is_global = VL_random(rand_gen, 2),
.dlid  = VL_range(rand_gen, 1, 0x),
.sl= VL_range(rand_gen, 0, 15),
.src_path_bits = VL_range(rand_gen, 0, 0x8f),
@@ -117,6 +127,7 @@
.hop_limit = VL_range(rand_gen, 1, 0xff),
}
};
+   av_attr.is_global = get_is_global(rand_gen, av_attr.dlid);
av = ibv_create_ah(pd, &av_attr);
CHECK_PTR("ibv_create_ah", av, goto cleanup);

@@ -130,7 +141,6 @@
{
struct ibv_ah   *av = NULL;
struct ibv_ah_attr av_attr  = {
-   .is_global = VL_random(rand_gen, 2),
.dlid  = VL_range(rand_gen, 1, 0x),
.sl= VL_range(rand_gen, 0, 15),
.src_path_bits = VL_range(rand_gen, 0, 0x8f),
@@ -142,6 +152,7 @@
.hop_limit = VL_range(rand_gen, 1, 0xff),
}
};
+   av_attr.is_global = get_is_global(rand_gen, av_attr.dlid);
av = ibv_create_ah(pd, &av_attr);
if (av != NULL) {
FAILED;
@@ -152,7 +163,6 @@
{
struct ibv_ah   *av = NULL;
struct ibv_ah_attr av_attr  = {
-   .is_global = VL_random(rand_gen, 2),
.dlid  = VL_range(rand_gen, 1, 0x),
.sl= VL_range(rand_gen, 0, 15),
.src_path_bits = VL_range(rand_gen, 0, 0x8f),
@@ -164,6 +174,7 @@
.hop_limit = VL_range(rand_gen, 1, 0xff),
}
};
+   av_attr.is_global = get_is_global(rand_gen, av_attr.dlid);
av = ibv_create_ah(pd, &av_attr);
if (av != NULL) {
FAILED;
@@ -174,7 +185,6 @@
{
struct ibv_ah   *av = NULL;
struct ibv_ah_attr av_attr  = {
-   .is_global = VL_random(rand_gen, 2),
.dlid  = VL_range(rand_gen, 1, 0x),
.sl= VL_range(rand_gen, 0, 15),
.src_path_bits = VL_range(rand_gen, 0, 0x8f),
@@ -186,6 +196,7 @@
.hop_limit = VL_range(rand_gen, 1, 0xff),
}
};
+   av_attr.is_global = get_is_global(rand_gen, av_attr.dlid);
av = ibv_create_ah(pd, &av_attr);
if (av != NULL) {
FAILED;
@@ -199,7 +210,6 @@
{
struct ibv_ah   *av = NULL;
struct ibv_ah_attr av_attr  = {
-   .is_global = VL_random(rand_gen, 2),

[openib-general] gen2_basic patch 2/10: fix up some compiler warnings

2006-09-19 Thread Robert Walsh

gen2_basic - fix up some compiler warnings

Create a new CHECK_PVALUE macro for checking pointers and use it where
appropriate.  This makes a bunch of compiler printf warnings go away.
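
The macro definitions themselves are not visible in this excerpt, so the following is only a guess at the general shape of the idea: a value check that prints with an integer format and a pointer check that prints with `%p`. The macro bodies and `check_demo()` are hypothetical simplifications, not the gen2_basic definitions.

```c
#include <stdio.h>

/* Hypothetical simplified integer check: prints with %d. */
#define CHECK_VALUE(name, actual, expected, on_fail)                   \
    do {                                                               \
        if ((actual) != (expected)) {                                  \
            fprintf(stderr, "%s: got %d, expected %d\n",               \
                    (name), (int)(actual), (int)(expected));           \
            on_fail;                                                   \
        }                                                              \
    } while (0)

/* Hypothetical simplified pointer check: prints with %p. */
#define CHECK_PVALUE(name, actual, expected, on_fail)                  \
    do {                                                               \
        if ((actual) != (expected)) {                                  \
            fprintf(stderr, "%s: got %p, expected %p\n",               \
                    (name), (const void *)(actual),                    \
                    (const void *)(expected));                         \
            on_fail;                                                   \
        }                                                              \
    } while (0)

/* Returns 0 if rc is 0 and ptr is NULL, -1 otherwise. */
int check_demo(int rc, const void *ptr)
{
    CHECK_VALUE("rc", rc, 0, goto cleanup);
    CHECK_PVALUE("ptr", ptr, NULL, goto cleanup);
    return 0;
cleanup:
    return -1;
}
```

Keeping the two macros separate means the format specifier always matches the argument type, which is the kind of mismatch printf warnings complain about.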

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/test_cq.c b/gen2_basic/test_cq.c
--- a/gen2_basic/test_cq.c  2006-07-27 13:42:44.857603000 -0700
+++ b/gen2_basic/test_cq.c  2006-08-14 14:17:17.352167000 -0700
@@ -446,13 +447,13 @@
TEST_CASE(("illegal comp vector"));
 
cq = ibv_create_cq(ib_cont, size, NULL, NULL, -1);
-   CHECK_VALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
+   CHECK_PVALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
 
cq = ibv_create_cq(ib_cont, size, NULL, NULL, 
ib_cont->num_comp_vectors);
-   CHECK_VALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
+   CHECK_PVALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
 
cq = ibv_create_cq(ib_cont, size, NULL, NULL, VL_range(rand_gen, 
ib_cont->num_comp_vectors + 1, 0xFFF));
-   CHECK_VALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
+   CHECK_PVALUE("create cq with invalid comp vector", cq, NULL, goto 
cleanup);
 
PASSED;

@@ -499,7 +500,7 @@
size = VL_random(rand_gen, device_attr.max_cqe);
 
cq = ibv_create_cq(ib_cont, size, NULL, channel, 0);
-   CHECK_VALUE("create cq with invalid channel", cq, NULL, goto cleanup);
+   CHECK_PVALUE("create cq with invalid channel", cq, NULL, goto cleanup);
 
PASSED;

diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
--- a/gen2_basic/test_poll_post.c   2006-07-26 17:46:51.753706000 -0700
+++ b/gen2_basic/test_poll_post.c   2006-08-14 14:17:03.705821000 -0700
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -365,10 +366,10 @@
CHECK_VALUE("ibv_post_send", rc, -1, 
goto cleanup);
 
for (i = 0; i < num_wr - 
(attr.cap.max_send_wr % num_wr) - 1; ++i) {
-   CHECK_VALUE("next ptr", 
bad_wr[i].next, &bad_wr[i + 1], goto cleanup);
+   CHECK_PVALUE("next ptr", 
bad_wr[i].next, &bad_wr[i + 1], goto cleanup);
}
 
-   CHECK_VALUE("last ptr", bad_wr[num_wr - 
(attr.cap.max_send_wr % num_wr) - 1].next, NULL, goto cleanup);
+   CHECK_PVALUE("last ptr", bad_wr[num_wr 
- (attr.cap.max_send_wr % num_wr) - 1].next, NULL, goto cleanup);
  
rc = ibv_destroy_qp(qp);
CHECK_VALUE("ibv_destroy_qp", rc, 0, 
goto cleanup);
@@ -536,10 +537,10 @@
CHECK_VALUE("ibv_post_recv", rc, -1, goto 
cleanup);
 
for (i = 0; i < num_wr - (attr.cap.max_recv_wr 
% num_wr) - 1; ++i) {
-   CHECK_VALUE("next ptr", bad_wr[i].next, 
&bad_wr[i + 1], goto cleanup);
+   CHECK_PVALUE("next ptr", 
bad_wr[i].next, &bad_wr[i + 1], goto cleanup);
}
 
-   CHECK_VALUE("last ptr", bad_wr[num_wr - 
(attr.cap.max_recv_wr % num_wr) - 1].next, NULL, goto cleanup);
+   CHECK_PVALUE("last ptr", bad_wr[num_wr - 
(attr.cap.max_recv_wr % num_wr) - 1].next, NULL, goto cleanup);

rc = ibv_destroy_qp(qp);
CHECK_VALUE("ibv_destroy_qp", rc, 0, goto 
cleanup);
diff -rNu gen2_basic/test_qp.c b/gen2_basic/test_qp.c
--- a/gen2_basic/test_qp.c  2006-04-25 11:40:30.668369000 -0700
+++ b/gen2_basic/test_qp.c  2006-08-14 14:16:57.911621000 -0700
@@ -1692,12 +1695,12 @@
CHECK_VALUE("max_send_sge", query_init_attr.cap.max_send_sge, 
attr.cap.max_send_sge, goto cleanup);
CHECK_VALUE("max_send_wr", query_init_attr.cap.max_send_wr, 
attr.cap.max_send_wr, goto cleanup);
 
-   CHECK_VALUE("qp_context", query_init_attr.qp_context, 
attr.qp_context, goto cleanup);
+   CHECK_PVALUE("qp_context", query_init_attr.qp_context, 
attr.qp_context, goto cleanup);
CHECK_VALUE("qp_type", query_init_attr.qp_type, attr.qp_type, 
goto cleanup);
-   CHECK_VALUE("recv_cq", query_init_attr.recv_cq, attr.recv_cq, 
goto cleanup);
-   CHECK_VALUE("send_cq", query_init_attr.send_cq, attr.send_cq, 
goto cleanup);
+   CHECK_PVALUE("recv

[openib-general] gen2_basic patch 1/10: fix some minor typos

2006-09-19 Thread Robert Walsh

gen2_basic - fix some minor typos

Signed-off by: Robert Walsh <[EMAIL PROTECTED]>

diff -rNu a/gen2_basic/test_cq.c b/gen2_basic/test_cq.c
--- a/gen2_basic/test_cq.c  2006-07-27 13:42:44.857603000 -0700
+++ b/gen2_basic/test_cq.c  2006-08-14 14:17:17.352167000 -0700
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -466,7 +467,7 @@
return test_result;
 }
 
-/* invallid comp channel */
+/* invalid comp channel */
 int cq_6(
IN  struct config_t *config,
IN  struct VL_random_t *rand_gen)
@@ -479,9 +480,9 @@
int rc;
int test_result = -1;
 
-   VL_MISC_TRACE1(("cq_6 - invallid comp channel"));
+   VL_MISC_TRACE1(("cq_6 - invalid comp channel"));
 
-   TEST_CASE(("invallid comp channel"));
+   TEST_CASE(("invalid comp channel"));
 
ib_cont = open_hca(config);
CHECK_PTR("open_hca", ib_cont, goto cleanup);
diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
--- a/gen2_basic/test_qp.c  (revision 9473)
+++ b/gen2_basic/test_qp.c  (working copy)
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2006 QLogic Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -1287,7 +1288,7 @@
break;
case 25218:
case 25204:
-   num_qp = 15872; /* Found in expiraments to be the max for 
memfree per process */
+   num_qp = 15872; /* Found in experiments to be the max for 
memfree per process */
break;
}
 

[openib-general] gen2_basic patches

2006-09-19 Thread Robert Walsh

Hi all,

We've got some patches to gen2_basic to fix some problems with the test
suite.  Some are trivial (fix typos, etc.) and some are more serious
(handle max_qp counts correctly, etc.)  I'm going to be sending them out
piecemeal as we review them internally, and I'll make sure to send them
out in sequence (i.e. in the order they should be applied), so don't be
surprised to hear nothing for a day or two, then see some more patches ;-)

Regards,
 Robert.


Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Fabian Tillier

On 9/19/06, Rimmer, Todd <[EMAIL PROTECTED]> wrote:
> > From: Eric Barton
> > Sent: Tuesday, September 19, 2006 2:14 PM
> > To: openib-general@openib.org
> > Subject: [openib-general] Completion callback /teardown race
> >
> > All the CQ callback does is wake a thread to poll the queue.  This
> > effectively
> > keeps polling completions out of the CQ until it is empty. Then it
> > calls
> > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time.
> >
> > If this last call to ib_poll_cq() finds something, it repeats the
> > whole process
> > - but can I be guaranteed another CQ callback in this case or is it
> > indeterminate?
> >
> The recommended algorithm would be:
>
> poll_cq until empty
> ib_req_notify_cq
> poll_cq until empty

Note that if you are going to poll after ib_req_notify_cq, you can
simplify the above algorithm and just do:

ib_req_notify_cq
poll_cq until empty

However, such an algorithm will result in extra CQ events on Mellanox
HCAs.  On HCAs where the new CQ event is only generated for new CQEs
it works just as well as the opposite, which works only on Mellanox
HCAs:

poll_cq until empty
ib_req_notify_cq
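
The ordering issue can be illustrated with a toy model. This is not the verbs API: `struct mock_cq` and the `mock_*` functions are hypothetical, and the model assumes the stricter semantics discussed in this thread, where arming the CQ produces an event only for a completion added after the arm.

```c
#include <stdbool.h>

/* Toy model of a completion queue; not the verbs API. */
struct mock_cq {
    int  pending;  /* completions waiting to be polled */
    bool armed;    /* true if the next new completion raises an event */
    int  events;   /* number of events raised so far */
};

/* A new completion arrives; an event is raised only if the CQ is armed. */
void mock_cq_add(struct mock_cq *cq)
{
    cq->pending++;
    if (cq->armed) {
        cq->armed = false;
        cq->events++;
    }
}

/* Poll one completion; returns 1 if one was consumed, 0 if empty. */
int mock_poll_cq(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Arm the CQ: only completions added after this point raise an event. */
void mock_req_notify_cq(struct mock_cq *cq)
{
    cq->armed = true;
}

/* Poll until empty; returns the number of completions handled. */
int mock_drain(struct mock_cq *cq)
{
    int handled = 0;

    while (mock_poll_cq(cq))
        handled++;
    return handled;
}

/* Drain, then arm. 'late' completions arrive in the window between. */
int drain_then_arm(struct mock_cq *cq, int late)
{
    int handled = mock_drain(cq);

    for (int i = 0; i < late; i++)
        mock_cq_add(cq);          /* arrives before the CQ is armed */
    mock_req_notify_cq(cq);
    return handled;               /* late completions stay unnoticed */
}

/* Drain, arm, drain again: the second drain picks up late arrivals. */
int drain_arm_drain(struct mock_cq *cq, int late)
{
    int handled = drain_then_arm(cq, late);

    return handled + mock_drain(cq);
}
```

In this model, `drain_then_arm()` with one late arrival leaves a completion pending with no event raised, while `drain_arm_drain()` handles it in the second pass.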

> There are some differences in HCA behaviour with regard to
> ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
> the CQ is not empty at this point (in which case the poll_cq's after the
> notify are optional).
>
> However the behaviour defined in the IBTA spec indicates that
> ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> which arrives, hence to be portable the poll_cq loop after
> ib_req_notify_cq is necessary to cover any CQEs which arrived between
> the prior poll and the ib_req_notify_cq.

I remember a while ago a mention that the behavior of the Mellanox
HCAs could be controlled in the firmware, so that they would follow
the IBTA spec defined behavior.

I don't know what the impact on performance would be if such a change
were made.  Perhaps someone from Mellanox can confirm/deny the HCAs'
ability to implement the IBA spec behavior, and quantify the effects.

- Fab


Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Roland Dreier

Todd> An approach we implemented a few years ago in our
Todd> proprietary stack was a new verb (in addition to poll_cq and
Todd> notify_req): poll_and_notify (we called it
Todd> iba_poll_and_rearm).

Makes sense but it doesn't actually help for NAPI for ipoib (to be
fair I haven't described the issue there yet).

 - R.


Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Rimmer, Todd

> From: Roland Dreier
> Sent: Tuesday, September 19, 2006 5:17 PM
> To: Eric Barton
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] Completion callback /teardown race
> 
> 
> I'll have more to say on this in the context of IPoIB and NAPI
> shortly, since I've been thinking about this issue myself.
> 
> The ipath driver implements only the weaker semantics guaranteed by
> the IBA spec -- ie an event is generated if a completion is added
> after the request for notification.  And I don't know what ehca and
> amso1100 implement to be honest.
> 
> (The Mellanox semantics are conforming though, since it's not
> well-defined exactly when a completion is added to a CQ if no one
> looks...)

An approach we implemented a few years ago in our proprietary stack was
a new verb (in addition to poll_cq and notify_req): poll_and_notify (we
called it iba_poll_and_rearm).

This verb always did a poll_cq, but if the CQ was drained it then did a
rearm of the CQ.  The return value from the call indicated what the next
step for the caller should be:
- SUCCESS - call poll_and_notify again (CQE returned)
- COMPLETED - nothing to do after this CQE (CQE returned, rearmed, no
need to poll anymore)
- POLL_NEEDED - loop on poll (CQE returned, rearmed, need to poll_cq until
empty)
- NOT_DONE - nothing more to do, no CQE (no CQE returned, rearmed, CQ
still empty, no need to poll anymore)
- error (invalid call, etc)

The callback would loop on poll_and_notify as long as SUCCESS was returned,
after which, if POLL_NEEDED had been returned, it would loop on poll_cq.

This approach provided 2 advantages:
1. for performance an extra 1-2 calls into the HCA driver per callback
were avoided.  The win here was saving some spin locks (in high CQE rate
drivers like IPoIB this was noticeable).
2. on HCAs such as Mellanox, POLL_NEEDED was never returned and the
caller never did unnecessary polls, however the caller and API were also
able to handle HCAs which did not have the Mellanox semantics.
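
The callback loop described above can be sketched roughly as follows. This is
only an illustration against a toy in-memory CQ, not the proprietary verb
itself: the names sim_cq, sim_poll_cq, sim_poll_and_rearm, sim_callback and the
PAR_ prefix are all hypothetical, and the event_on_nonempty flag is an assumed
stand-in for the Mellanox arming semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return codes named after the list above; the PAR_ prefix is hypothetical. */
enum par_status {
    PAR_SUCCESS,      /* CQE returned, more are queued: call again */
    PAR_COMPLETED,    /* CQE returned, CQ rearmed, nothing left to poll */
    PAR_POLL_NEEDED,  /* CQE returned, CQ rearmed, caller must poll until empty */
    PAR_NOT_DONE,     /* no CQE returned, CQ rearmed and still empty */
    PAR_ERROR         /* invalid call */
};

/* Toy CQ: a counter of queued CQEs instead of real hardware state. */
struct sim_cq {
    int  pending;            /* CQEs waiting to be polled */
    bool armed;              /* notification has been requested */
    bool event_on_nonempty;  /* true models the Mellanox arming semantics */
};

/* Stand-in for poll_cq: returns 1 if a CQE was taken, 0 if the CQ was empty. */
static int sim_poll_cq(struct sim_cq *cq)
{
    assert(cq->pending >= 0);
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Poll one CQE; rearm only once the CQ has been drained. */
static enum par_status sim_poll_and_rearm(struct sim_cq *cq, int *got)
{
    if (cq == NULL || got == NULL)
        return PAR_ERROR;
    *got = sim_poll_cq(cq);
    if (cq->pending > 0)
        return PAR_SUCCESS;
    cq->armed = true;
    if (*got == 0)
        return PAR_NOT_DONE;
    return cq->event_on_nonempty ? PAR_COMPLETED : PAR_POLL_NEEDED;
}

/* Callback body: loop while SUCCESS, then poll until empty if asked to. */
static int sim_callback(struct sim_cq *cq)
{
    enum par_status st;
    int got = 0;
    int handled = 0;

    do {
        st = sim_poll_and_rearm(cq, &got);
        handled += got;
    } while (st == PAR_SUCCESS);

    if (st == PAR_POLL_NEEDED)
        while (sim_poll_cq(cq))
            handled++;

    return handled;
}
```

In this model a CQ with event_on_nonempty set never yields PAR_POLL_NEEDED, so
the extra poll loop is skipped, which matches advantage 2.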

Todd Rimmer


 


[openib-general] Fluent and OFED

2006-09-19 Thread Barry Evans

Hello,

Has anyone had any luck getting Fluent 6.2 to cooperate with
OFED? I think I’ve got all the libraries pointing to the right place, but
I’m ending up with the dreaded: “[1] Abort: [0] Abort: mpirun:
executable version 1 does not match our version 3.” from mvapich. Ugh.

Cheers,

Barry


Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Roland Dreier

Eric> If this last call to ib_poll_cq() finds something, it
Eric> repeats the whole process - but can I be guaranteed another
Eric> CQ callback in this case or is it indeterminate?

In general there is an unavoidable race, since you don't know whether
the new completion you find in the CQ was generated before or after
you requested notification.  So with the completion semantics as
defined in the IBA spec, you have the choice of two poisons:

 1) Don't poll after you request notification.  Then you run the risk
of a completion being added after your last poll but before you
requested notification.  If another completion never occurs, then
you're stuck forever.

 2) Poll after you request notification.  Then you run the risk of
having a completion added after your request for notification but
before your final poll.  This means another completion event will
be pending, but you will likely drain the CQ before you take the event.

However, Mellanox HCAs implement stronger semantics: they generate an
event if the CQ is not empty at the time notification is requested,
which closes the race between draining the CQ and requesting
notification.  This means *for Mellanox HCAs only* it is safe to do:

  completion_handler():
poll CQ until empty
request notification on CQ

with no additional poll after the request for notification.

I'll have more to say on this in the context of IPoIB and NAPI
shortly, since I've been thinking about this issue myself.

The ipath driver implements only the weaker semantics guaranteed by
the IBA spec -- ie an event is generated if a completion is added
after the request for notification.  And I don't know what ehca and
amso1100 implement to be honest.

(The Mellanox semantics are conforming though, since it's not
well-defined exactly when a completion is added to a CQ if no one looks...)
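
The two choices above, and the difference the Mellanox behaviour makes, can be
modelled with a small state machine. This is a sketch under stated assumptions:
model_cq and every function below are hypothetical names, the CQ is just a
counter, and "event_pending" stands for a completion event that has been raised
but not yet handled.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cq {
    int  pending;        /* completions sitting in the CQ */
    bool armed;          /* notification requested, not yet consumed */
    bool event_pending;  /* a completion event has been raised */
    bool strong;         /* true: event if CQ is non-empty when armed */
};

/* The HCA adds a completion; an armed CQ raises exactly one event. */
static void model_add_completion(struct model_cq *cq)
{
    cq->pending++;
    if (cq->armed) {
        cq->armed = false;
        cq->event_pending = true;
    }
}

/* Request notification under either the weak or the strong semantics. */
static void model_req_notify(struct model_cq *cq)
{
    if (cq->strong && cq->pending > 0)
        cq->event_pending = true;
    else
        cq->armed = true;
}

/* Poll the CQ until empty. */
static void model_drain(struct model_cq *cq)
{
    assert(cq->pending >= 0);
    cq->pending = 0;
}

/* Choice 1: no poll after arming; a completion lands in the window. */
static bool model_event_after_late_completion(bool strong)
{
    struct model_cq cq = { 0, false, false, strong };

    model_drain(&cq);
    model_add_completion(&cq);  /* after the last poll, before the request */
    model_req_notify(&cq);
    return cq.event_pending;    /* false means the handler is stuck */
}

/* Choice 2: poll after arming; the final poll eats the new completion. */
static bool model_event_with_empty_cq(void)
{
    struct model_cq cq = { 0, false, false, false };

    model_drain(&cq);
    model_req_notify(&cq);
    model_add_completion(&cq);  /* after the request, before the final poll */
    model_drain(&cq);
    return cq.event_pending && cq.pending == 0;
}
```

In the model, choice 1 leaves no event under the weak semantics but does raise
one under the strong semantics, and choice 2 ends with an event pending on an
already empty CQ, i.e. a callback that finds nothing to poll.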

 - R.


Re: [openib-general] IB diagnostics problems (OFED-1.1-rc5)

2006-09-19 Thread Hal Rosenstock

Hi Mirko,

On Mon, 2006-09-18 at 08:56, Mirko Benz wrote:
> Hi Hal,
> 
> Please prepare the bugzilla entry.

I entered the following:
http://openib.org/bugzilla/show_bug.cgi?id=238
http://openib.org/bugzilla/show_bug.cgi?id=239

Feel free to annotate it.

-- Hal

> It is not critical -- I just think it is not convenient for an end user.
> 
> Regards,
> Mirko
> 
> Hal Rosenstock schrieb:
> > Hi again Mirko,
> >
> > On Mon, 2006-09-18 at 07:20, Mirko Benz wrote:
> >   
> >> Hi Hal,
> >>
> >> This was a default/build all OFED install. Either we should place these 
> >> tools under ../ofed/sbin or make it work for everybody.
> >> 
> >
> > The issue with making it work for everyone is that there's a chicken and
> > egg problem in that when the tools are built and installed, one doesn't
> > know how udev will be configured for umad. I agree that since the
> > default is to run as root, these should be in sbin rather than bin. Can
> > you file a bugzilla report for this (or do you want me to do it on your
> > behalf) ? Is this critical for OFED 1.1 ?
> >
> >   
> >>  At least an error message that umad access failed would be required.
> >> 
> > Those are scripts and the errors are being returned from the lower level
> > programs invoked but not by the scripts.
> >
> > Would you please file a bug for this as well (or let me know whether I
> > should do this) ? 
> >
> > Thanks.
> >
> > -- Hal
> >
> >   
> >> Regards,
> >> Mirko
> >>
> >> Hal Rosenstock schrieb:
> >> 
> >>> Hi Mirko,
> >>>
> >>> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
> >>>   
> >>>   
> >>>> Hello,
> >>>>
> >>>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
> >>>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
> >>>> .../ofed/bin/)
> >>>> do not work with a normal user account -- no output given. It works as 
> >>>> root though.
> >>>>
> >>> It depends on how you have udev access for umad setup. With the default
> >>> setup for IB, root is required as these diagnostics send SMPs which
> >>> require umad access which is limited to root.
> >>>
> >>> -- Hal
> >>>
> >>>   
> >>>   
> >>>> Regards,
> >>>> Mirko



Re: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries

2006-09-19 Thread Roland Dreier

I didn't really read the new patch before... anyway:

Why have you changed from the approach of just using the broadcast
group's MTU?  As far as I can see, the issue being addressed here is
purely theoretical anyway, but with the approach of taking the current
device MTU, you now have to flush all the paths if the configured MTU
changes, and you have to have a big switch in path_rec_start().

 - R.


Re: [openib-general] [PATCH] ipoib mcast restart

2006-09-19 Thread Roland Dreier

OK, I applied this to for-2.6.19, although the patch was line-wrapped,
didn't have a usable subject, etc.  So...


I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes for each one fixing a patch or pulling it out of svn,
then I end up burning an extra 9 hours of stupid work.  If 20+ people
who contribute patches sent me clean patches, then everyone will be
happier because I'll be able to merge things quicker and focus on
productive work.



Re: [openib-general] Completion callback /teardown race

2006-09-19 Thread Rimmer, Todd

> From: Eric Barton
> Sent: Tuesday, September 19, 2006 2:14 PM
> To: openib-general@openib.org
> Subject: [openib-general] Completion callback /teardown race
> 
> 
> 
> All the CQ callback does is wake a thread to poll the queue.  This
> effectively keeps polling completions out of the CQ until it is empty.
> Then it calls ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq()
> 1 more time.
> 
> If this last call to ib_poll_cq() finds something, it repeats the
> whole process - but can I be guaranteed another CQ callback in this
> case or is it indeterminate?
> 
The recommended algorithm would be:

poll_cq until empty
ib_req_notify_cq
poll_cq until empty

Once ib_req_notify_cq is called, it's possible for an additional callback
to race with the poll_cq's which follow.

There are some differences in HCA behaviour with regard to
ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
the CQ is not empty at this point (in which case the poll_cq's after the
notify are optional).

However the behaviour defined in the IBTA spec indicates that
ib_req_notify_cq will cause a callback/interrupt only on the next CQE
which arrives, hence to be portable the poll_cq loop after
ib_req_notify_cq is necessary to cover any CQEs which arrived between
the prior poll and the ib_req_notify_cq.

Within a given callback invocation, there is no reason to call notify
more than once.
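
A minimal sketch of the recommended poll / notify / poll sequence, assuming a
toy CQ rather than the real verbs. toy_poll_until_empty stands in for a loop on
ib_poll_cq() and toy_req_notify for ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); all
toy_ names and the in_window field are hypothetical and exist only to model
CQEs that arrive between the first drain and the notify request.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cq {
    int  pending;    /* CQEs visible to a poll right now */
    int  in_window;  /* CQEs that arrive after the first drain, before the arm */
    bool armed;      /* notification requested */
};

/* Stand-in for looping on ib_poll_cq() until it returns nothing. */
static int toy_poll_until_empty(struct toy_cq *cq)
{
    int n = cq->pending;

    assert(n >= 0);
    cq->pending = 0;
    return n;
}

/* Stand-in for ib_req_notify_cq(); CQEs in the window raised no event. */
static void toy_req_notify(struct toy_cq *cq)
{
    cq->pending += cq->in_window;
    cq->in_window = 0;
    cq->armed = true;
}

/* Portable handler: drain, arm once, drain again to cover the window. */
static int toy_completion_handler(struct toy_cq *cq)
{
    int handled = toy_poll_until_empty(cq);

    toy_req_notify(cq);
    handled += toy_poll_until_empty(cq);
    return handled;
}
```

With two CQEs queued and one arriving in the window, the handler processes all
three in a single invocation and leaves the CQ armed and empty.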

Todd Rimmer


[openib-general] [GIT PULL] please pull infiniband.git (one-liner fix for 2.6.18)

2006-09-19 Thread Roland Dreier

Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This contains another one-liner that fixes a regression from 2.6.17:

Jack Morgenstein:
  IB/mthca: Fix lid used for sending traps

 drivers/infiniband/hw/mthca/mthca_mad.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index d9bc030..45e106f 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device *
 
mthca_update_rate(to_mdev(ibdev), port_num);
update_sm_ah(to_mdev(ibdev), port_num,
-be16_to_cpu(pinfo->lid),
+be16_to_cpu(pinfo->sm_lid),
 pinfo->neighbormtu_mastersmsl & 0xf);
 
event.device   = ibdev;
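
For readers skimming the diff: the address handle updated here is the one used
to reach the SM, so it has to be built from the SM's LID rather than the port's
own LID. A rough userspace illustration of that distinction follows, assuming a
simplified PortInfo layout; port_info_sketch, be16_bytes_to_host and
trap_dest_lid are hypothetical names, not mthca code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, assumed layout: both LIDs are big-endian on the wire. */
struct port_info_sketch {
    uint8_t lid_be[2];     /* this port's base LID */
    uint8_t sm_lid_be[2];  /* LID of the master SM */
};

/* Portable equivalent of a big-endian 16-bit to host conversion. */
static uint16_t be16_bytes_to_host(const uint8_t be[2])
{
    return (uint16_t)((be[0] << 8) | be[1]);
}

/* Traps go to the SM, so the destination is sm_lid, not lid. */
static uint16_t trap_dest_lid(const struct port_info_sketch *pinfo)
{
    assert(pinfo != NULL);
    return be16_bytes_to_host(pinfo->sm_lid_be);
}
```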


Re: [openib-general] [PATCH] mthca: fix lid used for sending traps

2006-09-19 Thread Roland Dreier

 > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
 > 2.6.18.1.

Makes sense -- I'll try to get this into 2.6.18, since it's a
one-liner and fixes a regression from 2.6.17.

 - R.


Re: [openib-general] [PATCH] ipoib mcast restart

2006-09-19 Thread eli

> Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up()
> not sufficient to rejoin all the mcgs?
>
Because after a port event all the mcast groups on the device are flushed
and all that remains is from the dev->mclist and we must renew the joins
from there.



Re: [openib-general] ipoib multicast problem

2006-09-19 Thread Roland Dreier

eli> That is because the broadcast group is not part of the
eli> multicast groups maintained by the kernel but rather is part
eli> of ipoib and is joined from a different function. The other
eli> full members are maintained by the kernel for the net device
eli> and come from dev->mclist.

Oh I see, when we flush the multicast groups we actually delete all of
them instead of just removing the attached flag.  OK I guess your fix
makes sense then.

 - R.


Re: [openib-general] [PATCH 8/13] osm: port to WinIB stack : opensm/osm_opensm.c

2006-09-19 Thread Hal Rosenstock

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal
> 
> Explicit NULL in empty array initializer
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only.

-- Hal



[openib-general] Completion callback /teardown race

2006-09-19 Thread Eric Barton


Hi,

I create 1 CQ just for receive completions on each of my QPs.  When I tear down
the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait
for all currently posted receives to complete.

This has worked just fine for me, but I've had a bug report from a site using
this software (possibly with HCAs I've not tested with) that another completion
callback can happen after all the posted receives have completed.

I supplied a debug/workaround patch that checks the CQ in this situation.  It
confirms that all posted receives have completed and that the CQ is in fact
empty.

Is this a bug, or an unavoidable race between arming the callback and polling
the CQ?

All the CQ callback does is wake a thread to poll the queue.  This effectively
keeps polling completions out of the CQ until it is empty. Then it calls
ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time.  

If this last call to ib_poll_cq() finds something, it repeats the whole process
- but can I be guaranteed another CQ callback in this case or is it
indeterminate?

-- 

Cheers,
Eric




Re: [openib-general] ipoib multicast problem

2006-09-19 Thread eli

> >
> I don't understand.  How could ipoib rejoin the broadcast group and
> then not rejoin the rest of the full member groups it has?
>
>
That is because the broadcast group is not part of the multicast groups
maintained by the kernel but rather is part of ipoib and is joined from a
different function. The other full members are maintained by the kernel
for the net device and come from dev->mclist.



Re: [openib-general] [PATCH 11/13] osm: port to WinIB stack : opensm/osm_log.c

2006-09-19 Thread Hal Rosenstock

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
> 
> 1. function mappings for stat, fstat and fileno
> 2. Currently no imp for log file truncation 
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only.

-- Hal



Re: [openib-general] [PATCH 9/13] osm: port to WinIB stack : opensm/osm_prtn.c

2006-09-19 Thread Hal Rosenstock

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal
> 
> Required cl_debug.h for PRIx64
> Also map snprintf to _snprintf and stat to _stat
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only.

-- Hal



Re: [openib-general] ipoib multicast problems on RHEL4.0 u4

2006-09-19 Thread Doug Ledford

On Tue, 2006-09-19 at 14:44 +0300, Eli cohen wrote:
> Hi,
> 
> while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
> succeeds to add a multicast group to an interface but actually the
> multicast group is not added to the net_device. This means that an
> application cannot join a multicast group as a full member. When I
> examined the differences between the kernel sources for u3 and u4 I
> noticed that essential code was removed:
> 
> diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
> --- net/ipv4/arp.c  2006-09-18 15:35:03.0 +0300
> +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.0 +0300
> @@ -213,9 +213,6 @@
> case ARPHRD_IEEE802_TR:
> ip_tr_mc_map(addr, haddr);
> return 0;
> -   case ARPHRD_INFINIBAND:
> -   ip_ib_mc_map(addr, haddr);
> -   return 0;
> default:
> if (dir) {
> memcpy(haddr, dev->broadcast, dev->addr_len);
> 
> 
> Can anyone suggest a workaround to this issue?

Short of spinning a kernel, it's going to be hard to work around.
Thanks for finding this, I'll track down how this got left out of the U4
kernel when it was in the U3 kernel :-/
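
For context, the case removed in the quoted diff is what maps an IPv4 multicast
group onto an IPoIB hardware address; without it the code falls through to the
default branch shown in the diff. A hedged userspace sketch of such a mapping
follows, modelled on the RFC 4391 address layout. It is not the kernel's
ip_ib_mc_map(): sketch_ip_ib_mc_map and SKETCH_IPOIB_ALEN are hypothetical
names, the group address is taken in host byte order, and the P_Key is assumed
to be the default 0xffff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IPOIB_ALEN 20  /* 4 bytes flags/QPN + 16 bytes MGID */

/* Map an IPv4 multicast group (host byte order) to an IPoIB address. */
static void sketch_ip_ib_mc_map(uint32_t group,
                                unsigned char buf[SKETCH_IPOIB_ALEN])
{
    assert(buf != NULL);
    memset(buf, 0, SKETCH_IPOIB_ALEN);

    buf[1] = 0xff;  /* multicast QPN 0xffffff */
    buf[2] = 0xff;
    buf[3] = 0xff;

    buf[4] = 0xff;  /* MGID starts here: multicast prefix */
    buf[5] = 0x12;  /* flags 1, link-local scope 2 */
    buf[6] = 0x40;  /* IPv4 signature 0x401b */
    buf[7] = 0x1b;
    buf[8] = 0xff;  /* P_Key, assumed to be the default 0xffff */
    buf[9] = 0xff;

    /* The low 28 bits of the group address fill the end of the MGID. */
    buf[16] = (group >> 24) & 0x0f;
    buf[17] = (group >> 16) & 0xff;
    buf[18] = (group >> 8) & 0xff;
    buf[19] = group & 0xff;
}
```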

-- 
Doug Ledford <[EMAIL PROTECTED]>
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband



Re: [openib-general] [PATCH 2/13] osm: port to WinIB stack : opensm/osm_subnet.c

2006-09-19 Thread Hal Rosenstock

Hi Eitan,

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal

I think this patch is really 5/13 rather than 2/13.

> No need for stdio.h but do need stdlib.h ...

It appears to be the other way around (stdio.h needed but stdlib.h
isn't), right ?

> Also map snprintf to _snprintf in windows case
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>
> 
> Index: opensm/osm_subnet.c
> ===
> --- opensm/osm_subnet.c   (revision 9502)
> +++ opensm/osm_subnet.c   (working copy)
> @@ -53,6 +53,7 @@
>  
>  #include 

Should this include of stdlib.h also be removed ?

-- Hal

>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -65,7 +66,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  /**
>   **/
> @@ -659,6 +659,9 @@ __osm_subn_opts_unpack_charp(
>}
>  }
>  
> +#ifdef WIN32
> +#define snprintf _snprintf
> +#endif
>  /**
>   **/
>  static void
> 
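
One caveat with a plain snprintf to _snprintf mapping is that the Microsoft
_snprintf family does not write a terminating NUL when the output is truncated.
A small wrapper along the following lines is one possible way to paper over
that; it is only a sketch, sketch_snprintf is a hypothetical name, and the
WIN32 test simply mirrors the macro used in the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format into buf and always leave it NUL-terminated. */
static int sketch_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    if (size == 0)
        return -1;

    va_start(ap, fmt);
#ifdef WIN32
    n = _vsnprintf(buf, size, fmt, ap);  /* may leave buf unterminated */
#else
    n = vsnprintf(buf, size, fmt, ap);
#endif
    va_end(ap);

    buf[size - 1] = '\0';  /* force termination on truncation */
    return n;
}
```

The return value is passed through unchanged, so callers that check it for
truncation still need to allow for the two conventions differing.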



Re: [openib-general] [PATCH 10/13] osm: port to WinIB stack : opensm/osm_pkey.c

2006-09-19 Thread Hal Rosenstock

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
> 
> Some explicit casting required and also pkey blocks are only uint16_t .
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only in conjunction with patch 2/13 on
osm_pkey.h.

-- Hal



Re: [openib-general] [PATCH 2/13] osm: port to WinIB stack : include/opensm/osm_pkey.h

2006-09-19 Thread Hal Rosenstock

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
> 
> Partition tables blocks are always 16 bits. 
> This resolves the need to later cast back and forth.
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <[EMAIL PROTECTED]>

Thanks. Applied to trunk only in conjunction with patch 10/13 on
osm_pkey.c.

-- Hal



Re: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector for path queries

2006-09-19 Thread Roland Dreier

Seems OK from an anal spec compliance point of view, but I don't
understand this:

 > This breaks IPoIB on networks with SM Tavor quirk activated.

Even if opensm returns a path record with a lower MTU, the underlying
links still have a 2K mtu really, so nothing breaks.  IPoIB is just
doing something naughty by ignoring the MTU in the path record.  So
what breaks really?

(not to mention the fact that the "Tavor quirk" hasn't been accepted
into OpenSM yet anyway)

 - R.


Re: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name

2006-09-19 Thread Roland Dreier

Or> I am fine with having the CMA config selected whenever someone
Or> selects INFINIBAND so adding the help text and making it
Or> visible are not a must per my taste. However, are you fine
Or> with changing the **name** of the config directive to
Or> CONFIG_INFINIBAND_RDMA_CM so it's better understood?

No, since really what it is controlling is the ib_addr module.

Or> As Erez wrote you on the other thread, we must depend on the
Or> CMA else a user running make randconfig would be able to
Or> produce a config file where INFINIBAND is selected but the CMA
Or> (RDMA_ADDR_TRANS) config is not selected so linkage will fail.

How?  make randconfig won't produce invalid configurations.

 - R.


Re: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig

2006-09-19 Thread Roland Dreier

Erez> I don't agree with that. It is possible that
Erez> INFINIBAND_ADDR_TRANS won't be selected according to your
Erez> patch. How about this solution: iSER should depend on
Erez> INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which depends
Erez> on INET, so the INET dependency is ok).

How is that possible?  If INFINIBAND and INET are selected, then
INFINIBAND_ADDR_TRANS is selected too (at least as far as I can see).
How do you enable INET without INFINIBAND_ADDR_TRANS?

I don't like making things depend on INFINIBAND_ADDR_TRANS, since it's
really just an internal symbol to prevent building ib_addr when it
won't build.

 - R.


Re: [openib-general] [PATCH] ipoib mcast restart

2006-09-19 Thread Roland Dreier

Eli> Make sure that after ipoib_ib_dev_flush is executed,
Eli> ipoib_mcast_restart_task is executed also to join all the
Eli> mcast groups maintained by the kernel for the device.

Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up()
not sufficient to rejoin all the mcgs?

 - R.


Re: [openib-general] ipoib multicast problem

2006-09-19 Thread Roland Dreier

Eli> 1. An application registers to a multicast group as a full
Eli> member. As a result all the groups are listed in dev->mclist.
Eli> 2. The infiniband link falls momentarily, opensm restarted
Eli> etc.  3. All multicast memberships are flushed.  4. The net
Eli> device will not join again until at a later time something
Eli> will cause ipoib_set_mcast_list() to be called.
 
I don't understand.  How could ipoib rejoin the broadcast group and
then not rejoin the rest of the full member groups it has?

 - R.


Re: [openib-general] Fwd: IPoIB Multicast

2006-09-19 Thread Roland Dreier

Michael> Works OK here. Please commit. Please note this does fix a
Michael> real issue for us, which is quite severe for clusters
Michael> where ipoib is the only interconnect, I wonder whether
Michael> this is 2.6.18 material.

I don't understand why this is a big problem.  What breaks if we let
OpenSM pick the MTU and Rate for a new multicast group?  It's already
picking them for the broadcast group.

Anyway I put this in my for-2.6.19 branch for now.

 - R.


Re: [openib-general] [openfabrics-ewg] OFED 1.1

2006-09-19 Thread Aviram Gutman

Aviram Gutman wrote:
> We want to have RC6 on Wed and final release next week on Tues or Wed 
> Sep-27.
> Is that acceptable by all EWG members?
>
> Regards,
> Aviram
>
>
>
>
> ___
> openfabrics-ewg mailing list
> [EMAIL PROTECTED]
> http://openib.org/mailman/listinfo/openfabrics-ewg
>   

We currently see two issues:

1) IPoIB multicast is not working on RHEL4 U4
2) iSER on SLES10 requires root privilege

I hope that Voltaire can fix issue #2. It seems that issue #1 is not 
solvable (unless we require the user to replace the kernel).
Are these issues showstoppers? Or can we issue RC6 with these issues 
outstanding?


Regards,

Aviram



[openib-general] [PATCH][TRIVIAL]OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level

2006-09-19 Thread Hal Rosenstock

OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
Index: opensm/osm_node_info_rcv.c
===
--- opensm/osm_node_info_rcv.c  (revision 9536)
+++ opensm/osm_node_info_rcv.c  (working copy)
@@ -437,7 +437,7 @@ __osm_ni_rcv_process_new_ca(
  The plock must be held before calling this function.
 **/
 static void
-__osm_ni_rcv_process_ca_port(
+__osm_ni_rcv_process_existing_ca(
   IN const osm_ni_rcv_t* const p_rcv,
   IN osm_node_t* const p_node,
   IN const osm_madw_t* const p_madw )
@@ -455,7 +455,7 @@ __osm_ni_rcv_process_ca_port(
   osm_bind_handle_t h_bind;
   cl_status_t cl_status;
 
-  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_ca_port );
+  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
 
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp );
@@ -473,7 +473,7 @@ __osm_ni_rcv_process_ca_port(
   if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) )
   {
 osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
- "__osm_ni_rcv_process_ca_port: "
+ "__osm_ni_rcv_process_existing_ca: "
  "Creating new port object with GUID = 0x%" PRIx64 "\n",
  cl_ntoh64( p_ni->port_guid ) );
 
@@ -483,7 +483,7 @@ __osm_ni_rcv_process_ca_port(
 if( p_port == NULL )
 {
   osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-   "__osm_ni_rcv_process_ca_port: ERR 0D04: "
+   "__osm_ni_rcv_process_existing_ca: ERR 0D04: "
"Unable to create new port object\n" );
   goto Exit;
 }
@@ -500,7 +500,7 @@ __osm_ni_rcv_process_ca_port(
 Somehow, this port GUID already exists in the table.
   */
   osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-   "__osm_ni_rcv_process_ca_port: ERR 0D12: "
+   "__osm_ni_rcv_process_existing_ca: ERR 0D12: "
"Port 0x%" PRIx64 " already in the database!\n",
cl_ntoh64( p_ni->port_guid ) );
 
@@ -521,7 +521,7 @@ __osm_ni_rcv_process_ca_port(
   if( cl_status != CL_SUCCESS )
   {
 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
- "__osm_ni_rcv_process_ca_port: ERR 0D08: "
+ "__osm_ni_rcv_process_existing_ca: ERR 0D08: "
  "Error %s adding to list\n",
  CL_STATUS_MSG( cl_status ) );
 osm_port_delete( &p_port );
@@ -530,7 +530,7 @@ __osm_ni_rcv_process_ca_port(
   else
   {
 osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
- "__osm_ni_rcv_process_ca_port: "
+ "__osm_ni_rcv_process_existing_ca: "
  "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n",
  cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) );
   }
@@ -547,7 +547,7 @@ __osm_ni_rcv_process_ca_port(
 if ( !osm_physp_is_valid( p_physp ) )
 {
 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
- "__osm_ni_rcv_process_ca_port: ERR 0D19: "
+ "__osm_ni_rcv_process_existing_ca: ERR 0D19: "
  "Invalid physical port. Aborting discovery\n");
 goto Exit;
 }
@@ -579,7 +579,7 @@ __osm_ni_rcv_process_ca_port(
   if( status != IB_SUCCESS )
   {
 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
- "__osm_ni_rcv_process_ca_port: ERR 0D13: "
+ "__osm_ni_rcv_process_existing_ca: ERR 0D13: "
  "Failure initiating PortInfo request (%s)\n",
  ib_get_err_str(status));
   }
@@ -592,22 +592,6 @@ __osm_ni_rcv_process_ca_port(
  The plock must be held before calling this function.
 **/
 static void
-__osm_ni_rcv_process_existing_ca(
-  IN const osm_ni_rcv_t* const p_rcv,
-  IN osm_node_t* const p_node,
-  IN const osm_madw_t* const p_madw )
-{
-  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
-
-  __osm_ni_rcv_process_ca_port( p_rcv, p_node, p_madw );
-
-  OSM_LOG_EXIT( p_rcv->p_log );
-}
-
-/**
- The plock must be held before calling this function.
-**/
-static void
 __osm_ni_rcv_process_new_router(
   IN const osm_ni_rcv_t* const p_rcv,
   IN osm_node_t* const p_node,






Re: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics

2006-09-19 Thread Hal Rosenstock

On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch fixes a bug in opensm that was discovered on
> a 'broken' fabrics when opensm was executed with --stay_on_fatal.
> Replacing assert with a real check.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <[EMAIL PROTECTED]>

Thanks. Applied with some cosmetic changes (to both trunk and 1.1).

Note that this patch was rejected (not sure why) and was manually
applied.

-- Hal



Re: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA config name

2006-09-19 Thread Or Gerlitz

Roland Dreier wrote:
> Or> I want it to be visible so if some other config **depends** on
> Or> it the user can **see** this config and select it.
> 
> Or> Also as of the importance of the rdma cm within the IB stack
> Or> being along with the ib verbs the second access point to ULP
> Or> coders, seeing its config and documenting it is important.
> 
> I don't buy this.  The only thing making this config option visible
> does is make it more likely (far more likely) that someone will
> disable it.  Right now the RDMA CM is built as long as INFINIBAND and
> INET are enabled.  No one is going to turn off INET on any normal
> system so effectively the RDMA CM is always built whenever INFINIBAND
> is enabled.

I am fine with having the CMA config selected whenever someone selects
INFINIBAND, so adding the help text and making it visible are not a must
for my taste. However, are you fine with changing the **name** of the
config directive to CONFIG_INFINIBAND_RDMA_CM so it's better understood?

> As far as making a config symbol to depend on, I think INET makes as
> much sense or more: something using IP addressing naturally depends on
> having IP networking.

As Erez wrote you on the other thread, we must depend on the CMA; otherwise
a user running make randconfig would be able to produce a config file where
INFINIBAND is selected but the CMA (RDMA_ADDR_TRANS) config is not
selected, so linking will fail.

Or.


[openib-general] [PATCH] IB/ipoib: use appropriate mtu selector for path queries

2006-09-19 Thread Michael S. Tsirkin

Roland, the patch is still under test (I'll leave it to run
for a night), but I'd like to get comments on the following:


IB/ipoib: use appropriate mtu selector for path queries

IPoIB must set mtu selector in path record query according to dev->mtu:
if we wildcard it, SM can select a path with lower MTU.
This breaks IPoIB on networks where the SM Tavor quirk is activated.

We can always require this, since IPoIB spec includes the following statement:
The value (for IB MTU) assigned to the broadcast-GID must not
be greater than any physical link MTU spanned by the IPoIB
subnet.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

---

Note the following uses IB_SA_GT so it should be applied on top of SA
enum rename.

Index: ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- ofed_1_1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -182,6 +182,8 @@ static int ipoib_change_mtu(struct net_d
 
dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
 
+   queue_work(ipoib_workqueue, &priv->flush_task);
+
return 0;
 }
 
@@ -452,15 +454,39 @@ static int path_rec_start(struct net_dev
  struct ipoib_path *path)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
+   ib_sa_comp_mask comp_mask = IB_SA_PATH_REC_MTU_SELECTOR | IB_SA_PATH_REC_MTU;
+
+   path->pathrec.mtu_selector = IB_SA_GT;
 
-   ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT "\n",
- IPOIB_GID_ARG(path->pathrec.dgid));
+   switch (roundup_pow_of_two(dev->mtu + IPOIB_ENCAP_LEN)) {
+   case 512:
+   path->pathrec.mtu = IB_MTU_256;
+   break;
+   case 1024:
+   path->pathrec.mtu = IB_MTU_512;
+   break;
+   case 2048:
+   path->pathrec.mtu = IB_MTU_1024;
+   break;
+   case 4096:
+   path->pathrec.mtu = IB_MTU_2048;
+   break;
+   default:
+   /* Wildcard everything */
+   comp_mask = 0;
+   path->pathrec.mtu = 0;
+   path->pathrec.mtu_selector = 0;
+   }
+
+   ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT " MTU > %d\n",
+ IPOIB_GID_ARG(path->pathrec.dgid),
+ comp_mask ? ib_mtu_enum_to_int(path->pathrec.mtu) : 0);
 
init_completion(&path->done);
 
path->query_id =
ib_sa_path_rec_get(priv->ca, priv->port,
-  &path->pathrec,
+  &path->pathrec, comp_mask|
   IB_SA_PATH_REC_DGID  |
   IB_SA_PATH_REC_SGID  |
   IB_SA_PATH_REC_NUMB_PATH |

-- 
MST
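
A minimal userspace sketch of the MTU mapping in the patch above, for readers who want to try values by hand. It assumes IPOIB_ENCAP_LEN is 4 and that enum ib_mtu uses the values shown; both are redefined locally so the sketch compiles outside the kernel. roundup_pow2() and ipoib_query_mtu() are hypothetical names, not kernel functions.

```c
/* Local stand-ins for kernel definitions (assumed values), redefined so
 * the sketch builds in userspace. */
enum ib_mtu {
	IB_MTU_256  = 1,
	IB_MTU_512  = 2,
	IB_MTU_1024 = 3,
	IB_MTU_2048 = 4,
	IB_MTU_4096 = 5
};
#define IPOIB_ENCAP_LEN 4

/* Smallest power of two >= n (for n > 0); stand-in for the kernel's
 * roundup_pow_of_two(). */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Returns the MTU value to place in the path record together with a
 * "greater than" selector, or 0 when the query should wildcard the MTU. */
static int ipoib_query_mtu(unsigned int dev_mtu)
{
	switch (roundup_pow2(dev_mtu + IPOIB_ENCAP_LEN)) {
	case 512:
		return IB_MTU_256;
	case 1024:
		return IB_MTU_512;
	case 2048:
		return IB_MTU_1024;
	case 4096:
		return IB_MTU_2048;
	default:
		return 0;	/* wildcard everything */
	}
}
```

With dev->mtu set to 2044, the payload plus encapsulation header is exactly 2048 bytes, so the sketch returns IB_MTU_1024 and the query would ask for paths with an MTU greater than 1024.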


Re: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics

2006-09-19 Thread Yevgeny Kliteynik

Hi Hal.

Please apply this patch both to trunk and 1.1.

Thanks.

--
Yevgeny

 > Hi Yevgeny,
 >
 > On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote:
 > > Hi Hal
 > >
 > > This patch fixes a bug in opensm that was discovered on
 > > a 'broken' fabric when opensm was executed with --stay_on_fatal.
 > > Replacing assert with a real check.
 > >
 > > Yevgeny
 > >
 > > Signed-off-by:  Yevgeny Kliteynik 
 >
 > Is this intended for trunk only or also 1.1 ?
 >
 > -- Hal



Re: [openib-general] [PATCH] IB/iser: fix iSER description and selections in Kconfig

2006-09-19 Thread Erez Zilber


Roland Dreier wrote:
> Erez> There are 3 additional required config entries: NET, INET &
> Erez> INFINIBAND_RDMA_CM. Do you suggest to 'depend' on them or
> Erez> 'depend' on some of them and 'select' the rest?
>
> INET depends on NET, and INFINIBAND_RDMA_CM doesn't exist.  So
> depending on INET is sufficient.  That's the reason 'depend' is better
> than 'select' -- you don't have to worry about recreating the full
> dependency tree of things you depend on.
>
> Erez> Also, since I'm not familiar enough with 'make randconfig',
> Erez> here's a question: if iSER 'depends' on INET, is it possible
> Erez> that 'make randconfig' will enable iSER without enabling
> Erez> INET?
>
> No, of course not.  The whole point of make randconfig is to make a
> random but valid configuration.
>
> Anyway, rather than waste more time going back and forth on this, I
> added the following to my for-2.6.19 tree as the obvious fix:
>
> Author: Roland Dreier <[EMAIL PROTECTED]>
> Date:   Sun Sep 17 22:58:27 2006 -0700
>
> IB/iser: INFINIBAND_ISER depends on INET
> 
> iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that.
> 
> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
>
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..365a1b5 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,6 +1,6 @@
>  config INFINIBAND_ISER
>   tristate "ISCSI RDMA Protocol"
> - depends on INFINIBAND && SCSI
> + depends on INFINIBAND && SCSI && INET
>   select SCSI_ISCSI_ATTRS
>   ---help---
> Support for the ISCSI RDMA Protocol over InfiniBand.  This
>   
I don't agree with that. With your patch, it is possible that
INFINIBAND_ADDR_TRANS won't be selected. How about this solution: iSER
should depend on INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which
depends on INET, so the INET dependency is still covered).

Erez



[openib-general] ipoib multicast problems on RHEL4.0 u4

2006-09-19 Thread Eli cohen

Hi,

While testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
succeeds in adding a multicast group to an interface, but the multicast
group is not actually added to the net_device. This means that an
application cannot join a multicast group as a full member. When I
examined the differences between the kernel sources for u3 and u4, I
noticed that essential code was removed:

diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
--- net/ipv4/arp.c  2006-09-18 15:35:03.0 +0300
+++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.0 +0300
@@ -213,9 +213,6 @@
case ARPHRD_IEEE802_TR:
ip_tr_mc_map(addr, haddr);
return 0;
-   case ARPHRD_INFINIBAND:
-   ip_ib_mc_map(addr, haddr);
-   return 0;
default:
if (dir) {
memcpy(haddr, dev->broadcast, dev->addr_len);


Can anyone suggest a workaround for this issue?

Thanks
Eli



Re: [openib-general] Posting requests on multiple QPs simultaneously

2006-09-19 Thread Dotan Barak

Mahesh Barve wrote:
> Hi,
>   Infiniband allows the creation of 16M QPs. 
>  Suppose a programmer wants to post separate requests on each of the 
> QPs simultaneously,
>  what would be the most efficient way of doing it?
> regards,
> -mahesh
>  
What is your question: should you use threads? Should you post one by
one or post a list?

In one post operation, you cannot post a WR to more than one QP.

Dotan


[openib-general] Posting requests on multiple QPs simultaneously

2006-09-19 Thread Mahesh Barve

Hi, 
  Infiniband allows the creation of 16M QPs. 
 Suppose a programmer wants to post separate requests on each of the QPs simultaneously,
 what would be the most efficient way of doing it? 
regards,
-mahesh 
 
 
 
  

Re: [openib-general] Fwd: IPoIB Multicast

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Subject: Re: Fwd: IPoIB Multicast
> 
> Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: Fwd: IPoIB Multicast
> 
> Here's a patch that tries to fix this.  I only tried it with the Cisco
> embedded SM, so someone should probably check that this doesn't break
> under OpenSM.
> 
> Look OK?
> 
>  - R.
> 
> 
> We've been testing the following which looks exactly equivalent.
> I'll look at the regression results in the morning and will let you know.

Works OK here. Please commit. Please note this does fix a real issue
for us, which is quite severe for clusters where ipoib is the only
interconnect. I wonder whether this is 2.6.18 material.

-- 
MST


[openib-general] [PATCH] mthca: fix lid used for sending traps

2006-09-19 Thread Michael S. Tsirkin

From: "Jack Morgenstein" <[EMAIL PROTECTED]>

SM lid was incorrectly set to port lid.  This is a regression from 2.6.17 -
after an event, no traps are sent to the SM LID - they go to the
loopback interface instead, and are typically dropped there.
It should be set to the sm_lid of the port info response.

Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]>
Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

---

Roland, this fixes a serious regression from 2.6.17.
The bug was introduced by commit 12bbb2b7be7f5564952ebe0196623e97464b8ac5:
IB/mthca: Add client reregister event generation
I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
2.6.18.1.

Index: ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c
===
--- ofed_1_1.orig/drivers/infiniband/hw/mthca/mthca_mad.c   2006-08-16 10:16:19.0 +0300
+++ ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c   2006-09-19 10:33:31.280328000 +0300
@@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device *
 
mthca_update_rate(to_mdev(ibdev), port_num);
update_sm_ah(to_mdev(ibdev), port_num,
-be16_to_cpu(pinfo->lid),
+be16_to_cpu(pinfo->sm_lid),
 pinfo->neighbormtu_mastersmsl & 0xf);
 
event.device   = ibdev;


-- 
MST
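
To make the difference between the two fields concrete, here is a small userspace sketch, assuming a trimmed-down PortInfo layout. struct port_info_sketch, struct sm_addr, and sm_trap_addr() are hypothetical and exist only for illustration; the real driver passes these values to update_sm_ah() as in the diff above.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons()/ntohs() for big-endian 16-bit fields */

/* Hypothetical, trimmed-down PortInfo holding only the fields the fix
 * touches. Multi-byte fields are big-endian, as on the wire. */
struct port_info_sketch {
	uint16_t lid;			/* this port's own LID */
	uint16_t sm_lid;		/* LID of the master SM */
	uint8_t  neighbormtu_mastersmsl;	/* low nibble: SL for the SM */
};

/* Hypothetical destination for traps. */
struct sm_addr {
	uint16_t dlid;
	uint8_t  sl;
};

/* Traps must be addressed to the SM, so take sm_lid rather than lid. */
static struct sm_addr sm_trap_addr(const struct port_info_sketch *pinfo)
{
	struct sm_addr a;

	a.dlid = ntohs(pinfo->sm_lid);
	a.sl   = pinfo->neighbormtu_mastersmsl & 0xf;
	return a;
}
```

Using lid here instead of sm_lid would address traps to the local port itself, which matches the loopback behaviour described in the patch.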


Re: [openib-general] Fwd: [PATCH] id_priv_list->list is not initialized sometimes

2006-09-19 Thread Krishna Kumar2

Hi Michael,

> Did you actually see these crashes?
> If yes, this might need to be fixed even for 2.6.18. Sean?

No, I have not seen this crash; this is based on reading the code.

thanks,

- KK

[EMAIL PROTECTED] wrote on 09/19/2006 12:55:09 PM:

> 
> - Forwarded message from Krishna Kumar <[EMAIL PROTECTED]> -
> 
> From: "Krishna Kumar" <[EMAIL PROTECTED]>
> Date: Tue, 19 Sep 2006 12:32:10 +0530
> Subject: [PATCH] id_priv_list->list is not initialized
>  sometimes
> 
> rdma_listen could be called from a context where id_priv->list
> is not initialized. Then at a later stage, a cma_cancel_listen
> does a list_del() which could oops since this element is not
> on any list. 
> 
> Eg, in rdma_listen(), if id->device is !NULL, it calls
> cma_ib_listen() which doesn't add this id to any list. A
> cma_cancel_listen() will do a list_del.
> 
> Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>
> 
> 
> diff -ruNp org/core/cma.c new/core/cma.c
> --- org/core/cma.c   2006-09-14 15:31:27.0 +0530
> +++ new/core/cma.c   2006-09-14 16:07:35.0 +0530
> @@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
> atomic_set(&id_priv->dev_remove, 0);
> INIT_LIST_HEAD(&id_priv->listen_list);
> INIT_LIST_HEAD(&id_priv->mc_list);
> +   INIT_LIST_HEAD(&id_priv->list);
> get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> 
> return &id_priv->id;
> 
> - End forwarded message -
> 

> 
> -- 
> MST
> 



[openib-general] Fwd: [PATCH] id_priv_list->list is not initialized sometimes

2006-09-19 Thread Michael S. Tsirkin

- Forwarded message from Krishna Kumar <[EMAIL PROTECTED]> -

From: "Krishna Kumar" <[EMAIL PROTECTED]>
Date: Tue, 19 Sep 2006 12:32:10 +0530
Subject: [PATCH] id_priv_list->list is not initialized
 sometimes

rdma_listen could be called from a context where id_priv->list
is not initialized. Then at a later stage, a cma_cancel_listen
does a list_del() which could oops since this element is not
on any list. 

Eg, in rdma_listen(), if id->device is !NULL, it calls
cma_ib_listen() which doesn't add this id to any list. A
cma_cancel_listen() will do a list_del.

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c  2006-09-14 15:31:27.0 +0530
+++ new/core/cma.c  2006-09-14 16:07:35.0 +0530
@@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
atomic_set(&id_priv->dev_remove, 0);
INIT_LIST_HEAD(&id_priv->listen_list);
INIT_LIST_HEAD(&id_priv->mc_list);
+   INIT_LIST_HEAD(&id_priv->list);
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);

return &id_priv->id;

- End forwarded message -

Did you actually see these crashes?
If yes, this might need to be fixed even for 2.6.18. Sean?

-- 
MST


Re: [openib-general] [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler

2006-09-19 Thread Michael S. Tsirkin

Quoting r. Krishna Kumar <[EMAIL PROTECTED]>:
> Subject: [PATCH] Fix freed mem deref race in 
> cma_process_remove/cma_req_handler
> 
> The race is as follows :
> 
> A process : cma_process_remove() calls cma_remove_id_dev(),
>   which sets id state to CMA_DEVICE_REMOVAL and
>   calls wait_event(dev_remove).
> 
> B process : cma_req_handler() had incremented dev_remove,
>   and calls cma_acquire_ib_dev() and on failure
>   calls cma_release_remove(), which does a
>   wake_up of cma_process_remove(). Then
>   cma_req_handler() calls rdma_destroy_id();
> 
> A Process : cma_remove_id_dev() gets woken and checks the
>   state of id, and since it is still (wrongly)
>   CMA_DEVICE_REMOVAL, it calls notify_user(id)
>   and if that fails, the caller - cma_process_remove()
>   calls rdma_destroy_id(id). Two processes can
>   call rdma_destroy_id(), resulting in one
>   de-referencing kfreed id_priv.
> 
> Fix is for process B to set CMA_DESTROYING in cma_req_handler()
> so that process A will return instead of doing a rdma_destroy_id().
> 
> Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>

Did you actually see these crashes?
If yes, this looks serious enough even for 2.6.18. Sean?
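
The interleaving in the quoted description can be replayed in a single-threaded model. This is only a sketch: every name below is hypothetical, none of it is the real cma.c code, and the model counts destroy calls rather than actually freeing memory.

```c
/* Hypothetical states, named after the ones in the description. */
enum cma_state_sketch {
	CMA_DEVICE_REMOVAL_S,
	CMA_DESTROYING_S
};

struct id_sketch {
	enum cma_state_sketch state;
	int destroy_calls;	/* more than one means a double destroy */
};

static void destroy_id(struct id_sketch *id)
{
	id->destroy_calls++;
}

/* Process B: failure path of the request handler. */
static void req_handler_fail(struct id_sketch *id, int apply_fix)
{
	if (apply_fix)
		id->state = CMA_DESTROYING_S;	/* claim the teardown first */
	/* ... the wake-up of process A would happen here ... */
	destroy_id(id);
}

/* Process A after wake-up: proceed only if the state is unchanged. */
static void remove_after_wakeup(struct id_sketch *id)
{
	if (id->state != CMA_DEVICE_REMOVAL_S)
		return;		/* someone else is destroying the id */
	destroy_id(id);		/* models notify failure followed by destroy */
}

/* Replays B then A and reports how many times the id was destroyed. */
static int replay(int apply_fix)
{
	struct id_sketch id = { CMA_DEVICE_REMOVAL_S, 0 };

	req_handler_fail(&id, apply_fix);
	remove_after_wakeup(&id);
	return id.destroy_calls;
}
```

Without the state change the model destroys the id twice; with it, process A sees the new state and returns, leaving a single destroy.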

-- 
MST


[openib-general] [PATCH] Typo in ib_set_client_data()

2006-09-19 Thread Krishna Kumar

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>


diff -ruNp org/core/device.c new/core/device.c
--- org/core/device.c   2006-09-14 15:38:14.0 +0530
+++ new/core/device.c   2006-09-14 15:38:29.0 +0530
@@ -385,7 +385,7 @@ void *ib_get_client_data(struct ib_devic
 EXPORT_SYMBOL(ib_get_client_data);
 
 /**
- * ib_set_client_data - Get IB client context
+ * ib_set_client_data - Set IB client context
  * @device:Device to set context for
  * @client:Client to set context for
  * @data:Context to set


[openib-general] [PATCH] fix cma_leave_mc_groups

2006-09-19 Thread Krishna Kumar

- mthca_multicast_detach - as an example, frees up a bit
  for re-use later so if it is not called during destroy_id,
  it *appears* that those bits (index) are leaked.

- cma_leave_mc_groups can race with other routines updating
  or reading the mclist, so take the lock. E.g., while doing a
  rdma_destroy_id(), other processes could be looking at
  this id and de-referencing mclist.

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>


diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c  2006-09-18 16:00:41.0 +0530
+++ new/core/cma.c  2006-09-18 16:12:58.0 +0530
@@ -761,14 +761,24 @@ static void cma_release_port(struct rdma
 static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 {
struct cma_multicast *mc;
+   unsigned long flags;
 
+   spin_lock_irqsave(&id_priv->lock, flags);
while (!list_empty(&id_priv->mc_list)) {
mc = container_of(id_priv->mc_list.next,
  struct cma_multicast, list);
list_del(&mc->list);
+   spin_unlock_irqrestore(&id_priv->lock, flags);
+   if (id_priv->id.qp) {
+   ib_detach_mcast(id_priv->id.qp,
+   &mc->multicast.ib->rec.mgid,
+   mc->multicast.ib->rec.mlid);
+   }
ib_free_multicast(mc->multicast.ib);
kfree(mc);
+   spin_lock_irqsave(&id_priv->lock, flags);
}
+   spin_unlock_irqrestore(&id_priv->lock, flags);
 }
 
 void rdma_destroy_id(struct rdma_cm_id *id)
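
The locking pattern in the patch above (unlink an entry under the lock, drop the lock for the slow teardown, retake it before looking at the list again) can be modelled in userspace. This is a single-threaded sketch under stated assumptions: the "lock" is just a flag so the code can check that teardown never runs while it is held, and struct item, struct owner, release_item(), and drain() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	struct item *next;
};

struct owner {
	int locked;		/* stand-in for the spinlock */
	struct item *head;	/* stand-in for mc_list */
	int released;
};

static void lock(struct owner *o)
{
	assert(!o->locked);
	o->locked = 1;
}

static void unlock(struct owner *o)
{
	assert(o->locked);
	o->locked = 0;
}

/* Stands in for the detach/free calls, which must not run under the lock. */
static void release_item(struct owner *o, struct item *it)
{
	assert(!o->locked);
	free(it);
	o->released++;
}

static void drain(struct owner *o)
{
	lock(o);
	while (o->head) {
		struct item *it = o->head;	/* unlink under the lock */

		o->head = it->next;
		unlock(o);
		release_item(o, it);		/* slow work, lock dropped */
		lock(o);
	}
	unlock(o);
}
```

Because each entry is already unlinked when the lock is dropped, no other path in this model can reach it while it is being freed.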


[openib-general] [PATCH] ucma : Encapsulate duplicate code to common routine

2006-09-19 Thread Krishna Kumar

Encapsulate duplicate code to common routine - avoid checking same
errors in multiple places.

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>


diff -ruNp org/core/ucma.c new/core/ucma.c
--- org/core/ucma.c 2006-09-18 17:38:12.0 +0530
+++ new/core/ucma.c 2006-09-18 17:39:34.0 +0530
@@ -87,20 +87,30 @@ struct ucma_event {
 static DEFINE_MUTEX(ctx_mutex);
 static DEFINE_IDR(ctx_idr);
 
-static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+/* _ucma_find_context : internal find routine. Assumes ctx_mutex is held */
+static inline struct ucma_context* _ucma_find_context(struct ucma_file *file, int id)
 {
struct ucma_context *ctx;
 
-   mutex_lock(&ctx_mutex);
+   BUG_ON(!mutex_is_locked(&ctx_mutex));
+
ctx = idr_find(&ctx_idr, id);
if (!ctx)
ctx = ERR_PTR(-ENOENT);
else if (ctx->file != file)
ctx = ERR_PTR(-EINVAL);
-   else
+   return ctx;
+}
+
+static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+{
+   struct ucma_context *ctx;
+
+   mutex_lock(&ctx_mutex);
+   ctx = _ucma_find_context(file, id);
+   if (!IS_ERR(ctx))
atomic_inc(&ctx->ref);
mutex_unlock(&ctx_mutex);
-
return ctx;
 }
 
@@ -354,12 +364,8 @@ static ssize_t ucma_destroy_id(struct uc
return -EFAULT;
 
mutex_lock(&ctx_mutex);
-   ctx = idr_find(&ctx_idr, cmd.id);
-   if (!ctx)
-   ctx = ERR_PTR(-ENOENT);
-   else if (ctx->file != file)
-   ctx = ERR_PTR(-EINVAL);
-   else
+   ctx = _ucma_find_context(file, cmd.id);
+   if (!IS_ERR(ctx))
+   if (!IS_ERR(ctx))
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&ctx_mutex);
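
For readers unfamiliar with the ERR_PTR convention the shared lookup relies on, here is a userspace sketch. ERR_PTR(), PTR_ERR(), and IS_ERR() are reimplemented locally as stand-ins for the kernel helpers, and struct file_sketch, struct ctx_sketch, the fixed-size table, and find_ctx() are hypothetical simplifications of the ucma types and the IDR.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel helpers: a negative errno is encoded as a
 * pointer value in the last 4095 addresses of the address space. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static long PTR_ERR(const void *p)
{
	return (long)(intptr_t)p;
}

static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical miniature of the ucma types and ID table. */
struct file_sketch {
	int unused;
};

struct ctx_sketch {
	int id;
	struct file_sketch *file;
	int ref;
};

#define TABLE_SIZE 8
static struct ctx_sketch *table[TABLE_SIZE];

/* Shared lookup; the caller is assumed to hold the table lock. */
static struct ctx_sketch *find_ctx(struct file_sketch *file, int id)
{
	struct ctx_sketch *ctx;

	if (id < 0 || id >= TABLE_SIZE || !table[id])
		return ERR_PTR(-ENOENT);
	ctx = table[id];
	if (ctx->file != file)
		return ERR_PTR(-EINVAL);
	return ctx;
}
```

Each caller then needs only one IS_ERR() check before taking its own action (bumping a reference or removing the entry), which is the duplication the patch removes.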
 


[openib-general] [PATCH] id_priv_list->list is not initialized sometimes

2006-09-19 Thread Krishna Kumar

rdma_listen could be called from a context where id_priv->list
is not initialized. Then at a later stage, a cma_cancel_listen
does a list_del() which could oops since this element is not
on any list. 

Eg, in rdma_listen(), if id->device is !NULL, it calls
cma_ib_listen() which doesn't add this id to any list. A
cma_cancel_listen() will do a list_del.

Signed-off-by: Krishna Kumar <[EMAIL PROTECTED]>


diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c  2006-09-14 15:31:27.0 +0530
+++ new/core/cma.c  2006-09-14 16:07:35.0 +0530
@@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
atomic_set(&id_priv->dev_remove, 0);
INIT_LIST_HEAD(&id_priv->listen_list);
INIT_LIST_HEAD(&id_priv->mc_list);
+   INIT_LIST_HEAD(&id_priv->list);
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
 
return &id_priv->id;
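
Why the one-line initialisation matters can be seen in a userspace model of the kernel's circular list. This is a simplified sketch: the functions below reuse the kernel names but are local reimplementations, and the real list_del() additionally poisons the removed entry's pointers.

```c
/* Userspace model of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialised head points at itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlinks entry by writing through both neighbours. If entry was never
 * initialised, next and prev are garbage and these stores can fault. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```

Calling list_del() on an entry that was initialised but never added only rewrites the entry's own pointers, so in this model the cancel path becomes harmless even when the id was never put on a list.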

