Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Sayantan> I am getting a segmentation fault after a couple of
Sayantan> thousand messages are sent over SRQ (using ping-pong
Sayantan> latency test). Here is a snippet from the core
Sayantan> generated.

Is it possible that you are posting one more receive to the SRQ than
the max capacity you requested when creating the SRQ?

What happens with the patch below applied to libmthca?

Thanks,
  Roland


--- libmthca/src/srq.c  (revision 3664)
+++ libmthca/src/srq.c  (working copy)
@@ -110,6 +110,13 @@ int mthca_tavor_post_srq_recv(struct ibv
 
 		wqe       = get_wqe(srq, ind);
 		next_ind  = *wqe_to_link(wqe);
+
+		if (next_ind < 0) {
+			err = -1;
+			*bad_wr = wr;
+			break;
+		}
+
 		prev_wqe  = srq->last;
 		srq->last = wqe;
 
@@ -197,6 +204,12 @@ int mthca_arbel_post_srq_recv(struct ibv
 		wqe       = get_wqe(srq, ind);
 		next_ind  = *wqe_to_link(wqe);
 
+		if (next_ind < 0) {
+			err = -1;
+			*bad_wr = wr;
+			break;
+		}
+
 		((struct mthca_next_seg *) wqe)->nda_op =
 			htonl((next_ind << srq->wqe_shift) | 1);
 		((struct mthca_next_seg *) wqe)->ee_nds = 0;
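
To make the guard in this patch concrete, here is a minimal, self-contained sketch of a receive queue backed by a linked free list. It is not the libmthca code: `toy_srq`, `toy_srq_init`, `toy_srq_post`, `toy_srq_free` and the slot count are hypothetical names chosen for illustration, and the sketch assumes that a link value of -1 marks the end of the free list, as the `next_ind < 0` test suggests. Under that assumption, a post fails cleanly once only the terminating slot is left, instead of chaining to an invalid index.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SRQ_SLOTS 4   /* hypothetical size: usable capacity plus one */

struct toy_srq {
	int      link[TOY_SRQ_SLOTS];  /* index of the next free slot, or -1 */
	uint64_t wrid[TOY_SRQ_SLOTS];  /* work request ID stored per slot */
	int      first_free;           /* head of the free list */
	int      last_free;            /* tail of the free list */
};

static void toy_srq_init(struct toy_srq *srq)
{
	for (int i = 0; i < TOY_SRQ_SLOTS; i++)
		srq->link[i] = (i + 1 < TOY_SRQ_SLOTS) ? i + 1 : -1;
	srq->first_free = 0;
	srq->last_free  = TOY_SRQ_SLOTS - 1;
}

/* Returns 0 on success, -1 if the free list is exhausted. */
static int toy_srq_post(struct toy_srq *srq, uint64_t wr_id)
{
	int ind      = srq->first_free;
	int next_ind = srq->link[ind];

	/* Same shape as the guard in the patch: nothing left to chain to. */
	if (next_ind < 0)
		return -1;

	srq->wrid[ind]  = wr_id;
	srq->first_free = next_ind;
	return 0;
}

/* Hand a completed slot back by appending it to the tail of the free list. */
static void toy_srq_free(struct toy_srq *srq, int ind)
{
	assert(ind >= 0 && ind < TOY_SRQ_SLOTS);
	srq->link[srq->last_free] = ind;
	srq->link[ind]            = -1;
	srq->last_free            = ind;
}
```

With four slots, three posts succeed and the fourth returns -1 until a completed slot is handed back with `toy_srq_free()`.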
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general




Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Roland,

* On Oct,7 Roland Dreier<[EMAIL PROTECTED]> wrote :
> OK, I just checked in an initial implementation of both setting the
> SRQ limit with the modify SRQ verb, and also getting SRQ limit reached
> events when they occur.  You will need to update your kernel drivers,
> libibverbs and libmthca to get this.
> 
> I've done zero testing, so please let me know how it works.  You
> should at least get an interesting new failure.

I am getting a segmentation fault after a couple of thousand messages
are sent over SRQ (using ping-pong latency test). Here is a snippet from
the core generated.

Let me know what you think about this.

Thanks,
Sayantan.

=

#0  0x2b238faa in mthca_poll_cq (ibcq=0xd4b920, ne=1, wc=0x7f957f90) at cq.c:336
336 wc->wr_id = srq->wrid[wqe_index];
(gdb) bt
#0  0x2b238faa in mthca_poll_cq (ibcq=0xd4b920, ne=1, wc=0x7f957f90) at cq.c:336
#1  0x004151f5 in MPID_DeviceCheck (blocking=MPID_BLOCKING) at verbs.h:746
#2  0x0042101c in MPID_RecvComplete (request=0x7f958030, status=0x7f958230, error_code=0x7f958184)
    at mpid_recv.c:90
#3  0x0041791c in MPID_RecvDatatype (comm_ptr=0xf5e9d0, buf=0x536280, count=2, dtype_ptr=0xd36f60, src_lrank=0,
    tag=1, context_id=0, status=0x7f958230, error_code=0x7f958184) at mpid_hrecv.c:89
#4  0x00402586 in PMPI_Recv (buf=0x536280, count=2, datatype=, source=0, tag=1,
    comm=, status=0x7f958230) at recv.c:87
#5  0x004020a9 in main ()
(gdb) f 0
#0  0x2b238faa in mthca_poll_cq (ibcq=0xd4b920, ne=1, wc=0x7f957f90) at cq.c:336
336 wc->wr_id = srq->wrid[wqe_index];
(gdb) list
331 } else if ((*cur_qp)->ibv_qp.srq) {
332 srq = to_msrq((*cur_qp)->ibv_qp.srq);
333 wqe = htonl(cqe->wqe);
334 wq = NULL;
335 wqe_index = wqe >> srq->wqe_shift;
336 wc->wr_id = srq->wrid[wqe_index];
337 mthca_free_srq_wqe(srq, wqe);
338 } else {
339 wq = &(*cur_qp)->rq;
340 wqe_index = ntohl(cqe->wqe) >> wq->wqe_shift;
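
The faulting statement indexes `srq->wrid` with a value derived from the completion entry. As a rough illustration of how such a lookup could be made defensive, the sketch below range-checks the index before using it. `toy_srq_ids` and `toy_lookup_wrid` are hypothetical, the offset is assumed to be in host byte order already, and the trace alone does not establish that an out-of-range index is the actual cause of this crash.

```c
#include <assert.h>
#include <stdint.h>

struct toy_srq_ids {
	const uint64_t *wrid;   /* one work request ID per WQE slot */
	uint32_t        max;    /* number of slots in wrid[] */
	int             wqe_shift;  /* log2 of the WQE size in bytes */
};

/*
 * Translate a WQE byte offset into a slot index and fetch its ID.
 * Returns 0 on success, -1 if the index falls outside wrid[].
 */
static int toy_lookup_wrid(const struct toy_srq_ids *srq,
			   uint32_t wqe_offset, uint64_t *wr_id)
{
	uint32_t wqe_index;

	assert(srq && srq->wrid && wr_id);

	wqe_index = wqe_offset >> srq->wqe_shift;
	if (wqe_index >= srq->max)
		return -1;

	*wr_id = srq->wrid[wqe_index];
	return 0;
}
```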



> 
>  - R.

-- 
http://www.cse.ohio-state.edu/~surs


Re: [openib-general] [PATCH] Fix leak on MAD initialization failure

2005-10-05 Thread Roland Dreier
Sean> The patch looks fine.  Did you want to commit this, or have
Sean> myself or Hal do it?

I'll do it in a little while unless you beat me to it.

 - R.


[openib-general] Re: [PATCH] Fix leak on MAD initialization failure

2005-10-05 Thread Shirley Ma

Yes. I found the problem too.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

RE: [openib-general] [PATCH] Fix leak on MAD initialization failure

2005-10-05 Thread Sean Hefty
>It seems that there is a bug in ib_mad_init_device(): if
>ib_agent_port_open() fails for a given port, then the current code
>doesn't call ib_mad_port_close() for that port.  I think something
>like the patch below is needed.

The patch looks fine.  Did you want to commit this, or have myself or Hal do it?

- Sean



Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> I don't think we handle "half-usable" devices here. We
Shirley> treat each port as an individual "device" in many layers,
Shirley> ports to ports are independent. For each HCA which could
Shirley> be as many as 256 ports, I think it makes more sense to
Shirley> handle per port, not per HCA device based.

The problem with this view is that the HCA is really the fundamental
object in the model described in the IB spec.  Most transport
resources are attached to an HCA, not a port.  In fact, with APM, a QP
might be attached to two different ports at the same time.

Shirley> Second, The IB SW stack shouldn't prevent any
Shirley> implementation from handling later ports becoming
Shirley> usable. The SW implementation should support all kinds of
Shirley> HCA implementations. Doesn't matter if it is IBM HCAs or
Shirley> HCAs from other vendors in the future.

I definitely don't want to block support for IBM HCAs.  However, at
the same time I don't want to make the IB stack more complex, more
error-prone, etc. just to work around what I would argue is a bug in
your firmware.

Shirley> Third, the ib_cache & ib_ipoib implementations actually
Shirley> allow "half-usable" devices. They allow other ports to
Shirley> initialize while one port has errors.

It seems cache.c actually bails out if it fails to allocate space for
one HCA port.  IPoIB does indeed proceed even if one port fails, but
that's more because there's no real reason to bail out halfway rather
than wanting to support half-usable devices.

I don't object much to making layers that really are per-port work
that way.  What worries me is trying to fix everything to work sanely
with individual ports becoming usable or unusable after an HCA has
been attached to the system.

I guess we'll have to wait and see how convincing your patches are.

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Matt>Is this due to memfree vs. memfull hardware or firmware
Matt> difference?  If you flash the memfull HCA with the memfree
Matt> firmware (which I was told you can do) will the HCA generate
Matt> an SRQ limit reached event?

I believe it's a firmware difference.  There are basically three
Mellanox HCA chips:

MT23108 - PCI-X - memfull only (FW 3.x.y)
MT25208 - 2 port PCI Express - memfull (FW 4.x.y) or memfree (FW 5.x.y)
   memfree FW will work even if HCA board
   has memory on it.  Obviously memfree FW
   is required if the HCA board has no memory.
MT25204 - 1 port PCI Express - memfree only (FW 1.x.y)

Any HCA that works with memfree FW (ie any PCI Express HCA) should be
able to generate SRQ limit events.  In the current FW release, memfull
HCAs do not generate SRQ limit events.

 - R.


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Shirley Ma

> I don't agree that we want to handle "half-usable" devices where some
> ports don't work.  The only use for this seems to be working around
> some problems with the current Galaxy HCA implementation, and there
> must be a better way to handle this.
>
> You're welcome to prove me wrong, but I think that handling ports that
> are not usable and then become usable later is just going to be
> horrible.  And if we do that, then I think it would make sense to
> handle ports starting out usable and then becoming unusable later --
> and I think that's going to be even worse still.

I don't think we handle "half-usable" devices
here. We treat each port as an individual "device" in many layers,
ports to ports are independent. For each HCA which could be as many as
256 ports, I think it makes more sense to handle per port, not per HCA
device based.

Second, The IB SW stack shouldn't prevent any implementation
from handling later ports becoming usable. The SW implementation should
support all kinds of HCA implementations. Doesn't matter if it is IBM HCAs
or HCAs from other vendors in the future. 

Third, the ib_cache & ib_ipoib implementations actually
allow "half-usable" devices. They allow other ports to initialize
while one port has errors.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Matt L. Leininger
On Wed, 2005-10-05 at 15:09 -0400, Sayantan Sur wrote:

> > This is because the modify SRQ operation is not implemented at all in
> > libmthca.  Do you just want to set the SRQ limit?  That's not so hard
> > for me to implement.  However, you should be aware that as far as I
> > know, only mem-free HCAs generate the SRQ limit reached event.
> 
> Thanks for your reply. Yes, I want to set a SRQ limit. Yes, I am aware
> that only mem-free HCAs generate SRQ limit reached event. I am trying
> this on a Mem-free HCA.

   Is this due to memfree vs. memfull hardware or firmware difference?
If you flash the memfull HCA with the memfree firmware (which I was told
you can do) will the HCA generate an SRQ limit reached event?

 
 Thanks,

- Matt




Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Roland,

* On Oct,5 Roland Dreier<[EMAIL PROTECTED]> wrote :
> OK, I just checked in an initial implementation of both setting the
> SRQ limit with the modify SRQ verb, and also getting SRQ limit reached
> events when they occur.  You will need to update your kernel drivers,
> libibverbs and libmthca to get this.

Thanks a lot for checking this in so quickly! I got the changes and
updated our systems.

> 
> I've done zero testing, so please let me know how it works.  You
> should at least get an interesting new failure.

With your changes the `ibv_modify_srq' works. I will have the "message
passing" part done sometime soon. If I see any failure, I'll report it
to this reflector.

Thanks,
Sayantan.

> 
>  - R.

-- 
http://www.cse.ohio-state.edu/~surs


[openib-general] [PATCH] Fix leak on MAD initialization failure

2005-10-05 Thread Roland Dreier
It seems that there is a bug in ib_mad_init_device(): if
ib_agent_port_open() fails for a given port, then the current code
doesn't call ib_mad_port_close() for that port.  I think something
like the patch below is needed.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

--- infiniband/core/mad.c   (revision 3664)
+++ infiniband/core/mad.c   (working copy)
@@ -2683,40 +2683,47 @@ static int ib_mad_port_close(struct ib_d
 
 static void ib_mad_init_device(struct ib_device *device)
 {
-	int num_ports, cur_port, i;
+	int start, end, i;
 
 	if (device->node_type == IB_NODE_SWITCH) {
-		num_ports = 1;
-		cur_port = 0;
+		start = 0;
+		end   = 0;
 	} else {
-		num_ports = device->phys_port_cnt;
-		cur_port = 1;
+		start = 1;
+		end   = device->phys_port_cnt;
 	}
-	for (i = 0; i < num_ports; i++, cur_port++) {
-		if (ib_mad_port_open(device, cur_port)) {
+
+	for (i = start; i <= end; i++) {
+		if (ib_mad_port_open(device, i)) {
 			printk(KERN_ERR PFX "Couldn't open %s port %d\n",
-			       device->name, cur_port);
-			goto error_device_open;
+			       device->name, i);
+			goto error;
 		}
-		if (ib_agent_port_open(device, cur_port)) {
+		if (ib_agent_port_open(device, i)) {
 			printk(KERN_ERR PFX "Couldn't open %s port %d "
 			       "for agents\n",
-			       device->name, cur_port);
-			goto error_device_open;
+			       device->name, i);
+			goto error_agent;
 		}
 	}
 	return;
 
-error_device_open:
-	while (i > 0) {
-		cur_port--;
-		if (ib_agent_port_close(device, cur_port))
+error_agent:
+	if (ib_mad_port_close(device, i))
+		printk(KERN_ERR PFX "Couldn't close %s port %d\n",
+		       device->name, i);
+
+error:
+	i--;
+
+	while (i >= start) {
+		if (ib_agent_port_close(device, i))
 			printk(KERN_ERR PFX "Couldn't close %s port %d "
 			       "for agents\n",
-			       device->name, cur_port);
-		if (ib_mad_port_close(device, cur_port))
+			       device->name, i);
+		if (ib_mad_port_close(device, i))
 			printk(KERN_ERR PFX "Couldn't close %s port %d\n",
-			       device->name, cur_port);
+			       device->name, i);
 		i--;
 	}
 }
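
The two-label unwinding in this patch can be exercised outside the kernel with stubs. The sketch below is only an illustration under stated assumptions: the `toy_*` functions, the `fail_*_at` knobs and the port count are hypothetical stand-ins for the real open/close calls, and unlike `ib_mad_init_device()` the function returns a status so that the result can be checked. The point it tries to show is that a port whose agent open failed still needs its MAD half closed before the fully opened ports are unwound.

```c
#include <assert.h>

#define TOY_NUM_PORTS 4   /* hypothetical port count */

static int toy_mad_is_open[TOY_NUM_PORTS + 1];    /* indexed by port number */
static int toy_agent_is_open[TOY_NUM_PORTS + 1];
static int fail_mad_at   = -1;   /* port whose MAD open should fail, or -1 */
static int fail_agent_at = -1;   /* port whose agent open should fail, or -1 */

static int toy_mad_port_open(int port)
{
	if (port == fail_mad_at)
		return -1;
	toy_mad_is_open[port] = 1;
	return 0;
}

static int toy_agent_port_open(int port)
{
	if (port == fail_agent_at)
		return -1;
	toy_agent_is_open[port] = 1;
	return 0;
}

static void toy_mad_port_close(int port)   { toy_mad_is_open[port] = 0; }
static void toy_agent_port_close(int port) { toy_agent_is_open[port] = 0; }

/* Returns 0 on success; on failure, everything opened so far is closed. */
static int toy_init_device(int start, int end)
{
	int i;

	assert(start >= 0 && end <= TOY_NUM_PORTS);

	for (i = start; i <= end; i++) {
		if (toy_mad_port_open(i))
			goto error;
		if (toy_agent_port_open(i))
			goto error_agent;
	}
	return 0;

error_agent:
	/* Port i has its MAD half open but no agent: close just that half. */
	toy_mad_port_close(i);
error:
	/* Ports start..i-1 were fully opened: close both halves of each. */
	for (i--; i >= start; i--) {
		toy_agent_port_close(i);
		toy_mad_port_close(i);
	}
	return -1;
}

/* Number of stub resources still open, used to check for leaks. */
static int toy_open_count(void)
{
	int n = 0;

	for (int p = 0; p <= TOY_NUM_PORTS; p++)
		n += toy_mad_is_open[p] + toy_agent_is_open[p];
	return n;
}
```

Setting `fail_agent_at` to a middle port and confirming that `toy_open_count()` is zero afterwards reproduces the leak check this patch is about.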


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> It's necessary to modify ib_mad, ib_sa, and ib_cm to act
Shirley> like ib_ipoib and ib_cache and continue initializing when
Shirley> one port encounters errors, instead of releasing all
Shirley> resources. If you agree, I will create this as the first
Shirley> patch for review. How to handle the errors would be the
Shirley> second patch.

I don't agree that we want to handle "half-usable" devices where some
ports don't work.  The only use for this seems to be working around
some problems with the current Galaxy HCA implementation, and there
must be a better way to handle this.

You're welcome to prove me wrong, but I think that handling ports that
are not usable and then become usable later is just going to be
horrible.  And if we do that, then I think it would make sense to
handle ports starting out usable and then becoming unusable later --
and I think that's going to be even worse still.

I do agree that we want to handle errors in initialization better.
The ib_mad and ib_cm code actually looks OK to me (with a small bug in
ib_mad for which I'll post a patch shortly).  I think something like
the patch below is all that's needed to fix ib_sa:

--- infiniband/core/sa_query.c  (revision 3664)
+++ infiniband/core/sa_query.c  (working copy)
@@ -583,10 +583,16 @@ int ib_sa_path_rec_get(struct ib_device 
 {
 	struct ib_sa_path_query *query;
 	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-	struct ib_mad_agent *agent  = port->agent;
+	struct ib_sa_port   *port;
+	struct ib_mad_agent *agent;
 	int ret;
 
+	if (!sa_dev)
+		return -ENODEV;
+
+	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	agent = port->agent;
+
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;
@@ -685,10 +691,16 @@ int ib_sa_service_rec_query(struct ib_de
 {
 	struct ib_sa_service_query *query;
 	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-	struct ib_mad_agent *agent  = port->agent;
+	struct ib_sa_port   *port;
+	struct ib_mad_agent *agent;
 	int ret;
 
+	if (!sa_dev)
+		return -ENODEV;
+
+	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	agent = port->agent;
+
 	if (method != IB_MGMT_METHOD_GET &&
 	    method != IB_MGMT_METHOD_SET &&
 	    method != IB_SA_METHOD_DELETE)
@@ -768,10 +780,16 @@ int ib_sa_mcmember_rec_query(struct ib_d
 {
 	struct ib_sa_mcmember_query *query;
 	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port   = &sa_dev->port[port_num - sa_dev->start_port];
-	struct ib_mad_agent *agent  = port->agent;
+	struct ib_sa_port   *port;
+	struct ib_mad_agent *agent;
 	int ret;
 
+	if (!sa_dev)
+		return -ENODEV;
+
+	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	agent = port->agent;
+
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;
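
The pattern repeated in all three hunks is to test the per-client data pointer before deriving anything from it. A minimal stand-alone sketch of that ordering follows. The `toy_sa_*` types and `toy_sa_query` are hypothetical, `agent_id` stands in for the MAD agent pointer, and the port range check is an extra assumption of the sketch that is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NUM_PORTS 2   /* hypothetical: ports 1 and 2 */

struct toy_sa_port {
	int agent_id;     /* stands in for the MAD agent pointer */
};

struct toy_sa_device {
	int start_port;
	int end_port;
	struct toy_sa_port port[TOY_NUM_PORTS];
};

/*
 * sa_dev may be NULL when the SA client never attached to the device,
 * so it is checked before the port pointer is computed from it.
 */
static int toy_sa_query(const struct toy_sa_device *sa_dev, int port_num,
			int *agent_id)
{
	const struct toy_sa_port *port;

	assert(agent_id != NULL);

	if (!sa_dev)
		return -ENODEV;

	/* Extra range check, not part of the patch above. */
	if (port_num < sa_dev->start_port || port_num > sa_dev->end_port)
		return -EINVAL;

	port      = &sa_dev->port[port_num - sa_dev->start_port];
	*agent_id = port->agent_id;
	return 0;
}
```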


Re: [openib-general] ib_cm_listen failure

2005-10-05 Thread James Lentini


On Wed, 5 Oct 2005, Todd Bowman wrote:

> Here is a patch for dtest.c to remove the qualifier from the sdp range.
> 
> Index: userspace/dapl/test/dtest/dtest.c
> ===
> --- userspace/dapl/test/dtest/dtest.c (revision 3547)
> +++ userspace/dapl/test/dtest/dtest.c (working copy)
> @@ -53,7 +53,7 @@
> #include "dat/udat.h"
> 
> /* definitions */
> -#define SERVER_CONN_QUAL 71123
> +#define SERVER_CONN_QUAL 45248
> #define DTO_TIMEOUT (1000*1000*5)
> #define DTO_FLUSH_TIMEOUT (1000*1000*2)
> #define CONN_TIMEOUT (1000*1000*10)

Thanks Todd. I don't mean to nitpick, but do you mind adding a
Signed-off-by line to it?


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Shirley Ma

    Fab> Shouldn't a user get an error (not an oops) if they try to
    Fab> use the MAD layer for a device that didn't initialize
    Fab> properly within the MAD layer?  Doesn't the MAD layer trap
    Fab> that device requests are valid?  It seems that adding such
    Fab> checks would be much simpler to implement, rather than trying
    Fab> to figure out how to express these limitations to the various
    Fab> ULPs.

> Yeah, I guess that makes sense, although it exercises the upper
> layers' error paths more.  All of the modules that export interfaces
> used by other layers have to be prepared for a device that they failed
> to initialize, and the upper layers have to be prepared for lower
> layers to fail.

These two approaches both need to go through each layer. The difference
is that one prevents the error from happening earlier, while the other
detects the error later, which would be a better solution if the error
could happen later.

It's necessary to modify ib_mad, ib_sa, and ib_cm to act like ib_ipoib
and ib_cache and continue initializing when one port encounters errors,
instead of releasing all resources. If you agree, I will create this as
the first patch for review. How to handle the errors would be the second
patch.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
OK, I just checked in an initial implementation of both setting the
SRQ limit with the modify SRQ verb, and also getting SRQ limit reached
events when they occur.  You will need to update your kernel drivers,
libibverbs and libmthca to get this.

I've done zero testing, so please let me know how it works.  You
should at least get an interesting new failure.

 - R.
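
For readers unfamiliar with the SRQ limit, the toy model below sketches the behaviour as it is commonly described: an event fires when the number of posted receives drops below the armed limit, and the limit then disarms until it is set again. Both details are assumptions of this sketch and are not stated in this thread. Nothing here calls libibverbs; `toy_srq_limit` and its helpers are hypothetical names, and `toy_arm_limit` only stands in for the step that would be done with the modify SRQ verb.

```c
#include <assert.h>

struct toy_srq_limit {
	unsigned posted;   /* receive WRs currently in the SRQ */
	unsigned limit;    /* armed watermark; 0 means disarmed */
	unsigned events;   /* "limit reached" events raised so far */
};

/* Stand-in for arming the limit through the modify SRQ verb. */
static void toy_arm_limit(struct toy_srq_limit *s, unsigned limit)
{
	s->limit = limit;
}

static void toy_post_recv(struct toy_srq_limit *s)
{
	s->posted++;
}

/* Consume one receive WR, as an incoming message would. */
static void toy_consume(struct toy_srq_limit *s)
{
	assert(s->posted > 0);
	s->posted--;

	if (s->limit && s->posted < s->limit) {
		s->events++;
		s->limit = 0;   /* assumed one-shot: must be armed again */
	}
}
```

In this model a consumer would repost receives and arm the limit again each time an event is counted.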


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Fab> Proper error handling should resolve both the ifconfig hang
Fab> and multicast join oops.

To be honest, I'm not familiar with the ifconfig hang, but I don't
think the multicast join oops is caused by lack of error handling.
It's some small race somewhere.

 - R.


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Fab> Shouldn't a user get an error (not an oops) if they try to
Fab> use the MAD layer for a device that didn't initialize
Fab> properly within the MAD layer?  Doesn't the MAD layer trap
Fab> that device requests are valid?  It seems that adding such
Fab> checks would be much simpler to implement, rather than trying
Fab> to figure out how to express these limitations to the various
Fab> ULPs.

Yeah, I guess that makes sense, although it exercises the upper
layers' error paths more.  All of the modules that export interfaces
used by other layers have to be prepared for a device that they failed
to initialize, and the upper layers have to be prepared for lower
layers to fail.

 - R.


Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Sayantan> If you could implement this feature, that would be
Sayantan> really great!

OK, there's not much left to do.  I should have something to check in
today.  I'll let you know when it's ready.

 - R.


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Sean Hefty

Roland Dreier wrote:

> Yes, I agree we should fix the bugs in error handling during
> registration.  However, I don't think that a mask of ports is the
> right answer -- it doesn't seem to address the real issue.  We should
> just make sure that if, say, the MAD layer fails to initialize a
> device, then all clients that depend on the MAD layer don't try to use
> that device.  I'm not sure what the right way to express these
> dependencies is, however.


One possibility is to have each layer verify the device/port parameters.  The
MAD layer can verify that the specified device/port are valid in
ib_register_mad_agent().  Similarly for the other modules.


We also have the port capability mask available that could be used.

- Sean
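
As a sketch of what per-layer verification could look like, the registration routine below refuses a port that is out of range or that the layer never initialized. This is not the real `ib_register_mad_agent()` interface: `toy_device`, `toy_agent`, `toy_register_mad_agent` and the integer return convention are hypothetical and chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_PORTS 2   /* hypothetical upper bound on ports per device */

struct toy_mad_port {
	int initialized;  /* set once this layer opened the port */
};

struct toy_device {
	int phys_port_cnt;
	struct toy_mad_port mad_port[TOY_MAX_PORTS + 1];  /* 1-based port index */
};

struct toy_agent {
	const struct toy_device *dev;
	int port_num;
};

/* Returns 0 on success or a negative errno value if the request is invalid. */
static int toy_register_mad_agent(const struct toy_device *dev, int port_num,
				  struct toy_agent *agent)
{
	if (!dev || !agent)
		return -EINVAL;

	assert(dev->phys_port_cnt <= TOY_MAX_PORTS);

	if (port_num < 1 || port_num > dev->phys_port_cnt)
		return -EINVAL;

	/* The port exists, but this layer failed to set it up. */
	if (!dev->mad_port[port_num].initialized)
		return -ENODEV;

	agent->dev      = dev;
	agent->port_num = port_num;
	return 0;
}
```

A caller that gets -ENODEV back can skip the port instead of oopsing later on an uninitialized structure.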


RE: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Fab Tillier
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 05, 2005 12:07 PM
> 
> Shirley> A port failure here means that a SW client failed to
> Shirley> initialize that port. It doesn't matter whether the link
> Shirley> is up/down or there is a hardware/firmware problem. If any
> Shirley> of these SW errors occurs, the upper users can't use that
> Shirley> port correctly, or even the whole device correctly. It's
> Shirley> easy to prove this if you set error points during client
> Shirley> registration and start the upper users. The problems
> Shirley> could be a kernel hang or a kernel oops. For example, if
> Shirley> mad_client fails to initialize a port and you start
> Shirley> ipoib_client, ifconfig will hang in the kernel. If
> Shirley> sa_client fails, the ipoib multicast join will hit a
> Shirley> kernel oops. Starting the upper users without checking
> Shirley> the dependent resource allocation is buggy. It is
> Shirley> definitely worth spending time to address this.
> 
> Yes, I agree we should fix the bugs in error handling during
> registration.  However, I don't think that a mask of ports is the
> right answer -- it doesn't seem to address the real issue.  We should
> just make sure that if, say, the MAD layer fails to initialize a
> device, then all clients that depend on the MAD layer don't try to use
> that device.

Shouldn't a user get an error (not an oops) if they try to use the MAD layer for
a device that didn't initialize properly within the MAD layer?  Doesn't the MAD
layer trap that device requests are valid?  It seems that adding such checks
would be much simpler to implement, rather than trying to figure out how to
express these limitations to the various ULPs.

- Fab



Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Roland,

* On Oct,2 Roland Dreier<[EMAIL PROTECTED]> wrote :
> Sayantan> Hello, This is in regard to the use of `ibv_modify_srq'
> Sayantan> call. When I use this call, I get a segmentation
> Sayantan> fault.
> 
> This is because the modify SRQ operation is not implemented at all in
> libmthca.  Do you just want to set the SRQ limit?  That's not so hard
> for me to implement.  However, you should be aware that as far as I
> know, only mem-free HCAs generate the SRQ limit reached event.

Thanks for your reply. Yes, I want to set a SRQ limit. Yes, I am aware
that only mem-free HCAs generate SRQ limit reached event. I am trying
this on a Mem-free HCA.

If you could implement this feature, that would be really great!

Thanks,
Sayantan.

> 
>  - R.

-- 
http://www.cse.ohio-state.edu/~surs


RE: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Fab Tillier
> From: Shirley Ma [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 05, 2005 11:56 AM
> 
> A port failure here means that a SW client failed to initialize that port.
> It doesn't matter whether the link is up/down or there is a hardware/firmware
> problem. If any of these SW errors occurs, the upper users can't use that port
> correctly, or even the whole device correctly. It's easy to prove this if you
> set error points during client registration and start the upper users. The
> problems could be a kernel hang or a kernel oops. For example, if mad_client
> fails to initialize a port and you start ipoib_client, ifconfig will hang in
> the kernel. If sa_client fails, the ipoib multicast join will hit a kernel
> oops. Starting the upper users without checking the dependent resource
> allocation is buggy. It is definitely worth spending time to address this.

This sounds like bugs in the code where we don't trap failures gracefully.  I
think fixing that is probably much more useful.  There will always be situations
where runtime errors can occur (memory allocation failure, for example), and all
upper level protocols must handle failures of these calls.

Putting in code and requiring every client to compare all the various bit fields
they're interested in doesn't remove the need for proper error handling.  Proper
error handling should resolve both the ifconfig hang and multicast join oops.

Just my $0.02

- Fab



Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> There is not really a bitmap there. I just used the term
Shirley> to make it easier to understand. Client registration has a
Shirley> sequence. Checking resource dependencies is needed before
Shirley> starting upper client registration on that port.

It's not a strict sequence, however.  If the CM fails to initialize a
device, then SDP and SRP cannot use that device.  However, IPoIB can
use the device just fine, even if it loads after the CM.  Similarly,
if SDP fails to initialize a device, then SRP should not be affected
even if it loads after SDP.  And so on.

 - R.
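
One way to picture the point about dependencies not forming a strict sequence is a small dependency table that is consulted per client. The sketch below is illustrative only: the `toy_*` names are hypothetical, and the table entries are assumptions pieced together from this thread (SDP and SRP need the CM; IPoIB is taken to need the MAD and SA layers; SA and CM are assumed to need MAD).

```c
#include <assert.h>

enum toy_client {
	TOY_MAD, TOY_SA, TOY_CM, TOY_IPOIB, TOY_SDP, TOY_SRP, TOY_NCLIENTS
};

#define TOY_BIT(c) (1u << (c))

/* Direct dependencies of each client, as a bit mask of other clients. */
static const unsigned toy_deps[TOY_NCLIENTS] = {
	[TOY_MAD]   = 0,
	[TOY_SA]    = TOY_BIT(TOY_MAD),
	[TOY_CM]    = TOY_BIT(TOY_MAD),
	[TOY_IPOIB] = TOY_BIT(TOY_MAD) | TOY_BIT(TOY_SA),
	[TOY_SDP]   = TOY_BIT(TOY_CM),
	[TOY_SRP]   = TOY_BIT(TOY_CM),
};

/*
 * A client can use the device if it initialized itself and everything it
 * depends on, directly or indirectly, is usable too.  'failed' is a mask
 * of clients whose own initialization failed on this device.
 */
static int toy_can_use(int client, unsigned failed)
{
	assert(client >= 0 && client < TOY_NCLIENTS);

	if (failed & TOY_BIT(client))
		return 0;

	for (int dep = 0; dep < TOY_NCLIENTS; dep++)
		if ((toy_deps[client] & TOY_BIT(dep)) && !toy_can_use(dep, failed))
			return 0;
	return 1;
}
```

With only the CM marked as failed, SDP and SRP are refused while IPoIB is still allowed, regardless of load order.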


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> A port failure here means that a SW client failed to
Shirley> initialize that port. It doesn't matter whether the link
Shirley> is up/down or there is a hardware/firmware problem. If any
Shirley> of these SW errors occurs, the upper users can't use that
Shirley> port correctly, or even the whole device correctly. It's
Shirley> easy to prove this if you set error points during client
Shirley> registration and start the upper users. The problems
Shirley> could be a kernel hang or a kernel oops. For example, if
Shirley> mad_client fails to initialize a port and you start
Shirley> ipoib_client, ifconfig will hang in the kernel. If
Shirley> sa_client fails, the ipoib multicast join will hit a
Shirley> kernel oops. Starting the upper users without checking
Shirley> the dependent resource allocation is buggy. It is
Shirley> definitely worth spending time to address this.

Yes, I agree we should fix the bugs in error handling during
registration.  However, I don't think that a mask of ports is the
right answer -- it doesn't seem to address the real issue.  We should
just make sure that if, say, the MAD layer fails to initialize a
device, then all clients that depend on the MAD layer don't try to use
that device.  I'm not sure what the right way to express these
dependencies is, however.

 - R.




Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Shirley Ma

> One thing that strikes me is to have a single "bit map" (or its
> equivalent, implemented in, say, ib_device). This single "bit map"
> corresponds to the physical ports. So, each of the higher level
> modules only references this "bit map" and one does not have a mad
> client "bit map", an sa client "bit map" and so on. Is my
> understanding of your proposal correct?
> With multiple "bit maps" isn't there a risk of these not being in
> sync, resulting in hard to detect problems?

There isn't really a bitmap there; I just used the term to make the
idea easier to understand. Client registration happens in sequence, so
resource dependencies need to be checked before an upper client is
registered on a port.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Shirley Ma

A port failure here means that a SW client failed to initialize on
that port. It doesn't matter whether the link is up or down, or whether
there is a hardware/firmware problem. If any of these SW errors occurs,
the upper users can't use that port correctly, or even the whole
device. This is easy to demonstrate by injecting errors during client
registration and then starting the upper users. The result can be a
kernel hang or a kernel oops. For example, if mad_client fails to
initialize its ports and you start ipoib_client, ifconfig will hang in
the kernel. If sa_client fails, the ipoib multicast join will hit a
kernel oops. Starting the upper users without checking that the
resources they depend on were allocated is buggy. It is definitely
worth spending time to address this.

The added complexity is confined to client registration. The port
information is already stored in ib_device, ib_cache, ib_sa_device and
cm_device, so it is not hard to fix.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Roland Dreier
Sayantan> Hello, This is in regard to the use of `ibv_modify_srq'
Sayantan> call. When I use this call, I get a segmentation
Sayantan> fault.

This is because the modify SRQ operation is not implemented at all in
libmthca.  Do you just want to set the SRQ limit?  That's not so hard
for me to implement.  However, you should be aware that as far as I
know, only mem-free HCAs generate the SRQ limit reached event.
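
One plausible way an unimplemented operation turns into a crash is a call through an operations table whose entry was never filled in. That mechanism is an assumption for illustration, not something confirmed in this thread, and the types below are simplified stand-ins rather than the real libibverbs or libmthca structures. The sketch shows a guarded dispatch that reports an error instead of jumping through a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real libibverbs types. */
struct fake_srq;
struct fake_srq_attr { unsigned max_wr, max_sge, srq_limit; };

struct fake_provider_ops {
    /* Left NULL when the provider does not implement the operation. */
    int (*modify_srq)(struct fake_srq *srq, struct fake_srq_attr *attr,
                      int mask);
};

/* Guarded dispatch: return ENOSYS rather than call through NULL. */
static int checked_modify_srq(const struct fake_provider_ops *ops,
                              struct fake_srq *srq,
                              struct fake_srq_attr *attr, int mask)
{
    assert(ops != NULL);
    if (ops->modify_srq == NULL)
        return ENOSYS;
    return ops->modify_srq(srq, attr, mask);
}

/* Demo implementation that just remembers the requested limit. */
static unsigned last_limit;

static int demo_modify_srq(struct fake_srq *srq, struct fake_srq_attr *attr,
                           int mask)
{
    (void)srq;
    (void)mask;
    last_limit = attr->srq_limit;
    return 0;
}
```

A caller holding a table with `modify_srq` unset gets `ENOSYS` back from `checked_modify_srq()`; with `demo_modify_srq` installed the call goes through and the limit is recorded.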

 - R.


[openib-general] segmentation fault in ibv_modify_srq

2005-10-05 Thread Sayantan Sur
Hello,

This is in regard to the use of `ibv_modify_srq' call. When I use this
call, I get a segmentation fault. I have included the code snippet,
output of strace -ewrite=all command and dmesg output below. I'd be glad
if someone could help me get around the problem. Please let me know if
additional debug information is required.

TIA,
Sayantan.

Platform: Opteron 2.2GHz, Tyan S2895 motherboard, 2GB memory
OS: Linux 2.6.13.1-smp, SuSe 9.3
Firmware: 5.1.0
OpenIB svn rev: 3665 (the revision number might be off by a little, but
this version was checked out yesterday evening 04/10).

Code Snippet:
=
static void create_srq(void)
{
    struct ibv_srq_init_attr srq_init_attr;
    struct ibv_srq_attr srq_attr;

    memset(&srq_init_attr, 0, sizeof(srq_init_attr));
    memset(&srq_attr, 0, sizeof(srq_attr));

    srq_init_attr.srq_context = ibv_dev.context;
    srq_init_attr.attr.max_wr = viadev_rq_size; // is 300.
    srq_init_attr.attr.max_sge = 1;
    srq_init_attr.attr.srq_limit = 10;

    ibv_dev.srq_hndl = ibv_create_srq(ibv_dev.ptag, &srq_init_attr);

    if (!ibv_dev.srq_hndl) {
        error_abort_all(GEN_EXIT_ERR, "Error creating SRQ\n");
    }

    srq_attr.max_wr = viadev_rq_size;
    srq_attr.max_sge = 1;
    srq_attr.srq_limit = 10;

    // Fails after this call
    if (ibv_modify_srq(ibv_dev.srq_hndl, &srq_attr, IBV_SRQ_LIMIT)) {
        error_abort_all(GEN_EXIT_ERR, "Couldn't modify SRQ limit\n");
    }

    fprintf(stderr, "[%d] limit %d\n", ibv_dev.me, srq_attr.srq_limit);
}

===

Strace output
===
[EMAIL PROTECTED]:osu_benchmarks] ../bin/mpirun_rsh -np 2 ro0 ro1 strace -ewrite
-ewrite=all ./lat 
write(3, "\0\0\0\0\4\0\4\0PT\317\377\377\177\0\0", 16write(3,
"\0\0\0\0\4\0\4\0\20\370\233\377\377\177\0\0", 16) = 16
 | 0  00 00 00 00 04 00 04 00  10 f8 9b ff ff 7f 00 00  
 |
write(3, "\3\0\0\0\4\0\3\0\320\367\233\377\377\177\0\0", 16) = 16
 | 0  03 00 00 00 04 00 03 00  d0 f7 9b ff ff 7f 00 00  
 |
write(3, "\3\0\0\0\4\0\3\0 \370\233\377\377\177\0\0", 16) = 16
 | 0  03 00 00 00 04 00 03 00  20 f8 9b ff ff 7f 00 00  
... |
write(3, "\2\0\0\0\6\0\n\0\340\367\233\377\377\177\0\0\1\335\324"...,
24) = 24
 | 0  02 00 00 00 06 00 0a 00  e0 f7 9b ff ff 7f 00 00  
 |
 | 00010  01 dd d4 00 00 00 00 00   
|
) = 16
 | 0  00 00 00 00 04 00 04 00  50 54 cf ff ff 7f 00 00  
PT.. |
write(3, "\3\0\0\0\4\0\3\0\20T\317\377\377\177\0\0", 16) = 16
 | 0  03 00 00 00 04 00 03 00  10 54 cf ff ff 7f 00 00  
.T.. |
write(3, "\3\0\0\0\4\0\3\0`T\317\377\377\177\0\0", 16) = 16
 | 0  03 00 00 00 04 00 03 00  60 54 cf ff ff 7f 00 00  
`T.. |
write(3, "\2\0\0\0\6\0\n\0 T\317\377\377\177\0\0\1\335\324\0\0\0"...,
24) = 24
 | 0  02 00 00 00 06 00 0a 00  20 54 cf ff ff 7f 00 00  
T.. |
 | 00010  01 dd d4 00 00 00 00 00   
|
write(3, "\t\0\0\0\f\0\3\0 S\317\377\377\177\0\0\0\20\325\0\0\0\0"...,
48) = 48
 | 0  09 00 00 00 0c 00 03 00  20 53 cf ff ff 7f 00 00  
S.. |
write(3, "\t\0\0\0\f\0\3\0\340\366\233\377\377\177\0\0\0\20\325\0"...,
48) = 48
 | 0  09 00 00 00 0c 00 03 00  e0 f6 9b ff ff 7f 00 00  
 |
 | 00010  00 10 d5 00 00 00 00 00  00 00 20 00 00 00 00 00   ..
. |
 | 00020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  
 |
write(3, "\22\0\0\0\22\0\4\0\260\367\233\377\377\177\0\0 \331\324"...,
72) = 72
 | 0  12 00 00 00 12 00 04 00  b0 f7 9b ff ff 7f 00 00  
 |
 | 00010  20 d9 d4 00 00 00 00 00  ff ff 00 00 00 00 00 00   ...
 |
 | 00020  ff ff ff ff 00 00 00 00  02 26 00 4c 07 00 12 00  
.&.L |
 | 00030  00 40 f5 00 00 00 00 00  00 20 f5 00 00 00 00 00  [EMAIL PROTECTED] .
.. |
 | 00040  00 00 00 00 ff 7f 00 00   
|
write(3, "\t\0\0\0\f\0\3\0 \367\233\377\377\177\0\0\0`\365\0\0\0"...,
48) = 48
 | 0  09 00 00 00 0c 00 03 00  20 f7 9b ff ff 7f 00 00  
... |
 | 00010  00 60 f5 00 00 00 00 00  00 80 00 00 00 00 00 00  .`..
 |
 | 00020  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  
 |
write(3, " \0\0\0\16\0\3\0\340\367\233\377\377\177\0\0\0\1\325\0"...,
56) = 56
 | 0  20 00 00 00 0e 00 03 00  e0 f7 9b ff ff 7f 00 00   ...
 |
 | 00010  00 01 d5 00 00 00 00 00  01 00 00 00 2c 01 00 00  
,... |
 | 00010  00 10 d5 00 00 00 00 00  00 00 20 00 00 00 00 00   ..
. |
 | 00020  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  
 |
write(3, "\22\0\0\0\22\0\4\0\360S\317\377\377\177\0\0 \331\324\0"...,
72) = 72
 | 0  12 00 00 00 12 00 04 00  f0 53 cf ff ff 7f 00 00  
.S.. |
 | 00010  20 d9 d4 00 00 00 00 00  ff ff 00 00 00 00 00 00   ...
 |
 | 00020  ff ff ff ff 00 00 00 00  02 26 00 4c 07 00 12 00  
.&.L |
 | 00030  00 40 f5 00 00 0

Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Hal Rosenstock
On Wed, 2005-10-05 at 13:51, Roland Dreier wrote:
> Shirley> mad_client: This client doesn't allow partial ports. I
> Shirley> would like to suggest enabling a port only when both
> Shirley> QP0 and QP1 are set up successfully. I don't know of a
> Shirley> case where QP0 can be used while QP1 is absent. (Tell
> Shirley> me if there is one.) The upper users are ib_umad,
> Shirley> ib_cm, ib_sa.
> 
> If the drivers can't access QP0 until the port is active, how does one
> run an SM?

or perhaps also a software based SMA ?

-- Hal




Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> mad_client: This client doesn't allow partial ports. I
Shirley> would like to suggest enabling a port only when both
Shirley> QP0 and QP1 are set up successfully. I don't know of a
Shirley> case where QP0 can be used while QP1 is absent. (Tell
Shirley> me if there is one.) The upper users are ib_umad,
Shirley> ib_cm, ib_sa.

If the drivers can't access QP0 until the port is active, how does one
run an SM?

 - R.


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Roland Dreier
Shirley> One HCA could support 256 ports. The current
Shirley> implementation doesn't support partially successful
Shirley> ports, which is a waste if any one port fails.

What does "port failure" mean?  If it just means that the port is not
active, then I think the drivers should still be able to use the
port.  I don't know of anything in the IB spec that says a port can
only be used if its link is up.

It seems fantastically unlikely that we'll see some HCA failure that means
a particular port can never be used but the rest of the HCA continues
to work.  So I don't think it's worth spending time on that either.

Right now my feeling is that we don't want to add the complication
entailed by having to track individual HCA ports, just to work around
a certain hardware/firmware quirk (which I would argue is in fact a bug).

 - R.


Re: [openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Pradeep Satyanarayana

One thing that strikes me is to have a single "bit map" (or its
equivalent, implemented in, say, ib_device). This single "bit map"
corresponds to the physical ports. So, each of the higher level modules
only references this "bit map" and one does not have a mad client
"bit map", an sa client "bit map" and so on. Is my understanding of
your proposal correct?

With multiple "bit maps" isn't there a risk of these not being in sync,
resulting in hard to detect problems?

Pradeep
[EMAIL PROTECTED]

[EMAIL PROTECTED] wrote on 10/05/2005 09:52:53 AM:

> 
> One HCA could support 256 ports. The current implementation doesn't
> support partially successful ports, which is a waste if any one port
> fails. Also, adding break points to induce errors in each client
> during registration triggers some potential problems. Here is my
> proposal to enable partial ports. Basically, the number of physical
> ports seen by an upper user is replaced by the bitmap of successful
> ports of the client it depends on. I have done some research on each
> client for enabling partial ports on an HCA, created some patches,
> and tested the idea. Please correct me if my understanding is wrong.
> If you have other ideas, please share them.
>
> cache_client: This client allows partial ports. But
> ib_cache_update() might fail on a port whose pkey_cache or gid_cache
> fails to be generated, so all the upper-level users should be
> allowed only on the successful ports, not on all of the HCA's
> physical ports. There are 9 upper users: ib_srp, ib_sdp, ib_uverbs,
> ib_umad, ib_cm, ib_ipoib, ib_sa, ib_mad.
>
> mad_client: This client doesn't allow partial ports. I would like to
> suggest enabling a port only when both QP0 and QP1 are set up
> successfully. I don't know of a case where QP0 can be used while QP1
> is absent. (Tell me if there is one.) The upper users are ib_umad,
> ib_cm, ib_sa.
>
> cm_client: This client doesn't allow partial ports. To enable
> partial ports, the upper users ib_ucm, ib_srp and ib_sdp should be
> allowed only on the successful ports.
>
> sa_client: This client doesn't allow partial ports. To enable
> partial ports, the upper users ib_ipoib, ib_srp, ib_sdp and ib_at
> should be allowed only on the successful ports.
>
> ipoib_client: This client does allow partial ports.
>
> The number of physical ports should be replaced by each client's
> successful ports. For example, ipoib_client will be allowed on the
> sa_client ports bitmap, sa_client on the mad_client ports bitmap,
> and mad_client on the cache_client ports bitmap.
>
> Adding a bitmap field is not necessary; ib_cache, ib_device,
> ib_sa_device and cm_device already store all the port information.
> ib_uat, kdapl and ib_ping should be updated too.
> Thanks
> Shirley Ma
> IBM Linux Technology Center
> 15300 SW Koll Parkway
> Beaverton, OR 97006-6063
> Phone(Fax): (503) 578-7638

[openib-general] [PATCH]proposal for enabling partial ports on HCA

2005-10-05 Thread Shirley Ma

One HCA could support 256 ports. The current implementation doesn't
support partially successful ports, which is a waste if any one port
fails. Also, adding break points to induce errors in each client during
registration triggers some potential problems. Here is my proposal to
enable partial ports. Basically, the number of physical ports seen by
an upper user is replaced by the bitmap of successful ports of the
client it depends on. I have done some research on each client for
enabling partial ports on an HCA, created some patches, and tested the
idea. Please correct me if my understanding is wrong. If you have other
ideas, please share them.

cache_client: This client allows partial ports. But ib_cache_update()
might fail on a port whose pkey_cache or gid_cache fails to be
generated, so all the upper-level users should be allowed only on the
successful ports, not on all of the HCA's physical ports. There are
9 upper users: ib_srp, ib_sdp, ib_uverbs, ib_umad, ib_cm, ib_ipoib,
ib_sa, ib_mad.

mad_client: This client doesn't allow partial ports. I would like to
suggest enabling a port only when both QP0 and QP1 are set up
successfully. I don't know of a case where QP0 can be used while QP1
is absent. (Tell me if there is one.) The upper users are ib_umad,
ib_cm, ib_sa.

cm_client: This client doesn't allow partial ports. To enable partial
ports, the upper users ib_ucm, ib_srp and ib_sdp should be allowed only
on the successful ports.

sa_client: This client doesn't allow partial ports. To enable partial
ports, the upper users ib_ipoib, ib_srp, ib_sdp and ib_at should be
allowed only on the successful ports.

ipoib_client: This client does allow partial ports.

The number of physical ports should be replaced by each client's
successful ports. For example, ipoib_client will be allowed on the
sa_client ports bitmap, sa_client on the mad_client ports bitmap, and
mad_client on the cache_client ports bitmap.

Adding a bitmap field is not necessary; ib_cache, ib_device,
ib_sa_device and cm_device already store all the port information.
ib_uat, kdapl and ib_ping should be updated too.
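
The chaining of "successful ports" described above can be sketched with explicit masks. This is only an illustration: the proposal itself says no bitmap field needs to be added, and the 64-bit `port_mask_t`, `port_bit()` and `usable_ports()` below are hypothetical names that do not appear in the patches.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per port; a 64-bit mask is an illustrative simplification. */
typedef uint64_t port_mask_t;

/* Ports are numbered from 1 in this sketch. */
static port_mask_t port_bit(unsigned port)
{
    assert(port >= 1 && port <= 64);
    return (port_mask_t)1 << (port - 1);
}

/*
 * Ports a client may use: those where the client it depends on
 * succeeded AND its own initialization succeeded.
 */
static port_mask_t usable_ports(port_mask_t parent_ok, port_mask_t own_ok)
{
    return parent_ok & own_ok;
}
```

Applied in registration order (cache, then mad, then sa, then ipoib), a port that fails in mad_client drops out of every mask above it, so ipoib_client never sees that port.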

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] [PATCH]small cleanup in cache.c

2005-10-05 Thread Shirley Ma

Yes, as long as it's on Linux it's safe.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

Re: [openib-general] [PATCH]small cleanup in cache.c

2005-10-05 Thread Roland Dreier
> -   kfree(old_pkey_cache);
> -   kfree(old_gid_cache);
> +   if (old_pkey_cache)
> +   kfree(old_pkey_cache);
> +   if (old_gid_cache)
> +   kfree(old_gid_cache);

This isn't needed and in fact having this check is considered bad
kernel style.  The first thing kfree() does is check if the pointer is
NULL, so duplicating this check in the caller just makes the code bigger.
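
A userspace sketch of the same point, with `sketch_kfree()` as a hypothetical stand-in for kfree() (this is not kernel code): because the callee returns early on NULL, the caller can release both old caches unconditionally, including on a first update when neither exists yet.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_done;  /* counts real frees, for demonstration only */

/* Stand-in for kfree(): returns early on NULL, as described above. */
static void sketch_kfree(void *p)
{
    if (p == NULL)
        return;
    free(p);
    frees_done++;
}

/* The caller needs no NULL checks of its own. */
static void release_caches(void *old_pkey_cache, void *old_gid_cache)
{
    sketch_kfree(old_pkey_cache);
    sketch_kfree(old_gid_cache);
}

/* First update: no old caches exist. Later update: both exist. */
static void demo(void)
{
    void *a, *b;

    release_caches(NULL, NULL);
    assert(frees_done == 0);

    a = malloc(16);
    b = malloc(16);
    assert(a != NULL && b != NULL);
    release_caches(a, b);
    assert(frees_done == 2);
}
```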

 - R.


Re: [openib-general] ib_cm_listen failure

2005-10-05 Thread Todd Bowman
On 9/30/05, James Lentini <[EMAIL PROTECTED]> wrote:
> On Fri, 30 Sep 2005, Todd Bowman wrote:
> > udapl is using 0x115d3. How is this set and what value should it be?
> >
> > Todd
>
> On InfiniBand, uDAPL maps connection qualifiers onto service IDs
> (SIDs). The connection qualifier is chosen by the uDAPL application
> when it creates a Public Service Point (PSP) or Reserved Service
> Point (RSP).
>
> As Arlin noted, 0x115d3 is in the SDP range. The dapltest test tool
> uses 0xB0de. I would try any value except those in the range
> 0x1-0x1f and 0xB0de.
>
> james

Here is a patch for dtest.c to move the connection qualifier out of the SDP range.

Index: userspace/dapl/test/dtest/dtest.c
===
--- userspace/dapl/test/dtest/dtest.c   (revision 3547)
+++ userspace/dapl/test/dtest/dtest.c   (working copy)
@@ -53,7 +53,7 @@
 #include    "dat/udat.h"

 /* definitions */
-#define SERVER_CONN_QUAL    71123
+#define SERVER_CONN_QUAL    45248
 #define DTO_TIMEOUT    (1000*1000*5)
 #define DTO_FLUSH_TIMEOUT   (1000*1000*2)
 #define CONN_TIMEOUT    (1000*1000*10)
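
For reference, 71123 is 0x115d3 and 45248 is 0xB0C0. The check below is a small sketch of the rule from the quoted advice. The bounds 0x10000-0x1ffff for the SDP service ID range are an assumption on my part (the range as quoted above is incomplete), and `conn_qual_ok()` is a hypothetical helper, not part of dtest or uDAPL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of the SDP service ID range; not stated in full above. */
#define SDP_SID_FIRST      UINT64_C(0x10000)
#define SDP_SID_LAST       UINT64_C(0x1ffff)
/* Qualifier used by dapltest, per the quoted message. */
#define DAPLTEST_CONN_QUAL UINT64_C(0xB0de)

/* True if q avoids both the assumed SDP range and the dapltest value. */
static bool conn_qual_ok(uint64_t q)
{
    bool in_sdp = q >= SDP_SID_FIRST && q <= SDP_SID_LAST;

    return !in_sdp && q != DAPLTEST_CONN_QUAL;
}

/* The two values from the dtest.c patch, checked against the rule. */
static void check_patch_values(void)
{
    assert(!conn_qual_ok(71123));  /* 0x115d3, the old value */
    assert(conn_qual_ok(45248));   /* 0xB0C0, the new value */
}
```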


[openib-general] [PATCH]small cleanup in cache.c

2005-10-05 Thread Shirley Ma


The first time ib_cache_update is called, both old_pkey_cache and
old_gid_cache are NULL.

Signed-off-by: Shirley Ma ([EMAIL PROTECTED])

diff -uprN infiniband/core/cache.c infiniband-patch/core/cache.c
--- infiniband/core/cache.c        2005-10-05 06:59:34.0 -0700
+++ infiniband-patch/core/cache.c        2005-10-05 08:55:42.550693304 -0700
@@ -252,8 +252,10 @@ static void ib_cache_update(struct ib_de
 
         write_unlock_irq(&device->cache.lock);
 
-        kfree(old_pkey_cache);
-        kfree(old_gid_cache);
+        if (old_pkey_cache)
+                kfree(old_pkey_cache);
+        if (old_gid_cache)
+                kfree(old_gid_cache);
         kfree(tprops);
         return;
 



Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638




[openib-general] [PATCH] ipv4/fib_frontend.c: (Re)export ip_dev_find for 2.6.14

2005-10-05 Thread Hal Rosenstock
Hi,

The following patch is currently needed for 2.6.14-rc3 (for SDP and AT).

I placed this in
gen2/trunk/src/linux-kernel/patches/linux-2.6.14-rc3-fib-frontend.diff

-- Hal

ipv4/fib_frontend.c: (Re)export ip_dev_find for 2.6.14
This export was removed in 2.6.14 as part of a general cleanup
because no one outside of IP was using it
(but SDP and AT currently do).

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -661,4 +661,5 @@ void __init ip_fib_init(void)
 }
 
 EXPORT_SYMBOL(inet_addr_type);
+EXPORT_SYMBOL(ip_dev_find);
 EXPORT_SYMBOL(ip_rt_ioctl);
