date:20051110

Re: [openib-general] OpenSM and Wrong SM_Key

2005-11-10 Thread Troy Benjegerdes

On Wed, Nov 09, 2005 at 09:46:06AM +0200, Eitan Zahavi wrote:
> Hi Hal,
> 
> I would like to bring this to MgtWG before we change anything.
> IMO the situation when this happens is really not "legal" since if the
> SMs are not coordinated at least in their SM_Key it will result in two
> masters on the subnet.
> 
> From our experience it is always better to cause a fatal flow and exit
> the SM rather than report the event in some log - normally it will not
> be seen ...
> 
> I know this is a controversial issue. 

Okay, so you're telling me you *WANT* behavior where a rogue node can
trivially cause the running subnet manager to exit and take over
management of the network?

OpenSM needs to have a well-documented config file, instead of 3 pages
of command line options, and different levels of logging. What to do in
the above situation is a site-local policy config decision, not something 
that should be hard-coded in the SM source code.

The logs might actually get looked at if there wasn't junk in the log
every time something timed out.

The Linux kernel has WARN, NOTICE, and CRITICAL level log messages.
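
A minimal sketch of how such a site-local policy could be expressed, assuming a hypothetical config entry named "sm_key_mismatch" with the values "ignore", "warn" and "exit". None of these names exist in OpenSM; they only illustrate moving the decision out of the source code and tying it to a log level.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy values; OpenSM does not define these names. */
enum sm_key_policy {
    SM_KEY_POLICY_IGNORE,   /* log at NOTICE level and carry on */
    SM_KEY_POLICY_WARN,     /* log at WARN level and carry on */
    SM_KEY_POLICY_EXIT      /* log at CRITICAL level and exit the SM */
};

/* Parse the value of the hypothetical "sm_key_mismatch" config entry.
 * Returns 0 on success, -1 if the value is not recognised. */
static int parse_sm_key_policy(const char *value, enum sm_key_policy *out)
{
    if (strcmp(value, "ignore") == 0)
        *out = SM_KEY_POLICY_IGNORE;
    else if (strcmp(value, "warn") == 0)
        *out = SM_KEY_POLICY_WARN;
    else if (strcmp(value, "exit") == 0)
        *out = SM_KEY_POLICY_EXIT;
    else
        return -1;
    return 0;
}

/* Apply the policy when a remote SM presents an unexpected SM_Key.
 * Returns 1 if the local SM should exit, 0 if it should keep running. */
static int handle_sm_key_mismatch(enum sm_key_policy policy)
{
    switch (policy) {
    case SM_KEY_POLICY_EXIT:
        fprintf(stderr, "CRITICAL: remote SM has wrong SM_Key, exiting\n");
        return 1;
    case SM_KEY_POLICY_WARN:
        fprintf(stderr, "WARN: remote SM has wrong SM_Key\n");
        return 0;
    case SM_KEY_POLICY_IGNORE:
    default:
        fprintf(stderr, "NOTICE: remote SM has wrong SM_Key\n");
        return 0;
    }
}
```

With something like this, a site that trusts every node on the fabric could pick "exit", while a site worried about rogue nodes could pick "warn" and keep its SM running.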
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- 
> avoid dlists
> 
>  > The const confuses me somewhat.
> 
> Yeah, and thinking about it more, the memory really belongs to the
> consumer of the function.  So I don't think the const is even correct.
> 
>  > > extern int ibv_get_device_list(struct ibv_device ***list);
> 
>  > Is ***list really what we want here?  Can we just get away with **list?
> 
> Yes -- a single device is represented by a struct ibv_device *.
> So an array of devices is represented by a struct ibv_device **.
> And a pointer to such an array is struct ibv_device ***.
> 
> But the following is OK too I think:
> 
>   extern int ibv_get_device_list(struct ibv_device **list[]);
>   extern void ibv_free_device_list(struct ibv_device *list[]);
> 
> is that clearer?  (a pointer to an array of pointers to struct ibv_device).

Yes, this looks good.

>  > Would something like:
>  > 
>  > struct ibv_device * ibv_get_device(index);
>  > 
>  > work as well?
> 
> That could work as well.  But it doesn't handle hotplug quite as well.
> By returning a snapshot of all the known devices at a given moment, we
> at least have a chance at doing something sensible with devices
> appearing or disappearing.
> 
>  - R.

I agree. With ibv_free_device_list we just need to document that
the application is supposed to close devices it doesn't listen
for hotplug on.

-- 
MST

RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Caitlin Bestler

Sean Hefty wrote:
> Caitlin Bestler wrote:
>> Current CM software could generate the Service ID. Therefore the fact
>> that the Private Data is in the "new format" cannot be part of the
>> Service ID. Otherwise I agree with your analysis that data can be
>> moved to the Service ID. Which is more valuable,
>> 4 more bytes of private data or a much larger number of Service IDs,
>> is another topic.
> 
> The CM would still need to know what range of service IDs can
> be generated.  I don't believe that the range can overlap
> with an existing range that is already defined without
> needing to redefine service records and other items.  The
> extra bit in essence becomes a 65th bit for the service ID in such
> cases. 
>

How would you prevent someone using old CM software from
forging their IP address in user mode and requesting the
Service ID from an old CM implementation that did not know
to check the newly standardized portion of what it thinks of
as entirely "private" data?

By comparison, an RDMA application on an iWARP system
cannot receive a "connection established" event until
the IP address has been validated by the kernels at both
ends and by the ability to round-trip with said IP
address.
 
> The additional 4 bytes of private data come at an expense of
> consuming something like .006% of the service ID space.
> 
>>> To be clear, the CM REQ _carries_ the IP address.  There should be
>>> no requirement that the CM performs the mapping, and I see no
>>> reason why it should even care.
>> 
>> The CM needs to have at least the capability of validating the local
>> IP address supplied.
> 
> Validation can be done outside of the CM in a separate module.
> 
That's fine. Just as long as an application that wants to cheat
has to consider the possibility that the kernel might validate.
Similarly ingress validation *might* be done in an IP network.




Re: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Sean Hefty


Caitlin Bestler wrote:

> Current CM software could generate the Service ID. Therefore
> the fact that the Private Data is in the "new format" cannot
> be part of the Service ID. Otherwise I agree with your analysis
> that data can be moved to the Service ID. Which is more valuable,
> 4 more bytes of private data or a much larger number of Service
> IDs, is another topic.

The CM would still need to know what range of service IDs can be generated.  I
don't believe that the range can overlap with an existing range that is already
defined without needing to redefine service records and other items.  The extra
bit in essence becomes a 65th bit for the service ID in such cases.

The additional 4 bytes of private data come at an expense of consuming something
like .006% of the service ID space.

>> To be clear, the CM REQ _carries_ the IP address.  There
>> should be no requirement that the CM performs the mapping,
>> and I see no reason why it should even care.
>
> The CM needs to have at least the capability of validating
> the local IP address supplied.

Validation can be done outside of the CM in a separate module.

- Sean

RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Sean Hefty

>> If you want to maximize consumer usable private data, then
>> you can move the version, IP version, protocol, source and
>> destination ports into the service ID.
>
>Not at the expense of redefining what Service ID is.
>How do you propose to move all these fields into Service ID without
>violating IBTA spec Annex A3.2.? Remember, Service ID is what the responder
>advertises and the requestor sends communication requests to. It may be
>possible for a server to advertise multiple service IDs to cover version
>and IP version variations, but it will not be symmetrical to iWARP. Port
>is port (service ID) and address is address. Port does not encode IP
>version.

The service ID could be formatted as (field widths in bits):

Set ID:      24
Version:      4
IP version:   4
Src port:    16
Dst port:    16

I don't see how this violates the spec.  Beyond the set ID, the rest is defined
as "any".  It's not necessary, but it does save 4 bytes of private data for the
user.
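
As a rough sketch of the layout above: the five fields add up to 24 + 4 + 4 + 16 + 16 = 64 bits, the size of a service ID. The struct and function names below are hypothetical, and placing the set ID in the most significant bits is an assumption, since the list only gives field widths and an order.

```c
#include <stdint.h>

/* Fields of the proposed layout. Bit positions are an assumption:
 * set ID in bits 63..40, version in 39..36, IP version in 35..32,
 * source port in 31..16, destination port in 15..0. */
struct ip_service_id {
    uint32_t set_id;     /* only the low 24 bits are used */
    uint8_t  version;    /* only the low 4 bits are used */
    uint8_t  ip_version; /* only the low 4 bits are used */
    uint16_t src_port;
    uint16_t dst_port;
};

/* Pack the fields into a 64-bit service ID value (host byte order). */
static uint64_t pack_service_id(const struct ip_service_id *f)
{
    return ((uint64_t)(f->set_id & 0xFFFFFF) << 40) |
           ((uint64_t)(f->version & 0xF) << 36) |
           ((uint64_t)(f->ip_version & 0xF) << 32) |
           ((uint64_t)f->src_port << 16) |
           (uint64_t)f->dst_port;
}

/* Recover the fields from a packed 64-bit service ID value. */
static struct ip_service_id unpack_service_id(uint64_t sid)
{
    struct ip_service_id f;

    f.set_id = (uint32_t)(sid >> 40) & 0xFFFFFF;
    f.version = (uint8_t)((sid >> 36) & 0xF);
    f.ip_version = (uint8_t)((sid >> 32) & 0xF);
    f.src_port = (uint16_t)((sid >> 16) & 0xFFFF);
    f.dst_port = (uint16_t)(sid & 0xFFFF);
    return f;
}
```

A real implementation would also have to convert the packed value to network byte order before placing it in a CM REQ; that step is left out here.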

>> Separately, if there's any defined mapping to a service ID or
>> set of service IDs, then the service ID indicates the format
>> of the private data.  No additional information is needed in
>> the CM REQ, such as using a reserve bit.
>
>That is a good point.
>But this restricts the usage of IP addressing only to these ports.

It doesn't restrict the usage at all.  It defines a portion of the private data
for a specific range of service IDs, the same way it is done for SDP.  There's
no restriction that other service IDs not use the same format.

Even with the proposal to use a reserved bit in the CM, a particular service
could format its private data this way, not set the bit, and still be spec
compliant.

>The question is what is easier to check 1 bit or Service ID.
>Of course, service ID will have to be checked anyhow to direct the
>request.

Exactly.  If the service ID is checked anyway, why set the bit?

>While this overloads the semantic meaning of Service ID it is a viable
>method.

How is this not viable?  There's a _working_ implementation today for both
userspace and kernel mode clients to connect using IP addressing that didn't
require any modifications to the IB CM.

>> To be clear, the CM REQ _carries_ the IP address.  There
>> should be no requirement that the CM performs the mapping,
>> and I see no reason why it should even care.
>>
>Can you elaborate on this? Does this address who populates the formatted
>portion of the private data?

I'm referring to who formats the private data and performs the mapping to the
service IDs (slide 13)

- Sean


Re: [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Johann George

> It seemed faintly preferable to tell the caller how big the array was
> rather than forcing the caller to count for itself.

If you really wanted that, I would be more inclined towards:

struct ibv_device ** ibv_get_device(*length_ptr)

and if you do not want length, you could pass a null length_ptr.  But since
I also cannot think of a strong case for it, I prefer the cleaner interface
of leaving it out.
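
A minimal sketch of the optional-length variant, using stand-in names (struct demo_device, demo_get_device_list) rather than the real libibverbs symbols. The static table of two devices is made up for illustration; only the calling convention is the point.

```c
#include <stdlib.h>

struct demo_device { const char *name; };  /* stand-in for struct ibv_device */

/* Hypothetical table standing in for the devices the library knows about. */
static struct demo_device demo_known[] = { { "dev0" }, { "dev1" } };
#define DEMO_NUM_KNOWN (sizeof(demo_known) / sizeof(demo_known[0]))

/* Return a newly allocated, NULL-terminated snapshot of the device list.
 * If length_ptr is non-NULL, also store the number of entries there.
 * Returns NULL on allocation failure. */
static struct demo_device **demo_get_device_list(int *length_ptr)
{
    size_t i;
    struct demo_device **list = malloc((DEMO_NUM_KNOWN + 1) * sizeof(*list));

    if (!list)
        return NULL;
    for (i = 0; i < DEMO_NUM_KNOWN; i++)
        list[i] = &demo_known[i];
    list[DEMO_NUM_KNOWN] = NULL;
    if (length_ptr)
        *length_ptr = (int)DEMO_NUM_KNOWN;
    return list;
}

/* Release a snapshot obtained from demo_get_device_list(). */
static void demo_free_device_list(struct demo_device **list)
{
    free(list);
}
```

A caller that does not care about the length passes NULL and walks the array until it reaches the terminator.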

Johann

RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Caitlin Bestler

[EMAIL PROTECTED] wrote:
> If you want to maximize consumer usable private data, then
> you can move the version, IP version, protocol, source and
> destination ports into the service ID.
> 
> Separately, if there's any defined mapping to a service ID or
> set of service IDs, then the service ID indicates the format
> of the private data.  No additional information is needed in
> the CM REQ, such as using a reserve bit.
> 

Current CM software could generate the Service ID. Therefore
the fact that the Private Data is in the "new format" cannot
be part of the Service ID. Otherwise I agree with your analysis
that data can be moved to the Service ID. Which is more valuable,
4 more bytes of private data or a much larger number of Service
IDs, is another topic.

> To be clear, the CM REQ _carries_ the IP address.  There
> should be no requirement that the CM performs the mapping,
> and I see no reason why it should even care.
>

The CM needs to have at least the capability of validating
the local IP address supplied.


RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Kanevsky, Arkady

Sean,
comments inline.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, November 10, 2005 6:01 PM
> To: Kanevsky, Arkady
> Cc: [EMAIL PROTECTED]; 
> openib-general@openib.org; [EMAIL PROTECTED]
> Subject: Re: [openib-general] RE: [dat-discussions] socket 
> based connection model for IB proposal - round 3
> 
> If you want to maximize consumer usable private data, then 
> you can move the version, IP version, protocol, source and 
> destination ports into the service ID.

Not at the expense of redefining what Service ID is.
How do you propose to move all these fields into Service ID without
violating IBTA spec Annex A3.2.? Remember, Service ID is what the responder
advertises and the requestor sends communication requests to. It may be
possible for a server to advertise multiple service IDs to cover version
and IP version variations, but it will not be symmetrical to iWARP. Port
is port (service ID) and address is address. Port does not encode IP
version.

> 
> Separately, if there's any defined mapping to a service ID or 
> set of service IDs, then the service ID indicates the format 
> of the private data.  No additional information is needed in 
> the CM REQ, such as using a reserve bit.

That is a good point.
But this restricts the usage of IP addressing only to these ports.
The question is which is easier to check: 1 bit or the Service ID.
Of course, the service ID will have to be checked anyhow to direct the
request. While this overloads the semantic meaning of Service ID, it is
a viable method.

> 
> To be clear, the CM REQ _carries_ the IP address.  There 
> should be no requirement that the CM performs the mapping, 
> and I see no reason why it should even care.
> 

Can you elaborate on this? Does this address who populates the formatted
portion of the private data?

> - Sean
> 

Re: (SPAM?) Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Sean Hefty

Roland Dreier wrote:

>  > Is ***list really what we want here?  Can we just get away with **list?
>
> Yes -- a single device is represented by a struct ibv_device *.
> So an array of devices is represented by a struct ibv_device **.
> And a pointer to such an array is struct ibv_device ***.

I understand.  This is just API that I've seen that used '***'.  Why not just
return a copy of the array?

>  > Would something like:
>  >
>  > struct ibv_device * ibv_get_device(index);
>  >
>  > work as well?
>
> That could work as well.  But it doesn't handle hotplug quite as well.
> By returning a snapshot of all the known devices at a given moment, we
> at least have a chance at doing something sensible with devices
> appearing or disappearing.

This doesn't seem any worse to me.  The user can reference device_array[i] or
call ibv_get_device(i).

I need to spend more time understanding how userspace hotplug will work.

- Sean

Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Roland Dreier

 > I would prefer one call to get the entire structure.  Another option might
 > be:
 > 
 > struct ibv_device ** ibv_get_device()
 > 
 > where it returns a list which is null terminated so you do not need to
 > return the length.

Yes, I thought of that too.  It seemed faintly preferable to tell the
caller how big the array was rather than forcing the caller to count
for itself.  But I can't really think of a use case where it makes a
difference, so perhaps your simpler version is better.

 - R.

Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Roland Dreier

 > The const confuses me somewhat.

Yeah, and thinking about it more, the memory really belongs to the
consumer of the function.  So I don't think the const is even correct.

 > > extern int ibv_get_device_list(struct ibv_device ***list);

 > Is ***list really what we want here?  Can we just get away with **list?

Yes -- a single device is represented by a struct ibv_device *.
So an array of devices is represented by a struct ibv_device **.
And a pointer to such an array is struct ibv_device ***.

But the following is OK too I think:

extern int ibv_get_device_list(struct ibv_device **list[]);
extern void ibv_free_device_list(struct ibv_device *list[]);

is that clearer?  (a pointer to an array of pointers to struct ibv_device).

 > Would something like:
 > 
 > struct ibv_device * ibv_get_device(index);
 > 
 > work as well?

That could work as well.  But it doesn't handle hotplug quite as well.
By returning a snapshot of all the known devices at a given moment, we
at least have a chance at doing something sensible with devices
appearing or disappearing.

 - R.

Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Johann George

> Is ***list really what we want here?  Can we just get away with **list?
> 
> Would something like:
> 
> struct ibv_device * ibv_get_device(index);

I would prefer one call to get the entire structure.  Another option might
be:

struct ibv_device ** ibv_get_device()

where it returns a list which is null terminated so you do not need to
return the length.
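
To illustrate what the null-terminated form asks of the caller, here is a small sketch with a stand-in type (struct sketch_device is hypothetical, not the libibverbs struct). The caller recovers the length by walking the array until it hits the terminator.

```c
#include <stddef.h>

struct sketch_device { const char *name; };  /* stand-in for struct ibv_device */

/* Count the entries of a NULL-terminated device list, as a caller
 * would have to if the library did not return the length itself. */
static int sketch_count_devices(struct sketch_device *const *list)
{
    int n = 0;

    while (list[n] != NULL)
        n++;
    return n;
}
```

Most callers would simply loop over the entries and never need the count, which is the argument for leaving the length out of the interface.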

Johann

Re: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Sean Hefty

If you want to maximize consumer usable private data, then you can move the 
version, IP version, protocol, source and destination ports into the service ID.


Separately, if there's any defined mapping to a service ID or set of service 
IDs, then the service ID indicates the format of the private data.  No 
additional information is needed in the CM REQ, such as using a reserve bit.


To be clear, the CM REQ _carries_ the IP address.  There should be no 
requirement that the CM performs the mapping, and I see no reason why it should 
even care.


- Sean

Re: [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Johann George

> So how about just doing
> 
> /* put list of devices in list and return length of list */
> extern int ibv_get_device_list(struct ibv_device * const **list);
> 
> /* free a list of devices from ibv_get_device_list */
> extern void ibv_free_device_list(struct ibv_device * const *list);

I like it much better than what we have now.  Clean, simple and easy to
understand.

> Or are the consts too confusing?  Should we be a little less safe but
> make it nice and simple and just do

I often find consts a bit of a nuisance for what they give me; but am fine
either way.  Both are simple enough.

Johann

Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Sean Hefty


Roland Dreier wrote:

> Or are the consts too confusing?  Should we be a little less safe but
> make it nice and simple and just do

The const confuses me somewhat.

> extern int ibv_get_device_list(struct ibv_device ***list);


Is ***list really what we want here?  Can we just get away with **list?

Would something like:

struct ibv_device * ibv_get_device(index);

work as well?

- Sean

[openib-general] RE: [PATCHv1] userspace CMA

2005-11-10 Thread Sean Hefty

> . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to
>   add it.  Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed
>   underscores and dashes are just too hard to type and look weird.
>
> . Please add a ChangeLog entry covering the libibverbs changes.

Here's an updated patch for just libibverbs.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>



Index: include/infiniband/sa-kern-abi.h
===
--- include/infiniband/sa-kern-abi.h(revision 0)
+++ include/infiniband/sa-kern-abi.h(revision 0)
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef SA_KERN_ABI_H
+#define SA_KERN_ABI_H
+
+#include 
+
+struct ib_kern_path_rec {
+   __u8  dgid[16];
+   __u8  sgid[16];
+   __u16 dlid;
+   __u16 slid;
+   __u32 raw_traffic;
+   __u32 flow_label;
+   __u32 reversible;
+   __u32 mtu;
+   __u16 pkey;
+   __u8  hop_limit;
+   __u8  traffic_class;
+   __u8  numb_path;
+   __u8  sl;
+   __u8  mtu_selector;
+   __u8  rate_selector;
+   __u8  rate;
+   __u8  packet_life_time_selector;
+   __u8  packet_life_time;
+   __u8  preference;
+};
+
+#endif /* SA_KERN_ABI_H */
Index: include/infiniband/sa.h
===
--- include/infiniband/sa.h (revision 0)
+++ include/infiniband/sa.h (revision 0)
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2004 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: sa.h 2616 2005-06-15 15:22:39Z halr $
+ */
+
+#ifndef IB_SA_H
+#define IB_SA_H
+
+#include 
+
+enum ib_sa_rate {
+   IB_SA_RATE_2_5_GBPS = 2,
+   IB_SA_RATE_5_GBPS   = 5,
+   IB_SA_RATE_10_GBPS  = 3,
+   IB_SA_RATE_20_GBPS  = 6,
+   IB_SA_RATE_30_GBPS  = 4,
+   IB_SA_RATE_40_GBPS  = 7,
+   IB_SA_RATE_60_GBPS  = 8,
+   IB_SA_RATE_80_GBPS  = 9,
+   IB_SA_RATE_120_GBPS = 10
+};
+
+static inline int ib_sa_rate_enum_to_int(enum ib_sa_rate rate)
+{
+   switch (rate) {
+   case IB_SA_RATE_2_5_GBPS: return  1;
+   case IB_SA_RATE_5_GBPS:   ret

[openib-general] [RFC] new ibv_get_devices() API -- avoid dlists

2005-11-10 Thread Roland Dreier

Michael> Maybe it's a naming thing? We can call the list
Michael> "iterator", does this make it less ugly?

I thought about this, but it feels like overkill for something pretty
simple.  So how about just doing

/* put list of devices in list and return length of list */
extern int ibv_get_device_list(struct ibv_device * const **list);

/* free a list of devices from ibv_get_device_list */
extern void ibv_free_device_list(struct ibv_device * const *list);

which could be used as:

struct ibv_device * const *list;
int list_len;

list_len = ibv_get_device_list(&list);

/* ... */

ibv_free_device_list(list);

Or are the consts too confusing?  Should we be a little less safe but
make it nice and simple and just do

extern int ibv_get_device_list(struct ibv_device ***list);

and so on?

 - R.

RE: [dat-discussions] RE: [openib-general] socket based connection model for IB proposal -round 3

2005-11-10 Thread Kanevsky, Arkady


If others agree, I am happy to make the version a 4-bit field.

We will use IPv4 encapsulation into IPv6 as defined by IETF.

0-based VA and remote invalidate are not relevant to IP addressing.
But we are proposing a change to IB CM, so we need to address all the
differences between IB and iWARP. This is why these are addressed in
the discussion.

If we have a protocol field, then the CM will populate it based on the
5-tuple of socket_addr.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

> -Original Message-
> From: Fab Tillier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 4:53 PM
> To: Kanevsky, Arkady; openib-general@openib.org;
> [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: [dat-discussions] RE: [openib-general] socket based
> connection model for IB proposal -round 3
>
> > From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 10, 2005 1:37 PM
> >
> > It will be discussed at IBTA SWG meeting next week Tu.
> > Please, post your comments before that.
>
> Looks fine to me overall.  The only thing I would change is
> make the version field 4 bits rather than just 2, and shift
> the IP version down 2 bits, eliminating the reserved bits.
> That way, the first byte is split evenly between protocol
> version and IP version.
>
> Do we even need to indicate the IP version, or can IPv4
> addresses be expressed as IPv6 addresses just by zeroing the
> first 12 bytes?
>
> I don't understand the relevance of the 0-based VA or Send
> with Invalidate discussion points.  They seem orthogonal to
> the socket-based CM proposal, and IMO should be moved to a
> separate proposal.
>
> I have no opinion one way or another on the presence of the
> protocol field.  It could just as well be left as "flags" for
> the consumer to do with what they please.
>
> - Fab

Re: [openib-general] Re: [PATCHv1] userspace CMA

2005-11-10 Thread Sean Hefty


Roland Dreier wrote:

>  . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to
>    add it.  Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed
>    underscores and dashes are just too hard to type and look weird.

I did forget to add this file.  I've added it and renamed it to sa-kern-abi.h.
The file only contains a definition for struct ib_kern_path_rec at the moment.

>  . Please add a ChangeLog entry covering the libibverbs changes.

Will do.

Thanks,
Sean

[openib-general] Re: [PATCHv1] userspace CMA

2005-11-10 Thread Roland Dreier

The libibverbs bits look mostly OK but:

 . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to
   add it.  Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed
   underscores and dashes are just too hard to type and look weird.

   Or did you just put everything in kern-abi.h?  That's fine too,
   just remove the sa_kern-abi.h references.

 . Please add a ChangeLog entry covering the libibverbs changes.

 - R.

RE: [openib-general] socket based connection model for IB proposal -round 3

2005-11-10 Thread Fab Tillier

> From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 1:37 PM
> 
> It will be discussed at IBTA SWG meeting next week Tu.
> Please, post your comments before that.

Looks fine to me overall.  The only thing I would change is to make the version
field 4 bits rather than just 2, and shift the IP version down 2 bits,
eliminating the reserved bits.  That way, the first byte is split evenly between
protocol version and IP version.

Do we even need to indicate the IP version, or can IPv4 addresses be expressed
as IPv6 addresses just by zeroing the first 12 bytes?
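
For reference, a sketch of one way to carry an IPv4 address in a 16-byte field. Zeroing the first 12 bytes matches the "IPv4-compatible" form, which RFC 4291 deprecates; the "IPv4-mapped" form shown here zeroes the first 10 bytes and sets the next two to 0xffff. Which form the proposal would use is not settled in this thread, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write the 16-byte IPv4-mapped IPv6 form (::ffff:a.b.c.d, RFC 4291)
 * of a 4-byte IPv4 address given in network byte order. */
static void ipv4_to_mapped_ipv6(const uint8_t ipv4[4], uint8_t ipv6[16])
{
    memset(ipv6, 0, 10);        /* bytes 0..9 are zero */
    ipv6[10] = 0xff;            /* bytes 10..11 are 0xffff */
    ipv6[11] = 0xff;
    memcpy(ipv6 + 12, ipv4, 4); /* bytes 12..15 carry the IPv4 address */
}

/* Return 1 if the 16-byte address is in IPv4-mapped form, 0 otherwise. */
static int is_ipv4_mapped(const uint8_t ipv6[16])
{
    static const uint8_t prefix[12] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
    };

    return memcmp(ipv6, prefix, 12) == 0;
}
```

With a fixed prefix like this, a receiver can tell IPv4 from IPv6 by inspecting the address alone, which is the case for dropping a separate IP version field.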

I don't understand the relevance of the 0-based VA or Send with Invalidate
discussion points.  They seem orthogonal to the socket-based CM proposal, and
IMO should be moved to a separate proposal.

I have no opinion one way or another on the presence of the protocol field.  It
could just as well be left as "flags" for the consumer to do with what they
please.

- Fab


[openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Kanevsky, Arkady




Fixed the bit value for the formatting indicator.

Arkady Kanevsky              email: [EMAIL PROTECTED]
Network Appliance Inc.       phone: 781-768-5395
275 Totten Pond Rd.          Fax: 781-895-1195
Waltham, MA 02451-2010       central phone: 781-768-5300


IP Address Support by InfiniBand CM_v3.pdf

[openib-general] Re: [PATCH] uDAPL free build issues cleaned up, print path records returned from uAT

2005-11-10 Thread James Lentini

On Thu, 10 Nov 2005, Arlin Davis wrote:

> James,
> 
> I fixed some problems with the free build openib_scm version. Also 
> turned down some debugging and added some debug prints for uAT path 
> records.
> 
> -arlin

Thanks Arlin. Committed in revision 4018.

[openib-general] socket based connection model for IB proposal - round 3

2005-11-10 Thread Kanevsky, Arkady




It will be discussed at the IBTA SWG meeting next week on Tuesday.
Please post your comments before that.
Thanks,

Arkady Kanevsky              email: [EMAIL PROTECTED]
Network Appliance Inc.       phone: 781-768-5395
275 Totten Pond Rd.          Fax: 781-895-1195
Waltham, MA 02451-2010       central phone: 781-768-5300
 


IP Address Support by InfiniBand CM_v3.pdf

RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Caitlin Bestler




 

> My concern is the requirement that RDS resync the structures in the
> face of failure and know whether to re-transmit or will deal with
> duplicates.  Having pre-posted buffers will help enable the resync to
> be accomplished but should not be equated to pre-post equals one can
> deal with duplicates or will verify to prevent duplicates from
> occurring.
>
> Mike

The semantics should be that barring an error the flow between any two
endpoints is reliable and ordered.

The difference versus a normal point-to-point definition of reliable is
that a) lack of a receive buffer is an error, b) the endpoint
communicates with many known remote peers (as opposed to one known
remote peer, or many unknown).

Having an API with those semantics, particularly as an upgrade in
semantics from SOCK_DGRAM while preserving SOCK_DGRAM syntax, is
something that I believe is of distinct value to many cluster based
applications. Further the API can be implemented in an offload device
(IB or IP) more efficiently than if it is simply implemented on top of
SOCK_STREAM sockets by the application.

Documenting and clarifying the semantics to make its general
applicability clearer should definitely be done, however.
 

RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Michael Krause

At 10:48 AM 11/10/2005, Caitlin Bestler wrote:

> Mike Krause wrote in response to Greg Lindahl:
>
> > If it is to be reasonably robust, then RDS should be required to
> > support the resync between the two sides of the communication.  This
> > aligns with the stated objective of implementing reliability in one
> > location in software and one location in hardware.  Without such
> > resync being required in the ULP, then one ends up with a ULP that
> > falls short of its stated objectives and pushes complexity back up
> > to the application which is where the advocates have stated it is
> > too complex or expensive to get it correct.
>
> I haven't reread all of RDS fine print to double-check this, but my
> impression is that RDS semantics exactly match the subset of MPI
> point-to-point communications where the receiving rank is required
> to have pre-posted buffers before the send is allowed.

My concern is the requirement that RDS resync the structures in the face
of failure and know whether to re-transmit or will deal with
duplicates.  Having pre-posted buffers will help enable the resync
to be accomplished but should not be equated to pre-post equals one can
deal with duplicates or will verify to prevent duplicates from
occurring.
Mike


Re: [openib-general] netperf over SDP bug

2005-11-10 Thread Grant Grundler

On Tue, Sep 27, 2005 at 06:17:00PM -0700, Grant Grundler wrote:
> Hi Michael,
> I'm trying to collect a full set of netperf TCP_STREAM over SDP for
> SVN r3547 on 2.6.13 kernel.  But some netperf runs get no throughput.

Michael,
I was able to reproduce this problem with SVN r3984.
I've posted the graphs for r3547 and r3984 on:
http://iou.parisc-linux.org/openib-perf-2005/r3547/
http://iou.parisc-linux.org/openib-perf-2005/r3984/

See sdpstream.png in each location.
I'll pursue collecting information you asked for a few weeks ago
as time permits.

The above data was collected with "netserver" bound to the same CPU
as the one taking IB MSI-X interrupts. This is bad for IPoIB (CPU bound)
and good for SDP (CPU cache). I'll rerun the r3984 data and bind
the netperf process as well.

BTW, in case I haven't mentioned this before, I set up a parisc-linux
box so netperf maintainer Rick Jones could manage his
releases using something better than tarballs. netperf 2.x and
netperf 4.x (under development) source is available from:
svn co http://www.netperf.org/svn/netperf2/
svn co http://www.netperf.org/svn/netperf4/

thanks,
grant

> Usually when sending 1k to 4k messages.  The same netperf parameters
> using IPoIB seem to be working fine - just a lot slower of course.
> Summary of all netperf over SDP runs is appended.
> 
> Sample commandline that got < 1Mb/s throughput is:
> LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 1024 -s 16384 -S 16384
> 
> I tried with some smaller -m parameters:
>   512   -> ~270-280 Mb/s
>   640   -> ~200-2100 Mb/s
>   768   -> ~30-50 Mb/s
>   896   -> ~2-6 Mb/s
> 
> CPU is essentially idle in the above 512-896 byte cases.
...

[openib-general] [git pull] IB updates for 2.6.15

2005-11-10 Thread Roland Dreier

Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

The pull will get the following changes:

Jack Morgenstein:
  [IB] mthca: report page size capability
  [IB] uverbs: have kernel return QP capabilities

Michael S. Tsirkin:
  [IB] umad: two small fixes
  [IB] mthca: fix posting of atomic operations
  [IB] mthca: fix posting long lists of receive work requests

Roland Dreier:
  [IPoIB] add path record information in debugfs
  [IB] umad: avoid potential deadlock when unregistering MAD agents
  [IPoIB] no need to set skb->dev right before freeing skb
  [IB] mthca: fix typo in catastrophic error polling
  [IB] Have cq_resize() method take an int, not int*
  [IB] umad: get rid of unused mr array
  [IB] mthca: fix wraparound handling in mthca_cq_clean()
  [IB] umad: further ib_unregister_mad_agent() deadlock fixes

 drivers/infiniband/core/user_mad.c             |  129 ++---
 drivers/infiniband/core/uverbs_cmd.c           |   12 +-
 drivers/infiniband/core/verbs.c                |   12 --
 drivers/infiniband/hw/mthca/mthca_catas.c      |    2
 drivers/infiniband/hw/mthca/mthca_cmd.c        |    2
 drivers/infiniband/hw/mthca/mthca_cq.c         |   16 +-
 drivers/infiniband/hw/mthca/mthca_dev.h        |    2
 drivers/infiniband/hw/mthca/mthca_main.c       |    2
 drivers/infiniband/hw/mthca/mthca_provider.c   |    3
 drivers/infiniband/hw/mthca/mthca_provider.h   |    1
 drivers/infiniband/hw/mthca/mthca_qp.c         |  113 +--
 drivers/infiniband/hw/mthca/mthca_srq.c        |   22 +++
 drivers/infiniband/hw/mthca/mthca_wqe.h        |    3
 drivers/infiniband/ulp/ipoib/ipoib.h           |   15 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c        |  179
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   72 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   26 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    7 -
 include/rdma/ib_user_verbs.h                   |    9 +
 include/rdma/ib_verbs.h                        |    2
 20 files changed, 466 insertions(+), 163 deletions(-)

Re: [openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Roland Dreier

Hi Eric... writing YAN (yet another NAL) I see :)

Eric> 2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per
Eric> peer.  Is this going to get me into trouble?

Eric>For example, I currently create a single PD and CQ for
Eric> everything, however the example I've seen (cmatose.c)
Eric> appears to create these separately for each peer.  Is that
Eric> what I should be doing too?

I don't think you want 10K PDs.  But having a single CQ big enough to
handle 10K QPs might be a problem.

Eric> 3. Is contiguous memory allocation an issue in Gen2?  Since
Eric> this is such a scarce resource in the kernel (and particular
Eric> CQ usage with one vendor's stack relied heavily on it) what
Eric> red flags should I be aware of?

There are still a few places where you can get in trouble (for
example, with the mthca driver, extremely large QP work queues might
be a problem, because the driver allocates contiguous memory for the
array used to track work request IDs -- not the work queues themselves
though).  But CQs in particular should be fine.

Eric> 4. Are RDMA reads still deprecated?  Which resources hit the
Eric> spotlight if I chose to use them?

I don't think RDMA reads were ever really deprecated.  But RDMA writes
probably pipeline better.

Eric> 5. Should I pre-map all physical memory and do RDMA in
Eric> page-sized fragments?  This avoids any mapping overhead at
Eric> the expense of having much larger numbers of queued RDMAs.
Eric> Since I try to keep up to 8 (by default) 1MByte RDMAs active
Eric> concurrently to any individual peer, with 4k pages I can
Eric> have up to 2048 RDMA work items queued at a time per peer.

Eric>And if I pre-map, can I be guaranteed that if I put the
Eric> CQ into the error state, all remote access to my memory is
Eric> revoked (e.g. could a CQ I create after I destroy the one I
Eric> just shut down somehow alias with it such that a
Eric> pathologically delayed RDMA could write my memory)?

s/CQ/QP/ ... anyway, if you choose your receive queue sequence numbers
randomly, then the probability of a QP number/sequence number
collision allowing a stray RDMA is astronomically low (effectively 0).
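
A minimal userspace sketch of picking a random starting sequence number, assuming this refers to the 24-bit packet sequence number (PSN) programmed into the QP; the function name is hypothetical, and rand() stands in for a proper random source (in the kernel, get_random_bytes() would be the more natural choice):

```c
#include <stdint.h>
#include <stdlib.h>

#define PSN_MASK 0xffffffu  /* IB packet sequence numbers are 24 bits wide */

/* Return a pseudo-random starting PSN in the range 0..0xffffff.
 * rand() is only guaranteed to give 15 random bits, so two calls are
 * combined before masking down to 24 bits.  Seed with srand() first. */
uint32_t random_start_psn(void)
{
    uint32_t r = ((uint32_t)rand() << 12) ^ (uint32_t)rand();
    return r & PSN_MASK;
}
```

If QP numbers are also 24 bits and both values were uniformly random, a stale packet would match a new QP with probability about 2^-48; that figure is an illustration under those assumptions, not a guarantee.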

Eric>Or is it better to use FMR pools and take the map/unmap
Eric> overhead?  If so, is there a way to know when the unmap
Eric> actually hits the hardware and my memory is safe?

FMRs are only supported on Mellanox HCAs at the moment.  But they do
have some advantages, like allowing you to convert a bunch of pages
into a single virtually contiguous region.  You can use the
ib_flush_fmr_pool() function to make sure that all unmapped FMRs are
really and truly flushed, but that is a slow operation (since it
incurs the penalty of flushing all in-flight operations in the HCA).

Eric> 6. Does Gen2 present substantially the same APIs as the
Eric> kernel in userspace?  So if I wrote a userspace equivalent
Eric> of my kernel driver, could I have pure userspace clients
Eric> talk to kernel servers?

Pretty much so, except of course userspace doesn't have access to
physical memory or FMRs.

 - R.

[openib-general] [PATCH] uDAPL free build issues cleaned up, print path records returned from uAT

2005-11-10 Thread Arlin Davis

James,

I fixed some problems with the free build openib_scm version. Also turned down
some debugging and added some debug prints for uAT path records.

-arlin

Signed-off by: Arlin Davis <[EMAIL PROTECTED]>

Index: dapl/openib/dapl_ib_cm.c
===
--- dapl/openib/dapl_ib_cm.c(revision 3990)
+++ dapl/openib/dapl_ib_cm.c(working copy)
@@ -136,14 +136,27 @@ static void dapli_path_comp_handler(uint
 
dapl_dbg_log(DAPL_DBG_TYPE_CM, 
" path_comp_handler: SRC GID subnet %016llx id %016llx\n",
-   (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix),
-   (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) );
+   (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.subnet_prefix),
+   (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.interface_id) );
 
dapl_dbg_log(DAPL_DBG_TYPE_CM, 
" path_comp_handler: DST GID subnet %016llx id %016llx\n",
-   (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix),
-   (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) );
+   (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.subnet_prefix),
+   (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.interface_id) );
 
+   dapl_dbg_log(DAPL_DBG_TYPE_CM, 
+   " path_comp_handler: slid %x dlid %x mtu %x(%x) pktlife %x(%x)\n",
+   ntohs(conn->dapl_path.slid), ntohs(conn->dapl_path.dlid),
+   conn->dapl_path.mtu, conn->dapl_path.mtu_selector,
+   conn->dapl_path.packet_life_time, 
+   conn->dapl_path.packet_life_time_selector );
+
+   dapl_dbg_log(DAPL_DBG_TYPE_CM, 
+   " path_comp_handler: hops %x npaths %x pkey %x tclass %x rate %x(%x)\n",
+   conn->dapl_path.hop_limit, conn->dapl_path.numb_path,
+   conn->dapl_path.pkey, conn->dapl_path.traffic_class,
+   conn->dapl_path.rate, conn->dapl_path.rate_selector);
+   
if (rec_num <= 0) {
dapl_dbg_log(DAPL_DBG_TYPE_CM, 
 " path_comp_handler: ERR %d retry %d\n",
Index: dapl/openib_scm/dapl_ib_cm.c
===
--- dapl/openib_scm/dapl_ib_cm.c(revision 3990)
+++ dapl/openib_scm/dapl_ib_cm.c(working copy)
@@ -285,7 +285,7 @@ dapli_socket_listen (   DAPL_IA *ia_ptr,
if (( bind( cm_ptr->l_socket,(struct sockaddr*)&addr, sizeof(addr) ) < 0) ||
   (listen( cm_ptr->l_socket, 128 ) < 0) ) {

-   dapl_dbg_log( DAPL_DBG_TYPE_ERR,
+   dapl_dbg_log( DAPL_DBG_TYPE_CM,
" listen: ERROR %s on conn_qual 0x%x\n",
strerror(errno),serviceID); 
 
@@ -313,7 +313,7 @@ dapli_socket_listen (   DAPL_IA *ia_ptr,

return dat_status;
 bail:
-   dapl_dbg_log( DAPL_DBG_TYPE_ERR,
+   dapl_dbg_log( DAPL_DBG_TYPE_CM,
" listen: ERROR on conn_qual 0x%x\n",serviceID); 
if ( cm_ptr->l_socket >= 0 )
close( cm_ptr->l_socket );
Index: dapl/openib_scm/dapl_ib_cq.c
===
--- dapl/openib_scm/dapl_ib_cq.c(revision 3990)
+++ dapl/openib_scm/dapl_ib_cq.c(working copy)
@@ -569,7 +569,6 @@ dapls_ib_wait_object_wait (
 {
struct dapl_evd *evd_ptr;
struct ibv_cq   *ibv_cq = NULL;
-   void*ibv_ctx = NULL;
int status = 0; 
int timeout_ms = -1;
struct pollfd cq_fd = {
@@ -602,7 +601,7 @@ dapls_ib_wait_object_wait (

dapl_dbg_log (DAPL_DBG_TYPE_CM, 
  " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n",
- evd_ptr, ibv_cq,ibv_ctx,strerror(errno));
+ evd_ptr, ibv_cq,strerror(errno));

return(dapl_convert_errno(status,"cq_wait_object_wait"));





Re: [openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Roland Dreier

Eric> However I guess this still means that CQ resources
Eric> sufficient for the maximum number of RDMAs I _could_ queue
Eric> have to be allocated...

In general there will be a relatively low limit on the maximum CQ
size.  For example, the maximum CQ size on Mellanox HCAs is ~128K
entries.
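
To put the figures from this thread side by side, here is a small sketch of the arithmetic. It assumes one work request per page, that every work request generates a completion (unsignaled requests would reduce the count), and that "~128K" means 131072 entries; the function name is hypothetical.

```c
#include <stdint.h>

/* Work requests queued to one peer when each RDMA is split into
 * page-sized fragments: one work request per page, rounding up. */
uint64_t wrs_per_peer(uint64_t rdmas, uint64_t rdma_bytes, uint64_t page_bytes)
{
    uint64_t pages_per_rdma = (rdma_bytes + page_bytes - 1) / page_bytes;
    return rdmas * pages_per_rdma;
}
```

With 8 concurrent 1 MByte RDMAs and 4k pages this gives 2048 work requests per peer, so 10,000 peers at full depth would need 20,480,000 entries, while a 131072-entry CQ covers only 64 such peers.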

 - R.

RE: [openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Eric Barton

Yes, of course; I meant the QP.

Regarding the total number of outstanding RDMA work requests,
I can keep a separate cap on that, so if relatively few peers
are active, I push the maximum number of RDMAs at them, but if
many peers are active the number of active RDMAs per peer reduces.

However I guess this still means that CQ resources sufficient for
the maximum number of RDMAs I _could_ queue have to be allocated...
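
One way such a cap could be expressed is sketched below. The function and parameter names are hypothetical, and the policy (an equal share of a global budget, clamped to the per-peer maximum) is only an assumption about what a "separate cap" might look like.

```c
#include <stddef.h>

/* Number of RDMA work requests each active peer may have outstanding.
 * global_cap   - total work requests allowed across all peers
 * per_peer_max - most work requests ever queued to a single peer
 * active_peers - peers with traffic pending right now
 * A result of 0 means there are more active peers than credits, so
 * some peers must wait until others complete. */
size_t per_peer_rdma_limit(size_t global_cap, size_t per_peer_max,
                           size_t active_peers)
{
    size_t share;

    if (active_peers == 0)
        return per_peer_max;

    share = global_cap / active_peers;
    return share < per_peer_max ? share : per_peer_max;
}
```

For example, with a budget of 131072 work requests and a per-peer maximum of 2048, ten active peers each get the full 2048, while 10,000 active peers get 13 each.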

> -Original Message-
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 7:12 PM
> To: Eric Barton
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] Lustre over OpenIB Gen2
> 
> 
> Eric Barton wrote:
> > 5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
> >    This avoids any mapping overhead at the expense of having much larger
> >    numbers of queued RDMAs.  Since I try to keep up to 8 (by default) 1MByte
> >    RDMAs active concurrently to any individual peer, with 4k pages I can have
> >    up to 2048 RDMA work items queued at a time per peer.
> 
> This is 20 million outstanding RDMA work requests per node.
> 
> >    And if I pre-map, can I be guaranteed that if I put the CQ into the error
> >    state, all remote access to my memory is revoked (e.g. could a CQ I create
> >    after I destroy the one I just shut down somehow alias with it such that a
> >    pathologically delayed RDMA could write my memory)?
> 
> I think that you mean QP into the error state.  If the QP is in the error
> state, then further access from a remote system should be impossible.
> 
> - Sean

Re: [openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Sean Hefty


Eric Barton wrote:
> 5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
>    This avoids any mapping overhead at the expense of having much larger
>    numbers of queued RDMAs.  Since I try to keep up to 8 (by default) 1MByte
>    RDMAs active concurrently to any individual peer, with 4k pages I can have
>    up to 2048 RDMA work items queued at a time per peer.

This is 20 million outstanding RDMA work requests per node.

>    And if I pre-map, can I be guaranteed that if I put the CQ into the error
>    state, all remote access to my memory is revoked (e.g. could a CQ I create
>    after I destroy the one I just shut down somehow alias with it such that a
>    pathologically delayed RDMA could write my memory)?

I think that you mean QP into the error state.  If the QP is in the error
state, then further access from a remote system should be impossible.

- Sean

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Rick Frank

Yes, this is the case.

- Original Message - 
From: "Caitlin Bestler" <[EMAIL PROTECTED]>

To: 
Sent: Thursday, November 10, 2005 1:48 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS 
(ReliableDatagramSockets) to OpenIB

Mike Krause wrote in response to Greg Lindahl:

> If it is to be reasonably robust, then RDS should be required to
> support the resync between the two sides of the communication.  This
> aligns with the stated objective of implementing reliability in one
> location in software and one location in hardware.  Without such
> resync being required in the ULP, then one ends up with a ULP that
> falls short of its stated objectives and pushes complexity back up
> to the application which is where the advocates have stated it is too
> complex or expensive to get it correct.

>> This sort of message service, by the way, has a long
>> history in distributed computing.

> Yep.

I haven't reread all of RDS fine print to double-check this, but my
impression is that RDS semantics exactly match the subset of MPI
point-to-point communications where the receiving rank is required
to have pre-posted buffers before the send is allowed.


Re: [openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Sean Hefty


Eric Barton wrote:
> 1. How stable is the CM API and is it supported by all OpenIB affiliated
>    vendors?

The IB CM API is stable.  Changes might occur as a result of changes to the CM
protocol itself, but that effect is not limited to just the openib API.

The RDMA CMA API is fairly stable, but could still see minor changes.  This
would be the better connection API to use if you want to connect using IP
addresses.

> 2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per peer.  Is this going
>    to get me into trouble?
>
>    For example, I currently create a single PD and CQ for everything, however
>    the example I've seen (cmatose.c) appears to create these separately for
>    each peer.  Is that what I should be doing too?

Cmatose is just a simple example program that I use for testing.  If you're
trying to scale out to 10,000 nodes, you'll want to limit your resources.  For
example, I've never been able to run cmatose with 10,000 connections without
running out of resources on my system.

Note that the IB CM does not implement a peer to peer connection model yet, so
you would need to establish your connections using the client/server model.

> 4. Are RDMA reads still deprecated?  Which resources hit the spotlight if I
>    chose to use them?

RDMA reads are fully supported.  Not sure what led you to think that they were
deprecated.

> 6. Does Gen2 present substantially the same APIs as the kernel in userspace?
>    So if I wrote a userspace equivalent of my kernel driver, could I have pure
>    userspace clients talk to kernel servers?

Most of the APIs are similar.  There shouldn't be any issues talking between
userspace clients and kernel servers.

- Sean

RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Caitlin Bestler

 


Mike Krause wrote in response to Greg Lindahl:

> If it is to be reasonably robust, then RDS should be required to
> support the resync between the two sides of the communication.  This
> aligns with the stated objective of implementing reliability in one
> location in software and one location in hardware.  Without such
> resync being required in the ULP, then one ends up with a ULP that
> falls short of its stated objectives and pushes complexity back up
> to the application which is where the advocates have stated it is too
> complex or expensive to get it correct.

>> This sort of message service, by the way, has a long
>> history in distributed computing.

> Yep.

I haven't reread all of RDS fine print to double-check this, but my
impression is that RDS semantics exactly match the subset of MPI
point-to-point communications where the receiving rank is required
to have pre-posted buffers before the send is allowed.


 


[openib-general] Lustre over OpenIB Gen2

2005-11-10 Thread Eric Barton

Hi,

I'm working with Cluster File Systems on lustre network drivers, including IB
drivers for the Voltaire, Infinicon and Topspin stacks.  These are kernel
drivers which use RC QPs with VERBS for small message queueing and RDMA for
bulk transfers.

We're obviously looking at OpenIB Gen2, and I wonder if people could be so kind
as to answer some questions for me.

1. How stable is the CM API and is it supported by all OpenIB affiliated
   vendors?

2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per peer.  Is this going
   to get me into trouble?

   For example, I currently create a single PD and CQ for everything, however
   the example I've seen (cmatose.c) appears to create these separately for
   each peer.  Is that what I should be doing too?

3. Is contiguous memory allocation an issue in Gen2?  Since this is such a
   scarce resource in the kernel (and particular CQ usage with one vendor's
   stack relied heavily on it) what red flags should I be aware of?

4. Are RDMA reads still deprecated?  Which resources hit the spotlight if I
   chose to use them?

5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
   This avoids any mapping overhead at the expense of having much larger
   numbers of queued RDMAs.  Since I try to keep up to 8 (by default) 1MByte
   RDMAs active concurrently to any individual peer, with 4k pages I can have
   up to 2048 RDMA work items queued at a time per peer.

   And if I pre-map, can I be guaranteed that if I put the CQ into the error
   state, all remote access to my memory is revoked (e.g. could a CQ I create
   after I destroy the one I just shut down somehow alias with it such that a
   pathologically delayed RDMA could write my memory)?

   Or is it better to use FMR pools and take the map/unmap overhead?  If so, is
   there a way to know when the unmap actually hits the hardware and my memory
   is safe?

6. Does Gen2 present substantially the same APIs as the kernel in userspace?
   So if I wrote a userspace equivalent of my kernel driver, could I have pure
   userspace clients talk to kernel servers?

Thanks in advance...

-- 

Cheers,
Eric

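-------------------------------------------------
| Eric Barton        Barton Software            |
| 9 York Gardens     Tel:    +44 (117) 330 1575 |
| Clifton            Mobile: +44 (7909) 680 356 |
| Bristol BS8 4LL    Fax:    call first         |
| United Kingdom     E-Mail: --                 |
-------------------------------------------------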

[openib-general] [git patch review 7/7] [IB] umad: further ib_unregister_mad_agent() deadlock fixes

2005-11-10 Thread Roland Dreier

The previous umad deadlock fix left ib_umad_kill_port() still
vulnerable to deadlocking.  This patch fixes that by downgrading our
lock to a read lock when we might end up trying to reacquire the lock
for reading.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/core/user_mad.c |   87 ++--
 1 files changed, 63 insertions(+), 24 deletions(-)

applies-to: 17115437026be55dcd74641be21561fecf33dcdb
94382f3562e350ed7c8f7dcd6fc968bdece31328
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d61f544..5ea741f 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -31,7 +31,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: user_mad.c 2814 2005-07-06 19:14:09Z halr $
+ * $Id: user_mad.c 4010 2005-11-09 23:11:56Z roland $
  */
 
 #include 
@@ -110,12 +110,13 @@ struct ib_umad_device {
 };
 
 struct ib_umad_file {
-   struct ib_umad_port *port;
-   struct list_head     recv_list;
-   struct list_head     port_list;
-   spinlock_t           recv_lock;
-   wait_queue_head_t    recv_wait;
-   struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS];
+   struct ib_umad_port    *port;
+   struct list_head        recv_list;
+   struct list_head        port_list;
+   spinlock_t              recv_lock;
+   wait_queue_head_t       recv_wait;
+   struct ib_mad_agent    *agent[IB_UMAD_MAX_AGENTS];
+   int                     agents_dead;
 };
 
 struct ib_umad_packet {
@@ -144,6 +145,12 @@ static void ib_umad_release_dev(struct k
kfree(dev);
 }
 
+/* caller must hold port->mutex at least for reading */
+static struct ib_mad_agent *__get_agent(struct ib_umad_file *file, int id)
+{
+   return file->agents_dead ? NULL : file->agent[id];
+}
+
 static int queue_packet(struct ib_umad_file *file,
struct ib_mad_agent *agent,
struct ib_umad_packet *packet)
@@ -151,10 +158,11 @@ static int queue_packet(struct ib_umad_f
int ret = 1;
 
down_read(&file->port->mutex);
+
for (packet->mad.hdr.id = 0;
 packet->mad.hdr.id < IB_UMAD_MAX_AGENTS;
 packet->mad.hdr.id++)
-   if (agent == file->agent[packet->mad.hdr.id]) {
+   if (agent == __get_agent(file, packet->mad.hdr.id)) {
spin_lock_irq(&file->recv_lock);
list_add_tail(&packet->list, &file->recv_list);
spin_unlock_irq(&file->recv_lock);
@@ -326,7 +334,7 @@ static ssize_t ib_umad_write(struct file
 
down_read(&file->port->mutex);
 
-   agent = file->agent[packet->mad.hdr.id];
+   agent = __get_agent(file, packet->mad.hdr.id);
if (!agent) {
ret = -EINVAL;
goto err_up;
@@ -480,7 +488,7 @@ static int ib_umad_reg_agent(struct ib_u
}
 
for (agent_id = 0; agent_id < IB_UMAD_MAX_AGENTS; ++agent_id)
-   if (!file->agent[agent_id])
+   if (!__get_agent(file, agent_id))
goto found;
 
ret = -ENOMEM;
@@ -530,7 +538,7 @@ static int ib_umad_unreg_agent(struct ib
 
down_write(&file->port->mutex);
 
-   if (id < 0 || id >= IB_UMAD_MAX_AGENTS || !file->agent[id]) {
+   if (id < 0 || id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) {
ret = -EINVAL;
goto out;
}
@@ -608,21 +616,29 @@ static int ib_umad_close(struct inode *i
struct ib_umad_file *file = filp->private_data;
struct ib_umad_device *dev = file->port->umad_dev;
struct ib_umad_packet *packet, *tmp;
+   int already_dead;
int i;
 
-   for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i)
-   if (file->agent[i])
-   ib_unregister_mad_agent(file->agent[i]);
+   down_write(&file->port->mutex);
+
+   already_dead = file->agents_dead;
+   file->agents_dead = 1;
 
list_for_each_entry_safe(packet, tmp, &file->recv_list, list)
kfree(packet);
 
-   down_write(&file->port->mutex);
list_del(&file->port_list);
-   up_write(&file->port->mutex);
 
-   kfree(file);
+   downgrade_write(&file->port->mutex);
+
+   if (!already_dead)
+   for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i)
+   if (file->agent[i])
+   ib_unregister_mad_agent(file->agent[i]);
 
+   up_read(&file->port->mutex);
+
+   kfree(file);
kref_put(&dev->ref, ib_umad_release_dev);
 
return 0;
@@ -848,13 +864,36 @@ static void ib_umad_kill_port(struct ib_
 
port->ib_dev = NULL;
 
-   list_for_each_entry(file, &port->file_list, port_list)
-   for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) {
-   if (!file->agent[id])
-   continue;
-

[openib-general] [git patch review 5/7] [IB] mthca: fix wraparound handling in mthca_cq_clean()

2005-11-10 Thread Roland Dreier

Handle case where prod_index has wrapped around and become less than
cq->cons_index by checking that their difference as a signed int is
positive rather than comparing directly.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---
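A small standalone sketch of the comparison the description refers to. The helper names are hypothetical and not part of mthca, and this variant subtracts in unsigned arithmetic before casting, which differs slightly from the expression in the patch. It assumes two's-complement conversion and that the two indices are never 2^31 or more apart.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): nonzero when the
 * free-running 32-bit index a is at or ahead of b, even if a has
 * wrapped past zero.
 */
static int index_at_or_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Count the entries in [cons, prod) by walking backwards from prod,
 * in the same style as the sweep loop in mthca_cq_clean().
 */
static unsigned int count_pending(uint32_t prod, uint32_t cons)
{
	unsigned int n = 0;

	/* The signed-difference test only works within a 2^31 window. */
	assert((uint32_t)(prod - cons) < 0x80000000u);

	while (index_at_or_after(--prod, cons))
		++n;
	return n;
}
```

With prod = 2 and cons = 0xfffffffe a direct "prod > cons" test is false and the sweep would be skipped, while the signed difference still sees four pending entries.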

 drivers/infiniband/hw/mthca/mthca_cq.c |   16 ++--
 1 files changed, 6 insertions(+), 10 deletions(-)

applies-to: 704990abeb22a51ed2722e92536d22135f60957f
64044bcf75063cb5a6d42712886a712449df2ce3
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index f98e235..4a8adce 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -258,7 +258,7 @@ void mthca_cq_clean(struct mthca_dev *de
 {
struct mthca_cq *cq;
struct mthca_cqe *cqe;
-   int prod_index;
+   u32 prod_index;
int nfreed = 0;
 
spin_lock_irq(&dev->cq_table.lock);
@@ -293,19 +293,15 @@ void mthca_cq_clean(struct mthca_dev *de
 * Now sweep backwards through the CQ, removing CQ entries
 * that match our QP by copying older entries on top of them.
 */
-   while (prod_index > cq->cons_index) {
-   cqe = get_cqe(cq, (prod_index - 1) & cq->ibcq.cqe);
+   while ((int) --prod_index - (int) cq->cons_index >= 0) {
+   cqe = get_cqe(cq, prod_index & cq->ibcq.cqe);
if (cqe->my_qpn == cpu_to_be32(qpn)) {
if (srq)
mthca_free_srq_wqe(srq, be32_to_cpu(cqe->wqe));
++nfreed;
-   }
-   else if (nfreed)
-   memcpy(get_cqe(cq, (prod_index - 1 + nfreed) &
-  cq->ibcq.cqe),
-  cqe,
-  MTHCA_CQ_ENTRY_SIZE);
-   --prod_index;
+   } else if (nfreed)
+   memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe),
+  cqe, MTHCA_CQ_ENTRY_SIZE);
}
 
if (nfreed) {
---
0.99.9e

[openib-general] [git patch review 4/7] [IB] mthca: fix posting of atomic operations

2005-11-10 Thread Roland Dreier

The size of work requests for atomic operations was computed
incorrectly in mthca: all sizeofs need to be divided by 16.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---
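To illustrate the arithmetic: the fix suggests the size field counts 16-byte units, so every segment's byte size has to be scaled. The struct layouts below are stand-ins chosen to be 16 bytes each; they are assumptions for this sketch, not the real mthca definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in layouts, assumed to be 16 bytes each. */
struct raddr_seg  { uint64_t raddr; uint32_t rkey; uint32_t reserved; };
struct atomic_seg { uint64_t swap_add; uint64_t compare; };

/* Convert a byte count to 16-byte units. */
static unsigned int bytes_to_units(size_t bytes)
{
	assert(bytes % 16 == 0);
	return (unsigned int)(bytes / 16);
}

/* Corrected form: sum both segments, then scale. */
static unsigned int atomic_units_fixed(void)
{
	return bytes_to_units(sizeof(struct raddr_seg) +
			      sizeof(struct atomic_seg));
}

/* Old form: only the first sizeof is scaled, the second stays in bytes. */
static unsigned int atomic_units_buggy(void)
{
	return (unsigned int)(sizeof(struct raddr_seg) / 16 +
			      sizeof(struct atomic_seg));
}
```

Under these assumed layouts the corrected expression adds 2 units to the size, while the old one adds 17.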

 drivers/infiniband/hw/mthca/mthca_qp.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

applies-to: 308dce81364b1cbb563942a1a57146c1808e8911
62abb8416f1923f4cef50ce9ce841b919275e3fb
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 7f39af4..190c1dc 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1556,8 +1556,8 @@ int mthca_tavor_post_send(struct ib_qp *
}
 
wqe += sizeof (struct mthca_atomic_seg);
-   size += sizeof (struct mthca_raddr_seg) / 16 +
-   sizeof (struct mthca_atomic_seg);
+   size += (sizeof (struct mthca_raddr_seg) +
+sizeof (struct mthca_atomic_seg)) / 16;
break;
 
case IB_WR_RDMA_WRITE:
@@ -1876,8 +1876,8 @@ int mthca_arbel_post_send(struct ib_qp *
}
 
wqe += sizeof (struct mthca_atomic_seg);
-   size += sizeof (struct mthca_raddr_seg) / 16 +
-   sizeof (struct mthca_atomic_seg);
+   size += (sizeof (struct mthca_raddr_seg) +
+sizeof (struct mthca_atomic_seg)) / 16;
break;
 
case IB_WR_RDMA_READ:
---
0.99.9e

[openib-general] [git patch review 6/7] [IB] mthca: fix posting long lists of receive work requests

2005-11-10 Thread Roland Dreier

In Tavor mode, when posting a long list of receive work requests, a
doorbell must be rung every 256 requests.  Add code to do this when
required.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---
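A userspace sketch of the batching pattern described above, with the hardware access replaced by a counter. ring_doorbell(), struct db_stats and post_recv_list() are hypothetical stand-ins; only the constant 256 comes from the patch.

```c
#include <assert.h>

enum { MAX_WQES_PER_DB = 256 };	/* mirrors MTHCA_TAVOR_MAX_WQES_PER_RECV_DB */

struct db_stats {
	unsigned int doorbells;	/* how many times the doorbell was rung */
	unsigned int posted;	/* requests covered by those rings */
};

/* Hypothetical stand-in for the doorbell register write. */
static void ring_doorbell(struct db_stats *s, unsigned int nreq)
{
	assert(nreq > 0 && nreq <= MAX_WQES_PER_DB);
	s->doorbells++;
	s->posted += nreq;
}

/* Post 'total' receive requests, flushing after every full batch. */
static struct db_stats post_recv_list(unsigned int total)
{
	struct db_stats s = { 0, 0 };
	unsigned int nreq = 0;
	unsigned int i;

	for (i = 0; i < total; ++i, ++nreq) {
		if (nreq == MAX_WQES_PER_DB) {
			ring_doorbell(&s, nreq);
			nreq = 0;
		}
		/* ... one receive descriptor would be built here ... */
	}

	/* Final doorbell for whatever is left, as in the existing code. */
	if (nreq)
		ring_doorbell(&s, nreq);
	return s;
}
```

Posting 600 requests this way rings the doorbell three times (256 + 256 + 88). In the patch itself the intermediate doorbell word is written without OR-ing in nreq, which appears to encode a full batch as 0 in the low 8 bits.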

 drivers/infiniband/hw/mthca/mthca_qp.c  |   19 +--
 drivers/infiniband/hw/mthca/mthca_srq.c |   22 --
 drivers/infiniband/hw/mthca/mthca_wqe.h |3 ++-
 3 files changed, 39 insertions(+), 5 deletions(-)

applies-to: 984d2fc62c548af3d01450135f33b5b97aecf00b
ae57e24a4006fd46b73d842ee99db9580ef74a02
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 190c1dc..760c418 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1707,6 +1707,7 @@ int mthca_tavor_post_receive(struct ib_q
 {
struct mthca_dev *dev = to_mdev(ibqp->device);
struct mthca_qp *qp = to_mqp(ibqp);
+   __be32 doorbell[2];
unsigned long flags;
int err = 0;
int nreq;
@@ -1724,6 +1725,22 @@ int mthca_tavor_post_receive(struct ib_q
ind = qp->rq.next_ind;
 
for (nreq = 0; wr; ++nreq, wr = wr->next) {
+   if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+   nreq = 0;
+
+   doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
+   doorbell[1] = cpu_to_be32(qp->qpn << 8);
+
+   wmb();
+
+   mthca_write64(doorbell,
+ dev->kar + MTHCA_RECEIVE_DOORBELL,
+ MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+   qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
+   size0 = 0;
+   }
+
if (mthca_wq_overflow(&qp->rq, nreq, qp->ibqp.recv_cq)) {
mthca_err(dev, "RQ %06x full (%u head, %u tail,"
" %d max, %d nreq)\n", qp->qpn,
@@ -1781,8 +1798,6 @@ int mthca_tavor_post_receive(struct ib_q
 
 out:
if (likely(nreq)) {
-   __be32 doorbell[2];
-
doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
doorbell[1] = cpu_to_be32((qp->qpn << 8) | nreq);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index 292f55b..c3c0331 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -414,6 +414,7 @@ int mthca_tavor_post_srq_recv(struct ib_
 {
struct mthca_dev *dev = to_mdev(ibsrq->device);
struct mthca_srq *srq = to_msrq(ibsrq);
+   __be32 doorbell[2];
unsigned long flags;
int err = 0;
int first_ind;
@@ -429,6 +430,25 @@ int mthca_tavor_post_srq_recv(struct ib_
first_ind = srq->first_free;
 
for (nreq = 0; wr; ++nreq, wr = wr->next) {
+   if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+   nreq = 0;
+
+   doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
+   doorbell[1] = cpu_to_be32(srq->srqn << 8);
+
+   /*
+* Make sure that descriptors are written
+* before doorbell is rung.
+*/
+   wmb();
+
+   mthca_write64(doorbell,
+ dev->kar + MTHCA_RECEIVE_DOORBELL,
+ MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+   first_ind = srq->first_free;
+   }
+
ind = srq->first_free;
 
if (ind < 0) {
@@ -491,8 +511,6 @@ int mthca_tavor_post_srq_recv(struct ib_
}
 
if (likely(nreq)) {
-   __be32 doorbell[2];
-
doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
doorbell[1] = cpu_to_be32((srq->srqn << 8) | nreq);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_wqe.h b/drivers/infiniband/hw/mthca/mthca_wqe.h
index 1f4c0ff..73f1c0b 100644
--- a/drivers/infiniband/hw/mthca/mthca_wqe.h
+++ b/drivers/infiniband/hw/mthca/mthca_wqe.h
@@ -49,7 +49,8 @@ enum {
 };
 
 enum {
-   MTHCA_INVAL_LKEY = 0x100
+   MTHCA_INVAL_LKEY                 = 0x100,
+   MTHCA_TAVOR_MAX_WQES_PER_RECV_DB = 256
 };
 
 struct mthca_next_seg {
---
0.99.9e

[openib-general] [git patch review 2/7] [IB] umad: get rid of unused mr array

2005-11-10 Thread Roland Dreier

Now that ib_umad uses the new MAD sending interface, it no longer
needs its own L_Key.  So just delete the array of MRs that it keeps.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/core/user_mad.c |   29 -
 1 files changed, 4 insertions(+), 25 deletions(-)

applies-to: e7b9ffe6fca9246f29a0a3cdf6417770f5821cef
ec914c52d6208d8752dfd85b48a9aff304911434
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index f5ed36c..d61f544 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -116,7 +116,6 @@ struct ib_umad_file {
spinlock_t   recv_lock;
wait_queue_head_t recv_wait;
struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS];
-   struct ib_mr *mr[IB_UMAD_MAX_AGENTS];
 };
 
 struct ib_umad_packet {
@@ -505,29 +504,16 @@ found:
goto out;
}
 
-   file->mr[agent_id] = ib_get_dma_mr(agent->qp->pd, IB_ACCESS_LOCAL_WRITE);
-   if (IS_ERR(file->mr[agent_id])) {
-   ret = -ENOMEM;
-   goto err;
-   }
-
if (put_user(agent_id,
 (u32 __user *) (arg + offsetof(struct ib_user_mad_reg_req, id)))) {
ret = -EFAULT;
-   goto err_mr;
+   ib_unregister_mad_agent(agent);
+   goto out;
}
 
file->agent[agent_id] = agent;
ret = 0;
 
-   goto out;
-
-err_mr:
-   ib_dereg_mr(file->mr[agent_id]);
-
-err:
-   ib_unregister_mad_agent(agent);
-
 out:
up_write(&file->port->mutex);
return ret;
@@ -536,7 +522,6 @@ out:
 static int ib_umad_unreg_agent(struct ib_umad_file *file, unsigned long arg)
 {
struct ib_mad_agent *agent = NULL;
-   struct ib_mr *mr = NULL;
u32 id;
int ret = 0;
 
@@ -551,16 +536,13 @@ static int ib_umad_unreg_agent(struct ib
}
 
agent = file->agent[id];
-   mr = file->mr[id];
file->agent[id] = NULL;
 
 out:
up_write(&file->port->mutex);
 
-   if (agent) {
+   if (agent)
ib_unregister_mad_agent(agent);
-   ib_dereg_mr(mr);
-   }
 
return ret;
 }
@@ -629,10 +611,8 @@ static int ib_umad_close(struct inode *i
int i;
 
for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i)
-   if (file->agent[i]) {
+   if (file->agent[i])
ib_unregister_mad_agent(file->agent[i]);
-   ib_dereg_mr(file->mr[i]);
-   }
 
list_for_each_entry_safe(packet, tmp, &file->recv_list, list)
kfree(packet);
@@ -872,7 +852,6 @@ static void ib_umad_kill_port(struct ib_
for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) {
if (!file->agent[id])
continue;
-   ib_dereg_mr(file->mr[id]);
ib_unregister_mad_agent(file->agent[id]);
file->agent[id] = NULL;
}
---
0.99.9e

[openib-general] [git patch review 3/7] [IB] uverbs: have kernel return QP capabilities

2005-11-10 Thread Roland Dreier

Move the computation of QP capabilities (max scatter/gather entries,
max inline data, etc) into the kernel, and have the uverbs module
return the values as part of the create QP response.  This keeps
precise knowledge of device limits in the low-level kernel driver.

This requires an ABI bump, so while we're making changes, get rid of
the max_sge parameter for the modify SRQ command -- it's not used and
shouldn't be there.

Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]>
Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---
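A rough sketch of how a user-space caller might consume the extra response fields. Both structs are hypothetical mirrors built from the field names visible in this patch (qp_handle, max_send_wr, max_recv_wr, max_send_sge, max_recv_sge, max_inline_data); the real ABI layout in ib_user_verbs.h, including any padding or ordering, is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the create QP response fields named in the patch. */
struct create_qp_resp {
	uint32_t qp_handle;
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
	uint32_t max_inline_data;
};

/* Hypothetical user-space view of a QP's capabilities. */
struct qp_cap {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	uint32_t max_send_sge;
	uint32_t max_recv_sge;
	uint32_t max_inline_data;
};

/*
 * Replace the capabilities the caller asked for with the values the
 * kernel reported, instead of recomputing device limits in user space.
 */
static void qp_cap_from_resp(struct qp_cap *cap,
			     const struct create_qp_resp *resp)
{
	assert(cap && resp);
	cap->max_send_wr     = resp->max_send_wr;
	cap->max_recv_wr     = resp->max_recv_wr;
	cap->max_send_sge    = resp->max_send_sge;
	cap->max_recv_sge    = resp->max_recv_sge;
	cap->max_inline_data = resp->max_inline_data;
}
```

The point of the sketch is only the direction of the data flow: the caller requests capabilities, and the values that come back from the kernel are treated as authoritative.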

 drivers/infiniband/core/uverbs_cmd.c |   12 ++--
 drivers/infiniband/hw/mthca/mthca_cmd.c  |2 +
 drivers/infiniband/hw/mthca/mthca_dev.h  |1 
 drivers/infiniband/hw/mthca/mthca_main.c |1 
 drivers/infiniband/hw/mthca/mthca_provider.c |2 -
 drivers/infiniband/hw/mthca/mthca_provider.h |1 
 drivers/infiniband/hw/mthca/mthca_qp.c   |   86 --
 include/rdma/ib_user_verbs.h |9 ++-
 8 files changed, 98 insertions(+), 16 deletions(-)

applies-to: 2741f22c820fb664f6958becc4f3d415eea0e61b
77369ed31daac51f4827c50d30f233c45480235a
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 63a7415..ed45da8 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -708,7 +708,7 @@ ssize_t ib_uverbs_poll_cq(struct ib_uver
resp->wc[i].opcode = wc[i].opcode;
resp->wc[i].vendor_err = wc[i].vendor_err;
resp->wc[i].byte_len   = wc[i].byte_len;
-   resp->wc[i].imm_data   = wc[i].imm_data;
+   resp->wc[i].imm_data   = (__u32 __force) wc[i].imm_data;
resp->wc[i].qp_num = wc[i].qp_num;
resp->wc[i].src_qp = wc[i].src_qp;
resp->wc[i].wc_flags   = wc[i].wc_flags;
@@ -908,7 +908,12 @@ retry:
if (ret)
goto err_destroy;
 
-   resp.qp_handle = uobj->uobject.id;
+   resp.qp_handle   = uobj->uobject.id;
+   resp.max_recv_sge= attr.cap.max_recv_sge;
+   resp.max_send_sge= attr.cap.max_send_sge;
+   resp.max_recv_wr = attr.cap.max_recv_wr;
+   resp.max_send_wr = attr.cap.max_send_wr;
+   resp.max_inline_data = attr.cap.max_inline_data;
 
if (copy_to_user((void __user *) (unsigned long) cmd.response,
 &resp, sizeof resp)) {
@@ -1135,7 +1140,7 @@ ssize_t ib_uverbs_post_send(struct ib_uv
next->num_sge= user_wr->num_sge;
next->opcode = user_wr->opcode;
next->send_flags = user_wr->send_flags;
-   next->imm_data   = user_wr->imm_data;
+   next->imm_data   = (__be32 __force) user_wr->imm_data;
 
if (qp->qp_type == IB_QPT_UD) {
next->wr.ud.ah = idr_find(&ib_uverbs_ah_idr,
@@ -1701,7 +1706,6 @@ ssize_t ib_uverbs_modify_srq(struct ib_u
}
 
attr.max_wr= cmd.max_wr;
-   attr.max_sge   = cmd.max_sge;
attr.srq_limit = cmd.srq_limit;
 
ret = ib_modify_srq(srq, &attr, cmd.attr_mask);
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 49f211d..9ed3458 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -1060,6 +1060,8 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev
dev_lim->hca.arbel.resize_srq = field & 1;
MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SG_RQ_OFFSET);
dev_lim->max_sg = min_t(int, field, dev_lim->max_sg);
+   MTHCA_GET(size, outbox, QUERY_DEV_LIM_MAX_DESC_SZ_RQ_OFFSET);
+   dev_lim->max_desc_sz = min_t(int, size, dev_lim->max_desc_sz);
MTHCA_GET(size, outbox, QUERY_DEV_LIM_MPT_ENTRY_SZ_OFFSET);
dev_lim->mpt_entry_sz = size;
MTHCA_GET(field, outbox, QUERY_DEV_LIM_PBL_SZ_OFFSET);
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 808037f..497ff79 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -131,6 +131,7 @@ struct mthca_limits {
int  max_sg;
int  num_qps;
int  max_wqes;
+   int  max_desc_sz;
int  max_qp_init_rdma;
int  reserved_qps;
int  num_srqs;
diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c
index 16594d1..147f248 100644
--- a/drivers/infiniband/hw/mthca/mthca_main.c
+++ b/drivers/infiniband/hw/mthca/mthca_main.c
@@ -168,6 +168,7 @@ static int __devinit mthca_dev_lim(struc
mdev->limits.max_srq_wqes   = dev_lim->max_srq_sz;
mdev->limits.reserved_srqs  = dev_lim->reserved_srqs;
mdev->limits.reserved_eecs  =

[openib-general] [git patch review 1/7] [IB] Have cq_resize() method take an int, not int*

2005-11-10 Thread Roland Dreier

Change the struct ib_device.resize_cq() method to take a plain integer
that holds the new CQ size, rather than a pointer to an integer that
it uses to return the new size.  This makes the interface match the
exported ib_resize_cq() signature, and allows the low-level driver to
update the CQ size with proper locking if necessary.

No in-tree drivers are exporting this method yet.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---
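A standalone sketch of the calling convention after this change. The struct definitions are pared-down stand-ins for ib_device and ib_cq, and demo_resize_cq() is a hypothetical driver method (its round-up policy and size limit are assumptions); only the dispatch in resize_cq() follows the patched ib_resize_cq().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cq;

/* Pared-down stand-ins for struct ib_device and struct ib_cq. */
struct device {
	int (*resize_cq)(struct cq *cq, int cqe);
};

struct cq {
	struct device *device;
	int cqe;
};

/*
 * Hypothetical driver method: takes the requested size by value and
 * stores the size it actually chose in cq->cqe itself. A real driver
 * could do that store under its own lock.
 */
static int demo_resize_cq(struct cq *cq, int cqe)
{
	int size = 1;

	if (cqe < 1 || cqe > (1 << 20))
		return -EINVAL;

	/* Assumed policy: round cqe + 1 up to a power of two. */
	while (size < cqe + 1)
		size <<= 1;

	cq->cqe = size - 1;
	return 0;
}

/* Same shape as the patched core wrapper: dispatch or report -ENOSYS. */
static int resize_cq(struct cq *cq, int cqe)
{
	assert(cq != NULL && cq->device != NULL);
	return cq->device->resize_cq ?
		cq->device->resize_cq(cq, cqe) : -ENOSYS;
}
```

Because the core no longer writes cq->cqe after the call, the driver is the only place where the visible CQ size changes.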

 drivers/infiniband/core/verbs.c |   12 ++--
 include/rdma/ib_verbs.h |2 +-
 2 files changed, 3 insertions(+), 11 deletions(-)

applies-to: 08d94f59d6f80937db5d87f0bb60eafcedd811d1
40de2e548c225e3ef859e3c60de9785e37e1b5b1
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 72d3ef7..4f51d79 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -324,16 +324,8 @@ EXPORT_SYMBOL(ib_destroy_cq);
 int ib_resize_cq(struct ib_cq *cq,
  int   cqe)
 {
-   int ret;
-
-   if (!cq->device->resize_cq)
-   return -ENOSYS;
-
-   ret = cq->device->resize_cq(cq, &cqe);
-   if (!ret)
-   cq->cqe = cqe;
-
-   return ret;
+   return cq->device->resize_cq ?
+   cq->device->resize_cq(cq, cqe) : -ENOSYS;
 }
 EXPORT_SYMBOL(ib_resize_cq);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f72d46d..a7f4c35 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -881,7 +881,7 @@ struct ib_device {
struct ib_ucontext *context,
struct ib_udata *udata);
int (*destroy_cq)(struct ib_cq *cq);
-   int (*resize_cq)(struct ib_cq *cq, int *cqe);
+   int (*resize_cq)(struct ib_cq *cq, int cqe);
int (*poll_cq)(struct ib_cq *cq, int num_entries,
  struct ib_wc *wc);
int (*peek_cq)(struct ib_cq *cq, int wc_cnt);
---
0.99.9e

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Michael Krause



At 02:09 PM 11/9/2005, Greg Lindahl wrote:
> On Wed, Nov 09, 2005 at 01:57:06PM -0800, Michael Krause wrote:
> > What you indicate above is that RDS will implement a resync of the
> > two sides of the association to determine what has been
> > successfully sent.
>
> More accurate to say that it "could" implement that. I'm just
> kibitzing on someone else's proposal.
>
> > This then implies that the reliability of the underlying
> > interconnect isn't as critical per se as the end-to-end RDS protocol
> > will assure that data is delivered to the RDS components in the face
> > of hardware failures.  Correct?
>
> Yes. That's the intent that I see in the proposal. The implementation
> required to actually support this may not be what the proposers had in
> mind.

If it is to be reasonably robust, then RDS should be required to support
the resync between the two sides of the communication.  This aligns
with the stated objective of implementing reliability in one location in
software and one location in hardware.  Without such resync being
required in the ULP, one ends up with a ULP that falls short of its
stated objectives and pushes complexity back up to the application, which
is where the advocates have stated it is too complex or expensive to get
it correct.

> This sort of message service, by the way, has a long history in
> distributed computing.

Yep.

Mike


[openib-general] Re: [PATCH] libmthca: fix posting long wqe lists for srq

2005-11-10 Thread Roland Dreier

Thanks -- I had basically the same thing in my local working directory
but forgot to commit it.

 - R.

[openib-general] [PATCH] libmthca: fix posting long wqe lists for srq

2005-11-10 Thread Michael S. Tsirkin

Fix posting long WQE lists for SRQ.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: src/userspace/libmthca/src/srq.c
===
--- src/userspace/libmthca/src/srq.c (revision 4016)
+++ src/userspace/libmthca/src/srq.c (working copy)
@@ -99,6 +99,7 @@ int mthca_tavor_post_srq_recv(struct ibv
 
for (nreq = 0; wr; ++nreq, wr = wr->next) {
if (nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
+   nreq = 0;
doorbell[0] = htonl(first_ind << srq->wqe_shift);
doorbell[1] = htonl((srq->srqn << 8) | nreq);
 

-- 
MST
