Re: [openib-general] OpenSM and Wrong SM_Key
On Wed, Nov 09, 2005 at 09:46:06AM +0200, Eitan Zahavi wrote:
> Hi Hal,
>
> I would like to bring this to MgtWG before we change anything.
> IMO the situation when this happens is really not "legal" since if the
> SM's are not coordinated at least in their SM_Key it will cause two
> masters on the subnet.
>
> From our experience it is always better to cause a fatal flow and exit
> the SM rather than report the event in some log - normally it will not
> be seen ...
>
> I know this is a controversial issue.

Okay, so you're telling me you *WANT* behavior where a rogue node can trivially cause the running subnet manager to exit and take over management of the network?

Opensm needs to have a well documented config file, instead of 3 pages of command line options, and different levels of logging. What to do in the above situation is a site-local policy config decision, not something that should be hard-coded in the SM source code.

The logs might actually get looked at if there wasn't junk in the log every time something timed out. The linux kernel has 'WARN, NOTICE, and CRITICAL' level log messages.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
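To make the "site-local policy" point concrete, here is a minimal sketch in C of a configurable reaction to a foreign SM_Key. Everything in it is hypothetical: the `demo_` names, the option strings, and the choice of logging as the fallback are illustrative assumptions, not OpenSM code or OpenSM options.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical policy values a site could pick in a config file. */
enum demo_smkey_policy {
	DEMO_POLICY_IGNORE,	/* drop the event silently */
	DEMO_POLICY_LOG,	/* log it and keep running */
	DEMO_POLICY_EXIT	/* treat it as fatal and exit the SM */
};

/*
 * Parse a hypothetical config value. Unknown strings fall back to
 * logging, so a typo in the config cannot turn into an SM exit.
 */
enum demo_smkey_policy demo_parse_policy(const char *value)
{
	assert(value != NULL);
	if (strcmp(value, "ignore") == 0)
		return DEMO_POLICY_IGNORE;
	if (strcmp(value, "exit") == 0)
		return DEMO_POLICY_EXIT;
	return DEMO_POLICY_LOG;
}

/* Return 1 if the SM should exit on an SM_Key mismatch, 0 otherwise. */
int demo_should_exit(enum demo_smkey_policy policy)
{
	return policy == DEMO_POLICY_EXIT;
}
```

With something along these lines, a site that prefers the fatal behavior could select it explicitly, while the shipped default would not let a single foreign packet stop the running SM.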
Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
>
> > The const confuses me somewhat.
>
> Yeah, and thinking about it more, the memory really belongs to the
> consumer of the function. So I don't think the const is even correct.
>
> > > extern int ibv_get_device_list(struct ibv_device ***list);
>
> > Is ***list really what we want here? Can we just get away with **list?
>
> Yes -- a single device is represented by a struct ibv_device *.
> So an array of devices is represented by a struct ibv_device **.
> And a pointer to such an array is struct ibv_device ***.
>
> But the following is OK too I think:
>
> extern int ibv_get_device_list(struct ibv_device **list[]);
> extern void ibv_free_device_list(struct ibv_device *list[]);
>
> is that clearer? (a pointer to an array of pointers to struct ibv_device).

Yes, this looks good.

> > Would something like:
> >
> > struct ibv_device * ibv_get_device(index);
> >
> > work as well?
>
> That could work as well. But it doesn't handle hotplug quite as well.
> By returning a snapshot of all the known devices at a given moment, we
> at least have a chance at doing something sensible with devices
> appearing or disappearing.
>
> - R.

I agree. With ibv_free_device_list we just need to document that the application is supposed to close devices it doesn't listen for hotplug on.

--
MST
RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
Sean Hefty wrote:
> Caitlin Bestler wrote:
>> Current CM software could generate the Service ID. Therefore the fact
>> that the Private Data is in the "new format" cannot be part of the
>> Service ID. Otherwise I agree with your analysis that data can be
>> moved to the Service ID. Which is more valuable,
>> 4 more bytes of private data or a much larger number of Service IDs,
>> is another topic.
>
> The CM would still need to know what range of service IDs can
> be generated. I don't believe that the range can overlap
> with an existing range that is already defined without
> needing to redefine service records and other items. The
> extra bit in essence becomes a 65th bit for the service ID in such
> cases.
>

How would you prevent someone using old CM software from forging their IP address in user mode and requesting the Service ID from an old CM implementation that did not know to check the newly standardized portion of what it thinks of as entirely "private" data?

By comparison, an RDMA application on an iWARP system cannot receive a "connection established" event until the IP address has been validated by the kernels at both ends and by the ability to round-trip with said IP address.

> The additional 4 bytes of private data come at an expense of
> consuming something like .006% of the service ID space.
>
>>> To be clear, the CM REQ _carries_ the IP address. There should be
>>> no requirement that the CM performs the mapping, and I see no
>>> reason why it should even care.
>>
>> The CM needs to have at least the capability of validating the local
>> IP address supplied.
>
> Validation can be done outside of the CM in a separate module.
>

That's fine. Just as long as an application that wants to cheat has to consider the possibility that the kernel might validate. Similarly ingress validation *might* be done in an IP network.
Re: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
Caitlin Bestler wrote:
> Current CM software could generate the Service ID. Therefore the fact
> that the Private Data is in the "new format" cannot be part of the
> Service ID. Otherwise I agree with your analysis that data can be
> moved to the Service ID. Which is more valuable,
> 4 more bytes of private data or a much larger number of Service IDs,
> is another topic.

The CM would still need to know what range of service IDs can be generated. I don't believe that the range can overlap with an existing range that is already defined without needing to redefine service records and other items. The extra bit in essence becomes a 65th bit for the service ID in such cases.

The additional 4 bytes of private data come at an expense of consuming something like .006% of the service ID space.

>> To be clear, the CM REQ _carries_ the IP address. There should be
>> no requirement that the CM performs the mapping, and I see no
>> reason why it should even care.
>
> The CM needs to have at least the capability of validating the local
> IP address supplied.

Validation can be done outside of the CM in a separate module.

- Sean
RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
>> If you want to maximize consumer usable private data, then
>> you can move the version, IP version, protocol, source and
>> destination ports into the service ID.
>
>Not at the expense of redefining what Service ID is.
>How do you propose to move all these fields into Service ID without
>violating IBTA spec Annex A3.2.? Remember, the Service ID is what the
>responder advertises and the requestor sends communication requests to.
>It may be possible for a server to advertise multiple service IDs to
>cover version and IP version variations, but it will not be symmetrical
>to iWARP. Port is port (service ID) and address is address. Port does
>not encode IP version.

The service ID could be formatted as:

Set ID: 24
Version: 4
IP version: 4
Src port: 16
Dst port: 16

I don't see how this violates the spec. Beyond the set ID, the rest is defined as "any". It's not necessary, but it does save 4 bytes of private data for the user.

>> Separately, if there's any defined mapping to a service ID or
>> set of service IDs, then the service ID indicates the format
>> of the private data. No additional information is needed in
>> the CM REQ, such as using a reserve bit.
>
>That is a good point.
>But this restricts the usage of IP addressing only to these ports.

It doesn't restrict the usage at all. It defines a portion of the private data for a specific range of service IDs, the same way it is done for SDP. There's no restriction that other service IDs not use the same format. Even with the proposal to use a reserved bit in the CM, a particular service could format its private data this way, not set the bit, and still be spec compliant.

>The question is which is easier to check: 1 bit or the Service ID.
>Of course, service ID will have to be checked anyhow to direct the
>request.

Exactly. If the service ID is checked anyway, why set the bit?

>While this overloads the semantic meaning of Service ID it is a viable
>method.

How is this not viable? There's a _working_ implementation today for both userspace and kernel mode clients to connect using IP addressing that didn't require any modifications to the IB CM.

>> To be clear, the CM REQ _carries_ the IP address. There
>> should be no requirement that the CM performs the mapping,
>> and I see no reason why it should even care.
>>
>Can you elaborate on this? Does this address who populates the
>formatted portion of the private data?

I'm referring to who formats the private data and performs the mapping to the service IDs (slide 13).

- Sean
Re: [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
> It seemed faintly preferable to tell the caller how big the array was
> rather than forcing the caller to count for itself.

If you really wanted that, I would be more inclined towards:

	struct ibv_device **ibv_get_device(int *length_ptr);

and if you do not want length, you could pass a null length_ptr. But since I also cannot think of a strong case for it, I prefer the cleaner interface of leaving it out.

Johann
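The two variants discussed here (a NULL-terminated array, optionally with a length out-parameter that may itself be NULL) can be combined. The sketch below shows one way that might look in C. It is not libibverbs code: `sketch_device`, `sketch_known`, and both functions are stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the real device struct; only an id, for illustration. */
struct sketch_device {
	int id;
};

/* Hypothetical table standing in for the devices the library knows about. */
static struct sketch_device sketch_known[] = { { 0 }, { 1 }, { 2 } };

/*
 * Return a malloc'ed, NULL-terminated array of device pointers, or NULL
 * on allocation failure. If length_ptr is non-NULL, the number of
 * devices (not counting the terminator) is stored through it.
 */
struct sketch_device **sketch_get_devices(int *length_ptr)
{
	size_t n = sizeof sketch_known / sizeof sketch_known[0];
	struct sketch_device **list = malloc((n + 1) * sizeof *list);
	size_t i;

	if (!list)
		return NULL;
	for (i = 0; i < n; i++)
		list[i] = &sketch_known[i];
	list[n] = NULL;
	if (length_ptr)
		*length_ptr = (int)n;
	return list;
}

/* What a caller does when it skipped the length: walk to the terminator. */
int sketch_count_devices(struct sketch_device * const *list)
{
	int n = 0;

	assert(list != NULL);
	while (list[n])
		n++;
	return n;
}
```

A caller that wants the length passes `&len`; one that does not passes NULL and either iterates until the terminator or calls the counting helper. The array is released with `free()` in this sketch.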
RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
[EMAIL PROTECTED] wrote:
> If you want to maximize consumer usable private data, then
> you can move the version, IP version, protocol, source and
> destination ports into the service ID.
>
> Separately, if there's any defined mapping to a service ID or
> set of service IDs, then the service ID indicates the format
> of the private data. No additional information is needed in
> the CM REQ, such as using a reserve bit.
>

Current CM software could generate the Service ID. Therefore the fact that the Private Data is in the "new format" cannot be part of the Service ID. Otherwise I agree with your analysis that data can be moved to the Service ID. Which is more valuable, 4 more bytes of private data or a much larger number of Service IDs, is another topic.

> To be clear, the CM REQ _carries_ the IP address. There
> should be no requirement that the CM performs the mapping,
> and I see no reason why it should even care.
>

The CM needs to have at least the capability of validating the local IP address supplied.
RE: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
Sean, comments inline.
Arkady

Arkady Kanevsky                       email: [EMAIL PROTECTED]
Network Appliance Inc.                phone: 781-768-5395
275 Totten Pond Rd.                   Fax: 781-895-1195
Waltham, MA 02451-2010                central phone: 781-768-5300

> -----Original Message-----
> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 6:01 PM
> To: Kanevsky, Arkady
> Cc: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
> Subject: Re: [openib-general] RE: [dat-discussions] socket
> based connection model for IB proposal - round 3
>
> If you want to maximize consumer usable private data, then
> you can move the version, IP version, protocol, source and
> destination ports into the service ID.

Not at the expense of redefining what Service ID is. How do you propose to move all these fields into Service ID without violating IBTA spec Annex A3.2.? Remember, the Service ID is what the responder advertises and the requestor sends communication requests to. It may be possible for a server to advertise multiple service IDs to cover version and IP version variations, but it will not be symmetrical to iWARP. Port is port (service ID) and address is address. Port does not encode IP version.

>
> Separately, if there's any defined mapping to a service ID or
> set of service IDs, then the service ID indicates the format
> of the private data. No additional information is needed in
> the CM REQ, such as using a reserve bit.

That is a good point. But this restricts the usage of IP addressing only to these ports. The question is which is easier to check: 1 bit or the Service ID. Of course, service ID will have to be checked anyhow to direct the request. While this overloads the semantic meaning of Service ID it is a viable method.

>
> To be clear, the CM REQ _carries_ the IP address. There
> should be no requirement that the CM performs the mapping,
> and I see no reason why it should even care.
>

Can you elaborate on this? Does this address who populates the formatted portion of the private data?

> - Sean
Re: (SPAM?) Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
Roland Dreier wrote:
> > Is ***list really what we want here? Can we just get away with **list?
>
> Yes -- a single device is represented by a struct ibv_device *.
> So an array of devices is represented by a struct ibv_device **.
> And a pointer to such an array is struct ibv_device ***.

I understand. This is just API that I've seen that used '***'. Why not just return a copy of the array?

> > Would something like:
> >
> > struct ibv_device * ibv_get_device(index);
> >
> > work as well?
>
> That could work as well. But it doesn't handle hotplug quite as well.
> By returning a snapshot of all the known devices at a given moment, we
> at least have a chance at doing something sensible with devices
> appearing or disappearing.

This doesn't seem any worse to me. The user can reference device_array[i] or call ibv_get_device(i). I need to spend more time understanding how userspace hotplug will work.

- Sean
Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
> I would prefer one call to get the entire structure. Another option might
> be:
>
> struct ibv_device ** ibv_get_device()
>
> where it returns a list which is null terminated so you do not need to
> return the length.

Yes, I thought of that too. It seemed faintly preferable to tell the caller how big the array was rather than forcing the caller to count for itself. But I can't really think of a use case where it makes a difference, so perhaps your simpler version is better.

 - R.
Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
> The const confuses me somewhat.

Yeah, and thinking about it more, the memory really belongs to the consumer of the function. So I don't think the const is even correct.

> > extern int ibv_get_device_list(struct ibv_device ***list);

> Is ***list really what we want here? Can we just get away with **list?

Yes -- a single device is represented by a struct ibv_device *. So an array of devices is represented by a struct ibv_device **. And a pointer to such an array is struct ibv_device ***.

But the following is OK too I think:

	extern int ibv_get_device_list(struct ibv_device **list[]);
	extern void ibv_free_device_list(struct ibv_device *list[]);

is that clearer? (a pointer to an array of pointers to struct ibv_device).

> Would something like:
>
> struct ibv_device * ibv_get_device(index);
>
> work as well?

That could work as well. But it doesn't handle hotplug quite as well. By returning a snapshot of all the known devices at a given moment, we at least have a chance at doing something sensible with devices appearing or disappearing.

 - R.
Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
> Is ***list really what we want here? Can we just get away with **list?
>
> Would something like:
>
> struct ibv_device * ibv_get_device(index);

I would prefer one call to get the entire structure. Another option might be:

	struct ibv_device ** ibv_get_device()

where it returns a list which is null terminated so you do not need to return the length.

Johann
Re: [openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
If you want to maximize consumer usable private data, then you can move the version, IP version, protocol, source and destination ports into the service ID.

Separately, if there's any defined mapping to a service ID or set of service IDs, then the service ID indicates the format of the private data. No additional information is needed in the CM REQ, such as using a reserve bit.

To be clear, the CM REQ _carries_ the IP address. There should be no requirement that the CM performs the mapping, and I see no reason why it should even care.

- Sean
Re: [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
> So how about just doing
>
> /* put list of devices in list and return length of list */
> extern int ibv_get_device_list(struct ibv_device * const **list);
>
> /* free a list of devices from ibv_get_device_list */
> extern void ibv_free_device_list(struct ibv_device * const *list);

I like it much better than what we have now. Clean, simple and easy to understand.

> Or are the consts too confusing? Should we be a little less safe but
> make it nice and simple and just do

I often find consts a bit of a nuisance for what they give me; but am fine either way. Both are simple enough.

Johann
Re: (SPAM?) [openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
Roland Dreier wrote:
> Or are the consts too confusing? Should we be a little less safe but
> make it nice and simple and just do

The const confuses me somewhat.

> extern int ibv_get_device_list(struct ibv_device ***list);

Is ***list really what we want here? Can we just get away with **list?

Would something like:

	struct ibv_device * ibv_get_device(index);

work as well?

- Sean
[openib-general] RE: [PATCHv1] userspace CMA
> . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to
> add it. Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed
> underscores and dashes are just too hard to type and look weird.
>
> . Please add a ChangeLog entry covering the libibverbs changes.

Here's an updated patch for just libibverbs.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>

Index: include/infiniband/sa-kern-abi.h
===================================================================
--- include/infiniband/sa-kern-abi.h	(revision 0)
+++ include/infiniband/sa-kern-abi.h	(revision 0)
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef SA_KERN_ABI_H
+#define SA_KERN_ABI_H
+
+#include
+
+struct ib_kern_path_rec {
+	__u8  dgid[16];
+	__u8  sgid[16];
+	__u16 dlid;
+	__u16 slid;
+	__u32 raw_traffic;
+	__u32 flow_label;
+	__u32 reversible;
+	__u32 mtu;
+	__u16 pkey;
+	__u8  hop_limit;
+	__u8  traffic_class;
+	__u8  numb_path;
+	__u8  sl;
+	__u8  mtu_selector;
+	__u8  rate_selector;
+	__u8  rate;
+	__u8  packet_life_time_selector;
+	__u8  packet_life_time;
+	__u8  preference;
+};
+
+#endif /* SA_KERN_ABI_H */
Index: include/infiniband/sa.h
===================================================================
--- include/infiniband/sa.h	(revision 0)
+++ include/infiniband/sa.h	(revision 0)
@@ -0,0 +1,130 @@
+/*
+ * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: sa.h 2616 2005-06-15 15:22:39Z halr $
+ */
+
+#ifndef IB_SA_H
+#define IB_SA_H
+
+#include
+
+enum ib_sa_rate {
+	IB_SA_RATE_2_5_GBPS = 2,
+	IB_SA_RATE_5_GBPS   = 5,
+	IB_SA_RATE_10_GBPS  = 3,
+	IB_SA_RATE_20_GBPS  = 6,
+	IB_SA_RATE_30_GBPS  = 4,
+	IB_SA_RATE_40_GBPS  = 7,
+	IB_SA_RATE_60_GBPS  = 8,
+	IB_SA_RATE_80_GBPS  = 9,
+	IB_SA_RATE_120_GBPS = 10
+};
+
+static inline int ib_sa_rate_enum_to_int(enum ib_sa_rate rate)
+{
+	switch (rate) {
+	case IB_SA_RATE_2_5_GBPS: return 1;
+	case IB_SA_RATE_5_GBPS:   ret
[openib-general] [RFC] new ibv_get_devices() API -- avoid dlists
    Michael> Maybe its a naming thing? We can call the list
    Michael> "iterator", does this make it less ugly?

I thought about this, but it feels like overkill for something pretty simple.

So how about just doing

	/* put list of devices in list and return length of list */
	extern int ibv_get_device_list(struct ibv_device * const **list);

	/* free a list of devices from ibv_get_device_list */
	extern void ibv_free_device_list(struct ibv_device * const *list);

which could be used as:

	struct ibv_device * const *list;
	int list_len;

	list_len = ibv_get_device_list(&list);

	/* ... */

	ibv_free_device_list(list);

Or are the consts too confusing? Should we be a little less safe but make it nice and simple and just do

	extern int ibv_get_device_list(struct ibv_device ***list);

and so on?

 - R.
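To see how the const-qualified pair of prototypes behaves in practice, here is a minimal self-contained sketch in C. It is not the libibverbs implementation: `demo_device`, `demo_known`, and the `demo_` functions are hypothetical stand-ins that only mirror the shape of the proposed signatures, with a static table playing the role of the devices the library has discovered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct ibv_device; only a name, for illustration. */
struct demo_device {
	char name[16];
};

/* Hypothetical table standing in for the devices found at startup. */
static struct demo_device demo_known[] = { { "dev0" }, { "dev1" } };

/*
 * Put a snapshot of the device list in *list and return its length,
 * or -1 if the snapshot could not be allocated.
 */
int demo_get_device_list(struct demo_device * const **list)
{
	size_t n = sizeof demo_known / sizeof demo_known[0];
	struct demo_device **snap = malloc(n * sizeof *snap);
	size_t i;

	assert(list != NULL);
	if (!snap)
		return -1;
	for (i = 0; i < n; i++)
		snap[i] = &demo_known[i];
	*list = snap;	/* T ** converts implicitly to T * const * */
	return (int)n;
}

/* Free a list obtained from demo_get_device_list. */
void demo_free_device_list(struct demo_device * const *list)
{
	/* The const only protects the caller's view; cast it away to free. */
	free((void *)list);
}
```

The const in `struct demo_device * const *` stops the caller from reseating entries in the array while still allowing the devices themselves to be modified. The cost is the cast inside the free function, which is part of why the thread questions whether the const is worth it.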
RE: [dat-discussions] RE: [openib-general] socket based connection model for IB proposal -round 3
If others agree I am happy to make the version a 4-bit field. We will use IPv4 encapsulation into IPv6 as defined by IETF.

0-based VA and remote invalidate are not relevant to IP addressing. But we are proposing a change to IB CM so we need to address all the differences between IB and iWARP. This is why these are addressed in the discussion.

If we have a protocol field then the CM will populate it based on the 5-tuple of socket_addr.

Arkady

Arkady Kanevsky                       email: [EMAIL PROTECTED]
Network Appliance Inc.                phone: 781-768-5395
275 Totten Pond Rd.                   Fax: 781-895-1195
Waltham, MA 02451-2010                central phone: 781-768-5300

> -----Original Message-----
> From: Fab Tillier [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 4:53 PM
> To: Kanevsky, Arkady; openib-general@openib.org;
> [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: [dat-discussions] RE: [openib-general] socket based
> connection model for IB proposal -round 3
>
> > From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 10, 2005 1:37 PM
> >
> > It will be discussed at IBTA SWG meeting next week Tu.
> > Please, post your comments before that.
>
> Looks fine to me overall. The only thing I would change is
> make the version field 4 bits rather than just 2, and shift
> the IP version down 2 bits, eliminating the reserved bits.
> That way, the first byte is split evenly between protocol
> version and IP version.
>
> Do we even need to indicate the IP version, or can IPv4
> addresses be expressed as IPv6 addresses just by zeroing the
> first 12 bytes?
>
> I don't understand the relevance of the 0-based VA or Send
> with Invalidate discussion points. They seem orthogonal to
> the socket-based CM proposal, and IMO should be moved to a
> separate proposal.
>
> I have no opinion one way or another on the presence of the
> protocol field. It could just as well be left as "flags" for
> the consumer to do with what they please.
>
> - Fab
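The two ideas in this exchange, a first byte split evenly into a 4-bit protocol version and a 4-bit IP version, and an IPv4 address carried in a 16-byte field with the first 12 bytes zeroed, can be sketched in C as below. The nibble order (protocol version in the high nibble) and every `demo_` name are assumptions for illustration. The zero-prefix layout follows the wording of the quoted question and may differ from whatever IETF encapsulation the proposal ends up using.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack protocol version (high nibble, assumed) and IP version (low nibble). */
uint8_t demo_pack_versions(unsigned proto_ver, unsigned ip_ver)
{
	return (uint8_t)(((proto_ver & 0xFu) << 4) | (ip_ver & 0xFu));
}

/* Store an IPv4 address in the last 4 bytes of a 16-byte field, rest zero. */
void demo_ipv4_to_16(const uint8_t v4[4], uint8_t out[16])
{
	assert(v4 != NULL && out != NULL);
	memset(out, 0, 12);
	memcpy(out + 12, v4, 4);
}

/* Return 1 if the first 12 bytes of a 16-byte address are all zero. */
int demo_has_zero_prefix(const uint8_t addr[16])
{
	static const uint8_t zeros[12];	/* zero-initialized by the language */

	return memcmp(addr, zeros, 12) == 0;
}
```

One caveat with relying on the zero prefix alone: the IPv6 loopback address `::1` also has its first 12 bytes zero, so a receiver could not tell it apart from an embedded IPv4 address. That is an argument for keeping an explicit IP version field.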
Re: [openib-general] Re: [PATCHv1] userspace CMA
Roland Dreier wrote:
> . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to
> add it. Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed
> underscores and dashes are just too hard to type and look weird.

I did forget to add this file. I've added it and renamed it to sa-kern-abi.h. The file only contains a definition for struct ib_kern_path_rec at the moment.

> . Please add a ChangeLog entry covering the libibverbs changes.

Will do.

Thanks,
Sean
[openib-general] Re: [PATCHv1] userspace CMA
The libibverbs bits look mostly OK but:

 . I don't see a file sa_kern-abi.h anywhere -- I think you forgot to add it. Also, please name it sa-kern-abi.h (ie all '-'s) -- mixed underscores and dashes are just too hard to type and look weird.

   Or did you just put everything in kern-abi.h? That's fine too, just remove the sa_kern-abi.h references.

 . Please add a ChangeLog entry covering the libibverbs changes.

 - R.
RE: [openib-general] socket based connection model for IB proposal -round 3
> From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 10, 2005 1:37 PM
>
> It will be discussed at IBTA SWG meeting next week Tu.
> Please, post your comments before that.

Looks fine to me overall. The only thing I would change is make the version field 4 bits rather than just 2, and shift the IP version down 2 bits, eliminating the reserved bits. That way, the first byte is split evenly between protocol version and IP version.

Do we even need to indicate the IP version, or can IPv4 addresses be expressed as IPv6 addresses just by zeroing the first 12 bytes?

I don't understand the relevance of the 0-based VA or Send with Invalidate discussion points. They seem orthogonal to the socket-based CM proposal, and IMO should be moved to a separate proposal.

I have no opinion one way or another on the presence of the protocol field. It could just as well be left as "flags" for the consumer to do with what they please.

- Fab
[openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3
Fixed the bit value for the formatting indicator.

Arkady Kanevsky                       email: [EMAIL PROTECTED]
Network Appliance Inc.                phone: 781-768-5395
275 Totten Pond Rd.                   Fax: 781-895-1195
Waltham, MA 02451-2010                central phone: 781-768-5300

Attachment: IP Address Support by InfiniBand CM_v3.pdf
[openib-general] Re: [PATCH] uDAPL free build issues cleaned up, print path records returned from uAT
On Thu, 10 Nov 2005, Arlin Davis wrote:
> James,
>
> I fixed some problems with the free build openib_scm version. Also
> turned down some debugging and added some debug prints for uAT path
> records.
>
> -arlin

Thanks Arlin. Committed in revision 4018.
[openib-general] socket based connection model for IB proposal - round 3
It will be discussed at the IBTA SWG meeting on Tuesday of next week. Please post your comments before then.

Thanks,

Arkady Kanevsky                  email: [EMAIL PROTECTED]
Network Appliance Inc.           phone: 781-768-5395
275 Totten Pond Rd.              Fax: 781-895-1195
Waltham, MA 02451-2010           central phone: 781-768-5300

Attachment: IP Address Support by InfiniBand CM_v3.pdf
RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB
> My concern is the requirement that RDS resync the structures in the face of
> failure and know whether to re-transmit or will deal with duplicates. Having
> pre-posted buffers will help enable the resync to be accomplished but should
> not be equated to pre-post equals one can deal with duplicates or will verify
> to prevent duplicates from occurring.
>
> Mike

The semantics should be that, barring an error, the flow between any two endpoints is reliable and ordered. The difference versus a normal point-to-point definition of reliable is that a) lack of a receive buffer is an error, and b) the endpoint communicates with many known remote peers (as opposed to one known remote peer, or many unknown).

Having an API with those semantics, particularly as an upgrade in semantics from SOCK_DGRAM while preserving SOCK_DGRAM syntax, is something that I believe is of distinct value to many cluster-based applications. Further, the API can be implemented in an offload device (IB or IP) more efficiently than if it is simply implemented on top of SOCK_STREAM sockets by the application.

Documenting and clarifying the semantics to make its general applicability clearer should definitely be done, however.
RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB
At 10:48 AM 11/10/2005, Caitlin Bestler wrote:

Mike Krause wrote in response to Greg Lindahl:
> If it is to be reasonably robust, then RDS should be required to support
> the resync between the two sides of the communication. This aligns with the
> stated objective of implementing reliability in one location in software and
> one location in hardware. Without such resync being required in the ULP,
> then one ends up with a ULP that falls short of its stated objectives and
> pushes complexity back up to the application which is where the advocates
> have stated it is too complex or expensive to get it correct.

I haven't reread all of RDS fine print to double-check this, but my impression is that RDS semantics exactly match the subset of MPI point-to-point communications where the receiving rank is required to have pre-posted buffers before the send is allowed.

My concern is the requirement that RDS resync the structures in the face of failure and know whether to re-transmit or will deal with duplicates. Having pre-posted buffers will help enable the resync to be accomplished but should not be equated to pre-post equals one can deal with duplicates or will verify to prevent duplicates from occurring.

Mike
Re: [openib-general] netperf over SDP bug
On Tue, Sep 27, 2005 at 06:17:00PM -0700, Grant Grundler wrote:
> Hi Michael,
> I'm trying to collect a full set of netperf TCP_STREAM over SDP for
> SVN r3547 on 2.6.13 kernel. But some netperf runs get no throughput.

Michael,
I was able to reproduce this problem with SVN r3984. I've posted the graphs for r3547 and r3984 on:

    http://iou.parisc-linux.org/openib-perf-2005/r3547/
    http://iou.parisc-linux.org/openib-perf-2005/r3984/

See sdpstream.png in each location. I'll pursue collecting the information you asked for a few weeks ago as time permits.

The above data was collected with "netserver" bound to the same CPU as the one taking IB MSI-X interrupts. This is bad for IPoIB (CPU bound) and good for SDP (CPU cache). I'll rerun the r3984 data and bind the netperf process as well.

BTW, in case I haven't mentioned this before, I set up a parisc-linux box so netperf maintainer Rick Jones could manage his releases using something better than tarballs. netperf 2.x and netperf 4.x (under development) source is available from:

    svn co http://www.netperf.org/svn/netperf2/
    svn co http://www.netperf.org/svn/netperf4/

thanks,
grant

> Usually when sending 1k to 4k messages. The same netperf parameters
> using IPoIB seem to be working fine - just a lot slower of course.
> Summary of all netperf over SDP runs is appended.
>
> Sample commandline that got < 1Mb/s throughput is:
> LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 1024 -s 16384 -S 16384
>
> I tried with some smaller -m parameters:
> 512 -> ~270-280 Mb/s
> 640 -> ~200-2100 Mb/s
> 768 -> ~30-50 Mb/s
> 896 -> ~2-6 Mb/s
>
> CPU is essentially idle in the above 512-896 byte cases.
...
[openib-general] [git pull] IB updates for 2.6.15
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

The pull will get the following changes:

Jack Morgenstein:
      [IB] mthca: report page size capability
      [IB] uverbs: have kernel return QP capabilities

Michael S. Tsirkin:
      [IB] umad: two small fixes
      [IB] mthca: fix posting of atomic operations
      [IB] mthca: fix posting long lists of receive work requests

Roland Dreier:
      [IPoIB] add path record information in debugfs
      [IB] umad: avoid potential deadlock when unregistering MAD agents
      [IPoIB] no need to set skb->dev right before freeing skb
      [IB] mthca: fix typo in catastrophic error polling
      [IB] Have cq_resize() method take an int, not int*
      [IB] umad: get rid of unused mr array
      [IB] mthca: fix wraparound handling in mthca_cq_clean()
      [IB] umad: further ib_unregister_mad_agent() deadlock fixes

 drivers/infiniband/core/user_mad.c             |  129 ++---
 drivers/infiniband/core/uverbs_cmd.c           |   12 +-
 drivers/infiniband/core/verbs.c                |   12 --
 drivers/infiniband/hw/mthca/mthca_catas.c      |    2
 drivers/infiniband/hw/mthca/mthca_cmd.c        |    2
 drivers/infiniband/hw/mthca/mthca_cq.c         |   16 +-
 drivers/infiniband/hw/mthca/mthca_dev.h        |    2
 drivers/infiniband/hw/mthca/mthca_main.c       |    2
 drivers/infiniband/hw/mthca/mthca_provider.c   |    3
 drivers/infiniband/hw/mthca/mthca_provider.h   |    1
 drivers/infiniband/hw/mthca/mthca_qp.c         |  113 +--
 drivers/infiniband/hw/mthca/mthca_srq.c        |   22 +++
 drivers/infiniband/hw/mthca/mthca_wqe.h        |    3
 drivers/infiniband/ulp/ipoib/ipoib.h           |   15 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c        |  179
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   72 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   26 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |    7 -
 include/rdma/ib_user_verbs.h                   |    9 +
 include/rdma/ib_verbs.h                        |    2
 20 files changed, 466 insertions(+), 163 deletions(-)
Re: [openib-general] Lustre over OpenIB Gen2
Hi Eric... writing YAN (yet another NAL) I see :)

    Eric> 2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per
    Eric> peer. Is this going to get me into trouble?

    Eric> For example, I currently create a single PD and CQ for
    Eric> everything, however the example I've seen (cmatose.c)
    Eric> appears to create these separately for each peer. Is that
    Eric> what I should be doing too?

I don't think you want 10K PDs. But having a single CQ big enough to handle 10K QPs might be a problem.

    Eric> 3. Is contiguous memory allocation an issue in Gen2? Since
    Eric> this is such a scarce resource in the kernel (and particular
    Eric> CQ usage with one vendor's stack relied heavily on it) what
    Eric> red flags should I be aware of?

There are still a few places where you can get in trouble (for example, with the mthca driver, extremely large QP work queues might be a problem, because the driver allocates contiguous memory for the array used to track work request IDs -- not the work queues themselves though). But CQs in particular should be fine.

    Eric> 4. Are RDMA reads still deprecated? Which resources hit the
    Eric> spotlight if I chose to use them?

I don't think RDMA reads were ever really deprecated. But RDMA writes probably pipeline better.

    Eric> 5. Should I pre-map all physical memory and do RDMA in
    Eric> page-sized fragments? This avoids any mapping overhead at
    Eric> the expense of having much larger numbers of queued RDMAs.
    Eric> Since I try to keep up to 8 (by default) 1MByte RDMAs active
    Eric> concurrently to any individual peer, with 4k pages I can
    Eric> have up to 2048 RDMA work items queued at a time per peer.

    Eric> And if I pre-map, can I be guaranteed that if I put the
    Eric> CQ into the error state, all remote access to my memory is
    Eric> revoked (e.g. could a CQ I create after I destroy the one I
    Eric> just shut down somehow alias with it such that a
    Eric> pathologically delayed RDMA could write my memory)?

s/CQ/QP/ ... anyway, if you choose your receive queue sequence numbers randomly, then the probability of a QP number/sequence number collision allowing a stray RDMA is astronomically low (effectively 0).

    Eric> Or is it better to use FMR pools and take the map/unmap
    Eric> overhead? If so, is there a way to know when the unmap
    Eric> actually hits the hardware and my memory is safe?

FMRs are only supported on Mellanox HCAs at the moment. But they do have some advantages, like allowing you to convert a bunch of pages into a single virtually contiguous region. You can use the ib_flush_fmr_pool() function to make sure that all unmapped FMRs are really and truly flushed, but that is a slow operation (since it incurs the penalty of flushing all in-flight operations in the HCA).

    Eric> 6. Does Gen2 present substantially the same APIs as the
    Eric> kernel in userspace? So if I wrote a userspace equivalent
    Eric> of my kernel driver, could I have pure userspace clients
    Eric> talk to kernel servers?

Pretty much so, except of course userspace doesn't have access to physical memory or FMRs.

 - R.
[openib-general] [PATCH] uDAPL free build issues cleaned up, print path records returned from uAT
James, I fixed some problems with the free build openib_scm version. Also turned down some debugging and added some debug prints for uAT path records. -arlin Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Index: dapl/openib/dapl_ib_cm.c === --- dapl/openib/dapl_ib_cm.c(revision 3990) +++ dapl/openib/dapl_ib_cm.c(working copy) @@ -136,14 +136,27 @@ static void dapli_path_comp_handler(uint dapl_dbg_log(DAPL_DBG_TYPE_CM, " path_comp_handler: SRC GID subnet %016llx id %016llx\n", - (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix), - (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) ); + (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.subnet_prefix), + (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.interface_id) ); dapl_dbg_log(DAPL_DBG_TYPE_CM, " path_comp_handler: DST GID subnet %016llx id %016llx\n", - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); + (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.subnet_prefix), + (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.interface_id) ); + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " path_comp_handler: slid %x dlid %x mtu %x(%x) pktlife %x(%x)\n", + ntohs(conn->dapl_path.slid), ntohs(conn->dapl_path.dlid), + conn->dapl_path.mtu, conn->dapl_path.mtu_selector, + conn->dapl_path.packet_life_time, + conn->dapl_path.packet_life_time_selector ); + + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " path_comp_handler: hops %x npaths %x pkey %x tclass %x rate %x(%x)\n", + conn->dapl_path.hop_limit, conn->dapl_path.numb_path, + conn->dapl_path.pkey, conn->dapl_path.traffic_class, + conn->dapl_path.rate, conn->dapl_path.rate_selector); + if (rec_num <= 0) { dapl_dbg_log(DAPL_DBG_TYPE_CM, " path_comp_handler: ERR %d retry %d\n", Index: dapl/openib_scm/dapl_ib_cm.c === --- dapl/openib_scm/dapl_ib_cm.c(revision 3990) +++ dapl/openib_scm/dapl_ib_cm.c(working copy) @@ 
-285,7 +285,7 @@ dapli_socket_listen ( DAPL_IA *ia_ptr, if (( bind( cm_ptr->l_socket,(struct sockaddr*)&addr, sizeof(addr) ) < 0) || (listen( cm_ptr->l_socket, 128 ) < 0) ) { - dapl_dbg_log( DAPL_DBG_TYPE_ERR, + dapl_dbg_log( DAPL_DBG_TYPE_CM, " listen: ERROR %s on conn_qual 0x%x\n", strerror(errno),serviceID); @@ -313,7 +313,7 @@ dapli_socket_listen ( DAPL_IA *ia_ptr, return dat_status; bail: - dapl_dbg_log( DAPL_DBG_TYPE_ERR, + dapl_dbg_log( DAPL_DBG_TYPE_CM, " listen: ERROR on conn_qual 0x%x\n",serviceID); if ( cm_ptr->l_socket >= 0 ) close( cm_ptr->l_socket ); Index: dapl/openib_scm/dapl_ib_cq.c === --- dapl/openib_scm/dapl_ib_cq.c(revision 3990) +++ dapl/openib_scm/dapl_ib_cq.c(working copy) @@ -569,7 +569,6 @@ dapls_ib_wait_object_wait ( { struct dapl_evd *evd_ptr; struct ibv_cq *ibv_cq = NULL; - void*ibv_ctx = NULL; int status = 0; int timeout_ms = -1; struct pollfd cq_fd = { @@ -602,7 +601,7 @@ dapls_ib_wait_object_wait ( dapl_dbg_log (DAPL_DBG_TYPE_CM, " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n", - evd_ptr, ibv_cq,ibv_ctx,strerror(errno)); + evd_ptr, ibv_cq,strerror(errno)); return(dapl_convert_errno(status,"cq_wait_object_wait")); ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Lustre over OpenIB Gen2
Eric> However I guess this still means that CQ resources Eric> sufficient for the maximum number of RDMAs I _could_ queue Eric> have to be allocated... In general there will be a relatively low limit on the maximum CQ size. For example, the maximum CQ size on Mellanox HCAs is ~128K entries. - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
RE: [openib-general] Lustre over OpenIB Gen2
Yes, of course; I meant the QP. Regarding the total number of outstanding RDMA work requests, I can keep a separate cap on that, so if relatively few peers are active, I push the maximum number of RDMAs at them, but if many peers are active the number of active RDMAs per peer reduces. However I guess this still means that CQ resources sufficient for the maximum number of RDMAs I _could_ queue have to be allocated... > -Original Message- > From: Sean Hefty [mailto:[EMAIL PROTECTED] > Sent: Thursday, November 10, 2005 7:12 PM > To: Eric Barton > Cc: openib-general@openib.org > Subject: Re: [openib-general] Lustre over OpenIB Gen2 > > > Eric Barton wrote: > > 5. Should I pre-map all physical memory and do RDMA in > page-sized fragments? > >This avoids any mapping overhead at the expense of > having much larger > >numbers of queued RDMAs. Since I try to keep up to 8 > (by default) 1MByte > >RDMAs active concurrently to any individual peer, with > 4k pages I can have > >up to 2048 RDMA work items queued at a time per peer. > > This is 20 million outstanding RDMA work requests per node. > > >And if I pre-map, can I be guaranteed that if I put the > CQ into the error > >state, all remote access to my memory is revoked (e.g. > could a CQ I create > >after I destroy the one I just shut down somehow alias > with it such that a > >pathalogically delayed RDMA could write my memory)? > > I think that you mean QP into the error state. If the QP is > in the error state, > then further access from a remote system should be impossible. > > - Sean > ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Lustre over OpenIB Gen2
Eric Barton wrote:
> 5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
>    This avoids any mapping overhead at the expense of having much larger
>    numbers of queued RDMAs. Since I try to keep up to 8 (by default) 1MByte
>    RDMAs active concurrently to any individual peer, with 4k pages I can have
>    up to 2048 RDMA work items queued at a time per peer.

This is 20 million outstanding RDMA work requests per node.

>    And if I pre-map, can I be guaranteed that if I put the CQ into the error
>    state, all remote access to my memory is revoked (e.g. could a CQ I create
>    after I destroy the one I just shut down somehow alias with it such that a
>    pathologically delayed RDMA could write my memory)?

I think that you mean QP into the error state. If the QP is in the error state, then further access from a remote system should be impossible.

- Sean
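The arithmetic behind the "2048 per peer" and "20 million per node" figures can be checked with a short C sketch. It assumes 1 MByte means 2^20 bytes and uses the 10,000-peer target from Eric's original question; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Work requests needed to keep `rdmas` transfers of `rdma_bytes` each
 * in flight when every work request covers one page of `page_bytes`.
 */
uint64_t wrs_per_peer(uint64_t rdmas, uint64_t rdma_bytes, uint64_t page_bytes)
{
	return rdmas * (rdma_bytes / page_bytes);
}

/* Worst case for a node talking to `peers` peers at full depth. */
uint64_t wrs_per_node(uint64_t peers, uint64_t per_peer)
{
	return peers * per_peer;
}
```

With 8 transfers of 1 MByte and 4 KByte pages, `wrs_per_peer(8, 1 << 20, 4096)` is 8 * 256 = 2048, and `wrs_per_node(10000, 2048)` is 20,480,000, which is the "20 million" quoted above.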
Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB
Yes, this is the case.

- Original Message -
From: "Caitlin Bestler" <[EMAIL PROTECTED]>
To:
Sent: Thursday, November 10, 2005 1:48 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

Mike Krause wrote in response to Greg Lindahl:

If it is to be reasonably robust, then RDS should be required to support the resync between the two sides of the communication. This aligns with the stated objective of implementing reliability in one location in software and one location in hardware. Without such resync being required in the ULP, then one ends up with a ULP that falls short of its stated objectives and pushes complexity back up to the application which is where the advocates have stated it is too complex or expensive to get it correct.

This sort of message service, by the way, has a long history in distributed computing.

Yep.

I haven't reread all of RDS fine print to double-check this, but my impression is that RDS semantics exactly match the subset of MPI point-to-point communications where the receiving rank is required to have pre-posted buffers before the send is allowed.
Re: [openib-general] Lustre over OpenIB Gen2
Eric Barton wrote:
> 1. How stable is the CM API and is it supported by all OpenIB affiliated
>    vendors?

The IB CM API is stable. Changes might occur as a result of changes to the CM protocol itself, but that effect is not limited to just the openib API.

The RDMA CMA API is fairly stable, but could still see minor changes. This would be the better connection API to use if you want to connect using IP addresses.

> 2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per peer. Is this
>    going to get me into trouble?
>
>    For example, I currently create a single PD and CQ for everything,
>    however the example I've seen (cmatose.c) appears to create these
>    separately for each peer. Is that what I should be doing too?

Cmatose is just a simple example program that I use for testing. If you're trying to scale out to 10,000 nodes, you'll want to limit your resources. For example, I've never been able to run cmatose with 10,000 connections without running out of resources on my system.

Note that the IB CM does not implement a peer to peer connection model yet, so you would need to establish your connections using the client/server model.

> 4. Are RDMA reads still deprecated? Which resources hit the spotlight if I
>    chose to use them?

RDMA reads are fully supported. Not sure what led you to think that they were deprecated.

> 6. Does Gen2 present substantially the same APIs as the kernel in userspace?
>    So if I wrote a userspace equivalent of my kernel driver, could I have
>    pure userspace clients talk to kernel servers?

Most of the APIs are similar. There shouldn't be any issues talking between userspace clients and kernel servers.

- Sean
RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB
Mike Krause wrote in response to Greg Lindahl: > If it is to be reasonably robust, then RDS should be required to support > the resync between the two sides of the communication. This aligns with the > stated objective of implementing reliability in one location in software and > one location in hardware. Without such resync being required in the ULP, > then one ends up with a ULP that falls shorts of its stated objectives and > pushes complexity back up to the application which is where the advocates > have stated it is too complex or expensive to get it correct. >> This sort of message service, by the way, has a long history in distributed computing. > Yep. I haven't reread all of RDS fine print to double-check this, but my impression is that RDS semantics exactly match the subset of MPI point-to-point communications where the receiving rank is required to have pre-posted buffers before the send is allowed. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Lustre over OpenIB Gen2
Hi,

I'm working with Cluster File Systems on lustre network drivers, including IB drivers for the Voltaire, Infinicon and Topspin stacks. These are kernel drivers which use RC QPs with VERBS for small message queueing and RDMA for bulk transfers.

We're obviously looking at OpenIB Gen2, and I wonder if people could be so kind as to answer some questions for me.

1. How stable is the CM API and is it supported by all OpenIB affiliated
   vendors?

2. I'd like to scale to >= 10,000 peer nodes; 1 RC QP per peer. Is this
   going to get me into trouble?

   For example, I currently create a single PD and CQ for everything,
   however the example I've seen (cmatose.c) appears to create these
   separately for each peer. Is that what I should be doing too?

3. Is contiguous memory allocation an issue in Gen2? Since this is such a
   scarce resource in the kernel (and particular CQ usage with one vendor's
   stack relied heavily on it) what red flags should I be aware of?

4. Are RDMA reads still deprecated? Which resources hit the spotlight if I
   chose to use them?

5. Should I pre-map all physical memory and do RDMA in page-sized fragments?
   This avoids any mapping overhead at the expense of having much larger
   numbers of queued RDMAs. Since I try to keep up to 8 (by default) 1MByte
   RDMAs active concurrently to any individual peer, with 4k pages I can have
   up to 2048 RDMA work items queued at a time per peer.

   And if I pre-map, can I be guaranteed that if I put the CQ into the error
   state, all remote access to my memory is revoked (e.g. could a CQ I create
   after I destroy the one I just shut down somehow alias with it such that a
   pathologically delayed RDMA could write my memory)?

   Or is it better to use FMR pools and take the map/unmap overhead? If so,
   is there a way to know when the unmap actually hits the hardware and my
   memory is safe?

6. Does Gen2 present substantially the same APIs as the kernel in userspace?
   So if I wrote a userspace equivalent of my kernel driver, could I have
   pure userspace clients talk to kernel servers?

Thanks in advance...

--
Cheers,
Eric

Eric Barton        Barton Software
9 York Gardens     Tel: +44 (117) 330 1575
Clifton            Mobile: +44 (7909) 680 356
Bristol BS8 4LL    Fax: call first
United Kingdom     E-Mail: --
[openib-general] [git patch review 7/7] [IB] umad: further ib_unregister_mad_agent() deadlock fixes
The previous umad deadlock fix left ib_umad_kill_port() still vulnerable to deadlocking. This patch fixes that by downgrading our lock to a read lock when we might end up trying to reacquire the lock for reading. Signed-off-by: Roland Dreier <[EMAIL PROTECTED]> --- drivers/infiniband/core/user_mad.c | 87 ++-- 1 files changed, 63 insertions(+), 24 deletions(-) applies-to: 17115437026be55dcd74641be21561fecf33dcdb 94382f3562e350ed7c8f7dcd6fc968bdece31328 diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index d61f544..5ea741f 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -31,7 +31,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: user_mad.c 2814 2005-07-06 19:14:09Z halr $ + * $Id: user_mad.c 4010 2005-11-09 23:11:56Z roland $ */ #include @@ -110,12 +110,13 @@ struct ib_umad_device { }; struct ib_umad_file { - struct ib_umad_port *port; - struct list_head recv_list; - struct list_head port_list; - spinlock_t recv_lock; - wait_queue_head_trecv_wait; - struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS]; + struct ib_umad_port*port; + struct list_headrecv_list; + struct list_headport_list; + spinlock_t recv_lock; + wait_queue_head_t recv_wait; + struct ib_mad_agent*agent[IB_UMAD_MAX_AGENTS]; + int agents_dead; }; struct ib_umad_packet { @@ -144,6 +145,12 @@ static void ib_umad_release_dev(struct k kfree(dev); } +/* caller must hold port->mutex at least for reading */ +static struct ib_mad_agent *__get_agent(struct ib_umad_file *file, int id) +{ + return file->agents_dead ? 
NULL : file->agent[id]; +} + static int queue_packet(struct ib_umad_file *file, struct ib_mad_agent *agent, struct ib_umad_packet *packet) @@ -151,10 +158,11 @@ static int queue_packet(struct ib_umad_f int ret = 1; down_read(&file->port->mutex); + for (packet->mad.hdr.id = 0; packet->mad.hdr.id < IB_UMAD_MAX_AGENTS; packet->mad.hdr.id++) - if (agent == file->agent[packet->mad.hdr.id]) { + if (agent == __get_agent(file, packet->mad.hdr.id)) { spin_lock_irq(&file->recv_lock); list_add_tail(&packet->list, &file->recv_list); spin_unlock_irq(&file->recv_lock); @@ -326,7 +334,7 @@ static ssize_t ib_umad_write(struct file down_read(&file->port->mutex); - agent = file->agent[packet->mad.hdr.id]; + agent = __get_agent(file, packet->mad.hdr.id); if (!agent) { ret = -EINVAL; goto err_up; @@ -480,7 +488,7 @@ static int ib_umad_reg_agent(struct ib_u } for (agent_id = 0; agent_id < IB_UMAD_MAX_AGENTS; ++agent_id) - if (!file->agent[agent_id]) + if (!__get_agent(file, agent_id)) goto found; ret = -ENOMEM; @@ -530,7 +538,7 @@ static int ib_umad_unreg_agent(struct ib down_write(&file->port->mutex); - if (id < 0 || id >= IB_UMAD_MAX_AGENTS || !file->agent[id]) { + if (id < 0 || id >= IB_UMAD_MAX_AGENTS || !__get_agent(file, id)) { ret = -EINVAL; goto out; } @@ -608,21 +616,29 @@ static int ib_umad_close(struct inode *i struct ib_umad_file *file = filp->private_data; struct ib_umad_device *dev = file->port->umad_dev; struct ib_umad_packet *packet, *tmp; + int already_dead; int i; - for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i) - if (file->agent[i]) - ib_unregister_mad_agent(file->agent[i]); + down_write(&file->port->mutex); + + already_dead = file->agents_dead; + file->agents_dead = 1; list_for_each_entry_safe(packet, tmp, &file->recv_list, list) kfree(packet); - down_write(&file->port->mutex); list_del(&file->port_list); - up_write(&file->port->mutex); - kfree(file); + downgrade_write(&file->port->mutex); + + if (!already_dead) + for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i) + if 
(file->agent[i]) + ib_unregister_mad_agent(file->agent[i]); + up_read(&file->port->mutex); + + kfree(file); kref_put(&dev->ref, ib_umad_release_dev); return 0; @@ -848,13 +864,36 @@ static void ib_umad_kill_port(struct ib_ port->ib_dev = NULL; - list_for_each_entry(file, &port->file_list, port_list) - for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) { - if (!file->agent[id]) - continue; -
[openib-general] [git patch review 5/7] [IB] mthca: fix wraparound handling in mthca_cq_clean()
Handle case where prod_index has wrapped around and become less than
cq->cons_index by checking that their difference as a signed int is
positive rather than comparing directly.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/hw/mthca/mthca_cq.c |   16 ++--
 1 files changed, 6 insertions(+), 10 deletions(-)

applies-to: 704990abeb22a51ed2722e92536d22135f60957f
64044bcf75063cb5a6d42712886a712449df2ce3
diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index f98e235..4a8adce 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -258,7 +258,7 @@ void mthca_cq_clean(struct mthca_dev *de
 {
 	struct mthca_cq *cq;
 	struct mthca_cqe *cqe;
-	int prod_index;
+	u32 prod_index;
 	int nfreed = 0;
 
 	spin_lock_irq(&dev->cq_table.lock);
@@ -293,19 +293,15 @@ void mthca_cq_clean(struct mthca_dev *de
 	 * Now sweep backwards through the CQ, removing CQ entries
 	 * that match our QP by copying older entries on top of them.
 	 */
-	while (prod_index > cq->cons_index) {
-		cqe = get_cqe(cq, (prod_index - 1) & cq->ibcq.cqe);
+	while ((int) --prod_index - (int) cq->cons_index >= 0) {
+		cqe = get_cqe(cq, prod_index & cq->ibcq.cqe);
 		if (cqe->my_qpn == cpu_to_be32(qpn)) {
 			if (srq)
 				mthca_free_srq_wqe(srq, be32_to_cpu(cqe->wqe));
 			++nfreed;
-		}
-		else if (nfreed)
-			memcpy(get_cqe(cq, (prod_index - 1 + nfreed) &
-				       cq->ibcq.cqe),
-			       cqe,
-			       MTHCA_CQ_ENTRY_SIZE);
-		--prod_index;
+		} else if (nfreed)
+			memcpy(get_cqe(cq, (prod_index + nfreed) & cq->ibcq.cqe),
+			       cqe, MTHCA_CQ_ENTRY_SIZE);
 	}
 
 	if (nfreed) {
---
0.99.9e
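The comparison idea in this patch can be shown in isolation with a small userspace sketch. It is not the driver code: the helper name `index_at_or_after` is hypothetical, and it subtracts in unsigned arithmetic before converting to a signed type, a slightly different spelling from the patch's `(int) a - (int) b`, chosen here to sidestep signed overflow in portable C. It assumes both indices are free-running 32-bit counters whose true distance stays below 2^31.

```c
#include <stdint.h>

/*
 * Returns nonzero if index `a` is at or ahead of index `b`, even when
 * `a` has wrapped past zero and is numerically smaller than `b`.
 * A plain `a >= b` gives the wrong answer in that case.
 */
int index_at_or_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}
```

For example, a producer index of 2 that has wrapped is still "after" a consumer index of 0xfffffffe: the unsigned difference is 4, which is non-negative as a signed value, whereas a direct `2 > 0xfffffffe` comparison would end the sweep loop immediately.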
[openib-general] [git patch review 4/7] [IB] mthca: fix posting of atomic operations
The size of work requests for atomic operations was computed
incorrectly in mthca: all sizeofs need to be divided by 16.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/hw/mthca/mthca_qp.c |    8
 1 files changed, 4 insertions(+), 4 deletions(-)

applies-to: 308dce81364b1cbb563942a1a57146c1808e8911
62abb8416f1923f4cef50ce9ce841b919275e3fb
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 7f39af4..190c1dc 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1556,8 +1556,8 @@ int mthca_tavor_post_send(struct ib_qp *
 				}
 
 				wqe += sizeof (struct mthca_atomic_seg);
-				size += sizeof (struct mthca_raddr_seg) / 16 +
-					sizeof (struct mthca_atomic_seg);
+				size += (sizeof (struct mthca_raddr_seg) +
+					 sizeof (struct mthca_atomic_seg)) / 16;
 				break;
 
 			case IB_WR_RDMA_WRITE:
@@ -1876,8 +1876,8 @@ int mthca_arbel_post_send(struct ib_qp *
 				}
 
 				wqe += sizeof (struct mthca_atomic_seg);
-				size += sizeof (struct mthca_raddr_seg) / 16 +
-					sizeof (struct mthca_atomic_seg);
+				size += (sizeof (struct mthca_raddr_seg) +
+					 sizeof (struct mthca_atomic_seg)) / 16;
 				break;
 
 			case IB_WR_RDMA_READ:
---
0.99.9e
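The effect of the misplaced division is easy to see with stand-in types. The sketch below is an illustration only: `raddr_seg` and `atomic_seg` are hypothetical placeholders, and their 16-byte sizes are an assumption made for the example rather than something stated in the patch.

```c
#include <stddef.h>

/* Placeholder segments; the 16-byte size of each is assumed for illustration. */
struct raddr_seg  { unsigned char bytes[16]; };
struct atomic_seg { unsigned char bytes[16]; };

/* Before the fix: only the first sizeof is converted to 16-byte units. */
size_t wqe_units_before(void)
{
	return sizeof(struct raddr_seg) / 16 + sizeof(struct atomic_seg);
}

/* After the fix: the sum of both segments is converted to 16-byte units. */
size_t wqe_units_after(void)
{
	return (sizeof(struct raddr_seg) + sizeof(struct atomic_seg)) / 16;
}
```

Under that assumption the old expression yields 1 + 16 = 17 units where the corrected one yields 32 / 16 = 2, because division binds tighter than addition and the second `sizeof` was being added in bytes.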
[openib-general] [git patch review 6/7] [IB] mthca: fix posting long lists of receive work requests
In Tavor mode, when posting a long list of receive work requests, a
doorbell must be rung every 256 requests.  Add code to do this when
required.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/hw/mthca/mthca_qp.c  |   19 +++++++++++++++++--
 drivers/infiniband/hw/mthca/mthca_srq.c |   22 ++++++++++++++++++++--
 drivers/infiniband/hw/mthca/mthca_wqe.h |    3 ++-
 3 files changed, 39 insertions(+), 5 deletions(-)

applies-to: 984d2fc62c548af3d01450135f33b5b97aecf00b
ae57e24a4006fd46b73d842ee99db9580ef74a02
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 190c1dc..760c418 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1707,6 +1707,7 @@ int mthca_tavor_post_receive(struct ib_q
 {
 	struct mthca_dev *dev = to_mdev(ibqp->device);
 	struct mthca_qp *qp = to_mqp(ibqp);
+	__be32 doorbell[2];
 	unsigned long flags;
 	int err = 0;
 	int nreq;
@@ -1724,6 +1725,22 @@ int mthca_tavor_post_receive(struct ib_q
 	ind = qp->rq.next_ind;
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+			nreq = 0;
+
+			doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
+			doorbell[1] = cpu_to_be32(qp->qpn << 8);
+
+			wmb();
+
+			mthca_write64(doorbell,
+				      dev->kar + MTHCA_RECEIVE_DOORBELL,
+				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
+			size0 = 0;
+		}
+
 		if (mthca_wq_overflow(&qp->rq, nreq, qp->ibqp.recv_cq)) {
 			mthca_err(dev, "RQ %06x full (%u head, %u tail,"
 					" %d max, %d nreq)\n", qp->qpn,
@@ -1781,8 +1798,6 @@ int mthca_tavor_post_receive(struct ib_q
 
 out:
 	if (likely(nreq)) {
-		__be32 doorbell[2];
-
 		doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
 		doorbell[1] = cpu_to_be32((qp->qpn << 8) | nreq);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c
index 292f55b..c3c0331 100644
--- a/drivers/infiniband/hw/mthca/mthca_srq.c
+++ b/drivers/infiniband/hw/mthca/mthca_srq.c
@@ -414,6 +414,7 @@ int mthca_tavor_post_srq_recv(struct ib_
 {
 	struct mthca_dev *dev = to_mdev(ibsrq->device);
 	struct mthca_srq *srq = to_msrq(ibsrq);
+	__be32 doorbell[2];
 	unsigned long flags;
 	int err = 0;
 	int first_ind;
@@ -429,6 +430,25 @@ int mthca_tavor_post_srq_recv(struct ib_
 	first_ind = srq->first_free;
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+			nreq = 0;
+
+			doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
+			doorbell[1] = cpu_to_be32(srq->srqn << 8);
+
+			/*
+			 * Make sure that descriptors are written
+			 * before doorbell is rung.
+			 */
+			wmb();
+
+			mthca_write64(doorbell,
+				      dev->kar + MTHCA_RECEIVE_DOORBELL,
+				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+			first_ind = srq->first_free;
+		}
+
 		ind = srq->first_free;
 
 		if (ind < 0) {
@@ -491,8 +511,6 @@ int mthca_tavor_post_srq_recv(struct ib_
 	}
 
 	if (likely(nreq)) {
-		__be32 doorbell[2];
-
 		doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
 		doorbell[1] = cpu_to_be32((srq->srqn << 8) | nreq);
 
diff --git a/drivers/infiniband/hw/mthca/mthca_wqe.h b/drivers/infiniband/hw/mthca/mthca_wqe.h
index 1f4c0ff..73f1c0b 100644
--- a/drivers/infiniband/hw/mthca/mthca_wqe.h
+++ b/drivers/infiniband/hw/mthca/mthca_wqe.h
@@ -49,7 +49,8 @@ enum {
 };
 
 enum {
-	MTHCA_INVAL_LKEY = 0x100
+	MTHCA_INVAL_LKEY			= 0x100,
+	MTHCA_TAVOR_MAX_WQES_PER_RECV_DB	= 256
 };
 
 struct mthca_next_seg {
---
0.99.9e
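The batching rule in this patch (flush a doorbell whenever 256 requests have accumulated, then ring once more for any remainder) can be sketched without hardware. This is a simplified model under stated assumptions: struct db_log, ring_doorbell() and post_requests() are hypothetical names, the doorbell is replaced by a counter, and the descriptor-building step is elided. It does not model the doorbell encoding or the write barrier.

```c
#include <assert.h>

enum { MAX_WQES_PER_DB = 256 };	/* mirrors MTHCA_TAVOR_MAX_WQES_PER_RECV_DB */

/* Hypothetical record of what a doorbell register would have seen. */
struct db_log {
	unsigned rings;		/* number of doorbells rung */
	unsigned covered;	/* total requests announced to the device */
};

/* Stand-in for the MMIO doorbell write; only records the batch. */
static void ring_doorbell(struct db_log *log, unsigned nreq)
{
	assert(nreq > 0 && nreq <= MAX_WQES_PER_DB);
	log->rings++;
	log->covered += nreq;
}

/* Post `total` requests, never letting a single doorbell cover more than 256. */
static void post_requests(struct db_log *log, unsigned total)
{
	unsigned nreq = 0;
	unsigned i;

	for (i = 0; i < total; ++i) {
		if (nreq == MAX_WQES_PER_DB) {
			/* Batch is full: announce it and start a new one. */
			ring_doorbell(log, nreq);
			nreq = 0;
		}
		/* A real driver would build the receive descriptor here. */
		++nreq;
	}

	if (nreq)	/* announce the final, possibly partial, batch */
		ring_doorbell(log, nreq);
}
```

Posting 600 requests in this model yields three doorbells covering 256, 256 and 88 requests. In the patch itself the mid-loop doorbell leaves the low byte of doorbell[1] at zero instead of OR-ing in nreq, which suggests that a full batch of 256 is encoded as 0 in that field; that reading is an inference from the diff, not something the commit message states.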
[openib-general] [git patch review 2/7] [IB] umad: get rid of unused mr array
Now that ib_umad uses the new MAD sending interface, it no longer
needs its own L_Key.  So just delete the array of MRs that it keeps.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/core/user_mad.c |   29 ++++-------------------------
 1 files changed, 4 insertions(+), 25 deletions(-)

applies-to: e7b9ffe6fca9246f29a0a3cdf6417770f5821cef
ec914c52d6208d8752dfd85b48a9aff304911434
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index f5ed36c..d61f544 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -116,7 +116,6 @@ struct ib_umad_file {
 	spinlock_t		recv_lock;
 	wait_queue_head_t	recv_wait;
 	struct ib_mad_agent    *agent[IB_UMAD_MAX_AGENTS];
-	struct ib_mr	       *mr[IB_UMAD_MAX_AGENTS];
 };
 
 struct ib_umad_packet {
@@ -505,29 +504,16 @@ found:
 		goto out;
 	}
 
-	file->mr[agent_id] = ib_get_dma_mr(agent->qp->pd, IB_ACCESS_LOCAL_WRITE);
-	if (IS_ERR(file->mr[agent_id])) {
-		ret = -ENOMEM;
-		goto err;
-	}
-
 	if (put_user(agent_id,
 		     (u32 __user *) (arg + offsetof(struct ib_user_mad_reg_req, id)))) {
 		ret = -EFAULT;
-		goto err_mr;
+		ib_unregister_mad_agent(agent);
+		goto out;
 	}
 
 	file->agent[agent_id] = agent;
 	ret = 0;
 
-	goto out;
-
-err_mr:
-	ib_dereg_mr(file->mr[agent_id]);
-
-err:
-	ib_unregister_mad_agent(agent);
-
 out:
 	up_write(&file->port->mutex);
 	return ret;
@@ -536,7 +522,6 @@ out:
 static int ib_umad_unreg_agent(struct ib_umad_file *file, unsigned long arg)
 {
 	struct ib_mad_agent *agent = NULL;
-	struct ib_mr *mr = NULL;
 	u32 id;
 	int ret = 0;
 
@@ -551,16 +536,13 @@ static int ib_umad_unreg_agent(struct ib
 	}
 
 	agent = file->agent[id];
-	mr    = file->mr[id];
 	file->agent[id] = NULL;
 
 out:
 	up_write(&file->port->mutex);
 
-	if (agent) {
+	if (agent)
 		ib_unregister_mad_agent(agent);
-		ib_dereg_mr(mr);
-	}
 
 	return ret;
 }
@@ -629,10 +611,8 @@ static int ib_umad_close(struct inode *i
 	int i;
 
 	for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i)
-		if (file->agent[i]) {
+		if (file->agent[i])
 			ib_unregister_mad_agent(file->agent[i]);
-			ib_dereg_mr(file->mr[i]);
-		}
 
 	list_for_each_entry_safe(packet, tmp, &file->recv_list, list)
 		kfree(packet);
@@ -872,7 +852,6 @@ static void ib_umad_kill_port(struct ib_
 		for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) {
 			if (!file->agent[id])
 				continue;
-			ib_dereg_mr(file->mr[id]);
 			ib_unregister_mad_agent(file->agent[id]);
 			file->agent[id] = NULL;
 		}
---
0.99.9e
[openib-general] [git patch review 3/7] [IB] uverbs: have kernel return QP capabilities
Move the computation of QP capabilities (max scatter/gather entries, max inline data, etc) into the kernel, and have the uverbs module return the values as part of the create QP response. This keeps precise knowledge of device limits in the low-level kernel driver. This requires an ABI bump, so while we're making changes, get rid of the max_sge parameter for the modify SRQ command -- it's not used and shouldn't be there. Signed-off-by: Jack Morgenstein <[EMAIL PROTECTED]> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]> --- drivers/infiniband/core/uverbs_cmd.c | 12 ++-- drivers/infiniband/hw/mthca/mthca_cmd.c |2 + drivers/infiniband/hw/mthca/mthca_dev.h |1 drivers/infiniband/hw/mthca/mthca_main.c |1 drivers/infiniband/hw/mthca/mthca_provider.c |2 - drivers/infiniband/hw/mthca/mthca_provider.h |1 drivers/infiniband/hw/mthca/mthca_qp.c | 86 -- include/rdma/ib_user_verbs.h |9 ++- 8 files changed, 98 insertions(+), 16 deletions(-) applies-to: 2741f22c820fb664f6958becc4f3d415eea0e61b 77369ed31daac51f4827c50d30f233c45480235a diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 63a7415..ed45da8 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -708,7 +708,7 @@ ssize_t ib_uverbs_poll_cq(struct ib_uver resp->wc[i].opcode = wc[i].opcode; resp->wc[i].vendor_err = wc[i].vendor_err; resp->wc[i].byte_len = wc[i].byte_len; - resp->wc[i].imm_data = wc[i].imm_data; + resp->wc[i].imm_data = (__u32 __force) wc[i].imm_data; resp->wc[i].qp_num = wc[i].qp_num; resp->wc[i].src_qp = wc[i].src_qp; resp->wc[i].wc_flags = wc[i].wc_flags; @@ -908,7 +908,12 @@ retry: if (ret) goto err_destroy; - resp.qp_handle = uobj->uobject.id; + resp.qp_handle = uobj->uobject.id; + resp.max_recv_sge= attr.cap.max_recv_sge; + resp.max_send_sge= attr.cap.max_send_sge; + resp.max_recv_wr = attr.cap.max_recv_wr; + resp.max_send_wr = attr.cap.max_send_wr; + 
resp.max_inline_data = attr.cap.max_inline_data; if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -1135,7 +1140,7 @@ ssize_t ib_uverbs_post_send(struct ib_uv next->num_sge= user_wr->num_sge; next->opcode = user_wr->opcode; next->send_flags = user_wr->send_flags; - next->imm_data = user_wr->imm_data; + next->imm_data = (__be32 __force) user_wr->imm_data; if (qp->qp_type == IB_QPT_UD) { next->wr.ud.ah = idr_find(&ib_uverbs_ah_idr, @@ -1701,7 +1706,6 @@ ssize_t ib_uverbs_modify_srq(struct ib_u } attr.max_wr= cmd.max_wr; - attr.max_sge = cmd.max_sge; attr.srq_limit = cmd.srq_limit; ret = ib_modify_srq(srq, &attr, cmd.attr_mask); diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 49f211d..9ed3458 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -1060,6 +1060,8 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev dev_lim->hca.arbel.resize_srq = field & 1; MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SG_RQ_OFFSET); dev_lim->max_sg = min_t(int, field, dev_lim->max_sg); + MTHCA_GET(size, outbox, QUERY_DEV_LIM_MAX_DESC_SZ_RQ_OFFSET); + dev_lim->max_desc_sz = min_t(int, size, dev_lim->max_desc_sz); MTHCA_GET(size, outbox, QUERY_DEV_LIM_MPT_ENTRY_SZ_OFFSET); dev_lim->mpt_entry_sz = size; MTHCA_GET(field, outbox, QUERY_DEV_LIM_PBL_SZ_OFFSET); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index 808037f..497ff79 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -131,6 +131,7 @@ struct mthca_limits { int max_sg; int num_qps; int max_wqes; + int max_desc_sz; int max_qp_init_rdma; int reserved_qps; int num_srqs; diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index 16594d1..147f248 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -168,6 +168,7 @@ 
static int __devinit mthca_dev_lim(struc mdev->limits.max_srq_wqes = dev_lim->max_srq_sz; mdev->limits.reserved_srqs = dev_lim->reserved_srqs; mdev->limits.reserved_eecs =
[openib-general] [git patch review 1/7] [IB] Have cq_resize() method take an int, not int*
Change the struct ib_device.resize_cq() method to take a plain integer
that holds the new CQ size, rather than a pointer to an integer that
it uses to return the new size.  This makes the interface match the
exported ib_resize_cq() signature, and allows the low-level driver to
update the CQ size with proper locking if necessary.

No in-tree drivers are exporting this method yet.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 drivers/infiniband/core/verbs.c |   12 ++----------
 include/rdma/ib_verbs.h         |    2 +-
 2 files changed, 3 insertions(+), 11 deletions(-)

applies-to: 08d94f59d6f80937db5d87f0bb60eafcedd811d1
40de2e548c225e3ef859e3c60de9785e37e1b5b1
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 72d3ef7..4f51d79 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -324,16 +324,8 @@ EXPORT_SYMBOL(ib_destroy_cq);
 
 int ib_resize_cq(struct ib_cq *cq, int cqe)
 {
-	int ret;
-
-	if (!cq->device->resize_cq)
-		return -ENOSYS;
-
-	ret = cq->device->resize_cq(cq, &cqe);
-	if (!ret)
-		cq->cqe = cqe;
-
-	return ret;
+	return cq->device->resize_cq ?
+		cq->device->resize_cq(cq, cqe) : -ENOSYS;
 }
 EXPORT_SYMBOL(ib_resize_cq);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f72d46d..a7f4c35 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -881,7 +881,7 @@ struct ib_device {
 						struct ib_ucontext *context,
 						struct ib_udata *udata);
 	int                        (*destroy_cq)(struct ib_cq *cq);
-	int                        (*resize_cq)(struct ib_cq *cq, int *cqe);
+	int                        (*resize_cq)(struct ib_cq *cq, int cqe);
 	int                        (*poll_cq)(struct ib_cq *cq, int num_entries,
 					      struct ib_wc *wc);
 	int                        (*peek_cq)(struct ib_cq *cq, int wc_cnt);
---
0.99.9e
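The shape of the new interface can be sketched in user space. Everything below is hypothetical scaffolding (the example_* types and functions are not kernel definitions, and the kernel's locking is omitted); it only illustrates an optional method that takes the size by value, with the wrapper returning -ENOSYS when a device does not provide it and the driver method becoming the only place that stores the new size.

```c
#include <errno.h>
#include <stddef.h>

struct example_cq;

/* Hypothetical device with an optional resize method. */
struct example_device {
	int (*resize_cq)(struct example_cq *cq, int cqe);
};

struct example_cq {
	struct example_device *device;
	int cqe;	/* current CQ size */
};

/* Wrapper in the style of the patched ib_resize_cq(): pure dispatch. */
static int example_resize_cq(struct example_cq *cq, int cqe)
{
	return cq->device->resize_cq ?
		cq->device->resize_cq(cq, cqe) : -ENOSYS;
}

/*
 * Hypothetical driver method.  Because the wrapper no longer writes
 * cq->cqe, a real driver could take its own lock around this store.
 */
static int example_driver_resize(struct example_cq *cq, int cqe)
{
	if (cqe < 1)
		return -EINVAL;
	cq->cqe = cqe;
	return 0;
}
```

A device that leaves resize_cq NULL gets -ENOSYS and its size is left untouched. One that installs example_driver_resize sees the new value stored by the driver itself.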
Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB
At 02:09 PM 11/9/2005, Greg Lindahl wrote:
> On Wed, Nov 09, 2005 at 01:57:06PM -0800, Michael Krause wrote:
>
> > What you indicate above is that RDS
> > will implement a resync of the two sides of the association to determine
> > what has been successfully sent.
>
> More accurate to say that it "could" implement that. I'm just
> kibitzing on someone else's proposal.
>
> > This then implies that the reliability of the underlying
> > interconnect isn't as critical per se as the end-to-end RDS protocol
> > will assure that data is delivered to the RDS components in the face
> > of hardware failures. Correct?
>
> Yes. That's the intent that I see in the proposal. The implementation
> required to actually support this may not be what the proposers had in
> mind.

If it is to be reasonably robust, then RDS should be required to support
the resync between the two sides of the communication. This aligns with
the stated objective of implementing reliability in one location in
software and one location in hardware. Without such a resync being
required in the ULP, one ends up with a ULP that falls short of its
stated objectives and pushes complexity back up to the application, which
is where the advocates have stated it is too complex or expensive to get
correct.

> This sort of message service, by the way, has a long history in
> distributed computing.

Yep.

Mike
[openib-general] Re: [PATCH] libmthca: fix posting long wqe lists for srq
Thanks -- I had basically the same thing in my local working directory
but forgot to commit it.

 - R.
[openib-general] [PATCH] libmthca: fix posting long wqe lists for srq
Fix posting long WQE lists for SRQ.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

Index: src/userspace/libmthca/src/srq.c
===================================================================
--- src/userspace/libmthca/src/srq.c	(revision 4016)
+++ src/userspace/libmthca/src/srq.c	(working copy)
@@ -99,6 +99,7 @@ int mthca_tavor_post_srq_recv(struct ibv
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
 		if (nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
+			nreq = 0;
 
 			doorbell[0] = htonl(first_ind << srq->wqe_shift);
 			doorbell[1] = htonl((srq->srqn << 8) | nreq);
 
-- 
MST