RE: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Sasha, Please see my comments below > > > > 9. OpenSM features > > --- > > The QoS related functionality to be provided by OpenSM can be split into two > > main parts: > > > > 3.1. Fabric Setup > > During fabric initialization the SM should parse the policy and apply its > > settings to the discovered fabric elements. The following actions should be > > performed: > > * Parsing of policy > > * Node Group identification. Warning should be provided for each node not > > specified but found. > > * SL2VL settings validation should be checked: > > + A warning will be provided if there are no matching targets for the SL2VL > > setting statement. > > + An error message will be printed to the log file if an invalid setting is > > found. A setting is invalid if it refers to: > > - Non existing port numbers of the target devices > > - Unsupported VLs for the target device. In the later case the map to non > > existing VLs should be replaced to VL15 i.e. packets will be dropped. > > Not sure that unsupported VLs mapping to VL15 is best option. Actually > if SL2VL will be specified per port group this may mean that at least in > "generic" case all group members should have similar physical > capabilities or "reliable" part of SLs will be limited by lowest VLCap > in this group (other SLs will be just dropped somewhere). [EZ] I prefer not hiding the mismatch. In my mind the explicit setting should be provided for each of the groups of switches that do not share same VLs support. But this is not a strong requirement in my mind. In general I would prefer to get a clear error message when the fabric can not support the given policy. Once such error is provided I think we could use whatever "recovery" option you have in mind. > > In current SL2VL mapping implementation we are using such rule to replace > unsupported VLs: (new VL) = (requested VL) % (operational data VLs) > This may have some disadvantage too, but I think it is generally "safer". 
[EZ] It is safer since it will not cause data loss. But then the QoS will probably be broken. > > Also I guess that by "unsupported VLs" you are referring unsupported or > non-configured VLs. [EZ] Yes true. > > > * SL2VL setting is to be performed > > * VL Arbitration table settings should be validated according to the following > > rules: > > + A warning will be provided if there are no matching targets for the setting > > statement > > + An error will be provided if the port number exceeds the target ports > > + An error will be generated if the table length exceeds device capabilities > > + An warning will be generated if the table quote a VL that is not supported > > by the target device > > Should there be replacement rule for not supported VLs? > > In IBTA spec (v.1, p.190, l.14) is stated that entry with unsupported VL > may be skipped _OR_ "trusted" to other (supported) VL. I think if we will > not care about unsupported replacement there may be hole for > "device/vendor dependent" behavior. [EZ] OK good point. Lets have a replacement rule. > > Sasha ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
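The two replacement rules debated above (map to VL15 and drop, versus the modulo rule) can be compared with a small sketch. This is illustrative Python, not OpenSM code: the function name `remap_sl2vl`, the `policy` argument, and the list-indexed-by-SL table layout are assumptions made for the example. It also assumes the operational data VLs are numbered 0..N-1 and that VL15 in the table means "drop", as stated in the thread.

```python
def remap_sl2vl(sl2vl, operational_data_vls, policy="modulo"):
    """Replace VLs a port cannot serve in an SL-to-VL table.

    sl2vl: requested VLs indexed by SL (hypothetical representation).
    operational_data_vls: number of usable data VLs (VL0..VL{n-1}).
    policy: "modulo" -> (new VL) = (requested VL) % (operational data VLs)
            "drop"   -> map to VL15, i.e. packets are dropped
    Returns (new_table, messages); every replacement is reported so the
    mismatch is not hidden from the administrator.
    """
    if operational_data_vls < 1:
        raise ValueError("at least one operational data VL is required")
    new_table, messages = [], []
    for sl, vl in enumerate(sl2vl):
        # VL15 is kept as-is: it is the explicit "drop" entry.
        if vl == 15 or vl < operational_data_vls:
            new_table.append(vl)
            continue
        if policy == "modulo":
            new_vl = vl % operational_data_vls
        elif policy == "drop":
            new_vl = 15
        else:
            raise ValueError(f"unknown policy: {policy}")
        messages.append(f"SL{sl}: VL{vl} not operational, using VL{new_vl}")
        new_table.append(new_vl)
    return new_table, messages
```

With four operational data VLs and a requested table of VL0..VL7, the modulo rule folds SL4..SL7 back onto VL0..VL3 (no data loss, but those SLs no longer get separate lanes), while the drop rule sends SL4..SL7 to VL15. Either way the messages list gives the "clear error message" asked for above.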
Re: [openib-general] QoS RFC - Resend using a friendly mailer
On Thu, Jun 01, 2006 at 09:09:49PM +0300, Sasha Khapyorsky wrote:
> On 15:49 Tue 30 May , Grant Grundler wrote:
> > On Tue, May 30, 2006 at 10:09:36PM +0300, Sasha Khapyorsky wrote:
> > > > XML style syntax is provided for the policy file.
> > >
> > > Why XML? It is not too much readable and writable (by human) format.
> >
> > It is human readable and very portable.
> > An example is here:
> > http://svn.gnumonks.org/trunk/mmio_test/mmio_test.xml
>
> Yes it is readable, but for many people it is _less_ readable and even
> _less_ writable than "plain" text.

This might be a good starting point for "many people":
http://ahds.ac.uk/creating/information-papers/xml-editors/

I tried conglomerate (debian) and it doesn't like mmio_test.xml for some reason. But I suppose that could be fixed.

Anyway, my point is there is no shortage of GUIs to edit XML files and verify syntactic correctness.

hth,
grant
Re: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Eitan,

Some more comments related to OpenSM.

On 17:53 Tue 30 May , Eitan Zahavi wrote:
>
> 9. OpenSM features
> ---
> The QoS related functionality to be provided by OpenSM can be split into two
> main parts:
>
> 3.1. Fabric Setup
> During fabric initialization the SM should parse the policy and apply its
> settings to the discovered fabric elements. The following actions should be
> performed:
> * Parsing of policy
> * Node Group identification. Warning should be provided for each node not
>   specified but found.
> * SL2VL settings validation should be checked:
>   + A warning will be provided if there are no matching targets for the SL2VL
>     setting statement.
>   + An error message will be printed to the log file if an invalid setting is
>     found. A setting is invalid if it refers to:
>     - Non existing port numbers of the target devices
>     - Unsupported VLs for the target device. In the later case the map to non
>       existing VLs should be replaced to VL15 i.e. packets will be dropped.

I am not sure that mapping unsupported VLs to VL15 is the best option. If SL2VL is specified per port group, this may mean that, at least in the "generic" case, either all group members must have similar physical capabilities, or the "reliable" part of the SLs will be limited by the lowest VLCap in the group (the other SLs will just be dropped somewhere).

In the current SL2VL mapping implementation we use this rule to replace unsupported VLs:

  (new VL) = (requested VL) % (operational data VLs)

This may have some disadvantages too, but I think it is generally "safer".

Also, I guess that by "unsupported VLs" you are referring to unsupported or non-configured VLs.

> * SL2VL setting is to be performed
> * VL Arbitration table settings should be validated according to the following
>   rules:
>   + A warning will be provided if there are no matching targets for the setting
>     statement
>   + An error will be provided if the port number exceeds the target ports
>   + An error will be generated if the table length exceeds device capabilities
>   + An warning will be generated if the table quote a VL that is not supported
>     by the target device

Should there be a replacement rule for unsupported VLs?

The IBTA spec (v.1, p.190, l.14) states that an entry with an unsupported VL may be skipped _OR_ "trusted" to another (supported) VL. I think that if we do not define a replacement for unsupported VLs, there may be a hole for "device/vendor dependent" behavior.

Sasha
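The four VL Arbitration validation rules quoted above can be written down as a short checker. This is a hedged Python sketch, not the OpenSM implementation: the `Device` record, its field names, and the `(severity, message)` return format are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical summary of one target device's capabilities."""
    name: str
    num_ports: int
    vlarb_table_cap: int       # maximum number of VLArb entries supported
    supported_vls: frozenset   # data VLs the device supports


def validate_vlarb(port_num, table, targets):
    """Check one VLArb setting statement against its matching targets.

    table: list of (vl, weight) pairs from the policy.
    Returns a list of (severity, message) tuples following the RFC rules.
    """
    findings = []
    if not targets:
        findings.append(("WARNING", "no matching targets for the setting statement"))
    for dev in targets:
        if port_num > dev.num_ports:
            findings.append(("ERROR", f"{dev.name}: port {port_num} exceeds "
                                      f"{dev.num_ports} ports"))
        if len(table) > dev.vlarb_table_cap:
            findings.append(("ERROR", f"{dev.name}: table length {len(table)} "
                                      f"exceeds capability {dev.vlarb_table_cap}"))
        for vl, _weight in table:
            if vl not in dev.supported_vls:
                findings.append(("WARNING", f"{dev.name}: VL{vl} is not supported"))
    return findings
```

Note that the sketch only reports the unsupported VL; whether such an entry is then skipped or replaced is exactly the open question raised in this message.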
Re: [openib-general] QoS RFC - Resend using a friendly mailer
On 15:49 Tue 30 May , Grant Grundler wrote:
> On Tue, May 30, 2006 at 10:09:36PM +0300, Sasha Khapyorsky wrote:
> > > XML style syntax is provided for the policy file.
> >
> > Why XML? It is not too much readable and writable (by human) format.
>
> It is human readable and very portable.
> An example is here:
> http://svn.gnumonks.org/trunk/mmio_test/mmio_test.xml

Yes it is readable, but for many people it is _less_ readable and even _less_ writable than "plain" text.

> And GPL libraries can parse XML.

It is true, but currently we have "portability" complaints even against using libpthread.

Sasha

> So the new code is fairly short:
> http://svn.gnumonks.org/trunk/mmio_test/xmlin.c
>
> hth,
> grant
RE: [openib-general] QoS RFC - Resend using a friendly mailer
I am a member of MgtWG and will look for the discussion there. It doesn't seem like a topic LWG would cover.

Todd Rimmer

> -----Original Message-----
> From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 31, 2006 1:35 AM
> To: Rimmer, Todd
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] QoS RFC - Resend using a friendly mailer
>
> Hi Todd,
>
> It is LWG. MgtWG will also be involved.
>
> > I am a member of IBTA however I have not noticed this discussion on the IBTA
> > working groups. Which working group have you engaged with this proposal?
> >
> > Todd Rimmer
RE: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Todd,

It is LWG. MgtWG will also be involved.

> I am a member of IBTA however I have not noticed this discussion on the IBTA
> working groups. Which working group have you engaged with this proposal?
>
> Todd Rimmer
RE: [openib-general] QoS RFC - Resend using a friendly mailer
On Tue, 2006-05-30 at 18:05, Rimmer, Todd wrote:
> > Eitan Wrote:
> > > As Roland suggest, before implementing a non-standard approach, IBTA should be
> > > engaged to define an appropriate extension to the standard. Such extensions would
> > > need to be carefully defined to avoid breaking existing applications and fabrics.
> > [EZ] You are welcome to join IBTA and work on this too.
>
> I am a member of IBTA however I have not noticed this discussion
> on the IBTA working groups. Which working group have you engaged
> with this proposal?

It's at the LWG.

-- Hal

> Todd Rimmer
Re: [openib-general] QoS RFC - Resend using a friendly mailer
On Tue, May 30, 2006 at 10:09:36PM +0300, Sasha Khapyorsky wrote:
> > XML style syntax is provided for the policy file.
>
> Why XML? It is not too much readable and writable (by human) format.

It is human readable and very portable.
An example is here:
http://svn.gnumonks.org/trunk/mmio_test/mmio_test.xml

And GPL libraries can parse XML.
So the new code is fairly short:
http://svn.gnumonks.org/trunk/mmio_test/xmlin.c

hth,
grant
RE: [openib-general] QoS RFC - Resend using a friendly mailer
> Eitan Wrote:
> > As Roland suggest, before implementing a non-standard approach, IBTA should be
> > engaged to define an appropriate extension to the standard. Such extensions would
> > need to be carefully defined to avoid breaking existing applications and fabrics.
> [EZ] You are welcome to join IBTA and work on this too.

I am a member of IBTA; however, I have not noticed this discussion on the IBTA working groups. Which working group have you engaged with this proposal?

Todd Rimmer
Re: [openib-general] QoS RFC - Resend using a friendly mailer
On 22:43 Tue 30 May , Eitan Zahavi wrote:
> > >
> > > XML style syntax is provided for the policy file.
> >
> > Why XML? It is not too much readable and writable (by human) format.
> [EZ] Well, I agree with you but already got so many requests for XML
> that I could not resists. Maybe we could do both. If we have a nice BNF
> it would be just a matter of some yacc exercise.

I care less about parser complexity and more about the people who will want to edit the policy definitions with just 'vi' (or any other text editor).

And I agree that an OpenSM config -> XML converter may not be so hard to do.

Sasha
RE: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Sasha,

Thanks for your comments. Please see my comments inside.

> > 3. Supported Policy
> >
> > The QoS policy supported by this proposal is divided into 4 sub sections:
> >
> > * Node Group: a set of HCAs, Routers or Switches that share the same settings.
> > A node groups might be a partition defined by the partition manager policy in
> > terms of GUIDs. Future implementations might provide support for NodeDescription
> > based definition of node groups.
>
> Port/Node groups could be defined as separate configuration, then those
> definitions will be shared by different policies like Partitions, QoS (and
> maybe others in future).

[EZ] Great idea. I would suggest using NodeDescription as a way to get node names. But this is yet another issue for discussion on the IBTA and this list.

> > * Fabric Setup:
> > Defines how the SL2VL and VLArb tables should be setup. This policy definition
> > assumes the computation of target behavior should be performed outside of
> > OpenSM.
> >
> > * QoS-Levels Definition:
> > This section defines the possible sets of parameters for QoS that a client might
> > be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits
> > (in case LMC > 0 is used for QoS) and TClass.
> >
> > * Matching Rules:
> > A list of rules that match an incoming PathRecord request to a QoS-Level. The
> > rules are processed in order such as the first match is applied. Each rule is
> > built out of set of match expressions which should all match for the rule to
> > apply. The matching expressions are defined for the following fields
> > ** SRC and DST to lists of node groups
> > ** Service-ID to a list of Service-ID or Service-ID ranges
> > ** TClass to a list of TClass values or ranges
> >
> > XML style syntax is provided for the policy file.
>
> Why XML? It is not too much readable and writable (by human) format.

[EZ] Well, I agree with you, but I have already received so many requests for XML that I could not resist. Maybe we could do both. If we have a nice BNF it would be just a matter of some yacc exercise. IMO it is the least of our problems.

> Sasha
RE: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Hal, Please see my responses inside Eitan > > > > RFC: OpenFabrics Enhancements for QoS Support > > === > > > > Authors: . Eitan Zahavi <[EMAIL PROTECTED]> > > Date: May 2006. > > Revision: 0.1 > > > > Table of contents: > > 1. Overview > > 2. Architecture > > 3. Supported Policy > > 4. CMA functionality > > 5. IPoIB functionality > > 6. SDP functionality > > 7. SRP functionality > > 8. iSER functionality > > 9. OpenSM functionality > > > > 1. Overview > > > > Quality of Service requirements stem from the realization of I/O consolidation > > over IB network: As multiple applications and ULPs share the same fabric, means > > to control their use of the network resources are becoming a must. The basic > > need is to differentiate the service levels provided to different traffic flows. > > Such that a policy could be enforced and control each flow utilization of the > > fabric resources. > > > > IBTA specification defined several hardware features and management interfaces > > to support QoS: > > * Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner > > * Arbitration between traffic of different VL is performed by a 2 priority > > levels weighted round robin arbiter. The arbiter is programmable with > > a sequence of (VL, weight) pairs and maximal number of high priority credits > > to be processed before low priority is served > > * Packets carry class of service marking in the range 0 to 15 in their > > header SL field > > * Each switch can map the incoming packet by its SL to a particular output > > VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) > > * The Subnet Administrator controls each communication flow parameters > > by providing them as a response to Path Record query > > > > The IB QoS features provide the means to implement a DiffServ like architecture. > > DiffServ architecture (IETF RFC2474 2475) is widely used today in highly dynamic > > fabrics. 
> > Only certain DSCP code point equivalents are provided by IBA. [EZ] True. > > > This proposal provides the detailed functional definition for the various > > software elements that are required to enable a DiffServ like architecture over > > the OpenFabrics software stack. > > > > > > > > > > > > 2. Architecture > > > > This proposal split the QoS functionality between the SM/SA, CMA and the various > > ULPS. We take the "chronology approach" to describe how the overall system > > works: > > > > 2.1. The network manager (human) provides a set of rules (policy) that defines > > how the network is being configured and how its resources are split to different > > QoS-Levels. The policy also define how to decide which QoS-Level each > > application or ULP or service use. > > > 2.2. The SM analyzes the provided policy to see if it is realizable and performs > > the necessary fabric setup. The SM may continuously monitor the policy and adapt > > to changes in it. > > Do you mean monitor the policy or the fabric here ? [EZ] I mean monitor the policy such that changes in it are enforced. > > > Part of this policy defines the default QoS-Level of each > > partition. The SA is being enhanced to match the requested Source, Destination, > > TClass, Service-ID > > Service ID does not apply to many ULPs. Also, how is it known what > ULP/application a particular service ID refers to (other than perhaps > some well known ones) ? [EZ] True - only well known Service-IDs can have a predefined policy attached to. But I disagree on the fact services are unknown - if they are unknown how are they being found by the clients? > > > (and optionally SL and priority) against the policy. So > > clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also > > enhanced to support setting up partitions with appropriate IPoIB broadcast > > group. This broadcast group carries its QoS attributes: TClass, SL, MTU and > > RATE. > > > > 2.3. IPoIB is being setup. 
IPoIB uses the SL, MTU and RATE available on the > > multicast group which forms the broadcast group of this partition. > > > > 2.4. MPI which provides non IB based connection management should be > configured > > to run using hard coded SLs. It uses these SLs in every QP being opened. > > > > 2.5. ULPs that use CM interface (like SRP) should have their own pre-assigned > > Service-ID and use it while obtaining PathRecord for establishing their > > connections. The SA receiving the PathRecord should match it against the policy > > and return the appropriate PathRecord including SL, MTU, RATE and TClass. > > > > 2.6. ULPs and programs using CMA to establish RC connection should provide the > > CMA the target IP and Service-ID. Some of the ULPs might also provide TClass > > (E.g. for SDP sockets that are provided the TOS socket option). The CMA should > > then use the provided Service-ID and optional TClass and pass them in the > > PathRecor
RE: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Todd,

> While using the Service ID is an interesting idea, the problem is the Service ID values
> are not well defined by IBTA. Rather each endpoint is permitted to define its own,
> potentially transient set of Service ID values. The Service ID values are discovered via
> Service Records in the SA or Device Management queries which get their data from
> the IOU.

[EZ] Actually there are quite a few rules for how Service-IDs are made. Different service vendors are supposed to use different Service-IDs. Also, this RFC does not enforce using Service-IDs in cases where they are not defined, but it does provide the means to use them when such services are defined. So in no way can you say it breaks existing implementations. It just provides a way for applications that do make constant use of Service-IDs to benefit from that property.

> Hence while a few service ID values are well defined (such as those for SDP), many
> are not (such as those for MPI, uDAPL, SRP, etc) and may vary between both
> hardware and software suppliers. Many are likely to be duplicated between different
> vendors target devices (for example a uDAPL target application may duplicate values
> used by an SRP target) and this would not be a problem provided both applications
> were never run on the same IB Node target device. Some might even change on each
> reboot (IBTA spec implies this could be a 64 bit pointer or context in the target),
> although I'm not aware of any which do.
>
> I believe it is for the above reasons that IBTA chose not to make ServiceID part of the
> PathRecord and MultiPathRecord queries.
>
> As Roland suggest, before implementing a non-standard approach, IBTA should be
> engaged to define an appropriate extension to the standard. Such extensions would
> need to be carefully defined to avoid breaking existing applications and fabrics.

[EZ] You are welcome to join IBTA and work on this too.

> Todd Rimmer
Re: [openib-general] QoS RFC - Resend using a friendly mailer
Hi Eitan,

First comments...

On 17:53 Tue 30 May , Eitan Zahavi wrote:
>
> 3. Supported Policy
>
> The QoS policy supported by this proposal is divided into 4 sub sections:
>
> * Node Group: a set of HCAs, Routers or Switches that share the same settings.
> A node groups might be a partition defined by the partition manager policy in
> terms of GUIDs. Future implementations might provide support for NodeDescription
> based definition of node groups.

Port/Node groups could be defined as separate configuration, then those definitions will be shared by different policies like Partitions, QoS (and maybe others in future).

> * Fabric Setup:
> Defines how the SL2VL and VLArb tables should be setup. This policy definition
> assumes the computation of target behavior should be performed outside of
> OpenSM.
>
> * QoS-Levels Definition:
> This section defines the possible sets of parameters for QoS that a client might
> be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits
> (in case LMC > 0 is used for QoS) and TClass.
>
> * Matching Rules:
> A list of rules that match an incoming PathRecord request to a QoS-Level. The
> rules are processed in order such as the first match is applied. Each rule is
> built out of set of match expressions which should all match for the rule to
> apply. The matching expressions are defined for the following fields
> ** SRC and DST to lists of node groups
> ** Service-ID to a list of Service-ID or Service-ID ranges
> ** TClass to a list of TClass values or ranges
>
> XML style syntax is provided for the policy file.

Why XML? It is not too much readable and writable (by human) format.

Sasha
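Independent of the file syntax question, the "Matching Rules" semantics quoted above (rules processed in order, first match applied, all expressions in a rule must match) can be sketched in a few lines of Python. The `Rule` class, its field names, the `default` fallback level, and the example group names and Service-ID range are hypothetical; the RFC text in this thread does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One matching rule; an expression left as None matches anything."""
    qos_level: str
    src: Optional[frozenset] = None   # node groups allowed as source
    dst: Optional[frozenset] = None   # node groups allowed as destination
    service_ids: Optional[list] = None  # list of inclusive (lo, hi) ranges
    tclasses: Optional[list] = None     # list of inclusive (lo, hi) ranges


def _in_groups(node_groups, wanted):
    return wanted is None or bool(set(node_groups) & set(wanted))


def _in_ranges(value, ranges):
    if ranges is None:
        return True
    return value is not None and any(lo <= value <= hi for lo, hi in ranges)


def match_qos_level(rules, src_groups, dst_groups,
                    service_id=None, tclass=None, default="default"):
    """Return the QoS-Level of the first rule whose expressions all match."""
    for rule in rules:
        if (_in_groups(src_groups, rule.src)
                and _in_groups(dst_groups, rule.dst)
                and _in_ranges(service_id, rule.service_ids)
                and _in_ranges(tclass, rule.tclasses)):
            return rule.qos_level
    return default


# Example policy (all names and numbers are made up for illustration).
RULES = [
    Rule("storage", dst=frozenset({"srp_targets"}), service_ids=[(0x1000, 0x1FFF)]),
    Rule("bulk", src=frozenset({"compute"})),
]
```

Because evaluation stops at the first match, a request from the `compute` group to `srp_targets` with a Service-ID inside the range gets `storage`, not `bulk`; rule order in the policy file is therefore significant.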
Re: [openib-general] QoS RFC - Resend using a friendly mailer
High-level feedback:

- An IB fabric could be used for a single ULP and still require QoS. The issue is how to differentiate flows on a given shared element within the fabric.

- QoS controls must be dynamic. The document references initialization as the time when decisions are made but obviously that is just a first pass on use of the fabric and not what it will become in potentially a short period of time.

- QoS also involves multi-path support (not really touched upon in terms of specifics in this document). Distributing or segregating work even if for the same ULP should be done across multiple or distinct paths. In one sense this may complicate the work but in another it is simpler in that arbitration controls for shared links become easier to manage if the number of flows is reduced.

- IP over IB defines a multicast group which is ultimately a spanning tree. That should not constrain what paths are used to communicate between endnode pairs. That only defines the multicast paths which are not strongly ordered relative to the unicast traffic. Further IP over IB may operate using the RC mode between endnodes. It is very simple to replicate RC and then segregate these into QoS domains (one could just align priority with the 802.1p for simplicity and practical execution) which can in turn flow over shared or distinct paths.

- IB is a centrally managed fabric. Adding in SID into records and such really isn't going to help solve the problem unless there is also a centralized management entity well above IB that can prioritize communication service rates for different ULP and endnode pairs. Given most of these centralized management entities are rather ignorant of IB at the moment, this presents a chicken-egg dilemma which is further complicated by developing SOA technology. It might be more valuable in one sense to examine SOA technology and how it is translating itself to say Ethernet and then see how this can be leveraged to IB.
- QoS needs to examine the sums of the consumers of a given path and their service rate requirements. It isn't just about setting a priority level but also about the packet injection rate to the fabric on that priority. This needs to be taken into account as well. Overall, it is not clear to me what the end value of this document. The challenge for any network admin is to translate SOA driven requirements into fabric control knob setting. Without such translation algorithms / understanding, it is not clear that there is anything truly missing in the IBTA spec suite or that this RFC will really advance the integration of IB into the data center in a truly meaningful manner. Mike At 07:53 AM 5/30/2006, Eitan Zahavi wrote: To: OPENIB Subject: QoS RFC - Resend using a friendly mailer --text follows this line-- Hi All Please find the attached RFC describing how QoS policy support could be implemented in the OpenFabrics stack. Your comments are welcome. Eitan RFC: OpenFabrics Enhancements for QoS Support === Authors: . Eitan Zahavi <[EMAIL PROTECTED]> Date: May 2006. Revision: 0.1 Table of contents: 1. Overview 2. Architecture 3. Supported Policy 4. CMA functionality 5. IPoIB functionality 6. SDP functionality 7. SRP functionality 8. iSER functionality 9. OpenSM functionality 1. Overview Quality of Service requirements stem from the realization of I/O consolidation over IB network: As multiple applications and ULPs share the same fabric, means to control their use of the network resources are becoming a must. The basic need is to differentiate the service levels provided to different traffic flows. Such that a policy could be enforced and control each flow utilization of the fabric resources. IBTA specification defined several hardware features and management interfaces to support QoS: * Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner * Arbitration between traffic of different VL is performed by a 2 priority levels weighted round robin arbiter. 
The arbiter is programmable with a sequence of (VL, weight) pairs and maximal number of high priority credits to be processed before low priority is served * Packets carry class of service marking in the range 0 to 15 in their header SL field * Each switch can map the incoming packet by its SL to a particular output VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) * The Subnet Administrator controls each communication flow parameters by providing them as a response to Path Record query The IB QoS features provide the means to implement a DiffServ like architecture. DiffServ architecture (IETF RFC2474 2475) is widely used today in highly dynamic fabrics. This proposal provides the detailed functional definition for the various software elements that are required to enable a DiffServ like architecture over the OpenFabrics software stack. 2. Architecture --
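The two-priority weighted round robin arbiter mentioned in the quoted overview can be illustrated with a deliberately simplified simulation. This is a sketch under stated assumptions, not the IBA algorithm: the specification accounts for weights and the high-priority limit in byte-based units, whereas this model counts whole packets, and the `Arbiter` class and `drain` helper are names invented for the example.

```python
from collections import deque


def _expand(table):
    """Turn [(vl, weight), ...] into a flat slot list, one slot per credit."""
    return [vl for vl, weight in table for _ in range(weight)]


class Arbiter:
    """Simplified two-level WRR: weights and limit count packets, not bytes."""

    def __init__(self, high_table, low_table, high_limit):
        self.slots = {"high": _expand(high_table), "low": _expand(low_table)}
        self.pos = {"high": 0, "low": 0}
        self.high_limit = high_limit  # high-priority packets before low is served
        self.high_sent = 0

    def _pick(self, level, queues):
        # Walk the slot list once, starting where the last pick stopped.
        slots = self.slots[level]
        for _ in range(len(slots)):
            vl = slots[self.pos[level]]
            self.pos[level] = (self.pos[level] + 1) % len(slots)
            if queues.get(vl):
                return vl
        return None

    def next_vl(self, queues):
        """Return the VL allowed to send next, or None if nothing is queued."""
        low_pending = any(queues.get(vl) for vl in self.slots["low"])
        if self.high_sent < self.high_limit or not low_pending:
            vl = self._pick("high", queues)
            if vl is not None:
                self.high_sent += 1
                return vl
        vl = self._pick("low", queues)
        if vl is not None:
            self.high_sent = 0  # low priority was served, restart the count
        return vl


def drain(arbiter, queues):
    """Send everything queued and return the order in which VLs were served."""
    order = []
    while True:
        vl = arbiter.next_vl(queues)
        if vl is None:
            return order
        queues[vl].popleft()
        order.append(vl)
```

For example, with VL0 in the high-priority table, VL1 in the low-priority table, and a limit of two, four packets queued on VL0 and two on VL1 are served as VL0, VL0, VL1, VL0, VL0, VL1: the limit is what keeps the low-priority lane from starving.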
Re: [openib-general] QoS RFC - Resend using a friendly mailer
On Tue, 2006-05-30 at 10:53, Eitan Zahavi wrote: > To: OPENIB > Subject: QoS RFC - Resend using a friendly mailer > --text follows this line-- > Hi All > > Please find the attached RFC describing how QoS policy support could be > implemented in the OpenFabrics stack. > Your comments are welcome. Some initial comments. > > Eitan > > RFC: OpenFabrics Enhancements for QoS Support > === > > Authors: . Eitan Zahavi <[EMAIL PROTECTED]> > Date: May 2006. > Revision: 0.1 > > Table of contents: > 1. Overview > 2. Architecture > 3. Supported Policy > 4. CMA functionality > 5. IPoIB functionality > 6. SDP functionality > 7. SRP functionality > 8. iSER functionality > 9. OpenSM functionality > > 1. Overview > > Quality of Service requirements stem from the realization of I/O > consolidation > over IB network: As multiple applications and ULPs share the same fabric, > means > to control their use of the network resources are becoming a must. The basic > need is to differentiate the service levels provided to different traffic > flows. > Such that a policy could be enforced and control each flow utilization of the > fabric resources. > > IBTA specification defined several hardware features and management > interfaces > to support QoS: > * Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner > * Arbitration between traffic of different VL is performed by a 2 priority > levels weighted round robin arbiter. 
The arbiter is programmable with > a sequence of (VL, weight) pairs and maximal number of high priority > credits > to be processed before low priority is served > * Packets carry class of service marking in the range 0 to 15 in their > header SL field > * Each switch can map the incoming packet by its SL to a particular output > VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) > * The Subnet Administrator controls each communication flow parameters > by providing them as a response to Path Record query > > The IB QoS features provide the means to implement a DiffServ like > architecture. > DiffServ architecture (IETF RFC2474 2475) is widely used today in highly > dynamic > fabrics. Only certain DSCP code point equivalents are provided by IBA. > This proposal provides the detailed functional definition for the various > software elements that are required to enable a DiffServ like architecture > over > the OpenFabrics software stack. > > > > > > 2. Architecture > > This proposal split the QoS functionality between the SM/SA, CMA and the > various > ULPS. We take the "chronology approach" to describe how the overall system > works: > > 2.1. The network manager (human) provides a set of rules (policy) that > defines > how the network is being configured and how its resources are split to > different > QoS-Levels. The policy also define how to decide which QoS-Level each > application or ULP or service use. > 2.2. The SM analyzes the provided policy to see if it is realizable and > performs > the necessary fabric setup. The SM may continuously monitor the policy and > adapt > to changes in it. Do you mean monitor the policy or the fabric here ? > Part of this policy defines the default QoS-Level of each > partition. The SA is being enhanced to match the requested Source, > Destination, > TClass, Service-ID Service ID does not apply to many ULPs. 
Also, how is it known what ULP/application a particular service ID refers to (other than perhaps some well known ones)?

> (and optionally SL and priority) against the policy. So
> clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also
> enhanced to support setting up partitions with appropriate IPoIB broadcast
> group. This broadcast group carries its QoS attributes: TClass, SL, MTU and
> RATE.
>
> 2.3. IPoIB is being setup. IPoIB uses the SL, MTU and RATE available on the
> multicast group which forms the broadcast group of this partition.
>
> 2.4. MPI which provides non IB based connection management should be
> configured to run using hard coded SLs. It uses these SLs in every QP being
> opened.
>
> 2.5. ULPs that use CM interface (like SRP) should have their own
> pre-assigned Service-ID and use it while obtaining PathRecord for
> establishing their connections. The SA receiving the PathRecord should
> match it against the policy and return the appropriate PathRecord including
> SL, MTU, RATE and TClass.
>
> 2.6. ULPs and programs using CMA to establish RC connection should provide
> the CMA the target IP and Service-ID. Some of the ULPs might also provide
> TClass (E.g. for SDP sockets that are provided the TOS socket option). The
> CMA should then use the provided Service-ID and optional TClass and pass
> them in the PathRecord request. The resulting PathRecord should be used for
> configuring the
> connecti
RE: [openib-general] QoS RFC - Resend using a friendly mailer
> Eitan wrote:
> 2.2. The SM analyzes the provided policy to see if it is realizable and
> performs the necessary fabric setup. The SM may continuously monitor the
> policy and adapt to changes in it. Part of this policy defines the default
> QoS-Level of each partition. The SA is being enhanced to match the requested
> Source, Destination, TClass, Service-ID (and optionally SL and priority)
> against the policy. So clients (ULPs, programs) can obtain a policy enforced
> QoS. The SM is also enhanced to support setting up partitions with
> appropriate IPoIB broadcast group. This broadcast group carries its QoS
> attributes: TClass, SL, MTU and RATE.

While using the Service ID is an interesting idea, the problem is that the Service ID values are not well defined by IBTA. Rather, each endpoint is permitted to define its own, potentially transient, set of Service ID values. The Service ID values are discovered via Service Records in the SA or Device Management queries which get their data from the IOU.

Hence while a few Service ID values are well defined (such as those for SDP), many are not (such as those for MPI, uDAPL, SRP, etc.) and may vary between both hardware and software suppliers. Many are likely to be duplicated between different vendors' target devices (for example, a uDAPL target application may duplicate values used by an SRP target), and this would not be a problem provided both applications were never run on the same IB Node target device. Some might even change on each reboot (the IBTA spec implies this could be a 64-bit pointer or context in the target), although I'm not aware of any which do.

I believe it is for the above reasons that IBTA chose not to make ServiceID part of the PathRecord and MultiPathRecord queries. As Roland suggests, before implementing a non-standard approach, IBTA should be engaged to define an appropriate extension to the standard.
Such extensions would need to be carefully defined to avoid breaking existing applications and fabrics.

Todd Rimmer
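For readers following the thread, the SA matching step quoted above (Source, Destination, TClass and Service-ID checked against the policy, with a default QoS-Level per partition) might look roughly like the sketch below. This is only an illustration under stated assumptions: the names PathRequest, Rule, QosLevel and resolve_qos_level are hypothetical, the first-match-wins ordering is an assumption, and the RFC's actual policy format is not modeled. Note that a rule keyed on Service-ID is only useful when the value is stable and unambiguous, which is exactly the concern raised above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QosLevel:
    """Hypothetical QoS-Level: the parameters a PathRecord reply would carry."""
    name: str
    sl: int
    mtu: int   # illustrative value only, not the IBA wire encoding
    rate: int  # illustrative value only, not the IBA wire encoding


@dataclass(frozen=True)
class PathRequest:
    """Hypothetical view of the fields a client supplies in its query."""
    source: str
    destination: str
    pkey: int
    service_id: Optional[int] = None
    tclass: Optional[int] = None


@dataclass(frozen=True)
class Rule:
    """One policy rule; a field left as None acts as a wildcard."""
    level: QosLevel
    source: Optional[str] = None
    destination: Optional[str] = None
    service_id: Optional[int] = None
    tclass: Optional[int] = None

    def matches(self, req: PathRequest) -> bool:
        pairs = (
            (self.source, req.source),
            (self.destination, req.destination),
            (self.service_id, req.service_id),
            (self.tclass, req.tclass),
        )
        return all(want is None or want == got for want, got in pairs)


def resolve_qos_level(req: PathRequest,
                      rules: List[Rule],
                      partition_defaults: Dict[int, QosLevel]) -> QosLevel:
    """Return the level of the first matching rule (assumed ordering),
    falling back to the default QoS-Level of the request's partition."""
    for rule in rules:
        if rule.matches(req):
            return rule.level
    return partition_defaults[req.pkey]
```

With one rule keyed on a (made-up) Service-ID value, a request carrying that value gets the rule's level, while any other request on the same partition falls back to the partition default.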
[openib-general] QoS RFC - Resend using a friendly mailer
To: OPENIB
Subject: QoS RFC - Resend using a friendly mailer
--text follows this line--
Hi All

Please find the attached RFC describing how QoS policy support could be implemented in the OpenFabrics stack.
Your comments are welcome.

Eitan

RFC: OpenFabrics Enhancements for QoS Support
===

Authors: . Eitan Zahavi <[EMAIL PROTECTED]>
Date: May 2006.
Revision: 0.1

Table of contents:
1. Overview
2. Architecture
3. Supported Policy
4. CMA functionality
5. IPoIB functionality
6. SDP functionality
7. SRP functionality
8. iSER functionality
9. OpenSM functionality

1. Overview

Quality of Service requirements stem from the realization of I/O consolidation over an IB network: as multiple applications and ULPs share the same fabric, means to control their use of the network resources are becoming a must. The basic need is to differentiate the service levels provided to different traffic flows, such that a policy could be enforced to control each flow's utilization of the fabric resources.

The IBTA specification defines several hardware features and management interfaces to support QoS:
* Up to 15 Virtual Lanes (VL) can carry traffic in a non-blocking manner
* Arbitration between traffic of different VLs is performed by a weighted round robin arbiter with 2 priority levels. The arbiter is programmable with a sequence of (VL, weight) pairs and a maximal number of high priority credits to be processed before low priority is served
* Packets carry a class of service marking in the range 0 to 15 in their header SL field
* Each switch can map an incoming packet by its SL to a particular output VL based on a programmable table VL=SL-to-VL-MAP(in-port, out-port, SL)
* The Subnet Administrator controls each communication flow's parameters by providing them as a response to a Path Record query

The IB QoS features provide the means to implement a DiffServ like architecture. DiffServ architecture (IETF RFC2474 2475) is widely used today in highly dynamic fabrics.
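As a rough illustration of the VL=SL-to-VL-MAP(in-port, out-port, SL) table listed in the overview, a per-switch lookup could be modeled as below. This is a minimal sketch, not OpenSM code: the class and method names are hypothetical, and returning a default VL for port pairs that were never programmed is an assumption made only to keep the example short.

```python
NUM_SLS = 16  # the SL field carries values 0 to 15


class Sl2VlTable:
    """Hypothetical model of one switch's SL-to-VL mapping tables."""

    def __init__(self, num_ports: int, default_vl: int = 0):
        self.num_ports = num_ports
        self.default_vl = default_vl  # assumption: used when no map was set
        self._maps = {}               # (in_port, out_port) -> list of 16 VLs

    def _check_port(self, port: int) -> None:
        if not 0 <= port <= self.num_ports:
            raise ValueError(f"port {port} out of range")

    def set_map(self, in_port: int, out_port: int, vls) -> None:
        """Program the table for one (in-port, out-port) pair: one VL per SL."""
        self._check_port(in_port)
        self._check_port(out_port)
        vls = list(vls)
        if len(vls) != NUM_SLS:
            raise ValueError("expected exactly one VL per SL (16 entries)")
        self._maps[(in_port, out_port)] = vls

    def lookup(self, in_port: int, out_port: int, sl: int) -> int:
        """Return the output VL for a packet with the given SL."""
        if not 0 <= sl < NUM_SLS:
            raise ValueError(f"SL {sl} out of range")
        table = self._maps.get((in_port, out_port))
        return self.default_vl if table is None else table[sl]
```

For example, programming the pair (1, 2) with `[sl % 4 for sl in range(16)]` spreads the 16 SLs over 4 VLs, so `lookup(1, 2, 5)` yields VL 1.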
This proposal provides the detailed functional definition for the various software elements that are required to enable a DiffServ like architecture over the OpenFabrics software stack.

2. Architecture

This proposal splits the QoS functionality between the SM/SA, CMA and the various ULPs. We take the "chronology approach" to describe how the overall system works:

2.1. The network manager (human) provides a set of rules (policy) that defines how the network is configured and how its resources are split among different QoS-Levels. The policy also defines how to decide which QoS-Level each application, ULP or service uses.

2.2. The SM analyzes the provided policy to see if it is realizable and performs the necessary fabric setup. The SM may continuously monitor the policy and adapt to changes in it. Part of this policy defines the default QoS-Level of each partition. The SA is being enhanced to match the requested Source, Destination, TClass, Service-ID (and optionally SL and priority) against the policy, so that clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also enhanced to support setting up partitions with an appropriate IPoIB broadcast group. This broadcast group carries its QoS attributes: TClass, SL, MTU and RATE.

2.3. IPoIB is being set up. IPoIB uses the SL, MTU and RATE available on the multicast group which forms the broadcast group of this partition.

2.4. MPI, which provides non IB based connection management, should be configured to run using hard coded SLs. It uses these SLs in every QP being opened.

2.5. ULPs that use the CM interface (like SRP) should have their own pre-assigned Service-ID and use it while obtaining a PathRecord for establishing their connections. The SA receiving the PathRecord request should match it against the policy and return the appropriate PathRecord including SL, MTU, RATE and TClass.

2.6. ULPs and programs using CMA to establish an RC connection should provide the CMA the target IP and Service-ID.
Some of the ULPs might also provide TClass (e.g. for SDP sockets that are provided the TOS socket option). The CMA should then use the provided Service-ID and optional TClass and pass them in the PathRecord request. The resulting PathRecord should be used for configuring the connection QP.

PathRecord and MultiPathRecord enhancement for QoS:
As mentioned above, the PathRecord and MultiPathRecord attributes should be enhanced to carry the Service-ID, which is a 64-bit value. Given the existing definition of these attributes, we propose to use the following fields for Service-ID:
* For PathRecord: use the first 2 reserved fields, which are 32 bits each (component masks 0x1 and 0x2). Component mask 1 should be used to refer to the merged Service-ID field
* For MultiPathRecord: use 2 reserved fields:
  1. after the packet life (8 bits) which is component mask bit 0x1 (17)