from:"Kanevsky, Arkady"

Re: [openib-general] dapl broken for iWARP

2007-02-09 Thread Kanevsky, Arkady


Steve,
what is the issue with using
max_qp_rd_atom and max_qp_init_rd_atom
besides the bad names?
Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Steve Wise [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 08, 2007 6:11 PM
 To: Arlin Davis
 Cc: openib-general
 Subject: Re: [openib-general] dapl broken for iWARP
 
 On Wed, 2007-02-07 at 15:57 -0600, Steve Wise wrote:
  On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:
   Arlin,
   
   The OFED dapl code is assuming the responder_resources and 
   initiator_depth passed up on a connection request event 
 are from the 
   remote peer.  This doesn't happen for iWARP.  In the 
 current iWARP 
   specifications, it's up to the application to exchange this 
   information somehow. So these are defaulting to 0 on the 
 server side 
   of any dapl connection over iWARP.
   
   This is a fairly recent change, I think.  We need to come up with 
   some way to deal with this for OFED 1.2 IMO.
   
  
  The IWCM could set these to the device max values for instance.
  
  Steve.
  
 
 There is a slight problem with all this.  There are no device 
 attributes currently for ORD and IRD.  The ammasso driver 
 maps these to max_qp_rd_atom (IRD) and 
 max_qp_init_rd_atom(ORD).  But this is screwy.
 We need new attributes for these.
 
 For OFED 1.2, I think I should just have the IWCM set them to 
 8.  The only RNIC in ofed is cxgb3 and it supports 8...
 
 
 Steve.
 
 
 ___
 openib-general mailing list
 openib-general@openib.org
 http://openib.org/mailman/listinfo/openib-general
 
 To unsubscribe, please visit 
 http://openib.org/mailman/listinfo/openib-general
 


Re: [openib-general] dapl broken for iWARP

2007-02-08 Thread Kanevsky, Arkady

That is correct.
I am working with Krishna on it.
Expect patches soon.

By the way, the problem is not DAPL specific,
and neither is the proposed solution.

There are 3 aspects to the solution.
One is the APIs. We suggest that we do not augment these.
That is, a connection requestor sets its QP
RDMA ORD and IRD.
When the connection is established, the user can check the QP RDMA ORD and IRD
to see what is now available to use over the connection.
We may consider extending QP attributes to support passing
transport-specific parameters in the future,
for example, an iWARP MPA CRC request.

Second is the semantic that the CM provides.
The proposal is to match the IBCM semantic.
That is, the CM guarantees that the local IRD is >= the remote ORD.
This guarantees that incoming RDMA Read requests will not overwhelm
the QP RDMA Read capabilities.
Again, there are no changes to IBCM, only to IWCM.
Notice that as part of this the IWCM will pass the needed info down to
the driver and extract it from the driver.

The final part is an iWARP CM extension to exchange RDMA ORD and IRD.
This is similar to the IBTA Annex for IP Addressing.
The harder part is that this will eventually require an IETF MPA spec
extension, and that the MPA protocol is implemented in RNIC HW by many
vendors and hence cannot be done by the IWCM itself.
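
The CM semantic described above (local IRD >= remote ORD) can be sketched in C as follows. This is only an illustration under assumptions: `struct rd_params`, `responder_accept` and `initiator_finish` are hypothetical names, not part of IWCM, IBCM or any driver, and how the values travel between the peers (MPA private data or otherwise) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for RDMA Read limits; not a kernel structure. */
struct rd_params {
	uint8_t ord;	/* outbound RDMA Reads this side may have in flight */
	uint8_t ird;	/* inbound RDMA Reads this side can absorb */
};

static uint8_t min_u8(uint8_t a, uint8_t b)
{
	return a < b ? a : b;
}

/*
 * Responder side: 'peer' holds the values carried in the connection
 * request, 'dev_max' the local device limits.  The result is what the
 * responder would program into its QP and send back in its reply.
 */
static struct rd_params responder_accept(struct rd_params peer,
					 struct rd_params dev_max)
{
	struct rd_params local;

	local.ird = min_u8(peer.ord, dev_max.ird);
	local.ord = min_u8(peer.ird, dev_max.ord);
	return local;
}

/*
 * Initiator side: clamp the originally requested values against the
 * responder's reply so that neither side issues more RDMA Reads than
 * the other side can absorb.
 */
static struct rd_params initiator_finish(struct rd_params requested,
					 struct rd_params reply)
{
	struct rd_params local;

	local.ord = min_u8(requested.ord, reply.ird);
	local.ird = min_u8(requested.ird, reply.ord);
	assert(reply.ird >= local.ord);	/* responder IRD >= initiator ORD */
	return local;
}
```

With a requested ORD/IRD of 16/4 and device maxima of 8/8 on the responder, both sides end up with the pair (8, 4) mirrored, so each IRD covers the peer's ORD.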

Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Steve Wise [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, February 07, 2007 6:12 PM
 To: Arlin Davis
 Cc: openib-general
 Subject: Re: [openib-general] dapl broken for iWARP
 
 On Wed, 2007-02-07 at 15:05 -0800, Arlin Davis wrote:
  Steve Wise wrote:
  
  On Wed, 2007-02-07 at 14:02 -0600, Steve Wise wrote:

  
  Arlin,
  
  The OFED dapl code is assuming the responder_resources and 
  initiator_depth passed up on a connection request event 
 are from the 
  remote peer.  This doesn't happen for iWARP.  In the 
 current iWARP 
  specifications, it's up to the application to exchange this 
  information somehow. So these are defaulting to 0 on the 
 server side 
  of any dapl connection over iWARP.
  
  This is a fairly recent change, I think.  We need to come up with 
  some way to deal with this for OFED 1.2 IMO.
  
  
  Yes, this was changed recently to sync up with the rdma_cm changes 
  that exposed the values.
  
  
  
  
  The IWCM could set these to the device max values for instance.

  
  That would work fine as long as you know the remote 
 settings will be 
  equal or better. The provider just sets the min of local device max 
  values and the remote values provided with the request.
  
 
 I know Krishna Kumar is working on a solution for exchanging 
 this info in private data so the IWCM can do the right 
 thing.  Stay tuned for a patch series to review for this.  
 But this functionality is definitely post OFED-1.2.  
 
 
 So for the OFED-1.2, I will set these to the device max in the IWCM.
 Assuming the other side is OFED 1.2 DAPL, then it will work fine.
 
 Steve.
 
 
 

Re: [openib-general] SVN deprication

2007-01-17 Thread Kanevsky, Arkady


Thanks Jeff.
This works.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Jeff Squyres [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, January 17, 2007 3:30 PM
 To: Kanevsky, Arkady
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] SVN deprication
 
 SVN is still available, but it is at a new URL:
 
  https://svn.openfabrics.org/svn/openib.
 
 All the history and everything should be there; let me know 
 if you have any problems.
 
 
 On Jan 17, 2007, at 3:11 PM, Arkady Kanevsky wrote:
 
  Jeff and Co,
  Is there a way to find out the date of a specific SVN revision #?
  I can no longer access svn:
  svn info -r 5400 https://openfabric.org/svn
  svn: PROPFIND request failed on '/svn'
  svn: PROPFIND of '/svn': could not connect to server (https://openfabric.org)
 
  Is the SVN server deprecated for good?
  Do we have an SVN log somewhere in a git?
  If yes, how can I find the correlation between Linux 
 version and SVN 
  revision?
  Thanks,
  Arkady
 
 
 
 -- 
 Jeff Squyres
 Server Virtualization Business Unit
 Cisco Systems
 


Re: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17

2006-10-16 Thread Kanevsky, Arkady




Bill,
2 small changes to the diagram on slide 6.
SRP box should be yellow since it is IB specific.
Drop the word "R-NIC" from the User APIs box.

I think we can improve this diagram message.
Both kernel and user API boxes for "verbs/API" should be
non-colored "common".
Thanks,
Arkady






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300


  
  
  From: Bill Boas [mailto:[EMAIL PROTECTED]
  Sent: Sunday, October 15, 2006 5:03 PM
  To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
  Cc: 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 'Kyril Faenov'; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 'Jeffrey Scott'; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; Kianoosh Naghshineh; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
  [EMAIL PROTECTED]
  Subject: [openib-general] OpenFabrics Developer Summit at SC06, Tampa Nov 16 - 17
  
  
  To all in the OpenFabrics 
  Community
  
  We will be holding our 
  first Developer Summit in the Tampa Convention 
  Center courtesy of SC06 starting at 1.30PM in Room 
  17 on Thursday November 16, 2006. On Friday November 17, we will start in Room 
  13 at 8.00 AM and continue till 5.00PM. We have had to schedule into these 
  time slots because no other usable space is available at any other times 
  during the week of SC06!
  
  OpenFabrics will cater 
  food and beverages for afternoon break and supper on Thursday, breakfast, 
  lunch and two breaks on Friday. We will set up a registration site at Acteva 
  to collect $$ to cover our out of pocket expenses. I'll email out the URL for 
  that site in the next day or two.
  
  Please review attached 
  Strawman purposes, suggested attendees and agenda. Any changes or comments, 
  please email them to the community for all to comment on please. 
  
  
  The Summit has several dimensions and themes throughout our work there:
  1) - consistency and robustness of the Linux and Windows software stacks for 
  Release 2.0 of OpenFabrics;
  2) - feature selection, development resources and timelines for Release 2.0;
  3) - activities, features and processes of the Enterprise Working Group on OFED 1.x 
  until Release 2.0 is ready for hand-off to the EWG;
  4) - enhancing the resources of the EWG to be ready for 2.0 so that it may 
  subsequently be distributed as OFED 2.0 and adopted by the OpenFabrics vendor 
  and customer communities for production use.
  
  This is far too much 
  work for just a day and a half! PLEASE START NOW exchanging ideas for additional 
  features, contact peer engineers from companies and customers to discuss work 
  sizing, development resources, identify volunteer developers for items so that 
  when we meet on the 16th we're not starting from a blank sheet!
  
  Sujal Das, Johann George, 
  Matt Leininger, Pramod Srivatsa, 
  Hal Rosenstock, Tom Tucker and Bob Woodruff are leading the pre-meeting, 
  STRAWMAN collation of requirements, feature prioritization, developer 
  assignments, sizing and processes so that we have the list largely complete 
  prior to the meeting and people know who has already volunteered for items from 
  the list.
  
  Bill 
  Boas
  VP, Business Development | 
  System Fabric Works
  [EMAIL PROTECTED] | 
  510-375-8840
  

Re: [openib-general] posting send requests in RTR

2006-07-28 Thread Kanevsky, Arkady

If a QP is not in the RTS state, then a posted Send should
be flushed to the CQ for IB.
This behavior needs to be preserved so that a ULP can ensure that
Sends posted with Completion Suppression have been completed.

Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, July 27, 2006 6:19 PM
 To: Sean Hefty; Rimmer, Todd; Michael S. Tsirkin
 Cc: Or Gerlitz; Roland Dreier; openib-general@openib.org
 Subject: Re: [openib-general] posting send requests in RTR
 
 Sean Hefty wrote:
 
  
  Alternately, it would be reasonable to simply document 
 that a receive 
  completion *implied* a connection established event, and therefore 
  the application could post to the send queue after it reaped a 
  receive completion (or got a connection established event).
  
  The problem is that the QP is not in the RTS state, so 
 cannot accept 
  sends.
  
 
 Well, I suppose if your adapter can be in a state where it 
 has completed a receive work request for a connection but is 
 not yet convinced that that connection is established then it 
 would have to queue those work completions somewhere.
 
 If that is all you are proposing then I have no objections, 
 an iWARP adapter can never be in such a state.
 
 But I am curious as to why completing a receive work request 
 does not place the QP in the RTS state since the end-to-end 
 QP pairing has obviously been confirmed, and therefore the QP 
 can send.
 
 
 
 
 
 

[openib-general] mthca_reset question

2006-04-27 Thread Kanevsky, Arkady

Here is an extract from the mthca_reset.c 
/*
 * Reset the chip.  This is somewhat ugly because we have to
 * save off the PCI header before reset and then restore it
 * after the chip reboots.  We skip config space offsets 22
 * and 23 since those have a special meaning.
 *
 * To make matters worse, for Tavor (PCI-X HCA) we have to
 * find the associated bridge device and save off its PCI
 * header as well.
 */

	if (!(mdev->mthca_flags & MTHCA_FLAG_PCIE)) {
		/* Look for the bridge -- its device ID will be 2 more
		   than HCA's device ID. */
		while ((bridge = pci_get_device(mdev->pdev->vendor,
						mdev->pdev->device + 2,
						bridge)) != NULL) {
			if (bridge->hdr_type == PCI_HEADER_TYPE_BRIDGE &&
			    bridge->subordinate == mdev->pdev->bus) {
				mthca_dbg(mdev, "Found bridge: %s\n",
					  pci_name(bridge));
				break;
			}
		}

First,
Why do we check for not PCIE instead of PCIX?
Second, why while instead of if?

Most interesting, third,
Why is the bridge device ID 2 more than the HCA device ID?
What does this hack rely/depend on?
Can we find the device's parent, which should be a bridge, instead?
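
The third question can be illustrated with a small user-space model. The structures below are simplified stand-ins invented for this sketch (they are not the kernel's `struct pci_dev` or `struct pci_bus`); in the Linux PCI core the analogous link is the `self` pointer of a device's `struct pci_bus`, which refers to the bridge leading to that bus. Whether that link can be relied on at this point in mthca's reset path is an assumption that would need checking.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_TYPE_NORMAL 0
#define HDR_TYPE_BRIDGE 1

struct model_bus;

/* Simplified stand-in for a PCI device. */
struct model_dev {
	unsigned short vendor;
	unsigned short device;
	int hdr_type;
	struct model_bus *bus;		/* bus the device sits on */
	struct model_bus *subordinate;	/* bus behind a bridge, else NULL */
};

/* Simplified stand-in for a PCI bus. */
struct model_bus {
	struct model_dev *self;		/* bridge leading to this bus, or NULL */
};

/* Approach in the extract: match vendor and device ID + 2, then verify. */
static struct model_dev *find_bridge_by_id(struct model_dev *devs, size_t n,
					   const struct model_dev *hca)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].vendor == hca->vendor &&
		    devs[i].device == hca->device + 2 &&
		    devs[i].hdr_type == HDR_TYPE_BRIDGE &&
		    devs[i].subordinate == hca->bus)
			return &devs[i];
	}
	return NULL;
}

/* Alternative asked about: follow the parent link from the HCA's bus. */
static struct model_dev *find_bridge_by_parent(const struct model_dev *hca)
{
	struct model_dev *parent;

	assert(hca != NULL);
	parent = hca->bus ? hca->bus->self : NULL;
	if (parent && parent->hdr_type == HDR_TYPE_BRIDGE)
		return parent;
	return NULL;
}
```

In this model both lookups return the same bridge, but the parent-link version does not depend on any relationship between device IDs.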

Thanks,
Arkady


Arkady Kanevsky email: [EMAIL PROTECTED]
Network Appliance Inc.  phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300

[openib-general] cache.c

2006-02-14 Thread Kanevsky, Arkady




Roland,
in core/cache.c

should

	device->cache.gid_cache =
		kmalloc(sizeof *device->cache.pkey_cache *
			(end_port(device) - start_port(device) + 1), GFP_KERNEL);

be

	device->cache.gid_cache =
		kmalloc(sizeof *device->cache.gid_cache *
			(end_port(device) - start_port(device) + 1), GFP_KERNEL);
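
One way to avoid this kind of mismatch is to derive the element size from the pointer being assigned, so the two names cannot drift apart. A minimal user-space sketch, assuming a hypothetical `ALLOC_ARRAY` macro and stand-in types (nothing here is taken from core/cache.c, and `calloc` stands in for `kmalloc`):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: the element size always comes from 'ptr' itself. */
#define ALLOC_ARRAY(ptr, n) ((ptr) = calloc((n), sizeof *(ptr)))

/* Stand-in types for this sketch only. */
struct pkey_table { unsigned short entries[4]; };
struct gid_table  { unsigned char  raw[64]; };

struct cache_model {
	struct pkey_table *pkey_cache;
	struct gid_table  *gid_cache;
};

/* Allocate one table slot per port in [start_port, end_port]. */
static int cache_model_init(struct cache_model *c, int start_port, int end_port)
{
	size_t nports;

	assert(c != NULL && end_port >= start_port);
	nports = (size_t)(end_port - start_port + 1);

	ALLOC_ARRAY(c->pkey_cache, nports);
	ALLOC_ARRAY(c->gid_cache, nports);
	if (!c->pkey_cache || !c->gid_cache) {
		/* free(NULL) is a no-op, so partial failure is handled too */
		free(c->pkey_cache);
		free(c->gid_cache);
		c->pkey_cache = NULL;
		c->gid_cache = NULL;
		return -1;
	}
	return 0;
}
```

Because the macro never names the element type, copying the allocation line for a second cache cannot leave a stale `sizeof` operand behind.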



Arkady





Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady

Why both Immediate Data and the STag which was used for the RDMA Write?
Immediate Data already identifies the operation in response to which
the RDMA Write has completed locally.

The STag would make sense if STag invalidation were also put in the mix.

But for MPI, RMR_contexts have a long lifecycle, so it is not clear which
apps would be interested in combining Invalidation with RDMA Write with
Immediate Data.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 3:03 PM
 To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin 
 Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  Caitlin Bestler wrote:
  
  Arlin Davis wrote:
  Sean Hefty wrote:
  
  The requirement is to provide an API that supports RDMA writes 
  with immediate data.  A send that follows an RDMA write is not 
  immediate data, and the API should not be constructed around 
  trying to make it so.
  
  
  
  To be clear, I believe that write with immediate should 
 be part of 
  the normal APIs, rather than an extension, but should be 
 designed 
  around those devices that provide it natively.
  
  
  I totally agree. A standard RDMA write with immediate API can be 
  very useful to RDMA applications based on the requirements (native
  support) set forth in my earlier email. It is analogous to the new
  dat_ep_post_send_with_invalidate() call; a call that supports a 
  native iWARP transport operation but provides no 
 provisions to help 
  other transports emulate. So, other transports simply return 
  NOT_SUPPORTED and add it natively in the future if it makes sense.
  
  -arlin
  
  What is proposed in a definition of
  'dat_ep_post_rdma_write_with_immediate'
  that can be implemented over iWARP using the sequence of messages 
  that were intended to support the same purpose (i.e., letting the 
  other side know that an RDMA Write transfer has been fully 
 received).
  
  No, iWARP *CAN NOT* implement write immediate data any 
 better than IB 
  can implement send with invalidate.  Immediate data
  *MUST* be indicated to the ULP unambiguously.  Imposing an 
 algorithm 
  on the application to infer immediate data arrival is a hack, 
 pure and 
  simple. An application is free to perform a write/send if 
 that is the 
  semantic they want.  Why does iWARP get transport unique 
 APIs but not 
  IB?  I find this attempt to bastardize the IB semantic of immediate 
  data a little curious.
  
 
 The transports aren't getting anything. Features are there 
 for applications, especially when the feature can be defined 
 in a way that makes sense without explaining transport mechanics.
 
 Completing a transaction, complete with supplying a 
 transaction response and releasing the advertised STag 
 associated with the transaction is something that makes sense 
 in the application domain and conforms to normal DAT ordering rules.
 
 Provide information about an RDMA Write to a receive operation
 also meets that definition -- as long as it conforms to the 
 existing ordering rules. Shifting to an 8 byte message over 
 iWARP to allow for the write length *and* immediate 'tag'
 is certainly doable. We could even consider having the DAT 
 Provider supply the 'buffer' silently in the DTO itself.
 
 With that definition the consumer would get a receive 
 completion that told them that their peer's RDMA Write had 
 been successfully placed, how long it is (the length) and 
 which one (a tag).
 
 I think that is of value. iWARP can implement it as two work 
 requests and maintain the overall semantics.
 
 Are you arguing that iWARP should NOT provide this service 
 until it can do it in a single work request? It seems to me 
 that allowing an extra work request and completion is a 
 fairly simple accommodation as opposed to using an alternate 
 algorithm in the main transaction processing of the application.
 
 If we enable the application to query how a remote write with 
 immediate will complete outside of the transaction loop then 
 we can allow the application to have *no* overhead inside the 
 main transaction loop, and *identical* logic on the sending side.
 
 And IB *could* implement send with invalidate by simply 
 agreeing on how the RKey to be invalidated is communicated 
 between the IB providers (perhaps as an immediate).
 
 But more to the point, I don't see how the more flexible 
 definition of write with immediate negatively impacts the IB 
 implementation of the feature. IB providers do not need to 
 allow for the extra work requests. They are not being asked 
 to place the immediate data into the receive buffer, or to do 
 any extra work at all.
 
 
 
  

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady

Caitlin,
can you clarify this?
Are you proposing that the Consumer encode a bit in the Immediate Data to
specify that it is immediate data?
iWARP would pass it in a Send message and IB in Immediate Data.
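
One reading of the question above is that one bit of the 32-bit value marks it as immediate data, leaving 31 bits of payload, while an ordinary 4-byte Send message keeps that bit clear. A minimal sketch of such an encoding, with hypothetical names; the choice of bit 31 is an assumption of this example, not something proposed in the thread:

```c
#include <assert.h>
#include <stdint.h>

#define IMM_FLAG    UINT32_C(0x80000000)	/* set: value is immediate data */
#define IMM_PAYLOAD UINT32_C(0x7fffffff)	/* remaining 31 payload bits */

/* Mark a 31-bit value as immediate data. */
static uint32_t encode_immediate(uint32_t value)
{
	assert((value & IMM_FLAG) == 0);
	return value | IMM_FLAG;
}

/* Ordinary 4-byte messages must keep the flag bit clear. */
static uint32_t encode_message(uint32_t value)
{
	assert((value & IMM_FLAG) == 0);
	return value;
}

/* Receiver side: tell the two domains apart. */
static int is_immediate(uint32_t wire)
{
	return (wire & IMM_FLAG) != 0;
}

/* Receiver side: strip the flag and return the 31-bit payload. */
static uint32_t payload_of(uint32_t wire)
{
	return wire & IMM_PAYLOAD;
}
```

The same encoded word could then travel in an IB immediate field or in a 4-byte iWARP Send, and the receiver's check would be identical in both cases.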

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 2:40 PM
 To: Arlin Davis; Roland Dreier
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  Roland Dreier wrote:
  
  
  Hmm.  Can you put a number on how much better RDMA write with 
  immediate is on current HCA hardware?  How does using the 
 underlying 
  OpenIB verbs ability to post a list of work requests compare (ie 
  posting an RDMA write followed by a send in one verbs call)?
  Maybe post multiple is a better direction for DAT.
  
  
  With post multiple, unlike immediate data, you don't have 
 the ability 
  to distinguish between a normal receive and a rdma write completion 
  indication on the other end. This is the uniqueness of the service 
  that cannot be provided by the post multiple. Yes, post 
 multiple would 
  be a nice option for DAT it is just a different service. It 
 would also 
  be required to conform to the semantics rules of the bundled 
  operations so you could not do any optimization tricks under the 
  covers with an IB rdma_write_immediate operation.
  
 
 A post_multiple also requires defining a single DTO data 
 structure. If the post multiple is atomic (meaning all make 
 it or none do) then it requires an intermediate data 
 structure to have been created. If it is not atomic there 
 really isn't reason for it to not just be a utility function 
 layered above DAT.
 
 What I'm not seeing with the immediate is this urgent need by 
 the application to be able to use the same 32-bit value for 
 both an immediate and a 4 byte message that requires an 
 entire additional API just to support it.  Why can't the 
 application just add a bool to the send message?
 Or encode the 32-bits so that they come from disjoint domains?
 
 There seems to be agreement that a consolidated 
 write-and-send call would enable the application to get the 
 benefits of rdma write with immediate whenever the 
 application could distinguish the two.
 
 I cannot see why doing this is almost free for virtually all 
 applications, and trivial for the remainder. Adding and 
 documenting an extra call to deal with such an extreme corner 
 case that is being presented only in the abstract is just not 
 justified. This extra capability has to have enough 
 functionality for enough applications to justify keeping it 
 on the books, writing test cases for it, etc.
 
 We already made a similar decision in having a 128-bit IA 
 Address. That means we cannot support a host that interfaces 
 to the Internet with IPv6 and an InfiniBand network that not 
 only had global GIDs, but allocated a global subnetwork a 
 network id that was already in use as a valid public IPv6 network.
 
 The complexity of dealing with an IA Address that was
 128+1 bits was simply not justified to deal with
 an extreme corner case that could very easily be avoided 
 (there is no shortage of site local network IDs in the 
 IPv6/GID format, so using a global network prefix that was 
 disjoint from the official IPv6 hierarchy would be just plain silly).
 
 So far I haven't seen any explanation as to why an 
 application has a need to encode this 33rd bit of their 
 message in this terribly transport specific matter. Is there 
 some severe performance penalty to slightly restructuring the 
 send message so that it is no longer ambiguous with the 
 immediate data?
 
 
 
  

RE: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady




Mike,
but then the combined operation can just as easily be handled
by a "multiple post operation".
What is the need for a specific transport-independent RDMA
Write with immediate data?

I am still concerned over the need for the Consumer Recv side
to separate the recv of Immediate Data from a "regular" Recv.
The Consumer "knows" what it expects to match the posted Recv.
There is a one-to-one mapping between the non-pure RDMA
transfer ops on one side and the Recvs on the other.
Sure, a ULP may use the same size buffers for all. But how many
ULPs mix Immediate Data size messages (4 bytes on IB) with normal
Sends of the exact same size?

Arkady






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300


  
  
  From: Michael Krause [mailto:[EMAIL PROTECTED]
  Sent: Thursday, February 09, 2006 3:25 PM
  To: Arlin Davis
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: Re: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal

  At 03:36 PM 2/8/2006, Arlin Davis wrote:
   Roland Dreier wrote:
    Michael> So, here we have a long discussion on attempting to
    Michael> perpetuate a concept that is not universal across
    Michael> transports and was deemed to have minimal value that most
    Michael> wanted to see removed from the architecture.

    But this discussion is being driven by an application developer who
    does see value in immediate data.

    Arlin, can you quantify the benefit you see from RDMA write with
    immediate vs. RDMA write followed by a send?

   We need speed and simplicity.

   A very latency sensitive application that requires immediate notification of
   RDMA write completion on the remote node without ANY latency penalties
   associated with combining operations, HCA priority rules across QPs, wire
   congestion, etc. An application that has no requirement for messaging
   outside of remote rdma write completion notifications. The application would
   not have to register and manage additional message buffers on either side,
   we can just size the queues accordingly and post zero byte messages. We need
   something that would be equivalent to sitting there polling on the last byte
   of inbound data. But, since data ordering within an operation is not
   guaranteed that is not an option. So, rdma with immediate data is the most
   optimal and simplistic method for indication of RDMA-write completion that
   we have available today. In fact, I would like to see it increased in size
   to make it even more useful.

  RDMA Write with Immediate is part of the IB Extended Transport Header. It is
  a fixed-sized quantity and not one subject to change, i.e. increasing its size.

  Your argument above reinforces that the particular application need is
  IB-specific and thus should not be part of a general API but a
  transport-specific API. If the application will only operate optimally using
  immediate data, then it is only suitable for an IB fabric. This reinforces
  the need for a transport-specific API.

  Those applications that simply want to enable completion notification when a
  RDMA Write has occurred can use a general purpose API that is interconnect
  independent and whose code is predicated upon a RDMA Write - Send set of
  operations. This will enable application portability across all interconnect
  types.

  Mike


RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady

Roy,
and what if tomorrow iWARP decides to support Immediate Data with variable
length? The API does not change. The semantic does not change, and IB
will not be able to support it.

I am trying to define a semantic and an API which will not have to be
modified for each rev of the transport.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 3:32 PM
 To: [EMAIL PROTECTED]; Arlin Davis; Roland Dreier
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] 
 [RFC]DAT2.0immediatedataproposal
 
  Hmm.  Can you put a number on how much better RDMA write with 
  immediate is on current HCA hardware?  How does using the 
 underlying 
  OpenIB verbs ability to post a list of work requests compare (ie 
  posting an RDMA write followed by a send in one verbs call)?
  Maybe post multiple is a better direction for DAT.
 
 
  With post multiple, unlike immediate data, you don't have 
 the ability 
  to distinguish between a normal receive and a rdma write 
 completion 
  indication on the other end. This is the uniqueness of the service 
  that cannot be provided by the post multiple. Yes, post multiple 
  would be a nice option for DAT it is just a different service. It 
  would also be required to conform to the semantics rules of the 
  bundled operations so you could not do any optimization 
 tricks under 
  the covers with an IB rdma_write_immediate operation.
 
 
 A post_multiple also requires defining a single DTO data 
 structure. 
 If the post multiple is atomic (meaning all make it or none 
 do) then it 
 requires an intermediate data structure to have been 
 created. If it is 
 not atomic there really isn't reason for it to not just be a utility 
 function layered above DAT.
 
 That is a very good point.  And since the emulated immediate 
 data service can't make the atomic guarantee it is the killer 
 argument for just making the service plain - a potentially 
 more efficient write/send.
 
 
 What I'm not seeing with the immediate is this urgent need by the 
 application to be able to use the same 32-bit value for both an 
 immediate and a 4 byte message that requires an entire 
 additional API 
 just to support it.  Why can't the application just add a 
 bool to the 
 send message?
 Or encode the 32-bits so that they come from disjoint domains?
 
 Some applications can do as you suggest.  Some applications 
 can make good use of unambiguous indications where the buffer 
 size, content, or arrival timing is not constrained.  Some 
 don't need write notification at all.  What's your point?
 
 
 There seems to be agreement that a consolidated write-and-send call 
 would enable the application to get the benefits of rdma write with 
 immediate whenever the application could distinguish the two.
 
 Well, I think there is agreement that *some* applications can 
 use write-and-send in a beneficial way.  But then again, 
 nothing prevents them from doing that now.  They do not need 
 an additional API.  But again, I don't have an issue with 
 defining a helper function.  I do have an issue with defining 
 an API and semantic that says the target side needs to be 
 coded in a way to always deal with both true immediate data 
 and emulation.  Just define a write/send helper API and the 
 UPL can be coded in a consistent manner if that is a 
 beneficial service.  If a true unambiguous indication service 
 is more beneficial or required, it can use the extension and 
 accept the extra complexity.  To demand extra complexity in 
 applications that obviously don't need the true immediate 
 data semantic is just wrong in my opinion.
 
 
 I cannot see why doing this is almost free for virtually all 
 applications, and trivial for the remainder. Adding and 
 documenting an 
 extra call to deal with such an extreme corner case that is being 
 presented only in the abstract is just not justified. This extra 
 capability has to have enough functionality for enough 
 applications to 
 justify keeping it on the books, writing test cases for it, etc.
 
 All we're asking is that a write/send combined API not be 
 called immediate data unless it fits the semantics of 
 immediate data.  I am puzzled at the resistance this is 
 getting.  There is a standards body specification for 
 immediate data.  If it is not followed, don't call it 
 immediate data.  It's that simple.  For those transports that 
 can provide the service, the UPL may be able to gain access 
 to it through an extension.
 
 Roy
 ___
 openib-general mailing list
 openib-general@openib.org
 http://openib.org/mailman/listinfo/openib-general
 
 To unsubscribe, please visit

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady

Arlin,
This can be done.

But I have an issue: extension calls violate the Transport Requirements.
Currently, the matching semantic is well-defined since
Recv only matches Send. Since the Spec does not have any idea what
operations are defined in extension(s), there is a problem
with the transport requirements. We can, of course,
make some generic statement that the Spec does not cover APIs
that are defined in extensions.

The API requirements are easier to handle, since they have been
written as Nonrequirements for the APIs we have not defined yet.
(I will need to review chapter 5 to make sure we have followed this
in all cases.)

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Arlin Davis [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 5:57 PM
 To: Michael Krause
 Cc: [EMAIL PROTECTED]; 
 openib-general@openib.org; Kanevsky, Arkady
 Subject: Re: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 Michael Krause wrote:
 
  RDMA Write with Immediate is part of the IB Extended 
 Transport Header.  
  It is a fixed-sized quantity and not one subject to change, i.e. 
  increasing its size.
 
  Your argument above reinforces that the particular 
 application need is 
  IB-specific and thus should not be part of a general API but a
  transport-specific API.   If the application will only operate 
  optimally using immediate data, then it is only suitable for an IB 
  fabric.  This reinforces the need for a transport-specific API.
 
 I agree. I will move the IB immediate data service back into 
 the extension interface and update the OpenIB uDAPL provider patch.
 
 
  Those applications that simply want to enable completion 
 notification 
  when a RDMA Write has occurred can use a general purpose 
 API that is 
  interconnect independent and whose code is predicated upon a RDMA 
  Write - Send set of operations.  This will enable application 
  portability across all interconnect types.
 
 I will defer this to Arkady to draft.
 
 -arlin
 
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Kanevsky, Arkady

One more issue to discuss.
Does Completion of a Recv that matches RDMA Write with Immediate Data
automatically sync local memory, or does the Consumer still need to call
lmr_sync_rdma_write prior to accessing the RDMAed data?

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 7:40 PM
 To: [EMAIL PROTECTED]; Larsen, Roy K; Arlin 
 Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  We have a problem no matter which option we choose.
  The current Transport Level Requirements state:
  
  There is a one-to-one correspondence between send operation on one 
  Endpoint of the Connection and recv operations on the other 
 Endpoint 
  of the Connection.
  There is no correspondence between RDMA operations on one 
 Endpoint of 
  the Connection and recv or send data transfer operation on 
 the other 
  Endpoint of the Connection.
  Receive operations on a Connection must be completed in the 
 order of 
  posting of their corresponding sends.
  
  The Immediate data and Atomic ops violate these 
 requirements including 
  ordering rules.
  
  I had started updating these rules when I generated the 
 first draft of 
  the requirements. They are included in the enclosed pdf file.
  But they do not cover Atomic ops that also impact transport 
  requirements. This chapter of the spec has not been changed since 
  DAPL 1.0 and I am very concerned about any changes to it.
  
  Arkady
  
 
 If RDMA Write with Immediate is viewed as being the 
 equivalent of doing RDMA Write and then an RDMA Send the 
 correspondence rule is maintained. But *only* if the rdma 
 write with immediate
 has all of the semantics of a Send.
 
 Atomics do not violate the rules if you view them as being a 
 variation on an RDMA Read. They are an RDMA Read with modify.
 The real question is whether it makes sense to put it in the 
 RDMA device. It is also not subject to emulation at a higher layer. 
 
 With send with invalidate we know how InfiniBand *will* 
 support it, because of the IB 1.2 verbs. We do not know that 
 for atomics over iWARP. We do not know whether it will be 
 added, more importantly we do not know *how* it would be 
 added if it were added. That makes coming up with a transport 
 neutral definition very premature.
 In particular, if atomics were added to iWARP there is a 
 distinct design option where it would *not* be the same work 
 queue as RDMA Reads (adding atomics through Queue ID 3 would 
 make layering on top of a current implementation much easier). 
 But it would mean that atomic credits would be distinct from 
 read credits. This is a very strong reason to defer 
 attempting to define RDMA Atomics in a transport neutral fashion.
 
  
 
 
 
 
  
 Yahoo! Groups Links
 
 * To visit your group on the web, go to:
 http://groups.yahoo.com/group/dat-discussions/
 
 * To unsubscribe from this group, send an email to:
 [EMAIL PROTECTED]
 
 * Your use of Yahoo! Groups is subject to:
 http://docs.yahoo.com/info/terms/
  
 
 

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady

But each of the multiple work requests follows the semantic of a single
completion per work request. It can be controlled by completion_flags,
but it is still not the semantic of a single post.
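
A minimal sketch of this point, written in Python rather than against the DAT C API: posting several work requests in one call still yields one completion per request unless a flag hides it. The names `WorkRequest`, `post_multiple`, and `suppress_completion` are hypothetical stand-ins for illustration, not DAT or verbs identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkRequest:
    """Hypothetical work request; not a DAT or verbs structure."""
    op: str                            # e.g. "rdma_write" or "send"
    suppress_completion: bool = False  # stand-in for a completion flag


def post_multiple(requests: List[WorkRequest]) -> List[str]:
    """Return the completions the initiator would see, in posting order.

    Each request completes on its own; a flag can only hide a completion,
    it cannot merge two requests into a single post with a single completion.
    """
    return [wr.op for wr in requests if not wr.suppress_completion]


# Two requests posted together still produce two completions by default.
both = post_multiple([WorkRequest("rdma_write"), WorkRequest("send")])

# Suppressing the write leaves one visible completion, but two requests
# were still posted and both still occupy send-queue slots.
one = post_multiple([WorkRequest("rdma_write", suppress_completion=True),
                     WorkRequest("send")])
```

Even in the second case the initiator has posted two requests, which is why this is not the same service as a single post.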

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 10:39 AM
 To: 'Caitlin Bestler'; Kanevsky, Arkady; Larsen, Roy K; 
 [EMAIL PROTECTED]; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 
 2.0immediatedataproposal
 
  And further it is only on the receiving side.
  And only if the receiving side cares about the data
  (sometimes it only needs the notification).
 
 The send side cares about this check because it must size its 
 SQ appropriately.
 I disagree with the assumption that a transport neutral API 
 is inherently easier for the application developer.
 
 The attempt is to define a composite work request that can 
 reduce the 
 number of actual work requests required for some providers, without 
 requiring different work flows dependent on whether the immediate 
 feature was present.
 
 This is exactly what Roy was pointing out.  This is no longer 
 defining a write with immediate data, but instead addressing 
 some other requirement.  In this case, you can define a 
 generic send side API that takes multiple work requests as 
 input, since a provider may be able to reduce the actual 
 number of work requests in this case as well.
 
 - Sean
 

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady

All 3 options (proposed APIs, extensions, or an IB semantic API)
provide the same performance benefit on IB.
But the last option is the easiest to use.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 11:12 AM
 To: Kanevsky, Arkady; Caitlin Bestler; Larsen, Roy K; 
 [EMAIL PROTECTED]; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 
 2.0immediatedataproposal
 
 Why would any Consumer hook itself on proprietary features 
 and APIs 
 is a different question.
 
 Because it provides a real performance benefit.  This is the 
 same reason apps code to DAPL versus standard sockets.
 
 - Sean
 
 

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady

IB does optionally support send_with_invalidate, as defined in the IBTA 1.2
spec.
OpenIB does not support this yet, but that is a different matter.
So this is a bad analogy.

A better analogy is socket-based CM.

But I am still not clear on what you are advocating:
extensions, an IB-specific API, or something else.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 2:46 PM
 To: [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
 Cc: Kanevsky, Arkady; Sean Hefty; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 Caitlin Bestler wrote:
 
 Arlin Davis wrote:
  Sean Hefty wrote:
 
  The requirement is to provide an API that supports RDMA 
 writes with 
  immediate data.  A send that follows an RDMA write is 
 not immediate 
  data, and the API should not be constructed around 
 trying to make 
  it so.
 
 
 
  To be clear, I believe that write with immediate should 
 be part of 
  the normal APIs, rather than an extension, but should be designed 
  around those devices that provide it natively.
 
 
  I totally agree. A standard RDMA write with immediate API 
 can be very 
  useful to RDMA applications based on the requirements (native 
  support) set forth in my earlier email. It is analogous to the new 
  dat_ep_post_send_with_invalidate() call; a call that supports a 
  native iWARP transport operation but provides no 
 provisions to help 
  other transports emulate. So, other transports simply return 
  NOT_SUPPORTED and add it natively in the future if it makes sense.
 
  -arlin
 
 What is proposed in a definition of
 'dat_ep_post_rdma_write_with_immediate'
 that can be implemented over iWARP using the sequence of 
 messages that 
 were intended to support the same purpose (i.e., letting the 
 other side 
 know that an RDMA Write transfer has been fully received).
 
 No, iWARP *CAN NOT* implement write immediate data any better 
 than IB can implement send with invalidate.  Immediate data 
 *MUST* be indicated to the ULP unambiguously.  Imposing an 
 algorithm on the application to infer immediate data arrival 
 is a hack, pure and simple. An application is free to perform a 
 write/send if that is the semantic they want.  Why does iWARP 
 get transport unique APIs but not IB?  I find this attempt to 
 bastardize the IB semantic of immediate data a little curious.
 
 Roy
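
One way to read the two positions quoted above together: true immediate data is a transport-specific operation that other transports reject as not supported, and an application that only wants a write followed by a send issues those two operations itself. The Python sketch below illustrates that split under stated assumptions. `Provider`, `NotSupported`, and `notify_write` are hypothetical names, not DAT calls, and the exception stands in for a NOT_SUPPORTED return status.

```python
class NotSupported(Exception):
    """Stand-in for a provider returning a NOT_SUPPORTED status."""


class Provider:
    """Hypothetical provider; method names are illustrative only."""

    def __init__(self, native_immediate: bool):
        self.native_immediate = native_immediate
        self.wire = []  # operations actually issued, in order

    def rdma_write(self, payload: bytes) -> None:
        self.wire.append(("rdma_write", payload))

    def send(self, payload: bytes) -> None:
        self.wire.append(("send", payload))

    def rdma_write_with_immediate(self, payload: bytes, imm: int) -> None:
        if not self.native_immediate:
            raise NotSupported("no native write with immediate")
        self.wire.append(("rdma_write_imm", payload, imm))


def notify_write(provider: Provider, payload: bytes, imm: int) -> str:
    """ULP-level choice: use the native operation or an explicit write/send."""
    try:
        provider.rdma_write_with_immediate(payload, imm)
        return "native"
    except NotSupported:
        # The application, not the provider, decides to use two operations,
        # so the target knows it will see an ordinary receive.
        provider.rdma_write(payload)
        provider.send(imm.to_bytes(4, "big"))
        return "write+send"
```

In this arrangement the target side is coded for whichever path the ULP chose, rather than having to handle both true immediate data and an emulation behind one API.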
 

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady

We have a problem no matter which option we choose.
The current Transport Level Requirements state:

There is a one-to-one correspondence between send operation on one
Endpoint of the Connection and recv operations on the other Endpoint of
the Connection.
There is no correspondence between RDMA operations on one Endpoint of
the Connection and recv or send data transfer operation on the other
Endpoint of the Connection.
Receive operations on a Connection must be completed in the order of
posting of their corresponding sends.

The Immediate data and Atomic ops violate these requirements, including
the ordering rules.
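
The three rules and the way a write with immediate data cuts across them can be shown with a toy model. This is a sketch in Python under stated assumptions: the `Connection` class and its method names are invented for illustration and are not DAT objects, and the model covers only one direction of a connection.

```python
from collections import deque


class Connection:
    """Toy model of one direction of a connection; not a DAT object."""

    def __init__(self):
        self.posted_recvs = deque()   # receive buffers in posting order
        self.recv_completions = []    # (recv_id, payload) in completion order

    def post_recv(self, recv_id: str) -> None:
        self.posted_recvs.append(recv_id)

    def send(self, payload: bytes) -> None:
        # Rules 1 and 3: each send consumes exactly one receive, and
        # receives complete in the order their sends were posted.
        recv_id = self.posted_recvs.popleft()
        self.recv_completions.append((recv_id, payload))

    def rdma_write(self, payload: bytes) -> None:
        # Rule 2: a plain RDMA write matches no receive on the other side.
        pass

    def rdma_write_with_immediate(self, payload: bytes, imm: bytes) -> None:
        # The proposed operation cuts across rule 2: an RDMA operation
        # now consumes a receive on the other Endpoint.
        recv_id = self.posted_recvs.popleft()
        self.recv_completions.append((recv_id, imm))


conn = Connection()
for name in ("r1", "r2", "r3"):
    conn.post_recv(name)

conn.send(b"hello")
conn.rdma_write(b"bulk data")              # consumes nothing
conn.rdma_write_with_immediate(b"bulk", b"\x00\x00\x00\x01")
conn.send(b"bye")
```

In this model the second receive is consumed by an RDMA operation, not a send, which is exactly the correspondence the current text rules out.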

I had started updating these rules when I generated the first draft of
the requirements. They are included in the enclosed pdf file.
But they do not cover Atomic ops, which also impact transport
requirements.
This chapter of the spec has not been changed since DAPL 1.0
and I am very concerned about any changes to it.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 6:57 PM
 To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin 
 Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
 
  
  I was under the assumption that the DAT community defined 
 the APIs and 
  semantics through an open process.  Given that the IB write 
 immediate 
  data facility does not break the implementation or semantics of the 
  currently defined RDMA write facility, I see no reason the 
 DAPL spec 
  couldn't be updated, through consensus, with the realities 
 of existing 
  transport services.  Nevertheless, I presume you'll have no 
 objection 
  to implementing this useful service as a DAPL extension since the 
  semantic rules for extensions haven't been defined yet.
  
  Roy
 
 That is correct, because as an extension the user would not 
 expect normal semantics to still be guaranteed.
 
 
 
 
 
  
  
 
 


transport_req_020706.pdf
Description: transport_req_020706.pdf

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate dataproposal

2006-02-06 Thread Kanevsky, Arkady

Arlin,
On Friday we agreed that the receiver cannot distinguish
between 4 bytes of Send and 4 bytes of Immediate data
if RDMA Write with Immed is implemented as 2 operations:
RDMA Write followed by Send.

The ULP Receiver expects Immediate data; that is why it posts
Recv. Depending on Transport capability it MAY complete
as Recv or as Recv_RDMA_Write_with_Immed_in_event.

Neither Provider nor Consumer can distinguish between the cases
unless there is additional info.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Davis, Arlin R [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 06, 2006 1:25 PM
 To: Kanevsky, Arkady; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediate dataproposal
 
 
 Arkady,
 
 Your requirements are slightly different than the proposed 
 set of requirements. 
 
 iii) DAPL Provider does not provide any identification 
 that the Receive operation matches remote RDMA Write with 
 Immediate data if it completes as Receive DTO. 
 
   - It is up to an ULP to separate Receive completion of remote
 Send from remote RDMA Write with Immediate Data.
 
 Tell me how this is possible? How can the application 
 distinguish between a 4 byte message and a 4 byte immediate 
 data message? We would have to add a new requirement... If 
 the provider supports immediate data in the payload the ULP 
 cannot send a message equal to the immediate
 data size.   
 
 -arlin
 
 -Original Message-
 From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
 Sent: Monday, February 06, 2006 8:08 AM
 To: Sean Hefty; Davis, Arlin R
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 
 2.0 immediate
 dataproposal
 
 Here are the changes to the existing requirements chapters for RDMA 
 Write with Immediate Data.
 
 Feedback please.
 Arkady
 
 Arkady Kanevsky   email: [EMAIL PROTECTED]
 Network Appliance Inc.   phone: 781-768-5395
 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
 Waltham, MA 02451   central phone: 781-768-5300
 
 
  -Original Message-
  From: Sean Hefty [mailto:[EMAIL PROTECTED]
  Sent: Friday, February 03, 2006 7:30 PM
  To: Davis, Arlin R
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 
  immediate dataproposal
 
  Davis, Arlin R wrote:
   Applications need an optimized mechanism to notify the
  receiving end
   that RDMA write data has completed beyond the two 
 operation method 
   currently used (RDMA write followed by message send). 
 This new RDMA 
   write feature will support 4-bytes of inline data that 
 will be sent
 
  Is there any reason to restrict the size of the immediate data?  
  Could you define the API such that the size is variable?  I.e. the 
  provider can simply give the immediate data size, with 0 
 indicating 
  that it is not supported.
 
   It should avoid
   any latency penalties normally associated with a two
  operation method.
 
  I would state this as a requirement.  A write followed by a send 
  should be pushed to the application, since they may be able to 
  provide additional optimizations (such as combining
  operations) beyond what a provider could.
 
   The initiating side must expose a 4-byte immediate data
  parameter for
   the application to set the inline data. The receiving side must 
   provide a mechanism to accept the 4-byte immediate data. On the 
   receiving side, the write with immediate completion 
 notification is 
   indicated through a receive completion. It is the 
 responsibility of 
   the provider to identify to the application 4-byte
  immediate data from
   a normal 4-byte send message. The inline byte ordering is
  application specific.
 
  Requirements look good to me.
 
  - Sean
 
 

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal

2006-02-06 Thread Kanevsky, Arkady

Arlin,
It is too strong to state that the Consumer should never send a message
equal in size to the size of immediate data.
The Consumer knows from the context which one it is.
It may be based on a dedicated connection, or on ULP protocol
ordering.
Arkady 

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Kanevsky, Arkady 
 Sent: Monday, February 06, 2006 2:05 PM
 To: Davis, Arlin R; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 Arlin,
 On Friday we agreed that receiver can not distinguish between 
 4 byte of Send or 4 bytes of Immediate data if RDMA Write 
 with Immed is implemented as 2 operations:
 RDMA Write followed by Send.
 
 ULP Receiver expects Immediate data that is why it posts 
 Recv. Depending on Transport capability it MAY complete as 
 Recv or as Recv_RDMA_Write_with_Immed_in_event.
 
 Neither Provider nor Consumer can distinguish between the 
 cases unless there is additional info.
 
 Arkady
 
 Arkady Kanevsky   email: [EMAIL PROTECTED]
 Network Appliance Inc.   phone: 781-768-5395
 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
 Waltham, MA 02451   central phone: 781-768-5300
  
 

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal

2006-02-06 Thread Kanevsky, Arkady

Roy,
Can you explain, please?

For IB the operation will be layered properly on Transport primitive.
And on Recv side it will indicate in completion event DTO
that it matches RDMA Write with Immediate and that Immediate Data
is in event.

For iWARP I expect that initially it will be layered on RDMA Write
followed by Send. The Provider can do the post more efficiently
than the Consumer and guarantee atomicity.
On the Recv side the Consumer will get a Recv DTO completion in the event
and the Immediate Data inline, as specified by a Provider Attribute.

From the performance point of view, Consumers who program to IB
only will have no performance degradation at all. But this API also
allows Consumers to write a ULP to be transport independent
with minimal penalty: one binary comparison and an extra 4 bytes in the recv
buffer.
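
A rough sketch of what that receive-side penalty could look like, in Python rather than the DAT C API. `RecvCompletion`, `immediate_from`, and the `immediate_in_event` flag are hypothetical names chosen for illustration; the flag models the Provider Attribute that says where the immediate data is delivered, and big-endian byte order is an arbitrary choice since the ordering is application specific.

```python
from dataclasses import dataclass
from typing import Optional

IMMEDIATE_SIZE = 4  # bytes of immediate data discussed in the thread


@dataclass
class RecvCompletion:
    """Hypothetical receive completion; field names are illustrative."""
    buffer: bytes                    # contents of the posted receive buffer
    immediate: Optional[int] = None  # set only when carried in the event


def immediate_from(completion: RecvCompletion, immediate_in_event: bool) -> int:
    """Extract the immediate value on either kind of transport.

    `immediate_in_event` models a provider attribute telling the Consumer
    where the immediate data is delivered.
    """
    if immediate_in_event:            # the single comparison
        return completion.immediate
    # Emulated case: the value arrived as a 4-byte Send payload,
    # so the receive buffer must have room for it.
    return int.from_bytes(completion.buffer[:IMMEDIATE_SIZE], "big")
```

The sketch assumes the Consumer already knows that the completed Recv corresponds to an immediate, which is the open question raised elsewhere in the thread.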

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 06, 2006 2:10 PM
 To: Caitlin Bestler; [EMAIL PROTECTED]; 
 Kanevsky, Arkady; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 If it is up to the ULP to separate out normal receive data 
 from that associated with a write immediate, how is this 
 different from the ULP doing a write followed by a send?  If 
 there is no difference, then what we're really talking about 
 is a convenience to the initiating ULP.
 
 Perhaps what would be best is to construct an API that allows 
 the ULP to perform standard write/send operations into one 
 call which the underlying provider could optimize into one 
 transaction with the associated interconnect interface. 
 Better yet, a general request combining interface would have 
 even more value, but calling this write/send immediate data 
 is a stretch, if not downright silly.  Some transports have 
 true immediate data that provides unique value.  There is 
 nothing unique in a write/send sequence - ULPs do it all the time...
 
 Roy
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Caitlin Bestler
 Sent: Monday, February 06, 2006 10:48 AM
 To: [EMAIL PROTECTED]; Kanevsky, Arkady; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  Arkady,
  
  Your requirements are slightly different than the proposed set of
  requirements.
  
  iii) DAPL Provider does not provide any identification that the
  Receive operation matches remote RDMA Write with Immediate data if it
  completes as Receive DTO.
  
  - It is up to an ULP to separate Receive completion of remote
  Send from remote RDMA Write with Immediate Data.
  
  Tell me how this is possible? How can the application distinguish
  between a 4 byte message and a 4 byte immediate data message? We would
  have to add a new requirement... If the provider supports immediate
  data in the payload the ULP cannot send a message equal to the
  immediate data size.
  
 
 The data sink knows whether the 4 bytes was sent as a message 
 or as an immediate because it is clear in the ULP context.
 Possible methods:
   The expected completion is an immediate.
   All 4 byte messages are immediates.
   All 4 byte messages where the ms-byte is X are immediate.
   If it's Tuesday it's an immediate.
   If it's a prime number it's an immediate.
   ...
 
 But there is no clue from the transport layer.
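
The "ms-byte is X" method from the list above can be sketched as follows. This is an illustration in Python, not part of any DAT or transport API: `IMMEDIATE_TAG`, `encode_immediate`, and `classify` are hypothetical names, and the tag value is an arbitrary choice both peers would have to agree on.

```python
IMMEDIATE_TAG = 0xA5  # hypothetical value of "X" agreed on by both peers


def encode_immediate(value: int) -> bytes:
    """Pack a 24-bit value behind the tag byte into a 4-byte payload."""
    if not 0 <= value < (1 << 24):
        raise ValueError("value must fit in 24 bits")
    return bytes([IMMEDIATE_TAG]) + value.to_bytes(3, "big")


def classify(payload: bytes):
    """Return ("immediate", value) or ("message", payload) for a receive."""
    if len(payload) == 4 and payload[0] == IMMEDIATE_TAG:
        return ("immediate", int.from_bytes(payload[1:], "big"))
    return ("message", payload)
```

The cost of this convention is visible in the code: only 24 bits remain for the value, and the ULP must never send an ordinary 4-byte message whose first byte equals the tag.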
 
 

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediatedataproposal

2006-02-06 Thread Kanevsky, Arkady

Good point.
I will add this to the requirements and augment the necessary
transfered_length
text.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Davis, Arlin R [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 06, 2006 4:17 PM
 To: Kanevsky, Arkady; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 I just want to get consensus on the requirements before we 
 get too far.
 One thing I forgot is that with Infiniband, the receive with 
 immediate provides the size of the rdma write that just 
 completed. I think we should include this in the requirements 
 since there is ULP value here.
 
 -arlin
 
 -Original Message-
 From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
 Sent: Monday, February 06, 2006 11:08 AM
 To: Kanevsky, Arkady; Davis, Arlin R; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0
 immediatedataproposal
 
 Arlin,
 It is too strong to state that Consumer should never send a message 
 equal in size to the size of immediate data.
 Consumer knows from the context which one it is.
 it may be based on dedicated connection, or based on ULP protocol 
 ordering.
 Arkady
 
 Arkady Kanevsky   email: [EMAIL PROTECTED]
 Network Appliance Inc.   phone: 781-768-5395
 1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
 Waltham, MA 02451   central phone: 781-768-5300
 
 
  -Original Message-
  From: Kanevsky, Arkady
  Sent: Monday, February 06, 2006 2:05 PM
  To: Davis, Arlin R; Sean Hefty
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
  immediatedataproposal
 
  Arlin,
  On Friday we agreed that receiver can not distinguish between
  4 byte of Send or 4 bytes of Immediate data if RDMA Write 
 with Immed 
  is implemented as 2 operations:
  RDMA Write followed by Send.
 
  ULP Reciever expects Immediate data that is why it posts Recv. 
  Depending on Transport capability it MAY complete as Recv or as 
  Recv_RDMA_Write_with_Immed_in_event.
 
  Neither Provider not Consumer can distinguish between the cases 
  unless there is additional info.
 
  Arkady
 
  Arkady Kanevsky   email: [EMAIL PROTECTED]
  Network Appliance Inc.   phone: 781-768-5395
  1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
  Waltham, MA 02451   central phone: 781-768-5300
 
 
   -Original Message-
   From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
   Sent: Monday, February 06, 2006 1:25 PM
   To: Kanevsky, Arkady; Sean Hefty
   Cc: [EMAIL PROTECTED]; openib-general@openib.org
   Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
   immediate dataproposal
  
  
   Arkady,
  
   Your requirements are slightly different then the 
 proposed set of 
   requirements.
  
   iii) DAPL Provider does not provide any identification
  that that the
   Receive operation matches remote RDMA Write with Immediate
  data if it
   completes as Receive DTO.
  
- It is up to an ULP to separate Receive completion of remote
   Send from remote RDMA Write with   Immediate Data.
  
   Tell me how this is possible? How can the application 
 distinguish 
   between a 4 byte message and a 4 byte immediate data
  message? We would
   have to add a new requirement... If the provider supports
  immediate
   data in the payload the ULP cannot send a message equal to the 
   immediate data size.
  
   -arlin
  
   -Original Message-
   From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
   Sent: Monday, February 06, 2006 8:08 AM
   To: Sean Hefty; Davis, Arlin R
   Cc: [EMAIL PROTECTED]; openib-general@openib.org
   Subject: RE: [dat-discussions] [openib-general] [RFC] DAT
   2.0 immediate
   dataproposal
   
   Here are the changes to the existing requirements chapters
  for RDMA
   Write with Immediate Data.
   
   Feedback please.
   Arkady
   
   Arkady Kanevsky   email: [EMAIL PROTECTED]
   Network Appliance Inc.   phone: 781-768-5395
   1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
   Waltham, MA 02451   central phone: 781-768-5300
   
   
-Original Message-
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Friday, February 03, 2006 7:30 PM
To: Davis, Arlin R
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [dat-discussions] [openib-general] [RFC] DAT 2.0 
immediate dataproposal
   
Davis, Arlin R wrote:
 Applications need an optimized mechanism to notify the
receiving end
 that RDMA write data has completed beyond

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-06 Thread Kanevsky, Arkady

Roy,
comments inline.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 06, 2006 4:25 PM
 To: Kanevsky, Arkady; Caitlin Bestler; 
 [EMAIL PROTECTED]; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 
 
 From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
 Roy,
 Can you explain, please?
 
 For IB the operation will be layered properly on Transport primitive.
 And on Recv side it will indicate in completion event DTO that it 
 matches RDMA Write with Immediate and that Immediate Data is 
 in event.
 
 For iWARP I expect initially, it will be layered on RDMA 
 Write followed 
 by Send. The Provider can do post more efficiently than Consumer and 
 guarantee atomicity.
 On Recv side Consumer will get Recv DTO completion in event and 
 Immediate Data inline as specified by Provider Attribute.
 
 From the performance point of view Consumers who program to IB only 
 will have no performance degradation at all. But this API 
 also allows 
 Consumers to write ULP to be transport independent with minimal 
 penalty: one binary comparison and extra 4 bytes in recv buffer.
 
 If the application could be written transport independently, 
 I would have no objection at all.  Instead, it must be 
 written in a transport-adaptive way and to be able to adapt 
 to all possible implementations, the application could not 
 send arbitrary immediate-sized data as messages because 
 there is no way to distinguish between them on the receiving 
 side.  That is HUGE!  It is my experience that send/receive 
 is generally used for small messages and to take away 
 particular message sizes or to depend on the so the 
 application can adapt to whatever the immediate size is for 
 a particular transport, if even needed, is a very weak 
 facility to offer.

But the remote side does post a Recv. Since it anticipates that
this Recv will be matched against the RDMA Write with Immediate,
it posts a recv buffer that fits. Yes, there is an issue
for a Transport-independent ULP: it does need a buffer.
For IB it is possible to post a 0-size buffer. But if this is the case,
the Recv-end Consumer DOES know that it will be matched against an RDMA
Write, so the ULP DOES know what it will be matched against.
So in the worst case the Consumer does have to pay the price of creating
an LMR to handle a 4-byte buffer to match the RDMA Write Immediate data.
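
A minimal sketch of the receive-side handling described above, modeled in Python for brevity rather than in the DAT C API. Every name here (RecvCompletion, immediate_in_event, read_immediate) is hypothetical and does not come from any DAT specification. It assumes the ULP already knows from context that this Recv matches an RDMA Write with Immediate, that the immediate is 4 bytes, and that an emulated immediate arrives as a 4-byte Send payload in network byte order.

```python
import struct
from dataclasses import dataclass
from typing import Optional

IMMED_SIZE = 4  # bytes of immediate data, as discussed in this thread


@dataclass
class RecvCompletion:
    """Hypothetical receive completion; not a real DAT structure."""
    immediate_in_event: bool        # True when the transport delivers it natively
    event_immediate: Optional[int]  # meaningful only when immediate_in_event is True
    buffer: bytes                   # contents of the posted Recv buffer
    transferred: int                # number of bytes placed in the buffer


def read_immediate(c: RecvCompletion) -> int:
    """Return the immediate value from wherever the Provider placed it."""
    if c.immediate_in_event:
        if c.event_immediate is None:
            raise ValueError("completion claims an in-event immediate but has none")
        return c.event_immediate
    if c.transferred < IMMED_SIZE:
        raise ValueError("Recv buffer does not hold a full immediate value")
    # Assumption: the emulated immediate is a 4-byte big-endian Send payload.
    (value,) = struct.unpack("!I", c.buffer[:IMMED_SIZE])
    return value
```

The single branch on immediate_in_event is the "one binary comparison" mentioned earlier in the thread; the 4-byte buffer is the extra cost a transport-independent ULP pays.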

 
 It also affects interface resource allocation.  Send queue 
 sizes will have to adapt to possibly twice there size.
 

That is correct. We argued about it at the meeting.
One alternative is to have EP and EVD attributes. But this will not
be efficient, since it will double the queue size where
a smaller increment is possible, given the depth of the outstanding
RDMA Write pipeline.
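
To illustrate the sizing argument, here is a rough sketch in Python (not the DAT C API). The function name and its parameters are hypothetical. It assumes a Provider attribute that reports how many work requests one write-with-immediate expands into (1 where the transport supports it natively, 2 where it is emulated as RDMA Write plus Send), as suggested elsewhere in this thread.

```python
def send_queue_depth(max_posts: int, max_write_immed: int,
                     wrs_per_write_immed: int) -> int:
    """Send queue slots needed for max_posts outstanding posts.

    max_write_immed is how many of those posts may be write-with-immediate;
    each one occupies wrs_per_write_immed slots instead of one.
    """
    if wrs_per_write_immed < 1:
        raise ValueError("a post expands into at least one work request")
    if not 0 <= max_write_immed <= max_posts:
        raise ValueError("max_write_immed must be between 0 and max_posts")
    return max_posts + max_write_immed * (wrs_per_write_immed - 1)
```

Under these assumptions, 64 outstanding posts of which at most 8 are write-with-immediate need 72 slots on an emulating transport, compared with 128 if the queue is simply doubled.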

 It just dawned on me that the immediate data must be in 
 registered memory to be sent in a message.  This means the 
 API must be amended to pass an LMR or, even worse, the 
 provider would have to register memory in the speed path or 
 create and manipulate its own queue of immediate
 data buffers/LMRs.  Of course, LMRs are not needed and an 
 overhead for transports that provide true immediate data.

No registration on the speed path. It is the Consumer's responsibility
to provide a Recv buffer of the right size.
Yes, for an IB-only ULP this can be avoided.
A ULP can be written to the proposed API to take full
advantage of IB performance, but that code will not be transport
independent.

But this API also allows writing transport-independent code,
albeit with a certain price attached.

 
 Oh, and another thing.  InfiniBand indicates the size of the 
 RDMA write in the receive completion.  That is something that 
 will have to be addressed in a transport independent way or 
 dropped as part of the service.

Good point. I will augment the Spec accordingly.

 
 The bottom line here is that it is NOT transport independent. 

The implementation is not transport independent.
But the API allows writing a Transport-specific ULP with full performance
as well as a Transport-independent ULP with better performance
than without the proposed API, and with minimal performance
penalty for Transports that provide it.

 
 Now, the atomicity argument between write and send has some 
 credibility.
 If an application chooses to adapt to an explicit 
 write/send semantic for write completion notification in 
 environments that can't provide it natively, this could be 
 addressed by a generalized combined request API that can 
 guarantee thread-based atomicity to the send queue.  This 
 seems much more straightforward to me since, in essence, to 
 adapt to non-native immediate data services, they would have 
 to allocate resources and behave

RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 immediate data proposal

2006-02-06 Thread Kanevsky, Arkady

I am not clear on what you are proposing.
A transport-specific API?

The current proposal provides, on the sending side,
a single post and a single completion in the error-free case.
This commonality simplifies the ULP.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 06, 2006 6:50 PM
 To: Kanevsky, Arkady; Caitlin Bestler; 
 [EMAIL PROTECTED]; Sean Hefty
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT 2.0 
 immediatedataproposal
 
 
 
 From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
 Sent: Monday, February 06, 2006 2:27 PM
 
 Roy,
 comments inline.
 
 
 Mine too
 
 
  From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
  Roy,
  Can you explain, please?
  
  For IB the operation will be layered properly on Transport
 primitive.
  And on Recv side it will indicate in completion event DTO that it 
  matches RDMA Write with Immediate and that Immediate Data is
  in event.
  
  For iWARP I expect initially, it will be layered on RDMA
  Write followed
  by Send. The Provider can do post more efficiently than 
 Consumer and 
  guarantee atomicity.
  On Recv side Consumer will get Recv DTO completion in event and 
  Immediate Data inline as specified by Provider Attribute.
  
  From the performance point of view Consumers who program 
 to IB only 
  will have no performance degradation at all. But this API
  also allows
  Consumers to write ULP to be transport independent with minimal
  penalty: one binary comparison and extra 4 bytes in recv buffer.
 
  If the application could be written transport 
 independently, I would 
  have no objection at all.  Instead, it must be written in a 
  transport-adaptive way and to be able to adapt to all possible 
  implementations, the application could not send arbitrary 
  immediate-sized data as messages because there is no way to 
  distinguish between them on the receiving side.  That is 
 HUGE!  It is 
  my experience that send/receive is generally used for 
 small messages 
  and to take away particular message sizes or to depend on 
 the so the 
  application can adapt to whatever the immediate size is for a 
  particular transport, if even needed, is a very weak facility to 
  offer.
 
 But the remote side does posts Recv. Since it anticipate 
 that this Recv 
 will be matched against the RDMA Write with immediate it 
 posts the recv 
 buffer which fits. Yes, there is an issue for 
 Transport-independent ULP 
 that it does needs a buffer.
 For IB it is possible to post 0-size buffer. But if this is the case 
 Recv end Consumer DOES know that it will be macthed against 
 RDMA Write 
 so ULP DOES know what it will be matched against.
 So in the worst case Consumer does have to pay the price of creating 
 LMR to handle 4 byte buffer to match RDMA Write Immediate data.
 
 I think you missed my larger point.  The point was that the 
 application must be written in such a way that it could 
 inferred when immediate data arrived for a variety of 
 immediate data sizes and that places a constraint on the 
 application wrt to data it may want to send/receive normally. 
 Where as, if the application embraced the fact that it was 
 responsible for sending a message to indicate a write 
 completion, it is free to send whatever amount of data best 
 met its needs.
 
 Transports that support true immediate data do not require 
 the ULP to perform buffer matching.  They can post a series 
 of receive buffers that may or may not indicate immediate 
 data.  The ULP does not have to know ahead of time when 
 immediate data will arrive **against other data receives**.  
 The fact that an IB oriented application never needs to back 
 a receive request with a buffer if they were only used to 
 indicate immediate data is orthogonal.
 
 
 
  It also affects interface resource allocation.  Send queue 
 sizes will 
  have to adapt to possibly twice there size.
 
 
 That is correct. We argued about it at the meeting.
 One alternative is to have EP and EVD attr. But this will not be 
 efficient since it will double the queue size where a 
 smaller increment 
 is possible due to the depth of the RDMA Write pipeline outstanding.
 
  It just dawned on me that the immediate data must be in registered 
  memory to be sent in a message.  This means the API must 
 be amended 
  to pass an LMR or, even worse, the provider would have to register 
  memory in the speed path or create and manipulate its own queue of 
  immediate
  data buffers/LMRs.  Of course, LMRs are not needed and an overhead 
  for transports that provide true immediate data.
 
 No registration on the speed path. It is Consumer responsibility to 
 provide Recv Buffer of the right size.
 Yes for IB

RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-02-01 Thread Kanevsky, Arkady

comments on Arlin and Caitlin's emails inline.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 30, 2006 7:16 PM
 To: Arlin Davis; Kanevsky, Arkady
 Cc: Lentini, James; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: RE: [openib-general] RE: [RFC] DAT 2.0 immediate 
 data proposal
 
 Arlin Davis wrote:
  Kanevsky, Arkady wrote:
  
  Arlin,
  I am not convinced we need a new recv for immediate data.
  But what is needed is change in normative text in many places.
  Recv, RDMA Write, DTO completion events, error behavior.
  Sure you can define immed data in extension but it still effects 
  behavior of the normative part of the spec.
  
  
  How does it effect the normative part of the spec outside 
 of the DTO 
  event extension? The post_recv behaves exactly the same.

We will need a paragraph stating that the size of the recv buffer
shall accommodate immediate data if the recv may be matched
with rdma_write_immed. There we can reference the Provider attribute
for how immed data is returned. Then in Advice to Consumer we can
state how to generate a transport-independent recv
and how it can be optimized based on the Provider attr.
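
One possible reading of that advice, sketched in Python rather than in the DAT C API. The function and its parameters are hypothetical, and immed_in_event stands in for the Provider attribute mentioned above (None means the ULP does not consult it and stays transport independent). The sketch reads "accommodate" as "at least 4 bytes"; a stricter reading that adds 4 bytes to every such buffer would replace max() with addition.

```python
from typing import Optional

IMMED_SIZE = 4  # bytes of immediate data, as discussed in this thread


def recv_buffer_size(expected_payload: int, may_match_write_immed: bool,
                     immed_in_event: Optional[bool]) -> int:
    """Smallest Recv buffer that is safe to post under the stated assumptions."""
    if expected_payload < 0:
        raise ValueError("payload size cannot be negative")
    if not may_match_write_immed:
        return expected_payload
    if immed_in_event is True:
        # Optimized case: the immediate arrives in the completion event,
        # so the buffer itself may even be empty.
        return expected_payload
    # Transport-independent case: leave room for the immediate in the buffer.
    return max(expected_payload, IMMED_SIZE)
```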

  
  This is why my preference is to put it into the main spec.
  
  
  ok, with no new recv_immed call we do get a little closer.
  
  The xfer_size is minor thing. We just need to define it 
 meaning with 
  respect to immed_data. Defining it either way is fine.
  
  Handling extra space on CQ can be handled by Provider.
  We can add a new EVD attribute for the use for handling RDMA_write 
  with immed data and Provider can automatically add extra 
 space on CQ.
  Provider is already responsible to handing user a single 
 completion.
  SO it will only be used for error handling.
  
  
  sounds good.
  
  Error handling takes maost of the new write up anyhow.
  Regardless where it is done in the spec or in extension.
  
  Question on do we want to support Send with immed_data have to be 
  decided. Ditto remote RMR invalidation with new post(s) for 
  immed_data.
  Just because IB supports all possible correlation under 
 one Send post 
  does not mean that uDAPL should follow that too.
  
  
  I would agree, strike them all except rdma_write_immed.

The only ones that need to be discussed are Remote invalidate
with rdma_write_immed and Local invalidate with rdma_write_immed.

  
  Can you give some idea how you would write up the normative 
 text for 
  the transport independent receive that would accept immediate data?
  
  thanks,
  
  -arlin
 
 The data source:
   posts an rdma write with immediate DTO, supplying
   the RDMA Write data source and an immediate value.
 
   This is translated into one work request (if the
   device supports write with immediate), or into
   a RDMA Write followed by a RDMA Send (if it does
   not). 

This should be in the Model Implication section.

 
   While successful completion of the RDMA Write will
   be suppressed, the Consumer must still allow for the
   extra space on the SendQ and the CQ. An IA attribute
   will document how many work requests a write_with_immediate
   will translate into.

This also belongs in the Model Implication section and in the Usage section.

 
 The data sink:
   post a recv (to EP or SRQ) with a four byte buffer.
 
   When it reaps the completion it needs to be ready
   to see the data either in an immediate field in
   the work completion, or in the buffer originally
   specified in the recv DTO.
 

This is in the Usage section.

 
 A Provider MAY indicate that it supports immediate receives, 
 but on iWARP or any transport where this is not the default 
 optimized receive processing MUST be enabled by the user.
 Otherwise, RFC compliance would require that a four byte 
 untagged message matched to a zero byte buffer was an error.
 Essentially the user is posting a receive operation that 
 names the four bytes in the Work completion as the buffer.
 

Ditto. Also, it should reference the Provider Attribute and not the transport.

 

RE: [dat-discussions] RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-27 Thread Kanevsky, Arkady

Caitlin,
I agree that Send with immed is too hard to handle.
I have not heard from any ULP that needs it.
So we can take an informal vote and close that issue.

The sizing of the EVD to handle 2 completions in the error case
for a post of RDMA_write_with_immed can be handled by the Provider
adding extra space if the EVD will be used for posting RDMA_write_with_immed.
That does not allow the Consumer to optimize queue size based on the exact
number of outstanding RDMA_write_with_immed ops, but it is simpler
to program to.

Of course a ULP can be adaptive and choose the code path
based on a Provider attr, if we add an attr for whether extra queue size is needed.
That is separate from how immed data is returned.
We could combine the two under one Provider attr, but conceptually it is
wrong to combine them.
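
The trade-off between the two EVD sizing approaches can be shown with a small sketch, again in Python and not in the DAT C API. Both function names are hypothetical. It assumes the emulated case, where a failed write-with-immediate can surface two flushed completions instead of one.

```python
def provider_padded_evd_size(requested: int) -> int:
    """Provider-side padding: assume every entry could be a write-with-immediate
    that flushes two completions on error."""
    if requested < 0:
        raise ValueError("requested size cannot be negative")
    return 2 * requested


def consumer_exact_evd_size(requested: int, max_write_immed: int) -> int:
    """Consumer-side sizing: one extra entry per outstanding write-with-immediate."""
    if not 0 <= max_write_immed <= requested:
        raise ValueError("max_write_immed must be between 0 and requested")
    return requested + max_write_immed
```

With 100 requested entries and at most 10 outstanding write-with-immediate operations, the exact sizing needs 110 entries while the padded sizing reserves 200. The padded form is simpler to program to because the Consumer does not have to track the count.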


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Friday, January 27, 2006 12:23 PM
 To: [EMAIL PROTECTED]; Kanevsky, Arkady; Arlin Davis
 Cc: Lentini, James; openib-general@openib.org
 Subject: RE: [dat-discussions] RE: [openib-general] RE: [RFC] 
 DAT 2.0 immediate data proposal
 
 [EMAIL PROTECTED] wrote:
  But this penalizes user which need to deal with 2 way to deal with 
  post calls and completions.
  
  I do not think we are not to far from consensus.
  Transport independent App will allocate 4 bytes extra for buffers 
  that can match immediate data. Completion data will return 
 where the 
  immediate data is return (Consumer can not request it on posting), 
  and 4 bytes for immediate data in completion event. The rest are 
  ironing details for complete specification.
  This is no different than for any other new functionality proposed.
  And except for wasting 4 bytes per buffer or completion I 
 do not see 
  how it penalizes IB. Moreover if Apps knows that Provider returns 
  immediate data in completion event it can avoid any penalty.
  
  There is no penalty to the user if you just provide native features 
  via extensions. Your extension will provide the best possible 
  interface for your native capabilities.
  
  I think we are further from consensus then we first thought:
  
  Right now we have a new post recv, different delivery 
 mechanisms, and 
  a requirement to allocate an extra 4 bytes of user data.
  
  The only requirement to support immediate data on IB, is a new post 
  send and write immediate data calls and a new event data construct. 
  The normal post_recv can be used unchanged and can already process 
  normal and immediate data. No requirement on the user to 
 allocate and 
  manage an extra 4 bytes in the receive buffer. In fact, you 
 can post 
  receive with no buffer.
  
  In order to support immediate data via iWARP, you now have a 
  requirement to use a special new receive post, new user buffer 
  constructs to place the data, and new delivery method that 
 has to be 
  checked via provider attributes or at event time.
  
  Is there anyway to get this closer? If not, I would recommend going 
  back to an extension interface for immediate data.
  
 
 I think the trick to finding out if there is something useful 
 that can be made transport neutral is to work in the opposite 
 direction.
 
 Start with the message sequence that the application would use
 *without* immediates, and then ask if there is a way to allow 
 an InfiniBand Provider to compress that message sequence.
 
 That is possible for RDMA Write with Immediate. With careful 
 definition of a composite message it can be viewed as a 
 transport specific replacement for an RDMA Write followed by 
 a 4-byte RDMA Send. There are only two special considerations 
 required:
 
 1) A single post has to submit the combination (otherwise it 
is too difficult for the Provider to detect the optimization).
 2) The receive completion may report the received data in the
user supplied buffer OR in an immediate data field in the
completion.
 
 I do not think it is feasible to define a transport neutral 
 equivalent of a RDMA Send with Immediate. How is the extra 
 data transmitted via iWARP? An extra send? Pre-pend the four 
 bytes? Or 4 bytes at the end? Delivery of the immediate data 
 is transport dependent?
 
 Adding an immediate data field to the completion doesn't cost 
 much, and it would allow IB DAT Provider to interact with 
 IB-specific fields. But I can't see adding a send with 
 immediate method in any way that would create an expectation 
 in developers that it would work in a transport neutral fashion.
 
 Write with immediate is possible. It carries the complexity 
 that a single DTO request might result in two flushed work 
 completions.
 The current consensus is that this was too complex relative 
 to the benefit. But that's really a call for application developers

RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-26 Thread Kanevsky, Arkady

Sean,
Immediate data can be handled in a Transport-independent way.
The API for it certainly is. I am more concerned that different vendors
will come up with their own extensions for the same features.

The size of immediate data is no big deal.
The real issue is that the App will need to be changed to handle
more data. So DAT can just increase the size of the immed_data field
in the event and in the posted buffer. No API functionality change, just
an API header change and a recompile of the app.

But these kinds of changes will face the same problem whether they are
part of DAT or part of a DAT extension.

Let's talk more about it on the DAT call tomorrow.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 24, 2006 7:17 PM
 To: Kanevsky, Arkady
 Cc: Arlin Davis; Caitlin Bestler; Lentini, James; 
 [EMAIL PROTECTED]; openib-general@openib.org; 
 Davis, Arlin R
 Subject: Re: [openib-general] RE: [RFC] DAT 2.0 immediate 
 data proposal
 
 Kanevsky, Arkady wrote:
  But this penalizes user which need to deal with 2 way to deal with 
  post calls and completions.
 
 Yes, any app that wants to take advantage of transport 
 specific features, which immediate data is, is no longer 
 transport neutral.
 
 How do you plan to handle the next RDMA transport that comes 
 along with 64-bytes of immediate data?
 
 - Sean
 

RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-26 Thread Kanevsky, Arkady

Arlin,
I am not convinced we need a new recv for immediate data.
But what is needed is a change in the normative text in many places:
Recv, RDMA Write, DTO completion events, error behavior.
Sure, you can define immed data in an extension, but it still affects
the behavior of the normative part of the spec.
This is why my preference is to put it into the main spec.

The xfer_size is a minor thing. We just need to define its meaning
with respect to immed_data. Defining it either way is fine.

Handling extra space on the CQ can be done by the Provider.
We can add a new EVD attribute for handling RDMA_write with immed
data, and the Provider can automatically add extra space on the CQ.
The Provider is already responsible for handing the user a single completion,
so the extra space will only be used for error handling.

Error handling takes most of the new write-up anyhow,
regardless of whether it is done in the spec or in an extension.

The question of whether we want to support Send with immed_data has to be
decided.
Ditto remote RMR invalidation with new post(s) for immed_data.
Just because IB supports all possible correlations under one Send post
does not mean that uDAPL should follow that too.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Arlin Davis [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 26, 2006 3:02 PM
 To: Kanevsky, Arkady; Arlin Davis; Caitlin Bestler
 Cc: Lentini, James; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: RE: [openib-general] RE: [RFC] DAT 2.0 immediate 
 data proposal
 
 
 But this penalizes user which need to deal with 2 way to 
 deal with post 
 calls and completions.
 
 I do not think we are not to far from consensus.
 Transport independent App will allocate 4 bytes extra for 
 buffers that 
 can match immediate data.
 Completion data will return where the immediate data is return 
 (Consumer can not request it on posting), and 4 bytes for immediate 
 data in completion event.
 The rest are ironing details for complete specification.
 This is no different than for any other new functionality proposed.
 And except for wasting 4 bytes per buffer or completion I do not see 
 how it penalizes IB. Moreover if Apps knows that Provider returns 
 immediate data in completion event it can avoid any penalty.
 
 There is no penalty to the user if you just provide native 
 features via extensions. Your extension
 will provide the best possible interface for your native 
 capabilities.   
  
 I think we are further from consensus then we first thought:
 
 Right now we have a new post recv, different delivery 
 mechanisms, and a requirement to allocate an extra 4 bytes of 
 user data. 
 
 The only requirement to support immediate data on IB, is a 
 new post send and write immediate data calls and a new event 
 data construct. The normal post_recv can be used unchanged 
 and can already process normal and immediate data. No 
 requirement on the user to allocate and manage an extra 4 
 bytes in the receive buffer. In fact, you can post receive 
 with no buffer.
 
 In order to support immediate data via iWARP, you now have a 
 requirement to use a special new receive post, new user 
 buffer constructs to place the data, and new delivery method 
 that has to be checked via provider attributes or at event time. 
 
 Is there anyway to get this closer? If not, I would recommend 
 going back to an extension interface for immediate data. 
 
 -arlin
 
 

RE: [openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-24 Thread Kanevsky, Arkady

But this penalizes the user, who needs to deal with 2 ways of handling
post calls and completions.

I do not think we are too far from consensus.
A Transport-independent App will allocate 4 extra bytes
for buffers that can match immediate data.
Completion data will report where the immediate data is returned
(the Consumer cannot request it on posting), with 4 bytes for immediate
data in the completion event.
The rest is ironing out details for a complete specification.
This is no different than for any other new functionality proposed.
And except for wasting 4 bytes per buffer or completion, I do
not see how it penalizes IB. Moreover, if the App knows that the Provider
returns immediate data in the completion event, it can avoid any penalty.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Arlin Davis [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 24, 2006 5:42 PM
 To: Caitlin Bestler
 Cc: Davis, Arlin R; Kanevsky, Arkady; Lentini, James; 
 [EMAIL PROTECTED]; openib-general@openib.org
 Subject: Re: [openib-general] RE: [RFC] DAT 2.0 immediate 
 data proposal
 
 ok, maybe we should backup and start over
 
 This is exactly why immediate data was initially proposed as 
 an extension instead of general API. We start to penalize 
 native IB features based on the requirements of other RDMA 
 interfaces that have to emulate the feature anyway.  What 
 prevents the next  RDMA interface that comes along from 
 requiring other variations of the interface due to 
 implementation implications?  This is an IB specific feature 
 that does not map well on iWARP so lets just call it what it 
 is and let IB providers supply immediate data capabilities 
 via the extension interface.
 
 -arlin
 
 Caitlin Bestler wrote:
 
 
 Maybe we need to just go back to one model and always deliver
 via the event? With the post_recv_immed requirements, other
 transports have a mechanism to emulate and create the
 necessary resources on the recv side to place idata and copy
 to event when operation is completed. Would this work for iWARP?
 
 
 
 Two different models for receiving idata should be avoided if
 at all possible.
 
 
 
 
 
 
 Always delivering by the event is not feasible for an iWARP vendor.
 If you are working over RDMAC verbs then the work completion is no
 longer accessible by the time the Work Completion is reaped. 
 So copying
 from the receive buffer to the event does not work since the location
 of the receive buffer is now known only to the application.
 
 The same problem exists in the opposite direction for InfiniBand HCAs
 using standard verbs. They cannot copy from the CQE to the receive
 buffer.
 
 So the user is stuck checking a flag or the event type to know where
 their data is. This is not terribly user friendly, but it is the best
 that can be offered if we want to enable this optimization. The need
 to check the flag does reduce the value of the optimization though.
 
 
   
 
 
 6. Does dto_completion_data xfer_length include immediate_data
 size or not?
 
 
 
 no
 
 
 
 
 
 
 Then how does the receiver know how much data there is?
 
 Even if an iWARP Provider attempts to optimize immediate
 placement into the CQ, it will end up setting the xfer_length
 whenever the packet is received out of order.
 
 So it is far simpler for the application to simply know that
 the data will be in the buffer, and that the xfer_length will
 be set. It doesn't need to worry about whether they were set
 by the cq_poll verb or by the hardware.
 
   
 
 
 11. Need to clean up the operation description to make it clear
 that Send|RDMA_write and the immediate data part
 are a single atomic operation. The current "followed by"
 language is misleading.
 
 Make it explicit that there is a single local DTO completion
 and single remote DTO completion.
 
 
 
 Ok, I will clean that up
 
 
 
 
 
 The best mapping available over RDMAC-compliant firmware for
 an iWARP NIC would be to post two operations (RDMA Write followed
 by a short Send). That would require additional space in the send
 and completion queues since a completion for the write can only
 be suppressed for a successful completion.
 
 Whether these extra slots were required would be an IA attribute.
 
 And the requirement is that nothing for that QP can come between
 the iWARP Write and the Send. How the provider does that is up
 to it. Options include locking over both posts and a composite
 work request. Anyone working over existing RDMAC-compliant
 verbs will have to use the first approach.
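
The "locking over both posts" option above can be sketched as follows. This is an illustration only: `EmulatedEndpoint` and its method names are hypothetical, and a Python deque stands in for the QP send queue, so it shows the ordering guarantee rather than any real verbs call.

```python
import threading
from collections import deque


class EmulatedEndpoint:
    """Hypothetical endpoint wrapper; the deque stands in for the QP send queue."""

    def __init__(self):
        self._post_lock = threading.Lock()
        self.send_queue = deque()

    def post_send(self, payload: bytes) -> None:
        with self._post_lock:
            self.send_queue.append(("SEND", payload))

    def post_rdma_write_immed(self, payload: bytes, immed: int) -> None:
        # Hold the lock across both posts so no other work request on this
        # endpoint can be queued between the Write and the short Send.
        with self._post_lock:
            self.send_queue.append(("RDMA_WRITE", payload))
            self.send_queue.append(("SEND", immed.to_bytes(4, "big")))
```

Each emulated operation consumes two send queue slots, which is why the extra-slot requirement would need to be visible as an IA attribute.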
 
 
   
 
 12. Is your intention that post_recv_immed can ONLY accept
 immediate data and is not capable of receiving any other message?
 
 
 
 No, the intention is to extend the post_recv to handle 32-bit
 idata which may arrive with or without other send or rdma_write data

[openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-23 Thread Kanevsky, Arkady

Arlin,
comments inline.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300

From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
Sent: Monday, January 23, 2006 7:15 PM
To: Kanevsky, Arkady; Lentini, James
Cc: openib-general@openib.org; [EMAIL PROTECTED]
Subject: RE: [RFC] DAT 2.0 immediate data proposal

Arkady,

Response inline.

From: Kanevsky, Arkady [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 17, 2006 7:16 AM
To: Davis, Arlin R; Lentini, James
Cc: [EMAIL PROTECTED]; openib-general@openib.org
Subject: RE: [RFC] DAT 2.0 immediate data proposal

Arlin,
a few things need to be addressed.

1. Correlation with local and remote invalidate.
This potentially affects both DAT_DTOs and post operations.

How does this differ from normal sends or writes?

[AK] We had added a new Send_with_Invalidate. The completion also states
whether an RMR was invalidated and which one. But the text for the
interaction is added throughout the completion and post operations.
See the latest draft of the uDAPL and kDAPL 1.3 specs on the DAT reflector.

2. Need a precise definition of CONFIRM_FLAG in a transport-independent
fashion.
What guarantees does the DAT Provider "provide" on successful local
completion?
Remote end guarantee?

My understanding is that what you are trying to do is create 2 models,
one for IB and one for iWARP.
So for IB, Consumers will use CONFIRM_FLAG and for iWARP, IMMED_FLAG.
The Provider will indicate in Provider_attr which model it supports.

The issue I have with it is that I do not see a model that a Consumer
can use to create transport-independent code.
It looks like Immed_flag can be made transport independent. But with the
"sender" specifying the behavior, a protocol extension is needed for IB.
IB will always deliver Immediate data in the header, not the payload, and
the remote Provider can control how it is delivered to a Consumer.
But this means that there is no need for DTO_flags on the Send side.
Instead they can be used on the Recv side or controlled purely by the
Provider.

Maybe we need to just go back to one model and always deliver via the
event? With the post_recv_immed requirements, other transports have a
mechanism to emulate and create the necessary resources on the recv side
to place idata and copy to event when operation is completed. Would this
work for iWARP?

Two different models for receiving idata should be avoided if at all
possible.

[AK] Caitlin already responded to this.

3. Need to define error behavior for new operations, async errors, EP
behavior.

I will work on updating the draft. post_send_immed will look much like
post_send and post_rdma_write_immed will look a lot like post_rdma_write
with some additional errors based on the post receive buffer requirement.

[AK] Also consider whether you want to add remote invalidate to the new
operation.

4. Need to define DAT_Provider attributes for immediate data and
dto_flags behavior.

5. Is the Solicited_wait completion_flag value now applicable to
RDMA_write with immediate data?

Yes, applicable to send, send_immed, and write_immed.

6. Does dto_completion_data xfer_length include the immediate_data size
or not?

no

[AK] It can work both ways. Either we include 4 extra bytes for immediate
data or not. The Consumer just has to know. The real data always starts
at a 4-byte boundary into the buffer if immediate data is returned inline.
We need to state how immediate data is positioned if it is smaller than
4 bytes.

7. What memory privileges are needed for a recv buffer for immediate data?

Based on the operation: write_immed would require write privileges and
send_immed would require recv privileges.

8. SRQ interaction?

Good question. All post_recv_immed or all post_recv?

[AK] Will this work for the user model? Not supporting handling of
immediate recv and regular recv with potential immediate data on one SRQ.

9. What happens if a buffer for a recv operation, NOT recv_immed, is
matched for an incoming recv/rdma_write op?

The rules should be:
Can receive a send, send_immed, or write_immed with recv_immed.
Cannot receive send_immed or write_immed on a recv.

However, I am not sure how you would enforce this on IB (DTO error on the
receiving side?) since the idata is delivered via CQ and does not require
a special receive post descriptor.

[AK] We can make this a Provider attribute. Or we can state that if immed
data is returned in the event then there is no error for recv.

10. Change dat_ep_post_write_immed to dat_ep_post_rdma_write_immed to be consis

[openib-general] RE: [RFC] DAT 2.0 immediate data proposal

2006-01-17 Thread Kanevsky, Arkady




Arlin,
a few things need to be addressed.

1. Correlation with local and remote invalidate.
This potentially affects both DAT_DTOs and post operations.

2. Need a precise definition of CONFIRM_FLAG in a transport-independent
fashion.
What guarantees does the DAT Provider "provide" on successful local
completion?
Remote end guarantee?

My understanding is that what you are trying to do is create 2 models,
one for IB and one for iWARP.
So for IB, Consumers will use CONFIRM_FLAG and for iWARP, IMMED_FLAG.
The Provider will indicate in Provider_attr which model it supports.

The issue I have with it is that I do not see a model that a Consumer
can use to create transport-independent code.
It looks like Immed_flag can be made transport independent. But with the
"sender" specifying the behavior, a protocol extension is needed for IB.
IB will always deliver Immediate data in the header, not the payload, and
the remote Provider can control how it is delivered to a Consumer.
But this means that there is no need for DTO_flags on the Send side.
Instead they can be used on the Recv side or controlled purely by the
Provider.

3. Need to define error behavior for new operations, async errors, EP
behavior.

4. Need to define DAT_Provider attributes for immediate data and
dto_flags behavior.

5. Is the Solicited_wait completion_flag value now applicable to
RDMA_write with immediate data?

6. Does dto_completion_data xfer_length include the immediate_data size
or not?

7. What memory privileges are needed for a recv buffer for immediate data?

8. SRQ interaction?

9. What happens if a buffer for a recv operation, NOT recv_immed, is
matched for an incoming recv/rdma_write op?

10. Change dat_ep_post_write_immed to dat_ep_post_rdma_write_immed to be
consistent with current terminology.

11. Need to clean up the operation description to make it clear that
Send|RDMA_write and the immediate data part are a single atomic
operation. The current "followed by" language is misleading.
Make it explicit that there is a single local DTO completion and a single
remote DTO completion.

12. Is your intention that post_recv_immed can ONLY accept immediate data
and is not capable of receiving any other message?

13. size should be num_segments for dat_ep_post_recv_immed()

Arkady






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300


  
  
  From: Arlin Davis [mailto:[EMAIL PROTECTED]
  Sent: Monday, January 16, 2006 5:55 PM
  To: Kanevsky, Arkady; Lentini, James
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: [RFC] DAT 2.0 immediate data proposal
  
  
  Arkady,
  
  The attached proposal adds immediate data options as standard APIs
  instead of extensions for the following calls.
  
  dat_ep_post_send_immed()
  dat_ep_post_recv_immed()
  dat_ep_post_write_immed()
  
  The patch should be ready by tomorrow.
  
  Thanks,
  
  -arlin
  
  

[openib-general] RE: [RFC] DAT 2.0 extension proposal

2006-01-17 Thread Kanevsky, Arkady




Arlin,

1. Does it mean that existing DAT providers will have 
to be modified so they report
DAT_NOT_IMPLEMENTED for each 
extension?

2. Why is there DAT_INVALID in 
DAT_DTOS?

3. Do you want to use DAT_EXTENSION_DATA or 
DAT_EXT_DATA?

4. The proposed operations are operation on EP and they 
are DTOs.
Why not define DAT_DTO_EXT_OP instead of 
DAT_EXT_OP?

My concern is that if these are not DTOs then we have a new event stream
type for "extensions" and we need to define rules for this event stream,
including ordering rules and interactions with other event streams,
provider attributes for stream mixing and so on...

If we restrict extensions to DTO operation extensions we avoid all these
issues and simplify the APIs. On the negative side, these extensions are
restrictive.

5. Memory protection extension for atomic 
operations

6. error returns for extensions?

Arkady






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.   Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300


  
  
  From: Davis, Arlin R [mailto:[EMAIL PROTECTED]
  Sent: Monday, January 16, 2006 5:55 PM
  To: Kanevsky, Arkady; Lentini, James
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: [RFC] DAT 2.0 extension proposal
  
  
  Arkady,
  
  The attached proposal adds generic DTO extensions and provider-specific
  atomic operations as follows.
  
  dat_ep_post_cmp_and_swap()
  dat_ep_post_fetch_and_add()
  
  The patch should be ready by tomorrow.
  
  Thanks,
  
  -arlin
  
  

RE: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

2006-01-06 Thread Kanevsky, Arkady

comments inline.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Arlin Davis [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 05, 2006 6:35 PM
 To: Kanevsky, Arkady
 Cc: Arlin Davis; Lentini, James; 
 [EMAIL PROTECTED]; openib-general@openib.org
 Subject: Re: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL 
 extension proposal - sample immed data and atomic api's
 
 Kanevsky, Arkady wrote:
 
  Arlin,
  nice proposal, thanks.
  I have one high level question and a few specific technical ones.
   
  1. Why do you want to provide this functionality via 
 extension instead 
  of part of new DAT spec, say 2.0?
  This will allow Consumers to use all events, operations, and 
  Provider/IA functionality uniformly instead of via 2 
 separate layers. 
  This will also ensure that this basic functionality can be provided by
  all DAPL Providers the same way on DAPL and DAT layers.
  DAPL 2.0 is not done yet so we have time to incorporate that.
  DAPL 2.0 already introduced new functionality which is easy 
 to beef up 
  for your proposal.
  See DAT_DTOS for example. DAT_EVENT is also modified to 
 handle remote 
  invalidation so a small addition for Immediate data and Atomic ops is
  a sensible addition.
  This should simplify proposal significantly. As you will 
 not need to 
  introduce any new EXT structures.
 
 As mentioned on the con-call, there are two separate items to 
 consider while looking at the proposal. The first is the 
 ability to extend DAT for specific provider value-add and the 
 second is to validate the need for general atomic and 
 immediate data functionality in the basic set of API's for 
 all providers. I included atomics and immediate data as 
 examples since it is specific to one provider (IB), it 
 includes operations that require new ops, events, and event 
 data types, and it also provides a working model to validate 
 the extension model from request to completion events. I 
 would like to concentrate on getting consensus on the 
 extension proposal first if possible. Just try to think of 
 the actual operations as some opaque dat_ext_foobar_op().

The thing that bothers me is that we already have several APIs
that are transport specific. While some are possible to implement
on other transports, others, like Socket CM, cannot be.
So I view both of your specific extensions as transport specific
and hence prefer to add them as normal APIs, not extensions.
The secondary goal is that a Provider can add extensions without requiring
changes to DAT. These fall into 3 categories.
1. New memory types, including privileges and protection attributes.
We can add an extension entry to these structures. We need to check
if this is sufficient. Think of shared memory for example.
I am assuming no changes to PZ.
2. New DTOs. The main issue is not DTOs but their completions and
async errors. This is why Immediate data is better handled by
incorporating it into the DAT spec, while atomics can be handled by
extensions. That is, the completion will return an extension and the
Consumer will do a secondary switch on the extension type.
Extensions should not impact backwards compatibility.
We have not looked at errors. But assume a simple model in which async
errors break the connection and we can return an extension error, with
extensions defining a new reason. Again, details need to be polished.
3. New connection types or CM models... New connections seem to have
little impact on the existing API, assuming that the EP type can be
extended. The new connection can even restrict which DTOs it can handle.
The CM model is more problematic.
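
The "secondary switch on the extension type" in category 2 could be sketched like this. The `DtoCompletion` record and the string tags are hypothetical names chosen for illustration; they are not DAT definitions, and the atomic operations are only the two examples from the extension proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DtoCompletion:
    """Hypothetical completion record; names are illustrative, not DAT definitions."""
    op: str                          # "SEND", "RECV", "RDMA_WRITE" or "EXTENSION"
    ext_type: Optional[str] = None   # set only when op == "EXTENSION"
    ext_data: Optional[int] = None   # extension-specific result


def handle_completion(comp: DtoCompletion) -> str:
    # Primary switch: operations defined by the base spec.
    if comp.op != "EXTENSION":
        return f"standard:{comp.op}"
    # Secondary switch: provider extensions this Consumer knows about.
    if comp.ext_type == "CMP_AND_SWAP":
        return f"cmp_and_swap returned {comp.ext_data}"
    if comp.ext_type == "FETCH_AND_ADD":
        return f"fetch_and_add returned {comp.ext_data}"
    # Unknown extensions are reported, not treated as fatal, so a Consumer
    # written against the base spec keeps working.
    return "unknown extension"
```

A Consumer that never posts extension operations only ever takes the first branch, which is one way to read the backwards compatibility requirement.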

Arlin, it would be nice to consider some of your other extensions that
are not transport specific to see how they will fit before we make the
final decision.
This should give us an idea of how extensible the DAT extension model is.


 
   
  In general, extension route was intended for RNIC|HCA providers to 
  expose HW capabilities beyond IBTA, iWARP and VIA standards. The 
  standard RDMA functionality is best handle via spec addition.
  DAT 2.0 does it for FMR, remote and local memory 
 invalidation as well 
  as others.
 
 True, but the extension route is not fully defined, 
 documented, nor implemented. This is what I would like to 
 work on getting completed in time for 2.0 if possible. 
 
 BTW: The existing implementation actually uses 
 dapl_provider->extension to store the hca_ptr but the 
 specification states that it is reserved for the provider's 
 private use (8.2.1 in DAPL1.2 spec). This is why I had to 
 define another extension_func in the patch.
 
   
  I had posted a complete list of changes/addition to DAT 2.0 about a 
  month ago.
  But we had not discussed yet version change from 1.3 to 2.0 nor how 
  much backwards compatibility spec will provide.
   
  2. What is IMMED_EVENT

[openib-general] FW: [swg] 12/6 meeting minutes (2nd half)

2005-12-07 Thread Kanevsky, Arkady

The SWG has approved the IP address proposal (v5).

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Mike Ko [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, December 06, 2005 6:41 PM
 To: [EMAIL PROTECTED]
 Subject: [swg] 12/6 meeting minutes (2nd half)
 
 We had a brief discussion on the revised slide deck from 
 Arkady on the RDMA-Aware SID and CM REQ Message Extension and 
 there were no disagreements on the direction.
 
 Arkady Kanevsky from NetApp made the following motion:
 Create a new Annex for RDMA aware ULPs that includes:
 a. port mapping between IETF protocols ports and IB SIDs
 b. CM REQ message private data format extensions
 c. CM usage for RDMA aware ULPs
 
 Ted Kim from Sun seconded the motion.
 
 Vote count:
 Against: 0 
 Abstain: 0
 
 Motion passed.
 
 We continued with a discussion on the slide deck from Mike Ko 
 on supporting iSER on InfiniBand.  There were disagreements 
 on the merits on the need for Connection Preference bits.  We 
 decided to move forward with the rest of the suggestions from 
 Mike and postpone the decision on the CP bits until the next meeting.
 
 Mike Ko from IBM made the following motion:
 Create a new annex to support iSER on InfiniBand release 1.1 
 and 1.2 as represented in Mike Ko's slide deck dated December 
 1 but not including the support for Connection Preference 
 bits, and also making ARI a must requirement for CM REJ.
 
 Yaron Haviv from Voltaire seconded the motion.
 
 Vote count:
 Against: 0
 Abstain: 0
 
 Motion passed.
 
 The meeting was adjourned after the vote.
 
 Mike
 

RE: [swg] RE: [openib-general] socket based connectionmodel for IBproposal -round 4

2005-12-01 Thread Kanevsky, Arkady

agreed.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 30, 2005 12:59 PM
 To: Yaron Haviv
 Cc: Kanevsky, Arkady; Ted H. Kim; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: Re: [swg] RE: [openib-general] socket based 
 connectionmodel for IBproposal -round 4
 
 Yaron Haviv wrote:
  How about using ARP to get from IP to DGID+Partition, followed by an
  SIDR to map DGID+PKey+Service to QKey & QP?
  
  It is the same concept as CMA, which first uses the IP stack (ARP etc.)
  to get to the remote end-point (in that case the GID+PKey combination),
  followed by SA-PR and CM REQ; we just substitute the CM REQ with a
  SIDR REQ. It may not solve all the cases but probably most of the
  practical ones.
 
 This was my thought as well.
 
  Anyway the packets will need to carry some header (since it's not a
  connected model), you can add more stuff in that header (e.g. can use
  IPoIB header as is which contains already the src/dst IP)
 
 I was assuming that each packet would need to carry some sort 
 of header.
 
 At this point, we may want to defer defining anything for UDP 
 until there's a better understanding of what an application 
 would want.  My guess is that such an application will need 
 new APIs for posting sends based on UDP addressing.
 
 - Sean
 

[openib-general] socket based connection model for IB - round 5

2005-12-01 Thread Kanevsky, Arkady




Here is the fifth and I hope the final version of the proposal.

The changes from the previous version:
1. IBTA bit numbering scheme (reverse order)
2. Protocol version is split into major and minor with 4 bits each.
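
As an illustration of change 2, a version field split into two 4-bit halves fits in a single byte. The sketch below assumes the major number sits in the high nibble; the proposal PDF, not this example, defines the actual bit positions.

```python
def pack_version(major: int, minor: int) -> int:
    """Pack two 4-bit version numbers into one byte (major in the high nibble, assumed)."""
    if not (0 <= major <= 0xF and 0 <= minor <= 0xF):
        raise ValueError("major and minor must each fit in 4 bits")
    return (major << 4) | minor


def unpack_version(byte: int) -> tuple:
    """Split a version byte back into (major, minor)."""
    return (byte >> 4) & 0xF, byte & 0xF
```

For example, `pack_version(1, 2)` gives `0x12`, and `unpack_version(0x12)` gives `(1, 2)`.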

Arkady






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300



IP Address Support by InfiniBand CM_v5.pdf
Description: IP Address Support by InfiniBand CM_v5.pdf

RE: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4

2005-11-29 Thread Kanevsky, Arkady

Sean,
The SWG discussed today the proposal to extend the private data format to
SIDR_REQ.
The group does not see the need for it since the ULP is not RDMA aware.
That is, the ULP does not use RDMA operations.
Do you have some specific ULP in mind for this functionality?
For UDP a different IP address can be used for each message. There is no
persistent connection.

Arkady


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 23, 2005 3:41 PM
 To: Ted H. Kim
 Cc: Kanevsky, Arkady; [EMAIL PROTECTED]; openib-general@openib.org
 Subject: Re: [swg] RE: [openib-general] socket based 
 connectionmodel for IB proposal -round 4
 
 Ted H. Kim wrote:
  I know we originally set out to compress everything down to the 
  minimum to preserve as much ULP specific private data as possible. But
  it seems to me in the current proposal we have reserved space now 
  which could be used to re-expand the version to major 4-bits and 
  minor-4 bits without harming anything else.
 
 I don't see any benefit to having 2 4-bit version numbers 
 over a single 8-bit number.  A single 4-bit version number 
 should suffice.  If all version numbers are ever consumed, 
 then version 15 can define an extended version field.  IMO, 
 multiple version fields simply complicate the implementation.
 
 I would rather see the reserved space used to define the size 
 of carried user-private data.
 
 - Sean
 

RE: [openib-general] socket based connection model for IB proposal -round 4

2005-11-23 Thread Kanevsky, Arkady

Yes.
The private data format is not RC or UC specific.
I will add this comment that format covers both EE and C.

Is this sufficient?
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, November 17, 2005 12:40 PM
 To: Kanevsky, Arkady; [EMAIL PROTECTED]; 
 openib-general@openib.org; [EMAIL PROTECTED]
 Subject: RE: [openib-general] socket based connection model 
 for IB proposal -round 4
 
 
 If the proposal will include UDP, should the definition 
 extend beyond connections to include UD QPs as well (i.e. SIDR REQ)?
 
 - Sean
 

RE: [swg] RE: [openib-general] socket based connectionmodel for IB proposal -round 4

2005-11-23 Thread Kanevsky, Arkady

This is fine with me.
I will update the proposal with this for next version.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Ted H. Kim [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 23, 2005 3:29 PM
 To: Kanevsky, Arkady
 Cc: [EMAIL PROTECTED]; openib-general@openib.org; 
 [EMAIL PROTECTED]
 Subject: Re: [swg] RE: [openib-general] socket based 
 connectionmodel for IB proposal -round 4
 
 Arkady,
 
 I know we originally set out to compress everything down to 
 the minimum to preserve as much ULP specific private data as 
 possible. But it seems to me in the current proposal we have 
 reserved space now which could be used to re-expand the 
 version to major 4-bits and minor-4 bits without harming 
 anything else.
 
 Can we entertain that as an option?
 My rationale is to err on the side of perhaps a little too 
 much version room than too little. This will put it in line 
 with the precedent of SDP.
 
 -ted
 
 
 
 Kanevsky, Arkady wrote:
  pdf version of the proposal.
   
  
  Arkady Kanevsky   email: [EMAIL PROTECTED]
  Network Appliance Inc.   phone: 781-768-5395
  275 Totten Pond Rd.  Fax: 781-895-1195
  Waltham, MA 02451-2010  central phone: 781-768-5300
  
   
  
  
  *From:* Kanevsky, Arkady
  *Sent:* Wednesday, November 16, 2005 11:59 AM
  *To:* [EMAIL PROTECTED]; openib-general@openib.org;
  [EMAIL PROTECTED]
  *Subject:* [openib-general] socket based connectionmodel for IB
  proposal -round 4
  
  This version incorporates the feedback on 3 reflectors and
  yesterday's SWG meeting.
   
  Major changes from the previous version are:
  no REQ bit to identify private data formatting - SID range used instead
  port mapping uses IBTA space and IETF protocol # is encoded in SID
  protocol version is 4 bits.
   
  Arkady
   
  
  Arkady Kanevsky   email: [EMAIL PROTECTED]
  Network Appliance Inc.   phone: 781-768-5395
  275 Totten Pond Rd.  Fax: 781-895-1195
  Waltham, MA 02451-2010  central phone: 781-768-5300
  
   
 
 --
 Ted H. Kim
 Sun Microsystems, Inc.  [EMAIL PROTECTED]
 222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
 El Segundo, CA  90245   (310) 341-1120 FAX
 

RE: [openib-general] socket based connectionmodel for IB proposal -round 4

2005-11-16 Thread Kanevsky, Arkady




pdf version of the proposal.






Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300


  
  
  From: Kanevsky, Arkady
  Sent: Wednesday, November 16, 2005 11:59 AM
  To: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
  Subject: [openib-general] socket based connectionmodel for IB proposal -round 4
  
  This version incorporates the feedback on 3 reflectors and yesterday's
  SWG meeting.
  
  Major changes from the previous version are:
  no REQ bit to identify private data formatting - SID range used instead
  port mapping uses IBTA space and IETF protocol # is encoded in SID
  protocol version is 4 bits.
  
  Arkady
  
  

  

  
  Arkady Kanevsky   email: [EMAIL PROTECTED]
  Network Appliance Inc.   phone: 781-768-5395
  275 Totten Pond Rd.  Fax: 781-895-1195
  Waltham, MA 02451-2010  central phone: 781-768-5300
  


IP Address Support by InfiniBand CM_v4.pdf
Description: IP Address Support by InfiniBand CM_v4.pdf

RE: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3

2005-11-15 Thread Kanevsky, Arkady

The goal of this proposal is to provide an underpinning for a common RDMA
transport CM.
Thus, the API that ULPs (both user space and kernel space) use is based on
socket addressing.

For ULP addressing this means a 5-tuple: protocol, src IP addr, src port,
dst IP addr, and dst port. A port is a 16-bit entity.

The proposal just provides a mechanism for exchanging this 5-tuple
between the two sides.

Which entity is responsible for using the proposed protocol is an
interesting question.
I was assuming that this would be the CM. After all, the proposed protocol
is a CM extension protocol.
But it can be another entity, a module between the CM and the ULP.
Its job would be taking the 5-tuple, populating the private data, and
converting the dst port to a SID. Since OpenIB addr.c already deals with IP
to IB address translations, it is a logical candidate for it. On the remote
side it extracts the info from the private data, populates the socket info
for the Consumer, and passes the Consumer a pointer to the Consumer private
data.
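
The job of such a module can be sketched as below. This is a minimal illustration under loudly stated assumptions: the SID prefix value, the position of the protocol number and port inside the SID, and the 35-byte private data header are all hypothetical stand-ins, since the actual layout is defined in the proposal PDF and not in this message.

```python
import ipaddress
import struct
from dataclasses import dataclass

SID_PREFIX = 0x0000000001 << 24   # assumed marker for the protocol-defined SID range


@dataclass
class FiveTuple:
    protocol: int    # IETF protocol number, e.g. 6 for TCP
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def to_sid(t: FiveTuple) -> int:
    """Fold the protocol number and 16-bit destination port into a 64-bit SID."""
    return SID_PREFIX | (t.protocol << 16) | t.dst_port


def to_private_data(t: FiveTuple, consumer_data: bytes = b"") -> bytes:
    """Prefix the Consumer's private data with source port and both IP addresses."""
    src = ipaddress.ip_address(t.src_ip)
    dst = ipaddress.ip_address(t.dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP version")
    # Addresses are right-aligned in 16-byte fields so IPv4 and IPv6 share one layout.
    header = struct.pack("!BH16s16s", src.version, t.src_port,
                         src.packed.rjust(16, b"\0"), dst.packed.rjust(16, b"\0"))
    return header + consumer_data


def from_private_data(data: bytes):
    """Remote side: recover (ip_version, src_port, src_ip, dst_ip, consumer_data)."""
    ver, src_port, src, dst = struct.unpack("!BH16s16s", data[:35])
    n = 4 if ver == 4 else 16
    return (ver, src_port, str(ipaddress.ip_address(src[-n:])),
            str(ipaddress.ip_address(dst[-n:])), data[35:])
```

The ULP only ever sees the 5-tuple and its own private data; the header is added on the active side and stripped on the passive side by the same module.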

Another interesting place to deal with is the listening point.
Since it is a common RDMA API, a 16-bit port should be used for it also.
This means that the same module should locally convert the port to an IB
SID before passing it to the CM.
The CM just ensures that an incoming connection request matches the
listening SID.

While it is possible to do wildcarding on the whole SID, I have not seen
it used selectively on individual bits of a SID or a port.

While SDP does the conversion to an IB SID from an Ethernet port, this
proposal shifts the responsibility for port and IP address conversion from
the ULP down.

Now let's look at each field proposed to be moved from protocol private
data to the SID.

Protocol version. This means that if the protocol version is bumped up in
the future, we will have to change the SID on which the Consumer listens
and to which requests are sent. I am not sure how to do that without
changing the ULP. It does not look like a good idea.

IP version. This can be incorporated into the SID. But if an HCA has
multiple IP addresses assigned to it, the listening point needs to specify
its IP address(es).
The current verbs and/or API will have to be changed to support it.
But if a socket is passed to listen on, it does have all the needed info.
Looks fine.

Ethernet Protocol. The same as the one above.

Src port. Very questionable. For that, the listening SID must have a
wildcard for the portion of the SID where the SRC port is incorporated.
Since the ULP is not aware of it and never sees it, it is possible. But
this pushes the definition of the SID beyond its current IBTA spec
statement of being similar to a TCP port number. The query of a listen
point should also hide the wildcarded SID in this case.

DAPL APIs (uDAPL and kDAPL) do not expose the local IP address for a listen
point.
An additional API can be added to support passing a local socket to listen
on instead of a Connection Qualifier. Since it is an addition, there are no
backwards compatibility issues.
The current ULPs/Apps will still use the default API address and the
protocol-assigned SID as the connection qualifier.

The new API ensures that the SID conversion takes place locally.
The use of a protocol-defined range of SIDs ensures that the remote side
knows to parse the private data according to the proposed protocol format.

Arkady


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Friday, November 11, 2005 12:43 PM
 To: Kanevsky, Arkady
 Cc: Sean Hefty; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; openib-general@openib.org
 Subject: Re: [openib-general] RE: [dat-discussions] socket 
 based connectionmodel for IB proposal - round 3
 
 Kanevsky, Arkady wrote:
  So what you are proposing is that Listener will specify 
 IETF port (2 
  bytes).
  CM will generate an IB SID to listen on. That SID will have 
  wildcarding for 24 bits.
  The requestor will specify: version, IP version, SRC port 
 and DST port.
  Based on that CM will generate the SID to send request to.
 
 No, the listener or requester generate the SID, not the IB CM 
 - the same way SDP works today.
 
  It will also encode IP addresses into Private data based on 
 IP version.
  
  This makes IP addresses, SIDs and private data format 
 interdependent 
  and not orthogonal which it is now.
  It also changes the meaning of SID which currently has a meaning of 
  TCP port.
 
 I'm not proposing this.  I'm merely stating that it is a 
 valid option to consider.  The private data format and SIDs 
 are not orthogonal anyway.  The port number's embedded in the 
 SID, and the SID indicates the format of the private data.  
 They are interdependent by definition.
 
 If it's okay to put the destination port number in the SID, 
 why not the protocol type, or IP version?
 
  It also does not allow to use the private data formating 
 for other SIDs.
 
 Private data is private.  It should not be owned, set, 
 interpreted, modified, or touched

RE: [openib-general] RE: [dat-discussions] socket based connectionmodel for IB proposal - round 3

2005-11-11 Thread Kanevsky, Arkady

So what you are proposing is that the Listener will specify an IETF port
(2 bytes).
The CM will generate an IB SID to listen on. That SID will have
wildcarding for 24 bits.
The requestor will specify: version, IP version, SRC port and DST port.
Based on that, the CM will generate the SID to send the request to.
It will also encode IP addresses into private data based on IP version.
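
A minimal sketch of the wildcarded listen described above, assuming the
64-bit SID layout floated in this thread (24-bit set ID, 4-bit version,
4-bit IP version, 16-bit source port, 16-bit destination port). The set
ID value and the function names are hypothetical and do not come from any
spec or implementation; the sketch only shows that the listener ignores
the 24 bits it cannot know in advance.

```python
# Hypothetical layout, most significant bits first:
#   set ID: 24 | version: 4 | IP version: 4 | src port: 16 | dst port: 16
SET_ID = 0x123456  # placeholder value, not an assigned set ID

# Bits the listener cannot know in advance: version, IP version, src port.
WILDCARD_MASK = 0xFFFFFF << 16                      # 4 + 4 + 16 = 24 bits
COMPARE_MASK = ~WILDCARD_MASK & 0xFFFFFFFFFFFFFFFF  # set ID and dst port only


def listen_sid(dst_port: int) -> int:
    """SID a listener registers; the wildcarded fields are left as zero."""
    return (SET_ID << 40) | (dst_port & 0xFFFF)


def sid_matches(listen: int, request: int) -> bool:
    """True if a request SID hits the listener once wildcarded bits are ignored."""
    return (listen & COMPARE_MASK) == (request & COMPARE_MASK)
```

With this layout a request carrying any version, IP version, or source
port still reaches the listener, as long as the set ID and destination
port agree.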

This makes IP addresses, SIDs and the private data format
interdependent rather than orthogonal, as they are now.
It also changes the meaning of the SID, which currently has the meaning
of a TCP port.

It also does not allow the private data formatting to be used for other
SIDs.

It looks like a big hack. Is it worth it for an extra 4 bytes of private
data for Consumers?

Arkady


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, November 10, 2005 6:53 PM
 To: Kanevsky, Arkady; Sean Hefty
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: RE: [openib-general] RE: [dat-discussions] socket 
 based connectionmodel for IB proposal - round 3
 
  If you want to maximize consumer usable private data, then you can 
  move the version, IP version, protocol, source and 
 destination ports 
  into the service ID.
 
 Not at the expense of redefining what Service ID is.
 How do you propose to move all these fields into Service ID without 
 violating IBTA spec Annex A3.2.? Remember Service ID is what the 
 responder advertises and what the requestor sends communication 
 requests to. It may be possible for a server to advertise multiple 
 service IDs to cover version and IP version variations, but it will 
 not be symmetrical to iWARP. Port is port (service ID) and address is 
 address. Port does not encode IP version.
 
 The service ID could be formatted as:
 
 Set ID:   24
 Version:   4  
 IP version:4
 Src port: 16
 Dst port: 16
 
 I don't see how this violates the spec.  Beyond the set ID, 
 the rest is defined as any.  It's not necessary, but it 
 does save 4 bytes of private data for the user.
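
A rough sketch of how the five fields listed above could be packed into,
and recovered from, one 64-bit service ID. The field order (set ID in the
most significant bits, destination port in the least) and all names are
assumptions made for illustration; the message only gives the field
widths.

```python
from typing import NamedTuple


class SidFields(NamedTuple):
    set_id: int      # 24 bits
    version: int     # 4 bits
    ip_version: int  # 4 bits
    src_port: int    # 16 bits
    dst_port: int    # 16 bits


def pack_sid(f: SidFields) -> int:
    """Pack the fields into a 64-bit integer, set ID first (assumed order)."""
    limits = (1 << 24, 1 << 4, 1 << 4, 1 << 16, 1 << 16)
    if any(not 0 <= value < limit for value, limit in zip(f, limits)):
        raise ValueError("field out of range")
    return ((f.set_id << 40) | (f.version << 36) | (f.ip_version << 32)
            | (f.src_port << 16) | f.dst_port)


def unpack_sid(sid: int) -> SidFields:
    """Inverse of pack_sid."""
    return SidFields(
        set_id=(sid >> 40) & 0xFFFFFF,
        version=(sid >> 36) & 0xF,
        ip_version=(sid >> 32) & 0xF,
        src_port=(sid >> 16) & 0xFFFF,
        dst_port=sid & 0xFFFF,
    )
```

The widths add up to 24 + 4 + 4 + 16 + 16 = 64 bits, so the whole tuple
fits in the service ID with nothing left over.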
 
  Separately, if there's any defined mapping to a service ID 
 or set of 
  service IDs, then the service ID indicates the format of 
 the private 
  data.  No additional information is needed in the CM REQ, such as 
  using a reserve bit.
 
 That is a good point.
 But this restricts the usage of IP addressing only to these ports.
 
 It doesn't restrict the usage at all.  It defines a portion 
 of the private data for a specific range of service IDs, the 
 same way it is done for SDP.  There's no restriction that 
 other service IDs not use the same format.
 
 Even with the proposal to use a reserved bit in the CM, a 
 particular service could format its private data this way, 
 not set the bit, and still be spec compliant.
 
 The question is what is easier to check 1 bit or Service ID.
 Of course, service ID will have to be checked anyhow to direct the 
 request.
 
 Exactly.  If the service ID is checked anyway, why set the bit?
 
 While this overloads the semantic meaning of Service ID it 
 is a viable 
 method.
 
 How is this not viable?  There's a _working_ implementation 
 today for both userspace and kernel mode clients to connect 
 using IP addressing that didn't require any modifications to 
 the IB CM.
 
  To be clear, the CM REQ _carries_ the IP address.  There 
 should be no 
  requirement that the CM performs the mapping, and I see no 
 reason why 
  it should even care.
 
 Can you elaborate on this? Does this address who populates 
 the formatted 
 portion of the private data?
 
 I'm referring to who formats the private data and performs 
 the mapping to the service IDs (slide 13)
 
 - Sean
 
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[openib-general] RE: [dat-discussions] socket based connection model for IB proposal - round 3

2005-11-10 Thread Kanevsky, Arkady




Fixed the bit value for the formatting indicator.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300

Attachment: IP Address Support by InfiniBand CM_v3.pdf

[openib-general] ping over IPoIB does not work between 2 cards on the same host

2005-10-27 Thread Kanevsky, Arkady

I have a host with 2 HCAs (dual port each, but I only connected one port
per 
machine) connected to a switch.

With IPoIB configured, pinging each card's own IP address works.
I can ping other machines that have their HCA cards configured with
IPoIB fine.
And I can ping both local IP addresses from remote machine(s).

Details:

ifconfig ib1 192.168.0.1 netmask 255.255.0.0
ifconfig ib3 192.168.0.3 netmask 255.255.0.0

On remote machine:

ifconfig ib0 192.168.1.0 netmask 255.255.0.0

Locally:

# ping -I ib3 192.168.0.3
PING 192.168.0.3 (192.168.97.3) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=0 ttl=64 time=0.028 ms

# ping -I ib1 192.168.0.1
PING 192.168.0.1 (192.168.97.1) from 192.168.0.1 ib1: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.028 ms

# ping -I ib3 192.168.1.0
PING 192.168.1.0 (192.168.1.0) from 192.168.0.3 ib3: 56(84) bytes of data.
64 bytes from 192.168.1.0: icmp_seq=0 ttl=64 time=1.81 ms

From remote host:

# ping -I ib0 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

# ping -I ib0 192.168.0.3
PING 192.168.0.3 (192.168.0.3) from 192.168.1.0 ib0: 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=0 ttl=64 time=0.086 ms

Locally between 2 cards:

# ping -I ib3 192.168.0.1
PING 192.168.0.1 (192.168.0.1) from 192.168.0.3 ib3: 56(84) bytes of data.
From 192.168.0.3 icmp_seq=1 Destination Host Unreachable
From 192.168.0.3 icmp_seq=2 Destination Host Unreachable
From 192.168.0.3 icmp_seq=3 Destination Host Unreachable

Arkady

 

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
275 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300

RE: [swg] RE: [openib-general] RE: [dat-discussions] round 2 - proposal forsocket based connection model

2005-10-26 Thread Kanevsky, Arkady

Of course, you can encode versions into the Service ID.
But that will mix concepts.
And I do not believe it is worth it to provide a couple more
bytes of Consumer private data.
This encoding will not be enough to give the Consumer 64 bytes of private
data.

The port numbers are mapped differently for different protocol numbers
(families).
If we are only concerned with TCP port mapping, this will not be needed.
But the ULP right now makes its decision by the standard socket 5-tuple,
which does include it.
I prefer that we do not require any changes in the ULP to run over IB.
We can do that in the API if there is no need to support more than just
TCP. In this case the API can always return the protocol number for TCP
to a Consumer.

One concern I have is that some existing ULPs (say SDP)
rely on the existing format of the private data.
Thus, they would not want to use this CM encoding.
I do not want to force them to change.
Thus, a bit in CM which indicates whether the encoding is present
looks like the right approach.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Yaron Haviv [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 26, 2005 12:21 PM
 To: Kanevsky, Arkady; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org; 
 [EMAIL PROTECTED]
 Subject: [swg] RE: [openib-general] RE: [dat-discussions] 
 round 2 - proposal forsocket based connection model
 
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:openib-general- 
  [EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
  Sent: Tuesday, October 25, 2005 1:26 PM
  To: Sean Hefty
  Cc: [EMAIL PROTECTED]; openib-general@openib.org; dat- 
  [EMAIL PROTECTED]
  Subject: RE: [openib-general] RE: [dat-discussions] round 2 
 - proposal 
  forsocket based connection model
  
  Think of a single API that supports iWARP and IB (transport
 independent
  API).
  To a connection listener it provides the IP 5-tuple + private data. 
  For IB it means that CM parses REQ and extracts IP 5-tuple 
 as separate 
  fields from private data. Listener does not parse the private data 
  encoding of the proposal.
  
  So CM need to know if it need to encode IP 5-tuple on 
 requestor side 
  and if need to parse on responder side. Arkady
  
 
 Arkady, I agree with Sean: you can encode the Dest Port in the 
 ServiceID.
 And if you really want to verify it's using that format, you can look at
 the upper 48 bits in the ServiceID.
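
To illustrate the check suggested above: a sketch assuming a 64-bit
ServiceID whose upper 48 bits hold a reserved prefix and whose lower 16
bits hold the destination port. The prefix value and both function names
are hypothetical placeholders, not assigned values or an existing API.

```python
# Hypothetical 48-bit prefix marking "IP addressing format" service IDs.
IP_SID_PREFIX = 0x0000AABBCCDD


def uses_ip_format(service_id: int) -> bool:
    """True if the upper 48 bits carry the reserved prefix."""
    return (service_id >> 16) == IP_SID_PREFIX


def dest_port(service_id: int) -> int:
    """Destination port carried in the lower 16 bits."""
    return service_id & 0xFFFF
```

A responder would run the prefix check before interpreting the private
data, since the ServiceID has to be examined anyway to route the request.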
 
 We may need to distinguish between Explicit RDMA protocols (iSER,
 NFS-RDMA, RDP, etc.) and Implicit RDMA (SDP, where the Socket
 application doesn't know it is using RDMA). This can be done 
 in 3 ways:
 a. port mapper, b. different ServiceID prefix, or c. a bit in 
 the CM REQ Header.
 
 Also, I'm not sure why we need the Protocol (UDP, TCP, SCTP, ..). 
 Since we emulate RDMA, we shouldn't care if it's TCP or SCTP, and UDP is
 unconnected and can't drive RDMA anyway.
 
 Yaron
 
 
  
  Arkady Kanevsky   email: [EMAIL PROTECTED]
  Network Appliance phone: 781-768-5395
  375 Totten Pond Rd.  Fax: 781-895-1195
  Waltham, MA 02451-2010  central phone: 781-768-5300
  
  
  
   -Original Message-
   From: Sean Hefty [mailto:[EMAIL PROTECTED]
   Sent: Tuesday, October 25, 2005 1:08 PM
   To: Kanevsky, Arkady
   Cc: Caitlin Bestler; [EMAIL PROTECTED];
   openib-general@openib.org; [EMAIL PROTECTED]
   Subject: Re: [openib-general] RE: [dat-discussions] round 2 -
   proposal for socket based connection model
  
  
   Kanevsky, Arkady wrote:
    Correct.
    But this does bring up the question of how the responder CM knows
    that it needs to parse the private data. I suspect this will be done
    via a new version of CM. But use of some of the CM REQ reserved
    fields is also possible. In other words, the current CM version
    assumes that CM only supports one version and there is no need to
    support more than 1 version.
  
   The responder knows how to parse the private data based on
   the service ID that
   they're listening on.  This is how it's done today, and how
   it will still need
   to be done.  What is the motivation to change it?
  
   What data is beyond the addressing?  How does the responder
   know how to
   interpret that?
  
   - Sean
  

RE: [openib-general] round 2 - proposal for socketbased connectionmodel

2005-10-26 Thread Kanevsky, Arkady

This is the whole purpose of the protocol.
It is OS independent and ensures interoperability.
Nobody will change their OS protocol implementation
so that it can communicate with Linux (or any other OS or vendor)
that invented its own protocol...
It is not an OS's job (Linux is no exception) to invent protocols.

But I think this argument has been beaten enough already.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Woodruff, Robert J [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 26, 2005 11:33 AM
 To: Kanevsky, Arkady; Sean Hefty
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [openib-general] round 2 - proposal for 
 socketbased connectionmodel
 
 
 Arkady wrote, 
 This is what we are trying to avoid.
 ULP should not change regardless whether or not it is running on IB, 
 iWARP, VIA or any other RDMA transport.
 
 The whole point of the CMA is that the ULP can code to an
 API that is independent of RDMA interconnect. The
 CMA wire protocol can be documented to allow
 non-Linux hosts to connect to a Linux box using 
 the same protocol. There is no need to change the existing
 IB CM protocol to accomplish this. All that is needed is
 to document that CMA protocol (contained in the private data 
 field of the IB CM requests).
 
 woody
 
 
 

RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

Caitlin,
how does it change the proposed protocol?
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300

  
  -Original Message-
  From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, October 25, 2005 12:36 PM
  To: [EMAIL PROTECTED]; openib-general@openib.org; [EMAIL PROTECTED]
  Subject: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

  On an IP network, a non-privileged user is generally not capable of
  forging a source IP address and is typically prevented from using
  certain source ports.

  I would propose that the CM [MAY|SHOULD|MUST] enforce that a
  non-privileged user can only use a Source IP Address and Port that
  they would have been able to use following the normal stack path (or
  what it would have been in the case that there is no conventional IP
  stack associated with this path).

  So if IPoIB is installed, you would not be able to use any address
  that you would have been blocked from using over IPoIB. Or at least
  you would not be guaranteed that you could.

  I think that MUST is the correct level of enforcement, but it needs to
  be clear that the CM and OS *MAY* do this checking and that a
  userspace IB application cannot use the IB stack to perform IP
  spoofing.
  


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kanevsky, Arkady
Sent: Tuesday, October 25, 2005 9:00 AM
To: openib-general@openib.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [dat-discussions] round 2 - proposal for socket based connection model

Dear OpenIB, SWG and DAT members,
enclosed is the second version of the proposal.
There are really 2 proposals that are related.
The first one is encoding the IP 5-tuple into REQ private data
with small additional info for versioning and IB capabilities.
The second is just a couple of ideas, not a real proposal,
on mapping of IP ports to IB Service IDs.

Thanks everybody for tons of feedback and deep discussions.
I apologize if I have missed something.

Happy reading,
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300





RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

Correct.
But this does bring up the question of how the responder CM knows that
it needs to parse the private data. I suspect this will be done via a
new version of CM.
But use of some of the CM REQ reserved fields is also possible.
In other words, the current CM version assumes that CM only supports
one version and there is no need to support more than 1 version.

This proposal may change this assumption.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 12:56 PM
 To: Caitlin Bestler
 Cc: Kanevsky, Arkady; [EMAIL PROTECTED]; 
 openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] RE: [dat-discussions] round 2 - 
 proposal for socket based connection model
 
 
 Caitlin Bestler wrote:
  I believe it requires a CM protocol version change, or an IP Address
  Header present bit.
   
  Basically, userspace consumers can supply *any* 72 bytes of private 
  data currently.
  To maintain backwards compatibility you need an authenticator that 
  says this IP header data vouched for by privileged components on this 
  end, and that authenticator cannot be within the private data.
 
 I believe that the solution is keep the CM protocol as is.  
 The CM private data 
 should be completely controlled by the service.  The IB CM 
 does not care if an 
 IP address is in the private data or not.
 
 My reading of the proposal is that it defines a private data 
 format that a 
 particular service may or may not use.
 
 - Sean
 

RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

Think of a single API that supports iWARP and IB (a transport
independent API).
To a connection listener it provides the IP 5-tuple + private data.
For IB it means that the CM parses the REQ and extracts the IP 5-tuple
as separate fields from private data.
The Listener does not parse the private data encoding of the proposal.

So the CM needs to know whether to encode the IP 5-tuple on the
requestor side and whether to parse it on the responder side.
Arkady


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 1:08 PM
 To: Kanevsky, Arkady
 Cc: Caitlin Bestler; [EMAIL PROTECTED]; 
 openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] RE: [dat-discussions] round 2 - 
 proposal for socket based connection model
 
 
 Kanevsky, Arkady wrote:
  Correct.
  But this does bring up the question of how the responder CM knows 
  that it needs to parse the private data. I suspect this will be done 
  via a new version of CM. But use of some of the CM REQ reserved 
  fields is also possible. In other words, the current CM version 
  assumes that CM only supports one version and there is no need to 
  support more than 1 version.
 
 The responder knows how to parse the private data based on 
 the service ID that 
 they're listening on.  This is how it's done today, and how 
 it will still need 
 to be done.  What is the motivation to change it?
 
 What data is beyond the addressing?  How does the responder 
 know how to 
 interpret that?
 
 - Sean
 

RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

Sean,
The reason the IBTA is interested in addressing the IP address issue
is that multiple ULPs and APIs want to support a
socket based connection model. Sure, each one of them
can define its own protocol (for private data).
But this will not ensure interoperability.

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 1:34 PM
 To: Kanevsky, Arkady
 Cc: Caitlin Bestler; openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] RE: [dat-discussions] round 2 - 
 proposal for socket based connection model
 
 
 Kanevsky, Arkady wrote:
  Think of a single API that supports iWARP and IB (transport 
  independent API).
 
 The CMA implements this today and did not require any changes 
 to the IB CM.
 
  To a connection listener it provides the IP 5-tuple + private data. 
  For IB it means that CM parses REQ and extracts IP 5-tuple 
 as separate 
  fields from private data.
 
 Why push this down into the CM?  The CM should operate on IB 
 addresses, not IP 
 addresses.  The mapping of IP addresses to IB addresses is 
 done at a higher level.
 
  Listener does not parse the private data encoding of the proposal.
 
 The listener is the one who cares about the IP addressing.
 
 - Sean
 

RE: [openib-general] RE: [dat-discussions] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

It is APIs, not ULPs, that are the concern.
Each ULP can define its own protocol.
But APIs cannot.
But defining a protocol for each ULP is also bad.
This proposal defines it for all ULPs.
If a ULP uses the API, the API does the parsing.
If a ULP uses verbs, it can do the parsing and encoding itself.
But in the latter case it will have to have a different ULP
CM for each transport. Bad idea.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 1:52 PM
 To: Kanevsky, Arkady
 Cc: Caitlin Bestler; openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] RE: [dat-discussions] round 2 - 
 proposal for socket based connection model
 
 
 Kanevsky, Arkady wrote:
  Sean,
  The reason the IBTA is interested in addressing the IP address issue
  is that multiple ULPs and APIs want to support a
  socket based connection model. Sure, each one of them
  can define its own protocol (for private data).
  But this will not ensure interoperability.
 
 There's no interoperability between different ULPs anyway.  
 Each does define its 
 own protocol.  Trying to standardize part of the CM REQ 
 private data doesn't 
 help in this regard.
 
 - Sean
 

RE: [openib-general] round 2 - proposal for socket based connectionmodel

2005-10-25 Thread Kanevsky, Arkady

Sean,
answers in-line.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300


  
  -Original Message-
  From: Sean Hefty [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, October 25, 2005 1:05 PM
  To: Kanevsky, Arkady; openib-general@openib.org; [EMAIL PROTECTED]
  Subject: RE: [openib-general] round 2 - proposal for socket based connectionmodel

  Dear OpenIB, SWG and DAT members,
  enclosed is the second version of the proposal.
  There are really 2 proposals that are related.
  The first one is encoding the IP 5-tuple into REQ private data
  with small additional info for versioning and IB capabilities.
  The second is just a couple of ideas, not a real proposal,
  on mapping of IP ports to IB Service IDs.

  Comments on the private data format:

  Combine major/minor version into a single field. There's no advantage
  to having two fields, so keep it simple.
  [AK] Agree.

  Remove ZB and SI bits. These are unrelated to socket addressing.
  [AK] That is true, these are unrelated to socket addressing. But since
  several ULPs over IB need this info, it can be added to the generic CM
  extensions for IB. I will rename the proposal to deal with it.
  I prefer a single private data formatting proposal rather than several
  layered on top of each other.
  If the IBTA thinks this is generic enough and wants to redefine some
  reserved fields for it - good.
  This is captured in the discussion slides.

  If the destination port number is encoded in a service ID, then it can
  be removed from the private data.
  [AK] This is dependent on how port mapping to Service ID is done. But
  if SDP will incorporate this into its hello-world protocol, this may
  still be needed. With the 64-byte Consumer private data requirement
  relaxed, saving 2 bytes will not make much difference.

  The transport protocol number could also be encoded in the service ID
  and removed from the private data. Actually, the version, IP version,
  and source port could all be encoded in the service ID, limiting the
  private data to just 32 bytes of IP addresses.
  [AK] Encoding the IP version into the Service ID sounds strange. The
  Service ID is a port equivalent. Sure, it is much larger than IP
  ports, but why should CM extensions encode more than a port into it?
  Even with this, Consumer private data is still only 60 bytes (not the
  old 64-byte requirement).

  - Sean
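
The byte budget behind the 60-byte figure above can be worked through in
a few lines. The total of 92 bytes is inferred from the 32-byte and
60-byte numbers in this message and is treated here as an assumption;
the constant and function names are illustrative only.

```python
REQ_PRIVATE_DATA_BYTES = 92  # assumed total, inferred as 32 + 60
IP_ADDR_BYTES = 16           # room for an IPv6 address per endpoint


def consumer_bytes(header_bytes: int) -> int:
    """Private data left for the Consumer after the addressing header."""
    return REQ_PRIVATE_DATA_BYTES - header_bytes


# Source and destination address only, everything else moved to the SID.
addresses_only = 2 * IP_ADDR_BYTES
remaining = consumer_bytes(addresses_only)
shortfall = 64 - remaining  # bytes short of the old 64-byte requirement
```

Under these assumptions, even the most aggressive encoding into the
service ID leaves the Consumer 4 bytes short of the old requirement.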

RE: [openib-general] round 2 - proposal for socket based connectionmodel

2005-10-25 Thread Kanevsky, Arkady

What are you trying to achieve?

I am trying to define an IB REQ protocol extension that
supports IP connection 5-tuple exchange between the connection
requestor and responder,
and to define the mapping between the IP 5-tuple and IB entities.

That way a ULP which was written for TCP/IP, UDP/IP, SCTP/IP (and so on)
can use an RDMA transport without change.
Modifying a ULP to know that it runs on top of IB vs. iWARP
vs. (any other RDMA transport) is a bad idea.
It is one thing to choose the proper port to connect to.
It is a completely different thing to ask a ULP to parse private data
in a transport specific way.

The same protocol must support both user level ULPs
and kernel level ULPs.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 3:22 PM
 To: Kanevsky, Arkady
 Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] round 2 - proposal for socket 
 based connectionmodel
 
 
 Kanevsky, Arkady wrote:
  Sean,
  answers in-line.
  Arkady
 
 At this point, I'm just going to disagree with this approach 
 and move on with 
 the current implementation of the CMA.  What's needed is a 
 service that provides 
 IB connections using TCP/IP addressing.  I don't believe this 
 proposal meets 
 this goal.
 
 To meet the requirement of connecting over IB using TCP/IP 
 addressing, I believe 
 that we need a service with a reserved service identifier or range of 
 identifiers, a mechanism for mapping between IP and IB 
 addresses, and a 
 mechanism for reversing the mapping.
 
 I don't see where the proposal addresses the bulk of the work 
 that's required, 
 nor do I think that it will present an API to the user that 
 does not expose IB 
 related addressing (such as service IDs).
 
 - Sean
 

RE: [swg] Re: [openib-general] TCP/IP connection service over IB

2005-10-25 Thread Kanevsky, Arkady

DAPL also strips this private data header
and presents IP addresses and ports to the Consumer as items separate
from the Consumer private data.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Tom Tucker [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 5:52 PM
 To: Ted H. Kim
 Cc: [EMAIL PROTECTED]; openib-general
 Subject: Re: [swg] Re: [openib-general] TCP/IP connection 
 service over IB
 
 
 On Tue, 2005-10-25 at 13:16 -0700, Ted H. Kim wrote:
  Tom,
  
  Some comments inline ...
  
  
  Tom Tucker wrote:
   I think it's relevant, so let's make sure my assumptions are 
   correct:
   
   - The ITAPI will be a ULP on OpenIB
  
  ITAPI is like uDAPL, so if uDAPL is a ULP then the answer is yes. 
  The point is that for uDAPL you have the actual app running over 
  uDAPL. So I guess it's a matter of terminology whether 
 uDAPL is a ULP 
  or is it some sort of middleware with the app being the ULP.
  
 
 Yeah, you're right the terminology is probably a little 
 goofy. The reason for the goofosity is that some of the ulp 
 really are protocols (ISER, IPoIB), and some are API (DAPL, 
 MPI). All use the same interface 
 to register with OpenIB. 
 
 But that said, yes, ITAPI is like uDAPL.
 
  
   - The ITAPI will create the IRD/ORD headers in its 
 private data and 
   submit this as part of its connection establishment.
   - The ITAPI consumer at the remote peer will use this data to 
   configure its local QP before accepting the connection
   
   Over IB, the IRD/ORD private data will be prepended with 
 a private 
   data header that contains the source and destination IP 
 addresses, 
   source port, etc... The remote peer will not see this 
 data as part 
   of the private data, but rather will see it in the CMA 
 event in the 
   upcall.
  
  Over IB, the IRD/ORD data is already built in to the standard CM stuff 
  (i.e. the responder resources and initiator depth fields of REQ 
  and REP). So no additional demands are made on private data for IB in 
  ITAPI for the IOH purpose. Of course the ITAPI app (like a uDAPL app) 
  can also use private data for app specific/ULP reasons.
 
 ok -- bad example. Sorry. This is a weird one. On iWARP, you 
 need the private data header to pass this stuff along and on 
 IB, you don't. What I was trying to say is that whatever the 
 private data, on IB it will get a private data header 
 prepended and on iWARP, it won't.
 
  
  
   Over iWARP/MPA, there will be nothing else in the private data 
   except what was provided by the consumer (ITAPI in this case). The 
   reason being that this extra information (IP addressing info) is in 
   the protocol header proper.
  
  Just to restate for clarity, ITAPI for iWARP will use the first 16 
  bytes of MPA private data for the IOH (IRD/ORD header). The rest is 
  usable for app/ULP reasons.
 
 Yessir. And in fact, the ITAPI CM will strip this stuff 
 before presenting it to the app.
 
  
  
  I should point out that there was once a proposal of doing 
 a RDDP IETF 
  draft which would have sub-divided the MPA private data into a 
  middleware section and an app section. The idea was to be sure 
  that the app/ULP and middleware (e.g. the IOH) uses of private data 
  would not step on each other. I think this idea did not progress, 
  mostly because the author (John Carrier, formerly of 
 Adaptec) changed 
  jobs and was no longer working on iWARP stuff.
  
  While not directly proposed, this idea could have been 
 carried over to 
  IB. Some of the ideas on this thread are already implicitly 
 doing this 
  middleware (for IP addressing purpose) vs ULP/app split.
  
 
 I think we are grappling with a lot of these layering issues 
 now. We are also grappling with protocol vs. implementation issues.  
 
 Keep it coming, because this is exactly the kind of feedback 
 I think we need.
 
  -ted
  

RE: [openib-general] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

No.
iWARP does not have to pass this info.
The info is needed for IB because ZB and SI were introduced
in the IBTA 1.2 specs as optional functionality.
So if a ULP wants to use that functionality, it needs to find
out whether the remote side supports it.
This is needed for backwards compatibility.
For example, the iSER protocol defines the use of remote invalidate,
but obviously it cannot be used if the remote side does not support it.

I do not recall right now whether iWARP defined that functionality
as required or optional.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Tom Tucker [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 5:56 PM
 To: Kanevsky, Arkady
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [openib-general] round 2 - proposal for socket 
 basedconnection model
 
 
 Arkady:
 
 I may actually have a constructive comment about the protocol 
 (private data format). One thing I noticed is that *almost* 
 everything in the private data header is available in the 
 native iWARP protocol header except the ZB and SI bits.  If 
 these bits become part of the canonical private data header, 
 then does that require an iWARP transport to use the header 
 too even though only two bits are useful?
 
 Sorry if this is a dumb question,
 
 Tom
 
 On Tue, 2005-10-25 at 16:40 -0500, Tom Tucker wrote:
  Arkady:
  
  I don't think anyone disagrees with your goals. Unfortunately 
  additional requirements on the implementation were coupled with the 
  specification of the private data format (protocol). This 
 peripheral 
  discussion derailed any attempt to discuss the protocol.
  
  Attempts to separate the protocol discussion from the 
 implementation 
  failed. And so here we are...
  
  
  On Tue, 2005-10-25 at 15:38 -0400, Kanevsky, Arkady wrote:
   What are you trying to achieve?
   
   I am trying to define an IB REQ protocol extension that 
 support IP 
   connection 5-tuple exchange between connection requestor and 
   responder. And define mapping between IP 5-tuple and IB entities.
   
   That way ULP which was written to TCP/IP, UDP/IP, CSTP/IP (and so 
   on) can use RDMA transport without change. To modify ULP to know 
   that it runs on top of IB vs. iWARP vs. (any other RDMA 
 transport) 
   is bad idea. It is one thing to choose proper port to connect.
   Completely different to ask ULP to parse private data
   in transport specific way.
   
   The same protocol must support both user level ULPs
   and kernel level ULPs.
   Arkady
   
   Arkady Kanevsky   email: [EMAIL PROTECTED]
   Network Appliance phone: 781-768-5395
   375 Totten Pond Rd.  Fax: 781-895-1195
   Waltham, MA 02451-2010  central phone: 781-768-5300

   
   
-Original Message-
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 25, 2005 3:22 PM
To: Kanevsky, Arkady
Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED]
Subject: Re: [openib-general] round 2 - proposal for socket 
based connectionmodel


Kanevsky, Arkady wrote:
 Sean,
 answers in-line.
 Arkady

At this point, I'm just going to disagree with this approach
and move on with 
the current implementation of the CMA.  What's needed is a 
service that provides 
IB connections using TCP/IP addressing.  I don't believe this 
proposal meets 
this goal.

To meet the requirement of connecting over IB using TCP/IP
addressing, I believe 
that we need a service with a reserved service 
 identifier or range of 
identifiers, a mechanism for mapping between IP and IB 
addresses, and a 
mechanism for reversing the mapping.

I don't see where the proposal addresses the bulk of the work
that's required, 
nor do I think that it will present an API to the user that 
does not expose IB 
related addressing (such as service IDs).

- Sean


RE: [openib-general] round 2 - proposal for socket based connection model

2005-10-25 Thread Kanevsky, Arkady

Sean Hefty wrote:

 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 25, 2005 6:44 PM
 To: Kanevsky, Arkady
 Cc: Sean Hefty; openib-general@openib.org; [EMAIL PROTECTED]
 Subject: Re: [openib-general] round 2 - proposal for socket 
 based connectionmodel

 Kanevsky, Arkady wrote:
  What are you trying to achieve?

 I'm trying to define a connection *service* for Infiniband 
 that uses TCP/IP 
 addresses as its user interface.  That service will have its 
 own protocol, in 
 much the same way that SDP, SRP, etc. do today.

  I am trying to define an IB REQ protocol extension that support IP 
  connection 5-tuple exchange between connection requestor and 
  responder.

 Why?  What need is there for a protocol extension to the IB 
 CM?  To me, this is 
 similar to setting a bit in the CM REQ to indicate that the 
 private data format 
 looks like SDP's private data.  The format of the _private_ 
 data shouldn't be 
 known to the CM; that's why it's private data.

There is no requirement that the remote side uses the same Linux CM.
So in order to achieve interoperability you need a protocol.
The SDP hello-world protocol is defined for SDP.
We are defining an equivalent that is ULP independent.

If the CM is not involved, then it is the ULP that populates the 5-tuple
info on the requestor side and parses it on the remote side.
That makes the ULP's connection management IB specific.
This is what we are trying to avoid.
The ULP should not change regardless of whether it is running
on IB, iWARP, VIA or any other RDMA transport.

iWARP does not need private data to pass the 5-tuple.

  And define mapping between IP 5-tuple and IB entities.

 No mapping between IP - IB addresses was defined in the 
 proposal.  Defining 
 this mapping is required to make this work.  Right now, the 
 mapping is the 
 responsibility of every user.

  That way ULP which was written to TCP/IP, UDP/IP, CSTP/IP 
 (and so on) 
  can use RDMA transport without change.

 A ULP written to TCP/IP can use an RDMA transport without 
 change.  They use SDP. 
   However, an application that wants to take advantage of QP 
 semantics must 
 change.  (And if they want to take full advantage of RDMA, 
 they'll likely need 
 to be re-architected as well.)  The goal in that case becomes 
 to permit them to 
 establish connections using TCP/IP addresses.

 To meet this goal, we need to define how to map IP address to 
 and from IB 
 addresses.  That mapping is part of the protocol, and is 
 missing from the 
 proposal.  And if the application isn't going to know that 
 they're running on 
 Infiniband, then the mapping must also include mapping to a 
 destination service ID.

  To modify ULP to know that it runs on top of IB vs. iWARP
  vs. (any other RDMA transport) is bad idea.
  It is one thing to choose proper port to connect.
  Completely different to ask ULP to parse private data
  in transport specific way.
  The same protocol must support both user level ULPs
  and kernel level ULPs.

 Defining an interface that allows a ULP to use either iWarp, 
 IB, or some other 
 random RDMA transport is an implementation issue.  However, 
 it requires 
 something that maps IP to IB addresses (including service IDs).

 To be more concrete, you've gone from having source and 
 destination TCP/IP 
 addresses to including them in a CM REQ.  What translated the 
 source and 
 destination IP addresses into GIDs and a PKey?  Who converted 
 those into IB 
 routing information?  How was the destination of the CM REQ 
 determined?  What 
 service ID was selected?

IPoIB defines the IP-to-GID mapping.
The port-to-IB-Service-ID mapping is part of this proposal.
The PKey is configuration set up by the administrator.
Ditto for VLAN.
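
As a rough illustration of what a port-to-Service-ID mapping could look like, here is a sketch. The prefix value and both function names are placeholders invented for this example; the actual encoding is whatever the proposal defines, and it is not reproduced in this message.

```python
# Placeholder 64-bit prefix, invented for this sketch. It is NOT the value
# from the proposal or from any IBTA document.
SERVICE_ID_PREFIX = 0x1234_0000_0000_0000
PORT_MASK = 0xFFFF


def port_to_service_id(port: int) -> int:
    """Embed a 16-bit TCP port in the low bits of a 64-bit IB Service ID."""
    if not 0 <= port <= PORT_MASK:
        raise ValueError("port must fit in 16 bits")
    return SERVICE_ID_PREFIX | port


def service_id_to_port(service_id: int) -> int:
    """Reverse the mapping; reject Service IDs outside the reserved range."""
    if service_id & ~PORT_MASK != SERVICE_ID_PREFIX:
        raise ValueError("Service ID is not in the port-mapped range")
    return service_id & PORT_MASK
```

A reserved range like this would let the listener recover the port from the Service ID alone, so the ULP never has to handle IB Service IDs directly.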

 - Sean


[openib-general] configuring ipoib

2005-10-21 Thread Kanevsky, Arkady

How do you configure ipoib?
I used "ifconfig ib0 ip_address" which works fine.
But if I have several ports on an HCA, how do I specify which port
ip_address should be associated with?
Ditto if you have multiple cards.

Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300



RE: [openib-general] FW upgrade for TopSpin cards

2005-10-21 Thread Kanevsky, Arkady

Roland,
sorry to bug you on that but...

I have a Cisco HCA (PCI-X)
hca_type  MTS23108
hw_rev    a1
fw_ver    1.18.0

hca_type and hw_rev are clearly Mellanox nomenclature.
I suspect that fw_ver is the Cisco FW version number.

But all OpenIB documentation is written with respect to Mellanox
nomenclature.
For example from http://www.openib.org/docs/ipoib_faq.txt

1. Verify the firmware version via

cat /sys/class/infiniband/mthca0/fw_ver

For PCI-X HCAs, version 3.2.0 is recommended. For PCIe HCAs, version 
4.5.3 is recommended.

*
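
The check in the FAQ excerpt above can be sketched as a small script. The sysfs path and the recommended versions (3.2.0 for PCI-X, 4.5.3 for PCIe) come from the FAQ text; the function names are made up for this example, and the comparison is only meaningful if fw_ver uses the same numbering scheme as the FAQ, which is exactly the open question here.

```python
def parse_fw_ver(text: str) -> tuple:
    """Turn a dotted version string such as '3.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommended(fw_ver: str, recommended: str) -> bool:
    """True if fw_ver is at least the recommended version (same scheme assumed)."""
    return parse_fw_ver(fw_ver) >= parse_fw_ver(recommended)


def read_fw_ver(path: str = "/sys/class/infiniband/mthca0/fw_ver") -> str:
    """Read the firmware version string from sysfs (path taken from the FAQ)."""
    with open(path) as f:
        return f.read().strip()
```

For example, meets_recommended(read_fw_ver(), "3.2.0") would be the PCI-X check. Comparing as integer tuples avoids the string-comparison trap where "1.18.0" sorts before "1.2.0".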

Is there analogous documentation for Cisco FW?
Where is that FW (this is a Cougar card)?
Are the Cisco FW and the Mellanox FW the same?
If yes, what is the correspondence between the two numbering schemes?

While this specific question is for the Cougar card,
the answer should be generic and cover all HCAs.

Can the documentation be updated to cover all supported HW
regardless of the vendor?

Thanks,

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 1:48 PM
 To: Kanevsky, Arkady
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] FW upgrade for TopSpin cards
 
 
 Arkady I get a bunch of warnings (see below).
 
 All of the warnings look benign (although you might want to 
 synchronize the clock between your build system and your file server).
 
 Arkady Can I use OpenIB tvflash to upgrade FW on a TopSpin card?
 
 Yes.
 
 Arkady Can I use OpenIB mstflint for it?
 
 Yes.
 
 Arkady Which version of the utilities should I use?
 
 I would use the latest subversion revision.
 
 Arkady Why warning when I build it?
 
 Because gcc 4.0 added a bunch of semi-bogus pointer sign 
 warnings, and your clocks are out of sync.
 
  - R.
 

RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

2005-10-20 Thread Kanevsky, Arkady

OK.
I will update the proposal for IBTA based on this feedback
and all other feedback posted.
I will still keep the private data usage proposal
and the port mapping proposal separate.

If your apps depend on 64 bytes of private data,
please raise your voice now.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Richard Frank [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 19, 2005 7:19 PM
 To: Richard Frank; Lentini, James; Roland Dreier
 Cc: [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
 Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
 emulationprotocol
 
 
 It's probably fine to go ahead and reduce the IPC private 
 data - I think we 
 (Oracle) can work around this.
 
 
 - Original Message - 
 From: Richard Frank [EMAIL PROTECTED]
 To: James Lentini [EMAIL PROTECTED]; Roland Dreier 
 [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; openib-general@openib.org; 
 Davis, Arlin R 
 [EMAIL PROTECTED]
 Sent: Wednesday, October 19, 2005 7:12 PM
 Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
 emulationprotocol
 
 
  Oracle's uDAPL ipc implementation uses 64 bytes of private 
 connection
  data - currently - some of this is the result of having 64 
 bytes to use at 
  the start - so we designed around this. We can probably reduce this 
  somewhat. And of course if we want to rewrite our 
 connection handling for 
  uDAPL (add our own wire protocol) we can probably skip 
 using the uDAPL 
  connection data all together.
 
  For RDS we use our own connection data sent via datagrams which has 
  always
  been part of the Oracle UDP ipc implementation.
 
  - Original Message -
  From: Roland Dreier [EMAIL PROTECTED]
  To: James Lentini [EMAIL PROTECTED]
  Cc: Richard Frank [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; 
  openib-general@openib.org; Davis, Arlin R 
 [EMAIL PROTECTED]
  Sent: Wednesday, October 19, 2005 5:56 PM
  Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
  emulationprotocol
 
 
 James The D is somewhat misleading. It refers to the
 James functionality provider to the consumer application.
 
  Right, that's what we're talking about.  The RDS 
 implementation only 
  needs a few bytes of private data on top of the IP address 
 info.  So 
  the RDS implementation itself is clearly OK with any of 
 the proposals 
  being discussed here.
 
  However, Rick mentioned that Oracle needs 64 bytes of 
 private data in 
  both directions for connections.  My question was how 
 Oracle works on 
  top of RDS, which does not provide any private data to consumers.
 
  - R.
 
  
 
 
 

RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

2005-10-20 Thread Kanevsky, Arkady

The updated proposal will have IP addresses and TCP ports of src and dst
in private data.

How TCP ports are mapped to IB service IDs is a separate proposal.
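
To give a feel for how much private data such a header would consume, here is a sketch that packs the addresses and ports. The field order (addresses first, then ports) and the function name pack_addr_header are assumptions made for illustration; the updated proposal defines the real layout.

```python
import ipaddress
import struct


def pack_addr_header(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Pack src/dst IP addresses and TCP ports in a hypothetical layout."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("src and dst must use the same IP version")
    # .packed is 4 bytes for IPv4 and 16 bytes for IPv6, in network order.
    return src.packed + dst.packed + struct.pack("!HH", src_port, dst_port)
```

With this layout the addressing information takes 12 bytes for IPv4 and 36 bytes for IPv6, before any version or flag fields a real format might add. Whatever the header uses comes out of the private data otherwise available to the application.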

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 11:51 AM
 To: Kanevsky, Arkady
 Cc: Richard Frank; Lentini, James; Roland Dreier; 
 [EMAIL PROTECTED]; openib-general@openib.org; Davis, Arlin R
 Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
 emulationprotocol
 
 
 Kanevsky, Arkady wrote:
  I will update the proposal for IBTA based on this feedback and all 
  other feedback posted. I will still separate private data usage 
  proposal and port mapping one.
 
 Again, I think that these should be in the same proposal.  
 The CM REQ carries 
 the IB transport layer address.  The goal here is to map 
 another transport layer 
 address to the IB one.  The source port is included in the 
 private data.  By not 
 including the destination port, there's an assumption that 
 it's provided 
 somewhere else in the CM REQ.  We should either make this 
 explicit, or put the 
 destination port in the private data as well.
 
 - Sean
 

RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

2005-10-20 Thread Kanevsky, Arkady

With both SRC and DST IP addresses and TCP ports in the private data,
all of these models will be supported.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 12:26 PM
 To: Sean Hefty; Kanevsky, Arkady
 Cc: [EMAIL PROTECTED]; openib-general@openib.org; Lentini, 
 James; Davis, Arlin R
 Subject: RE: [dat-discussions] RE: [openib-general] Re: iWARP 
 emulationprotocol
 
 
  
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Sean Hefty
  Sent: Thursday, October 20, 2005 8:51 AM
  To: Kanevsky, Arkady
  Cc: [EMAIL PROTECTED]; openib-general@openib.org; Lentini, 
  James; Davis, Arlin R
  Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
  emulationprotocol
  
  Kanevsky, Arkady wrote:
   I will update the proposal for IBTA based on this feedback and all
   other feedback posted.
   I will still separate private data usage proposal and 
 port mapping 
   one.
  
  Again, I think that these should be in the same proposal.
  The CM REQ carries the IB transport layer address.  The goal 
  here is to map another transport layer address to the IB one. 
   The source port is included in the private data.  By not 
  including the destination port, there's an assumption that 
  it's provided somewhere else in the CM REQ.  We should either 
  make this explicit, or put the destination port in the 
  private data as well.
  
 
 Under the general programming model for an IP-centric daemon, 
 the listener can assume that connection requests will be for 
 the TCP port that the listen was issued upon.
 
 However, the daemon typically listens on *all* addresses that 
 the system supports. It is not uncommon for the application 
 to note which destination address was actually requested and 
 to vary the service provided based upon that. This is what 
 makes it possible for single machines to host vast numbers of 
 web sites.
 
 It is less common, but still requiring support, for the 
 daemon to differentiate service based upon the source 
 address. It is more common to simply refuse service based 
 upon the source address, which can be handled by the CM or 
 firewall itself rather than by the application, but there are 
 exceptions. Some web-sites have intranet versus internet 
 versions. Some file servers control access lists based upon 
 source address. It is actually quite effective when combined 
 with network authentication of source addresses.
 

[openib-general] FW upgrade for TopSpin cards

2005-10-20 Thread Kanevsky, Arkady


I want to upgrade FW on several TopSpin cards I have.
There is a tvflash utility in gen2/trunk/src/userspace/tvflash.
I tried to build tvflash on a 2.6.13.3 system I have.
I get a bunch of warnings (see below).
The gcc version is
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8).
What's the story?
Can I use OpenIB tvflash to upgrade FW on a TopSpin card?
Can I use OpenIB mstflint for it?
Which version of the utilities should I use?
Why do I get warnings when I build it?
Arkady
**
# make
make: Warning: File `.deps/src_tvflash-tvflash.Po' has modification time 1.8e+04 s in the future
make all-am
make[1]: Entering directory `/u/arkady/openib/gen2/trunk/src/userspace/tvflash'
make[1]: Warning: File `.deps/src_tvflash-tvflash.Po' has modification time 1.8e+04 s in the future
if gcc -DHAVE_CONFIG_H -I. -I. -I. -Wall -g -O2 -MT src_tvflash-tvflash.o -MD -MP -MF ".deps/src_tvflash-tvflash.Tpo" -c -o src_tvflash-tvflash.o `test -f 'src/tvflash.c' || echo './'`src/tvflash.c; \
then mv -f ".deps/src_tvflash-tvflash.Tpo" ".deps/src_tvflash-tvflash.Po"; else rm -f ".deps/src_tvflash-tvflash.Tpo"; exit 1; fi
src/tvflash.c: In function 'parse_guid':
src/tvflash.c:112: warning: pointer targets in passing argument 1 of '__builtin_strchr' differ in signedness
src/tvflash.c:117: warning: pointer targets in passing argument 1 of 'strrchr' differ in signedness
src/tvflash.c:117: warning: pointer targets in assignment differ in signedness
src/tvflash.c:135: warning: pointer targets in passing argument 1 of 'strrchr' differ in signedness
src/tvflash.c:135: warning: pointer targets in assignment differ in signedness
src/tvflash.c:205: warning: pointer targets in passing argument 1 of 'strtol' differ in signedness
src/tvflash.c: In function 'identify_board':
src/tvflash.c:702: warning: pointer targets in passing argument 1 of 'strncasecmp' differ in signedness
src/tvflash.c: In function 'flash_image_read_from_file':
src/tvflash.c:828: warning: pointer targets in assignment differ in signedness
src/tvflash.c:830: warning: pointer targets in assignment differ in signedness
src/tvflash.c:832: warning: pointer targets in assignment differ in signedness
src/tvflash.c:844: warning: pointer targets in assignment differ in signedness
src/tvflash.c: In function 'flash_check_failsafe':
src/tvflash.c:905: warning: pointer targets in passing argument 2 of 'validate_image' differ in signedness
src/tvflash.c:911: warning: pointer targets in passing argument 2 of 'validate_image' differ in signedness
src/tvflash.c: In function 'create_ver_str':
src/tvflash.c:1033: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1039: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1044: warning: pointer targets in passing argument 1 of 'snprintf' differ in signedness
src/tvflash.c:1046: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c: In function 'identify_hca':
src/tvflash.c:1278: warning: pointer targets in passing argument 1 of 'sscanf' differ in signedness
src/tvflash.c: In function 'identify_firmware':
src/tvflash.c:1399: warning: pointer targets in passing argument 1 of 'sscanf' differ in signedness
src/tvflash.c: In function 'upload_firmware':
src/tvflash.c:1813: warning: pointer targets in passing argument 1 of 'parse_guid' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1932: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer targets in passing argument 1 of '__builtin_strcmp' differ in signedness
src/tvflash.c:1936: warning: pointer

RE: [openib-general] FW upgrade for TopSpin cards

2005-10-20 Thread Kanevsky, Arkady

Thanks Roland.

I was worried about the pointer sign warnings. The clock is not an issue.

Do you plan to fix the sources so that the gcc 4.0 warnings are not generated?

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 1:48 PM
 To: Kanevsky, Arkady
 Cc: openib-general@openib.org
 Subject: Re: [openib-general] FW upgrade for TopSpin cards
 
 
 Arkady I get a bunch of warnings (see below).
 
 All of the warnings look benign (although you might want to 
 synchronize the clock between your build system and your file server).
 
 Arkady Can I use OpenIB tvflash to upgrade FW on a TopSpin card?
 
 Yes.
 
 Arkady Can I use OpenIB mstflint for it?
 
 Yes.
 
 Arkady Which version of the utilities should I use?
 
 I would use the latest subversion revision.
 
 Arkady Why warning when I build it?
 
 Because gcc 4.0 added a bunch of semi-bogus pointer sign 
 warnings, and your clocks are out of sync.
 
  - R.
 

RE: [swg] RE: [openib-general] Re: [swg] Re: private data...

2005-10-20 Thread Kanevsky, Arkady

But that requires changes to the CM APIs, versus a module on top of the CM
that parses and populates the private data field.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 5:23 PM
 To: 'Fab Tillier'; 'Sean Hefty'
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: [swg] RE: [openib-general] Re: [swg] Re: private data...
 
 
 The same can be said of the starting local QPN, responder resource, 
 initiator depth, starting PSN, MTU, and so forth.  The CM 
 doesn't care 
 about these - the application does, as these settings affect how it 
 configures its QP and what features of its protocol it can use.
 
 Not exactly the same.  The connection cares about these, 
 and must be included as part of the connection protocol.
 
 There are a number of fields that are not used by the CM 
 state machine 
 that are included in these MADs already.  These fields are 
 defined in 
 the CM protocol not because they impact MAD processing in 
 the CM, but 
 because they represent minimum information needed to 
 configure a QP and 
 client.
 
 Exactly.  The IP address does not configure the QP.
 
 What you're advocating is that a service ID can support two 
 private data formats depending on if a bit in the CM REQ is 
 set or not.  (If only a single format is supported, then the 
 bit is not needed.)  This is the wrong place to store this 
 information.  The format of the data beyond the addressing 
 information is not conveyed by this bit, so additional 
 information about the private data format is still needed.
 
 You can grab several reserved bits from the REQ and define it 
 as a private data version, but then apps that care about 
 this could just as easily record the version in the private 
 data itself.
 
 - Sean
 

RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

2005-10-19 Thread Kanevsky, Arkady

Arlin,
just to clarify, Intel MPI will not have problems with using less than
64 bytes of private data.
If a solution provides you with 48 bytes of private data, will that be
sufficient?
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300

-Original Message-From: Davis, Arlin R
[mailto:[EMAIL PROTECTED] Sent: Wednesday, October 19, 2005
11:30 AMTo: [EMAIL PROTECTED]; Grant
GrundlerCc: [EMAIL PROTECTED];
openib-general@openib.orgSubject: RE: [dat-discussions] RE:
[openib-general] Re: iWARP emulationprotocol

Arkady,

Intel MPI (real
consumer of uDAPL) has no problem with this
change.

-arlin

From:
[EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Kanevsky,
ArkadySent: Wednesday,
October 19, 2005 6:40 AMTo:
Grant Grundler; Caitlin BestlerCc: Roland Dreier; [EMAIL PROTECTED];
[EMAIL PROTECTED]; openib-general@openib.orgSubject: [dat-discussions] RE:
[openib-general] Re: iWARP emulation
protocol

Grant,
The developers of the application(s) in question are aware of the discussion.
I will leave it to them to respond.
I bring the discussion points to the weekly DAT Collaborative meeting, which we have every Wednesday.
I apologize that the DAT Collaborative charter does not allow submitting contributions without joining DAT Collaborative.
But this is no different from Linux not accepting any contributions without a proper license.
But rest assured that as Chair I bring the concerns and suggestions stated in the email discussion to the DAT meetings.
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300

-Original Message-
From: Grant Grundler [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 18, 2005 8:02 PM
To: Caitlin Bestler
Cc: Grant Grundler; Roland Dreier; Kanevsky, Arkady; [EMAIL PROTECTED]; [EMAIL PROTECTED]; openib-general@openib.org
Subject: Re: [openib-general] Re: iWARP emulation protocol

On Tue, Oct 18, 2005 at 04:40:54PM -0700, Caitlin Bestler wrote:

Roland (and the rest of us) would like to see someone name a real consumer of the proposed interface. ie who depends on this change? Then the dependency for that use/user can be discussed and appropriate tradeoffs made. Make sense?

Unfortunately not every application that is under development, or even deployed, can be discussed in a google-searchable public forum. That especially applies to user-mode development.

Well, this is open source. While I don't want to preclude closed source development, it's usually necessary to have an open source consumer that any open source developer can test with.

So I could have actually tested such applications and still not be free to cite them here.

Understood. I'm not asking *you* to cite one unless you happen to own one of the consumers.

With any luck some of them are following the discussion and will jump in on their own.

Unfortunately, since they are developing to uDAPL they are unlikely to be following this discussion.

It doesn't help that the DAT yahoo-groups.com mailing list is rejecting my replies. It would be helpful if someone following this forum could share Roland's question with the DAT mailing list if it didn't make it there already and possibly explain why naming a consumer is necessary.

hth,
grant


RE: [dat-discussions] RE: [openib-general] Re: iWARP emulation protocol

2005-10-19 Thread Kanevsky, Arkady

Sean,

if you look at the proposal, it shows 2 ways to address this.

1. Have 2 protocols.
One just sends the SRC IP address and port, and provides 64 bytes to the ULP.
The other sends both SRC and DEST info and leaves 48(+-) bytes of
private data for the ULP.

2. Have 2 protocols.
Split IPv4 and IPv6 methods.
For IPv4, send SRC and DST addressing and 64 bytes of ULP private data.
For IPv6 we have several options.
a. GID=IPv6 address
b. use a second CM frame to carry the ULP private data.
c. others

But having multiple versions supported is not pleasant.
It loses the simple backwards compatibility of the current
protocol, which just formats the CM private data field.
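
The size trade-off between these options can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that the CM REQ private data field is 92 bytes and that the address header carries 4 bytes of fixed fields (version and port). Neither figure is stated in this thread, so the exact leftover sizes differ from the 48(+-) bytes quoted above. Only the 64-byte ULP requirement comes from the discussion. The function names are hypothetical.

```python
# Rough private-data budget for the options listed above.
# Assumptions (not from the thread): the CM REQ private data field is
# 92 bytes and the address header has 4 bytes of fixed fields
# (version, port). Only the 64-byte ULP requirement is from the thread.
REQ_PRIVATE_DATA = 92
FIXED_HEADER = 4
ULP_REQUIRED = 64

ADDR_LEN = {"ipv4": 4, "ipv6": 16}


def ulp_bytes_left(family: str, n_addrs: int) -> int:
    """Bytes left for the ULP after n_addrs addresses of the given family."""
    return REQ_PRIVATE_DATA - FIXED_HEADER - n_addrs * ADDR_LEN[family]


def fits(family: str, n_addrs: int) -> bool:
    """True if the ULP still gets its full 64 bytes."""
    return ulp_bytes_left(family, n_addrs) >= ULP_REQUIRED


if __name__ == "__main__":
    for family, n in [("ipv4", 2), ("ipv6", 1), ("ipv6", 2)]:
        print(family, n, ulp_bytes_left(family, n), fits(family, n))
```

Under these assumptions only the case with two IPv6 addresses falls below 64 bytes, which is why the options above either drop the destination address or treat IPv6 separately.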

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 19, 2005 1:00 PM
 To: Richard Frank
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 openib-general@openib.org; Davis, Arlin R
 Subject: Re: [dat-discussions] RE: [openib-general] Re: iWARP 
 emulationprotocol
 
 
 Richard Frank wrote:
  Oracle currently depends on 64 bytes of private data for connect and
  accept.
 
 Is any of that data used to exchange address information?
 
 It's impossible to provide both the source and destination 
 address in the CM REQ 
 private data and still give the user 64 bytes.  The source 
 address is needed for 
 the reverse GID-IP lookup.  Can we make do without the 
 destination address?
 
 - Sean

[openib-general] RE: iWARP emulation protocol

2005-10-18 Thread Kanevsky, Arkady

uDAPL users.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Roland Dreier [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 18, 2005 2:19 PM
 To: Kanevsky, Arkady
 Cc: Yaron Haviv; openib-general@openib.org; 
 [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject: Re: iWARP emulation protocol
 
 
 Arkady> The proposed protocol will be used by both kernel and user
 Arkady> space Consumers.  There are existing Consumers that rely
 Arkady> on 64 bytes of private data.
 
 Which consumers are these?
 
  - R.
 

RE: iWARP emulation protocol (was: [openib-general] RDMA connection and address translation API)

2005-10-18 Thread Kanevsky, Arkady

Sean,

 For the REQ to find its way to the destination, the 
 destination address must be 
 known beforehand.  We shouldn't need to pass any data in the 
 REP.  The CMA 
 passes both the source and destination address information in 
 the REQ, but only 
 uses the destination to validate against a listen request.  
 The source address 
 is passed to the user.

CM passes the IB addresses of both src and dest in the REQ.
How the dest IP address is mapped locally to the dest IB GID|LID is
defined by IPoIB.
We can request IBTA to define it also.
But the goal is to define a protocol part in IBTA. 

You are correct that if we rely on the CM storing the IP address of the dest,
it does not need to be passed back in the REP,
provided we do not need to know that the response came from a different IP
address
or a different port.

 The slides should also discuss how to map from a TCP/IP 
 address to a service ID, 
 so that a REQ can match up with the correct listener.  The 
 approach currently 
 taken by the CMA is to use the openib OUI << 48 + TCP port number.
 

Correct.
If we want IBTA to define a full mapping of addresses and ports then
yes.
But that does not change the protocol, it is local agreement
that must be the same on both sides of the connection.
I will include it in the next version.
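
Since this mapping is a local agreement that both sides must compute identically, here is a minimal sketch of the "OUI << 48 + TCP port" rule quoted above. The OUI value 0x001405 is my assumption for the OpenIB OUI and is not given in this thread. The function names are hypothetical and are not taken from the CMA code.

```python
# Hypothetical sketch of the mapping quoted above:
#   service ID = (OUI << 48) + TCP port
# OPENIB_OUI is an assumed value, used here only for illustration.
OPENIB_OUI = 0x001405
MASK64 = (1 << 64) - 1


def tcp_port_to_service_id(port: int, oui: int = OPENIB_OUI) -> int:
    """Build a 64-bit service ID from an OUI and a 16-bit TCP port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("TCP port must fit in 16 bits")
    return ((oui << 48) + port) & MASK64


def service_id_to_tcp_port(service_id: int) -> int:
    """Recover the TCP port from the low 16 bits of the service ID."""
    return service_id & 0xFFFF
```

A listener and a connecting side that both call `tcp_port_to_service_id` with the same port arrive at the same service ID, which is all the REQ needs to match the correct listener.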

Thanks,
Arkady



Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

RE: iWARP emulation protocol (was: [openib-general] RDMA connection and address translation API)

2005-10-18 Thread Kanevsky, Arkady

I think it is better to use some of the CM REQ reserved field for it
so it will be separate from Addressing.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Sean Hefty [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 18, 2005 2:55 PM
 To: Kanevsky, Arkady
 Cc: Roland Dreier; Yaron Haviv; [EMAIL PROTECTED]; 
 [EMAIL PROTECTED]; openib-general@openib.org
 Subject: Re: iWARP emulation protocol (was: [openib-general] 
 RDMA connection andaddress translation API)
 
 
 Kanevsky, Arkady wrote:
  Enclosed is the proposal to IBTA to add this functionality to CM 
  protocol.
 
 A couple of other notes.
 
 Combine major/minor version into a single version, which is 
 what you essentially 
 have anyway.
 
 I have no clue what zero based virtual address exception 
 means, but that and 
 the SI bit seem out of place in a header containing TCP/IP 
 address information. 
 I would say save the two bits and have a cleaner header.
 
 - Sean
 

RE: [openib-general] Re: iWARP emulation protocol

2005-10-18 Thread Kanevsky, Arkady

 
 An additional space preserving option that Arkady did not 
 mention is limiting the IP alias service to IPv4 addresses. 
 Anyone who really wants IPv6 addresses can get their SM to 
 assign IPv6 compatible GIDs. Of course the flat IPv6 option 
 is far simpler, and probably should be used unless a specific 
 application is identified where those extra 96 bits makes the 
 difference between making the private data be rewritten or left as is.
 

This can be an extension to proposal 3 of last page.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 18, 2005 3:16 PM
 To: Roland Dreier; Kanevsky, Arkady
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: RE: [openib-general] Re: iWARP emulation protocol
 
 
  
 
  -Original Message-
  From: Roland Dreier
  Sent: Tuesday, October 18, 2005 11:41 AM
  To: Kanevsky, Arkady
  Subject: [openib-general] Re: iWARP emulation protocol
  
  Arkady> uDAPL users.
  
  
  2) Are there real users or is this a generic uDAPL API thing?
  
 
 uDAPL vs. kDAPL is irrelevant here. The user or Kernel 
 Consumer making the connection does not know whether their 
 peer is running in user or kernel, nor should they.
 
 Every discussion of reducing the guaranteed private data size 
 in DAPL has produced adverse reactions from application 
 developers. They're either very good actors or were working 
 on actual applications.
 
 An additional space preserving option that Arkady did not 
 mention is limiting the IP alias service to IPv4 addresses. 
 Anyone who really wants IPv6 addresses can get their SM to 
 assign IPv6 compatible GIDs. Of course the flat IPv6 option 
 is far simpler, and probably should be used unless a specific 
 application is identified where those extra 96 bits makes the 
 difference between making the private data be rewritten or left as is.
 

RE: [openib-general] RE: iWARP emulation protocol

2005-10-18 Thread Kanevsky, Arkady

Sean wrote:

 I'm not sure how much we should care about higher level 
 abstractions for this 
 discussion.  We should do what's right for IB.  Abstractions 
 that want to use IP 
 addresses can either use the standard protocol defined by the 
 IBTA or define 
 their own private data.

Correct. But we should define a standard protocol suited to most apps
to avoid the creation of multiple app-specific protocols.

 
 To me, it seems that the most flexible solution is to pass 
 the source and 
 destination IP address in the CM REQ. 

I agree. This is the cleanest and simplest
to define.
But it impacts some existing apps.
That is why DAT has the 64 bytes private data requirement.
We do not want to lose too many users by the time we define the complete
solution stack.


 We can then define a 
 standard mapping 
 from TCP port numbers to IB service records, or change the CM 
 version to read 
 into the private data.  What's wrong with this approach?

It is the standard mapping which we just spent 1 hour discussing
at SWG. What is that standard mapping if it is native IB?
IPoIB as intermediate layer? SDP as intermediate layer?
What is the standard TCP port for iSER (pick your ULP) native over RDMA
vs.
the same ULP over IPoIB?

This has to be defined. But is it part of the proposal for sharing IP address
and TCP port info between the 2 sides of the connection, or a separate
proposal?
I think it is a separate proposal, but both will have to be in place
to support iWARP emulation.

Arkady





Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

[openib-general] license mismatches

2005-08-29 Thread Kanevsky, Arkady

I have reviewed the licenses used by files in
https://openib.org/svn/gen2/trunk.
The following .c and .h files do not match the OpenIB licenses:
https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/tvflash.c
https://openib.org/svn/gen2/trunk/src/userspace/tvflash/src/firmware.h
https://openib.org/svn/gen2/trunk/src/userspace/examples/aio/ttcp.aio.c
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/complib/Makefile.mlx
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/opensm/osm_indent

all files in directories:
https://openib.org/svn/gen2/trunk/src/userspace/mstflint/
https://openib.org/svn/gen2/trunk/src/userspace/mpi/

Files in the directory
https://openib.org/svn/gen2/trunk/src/userspace/libsdp/src/
have the right licenses, but the copyright message does not match the
OpenIB copyright.

Several files do not have any license, like Makefile, configure and map
files.
For example,
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/src/libibcm.map
https://openib.org/svn/gen2/trunk/src/userspace/libibcm/Makefile.am
I think this is OK.

I suspect that all these are oversights and all the files should be
available under both BSD and GPL2 licenses.

Thanks,
Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 

RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz

2005-08-03 Thread Kanevsky, Arkady

The files are available to all.
Posting to the reflector is for members only.
If you still have problems,
they can be made available at http://www.datcollaborative.org/.
The 1.2 headers are available there.


Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Tom Duffy [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 03, 2005 2:41 PM
 To: Ian Jiang
 Cc: openib-general@openib.org; 
 [EMAIL PROTECTED]; Kanevsky, Arkady
 Subject: Re: [openib-general] [iSER]How to get the dat_headers_1_1.tgz
 
 
 On Wed, 2005-08-03 at 17:14 +0800, Ian Jiang wrote:
  It's known to all that the kDAPL 1.1 is needed to build the iSER. I 
  failed to get
  http://groups.yahoo.com/group/dat-discussions/files/dat_headers_1_1.tgz
  because only the members of the group could access this file

This is dumb.  Can Arkady just open the files up to anyone?

If not, groups.yahoo.com should not be used for an open source project.

-tduffy

RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz

2005-08-03 Thread Kanevsky, Arkady



Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Tom Duffy [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 03, 2005 3:37 PM
 To: Kanevsky, Arkady
 Cc: Ian Jiang; [EMAIL PROTECTED]; 
 openib-general@openib.org
 Subject: RE: [openib-general] [iSER]How to get the dat_headers_1_1.tgz
 
 
 On Wed, 2005-08-03 at 15:18 -0400, Kanevsky, Arkady wrote:
  The files are available to all.
 
 http://groups.yahoo.com/group/dat-discussions/files/dat_headers_1_1.tgz
 To access Yahoo! Groups...
 
 you need a Yahoo! ID.
 Don't have a Yahoo! ID?
 Signing up is easy.

that is *not* available to all...

-tduffy


dat_headers_1_1.tgz
Description: dat_headers_1_1.tgz

[openib-general] comments on DAT registry in OpenIB

2005-06-29 Thread Kanevsky, Arkady

Dear DAT and OpenIB members,

There is a debate going on on the OpenIB and DAT reflectors about the kDAT
registry for Linux.

I would like to review the requirements we had agreed on at the DAT
Collaborative and captured in the kDAT and uDAT specs, and review the DAT
registry in OpenIB from that perspective.

I would like to make it clear that I do NOT speak on behalf of the DAT
Collaborative but as just one of its members.

- Ability of Consumers to open IA based on its name
    OpenIB supports it.

- Support for Consumers to get a list of available IAs to open
    OpenIB kDAT and uDAT registry provide this.
    OpenIB kDAT registry no longer provides Provider attributes as stated
    above.
    Preserves dat_registry_list_providers for kDAPL.
    OpenIB changed the dat_provider_info format, so binary compatibility is
    not preserved, but source compatibility is preserved.
      dat_provider_info differs between uDAPL and kDAPL in OpenIB.

- Ability to enumerate available IAs and their attributes
    OpenIB supports that for uDAPL unchanged from DAPL SF RI.
    For kDAPL, OpenIB supports a single type of thread-safety defined by the
    Linux kernel, and the version of the Linux kernel defines the kDAPL APIs
    that kernel version supports.
      For kDAPL, query will not return the DAT version and thread safety
      Provider attribute.

- Map IA_name to Provider library (kDAPL or uDAPL)
    OpenIB kDAPL and uDAPL support this.

- Ability for DAT providers to dynamically register and deregister DAPL
  Provider
    OpenIB supports that for kDAPL and uDAPL.
    All existing registry APIs at DAPL SF RI are preserved.

- Single static DAT registry - platform specific
    kDAT and uDAT specs explicitly state that the DAT registry is defined by
    the platform. DAT Collaborative provided an example of a Registry for
    Linux and Windows and agreed that the DAT-provided registry should be
    used by all providers. This ensures that the DAT Registry will support
    all Providers and that a DAT registry from one vendor does not block
    other providers.
    OpenIB kDAT registry is the Linux platform DAT registry, which achieves
    the goal of supporting all kDAPL providers. It also provides the
    additional benefit that it is the Linux core which maintains the kDAT
    registry instead of DAT Collaborative.
    OpenIB uDAT registry remains the DAT Collaborative one, unchanged.
    We can discuss whether or not we want to get the uDAT registry closer to
    the OpenIB kDAT one.
    The DAT registries for kDAPL and uDAPL are different at DAPL SF RI, and
    OpenIB maintains that.
    Some changes may be needed for kDAPL Registry hot plug support for
    OpenIB. How might it impact the uDAPL registry?

- ia_name is under system admin control
    Remains the same, following a platform convention.

- IA can represent
    1. single port
    2. several ports
    3. several HBAs or RNICs
    4. multiple IAs represent the same port
    OpenIB kDAPL currently implements #1. Members can submit code patches to
    support other choices.
    OpenIB uDAPL remains the same, with the current implementation providing
    #1 under Provider control.

- Support for Consumers to get a list of available IAs to open
    OpenIB kDAT and uDAT registry provide this.
    OpenIB kDAT registry no longer provides Provider attributes as stated
    above.

- DAT registry supports loading multiple DAPL Providers into the same
  address space.
    - A Provider library is loaded into an address space once
    - A Provider library is unloaded only when all open instances of its IAs
      are closed
    - The same Provider library can be loaded into multiple address spaces
    OpenIB uDAPL continues to provide it.
    OpenIB kDAPL supports it.

- DAT registry shall support polymorphism (Provider independence)
    - Consumer calls DAT functions by the DAT handle, independently of which
      Provider is used
    - DAT registry provides redirection
    - dat_ia_open is Provider specific and sets up the redirection table per
      address space per Provider
        first-time open ensures that table redirection for a Provider is
        set up
    OpenIB kDAT and uDAT registry provide that.
    - OpenIB kDAT registry preserves the DAT redirection table as defined by
      DAT Collaborative
    - OpenIB kDAT registry preserves the DAT_provider structure
    Need to file errata to DAT to move dat_ia_close after dat_ia_query to
    match the DAPL SF RI and OpenIB one for kDAPL and uDAPL.

- The DAT_handle structure first field provides a pointer for redirection
    OpenIB kDAT and uDAT registry support this.

  DAT registry
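
To make the registry requirements above concrete (open an IA by name, list IAs, dynamic register and deregister, redirection through the handle), here is a small sketch. It is not the DAT API: the class and function names are hypothetical, and the `provider` attribute on the handle merely stands in for the first-field pointer that the DAT_handle structure uses for redirection.

```python
from typing import Callable, Dict, List


class Provider:
    """Hypothetical provider: a name plus its table of entry points."""

    def __init__(self, name: str, table: Dict[str, Callable]) -> None:
        self.name = name
        self.table = table


class Handle:
    """Stand-in for a DAT handle; the provider reference plays the role
    of the first-field pointer used for redirection."""

    def __init__(self, provider: Provider, ia_name: str) -> None:
        self.provider = provider
        self.ia_name = ia_name


class Registry:
    """Toy registry mapping IA names to providers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, ia_name: str, provider: Provider) -> None:
        self._providers[ia_name] = provider

    def deregister(self, ia_name: str) -> None:
        del self._providers[ia_name]

    def list_ias(self) -> List[str]:
        return sorted(self._providers)

    def ia_open(self, ia_name: str) -> Handle:
        # Map the IA name to its provider, then hand back a handle
        # that carries the provider for later redirection.
        return Handle(self._providers[ia_name], ia_name)


def ia_query(handle: Handle) -> str:
    # Consumer-facing call: dispatch through the handle, without the
    # caller knowing which provider is behind it.
    return handle.provider.table["ia_query"](handle)
```

A Consumer only ever holds a handle and calls `ia_query(handle)`. Which provider answers is decided by what the registry attached to the handle at open time.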

RE: [openib-general] IB Address Translation service

2005-03-02 Thread Kanevsky, Arkady

Some historical perspective - ATS was defined prior to IPoIB.

The requirements.
DAT has two needs:
1. forward translation: given an IP address returns back IB GID/LID.
2. reverse translation: given IB GID/LID returns back an IP address of
the requestor.
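
As an illustration of these two needs, the sketch below keeps a toy in-memory table that answers both directions. It is not ATS or IPoIB. The class name, the (GID, LID) tuple layout and the sample values in the usage note are all made up for the example.

```python
from typing import Dict, Optional, Tuple

# (GID as text, LID) stands in for an IB address; the layout is an
# assumption made for this example only.
IbAddr = Tuple[str, int]


class AddressTranslator:
    """Toy in-memory table offering forward and reverse lookups."""

    def __init__(self) -> None:
        self._ip_to_ib: Dict[str, IbAddr] = {}
        self._ib_to_ip: Dict[IbAddr, str] = {}

    def add(self, ip: str, ib: IbAddr) -> None:
        """Record one IP <-> IB address pair in both directions."""
        self._ip_to_ib[ip] = ib
        self._ib_to_ip[ib] = ip

    def forward(self, ip: str) -> Optional[IbAddr]:
        """Forward translation: IP address -> IB GID/LID."""
        return self._ip_to_ib.get(ip)

    def reverse(self, ib: IbAddr) -> Optional[str]:
        """Reverse translation: IB GID/LID -> IP address of the requestor."""
        return self._ib_to_ip.get(ib)
```

For example, after `t.add("192.0.2.10", ("fe80::1", 7))`, `t.forward("192.0.2.10")` gives the IB address back and `t.reverse(("fe80::1", 7))` gives the IP address. An unknown key returns `None` in either direction.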

ULPs: NFS, DAFS.

SDP encoded IP addresses into its headers.
But DAT is an API and cannot define a protocol for it.

Abstracting address translation is a good idea.
For IB we can use ATS or IPoIB.
For iWARP it will be a no-op.
We must ensure that the DAPL that we submit to Linux can be layered on
top of all RDMA transports.

Since IPoIB has not had a plugfest/connectathon or some other interop event
that demonstrates ARP and RARP,
I suggest we have both ATS and IPoIB support.
ATS has been fully and successfully tested at the DAPL Plugfest.

In DAPL we have not assessed the implications of the HA requirements for
address translation,
which is currently under discussion.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Tom Duffy [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, March 01, 2005 6:02 PM
 To: Yaron Haviv
 Cc: openib-general@openib.org
 Subject: RE: [openib-general] IB Address Translation service
 
 
 [ putting back on list ]
 
 On Wed, 2005-03-02 at 00:29 +0200, Yaron Haviv wrote:
  Did you try RARP with IPoIB ?
 
 I have not.
 
  I thought that there is some issue that it doesn't work
 
 Currently, the rarpd only works with ethernet, but I don't 
 see why this couldn't be fixed.
 
  Also I hope you can comment on the other ib_at capabilities 
 which are 
  more important than ATS
 
 I don't mind the idea of abstracting out address translation. 
  I think maybe this is a premature optimization and we should 
 see how each ULP uses/does it first, then abstract out common 
 code.  Otherwise, I feel neither strongly for or against your 
 proposal.
 
 -tduffy
 

RE: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

2005-02-16 Thread Kanevsky, Arkady

woody,
If Open IB wants to go with the LGPL license, it is fine with us.
We will need to take a vote in the DAT Collaborative on it also.
But from what I see on the reflector, the LGPL license is outside the current
bylaws of
Open IB, and there is a discussion on it going on.

So until this issue is resolved in Open IB, we will add the GPL license to
uDAPL (and kDAPL)
and will work on getting it ready for submission to Open IB.
Once the GPL license is approved by DAT, we can move the dev
work to the Open IB SF.
If Arlin is available, he can start the work now.
Let's first identify what areas require changes. 

Arkady

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance phone: 781-768-5395
375 Totten Pond Rd.  Fax: 781-895-1195
Waltham, MA 02451-2010  central phone: 781-768-5300
 


 -Original Message-
 From: Woodruff, Robert J [mailto:[EMAIL PROTECTED] 
 Sent: Monday, February 14, 2005 7:48 PM
 To: Kanevsky, Arkady
 Cc: openib-general@openib.org; Davis, Arlin R; Matt Leininger
 Subject: RE: FW: [openib-general] Minutes from DAPL BOF at 
 OpenIB Workshop
 
 
  
 Hi Arkady,
 
 As I mentioned in the BOF, I have a person (Arlin Davis) that 
 can help with developing a uDAPL provider for the openib.org verbs. 
 After discussing it more
 with folks here, it seems to us that perhaps the uDAPL 
 user-mode library should be provided to openib.org under a dual 
 BSD + LGPL license rather than BSD + GPL, since people 
 normally want to use LGPL for 
 libraries. 
 
 Also, I think that we can go ahead and start porting using the BSD 
 license available from sourceforge today. Once the port is 
 complete, we can submit it to openib.org under the dual 
 license and submit 
 any changes back to the main sourceforge project under the 
 BSD license, or you can simply accept the changes under the 
 BSD license, thus for uDAPL there is no need to wait for the 
 expanded licensing terms of the sourceforge project.
 
 Once the user-mode verbs, user-mode CM, SA support is 
 available from openib.org, 
 we can get started. If this is OK with folks, I'll have Arlin 
 start to take a look at this.
 
 Sound OK ?
 
 woody
 
