RE: [dat-discussions] [openib-general][RFC]DAT2.0immediatedataproposal

2006-04-21 Thread Sean Hefty
We need to do a better job of coordinating between the two reflectors.

One issue is that someone must subscribe to the dat-discussion list to post to it.

- Sean
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

2006-04-20 Thread Tom Tucker
On Wed, 2006-02-08 at 23:20 -0800, Sean Hefty wrote:
 Hmm.  Can you put a number on how much better RDMA write with
 immediate is on current HCA hardware?  How does using the underlying
 OpenIB verbs ability to post a list of work requests compare (ie
 posting an RDMA write followed by a send in one verbs call)?
 Maybe post multiple is a better direction for DAT.
 
 A post multiple call as a general API makes sense, but I think that's a
 separate issue.
 
 Given that IB provides true immediate data with RDMA writes, a way should be
 available to make use of it.  I don't know what the performance difference is
 between a write with immediate and a write followed by a send, but I don't
 think that anyone could argue that the write with immediate wouldn't perform
 better.
 
 To me, the question is whether write with immediate is supported as a
 transport-specific extension, which was Arlin's original patch, or through
 some standard API.  The attempt to make the API standard, so that iWarp could
 emulate it (poorly in my view), is what appears to be driving the
 disagreements.
 
 It also appears to me that the decisions are coming down to one of the
 following.  If iWarp can emulate write with immediate, then a generic API
 should be used.  

This opens Pandora's box. Should iWARP also emulate ATOMICs? Which
should be emulated and which should not ... What are the criteria for
deciding? 

 If iWarp cannot properly emulate write with immediate, then the API
 should be transport specific.  

It should be transport specific because it is a transport specific
feature. Although in this case it could be implemented in iWARP,
in my view it _should_ not. 

 It's curious to me that in both cases, iWarp is
 driving the API decision and design for something that is an IB specific
 feature.

Huh?

 
 - Sean
 
 


RE: [dat-discussions] [openib-general][RFC]DAT2.0immediatedataproposal

2006-02-09 Thread Larsen, Roy K

But why define an IB specific feature when a transport neutral feature
can be defined?

Viewing the operation as Write with following Send maintains transport
neutral semantics AND allows IB to encode it as a Write with Immediate.

That allows IB to use the silicon that already exists to support
compressing the Write and Send into a single message. That is the real
benefit, isn't it?

No, it's not.

And for both transports it enables the Provider to pass the 4-byte
immediate data by value rather than by registered reference. So there is
a definite benefit to IB, and a potential benefit to IP, and it works for
both transports.

The *only* thing gained by making it a transport specific method is
the implicit 33rd bit in the "the RDMA Write payload you asked for
has arrived" message.


Ok, finally.  A realization that the semantics of write/send are not the
same as IB write with immediate data.  And the difference is important.
The proposed emulation could not pass a black box test since nothing
distinguishes an immediate receive message from a standard one
containing rkeys or any other random data an application may need to
exchange through send/receive.  A true write with immediate data can
pass such a black box test because it offers a unique service whereas
the proposed emulation does not.  It is a helper function that uses
existing services.  I have no objection to a write/send helper function,
just call it that and not write with immediate data.  Leave the true
immediate data service as an extension as first proposed.

Is there a concrete example of any benefit from encoding a 33rd bit in
the selection of Write with Immediate versus Write followed by 32-bit
Send?

Yes, as stated several times, applications that use the send/receive
facility to exchange information such as rkeys as well as using write
immediate services must be able to unambiguously tell the difference
between receive indications.  Putting a requirement on the application
to make that distinction by their own devices provides no additional
service that they don't already have in existing APIs.
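Roy's black-box argument can be made concrete with a toy Python model. This is not DAT or verbs code. `RecvCompletion`, `native_write_with_imm`, `emulated_write_then_send`, `plain_send` and `is_write_notification` are hypothetical names. The model assumes only what the thread states. A native write with immediate marks the receive completion itself; libibverbs, for instance, reports such completions with the `IBV_WC_WITH_IMM` flag. The emulation, by contrast, delivers the 32-bit value as an ordinary 4-byte Send.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecvCompletion:
    """Toy receive completion; field names are illustrative, not DAT/verbs."""
    buf_bytes: int                  # bytes placed in the posted receive buffer
    payload: bytes                  # contents of that buffer
    imm_data: Optional[int] = None  # present only for a native write-with-immediate


def native_write_with_imm(tag: int) -> RecvCompletion:
    # IB style: the 32-bit tag travels in a header field, and the completion
    # itself says "immediate data present"; nothing lands in the buffer.
    return RecvCompletion(buf_bytes=0, payload=b"", imm_data=tag)


def emulated_write_then_send(tag: int) -> RecvCompletion:
    # Emulation: the tag is just the 4-byte payload of an ordinary Send.
    return RecvCompletion(buf_bytes=4, payload=tag.to_bytes(4, "big"))


def plain_send(data: bytes) -> RecvCompletion:
    # Any other Send, e.g. one carrying an rkey the peer asked for.
    return RecvCompletion(buf_bytes=len(data), payload=data)


def is_write_notification(c: RecvCompletion) -> bool:
    # Only the native form can be recognized without an application convention.
    return c.imm_data is not None


# A 4-byte rkey exchange and an emulated "immediate" are indistinguishable:
rkey_msg = plain_send((0x1234).to_bytes(4, "big"))
emulated = emulated_write_then_send(0x1234)
```

The last two lines carry the point of the test. Without an application-level convention, the receiver cannot tell an emulated notification apart from a 4-byte rkey exchange.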

Roy


Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Arlin Davis

Roland Dreier wrote:

   
Hmm.  Can you put a number on how much better RDMA write with
immediate is on current HCA hardware?  How does using the underlying
OpenIB verbs ability to post a list of work requests compare (ie
posting an RDMA write followed by a send in one verbs call)?
Maybe post multiple is a better direction for DAT.
 

With post multiple, unlike immediate data, you don't have the ability to 
distinguish between a normal receive and an RDMA write completion 
indication on the other end. This is the uniqueness of the service that 
cannot be provided by the post multiple. Yes, post multiple would be a 
nice option for DAT; it is just a different service. It would also be 
required to conform to the semantic rules of the bundled operations so 
you could not do any optimization tricks under the covers with an IB 
rdma_write_immediate operation.


-arlin


- R.

 





RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 Roland Dreier wrote:
 
 
 Hmm.  Can you put a number on how much better RDMA write with
 immediate is on current HCA hardware?  How does using the underlying
 OpenIB verbs ability to post a list of work requests compare (ie
 posting an RDMA write followed by a send in one verbs call)?
 Maybe post multiple is a better direction for DAT.
 
 
 With post multiple, unlike immediate data, you don't have the
 ability to distinguish between a normal receive and an RDMA
 write completion indication on the other end. This is the
 uniqueness of the service that cannot be provided by the post
 multiple. Yes, post multiple would be a nice option for DAT;
 it is just a different service. It would also be required to
 conform to the semantic rules of the bundled operations so
 you could not do any optimization tricks under the covers
 with an IB rdma_write_immediate operation.
 

A post_multiple also requires defining a single DTO data 
structure. If the post multiple is atomic (meaning all make
it or none do) then it requires an intermediate data structure
to have been created. If it is not atomic there really isn't
any reason for it not to just be a utility function layered 
above DAT.
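Caitlin argues that a non-atomic post_multiple could simply be a utility layered above DAT. A minimal Python sketch of that idea follows. It is a hypothetical model: `post_one` stands in for whatever single-DTO post call a provider exposes, and the fake queue exists only to force a failure partway through.

```python
from typing import Callable, List, Sequence, Tuple

WorkRequest = Tuple[str, dict]  # (operation, arguments); hypothetical stand-in for a DTO


def post_multiple(post_one: Callable[[WorkRequest], bool],
                  wrs: Sequence[WorkRequest]) -> Tuple[int, List[WorkRequest]]:
    """Non-atomic post of several requests using only a single-post primitive.

    Posts in order and stops at the first failure. Returns how many were
    posted and the ones that were not; nothing is rolled back."""
    for i, wr in enumerate(wrs):
        if not post_one(wr):
            return i, list(wrs[i:])
    return len(wrs), []


# Usage with a fake send queue that has room for just one more request.
send_queue: List[WorkRequest] = []


def post_one(wr: WorkRequest) -> bool:
    if len(send_queue) >= 1:
        return False  # queue full
    send_queue.append(wr)
    return True


posted, leftover = post_multiple(post_one, [("rdma_write", {"len": 4096}),
                                            ("send", {"tag": 7})])
# posted == 1: the Write is already queued, the Send is not, and there is no rollback.
```

The partial result is what "not atomic" means here. An all-or-nothing version would need the provider to stage the whole list before posting any of it, which is the intermediate data structure Caitlin mentions.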

What I'm not seeing with the immediate is this urgent need
by the application to be able to use the same 32-bit value
for both an immediate and a 4 byte message that requires
an entire additional API just to support it.  Why can't
the application just add a bool to the send message?
Or encode the 32-bits so that they come from disjoint
domains?
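Caitlin names two alternatives: a flag in the send message, or 32-bit values drawn from disjoint domains. Each takes only a few lines. The message layout, the `KIND_*` values and the choice of the top bit below are assumptions made for illustration; DAT defines none of them.

```python
import struct

# Option 1 (hypothetical layout): prefix the payload with a 1-byte kind field.
KIND_NOTIFY = 1  # "an RDMA Write you expected has been placed"
KIND_DATA = 2    # anything else, e.g. an rkey being exchanged


def encode_flagged(kind: int, value: int) -> bytes:
    # "!BI": network byte order, no padding -> 1 + 4 = 5 bytes.
    return struct.pack("!BI", kind, value)


def decode_flagged(msg: bytes) -> tuple:
    return struct.unpack("!BI", msg)


# Option 2 (hypothetical convention): stay at 4 bytes but split the 32-bit
# space into disjoint domains by reserving the top bit for notifications.
NOTIFY_BIT = 1 << 31


def encode_notify_tag(tag: int) -> int:
    if not 0 <= tag < NOTIFY_BIT:
        raise ValueError("notification tags are limited to 31 bits")
    return NOTIFY_BIT | tag


def classify(word: int) -> tuple:
    # Ordinary data words must also stay below 2**31 for this to be unambiguous.
    if word & NOTIFY_BIT:
        return ("notify", word & (NOTIFY_BIT - 1))
    return ("data", word)
```

The second option costs one bit of tag space. It also works only if ordinary 4-byte data values stay below 2**31. That is exactly the kind of application-side convention Roy objects to making mandatory.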

There seems to be agreement that a consolidated write-and-send
call would enable the application to get the benefits of
rdma write with immediate whenever the application could
distinguish the two.

As far as I can see, doing this is almost free for virtually
all applications, and trivial for the remainder. Adding
and documenting an extra call to deal with such an
extreme corner case that is being presented only in
the abstract is just not justified. This extra capability
has to have enough functionality for enough applications
to justify keeping it on the books, writing test cases
for it, etc.

We already made a similar decision in having a 128-bit
IA Address. That means we cannot support a host that
interfaces to the Internet with IPv6 and an InfiniBand
network that not only had global GIDs, but allocated
a global subnetwork a network id that was already in
use as a valid public IPv6 network.

The complexity of dealing with an IA Address that was
128+1 bits was simply not justified to deal with
an extreme corner case that could very easily be
avoided (there is no shortage of site local network
IDs in the IPv6/GID format, so using a global network
prefix that was disjoint from the official IPv6 
hierarchy would be just plain silly).

So far I haven't seen any explanation as to why an
application has a need to encode this 33rd bit of
their message in this terribly transport specific
manner. Is there some severe performance penalty
to slightly restructuring the send message so that
it is no longer ambiguous with the immediate data?



Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Michael Krause


At 03:36 PM 2/8/2006, Arlin Davis wrote:
Roland Dreier wrote:
 Michael So, here we have a long discussion on attempting to
 Michael perpetuate a concept that is not universal across
 Michael transports and was deemed to have minimal value that most
 Michael wanted to see removed from the architecture.
But this discussion is being driven by an application developer who
does see value in immediate data.
Arlin, can you quantify the benefit you see from RDMA write with
immediate vs. RDMA write followed by a send?

We need speed and simplicity.
A very latency sensitive application that requires immediate notification
of RDMA write completion on the remote node without ANY latency penalties
associated with combining operations, HCA priority rules across QPs, wire
congestion, etc. An application that has no requirement for messaging
outside of remote rdma write completion notifications. The application
would not have to register and manage additional message buffers on
either side, we can just size the queues accordingly and post zero byte
messages. We need something that would be equivalent to setting up
polling on the last byte of inbound data. But, since data ordering within
an operation is not guaranteed, that is not an option. So, RDMA write with
immediate data is the most optimal and simplest method for indication
of RDMA-write completion that we have available today. In fact, I would
like to see it increased in size to make it even more
useful.
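Arlin rules out polling on the last byte, and a toy replay of placement orders shows why. The sketch assumes only what he states: placement order within one RDMA Write is not guaranteed. The two orders below were chosen by hand and do not come from any real adapter. `last_byte_poll_is_safe` is a hypothetical helper.

```python
from typing import Sequence


def last_byte_poll_is_safe(src: bytes, placement_order: Sequence[int]) -> bool:
    """Replay one placement order of an RDMA Write into a zeroed buffer.

    A poller spinning on the final byte treats "last byte == expected" as
    "transfer complete". Return False if that test could ever fire while
    some earlier byte is still missing. src must use non-zero bytes so a
    placed byte is distinguishable from the zero-filled buffer."""
    dst = bytearray(len(src))
    for i in placement_order:
        dst[i] = src[i]
        if dst[-1] == src[-1] and bytes(dst) != src:
            return False  # the poller would consume incomplete data
    return True


data = b"\x01\x02\x03\x04"
in_order = last_byte_poll_is_safe(data, [0, 1, 2, 3])
tail_first = last_byte_poll_is_safe(data, [3, 0, 1, 2])
```

A single legal order that places the tail first is enough to break the scheme. This is why the thread treats a separate completion indication (immediate data, or a following Send) as necessary.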
RDMA Write with Immediate is part of the IB Extended Transport
Header. It is a fixed-sized quantity and not one subject to change,
i.e. increasing its size.
Your argument above reinforces that the particular application need is
IB-specific and thus should not be part of a general API but a
transport-specific API. If the application will only operate
optimally using immediate data, then it is only suitable for an IB
fabric. This reinforces the need for a transport-specific
API.
Those applications that simply want to enable completion notification
when a RDMA Write has occurred can use a general purpose API that is
interconnect independent and whose code is predicated upon a RDMA Write -
Send set of operations. This will enable application portability
across all interconnect types.
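Mike describes a portable alternative: an RDMA Write followed by a Send. It relies on the receiver seeing the Send only after the Write's data has been placed. Below is a toy Python model of that ordering. `Peer`, `deliver` and `write_then_send` are hypothetical names, and the strictly in-order wire is an assumption that stands in for the transport's ordering rules.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Peer:
    memory: bytearray                                # registered target region
    recv_events: list = field(default_factory=list)  # receive completions seen


def write_then_send(wire: deque, offset: int, data: bytes, tag: int) -> None:
    # Portable pattern: two ordinary operations, nothing transport specific.
    wire.append(("rdma_write", offset, data))
    wire.append(("send", {"tag": tag, "len": len(data)}))


def deliver(peer: Peer, wire: deque) -> None:
    # Processing strictly in posting order models the assumption that a Send
    # posted after an RDMA Write is not reported before the Write is placed.
    while wire:
        op = wire.popleft()
        if op[0] == "rdma_write":
            _, offset, data = op
            peer.memory[offset:offset + len(data)] = data
        elif op[0] == "send":
            peer.recv_events.append(op[1])


peer = Peer(memory=bytearray(8))
wire: deque = deque()
write_then_send(wire, offset=2, data=b"abc", tag=9)
deliver(peer, wire)
```

When the receive event arrives, the written bytes are already in place. The cost compared with native immediate data is a second operation and a receive buffer for the Send payload.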
Mike


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Larsen, Roy K
 Hmm.  Can you put a number on how much better RDMA write with
 immediate is on current HCA hardware?  How does using the underlying
 OpenIB verbs ability to post a list of work requests compare (ie
 posting an RDMA write followed by a send in one verbs call)?
 Maybe post multiple is a better direction for DAT.


 With post multiple, unlike immediate data, you don't have the
 ability to distinguish between a normal receive and an RDMA
 write completion indication on the other end. This is the
 uniqueness of the service that cannot be provided by the post
 multiple. Yes, post multiple would be a nice option for DAT;
 it is just a different service. It would also be required to
 conform to the semantic rules of the bundled operations so
 you could not do any optimization tricks under the covers
 with an IB rdma_write_immediate operation.


A post_multiple also requires defining a single DTO data
structure. If the post multiple is atomic (meaning all make
it or none do) then it requires an intermediate data structure
to have been created. If it is not atomic there really isn't
any reason for it not to just be a utility function layered
above DAT.

That is a very good point.  And since the emulated immediate data service
can't make the atomic guarantee, it is the killer argument for just
making the service plain - a potentially more efficient write/send.


What I'm not seeing with the immediate is this urgent need
by the application to be able to use the same 32-bit value
for both an immediate and a 4 byte message that requires
an entire additional API just to support it.  Why can't
the application just add a bool to the send message?
Or encode the 32-bits so that they come from disjoint
domains?

Some applications can do as you suggest.  Some applications can make
good use of unambiguous indications where the buffer size, content, or
arrival timing is not constrained.  Some don't need write notification
at all.  What's your point?


There seems to be agreement that a consolidated write-and-send
call would enable the application to get the benefits of
rdma write with immediate whenever the application could
distinguish the two.

Well, I think there is agreement that *some* applications can use
write-and-send in a beneficial way.  But then again, nothing prevents
them from doing that now.  They do not need an additional API.  But
again, I don't have an issue with defining a helper function.  I do have
an issue with defining an API and semantic that says the target side
needs to be coded in a way to always deal with both true immediate
data and emulation.  Just define a write/send helper API and the ULP can
be coded in a consistent manner if that is a beneficial service.  If a
true unambiguous indication service is more beneficial or required, it
can use the extension and accept the extra complexity.  To demand extra
complexity in applications that obviously don't need the true immediate
data semantic is just wrong in my opinion.


As far as I can see, doing this is almost free for virtually
all applications, and trivial for the remainder. Adding
and documenting an extra call to deal with such an
extreme corner case that is being presented only in
the abstract is just not justified. This extra capability
has to have enough functionality for enough applications
to justify keeping it on the books, writing test cases
for it, etc.

All we're asking is that a write/send combined API not be called
immediate data unless it fits the semantics of immediate data.  I am
puzzled at the resistance this is getting.  There is a standards body
specification for immediate data.  If it is not followed, don't call it
immediate data.  It's that simple.  For those transports that can
provide the service, the ULP may be able to gain access to it through an
extension.

Roy


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady
Why both the Immediate Data and the STag that was used for the RDMA Write?
The Immediate Data already identifies which operation the locally
completed RDMA Write responds to.

The STag would make sense if STag invalidation were also put in the mix.

But for MPI, RMR_contexts have a long lifecycle, so it is not clear which
apps would be interested in combining Invalidation with RDMA Write with
Immediate Data.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16   Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 3:03 PM
 To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin 
 Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  Caitlin Bestler wrote:
  
  Arlin Davis wrote:
  Sean Hefty wrote:
  
  The requirement is to provide an API that supports RDMA writes 
  with immediate data.  A send that follows an RDMA write is not 
  immediate data, and the API should not be constructed around 
  trying to make it so.
  
  
  
  To be clear, I believe that write with immediate should be part of
  the normal APIs, rather than an extension, but should be designed
  around those devices that provide it natively.
  
  
  I totally agree. A standard RDMA write with immediate API can be 
  very useful to RDMA applications based on the requirements (native
  support) set forth in my earlier email. It is analogous to the new
  dat_ep_post_send_with_invalidate() call; a call that supports a 
  native iWARP transport operation but provides no provisions to help
  other transports emulate. So, other transports simply return
  NOT_SUPPORTED and add it natively in the future if it makes sense.
  
  -arlin
  
  What is proposed is a definition of
  'dat_ep_post_rdma_write_with_immediate'
  that can be implemented over iWARP using the sequence of messages
  that were intended to support the same purpose (i.e., letting the
  other side know that an RDMA Write transfer has been fully received).
  
  No, iWARP *CAN NOT* implement write immediate data any better than IB
  can implement send with invalidate.  Immediate data
  *MUST* be indicated to the ULP unambiguously.  Imposing an algorithm
  on the application to infer immediate data arrival is a hack, pure and
  simple. An application is free to perform a write/send if that is the
  semantic they want.  Why does iWARP get transport unique APIs but not
  IB?  I find this attempt to bastardize the IB semantic of immediate
  data a little curious.
  
 
 The transports aren't getting anything. Features are there 
 for applications, especially when the feature can be defined 
 in a way that makes sense without explaining transport mechanics.
 
 Completing a transaction, complete with supplying a 
 transaction response and releasing the advertised STag 
 associated with the transaction is something that makes sense 
 in the application domain and conforms to normal DAT ordering rules.
 
 Provide information about an RDMA Write to a receive operation
 also meets that definition -- as long as it conforms to the 
 existing ordering rules. Shifting to an 8 byte message over 
 iWARP to allow for the write length *and* immediate 'tag'
 is certainly doable. We could even consider having the DAT 
 Provider supply the 'buffer' silently in the DTO itself.
 
 With that definition the consumer would get a receive 
 completion that told them that their peer's RDMA Write had 
 been successfully placed, how long it is (the length) and 
 which one (a tag).
 
 I think that is of value. iWARP can implement it as two work 
 requests and maintain the overall semantics.
 
 Are you arguing that iWARP should NOT provide this service 
 until it can do it in a single work request? It seems to me 
 that allowing an extra work request and completion is a 
 fairly simple accommodation as opposed to using an alternate 
 algorithm in the main transaction processing of the application.
 
 If we enable the application to query how a remote write with 
 immediate will complete outside of the transaction loop, then 
 we can allow the application to have *no* overhead inside the 
 main transaction loop, and *identical* logic on the sending side.
 
 And IB *could* implement send with invalidate by simply 
 agreeing on how the RKey to be invalidated is communicated 
 between the IB providers (perhaps as an immediate).
 
 But more to the point, I don't see how the more flexible 
 definition of write with immediate negatively impacts the IB 
 implementation of the feature. IB providers do not need to 
 allow for the extra work requests. They are not being asked 
 to place the immediate data into the receive buffer, or to do 
 any extra work at all.
 
 
 
  

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady
Caitlin,
can you clarify this.
Are you proposing that the Consumer encode a bit of the Immediate Data to
specify that it is immediate data?
iWARP would pass it in a Send message and IB in the Immediate Data.

 

 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 2:40 PM
 To: Arlin Davis; Roland Dreier
 Cc: [EMAIL PROTECTED]; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  Roland Dreier wrote:
  
  
  Hmm.  Can you put a number on how much better RDMA write with
  immediate is on current HCA hardware?  How does using the underlying
  OpenIB verbs ability to post a list of work requests compare (ie
  posting an RDMA write followed by a send in one verbs call)?
  Maybe post multiple is a better direction for DAT.
  
  
  With post multiple, unlike immediate data, you don't have the ability
  to distinguish between a normal receive and an RDMA write completion
  indication on the other end. This is the uniqueness of the service
  that cannot be provided by the post multiple. Yes, post multiple would
  be a nice option for DAT; it is just a different service. It would also
  be required to conform to the semantic rules of the bundled
  operations so you could not do any optimization tricks under the
  covers with an IB rdma_write_immediate operation.
  
 
 A post_multiple also requires defining a single DTO data 
 structure. If the post multiple is atomic (meaning all make 
 it or none do) then it requires an intermediate data 
 structure to have been created. If it is not atomic there 
 really isn't any reason for it not to just be a utility function 
 layered above DAT.
 
 What I'm not seeing with the immediate is this urgent need by 
 the application to be able to use the same 32-bit value for 
 both an immediate and a 4 byte message that requires an 
 entire additional API just to support it.  Why can't the 
 application just add a bool to the send message?
 Or encode the 32-bits so that they come from disjoint domains?
 
 There seems to be agreement that a consolidated 
 write-and-send call would enable the application to get the 
 benefits of rdma write with immediate whenever the 
 application could distinguish the two.
 
 As far as I can see, doing this is almost free for virtually all 
 applications, and trivial for the remainder. Adding and 
 documenting an extra call to deal with such an extreme corner 
 case that is being presented only in the abstract is just not 
 justified. This extra capability has to have enough 
 functionality for enough applications to justify keeping it 
 on the books, writing test cases for it, etc.
 
 We already made a similar decision in having a 128-bit IA 
 Address. That means we cannot support a host that interfaces 
 to the Internet with IPv6 and an InfiniBand network that not 
 only had global GIDs, but allocated a global subnetwork a 
 network id that was already in use as a valid public IPv6 network.
 
 The complexity of dealing with an IA Address that was
 128+1 bits was simply not justified to deal with
 an extreme corner case that could very easily be avoided 
 (there is no shortage of site local network IDs in the 
 IPv6/GID format, so using a global network prefix that was 
 disjoint from the official IPv6 hierarchy would be just plain silly).
 
 So far I haven't seen any explanation as to why an 
 application has a need to encode this 33rd bit of their 
 message in this terribly transport specific manner. Is there 
 some severe performance penalty to slightly restructuring the 
 send message so that it is no longer ambiguous with the 
 immediate data?
 
 
 
  


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 Hmm.  Can you put a number on how much better RDMA write with
 immediate is on current HCA hardware?  How does using the
 underlying OpenIB verbs ability to post a list of work requests
 compare (ie posting an RDMA write followed by a send in one verbs
 call)? Maybe post multiple is a better direction for DAT.
 
 
 With post multiple, unlike immediate data, you don't have the
 ability to distinguish between a normal receive and an RDMA write
 completion indication on the other end. This is the uniqueness of
 the service that cannot be provided by the post multiple. Yes, post
 multiple would be a nice option for DAT; it is just a different
 service. It would also be required to conform to the semantic
 rules of the bundled operations so you could not do any
 optimization tricks under the covers with an IB
 rdma_write_immediate operation. 
 
 
 A post_multiple also requires defining a single DTO data structure.
 If the post multiple is atomic (meaning all make it or none do) then
 it requires an intermediate data structure to have been created. If
 it is not atomic there really isn't any reason for it not to just be a
 utility function layered above DAT.
 
 That is a very good point.  And since the emulated immediate
 data service can't make the atomic guarantee, it is the killer
 argument for just making the service plain - a potentially more
 efficient write/send. 
 
 
 What I'm not seeing with the immediate is this urgent need by the
 application to be able to use the same 32-bit value for both an
 immediate and a 4 byte message that requires an entire additional API
 just to support it.  Why can't the application just add a bool to
 the send message? Or encode the 32-bits so that they come from
 disjoint domains? 
 
 Some applications can do as you suggest.  Some applications
 can make good use of unambiguous indications where the buffer
 size, content, or arrival timing is not constrained.  Some
 don't need write notification at all.  What's your point?
 
 
 There seems to be agreement that a consolidated write-and-send call
 would enable the application to get the benefits of rdma write with
 immediate whenever the application could distinguish the two.
 
 Well, I think there is agreement that *some* applications can
 use write-and-send in a beneficial way.  But then again,
 nothing prevents them from doing that now.  They do not need
 an additional API.  But again, I don't have an issue with
 defining a helper function.  I do have an issue with defining
 an API and semantic that says the target side needs to be
 coded in a way to always deal with both true immediate data
 and emulation.  Just define a write/send helper API and the
 ULP can be coded in a consistent manner if that is a
 beneficial service.  If a true unambiguous indication service
 is more beneficial or required, it can use the extension and
 accept the extra complexity.  To demand extra complexity in
 applications that obviously don't need the true immediate
 data semantic is just wrong in my opinion.
 
 
 As far as I can see, doing this is almost free for virtually all
 applications, and trivial for the remainder. Adding and documenting
 an extra call to deal with such an extreme corner case that is being
 presented only in the abstract is just not justified. This extra
 capability has to have enough functionality for enough applications
 to justify keeping it on the books, writing test cases for it, etc.
 
 All we're asking is that a write/send combined API not be
 called immediate data unless it fits the semantics of
 immediate data.  I am puzzled at the resistance this is
 getting.  There is a standards body specification for
 immediate data.  If it is not followed, don't call it
 immediate data.  It's that simple.  For those transports that
 can provide the service, the ULP may be able to gain access to it
 through an extension. 
 

I have no objection to calling this
dat_ep_post_rdma_write_with_notifier
and labelling the 32-bit data as a notifier tag.

Even on iWARP transports small send data can be in-lined,
avoiding the need for buffers to be registered. A special
API where the length of the send buffer is known in 
advance makes this even easier.

What I still fail to see is a rationale that works down
from the application layer on why an application would
need still one more page in their cookbook. Creating an
entire new method to enable a strange method of signalling
one bit of information to the other end doesn't seem like
much of a payoff to me.
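Here is a minimal sketch of how a provider might lower the `dat_ep_post_rdma_write_with_notifier` call Caitlin suggests. It assumes the provider has a flag saying whether the transport supports immediate data natively. The function name mirrors her proposal, but everything else is hypothetical: the signature, the returned tuples and the 8-byte (tag, length) notifier layout. Real DAT and verbs calls look different.

```python
import struct
from typing import List, Tuple


def post_rdma_write_with_notifier(native_imm: bool, remote_addr: int,
                                  data: bytes, tag: int) -> List[Tuple]:
    """Return the work requests a provider might post (hypothetical lowering).

    The op names are illustrative labels, not verbs or DAT constants."""
    if not 0 <= tag <= 0xFFFFFFFF:
        raise ValueError("the notifier tag is a 32-bit value")
    if native_imm:
        # IB: a single work request; the tag rides in the immediate-data field.
        return [("rdma_write_with_imm", remote_addr, data, tag)]
    # Otherwise: the Write, then a small inline Send carrying (tag, length),
    # so the sender does not need a registered buffer for the notifier.
    notifier = struct.pack("!II", tag, len(data))
    return [("rdma_write", remote_addr, data), ("send_inline", notifier)]
```

The caller's code is identical on both transports; only the provider's lowering differs. The receiving side still needs some convention to recognize the inline Send, which is the point Roy and Caitlin disagree about.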



RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 Caitlin,
 can you clarify this.
 Are you proposing that Consumer encode a bit of Immediate
 Data to specify that it is immediate data?
 iWARP will pass it in Send message and IB in Immediate Data.
 

If we agreed that there was some acute need for this 33rd
bit coming down from the application layer then creating an
iWARP untagged message that encoded the first 32 bits, the
length of the RDMA write and the magic bonus bit would 
indeed be a possible solution.

I am skeptical that there is a true application derived
need for this bonus bit that justifies the complexity
required to document it.

If the application only needs this bonus bit when running
over IB then it really doesn't need it at all.
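Caitlin's fallback is an iWARP untagged message carrying three things: the 32-bit value, the RDMA Write length and the bonus bit. On the receive side it could look like the sketch below. The 9-byte layout and the `normalize` helper are hypothetical. The flag is unambiguous only if the ULP never sends an ordinary 9-byte message with bit 0 set, which is exactly the kind of convention under dispute.

```python
import struct
from typing import Optional, Tuple

# Hypothetical wire layout for the emulated notification (not from any spec):
#   4 bytes  tag        (the 32 bits IB would carry as immediate data)
#   4 bytes  write_len  (length of the RDMA Write being announced)
#   1 byte   flags      (bit 0 = "this is a write notification": the bonus bit)
NOTIFY_FMT = "!IIB"
FLAG_WRITE_NOTIFY = 0x01


def encode_notice(tag: int, write_len: int) -> bytes:
    return struct.pack(NOTIFY_FMT, tag, write_len, FLAG_WRITE_NOTIFY)


def normalize(imm_data: Optional[int], write_len: int,
              payload: bytes) -> Optional[Tuple[int, int]]:
    """Return (tag, write_len) for a write notification, else None.

    imm_data and write_len describe a native IB completion; payload is the
    receive-buffer contents used by the emulated path."""
    if imm_data is not None:
        return imm_data, write_len
    if len(payload) == struct.calcsize(NOTIFY_FMT):
        tag, length, flags = struct.unpack(NOTIFY_FMT, payload)
        if flags & FLAG_WRITE_NOTIFY:
            return tag, length
    return None
```

With a normalizer like this the application's main loop sees the same event from either transport. The ambiguity has not gone away, though; it has moved into the message-format convention.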



RE: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady



Mike,
but then the combined operation can just as easily be handled
by a "multiple post operation".
What is the need for a specific transport-independent RDMA
Write with immediate data?

I am still concerned about the need for the Consumer's Recv side
to separate the recv of Immediate Data from a "regular" Recv.
The Consumer "knows" what it expects to match the posted Recv.
There is a one-to-one mapping between the non-pure-RDMA
transfer ops of one side and the Recvs of the other. Sure, a ULP
may use the same size buffers for all. But how many ULPs mix
Immediate Data size messages (4 bytes on IB) with normal
Sends of the same exact size?

Arkady






Arkady Kanevsky 
email: [EMAIL PROTECTED]
Network 
Appliance Inc. 
phone: 781-768-5395
1601 
Trapelo Rd. - Suite 16.Fax: 
781-895-1195
Waltham, MA 
02451 
central phone: 781-768-5300


  
  
  From: Michael Krause [mailto:[EMAIL PROTECTED]
  Sent: Thursday, February 09, 2006 3:25 PM
  To: Arlin Davis
  Cc: [EMAIL PROTECTED]; openib-general@openib.org
  Subject: Re: [dat-discussions] [openib-general][RFC] DAT2.0immediatedataproposal

  At 03:36 PM 2/8/2006, Arlin Davis wrote:
  Roland Dreier wrote:
    Michael So, here we have a long discussion on attempting to
    Michael perpetuate a concept that is not universal across
    Michael transports and was deemed to have minimal value that most
    Michael wanted to see removed from the architecture.

  But this discussion is being driven by an application developer who
  does see value in immediate data.

  Arlin, can you quantify the benefit you see from RDMA write with
  immediate vs. RDMA write followed by a send?

  We need speed and simplicity.

  A very latency sensitive application that requires immediate notification
  of RDMA write completion on the remote node without ANY latency penalties
  associated with combining operations, HCA priority rules across QPs, wire
  congestion, etc. An application that has no requirement for messaging
  outside of remote rdma write completion notifications. The application
  would not have to register and manage additional message buffers on either
  side, we can just size the queues accordingly and post zero byte messages.
  We need something that would be equivalent to setting their polling on the
  last byte of inbound data. But, since data ordering within an operation is
  not guaranteed that is not an option. So, rdma with immediate data is the
  most optimal and simplistic method for indication of RDMA-write completion
  that we have available today. In fact, I would like to see it increased in
  size to make it even more useful.

  RDMA Write with Immediate is part of the IB Extended Transport Header. It
  is a fixed-sized quantity and not one subject to change, i.e. increasing
  its size.

  Your argument above reinforces that the particular application need is
  IB-specific and thus should not be part of a general API but a
  transport-specific API. If the application will only operate optimally
  using immediate data, then it is only suitable for an IB fabric. This
  reinforces the need for a transport-specific API.

  Those applications that simply want to enable completion notification when
  a RDMA Write has occurred can use a general purpose API that is
  interconnect independent and whose code is predicated upon a RDMA Write -
  Send set of operations. This will enable application portability across
  all interconnect types.

  Mike


RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady
Roy,
and what if tomorrow iWARP decides to support Immediate Data with variable
length? The API does not change, the semantic does not change, and IB
will not be able to support it.

I am trying to define a semantic and API that will not have to be
modified for each rev of the transport.


 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 3:32 PM
 To: [EMAIL PROTECTED]; Arlin Davis; Roland Dreier
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] 
 [RFC]DAT2.0immediatedataproposal
 
  Hmm.  Can you put a number on how much better RDMA write with
  immediate is on current HCA hardware?  How does using the underlying
  OpenIB verbs ability to post a list of work requests compare (ie
  posting an RDMA write followed by a send in one verbs call)?
  Maybe post multiple is a better direction for DAT.
 
 
  With post multiple, unlike immediate data, you don't have the ability
  to distinguish between a normal receive and a rdma write completion
  indication on the other end. This is the uniqueness of the service
  that cannot be provided by the post multiple. Yes, post multiple
  would be a nice option for DAT; it is just a different service. It
  would also be required to conform to the semantics rules of the
  bundled operations so you could not do any optimization tricks under
  the covers with an IB rdma_write_immediate operation.
 
 
 A post_multiple also requires defining a single DTO data structure.
 If the post multiple is atomic (meaning all make it or none do) then
 it requires an intermediate data structure to have been created. If
 it is not atomic there really isn't any reason for it to not just be
 a utility function layered above DAT.

 That is a very good point.  And since the emulated immediate
 data service can't make the atomic guarantee, it is the killer
 argument for just making the service plain - a potentially
 more efficient write/send.
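The non-atomic case can be illustrated with a post-multiple utility layered above a single-post primitive. This is a minimal C sketch; `post_multiple`, `post_one_fn` and the toy endpoint are hypothetical names, not DAT calls. Without atomicity the caller only learns how many operations were posted, so a Write can end up posted while the Send behind it is not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-operation post routine (0 on success), standing in for
 * whatever per-DTO post call the underlying API provides. */
typedef int (*post_one_fn)(void *ep, const void *op);

/* Post ops[0..n-1] in order and stop at the first failure. This is NOT
 * atomic: operations posted before the failure stay posted. Returns how many
 * were posted so the caller can decide how to recover. */
size_t post_multiple(void *ep, const void *const ops[], size_t n, post_one_fn post_one)
{
    size_t i;

    assert(post_one != NULL);
    for (i = 0; i < n; i++) {
        if (post_one(ep, ops[i]) != 0)
            break;
    }
    return i;
}

/* Toy endpoint for exercising post_multiple: a queue with free slots. */
struct toy_ep {
    int free_slots;
    int posted;
};

int toy_post(void *ep, const void *op)
{
    struct toy_ep *e = ep;

    (void)op;               /* the toy ignores operation contents */
    if (e->free_slots == 0)
        return -1;          /* queue full: this op is not posted */
    e->free_slots--;
    e->posted++;
    return 0;
}
```

With one free slot, the Write is posted and the Send is not, which is exactly the intermediate state an atomic post-multiple would have to rule out.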
 
 
 What I'm not seeing with the immediate is this urgent need by the
 application to be able to use the same 32-bit value for both an
 immediate and a 4 byte message that requires an entire additional
 API just to support it.  Why can't the application just add a bool
 to the send message?
 Or encode the 32-bits so that they come from disjoint domains?
 
 Some applications can do as you suggest.  Some applications 
 can make good use of unambiguous indications where the buffer 
 size, content, or arrival timing is not constrained.  Some 
 don't need write notification at all.  What's your point?
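One way to read "disjoint domains" is to reserve one bit of the 32-bit value as a discriminator. This is a minimal C sketch, assuming the sender can give up the high bit; `DOMAIN_MSG_BIT` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical convention: the high bit tells the receiver which domain a
 * 32-bit value belongs to, so each domain keeps 31 bits of range. */
#define DOMAIN_MSG_BIT 0x80000000u

/* Value announcing that an RDMA Write completed (high bit clear). */
uint32_t encode_write_tag(uint32_t tag)
{
    assert((tag & DOMAIN_MSG_BIT) == 0);
    return tag;
}

/* Ordinary 4-byte application message (high bit set). */
uint32_t encode_message(uint32_t msg)
{
    assert((msg & DOMAIN_MSG_BIT) == 0);
    return msg | DOMAIN_MSG_BIT;
}

bool is_message(uint32_t v)
{
    return (v & DOMAIN_MSG_BIT) != 0;
}

uint32_t payload_of(uint32_t v)
{
    return v & ~DOMAIN_MSG_BIT;
}
```

The cost is one bit of range in each domain; whether an application can afford that is the application-level question being argued here.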
 
 
 There seems to be agreement that a consolidated write-and-send call 
 would enable the application to get the benefits of rdma write with 
 immediate whenever the application could distinguish the two.
 
 Well, I think there is agreement that *some* applications can 
 use write-and-send in a beneficial way.  But then again, 
 nothing prevents them from doing that now.  They do not need 
 an additional API.  But again, I don't have an issue with 
 defining a helper function.  I do have an issue with defining 
 an API and semantic that says the target side needs to be 
 coded in a way to always deal with both true immediate data 
 and emulation.  Just define a write/send helper API and the 
 ULP can be coded in a consistent manner if that is a
 beneficial service.  If a true unambiguous indication service 
 is more beneficial or required, it can use the extension and 
 accept the extra complexity.  To demand extra complexity in 
 applications that obviously don't need the true immediate 
 data semantic is just wrong in my opinion.
 
 
 I cannot see why doing this is almost free for virtually all
 applications, and trivial for the remainder. Adding and documenting
 an extra call to deal with such an extreme corner case that is being
 presented only in the abstract is just not justified. This extra
 capability has to have enough functionality for enough applications
 to justify keeping it on the books, writing test cases for it, etc.
 
 All we're asking is that a write/send combined API not be 
 called immediate data unless it fits the semantics of 
 immediate data.  I am puzzled at the resistance this is 
 getting.  There is a standards body specification for 
 immediate data.  If it is not followed, don't call it 
 immediate data.  It's that simple.  For those transports that 
 can provide the service, the ULP may be able to gain access
 to it through an extension.
 
 Roy

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Larsen, Roy K
 All we're asking is that a write/send combined API not be
 called immediate data unless it fits the semantics of
 immediate data.  I am puzzled at the resistance this is
 getting.  There is a standards body specification for
 immediate data.  If it is not followed, don't call it
 immediate data.  It's that simple.  For those transports that
 can provide the service, the ULP may be able to gain access to it
 through an extension.


I have no objection to calling this
dat_ep_post_rdma_write_with_notifier
and labelling the 32-bit data as a notifier tag.

If this MUST be implemented by the provider as a (possibly optimized)
write followed by a send, that sounds good to me.  All transports can
support it and provide the same semantic.  No need for application
schism.  However, I wouldn't place a restriction on the size of the
notifier tag.  Somewhere along the line, the send data has to reside in
a registered buffer.  Might as well have the ULP supply it and let it
define the contents and size.
 

Even on iWARP transports small send data can be in-lined,
avoiding the need for buffers to be registered. A special
API where the length of the send buffer is known in
advance makes this even easier.

Ah, I wasn't aware iWARP could carry inline data.  I take it that's not
possible on an iWARP RDMA write PDU however.


What I still fail to see is a rationale that works down
from the application layer on why an application would
need still one more page in their cookbook. Creating an
entire new method to enable a strange method of signalling
one bit of information to the other end doesn't seem like
much of a payoff to me.

Of course the semantics are much more than signaling one bit.
Nevertheless, if the contention is that applications don't need that
bit, that all they need are write/send semantics, then by all means,
simply define an API that gives them that and this thread is closed.
Provider writers for transports that can supply a true immediate data
service would be free to waste their time supplying an unused service
through an extension.  But that business decision should be left to the
provider writer, not his mailing list.

Roy


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Caitlin Bestler
Larsen, Roy K wrote:

 
 Even on iWARP transports small send data can be in-lined, avoiding
 the need for buffers to be registered. A special API where the
 length of the send buffer is known in advance makes this even
 easier. 
 
 Ah, I wasn't aware iWARP could carry inline data.  I take it
 that's not possible on an iWARP RDMA write PDU however.
 

On the wire the data is always in-line, only an RDMA Read
Request references data that is not part of the message.
The iWARP protocols do not specify much about the local
interface. That role has been taken by the RDMAC verbs
and RNIC-PI so far. 

The standard functionality defined in the RDMAC verbs does not
mandate support for an Inline Send work request. Neither do the
IBTA verbs. The option shows up in APIs, and in firmware,
because it is a valuable optimization that improves latency
in the Device/host exchange independent of the wire protocol.

In the vast majority of cases, the user verbs can implement 
inline sends very easily whenever the data is shorter than
the SGL would have been.
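The inline rule of thumb can be sketched as a simple check. This is a hedged illustration, assuming a 16-byte SGE shaped like a typical verbs scatter/gather entry plus a provider-specific inline cap (libibverbs, for example, reports such a cap as `max_inline_data` in `struct ibv_qp_cap`); `struct sge` and `should_inline` here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather entry shaped like a typical verbs SGE:
 * address, length and memory key (16 bytes on common ABIs). */
struct sge {
    uint64_t addr;
    uint32_t length;
    uint32_t key;
};

/* Total number of payload bytes described by the SGL. */
static size_t sgl_payload(const struct sge *sgl, size_t nsge)
{
    size_t total = 0;

    for (size_t i = 0; i < nsge; i++)
        total += sgl[i].length;
    return total;
}

/* Copy the data into the work request instead of referencing registered
 * memory when it fits the provider's inline cap and is no larger than the
 * SGL that would otherwise describe it. */
bool should_inline(const struct sge *sgl, size_t nsge, size_t max_inline)
{
    size_t payload;

    assert(sgl != NULL || nsge == 0);
    payload = sgl_payload(sgl, nsge);
    return payload <= max_inline && payload <= nsge * sizeof(struct sge);
}
```

A 4-byte notifier fits easily; the inlined bytes then live in the send queue entry itself, which is the sense in which no *separate* registered buffer is needed.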

So in the sense that you can view the SQ itself as a registered
buffer then it is true. But there is no need for a *separate*
registered buffer.




RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

2006-02-09 Thread Larsen, Roy K
Roy,
and what if tomorrow iWARP decides to support Immediate Data with variable
length? The API does not change, the semantic does not change, and IB
will not be able to support it.

I am trying to define a semantic and API that will not have to be
modified for each rev of the transport.

Arkady,

Simply define the API as all the parameters needed to do an RDMA write
followed by a send.  This is semantically all that many seem to believe
is required.  I would not restrict the size or contents of the send
buffer supplied by the ULP.  Could even be a zero length buffer just to
trigger the receive completion.  Don't try to make the operation any
more magical than that.  All transports can implement it consistently
and the ULP can handle it consistently too. I can't see how anyone could
object to that API since it is providing the service desired
consistently among all transports.
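The proposal above can be sketched as one parameter block plus a call that posts the Write and then the Send. This is a minimal illustration, assuming hypothetical names (`write_send_params`, `post_write_then_send`) and a toy provider that only records operation order; it is not a DAT API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block for a combined "RDMA Write, then Send" call:
 * everything the Write needs plus a ULP-supplied send buffer of any size
 * (length 0 is allowed and simply triggers a receive completion). */
struct write_send_params {
    const void *write_src;
    size_t      write_len;
    uint64_t    remote_addr;
    uint32_t    remote_key;
    const void *send_buf;
    size_t      send_len;
};

enum op_kind { OP_RDMA_WRITE, OP_SEND };

/* Toy provider that only records the order of posted operations. */
struct toy_provider {
    enum op_kind log[8];
    size_t       count;
};

static int record_op(struct toy_provider *p, enum op_kind k)
{
    if (p->count == sizeof p->log / sizeof p->log[0])
        return -1;
    p->log[p->count++] = k;
    return 0;
}

/* The contract is ordering: the Write is issued before the Send, so under
 * the transport's ordering rules the peer's receive completion for the Send
 * follows the Write. A real provider may optimize how it issues the pair. */
int post_write_then_send(struct toy_provider *p, const struct write_send_params *wp)
{
    assert(wp->send_len == 0 || wp->send_buf != NULL);
    if (record_op(p, OP_RDMA_WRITE) != 0)
        return -1;
    return record_op(p, OP_SEND);
}
```

Because the send buffer and its length come from the ULP, the same call covers a zero-length trigger, a 4-byte tag, or a larger message.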

That said, I am not conceding that this service is the equivalent to IB
RDMA write with immediate data and want to see a general extension API
added for this and any future transport service that won't be supported
by the DAPL API. 

Roy






Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Arlin Davis

Michael Krause wrote:

RDMA Write with Immediate is part of the IB Extended Transport 
Header.  It is a fixed-sized quantity and not one subject to change, 
i.e. increasing its size.


Your argument above reinforces that the particular application need is 
IB-specific and thus should not be part of a general API but a 
transport-specific API.   If the application will only operate 
optimally using immediate data, then it is only suitable for an IB 
fabric.  This reinforces the need for a transport-specific API.


I agree. I will move the IB immediate data service back into the 
extension interface and update the OpenIB uDAPL provider patch.




Those applications that simply want to enable completion notification 
when a RDMA Write has occurred can use a general purpose API that is 
interconnect independent and whose code is predicated upon a RDMA 
Write - Send set of operations.  This will enable application 
portability across all interconnect types.


I will defer this to Arkady to draft.

-arlin


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-09 Thread Kanevsky, Arkady
Arlin,
This can be done.

But I have an issue: the extension call violates the Transport
Requirements. Currently, the matching semantic is well-defined since
a Recv only matches a Send. Since the Spec does not have any idea what
operations are defined in extension(s), there is a problem
with the transport requirements. We can, of course,
make some generic statement that the requirements do not cover APIs
that are defined in extensions.

The API requirements are easier to handle, since they have been
written as non-requirements for the APIs we have not decided to define yet.
(I will need to review chapter 5 to make sure we have followed this
in all cases.)

Arkady


 -Original Message-
 From: Arlin Davis [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, February 09, 2006 5:57 PM
 To: Michael Krause
 Cc: [EMAIL PROTECTED]; 
 openib-general@openib.org; Kanevsky, Arkady
 Subject: Re: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 Michael Krause wrote:
 
  RDMA Write with Immediate is part of the IB Extended Transport Header.
  It is a fixed-sized quantity and not one subject to change, i.e.
  increasing its size.
 
  Your argument above reinforces that the particular application need is
  IB-specific and thus should not be part of a general API but a
  transport-specific API.   If the application will only operate
  optimally using immediate data, then it is only suitable for an IB
  fabric.  This reinforces the need for a transport-specific API.
 
 I agree. I will move the IB immediate data service back into 
 the extension interface and update the OpenIB uDAPL provider patch.
 
 
  Those applications that simply want to enable completion notification
  when a RDMA Write has occurred can use a general purpose API that is
  interconnect independent and whose code is predicated upon a RDMA
  Write - Send set of operations.  This will enable application
  portability across all interconnect types.
 
 I will defer this to Arkady to draft.
 
 -arlin
 


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Kanevsky, Arkady
One more issue to discuss.
Does the completion of a Recv that matches an RDMA Write with Immediate
Data automatically sync local memory, or does the Consumer still need to
call lmr_sync_rdma_write prior to accessing the RDMAed data?


 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 7:40 PM
 To: [EMAIL PROTECTED]; Larsen, Roy K; Arlin 
 Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] 
 DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
  We have problem no matter which option we choose.
  The current Transport Level Requirement state:
  
  There is a one-to-one correspondence between send operation on one
  Endpoint of the Connection and recv operations on the other Endpoint
  of the Connection.
  There is no correspondence between RDMA operations on one Endpoint of
  the Connection and recv or send data transfer operation on the other
  Endpoint of the Connection.
  Receive operations on a Connection must be completed in the order of
  posting of their corresponding sends.
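The quoted rules can be modeled as a FIFO of posted receives: each inbound Send consumes the oldest posted Recv, and a plain RDMA Write consumes none. This is a minimal C sketch with hypothetical names. On IB, an RDMA Write with Immediate also consumes one posted Recv, which is precisely where it departs from the "no correspondence between RDMA operations ... and recv" rule.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the quoted transport rules: each inbound Send consumes exactly
 * one posted Recv, in posting order; plain RDMA Writes consume none.
 * Names are hypothetical; cookies must be non-negative (-1 means "none"). */
#define RQ_DEPTH 4

struct recv_queue {
    int    cookie[RQ_DEPTH];   /* consumer IDs of posted Recvs */
    size_t head;               /* index of the oldest posted Recv */
    size_t count;              /* number of posted, unmatched Recvs */
};

int rq_post(struct recv_queue *q, int cookie)
{
    assert(cookie >= 0 && q->count <= RQ_DEPTH);
    if (q->count == RQ_DEPTH)
        return -1;
    q->cookie[(q->head + q->count) % RQ_DEPTH] = cookie;
    q->count++;
    return 0;
}

/* An inbound Send completes the oldest posted Recv and returns its cookie;
 * -1 means no Recv was posted (a real transport treats that as an error). */
int rq_match_send(struct recv_queue *q)
{
    int c;

    if (q->count == 0)
        return -1;
    c = q->cookie[q->head];
    q->head = (q->head + 1) % RQ_DEPTH;
    q->count--;
    return c;
}
```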
  
  The Immediate data and Atomic ops violate these requirements,
  including the ordering rules.

  I had started updating these rules when I generated the first draft
  of the requirements. They are included in the enclosed pdf file.
  But they do not cover Atomic ops, which also impact transport
  requirements. This chapter of the spec has not been changed since
  DAPL 1.0 and I am very concerned about any changes to it.
  
  Arkady
  
 
 If RDMA Write with Immediate is viewed as being the
 equivalent of doing an RDMA Write and then an RDMA Send, the
 correspondence rule is maintained. But *only* if the rdma
 write with immediate has all of the semantics of a Send.
 
 Atomics do not violate the rules if you view them as being a 
 variation on an RDMA Read. They are an RDMA Read with modify.
 The real question is whether it makes sense to put it in the 
 RDMA device. It is also not subject to emulation at a higher layer.
 
 With send with invalidate we know how InfiniBand *will* 
 support it, because of the IB 1.2 verbs. We do not know that 
 for atomics over iWARP. We do not know whether it will be 
 added, more importantly we do not know *how* it would be 
 added if it were added. That makes coming up with a transport 
 neutral definition very premature.
 In particular, if atomics were added to iWARP there is a 
 distinct design option where it would *not* be the same work 
 queue as RDMA Reads (adding atomics through Queue ID 3 would
 make layering on top of a current implementation much easier).
 But it would mean that atomic credits would be distinct from 
 read credits. This is a very strong reason to defer 
 attempting to define RDMA Atomics in a transport neutral fashion.
 
  
 
 
 
 
  


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 One more issue to discuss.
 Does the completion of a Recv that matches an RDMA Write with
 Immediate Data automatically sync local memory, or does the Consumer
 still need to call lmr_sync_rdma_write prior to accessing the RDMAed data?
 

Why would it be any different than for a plain receive?
The intent is the same, to indicate that prior Writes have completed.



RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Michael Krause


At 09:16 PM 2/6/2006, Sean Hefty wrote:
The requirement is to provide an API that supports RDMA writes with
immediate data. A send that follows an RDMA write is not immediate data,
and the API should not be constructed around trying to make it so.
To be clear, I believe that write with immediate should be part of the
normal APIs, rather than an extension, but should be designed around
those devices that provide it natively.

One thing to keep in mind is that the IBTA workgroup responsible for the
transport wanted to eliminate immediate data support entirely but it was
retained solely to enable VIA application migration (even though the
application base was quite small). If that requirement could have
been eliminated, then it would have been gone in a heartbeat.
Given a RDMA-WRITE followed by a SEND provides the same application
semantics based on the use models, iWARP chose not to support immediate
data. 
So, here we have a long discussion on attempting to perpetuate a concept
that is not universal across transports and was deemed to have minimal
value that most wanted to see removed from the architecture. One
has to question the value of trying to develop any API / software to
support immediate data instead of just enabling the preferred method
which is RDMA WRITE - SEND. I agree with those who have contended
that this is difficult to do in a general purpose fashion. When all
of this is taken into account, it seems the only good engineering answer
is to eliminate immediate data support in the software and focus on the
method that works across all interconnects.
Mike


Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Roland Dreier
Michael So, here we have a long discussion on attempting to
Michael perpetuate a concept that is not universal across
Michael transports and was deemed to have minimal value that most
Michael wanted to see removed from the architecture.

But this discussion is being driven by an application developer who
does see value in immediate data.

Arlin, can you quantify the benefit you see from RDMA write with
immediate vs. RDMA write followed by a send?

 - R.


Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Arlin Davis

Roland Dreier wrote:


   Michael So, here we have a long discussion on attempting to
   Michael perpetuate a concept that is not universal across
   Michael transports and was deemed to have minimal value that most
   Michael wanted to see removed from the architecture.

But this discussion is being driven by an application developer who
does see value in immediate data.

Arlin, can you quantify the benefit you see from RDMA write with
immediate vs. RDMA write followed by a send?

 


We need speed and simplicity.

A very latency sensitive application that requires immediate 
notification of RDMA write completion on the remote node without ANY 
latency penalties associated with combining operations, HCA priority 
rules across QPs, wire congestion, etc. An application that has no 
requirement for messaging outside of remote rdma write completion 
notifications. The application would not have to register and manage 
additional message buffers on either side, we can just size the queues 
accordingly and post zero byte messages. We need something that would be 
equivalent to setting their polling on the last byte of inbound data. 
But, since data ordering within an operation is not guaranteed that is 
not an option. So, rdma with immediate data is the most optimal and 
simplistic method for indication of RDMA-write completion that we have 
available today. In fact, I would like to see it increased in size to 
make it even more useful.


-arlin








Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Roland Dreier
Arlin A very latency sensitive application that requires
Arlin immediate notification of RDMA write completion on the
Arlin remote node without ANY latency penalties associated with
Arlin combining operations, HCA priority rules across QPs, wire
Arlin congestion, etc. An application that has no requirement for
Arlin messaging outside of remote rdma write completion
Arlin notifications. The application would not have to register
Arlin and manage additional message buffers on either side, we
Arlin can just size the queues accordingly and post zero byte
Arlin messages. We need something that would be equivalent to
Arlin setting their polling on the last byte of inbound
Arlin data. But, since data ordering within an operation is not
Arlin guaranteed that is not an option. So, rdma with immediate
Arlin data is the most optimal and simplistic method for
Arlin indication of RDMA-write completion that we have available
Arlin today. In fact, I would like to see it increased in size to
Arlin make it even more useful.

Hmm.  Can you put a number on how much better RDMA write with
immediate is on current HCA hardware?  How does using the underlying
OpenIB verbs ability to post a list of work requests compare (ie
posting an RDMA write followed by a send in one verbs call)?
Maybe post multiple is a better direction for DAT.
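For reference, OpenIB verbs submit a list through `ibv_post_send()`, which takes `struct ibv_send_wr` entries chained through a `next` pointer and reports the first request that could not be posted through its `bad_wr` argument. The self-contained C sketch below only mimics that shape with hypothetical types (`send_wr`, `post_send_list`) and a fixed number of free queue slots; it does not call libibverbs.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of a chained work-request list in the style of OpenIB verbs.
 * These types and post_send_list() are illustrative, not libibverbs. */
enum wr_opcode { WR_RDMA_WRITE, WR_SEND };

struct send_wr {
    struct send_wr *next;    /* next request in the same post call */
    enum wr_opcode  opcode;
    size_t          length;
};

/* Model one post call over the whole chain with `capacity` free send-queue
 * slots. On success returns 0 with *bad_wr == NULL; otherwise returns -1 and
 * points *bad_wr at the first request that did not fit (earlier ones count
 * as posted). */
int post_send_list(struct send_wr *wr, size_t capacity, struct send_wr **bad_wr)
{
    assert(bad_wr != NULL);
    *bad_wr = NULL;
    for (; wr != NULL; wr = wr->next) {
        if (capacity == 0) {
            *bad_wr = wr;
            return -1;
        }
        capacity--;
    }
    return 0;
}
```

Chaining saves a call on the sending side, but the peer still sees two operations: a Write, and a Send that consumes an ordinary Recv.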

 - R.


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Larsen, Roy K

One thing to keep in mind is that the IBTA workgroup
responsible for the transport wanted to eliminate immediate data support
entirely but it was retained solely to enable VIA application migration (even
though the application base was quite small). If that requirement could
have been eliminated, then it would have been gone in a heartbeat. Given
a RDMA-WRITE followed by a SEND provides the same application semantics based
on the use models, iWARP chose not to support immediate data.

Mike,

I was not part of the original IBTA discussions and I won't argue
whether this facility should or shouldn't have been included. Nevertheless,
it is part of the specification, there are HCA vendors that implement it, and
we have applications that make use of it. I would, however, disagree with
your assertion that write followed by a send is semantically equivalent to
write immediate. Ordering may be semantically the same, but the service
is not. Receive work completions are explicitly indicated as being
associated with immediate data and therefore an associated write completion. A
write followed by a send does not provide the same indication semantic.



Roy







RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-08 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 Arlin A very latency sensitive application that requires
 Arlin immediate notification of RDMA write completion on the
 Arlin remote node without ANY latency penalties associated with
 Arlin combining operations, HCA priority rules across QPs, wire
 Arlin congestion, etc. An application that has no requirement for
 Arlin messaging outside of remote rdma write completion
 Arlin notifications. The application would not have to register
 Arlin and manage additional message buffers on either side, we
 Arlin can just size the queues accordingly and post zero byte
 Arlin messages. We need something that would be equivalent to
 Arlin setting their polling on the last byte of inbound
 Arlin data. But, since data ordering within an operation is not
 Arlin guaranteed that is not an option. So, rdma with immediate
 Arlin data is the most optimal and simplistic method for
 Arlin indication of RDMA-write completion that we have available
 Arlin today. In fact, I would like to see it increased in size to
 Arlin make it even more useful.
 
 Hmm.  Can you put a number on how much better RDMA write with
 immediate is on current HCA hardware?  How does using the
 underlying OpenIB verbs ability to post a list of work
 requests compare (ie posting an RDMA write followed by a send
 in one verbs call)?
 Maybe post multiple is a better direction for DAT.
 

The distinction between Write and Send versus post multiple
is that it maintains a very simple one-to-one correspondence
with the post_recv at the data sink.

I also do not see how the *application* keeping the write and send
semantics can have a negative performance implication if we allow
InfiniBand Providers to encode it as an RDMA Write with Immediate.

If the Data Source needs to communicate to the Data Sink that
a specific RDMA Write transfer is done then it is sending a
message. Information transfer and synchronization is occurring.

I fail to see the value, let alone the optimization, of layering
on an extra bit of information disguised as an opcode and using
a specific transport's encoding methods as the model for a transport
neutral API (particularly one at the DAT layer, at the verb layer
it is a different issue because at the verb layer we do not want
to hide any hardware capabilities even while encouraging safe
harbor transport neutral practices).

If distinguishing between 32-bit messages and 32-bit immediates
that can arrive in indeterminate order is really that important
to your application then maybe you really needed a 33-bit message
to begin with. Encoding application layer information via your
choice of carrier pigeon is not a very robust strategy.
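The "33-bit message" suggestion can be made concrete: instead of relying on whether a 32-bit value arrived as an immediate or as a send payload, the application carries the distinguishing bit in the message itself. A minimal sketch, assuming a hypothetical one-byte type field followed by a 32-bit value in network byte order; the constant names are illustrative.

```python
import struct

KIND_APP_MSG = 0     # hypothetical: an ordinary application message
KIND_WRITE_DONE = 1  # hypothetical: "the RDMA Write I told you about has landed"

_FMT = "!BI"  # 1-byte kind + 32-bit value, network byte order, no padding


def encode(kind: int, value: int) -> bytes:
    return struct.pack(_FMT, kind, value)


def decode(msg: bytes) -> tuple:
    return struct.unpack(_FMT, msg)


m = encode(KIND_WRITE_DONE, 0xDEADBEEF)
print(len(m))     # 5
print(decode(m))  # (1, 3735928559)
```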


RE: [dat-discussions] [openib-general] [RFC]DAT2.0immediatedataproposal

2006-02-08 Thread Sean Hefty
Hmm.  Can you put a number on how much better RDMA write with
immediate is on current HCA hardware?  How does using the underlying
OpenIB verbs ability to post a list of work requests compare (ie
posting an RDMA write followed by a send in one verbs call)?
Maybe post multiple is a better direction for DAT.

A post multiple call as a general API makes sense, but I think that's a
separate issue.

Given that IB provides true immediate data with RDMA writes, a way should be
available to make use of it.  I don't know what the performance numbers are for
using a write with immediate versus a write followed by a send, but I don't
think that anyone could argue that the write with immediate wouldn't perform
better.

To me, the question is whether write with immediate is supported as a transport
specific extension, which was Arlin's original patch, or through some standard
API.  The attempt to make the API standard, so that iWarp could emulate it
(poorly in my view), is what appears to be driving the disagreements.

It also appears to me that the decisions are coming down to one of the
following.  If iWarp can emulate write with immediate, then a generic API should
be used.  If iWarp cannot properly emulate write with immediate, then the API
should be transport specific.  It's curious to me that in both cases, iWarp is
driving the API decision and design for something that is an IB specific
feature.

- Sean



Re: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Arlin Davis

Sean Hefty wrote:


The requirement is to provide an API that supports RDMA writes with immediate
data.  A send that follows an RDMA write is not immediate data, and the API
should not be constructed around trying to make it so.
   



To be clear, I believe that write with immediate should be part of the normal
APIs, rather than an extension, but should be designed around those devices that
provide it natively.
 

I totally agree. A standard RDMA write with immediate API can be very 
useful to RDMA applications based on the requirements (native support) 
set forth in my earlier email. It is analogous to the new 
dat_ep_post_send_with_invalidate() call; a call that supports a native 
iWARP transport operation but provides no provisions to help other 
transports emulate. So, other transports simply return NOT_SUPPORTED and 
add it natively in the future if it makes sense.
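The "native or NOT_SUPPORTED" model proposed here can be sketched in a few lines. The `Status` values, the `Provider` class and its `native_write_imm` flag are hypothetical stand-ins, not names taken from the DAT specification.

```python
from enum import Enum


class Status(Enum):  # hypothetical status codes, loosely modeled on DAT return values
    SUCCESS = 0
    NOT_SUPPORTED = 1


class Provider:
    def __init__(self, name, native_write_imm):
        self.name = name
        self.native_write_imm = native_write_imm  # does the wire protocol carry immediate data?
        self.posted = []

    def post_rdma_write_with_immediate(self, data, imm):
        if not self.native_write_imm:
            # No emulation: the consumer learns up front that the call is unavailable.
            return Status.NOT_SUPPORTED
        self.posted.append(("rdma_write_imm", data, imm))
        return Status.SUCCESS


ib = Provider("ib", native_write_imm=True)
iwarp = Provider("iwarp", native_write_imm=False)
print(ib.post_rdma_write_with_immediate(b"payload", 42))     # Status.SUCCESS
print(iwarp.post_rdma_write_with_immediate(b"payload", 42))  # Status.NOT_SUPPORTED
```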


-arlin


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Caitlin Bestler
Arlin Davis wrote:
 Sean Hefty wrote:
 
 The requirement is to provide an API that supports RDMA writes with
 immediate data.  A send that follows an RDMA write is not immediate
 data, and the API should not be constructed around trying to make
 it so. 
 
 
 
 To be clear, I believe that write with immediate should be part of
 the normal APIs, rather than an extension, but should be designed
 around those devices that provide it natively.
 
 
 I totally agree. A standard RDMA write with immediate API can
 be very useful to RDMA applications based on the requirements
 (native support) set forth in my earlier email. It is analogous to
 the new dat_ep_post_send_with_invalidate() call; a call that supports
 a native iWARP transport operation but provides no provisions
 to help other transports emulate. So, other transports simply
 return NOT_SUPPORTED and add it natively in the future if it makes
 sense. 
 
 -arlin

What is proposed is a definition of
'dat_ep_post_rdma_write_with_immediate'
that can be implemented over iWARP using the sequence of messages that
were intended to support the same purpose (i.e., letting the other
side know that an RDMA Write transfer has been fully received).

This definition also conforms to all existing DAT ordering rules.

Is there anything wrong with this definition for an IB provider?

There is a similarity between write_with_immediate and send_with_invalidate
in that they combine operations which a) are already logically tied
from the consumer's perspective and b) can be more easily optimized
by the Provider over the wire when presented as one request.

Indeed, with send_with_invalidate it *has* to be optimized since
you cannot send the invalidate later.


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Larsen, Roy K
Caitlin Bestler wrote:

Arlin Davis wrote:
 Sean Hefty wrote:

 The requirement is to provide an API that supports RDMA writes with
 immediate data.  A send that follows an RDMA write is not immediate
 data, and the API should not be constructed around trying to make
 it so.



 To be clear, I believe that write with immediate should be part of
 the normal APIs, rather than an extension, but should be designed
 around those devices that provide it natively.


 I totally agree. A standard RDMA write with immediate API can
 be very useful to RDMA applications based on the requirements
 (native support) set forth in my earlier email. It is analogous to
 the new dat_ep_post_send_with_invalidate() call; a call that supports
 a native iWARP transport operation but provides no provisions
 to help other transports emulate. So, other transports simply
 return NOT_SUPPORTED and add it natively in the future if it makes
 sense.

 -arlin

What is proposed is a definition of
'dat_ep_post_rdma_write_with_immediate'
that can be implemented over iWARP using the sequence of messages that
were intended to support the same purpose (i.e., letting the other
side know that an RDMA Write transfer has been fully received).

No, iWARP *CAN NOT* implement write immediate data any better than IB
can implement send with invalidate.  Immediate data *MUST* be indicated
to the ULP unambiguously.  Imposing an algorithm on the application to
infer immediate data arrival is a hack, pure and simple. An application is
free to perform a write/send if that is the semantic they want.  Why
does iWARP get transport unique APIs but not IB?  I find this attempt to
bastardize the IB semantic of immediate data a little curious.

Roy


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady
IB does optionally support send_with_invalidate as defined in IBTA 1.2
spec.
OpenIB does not support this yet but this is a different matter.
So this is a bad analogy.

The better analogy is socket-based CM.

But I am still not clear what you are advocating:
extensions, IB specific API or something else.

Arkady Kanevsky   email: [EMAIL PROTECTED]
Network Appliance Inc.   phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.Fax: 781-895-1195
Waltham, MA 02451   central phone: 781-768-5300
 

 -Original Message-
 From: Larsen, Roy K [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 2:46 PM
 To: [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
 Cc: Kanevsky, Arkady; Sean Hefty; openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal
 
 Caitlin Bestler wrote:
 
 Arlin Davis wrote:
  Sean Hefty wrote:
 
  The requirement is to provide an API that supports RDMA writes with
  immediate data.  A send that follows an RDMA write is not immediate
  data, and the API should not be constructed around trying to make
  it so.
 
 
 
  To be clear, I believe that write with immediate should be part of
  the normal APIs, rather than an extension, but should be designed
  around those devices that provide it natively.
 
 
  I totally agree. A standard RDMA write with immediate API can be very
  useful to RDMA applications based on the requirements (native
  support) set forth in my earlier email. It is analogous to the new
  dat_ep_post_send_with_invalidate() call; a call that supports a
  native iWARP transport operation but provides no provisions to help
  other transports emulate. So, other transports simply return
  NOT_SUPPORTED and add it natively in the future if it makes sense.
 
  -arlin
 
 What is proposed is a definition of
 'dat_ep_post_rdma_write_with_immediate'
 that can be implemented over iWARP using the sequence of messages that
 were intended to support the same purpose (i.e., letting the other side
 know that an RDMA Write transfer has been fully received).
 
 No, iWARP *CAN NOT* implement write immediate data any better
 than IB can implement send with invalidate.  Immediate data
 *MUST* be indicated to the ULP unambiguously.  Imposing an
 algorithm on the application to infer immediate data arrival
 is a hack, pure and simple. An application is free to perform a
 write/send if that is the semantic they want.  Why does iWARP
 get transport unique APIs but not IB?  I find this attempt to
 bastardize the IB semantic of immediate data a little curious.
 
 Roy
 

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
 Caitlin Bestler wrote:
 
 Arlin Davis wrote:
 Sean Hefty wrote:
 
 The requirement is to provide an API that supports RDMA writes
 with immediate data.  A send that follows an RDMA write is not
 immediate data, and the API should not be constructed around
 trying to make it so. 
 
 
 
 To be clear, I believe that write with immediate should be part of
 the normal APIs, rather than an extension, but should be designed
 around those devices that provide it natively.
 
 
 I totally agree. A standard RDMA write with immediate API can be
 very useful to RDMA applications based on the requirements (native
 support) set forth in my earlier email. It is analogous to the new
 dat_ep_post_send_with_invalidate() call; a call that supports a
 native iWARP transport operation but provides no provisions to help
 other transports emulate. So, other transports simply return
 NOT_SUPPORTED and add it natively in the future if it makes sense.
 
 -arlin
 
 What is proposed is a definition of
 'dat_ep_post_rdma_write_with_immediate'
 that can be implemented over iWARP using the sequence of messages
 that were intended to support the same purpose (i.e., letting the
 other side know that an RDMA Write transfer has been fully received).
 
 No, iWARP *CAN NOT* implement write immediate data any better
 than IB can implement send with invalidate.  Immediate data
 *MUST* be indicated to the ULP unambiguously.  Imposing an
 algorithm on the application to infer immediate data arrival
 is a hack, pure and simple. An application is free to perform a
 write/send if that is the semantic they want.  Why does iWARP
 get transport unique APIs but not IB?  I find this attempt to
 bastardize the IB semantic of immediate data a little curious.
 

The transports aren't getting anything. Features are there for
applications, especially when the feature can be defined in a
way that makes sense without explaining transport mechanics.

Completing a transaction, complete with supplying a transaction
response and releasing the advertised STag associated with the
transaction is something that makes sense in the application
domain and conforms to normal DAT ordering rules.

Providing information about an RDMA Write to a receive operation
also meets that definition -- as long as it conforms to the
existing ordering rules. Shifting to an 8-byte message over
iWARP to allow for the write length *and* immediate 'tag'
is certainly doable. We could even consider having the
DAT Provider supply the 'buffer' silently in the DTO itself.

With that definition the consumer would get a receive completion
that told them that their peer's RDMA Write had been successfully
placed, how long it is (the length) and which one (a tag).
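The two-work-request emulation described here can be sketched as follows. The 8-byte layout (a 32-bit write length followed by a 32-bit tag, network byte order) and the function names are assumptions for illustration; the Send is issued after the Write on the same connection, so the existing ordering rules ensure the write is placed before the receive completes. In this toy model that ordering is simply sequential execution.

```python
import struct

IMM_MSG = struct.Struct("!II")  # hypothetical layout: 32-bit write length, 32-bit tag


def emulate_write_with_immediate(remote, offset, data, tag):
    """Sender side: two work requests on the wire instead of one."""
    # 1. RDMA Write: placed directly into the peer's advertised buffer.
    remote[offset:offset + len(data)] = data
    # 2. Send: an 8-byte message that consumes one posted receive at the peer.
    return IMM_MSG.pack(len(data), tag)


def on_receive_completion(msg):
    """Receiver side: recover (length, tag) from the 8-byte receive."""
    length, tag = IMM_MSG.unpack(msg)
    return {"length": length, "tag": tag}


remote = bytearray(64)
msg = emulate_write_with_immediate(remote, 16, b"hello", tag=7)
print(len(msg))                    # 8
print(on_receive_completion(msg))  # {'length': 5, 'tag': 7}
```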

I think that is of value. iWARP can implement it as two work
requests and maintain the overall semantics.

Are you arguing that iWARP should NOT provide this service
until it can do it in a single work request? It seems to 
me that allowing an extra work request and completion is
a fairly simple accommodation as opposed to using an alternate
algorithm in the main transaction processing of the application.

If we enable the application to query how a remote write
with immediate will complete outside of the transaction loop
then we can allow the application to have *no* overhead inside
the main transaction loop, and *identical* logic on the sending
side.
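The "query once, then run an identical loop" idea can be sketched like this. `query_completion_style`, `make_decoder` and the dictionary-shaped completions are hypothetical; the point is only that the transport-dependent choice is made at setup time, so the transaction loop itself contains no per-completion branching.

```python
import struct


def query_completion_style(caps):
    """Hypothetical setup-time query (not a DAT call)."""
    return "native" if caps.get("native_write_imm", False) else "write_plus_send"


def make_decoder(style):
    """Pick the decoder once, before entering the transaction loop."""
    if style == "native":
        # Toy native completion: length and tag arrive as completion fields.
        return lambda c: (c["length"], c["tag"])
    # Toy emulated completion: length and tag arrive in an 8-byte payload.
    return lambda c: struct.unpack("!II", c["payload"])


def transaction_loop(completions, decode):
    # Identical loop body regardless of transport.
    return [decode(c) for c in completions]


ib_decode = make_decoder(query_completion_style({"native_write_imm": True}))
iw_decode = make_decoder(query_completion_style({}))
print(transaction_loop([{"length": 5, "tag": 7}], ib_decode))                # [(5, 7)]
print(transaction_loop([{"payload": struct.pack("!II", 5, 7)}], iw_decode))  # [(5, 7)]
```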

And IB *could* implement send with invalidate by simply agreeing
on how the RKey to be invalidated is communicated between the
IB providers (perhaps as an immediate).

But more to the point, I don't see how the more flexible
definition of write with immediate negatively impacts the
IB implementation of the feature. IB providers do not need
to allow for the extra work requests. They are not being 
asked to place the immediate data into the receive buffer,
or to do any extra work at all.


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Larsen, Roy K
IB does optionally support send_with_invalidate as defined in IBTA 1.2
spec.
OpenIB does not support this yet but this is a different matter.
So this is a bad analogy.

The better analogy is socket-based CM.

But I am still not clear what you are advocating:
extensions, IB specific API or something else.

I advocate a write with immediate data API that delivers immediate data
to the target ULP *unambiguously*.  That is, the ULP need never infer
from buffer contents or receive completion timing that a write with
immediate has taken place.  If it is not granted formal API status, I
advocate implementation as a DAPL extension.  The notion of a combined
request API is orthogonal so I won't pursue it any further in this
thread.

Roy

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Larsen, Roy K
 What is proposed is a definition of
 'dat_ep_post_rdma_write_with_immediate'
 that can be implemented over iWARP using the sequence of messages
 that were intended to support the same purpose (i.e., letting the
 other side know that an RDMA Write transfer has been fully received).

 No, iWARP *CAN NOT* implement write immediate data any better
 than IB can implement send with invalidate.  Immediate data
 *MUST* be indicated to the ULP unambiguously.  Imposing an
 algorithm on the application to infer immediate data arrival
 is a hack, pure and simple. An application is free to perform a
 write/send if that is the semantic they want.  Why does iWARP
 get transport unique APIs but not IB?  I find this attempt to
 bastardize the IB semantic of immediate data a little curious.


The transports aren't getting anything. Features are there for
applications, especially when the feature can be defined in a
way that makes sense without explaining transport mechanics.


APIs exist to gain access to transport services so of course it is all
about the transport.  Presumably the transport services were defined
because they seemed useful, but a transport service exists in a standard
somewhere before it is defined in DAPL.  I believe that the IB immediate
data service and semantic is useful and should be supported too.

Completing a transaction, complete with supplying a transaction
response and releasing the advertised STag associated with the
transaction is something that makes sense in the application
domain and conforms to normal DAT ordering rules.


I don't disagree.  And unambiguous immediate data indications fall into
that same category which is why I'm puzzled there is so much resistance.

Providing information about an RDMA Write to a receive operation
also meets that definition -- as long as it conforms to the
existing ordering rules. Shifting to an 8-byte message over
iWARP to allow for the write length *and* immediate 'tag'
is certainly doable. We could even consider having the
DAT Provider supply the 'buffer' silently in the DTO itself.


If you make the receive indication unambiguous as to the fact it's
associated with a write immediate, you've got my full support, even if
immediate data is delivered differently by different transports.  If
not, it is nothing more than a write/send that the application can do
itself.

With that definition the consumer would get a receive completion
that told them that their peer's RDMA Write had been successfully
placed, how long it is (the length) and which one (a tag).

I think that is of value. iWARP can implement it as two work
requests and maintain the overall semantics.

If completion of the service is ambiguous, I strongly disagree.  The
application can do this with write/send now and with more flexibility.
True immediate indications are unambiguous and don't rely on the
contents of a receive buffer or its completion timing.  An application
must be able to perform normal send/receives of any size and content
simultaneously with RDMA write with immediate and without regard to when
they arrive.  The semantic proposed would put a constraint on how an
application could use the send/receive facility.  If an application can
live with such a constraint, it is free to use write/send now.  Those
that can't or would perform much better with a legitimate
write/immediate should be given access to the facility.
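The ambiguity argument can be shown with a small model. `RecvCompletion` is a hypothetical record; its `with_imm` field stands in for the kind of indication a native IB receive completion carries (for comparison, libibverbs reports this case with the `IBV_WC_WITH_IMM` completion flag and the `IBV_WC_RECV_RDMA_WITH_IMM` opcode). An emulated immediate delivered as an ordinary 8-byte Send leaves nothing at the sink that separates it from an application Send with the same bytes, unless the provider adds framing of its own.

```python
import struct
from dataclasses import dataclass


@dataclass
class RecvCompletion:
    """Hypothetical receive completion record, loosely modeled on a verbs work completion."""
    payload: bytes
    with_imm: bool = False  # set by a native write-with-immediate; no iWARP wire equivalent
    imm_data: int = 0


def is_write_imm(c):
    return c.with_imm


# Emulated "immediate": the provider sends an 8-byte (length, tag) message.
emulated = RecvCompletion(payload=struct.pack("!II", 5, 7))
# An ordinary application Send that happens to carry the same 8 bytes.
ordinary = RecvCompletion(payload=struct.pack("!II", 5, 7))
# Native RDMA Write with immediate: the completion itself says what it is.
native = RecvCompletion(payload=b"", with_imm=True, imm_data=7)

print(emulated == ordinary)    # True: nothing distinguishes the two at the sink
print(is_write_imm(native))    # True
print(is_write_imm(emulated))  # False
```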


Are you arguing that iWARP should NOT provide this service
until it can do it in a single work request?

I'm arguing that an iWARP provider NOT support this service until it can
deliver immediate data indications unambiguously.

It seems to
me that allowing an extra work request and completion is
a fairly simple accommodation as opposed to using an alternate
algorithm in the main transaction processing of the application.

If we enable the application to query how a remote write
with immediate will complete outside of the transaction loop
then we can allow the application to have *no* overhead inside
the main transaction loop, and *identical* logic on the sending
side.

I would contend that placing constraints on what and when an application
can send normal data just to use write immediate is far, far worse. And
all basically just to save one extra function call.


And IB *could* implement send with invalidate by simply agreeing
on how the RKey to be invalidated is communicated between the
IB providers (perhaps as an immediate).

I'm afraid I don't follow.  If you're talking about providers setting up
their own private EPs to communicate, perhaps that's a solution for
iWARP providers to supply unambiguous immediate data indications.


But more to the point, I don't see how the more flexible
definition of write with immediate negatively impacts the
IB implementation of the feature. IB providers do not need
to allow for the extra work requests. They are not being
asked to place the immediate data into the receive buffer,
or to do any extra work at all.
 
This is not 

RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Caitlin Bestler
Larsen, Roy K wrote:

 
 Completing a transaction, complete with supplying a transaction
 response and releasing the advertised STag associated with the
 transaction is something that makes sense in the application domain
 and conforms to normal DAT ordering rules.
 
 
 I don't disagree.  And unambiguous immediate data indications
 fall into that same category which is why I'm puzzled there is so
 much resistance. 
 
 Providing information about an RDMA Write to a receive operation
 also meets that definition -- as long as it conforms to the existing
 ordering rules. Shifting to an 8-byte message over iWARP to allow for
 the write length *and* immediate 'tag'
 is certainly doable. We could even consider having the DAT Provider
 supply the 'buffer' silently in the DTO itself.
 
 
 If you make the receive indication unambiguous as to the fact
 it's associated with a write immediate, you've got my full
 support, even if immediate data is delivered differently by
 different transports.  If not, it is nothing more than a
 write/send that the application can do itself.
 

From the viewpoint of the Provider/RNIC/driver there is no
wire Send message which can be known to be associated with 
an RDMA Write. Up to the maximum message size, any combination
of bytes are legal.

Therefore this is a distinction that an iWARP provider CANNOT make.
Unlike send with invalidate, which CAN be supported under IB 1.2.

Keep in mind that under the basic DAT semantics, RDMA Writes
are *not* signalled to the Data Sink. That was settled years ago.

So under DAT semantics you use a Send to cause a completion at
the other end -- period.

What we are talking about is whether to allow short sends that
supply a minimal set of standard information to be piggy-backed
on a prior RDMA Write by making it an RDMA Write with immediate.

I am not opposed to allowing IB Providers to do that. I am opposed
to changing the fundamental DAT semantics that RDMA Writes are not
visible to the Data Sink. Conceptually, a Send is required.

And as with all Send Messages, it is up to the *application* to
ensure that their meaning is known at the Data Sink. This can be
done by ordering and/or content of the data.



RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Larsen, Roy K

 Completing a transaction, complete with supplying a transaction
 response and releasing the advertised STag associated with the
 transaction is something that makes sense in the application domain
 and conforms to normal DAT ordering rules.


 I don't disagree.  And unambiguous immediate data indications
 fall into that same category which is why I'm puzzled there is so
 much resistance.

 Providing information about an RDMA Write to a receive operation
 also meets that definition -- as long as it conforms to the existing
 ordering rules. Shifting to an 8-byte message over iWARP to allow for
 the write length *and* immediate 'tag'
 is certainly doable. We could even consider having the DAT Provider
 supply the 'buffer' silently in the DTO itself.


 If you make the receive indication unambiguous as to the fact
 it's associated with a write immediate, you've got my full
 support, even if immediate data is delivered differently by
 different transports.  If not, it is nothing more than a
 write/send that the application can do itself.


From the viewpoint of the Provider/RNIC/driver there is no
wire Send message which can be known to be associated with
an RDMA Write. Up to the maximum message size, any combination
of bytes are legal.

Therefore this is a distinction that an iWARP provider CANNOT make.
Unlike send with invalidate, which CAN be supported under IB 1.2.

So, your stance is that if an RDMA transport protocol specification
exists that can't support or emulate a service faithfully, an API can't
exist for that service in DAPL.  Ok, that makes this discussion much
clearer and to the point.

Keep in mind that under the basic DAT semantics, RDMA Writes
are *not* signalled to the Data Sink. That was settled years ago.

So under DAT semantics you use a Send to cause a completion at
the other end -- period.

Addressed below...


What we are talking about is whether to allow short sends that
supply a minimal set of standard information to be piggy-backed
on a prior RDMA Write by making it an RDMA Write with immediate.

That is not what those advocating immediate data have been talking
about.  It has always been whether the IB capability could be exposed to
the ULP.  Remember, the first proposal by Arlin was to make this an
extension.  This list wanted to expose it as a formal API.  So, I find
that assertion puzzling.

I am not opposed to allowing IB Providers to do that. I am opposed
to changing the fundamental DAT semantics that RDMA Writes are not
visible to the Data Sink. Conceptually, a Send is required.


Conceptual, eh?  Well, of course IB immediate data *is* indicated on the
receive queue.  Not conceptual enough?  But that aside, it is a rather
strict and convenient interpretation.  Are you sure you want to put a
stake that deep in the ground about all currently defined DAPL semantics
against transport standards that evolve, or just those that can't be
implemented by all transports?

I was under the assumption that the DAT community defined the APIs and
semantics through an open process.  Given that the IB write immediate
data facility does not break the implementation or semantics of the
currently defined RDMA write facility, I see no reason the DAPL spec
couldn't be updated, through consensus, with the realities of existing
transport services.  Nevertheless, I presume you'll have no objection to
implementing this useful service as a DAPL extension since the semantic
rules for extensions haven't been defined yet.

Roy


RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal

2006-02-07 Thread Kanevsky, Arkady
We have a problem no matter which option we choose.
The current Transport Level Requirements state:

There is a one-to-one correspondence between send operations on one
Endpoint of the Connection and recv operations on the other Endpoint of
the Connection.
There is no correspondence between RDMA operations on one Endpoint of
the Connection and recv or send data transfer operation on the other
Endpoint of the Connection.
Receive operations on a Connection must be completed in the order of
posting of their corresponding sends.
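The three rules quoted above, and the conflict with write with immediate, can be checked on a toy operation stream. `match_receives` and `rdma_ops_with_receives` are hypothetical helpers; the model assumes each op that consumes a receive takes the oldest posted one, which is what the ordering rule requires.

```python
from collections import deque


def match_receives(ops, posted_recvs):
    """Toy check of the quoted rules for one direction of a Connection.

    `ops` are operation names in posting order on the sending Endpoint;
    `posted_recvs` is how many receives the peer has posted.
    Returns (op_index, recv_index) pairs for every op that consumed a receive.
    """
    recvs = deque(range(posted_recvs))
    matches = []
    for i, op in enumerate(ops):
        if op == "rdma_write":
            continue  # second rule: RDMA ops have no corresponding receive
        if op not in ("send", "rdma_write_imm"):
            raise ValueError(f"unknown op {op!r}")
        # First and third rules: one receive per op, consumed in posting (FIFO) order.
        # For "rdma_write_imm" this breaks the second rule, which is the conflict.
        matches.append((i, recvs.popleft()))
    return matches


def rdma_ops_with_receives(ops, matches):
    """Indices of RDMA operations that consumed a receive (should be none)."""
    return [i for i, _ in matches if ops[i].startswith("rdma")]


plain = ["send", "rdma_write", "send"]
print(match_receives(plain, 2))             # [(0, 0), (2, 1)]
with_imm = ["rdma_write_imm", "send"]
m = match_receives(with_imm, 2)
print(m)                                    # [(0, 0), (1, 1)]
print(rdma_ops_with_receives(with_imm, m))  # [0]
```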

The Immediate data and Atomic ops violate these requirements, including
the ordering rules.

I had started updating these rules when I generated the first draft of
the requirements. They are included in the enclosed pdf file.
But they do not cover Atomic ops, which also impact transport
requirements.
This chapter of the spec has not been changed since DAPL 1.0
and I am very concerned about any changes to it.

Arkady


 -Original Message-
 From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 07, 2006 6:57 PM
 To: Larsen, Roy K; [EMAIL PROTECTED]; Arlin Davis; Hefty, Sean
 Cc: openib-general@openib.org
 Subject: RE: [dat-discussions] [openib-general] [RFC] DAT2.0immediatedataproposal
 
 [EMAIL PROTECTED] wrote:
 
  
  I was under the assumption that the DAT community defined the APIs and
  semantics through an open process.  Given that the IB write immediate
  data facility does not break the implementation or semantics of the
  currently defined RDMA write facility, I see no reason the DAPL spec
  couldn't be updated, through consensus, with the realities of existing
  transport services.  Nevertheless, I presume you'll have no objection
  to implementing this useful service as a DAPL extension since the
  semantic rules for extensions haven't been defined yet.
  
  Roy
 
 That is correct, because as an extension the user would not
 expect normal semantics to still be guaranteed.


Attachment: transport_req_020706.pdf