Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] [RFC] [PATCH v2] rdma/ib_cm: fix APM support
>
> Michael S. Tsirkin wrote:
> > Quoting Or Gerlitz <[EMAIL PROTECTED]>:
> >>> Protocols that rely on RC ACK for reliability guarantees (like SDP),
> >>> basically do not make it possible to address the hca failure case:
> >>> you got an ACK, but remote hca could have failed without committing
> >>> data to memory. So APM failover is a requirement for these.
> >>> It could be iser does not need APM, fine.
> >> This is news to me, does your HCA first sends an ACK and only then does
> >> the DMA transaction and if needed generates the CQE !?!?!?
> >
> > I can't tell either way, but why not?
> > Consider also that DMA write is a posted transaction - HCA gets no
> > indication when it was committed to memory, so it can not delay the ACK
> > until this occurs.
>
> OK, OK, I see now the IB spec piece below, it was me expecting somehow
> too much from IB RC... rethinking on this matter i see now its more
> problematic to support this ack-following-dma-memory-write-success
>
> 9.7.5.1.6 ACKNOWLEDGE MESSAGE SCHEDULING
>
> For SEND or RDMA WRITE requests, an ACK may be scheduled before
> data is actually written into the responder's memory. The ACK simply
> indicates that the data has successfully reached the fault domain of the
> responding node. That is, the data has been received by the channel
> adapter and the channel adapter will write that data to the memory
> system of the responding node, or the responding application will at
> least be informed of the failure.
>
> So anyway, what's your HCA behavior wrt this?

The behavior matches the spec. I can't give you extra guarantees.

> >> and how come APM is the solution to this crazy problem?
> >
> > If HCA failure is a crazy problem, then what is the sane problem APM does
> > *not* solve?
>
> you misunderstood me, the "crazy problem" was related to my
> misconception of IB RC ACKs.
>
> My question is: how does APM solves the problem with transactions whose
> ACK was received but their data was not written/committed to memory?

APM does not solve it - I just say the problem as formulated is not
solvable without protocol changes. So all we can solve for a generic
RC protocol, is port/switch failure, and APM solves this elegantly
and transparently.

-- 
MST
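
To make the point about ACK scheduling concrete, here is a toy Python model of the window discussed above: the responder acknowledges once the data is held by the adapter, and a failure before the posted DMA write lands loses data the requester believes was delivered. Everything here (`ToyResponder`, `acked_but_lost`, the `pending`/`memory` lists) is hypothetical and made up for illustration; it is not a verbs API and does not describe any particular HCA. It also ignores the part of 9.7.5.1.6 where the responding application is informed of the failure.

```python
class ToyResponder:
    """Toy model of a responder HCA; hypothetical, not a real verbs API."""

    def __init__(self):
        self.pending = []    # data held by the adapter, not yet in host memory
        self.memory = []     # data committed to host memory
        self.failed = False

    def receive(self, data):
        """Accept a SEND/RDMA WRITE; returning True models sending an ACK.

        The ACK is scheduled as soon as the adapter holds the data,
        before the posted DMA write to memory has completed.
        """
        if self.failed:
            return False
        self.pending.append(data)
        return True

    def commit(self):
        """Model the DMA writes landing in host memory."""
        if not self.failed:
            self.memory.extend(self.pending)
            self.pending.clear()

    def fail(self):
        """Model an HCA failure: anything not yet committed is lost."""
        self.failed = True
        self.pending.clear()


def acked_but_lost(responder, messages, fail_before_commit):
    """Return the messages that were ACKed but never reached memory."""
    acked = [m for m in messages if responder.receive(m)]
    if fail_before_commit:
        responder.fail()
    responder.commit()
    return [m for m in acked if m not in responder.memory]
```

Calling `acked_but_lost(ToyResponder(), ["a", "b"], True)` returns both messages, while passing `False` returns an empty list. Switching to an alternate path does not bring the lost data back in this model, which is why the failure case needs a protocol-level change rather than APM.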