[PATCH RFC] dapltest: Add final send/recv sync for transaction tests.

2014-02-28 Thread Steve Wise
The transaction tests need both sides to send a sync message after running
the test.  This ensures that all remote operations are complete before
dapltest deregeisters memory and disconnects the endpoints.

Without this logic, we see async errors on iwarp devices because a read
or write arrives after the rmr has been destroyed.  I believe this is
more likely to happen with iWARP than IB because iWARP completions only
indicate the local buffer can be reused.  It doesn't imply that the
message has even arrived at the peer, let alone been placed in memory.

I've tested this on cxgb4 only so far.

Similar logic is probably needed for the performance tests as well.

Signed-off-by: Steve Wise sw...@opengridcomputing.com
---

 test/dapltest/test/dapl_transaction_stats.c |   10 ++
 test/dapltest/test/dapl_transaction_test.c  |  139 +++
 2 files changed, 149 insertions(+), 0 deletions(-)

diff --git a/test/dapltest/test/dapl_transaction_stats.c 
b/test/dapltest/test/dapl_transaction_stats.c
index f9d6377..6422ee3 100644
--- a/test/dapltest/test/dapl_transaction_stats.c
+++ b/test/dapltest/test/dapl_transaction_stats.c
@@ -59,6 +59,16 @@ DT_transaction_stats_set_ready(DT_Tdep_Print_Head * phead,
DT_Mdep_Unlock(transaction_stats-lock);
 }
 
+void
+DT_transaction_stats_reset_wait_count(DT_Tdep_Print_Head * phead,
+  Transaction_Stats_t * transaction_stats,
+  unsigned int num)
+{
+   DT_Mdep_Lock(transaction_stats-lock);
+   transaction_stats-wait_count = num;
+   DT_Mdep_Unlock(transaction_stats-lock);
+}
+
 bool
 DT_transaction_stats_wait_for_all(DT_Tdep_Print_Head * phead,
  Transaction_Stats_t * transaction_stats)
diff --git a/test/dapltest/test/dapl_transaction_test.c 
b/test/dapltest/test/dapl_transaction_test.c
index 779ea86..b94a2cc 100644
--- a/test/dapltest/test/dapl_transaction_test.c
+++ b/test/dapltest/test/dapl_transaction_test.c
@@ -1799,6 +1799,145 @@ DT_Transaction_Run(DT_Tdep_Print_Head * phead, 
Transaction_Test_t * test_ptr)
}   /* end loop for each op */
}   /* end loop for iteration */
 
+   /*
+* Final sync up to ensure all previous remote operations have
+* finished.
+*
+* Without a final sync exchange, we see async errors on iwarp 
+* devices because a read or write arrives after the rmr has 
+* been destroyed.  I believe this is more likely to happen with
+* iWARP than IB because iWARP completions only indicate the
+* local buffer can be reused.  It doesn't imply that the
+* message has even arrived at the peer, let alone been placed in 
memory.
+*/
+   if (test_ptr-is_server) {
+   /*
+* Server
+*/
+   DT_Tdep_PT_Debug(1,
+(phead,
+ Test[ F64x ]: Send Final Sync to Client\n,
+ test_ptr-base_port));
+   for (i = 0; i  test_ptr-cmd-eps_per_thread; i++) {
+   if (!DT_post_recv_buffer(phead,
+
test_ptr-ep_context[i].ep_handle,
+test_ptr-ep_context[i].bp,
+SYNC_RECV_BUFFER_ID, 
SYNC_BUFF_SIZE)) {
+   goto bail;
+   }
+   if (!DT_post_send_buffer(phead,
+test_ptr-ep_context[i].
+ep_handle,
+test_ptr-ep_context[i].bp,
+SYNC_SEND_BUFFER_ID,
+SYNC_BUFF_SIZE)) {
+   DT_Tdep_PT_Debug(1,
+(phead,
+ Test[ F64x
+ ]: Server final sync send 
error\n,
+ test_ptr-base_port));
+   goto bail;
+   }
+   }
+   for (i = 0; i  test_ptr-cmd-eps_per_thread; i++) {
+   DAT_DTO_COMPLETION_EVENT_DATA dto_stat;
+
+   if (!DT_dto_event_wait(phead,
+  test_ptr-ep_context[i].
+  reqt_evd_hdl, dto_stat)) {
+   DT_Tdep_PT_Debug(1,
+(phead,
+ Test[ F64x
+ ]: Server final sync send 
error\n,
+ 

RE: [PATCH RFC] dapltest: Add final send/recv sync for transaction tests.

2014-02-28 Thread Steve Wise
Hey Arlin,

This fix isn't quite correct:  If one side finishes its transaction IO, then 
posts the final sync SEND, but the peer hasn't finished with its transaction 
IO, then the peer could post a READ request, for example, and then wait for the 
READ completion.  However the READ response would get sent behind the local 
side's final sync SEND and wouldn't be placed because the peer's final sync 
RECV hasn't been posted yet.  Sigh.  I'll need to think more on this, but I'd 
like feedback if anybody has opinions/questions/comments...

Steve.

 -Original Message-
 From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
 ow...@vger.kernel.org] On Behalf Of Steve Wise
 Sent: Friday, February 28, 2014 11:14 AM
 To: arlin.r.da...@intel.com
 Cc: linux-rdma@vger.kernel.org
 Subject: [PATCH RFC] dapltest: Add final send/recv sync for transaction
 tests.
 
 The transaction tests need both sides to send a sync message after running
 the test.  This ensures that all remote operations are complete before
 dapltest deregeisters memory and disconnects the endpoints.
 
 Without this logic, we see async errors on iwarp devices because a read
 or write arrives after the rmr has been destroyed.  I believe this is
 more likely to happen with iWARP than IB because iWARP completions only
 indicate the local buffer can be reused.  It doesn't imply that the
 message has even arrived at the peer, let alone been placed in memory.
 
 I've tested this on cxgb4 only so far.
 
 Similar logic is probably needed for the performance tests as well.
 
 Signed-off-by: Steve Wise sw...@opengridcomputing.com
 ---
 
  test/dapltest/test/dapl_transaction_stats.c |   10 ++
  test/dapltest/test/dapl_transaction_test.c  |  139
 +++
  2 files changed, 149 insertions(+), 0 deletions(-)
 
 diff --git a/test/dapltest/test/dapl_transaction_stats.c
 b/test/dapltest/test/dapl_transaction_stats.c
 index f9d6377..6422ee3 100644
 --- a/test/dapltest/test/dapl_transaction_stats.c
 +++ b/test/dapltest/test/dapl_transaction_stats.c
 @@ -59,6 +59,16 @@
 DT_transaction_stats_set_ready(DT_Tdep_Print_Head * phead,
   DT_Mdep_Unlock(transaction_stats-lock);
  }
 
 +void
 +DT_transaction_stats_reset_wait_count(DT_Tdep_Print_Head * phead,
 +Transaction_Stats_t * transaction_stats,
 +unsigned int num)
 +{
 + DT_Mdep_Lock(transaction_stats-lock);
 + transaction_stats-wait_count = num;
 + DT_Mdep_Unlock(transaction_stats-lock);
 +}
 +
  bool
  DT_transaction_stats_wait_for_all(DT_Tdep_Print_Head * phead,
 Transaction_Stats_t * transaction_stats)
 diff --git a/test/dapltest/test/dapl_transaction_test.c
 b/test/dapltest/test/dapl_transaction_test.c
 index 779ea86..b94a2cc 100644
 --- a/test/dapltest/test/dapl_transaction_test.c
 +++ b/test/dapltest/test/dapl_transaction_test.c
 @@ -1799,6 +1799,145 @@ DT_Transaction_Run(DT_Tdep_Print_Head *
 phead, Transaction_Test_t * test_ptr)
   }   /* end loop for each op */
   }   /* end loop for iteration */
 
 + /*
 +  * Final sync up to ensure all previous remote operations have
 +  * finished.
 +  *
 +  * Without a final sync exchange, we see async errors on iwarp
 +  * devices because a read or write arrives after the rmr has
 +  * been destroyed.  I believe this is more likely to happen with
 +  * iWARP than IB because iWARP completions only indicate the
 +  * local buffer can be reused.  It doesn't imply that the
 +  * message has even arrived at the peer, let alone been placed in
 memory.
 +  */
 + if (test_ptr-is_server) {
 + /*
 +  * Server
 +  */
 + DT_Tdep_PT_Debug(1,
 +  (phead,
 +   Test[ F64x ]: Send Final Sync to
 Client\n,
 +   test_ptr-base_port));
 + for (i = 0; i  test_ptr-cmd-eps_per_thread; i++) {
 + if (!DT_post_recv_buffer(phead,
 +  test_ptr-
 ep_context[i].ep_handle,
 +  test_ptr-
 ep_context[i].bp,
 +  SYNC_RECV_BUFFER_ID,
 SYNC_BUFF_SIZE)) {
 + goto bail;
 + }
 + if (!DT_post_send_buffer(phead,
 +  test_ptr-ep_context[i].
 +  ep_handle,
 +  test_ptr-
 ep_context[i].bp,
 +  SYNC_SEND_BUFFER_ID,
 +  SYNC_BUFF_SIZE)) {
 + DT_Tdep_PT_Debug(1,
 +  (phead,
 +   Test[ F64x