Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Olaf Kirch
On Sunday 20 January 2008 20:57, Roland Dreier wrote:
 If you could send me some code and a recipe to get the bogus CQ
 message, that might be helpful.  Because as far as I can see, there
 shouldn't be any way for a consumer to get that message without a bug
 in the low-level driver.  It's fine if it's a whole big RDS test case,
 I just want to be able to run the test and instrument the low-level
 driver to get a better handle on what's happening.

Okay, I put my current patch queue into a git tree. It's in
the testing branch of

git://www.openfabrics.org/~okir/ofed_1_3/linux-2.6.git
git://www.openfabrics.org/~okir/ofed_1_3/rds-tools.git

In order to reproduce the problem, I usually run

while sleep 1; do
    rds-stress -R -r locip -s remip -p 4000 -c -d2 -t8 -T5 -D1m
done

Within minutes, I get syslog messages saying

Timed out waiting for CQs to be drained - recv: 0 entries, send: 4 entries left

This message originates from net/rds_ib_cm.c - as a workaround, I added
a timeout of 1 second when waiting for the WQs to be drained. I usually
get those stalls after a WQE completes with status 10 (remote access error),
or sometimes 4.
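
In case it helps, the workaround is nothing more than a bounded wait on my
outstanding-WR counters instead of an unbounded one. A rough sketch follows;
the struct and field names are made up for illustration and this is not the
actual RDS code:

#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/jiffies.h>
#include <asm/atomic.h>

/* Hypothetical per-connection state; RDS keeps the equivalent in its
 * own IB connection struct. */
struct conn_state {
	wait_queue_head_t	drain_wait;
	atomic_t		send_outstanding;
	atomic_t		recv_outstanding;
};

/* Wait up to one second for both queues to drain, then report whatever
 * is left.  wait_event_timeout() returns 0 if the condition was still
 * false when the timeout expired. */
static void drain_with_timeout(struct conn_state *ic)
{
	if (!wait_event_timeout(ic->drain_wait,
				atomic_read(&ic->send_outstanding) == 0 &&
				atomic_read(&ic->recv_outstanding) == 0,
				HZ))
		printk(KERN_WARNING "Timed out waiting for CQs to be drained"
		       " - recv: %d entries, send: %d entries left\n",
		       atomic_read(&ic->recv_outstanding),
		       atomic_read(&ic->send_outstanding));
}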

 BTW, what kind of HCA are you using for this testing?

A pair of fairly new Mellanox cards.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-22 Thread Roland Dreier
   BTW, what kind of HCA are you using for this testing?
  
  A pair of fairly new Mellanox cards.

How new?  Is it ConnectX or something older -- i.e. do you use the
ib_mthca or mlx4_ib driver?  If you're using mlx4, then I could
believe there is a firmware bug that leads to lost completions.

 - R.


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-20 Thread Olaf Kirch
On Friday 18 January 2008 21:41, Roland Dreier wrote:
 I don't follow this.  All work requests should generate a completion
 eventually, unless you do something like destroy the work queue or
 overrun a CQ.  So what part of the spec are you talking about here?

The part on affiliated asynchronous errors says WQ processing is stopped.
This also happens if we're signalling a remote access error to the
other host.

When an RDMA operation errors out because the remote side destroyed the MR,
the RDMA WQE completes with error 10 (remote access error), which is expected.
The other end sees an affiliated asynchronous error code 3 (remote access
error), which is also expected.
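
For anyone matching up the numbers: as far as I can tell, status 10 is
IB_WC_REM_ACCESS_ERR and async event 3 is IB_EVENT_QP_ACCESS_ERR in the
kernel enums. The affiliated event reaches the consumer through the handler
registered in ib_qp_init_attr.event_handler when the QP is created - a
minimal sketch, with the handler name and the shutdown action left as
placeholders:

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/* Sketch of a QP async event handler.  "context" is whatever was put
 * into ib_qp_init_attr.qp_context when the QP was created. */
static void my_qp_event_handler(struct ib_event *event, void *context)
{
	switch (event->event) {
	case IB_EVENT_QP_ACCESS_ERR:	/* responder side of the failed RDMA */
	case IB_EVENT_QP_FATAL:
		/* The QP is already in the error state; kick off the
		 * connection teardown path from here. */
		printk(KERN_WARNING "QP async event %d, shutting down\n",
		       event->event);
		break;
	default:
		break;
	}
}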

Now, on the sending system, I'm seeing send queue entries that do not get
completed. The RDMA itself is completed in error; the subsequent SEND
is completed (error 5, flushed) as well. But one or more entries seem to
remain on the queue - at least my book-keeping says so. I double checked
the book-keeping, and it seems accurate...

All very strange.
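
In case it matters, the bookkeeping is conceptually just a posted-versus-reaped
count, along these lines (illustrative only - the names are invented and this
is not the actual RDS code):

#include <rdma/ib_verbs.h>
#include <asm/atomic.h>

/* Illustrative bookkeeping: count WRs up when they are posted and back
 * down as their completions are reaped.  Anything left after the QP has
 * been flushed is a WR that never produced a completion. */
static atomic_t posted = ATOMIC_INIT(0);

/* Assumes a single WR per post (no wr->next chain), for simplicity. */
static int post_send_counted(struct ib_qp *qp, struct ib_send_wr *wr)
{
	struct ib_send_wr *bad_wr;
	int ret;

	atomic_inc(&posted);
	ret = ib_post_send(qp, wr, &bad_wr);
	if (ret)
		atomic_dec(&posted);	/* nothing was queued after all */
	return ret;
}

static void reap_send_completions(struct ib_cq *cq)
{
	struct ib_wc wc;

	while (ib_poll_cq(cq, 1, &wc) > 0)
		atomic_dec(&posted);	/* one completion per posted WR */
}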

   I tried destroying the QP first, then we know we can pick off
   any remaining WRs still allocated. That didn't work, as the card
   seems to generate interrupts even after the QP is gone. This results
   in lots of errors on the console complaining about Completion to
   bogus CQ.
 
 Destroying a QP should immediately stop work processing, so no
 completions should be generated once the destroy QP operation
 returns.  I don't see how you get the bogus CQ message in this case --
 it certainly seems like a driver bug.  Unless you mean you are
 destroying the CQ with a QP still attached?  But that shouldn't be
 possible because the CQ's usecnt should be non-zero until all attached
 QPs are freed.  Not sure what could be going on but it sounds bad...

This may be a driver bug, yes.

   I then tried to move the QP to error state instead - this didn't
   elicit a storm of kernel messages anymore, but still I seem to get
   incoming completions.
 
 The cleanest way to destroy a QP is to move the QP to the error state,
 wait until you have seen a completion for every posted work request
 (the completions generated after the transition to the error state
 should have a flush status), and then destroy the QP.

Okay, that's what the RDS code does currently, but I get stuck waiting
for the queue to drain - it simply never does.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-20 Thread Roland Dreier
  The part on affiliated asynchronous errors says WQ processing is stopped.
  This also happens if we're signalling a remote access error to the
  other host.
  
  When an RDMA operation errors out because the remote side destroyed the MR,
  the RDMA WQE completes with error 10 (remote access error), which is
  expected.
  The other end sees an affiliated asynchronous error code 3 (remote access
  error), which is also expected.

I don't see anything about stopping processing -- in the spec I see:

For RC Service, the CI shall generate a Local Access Violation
Work Queue Error when the transport layer detects a Request access
violation at the Responder. The Responder's affiliated QP shall be
placed in the error state.

so the QP on the target side should just transition into the error
state and flush all requests as usual.

  Now, on the sending system, I'm seeing send queue entries that do not get
  completed. The RDMA itself is completed in error; the subsequent SEND
  is completed (error 5, flushed) as well. But one or more entries seem to
  remain on the queue - at least my book-keeping says so. I double checked
  the book-keeping, and it seems accurate...

So it seems that asynchronous events aren't an issue anyway, since the
problem is on the other end?  In any case it shouldn't happen that
send requests don't get flushed, so something is wrong somewhere.

 I tried destroying the QP first, then we know we can pick off
 any remaining WRs still allocated. That didn't work, as the card
 seems to generate interrupts even after the QP is gone. This results
 in lots of errors on the console complaining about Completion to
 bogus CQ.
   
   Destroying a QP should immediately stop work processing, so no
   completions should be generated once the destroy QP operation
   returns.  I don't see how you get the bogus CQ message in this case --
   it certainly seems like a driver bug.  Unless you mean you are
   destroying the CQ with a QP still attached?  But that shouldn't be
   possible because the CQ's usecnt should be non-zero until all attached
   QPs are freed.  Not sure what could be going on but it sounds bad...
  
  This may be a driver bug, yes.

If you could send me some code and a recipe to get the bogus CQ
message, that might be helpful.  Because as far as I can see, there
shouldn't be any way for a consumer to get that message without a bug
in the low-level driver.  It's fine if it's a whole big RDS test case,
I just want to be able to run the test and instrument the low-level
driver to get a better handle on what's happening.

BTW, what kind of HCA are you using for this testing?

 - R.


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-19 Thread Tziporet Koren

Olaf Kirch wrote:

My question was more along the lines - can I expect that all pending
WRs have been flushed when ib_modify_qp returns? At least for error
state this does not seem to be the case - I'm still getting completions
on the receive QP from the Mellanox card after I do this.

Olaf
  

You cannot expect all WQEs to be completed when ib_modify_qp returns.
Moving the QP to error guarantees that all completions will arrive (flushed
with error), and you need to wait until you have received them all.

Only then should you move the QP to reset (or close it).

Tziporet


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-19 Thread Dotan Barak

Olaf Kirch wrote:

On Thursday 17 January 2008 16:51, Dotan Barak wrote:
  
Moving the QP to error state flushes all of the outstanding WRs and
creates a completion for each WR.
If you want to delete all of the outstanding WRs, you should move the QP 
state to reset.


(Is this what you asked?)



My question was more along the lines - can I expect that all pending
WRs have been flushed when ib_modify_qp returns? At least for error
state this does not seem to be the case - I'm still getting completions
on the receive QP from the Mellanox card after I do this.
  
I couldn't find any mention of this in the IB spec (i.e. that moving a QP
to error means all of the outstanding completions have already been flushed
and can be found in the CQ by the time the modify_qp call returns).

I think this is not guaranteed, and you can't count on it on any IB HCA
(not even on non-Mellanox HCAs).


Dotan


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-18 Thread Roland Dreier
  When I hit an RDMA error (which happens quite frequently now at rds-stress
  exit, thanks to the fixed mr pool flushing :) I often see the RDS
  shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated
  WRs to disappear. This usually works, as all WQ entries are flushed out.
  This doesn't happen when an RDMA transfer generates a remote access error,
  and that seems to be intended according to the spec.

I don't follow this.  All work requests should generate a completion
eventually, unless you do something like destroy the work queue or
overrun a CQ.  So what part of the spec are you talking about here?

  I tried destroying the QP first, then we know we can pick off
  any remaining WRs still allocated. That didn't work, as the card
  seems to generate interrupts even after the QP is gone. This results
  in lots of errors on the console complaining about Completion to
  bogus CQ.

Destroying a QP should immediately stop work processing, so no
completions should be generated once the destroy QP operation
returns.  I don't see how you get the bogus CQ message in this case --
it certainly seems like a driver bug.  Unless you mean you are
destroying the CQ with a QP still attached?  But that shouldn't be
possible because the CQ's usecnt should be non-zero until all attached
QPs are freed.  Not sure what could be going on but it sounds bad...

  I then tried to move the QP to error state instead - this didn't
  elicit a storm of kernel messages anymore, but still I seem to get
  incoming completions.

The cleanest way to destroy a QP is to move the QP to the error state,
wait until you have seen a completion for every posted work request
(the completions generated after the transition to the error state
should have a flush status), and then destroy the QP.
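
In code, that order looks roughly like the following sketch against the
kernel verbs API.  The "outstanding" counter is assumed to be the consumer's
own count of posted-but-uncompleted WRs, and a single CQ is assumed for
brevity; this is not the actual RDS implementation.

#include <linux/kernel.h>
#include <linux/sched.h>
#include <asm/atomic.h>
#include <rdma/ib_verbs.h>

/* Sketch of the teardown sequence described above. */
static void drain_and_destroy_qp(struct ib_qp *qp, struct ib_cq *cq,
				 atomic_t *outstanding)
{
	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
	struct ib_wc wc;

	/* 1. Move the QP to the error state; everything still posted should
	 *    now complete, normally with status IB_WC_WR_FLUSH_ERR. */
	if (ib_modify_qp(qp, &attr, IB_QP_STATE))
		printk(KERN_WARNING "failed to move QP to error state\n");

	/* 2. Keep reaping until a completion has been seen for every posted
	 *    WR.  This is the step that never finishes in the case this
	 *    thread is about. */
	while (atomic_read(outstanding) > 0) {
		if (ib_poll_cq(cq, 1, &wc) > 0)
			atomic_dec(outstanding);
		else
			cond_resched();	/* or block on the CQ event */
	}

	/* 3. Only now destroy the QP. */
	ib_destroy_qp(qp);
}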

 - R.


[ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch

When I hit an RDMA error (which happens quite frequently now at rds-stress
exit, thanks to the fixed mr pool flushing :) I often see the RDS
shutdown_worker getting stuck (and rmmod hangs). It's waiting for allocated
WRs to disappear. This usually works, as all WQ entries are flushed out.
This doesn't happen when an RDMA transfer generates a remote access error,
and that seems to be intended according to the spec.

I tried destroying the QP first, then we know we can pick off
any remaining WRs still allocated. That didn't work, as the card
seems to generate interrupts even after the QP is gone. This results
in lots of errors on the console complaining about Completion to
bogus CQ.

I then tried to move the QP to error state instead - this didn't
elicit a storm of kernel messages anymore, but still I seem to get
incoming completions.

Any other suggestions?

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


Re: [ewg] RDS - Recovering from RDMA errors

2008-01-17 Thread Olaf Kirch
On Thursday 17 January 2008 16:51, Dotan Barak wrote:
 Moving the QP to error state flushes all of the outstanding WRs and
 creates a completion for each WR.
 If you want to delete all of the outstanding WRs, you should move the QP 
 state to reset.
 
 (Is this what you asked?)

My question was more along the lines - can I expect that all pending
WRs have been flushed when ib_modify_qp returns? At least for error
state this does not seem to be the case - I'm still getting completions
on the receive QP from the Mellanox card after I do this.

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax