Just a sanity check for everyone: I rebuilt release 2.5.1, and everything works correctly. Looking back at the 2.5.1 code, I noticed that we were just hardcoding the max_wr value to 512, which is much more than the 70 I'm seeing requested here in 2.6.1. I also verified that 2.5.1 gets a max_wr value of 32768 back from ibv_query_device().
However, 2.6.1 is reporting 0?
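
For anyone following along, here is a minimal sketch of the comparison being described: the requested WR count (70 in 2.6.1, a hardcoded 512 in 2.5.1) against the maximum the device reports (32768 under 2.5.1). It is not the PVFS2 code. The struct dev_limits and the function pick_num_wr are hypothetical names made up for illustration; the only real identifier referenced is the max_qp_wr field of libibverbs' struct ibv_device_attr, which the stand-in struct mirrors so the example builds without IB hardware.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the one device attribute of interest.
 * In libibverbs the corresponding field is ibv_device_attr.max_qp_wr. */
struct dev_limits {
    int max_qp_wr;
};

/* Return the number of WRs to request: the wanted count, clamped to the
 * device's reported maximum. Returns -1 if the device reports no
 * capacity at all, which would point at the driver or the init order
 * rather than at the size of the request. */
int pick_num_wr(const struct dev_limits *dev, int wanted)
{
    assert(wanted > 0);
    if (dev->max_qp_wr <= 0) {
        fprintf(stderr, "device reports max_qp_wr=%d\n", dev->max_qp_wr);
        return -1;
    }
    return wanted < dev->max_qp_wr ? wanted : dev->max_qp_wr;
}
```

With a reported maximum of 32768, both 70 and 512 pass through unchanged, so a request of 70 should not be the problem by itself; a reported 0 takes the error path instead.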

This looks to me now like we're tripping up the driver somehow, if the 0 is accurate.
Was there any change in order of init between the two releases?

I'm puzzled.

Kyle Schochenmaier wrote:
Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Wed, 27 Dec 2006 15:06 -0600:
Excellent, thanks Pete! For some reason I thought that patch was only for caching. I built the latest pvfs-CVS and am having problems with openib. I originally thought the problem was that my client was on CVS while the server was on the latest release, but I rebuilt the server from CVS head and got this on the client:

p5l8:~# pvfs2-ls
[E 15:02:10.201683] Error: openib_new_connection: asked for 70 send WRs on QP, got 0.

So this is what I'm getting on the server now:

[D 11:41:10.340612] PVFS2 Server version 2.6.1pre1-2006-12-27-205836 starting.
[E 11:41:39.405117] Warning: exchange_data: partial read, 1/8 bytes.
[E 11:41:39.408171] SIGSEGV: skipping cleanup; exit now!

Something is weird here: I wouldn't expect a connection failure or resource issue on the client side to
segfault the server but not the client.  More debugging to come.

With network debugging on, I get this on the server, which leads me to believe I have a configuration problem somewhere as well:

[D 11:46:01.594426] PVFS2 Server version 2.6.1pre1-2006-12-27-205836 starting.
[D 11:46:01.595332] BMI_ib_initialize: init.
[D 11:46:01.595404] openib_ib_initialize: init.
[D 11:46:01.596383] openib_ib_initialize: max 65408 completion queue entries.
[D 11:46:01.596664] BMI_ib_initialize: done.

'pvfs2-ls on client'
<pages of this>
[D 11:46:15.711040] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:46:15.723029] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:46:15.735042] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:46:15.747048] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:46:15.759037] BMI_ib_testcontext: last activity too long ago, blocking.
[E 11:46:15.761497] Warning: exchange_data: partial read, 1/8 bytes.
[D 11:46:15.761542] ib_close_connection: closing connection to 10.1.4.57:60889.
(END)
The interesting part is that I have timeouts effectively turned off in the filesystem config for other testing purposes. When I set them back to the defaults (the timeouts in the pvfs2-fs config file), I get this:

[D 11:54:11.946168] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:11.958164] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:11.970166] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:11.982167] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:11.994166] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:12.006190] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:12.018165] BMI_ib_testcontext: last activity too long ago, blocking.
[D 11:54:12.030165] BMI_ib_testcontext: last activity too long ago, blocking.
[E 11:54:12.041386] Warning: exchange_data: partial read, 1/8 bytes.
[D 11:54:12.041441] ib_close_connection: closing connection to 10.1.4.57:34756.
[E 11:54:12.042161] SIGSEGV: skipping cleanup; exit now!

I realize this may also be a hardware issue, but I'd like to see the server not barf when clients fail to connect. I also tried commenting out those checks; yes, it's a hardware problem on the client side for now. Oddly enough, though, netpipe over IB works fine, and so do all of my standard IB tests.
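
As an illustration of handling the "partial read, 1/8 bytes" case without taking the server down, here is a minimal sketch of a read loop that reports a short read to its caller instead of assuming the full message arrived. It is not the PVFS2 exchange_data code, and whether the partial read is what actually triggers the SIGSEGV above has not been established. The name read_full is hypothetical; read(), errno and EINTR are standard POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Try to read exactly len bytes from fd.
 * Returns the number of bytes actually read, which is less than len if
 * the peer closed the connection early, or -1 on a read error. The
 * caller is expected to treat a short count as a failed handshake and
 * drop the connection rather than use the incomplete buffer. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0)
            break;              /* peer closed: short read */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* real error */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller expecting 8 bytes would compare the return value against 8 and close just that connection on a mismatch, leaving the rest of the server running.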

That's just hardware.  It's comparing the returned values in
ibv_qp_init_attr to what was asked for.  You could try commenting
out those checks, just to see if the IB library did not set the
return values properly.  Or you could shrink the request num_wr and
see if that helps.  Either way, next stop is your IB card vendor.
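
A minimal sketch of the kind of check being discussed, for readers who have not looked at the code: after the QP is created, the capacities written back into the init attributes are compared with what was requested. This is not the actual openib_new_connection code. The struct wr_cap and the function check_granted are hypothetical names; the two fields mirror max_send_wr and max_recv_wr in libibverbs' struct ibv_qp_cap (the cap member of ibv_qp_init_attr), so the example builds without the IB library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror of the two fields being compared; in libibverbs
 * they are members of struct ibv_qp_cap. */
struct wr_cap {
    unsigned int max_send_wr;
    unsigned int max_recv_wr;
};

/* Compare what was asked for with what came back.
 * Returns 0 if the granted capacities cover the request, -1 if not. */
int check_granted(const struct wr_cap *asked, const struct wr_cap *got)
{
    int rc = 0;

    assert(asked != NULL && got != NULL);
    if (got->max_send_wr < asked->max_send_wr) {
        fprintf(stderr, "asked for %u send WRs on QP, got %u\n",
                asked->max_send_wr, got->max_send_wr);
        rc = -1;
    }
    if (got->max_recv_wr < asked->max_recv_wr) {
        fprintf(stderr, "asked for %u recv WRs on QP, got %u\n",
                asked->max_recv_wr, got->max_recv_wr);
        rc = -1;
    }
    return rc;
}
```

If the library simply fails to fill in the returned values, a check like this fails even though the QP itself may be usable, which is why commenting it out is a reasonable diagnostic step.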

        -- Pete






--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
