Thanks for the explanation.
Let me know if you have additional information.

We have a contact at Mellanox. I will contact him.

Thanks,

Céline.

Vu Pham wrote:
Celine,

I see mlx4 in the log, so it is a ConnectX.

nfsrdma does not work with any official ConnectX firmware release (2.6.0) because of problems with fast-register work requests between nfsrdma and the firmware.

We are currently debugging/fixing those problems.

Do you have a direct contact with a Mellanox field application engineer? If so, please contact him/her.
If not, I can send you a contact on a private channel.

thanks,
-vu

Hi Celine,

What HCA do you have on your system? Is it ConnectX? If yes, what is its firmware version?

-vu

Hey Celine,

Thanks for gathering all this info! So the rdma connections work fine with everything _but_ nfsrdma. And errno 103 indicates the connection was aborted, maybe by the server (since no failures are logged by the client).


More below:


Celine Bourde wrote:
Hi Steve,

This email summarizes the situation:

Standard mount -> OK
---------------------

[r...@twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
Command works fine.

rdma mount -> FAILS
--------------------

[r...@twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
The command hangs! I have to press Ctrl+C to kill the process.

or

[r...@twind ofa_kernel-1.4.1]# strace mount.nfs 192.168.0.215:/vol0 /mnt/ -o rdma,port=2050
[..]
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
fcntl(3, F_SETFL, O_RDWR)               = 0
sendto(3, "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 40, 0, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, 16) = 40
poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610), sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
close(3)                                = 0
mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, "rdma,port=2050,addr=192.168.0.215"
..same problem

[r...@twind tmp]# dmesg
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
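
The repeated close code in the dmesg output above can be tallied with a small filter. This is only a sketch: the function name rpcrdma_close_codes is made up for illustration, and it assumes the log lines have exactly the "rpcrdma: connection to ADDR closed (CODE)" form shown above. On Linux, errno 103 is ECONNABORTED.

```shell
# Hypothetical helper (not part of any package): reads kernel log text on
# stdin and prints "COUNT ADDR CODE" for every rpcrdma "closed" line.
# Assumes the message format seen in the dmesg output above.
rpcrdma_close_codes() {
    # Keep only the peer address and the close code, then count duplicates.
    sed -n 's/.*rpcrdma: connection to \([^ ]*\) closed (\(-\{0,1\}[0-9]*\)).*/\1 \2/p' |
        sort | uniq -c
}
```

It would be run as "dmesg | rpcrdma_close_codes"; each output line is a count followed by the peer address and the close code.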



Is there anything logged on the server side?

Also, can you try this again, but on both systems do this before attempting the mount:

echo 32768 > /proc/sys/sunrpc/rpc_debug

This will enable all the rpc trace points and add a bunch of logging to /var/log/messages. Maybe that will show us something. I think the server is aborting the connection for some reason.
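
Since the extra logging is noisy, it may help to wrap the setting in two small helpers so it can be turned off again after the mount attempt. This is a sketch under assumptions: the function names and the RPC_DEBUG_FILE override are invented for illustration, and the commands need root on a machine with the sunrpc module loaded. The mask is passed as an argument, so 32768 from the message above can be used as is. The kernel headers I know of define the all-flags mask as 0x7fff (32767), so that value may be worth trying too, but check include/linux/sunrpc/debug.h for your kernel.

```shell
# Hypothetical helpers; RPC_DEBUG_FILE is an illustrative override that
# defaults to the path given in the message above.
rpc_debug_on() {
    # $1 is the decimal debug mask to write, e.g. 32768.
    printf '%s\n' "$1" > "${RPC_DEBUG_FILE:-/proc/sys/sunrpc/rpc_debug}"
}

rpc_debug_off() {
    # Writing 0 clears every RPC debug flag.
    printf '0\n' > "${RPC_DEBUG_FILE:-/proc/sys/sunrpc/rpc_debug}"
}
```

On both client and server one would run "rpc_debug_on 32768", retry the rdma mount, collect /var/log/messages, and then run "rpc_debug_off".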

Steve.




_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
