Package: nfs-common,rdma-core

I've been testing the upgrade of a compute node from Debian 11 to Debian 12.
That node was connected via NFS over RDMA to a ZFS storage server
running Debian 11.
The compute node and the storage server are part of a high-performance compute
cluster, connected over InfiniBand.
Not sure whether this is important, but the storage server is using ZFS.

After the upgrade of the compute node (the NFS client) to Debian 12, this machine
could not correctly read a few (small) files. The files were correctly listed by
"ls", and their sizes matched as well.
However, the content was corrupted (it looked like random garbage). In one case
.ssh/authorized_keys was corrupted; in another, the "version.lua" from
the lmod system was affected, rendering lmod unusable.
Interestingly, only very few files seemed to be affected. Most files were 
correctly retrieved.
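
Since the affected files show the right size and produce no read error, the
only way to spot them is to compare content. A minimal sketch of such a check
in Python, assuming a known-good copy of the same tree is available somewhere
(for example a second mount of the same export, or a copy on the server); the
function names and the paths in the usage note are illustrative only and do
not come from any existing tool:

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(suspect_root, reference_root):
    """Compare every regular file under reference_root with its counterpart
    under suspect_root. Return the sorted relative paths that are missing
    or whose content differs."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(reference_root):
        for name in filenames:
            ref_path = os.path.join(dirpath, name)
            rel = os.path.relpath(ref_path, reference_root)
            sus_path = os.path.join(suspect_root, rel)
            # Size alone is not enough here: the corrupted files had the
            # expected size, so the content hash is what gets compared.
            if (not os.path.isfile(sus_path)
                    or file_digest(sus_path) != file_digest(ref_path)):
                mismatches.append(rel)
    return sorted(mismatches)
```

It could be called as find_mismatches("/mnt/home-rdma", "/mnt/home-ref"),
where both paths are hypothetical mount points. An empty list would mean no
differences were found in the files that were walked.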

So this is a very subtle error, and not obvious.
When retrieving these files, no error was reported, but data of the expected 
size was retrieved.
Effectively, the retrieved data was corrupted, and could lead to potential data 

The compute node on Debian 12 had

ii  libnfsidmap1:amd64  1:2.6.2-4  amd64  NFS idmapping library
ii  nfs-common          1:2.6.2-4  amd64  NFS support files common to client and server
ii  librdmacm1:amd64    44.0-2     amd64  Library for managing RDMA connections
ii  rdma-core           44.0-2     amd64  RDMA core userspace infrastructure and documentation
ii  rdmacm-utils        44.0-2     amd64  Examples for the librdmacm library

The storage server on Debian 11 had

ii  nfs-common          1:1.3.4-6  amd64  NFS support files common to client and server
ii  nfs-kernel-server   1:1.3.4-6  amd64  support for NFS kernel server
ii  librdmacm1:amd64    33.2-1     amd64  Library for managing RDMA connections

The problem went away when changing the NFS mount protocol from proto=rdma to 

I tried to learn about this incompatibility, but did not find any information.
I'm also curious whether an nfs 2.6 server would correctly talk to an nfs 1.3
client over RDMA?
Can anyone provide more information on that topic?
