Based on my testing, I believe a value of 0 means no retry. With both values set to 0, the connection kept receiving IBV_WC_RETRY_EXC_ERR for RDMA READ operations, which aborts the connection. After changing both to 7 (a value commonly used in several examples from the RDMA Core library), the error goes away and we can at least have a stable connection in such a RoCE network.
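To make the meaning of the two counters concrete, here is a minimal sketch in plain C. It is not taken from the patch or from any RDMA header: `struct retry_params` and all helper names are hypothetical stand-ins for the two `rdma_conn_param` fields discussed above. It assumes, as I understand the InfiniBand semantics, that both counters are 3-bit values (0 to 7), that `retry_count` bounds the retransmissions before the NIC reports IBV_WC_RETRY_EXC_ERR, and that an `rnr_retry_count` of 7 is the special "retry indefinitely" value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: both counters are carried in 3-bit fields, so 7 is the maximum. */
#define RETRY_FIELD_MAX 7u

/* Hypothetical stand-in for the two rdma_conn_param fields the patch changes. */
struct retry_params {
	uint8_t retry_count;     /* retransmissions when no ACK is received */
	uint8_t rnr_retry_count; /* retransmissions after a receiver-not-ready NAK */
};

/* Clamp a requested count into the 3-bit range. */
uint8_t clamp_retry(unsigned int requested)
{
	return (uint8_t)(requested > RETRY_FIELD_MAX ? RETRY_FIELD_MAX : requested);
}

/*
 * Simplified model: one initial transmission plus up to retry_count
 * retransmissions before the operation fails. With retry_count == 0 the
 * first missing ACK is already fatal.
 */
unsigned int max_transmissions(uint8_t retry_count)
{
	assert(retry_count <= RETRY_FIELD_MAX);
	return 1u + retry_count;
}

/* Assumption: for the RNR counter only, 7 means "retry indefinitely". */
int rnr_retry_is_infinite(uint8_t rnr_retry_count)
{
	return rnr_retry_count == RETRY_FIELD_MAX;
}

/* Build a parameter pair, clamping both values to what the fields can hold. */
struct retry_params make_retry_params(unsigned int retry, unsigned int rnr_retry)
{
	struct retry_params p;

	p.retry_count = clamp_retry(retry);
	p.rnr_retry_count = clamp_retry(rnr_retry);
	return p;
}
```

Under this model, `make_retry_params(0, 0)` describes the old behaviour (a single transmission, no tolerance for a lost packet), while `make_retry_params(7, 7)` describes the patched one: up to eight transmissions per operation, and no give-up on receiver-not-ready.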
-----Original Message-----
From: Alexey Kuznetsov <kuz...@acronis.com>
Date: Thursday, 17 August 2023 at 11:52 PM
To: Kui Liu <kui....@acronis.com>
Cc: Devel <devel@openvz.org>
Subject: Re: [Devel] [PATCH RH7] fs/fuse kio: adjust rdma connection parameters

Ack.

What did the 0 values mean? No retry or some default value?

On Thu, Aug 17, 2023 at 11:47 PM Kui Liu <kui....@acronis.com> wrote:
>
> In RoCE network, packet loss and delay due to congestion can happen
> quite often. We need to tolerate such event. So increase retry_count
> and rnr_retry_count to 7 to allow NIC to retry operations when an
> error happens, instead of returning the error directly which causes
> the connection to be aborted.
>
> Signed-off-by: Liu Kui <kui....@acronis.com>
> ---
>  fs/fuse/kio/pcs/pcs_rdma_conn.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fuse/kio/pcs/pcs_rdma_conn.c b/fs/fuse/kio/pcs/pcs_rdma_conn.c
> index 4db903151de0..7339b1466d3a 100644
> --- a/fs/fuse/kio/pcs/pcs_rdma_conn.c
> +++ b/fs/fuse/kio/pcs/pcs_rdma_conn.c
> @@ -44,8 +44,8 @@ conn_param_init(struct rdma_conn_param *cp, struct pcs_rdmaio_conn_req *cr,
>  	cp->initiator_depth = min_t(int, U8_MAX, cmid->device->attrs.max_qp_init_rd_atom);
>
>  	cp->flow_control = 1; /* does not matter */
> -	cp->retry_count = 0; /* # retransmissions when no ACK received */
> -	cp->rnr_retry_count = 0; /* # RNR retransmissions */
> +	cp->retry_count = 7; /* # retransmissions when no ACK received */
> +	cp->rnr_retry_count = 7; /* # RNR retransmissions */
>  }
>
>  static int pcs_rdma_cm_event_handler(struct rdma_cm_id *cmid,
> --
> 2.32.0 (Apple Git-132)
>
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://lists.openvz.org/mailman/listinfo/devel