On Mon, Apr 05, 2021 at 04:29:10PM -0400, Alexander Ahring Oder Aring wrote:
> Hi,
> 
> On Mon, Apr 5, 2021 at 1:33 PM Alexander Ahring Oder Aring
> <aahri...@redhat.com> wrote:
> >
> > Hi,
> >
> > On Sat, Apr 3, 2021 at 11:34 AM Alexander Ahring Oder Aring
> > <aahri...@redhat.com> wrote:
> > >
> > ...
> > >
> > > > It seems to me that the only time DLM might need to retransmit data, is
> > > > when recovering from a connection failure. So why can't we just resend
> > > > unacknowledged data at reconnection time? That'd probably simplify the
> > > > code a lot (no need to maintain a retransmission timeout on TX, no need
> > > > to handle sequence numbers that are in the future on RX).
> > > >
> > >
> > > > I can try to remove the timer and timeout and retransmit at
> > > > reconnect instead, as described above. Then I will test it again
> > > > and report back whether it works or what other problems we hit.
> > >
> >
> > I have an implementation of this running and so far I don't see any 
> > problems.
> 
> There is a problem, but it is related to how reconnections are
> triggered. The whole communication can get stuck because it is send()
> that triggers a reconnection when we are not connected anymore. Before,
> the timer periodically triggered a send(), which in turn triggered a
> reconnection. Therefore we never ended up in a stuck situation where
> nobody was sending anything anymore. It's a rare case, but I am
> currently running into it. I think I need to change how reconnections
> are triggered, using some "forever periodic try", which should solve
> this issue.

Would it be sufficient to detect socket errors to avoid this problem?
For example by letting lowcomms_error_report() do the reconnection when
necessary?
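
To make the idea concrete, here is a rough user-space sketch in Python.
It is not DLM code: the Connection class, its fields and its methods are
all made up for illustration, and error_report() only stands in for
whatever socket error hook would be used. It shows the two pieces
discussed above: the error path (rather than the next send()) starts the
reconnection, and the reconnection resends only unacknowledged data.

```python
from collections import deque


class Connection:
    """Toy model of one peer connection (hypothetical, not kernel code)."""

    def __init__(self):
        self.connected = True
        self.unacked = deque()  # (seq, payload) sent but not yet acked
        self.wire = []          # everything handed to the simulated socket
        self.next_seq = 0
        self.reconnects = 0

    def send(self, payload):
        # Keep every message until it is acked so it can be resent later.
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        if self.connected:
            self.wire.append((seq, payload))
        return seq

    def ack(self, seq):
        # Cumulative ack: drop everything up to and including seq.
        while self.unacked and self.unacked[0][0] <= seq:
            self.unacked.popleft()

    def error_report(self):
        # Stand-in for a socket error callback: reconnect from here
        # instead of waiting for the next send() to notice the failure.
        self.connected = False
        self.reconnect()

    def reconnect(self):
        self.connected = True
        self.reconnects += 1
        # Resend only what the peer has not acknowledged.
        self.wire.extend(self.unacked)
```

With this model no timer is needed: if "a" and "b" are sent, "a" is
acked and the socket then reports an error, the reconnect resends "b"
even though nobody calls send() again. In the kernel the reconnect would
likely have to be deferred to process context rather than run directly
in the callback.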

> 
> - Alex
> 
