I went through all the instances where an immediate soupcall would be
triggered (before r367492).

If the problem is related to a race condition where the socket is unlocked
before the upcall, I can change the patch to retain the lock on the socket
throughout TCP processing.

Both sorwakeup calls happen with a locked socket (which is the critical part,
I understand), while for the write upcall there is one unlocked and one locked
call.
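To make the ordering concrete, here is a minimal sketch of the two variants,
using the standard sockbuf primitives (illustration only, not the actual
tcp_do_segment() code):

    /* Variant 1: wake up while the receive buffer lock is still held.
     * sorwakeup_locked() asserts the lock and drops it as part of the
     * wakeup, so nothing can slip in between append and upcall. */
    SOCKBUF_LOCK(&so->so_rcv);
    sbappendstream_locked(&so->so_rcv, m, 0);
    sorwakeup_locked(so);

    /* Variant 2: unlock first, wake up later.  Another thread can
     * observe or modify socket state in the marked window. */
    SOCKBUF_LOCK(&so->so_rcv);
    sbappendstream_locked(&so->so_rcv, m, 0);
    SOCKBUF_UNLOCK(&so->so_rcv);
    /* <-- race window: socket/error state may change before the upcall */
    sorwakeup(so);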


Richard Scheffenegger
Consulting Solution Architect
NAS & Networking

NetApp
+43 1 3676 811 3157 Direct Phone
+43 664 8866 1857 Mobile Phone
richard.scheffeneg...@netapp.com

https://ts.la/richard49892


-----Original Message-----
From: tue...@freebsd.org <tue...@freebsd.org>
Sent: Saturday, 10 April 2021 18:13
To: Rick Macklem <rmack...@uoguelph.ca>
Cc: Scheffenegger, Richard <richard.scheffeneg...@netapp.com>; Youssef GHORBAL
<youssef.ghor...@pasteur.fr>; freebsd-net@freebsd.org
Subject: Re: NFS Mount Hangs

> On 10. Apr 2021, at 17:56, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> Scheffenegger, Richard <richard.scheffeneg...@netapp.com> wrote:
>>> Hi Rick,
>>>
>>> Rick wrote:
>>>> Well, I have some good news and some bad news (the bad is mostly for 
>>>> Richard).
>>>>
>>>> The only message logged is:
>>>> tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, segment 
>>>> processed normally
>>>>
> Btw, I did get one additional message during further testing (with r367492 
> reverted):
> tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection 
> attempt aborted
>   by remote endpoint
>
> This only happened once over several test cycles.
That is OK.
>
>>>> But...the RST battle no longer occurs. Just one RST that works and then 
>>>> the SYN gets SYN,ACK'd by the FreeBSD end and off it goes...
>>>>
>>>> So, what is different?
>>>>
>>>> r367492 is reverted from the FreeBSD server.
>>>> I did the revert because I think it might be what is causing the hang
>>>> reported by otis@. (In his case, the Recv-Q grows on the socket for the
>>>> stuck Linux client, while others work.)
>>>>
>>>> Why does reverting fix this?
>>>> My only guess is that the krpc gets the upcall right away and sees an
>>>> EPIPE when it does soreceive(), which results in soshutdown(SHUT_WR).
> This was bogus and incorrect. The diagnostic printf() I saw was 
> generated for the back channel, and that would have occurred after the socket 
> was shut down.
>
>>>
>>> With r367492 you don't get the upcall with the same error state? Or you
>>> don't get an error on a write() call when there should be one?
> If Send-Q is 0 when the network is partitioned, then after healing the krpc
> sees no activity on the socket (until it acquires/processes an RPC, it will
> not do a sosend()).
> Without the 6-minute timeout, the RST battle goes on "forever" (I've never
> actually waited more than 30 minutes, which is close enough to "forever"
> for me).
> --> With the 6-minute timeout, the "battle" stops after 6 minutes, when the
>     timeout causes a soshutdown(..SHUT_WR) on the socket.
>     (Since the soshutdown() patch is not yet in "main" (I got comments, but
>     no "reviewed" on it), the 6-minute timer won't help if enabled in main.
>     The soclose() won't happen for TCP connections with the back channel
>     enabled, such as Linux 4.1/4.2 ones.)
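(For concreteness, a minimal sketch of what such an idle-timeout handler could
do; the function name and callout wiring are invented here, this is not the
actual patch under review:)

    /* Hypothetical callout handler, fired after 6 minutes without
     * activity on the connection. */
    static void
    svc_conn_idle_timeout(void *arg)
    {
            struct socket *so = arg;

            /*
             * Shut down the send side only: our FIN moves the
             * connection out of the state where every peer segment is
             * answered with a challenge ACK, which is what keeps the
             * RST battle going.  Receiving remains possible.
             */
            soshutdown(so, SHUT_WR);
    }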
I'm confused. So you are saying that if the Send-Q is empty when you partition
the network, and the peer starts to send SYNs after the healing, FreeBSD
responds with a challenge ACK, which triggers the sending of a RST by Linux,
and this RST is ignored multiple times.
Is that true? Even with my patch for the bug I introduced?
What version of the kernel are you using?

Best regards
Michael
>
> If Send-Q is non-empty when the network is partitioned, the battle will not 
> happen.
>
>>
>> My understanding is that he needs this error indication when calling 
>> shutdown().
> There are several ways the krpc notices that a TCP connection is no longer
> functional:
> - An error return like EPIPE from either sosend() or soreceive().
> - A return of 0 from soreceive() with no data (normal EOF from the other
>   end).
> - A 6-minute timeout on the server end, when no activity has occurred on the
>   connection. This timer is currently disabled for NFSv4.1/4.2 mounts in
>   "main", but I enabled it for this testing, to stop the "RST battle goes on
>   forever" during testing. I am thinking of enabling it on "main", but this
>   crude bandaid shouldn't be thought of as a fix for the RST battle.
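(As a concrete illustration, the first two conditions map onto soreceive()
results roughly as below; a hedged sketch, not the actual sys/rpc code:)

    struct uio uio;
    struct mbuf *m = NULL;
    int error, rcvflag = MSG_DONTWAIT;

    bzero(&uio, sizeof(uio));
    uio.uio_resid = 1000000000;     /* accept whatever is queued */
    uio.uio_td = curthread;

    error = soreceive(so, NULL, &uio, &m, NULL, &rcvflag);
    if (error != 0) {
            /* e.g. EPIPE or ECONNRESET: the connection is dead */
    } else if (m == NULL) {
            /* no error and no data: orderly EOF from the other end */
    } else {
            /* data arrived: hand the mbuf chain to the RPC layer */
    }
    /* The third condition produces no soreceive() result at all; only
     * the 6-minute timer notices that neither branch above has fired. */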
>
>>>
>>> From what you describe, this is on writes, isn't it? (I'm asking because
>>> the original problem that was fixed with r367492 occurs in the read path:
>>> the draining of the so_rcv buffer in the upcall right away, which
>>> subsequently influences the ACK sent by the stack.)
>>>
>>> I only added the so_snd buffer handling after some discussion about
>>> whether WAKESOR shouldn't have a symmetric equivalent, WAKESOW...
>>>
>>> Thus a partial backout (leaving the WAKESOR part in, but reverting the
>>> WAKESOW part) would still fix my initial problem with erroneous DSACKs
>>> (which can also lead to extremely poor performance with Linux clients),
>>> but possibly address this issue...
>>>
>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for 
>>> the revert only on the so_snd upcall?
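(The deferred-wakeup pattern in question looks roughly like this; simplified
from memory, not the literal r367492 diff:)

    /* During TCP segment processing: instead of an immediate upcall,
     * record that a wakeup is owed. */
    tp->t_flags |= TF_WAKESOR;      /* read side; TF_WAKESOW for write */

    /* Once segment processing is complete, issue the deferred
     * wakeup(s): */
    if (tp->t_flags & TF_WAKESOR) {
            tp->t_flags &= ~TF_WAKESOR;
            sorwakeup(so);          /* locks, wakes, unlocks */
    }
    if (tp->t_flags & TF_WAKESOW) {
            tp->t_flags &= ~TF_WAKESOW;
            sowwakeup(so);
    }

    /* A partial backout as described above would keep the TF_WAKESOR
     * deferral and restore immediate sowwakeup() calls at the original
     * send-side call sites. */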
> Since the krpc only uses receive upcalls, I don't see how reverting 
> the send side would have any effect?
>
>> Since the release of 13.0 is almost done, can we try to fix the issue 
>> instead of reverting the commit?
> I think it has already shipped broken.
> I don't know if an errata is possible, or if it will be broken until 13.1.
>
> --> I am much more concerned with the otis@ stuck client problem than this
>     RST battle that only occurs after a network partitioning, especially if
>     it is 13.0-specific.
>     I did this testing to try to reproduce Jason's stuck client problem
>     (with the connection in CLOSE_WAIT), which I failed to reproduce.
>
> rick
>
> Rs: agreed, a good understanding of where the interaction between the
> stack, the socket, and the in-kernel TCP user breaks is needed.
>
>>
>> If this doesn't help, some major surgery will be necessary to prevent NFS
>> sessions with SACK enabled from transmitting DSACKs...
>
> My understanding is that the problem is related to getting a local 
> error indication after receiving a RST segment too late or not at all.
>
> Rs: but the move of the upcall should not materially change that; I don't
> have a PC here to see if any upcall actually happens on RST...
>
> Best regards
> Michael
>>
>>
>>> I know from a printf that this happened, but whether it caused the RST 
>>> battle to not happen, I don't know.
>>>
>>> I can put r367492 back in and do more testing if you'd like, but I think it 
>>> probably needs to be reverted?
>>
>> Please, I don't quite understand why the exact timing of the upcall would be 
>> that critical here...
>>
>> A comparison of the soxxx calls and errors between the "good" and the "bad" 
>> would be perfect. I don't know if this is easy to do though, as these calls 
>> appear to be scattered all around the RPC / NFS source paths.
>>
>>> This does not explain the original hung Linux client problem, but does shed 
>>> light on the RST war I could create by doing a network partitioning.
>>>
>>> rick
>>