> On Sep 17, 2020, at 4:23 PM, Frank van der Linden <[email protected]> wrote:
> 
> On Thu, Sep 17, 2020 at 03:09:31PM -0400, bfields wrote:
>> 
>> On Thu, Sep 17, 2020 at 05:01:11PM +0100, Daire Byrne wrote:
>>> 
>>> ----- On 15 Sep, 2020, at 18:21, bfields [email protected] wrote:
>>> 
>>>>> 4) With an NFSv4 re-export, lots of open/close requests (hundreds per
>>>>> second) quickly eat up the CPU on the re-export server and perf top
>>>>> shows we are mostly in native_queued_spin_lock_slowpath.
>>>> 
>>>> Any statistics on who's calling that function?
>>> 
>>> I've always struggled to reproduce this with a simple open/close 
>>> simulation, so I suspect some other operations need to be mixed in too. But 
>>> I have one production workload that I know has lots of opens & closes 
>>> (buggy software) included in amongst the usual reads, writes etc.
>>> 
>>> With just 40 clients mounting the reexport server (v5.7.6) using NFSv4.2, 
>>> we see the CPU of the nfsd threads increase rapidly and by the time we have 
>>> 100 clients, we have maxed out the 32 cores of the server with most of that 
>>> in native_queued_spin_lock_slowpath.
>> 
>> That sounds a lot like what Frank Van der Linden reported:
>> 
>>   https://lore.kernel.org/linux-nfs/20200608192122.ga19...@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/
>> 
>> It looks like a bug in the filehandle caching code.
>> 
>> --b.
> 
> Yes, that does look like the same one.
> 
> I still think that not caching v4 files at all may be the best way to go
> here, since the intent of the filecache code was to speed up v2/v3 I/O,
> where you end up doing a lot of opens/closes, but it doesn't make as
> much sense for v4.
> 
> However, short of that, I tested a local patch a few months back that
> I never posted here, so I'll do so now. It just makes v4 opens into
> 'long term' opens, which do not get put on the LRU, since that doesn't
> make sense (they are in the hash table, so they are still cached).
> 
> Also, the file caching code seems to walk the LRU a little too often,
> but that's another issue - and this change keeps the LRU short, so it's
> not a big deal.
> 
> I don't particularly love this patch, but it does keep the LRU short, and
> did significantly speed up my testcase (by about 50%). So, maybe you can
> give it a try.
> 
> I'll also attach a second patch that converts the hash table to an
> rhashtable, which automatically grows and shrinks in size with usage.
> That patch also helped, but not by nearly as much (I think it yielded
> another 10%).

For what it's worth, I applied your two patches to my test server, along
with my patch that force-closes cached file descriptors during NFSv4
CLOSE processing. The patch combination improves performance (shorter
elapsed time) for my workload as well.
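
To make the "long term open" idea from the first patch concrete, here is a
minimal user-space sketch. It is not the nfsd filecache code and not the
patch itself: every name in it (struct file_cache, cache_insert(),
cache_evict_oldest(), NBUCKETS) is hypothetical, and locking and reference
counting are left out on purpose. It only shows the shape of the change:
every entry is hashed, but only short-term entries are linked onto the LRU,
so an LRU walk never has to step over v4-style opens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8 /* hypothetical fixed table size, for illustration only */

struct cache_file {
	unsigned long key;            /* stand-in for the file's identity */
	bool long_term;               /* true for a v4-style open */
	struct cache_file *hash_next; /* hash bucket chain */
	struct cache_file *lru_prev;  /* LRU links, NULL when not on the LRU */
	struct cache_file *lru_next;
};

struct file_cache {
	struct cache_file *buckets[NBUCKETS];
	struct cache_file lru; /* sentinel head of a circular LRU list */
	size_t lru_len;
};

void cache_init(struct file_cache *c)
{
	for (size_t i = 0; i < NBUCKETS; i++)
		c->buckets[i] = NULL;
	c->lru.lru_prev = c->lru.lru_next = &c->lru;
	c->lru_len = 0;
}

struct cache_file *cache_lookup(struct file_cache *c, unsigned long key)
{
	struct cache_file *f;

	for (f = c->buckets[key % NBUCKETS]; f; f = f->hash_next)
		if (f->key == key)
			return f;
	return NULL;
}

/* Hash every entry; link only short-term entries onto the LRU tail. */
void cache_insert(struct file_cache *c, struct cache_file *f,
		  unsigned long key, bool long_term)
{
	f->key = key;
	f->long_term = long_term;
	f->lru_prev = f->lru_next = NULL;
	f->hash_next = c->buckets[key % NBUCKETS];
	c->buckets[key % NBUCKETS] = f;

	if (!long_term) {
		f->lru_prev = c->lru.lru_prev;
		f->lru_next = &c->lru;
		c->lru.lru_prev->lru_next = f;
		c->lru.lru_prev = f;
		c->lru_len++;
	}
}

/* Evict the oldest LRU entry and unhash it; returns NULL if the LRU is empty. */
struct cache_file *cache_evict_oldest(struct file_cache *c)
{
	struct cache_file *f = c->lru.lru_next;
	struct cache_file **pp;

	if (f == &c->lru)
		return NULL;

	f->lru_prev->lru_next = f->lru_next;
	f->lru_next->lru_prev = f->lru_prev;
	f->lru_prev = f->lru_next = NULL;
	c->lru_len--;

	pp = &c->buckets[f->key % NBUCKETS];
	while (*pp != f)
		pp = &(*pp)->hash_next;
	*pp = f->hash_next;
	return f;
}
```

In this sketch a long-term entry stays findable through cache_lookup()
until its owner removes it explicitly (that removal path is omitted here),
while the LRU length tracks only the short-term entries.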


--
Chuck Lever




--
Linux-cachefs mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cachefs
