Hello Krister,

> -----Original Message-----
> From: Krister Johansen <k...@templeofstupid.com>
> Sent: Friday, November 10, 2023 2:10 AM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.bo...@intel.com>
> Cc: k...@templeofstupid.com; intel-gfx@lists.freedesktop.org; Kurmi, Suresh
> Kumar <suresh.kumar.ku...@intel.com>; Saarinen, Jani
> <jani.saari...@intel.com>; Miklos Szeredi <mszer...@redhat.com>
> Subject: Re: Regression on linux-next (next-20231107)
> 
> Hi Chaitanya,
> 
> On Thu, Nov 09, 2023 at 05:00:09PM +0000, Borah, Chaitanya Kumar wrote:
> > Hello Krister,
> >
> > Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.
> >
> > This mail is regarding a regression we are seeing in our CI runs[1] for some
> > machines (dg2 and adl-p) on the linux-next repository.
> >
> > Since the version next-20231107 [2], we are seeing the following error
> > ```````````````````````````````````````````````````````````````````````````````
> > <4>[   32.015910] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> > <4>[   32.021048] CPU: 15 PID: 766 Comm: fusermount Not tainted 6.6.0-next-20231107-next-20231107-g5cd631a52568+ #1
> > <4>[   32.031135] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> > <4>[   32.044657] RIP: 0010:fuse_evict_inode+0x61/0x150 [fuse]
> > `````````````````````````````````````````````````````````````````````````````````
> >
> > The detailed log can be found in [3].
> >
> > After bisecting the tree, the following patch [4] seems to be the
> > first "bad" commit
> >
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> > 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5 is the first bad commit
> > commit 513dfacefd712bcbfab64e1a9c9c3e0d51c2dca5
> > Author: Krister Johansen k...@templeofstupid.com
> > Date:   Fri Nov 3 10:39:47 2023 -0700
> >
> >     fuse: share lookup state between submount and its parent
> >
> >     Fuse submounts do not perform a lookup for the nodeid that they inherit
> >     from their parent.  Instead, the code decrements the nlookup on the
> >     submount's fuse_inode when it is instantiated, and no forget is
> >     performed when a submount root is evicted.
> >
> >     Trouble arises when the submount's parent is evicted despite the
> >     submount itself being in use.  In this author's case, the submount was
> >     in a container and detached from the initial mount namespace via a
> >     MNT_DETACH operation.  When memory pressure triggered the shrinker, the
> >     inode from the parent was evicted, which triggered enough forgets to
> >     render the submount's nodeid invalid.
> >
> >     Since submounts should still function, even if their parent goes away,
> >     solve this problem by sharing refcounted state between the parent and
> >     its submount.  When all of the references on this shared state reach
> >     zero, it's safe to forget the final lookup of the fuse nodeid.
> >
> >
> > `````````````````````````````````````````````````````````````````````````````````````````````````````````
> >
> > We also verified that if we revert the patch the issue is not seen.
> >
> > Could you please check why the patch causes this regression and provide a
> > fix if necessary?
> 
> Apologies for the inconvenience.  I've reproduced the problem, tested a fix,
> and am in the process of preparing patches to send to Miklos.  I'll cc the
> people on this e-mail in that thread.
> 
> > [3]
> > http://gfx-ci.igk.intel.com/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt
> 
> This link didn't resolve in DNS when I tried to access it.  I needed to use
> intel-gfx-ci.01.org as the hostname instead.
> 

My bad. I realized it too late. I hope you found the logs; if not, here they are:

https://intel-gfx-ci.01.org/tree/linux-next/next-20231109/bat-dg2-14/boot0.txt

Regards

Chaitanya
> Thanks,
> 
> -K
