Re: [PATCH 1/1] staging: android: ashmem: Fix lockdep warning for write operation

2020-07-30 Thread Suren Baghdasaryan
On Wed, Jul 29, 2020 at 8:24 PM Joel Fernandes  wrote:
>
> On Wed, Jul 15, 2020 at 10:45 PM Suren Baghdasaryan  wrote:
> >
> > syzbot report [1] describes a deadlock when a write operation against an
> > ashmem fd, executed while ashmem is shrinking its cache, results in the
> > following lock sequence:
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                        CPU1
> >        ----                        ----
> >   lock(fs_reclaim);
> >                                    lock(&sb->s_type->i_mutex_key#13);
> >                                    lock(fs_reclaim);
> >   lock(&sb->s_type->i_mutex_key#13);
> >
> > kswapd takes fs_reclaim and then inode_lock, while generic_perform_write
> > takes inode_lock and then fs_reclaim. However, ashmem does not support
> > writing into the backing shmem with a write syscall. The only way to change
> > its content is to mmap it and operate on the mapped memory. Therefore the
> > race that lockdep is warning about is not valid. Resolve this by
> > introducing a separate lockdep class for the backing shmem inodes.
> >
> > [1]: https://lkml.kernel.org/lkml/0b5f9d059aa20...@google.com/
> >
> > Signed-off-by: Suren Baghdasaryan 
> > ---
>
> Once Eric's nits are resolved:
>
> Reviewed-by: Joel Fernandes (Google) 

Thanks Joel!
I'm fixing the nits and will repost the patch shortly. One note about
adding the "Fixes:" tag - this is a fix for a false-positive lockdep
warning, and it's unclear which patch should be cited here (I could
not find a clear cause that started this warning). In similar
situations, for example here: https://lkml.org/lkml/2020/6/15/958,
developers seem to skip that tag, so I'll do the same.

>
> Thanks.
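
For reference, the separate-lockdep-class fix described in the changelog
above boils down to tagging ashmem's backing shmem inode with its own
lockdep key. A minimal sketch (the wrapper helper here is illustrative,
not the exact merged diff):

	static struct lock_class_key backing_shmem_inode_class;

	/*
	 * Give the backing shmem inode a dedicated lockdep class so the
	 * write(2) ordering (inode_lock -> fs_reclaim) reported by lockdep
	 * is no longer attributed to inodes that can only be modified
	 * through mmap.
	 */
	static struct file *ashmem_setup_backing_file(const char *name,
						      loff_t size,
						      unsigned long vm_flags)
	{
		struct file *vmfile = shmem_file_setup(name, size, vm_flags);

		if (!IS_ERR(vmfile))
			lockdep_set_class(&file_inode(vmfile)->i_rwsem,
					  &backing_shmem_inode_class);
		return vmfile;
	}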


Re: [PATCH 1/1] staging: android: ashmem: Fix lockdep warning for write operation

2020-07-16 Thread Suren Baghdasaryan
On Wed, Jul 15, 2020 at 8:30 PM Eric Biggers  wrote:
>
> On Wed, Jul 15, 2020 at 07:45:27PM -0700, Suren Baghdasaryan wrote:
> > syzbot report [1] describes a deadlock when a write operation against an
> > ashmem fd, executed while ashmem is shrinking its cache, results in the
> > following lock sequence:
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                        CPU1
> >        ----                        ----
> >   lock(fs_reclaim);
> >                                    lock(&sb->s_type->i_mutex_key#13);
> >                                    lock(fs_reclaim);
> >   lock(&sb->s_type->i_mutex_key#13);
> >
> > kswapd takes fs_reclaim and then inode_lock, while generic_perform_write
> > takes inode_lock and then fs_reclaim. However, ashmem does not support
> > writing into the backing shmem with a write syscall. The only way to change
> > its content is to mmap it and operate on the mapped memory. Therefore the
> > race that lockdep is warning about is not valid. Resolve this by
> > introducing a separate lockdep class for the backing shmem inodes.
> >
> > [1]: https://lkml.kernel.org/lkml/0b5f9d059aa20...@google.com/
> >
> > Signed-off-by: Suren Baghdasaryan 
>
> Please add proper tags:
>
> Reported-by: syzbot+7a0d9d0b26efefe61...@syzkaller.appspotmail.com
> Fixes: ...
> Cc: sta...@vger.kernel.org
>
>
> The Reported-by tag to use was given in the original syzbot report.

Will add in v2. Thanks!

>
> - Eric


Re: possible deadlock in shmem_fallocate (4)

2020-07-14 Thread Suren Baghdasaryan
On Tue, Jul 14, 2020 at 8:47 AM Todd Kjos  wrote:
>
> +Suren Baghdasaryan +Hridya Valsaraju who support the ashmem driver.

Thanks for looping me in.

>
>
> On Tue, Jul 14, 2020 at 7:18 AM Michal Hocko  wrote:
> >
> > On Tue 14-07-20 22:08:59, Hillf Danton wrote:
> > >
> > > On Tue, 14 Jul 2020 10:26:29 +0200 Michal Hocko wrote:
> > > > On Tue 14-07-20 13:32:05, Hillf Danton wrote:
> > > > >
> > > > > On Mon, 13 Jul 2020 20:41:11 -0700 Eric Biggers wrote:
> > > > > > On Tue, Jul 14, 2020 at 11:32:52AM +0800, Hillf Danton wrote:
> > > > > > >
> > > > > > > Add FALLOC_FL_NOBLOCK and on the shmem side try to lock inode
> > > > > > > upon the new flag. And the overall upside is to keep the current
> > > > > > > gfp either in the khugepaged context or not.
> > > > > > >
> > > > > > > --- a/include/uapi/linux/falloc.h
> > > > > > > +++ b/include/uapi/linux/falloc.h
> > > > > > > @@ -77,4 +77,6 @@
> > > > > > >   */
> > > > > > >  #define FALLOC_FL_UNSHARE_RANGE		0x40
> > > > > > >
> > > > > > > +#define FALLOC_FL_NOBLOCK		0x80
> > > > > > > +
> > > > > >
> > > > > > You can't add a new UAPI flag to fix a kernel-internal problem like 
> > > > > > this.
> > > > >
> > > > > Sounds fair, see below.
> > > > >
> > > > > What the report indicates is a missing PF_MEMALLOC_NOFS: it's
> > > > > checked on the ashmem side and added as an exception before going
> > > > > into filesystem code. On the shmem side, no more than a best effort
> > > > > is paid on the intended exception.
> > > > >
> > > > > --- a/drivers/staging/android/ashmem.c
> > > > > +++ b/drivers/staging/android/ashmem.c
> > > > > @@ -437,6 +437,7 @@ static unsigned long
> > > > >  ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > >  {
> > > > >  	unsigned long freed = 0;
> > > > > +	bool nofs;
> > > > >
> > > > >  	/* We might recurse into filesystem code, so bail out if necessary */
> > > > >  	if (!(sc->gfp_mask & __GFP_FS))
> > > > > @@ -445,6 +446,11 @@ ashmem_shrink_scan(struct shrinker *shri
> > > > >  	if (!mutex_trylock(&ashmem_mutex))
> > > > >  		return -1;
> > > > >
> > > > > +	/* enter filesystem with caution: nonblock on locking */
> > > > > +	nofs = current->flags & PF_MEMALLOC_NOFS;
> > > > > +	if (!nofs)
> > > > > +		current->flags |= PF_MEMALLOC_NOFS;
> > > > > +
> > > > >  	while (!list_empty(&ashmem_lru_list)) {
> > > > >  		struct ashmem_range *range =
> > > > >  			list_first_entry(&ashmem_lru_list, typeof(*range), lru);
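
(For comparison: the idiomatic way to open such a NOFS section is the
memalloc_nofs_save()/memalloc_nofs_restore() pair from linux/sched/mm.h
rather than toggling current->flags by hand. A minimal hypothetical
sketch, with the actual shrinking work elided:)

	#include <linux/sched/mm.h>

	static void ashmem_shrink_nofs_section(void)
	{
		/*
		 * Any allocation inside this section implicitly behaves as
		 * GFP_NOFS, so reclaim it triggers cannot re-enter fs code.
		 */
		unsigned int nofs_flags = memalloc_nofs_save();

		/* ... walk ashmem_lru_list and unpin ranges here ... */

		memalloc_nofs_restore(nofs_flags);
	}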
> > > >
> > > > I do not think this is an appropriate fix. First of all, is this a real
> > > > deadlock or a lockdep false positive? Is it possible that ashmem just
> > >
> > > The warning matters and we can do something to quiesce it.
> >
> > The underlying issue should be fixed rather than _something_ done to
> > silence it.
> >
> > > > needs to properly annotate its shmem inodes? Or is it possible that
> > > > the internal backing shmem file is visible to userspace, so the write
> > > > path would be possible?
> > > >
> > > > If this is a real problem then the proper fix would be to set the
> > > > internal shmem mapping's gfp_mask to drop __GFP_FS.
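
A minimal sketch of that suggestion (hypothetical placement, assuming it
would go right after ashmem creates its backing file with
shmem_file_setup()):

	struct inode *inode = file_inode(vmfile);

	/*
	 * Clear __GFP_FS on the backing mapping so its page cache
	 * allocations never recurse into filesystem reclaim.
	 */
	mapping_set_gfp_mask(inode->i_mapping,
			     mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);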
> > >
> > > Thanks for the tip, see below.
> > >
> > > Can you expand a bit on how it helps direct reclaimers like khugepaged
> > > in the syzbot report wrt deadlock?
> >
> > I do not understand your question.
> >
> > > TBH I have a difficult time following
> > > up after staring at the chart below for quite a while.
> >
> > Yes, lockdep reports are quite hard to follow and they tend to confuse
> > the hell out of me. But this one says that there is a reclaim dependency
> > between the shmem inode lock and the reclaim context.
> >

Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Suren Baghdasaryan
From: Sultan Alsawaf 
Date: Tue, May 7, 2019 at 9:53 AM
To: Suren Baghdasaryan
Cc: Christian Brauner, Greg Kroah-Hartman, open list:ANDROID DRIVERS,
Daniel Colascione, Todd Kjos, Kees Cook, Peter Zijlstra, Martijn
Coenen, LKML, Tim Murray, Michal Hocko, linux-mm, Arve Hjønnevåg, Ingo
Molnar, Steven Rostedt, Oleg Nesterov, Joel Fernandes, Andy
Lutomirski, kernel-team

> On Tue, May 07, 2019 at 09:28:47AM -0700, Suren Baghdasaryan wrote:
> > Hi Sultan,
> > Looks like you are posting this patch for devices that do not use
> > the userspace LMKD solution, either because they use older kernels or
> > because their vendors stick to the in-kernel solution. If so, I see a
> > couple of logistical issues with this patch. I don't see it being
> > adopted in the upstream 5.x kernel since it re-implements a deprecated
> > mechanism, even though vendors still use it. Vendors, on the other hand,
> > will not adopt it until you show evidence that it works way better than
> > what the lowmemorykiller driver does now. You would have to provide
> > measurable data and explain your tests before they would consider
> > spending time on this.
>
> Yes, this is mostly for the devices already produced that are forced to
> suffer with poor memory management. I can't even convince vendors to fix
> kernel memory leaks, so there's no way I'd be able to convince them to try
> this patch, especially since it seems like you're having trouble convincing
> vendors to stop using lowmemorykiller in the first place. And thankfully,
> convincing vendors isn't my job :)
>
> > On the implementation side, I'm not convinced at all that this would
> > work better on all devices and in all circumstances. We had cases when
> > a new mechanism would show very good results until one usecase
> > completely broke it. Bulk killing of processes, which you are doing in
> > your patch, was a very good example of such a decision that we later
> > had to rethink. That's why baking these policies into the kernel is
> > very problematic. Another problem I see with the implementation is
> > that it ties process killing to the reclaim scan depth. It's very
> > similar to how vmpressure works, and vmpressure in my experience is
> > very unpredictable.
>
> Could you elaborate a bit on why bulk killing isn't good?

Yes. Several months ago we got reports that LMKD got very aggressive,
killing more processes in situations which did not require that many
kills to recover from memory pressure. After investigation we tracked
it to the bulk killing, which would kill too many processes during a
memory usage spike. When killing gradually, many of those kills would
have been avoided, so we had to get back to a more balanced approach.
The conceptual issue with bulk killing is that you can't cancel it when
the device recovers quicker than you expected.

> > > > I linked in the previous message, Google made a rather large set of
> > > > modifications to the supposedly-defunct lowmemorykiller.c not one
> > > > month ago. What's going on?
> >
> > If you look into that commit, it adds the ability to report kill stats. If
> > that was a change in how that driver works, it would be rejected.
>
> Fair, though it was quite strange seeing something that was supposedly totally
> abandoned receiving a large chunk of code for reporting stats.
>
> Thanks,
> Sultan


Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

2019-05-07 Thread Suren Baghdasaryan
From: Christian Brauner 
Date: Tue, May 7, 2019 at 3:58 AM
To: Sultan Alsawaf
Cc: Greg Kroah-Hartman, open list:ANDROID DRIVERS, Daniel Colascione,
Todd Kjos, Kees Cook, Peter Zijlstra, Martijn Coenen, LKML, Tim
Murray, Michal Hocko, Suren Baghdasaryan, linux-mm, Arve Hjønnevåg,
Ingo Molnar, Steven Rostedt, Oleg Nesterov, Joel Fernandes, Andy
Lutomirski, kernel-team

> On Tue, May 07, 2019 at 01:12:36AM -0700, Sultan Alsawaf wrote:
> > On Tue, May 07, 2019 at 09:43:34AM +0200, Greg Kroah-Hartman wrote:
> > > Given that any "new" android device that gets shipped "soon" should be
> > > using 4.9.y or newer, is this a real issue?
> >
> > It's certainly a real issue for those who can't buy brand new Android
> > devices without software bugs every six months :)
> >

Hi Sultan,
Looks like you are posting this patch for devices that do not use
the userspace LMKD solution, either because they use older kernels or
because their vendors stick to the in-kernel solution. If so, I see a
couple of logistical issues with this patch. I don't see it being
adopted in the upstream 5.x kernel since it re-implements a deprecated
mechanism, even though vendors still use it. Vendors, on the other hand,
will not adopt it until you show evidence that it works way better than
what the lowmemorykiller driver does now. You would have to provide
measurable data and explain your tests before they would consider
spending time on this.
On the implementation side, I'm not convinced at all that this would
work better on all devices and in all circumstances. We had cases when
a new mechanism would show very good results until one usecase
completely broke it. Bulk killing of processes, which you are doing in
your patch, was a very good example of such a decision that we later
had to rethink. That's why baking these policies into the kernel is
very problematic. Another problem I see with the implementation is
that it ties process killing to the reclaim scan depth. It's very
similar to how vmpressure works, and vmpressure in my experience is
very unpredictable.

> > > And if it is, I'm sure that asking for those patches to be backported to
> > > 4.4.y would be just fine, have you asked?
> > >
> > > Note that the Android Go devices that I know of, running 3.18.y kernels,
> > > do NOT use the in-kernel memory killer, but instead use the userspace
> > > solution today.  So trying to get another in-kernel memory killer
> > > solution added anywhere seems quite odd.
> >
> > It's even more odd that although a userspace solution is touted as the
> > proper way to go on LKML, almost no Android OEMs are using it, and even
> > in that commit
>
> That's probably because without proper kernel changes this is rather
> tricky to use safely (see below).
>
> > I linked in the previous message, Google made a rather large set of
> > modifications to the supposedly-defunct lowmemorykiller.c not one month ago.
> > What's going on?

If you look into that commit, it adds the ability to report kill stats. If
that was a change in how that driver works, it would be rejected.

> >
> > Qualcomm still uses lowmemorykiller.c [1] on the Snapdragon 845. If PSI were
> > backported to 4.4, or even 3.18, would it really be used? I don't really
> > understand the aversion to an in-kernel memory killer on LKML despite the
> > rest of the industry's attraction to it. Perhaps there's some inherently
> > great cost in using the userspace solution that I'm unaware of?

Vendors are cautious about adopting the userspace solution and it is a
process to address all their concerns, but we are getting there.

> > Regardless, even if PSI were backported, a full-fledged LMKD using it has
> > yet to be made, so it wouldn't be of much use now.
>
> This is work that is ongoing and requires kernel changes to make it
> feasible. One of the things that I have been working on for quite a
> while is the whole file descriptor for processes thing that is important
> for LMKD (Even though I never thought about this use-case when I started
> pitching this.). Joel and Daniel have joined in and are working on
> making LMKD possible.
> What I find odd is that every couple of weeks different solutions to the
> low memory problem are pitched. There is simple_lmk, there is LMKD, and
> there was a patchset that wanted to speed up memory reclaim at process
> kill-time by adding a new flag to the new pidfd_send_signal() syscall.
> That all seems - though related - rather uncoordinated.

I'm not sure why pidfd_wait and expedited reclaim are seen as an
uncoordinated effort. All of them are done to improve userspace LMKD.

> Now granted,
> coordinated is usually not how kernel development necessarily works but
> it would probably be good