Re: [PATCH] mtd: use refcount to prevent corruption

2021-02-15 Thread Richard Weinberger
On Sat, Feb 13, 2021 at 6:12 PM Winkler, Tomas  wrote:
> > Well, the trick in glubi (and other MTDs with "hotplug" support) is not to
> > reject removal of the sub-device. ->_put_device() is of return type void.
> > The key is grabbing a reference on the sub-device in ->_get_device() such
> > that the layer below doesn't even try to remove while the MTD is in use.
>
> I understand that. But in that case the kernel is in the mercy of user space 
> to close the handle,
> the whole perception here is that of hotplug that the device is  physically 
> removed it cannot wait
> for the user space to complete. What's the fix is trying to do is to bail out 
> gracefully.
>
> > > so we postpone the mtd unregister to  mtd_info->_put_device()  but it
> > > that state we have nothing to hold on as the device is gone in
> > > remove() User will fail anyway, as the underlying device is not
> > > functional in that state.
> > > Anyway I've tried your suggestion, the kernel is crashing, hope I
> > > haven't done some silly bug.
> >
> > Can you point us to the affected code?
> > This would help a lot to understand the issue better.
> > I'm sure we can find a solution.
>
> Got green light on releasing the patches will send soon.

I'm eager to see them. :-)
As said, I'm sure we can find a nice solution.

-- 
Thanks,
//richard


RE: [PATCH] mtd: use refcount to prevent corruption

2021-02-13 Thread Winkler, Tomas

> 
> Tomas,
> 
> - Ursprüngliche Mail -
> >> As Richard was saying, we are really open to enhance MTD refcounting.
> >>
> >> However, the issue you are facing is, IMHO, not related to MTD but to
> MFD.
> >> There should be a way to avoid MFD to vanish by taking a reference of
> >> it through mtd->_get_device(). I don't think addressing the case
> >> where MFD vanishes while MTD (as a user) is still active is the right
> approach.
> >
> > I think it won't work because MFD sub-driver remove() is called   and it
> must
> > succeed because the main device  is not accessible unlike glueubi
> > which just returns -EBUSY.
> 
> Well, the trick in glubi (and other MTDs with "hotplug" support) is not to
> reject removal of the sub-device. ->_put_device() is of return type void.
> The key is grabbing a reference on the sub-device in ->_get_device() such
> that the layer below doesn't even try to remove while the MTD is in use.

I understand that. But in that case the kernel is in the mercy of user space to 
close the handle,  
the whole perception here is that of hotplug that the device is  physically 
removed it cannot wait 
for the user space to complete. What's the fix is trying to do is to bail out 
gracefully.

> > so we postpone the mtd unregister to  mtd_info->_put_device()  but it
> > that state we have nothing to hold on as the device is gone in
> > remove() User will fail anyway, as the underlying device is not
> > functional in that state.
> > Anyway I've tried your suggestion, the kernel is crashing, hope I
> > haven't done some silly bug.
> 
> Can you point us to the affected code?
> This would help a lot to understand the issue better.
> I'm sure we can find a solution.

Got green light on releasing the patches will send soon.


Thanks
Tomas



Re: [PATCH] mtd: use refcount to prevent corruption

2021-01-28 Thread Richard Weinberger
Tomas,

- Ursprüngliche Mail -
>> As Richard was saying, we are really open to enhance MTD refcounting.
>> 
>> However, the issue you are facing is, IMHO, not related to MTD but to MFD.
>> There should be a way to avoid MFD to vanish by taking a reference of it
>> through mtd->_get_device(). I don't think addressing the case where MFD
>> vanishes while MTD (as a user) is still active is the right approach.
> 
> I think it won't work because MFD sub-driver remove() is called   and it must
> succeed because the main device  is not accessible unlike glueubi which just
> returns -EBUSY.

Well, the trick in glubi (and other MTDs with "hotplug" support) is not to 
reject
removal of the sub-device. ->_put_device() is of return type void.
The key is grabbing a reference on the sub-device in ->_get_device() such that
the layer below doesn't even try to remove while the MTD is in use.

> so we postpone the mtd unregister to  mtd_info->_put_device()  but it that 
> state
> we have nothing to hold
> on as the device is gone in remove()
> User will fail anyway, as the underlying device is not functional in that 
> state.
> Anyway I've tried your suggestion, the kernel is crashing, hope I haven't done
> some silly bug.

Can you point us to the affected code?
This would help a lot to understand the issue better.
I'm sure we can find a solution.

Thanks,
//richard


RE: [PATCH] mtd: use refcount to prevent corruption

2021-01-28 Thread Winkler, Tomas
> Hi Tomas,
> 
> "Winkler, Tomas"  wrote on Thu, 28 Jan 2021
> 08:53:43 +:
> 
> > > Tomas,
> > >
> > > - Ursprüngliche Mail -
> > > >> >> Can you please explain a little more what devices are involved?
> > > >> >> Does it implement _get_device() and _put_device()?
> > > >> > No this is not connected to those handlers of the underlying
> > > >> > device and those won't help.
> > > >> > I have a spi device provided by MFD framework so it can go away
> anytime.
> > > >>
> > > >> Can it go away physically or just in software?
> > > > Software, but since this is mfd it's basically hotplug. The kernel
> > > > is crashing when I simulate hardware failure.
> > > >>
> > > >> Usually the pattern is that you make sure in the device driver
> > > >> that nobody can orphan the MTD while it is in use.
> > > >> e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs
> > > >> a reference on the underlying UBI volume to make sure it cannot
> > > >> go away while the MTD (on top of UBI) is in use.
> > > >
> > > > I can try that if it helps, because we are simulating possible
> > > > lower level crash.
> > > > In an case I believe that the proper refcouting is much more
> > > > robust solution, than the current one.
> > > > I'd appreciate if someone can review the actual implementation.
> > >
> > > This happens right now, I try to understand why exactly the current
> > > way is not good in enough. :-)
> > >
> > > Your approach makes sure that the MTD itself does not go away while
> > > it has users but how does this help in the case where the underlying
> > > MFD just vanishes?
> > > The MTD can be in use and the MFD can go away while e.g. mtd_read()
> > > or such takes place.
> >
> > Read will fail, but kernel won't crash on access to memory that was freed.
> 
> As Richard was saying, we are really open to enhance MTD refcounting.
> 
> However, the issue you are facing is, IMHO, not related to MTD but to MFD.
> There should be a way to avoid MFD to vanish by taking a reference of it
> through mtd->_get_device(). I don't think addressing the case where MFD
> vanishes while MTD (as a user) is still active is the right approach.

I think it won't work because MFD sub-driver remove() is called   and it must 
succeed because the main device  is not accessible unlike glueubi which just 
returns -EBUSY.
so we postpone the mtd unregister to  mtd_info->_put_device()  but it that 
state we have nothing to hold
on as the device is gone in remove()
User will fail anyway, as the underlying device is not functional in that state.
Anyway I've tried your suggestion, the kernel is crashing, hope I haven't done 
some silly bug.

Thanks
Tomas





Re: [PATCH] mtd: use refcount to prevent corruption

2021-01-28 Thread Miquel Raynal
Hi Tomas,

"Winkler, Tomas"  wrote on Thu, 28 Jan 2021
08:53:43 +:

> > Tomas,
> > 
> > - Ursprüngliche Mail -  
> > >> >> Can you please explain a little more what devices are involved?
> > >> >> Does it implement _get_device() and _put_device()?  
> > >> > No this is not connected to those handlers of the underlying device
> > >> > and those won't help.
> > >> > I have a spi device provided by MFD framework so it can go away 
> > >> > anytime.  
> > >>
> > >> Can it go away physically or just in software?  
> > > Software, but since this is mfd it's basically hotplug. The kernel is
> > > crashing when I simulate hardware failure.  
> > >>
> > >> Usually the pattern is that you make sure in the device driver that
> > >> nobody can orphan the MTD while it is in use.
> > >> e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs a
> > >> reference on the underlying UBI volume to make sure it cannot go away
> > >> while the MTD (on top of UBI) is in use.  
> > >
> > > I can try that if it helps, because we are simulating possible lower
> > > level crash.
> > > In an case I believe that the proper refcouting is much more robust
> > > solution, than the current one.
> > > I'd appreciate if someone can review the actual implementation.  
> > 
> > This happens right now, I try to understand why exactly the current way is 
> > not
> > good in enough. :-)
> > 
> > Your approach makes sure that the MTD itself does not go away while it has
> > users but how does this help in the case where the underlying MFD just
> > vanishes?
> > The MTD can be in use and the MFD can go away while e.g. mtd_read() or such
> > takes place.  
> 
> Read will fail, but kernel won't crash on access to memory that was freed.

As Richard was saying, we are really open to enhance MTD refcounting.

However, the issue you are facing is, IMHO, not related to MTD but to
MFD. There should be a way to avoid MFD to vanish by taking a reference
of it through mtd->_get_device(). I don't think addressing the case
where MFD vanishes while MTD (as a user) is still active is the
right approach.

Thanks,
Miquèl


RE: [PATCH] mtd: use refcount to prevent corruption

2021-01-28 Thread Winkler, Tomas


> Tomas,
> 
> - Ursprüngliche Mail -
> >> >> Can you please explain a little more what devices are involved?
> >> >> Does it implement _get_device() and _put_device()?
> >> > No this is not connected to those handlers of the underlying device
> >> > and those won't help.
> >> > I have a spi device provided by MFD framework so it can go away anytime.
> >>
> >> Can it go away physically or just in software?
> > Software, but since this is mfd it's basically hotplug. The kernel is
> > crashing when I simulate hardware failure.
> >>
> >> Usually the pattern is that you make sure in the device driver that
> >> nobody can orphan the MTD while it is in use.
> >> e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs a
> >> reference on the underlying UBI volume to make sure it cannot go away
> >> while the MTD (on top of UBI) is in use.
> >
> > I can try that if it helps, because we are simulating possible lower
> > level crash.
> > In an case I believe that the proper refcouting is much more robust
> > solution, than the current one.
> > I'd appreciate if someone can review the actual implementation.
> 
> This happens right now, I try to understand why exactly the current way is not
> good in enough. :-)
> 
> Your approach makes sure that the MTD itself does not go away while it has
> users but how does this help in the case where the underlying MFD just
> vanishes?
> The MTD can be in use and the MFD can go away while e.g. mtd_read() or such
> takes place.

Read will fail, but kernel won't crash on access to memory that was freed.

Thanks
Tomas




Re: [PATCH] mtd: use refcount to prevent corruption

2021-01-27 Thread Richard Weinberger
Tomas,

- Ursprüngliche Mail -
>> >> Can you please explain a little more what devices are involved?
>> >> Does it implement _get_device() and _put_device()?
>> > No this is not connected to those handlers of the underlying device
>> > and those won't help.
>> > I have a spi device provided by MFD framework so it can go away anytime.
>> 
>> Can it go away physically or just in software?
> Software, but since this is mfd it's basically hotplug. The kernel is crashing
> when I simulate hardware failure.
>> 
>> Usually the pattern is that you make sure in the device driver that nobody 
>> can
>> orphan the MTD while it is in use.
>> e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs a reference 
>> on
>> the underlying UBI volume to make sure it cannot go away while the MTD (on
>> top of UBI) is in use.
> 
> I can try that if it helps, because we are simulating possible lower level
> crash.
> In an case I believe that the proper refcouting is much more robust solution,
> than the current one.
> I'd appreciate if someone can review the actual implementation.

This happens right now, I try to understand why exactly the current way is not
good in enough. :-)

Your approach makes sure that the MTD itself does not go away while it has 
users but
how does this help in the case where the underlying MFD just vanishes?
The MTD can be in use and the MFD can go away while e.g. mtd_read() or such
takes place.

Thanks,
//richard


RE: [PATCH] mtd: use refcount to prevent corruption

2021-01-27 Thread Winkler, Tomas

> 
> - Ursprüngliche Mail -
> >> > When underlying device is removed mtd core will crash in case user
> >> > space is still holding an open handle to a mtd device node.
> >> > A proper refcounting is needed so device is release only when a
> >> > partition has no active users. The current simple counter is not
> >> > sufficient.
> >>
> >> Can you please explain a little more what devices are involved?
> >> Does it implement _get_device() and _put_device()?
> > No this is not connected to those handlers of the underlying device
> > and those won't help.
> > I have a spi device provided by MFD framework so it can go away anytime.
> 
> Can it go away physically or just in software?
Software, but since this is mfd it's basically hotplug. The kernel is crashing 
when I simulate hardware failure.
> 
> Usually the pattern is that you make sure in the device driver that nobody can
> orphan the MTD while it is in use.
> e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs a reference 
> on
> the underlying UBI volume to make sure it cannot go away while the MTD (on
> top of UBI) is in use.

I can try that if it helps, because we are simulating possible lower level 
crash. 
In an case I believe that the proper refcouting is much more robust solution, 
than the current one.
I'd appreciate if someone can review the actual implementation. 

Thanks
Tomas



Re: [PATCH] mtd: use refcount to prevent corruption

2021-01-27 Thread Richard Weinberger
- Ursprüngliche Mail -
>> > When underlying device is removed mtd core will crash in case user
>> > space is still holding an open handle to a mtd device node.
>> > A proper refcounting is needed so device is release only when a
>> > partition has no active users. The current simple counter is not
>> > sufficient.
>> 
>> Can you please explain a little more what devices are involved?
>> Does it implement _get_device() and _put_device()?
> No this is not connected to those handlers of the underlying device and those
> won't help.
> I have a spi device provided by MFD framework so it can go away anytime.

Can it go away physically or just in software?

Usually the pattern is that you make sure in the device driver that nobody
can orphan the MTD while it is in use.
e.g. drivers/mtd/ubi/gluebi.c does so. In _get_device() it grabs a reference on 
the
underlying UBI volume to make sure it cannot go away while the MTD (on top of 
UBI)
is in use.

Thanks,
//richard


RE: [PATCH] mtd: use refcount to prevent corruption

2021-01-27 Thread Winkler, Tomas

> Subject: Re: [PATCH] mtd: use refcount to prevent corruption
> 
> Tomas,
> 
> - Ursprüngliche Mail -
> > Von: "Tomas Winkler" 
> > An: "Miquel Raynal" , "richard"
> > , "Vignesh Raghavendra" , "linux-mtd"
> > , "linux-kernel"
> > 
> > CC: "Tomas Winkler" 
> > Gesendet: Mittwoch, 27. Januar 2021 21:03:19
> > Betreff: [PATCH] mtd: use refcount to prevent corruption
> 
> > When underlying device is removed mtd core will crash in case user
> > space is still holding an open handle to a mtd device node.
> > A proper refcounting is needed so device is release only when a
> > partition has no active users. The current simple counter is not
> > sufficient.
> 
> Can you please explain a little more what devices are involved?
> Does it implement _get_device() and _put_device()?
No this is not connected to those handlers of the underlying device and those 
won't help. 
I have a spi device provided by MFD framework so it can go away anytime. 
My solution tries to  replace the current simple partition reference counting.  
In previous solution it will return -EBUSY on partition that is held but will 
remove the actual parent device, leading to crash.
Also w/o reference counting there is no process to actually remove the 
partition that was previously busy.

Thanks
Tomas



> 
> Thanks,
> //richard


Re: [PATCH] mtd: use refcount to prevent corruption

2021-01-27 Thread Richard Weinberger
Tomas,

- Ursprüngliche Mail -
> Von: "Tomas Winkler" 
> An: "Miquel Raynal" , "richard" , 
> "Vignesh Raghavendra" ,
> "linux-mtd" , "linux-kernel" 
> 
> CC: "Tomas Winkler" 
> Gesendet: Mittwoch, 27. Januar 2021 21:03:19
> Betreff: [PATCH] mtd: use refcount to prevent corruption

> When underlying device is removed mtd core will crash
> in case user space is still holding an open handle to a mtd device node.
> A proper refcounting is needed so device is release only when a
> partition has no active users. The current simple counter is not
> sufficient.

Can you please explain a little more what devices are involved?
Does it implement _get_device() and _put_device()?

Thanks,
//richard