Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2017-05-30 Thread Shuah Khan
Hi Sailus/Mauro,

On 01/26/2017 02:10 AM, Mauro Carvalho Chehab wrote:
> Em Wed, 25 Jan 2017 13:02:31 +0200
> Sakari Ailus  escreveu:
> 
>> Hi Mauro,
>>
>> On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
>>> Hi Sakari,
>>>
>>> Just returned this week from vacations. I'm reading my long e-mail backlog,
>>> starting from my main inbox...
>>>
>>> Em Mon, 2 Jan 2017 09:53:49 +0200
>>> Sakari Ailus  escreveu:
>>>   
 Hi Mauro,

 On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:  
> Em Fri, 16 Dec 2016 17:07:23 +0200
> Sakari Ailus  escreveu:
> 
>> Hi Hans,
> 
>>> chrdev_open in fs/char_dev.c increases the refcount on open() and 
>>> decreases it
>>> on release(). Thus ensuring that the cdev can never be removed while in 
>>> an
>>> ioctl.  
>>
>> It does, but it does not affect memory which is allocated separately of 
>> that.
>>
>> See this:
>>
>> 
>> 
>
> That sounds promising. If this bug issues other drivers than OMAP3,
> then indeed the core has a bug.
>
> I'll see if I can reproduce it here with some USB drivers later this 
> week.

 It's not a driver problem so yes, it is reproducible on other hardware.  
>>>
>>> Didn't have time to test it before entering into vacations.
>>>
>>> I guess I won't have any time this week to test those issues on
>>> my hardware, as I suspect that my patch queue is full. Also, we're
>>> approaching the next merge window. So, unfortunately, I won't have
>>> much time those days to do much testing. 
>>>
>>> Btw, Hans commented that you were planning to working on it this month.
>>>
>>> Do you have some news with regards to the media controller bind/unbind
>>> fixes?  
>>
>> I have a bunch of meeting notes to send from the Oslo meeting with Hans and
>> Laurent; I should have that ready by the end of the week. The RFC patchset
>> certainly needs changes based on that.
> 
> OK. I'll wait for your notes and the new patchset.

What is the status of this patch series? Did I miss RFC v4?

As you might remember, my resource sharing work for snd-usb-audio
and the shared media object API which is necessary for media
driver and snd-usb-audio to share the media device are pending
waiting for this RFC series to go from RFC to a version that can
be merged.

I would like to get the snd-usb-audio work done soon and target it
for an upcoming release in the near future!

Could you please send an update on the status on when the next RFC
version might be sent out.

thanks,
-- Shuah


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2017-01-26 Thread Mauro Carvalho Chehab
Em Wed, 25 Jan 2017 13:02:31 +0200
Sakari Ailus  escreveu:

> Hi Mauro,
> 
> On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
> > Hi Sakari,
> > 
> > Just returned this week from vacations. I'm reading my long e-mail backlog,
> > starting from my main inbox...
> > 
> > Em Mon, 2 Jan 2017 09:53:49 +0200
> > Sakari Ailus  escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:  
> > > > Em Fri, 16 Dec 2016 17:07:23 +0200
> > > > Sakari Ailus  escreveu:
> > > > 
> > > > > Hi Hans,
> > > > 
> > > > > > chrdev_open in fs/char_dev.c increases the refcount on open() and 
> > > > > > decreases it
> > > > > > on release(). Thus ensuring that the cdev can never be removed 
> > > > > > while in an
> > > > > > ioctl.  
> > > > > 
> > > > > It does, but it does not affect memory which is allocated separately 
> > > > > of that.
> > > > > 
> > > > > See this:
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > That sounds promising. If this bug issues other drivers than OMAP3,
> > > > then indeed the core has a bug.
> > > > 
> > > > I'll see if I can reproduce it here with some USB drivers later this 
> > > > week.
> > > 
> > > It's not a driver problem so yes, it is reproducible on other hardware.  
> > 
> > Didn't have time to test it before entering into vacations.
> > 
> > I guess I won't have any time this week to test those issues on
> > my hardware, as I suspect that my patch queue is full. Also, we're
> > approaching the next merge window. So, unfortunately, I won't have
> > much time those days to do much testing. 
> > 
> > Btw, Hans commented that you were planning to working on it this month.
> > 
> > Do you have some news with regards to the media controller bind/unbind
> > fixes?  
> 
> I have a bunch of meeting notes to send from the Oslo meeting with Hans and
> Laurent; I should have that ready by the end of the week. The RFC patchset
> certainly needs changes based on that.

OK. I'll wait for your notes and the new patchset.

> > > > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > > > mind if you do that, provided that your patch series can be applied in
> > > > a way that it won't cause regressions for real hot-pluggable hardware.  
> > > >   
> > > 
> > > This is not really about the OMAP3 ISP driver hotplug support; it is 
> > > indeed
> > > about the framework's ability to support hotpluggable hardware. The 
> > > current
> > > painpoint is removing hardware; the current frameworks aren't quite up to
> > > that at the moment.  
> > 
> > The point here is that, while it would be fun to allow unbinding OMAP3
> > V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
> > hand, on USB drivers, where unbind is a requirement, the current status
> > of the tree is that hotplug works. I did some massive parallel bind/unbind
> > loops here to double check, when we added such fixup patches. Granted, I
> > won't doubt that there are still some rare race conditions that I was
> > unable to reproduce on the time I tested. I also didn't try to hack the
> > Kernel to introduce extra delays to make those race conditions more
> > likely to happen.
> > 
> > Anyway, my main concern with this patch is that it breaks hotplug on devices
> > that really need it, while it fix support only for OMAP3 (with doesn't 
> > need).  
> 
> I don't disagree with you. Obviously the intent is not to break
> hot-pluggable hardware, albeit the changes needed to avoid that haven't been
> implemented yet. (One of the reasons it's been RFC all the time.)
> 
> > 
> > Also, it starts with a series of patches that will cause regressions.
> > 
> > I won't matter changing the solution to some other approach that would
> > work, provided that the patches are added on an incremented way, and
> > won't introduce regressions to USB drivers.  
> 
> It may be possible to avoid increasing the time window during which bad
> things could happen before fully removing them.

The fix should be to protecting those windows by either a kref, lock or
a lockless (RCU) approach.

> However the patchset is a
> lot easier to work with without bundling the reverts into other (and likely
> multiple) patches as the reverted patches took quite a different direction
> than is followed in this patchset.

Doing the reverts before doing the fixes do break things. What you're
reverting is basically the logic that unbinds the struct media_devnode
from struct media_device. This is independent from whatever changes
you would be doing at struct media_device. So, you could do all changes
there, apply such changes on OMAP3 and on the USB drivers and then
rebind struct media_devnode at struct media_device[1].

[1] assuming that everyone agrees that rebinding it is for the best.
I still think 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2017-01-25 Thread Sakari Ailus
Hi Mauro,

On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
> Hi Sakari,
> 
> Just returned this week from vacations. I'm reading my long e-mail backlog,
> starting from my main inbox...
> 
> Em Mon, 2 Jan 2017 09:53:49 +0200
> Sakari Ailus  escreveu:
> 
> > Hi Mauro,
> > 
> > On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> > > Em Fri, 16 Dec 2016 17:07:23 +0200
> > > Sakari Ailus  escreveu:
> > >   
> > > > Hi Hans,  
> > >   
> > > > > chrdev_open in fs/char_dev.c increases the refcount on open() and 
> > > > > decreases it
> > > > > on release(). Thus ensuring that the cdev can never be removed while 
> > > > > in an
> > > > > ioctl.
> > > > 
> > > > It does, but it does not affect memory which is allocated separately of 
> > > > that.
> > > > 
> > > > See this:
> > > > 
> > > > 
> > > >   
> > > 
> > > That sounds promising. If this bug issues other drivers than OMAP3,
> > > then indeed the core has a bug.
> > > 
> > > I'll see if I can reproduce it here with some USB drivers later this 
> > > week.  
> > 
> > It's not a driver problem so yes, it is reproducible on other hardware.
> 
> Didn't have time to test it before entering into vacations.
> 
> I guess I won't have any time this week to test those issues on
> my hardware, as I suspect that my patch queue is full. Also, we're
> approaching the next merge window. So, unfortunately, I won't have
> much time those days to do much testing. 
> 
> Btw, Hans commented that you were planning to working on it this month.
> 
> Do you have some news with regards to the media controller bind/unbind
> fixes?

I have a bunch of meeting notes to send from the Oslo meeting with Hans and
Laurent; I should have that ready by the end of the week. The RFC patchset
certainly needs changes based on that.

> 
> > > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > > mind if you do that, provided that your patch series can be applied in
> > > a way that it won't cause regressions for real hot-pluggable hardware.  
> > 
> > This is not really about the OMAP3 ISP driver hotplug support; it is indeed
> > about the framework's ability to support hotpluggable hardware. The current
> > painpoint is removing hardware; the current frameworks aren't quite up to
> > that at the moment.
> 
> The point here is that, while it would be fun to allow unbinding OMAP3
> V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
> hand, on USB drivers, where unbind is a requirement, the current status
> of the tree is that hotplug works. I did some massive parallel bind/unbind
> loops here to double check, when we added such fixup patches. Granted, I
> won't doubt that there are still some rare race conditions that I was
> unable to reproduce on the time I tested. I also didn't try to hack the
> Kernel to introduce extra delays to make those race conditions more
> likely to happen.
> 
> Anyway, my main concern with this patch is that it breaks hotplug on devices
> that really need it, while it fix support only for OMAP3 (with doesn't need).

I don't disagree with you. Obviously the intent is not to break
hot-pluggable hardware, albeit the changes needed to avoid that haven't been
implemented yet. (One of the reasons it's been RFC all the time.)

> 
> Also, it starts with a series of patches that will cause regressions.
> 
> I won't matter changing the solution to some other approach that would
> work, provided that the patches are added on an incremented way, and
> won't introduce regressions to USB drivers.

It may be possible to avoid increasing the time window during which bad
things could happen before fully removing them. However the patchset is a
lot easier to work with without bundling the reverts into other (and likely
multiple) patches as the reverted patches took quite a different direction
than is followed in this patchset.

Let's discuss this later, at the time when we have a patchset that produces
a sound code base (on the top of that patchset) that is understood to be
free of object lifetime issues as long as hot-pluggable hardware goes.

> 
> > > On that matter, just like we use vivid as a testbench and as an
> > > example for other drivers, it would be great if we could merge
> > > the vimc driver. What's the status of Helen's patchset?  
> > 
> > That's a good point. I wasn't reviewing that driver back then when the
> > patches were posted, but should it go in, it should make a good example for
> > writing other drivers as well.
> > 
> 
> I saw Laurent's comments about Helen's last patch series. From his
> comments:
> 
>   "I've reviewed the whole patch but haven't had time to test it. I've 
> also 
>skipped the items marked as TODO or FIXME as they're obviously not 
> ready yet 
>:-) Overall this looks good to me, all the issues are minor."
> 
> 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2017-01-24 Thread Mauro Carvalho Chehab
Hi Sakari,

Just returned this week from vacations. I'm reading my long e-mail backlog,
starting from my main inbox...

Em Mon, 2 Jan 2017 09:53:49 +0200
Sakari Ailus  escreveu:

> Hi Mauro,
> 
> On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> > Em Fri, 16 Dec 2016 17:07:23 +0200
> > Sakari Ailus  escreveu:
> >   
> > > Hi Hans,  
> >   
> > > > chrdev_open in fs/char_dev.c increases the refcount on open() and 
> > > > decreases it
> > > > on release(). Thus ensuring that the cdev can never be removed while in 
> > > > an
> > > > ioctl.
> > > 
> > > It does, but it does not affect memory which is allocated separately of 
> > > that.
> > > 
> > > See this:
> > > 
> > > 
> > >   
> > 
> > That sounds promising. If this bug issues other drivers than OMAP3,
> > then indeed the core has a bug.
> > 
> > I'll see if I can reproduce it here with some USB drivers later this week.  
> 
> It's not a driver problem so yes, it is reproducible on other hardware.

Didn't have time to test it before entering into vacations.

I guess I won't have any time this week to test those issues on
my hardware, as I suspect that my patch queue is full. Also, we're
approaching the next merge window. So, unfortunately, I won't have
much time those days to do much testing. 

Btw, Hans commented that you were planning to working on it this month.

Do you have some news with regards to the media controller bind/unbind
fixes?

> > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > mind if you do that, provided that your patch series can be applied in
> > a way that it won't cause regressions for real hot-pluggable hardware.  
> 
> This is not really about the OMAP3 ISP driver hotplug support; it is indeed
> about the framework's ability to support hotpluggable hardware. The current
> painpoint is removing hardware; the current frameworks aren't quite up to
> that at the moment.

The point here is that, while it would be fun to allow unbinding OMAP3
V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
hand, on USB drivers, where unbind is a requirement, the current status
of the tree is that hotplug works. I did some massive parallel bind/unbind
loops here to double check, when we added such fixup patches. Granted, I
won't doubt that there are still some rare race conditions that I was
unable to reproduce on the time I tested. I also didn't try to hack the
Kernel to introduce extra delays to make those race conditions more
likely to happen.

Anyway, my main concern with this patch is that it breaks hotplug on devices
that really need it, while it fix support only for OMAP3 (with doesn't need).

Also, it starts with a series of patches that will cause regressions.

I won't matter changing the solution to some other approach that would
work, provided that the patches are added on an incremented way, and
won't introduce regressions to USB drivers.

> > On that matter, just like we use vivid as a testbench and as an
> > example for other drivers, it would be great if we could merge
> > the vimc driver. What's the status of Helen's patchset?  
> 
> That's a good point. I wasn't reviewing that driver back then when the
> patches were posted, but should it go in, it should make a good example for
> writing other drivers as well.
> 

I saw Laurent's comments about Helen's last patch series. From his
comments:

"I've reviewed the whole patch but haven't had time to test it. I've 
also 
 skipped the items marked as TODO or FIXME as they're obviously not 
ready yet 
 :-) Overall this looks good to me, all the issues are minor."

Helen promised a new version fixing those minor issues. Perhaps we should merge
her next series upstream with such issues addressed and see how it behaves.

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2017-01-01 Thread Sakari Ailus
Hi Mauro,

On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> Em Fri, 16 Dec 2016 17:07:23 +0200
> Sakari Ailus  escreveu:
> 
> > Hi Hans,
> 
> > > chrdev_open in fs/char_dev.c increases the refcount on open() and 
> > > decreases it
> > > on release(). Thus ensuring that the cdev can never be removed while in an
> > > ioctl.  
> > 
> > It does, but it does not affect memory which is allocated separately of 
> > that.
> > 
> > See this:
> > 
> > 
> 
> That sounds promising. If this bug issues other drivers than OMAP3,
> then indeed the core has a bug.
> 
> I'll see if I can reproduce it here with some USB drivers later this week.

It's not a driver problem so yes, it is reproducible on other hardware.

> 
> > > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > > probably wouldn't want the omap3 code that makes this possible in the 
> > > kernel.
> > > It's just additional code for no purpose.  
> > 
> > The same problems exist on other devices, whether platform, pci or USB, as
> > the problems are in the core frameworks rather than (only) in the drivers.
> > 
> > On platform devices, this happens also when removing the module.
> > 
> > I've used omap3isp as an example since it demonstrates well the problems and
> > a lot of people have the hardware as well. Also, Mauro has requested all
> > drivers to be converted to the new API. I'm fine doing that for the actually
> > hot-pluggable hardware.
> 
> While IMHO it is overkill trying to support hot plug on omap3, I won't
> mind if you do that, provided that your patch series can be applied in
> a way that it won't cause regressions for real hot-pluggable hardware.

This is not really about the OMAP3 ISP driver hotplug support; it is indeed
about the framework's ability to support hotpluggable hardware. The current
painpoint is removing hardware; the current frameworks aren't quite up to
that at the moment.

I haven't checked how many plain V4L2 drivers do this correctly but the
problem domain becomes a lot more complex when you add V4L2 sub-device and
Media controller nodes. Having a driver that does implement this correctly
is important for writing new drivers, hence the changes to the OMAP3 ISP
driver.

> 
> I still think that keeping cdev embedded in a structure that it is
> created dynamically when registering the device node, instead of
> embedding it at struct media_device. Yet, if you prove that this does
> more harm than good, I'm ok on re-embeeding it. However, on such case,
> you need to put the patches re-embeeding it at the end of the patch
> series (and not at the beginning), as otherwise you'll be causing
> regressions.
> 
> > One additional reason is that as the omap3isp driver has been used as an
> > example to write a number of other drivers, people do see what's the right
> > way to do these things, instead of copying code from a driver doing it
> > wrong.
> 
> Interesting argument. Yet, IMHO, the best would be to do the proper
> review on the first platform driver that would support hot-plug,
> and use this as an example. It is a shame that project Aurora was
> discontinued, as media drivers for such kind of hardware would be an
> interesting example.
> 
> On that matter, just like we use vivid as a testbench and as an
> example for other drivers, it would be great if we could merge
> the vimc driver. What's the status of Helen's patchset?

That's a good point. I wasn't reviewing that driver back then when the
patches were posted, but should it go in, it should make a good example for
writing other drivers as well.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ai...@iki.fi XMPP: sai...@retiisi.org.uk
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-23 Thread Laurent Pinchart
Hi Shuah,

On Thursday 15 Dec 2016 09:06:41 Shuah Khan wrote:
> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
> > On 15/12/16 15:45, Shuah Khan wrote:
> >> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> >>> On 15/12/16 13:56, Laurent Pinchart wrote:
>  On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> > On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab 
wrote:

[snip]

> >>> In my view the main problem is that the media core is bound to a struct
> >>> device set by the driver that creates the MC. But since the MC gives an
> >>> overview of lots of other (sub)devices the refcount of the media device
> >>> should be increased for any (sub)device that adds itself to the MC and
> >>> decreased for any (sub)device that is removed. Only when the very last
> >>> user goes away can the MC memory be released.
> >> 
> >> Correct. Media Device Allocator API work I did allows creating media
> >> device on parent USB device to allow media sound driver share the media
> >> device. It does ref-counting on media device and media device is
> >> unregistered only when the last driver unregisters it.
> >> 
> >> There is another aspect to explore regarding media device and the graph.
> >> 
> >> Should all the entities stick around until all references to media
> >> device are gone? If an application has /dev/media open, does that
> >> mean all entities should not be free'd until this app. exits? What
> >> should happen if an app. is streaming? Should the graph stay intact
> >> until the app. exits?
> > 
> > Yes, everything must stay around until the last user has disappeared.
> > 
> > In general unplugs can happen at any time. So applications can be in the
> > middle of an ioctl, and removing memory during that time is just
> > impossible.
> > 
> > On unplug you:
> > 
> > 1) stop any HW DMA (highly device dependent)
> > 2) wake up any filehandles that wait for an event
> > 3) unregister any device nodes
> > 
> > Then just sit back and wait for refcounts to go down as filehandles are
> > closed by the application.
> > 
> > Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> > rejecting any ioctls/mmap/etc. once the device node has been
> > unregistered. The only valid file operation is release().
> > 
> >>If yes, this would pose problems when we have multiple drivers bound
> >>to the media device. When audio driver goes away for example, it
> >>should
> >>be allowed to delete its entities.
> > 
> > Only if you can safely remove it from the topology data structures while
> > being 100% certain that nobody can ever access it. I'm not sure if that is
> > the case. Actually, looking at e.g. adv7604.c it does
> > media_entity_cleanup(>entity); in remove() which is an empty
> > function, so there doesn't appear any attempt to safely clean up an
> > entity (i.e. make sure no running media ioctl can access it or call ops).
> 
> Right. media_entity_cleanup() nothing at the moment. Also if it gets called
> after media_device_unregister_entity(), it could pose problems. I wonder if
> we have drivers that are calling media_entity_cleanup() after unregistering
> the entity?
> 
> > This probably will need to be serialized with the graph_mutex lock.
> > 
> >> The approach current mc-core takes is that the media_device and
> >> media_devnode stick around, but entities can be added and removed during
> >> media_device lifetime.
> > 
> > Seems reasonable. But the removal needs to be done carefully, and that
> > doesn't seem to be the case now (unless adv7604.c is just buggy).
> 
> Correct. It is possible media_device is embedded in this driver.

I assume you mean the private data structure instantiated by the adv7604 
driver. That can't be the case, as adv7604 is a subdev.

> When driver that embeds is unbound, media_device goes away. I needed to make
> the media device refcounted and sharable for audio work and that is what the
> Media Device Allocator API does.
> 
> Maybe we have more cases than this audio case that requires media_device
> refcounted. If we have to keep entities that are in use until all the
> references go away, we have to ref-count them as well.

I think we're converging towards refcounting media_device to manage its 
lifetime, so there's more cases, yes. That's why I propose first making 
media_device refcounted, and then adding the allocator API on top of that 
given that the allocator API requires refcounting. That seems to me to be the 
cleanest approach.

> >> If an app. is still running when media_device is unregistered,
> >> media_device isn't released until the last reference goes away and ioctls
> >> can check if media_device is registered or not.
> >> 
> >> We have to decide on the larger lifetime question surrounding
> >> media_device and graph as well.
> > 
> > I don't think there is any choice but to keep 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-23 Thread Laurent Pinchart
Hi Mauro,

On Thursday 15 Dec 2016 15:08:26 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 16:26:19 +0100 Hans Verkuil escreveu:
> >> Should all the entities stick around until all references to media
> >> device are gone? If an application has /dev/media open, does that
> >> mean all entities should not be free'd until this app. exits? What
> >> should happen if an app. is streaming? Should the graph stay intact
> >> until the app. exits?
> > 
> > Yes, everything must stay around until the last user has disappeared.
> > 
> > In general unplugs can happen at any time. So applications can be in the
> > middle of an ioctl, and removing memory during that time is just
> > impossible.
> > 
> > On unplug you:
> > 
> > 1) stop any HW DMA (highly device dependent)
> > 2) wake up any filehandles that wait for an event
> > 3) unregister any device nodes
> > 
> > Then just sit back and wait for refcounts to go down as filehandles are
> > closed by the application.
> > 
> > Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> > rejecting any ioctls/mmap/etc. once the device node has been
> > unregistered. The only valid file operation is release().
> 
> Agreed. The problem on OMAP3 is that it doesn't stop HW DMA when
> struct media_devnode is released. It tries to do it later, when the
> V4L2 core is unbind, by trying to dig into the media controller
> struct that the driver removed before.

Note that stopping the hardware doesn't mean updating the pipeline state to 
mark it as stopped. Unlike stopping the hardware that is mandatory at unbind 
time as hardware access is not allowed after the unbind handler returns, how 
we handle the software state is entirely up to us. I'm not saying it can't be 
done at unbind time, but I'm not sure yet whether it should either.

> That's said, for OMAP3 and all other drivers that don't support hot unplug,
> I would just use suppress_bind_attrs, as I fail to see any need to allow
> unbinding them.

That's akin to breaking the thermometer to cure the patient from fever. I'm 
not completely opposed to making drivers non-unbindable in a case-by-case 
basis (and based on the author's will), but if a driver author wants to make a 
driver unbindable the core should allow that to be implemented.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-23 Thread Laurent Pinchart
Hi Hans,

On Thursday 15 Dec 2016 16:26:19 Hans Verkuil wrote:
> On 15/12/16 15:45, Shuah Khan wrote:
> > On 12/15/2016 07:03 AM, Hans Verkuil wrote:

[snip]

> >> In my view the main problem is that the media core is bound to a struct
> >> device set by the driver that creates the MC. But since the MC gives an
> >> overview of lots of other (sub)devices the refcount of the media device
> >> should be increased for any (sub)device that adds itself to the MC and
> >> decreased for any (sub)device that is removed. Only when the very last
> >> user goes away can the MC memory be released.
> > 
> > Correct. Media Device Allocator API work I did allows creating media
> > device on parent USB device to allow media sound driver share the media
> > device. It does ref-counting on media device and media device is
> > unregistered only when the last driver unregisters it.
> > 
> > There is another aspect to explore regarding media device and the graph.
> > 
> > Should all the entities stick around until all references to media
> > device are gone? If an application has /dev/media open, does that
> > mean all entities should not be free'd until this app. exits? What
> > should happen if an app. is streaming? Should the graph stay intact
> > until the app. exits?
> 
> Yes, everything must stay around until the last user has disappeared.
> 
> In general unplugs can happen at any time. So applications can be in the
> middle of an ioctl, and removing memory during that time is just
> impossible.
> 
> On unplug you:
> 
> 1) stop any HW DMA (highly device dependent)
> 2) wake up any filehandles that wait for an event
> 3) unregister any device nodes

Shouldn't 2 and 3 be switched ? We also need to return all buffers to vb2 
without any race condition, so I would say the sequence of events should be as 
follows.

1. Mark the device as being disconnected. This condition should be tested by 
the .buf_queue() handler that will then return the buffer immediately to vb2 
with the state set to VB2_BUF_STATE_ERROR.
2. Stop hardware operation (DMA, interrupts, ...).
3. Unregister the devnodes. This shall result in all new ioctl calls being 
blocked by the core.
4. Wake up all waiters.

There's still a race between 2 and 3, as new hardware operations could be 
started. We need to decide how to handle that.

The uvcvideo driver handles this in a reasonably clean way (at least for the 
video devnodes, there are races related to the media controller devnode), but 
the driver-side implementation is a bit complex (look at the comment in 
uvc_queue_cancel(), and how uvc_unregister_video() has to increase the 
refcount temporarily for instance) even if the fact that the USB core stops 
hardware access simplifies step 2 above. It would be nice if we could move 
some of the code to the core.

> Then just sit back and wait for refcounts to go down as filehandles are
> closed by the application.

Sit back doesn't mean that the unbind handler (.remove for platform devices, 
.disconnect for USB devices, ...) blocks, right ? It should return after 
completing the steps above, 

> Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> rejecting any ioctls/mmap/etc. once the device node has been unregistered.
> The only valid file operation is release().

That's a very good start. The hard part is then the handling of ioctls in 
progress.

> >If yes, this would pose problems when we have multiple drivers bound
> >to the media device. When audio driver goes away for example, it should
> >be allowed to delete its entities.
> 
> Only if you can safely remove it from the topology data structures while
> being 100% certain that nobody can ever access it. I'm not sure if that is
> the case.

In some cases it might be, but I don't think we can build anything on top of 
that assumption.

> Actually, looking at e.g. adv7604.c it does
> media_entity_cleanup(>entity); in remove() which is an empty function,
> so there doesn't appear any attempt to safely clean up an entity (i.e. make
> sure no running media ioctl can access it or call ops).
> 
> This probably will need to be serialized with the graph_mutex lock.

In the worst case, but we should try to minimize lock contention with proper 
refcounting.

> > The approach current mc-core takes is that the media_device and
> > media_devnode stick around, but entities can be added and removed during
> > media_device lifetime.
> 
> Seems reasonable. But the removal needs to be done carefully, and that
> doesn't seem to be the case now (unless adv7604.c is just buggy).
> 
> > If an app. is still running when media_device is unregistered,
> > media_device isn't released until the last reference goes away and ioctls
> > can check if media_device is registered or not.
> > 
> > We have to decide on the larger lifetime question surrounding media_device
> > and graph as well.
> 
> I don't think there is any choice but to keep it all alive until the last
> reference goes away.

I agree with 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-23 Thread Laurent Pinchart
Hi Shuah,

On Thursday 15 Dec 2016 07:45:29 Shuah Khan wrote:
> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>  Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> > On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

[snip]

> >> There's plenty of way to try and work around the problem in drivers, some
> >> more racy than others, but if we require changes to all platform drivers
> >> to fix this we need to ensure that we get it right, not as half-baked
> >> hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> 
> This was my first reaction when I saw this RFC series. None of the platform
> drivers are designed to be unbound. Making core changes based on such as
> driver would make the core very complex.
>
> We can't even reproduce the problem on other drivers.
> 
> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the
> > kernel. It's just additional code for no purpose.
> 
> I agree with Hans. Why are we using the most complex case as a reference
> driver


The omap3isp driver is a very good test case, as it registers a media 
controller device node, multiple video device nodes and multiple subdev device 
nodes. This is not an exceptional situation (and is actually simpler than a 
driver that would also register an audio device, as we would span multiple 
subsystems there). If we can't design a clean lifetime management solution for 
MC and V4L2 objects that fixes the unbind problem with omap3isp then we could 
as well give up on kernel development completely.

> and basing that driver to make core changes which will force changes to all
> the driver that use mc-core?

Making changes to all drivers isn't a goal. My goal is to fix the objects 
lifetime management problem cleanly. If there's a way to do so that minimizes 
changes to drivers, great. Otherwise, we'll have to bite the bullet. The MC 
and V4L2 core code is the foundation on top of which everything is built, it 
has to be fail-proof and clean.

>  That could be done by overwriting the dev.release() callback at
>  omap3 driver, as I discussed on my past e-mails, and flagging the
>  driver that it should not accept streamon anymore, as the hardware
>  is being disconnecting.
> >>> 
> >>> A mutex will be needed to serialise the this with starting streaming.
> >>> 
>  Btw, that explains a lot why Shuah can't reproduce the stuff you're
>  complaining on her USB hardware.
>  
>  The USB subsystem has a a .disconnect() callback that notifies
>  the drivers that a device was unbound (likely physically removed).
>  The way USB media drivers handle it is by returning -ENODEV to any
>  V4L2 call that would try to touch at the hardware after unbound.
> > 
> > In my view the main problem is that the media core is bound to a struct
> > device set by the driver that creates the MC. But since the MC gives an
> > overview of lots of other (sub)devices the refcount of the media device
> > should be increased for any (sub)device that adds itself to the MC and
> > decreased for any (sub)device that is removed. Only when the very last
> > user goes away can the MC memory be released.
> 
> Correct. Media Device Allocator API work I did allows creating media device
> on parent USB device to allow media sound driver share the media device. It
> does ref-counting on media device and media device is unregistered only when
> the last driver unregisters it.

It doesn't address references taken to the media_device from v4l2_subdev and 
video_device though. I believe we need one reference counting implementation 
that can cover both references from other objects and media_device sharing.

> There is another aspect to explore regarding media device and the graph.
> 
> Should all the entities stick around until all references to media
> device are gone? If an application has /dev/media open, does that
> mean all entities should not be free'd until this app. exits?

Probably not, as we need to target dynamic updates of the media graph.

> What should happen if an app. is streaming? Should the graph stay intact
> until the app. exits?

I'll need to give this more thought, but I'd say yes.

>If yes, this would pose problems when we have multiple drivers bound
>to the media device. When audio driver goes away for example, it should
>be allowed to delete its entities.

There's two parts to "driver goes away". When the 

Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-22 Thread Mauro Carvalho Chehab
Em Thu, 22 Dec 2016 19:47:15 +0200
Laurent Pinchart  escreveu:

> Hi Mauro,
> 
> On Tuesday 20 Dec 2016 23:31:42 Mauro Carvalho Chehab wrote:
> > Em Mon, 19 Dec 2016 07:28:29 -0200 Mauro Carvalho Chehab escreveu:  
> > > Em Fri, 16 Dec 2016 15:45:10 +0100 Hans Verkuil escreveu:  
> > >> We really need a whiteboard for this :-(  
> > > 
> > > Well, we could schedule a media summit together with ELC NA.
> > > 
> > > ELC will be in Feb, 21-23 in Portland.  
> > 
> > Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> > people can make it.  
> 
> I'll be in Portland from the 18th to the 25th. I'm not sure yet whether I'll 
> be free on the 20th though as I also have other meetings to schedule. I'll 
> try 
> to find out soon.

Unfortunately, Feb, 20th seems to be the only day outside ELC
that LF can provide us a room. We could try to do something
between Feb, 21-23, but I guess this would be harder for us, as
people may have different arrangements during ELC days.

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-22 Thread Laurent Pinchart
Hi Mauro,

On Tuesday 20 Dec 2016 23:31:42 Mauro Carvalho Chehab wrote:
> Em Mon, 19 Dec 2016 07:28:29 -0200 Mauro Carvalho Chehab escreveu:
> > Em Fri, 16 Dec 2016 15:45:10 +0100 Hans Verkuil escreveu:
> >> We really need a whiteboard for this :-(
> > 
> > Well, we could schedule a media summit together with ELC NA.
> > 
> > ELC will be in Feb, 21-23 in Portland.
> 
> Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> people can make it.

I'll be in Portland from the 18th to the 25th. I'm not sure yet whether I'll 
be free on the 20th though as I also have other meetings to schedule. I'll try 
to find out soon.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-21 Thread Shuah Khan
On Tue, Dec 20, 2016 at 6:31 PM, Mauro Carvalho Chehab
 wrote:
> Em Mon, 19 Dec 2016 07:28:29 -0200
> Mauro Carvalho Chehab  escreveu:
>
>> Em Fri, 16 Dec 2016 15:45:10 +0100
>> Hans Verkuil  escreveu:
>>
>> > We really need a whiteboard for this :-(
>>
>> Well, we could schedule a media summit together with ELC NA.
>>
>> ELC will be in Feb, 21-23 in Portland.
>
> Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> people can make it.
>

Hi Mauro,

I can make it.
>
> Thanks,
> Mauro

thanks,
-- Shuah
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-20 Thread Mauro Carvalho Chehab
Em Mon, 19 Dec 2016 07:28:29 -0200
Mauro Carvalho Chehab  escreveu:

> Em Fri, 16 Dec 2016 15:45:10 +0100
> Hans Verkuil  escreveu:
> 
> > We really need a whiteboard for this :-(
> 
> Well, we could schedule a media summit together with ELC NA.
> 
> ELC will be in Feb, 21-23 in Portland.

Btw, I'm pre reserving a room for us in Feb, 20, assuming that
people can make it.


Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-19 Thread Mauro Carvalho Chehab
Em Fri, 16 Dec 2016 17:07:23 +0200
Sakari Ailus  escreveu:

> Hi Hans,

> > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases 
> > it
> > on release(). Thus ensuring that the cdev can never be removed while in an
> > ioctl.  
> 
> It does, but it does not affect memory which is allocated separately of that.
> 
> See this:
> 
> 

That sounds promising. If this bug issues other drivers than OMAP3,
then indeed the core has a bug.

I'll see if I can reproduce it here with some USB drivers later this week.

> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the 
> > kernel.
> > It's just additional code for no purpose.  
> 
> The same problems exist on other devices, whether platform, pci or USB, as
> the problems are in the core frameworks rather than (only) in the drivers.
> 
> On platform devices, this happens also when removing the module.
> 
> I've used omap3isp as an example since it demonstrates well the problems and
> a lot of people have the hardware as well. Also, Mauro has requested all
> drivers to be converted to the new API. I'm fine doing that for the actually
> hot-pluggable hardware.

While IMHO it is overkill trying to support hot plug on omap3, I won't
mind if you do that, provided that your patch series can be applied in
a way that it won't cause regressions for real hot-pluggable hardware.

I still think that keeping cdev embedded in a structure that it is
created dynamically when registering the device node, instead of
embedding it at struct media_device. Yet, if you prove that this does
more harm than good, I'm ok on re-embeeding it. However, on such case,
you need to put the patches re-embeeding it at the end of the patch
series (and not at the beginning), as otherwise you'll be causing
regressions.

> One additional reason is that as the omap3isp driver has been used as an
> example to write a number of other drivers, people do see what's the right
> way to do these things, instead of copying code from a driver doing it
> wrong.

Interesting argument. Yet, IMHO, the best would be to do the proper
review on the first platform driver that would support hot-plug,
and use this as an example. It is a shame that project Aurora was
discontinued, as media drivers for such kind of hardware would be an
interesting example.

On that matter, just like we use vivid as a testbench and as an
example for other drivers, it would be great if we could merge
the vimc driver. What's the status of Helen's patchset?

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-19 Thread Mauro Carvalho Chehab
Em Fri, 16 Dec 2016 15:45:10 +0100
Hans Verkuil  escreveu:

> We really need a whiteboard for this :-(

Well, we could schedule a media summit together with ELC NA.

ELC will be in Feb, 21-23 in Portland.

Comments?
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Laurent Pinchart
Hi Shuah,

On Thursday 15 Dec 2016 07:56:55 Shuah Khan wrote:
> On 12/15/2016 03:39 AM, Laurent Pinchart wrote:
> > On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:

[snip]

> >> Please don't pursue this RFC series that makes mc-core changes until
> >> ompa3 driver problems are addressed. There is no need to change the
> >> core unless it is necessary.
> > 
> > It is necessary as has been explained countless times, and will become
> > more and more necessary as media_device instances get shared between
> > multiple drivers, which is currently attempted *without* reference
> > counting.
> 
> You are probably forgetting the Media Device Allocator API work I did
> to make media_device sharable across media and audio drivers.

I haven't. How could I forget it ? :-) Media device sharing is important, and 
will become even more so in the future.

> Sakari's patches don't address the sharable need.

That's correct.

> I have been asking Sakari to use Media Device Allocator API in his patch
> series for allocating media device.

That's where I disagree. The more we dig the more we realize that the current 
infrastructure is broken. Adding anything on top of a construction that is on 
the verge of collapsing isn't a good idea until the foundations have been 
fixed and consolidated.

> I discussed the conflicts between the work I am doing and Sakari's series
> to find a common ground. But it doesn't look like we are going to get there.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Laurent Pinchart
Hi Mauro,

On Thursday 15 Dec 2016 12:32:07 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:03:36 +0100 Hans Verkuil escreveu:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>  Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> > On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >> Hi Sakari,
> >> 
> >> There's plenty of way to try and work around the problem in drivers,
> >> some more racy than others, but if we require changes to all platform
> >> drivers to fix this we need to ensure that we get it right, not as
> >> half-baked hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> 
> That's indeed a very good point. If unbind is not needed by any usecase,
> the better fix for OMAP3 would be to just prevent it to happen in the first
> place.

There are several reasons to implement proper unbind support in the omap3isp 
driver. Sakari has outlined them in another e-mail in this thread, I won't 
copy them here to avoid splitting the discussion.

>  The USB subsystem has a a .disconnect() callback that notifies
>  the drivers that a device was unbound (likely physically removed).
>  The way USB media drivers handle it is by returning -ENODEV to any
>  V4L2 call that would try to touch at the hardware after unbound.
> > 
> > In my view the main problem is that the media core is bound to a struct
> > device set by the driver that creates the MC. But since the MC gives an
> > overview of lots of other (sub)devices the refcount of the media device
> > should be increased for any (sub)device that adds itself to the MC and
> > decreased for any (sub)device that is removed. Only when the very last
> > user goes away can the MC memory be released.
> > 
> > The memory/refcounting associated with device nodes is unrelated to this:
> > once a devnode is unregistered it will be removed in /dev, and once the
> > last open fh closes any memory associated with the devnode can be
> > released. That will also decrease the refcount to its parent device.
> > 
> > This also means that it is a bad idea to embed devnodes in a larger
> > struct. They should be allocated and freed when the devnode is
> > unregistered and the last open filehandle is closed.
> > 
> > Then the parent's device refcount is decreased, and that may now call its
> > release callback if the refcount reaches 0.
> > 
> > For the media controller's device: any other device driver that needs
> > access to it needs to increase that device's refcount, and only when
> > those devices are released will they decrease the MC device's refcount.
> > 
> > And when that refcount goes to 0 can we finally free everything.
> > 
> > With regards to the opposition to reverting those initial patches, I'm
> > siding with Greg KH. Just revert the bloody patches. It worked most of the
> > time before those patches, so reverting really won't cause bisect
> > problems.
> 
> You're contradicting yourself here ;)
> 
> The patches that this patch series is reverting are the ones that
> de-embeeds devnode struct and fixes its lifecycle.
>
> Reverting those patches will cause regressions on hot-pluggable drivers,
> preventing them to be unplugged. So, if we're willing to revert, then we
> should also revert MC support on them.
> 
> > Just revert and build up things as they should.
> > 
> > Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> > parent pointer for example) and many drivers (including omap3isp) embed
> > video_device, which is wrong and can lead to complications.
> > 
> > I'm to blame for the embedding since I thought that was a good idea at one
> > time. I now realized that it isn't. Sorry about that...
> > 
> > And because the cdev of the video_device doesn't know about the parent
> > device it is (I think) possible that the parent device is released before
> > the cdev is released. Which can result in major problems.
> 
> I agree with you here. IMHO, de-embeeding cdev's struct from video_device
> seems to be the right thing to do at V4L2 side too.

I believe Hans' comment about embedded devnodes in larger structures referred 
to embedded video_device and media_device inside driver private structures. 
And even in that case I'm not convinced. I've replied to that in another part 
of the mail thread, let's keep the discussion there.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Laurent Pinchart
On Thursday 15 Dec 2016 13:45:52 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:45:22 +0100
> 
> Hans Verkuil  escreveu:
> > On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
> > > Em Thu, 15 Dec 2016 15:03:36 +0100
> > > 
> > > Hans Verkuil  escreveu:
> > >> On 15/12/16 13:56, Laurent Pinchart wrote:
> > >>> Hi Sakari,
> > >>> 
> > >>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >  On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab 
wrote:
> > > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> > >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab 
wrote:
> > >>> Hi Sakari,
> > >>> 
> > >>> There's plenty of way to try and work around the problem in drivers,
> > >>> some more racy than others, but if we require changes to all platform
> > >>> drivers to fix this we need to ensure that we get it right, not as
> > >>> half-baked hacks spread around the whole subsystem.
> > >> 
> > >> Why on earth do we want this for the omap3 driver? It is not a
> > >> hot-pluggable device and I see no reason whatsoever to start modifying
> > >> platform drivers just because you can do an unbind. I know there are
> > >> real hot-pluggable devices, and getting this right for those is of
> > >> course important.
> > > 
> > > That's indeed a very good point. If unbind is not needed by any usecase,
> > > the better fix for OMAP3 would be to just prevent it to happen in the
> > > first
> > > place.
> > > 
> > > The USB subsystem has a a .disconnect() callback that notifies
> > > the drivers that a device was unbound (likely physically removed).
> > > The way USB media drivers handle it is by returning -ENODEV to any
> > > V4L2 call that would try to touch at the hardware after unbound.
> > >> 
> > >> In my view the main problem is that the media core is bound to a struct
> > >> device set by the driver that creates the MC. But since the MC gives an
> > >> overview of lots of other (sub)devices the refcount of the media device
> > >> should be increased for any (sub)device that adds itself to the MC and
> > >> decreased for any (sub)device that is removed. Only when the very last
> > >> user goes away can the MC memory be released.
> > >> 
> > >> The memory/refcounting associated with device nodes is unrelated to
> > >> this:
> > >> once a devnode is unregistered it will be removed in /dev, and once the
> > >> last open fh closes any memory associated with the devnode can be
> > >> released.
> > >> That will also decrease the refcount to its parent device.
> > >> 
> > >> This also means that it is a bad idea to embed devnodes in a larger
> > >> struct.
> > >> They should be allocated and freed when the devnode is unregistered and
> > >> the last open filehandle is closed.
> > >> 
> > >> Then the parent's device refcount is decreased, and that may now call
> > >> its
> > >> release callback if the refcount reaches 0.
> > >> 
> > >> For the media controller's device: any other device driver that needs
> > >> access to it needs to increase that device's refcount, and only when
> > >> those devices are released will they decrease the MC device's
> > >> refcount.
> > >> 
> > >> And when that refcount goes to 0 can we finally free everything.
> > >> 
> > >> With regards to the opposition to reverting those initial patches, I'm
> > >> siding with Greg KH. Just revert the bloody patches. It worked most of
> > >> the
> > >> time before those patches, so reverting really won't cause bisect
> > >> problems.
> > > 
> > > You're contradicting yourself here ;)
> > > 
> > > The patches that this patch series is reverting are the ones that
> > > de-embeeds devnode struct and fixes its lifecycle.
> > > 
> > > Reverting those patches will cause regressions on hot-pluggable drivers,
> > > preventing them to be unplugged. So, if we're willing to revert, then we
> > > should also revert MC support on them.
> > 
> > Two options:
> > 
> > 1) Revert, then build up a proper solution.
> 
> Reverting is a regression, as we'll strip off the MC support from the
> existing devices. We would also need to revert a lot more than just those
> 3 patches.

It's not a regression for all the drivers that were already broken before. It 
can be considered as a regression for the drivers that have been broken 
afterwards (as far as I understand that's several USB drivers that you and 
Shuah have migrated to MC in the past few months, but I haven't followed 
driver work closely enough to name them), and I would certainly not oppose to 
additional patches being reverted for those drivers.

> > 2) Do a big-bang patch switching directly over to the new solution, but
> > that's very hard to review.
> > 2a) Post the patch series in small chunks on the mailinglist (starting
> > with the reverts), but once we're all happy merge that patch series into
> > a single big-bang patch and apply that.
> 
> We could do that, but so far, what has been submitted 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Laurent Pinchart
Hi Hans,

On Friday 16 Dec 2016 17:07:23 Sakari Ailus wrote:
> On Thu, Dec 15, 2016 at 03:03:36PM +0100, Hans Verkuil wrote:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

[snip]

>  It is not clear from the logs what the driver tried to do, but
>  that sounds like a driver's bug, with was not prepared to properly
>  handle unbinds.
> 
>  The problem here is that isp_video_release() is called by V4L2
>  release logic, and not by the MC one:
> 
>  static const struct v4l2_file_operations isp_video_fops = {
>   .owner  = THIS_MODULE,
>   .open   = isp_video_open,
>   .release= isp_video_release,
>   .poll   = vb2_fop_poll,
>   .unlocked_ioctl = video_ioctl2,
>   .mmap   = vb2_fop_mmap,
> };
> 
>  It seems that the driver's logic allows it to be called before or
>  after destroying the MC.
> 
>  Assuming that, if the OMAP3 driver is not used it works,
>  it means that, if the isp_video_release() is called
>  first, no errors will happen, but if MC is destroyed before
>  V4L2 call to its .release() callback, as there's no logic at the
>  driver that would detect it, isp_video_release() will be calling
>  isp_video_streamoff(), with depends on the MC to work.
> 
>  On a first glance, I can see two ways of fixing it:
> 
>  1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>  decrementing it only at isp_video_release(). That will ensure that
>  MC will only be removed after V4L2 removal.
> >>
> >> As soon as you have to dig deep in a structure to find a reference
> >> counter and increment it, bypassing all the API layers, you can be
> >> entirely sure that the solution is wrong.
> > 
> > Indeed.
> > 
>  2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>  inside the MC .release() callback.
> >>>
> >>> This is a fair suggestion, indeed. Let me see what could be done there.
> >>> Albeit this is just *one* of the existing issues. It will not address
> >>> all problems fixed by the patchset.
> >>
> >> We need to stop the hardware at .remove() time. That should not be linked
> >> to a videodev, v4l2_device or media_device .release() callback. When the
> >> .remove() callback returns the driver is not allowed to touch the
> >> hardware anymore. In particular, power domains might clocks or power
> >> supplies, leading to invalid access faults if we try to access hardware
> >> registers.
> > 
> > Correct.
> > 
> >> USB devices get help from the USB core that cancels all USB operations
> >> in progress when they're disconnected. Platform devices don't have it as
> >> easy, and need to implement everything themselves. We thus need to stop
> >> the hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF
> >> ioctl at .remove() time.
> > 
> > Please don't. This shouldn't be done automatically.
> > 
> >> That could introduce other races between .remove() and the
> >> userspace API. A better solution is to make sure the objects that are
> >> needed at .release() time of the device node are all reference-counted
> >> and only released when the last reference goes away.
> >>
> >> There's plenty of way to try and work around the problem in drivers, some
> >> more racy than others, but if we require changes to all platform drivers
> >> to fix this we need to ensure that we get it right, not as half-baked
> >> hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> > 
> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the
> > kernel. It's just additional code for no purpose.
> 
> The same problems exist on other devices, whether platform, pci or USB, as
> the problems are in the core frameworks rather than (only) in the drivers.
> 
> On platform devices, this happens also when removing the module.
> 
> I've used omap3isp as an example since it demonstrates well the problems and
> a lot of people have the hardware as well. Also, Mauro has requested all
> drivers to be converted to the new API. I'm fine doing that for the
> actually hot-pluggable hardware.
> 
> One additional reason is that as the omap3isp driver has been used as an
> example to write a number of other drivers, people do see what's the right
> way to do these things, instead of copying code from a driver doing it
> wrong.

That's a very important reason in my opinion. If we design core code properly 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Sakari Ailus
Hi Hans,

On Thu, Dec 15, 2016 at 03:03:36PM +0100, Hans Verkuil wrote:
> On 15/12/16 13:56, Laurent Pinchart wrote:
> >Hi Sakari,
> >
> >On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >>>Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >Hi Sakari,
> >
> >I answered you point to point below, but I suspect that you missed how
> >the current approach works. So, I decided to write a quick summary
> >here.
> >
> >The character devices /dev/media? are created via cdev, with relies on
> >a kobject per device, with has an embedded struct kref inside.
> >
> >Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >struct device, when the code does:
> >  devnode->cdev.kobj.parent = >dev.kobj;
> >
> >before calling cdev_add().
> >
> >The current lifetime management is actually based on cdev's kobject's
> >refcount, provided by its embedded kref.
> >
> >The kref warrants that any data associated with /dev/media0 won't be
> >freed if there are any pending system call. In other words, when
> >cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >and will call kobject_put().
> >
> >If the refcount is zero, it will call devnode->dev.release(). If the
> >kobject refcount is not zero, the data won't be freed.
> >
> >So, in the best case scenario, there's no opened file descriptors
> >by the time media device node is unregistered. So, it will free
> >everything.
> >
> >In the worse case scenario, e. g. when the driver is removed or
> >unbind while /dev/media0 has some opened file descriptor(s),
> >the cdev logic will do the proper lifetime management.
> >
> >On such case, /dev/media0 disappears from the file system, so another
> >open is not possible anymore. The data structures will remain
> >allocated until all associated file descriptors are not closed.
> >
> >When all file descriptors are closed, the data will be freed.
> >
> >On that time, it will call an optional dev.release() callback,
> >responsible to free any other data struct that the driver allocated.
> 
> The patchset does not change this. It's not a question of the
> media_devnode struct either. That's not an issue.
> 
> The issue is rather what else can be accessed through the media device
> and other interfaces. As IOCTLs are not serialised with device removal
> (which now releases much of the data structures)
> >>>
> >>>Huh? ioctls are serialized with struct device removal. The Driver core
> >>>warrants that.
> >>
> >>How?
> >>
> >>As far as I can tell, there's nothing in the way of an IOCTL being in
> >>progress on a character device which is registered by the driver for a
> >>hardware device which is being removed.
> >>
> >>vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
> >>case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
> >>are taken during that path, which I believe is by design.
> 
> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> on release(). Thus ensuring that the cdev can never be removed while in an
> ioctl.

It does, but it does not affect memory which is allocated separately of that.

See this:



> 
> >>
> there's a high chance of accessing
> released memory (or mutexes that have been already destroyed). An
> example of that is here, stopping a running pipeline after unbinding
> the device. What happens there is that the media device is released
> whilst it's in use through the video device.
> 
> 
> >>>
> >>>It is not clear from the logs what the driver tried to do, but
> >>>that sounds like a driver's bug, with was not prepared to properly
> >>>handle unbinds.
> >>>
> >>>The problem here is that isp_video_release() is called by V4L2
> >>>release logic, and not by the MC one:
> >>>
> >>>static const struct v4l2_file_operations isp_video_fops = {
> >>>  .owner  = THIS_MODULE,
> >>>  .open   = isp_video_open,
> >>>  .release= isp_video_release,
> >>>  .poll   = vb2_fop_poll,
> >>>  .unlocked_ioctl = video_ioctl2,
> >>>  .mmap   = vb2_fop_mmap,
> >>>};
> >>>
> >>>It seems that the driver's logic allows it to be called before or
> >>>after destroying the MC.
> >>>
> >>>Assuming that, if the OMAP3 driver is not used it works,
> >>>it means that, if the isp_video_release() is called
> >>>first, no errors will happen, but if MC is destroyed before
> >>>V4L2 call to its .release() callback, as there's no logic at the
> >>>driver that would 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Hans Verkuil

On 16/12/16 13:00, Mauro Carvalho Chehab wrote:

Em
 escreveu:


On 16/12/16 11:57, Mauro Carvalho Chehab wrote:

Em Fri, 16 Dec 2016 11:11:25 +0100
Hans Verkuil  escreveu:


Would it make sense to enforce that dependency. Can we tie /dev/media usecount
to /dev/video etc. usecount? In other words:

/dev/video is opened, then open /dev/media.


When a device node is registered it should increase the refcount on the 
media_device
(as I proposed, that would be mdev->dev). When a device node is unregistered 
and the
last user disappeared, then it can decrease the media_device refcount.

So as long as anyone is using a device node, the media_device will stick around 
as
well.

No need to take refcounts on open/close.


That makes sense. You're meaning something like the enclosed (untested)
patch?


One note: as I mentioned, the video_device does not set the cdev parent 
correctly,
so that bug needs to be fixed first for this to work.


Actually, __video_register_device() seems to be setting the parent
properly:

if (vdev->dev_parent == NULL)
vdev->dev_parent = vdev->v4l2_dev->dev;


No, I mean this code (from cec-core.c):


/* Part 2: Initialize and register the character device */
 cdev_init(>cdev, _devnode_fops);
 devnode->cdev.kobj.parent = >dev.kobj;
 devnode->cdev.owner = owner;

 ret = cdev_add(>cdev, devnode->dev.devt, 1);
 if (ret < 0) {
 pr_err("%s: cdev_add failed\n", __func__);
 goto clr_bit;
 }

 ret = device_add(>dev);
 if (ret)
 goto cdev_del;

which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.


Ah! So, you're basically proposing to have a separate struct for
V4L2 devnode as well, right?

Makes sense.


No need for that, that's already struct video_device.







Thanks,
Mauro

[PATCH] Be sure that the media_device won't be freed too early

This code snippet is untested.

Signed-off-by: Mauro Carvalho chehab 

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8756275e9fc4..5fdeab382069 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -706,7 +706,7 @@ int __must_check __media_device_register(struct 
media_device *mdev,
struct media_devnode *devnode;
int ret;

-   devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
+   devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);


I'm not sure about this change. I *think* this would work, but *only* if all
the refcounting is 100% correct, and we know it isn't at the moment. So I
think this should be postponed until we are confident everything is correct.


Yes, such change will require first to be sure that drivers would be
doing the right thing.


So devm_ resources are released right after remove() exits, not when the last 
reference
goes to 0. In other words, devm_ typically can't be used for these complex scenarios, 
certainly not for memory. See discussion with Laurent on irc.







if (!devnode)
return -ENOMEM;

diff --git a/drivers/media/v4l2-core/v4l2-dev.c 
b/drivers/media/v4l2-core/v4l2-dev.c
index 8be561ab2615..14a3c56dbcac 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
 #if defined(CONFIG_MEDIA_CONTROLLER)
if (v4l2_dev->mdev) {
/* Remove interfaces and interface links */
+   put_device(v4l2_dev->mdev->dev);
media_devnode_remove(vdev->intf_devnode);
if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
media_device_unregister_entity(>entity);


I think this is the wrong order: put_device should go after 
media_device_unregister_entity().


OK.




@@ -810,6 +811,7 @@ static int video_register_media_controller(struct 
video_device *vdev, int type)
return -ENOMEM;
}
}
+   get_device(vdev->v4l2_dev->dev);


You mean v4l2_dev->mdev->dev?


Yes, that's right (vdev->v4l2_dev->mdev->dev).


Laurent helped me realize that this won't work either: mdev->dev is typically a 
platform/pci/usb device, and that won't go away when you rmmod the driver.


So while taking a refcount on that device doesn't hurt, we also need to take a 
refcount
on a kref inside the mdev. Just like v4l2_device this struct has an unfortunate 
name.
It's not a device, but a root structure for media devices.

We really need a whiteboard for this :-(

Regards,

Hans







/* FIXME: how to create the other interface links? */

@@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
if (!vdev || !video_is_registered(vdev))
return;

+#if defined(CONFIG_MEDIA_CONTROLLER)
+   if (vdev->v4l2_dev->dev)
+   put_device(vdev->v4l2_dev->dev);



Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Mauro Carvalho Chehab
Em 
 escreveu:

> On 16/12/16 11:57, Mauro Carvalho Chehab wrote:
> > Em Fri, 16 Dec 2016 11:11:25 +0100
> > Hans Verkuil  escreveu:
> >  
> >>> Would it make sense to enforce that dependency. Can we tie /dev/media 
> >>> usecount
> >>> to /dev/video etc. usecount? In other words:
> >>>
> >>> /dev/video is opened, then open /dev/media.  
> >>
> >> When a device node is registered it should increase the refcount on the 
> >> media_device
> >> (as I proposed, that would be mdev->dev). When a device node is 
> >> unregistered and the
> >> last user disappeared, then it can decrease the media_device refcount.
> >>
> >> So as long as anyone is using a device node, the media_device will stick 
> >> around as
> >> well.
> >>
> >> No need to take refcounts on open/close.  
> >
> > That makes sense. You're meaning something like the enclosed (untested)
> > patch?
> >  
> >> One note: as I mentioned, the video_device does not set the cdev parent 
> >> correctly,
> >> so that bug needs to be fixed first for this to work.  
> >
> > Actually, __video_register_device() seems to be setting the parent
> > properly:
> >
> > if (vdev->dev_parent == NULL)
> > vdev->dev_parent = vdev->v4l2_dev->dev;  
> 
> No, I mean this code (from cec-core.c):
> 
> 
> /* Part 2: Initialize and register the character device */
>  cdev_init(>cdev, _devnode_fops);
>  devnode->cdev.kobj.parent = >dev.kobj;
>  devnode->cdev.owner = owner;
> 
>  ret = cdev_add(>cdev, devnode->dev.devt, 1);
>  if (ret < 0) {
>  pr_err("%s: cdev_add failed\n", __func__);
>  goto clr_bit;
>  }
> 
>  ret = device_add(>dev);
>  if (ret)
>  goto cdev_del;
> 
> which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.

Ah! So, you're basically proposing to have a separate struct for
V4L2 devnode as well, right?

Makes sense.

> 
> >
> > Thanks,
> > Mauro
> >
> > [PATCH] Be sure that the media_device won't be freed too early
> >
> > This code snippet is untested.
> >
> > Signed-off-by: Mauro Carvalho chehab 
> >
> > diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> > index 8756275e9fc4..5fdeab382069 100644
> > --- a/drivers/media/media-device.c
> > +++ b/drivers/media/media-device.c
> > @@ -706,7 +706,7 @@ int __must_check __media_device_register(struct 
> > media_device *mdev,
> > struct media_devnode *devnode;
> > int ret;
> >
> > -   devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
> > +   devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);  
> 
> I'm not sure about this change. I *think* this would work, but *only* if all
> the refcounting is 100% correct, and we know it isn't at the moment. So I
> think this should be postponed until we are confident everything is correct.

Yes, such change will require first to be sure that drivers would be
doing the right thing.

> 
> > if (!devnode)
> > return -ENOMEM;
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-dev.c 
> > b/drivers/media/v4l2-core/v4l2-dev.c
> > index 8be561ab2615..14a3c56dbcac 100644
> > --- a/drivers/media/v4l2-core/v4l2-dev.c
> > +++ b/drivers/media/v4l2-core/v4l2-dev.c
> > @@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
> >  #if defined(CONFIG_MEDIA_CONTROLLER)
> > if (v4l2_dev->mdev) {
> > /* Remove interfaces and interface links */
> > +   put_device(v4l2_dev->mdev->dev);
> > media_devnode_remove(vdev->intf_devnode);
> > if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
> > media_device_unregister_entity(>entity);  
> 
> I think this is the wrong order: put_device should go after 
> media_device_unregister_entity().

OK.

> 
> > @@ -810,6 +811,7 @@ static int video_register_media_controller(struct 
> > video_device *vdev, int type)
> > return -ENOMEM;
> > }
> > }
> > +   get_device(vdev->v4l2_dev->dev);  
> 
> You mean v4l2_dev->mdev->dev?

Yes, that's right (vdev->v4l2_dev->mdev->dev).

> 
> >
> > /* FIXME: how to create the other interface links? */
> >
> > @@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device 
> > *vdev)
> > if (!vdev || !video_is_registered(vdev))
> > return;
> >
> > +#if defined(CONFIG_MEDIA_CONTROLLER)
> > +   if (vdev->v4l2_dev->dev)
> > +   put_device(vdev->v4l2_dev->dev);  
> 
> Ditto.
> 
> > +#endif
> > +
> > mutex_lock(_lock);
> > /* This must be in a critical section to prevent a race with v4l2_open.
> >  * Once this bit has been cleared video_get may never be called again.
> > diff --git a/drivers/media/v4l2-core/v4l2-device.c 
> > b/drivers/media/v4l2-core/v4l2-device.c
> > index 62bbed76dbbc..53f42090c762 100644
> > --- a/drivers/media/v4l2-core/v4l2-device.c
> > +++ b/drivers/media/v4l2-core/v4l2-device.c
> > @@ -188,6 +188,7 @@ int 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Hans Verkuil

On 16/12/16 11:57, Mauro Carvalho Chehab wrote:

Em Fri, 16 Dec 2016 11:11:25 +0100
Hans Verkuil  escreveu:


Would it make sense to enforce that dependency. Can we tie /dev/media usecount
to /dev/video etc. usecount? In other words:

/dev/video is opened, then open /dev/media.


When a device node is registered it should increase the refcount on the 
media_device
(as I proposed, that would be mdev->dev). When a device node is unregistered 
and the
last user disappeared, then it can decrease the media_device refcount.

So as long as anyone is using a device node, the media_device will stick around 
as
well.

No need to take refcounts on open/close.


That makes sense. You're meaning something like the enclosed (untested)
patch?


One note: as I mentioned, the video_device does not set the cdev parent 
correctly,
so that bug needs to be fixed first for this to work.


Actually, __video_register_device() seems to be setting the parent
properly:

if (vdev->dev_parent == NULL)
vdev->dev_parent = vdev->v4l2_dev->dev;


No, I mean this code (from cec-core.c):


   /* Part 2: Initialize and register the character device */
cdev_init(>cdev, _devnode_fops);
devnode->cdev.kobj.parent = >dev.kobj;
devnode->cdev.owner = owner;

ret = cdev_add(>cdev, devnode->dev.devt, 1);
if (ret < 0) {
pr_err("%s: cdev_add failed\n", __func__);
goto clr_bit;
}

ret = device_add(>dev);
if (ret)
goto cdev_del;

which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.



Thanks,
Mauro

[PATCH] Be sure that the media_device won't be freed too early

This code snippet is untested.

Signed-off-by: Mauro Carvalho chehab 

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8756275e9fc4..5fdeab382069 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -706,7 +706,7 @@ int __must_check __media_device_register(struct 
media_device *mdev,
struct media_devnode *devnode;
int ret;

-   devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
+   devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);


I'm not sure about this change. I *think* this would work, but *only* if all
the refcounting is 100% correct, and we know it isn't at the moment. So I
think this should be postponed until we are confident everything is correct.


if (!devnode)
return -ENOMEM;

diff --git a/drivers/media/v4l2-core/v4l2-dev.c 
b/drivers/media/v4l2-core/v4l2-dev.c
index 8be561ab2615..14a3c56dbcac 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
 #if defined(CONFIG_MEDIA_CONTROLLER)
if (v4l2_dev->mdev) {
/* Remove interfaces and interface links */
+   put_device(v4l2_dev->mdev->dev);
media_devnode_remove(vdev->intf_devnode);
if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
media_device_unregister_entity(>entity);


I think this is the wrong order: put_device should go after 
media_device_unregister_entity().


@@ -810,6 +811,7 @@ static int video_register_media_controller(struct 
video_device *vdev, int type)
return -ENOMEM;
}
}
+   get_device(vdev->v4l2_dev->dev);


You mean v4l2_dev->mdev->dev?



/* FIXME: how to create the other interface links? */

@@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
if (!vdev || !video_is_registered(vdev))
return;

+#if defined(CONFIG_MEDIA_CONTROLLER)
+   if (vdev->v4l2_dev->dev)
+   put_device(vdev->v4l2_dev->dev);


Ditto.


+#endif
+
mutex_lock(_lock);
/* This must be in a critical section to prevent a race with v4l2_open.
 * Once this bit has been cleared video_get may never be called again.
diff --git a/drivers/media/v4l2-core/v4l2-device.c 
b/drivers/media/v4l2-core/v4l2-device.c
index 62bbed76dbbc..53f42090c762 100644
--- a/drivers/media/v4l2-core/v4l2-device.c
+++ b/drivers/media/v4l2-core/v4l2-device.c
@@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device 
*v4l2_dev,
err = media_device_register_entity(v4l2_dev->mdev, entity);
if (err < 0)
goto error_module;
+   get_device(v4l2_dev->mdev->dev);
}
 #endif

@@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device 
*v4l2_dev,

 error_unregister:
 #if defined(CONFIG_MEDIA_CONTROLLER)
+   if (v4l2_dev->mdev)
+   put_device(v4l2_dev->mdev->dev);
media_device_unregister_entity(entity);
 #endif
 error_module:
@@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Mauro Carvalho Chehab
Em Fri, 16 Dec 2016 11:11:25 +0100
Hans Verkuil  escreveu:

> > Would it make sense to enforce that dependency. Can we tie /dev/media 
> > usecount
> > to /dev/video etc. usecount? In other words:
> >
> > /dev/video is opened, then open /dev/media.  
> 
> When a device node is registered it should increase the refcount on the 
> media_device
> (as I proposed, that would be mdev->dev). When a device node is unregistered 
> and the
> last user disappeared, then it can decrease the media_device refcount.
> 
> So as long as anyone is using a device node, the media_device will stick 
> around as
> well.
> 
> No need to take refcounts on open/close.

That makes sense. You're meaning something like the enclosed (untested)
patch?

> One note: as I mentioned, the video_device does not set the cdev parent 
> correctly,
> so that bug needs to be fixed first for this to work.

Actually, __video_register_device() seems to be setting the parent
properly:

if (vdev->dev_parent == NULL)
vdev->dev_parent = vdev->v4l2_dev->dev;

Thanks,
Mauro

[PATCH] Be sure that the media_device won't be freed too early

This code snippet is untested.

Signed-off-by: Mauro Carvalho chehab 

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8756275e9fc4..5fdeab382069 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -706,7 +706,7 @@ int __must_check __media_device_register(struct 
media_device *mdev,
struct media_devnode *devnode;
int ret;
 
-   devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
+   devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);
if (!devnode)
return -ENOMEM;
 
diff --git a/drivers/media/v4l2-core/v4l2-dev.c 
b/drivers/media/v4l2-core/v4l2-dev.c
index 8be561ab2615..14a3c56dbcac 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
 #if defined(CONFIG_MEDIA_CONTROLLER)
if (v4l2_dev->mdev) {
/* Remove interfaces and interface links */
+   put_device(v4l2_dev->mdev->dev);
media_devnode_remove(vdev->intf_devnode);
if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
media_device_unregister_entity(>entity);
@@ -810,6 +811,7 @@ static int video_register_media_controller(struct 
video_device *vdev, int type)
return -ENOMEM;
}
}
+   get_device(vdev->v4l2_dev->dev);
 
/* FIXME: how to create the other interface links? */
 
@@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
if (!vdev || !video_is_registered(vdev))
return;
 
+#if defined(CONFIG_MEDIA_CONTROLLER)
+   if (vdev->v4l2_dev->dev)
+   put_device(vdev->v4l2_dev->dev);
+#endif
+
mutex_lock(_lock);
/* This must be in a critical section to prevent a race with v4l2_open.
 * Once this bit has been cleared video_get may never be called again.
diff --git a/drivers/media/v4l2-core/v4l2-device.c 
b/drivers/media/v4l2-core/v4l2-device.c
index 62bbed76dbbc..53f42090c762 100644
--- a/drivers/media/v4l2-core/v4l2-device.c
+++ b/drivers/media/v4l2-core/v4l2-device.c
@@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device 
*v4l2_dev,
err = media_device_register_entity(v4l2_dev->mdev, entity);
if (err < 0)
goto error_module;
+   get_device(v4l2_dev->mdev->dev);
}
 #endif
 
@@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device 
*v4l2_dev,
 
 error_unregister:
 #if defined(CONFIG_MEDIA_CONTROLLER)
+   if (v4l2_dev->mdev)
+   put_device(v4l2_dev->mdev->dev);
media_device_unregister_entity(entity);
 #endif
 error_module:
@@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
 * links are removed by the function below, in the right order
 */
media_device_unregister_entity(>entity);
+   put_device(v4l2_dev->mdev->dev);
}
 #endif
video_unregister_device(sd->devnode);




--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Mauro Carvalho Chehab
Em Fri, 16 Dec 2016 11:03:09 +0100
Hans Verkuil  escreveu:

> So:
> 
> 1) subdev drivers should disallow unbind
> 2) interface entities should call media_device_unregister_entity() when they
> are unregistered (if that doesn't already happen)

Sounds like a plan to me.

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Hans Verkuil

On 15/12/16 18:51, Shuah Khan wrote:

On 12/15/2016 10:25 AM, Mauro Carvalho Chehab wrote:

Em Thu, 15 Dec 2016 10:09:53 -0700
Shuah Khan  escreveu:


On 12/15/2016 09:28 AM, Hans Verkuil wrote:

On 15/12/16 17:06, Shuah Khan wrote:




I think this will work for interface entities, but for subdev entities this
certainly won't work. Unbinding subdevs should be blocked (just set
suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
have pointers to subdev data, so unbinding them will just fail horribly.



Yes that is an option. I did something similar for au0828 and snd_usb_audio
case, so the module that registers the media_device can't unbound until the
other driver. If au0828 registers media_device, it becomes the owner and if
it gets unbound ioctls will start to see problems.


Sorry I meant to say rmmod'ed not unbound. Unbound will work just fine. If the
modules that owns the media_devnode goes away, there will be problems with
cdev trying to load module when application closes the device file and exits.
In this case, Media Device Allocator API takes module reference, so its use
count goes up.



What this means though is that drivers can't be unbound easily. But that is
a small price to pay compared to the problems we will see if a driver is
unbound when its entities are still in use. Also, unsetting bind_attrs has
to be done as well, otherwise we can never unbind any driver.


I don't think suppress_bind_attrs will work on USB drivers, as the
device can be physically removed.



Yeah setting suppress_bind_attrs would cause problems. On one hand keeping
all entities until all references are gone sound like a good option, however
this would cause problems coordinating removal especially in the case of
embedded entities. Can this be done in a simpler way? The way I see it, we
have /dev/video, /dev/dvb, /dev/snd/* etc. that depend on /dev/media for
graph nodes. Any one of these devices could be open when any of the drivers
is unbound (physical removal is a simpler case).

Would it make sense to enforce that dependency. Can we tie /dev/media usecount
to /dev/video etc. usecount? In other words:

/dev/video is opened, then open /dev/media.


When a device node is registered it should increase the refcount on the 
media_device
(as I proposed, that would be mdev->dev). When a device node is unregistered 
and the
last user disappeared, then it can decrease the media_device refcount.

So as long as anyone is using a device node, the media_device will stick around 
as
well.

No need to take refcounts on open/close.

One note: as I mentioned, the video_device does not set the cdev parent 
correctly,
so that bug needs to be fixed first for this to work.


prevent entities being removed if /dev/media is open.

Would that help. The above could be done in a generic way possibly. Would it
help if /dev/media is kept open when streaming is active? That is just one


Again, it's not about the device nodes, it's about the media_device.

Regards,

Hans


use-case, there might be others.

thanks,
-- Shuah


thanks,
-- Shuah

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-16 Thread Hans Verkuil

On 15/12/16 18:09, Shuah Khan wrote:

On 12/15/2016 09:28 AM, Hans Verkuil wrote:

On 15/12/16 17:06, Shuah Khan wrote:

On 12/15/2016 08:26 AM, Hans Verkuil wrote:

On 15/12/16 15:45, Shuah Khan wrote:

On 12/15/2016 07:03 AM, Hans Verkuil wrote:

On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,

I answered you point to point below, but I suspect that you missed how
the current approach works. So, I decided to write a quick summary
here.

The character devices /dev/media? are created via cdev, with relies on
a kobject per device, with has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
  devnode->cdev.kobj.parent = >dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref warrants that any data associated with /dev/media0 won't be
freed if there are any pending system call. In other words, when
cdev_del() is called, it will remove /dev/media0 from the filesystem,
and will call kobject_put().

If the refcount is zero, it will call devnode->dev.release(). If the
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there's no opened file descriptors
by the time media device node is unregistered. So, it will free
everything.

In the worse case scenario, e. g. when the driver is removed or
unbind while /dev/media0 has some opened file descriptor(s),
the cdev logic will do the proper lifetime management.

On such case, /dev/media0 disappears from the file system, so another
open is not possible anymore. The data structures will remain
allocated until all associated file descriptors are not closed.

When all file descriptors are closed, the data will be freed.

On that time, it will call an optional dev.release() callback,
responsible to free any other data struct that the driver allocated.


The patchset does not change this. It's not a question of the
media_devnode struct either. That's not an issue.

The issue is rather what else can be accessed through the media device
and other interfaces. As IOCTLs are not serialised with device removal
(which now releases much of the data structures)


Huh? ioctls are serialized with struct device removal. The Driver core
warrants that.


How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.


chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
on release(). Thus ensuring that the cdev can never be removed while in an
ioctl.




there's a high chance of accessing
released memory (or mutexes that have been already destroyed). An
example of that is here, stopping a running pipeline after unbinding
the device. What happens there is that the media device is released
whilst it's in use through the video device.




It is not clear from the logs what the driver tried to do, but
that sounds like a driver's bug, with was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
  .owner  = THIS_MODULE,
  .open   = isp_video_open,
  .release= isp_video_release,
  .poll   = vb2_fop_poll,
  .unlocked_ioctl = video_ioctl2,
  .mmap   = vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that, if the OMAP3 driver is not used it works,
it means that, if the isp_video_release() is called
first, no errors will happen, but if MC is destroyed before
V4L2 call to its .release() callback, as there's no logic at the
driver that would detect it, isp_video_release() will be calling
isp_video_streamoff(), with depends on the MC to work.

On a first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(),
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.


As soon as you have to dig deep in a structure to find a reference counter and
increment it, bypassing all the API layers, you can be entirely sure that the
solution is wrong.


Indeed.




2) to call isp_video_streamoff() before removing the MC 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Shuah Khan
On 12/15/2016 10:25 AM, Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 10:09:53 -0700
> Shuah Khan  escreveu:
> 
>> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
>>> On 15/12/16 17:06, Shuah Khan wrote:  
> 
>>>
>>> I think this will work for interface entities, but for subdev entities this
>>> certainly won't work. Unbinding subdevs should be blocked (just set
>>> suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
>>> have pointers to subdev data, so unbinding them will just fail horribly.
>>>   
>>
>> Yes that is an option. I did something similar for au0828 and snd_usb_audio
>> case, so the module that registers the media_device can't unbound until the
>> other driver. If au0828 registers media_device, it becomes the owner and if
>> it gets unbound ioctls will start to see problems.

Sorry I meant to say rmmod'ed not unbound. Unbound will work just fine. If the
modules that owns the media_devnode goes away, there will be problems with
cdev trying to load module when application closes the device file and exits.
In this case, Media Device Allocator API takes module reference, so its use
count goes up.

>>
>> What this means though is that drivers can't be unbound easily. But that is
>> a small price to pay compared to the problems we will see if a driver is
>> unbound when its entities are still in use. Also, unsetting bind_attrs has
>> to be done as well, otherwise we can never unbind any driver.
> 
> I don't think suppress_bind_attrs will work on USB drivers, as the
> device can be physically removed. 
> 

Yeah setting suppress_bind_attrs would cause problems. On one hand keeping
all entities until all references are gone sound like a good option, however
this would cause problems coordinating removal especially in the case of
embedded entities. Can this be done in a simpler way? The way I see it, we
have /dev/video, /dev/dvb, /dev/snd/* etc. that depend on /dev/media for
graph nodes. Any one of these devices could be open when any of the drivers
is unbound (physical removal is a simpler case).

Would it make sense to enforce that dependency. Can we tie /dev/media usecount
to /dev/video etc. usecount? In other words:

/dev/video is opened, then open /dev/media.
prevent entities being removed if /dev/media is open.

Would that help. The above could be done in a generic way possibly. Would it
help if /dev/media is kept open when streaming is active? That is just one
use-case, there might be others.

thanks,
-- Shuah


thanks,
-- Shuah

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Mauro Carvalho Chehab
Em Thu, 15 Dec 2016 10:09:53 -0700
Shuah Khan  escreveu:

> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
> > On 15/12/16 17:06, Shuah Khan wrote:  

> > 
> > I think this will work for interface entities, but for subdev entities this
> > certainly won't work. Unbinding subdevs should be blocked (just set
> > suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
> > have pointers to subdev data, so unbinding them will just fail horribly.
> >   
> 
> Yes that is an option. I did something similar for au0828 and snd_usb_audio
> case, so the module that registers the media_device can't unbound until the
> other driver. If au0828 registers media_device, it becomes the owner and if
> it gets unbound ioctls will start to see problems.
> 
> What this means though is that drivers can't be unbound easily. But that is
> a small price to pay compared to the problems we will see if a driver is
> unbound when its entities are still in use. Also, unsetting bind_attrs has
> to be done as well, otherwise we can never unbind any driver.

I don't think suppress_bind_attrs will work on USB drivers, as the
device can be physically removed. 

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Shuah Khan
On 12/15/2016 09:28 AM, Hans Verkuil wrote:
> On 15/12/16 17:06, Shuah Khan wrote:
>> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
>>> On 15/12/16 15:45, Shuah Khan wrote:
 On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> On 15/12/16 13:56, Laurent Pinchart wrote:
>> Hi Sakari,
>>
>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
 Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>> Hi Sakari,
>>
>> I answered you point to point below, but I suspect that you missed 
>> how
>> the current approach works. So, I decided to write a quick summary
>> here.
>>
>> The character devices /dev/media? are created via cdev, with relies 
>> on
>> a kobject per device, with has an embedded struct kref inside.
>>
>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>> struct device, when the code does:
>>   devnode->cdev.kobj.parent = >dev.kobj;
>>
>> before calling cdev_add().
>>
>> The current lifetime management is actually based on cdev's kobject's
>> refcount, provided by its embedded kref.
>>
>> The kref warrants that any data associated with /dev/media0 won't be
>> freed if there are any pending system call. In other words, when
>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>> and will call kobject_put().
>>
>> If the refcount is zero, it will call devnode->dev.release(). If the
>> kobject refcount is not zero, the data won't be freed.
>>
>> So, in the best case scenario, there's no opened file descriptors
>> by the time media device node is unregistered. So, it will free
>> everything.
>>
>> In the worse case scenario, e. g. when the driver is removed or
>> unbind while /dev/media0 has some opened file descriptor(s),
>> the cdev logic will do the proper lifetime management.
>>
>> On such case, /dev/media0 disappears from the file system, so another
>> open is not possible anymore. The data structures will remain
>> allocated until all associated file descriptors are not closed.
>>
>> When all file descriptors are closed, the data will be freed.
>>
>> On that time, it will call an optional dev.release() callback,
>> responsible to free any other data struct that the driver allocated.
>
> The patchset does not change this. It's not a question of the
> media_devnode struct either. That's not an issue.
>
> The issue is rather what else can be accessed through the media device
> and other interfaces. As IOCTLs are not serialised with device removal
> (which now releases much of the data structures)

 Huh? ioctls are serialized with struct device removal. The Driver core
 warrants that.
>>>
>>> How?
>>>
>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>> progress on a character device which is registered by the driver for a
>>> hardware device which is being removed.
>>>
>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which 
>>> is, in
>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other 
>>> locks)
>>> are taken during that path, which I believe is by design.
>
> chrdev_open in fs/char_dev.c increases the refcount on open() and 
> decreases it
> on release(). Thus ensuring that the cdev can never be removed while in an
> ioctl.
>
>>>
> there's a high chance of accessing
> released memory (or mutexes that have been already destroyed). An
> example of that is here, stopping a running pipeline after unbinding
> the device. What happens there is that the media device is released
> whilst it's in use through the video device.
>
> 

 It is not clear from the logs what the driver tried to do, but
 that sounds like a driver's bug, with was not prepared to properly
 handle unbinds.

 The problem here is that isp_video_release() is called by V4L2
 release logic, and not by the MC one:

 static const struct v4l2_file_operations isp_video_fops = {
   .owner  = THIS_MODULE,
   .open   = isp_video_open,
   .release= isp_video_release,
   .poll   = vb2_fop_poll,
   .unlocked_ioctl = video_ioctl2,
   .mmap   = vb2_fop_mmap,
 };

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Mauro Carvalho Chehab
Em Thu, 15 Dec 2016 16:26:19 +0100
Hans Verkuil  escreveu:

> > Should all the entities stick around until all references to media
> > device are gone? If an application has /dev/media open, does that
> > mean all entities should not be free'd until this app. exits? What
> > should happen if an app. is streaming? Should the graph stay intact
> > until the app. exits?  
> 
> Yes, everything must stay around until the last user has disappeared.
> 
> In general unplugs can happen at any time. So applications can be in the 
> middle
> of an ioctl, and removing memory during that time is just impossible.
> 
> On unplug you:
> 
> 1) stop any HW DMA (highly device dependent)
> 2) wake up any filehandles that wait for an event
> 3) unregister any device nodes
> 
> Then just sit back and wait for refcounts to go down as filehandles are closed
> by the application.
> 
> Note: the v4l2/media/cec/IR/whatever core is typically responsible for 
> rejecting
> any ioctls/mmap/etc. once the device node has been unregistered. The only 
> valid
> file operation is release().

Agreed. The problem on OMAP3 is that it doesn't stop HW DMA when
struct media_devnode is released. It tries to do it later, when the
V4L2 core is unbind, by trying to dig into the media controller
struct that the driver removed before.

That's said, for OMAP3 and all other drivers that don't support hot unplug,
I would just use suppress_bind_attrs, as I fail to see any need to allow
unbinding them.

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Hans Verkuil

On 15/12/16 17:06, Shuah Khan wrote:

On 12/15/2016 08:26 AM, Hans Verkuil wrote:

On 15/12/16 15:45, Shuah Khan wrote:

On 12/15/2016 07:03 AM, Hans Verkuil wrote:

On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,

I answered you point to point below, but I suspect that you missed how
the current approach works. So, I decided to write a quick summary
here.

The character devices /dev/media? are created via cdev, with relies on
a kobject per device, with has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
  devnode->cdev.kobj.parent = >dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref warrants that any data associated with /dev/media0 won't be
freed if there are any pending system call. In other words, when
cdev_del() is called, it will remove /dev/media0 from the filesystem,
and will call kobject_put().

If the refcount is zero, it will call devnode->dev.release(). If the
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there's no opened file descriptors
by the time media device node is unregistered. So, it will free
everything.

In the worse case scenario, e. g. when the driver is removed or
unbind while /dev/media0 has some opened file descriptor(s),
the cdev logic will do the proper lifetime management.

On such case, /dev/media0 disappears from the file system, so another
open is not possible anymore. The data structures will remain
allocated until all associated file descriptors are not closed.

When all file descriptors are closed, the data will be freed.

On that time, it will call an optional dev.release() callback,
responsible to free any other data struct that the driver allocated.


The patchset does not change this. It's not a question of the
media_devnode struct either. That's not an issue.

The issue is rather what else can be accessed through the media device
and other interfaces. As IOCTLs are not serialised with device removal
(which now releases much of the data structures)


Huh? ioctls are serialized with struct device removal. The Driver core
warrants that.


How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.


chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
on release(). Thus ensuring that the cdev can never be removed while in an
ioctl.




there's a high chance of accessing
released memory (or mutexes that have been already destroyed). An
example of that is here, stopping a running pipeline after unbinding
the device. What happens there is that the media device is released
whilst it's in use through the video device.




It is not clear from the logs what the driver tried to do, but
that sounds like a driver's bug, with was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
  .owner  = THIS_MODULE,
  .open   = isp_video_open,
  .release= isp_video_release,
  .poll   = vb2_fop_poll,
  .unlocked_ioctl = video_ioctl2,
  .mmap   = vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that, if the OMAP3 driver is not used it works,
it means that, if the isp_video_release() is called
first, no errors will happen, but if MC is destroyed before
V4L2 call to its .release() callback, as there's no logic at the
driver that would detect it, isp_video_release() will be calling
isp_video_streamoff(), with depends on the MC to work.

On a first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(),
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.


As soon as you have to dig deep in a structure to find a reference counter and
increment it, bypassing all the API layers, you can be entirely sure that the
solution is wrong.


Indeed.




2) to call isp_video_streamoff() before removing the MC stuff, e. g.
inside the MC .release() callback.


This is a fair suggestion, 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Hans Verkuil

On 15/12/16 16:45, Mauro Carvalho Chehab wrote:

Em Thu, 15 Dec 2016 15:45:22 +0100
Hans Verkuil  escreveu:


On 15/12/16 15:32, Mauro Carvalho Chehab wrote:

Em Thu, 15 Dec 2016 15:03:36 +0100
Hans Verkuil  escreveu:


On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,





There's plenty of way to try and work around the problem in drivers, some more
racy than others, but if we require changes to all platform drivers to fix
this we need to ensure that we get it right, not as half-baked hacks spread
around the whole subsystem.


Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
device and I see no reason whatsoever to start modifying platform drivers just
because you can do an unbind. I know there are real hot-pluggable devices, and
getting this right for those is of course important.


That's indeed a very good point. If unbind is not needed by any usecase,
the better fix for OMAP3 would be to just prevent it to happen in the first
place.


The USB subsystem has a a .disconnect() callback that notifies
the drivers that a device was unbound (likely physically removed).
The way USB media drivers handle it is by returning -ENODEV to any
V4L2 call that would try to touch at the hardware after unbound.




In my view the main problem is that the media core is bound to a struct
device set by the driver that creates the MC. But since the MC gives an
overview of lots of other (sub)devices the refcount of the media device
should be increased for any (sub)device that adds itself to the MC and
decreased for any (sub)device that is removed. Only when the very last
user goes away can the MC memory be released.

The memory/refcounting associated with device nodes is unrelated to this:
once a devnode is unregistered it will be removed in /dev, and once the
last open fh closes any memory associated with the devnode can be released.
That will also decrease the refcount to its parent device.

This also means that it is a bad idea to embed devnodes in a larger struct.
They should be allocated and freed when the devnode is unregistered and
the last open filehandle is closed.

Then the parent's device refcount is decreased, and that may now call its
release callback if the refcount reaches 0.

For the media controller's device: any other device driver that needs access
to it needs to increase that device's refcount, and only when those devices
are released will they decrease the MC device's refcount.

And when that refcount goes to 0 can we finally free everything.

With regards to the opposition to reverting those initial patches, I'm
siding with Greg KH. Just revert the bloody patches. It worked most of the
time before those patches, so reverting really won't cause bisect problems.


You're contradicting yourself here ;)

The patches that this patch series is reverting are the ones that
de-embeeds devnode struct and fixes its lifecycle.

Reverting those patches will cause regressions on hot-pluggable drivers,
preventing them to be unplugged. So, if we're willing to revert, then we
should also revert MC support on them.


Two options:

1) Revert, then build up a proper solution.


Reverting is a regression, as we'll strip off the MC support from the
existing devices. We would also need to revert a lot more than just those
3 patches.


2) Do a big-bang patch switching directly over to the new solution, but that's
very hard to review.
2a) Post the patch series in small chunks on the mailinglist (starting with the
reverts), but once we're all happy merge that patch series into a single 
big-bang
patch and apply that.


We could do that, but so far, what has been submitted are incomplete,
as they only touch on a single driver (with doesn't require hot-plugging),
breaking all the other ones.


Step 1 is to find a solution that works and isn't a hack or workaround.

The next step is to roll it out for all drivers (ideally with an absolute 
minimum of
required changes), and the final step is to figure out how to organize the patch
series to ensure bisect, etc.

But the first step is the most important, and one that should be reviewed
even if the way the patch series is organized might/will cause problems when
it is merged.

This whole discussion about those revert patches is simply not relevant at the
moment.

Regards,

Hans




As far as I am concerned the whole hotplugging code is broken and has been for
a very long time. We (or at least I :-) ) understand the underlying concepts
a lot better, so we can do a better job. But the transition may well be
painful.


It is not broken currently on the devices that require hotplugging.

Thanks,
Mauro



--
To unsubscribe from this list: send 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Shuah Khan
On 12/15/2016 08:26 AM, Hans Verkuil wrote:
> On 15/12/16 15:45, Shuah Khan wrote:
>> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>>> On 15/12/16 13:56, Laurent Pinchart wrote:
 Hi Sakari,

 On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
 Hi Sakari,

 I answered you point to point below, but I suspect that you missed how
 the current approach works. So, I decided to write a quick summary
 here.

 The character devices /dev/media? are created via cdev, with relies on
 a kobject per device, with has an embedded struct kref inside.

 Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
 struct device, when the code does:
   devnode->cdev.kobj.parent = >dev.kobj;

 before calling cdev_add().

 The current lifetime management is actually based on cdev's kobject's
 refcount, provided by its embedded kref.

 The kref warrants that any data associated with /dev/media0 won't be
 freed if there are any pending system call. In other words, when
 cdev_del() is called, it will remove /dev/media0 from the filesystem,
 and will call kobject_put().

 If the refcount is zero, it will call devnode->dev.release(). If the
 kobject refcount is not zero, the data won't be freed.

 So, in the best case scenario, there's no opened file descriptors
 by the time media device node is unregistered. So, it will free
 everything.

 In the worse case scenario, e. g. when the driver is removed or
 unbind while /dev/media0 has some opened file descriptor(s),
 the cdev logic will do the proper lifetime management.

 On such case, /dev/media0 disappears from the file system, so another
 open is not possible anymore. The data structures will remain
 allocated until all associated file descriptors are not closed.

 When all file descriptors are closed, the data will be freed.

 On that time, it will call an optional dev.release() callback,
 responsible to free any other data struct that the driver allocated.
>>>
>>> The patchset does not change this. It's not a question of the
>>> media_devnode struct either. That's not an issue.
>>>
>>> The issue is rather what else can be accessed through the media device
>>> and other interfaces. As IOCTLs are not serialised with device removal
>>> (which now releases much of the data structures)
>>
>> Huh? ioctls are serialized with struct device removal. The Driver core
>> warrants that.
>
> How?
>
> As far as I can tell, there's nothing in the way of an IOCTL being in
> progress on a character device which is registered by the driver for a
> hardware device which is being removed.
>
> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, 
> in
> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
> are taken during that path, which I believe is by design.
>>>
>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases 
>>> it
>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>> ioctl.
>>>
>
>>> there's a high chance of accessing
>>> released memory (or mutexes that have been already destroyed). An
>>> example of that is here, stopping a running pipeline after unbinding
>>> the device. What happens there is that the media device is released
>>> whilst it's in use through the video device.
>>>
>>> 
>>
>> It is not clear from the logs what the driver tried to do, but
>> that sounds like a driver's bug, with was not prepared to properly
>> handle unbinds.
>>
>> The problem here is that isp_video_release() is called by V4L2
>> release logic, and not by the MC one:
>>
>> static const struct v4l2_file_operations isp_video_fops = {
>>   .owner  = THIS_MODULE,
>>   .open   = isp_video_open,
>>   .release= isp_video_release,
>>   .poll   = vb2_fop_poll,
>>   .unlocked_ioctl = video_ioctl2,
>>   .mmap   = vb2_fop_mmap,
>> };
>>
>> It seems that the driver's logic allows it to be called before or
>> after destroying the MC.
>>
>> Assuming that, if the OMAP3 driver is not used it works,
>> it means that, if the isp_video_release() is called
>> first, no errors will happen, but if MC is destroyed before
>> V4L2 call 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Mauro Carvalho Chehab
Em Thu, 15 Dec 2016 15:45:22 +0100
Hans Verkuil  escreveu:

> On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
> > Em Thu, 15 Dec 2016 15:03:36 +0100
> > Hans Verkuil  escreveu:
> >  
> >> On 15/12/16 13:56, Laurent Pinchart wrote:  
> >>> Hi Sakari,
> >>>
> >>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:  
>  On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:  
> > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:  
> >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote: 
> >>  
> >>> Hi Sakari,
> >>>  
> >
> >  
> >>> There's plenty of way to try and work around the problem in drivers, some 
> >>> more
> >>> racy than others, but if we require changes to all platform drivers to fix
> >>> this we need to ensure that we get it right, not as half-baked hacks 
> >>> spread
> >>> around the whole subsystem.  
> >>
> >> Why on earth do we want this for the omap3 driver? It is not a 
> >> hot-pluggable
> >> device and I see no reason whatsoever to start modifying platform drivers 
> >> just
> >> because you can do an unbind. I know there are real hot-pluggable devices, 
> >> and
> >> getting this right for those is of course important.  
> >
> > That's indeed a very good point. If unbind is not needed by any usecase,
> > the better fix for OMAP3 would be to just prevent it to happen in the first
> > place.
> >  
> > The USB subsystem has a a .disconnect() callback that notifies
> > the drivers that a device was unbound (likely physically removed).
> > The way USB media drivers handle it is by returning -ENODEV to any
> > V4L2 call that would try to touch at the hardware after unbound.  
> >>>  
> >>
> >> In my view the main problem is that the media core is bound to a struct
> >> device set by the driver that creates the MC. But since the MC gives an
> >> overview of lots of other (sub)devices the refcount of the media device
> >> should be increased for any (sub)device that adds itself to the MC and
> >> decreased for any (sub)device that is removed. Only when the very last
> >> user goes away can the MC memory be released.
> >>
> >> The memory/refcounting associated with device nodes is unrelated to this:
> >> once a devnode is unregistered it will be removed in /dev, and once the
> >> last open fh closes any memory associated with the devnode can be released.
> >> That will also decrease the refcount to its parent device.
> >>
> >> This also means that it is a bad idea to embed devnodes in a larger struct.
> >> They should be allocated and freed when the devnode is unregistered and
> >> the last open filehandle is closed.
> >>
> >> Then the parent's device refcount is decreased, and that may now call its
> >> release callback if the refcount reaches 0.
> >>
> >> For the media controller's device: any other device driver that needs 
> >> access
> >> to it needs to increase that device's refcount, and only when those devices
> >> are released will they decrease the MC device's refcount.
> >>
> >> And when that refcount goes to 0 can we finally free everything.
> >>
> >> With regards to the opposition to reverting those initial patches, I'm
> >> siding with Greg KH. Just revert the bloody patches. It worked most of the
> >> time before those patches, so reverting really won't cause bisect 
> >> problems.  
> >
> > You're contradicting yourself here ;)
> >
> > The patches that this patch series is reverting are the ones that
> > de-embeeds devnode struct and fixes its lifecycle.
> >
> > Reverting those patches will cause regressions on hot-pluggable drivers,
> > preventing them to be unplugged. So, if we're willing to revert, then we
> > should also revert MC support on them.  
> 
> Two options:
> 
> 1) Revert, then build up a proper solution.

Reverting is a regression, as we'll strip off the MC support from the
existing devices. We would also need to revert a lot more than just those
3 patches.

> 2) Do a big-bang patch switching directly over to the new solution, but that's
> very hard to review.
> 2a) Post the patch series in small chunks on the mailinglist (starting with 
> the
> reverts), but once we're all happy merge that patch series into a single 
> big-bang
> patch and apply that.

We could do that, but so far, what has been submitted are incomplete,
as they only touch on a single driver (with doesn't require hot-plugging),
breaking all the other ones.

> As far as I am concerned the whole hotplugging code is broken and has been for
> a very long time. We (or at least I :-) ) understand the underlying concepts
> a lot better, so we can do a better job. But the transition may well be
> painful.

It is not broken currently on the devices that require hotplugging.

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Hans Verkuil

On 15/12/16 15:45, Shuah Khan wrote:

On 12/15/2016 07:03 AM, Hans Verkuil wrote:

On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,

I answered you point to point below, but I suspect that you missed how
the current approach works. So, I decided to write a quick summary
here.

The character devices /dev/media? are created via cdev, with relies on
a kobject per device, with has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
  devnode->cdev.kobj.parent = >dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref warrants that any data associated with /dev/media0 won't be
freed if there are any pending system call. In other words, when
cdev_del() is called, it will remove /dev/media0 from the filesystem,
and will call kobject_put().

If the refcount is zero, it will call devnode->dev.release(). If the
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there's no opened file descriptors
by the time media device node is unregistered. So, it will free
everything.

In the worse case scenario, e. g. when the driver is removed or
unbind while /dev/media0 has some opened file descriptor(s),
the cdev logic will do the proper lifetime management.

On such case, /dev/media0 disappears from the file system, so another
open is not possible anymore. The data structures will remain
allocated until all associated file descriptors are not closed.

When all file descriptors are closed, the data will be freed.

On that time, it will call an optional dev.release() callback,
responsible to free any other data struct that the driver allocated.


The patchset does not change this. It's not a question of the
media_devnode struct either. That's not an issue.

The issue is rather what else can be accessed through the media device
and other interfaces. As IOCTLs are not serialised with device removal
(which now releases much of the data structures)


Huh? ioctls are serialized with struct device removal. The Driver core
warrants that.


How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.


chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
on release(). Thus ensuring that the cdev can never be removed while in an
ioctl.




there's a high chance of accessing
released memory (or mutexes that have been already destroyed). An
example of that is here, stopping a running pipeline after unbinding
the device. What happens there is that the media device is released
whilst it's in use through the video device.




It is not clear from the logs what the driver tried to do, but
that sounds like a driver's bug, with was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
  .owner  = THIS_MODULE,
  .open   = isp_video_open,
  .release= isp_video_release,
  .poll   = vb2_fop_poll,
  .unlocked_ioctl = video_ioctl2,
  .mmap   = vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that, if the OMAP3 driver is not used it works,
it means that, if the isp_video_release() is called
first, no errors will happen, but if MC is destroyed before
V4L2 call to its .release() callback, as there's no logic at the
driver that would detect it, isp_video_release() will be calling
isp_video_streamoff(), with depends on the MC to work.

On a first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(),
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.


As soon as you have to dig deep in a structure to find a reference counter and
increment it, bypassing all the API layers, you can be entirely sure that the
solution is wrong.


Indeed.




2) to call isp_video_streamoff() before removing the MC stuff, e. g.
inside the MC .release() callback.


This is a fair suggestion, indeed. Let me see what could be done there.
Albeit this is just *one* of the existing 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Shuah Khan
On 12/15/2016 03:39 AM, Laurent Pinchart wrote:
> Hello,
> 
> On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:
>> On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
 On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> Hi Sakari,
>
> I answered you point to point below, but I suspect that you missed how
> the current approach works. So, I decided to write a quick summary here.
>
> The character devices /dev/media? are created via cdev, with relies on a
> kobject per device, with has an embedded struct kref inside.
>
> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>
> struct device, when the code does:
>   devnode->cdev.kobj.parent = >dev.kobj;
>
> before calling cdev_add().
>
> The current lifetime management is actually based on cdev's kobject's
> refcount, provided by its embedded kref.
>
> The kref warrants that any data associated with /dev/media0 won't be
> freed if there are any pending system call. In other words, when
> cdev_del() is called, it will remove /dev/media0 from the filesystem,
> and
> will call kobject_put().
>
> If the refcount is zero, it will call devnode->dev.release(). If the
> kobject refcount is not zero, the data won't be freed.
>
> So, in the best case scenario, there's no opened file descriptors
> by the time media device node is unregistered. So, it will free
> everything.
>
> In the worse case scenario, e. g. when the driver is removed or
> unbind while /dev/media0 has some opened file descriptor(s),
> the cdev logic will do the proper lifetime management.
>
> On such case, /dev/media0 disappears from the file system, so another
> open
> is not possible anymore. The data structures will remain allocated until
> all associated file descriptors are not closed.
>
> When all file descriptors are closed, the data will be freed.
>
> On that time, it will call an optional dev.release() callback,
> responsible to free any other data struct that the driver allocated.

 The patchset does not change this. It's not a question of the
 media_devnode struct either. That's not an issue.

 The issue is rather what else can be accessed through the media device
 and other interfaces. As IOCTLs are not serialised with device removal
 (which now releases much of the data structures)
>>>
>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>> warrants that.
> 
> Code references please.
>  
 there's a high chance of accessing
 released memory (or mutexes that have been already destroyed). An example
 of that is here, stopping a running pipeline after unbinding the device.
 What happens there is that the media device is released whilst it's in
 use through the video device.

 
>>>
>>> It is not clear from the logs what the driver tried to do, but
>>> that sounds like a driver's bug, with was not prepared to properly
>>> handle unbinds.
>>>
>>> The problem here is that isp_video_release() is called by V4L2
>>> release logic, and not by the MC one:
>>>
>>> static const struct v4l2_file_operations isp_video_fops = {
>>> .owner  = THIS_MODULE,
>>> .open   = isp_video_open,
>>> .release= isp_video_release,
>>> .poll   = vb2_fop_poll,
>>> .unlocked_ioctl = video_ioctl2,
>>> .mmap   = vb2_fop_mmap,
>>> };
>>>
>>> It seems that the driver's logic allows it to be called before or
>>> after destroying the MC.
>>
>> Right isp_video_release() will definitely be called after driver is
>> gone which means media device is gone and the device itself.
> 
> Certainly not after the driver is gone (as in the module being unloaded from 
> memory), but it can be called after the device is unbound from the driver, 
> yes.
> 
>> Both au0828 and em28xx have these release handlers. Neither one uses
>> devm resource for their device structs.
> 
> And no driver exposing objects to userspace-accessible code paths should. 
> I've 
> been pointing at how devm_kzalloc() is abused for more than a year now, it's 
> nice to see that people slowly start listening.
> 
>> Also, both em28xx and au0828 keep disconnected state and have logic
>> to detect the state of the driver and device. em28xx holds reference
>> to v4l2->ref
> 
> That's very, very wrong. The v4l2_device::ref field must *not* be touched by 
> drivers. Acquiring and releasing references to v4l2_device instances must be 
> done with v4l2_device_get() and v4l2_device_put(), and the structure has a 
> release handler that drivers can use. Why do people write such horrible code 
> that pokes at private fields ?
> 
>> and releases the reference in em28xx_v4l2_close() which is
>> its 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Shuah Khan
On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> On 15/12/16 13:56, Laurent Pinchart wrote:
>> Hi Sakari,
>>
>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
 Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>> Hi Sakari,
>>
>> I answered you point to point below, but I suspect that you missed how
>> the current approach works. So, I decided to write a quick summary
>> here.
>>
>> The character devices /dev/media? are created via cdev, with relies on
>> a kobject per device, with has an embedded struct kref inside.
>>
>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>> struct device, when the code does:
>>   devnode->cdev.kobj.parent = >dev.kobj;
>>
>> before calling cdev_add().
>>
>> The current lifetime management is actually based on cdev's kobject's
>> refcount, provided by its embedded kref.
>>
>> The kref warrants that any data associated with /dev/media0 won't be
>> freed if there are any pending system call. In other words, when
>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>> and will call kobject_put().
>>
>> If the refcount is zero, it will call devnode->dev.release(). If the
>> kobject refcount is not zero, the data won't be freed.
>>
>> So, in the best case scenario, there's no opened file descriptors
>> by the time media device node is unregistered. So, it will free
>> everything.
>>
>> In the worse case scenario, e. g. when the driver is removed or
>> unbind while /dev/media0 has some opened file descriptor(s),
>> the cdev logic will do the proper lifetime management.
>>
>> On such case, /dev/media0 disappears from the file system, so another
>> open is not possible anymore. The data structures will remain
>> allocated until all associated file descriptors are not closed.
>>
>> When all file descriptors are closed, the data will be freed.
>>
>> On that time, it will call an optional dev.release() callback,
>> responsible to free any other data struct that the driver allocated.
>
> The patchset does not change this. It's not a question of the
> media_devnode struct either. That's not an issue.
>
> The issue is rather what else can be accessed through the media device
> and other interfaces. As IOCTLs are not serialised with device removal
> (which now releases much of the data structures)

 Huh? ioctls are serialized with struct device removal. The Driver core
 warrants that.
>>>
>>> How?
>>>
>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>> progress on a character device which is registered by the driver for a
>>> hardware device which is being removed.
>>>
>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>> are taken during that path, which I believe is by design.
> 
> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> on release(). Thus ensuring that the cdev can never be removed while in an
> ioctl.
> 
>>>
> there's a high chance of accessing
> released memory (or mutexes that have been already destroyed). An
> example of that is here, stopping a running pipeline after unbinding
> the device. What happens there is that the media device is released
> whilst it's in use through the video device.
>
> 

 It is not clear from the logs what the driver tried to do, but
 that sounds like a driver's bug, with was not prepared to properly
 handle unbinds.

 The problem here is that isp_video_release() is called by V4L2
 release logic, and not by the MC one:

 static const struct v4l2_file_operations isp_video_fops = {
   .owner  = THIS_MODULE,
   .open   = isp_video_open,
   .release= isp_video_release,
   .poll   = vb2_fop_poll,
   .unlocked_ioctl = video_ioctl2,
   .mmap   = vb2_fop_mmap,
 };

 It seems that the driver's logic allows it to be called before or
 after destroying the MC.

 Assuming that, if the OMAP3 driver is not used it works,
 it means that, if the isp_video_release() is called
 first, no errors will happen, but if MC is destroyed before
 V4L2 call to its .release() callback, as there's no logic at the
 driver that would detect it, isp_video_release() will be calling
 isp_video_streamoff(), with depends on the MC to work.

 On a first glance, I can see two ways of fixing it:

 1) to increment devnode's device kobject refcount 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Hans Verkuil

On 15/12/16 15:32, Mauro Carvalho Chehab wrote:

Em Thu, 15 Dec 2016 15:03:36 +0100
Hans Verkuil  escreveu:


On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,





There's plenty of way to try and work around the problem in drivers, some more
racy than others, but if we require changes to all platform drivers to fix
this we need to ensure that we get it right, not as half-baked hacks spread
around the whole subsystem.


Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
device and I see no reason whatsoever to start modifying platform drivers just
because you can do an unbind. I know there are real hot-pluggable devices, and
getting this right for those is of course important.


That's indeed a very good point. If unbind is not needed by any usecase,
the better fix for OMAP3 would be to just prevent it to happen in the first
place.


The USB subsystem has a a .disconnect() callback that notifies
the drivers that a device was unbound (likely physically removed).
The way USB media drivers handle it is by returning -ENODEV to any
V4L2 call that would try to touch at the hardware after unbound.




In my view the main problem is that the media core is bound to a struct
device set by the driver that creates the MC. But since the MC gives an
overview of lots of other (sub)devices the refcount of the media device
should be increased for any (sub)device that adds itself to the MC and
decreased for any (sub)device that is removed. Only when the very last
user goes away can the MC memory be released.

The memory/refcounting associated with device nodes is unrelated to this:
once a devnode is unregistered it will be removed in /dev, and once the
last open fh closes any memory associated with the devnode can be released.
That will also decrease the refcount to its parent device.

This also means that it is a bad idea to embed devnodes in a larger struct.
They should be allocated and freed when the devnode is unregistered and
the last open filehandle is closed.

Then the parent's device refcount is decreased, and that may now call its
release callback if the refcount reaches 0.

For the media controller's device: any other device driver that needs access
to it needs to increase that device's refcount, and only when those devices
are released will they decrease the MC device's refcount.

And when that refcount goes to 0 can we finally free everything.

With regards to the opposition to reverting those initial patches, I'm
siding with Greg KH. Just revert the bloody patches. It worked most of the
time before those patches, so reverting really won't cause bisect problems.


You're contradicting yourself here ;)

The patches that this patch series is reverting are the ones that
de-embeeds devnode struct and fixes its lifecycle.

Reverting those patches will cause regressions on hot-pluggable drivers,
preventing them to be unplugged. So, if we're willing to revert, then we
should also revert MC support on them.


Two options:

1) Revert, then build up a proper solution.
2) Do a big-bang patch switching directly over to the new solution, but that's
very hard to review.
2a) Post the patch series in small chunks on the mailinglist (starting with the
reverts), but once we're all happy merge that patch series into a single 
big-bang
patch and apply that.

As far as I am concerned the whole hotplugging code is broken and has been for
a very long time. We (or at least I :-) ) understand the underlying concepts
a lot better, so we can do a better job. But the transition may well be
painful.

Regards,

Hans




Just revert and build up things as they should.

Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
parent pointer for example) and many drivers (including omap3isp) embed
video_device, which is wrong and can lead to complications.

I'm to blame for the embedding since I thought that was a good idea at one
time. I now realized that it isn't. Sorry about that...

And because the cdev of the video_device doesn't know about the parent
device it is (I think) possible that the parent device is released before
the cdev is released. Which can result in major problems.


I agree with you here. IMHO, de-embeeding cdev's parent struct from
video_device seems to be the right thing to do at V4L2 side too.

Regards,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Mauro Carvalho Chehab
Em Thu, 15 Dec 2016 15:03:36 +0100
Hans Verkuil  escreveu:

> On 15/12/16 13:56, Laurent Pinchart wrote:
> > Hi Sakari,
> >
> > On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:  
> >> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:  
> >>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:  
>  On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:  
> > Hi Sakari,
> >


> > There's plenty of way to try and work around the problem in drivers, some 
> > more
> > racy than others, but if we require changes to all platform drivers to fix
> > this we need to ensure that we get it right, not as half-baked hacks spread
> > around the whole subsystem.  
> 
> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
> device and I see no reason whatsoever to start modifying platform drivers just
> because you can do an unbind. I know there are real hot-pluggable devices, and
> getting this right for those is of course important.

That's indeed a very good point. If unbind is not needed by any usecase,
the better fix for OMAP3 would be to just prevent it to happen in the first
place.

> >>> The USB subsystem has a a .disconnect() callback that notifies
> >>> the drivers that a device was unbound (likely physically removed).
> >>> The way USB media drivers handle it is by returning -ENODEV to any
> >>> V4L2 call that would try to touch at the hardware after unbound.  
> >  
> 
> In my view the main problem is that the media core is bound to a struct
> device set by the driver that creates the MC. But since the MC gives an
> overview of lots of other (sub)devices the refcount of the media device
> should be increased for any (sub)device that adds itself to the MC and
> decreased for any (sub)device that is removed. Only when the very last
> user goes away can the MC memory be released.
> 
> The memory/refcounting associated with device nodes is unrelated to this:
> once a devnode is unregistered it will be removed in /dev, and once the
> last open fh closes any memory associated with the devnode can be released.
> That will also decrease the refcount to its parent device.
> 
> This also means that it is a bad idea to embed devnodes in a larger struct.
> They should be allocated and freed when the devnode is unregistered and
> the last open filehandle is closed.
> 
> Then the parent's device refcount is decreased, and that may now call its
> release callback if the refcount reaches 0.
> 
> For the media controller's device: any other device driver that needs access
> to it needs to increase that device's refcount, and only when those devices
> are released will they decrease the MC device's refcount.
> 
> And when that refcount goes to 0 can we finally free everything.
> 
> With regards to the opposition to reverting those initial patches, I'm
> siding with Greg KH. Just revert the bloody patches. It worked most of the
> time before those patches, so reverting really won't cause bisect problems.

You're contradicting yourself here ;)

The patches that this patch series is reverting are the ones that
de-embeeds devnode struct and fixes its lifecycle.

Reverting those patches will cause regressions on hot-pluggable drivers,
preventing them to be unplugged. So, if we're willing to revert, then we
should also revert MC support on them.

> Just revert and build up things as they should.
> 
> Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> parent pointer for example) and many drivers (including omap3isp) embed
> video_device, which is wrong and can lead to complications.
> 
> I'm to blame for the embedding since I thought that was a good idea at one
> time. I now realized that it isn't. Sorry about that...
> 
> And because the cdev of the video_device doesn't know about the parent
> device it is (I think) possible that the parent device is released before
> the cdev is released. Which can result in major problems.

I agree with you here. IMHO, de-embeeding cdev's struct from video_device
seems to be the right thing to do at V4L2 side too.

Regards,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Hans Verkuil

On 15/12/16 13:56, Laurent Pinchart wrote:

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

Hi Sakari,

I answered you point to point below, but I suspect that you missed how
the current approach works. So, I decided to write a quick summary
here.

The character devices /dev/media? are created via cdev, with relies on
a kobject per device, with has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
  devnode->cdev.kobj.parent = >dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref warrants that any data associated with /dev/media0 won't be
freed if there are any pending system call. In other words, when
cdev_del() is called, it will remove /dev/media0 from the filesystem,
and will call kobject_put().

If the refcount is zero, it will call devnode->dev.release(). If the
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there's no opened file descriptors
by the time media device node is unregistered. So, it will free
everything.

In the worse case scenario, e. g. when the driver is removed or
unbind while /dev/media0 has some opened file descriptor(s),
the cdev logic will do the proper lifetime management.

On such case, /dev/media0 disappears from the file system, so another
open is not possible anymore. The data structures will remain
allocated until all associated file descriptors are not closed.

When all file descriptors are closed, the data will be freed.

On that time, it will call an optional dev.release() callback,
responsible to free any other data struct that the driver allocated.


The patchset does not change this. It's not a question of the
media_devnode struct either. That's not an issue.

The issue is rather what else can be accessed through the media device
and other interfaces. As IOCTLs are not serialised with device removal
(which now releases much of the data structures)


Huh? ioctls are serialized with struct device removal. The Driver core
warrants that.


How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.


chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
on release(). Thus ensuring that the cdev can never be removed while in an
ioctl.




there's a high chance of accessing
released memory (or mutexes that have been already destroyed). An
example of that is here, stopping a running pipeline after unbinding
the device. What happens there is that the media device is released
whilst it's in use through the video device.




It is not clear from the logs what the driver tried to do, but
that sounds like a driver's bug, with was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
  .owner  = THIS_MODULE,
  .open   = isp_video_open,
  .release= isp_video_release,
  .poll   = vb2_fop_poll,
  .unlocked_ioctl = video_ioctl2,
  .mmap   = vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that, if the OMAP3 driver is not used it works,
it means that, if the isp_video_release() is called
first, no errors will happen, but if MC is destroyed before
V4L2 call to its .release() callback, as there's no logic at the
driver that would detect it, isp_video_release() will be calling
isp_video_streamoff(), with depends on the MC to work.

On a first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(),
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.


As soon as you have to dig deep in a structure to find a reference counter and
increment it, bypassing all the API layers, you can be entirely sure that the
solution is wrong.


Indeed.




2) to call isp_video_streamoff() before removing the MC stuff, e. g.
inside the MC .release() callback.


This is a fair suggestion, indeed. Let me see what could be done there.
Albeit this is just *one* of the existing issues. It will not address all
problems fixed by the patchset.


We need to stop 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Laurent Pinchart
Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>> Hi Sakari,
> >>>
> >>> I answered you point to point below, but I suspect that you missed how
> >>> the current approach works. So, I decided to write a quick summary
> >>> here.
> >>>
> >>> The character devices /dev/media? are created via cdev, with relies on
> >>> a kobject per device, with has an embedded struct kref inside.
> >>>
> >>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >>> struct device, when the code does:
> >>>   devnode->cdev.kobj.parent = >dev.kobj;
> >>>
> >>> before calling cdev_add().
> >>>
> >>> The current lifetime management is actually based on cdev's kobject's
> >>> refcount, provided by its embedded kref.
> >>>
> >>> The kref warrants that any data associated with /dev/media0 won't be 
> >>> freed if there are any pending system call. In other words, when 
> >>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >>> and will call kobject_put(). 
> >>>
> >>> If the refcount is zero, it will call devnode->dev.release(). If the 
> >>> kobject refcount is not zero, the data won't be freed.
> >>>
> >>> So, in the best case scenario, there's no opened file descriptors
> >>> by the time media device node is unregistered. So, it will free
> >>> everything.
> >>>
> >>> In the worse case scenario, e. g. when the driver is removed or 
> >>> unbind while /dev/media0 has some opened file descriptor(s),
> >>> the cdev logic will do the proper lifetime management.
> >>>
> >>> On such case, /dev/media0 disappears from the file system, so another
> >>> open is not possible anymore. The data structures will remain
> >>> allocated until all associated file descriptors are not closed.
> >>>
> >>> When all file descriptors are closed, the data will be freed.
> >>>
> >>> On that time, it will call an optional dev.release() callback,
> >>> responsible to free any other data struct that the driver allocated.  
> >>
> >> The patchset does not change this. It's not a question of the
> >> media_devnode struct either. That's not an issue.
> >>
> >> The issue is rather what else can be accessed through the media device
> >> and other interfaces. As IOCTLs are not serialised with device removal
> >> (which now releases much of the data structures) 
> >
> > Huh? ioctls are serialized with struct device removal. The Driver core
> > warrants that.
> 
> How?
> 
> As far as I can tell, there's nothing in the way of an IOCTL being in
> progress on a character device which is registered by the driver for a
> hardware device which is being removed.
> 
> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
> are taken during that path, which I believe is by design.
> 
> >> there's a high chance of accessing
> >> released memory (or mutexes that have been already destroyed). An
> >> example of that is here, stopping a running pipeline after unbinding
> >> the device. What happens there is that the media device is released
> >> whilst it's in use through the video device.
> >>
> >> 
> >
> > It is not clear from the logs what the driver tried to do, but
> > that sounds like a driver's bug, with was not prepared to properly
> > handle unbinds.
> >
> > The problem here is that isp_video_release() is called by V4L2
> > release logic, and not by the MC one:
> >
> > static const struct v4l2_file_operations isp_video_fops = {
> >   .owner  = THIS_MODULE,
> >   .open   = isp_video_open,
> >   .release= isp_video_release,
> >   .poll   = vb2_fop_poll,
> >   .unlocked_ioctl = video_ioctl2,
> >   .mmap   = vb2_fop_mmap,
> > };
> >
> > It seems that the driver's logic allows it to be called before or
> > after destroying the MC.
> >
> > Assuming that, if the OMAP3 driver is not used it works,
> > it means that, if the isp_video_release() is called
> > first, no errors will happen, but if MC is destroyed before
> > V4L2 call to its .release() callback, as there's no logic at the
> > driver that would detect it, isp_video_release() will be calling
> > isp_video_streamoff(), with depends on the MC to work.
> >
> > On a first glance, I can see two ways of fixing it:
> >
> > 1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
> > decrementing it only at isp_video_release(). That will ensure that
> > MC will only be removed after V4L2 removal.

As soon as you have to dig deep in a structure to find a reference counter and 
increment it, bypassing all the API layers, you can be entirely sure that the 
solution is wrong.

> > 2) to call isp_video_streamoff() before removing 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Sakari Ailus
Hi Mauro,

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Dec 2016 12:53:05 +0200
> Sakari Ailus  escreveu:
> 
> > Hi Mauro,
> > 
> > On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> > > Hi Sakari,
> > > 
> > > I answered you point to point below, but I suspect that you missed how 
> > > the 
> > > current approach works. So, I decided to write a quick summary here.
> > > 
> > > The character devices /dev/media? are created via cdev, with relies on a 
> > > kobject per device, with has an embedded struct kref inside.
> > > 
> > > Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> > > struct device, when the code does:
> > >   devnode->cdev.kobj.parent = >dev.kobj;
> > > 
> > > before calling cdev_add().
> > > 
> > > The current lifetime management is actually based on cdev's kobject's
> > > refcount, provided by its embedded kref.
> > > 
> > > The kref warrants that any data associated with /dev/media0 won't be 
> > > freed if there are any pending system call. In other words, when 
> > > cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> > > will call kobject_put(). 
> > > 
> > > If the refcount is zero, it will call devnode->dev.release(). If the 
> > > kobject refcount is not zero, the data won't be freed.
> > > 
> > > So, in the best case scenario, there's no opened file descriptors
> > > by the time media device node is unregistered. So, it will free
> > > everything.
> > > 
> > > In the worse case scenario, e. g. when the driver is removed or 
> > > unbind while /dev/media0 has some opened file descriptor(s),
> > > the cdev logic will do the proper lifetime management.
> > > 
> > > On such case, /dev/media0 disappears from the file system, so another open
> > > is not possible anymore. The data structures will remain allocated until
> > > all associated file descriptors are not closed.
> > > 
> > > When all file descriptors are closed, the data will be freed.
> > > 
> > > On that time, it will call an optional dev.release() callback,
> > > responsible to free any other data struct that the driver allocated.  
> > 
> > The patchset does not change this. It's not a question of the media_devnode
> > struct either. That's not an issue.
> > 
> > The issue is rather what else can be accessed through the media device and
> > other interfaces. As IOCTLs are not serialised with device removal (which
> > now releases much of the data structures) 
> 
> Huh? ioctls are serialized with struct device removal. The Driver core
> warrants that.

How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.

> 
> > there's a high chance of accessing
> > released memory (or mutexes that have been already destroyed). An example of
> > that is here, stopping a running pipeline after unbinding the device. What
> > happens there is that the media device is released whilst it's in use
> > through the video device.
> > 
> > 
> 
> It is not clear from the logs what the driver tried to do, but
> that sounds like a driver's bug, with was not prepared to properly
> handle unbinds.
> 
> The problem here is that isp_video_release() is called by V4L2
> release logic, and not by the MC one:
> 
> static const struct v4l2_file_operations isp_video_fops = {
>   .owner  = THIS_MODULE,
>   .open   = isp_video_open,
>   .release= isp_video_release,
>   .poll   = vb2_fop_poll,
>   .unlocked_ioctl = video_ioctl2,
>   .mmap   = vb2_fop_mmap,
> };
> 
> It seems that the driver's logic allows it to be called before or
> after destroying the MC.
> 
> Assuming that, if the OMAP3 driver is not used it works,
> it means that, if the isp_video_release() is called
> first, no errors will happen, but if MC is destroyed before
> V4L2 call to its .release() callback, as there's no logic at the
> driver that would detect it, isp_video_release() will be calling
> isp_video_streamoff(), with depends on the MC to work.
> 
> On a first glance, I can see two ways of fixing it:
> 
> 1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
> decrementing it only at isp_video_release(). That will ensure that
> MC will only be removed after V4L2 removal.
> 
> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> inside the MC .release() callback. 

This is a fair suggestion, indeed. Let me see what could be done there.
Albeit this is just *one* of the existing issues. It will not address all
problems fixed by the patchset.

> 
> That could be done by 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-15 Thread Laurent Pinchart
Hello,

On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:
> On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>> Hi Sakari,
> >>> 
> >>> I answered you point to point below, but I suspect that you missed how
> >>> the current approach works. So, I decided to write a quick summary here.
> >>> 
> >>> The character devices /dev/media? are created via cdev, with relies on a
> >>> kobject per device, with has an embedded struct kref inside.
> >>> 
> >>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >>> 
> >>> struct device, when the code does:
> >>>   devnode->cdev.kobj.parent = >dev.kobj;
> >>> 
> >>> before calling cdev_add().
> >>> 
> >>> The current lifetime management is actually based on cdev's kobject's
> >>> refcount, provided by its embedded kref.
> >>> 
> >>> The kref warrants that any data associated with /dev/media0 won't be
> >>> freed if there are any pending system call. In other words, when
> >>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >>> and
> >>> will call kobject_put().
> >>> 
> >>> If the refcount is zero, it will call devnode->dev.release(). If the
> >>> kobject refcount is not zero, the data won't be freed.
> >>> 
> >>> So, in the best case scenario, there's no opened file descriptors
> >>> by the time media device node is unregistered. So, it will free
> >>> everything.
> >>> 
> >>> In the worse case scenario, e. g. when the driver is removed or
> >>> unbind while /dev/media0 has some opened file descriptor(s),
> >>> the cdev logic will do the proper lifetime management.
> >>> 
> >>> On such case, /dev/media0 disappears from the file system, so another
> >>> open
> >>> is not possible anymore. The data structures will remain allocated until
> >>> all associated file descriptors are not closed.
> >>> 
> >>> When all file descriptors are closed, the data will be freed.
> >>> 
> >>> On that time, it will call an optional dev.release() callback,
> >>> responsible to free any other data struct that the driver allocated.
> >> 
> >> The patchset does not change this. It's not a question of the
> >> media_devnode struct either. That's not an issue.
> >> 
> >> The issue is rather what else can be accessed through the media device
> >> and other interfaces. As IOCTLs are not serialised with device removal
> >> (which now releases much of the data structures)
> > 
> > Huh? ioctls are serialized with struct device removal. The Driver core
> > warrants that.

Code references please.
 
> >> there's a high chance of accessing
> >> released memory (or mutexes that have been already destroyed). An example
> >> of that is here, stopping a running pipeline after unbinding the device.
> >> What happens there is that the media device is released whilst it's in
> >> use through the video device.
> >> 
> >> 
> > 
> > It is not clear from the logs what the driver tried to do, but
> > that sounds like a driver's bug, with was not prepared to properly
> > handle unbinds.
> > 
> > The problem here is that isp_video_release() is called by V4L2
> > release logic, and not by the MC one:
> > 
> > static const struct v4l2_file_operations isp_video_fops = {
> > .owner  = THIS_MODULE,
> > .open   = isp_video_open,
> > .release= isp_video_release,
> > .poll   = vb2_fop_poll,
> > .unlocked_ioctl = video_ioctl2,
> > .mmap   = vb2_fop_mmap,
> > };
> > 
> > It seems that the driver's logic allows it to be called before or
> > after destroying the MC.
> 
> Right isp_video_release() will definitely be called after driver is
> gone which means media device is gone and the device itself.

Certainly not after the driver is gone (as in the module being unloaded from 
memory), but it can be called after the device is unbound from the driver, 
yes.

> Both au0828 and em28xx have these release handlers. Neither one uses
> devm resource for their device structs.

And no driver exposing objects to userspace-accessible code paths should. I've 
been pointing at how devm_kzalloc() is abused for more than a year now, it's 
nice to see that people slowly start listening.

> Also, both em28xx and au0828 keep disconnected state and have logic
> to detect the state of the driver and device. em28xx holds reference
> to v4l2->ref

That's very, very wrong. The v4l2_device::ref field must *not* be touched by 
drivers. Acquiring and releasing references to v4l2_device instances must be 
done with v4l2_device_get() and v4l2_device_put(), and the structure has a 
release handler that drivers can use. Why do people write such horrible code 
that pokes at private fields ?

> and releases the reference in em28xx_v4l2_close() which is
> its v4l2_file_operations .release handler. It also makes sure to not
> touch device hardware if 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-13 Thread Shuah Khan
Hi Sakari and Mauro,


On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Dec 2016 12:53:05 +0200
> Sakari Ailus  escreveu:
> 
>> Hi Mauro,
>>
>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>> Hi Sakari,
>>>
>>> I answered you point to point below, but I suspect that you missed how the 
>>> current approach works. So, I decided to write a quick summary here.
>>>
>>> The character devices /dev/media? are created via cdev, with relies on a 
>>> kobject per device, with has an embedded struct kref inside.
>>>
>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>> struct device, when the code does:
>>> devnode->cdev.kobj.parent = >dev.kobj;
>>>
>>> before calling cdev_add().
>>>
>>> The current lifetime management is actually based on cdev's kobject's
>>> refcount, provided by its embedded kref.
>>>
>>> The kref warrants that any data associated with /dev/media0 won't be 
>>> freed if there are any pending system call. In other words, when 
>>> cdev_del() is called, it will remove /dev/media0 from the filesystem, and
>>> will call kobject_put(). 
>>>
>>> If the refcount is zero, it will call devnode->dev.release(). If the 
>>> kobject refcount is not zero, the data won't be freed.
>>>
>>> So, in the best case scenario, there's no opened file descriptors
>>> by the time media device node is unregistered. So, it will free
>>> everything.
>>>
>>> In the worse case scenario, e. g. when the driver is removed or 
>>> unbind while /dev/media0 has some opened file descriptor(s),
>>> the cdev logic will do the proper lifetime management.
>>>
>>> On such case, /dev/media0 disappears from the file system, so another open
>>> is not possible anymore. The data structures will remain allocated until
>>> all associated file descriptors are not closed.
>>>
>>> When all file descriptors are closed, the data will be freed.
>>>
>>> On that time, it will call an optional dev.release() callback,
>>> responsible to free any other data struct that the driver allocated.  
>>
>> The patchset does not change this. It's not a question of the media_devnode
>> struct either. That's not an issue.
>>
>> The issue is rather what else can be accessed through the media device and
>> other interfaces. As IOCTLs are not serialised with device removal (which
>> now releases much of the data structures) 
> 
> Huh? ioctls are serialized with struct device removal. The Driver core
> warrants that.
> 
>> there's a high chance of accessing
>> released memory (or mutexes that have been already destroyed). An example of
>> that is here, stopping a running pipeline after unbinding the device. What
>> happens there is that the media device is released whilst it's in use
>> through the video device.
>>
>> 
> 
> It is not clear from the logs what the driver tried to do, but
> that sounds like a driver's bug, with was not prepared to properly
> handle unbinds.
> 
> The problem here is that isp_video_release() is called by V4L2
> release logic, and not by the MC one:
> 
> static const struct v4l2_file_operations isp_video_fops = {
>   .owner  = THIS_MODULE,
>   .open   = isp_video_open,
>   .release= isp_video_release,
>   .poll   = vb2_fop_poll,
>   .unlocked_ioctl = video_ioctl2,
>   .mmap   = vb2_fop_mmap,
> };
> 
> It seems that the driver's logic allows it to be called before or
> after destroying the MC.

Right isp_video_release() will definitely be called after driver is
gone which means media device is gone and the device itself.

Both au0828 and em28xx have these release handlers. Neither one uses
devm resource for their device structs.

Also, both em28xx and au0828 keep disconnected state and have logic
to detect the state of the driver and device. em28xx holds reference
to v4l2->ref and releases the reference in em28xx_v4l2_close() which is
its v4l2_file_operations .release handler. It also makes sure to not
touch device hardware if device is disconnected.

Also, media graph access is done only when it has a valid media_device.
au0828 allocates media_device struct and it gets free'd when it does
its unregister sequence. Subsequent calls will check if it is null.
It also does checks to see if media_device is registered or not in
some cases.

isp_video_release() isn't safe to be called after isp device is gone,
leave alone media_device. Since isp is a devm resource, it is long
gone when device_release() release managed resources.

I agree with Mauro that this is a driver problem. Mauro and I did lot
of work to get the USB drivers (em28xx and au0828) to handle disconnect
and unbind cases even before the media controller support was added to
them.

I think what needs to happen is:

1. Remove devm use from omap3
2. Make sure media graph isn't accessed after media_device is unregistered
3. Take reference to v4l2 device to be able to make 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-13 Thread Mauro Carvalho Chehab
Em Tue, 13 Dec 2016 12:53:05 +0200
Sakari Ailus  escreveu:

> Hi Mauro,
> 
> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> > Hi Sakari,
> > 
> > I answered you point to point below, but I suspect that you missed how the 
> > current approach works. So, I decided to write a quick summary here.
> > 
> > The character devices /dev/media? are created via cdev, with relies on a 
> > kobject per device, with has an embedded struct kref inside.
> > 
> > Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> > struct device, when the code does:
> > devnode->cdev.kobj.parent = >dev.kobj;
> > 
> > before calling cdev_add().
> > 
> > The current lifetime management is actually based on cdev's kobject's
> > refcount, provided by its embedded kref.
> > 
> > The kref warrants that any data associated with /dev/media0 won't be 
> > freed if there are any pending system call. In other words, when 
> > cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> > will call kobject_put(). 
> > 
> > If the refcount is zero, it will call devnode->dev.release(). If the 
> > kobject refcount is not zero, the data won't be freed.
> > 
> > So, in the best case scenario, there's no opened file descriptors
> > by the time media device node is unregistered. So, it will free
> > everything.
> > 
> > In the worse case scenario, e. g. when the driver is removed or 
> > unbind while /dev/media0 has some opened file descriptor(s),
> > the cdev logic will do the proper lifetime management.
> > 
> > On such case, /dev/media0 disappears from the file system, so another open
> > is not possible anymore. The data structures will remain allocated until
> > all associated file descriptors are not closed.
> > 
> > When all file descriptors are closed, the data will be freed.
> > 
> > On that time, it will call an optional dev.release() callback,
> > responsible to free any other data struct that the driver allocated.  
> 
> The patchset does not change this. It's not a question of the media_devnode
> struct either. That's not an issue.
> 
> The issue is rather what else can be accessed through the media device and
> other interfaces. As IOCTLs are not serialised with device removal (which
> now releases much of the data structures) 

Huh? ioctls are serialized with struct device removal. The Driver core
warrants that.

> there's a high chance of accessing
> released memory (or mutexes that have been already destroyed). An example of
> that is here, stopping a running pipeline after unbinding the device. What
> happens there is that the media device is released whilst it's in use
> through the video device.
> 
> 

It is not clear from the logs what the driver tried to do, but
that sounds like a driver's bug, with was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
.owner  = THIS_MODULE,
.open   = isp_video_open,
.release= isp_video_release,
.poll   = vb2_fop_poll,
.unlocked_ioctl = video_ioctl2,
.mmap   = vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that, if the OMAP3 driver is not used it works,
it means that, if the isp_video_release() is called
first, no errors will happen, but if MC is destroyed before
V4L2 call to its .release() callback, as there's no logic at the
driver that would detect it, isp_video_release() will be calling
isp_video_streamoff(), with depends on the MC to work.

On a first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.

2) to call isp_video_streamoff() before removing the MC stuff, e. g.
inside the MC .release() callback. 

That could be done by overwriting the dev.release() callback at
omap3 driver, as I discussed on my past e-mails, and flagging the
driver that it should not accept streamon anymore, as the hardware
is being disconnecting.

Btw, that explains a lot why Shuah can't reproduce the stuff you're
complaining on her USB hardware.

The USB subsystem has a a .disconnect() callback that notifies
the drivers that a device was unbound (likely physically removed).
The way USB media drivers handle it is by returning -ENODEV to any
V4L2 call that would try to touch at the hardware after unbound.

So, on au0828, there's no need to add any extra release logic.

> 
> 
> > 
> > Em Mon, 28 Nov 2016 12:45:56 +0200
> > Sakari Ailus  escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > On Tue, Nov 22, 2016 at 03:44:29PM 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-12-13 Thread Sakari Ailus
Hi Mauro,

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> Hi Sakari,
> 
> I answered you point to point below, but I suspect that you missed how the 
> current approach works. So, I decided to write a quick summary here.
> 
> The character devices /dev/media? are created via cdev, with relies on a 
> kobject per device, with has an embedded struct kref inside.
> 
> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> struct device, when the code does:
>   devnode->cdev.kobj.parent = >dev.kobj;
> 
> before calling cdev_add().
> 
> The current lifetime management is actually based on cdev's kobject's
> refcount, provided by its embedded kref.
> 
> The kref warrants that any data associated with /dev/media0 won't be 
> freed if there are any pending system call. In other words, when 
> cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> will call kobject_put(). 
> 
> If the refcount is zero, it will call devnode->dev.release(). If the 
> kobject refcount is not zero, the data won't be freed.
> 
> So, in the best case scenario, there's no opened file descriptors
> by the time media device node is unregistered. So, it will free
> everything.
> 
> In the worse case scenario, e. g. when the driver is removed or 
> unbind while /dev/media0 has some opened file descriptor(s),
> the cdev logic will do the proper lifetime management.
> 
> On such case, /dev/media0 disappears from the file system, so another open
> is not possible anymore. The data structures will remain allocated until
> all associated file descriptors are not closed.
> 
> When all file descriptors are closed, the data will be freed.
> 
> On that time, it will call an optional dev.release() callback,
> responsible to free any other data struct that the driver allocated.

The patchset does not change this. It's not a question of the media_devnode
struct either. That's not an issue.

The issue is rather what else can be accessed through the media device and
other interfaces. As IOCTLs are not serialised with device removal (which
now releases much of the data structures) there's a high chance of accessing
released memory (or mutexes that have been already destroyed). An example of
that is here, stopping a running pipeline after unbinding the device. What
happens there is that the media device is released whilst it's in use
through the video device.




> 
> Em Mon, 28 Nov 2016 12:45:56 +0200
> Sakari Ailus  escreveu:
> 
> > Hi Mauro,
> > 
> > On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> > > Em Mon, 14 Nov 2016 15:27:22 +0200
> > > Sakari Ailus  escreveu:
> > >   
> > > > Hi Mauro,
> > > > 
> > > > I'm replying below but let me first summarise the remaining problem area
> > > > that this patchset addresses.  
> > > 
> > > Sorry for answering too late. Somehow, I missed this email in the cloud.
> > >   
> > > > The problems you and Shuah have seen and partially addressed are 
> > > > related to
> > > > a larger picture which is the lifetime of (mostly) memory resources 
> > > > related
> > > > to various objects used by as well both the Media controller and V4L2
> > > > frameworks (including videobuf2) as the drivers which make use of these
> > > > frameworks.
> > > > 
> > > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > > multiple devices nodes, data structures with interdependencies within 
> > > > the
> > > > frameworks themselves and dependencies from the driver's own data 
> > > > structures
> > > > towards the framework data structures. The Media device and the media 
> > > > graph
> > > > objects are central to the problem area as well.
> > > > 
> > > > So what are the issues then? Until now, we've attempted to regulate the
> > > > users' ability to access the devices at the time they're being 
> > > > unregistered
> > > > (and the associated memory released), but that approach does not really
> > > > scale: you have to make sure that the unregistering also will not take 
> > > > place
> > > > _during_ the system call --- not just in the beginning of it.
> > > >
> > > > The media graph contains media graph objects, some of which are media
> > > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > > instance). Media entities as graph nodes have links to other entities. 
> > > > In
> > > > order to implement the system calls, the drivers do parse this graph in
> > > > order to obtain information they need to obtain from it. For instance, 
> > > > it's
> > > > not uncommon for an implementation for video node format enumeration to
> > > > figure out which sub-device the link from that video nodes leads to. 
> > > > Drivers
> > > > may also have similar paths they follow.
> > > > 
> > > > Interrupt handling may also be taking place 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-29 Thread Mauro Carvalho Chehab
Hi Sakari,

I answered you point to point below, but I suspect that you missed how the 
current approach works. So, I decided to write a quick summary here.

The character devices /dev/media? are created via cdev, with relies on a 
kobject per device, with has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
devnode->cdev.kobj.parent = >dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref warrants that any data associated with /dev/media0 won't be 
freed if there are any pending system call. In other words, when 
cdev_del() is called, it will remove /dev/media0 from the filesystem, and
will call kobject_put(). 

If the refcount is zero, it will call devnode->dev.release(). If the 
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there's no opened file descriptors
by the time media device node is unregistered. So, it will free
everything.

In the worse case scenario, e. g. when the driver is removed or 
unbind while /dev/media0 has some opened file descriptor(s),
the cdev logic will do the proper lifetime management.

On such case, /dev/media0 disappears from the file system, so another open
is not possible anymore. The data structures will remain allocated until
all associated file descriptors are not closed.

When all file descriptors are closed, the data will be freed.

On that time, it will call an optional dev.release() callback,
responsible to free any other data struct that the driver allocated.

Em Mon, 28 Nov 2016 12:45:56 +0200
Sakari Ailus  escreveu:

> Hi Mauro,
> 
> On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> > Em Mon, 14 Nov 2016 15:27:22 +0200
> > Sakari Ailus  escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > I'm replying below but let me first summarise the remaining problem area
> > > that this patchset addresses.  
> > 
> > Sorry for answering too late. Somehow, I missed this email in the cloud.
> >   
> > > The problems you and Shuah have seen and partially addressed are related 
> > > to
> > > a larger picture which is the lifetime of (mostly) memory resources 
> > > related
> > > to various objects used by as well both the Media controller and V4L2
> > > frameworks (including videobuf2) as the drivers which make use of these
> > > frameworks.
> > > 
> > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > multiple devices nodes, data structures with interdependencies within the
> > > frameworks themselves and dependencies from the driver's own data 
> > > structures
> > > towards the framework data structures. The Media device and the media 
> > > graph
> > > objects are central to the problem area as well.
> > > 
> > > So what are the issues then? Until now, we've attempted to regulate the
> > > users' ability to access the devices at the time they're being 
> > > unregistered
> > > (and the associated memory released), but that approach does not really
> > > scale: you have to make sure that the unregistering also will not take 
> > > place
> > > _during_ the system call --- not just in the beginning of it.
> > >
> > > The media graph contains media graph objects, some of which are media
> > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > instance). Media entities as graph nodes have links to other entities. In
> > > order to implement the system calls, the drivers do parse this graph in
> > > order to obtain information they need to obtain from it. For instance, 
> > > it's
> > > not uncommon for an implementation for video node format enumeration to
> > > figure out which sub-device the link from that video nodes leads to. 
> > > Drivers
> > > may also have similar paths they follow.
> > > 
> > > Interrupt handling may also be taking place during the device removal 
> > > during
> > > which a number of data structures are now freed. This really does call 
> > > for a
> > > solution based on reference counting.
> > > 
> > > This leads to the conclusion that all the memory resources that could be
> > > accessed by the drivers or the kernel frameworks must stay intact until 
> > > the
> > > last file handle to the said devices is closed. Otherwise, there is a
> > > possibility of accessing released memory.  
> > 
> > So far, we're aligned.
> >   
> > > Right now in a lot of the cases, such as for video device and sub-device
> > > nodes, we do release the memory when a device (as in struct device) is 
> > > being
> > > unregistered. There simply is in the current mainline kernel a way to do
> > > this in a safe way.  
> >   
> > > Drivers do use devm_() family of functions to allocate
> > > the memory of the media graph object and their internal data structures.  
> > 
> > Removing devm_() from those drivers seem to be the 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-28 Thread Sakari Ailus
Hi Mauro,

On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> Em Mon, 14 Nov 2016 15:27:22 +0200
> Sakari Ailus  escreveu:
> 
> > Hi Mauro,
> > 
> > I'm replying below but let me first summarise the remaining problem area
> > that this patchset addresses.
> 
> Sorry for answering too late. Somehow, I missed this email in the cloud.
> 
> > The problems you and Shuah have seen and partially addressed are related to
> > a larger picture which is the lifetime of (mostly) memory resources related
> > to various objects used by as well both the Media controller and V4L2
> > frameworks (including videobuf2) as the drivers which make use of these
> > frameworks.
> > 
> > The Media controller and V4L2 interfaces exposed by drivers consist of
> > multiple devices nodes, data structures with interdependencies within the
> > frameworks themselves and dependencies from the driver's own data structures
> > towards the framework data structures. The Media device and the media graph
> > objects are central to the problem area as well.
> > 
> > So what are the issues then? Until now, we've attempted to regulate the
> > users' ability to access the devices at the time they're being unregistered
> > (and the associated memory released), but that approach does not really
> > scale: you have to make sure that the unregistering also will not take place
> > _during_ the system call --- not just in the beginning of it.
> >
> > The media graph contains media graph objects, some of which are media
> > entities (contained in struct video_device or struct v4l2_subdev, for
> > instance). Media entities as graph nodes have links to other entities. In
> > order to implement the system calls, the drivers do parse this graph in
> > order to obtain information they need to obtain from it. For instance, it's
> > not uncommon for an implementation for video node format enumeration to
> > figure out which sub-device the link from that video nodes leads to. Drivers
> > may also have similar paths they follow.
> > 
> > Interrupt handling may also be taking place during the device removal during
> > which a number of data structures are now freed. This really does call for a
> > solution based on reference counting.
> > 
> > This leads to the conclusion that all the memory resources that could be
> > accessed by the drivers or the kernel frameworks must stay intact until the
> > last file handle to the said devices is closed. Otherwise, there is a
> > possibility of accessing released memory.
> 
> So far, we're aligned.
> 
> > Right now in a lot of the cases, such as for video device and sub-device
> > nodes, we do release the memory when a device (as in struct device) is being
> > unregistered. There simply is in the current mainline kernel a way to do
> > this in a safe way.
> 
> > Drivers do use devm_() family of functions to allocate
> > the memory of the media graph object and their internal data structures.
> 
> Removing devm_() from those drivers seem to be the first thing to do,
> and it is independent from any MC rework.
> 
> As you'll see below, we have different opinions on other matters,
> so, my suggestion about how to proceed is that you should submit
> first the things we're aligned.
> 
> In other words, please submit the patches that get rid of devm_()
> first. Then, we can address the remaining stuff.

Removing devm_*() is needed, but when should the memory be released then?
There's no callback currently from the media device the driver could use.

OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
work for drivers to handle releasing all the resources. It'd be great to
find another object where to bind those resources. Still, device_release()
does first release devres resources and then calls the release() callback,
which obviously makes the setup problematic to begin with.

> 
> > 
> > With this patchset:
> > 
> > - The media_device which again contains the media_devnode is allocated
> >   dynamically. The lifetime of the media device --- and the media graph
> >   objects it contains --- is bound to device nodes that are bound to the
> >   media device (video and sub-device nodes) as well as open file handles.
> 
> No. Data structures with cdev embedded into them have their lifetime
> controlled by the driver's core, and are destroyed only when there's
> no pending fops. The current approach uses device's core dev.release()

Fair enough; that part is indeed handled towards the user space as far as I
can tell. However that's still not enough: the media graph contains the
graph objects, and the media device that holds the graph, must outlive the
graph objects themselves.

Also removing entities doesn't really work currently: touching an entity, a
link or any kind of a graph object is not guaranteed to work unless you hold
the media graph lock. And that's simply unfeasible. Just look at what the
drivers do with entities: they use the v4l2_subdev interface and the 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-22 Thread Shuah Khan
On 11/22/2016 10:44 AM, Mauro Carvalho Chehab wrote:
> Em Mon, 14 Nov 2016 15:27:22 +0200
> Sakari Ailus  escreveu:
> 
>> Hi Mauro,
>>
>> I'm replying below but let me first summarise the remaining problem area
>> that this patchset addresses.
> 
> Sorry for answering too late. Somehow, I missed this email in the cloud.
> 
>> The problems you and Shuah have seen and partially addressed are related to
>> a larger picture which is the lifetime of (mostly) memory resources related
>> to various objects used by as well both the Media controller and V4L2
>> frameworks (including videobuf2) as the drivers which make use of these
>> frameworks.
>>
>> The Media controller and V4L2 interfaces exposed by drivers consist of
>> multiple devices nodes, data structures with interdependencies within the
>> frameworks themselves and dependencies from the driver's own data structures
>> towards the framework data structures. The Media device and the media graph
>> objects are central to the problem area as well.
>>
>> So what are the issues then? Until now, we've attempted to regulate the
>> users' ability to access the devices at the time they're being unregistered
>> (and the associated memory released), but that approach does not really
>> scale: you have to make sure that the unregistering also will not take place
>> _during_ the system call --- not just in the beginning of it.
>>
>> The media graph contains media graph objects, some of which are media
>> entities (contained in struct video_device or struct v4l2_subdev, for
>> instance). Media entities as graph nodes have links to other entities. In
>> order to implement the system calls, the drivers do parse this graph in
>> order to obtain information they need to obtain from it. For instance, it's
>> not uncommon for an implementation for video node format enumeration to
>> figure out which sub-device the link from that video nodes leads to. Drivers
>> may also have similar paths they follow.
>>
>> Interrupt handling may also be taking place during the device removal during
>> which a number of data structures are now freed. This really does call for a
>> solution based on reference counting.
>>
>> This leads to the conclusion that all the memory resources that could be
>> accessed by the drivers or the kernel frameworks must stay intact until the
>> last file handle to the said devices is closed. Otherwise, there is a
>> possibility of accessing released memory.
> 
> So far, we're aligned.
> 
>> Right now in a lot of the cases, such as for video device and sub-device
>> nodes, we do release the memory when a device (as in struct device) is being
>> unregistered. There simply is in the current mainline kernel a way to do
>> this in a safe way.
> 
>> Drivers do use devm_() family of functions to allocate
>> the memory of the media graph object and their internal data structures.
> 
> Removing devm_() from those drivers seem to be the first thing to do,
> and it is independent from any MC rework.
> 
> As you'll see below, we have different opinions on other matters,
> so, my suggestion about how to proceed is that you should submit
> first the things we're aligned.
> 
> In other words, please submit the patches that get rid of devm_()
> first. Then, we can address the remaining stuff.

I reviewed the patches that are not reverts. Especially the patches
that get rid of devm usage in omap3isp. The dmesg included in this
isn't complete and I also looked at the dmesg Lauren sent me from vsp1.

I tested unbind of au0828 while vlc is running streaming video on 4.9-rc5
unbind is successful with no Oops. vlc stops streaming and shows the updated
device list which doesn't include the video device that disappeared due to
unbind, just as expected.

I also tested it on 4.9-rc4 with Media Device Allocator API which
includes au0828 and snd_usb_audio using the API. Same result No Oops.

So I do think it is worth while testing Vsp1 and Omap3isp on 4.9-rc5 with
just the changes to not use devm. I am going to share my analysis of the
VSP1 log Lauren shared with me. I have seen a very similar log when media
device was devm with au0828 and snd_usb_audio.

The log shows vsp1_video_release() which is a fops release routine invoking
various cleanup routines until it runs into Oops accessing already released
resource. This release routine gets called when user application closes the
device file.

vsp1_video is a devm resource. This will get released very early on during
unbind from device_release()

/*
 * Some platform devices are driven without driver attached
 * and managed resources may have been acquired.  Make sure
 * all resources are released.
 *
 * Drivers still can add resources into device after device
 * is deleted but alive, so release devres here to avoid
 * possible memory leak.
 */
devres_release_all(dev);

So way before vsp1_video_release() is called, this resource is gone.
I 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-22 Thread Shuah Khan
On 11/22/2016 11:13 AM, Hans Verkuil wrote:
> On 22/11/16 18:44, Mauro Carvalho Chehab wrote:
>>> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>>>   5b28dde51d0c
>>>
>>> The patch avoids the problem of deleting a character device (cdev_del())
>>> after its memory has been released. The change is sound as such but the
>>> problem is addressed by another, a lot more simple patch in my series:
>>>
>>> 
>>
>> Your approach is not clean, as it is based on a cdev's hack of doing:
>>
>> devnode->cdev.kobj.parent = >dev.kobj;
>>
>> That is an ugly hack, as it touches inside cdev's internal stuff,
>> to do something that the driver's core doesn't expect. This is the
>> kind of patch that could cause messy errors, by cheating with the
>> cdev's internal refcount checking.
> 
> Actually, this is what many frameworks in the kernel do:
> 
> $ git grep "kobj.parent = " drivers/
> drivers/base/bus.c: dev->kobj.parent = parent_of_root;
> drivers/base/core.c:dev->kobj.parent = kobj;
> drivers/char/tpm/tpm-chip.c:chip->cdev.kobj.parent = >dev.kobj;
> drivers/dax/dax.c:  cdev->kobj.parent = >kobj;
> drivers/gpio/gpiolib.c: gdev->chrdev.kobj.parent = >dev.kobj;
> drivers/iio/industrialio-core.c:indio_dev->chrdev.kobj.parent = 
> _dev->dev.kobj;
> drivers/infiniband/core/user_mad.c: port->cdev.kobj.parent = 
> _dev->kobj;
> drivers/infiniband/core/user_mad.c: port->sm_cdev.kobj.parent = 
> _dev->kobj;
> drivers/infiniband/core/uverbs_main.c:  uverbs_dev->cdev.kobj.parent = 
> _dev->kobj;
> drivers/infiniband/hw/hfi1/device.c:cdev->kobj.parent = parent;
> drivers/input/evdev.c:  evdev->cdev.kobj.parent = >dev.kobj;
> drivers/input/joydev.c: joydev->cdev.kobj.parent = >dev.kobj;
> drivers/input/mousedev.c:   mousedev->cdev.kobj.parent = 
> >dev.kobj;
> drivers/media/cec/cec-core.c:   devnode->cdev.kobj.parent = 
> >dev.kobj;
> drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = 
> >dev.kobj;
> drivers/platform/chrome/cros_ec_dev.c:  ec->cdev.kobj.parent = 
> >class_dev.kobj;
> drivers/rtc/rtc-dev.c:  rtc->char_dev.kobj.parent = >dev.kobj;
> 
> And it is what Russell King told me to use in CEC as well.

Hans recommended the approach to me when I tied cdev to media_devnode
as its parent while fixing the media_device lifetime issues.

> drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = 
> >dev.kobj;

> 
> fs/chardev.c currently doesn't have a function that sets cdev.kobj.parent,
> even though it does use it internally to call kobject_get/put on the
> parent kobject. It really expects the caller to set cdev.kobj.parent.
> 
> It ensures that when cdev_add/del is called the parent gets correctly
> refcounted as well.

Yes this is what is done now to make sure cdev lifetime matches its
parent and parent doesn't get released while cdev is in use. There is
seem to some concerns about referencing cdev private object in other
parts of the kernel. However, that is something that could be solved
by adding cdev interface to set parent.

thanks,
-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-22 Thread Hans Verkuil

On 22/11/16 18:44, Mauro Carvalho Chehab wrote:

* media: fix use-after-free in cdev_put() when app exits after driver unbind
  5b28dde51d0c

The patch avoids the problem of deleting a character device (cdev_del())
after its memory has been released. The change is sound as such but the
problem is addressed by another, a lot more simple patch in my series:




Your approach is not clean, as it is based on a cdev's hack of doing:

devnode->cdev.kobj.parent = >dev.kobj;

That is an ugly hack, as it touches inside cdev's internal stuff,
to do something that the driver's core doesn't expect. This is the
kind of patch that could cause messy errors, by cheating with the
cdev's internal refcount checking.


Actually, this is what many frameworks in the kernel do:

$ git grep "kobj.parent = " drivers/
drivers/base/bus.c: dev->kobj.parent = parent_of_root;
drivers/base/core.c:dev->kobj.parent = kobj;
drivers/char/tpm/tpm-chip.c:chip->cdev.kobj.parent = >dev.kobj;
drivers/dax/dax.c:  cdev->kobj.parent = >kobj;
drivers/gpio/gpiolib.c: gdev->chrdev.kobj.parent = >dev.kobj;
drivers/iio/industrialio-core.c:indio_dev->chrdev.kobj.parent = 
_dev->dev.kobj;
drivers/infiniband/core/user_mad.c: port->cdev.kobj.parent = 
_dev->kobj;
drivers/infiniband/core/user_mad.c: port->sm_cdev.kobj.parent = 
_dev->kobj;
drivers/infiniband/core/uverbs_main.c:  uverbs_dev->cdev.kobj.parent = 
_dev->kobj;

drivers/infiniband/hw/hfi1/device.c:cdev->kobj.parent = parent;
drivers/input/evdev.c:  evdev->cdev.kobj.parent = >dev.kobj;
drivers/input/joydev.c: joydev->cdev.kobj.parent = >dev.kobj;
drivers/input/mousedev.c:   mousedev->cdev.kobj.parent = 
>dev.kobj;
drivers/media/cec/cec-core.c:   devnode->cdev.kobj.parent = 
>dev.kobj;
drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = 
>dev.kobj;
drivers/platform/chrome/cros_ec_dev.c:  ec->cdev.kobj.parent = 
>class_dev.kobj;

drivers/rtc/rtc-dev.c:  rtc->char_dev.kobj.parent = >dev.kobj;

And it is what Russell King told me to use in CEC as well.

fs/chardev.c currently doesn't have a function that sets cdev.kobj.parent,
even though it does use it internally to call kobject_get/put on the
parent kobject. It really expects the caller to set cdev.kobj.parent.

It ensures that when cdev_add/del is called the parent gets correctly
refcounted as well.

I plan on looking more into this patch series on Thursday or perhaps Friday.

Regards,

Hans
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-22 Thread Mauro Carvalho Chehab
Em Mon, 14 Nov 2016 15:27:22 +0200
Sakari Ailus  escreveu:

> Hi Mauro,
> 
> I'm replying below but let me first summarise the remaining problem area
> that this patchset addresses.

Sorry for answering too late. Somehow, I missed this email in the cloud.

> The problems you and Shuah have seen and partially addressed are related to
> a larger picture which is the lifetime of (mostly) memory resources related
> to various objects used by as well both the Media controller and V4L2
> frameworks (including videobuf2) as the drivers which make use of these
> frameworks.
> 
> The Media controller and V4L2 interfaces exposed by drivers consist of
> multiple devices nodes, data structures with interdependencies within the
> frameworks themselves and dependencies from the driver's own data structures
> towards the framework data structures. The Media device and the media graph
> objects are central to the problem area as well.
> 
> So what are the issues then? Until now, we've attempted to regulate the
> users' ability to access the devices at the time they're being unregistered
> (and the associated memory released), but that approach does not really
> scale: you have to make sure that the unregistering also will not take place
> _during_ the system call --- not just in the beginning of it.
>
> The media graph contains media graph objects, some of which are media
> entities (contained in struct video_device or struct v4l2_subdev, for
> instance). Media entities as graph nodes have links to other entities. In
> order to implement the system calls, the drivers do parse this graph in
> order to obtain information they need to obtain from it. For instance, it's
> not uncommon for an implementation for video node format enumeration to
> figure out which sub-device the link from that video nodes leads to. Drivers
> may also have similar paths they follow.
> 
> Interrupt handling may also be taking place during the device removal during
> which a number of data structures are now freed. This really does call for a
> solution based on reference counting.
> 
> This leads to the conclusion that all the memory resources that could be
> accessed by the drivers or the kernel frameworks must stay intact until the
> last file handle to the said devices is closed. Otherwise, there is a
> possibility of accessing released memory.

So far, we're aligned.

> Right now in a lot of the cases, such as for video device and sub-device
> nodes, we do release the memory when a device (as in struct device) is being
> unregistered. There simply is in the current mainline kernel a way to do
> this in a safe way.

> Drivers do use devm_() family of functions to allocate
> the memory of the media graph object and their internal data structures.

Removing devm_() from those drivers seem to be the first thing to do,
and it is independent from any MC rework.

As you'll see below, we have different opinions on other matters,
so, my suggestion about how to proceed is that you should submit
first the things we're aligned.

In other words, please submit the patches that get rid of devm_()
first. Then, we can address the remaining stuff.

> 
> With this patchset:
> 
> - The media_device which again contains the media_devnode is allocated
>   dynamically. The lifetime of the media device --- and the media graph
>   objects it contains --- is bound to device nodes that are bound to the
>   media device (video and sub-device nodes) as well as open file handles.

No. Data structures with cdev embedded into them have their lifetime
controlled by the driver's core, and are destroyed only when there's
no pending fops. The current approach uses device's core dev.release()
callback to release memory.

In other words, dev.release() is only called after the driver's base
knows that the cdev is not in use anymore. So, no ioctl() or any
other syscalls on that point.

Ok, nothing prevents some driver to do the wrong thing, keeping a
copy of struct device and using it after free, for example storing
it on a devm alocated memory, and printing some debug message
after struct device is freed, but this is a driver's bug.

What really worries me on this series is that it seemed that you 
didn't understood how the current approach works. So, you decided
to just revert it and start from scratch. This is dangerous, as
it could cause problems to other scenarios than yours.

> - Care is taken that the unregistration process and releasing memory happens
>   in the right order. This was not always the case previously.

Freeing memory for struct media_devnode, struct device and struct cdev 
is currently handled by the driver's core, when it known to be safe,
and using the same logic that other subsystems do.

We might do it different, but we need a strong reason to do it, as
going away from the usual practice is dangerous.

> - The driver remains responsible for the memory of the video and sub-device
>   nodes. However, now the Media controller provides a 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-14 Thread Sakari Ailus
Hi Mauro,

I'm replying below but let me first summarise the remaining problem area
that this patchset addresses.

The problems you and Shuah have seen and partially addressed are related to
a larger picture which is the lifetime of (mostly) memory resources related
to various objects used by as well both the Media controller and V4L2
frameworks (including videobuf2) as the drivers which make use of these
frameworks.

The Media controller and V4L2 interfaces exposed by drivers consist of
multiple devices nodes, data structures with interdependencies within the
frameworks themselves and dependencies from the driver's own data structures
towards the framework data structures. The Media device and the media graph
objects are central to the problem area as well.

So what are the issues then? Until now, we've attempted to regulate the
users' ability to access the devices at the time they're being unregistered
(and the associated memory released), but that approach does not really
scale: you have to make sure that the unregistering also will not take place
_during_ the system call --- not just in the beginning of it.

The media graph contains media graph objects, some of which are media
entities (contained in struct video_device or struct v4l2_subdev, for
instance). Media entities as graph nodes have links to other entities. In
order to implement the system calls, the drivers do parse this graph in
order to obtain information they need to obtain from it. For instance, it's
not uncommon for an implementation for video node format enumeration to
figure out which sub-device the link from that video nodes leads to. Drivers
may also have similar paths they follow.

Interrupt handling may also be taking place during the device removal during
which a number of data structures are now freed. This really does call for a
solution based on reference counting.

This leads to the conclusion that all the memory resources that could be
accessed by the drivers or the kernel frameworks must stay intact until the
last file handle to the said devices is closed. Otherwise, there is a
possibility of accessing released memory.

Right now in a lot of the cases, such as for video device and sub-device
nodes, we do release the memory when a device (as in struct device) is being
unregistered. There simply is in the current mainline kernel a way to do
this in a safe way. Drivers do use devm_() family of functions to allocate
the memory of the media graph object and their internal data structures.

With this patchset:

- The media_device which again contains the media_devnode is allocated
  dynamically. The lifetime of the media device --- and the media graph
  objects it contains --- is bound to device nodes that are bound to the
  media device (video and sub-device nodes) as well as open file handles.

- Care is taken that the unregistration process and releasing memory happens
  in the right order. This was not always the case previously.

- The driver remains responsible for the memory of the video and sub-device
  nodes. However, now the Media controller provides a convenient callback to
  the driver to release any memory resources when the time has come to do
  so. This takes place just before the media device memory is released.

- Drivers that do not strictly need to be removable require no changes. The
  benefits of this set become tangible for any driver by changing how the
  driver allocates memory for the data structures. Ideally at least
  drivers for hot-removable devices should be converted.

In order to make the current drivers to behave well it is necessary to make
changes to how memory is allocated in the drivers. If you look at the sample
patches that are part of the set for the omap3isp driver, you'll find that
around 95% of the changes are related to removing the user of devm_() family
of functions instead of Media controller API changes. In this regard, the
approach taken here requires very little if any additional overhead.

On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:
> Em Wed, 9 Nov 2016 10:00:58 -0700
> Shuah Khan  escreveu:
> 
> > > Maybe we can get the Media Device Allocator API work in and then we can
> > > get your RFC series in after that. Here is what I propose:
> > > 
> > > - Keep the fixes in 4.9
> 
> Fixes should always be kept. Reverting a fix is not an option.
> Instead, do incremental patches on the top of it.
> 
> > > - Get Media Device Allocator API patches into 4.9.  
> > 
> > I meant 4.10 not 4.9
> > 
> > > - snd-usb-auido work go into 4.10
> 
> Sounds like a plan.
> 
> > > Then your RFC series could go in. I am looking at the RFC series and that
> > > the drivers need to change as well, so this RFC work could take longer.
> > > Since we have to make media_device sharable, it is necessary to have a
> > > global list approach Media Device Allocator API takes. So it is possible
> > > for your RFC series to go on top of the Media Device Allocator 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-09 Thread Mauro Carvalho Chehab
Em Wed, 9 Nov 2016 10:00:58 -0700
Shuah Khan  escreveu:

> > Maybe we can get the Media Device Allocator API work in and then we can
> > get your RFC series in after that. Here is what I propose:
> > 
> > - Keep the fixes in 4.9

Fixes should always be kept. Reverting a fix is not an option.
Instead, do incremental patches on the top of it.

> > - Get Media Device Allocator API patches into 4.9.  
> 
> I meant 4.10 not 4.9
> 
> > - snd-usb-auido work go into 4.10

Sounds like a plan.

> > Then your RFC series could go in. I am looking at the RFC series and that
> > the drivers need to change as well, so this RFC work could take longer.
> > Since we have to make media_device sharable, it is necessary to have a
> > global list approach Media Device Allocator API takes. So it is possible
> > for your RFC series to go on top of the Media Device Allocator API.

Firstly, the RFC series should be converted into something that can
be applicable upstream, e. g.:

- doing the changes over the top of upstream, instead of needing to
  revert patches;

- change all drivers as the kAPI changes;

- be git bisectable, e. g. all patches should compile and run fine
  after each single patch, without introducing regressions.

That probably means that the series should be tested not only on
omap3, but also on some other device drivers.


-- 
Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-09 Thread Shuah Khan
On 11/09/2016 09:49 AM, Shuah Khan wrote:
> On 11/08/2016 01:19 AM, Sakari Ailus wrote:
>> Hi Shuah,
>>
>> On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
>>> Hi Sakari,
>>>
>>> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
 Hi folks,

 This is the third version of the RFC set to fix referencing in media
 devices.

 The lifetime of the media device (and media devnode) is now bound to that
 of struct device embedded in it and its memory is only released once the
 last reference is gone: unregistering is simply unregistering, it no
 longer should release memory which could be further accessed.

  
 A video node or a sub-device node also gets a reference to the media
 device, i.e. the release function of the video device node will release
 its reference to the media device. The same goes for file handles to the
 media device.

  
 As a side effect of this is that the media device, it is allocate together
 with the media devnode. The driver may also rely its own resources to the
 media device. Alternatively there's also a priv field to hold drivers
 private pointer (for container_of() is an option in this case). We could
 drop one of these options but currently both are possible.

  
 I've tested this by manually unbinding the omap3isp platform device while
 streaming. Driver changes are required for this to work; by not using
 dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
 supported. This is still unlikely to be a grave problem as there are not
 that many device drivers that support physically removable devices. We've
 had this problem for other devices for many years without paying much
 notice --- that doesn't mean I don't think at least drivers for removable
 devices shouldn't be changed as part of the set later on, I'd just like to
 get review comments on the approach first.

  
 The three patches that originally partially resolved some of these issues
 are reverted in the beginning of the set. I'm still posting this as an RFC
 mainly since the testing is somewhat limited so far.
>>>
>>> The main difference between the approach taken in these 3 reverted fixes and
>>> this RFC series is as follows:
>>>
>>> Reverted fixes:
>>> - Fix the lifetime problem with the media devnode by dynamically allocating
>>>   devnode instead of media_device. One of the main considerations to this
>>>   approach is to isolate the changes in media core and avoid changes to
>>>   drivers.
>>> - I tested these fixes extensively and added selftests and README file for
>>>   the regression tests. I haven't seen any problems after these fixes went
>>>   in while physically removing au0828 device, em028xx, and uvcvideo
>>
>> I'd rather call them workarounds, as they do work around the issues rather
>> than properly fixing them. This approach isn't really extensible to fix the
>> remaining problems either. It is true that *some* of the issues that were
>> present before these patches do not show up anymore with them, but we really
>> do need to fix all of these bugs.
>>
>> The underlying problem is that there may be opened file handles, references
>> from elsewhere in the kernel or such to in-memory objects that are not
>> refcounted properly: referencing released memory is a no-go in kernel.
>>
>>>
>>> This RFC series:
>>> - Dynamically allocates media_device
>>> - This approach requires changes to drivers. It would be wise to not require
>>>   churn to driver code and fix the problem in media-core.
>>>
>>> Do you have information on the problems that still remain with the above 
>>> fixes
>>> in place? These fixes went into 4.8 is I recall correctly. Could you please
>>> send us the list of problems and dmesg for the problems you found with the
>>> above fixes and how this RFC series addresses them.
>>
>> Just try removing a device when it's streaming. No more than that is needed.
>>
>> This is one of the bugs fixed by the patchset, albeit drivers do need to be
>> changed as well to benefit from the changes. 
>>
>>>
>>> Can these problems be fixed without needing to change the approach in the
>>> reverted patches?
>>
>> I don't think it's feasible, really. Besides, the workaround were rather
>> ugly and were merged only since there was a said urgency to have a partial
>> fix early. See above as well.
>>
>>>
>>> I have a patch series on top of the fixes this RFC series is reverting
>>> to allocate media_device only in the cases where sharing media device
>>> is necessary. e.g: au0828 and snd-usb-audio.
>>>
>>> Media Device Allocator 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-09 Thread Shuah Khan
On 11/08/2016 01:19 AM, Sakari Ailus wrote:
> Hi Shuah,
> 
> On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
>> Hi Sakari,
>>
>> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
>>> Hi folks,
>>>
>>> This is the third version of the RFC set to fix referencing in media
>>> devices.
>>>
>>> The lifetime of the media device (and media devnode) is now bound to that
>>> of struct device embedded in it and its memory is only released once the
>>> last reference is gone: unregistering is simply unregistering, it no
>>> longer should release memory which could be further accessed.
>>> 
>>> 
>>> A video node or a sub-device node also gets a reference to the media
>>> device, i.e. the release function of the video device node will release
>>> its reference to the media device. The same goes for file handles to the
>>> media device.
>>> 
>>> 
>>> As a side effect of this is that the media device, it is allocate together
>>> with the media devnode. The driver may also rely its own resources to the
>>> media device. Alternatively there's also a priv field to hold drivers
>>> private pointer (for container_of() is an option in this case). We could
>>> drop one of these options but currently both are possible.
>>> 
>>> 
>>> I've tested this by manually unbinding the omap3isp platform device while
>>> streaming. Driver changes are required for this to work; by not using
>>> dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
>>> supported. This is still unlikely to be a grave problem as there are not
>>> that many device drivers that support physically removable devices. We've
>>> had this problem for other devices for many years without paying much
>>> notice --- that doesn't mean I don't think at least drivers for removable
>>> devices shouldn't be changed as part of the set later on, I'd just like to
>>> get review comments on the approach first.
>>> 
>>> 
>>> The three patches that originally partially resolved some of these issues
>>> are reverted in the beginning of the set. I'm still posting this as an RFC
>>> mainly since the testing is somewhat limited so far.
>>
>> The main difference between the approach taken in these 3 reverted fixes and
>> this RFC series is as follows:
>>
>> Reverted fixes:
>> - Fix the lifetime problem with the media devnode by dynamically allocating
>>   devnode instead of media_device. One of the main considerations to this
>>   approach is to isolate the changes in media core and avoid changes to
>>   drivers.
>> - I tested these fixes extensively and added selftests and README file for
>>   the regression tests. I haven't seen any problems after these fixes went
>>   in while physically removing au0828 device, em028xx, and uvcvideo
> 
> I'd rather call them workarounds, as they do work around the issues rather
> than properly fixing them. This approach isn't really extensible to fix the
> remaining problems either. It is true that *some* of the issues that were
> present before these patches do not show up anymore with them, but we really
> do need to fix all of these bugs.
> 
> The underlying problem is that there may be opened file handles, references
> from elsewhere in the kernel or such to in-memory objects that are not
> refcounted properly: referencing released memory is a no-go in kernel.
> 
>>
>> This RFC series:
>> - Dynamically allocates media_device
>> - This approach requires changes to drivers. It would be wise to not require
>>   churn to driver code and fix the problem in media-core.
>>
>> Do you have information on the problems that still remain with the above 
>> fixes
>> in place? These fixes went into 4.8 is I recall correctly. Could you please
>> send us the list of problems and dmesg for the problems you found with the
>> above fixes and how this RFC series addresses them.
> 
> Just try removing a device when it's streaming. No more than that is needed.
> 
> This is one of the bugs fixed by the patchset, albeit drivers do need to be
> changed as well to benefit from the changes. 
> 
>>
>> Can these problems be fixed without needing to change the approach in the
>> reverted patches?
> 
> I don't think it's feasible, really. Besides, the workaround were rather
> ugly and were merged only since there was a said urgency to have a partial
> fix early. See above as well.
> 
>>
>> I have a patch series on top of the fixes this RFC series is reverting
>> to allocate media_device only in the cases where sharing media device
>> is necessary. e.g: au0828 and snd-usb-audio.
>>
>> Media Device Allocator API
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
>> 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-08 Thread Sakari Ailus
Hi Shuah,

On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
> Hi Sakari,
> 
> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
> > Hi folks,
> > 
> > This is the third version of the RFC set to fix referencing in media
> > devices.
> > 
> > The lifetime of the media device (and media devnode) is now bound to that
> > of struct device embedded in it and its memory is only released once the
> > last reference is gone: unregistering is simply unregistering, it no
> > longer should release memory which could be further accessed.
> > 
> > 
> > A video node or a sub-device node also gets a reference to the media
> > device, i.e. the release function of the video device node will release
> > its reference to the media device. The same goes for file handles to the
> > media device.
> > 
> > 
> > As a side effect of this is that the media device, it is allocate together
> > with the media devnode. The driver may also rely its own resources to the
> > media device. Alternatively there's also a priv field to hold drivers
> > private pointer (for container_of() is an option in this case). We could
> > drop one of these options but currently both are possible.
> > 
> > 
> > I've tested this by manually unbinding the omap3isp platform device while
> > streaming. Driver changes are required for this to work; by not using
> > dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
> > supported. This is still unlikely to be a grave problem as there are not
> > that many device drivers that support physically removable devices. We've
> > had this problem for other devices for many years without paying much
> > notice --- that doesn't mean I don't think at least drivers for removable
> > devices shouldn't be changed as part of the set later on, I'd just like to
> > get review comments on the approach first.
> > 
> > 
> > The three patches that originally partially resolved some of these issues
> > are reverted in the beginning of the set. I'm still posting this as an RFC
> > mainly since the testing is somewhat limited so far.
> 
> The main difference between the approach taken in these 3 reverted fixes and
> this RFC series is as follows:
> 
> Reverted fixes:
> - Fix the lifetime problem with the media devnode by dynamically allocating
>   devnode instead of media_device. One of the main considerations to this
>   approach is to isolate the changes in media core and avoid changes to
>   drivers.
> - I tested these fixes extensively and added selftests and README file for
>   the regression tests. I haven't seen any problems after these fixes went
>   in while physically removing au0828 device, em028xx, and uvcvideo

I'd rather call them workarounds, as they do work around the issues rather
than properly fixing them. This approach isn't really extensible to fix the
remaining problems either. It is true that *some* of the issues that were
present before these patches do not show up anymore with them, but we really
do need to fix all of these bugs.

The underlying problem is that there may be opened file handles, references
from elsewhere in the kernel or such to in-memory objects that are not
refcounted properly: referencing released memory is a no-go in kernel.

> 
> This RFC series:
> - Dynamically allocates media_device
> - This approach requires changes to drivers. It would be wise to not require
>   churn to driver code and fix the problem in media-core.
> 
> Do you have information on the problems that still remain with the above fixes
> in place? These fixes went into 4.8 is I recall correctly. Could you please
> send us the list of problems and dmesg for the problems you found with the
> above fixes and how this RFC series addresses them.

Just try removing a device when it's streaming. No more than that is needed.

This is one of the bugs fixed by the patchset, albeit drivers do need to be
changed as well to benefit from the changes. 

> 
> Can these problems be fixed without needing to change the approach in the
> reverted patches?

I don't think it's feasible, really. Besides, the workaround were rather
ugly and were merged only since there was a said urgency to have a partial
fix early. See above as well.

> 
> I have a patch series on top of the fixes this RFC series is reverting
> to allocate media_device only in the cases where sharing media device
> is necessary. e.g: au0828 and snd-usb-audio.
> 
> Media Device Allocator API
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
> 
> This series has been 

Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed

2016-11-07 Thread Shuah Khan
Hi Sakari,

On 08/26/2016 05:43 PM, Sakari Ailus wrote:
> Hi folks,
> 
> This is the third version of the RFC set to fix referencing in media
> devices.
> 
> The lifetime of the media device (and media devnode) is now bound to that
> of struct device embedded in it and its memory is only released once the
> last reference is gone: unregistering is simply unregistering, it no
> longer should release memory which could be further accessed.
>   
>   
> A video node or a sub-device node also gets a reference to the media
> device, i.e. the release function of the video device node will release
> its reference to the media device. The same goes for file handles to the
> media device.
>   
>   
> As a side effect of this is that the media device, it is allocate together
> with the media devnode. The driver may also rely its own resources to the
> media device. Alternatively there's also a priv field to hold drivers
> private pointer (for container_of() is an option in this case). We could
> drop one of these options but currently both are possible.
>   
>   
> I've tested this by manually unbinding the omap3isp platform device while
> streaming. Driver changes are required for this to work; by not using
> dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
> supported. This is still unlikely to be a grave problem as there are not
> that many device drivers that support physically removable devices. We've
> had this problem for other devices for many years without paying much
> notice --- that doesn't mean I don't think at least drivers for removable
> devices shouldn't be changed as part of the set later on, I'd just like to
> get review comments on the approach first.
>   
>   
> The three patches that originally partially resolved some of these issues
> are reverted in the beginning of the set. I'm still posting this as an RFC
> mainly since the testing is somewhat limited so far.

The main difference between the approach taken in these 3 reverted fixes and
this RFC series is as follows:

Reverted fixes:
- Fix the lifetime problem with the media devnode by dynamically allocating
  devnode instead of media_device. One of the main considerations to this
  approach is to isolate the changes in media core and avoid changes to
  drivers.
- I tested these fixes extensively and added selftests and README file for
  the regression tests. I haven't seen any problems after these fixes went
  in while physically removing au0828 device, em028xx, and uvcvideo

This RFC series:
- Dynamically allocates media_device
- This approach requires changes to drivers. It would be wise to not require
  churn to driver code and fix the problem in media-core.

Do you have information on the problems that still remain with the above fixes
in place? These fixes went into 4.8 is I recall correctly. Could you please
send us the list of problems and dmesg for the problems you found with the
above fixes and how this RFC series addresses them.

Can these problems be fixed without needing to change the approach in the
reverted patches?

I have a patch series on top of the fixes this RFC series is reverting
to allocate media_device only in the cases where sharing media device
is necessary. e.g: au0828 and snd-usb-audio.

Media Device Allocator API
https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html

This series has been reviewed. The work I did to change snd-usb-audio to
use Media Controller API to coordinate access to resources with au0828
is dependent on the above patch series.

snip

> The to-do list includes changes to drivers that can be physically removed.
> Drivers not using the new API can mostly ignore these changes, albeit
> media_device_init() now grabs a reference to struct device of the media
> device which must be released.
> 

As I mentioned earlier, requiring changes to drivers there by exposing
the fix to all drivers is a problem with this RFC series. I would like
to understand the reasons why the current approach to allocate media
devnode and limit the changes to media-ocre doesn't work and also the
reasons why problems if any can't be fixed on top of these fixes.

I have a patch series on top of the fixes this RFC series is reverting
to allocate media_device only in the cases where sharing media device
is necessary. e.g: au0828 and snd-usb-audio.

Media Device Allocator API
https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html