On Sun, Aug 02, 2020 at 07:22:26PM -0300, Jason Gunthorpe wrote:
> On Fri, Jul 31, 2020 at 02:11:22PM -0700, Rustam Kovhaev wrote:
> 
> > IB roce driver receives NETDEV_UNREGISTER event, calls dev_hold() and
> > schedules work item to execute, and before wq gets a chance to complete
> > it, we return to ip_tunnel.c:274 and call free_netdev(), and then later
> > we get UAF when scheduled function references already freed net_device
> > 
> > i added verbose logging to ip_tunnel.c to see pcpu_refcnt:
> > +       pr_info("about to free_netdev(dev) dev->pcpu_refcnt %d", 
> > netdev_refcnt_read(dev));
> > 
> > and got the following:
> > [  410.220127][ T2944] ip_tunnel: about to free_netdev(dev) 
> > dev->pcpu_refcnt 8
> 
> I think there is a missing call to netdev_wait_allrefs() in the
> rollback_registered_many().
calling it there leads to rtnl deadlock, i think we should call
net_set_todo(), so that later when we call rtnl_unlock() it will
execute netdev_run_todo() and there it will proceed to calling
netdev_wait_allrefs(), but in ip tunnel i will need get
free_netdev() to be called after we unlock rtnl mutex
i'll try to send a new patch for review

> The normal success flow has this wait after delivering
> NETDEV_UNREGISTER, the error unwind for register_netdevice should as
> well.
> 
> If the netdevice can progress to free while a dev_hold is active I
> think it means dev_hold is functionally useless.
good point

Reply via email to