Hi,

The new patch is fine by me.
I tested several dozen restarts of our proprietary application without any apparent problems.

FYI,
Elad.

On Fri, 9 Apr 2021 at 17:56, Ferruh Yigit <ferruh.yi...@intel.com> wrote:

> On 3/29/2021 3:36 PM, Ferruh Yigit wrote:
> > KNI runs userspace callback with rtnl lock held, this is not working
> > fine with some devices that needs to interact with kernel interface in
> > the callback, like Mellanox devices.
> >
> > The solution is releasing the rtnl lock before calling the userspace
> > callback. But it requires two consideration:
> >
> > 1. The rtnl lock needs to released before 'kni->sync_lock', otherwise it
> > causes deadlock with multiple KNI devices, please check below the A.
> > for the details of the deadlock condition.
> >
> > 2. When rtnl lock is released for interface down event, it cause a
> > regression and deadlock, so can't release the rtnl lock for interface
> > down event, please check below B. for the details.
> >
> > As a solution, interface down event is handled asynchronously and for
> > all other events rtnl lock is released before processing the callback.
> >
> > A. KNI sync lock is being locked while rtnl is held.
> > If two threads are calling kni_net_process_request(),
> > then the first one will take the sync lock, release rtnl lock then sleep.
> > The second thread will try to lock sync lock while holding rtnl.
> > The first thread will wake, and try to lock rtnl, resulting in a
> > deadlock. The remedy is to release rtnl before locking the KNI sync
> > lock.
> > Since in between nothing is accessing Linux network-wise, no rtnl
> > locking is needed.
> >
> > B. There is a race condition in __dev_close_many() processing the
> > close_list while the application terminates.
> > It looks like if two KNI interfaces are terminating,
> > and one releases the rtnl lock, the other takes it,
> > updating the close_list in an unstable state,
> > causing the close_list to become a circular linked list,
> > hence list_for_each_entry() will endlessly loop inside
> > __dev_close_many().
> >
> > To summarize:
> > request != interface down : unlock rtnl, send request to user-space,
> > wait for response, send the response error code to caller in user-space.
> >
> > request == interface down: send request to user-space, return immediately
> > with error code of 0 (success) to user-space.
> >
> > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Elad Nachman <ela...@gmail.com>
> > ---
> > Cc: Stephen Hemminger <step...@networkplumber.org>
> > Cc: Igor Ryzhov <iryz...@nfware.com>
> > Cc: Dan Gora <d...@adax.com>
> >
>
> Hi Elad, Igor,
>
> Can you please review/test this set when you have time?
>
> Thanks,
> ferruh
>
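
For readers skimming the thread, the lock ordering summarized in the quoted commit message can be sketched in user space roughly as below. This is only an illustration under stated assumptions, not the patch itself: pthread mutexes stand in for the kernel's rtnl lock and 'kni->sync_lock', and the names `kni_sketch`, `req_kind`, `process_request()` and `kni_sketch_demo()` are hypothetical. The real kni_net_process_request() talks to user space through FIFOs and a wait queue, which is reduced here to reading a stored response code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's global rtnl lock (assumption). */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical request kinds; these are not the real KNI request IDs. */
enum req_kind { REQ_IF_DOWN, REQ_OTHER };

/* Hypothetical per-device state; sync_lock plays the role of kni->sync_lock. */
struct kni_sketch {
    pthread_mutex_t sync_lock;
    int user_response;  /* error code the user-space callback would return */
    bool down_queued;   /* set once an interface-down request was queued */
};

/* Entered with rtnl held; always returns with rtnl held. */
static int process_request(struct kni_sketch *kni, enum req_kind kind)
{
    int ret;

    assert(kni != NULL);

    if (kind == REQ_IF_DOWN) {
        /* Asynchronous path: keep rtnl, queue the request, do not wait. */
        kni->down_queued = true;
        return 0;
    }

    /* Drop rtnl BEFORE taking sync_lock, so no thread ever blocks on
     * sync_lock while still holding rtnl (deadlock case A). */
    pthread_mutex_unlock(&rtnl);
    pthread_mutex_lock(&kni->sync_lock);
    ret = kni->user_response;  /* stands in for "send request, wait for reply" */
    pthread_mutex_unlock(&kni->sync_lock);
    pthread_mutex_lock(&rtnl); /* the caller expects rtnl to be held again */
    return ret;
}

/* Runs both paths once; returns 0 if they behave as summarized above. */
static int kni_sketch_demo(void)
{
    struct kni_sketch kni = { .user_response = -22, .down_queued = false };
    int ret_other, ret_down;

    pthread_mutex_init(&kni.sync_lock, NULL);
    pthread_mutex_lock(&rtnl);
    ret_other = process_request(&kni, REQ_OTHER);   /* synchronous path */
    ret_down = process_request(&kni, REQ_IF_DOWN);  /* asynchronous path */
    pthread_mutex_unlock(&rtnl);
    pthread_mutex_destroy(&kni.sync_lock);

    return (ret_other == -22 && ret_down == 0 && kni.down_queued) ? 0 : 1;
}
```

Calling `kni_sketch_demo()` from a small `main()` (built with `-pthread`) exercises both paths: the non-down request passes the stored response code back to the caller, while the interface-down request returns 0 immediately without releasing rtnl.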