On 6/29/2018 2:55 AM, Dan Gora wrote:
> Currently the rte_kni kernel driver suffers from a problem where
> when the interface is released, it generates a callback to the DPDK
> application to change the interface state to Down. However, after the
> DPDK application handles the callback and generates a response back to
> the kernel, the rte_kni driver cannot wake the thread which is asleep
> waiting for the response, because it is holding the kni_link_lock
> semaphore and it has already removed the 'struct kni_dev' from the
> list of interfaces to poll for responses.
>
> This means that if the KNI interface is in the Up state when
> rte_kni_release() is called, it will always sleep for three seconds
> until kni_net_release gives up waiting for a response from the DPDK
> application.
>
> To fix this, we must separate the step to release the kernel network
> interface from the steps to remove the KNI interface from the list
> of interfaces to poll.
>
> When the kernel network interface is removed with unregister_netdev(),
> if the interface is up, it will generate a callback to mark the
> interface down, which calls kni_net_release(). kni_net_release() will
> block waiting for the DPDK application to call rte_kni_handle_request()
> to handle the callback, but it also needs the thread in the KNI driver
> (either the per-dev thread for multi-thread or the per-driver thread)
> to call kni_net_poll_resp() in order to wake the thread sleeping in
> kni_net_release (actually kni_net_process_request()).
>
> So now, KNI interfaces should be removed as such:
>
> 1) The user calls rte_kni_release(). This only unregisters the
> netdev in the kernel, but touches nothing else. This allows all the
> threads to run which are necessary to handle the callback into the
> DPDK application to mark the interface down.
>
> 2) The user stops the thread running rte_kni_handle_request().
> After rte_kni_release() has been called, there will be no more
> callbacks for that interface so it is not necessary. It cannot be
> running at the same time that rte_kni_free() frees all of the FIFOs
> and DPDK memory for that KNI interface.
>
> 3) The user calls rte_kni_free(). This performs the RTE_KNI_IOCTL_FREE
> ioctl which calls kni_ioctl_free(). This function removes the struct
> kni_dev from the list of interfaces to poll (and kills the per-dev
> kthread, if configured for multi-thread), then frees the memory in
> the FIFOs.
>
> Signed-off-by: Dan Gora <[email protected]>
You are right, that problem exits.
Although I don't see problem related to holding the kni_list_lock, polling
thread terminated before unregister interface cause the problem.
And it has a reason to terminate polling thread first, because it uses device
resources.
Separating unregister and free steps looks good, but I am not sure if this
should be reflected to the user, with a new ioctl and API.
When user done with interface it calls rte_kni_release() to release them, does
user really need a rte_kni_free() API or need to know the difference of two, is
there any action to take in userspace between these two APIs? I think no.
What about keeping single rte_kni_release() API and solve the issue internally
in KNI?
Previously it was doing:
- Stop threads (also there is another single/multi thread error [1])
- kni_dev_remove()
- unregister and free netdev() [2]
- kni_net_release_fifo_phy() [3]
Instead internally can we do:
a- Unregister kernel interfaces, rte_kni_unregister()?
b- stop threads
c- kni_net_release_fifo_phy
d- free netdev
The challenge I can see is some time required between a) and b) to let userspace
app to response, we need a way to know response received before stopping the
thread.
Another thing is there are two release path, kni_release() and
kni_ioctl_release() both should be fixed.
[1]
If multi thread enabled they have been stopped, but if single thread used it has
not been stopped that is why you don't see the 3 seconds delay for default
single thread case, but not stopping the polling thread but removing the
interface is wrong.
[2]
unregistering netdev will trigger a userspace request but response won't be read
because polling thread also polls the response queue, and that thread is already
stopped at this stage.
[3]
This is also wrong as you have pointed in later patch in your series,
kni_net_release_fifo_phy() moves packets from rxq/alloq queue to free queue,
queues are still allocated but the references kept in kernel may be invalid at
this stage because of free netdev()