On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit <ferruh.yi...@intel.com> wrote:
> On 10/4/2021 3:58 PM, Elad Nachman wrote: > > בתאריך יום ב׳, 4 באוק׳ 2021, 17:51, מאת Ferruh Yigit < > > ferruh.yi...@intel.com>: > > > >> On 10/4/2021 3:25 PM, Elad Nachman wrote: > >> > >> Can you please try to not top post, it will make impossible to follow > this > >> discussion later from the mail archives. > >> > >>> 1. Userspace will get an error > >> > >> So there is nothing special with returning '-EAGAIN', user will only > >> observe an > >> error. > >> Wasn't initial intention to use '-EAGAIN' to try request again? > >> > > To signal user-space to retry the operation. > > > > Not sure if it will reach to the end user. If user is calling "ifconfig > <iface> > down", it will just fail right, it won't recognize the error type. > > Unless this is common usage by the Linux network drivers, having this > usage in > KNI won't help much. I am for handling this in the kernel side if we can. > > If user calls ifconfig <iface> down it will not happen. It requires some multi-core race condition only Eric can recreate. > >> > >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl > unlocked > >>> for interface down command causes a crash because of a race condition > in > >>> the device delete/unregister list in the kernel. > >>> > >> > >> Why waiting with rthnl lock causes a deadlock? As said below we are > already > >> doing it, why it is different with retry logic? > >> > > Because it can be interface down request. > > > > (sure you like short answers) > > Please help me to see why "interface down" is special. Isn't it point of > your > patch to wait the request execution in the userspace even it is an async > request? > > And yet again, number of retry can be limited. > > No, it is not. Please look again: https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-ela...@gmail.com/ > > > > >> I agree to not wait with rtnl unlocked. > >> > >>> FYI, > >>> > >>> Elad. > >>> > >>> בתאריך יום ב׳, 4 באוק׳ 2021, 17:13, מאת Ferruh Yigit < > >>> ferruh.yi...@intel.com>: > >>> > >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote: > >>>>> Hi, > >>>>> > >>>>> EAGAIN is propogated back to the kernel and to the caller. > >>>>> > >>>> > >>>> So will the user get an error, or it will be handled by the kernel and > >>>> retried? > >>>> > >>>>> We cannot retry from the kni kernel module since we hold the rtnl > lock. > >>>>> > >>>> > >>>> Why not? We are already waiting until a command time out, like > >>>> 'kni_net_open()' > >>>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can > >>>> limit the > >>>> number of retry for safety. > >>>> > >>>>> FYI, > >>>>> > >>>>> Elad > >>>>> > >>>>> בתאריך יום ב׳, 4 באוק׳ 2021, 16:05, מאת Ferruh Yigit < > >>>>> ferruh.yi...@intel.com>: > >>>>> > >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote: > >>>>>>> Fix lack of multiple KNI requests handling support by introducing a > >>>>>>> request in progress flag which will fail additional requests with > >>>>>>> EAGAIN return code if the original request has not been processed > >>>>>>> by user-space. > >>>>>>> > >>>>>>> Bugzilla ID: 809 > >>>>>> > >>>>>> Hi Eric, > >>>>>> > >>>>>> Can you please test this patch, if it solves the issue you reported? > >>>>>> > >>>>>>> > >>>>>>> Signed-off-by: Elad Nachman <ela...@gmail.com> > >>>>>>> --- > >>>>>>> kernel/linux/kni/kni_net.c | 9 +++++++++ > >>>>>>> lib/kni/rte_kni.c | 2 ++ > >>>>>>> lib/kni/rte_kni_common.h | 1 + > >>>>>>> 3 files changed, 12 insertions(+) > >>>>>>> > >>>>>> > >>>>>> <...> > >>>>>> > >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device > *dev, > >>>>>> struct rte_kni_request *req) > >>>>>>> > >>>>>>> mutex_lock(&kni->sync_lock); > >>>>>>> > >>>>>>> + /* Check that existing request has been processed: */ > >>>>>>> + cur_req = (struct rte_kni_request *)kni->sync_kva; > >>>>>>> + if (cur_req->req_in_progress) { > >>>>>>> + ret = -EAGAIN; > >>>>>> > >>>>>> Overall logic in the KNI looks good to me, this helps to serialize > the > >>>>>> requests > >>>>>> even for async ones. > >>>>>> > >>>>>> But can you please clarify how it behaves in the kernel side with > >>>> '-EAGAIN' > >>>>>> return type? Will linux call the ndo again, or will it just fail. > >>>>>> > >>>>>> If it just fails should we handle the re-try on '-EAGAIN' within the > >> kni > >>>>>> module? > >>>>>> > >>>>>> > >>>> > >>>> > >> > >> Elad. > >