On Mon, Oct 4, 2021 at 7:05 PM Ferruh Yigit <ferruh.yi...@intel.com> wrote:

> On 10/4/2021 3:58 PM, Elad Nachman wrote:
> > בתאריך יום ב׳, 4 באוק׳ 2021, 17:51, מאת Ferruh Yigit ‏<
> > ferruh.yi...@intel.com>:
> >
> >> On 10/4/2021 3:25 PM, Elad Nachman wrote:
> >>
> >> Can you please try to not top post, it will make impossible to follow
> this
> >> discussion later from the mail archives.
> >>
> >>> 1. Userspace will get an error
> >>
> >> So there is nothing special with returning '-EAGAIN', user will only
> >> observe an
> >> error.
> >> Wasn't initial intention to use '-EAGAIN' to try request again?
> >>
> > To signal user-space to retry the operation.
> >
>
> Not sure if it will reach to the end user. If user is calling "ifconfig
> <iface>
> down", it will just fail right, it won't recognize the error type.
>
> Unless this is common usage by the Linux network drivers, having this
> usage in
> KNI won't help much. I am for handling this in the kernel side if we can.
>
>
If user calls ifconfig <iface> down it will not happen. It requires some
multi-core race condition only Eric can recreate.


> >>
> >>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl
> unlocked
> >>> for interface down command causes a crash because of a race condition
> in
> >>> the device delete/unregister list in the kernel.
> >>>
> >>
> >> Why waiting with rthnl lock causes a deadlock? As said below we are
> already
> >> doing it, why it is different with retry logic?
> >>
> > Because it can be interface down request.
> >
>
> (sure you like short answers)
>
> Please help me to see why "interface down" is special. Isn't it point of
> your
> patch to wait the request execution in the userspace even it is an async
> request?
>
> And yet again, number of retry can be limited.
>
>
No, it is not. Please look again:
https://patches.dpdk.org/project/dpdk/patch/20210924105409.21711-1-ela...@gmail.com/



>
> >
> >> I agree to not wait with rtnl unlocked.
> >>
> >>> FYI,
> >>>
> >>> Elad.
> >>>
> >>> בתאריך יום ב׳, 4 באוק׳ 2021, 17:13, מאת Ferruh Yigit ‏<
> >>> ferruh.yi...@intel.com>:
> >>>
> >>>> On 10/4/2021 2:09 PM, Elad Nachman wrote:
> >>>>> Hi,
> >>>>>
> >>>>> EAGAIN is propogated back to the kernel and to the caller.
> >>>>>
> >>>>
> >>>> So will the user get an error, or it will be handled by the kernel and
> >>>> retried?
> >>>>
> >>>>> We cannot retry from the kni kernel module since we hold the rtnl
> lock.
> >>>>>
> >>>>
> >>>> Why not? We are already waiting until a command time out, like
> >>>> 'kni_net_open()'
> >>>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can
> >>>> limit the
> >>>> number of retry for safety.
> >>>>
> >>>>> FYI,
> >>>>>
> >>>>> Elad
> >>>>>
> >>>>> בתאריך יום ב׳, 4 באוק׳ 2021, 16:05, מאת Ferruh Yigit ‏<
> >>>>> ferruh.yi...@intel.com>:
> >>>>>
> >>>>>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
> >>>>>>> Fix lack of multiple KNI requests handling support by introducing a
> >>>>>>> request in progress flag which will fail additional requests with
> >>>>>>> EAGAIN return code if the original request has not been processed
> >>>>>>> by user-space.
> >>>>>>>
> >>>>>>> Bugzilla ID: 809
> >>>>>>
> >>>>>> Hi Eric,
> >>>>>>
> >>>>>> Can you please test this patch, if it solves the issue you reported?
> >>>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Elad Nachman <ela...@gmail.com>
> >>>>>>> ---
> >>>>>>>  kernel/linux/kni/kni_net.c | 9 +++++++++
> >>>>>>>  lib/kni/rte_kni.c          | 2 ++
> >>>>>>>  lib/kni/rte_kni_common.h   | 1 +
> >>>>>>>  3 files changed, 12 insertions(+)
> >>>>>>>
> >>>>>>
> >>>>>> <...>
> >>>>>>
> >>>>>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device
> *dev,
> >>>>>> struct rte_kni_request *req)
> >>>>>>>
> >>>>>>>       mutex_lock(&kni->sync_lock);
> >>>>>>>
> >>>>>>> +     /* Check that existing request has been processed: */
> >>>>>>> +     cur_req = (struct rte_kni_request *)kni->sync_kva;
> >>>>>>> +     if (cur_req->req_in_progress) {
> >>>>>>> +             ret = -EAGAIN;
> >>>>>>
> >>>>>> Overall logic in the KNI looks good to me, this helps to serialize
> the
> >>>>>> requests
> >>>>>> even for async ones.
> >>>>>>
> >>>>>> But can you please clarify how it behaves in the kernel side with
> >>>> '-EAGAIN'
> >>>>>> return type? Will linux call the ndo again, or will it just fail.
> >>>>>>
> >>>>>> If it just fails should we handle the re-try on '-EAGAIN' within the
> >> kni
> >>>>>> module?
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >> Elad.
>
>

Reply via email to