Hi Matan, Sorry for the delay on this.
On Wed, Dec 20, 2017 at 10:58:29AM +0000, Matan Azrad wrote: > Hi Gaetan > > > -----Original Message----- > > From: Gaëtan Rivet [mailto:gaetan.ri...@6wind.com] > > Sent: Wednesday, December 20, 2017 12:22 AM > > To: Matan Azrad <ma...@mellanox.com> > > Cc: Adrien Mazarguil <adrien.mazarg...@6wind.com>; Thomas Monjalon > > <tho...@monjalon.net>; dev@dpdk.org > > Subject: Re: [PATCH v3 6/6] net/failsafe: fix removed device handling > > > > Hi Matan, > > > > On Tue, Dec 19, 2017 at 05:10:15PM +0000, Matan Azrad wrote: > > > There is time between the physical removal of the device until > > > sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and > > > applications still don't know about the removal and may call > > > sub-device control operation which should return an error. > > > > > > In previous code this error is reported to the application contrary to > > > fail-safe principle that the app should not be aware of device removal. > > > > > > Add an removal check in each relevant control command error flow and > > > prevent an error report to application when the sub-device is removed. > > > > > > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD") > > > Fixes: b737a1e ("net/failsafe: support flow API") As stated previously, please do not include those fixes lines. > > > > > > Signed-off-by: Matan Azrad <ma...@mellanox.com> > > > --- > > > > <snip> > > > > > +/* > > > + * Check if error should be reported to the user. > > > + */ > > > +static inline bool > > > +fs_is_error(struct sub_device *sdev, int err) { > > > + /* A device removal shouldn't be reported as an error. */ > > > + if (err == 0 || sdev->remove == 1 || err == -EIO) > > > + return false; > > > + return true; > > > +} > > > > This is better, thanks. > > > > However is there a reason you did not follow the same pattern as ethdev > > with eth_err? I see the two functions as similar in their intent, making > > them > > close to each other would be clearer to a reader being familiar with the > > ethdev API and that would be interested in fail-safe. > > > > What do you think? > > > > I think that there is a real different between eth_err function to > fs_is_error: > ethdev uses eth_err function to adjust removal return value to be -EIO. > fail-safe uses fs_is_error function to check if an error should be reported > to the user to save the fail-safe principle that the app should not be aware > of device removal - this is the main idea that also causes me to change the > name from fs_is_removed to fs_is_error. I would have preferred if it followed the same pattern as ethdev (that function be used to adjust the return value, not performing a flag check). While better on its own, the pattern: if (fs_is_error(sdev, err)) { ERROR("xxxx"); return err; } is dangerous, as then the author is forbidden from returning err, assuming err could be -EIO. He or she would be forced to return an explicit "0". To be clear, here would be an easy mistake to do: if (fs_is_error(sdev, err)) { ERROR("xxxx"); } return err; And this kind of code-flow is not unusual, or even unwanted. I dislike having this kind of implicit rule derived from using a helper such as fs_is_error(). The alternative if ((err = fs_err(sdev, err))) { ERROR("xxxx"); return err; } Forces the value err to be set to the correct one. This mistake can already be found in your patch: > @@ -150,7 +150,7 @@ > continue; > local_ret = rte_flow_destroy(PORT_ID(sdev), > flow->flows[i], error); > - if (local_ret) { > + if (fs_is_error(sdev, local_ret)) { > ERROR("Failed to destroy flow on sub_device %d: %d", > i, local_ret); > if (ret == 0) Your environment does not include the function, but this is within fs_flow_destroy (please update to include the context by the way it helps a lot the review :). Afterward, line 162 ret is directly used as return value. Also, fs_err() would need to transform rte_errno when relevant (mostly in failsafe_flow.c I think). This is the kind of subtlety that needs to be avoided when designing APIs, even internal ones. This will induce errors afterward and complicate the maintenance of the codebase. Best regards, -- Gaëtan Rivet 6WIND