> On 23-Jun-21 12:00 PM, Ananyev, Konstantin wrote: > > > >>>>> > >>>>>> Previously, the semantics of power monitor were such that we were > >>>>>> checking current value against the expected value, and if they matched, > >>>>>> then the sleep was aborted. This is somewhat inflexible, because it > >>>>>> only > >>>>>> allowed us to check for a specific value. > >>>>>> > >>>>>> This commit adds an option to reverse the check, so that we can have > >>>>>> monitor sleep aborted if the expected value *doesn't* match what's in > >>>>>> memory. This allows us to both implement all currently implemented > >>>>>> driver code, as well as support more use cases which don't easily map > >>>>>> to > >>>>>> previous semantics (such as waiting on writes to AF_XDP counter value). > >>>>>> > >>>>>> Since the old behavior is the default, no need to adjust existing > >>>>>> implementations. > >>>>>> > >>>>>> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com> > >>>>>> --- > >>>>>> lib/eal/include/generic/rte_power_intrinsics.h | 4 ++++ > >>>>>> lib/eal/x86/rte_power_intrinsics.c | 5 ++++- > >>>>>> 2 files changed, 8 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h > >>>>>> b/lib/eal/include/generic/rte_power_intrinsics.h > >>>>>> index dddca3d41c..1006c2edfc 100644 > >>>>>> --- a/lib/eal/include/generic/rte_power_intrinsics.h > >>>>>> +++ b/lib/eal/include/generic/rte_power_intrinsics.h > >>>>>> @@ -31,6 +31,10 @@ struct rte_power_monitor_cond { > >>>>>> * 4, or 8. Supplying any other value will > >>>>>> result in > >>>>>> * an error. > >>>>>> */ > >>>>>> + uint8_t invert; /**< Invert check for expected value (e.g. > >>>>>> instead of > >>>>>> + * checking if `val` matches something, check > >>>>>> if > >>>>>> + * `val` *doesn't* match a particular value) > >>>>>> + */ > >>>>>> }; > >>>>>> > >>>>>> /** > >>>>>> diff --git a/lib/eal/x86/rte_power_intrinsics.c > >>>>>> b/lib/eal/x86/rte_power_intrinsics.c > >>>>>> index 39ea9fdecd..5d944e9aa4 100644 > >>>>>> --- a/lib/eal/x86/rte_power_intrinsics.c > >>>>>> +++ b/lib/eal/x86/rte_power_intrinsics.c > >>>>>> @@ -117,7 +117,10 @@ rte_power_monitor(const struct > >>>>>> rte_power_monitor_cond *pmc, > >>>>>> const uint64_t masked = cur_value & pmc->mask; > >>>>>> > >>>>>> /* if the masked value is already matching, abort */ > >>>>>> - if (masked == pmc->val) > >>>>>> + if (!pmc->invert && masked == pmc->val) > >>>>>> + goto end; > >>>>>> + /* same, but for inverse check */ > >>>>>> + if (pmc->invert && masked != pmc->val) > >>>>>> goto end; > >>>>>> } > >>>>>> > >>>>> > >>>>> Hmm..., such approach looks too 'patchy'... > >>>>> Can we at least replace 'inver' with something like: > >>>>> enum rte_power_monitor_cond_op { > >>>>> _EQ, NEQ,... > >>>>> }; > >>>>> Then at least new comparions ops can be added in future. > >>>>> Even better I think would be to just leave to PMD to provide a > >>>>> comparison callback. > >>>>> Will make things really simple and generic: > >>>>> struct rte_power_monitor_cond { > >>>>> volatile void *addr; > >>>>> int (*cmp)(uint64_t val); > >>>>> uint8_t size; > >>>>> }; > >>>>> And then in rte_power_monitor(...): > >>>>> .... > >>>>> const uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size); > >>>>> if (pmc->cmp(cur_value) != 0) > >>>>> goto end; > >>>>> .... > >>>>> > >>>> > >>>> I like the idea of a callback, but these are supposed to be > >>>> intrinsic-like functions, so putting too much into them is contrary to > >>>> their goal, and it's going to make the API hard to use in simpler cases > >>>> (e.g. when we're explicitly calling rte_power_monitor as opposed to > >>>> letting the RX callback do it for us). For example, event/dlb code calls > >>>> rte_power_monitor explicitly. > >>> > >>> Good point, I didn't know that. > >>> Would be interesting to see how do they use it. > >> > >> To be fair, it should be possible to rewrite their code using a > >> callback. Perhaps adding a (void *) parameter for any custom data > >> related to the callback (because C doesn't have closures...), but > >> otherwise it should be doable, so the question isn't that it's > >> impossible to rewrite event/dlb code to use callbacks, it's more of an > >> issue with complicating usage of already-not-quite-straightforward API > >> even more. > >> > >>> > >>>> > >>>> It's going to be especially "fun" to do these indirect function calls > >>>> from inside transactional region on call to multi-monitor. > >>> > >>> But the callback is not supposed to do any memory reads/writes. > >>> Just mask/compare of the provided value with some constant. > >> > >> Yeah, but with callbacks we can't really control that, can we? I mean i > >> guess a *sane* implementation wouldn't do that, but still, it's > >> theoretically possible to perform more complex checks and even touch > >> some unrelated data in the process. > > > > Yep, PMD developer can ignore recommendations and do whatever > > he wants in the call-back. We can't control it. > > If he touches some memory in it - probably there will be more spurious > > wakeups and less power saves. > > In principle it is the same with all other PMD dev-ops - we have to trust > > that they are > > doing what they have to. > > I did a quick prototype for this, and i don't think it is going to work. > > Callbacks with just "current value" as argument will be pretty limited > and will only really work for cases where we know what we are expecting. > However, for cases like event/dlb or net/mlx5, the expected value is (or > appears to be) dependent upon some internal device data, and is not > constant like in case of net/ixgbe for example. > > This can be fixed by passing an opaque pointer, either by storing it in > the monitor condition, or by passing it directly to rte_power_monitor at > invocation time. > > The latter doesn't work well because when we call rte_power_monitor from > inside the rte_power library, we lack the context necessary to get said > opaque pointer. > > The former doesn't work either, because the only place where we can get > this argument is inside get_monitor_addr, but the opaque pointer must > persist after we exit that function in order to avoid use-after-free - > which means that it either has to be statically allocated (which means > it's not thread-safe for a non-trivial case), or dynamically allocated > (which a big no-no on a hotpath).
If I get you right, expected_value (and probably mask) can be variable ones. So for callback approach to work we need to pass all this as parameters to PMD comparison callback: int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) Correct? > > Any other suggestions? :) > > > > >> > >>> > >>>> I'm not > >>>> opposed to having a callback here, but maybe others have more thoughts > >>>> on this? > >>>> > >>>> -- > >>>> Thanks, > >>>> Anatoly > >> > >> > >> -- > >> Thanks, > >> Anatoly > > > -- > Thanks, > Anatoly