Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Ananyev, Konstantin Wed, 23 Jun 2021 06:27:45 -0700

> On 23-Jun-21 12:00 PM, Ananyev, Konstantin wrote:
> >
> >>>>>
> >>>>>> Previously, the semantics of power monitor were such that we were
> >>>>>> checking current value against the expected value, and if they matched,
> >>>>>> then the sleep was aborted. This is somewhat inflexible, because it 
> >>>>>> only
> >>>>>> allowed us to check for a specific value.
> >>>>>>
> >>>>>> This commit adds an option to reverse the check, so that we can have
> >>>>>> monitor sleep aborted if the expected value *doesn't* match what's in
> >>>>>> memory. This allows us to both implement all currently implemented
> >>>>>> driver code, as well as support more use cases which don't easily map 
> >>>>>> to
> >>>>>> previous semantics (such as waiting on writes to AF_XDP counter value).
> >>>>>>
> >>>>>> Since the old behavior is the default, no need to adjust existing
> >>>>>> implementations.
> >>>>>>
> >>>>>> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> >>>>>> ---
> >>>>>>     lib/eal/include/generic/rte_power_intrinsics.h | 4 ++++
> >>>>>>     lib/eal/x86/rte_power_intrinsics.c             | 5 ++++-
> >>>>>>     2 files changed, 8 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h 
> >>>>>> b/lib/eal/include/generic/rte_power_intrinsics.h
> >>>>>> index dddca3d41c..1006c2edfc 100644
> >>>>>> --- a/lib/eal/include/generic/rte_power_intrinsics.h
> >>>>>> +++ b/lib/eal/include/generic/rte_power_intrinsics.h
> >>>>>> @@ -31,6 +31,10 @@ struct rte_power_monitor_cond {
> >>>>>>                            *   4, or 8. Supplying any other value will 
> >>>>>> result in
> >>>>>>                            *   an error.
> >>>>>>                            */
> >>>>>> +     uint8_t invert;  /**< Invert check for expected value (e.g. 
> >>>>>> instead of
> >>>>>> +                       *   checking if `val` matches something, check 
> >>>>>> if
> >>>>>> +                       *   `val` *doesn't* match a particular value)
> >>>>>> +                       */
> >>>>>>     };
> >>>>>>
> >>>>>>     /**
> >>>>>> diff --git a/lib/eal/x86/rte_power_intrinsics.c 
> >>>>>> b/lib/eal/x86/rte_power_intrinsics.c
> >>>>>> index 39ea9fdecd..5d944e9aa4 100644
> >>>>>> --- a/lib/eal/x86/rte_power_intrinsics.c
> >>>>>> +++ b/lib/eal/x86/rte_power_intrinsics.c
> >>>>>> @@ -117,7 +117,10 @@ rte_power_monitor(const struct 
> >>>>>> rte_power_monitor_cond *pmc,
> >>>>>>                  const uint64_t masked = cur_value & pmc->mask;
> >>>>>>
> >>>>>>                  /* if the masked value is already matching, abort */
> >>>>>> -             if (masked == pmc->val)
> >>>>>> +             if (!pmc->invert && masked == pmc->val)
> >>>>>> +                     goto end;
> >>>>>> +             /* same, but for inverse check */
> >>>>>> +             if (pmc->invert && masked != pmc->val)
> >>>>>>                          goto end;
> >>>>>>          }
> >>>>>>
> >>>>>
> >>>>> Hmm..., such approach looks too 'patchy'...
> >>>>> Can we at least replace 'inver' with something like:
> >>>>> enum rte_power_monitor_cond_op {
> >>>>>            _EQ, NEQ,...
> >>>>> };
> >>>>> Then at least new comparions ops can be added in future.
> >>>>> Even better I think would be to just leave to PMD to provide a 
> >>>>> comparison callback.
> >>>>> Will make things really simple and generic:
> >>>>> struct rte_power_monitor_cond {
> >>>>>         volatile void *addr;
> >>>>>         int (*cmp)(uint64_t val);
> >>>>>         uint8_t size;
> >>>>> };
> >>>>> And then in rte_power_monitor(...):
> >>>>> ....
> >>>>> const uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size);
> >>>>> if (pmc->cmp(cur_value) != 0)
> >>>>>            goto end;
> >>>>> ....
> >>>>>
> >>>>
> >>>> I like the idea of a callback, but these are supposed to be
> >>>> intrinsic-like functions, so putting too much into them is contrary to
> >>>> their goal, and it's going to make the API hard to use in simpler cases
> >>>> (e.g. when we're explicitly calling rte_power_monitor as opposed to
> >>>> letting the RX callback do it for us). For example, event/dlb code calls
> >>>> rte_power_monitor explicitly.
> >>>
> >>> Good point, I didn't know that.
> >>> Would be interesting to see how do they use it.
> >>
> >> To be fair, it should be possible to rewrite their code using a
> >> callback. Perhaps adding a (void *) parameter for any custom data
> >> related to the callback (because C doesn't have closures...), but
> >> otherwise it should be doable, so the question isn't that it's
> >> impossible to rewrite event/dlb code to use callbacks, it's more of an
> >> issue with complicating usage of already-not-quite-straightforward API
> >> even more.
> >>
> >>>
> >>>>
> >>>> It's going to be especially "fun" to do these indirect function calls
> >>>> from inside transactional region on call to multi-monitor.
> >>>
> >>> But the callback is not supposed to do any memory reads/writes.
> >>> Just mask/compare of the provided value with some constant.
> >>
> >> Yeah, but with callbacks we can't really control that, can we? I mean i
> >> guess a *sane* implementation wouldn't do that, but still, it's
> >> theoretically possible to perform more complex checks and even touch
> >> some unrelated data in the process.
> >
> > Yep, PMD developer can ignore recommendations and do whatever
> > he wants in the call-back. We can't control it.
> > If he touches some memory in it - probably there will be more spurious 
> > wakeups and less power saves.
> > In principle it is the same with all other PMD dev-ops - we have to trust 
> > that they are
> > doing what they have to.
> 
> I did a quick prototype for this, and i don't think it is going to work.
> 
> Callbacks with just "current value" as argument will be pretty limited
> and will only really work for cases where we know what we are expecting.
> However, for cases like event/dlb or net/mlx5, the expected value is (or
> appears to be) dependent upon some internal device data, and is not
> constant like in case of net/ixgbe for example.
> 
> This can be fixed by passing an opaque pointer, either by storing it in
> the monitor condition, or by passing it directly to rte_power_monitor at
> invocation time.
> 
> The latter doesn't work well because when we call rte_power_monitor from
> inside the rte_power library, we lack the context necessary to get said
> opaque pointer.
> 
> The former doesn't work either, because the only place where we can get
> this argument is inside get_monitor_addr, but the opaque pointer must
> persist after we exit that function in order to avoid use-after-free -
> which means that it either has to be statically allocated (which means
> it's not thread-safe for a non-trivial case), or dynamically allocated
> (which a big no-no on a hotpath).


If I get you right, expected_value (and probably mask) can be variable ones.
So for callback approach to work we need to pass all this as parameters
to PMD comparison callback:
int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
Correct? 

> 
> Any other suggestions? :)
> 
> >
> >>
> >>>
> >>>> I'm not
> >>>> opposed to having a callback here, but maybe others have more thoughts
> >>>> on this?
> >>>>
> >>>> --
> >>>> Thanks,
> >>>> Anatoly
> >>
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly

Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Reply via email to