Hi Serhei,
On Thu, Apr 9, 2026 at 3:55 PM Serhei Makarov <[email protected]> wrote:
>
> Some comments as I prepare the patchset for re-submission.
>
> On Sun, Apr 5, 2026, at 8:22 PM, Aaron Merey wrote:
> >> + const int *regs_mapping, size_t
> >> n_regs_mapping,
> >> + ebl_tid_registers_t *setfunc,
> >> + void *arg)
> >> +{
> >> +/* TODO(REVIEW): The #ifdef here seems strictly optional as we don't
> >> + refer to perf_events or ptrace arch-specific declarations. */
> >
> > This builds for me on x86_64 with the #if !defined(__arch64__) stub
> > removed. If we wanted to support reading aarch64 samples on a
> > non-aarch64 host then we don't want the stub in any case.
> Sounds good, removing the #ifdef and testing on x86 to confirm.
>
> >> + /* TODO(REVIEW): May need to obtain PAC mask since the unwinder
> >> + needs to strip it from LR/X30 to handle pointer
> >> + authentication. */
> >
> > It looks like PAC support was added to aarch64_initreg.c recently in
> > commit 52a747a316. I'm not familiar with the details of aarch64 PAC
> > but it sounds like a mask may need to be applied to sample register
> > values in some cases.
> Unfortunately, perf_events does not make working with PAC easy
> -- we need an additional operation (ptrace query?) to get the PAC mask
> and then the information needs to be provided to libebl somehow.
> Probably out of scope for this patchset given the API design questions
> involved.
>
> It may also be possible to handle most cases by stripping top 16 bits,
> but I'm not confident enough in this heuristic to recommend it.
>
> The CentOS Stream systems I've used for testing don't seem to require
> handling PAC, so I will leave the TODO in place for this release,
> and return to this issue when I find a test system for which having
> the functionality matters.
Ack.
>
> >> + /* TODO: Register locations could be cached and rechecked on a
> >> + fastpath without needing to loop, though the overhead reduction
> >> + is minimal. */
> >
> > That's fine to leave for a future patch.
> Exactly, I don't see any performance issues when testing,
> it's more that the redundancy 'feels' improvable in future.
>
> >> + int j, need_sp = (sp != NULL), need_pc = (pc != NULL);
> >> + for (j = 0; (need_sp || need_pc) && n_regs_mapping > (uint32_t)j; j++)
> >> + {
> >> + if (n_regs < (uint32_t)j) break;
> >
> > I think this should be <= instead of <. Otherwise regs[n_regs] could
> > be read which is one past the end.
> Agreed, fixing.
>
> >> - /* If sample_perf_regs_mapping is unsupported then PERF_FRAME_REGS_MASK
> >> is zero. */
> >> - assert (ebl->sample_perf_regs_mapping != NULL);
> >> - return ebl->sample_perf_regs_mapping (ebl, perf_regs_mask, abi,
> >> - regs_mapping, n_regs_mapping);
> >> + /* If sample_perf_regs_mapping is unsupported then perf_frame_regs_mask
> >> is zero. */
> >> + assert (ebl->perf_frame_regs_mask != 0);
> >
> > Can we replace this assert with error handling? We have been
> > gradually removing asserts from elfutils libraries.
> Ok, your call. I can have the function return false instead.
> Existing handler paths set DWFL_E_LIBEBL_BAD with message
> "Internal error due to ebl", which is not too different from an
> assertion failure from user's PoV -- the cause is attempting to
> use the functionality on an architecture that doesn't support it.
>
> (There's also DWFL_E_LIBEBL which has a much more confusing
> message associated with it. Sticking with DWFL_E_LIBEBL_BAD.)
Ok thanks for adjusting.
>
> >> + if (perf_regs_mask != 0 && ebl->cached_perf_regs_mask == perf_regs_mask)
> >
> > Does cached_perf_regs_mask ever get set on ARM arches beyond the
> > default value of 0? It's set in x86_initreg_sample.c but I don't see
> > an ARM equivilent.
>
> Thanks for the catch.
> perf_regs_mask is provided by the user.
> cached_perf_regs_mask should be set directly below in the current function
> -- its nonzero value is an indication that a mapping has been cached.
> Adding the missing assignment.
>
> For clarity, I think the conditional should be:
>
> if (ebl->cached_perf_regs_mask != 0
> && ebl->cached_perf_regs_mask == perf_regs_mask)
>
> i.e. if a mapping has been cached and corresponds to
> the perf_regs_mask the user is requesting, return existing data.
> (Otherwise, proceed to compute a new mapping.)
Ack.
>
> >> + {
> >> + *regs_mapping = ebl->cached_regs_mapping;
> >> + *n_regs_mapping = ebl->cached_n_regs_mapping;
> >> + return true;
> >> + }
> >> +
> >> + /* XXX Unwind-relevant register file should be no bigger than this: */
> >> + int count = 64;
> >> +
> >> + ebl->cached_regs_mapping = (int *)calloc (count, sizeof(int));
> >
> > Can this code ever be reached if ebl->cached_regs_mapping isn't NULL?
> > If so then the previous cached_regs_mapping should be freed.
> Good point, providing packets with different perf_regs_mask triggers
> replacement of the cached mapping, in which case the previous one
> should be freed.
>
> >> + ebl->cached_n_regs_mapping = count;
> >> +
> >> + int j, k; uint64_t bit;
> >> + for (j = 0, k = 0, bit = 1;
> >> + k < count; k++, bit <<= 1)
> >> + {
> >> + ebl->cached_regs_mapping[k] = -1;
> >> + if ((bit & perf_regs_mask)) {
> >> + ebl->cached_regs_mapping[j] = k;
> >> + j++;
> >> + }
> >> + }
> >
> > If I understand this correctly then the length of
> > ebl->cached_regs_mapping is always 64 and only the first j entries are
> > meaningful. In x86_initreg_sample.c it looks like the
> > cached_regs_mapping has only as many entries as needed. Would that
> > approach work here?
> No, because this code is architecture-independent
> -- it handles cases where perf_events and dwarf use
> an identical ordering for the register file.
> 64 is a reasonable upper bound, we would need to add
> another constant to the ebl to track what the lower-bound
> would be for each arch (seems like fussy extra optimization
> for this case).
>
> The mapping based on perf_regs_mask is still necessary
> to compute because perf_events may only be providing a subset
> of the register file.
Ok sounds reasonable.
>
> >> +/* Called by the initialization functions for backends which support
> >> + hook sample_perf_regs_mapping(). */
> >> +void
> >> +internal_function
> >> +__libebl_init_cached_regs_mapping (Ebl *eh)
> >> +{
> >> + eh->cached_perf_regs_mask = 0;
> >> + eh->cached_regs_mapping = NULL;
> >> + eh->cached_n_regs_mapping = -1;
> >
> > cached_n_regs_mapping is a size_t so this actually sets it to
> > SIZE_MAX. I would use just use SIZE_MAX here to make this clear and to
> > avoid triggering warnings. Does code related to cached_n_regs_mapping
> > account for SIZE_MAX being the invalid/unset marker? Another option is
> > to use 0 plus a NULL cached_regs_mapping to indicate invalid/unset.
> I would assign SIZE_MAX. The unset marker is actually cached_perf_regs_mask
> as discussed above, so it doesn't matter what eh->cached_n_regs_mapping is
> as long as it's clearly invalid.
Ack.
Aaron
>
> -- Serhei
>