Some comments as I prepare the patchset for re-submission.
On Sun, Apr 5, 2026, at 8:22 PM, Aaron Merey wrote:
>> + const int *regs_mapping, size_t
>> n_regs_mapping,
>> + ebl_tid_registers_t *setfunc,
>> + void *arg)
>> +{
>> +/* TODO(REVIEW): The #ifdef here seems strictly optional as we don't
>> + refer to perf_events or ptrace arch-specific declarations. */
>
> This builds for me on x86_64 with the #if !defined(__arch64__) stub
> removed. If we wanted to support reading aarch64 samples on a
> non-aarch64 host then we don't want the stub in any case.
Sounds good, removing the #ifdef and testing on x86 to confirm.
>> + /* TODO(REVIEW): May need to obtain PAC mask since the unwinder
>> + needs to strip it from LR/X30 to handle pointer
>> + authentication. */
>
> It looks like PAC support was added to aarch64_initreg.c recently in
> commit 52a747a316. I'm not familiar with the details of aarch64 PAC
> but it sounds like a mask may need to be applied to sample register
> values in some cases.
Unfortunately, perf_events does not make working with PAC easy
-- we need an additional operation (ptrace query?) to get the PAC mask
and then the information needs to be provided to libebl somehow.
Probably out of scope for this patchset given the API design questions
involved.
It may also be possible to handle most cases by stripping top 16 bits,
but I'm not confident enough in this heuristic to recommend it.
The CentOS Stream systems I've used for testing don't seem to require
handling PAC, so I will leave the TODO in place for this release,
and return to this issue when I find a test system for which having
the functionality matters.
>> + /* TODO: Register locations could be cached and rechecked on a
>> + fastpath without needing to loop, though the overhead reduction
>> + is minimal. */
>
> That's fine to leave for a future patch.
Exactly, I don't see any performance issues when testing,
it's more that the redundancy 'feels' improvable in future.
>> + int j, need_sp = (sp != NULL), need_pc = (pc != NULL);
>> + for (j = 0; (need_sp || need_pc) && n_regs_mapping > (uint32_t)j; j++)
>> + {
>> + if (n_regs < (uint32_t)j) break;
>
> I think this should be <= instead of <. Otherwise regs[n_regs] could
> be read which is one past the end.
Agreed, fixing.
>> - /* If sample_perf_regs_mapping is unsupported then PERF_FRAME_REGS_MASK
>> is zero. */
>> - assert (ebl->sample_perf_regs_mapping != NULL);
>> - return ebl->sample_perf_regs_mapping (ebl, perf_regs_mask, abi,
>> - regs_mapping, n_regs_mapping);
>> + /* If sample_perf_regs_mapping is unsupported then perf_frame_regs_mask
>> is zero. */
>> + assert (ebl->perf_frame_regs_mask != 0);
>
> Can we replace this assert with error handling? We have been
> gradually removing asserts from elfutils libraries.
Ok, your call. I can have the function return false instead.
Existing handler paths set DWFL_E_LIBEBL_BAD with message
"Internal error due to ebl", which is not too different from an
assertion failure from user's PoV -- the cause is attempting to
use the functionality on an architecture that doesn't support it.
(There's also DWFL_E_LIBEBL which has a much more confusing
message associated with it. Sticking with DWFL_E_LIBEBL_BAD.)
>> + if (perf_regs_mask != 0 && ebl->cached_perf_regs_mask == perf_regs_mask)
>
> Does cached_perf_regs_mask ever get set on ARM arches beyond the
> default value of 0? It's set in x86_initreg_sample.c but I don't see
> an ARM equivilent.
Thanks for the catch.
perf_regs_mask is provided by the user.
cached_perf_regs_mask should be set directly below in the current function
-- its nonzero value is an indication that a mapping has been cached.
Adding the missing assignment.
For clarity, I think the conditional should be:
if (ebl->cached_perf_regs_mask != 0
&& ebl->cached_perf_regs_mask == perf_regs_mask)
i.e. if a mapping has been cached and corresponds to
the perf_regs_mask the user is requesting, return existing data.
(Otherwise, proceed to compute a new mapping.)
>> + {
>> + *regs_mapping = ebl->cached_regs_mapping;
>> + *n_regs_mapping = ebl->cached_n_regs_mapping;
>> + return true;
>> + }
>> +
>> + /* XXX Unwind-relevant register file should be no bigger than this: */
>> + int count = 64;
>> +
>> + ebl->cached_regs_mapping = (int *)calloc (count, sizeof(int));
>
> Can this code ever be reached if ebl->cached_regs_mapping isn't NULL?
> If so then the previous cached_regs_mapping should be freed.
Good point, providing packets with different perf_regs_mask triggers
replacement of the cached mapping, in which case the previous one
should be freed.
>> + ebl->cached_n_regs_mapping = count;
>> +
>> + int j, k; uint64_t bit;
>> + for (j = 0, k = 0, bit = 1;
>> + k < count; k++, bit <<= 1)
>> + {
>> + ebl->cached_regs_mapping[k] = -1;
>> + if ((bit & perf_regs_mask)) {
>> + ebl->cached_regs_mapping[j] = k;
>> + j++;
>> + }
>> + }
>
> If I understand this correctly then the length of
> ebl->cached_regs_mapping is always 64 and only the first j entries are
> meaningful. In x86_initreg_sample.c it looks like the
> cached_regs_mapping has only as many entries as needed. Would that
> approach work here?
No, because this code is architecture-independent
-- it handles cases where perf_events and dwarf use
an identical ordering for the register file.
64 is a reasonable upper bound, we would need to add
another constant to the ebl to track what the lower-bound
would be for each arch (seems like fussy extra optimization
for this case).
The mapping based on perf_regs_mask is still necessary
to compute because perf_events may only be providing a subset
of the register file.
>> +/* Called by the initialization functions for backends which support
>> + hook sample_perf_regs_mapping(). */
>> +void
>> +internal_function
>> +__libebl_init_cached_regs_mapping (Ebl *eh)
>> +{
>> + eh->cached_perf_regs_mask = 0;
>> + eh->cached_regs_mapping = NULL;
>> + eh->cached_n_regs_mapping = -1;
>
> cached_n_regs_mapping is a size_t so this actually sets it to
> SIZE_MAX. I would use just use SIZE_MAX here to make this clear and to
> avoid triggering warnings. Does code related to cached_n_regs_mapping
> account for SIZE_MAX being the invalid/unset marker? Another option is
> to use 0 plus a NULL cached_regs_mapping to indicate invalid/unset.
I would assign SIZE_MAX. The unset marker is actually cached_perf_regs_mask
as discussed above, so it doesn't matter what eh->cached_n_regs_mapping is
as long as it's clearly invalid.
-- Serhei