Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-19 Thread Rob Herring
On Mon, Apr 19, 2021 at 11:14 AM Will Deacon  wrote:
>
> On Thu, Apr 08, 2021 at 01:38:17PM -0500, Rob Herring wrote:
> > On Thu, Apr 8, 2021 at 6:08 AM Mark Rutland  wrote:
> > > On Wed, Apr 07, 2021 at 01:44:37PM +0100, Will Deacon wrote:
> > > > On Thu, Apr 01, 2021 at 02:45:21PM -0500, Rob Herring wrote:
> > > > > On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
> > > > I guess I'm just worried about exposing the counters to userspace after
> > > > the PMU driver (or perf core?) thinks that they're no longer exposed in
> > > > case we leak other events.
> > >
> > > IMO that's not practically different from the single-PMU case (i.e.
> > > multi-PMU isn't material, either we have a concern with leaking or we
> > > don't); more on that below.
>
> Well, maybe. It looks like the single-PMU case is exposed to the same issue,
> but I think a solution needs to take into account the multi-PMU situation.
>
> > > While it looks odd to place this on the mm, I don't think it's the end
> > > of the world.
> > >
> > > > However, I'm not sure how this is supposed to work normally: what
> > > > happens if e.g. a privileged user has a per-cpu counter for a kernel
> > > > event while a task has a counter with direct access -- can that task
> > > > read the kernel event out of the PMU registers from userspace?
> > >
> > > Yes -- userspace could go read any counters even though it isn't
> > > supposed to, and could potentially infer information from those. It
> > > won't have access to the config registers or kernel data structures, so
> > > it isn't guaranteed to know what the event is or when it is
> > > context-switched/reprogrammed/etc.
> > >
> > > If we believe that's a problem, then it's difficult to do anything
> > > robust other than denying userspace access entirely, since disabling
> > > userspace access while in use would surprise applications, and denying
> > > privileged events would need some global state that we consult at event
> > > creation time (in addition to being an inversion of privilege).
> > >
> > > IIRC there was some fuss about this a while back on x86; I'll go dig and
> > > see what I can find, unless Peter has a memory...
> >
> > Maybe this one[1].
> >
> > Rob
> >
> > [1] 
> > https://lore.kernel.org/lkml/20200730123815.18518-1-kan.li...@linux.intel.com/
>
> Going through the archives and talking to Peter, it looks like this is still
> an active area of concern:
>
>   - There are patches to clear "dirty" counters on context-switch. They were
> queued for 5.13 but broke -tip on Friday:
>
> 
> https://lore.kernel.org/lkml/yhm%2fm4za2lpry...@hirez.programming.kicks-ass.net/

Yes, nice timing. I've reworked the arm64 support to do the same
things (minus the breakage). And it looks like we can simplify things
a bit by moving all the context switch handling into .sched_task() and
out of switch_mm. Unless there's some case where that wouldn't work
that I'm not aware of (entirely likely).
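A rough sketch of that direction (not the posted series; this assumes the
v5.12-era pmu::sched_task() callback signature and reuses the
mm_context.pmu_direct_access count and PMUSERENR bits from this patch):

static void armv8pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
{
	/*
	 * On sched-in, grant EL0 event/cycle counter reads only if the
	 * incoming mm has user-access events mapped; otherwise keep EL0
	 * access disabled so counter state can't leak across tasks.
	 */
	if (sched_in &&
	    atomic_read(&current->active_mm->context.pmu_direct_access))
		write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR,
			     pmuserenr_el0);
	else
		write_sysreg(0, pmuserenr_el0);
}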

>   - Per-cpu events cannot be protected in software:
>
> 
> https://lore.kernel.org/lkml/calcetrvvpzud_hq8xoomhn_wwrqjuvroect2do4_d4rozoa...@mail.gmail.com/
>
> so without hardware support, we need a way to disable user access for
> people who care about this leakage.
>
> x86 has an "rdpmc" file exposed for the PMU device in sysfs which allows
> access to be disabled. I don't think these patches add such a thing, and
> that's where the fun with multi-PMU machines would come into play.

Is the fun because sysfs will end up with multiple 'rdpmc' files, or
something else?

Rob


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-19 Thread Will Deacon
On Thu, Apr 08, 2021 at 01:38:17PM -0500, Rob Herring wrote:
> On Thu, Apr 8, 2021 at 6:08 AM Mark Rutland  wrote:
> > On Wed, Apr 07, 2021 at 01:44:37PM +0100, Will Deacon wrote:
> > > On Thu, Apr 01, 2021 at 02:45:21PM -0500, Rob Herring wrote:
> > > > On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
> > > I guess I'm just worried about exposing the counters to userspace after
> > > the PMU driver (or perf core?) thinks that they're no longer exposed in
> > > case we leak other events.
> >
> > IMO that's not practically different from the single-PMU case (i.e.
> > multi-PMU isn't material, either we have a concern with leaking or we
> > don't); more on that below.

Well, maybe. It looks like the single-PMU case is exposed to the same issue,
but I think a solution needs to take into account the multi-PMU situation.

> > While it looks odd to place this on the mm, I don't think it's the end
> > of the world.
> >
> > > However, I'm not sure how this is supposed to work normally: what
> > > happens if e.g. a privileged user has a per-cpu counter for a kernel
> > > event while a task has a counter with direct access -- can that task
> > > read the kernel event out of the PMU registers from userspace?
> >
> > Yes -- userspace could go read any counters even though it isn't
> > supposed to, and could potentially infer information from those. It
> > won't have access to the config registers or kernel data structures, so
> > it isn't guaranteed to know what the event is or when it is
> > context-switched/reprogrammed/etc.
> >
> > If we believe that's a problem, then it's difficult to do anything
> > robust other than denying userspace access entirely, since disabling
> > userspace access while in use would surprise applications, and denying
> > privileged events would need some global state that we consult at event
> > creation time (in addition to being an inversion of privilege).
> >
> > IIRC there was some fuss about this a while back on x86; I'll go dig and
> > see what I can find, unless Peter has a memory...
> 
> Maybe this one[1].
> 
> Rob
> 
> [1] 
> https://lore.kernel.org/lkml/20200730123815.18518-1-kan.li...@linux.intel.com/

Going through the archives and talking to Peter, it looks like this is still
an active area of concern:

  - There are patches to clear "dirty" counters on context-switch. They were
queued for 5.13 but broke -tip on Friday:


https://lore.kernel.org/lkml/yhm%2fm4za2lpry...@hirez.programming.kicks-ass.net/

  - Per-cpu events cannot be protected in software:


https://lore.kernel.org/lkml/calcetrvvpzud_hq8xoomhn_wwrqjuvroect2do4_d4rozoa...@mail.gmail.com/

so without hardware support, we need a way to disable user access for
people who care about this leakage.

x86 has an "rdpmc" file exposed for the PMU device in sysfs which allows
access to be disabled. I don't think these patches add such a thing, and
that's where the fun with multi-PMU machines would come into play.

Will


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-08 Thread Rob Herring
On Thu, Apr 8, 2021 at 6:08 AM Mark Rutland  wrote:
>
> On Wed, Apr 07, 2021 at 01:44:37PM +0100, Will Deacon wrote:
> > [Moving Mark to To: since I'd like his view on this]
> >
> > On Thu, Apr 01, 2021 at 02:45:21PM -0500, Rob Herring wrote:
> > > On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
> > > >
> > > > On Tue, Mar 30, 2021 at 12:09:38PM -0500, Rob Herring wrote:
> > > > > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > > > >
> > > > > > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > > > > > From: Raphael Gault 
>
> > > > > > > +static void armv8pmu_event_unmapped(struct perf_event *event, 
> > > > > > > struct mm_struct *mm)
> > > > > > > +{
> > > > > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > > > > +
> > > > > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > > > > + return;
> > > > > > > +
> > > > > > > + if (atomic_dec_and_test(&mm->context.pmu_direct_access))
> > > > > > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > > > > > refresh_pmuserenr, mm, 1);
> > > > > >
> > > > > > Given that the pmu_direct_access field is global per-mm, won't this 
> > > > > > go
> > > > > > wrong if multiple PMUs are opened by the same process but only a 
> > > > > > subset
> > > > > > are exposed to EL0? Perhaps pmu_direct_access should be treated as 
> > > > > > a mask
> > > > > > rather than a counter, so that we can 'and' it with the 
> > > > > > supported_cpus for
> > > > > > the PMU we're dealing with.
> > > > >
> > > > > It needs to be a count to support multiple events on the same PMU. If
> > > > > the event is not enabled for EL0, then we'd exit out on the
> > > > > ARMPMU_EL0_RD_CNTR check. So I think we're fine.
> > > >
> > > > I'm still not convinced; pmu_direct_access is shared between PMUs, so
> > > > testing the result of atomic_dec_and_test() just doesn't make sense to
> > > > me, as another PMU could be playing with the count.
> > >
> > > How is that a problem? Let's make a concrete example:
> > >
> > > map PMU1:event2 -> pmu_direct_access = 1 -> enable access
> > > map PMU2:event3 -> pmu_direct_access = 2
> > > map PMU1:event4 -> pmu_direct_access = 3
> > > unmap PMU2:event3 -> pmu_direct_access = 2
> > > unmap PMU1:event2 -> pmu_direct_access = 1
> > > unmap PMU1:event4 -> pmu_direct_access = 0 -> disable access
> > >
> > > The only issue I can see is PMU2 remains enabled for user access until
> > > the last unmap. But we're sharing the mm, so who cares? Also, in this
> > > scenario it is the user's problem to pin themselves to cores sharing a
> > > PMU. If the user doesn't do that, they get to keep the pieces.
> >
> > I guess I'm just worried about exposing the counters to userspace after
> > the PMU driver (or perf core?) thinks that they're no longer exposed in
> > case we leak other events.
>
> IMO that's not practically different from the single-PMU case (i.e.
> multi-PMU isn't material, either we have a concern with leaking or we
> don't); more on that below.
>
> While it looks odd to place this on the mm, I don't think it's the end
> of the world.
>
> > However, I'm not sure how this is supposed to work normally: what
> > happens if e.g. a privileged user has a per-cpu counter for a kernel
> > event while a task has a counter with direct access -- can that task
> > read the kernel event out of the PMU registers from userspace?
>
> Yes -- userspace could go read any counters even though it isn't
> supposed to, and could potentially infer information from those. It
> won't have access to the config registers or kernel data structures, so
> it isn't guaranteed to know what the event is or when it is
> context-switched/reprogrammed/etc.
>
> If we believe that's a problem, then it's difficult to do anything
> robust other than denying userspace access entirely, since disabling
> userspace access while in use would surprise applications, and denying
> privileged events would need some global state that we consult at event
> creation time (in addition to being an inversion of privilege).
>
> IIRC there was some fuss about this a while back on x86; I'll go dig and
> see what I can find, unless Peter has a memory...

Maybe this one[1].

Rob

[1] 
https://lore.kernel.org/lkml/20200730123815.18518-1-kan.li...@linux.intel.com/


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-08 Thread Mark Rutland
On Wed, Apr 07, 2021 at 01:44:37PM +0100, Will Deacon wrote:
> [Moving Mark to To: since I'd like his view on this]
> 
> On Thu, Apr 01, 2021 at 02:45:21PM -0500, Rob Herring wrote:
> > On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
> > >
> > > On Tue, Mar 30, 2021 at 12:09:38PM -0500, Rob Herring wrote:
> > > > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > > >
> > > > > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > > > > From: Raphael Gault 

> > > > > > +static void armv8pmu_event_unmapped(struct perf_event *event, 
> > > > > > struct mm_struct *mm)
> > > > > > +{
> > > > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > > > +
> > > > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > > > + return;
> > > > > > +
> > > > > > + if (atomic_dec_and_test(&mm->context.pmu_direct_access))
> > > > > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > > > > refresh_pmuserenr, mm, 1);
> > > > >
> > > > > Given that the pmu_direct_access field is global per-mm, won't this go
> > > > > wrong if multiple PMUs are opened by the same process but only a 
> > > > > subset
> > > > > are exposed to EL0? Perhaps pmu_direct_access should be treated as a 
> > > > > mask
> > > > > rather than a counter, so that we can 'and' it with the 
> > > > > supported_cpus for
> > > > > the PMU we're dealing with.
> > > >
> > > > It needs to be a count to support multiple events on the same PMU. If
> > > > the event is not enabled for EL0, then we'd exit out on the
> > > > ARMPMU_EL0_RD_CNTR check. So I think we're fine.
> > >
> > > I'm still not convinced; pmu_direct_access is shared between PMUs, so
> > > testing the result of atomic_dec_and_test() just doesn't make sense to
> > > me, as another PMU could be playing with the count.
> > 
> > How is that a problem? Let's make a concrete example:
> > 
> > map PMU1:event2 -> pmu_direct_access = 1 -> enable access
> > map PMU2:event3 -> pmu_direct_access = 2
> > map PMU1:event4 -> pmu_direct_access = 3
> > unmap PMU2:event3 -> pmu_direct_access = 2
> > unmap PMU1:event2 -> pmu_direct_access = 1
> > unmap PMU1:event4 -> pmu_direct_access = 0 -> disable access
> > 
> > The only issue I can see is PMU2 remains enabled for user access until
> > the last unmap. But we're sharing the mm, so who cares? Also, in this
> > scenario it is the user's problem to pin themselves to cores sharing a
> > PMU. If the user doesn't do that, they get to keep the pieces.
> 
> I guess I'm just worried about exposing the counters to userspace after
> the PMU driver (or perf core?) thinks that they're no longer exposed in
> case we leak other events.

IMO that's not practically different from the single-PMU case (i.e.
multi-PMU isn't material, either we have a concern with leaking or we
don't); more on that below.

While it looks odd to place this on the mm, I don't think it's the end
of the world.

> However, I'm not sure how this is supposed to work normally: what
> happens if e.g. a privileged user has a per-cpu counter for a kernel
> event while a task has a counter with direct access -- can that task
> read the kernel event out of the PMU registers from userspace?

Yes -- userspace could go read any counters even though it isn't
supposed to, and could potentially infer information from those. It
won't have access to the config registers or kernel data structures, so
it isn't guaranteed to know what the event is or when it is
context-switched/reprogrammed/etc.

If we believe that's a problem, then it's difficult to do anything
robust other than denying userspace access entirely, since disabling
userspace access while in use would surprise applications, and denying
privileged events would need some global state that we consult at event
creation time (in addition to being an inversion of privilege).

IIRC there was some fuss about this a while back on x86; I'll go dig and
see what I can find, unless Peter has a memory...

Thanks,
Mark.


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-07 Thread Will Deacon
[Moving Mark to To: since I'd like his view on this]

On Thu, Apr 01, 2021 at 02:45:21PM -0500, Rob Herring wrote:
> On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
> >
> > On Tue, Mar 30, 2021 at 12:09:38PM -0500, Rob Herring wrote:
> > > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > >
> > > > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > > > From: Raphael Gault 
> > > > >
> > > > > Keep track of event opened with direct access to the hardware counters
> > > > > and modify permissions while they are open.
> > > > >
> > > > > The strategy used here is the same which x86 uses: every time an event
> > > > > is mapped, the permissions are set if required. The atomic field added
> > > > > in the mm_context helps keep track of the different event opened and
> > > > > de-activate the permissions when all are unmapped.
> > > > > We also need to update the permissions in the context switch code so
> > > > > that tasks keep the right permissions.
> > > > >
> > > > > In order to enable 64-bit counters for userspace when available, a new
> > > > > config1 bit is added for userspace to indicate it wants userspace 
> > > > > counter
> > > > > access. This bit allows the kernel to decide whether chaining should be
> > > > > disabled, since chaining and userspace access are incompatible.
> > > > > The modes for config1 are as follows:
> > > > >
> > > > > config1 = 0 or 2 : user access enabled and always 32-bit
> > > > > config1 = 1 : user access disabled and always 64-bit (using chaining 
> > > > > if needed)
> > > > > config1 = 3 : user access enabled and counter size matches underlying 
> > > > > counter.
> > > >
> > > > In this last case, how does userspace know whether it got a 32-bit or a
> > > > 64-bit counter?
> > >
> > > pmc_width in the user page. If the read loop is correct, then it
> > > doesn't matter to the user.
> >
> > Gotcha; please mention that in this comment,
> >
> > > > > +static void refresh_pmuserenr(void *mm)
> > > > > +{
> > > > > + if (mm == current->active_mm)
> > > > > + perf_switch_user_access(mm);
> > > > > +}
> > > > > +
> > > > > +static void armv8pmu_event_mapped(struct perf_event *event, struct 
> > > > > mm_struct *mm)
> > > > > +{
> > > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > > +
> > > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > > + return;
> > > > > +
> > > > > + /*
> > > > > +  * This function relies on not being called concurrently in two
> > > > > +  * tasks in the same mm.  Otherwise one task could observe
> > > > > +  * pmu_direct_access > 1 and return all the way back to
> > > > > +  * userspace with user access disabled while another task is 
> > > > > still
> > > > > +  * doing on_each_cpu_mask() to enable user access.
> > > > > +  *
> > > > > +  * For now, this can't happen because all callers hold mmap_lock
> > > > > +  * for write.  If this changes, we'll need a different solution.
> > > > > +  */
> > > > > + lockdep_assert_held_write(&mm->mmap_lock);
> > > > > +
> > > > > + if (atomic_inc_return(&mm->context.pmu_direct_access) == 1)
> > > > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > > > refresh_pmuserenr, mm, 1);
> > > > > +}
> > > >
> > > > Why do we need to cross-call here? Seems like it would be a tonne 
> > > > simpler to
> > > > handle the trap. Is there a reason not to do that?
> > >
> > > Alignment with x86? You didn't like the trap handler either:
> > >
> > > > Hmm... this feels pretty fragile since, although we may expect 
> > > > userspace only
> > > > to trigger this in the context of the specific perf use-case, we don't 
> > > > have
> > > > a way to detect that, so the ABI we're exposing is that EL0 accesses to
> > > > non-existent counters will return 0. I don't really think that's 
> > > > something
> > > > we want to commit to.
> > >
> > > Full mail: 
> > > https://lore.kernel.org/r/20200928182601.GA11974@willie-the-truck
> > >
> > > The handler would be different, but we'd still have the problem of not
> > > being able to detect the use case, right?
> >
> > Well hang on, the thing you link to was about _emulating_ the access. I'm
> > not saying we should do that, but rather just enable EL0 PMU access lazily
> > instead of using the IPI. We're going to enable it when we context-switch
> > anyway.
> >
> > The alternative is enabling EL0 access unconditionally and leaving it
> > enabled.
> 
> That BTW is what x86 originally did. This pmu_direct_access stuff came
> about as part of locking down access a bit.
> 
> In any case, I've modified this to trap. It avoids the mmap lock
> dependency too, so that's nice.
> 
> > > > > +
> > > > > +static void armv8pmu_event_unmapped(struct perf_event *event, struct 
> > > > > mm_struct *mm)
> > > > > +{
> > > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > > +
> > > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > > + 

Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-01 Thread Rob Herring
On Wed, Mar 31, 2021 at 11:01 AM Will Deacon  wrote:
>
> On Tue, Mar 30, 2021 at 12:09:38PM -0500, Rob Herring wrote:
> > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > >
> > > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > > From: Raphael Gault 
> > > >
> > > > Keep track of event opened with direct access to the hardware counters
> > > > and modify permissions while they are open.
> > > >
> > > > The strategy used here is the same which x86 uses: every time an event
> > > > is mapped, the permissions are set if required. The atomic field added
> > > > in the mm_context helps keep track of the different event opened and
> > > > de-activate the permissions when all are unmapped.
> > > > We also need to update the permissions in the context switch code so
> > > > that tasks keep the right permissions.
> > > >
> > > > In order to enable 64-bit counters for userspace when available, a new
> > > > config1 bit is added for userspace to indicate it wants userspace 
> > > > counter
> > > > access. This bit allows the kernel to decide whether chaining should be
> > > > disabled, since chaining and userspace access are incompatible.
> > > > The modes for config1 are as follows:
> > > >
> > > > config1 = 0 or 2 : user access enabled and always 32-bit
> > > > config1 = 1 : user access disabled and always 64-bit (using chaining if 
> > > > needed)
> > > > config1 = 3 : user access enabled and counter size matches underlying 
> > > > counter.
> > >
> > > In this last case, how does userspace know whether it got a 32-bit or a
> > > 64-bit counter?
> >
> > pmc_width in the user page. If the read loop is correct, then it
> > doesn't matter to the user.
>
> Gotcha; please mention that in this comment,
>
> > > > +static void refresh_pmuserenr(void *mm)
> > > > +{
> > > > + if (mm == current->active_mm)
> > > > + perf_switch_user_access(mm);
> > > > +}
> > > > +
> > > > +static void armv8pmu_event_mapped(struct perf_event *event, struct 
> > > > mm_struct *mm)
> > > > +{
> > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > +
> > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > + return;
> > > > +
> > > > + /*
> > > > +  * This function relies on not being called concurrently in two
> > > > +  * tasks in the same mm.  Otherwise one task could observe
> > > > +  * pmu_direct_access > 1 and return all the way back to
> > > > +  * userspace with user access disabled while another task is still
> > > > +  * doing on_each_cpu_mask() to enable user access.
> > > > +  *
> > > > +  * For now, this can't happen because all callers hold mmap_lock
> > > > +  * for write.  If this changes, we'll need a different solution.
> > > > +  */
> > > > + lockdep_assert_held_write(&mm->mmap_lock);
> > > > +
> > > > + if (atomic_inc_return(&mm->context.pmu_direct_access) == 1)
> > > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > > refresh_pmuserenr, mm, 1);
> > > > +}
> > >
> > > Why do we need to cross-call here? Seems like it would be a tonne simpler 
> > > to
> > > handle the trap. Is there a reason not to do that?
> >
> > Alignment with x86? You didn't like the trap handler either:
> >
> > > Hmm... this feels pretty fragile since, although we may expect userspace 
> > > only
> > > to trigger this in the context of the specific perf use-case, we don't 
> > > have
> > > a way to detect that, so the ABI we're exposing is that EL0 accesses to
> > > non-existent counters will return 0. I don't really think that's something
> > > we want to commit to.
> >
> > Full mail: https://lore.kernel.org/r/20200928182601.GA11974@willie-the-truck
> >
> > The handler would be different, but we'd still have the problem of not
> > being able to detect the use case, right?
>
> Well hang on, the thing you link to was about _emulating_ the access. I'm
> not saying we should do that, but rather just enable EL0 PMU access lazily
> instead of using the IPI. We're going to enable it when we context-switch
> anyway.
>
> The alternative is enabling EL0 access unconditionally and leaving it
> enabled.

That BTW is what x86 originally did. This pmu_direct_access stuff came
about as part of locking down access a bit.

In any case, I've modified this to trap. It avoids the mmap lock
dependency too, so that's nice.

> > > > +
> > > > +static void armv8pmu_event_unmapped(struct perf_event *event, struct 
> > > > mm_struct *mm)
> > > > +{
> > > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > > +
> > > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > > + return;
> > > > +
> > > > + if (atomic_dec_and_test(&mm->context.pmu_direct_access))
> > > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > > refresh_pmuserenr, mm, 1);
> > >
> > > Given that the pmu_direct_access field is global per-mm, won't this go
> > > wrong if multiple PMUs are opened by the same process but only a subset
> > > are exposed to EL0? 

Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-04-01 Thread Will Deacon
On Wed, Mar 31, 2021 at 12:52:11PM -0500, Rob Herring wrote:
> On Wed, Mar 31, 2021 at 10:38 AM Will Deacon  wrote:
> >
> > On Tue, Mar 30, 2021 at 04:08:11PM -0500, Rob Herring wrote:
> > > On Tue, Mar 30, 2021 at 12:09 PM Rob Herring  wrote:
> > > > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > > > The logic here feels like it
> > > > > could do with a bit of untangling.
> > > >
> > > > Yes, I don't love it, but couldn't come up with anything better. It is
> > > > complicated by the fact that flags have to be set before we assign the
> > > > counter and can't set/change them when we assign the counter. It would
> > > > take a lot of refactoring with armpmu code to fix that.
> > >
> > > How's this instead?:
> > >
> > > if (armv8pmu_event_want_user_access(event) || 
> > > !armv8pmu_event_is_64bit(event))
> > > event->hw.flags |= ARMPMU_EL0_RD_CNTR;
> > >
> > > /*
> > > * At this point, the counter is not assigned. If a 64-bit counter is
> > > * requested, we must make sure the h/w has 64-bit counters if we set
> > > * the event size to 64-bit because chaining is not supported with
> > > * userspace access. This may still fail later on if the CPU cycle
> > > * counter is in use.
> > > */
> > > if (armv8pmu_event_is_64bit(event) &&
> > > (!armv8pmu_event_want_user_access(event) ||
> > >  armv8pmu_has_long_event(cpu_pmu) || (hw_event_id ==
> > > ARMV8_PMUV3_PERFCTR_CPU_CYCLES)))
> > > event->hw.flags |= ARMPMU_EVT_64BIT;
> >
> > I thought there were some cases where we could assign cycles event to an
> > event counter; does that not happen anymore?
> 
> Yes, but if we hit that scenario when the user has asked for 64-bit
> user access, then we return an error later when assigning the counter.
> I think we can assume if users have gone to the trouble of requesting
> 64-bit counters, then they can deal with ensuring they don't have
> multiple users.
> 
> Otherwise, the only way I see to simplify this is to only support
> 64-bit counters in userspace when we have a v8.5 PMU.

I'm happy to start from that position, and then we can extend it later if
there's a need.

Will


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-31 Thread Rob Herring
On Wed, Mar 31, 2021 at 10:38 AM Will Deacon  wrote:
>
> On Tue, Mar 30, 2021 at 04:08:11PM -0500, Rob Herring wrote:
> > On Tue, Mar 30, 2021 at 12:09 PM Rob Herring  wrote:
> > > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > > The logic here feels like it
> > > > could do with a bit of untangling.
> > >
> > > Yes, I don't love it, but couldn't come up with anything better. It is
> > > complicated by the fact that flags have to be set before we assign the
> > > counter and can't set/change them when we assign the counter. It would
> > > take a lot of refactoring with armpmu code to fix that.
> >
> > How's this instead?:
> >
> > if (armv8pmu_event_want_user_access(event) || 
> > !armv8pmu_event_is_64bit(event))
> > event->hw.flags |= ARMPMU_EL0_RD_CNTR;
> >
> > /*
> > * At this point, the counter is not assigned. If a 64-bit counter is
> > * requested, we must make sure the h/w has 64-bit counters if we set
> > * the event size to 64-bit because chaining is not supported with
> > * userspace access. This may still fail later on if the CPU cycle
> > * counter is in use.
> > */
> > if (armv8pmu_event_is_64bit(event) &&
> > (!armv8pmu_event_want_user_access(event) ||
> >  armv8pmu_has_long_event(cpu_pmu) || (hw_event_id ==
> > ARMV8_PMUV3_PERFCTR_CPU_CYCLES)))
> > event->hw.flags |= ARMPMU_EVT_64BIT;
>
> I thought there were some cases where we could assign cycles event to an
> event counter; does that not happen anymore?

Yes, but if we hit that scenario when the user has asked for 64-bit
user access, then we return an error later when assigning the counter.
I think we can assume if users have gone to the trouble of requesting
64-bit counters, then they can deal with ensuring they don't have
multiple users.

Otherwise, the only way I see to simplify this is to only support
64-bit counters in userspace when we have a v8.5 PMU.

Rob


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-31 Thread Will Deacon
On Tue, Mar 30, 2021 at 12:09:38PM -0500, Rob Herring wrote:
> On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> >
> > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > From: Raphael Gault 
> > >
> > > Keep track of event opened with direct access to the hardware counters
> > > and modify permissions while they are open.
> > >
> > > The strategy used here is the same which x86 uses: every time an event
> > > is mapped, the permissions are set if required. The atomic field added
> > > in the mm_context helps keep track of the different event opened and
> > > de-activate the permissions when all are unmapped.
> > > We also need to update the permissions in the context switch code so
> > > that tasks keep the right permissions.
> > >
> > > In order to enable 64-bit counters for userspace when available, a new
> > > config1 bit is added for userspace to indicate it wants userspace counter
> > > access. This bit allows the kernel to decide whether chaining should be
> > > disabled, since chaining and userspace access are incompatible.
> > > The modes for config1 are as follows:
> > >
> > > config1 = 0 or 2 : user access enabled and always 32-bit
> > > config1 = 1 : user access disabled and always 64-bit (using chaining if 
> > > needed)
> > > config1 = 3 : user access enabled and counter size matches underlying 
> > > counter.
> >
> > In this last case, how does userspace know whether it got a 32-bit or a
> > 64-bit counter?
> 
> pmc_width in the user page. If the read loop is correct, then it
> doesn't matter to the user.

Gotcha; please mention that in this comment,

> > > +static void refresh_pmuserenr(void *mm)
> > > +{
> > > + if (mm == current->active_mm)
> > > + perf_switch_user_access(mm);
> > > +}
> > > +
> > > +static void armv8pmu_event_mapped(struct perf_event *event, struct 
> > > mm_struct *mm)
> > > +{
> > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > +
> > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > + return;
> > > +
> > > + /*
> > > +  * This function relies on not being called concurrently in two
> > > +  * tasks in the same mm.  Otherwise one task could observe
> > > +  * pmu_direct_access > 1 and return all the way back to
> > > +  * userspace with user access disabled while another task is still
> > > +  * doing on_each_cpu_mask() to enable user access.
> > > +  *
> > > +  * For now, this can't happen because all callers hold mmap_lock
> > > +  * for write.  If this changes, we'll need a different solution.
> > > +  */
> > > + lockdep_assert_held_write(&mm->mmap_lock);
> > > +
> > > + if (atomic_inc_return(&mm->context.pmu_direct_access) == 1)
> > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > refresh_pmuserenr, mm, 1);
> > > +}
> >
> > Why do we need to cross-call here? Seems like it would be a tonne simpler to
> > handle the trap. Is there a reason not to do that?
> 
> Alignment with x86? You didn't like the trap handler either:
> 
> > Hmm... this feels pretty fragile since, although we may expect userspace 
> > only
> > to trigger this in the context of the specific perf use-case, we don't have
> > a way to detect that, so the ABI we're exposing is that EL0 accesses to
> > non-existent counters will return 0. I don't really think that's something
> > we want to commit to.
> 
> Full mail: https://lore.kernel.org/r/20200928182601.GA11974@willie-the-truck
> 
> The handler would be different, but we'd still have the problem of not
> being able to detect the use case, right?

Well hang on, the thing you link to was about _emulating_ the access. I'm
not saying we should do that, but rather just enable EL0 PMU access lazily
instead of using the IPI. We're going to enable it when we context-switch
anyway.

The alternative is enabling EL0 access unconditionally and leaving it
enabled.

> > > +
> > > +static void armv8pmu_event_unmapped(struct perf_event *event, struct 
> > > mm_struct *mm)
> > > +{
> > > + struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > +
> > > + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> > > + return;
> > > +
> > > + if (atomic_dec_and_test(&mm->context.pmu_direct_access))
> > > + on_each_cpu_mask(&armpmu->supported_cpus,
> > > refresh_pmuserenr, mm, 1);
> >
> > Given that the pmu_direct_access field is global per-mm, won't this go
> > wrong if multiple PMUs are opened by the same process but only a subset
> > are exposed to EL0? Perhaps pmu_direct_access should be treated as a mask
> > rather than a counter, so that we can 'and' it with the supported_cpus for
> > the PMU we're dealing with.
> 
> It needs to be a count to support multiple events on the same PMU. If
> the event is not enabled for EL0, then we'd exit out on the
> ARMPMU_EL0_RD_CNTR check. So I think we're fine.

I'm still not convinced; pmu_direct_access is shared between PMUs, so
testing the result of atomic_dec_and_test() just doesn't make sense to
me, 

Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-31 Thread Will Deacon
On Tue, Mar 30, 2021 at 04:08:11PM -0500, Rob Herring wrote:
> On Tue, Mar 30, 2021 at 12:09 PM Rob Herring  wrote:
> > On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> > > The logic here feels like it
> > > could do with a bit of untangling.
> >
> > Yes, I don't love it, but couldn't come up with anything better. It is
> > complicated by the fact that flags have to be set before we assign the
> > counter and can't set/change them when we assign the counter. It would
> > take a lot of refactoring with armpmu code to fix that.
> 
> How's this instead?:
> 
> if (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event))
> event->hw.flags |= ARMPMU_EL0_RD_CNTR;
> 
> /*
> * At this point, the counter is not assigned. If a 64-bit counter is
> * requested, we must make sure the h/w has 64-bit counters if we set
> * the event size to 64-bit because chaining is not supported with
> * userspace access. This may still fail later on if the CPU cycle
> * counter is in use.
> */
> if (armv8pmu_event_is_64bit(event) &&
> (!armv8pmu_event_want_user_access(event) ||
>  armv8pmu_has_long_event(cpu_pmu) || (hw_event_id ==
> ARMV8_PMUV3_PERFCTR_CPU_CYCLES)))
> event->hw.flags |= ARMPMU_EVT_64BIT;

I thought there were some cases where we could assign cycles event to an
event counter; does that not happen anymore?

Will


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-30 Thread Rob Herring
On Tue, Mar 30, 2021 at 12:09 PM Rob Herring  wrote:
>
> On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
> >
> > On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > > From: Raphael Gault 
> > >
> > > Keep track of event opened with direct access to the hardware counters
> > > and modify permissions while they are open.
> > >
> > > The strategy used here is the same which x86 uses: every time an event
> > > is mapped, the permissions are set if required. The atomic field added
> > > in the mm_context helps keep track of the different event opened and
> > > de-activate the permissions when all are unmapped.
> > > We also need to update the permissions in the context switch code so
> > > that tasks keep the right permissions.
> > >
> > > In order to enable 64-bit counters for userspace when available, a new
> > > config1 bit is added for userspace to indicate it wants userspace counter
> > > access. This bit allows the kernel to decide whether chaining should be
> > > disabled, since chaining and userspace access are incompatible.
> > > The modes for config1 are as follows:
> > >
> > > config1 = 0 or 2 : user access enabled and always 32-bit
> > > config1 = 1 : user access disabled and always 64-bit (using chaining if 
> > > needed)
> > > config1 = 3 : user access enabled and counter size matches underlying 
> > > counter.

[...]

> > > @@ -980,9 +1032,23 @@ static int __armv8_pmuv3_map_event(struct 
> > > perf_event *event,
> > >  &armv8_pmuv3_perf_cache_map,
> > >  ARMV8_PMU_EVTYPE_EVENT);
> > >
> > > - if (armv8pmu_event_is_64bit(event))
> > > + if (armv8pmu_event_want_user_access(event) || 
> > > !armv8pmu_event_is_64bit(event)) {
> > > + event->hw.flags |= ARMPMU_EL0_RD_CNTR;
> >
> > Why do you set this for all 32-bit events?
>
> It goes back to the config1 bits as explained in the commit msg. We
> can always support user access for 32-bit counters, but for 64-bit
> counters the user has to request both user access and 64-bit counters.
> We could require explicit user access request for 32-bit access, but I
> thought it was better to not require userspace to do something Arm
> specific on open.
>
> > The logic here feels like it
> > could do with a bit of untangling.
>
> Yes, I don't love it, but couldn't come up with anything better. It is
> complicated by the fact that flags have to be set before we assign the
> counter and can't set/change them when we assign the counter. It would
> take a lot of refactoring with armpmu code to fix that.

How's this instead?:

if (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event))
	event->hw.flags |= ARMPMU_EL0_RD_CNTR;

/*
 * At this point, the counter is not assigned. If a 64-bit counter is
 * requested, we must make sure the h/w has 64-bit counters if we set
 * the event size to 64-bit because chaining is not supported with
 * userspace access. This may still fail later on if the CPU cycle
 * counter is in use.
 */
if (armv8pmu_event_is_64bit(event) &&
    (!armv8pmu_event_want_user_access(event) ||
     armv8pmu_has_long_event(cpu_pmu) ||
     (hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES)))
	event->hw.flags |= ARMPMU_EVT_64BIT;

> > > + /*
> > > +  * At this point, the counter is not assigned. If a 64-bit
> > > +  * counter is requested, we must make sure the h/w has 
> > > 64-bit
> > > +  * counters if we set the event size to 64-bit because 
> > > chaining
> > > +  * is not supported with userspace access. This may still 
> > > fail
> > > +  * later on if the CPU cycle counter is in use.
> > > +  */
> > > + if (armv8pmu_event_is_64bit(event) &&
> > > + (armv8pmu_has_long_event(armpmu) ||
> > > +  hw_event_id == ARMV8_PMUV3_PERFCTR_CPU_CYCLES))
> > > + event->hw.flags |= ARMPMU_EVT_64BIT;
> > > + } else if (armv8pmu_event_is_64bit(event))
> > >   event->hw.flags |= ARMPMU_EVT_64BIT;


Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-30 Thread Rob Herring
On Tue, Mar 30, 2021 at 10:31 AM Will Deacon  wrote:
>
> On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> > From: Raphael Gault 
> >
> > Keep track of event opened with direct access to the hardware counters
> > and modify permissions while they are open.
> >
> > The strategy used here is the same which x86 uses: every time an event
> > is mapped, the permissions are set if required. The atomic field added
> > in the mm_context helps keep track of the different event opened and
> > de-activate the permissions when all are unmapped.
> > We also need to update the permissions in the context switch code so
> > that tasks keep the right permissions.
> >
> > In order to enable 64-bit counters for userspace when available, a new
> > config1 bit is added for userspace to indicate it wants userspace counter
> > access. This bit allows the kernel to decide whether chaining should be
> > disabled, since chaining and userspace access are incompatible.
> > The modes for config1 are as follows:
> >
> > config1 = 0 or 2 : user access enabled and always 32-bit
> > config1 = 1 : user access disabled and always 64-bit (using chaining if 
> > needed)
> > config1 = 3 : user access enabled and counter size matches underlying 
> > counter.
>
> In this last case, how does userspace know whether it got a 32-bit or a
> 64-bit counter?

pmc_width in the user page. If the read loop is correct, then it
doesn't matter to the user.
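For reference, that read loop follows the self-monitoring pattern documented
for perf_event_mmap_page; a minimal sketch (read_pmc() is a placeholder for
the architecture-specific counter read -- mrs of PMEVCNTR<n>_EL0 on arm64,
rdpmc on x86 -- and is not a real API):

#include <stdint.h>
#include <linux/perf_event.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

/* Placeholder: arch-specific read of hardware counter 'idx'. */
extern uint64_t read_pmc(uint32_t idx);

static uint64_t read_event_count(struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx;
	uint64_t count;

	do {
		seq = pc->lock;
		barrier();

		idx = pc->index;	/* 0 means no counter is assigned */
		count = pc->offset;	/* kernel-maintained base count */

		if (pc->cap_user_rdpmc && idx) {
			int64_t pmc = read_pmc(idx - 1);

			/* Sign-extend from pmc_width so a 32-bit counter
			 * still combines correctly with the 64-bit offset. */
			pmc <<= 64 - pc->pmc_width;
			pmc >>= 64 - pc->pmc_width;
			count += pmc;
		}

		barrier();
	} while (pc->lock != seq);

	return count;
}

Because the kernel bumps 'offset' on every overflow interrupt, this sum stays
correct for 32-bit counters too; a true 64-bit counter just avoids those
interrupts, which is what the config1 bits above are negotiating.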

>
> > User access is enabled with config1 == 0 so that we match x86 behavior
> > and don't need Arm specific code (outside the counter read).
> >
> > Signed-off-by: Raphael Gault 
> > Signed-off-by: Rob Herring 
> > ---
> > I'm not completely sure if using current->active_mm in an IPI is okay?
> > It seems to work in my testing.
> >
> > Peter Z says (event->oncpu == smp_processor_id()) in the user page
> > update is always true, but my testing says otherwise[1].
>
> Peter? Sounds like there's either a misunderstanding here or we have some
> fundamental issue elsewhere.
>
> > v6:
> >  - Add new attr.config1 rdpmc bit for userspace to hint it wants
> >userspace access when also requesting 64-bit counters.
> >
> > v5:
> >  - Only set cap_user_rdpmc if event is on current cpu
> >  - Limit enabling/disabling access to CPUs associated with the PMU
> >(supported_cpus) and with the mm_struct matching current->active_mm.
> >
> > v2:
> >  - Move mapped/unmapped into arm64 code. Fixes arm32.
> >  - Rebase on cap_user_time_short changes
> >
> > Changes from Raphael's v4:
> >   - Drop homogeneous check
> >   - Disable access for chained counters
> >   - Set pmc_width in user page
> >
> > [1] 
> > https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/
> >
> > user fix
> > ---
> >  arch/arm64/include/asm/mmu.h |  5 ++
> >  arch/arm64/include/asm/mmu_context.h |  2 +
> >  arch/arm64/include/asm/perf_event.h  | 14 +
> >  arch/arm64/kernel/perf_event.c   | 86 +++-
> >  4 files changed, 104 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> > index 75beffe2ee8a..ee08447455da 100644
> > --- a/arch/arm64/include/asm/mmu.h
> > +++ b/arch/arm64/include/asm/mmu.h
> > @@ -18,6 +18,11 @@
> >
> >  typedef struct {
> >   atomic64_t  id;
> > + /*
> > +  * non-zero if userspace has access to hardware
> > +  * counters directly.
> > +  */
> > + atomic_tpmu_direct_access;
> >  #ifdef CONFIG_COMPAT
> >   void*sigpage;
> >  #endif
> > diff --git a/arch/arm64/include/asm/mmu_context.h 
> > b/arch/arm64/include/asm/mmu_context.h
> > index 70ce8c1d2b07..ccb5ff417b42 100644
> > --- a/arch/arm64/include/asm/mmu_context.h
> > +++ b/arch/arm64/include/asm/mmu_context.h
> > @@ -21,6 +21,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -230,6 +231,7 @@ static inline void __switch_mm(struct mm_struct *next)
> >   }
> >
> >   check_and_switch_context(next);
> > + perf_switch_user_access(next);
> >  }
> >
> >  static inline void
> > diff --git a/arch/arm64/include/asm/perf_event.h 
> > b/arch/arm64/include/asm/perf_event.h
> > index 60731f602d3e..112f3f63b79e 100644
> > --- a/arch/arm64/include/asm/perf_event.h
> > +++ b/arch/arm64/include/asm/perf_event.h
> > @@ -8,6 +8,7 @@
> >
> >  #include 
> >  #include 
> > +#include 
> >
> >  #define  ARMV8_PMU_MAX_COUNTERS  32
> >  #define  ARMV8_PMU_COUNTER_MASK  (ARMV8_PMU_MAX_COUNTERS - 1)
> > @@ -254,4 +255,17 @@ extern unsigned long perf_misc_flags(struct pt_regs 
> > *regs);
> >   (regs)->pstate = PSR_MODE_EL1h; \
> >  }
> >
> > +static inline void perf_switch_user_access(struct mm_struct *mm)
> > +{
> > + if (!IS_ENABLED(CONFIG_PERF_EVENTS))
> > + return;
>
> CONFIG_HW_PERF_EVENTS might be a better fit here.
>
> > +
> > + if (atomic_read(&mm->context.pmu_direct_access)) {
> > + 

Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-30 Thread Will Deacon
On Wed, Mar 10, 2021 at 05:08:29PM -0700, Rob Herring wrote:
> From: Raphael Gault 
> 
> Keep track of event opened with direct access to the hardware counters
> and modify permissions while they are open.
> 
> The strategy used here is the same which x86 uses: every time an event
> is mapped, the permissions are set if required. The atomic field added
> in the mm_context helps keep track of the different event opened and
> de-activate the permissions when all are unmapped.
> We also need to update the permissions in the context switch code so
> that tasks keep the right permissions.
> 
> In order to enable 64-bit counters for userspace when available, a new
> config1 bit is added for userspace to indicate it wants userspace counter
> access. This bit allows the kernel to decide whether chaining should be
> disabled, since chaining and userspace access are incompatible.
> The modes for config1 are as follows:
> 
> config1 = 0 or 2 : user access enabled and always 32-bit
> config1 = 1 : user access disabled and always 64-bit (using chaining if 
> needed)
> config1 = 3 : user access enabled and counter size matches underlying counter.

In this last case, how does userspace know whether it got a 32-bit or a
64-bit counter?

> User access is enabled with config1 == 0 so that we match x86 behavior
> and don't need Arm specific code (outside the counter read).
> 
> Signed-off-by: Raphael Gault 
> Signed-off-by: Rob Herring 
> ---
> I'm not completely sure if using current->active_mm in an IPI is okay?
> It seems to work in my testing.
> 
> Peter Z says (event->oncpu == smp_processor_id()) in the user page
> update is always true, but my testing says otherwise[1].

Peter? Sounds like there's either a misunderstanding here or we have some
fundamental issue elsewhere.

> v6:
>  - Add new attr.config1 rdpmc bit for userspace to hint it wants
>userspace access when also requesting 64-bit counters.
> 
> v5:
>  - Only set cap_user_rdpmc if event is on current cpu
>  - Limit enabling/disabling access to CPUs associated with the PMU
>(supported_cpus) and with the mm_struct matching current->active_mm.
> 
> v2:
>  - Move mapped/unmapped into arm64 code. Fixes arm32.
>  - Rebase on cap_user_time_short changes
> 
> Changes from Raphael's v4:
>   - Drop homogeneous check
>   - Disable access for chained counters
>   - Set pmc_width in user page
> 
> [1] 
> https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/
> 
> user fix
> ---
>  arch/arm64/include/asm/mmu.h |  5 ++
>  arch/arm64/include/asm/mmu_context.h |  2 +
>  arch/arm64/include/asm/perf_event.h  | 14 +
>  arch/arm64/kernel/perf_event.c   | 86 +++-
>  4 files changed, 104 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 75beffe2ee8a..ee08447455da 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -18,6 +18,11 @@
>  
>  typedef struct {
>   atomic64_t  id;
> + /*
> +  * non-zero if userspace has access to hardware
> +  * counters directly.
> +  */
> + atomic_tpmu_direct_access;
>  #ifdef CONFIG_COMPAT
>   void*sigpage;
>  #endif
> diff --git a/arch/arm64/include/asm/mmu_context.h 
> b/arch/arm64/include/asm/mmu_context.h
> index 70ce8c1d2b07..ccb5ff417b42 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -230,6 +231,7 @@ static inline void __switch_mm(struct mm_struct *next)
>   }
>  
>   check_and_switch_context(next);
> + perf_switch_user_access(next);
>  }
>  
>  static inline void
> diff --git a/arch/arm64/include/asm/perf_event.h 
> b/arch/arm64/include/asm/perf_event.h
> index 60731f602d3e..112f3f63b79e 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -8,6 +8,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define  ARMV8_PMU_MAX_COUNTERS  32
>  #define  ARMV8_PMU_COUNTER_MASK  (ARMV8_PMU_MAX_COUNTERS - 1)
> @@ -254,4 +255,17 @@ extern unsigned long perf_misc_flags(struct pt_regs 
> *regs);
>   (regs)->pstate = PSR_MODE_EL1h; \
>  }
>  
> +static inline void perf_switch_user_access(struct mm_struct *mm)
> +{
> + if (!IS_ENABLED(CONFIG_PERF_EVENTS))
> + return;

CONFIG_HW_PERF_EVENTS might be a better fit here.

> +
> + if (atomic_read(&mm->context.pmu_direct_access)) {
> + write_sysreg(ARMV8_PMU_USERENR_ER|ARMV8_PMU_USERENR_CR,
> +  pmuserenr_el0);
> + } else {
> + write_sysreg(0, pmuserenr_el0);
> + }
> +}
> +
>  #endif
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 387838496955..9ad3cc523ef4 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ 

Re: [PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-30 Thread Zachary Leaf

On 11/03/2021 00:08, Rob Herring wrote:


In order to enable 64-bit counters for userspace when available, a new
config1 bit is added for userspace to indicate it wants userspace counter
access. This bit allows the kernel to decide whether chaining should be
disabled, since chaining and userspace access are incompatible.
The modes for config1 are as follows:

config1 = 0 or 2 : user access enabled and always 32-bit
config1 = 1 : user access disabled and always 64-bit (using chaining if needed)
config1 = 3 : user access enabled and counter size matches underlying counter.



Thanks for this, Rob. That makes it extremely easy to request 64-bit
userspace counters without having to worry about the underlying bit
width supported on your system.

The underlying PMUv3 bit width is otherwise not accessible from
userspace as far as I can tell (e.g. the relevant PMUVer bits [11:8] of
ID_AA64DFR0_EL1 are masked off when reading from EL0 [1]), and the
workaround of requesting 64-bit, checking cap_user_rdpmc for userspace
access, and retrying with 32-bit was not that user-friendly. I think it
makes a lot of sense for the kernel to handle/expose this here rather
than handling it in application code.

It's worth mentioning here, in case anyone searches for this, that a
32-bit counter, once added to perf_event_mmap_page->offset, behaves
effectively the same as a single 64-bit counter, because the offset is
incremented on each overflow. Using a true 64-bit counter avoids the
overhead of handling an interrupt for each overflow (and obviously has a
lot more headroom before it overflows, if you require very long-running
perf stats).

I have tested the new config1 flags on N1-SDP and the behaviour is as
expected.

[1]
https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.rst
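To make the config1 usage concrete, a sketch of opening a user-readable
64-bit cycle counter under this series' encoding (error handling trimmed;
perf_event_open() is invoked via syscall(2) since glibc has no wrapper):

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	struct perf_event_mmap_page *pc;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.config1 = 0x3;	/* bit 0: 64-bit, bit 1: user (rdpmc) access */
	attr.exclude_kernel = 1;

	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* The first mmap'ed page is struct perf_event_mmap_page. */
	pc = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
	if (pc == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	printf("cap_user_rdpmc=%u pmc_width=%u index=%u\n",
	       (unsigned int)pc->cap_user_rdpmc, pc->pmc_width, pc->index);

	munmap(pc, page_size);
	close(fd);
	return 0;
}

Whether EL0 reads are actually allowed then shows up as cap_user_rdpmc in the
mapped page, and pmc_width tells the read loop how wide the counter really is.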


[PATCH v6 02/10] arm64: perf: Enable PMU counter direct access for perf event

2021-03-10 Thread Rob Herring
From: Raphael Gault 

Keep track of events opened with direct access to the hardware counters
and modify permissions while they are open.

The strategy used here is the same one x86 uses: every time an event
is mapped, the permissions are set if required. The atomic field added
in the mm_context helps keep track of the different events opened and
de-activates the permissions when all are unmapped.
We also need to update the permissions in the context switch code so
that tasks keep the right permissions.

In order to enable 64-bit counters for userspace when available, a new
config1 bit is added for userspace to indicate it wants userspace counter
access. This bit allows the kernel to decide whether chaining should be
disabled, since chaining and userspace access are incompatible.
The modes for config1 are as follows:

config1 = 0 or 2 : user access enabled and always 32-bit
config1 = 1 : user access disabled and always 64-bit (using chaining if needed)
config1 = 3 : user access enabled and counter size matches underlying counter.

User access is enabled with config1 == 0 so that we match x86 behavior
and don't need Arm specific code (outside the counter read).

Signed-off-by: Raphael Gault 
Signed-off-by: Rob Herring 
---
I'm not completely sure if using current->active_mm in an IPI is okay?
It seems to work in my testing.

Peter Z says (event->oncpu == smp_processor_id()) in the user page
update is always true, but my testing says otherwise[1].

v6:
 - Add new attr.config1 rdpmc bit for userspace to hint it wants
   userspace access when also requesting 64-bit counters.

v5:
 - Only set cap_user_rdpmc if event is on current cpu
 - Limit enabling/disabling access to CPUs associated with the PMU
   (supported_cpus) and with the mm_struct matching current->active_mm.

v2:
 - Move mapped/unmapped into arm64 code. Fixes arm32.
 - Rebase on cap_user_time_short changes

Changes from Raphael's v4:
  - Drop homogeneous check
  - Disable access for chained counters
  - Set pmc_width in user page

[1] 
https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/

user fix
---
 arch/arm64/include/asm/mmu.h |  5 ++
 arch/arm64/include/asm/mmu_context.h |  2 +
 arch/arm64/include/asm/perf_event.h  | 14 +
 arch/arm64/kernel/perf_event.c   | 86 +++-
 4 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 75beffe2ee8a..ee08447455da 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -18,6 +18,11 @@
 
 typedef struct {
atomic64_t  id;
+   /*
+* non-zero if userspace has access to hardware
+* counters directly.
+*/
+   atomic_tpmu_direct_access;
 #ifdef CONFIG_COMPAT
void*sigpage;
 #endif
diff --git a/arch/arm64/include/asm/mmu_context.h 
b/arch/arm64/include/asm/mmu_context.h
index 70ce8c1d2b07..ccb5ff417b42 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -230,6 +231,7 @@ static inline void __switch_mm(struct mm_struct *next)
}
 
check_and_switch_context(next);
+   perf_switch_user_access(next);
 }
 
 static inline void
diff --git a/arch/arm64/include/asm/perf_event.h 
b/arch/arm64/include/asm/perf_event.h
index 60731f602d3e..112f3f63b79e 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 
 #defineARMV8_PMU_MAX_COUNTERS  32
 #defineARMV8_PMU_COUNTER_MASK  (ARMV8_PMU_MAX_COUNTERS - 1)
@@ -254,4 +255,17 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
(regs)->pstate = PSR_MODE_EL1h; \
 }
 
+static inline void perf_switch_user_access(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_PERF_EVENTS))
+   return;
+
+   if (atomic_read(&mm->context.pmu_direct_access)) {
+   write_sysreg(ARMV8_PMU_USERENR_ER|ARMV8_PMU_USERENR_CR,
+pmuserenr_el0);
+   } else {
+   write_sysreg(0, pmuserenr_el0);
+   }
+}
+
 #endif
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 387838496955..9ad3cc523ef4 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -288,15 +288,22 @@ static const struct attribute_group 
armv8_pmuv3_events_attr_group = {
 
 PMU_FORMAT_ATTR(event, "config:0-15");
 PMU_FORMAT_ATTR(long, "config1:0");
+PMU_FORMAT_ATTR(rdpmc, "config1:1");
 
 static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
 {
return event->attr.config1 & 0x1;
 }
 
+static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
+{
+   return event->attr.config1 & 0x2;
+}
+
 static struct attribute *armv8_pmuv3_format_attrs[] = {
&format_attr_event.attr,