Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-23 Thread Suzuki Poulose

On 10/23/20 2:16 PM, Peter Zijlstra wrote:

On Fri, Oct 23, 2020 at 01:56:47PM +0100, Suzuki Poulose wrote:

On 10/23/20 11:54 AM, Peter Zijlstra wrote:



I think I'm more confused now :-/

Where do we use ->owner after event creation? The moment you create your
eventN you create the link to sink0. That link either succeeds (same
'cookie') or fails.


The event->sink link is established at creation. At event::add(), we
check the sink is free (i.e, it is inactive) or is used by an event
of the same session (this is where the owner field *was* required. But
this is not needed anymore, as we cache the "owner" read pid in the
handle->rb->aux_priv for each event and this is compared against the
pid from the handle currently driving the hardware)


*groan*.. that's going to be a mess with sinks that are shared between
CPUs :/


I'm also not seeing why exactly we need ->owner in the first place.

Suppose we make the sink0 device return -EBUSY on open() when it is
active. Then a perf session can open the sink0 device, create perf
events and attach them to the sink0 device using
perf_event_attr::config2. The events will attach to sink0 and increment
its usage count, such that any further open() will fail.


Thats where we are diverging. The sink device doesn't have any fops. It
is all managed by the coresight driver transparent to the perf tool. All
the perf tool does is, specifying which sink to use (btw, we now have
automatic sink selection support which gets rid of this, and uses
the best possible sink e.g, in case of per-CPU sinks).


per-CPU sinks sounds a lot better.

I'm really not convinced it makes sense to do what you do with shared
sinks though. You'll loose random parts of the execution trace because
of what the other CPUs do.


The ETM trace protocol has in built TraceID to distinguish the packets
and thus we could decode the trace streams from the shared buffer.
[ But, we don't have buffer overflow interrupts (I am keeping the lid 
closed on that can, for the sake of keeping sanity ;-) ), and thus

any shared session could easily loose data unless we tune the AUX
buffer size to a really large buffer ].



Full exclusive sink access is far more deterministic.


Once the events are created, the perf tool close()s the sink0 device,
which is now will in-use by the events. No other events can be attached
to it.

Or are you doing the event->sink mapping every time you do: pmu::add()?
That sounds insane.


Sink is already mapped at event create. But yes, the refcount on the
sink is managed at start/stop. Thats when we need to make sure that the
event being scheduled belongs to the same owner as the one already
driving the sink.


pmu::add() I might hope, because pmu::start() is not allowed to fail.



Right. If we can't get the sink, we simply truncate the buffer.


That way another session could use the same sink if it is free. i.e

perf record -e cs_etm/@sink0/u --per-thread app1

and

perf record -e cs_etm/@sink0/u --per-thread app2

both can work as long as the sink is not used by the other session.


Like said above, if sink is shared between CPUs, that's going to be a
trainwreck :/ Why do you want that?


That ship has sailed. That is how the current generation of systems are,
unfortunately. But as I said, this is changing and there are guidelines
in place to avoid these kind of topologies. With the future
technologies, this will be completely gone.



And once you have per-CPU sinks like mentioned above, the whole problem
goes away.


True, until then, this is the best we could do.

Suzuki





Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-23 Thread Suzuki Poulose

On 10/23/20 11:54 AM, Peter Zijlstra wrote:

On Fri, Oct 23, 2020 at 11:34:32AM +0100, Suzuki Poulose wrote:

On 10/23/20 10:41 AM, Peter Zijlstra wrote:

On Fri, Oct 23, 2020 at 09:49:53AM +0100, Suzuki Poulose wrote:

On 10/23/20 8:39 AM, Peter Zijlstra wrote:



So then I don't understand the !->owner issue, that only happens when
the task dies, which cannot be concurrent with event creation. Are you


Part of the patch from Sai, fixes this by avoiding the dereferencing
after event creation (by caching it). But the kernel events needs
fixing.


I'm fundamentally failing here. Creating a link to the sink is strictly
event-creation time. Why would you ever need it again later? Later you
already have the sink setup.



Sorry for the lack of clarity here, and you are not alone unless you
have drowned in the CoreSight topologies ;-)

Typically current generation of systems have the following topology :

CPU0
  etm0   \
  \  
  /  \
CPU1/\
   etm1\
\
/---  sink0
CPU2  /
   etm2  \/
  \  /
  /
CPU3/
   etm3


i.e, Multiple ETMs share a sink. [for the sake of simplicity, I have
used one sink. Even though there could be potential sinks (of different
types), none of them are private to the ETMs. So, in a nutshell, a sink
can be reached by multiple ETMs. ]

Now, for a session :

perf record -e cs_etm/sinkid=sink0/u workload

We create an event per CPU (say eventN, which are scheduled based on the
threads that could execute on the CPU. At this point we have finalized
the sink0, and have allocated necessary buffer for the sink0.

Now, when the threads are scheduled on the CPUs, we start the
appropriate events for the CPUs.

e.g,
  CPU0 sched -> workload:0 - > etm0->event0_start -> Turns all
the components upto sink0, starting the trace collection in the buffer.

Now, if another CPU, CPU1 starts tracing event1 for workload:1 thread,
it will eventually try to turn ON the sink0.Since sink0 is already
active tracing event0, we could allow this to go through and collect
the trace in the *same hardware buffer* (which can be demuxed from the
single AUX record using the TraceID in the packets). Please note that
we do double buffering and hardware buffer is copied only when the sink0
is stopped (see below).

But, if the event scheduled on CPU1 doesn't belong to the above session, but
belongs to different perf session
  (say, perf record -e  cs_etm/sinkid=sink0/u benchmark),

we can't allow this to succeed and mix the trace data in to that of workload
and thus fail the operation.

In a nutshell, since the sinks are shared, we start the sink on the
first event and keeps sharing the sink buffer with any event that
belongs to the same session (using refcounts). The sink is only released
for other sessions, when there are no more events in the session tracing
on any of the ETMs.

I know this is fundamentally a topology issue, but that is not something
we can fix. But the situation is changing and we are starting to see
systems with per-CPU sinks.

Hope this helps.


I think I'm more confused now :-/

Where do we use ->owner after event creation? The moment you create your
eventN you create the link to sink0. That link either succeeds (same
'cookie') or fails.


The event->sink link is established at creation. At event::add(), we
check the sink is free (i.e, it is inactive) or is used by an event
of the same session (this is where the owner field *was* required. But
this is not needed anymore, as we cache the "owner" read pid in the 
handle->rb->aux_priv for each event and this is compared against the

pid from the handle currently driving the hardware)




If it fails, event creation fails, the end.




On success, we have the sink pointer in our event and we never ever need
to look at ->owner ever again.


Correct, we don't need it after the event creation "one part of the 
patch" as explained above.




I'm also not seeing why exactly we need ->owner in the first place.

Suppose we make the sink0 device return -EBUSY on open() when it is
active. Then a perf session can open the sink0 device, create perf
events and attach them to the sink0 device using
perf_event_attr::config2. The events will attach to sink0 and increment
its usage count, such that any further open() will fail.


Thats where we are diverging. The sink device doesn't have any fops. It
is all managed by the coresight driver transparent to the perf tool. All
the perf tool does is, specifying which sink to use (btw, we now have
automatic sink selection support which gets rid of this, and uses
the best possible sink e.g, in case of per-CPU sinks).



Once the events are created, the perf tool close()s the sink0 device,
which is now will in-use by the events. No other events can be attached
to 

Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-23 Thread Suzuki Poulose

On 10/23/20 10:23 AM, Peter Zijlstra wrote:

On Fri, Oct 23, 2020 at 09:49:53AM +0100, Suzuki Poulose wrote:

On 10/23/20 8:39 AM, Peter Zijlstra wrote:



So then I don't understand the !->owner issue, that only happens when
the task dies, which cannot be concurrent with event creation. Are you


Part of the patch from Sai, fixes this by avoiding the dereferencing
after event creation (by caching it). But the kernel events needs
fixing.

One follow up question on the !->owner issue. Given the ->owner is
dying, does it prevent events from being scheduled ? Or is there a delay
between that and eventually stopping the events. In this case, we hit
the issue when :

A A or B ?

event_start()
   ...  event->owner = NULL

  READ_ONCE(event->owner);

Is this expected ?


Yeah, teardown is a bit of an effort. Also, you can pass an fd over a
unix socket to another process, so this isn't something you can rely on
in any case.

The perf tool doesn't do it, but the kernel infra should be able to deal
with someone doing a perf-deamon of sorts, where you can request a perf
event and recieve a fd from it.

Imagine the fun ;-)


As for the kernel events.. why do you care about the actual task_struct
* in there? I see you're using it to grab the task-pid, but how is that
useful?


Correct, kernel events are something that the driver didn't account for.
May be we could handle this case with a "special pid" and simply
disallow sharing (which is fine I believe, given there are not grouping
for the kernel created events).


Why do you need a pid in the first place? Can't you use the "task_struct
*" as a value?


We could. But, without a refcount on the task pointer, that could be
tricky, even though we don't dereference it. In the same situation,
if the tsk owner dies and is freed and is reallocated to a new perf 
session task but with different PID, we could be mixing things up again

?

Special pid here could be -1.


Re: [PATCH v3] coresight: etm4x: Modify core-commit of cpu to avoid the overflow of HiSilicon ETM

2020-10-23 Thread Suzuki Poulose

On 10/23/20 11:23 AM, Qi Liu wrote:

The ETM device can't keep up with the core pipeline when cpu core
is at full speed. This may cause overflow within core and its ETM.
This is a common phenomenon on ETM devices.

On HiSilicon Hip08 platform, a specific feature is added to set
core pipeline. So commit rate can be reduced manually to avoid ETM
overflow.

Signed-off-by: Qi Liu 
---
Change since v1:
- add CONFIG_ETM4X_IMPDEF_FEATURE and CONFIG_ETM4X_IMPDEF_HISILICON
   to keep specific feature off platforms which don't use it.
Change since v2:
- remove some unused variable.

  drivers/hwtracing/coresight/Kconfig| 18 
  drivers/hwtracing/coresight/coresight-etm4x-core.c | 50 ++
  2 files changed, 68 insertions(+)

diff --git a/drivers/hwtracing/coresight/Kconfig 
b/drivers/hwtracing/coresight/Kconfig
index c119824..9665d70 100644
--- a/drivers/hwtracing/coresight/Kconfig
+++ b/drivers/hwtracing/coresight/Kconfig
@@ -110,6 +110,24 @@ config CORESIGHT_SOURCE_ETM4X
  To compile this driver as a module, choose M here: the
  module will be called coresight-etm4x.

+config ETM4X_IMPDEF_FEATURE
+   bool "Control overflow impdef support in CoreSight ETM 4.x driver "
+   depends on CORESIGHT_SOURCE_ETM4X
+   help
+ This control provides overflow implement define for CoreSight
+ ETM 4.x tracer module which could not reduce commit race
+ automatically, and could avoid overflow within ETM tracer module
+ and its cpu core.
+
+config ETM4X_IMPDEF_HISILICON
+   bool "Control overflow impdef support in HiSilicon ETM 4.x driver "
+   depends on ETM4X_IMPDEF_FEATURE
+   help
+ This control provides overflow implement define for HiSilicon
+ ETM 4.x tracer module of Hip08 platform. Overflow within ETM
+ tracer module and its cpu core can be avoided by reducing core
+ commit manually.
+
  config CORESIGHT_STM
tristate "CoreSight System Trace Macrocell driver"
depends on (ARM && !(CPU_32v3 || CPU_32v4 || CPU_32v4T)) || ARM64
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c 
b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index abd706b..35f4333 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -103,12 +103,61 @@ struct etm4_enable_arg {
int rc;
  };

+#ifdef CONFIG_ETM4X_IMPDEF_FEATURE
+
+#ifdef CONFIG_ETM4X_IMPDEF_HISILICON
+
+#define HISI_HIP08_CORE_COMMIT_CLEAR   0x3000
+#define HISI_HIP08_CORE_COMMIT_SHIFT   12
+static void etm4_hisi_config_core_commit(int flag)


nit: s/int flag/bool enable ?


+{
+   u64 val;
+
+   asm volatile("mrs %0,s3_1_c15_c2_5" : "=r"(val));
+   val &= ~HISI_HIP08_CORE_COMMIT_CLEAR;
+   val |= flag << HISI_HIP08_CORE_COMMIT_SHIFT;
+   asm volatile("msr s3_1_c15_c2_5,%0" : : "r"(val));


Sorry for missing this out. We don't encourage the above encodings
as it will break on legacy toolchain. So, please could you follow what
we do normally, by using sys_reg() macros to define the encoding and use
read/write_sysreg_s() instead. See arch/arm64/include/asm/sysreg.h



+}
+#else
+static void etm4_hisi_config_core_commit(int flag)
+{
+}
+#endif /* CONFIG_ETM4X_IMPDEF_HISILICON */
+
+static void etm4_enable_arch_specific(void)
+{
+   /*
+* If ETM device is HiSilicon ETM device, reduce the
+* core-commit to avoid ETM overflow.
+*/
+   etm4_hisi_config_core_commit(1);
+}
+
+static void etm4_disable_arch_specific(void)
+{
+   /*
+* If ETM device is HiSilicon ETM device, resume the
+* core-commit after ETM trace is complete.
+*/
+   etm4_hisi_config_core_commit(0);
+}
+#else
+static void etm4_enable_arch_specific(void)
+{
+}
+
+static void etm4_disable_arch_specific(void)
+{
+}
+#endif /* CONFIG_ETM4X_IMPDEF_FEATURE */
+
  static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
  {
int i, rc;
struct etmv4_config *config = &drvdata->config;
struct device *etm_dev = &drvdata->csdev->dev;

+   etm4_enable_arch_specific();
CS_UNLOCK(drvdata->base);


Please could we move the enable_arch_specific() after the CS_UNLOCK ?
That way any potential code could do something with the ETM4x registers.



etm4_os_unlock(drvdata);
@@ -475,6 +524,7 @@ static void etm4_disable_hw(void *info)
struct device *etm_dev = &drvdata->csdev->dev;
int i;

+   etm4_disable_arch_specific();
CS_UNLOCK(drvdata->base);



Same here

Suzuki


Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-23 Thread Suzuki Poulose

On 10/23/20 10:41 AM, Peter Zijlstra wrote:

On Fri, Oct 23, 2020 at 09:49:53AM +0100, Suzuki Poulose wrote:

On 10/23/20 8:39 AM, Peter Zijlstra wrote:



So then I don't understand the !->owner issue, that only happens when
the task dies, which cannot be concurrent with event creation. Are you


Part of the patch from Sai, fixes this by avoiding the dereferencing
after event creation (by caching it). But the kernel events needs
fixing.


I'm fundamentally failing here. Creating a link to the sink is strictly
event-creation time. Why would you ever need it again later? Later you
already have the sink setup.



Sorry for the lack of clarity here, and you are not alone unless you
have drowned in the CoreSight topologies ;-)

Typically current generation of systems have the following topology :

CPU0
 etm0   \
 \  
 /  \
CPU1/\
  etm1\
   \
   /---  sink0
CPU2  /
  etm2  \/
 \  /
 /
CPU3/
  etm3


i.e, Multiple ETMs share a sink. [for the sake of simplicity, I have
used one sink. Even though there could be potential sinks (of different
types), none of them are private to the ETMs. So, in a nutshell, a sink
can be reached by multiple ETMs. ]

Now, for a session :

perf record -e cs_etm/sinkid=sink0/u workload

We create an event per CPU (say eventN, which are scheduled based on the
threads that could execute on the CPU. At this point we have finalized
the sink0, and have allocated necessary buffer for the sink0.

Now, when the threads are scheduled on the CPUs, we start the
appropriate events for the CPUs.

e.g,
 CPU0 sched -> workload:0 - > etm0->event0_start -> Turns all
the components upto sink0, starting the trace collection in the buffer.

Now, if another CPU, CPU1 starts tracing event1 for workload:1 thread,
it will eventually try to turn ON the sink0.Since sink0 is already
active tracing event0, we could allow this to go through and collect
the trace in the *same hardware buffer* (which can be demuxed from the 
single AUX record using the TraceID in the packets). Please note that

we do double buffering and hardware buffer is copied only when the sink0
is stopped (see below).

But, if the event scheduled on CPU1 doesn't belong to the above session, 
but belongs to different perf session

 (say, perf record -e  cs_etm/sinkid=sink0/u benchmark),

we can't allow this to succeed and mix the trace data in to that of 
workload and thus fail the operation.


In a nutshell, since the sinks are shared, we start the sink on the
first event and keeps sharing the sink buffer with any event that
belongs to the same session (using refcounts). The sink is only released
for other sessions, when there are no more events in the session tracing
on any of the ETMs.

I know this is fundamentally a topology issue, but that is not something
we can fix. But the situation is changing and we are starting to see
systems with per-CPU sinks.

Hope this helps.

Suzuki


Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-23 Thread Suzuki Poulose

Hi Peter

On 10/23/20 8:39 AM, Peter Zijlstra wrote:

On Thu, Oct 22, 2020 at 03:20:33PM -0600, Mathieu Poirier wrote:

Suzuki's depiction of the usecase is accurate.  Using the pid of the process
that created the events comes out of a discussion you and I had in the common
area by the Intel booth at ELC in Edinburgh in the fall of 2018.  At the time I
exposed the problem of having multiple events sharing the same HW resources and
you advised to proceed this way.


Bah, I was afraid of that. I desperately tried to find correspondence on
it, but alas, verbal crap doesn't end up in the Sent folder :-/


That being said it is plausible that I did not expressed myself clearly enough
for you to understand the full extend of the problem.  If that is the case we
are more than willing to revisit that solution.  Do you see a better option than
what has currently been implemented?


Moo... that really could've done with a comment I suppose.

So then I don't understand the !->owner issue, that only happens when
the task dies, which cannot be concurrent with event creation. Are you


Part of the patch from Sai, fixes this by avoiding the dereferencing
after event creation (by caching it). But the kernel events needs
fixing.

One follow up question on the !->owner issue. Given the ->owner is
dying, does it prevent events from being scheduled ? Or is there a delay
between that and eventually stopping the events. In this case, we hit
the issue when :

A A or B ?

event_start()
  ...   event->owner = NULL

 READ_ONCE(event->owner);

Is this expected ?


somehow accessing ->owner later?




As for the kernel events.. why do you care about the actual task_struct
* in there? I see you're using it to grab the task-pid, but how is that
useful?


Correct, kernel events are something that the driver didn't account for.
May be we could handle this case with a "special pid" and simply
disallow sharing (which is fine I believe, given there are not grouping
for the kernel created events).

Suzuki


Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-22 Thread Suzuki Poulose

On 10/22/20 4:06 PM, Peter Zijlstra wrote:

On Thu, Oct 22, 2020 at 02:30:21PM +0100, Suzuki Poulose wrote:

On 10/22/20 12:32 PM, Peter Zijlstra wrote:

On Thu, Oct 22, 2020 at 04:27:52PM +0530, Sai Prakash Ranjan wrote:


Looking at the ETR and other places in the kernel, ETF and the
ETB are the only places trying to dereference the task(owner)
in tmc_enable_etf_sink_perf() which is also called from the
sched_in path as in the call trace.



@@ -391,6 +392,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device 
*csdev,
   {
int node;
struct cs_buffers *buf;
+   struct task_struct *task = READ_ONCE(event->owner);
+
+   if (!task || is_kernel_event(event))
+   return NULL;



This is *wrong*... why do you care about who owns the events?



This is due to the special case of the CoreSight configuration, where
a "sink" (where the trace data is captured) is shared by multiple Trace
units. So, we could share the "sink" for multiple trace units if they
are tracing the events that belong to the same "perf" session. (The
userspace tool could decode the trace data based on the TraceID
in the trace packets). Is there a better way to do this ?


I thought we added sink identification through perf_event_attr::config2
?



Correct. attr:config2 identifies the "sink" for the collection. But,
that doesn't solve the problem we have here. If two separate perf
sessions use the "same sink", we don't want to mix the
trace data into the same sink for events from different sessions.

Thus, we need a way to check if a new event starting the tracing on
an ETM belongs to the same session as the one already pumping the trace
into the sink.

We use event->owner pid for this check and thats where we encountered
a NULL event->owner. Looking at the code further, we identified that
kernel events could also trigger this issue.

Suzuki


Re: [PATCHv2 2/4] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-22 Thread Suzuki Poulose

On 10/22/20 12:32 PM, Peter Zijlstra wrote:

On Thu, Oct 22, 2020 at 04:27:52PM +0530, Sai Prakash Ranjan wrote:


Looking at the ETR and other places in the kernel, ETF and the
ETB are the only places trying to dereference the task(owner)
in tmc_enable_etf_sink_perf() which is also called from the
sched_in path as in the call trace.



@@ -391,6 +392,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device 
*csdev,
  {
int node;
struct cs_buffers *buf;
+   struct task_struct *task = READ_ONCE(event->owner);
+
+   if (!task || is_kernel_event(event))
+   return NULL;



This is *wrong*... why do you care about who owns the events?



This is due to the special case of the CoreSight configuration, where
a "sink" (where the trace data is captured) is shared by multiple Trace
units. So, we could share the "sink" for multiple trace units if they
are tracing the events that belong to the same "perf" session. (The
userspace tool could decode the trace data based on the TraceID
in the trace packets). Is there a better way to do this ?

Suzuki


Re: [PATCH 1/2] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-22 Thread Suzuki Poulose

On 10/22/20 12:07 PM, Sai Prakash Ranjan wrote:

On 2020-10-22 14:57, Suzuki Poulose wrote:

On 10/22/20 9:02 AM, Sai Prakash Ranjan wrote:

On 2020-10-21 15:38, Suzuki Poulose wrote:

On 10/21/20 8:29 AM, Sai Prakash Ranjan wrote:

On 2020-10-20 21:40, Sai Prakash Ranjan wrote:

On 2020-10-14 21:29, Sai Prakash Ranjan wrote:

On 2020-10-14 18:46, Suzuki K Poulose wrote:

On 10/14/2020 10:36 AM, Sai Prakash Ranjan wrote:

On 2020-10-13 22:05, Suzuki K Poulose wrote:

On 10/07/2020 02:00 PM, Sai Prakash Ranjan wrote:

There was a report of NULL pointer dereference in ETF enable
path for perf CS mode with PID monitoring. It is almost 100%
reproducible when the process to monitor is something very
active such as chrome and with ETF as the sink and not ETR.
Currently in a bid to find the pid, the owner is dereferenced
via task_pid_nr() call in tmc_enable_etf_sink_perf() and with
owner being NULL, we get a NULL pointer dereference.

Looking at the ETR and other places in the kernel, ETF and the
ETB are the only places trying to dereference the task(owner)
in tmc_enable_etf_sink_perf() which is also called from the
sched_in path as in the call trace. Owner(task) is NULL even
in the case of ETR in tmc_enable_etr_sink_perf(), but since we
cache the PID in alloc_buffer() callback and it is done as part
of etm_setup_aux() when allocating buffer for ETR sink, we never
dereference this NULL pointer and we are safe. So lets do the


The patch is necessary to fix some of the issues. But I feel 
it is
not complete. Why is it safe earlier and not later ? I believe 
we are
simply reducing the chances of hitting the issue, by doing 
this earlier than
later. I would say we better fix all instances to make sure 
that the

event->owner is valid. (e.g, I can see that the for kernel events
event->owner == -1 ?)

struct task_struct *tsk = READ_ONCE(event->owner);

if (!tsk || is_kernel_event(event))
   /* skip ? */



Looking at it some more, is_kernel_event() is not exposed
outside events core and probably for good reason. Why do
we need to check for this and not just tsk?


Because the event->owner could be :

 = NULL
 = -1UL  // kernel event
 = valid.



Yes I understood that part, but here we were trying to
fix the NULL pointer dereference right and hence the
question as to why we need to check for kernel events?
I am no expert in perf but I don't see anywhere in the
kernel checking for is_kernel_event(), so I am a bit
skeptical if exporting that is actually right or not.



I have stress tested with the original patch many times
now, i.e., without a check for event->owner and is_kernel_event()
and didn't observe any crash. Plus on ETR where this was already
done, no crashes were reported till date and with ETF, the issue
was quickly reproducible, so I am fairly confident that this
doesn't just delay the original issue but actually fixes
it. I will run an overnight test again to confirm this.



I ran the overnight test which collected aroung 4G data(see below),
with the following small change to see if the two cases
(event->owner=NULL and is_kernel_event()) are triggered
with suggested changes and it didn't trigger at all.
Do we still need those additional checks?



Yes. Please see perf_event_create_kernel_event(), which is
an exported function allowing any kernel code (including modules)
to use the PMU (just like the userspace perf tool would do).
Just because your use case doesn't trigger this (because
you don't run something that can trigger this) doesn't mean
this can't be triggered.



Thanks for that pointer, I will add them in the next version.



And instead of redefining TASK_TOMBSTONE in the driver, you
may simply use IS_ERR_OR_NULL(tsk) to cover both NULL case
and kernel event.



Ugh sorry, sent out v2 exporting is_kernel_event() before seeing
this comment, I will resend.


Saw that. I would say, wait until someone complains about that. If
people are Ok with exporting it, it is fine. I guess it will be useful.
You could fall back to this approach if there is resistance.

Cheers
Suzuki


Re: [PATCH 1/2] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-22 Thread Suzuki Poulose

On 10/22/20 9:02 AM, Sai Prakash Ranjan wrote:

On 2020-10-21 15:38, Suzuki Poulose wrote:

On 10/21/20 8:29 AM, Sai Prakash Ranjan wrote:

On 2020-10-20 21:40, Sai Prakash Ranjan wrote:

On 2020-10-14 21:29, Sai Prakash Ranjan wrote:

On 2020-10-14 18:46, Suzuki K Poulose wrote:

On 10/14/2020 10:36 AM, Sai Prakash Ranjan wrote:

On 2020-10-13 22:05, Suzuki K Poulose wrote:

On 10/07/2020 02:00 PM, Sai Prakash Ranjan wrote:

There was a report of NULL pointer dereference in ETF enable
path for perf CS mode with PID monitoring. It is almost 100%
reproducible when the process to monitor is something very
active such as chrome and with ETF as the sink and not ETR.
Currently in a bid to find the pid, the owner is dereferenced
via task_pid_nr() call in tmc_enable_etf_sink_perf() and with
owner being NULL, we get a NULL pointer dereference.

Looking at the ETR and other places in the kernel, ETF and the
ETB are the only places trying to dereference the task(owner)
in tmc_enable_etf_sink_perf() which is also called from the
sched_in path as in the call trace. Owner(task) is NULL even
in the case of ETR in tmc_enable_etr_sink_perf(), but since we
cache the PID in alloc_buffer() callback and it is done as part
of etm_setup_aux() when allocating buffer for ETR sink, we never
dereference this NULL pointer and we are safe. So lets do the


The patch is necessary to fix some of the issues. But I feel it is
not complete. Why is it safe earlier and not later ? I believe 
we are
simply reducing the chances of hitting the issue, by doing this 
earlier than
later. I would say we better fix all instances to make sure that 
the

event->owner is valid. (e.g, I can see that the for kernel events
event->owner == -1 ?)

struct task_struct *tsk = READ_ONCE(event->owner);

if (!tsk || is_kernel_event(event))
   /* skip ? */



Looking at it some more, is_kernel_event() is not exposed
outside events core and probably for good reason. Why do
we need to check for this and not just tsk?


Because the event->owner could be :

 = NULL
 = -1UL  // kernel event
 = valid.



Yes I understood that part, but here we were trying to
fix the NULL pointer dereference right and hence the
question as to why we need to check for kernel events?
I am no expert in perf but I don't see anywhere in the
kernel checking for is_kernel_event(), so I am a bit
skeptical if exporting that is actually right or not.



I have stress tested with the original patch many times
now, i.e., without a check for event->owner and is_kernel_event()
and didn't observe any crash. Plus on ETR where this was already
done, no crashes were reported till date and with ETF, the issue
was quickly reproducible, so I am fairly confident that this
doesn't just delay the original issue but actually fixes
it. I will run an overnight test again to confirm this.



I ran the overnight test which collected aroung 4G data(see below),
with the following small change to see if the two cases
(event->owner=NULL and is_kernel_event()) are triggered
with suggested changes and it didn't trigger at all.
Do we still need those additional checks?



Yes. Please see perf_event_create_kernel_event(), which is
an exported function allowing any kernel code (including modules)
to use the PMU (just like the userspace perf tool would do).
Just because your use case doesn't trigger this (because
you don't run something that can trigger this) doesn't mean
this can't be triggered.



Thanks for that pointer, I will add them in the next version.



And instead of redefining TASK_TOMBSTONE in the driver, you
may simply use IS_ERR_OR_NULL(tsk) to cover both NULL case
and kernel event.

Cheers
Suzuki


Re: [PATCH 1/2] coresight: tmc-etf: Fix NULL ptr dereference in tmc_enable_etf_sink_perf()

2020-10-21 Thread Suzuki Poulose

On 10/21/20 8:29 AM, Sai Prakash Ranjan wrote:

On 2020-10-20 21:40, Sai Prakash Ranjan wrote:

On 2020-10-14 21:29, Sai Prakash Ranjan wrote:

On 2020-10-14 18:46, Suzuki K Poulose wrote:

On 10/14/2020 10:36 AM, Sai Prakash Ranjan wrote:

On 2020-10-13 22:05, Suzuki K Poulose wrote:

On 10/07/2020 02:00 PM, Sai Prakash Ranjan wrote:

There was a report of NULL pointer dereference in ETF enable
path for perf CS mode with PID monitoring. It is almost 100%
reproducible when the process to monitor is something very
active such as chrome and with ETF as the sink and not ETR.
Currently in a bid to find the pid, the owner is dereferenced
via task_pid_nr() call in tmc_enable_etf_sink_perf() and with
owner being NULL, we get a NULL pointer dereference.

Looking at the ETR and other places in the kernel, ETF and the
ETB are the only places trying to dereference the task(owner)
in tmc_enable_etf_sink_perf() which is also called from the
sched_in path as in the call trace. Owner(task) is NULL even
in the case of ETR in tmc_enable_etr_sink_perf(), but since we
cache the PID in alloc_buffer() callback and it is done as part
of etm_setup_aux() when allocating buffer for ETR sink, we never
dereference this NULL pointer and we are safe. So lets do the


The patch is necessary to fix some of the issues. But I feel it is
not complete. Why is it safe earlier and not later ? I believe we are
simply reducing the chances of hitting the issue, by doing this 
earlier than

later. I would say we better fix all instances to make sure that the
event->owner is valid. (e.g, I can see that the for kernel events
event->owner == -1 ?)

struct task_struct *tsk = READ_ONCE(event->owner);

if (!tsk || is_kernel_event(event))
   /* skip ? */



Looking at it some more, is_kernel_event() is not exposed
outside events core and probably for good reason. Why do
we need to check for this and not just tsk?


Because the event->owner could be :

 = NULL
 = -1UL  // kernel event
 = valid.



Yes I understood that part, but here we were trying to
fix the NULL pointer dereference right and hence the
question as to why we need to check for kernel events?
I am no expert in perf but I don't see anywhere in the
kernel checking for is_kernel_event(), so I am a bit
skeptical if exporting that is actually right or not.



I have stress tested with the original patch many times
now, i.e., without a check for event->owner and is_kernel_event()
and didn't observe any crash. Plus on ETR where this was already
done, no crashes were reported till date and with ETF, the issue
was quickly reproducible, so I am fairly confident that this
doesn't just delay the original issue but actually fixes
it. I will run an overnight test again to confirm this.



I ran the overnight test which collected aroung 4G data(see below),
with the following small change to see if the two cases
(event->owner=NULL and is_kernel_event()) are triggered
with suggested changes and it didn't trigger at all.
Do we still need those additional checks?



Yes. Please see perf_event_create_kernel_event(), which is
an exported function allowing any kernel code (including modules)
to use the PMU (just like the userspace perf tool would do).
Just because your use case doesn't trigger this (because
you don't run something that can trigger this) doesn't mean
this can't be triggered.

Cheers
Suzuki



[ perf record: Captured and wrote 4677.989 MB perf.data ]

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c 
b/drivers/hwtracing/coresight/coresight-tmc-etf.c

index 989d965f3d90..123c446ce585 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -13,6 +13,13 @@
  #include "coresight-tmc.h"
  #include "coresight-etm-perf.h"

+#define TASK_TOMBSTONE ((void *)-1L)
+
+static bool is_kernel_event2(struct perf_event *event)
+{
+   return READ_ONCE(event->owner) == TASK_TOMBSTONE;
+}
+
  static int tmc_set_etf_buffer(struct coresight_device *csdev,
   struct perf_output_handle *handle);

@@ -392,6 +399,15 @@ static void *tmc_alloc_etf_buffer(struct 
coresight_device *csdev,

  {
     int node;
     struct cs_buffers *buf;
+   struct task_struct *task = READ_ONCE(event->owner);
+
+   if (!task) {
+   pr_info("**sai in task=NULL**\n");
+   return NULL;
+   }
+
+   if (is_kernel_event2(event))
+   pr_info("**sai in is_kernel_event**\n");

     node = (event->cpu == -1) ? NUMA_NO_NODE : 
cpu_to_node(event->cpu);



Thanks,
Sai





Re: [PATCH] coresight: etm4x: Add config to exclude kernel mode tracing

2020-10-16 Thread Suzuki Poulose

On 10/16/20 2:14 PM, Leo Yan wrote:

On Fri, Oct 16, 2020 at 12:38:47PM +0100, Suzuki Kuruppassery Poulose wrote:

[...]


What happens to the sysfs mode of tracing? For that we would still
need a config right to exclude kernel mode tracing completely.


IIUC, sysfs mode and perf mode both can apply the same approach, the
guest OS runs a thread context for the host, so when a guest OS is
switched in or out, the hypervisor can save/restore the context for
the guest OS; thus every guest OS will have its dedicated context and
trace data ideally.


I don't think Guest Context is something we can support as mentioned
above, at least for systems without sysreg access for ETMs (and virtualizing
ETRs is a different story !)


Thanks for sharing thoughts, Suzuki.

I missed the device virtulisation.  Here should virtualize all devices
(includes CoreSight ETM/funnel/ETR/ETF)?  Or only need to virtualize
ETRs?


I wouldn't even think of virtualizing the components without sysreg
access. So let us not worry about it :-)

Cheers
Suzuki


Re: [PATCH] coresight: etm4x: Skip setting LPOVERRIDE bit for qcom,skip-power-up

2020-10-16 Thread Suzuki Poulose

On 10/16/20 12:47 PM, Sai Prakash Ranjan wrote:

Hi Suzuki,

On 2020-10-16 16:51, Suzuki Poulose wrote:

Hi Sai,

On 10/16/20 11:10 AM, Sai Prakash Ranjan wrote:

There is a bug on the systems supporting to skip power up
(qcom,skip-power-up) where setting LPOVERRIDE bit(low-power
state override behaviour) will result in CPU hangs/lockups
even on the implementations which supports it. So skip
setting the LPOVERRIDE bit for such platforms.

Fixes: 02510a5aa78d ("coresight: etm4x: Add support to skip trace 
unit power up")

Signed-off-by: Sai Prakash Ranjan 


The fix is fine by me. Btw, is there a hardware Erratum assigned for
this ? It would be good to have the Erratum documented somewhere,
preferrably ( Documentation/arm64/silicon-errata.rst )



No, afaik we don't have any erratum assigned to this bug.


Ok. Please double check, if there are any.


It was already present in downstream kernel and since we
support these targets with the previous HW bug
(qcom,skip-power-up) now in upstream, we would need this
fix in upstream kernel as well.


I understand the need for the fix and we must fix it. I was
looking to document this in the central place for errata's
handled in the kernel. And I missed asking this question
when the original patch was posted. So, thought of asking
the question now anyways. Better late than never ;-)

Reviewed-by: Suzuki K Poulose 


Re: [PATCH] coresight: etm4x: Add config to exclude kernel mode tracing

2020-10-16 Thread Suzuki Poulose

On 10/16/20 10:24 AM, Leo Yan wrote:

Hi Sai,

On Fri, Oct 16, 2020 at 02:10:47PM +0530, Sai Prakash Ranjan wrote:

Hi Leo,

On 2020-10-16 12:54, Leo Yan wrote:

On Thu, Oct 15, 2020 at 11:40:05PM -0700, Denis Nikitin wrote:

Hi Mathieu,

I think one of the use cases could be VMs.
Is there isolation between EL1 guest kernels which we can control
from perf
in a system wide mode?


Sorry for suddenly jumping in.

For KVM, I think we need to implement mechanism for saving/restoring
CoreSight context for every guest OS, the CPU PMUs has implemented
related features [1].

Thanks,
Leo

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kvm/pmu.c



Its not as easy as the CPU PMU for virtualizing the ETMs (with memory
mapped access only), i.e supporting ETMs from VMs.
We could definitely stop/resume on guest entry/exit, to support 
attr.exclude_guest.




What happens to the sysfs mode of tracing? For that we would still
need a config right to exclude kernel mode tracing completely.


IIUC, sysfs mode and perf mode both can apply the same approach, the
guest OS runs a thread context for the host, so when a guest OS is
switched in or out, the hypervisor can save/restore the context for
the guest OS; thus every guest OS will have its dedicated context and
trace data ideally.


I don't think Guest Context is something we can support as mentioned
above, at least for systems without sysreg access for ETMs (and 
virtualizing ETRs is a different story !)


Cheers
Suzuki


Re: [PATCH] coresight: etm4x: Skip setting LPOVERRIDE bit for qcom,skip-power-up

2020-10-16 Thread Suzuki Poulose

Hi Sai,

On 10/16/20 11:10 AM, Sai Prakash Ranjan wrote:

There is a bug on the systems supporting to skip power up
(qcom,skip-power-up) where setting LPOVERRIDE bit(low-power
state override behaviour) will result in CPU hangs/lockups
even on the implementations which supports it. So skip
setting the LPOVERRIDE bit for such platforms.

Fixes: 02510a5aa78d ("coresight: etm4x: Add support to skip trace unit power 
up")
Signed-off-by: Sai Prakash Ranjan 


The fix is fine by me. Btw, is there a hardware Erratum assigned for
this ? It would be good to have the Erratum documented somewhere,
preferrably ( Documentation/arm64/silicon-errata.rst )


---
  drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c 
b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index abd706b216ac..6096d7abf80d 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -779,7 +779,7 @@ static void etm4_init_arch_data(void *info)
 * LPOVERRIDE, bit[23] implementation supports
 * low-power state override
 */
-   if (BMVAL(etmidr5, 23, 23))
+   if (BMVAL(etmidr5, 23, 23) && (!drvdata->skip_power_up))
drvdata->lpoverride = true;
else
drvdata->lpoverride = false;

base-commit: 3477326277451000bc667dfcc4fd0774c039184c





Re: [PATCH] coresight: etm4x: Add config to exclude kernel mode tracing

2020-10-16 Thread Suzuki Poulose

On 10/16/20 7:40 AM, Denis Nikitin wrote:

Hi Mathieu,

I think one of the use cases could be VMs.
Is there isolation between EL1 guest kernels which we can control from 
perf in a system wide mode?


The proposed solution doesn't solve this for VMs anyway. It only
excludes EL1 *OR* EL2, depending on the host kernel's running  EL.
We cannot support Virtual ETM access for VMs with memory mapped
accesses.

Unforutnately, trace filtering is the solution for preventing tracing
for EL1 guest/kernel (available from v8.4 Self Hosted extensions). Other
option is to add support for "exclude_guest" support for CoreSight for perf.
But again this can't be controlled by sysfs. And it can't be enforced 
for perf, if not specified. Again it all goes back to the root

permission hammer lock which Mathieu pointed out.


With the v8.4 Self hosted trace extensions, Guest and Host both could
control individually if they can be traced (both EL0 and EL1/2).

Suzuki



Thanks,
Denis

On Thu, Oct 15, 2020 at 9:03 AM Mathieu Poirier 
mailto:mathieu.poir...@linaro.org>> wrote:


On Thu, Oct 15, 2020 at 06:15:22PM +0530, Sai Prakash Ranjan wrote:
 > On production systems with ETMs enabled, it is preferred to
 > exclude kernel mode(NS EL1) tracing for security concerns and
 > support only userspace(NS EL0) tracing. So provide an option
 > via kconfig to exclude kernel mode tracing if it is required.
 > This config is disabled by default and would not affect the
 > current configuration which has both kernel and userspace
 > tracing enabled by default.
 >

One requires root access (or be part of a special trace group) to be
able to use
the cs_etm PMU.  With this kind of elevated access restricting
tracing at EL1
provides little in terms of security.

Thanks,
Mathieu

 > Signed-off-by: Sai Prakash Ranjan
mailto:saiprakash.ran...@codeaurora.org>>
 > ---
 >  drivers/hwtracing/coresight/Kconfig                | 9 +
 >  drivers/hwtracing/coresight/coresight-etm4x-core.c | 6 +-
 >  2 files changed, 14 insertions(+), 1 deletion(-)
 >
 > diff --git a/drivers/hwtracing/coresight/Kconfig
b/drivers/hwtracing/coresight/Kconfig
 > index c1198245461d..52435de8824c 100644
 > --- a/drivers/hwtracing/coresight/Kconfig
 > +++ b/drivers/hwtracing/coresight/Kconfig
 > @@ -110,6 +110,15 @@ config CORESIGHT_SOURCE_ETM4X
 >         To compile this driver as a module, choose M here: the
 >         module will be called coresight-etm4x.
 >
 > +config CORESIGHT_ETM4X_EXCL_KERN
 > +     bool "Coresight ETM 4.x exclude kernel mode tracing"
 > +     depends on CORESIGHT_SOURCE_ETM4X
 > +     help
 > +       This will exclude kernel mode(NS EL1) tracing if enabled.
This option
 > +       will be useful to provide more flexible options on
production systems
 > +       where only userspace(NS EL0) tracing might be preferred
for security
 > +       reasons.
 > +
 >  config CORESIGHT_STM
 >       tristate "CoreSight System Trace Macrocell driver"
 >       depends on (ARM && !(CPU_32v3 || CPU_32v4 || CPU_32v4T)) ||
ARM64
 > diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c
b/drivers/hwtracing/coresight/coresight-etm4x-core.c
 > index abd706b216ac..7e5669e5cd1f 100644
 > --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
 > +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
 > @@ -832,6 +832,9 @@ static u64 etm4_get_ns_access_type(struct
etmv4_config *config)
 >  {
 >       u64 access_type = 0;
 >
 > +     if (IS_ENABLED(CONFIG_CORESIGHT_ETM4X_EXCL_KERN))
 > +             config->mode |= ETM_MODE_EXCL_KERN;
 > +
 >       /*
 >        * EXLEVEL_NS, bits[15:12]
 >        * The Exception levels are:
 > @@ -849,7 +852,8 @@ static u64 etm4_get_ns_access_type(struct
etmv4_config *config)
 >               access_type = ETM_EXLEVEL_NS_HYP;
 >       }
 >
 > -     if (config->mode & ETM_MODE_EXCL_USER)
 > +     if (config->mode & ETM_MODE_EXCL_USER &&
 > +         !IS_ENABLED(CONFIG_CORESIGHT_ETM4X_EXCL_KERN))
 >               access_type |= ETM_EXLEVEL_NS_APP;
 >
 >       return access_type;
 >
 > base-commit: 3477326277451000bc667dfcc4fd0774c039184c
 > --
 > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is
a member
 > of Code Aurora Forum, hosted by The Linux Foundation
 >