On Fri, Mar 19, 2021 at 11:55:10AM +0000, Mike Leach wrote:
> HI Suzuki,
> 
> On Fri, 19 Mar 2021 at 10:30, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:
> >
> > Hi Mike
> >
> > > On 8 Mar 2021, at 17:26, Mike Leach <mike.le...@linaro.org> wrote:
> > >
> > > Hi Suzuki,
> > >
> > > On Thu, 25 Feb 2021 at 19:36, Suzuki K Poulose <suzuki.poul...@arm.com> 
> > > wrote:
> > >>
> > >> From: Anshuman Khandual <anshuman.khand...@arm.com>
> > >>
> > >> Trace Buffer Extension (TRBE) implements a trace buffer per CPU which is
> > >> accessible via the system registers. The TRBE supports different 
> > >> addressing
> > >> modes including CPU virtual address and buffer modes including the 
> > >> circular
> > >> buffer mode. The TRBE buffer is addressed by a base pointer 
> > >> (TRBBASER_EL1),
> > >> an write pointer (TRBPTR_EL1) and a limit pointer (TRBLIMITR_EL1). But 
> > >> the
> > >> access to the trace buffer could be prohibited by a higher exception 
> > >> level
> > >> (EL3 or EL2), indicated by TRBIDR_EL1.P. The TRBE can also generate a CPU
> > >> private interrupt (PPI) on address translation errors and when the buffer
> > >> is full. Overall implementation here is inspired from the Arm SPE driver.
> > >>
> > >> Cc: Mathieu Poirier <mathieu.poir...@linaro.org>
> > >> Cc: Mike Leach <mike.le...@linaro.org>
> > >> Cc: Suzuki K Poulose <suzuki.poul...@arm.com>
> > >> Signed-off-by: Anshuman Khandual <anshuman.khand...@arm.com>
> > >> Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
> > >>
> > >> +
> > >> +static unsigned long arm_trbe_update_buffer(struct coresight_device 
> > >> *csdev,
> > >> +                                           struct perf_output_handle 
> > >> *handle,
> > >> +                                           void *config)
> > >> +{
> > >> +       struct trbe_drvdata *drvdata = 
> > >> dev_get_drvdata(csdev->dev.parent);
> > >> +       struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
> > >> +       struct trbe_buf *buf = config;
> > >> +       enum trbe_fault_action act;
> > >> +       unsigned long size, offset;
> > >> +       unsigned long write, base, status;
> > >> +       unsigned long flags;
> > >> +
> > >> +       WARN_ON(buf->cpudata != cpudata);
> > >> +       WARN_ON(cpudata->cpu != smp_processor_id());
> > >> +       WARN_ON(cpudata->drvdata != drvdata);
> > >> +       if (cpudata->mode != CS_MODE_PERF)
> > >> +               return 0;
> > >> +
> > >> +       perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
> > >> +
> > >> +       /*
> > >> +        * We are about to disable the TRBE. And this could in turn
> > >> +        * fill up the buffer triggering, an IRQ. This could be consumed
> > >> +        * by the PE asynchronously, causing a race here against
> > >> +        * the IRQ handler in closing out the handle. So, let us
> > >> +        * make sure the IRQ can't trigger while we are collecting
> > >> +        * the buffer. We also make sure that a WRAP event is handled
> > >> +        * accordingly.
> > >> +        */
> > >> +       local_irq_save(flags);
> > >> +
> > >> +       /*
> > >> +        * If the TRBE was disabled due to lack of space in the AUX 
> > >> buffer or a
> > >> +        * spurious fault, the driver leaves it disabled, truncating the 
> > >> buffer.
> > >> +        * Since the etm_perf driver expects to close out the AUX 
> > >> buffer, the
> > >> +        * driver skips it. Thus, just pass in 0 size here to indicate 
> > >> that the
> > >> +        * buffer was truncated.
> > >> +        */
> > >> +       if (!is_trbe_enabled()) {
> > >> +               size = 0;
> > >> +               goto done;
> > >> +       }
> > >> +       /*
> > >> +        * perf handle structure needs to be shared with the TRBE IRQ 
> > >> handler for
> > >> +        * capturing trace data and restarting the handle. There is a 
> > >> probability
> > >> +        * of an undefined reference based crash when etm event is being 
> > >> stopped
> > >> +        * while a TRBE IRQ also getting processed. This happens due the 
> > >> release
> > >> +        * of perf handle via perf_aux_output_end() in etm_event_stop(). 
> > >> Stopping
> > >> +        * the TRBE here will ensure that no IRQ could be generated when 
> > >> the perf
> > >> +        * handle gets freed in etm_event_stop().
> > >> +        */
> > >> +       trbe_drain_and_disable_local();
> > >> +       write = get_trbe_write_pointer();
> > >> +       base = get_trbe_base_pointer();
> > >> +
> > >> +       /* Check if there is a pending interrupt and handle it here */
> > >> +       status = read_sysreg_s(SYS_TRBSR_EL1);
> > >> +       if (is_trbe_irq(status)) {
> > >> +
> > >> +               /*
> > >> +                * Now that we are handling the IRQ here, clear the IRQ
> > >> +                * from the status, to let the irq handler know that it
> > >> +                * is taken care of.
> > >> +                */
> > >> +               clr_trbe_irq();
> > >> +               isb();
> > >> +
> > >> +               act = trbe_get_fault_act(status);
> > >> +               /*
> > >> +                * If this was not due to a WRAP event, we have some
> > >> +                * errors and as such buffer is empty.
> > >> +                */
> > >> +               if (act != TRBE_FAULT_ACT_WRAP) {
> > >> +                       size = 0;
> > >> +                       goto done;
> > >> +               }
> > >
> > > We are using TRBE FILL mode - which halts capture on a full buffer and
> > > triggers the IRQ, without disabling the source first.
> > > This means that the mode is inherently lossy (unless by some unlikely
> > > co-incidence the last byte that caused the wrap was also the last byte
> > > to be sent from an ETE that was in the process of being disabled.)
> > > Therefore we must have a perf_aux_output_flag(handle,
> > > PERF_AUX_FLAG_TRUNCATED) call in here to signal that some trace was
> > > lost, for consistence of operation with ETR etc, and intelpt.
> > >
> >
> > I agree that the there is a bit of loss here due to the FILL mode. But it 
> > is not  comparable to that of the ETR. In this case, the WRAP event is 
> > triggered when we flush the ETE. i.e, this could be mostly due to the fact 
> > that the tracing was enabled for the kernel mode and the last few bytes of 
> > trace which caused the FILL belong to the code responsible for stopping the 
> > components in the CoreSight trace. I personally do not think this data is 
> > of any interest to the user.
> > Otherwise, if the data didn’t belong to the perf event side, it should have 
> > triggered the IRQ.
> >
> > This is true in case of the buffer overflow interrupt too, with a bit more 
> > data lost. i.e, since the interrupt is PPI, the overflow is triggered when 
> > the buffer is full (which includes the data that is cached in the TRBE). 
> > But there could be a bit of data that is still cached in the ETE, before it 
> > is captured in the trace. And the moment we get a FILL event, we stop 
> > executing anything that is relevant for the Trace session (as we are in the 
> > driver handling the interrupt).
> > And then we reconfigure the buffer to continue the execution. Now, the 
> > interrupt delivery is not necessarily synchronous and there could be data 
> > lost in the interval between WRAP event and the IRQ is triggered.
> >
> > I am OK with suggesting that there was some loss of trace data during the 
> > session, if we hit WRAP event. But this could cause worry to the consumers 
> > that they lost too much of trace data of their interest, while that is not 
> > the case.
> >
> 
> We can never know what has been lost. It may be some trace around the
> driver of no interest to the user, it may also be an event or
> timestamp related to an earlier marker - which could be highly
> relevant.
> With ETR we do not know how much is lost on wrap - it might be one
> byte, it might be much more - but the point is we mark as truncated
> for _any_ amount.
> 
> It is unfortunate that we will see multiple buffers marked as
> truncated - but this is far better than creating the false impression
> that no trace has been lost - that there is a continuous record where
> there is not.
> For some users - such as autofdo where sampling is taking place anyway
> - truncated buffers probably do not matter. For others - who are
> looking to trace a specific section of code - then they need to be
> aware that there could be decode anomolies relating to buffer wrap.
>

I think Mike has a point here - we should report it to users when data gets
lost, no matter how small that lost is. If that is a problem they always have
the choice of dedicating more pages to the AUX buffer.

Thanks,
Mathieu 
 
> Regards
> 
> Mike
> 
> > >> +static inline unsigned long get_trbe_limit_pointer(void)
> > >> +{
> > >> +       u64 trblimitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
> > >> +       unsigned long limit = (trblimitr >> TRBLIMITR_LIMIT_SHIFT) & 
> > >> TRBLIMITR_LIMIT_MASK;
> > >> +       unsigned long addr = limit << TRBLIMITR_LIMIT_SHIFT;
> > >
> > > Could this not be:
> > > unsigned long addr = trblimitr & (TRBLIMITR_LIMIT_MASK <<
> > > TRBLIMITR_LIMIT_SHIFT);
> > > like the base ponter below?
> > >
> >
> > Sure, it can be consistent.
> >
> >
> > >> +
> > >> +       WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
> > >> +       return addr;
> > >> +}
> > >> +
> > >> +static inline unsigned long get_trbe_base_pointer(void)
> > >> +{
> > >> +       u64 trbbaser = read_sysreg_s(SYS_TRBBASER_EL1);
> > >> +       unsigned long addr = trbbaser & (TRBBASER_BASE_MASK << 
> > >> TRBBASER_BASE_SHIFT);
> > >> +
> > >> +       WARN_ON(!IS_ALIGNED(addr, PAGE_SIZE));
> > >> +       return addr;
> > >> +}
> > >> +
> >
> > Thank you for the review
> >
> > Kind regards
> > Suzuki
> >
> 
> 
> -- 
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK

Reply via email to