From: John Jacques <john.jacq...@intel.com>

repository: https://github.com/Linaro/OpenCSD.git
branch: perf-opencsd-4.9
tip:
commit e38cbdb2928a36c5a ("cs-etm: Update to perf cs-etm decoder for OpenCSD v0.5")

Signed-off-by: John Jacques <john.jacq...@intel.com>
---
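For reviewers wanting to exercise the series, a minimal end-to-end session is
sketched below. This is only an illustration: the ETB address (20010000.etb)
and the decoder script path are assumptions taken from the documentation hunk
in this patch and will differ per board.

```shell
# Select a sink via sysFS (board-specific ETB address; see coresight.txt).
echo 1 > /sys/bus/coresight/devices/20010000.etb/enable_sink

# Collect a user space only trace for a single command.
perf record -e cs_etm//u --per-thread -- ls

# Post-process with one of the python scripts added by this series.
perf script -s tools/perf/scripts/python/cs-trace-ranges.py
```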
 Documentation/trace/coresight.txt                  |  138 +-
 drivers/hwtracing/coresight/coresight-etm-perf.c   |   31 +-
 drivers/hwtracing/coresight/coresight-etm.h        |    5 +
 .../hwtracing/coresight/coresight-etm3x-sysfs.c    |   12 +-
 drivers/hwtracing/coresight/coresight-priv.h       |    4 +-
 drivers/hwtracing/coresight/coresight-stm.c        |    9 +-
 drivers/hwtracing/coresight/coresight-tmc-etf.c    |   48 +-
 drivers/hwtracing/coresight/coresight-tmc-etr.c    |  286 +++-
 drivers/hwtracing/coresight/coresight-tmc.h        |    2 +-
 drivers/hwtracing/coresight/coresight.c            |   62 +-
 tools/perf/Makefile.config                         |   18 +
 tools/perf/Makefile.perf                           |    3 +
 tools/perf/arch/arm/util/cs-etm.c                  |    2 -
 tools/perf/builtin-script.c                        |    3 +-
 tools/perf/scripts/python/cs-trace-disasm.py       |  134 ++
 tools/perf/scripts/python/cs-trace-ranges.py       |   44 +
 tools/perf/util/Build                              |    2 +
 tools/perf/util/auxtrace.c                         |    2 +
 tools/perf/util/cs-etm-decoder/Build               |    6 +
 .../perf/util/cs-etm-decoder/cs-etm-decoder-stub.c |   99 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  527 +++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |  117 ++
 tools/perf/util/cs-etm.c                           | 1501 ++++++++++++++++++++
 tools/perf/util/cs-etm.h                           |   10 +
 tools/perf/util/machine.c                          |   46 +-
 .../util/scripting-engines/trace-event-python.c    |    2 +
 tools/perf/util/symbol-minimal.c                   |    3 +-
 27 files changed, 3001 insertions(+), 115 deletions(-)
 create mode 100644 tools/perf/scripts/python/cs-trace-disasm.py
 create mode 100644 tools/perf/scripts/python/cs-trace-ranges.py
 create mode 100644 tools/perf/util/cs-etm-decoder/Build
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
 create mode 100644 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
 create mode 100644 tools/perf/util/cs-etm.c

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88c..a2e7ccb 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -20,13 +20,13 @@ Components are generally categorised as source, link and sinks and are
 
 "Sources" generate a compressed stream representing the processor instruction
 path based on tracing scenarios as configured by users.  From there the stream
-flows through the coresight system (via ATB bus) using links that are connecting
-the emanating source to a sink(s).  Sinks serve as endpoints to the coresight
+flows through the Coresight system (via ATB bus) using links that are connecting
+the emanating source to a sink(s).  Sinks serve as endpoints to the Coresight
 implementation, either storing the compressed stream in a memory buffer or
 creating an interface to the outside world where data can be transferred to a
-host without fear of filling up the onboard coresight memory buffer.
+host without fear of filling up the onboard Coresight memory buffer.
 
-At typical coresight system would look like this:
+A typical Coresight system would look like this:
 
   *****************************************************************
  **************************** AMBA AXI  ****************************===||
@@ -83,8 +83,8 @@ While on target configuration of the components is done via the APB bus,
 all trace data are carried out-of-band on the ATB bus.  The CTM provides
 a way to aggregate and distribute signals between CoreSight components.
 
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform.  This first implementation centers on
+The Coresight framework provides a central point to represent, configure and
+manage Coresight devices on a platform.  This first implementation centers on
 the basic tracing functionality, enabling components such ETM/PTM, funnel,
 replicator, TMC, TPIU and ETB.  Future work will enable more
 intricate IP blocks such as STM and CTI.
@@ -129,11 +129,11 @@ expected to be added as the solution matures.
 Framework and implementation
 ----------------------------
 
-The coresight framework provides a central point to represent, configure and
-manage coresight devices on a platform.  Any coresight compliant device can
+The Coresight framework provides a central point to represent, configure and
+manage Coresight devices on a platform.  Any Coresight compliant device can
 register with the framework for as long as they use the right APIs:
 
-struct coresight_device *coresight_register(struct coresight_desc *desc);
+struct coresight_device *coresight_register(struct coresight_desc *desc);
 void coresight_unregister(struct coresight_device *csdev);
 
 The registering function is taking a "struct coresight_device *csdev" and
@@ -193,10 +193,120 @@ the information carried in "THIS_MODULE".
 How to use the tracer modules
 -----------------------------
 
-Before trace collection can start, a coresight sink needs to be identify.
-There is no limit on the amount of sinks (nor sources) that can be enabled at
-any given moment.  As a generic operation, all device pertaining to the sink
-class will have an "active" entry in sysfs:
+There are two ways to use the Coresight framework: 1) using the perf cmd line
+tool and 2) interacting directly with the Coresight devices using the sysFS
+interface.  The latter will slowly be phased out as more functionality becomes
+available from the perf cmd line tool but for the time being both are still
+supported.  The following sections provide details on using both methods.
+
+1) Using perf framework:
+
+Coresight tracers like ETM and PTM are represented using the Perf framework's
+Performance Monitoring Unit (PMU).  As such the perf framework takes charge of
+controlling when tracing happens based on when the process(es) of interest are
+scheduled.  When configured in a system, Coresight PMUs will be listed when
+queried by the perf command line tool:
+
+linaro@linaro-nano:~$ ./perf list pmu
+
+List of pre-defined events (to be used in -e):
+
+  cs_etm//                                           [Kernel PMU event]
+
+linaro@linaro-nano:~$
+
+Regardless of the number of ETM/PTM IP blocks in a system (usually equal to the
+number of processor cores), the "cs_etm" PMU will be listed only once.
+
+Before a trace can be configured and started a Coresight sink needs to be
+selected using the sysFS method (see below).  This is only temporary until
+sink selection can be made from the command line tool.
+
+linaro@linaro-nano:~$ ls /sys/bus/coresight/devices
+20010000.etb  20030000.tpiu  20040000.funnel  2201c000.ptm
+2201d000.ptm  2203c000.etm  2203d000.etm  2203e000.etm  replicator
+
+linaro@linaro-nano:~$ echo 1 > /sys/bus/coresight/devices/20010000.etb/enable_sink
+
+Once a sink has been selected, configuring a Coresight PMU works the same way as
+any other PMU.  As such tracing can happen for a single CPU, a group of CPUs,
+per thread or a combination of those:
+
+linaro@linaro-nano:~$ perf record -e cs_etm// --per-thread <command>
+
+linaro@linaro-nano:~$ perf record -C 0,2-3 -e cs_etm// <command>
+
+Tracing can also be limited to user or kernel space in order to narrow the
+amount of collected traces:
+
+linaro@linaro-nano:~$ perf record -e cs_etm//u --per-thread <command>
+
+linaro@linaro-nano:~$ perf record -C 0,2-3 -e cs_etm//k <command>
+
+As of this writing two ETM/PTM specific options are available: cycle
+accurate and timestamp (please refer to the Embedded Trace Macrocell reference
+manual for details on these options).  By default both are disabled but using
+the "cycacc" and "timestamp" mnemonics within the double '/' will see those
+options configured for the upcoming trace run:
+
+linaro@linaro-nano:~$ perf record -e cs_etm/cycacc/ --per-thread <command>
+
+linaro@linaro-nano:~$ perf record -C 0,2-3 -e cs_etm/cycacc,timestamp/ <command>
+
+The Coresight PMUs can be configured to work in "full trace" or "snapshot" mode.
+In full trace mode trace acquisition is enabled from beginning to end with trace
+data being recorded continuously:
+
+linaro@linaro-nano:~$ perf record -e cs_etm// dd if=/dev/random of=./test.txt bs=1k count=1000
+
+Since this can lead to a significant amount of data and because some devices are
+limited in disk space snapshot mode can be used instead.  In snapshot mode
+traces are still collected in the ring buffer but not communicated to user
+space.  The ring buffer is allowed to wrap around, providing the latest
+information before an event of interest happens.  Significant events are
+communicated by sending a USR2 signal to the user space command line tool.
+From there the tool will stop trace collection and harvest data from the ring
+buffer before re-enabling traces.  Snapshot mode can be invoked using '-S' when
+launching a trace collection:
+
+linaro@linaro-nano:~$ perf record -S -e cs_etm// dd if=/dev/random of=./test.txt bs=1k count=1000
+
+Trace data collected during trace runs ends up in the "perf.data" file.  Trace
+configuration information necessary for trace decoding is also embedded in the
+"perf.data" file.  Two new headers, 'PERF_RECORD_AUXTRACE_INFO' and
+'PERF_RECORD_AUXTRACE' have been added to the list of event types in order to
+find out where the different sections start.
+
+It is worth noting that a set of metadata information exists for each tracer
+that participated in a trace run.  As such if 5 processors have been engaged,
+5 sets of metadata will be found in the perf.data file.  This is to ensure that
+tracer decompression tools have all the information they need in order to
+process the trace data.
+
+Metadata information is collected directly from the ETM/PTM management registers
+using the sysFS interface.  Since there is no way for the perf command line
+tool to associate a CPU with a tracer, a symbolic link has been created between
+the cs_etm sysFS event directory and each Coresight tracer:
+
+linaro@linaro-nano:~$ ls /sys/bus/event_source/devices/cs_etm
+cpu0  cpu1  cpu2  cpu3  cpu4  format  perf_event_mux_interval_ms
+power  subsystem  type  uevent
+
+linaro@linaro-nano:~$ ls /sys/bus/event_source/devices/cs_etm/cpu0/mgmt/
+etmccer  etmccr  etmcr  etmidr  etmscr  etmtecr1  etmtecr2
+etmteevr  etmtraceidr  etmtssvr
+
+2) Using the sysFS interface:
+
+Most, if not all, configuration registers are made available to users via the
+sysFS interface.  Until all Coresight ETM drivers have been converted to perf,
+it will also be possible to start and stop traces from sysFS.
+
+As with the perf method described above, a Coresight sink needs to be identified
+before trace collection can commence.  Using the sysFS method _only_, there is
+no limit on the amount of sinks (nor sources) that can be enabled at
+any given moment.  As a generic operation, all devices pertaining to the sink
+class will have an "enable_sink" entry in sysfs:
 
 root:/sys/bus/coresight/devices# ls
 replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
@@ -246,7 +356,7 @@ The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
 
Following is a DS-5 output of an experimental loop that increments a variable up
 to a certain value.  The example is simple and yet provides a glimpse of the
-wealth of possibilities that coresight provides.
+wealth of possibilities that Coresight provides.
 
 Info                                    Tracing enabled
Instruction     106378866       0x8026B53C      E52DE004        false   PUSH     {lr}
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 2cd7c71..1774196 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -202,6 +202,21 @@ static void *etm_setup_aux(int event_cpu, void **pages,
        if (!event_data)
                return NULL;
 
+       /*
+        * In theory nothing prevents tracers in a trace session from being
+        * associated with different sinks, nor having a sink per tracer.  But
+        * until we have HW with this kind of topology we need to assume tracers
+        * in a trace session are using the same sink.  Therefore go through
+        * the coresight bus and pick the first enabled sink.
+        *
+        * When operated from sysFS users are responsible for enabling the sink
+        * while from perf, the perf tools will do it based on the choice made
+        * on the cmd line.  As such the "enable_sink" flag in sysFS is reset.
+        */
+       sink = coresight_get_enabled_sink(true);
+       if (!sink)
+               goto err;
+
        INIT_WORK(&event_data->work, free_event_data);
 
        mask = &event_data->mask;
@@ -219,25 +234,11 @@ static void *etm_setup_aux(int event_cpu, void **pages,
                 * list of devices from source to sink that can be
                 * referenced later when the path is actually needed.
                 */
-               event_data->path[cpu] = coresight_build_path(csdev);
+               event_data->path[cpu] = coresight_build_path(csdev, sink);
                if (IS_ERR(event_data->path[cpu]))
                        goto err;
        }
 
-       /*
-        * In theory nothing prevent tracers in a trace session from being
-        * associated with different sinks, nor having a sink per tracer.  But
-        * until we have HW with this kind of topology and a way to convey
-        * sink assignement from the perf cmd line we need to assume tracers
-        * in a trace session are using the same sink.  Therefore pick the sink
-        * found at the end of the first available path.
-        */
-       cpu = cpumask_first(mask);
-       /* Grab the sink at the end of the path */
-       sink = coresight_get_sink(event_data->path[cpu]);
-       if (!sink)
-               goto err;
-
        if (!sink_ops(sink)->alloc_buffer)
                goto err;
 
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h
index 4a18ee4..ad063d7 100644
--- a/drivers/hwtracing/coresight/coresight-etm.h
+++ b/drivers/hwtracing/coresight/coresight-etm.h
@@ -89,11 +89,13 @@
 /* ETMCR - 0x00 */
 #define ETMCR_PWD_DWN          BIT(0)
 #define ETMCR_STALL_MODE       BIT(7)
+#define ETMCR_BRANCH_BROADCAST BIT(8)
 #define ETMCR_ETM_PRG          BIT(10)
 #define ETMCR_ETM_EN           BIT(11)
 #define ETMCR_CYC_ACC          BIT(12)
 #define ETMCR_CTXID_SIZE       (BIT(14)|BIT(15))
 #define ETMCR_TIMESTAMP_EN     BIT(28)
+#define ETMCR_RETURN_STACK     BIT(29)
 /* ETMCCR - 0x04 */
 #define ETMCCR_FIFOFULL                BIT(23)
 /* ETMPDCR - 0x310 */
@@ -110,8 +112,11 @@
 #define ETM_MODE_STALL         BIT(2)
 #define ETM_MODE_TIMESTAMP     BIT(3)
 #define ETM_MODE_CTXID         BIT(4)
+#define ETM_MODE_BBROAD                BIT(5)
+#define ETM_MODE_RET_STACK     BIT(6)
 #define ETM_MODE_ALL           (ETM_MODE_EXCLUDE | ETM_MODE_CYCACC | \
                                 ETM_MODE_STALL | ETM_MODE_TIMESTAMP | \
+                                ETM_MODE_BBROAD | ETM_MODE_RET_STACK | \
                                 ETM_MODE_CTXID | ETM_MODE_EXCL_KERN | \
                                 ETM_MODE_EXCL_USER)
 
diff --git a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
index e9b0719..ca98ad1 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x-sysfs.c
@@ -146,7 +146,7 @@ static ssize_t mode_store(struct device *dev,
                        goto err_unlock;
                }
                config->ctrl |= ETMCR_STALL_MODE;
-        } else
+       } else
                config->ctrl &= ~ETMCR_STALL_MODE;
 
        if (config->mode & ETM_MODE_TIMESTAMP) {
@@ -164,6 +164,16 @@ static ssize_t mode_store(struct device *dev,
        else
                config->ctrl &= ~ETMCR_CTXID_SIZE;
 
+       if (config->mode & ETM_MODE_BBROAD)
+               config->ctrl |= ETMCR_BRANCH_BROADCAST;
+       else
+               config->ctrl &= ~ETMCR_BRANCH_BROADCAST;
+
+       if (config->mode & ETM_MODE_RET_STACK)
+               config->ctrl |= ETMCR_RETURN_STACK;
+       else
+               config->ctrl &= ~ETMCR_RETURN_STACK;
+
        if (config->mode & (ETM_MODE_EXCL_KERN | ETM_MODE_EXCL_USER))
                etm_config_trace_mode(config);
 
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 196a14b..ef9d8e9 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -111,7 +111,9 @@ static inline void CS_UNLOCK(void __iomem *addr)
 void coresight_disable_path(struct list_head *path);
 int coresight_enable_path(struct list_head *path, u32 mode);
 struct coresight_device *coresight_get_sink(struct list_head *path);
-struct list_head *coresight_build_path(struct coresight_device *csdev);
+struct coresight_device *coresight_get_enabled_sink(bool reset);
+struct list_head *coresight_build_path(struct coresight_device *csdev,
+                                      struct coresight_device *sink);
 void coresight_release_path(struct list_head *path);
 
 #ifdef CONFIG_CORESIGHT_SOURCE_ETM3X
diff --git a/drivers/hwtracing/coresight/coresight-stm.c b/drivers/hwtracing/coresight/coresight-stm.c
index 8e79056..2d16260 100644
--- a/drivers/hwtracing/coresight/coresight-stm.c
+++ b/drivers/hwtracing/coresight/coresight-stm.c
@@ -419,10 +419,10 @@ static ssize_t stm_generic_packet(struct stm_data *stm_data,
                                                   struct stm_drvdata, stm);
 
        if (!(drvdata && local_read(&drvdata->mode)))
-               return 0;
+               return -EACCES;
 
        if (channel >= drvdata->numsp)
-               return 0;
+               return -EINVAL;
 
        ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
 
@@ -920,6 +920,11 @@ static struct amba_id stm_ids[] = {
                .mask   = 0x0003ffff,
                .data   = "STM32",
        },
+       {
+               .id     = 0x0003b963,
+               .mask   = 0x0003ffff,
+               .data   = "STM500",
+       },
        { 0, 0},
 };
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d6941ea..1549436 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -70,7 +70,7 @@ static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
         * When operating in sysFS mode the content of the buffer needs to be
         * read before the TMC is disabled.
         */
-       if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                tmc_etb_dump_hw(drvdata);
        tmc_disable_hw(drvdata);
 
@@ -103,19 +103,14 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata)
        CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
 {
        int ret = 0;
        bool used = false;
        char *buf = NULL;
-       long val;
        unsigned long flags;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-        /* This shouldn't be happening */
-       if (WARN_ON(mode != CS_MODE_SYSFS))
-               return -EINVAL;
-
        /*
         * If we don't have a buffer release the lock and allocate memory.
         * Otherwise keep the lock and move along.
@@ -138,13 +133,12 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
                goto out;
        }
 
-       val = local_xchg(&drvdata->mode, mode);
        /*
         * In sysFS mode we can have multiple writers per sink.  Since this
         * sink is already enabled no memory is needed and the HW need not be
         * touched.
         */
-       if (val == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                goto out;
 
        /*
@@ -163,6 +157,7 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
                drvdata->buf = buf;
        }
 
+       drvdata->mode = CS_MODE_SYSFS;
        tmc_etb_enable_hw(drvdata);
 out:
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -177,34 +172,29 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev, u32 mode)
        return ret;
 }
 
-static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etf_sink_perf(struct coresight_device *csdev)
 {
        int ret = 0;
-       long val;
        unsigned long flags;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-        /* This shouldn't be happening */
-       if (WARN_ON(mode != CS_MODE_PERF))
-               return -EINVAL;
-
        spin_lock_irqsave(&drvdata->spinlock, flags);
        if (drvdata->reading) {
                ret = -EINVAL;
                goto out;
        }
 
-       val = local_xchg(&drvdata->mode, mode);
        /*
         * In Perf mode there can be only one writer per sink.  There
         * is also no need to continue if the ETB/ETR is already operated
         * from sysFS.
         */
-       if (val != CS_MODE_DISABLED) {
+       if (drvdata->mode != CS_MODE_DISABLED) {
                ret = -EINVAL;
                goto out;
        }
 
+       drvdata->mode = CS_MODE_PERF;
        tmc_etb_enable_hw(drvdata);
 out:
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -216,9 +206,9 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 {
        switch (mode) {
        case CS_MODE_SYSFS:
-               return tmc_enable_etf_sink_sysfs(csdev, mode);
+               return tmc_enable_etf_sink_sysfs(csdev);
        case CS_MODE_PERF:
-               return tmc_enable_etf_sink_perf(csdev, mode);
+               return tmc_enable_etf_sink_perf(csdev);
        }
 
        /* We shouldn't be here */
@@ -227,7 +217,6 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev, u32 mode)
 
 static void tmc_disable_etf_sink(struct coresight_device *csdev)
 {
-       long val;
        unsigned long flags;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -237,10 +226,11 @@ static void tmc_disable_etf_sink(struct coresight_device *csdev)
                return;
        }
 
-       val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
        /* Disable the TMC only if it needs to */
-       if (val != CS_MODE_DISABLED)
+       if (drvdata->mode != CS_MODE_DISABLED) {
                tmc_etb_disable_hw(drvdata);
+               drvdata->mode = CS_MODE_DISABLED;
+       }
 
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
@@ -260,7 +250,7 @@ static int tmc_enable_etf_link(struct coresight_device *csdev,
        }
 
        tmc_etf_enable_hw(drvdata);
-       local_set(&drvdata->mode, CS_MODE_SYSFS);
+       drvdata->mode = CS_MODE_SYSFS;
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
        dev_info(drvdata->dev, "TMC-ETF enabled\n");
@@ -280,7 +270,7 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
        }
 
        tmc_etf_disable_hw(drvdata);
-       local_set(&drvdata->mode, CS_MODE_DISABLED);
+       drvdata->mode = CS_MODE_DISABLED;
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
        dev_info(drvdata->dev, "TMC disabled\n");
@@ -383,7 +373,7 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
                return;
 
        /* This shouldn't happen */
-       if (WARN_ON_ONCE(local_read(&drvdata->mode) != CS_MODE_PERF))
+       if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
                return;
 
        CS_UNLOCK(drvdata->base);
@@ -504,7 +494,6 @@ const struct coresight_ops tmc_etf_cs_ops = {
 
 int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 {
-       long val;
        enum tmc_mode mode;
        int ret = 0;
        unsigned long flags;
@@ -528,9 +517,8 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
                goto out;
        }
 
-       val = local_read(&drvdata->mode);
        /* Don't interfere if operated from Perf */
-       if (val == CS_MODE_PERF) {
+       if (drvdata->mode == CS_MODE_PERF) {
                ret = -EINVAL;
                goto out;
        }
@@ -542,7 +530,7 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
        }
 
        /* Disable the TMC if need be */
-       if (val == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                tmc_etb_disable_hw(drvdata);
 
        drvdata->reading = true;
@@ -573,7 +561,7 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata)
        }
 
        /* Re-enable the TMC if need be */
-       if (local_read(&drvdata->mode) == CS_MODE_SYSFS) {
+       if (drvdata->mode == CS_MODE_SYSFS) {
                /*
                 * The trace run will continue with the same allocated trace
                 * buffer. As such zero-out the buffer so that we don't end
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 886ea83..2db4857 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -15,11 +15,30 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/circ_buf.h>
 #include <linux/coresight.h>
 #include <linux/dma-mapping.h>
+#include <linux/slab.h>
+
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
+/**
+ * struct cs_etr_buffers - keep track of a recording session's specifics
+ * @tmc:       generic portion of the TMC buffers
+ * @paddr:     the physical address of a DMA'able contiguous memory area
+ * @vaddr:     the virtual address associated to @paddr
+ * @size:      how much memory we have, starting at @paddr
+ * @dev:       the device @vaddr has been tied to
+ */
+struct cs_etr_buffers {
+       struct cs_buffers       tmc;
+       dma_addr_t              paddr;
+       void __iomem            *vaddr;
+       u32                     size;
+       struct device           *dev;
+};
+
 static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
 {
        u32 axictl;
@@ -86,26 +105,22 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
         * When operating in sysFS mode the content of the buffer needs to be
         * read before the TMC is disabled.
         */
-       if (local_read(&drvdata->mode) == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                tmc_etr_dump_hw(drvdata);
        tmc_disable_hw(drvdata);
 
        CS_LOCK(drvdata->base);
 }
 
-static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 {
        int ret = 0;
        bool used = false;
-       long val;
        unsigned long flags;
        void __iomem *vaddr = NULL;
        dma_addr_t paddr;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-        /* This shouldn't be happening */
-       if (WARN_ON(mode != CS_MODE_SYSFS))
-               return -EINVAL;
 
        /*
         * If we don't have a buffer release the lock and allocate memory.
@@ -134,13 +149,12 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
                goto out;
        }
 
-       val = local_xchg(&drvdata->mode, mode);
        /*
         * In sysFS mode we can have multiple writers per sink.  Since this
         * sink is already enabled no memory is needed and the HW need not be
         * touched.
         */
-       if (val == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                goto out;
 
        /*
@@ -155,8 +169,7 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
                drvdata->buf = drvdata->vaddr;
        }
 
-       memset(drvdata->vaddr, 0, drvdata->size);
-
+       drvdata->mode = CS_MODE_SYSFS;
        tmc_etr_enable_hw(drvdata);
 out:
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -171,34 +184,29 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev, u32 mode)
        return ret;
 }
 
-static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, u32 mode)
+static int tmc_enable_etr_sink_perf(struct coresight_device *csdev)
 {
        int ret = 0;
-       long val;
        unsigned long flags;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
-        /* This shouldn't be happening */
-       if (WARN_ON(mode != CS_MODE_PERF))
-               return -EINVAL;
-
        spin_lock_irqsave(&drvdata->spinlock, flags);
        if (drvdata->reading) {
                ret = -EINVAL;
                goto out;
        }
 
-       val = local_xchg(&drvdata->mode, mode);
        /*
         * In Perf mode there can be only one writer per sink.  There
         * is also no need to continue if the ETR is already operated
         * from sysFS.
         */
-       if (val != CS_MODE_DISABLED) {
+       if (drvdata->mode != CS_MODE_DISABLED) {
                ret = -EINVAL;
                goto out;
        }
 
+       drvdata->mode = CS_MODE_PERF;
        tmc_etr_enable_hw(drvdata);
 out:
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -210,9 +218,9 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
 {
        switch (mode) {
        case CS_MODE_SYSFS:
-               return tmc_enable_etr_sink_sysfs(csdev, mode);
+               return tmc_enable_etr_sink_sysfs(csdev);
        case CS_MODE_PERF:
-               return tmc_enable_etr_sink_perf(csdev, mode);
+               return tmc_enable_etr_sink_perf(csdev);
        }
 
        /* We shouldn't be here */
@@ -221,7 +229,6 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
 
 static void tmc_disable_etr_sink(struct coresight_device *csdev)
 {
-       long val;
        unsigned long flags;
        struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
 
@@ -231,19 +238,244 @@ static void tmc_disable_etr_sink(struct coresight_device *csdev)
                return;
        }
 
-       val = local_xchg(&drvdata->mode, CS_MODE_DISABLED);
        /* Disable the TMC only if it needs to */
-       if (val != CS_MODE_DISABLED)
+       if (drvdata->mode != CS_MODE_DISABLED) {
                tmc_etr_disable_hw(drvdata);
+               drvdata->mode = CS_MODE_DISABLED;
+       }
 
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
        dev_info(drvdata->dev, "TMC-ETR disabled\n");
 }
 
+static void *tmc_alloc_etr_buffer(struct coresight_device *csdev, int cpu,
+                                 void **pages, int nr_pages, bool overwrite)
+{
+       int node;
+       struct cs_etr_buffers *buf;
+       struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+       if (cpu == -1)
+               cpu = smp_processor_id();
+       node = cpu_to_node(cpu);
+
+       /* Allocate memory structure for interaction with Perf */
+       buf = kzalloc_node(sizeof(struct cs_etr_buffers), GFP_KERNEL, node);
+       if (!buf)
+               return NULL;
+
+       buf->dev = drvdata->dev;
+       buf->size = drvdata->size;
+       buf->vaddr = dma_alloc_coherent(buf->dev, buf->size,
+                                       &buf->paddr, GFP_KERNEL);
+       if (!buf->vaddr) {
+               kfree(buf);
+               return NULL;
+       }
+
+       buf->tmc.snapshot = overwrite;
+       buf->tmc.nr_pages = nr_pages;
+       buf->tmc.data_pages = pages;
+
+       return buf;
+}
+
+static void tmc_free_etr_buffer(void *config)
+{
+       struct cs_etr_buffers *buf = config;
+
+       dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->paddr);
+       kfree(buf);
+}
+
+static int tmc_set_etr_buffer(struct coresight_device *csdev,
+                             struct perf_output_handle *handle,
+                             void *sink_config)
+{
+       int ret = 0;
+       unsigned long head;
+       struct cs_etr_buffers *buf = sink_config;
+       struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+       /* wrap head around to the amount of space we have */
+       head = handle->head & ((buf->tmc.nr_pages << PAGE_SHIFT) - 1);
+
+       /* find the page to write to */
+       buf->tmc.cur = head / PAGE_SIZE;
+
+       /* and offset within that page */
+       buf->tmc.offset = head % PAGE_SIZE;
+
+       local_set(&buf->tmc.data_size, 0);
+
+       /* Tell the HW where to put the trace data */
+       drvdata->vaddr = buf->vaddr;
+       drvdata->paddr = buf->paddr;
+       memset(drvdata->vaddr, 0, drvdata->size);
+
+       return ret;
+}
+
+static unsigned long tmc_reset_etr_buffer(struct coresight_device *csdev,
+                                         struct perf_output_handle *handle,
+                                         void *sink_config, bool *lost)
+{
+       long size = 0;
+       struct cs_etr_buffers *buf = sink_config;
+       struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+       if (buf) {
+               /*
+                * In snapshot mode ->data_size holds the new address of the
+                * ring buffer's head.  The size itself is the whole address
+                * range since we want the latest information.
+                */
+               if (buf->tmc.snapshot) {
+                       size = buf->tmc.nr_pages << PAGE_SHIFT;
+                       handle->head = local_xchg(&buf->tmc.data_size, size);
+               }
+
+               /*
+                * Tell the tracer PMU how much we got in this run and if
+                * something went wrong along the way.  Nobody else can use
+                * this cs_etr_buffers instance until we are done.  As such
+                * resetting parameters here and squaring off with the ring
+                * buffer API in the tracer PMU is fine.
+                */
+               *lost = !!local_xchg(&buf->tmc.lost, 0);
+               size = local_xchg(&buf->tmc.data_size, 0);
+       }
+
+       /* Get ready for another run */
+       drvdata->vaddr = NULL;
+       drvdata->paddr = 0;
+
+       return size;
+}
+
+static void tmc_update_etr_buffer(struct coresight_device *csdev,
+                                 struct perf_output_handle *handle,
+                                 void *sink_config)
+{
+       int i, cur;
+       u32 *buf_ptr;
+       u32 read_ptr, write_ptr;
+       u32 status, to_read;
+       unsigned long offset;
+       struct cs_buffers *buf = sink_config;
+       struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+       if (!buf)
+               return;
+
+       /* This shouldn't happen */
+       if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
+               return;
+
+       CS_UNLOCK(drvdata->base);
+
+       tmc_flush_and_stop(drvdata);
+
+       read_ptr = readl_relaxed(drvdata->base + TMC_RRP);
+       write_ptr = readl_relaxed(drvdata->base + TMC_RWP);
+
+       /*
+        * Get a hold of the status register and see if a wrap around
+        * has occurred.  If so adjust things accordingly.
+        */
+       status = readl_relaxed(drvdata->base + TMC_STS);
+       if (status & TMC_STS_FULL) {
+               local_inc(&buf->lost);
+               to_read = drvdata->size;
+       } else {
+               to_read = CIRC_CNT(write_ptr, read_ptr, drvdata->size);
+       }
+
+       /*
+        * The TMC RAM buffer may be bigger than the space available in the
+        * perf ring buffer (handle->size).  If so advance the RRP so that we
+        * get the latest trace data.
+        */
+       if (to_read > handle->size) {
+               u32 buffer_start, mask = 0;
+
+               /* Read buffer start address in system memory */
+               buffer_start = readl_relaxed(drvdata->base + TMC_DBALO);
+
+               /*
+                * The value written to RRP must be byte-address aligned to
+                * the width of the trace memory databus _and_ to a frame
+                * boundary (16 byte), whichever is the biggest. For example,
+                * for 32-bit, 64-bit and 128-bit wide trace memory, the four
+                * LSBs must be 0s. For 256-bit wide trace memory, the five
+                * LSBs must be 0s.
+                */
+               switch (drvdata->memwidth) {
+               case TMC_MEM_INTF_WIDTH_32BITS:
+               case TMC_MEM_INTF_WIDTH_64BITS:
+               case TMC_MEM_INTF_WIDTH_128BITS:
+                       mask = GENMASK(31, 5);
+                       break;
+               case TMC_MEM_INTF_WIDTH_256BITS:
+                       mask = GENMASK(31, 6);
+                       break;
+               }
+
+               /*
+                * Make sure the new size is aligned in accordance with the
+                * requirement explained above.
+                */
+               to_read = handle->size & mask;
+               /* Move the RAM read pointer up */
+               read_ptr = (write_ptr + drvdata->size) - to_read;
+               /* Make sure we are still within our limits */
+               if (read_ptr > (buffer_start + (drvdata->size - 1)))
+                       read_ptr -= drvdata->size;
+               /* Tell the HW */
+               writel_relaxed(read_ptr, drvdata->base + TMC_RRP);
+               local_inc(&buf->lost);
+       }
+
+       cur = buf->cur;
+       offset = buf->offset;
+
+       /* for every byte to read */
+       for (i = 0; i < to_read; i += 4) {
+               buf_ptr = buf->data_pages[cur] + offset;
+               *buf_ptr = readl_relaxed(drvdata->base + TMC_RRD);
+
+               offset += 4;
+               if (offset >= PAGE_SIZE) {
+                       offset = 0;
+                       cur++;
+                       /* wrap around at the end of the buffer */
+                       cur &= buf->nr_pages - 1;
+               }
+       }
+
+       /*
+        * In snapshot mode all we have to do is communicate to
+        * perf_aux_output_end() the address of the current head.  In full
+        * trace mode the same function expects a size to move rb->aux_head
+        * forward.
+        */
+       if (buf->snapshot)
+               local_set(&buf->data_size, (cur * PAGE_SIZE) + offset);
+       else
+               local_add(to_read, &buf->data_size);
+
+       CS_LOCK(drvdata->base);
+}
+
 static const struct coresight_ops_sink tmc_etr_sink_ops = {
        .enable         = tmc_enable_etr_sink,
        .disable        = tmc_disable_etr_sink,
+       .alloc_buffer   = tmc_alloc_etr_buffer,
+       .free_buffer    = tmc_free_etr_buffer,
+       .set_buffer     = tmc_set_etr_buffer,
+       .reset_buffer   = tmc_reset_etr_buffer,
+       .update_buffer  = tmc_update_etr_buffer,
 };
 
 const struct coresight_ops tmc_etr_cs_ops = {
@@ -253,7 +485,6 @@ const struct coresight_ops tmc_etr_cs_ops = {
 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 {
        int ret = 0;
-       long val;
        unsigned long flags;
 
        /* config types are set a boot time and never change */
@@ -266,9 +497,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
                goto out;
        }
 
-       val = local_read(&drvdata->mode);
        /* Don't interfere if operated from Perf */
-       if (val == CS_MODE_PERF) {
+       if (drvdata->mode == CS_MODE_PERF) {
                ret = -EINVAL;
                goto out;
        }
@@ -280,7 +510,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
        }
 
        /* Disable the TMC if need be */
-       if (val == CS_MODE_SYSFS)
+       if (drvdata->mode == CS_MODE_SYSFS)
                tmc_etr_disable_hw(drvdata);
 
        drvdata->reading = true;
@@ -303,7 +533,7 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
        spin_lock_irqsave(&drvdata->spinlock, flags);
 
        /* RE-enable the TMC if need be */
-       if (local_read(&drvdata->mode) == CS_MODE_SYSFS) {
+       if (drvdata->mode == CS_MODE_SYSFS) {
                /*
                 * The trace run will continue with the same allocated trace
                 * buffer. The trace buffer is cleared in tmc_etr_enable_hw(),
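As a cross-check on the tmc_set_etr_buffer() hunk above, the head-to-page split it performs can be modelled in a few lines of Python (4 KiB pages assumed; the helper name is illustrative, not part of the patch):

```python
PAGE_SHIFT = 12               # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT

def locate_write_pos(head, nr_pages):
    """Mirror tmc_set_etr_buffer(): wrap the perf handle's head to the
    AUX buffer size (nr_pages must be a power of two for the mask to
    work), then split it into a page index (buf->tmc.cur) and an
    in-page offset (buf->tmc.offset)."""
    head &= (nr_pages << PAGE_SHIFT) - 1
    return head // PAGE_SIZE, head % PAGE_SIZE
```

With a 4-page buffer, a head of 4097 lands in page 1 at offset 1, and a head one byte past the end of the buffer wraps back to page 0.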
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 44b3ae3..51c0185 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -117,7 +117,7 @@ struct tmc_drvdata {
        void __iomem            *vaddr;
        u32                     size;
        u32                     len;
-       local_t                 mode;
+       u32                     mode;
        enum tmc_config_type    config_type;
        enum tmc_mem_intf_width memwidth;
        u32                     trigger_cntr;
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index 7bf00a0..40ede64 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -368,6 +368,40 @@ struct coresight_device *coresight_get_sink(struct list_head *path)
        return csdev;
 }
 
+static int coresight_enabled_sink(struct device *dev, void *data)
+{
+       bool *reset = data;
+       struct coresight_device *csdev = to_coresight_device(dev);
+
+       if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
+            csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) &&
+            csdev->activated) {
+               /*
+                * Now that we have a handle on the sink for this session,
+                * disable the sysFS "enable_sink" flag so that possible
+                * concurrent perf sessions that wish to use another sink don't
+                * trip on it.  Doing so has no ramification for the current
+                * session.
+                */
+               if (*reset)
+                       csdev->activated = false;
+
+               return 1;
+       }
+
+       return 0;
+}
+
+struct coresight_device *coresight_get_enabled_sink(bool reset)
+{
+       struct device *dev = NULL;
+
+       dev = bus_find_device(&coresight_bustype, NULL, &reset,
+                             coresight_enabled_sink);
+
+       return dev ? to_coresight_device(dev) : NULL;
+}
+
 /**
  * _coresight_build_path - recursively build a path from a @csdev to a sink.
  * @csdev:     The device to start from.
@@ -380,6 +414,7 @@ struct coresight_device *coresight_get_sink(struct list_head *path)
  * last one.
  */
 static int _coresight_build_path(struct coresight_device *csdev,
+                                struct coresight_device *sink,
                                 struct list_head *path)
 {
        int i;
@@ -387,15 +422,15 @@ static int _coresight_build_path(struct coresight_device *csdev,
        struct coresight_node *node;
 
        /* An activated sink has been found.  Enqueue the element */
-       if ((csdev->type == CORESIGHT_DEV_TYPE_SINK ||
-            csdev->type == CORESIGHT_DEV_TYPE_LINKSINK) && csdev->activated)
+       if (csdev == sink)
                goto out;
 
        /* Not a sink - recursively explore each port found on this element */
        for (i = 0; i < csdev->nr_outport; i++) {
                struct coresight_device *child_dev = csdev->conns[i].child_dev;
 
-               if (child_dev && _coresight_build_path(child_dev, path) == 0) {
+               if (child_dev &&
+                   _coresight_build_path(child_dev, sink, path) == 0) {
                        found = true;
                        break;
                }
@@ -422,18 +457,22 @@ static int _coresight_build_path(struct coresight_device *csdev,
        return 0;
 }
 
-struct list_head *coresight_build_path(struct coresight_device *csdev)
+struct list_head *coresight_build_path(struct coresight_device *source,
+                                      struct coresight_device *sink)
 {
        struct list_head *path;
        int rc;
 
+       if (!sink)
+               return ERR_PTR(-EINVAL);
+
        path = kzalloc(sizeof(struct list_head), GFP_KERNEL);
        if (!path)
                return ERR_PTR(-ENOMEM);
 
        INIT_LIST_HEAD(path);
 
-       rc = _coresight_build_path(csdev, path);
+       rc = _coresight_build_path(source, sink, path);
        if (rc) {
                kfree(path);
                return ERR_PTR(rc);
@@ -497,6 +536,7 @@ static int coresight_validate_source(struct coresight_device *csdev,
 int coresight_enable(struct coresight_device *csdev)
 {
        int cpu, ret = 0;
+       struct coresight_device *sink;
        struct list_head *path;
 
        mutex_lock(&coresight_mutex);
@@ -508,7 +548,17 @@ int coresight_enable(struct coresight_device *csdev)
        if (csdev->enable)
                goto out;
 
-       path = coresight_build_path(csdev);
+       /*
+        * Search for a valid sink for this session but don't reset the
+        * "enable_sink" flag in sysFS.  Users get to do that explicitly.
+        */
+       sink = coresight_get_enabled_sink(false);
+       if (!sink) {
+               ret = -EINVAL;
+               goto out;
+       }
+
+       path = coresight_build_path(csdev, sink);
        if (IS_ERR(path)) {
                pr_err("building path(s) failed\n");
                ret = PTR_ERR(path);
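The reworked path builder above now recurses from the source toward one specific sink rather than stopping at the first activated sink it meets. A minimal Python sketch of that walk (dict-based nodes stand in for struct coresight_device and its conns[] outport array):

```python
def build_path(node, sink, path):
    """Mirror _coresight_build_path(): stop when the chosen sink is
    reached, and prepend each device while unwinding the recursion,
    so the finished list runs source -> ... -> sink."""
    if node is sink:
        path.insert(0, node)
        return True
    # not the sink: recursively explore each output port
    for child in node.get("children", []):
        if build_path(child, sink, path):
            path.insert(0, node)
            return True
    return False
```

coresight_build_path() additionally rejects a NULL sink up front, which is why coresight_enable() must now find an enabled sink via coresight_get_enabled_sink() before building the path.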
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 4ec127b..fe465b5 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -533,6 +533,24 @@ endif
 grep-libs  = $(filter -l%,$(1))
 strip-libs = $(filter-out -l%,$(1))
 
+ifdef CSTRACE_PATH
+  ifeq (${IS_64_BIT}, 1)
+    CSTRACE_LNX = linux64
+  else
+    CSTRACE_LNX = linux
+  endif
+  ifeq (${DEBUG}, 1)
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/dbg
+  else
+    LIBCSTRACE = -lcstraced_c_api -lcstraced
+    CSTRACE_LIB_PATH = $(CSTRACE_PATH)/lib/$(CSTRACE_LNX)/rel
+  endif
+  $(call detected,CSTRACE)
+  $(call detected_var,CSTRACE_PATH)
+  EXTLIBS += -L$(CSTRACE_LIB_PATH) $(LIBCSTRACE) -lstdc++
+endif
+
 ifdef NO_LIBPERL
   CFLAGS += -DNO_LIBPERL
 else
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index a10f064..2810a64 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -86,6 +86,9 @@ include ../scripts/utilities.mak
 #
 # Define FEATURES_DUMP to provide features detection dump file
 # and bypass the feature detection
+#
+# Define NO_CSTRACE if you do not want CoreSight trace decoding support
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 47d584d..dfea6b6 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -575,8 +575,6 @@ static FILE *cs_device__open_file(const char *name)
        snprintf(path, PATH_MAX,
                 "%s" CS_BUS_DEVICE_PATH "%s", sysfs, name);
 
-       printf("path: %s\n", path);
-
        if (stat(path, &st) < 0)
                return NULL;
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7228d14..667f8c2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -109,7 +109,8 @@ static struct {
 
                .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
                              PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
-                             PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+                             PERF_OUTPUT_EVNAME | PERF_OUTPUT_ADDR |
+                             PERF_OUTPUT_IP |
                              PERF_OUTPUT_SYM | PERF_OUTPUT_DSO |
                              PERF_OUTPUT_PERIOD,
 
diff --git a/tools/perf/scripts/python/cs-trace-disasm.py b/tools/perf/scripts/python/cs-trace-disasm.py
new file mode 100644
index 0000000..c370e26
--- /dev/null
+++ b/tools/perf/scripts/python/cs-trace-disasm.py
@@ -0,0 +1,134 @@
+# perf script event handlers, generated by perf script -g python
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events.  They don't necessarily correspond to the 'common_*' fields
+# in the format files.  Those fields not available as handler params can
+# be retrieved using Python functions of the form common_*(context).
+# See the perf-trace-python Documentation for the list of available functions.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+                '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from subprocess import *
+from Core import *
+import re;
+
+from optparse import OptionParser
+
+#
+# Add options to specify vmlinux file and the objdump executable
+#
+parser = OptionParser()
+parser.add_option("-k", "--vmlinux", dest="vmlinux_name",
+                  help="path to vmlinux file")
+parser.add_option("-d", "--objdump", dest="objdump_name",
+                  help="name of objdump executable (in path)")
+(options, args) = parser.parse_args()
+
+if (options.objdump_name == None):
+        sys.exit("No objdump executable specified - use -d or --objdump option")
+
+# initialize global dicts and regular expression
+
+build_ids = dict();
+mmaps = dict();
+disasm_cache = dict();
+disasm_re = re.compile("^\s*([0-9a-fA-F]+):")
+
+cache_size = 16*1024
+
+def trace_begin():
+        cmd_output = check_output(["perf", "buildid-list"]).split('\n');
+        bid_re = re.compile("([a-fA-F0-9]+)[ \t]([^ \n]+)")
+        for line in cmd_output:
+                m = bid_re.search(line)
+                if (m != None) :
+                       if (m.group(2) == "[kernel.kallsyms]") :
+                               append = "/kallsyms"
+                               dirname = "/" + m.group(2)
+                       elif (m.group(2) == "[vdso]") :
+                               append = "/vdso"
+                               dirname = "/" + m.group(2)
+                       else:
+                               append = "/elf"
+                               dirname = m.group(2)
+
+                        build_ids[m.group(2)] =  \
+                        os.environ['PERF_BUILDID_DIR'] +  \
+                       dirname + "/" + m.group(1) + append;
+
+        if ((options.vmlinux_name != None) and ("[kernel.kallsyms]" in build_ids)):
+                build_ids['[kernel.kallsyms]'] = options.vmlinux_name;
+        else:
+                build_ids.pop('[kernel.kallsyms]', None)
+
+        mmap_re = re.compile("PERF_RECORD_MMAP2 -?[0-9]+/[0-9]+: \[(0x[0-9a-fA-F]+).*:\s.*\s(\S*)")
+        cmd_output = check_output("perf script --show-mmap-events | fgrep PERF_RECORD_MMAP2", shell=True).split('\n')
+        for line in cmd_output:
+                m = mmap_re.search(line)
+                if (m != None) :
+                        mmaps[m.group(2)] = int(m.group(1),0)
+
+
+
+def trace_end():
+        pass
+
+def process_event(t):
+        global cache_size
+        global options
+
+        sample = t['sample']
+        dso = t['dso']
+
+        # don't let the cache get too big, but don't bother with a fancy replacement policy
+        # just clear it when it hits max size
+
+        if (len(disasm_cache) > cache_size):
+                disasm_cache.clear();
+
+        cpu = format(sample['cpu'], "d");
+        addr_range = format(sample['ip'],"x") + ":" + format(sample['addr'],"x");
+
+        try:
+                disasm_output = disasm_cache[addr_range];
+        except:
+                try:
+                        fname = build_ids[dso];
+                except KeyError:
+                        if (dso == '[kernel.kallsyms]'):
+                                return;
+                        fname = dso;
+
+                if (dso in mmaps):
+                        offset = mmaps[dso];
+                        disasm = [options.objdump_name,"-d","-z", "--adjust-vma="+format(offset,"#x"),"--start-address="+format(sample['ip'],"#x"),"--stop-address="+format(sample['addr'],"#x"), fname]
+                else:
+                        offset = 0
+                        disasm = [options.objdump_name,"-d","-z", "--start-address="+format(sample['ip'],"#x"),"--stop-address="+format(sample['addr'],"#x"),fname]
+                disasm_output = check_output(disasm).split('\n')
+                disasm_cache[addr_range] = disasm_output;
+
+        print "FILE: %s\tCPU: %s" % (dso, cpu);
+        for line in disasm_output:
+                m = disasm_re.search(line)
+                if (m != None) :
+                        try:
+                                print "\t",line
+                        except:
+                                exit(1);
+                else:
+                        continue;
+
+def trace_unhandled(event_name, context, event_fields_dict):
+               print ' '.join(['%s=%s' % (k, str(v)) for k, v in sorted(event_fields_dict.items())])
+
+def print_header(event_name, cpu, secs, nsecs, pid, comm):
+        print "print_header"
+        print "%-20s %5u %05u.%09u %8u %-20s " % \
+              (event_name, cpu, secs, nsecs, pid, comm),
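For each sampled range, the script above shells out to objdump between the sample's ip and addr fields. Its command construction reduces to roughly the following (the helper name is illustrative and not part of the script):

```python
def objdump_cmd(objdump, ip, addr, fname, vma_offset=0):
    """Assemble the objdump invocation cs-trace-disasm.py uses for one
    ip..addr range; a non-zero mmap offset adds --adjust-vma so the
    disassembly addresses match the runtime mapping."""
    cmd = [objdump, "-d", "-z"]
    if vma_offset:
        cmd.append("--adjust-vma=" + format(vma_offset, "#x"))
    cmd += ["--start-address=" + format(ip, "#x"),
            "--stop-address=" + format(addr, "#x"),
            fname]
    return cmd
```

The resulting output is cached per address range, and the cache is simply cleared when it grows past the size limit, as the comment in process_event() notes.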
diff --git a/tools/perf/scripts/python/cs-trace-ranges.py b/tools/perf/scripts/python/cs-trace-ranges.py
new file mode 100644
index 0000000..c8edacb
--- /dev/null
+++ b/tools/perf/scripts/python/cs-trace-ranges.py
@@ -0,0 +1,44 @@
+#
+# Copyright(C) 2016 Linaro Limited. All rights reserved.
+# Author: Tor Jeremiassen <tor.jeremias...@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License version 2 as published by
+# the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+#
+# You should have received a copy of the GNU General Public License along with
+# this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+                '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+
+def trace_begin():
+        pass;
+
+def trace_end():
+        pass
+
+def process_event(t):
+
+        sample = t['sample']
+
+        print "range:",format(sample['ip'],"x"),"-",format(sample['addr'],"x")
+
+def trace_unhandled(event_name, context, event_fields_dict):
+               print ' '.join(['%s=%s' % (k, str(v)) for k, v in sorted(event_fields_dict.items())])
+
+def print_header(event_name, cpu, secs, nsecs, pid, comm):
+        print "print_header"
+        print "%-20s %5u %05u.%09u %8u %-20s " % \
+              (event_name, cpu, secs, nsecs, pid, comm),
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 1dc67ef..2bbb725 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -80,6 +80,8 @@ libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-$(CONFIG_AUXTRACE) += intel-bts.o
+libperf-$(CONFIG_AUXTRACE) += cs-etm.o
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder/
 libperf-y += parse-branch-options.o
 libperf-y += parse-regs-options.o
 libperf-y += term.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index c5a6e0b..4dbd500 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -58,6 +58,7 @@
 
 #include "intel-pt.h"
 #include "intel-bts.h"
+#include "cs-etm.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
                        struct auxtrace_mmap_params *mp,
@@ -902,6 +903,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
        case PERF_AUXTRACE_INTEL_BTS:
                return intel_bts_process_auxtrace_info(event, session);
        case PERF_AUXTRACE_CS_ETM:
+               return cs_etm__process_auxtrace_info(event, session);
        case PERF_AUXTRACE_UNKNOWN:
        default:
                return -EINVAL;
diff --git a/tools/perf/util/cs-etm-decoder/Build b/tools/perf/util/cs-etm-decoder/Build
new file mode 100644
index 0000000..e097599
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/Build
@@ -0,0 +1,6 @@
+ifeq ($(CSTRACE_PATH),)
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder-stub.o
+else
+CFLAGS_cs-etm-decoder.o += -I$(CSTRACE_PATH)/include
+libperf-$(CONFIG_AUXTRACE) += cs-etm-decoder.o
+endif
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
new file mode 100644
index 0000000..d2ebbd2
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder-stub.c
@@ -0,0 +1,99 @@
+/*
+ *
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremias...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdlib.h>
+
+#include "cs-etm-decoder.h"
+#include "../util.h"
+
+
+struct cs_etm_decoder {
+       void *state;
+       int dummy;
+};
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *decoder)
+{
+       (void) decoder;
+       return -1;
+}
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *decoder,
+                               uint64_t offset,
+                               uint64_t address,
+                               uint64_t len,
+                               const char *fname)
+{
+       (void) decoder;
+       (void) offset;
+       (void) address;
+       (void) len;
+       (void) fname;
+       return -1;
+}
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(
+                               struct cs_etm_decoder *decoder,
+                               uint64_t indx,
+                               const uint8_t *buf,
+                               size_t len,
+                               size_t *consumed)
+{
+       (void) decoder;
+       (void) indx;
+       (void) buf;
+       (void) len;
+       (void) consumed;
+       return NULL;
+}
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
+                               uint64_t address,
+                               uint64_t len,
+                               cs_etm_mem_cb_type cb_func)
+{
+       (void) decoder;
+       (void) address;
+       (void) len;
+       (void) cb_func;
+       return -1;
+}
+
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+                               struct cs_etm_packet *packet)
+{
+       (void) decoder;
+       (void) packet;
+       return -1;
+}
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu,
+                               struct cs_etm_decoder_params *d_params,
+                               struct cs_etm_trace_params t_params[])
+{
+       (void) num_cpu;
+       (void) d_params;
+       (void) t_params;
+       return NULL;
+}
+
+void cs_etm_decoder__free(struct cs_etm_decoder *decoder)
+{
+       (void) decoder;
+       return;
+}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
new file mode 100644
index 0000000..ee2e02f
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -0,0 +1,527 @@
+/*
+ *
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremias...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/err.h>
+#include <stdlib.h>
+
+#include "../cs-etm.h"
+#include "cs-etm-decoder.h"
+#include "../util.h"
+#include "../util/intlist.h"
+
+#include "c_api/opencsd_c_api.h"
+#include "ocsd_if_types.h"
+#include "etmv4/trc_pkt_types_etmv4.h"
+
+#define MAX_BUFFER 1024
+
+struct cs_etm_decoder {
+       struct cs_etm_state     state;
+       dcd_tree_handle_t       dcd_tree;
+       void (*packet_printer)(const char *);
+       cs_etm_mem_cb_type      mem_access;
+       ocsd_datapath_resp_t    prev_return;
+       size_t                  prev_processed;
+       bool                    trace_on;
+       bool                    discontinuity;
+       struct cs_etm_packet    packet_buffer[MAX_BUFFER];
+       uint32_t                packet_count;
+       uint32_t                head;
+       uint32_t                tail;
+       uint32_t                end_tail;
+};
+
+static uint32_t cs_etm_decoder__mem_access(const void *context,
+                                          const ocsd_vaddr_t address,
+                                          const ocsd_mem_space_acc_t mem_space,
+                                          const uint32_t req_size,
+                                          uint8_t *buffer)
+{
+       struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+       (void) mem_space;
+
+       return decoder->mem_access(decoder->state.data, address, req_size, buffer);
+}
+
+static int cs_etm_decoder__gen_etmv4_config(struct cs_etm_trace_params *params,
+                                           ocsd_etmv4_cfg *config)
+{
+       config->reg_configr = params->reg_configr;
+       config->reg_traceidr = params->reg_traceidr;
+       config->reg_idr0 = params->reg_idr0;
+       config->reg_idr1 = params->reg_idr1;
+       config->reg_idr2 = params->reg_idr2;
+       config->reg_idr8 = params->reg_idr8;
+
+       config->reg_idr9 = 0;
+       config->reg_idr10 = 0;
+       config->reg_idr11 = 0;
+       config->reg_idr12 = 0;
+       config->reg_idr13 = 0;
+       config->arch_ver = ARCH_V8;
+       config->core_prof = profile_CortexA;
+
+       return 0;
+}
+
+static int cs_etm_decoder__flush_packet(struct cs_etm_decoder *decoder)
+{
+       int err = 0;
+
+       if (decoder == NULL)
+               return -1;
+
+       if (decoder->packet_count >= 31)
+               return -1;
+
+       if (decoder->tail != decoder->end_tail) {
+               decoder->tail = (decoder->tail + 1) & (MAX_BUFFER - 1);
+               decoder->packet_count++;
+       }
+
+       return err;
+}
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *decoder)
+{
+       return cs_etm_decoder__flush_packet(decoder);
+}
+
+static int cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
+                                        const ocsd_generic_trace_elem *elem,
+                                        const uint8_t trace_chan_id,
+                                        enum cs_etm_sample_type sample_type)
+{
+       int err = 0;
+       uint32_t et = 0;
+       struct int_node *inode = NULL;
+
+       if (decoder == NULL)
+               return -1;
+
+       if (decoder->packet_count >= 31)
+               return -1;
+
+       err = cs_etm_decoder__flush_packet(decoder);
+
+       if (err)
+               return err;
+
+       et = decoder->end_tail;
+       /* Search the RB tree for the cpu associated with this traceID */
+       inode = intlist__find(traceid_list, trace_chan_id);
+       if (!inode)
+               return -EINVAL;
+
+       decoder->packet_buffer[et].sample_type = sample_type;
+       decoder->packet_buffer[et].start_addr = elem->st_addr;
+       decoder->packet_buffer[et].end_addr   = elem->en_addr;
+       decoder->packet_buffer[et].exc  = false;
+       decoder->packet_buffer[et].exc_ret    = false;
+       decoder->packet_buffer[et].cpu  = *((int *)inode->priv);
+
+       et = (et + 1) & (MAX_BUFFER - 1);
+
+       decoder->end_tail = et;
+
+       return err;
+}
+
+static int cs_etm_decoder__mark_exception(struct cs_etm_decoder *decoder)
+{
+       int err = 0;
+
+       if (decoder == NULL)
+               return -1;
+
+       decoder->packet_buffer[decoder->end_tail].exc = true;
+
+       return err;
+}
+
+static int cs_etm_decoder__mark_exception_return(struct cs_etm_decoder *decoder)
+{
+       int err = 0;
+
+       if (decoder == NULL)
+               return -1;
+
+       decoder->packet_buffer[decoder->end_tail].exc_ret = true;
+
+       return err;
+}
+
+static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
+                       const void *context,
+                       const ocsd_trc_index_t indx,
+                       const uint8_t trace_chan_id,
+                       const ocsd_generic_trace_elem *elem)
+{
+       ocsd_datapath_resp_t resp = OCSD_RESP_CONT;
+       struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+
+       (void) indx;
+       (void) trace_chan_id;
+
+       switch (elem->elem_type) {
+       case OCSD_GEN_TRC_ELEM_UNKNOWN:
+               break;
+       case OCSD_GEN_TRC_ELEM_NO_SYNC:
+               decoder->trace_on = false;
+               break;
+       case OCSD_GEN_TRC_ELEM_TRACE_ON:
+               decoder->trace_on = true;
+               break;
+       case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
+               cs_etm_decoder__buffer_packet(decoder, elem,
+                                            trace_chan_id, CS_ETM_RANGE);
+               resp = OCSD_RESP_WAIT;
+               break;
+       case OCSD_GEN_TRC_ELEM_EXCEPTION:
+               cs_etm_decoder__mark_exception(decoder);
+               break;
+       case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
+               cs_etm_decoder__mark_exception_return(decoder);
+               break;
+       case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
+       case OCSD_GEN_TRC_ELEM_EO_TRACE:
+       case OCSD_GEN_TRC_ELEM_ADDR_NACC:
+       case OCSD_GEN_TRC_ELEM_TIMESTAMP:
+       case OCSD_GEN_TRC_ELEM_CYCLE_COUNT:
+       case OCSD_GEN_TRC_ELEM_ADDR_UNKNOWN:
+       case OCSD_GEN_TRC_ELEM_EVENT:
+       case OCSD_GEN_TRC_ELEM_SWTRACE:
+       case OCSD_GEN_TRC_ELEM_CUSTOM:
+       default:
+               break;
+       }
+
+       decoder->state.err = 0;
+
+       return resp;
+}
+
+static ocsd_datapath_resp_t cs_etm_decoder__etmv4i_packet_printer(
+       const void *context,
+       const ocsd_datapath_op_t op,
+       const ocsd_trc_index_t indx,
+       const ocsd_etmv4_i_pkt *pkt)
+{
+       const size_t PACKET_STR_LEN = 1024;
+       ocsd_datapath_resp_t ret = OCSD_RESP_CONT;
+       char packet_str[PACKET_STR_LEN];
+       size_t offset;
+       struct cs_etm_decoder *decoder = (struct cs_etm_decoder *) context;
+
+       sprintf(packet_str, "%ld: ", (long int) indx);
+       offset = strlen(packet_str);
+
+       switch (op) {
+       case OCSD_OP_DATA:
+               if (ocsd_pkt_str(OCSD_PROTOCOL_ETMV4I,
+                                 (void *)pkt,
+                                 packet_str+offset,
+                                 PACKET_STR_LEN-offset) != OCSD_OK)
+                       ret = OCSD_RESP_FATAL_INVALID_PARAM;
+               break;
+       case OCSD_OP_EOT:
+               sprintf(packet_str, "**** END OF TRACE ****\n");
+               break;
+       case OCSD_OP_FLUSH:
+       case OCSD_OP_RESET:
+       default:
+               break;
+       }
+
+       decoder->packet_printer(packet_str);
+
+       return ret;
+}
+
+static int cs_etm_decoder__create_etmv4i_packet_printer(struct cs_etm_decoder_params *d_params,
+                                                       struct cs_etm_trace_params *t_params,
+                                                       struct cs_etm_decoder *decoder)
+{
+       ocsd_etmv4_cfg trace_config;
+       int ret = 0;
+       unsigned char CSID; /* CSID extracted from the config data */
+
+       if (d_params->packet_printer == NULL)
+               return -1;
+
+       ret = cs_etm_decoder__gen_etmv4_config(t_params, &trace_config);
+
+       if (ret != 0)
+               return -1;
+
+       decoder->packet_printer = d_params->packet_printer;
+
+       ret = ocsd_dt_create_decoder(decoder->dcd_tree,
+                                    OCSD_BUILTIN_DCD_ETMV4I,
+                                    OCSD_CREATE_FLG_PACKET_PROC,
+                                    (void *)&trace_config,
+                                    &CSID);
+
+       if (ret != 0)
+               return -1;
+
+       ret = ocsd_dt_attach_packet_callback(decoder->dcd_tree,
+                                         CSID,
+                                         OCSD_C_API_CB_PKT_SINK,
+                                         cs_etm_decoder__etmv4i_packet_printer,
+                                         decoder);
+       return ret;
+}
+
+static int cs_etm_decoder__create_etmv4i_packet_decoder(struct cs_etm_decoder_params *d_params,
+                                                       struct cs_etm_trace_params *t_params,
+                                                       struct cs_etm_decoder *decoder)
+{
+       ocsd_etmv4_cfg trace_config;
+       int ret = 0;
+       unsigned char CSID; /* CSID extracted from the config data */
+
+       decoder->packet_printer = d_params->packet_printer;
+
+       ret = cs_etm_decoder__gen_etmv4_config(t_params, &trace_config);
+
+       if (ret != 0)
+               return -1;
+
+       ret = ocsd_dt_create_decoder(decoder->dcd_tree,
+                                    OCSD_BUILTIN_DCD_ETMV4I,
+                                    OCSD_CREATE_FLG_FULL_DECODER,
+                                    (void *)&trace_config,
+                                    &CSID);
+
+       if (ret != 0)
+               return -1;
+
+       ret = ocsd_dt_set_gen_elem_outfn(decoder->dcd_tree,
+                                       cs_etm_decoder__gen_trace_elem_printer,
+                                       decoder);
+       return ret;
+}
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *decoder,
+                                     uint64_t address,
+                                     uint64_t len,
+                                     cs_etm_mem_cb_type cb_func)
+{
+       int err;
+
+       decoder->mem_access = cb_func;
+       err = ocsd_dt_add_callback_mem_acc(decoder->dcd_tree,
+                                          address,
+                                          address+len-1,
+                                          OCSD_MEM_SPACE_ANY,
+                                          cs_etm_decoder__mem_access,
+                                          decoder);
+       return err;
+}
+
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *decoder,
+                                uint64_t offset,
+                                uint64_t address,
+                                uint64_t len,
+                                const char *fname)
+{
+       int err = 0;
+       ocsd_file_mem_region_t region;
+
+       (void) len;
+       if (NULL == decoder)
+               return -1;
+
+       if (NULL == decoder->dcd_tree)
+               return -1;
+
+       region.file_offset = offset;
+       region.start_address = address;
+       region.region_size = len;
+       err = ocsd_dt_add_binfile_region_mem_acc(decoder->dcd_tree,
+                                          &region,
+                                          1,
+                                          OCSD_MEM_SPACE_ANY,
+                                          fname);
+
+       return err;
+}
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(struct cs_etm_decoder *decoder,
+                                       uint64_t indx,
+                                       const uint8_t *buf,
+                                       size_t len,
+                                       size_t *consumed)
+{
+       int ret = 0;
+       ocsd_datapath_resp_t dp_ret = decoder->prev_return;
+       size_t processed = 0;
+
+       if (decoder->packet_count > 0) {
+               decoder->state.err = ret;
+               *consumed = processed;
+               return &(decoder->state);
+       }
+
+       while ((processed < len) && (0 == ret)) {
+
+               if (OCSD_DATA_RESP_IS_CONT(dp_ret)) {
+                       uint32_t count;
+                       dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+                                                       OCSD_OP_DATA,
+                                                       indx+processed,
+                                                       len - processed,
+                                                       &buf[processed],
+                                                       &count);
+                       processed += count;
+
+               } else if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+                       dp_ret = ocsd_dt_process_data(decoder->dcd_tree,
+                                                       OCSD_OP_FLUSH,
+                                                       0,
+                                                       0,
+                                                       NULL,
+                                                       NULL);
+                       break;
+               } else
+                       ret = -1;
+       }
+       if (OCSD_DATA_RESP_IS_WAIT(dp_ret)) {
+               if (OCSD_DATA_RESP_IS_CONT(decoder->prev_return)) {
+                       decoder->prev_processed = processed;
+               }
+               processed = 0;
+       } else if (OCSD_DATA_RESP_IS_WAIT(decoder->prev_return)) {
+               processed = decoder->prev_processed;
+               decoder->prev_processed = 0;
+       }
+       *consumed = processed;
+       decoder->prev_return = dp_ret;
+       decoder->state.err = ret;
+       return &(decoder->state);
+}
+
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *decoder,
+                              struct cs_etm_packet *packet)
+{
+       if (decoder->packet_count == 0)
+               return -1;
+
+       if (packet == NULL)
+               return -1;
+
+       *packet = decoder->packet_buffer[decoder->head];
+
+       decoder->head = (decoder->head + 1) & (MAX_BUFFER - 1);
+
+       decoder->packet_count--;
+
+       return 0;
+}
+
+static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
+{
+       unsigned i;
+
+       decoder->head = 0;
+       decoder->tail = 0;
+       decoder->end_tail = 0;
+       decoder->packet_count = 0;
+       for (i = 0; i < MAX_BUFFER; i++) {
+               decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
+               decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
+               decoder->packet_buffer[i].exc   = false;
+               decoder->packet_buffer[i].exc_ret    = false;
+               decoder->packet_buffer[i].cpu   = INT_MIN;
+       }
+}
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu,
+                                          struct cs_etm_decoder_params *d_params,
+                                          struct cs_etm_trace_params t_params[])
+{
+       struct cs_etm_decoder *decoder;
+       ocsd_dcd_tree_src_t format;
+       uint32_t flags;
+       int ret;
+       size_t i;
+
+       if ((t_params == NULL) || (d_params == NULL))
+               return NULL;
+
+       decoder = zalloc(sizeof(struct cs_etm_decoder));
+
+       if (decoder == NULL)
+               return NULL;
+
+       decoder->state.data = d_params->data;
+       decoder->prev_return = OCSD_RESP_CONT;
+       cs_etm_decoder__clear_buffer(decoder);
+       format = (d_params->formatted ? OCSD_TRC_SRC_FRAME_FORMATTED :
+                                        OCSD_TRC_SRC_SINGLE);
+       flags = 0;
+       flags |= (d_params->fsyncs ? OCSD_DFRMTR_HAS_FSYNCS : 0);
+       flags |= (d_params->hsyncs ? OCSD_DFRMTR_HAS_HSYNCS : 0);
+       flags |= (d_params->frame_aligned ? OCSD_DFRMTR_FRAME_MEM_ALIGN : 0);
+
+       /* Create decode tree for the data source */
+       decoder->dcd_tree = ocsd_create_dcd_tree(format, flags);
+
+       if (decoder->dcd_tree == NULL)
+               goto err_free_decoder;
+
+       for (i = 0; i < num_cpu; ++i) {
+               switch (t_params[i].protocol) {
+               case CS_ETM_PROTO_ETMV4i:
+                       if (d_params->operation == CS_ETM_OPERATION_PRINT)
+                               ret = cs_etm_decoder__create_etmv4i_packet_printer(d_params,
+                                                       &t_params[i], decoder);
+                       else if (d_params->operation == CS_ETM_OPERATION_DECODE)
+                               ret = cs_etm_decoder__create_etmv4i_packet_decoder(d_params,
+                                                       &t_params[i], decoder);
+                       else
+                               ret = -CS_ETM_ERR_PARAM;
+                       if (ret != 0)
+                               goto err_free_decoder_tree;
+                       break;
+               default:
+                       goto err_free_decoder_tree;
+                       break;
+               }
+       }
+
+
+       return decoder;
+
+err_free_decoder_tree:
+       ocsd_destroy_dcd_tree(decoder->dcd_tree);
+err_free_decoder:
+       free(decoder);
+       return NULL;
+}
+
+
+void cs_etm_decoder__free(struct cs_etm_decoder *decoder)
+{
+       if (decoder == NULL)
+               return;
+
+       ocsd_destroy_dcd_tree(decoder->dcd_tree);
+       decoder->dcd_tree = NULL;
+
+       free(decoder);
+}
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
new file mode 100644
index 0000000..7e9db4c
--- /dev/null
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -0,0 +1,117 @@
+/*
+ * Copyright(C) 2015 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremias...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
+ * Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef INCLUDE__CS_ETM_DECODER_H__
+#define INCLUDE__CS_ETM_DECODER_H__
+
+#include <linux/types.h>
+#include <stdio.h>
+
+struct cs_etm_decoder;
+
+struct cs_etm_buffer {
+       const unsigned char *buf;
+       size_t  len;
+       uint64_t offset;
+       //bool    consecutive;
+       uint64_t        ref_timestamp;
+       //uint64_t      trace_nr;
+};
+
+enum cs_etm_sample_type {
+       CS_ETM_RANGE      = 1 << 0,
+};
+
+struct cs_etm_state {
+       int err;
+       void *data;
+       unsigned isa;
+       uint64_t start;
+       uint64_t end;
+       uint64_t timestamp;
+};
+
+struct cs_etm_packet {
+       enum cs_etm_sample_type sample_type;
+       uint64_t start_addr;
+       uint64_t end_addr;
+       bool     exc;
+       bool     exc_ret;
+       int cpu;
+};
+
+
+struct cs_etm_queue;
+typedef uint32_t (*cs_etm_mem_cb_type)(struct cs_etm_queue *, uint64_t,
+                                      size_t, uint8_t *);
+
+struct cs_etm_trace_params {
+       void *etmv4i_packet_handler;
+       uint32_t reg_idr0;
+       uint32_t reg_idr1;
+       uint32_t reg_idr2;
+       uint32_t reg_idr8;
+       uint32_t reg_configr;
+       uint32_t reg_traceidr;
+       int  protocol;
+};
+
+struct cs_etm_decoder_params {
+       int  operation;
+       void (*packet_printer)(const char *);
+       cs_etm_mem_cb_type  mem_acc_cb;
+       bool formatted;
+       bool fsyncs;
+       bool hsyncs;
+       bool frame_aligned;
+       void *data;
+};
+
+enum {
+       CS_ETM_PROTO_ETMV3 = 1,
+       CS_ETM_PROTO_ETMV4i,
+       CS_ETM_PROTO_ETMV4d,
+};
+
+enum {
+       CS_ETM_OPERATION_PRINT = 1,
+       CS_ETM_OPERATION_DECODE,
+};
+
+enum {
+       CS_ETM_ERR_NOMEM = 1,
+       CS_ETM_ERR_NODATA,
+       CS_ETM_ERR_PARAM,
+};
+
+
+struct cs_etm_decoder *cs_etm_decoder__new(uint32_t num_cpu,
+                                          struct cs_etm_decoder_params *,
+                                          struct cs_etm_trace_params []);
+
+int cs_etm_decoder__add_mem_access_cb(struct cs_etm_decoder *, uint64_t,
+                                     uint64_t, cs_etm_mem_cb_type);
+
+int cs_etm_decoder__flush(struct cs_etm_decoder *);
+void cs_etm_decoder__free(struct cs_etm_decoder *);
+int cs_etm_decoder__get_packet(struct cs_etm_decoder *,
+                              struct cs_etm_packet *);
+
+int cs_etm_decoder__add_bin_file(struct cs_etm_decoder *, uint64_t, uint64_t,
+                                uint64_t, const char *);
+
+const struct cs_etm_state *cs_etm_decoder__process_data_block(struct cs_etm_decoder *,
+                                      uint64_t,
+                                      const uint8_t *,
+                                      size_t,
+                                      size_t *);
+
+#endif /* INCLUDE__CS_ETM_DECODER_H__ */
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
new file mode 100644
index 0000000..91d6a8a
--- /dev/null
+++ b/tools/perf/util/cs-etm.c
@@ -0,0 +1,1501 @@
+/*
+ * Copyright(C) 2016 Linaro Limited. All rights reserved.
+ * Author: Tor Jeremiassen <tor.jeremias...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "perf.h"
+#include "thread_map.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "callchain.h"
+#include "auxtrace.h"
+#include "evlist.h"
+#include "machine.h"
+#include "util.h"
+#include "util/intlist.h"
+#include "color.h"
+#include "cs-etm.h"
+#include "cs-etm-decoder/cs-etm-decoder.h"
+#include "debug.h"
+
+#include <stdlib.h>
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define MAX_TIMESTAMP (~0ULL)
+
+struct cs_etm_auxtrace {
+       struct auxtrace         auxtrace;
+       struct auxtrace_queues  queues;
+       struct auxtrace_heap    heap;
+       u64                     **metadata;
+       u32                     auxtrace_type;
+       struct perf_session     *session;
+       struct machine          *machine;
+       struct perf_evsel       *switch_evsel;
+       struct thread           *unknown_thread;
+       uint32_t                num_cpu;
+       bool                    timeless_decoding;
+       bool                    sampling_mode;
+       bool                    snapshot_mode;
+       bool                    data_queued;
+       bool                    sync_switch;
+       bool                    synth_needs_swap;
+       int                     have_sched_switch;
+
+       bool                    sample_instructions;
+       u64                     instructions_sample_type;
+       u64                     instructions_sample_period;
+       u64                     instructions_id;
+       struct itrace_synth_opts synth_opts;
+       unsigned                pmu_type;
+};
+
+struct cs_etm_queue {
+       struct cs_etm_auxtrace  *etm;
+       unsigned                queue_nr;
+       struct auxtrace_buffer  *buffer;
+       const struct            cs_etm_state *state;
+       struct ip_callchain     *chain;
+       union perf_event        *event_buf;
+       bool                    on_heap;
+       bool                    step_through_buffers;
+       bool                    use_buffer_pid_tid;
+       pid_t                   pid, tid;
+       int                     cpu;
+       struct thread           *thread;
+       u64                     time;
+       u64                     timestamp;
+       bool                    stop;
+       struct cs_etm_decoder   *decoder;
+       u64                     offset;
+       bool                    eot;
+       bool                    kernel_mapped;
+};
+
+static int cs_etm__get_trace(struct cs_etm_buffer *buff,
+                            struct cs_etm_queue *etmq);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *);
+static int cs_etm__process_queues(struct cs_etm_auxtrace *, u64);
+static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *, pid_t,
+                                          u64);
+static uint32_t cs_etm__mem_access(struct cs_etm_queue *, uint64_t, size_t,
+                                  uint8_t *);
+
+static void cs_etm__packet_dump(const char *pkt_string)
+{
+       const char *color = PERF_COLOR_BLUE;
+
+       color_fprintf(stdout, color, "  %s\n", pkt_string);
+       fflush(stdout);
+}
+
+static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
+                             struct auxtrace_buffer *buffer)
+{
+       const char *color = PERF_COLOR_BLUE;
+       struct cs_etm_decoder_params d_params;
+       struct cs_etm_trace_params *t_params;
+       struct cs_etm_decoder *decoder;
+       size_t buffer_used = 0;
+       size_t i;
+
+       fprintf(stdout, "\n");
+       color_fprintf(stdout, color,
+                    ". ... CoreSight ETM Trace data: size %zu bytes\n",
+                    buffer->size);
+
+       t_params = zalloc(sizeof(struct cs_etm_trace_params) * etm->num_cpu);
+       for (i = 0; i < etm->num_cpu; ++i) {
+               t_params[i].protocol = CS_ETM_PROTO_ETMV4i;
+               t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0];
+               t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1];
+               t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2];
+               t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8];
+               t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR];
+               t_params[i].reg_traceidr = etm->metadata[i][CS_ETMV4_TRCTRACEIDR];
+       }
+       d_params.packet_printer = cs_etm__packet_dump;
+       d_params.operation = CS_ETM_OPERATION_PRINT;
+       d_params.formatted = true;
+       d_params.fsyncs = false;
+       d_params.hsyncs = false;
+       d_params.frame_aligned = true;
+
+       decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+
+       zfree(&t_params);
+
+       if (decoder == NULL)
+               return;
+       do {
+           size_t consumed;
+           cs_etm_decoder__process_data_block(decoder, buffer->offset,
+                                              &(((uint8_t *)buffer->data)[buffer_used]),
+                                              buffer->size - buffer_used,
+                                              &consumed);
+           buffer_used += consumed;
+       } while (buffer_used < buffer->size);
+       cs_etm_decoder__free(decoder);
+}
+
+static int cs_etm__flush_events(struct perf_session *session,
+                               struct perf_tool *tool)
+{
+       struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+                                                  struct cs_etm_auxtrace,
+                                                  auxtrace);
+
+       int ret;
+
+       if (dump_trace)
+               return 0;
+
+       if (!tool->ordered_events)
+               return -EINVAL;
+
+       ret = cs_etm__update_queues(etm);
+
+       if (ret < 0)
+               return ret;
+
+       if (etm->timeless_decoding)
+               return cs_etm__process_timeless_queues(etm, -1,
+                                                      MAX_TIMESTAMP - 1);
+
+       return cs_etm__process_queues(etm, MAX_TIMESTAMP);
+}
+
+static void  cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
+                                   struct auxtrace_queue *queue)
+{
+       struct cs_etm_queue *etmq = queue->priv;
+
+       if ((queue->tid == -1) || (etm->have_sched_switch)) {
+               etmq->tid = machine__get_current_tid(etm->machine, etmq->cpu);
+               thread__zput(etmq->thread);
+       }
+
+       if ((!etmq->thread) && (etmq->tid != -1))
+               etmq->thread = machine__find_thread(etm->machine, -1, etmq->tid);
+
+       if (etmq->thread) {
+               etmq->pid = etmq->thread->pid_;
+               if (queue->cpu == -1)
+                       etmq->cpu = etmq->thread->cpu;
+       }
+}
+
+static void cs_etm__free_queue(void *priv)
+{
+       struct cs_etm_queue *etmq = priv;
+
+       if (!etmq)
+               return;
+
+       thread__zput(etmq->thread);
+       cs_etm_decoder__free(etmq->decoder);
+       zfree(&etmq->event_buf);
+       zfree(&etmq->chain);
+       free(etmq);
+}
+
+static void cs_etm__free_events(struct perf_session *session)
+{
+       struct cs_etm_auxtrace *aux = container_of(session->auxtrace,
+                                                  struct cs_etm_auxtrace,
+                                                  auxtrace);
+
+       struct auxtrace_queues *queues = &(aux->queues);
+
+       unsigned i;
+
+       for (i = 0; i < queues->nr_queues; ++i) {
+               cs_etm__free_queue(queues->queue_array[i].priv);
+               queues->queue_array[i].priv = 0;
+       }
+
+       auxtrace_queues__free(queues);
+
+}
+
+static void cs_etm__free(struct perf_session *session)
+{
+
+       size_t i;
+       struct int_node *inode, *tmp;
+       struct cs_etm_auxtrace *aux = container_of(session->auxtrace,
+                                                  struct cs_etm_auxtrace,
+                                                  auxtrace);
+       auxtrace_heap__free(&aux->heap);
+       cs_etm__free_events(session);
+       session->auxtrace = NULL;
+
+       /* First remove all traceID/CPU# nodes from the RB tree */
+       intlist__for_each_entry_safe(inode, tmp, traceid_list)
+               intlist__remove(traceid_list, inode);
+       /* Then the RB tree itself */
+       intlist__delete(traceid_list);
+
+       //thread__delete(aux->unknown_thread);
+       for (i = 0; i < aux->num_cpu; ++i)
+               zfree(&aux->metadata[i]);
+       zfree(&aux->metadata);
+       free(aux);
+}
+
+static void cs_etm__use_buffer_pid_tid(struct cs_etm_queue *etmq,
+                                     struct auxtrace_queue *queue,
+                                     struct auxtrace_buffer *buffer)
+{
+       if ((queue->cpu == -1) && (buffer->cpu != -1))
+               etmq->cpu = buffer->cpu;
+
+       etmq->pid = buffer->pid;
+       etmq->tid = buffer->tid;
+
+       thread__zput(etmq->thread);
+
+       if (etmq->tid != -1) {
+               if (etmq->pid != -1) {
+                       etmq->thread = machine__findnew_thread(etmq->etm->machine,
+                                                              etmq->pid,
+                                                              etmq->tid);
+               } else {
+                       etmq->thread = machine__findnew_thread(etmq->etm->machine,
+                                                              -1,
+                                                              etmq->tid);
+               }
+       }
+}
+
+
+static int cs_etm__get_trace(struct cs_etm_buffer *buff,
+                            struct cs_etm_queue *etmq)
+{
+       struct auxtrace_buffer *aux_buffer = etmq->buffer;
+       struct auxtrace_buffer *old_buffer = aux_buffer;
+       struct auxtrace_queue *queue;
+
+       if (etmq->stop) {
+               buff->len = 0;
+               return 0;
+       }
+
+       queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
+
+       aux_buffer = auxtrace_buffer__next(queue, aux_buffer);
+
+       if (!aux_buffer) {
+               if (old_buffer)
+                       auxtrace_buffer__drop_data(old_buffer);
+               buff->len = 0;
+               return 0;
+       }
+
+       etmq->buffer = aux_buffer;
+
+       if (!aux_buffer->data) {
+               int fd = perf_data_file__fd(etmq->etm->session->file);
+
+               aux_buffer->data = auxtrace_buffer__get_data(aux_buffer, fd);
+               if (!aux_buffer->data)
+                       return -ENOMEM;
+       }
+
+       if (old_buffer)
+               auxtrace_buffer__drop_data(old_buffer);
+
+       if (aux_buffer->use_data) {
+               buff->offset = aux_buffer->offset;
+               buff->len = aux_buffer->use_size;
+               buff->buf = aux_buffer->use_data;
+       } else {
+               buff->offset = aux_buffer->offset;
+               buff->len = aux_buffer->size;
+               buff->buf = aux_buffer->data;
+       }
+       /*
+        * buff->offset = 0;
+        * buff->len = sizeof(cstrace);
+        * buff->buf = cstrace;
+       */
+
+       buff->ref_timestamp = aux_buffer->reference;
+
+       if (etmq->use_buffer_pid_tid &&
+           ((etmq->pid != aux_buffer->pid) ||
+            (etmq->tid != aux_buffer->tid))) {
+               cs_etm__use_buffer_pid_tid(etmq, queue, aux_buffer);
+       }
+
+       if (etmq->step_through_buffers)
+               etmq->stop = true;
+
+       return buff->len;
+}
+
+static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
+                                              unsigned int queue_nr)
+{
+       struct cs_etm_decoder_params d_params;
+       struct cs_etm_trace_params   *t_params;
+       struct cs_etm_queue *etmq;
+       size_t i;
+
+       etmq = zalloc(sizeof(struct cs_etm_queue));
+       if (!etmq)
+               return NULL;
+
+       if (etm->synth_opts.callchain) {
+               size_t sz = sizeof(struct ip_callchain);
+
+               sz += etm->synth_opts.callchain_sz * sizeof(u64);
+               etmq->chain = zalloc(sz);
+               if (!etmq->chain)
+                       goto out_free;
+       } else {
+               etmq->chain = NULL;
+       }
+
+       etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+       if (!etmq->event_buf)
+               goto out_free;
+
+       etmq->etm = etm;
+       etmq->queue_nr = queue_nr;
+       etmq->pid = -1;
+       etmq->tid = -1;
+       etmq->cpu = -1;
+       etmq->stop = false;
+       etmq->kernel_mapped = false;
+
+       t_params = zalloc(sizeof(struct cs_etm_trace_params)*etm->num_cpu);
+
+       for (i = 0; i < etm->num_cpu; ++i) {
+               t_params[i].reg_idr0 = etm->metadata[i][CS_ETMV4_TRCIDR0];
+               t_params[i].reg_idr1 = etm->metadata[i][CS_ETMV4_TRCIDR1];
+               t_params[i].reg_idr2 = etm->metadata[i][CS_ETMV4_TRCIDR2];
+               t_params[i].reg_idr8 = etm->metadata[i][CS_ETMV4_TRCIDR8];
+               t_params[i].reg_configr = etm->metadata[i][CS_ETMV4_TRCCONFIGR];
+               t_params[i].reg_traceidr = etm->metadata[i][CS_ETMV4_TRCTRACEIDR];
+               t_params[i].protocol = CS_ETM_PROTO_ETMV4i;
+       }
+       d_params.packet_printer = cs_etm__packet_dump;
+       d_params.operation = CS_ETM_OPERATION_DECODE;
+       d_params.formatted = true;
+       d_params.fsyncs = false;
+       d_params.hsyncs = false;
+       d_params.frame_aligned = true;
+       d_params.data = etmq;
+
+       etmq->decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+
+       zfree(&t_params);
+
+       if (!etmq->decoder)
+               goto out_free;
+
+       etmq->offset = 0;
+       etmq->eot = false;
+
+       return etmq;
+
+out_free:
+       zfree(&etmq->event_buf);
+       zfree(&etmq->chain);
+       free(etmq);
+       return NULL;
+}
+
+static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
+                             struct auxtrace_queue *queue,
+                             unsigned int queue_nr)
+{
+       struct cs_etm_queue *etmq = queue->priv;
+
+       if (list_empty(&(queue->head)))
+               return 0;
+
+       if (etmq == NULL) {
+               etmq = cs_etm__alloc_queue(etm, queue_nr);
+
+               if (etmq == NULL)
+                       return -ENOMEM;
+
+               queue->priv = etmq;
+
+               if (queue->cpu != -1)
+                       etmq->cpu = queue->cpu;
+
+               etmq->tid = queue->tid;
+
+               if (etm->sampling_mode) {
+                       if (etm->timeless_decoding)
+                               etmq->step_through_buffers = true;
+                       if (etm->timeless_decoding || !etm->have_sched_switch)
+                               etmq->use_buffer_pid_tid = true;
+               }
+       }
+
+       if (!etmq->on_heap &&
+           (!etm->sync_switch)) {
+               const struct cs_etm_state *state;
+               int ret = 0;
+
+               if (etm->timeless_decoding)
+                       return ret;
+
+               //cs_etm__log("queue %u getting timestamp\n", queue_nr);
+               //cs_etm__log("queue %u decoding cpu %d pid %d tid %d\n",
+                          //queue_nr, etmq->cpu, etmq->pid, etmq->tid);
+               (void) state;
+               return ret;
+               /*
+               while (1) {
+                       state = cs_etm_decoder__decode(etmq->decoder);
+                       if (state->err) {
+                               if (state->err == CS_ETM_ERR_NODATA) {
+                                       //cs_etm__log("queue %u has no timestamp\n",
+                                                  //queue_nr);
+                                       return 0;
+                               }
+                               continue;
+                       }
+                       if (state->timestamp)
+                               break;
+               }
+
+               etmq->timestamp = state->timestamp;
+               //cs_etm__log("queue %u timestamp 0x%"PRIx64 "\n",
+                          //queue_nr, etmq->timestamp);
+               etmq->state = state;
+               etmq->have_sample = true;
+               //cs_etm__sample_flags(etmq);
+               ret = auxtrace_heap__add(&etm->heap, queue_nr, etmq->timestamp);
+               if (ret)
+                       return ret;
+               etmq->on_heap = true;
+               */
+       }
+
+       return 0;
+}
+
+static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
+{
+       unsigned int i;
+       int ret;
+
+       for (i = 0; i < etm->queues.nr_queues; i++) {
+               ret = cs_etm__setup_queue(etm, &(etm->queues.queue_array[i]), i);
+               if (ret)
+                       return ret;
+       }
+       return 0;
+}
+
+#if 0
+struct cs_etm_cache_entry {
+       struct auxtrace_cache_entry     entry;
+       uint64_t                        icount;
+       uint64_t                        bcount;
+};
+
+static size_t cs_etm__cache_divisor(void)
+{
+       static size_t d = 64;
+
+       return d;
+}
+
+static size_t cs_etm__cache_size(struct dso *dso,
+                               struct machine *machine)
+{
+       off_t size;
+
+       size = dso__data_size(dso, machine);
+       size /= cs_etm__cache_divisor();
+
+       if (size < 1000)
+               return 10;
+
+       if (size > (1 << 21))
+               return 21;
+
+       return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *cs_etm__cache(struct dso *dso,
+                                          struct machine *machine)
+{
+       struct auxtrace_cache *c;
+       size_t bits;
+
+       if (dso->auxtrace_cache)
+               return dso->auxtrace_cache;
+
+       bits = cs_etm__cache_size(dso, machine);
+
+       c = auxtrace_cache__new(bits, sizeof(struct cs_etm_cache_entry), 200);
+
+       dso->auxtrace_cache = c;
+
+       return c;
+}
+
+static int cs_etm__cache_add(struct dso *dso, struct machine *machine,
+                           uint64_t offset, uint64_t icount, uint64_t bcount)
+{
+       struct auxtrace_cache *c = cs_etm__cache(dso, machine);
+       struct cs_etm_cache_entry *e;
+       int err;
+
+       if (!c)
+               return -ENOMEM;
+
+       e = auxtrace_cache__alloc_entry(c);
+       if (!e)
+               return -ENOMEM;
+
+       e->icount = icount;
+       e->bcount = bcount;
+
+       err = auxtrace_cache__add(c, offset, &e->entry);
+
+       if (err)
+               auxtrace_cache__free_entry(c, e);
+
+       return err;
+}
+
+static struct cs_etm_cache_entry *cs_etm__cache_lookup(struct dso *dso,
+                                                     struct machine *machine,
+                                                     uint64_t offset)
+{
+       struct auxtrace_cache *c = cs_etm__cache(dso, machine);
+
+       if (!c)
+               return NULL;
+
+       return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+#endif
+
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+                                           struct cs_etm_packet *packet)
+{
+       int ret = 0;
+       struct cs_etm_auxtrace *etm = etmq->etm;
+       union perf_event *event = etmq->event_buf;
+       struct perf_sample sample = {.ip = 0,};
+       uint64_t start_addr = packet->start_addr;
+       uint64_t end_addr = packet->end_addr;
+
+       event->sample.header.type = PERF_RECORD_SAMPLE;
+       event->sample.header.misc = PERF_RECORD_MISC_USER;
+       event->sample.header.size = sizeof(struct perf_event_header);
+
+       sample.ip = start_addr;
+       sample.pid = etmq->pid;
+       sample.tid = etmq->tid;
+       sample.addr = end_addr;
+       sample.id = etmq->etm->instructions_id;
+       sample.stream_id = etmq->etm->instructions_id;
+       sample.period = (end_addr - start_addr) >> 2;
+       sample.cpu = packet->cpu;
+       sample.flags = 0; /* etmq->flags */
+       sample.insn_len = 1; /* etmq->insn_len */
+       sample.cpumode = event->header.misc;
+
+       /* etmq->last_insn_cnt = etmq->state->tot_insn_cnt; */
+
+#if 0
+       {
+               struct   addr_location al;
+               uint64_t offset;
+               struct   thread *thread;
+               struct   machine *machine = etmq->etm->machine;
+               uint8_t  cpumode;
+               struct   cs_etm_cache_entry *e;
+               uint8_t  buf[256];
+               size_t   bufsz;
+
+               thread = etmq->thread;
+
+               if (!thread)
+                       thread = etmq->etm->unknown_thread;
+
+               if (start_addr > 0xffffffc000000000UL)
+                       cpumode = PERF_RECORD_MISC_KERNEL;
+               else
+                       cpumode = PERF_RECORD_MISC_USER;
+
+               thread__find_addr_map(thread, cpumode, MAP__FUNCTION,
+                                     start_addr, &al);
+               if (!al.map || !al.map->dso)
+                       goto endTest;
+               if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+                   dso__data_status_seen(al.map->dso,
+                                         DSO_DATA_STATUS_SEEN_ITRACE))
+                       goto endTest;
+
+               offset = al.map->map_ip(al.map, start_addr);
+
+               e = cs_etm__cache_lookup(al.map->dso, machine, offset);
+
+               if (e)
+                       (void) e;
+               else {
+                       int len;
+                       map__load(al.map, machine->symbol_filter);
+
+                       bufsz = sizeof(buf);
+                       len = dso__data_read_offset(al.map->dso, machine,
+                                                   offset, buf, bufsz);
+
+                       if (len <= 0)
+                               goto endTest;
+
+                       cs_etm__cache_add(al.map->dso, machine, offset,
+                                         (end_addr - start_addr) >> 2,
+                                         end_addr - start_addr);
+
+               }
+endTest:
+               (void) offset;
+       }
+#endif
+
+       ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+       if (ret)
+               pr_err("CS ETM Trace: failed to deliver instruction event, error %d\n",
+                      ret);
+
+       return ret;
+}
+
+struct cs_etm_synth {
+       struct perf_tool dummy_tool;
+       struct perf_session *session;
+};
+
+static int cs_etm__event_synth(struct perf_tool *tool,
+                             union perf_event *event,
+                             struct perf_sample *sample,
+                             struct machine *machine)
+{
+       struct cs_etm_synth *cs_etm_synth =
+                     container_of(tool, struct cs_etm_synth, dummy_tool);
+
+       (void) sample;
+       (void) machine;
+
+       return perf_session__deliver_synth_event(cs_etm_synth->session, event,
+                                                NULL);
+
+}
+
+static int cs_etm__synth_event(struct perf_session *session,
+                             struct perf_event_attr *attr, u64 id)
+{
+       struct cs_etm_synth cs_etm_synth;
+
+       memset(&cs_etm_synth, 0, sizeof(struct cs_etm_synth));
+       cs_etm_synth.session = session;
+
+       return perf_event__synthesize_attr(&cs_etm_synth.dummy_tool, attr, 1,
+                                          &id, cs_etm__event_synth);
+}
+
+static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
+                               struct perf_session *session)
+{
+       struct perf_evlist *evlist = session->evlist;
+       struct perf_evsel *evsel;
+       struct perf_event_attr attr;
+       bool found = false;
+       u64 id;
+       int err;
+
+       evlist__for_each_entry(evlist, evsel) {
+               if (evsel->attr.type == etm->pmu_type) {
+                       found = true;
+                       break;
+               }
+       }
+
+       if (!found) {
+               pr_debug("There are no selected events with CoreSight Trace data\n");
+               return 0;
+       }
+
+       memset(&attr, 0, sizeof(struct perf_event_attr));
+       attr.size = sizeof(struct perf_event_attr);
+       attr.type = PERF_TYPE_HARDWARE;
+       attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+       attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+                           PERF_SAMPLE_PERIOD;
+       if (etm->timeless_decoding)
+               attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+       else
+               attr.sample_type |= PERF_SAMPLE_TIME;
+
+       attr.exclude_user = evsel->attr.exclude_user;
+       attr.exclude_kernel = evsel->attr.exclude_kernel;
+       attr.exclude_hv = evsel->attr.exclude_hv;
+       attr.exclude_host = evsel->attr.exclude_host;
+       attr.exclude_guest = evsel->attr.exclude_guest;
+       attr.sample_id_all = evsel->attr.sample_id_all;
+       attr.read_format = evsel->attr.read_format;
+
+       id = evsel->id[0] + 1000000000;
+
+       if (!id)
+               id = 1;
+
+       if (etm->synth_opts.instructions) {
+               attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+               attr.sample_period = etm->synth_opts.period;
+               etm->instructions_sample_period = attr.sample_period;
+               err = cs_etm__synth_event(session, &attr, id);
+
+               if (err) {
+                       pr_err("%s: failed to synthesize 'instructions' event type\n",
+                              __func__);
+                       return err;
+               }
+               etm->sample_instructions = true;
+               etm->instructions_sample_type = attr.sample_type;
+               etm->instructions_id = id;
+               id += 1;
+       }
+
+       etm->synth_needs_swap = evsel->needs_swap;
+       return 0;
+}
+
+static int cs_etm__sample(struct cs_etm_queue *etmq)
+{
+       //const struct cs_etm_state *state = etmq->state;
+       struct cs_etm_packet packet;
+       //struct cs_etm_auxtrace *etm = etmq->etm;
+       int err;
+
+       err = cs_etm_decoder__get_packet(etmq->decoder, &packet);
+       /* A return of -1 means no packet was available, which is not a real error */
+
+       if (!err && (packet.sample_type & CS_ETM_RANGE)) {
+               err = cs_etm__synth_instruction_sample(etmq, &packet);
+               if (err)
+                       return err;
+       }
+       return 0;
+}
+
+static int cs_etm__run_decoder(struct cs_etm_queue *etmq, u64 *timestamp)
+{
+       struct cs_etm_buffer buffer;
+       size_t buffer_used;
+       int err = 0;
+
+       /* Go through each buffer in the queue and decode them one by one */
+more:
+       buffer_used = 0;
+       memset(&buffer, 0, sizeof(buffer));
+       err = cs_etm__get_trace(&buffer, etmq);
+       if (err <= 0)
+               return err;
+
+       do {
+               size_t processed = 0;
+               etmq->state = cs_etm_decoder__process_data_block(etmq->decoder,
+                                       etmq->offset,
+                                       &buffer.buf[buffer_used],
+                                       buffer.len-buffer_used,
+                                       &processed);
+               err = etmq->state->err;
+               etmq->offset += processed;
+               buffer_used += processed;
+               if (!err)
+                       cs_etm__sample(etmq);
+       } while (!etmq->eot && (buffer.len > buffer_used));
+       goto more;
+
+       (void) timestamp;
+
+       return err;
+}
+
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+{
+       if (etm->queues.new_data) {
+               etm->queues.new_data = false;
+               return cs_etm__setup_queues(etm);
+       }
+
+       return 0;
+}
+
+static int cs_etm__process_queues(struct cs_etm_auxtrace *etm, u64 timestamp)
+{
+       unsigned int queue_nr;
+       u64 ts;
+       int ret;
+
+       while (1) {
+               struct auxtrace_queue *queue;
+               struct cs_etm_queue *etmq;
+
+               if (!etm->heap.heap_cnt)
+                       return 0;
+
+               if (etm->heap.heap_array[0].ordinal >= timestamp)
+                       return 0;
+
+               queue_nr = etm->heap.heap_array[0].queue_nr;
+               queue = &etm->queues.queue_array[queue_nr];
+               etmq = queue->priv;
+
+               //cs_etm__log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+                          //queue_nr, etm->heap.heap_array[0].ordinal,
+                          //timestamp);
+
+               auxtrace_heap__pop(&etm->heap);
+
+               if (etm->heap.heap_cnt) {
+                       ts = etm->heap.heap_array[0].ordinal + 1;
+                       if (ts > timestamp)
+                               ts = timestamp;
+               } else {
+                       ts = timestamp;
+               }
+
+               cs_etm__set_pid_tid_cpu(etm, queue);
+
+               ret = cs_etm__run_decoder(etmq, &ts);
+
+               if (ret < 0) {
+                       auxtrace_heap__add(&etm->heap, queue_nr, ts);
+                       return ret;
+               }
+
+               if (!ret) {
+                       ret = auxtrace_heap__add(&etm->heap, queue_nr, ts);
+                       if (ret < 0)
+                               return ret;
+               } else {
+                       etmq->on_heap = false;
+               }
+       }
+       return 0;
+}
+
+static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
+                                         pid_t tid,
+                                         u64 time_)
+{
+       struct auxtrace_queues *queues = &etm->queues;
+       unsigned int i;
+       u64 ts = 0;
+
+       for (i = 0; i < queues->nr_queues; ++i) {
+               struct auxtrace_queue *queue = &(etm->queues.queue_array[i]);
+               struct cs_etm_queue *etmq = queue->priv;
+
+               if (etmq && ((tid == -1) || (etmq->tid == tid))) {
+                       etmq->time = time_;
+                       cs_etm__set_pid_tid_cpu(etm, queue);
+                       cs_etm__run_decoder(etmq, &ts);
+               }
+       }
+       return 0;
+}
+
+static struct cs_etm_queue *cs_etm__cpu_to_etmq(struct cs_etm_auxtrace *etm,
+                                               int cpu)
+{
+       unsigned q, j;
+
+       if (etm->queues.nr_queues == 0)
+               return NULL;
+
+       if (cpu < 0)
+               q = 0;
+       else if ((unsigned) cpu >= etm->queues.nr_queues)
+               q = etm->queues.nr_queues - 1;
+       else
+               q = cpu;
+
+       if (etm->queues.queue_array[q].cpu == cpu)
+               return etm->queues.queue_array[q].priv;
+
+       for (j = 0; q > 0; j++) {
+               if (etm->queues.queue_array[--q].cpu == cpu)
+                       return etm->queues.queue_array[q].priv;
+       }
+
+       for (; j < etm->queues.nr_queues; j++) {
+               if (etm->queues.queue_array[j].cpu == cpu)
+                       return etm->queues.queue_array[j].priv;
+       }
+
+       return NULL;
+}
+
+static uint32_t cs_etm__mem_access(struct cs_etm_queue *etmq, uint64_t address,
+                                  size_t size, uint8_t *buffer)
+{
+       struct   addr_location al;
+       uint64_t offset;
+       struct   thread *thread;
+       struct   machine *machine;
+       uint8_t  cpumode;
+       int len;
+
+       if (etmq == NULL)
+               return -1;
+
+       machine = etmq->etm->machine;
+       thread = etmq->thread;
+       if (address > 0xffffffc000000000UL)
+               cpumode = PERF_RECORD_MISC_KERNEL;
+       else
+               cpumode = PERF_RECORD_MISC_USER;
+
+       thread__find_addr_map(thread, cpumode, MAP__FUNCTION, address, &al);
+
+       if (!al.map || !al.map->dso)
+               return 0;
+
+       if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+           dso__data_status_seen(al.map->dso, DSO_DATA_STATUS_SEEN_ITRACE))
+               return 0;
+
+       offset = al.map->map_ip(al.map, address);
+
+       map__load(al.map);
+
+       len = dso__data_read_offset(al.map->dso, machine,
+                                   offset, buffer, size);
+
+       if (len <= 0)
+               return 0;
+
+       return len;
+}
+
+static bool check_need_swap(int file_endian)
+{
+       const int data = 1;
+       u8 *check = (u8 *)&data;
+       int host_endian;
+
+       if (check[0] == 1)
+               host_endian = ELFDATA2LSB;
+       else
+               host_endian = ELFDATA2MSB;
+
+       return host_endian != file_endian;
+}
+
+static int cs_etm__read_elf_info(const char *fname, uint64_t *foffset,
+                                uint64_t *fstart, uint64_t *fsize)
+{
+       FILE *fp;
+       u8 e_ident[EI_NIDENT];
+       int ret = -1;
+       bool need_swap = false;
+       size_t buf_size;
+       void *buf;
+       int i;
+
+       fp = fopen(fname, "r");
+       if (fp == NULL)
+               return -1;
+
+       if (fread(e_ident, sizeof(e_ident), 1, fp) != 1)
+               goto out;
+
+       if (memcmp(e_ident, ELFMAG, SELFMAG) ||
+           e_ident[EI_VERSION] != EV_CURRENT)
+               goto out;
+
+       need_swap = check_need_swap(e_ident[EI_DATA]);
+
+       /* for simplicity */
+       fseek(fp, 0, SEEK_SET);
+
+       if (e_ident[EI_CLASS] == ELFCLASS32) {
+               Elf32_Ehdr ehdr;
+               Elf32_Phdr *phdr;
+
+               if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
+                       goto out;
+
+               if (need_swap) {
+                       ehdr.e_phoff = bswap_32(ehdr.e_phoff);
+                       ehdr.e_phentsize = bswap_16(ehdr.e_phentsize);
+                       ehdr.e_phnum = bswap_16(ehdr.e_phnum);
+               }
+
+               buf_size = ehdr.e_phentsize * ehdr.e_phnum;
+               buf = malloc(buf_size);
+               if (buf == NULL)
+                       goto out;
+
+               fseek(fp, ehdr.e_phoff, SEEK_SET);
+               if (fread(buf, buf_size, 1, fp) != 1)
+                       goto out_free;
+
+               for (i = 0, phdr = buf; i < ehdr.e_phnum; i++, phdr++) {
+
+                       if (need_swap) {
+                               phdr->p_type = bswap_32(phdr->p_type);
+                               phdr->p_offset = bswap_32(phdr->p_offset);
+                               phdr->p_filesz = bswap_32(phdr->p_filesz);
+                       }
+
+                       if (phdr->p_type != PT_LOAD)
+                               continue;
+
+                       *foffset = phdr->p_offset;
+                       *fstart = phdr->p_vaddr;
+                       *fsize = phdr->p_filesz;
+                       ret = 0;
+                       break;
+               }
+       } else {
+               Elf64_Ehdr ehdr;
+               Elf64_Phdr *phdr;
+
+               if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1)
+                       goto out;
+
+               if (need_swap) {
+                       ehdr.e_phoff = bswap_64(ehdr.e_phoff);
+                       ehdr.e_phentsize = bswap_16(ehdr.e_phentsize);
+                       ehdr.e_phnum = bswap_16(ehdr.e_phnum);
+               }
+
+               buf_size = ehdr.e_phentsize * ehdr.e_phnum;
+               buf = malloc(buf_size);
+               if (buf == NULL)
+                       goto out;
+
+               fseek(fp, ehdr.e_phoff, SEEK_SET);
+               if (fread(buf, buf_size, 1, fp) != 1)
+                       goto out_free;
+
+               for (i = 0, phdr = buf; i < ehdr.e_phnum; i++, phdr++) {
+
+                       if (need_swap) {
+                               phdr->p_type = bswap_32(phdr->p_type);
+                               phdr->p_offset = bswap_64(phdr->p_offset);
+                               phdr->p_filesz = bswap_64(phdr->p_filesz);
+                       }
+
+                       if (phdr->p_type != PT_LOAD)
+                               continue;
+
+                       *foffset = phdr->p_offset;
+                       *fstart = phdr->p_vaddr;
+                       *fsize = phdr->p_filesz;
+                       ret = 0;
+                       break;
+               }
+       }
+out_free:
+       free(buf);
+out:
+       fclose(fp);
+       return ret;
+}
+
+static int cs_etm__process_event(struct perf_session *session,
+                               union perf_event *event,
+                               struct perf_sample *sample,
+                               struct perf_tool *tool)
+{
+       struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+                                                  struct cs_etm_auxtrace,
+                                                  auxtrace);
+
+       u64 timestamp;
+       int err = 0;
+
+       if (dump_trace)
+               return 0;
+
+       if (!tool->ordered_events) {
+               pr_err("CoreSight ETM Trace requires ordered events\n");
+               return -EINVAL;
+       }
+
+       if (sample->time && (sample->time != (u64)-1))
+               timestamp = sample->time;
+       else
+               timestamp = 0;
+
+       if (timestamp || etm->timeless_decoding) {
+               err = cs_etm__update_queues(etm);
+               if (err)
+                       return err;
+
+       }
+
+       if (event->header.type == PERF_RECORD_MMAP2) {
+               struct dso *dso;
+               int cpu;
+               struct cs_etm_queue *etmq;
+
+               cpu = sample->cpu;
+
+               etmq = cs_etm__cpu_to_etmq(etm, cpu);
+
+               if (!etmq)
+                       return -1;
+
+               dso = dsos__find(&(etm->machine->dsos), event->mmap2.filename,
+                                false);
+               if (dso) {
+                       err = cs_etm_decoder__add_mem_access_cb(
+                           etmq->decoder,
+                           event->mmap2.start,
+                           event->mmap2.len,
+                           cs_etm__mem_access);
+               }
+
+               if (symbol_conf.vmlinux_name && !etmq->kernel_mapped) {
+                       uint64_t foffset;
+                       uint64_t fstart;
+                       uint64_t fsize;
+
+                       err = cs_etm__read_elf_info(symbol_conf.vmlinux_name,
+                                                   &foffset, &fstart, &fsize);
+
+                       if (!err) {
+                               cs_etm_decoder__add_bin_file(
+                                       etmq->decoder,
+                                       foffset,
+                                       fstart,
+                                       fsize,
+                                       symbol_conf.vmlinux_name);
+
+                               etmq->kernel_mapped = true;
+                       }
+               }
+       }
+
+       if (etm->timeless_decoding) {
+               if (event->header.type == PERF_RECORD_EXIT) {
+                       err = cs_etm__process_timeless_queues(etm,
+                                                            event->fork.tid,
+                                                            sample->time);
+               }
+       } else if (timestamp) {
+               err = cs_etm__process_queues(etm, timestamp);
+       }
+
+       //cs_etm__log("event %s (%u): cpu %d time %"PRIu64" tsc %#"PRIx64"\n",
+                  //perf_event__name(event->header.type), event->header.type,
+                  //sample->cpu, sample->time, timestamp);
+       return err;
+}
+
+static int cs_etm__process_auxtrace_event(struct perf_session *session,
+                                 union perf_event *event,
+                                 struct perf_tool *tool)
+{
+       struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+                                                  struct cs_etm_auxtrace,
+                                                  auxtrace);
+
+       (void) tool;
+
+       if (!etm->data_queued) {
+               struct auxtrace_buffer *buffer;
+               off_t  data_offset;
+               int fd = perf_data_file__fd(session->file);
+               bool is_pipe = perf_data_file__is_pipe(session->file);
+               int err;
+
+               if (is_pipe)
+                       data_offset = 0;
+               else {
+                       data_offset = lseek(fd, 0, SEEK_CUR);
+                       if (data_offset == -1)
+                               return -errno;
+               }
+
+               err = auxtrace_queues__add_event(&etm->queues,
+                                                session,
+                                                event,
+                                                data_offset,
+                                                &buffer);
+               if (err)
+                       return err;
+
+               if (dump_trace) {
+                       if (auxtrace_buffer__get_data(buffer, fd)) {
+                               cs_etm__dump_event(etm, buffer);
+                               auxtrace_buffer__put_data(buffer);
+                       }
+               }
+       }
+
+       return 0;
+}
+
+static const char * const cs_etm_global_header_fmts[] = {
+  [CS_HEADER_VERSION_0]        = "   Header version            %"PRIx64"\n",
+  [CS_PMU_TYPE_CPUS]   = "   PMU type/num cpus         %"PRIx64"\n",
+  [CS_ETM_SNAPSHOT]    = "   Snapshot                  %"PRIx64"\n",
+};
+
+static const char * const cs_etm_priv_fmts[] = {
+  [CS_ETM_MAGIC]       = "   Magic number              %"PRIx64"\n",
+  [CS_ETM_CPU]         = "   CPU                       %"PRIx64"\n",
+  [CS_ETM_ETMCR]       = "   ETMCR                     %"PRIx64"\n",
+  [CS_ETM_ETMTRACEIDR] = "   ETMTRACEIDR               %"PRIx64"\n",
+  [CS_ETM_ETMCCER]     = "   ETMCCER                   %"PRIx64"\n",
+  [CS_ETM_ETMIDR]      = "   ETMIDR                    %"PRIx64"\n",
+};
+
+static const char * const cs_etmv4_priv_fmts[] = {
+  [CS_ETM_MAGIC]               = "   Magic number              %"PRIx64"\n",
+  [CS_ETM_CPU]                 = "   CPU                       %"PRIx64"\n",
+  [CS_ETMV4_TRCCONFIGR]        = "   TRCCONFIGR                %"PRIx64"\n",
+  [CS_ETMV4_TRCTRACEIDR]       = "   TRCTRACEIDR               %"PRIx64"\n",
+  [CS_ETMV4_TRCIDR0]           = "   TRCIDR0                   %"PRIx64"\n",
+  [CS_ETMV4_TRCIDR1]           = "   TRCIDR1                   %"PRIx64"\n",
+  [CS_ETMV4_TRCIDR2]           = "   TRCIDR2                   %"PRIx64"\n",
+  [CS_ETMV4_TRCIDR8]           = "   TRCIDR8                   %"PRIx64"\n",
+  [CS_ETMV4_TRCAUTHSTATUS]     = "   TRCAUTHSTATUS             %"PRIx64"\n",
+};
+
+static void cs_etm__print_auxtrace_info(u64 *val, size_t num)
+{
+       unsigned i, j, cpu;
+
+       for (i = 0, cpu = 0; cpu < num; ++cpu) {
+               if (val[i] == __perf_cs_etmv3_magic)
+                       for (j = 0; j < CS_ETM_PRIV_MAX; ++j, ++i)
+                               fprintf(stdout, cs_etm_priv_fmts[j], val[i]);
+               else if (val[i] == __perf_cs_etmv4_magic)
+                       for (j = 0; j < CS_ETMV4_PRIV_MAX; ++j, ++i)
+                               fprintf(stdout, cs_etmv4_priv_fmts[j], val[i]);
+               else
+                       /* unknown magic number: give up */
+                       return;
+       }
+}
+
+int cs_etm__process_auxtrace_info(union perf_event *event,
+                                struct perf_session *session)
+{
+       struct auxtrace_info_event *auxtrace_info = &(event->auxtrace_info);
+       size_t event_header_size = sizeof(struct perf_event_header);
+       size_t info_header_size = 8;
+       size_t total_size = auxtrace_info->header.size;
+       size_t priv_size = 0;
+       size_t num_cpu;
+       struct cs_etm_auxtrace *etm = NULL;
+       int err = 0, idx = -1;
+       u64 *ptr;
+       u64 *hdr = NULL;
+       u64 **metadata = NULL;
+       size_t i, j, k;
+       unsigned pmu_type;
+       struct int_node *inode;
+
+       /*
+        * sizeof(auxtrace_info_event::type) +
+        * sizeof(auxtrace_info_event::reserved) == 8
+        */
+
+       if (total_size < (event_header_size + info_header_size))
+               return -EINVAL;
+
+       priv_size = total_size - event_header_size - info_header_size;
+
+       /* First the global part */
+
+       ptr = (u64 *) auxtrace_info->priv;
+       if (ptr[0] == 0) {
+               hdr = zalloc(sizeof(u64) * CS_HEADER_VERSION_0_MAX);
+               if (hdr == NULL)
+                       return -ENOMEM;
+               for (i = 0; i < CS_HEADER_VERSION_0_MAX; ++i)
+                       hdr[i] = ptr[i];
+               num_cpu = hdr[CS_PMU_TYPE_CPUS] & 0xffffffff;
+               pmu_type = (unsigned) ((hdr[CS_PMU_TYPE_CPUS] >> 32) &
+                                      0xffffffff);
+       } else
+               return -EINVAL;
+
+       /*
+        * Create an RB tree for traceID-CPU# tuple.  Since the conversion has
+        * to be made for each packet that gets decoded optimizing access in
+        * anything other than a sequential array is worth doing.
+        */
+       traceid_list = intlist__new(NULL);
+       if (!traceid_list)
+               return -ENOMEM;
+
+       metadata = zalloc(sizeof(u64 *) * num_cpu);
+       if (!metadata) {
+               err = -ENOMEM;
+               goto err_free_traceid_list;
+       }
+
+       for (j = 0; j < num_cpu; ++j) {
+               if (ptr[i] == __perf_cs_etmv3_magic) {
+                       metadata[j] = zalloc(sizeof(u64) * CS_ETM_PRIV_MAX);
+                       if (metadata[j] == NULL) {
+                               err = -ENOMEM;
+                               goto err_free_metadata;
+                       }
+                       for (k = 0; k < CS_ETM_PRIV_MAX; k++)
+                               metadata[j][k] = ptr[i + k];
+
+                       /* The traceID is our handle */
+                       idx = metadata[j][CS_ETM_ETMIDR];
+                       i += CS_ETM_PRIV_MAX;
+               } else if (ptr[i] == __perf_cs_etmv4_magic) {
+                       metadata[j] = zalloc(sizeof(u64) * CS_ETMV4_PRIV_MAX);
+                       if (metadata[j] == NULL) {
+                               err = -ENOMEM;
+                               goto err_free_metadata;
+                       }
+                       for (k = 0; k < CS_ETMV4_PRIV_MAX; k++)
+                               metadata[j][k] = ptr[i + k];
+
+                       /* The traceID is our handle */
+                       idx = metadata[j][CS_ETMV4_TRCTRACEIDR];
+                       i += CS_ETMV4_PRIV_MAX;
+               }
+
+               /* Get an RB node for this CPU */
+               inode = intlist__findnew(traceid_list, idx);
+
+               /* Something went wrong, no need to continue */
+               if (!inode) {
+                       /* intlist__findnew() returns NULL on error */
+                       err = -ENOMEM;
+                       goto err_free_metadata;
+               }
+
+               /*
+                * The node for that CPU should not have been taken already.
+                * Backout if that's the case.
+                */
+               if (inode->priv) {
+                       err = -EINVAL;
+                       goto err_free_metadata;
+               }
+
+               /* All good, associate the traceID with the CPU# */
+               inode->priv = &metadata[j][CS_ETM_CPU];
+       }
+
+       if (i * 8 != priv_size) {
+               err = -EINVAL;
+               goto err_free_metadata;
+       }
+
+       if (dump_trace)
+               cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
+
+       etm = zalloc(sizeof(struct cs_etm_auxtrace));
+       if (!etm) {
+               err = -ENOMEM;
+               goto err_free_metadata;
+       }
+
+       etm->num_cpu = num_cpu;
+       etm->pmu_type = pmu_type;
+       etm->snapshot_mode = (hdr[CS_ETM_SNAPSHOT] != 0);
+       etm->timeless_decoding = true;
+       etm->sampling_mode = false;
+       etm->metadata = metadata;
+       etm->session = session;
+       etm->machine = &session->machines.host;
+       etm->auxtrace_type = auxtrace_info->type;
+
+       err = auxtrace_queues__init(&etm->queues);
+       if (err)
+               goto err_free;
+
+       etm->unknown_thread = thread__new(999999999, 999999999);
+       if (etm->unknown_thread == NULL) {
+               err = -ENOMEM;
+               goto err_free_queues;
+       }
+       err = thread__set_comm(etm->unknown_thread, "unknown", 0);
+       if (err)
+               goto err_delete_thread;
+
+       if (thread__init_map_groups(etm->unknown_thread, etm->machine)) {
+               err = -ENOMEM;
+               goto err_delete_thread;
+       }
+
+       etm->auxtrace.process_event = cs_etm__process_event;
+       etm->auxtrace.process_auxtrace_event = cs_etm__process_auxtrace_event;
+       etm->auxtrace.flush_events = cs_etm__flush_events;
+       etm->auxtrace.free_events  = cs_etm__free_events;
+       etm->auxtrace.free       = cs_etm__free;
+       session->auxtrace = &(etm->auxtrace);
+
+       if (dump_trace)
+               return 0;
+
+       if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+               etm->synth_opts = *session->itrace_synth_opts;
+       else
+               itrace_synth_opts__set_default(&etm->synth_opts);
+       etm->synth_opts.branches = false;
+       etm->synth_opts.callchain = false;
+       etm->synth_opts.calls = false;
+       etm->synth_opts.returns = false;
+
+       err = cs_etm__synth_events(etm, session);
+       if (err)
+               goto err_delete_thread;
+
+       err = auxtrace_queues__process_index(&etm->queues, session);
+       if (err)
+               goto err_delete_thread;
+
+       etm->data_queued = etm->queues.populated;
+
+       return 0;
+
+err_delete_thread:
+       thread__delete(etm->unknown_thread);
+err_free_queues:
+       auxtrace_queues__free(&etm->queues);
+       session->auxtrace = NULL;
+err_free:
+       free(etm);
+err_free_metadata:
+       /* No need to check @metadata[j], free(NULL) is supported */
+       for (j = 0; j < num_cpu; ++j)
+               free(metadata[j]);
+       free(metadata);
+err_free_traceid_list:
+       intlist__delete(traceid_list);
+
+       return err;
+}
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index 3cc6bc3..32400ac 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -18,6 +18,10 @@
 #ifndef INCLUDE__UTIL_PERF_CS_ETM_H__
 #define INCLUDE__UTIL_PERF_CS_ETM_H__
 
+#include "util/event.h"
+#include "util/intlist.h"
+#include "util/session.h"
+
/* Versioning header in case things need to change in the future.  That way
  * decoding of old snapshot is still possible.
  */
@@ -61,6 +65,9 @@ enum {
        CS_ETMV4_PRIV_MAX,
 };
 
+/* RB tree for quick conversion between traceID and CPUs */
+struct intlist *traceid_list;
+
 #define KiB(x) ((x) * 1024)
 #define MiB(x) ((x) * 1024 * 1024)
 
@@ -71,4 +78,7 @@ static const u64 __perf_cs_etmv4_magic   = 0x4040404040404040ULL;
 #define CS_ETMV3_PRIV_SIZE (CS_ETM_PRIV_MAX * sizeof(u64))
 #define CS_ETMV4_PRIV_SIZE (CS_ETMV4_PRIV_MAX * sizeof(u64))
 
+int cs_etm__process_auxtrace_info(union perf_event *event,
+                                 struct perf_session *session);
+
 #endif
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index df85b9e..bcec333 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1,3 +1,4 @@
+#include "build-id.h"
 #include "callchain.h"
 #include "debug.h"
 #include "event.h"
@@ -483,7 +484,8 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 }
 
 int machine__process_lost_event(struct machine *machine __maybe_unused,
-                               union perf_event *event, struct perf_sample *sample __maybe_unused)
+                               union perf_event *event,
+                               struct perf_sample *sample __maybe_unused)
 {
        dump_printf(": id:%" PRIu64 ": lost:%" PRIu64 "\n",
                    event->lost.id, event->lost.lost);
@@ -491,7 +493,8 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
 }
 
 int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
-                                       union perf_event *event, struct perf_sample *sample)
+                                       union perf_event *event,
+                                       struct perf_sample *sample)
 {
        dump_printf(": id:%" PRIu64 ": lost samples :%" PRIu64 "\n",
                    sample->id, event->lost_samples.lost);
@@ -711,8 +714,16 @@ static struct dso *machine__get_kernel(struct machine *machine)
                                                 DSO_TYPE_GUEST_KERNEL);
        }
 
-       if (kernel != NULL && (!kernel->has_build_id))
-               dso__read_running_kernel_build_id(kernel, machine);
+       if (kernel != NULL && (!kernel->has_build_id)) {
+               if (symbol_conf.vmlinux_name != NULL) {
+                       filename__read_build_id(symbol_conf.vmlinux_name,
+                                               kernel->build_id,
+                                               sizeof(kernel->build_id));
+                       kernel->has_build_id = 1;
+               } else {
+                       dso__read_running_kernel_build_id(kernel, machine);
+               }
+       }
 
        return kernel;
 }
@@ -726,8 +737,19 @@ static void machine__get_kallsyms_filename(struct machine *machine, char *buf,
 {
        if (machine__is_default_guest(machine))
                scnprintf(buf, bufsz, "%s", symbol_conf.default_guest_kallsyms);
-       else
-               scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir);
+       else {
+               if (symbol_conf.vmlinux_name != NULL) {
+                       unsigned char build_id[BUILD_ID_SIZE];
+                       char build_id_hex[SBUILD_ID_SIZE];
+                       filename__read_build_id(symbol_conf.vmlinux_name,
+                                               build_id,
+                                               sizeof(build_id));
+                       build_id__sprintf(build_id, sizeof(build_id), build_id_hex);
+                       build_id_cache__linkname(build_id_hex, buf, bufsz);
+               } else {
+                       scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir);
+               }
+       }
 }
 
 const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};
@@ -736,7 +758,7 @@ const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};
  * Returns the name of the start symbol in *symbol_name. Pass in NULL as
  * symbol_name if it's not that important.
  */
-static u64 machine__get_running_kernel_start(struct machine *machine,
+static u64 machine__get_kallsyms_kernel_start(struct machine *machine,
                                             const char **symbol_name)
 {
        char filename[PATH_MAX];
@@ -764,7 +786,7 @@ static u64 machine__get_running_kernel_start(struct machine *machine,
 int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 {
        enum map_type type;
-       u64 start = machine__get_running_kernel_start(machine, NULL);
+       u64 start = machine__get_kallsyms_kernel_start(machine, NULL);
 
        /* In case of renewal the kernel map, destroy previous one */
        machine__destroy_kernel_maps(machine);
@@ -1126,10 +1148,10 @@ int machine__create_kernel_maps(struct machine *machine)
 {
        struct dso *kernel = machine__get_kernel(machine);
        const char *name;
-       u64 addr;
+       u64 addr = machine__get_kallsyms_kernel_start(machine, &name);
        int ret;
 
-       if (kernel == NULL)
+       if (!addr || kernel == NULL)
                return -1;
 
        ret = __machine__create_kernel_maps(machine, kernel);
@@ -1151,7 +1173,7 @@ int machine__create_kernel_maps(struct machine *machine)
         */
        map_groups__fixup_end(&machine->kmaps);
 
-       addr = machine__get_running_kernel_start(machine, &name);
+       addr = machine__get_kallsyms_kernel_start(machine, &name);
        if (!addr) {
        } else if (maps__set_kallsyms_ref_reloc_sym(machine->vmlinux_maps, name, addr)) {
                machine__destroy_kernel_maps(machine);
@@ -1901,7 +1923,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
                ip = chain->ips[j];
 
                if (ip < PERF_CONTEXT_MAX)
-                       ++nr_entries;
+                       ++nr_entries;
 
                err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip);
 
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index fdbbf04..7698291 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -833,6 +833,8 @@ static void python_process_general_event(struct perf_sample *sample,
                        PyInt_FromLong(sample->cpu));
        pydict_set_item_string_decref(dict_sample, "ip",
                        PyLong_FromUnsignedLongLong(sample->ip));
+       pydict_set_item_string_decref(dict_sample, "addr",
+                       PyLong_FromUnsignedLongLong(sample->addr));
        pydict_set_item_string_decref(dict_sample, "time",
                        PyLong_FromUnsignedLongLong(sample->time));
        pydict_set_item_string_decref(dict_sample, "period",
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index 11cdde9..c094091 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -342,9 +342,8 @@ int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
        if (ret >= 0)
                dso->is_64_bit = ret;
 
-       if (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0) {
+       if ((!dso->has_build_id) && (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0))
                dso__set_build_id(dso, build_id);
-       }
        return 0;
 }
 
-- 
2.7.4
