Kevin,
I took a quick look at this patch first. I think there are things
that can be simplified.
It seems to me like you have copied the data out of the HW buffer into the
software buffer. Is it the case that the HW buffer is hosted in registers and
not in memory?
I am trying to compare this with Intel's PEBS. There we do not copy data out;
we directly expose the memory region where the CPU stores the samples.
I see you have imported some code to:
- reset the counter
- mask monitoring
- send the overflow notification message
- get the active set identification.
This is already provided to you in the struct pfm_ovfl_arg argument passed
to the handler routine:
- ovfl_arg->set_id : active set id
The module controls:
- when to mask monitoring
- when to send a notification
- when to reset the counters
Yet it does not have to do any of that itself. It needs to inform the perfmon
core of what it wants to see done when returning from the handler routine.
These controls are available in pfm_ovfl_arg in the ovfl_ctrl field. You have:
- PFM_OVFL_CTRL_MASK: mask monitoring (monitoring is stopped)
- PFM_OVFL_CTRL_RESET: reset counters
- PFM_OVFL_CTRL_NOTIFY: send an overflow notification
Those can be combined. There are some rules/priorities that apply to the
order in which operations are performed:
- if masking is requested, then reset is not performed immediately; it will
  be done upon pfm_restart()
- masking is done prior to notification
So typically, you have in the handler (sketched below):
- if buffer not full: PFM_OVFL_CTRL_RESET
- if buffer full: PFM_OVFL_CTRL_MASK|PFM_OVFL_CTRL_NOTIFY
And in the fmt_restart callback:
- PFM_OVFL_CTRL_RESET
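As a minimal sketch (assuming the pfm_ovfl_arg layout from this patch
series; buffer_is_full() and rearm_buffer() are illustrative placeholders,
not real helpers):

	static int my_fmt_handler(void *smpl_buf, struct pfm_ovfl_arg *arg,
				  unsigned long ip, u64 tstamp, void *data)
	{
		/* record the sample here, using arg->ovfl_pmd, arg->set_id, ... */

		if (buffer_is_full(smpl_buf))
			/* core masks monitoring, then sends the notification */
			arg->ovfl_ctrl = PFM_OVFL_CTRL_MASK | PFM_OVFL_CTRL_NOTIFY;
		else
			/* core resets the overflowed counters on return */
			arg->ovfl_ctrl = PFM_OVFL_CTRL_RESET;

		return 0;
	}

	static int my_fmt_restart(int is_active, u32 *ovfl_ctrl, void *smpl_buf)
	{
		rearm_buffer(smpl_buf);			/* reinitialize the header */
		*ovfl_ctrl = PFM_OVFL_CTRL_RESET;	/* core performs the reset */
		return 0;
	}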
If you have the set_id, then you do not need to get to the context.
In pfm_cell_hw_smpl_restart():
- you are asking "what about is_active?". This is a very good question.
  pfm_restart() can be called when monitoring is masked (stopped) or when it
  is active (not masked, i.e., not stopped). The flag indicates in which
  situation you are called. You can imagine a sampling format which
  implements double buffering, in which case monitoring is never actually
  masked, yet you may need to reset certain parameters when one half of the
  buffer has been processed by user code (see the sketch below).
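  A hypothetical double-buffering restart callback, just to illustrate the
  role of is_active (the dbl_buf_hdr structure and its fields are invented
  for this example):

	static int dbl_buf_restart(int is_active, u32 *ovfl_ctrl, void *smpl_buf)
	{
		struct dbl_buf_hdr *hdr = smpl_buf;

		/* user code has drained the other half; start filling it */
		hdr->active_half ^= 1;
		hdr->cur_offset = hdr->half_size * hdr->active_half;

		if (!is_active)
			/* we were masked (both halves full): resume counting */
			*ovfl_ctrl = PFM_OVFL_CTRL_RESET;

		return 0;
	}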
- you decide the content of pfm_cell_hw_smpl_hdr. I saw you have an
  overflows field. You probably cloned it from the default format, in which
  it is used to indicate the total number of overflows since the beginning.
  This could be used to detect whether or not we have leftover samples at
  the end of a run. It appears you have chosen a different approach, where
  this is managed like a sticky bit indicating whether or not there was a
  buffer-full condition. Is that what you wanted? It is not clear to me
  how, with this bit, you would handle the partial-buffer condition.
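  For comparison, one way the monotonic counter could be consumed from user
  space (pseudo-code; process_entries() is an illustrative placeholder):

	static u64 seen_overflows;

	void drain(struct pfm_cell_hw_smpl_hdr *hdr)
	{
		/* new buffer-full events since our last pass? */
		if (hdr->overflows != seen_overflows) {
			process_entries(hdr, hdr->count);
			seen_overflows = hdr->overflows;
		}
	}

	/* At the end of the run, hdr->count > 0 with overflows unchanged
	 * means leftover (partial-buffer) entries still to process.
	 */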
I am glad to see that the sampling format could be used to support this
advanced feature.
This is one more proof (after PEBS) that this feature was indeed needed.
Thanks for your contributions.
> +}
> +
> +/**
> + * read_partial_sample
> + *
> + * We are disabling this event-set, and are in the middle of a sampling
> + * interval. When this event set gets re-enabled, the pm_interval will get
> + * reset to its initial value, so we'll lose the counter values from this
> + * incomplete sample. In "count" sampling-mode, we should be able to prevent
> + * this data loss by recording a "partial" sample from the actual counters.
> + * In occurrence and threshold sampling, however, the partial data is not
> + * available, so we'll have to settle for an "empty" sample. In any case,
> + * fill in the current pm_interval value so user-space will know that the last
> + * sample in this entry is partial or empty, and so they will be able to
> + * calculate the percentage-complete for that sample.
> + **/
> +static void read_partial_sample(struct pfm_event_set *set,
> + struct pfm_cell_hw_smpl_entry_hdr *ent_hdr,
> + u64 **trace_buffer_sample)
> +{
> + u32 pm_control = set->pmcs[CELL_PMC_PM_CONTROL];
> + u32 *partial_sample = (u32 *)*trace_buffer_sample;
> + int i, cpu = smp_processor_id();
> +
> + if (CBE_PM_TRACE_MODE_GET(pm_control) == CBE_PM_TRACE_MODE_COUNT) {
> + for (i = 0; i < NR_PHYS_CTRS; i++) {
> + partial_sample[i] = cbe_read_phys_ctr(cpu, i);
> + }
> + } else {
> + memset(partial_sample, 0,
> + NR_PHYS_CTRS * sizeof(*partial_sample));
> + }
> +
> + ent_hdr->pm_interval = cbe_read_pm(cpu, pm_interval);
> + ent_hdr->num_samples++;
> + *trace_buffer_sample += 2;
> +
> + /* In all cases, reset the in-use PMDs to their "long" reset value
> + * since we've effectively invalidated the data in this interval.
> + */
> + pmds_long_reset(set, cpu);
> +}
> +
> +/**
> + * handle_full_buffer
> + **/
> +static int handle_full_buffer(struct pfm_cell_hw_smpl_hdr *buf_hdr,
> + struct pfm_context *ctx,
> + struct pfm_event_set *set)
> +{
> + int cpu = smp_processor_id();
> +
> + /* Increment the number of sampling-buffer overflows. This
> + * is important for detecting duplicate sets of samples.
> + */
> + buf_hdr->overflows++;
> +
> + /* Reset the PMDs to their "long-reset" value. The hardware counters
> + * will be saved during the mask-monitoring call, and will be
> + * restored later when we are unmasked, so we need to be sure the
> + * counters are set to their correct initial values.
> + */
> + pmds_long_reset(set, cpu);
> +
> + /* Reset the interval timer so we start a whole
> + * new interval when we get restarted.
> + */
> + cbe_write_pm(cpu, pm_interval, set->pmcs[CELL_PMC_PM_INTERVAL]);
> +
> + /* Mask monitoring until a pfm_restart() occurs. */
> + pfm_mask_monitoring(ctx, set);
> + ctx->state = PFM_CTX_MASKED;
> + ctx->flags.can_restart = 1;
> +
> + /* Add a message to the context's message queue. */
> + pfm_cell_hw_smpl_add_msg(ctx);
> +
> + return -ENOBUFS;
> +}
> +
> +/**
> + * pfm_cell_hw_smpl_handler
> + *
> + * Create a new entry-header in the sampling-buffer and copy the current
> + * contents of the trace-buffer into the sampling-buffer.
> + **/
> +static int pfm_cell_hw_smpl_handler(void *smpl_buf,
> + struct pfm_ovfl_arg *ovfl_arg,
> + unsigned long ip,
> + u64 tstamp,
> + void *data)
> +{
> + struct pfm_cell_hw_smpl_hdr *buf_hdr = smpl_buf;
> + struct pfm_cell_hw_smpl_entry_hdr *ent_hdr;
> + struct pfm_event_set *set;
> + struct pfm_context *ctx;
> + u64 *trace_buffer_sample;
> + u64 free_bytes;
> +
> + /* If this handler was called due to an actual PMD overflowing, do
> + * nothing. Only store the contents of the trace-buffer if the trace-
> + * buffer overflowed or if we're disabling an event-set (during a
> + * process context-switch or an event-set switch).
> + */
> + if (!(ovfl_arg->ovfl_pmd == PFM_CELL_HW_SMPL_OVFL_PMD ||
> + ovfl_arg->ovfl_pmd == PFM_CELL_HW_SMPL_OVFL_PMD_PARTIAL))
> + return 0;
> +
> + ctx = __get_cpu_var(pmu_ctx);
> + set = ctx->active_set;
> +
> + /* Check if the sampling-buffer is full. This should never happen,
> + * since we check for a full buffer after adding the new entry.
> + */
> + free_bytes = buf_hdr->buf_size - buf_hdr->cur_offset;
> + if (free_bytes < PFM_CELL_HW_SMPL_MAX_ENTRY_SIZE)
> + return handle_full_buffer(buf_hdr, ctx, set);
> +
> + ent_hdr = init_entry_header(buf_hdr, set);
> +
> + read_trace_buffer(ent_hdr, &trace_buffer_sample);
> +
> + if (ovfl_arg->ovfl_pmd == PFM_CELL_HW_SMPL_OVFL_PMD_PARTIAL &&
> + ent_hdr->num_samples < CBE_PM_TRACE_BUF_MAX_COUNT) {
> + read_partial_sample(set, ent_hdr, &trace_buffer_sample);
> + }
> +
> + /* Update the sampling-buffer header for the next entry. Since the
> + * hw_smpl_hdr and hw_smpl_entry_hdr structures are both padded to
> + * 128-bits, and each trace-buffer sample is 128-bits, we know that
> + * every buffer entry will start on a 128-bit boundary.
> + */
> + if (ent_hdr->num_samples) {
> + buf_hdr->cur_offset = (void *)trace_buffer_sample -
> + (void *)buf_hdr;
> + buf_hdr->count++;
> + }
> +
> + /* Check the available size in the buffer again so we won't lose the
> + * next sample entry.
> + */
> + free_bytes = buf_hdr->buf_size - buf_hdr->cur_offset;
> + if (free_bytes < PFM_CELL_HW_SMPL_MAX_ENTRY_SIZE)
> + return handle_full_buffer(buf_hdr, ctx, set);
> +
> + return 0;
> +}
> +
> +/**
> + * pfm_cell_hw_smpl_restart
> + *
> + * Reinitialize the sampling-buffer header, effectively deleting all entries
> + * previously stored in the sampling-buffer.
> + *
> + * FIX: What is the "is_active" argument for? It's not used by any of the
> + * other sampling modules.
> + **/
> +static int pfm_cell_hw_smpl_restart(int is_active,
> + u32 *ovfl_ctrl,
> + void *smpl_buf)
> +{
> + struct pfm_cell_hw_smpl_hdr *buf_hdr = smpl_buf;
> +
> + buf_hdr->count = 0;
> + buf_hdr->cur_offset = sizeof(*buf_hdr);
> + buf_hdr->overflows = 0;
> +
> + return 0;
> +}
> +
> +/**
> + * pfm_cell_hw_smpl_exit
> + **/
> +static int pfm_cell_hw_smpl_exit(void *smpl_buf)
> +{
> + return 0;
> +}
> +
> +/**
> + * cell_hw_smpl_fmt
> + *
> + * Structure to describe the Cell hardware-sampling module to the Perfmon2 core.
> + **/
> +static struct pfm_smpl_fmt cell_hw_smpl_fmt = {
> + .fmt_name = PFM_CELL_HW_SMPL_NAME,
> + .fmt_arg_size = sizeof(struct pfm_cell_hw_smpl_arg),
> + .fmt_flags = PFM_FMT_BUILTIN_FLAG,
> + .fmt_version = PFM_CELL_HW_SMPL_VERSION,
> + .fmt_validate = pfm_cell_hw_smpl_validate,
> + .fmt_getsize = pfm_cell_hw_smpl_get_size,
> + .fmt_init = pfm_cell_hw_smpl_init,
> + .fmt_handler = pfm_cell_hw_smpl_handler,
> + .fmt_restart = pfm_cell_hw_smpl_restart,
> + .fmt_exit = pfm_cell_hw_smpl_exit,
> + .owner = THIS_MODULE,
> +};
> +
> +static int __init pfm_cell_hw_smpl_init_module(void)
> +{
> + return pfm_fmt_register(&cell_hw_smpl_fmt);
> +}
> +
> +static void __exit pfm_cell_hw_smpl_exit_module(void)
> +{
> + pfm_fmt_unregister(&cell_hw_smpl_fmt);
> +}
> +
> +module_init(pfm_cell_hw_smpl_init_module);
> +module_exit(pfm_cell_hw_smpl_exit_module);
> diff --git a/arch/powerpc/platforms/cell/pmu.c b/arch/powerpc/platforms/cell/pmu.c
> index 66ca4b5..e518af7 100644
> --- a/arch/powerpc/platforms/cell/pmu.c
> +++ b/arch/powerpc/platforms/cell/pmu.c
> @@ -213,7 +213,7 @@ u32 cbe_read_pm(u32 cpu, enum pm_reg_name reg)
> break;
>
> case pm_interval:
> - READ_SHADOW_REG(val, pm_interval);
> + READ_MMIO_UPPER32(val, pm_interval);
> break;
>
> case pm_start_stop:
> diff --git a/include/asm-powerpc/cell-pmu.h b/include/asm-powerpc/cell-pmu.h
> index 981db26..aa7416d 100644
> --- a/include/asm-powerpc/cell-pmu.h
> +++ b/include/asm-powerpc/cell-pmu.h
> @@ -41,6 +41,11 @@
> #define CBE_PM_FREEZE_ALL_CTRS 0x00100000
> #define CBE_PM_ENABLE_EXT_TRACE 0x00008000
>
> +#define CBE_PM_TRACE_MODE_NONE 0
> +#define CBE_PM_TRACE_MODE_COUNT 1
> +#define CBE_PM_TRACE_MODE_OCCURRENCE 2
> +#define CBE_PM_TRACE_MODE_THRESHOLD 3
> +
> /* Macros for the trace_address register. */
> #define CBE_PM_TRACE_BUF_FULL 0x00000800
> #define CBE_PM_TRACE_BUF_EMPTY 0x00000400
> diff --git a/include/asm-powerpc/perfmon_cell_hw_smpl.h b/include/asm-powerpc/perfmon_cell_hw_smpl.h
> new file mode 100644
> index 0000000..4927438
> --- /dev/null
> +++ b/include/asm-powerpc/perfmon_cell_hw_smpl.h
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright IBM Corp 2007
> + *
> + * Contributed by Carl Love <[EMAIL PROTECTED]>
> + * and Kevin Corry <[EMAIL PROTECTED]>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
> + * 02111-1307 USA
> + */
> +#ifndef __ASM_POWERPC_PERFMON_CELL_HW_SMPL_H__
> +#define __ASM_POWERPC_PERFMON_CELL_HW_SMPL_H__
> +
> +/**
> + * struct pfm_cell_hw_smpl_arg
> + *
> + * @buf_size: Desired size of full sampling-buffer (in bytes).
> + * @buf_flags: Sampling-buffer-specific flags.
> + * @reserved1: Pad to 128-bit boundary.
> + *
> + * Format specific parameters (passed at context creation).
> + **/
> +struct pfm_cell_hw_smpl_arg {
> + __u64 buf_size;
> + __u32 buf_flags;
> + __u32 reserved1;
> +};
> +
> +/**
> + * struct pfm_cell_hw_smpl_hdr
> + *
> + * @count: Number of valid pfm_cell_hw_smpl_entry_hdr's in the sampling-buffer.
> + * @cur_offset: Offset (in bytes) from the top of the sampling-buffer to the
> + * next available space for a sampling-buffer entry.
> + * @overflows: Number of times the sampling-buffer has filled up.
> + * @buf_size: Total bytes in the sampling-buffer.
> + * @version: Sampling module version.
> + * @buf_flags: Copy of buf_flags from pfm_cell_hw_smpl_arg.
> + * @reserved1: Pad to 128-bit boundary.
> + *
> + * This header is at the beginning of the sampling-buffer returned to the user.
> + * It is directly followed by the first entry header.
> + **/
> +struct pfm_cell_hw_smpl_hdr {
> + __u64 count;
> + __u64 cur_offset;
> + __u64 overflows;
> + __u64 buf_size;
> + __u32 version;
> + __u32 buf_flags;
> + __u64 reserved1;
> +};
> +
> +/**
> + * struct pfm_cell_hw_smpl_entry_hdr
> + *
> + * @pid: Thread ID.
> + * @tgid: Thread group ID. For NPTL, this is getpid().
> + * @cpu: CPU on which the overflow occurred.
> + * @set: Event-set that was active when the overflow occurred.
> + * @num_samples: Number of 128-bit trace-buffer samples in this entry.
> + * @entry_num: Sequence number of sampling-buffer entries.
> + * @pm_interval: If the last sample in this entry is for a partial time
> + * interval, this is the value of pm_interval when the partial
> + * sample was recorded. If the value is zero, there is no partial
> + * sample at the end of the entry.
> + * @reserved1: Pad to 128-bit boundary.
> + *
> + * The header for each data entry in the sampling-buffer. The entry header
> + * is immediately followed by the contents of the trace-buffer. Each sample in
> + * the trace-buffer is 128 bits wide. The entry header specifies the number
> + * of 128-bit trace-buffer samples that follow the header.
> + **/
> +struct pfm_cell_hw_smpl_entry_hdr {
> + __u32 pid;
> + __u32 tgid;
> + __u32 cpu;
> + __u32 set;
> + __u32 num_samples;
> + __u32 entry_num;
> + __u32 pm_interval;
> + __u32 reserved1;
> +};
> +
> +/* The max size of each sampling-buffer entry is the size of the entry header
> + * plus the full size of the trace-buffer.
> + */
> +#define PFM_CELL_HW_SMPL_MAX_ENTRY_SIZE \
> + (sizeof(struct pfm_cell_hw_smpl_entry_hdr) + \
> + 2 * sizeof(u64) * CBE_PM_TRACE_BUF_MAX_COUNT)
> +
> +/* The sampling-buffer must be at least as large as the sampling-buffer header
> + * and the max size of one sampling-buffer entry.
> + */
> +#define PFM_CELL_HW_SMPL_MIN_BUF_SIZE (sizeof(struct pfm_cell_hw_smpl_hdr) + \
> + PFM_CELL_HW_SMPL_MAX_ENTRY_SIZE)
> +
> +#define PFM_CELL_HW_SMPL_VERSION 1
> +#define PFM_CELL_HW_SMPL_NAME "perfmon_cell_hw_smpl"
> +#define PFM_CELL_HW_SMPL_OVFL_PMD (PFM_MAX_PMDS + 1)
> +#define PFM_CELL_HW_SMPL_OVFL_PMD_PARTIAL (PFM_MAX_PMDS + 2)
> +#define PFM_MSG_CELL_HW_SMPL_BUF_FULL 99
> +
> +/* Values are indexes into pfm_cell_pmc_desc[] array. */
> +#define CELL_PMC_PM_STATUS 20
> +#define CELL_PMC_PM_CONTROL 21
> +#define CELL_PMC_PM_INTERVAL 22
> +
> +#endif /* __ASM_POWERPC_PERFMON_CELL_HW_SMPL_H__ */
> diff --git a/include/linux/perfmon.h b/include/linux/perfmon.h
> index 631cfca..66f0a44 100644
> --- a/include/linux/perfmon.h
> +++ b/include/linux/perfmon.h
> @@ -593,10 +593,13 @@ void pfm_switch_sets(struct pfm_context *ctx,
> int reset_mode,
> int no_restart);
>
> +union pfarg_msg *pfm_get_new_msg(struct pfm_context *ctx);
> void pfm_save_pmds(struct pfm_context *ctx, struct pfm_event_set *set);
> int pfm_ovfl_notify_user(struct pfm_context *ctx,
> struct pfm_event_set *set,
> unsigned long ip);
> +void pfm_mask_monitoring(struct pfm_context *ctx,
> + struct pfm_event_set *set);
>
> int pfm_init_fs(void);
>
> diff --git a/perfmon/perfmon.c b/perfmon/perfmon.c
> index 6d1b56e..eb7648c 100644
> --- a/perfmon/perfmon.c
> +++ b/perfmon/perfmon.c
> @@ -58,6 +58,7 @@ DEFINE_PER_CPU(struct task_struct *, pmu_owner);
> DEFINE_PER_CPU(struct pfm_context *, pmu_ctx);
> DEFINE_PER_CPU(u64, pmu_activation_number);
> DEFINE_PER_CPU(struct pfm_stats, pfm_stats);
> +EXPORT_PER_CPU_SYMBOL(pmu_ctx);
>
> #define PFM_INVALID_ACTIVATION ((u64)~0)
>
> @@ -74,7 +75,7 @@ int perfmon_disabled; /* >0 if perfmon is disabled */
> * get a new message slot from the queue. If the queue is full NULL
> * is returned and monitoring stops.
> */
> -static union pfarg_msg *pfm_get_new_msg(struct pfm_context *ctx)
> +union pfarg_msg *pfm_get_new_msg(struct pfm_context *ctx)
> {
> int next;
>
> @@ -95,6 +96,7 @@ static union pfarg_msg *pfm_get_new_msg(struct pfm_context *ctx)
>
> return ctx->msgq+next;
> }
> +EXPORT_SYMBOL(pfm_get_new_msg);
>
> void pfm_context_free(struct pfm_context *ctx)
> {
> diff --git a/perfmon/perfmon_intr.c b/perfmon/perfmon_intr.c
> index ba23be5..f9b7cf9 100644
> --- a/perfmon/perfmon_intr.c
> +++ b/perfmon/perfmon_intr.c
> @@ -40,7 +40,7 @@
> #include <linux/module.h>
> #include <linux/random.h>
>
> -static inline void pfm_mask_monitoring(struct pfm_context *ctx,
> +inline void pfm_mask_monitoring(struct pfm_context *ctx,
> struct pfm_event_set *set)
> {
> u64 now;
> @@ -63,6 +63,7 @@ static inline void pfm_mask_monitoring(struct pfm_context *ctx,
> */
> set->duration += now - set->duration_start;
> }
> +EXPORT_SYMBOL(pfm_mask_monitoring);
>
> /*
> * main overflow processing routine.
--
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/