I'm submitting the following closed approved automatic fasttrack on behalf of Jon Haslam and the DTrace community. It has been approved by the community after discussion on dtrace-discuss at opensolaris.org. The stability is Committed and the binding is Patch.
Adam

---8<---

A. INTRODUCTION

This case adds the 'cpc' provider, which enables consumers to access
the performance counters of a CPU. This allows users to easily connect
CPU events (e.g. TLB misses, L2 cache misses) to the cause of the event
on a system-wide basis.

The Solaris CPU Performance Counter (CPC) subsystem (PSARC 2002/180)
gives general purpose access to the hardware performance counters of a
microprocessor. The cpc provider leverages the infrastructure provided
by the CPC subsystem to access the CPU performance counter resources of
a system. The provider utilises the hardware overflow interrupt
mechanism to allow profiling based upon CPU performance counter events
(in the same way that the profile provider allows us to profile by
time).

B. DESCRIPTION

1. Probe Format

The format of probes made available by the cpc provider is:

    cpc:::<event_name>-<mode>[-<mask>]-<count>

where:

    event_name:      The event name of interest. A full list of the
                     events available on each platform is given in the
                     output of `cpustat -h`.

    mode:            The operating mode of the processor in which the
                     event is counted. Valid settings are "user" (user
                     mode), "kernel" (kernel mode) and "all" (user and
                     kernel mode).

    mask (optional): Some platform specific events can be further
                     specified with the use of a mask (sometimes known
                     as a 'umask' or an 'emask'), given as a hex value.
                     This field is optional and can only be specified
                     for platform specific events; it cannot be used
                     with generic performance counter events (PSARC
                     2008/334).

    count:           The number of events that must be counted on a CPU
                     for the probe to fire on that CPU.

As an example, the specification for a probe which fires every 10000
user mode DTLB misses on an UltraSPARC IV processor would look like:

    cpc:::DTLB_miss-user-10000

The probes exported by the cpc provider are unanchored: they are not
associated with a particular point of execution, but rather with an
asynchronous performance counter event interrupt. When a probe fires,
we can sample aspects of system state and make inferences about system
behaviour. The following example fires every 10000 user mode L1
instruction cache misses and records the user-land stack trace if the
"foo" executable was executing when the probe fired (note that
executable "foo" may have generated anywhere between 1 and 10000 of
those events):

    cpc:::IC_miss-user-10000
    /execname == "foo"/
    {
            @[ustack()] = count();
    }

2. Probe Arguments

All probes provide two arguments:

    arg0    The program counter (PC) in the kernel at the time the
            probe fired, or 0 if the current process was not executing
            in the kernel at the time the probe fired.

    arg1    The PC in the user-level process at the time the probe
            fired, or 0 if the current process was executing in the
            kernel at the time the probe fired.
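Only one of the two arguments is ever non-zero for a given firing, so a
consumer can use them together to attribute each event to either kernel
or user code. As a sketch (using the AMD BU_cpu_clk_unhalted event from
the examples in section C; any available event would serve equally
well):

    cpc:::BU_cpu_clk_unhalted-all-10000
    /arg0 != 0/
    {
            /* The CPU was in the kernel: credit the kernel function. */
            @kern[func(arg0)] = count();
    }

    cpc:::BU_cpu_clk_unhalted-all-10000
    /arg0 == 0/
    {
            /* The CPU was in user mode: credit the user function. */
            @user[ufunc(arg1)] = count();
    }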
3. Probe Availability

Probes are made available dynamically when requested by a user. The
probes available will differ according to the events exported by the
CPC subsystem on a platform. The names of the available events can be
discovered, as mentioned in section 'B1 - Probe Format', using the
output of `cpustat -h`.

CPU performance counters are a finite resource and the number of probes
that can be enabled depends upon hardware capabilities. Processors that
cannot determine which counter has overflowed when multiple counters
are programmed (e.g. AMD, UltraSPARC) are only allowed to have a single
enabling at any one time. On such platforms, consumers attempting to
enable more than one probe will fail, as will consumers attempting to
enable a probe when a disparate enabling already exists.

Processors that can detect which counter has overflowed (e.g. Niagara2,
Intel P4) are allowed to have as many probes enabled as the hardware
will allow. This will be, at most, the number of counters available on
a processor. On such configurations, multiple probes can be enabled at
any one time.

Probes are enabled by consumers on a first-come, first-served basis.
When hardware resources are fully utilised, subsequent enablings will
fail until resources become available.

4. Co-existence With Existing Tools

The provider has priority over per-LWP libcpc usage (i.e. cputrack) for
access to the counters. In the same manner as cpustat, enabling probes
causes all existing per-LWP counter contexts to be invalidated. As long
as these enablings remain active, the counters will remain unavailable
to cputrack-type consumers.

Only one of cpustat and DTrace may use the counter hardware at any one
time. Ownership of the counters is given on a first-come, first-served
basis.

5. Limiting Overflow Rate

So as not to saturate the system with overflow interrupts, a default
minimum of 5000 is imposed on the value that can be specified for the
'count' part of the probe name (refer to section 'B1 - Probe Format').
This minimum can be reduced explicitly by altering the
'dcpc_min_overflow' kernel variable with mdb(1), or by modifying the
dcpc.conf driver configuration file and unloading and reloading the
dcpc driver module.
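As a sketch of the mdb(1) route, lowering the minimum to 1000 on a live
system might look like the following (this assumes dcpc_min_overflow is
a 32-bit integer; '0t' denotes a decimal value):

    # echo 'dcpc_min_overflow/W 0t1000' | mdb -kw

For a persistent change, and assuming the driver reads the property
under the name 'dcpc-min-overflow', a line such as

    dcpc-min-overflow=1000;

could be added to dcpc.conf before unloading and reloading the dcpc
module.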
C. EXAMPLES

1. Instructions executed by applications on an AMD platform:

    cpc:::FR_retired_x86_instr_w_excp_intr-user-10000
    {
            @[execname] = count();
    }

    # ./user-insts.d
    dtrace: script './user-insts.d' matched 2 probes
    ^C
    [chop]
      init                                                        138
      dtrace                                                      175
      nis_cachemgr                                                179
      automountd                                                  183
      intrd                                                       235
      run-mozilla.sh                                              306
      thunderbird                                                 316
      Xorg                                                        453
      thunderbird-bin                                            2370
      sshd                                                       8114

2. A kernel profiled by cycle usage on an AMD platform:

    cpc:::BU_cpu_clk_unhalted-kernel-10000
    {
            @[func(arg0)] = count();
    }

    # ./kerncycprof.d
    dtrace: script './kerncycprof.d' matched 1 probe
    ^C
    [chop]
      genunix`vpm_sync_pages                                   478948
      genunix`vpm_unmap_pages                                  496626
      genunix`vpm_map_pages                                    640785
      unix`mutex_delay_default                                 916703
      unix`hat_kpm_page2va                                     988880
      tmpfs`rdtmp                                              991252
      unix`hat_page_setattr                                   1077717
      unix`page_try_reclaim_lock                              1213379
      genunix`free_vpmap                                      1914810
      genunix`get_vpmap                                       2417896
      unix`page_lookup_create                                 3992197
      unix`mutex_enter                                        5595647
      unix`do_copy_fault_nta                                 27803554

3. L2 cache misses, by function, generated by any running executables
called 'brendan' on an AMD platform:

    cpc:::BU_fill_req_missed_L2-all-0x7-10000
    /execname == "brendan"/
    {
            @[ufunc(arg1)] = count();
    }

    # ./brendan-l2miss.d
    dtrace: script './brendan-l2miss.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
    ^C
      brendan`func_gamma                                          930
      brendan`func_beta                                          1578
      brendan`func_alpha                                         2945

4. The same example as in (3) above, but using a generic event to
specify L2 data cache misses:

    cpc:::PAPI_l2_dcm-all-10000
    /execname == "brendan"/
    {
            @[ufunc(arg1)] = count();
    }

    # ./papi-l2miss.d
    dtrace: script './papi-l2miss.d' matched 1 probe
    ^C
      brendan`func_gamma                                         1681
      brendan`func_beta                                          2521
      brendan`func_alpha                                         5068

D. REFERENCES

    http://bugs.opensolaris.org/view_bug.do?bug_id=6486156
    PSARC/2002/180 CPU Performance Counters (CPC) Version 2
    PSARC/2008/334 CPU Performance Counter Generic Event Names

E. DOCUMENTATION

A new chapter has been added to the Solaris Dynamic Tracing Guide for
this proposed provider:

    http://wikis.sun.com/display/DTrace/Documentation    # DTrace Guide
    http://wikis.sun.com/display/DTrace/cpc+Provider     # cpc Provider chapter

F. STABILITY

The DTrace internal stability table is described below:

    Element      Name stability    Data stability    Dependency class
    ---------    --------------    --------------    ----------------
    Provider     Evolving          Evolving          Common
    Module       Private           Private           Unknown
    Function     Private           Private           Unknown
    Name         Evolving          Evolving          CPU
    Arguments    Evolving          Evolving          Common