Stephane I sent this response directly to your email address when I meant to post it back to the mailing list. I have added to the end some information related to running pfmon in sampling mode and am now posting it to the mailing list. Sorry for the incorrect routing that caused you to get multiple copies.
Stephane, Phil Responses imbeded below. "stephane eranian" <[EMAIL PROTECTED]> wrote on 04/16/2008 02:31:49 AM: > Phil, > > On Wed, Apr 16, 2008 at 11:22 AM, Philip Mucci <[EMAIL PROTECTED]> wrote: > > Folks, > > > > hpcrun does it's sampling inside the target process using first person > > access, not a 3rd person ptrace() like pfmon, so the process is > > implicitly blocked when processing samples, i.e. there are no dropped > > samples unless something else has gone wrong. > > > Thanks for clarifing this. It makes more sense given how PAPI works. I think this means that I just lost the best explanation I have had so far as to why I see these inconsistencies. > > > Another thing, you cannot rely on the sample count of hpcrun to > > compute cycles. Why? Because those are samples that only have not been > > dropped. If samples occur ourside of the sample space (as can happen > > when one has floating point exceptions), the address will be in kernel > > space and it will be dropped. pfmon has no concept of filtering out > > addresses, so even if you ask for user-space samples, you'll still get > > samples in the output with kernel addresses. I'm not sure what the > > default is for your version of pfmon. > > > PFmon does not do filtering of samples. It relies on the hardware via > the priv levels. By default, pfmon only measures at the user level. > That does not mean you won't get kernel-level samples because there > are boundary effects when sampling. > > > Which value is correct, according to /bin/time? 2Billion or 154 Billion? > > > This is a valid point. Which value makes most sense related to time? Let me provide all the steps I use when running these tests. Maybe I am just doing something wrong and you can correct my misunderstanding. When I use "time" with my hpcrun test, it provides this information: time hpcrun -e CPU_CYCLES:32767 -o hpcrun.data -- ./code.exe >hpcrun 2>hpcrun.debug real 1m44.921s user 1m39.490s sys 0m2.420s The output from the hpcprof run on all of the data files produced by this test shows the following summary information: Columns correspond to the following events [event:period (events/sample)] CPU_CYCLES:32767 - CPU Cycles (29 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (9 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (29 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (29 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (41602 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (24490 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (7 samples) [not shown] CPU_CYCLES:32767 - CPU Cycles (5 samples) [not shown] CPU_CYCLES (min):32767 - CPU Cycles (The minimum for events of this type.) (1 samples) CPU_CYCLES (max):32767 - CPU Cycles (The maximum for events of this type.) (66127 samples) CPU_CYCLES (sum):32767 - CPU Cycles (Summed over all events of this type.) (66200 samples) When I multiply the number of samples (66200) times the sample period I get the total number of CPU_CYCLES of: 2169175400 The file /proc/cpuinfo shows (this is repeated for 7 more processors): processor : 0 vendor : GenuineIntel arch : IA-64 family : Itanium 2 model : 2 revision : 1 archrev : 0 features : branchlong cpu number : 0 cpu regs : 4 cpu MHz : 1600.000442 itc MHz : 1600.442800 BogoMIPS : 2398.61 siblings : 1 physical id: 24576 LID : 0x00000060 When I divide this by the cpu frequency (1600) I get: 1355734.625 (or 1.3 seconds) When I use "time" with my pfmon test, it provides this information: pfmon --debug -v -e CPU_CYCLES ./code.exe >pfmon 2>pfmon.debug real 1m38.834s user 1m38.360s sys 0m0.490s The output from the pfmon test run shows: [10489] monitoring ./code.exe ... results are on terminal [10489] activating monitoring measurements started at Wed Apr 16 09:12:59 2008 [10489] task exited 156255456222 CPU_CYCLES [10489] detached created tasks : 1 maximum tasks : 1 maximum active tasks : 1 measurements completed at Wed Apr 16 09:14:38 2008 When I divide this by the cpu frequency (1600) I get: 97659660.13875 (or 97.6 seconds) When I use "time" with pfmon (with sampling), it provides this information: pfmon --debug -v -e CPU_CYCLES --long-smpl-periods=32767 --smpl-outfile=pfmons.data ./code.exe >pfmons 2>pfmons.debug real 1m55.932s user 1m53.370s sys 0m2.350s The output from the pfmon (with sampling) test run shows: loaded 16582 text symbols from /proc/kallsyms loaded 4566 data symbols from /proc/kallsyms table_size=188319 effective=88389 min_addr=0x40000000004a82f0 loaded 88389 text symbols from ELF file ./code.exe table_size=188319 effective=9030 min_addr=0x4000000003138f00 loaded 9030 data symbols from ELF file ./code.exe measuring at user privilege level ONLY 1 event set(s) defined long sampling periods(val/mask/seed): 32767/0x0/0 short sampling periods(val/mask/seed): 32767/0x0/0 using detailed-ia64 sampling module pfmlib_itanium2.c (pfm_ita2_dispatch_counters.484): ev[0]=CPU_CYCLES counters=0xf0 pfmlib_itanium2.c (pfm_ita2_dispatch_counters.500): max_l0=8 max_l1=4 max_l2=4 max_l3=4 4 (-1,-1): CPU_CYCLES [PMC4(pmc4)=0x001228 thres=0 es=0x12 plm=8 umask=0x0 pm=0 ism=0x0 oi=1] CPU_CYCLES [PMD4(pmd4)] [PMC15(pmc15)=0xffffffff ibrp0_pmc8=1 ibrp1_pmc9=1 ibrp2_pmc8=1 ibrp3_pmc9=1] pfmlib_itanium2.c (pfm_dispatch_btb.949): found_btb=0 found_bar_dear=0 pmd setup for event set0: [pmd4 set=0 ival=0xffffffffffff8001 long_rate=0xffffffffffff8001 short_rate=0xffffffffffff8001 mask=0x0 seed=0 randomize=n] [pmd4 set=0 smpl_pmds=0x0 reset_pmds=0x0] sampling buffer #entries=15437341 size=617493912, max_entry_size=40 vCPU1 -> pCPU1 vCPU0 -> pCPU0 vCPU6 -> pCPU6 vCPU2 -> pCPU2 vCPU3 -> pCPU3 vCPU4 -> pCPU4 exec-pattern=* vCPU5 -> pCPU5 vCPU7 -> pCPU7 [14370] started task: ./code.exe follow_exec=n follow_vfork=n follow_fork=n follow_pthread=n [14370] monitoring ./code.exe ... [14370] sampling results will be in file "pfmon.data" [14370] activating monitoring measurements started at Wed Apr 16 14:03:21 2008 [14370] task exited [14370] 4804705 samples collected (0 buffer overflows) [14370] detached created tasks : 1 maximum tasks : 1 maximum active tasks : 1 measurements completed at Wed Apr 16 14:05:17 2008 Samples (4804705) times sample period (32767) divided by CPU frequency (1600) gives: 98397355.459375 (or 98.3 seconds) The pfmon test (both with and without sampling) reports CPU cycle counts that seem to account for the cpu time used during the test. In the sampling case I can see a slight loss (113 seconds used and 98 seconds accounted for) which is probably explained by the sampling blind spot Stehpane described in previous email. The hpcrun test reports a CPU cycle count that only seems to represent about 1 to 2 percent of the cpu time used during the test. Phil pointed out that samples with kernel addresses would be droped by hpcrun but in is it really reasonable to assume that 98 percent of the samples happened in kernel space when only about 3 percent of the processing time happens in the kernel ? I guess I still do not understand why hpcrun does not report sample counts that account for the CPU time used in user space ? Thanks again for any information that may help solve this mystery. Gary ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone _______________________________________________ perfmon2-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
