Hi Stephane,

I'm not sure what you mean about lists or switching on the fly. PAPI
eventsets contain everything needed to start and stop PMU state. For a
given thread, only 1 PAPI eventset can be running at a given time. A
user is free to create as many eventsets as he wants and then start/stop
them whenever, subject to the above limitation. Each PAPI eventset may
map to many counters, requiring more than 1 Perfmon2 eventset, which
means I need to do a create_eventset and load_context upon every start.
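
A rough sketch of that rule, for illustration only. The sketch_* names
are made up (they are not PAPI or perfmon2 calls), and the counter just
stands in for the create_eventset/load_context work redone on each start:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only; none of this is real PAPI or perfmon2 code. */
enum sketch_state { SKETCH_STOPPED, SKETCH_RUNNING };

struct sketch_eventset {
    enum sketch_state state;
    int setup_calls;  /* times the kernel-side state had to be rebuilt */
};

struct sketch_thread {
    struct sketch_eventset *running;  /* at most one per thread */
};

/* Returns 0 on success, -1 if another eventset is already running. */
int sketch_start(struct sketch_thread *t, struct sketch_eventset *es)
{
    assert(t != NULL && es != NULL);
    if (t->running != NULL)
        return -1;
    /* Stand-in for redoing create_eventset + load_context here. */
    es->setup_calls++;
    es->state = SKETCH_RUNNING;
    t->running = es;
    return 0;
}

/* Returns 0 on success, -1 if this eventset is not the running one. */
int sketch_stop(struct sketch_thread *t, struct sketch_eventset *es)
{
    assert(t != NULL && es != NULL);
    if (t->running != es)
        return -1;
    es->state = SKETCH_STOPPED;
    t->running = NULL;
    return 0;
}
```

Starting a second eventset while one is running fails; stopping the
first and starting it again bumps setup_calls a second time, which is
the repeated setup cost described above.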

Today I discovered something even "worse" (as far as PAPI goes). I
implemented sampling for PAPI profiling...and I discovered that
sampling has to be set up at create_context time. Which means that for
PAPI eventsets that sample, I have to do a create_context inside start
and close(fd) inside stop. Basically, for PAPI start and stop to work, I
have to run through the entire Perfmon API. ;-)

PAPI_start and PAPI_stop do not have to be fast...we claim they're slow
because they usually mean a system call. Users are supposed to use
read(). However, in the perfmon2 case, it will be the slowest of all
implementations. I'm not dogging perfmon2 here...I'm just saying that it
doesn't fit the semantics of PAPI (and that has been the case for a
while; I make no claims it's good or bad). But clearly in this case, it
doesn't match the usage model of PFM. Is there any way to provide
ioctl()'s on the FD to do the things that create_context and
create_eventsets do, so I don't always have to jump through all the
hoops? I don't think this is super important, just something to be aware
of. PAPI can survive as it is and be optimized later.

Now for some other points about sampling.

1) It would be great if, when sampling, the PFM_OVFL_MSG contained a
pointer to the sample header. When self-sampling multiple threads,
there's no such thing as global variables (which are used all throughout
the test suite).
2) It would also be great if the sample header contained the
number of sampled pmds (also a global in the test cases).

The above two would remove the need for a hash lookup function and would
make the signal handler fully self contained to process sample entries.
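
As a rough sketch of what such a self-contained handler body could look
like: the layouts below are hypothetical (they are NOT the real perfmon2
structures), and the buffer is assumed to hold num_entries records of
num_pmds 64-bit values each, directly after the header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layouts illustrating the two requests above. */
struct sketch_smpl_hdr {
    uint32_t num_entries;  /* entries currently in the buffer */
    uint32_t num_pmds;     /* request 2: sampled PMDs per entry */
    uint64_t pmds[];       /* num_entries * num_pmds values */
};

struct sketch_ovfl_msg {
    struct sketch_smpl_hdr *hdr;  /* request 1: pointer to the header */
};

/* Helper for trying the sketch out; returns NULL on allocation failure. */
struct sketch_smpl_hdr *sketch_hdr_alloc(uint32_t num_entries,
                                         uint32_t num_pmds)
{
    size_t n = (size_t)num_entries * num_pmds;
    struct sketch_smpl_hdr *hdr =
        calloc(1, sizeof *hdr + n * sizeof(uint64_t));
    if (hdr != NULL) {
        hdr->num_entries = num_entries;
        hdr->num_pmds = num_pmds;
    }
    return hdr;
}

/* Walks every sample using only what the message carries: no globals
 * and no hash lookup. Summing the values stands in for real processing.
 * (A real signal handler would avoid assert; it is here for the sketch.) */
uint64_t sketch_process(const struct sketch_ovfl_msg *msg)
{
    assert(msg != NULL && msg->hdr != NULL);
    const struct sketch_smpl_hdr *hdr = msg->hdr;
    uint64_t sum = 0;
    for (uint32_t e = 0; e < hdr->num_entries; e++)
        for (uint32_t p = 0; p < hdr->num_pmds; p++)
            sum += hdr->pmds[(size_t)e * hdr->num_pmds + p];
    return sum;
}
```

The point is only that each thread's handler gets from the message to
its own buffer and entry size without touching shared state.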

Now for the bad news (really, I'm sorry about this one...)

- BAD: Sampling works great in PAPI up until the 3rd cycle of
create_context() with 4*getpagesize() sample entries. The fourth one
always returns the dreaded 'not supported' error message and errno is
set to ENOMEM. I have munmap()'d the buffer and close()d the context
file descriptor between each incantation. This is in PAPI's profile test
case, which runs profiling a bunch of times (and initializes and shuts
down PAPI each time). Would you like to see the DEBUG log? (It's big.)

- WORSE: I can hard lock my i386 2.7.17.10 kernel by running
task_smpl_user on emacs and hitting Ctrl-C; it locks about 1 out of
every 5 times. There is no oops, nothing...just a hard lock.

The good news:

- PAPI for Perfmon is well on its way to having full support. I believe
it will break horribly on the PIV due to the pmc/pmd mapping issues (and
trying to figure out the final offset into the pmd structure with
multiplexing is even worse...)

Thanks for listening. We're almost home...

Phil

P.S. Have you had any requests for pfm_dispatch_events to be able to
dispatch events with multiplexing enabled? That would simplify things
greatly. I am not confident in the ability of the code to get the
resulting offsets of the final PD structure correct...especially in
light of PIV-like beasties.

On Tue, 2006-08-29 at 07:45 -0700, Stephane Eranian wrote:
> Phil,
> 
> On Mon, Aug 28, 2006 at 10:07:11PM +0000, Philip Mucci wrote:
> > 
> > Today I got kernel multiplexing with Perfmon2 working in PAPI. All tests
> > in PAPI are passing at this juncture. However, I must say that
> > implementing multiplexing was somewhat painful. Before I get on the
> > soapbox about that, there is a more serious issue (I think).
> > 
> > You can't do anything related to eventsets while the context is loaded.
> > Every time I tried to do a create_evtsets after a load_context I would
> > get a 'not supported' error from Perfmon. 
> > 
> Yes, this is the expected behavior. 
> 
> > PAPI has a few functions at the low level; in short, they can be referred
> > to as:
> > 
> > init_control_state
> > update_control_state
> > start
> > read
> > stop
> > 
> > and of course, init/tear down/option handling routines. PAPI can have
> > multiple eventsets, even though only 1 can be running at any given time
> > (unless you are attached to another process, which I have also
> > implemented)
> > 
> 
> There is something confusing about your PAPI description of eventsets.
> Before I can comment, you need to describe this a bit more.
> 
> Are you saying that PAPI can manage multiple lists of distinct event sets?
> For instance:
>  L1 = set1, set2, set3
>  L2 = set1, set2, set3, set4
> 
> Where setX encapsulates the full PMU state (i.e., all accessible registers).
> And you want to start with L1 and then switch to L2 on the fly?
> 
> Am I getting this right?
> 
> --
> -Stephane

_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
