Re: Is multithreaded profiling on cygwin possible?

2003-10-20 Thread peter garrone
Even if no one else comments, I really appreciate all this work you're
doing!  Also, thanks for continuing to send me the updated patches.  I
wish I had more time to look over them in detail right now.  I'll try and
do that soon.  I assume it is ok to give an open invitation for anyone
else who would like to give them a whirl?

BTW, if you are interested in contributing this, please take a look at the
Before you get started section of http://cygwin.com/contrib.html since
the assignment process can take some time.

Again, great work from my point of view!

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444


Thanks Brian.

This message, I believe, gives the code upon which Brian's endorsement is based.

Regarding the IP issues, there is no problem with my posting the code to the group,
or with anyone using it, from my employer's point of view. No warranty is implied etc.
I have forwarded the small document that our legal people need to sign.
Until then I would imagine that the code cannot be part of cygwin.

This message contains the code for two files, profil.c, and gmraw.c.

To use this code, replace winsup/cygwin/profil.c with the file profil.c here.
Recompile cygwin. Either update the DLL, or directly link in the profil.o object file.

- Compile and link with -pg.

- If using multi-threaded code, each new thread needs to call moncontrol(1)
  upon creation. The main thread calls it automatically.

- To profile DLL's, and general memory ranges,
  set the environment variable PROFILE_RANGE. This variable
  should consist of one or more colon separated entries. Each entry consists
  of a name, followed by optional comma separated fields representing
  the scale, offset, and size, in hex.

  If the name is the name of a DLL, then the entry is a DLL entry; otherwise it is a
  general memory range.

  The scale field determines the profiling resolution. Divide 0x1 by the scale
  value to give the profiling resolution in bytes. For a DLL, the default scale is
  1, giving a resolution of 1 byte. For a memory range, the default scale is such
  as to give 256 incrementing counters, or a scale of 1 if the range is too great
  for that.
  
  If the name is the name of a DLL, then the offset is with respect to the loaded DLL,
  otherwise it is a fixed address. The default offset value is zero.

  The size field defines the size of the address range to be profiled. If the range
  covers a DLL, then the default value is the size of the DLL less the offset. If
  the range is a general memory range, the default size is 0x8000.
  
  I use the following to profile all the DLLs my application links to, the
  entire memory range from 0 to 0x8000 (called general), and a small area where
  my application is spending a lot of CPU (called busy). The list of DLLs was
  obtained using cygcheck.

export PROFILE_RANGE='general:busy,1,7ffe,1:cygX11-6.dll:cygcygipc-2.dll:cygwin1.dll:KERNEL32.dll:ntdll.dll:GDI32.dll:USER32.dll:ADVAPI32.dll:RPCRT4.dll:GLU32.DLL:msvcrt.dll:OPENGL32.dll:DDRAW.dll:DCIMAN32.dll:glut32.dll:WINMM.dll'

  Alternatively, the following command causes only the cygwin dll to be profiled.

export PROFILE_RANGE='cygwin1.dll'

- Execute your program.

Upon program termination, besides gmon.out,
a file in gmon.out format will be output for each range, with .gmonout appended
to the name of each range, cygwin1.dll.gmonout etc.
To profile the cygwin dll, provided you are using an unstripped dll, the command

gprof /bin/cygwin1.dll cygwin1.dll.gmonout

produces a flat profile. 

To see a memory range, or for DLLs with no symbolic information, a utility called
gmraw is provided.
Compile this with gcc gmraw.c -o gmraw.
The command gmraw -f general.gmonout, for example, provides information about that
particular range.

- WRT the interface. Calling moncontrol(0) at any point should terminate the
  profiling thread, and profiling of the memory ranges, but the data should still
  be saved upon program termination.
  It should be possible to use the profil function provided the -pg
  compilation/linkage option is not used. Each thread should call profil with
  identical parameters. I have not tested this much.
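
For readers who want to see the PROFILE_RANGE syntax above in concrete terms, here is a minimal sketch of a parser for it. It is not the code from profil.c; the struct and function names (range_entry, parse_entry, parse_profile_range) are made up for illustration, and a zero field is assumed to mean "apply the default described above".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one PROFILE_RANGE entry; not taken from the patch. */
struct range_entry {
    char name[64];
    unsigned long scale;   /* 0 means "use the default" (assumption) */
    unsigned long offset;  /* 0 is the documented default */
    unsigned long size;    /* 0 means "use the default" (assumption) */
};

/* Parse one entry of the form name[,scale[,offset[,size]]], fields in hex.
   Returns 0 on success, -1 if the entry is empty or too long. */
static int parse_entry(const char *text, size_t len, struct range_entry *out)
{
    char buf[128];
    char *field, *next;
    unsigned long *slots[3];
    int i;

    assert(text != NULL && out != NULL);
    if (len == 0 || len >= sizeof buf)
        return -1;
    memcpy(buf, text, len);
    buf[len] = '\0';

    memset(out, 0, sizeof *out);
    slots[0] = &out->scale;
    slots[1] = &out->offset;
    slots[2] = &out->size;

    /* The name runs up to the first comma, if any. */
    next = strchr(buf, ',');
    if (next != NULL)
        *next++ = '\0';
    if (buf[0] == '\0' || strlen(buf) >= sizeof out->name)
        return -1;
    strcpy(out->name, buf);

    /* Up to three optional hex fields follow: scale, offset, size. */
    for (i = 0; i < 3 && next != NULL; i++) {
        field = next;
        next = strchr(field, ',');
        if (next != NULL)
            *next++ = '\0';
        *slots[i] = strtoul(field, NULL, 16);
    }
    return 0;
}

/* Split a colon separated PROFILE_RANGE value into at most max entries.
   Returns the number of entries successfully parsed. */
static size_t parse_profile_range(const char *value, struct range_entry *out,
                                  size_t max)
{
    size_t count = 0;

    while (*value != '\0' && count < max) {
        size_t len = strcspn(value, ":");
        if (parse_entry(value, len, &out[count]) == 0)
            count++;
        value += len;
        if (*value == ':')
            value++;
    }
    return count;
}
```

Deciding whether a name refers to a DLL (and so whether the offset is relative or absolute) would need a lookup against the loaded modules, which this sketch leaves out.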

-
File profil.c
-
/* profil.c -- win32 profil.c equivalent

   Copyright 1998, 1999, 2000, 2001 Red Hat, Inc.

   This file is part of Cygwin.

   This software is a copyrighted work licensed under the terms of the
   Cygwin license.  Please consult the file CYGWIN_LICENSE for
   details. */

#include <windows.h>
#include <psapi.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <math.h>

#include <profil.h>
#include <gmon.h>
#include <assert.h>

#define SLEEPTIME (1000 / 

Re: Is multithreaded profiling on cygwin possible

2003-10-20 Thread peter garrone
Brian Ford wrote:
On Fri, 17 Oct 2003, peter garrone wrote:


  I have dropped the dll import library concept.

Probably good.  Although, it would still be neat to figure out a way to
trace back to the application leaf functions.  I guess that will be an
exercise for later.

Unfortunately I don't have any problems that this approach would address.
I am keeping a copy of my work so far, and would send it to anyone,
but I have no current plans to continue with it.

  My current approach is to keep track of the time accumulated
 by each thread, and when it has exceeded the amount represented
 by a profiling period, assign the tick to the current PC,
 and subtract that amount of time from the running total for the
 thread. So it always adds up, anyway.

I was going to suggest this when we were talking about partial ticks, but
I was worried about charging time to the wrong PC.  Looking back, this
still fits in fine with the PC sampling philosophy.

Yes. It does get a bit philosophical. The program being profiled must
be regarded as a random process in itself, so adding an RNG only confuses
the issue.


  I still have it so that each thread that is to be profiled calls
 moncontrol(1). Also, an application compiled and linked without
 -pg could always use the profil call in a similar way.
 Each thread would call profil with identical parameters.

It should be simple to add an all threads mode later if we want, so this
is fine.  Does the main thread still call moncontrol(1) when compiled with
-pg?  I would think this would be required.

Yes. It links in a special crt program that calls moncontrol(1). On linux,
it is necessary to call getitimer/setitimer for each thread as discussed previously,
and with this code on cygwin moncontrol(1). But this allows a lot of flexibility.


 To do the DLL's, I have added a linked list of profiling ranges
 to profil.c. These ranges are specified using an environment
 variable. The ranges may be DLL specific, or general memory ranges.
 There is a separate data file output upon program termination
 for each range, in addition to gmon.out.

The linked list sounds reasonable.  I guess if there were too many
threads, something less linear would be better.  Does anyone have a
suggestion about how to find these address ranges automatically (at least
for non dynamically loaded DLLs)?

I assume the gmon.out file contains just the original program proper
memory range?

Yes, from text to etext, the static linked range.
This is as per the current gmon.c file, which I am not proposing to change.
If call counts were required, it is usually possible to link a cygwin dll
function statically, and that would produce the call counts.


I really wish we could get someone on the binutils list interested in
helping to extend the gmon.out file format to contain multiple hashes.  We
would still need a method to map the ranges to the DLLs.  Determining the
DLLs should be easy unless they were dynamically loaded.  You really
should send at least a ping over there, but you're doing all the work, so
I'll shut up now.

As it pans out, it doesn't seem so necessary to me, because gprof profiles the
unstripped DLLs as they are. gprof does one thing well, profiling a BFD, which is
sort of in line with the UNIX philosophy. All the profiling data exist
as separate files for the various profiling ranges. This way solves my current
profiling problems and gives me fine control. Why profile every DLL when
you may only be interested in one? But it is absolutely essential to me
to be able to profile selected memory ranges.


  If a dll has not been stripped, gprof will use the data file
 and the dll to output a flat profile, but without call counts though.
 (At least this works with cygwin1.dll)

That sounds like it would be *really* useful to cygwin developers!  This
concept was discussed before in the references I gave you, but never
pushed this far.

I must say that it was unforeseen by myself, and very easy, accidental almost.
I had to save the profiling data for raw address ranges in some format,
gmon.out being the logical one. That was all the work necessary. 
I wondered if gprof would work with that data, and it did.
I hope this is standard for gprof, as in it reads any BFD with symbolic information,
and not some kludge. I am not expert enough to know.


  I have written a simple utility to summarise the information
 in these data files, giving flat addresses and CPU usage.

This sounds like a useful inclusion.  Again, it would be more functional
if we could feed all this into gprof and get partial call graphs.

Sure, maybe a special flag to gprof saying to generate a flat raw output.
The code is pretty simple.



Re: Is multithreaded profiling on cygwin possible?

2003-10-17 Thread peter garrone

Hi Brian
 
 Thanks very much for your comments.

 I think I have changed my approach so that it is broadly similar to
your suggestions, but may differ in some details.

 I have dropped the RNG. I don't think it is necessary or warranted.
 I have dropped the dll import library concept.

 I would agree that Corinna's suggestion about WaitForSingleObject is
probably better, though I haven't yet done it that way.

 My current approach is to keep track of the time accumulated
by each thread, and when it has exceeded the amount represented
by a profiling period, assign the tick to the current PC,
and subtract that amount of time from the running total for the
thread. So it always adds up, anyway.

 I still have it so that each thread that is to be profiled calls
moncontrol(1). Also, an application compiled and linked without
-pg could always use the profil call in a similar way.
Each thread would call profil with identical parameters.

To do the DLL's, I have added a linked list of profiling ranges
to profil.c. These ranges are specified using an environment
variable. The ranges may be DLL specific, or general memory ranges.
There is a separate data file output upon program termination
for each range, in addition to gmon.out.
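
As an illustration of the linked list of ranges mentioned above, the following sketch walks such a list and charges a sample to every range that covers the PC. The real structure in the patched profil.c is not shown in this thread, so the node layout, the names and the even bucket spacing are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; the patch's real structure may differ. */
struct prof_range {
    uintptr_t low;            /* first address covered */
    uintptr_t high;           /* one past the last address covered */
    unsigned *counters;       /* one counter per bucket */
    size_t ncounters;
    struct prof_range *next;
};

/* Charge one tick for pc to every range that covers it.
   Returns the number of ranges that were updated. */
static unsigned record_sample(struct prof_range *head, uintptr_t pc)
{
    unsigned hits = 0;
    struct prof_range *r;

    for (r = head; r != NULL; r = r->next) {
        if (pc >= r->low && pc < r->high && r->ncounters > 0) {
            uintptr_t span = r->high - r->low;
            /* Spread the range evenly over the available counters. */
            size_t idx = (size_t)(((uint64_t)(pc - r->low) * r->ncounters)
                                  / span);
            assert(idx < r->ncounters);
            r->counters[idx]++;
            hits++;
        }
    }
    return hits;
}
```

The walk is linear in the number of ranges, which is fine for a handful of DLLs and address windows.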

 If a dll has not been stripped, gprof will use the data file
and the dll to output a flat profile, but without call counts though.
(At least this works with cygwin1.dll)

 I have written a simple utility to summarise the information
in these data files, giving flat addresses and CPU usage.

 Peter Garrone


--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: Is multithreaded profiling on cygwin possible?

2003-10-17 Thread Brian Ford
On Fri, 17 Oct 2003, peter garrone wrote:

 Hi Brian

Hi Peter.  Seems like a private conversation, doesn't it? :)

  Thanks very much for your comments.

You're welcome.

  I think I have changed my approach so that it is broadly similar to
 your suggestions, but may differ in some details.

  I have dropped the RNG. I don't think it is necessary or warranted.

I agree, especially in light of your new approach.

  I have dropped the dll import library concept.

Probably good.  Although, it would still be neat to figure out a way to
trace back to the application leaf functions.  I guess that will be an
exercise for later.

  I would agree that Corinna's suggestion about WaitForSingleObject is
 probably better, though I haven't yet done it that way.

  My current approach is to keep track of the time accumulated
 by each thread, and when it has exceeded the amount represented
 by a profiling period, assign the tick to the current PC,
 and subtract that amount of time from the running total for the
 thread. So it always adds up, anyway.

I was going to suggest this when we were talking about partial ticks, but
I was worried about charging time to the wrong PC.  Looking back, this
still fits in fine with the PC sampling philosophy.

  I still have it so that each thread that is to be profiled calls
 moncontrol(1). Also, an application compiled and linked without
 -pg could always use the profil call in a similar way.
 Each thread would call profil with identical parameters.

It should be simple to add an all threads mode later if we want, so this
is fine.  Does the main thread still call moncontrol(1) when compiled with
-pg?  I would think this would be required.

 To do the DLL's, I have added a linked list of profiling ranges
 to profil.c. These ranges are specified using an environment
 variable. The ranges may be DLL specific, or general memory ranges.
 There is a separate data file output upon program termination
 for each range, in addition to gmon.out.

The linked list sounds reasonable.  I guess if there were too many
threads, something less linear would be better.  Does anyone have a
suggestion about how to find these address ranges automatically (at least
for non dynamically loaded DLLs)?

I assume the gmon.out file contains just the original program proper
memory range?

I really wish we could get someone on the binutils list interested in
helping to extend the gmon.out file format to contain multiple hashes.  We
would still need a method to map the ranges to the DLLs.  Determining the
DLLs should be easy unless they were dynamically loaded.  You really
should send at least a ping over there, but you're doing all the work, so
I'll shut up now.

  If a dll has not been stripped, gprof will use the data file
 and the dll to output a flat profile, but without call counts though.
 (At least this works with cygwin1.dll)

That sounds like it would be *really* useful to cygwin developers!  This
concept was discussed before in the references I gave you, but never
pushed this far.

  I have written a simple utility to summarise the information
 in these data files, giving flat addresses and CPU usage.

This sounds like a useful inclusion.  Again, it would be more functional
if we could feed all this into gprof and get partial call graphs.

Even if no one else comments, I really appreciate all this work you're
doing!  Also, thanks for continuing to send me the updated patches.  I
wish I had more time to look over them in detail right now.  I'll try and
do that soon.  I assume it is ok to give an open invitation for anyone
else who would like to give them a whirl?

BTW, if you are interested in contributing this, please take a look at the
Before you get started section of http://cygwin.com/contrib.html since
the assignment process can take some time.

Again, great work from my point of view!

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-15 Thread peter garrone

Brian Ford wrote:

On Tue, 14 Oct 2003, peter garrone wrote:

 A list of active threads is maintained. A thread calling moncontrol(1) gets
 put in the list. When a call to SuspendThread fails, the thread is assumed
 to be defunct and taken off the list.

Seems reasonable.

I guess I was originally thinking just profile all threads all the
time, but I guess your method is more flexible.  That could
probably be an option somehow after you polish this up.

There is no other way to detect a dead thread. Nothing like atexit or
anything I could find anyway in the win32 api, of which my experience
is limited, and ordinary.


 One of the fields in the thread is a counter corresponding to the sum of cpu
 returned by GetThreadTimes. This function has fields corresponding to
 kernel cpu and user cpu. The amount of time consumed by every thread is
 saved.

 Generally only one thread will have consumed CPU. However to be general,
 and in case the profiling thread is inadvertently delayed, all threads are
 considered.

 There is a partial tick problem. Suppose that a thread has consumed say
 155% of the cpu time corresponding to a tick. I would assign one tick
 and use a local random number generator to assign an extra tick on
 average 55% of the time.

I'm getting lost here.

What is your tick definition?  A sampling interval?

Yes.
The amount of CPU usage corresponding to a loop of the profiling sampler.
The amount represented by incrementing the sample data by one.
The amount of time represented by (1.0/PROF_HZ)


How can a thread ever consume more than 100%?  I can see how two or more
threads might on a multi CPU system.

The profiling thread calls Sleep. That function is guaranteed to sleep for
at least its argument. It can always sleep longer. So the actual profiling interval
could be longer than the nominal profiling interval. In the meantime,
a thread could have consumed more than one interval worth of cpu.


I'm lost in the random number generator application too.



1) The rng algorithm itself

It's a linear congruential RNG algorithm.
Given a 31 bit seed, do a 64 bit multiply by 69069,
add 1, and mask off the lower 31 bits.
I don't think the application calls for anything special.
If it bothers you, call rand();
I don't have references for this, but I'm pretty sure it has been
academically described and extensively tested. As an RNG,
it does have severe limitations but it is very fast.
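
The generator described above can be sketched in a few lines. This is a reading of the description, not the code from the patch: "mask off the lower 31 bits" is taken to mean keeping the low 31 bits of the result, and the function name lcg_next is made up.

```c
#include <assert.h>
#include <stdint.h>

/* One step of the linear congruential generator: multiply the 31 bit seed
   by 69069 in 64 bit arithmetic, add 1, and keep the low 31 bits. */
static uint32_t lcg_next(uint32_t seed)
{
    uint64_t product;

    assert(seed <= 0x7fffffffu);
    product = (uint64_t)seed * 69069u + 1u;
    return (uint32_t)(product & 0x7fffffffu);
}
```

Each call takes the previous return value as its seed, so the caller keeps the state.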

2) Why use one? 

What else do you do if only a fraction of a tick
is used? If a thread has consumed say 70% of a complete tick's worth of cpu,
then roll the dice and 70% of the time assign a tick. I don't think I can
explain it better. It all averages out in the end, and it's basically an
estimate anyway.
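
To make the dice roll concrete, here is a small sketch of the rounding rule just described. The random value is passed in rather than generated, so the function is deterministic; the name ticks_with_rounding and the integer units are assumptions. Note that the later messages in this thread drop the RNG in favour of carrying the remainder forward.

```c
#include <assert.h>
#include <stdint.h>

/* Convert CPU time to ticks, rounding the fractional part up with a
   probability equal to that fraction. roll must be drawn uniformly from
   [0, tick_len); cpu_used and tick_len share the same unit. */
static unsigned ticks_with_rounding(uint64_t cpu_used, uint64_t tick_len,
                                    uint64_t roll)
{
    unsigned ticks;
    uint64_t fraction;

    assert(tick_len > 0);
    assert(roll < tick_len);

    ticks = (unsigned)(cpu_used / tick_len);
    fraction = cpu_used % tick_len;
    /* roll < fraction holds with probability fraction / tick_len. */
    if (roll < fraction)
        ticks++;
    return ticks;
}
```

With a tick of 100 units and 155 units consumed, 55 of the 100 possible rolls yield two ticks and the other 45 yield one, so the average comes out at 1.55.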

 I tried getting the program counter for all threads, but this was found
 not to work very well, consuming excessive cpu, on average 50 milliseconds.

I thought there might be an overhead issue.

Definitely. This was causing the scheduler to assign very long profile thread
sleeptimes, much longer than nominal,
and is the reason for all the probabilistic stuff. That scheduler is a bit sus.


Please do (come back with your results, that is).  I'm definitely
interested.


Concerning the DLL import function profile assignment solution,
I can get it to work, but only for toy situations unfortunately.
If I create a simple test DLL, then it apparently works OK, giving about 70
nanoseconds per DLL function call, 7 seconds for 100 million, compared with
about 20 nanoseconds for a simple unoptimised profiled function call,
and some nice looking gprof output indeed.

However when I try things like the cygwin dll or kernel32,
I just get a segmentation violation on startup. I think there are some issues with the
import library format which I don't understand. I have changed the little bit of code
found in dlltool.c that goes in each import library function to cause a call to the
profiler code, which is always at a fixed address since I can't get ld to accept an
extra relocation.
Also I might have some assembler issues. I am currently trashing eax and edx in the
profiler call, which C functions appear to do.

My cygwin installation is behaving rather oddly at the moment so it could be something 
to do with that.
I have a snapshot, and I have to have my own version of bash, compiled with that
snapshot unfortunately. gdb always has an application initialization
failure on startup. Setup hangs at the last. I think I'll try a new snapshot.

Any suggestions on this, by anyone at all, would be greatly appreciated.
 Cheers,
  Peter




Re: Is multithreaded profiling on cygwin possible?

2003-10-15 Thread Corinna Vinschen
On Wed, Oct 15, 2003 at 05:57:31PM +0800, peter garrone wrote:
 Brian Ford wrote:
 On Tue, 14 Oct 2003, peter garrone wrote:
  A list of active threads is maintained. A thread calling moncontrol(1) gets
  put in the list. When a call to SuspendThread fails, the thread is assumed
  to be defunct and taken off the list.
 
 Seems reasonable.
 
 I guess I was originally thinking just profile all threads all the
 time, but I guess your method is more flexible.  That could
 probably be an option somehow after you polish this up.
 
 There is no other way to detect a dead thread. Nothing like atexit or
 anything I could find anyway in the win32 api, of which my experience
 is limited, and ordinary.

WaitForSingleObject and friends.  A terminated thread is in signalled
state.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                  mailto:[EMAIL PROTECTED]
Red Hat, Inc.




Re: Is multithreaded profiling on cygwin possible?

2003-10-15 Thread Brian Ford
On Wed, 15 Oct 2003, peter garrone wrote:

 Brian Ford wrote:
 On Tue, 14 Oct 2003, peter garrone wrote:
  A list of active threads is maintained. A thread calling moncontrol(1) gets
  put in the list. When a call to SuspendThread fails, the thread is assumed
  to be defunct and taken off the list.
 
 Seems reasonable.
 
 I guess I was originally thinking just profile all threads all the
 time, but I guess your method is more flexible.  That could
 probably be an option somehow after you polish this up.

 There is no other way to detect a dead thread. Nothing like atexit or
 anything I could find anyway in the win32 api, of which my experience
 is limited, and ordinary.

Sorry, I wasn't disputing that, but it sounded like Corinna had a good
suggestion here.

I just thought it would be nice if compiling with -pg automatically
profiled all threads.  It is really just a policy decision on other OSs.

Maybe it could be configurable based upon an environment variable or
something.  In that case, you would need to look inside Cygwin for a list
of threads and a notification method for new ones.  Or maybe just
arrange for all Cygwin created threads to call moncontrol(1) in that
case.  Maybe Corinna has a good suggestion here as well.

I'm starting to get out of my league/time commitment here.  I'm good at
making architecture suggestions, but I'm short on implementation details.
:)

 What is your tick definition?  A sampling interval?

 Yes.
 The amount of CPU usage corresponding to a loop of the profiling sampler.
 The amount represented by incrementing the sample data by one.

These two are the same.

 The amount of time represented by (1.0/PROF_HZ)

This is absolute, and I see below how this is different.

 How can a thread ever consume more than 100%?  I can see how two or more
 threads might on a multi CPU system.

 The profiling thread calls Sleep. That function is guaranteed to sleep for
 at least its argument. It can always sleep longer. So the actual
 profiling interval could be longer than the nominal profiling interval.
 In the meantime, a thread could have consumed more than one interval
 worth of cpu.

Ok, I see.  I think most implementations would just throw the partial tick
away.  It seems like trying to account for it gets way too messy,
especially since you only have one PC sample value.

 I'm lost in the random number generator application too.
 
 1) The rng algorithm itself

Sorry, I wasn't clear enough here too.  Thanks for the description, but I
really just wanted to know how you were applying it to partial ticks.

 2) Why use one?

 What else do you do if only a fraction of a tick
 is used? If a thread has consumed say 70% of a complete tick's worth of cpu,
 then roll the dice and 70% of the time assign a tick. I don't think I can
 explain it better. It all averages out in the end, and it's basically an
 estimate anyway.

The sample is usually considered the smallest granularity possible since
it is the only thing you have a PC for.  You are getting sub atomic timing
data, but only have atomic units of assignment.

If you're sure it helps, great!  To me, it seems like you're trying to use
more digits than are significant.  I'd just take the thread that consumed
the most CPU's PC and give it the tick, but I'm lazy.

 Concerning the DLL import function profile assignment solution,
[snip]
 However when I try things like the cygwin dll

You can't replace the cygwin1.dll's import library.  It has magic
redirections and such in it.  You could modify it, though.

Sorry, the rest is beyond what I have time to think about.

 Setup hangs at the last.

Obviously, this is a normal state for some.  Don't run it from explorer
and you should be ok.

 Any suggestions on this, by anyone at all, would be greatly appreciated.

This might be more suited for cygwin-developers.  Have you tried to jump
through those hoops yet (if you're interested, that is)?

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-14 Thread Brian Ford
Sorry for the delay.  I've been swamped, both with the setup issue and
work.  I haven't had a chance to look at the actual patch you sent yet.

On Tue, 14 Oct 2003, peter garrone wrote:

 A list of active threads is maintained. A thread calling moncontrol(1) gets
 put in the list. When a call to SuspendThread fails, the thread is assumed
 to be defunct and taken off the list.

Seems reasonable.

I guess I was originally thinking just profile all threads all the
time, but I guess your method is more flexible.  That could
probably be an option somehow after you polish this up.

 One of the fields in the thread is a counter corresponding to the sum of cpu
 returned by GetThreadTimes. This function has fields corresponding to
 kernel cpu and user cpu. The amount of time consumed by every thread is
 saved.

 Generally only one thread will have consumed CPU. However to be general,
 and in case the profiling thread is inadvertently delayed, all threads are
 considered.

 There is a partial tick problem. Suppose that a thread has consumed say
 155% of the cpu time corresponding to a tick. I would assign one tick
 and use a local random number generator to assign an extra tick on
 average 55% of the time.

I'm getting lost here.

What is your tick definition?  A sampling interval?

How can a thread ever consume more than 100%?  I can see how two or more
threads might on a multi CPU system.

I'm lost in the random number generator application too.

 I tried getting the program counter for all threads, but this was found
 not to work very well, consuming excessive cpu, on average 50 milliseconds.

I thought there might be an overhead issue.

 All the other calls were of the order of 1 microsecond. However getting
 the program counter only for any thread that used cpu according to
 GetThreadTimes appeared to take about 50 microseconds.
 Generally of course only one thread will have used CPU. The function
 GetThreadContext is used to obtain the PC.

That doesn't sound too bad.

 Brian Ford wrote:
  I tried using a backtrace method to map the sampling time onto
  DLL leaf functions (the import stubs) once, but it did not seem possible
  to perfect.  Also, that is not always what you want.

 I would be interested if you would expand on this. Do you mean looking at
 the stack to find the calling function?

Yes.  All the way back into the application address space, and then
munging the address to assign it to the import stub.  Calls into the
Microsoft DLL's don't have frame pointer info, so the backtrace is
difficult, if not impossible.  I did have some success, though.

  But, if you want this to be useful for the community at large, attacking
  the two points in the previous email directly would probably be more
  useful.  ie. Figure out a way to store the samples using a
  non-contiguous address space model, and modify gprof to load the symbol
  tables for the dependent DLLs (gdb does this to some extent).  Note that
  UNIX shared libraries have similar issues.  You may want to consult with
  [EMAIL PROTECTED] for a general solution since they own gprof.
 
 I am thinking of implementing a separate profil call so that it can be used
 simultaneously with -pg compilation and linking. Also a profile-dll call
 so that profiling of the space occupied by a dll would occur.

 My problem with profiling the entire dll address space is
 1) The necessity of recompiling dll's so that mapping and call counting
 is implemented

If you want call counts, there is never a way around that easily.
Mapping?

 2) The difficulty of doing anything with proprietary DLLs

Sure.

 3) The size and sparsity of the resulting gmon.out data file.

It really needs a different algorithm.  Maybe simply multiple gmon.outs in
one?

I have also seen just a recording algorithm without the hash that
stops when the buffer supplied is full.  That has limited use.

 So I thought I would try attacking the problem using the import libraries.
 Perhaps it is a silly idea, but if it could be made to work it avoids
 these problems.

I think it is a good idea.  I just don't understand or see the details
yet.  Too bad this method wouldn't help other shared library platforms,
though.  (No import libraries.)  Well, maybe it could.  You could probably
make the stub libs automatically and have them load the shared libs.  Not
sure of the details, again, though.

 If I can get it to work, I'll be back.

Please do (come back with your results, that is).  I'm definitely
interested.

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-13 Thread peter garrone
Brian Ford wrote:
[snipped prior discussion]
2.) Paraphrasing, the UNIX profil call (that gprof.c is currently using),
has a contiguous flat address space model.  It hashes address samples over
that space into a buffer.  The starting and ending address are
automatically pulled from the executable and are in its address space.
DLLs are mapped outside this space non-contiguously.

4.) Paraphrasing, gprof doesn't know how to find and read the symbol
tables from DLLs linked into the executable.  I'm not even sure if the
addresses are deterministic.


As you have suggested, I have tried setting up a list of threads in
profil.c and calling SuspendThread and GetThreadTimes to get timing
information for all threads. This gives a reasonably accurate profile
for non-dll user space using gprof.

My plan now is to create new dll import libraries so that, when these
dll functions are called, a flag is set in the thread structure list and
the profiling thread assigns cpu ticks against the small, statically
linked import functions. Hopefully gprof will then pick this up and
assign some sort of cpu usage and call frequency count to all the
functions in the import libraries.

If you can see any obvious pitfalls with this approach, I would be grateful.




Re: Is multithreaded profiling on cygwin possible?

2003-10-13 Thread Brian Ford
On Mon, 13 Oct 2003, peter garrone wrote:

 As you have suggested, I have tried setting up a list of
 threads in profil.c, calling SuspendThread,
 GetThreadTimes, to get timing information for all threads,
 and to create a reasonably accurate profile for non-dll user space
 using gprof.

Sounds good, although I'm not quite sure I understand the implementation.
What you really need to know is what thread was running just before the
sampling thread so that it can sample the correct thread's PC.  How are
you using GetThreadTimes for this?

 My plan now is to create new dll import libraries so that when these
 dll functions are called, a flag is set in the thread structure list,
 and the profiling thread assigns cpu ticks against the statically linked
 small import functions, so that hopefully gprof will pick it up and
 assign some sort of cpu usage and call frequency count to all the
 functions in the import libraries.

 If you can see any obvious pitfalls with this approach, I would be grateful.

Using a flag in a structure list sounds like you're asking for race
conditions with threads.

I tried using a backtrace method to map the sampling time onto
DLL leaf functions (the import stubs) once, but it did not seem possible
to perfect.  Also, that is not always what you want.

I don't have any good suggestions or pitfalls to point out.

But, if you want this to be useful for the community at large, attacking
the two points in the previous email directly would probably be more
useful, i.e., figure out a way to store the samples using a
non-contiguous address space model, and modify gprof to load the symbol
tables for the dependent DLLs (gdb does this to some extent).  Note that
UNIX shared libraries have similar issues.  You may want to consult with
[EMAIL PROTECTED] for a general solution since they own gprof.
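
One way to picture a non-contiguous address space model is a small table of independent ranges, each with its own counters. The sketch below is only an illustration: the names `sample_range` and `record_sample` and the power-of-two resolution are assumptions, not anything from profil.c or gprof, and it does not address how such ranges would be written to gmon.out or read back by gprof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One profiled region, e.g. the executable's text or one DLL.
   Hypothetical layout, for illustration only. */
typedef struct {
    uintptr_t base;    /* first address covered by this range */
    size_t    size;    /* number of bytes covered */
    unsigned  shift;   /* log2(bytes per counter), i.e. the resolution */
    uint32_t *counts;  /* at least ((size - 1) >> shift) + 1 counters */
} sample_range;

/* Record one PC sample. Returns the index of the range that took the
   sample, or -1 if no range covers pc (the sample is dropped). */
int record_sample(sample_range *ranges, size_t nranges, uintptr_t pc)
{
    for (size_t i = 0; i < nranges; i++) {
        sample_range *r = &ranges[i];
        assert(r->counts != NULL);
        if (pc >= r->base && pc - r->base < r->size) {
            r->counts[(pc - r->base) >> r->shift]++;
            return (int)i;
        }
    }
    return -1;
}
```

Because each range carries its own buffer, the gap between the executable and a DLL mapped far away costs nothing in storage.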

If you're just doing this for your own use, go for it.

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-13 Thread peter garrone

- Original Message -
From: Brian Ford [EMAIL PROTECTED]
Date: Mon, 13 Oct 2003 14:36:34 -0500 (CDT)
To: peter garrone [EMAIL PROTECTED]
Subject: Re: Is multithreaded profiling on cygwin possible?

 On Mon, 13 Oct 2003, peter garrone wrote:
 
 
 Sounds good, although I'm not quite sure I understand the implementation.
 What you really need to know is what thread was running just before the
 sampling thread so that it can sample the correct thread's PC.  How are
 you using GetThreadTimes for this?

A list of active threads is maintained. A thread calling moncontrol(1) gets
put in the list. When a call to SuspendThread fails, the thread is assumed
to be defunct and taken off the list.

One of the fields in each thread's list entry is a counter holding the total
cpu time returned by GetThreadTimes, which reports kernel cpu and user cpu
separately. The amount of time consumed by every thread is saved.

Generally only one thread will have consumed CPU. However to be general,
and in case the profiling thread is inadvertently delayed, all threads are
considered.
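
The bookkeeping described above can be sketched in portable C. Everything here is hypothetical: `cpu_query_fn` stands in for the SuspendThread plus GetThreadTimes pair (returning nonzero where SuspendThread would fail), `fake_thread` and `fake_query` exist only so the sketch can run without Win32, and locking between threads calling moncontrol(1) and the profiling thread is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROF_THREADS 64

/* Stand-in for SuspendThread + GetThreadTimes: stores kernel+user CPU
   time in *cpu_total and returns 0, or returns nonzero if the thread
   can no longer be suspended. */
typedef int (*cpu_query_fn)(void *handle, uint64_t *cpu_total);

typedef struct { void *handle; uint64_t last_cpu; } prof_thread;
typedef struct { prof_thread t[MAX_PROF_THREADS]; size_t n; } prof_thread_list;

/* Called on behalf of a thread that enables profiling. */
int prof_thread_add(prof_thread_list *l, void *handle, uint64_t cpu_now)
{
    if (l->n == MAX_PROF_THREADS)
        return -1;
    l->t[l->n].handle = handle;
    l->t[l->n].last_cpu = cpu_now;
    l->n++;
    return 0;
}

/* One sampling pass over all threads. deltas[i] receives the CPU time
   that surviving thread i consumed since the previous pass. Threads
   whose query fails are treated as defunct and dropped. */
size_t prof_thread_sample(prof_thread_list *l, cpu_query_fn query,
                          uint64_t *deltas)
{
    size_t live = 0;
    assert(deltas != NULL);
    for (size_t i = 0; i < l->n; i++) {
        uint64_t now;
        if (query(l->t[i].handle, &now) != 0)
            continue;                       /* defunct: not kept */
        deltas[live] = now - l->t[i].last_cpu;
        l->t[live].handle = l->t[i].handle;
        l->t[live].last_cpu = now;
        live++;
    }
    l->n = live;
    return live;
}

/* Test double: the "handle" points at a simulated thread. */
typedef struct { uint64_t cpu; int exited; } fake_thread;

int fake_query(void *handle, uint64_t *cpu_total)
{
    fake_thread *f = handle;
    if (f->exited)
        return -1;
    *cpu_total = f->cpu;
    return 0;
}
```

Considering every thread on each pass, rather than only the one expected to have run, keeps the totals right when the profiling thread itself is delayed.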

There is a partial tick problem. Suppose that a thread has consumed, say,
155% of the cpu time corresponding to a tick. I would assign one tick
and use a local random number generator to assign an extra tick, on
average, 55% of the time.
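
A minimal sketch of that partial tick scheme follows. The xorshift generator and the names `tick_rng` and `ticks_for_cpu` are assumptions made for illustration, since the message does not say which generator is used. Time values are in arbitrary integer units, and the tick length is assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Small generator private to the profiler, so the application's own
   rand() sequence is not disturbed. The state must be nonzero. */
typedef struct { uint32_t state; } tick_rng;

static uint32_t tick_rng_next(tick_rng *r)
{
    uint32_t x = r->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return r->state = x;
}

/* Convert consumed CPU time into whole ticks. The fractional remainder
   becomes one extra tick with probability remainder / tick_len, so the
   expected number of ticks equals cpu_used / tick_len. */
unsigned ticks_for_cpu(uint64_t cpu_used, uint64_t tick_len, tick_rng *r)
{
    assert(tick_len > 0 && r->state != 0);
    unsigned whole = (unsigned)(cpu_used / tick_len);
    uint64_t remainder = cpu_used % tick_len;
    uint64_t draw = (uint64_t)tick_rng_next(r) % tick_len; /* [0, tick_len) */
    return whole + (draw < remainder ? 1u : 0u);
}
```

With `cpu_used` at 155 and `tick_len` at 100, each call returns 1 or 2, and the average over many calls approaches 1.55.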

I tried getting the program counter for all threads, but this did not work
very well: it consumed excessive cpu, on average 50 milliseconds. All the
other calls were of the order of 1 microsecond. However, getting the program
counter only for those threads that used cpu according to GetThreadTimes
appeared to take about 50 microseconds. Generally, of course, only one thread
will have used cpu. The function GetThreadContext is used to obtain the PC.
 
 I tried using a backtrace method to map the sampling time onto
 DLL leaf functions (the import stubs) once, but it did not seem possible
 to perfect.  Also, that is not always what you want.

I would be interested if you would expand on this. Do you mean looking at
the stack to find the calling function?

 But, if you want this to be useful for the community at large, attacking
 the two points in the previous email directly would probably be more
 useful, i.e., figure out a way to store the samples using a
 non-contiguous address space model, and modify gprof to load the symbol
 tables for the dependent DLLs (gdb does this to some extent).  Note that
 UNIX shared libraries have similar issues.  You may want to consult with
 [EMAIL PROTECTED] for a general solution since they own gprof.
 

I am thinking of implementing a separate profil call so that it can be used
simultaneously with -pg compilation and linking. Also a profile-dll call
so that profiling of the space occupied by a dll would occur.

My problem with profiling the entire dll address space is 
1) The necessity of recompiling dll's so that mapping and call counting is implemented
2) The difficulty of doing anything with proprietary dll's
3) The size and sparsity of the resulting gmon.out data file.

So I thought I would try attacking the problem using the import libraries.
Perhaps it is a silly idea, but if it could be made to work it avoids these problems.
If I can get it to work, I'll be back.

Thanks



Re: Is multithreaded profiling on cygwin possible?

2003-10-07 Thread Brian Ford
peter garrone wrote:

Brian Ford wrote:

peter garrone wrote:

Sorry for the delay, or the repeat information, my original reply is
lost.

No problem.

If I profile my multi-threaded application, it appears that only the
main thread is profiled.

Currently, yes.

Actually, I think I was only partially correct.
Time for the main thread is accumulated, but function calls
are counted for all threads. This creates misleading data.

True.  I primarily just use PC sampling and not call counts, so I forgot
about that part.

You can, however, profile other threads one at a time if you use
the gprof API's manually, called from the thread you want to profile.  I
have done this, but it has been too long for me to give you specific
instructions.  Have a look at profile.c, profile.[ch], gmon.[ch] in the
cygwin sources to see how it's done.

Thanks very much, this advice is a great start.
I didn't see any way in the mcount function (winsup/cygwin/mcount.c)
to specify a particular thread. I did see the possibility of calling
moncontrol(1) to enable time accumulation for a particular thread,
and searching dejanews, noticed that this is a
recognised approach to multithreaded profiling.

Well, I might be able to devise a way to count only one thread's calls,
but it would be horrifically slow.

PTC
While you're there, it should be fairly trivial to create a patch that
at least loops through all Cygwin created pthreads in the sampler.  I
don't know if that kind of flat profile is what you wanted, though.

Sometimes per-thread profiling is useful, but a flat profile is what
I want for now. Not so much for optimisation, but porting. If a thread
is taking x% cpu on system 1 and y% cpu on system 2, then per-thread
profiling is useful. If the whole application is running much too slow,
then the flat profile is useful. I haven't figured out how to get
per-thread cpu on cygwin yet anyway.

Flat profiles are usually what I want also.  For per thread cpu see:


[snipped dll discussion]
You commented that dll code is difficult to profile. Would you kindly
send me a few references to this, or keyword sets, my searching is blank.
I am aware of the profiling cygwin information, and assume you mean
extra to this.

Points 2 and 4 here are what I was referring to (note that they are
applicable to all DLLs, not just cygwin1.dll).

http://sources.redhat.com/ml/cygwin-patches/2002-q2/msg00206.html

I couldn't seem to dig up any more detail easily.

2.) Paraphrasing, the UNIX profil call (that gprof.c is currently using),
has a contiguous flat address space model.  It hashes address samples over
that space into a buffer.  The starting and ending address are
automatically pulled from the executable and are in its address space.
DLLs are mapped outside this space non-contiguously.

4.) Paraphrasing, gprof doesn't know how to find and read the symbol
tables from DLLs linked into the executable.  I'm not even sure if the
addresses are deterministic.
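
To make point 2 concrete, here is a rough sketch of a profil-style histogram over a single contiguous range. The struct, the function names, and the 16.16 fixed-point scale convention (0x10000 meaning one counter per two bytes of text) are assumptions of this sketch, not code taken from Cygwin's profil.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat profile over ONE contiguous address range. */
typedef struct {
    uintptr_t lowpc;      /* first address covered */
    uintptr_t highpc;     /* one past the last address covered */
    uint32_t  scale;      /* 16.16 fixed point; 0x10000 = 2 bytes per counter */
    uint16_t *counters;   /* histogram buffer */
    size_t    ncounters;  /* number of entries in counters */
} flat_profile;

/* Map a sampled PC to its counter index, or -1 if it is outside the range. */
long flat_bucket(const flat_profile *p, uintptr_t pc)
{
    assert(p->lowpc <= p->highpc);
    if (pc < p->lowpc || pc >= p->highpc)
        return -1;
    uint64_t idx = ((uint64_t)(pc - p->lowpc) / 2) * p->scale / 0x10000;
    return idx < p->ncounters ? (long)idx : -1;
}

/* Count one sample. Returns 1 if it was recorded, 0 if it was dropped. */
int flat_sample(flat_profile *p, uintptr_t pc)
{
    long idx = flat_bucket(p, pc);
    if (idx < 0)
        return 0;
    if (p->counters[idx] < UINT16_MAX)   /* saturate instead of wrapping */
        p->counters[idx]++;
    return 1;
}
```

A PC inside a DLL falls outside lowpc..highpc and is simply dropped. Stretching the range to reach a DLL mapped far from the executable would need counters for the whole gap, which is the size and sparsity concern raised elsewhere in this thread.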

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-07 Thread Brian Ford
On Tue, 7 Oct 2003, Brian Ford wrote:

 Flat profiles are usually what I want also.  For per thread cpu see:

Sorry, I forgot the reference.  Here it is:

http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/pstat-o.asp
http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/qslice-o.asp

BTW, you might also look at SSP (the single step profiler), although it
too is horribly slow.

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Re: Is multithreaded profiling on cygwin possible?

2003-10-06 Thread peter garrone
Sorry for the delay, or the repeat information, my original reply is lost.

Brian Ford wrote:
peter garrone wrote:

If I profile my multi-threaded application, it appears that only the main
thread is profiled.

Currently, yes.

Actually, I think I was only partially correct. 
Time for the main thread is accumulated, but function calls
are counted for all threads. This creates misleading data.


You can, however, profile other threads one at a time if you use
the gprof API's manually, called from the thread you want to profile.  I
have done this, but it has been too long for me to give you specific
instructions.  Have a look at profile.c, profile.[ch], gmon.[ch] in the
cygwin sources to see how it's done.

Thanks very much, this advice is a great start.
I didn't see any way in the mcount function (winsup/cygwin/mcount.c)
to specify a particular thread. I did see the possibility of calling
moncontrol(1) to enable time accumulation for a particular thread,
and searching dejanews, noticed that this is a
recognised approach to multithreaded profiling.


PTC
While you're there, it should be fairly trivial to create a patch that
at least loops through all Cygwin created pthreads in the sampler.  I
don't know if that kind of flat profile is what you wanted, though.

Sometimes per-thread profiling is useful, but a flat profile is what
I want for now. Not so much for optimisation, but porting. If a thread
is taking x% cpu on system 1 and y% cpu on system 2, then per-thread
profiling is useful. If the whole application is running much too slow,
then the flat profile is useful. I haven't figured out how to get per-thread
cpu on cygwin yet anyway.

[snipped dll discussion]
You commented that dll code is difficult to profile. Would you kindly
send me a few references to this, or keyword sets, my searching is blank.
I am aware of the profiling cygwin information, and assume you mean
extra to this.

On linux, it is possible to save and set the virtual timer upon creation
of each thread, and thereby get a decent profile.
However the virtual timer is unavailable on cygwin, and I would imagine
that this approach is incorrect, due to differing thread models.

I've never profiled on Linux and I don't know anything about the virtual
timer you are referring to.  On Solaris, I get a nice flat profile of all
threads combined, like the implementation I suggested above.  The same
shared library concerns exist there, but Solaris is good about providing
static profile enabled libs.

Sorry, I was incorrect. I meant that by saving the profiling timer ITIMER_PROF
before thread creation and resetting it afterwards, in the new thread, cpu
profiling becomes possible.
Refer to http://sam.zoy.org/writings/programming/gprof.html
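
A minimal sketch of that save-and-reset idea, assuming a thread library in which a new thread does not inherit the creator's ITIMER_PROF setting. The names `prof_trampoline`, `prof_thread_entry`, `profiled_pthread_create` and `demo_worker` are hypothetical; only getitimer, setitimer and the pthread calls are real APIs. This is not the code from the referenced page.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>

/* Carries the caller's thread function plus the saved timer setting. */
typedef struct {
    void *(*start)(void *);   /* the real thread function */
    void *arg;                /* its argument */
    struct itimerval prof;    /* ITIMER_PROF as seen by the creator */
} prof_trampoline;

static void *prof_thread_entry(void *p)
{
    prof_trampoline t = *(prof_trampoline *)p;
    free(p);
    /* Re-arm the profiling timer from inside the new thread. A failure
       only means this thread goes unsampled, so it is ignored here. */
    (void)setitimer(ITIMER_PROF, &t.prof, NULL);
    return t.start(t.arg);
}

/* Same contract as pthread_create, but the new thread starts with the
   creator's ITIMER_PROF setting. */
int profiled_pthread_create(pthread_t *tid, const pthread_attr_t *attr,
                            void *(*start)(void *), void *arg)
{
    assert(start != NULL);
    prof_trampoline *t = malloc(sizeof *t);
    if (t == NULL)
        return ENOMEM;
    t->start = start;
    t->arg = arg;
    if (getitimer(ITIMER_PROF, &t->prof) != 0) {
        free(t);
        return errno;
    }
    int rc = pthread_create(tid, attr, prof_thread_entry, t);
    if (rc != 0)
        free(t);
    return rc;
}

/* Trivial worker for trying the wrapper out: returns its argument. */
void *demo_worker(void *arg)
{
    return arg;
}
```

Calling `profiled_pthread_create(&tid, NULL, demo_worker, &x)` in place of `pthread_create` is the only change needed at a call site. As noted in the message above, this does not carry over to cygwin as it stands.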


Let me know if you want to discuss patch ideas.  I used to have a few, but
no priority time to work on them. :(

I am afraid that this email is the sum of my current knowledge about cygwin
profiling. But if I find out anything else, I will post it.





Re: Is multithreaded profiling on cygwin possible?

2003-10-02 Thread Brian Ford
peter garrone wrote:

If I profile my multi-threaded application, it appears that only the main
thread is profiled.

Currently, yes.

You can, however, profile other threads one at a time if you use
the gprof API's manually, called from the thread you want to profile.  I
have done this, but it has been too long for me to give you specific
instructions.  Have a look at profile.c, profile.[ch], gmon.[ch] in the
cygwin sources to see how it's done.

PTC
While you're there, it should be fairly trivial to create a patch that
at least loops through all Cygwin created pthreads in the sampler.  I
don't know if that kind of flat profile is what you wanted, though.

BTW, code in DLL's is difficult to profile because of the monolithic
segment view of the profiling hash.  Check the archives for a discussion
on this and possible work arounds if you are interested.

On linux, it is possible to save and set the virtual timer upon creation
of each thread, and thereby get a decent profile.
However the virtual timer is unavailable on cygwin, and I would imagine
that this approach is incorrect, due to differing thread models.

I've never profiled on Linux and I don't know anything about the virtual
timer you are referring to.  On Solaris, I get a nice flat profile of all
threads combined, like the implementation I suggested above.  The same
shared library concerns exist there, but Solaris is good about providing
static profile enabled libs.

Let me know if you want to discuss patch ideas.  I used to have a few, but
no priority time to work on them. :(

-- 
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax:   314-551-8444




Is multithreaded profiling on cygwin possible?

2003-10-01 Thread peter garrone
Firstly, apologies for repeated postings to the gmane cygwin newsgroup; I
thought they were bounced.

If I profile my multi-threaded application, it appears that only the main
thread is profiled.

On linux, it is possible to save and set the virtual timer upon creation
of each thread, and thereby get a decent profile.
However the virtual timer is unavailable on cygwin, and I would imagine
that this approach is incorrect, due to differing thread models.