Re: Is multithreaded profiling on cygwin possible?
> Even if no one else comments, I really appreciate all this work you're doing! Also, thanks for continuing to send me the updated patches. I wish I had more time to look over them in detail right now. I'll try to do that soon.
>
> I assume it is OK to give an open invitation for anyone else who would like to give them a whirl?
>
> BTW, if you are interested in contributing this, please take a look at the "Before you get started" section of http://cygwin.com/contrib.html since the assignment process can take some time.
>
> Again, great work from my point of view!
>
> --
> Brian Ford
> Senior Realtime Software Engineer
> VITAL - Visual Simulation Systems
> FlightSafety International
> Phone: 314-551-8460
> Fax: 314-551-8444

Thanks Brian.

This message, I believe, gives the code upon which Brian's endorsement is based. Regarding the IP issues, there is no problem with my posting the code to the group, or with anyone using it, from my employer's point of view. No warranty is implied, etc. I have forwarded the small document that our legal people need to sign. Until then I would imagine that the code cannot be part of cygwin.

This message contains the code for two files, profil.c and gmraw.c.

To use this code, replace winsup/cygwin/profil.c with the file profil.c here. Recompile cygwin. Either update the DLL, or directly link in the profil.o object file.

- Compile and link with -pg.

- If using multi-threaded code, each new thread needs to call moncontrol(1) upon creation. The main thread calls it automatically.

- To profile DLLs and general memory ranges, set the environment variable PROFILE_RANGE. This variable should consist of one or more colon-separated entries. Each entry consists of a name, followed by optional comma-separated fields representing the scale, offset, and size, in hex. If the name is the name of a DLL, then the entry is a DLL entry, else a general memory range. The scale field determines the profiling resolution.
  Divide 0x1 by the scale value to give the profiling resolution in bytes. For a DLL, the default scale is 1, giving a resolution of 1 byte. For a memory range, the default scale is such as to give 256 incrementing counters, or a scale of 1 if the range is too great for that.

  If the name is the name of a DLL, then the offset is with respect to the loaded DLL, otherwise it is a fixed address. The default offset value is zero.

  The size field defines the size of the address range to be profiled. If the range covers a DLL, then the default value is the size of the DLL less the offset. If the range is a general memory range, the default size is 0x8000.

  I use the following to profile all the DLLs my application links to, the entire memory range from 0 to 0x8000 (called general), and a small area where my application is spending a lot of CPU (called busy). The list of DLLs was obtained using cygcheck.

    export PROFILE_RANGE='general:busy,1,7ffe,1:cygX11-6.dll:cygcygipc-2.dll:cygwin1.dll:KERNEL32.dll:ntdll.dll:GDI32.dll:USER32.dll:ADVAPI32.dll:RPCRT4.dll:GLU32.DLL:msvcrt.dll:OPENGL32.dll:DDRAW.dll:DCIMAN32.dll:glut32.dll:WINMM.dll'

  Alternatively, the following command causes only the cygwin DLL to be profiled.

    export PROFILE_RANGE='cygwin1.dll'

- Execute your program. Upon program termination, besides gmon.out, a file in gmon.out format will be output for each range, with .gmonout appended to the name of each range (cygwin1.dll.gmonout, etc.).

  To profile the cygwin DLL, provided you are using an unstripped DLL, the command

    gprof /bin/cygwin1.dll cygwin1.dll.gmonout

  produces a flat profile. To see a memory range, or for DLLs with no symbolic information, a utility called gmraw is provided. Compile this with

    gcc gmraw.c -o gmraw

  The command

    gmraw -f general.gmonout

  for example provides information about that particular range.

- WRT the interface.
  Calling moncontrol(0) at any point should terminate the profiling thread, and profiling of the memory ranges, but the data should still be saved upon program termination. It should be possible to use the profil function provided the -pg compilation/linkage option is not used. Each thread should call profil with identical parameters. I have not tested this much.

- File profil.c -

/* profil.c -- win32 profil.c equivalent

   Copyright 1998, 1999, 2000, 2001 Red Hat, Inc.

   This file is part of Cygwin.

   This software is a copyrighted work licensed under the terms of the
   Cygwin license. Please consult the file CYGWIN_LICENSE for details. */

#include <windows.h>
#include <psapi.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <math.h>
#include <profil.h>
#include <gmon.h>
#include <assert.h>

#define SLEEPTIME (1000 /
Re: Is multithreaded profiling on cygwin possible
Brian Ford wrote:
> On Fri, 17 Oct 2003, peter garrone wrote:
>
> > I have dropped the dll import library concept.
>
> Probably good. Although, it would still be neat to figure out a way to trace back to the application leaf functions. I guess that will be an exercise for later.

Unfortunately I don't have any problems that this approach would address. I am keeping a copy of my work so far, and would send it to anyone, but I have no current plans to continue with it.

> > My current approach is to keep track of the time accumulated by each thread, and when it has exceeded the amount represented by a profiling period, assign the tick to the current PC, and subtract that amount of time from the running total for the thread. So it always adds up, anyway.
>
> I was going to suggest this when we were talking about partial ticks, but I was worried about charging time to the wrong PC. Looking back, this still fits in fine with the PC sampling philosophy.

Yes. It does get a bit philosophical. The program being profiled must be regarded as a random process in itself, so adding an RNG only confuses the issue.

> > I still have it so that each thread that is to be profiled calls moncontrol(1). Also, an application compiled and linked without -pg could always use the profil call in a similar way. Each thread would call profil with identical parameters.
>
> It should be simple to add an all threads mode later if we want, so this is fine. Does the main thread still call moncontrol(1) when compiled with -pg? I would think this would be required.

Yes. It links in a special crt program that calls moncontrol(1). On linux, it is necessary to call getitimer/setitimer for each thread as discussed previously, and with this code on cygwin moncontrol(1). But this allows a lot of flexibility.

> > To do the DLLs, I have added a linked list of profiling ranges to profil.c. These ranges are specified using an environment variable. The ranges may be DLL specific, or general memory ranges. There is a separate data file output upon program termination for each range, in addition to gmon.out.
>
> The linked list sounds reasonable. I guess if there were too many threads, something less linear would be better. Does anyone have a suggestion about how to find these address ranges automatically (at least for non dynamically loaded DLLs)? I assume the gmon.out file contains just the original program proper memory range?

Yes, from text to etext, the static linked range. This is as per the current gmon.c file, which I am not proposing to change. If call counts were required, it is usually possible to link a cygwin dll function statically, and that would produce the call counts.

> I really wish we could get someone on the binutils list interested in helping to extend the gmon.out file format to contain multiple hashes. We would still need a method to map the ranges to the DLLs. Determining the DLLs should be easy unless they were dynamically loaded. You really should send at least a ping over there, but you're doing all the work, so I'll shut up now.

As it pans out, it doesn't seem so necessary to me, because gprof profiles the unstripped DLLs as they are. gprof does one thing well, profiling a BFD, which is sort of in line with the UNIX philosophy. All the profiling data exist as separate files for the various profiling ranges. This way solves my current profiling problems and gives me fine control. Why profile every DLL when you may only be interested in one? But it is absolutely essential to me to be able to profile selected memory ranges.

> > If a dll has not been stripped, gprof will use the data file and the dll to output a flat profile, but without call counts though. (At least this works with cygwin1.dll)
>
> That sounds like it would be *really* useful to cygwin developers! This concept was discussed before in the references I gave you, but never pushed this far.

I must say that it was unforeseen by myself, and very easy, accidental almost. I had to save the profiling data for raw address ranges in some format, gmon.out being the logical one. That was all the work necessary. I wondered if gprof would work with that data, and it did. I hope this is standard for gprof, as in it reads any BFD with symbolic information, and not some kludge. I am not expert enough to know.

> > I have written a simple utility to summarise the information in these data files, giving flat addresses and CPU usage.
>
> This sounds like a useful inclusion. Again, it would be more functional if we could feed all this into gprof and get partial call graphs.

Sure, maybe a special flag to gprof saying to generate a flat raw output. The code is pretty simple.

> Even if no one else comments, I really appreciate all this work you're doing! Also, thanks for continuing to send me the updated patches. I wish I had more time to look over them in detail right now. I'll try to do that soon. I assume it is OK to give an open invitation for anyone else who would
Re: Is multithreaded profiling on cygwin possible?
Hi Brian

Thanks very much for your comments. I think I have changed my approach so that it is broadly similar to your suggestions, but may differ in some details.

I have dropped the RNG. I don't think it is necessary or warranted.

I have dropped the dll import library concept.

I would agree that Corinna's suggestion about WaitForSingleObject is probably better, though I haven't yet done it that way.

My current approach is to keep track of the time accumulated by each thread, and when it has exceeded the amount represented by a profiling period, assign the tick to the current PC, and subtract that amount of time from the running total for the thread. So it always adds up, anyway.

I still have it so that each thread that is to be profiled calls moncontrol(1). Also, an application compiled and linked without -pg could always use the profil call in a similar way. Each thread would call profil with identical parameters.

To do the DLLs, I have added a linked list of profiling ranges to profil.c. These ranges are specified using an environment variable. The ranges may be DLL specific, or general memory ranges. There is a separate data file output upon program termination for each range, in addition to gmon.out.

If a dll has not been stripped, gprof will use the data file and the dll to output a flat profile, but without call counts though. (At least this works with cygwin1.dll)

I have written a simple utility to summarise the information in these data files, giving flat addresses and CPU usage.

Peter Garrone

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
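
The environment variable mentioned above is the PROFILE_RANGE variable documented in the usage notes of the later posting in this thread: colon-separated entries, each a name followed by optional comma-separated hex fields for scale, offset, and size. A minimal sketch of parsing that syntax might look like the following. It is not the parser from the posted profil.c; the names range_spec, parse_entry, and parse_profile_range are hypothetical, a zero field simply means "not given, use the default", and deciding whether a name refers to a DLL is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one parsed entry; not the patch's own struct. */
struct range_spec {
    char name[64];
    unsigned long scale, offset, size; /* 0 means "use the default" */
    int nfields;                       /* number of numeric fields supplied */
};

/* Parse one entry such as "busy,1,7ffe,1" (numeric fields are hex).
   Returns 0 on success, -1 if the entry is empty or too long. */
static int parse_entry(const char *entry, size_t len, struct range_spec *out)
{
    char buf[128];
    unsigned long *slot[3];
    char *p, *comma;
    int i;

    assert(entry != NULL && out != NULL);
    if (len == 0 || len >= sizeof buf)
        return -1;
    memcpy(buf, entry, len);
    buf[len] = '\0';

    memset(out, 0, sizeof *out);
    slot[0] = &out->scale;
    slot[1] = &out->offset;
    slot[2] = &out->size;

    /* The name runs up to the first comma (or the end of the entry). */
    comma = strchr(buf, ',');
    if (comma != NULL)
        *comma = '\0';
    if (buf[0] == '\0' || strlen(buf) >= sizeof out->name)
        return -1;
    strcpy(out->name, buf);

    /* Up to three optional hex fields: scale, offset, size. */
    for (i = 0; comma != NULL && i < 3; i++) {
        p = comma + 1;
        comma = strchr(p, ',');
        if (comma != NULL)
            *comma = '\0';
        if (*p != '\0')
            *slot[i] = strtoul(p, NULL, 16);
        out->nfields = i + 1;
    }
    return 0;
}

/* Split a PROFILE_RANGE value on ':' and parse each entry.
   Returns the number of entries stored in specs[]. */
static int parse_profile_range(const char *value, struct range_spec *specs,
                               int max)
{
    int n = 0;

    while (*value != '\0' && n < max) {
        size_t len = strcspn(value, ":");
        if (len > 0 && parse_entry(value, len, &specs[n]) == 0)
            n++;
        value += len;
        if (*value == ':')
            value++;
    }
    return n;
}
```

With the value 'general:busy,1,7ffe,1:cygwin1.dll', this sketch yields three entries; only "busy" carries explicit fields (scale 1, offset 0x7ffe, size 1).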
Re: Is multithreaded profiling on cygwin possible?
On Fri, 17 Oct 2003, peter garrone wrote:

> Hi Brian

Hi Peter. Seems like a private conversation, doesn't it? :)

> Thanks very much for your comments.

You're welcome.

> I think I have changed my approach so that it is broadly similar to your suggestions, but may differ in some details. I have dropped the RNG. I don't think it is necessary or warranted.

I agree, especially in light of your new approach.

> I have dropped the dll import library concept.

Probably good. Although, it would still be neat to figure out a way to trace back to the application leaf functions. I guess that will be an exercise for later.

> I would agree that Corinna's suggestion about WaitForSingleObject is probably better, though I haven't yet done it that way. My current approach is to keep track of the time accumulated by each thread, and when it has exceeded the amount represented by a profiling period, assign the tick to the current PC, and subtract that amount of time from the running total for the thread. So it always adds up, anyway.

I was going to suggest this when we were talking about partial ticks, but I was worried about charging time to the wrong PC. Looking back, this still fits in fine with the PC sampling philosophy.

> I still have it so that each thread that is to be profiled calls moncontrol(1). Also, an application compiled and linked without -pg could always use the profil call in a similar way. Each thread would call profil with identical parameters.

It should be simple to add an all threads mode later if we want, so this is fine. Does the main thread still call moncontrol(1) when compiled with -pg? I would think this would be required.

> To do the DLLs, I have added a linked list of profiling ranges to profil.c. These ranges are specified using an environment variable. The ranges may be DLL specific, or general memory ranges. There is a separate data file output upon program termination for each range, in addition to gmon.out.

The linked list sounds reasonable. I guess if there were too many threads, something less linear would be better. Does anyone have a suggestion about how to find these address ranges automatically (at least for non dynamically loaded DLLs)? I assume the gmon.out file contains just the original program proper memory range?

I really wish we could get someone on the binutils list interested in helping to extend the gmon.out file format to contain multiple hashes. We would still need a method to map the ranges to the DLLs. Determining the DLLs should be easy unless they were dynamically loaded. You really should send at least a ping over there, but you're doing all the work, so I'll shut up now.

> If a dll has not been stripped, gprof will use the data file and the dll to output a flat profile, but without call counts though. (At least this works with cygwin1.dll)

That sounds like it would be *really* useful to cygwin developers! This concept was discussed before in the references I gave you, but never pushed this far.

> I have written a simple utility to summarise the information in these data files, giving flat addresses and CPU usage.

This sounds like a useful inclusion. Again, it would be more functional if we could feed all this into gprof and get partial call graphs.

Even if no one else comments, I really appreciate all this work you're doing! Also, thanks for continuing to send me the updated patches. I wish I had more time to look over them in detail right now. I'll try to do that soon.

I assume it is OK to give an open invitation for anyone else who would like to give them a whirl?

BTW, if you are interested in contributing this, please take a look at the "Before you get started" section of http://cygwin.com/contrib.html since the assignment process can take some time.

Again, great work from my point of view!
--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax: 314-551-8444
Re: Is multithreaded profiling on cygwin possible?
Brian Ford wrote:
> On Tue, 14 Oct 2003, peter garrone wrote:
>
> > A list of active threads is maintained. A thread calling moncontrol(1) gets put in the list. When a call to SuspendThread fails, the thread is assumed to be defunct and taken off the list.
>
> Seems reasonable. I guess I was originally thinking just profile all threads all the time, but I guess your method is more flexible. That could probably be an option somehow after you polish this up.

There is no other way to detect a dead thread. Nothing like atexit or anything I could find anyway in the win32 api, of which my experience is limited, and ordinary.

> > One of the fields in the thread is a counter corresponding to the sum of cpu returned by GetThreadTimes. This function has fields corresponding to kernel cpu and user cpu. The amount of time consumed by every thread is saved. Generally only one thread will have consumed CPU. However to be general, and in case the profiling thread is inadvertently delayed, all threads are considered. There is a partial tick problem. Suppose that a thread has consumed say 155% of the cpu time corresponding to a tick. I would assign one tick and use a local random number generator to assign an extra tick on average 55% of the time.
>
> I'm getting lost here. What is your tick definition? A sampling interval?

Yes. The amount of CPU usage corresponding to a loop of the profiling sampler. The amount represented by incrementing the sample data by one. The amount of time represented by (1.0/PROF_HZ)

> How can a thread ever consume more than 100%? I can see how two or more threads might on a multi CPU system.

The profiling thread calls Sleep. That function is guaranteed to sleep for at least its argument. It can always sleep longer. So the actual profiling interval could be longer than the nominal profiling interval. In the meantime, a thread could have consumed more than one interval's worth of cpu.

> I'm lost in the random number generator application too.

1) The rng algorithm itself

It's a linear congruential rng algorithm. Given a 31 bit seed, do a 64 bit multiply by 69069, add 1, and mask off the lower 31 bits. I don't think the application calls for anything special. If it bothers you, call rand(). I don't have references for this, but I'm pretty sure it has been academically described and extensively tested. As an RNG, it does have severe limitations but it is very fast.

2) Why use one?

What else do you do if only a fraction of a tick is used? If a thread has consumed say 70% of a complete tick's worth of cpu, then roll the dice and 70% of the time assign a tick. I don't think I can explain it better. It all averages out in the end, and it's basically an estimate anyway.

> > I tried getting the program counter for all threads, but this was found not to work very well, consuming excessive cpu, on average 50 milliseconds.
>
> I thought there might be an overhead issue.

Definitely. This was causing the scheduler to assign very long profile thread sleeptimes, much longer than nominal, and is the reason for all the probabilistic stuff. That scheduler is a bit sus.

> Please do (come back with your results, that is). I'm definitely interested.

Concerning the DLL import function profile assignment solution, I can get it to work, but only for toy situations unfortunately. If I create a simple test DLL, then it apparently works OK, giving about 70 nanoseconds per DLL function call, 7 seconds for 100 million, compared with about 20 nanoseconds for a simple unoptimised profiled function call, and some nice looking gprof output indeed.

However when I try things like the cygwin dll or kernel32, I just get a segmentation violation on startup. I think there are some issues with the import library format which I don't understand. I have changed the little bit of code found in dlltool.c that goes in each import library function to cause a call to the profiler code, which is always at a fixed address since I can't get ld to accept an extra relocation. Also I might have some assembler issues. I am currently trashing eax and edx in the profiler call, which C functions appear to do.

My cygwin installation is behaving rather oddly at the moment so it could be something to do with that. I have a snapshot, and I have to have my own version of bash, compiled with that snapshot unfortunately. gdb always has an application initialization failure on startup. Setup hangs at the last. I think I'll try a new snapshot.

Any suggestions on this, by anyone at all, would be greatly appreciated.

Cheers,
Peter
Re: Is multithreaded profiling on cygwin possible?
On Wed, Oct 15, 2003 at 05:57:31PM +0800, peter garrone wrote:
> Brian Ford wrote:
> > On Tue, 14 Oct 2003, peter garrone wrote:
> >
> > > A list of active threads is maintained. A thread calling moncontrol(1) gets put in the list. When a call to SuspendThread fails, the thread is assumed to be defunct and taken off the list.
> >
> > Seems reasonable. I guess I was originally thinking just profile all threads all the time, but I guess your method is more flexible. That could probably be an option somehow after you polish this up.
>
> There is no other way to detect a dead thread. Nothing like atexit or anything I could find anyway in the win32 api, of which my experience is limited, and ordinary.

WaitForSingleObject and friends. A terminated thread is in signalled state.

Corinna

--
Corinna Vinschen
Please, send mails regarding Cygwin to
Cygwin Developer mailto:[EMAIL PROTECTED]
Red Hat, Inc.
Re: Is multithreaded profiling on cygwin possible?
On Wed, 15 Oct 2003, peter garrone wrote:

> Brian Ford wrote:
> > On Tue, 14 Oct 2003, peter garrone wrote:
> >
> > > A list of active threads is maintained. A thread calling moncontrol(1) gets put in the list. When a call to SuspendThread fails, the thread is assumed to be defunct and taken off the list.
> >
> > Seems reasonable. I guess I was originally thinking just profile all threads all the time, but I guess your method is more flexible. That could probably be an option somehow after you polish this up.
>
> There is no other way to detect a dead thread. Nothing like atexit or anything I could find anyway in the win32 api, of which my experience is limited, and ordinary.

Sorry, I wasn't disputing that, but it sounded like Corinna had a good suggestion here.

I just thought it would be nice if compiling with -pg automatically profiled all threads. It is really just a policy decision on other OSs. Maybe it could be configurable based upon an environment variable or something.

In that case, you would need to look inside Cygwin for a list of threads and a notification method for new ones. Or maybe just arrange for all Cygwin created threads to call moncontrol(1) in that case. Maybe Corinna has a good suggestion here as well.

I'm starting to get out of my league/time commitment here. I'm good at making architecture suggestions, but I'm short on implementation details. :)

> > What is your tick definition? A sampling interval?
>
> Yes. The amount of CPU usage corresponding to a loop of the profiling sampler. The amount represented by incrementing the sample data by one.

These two are the same.

> The amount of time represented by (1.0/PROF_HZ)

This is absolute, and I see below how this is different.

> > How can a thread ever consume more than 100%? I can see how two or more threads might on a multi CPU system.
>
> The profiling thread calls Sleep. That function is guaranteed to sleep for at least its argument. It can always sleep longer. So the actual profiling interval could be longer than the nominal profiling interval. In the meantime, a thread could have consumed more than one interval's worth of cpu.

Ok, I see. I think most implementations would just throw the partial tick away. It seems like trying to account for it gets way too messy, especially since you only have one PC sample value.

> > I'm lost in the random number generator application too.
>
> 1) The rng algorithm itself

Sorry, I wasn't clear enough here too. Thanks for the description, but I really just wanted to know how you were applying it to partial ticks.

> 2) Why use one?
> What else do you do if only a fraction of a tick is used? If a thread has consumed say 70% of a complete tick's worth of cpu, then roll the dice and 70% of the time assign a tick. I don't think I can explain it better. It all averages out in the end, and it's basically an estimate anyway.

The sample is usually considered the smallest granularity possible since it is the only thing you have a PC for. You are getting sub atomic timing data, but only have atomic units of assignment.

If you're sure it helps, great! To me, it seems like you're trying to use more digits than are significant. I'd just take the PC of the thread that consumed the most CPU and give it the tick, but I'm lazy.

> Concerning the DLL import function profile assignment solution, [snip] However when I try things like the cygwin dll

You can't replace the cygwin1.dll's import library. It has magic redirections and such in it. You could modify it, though. Sorry, the rest is beyond what I have time to think about.

> Setup hangs at the last.

Obviously, this is a normal state for some. Don't run it from explorer and you should be ok.

> Any suggestions on this, by anyone at all, would be greatly appreciated.

This might be more suited for cygwin-developers. Have you tried to jump through those hoops yet (if you're interested, that is)?

--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax: 314-551-8444
Re: Is multithreaded profiling on cygwin possible?
Sorry for the delay. I've been swamped, both with the setup issue and work. I haven't had a chance to look at the actual patch you sent yet.

On Tue, 14 Oct 2003, peter garrone wrote:

> A list of active threads is maintained. A thread calling moncontrol(1) gets put in the list. When a call to SuspendThread fails, the thread is assumed to be defunct and taken off the list.

Seems reasonable. I guess I was originally thinking just profile all threads all the time, but I guess your method is more flexible. That could probably be an option somehow after you polish this up.

> One of the fields in the thread is a counter corresponding to the sum of cpu returned by GetThreadTimes. This function has fields corresponding to kernel cpu and user cpu. The amount of time consumed by every thread is saved. Generally only one thread will have consumed CPU. However to be general, and in case the profiling thread is inadvertently delayed, all threads are considered. There is a partial tick problem. Suppose that a thread has consumed say 155% of the cpu time corresponding to a tick. I would assign one tick and use a local random number generator to assign an extra tick on average 55% of the time.

I'm getting lost here. What is your tick definition? A sampling interval? How can a thread ever consume more than 100%? I can see how two or more threads might on a multi CPU system. I'm lost in the random number generator application too.

> I tried getting the program counter for all threads, but this was found not to work very well, consuming excessive cpu, on average 50 milliseconds.

I thought there might be an overhead issue.

> All the other calls were of the order of 1 microsecond. However getting the program counter only for any thread that used cpu according to GetThreadTimes appeared to take about 50 microseconds. Generally of course only one thread will have used CPU. The function GetThreadContext is used to obtain the PC.

That doesn't sound too bad.

> Brian Ford wrote:
> > I tried using a backtrace method to map the sampling time onto DLL leaf functions (the import stubs) once, but it did not seem possible to perfect. Also, that is not always what you want.
>
> I would be interested if you would expand on this. Do you mean looking at the stack to find the calling function?

Yes. All the way back into the application address space, and then munging the address to assign it to the import stub. Calls into the Microsoft DLLs don't have frame pointer info, so the backtrace is difficult, if not impossible. I did have some success, though.

> > But, if you want this to be useful for the community at large, attacking the two points in the previous email directly would probably be more useful. ie. Figure out a way to store the samples using a non-contiguous address space model, and modify gprof to load the symbol tables for the dependent DLLs (gdb does this to some extent). Note that UNIX shared libraries have similar issues. You may want to consult with [EMAIL PROTECTED] for a general solution since they own gprof.
>
> I am thinking of implementing a separate profil call so that it can be used simultaneously with -pg compilation and linking. Also a profile-dll call so that profiling of the space occupied by a dll would occur.
>
> My problem with profiling the entire dll address space is
> 1) The necessity of recompiling dlls so that mapping and call counting is implemented

If you want call counts, there is never a way around that easily. Mapping?

> 2) The difficulty of doing anything with proprietary dlls

Sure.

> 3) The size and sparsity of the resulting gmon.out data file.

It really needs a different algorithm. Maybe simply multiple gmon.outs in one? I have also seen just a recording algorithm without the hash that stops when the buffer supplied is full. That has limited use.

> So I thought I would try attacking the problem using the import libraries. Perhaps it is a silly idea, but if it could be made to work it avoids these problems.

I think it is a good idea. I just don't understand or see the details yet. Too bad this method wouldn't help other shared library platforms, though. (No import libraries.) Well, maybe it could. You could probably make the stub libs automatically and have them load the shared libs. Not sure of the details, again, though.

> If I can get it to work, I'll be back.

Please do (come back with your results, that is). I'm definitely interested.

--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
Phone: 314-551-8460
Fax: 314-551-8444
Re: Is multithreaded profiling on cygwin possible?
Brian Ford wrote:
> snipped prior discussion
>
> 2.) Paraphrasing, the UNIX profil call (that gprof.c is currently using), has a contiguous flat address space model. It hashes address samples over that space into a buffer. The starting and ending address are automatically pulled from the executable and are in its address space. DLLs are mapped outside this space non-contiguously.
>
> 4.) Paraphrasing, gprof doesn't know how to find and read the symbol tables from DLLs linked into the executable. I'm not even sure if the addresses are deterministic.

As you have suggested, I have tried setting up a list of threads in profil.c, calling SuspendThread, GetThreadTimes, to get timing information for all threads, and to create a reasonably accurate profile for non-dll user space using gprof.

My plan now is to create new dll import libraries so that when these dll functions are called, a flag is set in the thread structure list, and the profiling thread assigns cpu ticks against the statically linked small import functions, so that hopefully gprof will pick it up and assign some sort of cpu usage and call frequency count to all the functions in the import libraries.

If you can see any obvious pitfalls with this approach, I would be grateful.
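
The "contiguous flat address space" model in point 2.) can be illustrated with the conventional UNIX profil(3) mapping from a sampled PC to a histogram counter. The sketch below shows that conventional mapping, in which a scale of 0x10000 gives one counter per two bytes of text. It is not taken from the patch discussed in this thread, which describes its own scale semantics, and the function name profil_bucket is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a sampled PC to a counter index for a single contiguous range
   starting at lowpc, using the conventional profil(3) rule:
       index = ((pc - lowpc) / 2) * scale / 65536
   Returns (size_t)-1 when the PC falls outside the profiled range,
   which is what happens to samples taken inside a DLL mapped elsewhere. */
static size_t profil_bucket(uintptr_t pc, uintptr_t lowpc, uint32_t scale,
                            size_t nbuckets)
{
    uint64_t idx;

    assert(scale <= 0x10000u);
    if (pc < lowpc)
        return (size_t)-1;

    idx = ((uint64_t)(pc - lowpc) / 2) * scale / 65536u;
    return idx < nbuckets ? (size_t)idx : (size_t)-1;
}
```

Because there is only one base address and one buffer, a sample whose PC lies in a DLL simply falls off the end of the histogram. That is why the discussion turns to either a non-contiguous model or separate ranges, each with its own buffer.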
Re: Is multithreaded profiling on cygwin possible?
On Mon, 13 Oct 2003, peter garrone wrote:

> As you have suggested, I have tried setting up a list of threads in profil.c, calling SuspendThread, GetThreadTimes, to get timing information for all threads, and to create a reasonably accurate profile for non-dll user space using gprof.

Sounds good, although I'm not quite sure I understand the implementation. What you really need to know is what thread was running just before the sampling thread so that it can sample the correct thread's PC. How are you using GetThreadTimes for this?

> My plan now is to create new dll import libraries so that when these dll functions are called, a flag is set in the thread structure list, and the profiling thread assigns cpu ticks against the statically linked small import functions, so that hopefully gprof will pick it up and assign some sort of cpu usage and call frequency count to all the functions in the import libraries.
>
> If you can see any obvious pitfalls with this approach, I would be grateful.

Using a flag in a structure list sounds like you're asking for race conditions with threads.

I tried using a backtrace method to map the sampling time onto DLL leaf functions (the import stubs) once, but it did not seem possible to perfect. Also, that is not always what you want.

I don't have any good suggestions or pitfalls to point out. But, if you want this to be useful for the community at large, attacking the two points in the previous email directly would probably be more useful, i.e., figure out a way to store the samples using a non-contiguous address space model, and modify gprof to load the symbol tables for the dependent DLLs (gdb does this to some extent). Note that UNIX shared libraries have similar issues. You may want to consult with [EMAIL PROTECTED] for a general solution since they own gprof.

If you're just doing this for your own use, go for it.
-- Brian Ford
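One way to picture the "non-contiguous address space model" suggested above is a set of independent histograms, one per address range, with each PC sample routed to whichever range contains it. The Python sketch below is a rough illustration only: `RangeHistogram`, `SampleStore`, the load addresses, and the assumption that ranges never overlap are all hypothetical, and nothing here reflects how gprof or cygwin actually store samples.

```python
import bisect


class RangeHistogram:
    """Sample counters for one address range [base, base + size)."""

    def __init__(self, name, base, size, bucket_size=4):
        self.name = name
        self.base = base
        self.size = size
        self.bucket_size = bucket_size
        self.counts = [0] * ((size + bucket_size - 1) // bucket_size)


class SampleStore:
    """Routes each PC sample to the range that contains it, if any.

    Assumes the ranges do not overlap.
    """

    def __init__(self, ranges):
        # Sort by base address so a binary search finds the candidate range.
        self.ranges = sorted(ranges, key=lambda r: r.base)
        self.bases = [r.base for r in self.ranges]

    def record(self, pc):
        i = bisect.bisect_right(self.bases, pc) - 1
        if i < 0:
            return None  # below the lowest range
        r = self.ranges[i]
        if pc >= r.base + r.size:
            return None  # falls in a gap between ranges
        r.counts[(pc - r.base) // r.bucket_size] += 1
        return r.name


# Hypothetical load addresses, chosen only for illustration.
store = SampleStore([
    RangeHistogram("app.exe", 0x00401000, 0x8000),
    RangeHistogram("cygwin1.dll", 0x61000000, 0x100000),
])
```

Each range keeps its own dense buffer, so the gaps between the executable and the DLLs cost nothing. Writing one output file per range would then be a natural next step, though that part is not sketched here.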
Re: Is multithreaded profiling on cygwin possible?
- Original Message -
From: Brian Ford [EMAIL PROTECTED]
Date: Mon, 13 Oct 2003 14:36:34 -0500 (CDT)
To: peter garrone [EMAIL PROTECTED]
Subject: Re: Is multithreaded profiling on cygwin possible?

> On Mon, 13 Oct 2003, peter garrone wrote:
>
> Sounds good, although I'm not quite sure I understand the implementation. What you really need to know is what thread was running just before the sampling thread so that it can sample the correct thread's PC. How are you using GetThreadTimes for this?

A list of active threads is maintained. A thread calling moncontrol(1) is put on the list. When a call to SuspendThread fails, the thread is assumed to be defunct and is taken off the list. One of the fields in each thread's entry is a counter holding the sum of the CPU times returned by GetThreadTimes, which reports both kernel CPU and user CPU. The amount of time consumed by every thread is saved. Generally only one thread will have consumed CPU. However, to be general, and in case the profiling thread is inadvertently delayed, all threads are considered.

There is a partial tick problem. Suppose that a thread has consumed, say, 155% of the CPU time corresponding to a tick. I would assign one tick and use a local random number generator to assign an extra tick, on average, 55% of the time.

I tried getting the program counter for all threads, but this did not work very well: it consumed excessive CPU, on average 50 milliseconds. All the other calls were of the order of 1 microsecond. However, getting the program counter only for those threads that had used CPU according to GetThreadTimes appeared to take about 50 microseconds. Generally, of course, only one thread will have used CPU. The function GetThreadContext is used to obtain the PC.

> I tried using a backtrace method to map the sampling time onto DLL leaf functions (the import stubs) once, but it did not seem possible to perfect. Also, that is not always what you want.

I would be interested if you would expand on this.
Do you mean looking at the stack to find the calling function?

> But, if you want this to be useful for the community at large, attacking the two points in the previous email directly would probably be more useful, i.e., figure out a way to store the samples using a non-contiguous address space model, and modify gprof to load the symbol tables for the dependent DLLs (gdb does this to some extent). Note that UNIX shared libraries have similar issues. You may want to consult with [EMAIL PROTECTED] for a general solution since they own gprof.

I am thinking of implementing a separate profil call so that it can be used simultaneously with -pg compilation and linking, and also a profile-dll call so that the space occupied by a DLL can be profiled. My problems with profiling the entire DLL address space are:

1) The necessity of recompiling DLLs so that mapping and call counting are implemented.
2) The difficulty of doing anything with proprietary DLLs.
3) The size and sparsity of the resulting gmon.out data file.

So I thought I would try attacking the problem using the import libraries. Perhaps it is a silly idea, but if it could be made to work it avoids these problems. If I can get it to work, I'll be back.

Thanks
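The per-thread accounting and the partial tick handling described in the message above can be sketched roughly as follows. This is a Python model, not the C code in profil.c: `ticks_for`, `ThreadEntry`, and `sample` are hypothetical names, and the `cpu_now` dictionary merely stands in for the kernel-plus-user sum that GetThreadTimes would report for each thread.

```python
import random


def ticks_for(cpu_used, tick_length, rng):
    """Convert CPU time used since the last sample into whole ticks.

    The fractional remainder is rounded up with probability equal to its
    size, so the expected number of ticks equals cpu_used / tick_length.
    """
    whole, fraction = divmod(cpu_used / tick_length, 1.0)
    extra = 1 if rng.random() < fraction else 0
    return int(whole) + extra


class ThreadEntry:
    """One entry in the profiler's list of active threads (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.last_cpu = 0  # cumulative kernel + user CPU at previous sample


def sample(threads, cpu_now, tick_length, rng):
    """Return {thread name: ticks} for threads that used CPU since last time.

    cpu_now maps thread name -> cumulative CPU time for that thread.
    """
    charged = {}
    for t in threads:
        used = cpu_now[t.name] - t.last_cpu
        t.last_cpu = cpu_now[t.name]
        if used > 0:
            charged[t.name] = ticks_for(used, tick_length, rng)
    return charged
```

With a thread that used 155% of a tick, `ticks_for(1.55, 1.0, rng)` returns 1 or 2, and 2 about 55% of the time. Only the threads that appear in the result of `sample` would then need the comparatively expensive program-counter lookup.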
Re: Is multithreaded profiling on cygwin possible?
peter garrone wrote:
> Brian Ford wrote:
>> peter garrone wrote:

> Sorry for the delay, or the repeat information, my original reply is lost.

No problem.

>>> If I profile my multi-threaded application, it appears that only the main thread is profiled.

>> Currently, yes.

> Actually, I think I was only partially correct. Time for the main thread is accumulated, but function calls are counted for all threads. This creates misleading data.

True. I primarily just use PC sampling and not call counts, so I forgot about that part.

>> You can, however, profile other threads one at a time if you use the gprof APIs manually, called from the thread you want to profile. I have done this, but it has been too long for me to give you specific instructions. Have a look at profile.c, profile.[ch], gmon.[ch] in the cygwin sources to see how it's done.

> Thanks very much, this advice is a great start. I didn't see any way in the mcount function (winsup/cygwin/mcount.c) to specify a particular thread. I did see the possibility of calling moncontrol(1) to enable time accumulation for a particular thread, and searching dejanews, noticed that this is a recognised approach to multithreaded profiling.

Well, I might be able to devise a way to count only one thread's calls, but it would be horrifically slow.

>> PTC. While you're there, it should be fairly trivial to create a patch that at least loops through all Cygwin created pthreads in the sampler. I don't know if that kind of flat profile is what you wanted, though.

> Sometimes per-thread profiling is useful, but a flat profile is what I want for now. Not so much for optimisation, but porting. If a thread is taking x% cpu on system 1 and y% cpu on system 2, then per-thread profiling is useful. If the whole application is running much too slow, then the flat profile is useful. I haven't figured out how to get per-thread cpu on cygwin yet anyway.

Flat profiles are usually what I want also.
For per thread cpu see:

[snipped dll discussion]

> You commented that dll code is difficult to profile. Would you kindly send me a few references to this, or keyword sets, my searching is blank. I am aware of the profiling cygwin information, and assume you mean extra to this.

Points 2 and 4 here are what I was referring to (note that they are applicable to all DLLs, not just cygwin1.dll).

http://sources.redhat.com/ml/cygwin-patches/2002-q2/msg00206.html

I couldn't seem to dig up any more detail easily.

2.) Paraphrasing, the UNIX profil call (that gprof.c is currently using), has a contiguous flat address space model. It hashes address samples over that space into a buffer. The starting and ending address are automatically pulled from the executable and are in its address space. DLLs are mapped outside this space non-contiguously.

4.) Paraphrasing, gprof doesn't know how to find and read the symbol tables from DLLs linked into the executable. I'm not even sure if the addresses are deterministic.

-- Brian Ford
Re: Is multithreaded profiling on cygwin possible?
On Tue, 7 Oct 2003, Brian Ford wrote:

> Flat profiles are usually what I want also. For per thread cpu see:

Sorry, I forgot the reference. Here it is:

http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/pstat-o.asp
http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/qslice-o.asp

BTW, you might also look at SSP (the single step profiler), although it too is horribly slow.

-- Brian Ford
Re: Is multithreaded profiling on cygwin possible?
Sorry for the delay, or the repeat information, my original reply is lost.

Brian Ford wrote:
> peter garrone wrote:

>> If I profile my multi-threaded application, it appears that only the main thread is profiled.

> Currently, yes.

Actually, I think I was only partially correct. Time for the main thread is accumulated, but function calls are counted for all threads. This creates misleading data.

> You can, however, profile other threads one at a time if you use the gprof APIs manually, called from the thread you want to profile. I have done this, but it has been too long for me to give you specific instructions. Have a look at profile.c, profile.[ch], gmon.[ch] in the cygwin sources to see how it's done.

Thanks very much, this advice is a great start. I didn't see any way in the mcount function (winsup/cygwin/mcount.c) to specify a particular thread. I did see the possibility of calling moncontrol(1) to enable time accumulation for a particular thread, and searching dejanews, noticed that this is a recognised approach to multithreaded profiling.

> PTC. While you're there, it should be fairly trivial to create a patch that at least loops through all Cygwin created pthreads in the sampler. I don't know if that kind of flat profile is what you wanted, though.

Sometimes per-thread profiling is useful, but a flat profile is what I want for now. Not so much for optimisation, but porting. If a thread is taking x% cpu on system 1 and y% cpu on system 2, then per-thread profiling is useful. If the whole application is running much too slow, then the flat profile is useful. I haven't figured out how to get per-thread cpu on cygwin yet anyway.

[snipped dll discussion]

You commented that dll code is difficult to profile. Would you kindly send me a few references to this, or keyword sets, my searching is blank. I am aware of the profiling cygwin information, and assume you mean extra to this.
>> On linux, it is possible to save and set the virtual timer upon creation of each thread, and thereby get a decent profile. However the virtual timer is unavailable on cygwin, and I would imagine that this approach is incorrect, due to differing thread models.

> I've never profiled on Linux and I don't know anything about the virtual timer you are referring to. On Solaris, I get a nice flat profile of all threads combined, like the implementation I suggested above. The same shared library concerns exist there, but Solaris is good about providing static profile enabled libs.

Sorry, I was incorrect. I meant that by saving the profiling timer ITIMER_PROF before thread creation and resetting it afterwards, in the thread, cpu profiling was possible. Refer to http://sam.zoy.org/writings/programming/gprof.html

> Let me know if you want to discuss patch ideas. I used to have a few, but no priority time to work on them. :(

I am afraid that this email is the sum of my current knowledge about cygwin profiling. But if I find out anything else, I will post it.
Re: Is multithreaded profiling on cygwin possible?
peter garrone wrote:

> If I profile my multi-threaded application, it appears that only the main thread is profiled.

Currently, yes. You can, however, profile other threads one at a time if you use the gprof APIs manually, called from the thread you want to profile. I have done this, but it has been too long for me to give you specific instructions. Have a look at profile.c, profile.[ch], gmon.[ch] in the cygwin sources to see how it's done.

PTC. While you're there, it should be fairly trivial to create a patch that at least loops through all Cygwin created pthreads in the sampler. I don't know if that kind of flat profile is what you wanted, though.

BTW, code in DLLs is difficult to profile because of the monolithic segment view of the profiling hash. Check the archives for a discussion on this and possible workarounds if you are interested.

> On linux, it is possible to save and set the virtual timer upon creation of each thread, and thereby get a decent profile. However the virtual timer is unavailable on cygwin, and I would imagine that this approach is incorrect, due to differing thread models.

I've never profiled on Linux and I don't know anything about the virtual timer you are referring to. On Solaris, I get a nice flat profile of all threads combined, like the implementation I suggested above. The same shared library concerns exist there, but Solaris is good about providing static profile enabled libs.

Let me know if you want to discuss patch ideas. I used to have a few, but no priority time to work on them. :(

-- Brian Ford
Is multithreaded profiling on cygwin possible?
Firstly, apologies for repeated postings to the gmane cygwin newsgroup, I thought they were bounced.

If I profile my multi-threaded application, it appears that only the main thread is profiled.

On linux, it is possible to save and set the virtual timer upon creation of each thread, and thereby get a decent profile. However the virtual timer is unavailable on cygwin, and I would imagine that this approach is incorrect, due to differing thread models.