For the sake of completeness: I don't think I ever mentioned which profiler I used when I was working on speeding up the scalars. I used AQTime 7. It is commercial and Windows-only (as far as I know). It works great: it gave me fairly accurate timings and all sorts of visual navigation features. I do have to muck around with the numpy code every time I want to compile it, to get it to play nicely with Visual Studio and generate the proper bindings for the profiler.

Raul

On 02/05/2013 7:14 AM, Nathaniel Smith wrote:
> On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkve...@iitrpr.ac.in> wrote:
>> Yes, we need to ensure that..
>> Code generator can be made, which can create code for table of registered
>> dtype during build time itself.
> I'd probably just generate it at run-time on an as-needed basis.
> (I.e., use the full lookup logic the first time, then save the
> result.) New dtypes can be registered, which will mean the tables need
> to change size at runtime anyway. If someone does some strange thing
> like add float16's and float64's, we can do the lookup to determine
> that this should be handled by the float64/float64 loop, and then
> store that information so that the next time it's fast (but we
> probably don't want to be calculating all combinations at build-time,
> which would require running the full type resolution machinery, esp.
> since it wouldn't really bring any benefits that I can see).
>
> * Re: the profiling, I wrote a full oprofile->callgrind format script
> years ago: http://vorpus.org/~njs/op2calltree.py
> Haven't used it in years either but neither oprofile nor kcachegrind
> are terribly fast-moving projects so it's probably still working, or
> could be made so without much work.
> Or easier is to use the gperftools CPU profiler:
> https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
>
> Instead of linking to it at build time, you can just use ctypes:
>
> In [7]: profiler = ctypes.CDLL("libprofiler.so.0")
>
> In [8]: profiler.ProfilerStart("some-file-name-here")
> Out[8]: 1
>
> In [9]: # do stuff here
>
> In [10]: profiler.ProfilerStop()
> PROFILE: interrupts/evictions/bytes = 2/0/592
> Out[10]: 46
>
> Then all the pprof analysis tools are available as described on that webpage.
>
> * Please don't trust those random suggestions for possible
> improvements I threw out when writing the original description.
> Probably it's true that FP flag checking and ufunc type lookup are
> expensive, but one should fix what the profile says to fix, not what
> someone guessed might be good to fix based on a few minutes thought.
>
> * Instead of making a giant table of everything that needs to be done
> to make stuff fast first, before writing any code, I'd suggest picking
> one operation, figuring out what change would be the biggest
> improvement for it, making that change, checking that it worked, and
> then repeat until that operation is really fast. Then if there's still
> time pick another operation. Producing a giant todo list isn't very
> productive by itself if there's no time then to actually do all the
> things on the list :-).
>
> * Did you notice this line on the requirements page? "Having your
> first pull request merged before the GSoC application deadline (May 3)
> is required for your application to be accepted."
>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion