Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Fri, 06 Sep 2013, josef.p...@gmail.com wrote:
> On Fri, Sep 6, 2013 at 3:21 PM, Yaroslav Halchenko wrote:
> > FWIW -- updated runs of the benchmarks are available at
> > http://yarikoptic.github.io/numpy-vbench which now also includes the
> > maintenance/1.8.x branch (no divergences were detected yet). There are
> > only recent improvements as far as I can see and no new performance
> > regressions (but some old ones are still there; some might be specific
> > to my CPU here).
> You would have enough data to add some quality-control bands to the
> charts (like a cusum chart, for example).
> Then it would be possible to send a congratulation note or ring an
> alarm bell without looking at all the plots.

well -- I did cook up some basic "detector", but I believe I haven't adjusted it for multiple branches yet:
http://yarikoptic.github.io/numpy-vbench/#benchmarks-performance-analysis

you are welcome to introduce additional (or replacement) detection goodness:
http://github.com/yarikoptic/vbench/blob/HEAD/vbench/analysis.py
and plotting is done here, I believe:
https://github.com/yarikoptic/vbench/blob/HEAD/vbench/benchmark.py#L155

--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate, Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
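[Editorial note: Josef's cusum suggestion can be sketched in a few lines. This is not what vbench's analysis.py does -- just a hypothetical one-sided CUSUM detector over a timing series, with illustrative tuning constants `k` and `h`.]

```python
import numpy as np

def cusum_alarm(timings, k=0.5, h=5.0):
    """One-sided CUSUM over a series of benchmark timings.

    Standardizes the series, then accumulates upward excursions
    beyond the allowance k; an alarm fires once the cumulative sum
    exceeds the decision threshold h.  Returns the index of the
    first alarm, or None.  k and h are illustrative defaults, in
    units of the series' standard deviation.
    """
    x = np.asarray(timings, dtype=float)
    sigma = x.std()
    if sigma == 0:  # perfectly stable series: nothing to flag
        return None
    z = (x - x.mean()) / sigma
    s = 0.0
    for i, zi in enumerate(z):
        s = max(0.0, s + zi - k)  # only sustained slowdowns accumulate
        if s > h:
            return i
    return None
```

The "congratulation note" direction would be the mirror image: accumulate downward excursions of the same standardized series.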
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Fri, 06 Sep 2013, Daπid wrote:
> > some old ones are still there, some might be specific to my CPU here
> How long does one run take? Maybe I can run it on my machine (Intel i5)
> for comparison.

In the current configuration, where I "target" a benchmark run at around 200ms (thus possibly jumping up to 400ms), and thus 1-2 sec for 3 actual runs to figure out the min among those -- on that elderly box it takes about a day to run the "end of the day" commits (IIRC around 400 of them) and then 3-4 days for a full run (all commits). I am not sure if targeting 200ms is of any benefit, as opposed to 100ms, which would run twice as fast.

you are welcome to give it a shot right away:
http://github.com/yarikoptic/numpy-vbench

it is still a bit ad-hoc, and I also use an additional shell wrapper to set cpu affinity (taskset -cp 1) and renice the benchmarking process to -10.
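[Editorial note: the affinity/renice setup mentioned above is a shell wrapper; the same pinning can be done from inside Python on Linux. A rough sketch -- the function name is ours, and raising priority to -10 needs root, so the safe default here only lowers priority.]

```python
import os

def pin_benchmark_process(cpu=1, niceness=10):
    """Pin the current process to a single core and reduce its
    scheduling competition before benchmarking (Linux-only).

    Mirrors the `taskset -cp 1` + renice idea from the mail above;
    a negative niceness (higher priority) requires root, so the
    default here lowers priority instead.
    """
    os.sched_setaffinity(0, {cpu})  # 0 == current process
    os.nice(niceness)               # relative increment
    return os.sched_getaffinity(0)
```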
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
FWIW -- updated runs of the benchmarks are available at http://yarikoptic.github.io/numpy-vbench, which now also includes the maintenance/1.8.x branch (no divergences were detected yet). As far as I can see there are only recent improvements and no new performance regressions (though some old ones are still there, and some might be specific to my CPU here).

Cheers,
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Fri, Sep 6, 2013 at 1:21 PM, Yaroslav Halchenko wrote:
> FWIW -- updated runs of the benchmarks are available at
> http://yarikoptic.github.io/numpy-vbench which now also includes the
> maintenance/1.8.x branch (no divergences were detected yet). [...]

This work is really nice. Thank you Yaroslav.

Chuck
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Fri, Sep 6, 2013 at 3:21 PM, Yaroslav Halchenko wrote:
> FWIW -- updated runs of the benchmarks are available at
> http://yarikoptic.github.io/numpy-vbench which now also includes the
> maintenance/1.8.x branch (no divergences were detected yet). There are
> only recent improvements as far as I can see and no new performance
> regressions (but some old ones are still there; some might be specific
> to my CPU here).

You would have enough data to add some quality-control bands to the charts (like a cusum chart, for example).

Then it would be possible to send a congratulation note or ring an alarm bell without looking at all the plots.

Josef
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On 6 September 2013 21:21, Yaroslav Halchenko wrote:
> some old ones are still there, some might be specific to my CPU here

How long does one run take? Maybe I can run it on my machine (Intel i5) for comparison.
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
and to put the findings reported so far into some kind of automated form, please welcome:
http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis

This is based on a simple 1-way ANOVA of the last 10 commits against some point in the past where 10 other commits had the smallest timing and were significantly different from the last 10 commits. "Possible recent" is probably too noisy and I am not sure if it is useful -- it should point to the diff closest in time (to the latest commits) where a significant excursion from current performance was detected. So per se it has nothing to do with the initially detected performance hit, but in some cases it still seems to reasonably locate commits hitting on performance.

Enjoy,

On Tue, 09 Jul 2013, Yaroslav Halchenko wrote:
> Julian Taylor contributed some benchmarks he was "concerned" about, so
> now the collection is even better.
> I will keep updating tests at the same url:
> http://www.onerussian.com/tmp/numpy-vbench/ [...]
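[Editorial note: the detection idea described at the top of this mail -- the last 10 commits against the best historical 10-commit stretch -- can be sketched roughly as below. This is not the actual vbench/analysis.py code, just a numpy-only illustration, using the F(1, 18) critical value in place of a computed p-value.]

```python
import numpy as np

def flag_regression(timings, window=10, f_crit=4.41):
    """1-way ANOVA of the last `window` commit timings against the
    historical `window`-commit stretch with the smallest mean timing.

    With two groups the ANOVA F equals t**2; f_crit=4.41 is roughly
    the F(1, 18) critical value at alpha=0.05 for window=10.
    Returns (regressed, F).
    """
    x = np.asarray(timings, dtype=float)
    recent, hist = x[-window:], x[:-window]
    if hist.size < window:
        return False, 0.0
    # mean timing of every window-sized stretch of the history
    means = np.convolve(hist, np.ones(window) / window, mode="valid")
    i = int(np.argmin(means))
    baseline = hist[i:i + window]
    grand = np.concatenate([baseline, recent]).mean()
    ss_between = window * ((baseline.mean() - grand) ** 2 +
                           (recent.mean() - grand) ** 2)
    ss_within = (((baseline - baseline.mean()) ** 2).sum() +
                 ((recent - recent.mean()) ** 2).sum())
    f = ss_between / (ss_within / (2 * window - 2))
    # only flag a *slowdown* relative to the best baseline
    return bool(f > f_crit and recent.mean() > baseline.mean()), f
```

Picking the baseline as the historical minimum biases the test toward flagging, which matches the mail's observation that "possible recent" is noisy.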
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
Julian Taylor contributed some benchmarks he was "concerned" about, so now the collection is even better. I will keep updating tests at the same url:
http://www.onerussian.com/tmp/numpy-vbench/
[it is now running and later I will upload with more commits for higher temporal fidelity]

of particular interest for you might be:

some minor consistent recent losses in
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float64
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float32
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int16
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int8

this one seems to have lost more than 25% of its performance over the timeline:
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8

"fast" calls to all/any seem to have been hurt twice in their lifetime, now running *3 times slower* than in 2011 -- the inflection points correspond to regressions and/or their fixes in those functions to bring back performance in the "slow" cases (when array traversal is needed, e.g. on arrays of zeros for any):
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast

Enjoy

On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
> FWIW -- updated plots with a contribution from Julian Taylor
> http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap-slicing
> ;-) [...]
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
FWIW -- updated plots with a contribution from Julian Taylor:
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap-slicing
;-)

On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
> Hi Guys,
> not quite the recommendations you expressed, but here is my ugly
> attempt to improve benchmark coverage:
> http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html [...]
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
Hi Guys,

not quite the recommendations you expressed, but here is my ugly attempt to improve benchmark coverage:
http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html

initially I also ran those ufunc benchmarks for each dtype separately, but then the resulting webpage gets so long that firefox brings my laptop to its knees. So I commented those out for now, and left only the "summary" ones across multiple dtypes.

There is a bug in sphinx which prevents embedding some figures for vb_random "as is", so pardon that for now...

I have not set the cpu affinity of the process (but ran it at nice -10), so maybe that also contributed to the variance of the benchmark estimates. And there are probably more goodies (e.g. gc control etc) to borrow, to minimize variance, from https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have just discovered.

nothing really interesting was pin-pointed so far, besides that

- svd became a bit faster a few months back ;-)
  http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html
- isnan (and isinf, isfinite) got improved
  http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-isnan-a-10types
- right_shift got a minuscule slowdown from what it used to be?
  http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-right-shift-a-a-3types

As before -- the current code of this benchmark collection is available at
http://github.com/yarikoptic/numpy-vbench/pull/new/master

if you have specific snippets you would like to benchmark -- just state them here or send a PR -- I will add them in.

Cheers,

On Tue, 07 May 2013, Daπid wrote:
> On 7 May 2013 13:47, Sebastian Berg wrote:
> > Indexing/assignment was the first thing I thought of too [...]
> Why not go bigger? Ufunc operations on big arrays, CPU and memory bound. [...]
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On 7 May 2013 13:47, Sebastian Berg wrote:
> Indexing/assignment was the first thing I thought of too (also because
> fancy indexing/assignment really could use some speedups...). Other than
> that maybe some timings for small arrays/scalar math, but that might be
> nice for that GSoC project.

Why not go bigger? Ufunc operations on big arrays, CPU and memory bound.

Also, what about interfacing with other packages? It may increase the compiling overhead, but I would like to see Cython in action (say, only the last version; maybe it can be fixed).
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Mon, 2013-05-06 at 12:11 -0400, Yaroslav Halchenko wrote:
> that is the idea, but it would be nice to gather such simple
> benchmark-tests. if you could hint at the numpy functionality you think
> is especially worth benchmarking [...]
> As for myself -- I guess I will add fancy indexing and slicing tests.

Indexing/assignment was the first thing I thought of too (also because fancy indexing/assignment really could use some speedups...). Other than that maybe some timings for small arrays/scalar math, but that might be nice for that GSoC project. Maybe array creation functions, just to see whether performance bugs sneak into something that central. But I can't think of anything else that isn't specific functionality.

- Sebastian
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Mon, 06 May 2013, Sebastian Berg wrote:
> I think this is pretty cool! Probably would be a while until there are
> many tests, but if you or someone could set such a thing up it could
> slowly grow when larger code changes are done?

that is the idea, but it would be nice to gather such simple benchmark-tests. if you could hint at the numpy functionality you think is especially worth benchmarking (I know -- there are a lot of things which could be set up to be benchmarked) -- that would be a nice starting point: just list the functionality/functions you consider of primary interest, and whether each is worth testing for different dtypes or just as a gross estimate (e.g. over a selection of dtypes in a loop).

As for myself -- I guess I will add fancy indexing and slicing tests. Adding them is quite easy: have a look at
https://github.com/yarikoptic/numpy-vbench/blob/master/vb_reduce.py
which is actually a bit more cumbersome because it runs for different dtypes. This one is more obvious:
https://github.com/yarikoptic/numpy-vbench/blob/master/vb_io.py
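[Editorial note: a benchmark in these files boils down to a setup string plus a statement string that vbench times across commits. Before sending one as a PR, the same pair can be sanity-checked locally with plain timeit; the snippet below is illustrative, not copied from vb_io.py.]

```python
import timeit

# setup runs once per repeat; the statement is what gets timed
setup = """
import numpy as np
a = np.zeros(100000, dtype=np.int8)
b = np.zeros(100000, dtype=np.int8)
"""
stmt = "b[::2] = a[::2]"  # a strided-assign case, in the spirit of vb_io

# vbench-style: take the best of a few repeats to suppress noise
best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=100))
```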
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Mon, 2013-05-06 at 10:32 -0400, Yaroslav Halchenko wrote:
> FWIW -- just as a cruel first attempt look at
> http://www.onerussian.com/tmp/numpy-vbench-20130506/vb_vb_reduce.html
> why is the float16 case so special?

Float16 is special: it is cpu-bound -- not memory-bound like most reductions -- because it is not a native type. At first I thought it was weird, but it actually makes sense. If you have a and b as float16, then a + b is actually more like (I believe):

    float16(float32(a) + float32(b))

This means there is type casting going on *inside* the ufunc! Normally casting is handled outside the ufunc (by the buffered iterator). Now I did not check, but when the iteration order is not optimized, the ufunc *can* simplify this to something similar to the following (along the reduction axis):

    result = float32(a[0])
    for x in a[1:]:
        result += float32(x)
    return float16(result)

While for "optimized" iteration order this cannot happen, because the intermediate result is always written back. This means that for unoptimized iteration order a single conversion to float32 per element is necessary (in the inner loop), while for optimized iteration order two conversions to float32 and one back are done per element.

Since this conversion is costly, memory throughput is actually not important (no gain from buffering), which leads to the visible slowdown. This is of course a bit annoying, but I am not sure how you would solve it (have the dtype signal that it doesn't even want iteration-order optimization? try to move those weird float16 conversions from the ufunc to the iterator somehow?).

> I have pushed this really coarse setup (based on some elderly copy of
> pandas' vbench) to https://github.com/yarikoptic/numpy-vbench [...]

I think this is pretty cool! Probably would be a while until there are many tests, but if you or someone could set such a thing up it could slowly grow when larger code changes are done?

Regards,

Sebastian
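[Editorial note: the two accumulation strategies described above can be mimicked in pure Python to make the casting cost explicit. The helper names are ours, and real ufunc inner loops are C -- this only shows the data flow.]

```python
import numpy as np

def reduce_float32_acc(a):
    # one up-cast per element; a single down-cast at the very end
    acc = np.float32(a[0])
    for x in a[1:]:
        acc = acc + np.float32(x)
    return np.float16(acc)

def reduce_writeback(a):
    # the float16 intermediate is written back after every step:
    # two up-casts plus one down-cast per element
    acc = a[0]
    for x in a[1:]:
        acc = np.float16(np.float32(acc) + np.float32(x))
    return acc
```

Per element the second variant does three conversions instead of one, which is where the cpu-boundedness comes from.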
Re: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Mon, May 6, 2013 at 10:32 AM, Yaroslav Halchenko wrote:
> FWIW -- just as a cruel first attempt look at
> http://www.onerussian.com/tmp/numpy-vbench-20130506/vb_vb_reduce.html
> why is the float16 case so special? [...]

nice results

Thanks Yaroslav,

Josef
my default: axis=0
[Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011)
On Wed, 01 May 2013, Sebastian Berg wrote:
> > btw -- is there something like pandas' vbench for numpy? i.e. where
> > it would be possible to track/visualize such performance
> > improvements/hits?
> Sorry if it seemed harsh, but I only skimmed the mails and it seemed a bit
> like an obvious piece was missing... There are no benchmark tests I
> am aware of. You can try:
>     a = np.random.random((1000, 1000))
> and then time a.sum(1) and a.sum(0); on 1.7 the fast axis (1) is only
> slightly faster than the sum over the slow axis. On earlier numpy
> versions you will probably see something like half the speed for the
> slow axis (I only have ancient or 1.7 numpy right now, so I am reluctant
> to give exact timings).

FWIW -- just as a cruel first attempt, look at
http://www.onerussian.com/tmp/numpy-vbench-20130506/vb_vb_reduce.html

why is the float16 case so special?

I have pushed this really coarse setup (based on some elderly copy of pandas' vbench) to
https://github.com/yarikoptic/numpy-vbench

if you care to tune it up/extend it, I could then fire it up again on that box (which doesn't do anything else ATM AFAIK). Since the majority of the time is spent actually building numpy (I did it with ccache though), it would be neat if you came up with more benchmarks to run which you think could be interesting/important.
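[Editorial note: Sebastian's suggested spot-check translates directly to timeit. Absolute numbers are machine-dependent, so only the fast-axis/slow-axis ratio is interesting.]

```python
import timeit

import numpy as np

a = np.random.random((1000, 1000))

# best-of-3, as elsewhere in this thread, to suppress scheduler noise
t_fast = min(timeit.repeat(lambda: a.sum(1), repeat=3, number=20))
t_slow = min(timeit.repeat(lambda: a.sum(0), repeat=3, number=20))
ratio = t_slow / t_fast  # ~1 on 1.7 per the mail; larger on older numpy
```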