On Mon, Nov 23, 2015 at 2:13 PM, Michael Niedermayer <michae...@gmx.at> wrote:
> On Mon, Nov 23, 2015 at 01:57:24PM -0500, Ganesh Ajjanagadde wrote:
>> On Mon, Nov 23, 2015 at 1:02 PM, Michael Niedermayer <michae...@gmx.at> wrote:
>> > On Mon, Nov 23, 2015 at 12:43:52PM -0500, Ganesh Ajjanagadde wrote:
>> >> On Sun, Nov 22, 2015 at 3:56 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
>> >> > On Sun, Nov 22, 2015 at 3:07 PM, Michael Niedermayer <michae...@gmx.at> wrote:
>> >> >> On Sun, Nov 22, 2015 at 12:05:49PM -0500, Ganesh Ajjanagadde wrote:
>> >> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>> >> >>> ---
>> >> >>>  libavfilter/vsrc_mandelbrot.c | 2 +-
>> >> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >>>
>> >> >>> diff --git a/libavfilter/vsrc_mandelbrot.c b/libavfilter/vsrc_mandelbrot.c
>> >> >>> index 950c5c8..a0c101e 100644
>> >> >>> --- a/libavfilter/vsrc_mandelbrot.c
>> >> >>> +++ b/libavfilter/vsrc_mandelbrot.c
>> >> >>> @@ -291,7 +291,7 @@ static void draw_mandelbrot(AVFilterContext *ctx, uint32_t *color, int linesize,
>> >> >>>
>> >> >>>              use_zyklus= (x==0 || s->inner!=BLACK ||color[x-1 + y*linesize] == 0xFF000000);
>> >> >>>              if(use_zyklus)
>> >> >>> -                epsilon= scale*1*sqrt(SQR(x-s->w/2) + SQR(y-s->h/2))/s->w;
>> >> >>> +                epsilon= scale*hypot(x-s->w/2, y-s->h/2)/s->w;
>> >> >>
>> >> >> old:
>> >> >> 704 decicycles in hypo, 1048570 runs, 6 skips
>> >> >>
>> >> >> new:
>> >> >> 1075 decicycles in hypo, 1048566 runs, 10 skips
>> >> >>
>> >> >> that is from START/STOP_TIMER over hypot()
>> >> >>
>> >> >> the code is speed relevant as its executed per pixel
>> >> >
>> >> > Thanks for testing. Looking more closely, I see no reason for
>> >> > expensive sqrt calls anyway: one can simply square both sides; it
>> >> > should be cheaper. Will rework, post benchmark if it is indeed faster
>> >> > and does not suffer from floating point overflow, else will simply
>> >> > push a trivial removal of the "1".
>> >>
>> >> It seems like getting rid of the sqrt altogether has a very slight
>> >> positive impact (if any at all). I can post the patch, but would like
>> >> to know what to benchmark. There are numerous choices, e.g
>> >> draw_mandelbrot as a whole, the outer loop, or the inner loop.
>> >> I personally think the inner x loop (lines 268-388) is a good place to
>> >> look at, since the difference is very small anyway, and further
>> >> localization is impossible.
>> >
>> > please post the patch
>>
>> bench posted first to see if it is considered interesting enough.
>> Bench over whole draw_mandelbrot using START/STOP timer on x86-64,
>> Haswell, GNU/Linux, command line:
>> ffmpeg -v error -f lavfi -i mandelbrot -f null -
>> new (draw_mandelbrot):
> [...]
>> 20857881401 decicycles in draw_mandelbrot, 1024 runs, 0 skips
>>
>> old (draw_mandelbrot):
> [...]
>> 21393227201 decicycles in draw_mandelbrot, 1024 runs, 0 skips
>
> if this is consistent over several tries then its interresting

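To make the "square both sides" idea quoted above concrete, here is a minimal sketch. It is not the actual patch: the helper name within_squared and its parameters are hypothetical, and it assumes the distance is only ever compared against a non-negative threshold, which is the case where the sqrt can be dropped.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only (not from vsrc_mandelbrot.c).
 * Returns nonzero if the point (dx, dy) lies within distance `limit` of the
 * origin. Instead of testing sqrt(dx*dx + dy*dy) <= limit, both sides are
 * squared, which is equivalent as long as limit >= 0.
 */
static int within_squared(double dx, double dy, double limit)
{
    assert(limit >= 0.0);   /* squaring both sides is only valid for limit >= 0 */
    return dx * dx + dy * dy <= limit * limit;
}
```

The trade-off is the one already mentioned in the thread: the squared terms can overflow for very large inputs, which hypot() is specified to avoid, so the cheaper form is only safe when the operands are known to be moderate in size.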
I am posting the full vector of timings for a reason: it is very hard to
judge from a single number. I ran for a longer duration below. I do see a
downward trend, but unfortunately the magnitude of the effect is unclear.
Furthermore, the actual numbers vary compared to the previous run, even
though both ran on the same hardware. I did not use any fancy tricks such
as core pinning, which could have helped minimize interference from
background tasks.

BTW, this filter becomes terribly slow as it zooms in, and it emits a bunch
of messages at the info level, "Mandelbrot cache is too small!", that do not
seem very user friendly to me.

old (draw_mandelbrot):
2232680340 decicycles in draw_mandelbrot, 1 runs, 0 skips
1842048190 decicycles in draw_mandelbrot, 2 runs, 0 skips
1674804840 decicycles in draw_mandelbrot, 4 runs, 0 skips
1698806217 decicycles in draw_mandelbrot, 8 runs, 0 skips
1854012313 decicycles in draw_mandelbrot, 16 runs, 0 skips
2064778166 decicycles in draw_mandelbrot, 32 runs, 0 skips
2414843681 decicycles in draw_mandelbrot, 64 runs, 0 skips
3099993554 decicycles in draw_mandelbrot, 128 runs, 0 skips
3982389425 decicycles in draw_mandelbrot, 256 runs, 0 skips
7634221782 decicycles in draw_mandelbrot, 512 runs, 0 skips
20576449397 decicycles in draw_mandelbrot, 1024 runs, 0 skips
12949998655 decicycles in draw_mandelbrot, 2048 runs, 0 skips

new (draw_mandelbrot):
2177824300 decicycles in draw_mandelbrot, 1 runs, 0 skips
1766861190 decicycles in draw_mandelbrot, 2 runs, 0 skips
1586299055 decicycles in draw_mandelbrot, 4 runs, 0 skips
1658036837 decicycles in draw_mandelbrot, 8 runs, 0 skips
1836125036 decicycles in draw_mandelbrot, 16 runs, 0 skips
2058982311 decicycles in draw_mandelbrot, 32 runs, 0 skips
2423381281 decicycles in draw_mandelbrot, 64 runs, 0 skips
3066657833 decicycles in draw_mandelbrot, 128 runs, 0 skips
3966406060 decicycles in draw_mandelbrot, 256 runs, 0 skips
7553322112 decicycles in draw_mandelbrot, 512 runs, 0 skips
20454169970 decicycles in draw_mandelbrot, 1024 runs, 0 skips
12822228615 decicycles in draw_mandelbrot, 2048 runs, 0 skips

>
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> No snowflake in an avalanche ever feels responsible. -- Voltaire
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
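
Since the magnitude of the effect is hard to read off the raw numbers, one way to compare the two timing lists above is row by row. The following is only a sketch: the array and function names (old_t, new_t, rel_change_percent, count_faster) are made up for illustration, and the values are copied from the lists in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in decicycles copied from this mail, for run counts 1, 2, 4, ..., 2048. */
static const double old_t[] = {
    2232680340.0, 1842048190.0, 1674804840.0, 1698806217.0,
    1854012313.0, 2064778166.0, 2414843681.0, 3099993554.0,
    3982389425.0, 7634221782.0, 20576449397.0, 12949998655.0
};
static const double new_t[] = {
    2177824300.0, 1766861190.0, 1586299055.0, 1658036837.0,
    1836125036.0, 2058982311.0, 2423381281.0, 3066657833.0,
    3966406060.0, 7553322112.0, 20454169970.0, 12822228615.0
};
#define N_ROWS (sizeof(old_t) / sizeof(old_t[0]))

/* Relative change of `after` versus `before` in percent; negative means faster. */
static double rel_change_percent(double before, double after)
{
    assert(before != 0.0);  /* avoid dividing by zero on a bogus row */
    return (after - before) / before * 100.0;
}

/* Number of rows in which the new timing is strictly lower than the old one. */
static size_t count_faster(const double *before, const double *after, size_t n)
{
    size_t i, faster = 0;
    for (i = 0; i < n; i++)
        if (after[i] < before[i])
            faster++;
    return faster;
}
```

With these numbers, count_faster(old_t, new_t, N_ROWS) reports that the new code is lower in 11 of the 12 rows; the exception is the 64-run row. That matches the downward trend, but it says nothing about run-to-run noise, which would need repeated measurements.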