On Wed, Jul 15, 2015 at 9:45 PM, Adam Jackson <a...@redhat.com> wrote:
> On Wed, 2015-07-15 at 16:36 +0300, Oded Gabbay wrote:
>
>> +    else
>> +    {
>> +        return FALSE;
>> +    }
>> +
>> +    vfiller = create_mask_1x32_128(&filler);
>
> This appears to be less of a regression for small operations if you
> fall back to the C code for byte_width < 64 here. That seems to be
> about the optimal magic number to use for a cutoff. Making it 32
> doesn't affect 10x10 fill at 32bpp (40 bytes wide), but making it 128
> severely punishes 100x100 fill at 8bpp (100 bytes wide).
>
> x11perf -rect{1,10,100,500} at 32bpp:
>
>     master   vmx_fill              vmx_fill for >=64     Operation
> ----------   -------------------   -------------------   -----------------
> 19644005.2   16586597.0 (0.844x)   17323615.4 (0.882x)   1x1 rectangle
>  6880326.9    5516644.2 (0.802x)    6619153.8 (0.962x)   10x10 rectangle
>   126143.3     456974.3 (3.623x)     474572.7 (3.762x)   100x100 rectangle
>     5421.5      29948.8 (5.524x)      29519.3 (5.445x)   500x500 rectangle
>
> - ajax
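For reference, the cutoff arithmetic in the quoted review can be sketched roughly as below. This is only an illustration of the byte-width check, not the code from the patch: `fill_byte_width`, `use_vector_fill`, and the `cutoff` parameter are hypothetical names, and the sketch assumes the fill width is given in pixels and that only 8, 16, and 32 bpp are considered.

```c
/* Hypothetical helper: bytes per scanline of a fill that is `width`
 * pixels wide at `bpp` bits per pixel (assumes bpp is a multiple of 8). */
static int fill_byte_width(int width, int bpp)
{
    return width * (bpp / 8);
}

/* Hypothetical dispatch check: returns 1 if the fill is wide enough to
 * take the vector path, 0 if it should fall back to the C code.
 * `cutoff` is the minimum byte width; 64 is the value suggested in the
 * quoted review. */
static int use_vector_fill(int width, int bpp, int cutoff)
{
    /* Assumption: unsupported depths always go to the C code. */
    if (bpp != 8 && bpp != 16 && bpp != 32)
        return 0;

    return fill_byte_width(width, bpp) >= cutoff;
}
```

With these definitions, a 10x10 fill at 32bpp is 40 bytes wide, so it stays on the vector path with a cutoff of 32 but goes to the C code with a cutoff of 64. A 100x100 fill at 8bpp is 100 bytes wide, so it stays on the vector path with a cutoff of 64 but would go to the C code with a cutoff of 128.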
ajax,

I checked your suggestion in cairo benchmark (trimmed) and I got a
minor slowdown vs. the original patch:

Slowdowns
=========
t-firefox-scrolling  1197.96 (1199.58 0.17%) -> 1303.35 (1306.72 0.16%):  1.09x slowdown
t-firefox-asteroids   492.77 (509.11 2.49%) ->  552.92 (575.34 2.88%):  1.12x slowdown

    Oded
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman