Hi,

On Wed, Jul 6, 2011 at 4:18 PM, Jason Garrett-Glaser <[email protected]> wrote:
> On Wed, Jul 6, 2011 at 4:10 PM, Ronald S. Bultje <[email protected]> wrote:
>> Hi,
>>
>> On Wed, Jul 6, 2011 at 4:08 PM, Ronald S. Bultje <[email protected]> wrote:
>>> I believe the reason this exists is so that we duplicate the code, and
>>> shifting becomes a constant in the code, rather than a variable to be
>>> loaded into cl. This leaves cl free for others and leads to somewhat of
>>> a speedup.
>>
>> Actually, it may not be the shift alone, but also the fact that the
>> shift is not there at all for the 8bit version. So it's actually
>> the 8bit version that gets significantly faster with this, the 10bit
>> maybe not so much (or just a little bit, at best, because cl is not
>> clobbered).
>
> Except that:
>
> 1) hl_motion is ALREADY always_inline'd. So it doesn't matter!
>
> 2) The mc_dir_part chain of functions is not fully inlined into
> hl_motion, despite the use of "inline". If we force this, the code
> size increases by 500 kilobytes for no real speed gain. There might
> be some gain to be had in tweaking this.
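For anyone following along, the pattern being discussed looks roughly like the sketch below. This is not the actual h264.c code: copy_row(), copy_row_8(), copy_row_16() and the ALWAYS_INLINE macro are made-up names for illustration, and the attribute spelling assumes GCC or Clang. The point is only that once a force-inlined function is called with a literal pixel_shift, the compiler can fold the shift to a constant (or drop it for 0), so duplicating the function by hand on top of that should add nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang attribute syntax. Libav wraps the same idea in
 * av_always_inline; this macro is a hypothetical stand-in. */
#if defined(__GNUC__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/* Hypothetical worker: copies one row of `width` samples.
 * pixel_shift is 0 for 8-bit samples (1 byte each) and 1 for
 * high-bit-depth samples stored in 16 bits (2 bytes each). */
ALWAYS_INLINE void copy_row(uint8_t *dst, const uint8_t *src,
                            size_t width, int pixel_shift)
{
    /* With a variable pixel_shift on x86, the count typically has to be
     * loaded into cl. With a constant it becomes an immediate shift, and
     * with 0 the shift can be removed entirely. */
    size_t bytes = width << pixel_shift;

    for (size_t i = 0; i < bytes; i++)
        dst[i] = src[i];
}

/* Each wrapper passes a literal, so the inlined copy is specialized
 * for that bit depth. */
void copy_row_8(uint8_t *dst, const uint8_t *src, size_t width)
{
    copy_row(dst, src, width, 0); /* 8-bit: no shift needed */
}

void copy_row_16(uint8_t *dst, const uint8_t *src, size_t width)
{
    copy_row(dst, src, width, 1); /* 16-bit storage: shift by constant 1 */
}
```

Whether the specialization actually pays off in the decoder is a measurement question, as noted above; the sketch only shows why a constant argument plus forced inlining already gives the compiler everything it needs.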
Hm, I see, and decode_mb_simple() is already av_always_inline as well, so I
guess this is indeed just duplication for no real gain. Weird. I assume you
tested that this does not cause any slowdown on regular (simple) h264
content, so then it's fine.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
