Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize
Hi,

On Thu, Jul 26, 2012 at 6:40 PM, Loren Merritt wrote:
> On Thu, 26 Jul 2012, Ronald S. Bultje wrote:
>> On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
>> > 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
>> > ---
>> >  libavfilter/vf_hqdn3d.c | 157 +++---
>> >  1 files changed, 51 insertions(+), 106 deletions(-)
>>
>> Looks good.
>>
>> I am going to ask a very stupid question: why is this faster? I see a
>> lot of simplification, which is good, but I'm not quite sure which
>> part actually has a clear speed impact.
>
> Old code's sline_offs and dline_offs confused gcc into incrementing the
> src and dst pointers rather than using x as an index reg.
>
> Old code did horizontal(x), vertical(x), temporal(x). There's a dependency
> chain between those 3 filters, so you need to interleave multiple loop
> iterations to get maximum throughput. OOE might theoretically handle that,
> but doesn't do so perfectly on the CPUs I tested.
> New code does vertical(x), horizontal(x+1), temporal(x); which requires
> less OOE.

Thanks for the explanation - and pushed.

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize
On Thu, 26 Jul 2012, Ronald S. Bultje wrote:
> Hi,
>
> On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
> > 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
> > ---
> >  libavfilter/vf_hqdn3d.c | 157 +++---
> >  1 files changed, 51 insertions(+), 106 deletions(-)
>
> Looks good.
>
> I am going to ask a very stupid question: why is this faster? I see a
> lot of simplification, which is good, but I'm not quite sure which
> part actually has a clear speed impact.

Old code's sline_offs and dline_offs confused gcc into incrementing the
src and dst pointers rather than using x as an index reg.

Old code did horizontal(x), vertical(x), temporal(x). There's a dependency
chain between those 3 filters, so you need to interleave multiple loop
iterations to get maximum throughput. OOE might theoretically handle that,
but doesn't do so perfectly on the CPUs I tested.
New code does vertical(x), horizontal(x+1), temporal(x); which requires
less OOE.

--Loren Merritt
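For readers following the reordering described above, here is a minimal sketch of the two loop orders for a single row. It is not the code from the patch: lowpass() is a hypothetical stand-in (a rounding average) for the filter's real table-driven low-pass, and row_old(), row_new() and orders_agree() are names made up for illustration. The variable names pixel_ant, line_ant and frame_ant are likewise assumptions. The sketch only shows that moving horizontal(x+1) between vertical(x) and temporal(x) leaves the output unchanged while giving the CPU an independent operation to overlap with the rest of the chain.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real low-pass: a rounding average of the
 * previous and current sample. Note that lowpass(a, a) == a. */
static uint32_t lowpass(uint32_t prev, uint32_t cur)
{
    return (prev + cur + 1) >> 1;
}

/* Old order: horizontal(x), vertical(x), temporal(x).
 * Each step consumes the result of the one before it, so one iteration
 * is a single serial chain of three dependent operations. */
static void row_old(const uint8_t *src, uint8_t *dst, uint32_t *line_ant,
                    uint32_t *frame_ant, long w)
{
    uint32_t pixel_ant = src[0], tmp;
    for (long x = 0; x < w; x++) {
        pixel_ant    = lowpass(pixel_ant, src[x]);            /* horizontal(x) */
        line_ant[x]  = tmp = lowpass(line_ant[x], pixel_ant); /* vertical(x)   */
        frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);      /* temporal(x)   */
        dst[x] = (uint8_t)tmp;
    }
}

/* New order: vertical(x), horizontal(x+1), temporal(x).
 * horizontal(x+1) does not depend on vertical(x) or temporal(x), so it can
 * overlap with them even without deep out-of-order lookahead.
 * Assumes w >= 1. */
static void row_new(const uint8_t *src, uint8_t *dst, uint32_t *line_ant,
                    uint32_t *frame_ant, long w)
{
    uint32_t pixel_ant = src[0], tmp; /* horizontal(0) is just src[0] */
    long x;
    for (x = 0; x < w - 1; x++) {
        line_ant[x]  = tmp = lowpass(line_ant[x], pixel_ant); /* vertical(x)     */
        pixel_ant    = lowpass(pixel_ant, src[x + 1]);        /* horizontal(x+1) */
        frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);      /* temporal(x)     */
        dst[x] = (uint8_t)tmp;
    }
    /* Last pixel has no right neighbour to prefetch. */
    line_ant[x]  = tmp = lowpass(line_ant[x], pixel_ant);
    frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);
    dst[x] = (uint8_t)tmp;
}

/* Returns 1 if both orders produce identical output and state. */
static int orders_agree(void)
{
    enum { W = 8 };
    const uint8_t src[W] = { 10, 200, 30, 40, 250, 0, 77, 128 };
    uint32_t line_a[W], line_b[W], frame_a[W], frame_b[W];
    uint8_t dst_a[W], dst_b[W];

    for (int i = 0; i < W; i++) {
        line_a[i]  = line_b[i]  = (uint32_t)(i * 31);
        frame_a[i] = frame_b[i] = (uint32_t)(255 - i * 17);
    }
    row_old(src, dst_a, line_a, frame_a, W);
    row_new(src, dst_b, line_b, frame_b, W);

    return memcmp(dst_a, dst_b, sizeof dst_a) == 0
        && memcmp(line_a, line_b, sizeof line_a) == 0
        && memcmp(frame_a, frame_b, sizeof frame_a) == 0;
}
```

Calling orders_agree() from a test harness should return 1. Both loops index src and dst with x rather than bumping the pointers, which is the addressing form the first paragraph above says gcc failed to produce for the old code.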
Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize
Hi,

On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
> 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
> ---
>  libavfilter/vf_hqdn3d.c | 157 +++---
>  1 files changed, 51 insertions(+), 106 deletions(-)

Looks good.

I am going to ask a very stupid question: why is this faster? I see a
lot of simplification, which is good, but I'm not quite sure which
part actually has a clear speed impact.

Ronald