Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize

2012-07-28 Thread Ronald S. Bultje
Hi,

On Thu, Jul 26, 2012 at 6:40 PM, Loren Merritt wrote:
> On Thu, 26 Jul 2012, Ronald S. Bultje wrote:
>> On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
>> > 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
>> > ---
>> >  libavfilter/vf_hqdn3d.c |  157 +++---
>> >  1 files changed, 51 insertions(+), 106 deletions(-)
>>
>> Looks good.
>>
>> I am going to ask a very stupid question: why is this faster? I see a
>> lot of simplification, which is good, but I'm not quite sure which
>> part actually has a clear speed impact.
>
> Old code's sline_offs and dline_offs confused gcc into incrementing the
> src and dst pointers rather than using x as an index reg.
>
> Old code did horizontal(x), vertical(x), temporal(x). There's a dependency
> chain between those 3 filters, so you need to interleave multiple loop
> iterations to get maximum throughput. Out-of-order execution (OOE) might
> theoretically handle that, but doesn't do so perfectly on the CPUs I tested.
> New code does vertical(x), horizontal(x+1), temporal(x), which requires
> less OOE.

Thanks for the explanation - and pushed.

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize

2012-07-26 Thread Loren Merritt
On Thu, 26 Jul 2012, Ronald S. Bultje wrote:

> Hi,
>
> On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
> > 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
> > ---
> >  libavfilter/vf_hqdn3d.c |  157 +++---
> >  1 files changed, 51 insertions(+), 106 deletions(-)
>
> Looks good.
>
> I am going to ask a very stupid question: why is this faster? I see a
> lot of simplification, which is good, but I'm not quite sure which
> part actually has a clear speed impact.

Old code's sline_offs and dline_offs confused gcc into incrementing the
src and dst pointers rather than using x as an index reg.
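
A minimal sketch of the two addressing styles, not the actual patch. The
names sline_offs and dline_offs come from the message above, but how they are
computed here (stride minus width) is an assumption, and the halving
operation is a hypothetical stand-in for the real filter. Whether a given
compiler emits pointer increments or an indexed load for either form depends
on the compiler and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-bumping form: src and dst advance on every pixel and are then
 * corrected by a per-row offset (assumed here to be stride - width). */
void halve_rows_ptr(const uint8_t *src, uint8_t *dst, int w, int h,
                    ptrdiff_t sstride, ptrdiff_t dstride)
{
    ptrdiff_t sline_offs = sstride - w; /* padding skipped after each row */
    ptrdiff_t dline_offs = dstride - w;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            *dst++ = (uint8_t)((*src++ + 1) >> 1);
        src += sline_offs;
        dst += dline_offs;
    }
}

/* Indexed form: the row base pointers move once per row, and x is the only
 * value that changes inside the inner loop. */
void halve_rows_idx(const uint8_t *src, uint8_t *dst, int w, int h,
                    ptrdiff_t sstride, ptrdiff_t dstride)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (uint8_t)((src[x] + 1) >> 1);
        src += sstride;
        dst += dstride;
    }
}
```

Both functions write the same output; only the addressing pattern offered to
the compiler differs.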

Old code did horizontal(x), vertical(x), temporal(x). There's a dependency
chain between those 3 filters, so you need to interleave multiple loop
iterations to get maximum throughput. Out-of-order execution (OOE) might
theoretically handle that, but doesn't do so perfectly on the CPUs I tested.
New code does vertical(x), horizontal(x+1), temporal(x), which requires
less OOE.
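
The reordering can be sketched as follows. This is an illustration under
assumptions, not the code from the patch: lowpass() is a hypothetical
stand-in (the real filter is table-driven), the function and variable names
are made up for the example, and both functions assume w >= 1. In the second
loop the horizontal step for pixel x+1 does not depend on the vertical or
temporal result for pixel x, so the two chains can overlap within a single
iteration.

```c
#include <stdint.h>

/* Stand-in low-pass: move prev a quarter of the way toward cur. */
static inline int lowpass(int prev, int cur)
{
    return prev + (cur - prev) / 4;
}

/* Old order: horizontal(x), vertical(x), temporal(x) form one serial chain. */
void row_old_order(const uint8_t *src, uint8_t *dst,
                   int *line_ant, int *frame_ant, int w)
{
    int pixel_ant = src[0];
    for (int x = 0; x < w; x++) {
        int tmp;
        pixel_ant = lowpass(pixel_ant, src[x]);              /* horizontal(x) */
        line_ant[x] = tmp = lowpass(line_ant[x], pixel_ant); /* vertical(x)   */
        frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);     /* temporal(x)   */
        dst[x] = (uint8_t)tmp;
    }
}

/* New order: vertical(x), horizontal(x+1), temporal(x). */
void row_new_order(const uint8_t *src, uint8_t *dst,
                   int *line_ant, int *frame_ant, int w)
{
    int pixel_ant = src[0]; /* horizontal(0): lowpass(src[0], src[0]) == src[0] */
    int x, tmp;
    for (x = 0; x < w - 1; x++) {
        line_ant[x] = tmp = lowpass(line_ant[x], pixel_ant); /* vertical(x)     */
        pixel_ant = lowpass(pixel_ant, src[x + 1]);          /* horizontal(x+1) */
        frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);     /* temporal(x)     */
        dst[x] = (uint8_t)tmp;
    }
    /* Last pixel: there is no x+1 left to prefetch. */
    line_ant[x] = tmp = lowpass(line_ant[x], pixel_ant);
    frame_ant[x] = tmp = lowpass(frame_ant[x], tmp);
    dst[x] = (uint8_t)tmp;
}
```

Both orderings compute the same values for dst, line_ant and frame_ant; only
the order in which the operations are issued changes.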

--Loren Merritt


Re: [libav-devel] [PATCH 3/7] vf_hqdn3d: simplify and optimize

2012-07-26 Thread Ronald S. Bultje
Hi,

On Thu, Jul 26, 2012 at 3:51 PM, Loren Merritt wrote:
> 14% faster on penryn, 2% on sandybridge, 9% on bulldozer
> ---
>  libavfilter/vf_hqdn3d.c |  157 +++---
>  1 files changed, 51 insertions(+), 106 deletions(-)

Looks good.

I am going to ask a very stupid question: why is this faster? I see a
lot of simplification, which is good, but I'm not quite sure which
part actually has a clear speed impact.

Ronald