2012/11/30 Loren Merritt <lor...@u.washington.edu>:
> cpu is more relevant than os.

I'll amend the commit message; in that case I may as well mention both (CPU and OS) in each commit message.

>> +; r0q=Y   r1q=s_m   r2q=q_filt   r3q=noise  r4q=max_m
>> +cglobal hf_apply_noise_main
>
> You can invoke DEFINE_ARGS even if not generating a prologue.

I didn't know about DEFINE_ARGS; I'll use it.

>> +  movh       m3, [r1q + r4q]
>> +  movh       m4, [r1q + r4q + 8]
>
> Can these be a single aligned load?

Yes, but I'm probably missing a trick here, because changing the code
above and the code that follows to:
    movu       m3, [s_mq + max_mq]
    mova       m4, m3
    unpcklps   m3, m3
    unpckhps   m4, m4
is slower (movhlps/unpcklps is even slower).
Is there a way to do it in 3 instructions?
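For reference, here is a scalar C model (not SSE intrinsics) of what the four-instruction sequence above computes: four consecutive s_m values {a, b, c, d} become {a, a, b, b} and {c, c, d, d}. The vec4 type and the model_* helpers are hypothetical names used only for illustration. The duplication presumably lines up each s_m value with an interleaved (re, im) pair of Y; that is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-lane float vector standing in for an XMM register. */
typedef struct { float f[4]; } vec4;

/* Scalar model of `unpcklps dst, src`: {dst0, src0, dst1, src1}. */
static vec4 model_unpcklps(vec4 dst, vec4 src)
{
    vec4 r = { { dst.f[0], src.f[0], dst.f[1], src.f[1] } };
    return r;
}

/* Scalar model of `unpckhps dst, src`: {dst2, src2, dst3, src3}. */
static vec4 model_unpckhps(vec4 dst, vec4 src)
{
    vec4 r = { { dst.f[2], src.f[2], dst.f[3], src.f[3] } };
    return r;
}

/* Models the sequence
 *     movu     m3, [s_m]
 *     mova     m4, m3
 *     unpcklps m3, m3
 *     unpckhps m4, m4
 * Four consecutive values {a, b, c, d} become lo = {a, a, b, b}
 * and hi = {c, c, d, d}. */
static void model_dup_pairs(const float *s_m, vec4 *lo, vec4 *hi)
{
    vec4 m3, m4;

    assert(s_m != NULL && lo != NULL && hi != NULL);
    for (int i = 0; i < 4; i++)   /* movu: plain 4-float load */
        m3.f[i] = s_m[i];
    m4 = m3;                      /* mova: register copy */
    *lo = model_unpcklps(m3, m3); /* unpcklps m3, m3 */
    *hi = model_unpckhps(m4, m4); /* unpckhps m4, m4 */
}
```

For example, model_dup_pairs() on {1, 2, 3, 4} yields {1, 1, 2, 2} and {3, 3, 4, 4}.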

>> +  cmpps      m6, m5, 0 ; m1 == 0
>> +  cmpps      m7, m5, 0 ; m1 == 0
>
> You mean m7 == 0?

Will fix; it's a leftover from the code before unrolling.
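For reference, cmpps with immediate 0 uses the EQ predicate: each 32-bit lane of the destination becomes all ones where the two operands compare equal and all zeros otherwise (NaN never compares equal, and -0.0 equals 0.0). The scalar C sketch below models one lane of that mask and the usual and/andn/or select built on it. It assumes, hypothetically, that m5 holds zero and that the mask picks the noise term where s_m is zero; all helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of one lane of `cmpps a, b, 0` (predicate 0 = EQ):
 * all ones if a == b, all zeros otherwise (false for NaN). */
static uint32_t model_cmpeq_lane(float a, float b)
{
    return a == b ? 0xFFFFFFFFu : 0u;
}

/* Reinterpret a float's bits as an integer (and back) without UB. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumes 32-bit floats */
    memcpy(&u, &x, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Branchless per-lane select, as done with andps/andnps/orps:
 * returns if_zero where s == 0.0f, otherwise if_nonzero.
 * Hypothetical helper mirroring an `if (s_m[m]) ... else ...` branch. */
static float model_select_on_zero(float s, float if_zero, float if_nonzero)
{
    uint32_t mask = model_cmpeq_lane(s, 0.0f);
    uint32_t r = (mask & float_bits(if_zero)) |
                 (~mask & float_bits(if_nonzero));
    return bits_float(r);
}
```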

>> +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
>> +  mova       m0, [ps_noise0]
>> +  mov       r4d, m_maxm
>> +  call      hf_apply_noise_main
>> +  RET
>
> TAIL_CALL hf_apply_noise_main, 1

Which makes me think that every caller must have the same epilogue
(same stack offset, etc.). Is there a way to just do a jmp here and let
the "jumpee" do the epilogue?

Another thing I'm wondering about (I can't check it for the next 4 days):
    mov       r4d, m_maxm
If I'm not mistaken, m_max should already be in r5 under the Linux
x86_64 (System V AMD64) ABI.
So I could drop that mov and instead have hf_apply_noise_main use r5
in that case.

Does that make sense?
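A small C sketch of the argument-register assignment being relied on here, using the argument order from the cglobal line (Y, s_m, q_filt, noise, kx, m_max, so m_max is argument index 5). Under the System V AMD64 ABI the first six integer/pointer arguments are passed in rdi, rsi, rdx, rcx, r8, r9, which x86inc exposes as r0..r5 on UNIX64, so m_max would already be in r5 (r9). Under Win64 only four arguments are passed in registers (rcx, rdx, r8, r9), so kx and m_max arrive on the stack and the load would still be needed there (as it would on x86_32, where all arguments are on the stack). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum abi { ABI_SYSV_X86_64, ABI_WIN64 };

/* Register holding integer/pointer argument `idx` (0-based), or NULL if
 * it is passed on the stack. Only covers the first six arguments. */
static const char *arg_reg(enum abi abi, int idx)
{
    static const char *const sysv[6]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const win64[4] = { "rcx", "rdx", "r8", "r9" };

    assert(idx >= 0 && idx < 6);
    if (abi == ABI_SYSV_X86_64)
        return sysv[idx];
    return idx < 4 ? win64[idx] : NULL;
}

/* Nonzero if argument `idx` is passed in register `reg` under `abi`. */
static int arg_in_reg(enum abi abi, int idx, const char *reg)
{
    const char *r = arg_reg(abi, idx);
    return r != NULL && strcmp(r, reg) == 0;
}
```

For example, arg_in_reg(ABI_SYSV_X86_64, 5, "r9") is true, while arg_reg(ABI_WIN64, 5) is NULL because m_max is on the stack there.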

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
