2012/11/30 Loren Merritt <lor...@u.washington.edu>:
> cpu is more relevant than os.

Will amend the commit message, but then I may as well put both in each commit.

>> +; r0q=Y r1q=s_m r2q=q_filt r3q=noise r4q=max_m
>> +cglobal hf_apply_noise_main
>
> You can invoke DEFINE_ARGS even if not generating a prologue.

I didn't know about DEFINE_ARGS, will use it.

>> +    movh    m3, [r1q + r4q]
>> +    movh    m4, [r1q + r4q + 8]
>
> Can these be a single aligned load?

Yes, but I'm probably missing a trick here, because changing the loads above and the code that follows them to this:

    movu      m3, [s_mq + max_mq]
    mova      m4, m3
    unpcklps  m3, m3
    unpckhps  m4, m4

is slower (movhlps/unpcklps is even slower). Is there a way to do that in 3 insns?

>> +    cmpps   m6, m5, 0 ; m1 == 0
>> +    cmpps   m7, m5, 0 ; m1 == 0
>
> You mean m7 == 0?

Will correct; it is a remnant of the code from before unrolling.

>> +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
>> +    mova    m0, [ps_noise0]
>> +    mov     r4d, m_maxm
>> +    call    hf_apply_noise_main
>> +    RET
>
> TAIL_CALL hf_apply_noise_main, 1

Which makes me think that every caller should have the same epilogue (same stack offset etc.). Is there a way I can just do a jmp here and let the "jumpee" do the epilogue?

Another thing I'm wondering about (I can't check for the next 4 days):

    mov r4d, m_maxm

If I'm not mistaken, m_max should already be in r5 under the Linux x86_64/amd64 ABI (whatever I should call it). So I could save that mov and have hf_apply_noise_main use r5 instead under that condition. Does that make sense?

--
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel