Re: [libav-devel] [libav-commits] x86: fft: convert sse inline asm to yasm

Ronald S. Bultje Tue, 24 Jul 2012 19:27:11 -0700

Hi,

On Tue, Jul 24, 2012 at 3:05 PM, Jason Garrett-Glaser <ja...@x264.com> wrote:
> On Tue, Jul 24, 2012 at 9:02 AM, John Stebbins <stebb...@jetheaddev.com> 
> wrote:
>> On 07/24/2012 05:53 PM, Jason Garrett-Glaser wrote:
>>>
>>> On Tue, Jul 24, 2012 at 8:34 AM, Måns Rullgård <m...@mansr.com> wrote:
>>>>
>>>> Jason Garrett-Glaser <ja...@x264.com> writes:
>>>>
>>>>> On Tue, Jul 24, 2012 at 8:05 AM, John Stebbins <stebb...@jetheaddev.com>
>>>>> wrote:
>>>>>>
>>>>>> On 06/25/2012 02:42 PM, Mans Rullgard wrote:
>>>>>>>
>>>>>>> Module: libav
>>>>>>> Branch: master
>>>>>>> Commit: 82992604706144910f4a2f875d48cfc66c1b70d7
>>>>>>>
>>>>>>> Author:    Mans Rullgard <m...@mansr.com>
>>>>>>> Committer: Mans Rullgard <m...@mansr.com>
>>>>>>> Date:      Sat Jun 23 19:08:11 2012 +0100
>>>>>>>
>>>>>>> x86: fft: convert sse inline asm to yasm
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>>    libavcodec/x86/Makefile    |    1 -
>>>>>>>    libavcodec/x86/fft_mmx.asm |  139
>>>>>>> ++++++++++++++++++++++++++++++++++++++++---
>>>>>>>    libavcodec/x86/fft_sse.c   |  110
>>>>>>> ----------------------------------
>>>>>>>    3 files changed, 129 insertions(+), 121 deletions(-)
>>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This commit is causing some strange interaction with libx264 in
>>>>>> HandBrake
>>>>>> under certain conditions.  x264 is encoding at about 1/10th its normal
>>>>>> rate
>>>>>> after updating to this commit.
>>>>>>
>>>>>> A little more background.  When doing ac3 passthru HandBrake encodes a
>>>>>> single packet of silence data to ac3 that it uses for filling any gaps
>>>>>> that
>>>>>> it detects in the audio.  Encoding of this packet happens before any
>>>>>> other
>>>>>> encoding or decoding starts. For some crazy reason, if we encode this
>>>>>> silence, we get the x264 slowdown.  If we do not encode the silence,
>>>>>> the
>>>>>> speed is ok.  I ran gprof on the code to see where all the time is
>>>>>> being
>>>>>> spent and it is all in x264.  So it's not like there is some run-away
>>>>>> loop
>>>>>> somewhere that is bringing everything to its knees.  I'm guessing some
>>>>>> cpu
>>>>>> state must not be getting cleared or restored properly somewhere.
>>>>>>
>>>>>> John
>>>>>
>>>>> Could it have anything to do with denormals/NaN?
>>>>
>>>> Does x264 use floating-point SSE instructions anywhere?
>>>
>>> Yes, in macroblock-tree (because floating-point reciprocal is fast and
>>> IDIV is slow), and in ratecontrol.
>>>
>>>
>>
>> I don't know if it is of any help, but here's the top entries from gprof
>> when this slowdown is happening.
>> x264 defaults + b-adapt=2
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  19.56     26.71    26.71 x264_pixel_satd_16x4_internal_avx
>>  17.85     51.08    24.37 x264_pixel_satd_8x8_internal_avx
>>  10.22     65.03    13.95 x264_sub8x8_dct_avx.skip_prologue
>>   9.11     77.47    12.44 x264_hadamard_ac_8x8_avx
>>   9.08     89.87    12.40 x264_intra_sa8d_x9_8x8_avx
>>   5.08     96.81     6.94 x264_sub8x8_dct8_avx.skip_prologue
>>   2.96    100.85     4.04 x264_pixel_satd_4x4_avx
>>   2.45    104.20     3.35 x264_intra_satd_x9_4x4_avx
>>   1.80    106.66     2.46 x264_mc_chroma_avx
>>   1.58    108.82     2.16 x264_hpel_filter_avx
>>   1.46    110.81     1.99 x264_pixel_ssim_4x4x2_core_avx
>>   1.21    112.46     1.65 x264_add8x8_idct_avx.skip_prologue
>>   1.09    113.95     1.49 x264_pixel_ssd_16x16_avx
>>   1.09    115.44     1.49 x264_me_search_ref
>>   1.02    116.83     1.39 x264_add8x8_idct8_avx.skip_prologue
>>
>> According to top, all CPUs are fully saturated
>
> That's an incredibly distorted profile -- it looks like all the AVX
> functions are running incredibly slowly.
>
> Note that all those functions do not use 256-bit AVX, only 128-bit
> AVX; Intel hasn't documented any sort of slowdown when mixing 128-bit
> SSE and 128-bit AVX, which we do without problems.
>
> Could the problem be that ffmpeg is doing 256-bit AVX, but then not
> using vzeroupper afterwards?  Which CPU is this anyways?


Do the x264 functions sign-extend all their integer arguments? Or put
differently, does the problem occur for 32-bit builds also, or only
for 64-bit builds?
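
To illustrate why the 32-bit/64-bit distinction matters: on x86-64 the
calling conventions do not guarantee the upper 32 bits of a register
that carries a 32-bit int argument, so asm that uses the full 64-bit
register (e.g. as a stride) without sign-extending first can compute a
wrong address. Below is a minimal sketch in portable C that only models
this; the helper names are hypothetical and come from neither x264 nor
libav.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit register carrying a 32-bit int argument:
 * the low half holds the value, the high half holds whatever was left there. */
static uint64_t reg_with_garbage(int32_t value, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | (uint32_t)value;
}

/* What a movsxd-style sign extension yields: the low 32 bits interpreted
 * as a signed value and widened to 64 bits; the high half is ignored. */
static int64_t sign_extend32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    return (low & 0x80000000u) ? (int64_t)low - ((int64_t)1 << 32)
                               : (int64_t)low;
}

/* Address computed from the full register, with no sign extension. */
static uint64_t address_raw(uint64_t base, uint64_t reg)
{
    return base + reg;
}

/* Address computed after sign-extending the 32-bit argument. */
static uint64_t address_extended(uint64_t base, uint64_t reg)
{
    return base + (uint64_t)sign_extend32(reg);
}
```

With a stride of -16 and a zeroed upper half, address_raw() lands almost
4 GiB above the base, while address_extended() lands 16 bytes below it.
On a 32-bit build the registers are only 32 bits wide, so the question
does not arise there.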

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel