On 07/24/2012 05:53 PM, Jason Garrett-Glaser wrote:
On Tue, Jul 24, 2012 at 8:34 AM, Måns Rullgård <m...@mansr.com> wrote:
Jason Garrett-Glaser <ja...@x264.com> writes:

On Tue, Jul 24, 2012 at 8:05 AM, John Stebbins <stebb...@jetheaddev.com> wrote:
On 06/25/2012 02:42 PM, Mans Rullgard wrote:
Module: libav
Branch: master
Commit: 82992604706144910f4a2f875d48cfc66c1b70d7

Author:    Mans Rullgard <m...@mansr.com>
Committer: Mans Rullgard <m...@mansr.com>
Date:      Sat Jun 23 19:08:11 2012 +0100

x86: fft: convert sse inline asm to yasm

---

   libavcodec/x86/Makefile    |    1 -
   libavcodec/x86/fft_mmx.asm |  139 ++++++++++++++++++++++++++++++++++++++++---
   libavcodec/x86/fft_sse.c   |  110 ----------------------------------
   3 files changed, 129 insertions(+), 121 deletions(-)

Hi,

This commit is causing some strange interaction with libx264 in HandBrake
under certain conditions.  x264 is encoding at about 1/10th its normal rate
after updating to this commit.

A little more background.  When doing ac3 passthru, HandBrake encodes a
single packet of silence data to ac3 that it uses for filling any gaps that
it detects in the audio.  Encoding of this packet happens before any other
encoding or decoding starts.  For some crazy reason, if we encode this
silence, we get the x264 slowdown.  If we do not encode the silence, the
speed is ok.  I ran gprof on the code to see where all the time is being
spent and it is all in x264.  So it's not like there is some runaway loop
somewhere that is bringing everything to its knees.  I'm guessing some CPU
state must not be getting cleared or restored properly somewhere.

John
Could it have anything to do with denormals/NaN?
Does x264 use floating-point SSE instructions anywhere?
Yes, in macroblock-tree (because floating-point reciprocal is fast and
IDIV is slow), and in ratecontrol.



I don't know if it is of any help, but here are the top entries from gprof when this slowdown is happening.
x264 defaults + b-adapt=2

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 19.56     26.71    26.71 x264_pixel_satd_16x4_internal_avx
 17.85     51.08    24.37 x264_pixel_satd_8x8_internal_avx
 10.22     65.03    13.95 x264_sub8x8_dct_avx.skip_prologue
  9.11     77.47    12.44 x264_hadamard_ac_8x8_avx
  9.08     89.87    12.40 x264_intra_sa8d_x9_8x8_avx
  5.08     96.81     6.94 x264_sub8x8_dct8_avx.skip_prologue
  2.96    100.85     4.04 x264_pixel_satd_4x4_avx
  2.45    104.20     3.35 x264_intra_satd_x9_4x4_avx
  1.80    106.66     2.46 x264_mc_chroma_avx
  1.58    108.82     2.16 x264_hpel_filter_avx
  1.46    110.81     1.99 x264_pixel_ssim_4x4x2_core_avx
  1.21    112.46     1.65 x264_add8x8_idct_avx.skip_prologue
  1.09    113.95     1.49 x264_pixel_ssd_16x16_avx
  1.09    115.44     1.49 x264_me_search_ref
  1.02    116.83     1.39 x264_add8x8_idct8_avx.skip_prologue

According to top, all CPUs are fully saturated.
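
For anyone who wants to tally the flat profile above rather than eyeball it, here is a minimal sketch in Python. It is not part of gprof or x264; the helper names (`parse_rows`, `share_by_marker`) and the choice to group symbols by the `_avx` substring are assumptions made purely for illustration. It only sums the rows quoted in this message and says nothing by itself about the cause of the slowdown.

```python
import re

# Rows copied verbatim from the gprof flat profile quoted in this message.
# Columns: % time, cumulative seconds, self seconds, symbol name.
PROFILE = """\
 19.56     26.71    26.71 x264_pixel_satd_16x4_internal_avx
 17.85     51.08    24.37 x264_pixel_satd_8x8_internal_avx
 10.22     65.03    13.95 x264_sub8x8_dct_avx.skip_prologue
  9.11     77.47    12.44 x264_hadamard_ac_8x8_avx
  9.08     89.87    12.40 x264_intra_sa8d_x9_8x8_avx
  5.08     96.81     6.94 x264_sub8x8_dct8_avx.skip_prologue
  2.96    100.85     4.04 x264_pixel_satd_4x4_avx
  2.45    104.20     3.35 x264_intra_satd_x9_4x4_avx
  1.80    106.66     2.46 x264_mc_chroma_avx
  1.58    108.82     2.16 x264_hpel_filter_avx
  1.46    110.81     1.99 x264_pixel_ssim_4x4x2_core_avx
  1.21    112.46     1.65 x264_add8x8_idct_avx.skip_prologue
  1.09    113.95     1.49 x264_pixel_ssd_16x16_avx
  1.09    115.44     1.49 x264_me_search_ref
  1.02    116.83     1.39 x264_add8x8_idct8_avx.skip_prologue
"""

# Three decimal numbers followed by a symbol name (no calls/ms-per-call columns,
# matching the rows as they appear in the listing).
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\S+)\s*$")


def parse_rows(text):
    """Return a list of (name, percent, self_seconds) for each profile row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            percent, _cumulative, self_seconds, name = match.groups()
            rows.append((name, float(percent), float(self_seconds)))
    return rows


def share_by_marker(rows, marker="_avx"):
    """Return (percent in symbols containing marker, percent in all rows)."""
    matched = sum(percent for name, percent, _ in rows if marker in name)
    listed = sum(percent for _, percent, _ in rows)
    return matched, listed


if __name__ == "__main__":
    rows = parse_rows(PROFILE)
    avx, listed = share_by_marker(rows)
    print(f"{len(rows)} rows cover {listed:.2f}% of samples; "
          f"{avx:.2f}% is in symbols containing '_avx'")
```

Run on the rows above, this reports that the 15 listed entries cover about 85.6% of the samples and that all but about 1.1 percentage points of that sit in symbols whose names contain `_avx`.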


_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
