On 07/25/2012 12:05 AM, Jason Garrett-Glaser wrote:
On Tue, Jul 24, 2012 at 9:02 AM, John Stebbins <stebb...@jetheaddev.com> wrote:
On 07/24/2012 05:53 PM, Jason Garrett-Glaser wrote:
On Tue, Jul 24, 2012 at 8:34 AM, Måns Rullgård <m...@mansr.com> wrote:
Jason Garrett-Glaser <ja...@x264.com> writes:

On Tue, Jul 24, 2012 at 8:05 AM, John Stebbins <stebb...@jetheaddev.com>
wrote:
On 06/25/2012 02:42 PM, Mans Rullgard wrote:
Module: libav
Branch: master
Commit: 82992604706144910f4a2f875d48cfc66c1b70d7

Author:    Mans Rullgard <m...@mansr.com>
Committer: Mans Rullgard <m...@mansr.com>
Date:      Sat Jun 23 19:08:11 2012 +0100

x86: fft: convert sse inline asm to yasm

---

    libavcodec/x86/Makefile    |    1 -
    libavcodec/x86/fft_mmx.asm |  139 ++++++++++++++++++++++++++++++++++++++++---
    libavcodec/x86/fft_sse.c   |  110 ----------------------------------
    3 files changed, 129 insertions(+), 121 deletions(-)

Hi,

This commit is causing some strange interaction with libx264 in
HandBrake under certain conditions.  x264 is encoding at about 1/10th
its normal rate after updating to this commit.

A little more background.  When doing ac3 passthru, HandBrake encodes a
single packet of silence data to ac3 that it uses for filling any gaps
that it detects in the audio.  Encoding of this packet happens before
any other encoding or decoding starts.  For some crazy reason, if we
encode this silence, we get the x264 slowdown.  If we do not encode the
silence, the speed is ok.  I ran gprof on the code to see where all the
time is being spent and it is all in x264.  So it's not like there is
some run-away loop somewhere that is bringing everything to its knees.
I'm guessing some cpu state must not be getting cleared or restored
properly somewhere.

John
Could it have anything to do with denormals/NaN?
Does x264 use floating-point SSE instructions anywhere?
Yes, in macroblock-tree (because floating-point reciprocal is fast and
IDIV is slow), and in ratecontrol.
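
One way to test the denormal/NaN idea is to scan the floating-point
buffers involved for suspicious values. The following is only a sketch:
the helper name count_suspect_floats is hypothetical and not part of
libav, x264, or HandBrake, and it assumes the data in question is
available as a plain array of float.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count the values in buf that are subnormal
 * (denormal), NaN, or infinite.  Subnormals can make floating-point
 * code run much slower on many x86 CPUs, so a non-zero count would be
 * a hint worth following up. */
static size_t count_suspect_floats(const float *buf, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int cls = fpclassify(buf[i]);

        if (cls == FP_SUBNORMAL || cls == FP_NAN || cls == FP_INFINITE)
            count++;
    }
    return count;
}
```

A count of zero for a given buffer would suggest looking elsewhere, for
example at CPU state left behind by the audio encoder.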


I don't know if it is of any help, but here are the top entries from gprof
when this slowdown is happening.
x264 defaults + b-adapt=2

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  19.56     26.71    26.71 x264_pixel_satd_16x4_internal_avx
  17.85     51.08    24.37 x264_pixel_satd_8x8_internal_avx
  10.22     65.03    13.95 x264_sub8x8_dct_avx.skip_prologue
   9.11     77.47    12.44 x264_hadamard_ac_8x8_avx
   9.08     89.87    12.40 x264_intra_sa8d_x9_8x8_avx
   5.08     96.81     6.94 x264_sub8x8_dct8_avx.skip_prologue
   2.96    100.85     4.04 x264_pixel_satd_4x4_avx
   2.45    104.20     3.35 x264_intra_satd_x9_4x4_avx
   1.80    106.66     2.46 x264_mc_chroma_avx
   1.58    108.82     2.16 x264_hpel_filter_avx
   1.46    110.81     1.99 x264_pixel_ssim_4x4x2_core_avx
   1.21    112.46     1.65 x264_add8x8_idct_avx.skip_prologue
   1.09    113.95     1.49 x264_pixel_ssd_16x16_avx
   1.09    115.44     1.49 x264_me_search_ref
   1.02    116.83     1.39 x264_add8x8_idct8_avx.skip_prologue

According to top, all CPUs are fully saturated.
That's an incredibly distorted profile -- it looks like all the AVX
functions are running incredibly slowly.

Note that none of those functions use 256-bit AVX, only 128-bit
AVX; Intel hasn't documented any sort of slowdown when mixing 128-bit
SSE and 128-bit AVX, which we do without problems.

Could the problem be that ffmpeg is doing 256-bit AVX, but then not
using vzeroupper afterwards?  Which CPU is this anyways?
Intel(R) Core(TM) i7-2677M
4GB ram
Ubuntu 12.04

The user that initially reported the issue is running
Intel Core i5-2500k
8GB DDR3-1600 RAM
Ubuntu 11.04 Natty Narwhal x64

I haven't tried this on a 32-bit machine. I can do that tomorrow if it's still needed. My laptop is running out of juice tonight and I left the brick at the office :(.

_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
