Hi,

2015-10-13 2:26 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>:
> On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
>> When the input of a pass has 15 or 16 bits of precision (in particular
>> the column pass), the addition of a bias to W4 may lead to overflows
>> in the input to pmaddwd.
>>
>> This requires postponing the adding of the bias to after the first
>> butterfly. To do so, the fact that m15, unused although zeroed, is
>> exploited. In case the pass is safe, an address can be directly used,
>> and the number of xmm regs can be decreased. Otherwise, the 32bits bias
>> is loaded into it.
>> ---
>> libavcodec/x86/proresdsp.asm | 8 ++++----
>> libavcodec/x86/simple_idct10_template.asm | 13 ++++++++++++-
>> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> how can i reproduce these overflows ?

Generate the vsynth3-dnxhd-1080i-10bit.mov sample, which is added by
another patch. Decode it first using faani (you could miss the error).

Now, for the parameters that fail. You know how
(1<<(%pass_bitdepth-1))/W4 is added to the first butterfly. The macro
allows passing the right pw_ constant to it (essentially
times 4 dw 1<<(%pass_bitdepth-1-14)), or "", in which case it expects
to find a pd_round_%pass_bitdepth (essentially
times 4 dd 1<<(%pass_bitdepth-1)). This is indicated in the comments of
the template: "Adding 1<<(%2-1) for >=15 bits values".

Contrast:
  "", 13, pw_8, 18, 0, pw_1023
  => stddev: 1.33 PSNR: 45.61 MAXDIFF: 255
  "", 12, pw_16, 19, 0, pw_1023
  => stddev: 0.33 PSNR: 57.61 MAXDIFF: 255
to the result of the current parameters (no difference).

The same input doesn't cause any issue for prores, for some reason,
probably because the mean DC (through times 4 dw 0x2008) is added in
the last pass.

-- 
Christophe
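
To illustrate the difference between the word bias (pw_) and the dword
bias (pd_round_) discussed above, here is a minimal Python sketch. It is
only a toy model, not the actual asm: the function names are
hypothetical, W4 = 1<<14 is inferred from the "-14" in the pw_
expression, and wrap-around 16-bit addition is an assumption about how
the word add behaves on overflow.

```python
# Toy model only: wrap-around 16-bit addition is an assumption, and the
# function names are hypothetical, not taken from the FFmpeg sources.
W4 = 1 << 14  # inferred from the "-14" in 1<<(%pass_bitdepth-1-14)


def wrap_int16(value):
    """Reduce value to a signed 16-bit lane, wrapping on overflow."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def bias_before_multiply(coeff, shift):
    """Add the word bias (pw_ style) to the coefficient, then multiply by W4."""
    word_bias = 1 << (shift - 1 - 14)
    return (wrap_int16(coeff + word_bias) * W4) >> shift


def bias_after_multiply(coeff, shift):
    """Multiply by W4 first, then add the dword bias (pd_round_ style)."""
    dword_bias = 1 << (shift - 1)
    return (coeff * W4 + dword_bias) >> shift


if __name__ == "__main__":
    # Compare both orderings for a small and a near-maximum 16-bit input.
    for coeff in (100, 32760):
        print(coeff, bias_before_multiply(coeff, 19), bias_after_multiply(coeff, 19))
```

With a shift of 19 (word bias 16, as in pw_16), both orderings agree
for a small coefficient such as 100. For a coefficient close to the
16-bit maximum, such as 32760, the early word addition wraps to a
negative value in this model and the two results differ, which is the
kind of failure that postponing the bias avoids.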