Re: [FFmpeg-devel] [PATCH] hevc: fix WPP mode
Hi Christophe,

the fix looks weird to me. There is something else underlying.

Mickaël

2015-09-23 16:53 GMT+02:00 Ronald S. Bultje:
> Hi,
>
> On Wed, Sep 23, 2015 at 10:33 AM, Christophe Gisquet <
> christophe.gisq...@gmail.com> wrote:
>
> > Hi,
> >
> > under highly-threaded loads, parallel decoding of WPP is subject to a
> > race condition.
> >
> > This basically fixes ticket #4365.
>
> Nice catch! Lgtm.
>
> Ronald

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 4/5] x86: hevc_mc: fewer xmm regs used in epel h/v
Looks better to me.

Mickaël

On Tuesday, 17 February 2015, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> 2015-02-17 8:28 GMT+01:00 Mickaël Raulet mrau...@insa-rennes.fr:
> > It seems to me that you are assigning 8 when it is avx2 instead of 11.
> > Shouldn't it be the opposite? At least this is what the commit message says.
>
> Huh, brainfart... And the fact that I can't easily test avx2 doesn't help.
> So here's a patch with the values swapped out.
Re: [FFmpeg-devel] hevc : support deinterlacing inside the decoder
I was trying the sample from #4141 and it seems to me that the number of frames to keep before outputting a frame is too low. Where does this bitstream come from?

The previous commit was available here:
https://github.com/OpenHEVC/FFmpeg/commit/6b93a7a175fb500d1f5d4d671b2fab73798ca7b6

This commit adds support for another kind of pic_struct I was not using before (used in this sample):
https://github.com/OpenHEVC/FFmpeg/commit/e360c4077e9658669e80424e63fdcf07400290c7

To display it smoothly you need to add one extra field in hevc_refs.c (I don't think I have to increase it by one, but I might be wrong):

    nb_output = s->sps->temporal_layer[s->sps->max_sub_layers - 1].num_reorder_pics + s->interlaced + 1

2015-02-08 20:09 GMT+01:00 Carl Eugen Hoyos ceho...@ag.or.at:
> Kacper Michajłow kasper93 at gmail.com writes:
> > 2015-02-08 10:48 GMT+01:00 Carl Eugen Hoyos:
> > > Mickaël Raulet mraulet at insa-rennes.fr writes:
> > > > As we can consider, we won't have 4k interlaced content, copying a
> > > > field into a frame should be ok. This is what has been done in this
> > > > implementation.
> > >
> > > Do you have a sample? I am only interested in testing this.
> >
> > If I understand correctly, this should fix this ticket #4141.
> > Sample is included.
>
> The patch - unfortunately! - makes no difference for this sample.
>
> Carl Eugen
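The output-delay rule given in the message can be sketched in plain C. This is only a minimal sketch under stated assumptions, not the code from hevc_refs.c: the type and function names (`SketchSPS`, `SketchTemporalLayer`, `frames_before_output`) are hypothetical, and only the arithmetic `num_reorder_pics + interlaced + 1` comes from the message. Treating `interlaced` as a 0/1 flag is also an assumption.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-ins for the decoder state; only the
 * fields needed for the formula in the message are modelled. */
#define SKETCH_MAX_SUB_LAYERS 7

typedef struct SketchTemporalLayer {
    int num_reorder_pics;
} SketchTemporalLayer;

typedef struct SketchSPS {
    int max_sub_layers;                                   /* 1..7 */
    SketchTemporalLayer temporal_layer[SKETCH_MAX_SUB_LAYERS];
} SketchSPS;

/* Number of pictures to hold back before the first one is output:
 * the reorder depth of the highest sub-layer, plus one extra entry when
 * the stream is field-coded, plus the "+1" the message is unsure about. */
static int frames_before_output(const SketchSPS *sps, int interlaced)
{
    assert(sps->max_sub_layers >= 1 &&
           sps->max_sub_layers <= SKETCH_MAX_SUB_LAYERS);
    return sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics
           + (interlaced ? 1 : 0) + 1;
}
```

With a reorder depth of 2, this holds back 3 pictures for progressive content and 4 for field-coded content, which is the extra slack the message argues the sample needs.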
[FFmpeg-devel] DSP function ARM NEON patches for hevc
Michael,

Please find some commits that can be cherry-picked from
https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch

Optimized deblocking filter (8 bits only)
1b9ee47d2f43b0a029a9468233626102eb1473b8

Optimized transform functions (4x4, 8x8, transform add, 8 bits only)
b153f55935969c794de4640f8d34e01c58e027ae

ARM NEON optimized qpel functions (8 bits only)
965cd82e376f17125c0ad6465d14f4ab1749fda1

Comments welcome if there are any. More coming soon for epel and SAO!

Mickaël
[FFmpeg-devel] hevc : support deinterlacing inside the decoder
Since we can assume there won't be any 4k interlaced content, copying a field into a frame should be ok. This is what has been done in this implementation.

Commit hash from ffmpeg/openhevc: 6b93a7a175fb500d1f5d4d671b2fab73798ca7b6

Comments welcome!

Mickaël
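"Copying a field into a frame" can be illustrated with a short C sketch. This is not taken from the referenced commit: `copy_field_to_frame` and its parameters are hypothetical, and the sketch assumes a single 8-bit plane where each decoded field holds every other row of the output frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one decoded field (field_height rows) into every other row of a
 * frame. bottom_field = 0 writes rows 0, 2, 4, ...; bottom_field = 1
 * writes rows 1, 3, 5, ... Strides are in bytes. */
static void copy_field_to_frame(uint8_t *frame, ptrdiff_t frame_stride,
                                const uint8_t *field, ptrdiff_t field_stride,
                                int width, int field_height, int bottom_field)
{
    assert(width >= 0 && field_height >= 0);

    uint8_t *dst = frame + (bottom_field ? frame_stride : 0);
    for (int y = 0; y < field_height; y++) {
        memcpy(dst, field, (size_t)width);
        dst   += 2 * frame_stride;   /* skip the row owned by the other field */
        field += field_stride;
    }
}
```

Calling it once for the top field and once for the bottom field weaves both into a full frame; the cost is one extra copy per field, which is the trade-off the message accepts on the grounds that very large interlaced content is unlikely.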
Re: [FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}
LGTM

Mickael

2015-02-04 13:39 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
> > Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
> > Refactoring and optimizations by James Almer.
>
> Add your own copyright to this file then.
>
> > Width 32
> > 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
> > 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
> > 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
> >
> > Width 64
> > 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
> > 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
> > 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
>
> Are the first numbers for each case from before you split out the
> restore part? Otherwise, that's gruesome.
>
> > -void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> > -ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> > -int width, int height);
> > +void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> > + ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> > + int width, int height);
>
> Maybe add a comment on top of that to indicate that _dst is
> 16-byte-aligned? Also, src and stride_src are such that the buffer is
> 32-byte-aligned, because of:
>
>     stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
>     dst = lc->edge_emu_buffer + stride_dst + FF_INPUT_BUFFER_PADDING_SIZE;
>
> in hevc_filter.c, but I'm not sure how much of a benefit it is here, or
> how often it helps. Don't hesitate to modify them if need be.
>
> > +%else ; ARCH_X86_32
> > +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride, srcstride, a_stride, b_stride, height
>
> As seen from above, srcstride is constant and is 2*MAX_PB_SIZE +
> FF_INPUT_BUFFER_PADDING_SIZE. That may save you one whole gpr. Not
> really useful here, but I think you are more limited for the >8 bits case.
> If you want to exploit this, also add it above void (*sao_edge_filter[5])
>
> No comment on the actual assembly, it looks fine.
>
> --
> Christophe
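The alignment argument in the review can be checked with a little arithmetic. The sketch below is an illustration only: the constant values (64 for `MAX_PB_SIZE`, 32 for `FF_INPUT_BUFFER_PADDING_SIZE`) are assumptions not stated in the message, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not taken from the message. */
#define SKETCH_MAX_PB_SIZE   64
#define SKETCH_PADDING_SIZE  32

/* Stride of the temporary buffer, as in the first quoted line. */
static ptrdiff_t sketch_stride(void)
{
    return 2 * SKETCH_MAX_PB_SIZE + SKETCH_PADDING_SIZE;
}

/* Byte offset of the working pointer from the start of the buffer,
 * as in the second quoted line. */
static ptrdiff_t sketch_offset(void)
{
    return sketch_stride() + SKETCH_PADDING_SIZE;
}

/* Nonzero if every row start is a multiple of `align` bytes, assuming
 * the buffer base itself is aligned to `align`. */
static int rows_aligned(ptrdiff_t align)
{
    assert(align > 0);
    return sketch_offset() % align == 0 && sketch_stride() % align == 0;
}
```

Under those assumed values the stride is 160 and the offset is 192, so row starts stay 32-byte-aligned but not 64-byte-aligned. Because the stride is a compile-time constant, an assembly routine could hard-code it instead of spending a register on it, which is the saving the review points at.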
Re: [FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}
LGTM.

Mickael

2015-02-04 13:51 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1)= { 0x0001000100010001ULL, 0x0001000100010001ULL };
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2)= { 0x0002000200020002ULL, 0x0002000200020002ULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_1)= { 0x0001000100010001ULL, 0x0001000100010001ULL,
> > + 0x0001000100010001ULL, 0x0001000100010001ULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_2)= { 0x0002000200020002ULL, 0x0002000200020002ULL,
> > + 0x0002000200020002ULL, 0x0002000200020002ULL };
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_3)= { 0x0003000300030003ULL, 0x0003000300030003ULL };
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_4)= { 0x0004000400040004ULL, 0x0004000400040004ULL };
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_5)= { 0x0005000500050005ULL, 0x0005000500050005ULL };
> > @@ -48,7 +50,8 @@ DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1019) = { 0x03FB03FB03FB03FBULL, 0x03F
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1024) = { 0x0400040004000400ULL, 0x0400040004000400ULL };
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2048) = { 0x0800080008000800ULL, 0x0800080008000800ULL };
> >  DECLARE_ALIGNED(16, const xmm_reg, ff_pw_8192) = { 0x2000200020002000ULL, 0x2000200020002000ULL };
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_m1) = { 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_m1) = { 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL,
> > + 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };
>
> Nice of you to do this. There is more sharing to do, but I have patches
> waiting for your patchset and the avx2 patch to clean even more.
>
> > +;void ff_hevc_sao_edge_filter_width_depth_opt(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst, ptrdiff_t stride_src,
> > +; int16_t *sao_offset_val, int eo, int width, int height);
> > +%macro HEVC_SAO_EDGE_FILTER_16 3
> > +%if WIN64
> > +cglobal hevc_sao_edge_filter_%2_%1, 4, 8, 16, dst, src, dststride, srcstride, eo, a_stride, b_stride, height
>
> Ok, nevermind my comment in patch 5/6: 16 xmm regs are too much for
> x86_32. Or playing with the stack is required, but that would be another
> patch, if ever.
>
> Otherwise, nothing striking in that code, looks good.
>
> Thanks,
> --
> Christophe
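The point of widening `ff_pw_1` and friends from 16 to 32 bytes is that one constant can then serve both 128-bit and 256-bit code paths. A minimal C11 sketch of that idea follows; it uses `alignas` rather than FFmpeg's `DECLARE_ALIGNED` macro, and `SketchYmm`, `sketch_pw_1`, `is_aligned` and `words_equal` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 256-bit register image: four 64-bit lanes. */
typedef struct SketchYmm { uint64_t q[4]; } SketchYmm;

/* One 32-byte-aligned constant holding sixteen 16-bit words equal to 1. */
static alignas(32) const SketchYmm sketch_pw_1 = {
    { 0x0001000100010001ULL, 0x0001000100010001ULL,
      0x0001000100010001ULL, 0x0001000100010001ULL }
};

static int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* Nonzero if the first nbytes of the constant are all 16-bit words equal
 * to v. nbytes = 16 models an xmm-sized load, nbytes = 32 a ymm-sized one. */
static int words_equal(const SketchYmm *c, size_t nbytes, uint16_t v)
{
    assert(nbytes <= sizeof(*c) && nbytes % 2 == 0);
    for (size_t i = 0; i < nbytes; i += 2) {
        uint16_t w;
        memcpy(&w, (const unsigned char *)c + i, sizeof(w));
        if (w != v)
            return 0;
    }
    return 1;
}
```

A 32-byte-aligned object is also 16-byte-aligned, so existing 128-bit users can keep loading the first half unchanged while 256-bit code reads the whole constant.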
Re: [FFmpeg-devel] [PATCH 2/6] hevcdsp: simplified sao_edge_filter
OK too.

2015-02-04 8:04 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
> > +int a_stride, b_stride;
> > +int src_offset = 0;
> > +int dst_offset = 0;
>
> Could maybe use ptrdiff_t type, like the other strides? With or without, ok.
>
> --
> Christophe
Re: [FFmpeg-devel] [PATCH 4/6] hevcdsp: replace the SAOParams struct parameter from sao_edge_filter
lgtm.

Mickael

2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
> As with sao_band_filter, pass instead the two variables from the struct
> needed in the function.
> This simplifies writing asm optimized versions.
>
> Signed-off-by: James Almer jamr...@gmail.com
> ---
> libavcodec/hevc_filter.c | 4 +++-
> libavcodec/hevcdsp.h | 4 ++--
> libavcodec/hevcdsp_template.c | 7 ++-
> 3 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/libavcodec/hevc_filter.c b/libavcodec/hevc_filter.c
> index b002d87..bf88b47 100644
> --- a/libavcodec/hevc_filter.c
> +++ b/libavcodec/hevc_filter.c
> @@ -427,7 +427,9 @@ static void sao_filter_CTB(HEVCContext *s, int x, int y)
>  copy_CTB_to_hv(s, src, stride_src, x0, y0, width, height, c_idx, x_ctb, y_ctb);
> -s->hevcdsp.sao_edge_filter(src, dst, stride_src, stride_dst, sao, width, height, c_idx);
> +s->hevcdsp.sao_edge_filter(src, dst, stride_src, stride_dst,
> + sao->offset_val[c_idx], sao->eo_class[c_idx],
> + width, height);
>  s->hevcdsp.sao_edge_restore[restore](src, dst, stride_src, stride_dst, sao,
> diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h
> index 53d7b1b..1510f39 100644
> --- a/libavcodec/hevcdsp.h
> +++ b/libavcodec/hevcdsp.h
> @@ -62,8 +62,8 @@ typedef struct HEVCDSPContext {
>  int16_t *sao_offset_val, int sao_left_class, int width, int height);
>  void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> -ptrdiff_t stride_src, SAOParams *sao, int width,
> -int height, int c_idx);
> +ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> +int width, int height);
>  void (*sao_edge_restore[2])(uint8_t *_dst, uint8_t *_src, ptrdiff_t _stride_dst, ptrdiff_t _stride_src,
>  struct SAOParams *sao, int *borders, int _width, int _height, int c_idx,
> diff --git a/libavcodec/hevcdsp_template.c b/libavcodec/hevcdsp_template.c
> index 4479435..ac98709 100644
> --- a/libavcodec/hevcdsp_template.c
> +++ b/libavcodec/hevcdsp_template.c
> @@ -328,9 +328,8 @@ static void FUNC(sao_band_filter_0)(uint8_t *_dst, uint8_t *_src,
>  #define CMP(a, b) ((a) > (b) ? 1 : ((a) == (b) ? 0 : -1))
>  static void FUNC(sao_edge_filter)(uint8_t *_dst, uint8_t *_src,
> - ptrdiff_t stride_dst, ptrdiff_t stride_src, SAOParams *sao,
> - int width, int height,
> - int c_idx) {
> + ptrdiff_t stride_dst, ptrdiff_t stride_src, int16_t *sao_offset_val,
> + int eo, int width, int height) {
>  static const uint8_t edge_idx[] = { 1, 2, 0, 3, 4 };
>  static const int8_t pos[4][2][2] = {
> @@ -339,8 +338,6 @@ static void FUNC(sao_edge_filter)(uint8_t *_dst, uint8_t *_src,
>  { { -1, -1 }, { 1, 1 } }, // 45 degree
>  { { 1, -1 }, { -1, 1 } }, // 135 degree
>  };
> -int16_t *sao_offset_val = sao->offset_val[c_idx];
> -int eo = sao->eo_class[c_idx];
>  pixel *dst = (pixel *)_dst;
>  pixel *src = (pixel *)_src;
>  int a_stride, b_stride;
> --
> 2.2.2
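The pieces visible in the diff context (`CMP`, `edge_idx`, and the `sao_offset_val` parameter) fit together as sketched below. This is a simplified 8-bit, single-row illustration and not the template function itself: `sao_edge_row`, `clip_u8` and the `a_off`/`b_off` parameters are hypothetical names, and the neighbour offsets are passed in directly instead of being derived from the edge class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMP(a, b) ((a) > (b) ? 1 : ((a) == (b) ? 0 : -1))

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Filter `width` pixels of one row. a_off and b_off are the signed
 * offsets to the two neighbours selected by the edge class (for the
 * horizontal class: -1 and +1). The caller must guarantee that
 * src[x + a_off] and src[x + b_off] are readable for every x. */
static void sao_edge_row(uint8_t *dst, const uint8_t *src, int width,
                         ptrdiff_t a_off, ptrdiff_t b_off,
                         const int16_t *sao_offset_val)
{
    static const uint8_t edge_idx[] = { 1, 2, 0, 3, 4 };

    assert(width >= 0);
    for (int x = 0; x < width; x++) {
        /* Each comparison yields -1, 0 or 1, so the sum is in -2..2. */
        int diff0 = CMP(src[x], src[x + a_off]);
        int diff1 = CMP(src[x], src[x + b_off]);
        int idx   = edge_idx[2 + diff0 + diff1];
        dst[x] = clip_u8(src[x] + sao_offset_val[idx]);
    }
}
```

Because the kernel only needs the offset table and the edge class, passing those two values instead of the whole `SAOParams` struct keeps the signature free of struct layout knowledge, which is what makes it easier to write in assembly.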
Re: [FFmpeg-devel] [PATCH 3/6] hevcdsp: further simplify sao_edge_filter
ok.

2015-02-04 8:07 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
> > [...]
>
> Ok, no need to resend a refreshed patch if patch 2/6 changes.
>
> --
> Christophe
Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves
I will check it this evening.

Mickaël

2015-02-03 15:15 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> 2015-02-03 12:57 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> > Actually, 940300945 does need to be reverted for the patch to work, as
> > Mickael stated. It miscompiles hevc_mc.asm, more particularly the
> > [eq]pel_hv functions. No idea why.
>
> The patch in "[PATCH] x86: lavu/x264asm: fix ymm register instanciation"
> fixes the generated assembly for me.
>
> Mickaël and/or James, could you confirm that? I'll submit it to the x264
> project afterwards.
>
> --
> Christophe
Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves
It works now for me.

Thanks,
Mickaël

On 3 Feb 2015, at 15:28, Mickaël Raulet mrau...@insa-rennes.fr wrote:
> I will check it this evening.
>
> Mickaël
>
> 2015-02-03 15:15 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> > 2015-02-03 12:57 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> > > Actually, 940300945 does need to be reverted for the patch to work, as
> > > Mickael stated. It miscompiles hevc_mc.asm, more particularly the
> > > [eq]pel_hv functions. No idea why.
> >
> > The patch in "[PATCH] x86: lavu/x264asm: fix ymm register instanciation"
> > fixes the generated assembly for me.
> >
> > Mickaël and/or James, could you confirm that? I'll submit it to the x264
> > project afterwards.
> >
> > --
> > Christophe
[FFmpeg-devel] support for monochrome sequences in hevc decoder
Hi,

here is a commit that supports monochrome sequences!
https://github.com/OpenHEVC/FFmpeg/commit/8e50557707d2ec11ccad657470b2e140f314348e

Commit hash: 8e50557707d2ec11ccad657470b2e140f314348e

Mickael
Re: [FFmpeg-devel] support for monochrome sequences in hevc decoder
I did it first :)

2015-02-02 16:47 GMT+01:00 Michael Niedermayer michae...@gmx.at:
> On Mon, Feb 02, 2015 at 04:11:33PM +0100, Mickaël Raulet wrote:
> > Hi here is a commit that support monochrome sequences!
> > https://github.com/OpenHEVC/FFmpeg/commit/8e50557707d2ec11ccad657470b2e140f314348e
> >
> > Commit hash: 8e50557707d2ec11ccad657470b2e140f314348e
>
> Is this an independent implementation from the one in Fabrice's BPG,
> or is it based on his?
> I am asking so I know if I should add a Based-on: ... to the commit message.
>
> Thanks
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> The misfortune of the wise is better than the prosperity of the fool.
> -- Epicurus
Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves
PL Lepere is the original author and I did some improvements on top of it.

Mickael

2015-02-02 18:11 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
> Hi,
>
> 2015-02-02 17:16 GMT+01:00 Mickaël Raulet mrau...@insa-rennes.fr:
> > https://github.com/OpenHEVC/FFmpeg/commit/940300945995c20f7583394ebe6907e72829b4a
>
> It no longer applies cleanly, as multiple fixes and improvements have been
> committed since then. The attached patch fixes that, and passes on a
> non-avx2 machine. I can't test it, and I'm not looking forward to
> debugging through an ssh shell.
>
> And who is the actual author? It has been committed under your name,
> but wouldn't that be P-E Lepere rather?
>
> And I guess I'll drop the previous patch for now.
>
> --
> Christophe
Re: [FFmpeg-devel] [PATCH 3/3] x86/hevc: add ff_hevc_sao_band_filter_{8, 10, 12}_{sse2, avx2}
LGTM.

Mickaël

On Saturday, 31 January 2015, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> Hi,
>
> 2015-01-30 19:50 GMT+01:00 James Almer jamr...@gmail.com:
> > +%macro HEVC_SAO_BAND_FILTER_COMPUTE 3
> > +psraw %2, %3, %1-5
> > +pcmpeqw m10, %2, m0
> > +pcmpeqw m11, %2, m1
> > +pcmpeqw m12, %2, m2
> > +pcmpeqw %2, m3
> > +pand m10, m4
> > +pand m11, m5
> > +pand m12, m6
> > +pand %2, m7
> > +por m10, m11
> > +por m12, %2
> > +por m10, m12
> > +paddw %3, m10
> > +%endmacro
>
> The shift does really force to work on bytes, too bad. Some pshufb might
> still be possible using the result, but it would be cumbersome because
> the psraw result is [0-31], and offset might be signed.
>
> > +.loop:
> > +movu m13, [srcq+widthq]
> > [...]
> > +movu [dstq+widthq], m8
>
> Some of those moves could be aligned, but there's some work to be done
> at the buffer levels. So it's not like it's really part of this patch.
>
> Looks good, any improvement seems like an additional patch.
>
> --
> Christophe
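For readers who find the macro hard to follow, here is a scalar C sketch of what it appears to compute for 8-bit samples: shift each sample down to a band index in 0..31, and add an offset only if that index is one of four consecutive bands. This is a hedged illustration, not the assembly's exact data flow; `sao_band_row` and `clip_u8` are hypothetical names, and the lookup table replaces the four compare/mask/or steps.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Band offset for one row of 8-bit samples. sao_offset_val[1..4] hold the
 * offsets for the four consecutive bands starting at sao_left_class;
 * every other band gets offset 0. */
static void sao_band_row(uint8_t *dst, const uint8_t *src, int width,
                         const int16_t *sao_offset_val, int sao_left_class)
{
    int16_t offset_table[32] = { 0 };
    const int shift = 8 - 5;          /* bit depth minus 5: 32 bands */

    assert(width >= 0);
    assert(sao_left_class >= 0 && sao_left_class < 32);

    for (int k = 0; k < 4; k++)
        offset_table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];

    for (int x = 0; x < width; x++)
        dst[x] = clip_u8(src[x] + offset_table[src[x] >> shift]);
}
```

The table lookup is what the review's pshufb remark alludes to: a byte shuffle could in principle act as the 32-entry lookup, but the offsets are signed 16-bit values, which makes that awkward.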
Re: [FFmpeg-devel] [PATCH 2/5] x86: hevc_mc: correct unneeded use of SSE4 code
This commit might help to solve the issue with SSE4:
https://github.com/OpenHEVC/FFmpeg/commit/df8ebe304df453f26c28ff8f11d607f49b90a4c2

Mickaël

On 24 Aug 2014, at 11:52, Michael Niedermayer michae...@gmx.at wrote:
> On Sun, Aug 24, 2014 at 08:46:31AM +, Christophe Gisquet wrote:
> > ---
> > libavcodec/x86/hevc_mc.asm | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> applied
>
> thanks
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> When you are offended at any man's fault, turn to yourself and study your
> own failings. Then you will forget your anger. -- Epictetus
Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3
The 10-bit and 12-bit versions should stay SSE4 as well, because of packusdw. Converting them to SSSE3 takes a few extra instructions, see below:

    static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
    {
        a = _mm_slli_epi32 (a, 16);
        a = _mm_srai_epi32 (a, 16);
        b = _mm_slli_epi32 (b, 16);
        b = _mm_srai_epi32 (b, 16);
        a = _mm_packs_epi32 (a, b);
        return a;
    }

Mickaël

On 23 Aug 2014, at 15:22, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> As far as I can see, the only reason those functions are SSE4 is because
> of the pextrw needed for the following block widths:
> - 2, used only by chroma;
> - 6, used by chroma and indirectly by luma;
> - 12, used by both.
>
> The better solution would be to convert all chroma handling to NV12, but
> it is vastly simpler to modify the above cases to not use pextrw. This is
> done in 2 steps:
> - Fix width of 12 to do 8+4 instead of 6+6;
> - Modify the store macros for width 2 and 6 by passing data through a GPR
>   (alas at the cost for some functions of a supplementary GPR).
>
> Christophe Gisquet (2):
>   x86: hevc_mc: split differently calls
>   x86: hevc_mc: convert to ssse3
>
> libavcodec/x86/hevc_mc.asm| 63 +++--
> libavcodec/x86/hevcdsp.h | 48 ++--
> libavcodec/x86/hevcdsp_init.c | 561 ++
> 3 files changed, 362 insertions(+), 310 deletions(-)
>
> --
> 1.9.2.msysgit.0
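One property of the quoted shift/shift/pack sequence is worth spelling out: it reproduces packusdw only for inputs that already fit in 16 unsigned bits, because it keeps the low 16 bits instead of saturating. The scalar C model below illustrates that for a single 32-bit lane. It is a sketch of the per-lane arithmetic, not the intrinsics themselves, and `packus_ref`, `packus_emul` and `packus_selfcheck` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* What packusdw does to each 32-bit lane: unsigned saturation to 16 bits. */
static uint16_t packus_ref(int32_t v)
{
    if (v < 0)
        return 0;
    if (v > 65535)
        return 65535;
    return (uint16_t)v;
}

/* Per-lane model of the quoted sequence. Shifting left by 16 and then
 * arithmetically right by 16 sign-extends the low 16 bits, so the signed
 * pack that follows can never saturate and the low 16 bits come out
 * unchanged. */
static uint16_t packus_emul(int32_t v)
{
    return (uint16_t)((uint32_t)v & 0xFFFFu);
}

/* The two agree on every value a clipped 10-bit or 12-bit pixel can take. */
static void packus_selfcheck(void)
{
    for (int32_t v = 0; v <= 4095; v++)
        assert(packus_emul(v) == packus_ref(v));
}
```

So the replacement is safe where the values have already been clipped to the pixel range before packing, and would need an explicit clamp first if they had not.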
Re: [FFmpeg-devel] [PATCH 0/4] Exploit compile-time constant
Patch okay.

Mickaël

On 4 Aug 2014, at 10:31, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> Hi,
>
> 2014-08-02 14:48 GMT+02:00 Michael Niedermayer michae...@gmx.at:
> > seems to fail with
> > libavcodec/x86/hevc_mc.asm:1258: error: (add:2) cannot reference symbol `MAX_PB_SIZE' in preprocessor
>
> I forgot the initial patch when generating the patchset, that you can find here.
>
> I expect no changes for the others, so I didn't bother resending
> them/starting another thread.
>
> --
> Christophe
> 0001-x86-hevc_mc-assume-2nd-source-stride-is-64.patch
Re: [FFmpeg-devel] [PATCH 0/4] Exploit compile-time constant
For the whole patchset.

Mickaël

On 22 Aug 2014, at 13:25, Michael Niedermayer michae...@gmx.at wrote:
> On Fri, Aug 22, 2014 at 11:40:17AM +0200, Mickaël Raulet wrote:
> > Patch okay.
>
> patch applied
>
> just to make sure i dont misunderstand, that okay was just for this
> patch or the whole patchset ?
>
> thanks
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> In fact, the RIAA has been known to suggest that students drop out
> of college or go to community college in order to be able to afford
> settlements. -- The RIAA
Re: [FFmpeg-devel] [PATCH] x86/hecv_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx
Patch ok

Mickael

On Wednesday, 20 August 2014, James Almer jamr...@gmail.com wrote:
> ~15% faster than sse2
>
> Signed-off-by: James Almer jamr...@gmail.com
> ---
> libavcodec/x86/hevc_res_add.asm | 15 +++
> libavcodec/x86/hevcdsp.h| 4
> libavcodec/x86/hevcdsp_init.c | 4
> 3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/libavcodec/x86/hevc_res_add.asm b/libavcodec/x86/hevc_res_add.asm
> index 47022d3..feea50c 100644
> --- a/libavcodec/x86/hevc_res_add.asm
> +++ b/libavcodec/x86/hevc_res_add.asm
> @@ -156,8 +156,8 @@ cglobal hevc_transform_add4_8, 3, 4, 6
>  %endmacro
> -INIT_XMM sse2
> -; void ff_hevc_transform_add8_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
> +%macro TRANSFORM_ADD_8 0
> +; void ff_hevc_transform_add8_8_opt(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
>  cglobal hevc_transform_add8_8, 3, 4, 8
>  lea r3, [r2*3]
>  TR_ADD_SSE_8_8
> @@ -167,7 +167,7 @@ cglobal hevc_transform_add8_8, 3, 4, 8
>  RET
>  %if ARCH_X86_64
> -; void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
> +; void ff_hevc_transform_add16_8_opt(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
>  cglobal hevc_transform_add16_8, 3, 4, 12
>  lea r3, [r2*3]
>  TR_ADD_SSE_16_8
> @@ -178,7 +178,7 @@ cglobal hevc_transform_add16_8, 3, 4, 12
>  %endrep
>  RET
> -; void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
> +; void ff_hevc_transform_add32_8_opt(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride)
>  cglobal hevc_transform_add32_8, 3, 4, 12
>  TR_ADD_SSE_32_8
> @@ -190,6 +190,13 @@ cglobal hevc_transform_add32_8, 3, 4, 12
>  RET
>  %endif ;ARCH_X86_64
> +%endmacro
> +
> +INIT_XMM sse2
> +TRANSFORM_ADD_8
> +INIT_XMM avx
> +TRANSFORM_ADD_8
> +
>  ;-
>  ; void ff_hevc_transform_add_10(pixel *dst, int16_t *block, int stride)
>  ;-
> diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h
> index 7ced22c..74b5173 100644
> --- a/libavcodec/x86/hevcdsp.h
> +++ b/libavcodec/x86/hevcdsp.h
> @@ -139,6 +139,10 @@ void ff_hevc_transform_add8_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stri
>  void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
>  void ff_hevc_transform_add32_8_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
> +void ff_hevc_transform_add8_8_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
> +void ff_hevc_transform_add16_8_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
> +void ff_hevc_transform_add32_8_avx(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
> +
>  void ff_hevc_transform_add4_10_mmxext(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
>  void ff_hevc_transform_add8_10_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
>  void ff_hevc_transform_add16_10_sse2(uint8_t *dst, int16_t *coeffs, ptrdiff_t stride);
> diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
> index 0f9fe7d..f6f0a4b 100644
> --- a/libavcodec/x86/hevcdsp_init.c
> +++ b/libavcodec/x86/hevcdsp_init.c
> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>  if (ARCH_X86_64) {
>  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
> +
> +c->transform_add[2]= ff_hevc_transform_add16_8_avx;
> +c->transform_add[3]= ff_hevc_transform_add32_8_avx;
>  }
> +c->transform_add[1]= ff_hevc_transform_add8_8_avx;
>  }
>  if (EXTERNAL_AVX2(cpu_flags)) {
>  c->idct_dc[2] = ff_hevc_idct16x16_dc_8_avx2;
> --
> 1.8.5.5
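As background for the patch, the operation being accelerated and the table it is installed into can be sketched in scalar C. This is an assumption-laden illustration, not the FFmpeg reference code: `transform_add_c`, `transform_add_index` and `clip_u8` are hypothetical names, the sketch is 8-bit only, and the index mapping (8 to 1, 16 to 2, 32 to 3) is inferred from the `transform_add[1..3]` assignments in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a size x size block of residual coefficients to the prediction
 * already stored in dst, clipping each result to the 8-bit range. */
static void transform_add_c(uint8_t *dst, const int16_t *coeffs,
                            ptrdiff_t stride, int size)
{
    assert(size == 4 || size == 8 || size == 16 || size == 32);
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + coeffs[x]);
        coeffs += size;
        dst    += stride;
    }
}

/* Slot in a per-size function table: 4 -> 0, 8 -> 1, 16 -> 2, 32 -> 3. */
static int transform_add_index(int size)
{
    int idx = 0;

    assert(size == 4 || size == 8 || size == 16 || size == 32);
    while ((4 << idx) < size)
        idx++;
    return idx;
}
```

Wrapping the assembly bodies in a macro and instantiating it once per instruction set, as the patch does, yields one function per slot and per CPU feature level without duplicating the source.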
Re: [FFmpeg-devel] [PATCH] avcodec/hevc_ps: do cleanup in case of unsupported bit depth
Ok.

Mickael

On Wednesday, 20 August 2014, Timothy Gu timothyg...@gmail.com wrote:
> On Tue, Aug 19, 2014 at 6:49 PM, Michael Niedermayer michae...@gmx.at wrote:
> > Fixes memleak
> > Fixes CID1231989
> >
> > Signed-off-by: Michael Niedermayer michae...@gmx.at
> > ---
> > libavcodec/hevc_ps.c |3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Looks OK.
>
> > diff --git a/libavcodec/hevc_ps.c b/libavcodec/hevc_ps.c
> > index 163c5e4..2ccce5f 100644
> > --- a/libavcodec/hevc_ps.c
> > +++ b/libavcodec/hevc_ps.c
> > @@ -810,7 +810,8 @@ int ff_hevc_decode_nal_sps(HEVCContext *s)
> >  default:
> >  av_log(s->avctx, AV_LOG_ERROR,
> >  "4:2:0, 4:2:2, 4:4:4 supports are currently specified for 8, 10 and 12 bits.\n");
> > -return AVERROR_PATCHWELCOME;
> > +ret = AVERROR_PATCHWELCOME;
> > +goto err;
> >  }
> >  desc = av_pix_fmt_desc_get(sps->pix_fmt);
> > --
> > 1.7.9.5
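The fix follows the usual single-exit cleanup pattern in C: set the return code and jump to a label that releases what was allocated, instead of returning directly. A minimal, self-contained sketch of that pattern follows. It does not reproduce hevc_ps.c; `parse_sketch`, `sketch_live_allocs` and the error codes are hypothetical, with `-2` standing in for `AVERROR_PATCHWELCOME`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts allocations that have not been freed or handed to the caller,
 * purely so the leak (or its absence) is observable. */
static int sketch_live_allocs;

/* Allocate a parameter buffer, validate bit_depth, and hand the buffer
 * to the caller on success. On any failure the buffer is released
 * through the single `err` exit, so no early return can leak it. */
static int parse_sketch(int bit_depth, int **out)
{
    int ret;
    int *buf;

    assert(out != NULL);
    *out = NULL;

    buf = malloc(16 * sizeof(*buf));
    if (!buf)
        return -1;                  /* nothing to clean up yet */
    sketch_live_allocs++;

    switch (bit_depth) {
    case 8:
    case 10:
    case 12:
        break;
    default:
        ret = -2;                   /* stand-in for AVERROR_PATCHWELCOME */
        goto err;                   /* a bare `return` here would leak buf */
    }

    *out = buf;                     /* ownership passes to the caller */
    sketch_live_allocs--;
    return 0;

err:
    free(buf);
    sketch_live_allocs--;
    return ret;
}
```

The bug class the patch removes is exactly the commented line: an early `return` added inside a function that already owns resources.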
Re: [FFmpeg-devel] [PATCH] hevc_ps: verify P/T/L information
Hi

On 10 Aug 2014, at 15:16, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> Hi,
>
> 2014-08-10 14:42 GMT+02:00 Ronald S. Bultje rsbul...@gmail.com:
> > Are we using the checked bitstream reader? If we are, we're fine already...
>
> I think we are. On the other hand, it seems the top caller,
> ff_hevc_decode_nal_vps, is never checking if we have read past the
> bitstream end. Shouldn't this be checked at the very end? Hitting the
> bitstream end yet not reporting invalid data at some point looks weird
> to me.

Information from the VPS is not used for the time being. And yes, we are using the checked bitstream reader. We rely on the AVC CABAC engine.

> So, I'm just not sure this always yields vps/sps/... info, so catching
> it might be good. On the other hand, this doesn't help catching bugs in
> the code elsewhere.
>
> > If not, maybe we should, because let's be honest, getbits is only in
> > headers, so it's not particularly performance-sensitive.
>
> And this is high-level syntax (think sps), so indeed.

There are some missing checks in the PSes but I think most of them are there.

Mickaël
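The distinction discussed here, between a reader that is safe against overreads and a caller that reports them, can be shown with a small C sketch. It is a hypothetical model, not FFmpeg's `GetBitContext` API: `SketchBits`, `sketch_get_bits` and `sketch_bits_left` are invented names. Reads past the end return zero bits instead of touching memory outside the buffer, and the caller can detect the overread afterwards from a negative remaining-bit count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchBits {
    const uint8_t *buf;
    size_t size_bits;   /* number of valid bits in buf */
    size_t pos;         /* bits consumed so far, may exceed size_bits */
} SketchBits;

/* Read n bits, MSB first. Bits past the end of the buffer read as 0,
 * so the function never dereferences memory outside buf. */
static unsigned sketch_get_bits(SketchBits *b, int n)
{
    unsigned v = 0;

    assert(n >= 0 && n <= 32);
    for (int i = 0; i < n; i++) {
        unsigned bit = 0;
        if (b->pos < b->size_bits)
            bit = (b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u;
        b->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Negative once the parser has consumed more bits than were available. */
static long sketch_bits_left(const SketchBits *b)
{
    return (long)b->size_bits - (long)b->pos;
}
```

A parser built on such a reader stays memory-safe on truncated input, but it will silently produce zero-filled fields unless the top-level caller checks the remaining-bit count at the end and rejects the data, which is the check the thread proposes.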
Re: [FFmpeg-devel] [PATCH] hevc_ps: verify P/T/L information
Hi

On 10 Aug 2014, at 15:48, Michael Niedermayer michae...@gmx.at wrote:
> On Sun, Aug 10, 2014 at 03:16:23PM +0200, Christophe Gisquet wrote:
> > Hi,
> >
> > 2014-08-10 14:42 GMT+02:00 Ronald S. Bultje rsbul...@gmail.com:
> > > Are we using the checked bitstream reader? If we are, we're fine already...
> >
> > I think we are. On the other hand, it seems the top caller,
> > ff_hevc_decode_nal_vps, is never checking if we have read past the
> > bitstream end. Shouldn't this be checked at the very end? Hitting the
> > bitstream end yet not reporting invalid data at some point looks weird
> > to me.
> >
> > So, I'm just not sure this always yields vps/sps/... info, so catching
> > it might be good. On the other hand, this doesn't help catching bugs in
> > the code elsewhere.
> >
> > > If not, maybe we should, because let's be honest, getbits is only in
> > > headers, so it's not particularly performance-sensitive.
> >
> > And this is high-level syntax (think sps), so indeed.
>
> agree with all
> should i apply the patch or apply something else ?

This can be applied.

Mickaël
Re: [FFmpeg-devel] [PATCH] hevc_deblock: change tc type
Patch ok

Mickael

On Wednesday, 6 August 2014, Christophe Gisquet christophe.gisq...@gmail.com wrote:
> Hi,
>
> this patch is mostly cosmetic. I don't like seeing arrays passed to dsp
> functions being of a type whose length may not be fixed, though it's a
> small matter here.
>
> --
> Christophe
Re: [FFmpeg-devel] [PATCH 1/3] x86/hevc_mc: remove an unnecessary pxor
Patch ok.

Mickael

On Monday, 4 August 2014, James Almer jamr...@gmail.com wrote:
> Signed-off-by: James Almer jamr...@gmail.com
> ---
> libavcodec/x86/hevc_mc.asm | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index fc78062..a16b0ab 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -551,8 +551,7 @@ cglobal hevc_put_hevc_pel_pixels%1_%2, 5, 5, 3, dst, dststride, src, srcstride,h
>  LOOP_END dst, dststride, src, srcstride
>  RET
> -cglobal hevc_put_hevc_uni_pel_pixels%1_%2, 5, 5, 3, dst, dststride, src, srcstride,height
> -pxor m2, m2
> +cglobal hevc_put_hevc_uni_pel_pixels%1_%2, 5, 5, 2, dst, dststride, src, srcstride,height
>  .loop
>  SIMPLE_LOAD %1, %2, srcq, m0
>  PEL_%2STORE%1 dstq, m0, m1
> --
> 1.8.5.5
Re: [FFmpeg-devel] [PATCH] hevc_mc: reduce stride for bidir temp buffers
Hi Christophe

> > hevc.c |9 +
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> > 2445ba15d38b2472f8f1b24aa75e63c089971480 0012-hevc_mc-reduce-stride-for-bidir-temp-buffers.patch
> > From 126adf820bc54c2d00f794629595ad6310fbfc37 Mon Sep 17 00:00:00 2001
> > From: Christophe Gisquet christophe.gisq...@gmail.com
> > Date: Sat, 26 Jul 2014 17:17:18 +0200
> > Subject: [PATCH 12/13] hevc_mc: reduce stride for bidir temp buffers
> >
> > It is unconditionally set to 64, which is quite higher than the actual
> > block size.
>
> is this faster?
>
> [...]

I have the same concern. What is the gain?

Mickaël