Re: [FFmpeg-devel] [PATCH] hevc: fix WPP mode

2015-09-23 Thread Mickaël Raulet
Hi Christophe,

the fix looks weird to me. There is something else underlying.

Mickaël

2015-09-23 16:53 GMT+02:00 Ronald S. Bultje :

> Hi,
>
> On Wed, Sep 23, 2015 at 10:33 AM, Christophe Gisquet <
> christophe.gisq...@gmail.com> wrote:
>
> > Hi,
> >
> > under highly-threaded loads, parallel decoding of WPP is subject to a
> > race condition.
> >
> > This basically fixes ticket #4365.
>
>
> Nice catch! Lgtm.
>
> Ronald
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 4/5] x86: hevc_mc: fewer xmm regs used in epel h/v

2015-02-17 Thread Mickaël Raulet
Looks better to me.

Mickaël

On Tuesday, 17 February 2015, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 2015-02-17 8:28 GMT+01:00 Mickaël Raulet mrau...@insa-rennes.fr:
  It seems to me that you are assigning 8 when it is avx2 instead of 11.
  Shouldn't it be the opposite? At least this is what the commit message says.


 Huh, brainfart... And the fact that I can't easily test avx2 doesn't help.

 So here's a patch with the values swapped out.



Re: [FFmpeg-devel] hevc : support deinterlacing inside the decoder

2015-02-09 Thread Mickaël Raulet
I was trying the sample #4141 and it seems to me that the number of frames
to keep before outputting a frame is too low. Where does this bitstream
come from?

Previous commit was available here:

https://github.com/OpenHEVC/FFmpeg/commit/6b93a7a175fb500d1f5d4d671b2fab73798ca7b6

This commit adds support for another kind of pic_struct I was not using
before (used in this sample).

https://github.com/OpenHEVC/FFmpeg/commit/e360c4077e9658669e80424e63fdcf07400290c7

To display it smoothly you need to add one extra field in hevc_refs.c (I don't
think I have to increase it by one, but I might be wrong)

  nb_output = s->sps->temporal_layer[s->sps->max_sub_layers - 1].num_reorder_pics + s->interlaced + 1
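
A minimal sketch of that calculation, assuming simplified stand-in structs (the `*_sketch` types and `frames_to_hold` are hypothetical names, not FFmpeg's real definitions): the number of frames held back is the reorder count of the highest sub-layer, plus one when the stream is interlaced, plus one.

```c
#include <assert.h>

/* Hypothetical stand-ins for the decoder state; not FFmpeg's real structs. */
struct layer_sketch { int num_reorder_pics; };
struct sps_sketch {
    int max_sub_layers;                     /* 1..8 */
    struct layer_sketch temporal_layer[8];
};
struct dec_sketch {
    const struct sps_sketch *sps;
    int interlaced;                         /* 1 for field-coded content, else 0 */
};

/* Frames to keep before outputting one, following the expression above. */
static int frames_to_hold(const struct dec_sketch *s)
{
    const struct sps_sketch *sps = s->sps;

    assert(sps->max_sub_layers >= 1 && sps->max_sub_layers <= 8);
    return sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics
           + s->interlaced + 1;
}
```

With two sub-layers, a reorder count of 3 on the highest one and an interlaced stream, this gives 5.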




2015-02-08 20:09 GMT+01:00 Carl Eugen Hoyos ceho...@ag.or.at:

 Kacper Michajłow kasper93 at gmail.com writes:

  2015-02-08 10:48 GMT+01:00 Carl Eugen Hoyos:
 
   Mickaël Raulet mraulet at insa-rennes.fr writes:
  
As we can consider, we won't have 4k interlaced
content, copying a field into a frame should be ok.
This is what has been done in this implementation.
  
   Do you have a sample?
   I am only interested in testing this.

  If I understand correctly, this should fix this ticket
  #4141. Sample is included.

 The patch - unfortunately! - makes no difference for
 this sample.

 Carl Eugen


[FFmpeg-devel] DSP function ARM NEON patches for hevc

2015-02-05 Thread Mickaël Raulet
Michael,

Please find some commits that can be cherry picked from
https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch

Optimized deblocking filter (8bits only)
1b9ee47d2f43b0a029a9468233626102eb1473b8

Optimized transform functions (4x4, 8x8, transform add 8bits only)
b153f55935969c794de4640f8d34e01c58e027ae

ARM NEON optimized qpel functions (8bits only)
965cd82e376f17125c0ad6465d14f4ab1749fda1

Comments welcome if there are any.

More coming soon for epel and SAO!

Mickaël


[FFmpeg-devel] hevc : support deinterlacing inside the decoder

2015-02-05 Thread Mickaël Raulet
Since we can assume that we won't have 4K interlaced content, copying a field
into a frame should be OK. This is what has been done in this
implementation.

Commit hash from ffmpeg/openhevc:

6b93a7a175fb500d1f5d4d671b2fab73798ca7b6

Comments welcome!

Mickaël


Re: [FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}

2015-02-04 Thread Mickaël Raulet
LGTM

Mickael

2015-02-04 13:39 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com
:

 Hi,

 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
  Original x86 intrinsics code and initial yasm port by Pierre-Edouard
 Lepere.
  Refactoring and optimizations by James Almer.

 Add your own copyright to this file then.

  Width 32
  158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
  5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1
 skips
  2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
 
  Width 64
  705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
  19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33
 skips
  10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29
 skips

 Is the first number for each case from before you split out the
 restore part? Otherwise, that's gruesome.

  -void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t
 stride_dst,
  -ptrdiff_t stride_src, int16_t
 *sao_offset_val, int sao_eo_class,
  -int width, int height);
  +void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t
 stride_dst,
  +   ptrdiff_t stride_src, int16_t
 *sao_offset_val, int sao_eo_class,
  +   int width, int height);

 Maybe add a comment on top of that to indicate that _dst is
 16-byte-aligned?

 Also, src and stride_src are such that the buffer is 32-byte-aligned,
 because of:
 stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
 dst = lc->edge_emu_buffer + stride_dst +
 FF_INPUT_BUFFER_PADDING_SIZE;
 in hevc_filter.c, but I'm not sure how much of a benefit it is here, or
 how often it helps. Don't hesitate to modify them if need be.

  +%else ; ARCH_X86_32
  +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride,
 srcstride, a_stride, b_stride, height

 As seen from above, srcstride is constant and is 2*MAX_PB_SIZE +
 FF_INPUT_BUFFER_PADDING_SIZE.
 That may save you one whole gpr. Not really useful here, but I think
 you are more limited for the >8 bits case.
 If you want to exploit this, also add it above void (*sao_edge_filter[5])

 No comment on the actual assembly, it looks fine.

 --
 Christophe


Re: [FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}

2015-02-04 Thread Mickaël Raulet
LGTM.

Mickael

2015-02-04 13:51 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com
:

 Hi,

 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:

  -DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_1)= {
 0x0001000100010001ULL, 0x0001000100010001ULL };
  -DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_2)= {
 0x0002000200020002ULL, 0x0002000200020002ULL };
  +DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_1)= {
 0x0001000100010001ULL, 0x0001000100010001ULL,
  +
 0x0001000100010001ULL, 0x0001000100010001ULL };
  +DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_2)= {
 0x0002000200020002ULL, 0x0002000200020002ULL,
  +
 0x0002000200020002ULL, 0x0002000200020002ULL };
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_3)= {
 0x0003000300030003ULL, 0x0003000300030003ULL };
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_4)= {
 0x0004000400040004ULL, 0x0004000400040004ULL };
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_5)= {
 0x0005000500050005ULL, 0x0005000500050005ULL };
  @@ -48,7 +50,8 @@ DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_1019) = {
 0x03FB03FB03FB03FBULL, 0x03F
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_1024) = {
 0x0400040004000400ULL, 0x0400040004000400ULL };
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_2048) = {
 0x0800080008000800ULL, 0x0800080008000800ULL };
   DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_8192) = {
 0x2000200020002000ULL, 0x2000200020002000ULL };
  -DECLARE_ALIGNED(16, const xmm_reg,  ff_pw_m1)   = {
 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };
  +DECLARE_ALIGNED(32, const ymm_reg,  ff_pw_m1)   = {
 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL,
  +
 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };

 Nice of you to do this. There is more sharing to do, but I have
 patches waiting for your patchset and the avx2 patch to clean even
 more.

  +;void ff_hevc_sao_edge_filter_width_depth_opt(uint8_t *_dst,
 uint8_t *_src, ptrdiff_t stride_dst, ptrdiff_t stride_src,
  +;   int16_t
 *sao_offset_val, int eo, int width, int height);
  +%macro HEVC_SAO_EDGE_FILTER_16 3
  +%if WIN64
  +cglobal hevc_sao_edge_filter_%2_%1, 4, 8, 16, dst, src, dststride,
 srcstride, eo, a_stride, b_stride, height

 Ok, nevermind my comment in patch 5/6: 16 xmm regs are too much for
 x86_32. Or playing with the stack is required, but that would be
 another patch, if ever.

 Otherwise, nothing striking in that code, looks good.

 Thanks,
 --
 Christophe


Re: [FFmpeg-devel] [PATCH 2/6] hevcdsp: simplified sao_edge_filter

2015-02-03 Thread Mickaël Raulet
OK too.

2015-02-04 8:04 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:

 Hi,

 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
  +int a_stride, b_stride;
  +int src_offset = 0;
  +int dst_offset = 0;

 Could maybe use ptrdiff_t type, like the other strides?

 With or without, ok.

 --
 Christophe


Re: [FFmpeg-devel] [PATCH 4/6] hevcdsp: replace the SAOParams struct parameter from sao_edge_filter

2015-02-03 Thread Mickaël Raulet
lgtm.

Mickael

2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:

 As with sao_band_filter, pass instead the two variables from the struct
 needed in the function.
 This simplifies writing asm optimized versions.

 Signed-off-by: James Almer jamr...@gmail.com
 ---
  libavcodec/hevc_filter.c  | 4 +++-
  libavcodec/hevcdsp.h  | 4 ++--
  libavcodec/hevcdsp_template.c | 7 ++-
  3 files changed, 7 insertions(+), 8 deletions(-)

 diff --git a/libavcodec/hevc_filter.c b/libavcodec/hevc_filter.c
 index b002d87..bf88b47 100644
 --- a/libavcodec/hevc_filter.c
 +++ b/libavcodec/hevc_filter.c
 @@ -427,7 +427,9 @@ static void sao_filter_CTB(HEVCContext *s, int x, int
 y)

  copy_CTB_to_hv(s, src, stride_src, x0, y0, width, height,
 c_idx,
 x_ctb, y_ctb);
 -s->hevcdsp.sao_edge_filter(src, dst, stride_src, stride_dst,
 sao, width, height, c_idx);
 +s->hevcdsp.sao_edge_filter(src, dst, stride_src, stride_dst,
 +   sao->offset_val[c_idx],
 sao->eo_class[c_idx],
 +   width, height);
  s->hevcdsp.sao_edge_restore[restore](src, dst,
  stride_src, stride_dst,
  sao,
 diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h
 index 53d7b1b..1510f39 100644
 --- a/libavcodec/hevcdsp.h
 +++ b/libavcodec/hevcdsp.h
 @@ -62,8 +62,8 @@ typedef struct HEVCDSPContext {
 int16_t *sao_offset_val, int
 sao_left_class, int width, int height);

  void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t
 stride_dst,
 -ptrdiff_t stride_src, SAOParams *sao, int
 width,
 -int height, int c_idx);
 +ptrdiff_t stride_src, int16_t
 *sao_offset_val, int sao_eo_class,
 +int width, int height);

  void (*sao_edge_restore[2])(uint8_t *_dst, uint8_t *_src, ptrdiff_t
 _stride_dst, ptrdiff_t _stride_src,
  struct SAOParams *sao, int *borders, int
 _width, int _height, int c_idx,
 diff --git a/libavcodec/hevcdsp_template.c b/libavcodec/hevcdsp_template.c
 index 4479435..ac98709 100644
 --- a/libavcodec/hevcdsp_template.c
 +++ b/libavcodec/hevcdsp_template.c
 @@ -328,9 +328,8 @@ static void FUNC(sao_band_filter_0)(uint8_t *_dst,
 uint8_t *_src,
  #define CMP(a, b) ((a) > (b) ? 1 : ((a) == (b) ? 0 : -1))

  static void FUNC(sao_edge_filter)(uint8_t *_dst, uint8_t *_src,
 -  ptrdiff_t stride_dst, ptrdiff_t
 stride_src, SAOParams *sao,
 -  int width, int height,
 -  int c_idx) {
 +  ptrdiff_t stride_dst, ptrdiff_t
 stride_src, int16_t *sao_offset_val,
 +  int eo, int width, int height) {

  static const uint8_t edge_idx[] = { 1, 2, 0, 3, 4 };
  static const int8_t pos[4][2][2] = {
 @@ -339,8 +338,6 @@ static void FUNC(sao_edge_filter)(uint8_t *_dst,
 uint8_t *_src,
  { { -1, -1 }, {  1, 1 } }, // 45 degree
  { {  1, -1 }, { -1, 1 } }, // 135 degree
  };
 -int16_t *sao_offset_val = sao->offset_val[c_idx];
 -int eo = sao->eo_class[c_idx];
  pixel *dst = (pixel *)_dst;
  pixel *src = (pixel *)_src;
  int a_stride, b_stride;
 --
 2.2.2
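
A minimal sketch of how the two parameters are used for one 8-bit sample, assuming the classification index is `2 + diff0 + diff1` (that step is not visible in the hunks above) and that `sao_offset_val` holds five entries. `sao_edge_sample` is a hypothetical helper name; the real function loops over the whole block and picks the two neighbours from the `pos` table according to `eo`.

```c
#include <stdint.h>

#define CMP(a, b) ((a) > (b) ? 1 : ((a) == (b) ? 0 : -1))

/*
 * Filter one sample `cur` given its two neighbours `a` and `b` along the
 * edge direction. Only the offset table is needed here, which is why the
 * patch can pass it directly instead of the whole SAOParams struct.
 */
static uint8_t sao_edge_sample(uint8_t cur, uint8_t a, uint8_t b,
                               const int16_t *sao_offset_val)
{
    static const uint8_t edge_idx[] = { 1, 2, 0, 3, 4 };
    int diff0 = CMP(cur, a);                 /* -1, 0 or 1 */
    int diff1 = CMP(cur, b);                 /* -1, 0 or 1 */
    int v = cur + sao_offset_val[edge_idx[2 + diff0 + diff1]];

    /* clip to the 8-bit pixel range */
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}
```

A sample lower than both neighbours (a local minimum) selects `sao_offset_val[1]`, a flat area selects `sao_offset_val[0]`, and a local maximum selects `sao_offset_val[4]`.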



Re: [FFmpeg-devel] [PATCH 3/6] hevcdsp: further simplify sao_edge_filter

2015-02-03 Thread Mickaël Raulet
ok.

2015-02-04 8:07 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:

 Hi,

 2015-02-04 4:55 GMT+01:00 James Almer jamr...@gmail.com:
 [...]

 Ok, no need to resend a refreshed patch if patch 2/6 changes.

 --
 Christophe


Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-03 Thread Mickaël Raulet
I will check it this evening.

Mickaël

2015-02-03 15:15 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com
:

 2015-02-03 12:57 GMT+01:00 Christophe Gisquet 
 christophe.gisq...@gmail.com:
  Actually, 940300945 does need to be reverted for the patch to work, as
  Mickael stated. It miscompiles hevc_mc.asm, more particularly the
  [eq]pel_hv functions. No idea why.

 The patch in [PATCH] x86: lavu/x264asm: fix ymm register
 instanciation fixes the generated assembly for me.

 Mickaël and/or James, could you confirm that? I'll submit it to the
 x264 project afterwards.

 --
 Christophe


Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-03 Thread Mickaël Raulet
it works now for me.

Thanks,

Mickaël

On 3 Feb 2015, at 15:28, Mickaël Raulet mrau...@insa-rennes.fr wrote:

 I will check it this evening.
 
 Mickaël
 
 2015-02-03 15:15 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
 2015-02-03 12:57 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com:
  Actually, 940300945 does need to be reverted for the patch to work, as
  Mickael stated. It miscompiles hevc_mc.asm, more particularly the
  [eq]pel_hv functions. No idea why.
 
 The patch in [PATCH] x86: lavu/x264asm: fix ymm register
 instanciation fixes the generated assembly for me.
 
 Mickaël and/or James, could you confirm that? I'll submit it to the
 x264 project afterwards.
 
 --
 Christophe


[FFmpeg-devel] support for monochrome sequences in hevc decoder

2015-02-02 Thread Mickaël Raulet
Hi, here is a commit that supports monochrome sequences!

https://github.com/OpenHEVC/FFmpeg/commit/8e50557707d2ec11ccad657470b2e140f314348e

Commit hash: 8e50557707d2ec11ccad657470b2e140f314348e

Mickael


Re: [FFmpeg-devel] support for monochrome sequences in hevc decoder

2015-02-02 Thread Mickaël Raulet
I did it first :)

2015-02-02 16:47 GMT+01:00 Michael Niedermayer michae...@gmx.at:

 On Mon, Feb 02, 2015 at 04:11:33PM +0100, Mickaël Raulet wrote:
  Hi here is a commit that support monochrome sequences!
 
 
 https://github.com/OpenHEVC/FFmpeg/commit/8e50557707d2ec11ccad657470b2e140f314348e
 
  Commit hash: 8e50557707d2ec11ccad657470b2e140f314348e

 Is this an independent implementation from the one in Fabrice's BPG,
 or is it based on his?
 I am asking so I know if I should add a Based-on: ... to the commit
 message

 Thanks

 [...]

 --
 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

 The misfortune of the wise is better than the prosperity of the fool.
 -- Epicurus



Re: [FFmpeg-devel] [PATCH] x86: hevc_mc: remove non necessary moves

2015-02-02 Thread Mickaël Raulet
PL Lepere is the original author and I made some improvements on top of it.


Mickael

2015-02-02 18:11 GMT+01:00 Christophe Gisquet christophe.gisq...@gmail.com
:

 Hi,

 2015-02-02 17:16 GMT+01:00 Mickaël Raulet mrau...@insa-rennes.fr:
 
 https://github.com/OpenHEVC/FFmpeg/commit/940300945995c20f7583394ebe6907e72829b4a

 It no longer applies cleanly, as multiple fixes and improvements have been
 committed since then.

 The attached patch fixes that, and passes on a non-avx2 machine. I
 can't test it, and I'm not looking forward to debugging through an ssh
 shell.

 And who is the actual author? It has been committed under your name,
 but wouldn't that be P-E Lepere rather?

 And I guess I'll drop the previous patch for now.

 --
 Christophe



Re: [FFmpeg-devel] [PATCH 3/3] x86/hevc: add ff_hevc_sao_band_filter_{8, 10, 12}_{sse2, avx2}

2015-01-31 Thread Mickaël Raulet
LGTM.

Mickaël

On Saturday, 31 January 2015, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 Hi,

 2015-01-30 19:50 GMT+01:00 James Almer jamr...@gmail.com:
  +%macro HEVC_SAO_BAND_FILTER_COMPUTE 3
  +psraw %2, %3, %1-5
  +pcmpeqw  m10, %2, m0
  +pcmpeqw  m11, %2, m1
  +pcmpeqw  m12, %2, m2
  +pcmpeqw   %2, m3
  +pand m10, m4
  +pand m11, m5
  +pand m12, m6
  +pand  %2, m7
  +por  m10, m11
  +por  m12, %2
  +por  m10, m12
  +paddw %3, m10
  +%endmacro

 The shift does really force to work on bytes, too bad. Some pshufb
 might still be possible using the result, but it would be cumbersome
 because the psraw result is [0-31], and offset might be signed.

  +.loop:
  +movu m13, [srcq+widthq]
 [...]
  +movu  [dstq+widthq], m8

 Some of those moves could be aligned, but there's some work to be done
 at the buffer levels. So it's not like it's really part of this patch.

 Looks good, any improvement seems like an additional patch.

 --
 Christophe


Re: [FFmpeg-devel] [PATCH 2/5] x86: hevc_mc: correct unneeded use of SSE4 code

2014-08-25 Thread Mickaël Raulet
this commit might help to solve the issue with SSE4
https://github.com/OpenHEVC/FFmpeg/commit/df8ebe304df453f26c28ff8f11d607f49b90a4c2

Mickaël

On 24 Aug 2014, at 11:52, Michael Niedermayer michae...@gmx.at wrote:

 On Sun, Aug 24, 2014 at 08:46:31AM +, Christophe Gisquet wrote:
 ---
 libavcodec/x86/hevc_mc.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 
 applied
 
 thanks
 
 [...]
 -- 
 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
 
 When you are offended at any man's fault, turn to yourself and study your
 own failings. Then you will forget your anger. -- Epictetus


Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Mickaël Raulet
For 10-bit and 12-bit, they should stay SSE4 as well because of packusdw. You
need a few extra instructions to emulate it for SSSE3; see below:


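#include <emmintrin.h> /* SSE2 intrinsics */

/* Emulates packusdw with SSE2 only; exact when every 32-bit lane is in [0, 65535]. */
static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
{
    /* sign-extend the low 16 bits of each lane so the signed pack cannot saturate */
    a = _mm_slli_epi32 (a, 16);
    a = _mm_srai_epi32 (a, 16);
    b = _mm_slli_epi32 (b, 16);
    b = _mm_srai_epi32 (b, 16);
    /* pack to 16 bits; each result keeps the low 16 bits of its input lane */
    a = _mm_packs_epi32 (a, b);
    return a;
}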
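
A scalar sketch of what the helper above does to a single 32-bit lane, next to a reference for packusdw, may make the trade-off clearer. Both function names are hypothetical and the model is portable C rather than intrinsics. Under the assumption that inputs are already in [0, 65535] the two agree; outside that range the emulation wraps instead of saturating.

```c
#include <stdint.h>

/* One lane of the shift/shift/packs sequence. */
static int16_t pack_lane_model(int32_t x)
{
    /* slli 16 then srai 16: keep the low 16 bits, sign-extended */
    uint32_t low = (uint32_t)x & 0xFFFFu;
    int32_t  t   = (low & 0x8000u) ? (int32_t)low - 65536 : (int32_t)low;

    /* packs_epi32 saturates to int16, but t already fits, so this is a no-op */
    if (t > INT16_MAX) t = INT16_MAX;
    if (t < INT16_MIN) t = INT16_MIN;
    return (int16_t)t;
}

/* One lane of packusdw: unsigned saturation to 16 bits. */
static uint16_t packusdw_lane_ref(int32_t x)
{
    return x < 0 ? 0 : x > 65535 ? 65535 : (uint16_t)x;
}
```

Reinterpreting the model's result as `uint16_t` gives the same bits as packusdw for every in-range input, so the helper is only a drop-in replacement when the values were clipped beforehand.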

Mickaël



On 23 Aug 2014, at 15:22, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 As far as I can see, the only reason those functions are SSE4 is because
 of the pextrw needed for the following block widths:
 - 2, used  only by chroma;
 - 6, used by chroma and indirectly by luma;
 - 12, used by both.
 The better solution would be to convert all chroma handling to NV12, but
 it is vastly simpler to modify the above cases to not use pextrw.
 
 This is done in 2 steps:
 - Fix width of 12 to do 8+4 instead of 6+6;
 - Modify the store macros for width 2 and 6 by passing data through
  a GPR (alas at the cost for some functions of a supplementary GPR).
 
 Christophe Gisquet (2):
  x86: hevc_mc: split differently calls
  x86: hevc_mc: convert to ssse3
 
 libavcodec/x86/hevc_mc.asm|  63 +++--
 libavcodec/x86/hevcdsp.h  |  48 ++--
 libavcodec/x86/hevcdsp_init.c | 561 ++
 3 files changed, 362 insertions(+), 310 deletions(-)
 
 -- 
 1.9.2.msysgit.0
 


Re: [FFmpeg-devel] [PATCH 0/4] Exploit compile-time constant

2014-08-22 Thread Mickaël Raulet
Patch okay.

Mickaël
On 4 Aug 2014, at 10:31, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 Hi,
 
 2014-08-02 14:48 GMT+02:00 Michael Niedermayer michae...@gmx.at:
 seems to fail with
 libavcodec/x86/hevc_mc.asm:1258: error: (add:2) cannot reference symbol 
 `MAX_PB_SIZE' in preprocessor
 
 I forgot the initial patch when generating the patchset, that you can
 find here. I expect no changes for the others, so I didn't bother
 resending them/starting another thread.
 
 -- 
 Christophe
 0001-x86-hevc_mc-assume-2nd-source-stride-is-64.patch


Re: [FFmpeg-devel] [PATCH 0/4] Exploit compile-time constant

2014-08-22 Thread Mickaël Raulet
for the whole patchset.

Mickaël
On 22 Aug 2014, at 13:25, Michael Niedermayer michae...@gmx.at wrote:

 On Fri, Aug 22, 2014 at 11:40:17AM +0200, Mickaël Raulet wrote:
 Patch okay.
 
 patch applied
 
 just to make sure i dont misunderstand, that okay was just for this
 patch or the whole patchset ?
 
 thanks
 
 [...]
 -- 
 Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
 
 In fact, the RIAA has been known to suggest that students drop out
 of college or go to community college in order to be able to afford
 settlements. -- The RIAA



Re: [FFmpeg-devel] [PATCH] x86/hecv_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx

2014-08-20 Thread Mickaël Raulet
Patch ok

Mickael

On Wednesday, 20 August 2014, James Almer jamr...@gmail.com wrote:

 ~15% faster than sse2

 Signed-off-by: James Almer jamr...@gmail.com
 ---
  libavcodec/x86/hevc_res_add.asm | 15 +++
  libavcodec/x86/hevcdsp.h|  4 
  libavcodec/x86/hevcdsp_init.c   |  4 
  3 files changed, 19 insertions(+), 4 deletions(-)

 diff --git a/libavcodec/x86/hevc_res_add.asm
 b/libavcodec/x86/hevc_res_add.asm
 index 47022d3..feea50c 100644
 --- a/libavcodec/x86/hevc_res_add.asm
 +++ b/libavcodec/x86/hevc_res_add.asm
 @@ -156,8 +156,8 @@ cglobal hevc_transform_add4_8, 3, 4, 6
  %endmacro


 -INIT_XMM sse2
 -; void ff_hevc_transform_add8_8_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
 +%macro TRANSFORM_ADD_8 0
 +; void ff_hevc_transform_add8_8_opt(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
  cglobal hevc_transform_add8_8, 3, 4, 8
  lea   r3, [r2*3]
  TR_ADD_SSE_8_8
 @@ -167,7 +167,7 @@ cglobal hevc_transform_add8_8, 3, 4, 8
  RET

  %if ARCH_X86_64
 -; void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
 +; void ff_hevc_transform_add16_8_opt(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
  cglobal hevc_transform_add16_8, 3, 4, 12
  lea   r3, [r2*3]
  TR_ADD_SSE_16_8
 @@ -178,7 +178,7 @@ cglobal hevc_transform_add16_8, 3, 4, 12
  %endrep
  RET

 -; void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
 +; void ff_hevc_transform_add32_8_opt(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride)
  cglobal hevc_transform_add32_8, 3, 4, 12

  TR_ADD_SSE_32_8
 @@ -190,6 +190,13 @@ cglobal hevc_transform_add32_8, 3, 4, 12
  RET

  %endif ;ARCH_X86_64
 +%endmacro
 +
 +INIT_XMM sse2
 +TRANSFORM_ADD_8
 +INIT_XMM avx
 +TRANSFORM_ADD_8
 +

  
 ;-
  ; void ff_hevc_transform_add_10(pixel *dst, int16_t *block, int stride)

  
 ;-
 diff --git a/libavcodec/x86/hevcdsp.h b/libavcodec/x86/hevcdsp.h
 index 7ced22c..74b5173 100644
 --- a/libavcodec/x86/hevcdsp.h
 +++ b/libavcodec/x86/hevcdsp.h
 @@ -139,6 +139,10 @@ void ff_hevc_transform_add8_8_sse2(uint8_t *dst,
 int16_t *coeffs, ptrdiff_t stri
  void ff_hevc_transform_add16_8_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
  void ff_hevc_transform_add32_8_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);

 +void ff_hevc_transform_add8_8_avx(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
 +void ff_hevc_transform_add16_8_avx(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
 +void ff_hevc_transform_add32_8_avx(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
 +
  void ff_hevc_transform_add4_10_mmxext(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
  void ff_hevc_transform_add8_10_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
  void ff_hevc_transform_add16_10_sse2(uint8_t *dst, int16_t *coeffs,
 ptrdiff_t stride);
 diff --git a/libavcodec/x86/hevcdsp_init.c b/libavcodec/x86/hevcdsp_init.c
 index 0f9fe7d..f6f0a4b 100644
 --- a/libavcodec/x86/hevcdsp_init.c
 +++ b/libavcodec/x86/hevcdsp_init.c
 @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const
 int bit_depth)
  if (ARCH_X86_64) {
  c->hevc_v_loop_filter_luma =
 ff_hevc_v_loop_filter_luma_8_avx;
  c->hevc_h_loop_filter_luma =
 ff_hevc_h_loop_filter_luma_8_avx;
 +
 +c->transform_add[2]= ff_hevc_transform_add16_8_avx;
 +c->transform_add[3]= ff_hevc_transform_add32_8_avx;
  }
 +c->transform_add[1]= ff_hevc_transform_add8_8_avx;
  }
  if (EXTERNAL_AVX2(cpu_flags)) {
  c->idct_dc[2] = ff_hevc_idct16x16_dc_8_avx2;
 --
 1.8.5.5



Re: [FFmpeg-devel] [PATCH] avcodec/hevc_ps: do cleanup in case of unsupported bit depth

2014-08-20 Thread Mickaël Raulet
Ok.

Mickael

On Wednesday, 20 August 2014, Timothy Gu timothyg...@gmail.com wrote:

 On Tue, Aug 19, 2014 at 6:49 PM, Michael Niedermayer michae...@gmx.at
 wrote:
  Fixes memleak
  Fixes CID1231989
 
  Signed-off-by: Michael Niedermayer michae...@gmx.at
  ---
   libavcodec/hevc_ps.c |3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

 Looks OK.

 
  diff --git a/libavcodec/hevc_ps.c b/libavcodec/hevc_ps.c
  index 163c5e4..2ccce5f 100644
  --- a/libavcodec/hevc_ps.c
  +++ b/libavcodec/hevc_ps.c
  @@ -810,7 +810,8 @@ int ff_hevc_decode_nal_sps(HEVCContext *s)
   default:
   av_log(s->avctx, AV_LOG_ERROR,
  "4:2:0, 4:2:2, 4:4:4 supports are currently specified
 for 8, 10 and 12 bits.\n");
  -return AVERROR_PATCHWELCOME;
  +ret = AVERROR_PATCHWELCOME;
  +goto err;
   }
 
   desc = av_pix_fmt_desc_get(sps->pix_fmt);
  --
  1.7.9.5
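
A minimal sketch of the cleanup pattern this fix relies on, assuming a buffer allocated before the format check; `parse_params`, `ERR_UNSUPPORTED`, `ERR_NOMEM` and `live_buffers` are hypothetical names for illustration, not FFmpeg identifiers. Returning directly from the `default:` branch would leak the buffer, while jumping to the shared `err` label frees it.

```c
#include <stdlib.h>

#define ERR_UNSUPPORTED (-1)   /* hypothetical stand-in for AVERROR_PATCHWELCOME */
#define ERR_NOMEM       (-2)   /* hypothetical stand-in for an allocation error */

static int live_buffers;       /* allocations not yet freed, for illustration */

static int parse_params(int bit_depth, int **out)
{
    int ret;
    int *buf = malloc(64 * sizeof(*buf));

    if (!buf)
        return ERR_NOMEM;
    live_buffers++;

    switch (bit_depth) {
    case 8: case 10: case 12:
        break;
    default:
        ret = ERR_UNSUPPORTED;
        goto err;              /* a plain return here would leak buf */
    }

    buf[0] = bit_depth;
    *out = buf;                /* ownership passes to the caller */
    return 0;

err:
    free(buf);
    live_buffers--;
    return ret;
}
```

Calling it with an unsupported depth such as 9 returns the error code and leaves `live_buffers` at 0.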
 


Re: [FFmpeg-devel] [PATCH] hevc_ps: verify P/T/L information

2014-08-10 Thread Mickaël Raulet
Hi

On 10 Aug 2014, at 15:16, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 Hi,
 
 2014-08-10 14:42 GMT+02:00 Ronald S. Bultje rsbul...@gmail.com:
 Are we using the checked bitstream reader? If we are, we're fine already...
 
 I think we are. On the other hand, it seems the top caller,
 ff_hevc_decode_nal_vps, is never checking if we have read past the
 bitstream end. Shouldn't this be checked at the very end? Hitting the
 bitstream end yet not reporting invalid data at some point looks weird
 to me.
 

Information from the VPS is not used for the time being.
And yes, we are using the checked bitstream reader. We rely on the AVC CABAC
engine.



 So, I'm just not sure this always yields vps/sps/... info, so catching
 it might be good. On the other hand, this doesn't help catching bugs
 in the code elsewhere.
 
 If not, maybe we should, because let's be honest, getbits is only in
 headers, so it's not particularly performance-sensitive.
 
 And this is high-level syntax (think sps), so indeed.

There are some missing checks in the parameter sets, but I think most of them are there.
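
A rough sketch of the idea discussed here, assuming a toy bit reader (the `BitReader` type and its functions are hypothetical and are not FFmpeg's GetBitContext API): a checked reader never touches memory past the buffer and returns zero bits instead, so the caller still has to test at the end of parsing whether it consumed more bits than were available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader for illustration only. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos_bits;    /* may exceed size_bits after an overread */
} BitReader;

/* Read n bits MSB-first; bits past the end read as 0 (the "checked" part). */
static unsigned read_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;

    assert(n <= 32);
    for (unsigned i = 0; i < n; i++) {
        unsigned bit = 0;
        if (br->pos_bits < br->size_bits)
            bit = (br->buf[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1;
        br->pos_bits++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Negative once the parser has consumed more bits than the buffer holds. */
static long bits_left(const BitReader *br)
{
    return (long)br->size_bits - (long)br->pos_bits;
}

/* End-of-header check: report invalid data if parsing ran past the end. */
static int header_overread(const BitReader *br)
{
    return bits_left(br) < 0;
}
```

The reads themselves stay safe either way; the final `header_overread` test is what turns a silent overread into an error the caller can report.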

Mickaël 


Re: [FFmpeg-devel] [PATCH] hevc_ps: verify P/T/L information

2014-08-10 Thread Mickaël Raulet
Hi
On 10 Aug 2014, at 15:48, Michael Niedermayer michae...@gmx.at wrote:

 On Sun, Aug 10, 2014 at 03:16:23PM +0200, Christophe Gisquet wrote:
 Hi,
 
 2014-08-10 14:42 GMT+02:00 Ronald S. Bultje rsbul...@gmail.com:
 Are we using the checked bitstream reader? If we are, we're fine already...
 
 I think we are. On the other hand, it seems the top caller,
 ff_hevc_decode_nal_vps, is never checking if we have read past the
 bitstream end. Shouldn't this be checked at the very end? Hitting the
 bitstream end yet not reporting invalid data at some point looks weird
 to me.
 
 So, I'm just not sure this always yields vps/sps/... info, so catching
 it might be good. On the other hand, this doesn't help catching bugs
 in the code elsewhere.
 
 If not, maybe we should, because let's be honest, getbits is only in
 headers, so it's not particularly performance-sensitive.
 
 And this is high-level syntax (think sps), so indeed.
 
 agree with all
 
 should i apply the patch or apply something else ?
 

This can be applied.

Mickaël



Re: [FFmpeg-devel] [PATCH] hevc_deblock: change tc type

2014-08-06 Thread Mickaël Raulet
Patch ok


Mickael
On Wednesday, 6 August 2014, Christophe Gisquet christophe.gisq...@gmail.com
wrote:

 Hi,

 this patch is mostly cosmetic. I don't like seeing arrays passed to
 dsp functions being of a type whose length may not be fixed, though
 it's a small matter here.

 --
 Christophe



Re: [FFmpeg-devel] [PATCH 1/3] x86/hevc_mc: remove an unnecessary pxor

2014-08-04 Thread Mickaël Raulet
Patch ok.

Mickael

On Monday, 4 August 2014, James Almer jamr...@gmail.com wrote:

 Signed-off-by: James Almer jamr...@gmail.com
 ---
  libavcodec/x86/hevc_mc.asm | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

 diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
 index fc78062..a16b0ab 100644
 --- a/libavcodec/x86/hevc_mc.asm
 +++ b/libavcodec/x86/hevc_mc.asm
 @@ -551,8 +551,7 @@ cglobal hevc_put_hevc_pel_pixels%1_%2, 5, 5, 3, dst,
 dststride, src, srcstride,h
  LOOP_END dst, dststride, src, srcstride
  RET

 -cglobal hevc_put_hevc_uni_pel_pixels%1_%2, 5, 5, 3, dst, dststride, src,
 srcstride,height
 -pxor  m2, m2
 +cglobal hevc_put_hevc_uni_pel_pixels%1_%2, 5, 5, 2, dst, dststride, src,
 srcstride,height
  .loop
  SIMPLE_LOAD   %1, %2, srcq, m0
  PEL_%2STORE%1   dstq, m0, m1
 --
 1.8.5.5



Re: [FFmpeg-devel] [PATCH] hevc_mc: reduce stride for bidir temp buffers

2014-07-27 Thread Mickaël Raulet
Hi Christophe
 hevc.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)
 2445ba15d38b2472f8f1b24aa75e63c089971480  
 0012-hevc_mc-reduce-stride-for-bidir-temp-buffers.patch
 From 126adf820bc54c2d00f794629595ad6310fbfc37 Mon Sep 17 00:00:00 2001
 From: Christophe Gisquet christophe.gisq...@gmail.com
 Date: Sat, 26 Jul 2014 17:17:18 +0200
 Subject: [PATCH 12/13] hevc_mc: reduce stride for bidir temp buffers
 
 It is unconditionally set to 64, which is quite higher than the
 actual block size.
 
 is this faster?
 
 [...]
I have the same concern. What is the gain?

Mickaël