Re: [FFmpeg-devel] [PATCH v3] libavcodec/vp8dec: fix the multi-thread HWAccel decode error

2019-07-01 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> myp...@gmail.com
> Sent: Monday, July 1, 2019 3:28 PM
> To: FFmpeg development discussions and patches 
> Cc: Xiang, Haihao 
> Subject: Re: [FFmpeg-devel] [PATCH v3] libavcodec/vp8dec: fix the
> multi-thread HWAccel decode error
> 
> On Mon, Jul 1, 2019 at 2:38 PM Wang, Shaofei 
> wrote:
> Is it this patch https://patchwork.ffmpeg.org/patch/13723/ fix the same
> issue?
It won't fix this issue in vp8.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v3] libavcodec/vp8dec: fix the multi-thread HWAccel decode error

2019-07-01 Thread Wang, Shaofei
Hello here,
A simple ping about this patch
Please feel free to ask if you have any question 
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v2] libavcodec/vp8dec: fix the multi-thread HWAccel decode error

2019-06-06 Thread Wang, Shaofei
> -Original Message-
> From: Xiang, Haihao
> Sent: Thursday, June 6, 2019 11:57 AM
> To: ffmpeg-devel@ffmpeg.org; Wang, Shaofei 
> Subject: Re: [FFmpeg-devel] [PATCH v2] libavcodec/vp8dec: fix the
> multi-thread HWAccel decode error
> 
> On Tue, 2019-06-04 at 15:21 +0800, Wang, Shaofei wrote:
> > > -Original Message-
> > > From: Xiang, Haihao
> > > Sent: Tuesday, May 28, 2019 12:23 PM
> > > To: ffmpeg-devel@ffmpeg.org
> > > Cc: Wang, Shaofei 
> > > Subject: Re: [FFmpeg-devel] [PATCH v2] libavcodec/vp8dec: fix the
> > > multi-thread HWAccel decode error
> > >
> > > On Thu, 2019-03-28 at 13:28 -0400, Shaofei Wang wrote:
> > > > Fix the issue: https://github.com/intel/media-driver/issues/317
> > > >
> > > > the root cause is update_dimensions will be called multple times
> > > > when decoder thread number is not only 1, but update_dimensions
> > > > call get_pixel_format in each decode thread will trigger the
> > > > hwaccel_uninit/hwaccel_init more than once. But only one hwaccel
> > > > should be shared with all decode threads.
> > > > in current context,
> > > > there are 3 situations in the update_dimensions():
> > > > 1. First time calling. No matter single thread or multithread,
> > > >get_pixel_format() should be called after dimensions were
> > > >set;
> > > > 2. Dimention changed at the runtime. Dimention need to be
> > > >updated when macroblocks_base is already allocated,
> > > >get_pixel_format() should be called to recreate new frames
> > > >according to updated dimention;
> > >
> > > s/Dimention/dimension ?
> >
> > OK, should be dimension
> >
> > > BTW this version of patch doesn't address the concern provided when
> > > reviewing the first version of patch.
> > >
> > > When (width != s->avctx->width || height != s->avctx->height) is
> > > true,
> > > ff_set_dimensions() is called even if s->macroblocks_base is not
> > > allocated, so why set dim_reset to (s->macroblocks_base != NULL)? I
> > > think dim_reset should be set to 1.
> >
> > If s->macroblocks_base is available, it means macroblocks_base of the
> > context  has been already allocated by one of threads, so it's a reset
> > operation when  (width != s->avctx->width...
> > If s->macroblocks_base is null, it's not allocated yet, and (width !=
> > s->avctx->width..., just dimension need to be updated but it's not a
> > dim reset operation. Since we only call get_pixel_format() in the
> > first thread or  in the reset operation
> 
> Is it reasonable when dimension is updated however the low level frame still
> use stale dimension info?
> 
> Thanks
> Haihao

The low level frame, especially hw frame will just need to be updated once.
For example, the init phase of the first thread will call update_dimension and
init hwaccel by get_pixel_format, but not needed to update low level frame for
the follow-up threads.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v2] libavcodec/vp8dec: fix the multi-thread HWAccel decode error

2019-06-04 Thread Wang, Shaofei
> -Original Message-
> From: Xiang, Haihao
> Sent: Tuesday, May 28, 2019 12:23 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Wang, Shaofei 
> Subject: Re: [FFmpeg-devel] [PATCH v2] libavcodec/vp8dec: fix the
> multi-thread HWAccel decode error
> 
> On Thu, 2019-03-28 at 13:28 -0400, Shaofei Wang wrote:
> > Fix the issue: https://github.com/intel/media-driver/issues/317
> >
> > the root cause is update_dimensions will be called multple times when
> > decoder thread number is not only 1, but update_dimensions call
> > get_pixel_format in each decode thread will trigger the
> > hwaccel_uninit/hwaccel_init more than once. But only one hwaccel
> > should be shared with all decode threads.
> > in current context,
> > there are 3 situations in the update_dimensions():
> > 1. First time calling. No matter single thread or multithread,
> >get_pixel_format() should be called after dimensions were
> >set;
> > 2. Dimention changed at the runtime. Dimention need to be
> >updated when macroblocks_base is already allocated,
> >get_pixel_format() should be called to recreate new frames
> >according to updated dimention;
> 
> s/Dimention/dimension ?
OK, should be dimension

> BTW this version of patch doesn't address the concern provided when
> reviewing the first version of patch.
> 
> When (width != s->avctx->width || height != s->avctx->height) is true,
> ff_set_dimensions() is called even if s->macroblocks_base is not allocated, so
> why set dim_reset to (s->macroblocks_base != NULL)? I think dim_reset
> should be set to 1.
If s->macroblocks_base is available, it means macroblocks_base of the context 
 has been already allocated by one of threads, so it's a reset operation when
 (width != s->avctx->width...
If s->macroblocks_base is null, it's not allocated yet, and
(width != s->avctx->width..., just dimension need to be updated but it's not a
dim reset operation. Since we only call get_pixel_format() in the first thread 
or
 in the reset operation

> > if (width  != s->avctx->width || ((width+15)/16 != s->mb_width ||
> > (height+15)/16 != s->mb_height) && s->macroblocks_base ||
> >  height != s->avctx->height) { @@ -196,9 +196,12 @@ int
> > update_dimensions(VP8Context *s, int width, int height, int is_vp7)
> >  ret = ff_set_dimensions(s->avctx, width, height);
> >  if (ret < 0)
> >  return ret;
> > +
> > +dim_reset = (s->macroblocks_base != NULL);
> 
> 
> > 3. Multithread first time calling. After decoder init, the
> >other threads will call update_dimensions() at first time
> >to allocate macroblocks_base and set dimensions.
> >But get_pixel_format() is shouldn't be called due to low
> >level frames and context are already created.
> >
> > In this fix, we only call update_dimensions as need.
> >
> > Signed-off-by: Wang, Shaofei 
> > Reviewed-by: Jun, Zhao 
> > Reviewed-by: Haihao Xiang 
> > ---
> > Previous code reviews:
> > 2019-03-06 9:25 GMT+01:00, Wang, Shaofei :
> > > > -Original Message-
> > > > From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On
> > > > Behalf Of Carl Eugen Hoyos
> > > > Sent: Wednesday, March 6, 2019 3:49 PM
> > > > To: FFmpeg development discussions and patches
> > > > 
> > > > Subject: Re: [FFmpeg-devel] [PATCH] libavcodec/vp8dec: fix the
> > > > multi-thread HWAccel decode error
> > > >
> > > > 2018-08-09 9:09 GMT+02:00, Jun Zhao :
> > > > > the root cause is update_dimentions call get_pixel_format will
> > > > > trigger the hwaccel_uninit/hwaccel_init , in current context,
> > > > > there are 3 situations in the update_dimentions():
> > > > > 1. First time calling. No matter single thread or multithread,
> > > > >get_pixel_format() should be called after dimentions were
> > > > >set;
> > > > > 2. Dimention changed at the runtime. Dimention need to be
> > > > >updated when macroblocks_base is already allocated,
> > > > >get_pixel_format() should be called to recreate new frames
> > > > >according to updated dimention; 3. Multithread first time
> > > > > calling. After decoder init, the
> > > > >other threads will call update_dimentions() at first time
> > > > >to allocate macroblocks_base and set dimentions.
> > > > >But get_pixel_format() is shouldn't be calle

Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-03-27 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Carl Eugen Hoyos
> Sent: Tuesday, March 26, 2019 6:36 PM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode +
> N filter graphs and adaptive bitrate.
> 
> 2019-03-26 23:07 GMT+01:00, Shaofei Wang :
> > It enabled MULTIPLE SIMPLE filter graph concurrency, which bring above
> > about 4%~20% improvement in some 1:N scenarios by CPU or GPU
> > acceleration
> 
> Which version of the patch did you test to get this numbers?
> 
The current one got it. It may move depend on different platform, density and 
case
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-03-27 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Michael Niedermayer
> Sent: Wednesday, March 27, 2019 5:24 AM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode +
> N filter graphs and adaptive bitrate.
 
> this still fails with some (fuzzed) samples valgrind does not seem to produce
> much usefull though ...
Still the join issue, will fix them in next version, thanks
> Error while decoding stream #0:0: Invalid data found when processing input
> [rv40 @ 0x25829f40] marking unfished frame as finished
> [rv40 @ 0x25829f40] concealing 27 DC, 27 AC, 27 MV errors in B frame
> [rv40 @ 0x2584f000] Slice indicates MB offset 142, got 140
> [rv40 @ 0x2584f000] Dquant for P-frame
> [rv40 @ 0x2584f000] concealing 2 DC, 2 AC, 2 MV errors in P frame Error
> while decoding stream #0:0: Invalid data found when processing input
> [rv40 @ 0x256dfb40] Slice indicates MB offset 140, got 135
> [rv40 @ 0x256dfb40] concealing 5 DC, 5 AC, 5 MV errors in B frame
> [rv40 @ 0x25701940] concealing 112 DC, 112 AC, 112 MV errors in P frame
> Error while decoding stream #0:0: Invalid data found when processing
> input=N/A
> Last message repeated 1 times
> [rv40 @ 0x25804e80] First slice header is incorrect
> [rv40 @ 0x256dfb40] Slice indicates MB offset 187, got 140
> [rv40 @ 0x256dfb40] Dquant for P-frame
> [rv40 @ 0x256dfb40] concealing 47 DC, 47 AC, 47 MV errors in P frame
> [rv40 @ 0x25726a00] Dquant for B-frame
> [rv40 @ 0x25770b80] concealing 115 DC, 115 AC, 115 MV errors in P frame
> Error while decoding stream #0:0: Invalid data found when processing
> input=N/A
> [rv40 @ 0x25804e80] Dquant for P-frame
> [rv40 @ 0x257dfdc0] Dquant for P-framee=-577014:32:22.77 bitrate=N/A
> speed=N/A
> [rv40 @ 0x25804e80] Dquant for B-frame
> Too many packets buffered for output stream 0:0.
> pthread_join failed with error: Resource deadlock avoided
> 
> gdb output:
> pthread_join failed with error: Resource deadlock avoided
> 
> Program received signal SIGABRT, Aborted.
> [Switching to Thread 0x7fffe4def700 (LWP 13350)]
> 0x7fffefeaec37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) bt
> Python Exception  No module named
> gdb.frames:
> #0  0x7fffefeaec37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7fffefeb2028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x0042bea1 in strict_pthread_join (thread=140737033205504,
> value_ptr=0x0) at ./libavutil/thread.h:55
> #3  0x0042d342 in ffmpeg_cleanup (ret=1) at fftools/ffmpeg.c:526
> #4  0x004250d8 in exit_program (ret=1) at fftools/cmdutils.c:139
> #5  0x0042dfd4 in write_packet (of=0x231b780,
> pkt=0x7fffe4dee680, ost=0x237f680, unqueue=0) at fftools/ffmpeg.c:738
> #6  0x0042eb39 in output_packet (of=0x231b780,
> pkt=0x7fffe4dee680, ost=0x237f680, eof=0) at fftools/ffmpeg.c:903
> #7  0x00430c30 in do_video_out (of=0x231b780, ost=0x237f680,
> next_picture=0x7fffa4005200, sync_ipts=98940.800010681152) at
> fftools/ffmpeg.c:1337
> #8  0x00431986 in reap_filters (flush=1, ifilter=0x22d2c00) at
> fftools/ffmpeg.c:1533
> #9  0x00434dc8 in filter_pipeline (arg=0x22d2c00) at
> fftools/ffmpeg.c:2318
> #10 0x70249184 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #11 0x7fffeff7603d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> 
> [...]
> --
> Michael GnuPG fingerprint:
> 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Those who are too smart to engage in politics are punished by being governed
> by those who are dumber. -- Plato
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-03-27 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Wednesday, March 27, 2019 6:44 AM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode +
> N filter graphs and adaptive bitrate.
>
It will always without an ending to work out a perfect solution, that' also the
current situation of the parallelism issue in ffmpeg.
No matter how users would like to resolved the "well-known" serial
limitation by them self in their app, but why not do some positive thing in
the big amount user used ffmpeg tool? Although it's codebase was not
designed with parallelism mind, it can also be paralleled at some key part
of pipeline as the patch shows with lightly modification. Otherwise, a brand
new parallel designed ffmpeg can be delivered by extra big effort, and
concerns, behaviour problems will no less than the current one.
Some thoughts:
- Provide the option -abr_pipeline to enable this multiple simple filter graph
parallel optimization instead of by default, which means users who enable
it knows what they are doing and performance pursuing. The feature/
optimization shouldn't cover all the situations in ffmpeg which is a cool box
provides huge amount of different features. And some cases need parallel,
some not.
- To fix those already known parallel problems such as -benchmark_all.
Those consideration are good to improve and refine the feature/solution,
, even some hidden/not easy triggered issue found in future, they can be
fixed by dedicated patch.

Thanks

> On further thought, I don't think this patch should continue in its present
> form.
> 
> The fundamental problem is that the bolted-on parallelism renders it pretty
> much unreviewable - the codebase here was not designed with any
> parallelism in mind, so it does not follow the usual rules expected of such
> code.  In particular, there are global variables in many places accessed
> without regard for data races, and that becomes fatal undefined behaviour
> once parallelism is added.  While in theory every one of those can be fixed
> (by adding more locks, or one big lock covering many instances together), it
> will be very hard to verify that all cases have been covered and the code will
> suffer significantly in readability/maintainability for the change.
> 
> To give an explicit example of the sort of problems you are hitting, the
> -benchmark_all option is entirely broken in the current proposed version.  It
> invokes undefined behaviour by writing to the current_time global in every
> thread.  Even if that were avoided by moving the value to a thread-local
> structure, it also gives wrong results - getrusage(RUSAGE_SELF) returns values
> for the whole process, so it won't say anything useful once there are multiple
> threads looking at the value in parallel.  Perhaps that is fixable by using 
> some
> sort of per-thread storage and RUSAGE_THREAD (and some equivalent on
> Windows), but whatever that is certainly requires more investigation and
> probably a separate patch.
> 
> Three possible thoughts on where to go from here:
> 
> * Start by refactoring the code to make it easier to verify.  This would 
> entail
> something like removing all global variable accesses from parallel paths, and
> then ensuring that each thread only ever writes to its own local working set
> (e.g. OutputStream structure) while marking all other structures as const.
> After such refactoring has happened, a new version of the present patch
> would then be possible to assess.
> 
> * Run exhaustively in tsan/valgrind/other race detectors and fix every
> problem it finds, then provide evidence from that to help with review.  Since
> these issues can be hidden behind specific option combinations (as the
> benchmark one is), this would need a lot of care to ensure full coverage, and
> some parts (like the benchmark one) would need rewriting anyway.  As
> noted above I'm not sure this would be very good for the code, but we could
> at least have some amount of confidence that the result was correct.
For the existing code, already got lots of issues detected by valgrind/tsan.
We can provide evidence to prove the problem have been covered.
> 
> * Reconsider whether the patch is worth pursuing.  While I agree that a
> correct implementation of this would help in the specific use-cases you
> describe, I'm not sure that the set of users who gain from it is actually
> particularly large - most people wanting to maximise throughput in the way
> this change can help with already run multiple FFmpeg instances (or their own
> programs using the libraries) because the serial limitations of FFmpeg are
> well-known.
To run multiple FFmpeg instances couldn't stand for 1:N cases. And of course
users can use their own program to call the libraries, but we also see a lot of
users who use ffmpeg tool to build up their solution, demo, App and etc.


Re: [FFmpeg-devel] [PATCH v7] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-03-25 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Saturday, March 23, 2019 9:42 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v7] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> On 21/03/2019 15:09, Shaofei Wang wrote:
> > It enabled MULTIPLE SIMPLE filter graph concurrency, which bring above
> > about 4%~20% improvement in some 1:N scenarios by CPU or GPU
> > acceleration
> >
> > ...>
> > Signed-off-by: Wang, Shaofei 
> > Reviewed-by: Michael Niedermayer 
> > Reviewed-by: Mark Thompson 
> 
> The reviewed-by annotation generally implies review approval for a patch,
> which I don't think is true for either of the stated people.
> 
Will remove them. Just thought about the reviewer contribution.
Thx
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v7] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-03-22 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Michael Niedermayer
> Sent: Friday, March 22, 2019 8:14 AM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH v7] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> This changes the output for some cases like:
> 
> ./ffmpeg -y -i mm-short.mpg -flags +aic -qscale 2 -vcodec h263p -t 1  -qscale
> 10 old.avi
> 
> -rw-r- 1 michael michael 86434 Mar 22 01:07 new.avi
> -rw-r- 1 michael michael 86402 Mar 22 01:07 old.avi
> 
> i assume this is unintended
> 
thanks
tested several rounds which not reproduced
what's the resolution of "mm-short.mpg"? it could not be very accurated in 
terms of fps with the option "-t 1" which may bring a little bit more or less 
output stream.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavcodec/vp8dec: fix the multi-thread HWAccel decode error

2019-03-06 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Carl Eugen Hoyos
> Sent: Wednesday, March 6, 2019 3:49 PM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] libavcodec/vp8dec: fix the multi-thread
> HWAccel decode error
> 
> 2018-08-09 9:09 GMT+02:00, Jun Zhao :
> > the root cause is update_dimentions call get_pixel_format will trigger
> > the hwaccel_uninit/hwaccel_init , in current context, there are 3
> > situations in the update_dimentions():
> > 1. First time calling. No matter single thread or multithread,
> >get_pixel_format() should be called after dimentions were
> >set;
> > 2. Dimention changed at the runtime. Dimention need to be
> >updated when macroblocks_base is already allocated,
> >get_pixel_format() should be called to recreate new frames
> >according to updated dimention;
> > 3. Multithread first time calling. After decoder init, the
> >other threads will call update_dimentions() at first time
> >to allocate macroblocks_base and set dimentions.
> >But get_pixel_format() is shouldn't be called due to low
> >level frames and context are already created.
> > In this fix, we only call update_dimentions as need.
> >
> > Signed-off-by: Wang, Shaofei 
> > Reviewed-by: Jun, Zhao 
> > ---
> >  libavcodec/vp8.c |7 +--
> >  1 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/libavcodec/vp8.c b/libavcodec/vp8.c index
> > 3adfeac..18d1ada 100644
> > --- a/libavcodec/vp8.c
> > +++ b/libavcodec/vp8.c
> > @@ -187,7 +187,7 @@ static av_always_inline  int
> > update_dimensions(VP8Context *s, int width, int height, int is_vp7)  {
> >  AVCodecContext *avctx = s->avctx;
> > -int i, ret;
> > +int i, ret, dim_reset = 0;
> >
> >  if (width  != s->avctx->width || ((width+15)/16 != s->mb_width ||
> > (height+15)/16 != s->mb_height) && s->macroblocks_base ||
> >  height != s->avctx->height) { @@ -196,9 +196,12 @@ int
> > update_dimensions(VP8Context *s, int width, int height, int is_vp7)
> >  ret = ff_set_dimensions(s->avctx, width, height);
> >  if (ret < 0)
> >  return ret;
> > +
> > +dim_reset = (s->macroblocks_base != NULL);
> >  }
> >
> > -if (!s->actually_webp && !is_vp7) {
> > +if ((s->pix_fmt == AV_PIX_FMT_NONE || dim_reset) &&
> > + !s->actually_webp && !is_vp7) {
> 
> Why is the new variable dim_reset needed?
> Wouldn't the patch be simpler if you used s->macroblocks_base here?
Since dim_reset was set in the "if" segment, it equal to
(width != s->avctx->width || ((width+15)/16 != s->mb_width ||
(height+15)/16 != s->mb_height) || height != s->avctx->height) && 
s->macroblocks_base

> 
> Carl Eugen
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-02-20 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Saturday, February 16, 2019 8:12 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> On 15/02/2019 21:54, Shaofei Wang wrote:
> > It enabled multiple filter graph concurrency, which bring above about
> > 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> >
> > Below are some test cases and comparison as reference.
> > (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> > (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> >
> > For 1:N transcode by GPU acceleration with vaapi:
> > ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
> > -hwaccel_output_format vaapi \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
> > -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
> >
> > test results:
> > 2 encoders 5 encoders 10 encoders
> > Improved   6.1%6.9%   5.5%
> >
> > For 1:N transcode by GPU acceleration with QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null
> \
> > -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null
> > /dev/null
> >
> > test results:
> > 2 encoders  5 encoders 10 encoders
> > Improved   6%   4% 15%
> >
> > For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f
> null /dev/null \
> > -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f
> > null /dev/null
> >
> > test results:
> > 2 scale  5 scale   10 scale
> > Improved   12% 21%21%
> >
> > For CPU only 1 decode to N scaling:
> > ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
> > -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
> >
> > test results:
> > 2 scale  5 scale   10 scale
> > Improved   25%107%   148%
> >
> > Signed-off-by: Wang, Shaofei 
> > Reviewed-by: Zhao, Jun 
> > ---
> >  fftools/ffmpeg.c| 121
> 
> >  fftools/ffmpeg.h|  14 ++
> >  fftools/ffmpeg_filter.c |   1 +
> >  3 files changed, 128 insertions(+), 8 deletions(-)
> 
> On a bit more review, I don't think this patch works at all.
> 
It has been tested and verified by a lot of cases. More fate cases need to be 
covered now.

> The existing code is all written to be run serially.  This simplistic 
> approach to
> parallelising it falls down because many of those functions use variables
> written in what were previously other functions called at different times but
> have now become other threads, introducing undefined behaviour due to
> data races.
>
Actually, this is not a patch to parallel every thing in the ffmpeg. It just 
thread the input filter
of the filter graph(tend for simple filter graph), which is a simple way to 
improve N filter graph
performance and also without introduce huge modification. So that there is 
still a lot of serial function call, differences
are that each filter graph need to init its output stream instead of init all 
together and each
filter graph will reap filters for its filter chain.

> To consider a single example (not the only one), the function
> check_init_output_file() does not work at all after this change.  The test for
> OutputStream initialisation (so that you run exactly once after all of the
> output streams are ready) races with other threads setting those variables.
> Since that's undefined behaviour you may get lucky sometimes and have the
> output file initialisation run exactly once, but in general it will fail in 
> unknown
> ways.
> 

The check_init_output_file() should be responsible for the output file related 
with
specified output stream which managed by each thread chain, that means even it
called by different thread, the data setting are different. Let me double check.

> If you want to resubmit this patch, you will need to refactor a lot of the 
> other
> code in ffmpeg.c to rule out these undefined cases.
> 
OK. This patch would only effect on SIMPLE filter graph.

> - Mark
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-02-17 Thread Wang, Shaofei
Thanks. Seems I need to cover external samples either

> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Michael Niedermayer
> Sent: Saturday, February 16, 2019 5:22 AM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> On Fri, Feb 15, 2019 at 04:54:23PM -0500, Shaofei Wang wrote:
> > It enabled multiple filter graph concurrency, which bring above about
> > 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> >
> > Below are some test cases and comparison as reference.
> > (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> > (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> breaks fate
> 
> make -j12 fate-filter-overlay-dvdsub-2397 V=2
> 
> 
> frame=  208 fps=0.0 q=-0.0 Lsize=  48kB time=00:00:08.04 bitrate=
> 49.0kbits/s speed=10.6x
> video:105300kB audio:1254kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: unknown pthread_join failed with error: No
> such process Aborted (core dumped)
> make: *** [fate-filter-overlay-dvdsub-2397] Error 134
> 
> [...]
> --
> Michael GnuPG fingerprint:
> 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Asymptotically faster algorithms should always be preferred if you have
> asymptotical amounts of data
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v4] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-02-13 Thread Wang, Shaofei
> >> sizeof(AVFrame) is not part of the ABI.  You need to allocate it
> >> somewhere.
> >>
> > Please tell more?
> 
> See the documentation for AVFrame in libavutil/frame.h Use av_frame_alloc()
> 
> Carl Eugen

Thanks Carl
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v4] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-02-12 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Mark Thompson
> Sent: Tuesday, February 12, 2019 8:18 AM
It should be UTC time when received the email

> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v4] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> On 11/02/2019 22:41, Shaofei Wang wrote:
And the above time I've sent the previous email is also a correct UTC time
 
> Please avoid sending messages from the future - the list received this about
> thirteen hours before its supposed send time (received "Mon, 11 Feb 2019
> 11:42:09 +0200", sent "Mon, 11 Feb 2019 17:41:04 -0500").

> Probably the sending machine or some intermediate has an incorrect time or
> time zone.
It may be the reason.

> Some numbers for more use-cases and platforms (with different architectures
> and core counts) would be a good idea if you intend to enable this by default.
It would be better to have more platforms data.
Actually, it provide option for user to choose a "faster" path in the previous
version. In this patch it simplified code path.

> Presumably it's a bit slower on less powerful machines with fewer cores when
> it makes many threads, but by how much?  Is that acceptable?
Is it resource limited machine that we should disable HAVE_THREADS?

> > diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c index
> > 544f1a1..67b1a2a 100644
> > --- a/fftools/ffmpeg.c
> > +++ b/fftools/ffmpeg.c
> > @@ -1419,13 +1419,18 @@ static void
> finish_output_stream(OutputStream *ost)
> >   *
> >   * @return  0 for success, <0 for severe errors
> >   */
> > -static int reap_filters(int flush)
> > +static int reap_filters(int flush, InputFilter * ifilter)
> >  {
> >  AVFrame *filtered_frame = NULL;
> >  int i;
> >
> > -/* Reap all buffers present in the buffer sinks */
> > +/* Reap all buffers present in the buffer sinks or just reap specified
> > + * input filter buffer */
> >  for (i = 0; i < nb_output_streams; i++) {
> > +if (ifilter) {
> > +if (ifilter != output_streams[i]->filter->graph->inputs[0])
> > +continue;
> > +}
> 
> No mixed declarations and code.
OK. 
> >  OutputStream *ost = output_streams[i];
> >  OutputFile*of = output_files[ost->file_index];
> >  AVFilterContext *filter;
> 
> How carefully has this been audited to make sure that there are no data races?
> The calls to init_output_stream() and do_video_out() both do /a lot/, and in
> particular they interact with the InputStream which might be shared with
> other threads (and indeed is in all your examples above).
Base on the code path of multithread, it won't have duplicated path to call
init_output_stream() and do_video_out(), since there's no output stream share
multiple filter graphs. And this concern should be hightlight, will investigate
more in the code.

> > @@ -2179,7 +2184,8 @@ static int ifilter_send_frame(InputFilter *ifilter,
> AVFrame *frame)
> >  }
> >  }
> >
> > -ret = reap_filters(1);
> > +ret = HAVE_THREADS ? reap_filters(1, ifilter) :
> > + reap_filters(1, NULL);
> > +
> >  if (ret < 0 && ret != AVERROR_EOF) {
> >  av_log(NULL, AV_LOG_ERROR, "Error while filtering: %s\n",
> av_err2str(ret));
> >  return ret;
> > @@ -2208,6 +2214,14 @@ static int ifilter_send_eof(InputFilter
> > *ifilter, int64_t pts)
> >
> >  ifilter->eof = 1;
> >
> > +#if HAVE_THREADS
> > +ifilter->waited_frm = NULL;
> > +pthread_mutex_lock(>process_mutex);
> > +ifilter->t_end = 1;
> > +pthread_cond_signal(>process_cond);
> > +pthread_mutex_unlock(>process_mutex);
> > +pthread_join(ifilter->f_thread, NULL); #endif
> >  if (ifilter->filter) {
> >  ret = av_buffersrc_close(ifilter->filter, pts,
> AV_BUFFERSRC_FLAG_PUSH);
> >  if (ret < 0)
> > @@ -2252,12 +2266,95 @@ static int decode(AVCodecContext *avctx,
> AVFrame *frame, int *got_frame, AVPacke
> >  return 0;
> >  }
> >
> > +#if HAVE_THREADS
> > +static void *filter_pipeline(void *arg) {
> > +InputFilter *fl = arg;
> > +AVFrame *frm;
> > +int ret;
> > +while(1) {
> > +pthread_mutex_lock(>process_mutex);
> > +while (fl->waited_frm == NULL && !fl->t_end)
> > +pthread_cond_wait(>process_cond, >process_mutex);
> > +pthread_mutex_unlock(>process_mutex);
> > +
> > +if (fl->t_end) break;
> > +
> > +frm = fl->waited_frm;
> > +ret = ifilter_send_frame(fl, frm);
> > +if (ret < 0) {
> > +av_log(NULL, AV_LOG_ERROR,
> > +   "Failed to inject frame into filter network: %s\n",
> av_err2str(ret));
> > +} else {
> > +ret = reap_filters(0, fl);
> > +}
> > +fl->t_error = ret;
> > +
> > +pthread_mutex_lock(>finish_mutex);
> > +fl->waited_frm = NULL;
> > +

Re: [FFmpeg-devel] [PATCH v4] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-02-11 Thread Wang, Shaofei
Code clean and remove the "-abr_pipeline" option, use the perf improved code 
path by default only if HAVE_THREAD enabled.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v3] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-01-21 Thread Wang, Shaofei
> -Original Message-
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Michael Niedermayer
> Sent: Thursday, January 17, 2019 8:30 PM
> To: FFmpeg development discussions and patches 
> Cc: Nicolas George 
> Subject: Re: [FFmpeg-devel] [PATCH v3] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> On Wed, Jan 16, 2019 at 04:17:07PM -0500, Shaofei Wang wrote:
> > With new option "-abr_pipeline"
> > It enabled multiple filter graph concurrency, which bring obove about
> > 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> >
> > Below are some test cases and comparison as reference.
> > (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> > (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> >
> > For 1:N transcode by GPU acceleration with vaapi:
> > ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
> > -hwaccel_output_format vaapi \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
> > -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null \
> > -abr_pipeline
> >
> > test results:
> > 2 encoders 5 encoders 10 encoders
> > Improved   6.1%6.9%   5.5%
> >
> > For 1:N transcode by GPU acceleration with QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null
> \
> > -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null
> > /dev/null
> >
> > test results:
> > 2 encoders  5 encoders 10 encoders
> > Improved   6%   4% 15%
> >
> > For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> > -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f
> null /dev/null \
> > -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f
> > null /dev/null
> >
> > test results:
> > 2 scale  5 scale   10 scale
> > Improved   12% 21%21%
> >
> > For CPU only 1 decode to N scaling:
> > ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> > -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
> > -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null \
> > -abr_pipeline
> >
> > test results:
> > 2 scale  5 scale   10 scale
> > Improved   25%107%   148%
> >
> > Signed-off-by: Wang, Shaofei 
> > Reviewed-by: Zhao, Jun 
> > ---
> >  fftools/ffmpeg.c| 228
> 
> >  fftools/ffmpeg.h|  15 
> >  fftools/ffmpeg_filter.c |   4 +
> >  fftools/ffmpeg_opt.c|   6 +-
> >  4 files changed, 237 insertions(+), 16 deletions(-)
> 
> Looking at this i see alot of duplicated code and alot of ifdefs
Since I didn't want to change the function interface of reap_filters(), a 
none-loop reap
function generated.
Will change it base on the reap_filters() to avoid duplicated lines in the next 
patch.

> Preferably one codepath when possible, and best results by default no need to
> manually enable the fast path.
If disable/enable the fast path option is not needed for users, i'll remove it. 
But before
that, there are some reasons:
1. it provide more choice for user to decide whether to use it depend on their 
cases, 
otherwise we need to implement the 'strategies' for users to decide when to 
enable/disable
the fast path.
2. it's easy to compare the result to make sure which is the best

Thanks
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH v3] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-01-16 Thread Wang, Shaofei
> From: Guo, Yejun
> Sent: Thursday, January 17, 2019 9:25 AM
> To: FFmpeg development discussions and patches 
> Cc: mich...@niedermayer.cc; atomnu...@gmail.com; c...@passwd.hu;
> Wang, Shaofei ; ceffm...@gmail.com
> Subject: RE: [FFmpeg-devel] [PATCH v3] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> 
> > +static int pipeline_reap_filters(int flush, InputFilter * ifilter) {
> > +AVFrame *filtered_frame = NULL;
> > +int i;
> > +
> > +for (i = 0; i < nb_output_streams; i++) {
> > +if (ifilter == output_streams[i]->filter->graph->inputs[0]) break;
> > +}
> > +OutputStream *ost = output_streams[i];
> > +OutputFile*of = output_files[ost->file_index];
> > +AVFilterContext *filter;
> > +AVCodecContext *enc = ost->enc_ctx;
> > +int ret = 0;
> > +
> > +if (!ost->filter || !ost->filter->graph->graph)
> > +return 0;
> > +filter = ost->filter->filter;
> > +
> > +if (!ost->initialized) {
> > +char error[1024] = "";
> > +ret = init_output_stream(ost, error, sizeof(error));
> > +if (ret < 0) {
> > +av_log(NULL, AV_LOG_ERROR, "Error initializing output
> > + stream %d:%d
> > -- %s\n",
> > +   ost->file_index, ost->index, error);
> > +exit_program(1);
> 
> imo, it's not good to exit the program.
Any reason? These lines are similar as them in reap_filters(). Line 1445.

> > +}
> > +}
> > +
> > +if (!ost->filtered_frame && !(ost->filtered_frame = av_frame_alloc()))
> > +return AVERROR(ENOMEM);
> > +filtered_frame = ost->filtered_frame;
> > +
> > +while (1) {
> > +double float_pts = AV_NOPTS_VALUE; // this is identical to
> > filtered_frame.pts but with higher precision
> > +ret = av_buffersink_get_frame_flags(filter, filtered_frame,
> > +
> AV_BUFFERSINK_FLAG_NO_REQUEST);
> > +if (ret < 0) {
> > +if (ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) {
> > +av_log(NULL, AV_LOG_WARNING,
> > +   "Error in av_buffersink_get_frame_flags():
> > + %s\n",
> > av_err2str(ret));
> > +} else if (flush && ret == AVERROR_EOF) {
> > +if (av_buffersink_get_type(filter) ==
> AVMEDIA_TYPE_VIDEO)
> > +do_video_out(of, ost, NULL, AV_NOPTS_VALUE);
> > +}
> > +break;
> > +}
> > +if (ost->finished) {
> > +av_frame_unref(filtered_frame);
> > +continue;
> > +}
> > +if (filtered_frame->pts != AV_NOPTS_VALUE) {
> > +int64_t start_time = (of->start_time == AV_NOPTS_VALUE) ?
> > + 0 : of-
> > >start_time;
> > +AVRational filter_tb = av_buffersink_get_time_base(filter);
> > +AVRational tb = enc->time_base;
> > +int extra_bits = av_clip(29 - av_log2(tb.den), 0, 16);
> > +
> > +tb.den <<= extra_bits;
> > +float_pts =
> > +av_rescale_q(filtered_frame->pts, filter_tb, tb) -
> > +av_rescale_q(start_time, AV_TIME_BASE_Q, tb);
> > +float_pts /= 1 << extra_bits;
> > +// avoid exact midoints to reduce the chance of rounding
> > + differences,
> > this can be removed in case the fps code is changed to work with
> > integers
> > +float_pts += FFSIGN(float_pts) * 1.0 / (1<<17);
> > +
> > +filtered_frame->pts =
> > +av_rescale_q(filtered_frame->pts, filter_tb,
> enc->time_base) -
> > +av_rescale_q(start_time, AV_TIME_BASE_Q,
> enc->time_base);
> > +}
> > +
> > +switch (av_buffersink_get_type(filter)) {
> > +case AVMEDIA_TYPE_VIDEO:
> > +if (!ost->frame_aspect_ratio.num)
> > +enc->sample_aspect_ratio =
> > + filtered_frame->sample_aspect_ratio;
> > +
> > +if (debug_ts) {
> > +av_log(NULL, AV_LOG_INFO, "filter -> pts:%s
> > + pts_time:%s exact:%f
> > time_base:%d/%d\n",
> > +av_ts2str(filtered_frame->pts),
> > + av_ts2timestr(filtered_frame-
> > >pts, >time_base),
> > +float_pts,
> > +   

Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-01-11 Thread Wang, Shaofei
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
> Carl Eugen Hoyos
> Sent: Friday, January 11, 2019 1:14 AM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode +
> N filter graphs and adaptive bitrate.
> 
> My question is simply "why do you need this condition? If you disable threads,
> FFmpeg should still work fine, it simulates threads using one thread."
> 
> Please do not top-post here, Carl Eugen

Without this condition, it will fail the compile when !HAVE_THREAD.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-01-10 Thread Wang, Shaofei
Also caused by some old mingw64 toolchain? Will try to avoid the error in v2 
patch. Thanks

-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of 
Michael Niedermayer
Sent: Thursday, January 10, 2019 7:36 AM
To: FFmpeg development discussions and patches 
Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N 
filter graphs and adaptive bitrate.

On Wed, Jan 09, 2019 at 03:50:03PM -0500, Shaofei Wang wrote:
> With new option "-abr_pipeline"
> It enabled multiple filter graph concurrency, which bring obove about 
> 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> 
> Below are some test cases and comparison as reference.
> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> For 1:N transcode by GPU acceleration with vaapi:
> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
> -hwaccel_output_format vaapi \
> -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
> -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null \
> -abr_pipeline
> 
> test results:
> 2 encoders 5 encoders 10 encoders
> Improved   6.1%6.9%   5.5%
> 
> For 1:N transcode by GPU acceleration with QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
> -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null 
> /dev/null
> 
> test results:
> 2 encoders  5 encoders 10 encoders
> Improved   6%   4% 15%
> 
> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null 
> /dev/null \
> -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f 
> null /dev/null
> 
> test results:
> 2 scale  5 scale   10 scale
> Improved   12% 21%21%
> 
> For CPU only 1 decode to N scaling:
> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
> -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null \
> -abr_pipeline
> 
> test results:
> 2 scale  5 scale   10 scale
> Improved   25%107%   148%
> 
> Signed-off-by: Wang, Shaofei 
> Reviewed-by: Zhao, Jun 
> ---
>  fftools/ffmpeg.c| 239 
> +---
>  fftools/ffmpeg.h|  14 +++
>  fftools/ffmpeg_filter.c |   6 ++
>  fftools/ffmpeg_opt.c|   6 +-
>  4 files changed, 251 insertions(+), 14 deletions(-)

fails to build on mingw64

CC  fftools/ffmpeg_filter.o
src/fftools/ffmpeg_filter.c: In function ‘init_simple_filtergraph’:
src/fftools/ffmpeg_filter.c:231:39: error: incompatible types when assigning to 
type ‘pthread_t’ from type ‘int’
 ist->filters[i]->f_thread = 0;
   ^
make: *** [fftools/ffmpeg_filter.o] Error 1
CC  fftools/ffmpeg.o
src/fftools/ffmpeg.c: In function ‘pipeline_reap_filters’:
src/fftools/ffmpeg.c:1534:5: warning: ISO C90 forbids mixed declarations and 
code [-Wdeclaration-after-statement]
 OutputStream *ost = output_streams[i];
 ^
src/fftools/ffmpeg.c: In function ‘do_streamcopy’:
src/fftools/ffmpeg.c:2179:5: warning: ‘av_copy_packet_side_data’ is deprecated 
(declared at src/libavcodec/avcodec.h:4424) [-Wdeprecated-declarations]
 av_copy_packet_side_data(, pkt);
 ^
src/fftools/ffmpeg.c: In function ‘filter_pipeline’:
src/fftools/ffmpeg.c:2412:5: warning: ‘return’ with no value, in function 
returning non-void [enabled by default]
 return;
 ^
src/fftools/ffmpeg.c: In function ‘send_frame_to_filters’:
src/fftools/ffmpeg.c:2451:17: error: wrong type argument to unary exclamation 
mark
 if (!ist->filters[i]->f_thread) {
 ^
src/fftools/ffmpeg.c: In function ‘init_output_stream’:
src/fftools/ffmpeg.c:3754:9: warning: ‘avcodec_copy_context’ is deprecated 
(declared at src/libavcodec/avcodec.h:4196) [-Wdeprecated-declarations]
 ret = avcodec_copy_context(ost->st->codec, ost->enc_ctx);
 ^
src/fftools/ffmpeg.c:3754:9: warning: ‘codec’ is deprecated (declared at 
src/libavformat/avformat.h:878) [-Wdeprecated-declarations]
src/fftools/ffmpeg.c:3800:9: warning: ‘codec’ is deprecated (declared at 
src/libavformat/avformat.h:878) [-Wdeprecated-declarations]
 ost->st->

Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

2019-01-10 Thread Wang, Shaofei

Please ignore those commented lines which will be removed in the v2 patch, they 
were referred from previous reap_filters() code.

"if (HAVE_THREADS && !abr_pipeline)" looks better.

Could you add more about "not work with thread emulation"? Thx.

-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of Carl 
Eugen Hoyos
Sent: Thursday, January 10, 2019 4:51 AM
To: FFmpeg development discussions and patches 
Subject: Re: [FFmpeg-devel] [PATCH] Improved the performance of 1 decode + N 
filter graphs and adaptive bitrate.

2019-01-09 21:50 GMT+01:00, Shaofei Wang :

> +//if (ost->source_index >= 0)
> +//*filtered_frame=
> *input_streams[ost->source_index]->decoded_frame; //for me_threshold

Is there a reason why this is commented?

> @@ -2179,7 +2285,15 @@ static int ifilter_send_frame(InputFilter 
> *ifilter, AVFrame *frame)
>  }
>  }
>
> +#if HAVE_THREADS
> +if (!abr_pipeline) {

The following is less ugly:
if (HAVE_THREADS && !abr_pipeline)
Same below, you can define the new variables unconditionally.

But why does your code not work with FFmpeg's  thread emulation?

Carl Eugen
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel