Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-29 Thread Ronald S. Bultje
Hi,

On Mon, Feb 29, 2016 at 2:47 PM, Wan-Teh Chang  wrote:

> On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje 
> wrote:
> >
> > To be a little bit more explicit, it should be relatively easy to
> > accomplish this by changing the position of qscale in the relevant
> > struct, and only copy that half of the struct (where entries are
> > obviously grouped) where we actually care about the copied value.
> >
> > See e.g. the copy_fields macro [1] on how we do partial-struct copies.
> >
> > Ronald
> >
> > [1]
> >
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
>
> Hi Ronald,
>
> Thanks a lot for the hints. As soon as I had access to a computer last
> Saturday, I implemented your suggestion. Although the data race on
> |qscale| is gone, ThreadSanitizer warned about the next data race of
> this kind. I fixed that using the same approach, and then
> ThreadSanitizer warned about the next data race of this kind.
>
> That made me realize the fix for this class of data race is going to
> be big, and I should do this work in the ffmpeg HEAD rather than the old
> version I'm using. I followed the instructions James Almer gave
> earlier in this email thread:
>
> ./configure --samples=fate-suite/ --toolchain=clang-tsan
> --disable-stripping
> make fate-h264 THREADS=4
>
> (Note: James suggested --toolchain=gcc-tsan. That didn't work for me
> for some reason. I tried --toolchain=clang-tsan instead and it
> worked.)
>
> ThreadSanitizer reported the same class of data races in the h264
> decoder with multiple threads. To understand the code and propose a
> proper fix, I am afraid that I will need to live and breathe this code
> for several days because the structs involved (H264Context,
> H264SliceContext, etc.) have a lot of members.
>
> So, I am wondering if you could run fate-h264 under tsan and take a
> look at the data races in the h264 decoding code. I think you will be
> able to fix the data races much faster than I can. If you don't have a
> lot of time, I'll be happy to implement a solution if you can outline
> it after your investigation. Thanks.


I don't think I care enough about tsan; I'd prefer a tool like Microsoft
CHESS, which replays concurrency paths. Sorry :(

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-29 Thread Clément Bœsch
On Mon, Feb 29, 2016 at 11:47:14AM -0800, Wan-Teh Chang wrote:
> On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje  wrote:
> >
> > To be a little bit more explicit, it should be relatively easy to
> > accomplish this by changing the position of qscale in the relevant struct,
> > and only copy that half of the struct (where entries are obviously grouped)
> > where we actually care about the copied value.
> >
> > See e.g. the copy_fields macro [1] on how we do partial-struct copies.
> >
> > Ronald
> >
> > [1]
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
> 
> Hi Ronald,
> 
> Thanks a lot for the hints. As soon as I had access to a computer last
> Saturday, I implemented your suggestion. Although the data race on
> |qscale| is gone, ThreadSanitizer warned about the next data race of
> this kind. I fixed that using the same approach, and then
> ThreadSanitizer warned about the next data race of this kind.
> 
> That made me realize the fix for this class of data race is going to
> be big, and I should do this work in the ffmpeg HEAD rather than the old
> version I'm using. I followed the instructions James Almer gave
> earlier in this email thread:
> 
> ./configure --samples=fate-suite/ --toolchain=clang-tsan --disable-stripping
> make fate-h264 THREADS=4
> 
> (Note: James suggested --toolchain=gcc-tsan. That didn't work for me
> for some reason. I tried --toolchain=clang-tsan instead and it
> worked.)
> 
> ThreadSanitizer reported the same class of data races in the h264
> decoder with multiple threads. To understand the code and propose a
> proper fix, I am afraid that I will need to live and breathe this code
> for several days because the structs involved (H264Context,
> H264SliceContext, etc.) have a lot of members.
> 
> So, I am wondering if you could run fate-h264 under tsan and take a
> look at the data races in the h264 decoding code. I think you will be
> able to fix the data races much faster than I can. If you don't have a
> lot of time, I'll be happy to implement a solution if you can outline
> it after your investigation. Thanks.
> 

BTW, thanks a lot for working on this. You can also play with valgrind
(tool helgrind or drd) if it's supported on your system. You won't need a
special build to use it (although for fate, you'll need something like
--target-exec="valgrind --tool=helgrind" or get the command with make
fate-foobar V=1 and prepend valgrind).
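
Putting those hints together, the helgrind route might look like this (the exact target and paths are illustrative; helgrind can be swapped for drd):

```shell
# valgrind needs no instrumented build, but keep symbols so reports are
# readable; --target-exec wraps every FATE test invocation.
./configure --samples=fate-suite/ --disable-stripping \
            --target-exec="valgrind --tool=helgrind"
make fate-h264 THREADS=4

# Alternatively, print a single test's command line and wrap it by hand:
make fate-h264 V=1   # copy the printed command, prepend "valgrind --tool=helgrind"
```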

Note: according to Ronald it's actually very close to TSAN.

-- 
Clément B.




Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-29 Thread Wan-Teh Chang
On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje  wrote:
>
> To be a little bit more explicit, it should be relatively easy to
> accomplish this by changing the position of qscale in the relevant struct,
> and only copy that half of the struct (where entries are obviously grouped)
> where we actually care about the copied value.
>
> See e.g. the copy_fields macro [1] on how we do partial-struct copies.
>
> Ronald
>
> [1]
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427

Hi Ronald,

Thanks a lot for the hints. As soon as I had access to a computer last
Saturday, I implemented your suggestion. Although the data race on
|qscale| is gone, ThreadSanitizer warned about the next data race of
this kind. I fixed that using the same approach, and then
ThreadSanitizer warned about the next data race of this kind.

That made me realize the fix for this class of data race is going to
be big, and I should do this work in the ffmpeg HEAD rather than the old
version I'm using. I followed the instructions James Almer gave
earlier in this email thread:

./configure --samples=fate-suite/ --toolchain=clang-tsan --disable-stripping
make fate-h264 THREADS=4

(Note: James suggested --toolchain=gcc-tsan. That didn't work for me
for some reason. I tried --toolchain=clang-tsan instead and it
worked.)

ThreadSanitizer reported the same class of data races in the h264
decoder with multiple threads. To understand the code and propose a
proper fix, I am afraid that I will need to live and breathe this code
for several days because the structs involved (H264Context,
H264SliceContext, etc.) have a lot of members.

So, I am wondering if you could run fate-h264 under tsan and take a
look at the data races in the h264 decoding code. I think you will be
able to fix the data races much faster than I can. If you don't have a
lot of time, I'll be happy to implement a solution if you can outline
it after your investigation. Thanks.

Wan-Teh Chang


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-29 Thread Wan-Teh Chang
Hi James,

On Thu, Feb 25, 2016 at 6:50 PM, James Almer  wrote:
>
> Did you try running the FATE suite using ThreadSanitizer, or just decoded
> random videos? I remember like half the tests were failing because of data
> races and most of them pointed to code changed by your patches. It would be
> interesting to see how they are affected.

When I submitted my patch for review, I only decoded random videos
(with an old version of ffmpeg). Later in this email thread, Ronald
showed that my patch removed all concurrency, so I'll need to go back to
the drawing board.

> configure has the --toolchain=gcc-tsan option for this. Make sure to also add
> --disable-stripping, then run "make fate THREADS=4 SAMPLES=/path/to/samples"
> or similar. You may instead want to run a small set at a time instead of the
> whole suite since it's big and ThreadSanitizer is slow. Try for example
> fate-h264 in that case.

Thank you very much for the command line. I did this last Saturday on
the ffmpeg HEAD. Although the h264 decoding code has changed
significantly between the old version I'm using and the ffmpeg HEAD, I
am certain these are the same class of data race.

Wan-Teh Chang


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Ronald S. Bultje
Hi,

On Fri, Feb 26, 2016 at 5:49 PM, Ronald S. Bultje 
wrote:

> Now, one could argue that maybe we shouldn't copy the value *at all* if we
> don't use it anyway, which would incidentally also make tools like tsan
> happy about this particular thingy. I'm not against that, if it doesn't
> make the code uglier.
>

To be a little bit more explicit, it should be relatively easy to
accomplish this by changing the position of qscale in the relevant struct,
and only copy that half of the struct (where entries are obviously grouped)
where we actually care about the copied value.

See e.g. the copy_fields macro [1] on how we do partial-struct copies.

Ronald

[1]
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
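
The idea can be sketched outside FFmpeg like this (struct and field names below are invented for illustration; the real copy_fields macro copies the span between two marker fields):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only, not FFmpeg's layout: fields the next frame actually
 * needs are grouped before the marker; per-slice scratch state (such as
 * qscale) is moved after it and therefore never copied across threads. */
typedef struct DemoSliceContext {
    int width, height, bit_depth;   /* needed by the next frame */
    int end_of_copied_fields;       /* marker, mirrors copy_fields usage */
    int qscale;                     /* scratch: rewritten every slice */
} DemoSliceContext;

static void copy_needed_fields(DemoSliceContext *dst,
                               const DemoSliceContext *src)
{
    /* Copy only the leading group, up to (not including) the marker. */
    memcpy(dst, src, offsetof(DemoSliceContext, end_of_copied_fields));
}
```

With this layout, a field racing with the worker thread stays out of the copied region entirely, which also silences tsan for that field.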


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Ronald S. Bultje
Hi,

On Fri, Feb 26, 2016 at 5:35 PM, Wan-Teh Chang  wrote:

> The data race is reported on the |qscale| member of MpegEncContext.


sl->qscale is unconditionally assigned in ff_h264_decode_slice_header(),
which runs at the start of each frame. Therefore, qscale is (B), not (A) (it
may be copied, but it's never used, so we don't care) and is free to be
modified after the copy because the copied value is irrelevant.

Now, one could argue that maybe we shouldn't copy the value *at all* if we
don't use it anyway, which would incidentally also make tools like tsan
happy about this particular thingy. I'm not against that, if it doesn't
make the code uglier.

But there is no race here.

Ronald


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Wan-Teh Chang
On Fri, Feb 26, 2016 at 1:17 PM, Ronald S. Bultje  wrote:
>
> I'm happy to help out if you tell me which field/member tsan is complaining
> about.

Hi Ronald,

I am using an old version of ffmpeg. Here is the ThreadSanitizer
warning on the data race and the relevant source code from that ffmpeg
source tree.

The data race is reported on the |qscale| member of MpegEncContext.
(In the current version, the corresponding thing seems to be the
|qscale| member in H264SliceContext.)

I will also try to reproduce the data race with the current version of
ffmpeg. The relevant code in the current version looks similar, but it
is certainly possible that the data race is gone.

Thanks a lot for your help!

Wan-Teh Chang

==

WARNING: ThreadSanitizer: data race (pid=161600)
  Write of size 4 at 0x7f53af71a830 by thread T8 (mutexes: write M16537):
#0 ff_h264_decode_mb_cabac ffmpeg/libavcodec/h264_cabac.c:2336:23
(754b7d528ac43192f37741068b3495f5+0x00673706)
#1 decode_slice ffmpeg/libavcodec/h264.c:3553:23
(754b7d528ac43192f37741068b3495f5+0x00667e37)
#2 execute_decode_slices ffmpeg/libavcodec/h264.c:3709:16
(754b7d528ac43192f37741068b3495f5+0x0066691d)
#3 decode_nal_units ffmpeg/libavcodec/h264.c:4010:17
(754b7d528ac43192f37741068b3495f5+0x00633530)
#4 decode_frame ffmpeg/libavcodec/h264.c:4132:17
(754b7d528ac43192f37741068b3495f5+0x0064a6eb)
#5 frame_worker_thread ffmpeg/libavcodec/pthread.c:389:21
(754b7d528ac43192f37741068b3495f5+0x0098669f)

  Previous read of size 8 at 0x7f53af71a830 by main thread (mutexes:
write M16539):
#0 memcpy 
llvm/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:685:5
(754b7d528ac43192f37741068b3495f5+0x000d6600)
#1 ff_mpeg_update_thread_context
ffmpeg/libavcodec/mpegvideo.c:542:9
(754b7d528ac43192f37741068b3495f5+0x008e08e7)
#2 decode_update_thread_context ffmpeg/libavcodec/h264.c:1178:11
(754b7d528ac43192f37741068b3495f5+0x0064965a)
#3 update_context_from_thread ffmpeg/libavcodec/pthread.c:462:19
(754b7d528ac43192f37741068b3495f5+0x00983a5e)
#4 submit_packet ffmpeg/libavcodec/pthread.c:563:15
(754b7d528ac43192f37741068b3495f5+0x00982e0c)
#5 ff_thread_decode_frame ffmpeg/libavcodec/pthread.c:624
(754b7d528ac43192f37741068b3495f5+0x00982e0c)
#6 avcodec_decode_video2 ffmpeg/libavcodec/utils.c:1565:19
(754b7d528ac43192f37741068b3495f5+0x00a8e401)

Code snippets for the stack trace of the first thread (thread T8):

libavcodec/h264_cabac.c
2316 // decode_cabac_mb_dqp
2317 if(get_cabac_noinline( &h->cabac, &h->cabac_state[60 +
(h->last_qscale_diff != 0)])){
2318 int val = 1;
2319 int ctx= 2;
2320 const int max_qp = 51 + 6*(h->sps.bit_depth_luma-8);
2321
2322 while( get_cabac_noinline( &h->cabac,
&h->cabac_state[60 + ctx] ) ) {
2323 ctx= 3;
2324 val++;
2325 if(val > 2*max_qp){ //prevent infinite loop
2326 av_log(h->s.avctx, AV_LOG_ERROR, "cabac
decode of qscale diff failed at %d %d\n", s->mb_x, s->mb_y);
2327 return -1;
2328 }
2329 }
2330
2331 if( val&0x01 )
2332 val=   (val + 1)>>1 ;
2333 else
2334 val= -((val + 1)>>1);
2335 h->last_qscale_diff = val;
2336 s->qscale += val;
2337 if(((unsigned)s->qscale) > max_qp){
2338 if(s->qscale<0) s->qscale+= max_qp+1;
2339 else s->qscale -= max_qp+1;
2340 }
2341 h->chroma_qp[0] = get_chroma_qp(h, 0, s->qscale);
2342 h->chroma_qp[1] = get_chroma_qp(h, 1, s->qscale);
2343 }else

libavcodec/h264.c (the first two code snippets are in
libavcodec/h264_slice.c in the current version):
3525 static int decode_slice(struct AVCodecContext *avctx, void *arg)
3526 {
3527 H264Context *h = *(void **)arg;
3528 MpegEncContext *const s = &h->s;
3529 const int part_mask = s->partitioned_frame ? (ER_AC_END |
ER_AC_ERROR)
3530 : 0x7F;
3531 int lf_x_start = s->mb_x;
3532
3533 s->mb_skip_run = -1;
3534
3535 h->is_complex = FRAME_MBAFF || s->picture_structure != PICT_FRAME ||
3536 s->codec_id != AV_CODEC_ID_H264 ||
3537 (CONFIG_GRAY && (s->flags & CODEC_FLAG_GRAY));
3538
3539 if (h->pps.cabac) {
3540 /* realign */
3541 align_get_bits(&s->gb);
3542
3543 /* init cabac */
3544 ff_init_cabac_states();
3545 ff_init_cabac_decoder(&h->cabac,
3546   s->gb.buffer + get_bits_count(&s->gb) / 8,
3547   (get_bits_left(&s->gb) + 7) / 8);
3548
3549 ff_h264_init_cabac_states(h);
3550
3551 for (;;) {
3552 // START_TIMER
3553 int ret = 

Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Ronald S. Bultje
Hi,

On Fri, Feb 26, 2016 at 4:12 PM, Wan-Teh Chang  wrote:

> On Fri, Feb 26, 2016 at 12:44 PM, Ronald S. Bultje 
> wrote:
> > Hi,
> >
> > On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje 
> > wrote:
> >
> >>
> >> So doesn't this patch remove all concurrency by making the decode() of
> >> thread N-1 finish before decode() of thread N can start? More
> >> practically, what is the output of time ffmpeg -threads N -i
> >> large-h264-file -f null - with N being 1 and 2 before and after your
> >> patch?
> >>
> >
> > I actually tested this, just for fun:
> >
> > before:
> > $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> > -v 0 -nostats -vframes 500 -; done
> >
> > real 0m6.732s
> > user 0m6.643s
> > sys 0m0.064s
> >
> > real 0m4.124s <<< note how this is significantly smaller than 6.732
> > user 0m6.566s
> > sys 0m0.114s
> >
> > after:
> > $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> > -v 0 -nostats -vframes 500 -; done
> >
> > real 0m6.808s
> > user 0m6.541s
> > sys 0m0.071s
> >
> > real 0m7.001s <<< note how this is no longer significantly smaller than
> > 6.808
> > user 0m6.945s
> > sys 0m0.108s
> >
> > That seems like a pretty major malfunction of MT performance to me...
>
> Hi Ronald,
>
> Thank you very much for reviewing and testing my patch. I will study
> your description of the code, analyze the data race reported by
> ThreadSanitizer again, and report back.
>
> Among the four data races I reported yesterday, this data race is the
> most difficult to analyze because the code that has the data race is
> outside libavcodec/pthread_frame.c, in the h264 decoder, I remember.


I'm happy to help out if you tell me which field/member tsan is complaining
about.

Ronald


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Wan-Teh Chang
On Fri, Feb 26, 2016 at 12:44 PM, Ronald S. Bultje  wrote:
> Hi,
>
> On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje 
> wrote:
>
>>
>> So doesn't this patch remove all concurrency by making the decode() of
>> thread N-1 finish before decode() of thread N can start? More practically,
>> what is the output of time ffmpeg -threads N -i large-h264-file -f null -
>> with N being 1 and 2 before and after your patch?
>>
>
> I actually tested this, just for fun:
>
> before:
> $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> -v 0 -nostats -vframes 500 -; done
>
> real 0m6.732s
> user 0m6.643s
> sys 0m0.064s
>
> real 0m4.124s <<< note how this is significantly smaller than 6.732
> user 0m6.566s
> sys 0m0.114s
>
> after:
> $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> -v 0 -nostats -vframes 500 -; done
>
> real 0m6.808s
> user 0m6.541s
> sys 0m0.071s
>
> real 0m7.001s <<< note how this is no longer significantly smaller than
> 6.808
> user 0m6.945s
> sys 0m0.108s
>
> That seems like a pretty major malfunction of MT performance to me...

Hi Ronald,

Thank you very much for reviewing and testing my patch. I will study
your description of the code, analyze the data race reported by
ThreadSanitizer again, and report back.

Among the four data races I reported yesterday, this data race is the
most difficult to analyze because the code that has the data race is
outside libavcodec/pthread_frame.c, in the h264 decoder, I remember.

Wan-Teh Chang


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Ronald S. Bultje
Hi,

On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje 
wrote:

> Hi,
>
> On Thu, Feb 25, 2016 at 9:32 PM, Wan-Teh Chang <
> wtc-at-google@ffmpeg.org> wrote:
>
>> This is because the codec->decode() call in frame_worker_thread may be
>> modifying that avctx at the same time. This data race is reported by
>> ThreadSanitizer.
>>
>> Although submit_thread holds two locks simultaneously, it always
>> acquires them in the same order because |prev_thread| is always the
>> array element before |p| (with a wraparound at the end of array), and
>> submit_thread is the only function that acquires those two locks, so
>> this won't result in a deadlock.
>> ---
>>  libavcodec/pthread_frame.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
>> index b77dd1e..e7879e3 100644
>> --- a/libavcodec/pthread_frame.c
>> +++ b/libavcodec/pthread_frame.c
>> @@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p,
>> AVPacket *avpkt)
>>  pthread_mutex_unlock(&prev_thread->progress_mutex);
>>  }
>>
>> +pthread_mutex_lock(&prev_thread->mutex);
>>  err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
>> +pthread_mutex_unlock(&prev_thread->mutex);
>>  if (err) {
>>  pthread_mutex_unlock(&p->mutex);
>>  return err;
>> --
>> 2.7.0.rc3.207.g0ac5344
>
>
> Uhm, isn't this the lock that is being held while we call decode()?
>
> static attribute_align_arg void *frame_worker_thread(void *arg)
> {
> [..]
> pthread_mutex_lock(&p->mutex);
> while (1) {
> while (p->state == STATE_INPUT_READY && !fctx->die)
> pthread_cond_wait(&p->input_cond, &p->mutex);
> [..]
> p->result = codec->decode(avctx, p->frame, &p->got_frame, &p->avpkt);
> [..]
> }
> pthread_mutex_unlock(&p->mutex);
> [..]
> }
>
> So doesn't this patch remove all concurrency by making the decode() of
> thread N-1 finish before decode() of thread N can start? More practically,
> what is the output of time ffmpeg -threads N -i large-h264-file -f null -
> with N being 1 and 2 before and after your patch?
>

I actually tested this, just for fun:

before:
$ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
-v 0 -nostats -vframes 500 -; done

real 0m6.732s
user 0m6.643s
sys 0m0.064s

real 0m4.124s <<< note how this is significantly smaller than 6.732
user 0m6.566s
sys 0m0.114s

after:
$ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
-v 0 -nostats -vframes 500 -; done

real 0m6.808s
user 0m6.541s
sys 0m0.071s

real 0m7.001s <<< note how this is no longer significantly smaller than
6.808
user 0m6.945s
sys 0m0.108s

That seems like a pretty major malfunction of MT performance to me...

Ronald


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-26 Thread Ronald S. Bultje
Hi,

On Thu, Feb 25, 2016 at 9:32 PM, Wan-Teh Chang  wrote:

> This is because the codec->decode() call in frame_worker_thread may be
> modifying that avctx at the same time. This data race is reported by
> ThreadSanitizer.
>
> Although submit_thread holds two locks simultaneously, it always
> acquires them in the same order because |prev_thread| is always the
> array element before |p| (with a wraparound at the end of array), and
> submit_thread is the only function that acquires those two locks, so
> this won't result in a deadlock.
> ---
>  libavcodec/pthread_frame.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> index b77dd1e..e7879e3 100644
> --- a/libavcodec/pthread_frame.c
> +++ b/libavcodec/pthread_frame.c
> @@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p, AVPacket
> *avpkt)
>  pthread_mutex_unlock(&prev_thread->progress_mutex);
>  }
>
> +pthread_mutex_lock(&prev_thread->mutex);
>  err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
> +pthread_mutex_unlock(&prev_thread->mutex);
>  if (err) {
>  pthread_mutex_unlock(&p->mutex);
>  return err;
> --
> 2.7.0.rc3.207.g0ac5344


Uhm, isn't this the lock that is being held while we call decode()?

static attribute_align_arg void *frame_worker_thread(void *arg)
{
[..]
pthread_mutex_lock(&p->mutex);
while (1) {
while (p->state == STATE_INPUT_READY && !fctx->die)
pthread_cond_wait(&p->input_cond, &p->mutex);
[..]
p->result = codec->decode(avctx, p->frame, &p->got_frame, &p->avpkt);
[..]
}
pthread_mutex_unlock(&p->mutex);
[..]
}

So doesn't this patch remove all concurrency by making the decode() of
thread N-1 finish before decode() of thread N can start? More practically,
what is the output of time ffmpeg -threads N -i large-h264-file -f null -
with N being 1 and 2 before and after your patch?

The contract between threads is like this: there are two types of fields:
those required for decoding the next frame (A), and those irrelevant for
decoding the next frame (B). When finish_setup() is called, we require all
values of (A) to have reached their final value. We don't care about values
of (B). Then, in the main thread, (A) are copied from prev_thread to this
thread and we continue decoding. We should have continuous race conditions
and real-world decoding failures if this contract were violated. But that's
not the case, frame-MT has been fairly stable over the past years.
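
A toy version of that contract (all names invented, not FFmpeg code): type (A) fields reach their final value before the finish-setup signal; type (B) fields may keep mutating afterwards and must simply never be read by the consuming thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy model of the frame-MT hand-off; names are illustrative. */
typedef struct DemoCtx {
    pthread_mutex_t mutex;
    pthread_cond_t  setup_cond;
    bool setup_finished;
    int  needed;   /* type (A): final once setup_finished is set */
    int  scratch;  /* type (B): changes later, never read across threads */
} DemoCtx;

static void *decoder_thread(void *arg)
{
    DemoCtx *c = arg;
    pthread_mutex_lock(&c->mutex);
    c->needed = 42;            /* (A) reaches its final value... */
    c->setup_finished = true;  /* ...before the "finish_setup" signal */
    pthread_cond_signal(&c->setup_cond);
    pthread_mutex_unlock(&c->mutex);
    for (int i = 0; i < 1000; i++)  /* (B) keeps mutating afterwards */
        c->scratch = i;
    return NULL;
}

static int copy_needed(DemoCtx *c)
{
    pthread_mutex_lock(&c->mutex);
    while (!c->setup_finished)
        pthread_cond_wait(&c->setup_cond, &c->mutex);
    int v = c->needed;         /* only (A) is copied; scratch is ignored */
    pthread_mutex_unlock(&c->mutex);
    return v;
}
```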

So, let's make this real-world: what kind of variables is tsan complaining
about? Are they really written after setup_finished is called and used in
the next decode call? Or are they copied but never used? Or something else?

Ronald


Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-25 Thread James Almer
On 2/25/2016 11:32 PM, Wan-Teh Chang wrote:
> This is because the codec->decode() call in frame_worker_thread may be
> modifying that avctx at the same time. This data race is reported by
> ThreadSanitizer.
> 
> Although submit_thread holds two locks simultaneously, it always
> acquires them in the same order because |prev_thread| is always the
> array element before |p| (with a wraparound at the end of array), and
> submit_thread is the only function that acquires those two locks, so
> this won't result in a deadlock.

Did you try running the FATE suite using ThreadSanitizer, or just decoded
random videos? I remember like half the tests were failing because of data
races and most of them pointed to code changed by your patches. It would be
interesting to see how they are affected.

configure has the --toolchain=gcc-tsan option for this. Make sure to also add
--disable-stripping, then run "make fate THREADS=4 SAMPLES=/path/to/samples" or
similar. You may instead want to run a small set at a time instead of the whole
suite since it's big and ThreadSanitizer is slow. Try for example fate-h264 in
that case.

Thanks for working on this.


[FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.

2016-02-25 Thread Wan-Teh Chang
This is because the codec->decode() call in frame_worker_thread may be
modifying that avctx at the same time. This data race is reported by
ThreadSanitizer.

Although submit_thread holds two locks simultaneously, it always
acquires them in the same order because |prev_thread| is always the
array element before |p| (with a wraparound at the end of array), and
submit_thread is the only function that acquires those two locks, so
this won't result in a deadlock.
---
 libavcodec/pthread_frame.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index b77dd1e..e7879e3 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p, AVPacket 
*avpkt)
 pthread_mutex_unlock(&prev_thread->progress_mutex);
 }
 
+pthread_mutex_lock(&prev_thread->mutex);
 err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
+pthread_mutex_unlock(&prev_thread->mutex);
 if (err) {
 pthread_mutex_unlock(&p->mutex);
 return err;
-- 
2.7.0.rc3.207.g0ac5344
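
The deadlock argument above is the classic fixed-lock-order discipline, which can be sketched in isolation (illustrative code, not FFmpeg's): if every thread that needs both mutexes takes them in the same order, no lock-order cycle, and hence no deadlock, is possible.

```c
#include <pthread.h>

/* Two mutexes standing in for the source and destination per-thread
 * contexts; both workers always acquire source before destination. */
static pthread_mutex_t src_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dst_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;

static void *copier(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&src_mutex);  /* always source first... */
        pthread_mutex_lock(&dst_mutex);  /* ...then destination */
        shared_counter++;                /* stand-in for the context copy */
        pthread_mutex_unlock(&dst_mutex);
        pthread_mutex_unlock(&src_mutex);
    }
    return NULL;
}
```

Note the patch additionally relies on submit_packet being the only place both locks are held at once, since a purely ring-ordered scheme would not be safe on its own.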
