Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Mon, Feb 29, 2016 at 2:47 PM, Wan-Teh Chang wrote:
> On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje wrote:
> >
> > To be a little bit more explicit, it should be relatively easy to
> > accomplish this by changing the position of qscale in the relevant struct,
> > and only copy that half of the struct (where entries are obviously grouped)
> > where we actually care about the copied value.
> >
> > See e.g. the copy_fields macro [1] on how we do partial-struct copies.
> >
> > Ronald
> >
> > [1]
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
>
> Hi Ronald,
>
> Thanks a lot for the hints. As soon as I had access to a computer last
> Saturday, I implemented your suggestion. Although the data race on
> |qscale| is gone, ThreadSanitizer warned about the next data race of
> this kind. I fixed that using the same approach, and then
> ThreadSanitizer warned about the next data race of this kind.
>
> That made me realize the fix for this class of data race is going to be
> big, and I should do this work in the ffmpeg HEAD rather than the old
> version I'm using. I followed the instructions James Almer gave
> earlier in this email thread:
>
> ./configure --samples=fate-suite/ --toolchain=clang-tsan --disable-stripping
> make fate-h264 THREADS=4
>
> (Note: James suggested --toolchain=gcc-tsan. That didn't work for me
> for some reason. I tried --toolchain=clang-tsan instead and it
> worked.)
>
> ThreadSanitizer reported the same class of data races in the h264
> decoder with multiple threads. To understand the code and propose a
> proper fix, I am afraid that I will need to live and breathe this code
> for several days because the structs involved (H264Context,
> H264SliceContext, etc.) have a lot of members.
>
> So, I am wondering if you could run fate-h264 under tsan and take a
> look at the data races in the h264 decoding code. I think you will be
> able to fix the data races much faster than I can. If you don't have a
> lot of time, I'll be happy to implement a solution if you can outline
> it after your investigation. Thanks.

I don't think I care enough about tsan, I'd prefer a tool like Microsoft
Chess that replays concurrency paths. Sorry :(

Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
On Mon, Feb 29, 2016 at 11:47:14AM -0800, Wan-Teh Chang wrote:
> On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje wrote:
> >
> > To be a little bit more explicit, it should be relatively easy to
> > accomplish this by changing the position of qscale in the relevant struct,
> > and only copy that half of the struct (where entries are obviously grouped)
> > where we actually care about the copied value.
> >
> > See e.g. the copy_fields macro [1] on how we do partial-struct copies.
> >
> > Ronald
> >
> > [1]
> > http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
>
> Hi Ronald,
>
> Thanks a lot for the hints. As soon as I had access to a computer last
> Saturday, I implemented your suggestion. Although the data race on
> |qscale| is gone, ThreadSanitizer warned about the next data race of
> this kind. I fixed that using the same approach, and then
> ThreadSanitizer warned about the next data race of this kind.
>
> That made me realize the fix for this class of data race is going to be
> big, and I should do this work in the ffmpeg HEAD rather than the old
> version I'm using. I followed the instructions James Almer gave
> earlier in this email thread:
>
> ./configure --samples=fate-suite/ --toolchain=clang-tsan --disable-stripping
> make fate-h264 THREADS=4
>
> (Note: James suggested --toolchain=gcc-tsan. That didn't work for me
> for some reason. I tried --toolchain=clang-tsan instead and it
> worked.)
>
> ThreadSanitizer reported the same class of data races in the h264
> decoder with multiple threads. To understand the code and propose a
> proper fix, I am afraid that I will need to live and breathe this code
> for several days because the structs involved (H264Context,
> H264SliceContext, etc.) have a lot of members.
>
> So, I am wondering if you could run fate-h264 under tsan and take a
> look at the data races in the h264 decoding code. I think you will be
> able to fix the data races much faster than I can. If you don't have a
> lot of time, I'll be happy to implement a solution if you can outline
> it after your investigation. Thanks.

BTW, thanks a lot for working on this.

You can also play with valgrind (tool helgrind or drd) if it's supported
on your system. You won't need a special build to use it (although for
fate, you'll need something like --target-exec="valgrind --tool=helgrind"
or get the command with make fate-foobar V=1 and prepend valgrind).

Note: according to Ronald it's actually very close to TSAN.

--
Clément B.
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
On Fri, Feb 26, 2016 at 5:04 PM, Ronald S. Bultje wrote:
>
> To be a little bit more explicit, it should be relatively easy to
> accomplish this by changing the position of qscale in the relevant struct,
> and only copy that half of the struct (where entries are obviously grouped)
> where we actually care about the copied value.
>
> See e.g. the copy_fields macro [1] on how we do partial-struct copies.
>
> Ronald
>
> [1]
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427

Hi Ronald,

Thanks a lot for the hints. As soon as I had access to a computer last
Saturday, I implemented your suggestion. Although the data race on
|qscale| is gone, ThreadSanitizer warned about the next data race of
this kind. I fixed that using the same approach, and then
ThreadSanitizer warned about the next data race of this kind.

That made me realize the fix for this class of data race is going to be
big, and I should do this work in the ffmpeg HEAD rather than the old
version I'm using. I followed the instructions James Almer gave
earlier in this email thread:

./configure --samples=fate-suite/ --toolchain=clang-tsan --disable-stripping
make fate-h264 THREADS=4

(Note: James suggested --toolchain=gcc-tsan. That didn't work for me
for some reason. I tried --toolchain=clang-tsan instead and it
worked.)

ThreadSanitizer reported the same class of data races in the h264
decoder with multiple threads. To understand the code and propose a
proper fix, I am afraid that I will need to live and breathe this code
for several days because the structs involved (H264Context,
H264SliceContext, etc.) have a lot of members.

So, I am wondering if you could run fate-h264 under tsan and take a
look at the data races in the h264 decoding code. I think you will be
able to fix the data races much faster than I can. If you don't have a
lot of time, I'll be happy to implement a solution if you can outline
it after your investigation. Thanks.

Wan-Teh Chang
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi James,

On Thu, Feb 25, 2016 at 6:50 PM, James Almer wrote:
>
> Did you try running the FATE suite using ThreadSanitizer, or just decode
> random videos? I remember like half the tests were failing because of
> data races, and most of them pointed to code changed by your patches. It
> would be interesting to see how they are affected.

When I submitted my patch for review, I only decoded random videos (with
an old version of ffmpeg). Later in this email thread, Ronald showed my
patch removed all concurrency, so I'll need to go back to the drawing
board.

> configure has the --toolchain=gcc-tsan option for this. Make sure to also
> add --disable-stripping, then run "make fate THREADS=4
> SAMPLES=/path/to/samples" or similar. You may instead want to run a small
> set at a time instead of the whole suite, since it's big and
> ThreadSanitizer is slow. Try for example fate-h264 in that case.

Thank you very much for the command line. I did this last Saturday on
the ffmpeg HEAD. Although the h264 decoding code has changed
significantly between the old version I'm using and the ffmpeg HEAD, I
am certain these are the same class of data race.

Wan-Teh Chang
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Fri, Feb 26, 2016 at 5:49 PM, Ronald S. Bultje wrote:
> Now, one could argue that maybe we shouldn't copy the value *at all* if we
> don't use it anyway, which would incidentally also make tools like tsan
> happy about this particular thingy. I'm not against that, if it doesn't
> make the code uglier.

To be a little bit more explicit, it should be relatively easy to
accomplish this by changing the position of qscale in the relevant
struct, and only copy that half of the struct (where entries are
obviously grouped) where we actually care about the copied value.

See e.g. the copy_fields macro [1] on how we do partial-struct copies.

Ronald

[1]
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/h264_slice.c;h=9a5bc3f63fb5c6aae3747a03657b8dc821c7d750;hb=HEAD#l427
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Fri, Feb 26, 2016 at 5:35 PM, Wan-Teh Chang wrote:
> The data race is reported on the |qscale| member of MpegEncContext.

sl->qscale is unconditionally assigned in ff_h264_decode_slice_header(),
which runs at the start of each frame. Therefore, qscale is (B), not (A)
(it may be copied, but it's never used, so we don't care) and is free to
be modified after the copy because the copied value is irrelevant.

Now, one could argue that maybe we shouldn't copy the value *at all* if
we don't use it anyway, which would incidentally also make tools like
tsan happy about this particular thingy. I'm not against that, if it
doesn't make the code uglier.

But there is no race here.

Ronald
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
On Fri, Feb 26, 2016 at 1:17 PM, Ronald S. Bultje wrote:
>
> I'm happy to help out if you tell me which field/member tsan is complaining
> about.

Hi Ronald,

I am using an old version of ffmpeg. Here is the ThreadSanitizer warning
on the data race and the relevant source code from that ffmpeg source
tree. The data race is reported on the |qscale| member of MpegEncContext.
(In the current version, the corresponding thing seems to be the |qscale|
member in H264SliceContext.)

I will also try to reproduce the data race with the current version of
ffmpeg. The relevant code in the current version looks similar, but it is
certainly possible that the data race is gone.

Thanks a lot for your help!

Wan-Teh Chang

==
WARNING: ThreadSanitizer: data race (pid=161600)
  Write of size 4 at 0x7f53af71a830 by thread T8 (mutexes: write M16537):
    #0 ff_h264_decode_mb_cabac ffmpeg/libavcodec/h264_cabac.c:2336:23 (754b7d528ac43192f37741068b3495f5+0x00673706)
    #1 decode_slice ffmpeg/libavcodec/h264.c:3553:23 (754b7d528ac43192f37741068b3495f5+0x00667e37)
    #2 execute_decode_slices ffmpeg/libavcodec/h264.c:3709:16 (754b7d528ac43192f37741068b3495f5+0x0066691d)
    #3 decode_nal_units ffmpeg/libavcodec/h264.c:4010:17 (754b7d528ac43192f37741068b3495f5+0x00633530)
    #4 decode_frame ffmpeg/libavcodec/h264.c:4132:17 (754b7d528ac43192f37741068b3495f5+0x0064a6eb)
    #5 frame_worker_thread ffmpeg/libavcodec/pthread.c:389:21 (754b7d528ac43192f37741068b3495f5+0x0098669f)

  Previous read of size 8 at 0x7f53af71a830 by main thread (mutexes: write M16539):
    #0 memcpy llvm/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:685:5 (754b7d528ac43192f37741068b3495f5+0x000d6600)
    #1 ff_mpeg_update_thread_context ffmpeg/libavcodec/mpegvideo.c:542:9 (754b7d528ac43192f37741068b3495f5+0x008e08e7)
    #2 decode_update_thread_context ffmpeg/libavcodec/h264.c:1178:11 (754b7d528ac43192f37741068b3495f5+0x0064965a)
    #3 update_context_from_thread ffmpeg/libavcodec/pthread.c:462:19 (754b7d528ac43192f37741068b3495f5+0x00983a5e)
    #4 submit_packet ffmpeg/libavcodec/pthread.c:563:15 (754b7d528ac43192f37741068b3495f5+0x00982e0c)
    #5 ff_thread_decode_frame ffmpeg/libavcodec/pthread.c:624 (754b7d528ac43192f37741068b3495f5+0x00982e0c)
    #6 avcodec_decode_video2 ffmpeg/libavcodec/utils.c:1565:19 (754b7d528ac43192f37741068b3495f5+0x00a8e401)

Code snippets for the stack trace of the first thread (thread T8):

libavcodec/h264_cabac.c

2316        // decode_cabac_mb_dqp
2317        if(get_cabac_noinline( &h->cabac, &h->cabac_state[60 + (h->last_qscale_diff != 0)])){
2318            int val = 1;
2319            int ctx= 2;
2320            const int max_qp = 51 + 6*(h->sps.bit_depth_luma-8);
2321
2322            while( get_cabac_noinline( &h->cabac, &h->cabac_state[60 + ctx] ) ) {
2323                ctx= 3;
2324                val++;
2325                if(val > 2*max_qp){ //prevent infinite loop
2326                    av_log(h->s.avctx, AV_LOG_ERROR, "cabac decode of qscale diff failed at %d %d\n", s->mb_x, s->mb_y);
2327                    return -1;
2328                }
2329            }
2330
2331            if( val&0x01 )
2332                val=   (val + 1)>>1 ;
2333            else
2334                val= -((val + 1)>>1);
2335            h->last_qscale_diff = val;
2336            s->qscale += val;
2337            if(((unsigned)s->qscale) > max_qp){
2338                if(s->qscale<0) s->qscale+= max_qp+1;
2339                else            s->qscale-= max_qp+1;
2340            }
2341            h->chroma_qp[0] = get_chroma_qp(h, 0, s->qscale);
2342            h->chroma_qp[1] = get_chroma_qp(h, 1, s->qscale);
2343        }else

libavcodec/h264.c (the first two code snippets are in
libavcodec/h264_slice.c in the current version):

3525 static int decode_slice(struct AVCodecContext *avctx, void *arg)
3526 {
3527     H264Context *h = *(void **)arg;
3528     MpegEncContext *const s = &h->s;
3529     const int part_mask = s->partitioned_frame ? (ER_AC_END | ER_AC_ERROR)
3530                                                : 0x7F;
3531     int lf_x_start = s->mb_x;
3532
3533     s->mb_skip_run = -1;
3534
3535     h->is_complex = FRAME_MBAFF || s->picture_structure != PICT_FRAME ||
3536                     s->codec_id != AV_CODEC_ID_H264 ||
3537                     (CONFIG_GRAY && (s->flags & CODEC_FLAG_GRAY));
3538
3539     if (h->pps.cabac) {
3540         /* realign */
3541         align_get_bits(&s->gb);
3542
3543         /* init cabac */
3544         ff_init_cabac_states();
3545         ff_init_cabac_decoder(&h->cabac,
3546                               s->gb.buffer + get_bits_count(&s->gb) / 8,
3547                               (get_bits_left(&s->gb) + 7) / 8);
3548
3549         ff_h264_init_cabac_states(h);
3550
3551         for (;;) {
3552             // START_TIMER
3553             int ret =
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Fri, Feb 26, 2016 at 4:12 PM, Wan-Teh Chang wrote:
> On Fri, Feb 26, 2016 at 12:44 PM, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje wrote:
> >
> >> So doesn't this patch remove all concurrency by making the decode() of
> >> thread N-1 finish before decode() of thread N can start? More practically,
> >> what is the output of time ffmpeg -threads N -i large-h264-file -f null -
> >> with N being 1 and 2 before and after your patch?
> >
> > I actually tested this, just for fun:
> >
> > before:
> > $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> > -v 0 -nostats -vframes 500 -; done
> >
> > real 0m6.732s
> > user 0m6.643s
> > sys 0m0.064s
> >
> > real 0m4.124s <<< note how this is significantly smaller than 6.732
> > user 0m6.566s
> > sys 0m0.114s
> >
> > after:
> > $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> > -v 0 -nostats -vframes 500 -; done
> >
> > real 0m6.808s
> > user 0m6.541s
> > sys 0m0.071s
> >
> > real 0m7.001s <<< note how this is no longer significantly smaller than
> > 6.808
> > user 0m6.945s
> > sys 0m0.108s
> >
> > That seems like a pretty major malfunction of MT performance to me...
>
> Hi Ronald,
>
> Thank you very much for reviewing and testing my patch. I will study
> your description of the code, analyze the data race reported by
> ThreadSanitizer again, and report back.
>
> Among the four data races I reported yesterday, this data race is the
> most difficult to analyze because the code that has the data race is
> outside libavcodec/pthread_frame.c, in the h264 decoder, if I remember
> correctly.

I'm happy to help out if you tell me which field/member tsan is
complaining about.

Ronald
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
On Fri, Feb 26, 2016 at 12:44 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje wrote:
>
>> So doesn't this patch remove all concurrency by making the decode() of
>> thread N-1 finish before decode() of thread N can start? More practically,
>> what is the output of time ffmpeg -threads N -i large-h264-file -f null -
>> with N being 1 and 2 before and after your patch?
>
> I actually tested this, just for fun:
>
> before:
> $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> -v 0 -nostats -vframes 500 -; done
>
> real 0m6.732s
> user 0m6.643s
> sys 0m0.064s
>
> real 0m4.124s <<< note how this is significantly smaller than 6.732
> user 0m6.566s
> sys 0m0.114s
>
> after:
> $ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null
> -v 0 -nostats -vframes 500 -; done
>
> real 0m6.808s
> user 0m6.541s
> sys 0m0.071s
>
> real 0m7.001s <<< note how this is no longer significantly smaller than
> 6.808
> user 0m6.945s
> sys 0m0.108s
>
> That seems like a pretty major malfunction of MT performance to me...

Hi Ronald,

Thank you very much for reviewing and testing my patch. I will study
your description of the code, analyze the data race reported by
ThreadSanitizer again, and report back.

Among the four data races I reported yesterday, this data race is the
most difficult to analyze because the code that has the data race is
outside libavcodec/pthread_frame.c, in the h264 decoder, if I remember
correctly.

Wan-Teh Chang
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Fri, Feb 26, 2016 at 3:26 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Thu, Feb 25, 2016 at 9:32 PM, Wan-Teh Chang <wtc-at-google@ffmpeg.org> wrote:
>
>> This is because the codec->decode() call in frame_worker_thread may be
>> modifying that avctx at the same time. This data race is reported by
>> ThreadSanitizer.
>>
>> Although submit_thread holds two locks simultaneously, it always
>> acquires them in the same order because |prev_thread| is always the
>> array element before |p| (with a wraparound at the end of the array), and
>> submit_thread is the only function that acquires those two locks, so
>> this won't result in a deadlock.
>> ---
>>  libavcodec/pthread_frame.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
>> index b77dd1e..e7879e3 100644
>> --- a/libavcodec/pthread_frame.c
>> +++ b/libavcodec/pthread_frame.c
>> @@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p, AVPacket *avpkt)
>>          pthread_mutex_unlock(&prev_thread->progress_mutex);
>>      }
>>
>> +    pthread_mutex_lock(&prev_thread->mutex);
>>      err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
>> +    pthread_mutex_unlock(&prev_thread->mutex);
>>      if (err) {
>>          pthread_mutex_unlock(&p->mutex);
>>          return err;
>> --
>> 2.7.0.rc3.207.g0ac5344
>
> Uhm, isn't this the lock that is being held while we call decode()?
>
> static attribute_align_arg void *frame_worker_thread(void *arg)
> {
>     [..]
>     pthread_mutex_lock(&p->mutex);
>     while (1) {
>         while (p->state == STATE_INPUT_READY && !fctx->die)
>             pthread_cond_wait(&p->input_cond, &p->mutex);
>         [..]
>         p->result = codec->decode(avctx, p->frame, &p->got_frame, &p->avpkt);
>         [..]
>     }
>     pthread_mutex_unlock(&p->mutex);
>     [..]
> }
>
> So doesn't this patch remove all concurrency by making the decode() of
> thread N-1 finish before decode() of thread N can start? More practically,
> what is the output of time ffmpeg -threads N -i large-h264-file -f null -
> with N being 1 and 2 before and after your patch?

I actually tested this, just for fun:

before:
$ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null -v 0 -nostats -vframes 500 -; done

real 0m6.732s
user 0m6.643s
sys 0m0.064s

real 0m4.124s <<< note how this is significantly smaller than 6.732
user 0m6.566s
sys 0m0.114s

after:
$ for t in 1 2; do time ffmpeg -threads $t -i tos3k-vp9-b5000.webm -f null -v 0 -nostats -vframes 500 -; done

real 0m6.808s
user 0m6.541s
sys 0m0.071s

real 0m7.001s <<< note how this is no longer significantly smaller than 6.808
user 0m6.945s
sys 0m0.108s

That seems like a pretty major malfunction of MT performance to me...

Ronald
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
Hi,

On Thu, Feb 25, 2016 at 9:32 PM, Wan-Teh Chang wrote:
> This is because the codec->decode() call in frame_worker_thread may be
> modifying that avctx at the same time. This data race is reported by
> ThreadSanitizer.
>
> Although submit_thread holds two locks simultaneously, it always
> acquires them in the same order because |prev_thread| is always the
> array element before |p| (with a wraparound at the end of the array), and
> submit_thread is the only function that acquires those two locks, so
> this won't result in a deadlock.
> ---
>  libavcodec/pthread_frame.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
> index b77dd1e..e7879e3 100644
> --- a/libavcodec/pthread_frame.c
> +++ b/libavcodec/pthread_frame.c
> @@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p, AVPacket *avpkt)
>          pthread_mutex_unlock(&prev_thread->progress_mutex);
>      }
>
> +    pthread_mutex_lock(&prev_thread->mutex);
>      err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
> +    pthread_mutex_unlock(&prev_thread->mutex);
>      if (err) {
>          pthread_mutex_unlock(&p->mutex);
>          return err;
> --
> 2.7.0.rc3.207.g0ac5344

Uhm, isn't this the lock that is being held while we call decode()?

static attribute_align_arg void *frame_worker_thread(void *arg)
{
    [..]
    pthread_mutex_lock(&p->mutex);
    while (1) {
        while (p->state == STATE_INPUT_READY && !fctx->die)
            pthread_cond_wait(&p->input_cond, &p->mutex);
        [..]
        p->result = codec->decode(avctx, p->frame, &p->got_frame, &p->avpkt);
        [..]
    }
    pthread_mutex_unlock(&p->mutex);
    [..]
}

So doesn't this patch remove all concurrency by making the decode() of
thread N-1 finish before decode() of thread N can start? More
practically, what is the output of time ffmpeg -threads N -i
large-h264-file -f null - with N being 1 and 2 before and after your
patch?

The contract between threads is like this: there are two types of
fields, those required for decoding the next frame (A), and those
irrelevant for decoding the next frame (B). When finish_setup() is
called, we require all values of (A) to have reached their final value.
We don't care about values of (B). Then, in the main thread, (A) are
copied from prev_thread to this thread and we continue decoding.

We would have continuous race conditions and real-world decoding
failures if this contract were violated. But that's not the case:
frame-MT has been fairly stable over the past years. So, let's make this
real-world: what kind of variables is tsan complaining about? Are they
really written after setup_finished is called and used in the next
decode call? Or are they copied but never used? Or something else?

Ronald
Re: [FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
On 2/25/2016 11:32 PM, Wan-Teh Chang wrote:
> This is because the codec->decode() call in frame_worker_thread may be
> modifying that avctx at the same time. This data race is reported by
> ThreadSanitizer.
>
> Although submit_thread holds two locks simultaneously, it always
> acquires them in the same order because |prev_thread| is always the
> array element before |p| (with a wraparound at the end of the array), and
> submit_thread is the only function that acquires those two locks, so
> this won't result in a deadlock.

Did you try running the FATE suite using ThreadSanitizer, or just decode
random videos? I remember like half the tests were failing because of
data races, and most of them pointed to code changed by your patches. It
would be interesting to see how they are affected.

configure has the --toolchain=gcc-tsan option for this. Make sure to also
add --disable-stripping, then run "make fate THREADS=4
SAMPLES=/path/to/samples" or similar. You may instead want to run a small
set at a time instead of the whole suite, since it's big and
ThreadSanitizer is slow. Try for example fate-h264 in that case.

Thanks for working on this.
[FFmpeg-devel] [PATCH] submit_thread should also lock the mutex that guards the source avctx (prev_thread->avctx) when calling update_context_from_thread.
This is because the codec->decode() call in frame_worker_thread may be
modifying that avctx at the same time. This data race is reported by
ThreadSanitizer.

Although submit_thread holds two locks simultaneously, it always
acquires them in the same order because |prev_thread| is always the
array element before |p| (with a wraparound at the end of the array), and
submit_thread is the only function that acquires those two locks, so
this won't result in a deadlock.
---
 libavcodec/pthread_frame.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index b77dd1e..e7879e3 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -330,7 +330,9 @@ static int submit_packet(PerThreadContext *p, AVPacket *avpkt)
         pthread_mutex_unlock(&prev_thread->progress_mutex);
     }

+    pthread_mutex_lock(&prev_thread->mutex);
     err = update_context_from_thread(p->avctx, prev_thread->avctx, 0);
+    pthread_mutex_unlock(&prev_thread->mutex);
     if (err) {
         pthread_mutex_unlock(&p->mutex);
         return err;
--
2.7.0.rc3.207.g0ac5344