On Mon, Apr 19 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote: > On Fri, Apr 09 2021, Mark Kettenis <mark.kette...@xs4all.nl> wrote: >>> Date: Fri, 9 Apr 2021 07:09:06 +1000 >>> From: Jonathan Matthew <jonat...@d14n.org> >>> >>> On Thu, Apr 08, 2021 at 10:24:06AM +0200, Martin Pieuchot wrote: >>> > firefox often crash when somebody else connects to the jitsi I'm in. >>> > The trace looks like a stack exhaustion, see below. >>> > >>> > Does this ring a bell? >>> > >>> > #530 <signal handler called> >>> > #531 pthread_setschedparam (thread=0x0, policy=1, param=0x2ade5ca1df0) >>> > at /usr/src/lib/librthread/rthread_sched.c:56 >>> > #532 0x000002ae65f9f016 in >>> > rtc::PlatformThread::SetPriority(rtc::ThreadPriority) () from >>> > /usr/local/lib/firefox/libxul.so.101.0 >>> > #533 0x000002ae65f9ed75 in rtc::PlatformThread::Run() () >>> > from /usr/local/lib/firefox/libxul.so.101.0 >>> > #534 0x000002ae65f9ead9 in rtc::PlatformThread::StartThread(void*) () >>> > from /usr/local/lib/firefox/libxul.so.101.0 >>> > #535 0x000002ade9d4df51 in _rthread_start (v=<optimized out>) >>> > at /usr/src/lib/librthread/rthread.c:96 >>> > #536 0x000002ad9c2ec3da in __tfork_thread () >>> > at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 >>> >>> This looks like, at least with our pthreads implementation, libwebrtc has a >>> race between the newly created thread and the thread creating it. >>> >>> The creating thread stores the thread handle in thread_ here: >>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L186 >>> >>> and the new thread uses it here: >>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L363 >>> which is called as almost the first thing the new thread does. >>> >>> Our pthread_create() only stores the pthread handle into the supplied >>> address after __tfork_thread() returns, by which time the new thread >>> could already be running. >> >> Right. POSIX isn't very explicit about this, but it does state that >> the thread ID is stored upon successful completion. So I don't think >> portable code can rely on that until pthread_create() has returned. >> >> Using pthread_self() in the pthread_setschedparam() call will probably >> help. But if PlatformThread::GetThreadRef() gets called early on in >> the thread, there will still be a race. Setting thread_ at the start of >> PlatformThread::Run() would be better. > > I didn't have much time to read the firefox code and rebuild it, but > fwiw the librthread diff below fixes the issue for me (diff implemented > following jmatthew's analysis). > > Test setup: 2 firefox clients in a jitsi conference + another client > connecting in a loop. > Test results: 8/9 crashes with the unpatched librthread, 0 with the > patched librthread.
Missing data: that was with the third client connection 50 times to the conference. > The benefit of being lenient here is that we don't have to fix other > software that could depend on this type of race to work properly. > And of course there's no need to rebuild firefox, which I heard is > almost a valid technical argument these days. :) > > But I agree that the "proper" fix would be in the firefox codebase, and > Mark's proposal makes sense to me even without reading the code. > > Not asking for oks. Which way do you folks prefer to go? > > Index: rthread.c > =================================================================== > RCS file: /d/cvs/src/lib/librthread/rthread.c,v > retrieving revision 1.99 > diff -u -p -p -u -r1.99 rthread.c > --- rthread.c 4 Nov 2017 22:53:57 -0000 1.99 > +++ rthread.c 19 Apr 2021 11:53:39 -0000 > @@ -391,15 +391,15 @@ pthread_create(pthread_t *threadp, const > LIST_INSERT_HEAD(&_thread_list, thread, threads); > _spinunlock(&_thread_lock); > > + *threadp = thread; > /* we're going to be multi-threaded real soon now */ > __isthreaded = 1; > rc = __tfork_thread(¶m, sizeof(param), _rthread_start, thread); > if (rc != -1) { > /* success */ > - *threadp = thread; > return (0); > } > - > + > rc = errno; > > _spinlock(&_thread_lock); > @@ -408,6 +408,11 @@ pthread_create(pthread_t *threadp, const > _rthread_free_stack(thread->stack); > fail1: > _dl_free_tib(tib, sizeof(*thread)); > + > + /* > + * XXX let *threadp point to garbage? Should we reset it to NULL > + * or its previous value ? > + */ > > return (rc); > } -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE