On Mon, Apr 19 2021, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
> On Fri, Apr 09 2021, Mark Kettenis <mark.kette...@xs4all.nl> wrote:
>>> Date: Fri, 9 Apr 2021 07:09:06 +1000
>>> From: Jonathan Matthew <jonat...@d14n.org>
>>> 
>>> On Thu, Apr 08, 2021 at 10:24:06AM +0200, Martin Pieuchot wrote:
>>> > firefox often crash when somebody else connects to the jitsi I'm in.
>>> > The trace looks like a stack exhaustion, see below. 
>>> > 
>>> > Does this ring a bell?
>>> > 
>>> > #530 <signal handler called>
>>> > #531 pthread_setschedparam (thread=0x0, policy=1, param=0x2ade5ca1df0)
>>> >     at /usr/src/lib/librthread/rthread_sched.c:56
>>> > #532 0x000002ae65f9f016 in 
>>> > rtc::PlatformThread::SetPriority(rtc::ThreadPriority) () from 
>>> > /usr/local/lib/firefox/libxul.so.101.0
>>> > #533 0x000002ae65f9ed75 in rtc::PlatformThread::Run() ()
>>> >    from /usr/local/lib/firefox/libxul.so.101.0
>>> > #534 0x000002ae65f9ead9 in rtc::PlatformThread::StartThread(void*) ()
>>> >    from /usr/local/lib/firefox/libxul.so.101.0
>>> > #535 0x000002ade9d4df51 in _rthread_start (v=<optimized out>)
>>> >     at /usr/src/lib/librthread/rthread.c:96
>>> > #536 0x000002ad9c2ec3da in __tfork_thread ()
>>> >     at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
>>> 
>>> This looks like, at least with our pthreads implementation, libwebrtc has a
>>> race between the newly created thread and the thread creating it.
>>> 
>>> The creating thread stores the thread handle in thread_ here:
>>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L186
>>> 
>>> and the new thread uses it here:
>>> https://github.com/mozilla/gecko-dev/blob/master/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#L363
>>> which is called as almost the first thing the new thread does.
>>> 
>>> Our pthread_create() only stores the pthread handle into the supplied
>>> address after __tfork_thread() returns, by which time the new thread
>>> could already be running.
>>
>> Right.  POSIX isn't very explicit about this, but it does state that
>> the thread ID is stored upon successful completion.  So I don't think
>> portable code can rely on that until pthread_create() has returned.
>>
>> Using pthread_self() in the pthread_setschedparam() call will probably
>> help.  But if PlatformThread::GetThreadRef() gets called early on in
>> the thread, there will still be a race.  Setting thread_ at the start of
>> PlatformThread::Run() would be better.
>
> I didn't have much time to read the firefox code and rebuild it, but
> fwiw the librthread diff below fixes the issue for me (diff implemented
> following jmatthew's analysis).
>
> Test setup: 2 firefox clients in a jitsi conference + another client
> connecting in a loop.
> Test results: 8/9 crashes with the unpatched librthread, 0 with the
> patched librthread.

Missing data: that was with the third client connection 50 times to the
conference.

> The benefit of being lenient here is that we don't have to fix other
> software that could depend on this type of race to work properly.
> And of course there's no need to rebuild firefox, which I heard is
> almost a valid technical argument these days. :)
>
> But I agree that the "proper" fix would be in the firefox codebase, and
> Mark's proposal makes sense to me even without reading the code.
>
> Not asking for oks.  Which way do you folks prefer to go?
>
> Index: rthread.c
> ===================================================================
> RCS file: /d/cvs/src/lib/librthread/rthread.c,v
> retrieving revision 1.99
> diff -u -p -p -u -r1.99 rthread.c
> --- rthread.c 4 Nov 2017 22:53:57 -0000       1.99
> +++ rthread.c 19 Apr 2021 11:53:39 -0000
> @@ -391,15 +391,15 @@ pthread_create(pthread_t *threadp, const
>       LIST_INSERT_HEAD(&_thread_list, thread, threads);
>       _spinunlock(&_thread_lock);
>  
> +     *threadp = thread;
>       /* we're going to be multi-threaded real soon now */
>       __isthreaded = 1;
>       rc = __tfork_thread(&param, sizeof(param), _rthread_start, thread);
>       if (rc != -1) {
>               /* success */
> -             *threadp = thread;
>               return (0);
>       }
> -             
> +
>       rc = errno;
>  
>       _spinlock(&_thread_lock);
> @@ -408,6 +408,11 @@ pthread_create(pthread_t *threadp, const
>       _rthread_free_stack(thread->stack);
>  fail1:
>       _dl_free_tib(tib, sizeof(*thread));
> +
> +     /*
> +      * XXX let *threadp point to garbage? Should we reset it to NULL
> +      * or its previous value ?
> +      */
>  
>       return (rc);
>  }

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE

Reply via email to