On 2/5/2017 20:57, Stefan wrote:
> On 2/4/2017 12:41, Stefan Fuhrmann wrote:
>> On 31.01.2017 10:09, Stefan wrote:
>>> Hi,
>>>
>>> I've been looking at the cause of a deadlock when running ra-test.exe
>>> with -fs-type=fsx (trunk version).
>>>
>>> The most important findings are summed up here atm [1].
>>>
>>> The issue was discussed with brane and danielsh on IRC (thanks for your
>>> time, once again).
>>>
>>> As far as my current understanding of the problem goes: the deadlock is
>>> caused by the fact that the apr_terminate() function registered in
>>> svn_cmdline_init() via the atexit-call is called after the termination
>>> of the threads which were created as part of the calls to
>>> apr_thread_pool_push() in svn_fs_x__batch_fsync_run().
>>>
>>> This means that apr's thread counter (thd_cnt) is getting out of sync
>>> (since the apr-function thread_pool_func() is not executed) and then
>>> gets stuck in thread_pool_cleanup() (waiting for the already terminated
>>> threads to be terminated).
>>>
>>> To me it looks like svnserve's main-function already contains a
>>> safeguard against a corresponding issue, and calls
>>> apr_thread_pool_destroy(threads) (or was this a completely different
>>> scenario?). This however does not cover the threads created from
>>> svn_fs_x__batch_fsync_run().
>>>
>>> Talking to danielsh and brane it became apparent to me that the issue
>>> might not be too obvious (in the end it might still be an issue with how
>>> I build SVN, which therefore causes the atexit-registered apr_terminate()
>>> function to be called too late). It's also not fully clear to me at
>>> which exact point (in regards to registered atexit()-calls) threads of
>>> the process are terminated if the process itself terminates. If indeed
>>> atexit()-registered functions get called after the threads are forcibly
>>> terminated (which is what it looks like to me atm) it might contradict
>>> the C(89/99) standard - see [2] 7.20.4.2/7.20.4.3. On the other hand,
>>> this thread on stackoverflow [3] suggests it's simply undefined (by the
>>> standard) what comes first.
>>>
>>> As danielsh suggested, I'm planning to come up with a plain minimal
>>> repro app only based on APR demonstrating the problem, so as to make it
>>> more obvious (and double check for myself) what the issue is about.
>>>
>>> Regards,
>>> Stefan
>>>
>>> [1] http://www.luke1410.de:8090/browse/MAXSVN-94
>>> [2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
>>> [3]
>>> https://stackoverflow.com/questions/39655868/what-does-the-posix-standard-say-about-thread-stacks-in-atexit-handlers-what
>>>
>> Hi Stefan,
>>
>> I had a look at the code and found a possibly related problem.
>> If you are using DLLs, this might have affected you.
>>
>> It would be nice if you could try r1781657 and see whether it
>> makes any difference in your case.
>>
>> -- Stefan^2.
> Hi Stefan^2,
>
> I tested trunk r1781790 which also includes your follow-up commit
> (r1781726). With that one the ra-test.exe test which previously
> deadlocked passes now. However, test 60 (basic_test.py) deadlocks now
> (svnmucc.exe seems to be the process which is being tested here).
>
> I'm planning to publish details of the underlying issue, which I think
> has now been traced down to the actual root cause, in a blog post most
> likely tomorrow. That should explain the actual issue in full detail then.
>
Details are published now here: http://www.luke1410.de/blog/?p=95
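
For readers skimming the thread, the counter problem described in the quoted mail can be sketched as a small single-threaded model. This is not APR code and it is not the planned repro app: pool_model, worker_spawned(), worker_returned(), worker_killed(), cleanup_completes() and run_scenario() are hypothetical names, and the only thing modelled is the assumption that a thread which is terminated from outside never gets to decrement thd_cnt.

```c
#include <assert.h>

/* Hypothetical stand-in for the thread pool's bookkeeping.
 * Only the thread counter is modelled; this is NOT APR's real struct. */
typedef struct {
    int thd_cnt;   /* threads the pool still believes to be alive */
} pool_model;

/* A worker is spawned: the pool counts it. */
void worker_spawned(pool_model *pool)
{
    pool->thd_cnt++;
}

/* A worker returns normally: its own exit path decrements the counter. */
void worker_returned(pool_model *pool)
{
    assert(pool->thd_cnt > 0);
    pool->thd_cnt--;
}

/* A worker is terminated by the runtime: none of the worker's code
 * runs any more, so the counter is deliberately left untouched. */
void worker_killed(pool_model *pool)
{
    (void)pool;
}

/* Cleanup waits for the counter to reach zero. A real wait would block
 * forever; max_polls bounds it here so that the sketch terminates.
 * Returns 1 if cleanup completes, 0 if it would still be waiting. */
int cleanup_completes(const pool_model *pool, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (pool->thd_cnt == 0)
            return 1;
        /* All workers are gone, so nothing can change thd_cnt now. */
    }
    return 0;
}

/* Spawn `workers` threads, let `killed` of them be terminated from
 * outside and the rest return normally, then run the cleanup. */
int run_scenario(int workers, int killed)
{
    pool_model pool = { 0 };
    assert(killed >= 0 && killed <= workers);
    for (int i = 0; i < workers; i++)
        worker_spawned(&pool);
    for (int i = 0; i < killed; i++)
        worker_killed(&pool);
    for (int i = killed; i < workers; i++)
        worker_returned(&pool);
    return cleanup_completes(&pool, 1000);
}
```

In this model run_scenario(3, 0) completes, while run_scenario(3, 1) never sees the counter reach zero, which corresponds to the cleanup waiting for threads that are already gone.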
Regards, Stefan