Thanks. This is all really helpful. I need to spend some time digesting all of this information, reviewing the OpenSSL code and trying to figure out what might be going on.
One of the things I am trying to do is to understand why the proposed patch fixes the issue and whether an alternative approach is possible. Ideally, I'd like to fix this entirely in tc-native so we don't have to cross the Java-native boundary an additional time, which is relatively slow.

Thanks for all your help tracking this down so far.

Mark

On 25/06/2016 00:51, therealnewo...@gmail.com wrote:
> On Fri, Jun 24, 2016 at 5:31 PM, Mark Thomas <ma...@apache.org> wrote:
>> On 24/06/2016 21:52, therealnewo...@gmail.com wrote:
>>
>>>> I'm wondering if this is a problem that builds up over time. If I
>>>> understood your previous posts correctly, running the big tests
>>>> immediately gave ~700MB/s, whereas running the small tests and then
>>>> the big tests resulted in ~350MB/s during the big tests. Are you able
>>>> to experiment with this a little bit? For example, if you do big
>>>> tests, 1M (~20%) small tests, big tests, 1M small tests, big tests,
>>>> etc., what is the data rate for the big tests after 0, 1M, 2M, 3M, 4M
>>>> and 5M little tests?
>>>
>>> Sure, I can try that. For the in-between tests, do you want me to run
>>> those for a set amount of time or a set number of files? Like each
>>> smaller batch for 20 min, then 10 min of large, then the next smaller
>>> size?
>>
>> I was thinking a set number of files.
>>
>> It would also be useful to know how many threads the executor has
>> created at each point. (JMX should tell you that. You might need to
>> adjust the executor so it doesn't stop idle threads.)
>
> I saw your message about not stopping idle threads after I had already
> started things.
>
> 1st 100M test: 851.348MB/s
>   Executor: largestPoolSize: 25, poolSize: 25
> 1st 4k test:
>   Executor: largestPoolSize: 401, poolSize: 401
> 2nd 100M test: 460.147MB/s
>   Executor: largestPoolSize: 401, poolSize: 79
> 2nd 4k test:
>   Executor: largestPoolSize: 414, poolSize: 414
> 3rd 100M test: 429.127MB/s
>   Executor: largestPoolSize: 414, poolSize: 80
> 3rd 4k test:
>   Executor: largestPoolSize: 414, poolSize: 401
> 4th 100M test: 437.918MB/s
>   Executor: largestPoolSize: 414, poolSize: 86
> 4th 4k test:
>   Executor: largestPoolSize: 414, poolSize: 401
> 5th 100M test: 464.837MB/s
>   Executor: largestPoolSize: 414, poolSize: 87
>
> It looks like the problem occurs right after the first set of 4k puts
> and doesn't get any worse, so whatever causes the issue happens early.
> This is getting stranger, and I really cannot explain why calling
> ERR_remove_thread_state reliably improves performance.
>
>> Going back to your original description, you said you saw blocking
>> during the call to ERR_clear_err(). Did you mean ERR_clear_error()?
>> Either way, could you provide the full stack trace of an example
>> blocked thread? And, ideally, the stack trace of the thread currently
>> holding the lock? I'm still trying to understand what is going on here
>> since, based on my understanding of the code so far, the HashMap is
>> bounded (to the number of threads) and should reach that limit fairly
>> quickly.
>
> Sorry, yes, I did mean ERR_clear_error().
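[Editor's note: the largestPoolSize/poolSize figures above are the standard
java.util.concurrent.ThreadPoolExecutor statistics that the Catalina Executor
exposes over JMX. A minimal, self-contained sketch of how those two numbers
behave (plain JDK, not Tomcat; pool sizes and task counts here are made up
for illustration):]

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolStats {
    public static void main(String[] args) throws Exception {
        // Hypothetical pool roughly shaped like a Tomcat Executor:
        // 25 core threads, growing on demand when all threads are busy.
        ThreadPoolExecutor exec = new ThreadPoolExecutor(
                25, 500, 60, TimeUnit.SECONDS, new SynchronousQueue<>());

        // 50 concurrent short tasks force the pool past its core size,
        // the same way the 4k-file runs above pushed it from 25 to 401.
        for (int i = 0; i < 50; i++) {
            exec.submit(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);

        // getLargestPoolSize() is the high-water mark and never shrinks;
        // getPoolSize() drops again as idle threads time out or the pool
        // shuts down -- matching the 414 vs. 80-ish pairs reported above.
        System.out.println("largestPoolSize: " + exec.getLargestPoolSize());
        System.out.println("poolSize: " + exec.getPoolSize());
    }
}
```

This is why largestPoolSize stays pinned at 414 in the later runs while
poolSize falls back to ~80 once the big-file tests need fewer threads.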
>
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f0f61ab989d in __GI___pthread_mutex_lock (mutex=0x2b49c58)
>     at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f0f3205f183 in int_thread_get (create=0) at err.c:446
> #3  0x00007f0f3205f68d in int_thread_get_item (d=0x7f0ca89c7ce0) at err.c:491
> #4  0x00007f0f32060094 in ERR_get_state () at err.c:1014
> #5  0x00007f0f320602cf in ERR_clear_error () at err.c:747
> #6  0x00007f0f325f3579 in ssl_socket_recv (sock=0x7f0dcc391980,
>     buf=0x7f0eec067820
>     "lock->199808-Source_filename->rhino_perf_https_lt_100g_a-Loop->1-Count->11089487-11089488-11089489-11089490-11089491-11089492-11089493-11089494-11089495-11089496-11089497-11089498-11089499-11089500-11"...,
>     len=0x7f0ca89c7ff0) at src/sslnetwork.c:401
> #7  0x00007f0f325ece99 in Java_org_apache_tomcat_jni_Socket_recvbb
>     (e=<optimized out>, o=<optimized out>, sock=<optimized out>,
>     offset=<optimized out>, len=<optimized out>) at src/network.c:957
>
> I tried getting more data, but the JVM tends to dump core when gdb is
> attached, or it runs too slowly to actually cause the lock contention.
> I can reliably see a thread waiting on this lock if I attach to a
> single thread, randomly interrupt it, and look at the backtrace. When
> I look at the mutex it has a different owner each time, so different
> threads are getting the lock.
>
> I will play with this a bit more on Monday.
>
> -nate
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
> ---------------------------------------------------------------------