Thanks. This is all really helpful. I need to spend some time digesting
all of this information, reviewing the OpenSSL code and trying to figure
out what might be going on.

One of the things I am trying to do is to understand why the proposed
patch fixes the issue and whether an alternative approach is possible.
Ideally, I'd like to fix this entirely in tc-native so we don't have to
cross the Java-native boundary an additional time, which is relatively slow.
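
One native-only shape such a fix might take (purely a sketch on my part
at this point, not the proposed patch, and all names below are
placeholders) would be for tc-native to register a per-thread destructor
so each worker drops its own OpenSSL error state when it exits, without
any extra JNI crossing:

#include <pthread.h>
#include <openssl/err.h>

/* Hypothetical sketch, not the proposed patch: a pthread key whose
 * destructor runs at thread exit and drops that thread's ERR_STATE
 * entry from OpenSSL's global error-state hash (OpenSSL 1.0.x API). */
static pthread_key_t err_state_key;
static pthread_once_t err_state_once = PTHREAD_ONCE_INIT;

static void err_state_cleanup(void *unused)
{
    (void)unused;
    ERR_remove_thread_state(NULL);   /* NULL = the calling thread */
}

static void err_state_key_create(void)
{
    pthread_key_create(&err_state_key, err_state_cleanup);
}

/* Each SSL worker thread would call this once; the stored value only
 * needs to be non-NULL so that the destructor fires on thread exit. */
static void err_state_register_thread(void)
{
    pthread_once(&err_state_once, err_state_key_create);
    pthread_setspecific(err_state_key, (void *)1);
}

Whether that fits the APR/executor thread lifecycle is exactly the sort
of thing I still need to check against the tc-native code.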

Thanks for all your help tracking this down so far.

Mark

On 25/06/2016 00:51, therealnewo...@gmail.com wrote:
> On Fri, Jun 24, 2016 at 5:31 PM, Mark Thomas <ma...@apache.org> wrote:
>> On 24/06/2016 21:52, therealnewo...@gmail.com wrote:
>>
>>>> I'm wondering if this is a problem that builds up over time. If I
>>>> understood your previous posts correctly, running the big tests
>>>> immediately gave ~700MB/s whereas running the small tests then the big
>>>> tests resulted in ~350MB/s during the big tests. Are you able to
>>>> experiment with this a little bit? For example, if you do big tests, 1M
>>>> (~20%) small tests, big tests, 1M small tests, big tests, etc., what is
>>>> the data rate for the big tests after 0, 1M, 2M, 3M, 4M and 5M little
>>>> tests?
>>>
>>> Sure, I can try that. For the in-between tests, do you want me to run
>>> those for a set amount of time or for a set number of files? E.g. each
>>> smaller batch for 20 min, then 10 min of large, then the next smaller size?
>>
>> I was thinking a set number of files.
>>
>> It would also be useful to know how many threads the executor has created
>> at each point. (JMX should tell you that. You might need to adjust the
>> executor so it doesn't stop idle threads.)
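
To be concrete about the executor tweak for any future runs, I had in
mind something like the following in server.xml; the names and sizes are
just placeholders, the point is a large maxIdleTime so the pool doesn't
shrink between batches:

    <Executor name="tomcatThreadPool" namePrefix="exec-"
              maxThreads="500" minSpareThreads="25"
              maxIdleTime="3600000"/>

with the Connector's executor attribute pointing at that pool.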
> 
> I saw your message about not stopping idle threads after I already
> started things.
> 
> 1st 100M test:
>     851.348MB/s
>     Executor:
>         largestPoolSize: 25
>         poolSize: 25
> 1st 4k test:
>     Executor:
>         largestPoolSize: 401
>         poolSize: 401
> 2nd 100M test:
>     460.147MB/s
>     Executor:
>         largestPoolSize: 401
>         poolSize: 79
> 2nd 4k test:
>     Executor:
>         largestPoolSize: 414
>         poolSize: 414
> 3rd 100M test:
>     429.127MB/s
>     Executor:
>         largestPoolSize: 414
>         poolSize: 80
> 3rd 4k test:
>     Executor:
>         largestPoolSize: 414
>         poolSize: 401
> 4th 100M test:
>     437.918MB/s
>     Executor:
>         largestPoolSize: 414
>         poolSize: 86
> 4th 4k test:
>     Executor:
>         largestPoolSize: 414
>         poolSize: 401
> 5th 100M test:
>     464.837MB/s
>     Executor:
>         largestPoolSize: 414
>         poolSize: 87
> 
> It looks like the problem occurs right after the first set of 4k puts
> and doesn't get any worse, so whatever causes the issue happens early.
> This is getting stranger, and I really cannot explain why calling
> ERR_remove_thread_state reliably improves performance.
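
For what it's worth, my (so far untested) reading of the OpenSSL 1.0.x
code matches the backtrace below: every ERR_clear_error() call looks up
the calling thread's ERR_STATE in one process-global hash under a single
lock. Here is a rough standalone sketch of mine, purely illustrative and
not part of any patch, that exercises the same path from many threads at
once:

/* Purely illustrative sketch: many threads calling ERR_clear_error()
 * all funnel through the same global lock that guards OpenSSL 1.0.x's
 * per-thread error-state hash, i.e. the lock the backtrace below shows
 * ssl_socket_recv() waiting on.
 * Build with: cc demo.c -lcrypto -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>
#include <openssl/err.h>

#define NTHREADS 64
#define ITERS    1000000L

static pthread_mutex_t *locks;

/* Standard OpenSSL 1.0.x locking callback (tc-native registers one too) */
static void locking_cb(int mode, int n, const char *file, int line)
{
    (void)file; (void)line;
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&locks[n]);
    else
        pthread_mutex_unlock(&locks[n]);
}

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        ERR_clear_error();            /* same call ssl_socket_recv() makes */
    ERR_remove_thread_state(NULL);    /* drop this thread's ERR_STATE entry */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    int i;

    locks = malloc(CRYPTO_num_locks() * sizeof(*locks));
    for (i = 0; i < CRYPTO_num_locks(); i++)
        pthread_mutex_init(&locks[i], NULL);
    CRYPTO_set_locking_callback(locking_cb);

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    puts("done");
    return 0;
}

That reproduces the locking, but not why removing entries makes it
cheaper, which is exactly the part I still need to understand.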
> 
>> Going back to your original description, you said you saw blocking
>> during the call to ERR_clear_err(). Did you mean ERR_clear_error()?
>> Either way, could you provide the full stack trace of an example blocked
>> thread? And, ideally, the stack trace of the thread currently holding
>> the lock? I'm still trying to understand what is going on here since,
>> based on my understanding of the code so far, the HashMap is bounded (to
>> the number of threads) and should reach that limit fairly quickly.
> 
> Sorry, yes I did mean ERR_clear_error().
> 
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f0f61ab989d in __GI___pthread_mutex_lock (mutex=0x2b49c58)
> at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f0f3205f183 in int_thread_get (create=0) at err.c:446
> #3  0x00007f0f3205f68d in int_thread_get_item (d=0x7f0ca89c7ce0) at err.c:491
> #4  0x00007f0f32060094 in ERR_get_state () at err.c:1014
> #5  0x00007f0f320602cf in ERR_clear_error () at err.c:747
> #6  0x00007f0f325f3579 in ssl_socket_recv (sock=0x7f0dcc391980,
>     buf=0x7f0eec067820
> "lock->199808-Source_filename->rhino_perf_https_lt_100g_a-Loop->1-Count->11089487-11089488-11089489-11089490-11089491-11089492-11089493-11089494-11089495-11089496-11089497-11089498-11089499-11089500-11"...,
>     len=0x7f0ca89c7ff0) at src/sslnetwork.c:401
> #7  0x00007f0f325ece99 in Java_org_apache_tomcat_jni_Socket_recvbb
> (e=<optimized out>, o=<optimized out>, sock=<optimized out>,
> offset=<optimized out>, len=<optimized out>) at src/network.c:957
> 
> I tried getting more data, but the JVM tends to dump core when gdb is
> attached, or runs too slowly to actually cause the lock contention. I
> can reliably see a thread waiting on this lock if I attach to a single
> thread, randomly interrupt it and look at the backtrace. When I
> look at the mutex it has a different owner each time, so different
> threads are getting the lock.
> 
> I will play with this a bit more on Monday.
> 
> -nate
> 

