Re: Bug that spans tomcat and tomcat-native

Mark Thomas Tue, 28 Jun 2016 02:15:47 -0700

On 27/06/2016 23:54, Nate Clark wrote:
> On Mon, Jun 27, 2016 at 5:19 PM, Mark Thomas <[email protected]> wrote:
>> On 27/06/2016 19:59, Nate Clark wrote:
>>> On Mon, Jun 27, 2016 at 2:13 PM, Rainer Jung <[email protected]> 
>>> wrote:
>>
>> <snip/>
>>
>>>> I wouldn't care too much about portability right now. If you can prove
>>>> Mark's analysis by using gettid() on Linux and checking that this fixes the
>>>> performance issue, that would very useful feedback. If it doesn't help, we
>>>> can go down the path of using the hash statistics to get more info about
>>>> what's happening. If it helps, we can think further about how to reduce 
>>>> hash
>>>> collisions fo every platform.
>>
>> I went down the hash statistics route any way. The short version is with
>> 11 threads and 8 buckets every thread ends up in the same bucket. This
>> was on OSX. I'm pretty sure Linux will be the same.
>>
>>> I can try that, but unfortunately will not be able to run the test
>>> right away. The machine I was using is a shared resource, which
>>> somebody else needs for the next few days.
>>
>> Next up is trying to figure out a patch to fix the bucket distribution.


Here is my proposed fix for OSX:

Index: src/ssl.c
===================================================================
--- src/ssl.c   (revision 1750259)
+++ src/ssl.c   (working copy)
@@ -420,6 +420,10 @@
     return psaptr->PSATOLD;
 #elif defined(WIN32)
     return (unsigned long)GetCurrentThreadId();
+#elif defined(DARWIN)
+    uint64_t tid;
+    pthread_threadid_np(NULL, &tid);
+    return (unsigned long)tid;
 #else
     return (unsigned long)(apr_os_thread_current());
 #endif


I want to do some similar testing for Linux before adding what I suspect
will be a very similar block using gettid().

>>> Even if that does help the performance and makes it so the hash map
>>> has better distribution. It won't help with long term bloat. If a
>>> system is running long enough and threads are started and stopped the
>>> hash map could grow to include up to almost max pid entries, 32768 by
>>> default on Linux. With the size of the ERR_STATE struct that will be
>>> roughly 12MB of memory, not horrible but not ideal. I still think
>>> ERR_remove_thread_state() should probably be invoked prior to a thread
>>> exiting. The only way I can think of doing that and keeping everything
>>> in native and not call it for every request would be to call
>>> pthread_cleanup_push(ERR_remove_thread_state, NULL) the first time a
>>> thread is used to perform any ssl operation. That would then require
>>> you to track per thread if the cleanup function had been registered.
>>
>> That is definitely a secondary consideration at the moment.
>>
>>> Also, any solution which make the thread id random and not a bounded
>>> set of values would cause the max memory foot print of the hash map to
>>> be virtually unbounded and require that ERR_remove_thread_state be
>>> invoked to clean up the error states.
>>
>> The function that generated the 'random' ID from the thread ID would
>> have to be deterministic so the total number of values should remain
>> bounded. Shouldn't it?
>>
> 
> If the error queue's are not being removed that would be a
> requirement. If it is to be bound it can not use
> apr_os_thread_current(), since that is a "random" memory address and
> inherently unbounded. If you tried to reduce that value to a smaller
> set you would then have to worry about collisions which is something
> any algorithm would have to worry about. The code in err.c assumes
> that the error state is only ever modified by one thread and doesn't
> lock modifying the state just the map access. The easiest thing to
> base the ID on, from a Linux perspective, would be the TID, which is
> bounded and guaranteed to be unique for a running thread. I am not as
> sure about other operating systems. It seems like on some versions of
> OSx pthread_getthreadid_np(), might do what you want.

I do think OS specific code to get the real thread ID is the way to go
here until we can switch to OpenSSL 1.1.0 at which point the problem
should go away.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Bug that spans tomcat and tomcat-native

Reply via email to