On Mon, Jun 27, 2016 at 11:54 AM, Rainer Jung <rainer.j...@kippdata.de> wrote:
> Hi Mark,
>
>
> Am 27.06.2016 um 15:11 schrieb Mark Thomas:
>>
>> I believe I have an explanation for what is going on that fits both the
>> reported behaviour and the proposed fix.
>>
>> Background
>> ==========
>>
>> OpenSSL tracks a list of the most recent errors for each thread in a
>> hash map keyed on the thread (int_thread_hash in err.c). Reading and
>> writing to this hash map is protected by a lock. The hash map is created
>> and populated lazily.
>>
>> tc-native calls ERR_clear_error() before every call to
>> SSL_do_handshake(), SSL_read() and SSL_write(). The call to
>> ERR_clear_error() either clears the error list for the current thread or
>> inserts a new, empty list into the hash map if the thread is not already
>> present.
>>
>> The performance problem was tracked down to threads waiting in
>> ERR_clear_error() to obtain the write lock for the hash map.
>>
>> The proposed solution was to call ERR_remove_thread_state() just before
>> the current Tomcat thread processing the connection is returned to the
>> thread pool. This method removes the current thread and its associated
>> error list from the hash map.
>>
>>
>> Analysis
>> ========
>>
>> The proposed solution, calling ERR_remove_thread_state(), adds a call
>> that also obtains the write lock for the hash map. This indicates that
>> the problem is not delays in obtaining the lock but contention for the
>> lock because one or more operations taking place within the lock are
>> taking a long time.
>>
>> Removing unused threads from the hash map removes the bottleneck. This
>> points towards the hash map being the source of the problem.
>>
>> Testing by the OP showed that as soon as a test had been run that
>> required ~400 concurrent threads, performance dropped significantly. It
>> did not get noticeably worse if the same 400 thread test was run
>> repeatedly.
>>
>> My testing indicated, on OSX at least, that the thread IDs used in the
>> hash map were stable and that uncontrolled growth of the hash map was
>> unlikely to be the cause.
>>
>> The manner in which thread IDs are generated varies by platform. On
>> Linux, where this problem was observed, the thread ID is derived from
>> (is normally equal to) the memory address of the per thread errno
>> variable. This means that thread IDs tend to be concentrated in a
>> relatively narrow range of values. For example, in a simple 10 thread
>> test on OSX thread IDs ranged from 123145344839680 to 123145354387455.
>>
>> Thread IDs therefore fall within a 10^7 range out of a possible range of
>> 1.8x10^19, i.e. a very small, contiguous range.
>>
>> Hash maps use hashing functions to ensure that entries are (roughly)
>> evenly distributed between the available buckets. The hash function,
>> err_state_hash, used for the thread IDs in OpenSSL is threadID * 13.
>>
>> Supposition
>> ===========
>>
>> The hash function used (multiply by 13) is insufficient to distribute
>> the resulting values across multiple buckets because they will still
>> fall in a relatively narrow band. Therefore all the threads end up in a
>> single bucket, which makes the performance of the hash map poor. This in
>> turn makes calls to thread_get_item() slow because it does a hash map
>> lookup. This lookup is performed with the read lock held for the hash
>> map which in turn will slow down the calls that require the write lock.
>>
>> Proposal
>> ========
>>
>> The analysis and supposition above need to be checked by someone with a
>> better understanding of C than I do. Assuming my work is correct, the next
>> step is to look at possible fixes. I do not believe that patching
>> OpenSSL is a viable option.
>>
>> The OpenSSL API needs to be reviewed to see if there is a way to avoid
>> the calls that require the write lock.
>>
>> If the write lock cannot be avoided then we need to see if there is a
>> better place to call ERR_remove_thread_state(). I'd like to fix this
>> entirely in tc-native but that may mean calling
>> ERR_remove_thread_state() more frequently which could create its own
>> performance problems.
>>
>> Nate - I may have some patches for you to test in the next few days.
>>
>> Mark
>
>
> Great analysis. I was really wondering what could make the hash map so huge,
> and hadn't thought of the hash function as the problem.
>
> Before OpenSSL 1.1.0 there's a callback for applications to provide their
> own thread IDs:
>
> https://www.openssl.org/docs/man1.0.2/crypto/CRYPTO_THREADID_set_callback.html
>
> So we could probably work around the problem of the poor hashing function by
> passing in IDs that work for hashing (pre-hashed ID?). Of course then we
> lose the direct association of the OpenSSL thread ID with the real platform
> thread ID.
>
> Currently our callback in tcnative is ssl_set_thread_id() which refers to
> ssl_thread_id(), which on Linux gets the ID from the APR function
> apr_os_thread_current(). So we could add some hashing formula in
> ssl_thread_id().

I think just using the real thread ID would work: right now it isn't
using the real thread ID, it is using the address of the per-thread
errno. If it were the real thread ID, I think the hash algorithm and
bucket selection they have now would work much better, since the
thread IDs are basically numerically increasing and aren't aligned to
powers of 2.

> In OpenSSL 1.1.0 this thread ID callback is gone. We'll see whether the lock
> hash problem is gone too ...

1.1.0 doesn't use the hash table anymore. On systems using POSIX
threads it uses pthread keys to store the per-thread error queues.
This is also why you do not have to remove the state when a thread
exits: pthreads does it for you via the key's destructor.

-nate

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org
