On Mon, Jun 27, 2016 at 9:11 AM, Mark Thomas <ma...@apache.org> wrote:
> I believe I have an explanation for what is going on that fits both the
> reported behaviour and the proposed fix.
>
> Background
> ==========
>
> OpenSSL tracks a list of the most recent errors for each thread in a
> hash map keyed on the thread (int_thread_hash in err.c). Reading and
> writing to this hash map is protected by a lock. The hash map is created
> and populated lazily.
>
> tc-native calls ERR_clear_error() before every call to
> SSL_do_handshake(), SSL_read() and SSL_write(). The call to
> ERR_clear_error() either clears the error list for the current thread or
> inserts a new empty list into the hash map of the thread is not already
> present.
>
> The performance problem was tracked down to threads waiting in
> ERR_clear_error() to obtain the write lock for the hash map.
>
> The proposed solution was to call ERR_remove_thread_state() just before
> the current Tomcat thread processing the connection is returned to the
> thread pool. This method removes the current thread and its associated
> error list from the hash map.
>
>
> Analysis
> ========
>
> The proposed solution, calling ERR_remove_thread_state(), adds a call
> that also obtains the write lock for the hash map. This indicates that
> the problem is not delays in obtaining the lock but contention for the
> lock because one or more operations taking place within the lock are
> taking a long time.
>
> Removing unused threads from the hash map removes the bottleneck. This
> points towards the hash map being the source of the problem.
>
> Testing by the OP showed that as soon as a test had been ran that
> required ~ 400 concurrent threads performance dropped significantly. It
> did not get noticeably worse if the same 400 thread test was run repeatedly.
>
> My testing indicated, on OSX at least, that the thread IDs used in the
> hash map were stable and that uncontrolled growth of the hash map was
> unlikely to be the cause.
>
> The manner in which thread IDs are generated varies by platform. On
> Linux, where this problem was observed, the thread ID is derived from
> (is normally equal to) the memory address of the per thread errno
> variable. This means that thread IDs tend to be concentrated in a
> relatively narrow range of values. For example, in a simple 10 thread
> test on OSX thread IDs ranged from 123145344839680 to 123145354387455.
>
> Thread IDs therefore fall with a 10^7 range within a possible range of
> 1.8x10^19. i.e. a very small, contiguous range.
>
> Hash maps use hashing functions to ensure that entries are (roughly)
> evenly distributed between the available buckets. The hash function,
> err_state_hash, used for the thread IDs in OpenSSL is threadID * 13.
>
> Supposition
> ===========
>
> The hash function used (multiple by 13) is insufficient to distribute
> the resulting values across multiple buckets because they will still
> fall in a relatively narrow band. Therefore all the threads end up in a
> single bucket which makes the performance of the hash map poor. This in
> turn makes calls to thread_get_item() slow because it does a hash map
> lookup. This lookup is performed with the read lock held for the hash
> map which in turn will slow down the calls that require the write lock.

This makes sense. Looking at openssl source code a bit more, I think
it is a combination of the hash function and bucket selection in the
hash map. The bucket which is chosen is basically done by a low bit
mask since pmax and num_alloc_nodes are always powers of 2. With 400
entries pmax=128 and num_alloc_nodes=256, making the bucket selection
either hash & 0x7F or hash & 0xFF. The errno value memory locations
are probably aligned and often that mask will result in the same
bucket being chosen.

> Proposal
> ========
>
> The analysis and supposition above need to be checked by someone with a
> better understanding of C than me. Assuming my work is correct, the next
> step is to look at possible fixes. I do not believe that patching
> OpenSSL is a viable option.
>
> The OpenSSL API needs to be reviewed to see if there is a way to avoid
> the calls that require the write lock.
>
> If the write lock cannot be avoided then we need to see if there is a
> better place to call ERR_remove_thread_state(). I'd like to fix this
> entirely in tc-native but that may mean calling
> ERR_remove_thread_state() more frequently which could create its own
> performance problems.
>
> Nate - I may have some patches for you to test in the next few days.

Sounds good.

Thanks,
-nate

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to