On Mon, Jun 27, 2016 at 9:11 AM, Mark Thomas <ma...@apache.org> wrote: > I believe I have an explanation for what is going on that fits both the > reported behaviour and the proposed fix. > > Background > ========== > > OpenSSL tracks a list of the most recent errors for each thread in a > hash map keyed on the thread (int_thread_hash in err.c). Reading and > writing to this hash map is protected by a lock. The hash map is created > and populated lazily. > > tc-native calls ERR_clear_error() before every call to > SSL_do_handshake(), SSL_read() and SSL_write(). The call to > ERR_clear_error() either clears the error list for the current thread or > inserts a new empty list into the hash map of the thread is not already > present. > > The performance problem was tracked down to threads waiting in > ERR_clear_error() to obtain the write lock for the hash map. > > The proposed solution was to call ERR_remove_thread_state() just before > the current Tomcat thread processing the connection is returned to the > thread pool. This method removes the current thread and its associated > error list from the hash map. > > > Analysis > ======== > > The proposed solution, calling ERR_remove_thread_state(), adds a call > that also obtains the write lock for the hash map. This indicates that > the problem is not delays in obtaining the lock but contention for the > lock because one or more operations taking place within the lock are > taking a long time. > > Removing unused threads from the hash map removes the bottleneck. This > points towards the hash map being the source of the problem. > > Testing by the OP showed that as soon as a test had been ran that > required ~ 400 concurrent threads performance dropped significantly. It > did not get noticeably worse if the same 400 thread test was run repeatedly. > > My testing indicated, on OSX at least, that the thread IDs used in the > hash map were stable and that uncontrolled growth of the hash map was > unlikely to be the cause. > > The manner in which thread IDs are generated varies by platform. On > Linux, where this problem was observed, the thread ID is derived from > (is normally equal to) the memory address of the per thread errno > variable. This means that thread IDs tend to be concentrated in a > relatively narrow range of values. For example, in a simple 10 thread > test on OSX thread IDs ranged from 123145344839680 to 123145354387455. > > Thread IDs therefore fall with a 10^7 range within a possible range of > 1.8x10^19. i.e. a very small, contiguous range. > > Hash maps use hashing functions to ensure that entries are (roughly) > evenly distributed between the available buckets. The hash function, > err_state_hash, used for the thread IDs in OpenSSL is threadID * 13. > > Supposition > =========== > > The hash function used (multiple by 13) is insufficient to distribute > the resulting values across multiple buckets because they will still > fall in a relatively narrow band. Therefore all the threads end up in a > single bucket which makes the performance of the hash map poor. This in > turn makes calls to thread_get_item() slow because it does a hash map > lookup. This lookup is performed with the read lock held for the hash > map which in turn will slow down the calls that require the write lock.
This makes sense. Looking at openssl source code a bit more, I think it is a combination of the hash function and bucket selection in the hash map. The bucket which is chosen is basically done by a low bit mask since pmax and num_alloc_nodes are always powers of 2. With 400 entries pmax=128 and num_alloc_nodes=256, making the bucket selection either hash & 0x7F or hash & 0xFF. The errno value memory locations are probably aligned and often that mask will result in the same bucket being chosen. > Proposal > ======== > > The analysis and supposition above need to be checked by someone with a > better understanding of C than me. Assuming my work is correct, the next > step is to look at possible fixes. I do not believe that patching > OpenSSL is a viable option. > > The OpenSSL API needs to be reviewed to see if there is a way to avoid > the calls that require the write lock. > > If the write lock cannot be avoided then we need to see if there is a > better place to call ERR_remove_thread_state(). I'd like to fix this > entirely in tc-native but that may mean calling > ERR_remove_thread_state() more frequently which could create its own > performance problems. > > Nate - I may have some patches for you to test in the next few days. Sounds good. Thanks, -nate --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org