RE: Deadlock in openssl1.0.1e.
-Original Message- From: owner-openssl-us...@openssl.org [mailto:owner-openssl- us...@openssl.org] On Behalf Of Dr. Stephen Henson Sent: Wednesday, January 15, 2014 1:56 AM To: openssl-users@openssl.org Subject: Re: Deadlock in openssl1.0.1e. On Tue, Jan 14, 2014, Tayade, Nilesh wrote: Hi, I am not able to understand what???s going wrong, below is the latest stack trace I got. Thread 19 is killed by a watchdog, and it was waiting on a lock. Other threads do not show any lock being held. I'd suggest you modify your locking callback to store the location (file, line number) for the lock in question (18 which is CRYPTO_LOCK_RAND) when a lock is acquired successfully and clear it when the lock is released. Then when you get a deadlock you can see the last instance that held (but did not release) the lock. You could try compiling with LOCK_DEBUG but that will create a *lot* of output. Thanks for your suggestions. I did try capturing the debug information and below is how the state of the locks is. The array per lock stores information of the owner, file and lock_present i.e. lock is held or not. I see that the lock # 18 is held but when thread 19 was waiting, it was trying for lock # 19. Not sure what's going on. (gdb) p ssl_locks[18] $22 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1 } (gdb) p ssl_locks[19] $23 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0 } The mutex buffer looks like this. I am unable to interpret all the fields, any pointers on these? (gdb) p mutex_buf[19] $15 = { __data = { __lock = -1770245546, __count = 161970406, __owner = -794915378, __nusers = 890065223, __kind = 634752049, __spins = 239812167, __list = { __prev = 0x3cab28a258ddf6a1, __next = 0x1b78c957a2cf4cef } }, __size = V2|\226\346x\247\t\316\215\236\320GQ\r51\214\325%GK\016\241\366\335X\242(\253\357LϢW\311x\033, __align = 695657599214563926 } Steve. -- Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org -- Thanks, Nilesh
Re: Deadlock in openssl1.0.1e.
On Wed, Jan 15, 2014, Tayade, Nilesh wrote: Thanks for your suggestions. I did try capturing the debug information and below is how the state of the locks is. The array per lock stores information of the owner, file and lock_present i.e. lock is held or not. I see that the lock # 18 is held but when thread 19 was waiting, it was trying for lock # 19. Not sure what's going on. (gdb) p ssl_locks[18] $22 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1 } (gdb) p ssl_locks[19] $23 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0 } By that I'm assuming it has acquired lock 18 and is attempting to get 19 but failing because another thread already has it? If so you need to know which thread has lock 19 and why it hasn't released it. One way to do that is to store the details of each lock in a static array. Steve. -- Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
RE: Deadlock in openssl1.0.1e.
-Original Message- From: owner-openssl-us...@openssl.org [mailto:owner-openssl- us...@openssl.org] On Behalf Of Dr. Stephen Henson Sent: Wednesday, January 15, 2014 6:30 PM To: openssl-users@openssl.org Subject: Re: Deadlock in openssl1.0.1e. On Wed, Jan 15, 2014, Tayade, Nilesh wrote: Thanks for your suggestions. I did try capturing the debug information and below is how the state of the locks is. The array per lock stores information of the owner, file and lock_present i.e. lock is held or not. I see that the lock # 18 is held but when thread 19 was waiting, it was trying for lock # 19. Not sure what's going on. (gdb) p ssl_locks[18] $22 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1 } (gdb) p ssl_locks[19] $23 = { thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0 } By that I'm assuming it has acquired lock 18 and is attempting to get 19 but failing because another thread already has it? If so you need to know which thread has lock 19 and why it hasn't released it. One way to do that is to store the details of each lock in a static array. Thanks. I have already captured the lock details in an array. Please see below. The leftmost digit is lock number index. Lock 19 was taken by same thread and it was released (lock_present = 0). And this is causing confusion for me. 0 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 1 {thread_id = 140009934862112, file_name = err.c, '\000' repeats 34 times, file_name_len = 5, line = 408, lock_present = 0}, 2 {thread_id = 140003819828992, file_name = ex_data.c, '\000' repeats 30 times, file_name_len = 9, line = 511, lock_present = 0}, 3 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 4 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 5 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 6 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 7 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 8 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 9 {thread_id = 140003803043584, file_name = rsa_eay.c, '\000' repeats30 times, file_name_len = 9, line = 313, lock_present = 0}, 10 {thread_id = 140003819828992, file_name = p_lib.c, '\000' repeats 32 times, file_name_len = 7, line = 393, lock_present = 0}, 11 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 12 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 13 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 14 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 15 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 16 {thread_id = 140009934862112,file_name = ssl_ciph.c, '\000' repeats 29 times, file_name_len = 10, line = 481, lock_present = 0}, 17 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 18 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1}, 19 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0}, 20 {thread_id = 140009934862112, file_name = mem_dbg.c, '\000' repeats 30 times, file_name_len = 9, line = 282, lock_present = 0}, 21 {thread_id = 140003819828992, file_name = bio_lib.c, '\000' repeats 30 times, file_name_len = 9, line = 117, lock_present = 0}, 22 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 23 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 24 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 25 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 26 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 27 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 28 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 29 {thread_id = 0, file_name = '\000' repeats 39 times, file_name_len = 0, line = 0, lock_present = 0}, 30 {thread_id = 140009934862112,file_name
Re: Deadlock in openssl1.0.1e.
On Wed, Jan 15, 2014, Tayade, Nilesh wrote: Thanks. I have already captured the lock details in an array. Please see below. The leftmost digit is lock number index. Lock 19 was taken by same thread and it was released (lock_present = 0). And this is causing confusion for me. 18 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1}, 19 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0}, Does it imply that thread has lock 18 but can't get 19 even though the lock isn't present? Does this deadlock happen almost immediately or some time later? The way the PRNG works it's normal for lock 18 and 19 to be acquired and released often sometimes simultaneously. Steve. -- Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
Re: Deadlock in openssl1.0.1e.
Yes. The issue is fairly reproducible and takes not more than 10-15mins to occur. -- Thanks, Nilesh Sent via CanvasHD Dr. Stephen Henson st...@openssl.org wrote: On Wed, Jan 15, 2014, Tayade, Nilesh wrote: Thanks. I have already captured the lock details in an array. Please see below. The leftmost digit is lock number index. Lock 19 was taken by same thread and it was released (lock_present = 0). And this is causing confusion for me. 18 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 384, lock_present = 1}, 19 {thread_id = 140003803043584, file_name = md_rand.c, '\000' repeats 30 times, file_name_len = 9, line = 389, lock_present = 0}, Does it imply that thread has lock 18 but can't get 19 even though the lock isn't present? Does this deadlock happen almost immediately or some time later? The way the PRNG works it's normal for lock 18 and 19 to be acquired and released often sometimes simultaneously. Steve. -- Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
Re: Deadlock in openssl1.0.1e.
and what are all the other threads doing? some other thread must already own this lock and is waiting on something else. -lee On 2014-01-10 13:11, Tayade, Nilesh wrote: Hi, I am facing a deadlock issue on openssl1.0.1e. Please see the stack below. Could anyone please advise if this is a known bug? Is there any workaround/fix? FIPS is disabled on my system. (gdb) bt full #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 No locals. #1 0x00373da092c5 in _L_lock_868 () from /lib64/libpthread.so.0 No symbol table info available. #2 0x00373da09197 in __pthread_mutex_lock (mutex=0x7418430) at pthread_mutex_lock.c:61 ignore1 = 128 ignore2 = 121734192 ignore3 = -512 type = value optimized out id = 10428 #3 0x009f3585 in locking_function (mode=9, n=18, file=0x3c79bdf93b md_rand.c, line=228) at ssl_prf.c:653 No locals. #4 0x003c79b48f69 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #5 0x003c79b06c72 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #6 0x003c79b06aa0 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #7 0x003c79b07be2 in BN_BLINDING_create_param () from /usr/lib64/libcrypto.so.10 No symbol table info available. #8 0x003c79b07d6a in BN_BLINDING_update () from /usr/lib64/libcrypto.so.10 No symbol table info available. #9 0x003c79b07e56 in BN_BLINDING_convert_ex () from /usr/lib64/libcrypto.so.10 No symbol table info available. -- Thanks, Nilesh Sent via CanvasHD:??I???r?m (???Z+?K?+1???x??h???[?z?(???Z+???f?y??????f???h??)z{,??
RE: Deadlock in openssl1.0.1e.
Hi, I am not able to understand what’s going wrong, below is the latest stack trace I got. Thread 19 is killed by a watchdog, and it was waiting on a lock. Other threads do not show any lock being held. 21 Thread 23735 0x00373d2a6a8d in nanosleep () at ../sysdeps/unix/syscall-template.S:82 20 Thread 23752 0x00373d2da4f3 in select () at ../sysdeps/unix/syscall-template.S:82 19 Thread 23754 0x00373da0ed70 in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:82 18 Thread 23724 0x00373da0e7ed in accept () at ../sysdeps/unix/syscall-template.S:82 17 Thread 23733 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 16 Thread 23704 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 15 Thread 23732 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 14 Thread 23705 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 13 Thread 23753 0x00373da0e963 in recvfrom () at ../sysdeps/unix/syscall-template.S:82 12 Thread 23563 0x00373d2a6a8d in nanosleep () at ../sysdeps/unix/syscall-template.S:82 11 Thread 23734 0x00373da0e7ed in accept () at ../sysdeps/unix/syscall-template.S:82 10 Thread 23751 0x00373d2a6a8d in nanosleep () at ../sysdeps/unix/syscall-template.S:82 9 Thread 23756 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 8 Thread 23757 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 7 Thread 23755 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 6 Thread 23736 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 5 Thread 23759 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 4 Thread 23728 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 3 Thread 23731 0x00373d2c8c07 in sched_yield () at ../sysdeps/unix/syscall-template.S:82 * 2 Thread 23703 0x00373da0e8ec in __libc_recv (fd=value optimized out, buf=value optimized out, n=value optimized out, flags=value optimized out) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:34 1 Thread 23758 0x00373d2329a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 (gdb) thread 19 [Switching to thread 19 (Thread 23754)]#0 0x00373da0ed70 in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:82 82 ../sysdeps/unix/syscall-template.S: No such file or directory. in ../sysdeps/unix/syscall-template.S (gdb) bt #0 0x00373da0ed70 in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:82 #1 0x00373da08f0b in __pthread_mutex_lock_full (mutex=0x7418438) at pthread_mutex_lock.c:316 #2 0x009f3591 in locking_function (mode=9, n=19, file=0x3c79bdf93b Address 0x3c79bdf93b out of bounds, line=387) at ssl_prf.c:665 #3 0x003c79b49403 in ?? () -- Thanks, Nilesh From: owner-openssl-us...@openssl.org [mailto:owner-openssl-us...@openssl.org] On Behalf Of Lee Dilkie Sent: Tuesday, January 14, 2014 7:25 PM To: openssl-users@openssl.org Subject: Re: Deadlock in openssl1.0.1e. and what are all the other threads doing? some other thread must already own this lock and is waiting on something else. -lee On 2014-01-10 13:11, Tayade, Nilesh wrote: Hi, I am facing a deadlock issue on openssl1.0.1e. Please see the stack below. Could anyone please advise if this is a known bug? Is there any workaround/fix? FIPS is disabled on my system. (gdb) bt full #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 No locals. #1 0x00373da092c5 in _L_lock_868 () from /lib64/libpthread.so.0 No symbol table info available. #2 0x00373da09197 in __pthread_mutex_lock (mutex=0x7418430) at pthread_mutex_lock.c:61 ignore1 = 128 ignore2 = 121734192 ignore3 = -512 type = value optimized out id = 10428 #3 0x009f3585 in locking_function (mode=9, n=18, file=0x3c79bdf93b md_rand.c, line=228) at ssl_prf.c:653 No locals. #4 0x003c79b48f69 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #5 0x003c79b06c72 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #6 0x003c79b06aa0 in ?? () from /usr/lib64/libcrypto.so.10 No symbol table info available. #7 0x003c79b07be2 in BN_BLINDING_create_param () from /usr/lib64/libcrypto.so.10 No symbol table info available. #8 0x003c79b07d6a in BN_BLINDING_update () from /usr/lib64/libcrypto.so.10 No symbol table info available. #9 0x003c79b07e56 in BN_BLINDING_convert_ex () from /usr/lib64/libcrypto.so.10 No symbol table info available. -- Thanks, Nilesh Sent via CanvasHD:��IϮ��r�m (���Z+�K‑�+1���x ��h���[�z�(���Z+� ��f�y�‑�f���h
Re: Deadlock in openssl1.0.1e.
On Tue, Jan 14, 2014, Tayade, Nilesh wrote: Hi, I am not able to understand what???s going wrong, below is the latest stack trace I got. Thread 19 is killed by a watchdog, and it was waiting on a lock. Other threads do not show any lock being held. I'd suggest you modify your locking callback to store the location (file, line number) for the lock in question (18 which is CRYPTO_LOCK_RAND) when a lock is acquired successfully and clear it when the lock is released. Then when you get a deadlock you can see the last instance that held (but did not release) the lock. You could try compiling with LOCK_DEBUG but that will create a *lot* of output. Steve. -- Dr Stephen N. Henson. OpenSSL project core developer. Commercial tech support now available see: http://www.openssl.org __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org