MSDN says: "To enumerate the heap or module states for all processes, specify TH32CS_SNAPALL and set *th32ProcessID* to zero."
So it presumably does the heap and module walk for all processes and not only for the current process. Do you think *CreateToolhelp32Snapshot's* lock on the read-only snapshot could be a possible culprit?

I am now thinking about removing the calls to Heap32First and Heap32Next in rand_win.c and looking for alternate sources of entropy.

Thanks for your help.

Regards
Sandeep

On Sat, Feb 25, 2012 at 2:38 AM, Jakob Bohm <jb-open...@wisemo.com> wrote:

> On 2/24/2012 2:14 PM, sandeep kiran p wrote:
>
>> You mentioned that OpenSSL is holding a "snapshot" lock in rand_win.c. I
>> couldn't find anything like that in that file. Can you specifically point
>> me to the code that you are referring to? I would also like to get an
>> opinion on possible workarounds that I can enforce to avoid the deadlock.
>>
> In OpenSSL 1.0.0 it is line 486 which says
>
>     module_next && (handle = snap(TH32CS_SNAPALL,0))
>
> where snap is a pointer to KERNEL32.CreateToolhelp32Snapshot()
>
>> 1. Can I remove the heap traversal routines Heap32First and Heap32Next?
>> Will it badly affect the PRNG output later on?
>>
> It depends how good the other sources of random numbers are,
> more below.
>
>> 2. Can I replace Heap32First and Heap32Next calls with any other sources
>> of entropy? What if I make a call to CryptGenRandom again in place of the
>> heap traversal routines?
>>
> Calling CryptGenRandom() twice isn't going to help much.
>
> If CryptGenRandom() is as good as it is "supposed to" be,
> the other entropy sources are not really needed. But if
> CryptGenRandom() is somehow broken or untrustworthy,
> calling it a million times wouldn't help.
>
> Anyway, I have my doubts about the value of using the local
> heap walking functions as a source of entropy, as they
> reflect only the state of your own process.
> Pretending that
> the address and size of each malloc()-ed memory block in
> your process contributes 3 to 5 bytes of additional entropy
> (which is what the comments say) is wildly optimistic and
> quite unrealistic.
>
> In a long-running web browser or a similarly long running
> web server, the net total of the memory layout effects of
> thousands of semi-chaotic previous network requests and
> user actions might contribute a total of 10 to 50 bits of
> entropy. But in a typical freshly started process, the
> layout is going to be pretty deterministic (if the OS
> uses address layout randomization, it probably does so
> based on entropy sources already incorporated into its
> standard random source, i.e. CryptGenRandom() on Windows).
>
>> 3. Any other possible ways out?
>>
>> Thanks,
>> Sandeep
>>
>> On Thu, Feb 23, 2012 at 10:08 PM, Jakob Bohm <jb-open...@wisemo.com> wrote:
>>
>> From the evidence given, I would *almost* certainly characterize
>> this as a deadlock bug in ntdll.dll, the deepest, most trusted
>> user mode component of Windows!
>>
>> Specifically, nothing should allow regular user code such as
>> OpenSSL to hold onto NT internal critical sections while not
>> running inside NTDLL, and NTDLL should be designed not to
>> deadlock against itself.
>>
>> There is one other possibility though:
>>
>> The OpenSSL code in rand_win.c holds on to a "snapshot" lock
>> on some of the heap data while walking it. It may be doing
>> this in a way not permitted by the rules that are presumed
>> by the deadlock avoidance design of the speed critical heap
>> locking code.
>>
>> On 2/23/2012 2:11 PM, sandeep kiran p wrote:
>>
>> Hi,
>>
>> OpenSSL Version: 0.9.8o
>> OS : Windows Server 2008 R2 SP1
>>
>> I am seeing a deadlock in a windows application between two
>> threads, one thread calling Heap32First from OpenSSL's
>> RAND_poll and the other that allocates memory over the heap.
>>
>> Here is the relevant stack trace from both the threads
>> involved in deadlock.
>>
>> Thread 523
>> ----------------
>> ntdll!ZwWaitForSingleObject+a
>> ntdll!RtlpWaitOnCriticalSection+e8
>> ntdll!RtlEnterCriticalSection+d1
>> ntdll!RtlpAllocateHeap+18a6
>> ntdll!RtlAllocateHeap+16c
>> ntdll!RtlpAllocateUserBlock+145
>> ntdll!RtlpLowFragHeapAllocFromContext+4e7
>> ntdll!RtlAllocateHeap+e4
>> ntdll!RtlInitializeCriticalSectionEx+d2
>> ntdll!RtlpActivateLowFragmentationHeap+181
>> ntdll!RtlpPerformHeapMaintenance+27
>> ntdll!RtlpAllocateHeap+1819
>> ntdll!RtlAllocateHeap+16c
>>
>> Thread 454
>> -----------------
>> ntdll!NtWaitForSingleObject+0xa
>> ntdll!RtlpWaitOnCriticalSection+0xe8
>> ntdll!RtlEnterCriticalSection+0xd1
>> ntdll!RtlLockHeap+0x3b
>> ntdll!RtlpQueryExtendedHeapInformation+0xf4
>> ntdll!RtlQueryHeapInformation+0x3c
>> ntdll!RtlQueryProcessHeapInformation+0x3ad
>> ntdll!RtlQueryProcessDebugInformation+0x3b0
>> kernel32!Heap32First+0x71
>>
>> WinDBG reports that thread 523 and 454 both hold locks and are
>> waiting for each other's locks, thereby resulting in a deadlock.
>>
>> On searching, I have found a couple instances where such an
>> issue has been reported with Heap32Next on Windows 7 but
>> haven't found anything that helps me solve the problem. Most
>> of the references I found conclude that this could be because
>> of a possible bug in heap traversal APIs. If someone has faced
>> a similar problem, can you guide me to possible workarounds by
>> which I can avoid the deadlock? Can I remove the heap
>> traversal routines and find some other sources of entropy?
>>
>> Thanks for your help.
>>
> Enjoy
>
> Jakob
> --
> Jakob Bohm, CIO, Partner, WiseMo A/S. http://www.wisemo.com
> Transformervej 29, 2730 Herlev, Denmark. Direct +45 31 13 16 10
> This public discussion message is non-binding and may contain errors.
> WiseMo - Remote Service Management for PCs, Phones and Embedded
>
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> User Support Mailing List                    openssl-users@openssl.org
> Automated List Manager                           majord...@openssl.org
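
For anyone trying option 1 from the thread (dropping the Heap32First/Heap32Next walk), here is a minimal sketch of a seeding helper that relies only on the operating system's random source and never touches a heap lock. It is not OpenSSL code: the name os_random_bytes is hypothetical, the Windows branch uses CryptAcquireContextW/CryptGenRandom from the CryptoAPI, and the /dev/urandom branch is an assumption added only so the sketch can be compiled and tried on non-Windows machines.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#include <wincrypt.h>
#else
#include <stdio.h>
#endif

/*
 * Hypothetical helper (not part of OpenSSL): fill buf with len bytes taken
 * from the operating system's random source.
 * Returns 1 on success and 0 on failure.
 */
int os_random_bytes(unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* CryptGenRandom takes a 32-bit length, so refuse anything larger. */
    if (len > 0xFFFFFFFFu)
        return 0;

#ifdef _WIN32
    {
        HCRYPTPROV prov = 0;
        BOOL ok;

        /* CRYPT_VERIFYCONTEXT: no key container is needed just for random data. */
        if (!CryptAcquireContextW(&prov, NULL, NULL, PROV_RSA_FULL,
                                  CRYPT_VERIFYCONTEXT | CRYPT_SILENT))
            return 0;

        ok = CryptGenRandom(prov, (DWORD)len, buf);
        CryptReleaseContext(prov, 0);
        return ok ? 1 : 0;
    }
#else
    {
        /* Assumed fallback for non-Windows builds of this sketch only. */
        FILE *f = fopen("/dev/urandom", "rb");
        size_t got;

        if (f == NULL)
            return 0;

        got = fread(buf, 1, len, f);
        fclose(f);
        return got == len ? 1 : 0;
    }
#endif
}
```

The bytes could then be passed to RAND_add() in place of the heap data, with whatever entropy estimate you consider justified. As Jakob points out above, this adds little if CryptGenRandom() is already polled elsewhere in rand_win.c; the only point of the sketch is that seeding this way does not walk or lock the heap.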