Punchline: The time taken by a call to Heap32Next on 64-bit Windows 7 SCALES (roughly linearly?) with the number of heap entries in the heap list. This seems to be a serious problem that would affect (at least) most users of 32-bit-compiled OpenSSL on 64-bit Win7.
I've cleared my accusation against the CryptoAPI functions - those are working fine. The time is taken up by Heap32Next, even though good == 1 and stoptime is set. The 1-second constraint on the number of heaplists walked is ineffective because the time is all spent in the inner loop, walking the first 80 heap entries in the first heaplist.

By the time I got up to 4 million (2-byte) heap objects in my test harness, each Heap32Next call was taking multiple seconds. It is not the overall size of the heap that counts, but the number of heap objects. The performance of each Heap32Next call (the 1st versus the 80th) is roughly the same. I do not know whether the problem is specific to 64-bit Win7 (due to WoW), or whether it applies to all Windows 7 versions.

What then is the fix? Sure, this may be a Windows problem, but letting RAND_poll take dozens to hundreds of seconds is obviously not acceptable. This problem is sort of related to previous "heap walking is slooow" threads on this list dealing with lines ~500-515 in rand_win.c, but we can no longer get 80 entries from the first list in anything near 1 second.

What would the cryptographic effect (on the entropy of the randomness pool) be from cutting the heap traversal entirely (i.e. cutting 80 bytes of entropy) - is that cryptographically acceptable? Is there some alternate way of traversing large heaps, or some alternate source of entropy we could turn to?

I have a single cpp repro file with a slightly chopped-down RAND_poll ripped out of rand_win.c that I could pass on to any OpenSSL developer/contributor.

Thanks,
James

my debugging output:
stoptime: 851485984
Got heaplist_first.
heap1st ................................................................................
tickcount: 851624250
Exiting RAND_poll

On Wed, Nov 11, 2009 at 4:50 PM, James Baker <j...@j-baker.org> wrote:
> It's not the CryptoAPI calls that are taking time - nearly all of the
> time is spent within Heap32Next.
> Thus my hypothesis is that
> CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to
> be 0 and the heap traversal to be unbounded.
>
> I see the "entrycnt = 80" constraint on walking the length of each
> heaplist, but there is no bound on the outer while loop calling
> Heap32ListNext? You say that "very first block of heap" is retrieved
> when good is 0 - is that because "GetTickCount() < stoptime" is
> supposed to be a short-circuit when stoptime == 0? (It's not -
> perhaps I should examine next whether GetTickCount is malfunctioning,
> or returning a signed negative int for comparison)
>
> The problem does occur with full admin privileges. I might speculate
> about the effect the WoW layer has on using the Heap32* functions, but
> my investigation so far is focused on why the traversal isn't bounded
> (i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for
> heap/process/thread/module) would be tolerable.
>
> I have not yet written a standalone C program that simulates the same
> CryptoAPI call sequence. If no one on this list can say "Yes, the
> RAND_Poll CryptoAPI calls work on Windows-7", this will be my next
> step.
>
> Thanks,
> James
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org