Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)
> Ger Hobbelt wrote: > Odd question maybe, but does the API call slowdown too when traversing > other heaps (which carry fewer items)? Yes. This surprised me, but Heap32Next takes the same amount of time to execute when traversing the 2nd heaplist (which has 15 items) as it does the 1st heaplist (which has a million items). > Are those time-per-API-call numbers averaged or does /each/ Heap32Next > call take this long?! Each and every call takes the same long amount of time. To me, this indicates that the time spent is not actually spent *finding* the next heap entry (as if we were traversing a linked list to get to our destination), but in allocating (to the nearest 2^N) space for and/or recording info about every heap entry in every heap list. > an adjustment to keep the rand collecting scan within reasonable > bounds is well feasible (no hard upper limit, though, because, ah, > 'granularity' there is the time one (slowest) API call takes, no > matter how the solution is coded. It would definitely be easy to constrain the number of heap entries checked even further, based on time spent in the inner loop, but doesn't that run into this: > Oh yeah, to answer one Q in first post: it's not a very smart idea to > strip out entropy collecting code sections ... If we limited the inner loop to 1 second as we do the outer loop, we'd effectively be cutting out (in this case) 79 of the usual 80 bytes of entropy which, as you say, makes one trepidatious. RAND_poll appears to gather randomly varying amounts of entropy, basically "what it can grab in a few seconds". Is there a minimum effective amount of entropy that is known? The ideal thing is to add another source of entropy to compensate, but that's not something that's within my capabilities or time limits right now. __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)
I've confirmed my linear performance conjecture w/r/t heap objects. Click here to see pretty pictures graphing my results: http://thenewjamesbaker.blogspot.com/2009/11/performance-of-heap32next-on-64-bit.html On Thu, Nov 12, 2009 at 11:50 AM, James Baker wrote: > Punchline: The time taken by a call to Heap32Next on 64-bit Windows-7 > SCALES (roughly linearly?) with the number of heap entries in the heap > list. This seems to be a serious problem that would affect (at least) > most 32-bit-compiled OpenSSL users on 64-bit Win7. > > I've cleared my accusation against the CryptoAPI functions - those are > working fine. The time is taken up by Heap32Next, even though good == > 1 and stoptime is set. The 1-second constraint on the number of > heaplists walked is ineffective because the time is all spent in the > inner loop, walking the first 80 heap entries in the first heaplist. > > By the time I got up to 4 million (2-byte) heap objects in my test > harness, each Heap32Next call was taking multiple seconds. It is not > the overall size of the heap that counts, but the number of heap > objects. The performance of each Heap32Next (the 1st versus the 80th) > is roughly the same. I do not know whether the problem is specific to > only 64-bit Win7 (due to WoW), or whether it applies to all Windows 7 > versions. > > What then is the fix? Sure, this may be a Windows problem, but > letting RAND_poll take dozens to hundreds of seconds is obviously not > acceptable. This problem is sort of related to previous "heap walking > is slooow" threads on this list dealing with lines ~500-515 in > rand_win.c, but we can no longer get 80 entries from the first list in > anything near 1 second. What would the cryptographic effect (on the > entropy of the randomness pool) be from cutting the heap traversal > entirely (i.e. cutting 80 bytes of entropy) - is that > cryptographically acceptable? Is there some alternate way of > traversing large heaps, or some alternate source of entropy we could > turn to? > > I have a single cpp repro file with a slightly chopped-down RAND_poll > ripped out of rand_win.c that I could pass on to any OpenSSL > developer/contributor. > > Thanks, > James __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)
Punchline: The time taken by a call to Heap32Next on 64-bit Windows-7 SCALES (roughly linearly?) with the number of heap entries in the heap list. This seems to be a serious problem that would affect (at least) most 32-bit-compiled OpenSSL users on 64-bit Win7. I've cleared my accusation against the CryptoAPI functions - those are working fine. The time is taken up by Heap32Next, even though good == 1 and stoptime is set. The 1-second constraint on the number of heaplists walked is ineffective because the time is all spent in the inner loop, walking the first 80 heap entries in the first heaplist. By the time I got up to 4 million (2-byte) heap objects in my test harness, each Heap32Next call was taking multiple seconds. It is not the overall size of the heap that counts, but the number of heap objects. The performance of each Heap32Next (the 1st versus the 80th) is roughly the same. I do not know whether the problem is specific to only 64-bit Win7 (due to WoW), or whether it applies to all Windows 7 versions. What then is the fix? Sure, this may be a Windows problem, but letting RAND_poll take dozens to hundreds of seconds is obviously not acceptable. This problem is sort of related to previous "heap walking is slooow" threads on this list dealing with lines ~500-515 in rand_win.c, but we can no longer get 80 entries from the first list in anything near 1 second. What would the cryptographic effect (on the entropy of the randomness pool) be from cutting the heap traversal entirely (i.e. cutting 80 bytes of entropy) - is that cryptographically acceptable? Is there some alternate way of traversing large heaps, or some alternate source of entropy we could turn to? I have a single cpp repro file with a slightly chopped-down RAND_poll ripped out of rand_win.c that I could pass on to any OpenSSL developer/contributor. Thanks, James my debugging output: stoptime: 851485984 Got heaplist_first. heap1st tickcount: 851624250 Exiting RAND_poll On Wed, Nov 11, 2009 at 4:50 PM, James Baker wrote: > It's not the CryptoAPI calls that are taking time - nearly all of the > time is spent within Heap32Next. Thus my hypothesis is that > CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to > be 0 and the heap traversal to be unbounded. > > I see the "entrycnt = 80" constraint on walking the length of each > heaplist, but there is no bound on the outer while loop calling > Heap32ListNext? You say that "very first block of heap" is retrieved > when good is 0 - is that because "GetTickCount() < stoptime" is > supposed to be a short-circuit when stoptime == 0? (It's not - > perhaps I should examine next whether GetTickCount is malfunctioning, > or returning a signed negative int for comparison) > > The problem does occur with full admin privileges. I might speculate > about the effect the WoW layer has on using the Heap32* functions, but > my investigation so far is focused on why the traversal isn't bounded > (i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for > heap/process/thread/module) would be tolerable. > > I have not yet written a standalone C program that simulates the same > CryptoAPI call sequence. If no one on this list can say "Yes, the > RAND_Poll CryptoAPI calls work on Windows-7", this will be my next > step. > > Thanks, > James __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org
Re: CryptoAPI calls failing in rand_win on Windows 7
It's not the CryptoAPI calls that are taking time - nearly all of the time is spent within Heap32Next. Thus my hypothesis is that CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to be 0 and the heap traversal to be unbounded. I see the "entrycnt = 80" constraint on walking the length of each heaplist, but there is no bound on the outer while loop calling Heap32ListNext? You say that "very first block of heap" is retrieved when good is 0 - is that because "GetTickCount() < stoptime" is supposed to be a short-circuit when stoptime == 0? (It's not - perhaps I should examine next whether GetTickCount is malfunctioning, or returning a signed negative int for comparison) The problem does occur with full admin privileges. I might speculate about the effect the WoW layer has on using the Heap32* functions, but my investigation so far is focused on why the traversal isn't bounded (i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for heap/process/thread/module) would be tolerable. I have not yet written a standalone C program that simulates the same CryptoAPI call sequence. If no one on this list can say "Yes, the RAND_Poll CryptoAPI calls work on Windows-7", this will be my next step. Thanks, James On Sun, Nov 8, 2009 at 6:36 AM, sandeep kiran p wrote: >>RAND_poll runs very quickly with a near-empty heap. > Do you mean that the calls > to Heap32First, Heap32Next, Heap32ListFirst, Heap32ListNext are failing? Can > you check the return values from these calls? (using GetLastError?). In any > case, the heap traversals are bounded by the 1 sec limit. Even if the > variable "good" is 0, the very first block of heap allocated by the current > process is retrieved. Can you exactly specify which CryptoAPI is taking so > much time? > -Sandeep > > On Fri, Nov 6, 2009 at 11:45 AM, James Baker wrote: >> >> Background: Testing a Ruby app on 64-bit Windows 7 Ultimate, I found >> that OpenSSL::PKey::RSA.generate() was taking 98 seconds. Jumping to >> C, sampling showed that the great majority of this time was spent in >> Heap32Next, which led me to the "heap list and heap walking" section >> of RAND_poll in crypto/rand/rand_win.c >> >> The heap walking (and thread and module walking) are limited to 1s >> unless the variable "good" is set, and advapi32.dll is loaded, which >> means that "poll the CryptoAPI PRNG" using the conjunction of >> CryptAcquireContextW and CryptGenRandom must be failing. >> >> The 98 seconds comes from walking the contents of the heap after >> loading a Rails environment - RAND_poll runs very quickly with a >> near-empty heap. Are the crypo-API calls ever expected to fail under >> any Windows platform, or is this the abnormality? I'm not aware of any >> changes in Win7 that would break those calls (though I'm investigating >> whether something permission/security-related is in play here), but >> I'm not aware of much about Win7 in general. I also don't see any >> Win7-related changes in the OpenSSL changelog - has this platform been >> validated already? >> >> Thanks, >> James >> __ >> OpenSSL Project http://www.openssl.org >> User Support Mailing List openssl-us...@openssl.org >> Automated List Manager majord...@openssl.org > > __ OpenSSL Project http://www.openssl.org User Support Mailing Listopenssl-users@openssl.org Automated List Manager majord...@openssl.org