Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-13 Thread James Baker
> Ger Hobbelt  wrote:
> Odd question maybe, but does the API call slowdown too when traversing
> other heaps (which carry fewer items)?

Yes.  This surprised me, but Heap32Next takes the same amount of time
to execute when traversing the 2nd heaplist (which has 15 items) as it
does the 1st heaplist (which has a million items).

> Are those time-per-API-call numbers averaged or does /each/ Heap32Next
> call take this long?!

Each and every call takes the same long amount of time.  To me, this
indicates that the time spent is not actually spent *finding* the next
heap entry (as if we were traversing a linked list to get to our
destination), but in allocating (to the nearest 2^N) space for and/or
recording info about every heap entry in every heap list.

> an adjustment to keep the rand collecting scan within reasonable
> bounds is well feasible (no hard upper limit, though, because, ah,
> 'granularity' there is the time one (slowest) API call takes, no
> matter how the solution is coded.

It would definitely be easy to constrain the number of heap entries
checked even further, based on time spent in the inner loop, but
doesn't that run into this:

> Oh yeah, to answer one Q in first post: it's not a very smart idea to
> strip out entropy collecting code sections ...

If we limited the inner loop to 1 second as we do the outer loop, we'd
effectively be cutting out (in this case) 79 of the usual 80 bytes of
entropy which, as you say, makes one trepidatious.

RAND_poll appears to gather randomly varying amounts of entropy,
basically "what it can grab in a few seconds".  Is there a minimum
effective amount of entropy that is known? The ideal thing is to add
another source of entropy to compensate, but that's not something
that's within my capabilities or time limits right now.
__
OpenSSL Project http://www.openssl.org
User Support Mailing Listopenssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-12 Thread James Baker
I've confirmed my linear performance conjecture w/r/t heap objects.
Click here to see pretty pictures graphing my results:

http://thenewjamesbaker.blogspot.com/2009/11/performance-of-heap32next-on-64-bit.html

On Thu, Nov 12, 2009 at 11:50 AM, James Baker  wrote:
> Punchline: The time taken by a call to Heap32Next on 64-bit Windows-7
> SCALES (roughly linearly?) with the number of heap entries in the heap
> list.  This seems to be a serious problem that would affect (at least)
> most 32-bit-compiled OpenSSL users on 64-bit Win7.
>
> I've cleared my accusation against the CryptoAPI functions - those are
> working fine.  The time is taken up by Heap32Next, even though good ==
> 1 and stoptime is set.  The 1-second constraint on the number of
> heaplists walked is ineffective because the time is all spent in the
> inner loop, walking the first 80 heap entries in the first heaplist.
>
> By the time I got up to 4 million (2-byte) heap objects in my test
> harness, each Heap32Next call was taking multiple seconds.  It is not
> the overall size of the heap that counts, but the number of heap
> objects.  The performance of each Heap32Next (the 1st versus the 80th)
> is roughly the same.  I do not know whether the problem is specific to
> only 64-bit Win7 (due to WoW), or whether it applies to all Windows 7
> versions.
>
> What then is the fix?  Sure, this may be a Windows problem, but
> letting RAND_poll take dozens to hundreds of seconds is obviously not
> acceptable.  This problem is sort of related to previous "heap walking
> is slooow" threads on this list dealing with lines ~500-515 in
> rand_win.c, but we can no longer get 80 entries from the first list in
> anything near 1 second.  What would the cryptographic effect (on the
> entropy of the randomness pool) be from cutting the heap traversal
> entirely (i.e. cutting 80 bytes of entropy) - is that
> cryptographically acceptable?  Is there some alternate way of
> traversing large heaps, or some alternate source of entropy we could
> turn to?
>
> I have a single cpp repro file with a slightly chopped-down RAND_poll
> ripped out of rand_win.c that I could pass on to any OpenSSL
> developer/contributor.
>
> Thanks,
> James
__
OpenSSL Project http://www.openssl.org
User Support Mailing Listopenssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-12 Thread James Baker
Punchline: The time taken by a call to Heap32Next on 64-bit Windows-7
SCALES (roughly linearly?) with the number of heap entries in the heap
list.  This seems to be a serious problem that would affect (at least)
most 32-bit-compiled OpenSSL users on 64-bit Win7.

I've cleared my accusation against the CryptoAPI functions - those are
working fine.  The time is taken up by Heap32Next, even though good ==
1 and stoptime is set.  The 1-second constraint on the number of
heaplists walked is ineffective because the time is all spent in the
inner loop, walking the first 80 heap entries in the first heaplist.

By the time I got up to 4 million (2-byte) heap objects in my test
harness, each Heap32Next call was taking multiple seconds.  It is not
the overall size of the heap that counts, but the number of heap
objects.  The performance of each Heap32Next (the 1st versus the 80th)
is roughly the same.  I do not know whether the problem is specific to
only 64-bit Win7 (due to WoW), or whether it applies to all Windows 7
versions.

What then is the fix?  Sure, this may be a Windows problem, but
letting RAND_poll take dozens to hundreds of seconds is obviously not
acceptable.  This problem is sort of related to previous "heap walking
is slooow" threads on this list dealing with lines ~500-515 in
rand_win.c, but we can no longer get 80 entries from the first list in
anything near 1 second.  What would the cryptographic effect (on the
entropy of the randomness pool) be from cutting the heap traversal
entirely (i.e. cutting 80 bytes of entropy) - is that
cryptographically acceptable?  Is there some alternate way of
traversing large heaps, or some alternate source of entropy we could
turn to?

I have a single cpp repro file with a slightly chopped-down RAND_poll
ripped out of rand_win.c that I could pass on to any OpenSSL
developer/contributor.

Thanks,
James

my debugging output:

stoptime: 851485984
Got heaplist_first.
heap1st 

tickcount: 851624250
Exiting RAND_poll

On Wed, Nov 11, 2009 at 4:50 PM, James Baker  wrote:
> It's not the CryptoAPI calls that are taking time - nearly all of the
> time is spent within Heap32Next.  Thus my hypothesis is that
> CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to
> be 0 and the heap traversal to be unbounded.
>
> I see the "entrycnt = 80" constraint on walking the length of each
> heaplist, but there is no bound on the outer while loop calling
> Heap32ListNext?  You say that "very first block of heap" is retrieved
> when good is 0 - is that because "GetTickCount() < stoptime" is
> supposed to be a short-circuit when stoptime == 0?  (It's not -
> perhaps I should examine next whether GetTickCount is malfunctioning,
> or returning a signed negative int for comparison)
>
> The problem does occur with full admin privileges.  I might speculate
> about the effect the WoW layer has on using the Heap32* functions, but
> my investigation so far is focused on why the traversal isn't bounded
> (i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for
> heap/process/thread/module) would be tolerable.
>
> I have not yet written a standalone C program that simulates the same
> CryptoAPI call sequence.  If no one on this list can say "Yes, the
> RAND_Poll CryptoAPI calls work on Windows-7", this will be my next
> step.
>
> Thanks,
> James
__
OpenSSL Project http://www.openssl.org
User Support Mailing Listopenssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Re: CryptoAPI calls failing in rand_win on Windows 7

2009-11-11 Thread James Baker
It's not the CryptoAPI calls that are taking time - nearly all of the
time is spent within Heap32Next.  Thus my hypothesis is that
CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to
be 0 and the heap traversal to be unbounded.

I see the "entrycnt = 80" constraint on walking the length of each
heaplist, but there is no bound on the outer while loop calling
Heap32ListNext?  You say that "very first block of heap" is retrieved
when good is 0 - is that because "GetTickCount() < stoptime" is
supposed to be a short-circuit when stoptime == 0?  (It's not -
perhaps I should examine next whether GetTickCount is malfunctioning,
or returning a signed negative int for comparison)

The problem does occur with full admin privileges.  I might speculate
about the effect the WoW layer has on using the Heap32* functions, but
my investigation so far is focused on why the traversal isn't bounded
(i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for
heap/process/thread/module) would be tolerable.

I have not yet written a standalone C program that simulates the same
CryptoAPI call sequence.  If no one on this list can say "Yes, the
RAND_Poll CryptoAPI calls work on Windows-7", this will be my next
step.

Thanks,
James

On Sun, Nov 8, 2009 at 6:36 AM, sandeep kiran p  wrote:
>>RAND_poll runs very quickly with a near-empty heap.
> Do you mean that the calls
> to Heap32First, Heap32Next, Heap32ListFirst, Heap32ListNext are failing? Can
> you check the return values from these calls? (using GetLastError?). In any
> case, the heap traversals are bounded by the 1 sec limit. Even if the
> variable "good" is 0, the very first block of heap allocated by the current
> process is retrieved. Can you exactly specify which CryptoAPI is taking so
> much time?
> -Sandeep
>
> On Fri, Nov 6, 2009 at 11:45 AM, James Baker  wrote:
>>
>> Background:  Testing a Ruby app on 64-bit Windows 7 Ultimate, I found
>> that OpenSSL::PKey::RSA.generate() was taking 98 seconds.  Jumping to
>> C, sampling showed that the great majority of this time was spent in
>> Heap32Next, which led me to the "heap list and heap walking" section
>> of RAND_poll in crypto/rand/rand_win.c
>>
>> The heap walking (and thread and module walking) are limited to 1s
>> unless the variable "good" is set, and advapi32.dll is loaded, which
>> means that "poll the CryptoAPI PRNG" using the conjunction of
>> CryptAcquireContextW and CryptGenRandom must be failing.
>>
>> The 98 seconds comes from walking the contents of the heap after
>> loading a Rails environment - RAND_poll runs very quickly with a
>> near-empty heap.  Are the crypo-API calls ever expected to fail under
>> any Windows platform, or is this the abnormality? I'm not aware of any
>> changes in Win7 that would break those calls (though I'm investigating
>> whether something permission/security-related is in play here), but
>> I'm not aware of much about Win7 in general.  I also don't see any
>> Win7-related changes in the OpenSSL changelog - has this platform been
>> validated already?
>>
>> Thanks,
>> James
>> __
>> OpenSSL Project                                 http://www.openssl.org
>> User Support Mailing List                    openssl-us...@openssl.org
>> Automated List Manager                           majord...@openssl.org
>
>
__
OpenSSL Project http://www.openssl.org
User Support Mailing Listopenssl-users@openssl.org
Automated List Manager   majord...@openssl.org