Punchline: The time taken by a call to Heap32Next on 64-bit Windows 7
scales (roughly linearly?) with the number of heap entries in the heap
list.  This looks like a serious problem that would affect (at least)
most users of 32-bit builds of OpenSSL on 64-bit Win7.

I withdraw my accusation against the CryptoAPI functions - those are
working fine.  The time is taken up by Heap32Next, even though good ==
1 and stoptime is set.  The 1-second limit only bounds how many
heaplists are walked, so it is ineffective here: all of the time is
spent in the inner loop, walking the first 80 heap entries of the
first heaplist.
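
To make the loop shape concrete, here is a minimal simulation.  It is
not the rand_win.c code: fake_now_ms stands in for GetTickCount(),
fake_heap_next for Heap32Next, and walk_first_heaplist and check_inner
are names I made up for illustration.  It only shows why a deadline
that is consulted outside the 80-entry loop cannot bound the time when
each call is slow, and what an inner check would change.

```c
#include <assert.h>

/* Simulated millisecond clock; a stand-in for GetTickCount(). */
static unsigned long fake_now_ms = 0;

/* Hypothetical stand-in for Heap32Next: every call costs cost_ms. */
static int fake_heap_next(unsigned long cost_ms)
{
    fake_now_ms += cost_ms;
    return 1;                       /* pretend there is always another entry */
}

/*
 * Walk up to 80 entries of one heap list and return the simulated
 * elapsed time in ms.
 *   check_inner == 0: the deadline is never consulted inside the loop
 *                     (the situation described above).
 *   check_inner != 0: the deadline is consulted before each next call.
 */
unsigned long walk_first_heaplist(unsigned long cost_ms,
                                  unsigned long max_delay_ms,
                                  int check_inner)
{
    unsigned long stoptime;
    int entrycnt = 80;

    assert(cost_ms > 0);
    fake_now_ms = 0;
    stoptime = fake_now_ms + max_delay_ms;

    do {
        /* the real code would feed the current entry to RAND_add here */
        if (check_inner && fake_now_ms >= stoptime)
            break;
    } while (fake_heap_next(cost_ms) && --entrycnt > 0);

    return fake_now_ms;
}
```

With a simulated cost of 2000 ms per call and a 1000 ms limit,
walk_first_heaplist(2000, 1000, 0) runs all 80 calls, while
walk_first_heaplist(2000, 1000, 1) stops after the first one.  Even
with the inner check, the walk can overshoot by one full call.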

By the time I got up to 4 million (2-byte) heap objects in my test
harness, each Heap32Next call was taking multiple seconds.  It is not
the overall size of the heap that counts, but the number of heap
objects.  Each Heap32Next call takes roughly the same time, whether it
is the 1st or the 80th.  I do not know whether the problem is specific
to 64-bit Win7 (because of the WoW64 layer), or whether it applies to
all Windows 7 versions.

What then is the fix?  Sure, this may be a Windows problem, but
letting RAND_poll take dozens to hundreds of seconds is obviously not
acceptable.  This problem is related to previous "heap walking is
slooow" threads on this list dealing with lines ~500-515 in
rand_win.c, but we can no longer get 80 entries from the first list in
anything near 1 second.  What would the effect on the entropy of the
randomness pool be if the heap traversal were cut entirely (i.e.
cutting 80 bytes of entropy) - is that cryptographically acceptable?
Is there some alternate way of traversing large heaps, or some
alternate source of entropy we could turn to?

I have a single cpp repro file with a slightly chopped-down RAND_poll
ripped out of rand_win.c that I could pass on to any OpenSSL
developer/contributor.

Thanks,
James

my debugging output:

stoptime: 851485984
Got heaplist_first.
heap1st 
................................................................................
tickcount: 851624250
Exiting RAND_poll
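
For reference, the overshoot in that output can be worked out with
unsigned tick arithmetic.  This is only a sketch: ticks_elapsed is a
helper name I made up, and it assumes both numbers are 32-bit
GetTickCount() readings in milliseconds.

```c
#include <stdint.h>

/*
 * Milliseconds from `earlier` to `later` on a 32-bit tick counter.
 * Unsigned subtraction is modulo 2^32, so the result stays correct
 * across a single counter wraparound.
 */
uint32_t ticks_elapsed(uint32_t earlier, uint32_t later)
{
    return (uint32_t)(later - earlier);
}
```

ticks_elapsed(851485984u, 851624250u) gives 138266, i.e. RAND_poll
returned about 138 seconds after stoptime.  Comparing elapsed ticks
against a limit this way also avoids the wraparound and signedness
pitfalls of a direct "GetTickCount() < stoptime" test.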

On Wed, Nov 11, 2009 at 4:50 PM, James Baker <j...@j-baker.org> wrote:
> It's not the CryptoAPI calls that are taking time - nearly all of the
> time is spent within Heap32Next.  Thus my hypothesis is that
> CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to
> be 0 and the heap traversal to be unbounded.
>
> I see the "entrycnt = 80" constraint on walking the length of each
> heaplist, but there is no bound on the outer while loop calling
> Heap32ListNext?  You say that "very first block of heap" is retrieved
> when good is 0 - is that because "GetTickCount() < stoptime" is
> supposed to be a short-circuit when stoptime == 0?  (It's not -
> perhaps I should examine next whether GetTickCount is malfunctioning,
> or returning a signed negative int for comparison)
>
> The problem does occur with full admin privileges.  I might speculate
> about the effect the WoW layer has on using the Heap32* functions, but
> my investigation so far is focused on why the traversal isn't bounded
> (i.e. the CryptoAPI --> good relationship), as 4 seconds (1 each for
> heap/process/thread/module) would be tolerable.
>
> I have not yet written a standalone C program that simulates the same
> CryptoAPI call sequence.  If no one on this list can say "Yes, the
> RAND_poll CryptoAPI calls work on Windows-7", this will be my next
> step.
>
> Thanks,
> James
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
