Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-13 Thread James Baker
 Ger Hobbelt g...@hobbelt.com wrote:
 Odd question maybe, but does the API call slow down too when traversing
 other heaps (which carry fewer items)?

Yes.  This surprised me, but Heap32Next takes the same amount of time
to execute when traversing the 2nd heaplist (which has 15 items) as it
does the 1st heaplist (which has a million items).

 Are those time-per-API-call numbers averaged or does /each/ Heap32Next
 call take this long?!

Each and every call takes the same long amount of time.  To me, this
indicates that the time spent is not actually spent *finding* the next
heap entry (as if we were traversing a linked list to get to our
destination), but in allocating (to the nearest 2^N) space for and/or
recording info about every heap entry in every heap list.

 an adjustment to keep the rand collecting scan within reasonable
 bounds is quite feasible (no hard upper limit, though, because, ah,
 'granularity' there is the time one (slowest) API call takes, no
 matter how the solution is coded).

It would definitely be easy to constrain the number of heap entries
checked even further, based on time spent in the inner loop, but
doesn't that run into this:

 Oh yeah, to answer one Q in first post: it's not a very smart idea to
 strip out entropy collecting code sections ...

If we limited the inner loop to 1 second as we do the outer loop, we'd
effectively be cutting out (in this case) 79 of the usual 80 bytes of
entropy which, as you say, makes one trepidatious.

RAND_poll appears to gather randomly varying amounts of entropy,
basically what it can grab in a few seconds.  Is there a minimum
effective amount of entropy that is known? The ideal thing is to add
another source of entropy to compensate, but that's not something
that's within my capabilities or time limits right now.
__
OpenSSL Project http://www.openssl.org
User Support Mailing List    openssl-users@openssl.org
Automated List Manager   majord...@openssl.org


Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-13 Thread Ger Hobbelt
On Fri, Nov 13, 2009 at 6:34 PM, James Baker j...@j-baker.org wrote:
[...]
 Each and every call takes the same long amount of time.  To me, this
 indicates that the time spent is not actually spent *finding* the next
 heap entry (as if we were traversing a linked list to get to our
 destination), but in allocating (to the nearest 2^N) space for and/or
 recording info about every heap entry in every heap list.

[censored] eh... Gosh!
The only 'close to reasonable' cause for this I can come up with is
someone leaving some 'scan all to verify everything on each
invocation' diagnostic code in there. Maybe it's doing a full snapshot every
time. Whatever.

So, yes, cutting short the heaplist scan based on timing results
obtained over one or a few API calls is the way to go IMO; yes, when
your machine had been heaplist-treated like that, it effectively cuts
out 99.9% of the (semi-)entropy that might have been waiting for us in
the heaplist, so that's not very nice. I know the whole entropy
gathering business there comes with some handwaving anyway as nothing
is 'hard' - after all, all the 'entropy' gathered in the scan can be
theorized to be deterministic in some /very/ complex way (if one
assumes no external influences like humans and such touch the machine
up to then, knowing all the context and innards, etc., etc.), but it's
all about making it bloody hard for attackers to make it close to
deterministic within the confines of a reasonable amount of effort
spent. So every source of possible (semi)entropy counts.
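A minimal sketch of cutting the heap walk short based on the timing of the first call, assuming (as James observed) that every Heap32Next call costs roughly the same. entries_for_budget() is a hypothetical helper, not OpenSSL code; in rand_win.c the first call would be timed with GetTickCount() and the result used in place of the fixed entrycnt = 80.

```c
#include <stdint.h>

/* Hypothetical helper, not part of OpenSSL: given how long the first
 * heap-walk call took (e.g. measured with GetTickCount()), estimate how
 * many entries fit into budget_ms, assuming every call costs about the
 * same. The result is clamped to [1, cap]; cap must be >= 1 and would
 * be 80 to match the existing entrycnt limit. */
int entries_for_budget(uint32_t first_call_ms, uint32_t budget_ms, int cap)
{
    uint32_t n;

    if (first_call_ms == 0)      /* below timer resolution: treat as cheap */
        return cap;
    n = budget_ms / first_call_ms;
    if (n < 1)
        n = 1;                   /* keep the entry we already fetched */
    if (n > (uint32_t)cap)
        n = (uint32_t)cap;
    return (int)n;
}
```

If the first calls turn out to be cheaper than later ones, this extrapolation overestimates how many entries fit, so it is best paired with a per-call deadline check rather than used on its own.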

Hence there are never going to be hard numbers on the 'minimum effective
amount' of gathering effort or any other detail of the scanning/gathering
process -- unless someone includes provable random hardware sources
with standard motherboards some day; maybe an on-board QRBG121, say.
;-)

So my remark about please not stripping out gathering code sections
should be read in that light: when a code section /can/ cause trouble
for someone, it should not be discarded for /everybody/. That's the
best we can do for now. Yep, it doesn't make the handwaving any less,
but at least the impact is minimized to fringe cases (if I may call
this huge-number heaplist thing a 'fringe case' ?).

There /is/ another possible source of entropy available nowadays, at
least for a lot of folks: with the existence of on-board analog audio
and ubiquitous DirectSound support (which goes for Windows 7 as well)
it would be a nice thing to take a few seconds of direct line and mic
channel 'sound' and turn that into a few hashes to feed to the
randomness collector. Input and mic lines can be 'noisy' due to noise
in the open analog circuits and A/D when the volumes are not moved
down to 0%: one can see this happen when recording mic line sound
without a mic attached in sound editors like Adobe Audition: the VU
bars show the very tiny line noise in the sampled signal and when you
amplify such a sample (say, 'normalize' track), the actual analog
hardware noise is clearly visible. It's not perfect white noise, but
at least it's grey or pink and it's got entropy, yes sir.
The mic line is favourable in this regard as its signal-to-noise ratio
is lower than that of 'line' almost everywhere, mostly due to higher
amplification ratios in the analog section there, to ensure the tiny
microphone signal makes it through the A/D quantization with the least
amount of quality loss. Wherever a sound engineer would curse the
noise (analog noise and A/D quantization noise), we want to be in on the show.
Of course, there are quibbles, like the gatherer then temporarily
'occupying' the DirectSound I/O, which other apps may not like all that
much, so it's not entirely straightforward in that regard, but
it'd be a nice addition to the gatherer.
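A minimal sketch of only the post-capture step, assuming 16-bit PCM samples have already been recorded from the mic/line input (the DirectSound capture itself is not shown). pack_sample_lsbs() is a hypothetical helper that keeps the noisiest bit of each sample. The packed bytes are raw analog noise, not uniform randomness, so they would still need to be hashed, or handed to RAND_add() with a conservative entropy estimate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the least significant bit of each 16-bit PCM
 * sample into out[], 8 samples per byte, most significant bit first.
 * Leftover samples (fewer than 8) are ignored. Returns bytes written. */
size_t pack_sample_lsbs(const int16_t *samples, size_t nsamples,
                        uint8_t *out, size_t outlen)
{
    size_t nbytes = nsamples / 8;
    size_t i, j;

    if (nbytes > outlen)
        nbytes = outlen;
    for (i = 0; i < nbytes; i++) {
        uint8_t b = 0;
        for (j = 0; j < 8; j++)
            b = (uint8_t)((b << 1) | ((uint16_t)samples[i * 8 + j] & 1u));
        out[i] = b;
    }
    return nbytes;
}
```

Only the low bits are kept because the audible signal, if any, lives in the high bits, while the amplified circuit and quantization noise shows up at the bottom.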

Hm, when there's some time available, I should have a look at that.
Unless someone likes to beat me to it, that is. ;-))
 (Note to self: see how we can grab DirectSound channels with 'share
device with other applications' enabled, like some of the sound apps
can do out there. Second note: traverse DirectSound devices and grab
from each for extra noise yumminess.)

(Note: the same trick applies to sampled analog video, but there's
fewer folks with analog video-in, and analog video is becoming a bit
of a rare species itself with video broadcasts becoming digital
end-to-end nowadays.   sigh ;-) )


 If we limited the inner loop to 1 second as we do the outer loop, we'd
 effectively be cutting out (in this case) 79 of the usual 80 bytes of
 entropy which, as you say, makes one trepidatious.

 RAND_poll appears to gather randomly varying amounts of entropy,
 basically what it can grab in a few seconds.  Is there a minimum
 effective amount of entropy that is known? The ideal thing is to add
 another source of entropy to compensate, but that's not something
 that's within my capabilities or time limits right now.

For both parts, see above. (What's bad for one should not disappear
for everyone + analog audio noise)


-- 
Met vriendelijke groeten / Best regards,

Re: CryptoAPI calls failing in rand_win on Windows 7

2009-11-12 Thread William A. Rowe Jr.
James Baker wrote:
 
 The problem does occur with full admin privileges.

To be 100% clear, this is full admin with no UAC?  UAC will drop privilege
of an app seemingly running as 'administrator'.


Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-12 Thread James Baker
Punchline: The time taken by a call to Heap32Next on 64-bit Windows-7
SCALES (roughly linearly?) with the number of heap entries in the heap
list.  This seems to be a serious problem that would affect (at least)
most 32-bit-compiled OpenSSL users on 64-bit Win7.

I've cleared my accusation against the CryptoAPI functions - those are
working fine.  The time is taken up by Heap32Next, even though good ==
1 and stoptime is set.  The 1-second constraint on the number of
heaplists walked is ineffective because the time is all spent in the
inner loop, walking the first 80 heap entries in the first heaplist.
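To see why the 1-second check doesn't help, here is a simulation of the loop shape described above: the deadline is only consulted between heap lists, while the inner walk is capped at 80 entries. It assumes a fixed cost per call and uses a fake clock (sim_now_ms, sim_heap_call() and sim_original_walk() are hypothetical); no Toolhelp32 calls are made.

```c
#include <stdint.h>

/* Hypothetical simulation: every heap-walk call advances a fake
 * millisecond clock by a fixed cost. No Toolhelp32 calls are made. */
static uint32_t sim_now_ms;
static uint32_t sim_cost_ms;

static int sim_heap_call(void)     /* stands in for Heap32First/Heap32Next */
{
    sim_now_ms += sim_cost_ms;
    return 1;                      /* pretend another entry is always found */
}

/* Loop shape as discussed in the thread: the deadline is only checked
 * between heap lists; each inner walk is capped at 80 entries.
 * Returns the simulated elapsed time in ms. */
uint32_t sim_original_walk(uint32_t cost_ms, uint32_t maxdelay_ms,
                           int heaplists)
{
    uint32_t stoptime;
    int lists_done = 0;

    sim_now_ms = 0;
    sim_cost_ms = cost_ms;
    stoptime = sim_now_ms + maxdelay_ms;

    do {
        if (sim_heap_call()) {                        /* heap_first */
            int entrycnt = 80;
            do {
                /* RAND_add(...) on the current entry would go here */
            } while (sim_heap_call() && --entrycnt > 0);  /* heap_next */
        }
        lists_done++;
    } while (lists_done < heaplists && sim_now_ms < stoptime);

    return sim_now_ms;
}
```

With a simulated 1000 ms per call, sim_original_walk(1000, 1000, 10) returns 81000: the first heap list alone costs 81 calls (one "first" plus 80 "next") before the deadline is ever looked at. With cheap calls, the same deadline does stop the walk after a handful of lists.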

By the time I got up to 4 million (2-byte) heap objects in my test
harness, each Heap32Next call was taking multiple seconds.  It is not
the overall size of the heap that counts, but the number of heap
objects.  The performance of each Heap32Next (the 1st versus the 80th)
is roughly the same.  I do not know whether the problem is specific to
only 64-bit Win7 (due to WoW), or whether it applies to all Windows 7
versions.

What then is the fix?  Sure, this may be a Windows problem, but
letting RAND_poll take dozens to hundreds of seconds is obviously not
acceptable.  This problem is sort of related to previous "heap walking
is slooow" threads on this list dealing with lines ~500-515 in
rand_win.c, but we can no longer get 80 entries from the first list in
anything near 1 second.  What would the cryptographic effect (on the
entropy of the randomness pool) be from cutting the heap traversal
entirely (i.e. cutting 80 bytes of entropy) - is that
cryptographically acceptable?  Is there some alternate way of
traversing large heaps, or some alternate source of entropy we could
turn to?

I have a single cpp repro file with a slightly chopped-down RAND_poll
ripped out of rand_win.c that I could pass on to any OpenSSL
developer/contributor.

Thanks,
James

my debugging output:

stoptime: 851485984
Got heaplist_first.
heap1st 

tickcount: 851624250
Exiting RAND_poll
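Reading the two logged values: assuming tickcount was printed just before RAND_poll returned, the walk finished 851624250 - 851485984 = 138266 ms (about 138 s) after stoptime. The hypothetical helper below does that subtraction on 32-bit tick counts; unsigned arithmetic keeps it correct even across GetTickCount()'s 49.7-day wraparound.

```c
#include <stdint.h>

/* Hypothetical helper: milliseconds from `earlier` to `later` for 32-bit
 * tick counts such as the DWORD returned by GetTickCount(). Unsigned
 * subtraction wraps modulo 2^32, so the result stays correct across the
 * wraparound as long as the true gap is below 2^32 ms. */
uint32_t tick_elapsed(uint32_t later, uint32_t earlier)
{
    return (uint32_t)(later - earlier);
}
```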



Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-12 Thread James Baker
I've confirmed my linear performance conjecture w/r/t heap objects.
Click here to see pretty pictures graphing my results:

http://thenewjamesbaker.blogspot.com/2009/11/performance-of-heap32next-on-64-bit.html



Re: Heap32Next performance awful on 64-bit Win7 (Was: CryptoAPI calls failing in rand_win on Windows 7)

2009-11-12 Thread Ger Hobbelt
Odd question maybe, but does the API call slow down too when traversing
other heaps (which carry fewer items)? I assume not, but I'm asking since
you tested this and I don't see that aspect in your blog.
(Pondering what can be done here: when the answer to the previous question
is 'no', it means the only way out is to 'measure' each Heap32First/Next
call to see if it is a 'slow' one, plus of course watch the total time
spent in the outer loop.)
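A sketch of the "measure each call" idea under stated assumptions: the per-call durations are passed in as an array (a hypothetical stand-in for timing each Heap32First/Heap32Next with GetTickCount()), and the walk_with_per_call_check() name and thresholds are illustrative only.

```c
#include <stdint.h>

/* Hypothetical sketch, not rand_win.c code: cost_ms[i] is the measured
 * duration of the i-th heap-walk call (Heap32First, then Heap32Next...).
 * The entry a call returned is kept (its time is already spent), but the
 * walk stops as soon as one call exceeded slow_ms or the running total
 * reached budget_ms. Returns the number of entries kept. */
int walk_with_per_call_check(const uint32_t *cost_ms, int ncalls,
                             uint32_t slow_ms, uint32_t budget_ms)
{
    uint32_t total = 0;
    int kept = 0;
    int i;

    for (i = 0; i < ncalls; i++) {
        total += cost_ms[i];
        kept++;
        if (cost_ms[i] > slow_ms || total >= budget_ms)
            break;
    }
    return kept;
}
```

Because it looks at every call, this copes with both cases raised here: a uniformly slow walk stops after the first call, and a walk that slows down gradually stops at the first call over the threshold.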

There's no way to get the total number of heap blocks up front, so
we're stuck with 'seeing what happens while we traverse' one way or
another, so checking after only a few API calls whether it registers
on the clock()/ticks radar or not might work out... which leads to the
second question regarding your values:

Are those time-per-API-call numbers averaged or does /each/ Heap32Next
call take this long?! (I assume here the first ones are faster and the
time spent increases gradually while the list is traversed, but again,
that's only an assumption; I have no observation data yet to confirm or refute it.)

If the initial calls are faster, then the solution is still kind of
the same, but needs a little further thought; a hacky 'check first N
for time spent' won't work.

just thinking out loud here


slap self! / Aw, heck, this is doing things the wrong way around anyway:
whether those two assumptions are correct or not, the scanner code
shouldn't depend on them anyhow and should be able to cope with either case;
an adjustment to keep the rand collecting scan within reasonable
bounds is quite feasible (no hard upper limit, though, because, ah, the
'granularity' there is the time one (slowest) API call takes, no
matter how the solution is coded).
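A minimal simulation of bounding the whole scan, assuming a fixed cost per call and a fake clock (budgeted_walk(), slow_call() and now_ms are hypothetical). It keeps the two-level walk with the inner cap of 80 discussed in this thread, but checks the deadline before every inner call as well as between heap lists, so the overrun is at most one call's cost, which is the granularity limit just mentioned.

```c
#include <stdint.h>

/* Hypothetical simulation: a millisecond clock that every heap-walk call
 * advances by cost_ms. No real Toolhelp32 calls are made. */
static uint32_t now_ms;

static int slow_call(uint32_t cost_ms)
{
    now_ms += cost_ms;
    return 1;                    /* pretend another entry was found */
}

/* Two-level walk (inner cap of 80), with the deadline also checked before
 * every inner call. Returns entries collected; *elapsed_ms receives the
 * simulated time spent. */
int budgeted_walk(uint32_t cost_ms, uint32_t maxdelay_ms, int heaplists,
                  uint32_t *elapsed_ms)
{
    uint32_t start;
    int entries = 0, lists_done = 0;

    now_ms = 0;
    start = now_ms;
    do {
        if (slow_call(cost_ms)) {                      /* heap_first */
            int entrycnt = 80;
            do {
                entries++;                             /* RAND_add(...) */
            } while (now_ms - start < maxdelay_ms      /* deadline first */
                     && slow_call(cost_ms)             /* heap_next */
                     && --entrycnt > 0);
        }
        lists_done++;
    } while (lists_done < heaplists && now_ms - start < maxdelay_ms);

    *elapsed_ms = now_ms - start;
    return entries;
}
```

With 300 ms per call and a 1000 ms budget, the walk collects 4 entries and stops at 1200 ms: the last call started before the deadline, and no call is started after it.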





On Fri, Nov 13, 2009 at 2:38 AM, James Baker j...@j-baker.org wrote:
 I've confirmed my linear performance conjecture w/r/t heap objects.
 Click here to see pretty pictures graphing my results:


Oh yeah, to answer one Q in first post: it's not a very smart idea to
strip out entropy collecting code sections; it's the slow way to
arrive at an undesirably predictable random generator as you take away
a chance to introduce some entropy, one scanner part at a time.
It's perfectly okay to /add/ other sources, such as noise input from
audio sources, etc. (A/D converter and analog h/w noise), but taking
out should be done with trepidation. There are enough horror stories about
the ones that have gone down that road before, so, unless there's no other
way, no need to add to that collection. ;-)




-- 
Met vriendelijke groeten / Best regards,

Ger Hobbelt



Re: CryptoAPI calls failing in rand_win on Windows 7

2009-11-11 Thread James Baker
It's not the CryptoAPI calls that are taking time - nearly all of the
time is spent within Heap32Next.  Thus my hypothesis is that
CryptAcquireContextW or CryptGenRandom is failing, causing 'good' to
be 0 and the heap traversal to be unbounded.

I see the entrycnt = 80 constraint on walking the length of each
heaplist, but there is no bound on the outer while loop calling
Heap32ListNext?  You say that very first block of heap is retrieved
when good is 0 - is that because GetTickCount() < stoptime is
supposed to be a short-circuit when stoptime == 0?  (It's not -
perhaps I should examine next whether GetTickCount is malfunctioning,
or returning a signed negative int for comparison)
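On the signed-comparison worry: GetTickCount() returns an unsigned DWORD, so no negative value is involved, but a plain GetTickCount() < stoptime does misbehave near the 49.7-day wraparound. A sketch of a wraparound-safe comparison (tick_before() is a hypothetical helper):

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if tick count a is strictly before b,
 * treating 32-bit tick counts (like GetTickCount()'s DWORD) as wrapping
 * modulo 2^32. Valid while the two values are less than 2^31 ms
 * (about 24.8 days) apart. */
int tick_before(uint32_t a, uint32_t b)
{
    uint32_t d = b - a;          /* forward distance from a to b, mod 2^32 */
    return d != 0 && d < 0x80000000u;
}
```

A loop condition would then read tick_before(GetTickCount(), stoptime). Note that code relying on stoptime == 0 as a "stop immediately" sentinel would need a separate flag, because tick_before(now, 0) is only guaranteed false for now < 2^31.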

The problem does occur with full admin privileges.  I might speculate
about the effect the WoW layer has on using the Heap32* functions, but
my investigation so far is focused on why the traversal isn't bounded
(i.e. the relationship between the CryptoAPI calls and good), as 4 seconds (1 each for
heap/process/thread/module) would be tolerable.

I have not yet written a standalone C program that simulates the same
CryptoAPI call sequence.  If no one on this list can say "Yes, the
RAND_poll CryptoAPI calls work on Windows 7", this will be my next
step.

Thanks,
James



Re: CryptoAPI calls failing in rand_win on Windows 7

2009-11-08 Thread sandeep kiran p
RAND_poll runs very quickly with a near-empty heap.

Do you mean that the calls
to Heap32First, Heap32Next, Heap32ListFirst and Heap32ListNext are failing? Can
you check the return values from these calls (using GetLastError)? In any
case, the heap traversals are bounded by the 1 sec limit. Even if the
variable good is 0, the very first block of heap allocated by the current
process is retrieved. Can you specify exactly which CryptoAPI call is taking so
much time?

-Sandeep

On Fri, Nov 6, 2009 at 11:45 AM, James Baker j...@j-baker.org wrote:

 Background:  Testing a Ruby app on 64-bit Windows 7 Ultimate, I found
 that OpenSSL::PKey::RSA.generate() was taking 98 seconds.  Jumping to
 C, sampling showed that the great majority of this time was spent in
 Heap32Next, which led me to the heap list and heap walking section
 of RAND_poll in crypto/rand/rand_win.c

 The heap walking (and thread and module walking) is limited to 1s
 only if the variable good is set and advapi32.dll is loaded, which
 means that polling the CryptoAPI PRNG using the conjunction of
 CryptAcquireContextW and CryptGenRandom must be failing.

 The 98 seconds comes from walking the contents of the heap after
 loading a Rails environment - RAND_poll runs very quickly with a
 near-empty heap.  Are the CryptoAPI calls ever expected to fail under
 any Windows platform, or is this the abnormality? I'm not aware of any
 changes in Win7 that would break those calls (though I'm investigating
 whether something permission/security-related is in play here), but
 I'm not aware of much about Win7 in general.  I also don't see any
 Win7-related changes in the OpenSSL changelog - has this platform been
 validated already?

 Thanks,
 James