Later I noticed that I had these changes in my tree, and decided since the percentages were now correct, that I should check in the results. Nelson noticed that my changes to handle the 'wrap around' problem, in fact induced errors (an off by one error if the code thought that it 'wrapped'). The checked in code ended up fixing the percentage problem because it corrected the code to do what the comments said, which was to calculate the time in ms. In fact this rounded out time from routines that were fairly short, but called a lot.
As a result I revamped these changes. I also went through and collected the counts in an array so that we can manage the output much more cleanly. With the manageable output function, the code now can calculate how best to output it's results. Data is maintained in 'ticks' (whatever that happens to be natively for a given machine). Total time in a function and Total time in the module is now outputted as a scaled value, trying to keep at least 2 significant figures. Average time is always outputted in microseconds (the smallest unit NSPR allows us to convert 'ticks' to). Percent is calculated in ticks, and displays correctly now (it turns out the problem was simply a units mismatch -- we were displaying time in ticks/ total time in msec *100. Now we display time in ticks/ total time in ticks *100).
Anyway the output is sorted by FunctionTable order, which ironically is pretty meaningful because it collects the Init, Update, and Final functions together. Oh, and the function doesn't output anything for functions that were not called.
here's two sample outputs from strclnt:
SSL2:
Function # Calls Time Avg. % Time
C_Initialize 1 56ms 55970.00us 1.13%
C_Finalize 1 370us 370.00us 0.01%
C_GetInfo 1 10us 10.00us 0.00%
C_GetSlotList 2 10us 5.00us 0.00%
C_GetSlotInfo 2 10us 5.00us 0.00%
C_GetTokenInfo 2 20us 10.00us 0.00%
C_GetMechanismList 4 30us 7.50us 0.00%
C_OpenSession 4024 75ms 18.53us 1.50%
C_CloseSession 4022 888ms 220.82us 17.88%
C_CloseAllSessions 2 220us 110.00us 0.00%
C_GetSessionInfo 1 10us 10.00us 0.00%
C_CreateObject 2002 1473ms 735.74us 29.65%
C_DestroyObject 2002 70ms 35.02us 1.41%
C_GetAttributeValue 13 1020us 78.46us 0.02%
C_FindObjectsInit 12 1440us 120.00us 0.03%
C_FindObjects 12 100us 8.33us 0.00%
C_FindObjectsFinal 12 50us 4.17us 0.00%
C_EncryptInit 1001 424ms 423.36us 8.53%
C_Encrypt 1 2370us 2370.00us 0.05%
C_EncryptUpdate 4000 97ms 24.26us 1.95%
C_EncryptFinal 1000 15ms 14.51us 0.29%
C_DecryptInit 1000 62ms 62.43us 1.26%
C_DecryptUpdate 3000 90ms 29.93us 1.81%
C_DecryptFinal 1000 18ms 17.82us 0.36%
C_DigestInit 9004 344ms 38.23us 6.93%
C_DigestUpdate 27004 215ms 7.97us 4.33%
C_DigestFinal 12004 1092ms 90.97us 21.98%
C_VerifyRecoverInit 1 20us 20.00us 0.00%
C_VerifyRecover 1 1970us 1970.00us 0.04%
C_GenerateRandom 1002 42ms 42.03us 0.85%
Totals 72133 4969ms
Maximum number of concurrent open sessions: 42
And strsclnt:
Function # Calls Time Avg. % Time
C_Initialize 1 59ms 58830.00us 0.29%
C_Finalize 1 370us 370.00us 0.00%
C_GetInfo 1 0 z 0.00us 0.00%
C_GetSlotList 2 0 z 0.00us 0.00%
C_GetSlotInfo 2 10us 5.00us 0.00%
C_GetTokenInfo 2 20us 10.00us 0.00%
C_GetMechanismList 4 30us 7.50us 0.00%
C_GetMechanismInfo 1000 5380us 5.38us 0.03%
C_OpenSession 2048 40ms 19.73us 0.20%
C_CloseSession 2045 39ms 19.08us 0.19%
C_CloseAllSessions 2 430us 215.00us 0.00%
C_GetSessionInfo 1 10us 10.00us 0.00%
C_GetOperationState 4000 368ms 92.08us 1.83%
C_SetOperationState 4000 111ms 27.75us 0.55%
C_CreateObject 2 510us 255.00us 0.00%
C_DestroyObject 5003 339ms 67.80us 1.68%
C_GetAttributeValue 15 1330us 88.67us 0.01%
C_FindObjectsInit 15 1780us 118.67us 0.01%
C_FindObjects 15 110us 7.33us 0.00%
C_FindObjectsFinal 15 90us 6.00us 0.00%
C_EncryptInit 1000 217ms 217.14us 1.08%
C_EncryptUpdate 2000 33ms 16.58us 0.16%
C_EncryptFinal 1000 16ms 16.23us 0.08%
C_DecryptInit 1000 401ms 401.48us 1.99%
C_DecryptUpdate 3000 391ms 130.33us 1.94%
C_DecryptFinal 1000 14ms 14.30us 0.07%
C_DigestInit 2004 39ms 19.23us 0.19%
C_DigestUpdate 34002 1446ms 42.54us 7.18%
C_DigestFinal 8004 371ms 46.30us 1.84%
C_SignInit 7000 2515ms 359.27us 12.49%
C_SignUpdate 14000 444ms 31.73us 2.21%
C_SignFinal 14000 3866ms 276.11us 19.20%
C_VerifyRecoverInit 1 20us 20.00us 0.00%
C_VerifyRecover 1 2390us 2390.00us 0.01%
C_GenerateKey 2 500us 250.00us 0.00%
C_WrapKey 2 2330us 1165.00us 0.01%
C_UnWrapKey 999 854ms 854.97us 4.24%
C_DeriveKey 1001 8510ms 8501.96us 42.26%
C_GenerateRandom 1000 46ms 46.26us 0.23%
Totals 109190 20 s
Maximum number of concurrent open sessions: 57
The think I noticed is the large time spent in C_DeriveKey, particularly when SSL2 has no call to C_DeriveKey. You may also notice that SSL2 spends a lot more time in C_CreateObject than SSL3 does. This is because the objects used by SSL3 are created in C_DeriveKey, which is creating 4 keys as a result of the call. The same underlying function (pk11_handleObject) is used to do the actual create in both of these cases, so if the majority of the C_CreateObject time is in pk11_handleObject(), then almost 3 seconds of the 8.5 seconds spent doing derive key is spent creating the object. It seems that maybe pk11_handleObject() might be a place for me to look to see if we can make some improvements on.
bob
smime.p7s
Description: S/MIME Cryptographic Signature
