About a month ago I was looking into the performance of some smart cards, and decided to use the debug_module.c. Unfortunately it was not displaying the percentages correctly, and I assumed it was because there was some wrap around problem with the PR_Interval time. I tweaked with that calculation and got correct percentages, did my testing, and then forgot about my changes. I also added an environment variable to allow output to a file instead of stdout, since some apps that I was trying to debug close stdout and I couldn't get any output.

Later I noticed that I had these changes in my tree, and decided since the percentages were now correct, that I should check in the results. Nelson noticed that my changes to handle the 'wrap around' problem, in fact induced errors (an off by one error if the code thought that it 'wrapped'). The checked in code ended up fixing the percentage problem because it corrected the code to do what the comments said, which was to calculate the time in ms. In fact this rounded out time from routines that were fairly short, but called a lot.

As a result I revamped these changes. I also went through and collected the counts in an array so that we can manage the output much more cleanly. With the manageable output function, the code now can calculate how best to output it's results. Data is maintained in 'ticks' (whatever that happens to be natively for a given machine). Total time in a function and Total time in the module is now outputted as a scaled value, trying to keep at least 2 significant figures. Average time is always outputted in microseconds (the smallest unit NSPR allows us to convert 'ticks' to). Percent is calculated in ticks, and displays correctly now (it turns out the problem was simply a units mismatch -- we were displaying time in ticks/ total time in msec *100. Now we display time in ticks/ total time in ticks *100).

Anyway the output is sorted by FunctionTable order, which ironically is pretty meaningful because it collects the Init, Update, and Final functions together. Oh, and the function doesn't output anything for functions that were not called.

here's two sample outputs from strclnt:

SSL2:
Function                     # Calls         Time         Avg.     % Time

C_Initialize                       1         56ms   55970.00us      1.13%
C_Finalize                         1        370us     370.00us      0.01%
C_GetInfo                          1         10us      10.00us      0.00%
C_GetSlotList                      2         10us       5.00us      0.00%
C_GetSlotInfo                      2         10us       5.00us      0.00%
C_GetTokenInfo                     2         20us      10.00us      0.00%
C_GetMechanismList                 4         30us       7.50us      0.00%
C_OpenSession                   4024         75ms      18.53us      1.50%
C_CloseSession                  4022        888ms     220.82us     17.88%
C_CloseAllSessions                 2        220us     110.00us      0.00%
C_GetSessionInfo                   1         10us      10.00us      0.00%
C_CreateObject                  2002       1473ms     735.74us     29.65%
C_DestroyObject                 2002         70ms      35.02us      1.41%
C_GetAttributeValue               13       1020us      78.46us      0.02%
C_FindObjectsInit                 12       1440us     120.00us      0.03%
C_FindObjects                     12        100us       8.33us      0.00%
C_FindObjectsFinal                12         50us       4.17us      0.00%
C_EncryptInit                   1001        424ms     423.36us      8.53%
C_Encrypt                          1       2370us    2370.00us      0.05%
C_EncryptUpdate                 4000         97ms      24.26us      1.95%
C_EncryptFinal                  1000         15ms      14.51us      0.29%
C_DecryptInit                   1000         62ms      62.43us      1.26%
C_DecryptUpdate                 3000         90ms      29.93us      1.81%
C_DecryptFinal                  1000         18ms      17.82us      0.36%
C_DigestInit                    9004        344ms      38.23us      6.93%
C_DigestUpdate                 27004        215ms       7.97us      4.33%
C_DigestFinal                  12004       1092ms      90.97us     21.98%
C_VerifyRecoverInit                1         20us      20.00us      0.00%
C_VerifyRecover                    1       1970us    1970.00us      0.04%
C_GenerateRandom                1002         42ms      42.03us      0.85%

                   Totals      72133       4969ms



Maximum number of concurrent open sessions: 42


And strsclnt:

Function                     # Calls         Time         Avg.     % Time

C_Initialize                       1         59ms   58830.00us      0.29%
C_Finalize                         1        370us     370.00us      0.00%
C_GetInfo                          1          0 z       0.00us      0.00%
C_GetSlotList                      2          0 z       0.00us      0.00%
C_GetSlotInfo                      2         10us       5.00us      0.00%
C_GetTokenInfo                     2         20us      10.00us      0.00%
C_GetMechanismList                 4         30us       7.50us      0.00%
C_GetMechanismInfo              1000       5380us       5.38us      0.03%
C_OpenSession                   2048         40ms      19.73us      0.20%
C_CloseSession                  2045         39ms      19.08us      0.19%
C_CloseAllSessions                 2        430us     215.00us      0.00%
C_GetSessionInfo                   1         10us      10.00us      0.00%
C_GetOperationState             4000        368ms      92.08us      1.83%
C_SetOperationState             4000        111ms      27.75us      0.55%
C_CreateObject                     2        510us     255.00us      0.00%
C_DestroyObject                 5003        339ms      67.80us      1.68%
C_GetAttributeValue               15       1330us      88.67us      0.01%
C_FindObjectsInit                 15       1780us     118.67us      0.01%
C_FindObjects                     15        110us       7.33us      0.00%
C_FindObjectsFinal                15         90us       6.00us      0.00%
C_EncryptInit                   1000        217ms     217.14us      1.08%
C_EncryptUpdate                 2000         33ms      16.58us      0.16%
C_EncryptFinal                  1000         16ms      16.23us      0.08%
C_DecryptInit                   1000        401ms     401.48us      1.99%
C_DecryptUpdate                 3000        391ms     130.33us      1.94%
C_DecryptFinal                  1000         14ms      14.30us      0.07%
C_DigestInit                    2004         39ms      19.23us      0.19%
C_DigestUpdate                 34002       1446ms      42.54us      7.18%
C_DigestFinal                   8004        371ms      46.30us      1.84%
C_SignInit                      7000       2515ms     359.27us     12.49%
C_SignUpdate                   14000        444ms      31.73us      2.21%
C_SignFinal                    14000       3866ms     276.11us     19.20%
C_VerifyRecoverInit                1         20us      20.00us      0.00%
C_VerifyRecover                    1       2390us    2390.00us      0.01%
C_GenerateKey                      2        500us     250.00us      0.00%
C_WrapKey                          2       2330us    1165.00us      0.01%
C_UnWrapKey                      999        854ms     854.97us      4.24%
C_DeriveKey                     1001       8510ms    8501.96us     42.26%
C_GenerateRandom                1000         46ms      46.26us      0.23%

                   Totals     109190         20 s


Maximum number of concurrent open sessions: 57

The think I noticed is the large time spent in C_DeriveKey, particularly when SSL2 has no call to C_DeriveKey. You may also notice that SSL2 spends a lot more time in C_CreateObject than SSL3 does. This is because the objects used by SSL3 are created in C_DeriveKey, which is creating 4 keys as a result of the call. The same underlying function (pk11_handleObject) is used to do the actual create in both of these cases, so if the majority of the C_CreateObject time is in pk11_handleObject(), then almost 3 seconds of the 8.5 seconds spent doing derive key is spent creating the object. It seems that maybe pk11_handleObject() might be a place for me to look to see if we can make some improvements on.

bob

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to