On Sun, 2 Jul 2000, Arne Ansper wrote:

> 
> 
> > Hmm, I was able to create 354406 unnamed mutexes, before
> > CreateMutex() failed with ERROR_NOT_ENOUGH_QUOTA. Tested under
> > NT4 SP6.
> 
> btw, why use mutexes at all? openssl uses only unnamed mutexes and always
> waits indefinitely long on mutex. so we could use critical sections
> instead. they are much faster and basically
> unlimited. InitializeCriticalSection etc.
> 

That's true, indeed. I realized this shortly after posting my
previous letter :)
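Arne's suggestion relies on two properties: every lock OpenSSL asks for is unnamed (so it never has to be shared across processes), and it is always waited on with no timeout. Under those assumptions, the lock table behind an OpenSSL-style locking callback (mode, type, file, line) needs nothing more than an array of process-local locks. The sketch below uses POSIX `pthread_mutex_t` as a portable stand-in for `CRITICAL_SECTION`; `DEMO_LOCK`, `DEMO_NLOCKS` and the `demo_*` functions are hypothetical names, not OpenSSL's own.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; OpenSSL defines its own flag and lock count. */
#define DEMO_LOCK   1   /* bit set in `mode` when the caller wants the lock */
#define DEMO_NLOCKS 8   /* number of lock slots in the table */

/* One process-local lock per slot (a CRITICAL_SECTION on Win32). */
static pthread_mutex_t lock_table[DEMO_NLOCKS];

static void demo_locks_init(void)
{
    for (int i = 0; i < DEMO_NLOCKS; ++i)
        pthread_mutex_init(&lock_table[i], NULL);
}

static void demo_locks_cleanup(void)
{
    for (int i = 0; i < DEMO_NLOCKS; ++i)
        pthread_mutex_destroy(&lock_table[i]);
}

/* Callback shaped like an OpenSSL locking callback: lock or unlock slot `type`. */
static void demo_locking_callback(int mode, int type, const char *file, int line)
{
    (void)file;
    (void)line;
    assert(type >= 0 && type < DEMO_NLOCKS);
    if (mode & DEMO_LOCK)
        pthread_mutex_lock(&lock_table[type]);   /* no timeout: waits indefinitely */
    else
        pthread_mutex_unlock(&lock_table[type]);
}

/* Shared state guarded by slot 3, used only to exercise the table. */
static long demo_counter;
static int demo_iterations;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iterations; ++i) {
        demo_locking_callback(DEMO_LOCK, 3, __FILE__, __LINE__);
        ++demo_counter;
        demo_locking_callback(0, 3, __FILE__, __LINE__);
    }
    return NULL;
}

/* Runs `nthreads` workers (at most 16) and returns the final counter value. */
static long demo_run(int nthreads, int iterations)
{
    pthread_t tid[16];

    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    demo_iterations = iterations;
    demo_locks_init();
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tid[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tid[i], NULL);
    demo_locks_cleanup();
    return demo_counter;
}
```

If the lock is correct, `demo_run(4, 100000)` returns exactly 400000; nothing in the callback needs a name or a timeout, which is what makes a critical section sufficient on Win32.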

To compare the relative performance of mutexes and critical sections,
I wrote a simple test program. The code is MSC-specific.

-------------------8<-----------------------------------------------
#include <windows.h>
#include <stdio.h>
#include <assert.h>

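/*
 * SetCriticalSectionSpinCount() uses the WINAPI (__stdcall) calling
 * convention and returns the previous spin count as a DWORD, so the
 * function pointer type has to match both.
 */
typedef DWORD (WINAPI *set_spin_count_fn)(LPCRITICAL_SECTION, DWORD);

set_spin_count_fn       set_spin_count = NULL;

/* Fallback used when kernel32.dll does not export the function. */
DWORD WINAPI dummy_set_spin_count(LPCRITICAL_SECTION s, DWORD c)
{
        (void)s;
        (void)c;
        return 0;
}

/*
 * CPUID (leaf 0) serializes the instruction stream so that RDTSC does
 * not execute before the preceding instructions have completed. RDTSC
 * leaves the 64-bit counter in EDX:EAX, which MSC takes as the __int64
 * return value (hence no return statement). CPUID also overwrites EBX,
 * so it is saved and restored explicitly.
 */
__int64
read_pentium_timestamp_counter(void)
{
        __asm   push ebx;
        __asm   xor eax, eax;
        __asm   cpuid;
        __asm   rdtsc;
        __asm   pop ebx;
}

int main(void)
{
        unsigned        i;
        HANDLE          h;
        HMODULE         kernel32;
        __int64         t0, t1, t;
        CRITICAL_SECTION        cs;

        h = CreateMutex(NULL, FALSE, NULL);
        assert(h);

        /*
         * Setting a spin count makes sense on SMP systems, especially
         * when critical sections are held for a very short time.
         * The spin count tells EnterCriticalSection() how many times it
         * should try to acquire the critical section in userland before
         * it decides that locking is going to take a long time anyway
         * and calls WaitForSingleObject() (which is expensive due to the
         * kernel transition and context switches) on the kernel object
         * associated with the critical section.
         * The spin count is ignored on single-CPU systems (according to
         * the MS docs).
         *
         * This function requires a recent Windows version, therefore
         * we look up its address dynamically.
         */
        kernel32 = GetModuleHandle(TEXT("kernel32.dll"));
        assert(kernel32);
        set_spin_count = (set_spin_count_fn)GetProcAddress(kernel32,
                        "SetCriticalSectionSpinCount");
        if (set_spin_count == NULL) {
                printf("%s not found in %s\n", "SetCriticalSectionSpinCount()",
                        "kernel32.dll");
                set_spin_count = dummy_set_spin_count;
        }
        InitializeCriticalSection(&cs);
        printf("Setting spin count to %u, previous value was %lu\n",
                8192, set_spin_count(&cs, 8192));
        printf("Size of CRITICAL_SECTION: %u\n\n",
                (unsigned)sizeof(CRITICAL_SECTION));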

        for (i = 0, t = 0; i < 1024; ++i) {
                t0 = read_pentium_timestamp_counter();
                t1 = read_pentium_timestamp_counter();
                t += t1 - t0;
        }
        t /= i;
        /*
         * The value of t may be imprecise because the first few
         * CPUID invocations in a tight loop take somewhat longer
         * than subsequent ones, by 5..6 ticks or so.
         */
        printf("%-40s%5I64u CPU ticks (total overhead)\n",
                "read_pentium_timestamp_counter()", t);

        t0 = read_pentium_timestamp_counter();
        EnterCriticalSection(&cs);
        t1 = read_pentium_timestamp_counter();
        printf("%-40s%5I64u CPU ticks\n", "EnterCriticalSection()",
                t1 - t0 - t);
        t0 = read_pentium_timestamp_counter();
        LeaveCriticalSection(&cs);
        t1 = read_pentium_timestamp_counter();
        printf("%-40s%5I64u CPU ticks\n", "LeaveCriticalSection()",
                t1 - t0 - t);

        t0 = read_pentium_timestamp_counter();
        switch (WaitForSingleObject(h, INFINITE)) {
        case WAIT_OBJECT_0:
                break;
        default:
                assert(0);
        }
        t1 = read_pentium_timestamp_counter();
        printf("%-40s%5I64u CPU ticks\n", "WaitForSingleObject()",
                t1 - t0 - t);
        t0 = read_pentium_timestamp_counter();
        ReleaseMutex(h);
        t1 = read_pentium_timestamp_counter();
        printf("%-40s%5I64u CPU ticks\n", "ReleaseMutex()", t1 - t0 - t);

        DeleteCriticalSection(&cs);
        CloseHandle(h);

        return 0;
}
---------------------------------------->8---------------------------------
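The spin-count comment in the program describes a spin-then-block strategy: retry the cheap user-mode acquisition a bounded number of times, then fall back to an expensive blocking wait. A minimal portable sketch of that idea, assuming POSIX threads: `pthread_mutex_trylock()` stands in for the user-mode attempt and `pthread_mutex_lock()` for the kernel wait. The `spin_lock_*` names are hypothetical, and this only models the behaviour; it is not how `EnterCriticalSection()` is actually implemented, and on many systems `pthread_mutex_lock()` already has its own user-space fast path.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical spin-then-block lock modelled on a critical section's spin count. */
typedef struct {
    pthread_mutex_t mutex;
    unsigned spin_count;   /* analogous to the value given to SetCriticalSectionSpinCount() */
} spin_lock_t;

static void spin_lock_init(spin_lock_t *l, unsigned spin_count)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    l->spin_count = spin_count;
}

/* Returns 1 if the lock was taken while spinning, 0 if it had to block. */
static int spin_lock_acquire(spin_lock_t *l)
{
    for (unsigned i = 0; i < l->spin_count; ++i)
        if (pthread_mutex_trylock(&l->mutex) == 0)
            return 1;                  /* fast path: no blocking wait */
    pthread_mutex_lock(&l->mutex);     /* slow path: block until the lock is free */
    return 0;
}

static void spin_lock_release(spin_lock_t *l)
{
    pthread_mutex_unlock(&l->mutex);
}

static void spin_lock_destroy(spin_lock_t *l)
{
    pthread_mutex_destroy(&l->mutex);
}
```

With a spin count of 8192 an uncontended acquisition succeeds on the first `trylock`; with a spin count of 0 every acquisition takes the blocking path, which mirrors how a zero spin count behaves.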

Sample output:

Setting spin count to 8192, previous value was 0
Size of CRITICAL_SECTION: 24

read_pentium_timestamp_counter()          132 CPU ticks (total overhead)
EnterCriticalSection()                    153 CPU ticks
LeaveCriticalSection()                     86 CPU ticks
WaitForSingleObject()                    8361 CPU ticks
ReleaseMutex()                           2321 CPU ticks
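A quick back-of-the-envelope comparison of the figures above: one EnterCriticalSection()/LeaveCriticalSection() pair costs 153 + 86 = 239 ticks, one WaitForSingleObject()/ReleaseMutex() pair costs 8361 + 2321 = 10682 ticks, so in this run the mutex round trip is roughly 44 times as expensive. The struct and function names below are only for illustration.

```c
#include <assert.h>

/* Ticks for one uncontended acquire/release pair of each primitive. */
struct lock_costs {
    long enter_cs, leave_cs;          /* EnterCriticalSection / LeaveCriticalSection */
    long wait_mutex, release_mutex;   /* WaitForSingleObject / ReleaseMutex */
};

/* Integer ratio of the mutex round-trip cost to the critical-section round-trip cost. */
static long mutex_to_cs_ratio(const struct lock_costs *c)
{
    long cs = c->enter_cs + c->leave_cs;           /* 153 + 86 = 239 for the run above */
    long mx = c->wait_mutex + c->release_mutex;    /* 8361 + 2321 = 10682 */
    assert(cs > 0);
    return mx / cs;                                /* 10682 / 239 = 44 (integer division) */
}

/* Figures from the sample run above (timer overhead already subtracted). */
static const struct lock_costs console_run = { 153, 86, 8361, 2321 };
```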


BTW, I just discovered that redirecting stdout to a file
improves performance significantly. Here's the output on the same
system with output redirected to a file:

Setting spin count to 8192, previous value was 0
Size of CRITICAL_SECTION: 24

read_pentium_timestamp_counter()          132 CPU ticks (total overhead)
EnterCriticalSection()                     75 CPU ticks
LeaveCriticalSection()                     10 CPU ticks
WaitForSingleObject()                    5862 CPU ticks
ReleaseMutex()                           1500 CPU ticks

I do not have a good explanation for this. Hm.
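This does not explain the difference, but each figure above comes from a single uncontended acquisition, so any one-off cost at that moment lands entirely in one number. A portable sketch, assuming POSIX `pthread_mutex_t` and `clock_gettime()` in place of the Win32 calls and RDTSC used in the test program (so absolute values are not comparable), warms up once and then times many lock/unlock pairs, spreading the clock's own overhead over the whole run instead of subtracting it per call. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9 + (double)(b->tv_nsec - a->tv_nsec);
}

/* Average cost, in nanoseconds, of one uncontended lock/unlock pair. */
static double avg_lock_unlock_ns(unsigned long iterations)
{
    pthread_mutex_t m;
    struct timespec t0, t1;
    unsigned long i;

    assert(iterations > 0);
    pthread_mutex_init(&m, NULL);

    /* Warm-up pair so the first, possibly cold, acquisition is not timed. */
    pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; ++i) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_mutex_destroy(&m);
    return elapsed_ns(&t0, &t1) / (double)iterations;
}
```

Running it a few times, with and without stdout redirected, shows whether the averaged cost moves the way the single-shot figures did.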

Have fun :)

-- 

vix


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
