Jason Evans wrote:

(LI Xin) wrote:
On 2006-08-15 02:38 +0300, Vladimir Kushnir wrote:
On -CURRENT amd64 (Athlon64 3000+, 512k L2 cache):

With jemalloc (without MY_MALLOC):
  ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
116.34 real       113.69 user         0.00 sys

With MY_MALLOC:
  ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
...
45.30 real        44.29 user         0.00 sys

Have you turned off the debugging options, i.e. ln -sf
'aj' /etc/malloc.conf?

If you want to do a fair comparison, you will also need to define NO_MALLOC_EXTRAS when compiling malloc.c, in order to turn off the copious assertions, not to mention the statistics gathering code.

Before you do that though, it would be useful to turn statistics reporting on (add MALLOC_OPTIONS=P to your environment when running the test program) and see what malloc says is going on.

[I am away from my -current system at the moment, so can't benchmark the program.] If I understand the code correctly (and assuming the command line parameters specified), it starts out by requesting 3517 2000-byte allocations from malloc, and never frees any of those allocations.
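The allocation pattern described above can be sketched roughly as follows. This is not code from fdtd: the helper name alloc_rows() is hypothetical, and the counts (3517 blocks of 2000 bytes) are simply the figures quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from fdtd: allocate `count` blocks of
 * `size` bytes each and keep every one of them alive. Returns the number
 * of blocks that were allocated successfully.
 */
static size_t
alloc_rows(void **rows, size_t count, size_t size)
{
        size_t i;

        assert(rows != NULL);
        for (i = 0; i < count; i++) {
                rows[i] = malloc(size);
                if (rows[i] == NULL)
                        break;          /* stop at the first failure */
        }
        return (i);
}
```

Calling alloc_rows(rows, 3517, 2000) once at startup and never freeing the result reproduces the load that the allocator sees in this scenario.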

You're right. My friend's program evaluates an electromagnetic field
using the Finite-Difference Time-Domain method.


Both phkmalloc and jemalloc will fit two allocations in each page of memory. phkmalloc will call sbrk() at least 1759 times. jemalloc will call sbrk() approximately 6 times. 2kB allocations are a worst case for some of jemalloc's internal bookkeeping, but this shouldn't be a serious problem. Fragmentation for these 2000-byte allocations should total approximately 6%.
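The figure of 1759 can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and no per-allocation header, neither of which is stated in this thread; pages_needed() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of pages needed to hold `nallocs` blocks of
 * `alloc_size` bytes when blocks never straddle a page boundary.
 */
static size_t
pages_needed(size_t nallocs, size_t alloc_size, size_t page_size)
{
        size_t per_page;

        assert(alloc_size > 0 && alloc_size <= page_size);
        per_page = page_size / alloc_size;      /* 4096 / 2000 == 2 */
        return ((nallocs + per_page - 1) / per_page);  /* round up */
}
```

With these assumptions, pages_needed(3517, 2000, 4096) evaluates to 1759, which matches the lower bound given above for an allocator that extends the heap one page at a time.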

The same bad case arises with mmap(2) if it is used to obtain a small
memory block each time. A hierarchical memory management mechanism is
required, just like those of GNU libc and your new code.
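As a rough illustration of what "hierarchical" means here, the sketch below obtains memory from the lower level in large chunks and carves small blocks out of each chunk. It is not how jemalloc or GNU libc is implemented. All names (struct arena, arena_alloc(), CHUNK_SIZE) and the 1 MB chunk size are assumptions, and malloc() stands in for the mmap(2) or sbrk() call a real allocator would make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t)1 << 20)    /* assumed 1 MB, illustrative only */

struct chunk {
        struct chunk *next;
        unsigned char *base;    /* start of the chunk's memory */
        size_t used;            /* bytes already handed out */
};

struct arena {
        struct chunk *head;     /* most recently obtained chunk */
        size_t nchunks;         /* number of lower-level requests made */
};

/* Hand out `size` bytes, requesting a new chunk only when needed. */
static void *
arena_alloc(struct arena *a, size_t size)
{
        struct chunk *c;
        void *p;

        assert(a != NULL);
        size = (size + 15) & ~(size_t)15;       /* keep 16-byte alignment */
        if (size == 0 || size > CHUNK_SIZE)
                return (NULL);
        c = a->head;
        if (c == NULL || CHUNK_SIZE - c->used < size) {
                if ((c = malloc(sizeof(*c))) == NULL)
                        return (NULL);
                /* Stand-in for mmap(2)/sbrk() in a real allocator. */
                if ((c->base = malloc(CHUNK_SIZE)) == NULL) {
                        free(c);
                        return (NULL);
                }
                c->used = 0;
                c->next = a->head;
                a->head = c;
                a->nchunks++;
        }
        p = c->base + c->used;
        c->used += size;
        return (p);
}

/* Release every chunk at once; individual blocks are never freed. */
static void
arena_destroy(struct arena *a)
{
        struct chunk *c, *next;

        for (c = a->head; c != NULL; c = next) {
                next = c->next;
                free(c->base);
                free(c);
        }
        a->head = NULL;
        a->nchunks = 0;
}
```

Under these assumptions, 3517 requests of 2000 bytes lead to only 7 lower-level requests instead of one per page, because each 1 MB chunk holds 524 such blocks.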

The essence of this problem is that the operating system's memory
management can greatly affect how efficiently the CPU hardware works.

Actually, it is not only my friend's program that can trigger the problem,
but also many applications that call strdup(3) frequently.

/usr/src/lib/libc/string/strdup.c:

char *
strdup(str)
        const char *str;
{
        size_t len;
        char *copy;

        len = strlen(str) + 1;
        if ((copy = malloc(len)) == NULL)
                return (NULL);
        memcpy(copy, str, len);
        return (copy);
}


malloc certainly incurs more overhead than a specialized sbrk()-based allocator, but I don't see any particular reason that jemalloc should perform suboptimally, as compared to any complete malloc implementation, for fdtd. If you find otherwise, please tell me so that I can look into the issue.

Thanks,
Jason
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


------------------------------------------------------------------------
                                               From Beijing, China
