Re: Reason for doing malloc / bzero over calloc (performance)?
On Thu, Jun 14, 2007 at 06:04:27PM -0700, Matthew Dillon wrote:
> From this point of view it is much, much better to bzero() memory that is already mapped than it is to map/unmap new memory.

For kernel land, you are right. For userland, there's one big downside to always running bzero/memset over newly allocated memory: it touches every page and thereby can add a lot of memory pressure if you don't have that much memory. This can be completely unnecessary at that point; at least one application of calloc with large parameter values is the creation of a hash table. Forcing it to consume memory up front is not such a good idea.

Joerg
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
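The hash table case mentioned above can be sketched roughly as follows. This is only an illustration, not code from the thread: `struct table`, `table_init`, `table_insert`, `table_contains`, and `table_free` are hypothetical names, and the multiplicative hash is an arbitrary choice. The point is that the table relies on calloc returning zeroed slots, so a large, mostly empty table only touches the pages that inserts and lookups actually land on, provided the allocator serves large requests from untouched zero-filled pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing table; a slot value of 0 means "empty". */
struct table {
	uint64_t *slots;
	size_t    nslots;	/* must be a power of two */
};

/* Pick the starting slot for a key (arbitrary multiplicative hash). */
static size_t
slot_for(const struct table *t, uint64_t key)
{
	return ((size_t)((key * 0x9E3779B97F4A7C15ULL) >> 32) &
	    (t->nslots - 1));
}

/* Allocate a zeroed table. Returns 0 on success, -1 on failure. */
int
table_init(struct table *t, size_t nslots)
{
	assert(nslots != 0 && (nslots & (nslots - 1)) == 0);
	/* No explicit bzero/memset: we depend on calloc's zeroing. */
	t->slots = calloc(nslots, sizeof(*t->slots));
	if (t->slots == NULL)
		return (-1);
	t->nslots = nslots;
	return (0);
}

/* Insert a non-zero key with linear probing. Returns -1 if the table is full. */
int
table_insert(struct table *t, uint64_t key)
{
	size_t i, pos;

	assert(key != 0);
	pos = slot_for(t, key);
	for (i = 0; i < t->nslots; i++) {
		if (t->slots[pos] == 0 || t->slots[pos] == key) {
			t->slots[pos] = key;
			return (0);
		}
		pos = (pos + 1) & (t->nslots - 1);
	}
	return (-1);
}

/* Return 1 if the non-zero key is present, 0 otherwise. */
int
table_contains(const struct table *t, uint64_t key)
{
	size_t i, pos;

	assert(key != 0);
	pos = slot_for(t, key);
	for (i = 0; i < t->nslots; i++) {
		if (t->slots[pos] == key)
			return (1);
		if (t->slots[pos] == 0)
			return (0);	/* empty slot reached: key is absent */
		pos = (pos + 1) & (t->nslots - 1);
	}
	return (0);
}

void
table_free(struct table *t)
{
	free(t->slots);
	t->slots = NULL;
	t->nslots = 0;
}
```

Calling `table_init(&t, 1u << 20)` requests 8 MB of slots; whether the untouched part of that stays out of physical memory depends on the allocator and kernel, as discussed elsewhere in this thread.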
Re: Reason for doing malloc / bzero over calloc (performance)?
[EMAIL PROTECTED] wrote:
> Hmmm... I wonder what the Mach kernel in OSX does to allocate memory then. I'll have to take a look at OpenDarwin's source sometime and see what it does.

Following the link chain from the benchmark link posted in this thread, I've learned that it's similar to -CURRENT: small allocations are carved from the local pool, big ones from prezeroed pages (from the kernel).
Re: Reason for doing malloc / bzero over calloc (performance)?
On Thu, 14 Jun 2007, Ivan Voras wrote:
> [EMAIL PROTECTED] wrote:
>> Hmmm... I wonder what the Mach kernel in OSX does to allocate memory then. I'll have to take a look at OpenDarwin's source sometime and see what it does.
> Following the link chain from the benchmark link posted in this thread I've come to the information that it's similar to -CURRENT: small allocations are carved from the local pool, big ones from prezeroed pages (from kernel).

Do you know if that's with malloc or calloc? What portion of the source demonstrates this?

-Garrett
Re: Reason for doing malloc / bzero over calloc (performance)?
[EMAIL PROTECTED] wrote:
> Do you know if that's with malloc or calloc? What portion of the source demonstrates this?

No source, but here's a quote from http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc:

  For large blocks (where large is surprisingly small, something like 14 KB) Mac OS X's default malloc() will always go to the kernel for memory by calling vm_allocate(). vm_allocate() always returns zero-filled pages; otherwise, you might get back a chunk of physical RAM or swap space that had been written to by some root process, and you'd get privileged data. So for large blocks, we'd expect calloc() and malloc() to perform identically. Mach will reserve some memory but not physically allocate it until you read or write it. The pages can also be marked zero filled without having to write zeros to RAM. But the first time you read from the page, it has to allocate and then zero-fill it.

Google led me to this: http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/MemoryAlloc.html

It's not conclusive:

  For allocations greater than a few virtual memory pages, malloc uses the vm_allocate routine to obtain a block of the requested size. The vm_allocate routine assigns an address range to the new block in the virtual memory space of the current process but does not allocate any physical memory. Instead, the malloc routine pages in the memory for the allocated block as it is used. The granularity of large memory blocks is 4096 bytes, the size of a virtual memory page. If you are allocating a large memory buffer, you should consider making it a multiple of this size. Note: Large memory allocations are guaranteed to be page-aligned.

but:

  The calloc routine reserves the required virtual address space for the memory but waits until the memory is actually used before initializing it. This approach alleviates the need to map the pages into memory right away. It also lets the system initialize pages as they’re used, as opposed to all at once.
Re: Reason for doing malloc / bzero over calloc (performance)?
I'm going to throw a wrench in the works, because it all gets turned around the moment you find yourself in an SMP environment where several threads are running on different cpus at the same time, using the same shared VM space.

The moment you have a situation like that where you are futzing with the page tables, i.e. using mmap() for demand-zero and munmap() to free, the operation becomes extremely expensive versus anything else, because any update to the page table (specifically any removal of page table entries from the page table) requires an SMP synchronization to occur between all the cpus actively sharing that VM space, and that's on top of the overhead of taking the page fault(s).

This is true of any memory mapping the kernel has to do in kernel virtual memory (must be synchronized with ALL cpus) and any mapping the kernel does on behalf of userland for user memory (must be synchronized with any cpus actively using that VM space, i.e. threaded user programs).

The synchronization is required to properly invalidate stale mappings on other cpus, and it must be done synchronously due to bugs in Intel/AMD hardware related to changing page table entries on one cpu when instructions are executing using that memory on another cpu. There is no way to avoid it without tripping up on the Intel/AMD hardware bugs.

From this point of view it is much, much better to bzero() memory that is already mapped than it is to map/unmap new memory.

I recently audited DragonFly and found an insane number of IPIs flying about due to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map kmem_alloc(). They all went away when I made the kernel malloc use the slab cache for allocations up to and including PAGE_SIZE*2 bytes.

Fun, eh?

-Matt
Matthew Dillon [EMAIL PROTECTED]
Re: Reason for doing malloc / bzero over calloc (performance)?
On Thu, 14 Jun 2007, Matthew Dillon wrote:
> I'm going to throw a wrench in the works, because it all gets turned around the moment you find yourself in an SMP environment where several threads are running on different cpus at the same time, using the same shared VM space.
>
> The moment you have a situation like that where you are futzing with the page tables, i.e. using mmap() for demand-zero and munmap() to free, the operation becomes extremely expensive versus anything else, because any update to the page table (specifically any removal of page table entries from the page table) requires an SMP synchronization to occur between all the cpus actively sharing that VM space, and that's on top of the overhead of taking the page fault(s).
>
> This is true of any memory mapping the kernel has to do in kernel virtual memory (must be synchronized with ALL cpus) and any mapping the kernel does on behalf of userland for user memory (must be synchronized with any cpus actively using that VM space, i.e. threaded user programs).
>
> The synchronization is required to properly invalidate stale mappings on other cpus, and it must be done synchronously due to bugs in Intel/AMD hardware related to changing page table entries on one cpu when instructions are executing using that memory on another cpu. There is no way to avoid it without tripping up on the Intel/AMD hardware bugs.
>
> From this point of view it is much, much better to bzero() memory that is already mapped than it is to map/unmap new memory.
>
> I recently audited DragonFly and found an insane number of IPIs flying about due to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map kmem_alloc(). They all went away when I made the kernel malloc use the slab cache for allocations up to and including PAGE_SIZE*2 bytes.
>
> Fun, eh?
>
> -Matt
> Matthew Dillon [EMAIL PROTECTED]

I have no intention of calling malloc/calloc, then free, and then repeating the same procedure. It's better just to reuse the memory already allocated wherever its size permits.

I wasn't thinking that closely though (ISA/hardware config versus OS implementation), but I had my suspicions since the AMD64 architecture is very different from the PowerPC architecture in terms of word size, synchronization schemes, instruction count, etc.

Interesting insight though. Thanks :).

-Garrett
Re: Reason for doing malloc / bzero over calloc (performance)?
Garrett Cooper wrote:
> Title says it all -- is there a particular reason why malloc/bzero should be used instead of calloc?
> -Garrett

As someone just brought to my attention, I should do some Googling. Initial results brought up this: http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc.

I would like to provide results for CURRENT, but I don't know offhand what C interface supports nanosecond or microsecond precision timing in FreeBSD (apart from just doing nanosleeps, which isn't such a great idea and would, I think, drift due to clock skew). The original author's solution is for Mac OS X only :(.

I think it's decided though -- calloc wins over malloc/bzero for now, so I'm going to change that malloc/bzero call to calloc.

-Garrett
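On the timing question above: one interface that offers nanosecond-resolution timestamps is POSIX clock_gettime() with CLOCK_MONOTONIC, which FreeBSD provides. The following is a minimal sketch of how the comparison might be timed, not the benchmark from the linked blog post. `elapsed_sec`, `time_calloc`, and `time_malloc_zero` are hypothetical helper names, and memset() stands in for bzero(). Note that an optimizing compiler may merge malloc followed by memset into a single calloc call, so the numbers are only meaningful when built without that optimization (for example at -O0).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds elapsed between two timespec samples. */
double
elapsed_sec(const struct timespec *start, const struct timespec *end)
{
	return ((double)(end->tv_sec - start->tv_sec) +
	    (double)(end->tv_nsec - start->tv_nsec) / 1e9);
}

/* Time `rounds` calloc/free cycles of `size` bytes; -1.0 on failure. */
double
time_calloc(size_t size, int rounds)
{
	struct timespec t0, t1;
	unsigned char *p;
	int i;

	assert(size > 0 && rounds > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return (-1.0);
	for (i = 0; i < rounds; i++) {
		p = calloc(1, size);
		if (p == NULL)
			return (-1.0);
		/* Volatile store so the allocation is not optimized away. */
		((volatile unsigned char *)p)[0] = 1;
		free(p);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return (-1.0);
	return (elapsed_sec(&t0, &t1));
}

/* Same loop, but with malloc followed by an explicit zero fill. */
double
time_malloc_zero(size_t size, int rounds)
{
	struct timespec t0, t1;
	unsigned char *p;
	int i;

	assert(size > 0 && rounds > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return (-1.0);
	for (i = 0; i < rounds; i++) {
		p = malloc(size);
		if (p == NULL)
			return (-1.0);
		memset(p, 0, size);
		((volatile unsigned char *)p)[0] = 1;
		free(p);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return (-1.0);
	return (elapsed_sec(&t0, &t1));
}
```

Comparing, say, `time_calloc(100 << 20, 10)` against `time_malloc_zero(100 << 20, 10)` on a given system would show whether its calloc avoids the explicit zero fill for large blocks; small sizes are worth measuring separately, since the thread suggests they take a different path.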
Re: Reason for doing malloc / bzero over calloc (performance)?
* Garrett Cooper [EMAIL PROTECTED] wrote:
> Garrett Cooper wrote:
>> Title says it all -- is there a particular reason why malloc/bzero should be used instead of calloc?
>> -Garrett
> As someone just brought to my attention, I should do some Googling. Initial results brought up this: http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc.

To be more precise, I took a look at the source code of calloc on my FreeBSD 6 box:

| void *
| calloc(num, size)
| 	size_t num;
| 	size_t size;
| {
| 	void *p;
|
| 	if (size != 0 && SIZE_T_MAX / size < num) {
| 		errno = ENOMEM;
| 		return (NULL);
| 	}
|
| 	size *= num;
| 	if ( (p = malloc(size)) )
| 		bzero(p, size);
| 	return(p);
| }

This means that the results on that website would be quite different from the ones that the FreeBSD 6 malloc/calloc should give. The website even shows a difference between calloc'ing 10 blocks of 10 MB and 1 block of 100 MB, which shouldn't make a difference here. calloc doesn't have any performance advantage here, because it just calls malloc/bzero.

When looking at FreeBSD -CURRENT's calloc (won't paste it; too long), it just does an arena_malloc/memset (which is malloc/bzero) for small allocations but a huge_malloc for big allocations (say, multiple pages big). The latter already returns pages that are zeroed by the kernel, so I suspect the calloc performance for big allocations on -CURRENT is a lot better than on FreeBSD 6. As with FreeBSD 6, it wouldn't matter if you calloc 10 pieces of 10 MB or one piece of 100 MB.

Yours,
--
Ed Schouten [EMAIL PROTECTED]
WWW: http://g-rave.nl/
Re: Reason for doing malloc / bzero over calloc (performance)?
On Wed, 13 Jun 2007, Ed Schouten wrote:
> * Garrett Cooper [EMAIL PROTECTED] wrote:
>> Garrett Cooper wrote:
>>> Title says it all -- is there a particular reason why malloc/bzero should be used instead of calloc?
>>> -Garrett
>> As someone just brought to my attention, I should do some Googling. Initial results brought up this: http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc.
>
> To be more precise, I took a look at the source code of calloc on my FreeBSD 6 box:
>
> | void *
> | calloc(num, size)
> | 	size_t num;
> | 	size_t size;
> | {
> | 	void *p;
> |
> | 	if (size != 0 && SIZE_T_MAX / size < num) {
> | 		errno = ENOMEM;
> | 		return (NULL);
> | 	}
> |
> | 	size *= num;
> | 	if ( (p = malloc(size)) )
> | 		bzero(p, size);
> | 	return(p);
> | }
>
> This means that the results on that website would be quite different from the ones that the FreeBSD 6 malloc/calloc should give. The website even shows a difference between calloc'ing 10 blocks of 10 MB and 1 block of 100 MB, which shouldn't make a difference here. calloc doesn't have any performance advantage here, because it just calls malloc/bzero.
>
> When looking at FreeBSD -CURRENT's calloc (won't paste it; too long), it just does an arena_malloc/memset (which is malloc/bzero) for small allocations but a huge_malloc for big allocations (say, multiple pages big). The latter already returns pages that are zeroed by the kernel, so I suspect the calloc performance for big allocations on -CURRENT is a lot better than on FreeBSD 6. As with FreeBSD 6, it wouldn't matter if you calloc 10 pieces of 10 MB or one piece of 100 MB.
>
> Yours,
> --
> Ed Schouten [EMAIL PROTECTED]
> WWW: http://g-rave.nl/

Hmmm... I wonder what the Mach kernel in OSX does to allocate memory then. I'll have to take a look at OpenDarwin's source sometime and see what it does.

-Garrett