Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-15 Thread Joerg Sonnenberger
On Thu, Jun 14, 2007 at 06:04:27PM -0700, Matthew Dillon wrote:
 From this point of view it is much, much better to bzero() memory that
 is already mapped than it is to map/unmap new memory.

For kernel land, you are right. For userland, there's one big downside
to always bzero/memset-ing newly allocated memory: it touches the pages and
can thereby add a lot of back pressure if you don't have that much
memory. That can be completely unnecessary at that point, and at least one
use of calloc with large arguments is the creation of a hash table.
Forcing it to consume memory is not such a good idea.
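A minimal sketch of the hash-table case, assuming a simple chained table; `struct table`, `table_create()` and `table_destroy()` are hypothetical names. The bucket array comes from calloc() rather than malloc() plus memset(), so an allocator that hands back pre-zeroed pages for large requests does not have to touch every page up front. It relies on an all-bits-zero pointer being a null pointer, which holds on FreeBSD and other mainstream platforms but is not guaranteed by ISO C.

```c
#include <stdlib.h>

/* Hypothetical chained hash table; only creation/teardown is shown. */
struct entry {
    struct entry *next;
    const char *key;
    int value;
};

struct table {
    size_t nbuckets;
    struct entry **buckets;
};

/*
 * Allocate the bucket array with calloc(). For a large nbuckets, an
 * allocator that gets zero-filled pages from the kernel can skip the
 * explicit zeroing, so untouched buckets need not use physical memory
 * yet. (Assumes all-bits-zero is a null pointer.)
 */
struct table *table_create(size_t nbuckets)
{
    struct table *t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->nbuckets = nbuckets;
    t->buckets = calloc(nbuckets, sizeof(*t->buckets));
    if (t->buckets == NULL) {
        free(t);
        return NULL;
    }
    return t;
}

void table_destroy(struct table *t)
{
    /* A full implementation would free the chained entries here. */
    free(t->buckets);
    free(t);
}
```

With, say, `table_create((size_t)1 << 20)`, the difference only shows up when the allocator serves the bucket array from fresh kernel pages; a malloc()+memset() version would write every one of those pages immediately.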

Joerg
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-14 Thread Ivan Voras
[EMAIL PROTECTED] wrote:

 Hmmm... I wonder what the Mach kernel in OSX does to allocate memory
 then. I'll have to take a look at OpenDarwin's source sometime and see
 what it does.

Following the link chain from the benchmark link posted in this thread,
I've found that it's similar to -CURRENT: small allocations are carved
from the local pool, big ones from prezeroed pages (from the kernel).





Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-14 Thread youshi10

On Thu, 14 Jun 2007, Ivan Voras wrote:


Do you know if that's with malloc or calloc? What portion of the source 
demonstrates this?

-Garrett



Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-14 Thread Ivan Voras
[EMAIL PROTECTED] wrote:

 Do you know if that's with malloc or calloc? What portion of the source
 demonstrates this?

No source, but here's a quote from
http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc:

For large blocks (where large is surprisingly small, something like 14
KB) Mac OS X's default malloc() will always go to the kernel for memory
by calling vm_allocate(). vm_allocate() always returns zero-filled
pages; otherwise, you might get back a chunk of physical RAM or swap
space that had been written to by some root process, and you'd get
privileged data. So for large blocks, we'd expect calloc() and malloc()
to perform identically.

Mach will reserve some memory but not physically allocate it until you
read or write it. The pages can also be marked zero filled without
having to write zeros to RAM. But the first time you read from the page,
it has to allocate and then zero-fill it.
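The demand-zero behaviour described in the quote can be observed on FreeBSD (and Linux) with an anonymous mmap(2), which likewise returns zero-filled pages that only get physical memory on first touch. This is a sketch with hypothetical helper names (`map_zeroed()`, `touch_pages()`, `minor_faults()`); the fault count from getrusage(2) is only indicative and varies by system, for example when superpages or transparent huge pages are in use.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no I/O) page faults taken so far by this process, or -1. */
long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map len bytes of anonymous memory. The kernel only reserves address
 * space here; physical pages are allocated and zero-filled on first
 * touch. Returns NULL on failure.
 */
unsigned char *map_zeroed(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte into each of the first npages pages of buf. */
void touch_pages(unsigned char *buf, size_t npages)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < npages; i++)
        buf[i * pagesz] = 1;
}
```

Sampling `minor_faults()` before and after `touch_pages()` gives a rough view of the "pages in the memory as it is used" behaviour the quotes describe.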


Google led me to this:
http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/MemoryAlloc.html

It's not conclusive:

For allocations greater than a few virtual memory pages, malloc uses the
vm_allocate routine to obtain a block of the requested size. The
vm_allocate routine assigns an address range to the new block in the
virtual memory space of the current process but does not allocate any
physical memory. Instead, the malloc routine pages in the memory for the
allocated block as it is used.
The granularity of large memory blocks is 4096 bytes, the size of a
virtual memory page. If you are allocating a large memory buffer, you
should consider making it a multiple of this size.
Note: Large memory allocations are guaranteed to be page-aligned.

but:

The calloc routine reserves the required virtual address space for the
memory but waits until the memory is actually used before initializing
it. This approach alleviates the need to map the pages into memory right
away. It also lets the system initialize pages as they’re used, as
opposed to all at once.
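The Apple note quoted above suggests sizing large buffers as a multiple of the page size. A minimal sketch of that rounding, querying the page size at run time with sysconf(_SC_PAGESIZE) instead of hard-coding 4096 (some platforms use larger pages); `round_to_page()` is a hypothetical helper name, and the mask trick assumes a power-of-two page size, as on common systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Round len up to the next multiple of the system page size.
 * Returns 0 for len == 0 or if the rounded value would overflow size_t.
 * Assumes the page size is a power of two.
 */
size_t round_to_page(size_t len)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    if (len > SIZE_MAX - (pagesz - 1))
        return 0;
    return (len + pagesz - 1) & ~(pagesz - 1);
}
```

For example, with 4096-byte pages a 10000-byte request rounds up to 12288 bytes (three pages).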





Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-14 Thread Matthew Dillon
I'm going to throw a wrench in the works, because it all gets turned
around the moment you find yourself in an SMP environment where several
threads are running on different cpus at the same time, using the
same shared VM space.

The moment you have a situation like that where you are futzing with
the page tables, i.e. using mmap() for demand-zero and munmap() to
free, the operation becomes extremely expensive versus anything
else because any update to the page table (specifically any removal
of page table entries from the page table) requires an SMP synchronization
to occur between all the cpus actively sharing that VM space, and
that's on top of the overhead of taking the page fault(s).

This is true of any memory mapping the kernel has to do in kernel
virtual memory (must be synchronized with ALL cpus) and any mapping
the kernel does on behalf of userland for user memory (must be
synchronized with any cpus actively using that VM space, i.e. threaded
user programs).  The synchronization is required to properly invalidate
stale mappings on other cpus and it must be done synchronously due
to bugs in Intel/AMD cpus related to changing page table entries on one
cpu when instructions are executing using that memory on another cpu.
There is no way to avoid it without tripping up on the Intel/AMD hardware
bugs.
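To make the contrast concrete, here is a sketch of the two patterns for a worker that needs a fresh zeroed scratch area per job. `scratch_per_job_mmap()` maps and unmaps on every job, so in a threaded process each munmap() can force the cross-cpu invalidation described above; `scratch_reuse()` maps once and clears the already-mapped buffer with memset(). The function names and the `job_fn` callback type are hypothetical.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef void (*job_fn)(unsigned char *buf, size_t len);

/*
 * Pattern 1: map fresh demand-zero memory for every job and unmap it
 * afterwards. Each munmap() removes page table entries, which in a
 * threaded process requires invalidating TLBs on the other cpus.
 */
int scratch_per_job_mmap(job_fn job, size_t len, int njobs)
{
    for (int i = 0; i < njobs; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        job(p, len);
        if (munmap(p, len) != 0)
            return -1;
    }
    return 0;
}

/*
 * Pattern 2: map once, then clear the already-mapped buffer before
 * each job. No page table entries are removed inside the loop.
 */
int scratch_reuse(job_fn job, size_t len, int njobs)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    for (int i = 0; i < njobs; i++) {
        memset(p, 0, len);
        job(p, len);
    }
    return munmap(p, len);
}
```

Both hand each job a zeroed buffer; the difference is where the cost lands: page faults plus shootdowns in the first, plain memory writes in the second.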

From this point of view it is much, much better to bzero() memory that
is already mapped than it is to map/unmap new memory.  I recently
audited DragonFly and found an insane number of IPIs flying about due
to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map
kmem_alloc().  They all went away when I made the kernel malloc use
the slab cache for allocations up to and including PAGE_SIZE*2 bytes.
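The idea behind serving allocations up to PAGE_SIZE*2 from a slab cache is to recycle chunks that are already mapped instead of handing them back to the VM system. The following is a heavily simplified, single-threaded userland sketch of a fixed-size free-list cache, not DragonFly's slab allocator; `struct chunk_cache`, `cache_get()`, `cache_put()` and `cache_drain()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Freed chunks are linked through their first bytes. */
struct free_chunk {
    struct free_chunk *next;
};

struct chunk_cache {
    size_t chunk_size;              /* fixed size of every chunk */
    struct free_chunk *free_list;   /* chunks kept for reuse */
};

void cache_init(struct chunk_cache *c, size_t chunk_size)
{
    /* A chunk must be large enough to hold the free-list link. */
    if (chunk_size < sizeof(struct free_chunk))
        chunk_size = sizeof(struct free_chunk);
    c->chunk_size = chunk_size;
    c->free_list = NULL;
}

/*
 * Return a zeroed chunk. A recycled chunk is already mapped, so it is
 * cleared with memset() instead of coming from fresh pages.
 */
void *cache_get(struct chunk_cache *c)
{
    struct free_chunk *f = c->free_list;
    if (f != NULL) {
        c->free_list = f->next;
        memset(f, 0, c->chunk_size);
        return f;
    }
    return calloc(1, c->chunk_size);
}

/* Keep the chunk for reuse instead of releasing it. */
void cache_put(struct chunk_cache *c, void *p)
{
    struct free_chunk *f = p;
    f->next = c->free_list;
    c->free_list = f;
}

/* Release every cached chunk back to the underlying allocator. */
void cache_drain(struct chunk_cache *c)
{
    while (c->free_list != NULL) {
        struct free_chunk *f = c->free_list;
        c->free_list = f->next;
        free(f);
    }
}
```

A kernel slab allocator additionally needs per-cpu or locked lists and carves many objects out of each page; the sketch only shows the "reuse mapped memory, zero it yourself" part.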

Fun, eh?

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-14 Thread youshi10

On Thu, 14 Jun 2007, Matthew Dillon wrote:


I have no intention of allocating with malloc/calloc, freeing, and then
repeating the same procedure. It's better just to reuse the memory already
allocated, if the size permits.

I wasn't thinking that closely though (ISA/hardware config versus OS
implementation), but I had my suspicions since the AMD64 architecture is very
different from the PowerPC architecture in terms of word size, synchronization
schemes, instruction count, etc.

Interesting insight though. Thanks :).

-Garrett



Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-13 Thread Garrett Cooper

Garrett Cooper wrote:
   Title says it all -- is there a particular reason why malloc/bzero 
should be used instead of calloc?

-Garrett

As someone just brought to my attention, I should do some Googling.

Initial results brought up this:
http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc. I would
like to provide results for CURRENT, but I don't know offhand which C
interface supports nanosecond or microsecond precision timing in
FreeBSD (apart from just doing nanosleeps, which isn't such a great idea
and, I would think, can drift due to clock skew). The original author's
solution is for Mac OS X only :(.
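On the timing question: POSIX clock_gettime(2) with CLOCK_MONOTONIC gives nanosecond-resolution timestamps on FreeBSD (gettimeofday(2) gives microseconds), and it measures elapsed time directly rather than relying on sleeps. A rough sketch of a malloc/bzero-versus-calloc comparison; `elapsed_ns()` and `time_alloc()` are hypothetical helpers, and the sizes and iteration counts you pass are arbitrary. Results depend heavily on the allocator and on whether the memory is touched afterwards.

```c
#define _DEFAULT_SOURCE          /* bzero()/clock_gettime() on glibc; harmless on FreeBSD */
#include <stdlib.h>
#include <strings.h>
#include <time.h>

/* Nanoseconds between two CLOCK_MONOTONIC timestamps. */
long long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

/*
 * Time iters rounds of allocating and freeing size zeroed bytes
 * (size must be > 0), using calloc() if use_calloc is nonzero,
 * else malloc() + bzero(). Returns total ns, or -1 on failure.
 */
long long time_alloc(size_t size, int iters, int use_calloc)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++) {
        char *p = use_calloc ? calloc(1, size) : malloc(size);
        if (p == NULL)
            return -1;
        if (!use_calloc)
            bzero(p, size);
        /* Touch one byte so the allocation is not optimized away. */
        ((volatile char *)p)[size - 1] = 1;
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(&start, &end);
}
```

Calling `time_alloc(100 << 20, 10, 0)` and `time_alloc(100 << 20, 10, 1)` several times each and comparing the totals would give a first approximation of the blog's benchmark on FreeBSD.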


I think it's decided though -- calloc wins over malloc / bzero for now,
so I'm going to change that malloc/bzero to calloc.


-Garrett


Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-13 Thread Ed Schouten
* Garrett Cooper [EMAIL PROTECTED] wrote:
  Garrett Cooper wrote:
 Title says it all -- is there a particular reason why malloc/bzero 
  should be used instead of calloc?
  -Garrett
  As someone just brought to my attention, I should do some Googling.
 
  Initial results brought up this: 
  http://boredzo.org/blog/archives/2006-11-26/calloc-vs-malloc.

To be more precise, I took a look at the source code of calloc on my
FreeBSD 6 box:

| void *
| calloc(num, size)
|     size_t num;
|     size_t size;
| {
|     void *p;
|
|     if (size != 0 && SIZE_T_MAX / size < num) {
|         errno = ENOMEM;
|         return (NULL);
|     }
|
|     size *= num;
|     if ( (p = malloc(size)) )
|         bzero(p, size);
|     return(p);
| }

This means that the results on that website would be quite different
from the ones that FreeBSD 6's malloc/calloc should give. The site even
shows a difference between calloc'ing 10 blocks of 10 MB and 1 block of
100 MB, which shouldn't make a difference here. calloc doesn't have any
performance advantage here, because it just calls malloc/bzero.
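For reference, the FreeBSD 6 routine above amounts to the following in standard C. This is a sketch, not the libc source: it uses SIZE_MAX from <stdint.h> because SIZE_T_MAX is a BSD-specific macro, memset() in place of bzero(), and the hypothetical name `simple_calloc()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Equivalent of the FreeBSD 6 calloc(): overflow check, malloc, zero. */
void *simple_calloc(size_t num, size_t size)
{
    void *p;

    /* For size != 0, num * size overflows exactly when num > SIZE_MAX / size. */
    if (size != 0 && SIZE_MAX / size < num) {
        errno = ENOMEM;
        return NULL;
    }

    size *= num;
    /* Writes every byte, so every page of the block gets touched. */
    if ((p = malloc(size)) != NULL)
        memset(p, 0, size);
    return p;
}
```

Because the zeroing is unconditional, this version costs the same as malloc() followed by bzero() and touches every page, which is the behaviour Joerg objects to for large hash tables.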

When looking at FreeBSD -CURRENT's calloc (won't paste it; too long), it
just does an arena_malloc/memset (which is malloc/bzero) for small
allocations but a huge_malloc for big allocations (say, multiple pages
big). The latter already returns pages that are zeroed by the
kernel, so I suspect calloc performance for big allocations on
-CURRENT is a lot better than on FreeBSD 6. As with FreeBSD 6, it
wouldn't matter whether you calloc 10 pieces of 10 MB or one piece of 100 MB.
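A toy sketch of the small/large split described above: small requests are zeroed explicitly, while large ones come straight from anonymous mmap(2), whose pages the kernel already zero-fills, so no memset() is needed. This is not the -CURRENT allocator's code; the 64 KiB threshold, the header layout, and the names `toy_calloc()`/`toy_free()` are assumptions for illustration.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define LARGE_THRESHOLD (64 * 1024)   /* hypothetical small/large cutoff */

/* Header in front of each block recording how to release it. */
typedef union {
    struct {
        size_t total;   /* bytes including this header */
        int mapped;     /* 1 if obtained with mmap() */
    } h;
    max_align_t align;  /* keep the returned pointer suitably aligned */
} hdr_t;

void *toy_calloc(size_t num, size_t size)
{
    if ((size != 0 && SIZE_MAX / size < num) ||
        num * size > SIZE_MAX - sizeof(hdr_t)) {
        errno = ENOMEM;
        return NULL;
    }
    size_t len = num * size;
    size_t total = len + sizeof(hdr_t);
    hdr_t *hdr;

    if (len >= LARGE_THRESHOLD) {
        /* Large: fresh anonymous pages are already zero-filled. */
        void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            errno = ENOMEM;
            return NULL;
        }
        hdr = p;
        hdr->h.mapped = 1;
    } else {
        /* Small: recycled heap memory may be dirty, so clear it. */
        hdr = malloc(total);
        if (hdr == NULL)
            return NULL;
        memset(hdr + 1, 0, len);
        hdr->h.mapped = 0;
    }
    hdr->h.total = total;
    return hdr + 1;
}

void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    hdr_t *hdr = (hdr_t *)ptr - 1;
    if (hdr->h.mapped)
        munmap(hdr, hdr->h.total);
    else
        free(hdr);
}
```

A real allocator also caches and reuses large regions; this sketch unmaps them immediately, which is exactly the map/unmap churn Matthew Dillon warns about for threaded programs.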

Yours,
-- 
 Ed Schouten [EMAIL PROTECTED]
 WWW: http://g-rave.nl/




Re: Reason for doing malloc / bzero over calloc (performance)?

2007-06-13 Thread youshi10

On Wed, 13 Jun 2007, Ed Schouten wrote:


Hmmm... I wonder what the Mach kernel in OSX does to allocate memory then. I'll 
have to take a look at OpenDarwin's source sometime and see what it does.

-Garrett
