On Sun, Nov 26, 2017 at 02:17:18PM +0800, Shannon Zhao wrote:
> Hi,
> 
> On 2017/11/24 14:30, Yang Zhong wrote:
> > Because of issues in glibc's alloc/free mechanism for small chunks
> > of memory, when QEMU frequently allocates and frees small chunks,
> > glibc does not serve them from its free list but keeps allocating
> > from the OS, which makes the heap grow larger and larger.
> > 
> > This patch introduces malloc_trim(), which releases free heap
> > memory back to the OS.
> > 
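  [Editorial note: a minimal standalone sketch of the behavior described
  above -- not part of the patch.  The chunk size and count are
  arbitrary; pause at each getchar() and compare /proc/<pid>/smaps
  before and after the trim.]

  #include <malloc.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      enum { N = 100000, SZ = 64 };
      static void *p[N];
      int i;

      for (i = 0; i < N; i++) {
          p[i] = malloc(SZ);   /* small chunks are carved from the heap */
      }
      for (i = 0; i < N; i++) {
          free(p[i]);          /* freed, but glibc may keep the pages mapped */
      }
      getchar();               /* inspect [heap] RSS in smaps before the trim */
      malloc_trim(0);          /* release free heap pages back to the OS */
      getchar();               /* inspect [heap] RSS in smaps after the trim */
      return 0;
  }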
> > Below are test results from the smaps file.
> > (1) without patch
> > 55f0783e1000-55f07992a000 rw-p 00000000 00:00 0  [heap]
> > Size:              21796 kB
> > Rss:               14260 kB
> > Pss:               14260 kB
> > 
> > (2) with patch
> > 55cc5fadf000-55cc61008000 rw-p 00000000 00:00 0  [heap]
> > Size:              21668 kB
> > Rss:                6940 kB
> > Pss:                6940 kB
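  [Editorial note: the trim roughly halves heap RSS, from 14260 kB to
  6940 kB (a saving of about 7.3 MB), while the mapped heap size stays
  essentially unchanged at ~21.7 MB.]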
> > 
> > Signed-off-by: Yang Zhong <yang.zh...@intel.com>
> > ---
> >  configure  | 29 +++++++++++++++++++++++++++++
> >  util/rcu.c |  6 ++++++
> >  2 files changed, 35 insertions(+)
> > 
> > diff --git a/configure b/configure
> > index 0c6e757..6292ab0 100755
> > --- a/configure
> > +++ b/configure
> > @@ -426,6 +426,7 @@ vxhs=""
> >  supported_cpu="no"
> >  supported_os="no"
> >  bogus_os="no"
> > +malloc_trim="yes"
> >  
> >  # parse CC options first
> >  for opt do
> > @@ -3857,6 +3858,30 @@ if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then
> >      exit 1
> >  fi
> >  
> > +# Even if malloc_trim() is available, these non-libc memory allocators
> > +# do not support it.
> > +if test "$tcmalloc" = "yes" || test "$jemalloc" = "yes" ; then
> > +    if test "$malloc_trim" = "yes" ; then
> > +        echo "Disabling malloc_trim with non-libc memory allocator"
> > +    fi
> > +    malloc_trim="no"
> > +fi
> > +
> > +#######################################
> > +# malloc_trim
> > +
> > +if test "$malloc_trim" != "no" ; then
> > +    cat > $TMPC << EOF
> > +#include <malloc.h>
> > +int main(void) { malloc_trim(0); return 0; }
> > +EOF
> > +    if compile_prog "" "" ; then
> > +        malloc_trim="yes"
> > +    else
> > +        malloc_trim="no"
> > +    fi
> > +fi
> > +
> >  ##########################################
> >  # tcmalloc probe
> >  
> > @@ -6012,6 +6037,10 @@ if test "$opengl" = "yes" ; then
> >    fi
> >  fi
> >  
> > +if test "$malloc_trim" = "yes" ; then
> > +  echo "CONFIG_MALLOC_TRIM=y" >> $config_host_mak
> > +fi
> > +
> >  if test "$avx2_opt" = "yes" ; then
> >    echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
> >  fi
> > diff --git a/util/rcu.c b/util/rcu.c
> > index ca5a63e..f403b77 100644
> > --- a/util/rcu.c
> > +++ b/util/rcu.c
> > @@ -32,6 +32,9 @@
> >  #include "qemu/atomic.h"
> >  #include "qemu/thread.h"
> >  #include "qemu/main-loop.h"
> > +#if defined(CONFIG_MALLOC_TRIM)
> > +#include <malloc.h>
> > +#endif
> >  
> >  /*
> >   * Global grace period counter.  Bit 0 is always one in rcu_gp_ctr.
> > @@ -272,6 +275,9 @@ static void *call_rcu_thread(void *opaque)
> >              node->func(node);
> >          }
> >          qemu_mutex_unlock_iothread();
> > +#if defined(CONFIG_MALLOC_TRIM)
> > +        malloc_trim(4 * 1024 * 1024);
> > +#endif
> >      }
> >      abort();
> >  }
> > 
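  [Editorial note on the rcu.c hunk above: glibc's malloc_trim(pad)
  returns free memory at the top of the heap to the OS while keeping up
  to pad bytes as headroom.  The call sits in call_rcu_thread() because
  RCU callbacks free memory in batches, making the end of a batch a
  natural trim point.  A minimal sketch of that pattern follows; the
  names process_batch() and TRIM_PAD are hypothetical, not from the
  patch.]

  #include <malloc.h>

  #define TRIM_PAD (4 * 1024 * 1024)   /* keep 4 MiB free as headroom */

  /* Hypothetical stand-in for one iteration of call_rcu_thread(). */
  static void process_batch(void)
  {
      /* ... invoke the queued callbacks; each may free() memory ... */

      /* Trim once per batch rather than per free, bounding the cost of
       * the underlying brk()/madvise() calls. */
      malloc_trim(TRIM_PAD);
  }

  int main(void)
  {
      process_batch();
      return 0;
  }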
> 
> Looks like this patch introduces a performance regression. With this
> patch, booting a VM with 60 SCSI disks on ARM64 takes 200+ seconds
> longer.
> 
  Hello Shannon,

  Thanks for your reply!
  Regarding your concern, I ran VM boot-up comparison tests; the results
  are below:

  # test command
  ./qemu-system-x86_64 -enable-kvm -cpu host -m 2G \
      -smp cpus=4,cores=4,threads=1,sockets=1 \
      -drive format=raw,file=test.img,index=0,media=disk \
      -nographic

  # without patch
  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.979s (kernel) + 1.214s (userspace) = 6.193s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.922s (kernel) + 1.175s (userspace) = 6.097s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.990s (kernel) + 1.301s (userspace) = 6.291s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 5.063s (kernel) + 1.336s (userspace) = 6.400s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.820s (kernel) + 1.237s (userspace) = 6.057s

  avg: kernel 4.9548s, userspace 1.2526s


  # with this patch
  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 5.099s (kernel) + 1.579s (userspace) = 6.679s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 5.003s (kernel) + 1.343s (userspace) = 6.347s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.853s (kernel) + 1.220s (userspace) = 6.074s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.836s (kernel) + 1.111s (userspace) = 5.948s

  root@intel-internal-corei7-64:~# systemd-analyze
  Startup finished in 4.917s (kernel) + 1.166s (userspace) = 6.083s

  avg: kernel 4.9416s, userspace 1.2838s

  From the above test results, there is almost no performance regression
  on the x86 platform. Sorry, I don't have an ARM-based platform at hand,
  so I can't provide related data. Thanks!

  Regards,

  Yang


> Thanks,
> -- 
> Shannon
