Bug#767295: [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel

2014-11-21 Thread Gedalya

On 11/21/2014 06:03 AM, Ian Campbell wrote:



So here's what happens now.
1. Starts up tiny
2. reboot: leak
3. reboot: freed (process larger, but the delta is all/mostly shared pages)
4. reboot: leak
5. reboot: freed
etc..

WTF, how very strange!

:-)




--- reboot domu ---

root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
root 22981  0.6  3.3 131652 20008 ?SLsl 21:55   0:00 
/usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
root@xen:~/xen-pkgs# pmap -x 22981
22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
Address   Kbytes RSS   Dirty Mode  Mapping
0040 144 144   0 r-x-- xl
00623000   4   4   4 r xl
00624000   8   8   8 rw--- xl
00626000   4   4   4 rw---   [ anon ]
009a6000 288 288 288 rw---   [ anon ]
009ee000   35676   16772   16772 rw---   [ anon ]

This is the (temporarily) leaked mapping, right?

Yea that's the one that popped in after the reboot..
About 16 MB.




Tried valgrind, it doesn't look like it was able to see what was going on

Indeed. The values for total heap usage at exist and still reachable etc
also don't seem to account for the ~3M of mapping on each iteration.

I don't know how glibc's allocator works, but I suppose it isn't
impossible that it is retaining some mappings of free regions and
collecting them to free later somehow, which just happens to only
trigger every other reboot (e.g. perhaps it is based on some threshold
of free memory).

...investigates...

So, http://man7.org/linux/man-pages/man3/malloc.3.html talks about
special behaviour using mmap for allocations above MMAP_THRESHOLD (128K
by default), which we will be hitting here I think. That explains the
anon mapping.

http://man7.org/linux/man-pages/man3/mallopt.3.html also talks about
various dynamic thresholds for growing and shrinking the heap. My guess
is that we are bouncing up and down over some threshold with every other
reboot.

Ian.
OK this is way over my head.. I don't have a full and precise 
understanding of all of the above, but let me try to comment nevertheless.
There are two issues here. The added mappings (as I understand, files, 
non-anon, shared) do happen only on reboot, but they're not a real 
memory leak issue because they are shared with other processes, so no 
matter ho many xl processes we have, it's only another 2.6 or so MB 
added to the total memory usage of the server, right?
On the other hand we have this anon area, 16 MB, that pops in on one 
reboot and gets freed on the next. That's the real issue, as I see it..!
And all that only starts happening after the last line of valgrind 
output. Valgrind only had output up to the first boot of the VM, none later.
For a fresh, non-rebooted domu, the xl process shows up as in top as 
~588 KB RES, 0 SHR, how can the latter be anyway?? I don't understand.



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#767295: [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel

2014-11-21 Thread Ian Campbell
On Thu, 2014-11-20 at 22:13 -0500, Gedalya wrote:
> On 11/20/2014 03:21 PM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 20, 2014 at 03:48:47PM +, Ian Campbell wrote:
> >> The libxc xc_dom_* infrastructure uses a very simple malloc memory pool 
> >> which
> >> is freed by xc_dom_release. However the various xc_try_*_decode routines 
> >> (other
> >> than the gzip one) just use plain malloc/realloc and therefore the buffer 
> >> ends
> >> up leaked.
> >>
> >> The memory pool currently supports mmap'd buffers as well as a directly
> >> allocated buffers, however the try decode routines make use of realloc and 
> >> do
> >> not fit well into this model. Introduce a concept of an external memory 
> >> block
> >> to the memory pool and provide an interface to register such memory.
> >>
> >> The mmap_ptr and mmap_len fields of the memblock tracking struct lose their
> >> mmap_ prefix since they are now also used for external memory blocks.
> >>
> >> We are only seeing this now because the gzip decoder doesn't leak and it's 
> >> only
> >> relatively recently that kernels in the wild have switched to better
> >> compression.
> >>
> >> This is https://bugs.debian.org/767295
> >>
> >> Reported by: Gedalya 
> > Gedelya,
> >
> > Could you also test this patch to make sure it does fix the
> > reported issue please?
> 
> So here's what happens now.
> 1. Starts up tiny
> 2. reboot: leak
> 3. reboot: freed (process larger, but the delta is all/mostly shared pages)
> 4. reboot: leak
> 5. reboot: freed
> etc..

WTF, how very strange!

> root@xen:~/xen-pkgs# xl cr /etc/xen/auto/asterisk_deb80.cfg
> Parsing config from /etc/xen/auto/asterisk_deb80.cfg
> root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
> root 22981  0.0  0.0  95968   588 ?SLsl 21:55   0:00 
> /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
> root@xen:~/xen-pkgs# pmap -x 22981
> 22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
> Address   Kbytes RSS   Dirty Mode  Mapping
> 0040 144 128   0 r-x-- xl
> 00623000   4   4   4 r xl
> 00624000   8   8   8 rw--- xl
> 00626000   4   4   4 rw---   [ anon ]
> 009a6000 288 240 240 rw---   [ anon ]
> 7f14d400 132   8   8 rw---   [ anon ]
> 7f14d4021000   65404   0   0 -   [ anon ]
> << snip >>
>  --- --- ---
> total kB   959682728 596
> 
> --- reboot domu ---
> 
> root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
> root 22981  0.6  3.3 131652 20008 ?SLsl 21:55   0:00 
> /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
> root@xen:~/xen-pkgs# pmap -x 22981
> 22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
> Address   Kbytes RSS   Dirty Mode  Mapping
> 0040 144 144   0 r-x-- xl
> 00623000   4   4   4 r xl
> 00624000   8   8   8 rw--- xl
> 00626000   4   4   4 rw---   [ anon ]
> 009a6000 288 288 288 rw---   [ anon ]
> 009ee000   35676   16772   16772 rw---   [ anon ]

This is the (temporarily) leaked mapping, right?

> Tried valgrind, it doesn't look like it was able to see what was going on

Indeed. The values for total heap usage at exist and still reachable etc
also don't seem to account for the ~3M of mapping on each iteration.

I don't know how glibc's allocator works, but I suppose it isn't
impossible that it is retaining some mappings of free regions and
collecting them to free later somehow, which just happens to only
trigger every other reboot (e.g. perhaps it is based on some threshold
of free memory).

...investigates...

So, http://man7.org/linux/man-pages/man3/malloc.3.html talks about
special behaviour using mmap for allocations above MMAP_THRESHOLD (128K
by default), which we will be hitting here I think. That explains the
anon mapping.

http://man7.org/linux/man-pages/man3/mallopt.3.html also talks about
various dynamic thresholds for growing and shrinking the heap. My guess
is that we are bouncing up and down over some threshold with every other
reboot.

Ian.


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#767295: [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel

2014-11-20 Thread Gedalya

On 11/20/2014 10:13 PM, Gedalya wrote:

On 11/20/2014 03:21 PM, Konrad Rzeszutek Wilk wrote:

On Thu, Nov 20, 2014 at 03:48:47PM +, Ian Campbell wrote:
The libxc xc_dom_* infrastructure uses a very simple malloc memory 
pool which
is freed by xc_dom_release. However the various xc_try_*_decode 
routines (other
than the gzip one) just use plain malloc/realloc and therefore the 
buffer ends

up leaked.

The memory pool currently supports mmap'd buffers as well as a directly
allocated buffers, however the try decode routines make use of 
realloc and do
not fit well into this model. Introduce a concept of an external 
memory block

to the memory pool and provide an interface to register such memory.

The mmap_ptr and mmap_len fields of the memblock tracking struct 
lose their

mmap_ prefix since they are now also used for external memory blocks.

We are only seeing this now because the gzip decoder doesn't leak 
and it's only

relatively recently that kernels in the wild have switched to better
compression.

This is https://bugs.debian.org/767295

Reported by: Gedalya 

Gedelya,

Could you also test this patch to make sure it does fix the
reported issue please?


So here's what happens now.
1. Starts up tiny
2. reboot: leak
3. reboot: freed (process larger, but the delta is all/mostly shared 
pages)

4. reboot: leak
5. reboot: freed
etc..

For the record, I applied it again the xen package currently in debian, 
4.4.1-3, see attached patch.
The only problem was that tools/libxc/include/xc_dom.h is in 
tools/libxc/xc_dom.h, otherwise it applied fine.


Index: xen-4.4.1/tools/libxc/xc_dom.h
===
--- xen-4.4.1.orig/tools/libxc/xc_dom.h
+++ xen-4.4.1/tools/libxc/xc_dom.h
@@ -33,8 +33,13 @@ struct xc_dom_seg {
 
 struct xc_dom_mem {
 struct xc_dom_mem *next;
-void *mmap_ptr;
-size_t mmap_len;
+void *ptr;
+enum {
+XC_DOM_MEM_TYPE_MALLOC_INTERNAL,
+XC_DOM_MEM_TYPE_MALLOC_EXTERNAL,
+XC_DOM_MEM_TYPE_MMAP,
+} type;
+size_t len;
 unsigned char memory[0];
 };
 
@@ -290,6 +295,7 @@ void xc_dom_log_memory_footprint(struct
 /* --- simple memory pool -- */
 
 void *xc_dom_malloc(struct xc_dom_image *dom, size_t size);
+int xc_dom_register_external(struct xc_dom_image *dom, void *ptr, size_t size);
 void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size);
 void *xc_dom_malloc_filemap(struct xc_dom_image *dom,
 const char *filename, size_t * size,
Index: xen-4.4.1/tools/libxc/xc_dom_bzimageloader.c
===
--- xen-4.4.1.orig/tools/libxc/xc_dom_bzimageloader.c
+++ xen-4.4.1/tools/libxc/xc_dom_bzimageloader.c
@@ -161,6 +161,13 @@ static int xc_try_bzip2_decode(
 
 total = (((uint64_t)stream.total_out_hi32) << 32) | stream.total_out_lo32;
 
+if ( xc_dom_register_external(dom, out_buf, total) )
+{
+DOMPRINTF("BZIP2: Error registering stream output");
+free(out_buf);
+goto bzip2_cleanup;
+}
+
 DOMPRINTF("%s: BZIP2 decompress OK, 0x%zx -> 0x%lx",
   __FUNCTION__, *size, (long unsigned int) total);
 
@@ -305,6 +312,13 @@ static int _xc_try_lzma_decode(
 }
 }
 
+if ( xc_dom_register_external(dom, out_buf, stream->total_out) )
+{
+DOMPRINTF("%s: Error registering stream output", what);
+free(out_buf);
+goto lzma_cleanup;
+}
+
 DOMPRINTF("%s: %s decompress OK, 0x%zx -> 0x%zx",
   __FUNCTION__, what, *size, (size_t)stream->total_out);
 
@@ -464,7 +478,13 @@ static int xc_try_lzo1x_decode(
 
 dst_len = lzo_read_32(cur);
 if ( !dst_len )
+{
+msg = "Error registering stream output";
+if ( xc_dom_register_external(dom, out_buf, out_len) )
+break;
+
 return 0;
+}
 
 if ( dst_len > LZOP_MAX_BLOCK_SIZE )
 {
Index: xen-4.4.1/tools/libxc/xc_dom_core.c
===
--- xen-4.4.1.orig/tools/libxc/xc_dom_core.c
+++ xen-4.4.1/tools/libxc/xc_dom_core.c
@@ -132,6 +132,7 @@ void *xc_dom_malloc(struct xc_dom_image
 return NULL;
 }
 memset(block, 0, sizeof(*block) + size);
+block->type = XC_DOM_MEM_TYPE_MALLOC_INTERNAL;
 block->next = dom->memblocks;
 dom->memblocks = block;
 dom->alloc_malloc += sizeof(*block) + size;
@@ -151,23 +152,45 @@ void *xc_dom_malloc_page_aligned(struct
 return NULL;
 }
 memset(block, 0, sizeof(*block));
-block->mmap_len = size;
-block->mmap_ptr = mmap(NULL, block->mmap_len,
-   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
-   -1, 0);
-if ( block->mmap_ptr == MAP_FAILED )
+block->len = size;
+block->ptr = mmap(NULL, block->len,
+  PROT_READ | PROT_WRITE, MAP_PRI

Bug#767295: [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel

2014-11-20 Thread Gedalya

On 11/20/2014 03:21 PM, Konrad Rzeszutek Wilk wrote:

On Thu, Nov 20, 2014 at 03:48:47PM +, Ian Campbell wrote:

The libxc xc_dom_* infrastructure uses a very simple malloc memory pool which
is freed by xc_dom_release. However the various xc_try_*_decode routines (other
than the gzip one) just use plain malloc/realloc and therefore the buffer ends
up leaked.

The memory pool currently supports mmap'd buffers as well as a directly
allocated buffers, however the try decode routines make use of realloc and do
not fit well into this model. Introduce a concept of an external memory block
to the memory pool and provide an interface to register such memory.

The mmap_ptr and mmap_len fields of the memblock tracking struct lose their
mmap_ prefix since they are now also used for external memory blocks.

We are only seeing this now because the gzip decoder doesn't leak and it's only
relatively recently that kernels in the wild have switched to better
compression.

This is https://bugs.debian.org/767295

Reported by: Gedalya 

Gedelya,

Could you also test this patch to make sure it does fix the
reported issue please?


So here's what happens now.
1. Starts up tiny
2. reboot: leak
3. reboot: freed (process larger, but the delta is all/mostly shared pages)
4. reboot: leak
5. reboot: freed
etc..

root@xen:~/xen-pkgs# xl cr /etc/xen/auto/asterisk_deb80.cfg
Parsing config from /etc/xen/auto/asterisk_deb80.cfg
root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
root 22981  0.0  0.0  95968   588 ?SLsl 21:55   0:00 
/usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg

root@xen:~/xen-pkgs# pmap -x 22981
22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
Address   Kbytes RSS   Dirty Mode  Mapping
0040 144 128   0 r-x-- xl
00623000   4   4   4 r xl
00624000   8   8   8 rw--- xl
00626000   4   4   4 rw---   [ anon ]
009a6000 288 240 240 rw---   [ anon ]
7f14d400 132   8   8 rw---   [ anon ]
7f14d4021000   65404   0   0 -   [ anon ]
<< snip >>
 --- --- ---
total kB   959682728 596

--- reboot domu ---

root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
root 22981  0.6  3.3 131652 20008 ?SLsl 21:55   0:00 
/usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg

root@xen:~/xen-pkgs# pmap -x 22981
22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
Address   Kbytes RSS   Dirty Mode  Mapping
0040 144 144   0 r-x-- xl
00623000   4   4   4 r xl
00624000   8   8   8 rw--- xl
00626000   4   4   4 rw---   [ anon ]
009a6000 288 288 288 rw---   [ anon ]
009ee000   35676   16772   16772 rw---   [ anon ]
7f14d400 132   8   8 rw---   [ anon ]
7f14d4021000   65404   0   0 -   [ anon ]
7f14d9ae 104 100   0 r-x-- libz.so.1.2.8
<< snip >>
 --- --- ---
total kB  131652   20072   17424

--- reboot domu ---

root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
root 22981  0.5  0.5  95876  3136 ?SLsl 21:55   0:01 
/usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg

root@xen:~/xen-pkgs# pmap -x 22981
22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
Address   Kbytes RSS   Dirty Mode  Mapping
0040 144 144   0 r-x-- xl
00623000   4   4   4 r xl
00624000   8   8   8 rw--- xl
00626000   4   4   4 rw---   [ anon ]
009a6000 188 188 188 rw---   [ anon ]
7f14d400 132   8   8 rw---   [ anon ]
7f14d4021000   65404   0   0 -   [ anon ]
7f14d9ae 104 100   0 r-x-- libz.so.1.2.8
<< snip >>
 --- --- ---
total kB   958763200 552

--- reboot domu ---

root@xen:~/xen-pkgs# ps aux | grep asterisk_deb80
root 22981  0.6  3.4 134660 20548 ?SLsl 21:55   0:02 
/usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg

root@xen:~/xen-pkgs# pmap -x 22981
22981:   /usr/lib/xen-4.4/bin/xl cr /etc/xen/auto/asterisk_deb80.cfg
Address   Kbytes RSS   Dirty Mode  Mapping
0040 144 144   0 r-x-- xl
00623000   4   4   4 r xl
00624000   8   8   8 rw--- xl
00626000   4   4   4 rw---   [ anon ]
009a6000 188 188 188 rw---   [ anon ]
009d5000   38784   17348   17348 rw---   [ anon ]
7f14d400 132   8   8 rw---   [ anon ]
7f14d4021000   65404   0   0 -   [ anon ]
7f14d9ae 104 100   0 r-x-- libz.so.1.2.8
<< snip >>
 --- --- ---
total k