Re: [Gluster-devel] finding and fixing memory leaks in xlators

2017-04-25 Thread Niels de Vos
On Tue, Apr 25, 2017 at 02:55:30PM +0530, Amar Tumballi wrote:
> Thanks for this detailed email with clear instructions, Niels.
> 
> Can we have this as GitHub issues as well? I don't want this information to
> be lost as part of an email thread in the ML archive. I would like to see if
> I can get some interns to work on these leaks per component as part of their
> college projects.

This is already reported as a GitHub issue at
https://github.com/gluster/glusterfs/issues/176 . Fixes for the leaks in
the different xlators will need a bug report in Bugzilla. For now I have
been creating new bugs per xlator, and adding BZ 1425623 to the "blocks"
field of the xlator bugs.

Many fixes will be part of glusterfs-3.11. At the moment I am focussing
mostly on the Gluster core (libglusterfs) leaks. Leaks in the xlators
tend to be a little easier to fix, and would be more suitable for junior
engineers. I'm happy to explain more details if needed :)

Thanks!
Niels


> 
> -Amar

Re: [Gluster-devel] finding and fixing memory leaks in xlators

2017-04-25 Thread Amar Tumballi
Thanks for this detailed email with clear instructions, Niels.

Can we have this as GitHub issues as well? I don't want this information to
be lost as part of an email thread in the ML archive. I would like to see if
I can get some interns to work on these leaks per component as part of their
college projects.

-Amar

[Gluster-devel] finding and fixing memory leaks in xlators

2017-04-25 Thread Niels de Vos
Hi,

with the use of gfapi it has become clear that Gluster was never really
developed to be loaded into applications. There are many different memory
leaks that get exposed through the usage of gfapi. Before, memory leaks
were mostly cleaned up automatically because processes would exit. Now,
there are applications that initialize a GlusterFS client to access a
volume, and de-initialize that client once it is not needed anymore.
Unfortunately, upon the de-init not all allocated memory is free'd again.
These are often just a few bytes, but for long-running processes this can
become a real problem.

Finding the memory leaks in xlators has been a tricky thing. Valgrind
would often not know which function or source file did an allocation, and
fixing the leak would become a real hunt. There have been some patches
merged that make Valgrind work more easily, and a few patches still need
a little more review before everything is available. The document below
describes how to use a new "sink" xlator to detect memory leaks. This
xlator can only be merged once a change for the graph initialization is
included too:

 - https://review.gluster.org/16796 - glusterfs_graph_prepare
 - https://review.gluster.org/16806 - sink xlator and developer doc

It would be most welcome if other developers could review the linked
changes, so that everyone can easily debug memory leaks from their
favorite xlators.

There is a "run-xlator.sh" script in my (for now) personal
"gluster-debug" repository that can be used to load an arbitrary xlator
along with "sink". See
https://github.com/nixpanic/gluster-debug/tree/master/gfapi-load-volfile
for more details.
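
As a rough sketch of how to get started (the exact arguments the script
expects are documented in the repository, so treat the invocation below as
an assumption):

```shell
# Sketch only: fetch the gluster-debug repository and run the helper
# script; check the repository for the actual options run-xlator.sh takes.
git clone https://github.com/nixpanic/gluster-debug
cd gluster-debug/gfapi-load-volfile
./run-xlator.sh
```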

Thanks,
Niels


From doc/developer-guide/identifying-resource-leaks.md:

# Identifying Resource Leaks

Like most other pieces of software, GlusterFS is not perfect in how it manages
its resources like memory, threads and the like. Gluster developers try hard to
prevent leaking resources by releasing and deallocating the used structures.
Unfortunately every now and then some resource leaks are unintentionally added.

This document tries to explain a few helpful tricks to identify resource leaks
so that they can be addressed.


## Debug Builds

There are certain techniques used in GlusterFS that make it difficult to use
tools like Valgrind for memory leak detection, and some build options that
make it more practical to use Valgrind and other tools. When running
Valgrind, it is important to have GlusterFS builds that contain the
debuginfo/symbols. Some distributions (try to) strip the debuginfo to get
smaller executables. Fedora- and RHEL-based distributions have sub-packages
called ...-debuginfo that need to be installed for symbol resolving.
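
For example, on Fedora- and RHEL-based systems the debuginfo sub-packages
can be installed with `dnf` (a sketch; the exact package names differ per
distribution and installed components):

```shell
# Install debuginfo for the main GlusterFS packages so Valgrind can
# resolve symbols (package names are an example, adjust as needed).
dnf debuginfo-install glusterfs glusterfs-fuse
```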


### Memory Pools

With memory pools, single structures do not need to be allocated and freed
one by one. This improves performance, but also makes it impossible to track
the allocation and freeing of individual structures.

It is possible to disable the use of memory pools, and use the standard
`malloc()` and `free()` functions provided by the C library. Valgrind is then
able to track the allocated areas and verify whether they have been free'd. In
order to disable memory pools, the Gluster sources need to be configured with
the `--enable-debug` option:

```shell
./configure --enable-debug
```

When building RPMs, the `.spec` handles the `--with=debug` option too:

```shell
make dist
rpmbuild -ta --with=debug glusterfs-tar.gz
```

### Dynamically Loaded xlators

Valgrind tracks the call chain of functions that do memory allocations. The
addresses of the functions are stored, and before Valgrind exits the addresses
are resolved into human-readable function names and offsets (line numbers in
source files). Because Gluster loads xlators dynamically, and unloads them
before exiting, Valgrind is not able to resolve the function addresses into
symbols anymore. Whenever this happens, Valgrind shows `???` in the output,
like:

```
  ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
  ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
  ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
  ==25170==    by 0x12B0638A: ???
  ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
  ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
  ...
```
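
To produce a report like the one above, Valgrind can be run with full leak
checking enabled. A minimal sketch, assuming a small gfapi test program
called `gfapi-test` (the program name and its arguments are placeholders):

```shell
# Run a (hypothetical) gfapi test program under Valgrind with full leak
# checking; --num-callers extends the recorded call chains so that xlator
# frames show up in the loss records.
valgrind --leak-check=full --show-leak-kinds=definite \
         --num-callers=30 ./gfapi-test myvolume
```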

These `???` can be prevented by not calling `dlclose()` for unloading the
xlator. This will cause a small leak of the handle that was returned by
`dlopen()`, but for improved debugging this can be acceptable. For this and
other Valgrind features, a `--enable-valgrind` option is available to
`./configure`. When GlusterFS is built with this option, Valgrind will be able
to resolve the symbol names of the functions that do memory allocations inside
xlators.

```shell
./configure --enable-valgrind
```
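
When hunting leaks it can make sense to combine this with the debug build
described earlier, so that memory pools are disabled and symbols stay
resolvable at the same time (a sketch; combining the two options is an
assumption, not something the document prescribes):

```shell
# Disable memory pools and keep xlators loaded for symbol resolution.
./configure --enable-debug --enable-valgrind
```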

When building RPMs, the `.spec` handles the `--with=valgrind` option too:

```shell
make dist
rpmbuild -ta --with=valgrind glusterfs-tar.gz
```

## Running Valgrind against a single xlator

Debugging a single xl