Re: [Gluster-devel] [RFC] Reducing maintenance burden and moving fuse support to an external project
Hi. Keep me CCed, please, because for the last couple of months I have not been following GlusterFS development… On Friday, 3 March 2017 at 21:50:07 CET Niels de Vos wrote: > At the moment we have three top-level interfaces to maintain in Gluster, > these are FUSE, Gluster/NFS and gfapi. If any work is needed to support > new options, FOPs or other functionalities, we mostly have to do the > work 3x. Often one of the interfaces gets forgotten, or does not need > the new feature immediately (backlog++). This is bothering me every now > and then, especially when bugs get introduced and need to get fixed in > different ways for these three interfaces. > > One of my main goals is to reduce the code duplication, and move > everything to gfapi. We are on a good way to use NFS-Ganesha instead of > Gluster/NFS already. In a similar approach, I would love to see > deprecating our xlators/mount sources[0], and have it replaced by > xglfs[1] from Oleksandr. > > Having the FUSE mount binaries provided by a separate project should > make it easier to implement things like subdirectory mounts (Samba and > NFS-Ganesha already do this in some form through gfapi). > > xglfs is not packaged in any distribution yet, this allows us to change > the current commandline interface to something we deem more suitable (if > so). > > I would like to get some opinions from others, and if there are no > absolute objections, we can work out a plan to make xglfs an alternative > to the fuse-bridge and eventually replace it. > > Thanks, > Niels > > > 0. https://github.com/gluster/glusterfs/tree/master/xlators/mount > 1. https://github.com/gluster/xglfs ___ Gluster-devel mailing list Gluster-devel@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Memory management and friends
Hello. As a result of today's community meeting, I'm starting a dedicated ML thread for gathering memory management issues together, to make it possible to summarize them and construct a plan for what to do next. Very important notice: I'm not an active GlusterFS developer, but I gained extensive experience with GlusterFS at my previous work, and the main issue that haunted me the whole time was memory leaks. Consider this a request for action from a GlusterFS customer, apparently approved by Kaushal and Amye during the last meeting :). So, here go the key points. 1) Almost all nasty and obvious memory leaks have been successfully fixed during the last year, and that allowed me to run GlusterFS in production at my previous work for almost all types of workload except one — Dovecot mail storage. The specific of this workload is that it involves a huge number of files, and I assume it to be a kind of edge case that uncovers some dark corners of GlusterFS memory management. I was able to provide Nithya with Valgrind+Massif memory profiling results and a test case, and that helped her prepare at least one extra fix (and more to come, AFAIK) that deals with readdirp-related code. Nevertheless, it is reported that this is not the major source of leaks. Nithya suspects that memory gets fragmented heavily due to lots of small allocations, and memory pools cannot cope with this kind of fragmentation under constant load. Related BZs: * https://bugzilla.redhat.com/show_bug.cgi?id=1369364 * https://bugzilla.redhat.com/show_bug.cgi?id=1380249 People involved: * nbalacha, could you please provide more info on your findings? 2) Meanwhile, Jeff goes on with the brick multiplexing feature, facing some issues with memory management too and blaming memory pools for them. Related ML emails: * http://www.gluster.org/pipermail/gluster-devel/2016-October/051118.html * http://www.gluster.org/pipermail/gluster-devel/2016-October/051160.html People involved: * jdarcy, have you discussed this outside of the ML? It seems your email didn't get proper attention. 3) We had a brief discussion with obnox and anoopcs on #gluster-meeting and #gluster-dev regarding jemalloc and talloc. obnox believes that we may use both: jemalloc for substituting malloc/free, and talloc for rewriting GlusterFS memory management properly. Related logs: * https://botbot.me/freenode/gluster-dev/2016-10-26/?msg=75501394&page=2 People involved: * obnox, could you share your ideas on this? To summarize: 1) we need the key devs involved in memory management to share their ideas; 2) using production-proven memory allocators and memory pool implementations is desired; 3) someone should manage the workflow of reconstructing memory management. Feel free to add anything I've missed. Regards, Oleksandr ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
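A note on the talloc suggestion in point 3 above: the core property obnox refers to is hierarchical allocation, where freeing a parent context releases every allocation hung off it, which makes teardown and leak hunting much simpler; jemalloc, by contrast, can usually be tried without code changes, e.g. by preloading it. Below is a minimal, illustrative talloc sketch — it is not GlusterFS code and only assumes libtalloc and its headers are installed (link with -ltalloc):
===
/* Illustrative only -- not GlusterFS code. Shows the hierarchical
 * allocation property of talloc: freeing a parent context frees every
 * child allocation. Build with `gcc talloc-demo.c -ltalloc`. */
#include <stdio.h>
#include <talloc.h>

struct fake_inode {
        char *name;
        char *path;
};

int main(void)
{
        /* A top-level context, e.g. one per xlator or per connection. */
        TALLOC_CTX *xl_ctx = talloc_new(NULL);

        /* Children are attached to their parent context... */
        struct fake_inode *ino = talloc_zero(xl_ctx, struct fake_inode);
        ino->name = talloc_strdup(ino, "file.txt");
        ino->path = talloc_asprintf(ino, "/dir/%s", ino->name);

        printf("%s\n", ino->path);

        /* ...so one free of the parent releases the whole tree, which is
         * what makes teardown and leak hunting simpler. */
        talloc_free(xl_ctx);
        return 0;
}
===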
[Gluster-devel] Spurious failure of ./tests/bugs/glusterd/bug-913555.t
Hello. Vijay asked me to drop a note about the spurious failure of the ./tests/bugs/glusterd/bug-913555.t test. Here are some examples: * https://build.gluster.org/job/centos6-regression/1069/consoleFull * https://build.gluster.org/job/centos6-regression/1076/consoleFull Could someone take a look at it? Also, the last two builds were broken because of this: === Slave went offline during the build === See these builds for details: * https://build.gluster.org/job/centos6-regression/1077/consoleFull * https://build.gluster.org/job/centos6-regression/1078/consoleFull Was that intentional? Thanks. Regards, Oleksandr ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Libunwind
08.09.2016 16:07, Jeff Darcy wrote: (1) Has somebody already gone down this path? Does it work? We've switched most of our internal projects to libunwind. It works OK. (2) Are there any other reasons we wouldn't want to switch? No, just go and switch :). ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Profiling GlusterFS FUSE client with Valgrind's Massif tool
Created BZ for it [1]. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1373630 On вівторок, 6 вересня 2016 р. 23:32:51 EEST Pranith Kumar Karampuri wrote: > I included you on a thread on users, let us see if he can help you out. > > On Mon, Aug 29, 2016 at 4:02 PM, Oleksandr Natalenko < > > oleksa...@natalenko.name> wrote: > > More info here. > > > > Massif puts the following warning on volume unmount: > > > > === > > valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == > > bszB_hi' failed. > > valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0. > > This is probably caused by your program erroneously writing past the > > end of a heap block and corrupting heap metadata. If you fix any > > invalid writes reported by Memcheck, this assertion failure will > > probably go away. Please try that before reporting this as a bug. > > ... > > Thread 1: status = VgTs_Runnable > > ==30590==at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_ > > massif-amd64-linux.so) > > ==30590==by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so) > > ==30590==by 0x4A246B4: _vgnU_freeres (in > > /usr/lib64/valgrind/vgpreload_ > > core-amd64-linux.so) > > ==30590==by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so > > ) > > ==30590==by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so) > > ==30590==by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308) > > ==30590==by 0x669F66F: ??? (in /usr/lib64/libc-2.17.so) > > ==30590==by 0x606EEF4: pthread_join (in /usr/lib64/libpthread-2.17.so) > > ==30590==by 0x4EC2687: event_dispatch_epoll (event-epoll.c:762) > > ==30590==by 0x10E876: main (glusterfsd.c:2370) > > ... > > === > > > > I rechecked mount/ls/unmount with memcheck tool as suggested and got the > > following: > > > > === > > ... > > ==30315== Thread 8: > > ==30315== Syscall param writev(vector[...]) points to uninitialised > > byte(s) > > ==30315==at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so) > > ==30315==by 0xE664795: send_fuse_iov (fuse-bridge.c:158) > > ==30315==by 0xE6649B9: send_fuse_data (fuse-bridge.c:197) > > ==30315==by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753) > > ==30315==by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783) > > ==30315==by 0x14519937: io_stats_lookup_cbk (io-stats.c:1512) > > ==30315==by 0x14300B3E: mdc_lookup_cbk (md-cache.c:867) > > ==30315==by 0x13EE9226: qr_lookup_cbk (quick-read.c:446) > > ==30315==by 0x13CD8B66: ioc_lookup_cbk (io-cache.c:260) > > ==30315==by 0x1346405D: dht_revalidate_cbk (dht-common.c:985) > > ==30315==by 0x1320EC60: afr_discover_done (afr-common.c:2316) > > ==30315==by 0x1320EC60: afr_discover_cbk (afr-common.c:2361) > > ==30315==by 0x12F9EE91: client3_3_lookup_cbk (client-rpc-fops.c:2981) > > ==30315== Address 0x170b238c is on thread 8's stack > > ==30315== in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723) > > ... > > ==30315== Warning: invalid file descriptor -1 in syscall close() > > ==30315== Thread 1: > > ==30315== Invalid free() / delete / delete[] / realloc() > > ==30315==at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_ > > memcheck-amd64-linux.so) > > ==30315==by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so) > > ==30315==by 0x4A246B4: _vgnU_freeres (in > > /usr/lib64/valgrind/vgpreload_ > > core-amd64-linux.so) > > ==30315==by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so > > ) > > ==30315==by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so) > > ==30315==by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308) > > ==30315==by 0x66A766F: ??? 
(in /usr/lib64/libc-2.17.so) > > ==30315== by 0x6076EF4: pthread_join (in /usr/lib64/libpthread-2.17.so) > > ==30315==by 0x4ECA687: event_dispatch_epoll (event-epoll.c:762) > > ==30315==by 0x10E876: main (glusterfsd.c:2370) > > ==30315== Address 0x6a2d3d0 is 0 bytes inside data symbol > > "noai6ai_cached" > > === > > > > It seems Massif crashes (?) because of invalid memory access in glusterfs > > process cleanup stage. > > > > Pranith? Nithya? > > > > 29.08.2016 13:14, Oleksandr Natalenko wrote: > >> === > >> valgrind --tool=massif --trace-children=yes /usr/sbin/glusterfs -N > >> --volfile-server=server.example.com --volfile-id=test > >> /mnt/net/glusterfs/test > >> === > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Profiling GlusterFS FUSE client with Valgrind's Massif tool
More info here. Massif puts the following warning on volume unmount: === valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed. valgrind: Heap block lo/hi size mismatch: lo = 1, hi = 0. This is probably caused by your program erroneously writing past the end of a heap block and corrupting heap metadata. If you fix any invalid writes reported by Memcheck, this assertion failure will probably go away. Please try that before reporting this as a bug. ... Thread 1: status = VgTs_Runnable ==30590==at 0x4C29037: free (in /usr/lib64/valgrind/vgpreload_massif-amd64-linux.so) ==30590==by 0x67CE63B: __libc_freeres (in /usr/lib64/libc-2.17.so) ==30590==by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so) ==30590==by 0x66A2E2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so) ==30590==by 0x66A2EB4: exit (in /usr/lib64/libc-2.17.so) ==30590==by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308) ==30590==by 0x669F66F: ??? (in /usr/lib64/libc-2.17.so) ==30590==by 0x606EEF4: pthread_join (in /usr/lib64/libpthread-2.17.so) ==30590==by 0x4EC2687: event_dispatch_epoll (event-epoll.c:762) ==30590==by 0x10E876: main (glusterfsd.c:2370) ... === I rechecked mount/ls/unmount with memcheck tool as suggested and got the following: === ... ==30315== Thread 8: ==30315== Syscall param writev(vector[...]) points to uninitialised byte(s) ==30315==at 0x675FEA0: writev (in /usr/lib64/libc-2.17.so) ==30315==by 0xE664795: send_fuse_iov (fuse-bridge.c:158) ==30315==by 0xE6649B9: send_fuse_data (fuse-bridge.c:197) ==30315==by 0xE666F7A: fuse_attr_cbk (fuse-bridge.c:753) ==30315==by 0xE6671A6: fuse_root_lookup_cbk (fuse-bridge.c:783) ==30315==by 0x14519937: io_stats_lookup_cbk (io-stats.c:1512) ==30315==by 0x14300B3E: mdc_lookup_cbk (md-cache.c:867) ==30315==by 0x13EE9226: qr_lookup_cbk (quick-read.c:446) ==30315==by 0x13CD8B66: ioc_lookup_cbk (io-cache.c:260) ==30315==by 0x1346405D: dht_revalidate_cbk (dht-common.c:985) ==30315==by 0x1320EC60: afr_discover_done (afr-common.c:2316) ==30315==by 0x1320EC60: afr_discover_cbk (afr-common.c:2361) ==30315==by 0x12F9EE91: client3_3_lookup_cbk (client-rpc-fops.c:2981) ==30315== Address 0x170b238c is on thread 8's stack ==30315== in frame #3, created by fuse_attr_cbk (fuse-bridge.c:723) ... ==30315== Warning: invalid file descriptor -1 in syscall close() ==30315== Thread 1: ==30315== Invalid free() / delete / delete[] / realloc() ==30315==at 0x4C2AD17: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==30315==by 0x67D663B: __libc_freeres (in /usr/lib64/libc-2.17.so) ==30315==by 0x4A246B4: _vgnU_freeres (in /usr/lib64/valgrind/vgpreload_core-amd64-linux.so) ==30315==by 0x66AAE2A: __run_exit_handlers (in /usr/lib64/libc-2.17.so) ==30315==by 0x66AAEB4: exit (in /usr/lib64/libc-2.17.so) ==30315==by 0x1117E9: cleanup_and_exit (glusterfsd.c:1308) ==30315==by 0x66A766F: ??? (in /usr/lib64/libc-2.17.so) ==30315==by 0x6076EF4: pthread_join (in /usr/lib64/libpthread-2.17.so) ==30315==by 0x4ECA687: event_dispatch_epoll (event-epoll.c:762) ==30315==by 0x10E876: main (glusterfsd.c:2370) ==30315== Address 0x6a2d3d0 is 0 bytes inside data symbol "noai6ai_cached" === It seems Massif crashes (?) because of invalid memory access in glusterfs process cleanup stage. Pranith? Nithya? 
29.08.2016 13:14, Oleksandr Natalenko wrote: === valgrind --tool=massif --trace-children=yes /usr/sbin/glusterfs -N --volfile-server=server.example.com --volfile-id=test /mnt/net/glusterfs/test === ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Profiling GlusterFS FUSE client with Valgrind's Massif tool
Hello. While I was dancing around huge memory consumption by the FUSE client [1], Pranith suggested that I use the Massif tool to find out the reason for the leak. Unfortunately, it does not work properly for me, and I believe I'm doing something wrong. Instead of generating a report after unmounting the volume or SIGTERMing the glusterfs process, Valgrind generates 2 reports (for 2 PIDs) right after launch and does not update them further, even on exit. I believe that is because something is going on with forking, but I cannot figure out what's going wrong. The command I use to launch GlusterFS via Valgrind+Massif: === valgrind --tool=massif --trace-children=yes /usr/sbin/glusterfs -N --volfile-server=server.example.com --volfile-id=test /mnt/net/glusterfs/test === Any ideas or sample use cases for Massif+GlusterFS? Thanks. Regards, Oleksandr [1] https://bugzilla.redhat.com/show_bug.cgi?id=1369364 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Update on GlusterFS-3.7.15
[1] and [2], please. Those are two parts of one fix backported from master. They have already been backported to 3.8, so only the backport to 3.7 is left. Regards, Oleksandr [1] http://review.gluster.org/#/c/14835/ [2] http://review.gluster.org/#/c/15167/ 22.08.2016 15:25, Kaushal M wrote: Notify the maintainers and me of any changes you need merged. You can reply to this thread to notify. Try to ensure that your changes get merged before this weekend. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] CentOS Regressions Failures for the last week
22.08.2016 10:34, Nigel Babu wrote: ./tests/basic/gfapi/gfapi-trunc.t: failed 6 times Fixed: http://review.gluster.org/#/c/15223/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-Maintainers] Glusterfs-3.7.13 release plans
Does this issue have some fix pending, or there is just bugreport? 08.07.2016 15:12, Kaushal M написав: On Fri, Jul 8, 2016 at 2:22 PM, Raghavendra Gowdappa wrote: There seems to be a major inode leak in fuse-clients: https://bugzilla.redhat.com/show_bug.cgi?id=1353856 We have found an RCA through code reading (though have a high confidence on the RCA). Do we want to include this in 3.7.13? I'm not going to be delaying the release anymore. I'll be adding this issue into the release-notes as a known-issue. regards, Raghavendra. - Original Message - From: "Kaushal M" To: "Pranith Kumar Karampuri" Cc: maintain...@gluster.org, "Gluster Devel" Sent: Friday, July 8, 2016 11:51:11 AM Subject: Re: [Gluster-Maintainers] Glusterfs-3.7.13 release plans On Fri, Jul 8, 2016 at 9:59 AM, Pranith Kumar Karampuri wrote: > Could you take in http://review.gluster.org/#/c/14598/ as well? It is ready > for merge. > > On Thu, Jul 7, 2016 at 3:02 PM, Atin Mukherjee wrote: >> >> Can you take in http://review.gluster.org/#/c/14861 ? Can you get one of the maintainers to give it a +2? >> >> >> On Thursday 7 July 2016, Kaushal M wrote: >>> >>> On Thu, Jun 30, 2016 at 11:08 AM, Kaushal M wrote: >>> > Hi all, >>> > >>> > I'm (or was) planning to do a 3.7.13 release on schedule today. 3.7.12 >>> > has a huge issue with libgfapi, solved by [1]. >>> > I'm not sure if this fixes the other issues with libgfapi noticed by >>> > Lindsay on gluster-users. >>> > >>> > This patch has been included in the packages 3.7.12 built for CentOS, >>> > Fedora, Ubuntu, Debian and SUSE. I guess Lindsay is using one of these >>> > packages, so it might be that the issue seen is new. So I'd like to do >>> > a quick release once we have a fix. >>> > >>> > Maintainers can merge changes into release-3.7 that follow the >>> > criteria given in [2]. Please make sure to add the bugs for patches >>> > you are merging are added as dependencies for the 3.7.13 tracker bug >>> > [3]. >>> > >>> >>> I've just merged the fix for the gfapi breakage into release-3.7, and >>> hope to tag 3.7.13 soon. >>> >>> The current head for release-3.7 is commit bddf6f8. 18 patches have >>> been merged since 3.7.12 for the following components, >>> - gfapi >>> - nfs (includes ganesha related changes) >>> - glusterd/cli >>> - libglusterfs >>> - fuse >>> - build >>> - geo-rep >>> - afr >>> >>> I need and acknowledgement from the maintainers of the above >>> components that they are ready. >>> If any maintainers know of any other issues, please reply here. We'll >>> decide how to address them for this release here. >>> >>> Also, please don't merge anymore changes into release-3.7. If you need >>> to get something merged, please inform me. 
>>> >>> Thanks, >>> Kaushal >>> >>> > Thanks, >>> > Kaushal >>> > >>> > [1]: https://review.gluster.org/14822 >>> > [2]: https://public.pad.fsfe.org/p/glusterfs-release-process-201606 >>> > under the GlusterFS minor release heading >>> > [3]: https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.7.13 >>> ___ >>> maintainers mailing list >>> maintain...@gluster.org >>> http://www.gluster.org/mailman/listinfo/maintainers >> >> >> >> -- >> Atin >> Sent from iPhone >> >> ___ >> maintainers mailing list >> maintain...@gluster.org >> http://www.gluster.org/mailman/listinfo/maintainers >> > > > > -- > Pranith ___ maintainers mailing list maintain...@gluster.org http://www.gluster.org/mailman/listinfo/maintainers ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
OK, here the results go. I've taken 5 statedumps with 30 mins between each statedump. Also, before taking the statedump, I've recorded memory usage. Memory consumption: 1. root 1010 0.0 9.6 7538188 374864 ? Ssl чер07 0:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 2. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 3. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 4. root 1010 0.0 9.6 8202064 375892 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 5. root 1010 0.0 9.6 8316808 376084 ? Ssl чер07 0:17 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 As you may see VIRT constantly grows (except for one measurements), and RSS grows as well, although its increase is considerably smaller. Now lets take a look at statedumps: 1. https://gist.github.com/3fa121c7531d05b210b84d9db763f359 2. https://gist.github.com/87f48b8ac8378262b84d448765730fd9 3. https://gist.github.com/f8780014d8430d67687c70cfd1df9c5c 4. https://gist.github.com/916ac788f806328bad9de5311ce319d7 5. https://gist.github.com/8ba5dbf27d2cc61c04ca954d7fb0a7fd I'd go with comparing first statedump with last one, and here is diff output: https://gist.github.com/e94e7f17fe8b3688c6a92f49cbc15193 I see numbers changing, but now cannot conclude what is meaningful and what is meaningless. Pranith? 08.06.2016 10:06, Pranith Kumar Karampuri написав: On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko wrote: Yup, I can do that, but please note that RSS does not change. Will statedump show VIRT values? Also, I'm looking at the numbers now, and see that on each reconnect VIRT grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea what is going wrong. That's interesting. Never saw something like this happen. I would still like to see if there are any clues in statedump when all this happens. May be what you said will be confirmed that nothing new is allocated but I would just like to confirm. 08.06.2016 09:50, Pranith Kumar Karampuri написав: Oleksandr, Could you take statedump of the shd process once in 5-10 minutes and send may be 5 samples of them when it starts to increase? This will help us find what datatypes are being allocated a lot and can lead to coming up with possible theories for the increase. 
On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko wrote: Also, I've checked shd log files, and found out that for some reason shd constantly reconnects to bricks: [1] Please note that suggested fix [2] by Pranith does not help, VIRT value still grows: === root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === I do not know the reason why it is reconnecting, but I suspect leak to happen on that reconnect. CCing Pranith. [1] http://termbin.com/brob [2] http://review.gluster.org/#/c/14053/ 06.06.2016 12:21, Kaushal M написав: Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply. We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This lead to similar observations of huge VIRT and relatively small RSS. I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way. On Mon, Jun 6, 2016 at 1:54 P
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Yup, I can do that, but please note that RSS does not change. Will statedump show VIRT values? Also, I'm looking at the numbers now, and see that on each reconnect VIRT grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea what is going wrong. 08.06.2016 09:50, Pranith Kumar Karampuri написав: Oleksandr, Could you take statedump of the shd process once in 5-10 minutes and send may be 5 samples of them when it starts to increase? This will help us find what datatypes are being allocated a lot and can lead to coming up with possible theories for the increase. On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko wrote: Also, I've checked shd log files, and found out that for some reason shd constantly reconnects to bricks: [1] Please note that suggested fix [2] by Pranith does not help, VIRT value still grows: === root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === I do not know the reason why it is reconnecting, but I suspect leak to happen on that reconnect. CCing Pranith. [1] http://termbin.com/brob [2] http://review.gluster.org/#/c/14053/ 06.06.2016 12:21, Kaushal M написав: Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply. We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This lead to similar observations of huge VIRT and relatively small RSS. I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way. On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote: Hello. We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volumes metadata. Now we observe huge VSZ (VIRT) usage by glustershd on dummy node: === root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === that is ~73G. RSS seems to be OK (~522M). Here is the statedump of glustershd process: [1] Also, here is sum of sizes, presented in statedump: === # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}' 353276406 === That is ~337 MiB. Also, here are VIRT values from 2 replica nodes: === root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87 root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2 === Those are 5 to 6G, which is much less than dummy node has, but still look too big for us. 
Should we care about huge VIRT value on dummy node? Also, how one would debug that? Regards, Oleksandr. [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I've checked shd log files, and found out that for some reason shd constantly reconnects to bricks: [1] Please note that suggested fix [2] by Pranith does not help, VIRT value still grows: === root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === I do not know the reason why it is reconnecting, but I suspect leak to happen on that reconnect. CCing Pranith. [1] http://termbin.com/brob [2] http://review.gluster.org/#/c/14053/ 06.06.2016 12:21, Kaushal M написав: Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply. We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This lead to similar observations of huge VIRT and relatively small RSS. I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way. On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote: Hello. We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volumes metadata. Now we observe huge VSZ (VIRT) usage by glustershd on dummy node: === root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === that is ~73G. RSS seems to be OK (~522M). Here is the statedump of glustershd process: [1] Also, here is sum of sizes, presented in statedump: === # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}' 353276406 === That is ~337 MiB. Also, here are VIRT values from 2 replica nodes: === root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87 root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2 === Those are 5 to 6G, which is much less than dummy node has, but still look too big for us. Should we care about huge VIRT value on dummy node? Also, how one would debug that? Regards, Oleksandr. [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I see lots of entries in pmap output: === 7ef9ff8f3000 4K - [ anon ] 7ef9ff8f4000 8192K rw--- [ anon ] 7efa000f4000 4K - [ anon ] 7efa000f5000 8192K rw--- [ anon ] === If I sum them, I get the following: === # pmap 15109 | grep '[ anon ]' | grep 8192K | wc -l 9261 $ echo "9261*(8192+4)" | bc 75903156 === Which is something like 70G+ I have got in VIRT. 06.06.2016 11:24, Oleksandr Natalenko написав: Hello. We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volumes metadata. Now we observe huge VSZ (VIRT) usage by glustershd on dummy node: === root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === that is ~73G. RSS seems to be OK (~522M). Here is the statedump of glustershd process: [1] Also, here is sum of sizes, presented in statedump: === # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}' 353276406 === That is ~337 MiB. Also, here are VIRT values from 2 replica nodes: === root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87 root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2 === Those are 5 to 6G, which is much less than dummy node has, but still look too big for us. Should we care about huge VIRT value on dummy node? Also, how one would debug that? Regards, Oleksandr. [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
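For what it is worth, the ~8192K anonymous mappings counted above are the size of a default glibc thread stack (ulimit -s is commonly 8 MiB), which fits the thread-reaping theory Kaushal quotes in this thread: threads created joinable but never pthread_join()ed keep their stack mappings, so VIRT balloons while RSS stays small. A minimal, illustrative sketch (not GlusterFS code) of that effect:
===
/* Illustrative only -- not GlusterFS code. Joinable threads that exit but
 * are never pthread_join()ed keep their stack mappings (commonly 8 MiB,
 * i.e. the 8192K anonymous regions seen in pmap), so VIRT grows while RSS
 * stays almost flat. Build with `gcc thread-leak-demo.c -lpthread`. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
        (void)arg;
        return NULL; /* exits immediately, but remains joinable */
}

int main(void)
{
        pthread_t t;
        int i;

        for (i = 0; i < 100; i++) {
                pthread_create(&t, NULL, worker, NULL);
                /* Missing pthread_join(t, NULL) or pthread_detach(t):
                 * each iteration leaves one thread stack mapped. */
        }

        printf("pid %d: try `pmap %d | grep 8192K`\n",
               (int)getpid(), (int)getpid());
        pause();
        return 0;
}
===
Running this and inspecting the process with pmap should show one 8192K anonymous region per un-reaped thread, very much like the output above.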
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
I believe, multi-threaded shd has not been merged at least into 3.7 branch prior to 3.7.11 (incl.), because I've found this [1]. [1] https://www.gluster.org/pipermail/maintainers/2016-April/000628.html 06.06.2016 12:21, Kaushal M написав: Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply. We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This lead to similar observations of huge VIRT and relatively small RSS. I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way. On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote: Hello. We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volumes metadata. Now we observe huge VSZ (VIRT) usage by glustershd on dummy node: === root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === that is ~73G. RSS seems to be OK (~522M). Here is the statedump of glustershd process: [1] Also, here is sum of sizes, presented in statedump: === # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}' 353276406 === That is ~337 MiB. Also, here are VIRT values from 2 replica nodes: === root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87 root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2 === Those are 5 to 6G, which is much less than dummy node has, but still look too big for us. Should we care about huge VIRT value on dummy node? Also, how one would debug that? Regards, Oleksandr. [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Hello. We use v3.7.11 in a replica 2 setup between 2 nodes + 1 dummy node for keeping volume metadata. Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node: === root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7 === that is ~73G. RSS seems to be OK (~522M). Here is the statedump of the glustershd process: [1] Also, here is the sum of the sizes presented in the statedump: === # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}' 353276406 === That is ~337 MiB. Also, here are the VIRT values from the 2 replica nodes: === root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87 root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2 === Those are 5 to 6G, which is much less than the dummy node has, but still looks too big to us. Should we care about the huge VIRT value on the dummy node? Also, how would one debug that? Regards, Oleksandr. [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Idea: Alternate Release process
30.05.2016 05:08, Sankarshan Mukhopadhyay wrote: It would perhaps be worthwhile to extend this release timeline/cadence discussion into (a) End-of-Life definition and invocation (b) whether a 'long term support' (assuming that is what LTS is) is of essentially any value to users of GlusterFS. (b) especially can be (and perhaps should be) addressed by predictable and tested upgrade paths to ensure that users are able to get to newer releases without much hassle. I believe 3.7 should be an LTS with EOL in at least 1 year, because it is the last branch released before the changes to the release process were committed. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Idea: Alternate Release process
My 2 cents on timings etc. Rationale: 1. deliver new features to users as fast as possible to get feedback; 2. leave an option of using an LTS branch for those who do not want to update too often. Definition: * "stable release" — .0 tag that receives critical bugfixes and security updates for 16 weeks; * "LTS" — .0 tag that receives critical bugfixes and security updates for 1 year. A new release happens every 8 weeks. Those 8 weeks include: * a merge window for 3 weeks, during which all features that are ready get merged into master; * feature freeze on -rc1 tagging; * 5 weeks of testing, bugfixing and preparing new features; * tagging the .0 stable release. Example (imaginary versions and dates): March 1 — 5.0 release, merge window opens March 22 — 6.0-rc1 release, merge window closes, feature freeze, new -rc each week May 1 — 6.0 release, merge window opens, 5.0 still gets fixes May 22 — 7.0-rc1 release July 1 — 7.0 release, merge window closes, no more fixes for 5.0, 6.0 still gets fixes ... September 1 — 8.0 release, LTS, EOL is Sep 1 next year. ... Backward compatibility should be guaranteed between two consecutive LTSes by extensive use of op-version. The user should have the possibility to upgrade from one LTS to another, preferably with no downtime. LTS+1 is not guaranteed to be backward compatible with LTS-1. Pros: * frequent releases with new features that do not break backward compatibility; * max 2 stable branches supported simultaneously; * a guaranteed LTS branch with a guaranteed upgrade to the new LTS. Cons: * no idea what to do with things that break backward compatibility and that couldn't be implemented within op-version constraints (except postponing them for too long). ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [RFC] FUSE bridge based on GlusterFS API
11.04.2016 11:14, Niels de Vos wrote: I would like to add a detection for an xglfs executable in the /sbin/mount.glusterfs script. This then makes it possible to replace the original FUSE client with xglfs. If we do something similar in our regression tests, we can get an idea of how stable and feature-complete xglfs is. I believe adding it to the tests prior to adding it to the packaged /sbin/mount.glusterfs script should be the first step. I assume this is like the FIBMAP-ioctl(). So, obviously, we do not need it. We actually do have a flush FOP in the GlusterFS protocol and xlators. But it has not been added to libgfapi. The library calls flush from glfs_close(). I'm not sure we really need to add glfs_flush() to libgfapi, most (all?) applications would likely use glfs_fsync() anyway? I've added a dummy handler for this fop that always returns 0. It shouldn't be a big deal to replace it with an actual implementation if libgfapi gains glfs_flush() support. There is both glfs_fsync() and glfs_fdatasync(). These match their fsync() and fdatasync() counterparts. OK, so they are implemented correctly in xglfs now. * .fsyncdir fop (again, wtf?); This is like calling fsync() on a directory. It guarantees that changes in the directory (new/unlinked files) are persistent. I cannot find a similar function in the GlusterFS API. Should it be implemented first, or are we fine to proceed with a dummy handler returning 0? * WHERE IS MY glfs_truncate()? Almost there, Jeff sent a patch. We just need a bug to link the patch against. Saw that, many thanks to Jeff for accepting the challenge :). Would you be willing to move the GitHub repository under github.com/gluster ? It gives a little more visibility in our community that way. See no issue with that. Ping me on IRC to arrange the move. Regards, post-factum ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
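For reference, here is a rough sketch of how the .flush no-op and the fsync/fdatasync split discussed above could look in a high-level FUSE bridge backed by gfapi. This is illustrative, not a copy of the real xglfs handlers: it assumes FUSE_USE_VERSION 26, gfapi signatures as in the 3.x glfs.h, and a hypothetical helper xglfs_fd() that recovers the glfs_fd_t stashed in fi->fh by the open handler:
===
/* Rough sketch only -- not the actual xglfs code. Assumes the high-level
 * FUSE API (FUSE_USE_VERSION 26), gfapi signatures as in the 3.x glfs.h,
 * and a hypothetical xglfs_fd() helper (see the lead-in above). */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <glusterfs/api/glfs.h>
#include <errno.h>
#include <stdint.h>

static glfs_fd_t *xglfs_fd(struct fuse_file_info *fi)
{
        return (glfs_fd_t *)(uintptr_t)fi->fh;
}

/* gfapi has no glfs_flush() (glfs_close() flushes internally), so the
 * handler can be a no-op that reports success. */
static int xglfs_flush(const char *path, struct fuse_file_info *fi)
{
        (void)path;
        (void)fi;
        return 0;
}

/* FUSE folds fsync() and fdatasync() into one callback with a "datasync"
 * flag, while gfapi exposes two separate calls. */
static int xglfs_fsync(const char *path, int datasync,
                       struct fuse_file_info *fi)
{
        int ret;

        (void)path;
        ret = datasync ? glfs_fdatasync(xglfs_fd(fi))
                       : glfs_fsync(xglfs_fd(fi));

        return ret ? -errno : 0;
}
===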
Re: [Gluster-devel] [RFC] FUSE bridge based on GlusterFS API
On Thursday, 7 April 2016 at 16:12:07 EEST Jeff Darcy wrote: > "Considered wrong" might be overstating the case. It might be useful to > keep in mind that the fuse-bridge code predates GFAPI by a considerable > amount. In fact, significant parts of GFAPI were borrowed from the > existing fuse-bridge code. At the time, there was a choice between > FUSE's path-based and inode-based APIs. Using the path-based API has > always been easier, requiring less code. We can see this effect in the > fact that xglfs is a little less than 1/3 as much code as fuse-bridge. Sure, the API came years after the FUSE bridge, and I understand that. Also, that explains to me the code doubling, where the FUSE bridge is a (completely) separate entity from the API, doing pretty much the same (as I thought) thing. > On the other hand, and again at the time, there were some pretty good > reasons to eschew the path-based API. I don't remember all of those > reasons (some of them predate even my involvement with the project) but > I'm pretty sure performance was chief among them. The path-based API is > completely synchronous, which would have been utterly disastrous for > performance prior to syncops (which of course came later). Even with > syncops, it's not clear whether that gap has been or can be closed. If > we ever wanted to consider switching fully to the path-based API, we'd > certainly need to examine that issue closely. Other issues that > differentiate the two APIs might include: > > * Access to unlinked files (which have no path). > > * Levels of control over various forms of caching. > > * Ability to use reverse invalidation. > > * Ability to support SELinux (which does nasty stuff during mount). > > * Other ops that might be present only in the inode API. > > * Security. > > Perhaps we should ping Miklos Szeredi (kernel FUSE maintainer, now at > Red Hat) about some of these. Also, Soumya pointed me to the handles API (/usr/include/glusterfs/api/glfs-handles.h). If I got it correctly, it could probably be used instead of the path-based API for the FUSE bridge? I have briefly looked at it, but the article about NFS handles (again, supplied to me by Soumya) remains unread so far :). Does the handles API represent the inode API you are talking about? Then, also, we shouldn't use the high-level FUSE API and should stick to the low-level one instead, as it (AFAIK, correct me if I'm wrong) operates on inodes as well. > > * FUSE .bmap fop (wtf is that?); > > * trickery with .fgetattr (do we need that trickery?); > > Not sure what you mean here. Do you mean fgetxattr? I mean .fgetattr calling .getattr for / and glfs_fstat() for everything else. Not sure why it happens. The BBFS code says: === // On FreeBSD, trying to do anything with the mountpoint ends up // opening it, and then using the FD for an fgetattr. So in the // special case of a path of "/", I need to do a getattr on the // underlying root directory instead of doing the fgetattr(). === I just wanted to note that (in case my toy could become portable across *nixes). > > > * .flush fop (no GlusterFS equivalent?); > > * fsync/fdatasync difference for GlusterFS; > > * .fsyncdir fop (again, wtf?); > > I suspect these are related to the path-based vs. inode-based issue. > Fact is, the VFS calls and syscalls have never lined up entirely, and it > shows up in differences like these. I definitely need help with the handles API if that is the right thing to look into. > That just seems like a bug. There should be one. That is definitely the bug.
/usr/include/glusterfs/api/glfs.h clearly defines it: === int glfs_truncate (glfs_t *fs, const char *path, off_t length) __THROW GFAPI_PUBLIC(glfs_truncate, 3.4.0); === But linking an executable with a call to glfs_truncate() results in an error: === CMakeFiles/xglfs.dir/xglfs_truncate.c.o: In function `xglfs_truncate': /home/pf/work/devel/own/xglfs/xglfs_truncate.c:31: undefined reference to `glfs_truncate' === The bug was discussed more than a year ago [1], but it seems there is no solution so far. Thanks. Regards, post-factum [1] http://irclog.perlgeek.de/gluster-dev/2015-01-25/text ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
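Until the glfs_truncate() symbol is actually exported, one possible workaround (a sketch under the assumption of 3.x gfapi signatures, not necessarily what xglfs should do) is to emulate it with glfs_open() + glfs_ftruncate() + glfs_close(), which are all exported:
===
/* Sketch of a possible workaround while the glfs_truncate() symbol is not
 * exported: emulate it with glfs_open() + glfs_ftruncate() + glfs_close().
 * Signatures are assumed as in the 3.x api/glfs.h; error handling is
 * deliberately minimal. */
#include <glusterfs/api/glfs.h>
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>

static int xglfs_truncate_fallback(glfs_t *fs, const char *path, off_t length)
{
        glfs_fd_t *fd;
        int ret;

        fd = glfs_open(fs, path, O_WRONLY);
        if (!fd)
                return -errno;

        ret = glfs_ftruncate(fd, length);
        if (ret)
                ret = -errno;

        glfs_close(fd);
        return ret;
}
===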
[Gluster-devel] [RFC] FUSE bridge based on GlusterFS API
Hello. Pranith and Niels encouraged me to share here my toy project of a simple FUSE bridge that uses the GlusterFS API [1]. The rationale for this is that the FUSE bridge currently present in the GlusterFS code does not use the GlusterFS API, that is considered to be wrong, and there are some plans to replace it with a modern solution. The xglfs code could potentially go under the glusterfs tree if the developers decide that should happen. Also, it could be rpm'ed and offered to Fedora users. For now xglfs is just a separate executable that relies on the glusterfs-devel and fuse-devel packages and does a simple conversion between FUSE VFS calls and the GlusterFS API. Thanks to the API's completeness (well, glfs_truncate() is an exception, AFAIK), this custom bridge is really thin and small. As a guide I used the Big Brother File System code by Joseph J. Pfeiffer, Jr. [2], which is freely available on the Internet (version 2014-06-12, but a newer version has been released recently). However, I've adapted it to the current FUSE libs reality just by inspecting /usr/include/fuse/fuse.h carefully and defining FUSE_USE_VERSION=26 explicitly. What I would like reviewers to pay attention to: * error path handling correctness (mostly, a negated errno value is returned — is that correct?); * fops semantic correctness; * everything else you would like to comment on or suggest. The code itself has been verified by GCC, Clang (+analyzer), Intel C Compiler, cppcheck and Valgrind. No idea what could go wrong there :). However, I'm not responsible for data damage caused by this project, of course. Some things remain not so clear to me: * FUSE .bmap fop (wtf is that?); * trickery with .fgetattr (do we need that trickery?); * .flush fop (no GlusterFS equivalent?); * fsync/fdatasync difference for GlusterFS; * .fsyncdir fop (again, wtf?); * WHERE IS MY glfs_truncate()? Feel free to happily accept this project or ignore it silently. Nevertheless, I would be happy to see your pull requests or comments, or even the results of some test you might want to perform on your critical production. Also, I know that Soumya has already tried xglfs, and I would be glad if she shared some experience with it. Best wishes, post-factum [1] https://github.com/pfactum/xglfs [2] http://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
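Regarding the error-path question above: with the high-level FUSE API, returning the negated errno value from a handler is indeed the expected convention. Below is a minimal, hypothetical sketch of that convention for a gfapi-backed getattr handler — it assumes the glfs_t handle was handed to fuse_main() as user data and is therefore reachable via fuse_get_context()->private_data, which is a common BBFS-style pattern and not necessarily how xglfs wires it up:
===
/* Minimal sketch of the "return -errno on failure" convention for a
 * high-level FUSE handler backed by gfapi. Hypothetical code: the glfs_t
 * handle is assumed to live in fuse_get_context()->private_data. */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <glusterfs/api/glfs.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static int xglfs_getattr(const char *path, struct stat *stbuf)
{
        glfs_t *fs = fuse_get_context()->private_data;

        memset(stbuf, 0, sizeof(*stbuf));

        /* gfapi calls return -1 and set errno on failure; the high-level
         * FUSE API expects the handler to return the negated errno. */
        if (glfs_lstat(fs, path, stbuf) == -1)
                return -errno;

        return 0;
}
===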
Re: [Gluster-devel] [Gluster-users] Arbiter brick size estimation
And for 256b inodes: (597904 - 33000) / (1066036 - 23) == 530 bytes per inode. So I still consider 1k to be a good estimate for an average workload. Regards, Oleksandr. On Thursday, 17 March 2016 at 09:58:14 EET Ravishankar N wrote: > Looks okay to me Oleksandr. You might want to make a github gist of your > tests+results as a reference for others. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Arbiter brick size estimation
Ravi, I will definitely arrange the results into some short handy document and post it here. Also, @JoeJulian on IRC suggested that I perform this test on XFS bricks with an inode size of 256b and 1k: === 22:38 <@JoeJulian> post-factum: Just wondering what 256 byte inodes might look like for that. And, by the same token, 1k inodes. 22:39 < post-factum> JoeJulian: should I try 1k inodes instead? 22:41 <@JoeJulian> post-factum: Doesn't hurt to try. My expectation is that disk usage will go up despite inode usage going down. 22:41 < post-factum> JoeJulian: ok, will check that 22:41 <@JoeJulian> post-factum: and with 256, I'm curious if inode usage will stay close to the same while disk usage goes down. === Here are the results for 1k: (1171336 - 33000) / (1066036 - 23) == 1068 bytes per inode. Disk usage is indeed higher (1.2G), but inode usage is the same. Will test with 256b inodes now. 17.03.2016 06:28, Ravishankar N wrote: Looks okay to me Oleksandr. You might want to make a github gist of your tests+results as a reference for others. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Arbiter brick size estimation
OK, I've repeated the test with the following hierarchy: * 10 top-level folders with 10 second-level folders each; * 10 000 files in each second-level folder. So, this composes 10×10×10 000 = 1M files and 100 folders. Initial brick used space: 33M. Initial inodes count: 24. After the test: * each brick in the replica took 18G, and the arbiter brick took 836M; * inodes count: 1066036. So: (836 - 33) / (1066036 - 24) == 790 bytes per inode. So, yes, it is a slightly bigger value than with the previous test due to, I guess, lots of files in one folder, but it is still far from 4k. Given that a good engineer should consider a 30% reserve, the ratio is about 1k per stored inode. Correct me if I'm missing something (regarding an average workload and not corner cases). The test script is here: [1] Regards, Oleksandr. [1] http://termbin.com/qlvz On Tuesday, 8 March 2016 at 19:13:05 EET Ravishankar N wrote: > On 03/05/2016 03:45 PM, Oleksandr Natalenko wrote: > > In order to estimate GlusterFS arbiter brick size, I've deployed test > > setup > > with replica 3 arbiter 1 volume within one node. Each brick is located > > on > > separate HDD (XFS with inode size == 512). Using GlusterFS v3.7.6 + > > memleak > > patches. Volume options are kept default. > > > > Here is the script that creates files and folders in mounted volume: [1] > > > > The script creates 1M of files of random size (between 1 and 32768 bytes) > > and some amount of folders. After running it I've got 1036637 folders. > > So, in total it is 2036637 files and folders. > > > > The initial used space on each brick is 42M. After running script I've > > got: > > > > replica brick 1 and 2: 19867168 kbytes == 19G > > arbiter brick: 1872308 kbytes == 1.8G > > > > The amount of inodes on each brick is 3139091. So here goes estimation. > > > > Dividing arbiter used space by files+folders we get: > > > > (1872308 - 42000)/2036637 == 899 bytes per file or folder > > > > Dividing arbiter used space by inodes we get: > > > > (1872308 - 42000)/3139091 == 583 bytes per inode > > > > Not sure about what calculation is correct. > I think the first one is right because you still haven't used up all the > inodes (2036637 used vs. the max. permissible 3139091). But again this > is an approximation because not all files would be 899 bytes. For > example, if there are a thousand files present in a directory, then du of > the directory would be more than du of the files because the directory > will take some disk space to store the dentries. > > I guess we should consider the one > > that accounts inodes because of .glusterfs/ folder data. > > Nevertheless, in contrast, documentation [2] says it should be 4096 bytes > > per file. Am I wrong with my calculations? > The 4KB is a conservative estimate considering the fact that though the > arbiter brick does not store data, it still keeps a copy of both user > and gluster xattrs. For example, if the application sets a lot of > xattrs, it can consume a data block if they cannot be accommodated on > the inode itself. Also there is the .glusterfs folder like you said > which would take up some space.
Here is what I tried on an XFS brick: > [root@ravi4 brick]# touch file > > [root@ravi4 brick]# ls -l file > -rw-r--r-- 1 root root 0 Mar 8 12:54 file > > [root@ravi4 brick]# du file > *0 file** > * > [root@ravi4 brick]# for i in {1..100} > > > do > > setfattr -n user.value$i -v value$i file > > done > > [root@ravi4 brick]# ll -l file > -rw-r--r-- 1 root root 0 Mar 8 12:54 file > > [root@ravi4 brick]# du -h file > *4.0Kfile** > * > Hope this helps, > Ravi > > > Pranith? > > > > [1] http://termbin.com/ka9x > > [2] > > http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-vo > > lumes-and-quorum/ ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Arbiter brick size estimation
Ravi, here is the summary: [1] Regards, Oleksandr. [1] https://gist.github.com/e8265ca07f7b19f30bb3 On четвер, 17 березня 2016 р. 09:58:14 EET Ravishankar N wrote: > On 03/16/2016 10:57 PM, Oleksandr Natalenko wrote: > > OK, I've repeated the test with the following hierarchy: > > > > * 10 top-level folders with 10 second-level folders each; > > * 10 000 files in each second-level folder. > > > > So, this composes 10×10×1=1M files and 100 folders > > > > Initial brick used space: 33 M > > Initial inodes count: 24 > > > > After test: > > > > * each brick in replica took 18G, and the arbiter brick took 836M; > > * inodes count: 1066036 > > > > So: > > > > (836 - 33) / (1066036 - 24) == 790 bytes per inode. > > > > So, yes, it is slightly bigger value than with previous test due to, I > > guess, lots of files in one folder, but it is still too far from 4k. > > Given a good engineer should consider 30% reserve, the ratio is about 1k > > per stored inode. > > > > Correct me if I'm missing something (regarding average workload and not > > corner cases). > > Looks okay to me Oleksandr. You might want to make a github gist of your > tests+results as a reference for others. > Regards, > Ravi > > > Test script is here: [1] > > > > Regards, > > > >Oleksandr. > > > > [1] http://termbin.com/qlvz > > > > On вівторок, 8 березня 2016 р. 19:13:05 EET Ravishankar N wrote: > >> On 03/05/2016 03:45 PM, Oleksandr Natalenko wrote: > >>> In order to estimate GlusterFS arbiter brick size, I've deployed test > >>> setup > >>> with replica 3 arbiter 1 volume within one node. Each brick is located > >>> on > >>> separate HDD (XFS with inode size == 512). Using GlusterFS v3.7.6 + > >>> memleak > >>> patches. Volume options are kept default. > >>> > >>> Here is the script that creates files and folders in mounted volume: [1] > >>> > >>> The script creates 1M of files of random size (between 1 and 32768 > >>> bytes) > >>> and some amount of folders. After running it I've got 1036637 folders. > >>> So, in total it is 2036637 files and folders. > >>> > >>> The initial used space on each brick is 42M . After running script I've > >>> got: > >>> > >>> replica brick 1 and 2: 19867168 kbytes == 19G > >>> arbiter brick: 1872308 kbytes == 1.8G > >>> > >>> The amount of inodes on each brick is 3139091. So here goes estimation. > >>> > >>> Dividing arbiter used space by files+folders we get: > >>> > >>> (1872308 - 42000)/2036637 == 899 bytes per file or folder > >>> > >>> Dividing arbiter used space by inodes we get: > >>> > >>> (1872308 - 42000)/3139091 == 583 bytes per inode > >>> > >>> Not sure about what calculation is correct. > >> > >> I think the first one is right because you still haven't used up all the > >> inodes.(2036637 used vs. the max. permissible 3139091). But again this > >> is an approximation because not all files would be 899 bytes. For > >> example if there are a thousand files present in a directory, then du > >> would be more than du because the directory will take > >> some disk space to store the dentries. > >> > >>>I guess we should consider the one > >>> > >>> that accounts inodes because of .glusterfs/ folder data. > >>> > >>> Nevertheless, in contrast, documentation [2] says it should be 4096 > >>> bytes > >>> per file. Am I wrong with my calculations? > >> > >> The 4KB is a conservative estimate considering the fact that though the > >> arbiter brick does not store data, it still keeps a copy of both user > >> and gluster xattrs. 
For example, if the application sets a lot of > >> xattrs, it can consume a data block if they cannot be accommodated on > >> the inode itself. Also there is the .glusterfs folder like you said > >> which would take up some space. Here is what I tried on an XFS brick: > >> [root@ravi4 brick]# touch file > >> > >> [root@ravi4 brick]# ls -l file > >> -rw-r--r-- 1 root root 0 Mar 8 12:54 file > >> > >> [root@ravi4 brick]# du file > >> *0 file** > >> * > >> [root@ravi4 brick]# for i in {1..100} > >> > >> > do > >> > setfattr -n user.value$i -v value$i file > >> > done > >> > >> [root@ravi4 brick]# ll -l file > >> -rw-r--r-- 1 root root 0 Mar 8 12:54 file > >> > >> [root@ravi4 brick]# du -h file > >> *4.0Kfile** > >> * > >> Hope this helps, > >> Ravi > >> > >>> Pranith? > >>> > >>> [1] http://termbin.com/ka9x > >>> [2] > >>> http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-v > >>> o > >>> lumes-and-quorum/ ___ > >>> Gluster-devel mailing list > >>> Gluster-devel@gluster.org > >>> http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Arbiter brick size estimation
Hi. On вівторок, 8 березня 2016 р. 19:13:05 EET Ravishankar N wrote: > I think the first one is right because you still haven't used up all the > inodes (2036637 used vs. the max. permissible 3139091). But again this > is an approximation because not all files would be 899 bytes. For > example if there are a thousand files present in a directory, then du > would be more than du because the directory will take > some disk space to store the dentries. I believe you've got me wrong. 2036637 is the number of files+folders. 3139091 is the number of inodes actually allocated on the underlying FS (according to df -i information). The max. inodes number is much higher than that, and I do not take it into account. Also, I should probably recheck the results for 1000 files per folder just to make sure. > The 4KB is a conservative estimate considering the fact that though the > arbiter brick does not store data, it still keeps a copy of both user > and gluster xattrs. For example, if the application sets a lot of > xattrs, it can consume a data block if they cannot be accommodated on > the inode itself. Also there is the .glusterfs folder like you said > which would take up some space. Here is what I tried on an XFS brick: 4KB as an upper bound sounds reasonable to me, thanks. But the average value will still be lower, I believe, as it is uncommon for apps to set lots of xattrs, especially in an ordinary deployment. Regards, Oleksandr. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Arbiter brick size estimation
In order to estimate the GlusterFS arbiter brick size, I've deployed a test setup with a replica 3 arbiter 1 volume within one node. Each brick is located on a separate HDD (XFS with inode size == 512). Using GlusterFS v3.7.6 + memleak patches. Volume options are kept default. Here is the script that creates files and folders in the mounted volume: [1] The script creates 1M files of random size (between 1 and 32768 bytes) and some amount of folders. After running it I've got 1036637 folders. So, in total it is 2036637 files and folders. The initial used space on each brick is 42M. After running the script I've got: replica bricks 1 and 2: 19867168 kbytes == 19G arbiter brick: 1872308 kbytes == 1.8G The amount of inodes on each brick is 3139091. So here goes the estimation. Dividing arbiter used space by files+folders we get: (1872308 - 42000)/2036637 == 899 bytes per file or folder Dividing arbiter used space by inodes we get: (1872308 - 42000)/3139091 == 583 bytes per inode Not sure which calculation is correct. I guess we should consider the one that accounts for inodes, because of the .glusterfs/ folder data. Nevertheless, in contrast, documentation [2] says it should be 4096 bytes per file. Am I wrong with my calculations? Pranith? [1] http://termbin.com/ka9x [2] http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
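For what it's worth, the same per-inode figure can be derived programmatically from the arbiter brick's filesystem statistics; this is a rough helper sketch (assumed, not part of the script at [1]) that samples statvfs() before and after the file-creation run:
===
/* Rough sketch: snapshot free blocks/inodes on the arbiter brick, wait until
 * the file-creation script has finished, snapshot again and print the average
 * number of bytes consumed per newly allocated inode. */
#include <stdio.h>
#include <sys/statvfs.h>

int main(int argc, char **argv)
{
    struct statvfs before, after;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <arbiter-brick-path>\n", argv[0]);
        return 1;
    }

    if (statvfs(argv[1], &before) != 0) { perror("statvfs"); return 1; }
    fprintf(stderr, "run the file-creation script, then press Enter\n");
    getchar();
    if (statvfs(argv[1], &after) != 0) { perror("statvfs"); return 1; }

    unsigned long long used_bytes =
        (unsigned long long)(before.f_bfree - after.f_bfree) * after.f_frsize;
    unsigned long long used_inodes = before.f_ffree - after.f_ffree;

    if (used_inodes)
        printf("%llu bytes per allocated inode on the arbiter brick\n",
               used_bytes / used_inodes);
    return 0;
}
===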
Re: [Gluster-devel] 3.7.8 client is slow
David, could you please cross-post your observations to the following bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1309462 ? It seems you have faced a similar issue. On понеділок, 22 лютого 2016 р. 16:46:01 EET David Robinson wrote: > The 3.7.8 FUSE client is significantly slower than 3.7.6. Is this > related to some of the fixes that were done to correct memory leaks? Is > there anything that I can do to recover the performance of 3.7.6? > > My testing involved creating a "bigfile" that is 20GB. I then installed > the 3.6.6 FUSE client and tested the copy of the bigfile from one > gluster machine to another. The test was repeated 2x to make sure cache > wasn't affecting performance. > > Using CentOS 7.1 > FUSE 3.6.6 took 47 seconds and 38 seconds. > FUSE 3.7.6 took 43 seconds and 34 seconds. > FUSE 3.7.8 took 205 seconds and 224 seconds > > I repeated the test on another machine that is running CentOS 6.7 and > the results were even worse: 98 seconds for FUSE 3.6.6 versus > 575 seconds for FUSE 3.7.8. > > My server setup is: > > Volume Name: gfsbackup > Type: Distribute > Volume ID: 29b8fae9-dfbf-4fa4-9837-8059a310669a > Status: Started > Number of Bricks: 2 > Transport-type: tcp > Bricks: > Brick1: ffib01bkp:/data/brick01/gfsbackup > Brick2: ffib01bkp:/data/brick02/gfsbackup > Options Reconfigured: > performance.readdir-ahead: on > cluster.rebal-throttle: aggressive > diagnostics.client-log-level: WARNING > diagnostics.brick-log-level: WARNING > changelog.changelog: off > client.event-threads: 8 > server.event-threads: 8 > > David > > > > > > > > David F. Robinson, Ph.D. > > President - Corvid Technologies > > 145 Overhill Drive > > Mooresville, NC 28117 > > 704.799.6944 x101 [Office] > > 704.252.1310 [Cell] > > 704.799.7974 [Fax] > > david.robin...@corvidtec.com > > http://www.corvidtec.com ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS v3.7.8 client leaks summary — part II
Hmm, OK. I've rechecked 3.7.8 with the following patches (latest revisions): === Soumya Koduri (3): gfapi: Use inode_forget in case of handle objects inode: Retire the inodes from the lru list in inode_table_destroy rpc: Fix for rpc_transport_t leak === Here is Valgrind output: [1] It seems that all leaks are gone, and that is very nice. Many thanks to all devs. [1] https://gist.github.com/anonymous/eddfdaf3eb7bff458326 16.02.2016 15:30, Soumya Koduri wrote: I have tested using your API app (I/Os done - create,write and stat). I still do not see any inode related leaks. However I posted another fix for rpc_transport object related leak [1]. I request you to re-check if you have the latest patch of [2] applied in your build. [1] http://review.gluster.org/#/c/13456/ [2] http://review.gluster.org/#/c/13125/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS v3.7.8 client leaks summary — part II
And "API" test. I used custom API app [1] and did brief file manipulations through it (create/remove/stat). Then I performed drop_caches, finished API [2] and got the following Valgrind log [3]. I believe there are still some leaks occurring in glfs_lresolve() call chain. Soumya? [1] https://github.com/pfactum/xglfs [2] https://github.com/pfactum/xglfs/blob/master/xglfs_destroy.c#L30 [3] https://gist.github.com/aec72b6164a695cf2d44 11.02.2016 10:12, Oleksandr Natalenko написав: And here goes "rsync" test results (v3.7.8 + two patches by Soumya). 2 volumes involved: source and target. === Common indicators === slabtop before drop_caches: [1] slabtop after drop_caches: [2] === Source volume (less interesting part) === RAM usage before drop_caches: [3] statedump before drop_caches: [4] RAM usage after drop_caches: [5] statedump after drop_caches: [6] === Target volume (most interesting part) === RAM usage before drop_caches: [7] statedump before drop_caches: [8] RAM usage after drop_caches: [9] statedump after drop_caches: [10] Valgrind output: [11] === Conclusion === Again, see no obvious leaks. [1] https://gist.github.com/e72fd30a4198dd630299 [2] https://gist.github.com/78ef9eae3dc16fd79c1b [3] https://gist.github.com/4ed75e8d6cb40a1369d8 [4] https://gist.github.com/20a75d32db76795b90d4 [5] https://gist.github.com/0772959834610dfdaf2d [6] https://gist.github.com/a71684bd3745c77c41eb [7] https://gist.github.com/2c9be083cfe3bffe6cec [8] https://gist.github.com/0102a16c94d3d8eb82e3 [9] https://gist.github.com/23f057dc8e4b2902bba1 [10] https://gist.github.com/385bbb95ca910ec9766f [11] https://gist.github.com/685c4d3e13d31f597722 10.02.2016 15:37, Oleksandr Natalenko написав: Hi, folks. Here go new test results regarding client memory leak. I use v3.7.8 with the following patches: === Soumya Koduri (2): inode: Retire the inodes from the lru list in inode_table_destroy gfapi: Use inode_forget in case of handle objects === Those are the only 2 not merged yet. So far, I've performed only "find" test, and here are the results: RAM usage before drop_caches: [1] statedump before drop_caches: [2] slabtop before drop_caches: [3] RAM usage after drop_caches: [4] statedump after drop_caches: [5] slabtop after drop_caches: [6] Valgrind output: [7] No leaks either via statedump or via valgrind. However, statedump stats still suffer from integer overflow. Next steps I'm going to take: 1) "rsync" test; 2) API test. [1] https://gist.github.com/88d2fa95c28baeb2543f [2] https://gist.github.com/4f3e93ff2db6e3cf4081 [3] https://gist.github.com/62791a2c4258041ba821 [4] https://gist.github.com/1d3ce95a493d054bbac2 [5] https://gist.github.com/fa855a2752d3691365a7 [6] https://gist.github.com/84e9e27d2a2e5ff5dc33 [7] https://gist.github.com/f35bd32a5159d3571d3a ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GlusterFS v3.7.8 client leaks summary — part II
And here goes "rsync" test results (v3.7.8 + two patches by Soumya). 2 volumes involved: source and target. === Common indicators === slabtop before drop_caches: [1] slabtop after drop_caches: [2] === Source volume (less interesting part) === RAM usage before drop_caches: [3] statedump before drop_caches: [4] RAM usage after drop_caches: [5] statedump after drop_caches: [6] === Target volume (most interesting part) === RAM usage before drop_caches: [7] statedump before drop_caches: [8] RAM usage after drop_caches: [9] statedump after drop_caches: [10] Valgrind output: [11] === Conclusion === Again, see no obvious leaks. [1] https://gist.github.com/e72fd30a4198dd630299 [2] https://gist.github.com/78ef9eae3dc16fd79c1b [3] https://gist.github.com/4ed75e8d6cb40a1369d8 [4] https://gist.github.com/20a75d32db76795b90d4 [5] https://gist.github.com/0772959834610dfdaf2d [6] https://gist.github.com/a71684bd3745c77c41eb [7] https://gist.github.com/2c9be083cfe3bffe6cec [8] https://gist.github.com/0102a16c94d3d8eb82e3 [9] https://gist.github.com/23f057dc8e4b2902bba1 [10] https://gist.github.com/385bbb95ca910ec9766f [11] https://gist.github.com/685c4d3e13d31f597722 10.02.2016 15:37, Oleksandr Natalenko написав: Hi, folks. Here go new test results regarding client memory leak. I use v3.7.8 with the following patches: === Soumya Koduri (2): inode: Retire the inodes from the lru list in inode_table_destroy gfapi: Use inode_forget in case of handle objects === Those are the only 2 not merged yet. So far, I've performed only "find" test, and here are the results: RAM usage before drop_caches: [1] statedump before drop_caches: [2] slabtop before drop_caches: [3] RAM usage after drop_caches: [4] statedump after drop_caches: [5] slabtop after drop_caches: [6] Valgrind output: [7] No leaks either via statedump or via valgrind. However, statedump stats still suffer from integer overflow. Next steps I'm going to take: 1) "rsync" test; 2) API test. [1] https://gist.github.com/88d2fa95c28baeb2543f [2] https://gist.github.com/4f3e93ff2db6e3cf4081 [3] https://gist.github.com/62791a2c4258041ba821 [4] https://gist.github.com/1d3ce95a493d054bbac2 [5] https://gist.github.com/fa855a2752d3691365a7 [6] https://gist.github.com/84e9e27d2a2e5ff5dc33 [7] https://gist.github.com/f35bd32a5159d3571d3a ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] GlusterFS v3.7.8 client leaks summary — part II
Hi, folks. Here go new test results regarding client memory leak. I use v3.7.8 with the following patches: === Soumya Koduri (2): inode: Retire the inodes from the lru list in inode_table_destroy gfapi: Use inode_forget in case of handle objects === Those are the only 2 not merged yet. So far, I've performed only "find" test, and here are the results: RAM usage before drop_caches: [1] statedump before drop_caches: [2] slabtop before drop_caches: [3] RAM usage after drop_caches: [4] statedump after drop_caches: [5] slabtop after drop_caches: [6] Valgrind output: [7] No leaks either via statedump or via valgrind. However, statedump stats still suffer from integer overflow. Next steps I'm going to take: 1) "rsync" test; 2) API test. [1] https://gist.github.com/88d2fa95c28baeb2543f [2] https://gist.github.com/4f3e93ff2db6e3cf4081 [3] https://gist.github.com/62791a2c4258041ba821 [4] https://gist.github.com/1d3ce95a493d054bbac2 [5] https://gist.github.com/fa855a2752d3691365a7 [6] https://gist.github.com/84e9e27d2a2e5ff5dc33 [7] https://gist.github.com/f35bd32a5159d3571d3a ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I
Here goes the report on DHT-related leaks patch ("rsync" test). RAM usage before drop_caches: [1] Statedump before drop_caches: [2] RAM usage after drop_caches: [3] Statedump after drop_caches: [4] Statedumps diff: [5] Valgrind output: [6] [1] https://gist.github.com/ca8d56834c14c4bfa98e [2] https://gist.github.com/06dc910d7261750d486c [3] https://gist.github.com/c482b170848a21b6e5f3 [4] https://gist.github.com/ed7f56336b4cbf39f7e8 [5] https://gist.github.com/f8597f34b56d949f7dcb [6] https://gist.github.com/102fc2d2dfa2d2d179fa I guess, the patch works. 29.01.2016 23:11, Vijay Bellur написав: On 01/29/2016 01:09 PM, Oleksandr Natalenko wrote: Here is intermediate summary of current memory leaks in FUSE client investigation. I use GlusterFS v3.7.6 release with the following patches: === Kaleb S KEITHLEY (1): fuse: use-after-free fix in fuse-bridge, revisited Pranith Kumar K (1): mount/fuse: Fix use-after-free crash Soumya Koduri (3): gfapi: Fix inode nlookup counts inode: Retire the inodes from the lru list in inode_table_destroy upcall: free the xdr* allocations === With those patches we got API leaks fixed (I hope, brief tests show that) and got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE client still leaks. I have several test volumes with several million of small files (100K…2M in average). I do 2 types of FUSE client testing: 1) find /mnt/volume -type d 2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/ And most up-to-date results are shown below: === find /mnt/volume -type d === Memory consumption: ~4G Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d I guess, fuse-bridge/fuse-resolve. related. === rsync -av -H /mnt/source_volume/* /mnt/target_volume/ === Memory consumption: ~3.3...4G Statedump (target volume): https://gist.github.com/31e43110eaa4da663435 Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a I guess, DHT-related. Give me more patches to test :). Thank you as ever for your detailed reports! This patch should help the dht leaks observed as part of dht_do_rename() in valgrind logs of target volume. http://review.gluster.org/#/c/13322/ Can you please verify if this indeed helps? Regards, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I
02.02.2016 10:07, Xavier Hernandez написав: Could it be memory used by Valgrind itself to track glusterfs' memory usage ? Could you repeat the test without Valgrind and see if the memory usage after dropping caches returns to low values ? Yup. Here are the results: === pf@server:~ » ps aux | grep volume root 19412 14.4 10.0 5416964 4971692 ? Ssl 10:15 36:32 /usr/sbin/glusterfs --volfile-server=server.example.com --volfile-id=volume /mnt/volume pf@server:~ » echo 2 | sudo tee /proc/sys/vm/drop_caches 2 pf@server:~ » ps aux | grep volume root 19412 13.6 3.5 2336772 1740804 ? Ssl 10:15 36:53 /usr/sbin/glusterfs --volfile-server=server.example.com --volfile-id=volume /mnt/volume === Dropped from 4.9G to 1.7G. But fresh mount consumes only 25M (megabytes): === root 23347 0.7 0.0 698376 25124 ?Ssl 14:49 0:00 /usr/sbin/glusterfs --volfile-server=server.example.com --volfile-id=volume /mnt/volume === Why? Examining statedump shows only the following snippet with high "size" value: === [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage] size=4234592647 num_allocs=1 max_size=4294935223 max_num_allocs=3 total_allocs=4186991 === Another leak? Grepping "gf_fuse_mt_iov_base" on GlusterFS source tree shows the following: === $ grep -Rn gf_fuse_mt_iov_base xlators/mount/fuse/src/fuse-mem-types.h:20: gf_fuse_mt_iov_base, xlators/mount/fuse/src/fuse-bridge.c:4887: gf_fuse_mt_iov_base); === fuse-bridge.c snippet: === /* Add extra 128 byte to the first iov so that it can * accommodate "ordinary" non-write requests. It's not * guaranteed to be big enough, as SETXATTR and namespace * operations with very long names may grow behind it, * but it's good enough in most cases (and we can handle * rest via realloc). */ iov_in[0].iov_base = GF_CALLOC (1, msg0_size, gf_fuse_mt_iov_base); === Probably, some freeing missing for iov_base? This is not a real memory leak. It's only a bad accounting of memory. Note that num_allocs is 1. If you look at libglusterfs/src/mem-pool.c, you will see this: /* TBD: it would be nice to adjust the memory accounting info here, * but calling gf_mem_set_acct_info here is wrong because it bumps * up counts as though this is a new allocation - which it's not. * The consequence of doing nothing here is only that the sizes will be * wrong, but at least the counts won't be. uint32_t type = 0; xlator_t *xl = NULL; type = header->type; xl = (xlator_t *) header->xlator; gf_mem_set_acct_info (xl, &new_ptr, size, type, NULL); */ This means that memory reallocs are not correctly accounted, so the tracked size is incorrect (note that fuse_thread_proc() calls GF_REALLOC() in some cases). There are two problems here: 1. The memory is allocated with a given size S1, then reallocated with a size S2 (S2 > S1), but not accounted, so the memory accounting system still thinks that the allocated size is S1. When memory is freed, S2 is substracted from the total size used. With enough allocs/reallocs/frees, this value becomes negative. 2. statedump shows the 64-bit 'size' field representing the total memory used by a given type as an unsigned 32-bit value, loosing some information. Xavi [1] https://gist.github.com/f0cf98e8bff0c13ea38f [2] https://gist.github.com/87baa0a778ba54f0f7f7 [3] https://gist.github.com/7013b493d19c8c5fffae [4] https://gist.github.com/cc38155b57e68d7e86d5 [5] https://gist.github.com/6a24000c77760a97976a [6] https://gist.github.com/74bd7a9f734c2fd21c33 On понеділок, 1 лютого 2016 р. 14:24:22 EET Soumya Koduri wrote: On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote: Wait. 
It seems to be my bad. Before unmounting I do drop_caches (2), and glusterfs process CPU usage goes to 100% for a while. I haven't waited for it to drop to 0%, and instead perform unmount. It seems glusterfs is purging inodes and that's why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to become normal, and got no leaks. Will verify this once again and report more. BTW, if that works, how could I limit inode cache for FUSE client? I do not want it to go beyond 1G, for example, even if I have 48G of RAM on my server. Its hard-coded for now. For fuse the lru limit (of the inodes which are not active) is (32*1024). One of the ways to address this (which we were discussing earlier) is to have an option to configure inode cache limit. If that sounds good, we can then check on if it has to be global/volume-level, client/server/both. Thanks, Soumya 01.02.2016 09:54, Soumya Koduri написав: On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote: Unfortunately, this patch doesn't h
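The effect Xavi describes can be modelled in a few lines; the block below is only a toy model with made-up sizes (not the real libglusterfs accounting code), but it shows why an unaccounted realloc ends up as a value close to 2^32 in the statedump:
===
/* Toy model of the accounting gap: the allocation records s1, the realloc to
 * s2 is never re-recorded, and the free subtracts the real size s2, so the
 * per-type counter drops "below zero" and prints as a huge number when shown
 * as an unsigned 32-bit value. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t type_size = 0;        /* per-type "size" field in the statedump   */
    size_t   s1 = 4096;            /* made-up size recorded at GF_CALLOC time  */
    size_t   s2 = 128 * 1024;      /* made-up size after GF_REALLOC            */

    type_size += s1;               /* the allocation is accounted              */
                                   /* ...GF_REALLOC grows it, nothing recorded */
    type_size -= s2;               /* the free subtracts the real, bigger size */

    printf("64-bit counter    : %" PRIu64 "\n", type_size);
    printf("shown as uint32_t : %" PRIu32 "\n", (uint32_t)type_size);
    return 0;
}
===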
Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I
Please take a look at updated test results. Test: find /mnt/volume -type d RAM usage after "find" finishes: ~ 10.8G (see "ps" output [1]). Statedump after "find" finishes: [2]. Then I did drop_caches, and RAM usage dropped to ~4.7G [3]. Statedump after drop_caches: [4]. Here is diff between statedumps: [5]. And, finally, Valgrind output: [6]. Definitely, no major leaks on exit, but why glusterfs process uses almost 5G of RAM after drop_caches? Examining statedump shows only the following snippet with high "size" value: === [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage] size=4234592647 num_allocs=1 max_size=4294935223 max_num_allocs=3 total_allocs=4186991 === Another leak? Grepping "gf_fuse_mt_iov_base" on GlusterFS source tree shows the following: === $ grep -Rn gf_fuse_mt_iov_base xlators/mount/fuse/src/fuse-mem-types.h:20:gf_fuse_mt_iov_base, xlators/mount/fuse/src/fuse-bridge.c:4887: gf_fuse_mt_iov_base); === fuse-bridge.c snippet: === /* Add extra 128 byte to the first iov so that it can * accommodate "ordinary" non-write requests. It's not * guaranteed to be big enough, as SETXATTR and namespace * operations with very long names may grow behind it, * but it's good enough in most cases (and we can handle * rest via realloc). */ iov_in[0].iov_base = GF_CALLOC (1, msg0_size, gf_fuse_mt_iov_base); === Probably, some freeing missing for iov_base? [1] https://gist.github.com/f0cf98e8bff0c13ea38f [2] https://gist.github.com/87baa0a778ba54f0f7f7 [3] https://gist.github.com/7013b493d19c8c5fffae [4] https://gist.github.com/cc38155b57e68d7e86d5 [5] https://gist.github.com/6a24000c77760a97976a [6] https://gist.github.com/74bd7a9f734c2fd21c33 On понеділок, 1 лютого 2016 р. 14:24:22 EET Soumya Koduri wrote: > On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote: > > Wait. It seems to be my bad. > > > > Before unmounting I do drop_caches (2), and glusterfs process CPU usage > > goes to 100% for a while. I haven't waited for it to drop to 0%, and > > instead perform unmount. It seems glusterfs is purging inodes and that's > > why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to > > become normal, and got no leaks. > > > > Will verify this once again and report more. > > > > BTW, if that works, how could I limit inode cache for FUSE client? I do > > not want it to go beyond 1G, for example, even if I have 48G of RAM on > > my server. > > Its hard-coded for now. For fuse the lru limit (of the inodes which are > not active) is (32*1024). > One of the ways to address this (which we were discussing earlier) is to > have an option to configure inode cache limit. If that sounds good, we > can then check on if it has to be global/volume-level, client/server/both. > > Thanks, > Soumya > > > 01.02.2016 09:54, Soumya Koduri написав: > >> On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote: > >>> Unfortunately, this patch doesn't help. > >>> > >>> RAM usage on "find" finish is ~9G. > >>> > >>> Here is statedump before drop_caches: https://gist.github.com/ > >>> fc1647de0982ab447e20 > >> > >> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage] > >> size=706766688 > >> num_allocs=2454051 > >> > >>> And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19 > >> > >> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage] > >> size=550996416 > >> num_allocs=1913182 > >> > >> There isn't much significant drop in inode contexts. 
One of the > >> reasons could be because of dentrys holding a refcount on the inodes > >> which shall result in inodes not getting purged even after > >> fuse_forget. > >> > >> > >> pool-name=fuse:dentry_t > >> hot-count=32761 > >> > >> if '32761' is the current active dentry count, it still doesn't seem > >> to match up to inode count. > >> > >> Thanks, > >> Soumya > >> > >>> And here is Valgrind output: > >>> https://gist.github.com/2490aeac448320d98596 > >>> > >>> On субота, 30 січня 2016 р. 22:56:37 EET Xavier Hernandez wrote: > >>>> There's another inode leak caused by an incorrect counting of > >>>> lookups on directory reads. > >&g
Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I
Wait. It seems to be my bad. Before unmounting I do drop_caches (2), and glusterfs process CPU usage goes to 100% for a while. I haven't waited for it to drop to 0%, and instead perform unmount. It seems glusterfs is purging inodes and that's why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to become normal, and got no leaks. Will verify this once again and report more. BTW, if that works, how could I limit inode cache for FUSE client? I do not want it to go beyond 1G, for example, even if I have 48G of RAM on my server. 01.02.2016 09:54, Soumya Koduri написав: On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote: Unfortunately, this patch doesn't help. RAM usage on "find" finish is ~9G. Here is statedump before drop_caches: https://gist.github.com/ fc1647de0982ab447e20 [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage] size=706766688 num_allocs=2454051 And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19 [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage] size=550996416 num_allocs=1913182 There isn't much significant drop in inode contexts. One of the reasons could be because of dentrys holding a refcount on the inodes which shall result in inodes not getting purged even after fuse_forget. pool-name=fuse:dentry_t hot-count=32761 if '32761' is the current active dentry count, it still doesn't seem to match up to inode count. Thanks, Soumya And here is Valgrind output: https://gist.github.com/2490aeac448320d98596 On субота, 30 січня 2016 р. 22:56:37 EET Xavier Hernandez wrote: There's another inode leak caused by an incorrect counting of lookups on directory reads. Here's a patch that solves the problem for 3.7: http://review.gluster.org/13324 Hopefully with this patch the memory leaks should disapear. Xavi On 29.01.2016 19:09, Oleksandr Natalenko wrote: Here is intermediate summary of current memory leaks in FUSE client investigation. I use GlusterFS v3.7.6 release with the following patches: === Kaleb S KEITHLEY (1): fuse: use-after-free fix in fuse-bridge, revisited Pranith Kumar K (1): mount/fuse: Fix use-after-free crash Soumya Koduri (3): gfapi: Fix inode nlookup counts inode: Retire the inodes from the lru list in inode_table_destroy upcall: free the xdr* allocations === With those patches we got API leaks fixed (I hope, brief tests show that) and got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE client still leaks. I have several test volumes with several million of small files (100K…2M in average). I do 2 types of FUSE client testing: 1) find /mnt/volume -type d 2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/ And most up-to-date results are shown below: === find /mnt/volume -type d === Memory consumption: ~4G Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d I guess, fuse-bridge/fuse-resolve. related. === rsync -av -H /mnt/source_volume/* /mnt/target_volume/ === Memory consumption: ~3.3...4G Statedump (target volume): https://gist.github.com/31e43110eaa4da663435 Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a I guess, DHT-related. Give me more patches to test :). ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I
Unfortunately, this patch doesn't help. RAM usage on "find" finish is ~9G. Here is statedump before drop_caches: https://gist.github.com/ fc1647de0982ab447e20 And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19 And here is Valgrind output: https://gist.github.com/2490aeac448320d98596 On субота, 30 січня 2016 р. 22:56:37 EET Xavier Hernandez wrote: > There's another inode leak caused by an incorrect counting of > lookups on directory reads. > > Here's a patch that solves the problem for > 3.7: > > http://review.gluster.org/13324 > > Hopefully with this patch the > memory leaks should disapear. > > Xavi > > On 29.01.2016 19:09, Oleksandr > > Natalenko wrote: > > Here is intermediate summary of current memory > > leaks in FUSE client > > > investigation. > > > > I use GlusterFS v3.7.6 > > release with the following patches: > > === > > > Kaleb S KEITHLEY (1): > fuse: use-after-free fix in fuse-bridge, revisited > > > Pranith Kumar K > > (1): > > mount/fuse: Fix use-after-free crash > > > Soumya Koduri (3): > gfapi: Fix inode nlookup counts > > > inode: Retire the inodes from the lru > > list in inode_table_destroy > > > upcall: free the xdr* allocations > > === > > > > > > With those patches we got API leaks fixed (I hope, brief tests show > > that) and > > > got rid of "kernel notifier loop terminated" message. > > Nevertheless, FUSE > > > client still leaks. > > > > I have several test > > volumes with several million of small files (100K…2M in > > > average). I > > do 2 types of FUSE client testing: > > 1) find /mnt/volume -type d > > 2) > > rsync -av -H /mnt/source_volume/* /mnt/target_volume/ > > > And most > > up-to-date results are shown below: > > === find /mnt/volume -type d > > === > > > Memory consumption: ~4G > > > Statedump: > https://gist.github.com/10cde83c63f1b4f1dd7a > > > Valgrind: > https://gist.github.com/097afb01ebb2c5e9e78d > > > I guess, > > fuse-bridge/fuse-resolve. related. > > > === rsync -av -H > > /mnt/source_volume/* /mnt/target_volume/ === > > > Memory consumption: > ~3.3...4G > > > Statedump (target volume): > https://gist.github.com/31e43110eaa4da663435 > > > Valgrind (target volume): > https://gist.github.com/f8e0151a6878cacc9b1a > > > I guess, > > DHT-related. > > > Give me more patches to test :). > > ___ > > > Gluster-devel mailing > > list > > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] GlusterFS FUSE client leaks summary — part I
Here is intermediate summary of current memory leaks in FUSE client investigation. I use GlusterFS v3.7.6 release with the following patches: === Kaleb S KEITHLEY (1): fuse: use-after-free fix in fuse-bridge, revisited Pranith Kumar K (1): mount/fuse: Fix use-after-free crash Soumya Koduri (3): gfapi: Fix inode nlookup counts inode: Retire the inodes from the lru list in inode_table_destroy upcall: free the xdr* allocations === With those patches we got API leaks fixed (I hope, brief tests show that) and got rid of "kernel notifier loop terminated" message. Nevertheless, FUSE client still leaks. I have several test volumes with several million of small files (100K…2M in average). I do 2 types of FUSE client testing: 1) find /mnt/volume -type d 2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/ And most up-to-date results are shown below: === find /mnt/volume -type d === Memory consumption: ~4G Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d I guess, fuse-bridge/fuse-resolve. related. === rsync -av -H /mnt/source_volume/* /mnt/target_volume/ === Memory consumption: ~3.3...4G Statedump (target volume): https://gist.github.com/31e43110eaa4da663435 Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a I guess, DHT-related. Give me more patches to test :). ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, given GlusterFS v3.7.6 with the following patches: === Kaleb S KEITHLEY (1): fuse: use-after-free fix in fuse-bridge, revisited Pranith Kumar K (1): mount/fuse: Fix use-after-free crash Soumya Koduri (3): gfapi: Fix inode nlookup counts inode: Retire the inodes from the lru list in inode_table_destroy upcall: free the xdr* allocations === I've repeated "rsync" test under Valgrind, and here is Valgrind output: https://gist.github.com/f8e0151a6878cacc9b1a I see DHT-related leaks. On понеділок, 25 січня 2016 р. 02:46:32 EET Oleksandr Natalenko wrote: > Also, I've repeated the same "find" test again, but with glusterfs process > launched under valgrind. And here is valgrind output: > > https://gist.github.com/097afb01ebb2c5e9e78d > > On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > > Thanks for all your tests and times, it looks promising :) > > > > > > Cordialement, > > Mathieu CHATEAU > > http://www.lotp.fr > > > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > > following > > > patches: > > > > > > === > > > > > > Kaleb S KEITHLEY (1): > > > fuse: use-after-free fix in fuse-bridge, revisited > > > > > > Pranith Kumar K (1): > > > mount/fuse: Fix use-after-free crash > > > > > > Soumya Koduri (3): > > > gfapi: Fix inode nlookup counts > > > inode: Retire the inodes from the lru list in inode_table_destroy > > > upcall: free the xdr* allocations > > > > > > === > > > > > > I run rsync from one GlusterFS volume to another. While memory started > > > from > > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > > not > > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > > going > > > to do several rsync rounds to see if it grows more (with no patches bare > > > 3.7.6 > > > could consume more than 20 GiBs). > > > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > > > Will report more in several days. I hope current patches will be > > > incorporated > > > into 3.7.7. > > > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > > >> loop > > > > >> terminated" error in your logs. > > > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. > > > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a > > > > few > > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > > of beating on it. > > > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > > > >> stable: > > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longe > > > v > > > > > > > >> ity /client.out > > > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is > > > > > that > > > > > 3.7.6 + patches? > > > > > > > > I'm running an internally developed I/O load generator written by a > > > > guy > > > > on our perf team. > > > > > > > > it does, create, write, read, rename, stat, delete, and more. > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Here are the results of "rsync" test. I've got 2 volumes — source and target — performing multiple files rsyncing from one volume to another. Source volume: === root 22259 3.5 1.5 1204200 771004 ? Ssl Jan23 109:42 /usr/sbin/ glusterfs --volfile-server=glusterfs.example.com --volfile-id=source /mnt/net/ glusterfs/source === One may see that memory consumption of source volume is not that high as with "find" test. Here is source volume client statedump: https://gist.github.com/ ef5b798859219e739aeb Here is source volume info: https://gist.github.com/3d2f32e7346df9333004 Target volume: === root 22200 23.8 6.9 3983676 3456252 ? Ssl Jan23 734:57 /usr/sbin/ glusterfs --volfile-server=glusterfs.example.com --volfile-id=target /mnt/net/ glusterfs/target === Here is target volume info: https://gist.github.com/c9de01168071575b109e Target volume RAM consumption is very high (more than 3 GiBs). Here is client statedump too: https://gist.github.com/31e43110eaa4da663435 I see huge DHT-related memory usage, e.g.: === [cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_mem_pool memusage] size=725575592 num_allocs=7552486 max_size=725575836 max_num_allocs=7552489 total_allocs=90843958 [cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_char memusage] size=586404954 num_allocs=7572836 max_size=586405157 max_num_allocs=7572839 total_allocs=80463096 === Ideas? On понеділок, 25 січня 2016 р. 02:46:32 EET Oleksandr Natalenko wrote: > Also, I've repeated the same "find" test again, but with glusterfs process > launched under valgrind. And here is valgrind output: > > https://gist.github.com/097afb01ebb2c5e9e78d > > On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > > Thanks for all your tests and times, it looks promising :) > > > > > > Cordialement, > > Mathieu CHATEAU > > http://www.lotp.fr > > > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > > following > > > patches: > > > > > > === > > > > > > Kaleb S KEITHLEY (1): > > > fuse: use-after-free fix in fuse-bridge, revisited > > > > > > Pranith Kumar K (1): > > > mount/fuse: Fix use-after-free crash > > > > > > Soumya Koduri (3): > > > gfapi: Fix inode nlookup counts > > > inode: Retire the inodes from the lru list in inode_table_destroy > > > upcall: free the xdr* allocations > > > > > > === > > > > > > I run rsync from one GlusterFS volume to another. While memory started > > > from > > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > > not > > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > > going > > > to do several rsync rounds to see if it grows more (with no patches bare > > > 3.7.6 > > > could consume more than 20 GiBs). > > > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > > > Will report more in several days. I hope current patches will be > > > incorporated > > > into 3.7.7. > > > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > > >> loop > > > > >> terminated" error in your logs. > > > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. 
> > > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a > > > > few > > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > > of beating on it. > > > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > > > >> stable: > > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longe > > > v > > > > > > > >> ity /client.out > > > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is > > > > > that > > > > > 3.7.6 + patches? > > > > > > > > I'm running an internally developed I/O load generator written by a > > > > guy > > > > on our perf team. > > > > > > > > it does, create, write, read, rename, stat, delete, and more. > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
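For what it's worth, simple division of the dht figures above gives 725575592 / 7552486 ≈ 96 bytes per live allocation for the gf_common_mt_mem_pool type and 586404954 / 7572836 ≈ 77 bytes for gf_common_mt_char — in other words, roughly 7.5 million very small allocations held at the same time by the target volume's client.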
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Also, I've repeated the same "find" test again, but with glusterfs process launched under valgrind. And here is valgrind output: https://gist.github.com/097afb01ebb2c5e9e78d On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > Thanks for all your tests and times, it looks promising :) > > > Cordialement, > Mathieu CHATEAU > http://www.lotp.fr > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > following > > patches: > > > > === > > > > Kaleb S KEITHLEY (1): > > fuse: use-after-free fix in fuse-bridge, revisited > > > > Pranith Kumar K (1): > > mount/fuse: Fix use-after-free crash > > > > Soumya Koduri (3): > > gfapi: Fix inode nlookup counts > > inode: Retire the inodes from the lru list in inode_table_destroy > > upcall: free the xdr* allocations > > > > === > > > > I run rsync from one GlusterFS volume to another. While memory started > > from > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > not > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > going > > to do several rsync rounds to see if it grows more (with no patches bare > > 3.7.6 > > could consume more than 20 GiBs). > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > Will report more in several days. I hope current patches will be > > incorporated > > into 3.7.7. > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > >> loop > > > >> terminated" error in your logs. > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a few > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > of beating on it. > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > >> stable: > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longev > > > > > >> ity /client.out > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is that > > > > 3.7.6 + patches? > > > > > > I'm running an internally developed I/O load generator written by a guy > > > on our perf team. > > > > > > it does, create, write, read, rename, stat, delete, and more. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
BTW, am I the only one who sees in max_size=4294965480 almost 2^32? Could that be integer overflow? On неділя, 24 січня 2016 р. 13:23:55 EET Oleksandr Natalenko wrote: > The leak definitely remains. I did "find /mnt/volume -type d" over GlusterFS > volume, with mentioned patches applied and without "kernel notifier loop > terminated" message, but "glusterfs" process consumed ~4GiB of RAM after > "find" finished. > > Here is statedump: > > https://gist.github.com/10cde83c63f1b4f1dd7a > > I see the following: > > === > [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage] > size=4235109959 > num_allocs=2 > max_size=4294965480 > max_num_allocs=3 > total_allocs=4533524 > === > > ~4GiB, right? > > Pranith, Kaleb? > > On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > > Thanks for all your tests and times, it looks promising :) > > > > > > Cordialement, > > Mathieu CHATEAU > > http://www.lotp.fr > > > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > > following > > > patches: > > > > > > === > > > > > > Kaleb S KEITHLEY (1): > > > fuse: use-after-free fix in fuse-bridge, revisited > > > > > > Pranith Kumar K (1): > > > mount/fuse: Fix use-after-free crash > > > > > > Soumya Koduri (3): > > > gfapi: Fix inode nlookup counts > > > inode: Retire the inodes from the lru list in inode_table_destroy > > > upcall: free the xdr* allocations > > > > > > === > > > > > > I run rsync from one GlusterFS volume to another. While memory started > > > from > > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > > not > > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > > going > > > to do several rsync rounds to see if it grows more (with no patches bare > > > 3.7.6 > > > could consume more than 20 GiBs). > > > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > > > Will report more in several days. I hope current patches will be > > > incorporated > > > into 3.7.7. > > > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > > >> loop > > > > >> terminated" error in your logs. > > > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. > > > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a > > > > few > > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > > of beating on it. > > > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > > > >> stable: > > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longe > > > v > > > > > > > >> ity /client.out > > > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is > > > > > that > > > > > 3.7.6 + patches? > > > > > > > > I'm running an internally developed I/O load generator written by a > > > > guy > > > > on our perf team. > > > > > > > > it does, create, write, read, rename, stat, delete, and more. > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
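For reference, 2^32 = 4294967296, and 4294967296 − 4294965480 = 1816, so max_size here is exactly what a small negative value (−1816) looks like when printed as an unsigned 32-bit integer; most likely the counter under-ran zero rather than genuinely peaking at ~4 GiB.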
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
The leak definitely remains. I did "find /mnt/volume -type d" over GlusterFS volume, with mentioned patches applied and without "kernel notifier loop terminated" message, but "glusterfs" process consumed ~4GiB of RAM after "find" finished. Here is statedump: https://gist.github.com/10cde83c63f1b4f1dd7a I see the following: === [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage] size=4235109959 num_allocs=2 max_size=4294965480 max_num_allocs=3 total_allocs=4533524 === ~4GiB, right? Pranith, Kaleb? On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote: > Thanks for all your tests and times, it looks promising :) > > > Cordialement, > Mathieu CHATEAU > http://www.lotp.fr > > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko : > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the > > following > > patches: > > > > === > > > > Kaleb S KEITHLEY (1): > > fuse: use-after-free fix in fuse-bridge, revisited > > > > Pranith Kumar K (1): > > mount/fuse: Fix use-after-free crash > > > > Soumya Koduri (3): > > gfapi: Fix inode nlookup counts > > inode: Retire the inodes from the lru list in inode_table_destroy > > upcall: free the xdr* allocations > > > > === > > > > I run rsync from one GlusterFS volume to another. While memory started > > from > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does > > not > > grow further. As for target volume it is ~730 MiBs, and that is why I'm > > going > > to do several rsync rounds to see if it grows more (with no patches bare > > 3.7.6 > > could consume more than 20 GiBs). > > > > No "kernel notifier loop terminated" message so far for both volumes. > > > > Will report more in several days. I hope current patches will be > > incorporated > > into 3.7.7. > > > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > > > >> I presume by this you mean you're not seeing the "kernel notifier > > > >> loop > > > >> terminated" error in your logs. > > > > > > > > Correct, but only with simple traversing. Have to test under rsync. > > > > > > Without the patch I'd get "kernel notifier loop terminated" within a few > > > minutes of starting I/O. With the patch I haven't seen it in 24 hours > > > of beating on it. > > > > > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > > > > > >> stable: > > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longev > > > > > >> ity /client.out > > > > > > > > What ops do you perform on mounted volume? Read, write, stat? Is that > > > > 3.7.6 + patches? > > > > > > I'm running an internally developed I/O load generator written by a guy > > > on our perf team. > > > > > > it does, create, write, read, rename, stat, delete, and more. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
With "performance.client-io-threads" set to "off" no hangs occurred in 3 rsync/rm rounds. Could that be some fuse-bridge lock race? Will bring that option to "on" back again and try to get full statedump. On четвер, 21 січня 2016 р. 14:54:47 EET Raghavendra G wrote: > On Thu, Jan 21, 2016 at 10:49 AM, Pranith Kumar Karampuri < > > pkara...@redhat.com> wrote: > > On 01/18/2016 02:28 PM, Oleksandr Natalenko wrote: > >> XFS. Server side works OK, I'm able to mount volume again. Brick is 30% > >> full. > > > > Oleksandr, > > > > Will it be possible to get the statedump of the client, bricks > > > > output next time it happens? > > > > https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.m > > d#how-to-generate-statedump > We also need to dump inode information. To do that you've to add "all=yes" > to /var/run/gluster/glusterdump.options before you issue commands to get > statedump. > > > Pranith > > > >> On понеділок, 18 січня 2016 р. 15:07:18 EET baul jianguo wrote: > >>> What is your brick file system? and the glusterfsd process and all > >>> thread status? > >>> I met same issue when client app such as rsync stay in D status,and > >>> the brick process and relate thread also be in the D status. > >>> And the brick dev disk util is 100% . > >>> > >>> On Sun, Jan 17, 2016 at 6:13 AM, Oleksandr Natalenko > >>> > >>> wrote: > >>>> Wrong assumption, rsync hung again. > >>>> > >>>> On субота, 16 січня 2016 р. 22:53:04 EET Oleksandr Natalenko wrote: > >>>>> One possible reason: > >>>>> > >>>>> cluster.lookup-optimize: on > >>>>> cluster.readdir-optimize: on > >>>>> > >>>>> I've disabled both optimizations, and at least as of now rsync still > >>>>> does > >>>>> its job with no issues. I would like to find out what option causes > >>>>> such > >>>>> a > >>>>> behavior and why. Will test more. > >>>>> > >>>>> On пʼятниця, 15 січня 2016 р. 16:09:51 EET Oleksandr Natalenko wrote: > >>>>>> Another observation: if rsyncing is resumed after hang, rsync itself > >>>>>> hangs a lot faster because it does stat of already copied files. So, > >>>>>> the > >>>>>> reason may be not writing itself, but massive stat on GlusterFS > >>>>>> volume > >>>>>> as well. > >>>>>> > >>>>>> 15.01.2016 09:40, Oleksandr Natalenko написав: > >>>>>>> While doing rsync over millions of files from ordinary partition to > >>>>>>> GlusterFS volume, just after approx. first 2 million rsync hang > >>>>>>> happens, and the following info appears in dmesg: > >>>>>>> > >>>>>>> === > >>>>>>> [17075038.924481] INFO: task rsync:10310 blocked for more than 120 > >>>>>>> seconds. > >>>>>>> [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>>>> disables this message. > >>>>>>> [17075038.940748] rsync D 88207fc13680 0 10310 > >>>>>>> 10309 0x0080 > >>>>>>> [17075038.940752] 8809c578be18 0086 > >>>>>>> 8809c578bfd8 > >>>>>>> 00013680 > >>>>>>> [17075038.940756] 8809c578bfd8 00013680 > >>>>>>> 880310cbe660 > >>>>>>> 881159d16a30 > >>>>>>> [17075038.940759] 881e3aa25800 8809c578be48 > >>>>>>> 881159d16b10 > >>>>>>> 88087d553980 > >>>>>>> [17075038.940762] Call Trace: > >>>>>>> [17075038.940770] [] schedule+0x29/0x70 > >>>>>>> [17075038.940797] [] > >>>>>>> __fuse_request_send+0x13d/0x2c0 > >>>>>>> [fuse] > >>>>>>> [17075038.940801] [] ? > >>>>>>> fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] > >>>>>>> [17075038.940805] [] ? wake_up_bit+0x30/0x30 > >>>>>>> [17075038.9408
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the following patches: === Kaleb S KEITHLEY (1): fuse: use-after-free fix in fuse-bridge, revisited Pranith Kumar K (1): mount/fuse: Fix use-after-free crash Soumya Koduri (3): gfapi: Fix inode nlookup counts inode: Retire the inodes from the lru list in inode_table_destroy upcall: free the xdr* allocations === I run rsync from one GlusterFS volume to another. While memory started from under 100 MiBs, it stalled at around 600 MiBs for source volume and does not grow further. As for target volume it is ~730 MiBs, and that is why I'm going to do several rsync rounds to see if it grows more (with no patches bare 3.7.6 could consume more than 20 GiBs). No "kernel notifier loop terminated" message so far for both volumes. Will report more in several days. I hope current patches will be incorporated into 3.7.7. On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote: > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote: > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > >> I presume by this you mean you're not seeing the "kernel notifier loop > >> terminated" error in your logs. > > > > Correct, but only with simple traversing. Have to test under rsync. > > Without the patch I'd get "kernel notifier loop terminated" within a few > minutes of starting I/O. With the patch I haven't seen it in 24 hours > of beating on it. > > >> Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > >> stable: > >> http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longev > >> ity /client.out > > > > What ops do you perform on mounted volume? Read, write, stat? Is that > > 3.7.6 + patches? > > I'm running an internally developed I/O load generator written by a guy > on our perf team. > > it does, create, write, read, rename, stat, delete, and more. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote: > I presume by this you mean you're not seeing the "kernel notifier loop > terminated" error in your logs. Correct, but only with simple traversing. Have to test under rsync. > Hmmm. My system is not leaking. Last 24 hours the RSZ and VSZ are > stable: > http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longevity/client.out What ops do you perform on the mounted volume? Read, write, stat? Is that 3.7.6 + patches? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, compiles and runs well now, but still leaks. Will try to load the volume with rsync. On четвер, 21 січня 2016 р. 20:40:45 EET Kaleb KEITHLEY wrote: > On 01/21/2016 06:59 PM, Oleksandr Natalenko wrote: > > I see extra GF_FREE (node); added with two patches: > > > > === > > $ git diff HEAD~2 | gist > > https://gist.github.com/9524fa2054cc48278ea8 > > === > > > > Is that intentionally? I guess I face double-free issue. > > I presume you're referring to the release-3.7 branch. > > Yup, bad edit. Long day. That's why we review. ;-) > > Please try the latest. > > Thanks, > > -- > > Kaleb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
I see extra GF_FREE (node); added with two patches: === $ git diff HEAD~2 | gist https://gist.github.com/9524fa2054cc48278ea8 === Is that intentionally? I guess I face double-free issue. On четвер, 21 січня 2016 р. 17:29:53 EET Kaleb KEITHLEY wrote: > On 01/20/2016 04:08 AM, Oleksandr Natalenko wrote: > > Yes, there are couple of messages like this in my logs too (I guess one > > message per each remount): > > > > === > > [2016-01-18 23:42:08.742447] I [fuse-bridge.c:3875:notify_kernel_loop] 0- > > glusterfs-fuse: kernel notifier loop terminated > > === > > Bug reports and fixes for master and release-3.7 branches are: > > master) > https://bugzilla.redhat.com/show_bug.cgi?id=1288857 > http://review.gluster.org/12886 > > release-3.7) > https://bugzilla.redhat.com/show_bug.cgi?id=1288922 > http://review.gluster.org/12887 > > The release-3.7 fix will be in glusterfs-3.7.7 when it's released. > > I think with even with the above fixes applied there are still some > issues remaining. I have submitted additional/revised fixes on top of > the above fixes at: > > master: http://review.gluster.org/13274 > release-3.7: http://review.gluster.org/13275 > > I invite you to review the patches in gerrit (review.gluster.org). > > Regards, > > -- > > Kaleb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
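For anyone reproducing this, a pending Gerrit change can be tested locally by fetching it on top of the release-3.7 branch, roughly like this (change and patchset numbers below are illustrative; check review.gluster.org for the current revision):

===
git clone https://github.com/gluster/glusterfs.git
cd glusterfs
git checkout release-3.7
# refs/changes/<last two digits of change>/<change number>/<patchset>
git fetch https://review.gluster.org/glusterfs refs/changes/75/13275/1
git cherry-pick FETCH_HEAD
===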
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
With the proposed patches I get the following assertion while copying files to GlusterFS volume: === glusterfs: mem-pool.c:305: __gf_free: Assertion `0xCAFEBABE == header->magic' failed. Program received signal SIGABRT, Aborted. [Switching to Thread 0x7fffe9ffb700 (LWP 12635)] 0x76f215f8 in raise () from /usr/lib/libc.so.6 (gdb) bt #0 0x76f215f8 in raise () from /usr/lib/libc.so.6 #1 0x76f22a7a in abort () from /usr/lib/libc.so.6 #2 0x76f1a417 in __assert_fail_base () from /usr/lib/libc.so.6 #3 0x76f1a4c2 in __assert_fail () from /usr/lib/libc.so.6 #4 0x77b6046b in __gf_free (free_ptr=0x7fffdc0b8f00) at mem-pool.c: 305 #5 0x75144eb9 in notify_kernel_loop (data=0x63df90) at fuse-bridge.c: 3893 #6 0x772994a4 in start_thread () from /usr/lib/libpthread.so.0 #7 0x76fd713d in clone () from /usr/lib/libc.so.6 === On четвер, 21 січня 2016 р. 17:29:53 EET Kaleb KEITHLEY wrote: > On 01/20/2016 04:08 AM, Oleksandr Natalenko wrote: > > Yes, there are couple of messages like this in my logs too (I guess one > > message per each remount): > > > > === > > [2016-01-18 23:42:08.742447] I [fuse-bridge.c:3875:notify_kernel_loop] 0- > > glusterfs-fuse: kernel notifier loop terminated > > === > > Bug reports and fixes for master and release-3.7 branches are: > > master) > https://bugzilla.redhat.com/show_bug.cgi?id=1288857 > http://review.gluster.org/12886 > > release-3.7) > https://bugzilla.redhat.com/show_bug.cgi?id=1288922 > http://review.gluster.org/12887 > > The release-3.7 fix will be in glusterfs-3.7.7 when it's released. > > I think with even with the above fixes applied there are still some > issues remaining. I have submitted additional/revised fixes on top of > the above fixes at: > > master: http://review.gluster.org/13274 > release-3.7: http://review.gluster.org/13275 > > I invite you to review the patches in gerrit (review.gluster.org). > > Regards, > > -- > > Kaleb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
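The same assertion can be inspected a bit closer by letting gdb catch the abort in the foreground mount (a sketch; needs a build with debug symbols so the locals in __gf_free are visible, and <server>/<volume> are placeholders):

===
gdb --args /usr/bin/glusterfs -N --volfile-server=<server> --volfile-id=<volume> /mnt/<volume>
(gdb) run
# ... copy files until SIGABRT is raised, then:
(gdb) bt
(gdb) frame 4          # __gf_free, as in the backtrace above
(gdb) print *header    # magic/type of the allocation being freed
===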
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
I perform the tests using 1) rsync (massive copy of millions of files); 2) find (simple tree traversing). To check if memory leak happens, I use find tool. I've performed two traversing (w/ and w/o fopen-keep-cache=off) with remount between them, but I didn't encounter "kernel notifier loop terminated" message during both traversing as well as before unmounting volume. Nevertheless, memory still leaks (at least up to 3 GiB in each case), so I believe invalidation requests are not the case. I've also checked logs for the volume where I do rsync, and the message "kernel notifier loop terminated" happens somewhere in the middle of rsyncing, not before unmounting. But the memory starts leaking on rsync start as well, not just after "kernel notifier loop terminated" message. So, I believe, "kernel notifier loop terminated" is not the case again. Also, I've tried to implement quick and dirty GlusterFS FUSE client using API (see https://github.com/pfactum/xglfs), and with latest patches from this thread (http://review.gluster.org/#/c/13096/, http://review.gluster.org/#/c/ 13125/ and http://review.gluster.org/#/c/13232/) my FUSE client does not leak on tree traversing. So, I believe, this should be related to GlusterFS FUSE implementation. How could I debug memory leak better? On четвер, 21 січня 2016 р. 10:32:32 EET Xavier Hernandez wrote: > If this message appears way before the volume is unmounted, can you try > to start the volume manually using this command and repeat the tests ? > > glusterfs --fopen-keep-cache=off --volfile-server= > --volfile-id=/ > > This will prevent invalidation requests to be sent to the kernel, so > there shouldn't be any memory leak even if the worker thread exits > prematurely. > > If that solves the problem, we could try to determine the cause of the > premature exit and solve it. > > Xavi > > On 20/01/16 10:08, Oleksandr Natalenko wrote: > > Yes, there are couple of messages like this in my logs too (I guess one > > message per each remount): > > > > === > > [2016-01-18 23:42:08.742447] I [fuse-bridge.c:3875:notify_kernel_loop] 0- > > glusterfs-fuse: kernel notifier loop terminated > > === > > > > On середа, 20 січня 2016 р. 09:51:23 EET Xavier Hernandez wrote: > >> I'm seeing a similar problem with 3.7.6. > >> > >> This latest statedump contains a lot of gf_fuse_mt_invalidate_node_t > >> objects in fuse. Looking at the code I see they are used to send > >> invalidations to kernel fuse, however this is done in a separate thread > >> that writes a log message when it exits. On the system I'm seeing the > >> memory leak, I can see that message in the log files: > >> > >> [2016-01-18 23:04:55.384873] I [fuse-bridge.c:3875:notify_kernel_loop] > >> 0-glusterfs-fuse: kernel notifier loop terminated > >> > >> But the volume is still working at this moment, so any future inode > >> invalidations will leak memory because it was this thread that should > >> release it. > >> > >> Can you check if you also see this message in the mount log ? > >> > >> It seems that this thread terminates if write returns any error > >> different than ENOENT. I'm not sure if there could be any other error > >> that can cause this. > >> > >> Xavi > >> > >> On 20/01/16 00:13, Oleksandr Natalenko wrote: > >>> Here is another RAM usage stats and statedump of GlusterFS mount > >>> approaching to just another OOM: > >>> > >>> === > >>> root 32495 1.4 88.3 4943868 1697316 ? 
Ssl Jan13 129:18 > >>> /usr/sbin/ > >>> glusterfs --volfile-server=server.example.com --volfile-id=volume > >>> /mnt/volume === > >>> > >>> https://gist.github.com/86198201c79e927b46bd > >>> > >>> 1.6G of RAM just for almost idle mount (we occasionally store Asterisk > >>> recordings there). Triple OOM for 69 days of uptime. > >>> > >>> Any thoughts? > >>> > >>> On середа, 13 січня 2016 р. 16:26:59 EET Soumya Koduri wrote: > >>>> kill -USR1 > >>> > >>> ___ > >>> Gluster-devel mailing list > >>> Gluster-devel@gluster.org > >>> http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
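On the "how could I debug memory leak better" question: besides --leak-check, valgrind's massif tool shows where live heap memory sits while the process is still running, which maps more directly onto RSS growth during traversal (a sketch; same foreground invocation as the other valgrind runs, with placeholders for server and volume):

===
valgrind --tool=massif --massif-out-file=massif.fuse.out \
    /usr/bin/glusterfs -N --volfile-server=<server> --volfile-id=<volume> /mnt/<volume>
find /mnt/<volume> -type d > /dev/null
umount /mnt/<volume>
ms_print massif.fuse.out | less
===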
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Yes, there are couple of messages like this in my logs too (I guess one message per each remount): === [2016-01-18 23:42:08.742447] I [fuse-bridge.c:3875:notify_kernel_loop] 0- glusterfs-fuse: kernel notifier loop terminated === On середа, 20 січня 2016 р. 09:51:23 EET Xavier Hernandez wrote: > I'm seeing a similar problem with 3.7.6. > > This latest statedump contains a lot of gf_fuse_mt_invalidate_node_t > objects in fuse. Looking at the code I see they are used to send > invalidations to kernel fuse, however this is done in a separate thread > that writes a log message when it exits. On the system I'm seeing the > memory leak, I can see that message in the log files: > > [2016-01-18 23:04:55.384873] I [fuse-bridge.c:3875:notify_kernel_loop] > 0-glusterfs-fuse: kernel notifier loop terminated > > But the volume is still working at this moment, so any future inode > invalidations will leak memory because it was this thread that should > release it. > > Can you check if you also see this message in the mount log ? > > It seems that this thread terminates if write returns any error > different than ENOENT. I'm not sure if there could be any other error > that can cause this. > > Xavi > > On 20/01/16 00:13, Oleksandr Natalenko wrote: > > Here is another RAM usage stats and statedump of GlusterFS mount > > approaching to just another OOM: > > > > === > > root 32495 1.4 88.3 4943868 1697316 ? Ssl Jan13 129:18 > > /usr/sbin/ > > glusterfs --volfile-server=server.example.com --volfile-id=volume > > /mnt/volume === > > > > https://gist.github.com/86198201c79e927b46bd > > > > 1.6G of RAM just for almost idle mount (we occasionally store Asterisk > > recordings there). Triple OOM for 69 days of uptime. > > > > Any thoughts? > > > > On середа, 13 січня 2016 р. 16:26:59 EET Soumya Koduri wrote: > >> kill -USR1 > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
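Checking for the premature notifier-thread exit across all client logs is a one-liner (assuming the default client log location):

===
grep -H "kernel notifier loop terminated" /var/log/glusterfs/*.log
===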
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
And another statedump of FUSE mount client consuming more than 7 GiB of RAM: https://gist.github.com/136d7c49193c798b3ade DHT-related leak? On середа, 13 січня 2016 р. 16:26:59 EET Soumya Koduri wrote: > On 01/13/2016 04:08 PM, Soumya Koduri wrote: > > On 01/12/2016 12:46 PM, Oleksandr Natalenko wrote: > >> Just in case, here is Valgrind output on FUSE client with 3.7.6 + > >> API-related patches we discussed before: > >> > >> https://gist.github.com/cd6605ca19734c1496a4 > > > > Thanks for sharing the results. I made changes to fix one leak reported > > there wrt ' client_cbk_cache_invalidation' - > > > > - http://review.gluster.org/#/c/13232/ > > > > The other inode* related memory reported as lost is mainly (maybe) > > because fuse client process doesn't cleanup its memory (doesn't use > > fini()) while exiting the process. Hence majority of those allocations > > are listed as lost. But most of the inodes should have got purged when > > we drop vfs cache. Did you do drop vfs cache before exiting the process? > > > > I shall add some log statements and check that part > > Also please take statedump of the fuse mount process (after dropping vfs > cache) when you see high memory usage by issuing the following command - > 'kill -USR1 ' > > The statedump will be copied to 'glusterdump..dump.tim > estamp` file in /var/run/gluster or /usr/local/var/run/gluster. > Please refer to [1] for more information. > > Thanks, > Soumya > [1] http://review.gluster.org/#/c/8288/1/doc/debugging/statedump.md > > > Thanks, > > Soumya > > > >> 12.01.2016 08:24, Soumya Koduri написав: > >>> For fuse client, I tried vfs drop_caches as suggested by Vijay in an > >>> earlier mail. Though all the inodes get purged, I still doesn't see > >>> much difference in the memory footprint drop. Need to investigate what > >>> else is consuming so much memory here. > > > > ___ > > Gluster-users mailing list > > gluster-us...@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
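A statedump that large can be summarized by ranking the per-type memory usage sections, which also shows whether DHT or fuse allocation types dominate (a sketch; assumes the usual "usage-type ... memusage" / "size=" layout of statedumps):

===
awk '/usage-type/ { type = $0 }
     /^size=/     { sub("size=", "", $0); print $0, type }' \
    glusterdump.<pid>.dump.<timestamp> | sort -n | tail -20
===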
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Here is another RAM usage stats and statedump of GlusterFS mount approaching to just another OOM: === root 32495 1.4 88.3 4943868 1697316 ? Ssl Jan13 129:18 /usr/sbin/ glusterfs --volfile-server=server.example.com --volfile-id=volume /mnt/volume === https://gist.github.com/86198201c79e927b46bd 1.6G of RAM just for almost idle mount (we occasionally store Asterisk recordings there). Triple OOM for 69 days of uptime. Any thoughts? On середа, 13 січня 2016 р. 16:26:59 EET Soumya Koduri wrote: > kill -USR1 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
XFS. Server side works OK, I'm able to mount volume again. Brick is 30% full. On понеділок, 18 січня 2016 р. 15:07:18 EET baul jianguo wrote: > What is your brick file system? and the glusterfsd process and all > thread status? > I met same issue when client app such as rsync stay in D status,and > the brick process and relate thread also be in the D status. > And the brick dev disk util is 100% . > > On Sun, Jan 17, 2016 at 6:13 AM, Oleksandr Natalenko > > wrote: > > Wrong assumption, rsync hung again. > > > > On субота, 16 січня 2016 р. 22:53:04 EET Oleksandr Natalenko wrote: > >> One possible reason: > >> > >> cluster.lookup-optimize: on > >> cluster.readdir-optimize: on > >> > >> I've disabled both optimizations, and at least as of now rsync still does > >> its job with no issues. I would like to find out what option causes such > >> a > >> behavior and why. Will test more. > >> > >> On пʼятниця, 15 січня 2016 р. 16:09:51 EET Oleksandr Natalenko wrote: > >> > Another observation: if rsyncing is resumed after hang, rsync itself > >> > hangs a lot faster because it does stat of already copied files. So, > >> > the > >> > reason may be not writing itself, but massive stat on GlusterFS volume > >> > as well. > >> > > >> > 15.01.2016 09:40, Oleksandr Natalenko написав: > >> > > While doing rsync over millions of files from ordinary partition to > >> > > GlusterFS volume, just after approx. first 2 million rsync hang > >> > > happens, and the following info appears in dmesg: > >> > > > >> > > === > >> > > [17075038.924481] INFO: task rsync:10310 blocked for more than 120 > >> > > seconds. > >> > > [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >> > > disables this message. > >> > > [17075038.940748] rsync D 88207fc13680 0 10310 > >> > > 10309 0x0080 > >> > > [17075038.940752] 8809c578be18 0086 8809c578bfd8 > >> > > 00013680 > >> > > [17075038.940756] 8809c578bfd8 00013680 880310cbe660 > >> > > 881159d16a30 > >> > > [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 > >> > > 88087d553980 > >> > > [17075038.940762] Call Trace: > >> > > [17075038.940770] [] schedule+0x29/0x70 > >> > > [17075038.940797] [] > >> > > __fuse_request_send+0x13d/0x2c0 > >> > > [fuse] > >> > > [17075038.940801] [] ? > >> > > fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] > >> > > [17075038.940805] [] ? wake_up_bit+0x30/0x30 > >> > > [17075038.940809] [] fuse_request_send+0x12/0x20 > >> > > [fuse] > >> > > [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] > >> > > [17075038.940817] [] filp_close+0x34/0x80 > >> > > [17075038.940821] [] __close_fd+0x78/0xa0 > >> > > [17075038.940824] [] SyS_close+0x23/0x50 > >> > > [17075038.940828] [] > >> > > system_call_fastpath+0x16/0x1b > >> > > === > >> > > > >> > > rsync blocks in D state, and to kill it, I have to do umount --lazy > >> > > on > >> > > GlusterFS mountpoint, and then kill corresponding client glusterfs > >> > > process. Then rsync exits. 
> >> > > > >> > > Here is GlusterFS volume info: > >> > > > >> > > === > >> > > Volume Name: asterisk_records > >> > > Type: Distributed-Replicate > >> > > Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 > >> > > Status: Started > >> > > Number of Bricks: 3 x 2 = 6 > >> > > Transport-type: tcp > >> > > Bricks: > >> > > Brick1: > >> > > server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01 > >> > > /as > >> > > te > >> > > risk/records Brick2: > >> > > server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_ > >> > > 01/ > >> > > as > >> > > terisk/records Brick3: > >> > > server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02 > >> > > /as > >> > > te > >> > > risk/records Brick4: > >>
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
Wrong assumption, rsync hung again. On субота, 16 січня 2016 р. 22:53:04 EET Oleksandr Natalenko wrote: > One possible reason: > > cluster.lookup-optimize: on > cluster.readdir-optimize: on > > I've disabled both optimizations, and at least as of now rsync still does > its job with no issues. I would like to find out what option causes such a > behavior and why. Will test more. > > On пʼятниця, 15 січня 2016 р. 16:09:51 EET Oleksandr Natalenko wrote: > > Another observation: if rsyncing is resumed after hang, rsync itself > > hangs a lot faster because it does stat of already copied files. So, the > > reason may be not writing itself, but massive stat on GlusterFS volume > > as well. > > > > 15.01.2016 09:40, Oleksandr Natalenko написав: > > > While doing rsync over millions of files from ordinary partition to > > > GlusterFS volume, just after approx. first 2 million rsync hang > > > happens, and the following info appears in dmesg: > > > > > > === > > > [17075038.924481] INFO: task rsync:10310 blocked for more than 120 > > > seconds. > > > [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [17075038.940748] rsync D 88207fc13680 0 10310 > > > 10309 0x0080 > > > [17075038.940752] 8809c578be18 0086 8809c578bfd8 > > > 00013680 > > > [17075038.940756] 8809c578bfd8 00013680 880310cbe660 > > > 881159d16a30 > > > [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 > > > 88087d553980 > > > [17075038.940762] Call Trace: > > > [17075038.940770] [] schedule+0x29/0x70 > > > [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 > > > [fuse] > > > [17075038.940801] [] ? > > > fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] > > > [17075038.940805] [] ? wake_up_bit+0x30/0x30 > > > [17075038.940809] [] fuse_request_send+0x12/0x20 > > > [fuse] > > > [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] > > > [17075038.940817] [] filp_close+0x34/0x80 > > > [17075038.940821] [] __close_fd+0x78/0xa0 > > > [17075038.940824] [] SyS_close+0x23/0x50 > > > [17075038.940828] [] system_call_fastpath+0x16/0x1b > > > === > > > > > > rsync blocks in D state, and to kill it, I have to do umount --lazy on > > > GlusterFS mountpoint, and then kill corresponding client glusterfs > > > process. Then rsync exits. 
> > > > > > Here is GlusterFS volume info: > > > > > > === > > > Volume Name: asterisk_records > > > Type: Distributed-Replicate > > > Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 > > > Status: Started > > > Number of Bricks: 3 x 2 = 6 > > > Transport-type: tcp > > > Bricks: > > > Brick1: > > > server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/as > > > te > > > risk/records Brick2: > > > server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/ > > > as > > > terisk/records Brick3: > > > server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/as > > > te > > > risk/records Brick4: > > > server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/ > > > as > > > terisk/records Brick5: > > > server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/ > > > as > > > terisk/records Brick6: > > > server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03 > > > /a > > > sterisk/records Options Reconfigured: > > > cluster.lookup-optimize: on > > > cluster.readdir-optimize: on > > > client.event-threads: 2 > > > network.inode-lru-limit: 4096 > > > server.event-threads: 4 > > > performance.client-io-threads: on > > > storage.linux-aio: on > > > performance.write-behind-window-size: 4194304 > > > performance.stat-prefetch: on > > > performance.quick-read: on > > > performance.read-ahead: on > > > performance.flush-behind: on > > > performance.write-behind: on > > > performance.io-thread-count: 2 > > > performance.cache-max-file-size: 1048576 > > > performance.cache-size: 33554432 > > > features.cache-invalidation: on > > > performance.readdir-ahead: on > > > === > > > > > > The issue reproduces each time I rsync such an amount of files. > > > > > > How could I debug this issue better? > > > ___ > > > Gluster-users mailing list > > > gluster-us...@gluster.org > > > http://www.gluster.org/mailman/listinfo/gluster-users > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
One possible reason: cluster.lookup-optimize: on cluster.readdir-optimize: on I've disabled both optimizations, and at least as of now rsync still does its job with no issues. I would like to find out what option causes such a behavior and why. Will test more. On пʼятниця, 15 січня 2016 р. 16:09:51 EET Oleksandr Natalenko wrote: > Another observation: if rsyncing is resumed after hang, rsync itself > hangs a lot faster because it does stat of already copied files. So, the > reason may be not writing itself, but massive stat on GlusterFS volume > as well. > > 15.01.2016 09:40, Oleksandr Natalenko написав: > > While doing rsync over millions of files from ordinary partition to > > GlusterFS volume, just after approx. first 2 million rsync hang > > happens, and the following info appears in dmesg: > > > > === > > [17075038.924481] INFO: task rsync:10310 blocked for more than 120 > > seconds. > > [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [17075038.940748] rsync D 88207fc13680 0 10310 > > 10309 0x0080 > > [17075038.940752] 8809c578be18 0086 8809c578bfd8 > > 00013680 > > [17075038.940756] 8809c578bfd8 00013680 880310cbe660 > > 881159d16a30 > > [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 > > 88087d553980 > > [17075038.940762] Call Trace: > > [17075038.940770] [] schedule+0x29/0x70 > > [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 > > [fuse] > > [17075038.940801] [] ? > > fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] > > [17075038.940805] [] ? wake_up_bit+0x30/0x30 > > [17075038.940809] [] fuse_request_send+0x12/0x20 > > [fuse] > > [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] > > [17075038.940817] [] filp_close+0x34/0x80 > > [17075038.940821] [] __close_fd+0x78/0xa0 > > [17075038.940824] [] SyS_close+0x23/0x50 > > [17075038.940828] [] system_call_fastpath+0x16/0x1b > > === > > > > rsync blocks in D state, and to kill it, I have to do umount --lazy on > > GlusterFS mountpoint, and then kill corresponding client glusterfs > > process. Then rsync exits. 
> > > > Here is GlusterFS volume info: > > > > === > > Volume Name: asterisk_records > > Type: Distributed-Replicate > > Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 > > Status: Started > > Number of Bricks: 3 x 2 = 6 > > Transport-type: tcp > > Bricks: > > Brick1: > > server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/aste > > risk/records Brick2: > > server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/as > > terisk/records Brick3: > > server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/aste > > risk/records Brick4: > > server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/as > > terisk/records Brick5: > > server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/as > > terisk/records Brick6: > > server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/a > > sterisk/records Options Reconfigured: > > cluster.lookup-optimize: on > > cluster.readdir-optimize: on > > client.event-threads: 2 > > network.inode-lru-limit: 4096 > > server.event-threads: 4 > > performance.client-io-threads: on > > storage.linux-aio: on > > performance.write-behind-window-size: 4194304 > > performance.stat-prefetch: on > > performance.quick-read: on > > performance.read-ahead: on > > performance.flush-behind: on > > performance.write-behind: on > > performance.io-thread-count: 2 > > performance.cache-max-file-size: 1048576 > > performance.cache-size: 33554432 > > features.cache-invalidation: on > > performance.readdir-ahead: on > > === > > > > The issue reproduces each time I rsync such an amount of files. > > > > How could I debug this issue better? > > ___ > > Gluster-users mailing list > > gluster-us...@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-users > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
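Since both options were flipped at once, bisecting them one per rsync round would narrow down which one actually matters (volume name taken from the quoted volume info; these are the usual option toggles):

===
gluster volume set asterisk_records cluster.lookup-optimize off
# run an rsync round; if it still hangs, restore it and try the other one
gluster volume set asterisk_records cluster.lookup-optimize on
gluster volume set asterisk_records cluster.readdir-optimize off
===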
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
Another observation: if rsyncing is resumed after hang, rsync itself hangs a lot faster because it does stat of already copied files. So, the reason may be not writing itself, but massive stat on GlusterFS volume as well. 15.01.2016 09:40, Oleksandr Natalenko написав: While doing rsync over millions of files from ordinary partition to GlusterFS volume, just after approx. first 2 million rsync hang happens, and the following info appears in dmesg: === [17075038.924481] INFO: task rsync:10310 blocked for more than 120 seconds. [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17075038.940748] rsync D 88207fc13680 0 10310 10309 0x0080 [17075038.940752] 8809c578be18 0086 8809c578bfd8 00013680 [17075038.940756] 8809c578bfd8 00013680 880310cbe660 881159d16a30 [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 88087d553980 [17075038.940762] Call Trace: [17075038.940770] [] schedule+0x29/0x70 [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 [fuse] [17075038.940801] [] ? fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] [17075038.940805] [] ? wake_up_bit+0x30/0x30 [17075038.940809] [] fuse_request_send+0x12/0x20 [fuse] [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] [17075038.940817] [] filp_close+0x34/0x80 [17075038.940821] [] __close_fd+0x78/0xa0 [17075038.940824] [] SyS_close+0x23/0x50 [17075038.940828] [] system_call_fastpath+0x16/0x1b === rsync blocks in D state, and to kill it, I have to do umount --lazy on GlusterFS mountpoint, and then kill corresponding client glusterfs process. Then rsync exits. Here is GlusterFS volume info: === Volume Name: asterisk_records Type: Distributed-Replicate Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 Status: Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick2: server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick3: server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick4: server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick5: server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/asterisk/records Brick6: server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/asterisk/records Options Reconfigured: cluster.lookup-optimize: on cluster.readdir-optimize: on client.event-threads: 2 network.inode-lru-limit: 4096 server.event-threads: 4 performance.client-io-threads: on storage.linux-aio: on performance.write-behind-window-size: 4194304 performance.stat-prefetch: on performance.quick-read: on performance.read-ahead: on performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 2 performance.cache-max-file-size: 1048576 performance.cache-size: 33554432 features.cache-invalidation: on performance.readdir-ahead: on === The issue reproduces each time I rsync such an amount of files. How could I debug this issue better? ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
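If massive stat is the suspect, a metadata-only pass over the already-copied part of the tree should reproduce the hang without writing anything (a sketch; paths are placeholders):

===
# pure stat load over the copied subtree
find /mnt/<volume>/<already-copied-subdir> -type f -print0 | xargs -0 stat > /dev/null

# or let rsync do only its comparison phase
rsync -a --dry-run /source/dir/ /mnt/<volume>/<already-copied-subdir>/
===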
Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of files
Here is similar issue described on serverfault.com: https://serverfault.com/questions/716410/rsync-crashes-machine-while-performing-sync-on-glusterfs-mounted-share I've checked GlusterFS logs with no luck — as if nothing happened. P.S. GlusterFS v3.7.6. 15.01.2016 09:40, Oleksandr Natalenko написав: While doing rsync over millions of files from ordinary partition to GlusterFS volume, just after approx. first 2 million rsync hang happens, and the following info appears in dmesg: === [17075038.924481] INFO: task rsync:10310 blocked for more than 120 seconds. [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17075038.940748] rsync D 88207fc13680 0 10310 10309 0x0080 [17075038.940752] 8809c578be18 0086 8809c578bfd8 00013680 [17075038.940756] 8809c578bfd8 00013680 880310cbe660 881159d16a30 [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 88087d553980 [17075038.940762] Call Trace: [17075038.940770] [] schedule+0x29/0x70 [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 [fuse] [17075038.940801] [] ? fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] [17075038.940805] [] ? wake_up_bit+0x30/0x30 [17075038.940809] [] fuse_request_send+0x12/0x20 [fuse] [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] [17075038.940817] [] filp_close+0x34/0x80 [17075038.940821] [] __close_fd+0x78/0xa0 [17075038.940824] [] SyS_close+0x23/0x50 [17075038.940828] [] system_call_fastpath+0x16/0x1b === rsync blocks in D state, and to kill it, I have to do umount --lazy on GlusterFS mountpoint, and then kill corresponding client glusterfs process. Then rsync exits. Here is GlusterFS volume info: === Volume Name: asterisk_records Type: Distributed-Replicate Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 Status: Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick2: server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick3: server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick4: server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick5: server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/asterisk/records Brick6: server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/asterisk/records Options Reconfigured: cluster.lookup-optimize: on cluster.readdir-optimize: on client.event-threads: 2 network.inode-lru-limit: 4096 server.event-threads: 4 performance.client-io-threads: on storage.linux-aio: on performance.write-behind-window-size: 4194304 performance.stat-prefetch: on performance.quick-read: on performance.read-ahead: on performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 2 performance.cache-max-file-size: 1048576 performance.cache-size: 33554432 features.cache-invalidation: on performance.readdir-ahead: on === The issue reproduces each time I rsync such an amount of files. How could I debug this issue better? ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] GlusterFS FUSE client hangs on rsyncing lots of files
While doing rsync over millions of files from ordinary partition to GlusterFS volume, just after approx. first 2 million rsync hang happens, and the following info appears in dmesg: === [17075038.924481] INFO: task rsync:10310 blocked for more than 120 seconds. [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17075038.940748] rsync D 88207fc13680 0 10310 10309 0x0080 [17075038.940752] 8809c578be18 0086 8809c578bfd8 00013680 [17075038.940756] 8809c578bfd8 00013680 880310cbe660 881159d16a30 [17075038.940759] 881e3aa25800 8809c578be48 881159d16b10 88087d553980 [17075038.940762] Call Trace: [17075038.940770] [] schedule+0x29/0x70 [17075038.940797] [] __fuse_request_send+0x13d/0x2c0 [fuse] [17075038.940801] [] ? fuse_get_req_nofail_nopages+0xc0/0x1e0 [fuse] [17075038.940805] [] ? wake_up_bit+0x30/0x30 [17075038.940809] [] fuse_request_send+0x12/0x20 [fuse] [17075038.940813] [] fuse_flush+0xff/0x150 [fuse] [17075038.940817] [] filp_close+0x34/0x80 [17075038.940821] [] __close_fd+0x78/0xa0 [17075038.940824] [] SyS_close+0x23/0x50 [17075038.940828] [] system_call_fastpath+0x16/0x1b === rsync blocks in D state, and to kill it, I have to do umount --lazy on GlusterFS mountpoint, and then kill corresponding client glusterfs process. Then rsync exits. Here is GlusterFS volume info: === Volume Name: asterisk_records Type: Distributed-Replicate Volume ID: dc1fe561-fa3a-4f2e-8330-ec7e52c75ba4 Status: Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: server1:/bricks/10_megaraid_0_3_9_x_0_4_3_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick2: server2:/bricks/10_megaraid_8_5_14_x_8_6_16_hdd_r1_nolvm_hdd_storage_01/asterisk/records Brick3: server1:/bricks/11_megaraid_0_5_4_x_0_6_5_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick4: server2:/bricks/11_megaraid_8_7_15_x_8_8_20_hdd_r1_nolvm_hdd_storage_02/asterisk/records Brick5: server1:/bricks/12_megaraid_0_7_6_x_0_13_14_hdd_r1_nolvm_hdd_storage_03/asterisk/records Brick6: server2:/bricks/12_megaraid_8_9_19_x_8_13_24_hdd_r1_nolvm_hdd_storage_03/asterisk/records Options Reconfigured: cluster.lookup-optimize: on cluster.readdir-optimize: on client.event-threads: 2 network.inode-lru-limit: 4096 server.event-threads: 4 performance.client-io-threads: on storage.linux-aio: on performance.write-behind-window-size: 4194304 performance.stat-prefetch: on performance.quick-read: on performance.read-ahead: on performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 2 performance.cache-max-file-size: 1048576 performance.cache-size: 33554432 features.cache-invalidation: on performance.readdir-ahead: on === The issue reproduces each time I rsync such an amount of files. How could I debug this issue better? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
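When rsync is stuck in D state, a few things are worth collecting before the lazy umount, since they show where both the kernel and the client are blocked (a sketch; needs root, and the pid lookups assume a single rsync and a single glusterfs client on the host):

===
cat /proc/$(pgrep -x rsync)/stack                      # kernel-side stack of the stuck task
kill -USR1 $(pgrep -f 'glusterfs --volfile-server')    # FUSE client statedump
gluster volume statedump asterisk_records              # brick statedumps
dmesg | tail -n 50
===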
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
I've applied client_cbk_cache_invalidation leak patch, and here are the results. Launch: === valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind_fuse.log" /usr/bin/glusterfs -N --volfile-server=server.example.com --volfile-id=somevolume /mnt/somevolume find /mnt/somevolume -type d === During the traversing, memory RSS value for glusterfs process went from 79M to 644M. Then I performed dropping VFS cache (as I did in previous tests), but RSS value was not affected. Then I did statedump: https://gist.github.com/11c7b11fc99ab123e6e2 Then I unmounted the volume and got Valgrind log: https://gist.github.com/99d2e3c5cb4ed50b091c Leaks reported by Valgrind do not conform by their size to overall runtime memory consumption, so I believe with the latest patch some cleanup is being performed better on exit (unmount), but in runtime there are still some issues. 13.01.2016 12:56, Soumya Koduri написав: On 01/13/2016 04:08 PM, Soumya Koduri wrote: On 01/12/2016 12:46 PM, Oleksandr Natalenko wrote: Just in case, here is Valgrind output on FUSE client with 3.7.6 + API-related patches we discussed before: https://gist.github.com/cd6605ca19734c1496a4 Thanks for sharing the results. I made changes to fix one leak reported there wrt ' client_cbk_cache_invalidation' - - http://review.gluster.org/#/c/13232/ The other inode* related memory reported as lost is mainly (maybe) because fuse client process doesn't cleanup its memory (doesn't use fini()) while exiting the process. Hence majority of those allocations are listed as lost. But most of the inodes should have got purged when we drop vfs cache. Did you do drop vfs cache before exiting the process? I shall add some log statements and check that part Also please take statedump of the fuse mount process (after dropping vfs cache) when you see high memory usage by issuing the following command - 'kill -USR1 ' The statedump will be copied to 'glusterdump..dump.tim estamp` file in /var/run/gluster or /usr/local/var/run/gluster. Please refer to [1] for more information. Thanks, Soumya [1] http://review.gluster.org/#/c/8288/1/doc/debugging/statedump.md Thanks, Soumya 12.01.2016 08:24, Soumya Koduri написав: For fuse client, I tried vfs drop_caches as suggested by Vijay in an earlier mail. Though all the inodes get purged, I still doesn't see much difference in the memory footprint drop. Need to investigate what else is consuming so much memory here. ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Just in case, here is Valgrind output on FUSE client with 3.7.6 + API-related patches we discussed before: https://gist.github.com/cd6605ca19734c1496a4 12.01.2016 08:24, Soumya Koduri написав: For fuse client, I tried vfs drop_caches as suggested by Vijay in an earlier mail. Though all the inodes get purged, I still doesn't see much difference in the memory footprint drop. Need to investigate what else is consuming so much memory here. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Brief test shows that Ganesha stopped leaking and crashing, so it seems to be good for me. Nevertheless, back to my original question: what about FUSE client? It is still leaking despite all the fixes applied. Should it be considered another issue? 11.01.2016 12:26, Soumya Koduri написав: I have made changes to fix the lookup leak in a different way (as discussed with Pranith) and uploaded them in the latest patch set #4 - http://review.gluster.org/#/c/13096/ Please check if it resolves the mem leak and hopefully doesn't result in any assertion :) Thanks, Soumya On 01/08/2016 05:04 PM, Soumya Koduri wrote: I could reproduce while testing deep directories with in the mount point. I root caus'ed the issue & had discussion with Pranith to understand the purpose and recommended way of taking nlookup on inodes. I shall make changes to my existing fix and post the patch soon. Thanks for your patience! -Soumya On 01/07/2016 07:34 PM, Oleksandr Natalenko wrote: OK, I've patched GlusterFS v3.7.6 with 43570a01 and 5cffb56b (the most recent revisions) and NFS-Ganesha v2.3.0 with 8685abfc (most recent revision too). On traversing GlusterFS volume with many files in one folder via NFS mount I get an assertion: === ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed. === I used GDB on NFS-Ganesha process to get appropriate stacktraces: 1. short stacktrace of failed thread: https://gist.github.com/7f63bb99c530d26ded18 2. full stacktrace of failed thread: https://gist.github.com/d9bc7bc8f6a0bbff9e86 3. short stacktrace of all threads: https://gist.github.com/f31da7725306854c719f 4. full stacktrace of all threads: https://gist.github.com/65cbc562b01211ea5612 GlusterFS volume configuration: https://gist.github.com/30f0129d16e25d4a5a52 ganesha.conf: https://gist.github.com/9b5e59b8d6d8cb84c85d How I mount NFS share: === mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100 === On четвер, 7 січня 2016 р. 12:06:42 EET Soumya Koduri wrote: Entries_HWMark = 500; ___ Gluster-users mailing list gluster-us...@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, I've patched GlusterFS v3.7.6 with 43570a01 and 5cffb56b (the most recent revisions) and NFS-Ganesha v2.3.0 with 8685abfc (most recent revision too). On traversing GlusterFS volume with many files in one folder via NFS mount I get an assertion: === ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed. === I used GDB on NFS-Ganesha process to get appropriate stacktraces: 1. short stacktrace of failed thread: https://gist.github.com/7f63bb99c530d26ded18 2. full stacktrace of failed thread: https://gist.github.com/d9bc7bc8f6a0bbff9e86 3. short stacktrace of all threads: https://gist.github.com/f31da7725306854c719f 4. full stacktrace of all threads: https://gist.github.com/65cbc562b01211ea5612 GlusterFS volume configuration: https://gist.github.com/30f0129d16e25d4a5a52 ganesha.conf: https://gist.github.com/9b5e59b8d6d8cb84c85d How I mount NFS share: === mount -t nfs4 127.0.0.1:/mail_boxes /mnt/tmp -o defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100 === On четвер, 7 січня 2016 р. 12:06:42 EET Soumya Koduri wrote: > Entries_HWMark = 500; ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
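For completeness, the attached per-thread stacktraces can be reproduced with a plain gdb attach to the running daemon (pid lookup is illustrative):

===
gdb -p $(pgrep -x ganesha.nfsd)
(gdb) thread apply all bt        # short stacktrace of all threads
(gdb) thread apply all bt full   # full stacktrace of all threads
===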
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, here is valgrind log of patched Ganesha (I took recent version of your patchset, 8685abfc6d) with Entries_HWMARK set to 500. https://gist.github.com/5397c152a259b9600af0 See no huge runtime leaks now. However, I've repeated this test with another volume in replica and got the following Ganesha error: === ganesha.nfsd: inode.c:716: __inode_forget: Assertion `inode->nlookup >= nlookup' failed. === 06.01.2016 08:40, Soumya Koduri написав: On 01/06/2016 03:53 AM, Oleksandr Natalenko wrote: OK, I've repeated the same traversing test with patched GlusterFS API, and here is new Valgrind log: https://gist.github.com/17ecb16a11c9aed957f5 Fuse mount doesn't use gfapi helper. Does your above GlusterFS API application call glfs_fini() during exit? glfs_fini() is responsible for freeing the memory consumed by gfAPI applications. Could you repeat the test with nfs-ganesha (which for sure calls glfs_fini() and purges inodes if exceeds its inode cache limit) if possible. Thanks, Soumya Still leaks. On вівторок, 5 січня 2016 р. 22:52:25 EET Soumya Koduri wrote: On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote: Unfortunately, both patches didn't make any difference for me. I've patched 3.7.6 with both patches, recompiled and installed patched GlusterFS package on client side and mounted volume with ~2M of files. The I performed usual tree traverse with simple "find". Memory RES value went from ~130M at the moment of mounting to ~1.5G after traversing the volume for ~40 mins. Valgrind log still shows lots of leaks. Here it is: https://gist.github.com/56906ca6e657c4ffa4a1 Looks like you had done fuse mount. The patches which I have pasted below apply to gfapi/nfs-ganesha applications. Also, to resolve the nfs-ganesha issue which I had mentioned below (in case if Entries_HWMARK option gets changed), I have posted below fix - https://review.gerrithub.io/#/c/258687 Thanks, Soumya Ideas? 05.01.2016 12:31, Soumya Koduri написав: I tried to debug the inode* related leaks and seen some improvements after applying the below patches when ran the same test (but will smaller load). Could you please apply those patches & confirm the same? a) http://review.gluster.org/13125 This will fix the inodes & their ctx related leaks during unexport and the program exit. Please check the valgrind output after applying the patch. It should not list any inodes related memory as lost. b) http://review.gluster.org/13096 The reason the change in Entries_HWMARK (in your earlier mail) dint have much effect is that the inode_nlookup count doesn't become zero for those handles/inodes being closed by ganesha. Hence those inodes shall get added to inode lru list instead of purge list which shall get forcefully purged only when the number of gfapi inode table entries reaches its limit (which is 137012). This patch fixes those 'nlookup' counts. Please apply this patch and reduce 'Entries_HWMARK' to much lower value and check if it decreases the in-memory being consumed by ganesha process while being active. CACHEINODE { Entries_HWMark = 500; } Note: I see an issue with nfs-ganesha during exit when the option 'Entries_HWMARK' gets changed. This is not related to any of the above patches (or rather Gluster) and I am currently debugging it. Thanks, Soumya On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote: 1. test with Cache_Size = 256 and Entries_HWMark = 4096 Before find . -type f: root 3120 0.6 11.0 879120 208408 ? 
Ssl 17:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~250M leak. 2. test with default values (after ganesha restart) Before: root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~159M leak. No reasonable correlation detected. Second test was finished much faster than first (I guess, server-side GlusterFS cache or server kernel page cache is the cause). There are ~1.8M files on this test volume. On пʼятниця, 25 грудня 2015 р. 20:28:13 EET Soumya Koduri wrote: On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: Another addition: it seems to be GlusterFS API library memory leak because NFS-Ganesha also consumes huge amount of memory while doing ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory usage: === root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, I've repeated the same traversing test with patched GlusterFS API, and here is new Valgrind log: https://gist.github.com/17ecb16a11c9aed957f5 Still leaks. On вівторок, 5 січня 2016 р. 22:52:25 EET Soumya Koduri wrote: > On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote: > > Unfortunately, both patches didn't make any difference for me. > > > > I've patched 3.7.6 with both patches, recompiled and installed patched > > GlusterFS package on client side and mounted volume with ~2M of files. > > The I performed usual tree traverse with simple "find". > > > > Memory RES value went from ~130M at the moment of mounting to ~1.5G > > after traversing the volume for ~40 mins. Valgrind log still shows lots > > of leaks. Here it is: > > > > https://gist.github.com/56906ca6e657c4ffa4a1 > > Looks like you had done fuse mount. The patches which I have pasted > below apply to gfapi/nfs-ganesha applications. > > Also, to resolve the nfs-ganesha issue which I had mentioned below (in > case if Entries_HWMARK option gets changed), I have posted below fix - > https://review.gerrithub.io/#/c/258687 > > Thanks, > Soumya > > > Ideas? > > > > 05.01.2016 12:31, Soumya Koduri написав: > >> I tried to debug the inode* related leaks and seen some improvements > >> after applying the below patches when ran the same test (but will > >> smaller load). Could you please apply those patches & confirm the > >> same? > >> > >> a) http://review.gluster.org/13125 > >> > >> This will fix the inodes & their ctx related leaks during unexport and > >> the program exit. Please check the valgrind output after applying the > >> patch. It should not list any inodes related memory as lost. > >> > >> b) http://review.gluster.org/13096 > >> > >> The reason the change in Entries_HWMARK (in your earlier mail) dint > >> have much effect is that the inode_nlookup count doesn't become zero > >> for those handles/inodes being closed by ganesha. Hence those inodes > >> shall get added to inode lru list instead of purge list which shall > >> get forcefully purged only when the number of gfapi inode table > >> entries reaches its limit (which is 137012). > >> > >> This patch fixes those 'nlookup' counts. Please apply this patch and > >> reduce 'Entries_HWMARK' to much lower value and check if it decreases > >> the in-memory being consumed by ganesha process while being active. > >> > >> CACHEINODE { > >> > >> Entries_HWMark = 500; > >> > >> } > >> > >> > >> Note: I see an issue with nfs-ganesha during exit when the option > >> 'Entries_HWMARK' gets changed. This is not related to any of the above > >> patches (or rather Gluster) and I am currently debugging it. > >> > >> Thanks, > >> Soumya > >> > >> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote: > >>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096 > >>> > >>> Before find . -type f: > >>> > >>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> After: > >>> > >>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> ~250M leak. > >>> > >>> 2. test with default values (after ganesha restart) > >>> > >>> Before: > >>> > >>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> After: > >>> > >>> root 24937 3.5 18.9 1022544 356340 ? 
Ssl 19:39 0:40 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> ~159M leak. > >>> > >>> No reasonable correlation detected. Second test was finished much > >>> faster than > >>> first (I guess, server-side GlusterFS cache or server kernel page > >>> cache
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Correct, I used FUSE mount. Shouldn't gfapi be used by FUSE mount helper (/ usr/bin/glusterfs)? On вівторок, 5 січня 2016 р. 22:52:25 EET Soumya Koduri wrote: > On 01/05/2016 05:56 PM, Oleksandr Natalenko wrote: > > Unfortunately, both patches didn't make any difference for me. > > > > I've patched 3.7.6 with both patches, recompiled and installed patched > > GlusterFS package on client side and mounted volume with ~2M of files. > > The I performed usual tree traverse with simple "find". > > > > Memory RES value went from ~130M at the moment of mounting to ~1.5G > > after traversing the volume for ~40 mins. Valgrind log still shows lots > > of leaks. Here it is: > > > > https://gist.github.com/56906ca6e657c4ffa4a1 > > Looks like you had done fuse mount. The patches which I have pasted > below apply to gfapi/nfs-ganesha applications. > > Also, to resolve the nfs-ganesha issue which I had mentioned below (in > case if Entries_HWMARK option gets changed), I have posted below fix - > https://review.gerrithub.io/#/c/258687 > > Thanks, > Soumya > > > Ideas? > > > > 05.01.2016 12:31, Soumya Koduri написав: > >> I tried to debug the inode* related leaks and seen some improvements > >> after applying the below patches when ran the same test (but will > >> smaller load). Could you please apply those patches & confirm the > >> same? > >> > >> a) http://review.gluster.org/13125 > >> > >> This will fix the inodes & their ctx related leaks during unexport and > >> the program exit. Please check the valgrind output after applying the > >> patch. It should not list any inodes related memory as lost. > >> > >> b) http://review.gluster.org/13096 > >> > >> The reason the change in Entries_HWMARK (in your earlier mail) dint > >> have much effect is that the inode_nlookup count doesn't become zero > >> for those handles/inodes being closed by ganesha. Hence those inodes > >> shall get added to inode lru list instead of purge list which shall > >> get forcefully purged only when the number of gfapi inode table > >> entries reaches its limit (which is 137012). > >> > >> This patch fixes those 'nlookup' counts. Please apply this patch and > >> reduce 'Entries_HWMARK' to much lower value and check if it decreases > >> the in-memory being consumed by ganesha process while being active. > >> > >> CACHEINODE { > >> > >> Entries_HWMark = 500; > >> > >> } > >> > >> > >> Note: I see an issue with nfs-ganesha during exit when the option > >> 'Entries_HWMARK' gets changed. This is not related to any of the above > >> patches (or rather Gluster) and I am currently debugging it. > >> > >> Thanks, > >> Soumya > >> > >> On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote: > >>> 1. test with Cache_Size = 256 and Entries_HWMark = 4096 > >>> > >>> Before find . -type f: > >>> > >>> root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> After: > >>> > >>> root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> ~250M leak. > >>> > >>> 2. test with default values (after ganesha restart) > >>> > >>> Before: > >>> > >>> root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> After: > >>> > >>> root 24937 3.5 18.9 1022544 356340 ? 
Ssl 19:39 0:40 > >>> /usr/bin/ > >>> ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N > >>> NIV_EVENT > >>> > >>> ~159M leak. > >>> > >>> No reasonable correlation detected. Second test was finished much > >>> faster than > >>> first (I guess, server-side GlusterFS cache or server kernel page > >>> cache is the > >>> cause). > >>>
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Unfortunately, both patches didn't make any difference for me. I've patched 3.7.6 with both patches, recompiled and installed patched GlusterFS package on client side and mounted volume with ~2M of files. The I performed usual tree traverse with simple "find". Memory RES value went from ~130M at the moment of mounting to ~1.5G after traversing the volume for ~40 mins. Valgrind log still shows lots of leaks. Here it is: https://gist.github.com/56906ca6e657c4ffa4a1 Ideas? 05.01.2016 12:31, Soumya Koduri написав: I tried to debug the inode* related leaks and seen some improvements after applying the below patches when ran the same test (but will smaller load). Could you please apply those patches & confirm the same? a) http://review.gluster.org/13125 This will fix the inodes & their ctx related leaks during unexport and the program exit. Please check the valgrind output after applying the patch. It should not list any inodes related memory as lost. b) http://review.gluster.org/13096 The reason the change in Entries_HWMARK (in your earlier mail) dint have much effect is that the inode_nlookup count doesn't become zero for those handles/inodes being closed by ganesha. Hence those inodes shall get added to inode lru list instead of purge list which shall get forcefully purged only when the number of gfapi inode table entries reaches its limit (which is 137012). This patch fixes those 'nlookup' counts. Please apply this patch and reduce 'Entries_HWMARK' to much lower value and check if it decreases the in-memory being consumed by ganesha process while being active. CACHEINODE { Entries_HWMark = 500; } Note: I see an issue with nfs-ganesha during exit when the option 'Entries_HWMARK' gets changed. This is not related to any of the above patches (or rather Gluster) and I am currently debugging it. Thanks, Soumya On 12/25/2015 11:34 PM, Oleksandr Natalenko wrote: 1. test with Cache_Size = 256 and Entries_HWMark = 4096 Before find . -type f: root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~250M leak. 2. test with default values (after ganesha restart) Before: root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~159M leak. No reasonable correlation detected. Second test was finished much faster than first (I guess, server-side GlusterFS cache or server kernel page cache is the cause). There are ~1.8M files on this test volume. On пʼятниця, 25 грудня 2015 р. 20:28:13 EET Soumya Koduri wrote: On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: Another addition: it seems to be GlusterFS API library memory leak because NFS-Ganesha also consumes huge amount of memory while doing ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory usage: === root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT === 1.4G is too much for simple stat() :(. Ideas? nfs-ganesha also has cache layer which can scale to millions of entries depending on the number of files/directories being looked upon. However there are parameters to tune it. 
So either try stat with few entries or add below block in nfs-ganesha.conf file, set low limits and check the difference. That may help us narrow down how much memory actually consumed by core nfs-ganesha and gfAPI. CACHEINODE { Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no. of entries in the cache. } Thanks, Soumya 24.12.2015 16:32, Oleksandr Natalenko написав: Still actual issue for 3.7.6. Any suggestions? 24.09.2015 10:14, Oleksandr Natalenko написав: In our GlusterFS deployment we've encountered something like memory leak in GlusterFS FUSE client. We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, maildir format). Here is inode stats for both bricks and mountpoint: === Brick 1 (Server 1): Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 5678132262% /bricks/r6sdLV08_vd1_mail Brick 2 (Server 2): Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg_vd0_misc-lv07_mail 578767984
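A small sketch that may help with the before/after numbers quoted above: sampling the resident size of ganesha.nfsd periodically while the traversal runs, instead of eyeballing ps output. The pgrep pattern and interval are only illustrative:
===
# Sample the RSS of ganesha.nfsd once a minute while the test runs.
PID=$(pgrep -o ganesha.nfsd)
while kill -0 "$PID" 2>/dev/null; do
    grep VmRSS "/proc/$PID/status"
    sleep 60
done
===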
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Here is another Valgrind log of a similar scenario, but with drop_caches before umount: https://gist.github.com/06997ecc8c7bce83aec1 Also, I've tried to drop caches on a production VM with a GlusterFS volume mounted and leaking memory for several weeks, with absolutely no effect: === root 945 0.1 48.2 1273900 739244 ? Ssl 2015 58:54 /usr/sbin/glusterfs --volfile-server=server.example.com --volfile-id=volume /mnt/volume === The numbers above stayed the same before the drop as well as 5 minutes after it. On Sunday, 3 January 2016, 13:35:51 EET Vijay Bellur wrote: > /proc/sys/vm/drop_caches ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
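For reference, a minimal sketch of the drop-caches check described above, run as root on the client (the 5-minute wait mirrors the test; nothing else is assumed):
===
# RSS of the glusterfs client before the drop, after syncing and
# dropping kernel caches, and again 5 minutes later.
ps -C glusterfs -o pid=,rss=
sync
echo 3 > /proc/sys/vm/drop_caches
sleep 300
ps -C glusterfs -o pid=,rss=
===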
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Another Valgrind run. I did the following: === valgrind --leak-check=full --show-leak-kinds=all --log- file="valgrind_fuse.log" /usr/bin/glusterfs -N --volfile- server=some.server.com --volfile-id=somevolume /mnt/volume === then cd to /mnt/volume and find . -type f. After traversing some part of hierarchy I've stopped find and did umount /mnt/volume. Here is valgrind_fuse.log file: https://gist.github.com/7e2679e1e72e48f75a2b On четвер, 31 грудня 2015 р. 14:09:03 EET Soumya Koduri wrote: > On 12/28/2015 02:32 PM, Soumya Koduri wrote: > > - Original Message - > > > >> From: "Pranith Kumar Karampuri" > >> To: "Oleksandr Natalenko" , "Soumya Koduri" > >> Cc: gluster-us...@gluster.org, > >> gluster-devel@gluster.org > >> Sent: Monday, December 28, 2015 9:32:07 AM > >> Subject: Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS > >> FUSE client>> > >> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote: > >>> Also, here is valgrind output with our custom tool, that does GlusterFS > >>> volume > >>> traversing (with simple stats) just like find tool. In this case > >>> NFS-Ganesha > >>> is not used. > >>> > >>> https://gist.github.com/e4602a50d3c98f7a2766 > >> > >> hi Oleksandr, > >> > >> I went through the code. Both NFS Ganesha and the custom tool use > >> > >> gfapi and the leak is stemming from that. I am not very familiar with > >> this part of code but there seems to be one inode_unref() that is > >> missing in failure path of resolution. Not sure if that is corresponding > >> to the leaks. > >> > >> Soumya, > >> > >> Could this be the issue? review.gluster.org seems to be down. So > >> > >> couldn't send the patch. Please ping me on IRC. > >> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c > >> index b5efcba..52b538b 100644 > >> --- a/api/src/glfs-resolve.c > >> +++ b/api/src/glfs-resolve.c > >> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t > >> *subvol, inode_t *at, > >> > >> } > >> > >> } > >> > >> - if (parent && next_component) > >> + if (parent && next_component) { > >> + inode_unref (parent); > >> + parent = NULL; > >> > >> /* resolution failed mid-way */ > >> goto out; > >> > >> +} > >> > >> /* At this point, all components up to the last parent > >> directory > >> > >> have been resolved successfully (@parent). Resolution of > >> > >> basename > > > > yes. This could be one of the reasons. There are few leaks with respect to > > inode references in gfAPI. See below. > > > > > > On GlusterFS side, looks like majority of the leaks are related to inodes > > and their contexts. Possible reasons which I can think of are: > > > > 1) When there is a graph switch, old inode table and their entries are not > > purged (this is a known issue). There was an effort put to fix this > > issue. But I think it had other side-effects and hence not been applied. > > Maybe we should revive those changes again. > > > > 2) With regard to above, old entries can be purged in case if any request > > comes with the reference to old inode (as part of 'glfs_resolve_inode'), > > provided their reference counts are properly decremented. But this is not > > happening at the moment in gfapi. > > > > 3) Applications should hold and release their reference as needed and > > required. 
There are certain fixes needed in this area as well (including > > the fix provided by Pranith above).> > > From code-inspection, have made changes to fix few leaks of case (2) & > > (3) with respect to gfAPI.> > > http://review.gluster.org/#/c/13096 (yet to test the changes) > > > > I haven't yet narrowed down any suspects pertaining to only NFS-Ganesha. > > Will re-check and update. > I tried similar tests but with smaller set of files. I could see the > inode_ctx leak even without graph switches involved. I suspect that > could be because valgrind checks for memory leaks during the exit of the > program. We call 'glfs_fini()' to cleanup the memory being used b
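Since several valgrind logs are being compared in this thread, a quick way to pull out just the leak summaries from the files mentioned above (a sketch; adjust the file names as needed):
===
# Show the leak summary block and the definitely-lost totals only.
grep -A6 "LEAK SUMMARY" valgrind_fuse.log
grep "definitely lost:" valgrind_fuse.log
===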
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Also, here is valgrind output with our custom tool, that does GlusterFS volume traversing (with simple stats) just like find tool. In this case NFS-Ganesha is not used. https://gist.github.com/e4602a50d3c98f7a2766 One may see GlusterFS-related leaks here as well. On пʼятниця, 25 грудня 2015 р. 20:28:13 EET Soumya Koduri wrote: > On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: > > Another addition: it seems to be GlusterFS API library memory leak > > because NFS-Ganesha also consumes huge amount of memory while doing > > ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory > > usage: > > > > === > > root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 > > /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f > > /etc/ganesha/ganesha.conf -N NIV_EVENT > > === > > > > 1.4G is too much for simple stat() :(. > > > > Ideas? > > nfs-ganesha also has cache layer which can scale to millions of entries > depending on the number of files/directories being looked upon. However > there are parameters to tune it. So either try stat with few entries or > add below block in nfs-ganesha.conf file, set low limits and check the > difference. That may help us narrow down how much memory actually > consumed by core nfs-ganesha and gfAPI. > > CACHEINODE { > Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size > Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no. > of entries in the cache. > } > > Thanks, > Soumya > > > 24.12.2015 16:32, Oleksandr Natalenko написав: > >> Still actual issue for 3.7.6. Any suggestions? > >> > >> 24.09.2015 10:14, Oleksandr Natalenko написав: > >>> In our GlusterFS deployment we've encountered something like memory > >>> leak in GlusterFS FUSE client. > >>> > >>> We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, > >>> maildir format). Here is inode stats for both bricks and mountpoint: > >>> > >>> === > >>> Brick 1 (Server 1): > >>> > >>> Filesystem InodesIUsed > >>> > >>> IFree IUse% Mounted on > >>> > >>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 > >>> > >>> 5678132262% /bricks/r6sdLV08_vd1_mail > >>> > >>> Brick 2 (Server 2): > >>> > >>> Filesystem InodesIUsed > >>> > >>> IFree IUse% Mounted on > >>> > >>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 > >>> > >>> 5678130712% /bricks/r6sdLV07_vd0_mail > >>> > >>> Mountpoint (Server 3): > >>> > >>> Filesystem InodesIUsed IFree > >>> IUse% Mounted on > >>> glusterfs.xxx:mail 578767760 10954915 567812845 > >>> 2% /var/spool/mail/virtual > >>> === > >>> > >>> glusterfs.xxx domain has two A records for both Server 1 and Server 2. 
> >>> > >>> Here is volume info: > >>> > >>> === > >>> Volume Name: mail > >>> Type: Replicate > >>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 > >>> Status: Started > >>> Number of Bricks: 1 x 2 = 2 > >>> Transport-type: tcp > >>> Bricks: > >>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail > >>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail > >>> Options Reconfigured: > >>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 > >>> features.cache-invalidation-timeout: 10 > >>> performance.stat-prefetch: off > >>> performance.quick-read: on > >>> performance.read-ahead: off > >>> performance.flush-behind: on > >>> performance.write-behind: on > >>> performance.io-thread-count: 4 > >>> performance.cache-max-file-size: 1048576 > >>> performance.cache-size: 67108864 > >>> performance.readdir-ahead: off > >>> === > >>> > >>> Soon enough after mounting and exim/dovecot start, glusterfs client > >>> process begins to consume huge amount of RAM: > >>> > >>> === > >>> user@server3 ~$ ps aux | grep glusterfs | grep mail > >>> root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
OK, I've rebuild GlusterFS v3.7.6 with debug enabled as well as NFS-Ganesha with debug enabled as well (and libc allocator). Here is my test steps: 1. launch nfs-ganesha: valgrind --leak-check=full --show-leak-kinds=all --log-file="valgrind.log" / opt/nfs-ganesha/bin/ganesha.nfsd -F -L ./ganesha.log -f ./ganesha.conf -N NIV_EVENT 2. mount NFS share: mount -t nfs4 127.0.0.1:/share share -o defaults,_netdev,minorversion=2,noac,noacl,lookupcache=none,timeo=100 3. cd to share and run find . for some time 4. CTRL+C find, unmount share. 5. CTRL+C NFS-Ganesha. Here is full valgrind output: https://gist.github.com/eebd9f94ababd8130d49 One may see the probability of massive leaks at the end of valgrind output related to both GlusterFS and NFS-Ganesha code. On пʼятниця, 25 грудня 2015 р. 23:29:07 EET Soumya Koduri wrote: > On 12/25/2015 08:56 PM, Oleksandr Natalenko wrote: > > What units Cache_Size is measured in? Bytes? > > Its actually (Cache_Size * sizeof_ptr) bytes. If possible, could you > please run ganesha process under valgrind? Will help in detecting leaks. > > Thanks, > Soumya > > > 25.12.2015 16:58, Soumya Koduri написав: > >> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: > >>> Another addition: it seems to be GlusterFS API library memory leak > >>> because NFS-Ganesha also consumes huge amount of memory while doing > >>> ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory > >>> usage: > >>> > >>> === > >>> root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 > >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f > >>> /etc/ganesha/ganesha.conf -N NIV_EVENT > >>> === > >>> > >>> 1.4G is too much for simple stat() :(. > >>> > >>> Ideas? > >> > >> nfs-ganesha also has cache layer which can scale to millions of > >> entries depending on the number of files/directories being looked > >> upon. However there are parameters to tune it. So either try stat with > >> few entries or add below block in nfs-ganesha.conf file, set low > >> limits and check the difference. That may help us narrow down how much > >> memory actually consumed by core nfs-ganesha and gfAPI. > >> > >> CACHEINODE { > >> > >> Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache > >> > >> size > >> > >> Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max > >> > >> no. of entries in the cache. > >> } > >> > >> Thanks, > >> Soumya > >> > >>> 24.12.2015 16:32, Oleksandr Natalenko написав: > >>>> Still actual issue for 3.7.6. Any suggestions? > >>>> > >>>> 24.09.2015 10:14, Oleksandr Natalenko написав: > >>>>> In our GlusterFS deployment we've encountered something like memory > >>>>> leak in GlusterFS FUSE client. > >>>>> > >>>>> We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, > >>>>> maildir format). 
Here is inode stats for both bricks and mountpoint: > >>>>> > >>>>> === > >>>>> Brick 1 (Server 1): > >>>>> > >>>>> Filesystem Inodes IUsed > >>>>> > >>>>> IFree IUse% Mounted on > >>>>> > >>>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 > >>>>> > >>>>> 5678132262% /bricks/r6sdLV08_vd1_mail > >>>>> > >>>>> Brick 2 (Server 2): > >>>>> > >>>>> Filesystem Inodes IUsed > >>>>> > >>>>> IFree IUse% Mounted on > >>>>> > >>>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 > >>>>> > >>>>> 5678130712% /bricks/r6sdLV07_vd0_mail > >>>>> > >>>>> Mountpoint (Server 3): > >>>>> > >>>>> Filesystem InodesIUsed IFree > >>>>> IUse% Mounted on > >>>>> glusterfs.xxx:mail 578767760 10954915 567812845 > >>>>> 2% /var/spool/mail/virtual > >>>>> === > >>>>> > >>>>> glusterfs.xxx domain ha
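The ganesha.conf used in this test is referenced but not shown; a minimal FSAL_GLUSTER export plus the cache block would look roughly like the sketch below. The hostname, volume name and paths are placeholders, not the actual file used above, and the exact semantics of Path vs. Pseudo should be checked against the nfs-ganesha FSAL_GLUSTER documentation:
===
cat > ./ganesha.conf <<'EOF'
EXPORT {
    Export_Id = 1;
    Path = "/share";
    Pseudo = "/share";
    Access_Type = RW;
    Squash = No_root_squash;
    Protocols = 4;
    FSAL {
        Name = GLUSTER;
        Hostname = "127.0.0.1";
        Volume = "somevolume";
    }
}
CACHEINODE {
    Entries_HWMark = 500;
}
EOF
===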
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
1. test with Cache_Size = 256 and Entries_HWMark = 4096 Before find . -type f: root 3120 0.6 11.0 879120 208408 ? Ssl 17:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 3120 11.4 24.3 1170076 458168 ? Ssl 17:39 13:39 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~250M leak. 2. test with default values (after ganesha restart) Before: root 24937 1.3 10.4 875016 197808 ? Ssl 19:39 0:00 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT After: root 24937 3.5 18.9 1022544 356340 ? Ssl 19:39 0:40 /usr/bin/ ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT ~159M leak. No reasonable correlation detected. Second test was finished much faster than first (I guess, server-side GlusterFS cache or server kernel page cache is the cause). There are ~1.8M files on this test volume. On пʼятниця, 25 грудня 2015 р. 20:28:13 EET Soumya Koduri wrote: > On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: > > Another addition: it seems to be GlusterFS API library memory leak > > because NFS-Ganesha also consumes huge amount of memory while doing > > ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory > > usage: > > > > === > > root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 > > /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f > > /etc/ganesha/ganesha.conf -N NIV_EVENT > > === > > > > 1.4G is too much for simple stat() :(. > > > > Ideas? > > nfs-ganesha also has cache layer which can scale to millions of entries > depending on the number of files/directories being looked upon. However > there are parameters to tune it. So either try stat with few entries or > add below block in nfs-ganesha.conf file, set low limits and check the > difference. That may help us narrow down how much memory actually > consumed by core nfs-ganesha and gfAPI. > > CACHEINODE { > Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size > Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no. > of entries in the cache. > } > > Thanks, > Soumya > > > 24.12.2015 16:32, Oleksandr Natalenko написав: > >> Still actual issue for 3.7.6. Any suggestions? > >> > >> 24.09.2015 10:14, Oleksandr Natalenko написав: > >>> In our GlusterFS deployment we've encountered something like memory > >>> leak in GlusterFS FUSE client. > >>> > >>> We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, > >>> maildir format). Here is inode stats for both bricks and mountpoint: > >>> > >>> === > >>> Brick 1 (Server 1): > >>> > >>> Filesystem InodesIUsed > >>> > >>> IFree IUse% Mounted on > >>> > >>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 > >>> > >>> 5678132262% /bricks/r6sdLV08_vd1_mail > >>> > >>> Brick 2 (Server 2): > >>> > >>> Filesystem InodesIUsed > >>> > >>> IFree IUse% Mounted on > >>> > >>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 > >>> > >>> 5678130712% /bricks/r6sdLV07_vd0_mail > >>> > >>> Mountpoint (Server 3): > >>> > >>> Filesystem InodesIUsed IFree > >>> IUse% Mounted on > >>> glusterfs.xxx:mail 578767760 10954915 567812845 > >>> 2% /var/spool/mail/virtual > >>> === > >>> > >>> glusterfs.xxx domain has two A records for both Server 1 and Server 2. 
> >>> > >>> Here is volume info: > >>> > >>> === > >>> Volume Name: mail > >>> Type: Replicate > >>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 > >>> Status: Started > >>> Number of Bricks: 1 x 2 = 2 > >>> Transport-type: tcp > >>> Bricks: > >>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail > >>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail > >>> Options Reconfigured: > >>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 > >>> features.cache-invalidation-timeout: 10 > >>> performance.stat-prefetch: off > >>> perfo
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
What units Cache_Size is measured in? Bytes? 25.12.2015 16:58, Soumya Koduri написав: On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote: Another addition: it seems to be GlusterFS API library memory leak because NFS-Ganesha also consumes huge amount of memory while doing ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory usage: === root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT === 1.4G is too much for simple stat() :(. Ideas? nfs-ganesha also has cache layer which can scale to millions of entries depending on the number of files/directories being looked upon. However there are parameters to tune it. So either try stat with few entries or add below block in nfs-ganesha.conf file, set low limits and check the difference. That may help us narrow down how much memory actually consumed by core nfs-ganesha and gfAPI. CACHEINODE { Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size Entries_HWMark(uint32, range 1 to UINT32_MAX, default 10); #Max no. of entries in the cache. } Thanks, Soumya 24.12.2015 16:32, Oleksandr Natalenko написав: Still actual issue for 3.7.6. Any suggestions? 24.09.2015 10:14, Oleksandr Natalenko написав: In our GlusterFS deployment we've encountered something like memory leak in GlusterFS FUSE client. We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, maildir format). Here is inode stats for both bricks and mountpoint: === Brick 1 (Server 1): Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 5678132262% /bricks/r6sdLV08_vd1_mail Brick 2 (Server 2): Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 5678130712% /bricks/r6sdLV07_vd0_mail Mountpoint (Server 3): Filesystem InodesIUsed IFree IUse% Mounted on glusterfs.xxx:mail 578767760 10954915 567812845 2% /var/spool/mail/virtual === glusterfs.xxx domain has two A records for both Server 1 and Server 2. Here is volume info: === Volume Name: mail Type: Replicate Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail Options Reconfigured: nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 features.cache-invalidation-timeout: 10 performance.stat-prefetch: off performance.quick-read: on performance.read-ahead: off performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 4 performance.cache-max-file-size: 1048576 performance.cache-size: 67108864 performance.readdir-ahead: off === Soon enough after mounting and exim/dovecot start, glusterfs client process begins to consume huge amount of RAM: === user@server3 ~$ ps aux | grep glusterfs | grep mail root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual === That is, ~15 GiB of RAM. Also we've tried to use mountpoint withing separate KVM VM with 2 or 3 GiB of RAM, and soon after starting mail daemons got OOM killer for glusterfs client process. Mounting same share via NFS works just fine. Also, we have much less iowait and loadavg on client side with NFS. Also, we've tried to change IO threads count and cache size in order to limit memory usage with no luck. 
As you can see, total cache size is 4×64==256 MiB (compare to 15 GiB). Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't help as well. Here are volume memory stats: === Memory status for volume : mail -- Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Mallinfo Arena: 36859904 Ordblks : 10357 Smblks : 519 Hblks: 21 Hblkhd : 30515200 Usmblks : 0 Fsmblks : 53440 Uordblks : 18604144 Fordblks : 18255760 Keepcost : 114112 Mempool Stats - NameHotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc - -- mail-server:fd_t 0 1024 108 30773120 13700 mail-server:dentry_t 16110 274 84 23567614816384 1106499 1152 mail-server:inode_t1636321 156 23721687616384 1876651 1169 mail-trash:fd_t0 1024 108 0000 mail-trash
Re: [Gluster-devel] Memory leak in GlusterFS FUSE client
Another addition: it seems to be GlusterFS API library memory leak because NFS-Ganesha also consumes huge amount of memory while doing ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory usage: === root 5416 34.2 78.5 2047176 1480552 ? Ssl 12:02 117:54 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT === 1.4G is too much for simple stat() :(. Ideas? 24.12.2015 16:32, Oleksandr Natalenko написав: Still actual issue for 3.7.6. Any suggestions? 24.09.2015 10:14, Oleksandr Natalenko написав: In our GlusterFS deployment we've encountered something like memory leak in GlusterFS FUSE client. We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, maildir format). Here is inode stats for both bricks and mountpoint: === Brick 1 (Server 1): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 5678132262% /bricks/r6sdLV08_vd1_mail Brick 2 (Server 2): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 5678130712% /bricks/r6sdLV07_vd0_mail Mountpoint (Server 3): Filesystem InodesIUsed IFree IUse% Mounted on glusterfs.xxx:mail 578767760 10954915 567812845 2% /var/spool/mail/virtual === glusterfs.xxx domain has two A records for both Server 1 and Server 2. Here is volume info: === Volume Name: mail Type: Replicate Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail Options Reconfigured: nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 features.cache-invalidation-timeout: 10 performance.stat-prefetch: off performance.quick-read: on performance.read-ahead: off performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 4 performance.cache-max-file-size: 1048576 performance.cache-size: 67108864 performance.readdir-ahead: off === Soon enough after mounting and exim/dovecot start, glusterfs client process begins to consume huge amount of RAM: === user@server3 ~$ ps aux | grep glusterfs | grep mail root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual === That is, ~15 GiB of RAM. Also we've tried to use mountpoint withing separate KVM VM with 2 or 3 GiB of RAM, and soon after starting mail daemons got OOM killer for glusterfs client process. Mounting same share via NFS works just fine. Also, we have much less iowait and loadavg on client side with NFS. Also, we've tried to change IO threads count and cache size in order to limit memory usage with no luck. As you can see, total cache size is 4×64==256 MiB (compare to 15 GiB). Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't help as well. 
Here are volume memory stats: === Memory status for volume : mail -- Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Mallinfo Arena: 36859904 Ordblks : 10357 Smblks : 519 Hblks: 21 Hblkhd : 30515200 Usmblks : 0 Fsmblks : 53440 Uordblks : 18604144 Fordblks : 18255760 Keepcost : 114112 Mempool Stats - NameHotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc - -- mail-server:fd_t 0 1024 108 30773120 13700 mail-server:dentry_t 16110 274 84 23567614816384 1106499 1152 mail-server:inode_t1636321 156 23721687616384 1876651 1169 mail-trash:fd_t0 1024 108 0000 mail-trash:dentry_t0 32768 84 0000 mail-trash:inode_t 4 32764 156 4400 mail-trash:trash_local_t 064 8628 0000 mail-changetimerecorder:gf_ctr_local_t 064 16540 0000 mail-changelog:rpcsvc_request_t 0 8 2828 0000 mail-changelog:changelog_local_t 064 116 0000 mail-bitrot-stub:br_stub_local_t 0 512 84 79204400 mail-locks:pl_local_t 032
Re: [Gluster-devel] Memory leak in GlusterFS FUSE client
Still actual issue for 3.7.6. Any suggestions? 24.09.2015 10:14, Oleksandr Natalenko написав: In our GlusterFS deployment we've encountered something like memory leak in GlusterFS FUSE client. We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, maildir format). Here is inode stats for both bricks and mountpoint: === Brick 1 (Server 1): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 5678132262% /bricks/r6sdLV08_vd1_mail Brick 2 (Server 2): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 5678130712% /bricks/r6sdLV07_vd0_mail Mountpoint (Server 3): Filesystem InodesIUsed IFree IUse% Mounted on glusterfs.xxx:mail 578767760 10954915 567812845 2% /var/spool/mail/virtual === glusterfs.xxx domain has two A records for both Server 1 and Server 2. Here is volume info: === Volume Name: mail Type: Replicate Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail Options Reconfigured: nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 features.cache-invalidation-timeout: 10 performance.stat-prefetch: off performance.quick-read: on performance.read-ahead: off performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 4 performance.cache-max-file-size: 1048576 performance.cache-size: 67108864 performance.readdir-ahead: off === Soon enough after mounting and exim/dovecot start, glusterfs client process begins to consume huge amount of RAM: === user@server3 ~$ ps aux | grep glusterfs | grep mail root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual === That is, ~15 GiB of RAM. Also we've tried to use mountpoint withing separate KVM VM with 2 or 3 GiB of RAM, and soon after starting mail daemons got OOM killer for glusterfs client process. Mounting same share via NFS works just fine. Also, we have much less iowait and loadavg on client side with NFS. Also, we've tried to change IO threads count and cache size in order to limit memory usage with no luck. As you can see, total cache size is 4×64==256 MiB (compare to 15 GiB). Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't help as well. 
Here are volume memory stats: === Memory status for volume : mail -- Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Mallinfo Arena: 36859904 Ordblks : 10357 Smblks : 519 Hblks: 21 Hblkhd : 30515200 Usmblks : 0 Fsmblks : 53440 Uordblks : 18604144 Fordblks : 18255760 Keepcost : 114112 Mempool Stats - NameHotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc - -- mail-server:fd_t 0 1024 108 30773120 13700 mail-server:dentry_t 16110 274 84 23567614816384 1106499 1152 mail-server:inode_t1636321 156 23721687616384 1876651 1169 mail-trash:fd_t0 1024 108 0000 mail-trash:dentry_t0 32768 84 0000 mail-trash:inode_t 4 32764 156 4400 mail-trash:trash_local_t 064 8628 0000 mail-changetimerecorder:gf_ctr_local_t 064 16540 0000 mail-changelog:rpcsvc_request_t 0 8 2828 0000 mail-changelog:changelog_local_t 064 116 0000 mail-bitrot-stub:br_stub_local_t 0 512 84 79204400 mail-locks:pl_local_t 032 148 6812757400 mail-upcall:upcall_local_t 0 512 108 0000 mail-marker:marker_local_t 0 128 332 64980300 mail-quota:quota_local_t 064 476 0000 mail-server:rpcsvc_request_t 0 512 2828 45462533 3400 glusterfs:struct saved_frame 0
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
Here are two consecutive statedumps of the memory usage of the brick in question [1] [2]. The glusterfs client process went from ~630 MB to ~1350 MB of memory usage in less than one hour. Volume options: === cluster.lookup-optimize: on cluster.readdir-optimize: on client.event-threads: 4 network.inode-lru-limit: 4096 server.event-threads: 8 performance.client-io-threads: on storage.linux-aio: on performance.write-behind-window-size: 4194304 performance.stat-prefetch: on performance.quick-read: on performance.read-ahead: on performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 4 performance.cache-max-file-size: 1048576 performance.cache-size: 33554432 performance.readdir-ahead: on === I observe such behavior on similar volumes where millions of files are stored. The volume in question holds ~11M small files (mail storage). So, the memory leak persists. I had to switch to NFS temporarily :(. Any idea? [1] https://gist.github.com/46697b70ffe193fa797e [2] https://gist.github.com/3a968ca909bfdeb31cca 28.09.2015 14:31, Raghavendra Bhat wrote: Hi Oleksandr, You are right. The description should have said it is the limit on the number of inodes in the lru list of the inode cache. I have sent a patch for that. http://review.gluster.org/#/c/12242/ [3] Regards, Raghavendra Bhat On Thu, Sep 24, 2015 at 1:44 PM, Oleksandr Natalenko wrote: I've checked the statedump of the volume in question and haven't found lots of iobuf allocations as mentioned in that bug report. However, I've noticed that there are lots of LRU records like this: === [conn.1.bound_xl./bricks/r6sdLV07_vd0_mail/mail.lru.1] gfid=c4b29310-a19d-451b-8dd1-b3ac2d86b595 nlookup=1 fd-count=0 ref=0 ia_type=1 === In fact, there are 16383 of them. I've checked "gluster volume set help" in order to find something LRU-related and have found this: === Option: network.inode-lru-limit Default Value: 16384 Description: Specifies the maximum megabytes of memory to be used in the inode cache. === Is there an error in the description stating "maximum megabytes of memory"? Shouldn't it mean "maximum number of LRU records"? If not, is it true that the inode cache could grow up to 16 GiB for a client, and one must lower the network.inode-lru-limit value? Another thought: we've enabled write-behind, and the default write-behind-window-size value is 1 MiB. So one may conclude that with lots of small files written, the write-behind buffer could grow up to inode-lru-limit × write-behind-window-size = 16 GiB? Who could explain that to me? 24.09.2015 10:42, Gabi C wrote: oh, my bad... could it be this one? https://bugzilla.redhat.com/show_bug.cgi?id=1126831 [1] [2] Anyway, on oVirt+Gluster I experienced similar behavior... ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel [2] Links: -- [1] https://bugzilla.redhat.com/show_bug.cgi?id=1126831 [2] http://www.gluster.org/mailman/listinfo/gluster-devel [3] http://review.gluster.org/#/c/12242/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
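If the inode LRU list is indeed what keeps the client this large, the options discussed above can be adjusted further and fresh dumps taken for comparison. A sketch using the volume name from this thread (the values are only illustrative):
===
# Lower the client inode LRU limit and the write-behind window.
gluster volume set mail network.inode-lru-limit 1024
gluster volume set mail performance.write-behind-window-size 1MB
# Brick statedumps:
gluster volume statedump mail
# FUSE client statedump: send SIGUSR1 to the client process; the dump
# typically lands under /var/run/gluster/.
kill -USR1 $(pgrep -f 'glusterfs.*volfile-id=mail')
===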
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
I've checked the statedump of the volume in question and haven't found lots of iobuf allocations as mentioned in that bug report. However, I've noticed that there are lots of LRU records like this: === [conn.1.bound_xl./bricks/r6sdLV07_vd0_mail/mail.lru.1] gfid=c4b29310-a19d-451b-8dd1-b3ac2d86b595 nlookup=1 fd-count=0 ref=0 ia_type=1 === In fact, there are 16383 of them. I've checked "gluster volume set help" in order to find something LRU-related and have found this: === Option: network.inode-lru-limit Default Value: 16384 Description: Specifies the maximum megabytes of memory to be used in the inode cache. === Is there an error in the description stating "maximum megabytes of memory"? Shouldn't it mean "maximum number of LRU records"? If not, is it true that the inode cache could grow up to 16 GiB for a client, and one must lower the network.inode-lru-limit value? Another thought: we've enabled write-behind, and the default write-behind-window-size value is 1 MiB. So one may conclude that with lots of small files written, the write-behind buffer could grow up to inode-lru-limit × write-behind-window-size = 16 GiB? Who could explain that to me? 24.09.2015 10:42, Gabi C wrote: oh, my bad... could it be this one? https://bugzilla.redhat.com/show_bug.cgi?id=1126831 [2] Anyway, on oVirt+Gluster I experienced similar behavior... ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
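One way to check whether the 16384 above really counts inodes rather than megabytes is to count the lru records in a statedump, since each cached inode appears as a separate [...lru.N] section like the one quoted. A sketch, assuming the default statedump directory (the file names depend on the setup):
===
# Count LRU inode records in the statedump files.
grep -c '\.lru\.' /var/run/gluster/*.dump.*
===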
Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client
We use a bare GlusterFS installation with no oVirt involved. 24.09.2015 10:29, Gabi C wrote: Google "vdsm memory leak"... it's been discussed on the list last year and earlier this one... ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Memory leak in GlusterFS FUSE client
In our GlusterFS deployment we've encountered something like memory leak in GlusterFS FUSE client. We use replicated (×2) GlusterFS volume to store mail (exim+dovecot, maildir format). Here is inode stats for both bricks and mountpoint: === Brick 1 (Server 1): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 5678132262% /bricks/r6sdLV08_vd1_mail Brick 2 (Server 2): Filesystem InodesIUsed IFree IUse% Mounted on /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 5678130712% /bricks/r6sdLV07_vd0_mail Mountpoint (Server 3): Filesystem InodesIUsed IFree IUse% Mounted on glusterfs.xxx:mail 578767760 10954915 5678128452% /var/spool/mail/virtual === glusterfs.xxx domain has two A records for both Server 1 and Server 2. Here is volume info: === Volume Name: mail Type: Replicate Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail Options Reconfigured: nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24 features.cache-invalidation-timeout: 10 performance.stat-prefetch: off performance.quick-read: on performance.read-ahead: off performance.flush-behind: on performance.write-behind: on performance.io-thread-count: 4 performance.cache-max-file-size: 1048576 performance.cache-size: 67108864 performance.readdir-ahead: off === Soon enough after mounting and exim/dovecot start, glusterfs client process begins to consume huge amount of RAM: === user@server3 ~$ ps aux | grep glusterfs | grep mail root 28895 14.4 15.0 15510324 14908868 ? Ssl Sep03 4310:05 /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable --volfile-server=glusterfs.xxx --volfile-id=mail /var/spool/mail/virtual === That is, ~15 GiB of RAM. Also we've tried to use mountpoint withing separate KVM VM with 2 or 3 GiB of RAM, and soon after starting mail daemons got OOM killer for glusterfs client process. Mounting same share via NFS works just fine. Also, we have much less iowait and loadavg on client side with NFS. Also, we've tried to change IO threads count and cache size in order to limit memory usage with no luck. As you can see, total cache size is 4×64==256 MiB (compare to 15 GiB). Enabling-disabling stat-prefetch, read-ahead and readdir-ahead didn't help as well. Here are volume memory stats: === Memory status for volume : mail -- Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail Mallinfo Arena: 36859904 Ordblks : 10357 Smblks : 519 Hblks: 21 Hblkhd : 30515200 Usmblks : 0 Fsmblks : 53440 Uordblks : 18604144 Fordblks : 18255760 Keepcost : 114112 Mempool Stats - NameHotCount ColdCount PaddedSizeof AllocCount MaxAlloc Misses Max-StdAlloc - -- mail-server:fd_t 0 1024 108 30773120 13700 mail-server:dentry_t 16110 274 84 23567614816384 1106499 1152 mail-server:inode_t1636321 156 23721687616384 1876651 1169 mail-trash:fd_t0 1024 108 0000 mail-trash:dentry_t0 32768 84 0000 mail-trash:inode_t 4 32764 156 4400 mail-trash:trash_local_t 064 8628 0000 mail-changetimerecorder:gf_ctr_local_t 06416540 0000 mail-changelog:rpcsvc_request_t 0 8 2828 0000 mail-changelog:changelog_local_t 064 116 0000 mail-bitrot-stub:br_stub_local_t 0 512 84 79204400 mail-locks:pl_local_t 032 148 6812757400 mail-upcall:upcall_local_t 0 512 108 0000 mail-marker:marker_local_t 0 128 332 64980300 mail-quota:quota_local_t 064 476 0000 mail-server:rpcsvc_request_t 0 512 2828 45462533 3400 glusterfs:struct saved_frame
[Gluster-devel] GlusterFS cache architecture
Hello. I'm trying to investigate how GlusterFS manages caching on both the server and the client side, but unfortunately cannot find any exhaustive, appropriate and up-to-date information. The setup is that we have, say, 2 GlusterFS nodes (server_a and server_b) with a replicated volume some_volume. We also have several clients (say, client_1 and client_2) that mount some_volume and manipulate files on it (let's assume some_volume contains web-related assets, and client_1/client_2 are web servers). There is also client_3 that does web-related deployment on some_volume (let's assume that client_3 is a web developer). We would like to use a multilayered cache scheme that involves the filesystem cache (on both client and server sides) as well as the web server cache. So, my questions are: 1) do caching-related options (performance.cache-size, performance.cache-min-file-size, performance.cache-max-file-size etc.) affect the server side only? 2) are there any tunables that affect client-side caching? 3) how is client-side caching performed, if at all (we are talking about the read cache only; the write cache is not interesting to us)? 4) how and in what cases is the client cache discarded (and how does that relate to the upcall framework)? Ideally, there should be some documentation that covers the general GlusterFS cache workflow. Any info would be appreciated. Thanks. -- Oleksandr post-factum Natalenko, MSc pf-kernel community https://natalenko.name/ ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
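As a starting sketch for question 2 (subject to correction by the developers): the read-side caching translators (io-cache, read-ahead, readdir-ahead, quick-read, md-cache/stat-prefetch) sit in the client graph, so the options below appear to be the client-side knobs, and upcall-based cache invalidation is what question 4 refers to. All values are illustrative only:
===
gluster volume set some_volume performance.cache-size 268435456        # io-cache data cache
gluster volume set some_volume performance.quick-read on               # serve small files from cache
gluster volume set some_volume performance.read-ahead on
gluster volume set some_volume performance.readdir-ahead on
gluster volume set some_volume performance.stat-prefetch on            # md-cache for metadata
gluster volume set some_volume performance.md-cache-timeout 1          # seconds metadata stays valid
gluster volume set some_volume features.cache-invalidation on          # upcall notifications
gluster volume set some_volume features.cache-invalidation-timeout 600
===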